# User Acquisition Analysis Workflow

## Purpose

This analysis showcases my approach to comparing user cohorts in order to find the best user acquisition channel. The aim is to demonstrate an approach that detects subtle differences between the cohorts, forecasts the eventual value of each cohort, and delivers tangible value, because the more valuable channel can be identified and leveraged early.


## Task

To identify, as early as possible, the differences in value between 2 user cohorts acquired from 2 competing user acquisition channels, and then to leverage the more valuable channel by reinvesting the resources freed up by divesting from the less valuable one.

<details>
<summary> Premise, task, and disclaimers (longer version) </summary>

### Premise

This analysis assumes a digital product or service and focuses on user acquisition through digital channels where user attribution is straightforward. The premise is best described as a freemium game or a similar entertainment product or service, but the approach can be applied with slight modifications to e.g. e-commerce, professional software, etc.

Essentially, the business model should rely on acquiring users who then engage with the product and spend money on in-app purchases, licenses, subscriptions, or products from a shop. Different kinds of users have different spending profiles, which are more or less linked to their other behavior on the platform.

It is also assumed that there is historical data from which the spending patterns of different user clusters can be projected reasonably well. This means the product has a history and the business is not treading a new path, as would be the case e.g. after launching a completely new product or vertical, entering a vastly different market, or growing substantially out of a niche into a more general market position. It is further assumed that, in the historical data, the fundamental way users engage with the product and spend money is similar to the current and near-future situation. This means that no comprehensive changes have been made to e.g. pricing, product structure, or business logic. For a game or similar entertainment product, it is assumed that there have been no substantial changes in game design or other engagement factors that would alter how users act when using the product or how that activity is linked to their spending profile. In short, for the purpose of this exercise, valid conclusions about the future can be drawn from historical data as far as user activity and spending predictions are concerned.


### Task

There is an ongoing initiative comparing 2 user acquisition channels. The metric chosen for the comparison is the net value per user at 90 days after install, calculated as the 90-day lifetime value (LTV) per user minus the cost per acquisition (CPA).
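
As a minimal sketch of the metric, the snippet below computes the net value per user from per-channel totals. The `ChannelSummary` class, its field names, and all numbers are hypothetical and made up for illustration; they are not taken from the project's modules or dataset. In this scenario only 30 days of data exist, so the 90-day revenue would in practice be a forecast rather than an observed value.

```python
from dataclasses import dataclass


@dataclass
class ChannelSummary:
    """Hypothetical per-channel aggregate; names are illustrative only."""
    name: str
    users: int                # number of acquired users in the cohort
    revenue_90d: float        # total (forecast) revenue within 90 days of install
    acquisition_spend: float  # total spend used to acquire the cohort


def net_value_per_user(summary: ChannelSummary) -> float:
    """Return 90-day LTV per user minus cost per acquisition (CPA)."""
    if summary.users <= 0:
        raise ValueError("cohort must contain at least one user")
    ltv_90d = summary.revenue_90d / summary.users
    cpa = summary.acquisition_spend / summary.users
    return ltv_90d - cpa


# Made-up example figures, not real data
channel_a = ChannelSummary("A", users=10_000, revenue_90d=32_000.0, acquisition_spend=25_000.0)
channel_b = ChannelSummary("B", users=10_000, revenue_90d=29_000.0, acquisition_spend=24_000.0)

for channel in (channel_a, channel_b):
    print(f"Channel {channel.name}: net value per user = {net_value_per_user(channel):.2f}")
```

With these invented figures, channel A would come out ahead, but the point is only to make the definition concrete: both terms are per-user averages over the same cohort, so the difference is directly comparable across channels.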

The cutoff point of the comparison is set at 90 days, since beyond that there are too many uncertainties to warrant the assumption that user behavior and value are driven primarily by the acquisition channel. After that point, the overall experience on the platform, possible reactivation campaigns, or changes in the user's personal situation will likely influence the user's value more than the acquisition channel does. Moreover, there might be other changes in e.g. product and/or business logic, game design, or the meta if the product has a multiplayer component.

The specific metrics will be displayed and visualized in the analysis proper, but the preliminary situation is that the campaign has run long enough that there are 30 days' worth of user activity data for about 10000 acquired users in both channels. The CPA is roughly similar for both channels, and so is the 30-day LTV per user. However, there is a sense that there might be some critical differences in profitability potential between the 2 cohorts. The task is to decide conclusively which channel to discontinue so that the more profitable channel can be leveraged as early as possible, minimizing the opportunity cost incurred by running inefficient user acquisition channels.


### Disclaimers

While the scenario, high-level figures, and other realities of this exercise are based on the real-world experience of the creator of this analysis example, the data is purely made up for the purpose of demonstration. Neither this analysis nor the dataset used here yields any strategic or operational insights into the business of any previous or current employer of the creator, or of any other real-world company with which the creator might have worked during his career.

This also means that the data used in this exercise is simplified compared to real-world user data. The assumptions in this exercise, e.g. the boundaries between different user clusters, the predictability of user spending profiles, or the link between 30-day user activity and 90-day spending profile, might not be as clear in real life as presented here. Thus, this analysis should not be taken as a "ready to implement turnkey solution" for real-world user acquisition value prediction tasks, but rather as an example of the approach and knowledge base of its creator. In the real world, actual solutions are built on top of this foundation case by case, with modifications and additions to the approach as the situation necessitates.

Moreover, the creator is first and foremost a full stack data analyst, whose expertise lies in designing, building, and executing analysis pipelines: from problem statement and metric definition, through data pipeline and analysis tool development, to presenting and visualizing the results in a way that drives decision making and creates measurable business impact and value. This means that the creator is not a professional Python developer. The creator's philosophy is to use Python as a tool to get things done and deliver impactful business insights. Thus, the code might contain unoptimized parts, and it occasionally takes an approach that the professional Python developer community would not consider best practice. When applying this (or a similar) approach in production, the creator would prefer to build the analysis with SQL and a professional BI tool, and would refine many parts of this code if the application had to be done in Python.
</details>


## Code Setup

In [1]:
## Importing all necessary modules and libraries

import os
import utils.generate_dataset as generate


Utils properly loaded


<details>
<summary> Code Setup (longer explanation) </summary>

Here we import all the libraries and modules needed for this analysis workflow. I have implemented most of the code that does the actual analysis in several modules, which are part of this project in subrepositories. This workflow mainly concerns the analysis approach and methods, with each step explained with a focus on key takeaways and business implications. The more technical details of the code are not the focus of this exercise, but they can be freely studied in the code modules.
</details>