# Pricing app

# Setup

### Import data

Firstly, we bring in our data. We require five variables:
- A unique identifier for the customer
- The name of the product that was purchased
- The price of the product that was purchased
- The number of units of the product purchased
- The revenue of the purchase

```python
import pandas as pd

def import_data(file_path='/Users/patricksweeney/growth/07_Apps/Untitled Folder/Disaggregate models/Choice data 2.xlsx'):
    """Load the transaction-level choice data (reading .xlsx files requires openpyxl)."""
    return pd.read_excel(file_path)

data = import_data()
data.head()
```

|   | id | product | revenue | volume | price |
|---|----|---------|---------|--------|-------|
| 0 | d4a531d7-c73b-4d28-badf-e38d99d637f9 | Starter | 18.0 | 1 | 18.0 |
| 1 | 7b1ebf39-f6c6-4b8e-ba80-715b2a346413 | Team | 90.0 | 3 | 30.0 |
| 2 | e350df96-c3f4-4768-be1e-86c4fd32f39d | Team | 105.0 | 3 | 35.0 |
| 3 | b4d3bcc4-4045-49b8-9639-d3082fd03cf1 | Starter | 18.0 | 2 | 9.0 |
| 4 | 1249c200-3688-4c28-8a77-57974d303578 | Team | 180.0 | 6 | 30.0 |
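
It can help to confirm that all five required variables are present before anything else runs. A minimal sketch, assuming the column names shown in the output above (`id`, `product`, `price`, `volume`, `revenue`); the helper `missing_columns` and the one-row sample are hypothetical.

```python
import pandas as pd

# Assumed column names, taken from the data.head() output
REQUIRED_COLUMNS = ['id', 'product', 'price', 'volume', 'revenue']

def missing_columns(data, required=REQUIRED_COLUMNS):
    """Return the required columns that are absent from the DataFrame."""
    return [col for col in required if col not in data.columns]

# Hypothetical sample that is missing the price column
sample = pd.DataFrame({'id': ['a'], 'product': ['Starter'], 'revenue': [18.0], 'volume': [1]})
print(missing_columns(sample))
```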


### Check the data

As a precautionary measure, we check that:
- All IDs are unique
- $R = P \times V$
- $R > 0$
- $P > 0$
- $V > 0$

```python
import numpy as np

def check_data(data):
    """Print data-quality warnings for the transaction data."""
    total_rows = len(data)

    # Check for unique IDs
    duplicate_ids = total_rows - data['id'].nunique()
    if duplicate_ids > 0:
        duplicate_percentage = (duplicate_ids / total_rows) * 100
        print(f"Warning: {duplicate_percentage:.2f}% of the rows have duplicate IDs.")
    else:
        print("All IDs are unique.")

    # Compare total revenue to the sum-product of price and volume.
    # np.isclose avoids false warnings caused by floating-point rounding.
    total_revenue = data['revenue'].sum()
    total_price_volume = (data['price'] * data['volume']).sum()
    if not np.isclose(total_revenue, total_price_volume):
        difference = total_price_volume - total_revenue
        percentage_difference = abs(difference / total_revenue) * 100
        print(f"Warning: Total P x V is {percentage_difference:.2f}% "
              f"{'greater' if difference > 0 else 'less'} than total Revenue.")
    else:
        print("Total Revenue equals total Price times Volume.")

    # Check that revenue, price and volume are strictly positive
    for column in ['revenue', 'price', 'volume']:
        incorrect_values = data[data[column] <= 0]
        if not incorrect_values.empty:
            incorrect_percentage = (len(incorrect_values) / total_rows) * 100
            print(f"Warning: {incorrect_percentage:.2f}% of {column} values are not greater than 0.")
        else:
            print(f"All {column} values are greater than 0.")

# Example usage
check_data(data)
```


```
All IDs are unique.
Total Revenue equals total Price times Volume.
All revenue values are greater than 0.
All volume values are greater than 0.
```
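
The $R = P \times V$ check above compares totals, so errors in individual rows can offset one another and still pass. A stricter, row-level sketch flags every row where the identity fails beyond floating-point tolerance and skips rows with missing revenue. The function name `rows_failing_rpv` and the sample rows are hypothetical.

```python
import numpy as np
import pandas as pd

def rows_failing_rpv(data):
    """Return rows where revenue differs from price * volume (NaN revenue is skipped)."""
    has_revenue = data['revenue'].notna()
    expected = data['price'] * data['volume']
    # np.isclose returns False for NaN, so NaN rows are excluded via has_revenue
    mismatch = has_revenue & ~np.isclose(data['revenue'], expected)
    return data[mismatch]

# Hypothetical sample: row 'c' should have revenue 105.0, not 100.0
sample = pd.DataFrame({
    'id': ['a', 'b', 'c'],
    'revenue': [18.0, 90.0, 100.0],
    'volume': [1, 3, 3],
    'price': [18.0, 30.0, 35.0],
})
print(rows_failing_rpv(sample))
```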


### Add the 'No-Purchase' option

```python
import uuid

import numpy as np
import pandas as pd

def add_nopurchase_option(data, paid_conversion_rate):
    """Append 'Other' (no-purchase) rows so that purchases make up
    `paid_conversion_rate` of all rows."""
    # If purchases are a fraction c of all choice occasions, total = purchases / c
    current_count = len(data)
    total_count_needed = current_count / paid_conversion_rate
    # round() rather than int() so floating-point error cannot drop a row
    other_count = int(round(total_count_needed - current_count))

    # One row per no-purchase occasion: price 0, volume 1
    other_data = pd.DataFrame({
        'product': ['Other'] * other_count,
        'price': [0] * other_count,
        'volume': [1] * other_count,
    })

    # Fill every remaining column (e.g. revenue) with NaN
    for col in data.columns:
        if col not in other_data:
            other_data[col] = np.nan

    # Generate a unique UUID for each new row
    other_data['id'] = [str(uuid.uuid4()) for _ in range(other_count)]

    # Append the 'Other' rows to the original data
    updated_data = pd.concat([data, other_data], ignore_index=True)

    # Report the row count and confirm the IDs are still unique
    print(f"Total number of rows: {len(updated_data)}")
    print(f"Number of unique IDs: {updated_data['id'].nunique()}")

    return updated_data

# Example usage: a 4% paid conversion rate
data = add_nopurchase_option(data, 0.04)
data.tail()
```

```
Total number of rows: 74350
Number of unique IDs: 74350
```
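
The number of rows to add follows from the paid conversion rate $c$: if the $n$ purchase rows are a fraction $c$ of all choice occasions, there are $n / c$ occasions in total and $n(1/c - 1)$ 'Other' rows. With $c = 0.04$, every purchase is matched by 24 no-purchase rows. A quick sketch of that arithmetic; the helper name `nopurchase_rows` and the purchase count are hypothetical.

```python
def nopurchase_rows(n_purchases, paid_conversion_rate):
    """'Other' rows needed so purchases make up paid_conversion_rate of all rows."""
    return round(n_purchases / paid_conversion_rate - n_purchases)

n = 100  # hypothetical number of purchase rows
other = nopurchase_rows(n, 0.04)
print(other, n / (n + other))
```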




### Declare variables and data preprocessing

# Summary statistics

**Total Revenue**  

**Total Customers**  
**ARPA**  

**Total Seats**  
**ARPU**  
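
A minimal sketch of these headline figures, under stated assumptions: a customer is a distinct `id`, seats are the `volume` column, and the 'Other' no-purchase rows are excluded. ARPA is average revenue per account (revenue / customers); ARPU is taken here as revenue per seat. The sample reuses the five rows from the import output with shortened, hypothetical IDs, plus one hypothetical 'Other' row.

```python
import pandas as pd

def headline_metrics(data):
    """Total revenue, customers, seats, ARPA and ARPU over paying rows only."""
    # Drop the 'Other' (no-purchase) rows before summarising
    paid = data[data['product'] != 'Other']
    total_revenue = paid['revenue'].sum()
    total_customers = paid['id'].nunique()
    total_seats = paid['volume'].sum()
    return {
        'total_revenue': total_revenue,
        'total_customers': total_customers,
        'arpa': total_revenue / total_customers,
        'total_seats': total_seats,
        'arpu': total_revenue / total_seats,  # assumption: ARPU = revenue per seat
    }

sample = pd.DataFrame({
    'id': ['a', 'b', 'c', 'd', 'e', 'f'],
    'product': ['Starter', 'Team', 'Team', 'Starter', 'Team', 'Other'],
    'revenue': [18.0, 90.0, 105.0, 18.0, 180.0, float('nan')],
    'volume': [1, 3, 3, 2, 6, 1],
    'price': [18.0, 30.0, 35.0, 9.0, 30.0, 0.0],
})
print(headline_metrics(sample))
```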



**Revenue mix**  
**Customer mix**  
**Seat mix**  

**Package ARPA**  
**Package ARPU**  
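
The mix and package figures can be sketched from the same per-product totals. This assumes the same hypothetical definitions as the headline figures (customers are distinct `id`s, seats are `volume`, 'Other' rows excluded); each mix column sums to 1 across products, and package ARPA and ARPU divide each product's revenue by its own customers and seats. The helper names and sample data are hypothetical.

```python
import pandas as pd

def by_product(data):
    """Revenue, distinct customers and seats per product, excluding 'Other' rows."""
    paid = data[data['product'] != 'Other']
    return paid.groupby('product').agg(
        revenue=('revenue', 'sum'),
        customers=('id', 'nunique'),
        seats=('volume', 'sum'),
    )

def product_mix(data):
    """Revenue, customer and seat mix: each column is divided by its total."""
    totals = by_product(data)
    return totals / totals.sum()

def package_metrics(data):
    """Per-package ARPA (revenue per customer) and ARPU (revenue per seat)."""
    totals = by_product(data)
    return pd.DataFrame({
        'arpa': totals['revenue'] / totals['customers'],
        'arpu': totals['revenue'] / totals['seats'],
    })

sample = pd.DataFrame({
    'id': ['a', 'b', 'c', 'd', 'e', 'f'],
    'product': ['Starter', 'Team', 'Team', 'Starter', 'Team', 'Other'],
    'revenue': [18.0, 90.0, 105.0, 18.0, 180.0, float('nan')],
    'volume': [1, 3, 3, 2, 6, 1],
    'price': [18.0, 30.0, 35.0, 9.0, 30.0, 0.0],
})
print(product_mix(sample))
print(package_metrics(sample))
```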


# Discrete choice and random utility models

Discrete choice models describe the probability that a consumer chooses each of $J$ mutually exclusive and collectively exhaustive (MECE) alternatives, as a function of the alternatives' attributes. A natural attribute is price.
In the context of product choice, this usually means an $(N+1)$-dimensional demand system with $N$ inside goods and one additional 'Other' or 'No purchase' option.
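
Once the 'Other' rows are appended, the empirical choice shares across the $N$ inside goods and the outside option are the observed side of this demand system, and they sum to one. A small sketch with hypothetical counts (two Starter, two Team and 96 'Other' rows, matching a 4% paid conversion rate):

```python
import pandas as pd

# Hypothetical choices: 4 purchases and 96 no-purchase occasions
products = pd.Series(['Starter'] * 2 + ['Team'] * 2 + ['Other'] * 96)

# Share of choice occasions going to each of the N + 1 alternatives
shares = products.value_counts(normalize=True)
print(shares)
```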

From the decision maker's perspective, each alternative $j$ carries a utility $U_j$, and the decision maker chooses the alternative with the highest utility.

However, from the researcher's point of view $U$ is not entirely deterministic. Total utility $U_j$ can be decomposed into a deterministic component $V_j$ (observable to the researcher) and an unobserved random component $\epsilon_j$, which is Gumbel (type I extreme value) distributed, so that $U_j = V_j + \epsilon_j$.
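
Under the standard assumption that the $\epsilon_j$ are independent and identically distributed Gumbel draws, this random utility model yields the multinomial logit choice probabilities, where $k = 0$ indexes the 'Other' option (commonly normalised to $V_0 = 0$):

$$\begin{equation}
   P_j = \Pr\left(U_j \geq U_k \ \text{for all} \ k\right) = \frac{e^{V_j}}{\sum_{k=0}^{N} e^{V_k}}
\end{equation}$$

The sketch below checks this by simulation: it draws Gumbel noise, lets each simulated customer pick the highest-utility alternative, and compares the resulting shares with the closed form. The utilities for 'Other', 'Starter' and 'Team' are hypothetical, not estimates.

```python
import numpy as np

# Hypothetical deterministic utilities V_j for ['Other', 'Starter', 'Team']
V = np.array([0.0, -1.0, -0.5])

def logit_probabilities(V):
    """Closed-form multinomial logit choice probabilities."""
    expV = np.exp(V - V.max())  # subtracting the max avoids overflow
    return expV / expV.sum()

def simulated_shares(V, n_draws=200_000, seed=0):
    """Choice shares when each simulated customer maximises U_j = V_j + eps_j."""
    rng = np.random.default_rng(seed)
    eps = rng.gumbel(loc=0.0, scale=1.0, size=(n_draws, len(V)))
    choices = np.argmax(V + eps, axis=1)  # pick the highest-utility alternative
    return np.bincount(choices, minlength=len(V)) / n_draws

print(logit_probabilities(V))
print(simulated_shares(V))
```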



# Revenue maximisation

In the interdependent case, each product's demand $D_i$ depends on the prices of all $N$ products, not only its own.


$$\begin{equation}
   \max_{p_1, \ldots, p_N} R = \sum_{i=1}^N p_i \cdot D_i(p_1, p_2, \ldots, p_N)
\end{equation}$$
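
To make the interdependence concrete, the sketch below assumes logit demand with utility $V_i = \alpha_i - \beta p_i$ for each inside good and $V_0 = 0$ for 'Other', so $D_i = M \cdot e^{V_i} / (1 + \sum_k e^{V_k})$ for a market of $M$ choice occasions. The parameters `alphas`, `beta` and `market_size` are hypothetical, not estimates from this data. Raising the Team price lowers Team demand and raises Starter demand, which is the cross-price effect the objective has to account for.

```python
import numpy as np

def logit_demand(prices, alphas, beta, market_size):
    """D_i = M * exp(V_i) / (1 + sum_k exp(V_k)), with V_i = alpha_i - beta * p_i."""
    expV = np.exp(alphas - beta * prices)
    return market_size * expV / (1.0 + expV.sum())

def total_revenue(prices, alphas, beta, market_size):
    """R = sum_i p_i * D_i(p_1, ..., p_N)."""
    return float(np.dot(prices, logit_demand(prices, alphas, beta, market_size)))

# Hypothetical parameters for [Starter, Team]
alphas = np.array([1.0, 2.5])
beta = 0.1
market_size = 10_000

base = np.array([18.0, 30.0])
team_up = np.array([18.0, 35.0])  # raise only the Team price
print(logit_demand(base, alphas, beta, market_size))
print(logit_demand(team_up, alphas, beta, market_size))
print(total_revenue(base, alphas, beta, market_size))
```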

At an interior optimum, every element of the revenue gradient with respect to the prices equals zero; these are the first-order conditions.

$$\begin{equation}
    \frac{\partial R}{\partial p_i} = 0 \quad \text{for each} \quad i = 1, 2, \ldots, N
\end{equation}$$


$$\begin{equation}
    \frac{\partial R}{\partial p_i} = D_i(p_1, \ldots, p_N) + p_i \cdot \frac{\partial D_i(p_1, \ldots, p_N)}{\partial p_i} + \sum_{\substack{j=1 \\ j \neq i}}^N \left( p_j \cdot \frac{\partial D_j(p_1, \ldots, p_N)}{\partial p_i} \right)
\end{equation}$$
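
As a worked case, assume (hypothetically) logit demand $D_i = M P_i$ with $P_i = e^{V_i} / (1 + \sum_k e^{V_k})$, $V_i = \alpha_i - \beta p_i$ and $V_0 = 0$. Then $\partial D_j / \partial p_i = -\beta D_j (\delta_{ij} - P_i)$, and substituting into the expanded gradient above collapses each element to

$$\begin{equation}
    \frac{\partial R}{\partial p_i} = D_i \left[ 1 - \beta \left( p_i - \sum_{j=1}^N p_j P_j \right) \right]
\end{equation}$$

Setting this to zero gives $p_i = 1/\beta + \sum_j p_j P_j$ for every $i$, so under this specification (one common $\beta$, no costs) all optimal prices are equal, at $p^* = 1 / (\beta P_0)$ where $P_0$ is the 'Other' share at $p^*$. The sketch checks the analytic gradient against central finite differences and solves for $p^*$ by bisection. This is a property of this particular demand specification, and all parameters are hypothetical.

```python
import numpy as np

def logit_demand(prices, alphas, beta, market_size):
    expV = np.exp(alphas - beta * prices)
    return market_size * expV / (1.0 + expV.sum())

def total_revenue(prices, alphas, beta, market_size):
    return float(np.dot(prices, logit_demand(prices, alphas, beta, market_size)))

def revenue_gradient(prices, alphas, beta, market_size):
    """Analytic dR/dp_i = D_i * (1 - beta * (p_i - sum_j p_j P_j)) under logit demand."""
    demand = logit_demand(prices, alphas, beta, market_size)
    avg_price = np.dot(prices, demand) / market_size  # sum_j p_j P_j
    return demand * (1.0 - beta * (prices - avg_price))

def numerical_gradient(prices, alphas, beta, market_size, h=1e-5):
    """Central finite differences of R with respect to each price."""
    grad = np.zeros_like(prices)
    for i in range(len(prices)):
        step = np.zeros_like(prices)
        step[i] = h
        grad[i] = (total_revenue(prices + step, alphas, beta, market_size)
                   - total_revenue(prices - step, alphas, beta, market_size)) / (2 * h)
    return grad

def optimal_common_price(alphas, beta, lo=0.0, hi=1_000.0, tol=1e-10):
    """Bisection on g(p) = beta * p * P_0(p) - 1, which increases from -1 at p = 0."""
    def g(p):
        outside_share = 1.0 / (1.0 + np.exp(alphas - beta * p).sum())
        return beta * p * outside_share - 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical parameters for [Starter, Team]
alphas = np.array([1.0, 2.5])
beta = 0.1
market_size = 10_000
prices = np.array([18.0, 30.0])

print(revenue_gradient(prices, alphas, beta, market_size))
print(numerical_gradient(prices, alphas, beta, market_size))
p_star = optimal_common_price(alphas, beta)
print(p_star, revenue_gradient(np.full(2, p_star), alphas, beta, market_size))
```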




# Outputs

### Pseudo-lift

**Revenue lift**  
**Customer lift (and churn)**  
**ARPA lift**
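
A minimal sketch of the pseudo-lift figures, assuming each lift is the relative change from a baseline scenario to a candidate scenario (for example, current versus optimised prices). Because ARPA is revenue divided by customers, the three lifts are linked: $(1 + \text{ARPA lift}) = (1 + \text{revenue lift}) / (1 + \text{customer lift})$. The `churn` figure below is a hypothetical definition (net customers lost as a share of the baseline), and the scenario numbers are made up.

```python
def lift(new, old):
    """Relative change from old to new, e.g. 0.05 for a 5% lift."""
    return new / old - 1.0

def pseudo_lift(baseline, scenario):
    """Revenue, customer and ARPA lift between two scenario summaries."""
    revenue_lift = lift(scenario['revenue'], baseline['revenue'])
    customer_lift = lift(scenario['customers'], baseline['customers'])
    arpa_lift = lift(scenario['revenue'] / scenario['customers'],
                     baseline['revenue'] / baseline['customers'])
    return {
        'revenue_lift': revenue_lift,
        'customer_lift': customer_lift,
        # Hypothetical definition: net customers lost as a share of baseline
        'churn': max(0.0, -customer_lift),
        'arpa_lift': arpa_lift,
    }

# Hypothetical scenario summaries
baseline = {'revenue': 100_000.0, 'customers': 1_000}
scenario = {'revenue': 110_000.0, 'customers': 950}
result = pseudo_lift(baseline, scenario)
print(result)
```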

### PVM

### Customer-level distributions

### Product-level distributions