# Stripe Migration Analysis

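This notebook analyzes the migration of existing Stripe customers to new credit-based pricing plans. It loads company, organization, and Stripe subscription data, computes each company's credit usage and current MRR/ARR, and selects a new plan for each company.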


## 1. Setup and Imports


In [192]:
import pandas as pd
from pathlib import Path


# Set pandas display options for better readability
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)
pd.set_option('display.width', None)


## 2. Configuration and Constants


In [193]:
# Brand Plans (IN_HOUSE customers)
BRAND_PLANS = {
    "eur": {
        "starter": {
            "month": {
                "price": 89,
                "credits": 3560,
                "price_per_credit": 89 / 3560,
            },
            "year": {
                "price": 89 * 10,
                "credits": 3560 * 10,
                "price_per_credit": (89 * 10) / (3560 * 10),
            }
        },
        "pro": {
            "month": {
                "price": 199,
                "credits": 14925,
                "price_per_credit": 199 / 14925,
            },
            "year": {
                "price": 199 * 10,
                "credits": 14925 * 10,
                "price_per_credit": (199 * 10) / (14925 * 10),
            }
        },
        "enterprise": {
            "month": {
                "price": 499,
                "credits": 49900,
                "price_per_credit": 499 / 49900,
            },
            "year": {
                "price": 499 * 10,
                "credits": 49900 * 10,
                "price_per_credit": (499 * 10) / (49900 * 10),
            }
        },
    },
    "usd": {
        "starter": {
            "month": {
                "price": 89,
                "credits": 3560,
                "price_per_credit": 89 / 3560,
            },
            "year": {
                "price": 89 * 10,
                "credits": 3560 * 10,
                "price_per_credit": (89 * 10) / (3560 * 10),
            }
        },
        "pro": {
            "month": {
                "price": 199,
                "credits": 14925,
                "price_per_credit": 199 / 14925,
            },
            "year": {
                "price": 199 * 10,
                "credits": 14925 * 10,
                "price_per_credit": (199 * 10) / (14925 * 10),
            }
        },
        "enterprise": {
            "month": {
                "price": 499,
                "credits": 49900,
                "price_per_credit": 499 / 49900,
            },
            "year": {
                "price": 499 * 10,
                "credits": 49900 * 10,
                "price_per_credit": (499 * 10) / (49900 * 10),
            }
        },
    },
}

# Agency Plans
AGENCY_PLANS = {
    "eur": {
        "intro": {
            "month": {
                "price": 89,
                "credits": 2250,
                "price_per_credit": 89 / 2250,
            },
            "year": {
                "price": 89 * 10,
                "credits": 2250 * 10,
                "price_per_credit": (89 * 10) / (2250 * 10),
            }
        },
        "growth": {
            "month": {
                "price": 199,
                "credits": 12935,
                "price_per_credit": 199 / 12935,
            },
            "year": {
                "price": 199 * 10,
                "credits": 12935 * 10,
                "price_per_credit": (199 * 10) / (12935 * 10),
            }
        },
        "scale": {
            "month": {
                "price": 499,
                "credits": 37425,
                "price_per_credit": 499 / 37425,
            },
            "year": {
                "price": 499 * 10,
                "credits": 37425 * 10,
                "price_per_credit": (499 * 10) / (37425 * 10),
            }
        },
    },
    "usd": {
        "intro": {
            "month": {
                "price": 89,
                "credits": 2250,
                "price_per_credit": 89 / 2250,
            },
            "year": {
                "price": 89 * 10,
                "credits": 2250 * 10,
                "price_per_credit": (89 * 10) / (2250 * 10),
            }
        },
        "growth": {
            "month": {
                "price": 199,
                "credits": 12935,
                "price_per_credit": 199 / 12935,
            },
            "year": {
                "price": 199 * 10,
                "credits": 12935 * 10,
                "price_per_credit": (199 * 10) / (12935 * 10),
            }
        },
        "scale": {
            "month": {
                "price": 499,
                "credits": 37425,
                "price_per_credit": 499 / 37425,
            },
            "year": {
                "price": 499 * 10,
                "credits": 37425 * 10,
                "price_per_credit": (499 * 10) / (37425 * 10),
            }
        },
    },
}

# Model pricing (credits per prompt)
MODEL_ID_PRICE_MAP = {
    "gpt-4o": 1,
    "chatgpt": 1,
    "sonar": 1,
    "google-ai-overview": 1,
    "llama-3-3-70b-instruct": 0.5,
    "gpt-4o-search": 1,
    "claude-sonnet-4": 2,
    "claude-3-5-haiku": 2,
    "gemini-1-5-flash": 1,
    "deepseek-r1": 1,
    "gemini-2-5-flash": 2,
    "google-ai-mode": 1,
    "grok-2-1212": 2,
    "gpt-3-5-turbo": 1,
}
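
The plan tables store `price_per_credit` next to `price` and `credits`, so the three values should stay consistent for both intervals. Python evaluates `*` and `/` left to right, which means an unparenthesized `price * 10 / credits * 10` is 100 times larger than the intended ratio. The sketch below builds one plan entry from a monthly price and credit count; `make_plan` and `yearly_multiplier` are hypothetical names that do not appear in the notebook.

```python
def make_plan(price: int, credits: int, yearly_multiplier: int = 10) -> dict:
    """Build the month/year entries for one plan (hypothetical helper)."""
    year_price = price * yearly_multiplier
    year_credits = credits * yearly_multiplier
    return {
        "month": {
            "price": price,
            "credits": credits,
            "price_per_credit": price / credits,
        },
        "year": {
            "price": year_price,
            "credits": year_credits,
            # Dividing the two named totals avoids the left-to-right precedence trap
            "price_per_credit": year_price / year_credits,
        },
    }


# Starter plan figures taken from BRAND_PLANS above
starter = make_plan(89, 3560)

# Unparenthesized form, evaluated as ((89 * 10) / 3560) * 10
unparenthesized = 89 * 10 / 3560 * 10
```

Because the yearly price and the yearly credits are both ten times the monthly values, the yearly price per credit should equal the monthly one.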


## 3. Define Models

In [194]:
from typing import List, Literal, Optional

from pydantic import BaseModel, ConfigDict
from pydantic.alias_generators import to_camel


class CamelCaseModel(BaseModel):
    """Base model that accepts camelCase input keys as aliases for snake_case fields."""

    model_config = ConfigDict(
        alias_generator=to_camel,
        populate_by_name=True,
    )


class Organization(CamelCaseModel):
    id: str
    company_id: str
    model_ids: List[str]
    prompt_limit: int
    prompts_count: int
    chat_interval_in_hours: int


class Company(CamelCaseModel):
    id: str
    name: str
    type: Literal["IN_HOUSE", "AGENCY", "PARTNER"]
    domain: Optional[str] = None
    stripe_customer_id: str
    stripe_subscription_id: str


class Discount(CamelCaseModel):
    id: str
    percent_off: Optional[int] = None
    amount_off: Optional[int] = None
    duration: Literal["forever", "once", "repeating"]
    duration_in_months: Optional[int] = None


class Subscription(CamelCaseModel):
    id: str
    currency: Literal["eur", "usd"]
    customer: str
    discounts: List[Discount]


class SubscriptionItem(BaseModel):
    """Flattened subscription item model"""
    customer_id: str
    subscription_id: str
    product_id: str
    unit_amount: int
    quantity: int
    interval: Literal["month", "year"]
    interval_count: int
    currency: Literal["eur", "usd"]
    discounts: List[str]


class Product(CamelCaseModel):
    id: str
    name: str
    prompt_limit: Optional[int] = None
    type: Optional[Literal["WORKSPACE", "MODELS", "PROMPTS"]] = None


class MigrationOutput(BaseModel):
    company_name: str
    company_domain: Optional[str]
    company_type: Literal["IN_HOUSE", "AGENCY", "PARTNER"]
    orgs_count: int
    orgs_count_hf: int
    current_mrr: int
    current_arr: int
    interval: str
    discount: int
    discounts: str
    prompt_usage: int
    prompt_capacity: int
    credits_usage: int
    credits_capacity: int
    plan_name: str
    mrr: int
    mrr_change: int
    arr_change: int
    extra_credits_purchased: int
    surplus_credits: int


print("✓ Models defined")


✓ Models defined


## 4. Load Data

In [195]:
def load_json(file_path: Path):
    """Load JSON and replace NaN with None."""
    df_raw = pd.read_json(file_path)
    records = df_raw.replace({float("nan"): None}).to_dict("records")
    return records

# Define data paths
base_path = Path.cwd().parent.parent
data_path = base_path / "data"

print(f"Loading data from: {data_path}")

# Load all data files
print("Loading source data...")
companies_raw = load_json(data_path / "processed_companies.json")
orgs_raw = load_json(data_path / "processed_organizations.json")
subs_raw = load_json(data_path / "stripe_subscriptions.json")
coupons_raw = load_json(data_path / "stripe_coupons.json")
products_raw = load_json(data_path / "stripe_products.json")

print(f"✓ Loaded {len(companies_raw)} companies")
print(f"✓ Loaded {len(orgs_raw)} organizations")
print(f"✓ Loaded {len(subs_raw)} subscription items")
print(f"✓ Loaded {len(coupons_raw)} coupons")
print(f"✓ Loaded {len(products_raw)} products")


Loading data from: /Users/matevz/dev/peec-ai/stripe-migration-analysis/data
Loading source data...
✓ Loaded 11903 companies
✓ Loaded 3494 organizations
✓ Loaded 1315 subscription items
✓ Loaded 48 coupons
✓ Loaded 55 products


## 5. Filter and Validate Data

In [196]:
# Filter companies
companies_filtered = [
    c
    for c in companies_raw
    if c["stripeSubscriptionId"]
    and c["stripeCustomerId"]
    and c["stripeSubscriptionStatus"] == "active"
]
print(f"Filtered companies: {len(companies_filtered)} (removed {len(companies_raw) - len(companies_filtered)})")

# Flatten and map products
product_mapped = [
    {
        "id": p["id"],
        "name": p["name"],
        "prompt_limit": int(p["metadata"].get("promptLimit"))
        if p["metadata"] and p["metadata"].get("type") == "WORKSPACE"
        else None,
        "type": p["metadata"].get("type")
        if p["metadata"] and p["metadata"].get("type")
        else None,
    }
    for p in products_raw
    if p["active"]
]
print(f"Filtered products: {len(product_mapped)} (removed {len(products_raw) - len(product_mapped)})")

# Map subscriptions
subscriptions_mapped = []
for sub in subs_raw:
    discounts = []
    for discount in sub.get("discounts", []):
        coupon = discount.get("coupon", {})
        discounts.append({
            "id": discount["id"],
            "percent_off": coupon.get("percent_off"),
            "amount_off": coupon.get("amount_off"),
            "duration": coupon.get("duration"),
            "duration_in_months": coupon.get("duration_in_months"),
        })

    subscriptions_mapped.append({
        "id": sub["id"],
        "currency": sub["currency"],
        "customer": sub["customer"],
        "discounts": discounts,
    })


# Flatten subscription items from nested structure
subscription_items_flat = []
for sub in subs_raw:
    for item in sub.get("items", {}).get("data", []):
        price = item.get("price", {})
        recurring = price.get("recurring", {})
        subscription_items_flat.append({
            "customer_id": sub["customer"],
            "subscription_id": sub["id"],
            "product_id": price.get("product"),
            "unit_amount": price.get("unit_amount"),
            "quantity": item.get("quantity", 1),  # Default to 1 if missing
            "interval": recurring.get("interval"),
            "interval_count": recurring.get("interval_count"),
            "currency": sub["currency"],
            "discounts": item["discounts"],
        })

print(f"Flattened subscription items: {len(subscription_items_flat)}")

# Validate with Pydantic models
companies = [Company.model_validate(c) for c in companies_filtered]
orgs = [Organization.model_validate(o) for o in orgs_raw]
products = [Product.model_validate(p) for p in product_mapped]
subscriptions = [Subscription.model_validate(s) for s in subscriptions_mapped]
subscription_items = [SubscriptionItem.model_validate(s) for s in subscription_items_flat]

# Create coupon lookup
coupons_map = {c["id"]: c for c in coupons_raw}

print(f"✓ Data validated: {len(companies)} companies, {len(orgs)} orgs, {len(products)} products, {len(subscription_items)} items")


Filtered companies: 1285 (removed 10618)
Filtered products: 47 (removed 8)
Flattened subscription items: 1577
✓ Data validated: 1285 companies, 3494 orgs, 47 products, 1577 items
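
The flattening step turns 1315 subscriptions into 1577 rows because a single subscription can carry several line items. A minimal, standalone sketch of that step follows. The sample dictionary is hand-built to mirror the nested `items.data[].price.recurring` shape the cell reads; its IDs and amounts are copied from the sample rows shown in section 8, and `flatten_items` is a hypothetical helper name.

```python
from typing import Any


def flatten_items(sub: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one flat row per line item of a subscription (hypothetical helper)."""
    rows = []
    for item in sub.get("items", {}).get("data", []):
        price = item.get("price") or {}
        recurring = price.get("recurring") or {}
        rows.append({
            "customer_id": sub["customer"],
            "subscription_id": sub["id"],
            "product_id": price.get("product"),
            "unit_amount": price.get("unit_amount"),
            "quantity": item.get("quantity", 1),  # default to 1 if missing
            "interval": recurring.get("interval"),
            "currency": sub["currency"],
        })
    return rows


# Hand-built sample: one subscription with two line items
sample_sub = {
    "id": "sub_1SKNcGKojVEYZPlXXD9PlTEg",
    "customer": "cus_TGvTiJ59etQ8OY",
    "currency": "eur",
    "items": {
        "data": [
            {"quantity": 1, "price": {"product": "prod_SRfFIFqKBnTrlh", "unit_amount": 12900,
                                      "recurring": {"interval": "month", "interval_count": 1}}},
            {"quantity": 1, "price": {"product": "prod_S6Vq3DJcPoXe3i", "unit_amount": 0,
                                      "recurring": {"interval": "month", "interval_count": 1}}},
        ]
    },
}

rows = flatten_items(sample_sub)
```

Each row repeats the subscription and customer IDs, so the items can later be grouped back to subscription level.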


## 6. Create DataFrames

In [197]:
# Convert validated models to DataFrames
companies_df = pd.DataFrame([c.model_dump() for c in companies])
orgs_df = pd.DataFrame([o.model_dump() for o in orgs])
subs_df = pd.DataFrame([s.model_dump() for s in subscriptions])
sub_items_df = pd.DataFrame([s.model_dump() for s in subscription_items])
products_df = pd.DataFrame([p.model_dump() for p in products])

print("✓ DataFrames created:")
print(f"  Companies: {companies_df.shape}")
print(f"  Organizations: {orgs_df.shape}")
print(f"  Subscriptions: {subs_df.shape}")
print(f"  Subscription Items: {sub_items_df.shape}")
print(f"  Products: {products_df.shape}")

✓ DataFrames created:
  Companies: (1285, 6)
  Organizations: (3494, 6)
  Subscriptions: (1315, 4)
  Subscription Items: (1577, 9)
  Products: (47, 4)


## 7. Calculate Orgs Credits Usage

In [198]:
def calculate_credits_usage(row: pd.Series) -> int:
    """Calculate monthly credits consumed by the prompts currently in use (prompts_count)."""
    runs_per_month = 30  # Fixed for every org; chat_interval_in_hours is not considered here
    model_prices = [MODEL_ID_PRICE_MAP.get(mid, 0) for mid in row["model_ids"]]  # Unknown models cost 0
    return int(sum(model_prices) * row["prompts_count"] * runs_per_month)

def calculate_credits_capacity(row: pd.Series) -> int:
    """Calculate monthly credits needed if the full prompt capacity (prompt_limit) were used."""
    runs_per_month = 30  # Same fixed run frequency as calculate_credits_usage
    model_prices = [MODEL_ID_PRICE_MAP.get(mid, 0) for mid in row["model_ids"]]  # Unknown models cost 0
    return int(sum(model_prices) * row["prompt_limit"] * runs_per_month)

# Calculate credits for each organization
orgs_df["credits_usage"] = orgs_df.apply(calculate_credits_usage, axis=1)
orgs_df["credits_capacity"] = orgs_df.apply(calculate_credits_capacity, axis=1)

print("✓ Calculated credits for organizations")
orgs_df[["id", "company_id", "prompts_count", "prompt_limit", "credits_usage", "credits_capacity"]].head()

✓ Calculated credits for organizations


Unnamed: 0,id,company_id,prompts_count,prompt_limit,credits_usage,credits_capacity
0,20da1ff7-bed2-40e8-a5c0-cade5250e7ba,co_1fea122e-be87-47f8-b459-4bb426706d35,23,30,2760,3600
1,25f8bb17-0754-4840-ada6-40e7a9345f27,co_f5267b94-1922-4312-8e0d-b3b2b20864fa,21,25,2205,2625
2,28b0f80a-4e7a-4936-97b8-838150c78f70,co_ae8a374a-3893-4c21-857c-1bea3a469807,34,30,3060,2700
3,3d08ee7f-b5bd-4324-8524-f0f97ead5245,co_1f216996-4e82-46a6-9089-97c619ecf16c,10,120,1200,14400
4,4e838a3a-eb44-4378-bda1-c94de3357279,co_66c7c3ff-4038-4dcc-b225-89f7fc5e212a,33,55,6930,11550
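
The credit formula is: credits per prompt (the sum of the prices of the enabled models) × number of prompts × 30 runs per month. The sketch below works through it in plain Python. The model list is an assumption chosen so that the per-prompt cost is 4 credits, which is consistent with the first row above (23 prompts, limit 30); the notebook output does not show which models that organization actually uses.

```python
# Subset of MODEL_ID_PRICE_MAP, copied from the configuration cell
MODEL_PRICES = {"gpt-4o": 1, "sonar": 1, "claude-sonnet-4": 2}
RUNS_PER_MONTH = 30


def monthly_credits(model_ids: list, prompts: int) -> int:
    """Credits per month for a given number of prompts (illustrative helper)."""
    # Models missing from the price map contribute 0 credits
    credits_per_prompt = sum(MODEL_PRICES.get(mid, 0) for mid in model_ids)
    return int(credits_per_prompt * prompts * RUNS_PER_MONTH)


# Assumed model selection: 1 + 1 + 2 = 4 credits per prompt
models = ["gpt-4o", "sonar", "claude-sonnet-4"]

usage = monthly_credits(models, prompts=23)     # 4 * 23 * 30
capacity = monthly_credits(models, prompts=30)  # 4 * 30 * 30
```

Note that usage can exceed capacity when `prompts_count` is larger than `prompt_limit`, as in the third row above (34 prompts against a limit of 30).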


## 8. Subscription Data

In [199]:
sub_items_df.head()

Unnamed: 0,customer_id,subscription_id,product_id,unit_amount,quantity,interval,interval_count,currency,discounts
0,cus_SyqKFZAHL42jJk,sub_1SKeRUKojVEYZPlXVLB582oV,prod_S6Vq3DJcPoXe3i,8900,1,month,1,eur,[]
1,cus_TGun6Ud32rgWy3,sub_1SKOadKojVEYZPlXJ9tNNome,prod_S6Vr6mr8BJrWGc,19900,1,month,1,eur,[]
2,cus_Sw6qaFlQPg9s1W,sub_1SKOHDKojVEYZPlXhgKZiXoE,prod_S6Vr6mr8BJrWGc,19900,1,month,1,eur,[]
3,cus_TGvTiJ59etQ8OY,sub_1SKNcGKojVEYZPlXXD9PlTEg,prod_SRfFIFqKBnTrlh,12900,1,month,1,eur,[]
4,cus_TGvTiJ59etQ8OY,sub_1SKNcGKojVEYZPlXXD9PlTEg,prod_S6Vq3DJcPoXe3i,0,1,month,1,eur,[]


In [200]:
# Step 1: Create coupons DataFrame from coupons_raw
coupons_df = pd.DataFrame(coupons_raw)
print(f"✓ Loaded {len(coupons_df)} coupons")

# Step 2: Process line item discounts
def apply_item_discounts(row):
    """Apply long-term discounts to a subscription item."""
    unit_amount = row['unit_amount']
    quantity = row['quantity']
    
    if unit_amount is None or pd.isna(unit_amount):
        return pd.Series({'discounted_amount': 0, 'discount_count': 0})
    
    # Multiply unit amount by quantity to get total line item amount
    total_amount = unit_amount * quantity
    discounted_amount = total_amount
    discount_count = 0
    
    # Process each discount on this item
    for discount_id in row['discounts']:
        # Look up the coupon for this discount
        coupon = coupons_df[coupons_df['id'] == discount_id]
        if coupon.empty:
            continue
        
        coupon_data = coupon.iloc[0]
        
        # Only apply long-term discounts (forever or repeating >= 12 months)
        is_forever = coupon_data['duration'] == 'forever'
        is_long_term_repeating = (
            coupon_data['duration'] == 'repeating' and 
            coupon_data['duration_in_months'] is not None and
            coupon_data['duration_in_months'] >= 12
        )
        
        if not (is_forever or is_long_term_repeating):
            continue
        
        # Apply amount_off discount (applies to line item total, not per unit)
        if coupon_data['amount_off'] is not None and not pd.isna(coupon_data['amount_off']):
            discounted_amount -= coupon_data['amount_off']
            discount_count += 1
        # Apply percent_off discount
        elif coupon_data['percent_off'] is not None and not pd.isna(coupon_data['percent_off']):
            discounted_amount *= (100 - coupon_data['percent_off']) / 100
            discount_count += 1
    
    return pd.Series({
        'discounted_amount': max(0, discounted_amount),
        'discount_count': discount_count
    })

# Apply discounts to each item
sub_items_df[['discounted_unit_amount', 'item_discount_count']] = sub_items_df.apply(apply_item_discounts, axis=1)

print(f"✓ Calculated discounted amounts for {len(sub_items_df)} subscription items")

# Step 3: Aggregate items to subscription level
sub_aggregated = sub_items_df.groupby('subscription_id').agg({
    'discounted_unit_amount': 'sum',
    'item_discount_count': 'sum',
    'interval': lambda x: x.mode()[0] if len(x.mode()) > 0 else x.iloc[0],  # Most common interval
    'currency': 'first'
}).reset_index()

sub_aggregated.rename(columns={'discounted_unit_amount': 'base_amount'}, inplace=True)

print(f"✓ Aggregated {len(sub_aggregated)} subscriptions")

# Step 4: Apply subscription-level discounts
def apply_subscription_discounts(row, subs_df, coupons_df):
    """Apply long-term subscription-level discounts."""
    sub_id = row['subscription_id']
    amount = row['base_amount']
    
    # Get subscription discounts
    sub_data = subs_df[subs_df['id'] == sub_id]
    if sub_data.empty:
        return pd.Series({'final_amount': amount, 'sub_discount_count': 0})
    
    discounts_list = sub_data.iloc[0]['discounts']
    if not discounts_list:
        return pd.Series({'final_amount': amount, 'sub_discount_count': 0})
    
    discounted_amount = amount
    discount_count = 0
    
    for discount in discounts_list:
        # Check if this is a long-term discount
        is_forever = discount.get('duration') == 'forever'
        is_long_term_repeating = (
            discount.get('duration') == 'repeating' and 
            discount.get('duration_in_months') is not None and
            discount.get('duration_in_months') >= 12
        )
        
        if not (is_forever or is_long_term_repeating):
            continue
        
        # Apply amount_off discount
        if discount.get('amount_off') is not None:
            discounted_amount -= discount['amount_off']
            discount_count += 1
        # Apply percent_off discount
        elif discount.get('percent_off') is not None:
            discounted_amount *= (100 - discount['percent_off']) / 100
            discount_count += 1
    
    return pd.Series({
        'final_amount': max(0, discounted_amount),
        'sub_discount_count': discount_count
    })

sub_aggregated[['final_amount', 'sub_discount_count']] = sub_aggregated.apply(
    lambda row: apply_subscription_discounts(row, subs_df, coupons_df), 
    axis=1
)

# Calculate total discount count
sub_aggregated['discount_count'] = sub_aggregated['item_discount_count'] + sub_aggregated['sub_discount_count']

print("✓ Applied subscription-level discounts")

# Step 5: Calculate MRR/ARR based on interval
def calculate_mrr_arr(row):
    """Calculate MRR and ARR based on interval."""
    amount = row['final_amount']
    interval = row['interval']
    
    if interval == 'month':
        return pd.Series({'mrr': amount / 100, 'arr': None})  # Convert cents to dollars/euros
    elif interval == 'year':
        arr = amount / 100  # Convert cents to dollars/euros
        mrr = arr / 12
        return pd.Series({'mrr': mrr, 'arr': arr})
    else:
        return pd.Series({'mrr': None, 'arr': None})

sub_aggregated[['mrr', 'arr']] = sub_aggregated.apply(calculate_mrr_arr, axis=1)

print("✓ Calculated MRR/ARR")

# Step 6: Join with companies data
# First, merge subscription data with subs_df to get customer_id
subscription_data = sub_aggregated.merge(
    subs_df[['id', 'customer']],
    left_on='subscription_id',
    right_on='id',
    how='left'
).drop(columns=['id'])

subscription_data.rename(columns={'customer': 'customer_id'}, inplace=True)

# Now merge with companies to get company_id
subscription_data = subscription_data.merge(
    companies_df[['id', 'stripe_customer_id']],
    left_on='customer_id',
    right_on='stripe_customer_id',
    how='left'
)

subscription_data.rename(columns={'id': 'company_id'}, inplace=True)

# Step 7: Create final DataFrame with required columns
subscription_data_df = subscription_data[[
    'company_id',
    'customer_id', 
    'subscription_id',
    'mrr',
    'arr',
    'discount_count',
    'currency',
    'interval',
]].copy()

print(f"\n✓ Created subscription_data_df with {len(subscription_data_df)} subscriptions")
print(f"  Monthly subscriptions: {subscription_data_df['arr'].isna().sum()}")
print(f"  Yearly subscriptions: {subscription_data_df['arr'].notna().sum()}")
print(f"  Subscriptions with discounts: {(subscription_data_df['discount_count'] > 0).sum()}")
print("\nSample data:")

subscription_data_df.head(10)


✓ Loaded 48 coupons
✓ Calculated discounted amounts for 1577 subscription items
✓ Aggregated 1315 subscriptions
✓ Applied subscription-level discounts
✓ Calculated MRR/ARR

✓ Created subscription_data_df with 1317 subscriptions
  Monthly subscriptions: 1277
  Yearly subscriptions: 40
  Subscriptions with discounts: 18

Sample data:


Unnamed: 0,company_id,customer_id,subscription_id,mrr,arr,discount_count,currency,interval
0,co_2ebe593d-1e29-4669-8e0f-9e315cd56f5a,cus_Rg0jFvpFlajive,sub_1QmeiLKojVEYZPlXi5MWl3yK,178.0,,1.0,eur,month
1,co_66c7c3ff-4038-4dcc-b225-89f7fc5e212a,cus_RgT8o2F5OOUaSZ,sub_1Qn6CJKojVEYZPlXWJbZYtmw,138.0,,0.0,eur,month
2,co_3b72236b-8e31-40a4-a1e8-0e764d0a66a7,cus_RkVsLVkfHVfbgw,sub_1Qr0qTKojVEYZPlXHvzPi4Za,1300.0,,0.0,eur,month
3,co_7dd92ff3-4b2e-4f2d-820e-943aa303f0a5,cus_Rle9u38yh4qQwV,sub_1Qs6r7KojVEYZPlXmUSaEzTm,350.0,,0.0,eur,month
4,co_8a9b51da-b538-4227-a77d-1c5095f454d1,cus_RlnXKGF0oxvbL9,sub_1QsFwnKojVEYZPlXWipZ7TE6,180.0,,0.0,eur,month
5,co_75a298ac-a7ee-411a-a001-3eca23dcf1c8,cus_RlqqHruUNFA2Xa,sub_1QsJ8mKojVEYZPlXn7AcJGYQ,1560.0,,0.0,eur,month
6,co_af273fff-420d-4707-a44f-f06dac7589d5,cus_Ro9CwwoUwQ4J1I,sub_1QuWuQKojVEYZPlXr9wtEj5I,199.0,,0.0,eur,month
7,co_1fea122e-be87-47f8-b459-4bb426706d35,cus_RpdUTheiW3FXhN,sub_1QvyDQKojVEYZPlXiKerghsd,210.0,,0.0,eur,month
8,co_653ed519-88e1-4c7d-9866-93f16feb8f9a,cus_Rq1hmmxhI6OuWE,sub_1QwLeAKojVEYZPlXe0EwbFeN,330.0,,0.0,eur,month
9,co_a5b9f4f0-d872-4fed-a217-d41a3c7284a9,cus_Rr69XZNAWsWkRq,sub_1QxNxGKojVEYZPlXy9fhtN82,199.0,,0.0,eur,month
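
The discount and MRR/ARR rules used above can be sketched without pandas. Only long-term coupons count (`forever`, or `repeating` for at least 12 months), amounts are stored in cents, and a yearly amount is divided by 12 to obtain MRR. The coupons and the yearly amount below are hypothetical values chosen for illustration, and the helper names are not part of the notebook.

```python
from typing import Optional, Tuple


def is_long_term(duration: str, duration_in_months: Optional[int]) -> bool:
    """True for 'forever' coupons and 'repeating' coupons of 12 months or more."""
    if duration == "forever":
        return True
    return (
        duration == "repeating"
        and duration_in_months is not None
        and duration_in_months >= 12
    )


def apply_coupon(amount_cents: float, coupon: dict) -> float:
    """Apply one coupon to an amount in cents; short-term coupons are ignored."""
    if not is_long_term(coupon["duration"], coupon.get("duration_in_months")):
        return amount_cents
    if coupon.get("amount_off") is not None:
        amount_cents = amount_cents - coupon["amount_off"]
    elif coupon.get("percent_off") is not None:
        amount_cents = amount_cents * (100 - coupon["percent_off"]) / 100
    return max(0, amount_cents)


def to_mrr_arr(amount_cents: float, interval: str) -> Tuple[Optional[float], Optional[float]]:
    """Convert a billed amount in cents to (mrr, arr) in currency units."""
    amount = amount_cents / 100
    if interval == "month":
        return amount, None  # ARR is left empty for monthly subscriptions
    if interval == "year":
        return amount / 12, amount
    return None, None


# Hypothetical yearly subscription: 1990 per year, stored as cents
yearly_cents = 199 * 10 * 100
once_50 = {"duration": "once", "percent_off": 50}        # short-term, ignored
forever_20 = {"duration": "forever", "percent_off": 20}  # long-term, applied

discounted = apply_coupon(apply_coupon(yearly_cents, once_50), forever_20)
mrr, arr = to_mrr_arr(discounted, "year")
```

Here only the 20% coupon changes the amount, so the yearly value drops from 1990 to roughly 1592 and the MRR is one twelfth of that.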


## Purchased workspace prompts

In [201]:
items_with_product = sub_items_df.merge(
    products_df[["id", "name", "prompt_limit", "type"]],
    left_on="product_id",
    right_on="id",
    how="left",
)

# filter workspace items
sub_items_with_product = items_with_product[items_with_product["type"] == "WORKSPACE"]
print(f"Workspace items: {len(sub_items_with_product)}")
sub_items_with_product = sub_items_with_product[sub_items_with_product["unit_amount"] > 0]
print(f"Workspace items with amount > 0: {len(sub_items_with_product)}")

# group by customer_id and sum the prompt_limit and count workspace items
company_stats_stripe = (
    sub_items_with_product.groupby("customer_id")
    .agg({
        "prompt_limit": "sum",
        "product_id": "size"  # Count of workspace items with unit_amount > 0
    })
    .reset_index()
    .rename(columns={
        "prompt_limit": "purchased_capacity",
        "product_id": "orgs_purchased"
    })
)

Workspace items: 1467
Workspace items with amount > 0: 1413


## 9. Capacity Data

In [202]:
company_stats_fs = (
    orgs_df.groupby("company_id")
    .agg(
        orgs_count=("company_id", "size"),  # Count of organizations per company
        prompts_count=("prompts_count", "sum"),
        prompt_limit=("prompt_limit", "sum"),
        credits_usage=("credits_usage", "sum"),
        credits_capacity=("credits_capacity", "sum"),
        orgs_count_hf=("chat_interval_in_hours", lambda x: (x < 24).sum()),
    )
    .reset_index()
)

company_stats_fs.head()


Unnamed: 0,company_id,orgs_count,prompts_count,prompt_limit,credits_usage,credits_capacity,orgs_count_hf
0,co_004d676c-c61c-4888-bcf8-6ef606a156ed,1,6,25,540,2250,0
1,co_0066473d-9106-4e5f-b13c-7b756c207675,7,197,505,17730,45450,0
2,co_00a903f4-7115-4770-bde3-6e8eb9982243,1,25,25,2250,2250,0
3,co_00e9c907-6659-4829-9a93-558923266790,1,100,100,9000,9000,0
4,co_00f466ea-1612-4546-9292-ffc03d029c2e,1,11,25,990,2250,0


In [203]:
main = companies_df.merge(
    company_stats_stripe,
    left_on="stripe_customer_id",
    right_on="customer_id",
    how="left",
)

main = main.merge(
    company_stats_fs,
    left_on="id",
    right_on="company_id",
    how="left",
)

main = main.merge(
    subscription_data_df,
    left_on="id",
    right_on="company_id",
    how="left",
)

main.head()

Unnamed: 0,id,name,type,domain,stripe_customer_id,stripe_subscription_id,customer_id_x,purchased_capacity,orgs_purchased,company_id_x,orgs_count,prompts_count,prompt_limit,credits_usage,credits_capacity,orgs_count_hf,company_id_y,customer_id_y,subscription_id,mrr,arr,discount_count,currency,interval
0,co_0066473d-9106-4e5f-b13c-7b756c207675,Flying Cat,AGENCY,flyingcatmarketing.com,cus_T6T5U4MOGZ7ntZ,sub_1SDr10KojVEYZPlXy4X5spim,cus_T6T5U4MOGZ7ntZ,300.0,1.0,co_0066473d-9106-4e5f-b13c-7b756c207675,7.0,197.0,505.0,17730.0,45450.0,0.0,co_0066473d-9106-4e5f-b13c-7b756c207675,cus_T6T5U4MOGZ7ntZ,sub_1SDr10KojVEYZPlXy4X5spim,499.0,,0.0,eur,month
1,co_00e9c907-6659-4829-9a93-558923266790,Wickey,IN_HOUSE,wickey.de,cus_T40lDnXfwWLTJe,sub_1SD0bmKojVEYZPlXA2382RPf,cus_T40lDnXfwWLTJe,100.0,1.0,co_00e9c907-6659-4829-9a93-558923266790,1.0,100.0,100.0,9000.0,9000.0,0.0,co_00e9c907-6659-4829-9a93-558923266790,cus_T40lDnXfwWLTJe,sub_1SD0bmKojVEYZPlXA2382RPf,199.0,,0.0,eur,month
2,co_01766115-a8ce-40f5-8dc4-b391fcee3db0,Dot Dash,AGENCY,dotdash.io,cus_T2bSdUiWSeuKVX,sub_1SGhkrKojVEYZPlXgxccGh4K,cus_T2bSdUiWSeuKVX,25.0,1.0,co_01766115-a8ce-40f5-8dc4-b391fcee3db0,1.0,6.0,25.0,540.0,2250.0,0.0,co_01766115-a8ce-40f5-8dc4-b391fcee3db0,cus_T2bSdUiWSeuKVX,sub_1SGhkrKojVEYZPlXgxccGh4K,89.0,,0.0,usd,month
3,co_018bcf89-5317-4104-88df-9f6e77a52276,TrueClicks,IN_HOUSE,trueclicks.com,cus_SytMawkXSQzCwo,sub_1S2vbmKojVEYZPlX79pt2DcB,cus_SytMawkXSQzCwo,25.0,1.0,co_018bcf89-5317-4104-88df-9f6e77a52276,1.0,9.0,25.0,810.0,2250.0,0.0,co_018bcf89-5317-4104-88df-9f6e77a52276,cus_SytMawkXSQzCwo,sub_1S2vbmKojVEYZPlX79pt2DcB,89.0,,0.0,eur,month
4,co_01dbefdf-03bd-4788-90c3-8aeb11c359f7,CommsCo,AGENCY,thecommsco.com,cus_SlmRXxPfcvT5vJ,sub_1RqEu2KojVEYZPlX9mMIMKnT,cus_SlmRXxPfcvT5vJ,25.0,1.0,co_01dbefdf-03bd-4788-90c3-8aeb11c359f7,4.0,76.0,100.0,6840.0,9000.0,0.0,co_01dbefdf-03bd-4788-90c3-8aeb11c359f7,cus_SlmRXxPfcvT5vJ,sub_1RqEu2KojVEYZPlX9mMIMKnT,280.0,,0.0,eur,month
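
Left merges add rows whenever the join key is not unique on the right-hand side. The counts in this notebook hint at that: 1315 aggregated subscriptions become 1317 rows after the join with companies, and the next section reports 1286 rows in `main` against 1285 filtered companies. The output alone does not confirm the cause, so the following is only a sketch of how such duplicates could be detected, using toy data and pandas' `validate` argument.

```python
import pandas as pd

# Toy data: company co_a appears twice on the right-hand side
companies = pd.DataFrame({
    "id": ["co_a", "co_b"],
    "stripe_customer_id": ["cus_1", "cus_2"],
})
subs = pd.DataFrame({
    "company_id": ["co_a", "co_a", "co_b"],
    "mrr": [89.0, 199.0, 499.0],
})

# A plain left merge silently duplicates co_a
merged = companies.merge(subs, left_on="id", right_on="company_id", how="left")

# Locate the offending keys before merging
dupes = subs[subs["company_id"].duplicated(keep=False)]

# validate="one_to_one" makes pandas raise instead of duplicating rows
try:
    companies.merge(
        subs, left_on="id", right_on="company_id", how="left", validate="one_to_one"
    )
    merge_is_one_to_one = True
except pd.errors.MergeError:
    merge_is_one_to_one = False
```

Whether duplicated companies should be dropped, summed, or kept is a business decision; the check only makes the situation visible.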


## 10. Migration Analysis


In [204]:
# Step 2: Calculate baseline_credits_needed
# Use the higher of what they're using vs what they have configured
main['baseline_credits_needed'] = main[['credits_usage', 'credits_capacity']].max(axis=1)

print("✓ Calculated baseline_credits_needed")
print(f"  Average baseline credits: {main['baseline_credits_needed'].mean():.0f}")
main[['name', 'credits_usage', 'credits_capacity', 'baseline_credits_needed']].head()


✓ Calculated baseline_credits_needed
  Average baseline credits: 11021


Unnamed: 0,name,credits_usage,credits_capacity,baseline_credits_needed
0,Flying Cat,17730.0,45450.0,45450.0
1,Wickey,9000.0,9000.0,9000.0
2,Dot Dash,540.0,2250.0,2250.0
3,TrueClicks,810.0,2250.0,2250.0
4,CommsCo,6840.0,9000.0,9000.0


In [205]:
# Step 3: Filter migration_df (exclude 100% discounts and missing data)
# Calculate effective discount percentage
main['effective_discount_pct'] = 0
has_mrr = main['mrr'].notna()
has_arr = main['arr'].notna()

# For monthly subscriptions (this mask currently equals has_mrr and is not used in the filter below)
monthly_mask = has_mrr & main['mrr'].notna()
# What each company should pay for its baseline credits is calculated later; for now, just filter

# Filter out companies with issues
migration_df = main[
    # Must have subscription data
    (has_mrr | has_arr) &
    # Must have credit data
    main['baseline_credits_needed'].notna() &
    # Exclude 100% discount/free accounts (paying at most 1 EUR/USD per month, or 12 per year)
    ((main['mrr'].fillna(0) > 1) | (main['arr'].fillna(0) > 12))
].copy()

print("✓ Filtered migration_df")
print(f"  Total companies in main: {len(main)}")
print(f"  Companies for migration: {len(migration_df)}")
print(f"  Excluded (100% discount or missing data): {len(main) - len(migration_df)}")
print(f"  By type: {migration_df['type'].value_counts().to_dict()}")

migration_df.head()


✓ Filtered migration_df
  Total companies in main: 1286
  Companies for migration: 1234
  Excluded (100% discount or missing data): 52
  By type: {'IN_HOUSE': 709, 'AGENCY': 525}


Unnamed: 0,id,name,type,domain,stripe_customer_id,stripe_subscription_id,customer_id_x,purchased_capacity,orgs_purchased,company_id_x,orgs_count,prompts_count,prompt_limit,credits_usage,credits_capacity,orgs_count_hf,company_id_y,customer_id_y,subscription_id,mrr,arr,discount_count,currency,interval,baseline_credits_needed,effective_discount_pct
0,co_0066473d-9106-4e5f-b13c-7b756c207675,Flying Cat,AGENCY,flyingcatmarketing.com,cus_T6T5U4MOGZ7ntZ,sub_1SDr10KojVEYZPlXy4X5spim,cus_T6T5U4MOGZ7ntZ,300.0,1.0,co_0066473d-9106-4e5f-b13c-7b756c207675,7.0,197.0,505.0,17730.0,45450.0,0.0,co_0066473d-9106-4e5f-b13c-7b756c207675,cus_T6T5U4MOGZ7ntZ,sub_1SDr10KojVEYZPlXy4X5spim,499.0,,0.0,eur,month,45450.0,0
1,co_00e9c907-6659-4829-9a93-558923266790,Wickey,IN_HOUSE,wickey.de,cus_T40lDnXfwWLTJe,sub_1SD0bmKojVEYZPlXA2382RPf,cus_T40lDnXfwWLTJe,100.0,1.0,co_00e9c907-6659-4829-9a93-558923266790,1.0,100.0,100.0,9000.0,9000.0,0.0,co_00e9c907-6659-4829-9a93-558923266790,cus_T40lDnXfwWLTJe,sub_1SD0bmKojVEYZPlXA2382RPf,199.0,,0.0,eur,month,9000.0,0
2,co_01766115-a8ce-40f5-8dc4-b391fcee3db0,Dot Dash,AGENCY,dotdash.io,cus_T2bSdUiWSeuKVX,sub_1SGhkrKojVEYZPlXgxccGh4K,cus_T2bSdUiWSeuKVX,25.0,1.0,co_01766115-a8ce-40f5-8dc4-b391fcee3db0,1.0,6.0,25.0,540.0,2250.0,0.0,co_01766115-a8ce-40f5-8dc4-b391fcee3db0,cus_T2bSdUiWSeuKVX,sub_1SGhkrKojVEYZPlXgxccGh4K,89.0,,0.0,usd,month,2250.0,0
3,co_018bcf89-5317-4104-88df-9f6e77a52276,TrueClicks,IN_HOUSE,trueclicks.com,cus_SytMawkXSQzCwo,sub_1S2vbmKojVEYZPlX79pt2DcB,cus_SytMawkXSQzCwo,25.0,1.0,co_018bcf89-5317-4104-88df-9f6e77a52276,1.0,9.0,25.0,810.0,2250.0,0.0,co_018bcf89-5317-4104-88df-9f6e77a52276,cus_SytMawkXSQzCwo,sub_1S2vbmKojVEYZPlX79pt2DcB,89.0,,0.0,eur,month,2250.0,0
4,co_01dbefdf-03bd-4788-90c3-8aeb11c359f7,CommsCo,AGENCY,thecommsco.com,cus_SlmRXxPfcvT5vJ,sub_1RqEu2KojVEYZPlX9mMIMKnT,cus_SlmRXxPfcvT5vJ,25.0,1.0,co_01dbefdf-03bd-4788-90c3-8aeb11c359f7,4.0,76.0,100.0,6840.0,9000.0,0.0,co_01dbefdf-03bd-4788-90c3-8aeb11c359f7,cus_SlmRXxPfcvT5vJ,sub_1RqEu2KojVEYZPlX9mMIMKnT,280.0,,0.0,eur,month,9000.0,0


In [206]:
# Step 4: Plan selection helper functions
def select_plan_for_company(row):
    """
    Select the most expensive plan that fits within the company's current MRR/ARR.
    Falls back to the cheapest plan when no plan is affordable.
    Returns a Series with: new_plan, plan_credits, plan_price, credit_price
    """
    company_type = row["type"]
    interval = row["interval"]

    # Determine which plan set to use
    if company_type == "IN_HOUSE":
        plans = BRAND_PLANS
    elif company_type == "AGENCY":
        plans = AGENCY_PLANS
    else:  # PARTNER
        plans = BRAND_PLANS  # Default to brand plans

    plans = plans[row["currency"]]

    # Get current payment amount (MRR for monthly, ARR for yearly)
    if interval == "month":
        current_payment = row["mrr"]
    else:  # year
        current_payment = row["arr"]

    # Payment data is required: raise if it is missing or non-positive
    if pd.isna(current_payment) or current_payment <= 0:
        raise ValueError(f"Company {row['name']} has no payment data")

    # Find all plans that fit within budget
    affordable_plans = []
    for plan_name, plan_data in plans.items():
        if interval in plan_data:
            plan_info = plan_data[interval]
            plan_price = plan_info["price"]

            if plan_price <= current_payment:
                affordable_plans.append(
                    {
                        "name": plan_name,
                        "price": plan_price,
                        "credits": plan_info["credits"],
                        "price_per_credit": plan_info["price_per_credit"],
                    }
                )

    # If no affordable plans, use the cheapest one (will need discount)
    if not affordable_plans:
        cheapest_plan_name = min(
            plans.keys(),
            key=lambda x: plans[x][interval]["price"]
            if interval in plans[x]
            else float("inf"),
        )
        cheapest_plan = plans[cheapest_plan_name][interval]
        return pd.Series(
            {
                "new_plan": cheapest_plan_name,
                "plan_credits": cheapest_plan["credits"],
                "plan_price": cheapest_plan["price"],
                "credit_price": cheapest_plan["price_per_credit"],
            }
        )

    # Select the most expensive affordable plan
    selected_plan = max(affordable_plans, key=lambda x: x["price"])

    return pd.Series(
        {
            "new_plan": selected_plan["name"],
            "plan_credits": selected_plan["credits"],
            "plan_price": selected_plan["price"],
            "credit_price": selected_plan["price_per_credit"],
        }
    )


# Apply plan selection
migration_df[["new_plan", "plan_credits", "plan_price", "credit_price"]] = (
    migration_df.apply(select_plan_for_company, axis=1)
)

print("✓ Selected plans for companies")
print("\nPlan distribution:")
print(migration_df["new_plan"].value_counts())
migration_df[["name", "type", "interval", "mrr", "arr", "new_plan", "plan_price"]].head(
    10
)


✓ Selected plans for companies

Plan distribution:
new_plan
starter       473
intro         258
pro           212
growth        202
scale          65
enterprise     24
Name: count, dtype: int64


Unnamed: 0,name,type,interval,mrr,arr,new_plan,plan_price
0,Flying Cat,AGENCY,month,499.0,,scale,499
1,Wickey,IN_HOUSE,month,199.0,,pro,199
2,Dot Dash,AGENCY,month,89.0,,intro,89
3,TrueClicks,IN_HOUSE,month,89.0,,starter,89
4,CommsCo,AGENCY,month,280.0,,growth,199
5,Gear4music,IN_HOUSE,month,199.0,,pro,199
6,Betmode,IN_HOUSE,month,199.0,,pro,199
7,RivalMind,AGENCY,month,199.0,,growth,199
8,Harper James,IN_HOUSE,month,499.0,,enterprise,499
9,addmustard,AGENCY,month,199.0,,growth,199
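
The selection rule in this cell can be checked by hand against the rows shown above. The sketch below is a minimal stand-alone version. It assumes a hypothetical `TOY_AGENCY_MONTHLY` table whose three prices and credit amounts are copied from rows visible in this notebook's outputs (intro 89 / 2250, growth 199 / 12935, scale 499 / 37425); it is not the full `AGENCY_PLANS` dictionary, and `pick_plan` is an illustrative name.

```python
# Toy monthly agency plans: price and credits copied from rows shown in the outputs.
# Hypothetical stand-in for AGENCY_PLANS[currency][plan]["month"].
TOY_AGENCY_MONTHLY = {
    "intro": {"price": 89, "credits": 2250},
    "growth": {"price": 199, "credits": 12935},
    "scale": {"price": 499, "credits": 37425},
}


def pick_plan(current_payment, plans):
    """Return the priciest plan at or below current_payment, else the cheapest plan."""
    affordable = {name: p for name, p in plans.items() if p["price"] <= current_payment}
    if affordable:
        return max(affordable, key=lambda name: affordable[name]["price"])
    # Nothing fits: fall back to the cheapest plan (this is where a discount is needed)
    return min(plans, key=lambda name: plans[name]["price"])


print(pick_plan(280.0, TOY_AGENCY_MONTHLY))  # CommsCo's MRR
print(pick_plan(499.0, TOY_AGENCY_MONTHLY))  # Flying Cat's MRR
print(pick_plan(76.0, TOY_AGENCY_MONTHLY))   # below every price
```

With an MRR of 280 the rule lands on growth at 199, which leaves 81 of headroom between payment and plan price.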


In [None]:
# Step 5: Calculate purchased_credits and extra_credits_granted
def calculate_extra_credits(row):
    """
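    Split the gap between baseline need and plan credits into purchased credits
    (paid for out of payment above the plan price) and granted credits (given free).
    Also returns the granted amount with a 10% buffer.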
    """
    baseline_needed = row['baseline_credits_needed']
    plan_credits = row['plan_credits']
    plan_price = row['plan_price']
    credit_price = row['credit_price']
    
    # Get current payment
    if row['interval'] == 'month':
        current_payment = row['mrr']
    else:
        current_payment = row['arr']
    
    # Handle missing data
    if pd.isna(baseline_needed) or pd.isna(plan_credits) or pd.isna(current_payment):
        return pd.Series({
            'purchased_credits': 0,
            'extra_credits_granted': 0,
            'extra_credits_granted_10': 0
        })
    
    # Calculate credit gap
    credits_gap = baseline_needed - plan_credits
    
    # Case C: Plan has enough credits
    if credits_gap <= 0:
        return pd.Series({
            'purchased_credits': 0,
            'extra_credits_granted': 0,
            'extra_credits_granted_10': 0
        })
    
    # Case A: Paying enough for extra credits
    available_for_credits = current_payment - plan_price
    if available_for_credits > 0:
        # They can afford to buy credits
        affordable_credits = available_for_credits / credit_price if credit_price > 0 else 0
        purchased = min(credits_gap, affordable_credits)
        
        # If they still need more after purchasing, grant the rest
        remaining_gap = credits_gap - purchased
        granted = max(0, remaining_gap)
        granted_10 = granted * 1.10
        
        return pd.Series({
            'purchased_credits': purchased,
            'extra_credits_granted': granted,
            'extra_credits_granted_10': granted_10
        })
    
    # Case B: Need credits but not paying enough
    else:
        return pd.Series({
            'purchased_credits': 0,
            'extra_credits_granted': credits_gap,
            'extra_credits_granted_10': credits_gap * 1.10
        })

# Apply credit calculations
migration_df[['purchased_credits', 'extra_credits_granted', 'extra_credits_granted_10']] = \
    migration_df.apply(calculate_extra_credits, axis=1)

print("✓ Calculated extra credits")
print(f"  Companies purchasing credits: {(migration_df['purchased_credits'] > 0).sum()}")
print(f"  Companies needing granted credits: {(migration_df['extra_credits_granted'] > 0).sum()}")

migration_df.drop(
    columns=["id", "customer_id_x", "company_id_x", "company_id_y", "customer_id_y"]
).to_csv("migration_df.csv", index=False)

migration_df[['name', 'baseline_credits_needed', 'plan_credits', 'purchased_credits', 'extra_credits_granted']].head(10)


✓ Calculated extra credits
  Companies purchasing credits: 172
  Companies needing granted credits: 101


Unnamed: 0,name,baseline_credits_needed,plan_credits,purchased_credits,extra_credits_granted
0,Flying Cat,45450.0,37425,0.0,8025.0
1,Wickey,9000.0,14925,0.0,0.0
2,Dot Dash,2250.0,2250,0.0,0.0
3,TrueClicks,2250.0,3560,0.0,0.0
4,CommsCo,9000.0,12935,0.0,0.0
5,Gear4music,9000.0,14925,0.0,0.0
6,Betmode,9000.0,14925,0.0,0.0
7,RivalMind,9450.0,12935,0.0,0.0
8,Harper James,27000.0,49900,0.0,0.0
9,addmustard,2250.0,12935,0.0,0.0
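
The three cases in `calculate_extra_credits` can be traced on single rows. The sketch below restates the branching as a plain function, `split_credit_gap` (a hypothetical name), and runs it on the Flying Cat and Wickey figures from the table above. The per-credit price is assumed to be plan price divided by plan credits, and the third call uses made-up numbers purely to exercise Case A.

```python
def split_credit_gap(baseline_needed, plan_credits, current_payment, plan_price, credit_price):
    """Return (purchased, granted) credits needed to cover the baseline."""
    gap = baseline_needed - plan_credits
    if gap <= 0:
        return 0.0, 0.0  # Case C: the plan already covers the baseline
    headroom = current_payment - plan_price
    if headroom <= 0 or credit_price <= 0:
        return 0.0, float(gap)  # Case B: nothing left to pay with, grant the whole gap
    purchased = min(gap, headroom / credit_price)  # Case A: buy what the headroom allows
    return float(purchased), float(gap - purchased)


# Flying Cat: baseline 45450, scale plan with 37425 credits, paying exactly the plan price
print(split_credit_gap(45450, 37425, 499, 499, 499 / 37425))

# Wickey: baseline 9000 is below the 14925 credits of the pro plan
print(split_credit_gap(9000, 14925, 199, 199, 199 / 14925))

# Hypothetical Case A row: 81 of headroom on the growth plan, baseline of 20000
print(split_credit_gap(20000, 12935, 280, 199, 199 / 12935))
```

The first two calls reproduce the table rows: Flying Cat is granted the full gap of 8025 credits, and Wickey needs nothing extra.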


In [161]:
# Step 6: Calculate discounts for underpaying companies
def calculate_discount(row):
    """
    Calculate discount needed for companies paying less than the cheapest plan.
    """
    company_type = row['type']
    interval = row['interval']
    
    # Determine which plan set to use
    if company_type == 'IN_HOUSE':
        plans = BRAND_PLANS
        cheapest_plan_name = 'starter'
    elif company_type == 'AGENCY':
        plans = AGENCY_PLANS
        cheapest_plan_name = 'intro'
    else:  # PARTNER
        plans = BRAND_PLANS
        cheapest_plan_name = 'starter'

    plans = plans[row["currency"]]
    
    # Get current payment
    if interval == 'month':
        current_payment = row['mrr']
    else:
        current_payment = row['arr']
    
    # Handle missing data
    if pd.isna(current_payment):
        return pd.Series({
            'discount_amount': 0,
            'discount_pct': 0
        })
    
    # Get cheapest plan price
    if interval in plans[cheapest_plan_name]:
        cheapest_price = plans[cheapest_plan_name][interval]['price']
    else:
        return pd.Series({
            'discount_amount': 0,
            'discount_pct': 0
        })
    
    # Calculate discount if underpaying
    if current_payment < cheapest_price:
        discount_amount = cheapest_price - current_payment
        discount_pct = (discount_amount / cheapest_price) * 100
        return pd.Series({
            'discount_amount': discount_amount,
            'discount_pct': discount_pct
        })
    else:
        return pd.Series({
            'discount_amount': 0,
            'discount_pct': 0
        })

# Apply discount calculations
migration_df[['discount_amount', 'discount_pct']] = \
    migration_df.apply(calculate_discount, axis=1)

print("✓ Calculated discounts")
print(f"  Companies needing discount: {(migration_df['discount_pct'] > 0).sum()}")
print(f"  Average discount %: {migration_df[migration_df['discount_pct'] > 0]['discount_pct'].mean():.1f}%")

migration_df[migration_df['discount_pct'] > 0][['name', 'mrr', 'arr', 'plan_price', 'discount_amount', 'discount_pct']].head(10)


✓ Calculated discounts
  Companies needing discount: 1
  Average discount %: 14.6%


Unnamed: 0,name,mrr,arr,plan_price,discount_amount,discount_pct
600,EmberTribe,76.0,,89,13.0,14.606742
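
The single discount row can be verified with the same arithmetic the cell uses. This is a minimal sketch with a hypothetical `discount_for` helper, fed the EmberTribe payment (76.0) and the cheapest plan price (89) shown above.

```python
def discount_for(current_payment, cheapest_price):
    """Return (discount_amount, discount_pct) for a payment below the cheapest plan price."""
    if current_payment >= cheapest_price:
        return 0.0, 0.0
    amount = cheapest_price - current_payment
    return amount, amount / cheapest_price * 100


amount, pct = discount_for(76.0, 89)
print(amount, round(pct, 6))  # 13.0 14.606742
```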


In [209]:
# Step 7: Calculate total_available_credits
migration_df['total_available_credits'] = (
    migration_df['plan_credits'] + 
    migration_df['purchased_credits'] + 
    migration_df['extra_credits_granted']
)

print("✓ Calculated total_available_credits")
print(f"  Average total credits: {migration_df['total_available_credits'].mean():.0f}")

migration_df[['name', 'plan_credits', 'purchased_credits', 'extra_credits_granted', 'total_available_credits']].head()


✓ Calculated total_available_credits
  Average total credits: 16129


Unnamed: 0,name,plan_credits,purchased_credits,extra_credits_granted,total_available_credits
0,Flying Cat,37425,0.0,8025.0,45450.0
1,Wickey,14925,0.0,0.0,14925.0
2,Dot Dash,2250,0.0,0.0,2250.0
3,TrueClicks,3560,0.0,0.0,3560.0
4,CommsCo,12935,0.0,0.0,12935.0


In [163]:
# Step 4: Plan selection helper functions
def select_plan_for_company(row):
    """
    Select the most expensive plan that fits within the company's current MRR/ARR.
    Returns: plan_name, plan_credits, plan_price, credit_price
    """
    company_type = row['type']
    interval = row['interval']

    
    # Determine which plan set to use
    if company_type == 'IN_HOUSE':
        plans = BRAND_PLANS
    elif company_type == 'AGENCY':
        plans = AGENCY_PLANS
    else:  # PARTNER
        plans = BRAND_PLANS  # Default to brand plans
    
    # Get current payment amount (MRR for monthly, ARR for yearly)
    if interval == 'month':
        current_payment = row['mrr']
    else:  # year
        current_payment = row['arr']
    
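    # DEBUG stub: this early return short-circuits the function, so every company
    # gets new_plan="none" and zeroed plan fields; the code below is unreachable.
    return pd.Series({ 'new_plan': "none", 'plan_credits': 0, 'plan_price': 0, 'credit_price': 0 })
    # If no payment data, return all-None plan fields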
    if pd.isna(current_payment) or current_payment <= 0:
        return pd.Series({
            'new_plan': None,
            'plan_credits': None,
            'plan_price': None,
            'credit_price': None
        })
    
    # Find all plans that fit within budget
    affordable_plans = []
    for plan_name, plan_data in plans.items():
        if interval in plan_data:
            plan_info = plan_data[interval]
            plan_price = plan_info['price']
            
            if plan_price <= current_payment:
                affordable_plans.append({
                    'name': plan_name,
                    'price': plan_price,
                    'credits': plan_info['credits'],
                    'price_per_credit': plan_info['price_per_credit']
                })
    
    # If no affordable plans, use the cheapest one (will need discount)
    if not affordable_plans:
        cheapest_plan_name = min(plans.keys(), 
                                  key=lambda x: plans[x][interval]['price'] if interval in plans[x] else float('inf'))
        cheapest_plan = plans[cheapest_plan_name][interval]
        return pd.Series({
            'new_plan': cheapest_plan_name,
            'plan_credits': cheapest_plan['credits'],
            'plan_price': cheapest_plan['price'],
            'credit_price': cheapest_plan['price_per_credit']
        })
    
    # Select the most expensive affordable plan
    selected_plan = max(affordable_plans, key=lambda x: x['price'])
    
    return pd.Series({
        'new_plan': selected_plan['name'],
        'plan_credits': selected_plan['credits'],
        'plan_price': selected_plan['price'],
        'credit_price': selected_plan['price_per_credit']
    })

# Apply plan selection
migration_df[['new_plan', 'plan_credits', 'plan_price', 'credit_price']] = \
    migration_df.apply(select_plan_for_company, axis=1)

print("✓ Selected plans for companies")
print("\nPlan distribution:")
print(migration_df['new_plan'].value_counts())
migration_df[['name', 'type', 'interval', 'mrr', 'arr', 'new_plan', 'plan_price']].head(10)


✓ Selected plans for companies

Plan distribution:
new_plan
none    1234
Name: count, dtype: int64


Unnamed: 0,name,type,interval,mrr,arr,new_plan,plan_price
0,Flying Cat,AGENCY,month,499.0,,none,0
1,Wickey,IN_HOUSE,month,199.0,,none,0
2,Dot Dash,AGENCY,month,89.0,,none,0
3,TrueClicks,IN_HOUSE,month,89.0,,none,0
4,CommsCo,AGENCY,month,280.0,,none,0
5,Gear4music,IN_HOUSE,month,199.0,,none,0
6,Betmode,IN_HOUSE,month,199.0,,none,0
7,RivalMind,AGENCY,month,199.0,,none,0
8,Harper James,IN_HOUSE,month,499.0,,none,0
9,addmustard,AGENCY,month,199.0,,none,0


In [211]:
# Step 8: Calculate unused_credits
# Total available minus actual usage
migration_df['unused_credits'] = migration_df['total_available_credits'] - migration_df['credits_usage']

# Ensure non-negative
migration_df['unused_credits'] = migration_df['unused_credits'].clip(lower=0)

print("✓ Calculated unused_credits")
print(f"  Average unused credits: {migration_df['unused_credits'].mean():.0f}")
print(f"  Companies with unused credits: {(migration_df['unused_credits'] > 0).sum()}")

migration_df[['name', 'total_available_credits', 'credits_usage', 'unused_credits']].head(10)


✓ Calculated unused_credits
  Average unused credits: 9019
  Companies with unused credits: 1147


Unnamed: 0,name,total_available_credits,credits_usage,unused_credits
0,Flying Cat,45450.0,17730.0,27720.0
1,Wickey,14925.0,9000.0,5925.0
2,Dot Dash,2250.0,540.0,1710.0
3,TrueClicks,3560.0,810.0,2750.0
4,CommsCo,12935.0,6840.0,6095.0
5,Gear4music,14925.0,9000.0,5925.0
6,Betmode,14925.0,2790.0,12135.0
7,RivalMind,12935.0,9450.0,3485.0
8,Harper James,49900.0,17010.0,32890.0
9,addmustard,12935.0,2160.0,10775.0


In [165]:
# Step 9: Calculate unused_purchased_credits
# How many of the PAID extra credits they're not using
# Formula: max(0, purchased_credits - (credits_usage - plan_credits))
migration_df['unused_purchased_credits'] = (
    migration_df['purchased_credits'] - 
    (migration_df['credits_usage'] - migration_df['plan_credits'])
).clip(lower=0)

print("✓ Calculated unused_purchased_credits")
print(f"  Companies with unused purchased credits: {(migration_df['unused_purchased_credits'] > 0).sum()}")
print(f"  Average unused purchased credits: {migration_df[migration_df['unused_purchased_credits'] > 0]['unused_purchased_credits'].mean():.0f}")

migration_df[migration_df['unused_purchased_credits'] > 0][['name', 'purchased_credits', 'credits_usage', 'plan_credits', 'unused_purchased_credits']].head(10)


✓ Calculated unused_purchased_credits
  Companies with unused purchased credits: 15
  Average unused purchased credits: 1146


Unnamed: 0,name,purchased_credits,credits_usage,plan_credits,unused_purchased_credits
138,Check24,2740.0,2340.0,0,400.0
156,Brandness B.V.,2250.0,720.0,0,1530.0
212,Jung von Matt IMPACT,9565.0,8100.0,0,1465.0
326,Floowi,2440.0,1350.0,0,1090.0
331,BIT Capital,940.0,630.0,0,310.0
343,Lattice Publishing,13066.3,12000.0,0,1066.3
464,AKQA,5065.0,2790.0,0,2275.0
520,Forward Progress,2250.0,2160.0,0,90.0
599,Propellic,48075.0,47520.0,0,555.0
715,adMates,2250.0,1620.0,0,630.0
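
The formula in this cell subtracts `credits_usage - plan_credits` even when usage is below the plan allowance, in which case the result can exceed `purchased_credits`. The sketch below compares it with a variant that clamps the overage at zero first. The clamped variant is an assumption about the intended meaning (paid extra credits that go unused), not the notebook's formula, and the second toy row is hypothetical; only the Check24 figures come from the table above.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "name": ["Check24 (as shown)", "hypothetical light user"],
        "purchased_credits": [2740.0, 1000.0],
        "credits_usage": [2340.0, 500.0],
        "plan_credits": [0, 2250],
    }
)

# Notebook formula: purchased - (usage - plan), floored at 0
as_written = (
    toy["purchased_credits"] - (toy["credits_usage"] - toy["plan_credits"])
).clip(lower=0)

# Clamped variant: only usage beyond the plan allowance consumes purchased credits
overage = (toy["credits_usage"] - toy["plan_credits"]).clip(lower=0)
clamped = (toy["purchased_credits"] - overage).clip(lower=0)

print(pd.DataFrame({"name": toy["name"], "as_written": as_written, "clamped": clamped}))
```

Both versions agree on the Check24 row (400). On the hypothetical row the notebook formula reports 2750 unused purchased credits although only 1000 were purchased, while the clamped variant caps the figure at 1000.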


In [166]:
# Step 10: Calculate revenue at risk, annualized (monthly amounts x 12)
# Revenue we'd lose if they drop unused purchased credits; the column keeps the name mrr_at_risk
migration_df['mrr_at_risk'] = (
    migration_df['unused_purchased_credits'] * 
    migration_df['credit_price'] * 
    migration_df['interval'].apply(lambda x: 12 if x == 'month' else 1)
)

print("✓ Calculated mrr_at_risk")
print(f"  Total MRR at risk (annual): ${migration_df['mrr_at_risk'].sum():,.2f}")
print(f"  Companies with MRR at risk: {(migration_df['mrr_at_risk'] > 0).sum()}")
print(f"  Average MRR at risk: ${migration_df[migration_df['mrr_at_risk'] > 0]['mrr_at_risk'].mean():,.2f}")

migration_df[migration_df['mrr_at_risk'] > 0].sort_values('mrr_at_risk', ascending=False)[['name', 'interval', 'unused_purchased_credits', 'credit_price', 'mrr_at_risk']].head(10)


✓ Calculated mrr_at_risk
  Total MRR at risk (annual): $0.00
  Companies with MRR at risk: 0
  Average MRR at risk: $nan


Unnamed: 0,name,interval,unused_purchased_credits,credit_price,mrr_at_risk
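
A total of exactly 0 is consistent with `credit_price` being 0 on every row, which is the value written by the `In [163]` version of `select_plan_for_company` (the `plan_credits` column is also 0 in the Step 9 output). A small guard run before this step could catch that state. The sketch below uses a hypothetical `assert_plans_selected` helper and toy data; it is one possible check, not part of the notebook.

```python
import pandas as pd


def assert_plans_selected(df):
    """Raise if plan columns still hold placeholder values (hypothetical helper)."""
    bad = df[(df["new_plan"] == "none") | (df["credit_price"] <= 0)]
    if not bad.empty:
        raise ValueError(f"{len(bad)} rows have no real plan; rerun plan selection first")


# Toy frames: one mimicking the stubbed state, one with plausible per-credit prices
stubbed = pd.DataFrame({"new_plan": ["none", "none"], "credit_price": [0, 0]})
selected = pd.DataFrame(
    {"new_plan": ["pro", "intro"], "credit_price": [199 / 14925, 89 / 2250]}
)

assert_plans_selected(selected)  # passes silently
try:
    assert_plans_selected(stubbed)
except ValueError as exc:
    print(exc)
```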


In [167]:
# Summary Statistics & Validation
print("=" * 80)
print("MIGRATION ANALYSIS SUMMARY")
print("=" * 80)

print("\n📊 OVERALL STATISTICS:")
print(f"  Total companies analyzed: {len(migration_df)}")
print(f"  Total current MRR: ${migration_df['mrr'].fillna(0).sum():,.2f}")
# Annualized from MRR (mrr * 12); the 'arr' column is not used here
print(f"  Total current ARR: ${(migration_df['mrr'].fillna(0).sum() * 12):,.2f}")

print("\n📋 PLAN DISTRIBUTION:")
plan_counts = migration_df['new_plan'].value_counts()
for plan, count in plan_counts.items():
    pct = (count / len(migration_df)) * 100
    print(f"  {plan}: {count} ({pct:.1f}%)")

print("\n💰 CREDIT ECONOMICS:")
print(f"  Companies purchasing extra credits: {(migration_df['purchased_credits'] > 0).sum()}")
print(f"  Total purchased credits: {migration_df['purchased_credits'].sum():,.0f}")
print(f"  Companies receiving granted credits: {(migration_df['extra_credits_granted'] > 0).sum()}")
print(f"  Total granted credits: {migration_df['extra_credits_granted'].sum():,.0f}")
print(f"  Total granted credits (with 10% buffer): {migration_df['extra_credits_granted_10'].sum():,.0f}")

print("\n🎁 DISCOUNTS:")
print(f"  Companies needing discounts: {(migration_df['discount_pct'] > 0).sum()}")
if (migration_df['discount_pct'] > 0).sum() > 0:
    print(f"  Average discount: {migration_df[migration_df['discount_pct'] > 0]['discount_pct'].mean():.1f}%")
    print(f"  Max discount: {migration_df['discount_pct'].max():.1f}%")

print("\n⚠️  RISK ANALYSIS:")
print(f"  Total unused credits: {migration_df['unused_credits'].sum():,.0f}")
print(f"  Total unused PURCHASED credits: {migration_df['unused_purchased_credits'].sum():,.0f}")
print(f"  Total MRR at risk (annual): ${migration_df['mrr_at_risk'].sum():,.2f}")
print(f"  Companies with at-risk revenue: {(migration_df['mrr_at_risk'] > 0).sum()}")

print("\n✅ VALIDATION:")
# Check that baseline needs are met
needs_met = migration_df['total_available_credits'] >= migration_df['baseline_credits_needed']
print(f"  Companies with sufficient credits: {needs_met.sum()} / {len(migration_df)}")
if not needs_met.all():
    print(f"  ⚠️  WARNING: {(~needs_met).sum()} companies have insufficient credits!")

print("\n" + "=" * 80)


MIGRATION ANALYSIS SUMMARY

📊 OVERALL STATISTICS:
  Total companies analyzed: 1234
  Total current MRR: $253,058.91
  Total current ARR: $3,036,706.94

📋 PLAN DISTRIBUTION:
  none: 1234 (100.0%)

💰 CREDIT ECONOMICS:
  Companies purchasing extra credits: 172
  Total purchased credits: 2,150,260
  Companies receiving granted credits: 101
  Total granted credits: 2,474,940
  Total granted credits (with 10% buffer): 2,722,434

🎁 DISCOUNTS:
  Companies needing discounts: 1
  Average discount: 14.6%
  Max discount: 14.6%

⚠️  RISK ANALYSIS:
  Total unused credits: 13,852,474
  Total unused PURCHASED credits: 17,190
  Total MRR at risk (annual): $0.00
  Companies with at-risk revenue: 0

✅ VALIDATION:
  Companies with sufficient credits: 1234 / 1234



In [168]:
# Display final migration dataframe with key columns
key_columns = [
    # Identifiers
    'name', 'type', 'customer_id',
    # Current state
    'currency', 'interval', 'mrr', 'arr', 'discount_count',
    # Credit analysis  
    'baseline_credits_needed', 'credits_usage', 'credits_capacity',
    'orgs_count', 'orgs_count_hf', 'orgs_purchased',
    # New plan
    'new_plan', 'plan_credits', 'plan_price',
    # Extra credits
    'purchased_credits', 'extra_credits_granted', 'extra_credits_granted_10',
    # Discounts
    'discount_pct', 'discount_amount',
    # Totals
    'total_available_credits',
    # Risk
    'unused_credits', 'unused_purchased_credits', 'mrr_at_risk'
]

# Filter to columns that exist
existing_columns = [col for col in key_columns if col in migration_df.columns]

print(f"Migration DataFrame with {len(migration_df)} companies and {len(existing_columns)} columns")
print("\nTop 20 companies by current MRR:")
migration_df[existing_columns].sort_values('mrr', ascending=False, na_position='last').head(20)


Migration DataFrame with 1234 companies and 25 columns

Top 20 companies by current MRR:


Unnamed: 0,name,type,currency,interval,mrr,arr,discount_count,baseline_credits_needed,credits_usage,credits_capacity,orgs_count,orgs_count_hf,orgs_purchased,new_plan,plan_credits,plan_price,purchased_credits,extra_credits_granted,extra_credits_granted_10,discount_pct,discount_amount,total_available_credits,unused_credits,unused_purchased_credits,mrr_at_risk
235,UNESCO,IN_HOUSE,usd,month,4970.0,,0.0,45000.0,45000.0,45000.0,1.0,0.0,1.0,none,0,0,0.0,0.0,0.0,0.0,0.0,49900.0,4900.0,0.0,0.0
425,Growth Plays,AGENCY,eur,month,2466.0,,0.0,237000.0,182670.0,237000.0,17.0,2.0,3.0,none,0,0,147525.0,52050.0,57255.0,0.0,0.0,294255.0,111585.0,0.0,0.0
82,Advice Interactive,AGENCY,usd,month,2260.0,,0.0,184500.0,178290.0,184500.0,3.0,0.0,2.0,none,0,0,132075.0,15000.0,16500.0,0.0,0.0,201000.0,22710.0,0.0,0.0
599,Propellic,AGENCY,eur,month,2150.0,,0.0,85500.0,47520.0,85500.0,31.0,0.0,2.0,none,0,0,48075.0,0.0,0.0,0.0,0.0,85500.0,37980.0,555.0,0.0
1022,Future PLC (Dell),AGENCY,eur,month,2101.5,,0.0,171000.0,158490.0,171000.0,6.0,0.0,,none,0,0,120187.5,13387.5,14726.25,0.0,0.0,185726.25,27236.25,0.0,0.0
1144,primelis,AGENCY,eur,month,2000.0,,0.0,1143630.0,336450.0,1143630.0,79.0,0.0,,none,0,0,112575.0,993630.0,1092993.0,0.0,0.0,2236623.0,1900173.0,0.0,0.0
1010,Seer Interactive,AGENCY,eur,month,1965.6,,0.0,484500.0,249270.0,484500.0,25.0,0.0,,none,0,0,109995.0,337080.0,370788.0,0.0,0.0,855288.0,606018.0,0.0,0.0
977,TIpi Group,AGENCY,eur,month,1900.0,,0.0,146700.0,141570.0,146700.0,18.0,2.0,3.0,none,0,0,105075.0,4200.0,4620.0,0.0,0.0,151320.0,9750.0,0.0,0.0
1101,Limitless Agency,AGENCY,eur,month,1809.25,,0.0,80910.0,50040.0,80910.0,29.0,0.0,2.0,none,0,0,43485.0,0.0,0.0,0.0,0.0,80910.0,30870.0,0.0,0.0
568,Suchhelden,AGENCY,eur,month,1560.0,,0.0,72000.0,38070.0,72000.0,26.0,4.0,1.0,none,0,0,34575.0,0.0,0.0,0.0,0.0,72000.0,33930.0,0.0,0.0


In [169]:
# Interesting segments for deeper analysis

print("🔍 HIGH-RISK COMPANIES (Top 10 by MRR at risk):")
print("=" * 80)
high_risk = migration_df[migration_df['mrr_at_risk'] > 0].sort_values('mrr_at_risk', ascending=False).head(10)
print(high_risk[['name', 'mrr', 'unused_purchased_credits', 'mrr_at_risk']])

print("\n\n💡 COMPANIES NEEDING LARGE DISCOUNTS (Top 10):")
print("=" * 80)
big_discounts = migration_df[migration_df['discount_pct'] > 0].sort_values('discount_pct', ascending=False).head(10)
if len(big_discounts) > 0:
    print(big_discounts[['name', 'mrr', 'arr', 'plan_price', 'discount_pct', 'discount_amount']])
else:
    print("No companies need discounts!")

print("\n\n🎁 COMPANIES RECEIVING FREE CREDITS (Top 10):")
print("=" * 80)
free_credits = migration_df[migration_df['extra_credits_granted'] > 0].sort_values('extra_credits_granted', ascending=False).head(10)
if len(free_credits) > 0:
    print(free_credits[['name', 'baseline_credits_needed', 'plan_credits', 'extra_credits_granted', 'mrr']])
else:
    print("No companies need free granted credits!")

print("\n\n💰 TOP REVENUE COMPANIES (Top 10 by MRR):")
print("=" * 80)
top_revenue = migration_df.sort_values('mrr', ascending=False, na_position='last').head(10)
print(top_revenue[['name', 'type', 'mrr', 'arr', 'new_plan', 'orgs_count', 'baseline_credits_needed']])


🔍 HIGH-RISK COMPANIES (Top 10 by MRR at risk):
Empty DataFrame
Columns: [name, mrr, unused_purchased_credits, mrr_at_risk]
Index: []


💡 COMPANIES NEEDING LARGE DISCOUNTS (Top 10):
           name   mrr  arr  plan_price  discount_pct  discount_amount
600  EmberTribe  76.0  NaN           0     14.606742             13.0


🎁 COMPANIES RECEIVING FREE CREDITS (Top 10):
                   name  baseline_credits_needed  plan_credits  \
1144           primelis                1143630.0             0   
1010   Seer Interactive                 484500.0             0   
869             Globant                 378000.0             0   
483               Glide                  99300.0             0   
905        Nerdoptimize                 180000.0             0   
425        Growth Plays                 237000.0             0   
797   We Communications                  62400.0             0   
288     Aikido Security                  60000.0             0   
258            Peak Ace               

## Migration Analysis Complete

The migration analysis has been completed with the following components:

### Methodology
1. **Plan Selection**: Selected the most expensive plan that fits within each company's current MRR/ARR budget
2. **Credit Matching**: Added purchased credits (if they're paying enough) or granted free credits (if they're not) to meet their baseline needs
3. **Discount Calculation**: Calculated required discounts for companies paying less than the cheapest plan
4. **Risk Analysis**: Identified revenue at risk from companies with unused purchased credits

### Key Outputs
- **`migration_df`**: Complete migration plan for each company with:
  - Selected plan and pricing
  - Purchased vs granted credits
  - Discount requirements
  - Risk metrics (unused credits, MRR at risk)

### Next Steps
You can export this data or perform additional analysis:
```python
# Example: Export to CSV
# migration_df.to_csv('../data/migration_plan.csv', index=False)

# Example: Filter specific segments
# high_risk = migration_df[migration_df['mrr_at_risk'] > 100]
# needs_discount = migration_df[migration_df['discount_pct'] > 0]
```
