# Portfolio General Info

## PitchBook Data Cleaning & Enrichment Logic

To enable **fund-level analysis** for VC fact sheets and simulation, this section transforms raw PitchBook exports into a clean, enriched dataset. The process ensures consistency, fills key metadata, and adds internal codes for integration with other systems.

| Step | Description                                                                                   |
|------|-----------------------------------------------------------------------------------------------|
| 1    | Upload the Excel file (PitchBook export) and load it, skipping the 6 leading metadata rows    |
| 2    | Rename the original column headers to standardized internal names                             |
| 3    | Drop rows with a missing `FUND_NAME`                                                          |
| 4    | Parse `CLOSE_DATE` into YYYY-MM-DD format for consistency                                     |
| 5    | Create a unique `PORTFOLIOCODE` as `FNDxxxx` (e.g., FND0001) for internal tracking            |
| 6    | Infer `COUNTRY` from keywords in `FUND_LOCATION` (e.g., “Palo Alto” → “United States”)        |
| 7    | Use the `restcountries` API to look up the matching `BASECURRENCYCODE` per country            |
| 8    | Simplify `STRATEGY` into Early Stage, Later Stage, or General                                 |
| 9    | Build `PRODUCTCODE` from the first four characters of the firm name (uppercased, spaces removed) and a strategy abbreviation |
| 10   | Add the constant `PORTFOLIOCATEGORY` = `"Fund"`                                               |
| 11   | Select the final columns for export (performance metrics such as IRR/TVPI/DPI are excluded)   |
| 12   | Build the final `portfolio_general_info_df`                                                   |
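
Keyword-based country inference is sensitive to how short tokens such as `ca` or `ny` are matched: a plain substring check would also find `ca` inside "canada". The sketch below shows one way to match whole words only. The name `infer_country_strict` and the reduced keyword table are illustrative assumptions, not part of the notebook.

```python
import re

# Hypothetical, reduced keyword table for illustration only.
COUNTRY_KEYWORDS = {
    "United States": ["ca", "ny", "palo alto", "new york"],
    "Canada": ["toronto", "montreal"],
    "United Kingdom": ["london"],
}

def infer_country_strict(location):
    """Return the first country whose keyword appears as a whole word."""
    if not isinstance(location, str):
        return "Unknown"
    text = location.lower()
    for country, keywords in COUNTRY_KEYWORDS.items():
        for kw in keywords:
            # \b anchors keep "ca" from matching inside "canada"
            if re.search(rf"\b{re.escape(kw)}\b", text):
                return country
    return "Unknown"

print(infer_country_strict("Palo Alto, CA"))    # United States
print(infer_country_strict("Toronto, Canada"))  # Canada
print(infer_country_strict(None))               # Unknown
```

With whole-word matching, a location is resolved only if one of its tokens is listed explicitly, so state codes that are not in the table (for example `MA`) fall through to "Unknown" instead of matching by accident.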



In [None]:
# !pip install pandas openpyxl requests
import pandas as pd
import requests
from datetime import datetime

# 1. Upload PitchBook Excel file
from google.colab import files
uploaded = files.upload()
file_path = list(uploaded.keys())[0]
df = pd.read_excel(file_path, skiprows=6)

# 2. Rename columns
df = df.rename(columns={
    "Funds": "FUND_NAME",
    "Investor": "FIRM_NAME",
    "Fund Type": "STRATEGY",
    "Vintage": "VINTAGE_YEAR",
    "Close Date": "CLOSE_DATE",
    "Fund Size": "FUND_SIZE_MILLIONS",
    "Fund Location": "FUND_LOCATION"
})

# 3. Drop rows with missing FUND_NAME
df = df[df["FUND_NAME"].notna()].reset_index(drop=True)

# 4. Format dates
df["CLOSE_DATE"] = pd.to_datetime(df["CLOSE_DATE"], errors="coerce").dt.strftime("%Y-%m-%d")
today = datetime.today().strftime("%Y-%m-%d")

# 5. Generate PORTFOLIOCODE
df["PORTFOLIOCODE"] = [f"FND{str(i+1).zfill(4)}" for i in range(len(df))]

# 6. Infer COUNTRY from FUND_LOCATION
import re

def infer_country(loc):
    if not isinstance(loc, str):
        return "Unknown"
    loc = loc.lower()
    # "ma" is listed so that "Cambridge, MA" still resolves under whole-word matching
    country_map = {
        "united states": ["ca", "ny", "ma", "tx", "boston", "palo alto", "menlo", "mountain view", "new york", "san francisco"],
        "united kingdom": ["london"],
        "france": ["paris"],
        "germany": ["berlin"],
        "india": ["bangalore", "delhi", "mumbai"],
        "japan": ["tokyo"],
        "china": ["beijing", "shanghai"],
        "south korea": ["seoul"],
        "canada": ["toronto", "montreal"],
        "israel": ["tel aviv"]
    }
    for country, keywords in country_map.items():
        # Match whole words only, so "ca" does not match inside "canada" or "cambridge"
        if any(re.search(rf"\b{re.escape(kw)}\b", loc) for kw in keywords):
            return country.title()
    return "Unknown"

df["COUNTRY"] = df["FUND_LOCATION"].apply(infer_country)

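# 7. Get BASECURRENCYCODE using restcountries API
def get_currency(country):
    """Return (currency code, currency name); fall back to USD on any failure."""
    if country == "Unknown":
        return "USD", "US Dollar"
    try:
        r = requests.get(f"https://restcountries.com/v3.1/name/{country}", timeout=10)
        r.raise_for_status()
        c = r.json()[0]["currencies"]
        code = list(c.keys())[0]
        return code, c[code]["name"]
    except Exception:  # network error, bad status, or unexpected payload
        return "USD", "US Dollar"

# Query the API once per distinct country rather than once per row
currency_by_country = {c: get_currency(c) for c in df["COUNTRY"].unique()}
df["BASECURRENCYCODE"] = df["COUNTRY"].map(lambda c: currency_by_country[c][0])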

# 8. Clean STRATEGY
def simplify_strategy(original):
    if pd.isna(original):
        return "General"
    original = original.lower()
    if "early" in original:
        return "Early Stage"
    elif "later" in original:
        return "Later Stage"
    else:
        return "General"

df["STRATEGY"] = df["STRATEGY"].apply(simplify_strategy)

# 9. Create PRODUCTCODE
df["FIRM_ABBR"] = df["FIRM_NAME"].str.upper().str.replace(" ", "").str[:4]
strategy_map = {
    "Early Stage": "EARLY",
    "General": "GEN",
    "Later Stage": "LATE"
}
df["STRATEGY_ABBR"] = df["STRATEGY"].map(strategy_map)
df["PRODUCTCODE"] = df["FIRM_ABBR"] + "_" + df["STRATEGY_ABBR"]
df.drop(columns=["FIRM_ABBR", "STRATEGY_ABBR"], inplace=True)

# 10. Add PORTFOLIOCATEGORY
df["PORTFOLIOCATEGORY"] = "Fund"

# 11. Final columns (metrics like IRR/TVPI/DPI/etc. are excluded)
final_cols = [
    "PORTFOLIOCODE", "FIRM_NAME", "FUND_NAME", "STRATEGY", "VINTAGE_YEAR",
    "CLOSE_DATE", "FUND_SIZE_MILLIONS",
    "FUND_LOCATION", "COUNTRY", "BASECURRENCYCODE",
    "PRODUCTCODE", "PORTFOLIOCATEGORY"
]
df["VINTAGE_YEAR"] = df["VINTAGE_YEAR"].astype("Int64")

# 12. Final DataFrame
portfolio_general_info_df = df[final_cols]
portfolio_general_info_df


Saving PitchBook_Search_Result_Columns_2025_07_14_20_43_36.xlsx to PitchBook_Search_Result_Columns_2025_07_14_20_43_36 (1).xlsx


Unnamed: 0,PORTFOLIOCODE,FIRM_NAME,FUND_NAME,STRATEGY,VINTAGE_YEAR,CLOSE_DATE,FUND_SIZE_MILLIONS,FUND_LOCATION,COUNTRY,BASECURRENCYCODE,PRODUCTCODE,PORTFOLIOCATEGORY
0,FND0001,Tiger Global Management,Tiger Global Private Investment Partners XV,Early Stage,2021,2022-03-01,12700.0,"New York, NY",United States,USD,TIGE_EARLY,Fund
1,FND0002,Sequoia Capital,Sequoia Capital Global Growth Fund III,General,2018,2018-12-31,8170.02,"Menlo Park, CA",United States,USD,SEQU_GEN,Fund
2,FND0003,Tiger Global Management,Tiger Global Private Investment Partners XIV,General,2021,2021-03-31,6700.0,"New York, NY",United States,USD,TIGE_GEN,Fund
3,FND0004,Thrive Capital,Thrive Capital Partners IX Growth,Later Stage,2024,2024-08-06,4392.19,"New York, NY",United States,USD,THRI_LATE,Fund
4,FND0005,Accel Partners Management (Palo Alto),Accel Leaders IV,Later Stage,2024,2022-06-22,4112.98,"Palo Alto, CA",United States,USD,ACCE_LATE,Fund
5,FND0006,Bessemer Venture Partners,Bessemer Venture Partners XII,General,2022,2022-09-09,3850.0,"Redwood City, CA",United States,USD,BESS_GEN,Fund
6,FND0007,New Enterprise Associates,New Enterprise Associates 17,General,2019,2020-02-28,3572.34,"Menlo Park, CA",United States,USD,NEWE_GEN,Fund
7,FND0008,Flagship Pioneering,Flagship Pioneering Fund VII,Early Stage,2020,2021-06-14,3370.0,"Cambridge, MA",United States,USD,FLAG_EARLY,Fund
8,FND0009,New Enterprise Associates,New Enterprise Associates 16,General,2017,2017-06-30,3350.25,"Menlo Park, CA",United States,USD,NEWE_GEN,Fund


## VC Fund Universe Simulation – Methodology

To build a realistic simulation of a venture capital portfolio, we expanded 9 original real-world VC funds into a dataset of 100 funds by generating 91 synthetic records. The generation logic reflects typical venture industry dynamics and aligns with the VC reference cheat sheet.

### Key Simulation Principles

| Component            | Generation Logic                                                                 |
|---------------------|----------------------------------------------------------------------------------|
| `PORTFOLIOCODE`      | Sequential ID in `FNDxxxx` format                                                |
| `FIRM_NAME`          | Randomly sampled from the 7 distinct firms behind the 9 base funds               |
| `FUND_NAME`          | Patterned as `[Firm Name] Opportunity Fund [II–IX]`, with a random Roman-numeral suffix |
| `STRATEGY`           | Randomly assigned from existing strategies: Early Stage, General, Later Stage   |
| `VINTAGE_YEAR`       | Even-numbered years between 2010–2024 to simulate 2–3 year launch intervals     |
| `CLOSE_DATE`         | Random date between 2011 and 2025, drawn independently of `VINTAGE_YEAR`         |
| `FUND_SIZE_MILLIONS` | Uniform draw between $500M and $15B (500 to 15,000 in millions)                  |
| `FUND_LOCATION`      | Sampled from the locations of the base funds (e.g., New York, Palo Alto, Menlo Park) |
| `COUNTRY`            | Inherited from selected firm                                                     |
| `BASECURRENCYCODE`   | Based on firm’s country                                                          |
| `PRODUCTCODE`        | Generated as `[FIRM_ABBR]_[STRATEGY_ABBR]` (e.g., `TIGE_EARLY`)                  |
| `PORTFOLIOCATEGORY`  | Constant: `"Fund"`                                                               |
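
Because every field is drawn at random, reruns produce a different fund universe, and an independently drawn `CLOSE_DATE` can precede the `VINTAGE_YEAR`. The sketch below shows one possible variation under two assumptions that are not in the notebook: a fixed seed for reproducibility, and a close date restricted to the vintage year or the year after. The helper `make_synthetic_fund` is a hypothetical name.

```python
import random

def make_synthetic_fund(rng, firm, index):
    """Build one synthetic fund record from a seeded random.Random instance."""
    vintage_year = rng.choice(range(2010, 2025, 2))  # even years 2010..2024
    # Assumption: the final close falls in the vintage year or the year after
    close_year = vintage_year + rng.randint(0, 1)
    close_date = f"{close_year}-{rng.randint(1, 12):02d}-{rng.randint(1, 28):02d}"
    suffix = rng.choice(["II", "III", "IV", "V", "VI", "VII", "VIII", "IX"])
    return {
        "PORTFOLIOCODE": f"FND{index:04d}",
        "FIRM_NAME": firm,
        "FUND_NAME": f"{firm} Opportunity Fund {suffix}",
        "VINTAGE_YEAR": vintage_year,
        "CLOSE_DATE": close_date,
        "FUND_SIZE_MILLIONS": round(rng.uniform(500, 15000), 2),
    }

rng = random.Random(42)  # fixed seed so reruns produce the same funds
fund = make_synthetic_fund(rng, "Sequoia Capital", 10)
print(fund)
```

Passing the generator in explicitly keeps the seed in one place and leaves the module-level `random` state untouched.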


In [None]:
import random
import numpy as np
import pandas as pd

# 1. Copy the existing portfolio data
base_funds = portfolio_general_info_df.copy()

# 2. Prepare values for generation
target_fund_count = 100
additional_needed = target_fund_count - len(base_funds)
additional_funds = []

# 3. Extract unique values
strategies = base_funds["STRATEGY"].dropna().unique().tolist()
locations = base_funds["FUND_LOCATION"].dropna().unique().tolist()
firms = base_funds["FIRM_NAME"].dropna().unique().tolist()

# 4. Strategy short name mapping
strategy_abbr = {
    "Early Stage": "EARLY",
    "General": "GEN",
    "Later Stage": "LATE"
}

# 5. Generate synthetic funds with aligned FIRM_NAME and FUND_NAME
for i in range(additional_needed):
    firm = random.choice(firms)
    strategy = random.choice(strategies)
    vintage_year = random.choice(range(2010, 2025, 2))
    fund_suffix = random.choice(['II', 'III', 'IV', 'V', 'VI', 'VII', 'VIII', 'IX'])

    # Generate abbreviations
    firm_abbr = firm.upper().replace(" ", "")[:4]
    strat_abbr = strategy_abbr.get(strategy, "GEN")
    product_code = f"{firm_abbr}_{strat_abbr}"

    # Consistent fund naming: based on selected firm
    fund_name = f"{firm} Opportunity Fund {fund_suffix}"

    fund = {
        "PORTFOLIOCODE": f"FND{str(len(base_funds) + i + 1).zfill(4)}",
        "FIRM_NAME": firm,
        "FUND_NAME": fund_name,
        "STRATEGY": strategy,
        "VINTAGE_YEAR": vintage_year,
        "CLOSE_DATE": f"{random.randint(2011, 2025)}-{random.randint(1, 12):02d}-{random.randint(1, 28):02d}",
        "FUND_SIZE_MILLIONS": round(random.uniform(500, 15000), 2),
        "FUND_LOCATION": random.choice(locations),
        "COUNTRY": base_funds[base_funds["FIRM_NAME"] == firm]["COUNTRY"].mode().iloc[0],
        "BASECURRENCYCODE": base_funds[base_funds["FIRM_NAME"] == firm]["BASECURRENCYCODE"].mode().iloc[0],
        "PRODUCTCODE": product_code,
        "PORTFOLIOCATEGORY": "Fund"
    }

    additional_funds.append(fund)

# 6. Combine the original and generated
generated_df = pd.DataFrame(additional_funds)

# 7. Clean base_funds: remove performance columns if present
cols_to_drop = ["IRR", "TVPI", "DPI", "RVPI", "FUND_NAV", "CONTRIBUTED", "DISTRIBUTED"]
base_funds = base_funds.drop(columns=[col for col in cols_to_drop if col in base_funds.columns])

# 8. Final combined dataframe
portfolio_general_info_df = pd.concat([base_funds, generated_df], ignore_index=True)

# 9. (Optional) Reorder columns
final_cols = [
    "PORTFOLIOCODE", "FIRM_NAME", "FUND_NAME", "STRATEGY", "VINTAGE_YEAR",
    "CLOSE_DATE", "FUND_SIZE_MILLIONS",
    "FUND_LOCATION", "COUNTRY", "BASECURRENCYCODE",
    "PRODUCTCODE", "PORTFOLIOCATEGORY"
]
portfolio_general_info_df = portfolio_general_info_df[final_cols]

# 10. Show final result
portfolio_general_info_df


Unnamed: 0,PORTFOLIOCODE,FIRM_NAME,FUND_NAME,STRATEGY,VINTAGE_YEAR,CLOSE_DATE,FUND_SIZE_MILLIONS,FUND_LOCATION,COUNTRY,BASECURRENCYCODE,PRODUCTCODE,PORTFOLIOCATEGORY
0,FND0001,Tiger Global Management,Tiger Global Private Investment Partners XV,Early Stage,2021,2022-03-01,12700.00,"New York, NY",United States,USD,TIGE_EARLY,Fund
1,FND0002,Sequoia Capital,Sequoia Capital Global Growth Fund III,General,2018,2018-12-31,8170.02,"Menlo Park, CA",United States,USD,SEQU_GEN,Fund
2,FND0003,Tiger Global Management,Tiger Global Private Investment Partners XIV,General,2021,2021-03-31,6700.00,"New York, NY",United States,USD,TIGE_GEN,Fund
3,FND0004,Thrive Capital,Thrive Capital Partners IX Growth,Later Stage,2024,2024-08-06,4392.19,"New York, NY",United States,USD,THRI_LATE,Fund
4,FND0005,Accel Partners Management (Palo Alto),Accel Leaders IV,Later Stage,2024,2022-06-22,4112.98,"Palo Alto, CA",United States,USD,ACCE_LATE,Fund
...,...,...,...,...,...,...,...,...,...,...,...,...
95,FND0096,Bessemer Venture Partners,Bessemer Venture Partners Opportunity Fund III,Later Stage,2022,2023-06-18,2254.57,"Redwood City, CA",United States,USD,BESS_LATE,Fund
96,FND0097,Thrive Capital,Thrive Capital Opportunity Fund IV,Later Stage,2016,2020-02-22,944.37,"New York, NY",United States,USD,THRI_LATE,Fund
97,FND0098,Accel Partners Management (Palo Alto),Accel Partners Management (Palo Alto) Opportun...,Later Stage,2024,2014-10-24,6608.27,"Palo Alto, CA",United States,USD,ACCE_LATE,Fund
98,FND0099,New Enterprise Associates,New Enterprise Associates Opportunity Fund VII,Later Stage,2014,2017-08-02,14734.58,"Menlo Park, CA",United States,USD,NEWE_LATE,Fund


In [None]:
# Export to Excel
portfolio_general_info_df.to_excel("portfolio_data.xlsx", index=False)
print(f"Excel file saved as 'portfolio_data.xlsx' with {len(portfolio_general_info_df)} funds")

Excel file saved as 'portfolio_data.xlsx' with 100 funds


# Account

## Account Table — Generation Logic

This section creates a synthetic `Account` table, simulating both **institutional** and **individual** LP (Limited Partner) investors. The logic incorporates realistic LP types, fund commitment patterns, and geographic variability to support fund analytics.

| Step | Description                                                                                         |
|------|-----------------------------------------------------------------------------------------------------|
| 1    | Hardcoded 25 institutional LPs using representative PitchBook-style firm names, types, and countries |
| 2    | Queried `restcountries` API for each LP's currency code and name; took the FX rate from a static lookup table |
| 3    | Estimated committed capital using LP type and number of funds (e.g., pension funds commit more)     |
| 4    | Used Faker to generate 25 individual LPs; scaled commitments by geography (e.g., US > India)         |
| 5    | Merged institutional and individual accounts into unified `accounts_df` (total 50 LP records)        |
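
The commitment rule in step 3 can be read as a base amount drawn from a range that depends on the LP type, scaled up by the number of funds. The sketch below restates that rule for institutional LPs as a small function. The names `BASE_RANGES`, `base_range_for`, and `estimate_commitment` are illustrative assumptions, and the seeded generator is added only to make the example repeatable.

```python
import random

# Base commitment ranges keyed by substrings of the LP type,
# mirroring the branches used for institutional LPs.
BASE_RANGES = [
    (("Pension",), (120, 300)),
    (("Endowment", "Government"), (70, 200)),
    (("Foundation", "Fund of Funds"), (40, 120)),
]
DEFAULT_RANGE = (60, 150)

def base_range_for(lp_type):
    """Return the (low, high) base range for the first matching LP type."""
    for keywords, bounds in BASE_RANGES:
        if any(kw in lp_type for kw in keywords):
            return bounds
    return DEFAULT_RANGE

def estimate_commitment(lp_type, num_funds, rng):
    """Draw a base amount and scale it by 1 + num_funds / 100."""
    low, high = base_range_for(lp_type)
    base_amt = rng.uniform(low, high)
    # An LP invested in 100 funds commits twice its base amount
    return round(base_amt * (1 + num_funds / 100), 2)

rng = random.Random(0)
print(estimate_commitment("Public Pension Fund", 50, rng))
```

Keeping the ranges in a table rather than an `if`/`elif` chain makes it easier to review or adjust the assumed commitment levels per LP type.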


In [None]:
import pandas as pd
import random
import requests
from faker import Faker

fake = Faker()

# 1. LP Firm Data (Institutional)
lp_data = [
    ("European Investment Fund", "Fund of Funds", "Luxembourg"),
    ("California Public Employees’ Retirement System", "Public Pension Fund", "United States"),
    ("HarbourVest Partners", "Fund of Funds", "United States"),
    ("Employees’ Retirement System of the State of Hawaii", "Corporate Pension", "United States"),
    ("University of Michigan", "Endowment", "United States"),
    ("Adams Street Partners", "Fund of Funds", "United States"),
    ("MacArthur Foundation", "Foundation", "United States"),
    ("University of Texas IMC", "Endowment", "United States"),
    ("Regents of UC", "Endowment", "United States"),
    ("SBC Master Pension", "Corporate Pension", "United States"),
    ("NYS Common Retirement Fund", "Public Pension Fund", "United States"),
    ("Lucent Pension Plan", "Corporate Pension", "United States"),
    ("Rockefeller Foundation", "Foundation", "United States"),
    ("Michigan Treasury", "Government Agency", "United States"),
    ("Engineers Local 3 Fund", "Union Pension Fund", "United States"),
    ("Knightsbridge Advisers", "Fund of Funds", "United States"),
    ("MassMutual", "Corporate Pension", "United States"),
    ("CDPQ", "Public Pension Fund", "Canada"),
    ("Mass PRIM Board", "Public Pension Fund", "United States"),
    ("SF Employees’ Retirement", "Public Pension Fund", "United States"),
    ("Michigan Retirement", "Public Pension Fund", "United States"),
    ("Sherman Fairchild Foundation", "Foundation", "United States"),
    ("HP Retirement Plan", "Corporate Pension", "United States"),
    ("Lexington Partners", "Secondary LP", "United States"),
    ("Illinois Municipal Fund", "Public Pension Fund", "United States"),
]

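# 2. Currency info
from functools import lru_cache

# Approximate static FX rates to USD (illustrative placeholders, not live quotes).
# AUD, INR, and BRL are assumed values added because the individual-LP country list
# uses them; any other currency still falls back to 1.0.
FX_TO_USD = {"USD": 1.0, "EUR": 1.1, "GBP": 1.3, "CAD": 0.74, "JPY": 0.0067,
             "AUD": 0.65, "INR": 0.012, "BRL": 0.18}

@lru_cache(maxsize=None)  # one API call per distinct country
def get_currency_info(country):
    if country == "Unknown":
        return "USD", "US Dollar", 1.0
    try:
        r = requests.get(f"https://restcountries.com/v3.1/name/{country}", timeout=10)
        r.raise_for_status()
        data = r.json()[0]
        currency_code = list(data["currencies"].keys())[0]
        currency_name = data["currencies"][currency_code]["name"]
        return currency_code, currency_name, FX_TO_USD.get(currency_code, 1.0)
    except Exception:  # network error, bad status, or unexpected payload
        return "USD", "US Dollar", 1.0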

# 3. Institutional Accounts
institutional_accounts = []
for i, (name, lp_type, country) in enumerate(lp_data):
    currency_code, currency_name, fx = get_currency_info(country)
    num_funds = random.randint(1, 100)
    if "Pension" in lp_type:
        base_amt = random.uniform(120, 300)
    elif "Endowment" in lp_type or "Government" in lp_type:
        base_amt = random.uniform(70, 200)
    elif "Foundation" in lp_type or "Fund of Funds" in lp_type:
        base_amt = random.uniform(40, 120)
    else:
        base_amt = random.uniform(60, 150)
    local_amt = round(base_amt * (1 + num_funds / 100), 2)

    institutional_accounts.append({
        "Account ID": f"ACC{i+1:04}",
        "Investor Type": "Institutional",
        "Account Name": name,
        "Type": lp_type,
        "Country": country,
        "Account Currency": currency_code,
        "Currency Name": currency_name,
        "FX to USD": fx,
        "Committed Capital (Local)": local_amt,
        "Committed Capital (USD)": round(local_amt * fx, 2),
        "Number of Funds": num_funds,
        "NAV (USD)": round(random.uniform(40, 140), 2),
        "Start Date": random.choice(["2012-11-23", "2013-09-08", "2017-11-02"])
    })

# 4. Individual Accounts
countries = ["United States", "United Kingdom", "Germany", "France", "Canada", "Australia", "Netherlands", "Japan", "India", "Brazil"]
individual_accounts = []

for i in range(25):
    name = fake.name()
    country = random.choice(countries)
    currency_code, currency_name, fx = get_currency_info(country)
    num_funds = random.randint(1, 10)
    if country in ["United States", "United Kingdom", "Germany", "Canada"]:
        base_amt = random.uniform(60, 150)
    else:
        base_amt = random.uniform(20, 80)
    local_amt = round(base_amt * (1 + num_funds / 10), 2)

    individual_accounts.append({
        "Account ID": f"ACC{i+len(institutional_accounts)+1:04}",
        "Investor Type": "Individual",
        "Account Name": name,
        "Type": "Private Individual",
        "Country": country,
        "Account Currency": currency_code,
        "Currency Name": currency_name,
        "FX to USD": fx,
        "Committed Capital (Local)": local_amt,
        "Committed Capital (USD)": round(local_amt * fx, 2),
        "Number of Funds": num_funds,
        "NAV (USD)": round(random.uniform(40, 120), 2),
        "Start Date": random.choice(["2012-11-23", "2013-09-08", "2017-11-02"])
    })

# 5. Combine
accounts_df = pd.DataFrame(institutional_accounts + individual_accounts)
accounts_df


Unnamed: 0,Account ID,Investor Type,Account Name,Type,Country,Account Currency,Currency Name,FX to USD,Committed Capital (Local),Committed Capital (USD),Number of Funds,NAV (USD),Start Date
0,ACC0001,Institutional,European Investment Fund,Fund of Funds,Luxembourg,EUR,Euro,1.1,135.91,149.5,81,129.7,2013-09-08
1,ACC0002,Institutional,California Public Employees’ Retirement System,Public Pension Fund,United States,USD,United States dollar,1.0,456.13,456.13,57,77.61,2017-11-02
2,ACC0003,Institutional,HarbourVest Partners,Fund of Funds,United States,USD,United States dollar,1.0,139.03,139.03,43,70.31,2012-11-23
3,ACC0004,Institutional,Employees’ Retirement System of the State of H...,Corporate Pension,United States,USD,United States dollar,1.0,297.61,297.61,67,60.82,2013-09-08
4,ACC0005,Institutional,University of Michigan,Endowment,United States,USD,United States dollar,1.0,242.87,242.87,39,107.3,2017-11-02
5,ACC0006,Institutional,Adams Street Partners,Fund of Funds,United States,USD,United States dollar,1.0,120.94,120.94,1,52.03,2013-09-08
6,ACC0007,Institutional,MacArthur Foundation,Foundation,United States,USD,United States dollar,1.0,130.71,130.71,18,63.62,2012-11-23
7,ACC0008,Institutional,University of Texas IMC,Endowment,United States,USD,United States dollar,1.0,126.4,126.4,41,49.22,2017-11-02
8,ACC0009,Institutional,Regents of UC,Endowment,United States,USD,United States dollar,1.0,232.98,232.98,33,95.59,2017-11-02
9,ACC0010,Institutional,SBC Master Pension,Corporate Pension,United States,USD,United States dollar,1.0,301.3,301.3,2,52.65,2013-09-08


# Portfolio-Account Association

## Portfolio-Account Association Logic

This section creates a many-to-many mapping between investment accounts (LPs) and VC funds, using the `Number of Funds` field in each account as the basis for how many funds each LP commits to.

| Step | Description                                                                 |
|------|-----------------------------------------------------------------------------|
| 1    | Extracted portfolio and account identifiers from `portfolio_general_info_df` and `accounts_df` |
| 2    | For each account, sampled `Number of Funds` funds from the pool of not-yet-assigned funds |
| 3    | Removed the sampled funds from the pool so that later accounts draw unassigned funds first |
| 4    | If too few unassigned funds remain, sampled from the full fund list instead, so funds can be shared across accounts but never repeat within one account |
| 5    | Created a mapping table (`portfolio_account_map_df`) linking accounts to portfolios |
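
The pool-then-fallback behaviour in steps 2 to 4 is easier to see on a small example. The sketch below is a minimal restatement with five funds and three accounts. The function name `assign_funds` and the dictionary input are assumptions made for illustration.

```python
import random

def assign_funds(fund_counts, portfolio_codes, rng):
    """fund_counts maps an account ID to the number of funds to assign."""
    available = list(portfolio_codes)
    pairs = []
    for account_id, n in fund_counts.items():
        if n > len(available):
            # Pool too small: draw from the full list, so funds may be
            # shared across accounts but never repeat within this account
            chosen = rng.sample(list(portfolio_codes), n)
        else:
            chosen = rng.sample(available, n)
            taken = set(chosen)
            available = [code for code in available if code not in taken]
        pairs.extend((code, account_id) for code in chosen)
    return pairs

codes = [f"FND{i:04d}" for i in range(1, 6)]
counts = {"ACC0001": 3, "ACC0002": 2, "ACC0003": 4}
pairs = assign_funds(counts, codes, random.Random(7))
print(len(pairs))  # 9
```

Here the first two accounts exhaust the five-fund pool, so the third account draws its four funds from the full list. `random.sample` draws without replacement, so it raises `ValueError` if an account asks for more funds than exist in total.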


In [None]:
import pandas as pd
import random

# 1. Prepare portfolio and account ID lists
portfolio_codes = portfolio_general_info_df["PORTFOLIOCODE"].tolist()
account_ids = accounts_df["Account ID"].tolist()

# Copy the full fund list to manage duplicates
available_funds = portfolio_codes.copy()

# 2. Create mapping between accounts and portfolios
mapping = []
for _, row in accounts_df.iterrows():
    account_id = row["Account ID"]
    num_funds = int(row["Number of Funds"])

    # If too few unassigned funds remain, draw from the full pool instead.
    # random.sample never repeats a fund within one account, but funds may be shared across accounts.
    if num_funds > len(available_funds):
        selected_funds = random.sample(portfolio_codes, num_funds)
    else:
        selected_funds = random.sample(available_funds, num_funds)
        # Remove selected funds so later accounts draw unassigned funds first
        selected = set(selected_funds)
        available_funds = [f for f in available_funds if f not in selected]

    # Append account-fund pairs
    for fund in selected_funds:
        mapping.append({
            "PORTFOLIOCODE": fund,
            "ACCOUNTID": account_id
        })

# 3. Create final mapping DataFrame
portfolio_account_map_df = pd.DataFrame(mapping)
portfolio_account_map_df


Unnamed: 0,PORTFOLIOCODE,ACCOUNTID
0,FND0044,ACC0001
1,FND0040,ACC0001
2,FND0056,ACC0001
3,FND0045,ACC0001
4,FND0005,ACC0001
...,...,...
1236,FND0081,ACC0050
1237,FND0011,ACC0050
1238,FND0005,ACC0050
1239,FND0052,ACC0050


# Productmaster

## Product Master Generation Logic

This script generates a **Product Master** dataset based on existing `PRODUCTCODE`s extracted from `portfolio_general_info_df`.  

### 🛠️ Key Generation Logic

| Component        | Logic                                                                 |
|------------------|-----------------------------------------------------------------------|
| `PRODUCTCODE`     | Inherited from portfolio info (e.g., `TIGE_EARLY`, `SEQU_GEN`)       |
| `STRATEGY`        | Extracted from `PRODUCTCODE` suffix (`EARLY`, `GEN`, `LATE`)         |
| `PRODUCTNAME`     | Follows format: `Assette [Strategy Label] [Vehicle Label]`           |
| `VEHICLETYPE`     | Randomly selected from: Separate Account, Commingled Fund, Mutual Fund |
| `VEHICLECATEGORY` | `"Segregated"` for SMA, `"Pooled"` otherwise                         |
| `SHARECLASS`      | Logic-based assignment:                                               |
|                  | - Retail → for Mutual Funds                                           |
|                  | - Institutional → for Separate Accounts                              |
|                  | - Random → for Commingled Funds (Institutional / Retail / Offshore)  |
| `ASSETCLASS`      | Fixed: `Private Equity`                                               |
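
The rules in the table above are mostly deterministic once the vehicle type is known. The sketch below shows the derivation for a single product code. The helper names `strategy_code`, `share_class`, and `build_product` are hypothetical, and the vehicle type is passed in explicitly rather than chosen at random so the result is repeatable.

```python
import random

VEHICLE_CATEGORY = {
    "Separate Account": "Segregated",
    "Commingled Fund": "Pooled",
    "Mutual Fund": "Pooled",
}

def strategy_code(product_code):
    # Take the text after the last underscore, e.g. "SEQU_GEN" -> "GEN"
    return product_code.rsplit("_", 1)[-1]

def share_class(vehicle_type, rng):
    if vehicle_type == "Mutual Fund":
        return "Retail"
    if vehicle_type == "Separate Account":
        return "Institutional"
    # Commingled funds get a randomly chosen share class
    return rng.choice(["Institutional", "Retail", "Offshore"])

def build_product(product_code, vehicle_type, rng):
    return {
        "PRODUCTCODE": product_code,
        "STRATEGY": strategy_code(product_code),
        "VEHICLETYPE": vehicle_type,
        "VEHICLECATEGORY": VEHICLE_CATEGORY[vehicle_type],
        "SHARECLASS": share_class(vehicle_type, rng),
        "ASSETCLASS": "Private Equity",
    }

row = build_product("SEQU_GEN", "Separate Account", random.Random(0))
print(row)
```

Only the commingled-fund branch uses the random generator, so rows for separate accounts and mutual funds are fully determined by their inputs.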


In [None]:
import pandas as pd
import random

# 1. Get unique PRODUCTCODEs from your portfolio
unique_productcodes = portfolio_general_info_df["PRODUCTCODE"].unique()

# 2. Map strategy codes to full names
strategy_name_map = {
    "EARLY": "Early Stage",
    "GEN": "Venture Capital",
    "LATE": "Growth Equity"
}

# 3. Define vehicle types and category mapping
vehicle_types = ["Separate Account", "Commingled Fund", "Mutual Fund"]
vehicle_categories = {
    "Separate Account": "Segregated",
    "Commingled Fund": "Pooled",
    "Mutual Fund": "Pooled"
}

# 4. Extract strategy abbreviation from PRODUCTCODE
def extract_strategy_code(product_code):
    # Split on the last underscore so a firm abbreviation containing "_" cannot shift the suffix
    return product_code.rsplit("_", 1)[-1]

# 5. Assign SHARECLASS based on vehicle type logic
def assign_shareclass(vehicle_type):
    if "MUTUAL" in vehicle_type.upper():
        return "Retail"
    elif "SEPARATE" in vehicle_type.upper():
        return "Institutional"
    else:
        return random.choice(["Institutional", "Retail", "Offshore"])

# 6. Generate a readable PRODUCTNAME
def generate_product_name(strategy_abbr, vehicle_type):
    firm_prefix = "Assette"
    strategy_label = strategy_name_map.get(strategy_abbr, "Venture")
    vehicle_label = "SMA" if "Separate" in vehicle_type else vehicle_type
    return f"{firm_prefix} {strategy_label} {vehicle_label}"

# 7. Build the product master records
product_rows = []
for product_code in unique_productcodes:
    strategy_abbr = extract_strategy_code(product_code)
    vehicle_type = random.choice(vehicle_types)

    product_rows.append({
        "PRODUCTCODE": product_code,
        "PRODUCTNAME": generate_product_name(strategy_abbr, vehicle_type),
        "STRATEGY": strategy_abbr,
        "VEHICLECATEGORY": vehicle_categories[vehicle_type],
        "VEHICLETYPE": vehicle_type,
        "ASSETCLASS": "Private Equity",
        "SHARECLASS": assign_shareclass(vehicle_type),
    })

# 8. Convert to DataFrame
product_master_df = pd.DataFrame(product_rows)

# 9. Preview result
product_master_df


Unnamed: 0,PRODUCTCODE,PRODUCTNAME,STRATEGY,VEHICLECATEGORY,VEHICLETYPE,ASSETCLASS,SHARECLASS
0,TIGE_EARLY,Assette Early Stage Commingled Fund,EARLY,Pooled,Commingled Fund,Private Equity,Offshore
1,SEQU_GEN,Assette Venture Capital SMA,GEN,Segregated,Separate Account,Private Equity,Institutional
2,TIGE_GEN,Assette Venture Capital SMA,GEN,Segregated,Separate Account,Private Equity,Institutional
3,THRI_LATE,Assette Growth Equity SMA,LATE,Segregated,Separate Account,Private Equity,Institutional
4,ACCE_LATE,Assette Growth Equity SMA,LATE,Segregated,Separate Account,Private Equity,Institutional
5,BESS_GEN,Assette Venture Capital SMA,GEN,Segregated,Separate Account,Private Equity,Institutional
6,NEWE_GEN,Assette Venture Capital SMA,GEN,Segregated,Separate Account,Private Equity,Institutional
7,FLAG_EARLY,Assette Early Stage Mutual Fund,EARLY,Pooled,Mutual Fund,Private Equity,Retail
8,FLAG_GEN,Assette Venture Capital SMA,GEN,Segregated,Separate Account,Private Equity,Institutional
9,ACCE_EARLY,Assette Early Stage Commingled Fund,EARLY,Pooled,Commingled Fund,Private Equity,Offshore
