# Audience Creation Using LLMs


In [1]:
pip install openai==0.28



In [2]:
# import necessary libraries
import pandas as pd
import numpy as np
import openai

In [3]:
openai.api_key = "***"

## Preprocessing before using LLMs

In [4]:
# import clean dataframe with customer information
df_segment = pd.read_csv('/content/df_segment.csv')
df_segment.head(3)

Unnamed: 0,CustomerCity,CustomerState,Gender,Age,EmploymentStatus,Education,Occupation,Industry,Income,OrderCount,...,OrderMonthMode,OrderWeekOfMonthMode,OrderSeasonMode,ItemPriceMean,MainCategoryMode,SubCategoryMode,TypeMode,PurchaseFlag,CampaignFlag,Segment
0,Los Angeles city,California,female,5 to 17 years,unemployed,less_than_high_school_diploma,Unknown,Unknown,46148,0,...,0.0,0.0,0.0,0.0,Unknown,Unknown,Unknown,0,0,2
1,Nashville-Davidson metropolitan government (ba...,Tennessee,male,35 to 44 years,employed,some_college_or_associates_degree,management_business_science_arts,manufacturing,42639,0,...,0.0,0.0,0.0,0.0,Unknown,Unknown,Unknown,0,0,2
2,Louisville/Jefferson County metro government (...,Kentucky,female,5 to 17 years,unemployed,less_than_high_school_diploma,Unknown,Unknown,42019,0,...,0.0,0.0,0.0,0.0,Unknown,Unknown,Unknown,0,0,2


In [5]:
df_segment.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 23 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   CustomerCity          5000 non-null   object 
 1   CustomerState         5000 non-null   object 
 2   Gender                5000 non-null   object 
 3   Age                   5000 non-null   object 
 4   EmploymentStatus      5000 non-null   object 
 5   Education             5000 non-null   object 
 6   Occupation            5000 non-null   object 
 7   Industry              5000 non-null   object 
 8   Income                5000 non-null   int64  
 9   OrderCount            5000 non-null   int64  
 10  OrderItemMean         5000 non-null   float64
 11  TotalAmountMax        5000 non-null   float64
 12  TotalAmountMin        5000 non-null   float64
 13  OrderMonthMode        5000 non-null   float64
 14  OrderWeekOfMonthMode  5000 non-null   float64
 15  OrderSeasonMode      

In [6]:
# estimate the share of customers who purchased in each segment (mean of the 0/1 PurchaseFlag)
purchase_percentage = df_segment[['PurchaseFlag', 'Segment']].groupby('Segment').agg({
    'PurchaseFlag': ['mean']
}).reset_index()

purchase_percentage.columns = [
    'Segment',
    'PurchasePercentage'
]

purchase_percentage

Unnamed: 0,Segment,PurchasePercentage
0,1,0.178763
1,2,0.179022
2,3,0.18806


The dataframe contains object-typed columns with long string values. For example, the 'Gender' column stores 'female' and 'male' rather than 'f' and 'm'. Longer strings produce more tokens, which pushes a request closer to the model's token limit. To address this, we created a dictionary for each object-typed column that maps every category to a shorter abbreviation.

In [7]:
state_dict = {
    "Alabama": "AL",
    "Alaska": "AK",
    "Arizona": "AZ",
    "Arkansas": "AR",
    "California": "CA",
    "Colorado": "CO",
    "Connecticut": "CT",
    "Delaware": "DE",
    "Florida": "FL",
    "Georgia": "GA",
    "Hawaii": "HI",
    "Idaho": "ID",
    "Illinois": "IL",
    "Indiana": "IN",
    "Iowa": "IA",
    "Kansas": "KS",
    "Kentucky": "KY",
    "Louisiana": "LA",
    "Maine": "ME",
    "Maryland": "MD",
    "Massachusetts": "MA",
    "Michigan": "MI",
    "Minnesota": "MN",
    "Mississippi": "MS",
    "Missouri": "MO",
    "Montana": "MT",
    "Nebraska": "NE",
    "Nevada": "NV",
    "New Hampshire": "NH",
    "New Jersey": "NJ",
    "New Mexico": "NM",
    "New York": "NY",
    "North Carolina": "NC",
    "North Dakota": "ND",
    "Ohio": "OH",
    "Oklahoma": "OK",
    "Oregon": "OR",
    "Pennsylvania": "PA",
    "Rhode Island": "RI",
    "South Carolina": "SC",
    "South Dakota": "SD",
    "Tennessee": "TN",
    "Texas": "TX",
    "Utah": "UT",
    "Vermont": "VT",
    "Virginia": "VA",
    "Washington": "WA",
    "West Virginia": "WV",
    "Wisconsin": "WI",
    "Wyoming": "WY"
}


In [8]:
age_dict = {
    '5 to 17 years': '5-17',
    '18 to 24 years': '18-24',
    '25 to 34 years': '25-34',
    '35 to 44 years': '35-44',
    '45 to 54 years': '45-54',
    '55 to 64 years': '55-64',
    '65 to 74 years': '65-74',
    '75 years and over': '75+'
}

In [9]:
education_abb = {
    'less_than_high_school_diploma': '< HS Diploma',
    'some_college_or_associates_degree': 'College/Associates',
    'high_school_graduate': 'HS Graduate',
    'bachelors_degree': 'Bachelors',
    'graduate_or_professional_degree': 'Graduate/Professional'
}

In [10]:
occupation_abb = {
    'Unknown': 'Un',
    'management_business_science_arts': 'Management/Business/Arts',
    'sales_and_office_occupations': 'Sales/Office',
    'service_occupations': 'Service',
    'production_transportation_material_moving': 'Production/Transport',
    'natural_resources_construction_maintenance': 'Resources/Construction'
}

In [11]:
industry_abb = {
    'Unknown': 'Un',
    'educational_services_health_care_social_assistance': 'Edu/Health/Social',
    'professional_scientific_management': 'Prof/Sci/Mgmt',
    'arts_entertainment_recreation_accommodation_food_services': 'Arts/Ent/Rec/Food',
    'retail_trade': 'Retail',
    'finance_insurance_real_estate': 'Finance/Ins/RE',
    'manufacturing': 'Manufacturing',
    'other_services': 'Other Services',
    'construction': 'Construction',
    'public_administration': 'Public Admin',
    'transportation_warehousing_utilities': 'Trans/Warehousing/Util',
    'wholesale_trade': 'Wholesale',
    'information': 'Information',
    'agriculture_forestry_fishing_mining': 'Agric/Forestry/Fish/Mine'
}

In [12]:
city_abb = {
    'Jacksonville city': 'Jacksonville',
    'Charlotte city': 'Charlotte',
    'Nashville-Davidson metropolitan government (balance)': 'Nashville',
    'San Antonio city': 'San Antonio',
    'Seattle city': 'Seattle',
    'Tucson city': 'Tucson',
    'Fresno city': 'Fresno',
    'Phoenix city': 'Phoenix',
    'Columbus city': 'Columbus',
    'Las Vegas city': 'Las Vegas',
    'Detroit city': 'Detroit',
    'Memphis city': 'Memphis',
    'Washington city': 'Washington',
    'Dallas city': 'Dallas',
    'San Francisco city': 'San Francisco',
    'Austin city': 'Austin',
    'Denver city': 'Denver',
    'Philadelphia city': 'Philadelphia',
    'Louisville/Jefferson County metro government (balance)': 'Louisville',
    'San Diego city': 'San Diego',
    'Oklahoma City city': 'Oklahoma City',
    'El Paso city': 'El Paso',
    'Albuquerque city': 'Albuquerque',
    'Portland city': 'Portland',
    'Boston city': 'Boston',
    'Houston city': 'Houston',
    'New York city': 'New York',
    'Milwaukee city': 'Milwaukee',
    'Fort Worth city': 'Fort Worth',
    'San Jose city': 'San Jose',
    'Los Angeles city': 'Los Angeles',
    'Indianapolis city (balance)': 'Indianapolis',
    'Baltimore city': 'Baltimore',
    'Chicago city': 'Chicago'
}

In [13]:
category_abb = {
    'Unknown': 'Un',
    'accessories': 'Accs',
    'home & kitchen': 'Home & Kit',
    'beauty & health': 'Beauty & Hlth',
    'bags & luggage': 'Bags & Lug',
    'appliances': 'Applncs',
    'sports & fitness': 'Sports & Fit',
    'car & motorbike': 'Car & Moto',
    'tv audio & cameras': 'TV & Audio',
    'kids\' fashion': 'Kids\' Fash',
    'toys & baby products': 'Toys & Baby',
    'industrial supplies': 'Indus Supplies',
    'grocery & gourmet foods': 'Groc & Gourmet',
    'stores': 'Stores',
    'men\'s shoes': 'Men\'s Shoes',
    'men\'s clothing': 'Men\'s Cloth',
    'women\'s clothing': 'Women\'s Cloth',
    'pet supplies': 'Pet Suppl',
    'home kitchen pets': 'Home Kit Pets',
    'music': 'Music',
    'women\'s shoes': 'Women\'s Shoes'
}

In [14]:
subcat_abb = {
    'Unknown': 'Un',
    'All Exercise & Fitness': 'Exercise & Fit',
    'Casual Shoes': 'Casual Shoes',
    'All Electronics': 'Electronics',
    'Badminton': 'Badminton',
    'All Appliances': 'Appliances',
    'All Pet Supplies': 'Pet Supplies',
    'Baby Products': 'Baby Prod',
    'All Grocery & Gourmet Foods': 'Groc & Gourmet',
    'Clothing': 'Clothing',
    'Air Conditioners': 'Air Cond',
    'Bedroom Linen': 'Bed Linen',
    'Amazon Fashion': 'Amazon Fash',
    'Car & Bike Care': 'Car & Bike Care',
    'Ethnic Wear': 'Ethnic Wear',
    'All Home & Kitchen': 'Home & Kit',
    'Car Electronics': 'Car Electronics',
    'Bags & Luggage': 'Bags & Lug',
    'All Car & Motorbike Products': 'Car & Moto',
    'Baby Fashion': 'Baby Fashion',
    'All Sports Fitness & Outdoors': 'Sports & Outdoor',
    'Cameras': 'Cameras',
    'Ballerinas': 'Ballerinas',
    'Diet & Nutrition': 'Diet & Nut',
    'Beauty & Grooming': 'Beauty & Groom',
    'Indoor Lighting': 'Indoor Light',
    'Camera Accessories': 'Cam Accs',
    'Formal Shoes': 'Formal Shoes',
    'Cricket': 'Cricket',
    'Camping & Hiking': 'Camping & Hike',
    'STEM Toys Store': 'STEM Toys',
    'Industrial & Scientific Supplies': 'Indus & Sci Supplies',
    'Gold & Diamond Jewellery': 'Gold & Diamond',
    'Cardio Equipment': 'Cardio Equip',
    'Car Parts': 'Car Parts',
    'Strength Training': 'Strength Train',
    'Garden & Outdoors': 'Garden & Outdrs',
    'Football': 'Football',
    'Coffee Tea & Beverages': 'Coffee & Tea',
    'Lingerie & Nightwear': 'Lingerie & Night',
    'Home Audio & Theater': 'Home Audio & Thea',
    'Diapers': 'Diapers',
    'Home Entertainment Systems': 'Home Ent Sys',
    'Headphones': 'Headphones',
    'Heating & Cooling Appliances': 'Heat & Cool Appl',
    'Baby Bath Skin & Grooming': 'Baby Bath & Groom',
    'Speakers': 'Speakers',
    'Backpacks': 'Backpacks',
    'Fashion & Silver Jewellery': 'Fashion & Silver',
    'Kids\' Fashion': 'Kids\' Fash',
    'Kitchen Storage & Containers': 'Kit Storage & Cont',
    'Health & Personal Care': 'Health & Care',
    'Home Improvement': 'Home Improv',
    'School Bags': 'School Bags',
    'Wallets': 'Wallet',
    'Cycling': 'Cycling',
    'Furniture': 'Furniture',
    'Shirts': 'Shirts',
    'Car Accessories': 'Car Accs',
    'Televisions': 'TVs',
    'Kids\' Watches': 'Kids\' Watches',
    'Men\'s Fashion': 'Men\'s Fash',
    'Washing Machines': 'Washing Mach',
    'Fashion Sandals': 'Fashion Sandals',
    'Janitorial & Sanitation Supplies': 'Janitorial & San Sup',
    'Dog supplies': 'Dog Supplies',
    'Snack Foods': 'Snack Foods',
    'International Toy Store': 'Intl Toy Store',
    'Security Cameras': 'Sec Cameras',
    'Handbags & Clutches': 'Handbags & Clutch',
    'Fitness Accessories': 'Fit Accs',
    'Running': 'Running',
    'Home Furnishing': 'Home Furnish',
    'Travel Duffles': 'Travel Duff',
    'Fashion Sales & Deals': 'Fashion Sales',
    'Test Measure & Inspect': 'Test & Inspect',
    'Motorbike Accessories & Parts': 'Moto Accs & Parts',
    'Musical Instruments & Professional Audio': 'Musical & Pro Audio',
    'Home Décor': 'Home Décor',
    'Jewellery': 'Jewelry',
    'Toys Gifting Store': 'Toys Gift Store',
    'Refurbished & Open Box': 'Refurb & Open Box',
    'Home Storage': 'Home Storage',
    'Watches': 'Watches',
    'Sunglasses': 'Sunglasses',
    'Household Supplies': 'Household Sup',
    'Personal Care Appliances': 'Pers Care Appl',
    'Nursing & Feeding': 'Nursing & Feed',
    'Strollers & Prams': 'Strollers & Prams',
    'Make-up': 'Make-up',
    'Lab & Scientific': 'Lab & Sci',
    'Women\'s Fashion': 'Women\'s Fash',
    'Yoga': 'Yoga',
    'Western Wear': 'Western Wear',
    'Innerwear': 'Innerwear',
    'Sports Shoes': 'Sports Shoes',
    'Luxury Beauty': 'Lux Beauty',
    'Travel Accessories': 'Travel Accs',
    'The Designer Boutique': 'Designer Boutique',
    'Rucksacks': 'Rucksacks',
    'Jeans': 'Jeans',
    'Kitchen & Dining': 'Kit & Dining',
    'Sewing & Craft Supplies': 'Sewing & Craft',
    'Kids\' Shoes': 'Kids\' Shoes',
    'Sportswear': 'Sportswear',
    'Kitchen & Home Appliances': 'Kit & Home Appl',
    'Value Bazaar': 'Value Bazaar',
    'Kids\' Clothing': 'Kids\' Cloth',
    'Suitcases & Trolley Bags': 'Suitcases & Trolley',
    'Toys & Games': 'Toys & Games',
    'Refrigerators': 'Fridges',
    'T-shirts & Polos': 'T-shirts & Polos',
    'Shoes': 'Shoes'
}

In [15]:
df_segment['CustomerCity'] = df_segment['CustomerCity'].map(city_abb)
df_segment['CustomerState'] = df_segment['CustomerState'].map(state_dict)
df_segment['Gender'] = df_segment['Gender'].replace({'female': 'f','male': 'm'})
df_segment['Age'] = df_segment['Age'].map(age_dict)
df_segment['EmploymentStatus'] = df_segment['EmploymentStatus'].replace({'employed': 1,'unemployed': 0})
df_segment['Education'] = df_segment['Education'].map(education_abb)
df_segment['Occupation'] = df_segment['Occupation'].map(occupation_abb)
df_segment['Industry'] = df_segment['Industry'].map(industry_abb)
df_segment['MainCategoryMode'] = df_segment['MainCategoryMode'].map(category_abb)
df_segment['SubCategoryMode'] = df_segment['SubCategoryMode'].map(subcat_abb)

df_segment.head(3)

Unnamed: 0,CustomerCity,CustomerState,Gender,Age,EmploymentStatus,Education,Occupation,Industry,Income,OrderCount,...,OrderMonthMode,OrderWeekOfMonthMode,OrderSeasonMode,ItemPriceMean,MainCategoryMode,SubCategoryMode,TypeMode,PurchaseFlag,CampaignFlag,Segment
0,Los Angeles,CA,f,5-17,0,< HS Diploma,Un,Un,46148,0,...,0.0,0.0,0.0,0.0,Un,Un,Unknown,0,0,2
1,Nashville,TN,m,35-44,1,College/Associates,Management/Business/Arts,Manufacturing,42639,0,...,0.0,0.0,0.0,0.0,Un,Un,Unknown,0,0,2
2,Louisville,KY,f,5-17,0,< HS Diploma,Un,Un,42019,0,...,0.0,0.0,0.0,0.0,Un,Un,Unknown,0,0,2
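
One caveat with `Series.map` is that a value with no key in the dictionary becomes `NaN` instead of raising an error. The sketch below shows one way to list such values before mapping. `find_unmapped` is a hypothetical helper, and the toy frame and dictionary are illustrative, not the project's data.

```python
import pandas as pd

def find_unmapped(series: pd.Series, mapping: dict) -> list:
    """Return the distinct non-null values in `series` that have no key in `mapping`."""
    values = series.dropna().unique()
    return sorted(v for v in values if v not in mapping)

# Illustrative data only (not the project's dataset)
toy = pd.DataFrame({"CustomerState": ["California", "Texas", "District of Columbia"]})
toy_state_dict = {"California": "CA", "Texas": "TX"}

missing = find_unmapped(toy["CustomerState"], toy_state_dict)
print(missing)  # ['District of Columbia']

# Series.map silently turns the unmapped value into NaN
mapped = toy["CustomerState"].map(toy_state_dict)
print(mapped.isna().sum())  # 1
```

Running a check like this on each mapped column would show whether any category is dropped. In the Segment 1 CSV string printed later, a 'Washington' row has an empty state field, which would be consistent with a state name missing from `state_dict`.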


Since LLMs take text as input, we serialized the dataframes to CSV strings to pass to the model later.

In [17]:
# transform dfs to csv
df_str = df_segment.to_csv(index = False)
purchase_percentage_str = purchase_percentage.to_csv(index = False)
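
A rough size check on a serialized string can indicate how close a request is to the token limit. The sketch below assumes about 4 characters per token, which is only a rule of thumb. An exact count requires the model's tokenizer, for example through the `tiktoken` package. `estimate_tokens` and the toy frame are hypothetical and only illustrate the effect of abbreviating categories.

```python
import pandas as pd

CHARS_PER_TOKEN = 4  # assumption: rough rule of thumb, not an exact tokenizer

def estimate_tokens(text: str, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Rough token estimate from the character count (ceiling division)."""
    return -(-len(text) // chars_per_token)

# Illustrative data only: two columns with long category labels
toy = pd.DataFrame({
    "Gender": ["female", "male"],
    "Age": ["5 to 17 years", "35 to 44 years"],
})
long_csv = toy.to_csv(index=False)

# Same frame after abbreviating the categories
short = toy.replace({"female": "f", "male": "m",
                     "5 to 17 years": "5-17", "35 to 44 years": "35-44"})
short_csv = short.to_csv(index=False)

# The abbreviated CSV yields a smaller estimate than the original
print(estimate_tokens(long_csv), estimate_tokens(short_csv))
```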

Although an LLM could in principle analyze the whole dataset at once, passing the full dataframe in a single prompt exceeds the model's token limit. We therefore split the dataframe by segment and store the rows of each of the three segments separately.

In [26]:
segments = df_segment['Segment'].unique()

# hold dfs for each segment
segmented_dfs = {}

# create df for each segment
for segment in segments:
    segmented_dfs[segment] = df_segment[df_segment['Segment'] == segment]

In [38]:
# expose each segment as df<N> and its CSV string as df<N>_str (e.g. df1, df1_str)
# iterating over the dict keys avoids assuming the labels are exactly 1..n
for segment, segment_df in segmented_dfs.items():
    globals()[f'df{segment}'] = segment_df
    globals()[f'df{segment}_str'] = segment_df.to_csv(index=False)

In [39]:
df1_str

"CustomerCity,CustomerState,Gender,Age,EmploymentStatus,Education,Occupation,Industry,Income,OrderCount,OrderItemMean,TotalAmountMax,TotalAmountMin,OrderMonthMode,OrderWeekOfMonthMode,OrderSeasonMode,ItemPriceMean,MainCategoryMode,SubCategoryMode,TypeMode,PurchaseFlag,CampaignFlag,Segment\nSan Diego,CA,m,65-74,1,< HS Diploma,Sales/Office,Wholesale,60797,1,1.0,95.96,95.96,1.0,2.0,1.0,23.99,Stores,Amazon Fash,Unknown,1,0,1\nSan Diego,CA,m,75+,1,College/Associates,Sales/Office,Prof/Sci/Mgmt,60797,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Un,Un,Unknown,0,0,1\nSeattle,WA,m,25-34,1,Graduate/Professional,Sales/Office,Edu/Health/Social,61037,14,2.0,5445.58,5445.58,2.0,1.0,1.0,1361.395,Sports & Fit,Cam Accs,podcast,1,0,1\nWashington,,m,35-44,1,Bachelors,Service,Public Admin,63124,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Un,Un,Unknown,0,0,1\nSan Diego,CA,f,65-74,1,Bachelors,Management/Business/Arts,Wholesale,60797,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Un,Un,podcast,0,0,1\nSan Jose,CA,m,35-44,0,Bachelors,Un,Un,76593,0,0.0,0.0,

In [40]:
prompt_df1 = f"""
Analyze the following customer data for Segment 1. Categorize customers into distinct audience segments based on behavior, preferences, and demographics.
For this segment, provide a brief description of key characteristics and suggest targeted marketing strategies to enhance engagement and sales.

Data:
{df1_str}
"""

prompt_df3 = f"""
Analyze the following customer data for Segment 3. Categorize customers into distinct audience segments based on behavior, preferences, and demographics.
For this segment, provide a brief description of key characteristics and suggest targeted marketing strategies to enhance engagement and sales.

Data:
{df3_str}
"""

In [49]:
response_df1 = openai.ChatCompletion.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": prompt_df1}
    ],
    max_tokens = 750,
    temperature = 0.7
)

In [50]:
print("Analysis for Segment 1:")
print(response_df1.choices[0].message['content'].strip())

Analysis for Segment 1:
### Customer Segmentation for Segment 1

Based on the provided customer data for Segment 1, we can categorize the customers into distinct audience segments based on their behavior, preferences, and demographics.

#### 1. **High-Value Shoppers**
- **Demographics**: 
  - Age: 25-34
  - Gender: Male and Female
  - Education: Graduate/Professional or Bachelor's
  - Occupation: Management/Business/Arts, Sales/Office
  - Income: Ranges around $60,000 - $70,000
- **Behavior**:
  - High total amount spent on orders (some over $5000).
  - Frequent shoppers with multiple orders (14+).
  - Engage with content such as podcasts and influencer marketing.
- **Targeted Marketing Strategies**:
  - **Loyalty Programs**: Implement a rewards system that offers discounts or exclusive access based on order count or total spending.
  - **Personalized Recommendations**: Use data analytics to provide tailored product suggestions based on previous purchases.
  - **Engagement Through Cont
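
Each request sets `max_tokens = 750`, which caps the length of the reply, so a long analysis can stop mid-sentence. The sketch below shows one way to detect that through the `finish_reason` field of the first choice. It assumes the response supports dictionary-style access. `was_truncated` and the stand-in response are hypothetical, written by hand for illustration.

```python
def was_truncated(response) -> bool:
    """True if the first choice stopped because it reached the max_tokens cap."""
    return response["choices"][0]["finish_reason"] == "length"

# Hand-written stand-in for an API response (illustrative, not real output)
fake_response = {
    "choices": [
        {
            "message": {"role": "assistant", "content": "### Customer Segm"},
            "finish_reason": "length",
        }
    ]
}

print(was_truncated(fake_response))  # True
```

When the check returns `True`, raising `max_tokens` or asking for a shorter answer in the prompt are two possible remedies.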

In [52]:
response_df3 = openai.ChatCompletion.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": prompt_df3}
    ],
    max_tokens = 750,
    temperature = 0.7
)

print("Analysis for Segment 3:")
print(response_df3.choices[0].message['content'].strip())

Analysis for Segment 3:
To analyze the customer data for Segment 3, we will categorize customers into distinct audience segments based on their behavior, preferences, and demographics. The analysis will focus on identifying key characteristics and suggesting targeted marketing strategies for each identified segment.

### Audience Segmentation

1. **Young Adults (18-34 years)**
   - **Characteristics**: 
     - Age: 18-34
     - Employment Status: Majority are employed.
     - Education: Predominantly College/Associates.
     - Gender: Both genders are represented.
     - Behavior: Engagement through social media and podcasts; some have made purchases but many are inactive.
   - **Marketing Strategies**:
     - Utilize social media campaigns targeting platforms popular among younger demographics (Instagram, TikTok).
     - Offer promotions on trending products or experiences (e.g., tech gadgets, fashion).
     - Engage with influencers to promote products through authentic storytelling.

Because the second segment contains more customers than the others, it does not fit in a single request without splitting. We therefore split df2 randomly into two halves and read the two outputs separately.

In [54]:
df2_v1 = df2.sample(frac = 0.5,
                    random_state = 42)
df2_v2 = df2.drop(df2_v1.index)
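
The 50/50 split works when two halves are small enough. The sketch below generalizes this by shuffling the rows and increasing the number of parts until every part's CSV string fits a character budget. `split_by_char_budget`, the budget value, and the toy frame are all hypothetical. Adding one part at a time is simple but slow for very large frames.

```python
import numpy as np
import pandas as pd

def split_by_char_budget(df: pd.DataFrame, max_chars: int, seed: int = 42) -> list:
    """Shuffle rows, then split into the fewest near-equal parts whose CSV fits max_chars."""
    shuffled = df.sample(frac=1, random_state=seed)
    n_parts = 1
    while True:
        # split row positions, then select rows positionally
        index_groups = np.array_split(np.arange(len(shuffled)), n_parts)
        parts = [shuffled.iloc[idx] for idx in index_groups]
        if all(len(p.to_csv(index=False)) <= max_chars for p in parts):
            return parts
        if n_parts >= len(shuffled):
            raise ValueError("a single row does not fit within max_chars")
        n_parts += 1

# Illustrative data only: header is 15 characters, each row is 8
toy = pd.DataFrame({"Segment": [2] * 6, "Income": [40000 + i for i in range(6)]})
parts = split_by_char_budget(toy, max_chars=40)
print([len(p) for p in parts])  # [3, 3]
```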

In [56]:
# serialize each half to CSV; interpolating the DataFrame directly would insert
# pandas' truncated repr (with "..." for hidden rows and columns), not the full data
prompt_df2_v1 = f"""
Analyze the following customer data for Segment 2. Categorize customers into distinct audience segments based on behavior, preferences, and demographics.
For this segment, provide a brief description of key characteristics and suggest targeted marketing strategies to enhance engagement and sales.

Data:
{df2_v1.to_csv(index=False)}
"""

prompt_df2_v2 = f"""
Analyze the following customer data for Segment 2. Categorize customers into distinct audience segments based on behavior, preferences, and demographics.
For this segment, provide a brief description of key characteristics and suggest targeted marketing strategies to enhance engagement and sales.

Data:
{df2_v2.to_csv(index=False)}
"""

In [57]:
response_df2 = openai.ChatCompletion.create(
    model = "gpt-4o-mini",
    messages=[
        {"role": "user", "content": prompt_df2_v1}
    ],
    max_tokens = 750,
    temperature = 0.7
)

print("Analysis for Segment 2 Version 1:")
print(response_df2.choices[0].message['content'].strip())


Analysis for Segment 2 Version 1:
Based on the provided customer data for Segment 2, we can analyze various characteristics such as demographics, behavior, and preferences to categorize customers into distinct audience segments. Below is a breakdown of key characteristics and suggested targeted marketing strategies for enhancing engagement and sales.

### Audience Segmentation

1. **Young Adolescents (Ages 5-17)**
   - **Demographics**: Primarily female, located in cities like San Antonio.
   - **Employment Status**: Not employed (0).
   - **Education**: Below high school diploma.
   - **Behavior**: Very low order count, indicating minimal engagement.
   - **Marketing Strategy**: 
     - Leverage social media platforms popular among younger audiences (e.g., TikTok, Instagram).
     - Collaborate with influencers who appeal to this age group.
     - Create engaging, interactive content (games, contests) to draw in both this demographic and their parents.

2. **Young Professionals (Ages 

In [59]:
response_df2 = openai.ChatCompletion.create(
    model = "gpt-4o-mini",
    messages=[
        {"role": "user", "content": prompt_df2_v2}
    ],
    max_tokens = 750,
    temperature = 0.7
)

print("Analysis for Segment 2:")
print(response_df2.choices[0].message['content'].strip())


Analysis for Segment 2:
Based on the provided customer data for Segment 2, we can categorize customers into distinct audience segments based on their behavior, preferences, and demographics. Here’s an analysis and segmentation, along with key characteristics and targeted marketing strategies:

### Audience Segmentation

1. **Young Adults (18-24)**
   - **Characteristics:**
     - Gender: Predominantly female
     - Employment Status: Mostly unemployed
     - Education: Varied, but many are still in high school or have < HS Diploma
     - Income: Lower income levels (average around $40,000)
     - Order Count: Most have low or no orders
   - **Targeted Marketing Strategies:**
     - Utilize social media platforms like Instagram and TikTok for engagement.
     - Offer promotions and discounts on popular items (like clothing and tech gadgets).
     - Create engaging content that relates to their lifestyle (e.g., back-to-school campaigns).

2. **Middle-Aged Professionals (35-54)**
   - **C
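
The prompts used above differ only in the segment number and the data they carry, so they could come from a single template. A minimal sketch, where `PROMPT_TEMPLATE` and `build_segment_prompt` are hypothetical names and the wording is copied from the prompts in this notebook:

```python
PROMPT_TEMPLATE = """
Analyze the following customer data for Segment {segment_id}. Categorize customers into distinct audience segments based on behavior, preferences, and demographics.
For this segment, provide a brief description of key characteristics and suggest targeted marketing strategies to enhance engagement and sales.

Data:
{data}
"""

def build_segment_prompt(segment_id: int, data: str) -> str:
    """Fill the shared template with a segment number and its CSV string."""
    return PROMPT_TEMPLATE.format(segment_id=segment_id, data=data)

# Illustrative CSV string (not the project's data)
example_prompt = build_segment_prompt(3, "Gender,Age\nf,5-17\n")
print(example_prompt)
```

With a helper like this, each request would pass `build_segment_prompt(1, df1_str)` and so on, which keeps the instructions identical across segments.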

### The effectiveness of using LLMs for this purpose

**Strengths:**

*   LLMs can process the rows that fit in a request quickly, which speeds up the creation of audience segments and recommendations.
*   They handle mixed data types and can interpret the data as long as column names and values are consistent.
*   Beyond analysis, they can also produce recommendations.

**Limitations:**

*   They can be computationally expensive for large datasets.
*   The token limit restricts how much data we can pass in a single request. Compressing a dataframe to fit may reduce the granularity of the analysis.
*   LLMs function as black boxes, which makes it hard to explain how a given analysis or suggestion was produced.
*   Model choice matters: we tried Gemini earlier in this project, but its analysis output was not sufficient. It is important to select the right engine for each task.


**Conclusion:** LLMs offer quick and efficient audience segmentation, adapt to mixed data types, and provide actionable recommendations. However, they can be computationally expensive, limited by token constraints, and lack transparency in how their suggestions are generated.