### **Problem 5: Customer Purchase Behavior and Retargeting**

The marketing team wants to segment customers by behavior for retargeting:

* Extract first and last names from `Customer Name`.
* Create a full address breakdown including house number and city.
* Categorize customers based on total spent: Low (<50), Medium (50–100), High (100–200), Very High (>200).
* Flag customers who made multiple purchases (same name and email).
* Detect outliers: customers whose total price is more than 3 standard deviations from the mean.
* Randomly sample 20% of the dataset to simulate an A/B test group.

*Hint: For segmentation, use bins with manual labels; for sampling, draw from entire DataFrame.*


In [117]:
import pandas as pd
import numpy as np
import re

In [118]:
data = pd.read_csv('fila_heat_filament_sales_april2025.csv')

In [119]:
# read_csv already returns a DataFrame; work on a copy so `data` keeps the raw rows
df = data.copy()

In [120]:
df.head(3)

Unnamed: 0,Date Purchased,Receipt Number,Customer Name,Customer Address,Phone Number,Email,Store Location,Product Name,Product Code,Bar Code,Material Name,Color,Weight,Supplier,Lot Number,Price,Quantity,Tax,Total Price
0,2025-04-01,1ff49b78-8946-4e85-b59c-de66bacfb3d0,Danielle Johnson,"3321 Brittany Bypass, North Jefferyhaven, 79408",8386379402,danielle.johnson@hotmail.com,"5423 Garcia Light, West Melanieview, 06196",Standard PLA Filament,PLA-792,6184960000000.0,PLA,Blue,500,3DFilaments,L5012,26.69,1,1.87,28.56
1,2025-04-01,434308bc-89fa-4a68-8fb5-d27bbeb79919,Tracie Wyatt,"64752 Kelly Skyway, Jacquelineland, 80341",+1-283-276-4835x0305,tracie.wyatt@yahoo.com,"1395 Diana Locks, Thomasberg, 32826",Flexible TPU Filament,TPU-338,9696530000000.0,TPU,Purple,500,ProtoPolymers,L1520,20.88,2,2.92,44.68
2,2025-04-01,52fbe43b-9954-4eb4-8025-7ad1eb2263dd,Eric Moore,"691 James Mountain, Tashatown, 89667",001-184-514-6270x4828,eric.moore@gmail.com,"489 Eric Track, New Stephanie, 70015",Flexible TPU Filament,TPU-325,7015430000000.0,TPU,Purple,1000,PrintPro,L4257,41.47,4,11.61,177.49


In [121]:
# 1. Extract first and last names from `Customer Name`.
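# Matches exactly two alphabetic tokens; names with hyphens, apostrophes, titles, or middle names come back as NaN
names_pattern = r'^([A-Za-z]+)\s+([A-Za-z]+)$'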

In [122]:
df[['First Name', 'Last Name']] = df['Customer Name'].str.extract(names_pattern)

In [123]:
df[['Customer Name', 'First Name', 'Last Name']].head(3)

Unnamed: 0,Customer Name,First Name,Last Name
0,Danielle Johnson,Danielle,Johnson
1,Tracie Wyatt,Tracie,Wyatt
2,Eric Moore,Eric,Moore
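
The pattern above only accepts names made of exactly two alphabetic tokens, so any other shape yields NaN in both columns. The following is a minimal sketch, on made-up sample names rather than the sales file, of how one might count the unmatched rows and fall back to "first token / last token" splitting. The `fallback` and `parsed` names are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Hypothetical sample names, not taken from the sales file.
names = pd.Series(["Danielle Johnson", "Mary-Ann O'Neil", "Eric James Moore", "Cher"])

strict_pattern = r'^([A-Za-z]+)\s+([A-Za-z]+)$'
strict = names.str.extract(strict_pattern)
strict.columns = ['First Name', 'Last Name']

# Rows the strict pattern could not parse come back as NaN.
unmatched = strict['First Name'].isna()
print(f"{unmatched.sum()} of {len(names)} names did not match the strict pattern")

# Fallback: first whitespace-separated token is the first name,
# last token is the last name (left as NaN for single-token names).
tokens = names.str.strip().str.split()
fallback = pd.DataFrame({
    'First Name': tokens.str[0],
    'Last Name': tokens.str[-1].where(tokens.str.len() > 1),
})

# Keep strict matches and fill only the gaps from the fallback.
parsed = strict.fillna(fallback)
print(parsed)
```

Whether a middle name should be dropped, as this fallback does, depends on how the marketing team wants to address customers, so treat the rule as one possible choice.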


In [124]:
# 2. Create a full address breakdown including house number and city.
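# Captures house number, street, city, and 5-digit ZIP; street and city may contain only letters and spaces
customer_address_pattern = r'^(\d+)\s+([A-Za-z\s]+),\s+([A-Za-z\s]+),\s+(\d{5})$'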

In [125]:
df[['House No.', 'Street', 'City', 'Zip Code']] = df['Customer Address'].str.extract(customer_address_pattern)

In [126]:
df[['Customer Address', 'House No.', 'Street', 'City', 'Zip Code']].head(3)

Unnamed: 0,Customer Address,House No.,Street,City,Zip Code
0,"3321 Brittany Bypass, North Jefferyhaven, 79408",3321,Brittany Bypass,North Jefferyhaven,79408
1,"64752 Kelly Skyway, Jacquelineland, 80341",64752,Kelly Skyway,Jacquelineland,80341
2,"691 James Mountain, Tashatown, 89667",691,James Mountain,Tashatown,89667
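
Addresses with unit suffixes or punctuation in the street would not match a letters-and-spaces pattern. As a sketch under that assumption, the variant below uses named groups (which become column names) and "anything up to the next comma" for street and city, then lists the rows that still fail. The sample addresses and the `address_pattern`, `parts`, and `unparsed` names are hypothetical.

```python
import pandas as pd

# Hypothetical sample addresses; the second has a unit suffix, the third no house number.
addresses = pd.Series([
    "3321 Brittany Bypass, North Jefferyhaven, 79408",
    "12 Oak St. Apt. 4, Port Amy, 30270",
    "PO Box 77, Lake Town, 11111",
])

# Named groups become the column names of the extracted DataFrame.
address_pattern = (
    r'^(?P<house_no>\d+)\s+'   # leading house number
    r'(?P<street>[^,]+),\s*'   # everything up to the first comma
    r'(?P<city>[^,]+),\s*'     # everything up to the second comma
    r'(?P<zip_code>\d{5})$'    # five-digit ZIP code, kept as text
)

parts = addresses.str.extract(address_pattern)

# Rows that did not match at all, for manual review.
unparsed = addresses[parts['house_no'].isna()]
print(parts)
print(unparsed)
```

The ZIP code stays a string here, which preserves leading zeros such as the `06196` seen in the store locations.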


In [127]:
# 3. Categorize customers based on total spent: Low (<50), Medium (50–100), High (100–200), Very High (>200).
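# np.inf keeps the last bin open-ended, so the edges stay increasing even if no total exceeds 200
customer_bins = [0, 50, 100, 200, np.inf]

In [128]:
# right=False makes the bins left-closed: [0, 50) low, [50, 100) medium, [100, 200) high, [200, inf) very high
df['Customer Categories'] = pd.cut(df['Total Price'], bins=customer_bins, labels=['low', 'medium', 'high', 'very high'], right=False)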

In [129]:
df[['Customer Name', 'Total Price', 'Customer Categories']].head(3)

Unnamed: 0,Customer Name,Total Price,Customer Categories
0,Danielle Johnson,28.56,low
1,Tracie Wyatt,44.68,low
2,Eric Moore,177.49,high
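
Where a total sits exactly on a bin edge (50, 100, 200), the category depends on which side of each interval `pd.cut` closes. The sketch below, using made-up totals, compares the default `right=True` with `right=False` so the boundary behavior can be checked against the Low (<50) requirement.

```python
import numpy as np
import pandas as pd

# Hypothetical totals chosen to sit on or near the bin edges.
totals = pd.Series([0.0, 49.99, 50.0, 100.0, 200.0, 250.0])
bins = [0, 50, 100, 200, np.inf]
labels = ['low', 'medium', 'high', 'very high']

# Default: right-closed intervals (0, 50], (50, 100], (100, 200], (200, inf]
right_closed = pd.cut(totals, bins=bins, labels=labels)

# Left-closed intervals [0, 50), [50, 100), [100, 200), [200, inf)
left_closed = pd.cut(totals, bins=bins, labels=labels, right=False)

comparison = pd.DataFrame({
    'total': totals,
    'right=True': right_closed,
    'right=False': left_closed,
})
print(comparison)
```

With the default, a total of exactly 50 is labelled `low` and a total of 0 falls outside every bin (NaN); with `right=False`, 50 becomes `medium` and 0 becomes `low`.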


In [130]:
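# Label missing totals explicitly: add a 'missing' category, fill the NaNs, and assign the result back
df['Customer Categories'] = df['Customer Categories'].cat.add_categories('missing').fillna('missing')
df['Customer Categories']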

0         low
1         low
2        high
3      medium
4      medium
        ...  
355      high
356      high
357      high
358      high
359      high
Name: Customer Categories, Length: 360, dtype: category
Categories (5, object): ['low' < 'medium' < 'high' < 'very high' < 'missing']

In [131]:
# 4. Flag customers who made multiple purchases (same name and email).
multiple_purchaser = (
    df.groupby(['Customer Name', 'Email'])['Product Code']
    .size()
    .reset_index(name='Total Purchases')
)

In [132]:
multiple_purchaser.head(3)

Unnamed: 0,Customer Name,Email,Total Purchases
0,Aaron Johnson,aaron.johnson@yahoo.com,1
1,Adam Barry,adam.barry@hotmail.com,1
2,Adam Cortez,adam.cortez@hotmail.com,1


In [133]:
# vectorized flag: 'FLAGGED' for customers with more than one purchase, empty string otherwise
multiple_purchaser['Flag (more than 1 purchase)'] = np.where(multiple_purchaser['Total Purchases'] > 1, 'FLAGGED', '')

In [134]:
multiple_purchaser.head(3)

Unnamed: 0,Customer Name,Email,Total Purchases,Flag (more than 1 purchase)
0,Aaron Johnson,aaron.johnson@yahoo.com,1,
1,Adam Barry,adam.barry@hotmail.com,1,
2,Adam Cortez,adam.cortez@hotmail.com,1,


In [135]:
# merge back to original df
df = df.merge(
    multiple_purchaser[['Customer Name', 'Email', 'Flag (more than 1 purchase)']],
    on=['Customer Name', 'Email'],
    how='left' # keeps original rows
)

In [136]:
df.head(3)

Unnamed: 0,Date Purchased,Receipt Number,Customer Name,Customer Address,Phone Number,Email,Store Location,Product Name,Product Code,Bar Code,...,Tax,Total Price,First Name,Last Name,House No.,Street,City,Zip Code,Customer Categories,Flag (more than 1 purchase)
0,2025-04-01,1ff49b78-8946-4e85-b59c-de66bacfb3d0,Danielle Johnson,"3321 Brittany Bypass, North Jefferyhaven, 79408",8386379402,danielle.johnson@hotmail.com,"5423 Garcia Light, West Melanieview, 06196",Standard PLA Filament,PLA-792,6184960000000.0,...,1.87,28.56,Danielle,Johnson,3321,Brittany Bypass,North Jefferyhaven,79408,low,
1,2025-04-01,434308bc-89fa-4a68-8fb5-d27bbeb79919,Tracie Wyatt,"64752 Kelly Skyway, Jacquelineland, 80341",+1-283-276-4835x0305,tracie.wyatt@yahoo.com,"1395 Diana Locks, Thomasberg, 32826",Flexible TPU Filament,TPU-338,9696530000000.0,...,2.92,44.68,Tracie,Wyatt,64752,Kelly Skyway,Jacquelineland,80341,low,
2,2025-04-01,52fbe43b-9954-4eb4-8025-7ad1eb2263dd,Eric Moore,"691 James Mountain, Tashatown, 89667",001-184-514-6270x4828,eric.moore@gmail.com,"489 Eric Track, New Stephanie, 70015",Flexible TPU Filament,TPU-325,7015430000000.0,...,11.61,177.49,Eric,Moore,691,James Mountain,Tashatown,89667,high,
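
The group-then-merge steps above can also be written without a merge. This is a sketch of one alternative on a small made-up frame: `transform('size')` broadcasts each group's purchase count back onto the rows, and `duplicated(keep=False)` marks every row whose name and email pair occurs more than once. The `orders` frame and the `Repeat Customer` column name are assumptions for illustration.

```python
import pandas as pd

# Hypothetical orders; the last row shares a name with Ann but uses a different email.
orders = pd.DataFrame({
    'Customer Name': ['Ann Lee', 'Bob Ray', 'Ann Lee', 'Ann Lee'],
    'Email': ['ann@example.com', 'bob@example.com', 'ann@example.com', 'ann.lee@example.com'],
})

keys = ['Customer Name', 'Email']

# Purchase count per (name, email) pair, aligned to the original rows.
orders['Total Purchases'] = orders.groupby(keys)['Email'].transform('size')

# True for every row whose (name, email) pair appears more than once.
orders['Repeat Customer'] = orders.duplicated(subset=keys, keep=False)

print(orders)
```

A boolean column is easier to filter on than a `'FLAGGED'` string, though the string form may suit a report handed to the marketing team.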


In [137]:
# 5. Detect outliers: customers whose total price is > 3 std deviations from the mean.
mean_price = df['Total Price'].mean()

In [138]:
std_price = df['Total Price'].std()

In [139]:
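# maximum allowed distance from the mean, in either direction
threshold = 3 * std_price

In [140]:
# two-sided check: flag totals more than 3 standard deviations above or below the mean
df['Outlier (3 * std Rule)'] = np.where((df['Total Price'] - mean_price).abs() > threshold, 'OUTLIER', '')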

In [141]:
df[['Total Price', 'Outlier (3 * std Rule)']].head(3)

Unnamed: 0,Total Price,Outlier (3 * std Rule)
0,28.56,
1,44.68,
2,177.49,
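
The 3-standard-deviation rule can equivalently be expressed as a z-score check. The sketch below uses made-up totals (29 ordinary purchases and one very large one) to show the two-sided version; `z_scores` and `is_outlier` are illustrative names.

```python
import pandas as pd

# Hypothetical totals: 29 ordinary purchases and one very large one.
totals = pd.Series([50.0] * 29 + [5000.0])

# z-score: distance from the mean in units of the sample standard deviation (ddof=1).
z_scores = (totals - totals.mean()) / totals.std()

# Two-sided rule: flag values more than 3 standard deviations from the mean.
is_outlier = z_scores.abs() > 3

print(totals[is_outlier])
```

Because the extreme value inflates both the mean and the standard deviation, the largest z-score a sample of size n can produce is (n - 1) / sqrt(n), so this rule cannot flag anything in samples of fewer than 11 rows.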


In [142]:
# 6. Randomly sample 20% of the dataset to simulate an A/B test group.
ab_test_group = df.sample(frac=0.2, random_state=42)

In [143]:
ab_test_group.head(3)

Unnamed: 0,Date Purchased,Receipt Number,Customer Name,Customer Address,Phone Number,Email,Store Location,Product Name,Product Code,Bar Code,...,Total Price,First Name,Last Name,House No.,Street,City,Zip Code,Customer Categories,Flag (more than 1 purchase),Outlier (3 * std Rule)
224,2025-04-19,d03c9c2c-bc37-49b1-b48b-9a8c7eb3aefc,Andrea Wilson,"95535 Hull Freeway, North Brianna, 30270",(890)030-1027x24055,andrea.wilson@hotmail.com,"342 Mendoza Crossing, North Johnside, 27483",Pro ABS Filament,ABS-828,5374580000000.0,...,39.19,Andrea,Wilson,95535,Hull Freeway,North Brianna,30270,low,,
42,2025-04-04,b787ef8d-3495-411e-ae27-d4321dc1e7eb,Gina Leblanc,"414 Mendez Forges, South Eric, 98508",+1-568-783-3918x785,gina.leblanc@gmail.com,"108 Ashley Drive, West Melissa, 58413",Pro ABS Filament,ABS-787,9718710000000.0,...,59.83,Gina,Leblanc,414,Mendez Forges,South Eric,98508,medium,,
285,2025-04-24,28876b7f-6469-4f53-bf50-85e7058cdd44,Kathryn Edwards,"339 Brad Knoll, Marshallmouth, 43016",(777)226-6025x6265,kathryn.edwards@gmail.com,"2292 Cody Lock, Hernandezmouth, 26421",Pro ABS Filament,ABS-838,4843380000000.0,...,59.83,Kathryn,Edwards,339,Brad Knoll,Marshallmouth,43016,medium,,


In [144]:
# rows drawn into the sample form the Test group; all remaining rows are Control
df['AB Group'] = np.where(df.index.isin(ab_test_group.index), 'Test', 'Control')

In [145]:
df.head(3)

Unnamed: 0,Date Purchased,Receipt Number,Customer Name,Customer Address,Phone Number,Email,Store Location,Product Name,Product Code,Bar Code,...,First Name,Last Name,House No.,Street,City,Zip Code,Customer Categories,Flag (more than 1 purchase),Outlier (3 * std Rule),AB Group
0,2025-04-01,1ff49b78-8946-4e85-b59c-de66bacfb3d0,Danielle Johnson,"3321 Brittany Bypass, North Jefferyhaven, 79408",8386379402,danielle.johnson@hotmail.com,"5423 Garcia Light, West Melanieview, 06196",Standard PLA Filament,PLA-792,6184960000000.0,...,Danielle,Johnson,3321,Brittany Bypass,North Jefferyhaven,79408,low,,,Control
1,2025-04-01,434308bc-89fa-4a68-8fb5-d27bbeb79919,Tracie Wyatt,"64752 Kelly Skyway, Jacquelineland, 80341",+1-283-276-4835x0305,tracie.wyatt@yahoo.com,"1395 Diana Locks, Thomasberg, 32826",Flexible TPU Filament,TPU-338,9696530000000.0,...,Tracie,Wyatt,64752,Kelly Skyway,Jacquelineland,80341,low,,,Control
2,2025-04-01,52fbe43b-9954-4eb4-8025-7ad1eb2263dd,Eric Moore,"691 James Mountain, Tashatown, 89667",001-184-514-6270x4828,eric.moore@gmail.com,"489 Eric Track, New Stephanie, 70015",Flexible TPU Filament,TPU-325,7015430000000.0,...,Eric,Moore,691,James Mountain,Tashatown,89667,high,,,Control
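
Sampling rows, as above, matches the hint, but a customer with several purchases can then end up in both groups. If the experiment should be assigned per customer, one option is to sample unique customers and map the assignment back to the rows. This is a sketch on a made-up frame, assuming `Email` identifies a customer.

```python
import numpy as np
import pandas as pd

# Hypothetical orders: customers a@ and b@ each appear twice.
orders = pd.DataFrame({
    'Email': ['a@example.com', 'b@example.com', 'a@example.com', 'c@example.com',
              'd@example.com', 'e@example.com', 'b@example.com'],
    'Total Price': [28.56, 44.68, 177.49, 59.83, 39.19, 120.0, 15.0],
})

# Sample 20% of the unique customers rather than 20% of the rows.
customers = orders['Email'].drop_duplicates()
test_customers = customers.sample(frac=0.2, random_state=42)

# Every purchase by a sampled customer goes to Test; the rest go to Control.
orders['AB Group'] = np.where(orders['Email'].isin(test_customers), 'Test', 'Control')

print(orders)
```

With this approach the Test group holds 20% of customers, not necessarily 20% of rows, so group sizes by row count can differ from the row-level sample.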
