It looks like the data was in pretty good shape after your last file, 'data_cleaning.ipynb'. Now move on to the questions the marketing team has for you in your interview, grouped into three areas: Customer Segments, Product Information, and Marketing Return on Investment.

Customer Segment Questions:
1. Segment customers based on demographic data (age, gender, location)
2. Which demographics represent High Frequency Customers? 
3. Which demographics represent Low Frequency Customers?
4. Are High Frequency (HF) or Low Frequency (LF) customers more likely to use a discount code?
5. What is the Average Order Value (AOV) for High Frequency Customers, and for Low Frequency Customers?
6. What is the Customer Lifetime Value based on 2022 purchasing data?

Product Information Questions:
1. Total Sales by Product Type for 2022 
2. Which categories have the highest conversion rate? Which products have the highest conversion rate?
3. Top seasonality by total sales 
4. Total sales amount by month and marketing campaign (Which marketing campaigns had the highest total sales?)

Marketing Return on Investment:
1. Compare CPAs for TikTok vs. Email
2. Which marketing campaigns had the highest conversions?
3. Which demographic had the highest conversion rate?

You start by exploring their customer base through customer demographics; age and gender, grouped by location, are a good place to begin.

In [None]:
--Customer Segment #1:
SELECT
    Location,
    CASE
        WHEN Age < 30 AND Gender = 'Female' THEN 'Young Female'
        WHEN Age < 30 AND Gender = 'Male' THEN 'Young Male'
        WHEN Age >= 30 AND Gender = 'Female' THEN 'Adult Female'
        WHEN Age >= 30 AND Gender = 'Male' THEN 'Adult Male'
        ELSE 'Other'
    END AS Customer_Segment,
    COUNT(*) AS Segment_Count
FROM shopping_trends
GROUP BY Location, Customer_Segment
ORDER BY Segment_Count DESC;

Great, so far it looks like our top customer segment for 2022 was Adult Males.
Using data from mock_data, let's see if we can split our customer segments into High Frequency and Low Frequency groups.

In [None]:
--Customer Segment #2:
SELECT
    Location,
    CASE
        WHEN Age < 30 AND Gender = 'Female' THEN 'Young Female'
        WHEN Age < 30 AND Gender = 'Male' THEN 'Young Male'
        WHEN Age >= 30 AND Gender = 'Female' THEN 'Adult Female'
        WHEN Age >= 30 AND Gender = 'Male' THEN 'Adult Male'
        ELSE 'Other'
    END AS Customer_Segment,
    COUNT(*) AS Segment_Count
FROM shopping_trends
WHERE Frequency_of_Purchases IN ('Weekly', 'Biweekly', 'Monthly')
GROUP BY Location, Customer_Segment
ORDER BY Segment_Count DESC;

In [None]:
--Customer Segment #3:
SELECT
    Location,
    CASE
        WHEN Age < 30 AND Gender = 'Female' THEN 'Young Female'
        WHEN Age < 30 AND Gender = 'Male' THEN 'Young Male'
        WHEN Age >= 30 AND Gender = 'Female' THEN 'Adult Female'
        WHEN Age >= 30 AND Gender = 'Male' THEN 'Adult Male'
        ELSE 'Other'
    END AS Customer_Segment,
    COUNT(*) AS Segment_Count
FROM shopping_trends
WHERE Frequency_of_Purchases IN ('Quarterly', 'Annually')
GROUP BY Location, Customer_Segment
ORDER BY Segment_Count DESC;

In [None]:
--Customer Segment #4:
--Are High Frequency Customers or Low Frequency Customers more likely to use a discount code?
--High Frequency: Frequency_of_Purchases is weekly, biweekly, or monthly; Low Frequency: quarterly or annually.
SELECT
    CASE WHEN LOWER(Frequency_of_Purchases) IN ('weekly', 'biweekly', 'monthly') THEN 'High Frequency' ELSE 'Low Frequency' END AS Frequency_Group,
    AVG(CASE WHEN Discount_Applied = 'Yes' THEN 1.0 ELSE 0.0 END) AS Discount_Rate  -- assumes Discount_Applied holds 'Yes'/'No'
FROM merged_data
WHERE LOWER(Frequency_of_Purchases) IN ('weekly', 'biweekly', 'monthly', 'quarterly', 'annually')
GROUP BY Frequency_Group;

In [None]:
-- Customer Segment #5: Average Order Value by purchase frequency (High Frequency = Weekly, Biweekly, Monthly)
SELECT Frequency_of_Purchases, AVG(Purchase_Amount_USD) AS Avg_Order_Value
FROM merged_data
GROUP BY Frequency_of_Purchases
ORDER BY Avg_Order_Value DESC;
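
Question 5 asks for one AOV per frequency group rather than one per frequency value. The sketch below shows one way to collapse the values into the two groups. It runs the SQL through Python's built-in sqlite3 module against a made-up, four-row stand-in for merged_data, so the numbers are purely illustrative; the Frequency_Group and Avg_Order_Value aliases are names chosen for this example.

```python
import sqlite3

# In-memory toy stand-in for merged_data; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE merged_data (Frequency_of_Purchases TEXT, Purchase_Amount_USD REAL)"
)
conn.executemany(
    "INSERT INTO merged_data VALUES (?, ?)",
    [
        ("Weekly", 40.0),
        ("Monthly", 60.0),
        ("Quarterly", 80.0),
        ("Annually", 100.0),
    ],
)

# LOWER() makes the match independent of how the cleaned data is capitalized.
AOV_BY_GROUP = """
SELECT
    CASE
        WHEN LOWER(Frequency_of_Purchases) IN ('weekly', 'biweekly', 'monthly')
            THEN 'High Frequency'
        ELSE 'Low Frequency'
    END AS Frequency_Group,
    AVG(Purchase_Amount_USD) AS Avg_Order_Value
FROM merged_data
WHERE LOWER(Frequency_of_Purchases)
      IN ('weekly', 'biweekly', 'monthly', 'quarterly', 'annually')
GROUP BY Frequency_Group
ORDER BY Frequency_Group;
"""

# Each result row is (Frequency_Group, Avg_Order_Value), so dict() maps group -> AOV.
aov_by_group = dict(conn.execute(AOV_BY_GROUP).fetchall())
print(aov_by_group)
```

The WHERE clause keeps only the five frequency values named in this notebook, so any other value in the real data is left out instead of silently counted as Low Frequency.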

In [None]:
--Customer Segment #6: What is the Customer Lifetime Value based on 2022 purchasing data? (See SQL DB LITE)
-- To calculate CLV, you first need to calculate Customer Value:
--   1. Calculate Average Order Value (from Customer Segment #5)
--   2. Calculate Average Number of Purchases
--   3. Customer Value = Average Order Value * Average Number of Purchases
-- Then calculate Customer Lifetime Value (CLV):
--   CLV = Customer Value * Avg Customer Lifespan
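
The CLV steps above can be sketched end to end. This is a minimal illustration, not the notebook's actual calculation. It assumes merged_data has a customer identifier column (called Customer_ID here, a hypothetical name) and that Customer Value is Average Order Value times Average Number of Purchases. The three-year lifespan is a placeholder, because the 2022 data alone cannot tell you how long customers stay. The table rows are invented.

```python
import sqlite3

# In-memory toy stand-in for merged_data; Customer_ID is a hypothetical column name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE merged_data (Customer_ID INTEGER, Purchase_Amount_USD REAL)"
)
conn.executemany(
    "INSERT INTO merged_data VALUES (?, ?)",
    [(1, 50.0), (1, 70.0), (2, 30.0), (2, 90.0)],
)

# Steps 1 and 2: average order value, and orders per distinct customer.
CLV_INPUTS = """
SELECT
    AVG(Purchase_Amount_USD) AS avg_order_value,
    COUNT(*) * 1.0 / COUNT(DISTINCT Customer_ID) AS avg_purchases_per_customer
FROM merged_data;
"""
avg_order_value, avg_purchases = conn.execute(CLV_INPUTS).fetchone()

# Customer Value = Average Order Value * Average Number of Purchases (per year of data).
customer_value = avg_order_value * avg_purchases

# Placeholder assumption: replace with a lifespan estimate the business agrees on.
AVG_CUSTOMER_LIFESPAN_YEARS = 3

# CLV = Customer Value * Avg Customer Lifespan
clv = customer_value * AVG_CUSTOMER_LIFESPAN_YEARS
print(avg_order_value, avg_purchases, customer_value, clv)
```

With the toy rows, four orders from two customers total $240, so the AOV is 60, the average number of purchases is 2, Customer Value is 120, and the CLV over the assumed three years is 360.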

In [None]:
--Product Information #1: Total Sales by Product Type for 2022
SELECT Item_Purchased, SUM(Purchase_Amount_USD) AS Total_Sales
FROM merged_data
GROUP BY Item_Purchased
ORDER BY Total_Sales DESC;
-- SUM gives total sales per product; COUNT(*) would give the number of orders instead.

In [None]:
--Product Information #2: Which products had the highest conversion rate?
--Assumes tiktok_conversion and email_conversion are numeric, as in Marketing ROI #3.
SELECT
    Item_Purchased,
    AVG(tiktok_conversion) AS Avg_TikTok_Conversion,
    AVG(email_conversion) AS Avg_Email_Conversion
FROM merged_data
GROUP BY Item_Purchased
ORDER BY Avg_TikTok_Conversion DESC, Avg_Email_Conversion DESC;

In [None]:
-- Product Information #3: Total Sales Amount for seasonal clothing
SELECT Season, SUM(Purchase_Amount_USD) AS Total_Sales
FROM merged_data
GROUP BY Season
ORDER BY Total_Sales DESC;

In [None]:
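--Product Information #4: Total sales amount by month and marketing campaign
--strftime is SQLite syntax and assumes the date columns are stored as YYYY-MM-DD text.
SELECT 'Email' AS Channel, email_campaign AS Campaign, strftime('%m', email_dates) AS Sales_Month, SUM(Purchase_Amount_USD) AS Total_Sales
FROM merged_data
GROUP BY Campaign, Sales_Month
UNION ALL
SELECT 'TikTok', tiktok_campaign, strftime('%m', tiktok_dates), SUM(Purchase_Amount_USD)
FROM merged_data
GROUP BY tiktok_campaign, strftime('%m', tiktok_dates)
ORDER BY Total_Sales DESC;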

In [None]:
--Marketing ROI #1: Compare CPAs for TikTok vs. Email
--Remove dollar sign and CAST to FLOAT
SELECT
  AVG(CAST(REPLACE(tiktok_cpa, '$', '') AS FLOAT)) AS avg_tiktok_cpa,
  AVG(CAST(REPLACE(email_cpa, '$', '') AS FLOAT)) AS avg_email_cpa
FROM merged_data;

In [None]:
--Marketing ROI #2: Which marketing campaigns had the highest conversions?
SELECT
  email_campaign,
  tiktok_campaign,
  CASE
    WHEN MAX(email_conversion) > MAX(tiktok_conversion) THEN 'Email'
    WHEN MAX(email_conversion) < MAX(tiktok_conversion) THEN 'TikTok'
    ELSE 'Equal'  -- If they are equal
  END AS higher_conversion_campaign
FROM merged_data
GROUP BY email_campaign, tiktok_campaign;
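
An alternative way to read "highest conversions" is to rank every campaign, across both channels, by its total conversions. The sketch below does that with a UNION ALL. It runs through Python's sqlite3 module on invented rows, and the campaign names are made up. It also assumes each row of merged_data carries a conversion count that is safe to sum. If the merge repeats the same campaign figure on many purchase rows, SUM would overcount and MAX or a de-duplicated table would be the better choice.

```python
import sqlite3

# In-memory toy stand-in for merged_data; campaign names and counts are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE merged_data (
        email_campaign TEXT,
        email_conversion INTEGER,
        tiktok_campaign TEXT,
        tiktok_conversion INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO merged_data VALUES (?, ?, ?, ?)",
    [
        ("Spring Sale", 10, "Dance Challenge", 30),
        ("Spring Sale", 20, "Haul Video", 5),
        ("Holiday Promo", 50, "Dance Challenge", 15),
    ],
)

# One row per (channel, campaign), ranked by summed conversions.
RANKED_CAMPAIGNS = """
SELECT 'Email' AS Channel, email_campaign AS Campaign,
       SUM(email_conversion) AS Total_Conversions
FROM merged_data
GROUP BY email_campaign
UNION ALL
SELECT 'TikTok', tiktok_campaign, SUM(tiktok_conversion)
FROM merged_data
GROUP BY tiktok_campaign
ORDER BY Total_Conversions DESC;
"""

ranked = conn.execute(RANKED_CAMPAIGNS).fetchall()
for channel, campaign, total in ranked:
    print(channel, campaign, total)
```

Stacking the two channels into one result set keeps each campaign on its own row, which makes the ranking easier to read than comparing email and TikTok columns side by side.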

In [19]:
---- Marketing ROI #3: Which demographic had the highest conversion rate?
SELECT
    Location,
    CASE
        WHEN Age < 30 AND Gender = 'Female' THEN 'Young Female'
        WHEN Age < 30 AND Gender = 'Male' THEN 'Young Male'
        WHEN Age >= 30 AND Gender = 'Female' THEN 'Adult Female'
        WHEN Age >= 30 AND Gender = 'Male' THEN 'Adult Male'
        ELSE 'Other'
    END AS Customer_Segment,
    COUNT(*) AS Segment_Count,
    AVG(tiktok_conversion) AS Avg_TikTok_Conversion,
    AVG(email_conversion) AS Avg_Email_Conversion
FROM
    merged_data
GROUP BY
    Location, Customer_Segment
ORDER BY
    Avg_TikTok_Conversion DESC, Avg_Email_Conversion DESC;

SyntaxError: invalid syntax (2247389623.py, line 4)
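
The SyntaxError above comes from a .py file, which suggests the cell was parsed as Python rather than sent to a SQL engine. One way to run the same query from a Python cell is to pass it as a string to the built-in sqlite3 module. The sketch below does this against an in-memory toy table; the three rows and the "Ohio" location are invented for illustration. In the real notebook you would connect to the database file that holds merged_data instead.

```python
import sqlite3

# In-memory toy stand-in for merged_data; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE merged_data (
        Location TEXT,
        Age INTEGER,
        Gender TEXT,
        tiktok_conversion REAL,
        email_conversion REAL
    )
    """
)
conn.executemany(
    "INSERT INTO merged_data VALUES (?, ?, ?, ?, ?)",
    [
        ("Ohio", 25, "Female", 0.125, 0.25),
        ("Ohio", 28, "Female", 0.375, 0.75),
        ("Ohio", 45, "Male", 0.5, 0.125),
    ],
)

# The SQL lives in a Python string, so the Python parser never sees it as code.
ROI_3_QUERY = """
SELECT
    Location,
    CASE
        WHEN Age < 30 AND Gender = 'Female' THEN 'Young Female'
        WHEN Age < 30 AND Gender = 'Male' THEN 'Young Male'
        WHEN Age >= 30 AND Gender = 'Female' THEN 'Adult Female'
        WHEN Age >= 30 AND Gender = 'Male' THEN 'Adult Male'
        ELSE 'Other'
    END AS Customer_Segment,
    COUNT(*) AS Segment_Count,
    AVG(tiktok_conversion) AS Avg_TikTok_Conversion,
    AVG(email_conversion) AS Avg_Email_Conversion
FROM merged_data
GROUP BY Location, Customer_Segment
ORDER BY Avg_TikTok_Conversion DESC;
"""

rows = conn.execute(ROI_3_QUERY).fetchall()
for row in rows:
    print(row)
```

The same pattern works for every query in this notebook: keep the SQL in a triple-quoted string and hand it to conn.execute.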