<font size="18">**User Insights in the C2C Fashion Ecosystem: A Data-Driven Exploration**

Group 6: Shiv Nag, Michael Webber, Rebecca Bubis, Sai Nruthya Vaka, Bilvika Bassetty, Fan Hong (Sally) Kong

Tableau Dashboard: https://public.tableau.com/app/profile/michael.webber/viz/BA775FinalDraft/FinalStory?publish=yes

# **Table of Contents**

## <font size=3> 1. Introduction
## <font size=3> 2. Executive Summary
## <font size=3> 3. Cleaning Phase
## <font size=3> 4. Exploratory Phase
## <font size=3> 5. Conclusion
## <font size=3> 6. Challenges
## <font size=3> 7. References
## <font size=3> 8. Generative AI Disclosure

# **1. Introduction**

## **Business Problem Definition**

C2C (consumer to consumer) is an emerging retail market in which consumers sell directly to one another, usually through an intermediary platform. Although it is growing in popularity, the C2C market is still smaller than traditional B2C (business to consumer) retail. The dataset shows an increase in the number of new users (identified by ID hash) registering as buyers or sellers on the platform. If the team can identify demographics and customer groups that are underserved in the C2C space, it can recommend how businesses should adjust their engagement efforts, offerings, and targeted incentives to improve the user experience. This approach can help convert potential buyers, drive sales, strengthen the platform's customer base, and ultimately boost overall profitability.

## **Motivation**

The motivation behind this project stems from the company's vision to lead the transformation of the C2C fashion market by empowering users and driving sustainable commerce. As a competitive platform, understanding user behavior is essential to foster meaningful engagement, improve user retention, and maximize sales opportunities. With increasing consumer demand for personalized experiences and eco-conscious choices, this project aims to position the platform as a trusted, innovative leader in the second-hand fashion space. By leveraging deeper data insights, the company can enhance customer satisfaction, refine its offerings, and expand its market reach, all while contributing to a more sustainable and connected fashion ecosystem.

## **Dataset Description**

The customer and user data for the e-commerce company were obtained from:

https://data.world/jfreex/e-commerce-users-of-a-french-c2c-fashion-store/workspace/project-summary?agentid=jfreex&datasetid=e-commerce-users-of-a-french-c2c-fashion-store


The country codes were obtained from:

https://datahub.io/core/country-list


These datasets list users, each identified by a unique ID hash, for 2020 and 2024. The original files were named “users.6M0xxK.2020.public.csv” (2020) and “users.6M0xxK.2024.public.csv” (2024). The original 2020 CSV file contains 24 columns and 98,913 rows, and the original 2024 CSV file contains 21 columns and 20,743 rows. The ERD below shows the relationships between our datasets and the columns that make up each one. It is drawn from the cleaned versions of the datasets, so it may differ somewhat from the original files.

To begin, our goal is to use the datasets for a year-to-year analysis that identifies key trends in current user behavior (language, country, intent for joining the platform, social media usage, etc.) and compares it with newly joined users to understand user retention and KPIs such as click-through rate.


## **Entity Relationship Diagram (ERD)**

In [5]:
import base64
from IPython.display import HTML

# Read the ERD image and encode it as Base64 so it is embedded in the notebook itself
with open("erd.png", "rb") as img_file:
    b64_image = base64.b64encode(img_file.read()).decode("utf-8")

# Embed the image in the notebook (the file is a PNG, so use the image/png MIME type)
HTML(f'<img src="data:image/png;base64,{b64_image}" width="500"/>')

# **2. Executive Summary**

## **Overview:** 

This report analyzes user engagement and satisfaction within a C2C fashion ecosystem, highlighting regional trends, platform preferences, and demographic insights. Western Europe, particularly France, exhibits the highest user activity, while Italy stands out with a stronger focus on selling. Platform preferences vary: most regions favor web access, while Android apps are comparatively popular in Italy and Spain. User satisfaction has improved over time, though low engagement in some areas and limited app usage suggest room for growth. The findings aim to provide actionable recommendations for increasing user retention, diversifying engagement, and boosting conversions.

## **Key Findings:**

1. User Engagement Trends:
The data reveals a highly engaged user base with notable growth in engagement between 2020 and 2024. Engagement levels were measured based on products wished and purchased, with the most active users concentrated in France, the United States, and the United Kingdom.

2. Platform Usage:
The platform primarily attracts users via its website, with France, the United States, and the United Kingdom leading in website-based access. However, Spain and Italy have a notably higher share of Android-only users than other markets, which indicates regional differences in platform adoption.

3. User Satisfaction:
Satisfaction scores have improved from 2020 to 2024, with average satisfaction increasing by 3%, indicating overall positive user sentiment. This improvement reflects effective user experience and product quality enhancements.

4. Demographics and Regional Insights:
France, United States, and United Kingdom are the largest markets in terms of user base and activity. Smaller European markets like Denmark and Sweden show higher retention rates and stronger user engagement, despite their smaller size. Countries like Spain and Australia demonstrate lower retention rates and need focused re-engagement efforts. Germany represents a balanced market with high retention and moderate user activity, making it an ideal candidate for expansion.

5. Social Media and Conversion:
Social media influence appears minimal when it comes to actual purchases: users with large followings do not show correspondingly high purchase activity. Interestingly, users with smaller followings were often top buyers and sellers, suggesting other factors drive purchasing behavior within this ecosystem.
Conversion rates (wished items to actual purchases) are higher among male users (11%) than female users (7.4%), despite a much larger overall volume of female users.

## **Objectives:**

The objective of this analysis is to optimize the French C2C fashion platform by understanding user behavior, geographic trends, and platform usage to improve engagement and retention. The study aims to identify untapped demographics (e.g., younger women, male users), enhance website and app experiences, and drive wishlist-to-purchase conversions. It focuses on leveraging high-retention markets like Denmark and Sweden as models for growth, while addressing churn in key markets like France and the US. Strategic recommendations include localized campaigns, incentivized app usage, and targeted re-engagement in underperforming regions, with an emphasis on diversifying the user base and expanding into untapped markets for sustained growth.

## **Methods:**

Data collection, data cleaning and preprocessing, exploratory data analysis (EDA) using SQL queries, geospatial analysis, and data visualisation in Tableau.

## **Recommendations:**

1. Enhance User Engagement: Target younger women with curated product offerings and marketing campaigns tailored to them, as they are underrepresented on the platform. Expand the male user base to increase product diversity, and launch campaigns aimed at male users to tap into their higher wishlist-to-purchase conversion rates.

2. Optimize Platform Features: Improve the website's user interface (UI) and user experience (UX), since the majority of users access the platform through the website rather than the app. Also encourage app adoption by offering exclusive discounts, loyalty rewards, and a seamless experience across mobile platforms.

3. Geographic Focus: Roll out localized re-engagement campaigns tailored to underperforming regions like Spain and Australia to increase retention and activity. Use high-retention markets like Denmark and Sweden as blueprints for engaging larger markets like Germany, France, and the US. Explore untapped markets in Asia and North America, leveraging insights from existing markets to tailor strategies for new regions.

4. Support Sellers: Encourage sellers to complete their profiles with photos and grow their follower base to boost visibility and sales. Provide tools and guidance for sellers to leverage social media to drive traffic to their listings.

5. Mitigate Churn and Drive Retention: Focus on retention strategies, such as loyalty programs and churn-risk identification, for high-churn regions like France and the US. Incentivize both buyers and sellers with personalized rewards to sustain engagement and activity.

# **3. Cleaning Phase**

## **Original Cleaning**

While our dataset is comprehensive, it was not entirely clean and required several cleaning steps in order to move into the EDA phase and gather valuable, actionable insights. Our team of data scientists has completed the following steps in order to clean the data.

In [None]:
%%bigquery
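-- Remove the header row that was loaded as data
DELETE FROM `fall24-ba775-a06.master_dataset.Country Codes`
WHERE string_field_0 = 'Name';

-- Create the new Country Codes table with descriptive column names
CREATE TABLE `fall24-ba775-a06.master_dataset.Country Codes New` AS
SELECT
  string_field_0 AS name,
  string_field_1 AS country_code
FROM `fall24-ba775-a06.master_dataset.Country Codes`;

-- Drop the original table so that its name is free for the rename
DROP TABLE `fall24-ba775-a06.master_dataset.Country Codes`;

-- Rename the new table to the original name
ALTER TABLE `fall24-ba775-a06.master_dataset.Country Codes New`
RENAME TO `Country Codes`;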

-- Convert the country codes to lower case
UPDATE `fall24-ba775-a06.master_dataset.Country Codes`
SET country_code = LOWER(country_code)
WHERE TRUE;

In [None]:
%%bigquery
-- Modified the language abbreviations to their full spelling
UPDATE `fall24-ba775-a06.master_dataset.2020 Data`
SET language = 
    CASE 
        WHEN language = 'en' THEN 'English'
        WHEN language = 'fr' THEN 'French'
        WHEN language = 'es' THEN 'Spanish'
        WHEN language = 'de' THEN 'German'
        WHEN language = 'it' THEN 'Italian'
    END
WHERE language IN ('en', 'fr', 'es', 'de', 'it');

UPDATE `fall24-ba775-a06.master_dataset.2024 Data`
SET language = 
    CASE 
        WHEN language = 'en' THEN 'English'
        WHEN language = 'fr' THEN 'French'
        WHEN language = 'es' THEN 'Spanish'
        WHEN language = 'de' THEN 'German'
        WHEN language = 'it' THEN 'Italian'
    END
WHERE language IN ('en', 'fr', 'es', 'de', 'it');


-- Converted the abbreviations for Male and Female to their full spelling for clarity
UPDATE `fall24-ba775-a06.master_dataset.Large Table`
SET gender = 
    CASE 
        WHEN gender = 'M' THEN 'Male'
        WHEN gender = 'F' THEN 'Female'
    END
WHERE gender IN ('M', 'F');

UPDATE `fall24-ba775-a06.master_dataset.2020 Data`
SET gender = 
    CASE 
        WHEN gender = 'M' THEN 'Male'
        WHEN gender = 'F' THEN 'Female'
    END
WHERE gender IN ('M', 'F');

UPDATE `fall24-ba775-a06.master_dataset.2024 Data`
SET gender = 
    CASE 
        WHEN gender = 'M' THEN 'Male'
        WHEN gender = 'F' THEN 'Female'
    END
WHERE gender IN ('M', 'F');

In [None]:
%%bigquery

-- Create the new clean 2020 table

CREATE TABLE `fall24-ba775-a06.master_dataset.clean_2020` AS
SELECT 
    identifierHash AS user_id,
    language,
    cc.name AS country,
    socialNbFollowers AS followers,
    socialNbFollows AS follows,
    socialProductsLiked AS productsLiked,
    productsListed,
    productsSold,
    productsPassRate,
    productsWished,
    productsBought,
    gender,
    civilityGenderId,
    civilityTitle,
    hasAnyApp,
    hasAndroidApp,
    hasIosApp,
    hasProfilePicture,
    daysSinceLastLogin,
    seniority
    
FROM `fall24-ba775-a06.master_dataset.2020 Data` AS large
LEFT JOIN `fall24-ba775-a06.master_dataset.Country Codes` AS cc
ON cc.country_code = large.countryCode;


-- Create the new clean 2024 table

CREATE TABLE `fall24-ba775-a06.master_dataset.clean_2024` AS
SELECT 
    identifierHash AS user_id,
    language,
    countryCode AS country_code,
    cc.name AS country,
    socialNbFollowers AS followers,
    socialNbFollows AS follows,
    socialProductsLiked AS productsLiked,
    productsListed,
    productsSold,
    productsPassRate,
    productsWished,
    productsBought,
    gender,
    civilityGenderId,
    civilityTitle,
    seniority,
    websiteLongevity
    
FROM `fall24-ba775-a06.master_dataset.2024 Data` AS large
LEFT JOIN `fall24-ba775-a06.master_dataset.Country Codes` AS cc
ON cc.country_code = large.countryCode
WHERE large.countryCode IS NOT NULL;

In [None]:
%%bigquery

-- Add the signup date to 2020

ALTER TABLE `fall24-ba775-a06.master_dataset.clean_2020`
ADD COLUMN signup_date DATE;

-- Populate the signup date from the seniority column in the original table, using a fixed 251-day offset.
-- CURRENT_DATE() means the derived dates depend on the day the query is run.

UPDATE `fall24-ba775-a06.master_dataset.clean_2020`
SET signup_date = DATE_SUB(CURRENT_DATE(), INTERVAL (seniority + 251) DAY)
WHERE TRUE;

-- Drop the seniority column from 2020 and 2024

ALTER TABLE `fall24-ba775-a06.master_dataset.clean_2020`
DROP COLUMN seniority;

ALTER TABLE `fall24-ba775-a06.master_dataset.clean_2024`
DROP COLUMN seniority;

-- Create the website_creationDate

ALTER TABLE `fall24-ba775-a06.master_dataset.clean_2024`
ADD COLUMN website_creationDate DATE;

-- Populate the website_creationDate

UPDATE `fall24-ba775-a06.master_dataset.clean_2024`
SET website_creationDate = DATE_SUB(CURRENT_DATE(), INTERVAL (websiteLongevity + 251) DAY)
WHERE TRUE;

-- Drop the websiteLongevity as it is redundant

ALTER TABLE `fall24-ba775-a06.master_dataset.clean_2024`
DROP COLUMN websiteLongevity;
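Both date columns above are derived by subtracting a day count from `CURRENT_DATE()`, so rerunning the cell on a different day shifts every date. The following is a minimal Python sketch of the same arithmetic, assuming a fixed, hypothetical reference date in place of `CURRENT_DATE()` so that reruns agree. The 251-day offset is copied from the SQL, and `derive_date` is an illustrative helper, not part of the original pipeline.

```python
from datetime import date, timedelta

# Fixed offset copied from the SQL cell above; its origin is not documented in this notebook.
EXTRA_OFFSET_DAYS = 251


def derive_date(reference: date, days: int, offset: int = EXTRA_OFFSET_DAYS) -> date:
    """Mirror DATE_SUB(reference, INTERVAL (days + offset) DAY)."""
    return reference - timedelta(days=days + offset)


# Hypothetical fixed reference date used instead of CURRENT_DATE(), so results are reproducible.
REFERENCE_DATE = date(2024, 12, 1)

if __name__ == "__main__":
    # A user with a seniority of 0 days maps to EXTRA_OFFSET_DAYS before the reference date.
    print(derive_date(REFERENCE_DATE, 0))
```

Pinning the reference date is one way to make the derived columns stable across reruns; the trade-off is that the date has to be chosen and documented explicitly.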

**Summary of critical observations:** The team cleaned both datasets into “clean_2020” and “clean_2024”. We expanded abbreviated values to their full spellings (language codes such as “fr” became “French”, and the gender codes “M” and “F” became “Male” and “Female”), converted the reference country codes to lowercase before joining them to the user data, and kept only the columns relevant to our business goals. We also added two derived date columns, signup_date and website_creationDate, computed from the seniority and websiteLongevity columns. Finally, we cleaned the country code reference table (“Country Codes”) and joined it to the user data to attach full country names.
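The value mappings and the country-code join from the cleaning cells can be illustrated outside BigQuery. A minimal Python sketch follows, assuming each raw record is a dict keyed by the original column names (`identifierHash`, `language`, `gender`, `countryCode`) and the country list arrives as (name, code) pairs. The helper names `build_country_lookup` and `clean_user` are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

# Mappings copied from the UPDATE statements above.
LANGUAGE_NAMES = {"en": "English", "fr": "French", "es": "Spanish", "de": "German", "it": "Italian"}
GENDER_NAMES = {"M": "Male", "F": "Female"}


def build_country_lookup(rows: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map lowercase country code -> country name, skipping the stray header row."""
    return {code.lower(): name for name, code in rows if name != "Name"}


def clean_user(row: Dict[str, str], countries: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Clean one raw user record (hypothetical dict layout)."""
    return {
        "user_id": row["identifierHash"],
        # Values outside the mapping stay unchanged, like the WHERE ... IN (...) filter.
        "language": LANGUAGE_NAMES.get(row["language"], row["language"]),
        "gender": GENDER_NAMES.get(row["gender"], row["gender"]),
        # LEFT JOIN semantics: an unknown code yields None instead of dropping the user.
        "country": countries.get(row["countryCode"]),
    }


if __name__ == "__main__":
    lookup = build_country_lookup([("Name", "Code"), ("France", "FR"), ("Italy", "IT")])
    raw = {"identifierHash": "abc123", "language": "fr", "gender": "F", "countryCode": "fr"}
    print(clean_user(raw, lookup))
```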


## **Additional Cleaning**

After the initial cleaning and EDA phase, we noticed gaps in our analysis due to missing data for 2024. To address this issue, we imputed the missing values for 2024 by filling them with the corresponding values from 2020. This approach allows us to conduct a more comprehensive analysis of the changes in the e-commerce space between 2020 and 2024.
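The fill-from-2020 rule amounts to a COALESCE over a left join keyed on the user's ID hash. The following is a minimal Python sketch of that logic, assuming each table is a list of dicts with a hypothetical `user_id` key (the SQL joins on `identifierHash`). `impute_2024` is an illustrative helper, not part of the original pipeline, and it omits the country-code filter applied in the SQL.

```python
from typing import Any, Dict, Iterable, List, Optional


def coalesce(*values: Optional[Any]) -> Optional[Any]:
    """Return the first value that is not None, like SQL COALESCE."""
    for value in values:
        if value is not None:
            return value
    return None


def impute_2024(
    rows_2020: Iterable[Dict[str, Any]],
    rows_2024: Iterable[Dict[str, Any]],
    columns: List[str],
) -> List[Dict[str, Any]]:
    """Hypothetical helper: left join 2024 onto 2020 and fill 2024 gaps from 2020."""
    by_id_2024 = {row["user_id"]: row for row in rows_2024}
    merged_rows = []
    # 2020 is the left side, so every 2020 user is kept and 2024-only users are dropped.
    for old in rows_2020:
        new = by_id_2024.get(old["user_id"], {})  # no match: every 2024 value is missing
        merged = {"user_id": old["user_id"]}
        for column in columns:
            # Prefer the 2024 value; fall back to 2020 only when the 2024 value is missing.
            merged[column] = coalesce(new.get(column), old.get(column))
        merged_rows.append(merged)
    return merged_rows


if __name__ == "__main__":
    users_2020 = [{"user_id": "a", "followers": 3, "productsBought": 1},
                  {"user_id": "b", "followers": 5, "productsBought": 0}]
    users_2024 = [{"user_id": "a", "followers": None, "productsBought": 4}]
    print(impute_2024(users_2020, users_2024, ["followers", "productsBought"]))
```

Only a missing value triggers the fallback, so a genuine 2024 value of 0 is kept rather than replaced by the 2020 figure.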

In [None]:
%%bigquery

CREATE TABLE `fall24-ba775-a06.master_dataset.clean_2024_new` AS
SELECT 
    large.identifierHash AS user_id,
    large.language,
    cc.name AS country,
    -- COALESCE keeps the 2024 value and falls back to the 2020 value when it is NULL
    COALESCE(small.socialNbFollowers, large.socialNbFollowers) AS followers,
    COALESCE(small.socialNbFollows, large.socialNbFollows) AS follows,
    COALESCE(small.socialProductsLiked, large.socialProductsLiked) AS productsLiked,
    COALESCE(small.productsListed, large.productsListed) AS productsListed,
    COALESCE(small.productsSold, large.productsSold) AS productsSold,
    COALESCE(small.productsPassRate, large.productsPassRate) AS productsPassRate,
    COALESCE(small.productsWished, large.productsWished) AS productsWished,
    COALESCE(small.productsBought, large.productsBought) AS productsBought,
    COALESCE(small.gender, large.gender) AS gender,
    COALESCE(small.civilityGenderId, large.civilityGenderId) AS civilityGenderId,
    COALESCE(small.civilityTitle, large.civilityTitle) AS civilityTitle,
    CAST('2013-10-01' AS DATE) AS website_creationDate
FROM `fall24-ba775-a06.master_dataset.2020 Data` AS large
LEFT JOIN `fall24-ba775-a06.master_dataset.Country Codes` AS cc
ON cc.country_code = large.countryCode
LEFT JOIN `fall24-ba775-a06.master_dataset.2024 Data` AS small
ON small.identifierHash = large.identifierHash
WHERE large.countryCode IS NOT NULL;

-- Delete the old table

DROP TABLE `fall24-ba775-a06.master_dataset.clean_2024`;

-- Rename the new table to the original table name

ALTER TABLE `fall24-ba775-a06.master_dataset.clean_2024_new`
RENAME TO `clean_2024`;

-- Creating the year columns

ALTER TABLE `fall24-ba775-a06.master_dataset.clean_2020`
ADD COLUMN year STRING;

ALTER TABLE `fall24-ba775-a06.master_dataset.clean_2024`
ADD COLUMN year STRING;

-- Setting the year for each clean file

UPDATE `fall24-ba775-a06.master_dataset.clean_2020`
SET year = '2020'
WHERE True;

UPDATE `fall24-ba775-a06.master_dataset.clean_2024`
SET year = '2024'
WHERE True;

**Summary of Critical Observations:** We created a new 2024 table by left joining the 2024 data onto the 2020 data by user ID hash and keeping only the columns present in the 2024 data. For each of those columns, missing 2024 values were filled with the corresponding 2020 values. Because the 2020 data is the left side of the join, the rebuilt 2024 table contains every 2020 user with a known country code, and 2024 users who do not appear in 2020 are not included. We then deleted the old 2024 table and renamed the new table to the original name so that previously developed queries remain operational. Finally, we created and populated a year column in both the 2020 and 2024 tables to distinguish entries when the tables are combined for year-to-year comparisons. With these steps completed, we are ready to dive deeper into our analysis.
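Tagging each table with a year before stacking them is what makes per-year aggregates possible later on. A small Python sketch of the pattern follows, assuming rows are dicts. `tag_year` and `total_by_year` are hypothetical helpers standing in for the `year` column plus a `UNION ALL` and `GROUP BY year` in SQL.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def tag_year(rows: Iterable[dict], year: str) -> List[dict]:
    """Copy each row and add a year label (the ADD COLUMN year + UPDATE steps)."""
    return [{**row, "year": year} for row in rows]


def total_by_year(rows: Iterable[dict], column: str) -> Dict[str, int]:
    """Sum one metric per year over stacked rows (UNION ALL, then GROUP BY year).

    Missing values count as 0; SQL SUM skips NULLs instead, which gives the
    same total unless every value in a group is NULL.
    """
    totals: Dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row["year"]] += row[column] or 0
    return dict(totals)


if __name__ == "__main__":
    stacked = (tag_year([{"productsBought": 1}, {"productsBought": 2}], "2020")
               + tag_year([{"productsBought": 5}, {"productsBought": None}], "2024"))
    print(total_by_year(stacked, "productsBought"))
```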

# **4. Exploratory Phase**

Our team of data analysts is now poised for the in-depth EDA phase of the project, diving deeper and uncovering insights into the C2C market. We begin with an overview of our data and observe user demographics. Next, we explore user activity, engagement on the platform, and social media interactions and usage at both a high and a granular level. We end with an analysis of C2C satisfaction rates, engagement, and buyer and seller conversion. Our specific results are shown below, with each question of interest followed by a step-by-step analysis of the query and its findings.

## **1) User demographics: what is the gender distribution of C2C users, and how have they interacted with the platform?**

### **1.1) User demographics in 2020**

In [37]:
%%bigquery
WITH total_users_cte AS (
  SELECT COUNT(DISTINCT user_id) AS total_count
  FROM `fall24-ba775-a06.master_dataset.clean_2020`
),
civility_title_cte AS (
  SELECT
    civilityTitle,
    gender,
    COUNT(user_id) AS user_count,
    SUM(productsBought) AS total_bought,
    SUM(productsWished) AS total_wished,
    ROUND(SUM(productsBought) * 100.0 / NULLIF(SUM(productsWished), 0), 2) AS conversion_rate
  FROM `fall24-ba775-a06.master_dataset.clean_2020`
  GROUP BY civilityTitle, gender
)
SELECT 
  ct.civilityTitle,
  ct.gender,
  ct.user_count,
  ROUND((ct.user_count * 100.0) / tu.total_count, 2) AS civility_title_percentage,
  ct.total_bought,
  ct.total_wished,
  ct.conversion_rate
FROM civility_title_cte ct
CROSS JOIN total_users_cte tu
ORDER BY civility_title_percentage DESC, ct.conversion_rate DESC, ct.user_count DESC;



| civilityTitle | gender | user_count | civility_title_percentage | total_bought | total_wished | conversion_rate |
|---|---|---|---|---|---|---|
| mrs | Female | 75684 | 76.52 | 12457 | 126526 | 9.85 |
| mr | Male | 22792 | 23.04 | 4262 | 25063 | 17.01 |
| miss | Female | 437 | 0.44 | 287 | 2972 | 9.66 |


### **1.2) User Demographics in 2024**

In [1]:
%%bigquery
WITH total_users_cte AS (
  SELECT COUNT(DISTINCT user_id) AS total_count
  FROM `fall24-ba775-a06.master_dataset.clean_2024`
),
civility_title_cte AS (
  SELECT
    civilityTitle,
    gender,
    COUNT(user_id) AS user_count,
    SUM(productsBought) AS total_bought,
    SUM(productsWished) AS total_wished,
    ROUND(SUM(productsBought) * 100.0 / NULLIF(SUM(productsWished), 0), 2) AS conversion_rate
  FROM `fall24-ba775-a06.master_dataset.clean_2024`
  GROUP BY civilityTitle, gender
)
SELECT 
  ct.civilityTitle,
  ct.gender,
  ct.user_count,
  ROUND((ct.user_count * 100.0) / tu.total_count, 2) AS civility_title_percentage,
  ct.total_bought,
  ct.total_wished,
  ct.conversion_rate
FROM civility_title_cte ct
CROSS JOIN total_users_cte tu
ORDER BY civility_title_percentage DESC, ct.conversion_rate DESC, ct.user_count DESC;


| civilityTitle | gender | user_count | civility_title_percentage | total_bought | total_wished | conversion_rate |
|---|---|---|---|---|---|---|
| mrs | Female | 59806 | 60.46 | 10198 | 99352 | 10.26 |
| mr | Male | 38762 | 39.19 | 59742 | 330686 | 18.07 |
| miss | Female | 345 | 0.35 | 264 | 2813 | 9.38 |


This initial SQL analysis reveals gender-based trends in the C2C landscape. There is a large disparity between male and female users: in 2020, women made up 76.96% of users (76.52% "Mrs." plus 0.44% "Miss") and men only 23.04%. In 2024, the split moves toward balance, with women making up 60.81% and men 39.19% of all users. Based on external research on C2C platforms, far more women's clothing and accessories are bought and sold in this market than men's items, which explains the much larger female customer base. **This explains the imbalance but also opens an opportunity for C2C platforms to expand their customer base by attracting men and offering more items for them to buy and sell.** The growth in male users is promising, but the shrinkage in female users is more alarming. **Further analysis is required on this topic and will be discussed later.**

We also observed that the majority of users (75,684 in 2020 and 59,806 in 2024) use the title "Mrs.", followed by "Mr." (22,792 in 2020 and 38,762 in 2024) and a much smaller "Miss" group (437 in 2020 and 345 in 2024). **The extreme disparity between "Mrs." and "Miss" indicates potential for targeted marketing toward unmarried or young adult users, with a focus on products or content that appeal to this audience. More effort is required to attract younger or unmarried women, as they are currently a small minority in this space.**

In 2020, female users had a lower wishlist-to-purchase conversion rate (9.66% to 9.85%) than male users, who converted at a surprising 17.01%. In 2024, the female rate was 9.38% to 10.26%, compared with a growing 18.07% for male users. This disparity indicates that women were less likely than men to purchase items they added to their wishlist.
**This gap suggests an opportunity to better understand the preferences and purchasing behavior of female customers in order to increase their conversion rate. Raising female conversion rates would be especially profitable given the female dominance of C2C buying and selling.**
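The conversion rate in these queries is total products bought divided by total products wished, times 100, rounded to two decimals, with `NULLIF` returning NULL when nothing was wished. A minimal Python sketch of the same arithmetic follows, reproducing a few 2020 figures from the table above. The function names are illustrative, and `None` plays the role of SQL NULL.

```python
from typing import Optional


def conversion_rate(bought: int, wished: int) -> Optional[float]:
    """Wishlist-to-purchase conversion in percent, rounded to 2 decimals.

    Returns None when nothing was wished, mirroring NULLIF(SUM(productsWished), 0).
    """
    if wished == 0:
        return None
    return round(bought * 100.0 / wished, 2)


def share(part: int, total: int) -> float:
    """Percentage of all users, rounded to 2 decimals."""
    return round(part * 100.0 / total, 2)


if __name__ == "__main__":
    print(conversion_rate(12457, 126526))  # "Mrs." users, 2020 -> 9.85
    print(conversion_rate(4262, 25063))    # "Mr." users, 2020 -> 17.01
    print(share(75684 + 437, 98913))       # both female titles, 2020 -> 76.96
```

Note that this is a pooled rate (sum of purchases over sum of wishes), so heavy users weigh more than they would in an average of per-user rates.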

## **2) Which countries have the most total and active C2C users, and how many products that were listed on the C2C website were actually sold in those countries?**

### **2.1) User and Buy/Sell Comparison for 2020**

In [39]:
%%bigquery
WITH total_users_cte AS (
  SELECT 
    country, 
    COUNT(DISTINCT user_id) AS total_users
  FROM `fall24-ba775-a06.master_dataset.clean_2020`
  GROUP BY country
),
active_users_cte AS (
  SELECT 
    country, 
    COUNT(user_id) AS active_users_2020
  FROM `fall24-ba775-a06.master_dataset.clean_2020`
  WHERE (productsBought > 0 OR productsListed > 0) 
  GROUP BY country
),
products_summary_cte AS (
  SELECT 
    country,
    SUM(total_products_bought) AS total_products_bought,
    SUM(total_products_sold) AS total_products_sold
  FROM (
    SELECT 
      country,
      SUM(productsBought) AS total_products_bought,
      SUM(productsSold) AS total_products_sold
    FROM `fall24-ba775-a06.master_dataset.clean_2020`
    GROUP BY country
    UNION ALL
    SELECT 
      country,
      SUM(productsBought) AS total_products_bought,
      SUM(productsSold) AS total_products_sold
    FROM `fall24-ba775-a06.master_dataset.clean_2024`
    GROUP BY country
  ) AS combined_data
  GROUP BY country
)
SELECT 
  t.country,
  t.total_users,
  a.active_users_2020,
  p.total_products_bought,
  p.total_products_sold
FROM total_users_cte t
LEFT JOIN active_users_cte a ON t.country = a.country
LEFT JOIN products_summary_cte p ON t.country = p.country
ORDER BY p.total_products_bought DESC
LIMIT 10;



| country | total_users | active_users_2020 | total_products_bought | total_products_sold |
|---|---|---|---|---|
| France | 25135 | 1697 | 21500 | 15418 |
| United States | 20602 | 1022 | 14782 | 10993 |
| United Kingdom | 11310 | 930 | 12642 | 7846 |
| Germany | 6567 | 630 | 6434 | 3948 |
| Italy | 8015 | 676 | 6071 | 8662 |
| Spain | 5706 | 343 | 4930 | 5256 |
| Australia | 2719 | 134 | 2654 | 1640 |
| Belgium | 1666 | 109 | 2161 | 962 |
| Netherlands | 1529 | 167 | 2081 | 916 |
| Sweden | 1826 | 164 | 1822 | 1645 |


### **2.2) User and Buy/Sell Comparison for 2024**

In [2]:
%%bigquery
WITH total_users_cte AS (
  SELECT 
    country, 
    COUNT(DISTINCT user_id) AS total_users
  FROM `fall24-ba775-a06.master_dataset.clean_2024`
  GROUP BY country
),
active_users_cte AS (
  SELECT 
    country, 
    COUNT(user_id) AS active_users_2024
  FROM `fall24-ba775-a06.master_dataset.clean_2024`
  WHERE (productsBought > 0 OR productsListed > 0) 
  GROUP BY country
),
products_summary_cte AS (
  -- Aggregate 2024 products once (stacking the 2024 table with itself doubled every total)
  SELECT 
    country,
    SUM(productsBought) AS total_products_bought,
    SUM(productsSold) AS total_products_sold
  FROM `fall24-ba775-a06.master_dataset.clean_2024`
  GROUP BY country
)
SELECT 
  t.country,
  t.total_users,
  a.active_users_2024,
  p.total_products_bought,
  p.total_products_sold
FROM total_users_cte t
LEFT JOIN active_users_cte a ON t.country = a.country
LEFT JOIN products_summary_cte p ON t.country = p.country
ORDER BY p.total_products_bought DESC
LIMIT 10;


| country | total_users | active_users_2024 | total_products_bought | total_products_sold |
|---|---|---|---|---|
| France | 25135 | 3105 | 17927 | 12404 |
| United States | 20602 | 2239 | 12412 | 10026 |
| United Kingdom | 11310 | 1500 | 10468 | 6040 |
| Italy | 8015 | 1082 | 4850 | 5842 |
| Germany | 6567 | 960 | 4799 | 3412 |
| Spain | 5706 | 664 | 3902 | 4266 |
| Australia | 2719 | 289 | 2256 | 1605 |
| Netherlands | 1529 | 254 | 1544 | 702 |
| Belgium | 1666 | 192 | 1443 | 848 |
| Sweden | 1826 | 259 | 1256 | 1309 |


Based on the SQL analysis, France has the most users (about 25k). Although the platform is European-centric, the U.S. comes in second with nearly 21k users, followed by the U.K. with about 11k. The count drops off after the top three, and the rest of the top ten is mostly made up of European nations. Since the C2C platform observed is French, this suggests that C2C platforms, even online ones, are affected by geography and proximity to the platform's origin, though distant users can still be attracted, as the presence of the U.S. and Australia in the top ten shows. More research could be done on marketing tactics for C2C platforms. **However, to expand its customer base, we recommend that C2C platforms market in other areas, such as North America and Asia, to ensure continued year-over-year growth.**

Active users are defined here as users who bought or listed at least one product. Comparing this count with the total user count shows how many registered users actually conduct transactions. For example, in 2020 the U.S. had the second-most users at nearly 21k, but only about 1k active users, meaning fewer than 5% of registered users engaged with the platform. This improves in 2024, when the U.S. active user count rises to about 2.2k, lifting engagement to about 11%. **More effort is still needed to sustain and increase user activity through user incentives such as advertising campaigns, discounts, and site-wide promotions.**

When it comes to buying versus selling, these queries give deeper insight into user behavior on C2C platforms. The overall pattern is that users come to C2C platforms to buy rather than to sell. Some countries, such as Sweden, are roughly balanced between products bought and sold, while most others show a clear gap. Despite having about a third of the U.S. user base, Germany records more products bought per registered user than the U.S., so its engagement per user is higher. The clearest outlier is Italy, which sells more products than it buys (for example, 8,662 sold vs. 6,071 bought in the 2020 query). **Another area for growth and increased engagement is to give sellers incentives to post and sell their products on the site, since selling is a different avenue from buying yet still adds to user activity.** We will dive deeper into app and social media usage to uncover further user behaviors.
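The active-user rate and the sell-to-buy balance discussed above are simple ratios. A minimal Python sketch follows, assuming per-user dicts with the `productsBought` and `productsListed` columns used in the queries. The helper names are hypothetical, and the example figures come from the 2020 query output above.

```python
from typing import Dict, Optional


def is_active(user: Dict[str, int]) -> bool:
    """Mirror WHERE (productsBought > 0 OR productsListed > 0)."""
    return user["productsBought"] > 0 or user["productsListed"] > 0


def active_rate(active_users: int, total_users: int) -> float:
    """Share of registered users who are active, in percent."""
    return round(active_users * 100.0 / total_users, 2)


def sell_to_buy_ratio(sold: int, bought: int) -> Optional[float]:
    """Products sold per product bought; above 1 means a seller-heavy market."""
    if bought == 0:
        return None
    return round(sold / bought, 2)


if __name__ == "__main__":
    print(active_rate(1022, 20602))       # United States, 2020 -> 4.96
    print(sell_to_buy_ratio(8662, 6071))  # Italy, 2020 query -> 1.43
```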

## **3) In-Depth Exploration of user engagement**

### **3.1) How do users in each country access the C2C platform (website only, Android app, iOS app, or both apps), and which countries rely most on mobile apps?**

In [1]:
%%bigquery
SELECT
   country,
   COUNT(user_id) AS total_users,
   SUM(CASE
           WHEN hasAndroidApp = FALSE AND hasIosApp = FALSE THEN 1
           ELSE 0
       END) AS website,
   SUM(CASE
           WHEN hasAndroidApp = TRUE AND hasIosApp = FALSE THEN 1
           ELSE 0
       END) AS android_only,
   SUM(CASE
           WHEN hasAndroidApp = FALSE AND hasIosApp = TRUE THEN 1
           ELSE 0
       END) AS ios_only,
   SUM(CASE
           WHEN hasAndroidApp = TRUE AND hasIosApp = TRUE THEN 1
           ELSE 0
       END) AS both_apps
FROM `fall24-ba775-a06.master_dataset.clean_2020`
GROUP BY country
ORDER BY total_users DESC
LIMIT 15;


| country | total_users | website | android_only | ios_only | both_apps |
|---|---|---|---|---|---|
| France | 25135 | 18855 | 1633 | 4608 | 39 |
| United States | 20602 | 18105 | 387 | 2084 | 26 |
| United Kingdom | 11310 | 8657 | 219 | 2421 | 13 |
| Italy | 8015 | 4931 | 706 | 2346 | 32 |
| Germany | 6567 | 4392 | 246 | 1918 | 11 |
| Spain | 5706 | 3916 | 816 | 967 | 7 |
| Australia | 2719 | 1750 | 44 | 923 | 2 |
| Denmark | 1892 | 1099 | 34 | 756 | 3 |
| Sweden | 1826 | 995 | 35 | 789 | 7 |
| Belgium | 1666 | 966 | 70 | 630 | 0 |


This query examines whether users prefer one access method over another. Users in the top countries primarily use the platform through the website, with France, the United States, and the United Kingdom leading in website-only usage. App usage, however, shows clear regional differences in preferred platform. France, the United Kingdom, and Italy have the most iOS-only users, while Spain and Italy have a notably high share of Android-only users relative to their size. The share of users with any app also varies widely, from about 12% in the United States to over 40% in Denmark, Sweden, and Belgium. Based on these insights, we recommend **boosting mobile app adoption in low-app-usage countries, such as the United States, through targeted campaigns.** For platform-specific improvements, we advise **enhancing iOS features for high-iOS markets (e.g., France and the United Kingdom) and Android features for Android-leaning regions (e.g., Spain and Italy).** C2C platforms should also **ensure a top-tier web experience, as the website remains the primary access method in every country shown.** This provides adequate support on each platform to expand the platform's market reach, since different countries prefer different ways of using it.
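The four access-method columns come from mutually exclusive CASE buckets over the two app flags. A minimal Python sketch of that bucketing and of an "any app" share follows, assuming the flags arrive as booleans or `None`. The helper names are hypothetical. In the SQL, a NULL flag makes every comparison NULL, so such a user counts toward `total_users` but toward no bucket; the sketch mirrors this with an explicit `"unclassified"` label.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def access_method(has_android: Optional[bool], has_ios: Optional[bool]) -> str:
    """Bucket a user the same way as the CASE expressions in the query."""
    if has_android is None or has_ios is None:
        return "unclassified"  # the SQL counts these in no bucket
    if has_android and has_ios:
        return "both_apps"
    if has_android:
        return "android_only"
    if has_ios:
        return "ios_only"
    return "website"


def count_methods(flags: Iterable[Tuple[Optional[bool], Optional[bool]]]) -> Counter:
    """Tally users per access method from (hasAndroidApp, hasIosApp) pairs."""
    return Counter(access_method(android, ios) for android, ios in flags)


def app_share(android_only: int, ios_only: int, both_apps: int, total_users: int) -> float:
    """Percent of users with at least one app, rounded to 1 decimal."""
    return round((android_only + ios_only + both_apps) * 100.0 / total_users, 1)


if __name__ == "__main__":
    print(app_share(387, 2084, 26, 20602))  # United States -> 12.1
    print(app_share(70, 630, 0, 1666))      # Belgium -> 42.0
```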

### **3.2) Which countries have the highest retention rate?**

In [1]:
%%bigquery
WITH engagement_data AS (
    SELECT 
        user_id,
        country,
        productsWished,
        productsBought
    FROM `fall24-ba775-a06.master_dataset.clean_2020`
    UNION ALL
    SELECT 
        user_id,
        country,
        productsWished,
        productsBought
    FROM `fall24-ba775-a06.master_dataset.clean_2024`
)

SELECT
    country,
    COUNT(user_id) AS total_users,
    SUM(
        CASE 
            WHEN productsWished > 0 OR productsBought > 0 THEN 1 
            ELSE 0 
        END
    ) AS retained_users,
    ROUND(
        SUM(
            CASE 
                WHEN productsWished > 0 OR productsBought > 0 THEN 1 
                ELSE 0 
            END
        ) * 100.0 / NULLIF(COUNT(user_id), 0), 
        2
    ) AS retention_percentage
FROM engagement_data
GROUP BY country
HAVING total_users > 10
ORDER BY retained_users DESC
LIMIT 10;

| country | total_users | retained_users | retention_percentage |
|:---|---:|---:|---:|
| France | 50270 | 6952 | 13.83 |
| United States | 41204 | 5908 | 14.34 |
| United Kingdom | 22620 | 3728 | 16.48 |
| Germany | 13134 | 2656 | 20.22 |
| Italy | 16030 | 2647 | 16.51 |
| Spain | 11412 | 1470 | 12.88 |
| Australia | 5438 | 919 | 16.9 |
| Denmark | 3784 | 790 | 20.88 |
| Sweden | 3652 | 710 | 19.44 |
| Netherlands | 3058 | 614 | 20.08 |


The query shows retention rates for different countries, where a user counts as retained if they have wished for or bought at least one product. Because the 2020 and 2024 tables are combined with `UNION ALL`, the `total_users` column counts user records rather than distinct users: for every country that also appears in the 2020 usage table above, the value is exactly twice its 2020 user count, which suggests each user is counted once per snapshot. France and the United States stand out as the largest markets, with 50,270 and 41,204 user records respectively, but they have relatively low retention rates of 13.83% and 14.34%. These figures point to the need for targeted strategies to enhance user engagement, such as re-engagement campaigns that address drop-off points. **On the other hand, smaller markets like Denmark, Sweden, and the Netherlands show much higher retention rates of 20.88%, 19.44%, and 20.08%, despite their smaller user bases. This may indicate that these markets have effective engagement practices that could be analyzed and deployed in larger markets to improve retention.** Germany stands out as a high-potential market, with a strong retention rate of 20.22% and a moderate base of 13,134 user records, highlighting opportunities for further market penetration and sustained engagement. Italy and the United Kingdom, with nearly identical retention rates of 16.51% and 16.48%, also present opportunities to boost engagement. Meanwhile, Spain has the lowest retention rate in this group (12.88%), and Australia (16.90%) trails the smaller Nordic and Dutch markets, suggesting room for improvement in both. **Overall, these insights suggest significant opportunities to optimize user retention in high-potential markets like Germany and to replicate successful strategies from smaller, high-retention countries. Identifying the gaps in large markets like France and the United States could also significantly enhance overall engagement and profitability.**
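The retention metric is a simple ratio, and recomputing it for a few rows is a quick way to check the figures and to confirm that each combined count is exactly twice the country's 2020 user count. This is a minimal Python sketch, assuming the values are copied by hand from the outputs above; `retention_pct` is an illustrative helper that mirrors the SQL expression, not project code.

```python
# Figures copied from the 3.2 query output and the 2020 platform-usage output above.
combined = {  # country: (total_users across 2020 + 2024, retained_users)
    "France": (50270, 6952),
    "Spain": (11412, 1470),
    "Denmark": (3784, 790),
}
users_2020 = {"France": 25135, "Spain": 5706, "Denmark": 1892}


def retention_pct(total, retained):
    """Share of user rows with at least one wished or bought product, as in the SQL."""
    # Mirrors ROUND(retained * 100.0 / NULLIF(total, 0), 2); returns None for zero rows.
    if total == 0:
        return None
    return round(retained * 100.0 / total, 2)


for country, (total, retained) in combined.items():
    doubled = total == 2 * users_2020[country]
    print(f"{country:<8} retention {retention_pct(total, retained):6.2f}%  "
          f"rows = 2 x 2020 users: {doubled}")
```

If distinct users were wanted instead, the SQL would need `COUNT(DISTINCT user_id)` together with a rule for combining each user's two snapshots; that variant is not what the query above computes.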

### **3.3) What are the top 10 countries with the highest number of users in the High Churn Risk category?**

In [35]:
%%bigquery

WITH AggregateStats AS (
   SELECT
       AVG(followers) AS avg_followers,
       AVG(productsLiked) AS avg_productsLiked
   FROM
       `fall24-ba775-a06.master_dataset.clean_2020`
),
UserActivity AS (
   SELECT
       user_id,
       country,
       productsLiked,
       productsBought,
       followers,
       (productsLiked + productsBought + followers) AS activity_score,
       CASE
           WHEN followers < (SELECT avg_followers FROM AggregateStats) 
                AND productsLiked < (SELECT avg_productsLiked FROM AggregateStats) 
           THEN 'High Churn Risk'
           WHEN followers BETWEEN (SELECT avg_followers FROM AggregateStats) AND 20 
           THEN 'Moderate Churn Risk'
           ELSE 'Low Churn Risk'
       END AS churn_risk
   FROM
       `fall24-ba775-a06.master_dataset.clean_2020`
   UNION ALL
   SELECT
       user_id,
       country,
       productsLiked,
       productsBought,
       followers,
       (productsLiked + productsBought + followers) AS activity_score,
       CASE
           WHEN followers < (SELECT avg_followers FROM AggregateStats) 
                AND productsLiked < (SELECT avg_productsLiked FROM AggregateStats) 
           THEN 'High Churn Risk'
           WHEN followers BETWEEN (SELECT avg_followers FROM AggregateStats) AND 20 
           THEN 'Moderate Churn Risk'
           ELSE 'Low Churn Risk'
       END AS churn_risk
   FROM
       `fall24-ba775-a06.master_dataset.clean_2024`
),
CountryRiskAggregation AS (
   SELECT
       country,
       churn_risk,
       COUNT(user_id) AS total_users,
       SUM(activity_score) AS total_activity_score
   FROM
       UserActivity
   GROUP BY
       country, churn_risk
),
DominantRisk AS (
   SELECT
       country,
       churn_risk,
       total_users,
       total_activity_score,
       RANK() OVER (
           PARTITION BY country 
           ORDER BY total_users DESC, total_activity_score DESC
       ) AS risk_rank 
   FROM
       CountryRiskAggregation
),
TopCountries AS (
   SELECT
       churn_risk AS risk_type,
       ARRAY_AGG(country ORDER BY total_users DESC LIMIT 10) AS top_countries
   FROM
       DominantRisk
   WHERE
       risk_rank = 1
   GROUP BY
       churn_risk
)

-- Final query to output risk type and top countries
SELECT 
   risk_type,
   country
FROM
   TopCountries, UNNEST(top_countries) AS country
WHERE risk_type = "High Churn Risk";

| risk_type | country |
|:---|:---|
| High Churn Risk | France |
| High Churn Risk | United States |
| High Churn Risk | United Kingdom |
| High Churn Risk | Italy |
| High Churn Risk | Germany |
| High Churn Risk | Spain |
| High Churn Risk | Australia |
| High Churn Risk | Sweden |
| High Churn Risk | Denmark |
| High Churn Risk | Belgium |


In this query, users are classified by churn risk using their follower counts and the number of products they liked. High Churn Risk users are those whose followers and liked products are both below the average; Moderate Churn Risk users have followers between the average and 20; everyone else is Low Churn Risk. The averages are computed from the 2020 table only and then applied to both the 2020 and 2024 records, which are combined with `UNION ALL`. The activity score (products liked + products bought + followers) serves only as a tie-breaker when ranking each country's most common risk category. The results are then aggregated by country, providing a regional perspective and enabling strategic resource allocation for retention efforts.
Understanding churn risk helps identify user groups that require intervention to improve retention. Because the main benchmarks are averages, the classification adjusts to the dataset rather than relying only on fixed thresholds. The output lists the countries where High Churn Risk is the most common category, ordered by their number of high-risk users; this order closely mirrors overall market size, so the largest markets naturally appear first. Based on the churn risk analysis, **we recommend that C2C platforms focus on these high-churn-risk countries and investigate more deeply why their users are at risk, which should surface concrete actions to reduce that risk.**
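The `CASE` expression in `UserActivity` is the core of this analysis, and its branches are easier to audit outside SQL. The Python sketch below reproduces that logic on a handful of made-up users (the sample values are hypothetical, not drawn from the dataset); `churn_risk` is an illustrative helper, and the fixed upper bound of 20 followers comes from the query.

```python
from statistics import mean

# Hypothetical sample users: (followers, productsLiked). Not taken from the dataset.
sample_2020 = [(0, 0), (3, 10), (8, 2), (25, 40), (1, 1)]


def churn_risk(followers, liked, avg_followers, avg_liked, upper=20):
    """Label a user with the same CASE logic as the UserActivity CTE."""
    if followers < avg_followers and liked < avg_liked:
        return "High Churn Risk"
    # SQL BETWEEN is inclusive on both ends.
    if avg_followers <= followers <= upper:
        return "Moderate Churn Risk"
    return "Low Churn Risk"


# The SQL computes both averages from the 2020 table only and reuses them for 2024.
avg_followers = mean(f for f, _ in sample_2020)
avg_liked = mean(liked for _, liked in sample_2020)

labels = [churn_risk(f, liked, avg_followers, avg_liked) for f, liked in sample_2020]
print(avg_followers, avg_liked, labels)
```

Tracing the branches shows one side effect of the rule as written: a user with below-average followers but above-average likes, such as `churn_risk(3, 50, 7.4, 10.6)`, falls through to Low Churn Risk.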

### **3.4) Is there a change in type of user engagement between 2020 and 2024?**

In [3]:
%%bigquery
WITH totals AS (
    SELECT 
        '2020' AS year,
        SUM(follows) AS total_follows,
        SUM(followers) AS total_followers,
        SUM(productsLiked) AS total_liked,
        SUM(productsListed) AS total_listed,
        SUM(productsSold) AS total_sold,
        SUM(productsWished) AS total_wished,
        SUM(productsBought) AS total_bought
    FROM `fall24-ba775-a06.master_dataset.clean_2020`

    UNION ALL

    SELECT 
        '2024' AS year,
        SUM(follows) AS total_follows,
        SUM(followers) AS total_followers,
        SUM(productsLiked) AS total_liked,
        SUM(productsListed) AS total_listed,
        SUM(productsSold) AS total_sold,
        SUM(productsWished) AS total_wished,
        SUM(productsBought) AS total_bought
    FROM `fall24-ba775-a06.master_dataset.clean_2024`
)

SELECT
    -- transpose years into columns for each metric...asked ChatGPT
    metric,
    SUM(CASE WHEN year = '2020' THEN total_follows ELSE 0 END) AS `2020`,
    SUM(CASE WHEN year = '2024' THEN total_follows ELSE 0 END) AS `2024`,
    
    -- add aggregated column for percentage change...asked ChatGPT
    CONCAT(
        ROUND(
            (SUM(CASE WHEN year = '2024' THEN total_follows ELSE 0 END) 
            - SUM(CASE WHEN year = '2020' THEN total_follows ELSE 0 END)) / 
            NULLIF(SUM(CASE WHEN year = '2020' THEN total_follows ELSE 0 END), 0) * 100, 1
        ), '%'
    ) AS percent_change
    
FROM (
    SELECT 'total_follows' AS metric, total_follows, year FROM totals
    UNION ALL
    SELECT 'total_followers', total_followers, year FROM totals
    UNION ALL
    SELECT 'total_liked', total_liked, year FROM totals
    UNION ALL
    SELECT 'total_listed', total_listed, year FROM totals
    UNION ALL
    SELECT 'total_sold', total_sold, year FROM totals
    UNION ALL
    SELECT 'total_wished', total_wished, year FROM totals
    UNION ALL
    SELECT 'total_bought', total_bought, year FROM totals
)
GROUP BY metric
ORDER BY
    CASE 
        WHEN metric = 'total_follows' THEN 1
        WHEN metric = 'total_followers' THEN 2
        WHEN metric = 'total_liked' THEN 3
        WHEN metric = 'total_listed' THEN 4
        WHEN metric = 'total_sold' THEN 5
        WHEN metric = 'total_wished' THEN 6
        WHEN metric = 'total_bought' THEN 7
    END;

| metric | 2020 | 2024 | percent_change |
|:---|---:|---:|---:|
| total_follows | 833409 | 967113 | 16% |
| total_followers | 339496 | 502629 | 48.1% |
| total_liked | 437269 | 1383785 | 216.5% |
| total_listed | 9229 | 20626 | 123.5% |
| total_sold | 12027 | 53972 | 348.8% |
| total_wished | 154561 | 432851 | 180.1% |
| total_bought | 17006 | 70204 | 312.8% |


Although the 2024 data is incomplete because the year is still in progress, user engagement has increased substantially. Products liked (+216.5%), wished (+180.1%), and bought (+312.8%) all grew sharply. Social media activity also increased, though less dramatically: follows rose 16% and followers 48.1%. Vendor activity grew as well, with products listed up 123.5% and products sold up 348.8%.

The sharp increase in activity may partly reflect the continued effects of the COVID-19 pandemic, which drove a shift toward online shopping and engagement with online platforms. These metrics suggest that the C2C platform is growing. **The team recommends continued investment in expanding into new markets and acquiring new customers, as well as identifying strategies to further boost activity that directly translates into purchases.**
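The `percent_change` column uses the standard relative-change formula. A short Python sketch, assuming the totals are copied from the table above, recomputes a few rows; `percent_change` is an illustrative helper that mirrors the query's `NULLIF` guard against a zero 2020 value.

```python
# Totals copied from the 3.4 query output above: metric -> (2020, 2024).
totals = {
    "total_follows": (833409, 967113),
    "total_followers": (339496, 502629),
    "total_liked": (437269, 1383785),
    "total_sold": (12027, 53972),
    "total_bought": (17006, 70204),
}


def percent_change(old, new):
    """(new - old) / old * 100, rounded to one decimal; None when old is 0 (like NULLIF)."""
    if old == 0:
        return None
    return round((new - old) / old * 100, 1)


changes = {metric: percent_change(*pair) for metric, pair in totals.items()}
for metric, pct in changes.items():
    print(f"{metric:<16} {pct:+.1f}%")
```

The 348.8% figure describes products sold and is the largest relative increase in the table.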

### **3.5) How do social media presence and engagement affect sold and bought conversion rates, and can they help identify who is a seller or a buyer on this platform?**

#### 3.5.1) Average follower ratio

In [26]:
%%bigquery
SELECT 
    AVG(
        CASE 
            WHEN follows = 0 THEN 0
            ELSE (followers / follows)
        END
    ) AS average_follower_ratio
FROM 
    `fall24-ba775-a06.master_dataset.clean_2020`
WHERE 
    (followers > 0 OR follows > 0) 
    AND (productsListed > 0 OR productsLiked > 0);

| average_follower_ratio |
|---:|
| 0.532367 |


This query computes the average follower ratio across the dataset, where a user's follower ratio is their number of followers divided by the number of users they follow. Users who follow nobody are assigned a ratio of 0, and only users with some social activity (followers or follows) and some product activity (listings or likes) are included. A ratio below 1 means the user follows more users than follow them, which may indicate a buyer who engages with other people's products from the consumer side. A ratio above 1 means the user has more followers than follows, which may indicate a seller whose listings attract engagement from other users. The result, about 0.53, serves as a benchmark for the next two queries, which build on this one by examining sold and bought conversion rates (calculated from products listed, sold, liked, and bought) to look for behavioral patterns that distinguish buyers from sellers on the C2C platform.
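One detail of the query affects how the 0.53 benchmark should be read: users who follow nobody (but have followers) are assigned a ratio of 0 rather than being excluded, which pulls the average down. The Python sketch below shows the effect on four hypothetical users (the values are invented for illustration); `follower_ratio` is an illustrative helper mirroring the SQL `CASE`, and the "excluding" variant is an assumption shown only for comparison.

```python
from statistics import mean

# Hypothetical users as (followers, follows); values are illustrative only.
users = [(10, 20), (5, 5), (30, 10), (4, 0)]


def follower_ratio(followers, follows):
    """Follower ratio as defined in the query: followers / follows, or 0 when follows is 0."""
    return 0.0 if follows == 0 else followers / follows


ratios = [follower_ratio(fr, fo) for fr, fo in users]
avg_as_queried = mean(ratios)
# Alternative (an assumption, not what the query does): skip users with follows = 0.
avg_excluding_zero = mean(r for (fr, fo), r in zip(users, ratios) if fo > 0)
print(ratios, avg_as_queried, avg_excluding_zero)
```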

#### 3.5.2) Top 10 Highest Bought Conversion Rates


In [36]:
%%bigquery
SELECT 
    user_id,  
    CASE 
        WHEN follows = 0 THEN 0
        ELSE ROUND(followers / follows, 3)
    END AS follower_ratio, -- Engagement Metric
    hasProfilePicture, -- Profile Completion
    ROUND(productsSold / NULLIF(productsListed, 0), 3) AS sold_conversion, -- Conversion Rate 1 for sellers (NULL when nothing is listed)
    ROUND(productsBought / NULLIF(productsLiked, 0), 3) AS bought_conversion -- Conversion Rate 2 for buyers (NULL when nothing is liked)
FROM 
    `fall24-ba775-a06.master_dataset.clean_2020`
WHERE 
    (followers > 0 OR follows > 0) 
    AND (productsListed > 0 OR productsLiked > 0)
ORDER BY 
    bought_conversion DESC
LIMIT 10;

| user_id | follower_ratio | hasProfilePicture | sold_conversion | bought_conversion |
|---:|---:|:---|---:|---:|
| 1845100452 | 2.875 | True | 1.0 | 202.5 |
| 1724252068 | 0.7 | True | NULL | 93.0 |
| 1770127268 | 1.556 | True | NULL | 73.0 |
| 1855913892 | 0.625 | True | NULL | 47.0 |
| 3924557731 | 0.333 | True | NULL | 33.0 |
| 1910046628 | 2.0 | True | 3.75 | 31.0 |
| 1841102756 | 1.429 | True | 0.333 | 27.0 |
| 1592655780 | 0.625 | True | NULL | 24.0 |
| 1691549603 | 0.375 | True | NULL | 24.0 |
| 1602682788 | 0.625 | True | NULL | 23.0 |


This query lists the 10 users with the highest bought conversion, calculated as productsBought / productsLiked and used to order the results. A higher value means the user more often followed through on the products they liked with purchases. Because a user can buy more products than they liked, the value can exceed 1 (the top user's is 202.5), so it behaves more like a purchase-to-like ratio than a bounded rate.

All ten of the top purchasers have a profile picture, which suggests they are active on the platform. In addition, seven of the ten have a NULL sold conversion, meaning they have not listed any products, so users with a high bought conversion appear to be mostly buyers. Their follower ratios are mixed: six of the ten are below 1, meaning they follow more users than follow them, which is consistent with buyers who engage with other people's listings rather than selling, although four have ratios above 1. Combined with their high purchase rates, this hints, without establishing it, that following other users may help buyers find products that suit them.
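Both conversion metrics share one pattern: a ratio guarded by `NULLIF`, so a zero denominator yields NULL instead of a division error. The Python sketch below mirrors that behavior, with `None` standing in for SQL `NULL`; the `conversion` helper and the sample counts are hypothetical. It also shows why values above 1 are possible: a user who bought more products than they liked gets a ratio greater than 1.

```python
from typing import Optional


def conversion(numerator: int, denominator: int) -> Optional[float]:
    """Mirror ROUND(numerator / NULLIF(denominator, 0), 3): None stands in for SQL NULL."""
    if denominator == 0:
        return None
    return round(numerator / denominator, 3)


# Hypothetical user: liked 2 products, bought 5, listed nothing, sold nothing.
bought_conversion = conversion(5, 2)  # productsBought / productsLiked
sold_conversion = conversion(0, 0)    # productsSold / productsListed
print(bought_conversion, sold_conversion)
```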

#### 3.5.3) Top 10 Highest Sold Conversion Rates


In [27]:
%%bigquery
SELECT 
    user_id,  
    CASE 
        WHEN follows = 0 THEN 0
        ELSE ROUND(followers / follows, 3)
    END AS follower_ratio, -- Engagement Metric
    hasProfilePicture, -- Profile Completion
    ROUND(productsSold / NULLIF(productsListed, 0), 3) AS sold_conversion, -- Conversion Rate 1 for sellers (NULL when nothing is listed)
    ROUND(productsBought / NULLIF(productsLiked, 0), 3) AS bought_conversion -- Conversion Rate 2 for buyers (NULL when nothing is liked)
FROM 
    `fall24-ba775-a06.master_dataset.clean_2020`
WHERE 
    (followers > 0 OR follows > 0) 
    AND (productsListed > 0 OR productsLiked > 0)
ORDER BY 
    sold_conversion DESC
LIMIT 10;

| user_id | follower_ratio | hasProfilePicture | sold_conversion | bought_conversion |
|---:|---:|:---|---:|---:|
| 1914896292 | 2.5 | True | 26.0 | NULL |
| 1914437540 | 3.375 | True | 24.0 | NULL |
| 1914503076 | 3.875 | False | 24.0 | 0.0 |
| 1914109860 | 1.917 | False | 23.0 | 0.0 |
| 1917910948 | 2.75 | True | 22.0 | NULL |
| 1920466852 | 5.889 | True | 21.2 | 0.0 |
| 1912799140 | 2.444 | True | 20.0 | 0.0 |
| 1912209316 | 2.375 | True | 19.0 | NULL |
| 1919287204 | 7.0 | False | 16.5 | 0.0 |
| 1910177700 | 2.5 | False | 15.0 | 0.0 |


This query lists the 10 users with the highest sold conversion, calculated as productsSold / productsListed. The metric compares how many products a user has sold with how many they currently have listed. A value above 1 means they have sold more products than they currently have listed, which indicates success on the seller side.

A common pattern in this table is that users with a high sold conversion have a NULL or 0 bought conversion. A NULL means they have not liked any products, and a 0 means they liked products but bought none; either way, they are not purchasing from other users, which strongly suggests they use the platform as sellers. In addition, six of the ten have a profile picture, and all ten have a follower ratio above 1, meaning they have more followers than follows. The most successful sellers therefore seem to avoid purchasing, focus on listing and selling, and engage with other users to gain followers who see their products.
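The seller and buyer patterns described above could be turned into a simple labeling rule. The sketch below is a hypothetical heuristic in Python, not a method used in the project; the thresholds (any positive conversion, a follower ratio above 1 for sellers) are assumptions chosen to match the patterns in the two top-10 tables, and `likely_role` is an illustrative name.

```python
from typing import Optional


def likely_role(sold_conv: Optional[float], bought_conv: Optional[float],
                follower_ratio: float) -> str:
    """Hypothetical rule of thumb based on the patterns in the two top-10 tables."""
    # None plays the role of SQL NULL (nothing listed, or nothing liked).
    sells = sold_conv is not None and sold_conv > 0
    buys = bought_conv is not None and bought_conv > 0
    if sells and not buys and follower_ratio > 1:
        return "likely seller"
    if buys and not sells:
        return "likely buyer"
    return "mixed or unclear"


# Rows taken from the two tables above.
print(likely_role(26.0, None, 2.5))   # top sold-conversion user
print(likely_role(None, 93.0, 0.7))   # second-highest bought-conversion user
print(likely_role(3.75, 31.0, 2.0))   # user who both buys and sells
```

A rule like this would need validation on the full dataset before being used for targeting, since both tables show only the extremes of each metric.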

**Sellers may be able to extend their market reach by acquiring more followers, and buyers by following more users. A completed profile with a profile picture may also help, since every top buyer and most top sellers have one, although these top-10 lists cannot establish a causal link. Overall, social media presence appears to matter for user engagement, and users may reach a larger customer base and find better-suited products by expanding their interactions with other users.**

### **3.6) Does preference for how customers are using C2C impact how many products are being bought vs. wishlisted?**

In [31]:
%%bigquery
SELECT
   country,
   COUNT(user_id) AS total_users,
   CONCAT(ROUND(SUM(CASE
               WHEN hasAnyApp = TRUE THEN 1
               ELSE 0
           END) / COUNT(user_id) * 100, 1), '%') AS app_user_percent,
   CONCAT(ROUND(SUM(CASE
               WHEN hasAnyApp = FALSE THEN 1
               ELSE 0
           END) / COUNT(user_id) * 100, 1), '%') AS website_user_percent,
   ROUND(SUM(productsBought) / NULLIF(SUM(productsWished), 0), 2) AS wish_to_buy_ratio --number of products bought for each product wished--
FROM `fall24-ba775-a06.master_dataset.clean_2020`
GROUP BY country
ORDER BY total_users DESC
LIMIT 10;

| country | total_users | app_user_percent | website_user_percent | wish_to_buy_ratio |
|:---|---:|---:|---:|---:|
| France | 25135 | 25% | 75% | 0.16 |
| United States | 20602 | 12.1% | 87.9% | 0.13 |
| United Kingdom | 11310 | 23.5% | 76.5% | 0.12 |
| Italy | 8015 | 38.5% | 61.5% | 0.07 |
| Germany | 6567 | 33.1% | 66.9% | 0.11 |
| Spain | 5706 | 31.4% | 68.6% | 0.1 |
| Australia | 2719 | 35.6% | 64.4% | 0.1 |
| Denmark | 1892 | 41.9% | 58.1% | 0.09 |
| Sweden | 1826 | 45.5% | 54.5% | 0.08 |
| Belgium | 1666 | 42% | 58% | 0.18 |


The results suggest that, in 2020, countries with lower app usage tended to have a higher ratio of purchases per wishlisted item. For instance, France and the United States have lower shares of app users (25% and 12.1%) and correspondingly higher website shares, along with stronger ratios (0.16 and 0.13) than most other countries. Conversely, Sweden and Denmark have fairly high shares of app users (45.5% and 41.9%) but lower ratios (0.08 and 0.09). Belgium is a notable exception, with 42% app users and the highest ratio of all (0.18), so the relationship is loose and based on only ten countries. **To increase the number of purchases per wishlisted item, we recommend that C2C platforms focus their efforts on their websites, whether by improving the user interface or increasing advertising to drive more traffic to it.** If there is a desire to diversify how customers access the platform, consumers could also be incentivized to make purchases through the apps with discounts or exclusive sales.
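The claim that lower app usage goes with a higher wish-to-buy ratio can be checked with a correlation coefficient. The Python sketch below computes Pearson's r on the ten rows above, assuming the rounded percentages and ratios shown in the output are precise enough for a rough check; `pearson` is written out by hand so the block has no dependencies.

```python
from math import sqrt

# (app_user_percent, wish_to_buy_ratio) copied from the 2020 table above.
rows = {
    "France": (25.0, 0.16), "United States": (12.1, 0.13), "United Kingdom": (23.5, 0.12),
    "Italy": (38.5, 0.07), "Germany": (33.1, 0.11), "Spain": (31.4, 0.10),
    "Australia": (35.6, 0.10), "Denmark": (41.9, 0.09), "Sweden": (45.5, 0.08),
    "Belgium": (42.0, 0.18),
}


def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)


r_all = pearson(list(rows.values()))
r_without_belgium = pearson([v for k, v in rows.items() if k != "Belgium"])
print(f"r (all ten countries): {r_all:.2f}")
print(f"r (excluding Belgium): {r_without_belgium:.2f}")
```

On these rounded figures, r is roughly -0.33 across all ten countries and roughly -0.76 once Belgium is excluded, so the negative relationship rests heavily on how Belgium is treated; with only ten countries, neither value is strong evidence on its own.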

In [32]:
%%bigquery
SELECT
   country,
    COUNT(user_id) AS total_users,
    ROUND(SUM(productsBought) / NULLIF(SUM(productsWished), 0), 2) AS wish_to_buy_ratio_2020,
   (SELECT ROUND(SUM(productsBought) / NULLIF(SUM(productsWished), 0), 2)
    FROM `fall24-ba775-a06.master_dataset.clean_2024` AS clean_2024
    WHERE clean_2024.country = clean_2020.country) AS wish_to_buy_ratio_2024,
FROM `fall24-ba775-a06.master_dataset.clean_2020` AS clean_2020
GROUP BY country
ORDER BY total_users DESC
LIMIT 10;

| country | total_users | wish_to_buy_ratio_2020 | wish_to_buy_ratio_2024 |
|:---|---:|---:|---:|
| France | 25135 | 0.16 | 0.19 |
| United States | 20602 | 0.13 | 0.16 |
| United Kingdom | 11310 | 0.12 | 0.2 |
| Italy | 8015 | 0.07 | 0.13 |
| Germany | 6567 | 0.11 | 0.13 |
| Spain | 5706 | 0.1 | 0.16 |
| Australia | 2719 | 0.1 | 0.13 |
| Denmark | 1892 | 0.09 | 0.15 |
| Sweden | 1826 | 0.08 | 0.17 |
| Belgium | 1666 | 0.18 | 0.19 |


From 2020 to 2024, the ratio of products purchased per wishlisted item rose in all ten countries, with the largest gains in Sweden (0.08 to 0.17) and the United Kingdom (0.12 to 0.20). This may partly reflect the COVID-19 pandemic, which led to increased reliance on online purchasing platforms. **While the team's recommendation still prioritizes driving consumers to the website, the rise in online shopping makes it worthwhile to explore additional strategies to expand into new customer markets and drive revenue through both the apps and the website.**
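To put a number on this growth, the Python sketch below computes the change in the ratio for each country, assuming the values are copied from the rounded table above; the variable names are illustrative.

```python
# Ratios copied from the 2020 vs. 2024 table above: country -> (2020, 2024).
wish_to_buy = {
    "France": (0.16, 0.19), "United States": (0.13, 0.16), "United Kingdom": (0.12, 0.20),
    "Italy": (0.07, 0.13), "Germany": (0.11, 0.13), "Spain": (0.10, 0.16),
    "Australia": (0.10, 0.13), "Denmark": (0.09, 0.15), "Sweden": (0.08, 0.17),
    "Belgium": (0.18, 0.19),
}

# Absolute change in products bought per product wishlisted, rounded to two decimals.
change = {c: round(new - old, 2) for c, (old, new) in wish_to_buy.items()}
improved = [c for c, d in change.items() if d > 0]
largest = max(change, key=change.get)
print(f"{len(improved)} of {len(wish_to_buy)} countries improved; "
      f"largest gain: {largest} (+{change[largest]:.2f})")
```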

In [1]:
import base64
from IPython.display import HTML

# Read and encode the image as Base64
with open("Dashboard #1 2020.png", "rb") as img_file:
    b64_image = base64.b64encode(img_file.read()).decode("utf-8")

# Embed the PNG image in the notebook as a Base64 data URI
HTML(f'<img src="data:image/png;base64,{b64_image}" width="500"/>')

In [2]:
import base64
from IPython.display import HTML

# Read and encode the image as Base64
with open("Dashboard #1 2024.png", "rb") as img_file:
    b64_image = base64.b64encode(img_file.read()).decode("utf-8")

# Embed the PNG image in the notebook as a Base64 data URI
HTML(f'<img src="data:image/png;base64,{b64_image}" width="500"/>')

In [3]:
import base64
from IPython.display import HTML

# Read and encode the image as Base64
with open("Dashboard #2 2020.png", "rb") as img_file:
    b64_image = base64.b64encode(img_file.read()).decode("utf-8")

# Embed the PNG image in the notebook as a Base64 data URI
HTML(f'<img src="data:image/png;base64,{b64_image}" width="500"/>')

In [4]:
import base64
from IPython.display import HTML

# Read and encode the image as Base64
with open("Dashboard #2 2024.png", "rb") as img_file:
    b64_image = base64.b64encode(img_file.read()).decode("utf-8")

# Embed the PNG image in the notebook as a Base64 data URI
HTML(f'<img src="data:image/png;base64,{b64_image}" width="500"/>')

## **Critical Observations from EDA phase:**

The EDA phase reveals significant gender, geographic, and behavioral trends on the C2C platform. Female users dominate the platform, though their share has declined from 76.52% in 2020 to 60.81% in 2024. Men, while a minority, show higher wishlist-to-purchase conversion rates, highlighting an opportunity to better engage the platform's primary female audience and to expand offerings for male users to sustain growth. Moreover, the majority of users are identified as "Mrs.," possibly indicating a skew toward married or older women. This imbalance suggests untapped potential among younger or unmarried women, who could be reached through targeted marketing and curated product selections.

Geographically, engagement is concentrated in the platform's home market: France leads in user and activity counts, followed by the U.S. and the U.K. Retention and active-user rates reveal key differences, with smaller European markets such as Denmark and Sweden outperforming these larger countries on retention. Germany, with high retention and a moderate user base, presents an ideal market for expansion, while underperforming markets like Spain and Australia may require targeted re-engagement campaigns. Behavioral insights show that users primarily access the platform through the website, with app engagement varying by region. Enhancing mobile features and incentivizing app use, alongside optimizing web interfaces, can improve overall accessibility and transaction rates.

Our key recommendations include bolstering female user engagement and conversion rates, expanding marketing to younger women and male audiences, and focusing on high-potential markets with proven retention strategies. We also believe user retention can be improved through churn-risk mitigation, incentives for sellers, and region-specific campaigns that drive deeper engagement. As online activity grows post-COVID-19, C2C platforms should continue leveraging this momentum to strengthen both website and app functionality, ensuring a seamless shopping experience that attracts and retains diverse user groups.

# **5. Conclusion**

This analysis of a French consumer-to-consumer platform covers user trends, demographics, consumer behaviors, and key opportunities from 2020 to 2024. The platform has a strong female customer base, but the growing male user presence highlights untapped potential. Country-level insights reveal France, the United States, and the UK as the major markets with the highest activity, while smaller markets like Denmark, Sweden, and the Netherlands show strong retention rates. The platform's buyer and seller dynamics provide further insights: buyers tend to have lower follower ratios, indicating a preference for exploring others' listings, while sellers with higher follower ratios and completed profiles focus on listing and selling. The analysis also found that the website is the primary access channel in every top market, and that in 2020 countries with a larger share of website users, such as France and the United States, tended to convert more wishlisted items into purchases.

### **Recommendations:** 

**Leverage social media engagement:** 
The team encourages sellers to enhance their profiles with photos, grow their follower base, and use social media to boost their sales conversion rates, especially as the number of products sold rose by about 349% between 2020 and 2024. Buyers can be incentivized with personalized offers, discounts, or promotions to increase purchasing activity. The platform could also build on the rise in social media activity since 2020, possibly driven by COVID-19, by encouraging users to engage actively through follows and profile updates.

**Improve regional retention:**
The platform should focus on localized re-engagement campaigns for large markets with lower retention rates, such as France and the US, and replicate effective strategies from smaller, high-retention markets like Denmark and Sweden.

**Expanding product offerings:** 
Developing products for male users could attract more engagement. Targeted marketing could also address the preferences of unmarried women, a notable minority among female users.

**Enhance Mobile App and Website Engagement:**
Optimize mobile app experiences for iOS and Android in key markets like the US and France. Continue strengthening the website, which is the primary access channel in every top market, used by anywhere from 54.5% of users in Sweden to 87.9% in the United States.

**Increase customer access channels:** 
The platform should focus on diversifying customer access channels and improving wishlist-to-purchase conversions. Optimizing the website experience while incentivizing app usage through exclusive sales or discounts can serve different user preferences.

**Expand into untapped markets:** 
Asia and North America may offer growth potential, while targeted advertising and improved retention strategies can address high churn risk in top markets like France, the US, and Italy.

### **Next Steps:** 

**Product and Platform Optimization:** In order to enhance inventory management, analyze “products listed” vs. “products sold” to identify trends in popular and underperforming items. Staying on top of these trends will allow vendors to create and store the products that are landing in customers' hands the fastest.

**Pass Rate Optimization:** Outside of the data that is provided, investigate external factors affecting “product pass rate”. Looking into macro trends across the fashion industry could give enhanced insights as to why products are not being purchased. The price or seasonality of certain items may be playing a major role, but this would have to be investigated further.  

# **6. Challenges**

Our team faced several challenges throughout the project. The first was working with BigQuery in Jupyter Notebooks and with SQL in general. While a couple of members in the group had experience using SQL, the team still faced a significant learning curve in achieving peak efficiency. At times, it felt as though we were starting the project without a thorough understanding of the SQL methods and syntax that could have simplified our queries. However, as the course progressed, we identified areas for improvement and adjusted accordingly. Another technical challenge was working with Tableau. Most of the group were novices with this platform, which made creating presentable dashboards a time-consuming process. Additionally, we discovered during class that Tableau lacks a built-in sharing component, which slowed down both the production of our visualizations and our collaboration on them.

Beyond technology-related issues, we also encountered challenges with the data itself. Since we were working with a French e-commerce C2C dataset, several columns were in French, requiring translation. This is where we utilized joins in our project. However, aside from these joins, we struggled to find many relatable tables to merge with our dataset. While we successfully incorporated country codes, there weren’t many additional data sources available for integration. This limitation reduced our ability to perform deeper analysis and derive more impactful business insights. Nevertheless, we were still able to tell a compelling story with the resources we chose. The final data challenge we faced was related to the dataset's recency. Since the year is not yet complete, not all customer information was up to date. To address this, we populated these rows with the most recent data available in the database to fill the null values.

# **7. References**

Our dataset for the French e-commerce was obtained from [Data World](https://data.world/jfreex/e-commerce-users-of-a-french-c2c-fashion-store/workspace). Additionally, we gathered the country codes from https://datahub.io/core/country-list. Lastly, we have utilized ChatGPT which is listed here: https://chatgpt.com

# **8. Generative AI Disclosure**

For completing this portion of the Team Project, we utilized Generative AI functionalities in a handful of ways. Please see the description below for a detailed outline of how our group used the technology to facilitate the production of this project:

**Content Generation**: We used ChatGPT to brainstorm ideas and structure the initial outline of the project. It was also used to generate ideas for particular business problems and the different kinds of queries that would be interesting during the EDA phase.

**Research Assistance**: ChatGPT was used to quickly summarize our dataset so that we could understand what we were working with. Prior to the data cleaning, some of the columns had less-than-obvious names or industry-specific titles that not everyone understood. Both ChatGPT and Google were helpful in understanding these.

**Code Review and Debugging**: Both ChatGPT and Gemini were utilized in optimizing the performance of our code. These platforms were also used to efficiently deal with errors to keep the project moving forward. Since many of us are new to SQL, we have not seen all of the different error messages, so it was helpful in quickly figuring out where we went wrong.

**Proofreading and Grammar Checks**: The built-in functionality of Word and Google Docs was used to make sure that spelling and grammar were accurate throughout the document. ChatGPT's ability to edit documents for grammar and spelling was also utilized.

Our team has reviewed, edited, and validated all AI-generated content to ensure its accuracy, relevance, and originality in accordance with academic integrity guidelines.