In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session


# Customer Shopping Behavior Analysis

## Project Objective
The goal of this project is to analyze customer shopping behavior to uncover spending patterns, category preferences, and the impact of discounts on purchase amounts. The insights from this analysis can help businesses improve pricing strategies and customer targeting.



In [None]:
## Import all the required packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(style="whitegrid")

In [None]:
## Load the dataset
df = pd.read_csv("/kaggle/input/customer-shopping-trends-dataset/shopping_trends_updated.csv")
df.head()

In [None]:
## Shape of the data (rows, columns); printed because a notebook cell only displays its last expression
print(df.shape)

## Column names and data types
df.info()

## Statistical Summary
df.describe()

## Dataset Overview

The dataset contains customer-level shopping data, including demographic information, product categories, purchase amounts, and discount usage. Each row represents an individual customer transaction.
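The analysis treats each row as one transaction. Two quick checks support that: confirm that no row is an exact duplicate, and, if an identifier column exists, check whether it is unique per row. The sketch below uses a hypothetical helper, `check_row_granularity`. The identifier name `"Customer ID"` and the toy rows are assumptions for illustration and are not values from the dataset.

```python
import pandas as pd

def check_row_granularity(frame: pd.DataFrame, id_col: str) -> dict:
    """Summarize whether each row looks like a distinct record.

    id_col is an assumed identifier column; pass whatever the file actually uses.
    """
    return {
        "rows": len(frame),
        # rows identical to an earlier row in every column
        "exact_duplicate_rows": int(frame.duplicated().sum()),
        # True only if no identifier value repeats
        "id_is_unique": bool(frame[id_col].is_unique),
    }

# Hypothetical toy data, not taken from the real dataset
toy = pd.DataFrame({
    "Customer ID": [1, 2, 3, 3],
    "Category": ["Clothing", "Footwear", "Clothing", "Clothing"],
    "Purchase Amount (USD)": [53, 64, 73, 73],
})

summary = check_row_granularity(toy, "Customer ID")
print(summary)
```

On the loaded file, the call would be `check_row_granularity(df, "Customer ID")`, made before the renaming step and assuming that column exists.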


# Data Cleaning + Column Renaming

## Data Cleaning
In this section, we clean the dataset by checking for missing values, ensuring correct data types, and standardizing column names for easier analysis.


In [None]:
## Check for number of missing values in each column
df.isnull().sum()
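
Raw counts are easier to judge as a share of all rows. Below is a minimal sketch of a per-column report that shows both counts and percentages. `missing_report` is a hypothetical helper, and the toy frame is made up.

```python
import numpy as np
import pandas as pd

def missing_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, most-missing first."""
    counts = frame.isna().sum()
    report = pd.DataFrame({
        "missing": counts,
        "pct_missing": (counts / len(frame) * 100).round(1),
    })
    return report.sort_values("missing", ascending=False)

# Hypothetical toy data with a few gaps (NaN and None both count as missing)
toy = pd.DataFrame({
    "age": [25, np.nan, 40, 33],
    "gender": ["Male", "Female", None, None],
    "purchase_amount_usd": [50, 60, 70, 80],
})

report = missing_report(toy)
print(report)
```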

In [None]:
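## Clean up column names: lower case, spaces replaced by underscores, parentheses removed
## (regex=False makes "(" and ")" literal characters rather than regex syntax)
df.columns = (
    df.columns
        .str.strip()
        .str.lower()
        .str.replace(" ", "_", regex=False)
        .str.replace("(", "", regex=False)
        .str.replace(")", "", regex=False)
)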

df.columns
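
The renaming cell only handles spaces and parentheses. A header containing other punctuation, such as `?` or `/`, would need a regex-based variant that collapses every run of non-alphanumeric characters into a single underscore. This is a sketch: `snake_case_columns` is a hypothetical name. The first two sample headers are assumed originals of `purchase_amount_usd` and `discount_applied`, and the other two are invented.

```python
import pandas as pd

def snake_case_columns(columns: pd.Index) -> pd.Index:
    """Lower-case names, turn any run of non-alphanumeric characters into a
    single underscore, and trim leading/trailing underscores."""
    return (
        columns.str.strip()
        .str.lower()
        .str.replace(r"[^0-9a-z]+", "_", regex=True)
        .str.strip("_")
    )

# Illustrative headers (see the lead-in for which ones are invented)
raw = pd.Index(["Purchase Amount (USD)", "Discount Applied", "Promo Code Used?", " Size/Fit "])
print(list(snake_case_columns(raw)))
```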

In [None]:
## Check for any inconsistencies in the data types of columns
df.dtypes

### Cleaning Summary
- The dataset contains minimal missing values.
- Column names were standardized for consistency.
- Data types were verified and found to be appropriate for analysis.
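
The dtype check above relies on reading the output. A small programmatic version lists every column that should be numeric but is not, then coerces it with `pd.to_numeric(..., errors="coerce")`, which turns unparseable entries into `NaN`. The helper name and the list of expected numeric columns are assumptions, and the toy frame is invented.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

def non_numeric_columns(frame: pd.DataFrame, expected_numeric: list[str]) -> list[str]:
    """Return the expected-numeric columns whose dtype is not numeric."""
    return [col for col in expected_numeric if not is_numeric_dtype(frame[col])]

# Hypothetical toy frame in which purchase amounts were read as text
toy = pd.DataFrame({
    "age": [25, 40],
    "purchase_amount_usd": ["53", "64"],
})

bad = non_numeric_columns(toy, ["age", "purchase_amount_usd"])
print(bad)  # columns needing conversion

# Repair step: coerce to numbers, turning unparseable values into NaN
for col in bad:
    toy[col] = pd.to_numeric(toy[col], errors="coerce")
print(toy.dtypes)
```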


# Exploratory Data Analysis

## Exploratory Data Analysis (EDA)
In this section, we explore customer spending patterns, category performance, and the impact of demographic factors and discounts on purchase behavior.


## Q1. How does spending vary by gender?

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

sns.barplot(data=df, x="gender", y="purchase_amount_usd")
plt.title("Average Purchase Amount by Gender")
plt.xlabel("Gender")
plt.ylabel("Average Purchase Amount (USD)")
plt.show()
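
By default, `sns.barplot` draws the mean of each group with an error bar, which hides group sizes and skew. The sketch below builds a companion table with the count, mean, and median per group. `spend_summary` is a hypothetical helper, and the toy rows are illustrative only.

```python
import pandas as pd

def spend_summary(frame: pd.DataFrame, group_col: str,
                  value_col: str = "purchase_amount_usd") -> pd.DataFrame:
    """Count, mean and median of value_col for each level of group_col."""
    return (
        frame.groupby(group_col)[value_col]
        .agg(["count", "mean", "median"])
        .round(2)
    )

# Hypothetical toy data, for illustration only
toy = pd.DataFrame({
    "gender": ["Male", "Male", "Male", "Female", "Female"],
    "purchase_amount_usd": [20, 40, 90, 50, 70],
})

table = spend_summary(toy, "gender")
print(table)
```

On the notebook's data, the call would be `spend_summary(df, "gender")`. The same helper works for other grouping columns such as `category`.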


## Q2. Which product categories generate the most revenue?

In [None]:
category_revenue = (
    df.groupby("category")["purchase_amount_usd"]
        .sum()
        .sort_values(ascending=False)
)

category_revenue.plot(kind="bar", figsize=(10, 5))
plt.title("Total Product Revenue by Category")
plt.xlabel("Category")
plt.ylabel("Total Revenue Generated (USD)")
plt.show()  # render the figure without echoing the last Text object

**Insight:**  
A small number of product categories contribute disproportionately to total revenue, indicating key focus areas for inventory and promotions.
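
To quantify this claim, compute each category's share of total revenue and the running cumulative share. The cumulative share shows how many categories it takes to reach, for example, 80% of revenue. The sketch uses the notebook's column names, but the category labels `A`, `B`, `C` and the amounts are invented, and `revenue_share` is a hypothetical helper.

```python
import pandas as pd

def revenue_share(frame: pd.DataFrame, group_col: str = "category",
                  value_col: str = "purchase_amount_usd") -> pd.DataFrame:
    """Revenue per group, its share of the total, and the running (cumulative) share."""
    revenue = frame.groupby(group_col)[value_col].sum().sort_values(ascending=False)
    share = revenue / revenue.sum()
    return pd.DataFrame({
        "revenue": revenue,
        "share": share.round(3),
        "cumulative_share": share.cumsum().round(3),
    })

# Hypothetical toy data
toy = pd.DataFrame({
    "category": ["A", "A", "B", "C", "A", "B"],
    "purchase_amount_usd": [100, 200, 100, 50, 100, 150],
})

shares = revenue_share(toy)
print(shares)
```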



## Q3. Do discounts increase the purchase amount?

In [None]:
sns.boxplot(data=df, x="discount_applied", y="purchase_amount_usd")
plt.title("Purchase Amount With vs Without Discount")
plt.xlabel("Discount Applied")
plt.ylabel("Purchase Amount (USD)")
plt.show()

**Insight:**  
While discounts can encourage purchases, higher spending is not guaranteed. This suggests the need for targeted discount strategies rather than blanket promotions.
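
The box plot compares the two distributions, but it does not show whether the gap between them is larger than chance. One option, swapped in here and not part of the original analysis, is a two-sided permutation test on the difference in mean purchase amount. The sketch assumes `discount_applied` uses `"Yes"`/`"No"` labels. The toy data is made up, and its two groups have equal means, so the p-value comes out as 1.

```python
import numpy as np
import pandas as pd

def permutation_test_mean_diff(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for the difference in means (a - b).

    Returns the observed difference and the share of random relabelings whose
    absolute difference is at least as large as the observed one.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(pooled)
        diff = shuffled[: len(a)].mean() - shuffled[len(a):].mean()
        if abs(diff) >= abs(observed):
            hits += 1
    # add-one correction keeps the p-value away from exactly zero
    p_value = (hits + 1) / (n_permutations + 1)
    return observed, p_value

# Hypothetical toy data with assumed "Yes"/"No" labels
toy = pd.DataFrame({
    "discount_applied": ["Yes", "Yes", "Yes", "No", "No", "No"],
    "purchase_amount_usd": [55, 60, 65, 50, 60, 70],
})
with_discount = toy.loc[toy["discount_applied"] == "Yes", "purchase_amount_usd"]
without_discount = toy.loc[toy["discount_applied"] == "No", "purchase_amount_usd"]

observed, p_value = permutation_test_mean_diff(with_discount, without_discount, n_permutations=2_000)
print(f"mean difference (Yes - No): {observed:.2f}, p-value: {p_value:.3f}")
```

On the real data, pass the two slices of `df` instead of `toy`. A small p-value would suggest that the mean difference is unlikely under random labeling. It would not, by itself, show that discounts cause the difference.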


## Q4. Is there a relationship between age and spending behavior?

In [None]:
sns.scatterplot(data=df, x="age", y="purchase_amount_usd", alpha=0.6)
plt.title("Age vs Purchase Amount")
plt.xlabel("Age")
plt.ylabel("Purchase Amount (USD)")
plt.show()

**Insight:**  
The scatter plot shows no strong correlation between customer age and purchase amount. Customers across different age groups exhibit similar spending behavior, suggesting that age alone may not be a significant driver of purchase value.
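
To back the visual impression with numbers, the sketch below computes the Pearson and Spearman correlations and the mean purchase amount per age band. Spearman is computed as the Pearson correlation of ranks, which avoids a SciPy dependency. The band edges are arbitrary choices, `age_spend_relationship` is a hypothetical helper, and the toy data is invented. The toy data is deliberately monotonic, so Spearman comes out as 1.

```python
import pandas as pd

def age_spend_relationship(frame: pd.DataFrame,
                           age_col: str = "age",
                           value_col: str = "purchase_amount_usd"):
    """Pearson and Spearman correlation plus mean spend per age band."""
    pearson = frame[age_col].corr(frame[value_col])
    # Spearman = Pearson correlation of the ranks (ties get average ranks)
    spearman = frame[age_col].rank().corr(frame[value_col].rank())
    # Assumed, arbitrary band edges; intervals are right-closed, e.g. (17, 29]
    bands = pd.cut(frame[age_col], bins=[17, 29, 44, 59, 120],
                   labels=["18-29", "30-44", "45-59", "60+"])
    band_means = frame.groupby(bands, observed=True)[value_col].mean()
    return pearson, spearman, band_means

# Hypothetical toy data
toy = pd.DataFrame({
    "age": [20, 25, 35, 40, 50, 55, 62, 68],
    "purchase_amount_usd": [10, 20, 30, 40, 50, 60, 70, 80],
})

pearson, spearman, band_means = age_spend_relationship(toy)
print(f"Pearson r = {pearson:.3f}, Spearman rho = {spearman:.3f}")
print(band_means)
```

Correlations near zero on the real data, together with similar band means, would support the insight above. A single coefficient can still miss non-monotonic patterns, which is why the per-band means are included.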


# Key Insights & Business Takeaways



### Key Insights
- Customer spending is relatively consistent across different age groups, indicating that age alone is not a strong predictor of purchase amount.
- A small number of product categories contribute disproportionately to total revenue.
- Discounts influence purchasing behavior, but applying a discount does not always result in a higher purchase amount (`discount_applied` records whether a discount was used, not its size).
- Spending patterns vary across demographic segments, suggesting opportunities for targeted marketing.
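
Segment-level differences can be inspected with a pivot table of mean purchase amount by category and gender. A count of transactions per cell sits alongside it, because small cells give noisy means. A sketch on invented toy rows:

```python
import pandas as pd

# Hypothetical toy data, for illustration only
toy = pd.DataFrame({
    "gender": ["Male", "Male", "Female", "Female", "Male", "Female"],
    "category": ["Clothing", "Footwear", "Clothing", "Footwear", "Clothing", "Clothing"],
    "purchase_amount_usd": [40, 80, 60, 50, 60, 70],
})

# Mean purchase amount for each category x gender segment
segment_means = toy.pivot_table(
    index="category",
    columns="gender",
    values="purchase_amount_usd",
    aggfunc="mean",
)
# Number of transactions behind each mean
segment_counts = pd.crosstab(toy["category"], toy["gender"])

print(segment_means.round(1))
print(segment_counts)
```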

### Business Takeaways
- Businesses should prioritize high-performing product categories to maximize revenue impact.
- Discount strategies should be targeted and data-driven rather than applied uniformly across all customers.
- Customer segmentation based on purchasing behavior and category preferences may be more effective than age-based targeting alone.
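
A targeted discount strategy could start from the categories where discounted transactions show higher spending. This minimal sketch assumes `discount_applied` holds `"Yes"`/`"No"`. For each category, it computes the median purchase amount with a discount minus the median without one. `discount_gap_by_category` is a hypothetical helper, and the toy data is made up. A positive gap shows an association, not proof that the discount caused the larger purchase.

```python
import pandas as pd

def discount_gap_by_category(frame: pd.DataFrame) -> pd.Series:
    """Median purchase amount with discount minus without, per category.

    Assumes discount_applied holds "Yes"/"No" labels.
    """
    medians = (
        frame.groupby(["category", "discount_applied"])["purchase_amount_usd"]
        .median()
        .unstack("discount_applied")  # one column per label
    )
    return (medians["Yes"] - medians["No"]).sort_values(ascending=False)

# Hypothetical toy data
toy = pd.DataFrame({
    "category": ["Clothing"] * 4 + ["Accessories"] * 4,
    "discount_applied": ["Yes", "Yes", "No", "No"] * 2,
    "purchase_amount_usd": [70, 80, 50, 60, 40, 50, 60, 70],
})

gap = discount_gap_by_category(toy)
print(gap)
```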


In [None]:
# Export aggregated data for frontend use
gender_spending = df.groupby("gender")["purchase_amount_usd"].mean()
category_revenue = df.groupby("category")["purchase_amount_usd"].sum()

gender_spending.to_csv("gender_spending.csv")
category_revenue.to_csv("category_revenue.csv")
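
The notebook only writes CSV. If the frontend consumes JSON instead (an assumption), `DataFrame.to_json(orient="records")` produces a list of row objects. The sketch below runs on toy data and names the value column explicitly. It uses a separate variable name so it does not overwrite the notebook's `category_revenue`.

```python
import json
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "category": ["Clothing", "Footwear", "Clothing"],
    "purchase_amount_usd": [50, 70, 30],
})

toy_revenue = (
    toy.groupby("category")["purchase_amount_usd"]
    .sum()
    .rename("total_revenue_usd")  # descriptive name for the value column
    .reset_index()
)
# orient="records" gives one {column: value} object per row
payload = toy_revenue.to_json(orient="records")
records = json.loads(payload)
print(records)
```

To write a file instead of returning a string, pass a path, e.g. `to_json("category_revenue.json", orient="records")`.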

