# Consumer Age Behaviour: Data Visualisation

This notebook visualises the encoded `shopping_trends` dataset.


In [21]:

import os
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import matplotlib.pyplot as plt
from matplotlib import cm
import warnings
warnings.filterwarnings('ignore')

The next cell loads the cleaned data produced by the ETL process, which is the main input for the charts in this notebook.

The ETL process has already:
1. Cleaned the raw shopping data
2. Identified elderly customers (Age ≥ 65)
3. Calculated gender distribution
4. Saved pre-processed files in `/dataset/clean/`
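
The ETL code itself is not part of this notebook. As a rough sketch of what steps 2 and 3 might look like, assuming the cleaned data has `Age` and `Gender` columns (hypothetical column names, and toy rows rather than the real dataset):

```python
import pandas as pd

ELDERLY_MIN_AGE = 65  # threshold stated in the ETL summary (Age >= 65)


def summarise_elderly_by_gender(df, age_col="Age", gender_col="Gender"):
    """Return a gender/count table for customers aged 65 or over.

    The column names are assumptions; adjust them to the real schema.
    """
    elderly = df[df[age_col] >= ELDERLY_MIN_AGE]
    return (
        elderly[gender_col]
        .value_counts()
        .rename_axis("gender")
        .reset_index(name="count")
    )


# Toy rows for illustration only (not taken from shopping_trends)
toy = pd.DataFrame({
    "Age": [70, 66, 40, 65, 23, 81],
    "Gender": ["Male", "Female", "Male", "Male", "Female", "Male"],
})
toy_summary = summarise_elderly_by_gender(toy)
print(toy_summary)
```

The output has the same `gender` / `count` layout as `elderly_gender_summary.csv`, so the plotting cells would work on it unchanged.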

In [22]:
elderly_summary = pd.read_csv('../dataset/clean/elderly_gender_summary.csv')
print("✅ Loaded elderly gender summary from ETL")
print(elderly_summary)

✅ Loaded elderly gender summary from ETL
   gender  count
0    Male    542
1  Female    246


**Files produced by the ETL step:**
- `elderly_gender_summary.csv`: Gender counts (542 men, 246 women)
- `shopping_trends_clean.csv`: Full cleaned dataset
- `elderly_customers.csv`: Pre-filtered elderly customers
- `area_data.csv`: Summary data for area plots
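
Because the plots depend on specific column names, it can help to fail early if a file does not have the expected shape. A minimal sketch, assuming a hypothetical helper `load_summary` (not part of the ETL) and using an in-memory CSV with the values printed above in place of the real file:

```python
import io

import pandas as pd


def load_summary(source, required=("gender", "count")):
    """Read a summary CSV and raise if an expected column is missing."""
    df = pd.read_csv(source)
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    return df


# In the notebook, `source` would be '../dataset/clean/elderly_gender_summary.csv'.
# Here an in-memory CSV stands in for the file so the example runs anywhere.
demo_csv = io.StringIO("gender,count\nMale,542\nFemale,246\n")
demo_summary = load_summary(demo_csv)
print(demo_summary)
```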

In [23]:
fig_bar = px.bar(elderly_summary, 
                 x='gender', 
                 y='count',
                 title='Elderly Customers by Gender - From ETL Data',
                 color='gender',
                 color_discrete_map={'Male': 'blue', 'Female': 'pink'},
                 text='count')

fig_bar.update_layout(
    xaxis_title='Gender',
    yaxis_title='Number of Customers',
    template='plotly_white'
)

fig_bar.show()

print(" Creating Pie Chart...")
fig_pie = px.pie(elderly_summary,
                 values='count',
                 names='gender',
                 title='Gender Distribution: Elderly Customers (542 men, 246 women)',
                 hole=0.3,
                 color='gender',
                 color_discrete_map={'Male': 'blue', 'Female': 'pink'})

fig_pie.update_traces(textposition='inside', textinfo='percent+label')
fig_pie.show()


area_data = pd.read_csv('../dataset/clean/area_data.csv')
fig_area = px.area(area_data,
                   x='Category',
                   y='Count',
                   title='Customer Distribution - Area View',
                   markers=True)

fig_area.show()

print("\n✅ All visualizations created from ETL cleaned data!")

 Creating Pie Chart...



✅ All visualizations created from ETL cleaned data!


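The next cell builds a bar chart that highlights the gap between elderly men and women. As written it renders a static figure, because no `animation_frame` is passed to `px.bar`.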

In [None]:

counts = elderly_summary.set_index('gender')['count'].to_dict()
men_count = int(counts.get('Male', counts.get('male', 0)))
women_count = int(counts.get('Female', counts.get('female', 0)))

animation_data = pd.DataFrame({
    'Gender': ['Men', 'Women', 'Difference'],
    'Count': [men_count, women_count, men_count - women_count],
    'Stage': ['Base', 'Base', 'Highlight']
})

fig_animated = px.bar(
    animation_data,
    x='Gender',
    y='Count',
    color='Stage',
    color_discrete_map={'Base': 'lightgray', 'Highlight': 'red'},
    title='<b>Gender Difference Visualization</b><br>Highlighting the Gap',
    text='Count',
    template='plotly_dark'
)

fig_animated.update_layout(
    title_font=dict(size=20),
    xaxis_title="",
    yaxis_title="Number of Customers",
    showlegend=False
)

fig_animated.show()
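
The chart above is drawn without an `animation_frame`, so Plotly renders it as a static figure. If an animated version is wanted, one option is to give Plotly Express one set of rows per frame. The sketch below is only an illustration: the `Step` column, the frame labels and both helper functions are hypothetical names, and the counts are the ones printed earlier in this notebook.

```python
import pandas as pd


def build_animation_frames(men_count, women_count):
    """Build one row per bar per frame.

    Frame 1 shows only the two counts (the 'Difference' bar is 0);
    frame 2 reveals the gap between them.
    """
    gap = men_count - women_count
    rows = []
    for step, diff_value in [("1: counts", 0), ("2: gap", gap)]:
        rows.append({"Step": step, "Gender": "Men", "Count": men_count})
        rows.append({"Step": step, "Gender": "Women", "Count": women_count})
        rows.append({"Step": step, "Gender": "Difference", "Count": diff_value})
    return pd.DataFrame(rows)


def make_animated_bar(frames):
    """Return a Plotly figure with a play button and a slider over 'Step'."""
    import plotly.express as px  # imported here so the data step runs without Plotly

    return px.bar(
        frames,
        x="Gender",
        y="Count",
        animation_frame="Step",
        # Fix the y-axis so bars do not rescale between frames
        range_y=[0, frames["Count"].max() * 1.1],
        text="Count",
        template="plotly_dark",
    )


frames = build_animation_frames(542, 246)
# In the notebook: make_animated_bar(frames).show()
```

Every frame needs the same set of `Gender` values; otherwise bars appear and disappear between frames instead of growing.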


**Key Finding from ETL:**
- Total elderly customers: 788
- Elderly men: 542 (68.8%)
- Elderly women: 246 (31.2%)
- **Hypothesis SUPPORTED**: the dataset contains more elderly men than elderly women customers
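
The percentages above follow directly from the two counts. A short check, using a hypothetical helper `gender_shares` so the figures can be recomputed if the ETL output changes:

```python
def gender_shares(counts):
    """Return each group's share of the total as a percentage (1 decimal place)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("No customers to summarise")
    return {group: round(100 * n / total, 1) for group, n in counts.items()}


shares = gender_shares({"Male": 542, "Female": 246})
print(shares)  # {'Male': 68.8, 'Female': 31.2}
```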