# Exploratory Data Analysis (EDA)

## Project: E-commerce Shopper Behavior Analysis

This notebook explores customer behavior data to identify patterns, trends, and potential business insights.  
The goal is to support data-driven decision-making in e-commerce environments.





In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Show every column when a wide DataFrame is displayed instead of eliding the middle ones.
pd.set_option("display.max_columns", None)


In [2]:
# Local path to the dataset CSV; adjust it to wherever the file is stored.
file_path = "c:\\Users\\Jenifer\\Downloads\\e_commerce_shopper_behaviour_and_lifestyle.csv"
df = pd.read_csv(file_path)

df.shape


(1000000, 60)
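
With one million rows and 60 columns loaded, a quick quality pass before any plotting can be useful. The sketch below is a minimal illustration, not part of the original notebook: `quality_report` is a hypothetical helper name, and the small frame it runs on is made up so the block works without the CSV. It counts missing and distinct values per column, plus fully duplicated rows and memory use.

```python
import pandas as pd


def quality_report(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing count, distinct count."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "n_missing": df.isna().sum(),
            "n_unique": df.nunique(dropna=True),
        }
    )


# Tiny made-up frame standing in for the real data (values are illustrative only).
sample = pd.DataFrame(
    {
        "user_id": [1, 2, 3, 3],
        "age": [56, None, 46, 46],
        "country": ["Germany", "Japan", "India", "India"],
    }
)

report = quality_report(sample)

# Rows that repeat an earlier row exactly.
n_duplicate_rows = int(sample.duplicated().sum())

# deep=True also measures the Python strings held in text columns.
memory_mb = sample.memory_usage(deep=True).sum() / 1024**2

print(report)
print(f"duplicate rows: {n_duplicate_rows}, memory: {memory_mb:.4f} MB")
```

On the real data the same calls would be `quality_report(df)` and `df.duplicated().sum()`; on a frame this size the `deep=True` memory figure may take a moment to compute.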

In [3]:
df.head()

Unnamed: 0,user_id,age,gender,country,urban_rural,income_level,employment_status,education_level,relationship_status,has_children,household_size,occupation,ethnicity,language_preference,device_type,weekly_purchases,monthly_spend,cart_abandonment_rate,review_writing_frequency,average_order_value,preferred_payment_method,coupon_usage_frequency,loyalty_program_member,referral_count,product_category_preference,shopping_time_of_day,weekend_shopper,impulse_purchases_per_month,browse_to_buy_ratio,return_frequency,budgeting_style,brand_loyalty_score,impulse_buying_score,environmental_consciousness,health_conscious_shopping,travel_frequency,hobby_count,social_media_influence_score,reading_habits,exercise_frequency,stress_from_financial_decisions,overall_stress_level,sleep_quality,physical_activity_level,mental_health_score,daily_session_time_minutes,product_views_per_day,ad_views_per_day,ad_clicks_per_day,wishlist_items_count,cart_items_average,checkout_abandonments_per_month,purchase_conversion_rate,app_usage_frequency,notification_response_rate,account_age_months,last_purchase_date,social_sharing_frequency,premium_subscription,return_rate
0,1,56,Female,Germany,Suburban,90860,Self-employed,Associate Degree,Single,0,5,Healthcare,Other,English,Mobile,4,2405,0,3,445,PayPal,3,1,8,Groceries,Morning,1,2,12,1,Strict,8,1,4,1,8,0,2,22,4,0,0,5,2,7,100,38,14,1,5,10,2,62,7,74,19,2025-06-22,6,1,50
1,2,69,Male,Japan,Suburban,35423,Unemployed,Bachelor,Single,1,2,Finance,Other,Mandarin,Mobile,13,3651,28,6,179,Google Pay,0,1,3,Groceries,Afternoon,1,3,93,12,Loose,4,3,7,1,4,1,5,14,4,4,5,9,8,5,28,19,9,2,17,5,7,54,5,23,8,2026-07-25,3,0,37
2,3,46,Female,India,Urban,21467,Self-employed,Associate Degree,Married,1,6,Healthcare,Other,Hindi,Mobile,10,2045,14,1,26,PayPal,0,0,10,Beauty,Afternoon,0,2,68,5,Moderate,2,2,5,1,6,2,0,14,4,0,0,8,10,10,61,38,1,4,18,3,3,33,7,12,13,2026-02-26,6,0,53
3,4,32,Male,Canada,Urban,41770,Self-employed,Bachelor,Widowed,0,4,Engineering,Hispanic,Hindi,Desktop,16,1611,11,2,403,Credit Card,2,1,0,Groceries,Afternoon,1,3,74,2,Moderate,3,2,0,1,12,1,1,15,0,2,0,9,2,1,78,35,9,2,8,5,9,26,4,19,9,2026-10-27,7,0,98
4,5,60,Female,Japan,Urban,183882,Employed,Associate Degree,Widowed,1,9,Other,Asian,English,Desktop,17,3476,57,3,68,Apple Pay,0,1,0,Groceries,Evening,0,6,34,12,Loose,7,8,9,0,6,5,10,20,5,8,10,5,3,2,27,50,16,0,10,8,0,18,7,30,3,2026-06-23,3,0,86
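
The preview shows `last_purchase_date` as ISO-formatted text, several label columns with a few repeated values, and 0/1 flags such as `loyalty_program_member`. `pd.read_csv` does not parse dates unless asked to, so a conversion step along these lines may help. This is a sketch under assumptions: `tidy_types` is a hypothetical helper, the choice of which columns to convert is ours, and the demo uses three rows copied from the preview rather than the full file.

```python
import pandas as pd


def tidy_types(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with parsed dates, categorical labels, and boolean flags."""
    out = df.copy()
    # errors="coerce" turns unparseable strings into NaT instead of raising.
    out["last_purchase_date"] = pd.to_datetime(
        out["last_purchase_date"], format="%Y-%m-%d", errors="coerce"
    )
    # Low-cardinality text columns take less memory as categoricals.
    for col in ["gender", "country", "device_type"]:
        out[col] = out[col].astype("category")
    # 0/1 indicator columns read more clearly as booleans.
    for col in ["loyalty_program_member", "premium_subscription"]:
        out[col] = out[col].astype(bool)
    return out


# First three rows of the preview, restricted to a handful of columns.
preview = pd.DataFrame(
    {
        "gender": ["Female", "Male", "Female"],
        "country": ["Germany", "Japan", "India"],
        "device_type": ["Mobile", "Mobile", "Mobile"],
        "loyalty_program_member": [1, 1, 0],
        "premium_subscription": [1, 0, 0],
        "last_purchase_date": ["2025-06-22", "2026-07-25", "2026-02-26"],
    }
)

typed = tidy_types(preview)
print(typed.dtypes)
```

Keeping the conversion in a function that returns a copy leaves the raw `df` untouched, which makes it easy to compare before and after.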


In [4]:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 60 columns):
 #   Column                           Non-Null Count    Dtype 
---  ------                           --------------    ----- 
 0   user_id                          1000000 non-null  int64 
 1   age                              1000000 non-null  int64 
 2   gender                           1000000 non-null  object
 3   country                          1000000 non-null  object
 4   urban_rural                      1000000 non-null  object
 5   income_level                     1000000 non-null  int64 
 6   employment_status                1000000 non-null  object
 7   education_level                  1000000 non-null  object
 8   relationship_status              1000000 non-null  object
 9   has_children                     1000000 non-null  int64 
 10  household_size                   1000000 non-null  int64 
 11  occupation                       1000000 non-null  object
 12  e

In [5]:
df.describe()

Unnamed: 0,user_id,age,income_level,has_children,household_size,weekly_purchases,monthly_spend,cart_abandonment_rate,review_writing_frequency,average_order_value,coupon_usage_frequency,loyalty_program_member,referral_count,weekend_shopper,impulse_purchases_per_month,browse_to_buy_ratio,return_frequency,brand_loyalty_score,impulse_buying_score,environmental_consciousness,health_conscious_shopping,travel_frequency,hobby_count,social_media_influence_score,reading_habits,exercise_frequency,stress_from_financial_decisions,overall_stress_level,sleep_quality,physical_activity_level,mental_health_score,daily_session_time_minutes,product_views_per_day,ad_views_per_day,ad_clicks_per_day,wishlist_items_count,cart_items_average,checkout_abandonments_per_month,purchase_conversion_rate,app_usage_frequency,notification_response_rate,account_age_months,social_sharing_frequency,premium_subscription,return_rate
count,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0,1000000.0
mean,500000.5,49.003377,104994.565463,0.399426,5.505323,9.993011,2498.775654,40.212257,3.450584,255.031632,2.002277,0.499818,5.004928,0.500035,3.29001,55.000136,6.000272,4.999277,4.998423,5.005931,0.500046,5.993844,2.499994,5.004101,11.995084,3.501375,4.998577,4.999507,6.498723,4.993603,5.002855,60.014414,25.018014,10.001036,2.498174,9.99766,5.494336,4.997523,50.001442,3.49884,49.989012,12.50956,3.593763,0.359415,50.004949
std,288675.278933,18.193959,54851.476652,0.489781,2.873725,6.055124,1444.208674,25.433343,1.856725,141.708466,1.41541,0.5,3.162728,0.5,1.819482,26.268341,3.7411,3.161723,3.216289,3.164084,0.5,3.740345,1.708171,3.159897,7.209871,2.291257,3.16522,3.215078,1.706388,3.162574,3.160619,34.939233,14.72181,6.056522,1.708079,6.05453,2.870467,3.162831,29.163437,2.291567,29.151948,6.922197,1.932566,0.479829,29.159616
min,1.0,18.0,10000.0,0.0,1.0,0.0,0.0,0.0,0.0,10.0,0.0,0.0,0.0,0.0,0.0,10.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
25%,250000.75,33.0,57466.0,0.0,3.0,5.0,1249.0,18.0,2.0,132.0,1.0,0.0,2.0,0.0,2.0,32.0,3.0,2.0,2.0,2.0,0.0,3.0,1.0,2.0,6.0,2.0,2.0,2.0,5.0,2.0,2.0,30.0,12.0,5.0,1.0,5.0,3.0,2.0,25.0,1.0,25.0,7.0,2.0,0.0,25.0
50%,500000.5,49.0,105013.0,0.0,6.0,10.0,2498.0,40.0,3.0,255.0,2.0,0.0,5.0,1.0,3.0,55.0,6.0,5.0,5.0,5.0,1.0,6.0,2.0,5.0,12.0,4.0,5.0,5.0,6.0,5.0,5.0,60.0,25.0,10.0,2.0,10.0,5.0,5.0,50.0,3.0,50.0,13.0,4.0,0.0,50.0
75%,750000.25,65.0,152497.0,1.0,8.0,15.0,3750.0,62.0,5.0,378.0,3.0,1.0,8.0,1.0,5.0,78.0,9.0,8.0,8.0,8.0,1.0,9.0,4.0,8.0,18.0,6.0,8.0,8.0,8.0,8.0,8.0,90.0,38.0,15.0,4.0,15.0,8.0,8.0,75.0,6.0,75.0,19.0,5.0,1.0,75.0
max,1000000.0,80.0,200000.0,1.0,10.0,20.0,5000.0,90.0,8.0,500.0,4.0,1.0,10.0,1.0,7.0,100.0,12.0,10.0,10.0,10.0,1.0,12.0,5.0,10.0,24.0,7.0,10.0,10.0,9.0,10.0,10.0,120.0,50.0,20.0,5.0,20.0,10.0,10.0,100.0,7.0,100.0,24.0,8.0,1.0,100.0
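
In the summary above, the mean and median sit close together for most columns (for example, `age` has a mean of about 49.0 and a median of 49.0), which hints at roughly symmetric distributions. One way to check this across all numeric columns at once is sketched below. `symmetry_table` is a hypothetical helper, and the demo frame is synthetic, generated only to show what a symmetric and a skewed column look like in the output.

```python
import numpy as np
import pandas as pd


def symmetry_table(df: pd.DataFrame) -> pd.DataFrame:
    """Compare mean, median, and sample skewness for every numeric column."""
    numeric = df.select_dtypes(include="number")
    table = pd.DataFrame(
        {
            "mean": numeric.mean(),
            "median": numeric.median(),
            "skew": numeric.skew(),
        }
    )
    # Most asymmetric columns first, regardless of the sign of the skew.
    return table.sort_values("skew", key=lambda s: s.abs(), ascending=False)


# Synthetic data (an assumption for the demo, not drawn from the real dataset).
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    {
        "uniform_like": rng.integers(18, 81, size=5_000),
        "right_skewed": rng.exponential(scale=100.0, size=5_000),
    }
)

table = symmetry_table(demo)
print(table)
```

A skew near zero together with a mean close to the median is consistent with a symmetric distribution, though a histogram is still the more direct check.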


## Business Problem

E-commerce companies need to understand customer behavior to increase conversion rates, reduce cart abandonment, and improve customer retention.

This analysis aims to identify key behavioral, demographic, and transactional factors that influence purchasing decisions.
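
A first pass at finding such factors could rank numeric columns by their association with a target metric and compare the target across groups. The sketch below assumes `purchase_conversion_rate` as the target (one possible choice among the dataset's columns), uses hypothetical helper names `numeric_associations` and `group_means`, and runs on synthetic data in which the link between abandonment and conversion is built in by construction. It says nothing about what the real data contains.

```python
import numpy as np
import pandas as pd

TARGET = "purchase_conversion_rate"  # assumed target; swap in another metric as needed


def numeric_associations(df: pd.DataFrame, target: str = TARGET) -> pd.Series:
    """Spearman correlation of each numeric column with the target, strongest first."""
    numeric = df.select_dtypes(include="number")
    corr = numeric.corr(method="spearman")[target].drop(target)
    return corr.sort_values(key=lambda s: s.abs(), ascending=False)


def group_means(df: pd.DataFrame, by: str, target: str = TARGET) -> pd.DataFrame:
    """Mean target value and row count for each level of a categorical column."""
    grouped = df.groupby(by, observed=True)[target].agg(["mean", "count"])
    return grouped.sort_values("mean", ascending=False)


# Synthetic demo data: conversion is generated from abandonment plus noise.
rng = np.random.default_rng(42)
n = 2_000
abandonment = rng.uniform(0, 90, size=n)
demo = pd.DataFrame(
    {
        "cart_abandonment_rate": abandonment,
        "daily_session_time_minutes": rng.uniform(0, 120, size=n),
        "device_type": rng.choice(["Mobile", "Desktop"], size=n),
        TARGET: 80 - 0.5 * abandonment + rng.normal(0, 5, size=n),
    }
)

assoc = numeric_associations(demo)
by_device = group_means(demo, "device_type")
print(assoc)
print(by_device)
```

Spearman is used here because it only assumes a monotonic relationship; correlation of this kind flags candidates for closer inspection and does not establish that a factor influences purchasing.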


