We have a dataset listing customer purchase details. The goals are to identify customer purchase patterns through exploratory data analysis and to predict the purchase amount of a customer for various products.

nnvij/RetailSales--Prediction

RetailSales--Prediction

  • The goal of retail purchase prediction is to forecast demand for products and services accurately, so that retailers can manage inventory better, anticipate customer needs, and maximize profits.
  • By using data-driven models and predictive analytics, retailers can forecast future sales and make more informed business decisions.

Problem Statement:

  • Perform exploratory data analysis to understand customer buying patterns
  • Build a regression model to predict the purchase amount of a customer for various products

Tools used:

  • Python, Pandas (data processing), Plotly Express, scikit-learn

Data:

The dataset has 550,069 rows and 12 columns

Attributes:

| Column ID | Column Name | Data type | Description | Masked |
|-----------|-------------|-----------|-------------|--------|
| 0 | User_ID | int64 | Unique Id of customer | False |
| 1 | Product_ID | object | Unique Id of product | False |
| 2 | Gender | object | Sex of customer | False |
| 3 | Age | object | Age of customer | False |
| 4 | Occupation | int64 | Occupation code of customer | True |
| 5 | City_Category | object | City of customer | True |
| 6 | Stay_In_Current_City_Years | object | Number of years of stay in city | False |
| 7 | Marital_Status | int64 | Marital status of customer | False |
| 8 | Product_Category_1 | int64 | Category of product | True |
| 9 | Product_Category_2 | float64 | Category of product | True |
| 10 | Product_Category_3 | float64 | Category of product | True |
| 11 | Purchase | int64 | Purchase amount | False |

Exploratory Data Analysis:

  • There are 5891 users and 3631 unique products in the dataset
  • 31% of Product_Category_2 values and 69% of Product_Category_3 values are missing
  • The average amount spent is about 8k for female customers and about 9k for male customers
  • The 26-35 age group has the highest total purchase amount across age groups
  • Product category 0 has the highest purchase revenue, and product category 4 has the highest number of purchases
  • Unmarried customers in the 26-35 age group have the highest total purchase amount compared with other customers
  • Most purchases come from customers who have stayed in their current city for only 1 year
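The findings above are the kind of summary that a few Pandas aggregations produce. The following is a minimal sketch, not the repository's actual notebook code: the small `df` frame is a made-up stand-in that only reuses the real column names, so its numbers will not match the figures reported above.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-in for the real dataset (same column names,
# invented values); replace it with the full data to get the real figures.
df = pd.DataFrame({
    "User_ID": [1, 1, 2, 3, 3, 4],
    "Product_ID": ["P1", "P2", "P1", "P3", "P2", "P1"],
    "Gender": ["F", "F", "M", "M", "M", "F"],
    "Age": ["26-35", "26-35", "36-45", "26-35", "26-35", "0-17"],
    "Product_Category_2": [2.0, np.nan, 4.0, np.nan, 5.0, 6.0],
    "Product_Category_3": [np.nan, np.nan, 9.0, np.nan, np.nan, 12.0],
    "Purchase": [8000, 7000, 9500, 9000, 8500, 6000],
})

# Number of distinct customers and products
n_users = df["User_ID"].nunique()
n_products = df["Product_ID"].nunique()

# Percentage of missing values per column
missing_pct = df.isna().mean().mul(100).round(1)

# Average purchase amount per gender
avg_by_gender = df.groupby("Gender")["Purchase"].mean()

# Total purchase amount per age group, largest first
total_by_age = df.groupby("Age")["Purchase"].sum().sort_values(ascending=False)

print(n_users, n_products)
print(missing_pct)
print(avg_by_gender)
print(total_by_age)
```

The same `groupby` pattern extends to the other breakdowns listed above, for example grouping by `Marital_Status` and `Age` together.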

Data Preprocessing:

  • Product_Category_2 and Product_Category_3 have missing values; use SimpleImputer to fill them with the column median
  • Encode the categorical columns Gender, Age, City_Category and Stay_In_Current_City_Years
  • Drop the User_ID and Product_ID columns
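The preprocessing steps above can be sketched as follows. This is an illustration under stated assumptions rather than the repository's code: the README does not say how the categorical columns are encoded, so `OrdinalEncoder` is an assumed choice (one-hot encoding would be an equally plausible one), and the small `df` frame is invented sample data.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical sample rows using the dataset's column names (invented values).
df = pd.DataFrame({
    "User_ID": [1, 1, 2, 3, 4],
    "Product_ID": ["P1", "P2", "P1", "P3", "P2"],
    "Gender": ["F", "F", "M", "M", "F"],
    "Age": ["26-35", "26-35", "36-45", "0-17", "55+"],
    "City_Category": ["A", "A", "B", "C", "B"],
    "Stay_In_Current_City_Years": ["1", "1", "4+", "2", "0"],
    "Product_Category_2": [2.0, np.nan, 4.0, np.nan, 6.0],
    "Product_Category_3": [np.nan, np.nan, 9.0, np.nan, 12.0],
    "Purchase": [8000, 7000, 9500, 9000, 6000],
})

# Drop the identifier columns, which are not used as features
df = df.drop(columns=["User_ID", "Product_ID"])

# Fill missing product categories with the column median
sparse_cols = ["Product_Category_2", "Product_Category_3"]
df[sparse_cols] = SimpleImputer(strategy="median").fit_transform(df[sparse_cols])

# Encode the categorical columns as numbers.
# Assumption: ordinal encoding; the source does not name the encoder used.
cat_cols = ["Gender", "Age", "City_Category", "Stay_In_Current_City_Years"]
df[cat_cols] = OrdinalEncoder().fit_transform(df[cat_cols])

print(df)
```

In a real pipeline the imputer and encoder would be fitted on the training split only and then applied to the test split, to avoid leaking test-set statistics.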

Modelling:

  • Features (X): Gender, Age, Occupation, City_Category, Stay_In_Current_City_Years, Marital_Status, Product_Category_1, Product_Category_2, Product_Category_3

  • Label: Purchase

  • Train/test split: 75% training set and 25% test set

  • Evaluate baseline models (LinearRegression, DecisionTreeRegressor and RandomForestRegressor) using the RMSE and RMSLE metrics. RandomForestRegressor had the lowest RMSE and RMSLE.

  • Apply GridSearchCV to find the best hyperparameters for RandomForestRegressor

  • Product_Category_1 seems to have the largest effect on purchase amount

  • Surprisingly, gender has the least effect on purchase amount
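The modelling workflow above can be sketched end to end as follows. This is a minimal illustration, not the repository's code: the data is synthetic (three features only, with the purchase amount deliberately driven by `Product_Category_1`), the parameter grid is a made-up small example, and `rmse` and `rmsle` are hypothetical helper functions written out by hand. It will not reproduce the scores or rankings reported above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): purchase depends on product category only.
rng = np.random.default_rng(0)
n = 600
X = pd.DataFrame({
    "Gender": rng.integers(0, 2, n),
    "Age": rng.integers(0, 7, n),
    "Product_Category_1": rng.integers(1, 21, n),
})
base_price = rng.integers(2000, 20000, size=21)  # one base price per category
noise = rng.normal(0, 500, n)
y = np.clip(base_price[X["Product_Category_1"].to_numpy()] + noise, 1, None)

# 75% training set, 25% test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def rmsle(y_true, y_pred):
    """Root mean squared log error; negative predictions are clipped to 0."""
    y_pred = np.clip(y_pred, 0, None)
    return float(np.sqrt(np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2)))

# Baseline comparison of the three regressors
models = {
    "LinearRegression": LinearRegression(),
    "DecisionTreeRegressor": DecisionTreeRegressor(random_state=42),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=20, random_state=42),
}
results = {}
for name, model in models.items():
    pred = model.fit(X_train, y_train).predict(X_test)
    results[name] = {"rmse": rmse(y_test, pred), "rmsle": rmsle(y_test, pred)}

# Small illustrative grid search over random forest hyperparameters
grid = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid={"n_estimators": [10, 20], "max_depth": [3, None]},
    scoring="neg_root_mean_squared_error",
    cv=3,
)
grid.fit(X_train, y_train)

# Feature importances of the tuned model, largest first
importances = pd.Series(
    grid.best_estimator_.feature_importances_, index=X.columns
).sort_values(ascending=False)

print(pd.DataFrame(results).T)
print(grid.best_params_)
print(importances)
```

Sorting `feature_importances_` in this way is one common route to statements such as "Product_Category_1 has the largest effect"; impurity-based importances can be biased toward high-cardinality features, so permutation importance is a useful cross-check.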
