## Customer Purchase Prediction & Effect of Micro-Numerosity

This project uses data analysis and predictive modeling to forecast customer purchases from a small dataset (50 rows), and to examine how micronumerosity affects the predictions.

# Features

- Purchase Prediction: Uses machine learning to forecast whether a customer will make a purchase.
- Micronumerosity Analysis: Incorporates fine-grained data elements for more precise predictions.
- Customizable Models: Predictive models can be configured and adapted to specific business requirements.
- Model Used: This project uses a Random Forest Classifier; other classification models could be used instead.

Import libraries

In [1]:
import pandas as pd
import warnings
warnings.filterwarnings("ignore")

Read the CSV file

In [2]:
purchase = pd.read_csv('https://github.com/YBIFoundation/Dataset/raw/main/Customer%20Purchase.csv')
purchase

Unnamed: 0,Customer ID,Age,Gender,Education,Review,Purchased
0,1021,30,Female,School,Average,No
1,1022,68,Female,UG,Poor,No
2,1023,70,Female,PG,Good,No
3,1024,72,Female,PG,Good,No
4,1025,16,Female,UG,Average,No
5,1026,31,Female,School,Average,Yes
6,1027,18,Male,School,Good,No
7,1028,60,Female,School,Poor,Yes
8,1029,65,Female,UG,Average,No
9,1030,74,Male,UG,Good,Yes


# Data Analysis

In [3]:
purchase.head()

Unnamed: 0,Customer ID,Age,Gender,Education,Review,Purchased
0,1021,30,Female,School,Average,No
1,1022,68,Female,UG,Poor,No
2,1023,70,Female,PG,Good,No
3,1024,72,Female,PG,Good,No
4,1025,16,Female,UG,Average,No


In [4]:
purchase.shape

(50, 6)

In [7]:
purchase.columns

Index(['Customer ID', 'Age', 'Gender', 'Education', 'Review', 'Purchased'], dtype='object')

In [24]:
purchase.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 6 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Customer ID  50 non-null     int64 
 1   Age          50 non-null     int64 
 2   Gender       50 non-null     object
 3   Education    50 non-null     object
 4   Review       50 non-null     object
 5   Purchased    50 non-null     object
dtypes: int64(2), object(4)
memory usage: 2.5+ KB


In [5]:
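# Note: describe is a method. Without parentheses the notebook shows the bound method
# (the output below); call purchase.describe() to get summary statistics.
purchase.describe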

<bound method NDFrame.describe of     Customer ID  Age  Gender Education   Review Purchased
0          1021   30  Female    School  Average        No
1          1022   68  Female        UG     Poor        No
2          1023   70  Female        PG     Good        No
3          1024   72  Female        PG     Good        No
4          1025   16  Female        UG  Average        No
5          1026   31  Female    School  Average       Yes
6          1027   18    Male    School     Good        No
7          1028   60  Female    School     Poor       Yes
8          1029   65  Female        UG  Average        No
9          1030   74    Male        UG     Good       Yes
10         1031   98  Female        UG     Good       Yes
11         1032   74    Male        UG     Good       Yes
12         1033   51    Male    School     Poor        No
13         1034   57  Female    School  Average        No
14         1035   15    Male        PG     Poor       Yes
15         1036   75    Male        UG
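The cell above references `purchase.describe` without calling it, so the output is the bound method rather than summary statistics. A minimal sketch of the intended call follows, using only the first five rows shown earlier as a stand-in frame; the names `sample` and `summary` are assumptions for illustration and are not part of the original notebook.

```python
import pandas as pd

# Stand-in frame built from the first five rows shown above (illustration only)
sample = pd.DataFrame({
    "Customer ID": [1021, 1022, 1023, 1024, 1025],
    "Age": [30, 68, 70, 72, 16],
    "Gender": ["Female"] * 5,
    "Education": ["School", "UG", "PG", "PG", "UG"],
    "Review": ["Average", "Poor", "Good", "Good", "Average"],
    "Purchased": ["No"] * 5,
})

# Calling the method (note the parentheses) returns a DataFrame of statistics.
# include="all" also summarizes the categorical columns (count, unique, top, freq).
summary = sample.describe(include="all")
print(summary)
```

On the full dataset the same call would be `purchase.describe(include="all")`.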

In [8]:
purchase.dtypes

Customer ID     int64
Age             int64
Gender         object
Education      object
Review         object
Purchased      object
dtype: object

In [9]:
purchase.nunique()

Customer ID    50
Age            41
Gender          2
Education       3
Review          3
Purchased       2
dtype: int64

In [10]:
purchase.value_counts()

Customer ID  Age  Gender  Education  Review   Purchased
1021         30   Female  School     Average  No           1
1058         94   Male    PG         Average  Yes          1
1048         69   Female  PG         Poor     No           1
1049         48   Male    School     Poor     No           1
1050         83   Female  UG         Average  Yes          1
1051         73   Male    UG         Average  No           1
1052         22   Female  School     Poor     Yes          1
1053         92   Male    UG         Average  Yes          1
1054         89   Female  PG         Good     Yes          1
1055         86   Male    School     Average  No           1
1056         74   Male    School     Poor     Yes          1
1057         34   Female  UG         Good     Yes          1
1059         45   Female  School     Good     No           1
1022         68   Female  UG         Poor     No           1
1060         76   Male    PG         Poor     No           1
1061         39   Male    Sch

In [12]:
purchase.duplicated().any()

False

In [13]:
# missing values in each column
missing_data = purchase.isnull().sum()
print(missing_data)

Customer ID    0
Age            0
Gender         0
Education      0
Review         0
Purchased      0
dtype: int64


Encoding

In [14]:
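# Target is the 'Purchased' label; features are all columns except the label and the ID
y = purchase['Purchased']
x = purchase.drop(['Purchased','Customer ID'],axis=1)
# Ordinal encoding: map each ordered category to an integer
x.replace({'Review':{'Poor':0,'Average':1,'Good':2}},inplace=True)
x.replace({'Education':{'School':0,'UG':1,'PG':2}},inplace=True)
# Gender has two values, so a 0/1 code is enough
x.replace({'Gender':{'Male': 0,'Female':1}},inplace=True)
x.head()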

Unnamed: 0,Age,Gender,Education,Review
0,30,1,0,1
1,68,1,1,0
2,70,1,2,2
3,72,1,2,2
4,16,1,1,1
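The same ordinal encoding can also be written with `Series.map`, which yields NaN for any category missing from the mapping, so unexpected values are easy to catch. This is a minimal sketch on four rows taken from the tables above; `ORDINAL_MAPS`, `encode`, and `sample` are hypothetical names introduced here, not part of the original notebook.

```python
import pandas as pd

# Category-to-integer mappings, identical to the ones used in the cell above
ORDINAL_MAPS = {
    "Review": {"Poor": 0, "Average": 1, "Good": 2},
    "Education": {"School": 0, "UG": 1, "PG": 2},
    "Gender": {"Male": 0, "Female": 1},
}

def encode(features: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with the categorical columns replaced by integer codes."""
    encoded = features.copy()
    for column, mapping in ORDINAL_MAPS.items():
        encoded[column] = encoded[column].map(mapping)
        # map() leaves NaN where a category has no entry in the mapping
        if encoded[column].isna().any():
            raise ValueError(f"Unmapped category in column {column!r}")
    return encoded

# Rows 0, 1, 2 and 6 of the dataset shown earlier
sample = pd.DataFrame({
    "Age": [30, 68, 70, 18],
    "Gender": ["Female", "Female", "Female", "Male"],
    "Education": ["School", "UG", "PG", "School"],
    "Review": ["Average", "Poor", "Good", "Good"],
})
encoded = encode(sample)
print(encoded)
```

Unlike `replace(..., inplace=True)`, this version leaves the input frame untouched and fails loudly on a misspelled or new category.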


Train/test split

In [15]:
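from sklearn.model_selection import train_test_split
# Keep 80% of the 50 rows for training and hold out 20% for testing.
# No random_state is set, so the split (and every score after it) changes from run to run.
x_train, x_test, y_train, y_test = train_test_split(x,y, train_size=0.8)
x_train.shape, x_test.shape, y_train.shape, y_test.shape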

((40, 4), (10, 4), (40,), (10,))
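With only 50 rows, the split itself has a large influence on the results. One possible variant, sketched below, fixes `random_state` for reproducibility and passes `stratify` so that the test set keeps the class proportions. The arrays `x_demo` and `y_demo` are synthetic stand-ins (not the real dataset), and the 30/20 class balance is an assumption made only for this illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for 50 encoded rows with 4 features (NOT the real data)
rng = np.random.default_rng(0)
x_demo = rng.integers(0, 3, size=(50, 4))
y_demo = np.array(["No"] * 30 + ["Yes"] * 20)  # assumed class balance

# random_state makes the split repeatable; stratify keeps the No/Yes ratio
x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, train_size=0.8, random_state=42, stratify=y_demo
)

print(x_tr.shape, x_te.shape)
for label in ("No", "Yes"):
    print(label, int((y_te == label).sum()))
```

Running the cell twice with the same `random_state` returns the same rows in each partition, which makes later accuracy figures comparable between runs.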

A random forest is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
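The averaging mentioned above can be checked directly: in scikit-learn, the forest's class probabilities are the mean of the probabilities predicted by its individual trees. The sketch below uses synthetic data (an assumption for illustration, not the purchase dataset), and the names `X`, `y`, `forest`, `per_tree`, and `averaged` are introduced here only for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic two-class data, used only to illustrate the averaging step
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

# Probabilities from every fitted tree, shape (n_trees, n_samples, n_classes)
per_tree = np.stack([tree.predict_proba(X) for tree in forest.estimators_])

# Average over the trees and compare with the forest's own output
averaged = per_tree.mean(axis=0)
same_probabilities = np.allclose(averaged, forest.predict_proba(X))

# The predicted class is the one with the highest averaged probability
same_labels = np.array_equal(forest.classes_[averaged.argmax(axis=1)], forest.predict(X))
print(same_probabilities, same_labels)
```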

In [16]:
from sklearn.ensemble import RandomForestClassifier
# Default hyperparameters; no random_state is set, so the fitted forest varies between runs
model = RandomForestClassifier()
model.fit(x_train,y_train)

In [30]:
y_pred = model.predict(x_test)
y_pred

array(['Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'No', 'Yes', 'No'],
      dtype=object)

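A confusion matrix summarizes a classifier's performance on the test data: each row is an actual class, each column is a predicted class, and each cell counts the samples in that combination. scikit-learn orders the classes by sorted label, so here the order is No, Yes.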

In [31]:
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
confusion_matrix(y_test,y_pred)

array([[4, 2],
       [0, 4]], dtype=int64)

In [32]:
accuracy_score(y_test,y_pred)

0.8
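With a 10-row test set, every single prediction moves the accuracy by 0.1, so one split gives a noisy estimate. A common complement is stratified k-fold cross-validation, sketched below on the first 15 rows printed earlier, already encoded with the mappings from the Encoding step. The subset, the 3 folds, and the fixed `random_state` values are assumptions chosen for illustration; the resulting scores say nothing definitive about the full dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# First 15 rows of the dataset, encoded as in the Encoding step:
# columns are Age, Gender (Male=0, Female=1), Education (School=0, UG=1, PG=2),
# Review (Poor=0, Average=1, Good=2)
X_small = np.array([
    [30, 1, 0, 1], [68, 1, 1, 0], [70, 1, 2, 2], [72, 1, 2, 2], [16, 1, 1, 1],
    [31, 1, 0, 1], [18, 0, 0, 2], [60, 1, 0, 0], [65, 1, 1, 1], [74, 0, 1, 2],
    [98, 1, 1, 2], [74, 0, 1, 2], [51, 0, 0, 0], [57, 1, 0, 1], [15, 0, 2, 0],
])
y_small = np.array([
    "No", "No", "No", "No", "No", "Yes", "No", "Yes", "No", "Yes",
    "Yes", "Yes", "No", "No", "Yes",
])

# Each fold keeps roughly the same No/Yes ratio as the whole sample
folds = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
scores = cross_val_score(
    RandomForestClassifier(random_state=0), X_small, y_small,
    cv=folds, scoring="accuracy",
)

print(scores)                        # one accuracy value per fold
print(scores.mean(), scores.std())   # the spread shows how unstable the estimate is
```

On the full data the same pattern would apply to `x` and `y`, with more folds if desired.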

In [33]:
print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

          No       1.00      0.67      0.80         6
         Yes       0.67      1.00      0.80         4

    accuracy                           0.80        10
   macro avg       0.83      0.83      0.80        10
weighted avg       0.87      0.80      0.80        10
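Every figure in the classification report can be derived from the confusion matrix shown earlier. The sketch below recomputes them with NumPy, assuming scikit-learn's convention of rows for actual classes and columns for predicted classes, in the order No, Yes; the variable names are introduced here for illustration.

```python
import numpy as np

# Confusion matrix reported above: rows = actual class, columns = predicted class,
# class order ['No', 'Yes']
cm = np.array([[4, 2],
               [0, 4]])

correct = np.diag(cm)                 # correctly classified samples per class
accuracy = correct.sum() / cm.sum()   # (4 + 4) / 10
precision = correct / cm.sum(axis=0)  # correct / number predicted as that class
recall = correct / cm.sum(axis=1)     # correct / number actually in that class
f1 = 2 * precision * recall / (precision + recall)
support = cm.sum(axis=1)              # actual samples per class

macro_precision = precision.mean()
weighted_precision = np.average(precision, weights=support)

print("accuracy:", round(float(accuracy), 2))
print("precision:", np.round(precision, 2))
print("recall:", np.round(recall, 2))
print("f1:", np.round(f1, 2))
print("macro / weighted precision:",
      round(float(macro_precision), 2), round(float(weighted_precision), 2))
```

Reading the matrix this way also shows where the errors are: both mistakes are actual "No" customers predicted as "Yes", which is why recall for "No" and precision for "Yes" are both 4/6.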

