___

# Purchasing power parity : Online Retail Store

___

**Table of Contents**
1. Problem Statement
2. Project Objective
3. Data Description
4. Data Pre-processing Steps and Inspiration
5. Choosing the Algorithm for the Project
6. Motivation and Reasons For Choosing the Algorithm
7. Assumptions
8. Model Evaluation and Techniques
9. Inferences from the Same
10. Future Possibilities of the Project
11. Conclusion
12. References

___

**Problem Statement :**           
An online retail store wants to understand its customers' purchase patterns.
The task is to provide evidence-based insights into those patterns.

____________

**Project Objective :**                  
The objective of this project is to analyze the customer purchase patterns of an online retail store using the online_retail.csv dataset.                              
The analysis should provide insights into customer behavior and generate actionable insights for the store.

__________

**Data Description :**          
The dataset loaded below from OnlineRetail.csv contains 541909 rows and 8 columns.

|Feature Name |Description                  |
|-------------|-----------------------------|
|InvoiceNo    |Invoice number               |
|StockCode    |Product ID                   |
|Description  |Product Description          |
|Quantity     |Quantity of the product      |
|InvoiceDate  |Date of the invoice          |
|UnitPrice    |Price of the product per unit|
|CustomerID   |Customer ID                  |
|Country      |Region of Purchase           |

In [1]:
import pandas as pd
data = pd.read_csv('OnlineRetail.csv',encoding="latin1")

In [2]:
data.head(3)

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
0,536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,12/1/2010 8:26,2.55,17850.0,United Kingdom
1,536365,71053,WHITE METAL LANTERN,6,12/1/2010 8:26,3.39,17850.0,United Kingdom
2,536365,84406B,CREAM CUPID HEARTS COAT HANGER,8,12/1/2010 8:26,2.75,17850.0,United Kingdom


In [3]:
data.shape

(541909, 8)

In [4]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 541909 entries, 0 to 541908
Data columns (total 8 columns):
 #   Column       Non-Null Count   Dtype  
---  ------       --------------   -----  
 0   InvoiceNo    541909 non-null  object 
 1   StockCode    541909 non-null  object 
 2   Description  540455 non-null  object 
 3   Quantity     541909 non-null  int64  
 4   InvoiceDate  541909 non-null  object 
 5   UnitPrice    541909 non-null  float64
 6   CustomerID   406829 non-null  float64
 7   Country      541909 non-null  object 
dtypes: float64(2), int64(1), object(5)
memory usage: 33.1+ MB


In [5]:
data.describe()

Unnamed: 0,Quantity,UnitPrice,CustomerID
count,541909.0,541909.0,406829.0
mean,9.55225,4.611114,15287.69057
std,218.081158,96.759853,1713.600303
min,-80995.0,-11062.06,12346.0
25%,1.0,1.25,13953.0
50%,3.0,2.08,15152.0
75%,10.0,4.13,16791.0
max,80995.0,38970.0,18287.0


________________________

**Data Pre-processing Steps and Inspiration :**                  
The dataset contains missing values (in Description and CustomerID) and outliers, so it must be pre-processed before any analysis.  
This includes data cleaning and wrangling steps such as formatting, handling missing values (by imputing or dropping them), and removing outliers.  
It is also important to explore the data to uncover useful information.  
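
The cleaning steps described above can be sketched on a few rows modeled on this notebook's outputs. This is a minimal sketch, not the notebook's actual pipeline: the helper name `clean_sales` is hypothetical, and it assumes that an `InvoiceNo` starting with `C` marks a cancellation and that non-positive quantities or prices are not completed sales.

```python
import pandas as pd

# Toy frame with the notebook's column names; rows are modeled on the outputs shown in this notebook.
toy = pd.DataFrame({
    "InvoiceNo": ["536365", "536365", "C536543", "536414"],
    "Description": ["WHITE METAL LANTERN", "CREAM CUPID HEARTS COAT HANGER",
                    "HAND WARMER RED RETROSPOT", None],
    "Quantity": [6, 8, -1, 56],
    "UnitPrice": [3.39, 2.75, 2.10, 0.0],
})

def clean_sales(df):
    """Keep only rows that look like completed sales (assumed rules, see comments)."""
    # Drop rows with no product description.
    out = df.dropna(subset=["Description"])
    # Assumption: an InvoiceNo starting with 'C' marks a cancelled order.
    out = out[~out["InvoiceNo"].astype(str).str.startswith("C")]
    # Assumption: non-positive quantities or prices are returns or adjustments.
    out = out[(out["Quantity"] > 0) & (out["UnitPrice"] > 0)]
    return out.reset_index(drop=True)

cleaned = clean_sales(toy)
print(cleaned)
```

Whether cancellations and zero-price rows should be dropped or analysed separately depends on the question being asked, so each filter is kept as its own line and can be removed independently.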

In [6]:
data.isnull().sum()

InvoiceNo           0
StockCode           0
Description      1454
Quantity            0
InvoiceDate         0
UnitPrice           0
CustomerID     135080
Country             0
dtype: int64

In [7]:
description_null = data[data['Description'].isnull()]

In [8]:
description_null.head(3)

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
622,536414,22139,,56,12/1/2010 11:52,0.0,,United Kingdom
1970,536545,21134,,1,12/1/2010 14:32,0.0,,United Kingdom
1971,536546,22145,,1,12/1/2010 14:33,0.0,,United Kingdom


In [9]:
description_null['CustomerID'].isnull().sum()

1454

In [10]:
data['CustomerID'].value_counts().head(10)

17841.0    7983
14911.0    5903
14096.0    5128
12748.0    4642
14606.0    2782
15311.0    2491
14646.0    2085
13089.0    1857
13263.0    1677
14298.0    1640
Name: CustomerID, dtype: int64

In [11]:
data[data['CustomerID']==17841]

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
1441,C536543,22632,HAND WARMER RED RETROSPOT,-1,12/1/2010 14:30,2.10,17841.0,United Kingdom
1442,C536543,22355,CHARLOTTE BAG SUKI DESIGN,-2,12/1/2010 14:30,0.85,17841.0,United Kingdom
2037,536557,21495,SKULLS AND CROSSBONES WRAP,25,12/1/2010 14:41,0.42,17841.0,United Kingdom
2038,536557,46000R,POLYESTER FILLER PAD 45x30cm,2,12/1/2010 14:41,1.45,17841.0,United Kingdom
2039,536557,46000S,POLYESTER FILLER PAD 40x40cm,1,12/1/2010 14:41,1.45,17841.0,United Kingdom
...,...,...,...,...,...,...,...,...
537749,581334,23399,HOME SWEET HOME HANGING HEART,3,12/8/2011 12:07,0.85,17841.0,United Kingdom
537750,581334,22893,MINI CAKE STAND T-LIGHT HOLDER,12,12/8/2011 12:07,0.42,17841.0,United Kingdom
537751,581334,22371,AIRLINE BAG VINTAGE TOKYO 78,1,12/8/2011 12:07,4.25,17841.0,United Kingdom
537752,581334,22309,TEA COSY RED STRIPE,1,12/8/2011 12:07,2.55,17841.0,United Kingdom


In [12]:
# Drop the CustomerID column: as shown above, each customer has many purchases of many products, so it is not used in the item-level analysis
data.pop('CustomerID')
data

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,Country
0,536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,12/1/2010 8:26,2.55,United Kingdom
1,536365,71053,WHITE METAL LANTERN,6,12/1/2010 8:26,3.39,United Kingdom
2,536365,84406B,CREAM CUPID HEARTS COAT HANGER,8,12/1/2010 8:26,2.75,United Kingdom
3,536365,84029G,KNITTED UNION FLAG HOT WATER BOTTLE,6,12/1/2010 8:26,3.39,United Kingdom
4,536365,84029E,RED WOOLLY HOTTIE WHITE HEART.,6,12/1/2010 8:26,3.39,United Kingdom
...,...,...,...,...,...,...,...
541904,581587,22613,PACK OF 20 SPACEBOY NAPKINS,12,12/9/2011 12:50,0.85,France
541905,581587,22899,CHILDREN'S APRON DOLLY GIRL,6,12/9/2011 12:50,2.10,France
541906,581587,23254,CHILDRENS CUTLERY DOLLY GIRL,4,12/9/2011 12:50,4.15,France
541907,581587,23255,CHILDRENS CUTLERY CIRCUS PARADE,4,12/9/2011 12:50,4.15,France


In [13]:
# Drop rows with missing values (only Description still has any at this point)
data = data.dropna()

In [14]:
data.isnull().sum()

InvoiceNo      0
StockCode      0
Description    0
Quantity       0
InvoiceDate    0
UnitPrice      0
Country        0
dtype: int64

In [22]:
data['Description'].value_counts().head(10)

WHITE HANGING HEART T-LIGHT HOLDER    2369
REGENCY CAKESTAND 3 TIER              2200
JUMBO BAG RED RETROSPOT               2159
PARTY BUNTING                         1727
LUNCH BAG RED RETROSPOT               1638
ASSORTED COLOUR BIRD ORNAMENT         1501
SET OF 3 CAKE TINS PANTRY DESIGN      1473
PACK OF 72 RETROSPOT CAKE CASES       1385
LUNCH BAG  BLACK SKULL.               1350
NATURAL SLATE HEART CHALKBOARD        1280
Name: Description, dtype: int64

In [18]:
data[data['StockCode']=='22423']

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,Country
880,536477,22423,REGENCY CAKESTAND 3 TIER,16,12/1/2010 12:27,10.95,United Kingdom
936,536502,22423,REGENCY CAKESTAND 3 TIER,2,12/1/2010 12:36,12.75,United Kingdom
1092,536525,22423,REGENCY CAKESTAND 3 TIER,2,12/1/2010 12:54,12.75,United Kingdom
1155,536528,22423,REGENCY CAKESTAND 3 TIER,1,12/1/2010 13:17,12.75,United Kingdom
1197,536530,22423,REGENCY CAKESTAND 3 TIER,1,12/1/2010 13:21,12.75,United Kingdom
...,...,...,...,...,...,...,...
539891,581449,22423,REGENCY CAKESTAND 3 TIER,1,12/8/2011 17:37,12.75,United Kingdom
539892,581449,22423,REGENCY CAKESTAND 3 TIER,1,12/8/2011 17:37,12.75,United Kingdom
540216,581472,22423,REGENCY CAKESTAND 3 TIER,2,12/8/2011 19:55,12.75,United Kingdom
541231,581495,22423,REGENCY CAKESTAND 3 TIER,10,12/9/2011 10:20,12.75,United Kingdom


In [30]:
data['Country'].value_counts().head(11)

United Kingdom    494024
Germany             9495
France              8557
EIRE                8196
Spain               2533
Netherlands         2371
Belgium             2069
Switzerland         2002
Portugal            1519
Australia           1259
Norway              1086
Name: Country, dtype: int64

In [32]:
!pip install apyori

Collecting apyori
  Downloading apyori-1.1.2.tar.gz (8.6 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: apyori
  Building wheel for apyori (setup.py) ... done
  Created wheel for apyori: filename=apyori-1.1.2-py3-none-any.whl size=5974 sha256=88b372d0afcc3f56365eac12a6ce0f327db085b29aef0f8b831ff9b4738658f7
  Stored in directory: /home/hemantpatel/.cache/pip/wheels/32/2a/54/10c595515f385f3726642b10c60bf788029e8f3a1323e3913a
Successfully built apyori
Installing collected packages: apyori
Successfully installed apyori-1.1.2


In [34]:
description = data['Description']

In [38]:
description.value_counts()

WHITE HANGING HEART T-LIGHT HOLDER     2369
REGENCY CAKESTAND 3 TIER               2200
JUMBO BAG RED RETROSPOT                2159
PARTY BUNTING                          1727
LUNCH BAG RED RETROSPOT                1638
                                       ... 
Missing                                   1
historic computer difference?....se       1
DUSTY PINK CHRISTMAS TREE 30CM            1
WRAP BLUE RUSSIAN FOLKART                 1
PINK BERTIE MOBILE PHONE CHARM            1
Name: Description, Length: 4223, dtype: int64

In [50]:
from sklearn.preprocessing import LabelEncoder
l_e = LabelEncoder()
description_le = description.copy()
# fit_transform fits the encoder and returns the integer codes in one step
a = l_e.fit_transform(description_le)

In [51]:
description_le

0          WHITE HANGING HEART T-LIGHT HOLDER
1                         WHITE METAL LANTERN
2              CREAM CUPID HEARTS COAT HANGER
3         KNITTED UNION FLAG HOT WATER BOTTLE
4              RED WOOLLY HOTTIE WHITE HEART.
                         ...                 
541904            PACK OF 20 SPACEBOY NAPKINS
541905           CHILDREN'S APRON DOLLY GIRL 
541906          CHILDRENS CUTLERY DOLLY GIRL 
541907        CHILDRENS CUTLERY CIRCUS PARADE
541908          BAKING SET 9 PIECE RETROSPOT 
Name: Description, Length: 540455, dtype: object

In [52]:
a

array([3918, 3926,  913, ...,  749,  748,  304])

In [53]:
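from apyori import apriori

# apriori expects an iterable of transactions (each a collection of items); a flat
# sequence of label-encoded integers raises "TypeError: 'numpy.int64' object is not iterable".
# Group the descriptions by invoice so that each invoice becomes one transaction.
transactions = data.groupby('InvoiceNo')['Description'].apply(list).tolist()
results = list(apriori(transactions))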

_________

**Choosing the Algorithm for the Project :**           
For this project, unsupervised learning algorithms such as clustering and association rule mining can be used to analyze the customer purchase patterns.                                 
Clustering algorithms, such as K-Means, can be used to group customers based on their purchase patterns.                                                                                   
Association rule mining algorithms, such as Apriori, can be used to uncover interesting relationships among items purchased.
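
To make the idea of "interesting relationships among items" concrete, the sketch below computes the three standard rule measures (support, confidence, lift) by hand on a few hypothetical baskets. The baskets and the helper names `support` and `rule_metrics` are invented for illustration; this is not the Apriori algorithm itself, only the measures that Apriori-style tools report for each rule.

```python
from itertools import combinations

# Hypothetical baskets (one set of items per invoice); not taken from the dataset.
baskets = [
    {"tea", "cake", "napkins"},
    {"tea", "cake"},
    {"tea", "napkins"},
    {"cake"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= b for b in baskets) / len(baskets)

def rule_metrics(antecedent, consequent, baskets):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    s_both = support(set(antecedent) | set(consequent), baskets)
    s_a = support(antecedent, baskets)
    s_c = support(consequent, baskets)
    confidence = s_both / s_a   # P(consequent | antecedent)
    lift = confidence / s_c     # > 1 means the items co-occur more often than by chance
    return s_both, confidence, lift

# Evaluate every ordered pair of single items as a candidate rule.
items = sorted(set().union(*baskets))
for x, y in combinations(items, 2):
    for lhs, rhs in ((x, y), (y, x)):
        s, c, l = rule_metrics({lhs}, {rhs}, baskets)
        print(f"{lhs} -> {rhs}: support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

On real data the number of candidate itemsets grows quickly, which is why Apriori prunes any itemset whose support falls below a chosen minimum before extending it.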

_____________

**Motivation and Reasons For Choosing the Algorithm :**                  
The choice of algorithms is motivated by the need to uncover hidden patterns in the data that can provide insights into customer purchase behavior.           
Clustering algorithms can be used to group customers based on their purchase patterns, while association rule mining algorithms can be used to uncover interesting relationships among items purchased.


_____________

**Assumptions :**                        
> - It is assumed that, after the pre-processing steps above, the data is clean and free from errors or inconsistencies.        
> - It is also assumed that the available features are relevant and sufficient to provide insights into customer purchase patterns.

___________

**Model Evaluation and Techniques :**                   
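Because the data has no ground-truth labels, supervised metrics such as accuracy, precision, recall, F1-score, and log-loss do not apply directly. Clustering can instead be evaluated with internal measures such as the silhouette score or within-cluster inertia, and association rules with support, confidence, and lift.           
The parameters of the models (for example, the number of clusters or the minimum support threshold) can then be tuned by comparing these measures across candidate values.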
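
For clustering, one label-free way to compare parameter choices is the silhouette score. The sketch below fits K-Means for two candidate cluster counts on hypothetical per-customer features and keeps the count with the higher score. The feature values are invented, and the notebook itself does not build such features, so this only illustrates the evaluation step.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical per-customer features (order count, total spend); values are invented.
# Real features would normally be scaled first so that no single column dominates.
X = np.array([
    [1, 20], [2, 25], [1, 30],        # low-activity customers
    [20, 900], [22, 950], [19, 870],  # high-activity customers
], dtype=float)

scores = {}
for k in (2, 3):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # Silhouette ranges from -1 to 1; higher means tighter, better-separated clusters.
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores, best_k)
```

The same loop can be extended to more values of `k`. The silhouette score needs at least two clusters and fewer clusters than samples.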

___________

**Inferences from the Same :**                       
The analysis of customer purchase patterns can provide useful insights into customer behavior.                     
For example, clustering algorithms can be used to group customers based on their purchase patterns.                  
Association rule mining algorithms can be used to uncover interesting relationships among items purchased.                        
These insights can then be used to inform marketing and product decisions.


________________________

**Future Possibilities of the Project :**                       
The analysis of customer purchase patterns can be further extended by incorporating other datasets such as demographic data and customer feedback data.                 
Furthermore, predictive analytics techniques such as regression and classification can be used to predict customer behavior based on past purchase patterns.            


_______________________________