Predicting Customer Conversion From Google Merchandise Store

kaushil24/GoogleMerchandiseProject

 
 

PREDICTING CUSTOMER CONVERSION FROM GOOGLE MERCHANDISE STORE

This project aims to build a machine learning model that predicts whether a customer visiting Google's Merchandise Store will make a purchase within the next 7 days.

Contributed by: Lim Si Ling Evelyn, Lee Meng Yong, Jiang Nan, Desmond Ho
Please contact desmondhw@gmail.com with any questions.

Data:
Google Merchandise Store clickstream data from Google BigQuery, covering 1 November 2016 to 30 July 2017.

The dataset was extracted by running SQL queries from Python code in Google Colab. See Data_Extraction_and_Exploratory_Data_Analysis.ipynb under the Codes folder. https://bigquery.cloud.google.com/table/bigquery-public-data:google_analytics_sample.ga_sessions_20170801
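As a rough sketch of what running SQL from Python against this public dataset can look like, the snippet below builds a query over the daily `ga_sessions_*` tables for the date range stated above. The function names and the selected columns are assumptions made for illustration; the query actually used is in the notebook. Running `fetch_sessions` additionally assumes the `google-cloud-bigquery` package and Google Cloud credentials are available.

```python
def build_sessions_query(start_date: str, end_date: str) -> str:
    """Return a standard-SQL query over the public GA sample session tables.

    Dates are YYYYMMDD strings matched against the daily table suffix.
    The selected columns are illustrative, not the notebook's actual query.
    """
    for value in (start_date, end_date):
        if len(value) != 8 or not value.isdigit():
            raise ValueError(f"expected YYYYMMDD, got {value!r}")
    return f"""
        SELECT
          fullVisitorId,
          visitId,
          date,
          totals.pageviews AS pageviews,
          totals.bounces AS bounces,
          totals.transactions AS transactions
        FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
        WHERE _TABLE_SUFFIX BETWEEN '{start_date}' AND '{end_date}'
    """


def fetch_sessions(start_date: str, end_date: str):
    """Run the query and return a pandas DataFrame (needs GCP credentials)."""
    # Imported lazily so the query builder works without the client installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    return client.query(build_sessions_query(start_date, end_date)).to_dataframe()


# Query text for the period covered by this project.
QUERY = build_sessions_query("20161101", "20170730")
```

In Colab, `fetch_sessions("20161101", "20170730")` would return the sessions as a DataFrame once the notebook is authenticated against a Google Cloud project.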

Training Dataset:
Data\train_validation_dataset.csv
Testing Dataset:
Data\test_dataset.csv

Note: The target variable (transaction) is highly imbalanced: only 3.9% of non-bounce sessions include a transaction.

Performance Metric:
F2-score. The F2-score combines precision and recall but weights recall more heavily than precision (it is the F-beta score with beta = 2). This favours models that capture as many potential customers as possible.
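To make the recall weighting concrete, here is a minimal sketch that computes F-beta directly from confusion-matrix counts. The two sets of counts are hypothetical and chosen so that both models have the same F1-score; they are not results from this project. For label arrays, `sklearn.metrics.fbeta_score(y_true, y_pred, beta=2)` computes the same quantity.

```python
def fbeta_from_counts(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta from confusion-matrix counts; returns 0.0 when undefined.

    Equivalent to (1 + beta^2) * P * R / (beta^2 * P + R),
    with P = tp / (tp + fp) and R = tp / (tp + fn).
    """
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Hypothetical model A: recall 0.8, precision 0.4.
recall_heavy = fbeta_from_counts(tp=80, fp=120, fn=20)

# Hypothetical model B: recall 0.4, precision 0.8.
precision_heavy = fbeta_from_counts(tp=40, fp=10, fn=60)

# Both models share the same F1, but F2 prefers the high-recall one.
f1_a = fbeta_from_counts(tp=80, fp=120, fn=20, beta=1.0)
f1_b = fbeta_from_counts(tp=40, fp=10, fn=60, beta=1.0)

print(f"F1: A={f1_a:.3f}  B={f1_b:.3f}")
print(f"F2: A={recall_heavy:.3f}  B={precision_heavy:.3f}")
```

With beta = 2, a missed buyer (false negative) costs four times as much in the denominator as a false alarm, which is why model A scores higher.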

Models:
A Random Forest trained on data without LDA feature engineering and without PCA achieved the highest F2-score on the test data of all the combinations tried.

Best model

  • Codes\Models without LDA _ PCA.ipynb

Other models

  • Codes\Models with LDA.ipynb
  • Codes\Models with PCA.ipynb
  • Codes\Models without LDA _ PCA with undersampling.ipynb
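For orientation, the general shape of the approach described in this section (a Random Forest evaluated with the F2-score on imbalanced data) can be sketched with scikit-learn as follows. This is not the notebooks' code: the data is synthetic, and the hyperparameters, the `class_weight="balanced"` setting, and the 80/20 split are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import fbeta_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the session features: roughly 4% positives.
X, y = make_classification(
    n_samples=5000,
    n_features=20,
    n_informative=5,
    weights=[0.96, 0.04],
    random_state=0,
)

# Stratify so that both splits keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# class_weight="balanced" is an assumed way of handling the imbalance.
model = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=0
)
model.fit(X_train, y_train)

# beta=2 weights recall more heavily than precision.
f2 = fbeta_score(y_test, model.predict(X_test), beta=2, zero_division=0)
print(f"F2 on held-out synthetic data: {f2:.3f}")
```

The score printed here says nothing about the real dataset; refer to the notebooks listed above for the actual models and results.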

Tableau:

  • Codes\outliers.twbx - This visual shows why anomaly detection models did not work well: the minority class (purchase) does not appear as anomalies but clusters with the majority class.

Report:
Our full report can be found in: 'Project Report - Applied Machine Learning.docx'

Requirements:

  • Python 3.7
  • Jupyter Notebook
  • Tableau 2019.2

Python Libraries Required:

  • datetime (Python standard library, no installation needed)
  • numpy
  • pandas
  • scikit-learn
  • matplotlib (matplotlib.pyplot)
  • seaborn
  • patsy (for dmatrices)
  • statsmodels
  • lightgbm

The third-party libraries above can be installed via pip. Note that we tested only with the Python and Tableau versions listed under Requirements; newer versions might not work.
