Instacart-Market-Basket-Analysis

The score is very close to the competition winner's score of 0.409 (see the Competition Leaderboard).

Tech and algorithms used: Auto-Encoder, Logistic Regression, Decision Tree, Random Forest, AdaBoost, Gradient Boosting, Feature Engineering, Python, TensorFlow, Pandas, Sklearn, Matplotlib, Seaborn, Plotly.

Problem Overview:

Goal: Predict which products an Instacart consumer will purchase again.

Instacart is a grocery ordering and delivery app.
Currently they use transactional data to develop models that predict which products a user will buy again, try for the first time, or add to their cart next during a session.
The goal is to predict which previously purchased products will be in a user’s next order.
For each order_id in the test set, we should predict a space-delimited list of product ids for that order.
Predict an explicit 'None' value for orders with no reordered items.
The data provided contains over 3 million grocery orders from more than 200,000 Instacart users.
For each user, Instacart provided between 4 and 100 of their orders in the dataset, with the sequence of products purchased in each order.
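
The submission format described above (a space-delimited list of product ids per order, or an explicit 'None') can be sketched as follows. This is a minimal illustration, not the code from the notebooks: the function name `format_submission_row` and the sample order and product ids are hypothetical.

```python
def format_submission_row(order_id, product_ids):
    """Return (order_id, products) where products is a space-delimited
    string of product ids, or the literal 'None' for an empty basket."""
    # Drop duplicate ids while keeping the first-seen order.
    unique_ids = list(dict.fromkeys(product_ids))
    products = " ".join(str(p) for p in unique_ids) if unique_ids else "None"
    return order_id, products


# Hypothetical predictions: order_id -> predicted reordered product ids.
predictions = {17: [13107, 21463], 34: []}

rows = [format_submission_row(oid, pids) for oid, pids in predictions.items()]
for order_id, products in rows:
    print(f"{order_id},{products}")
```

Orders with no predicted reorders get the string 'None' rather than an empty field, as the problem statement requires.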

Whether a user reorders a product depends heavily on the frequency and recency of their past purchases of it.
Fruits and Vegetables are reordered much more than any other product.
Personal Care products are reordered far less often.
Gradient Boosting gave the best result for the dataset.
Probability Calibration was needed since the dataset was highly imbalanced.
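
One way to check whether a model's probabilities need calibration is to bin the predictions and compare the mean predicted probability in each bin with the observed reorder rate. The sketch below is an assumption about how such a check could look, not the procedure used in the notebooks; `reliability_table` and the toy labels and probabilities are hypothetical.

```python
def reliability_table(y_true, y_prob, n_bins=5):
    """For each non-empty probability bin, return a tuple
    (bin_low, bin_high, mean_predicted_prob, observed_positive_rate, count)."""
    bins = [[] for _ in range(n_bins)]
    for label, prob in zip(y_true, y_prob):
        # A probability of exactly 1.0 falls into the last bin.
        idx = min(int(prob * n_bins), n_bins - 1)
        bins[idx].append((label, prob))

    table = []
    for i, members in enumerate(bins):
        if not members:
            continue  # skip empty bins
        mean_prob = sum(p for _, p in members) / len(members)
        pos_rate = sum(l for l, _ in members) / len(members)
        table.append((i / n_bins, (i + 1) / n_bins, mean_prob, pos_rate, len(members)))
    return table


# Toy data: 1 = product was reordered, 0 = it was not.
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
y_prob = [0.05, 0.10, 0.15, 0.30, 0.35, 0.70, 0.90, 0.95]

table = reliability_table(y_true, y_prob)
for low, high, mean_prob, pos_rate, count in table:
    print(f"[{low:.1f}, {high:.1f}) predicted={mean_prob:.2f} observed={pos_rate:.2f} n={count}")
```

Bins where the predicted and observed values differ widely suggest the probabilities are miscalibrated and that a calibration step is worth applying before thresholding.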

purchase_weight_order_up: Weight of user-product pair based on frequency of purchase and recency(order) of purchase.
reorder_weight_order_up: Weight of user-product pair based on frequency of reorder and recency(order) of reorder.
#orders_since_last_purchase_up: No. of orders placed by the user after his/her last purchase of the given product.
#reorders_in_last_3_orders_up: No. of times user has reordered the given product in his/her last 3 orders.
purchase_weight_days_up: Weight of user-product pair based on frequency of purchase and recency(days) of purchase.
#purchases_in_last_3_orders_up: No. of times user has purchased the given product in his/her last 3 orders.
p(reorder|user,product)_up: (#orders where given product was reordered by user) / (Total #orders by user)
p(reorder|product)_p: (#reorders of product p) / (#purchases of product p)
exceed_in_max_lifetime_orders_up: (No. of orders placed by the user after the last purchase of the given product) - (Max no. of orders after which user u purchased product p again in the past).
days_since_last_purchase_up: No. of days passed after the last purchase of the given product by the user.
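
Some of the count-based user-product features listed above can be sketched for a single user as follows. This is an illustrative reading of the definitions, not the notebook code: the function name, the dictionary keys, and the toy order history are hypothetical, and a "reorder" is assumed to mean any purchase of the product after the user's first purchase of it.

```python
def user_product_features(orders, product, window=3):
    """orders: one user's baskets in chronological order (list of sets of product ids).
    Returns a dict of features, or None if the user never bought the product."""
    positions = [i for i, basket in enumerate(orders) if product in basket]
    if not positions:
        return None

    total_orders = len(orders)
    # Assumption: every purchase after the first one counts as a reorder.
    reorder_positions = positions[1:]
    # Index of the first order inside the "last `window` orders" range.
    recent_start = total_orders - window

    return {
        "orders_since_last_purchase": total_orders - 1 - positions[-1],
        f"purchases_in_last_{window}_orders": sum(1 for i in positions if i >= recent_start),
        f"reorders_in_last_{window}_orders": sum(1 for i in reorder_positions if i >= recent_start),
        "p_reorder_given_user_product": len(reorder_positions) / total_orders,
    }


# Toy history of five orders for one user.
orders = [
    {"banana", "milk"},
    {"banana"},
    {"milk", "soap"},
    {"banana", "milk"},
    {"bread"},
]

features = user_product_features(orders, "banana")
print(features)
```

The weighted features (purchase_weight_order_up and similar) are left out here because their exact weighting scheme is not given in this description.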

Click here to download the dataset with all the 96 engineered features.

Description of data provided by Kaggle: Link

If you find this helpful, please do star the repo.

Repository files:

1. Problem and Data Overview.ipynb
2. Exploratory Data Analysis.ipynb
3. Feature Engineering.ipynb
4. Baseline(Rule-Based) Models.ipynb
5. Auto-Encoder for Ftr Extraction.ipynb
6. First-Cut Approach.ipynb
7. More Feature Engineering.ipynb
8. More Predictive Models(ML).ipynb
Best F1-Score.png
F1-Score function.py
Instacart Market Basket Analysis.ipynb
LICENSE
README.md
requirements.txt