Instacart Market Basket Analysis

Problem :

The goal of this competition was to predict grocery reorders: given a user’s purchase history (a set of orders, and the products purchased within each order), which of their previously purchased products will they repurchase in their next order?

Data Set information

The dataset for this competition is a relational set of files describing customers' orders over time. The dataset is anonymized and contains a sample of over 3 million grocery orders from more than 200,000 Instacart users.

A sample of over 3 million grocery orders
More than 200,000 users
For each user, between 4 and 100 of their orders are provided
The sequence of products purchased in each order is included
The data set also provides the day of the week and hour of day each order was placed, and a relative measure of time between orders

Data File descriptions

order_products_reorder.csv & order_products_none.csv

These files specify which products were purchased in each order. order_products__prior.csv contains previous order contents for all customers. 'reordered' is 1 when the customer has a previous order that contains the product, and 0 otherwise.

order_id	product_id	add_to_cart_order	reordered
1	49302	1	1
1	11109	2	1
1	10246	3	0
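
As a rough illustration of how the 'reordered' flag can be used, the sketch below computes the share of reordered products per order. It uses only the three sample rows shown above, rewritten as comma-separated text; the function name `reorder_rate_by_order` is a hypothetical helper, not part of the repository.

```python
import csv
import io
from collections import defaultdict

# The three sample rows from the table above, as comma-separated text.
SAMPLE_ORDER_PRODUCTS = """order_id,product_id,add_to_cart_order,reordered
1,49302,1,1
1,11109,2,1
1,10246,3,0
"""

def reorder_rate_by_order(csv_text):
    """Return {order_id: fraction of products in that order flagged as reordered}."""
    totals = defaultdict(int)
    reordered = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        order_id = int(row["order_id"])
        totals[order_id] += 1
        reordered[order_id] += int(row["reordered"])
    return {order_id: reordered[order_id] / totals[order_id] for order_id in totals}

rates = reorder_rate_by_order(SAMPLE_ORDER_PRODUCTS)
```

For the sample rows, two of the three products in order 1 are flagged as reordered.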

orders.csv

This file tells which set (prior, train, test) each order belongs to. Reordered items are predicted only for the test set orders. 'order_dow' is the day of the week. The eval_set column takes one of the following values:

prior - the order is one of the user's earlier orders (not the latest one); these orders are also part of the training data
train - the order is the user's latest order and should be used in the training set
test - the order/user pair for which we need to predict which products the user reordered

order_id	user_id	eval_set	order_number	order_dow	order_hour_of_day	days_since_prior_order
2539329	1	prior	1	2	08	NA
2398795	1	prior	2	3	07	15.0
473747	1	prior	3	3	12	21.0
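
A minimal sketch of grouping orders by eval_set is shown below. It reads the three sample rows above (rewritten as comma-separated text) and assumes that 'NA' in days_since_prior_order marks a user's first order, which is converted to `None`. The helper names `parse_days` and `split_by_eval_set` are hypothetical.

```python
import csv
import io
from collections import defaultdict

# The three sample rows from the table above, as comma-separated text.
SAMPLE_ORDERS = """order_id,user_id,eval_set,order_number,order_dow,order_hour_of_day,days_since_prior_order
2539329,1,prior,1,2,08,NA
2398795,1,prior,2,3,07,15.0
473747,1,prior,3,3,12,21.0
"""

def parse_days(value):
    """Return days since the prior order as a float, or None when the value is 'NA' or empty."""
    return None if value in ("", "NA") else float(value)

def split_by_eval_set(csv_text):
    """Group parsed order rows by their eval_set value (prior, train or test)."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["eval_set"]].append({
            "order_id": int(row["order_id"]),
            "user_id": int(row["user_id"]),
            "order_number": int(row["order_number"]),
            "order_dow": int(row["order_dow"]),
            "order_hour_of_day": int(row["order_hour_of_day"]),
            "days_since_prior_order": parse_days(row["days_since_prior_order"]),
        })
    return dict(groups)

orders_by_set = split_by_eval_set(SAMPLE_ORDERS)
```

With the full orders.csv, the same grouping would also yield 'train' and 'test' entries; the sample rows here contain only 'prior' orders.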

products.csv

product_id	product_name	aisle_id	department_id
1	Chocolate Sandwich Cookies	61	19
2	All-Seasons Salt	104	13
3	Robust Golden Unsweetened Oolong Tea	94	7

aisles.csv

aisle_id	aisle
1	prepared soups salads
2	specialty cheeses
3	energy granola bars

departments.csv

department_id	department
1	frozen
2	other
3	bakery

sample_submission.csv

order_id	products
17	39276
34	39276
137	39276
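
The sketch below writes predictions in the two-column layout shown above. It assumes the competition's convention, which the sample rows do not show, that multiple product IDs are separated by spaces and that an order with no predicted reorders is written as the literal string None. The function `format_submission` and the example predictions are hypothetical.

```python
import csv
import io

def format_submission(predictions):
    """Build submission CSV text from {order_id: iterable of predicted product_ids}."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["order_id", "products"])
    for order_id in sorted(predictions):
        # Assumption: space-separated IDs, and "None" when nothing is predicted.
        products = " ".join(str(p) for p in predictions[order_id]) or "None"
        writer.writerow([order_id, products])
    return buffer.getvalue()

# Made-up predictions for the order_ids in the sample file above.
submission_text = format_submission({17: [39276], 34: [49302, 11109], 137: []})
```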

Approach

This is a classification problem, and in particular a binary classification: for a user's order, each previously purchased product is either reordered (1) or not reordered (0).
The approach will be similar to the one outlined in this blog post: http://blog.kaggle.com/2017/09/21/instacart-market-basket-analysis-winners-interview-2nd-place-kazuki-onodera/
Predict reorders: which previously purchased products will be in the next order. This depends on the user and the product.
Predict None: will the user's next order contain no reordered products? If this probability is p, then the probability that the order contains at least one reordered product is (1 - p).
Using the probabilities from both steps above, predict the probability that a user will reorder.
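
One simple way to combine the two sets of probabilities is sketched below. This is an illustration under stated assumptions, not the method used in this repository or in the linked blog post: it keeps every product whose reorder probability reaches a fixed threshold, and adds None when the None probability reaches the same threshold or when no product qualifies. The threshold value 0.2 and the function names are hypothetical. For comparison, `none_prob_if_independent` estimates the None probability from the product probabilities alone, assuming the reorders are independent.

```python
import math

def none_prob_if_independent(product_probs):
    """Estimate P(no reorders) as the product of (1 - p), assuming independent reorders."""
    return math.prod(1.0 - p for p in product_probs.values())

def choose_basket(product_probs, p_none, threshold=0.2):
    """Pick the predicted items for one order.

    product_probs: {product_id: predicted probability of a reorder}
    p_none: predicted probability that the order contains no reorders
    threshold: hypothetical cut-off, chosen here only for illustration
    """
    picked = sorted(pid for pid, p in product_probs.items() if p >= threshold)
    basket = [str(pid) for pid in picked]
    # Add "None" when it is likely enough, or when nothing else was selected.
    if p_none >= threshold or not basket:
        basket.append("None")
    return basket

basket = choose_basket({49302: 0.9, 11109: 0.05}, p_none=0.05)
```

In practice the threshold would be tuned against the evaluation metric rather than fixed in advance.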

Submissions :

Submissions will include

Code in Python
PowerPoint slide deck
Report

Sl.no	Files	Comments
1	code/py/InstacartMBA_ETL.ipynb	Code for Extraction, Transformation and Loading
2	code/py/InstacartMBA_DataWrangling.ipynb	Code for Data Wrangling
3	code/py/InstacartMBA_DataStoryTelling.ipynb	Code for Data Story Telling
4	docs/Instacart_milestone-1_pdf.pdf	Milestone 1 report

krajeshj/InstacartMBA

Folders and files

Latest commit

History

Repository files navigation

Instacart Market basket analysis

Problem :

Data Set information

Data File descriptions

order_products_reorder.csv & order_products_none.csv

orders.csv

products.csv

aisles.csv

departments.csv

sample_submission.csv

Approach

Submissions :

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages