## Dependencies:

In [19]:
import numpy as np
import pandas as pd

import warnings
warnings.filterwarnings("ignore")

print("All Modules Imported Successfully")

All Modules Imported Successfully


## Objectives:

Identify popular product bundles and predict next-purchase products to help optimize marketing strategies such as product bundling and targeted promotions. The project aims to increase average order value (AOV) by encouraging customers to buy bundles of products that are frequently purchased together.

## Project Description:

This project involves understanding associations among products that customers frequently purchase together, using market basket analysis and supervised modeling to identify patterns and predict a user’s next likely purchases. By using association rules to discover product bundling opportunities and sequence-based supervised models to anticipate future purchases, this project provides insights that support effective marketing and personalization strategies.

## Data Considered and Description:

1. **Orders Dataset:** Tracks each order made by a user, allowing us to analyze the order sequence and timing.
2. **Order Products Dataset:** Provides data on products in each order, useful for determining co-purchased products.
3. **Products Dataset:** Contains product details, allowing us to understand product relationships within orders.

## Approach:

* Use **Apriori** or **FP-Growth** algorithms to perform association rule mining and find frequent itemsets that are commonly bought together.
* Identify association rules that have high confidence and lift to create potential product bundles.
* Use **Classification Algorithms** to predict a user's next likely purchase.
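
To make the support, confidence, and lift terms above concrete, here is a minimal sketch that computes them for product pairs by brute-force counting in plain Python. It is not Apriori or FP-Growth (those prune the search so it scales to larger itemsets), and the toy `baskets` dictionary is an assumption made for illustration; on the real data, baskets could be built by grouping product names by `order_id`.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy baskets: order_id -> set of product names.
baskets = {
    1: {"Banana", "Milk", "Bread"},
    2: {"Banana", "Milk"},
    3: {"Banana", "Bread"},
    4: {"Milk"},
}
n_orders = len(baskets)

# Count how many orders contain each item and each unordered pair.
item_counts = Counter(item for basket in baskets.values() for item in basket)
pair_counts = Counter(
    pair
    for basket in baskets.values()
    for pair in combinations(sorted(basket), 2)
)

rules = []
for (a, b), count in pair_counts.items():
    support = count / n_orders  # share of orders containing both items
    # Each pair yields two directed rules: a -> b and b -> a.
    for antecedent, consequent in ((a, b), (b, a)):
        confidence = count / item_counts[antecedent]
        lift = confidence / (item_counts[consequent] / n_orders)
        rules.append(
            {
                "antecedent": antecedent,
                "consequent": consequent,
                "support": support,
                "confidence": confidence,
                "lift": lift,
            }
        )

# Strongest candidate bundles first.
rules.sort(key=lambda r: (r["lift"], r["confidence"]), reverse=True)
for rule in rules:
    print(rule)
```

A lift above 1 suggests the two products appear together more often than independence would predict, which is the signal a bundle candidate needs.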

Engineer sequential features based on historical purchase behavior, such as:
1. **Time Since Last Purchase:** Days since the user last purchased a product.
2. **Purchase Frequency:** How often a user buys a product or category within a certain period.
3. **Last N Purchases:** Use one-hot encoding or frequency encoding to represent a user's last few purchases, allowing the model to capture trends in recent buying behavior.
4. **Day of the Week and Time of Day:** Include temporal features to help the model learn patterns around when certain products are typically purchased.
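
As a rough illustration of the features listed above, the sketch below derives purchase frequency, days since last purchase, and simple temporal aggregates per user and product. The toy rows are made up; the column names follow the orders and order-products tables, and the feature names (`purchase_frequency`, `days_since_last_purchase`, and so on) are assumptions for this example. The last-N-purchases encoding is left out to keep it short.

```python
import pandas as pd

# Hypothetical toy data: one row per (order, product) for a single user.
toy = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 1],
    "order_number": [1, 1, 2, 3, 3],
    "product_id": [10, 20, 10, 20, 30],
    "days_since_prior_order": [None, None, 7.0, 14.0, 14.0],
    "order_dow": [2, 2, 3, 3, 3],
    "order_hour_of_day": [8, 8, 12, 9, 9],
})

# One row per order, with a running day offset since the user's first order.
orders = (
    toy[["user_id", "order_number", "days_since_prior_order"]]
    .drop_duplicates(["user_id", "order_number"])
    .sort_values(["user_id", "order_number"])
)
orders["day_offset"] = (
    orders["days_since_prior_order"].fillna(0).groupby(orders["user_id"]).cumsum()
)

# Attach the day offset to every order line.
lines = toy.merge(
    orders[["user_id", "order_number", "day_offset"]],
    on=["user_id", "order_number"],
)

# Per-user totals: number of orders and day offset of the most recent order.
user_totals = (
    orders.groupby("user_id")
    .agg(n_orders=("order_number", "nunique"), last_day=("day_offset", "max"))
    .reset_index()
)

# Per (user, product) aggregates.
features = (
    lines.groupby(["user_id", "product_id"])
    .agg(
        times_bought=("order_number", "nunique"),
        last_bought_day=("day_offset", "max"),
        usual_dow=("order_dow", lambda s: s.mode().iloc[0]),
        mean_hour=("order_hour_of_day", "mean"),
    )
    .reset_index()
    .merge(user_totals, on="user_id")
)
features["purchase_frequency"] = features["times_bought"] / features["n_orders"]
features["days_since_last_purchase"] = (
    features["last_day"] - features["last_bought_day"]
)

print(features)
```

In a supervised setup these aggregates would presumably be computed only from orders that precede the order being predicted, so the target does not leak into the features.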


## Read the 3 Datasets

In [21]:
# read_csv already returns a DataFrame, so no extra wrapping is needed
orderProducts_df = pd.read_csv("../data/order_products.csv")

In [23]:
orderProducts_df

Unnamed: 0,order_id,product_id,add_to_cart_order,reordered
0,2,33120,1,1
1,2,28985,2,1
2,2,9327,3,0
3,2,45918,4,1
4,2,30035,5,0
...,...,...,...,...
32434484,3421083,39678,6,1
32434485,3421083,11352,7,0
32434486,3421083,4600,8,0
32434487,3421083,24852,9,1


In [25]:
# read_csv already returns a DataFrame, so no extra wrapping is needed
orders_df = pd.read_csv("../data/orders.csv")

In [27]:
orders_df

Unnamed: 0,order_id,user_id,eval_set,order_number,order_dow,order_hour_of_day,days_since_prior_order
0,2539329,1,prior,1,2,8,
1,2398795,1,prior,2,3,7,15.0
2,473747,1,prior,3,3,12,21.0
3,2254736,1,prior,4,4,7,29.0
4,431534,1,prior,5,4,15,28.0
...,...,...,...,...,...,...,...
3421078,2266710,206209,prior,10,5,18,29.0
3421079,1854736,206209,prior,11,4,10,30.0
3421080,626363,206209,prior,12,1,12,18.0
3421081,2977660,206209,prior,13,1,12,7.0


In [29]:
# read_csv already returns a DataFrame, so no extra wrapping is needed
products_df = pd.read_csv("../data/products.csv")

In [31]:
products_df

Unnamed: 0,product_id,product_name,aisle_id,department_id
0,1,Chocolate Sandwich Cookies,61,19
1,2,All-Seasons Salt,104,13
2,3,Robust Golden Unsweetened Oolong Tea,94,7
3,4,Smart Ones Classic Favorites Mini Rigatoni Wit...,38,1
4,5,Green Chile Anytime Sauce,5,13
...,...,...,...,...
49683,49684,"Vodka, Triple Distilled, Twist of Vanilla",124,5
49684,49685,En Croute Roast Hazelnut Cranberry,42,1
49685,49686,Artisan Baguette,112,3
49686,49687,Smartblend Healthy Metabolism Dry Cat Food,41,8


In [35]:
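# Attach order-level fields (user, sequence, timing) to every order line
merged_df = pd.merge(orderProducts_df, orders_df, on='order_id', how='left')

# Attach product details (name, aisle, department) to every order line
final_df = pd.merge(merged_df, products_df, on='product_id', how='left')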

In [36]:
final_df

Unnamed: 0,order_id,product_id,add_to_cart_order,reordered,user_id,eval_set,order_number,order_dow,order_hour_of_day,days_since_prior_order,product_name,aisle_id,department_id
0,2,33120,1,1,202279,prior,3,5,9,8.0,Organic Egg Whites,86,16
1,2,28985,2,1,202279,prior,3,5,9,8.0,Michigan Organic Kale,83,4
2,2,9327,3,0,202279,prior,3,5,9,8.0,Garlic Powder,104,13
3,2,45918,4,1,202279,prior,3,5,9,8.0,Coconut Butter,19,13
4,2,30035,5,0,202279,prior,3,5,9,8.0,Natural Sweetener,17,13
...,...,...,...,...,...,...,...,...,...,...,...,...,...
32434484,3421083,39678,6,1,25247,prior,24,2,6,21.0,Free & Clear Natural Dishwasher Detergent,74,17
32434485,3421083,11352,7,0,25247,prior,24,2,6,21.0,Organic Mini Sandwich Crackers Peanut Butter,78,19
32434486,3421083,4600,8,0,25247,prior,24,2,6,21.0,All Natural French Toast Sticks,52,1
32434487,3421083,24852,9,1,25247,prior,24,2,6,21.0,Banana,24,4
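
Because both joins above are left merges, a duplicated key in the right-hand table would silently multiply rows, and a missing key would leave NaN product fields. The sketch below shows one way to guard against both using the `validate` and `indicator` options of `DataFrame.merge`. The toy frames and the product ID `99999` are made up for illustration.

```python
import pandas as pd

# Hypothetical toy frames mirroring the three datasets' key columns.
order_products = pd.DataFrame(
    {"order_id": [2, 2, 3], "product_id": [33120, 28985, 99999]}
)
orders = pd.DataFrame({"order_id": [2, 3], "user_id": [202279, 25247]})
products = pd.DataFrame(
    {
        "product_id": [33120, 28985],
        "product_name": ["Organic Egg Whites", "Michigan Organic Kale"],
    }
)

# validate="many_to_one" raises MergeError if the right-hand keys are not unique.
merged = order_products.merge(
    orders, on="order_id", how="left", validate="many_to_one"
)

# indicator=True adds a "_merge" column marking rows with no match on the right.
checked = merged.merge(
    products, on="product_id", how="left", validate="many_to_one", indicator=True
)

# A left merge against unique keys should preserve the row count.
print(len(checked) == len(order_products))

# Order lines whose product_id was not found in the products table.
unmatched = checked[checked["_merge"] == "left_only"]
print(unmatched[["order_id", "product_id"]])
```

If the checks pass on the real data, the `_merge` column can be dropped before writing the result to disk.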


In [39]:
final_df.to_csv("../data/final.csv", index=False)