**Problem Statement:**
Backorders are unavoidable, but anticipating which products will be backordered lets
planning be streamlined at several levels, preventing unexpected strain on
production, logistics, and transportation. ERP systems generate a lot of data (mainly
structured) and also hold a lot of historical data; if this data is used properly, a
predictive model can be built to forecast backorders and plan accordingly.
The task: based on past inventory, supply chain, and sales data, classify each product as
going on backorder (Yes or No).

**What is a Backorder?**
* A backorder is an order for a good or service that cannot be filled at the current time due to a lack of available supply. The item may not be held in the company's available inventory but could still be in production, or the company may still need to manufacture more of the product. Backorders are an indication that demand for a company's product outweighs its supply.<br>



**Key Takeaways**<br>
* A backorder is an order for a good or service that cannot be filled immediately because of a lack of available supply. 
* Backorders give insight into a company's inventory management. A manageable backorder with a short turnaround is a net positive, but a large backorder with longer wait times can be problematic.
* Companies with manageable backorders tend to have high demand, while those that can't keep up may lose customers.
* However, backorders allow a company to maintain lower levels of inventory, carry a lower risk of obsolescence and theft, and may result in natural marketing for its highly demanded product.
* Popular products in high demand (e.g. next-generation gaming consoles or new iterations of cell phones) may experience backorders.

# Data Description
**• sku** - Random product (sku) code <br>
**• national_inv**  - Current inventory level of that sku <br>
**• lead_time** - Transit time for product (if available at source)  <br>
**• in_transit_qty** - Quantity in transit from source <br>
**• forecast_x_month** - Forecast sales for the next 3, 6, 9 months <br>
**• sales_x_month** - Sales quantity for the prior 1, 3, 6, 9 months <br>
**• min_bank** - Minimum recommended amount to stock <br>
**• potential_issue** - Indicator variable noting potential issue with item <br>
**• pieces_past_due** - Parts overdue from source <br>
**• perf_x_month_avg** - Source performance in the last 6 and 12 months <br>
**• local_bo_qty** - Amount of stock orders overdue <br>
**• deck_risk** – Part risk flag<br>
**• oe_constraint** – Part risk flag<br>
**• ppap_risk** – Part risk flag<br>
**• stop_auto_buy** – Part risk flag<br>
**• rev_stop** – Part risk flag<br>
**• went_on_backorder** - Product went on backorder <br>
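
In this dataset the flag columns and the target hold `Yes`/`No` strings, so most models will need them as numbers first. The following is a minimal sketch of one way to do that, assuming pandas; the helper name `encode_flags` and the toy rows are hypothetical and only illustrate the idea.

```python
import pandas as pd

# Flag columns named in the data description; each holds "Yes"/"No" strings.
FLAG_COLUMNS = [
    "potential_issue", "deck_risk", "oe_constraint", "ppap_risk",
    "stop_auto_buy", "rev_stop", "went_on_backorder",
]

def encode_flags(df, columns=FLAG_COLUMNS):
    """Return a copy of df with Yes/No columns mapped to 1/0 (hypothetical helper)."""
    out = df.copy()
    for col in columns:
        if col in out.columns:
            # Any value other than "Yes"/"No" (including missing) becomes NaN.
            out[col] = out[col].map({"Yes": 1, "No": 0})
    return out

# Toy rows for illustration only; not taken from the real dataset.
toy = pd.DataFrame({
    "sku": ["A1", "B2"],
    "deck_risk": ["Yes", "No"],
    "went_on_backorder": ["No", "Yes"],
})
encoded = encode_flags(toy)
```

The helper works on a copy, so the original frame keeps its string labels for inspection.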
## Get the Data
Welcome to the Backorder Machine Learning Prediction Project! Our task is to predict whether a product will go on backorder.

## Download the Data

In [2]:
## imports
import tarfile
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split

pd.set_option('display.max_columns', None)  # show every column when displaying a DataFrame

In [None]:

##Link to download test dataset is https://drive.google.com/file/d/1ABbfz7MjPLj2taxUxwsWxXgYsH-TnBNV/view?usp=share_link
##Link to download train dataset is https://drive.google.com/file/d/1CTiyE6BZlDK-UNYUuuT8FXszBvuT9luS/view?usp=share_link

# df_test = pd.read_csv(r"/content/drive/MyDrive/bo project data/dataset/Kaggle_Test_Dataset_v2.csv")
# df_train = pd.read_csv(r"/content/drive/MyDrive/bo project data/dataset/Kaggle_Training_Dataset_v2.csv")

In [52]:
def extract_dataset():
    """Extract both archives into ./datasets and return (test_df, train_df)."""
    data_dir = Path("datasets")
    data_dir.mkdir(parents=True, exist_ok=True)
    # Unpack each archive into the datasets directory.
    for archive in ("Kaggle_Test_Dataset_v2.csv.tgz.gz", "Kaggle_Training_Dataset_v2.csv.tgz.gz"):
        with tarfile.open(archive) as bo_tar:
            bo_tar.extractall(path=data_dir)
    # Load the extracted CSV files; the test set is returned first.
    test = pd.read_csv(data_dir / "Kaggle_Test_Dataset_v2.csv")
    train = pd.read_csv(data_dir / "Kaggle_Training_Dataset_v2.csv")
    return test, train

test_df, train_df = extract_dataset()



In [55]:
test_df.head()


| | sku | national_inv | lead_time | in_transit_qty | forecast_3_month | forecast_6_month | forecast_9_month | sales_1_month | sales_3_month | sales_6_month | sales_9_month | min_bank | potential_issue | pieces_past_due | perf_6_month_avg | perf_12_month_avg | local_bo_qty | deck_risk | oe_constraint | ppap_risk | stop_auto_buy | rev_stop | went_on_backorder |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3285085 | 62.0 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | No | 0.0 | -99.0 | -99.0 | 0.0 | Yes | No | No | Yes | No | No |
| 1 | 3285131 | 9.0 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | No | 0.0 | -99.0 | -99.0 | 0.0 | No | No | Yes | No | No | No |
| 2 | 3285358 | 17.0 | 8.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | No | 0.0 | 0.92 | 0.95 | 0.0 | No | No | No | Yes | No | No |
| 3 | 3285517 | 9.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | No | 0.0 | 0.78 | 0.75 | 0.0 | No | No | Yes | Yes | No | No |
| 4 | 3285608 | 2.0 | 8.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | No | 0.0 | 0.54 | 0.71 | 0.0 | No | No | No | Yes | No | No |


In [56]:
test_df.shape

(242076, 23)

In [57]:
train_df.shape


(1687861, 23)

In [58]:
train_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1687861 entries, 0 to 1687860
Data columns (total 23 columns):
 #   Column             Non-Null Count    Dtype  
---  ------             --------------    -----  
 0   sku                1687861 non-null  object 
 1   national_inv       1687860 non-null  float64
 2   lead_time          1586967 non-null  float64
 3   in_transit_qty     1687860 non-null  float64
 4   forecast_3_month   1687860 non-null  float64
 5   forecast_6_month   1687860 non-null  float64
 6   forecast_9_month   1687860 non-null  float64
 7   sales_1_month      1687860 non-null  float64
 8   sales_3_month      1687860 non-null  float64
 9   sales_6_month      1687860 non-null  float64
 10  sales_9_month      1687860 non-null  float64
 11  min_bank           1687860 non-null  float64
 12  potential_issue    1687860 non-null  object 
 13  pieces_past_due    1687860 non-null  float64
 14  perf_6_month_avg   1687860 non-null  float64
 15  perf_12_month_avg  1687860 non-n

In [None]:
train_df
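
The `info()` output above lists fewer non-null entries for `lead_time` than for the other columns, so it is worth counting the gaps per column before modelling. The following is a rough sketch of such a summary, assuming pandas and NumPy; the helper name `missing_summary` and the toy frame are hypothetical, and in the notebook the function would be called on `train_df` instead.

```python
import numpy as np
import pandas as pd

def missing_summary(df):
    """Count and percentage of missing values per column (hypothetical helper)."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    # Keep only columns that actually have gaps, largest first.
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)

# Toy frame for illustration only; not the real dataset.
toy = pd.DataFrame({
    "national_inv": [62.0, 9.0, 17.0, 9.0],
    "lead_time": [np.nan, np.nan, 8.0, 2.0],
})
report = missing_summary(toy)
```

Columns without missing values are dropped from the report, which keeps the output short on a frame with 23 columns.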