# Structure of Jupyter Notebooks Development for Predictive Modeling

*1_problem_statement.ipynb*

## Problem statement

The goal is to predict transportation costs and support better decisions on supply chain expense management. The prediction draws on supply chain dynamics, shipping times, carriers, supplier locations, production volumes, routes, and shipped product features. Alongside the predictive model, supply chain teams get a chatbot for real-time shipment tracking and an integrated tool that gathers relevant news from various webpages, which helps keep operations aligned with current industry standards and insights.

## Dataset Overview

For our proof of concept, we use a supply chain analysis dataset available on [Kaggle](https://www.kaggle.com/datasets/harshsingh2209/supply-chain-analysis/download?datasetVersionNumber=1). This dataset is the basis for developing, testing, and later implementing the ETL (Extract, Transform, Load) pipelines. These pipelines will be tailored to and integrated with actual customer databases once we secure the necessary data access permissions.

## Dataset Definition

*2_data_wrangling.ipynb*

## Stages:

1. Handle missing values
2. Define categorical features
3. Perform feature engineering
4. List insights for the Exploratory Data Analysis
5. Define the data transformations needed

## Output:

Dataset prepared for EDA
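
The first three wrangling stages could look roughly like the sketch below. It runs on a small toy DataFrame, not the Kaggle file: the column names, the values, the imputation choices (median and `"Unknown"`), and the engineered `volume_per_day` feature are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real data; column names and values are illustrative only.
raw = pd.DataFrame({
    "carrier": ["Carrier A", "Carrier B", None, "Carrier A", "Carrier C"],
    "route": ["Route 1", "Route 2", "Route 1", None, "Route 3"],
    "shipping_time_days": [4.0, np.nan, 2.0, 6.0, 8.0],
    "production_volume": [215, 517, 971, 937, 414],
    "transport_cost": [187.75, 503.07, 141.92, np.nan, 923.44],
})


def wrangle(df: pd.DataFrame, target: str = "transport_cost") -> pd.DataFrame:
    """Hypothetical helper covering stages 1 to 3 of the wrangling notebook."""
    # 1. Handle missing values: rows without a target cannot be used for training.
    out = df.dropna(subset=[target]).copy()

    num_cols = out.select_dtypes(include="number").columns
    cat_cols = out.select_dtypes(exclude="number").columns

    # Numeric gaps are filled with the column median (an assumed strategy).
    out = out.fillna(out[num_cols].median())

    # 2. Define categorical features: fill gaps with a placeholder label,
    #    then store the column with the pandas "category" dtype.
    for col in cat_cols:
        out[col] = out[col].fillna("Unknown").astype("category")

    # 3. Feature engineering: an illustrative ratio feature, not taken from the source.
    out["volume_per_day"] = out["production_volume"] / out["shipping_time_days"]

    return out.reset_index(drop=True)


prepared = wrangle(raw)
print(prepared.dtypes)
```

Keeping the steps in one function makes it easier to rerun the whole preparation when the EDA notebook sends a step back for revision.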

*3_EDA.ipynb*

## Stages:

1. Ingest the dataset produced by data wrangling
2. Analyze categorical and numerical features
3. Select features and the target variable
4. Examine the distribution of numerical features
5. Select features based on their correlations
6. Redefine steps in the data wrangling stages (if applicable)
7. Clean the dataset for modeling

## Output:

Dataset for modeling
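
Correlation-based feature selection can be sketched as follows. This is one possible approach under stated assumptions, not the notebook's actual code: the data is synthetic, the column names are invented, and the two thresholds (`min_target_corr`, `max_pair_corr`) are arbitrary starting points.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration; none of these columns come from the real dataset.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "shipping_time_days": rng.integers(1, 10, n).astype(float),
    "production_volume": rng.integers(100, 1000, n).astype(float),
})
# Nearly redundant with production_volume, to show how correlated pairs are handled.
df["order_quantity"] = df["production_volume"] * 0.1 + rng.normal(0, 1, n)
# Unrelated to the target.
df["noise"] = rng.normal(0, 1, n)
df["transport_cost"] = (
    50 * df["shipping_time_days"]
    + 0.5 * df["production_volume"]
    + rng.normal(0, 10, n)
)


def select_by_correlation(
    data: pd.DataFrame,
    target: str,
    min_target_corr: float = 0.3,
    max_pair_corr: float = 0.9,
) -> list:
    """Hypothetical helper: keep features correlated with the target,
    dropping features that are near-duplicates of one already kept."""
    corr = data.select_dtypes(include="number").corr().abs()
    target_corr = corr[target].drop(target).sort_values(ascending=False)
    candidates = target_corr[target_corr >= min_target_corr].index

    selected = []
    for col in candidates:
        # Candidates are visited from strongest to weakest target correlation,
        # so the stronger member of a redundant pair is the one that survives.
        if all(corr.loc[col, kept] < max_pair_corr for kept in selected):
            selected.append(col)
    return selected


selected = select_by_correlation(df, target="transport_cost")
print(selected)

# Distribution check for the selected numerical features.
print(df[selected].describe())
print(df[selected].skew())
```

Pearson correlation only captures linear relationships, so a feature dropped here may still be worth a second look in the distribution plots.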


*4_modeling.ipynb*

## Stages:

1. Ingest the dataset produced by EDA
2. Choose the model type
3. Run the train/test phase
4. Save intermediate datasets
5. Define model evaluation metrics
6. Try different ML models
7. Pick a useful metric
8. Condense models and metrics
9. Visualize performance plots
10. Save the model

## Output:

Model saved as a `.pkl` file for later use in pipelines and platform integration
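
A minimal sketch of the train/test, comparison, and saving stages, assuming scikit-learn is used. The source does not name the models or metrics, so `LinearRegression`, `RandomForestRegressor`, MAE, and R² are illustrative choices, and the data, column names, and output file name are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the dataset produced by the EDA notebook.
rng = np.random.default_rng(42)
n = 300
X = pd.DataFrame({
    "shipping_time_days": rng.integers(1, 10, n).astype(float),
    "production_volume": rng.integers(100, 1000, n).astype(float),
})
y = 50 * X["shipping_time_days"] + 0.5 * X["production_volume"] + rng.normal(0, 10, n)

# Train/test phase: hold out 20% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Try different ML models (assumed candidates).
models = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=42),
}

# Condense models and metrics into one comparison table.
rows = []
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rows.append({
        "model": name,
        "mae": mean_absolute_error(y_test, pred),
        "r2": r2_score(y_test, pred),
    })
results = pd.DataFrame(rows).sort_values("mae").reset_index(drop=True)
print(results)

# Pick the model with the lowest MAE (the metric chosen for this sketch).
best_model = models[results.loc[0, "model"]]

# Save the model; a temporary directory is used here, the real path is project-specific.
model_path = Path(tempfile.gettempdir()) / "transport_cost_model.pkl"
with open(model_path, "wb") as f:
    pickle.dump(best_model, f)

# Reload to confirm the file can be used by a downstream pipeline.
with open(model_path, "rb") as f:
    restored = pickle.load(f)
```

Pickle files should only be loaded from trusted sources, and the scikit-learn version used for loading should match the one used for saving.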

*Deployment*

Integration of the predictive feature with the software platform.