
<img src="dataset-cover.jpg" alt="Circular Image" style="border-radius: 50%; display: block; margin: 0 auto; width: 200px; height: 200px;">
<h1>Logistics Supply Chain Real-World Data</h1>
<p>Real-World Insights: Optimizing Logistics and Supply Chain Data</p>
<h2>About Dataset</h2>
<p><b>Problem set:</b> This is a multi-label delivery delay prediction problem, a challenge that supply chain practitioners across industries encounter in their daily operations.</p>
<p><b>Testing:</b> Participants are given a tabular dataset of delivery-related variables with which to develop their delay prediction models.</p>
<p><b>Evaluation Metric:</b> An ideal delay prediction algorithm should accurately predict the delivery arrival status.</p>
<h2>Project Description</h2>
<p>Analyze the data in depth and propose concrete strategies to <b>increase revenue</b> in the coming quarter.</p>
<h3>Source</h3>
<p><b>Kaggle: </b><a href="https://www.kaggle.com/datasets/pushpitkamboj/logistics-data-containing-real-world-data/data">dataset page</a></p>

In [1]:
import pandas as pd
import numpy as np
import warnings

# Silence library warnings to keep the notebook output readable.
warnings.filterwarnings('ignore')

In [3]:
df = pd.read_csv("datasets/incom2024_delay_example_dataset.csv")
df.head()

Unnamed: 0,payment_type,profit_per_order,sales_per_customer,category_id,category_name,customer_city,customer_country,customer_id,customer_segment,customer_state,...,order_region,order_state,order_status,product_card_id,product_category_id,product_name,product_price,shipping_date,shipping_mode,label
0,DEBIT,34.448338,92.49099,9.0,Cardio Equipment,Caguas,Puerto Rico,12097.683,Consumer,PR,...,Western Europe,Vienna,COMPLETE,191.0,9.0,Nike Men's Free 5.0+ Running Shoe,99.99,2015-08-13 00:00:00+01:00,Standard Class,-1
1,TRANSFER,91.19354,181.99008,48.0,Water Sports,Albuquerque,EE. UU.,5108.1045,Consumer,CA,...,South America,Buenos Aires,PENDING,1073.0,48.0,Pelican Sunstream 100 Kayak,199.99,2017-04-09 00:00:00+01:00,Standard Class,-1
2,DEBIT,8.313806,89.96643,46.0,Indoor/Outdoor Games,Amarillo,Puerto Rico,4293.4478,Consumer,PR,...,Western Europe,Nord-Pas-de-Calais-Picardy,COMPLETE,1014.0,46.0,O'Brien Men's Neoprene Life Vest,49.98,2015-03-18 00:00:00+00:00,Second Class,1
3,TRANSFER,-89.463196,99.15065,17.0,Cleats,Caguas,Puerto Rico,546.5306,Consumer,PR,...,Central America,Santa Ana,PROCESSING,365.0,17.0,Perfect Fitness Perfect Rip Deck,59.99,2017-03-18 00:00:00+00:00,Second Class,0
4,DEBIT,44.72259,170.97824,48.0,Water Sports,Peabody,EE. UU.,1546.398,Consumer,CA,...,Central America,Illinois,COMPLETE,1073.0,48.0,Pelican Sunstream 100 Kayak,199.99,2015-03-30 00:00:00+01:00,Standard Class,1


In [4]:
df.shape

(15549, 41)

# 1. Data Cleaning

## 1. Relevant Features

<p>Based on the variable descriptions in <b>incom2024_delay_variable_description.csv</b>, only the features relevant to increasing revenue in the next quarter are kept.</p>


In [5]:
relevant_features = [
    'sales',
    'profit_per_order',
    'sales_per_customer',
    'order_item_total_amount',
    'order_profit_per_order',
    'order_item_profit_ratio',
    'order_item_product_price',
    'order_item_discount',
    'order_item_discount_rate',
    'product_price',
    'order_item_quantity',
    'order_status',
    'customer_segment',
    'customer_country',
    'order_country',
    'market',
    'order_region',
    'customer_city',
    'order_city',
    'customer_state',
    'order_state',
    'category_name',
    'category_id',
    'product_category_id',
    'product_name',
    'product_card_id',
    'department_name',
    'department_id',
    'order_date',
    'shipping_date',
    'shipping_mode'
]

df_relevant = df[relevant_features]
df_relevant.shape

(15549, 31)
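
Selecting columns from a hand-written list fails with a fairly terse error if one name is misspelled. A minimal sketch of a more defensive selection is shown below. The helper name `select_columns` and the small `sample` frame are hypothetical, used only for illustration; the sample values are copied from the first two rows shown earlier.

```python
import pandas as pd


def select_columns(frame, wanted):
    """Return a copy of `frame` restricted to `wanted`, failing loudly on typos."""
    missing = [col for col in wanted if col not in frame.columns]
    if missing:
        raise KeyError(f"Columns not found in the dataset: {missing}")
    # .copy() gives an independent frame, so later edits do not touch the original.
    return frame[wanted].copy()


# Hypothetical stand-in for the real dataset (two rows, four columns).
sample = pd.DataFrame({
    "payment_type": ["DEBIT", "TRANSFER"],
    "profit_per_order": [34.448338, 91.19354],
    "product_price": [99.99, 199.99],
    "label": [-1, -1],
})

subset = select_columns(sample, ["profit_per_order", "product_price"])
print(subset.shape)
```

On the real data, the same helper could be called as `select_columns(df, relevant_features)`.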

In [6]:
df_relevant.to_csv("datasets/df_relevant.csv", index=False)
print("File Saved!!!")

File Saved!!!


## 2. Missing Data

In [None]:
df = pd.read_csv("datasets/df_relevant.csv")
df.head()

Unnamed: 0,sales,profit_per_order,sales_per_customer,order_item_total_amount,order_profit_per_order,order_item_profit_ratio,order_item_product_price,order_item_discount,order_item_discount_rate,product_price,...,category_name,category_id,product_category_id,product_name,product_card_id,department_name,department_id,order_date,shipping_date,shipping_mode
0,99.99,34.448338,92.49099,84.99157,32.083145,0.41,99.99,12.623338,0.13,99.99,...,Cardio Equipment,9.0,9.0,Nike Men's Free 5.0+ Running Shoe,191.0,Footwear,3.0,2015-08-12 00:00:00+01:00,2015-08-13 00:00:00+01:00,Standard Class
1,199.99,91.19354,181.99008,181.99,91.23587,0.48,199.99,16.5,0.07,199.99,...,Water Sports,48.0,48.0,Pelican Sunstream 100 Kayak,1073.0,Fan Shop,7.0,2017-02-10 00:00:00+00:00,2017-04-09 00:00:00+01:00,Standard Class
2,99.96,8.313806,89.96643,93.81015,6.965549,0.09,49.98,6.6,0.06,49.98,...,Indoor/Outdoor Games,46.0,46.0,O'Brien Men's Neoprene Life Vest,1014.0,Fan Shop,7.0,2015-01-01 00:00:00+00:00,2015-03-18 00:00:00+00:00,Second Class
3,119.98,-89.463196,99.15065,99.8906,-95.4014,-0.8,59.99,16.942171,0.16,59.99,...,Cleats,17.0,17.0,Perfect Fitness Perfect Rip Deck,365.0,Apparel,4.0,2017-05-31 00:00:00+01:00,2017-03-18 00:00:00+00:00,Second Class
4,199.99,44.72259,170.97824,171.07587,44.569,0.27,199.99,29.99,0.15,199.99,...,Water Sports,48.0,48.0,Pelican Sunstream 100 Kayak,1073.0,Fan Shop,7.0,2015-03-28 00:00:00+00:00,2015-03-30 00:00:00+01:00,Standard Class


In [12]:
print("Total Missing Value:",df.isnull().sum().sum())

Total Missing Value: 0
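
A single total hides where gaps occur, and `isnull()` does not catch duplicated rows. The sketch below shows a per-column breakdown plus a duplicate check. The `sample` frame is a hypothetical stand-in with one missing value and one repeated row added on purpose; it is not taken from the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in: one NaN in `sales`, and the last row repeats the first.
sample = pd.DataFrame({
    "sales": [99.99, 199.99, np.nan, 99.99],
    "order_status": ["COMPLETE", "PENDING", "COMPLETE", "COMPLETE"],
})

# Count of missing values in each column.
missing_per_column = sample.isnull().sum()

# Same information as a percentage of rows.
missing_share = sample.isnull().mean().mul(100).round(2)

# Rows that are exact copies of an earlier row.
duplicate_rows = int(sample.duplicated().sum())

print(missing_per_column)
print(missing_share)
print("Duplicate rows:", duplicate_rows)
```

Running the same three expressions on `df` would show whether the zero total above also holds column by column and whether any rows are repeated.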


## 3. Data Types

In [13]:
df.dtypes

sales                       float64
profit_per_order            float64
sales_per_customer          float64
order_item_total_amount     float64
order_profit_per_order      float64
order_item_profit_ratio     float64
order_item_product_price    float64
order_item_discount         float64
order_item_discount_rate    float64
product_price               float64
order_item_quantity         float64
order_status                 object
customer_segment             object
customer_country             object
order_country                object
market                       object
order_region                 object
customer_city                object
order_city                   object
customer_state               object
order_state                  object
category_name                object
category_id                 float64
product_category_id         float64
product_name                 object
product_card_id             float64
department_name              object
department_id               
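
The `head()` output above shows `order_date` and `shipping_date` as text with mixed UTC offsets (`+00:00` and `+01:00`), and identifier columns such as `category_id` stored as floats. The sketch below shows one possible way to normalize them, under two assumptions: converting every timestamp to UTC is acceptable for the analysis, and the identifier values are always whole numbers. The `sample` frame and the `lead_time_days` column are hypothetical names; the values are copied from rows 0 and 3 of the preview.

```python
import pandas as pd

# Hypothetical stand-in built from rows 0 and 3 of the preview above.
sample = pd.DataFrame({
    "order_date": ["2015-08-12 00:00:00+01:00", "2017-05-31 00:00:00+01:00"],
    "shipping_date": ["2015-08-13 00:00:00+01:00", "2017-03-18 00:00:00+00:00"],
    "category_id": [9.0, 17.0],
})

# Parse the strings and convert mixed offsets to a single UTC timeline.
for col in ["order_date", "shipping_date"]:
    sample[col] = pd.to_datetime(sample[col], utc=True)

# Nullable integer dtype; this raises if any value has a fractional part.
sample["category_id"] = sample["category_id"].astype("Int64")

# Whole days between order and shipment; negative values mean the recorded
# shipping date precedes the order date.
sample["lead_time_days"] = (sample["shipping_date"] - sample["order_date"]).dt.days
suspicious = sample[sample["lead_time_days"] < 0]

print(sample.dtypes)
print(suspicious)
```

Row 3 of the preview has an order date later than its shipping date, so it is flagged here. Whether such rows are data errors or a property of how the dataset was built is not stated in the source, so they are best inspected before being dropped or corrected.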

# 2. EDA (Exploratory Data Analysis)

## 1. Core Revenue

## 2. ...