<a href="https://colab.research.google.com/github/rajupadhyaya121/FedEx-performance-analysis/blob/main/eda_submission_template.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    -



##### **Project Type**    - FedEx Logistics Performance Analysis
##### **Contribution**    - Individual

# **Project Summary -**

This project is centered on optimizing FedEx’s global supply chain and logistics performance through systematic data analysis. In the modern business environment, where globalization and eCommerce have drastically increased the scale and complexity of logistics, organizations like FedEx must constantly refine their operations to maintain a competitive edge. Customer satisfaction is now directly tied to on-time deliveries, cost-effective shipping, and reliable vendor performance. Any inefficiencies—whether delays, high freight costs, or inconsistent vendor reliability—can result in both financial losses and reputational damage. This project leverages historical shipment and delivery data to provide insights into these challenges and identify practical strategies for improvement.

# **GitHub Link -**

https://github.com/rajupadhyaya121/FedEx-performance-analysis

# **Problem Statement**


1. Shipment Mode and Timeliness: FedEx manages shipments through multiple modes (Air, Sea, Road, etc.), but it is unclear how each mode affects the probability of on-time delivery. Understanding this relationship is crucial for optimizing mode selection while balancing cost and reliability.

2. Geographical Delays: Certain countries or regions may consistently face higher shipment delays due to infrastructure, customs clearance, or vendor inefficiencies. Identifying these regions will help FedEx target improvements and strengthen its global supply chain.

3. Team / Regional Management Performance: Different internal teams and management regions (e.g., PMO–US, PMO–Asia) may show varying levels of efficiency. Measuring and comparing their on-time performance can uncover best practices and highlight areas needing operational support.

4. Shipment Weight and Insurance Costs: Heavier shipments often incur higher insurance expenses, but the exact relationship needs validation. Analyzing this correlation will help FedEx develop cost-optimization strategies for insurance and risk management.

5. Lead Time vs. Delivery Performance: Lead time (the gap between order placement and scheduled delivery) directly affects delivery success. Very short lead times may increase delays, while longer ones may allow smoother operations. Evaluating this relationship can guide realistic scheduling and improve customer satisfaction.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure each chart follows the format below.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [None]:
# Load Dataset
df=pd.read_csv("/content/SCMS_Delivery_History_Dataset.csv")
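
The guidelines above ask for exception handling and a notebook that runs top to bottom without errors. A minimal sketch of a more defensive load is shown below. `load_dataset` is a hypothetical helper name (not a pandas function), and the path is the one used in the cell above.

```python
import pandas as pd

def load_dataset(path):
    """Read a CSV into a DataFrame, raising a clearer error if the file is missing.

    `load_dataset` is a hypothetical helper, not part of pandas.
    """
    try:
        return pd.read_csv(path)
    except FileNotFoundError as exc:
        raise FileNotFoundError(
            f"Dataset not found at {path!r}; upload it to the Colab session first."
        ) from exc

# Usage in the notebook (same path as the cell above):
# df = load_dataset("/content/SCMS_Delivery_History_Dataset.csv")
```

Failing early with a readable message makes it obvious that the problem is a missing upload rather than a bug further down the notebook.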

### Dataset First View

In [None]:
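# Dataset First Look
# A notebook cell only renders its last expression, so display both views explicitly
from IPython.display import display
display(df.head())  # first five rows
display(df.tail())  # last five rows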

### Dataset Rows & Columns count

In [None]:
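# Dataset Rows & Columns count
print(f"Rows: {df.shape[0]}, Columns: {df.shape[1]}")
# Summary statistics for the numeric columns
df.describe()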

### Dataset Information

1. ID – Unique shipment identifier.

2. Project Code – Code for shipment project.

3. PQ # – Price quote number.

4. PO/ SO # – Purchase order / Sales order number.

5. ASN/DN # – Advanced shipment / Delivery note number.

6. Country – Destination country.

7. Managed By – Region handling the shipment.

8. Fulfill Via – Fulfilment method (Direct Drop from the vendor or From RDC stock).

9. Vendor INCO Term – Shipping terms agreed with vendor.

10. Shipment Mode – Mode of shipment (Air/Sea/Truck).

11. Shipment Date – Dispatch date of goods.

12. Scheduled Delivery Date – Planned delivery date.

13. Delivered to Client Date – Actual delivery date.

14. Delivery Recorded Date – Date recorded in system.

15. Product Group – Type/category of product.

16. Sub Classification – Subgroup of product.

17. Vendor – Supplier name.

18. Item Description – Description of item.

19. Molecule/Test Type – Type of medical test/molecule.

20. Brand – Brand of product.

21. Dosage – Dosage info for medicines.

22. Dosage Form – Form (tablet, vial, etc.).

23. Unit of Measure (Per Pack) – Unit size in pack.

24. Line Item Quantity – Quantity ordered.

25. Line Item Value (USD) – Value of item in USD.

26. Unit Price (USD) – Price per unit.

27. Manufacturing Site – Production location.

28. First Line Designation – Flags whether the line carries the aggregated freight cost and weight for all items on the ASN/DN (Yes/No).

29. Weight (Kilograms) – Shipment weight.

30. Freight Cost (USD) – Cost of shipment.

31. Line Item Insurance (USD) – Insurance charge for item.

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()

In [None]:
# Visualizing the missing values
missing_counts=df.isnull().sum()
missing_counts=missing_counts[missing_counts>0]
missing_counts.plot(kind='bar',color='orange')
plt.title("Missing Value Visualization")
plt.ylabel("Count of Missing Values")
plt.show()

In [None]:
# Replacing Numerical NaN values
for col in df.select_dtypes(include=['float64','int64']).columns:
  df[col]=df[col].fillna(df[col].median())
# Replacing Categorical Data
for col in df.select_dtypes(include=['object']).columns:
  df[col]=df[col].fillna(df[col].mode()[0])
print(df.isnull().sum())

In [None]:
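# Handling outliers: keep rows that lie within 1.5*IQR of the quartiles in every numeric column
numeric_col=df.select_dtypes(include=['float64','int64']).columns
mask=pd.Series(True,index=df.index)
for col in numeric_col:
  q1=df[col].quantile(0.25)
  q3=df[col].quantile(0.75)
  iqr=q3-q1
  lb=q1-1.5*iqr
  ub=q3+1.5*iqr
  # Accumulate the filter inside the loop so every numeric column is checked
  mask&=df[col].between(lb,ub)
df=df[mask]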
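
To make the IQR rule concrete, here is a small sketch on made-up numbers. `iqr_bounds` is a hypothetical helper and the sample values are not taken from the dataset.

```python
import pandas as pd

def iqr_bounds(series, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Made-up sample with one extreme value
sample = pd.Series([10, 12, 11, 13, 12, 11, 100])
lower, upper = iqr_bounds(sample)

# Keep only the values inside the fences (both ends inclusive)
kept = sample[sample.between(lower, upper)]
print(lower, upper)
print(kept.tolist())
```

The multiplier 1.5 is the conventional choice; a larger `k` keeps more of the tail, which may matter for skewed cost columns.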

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Make the dataset analysis-ready and take a first look at shipment volumes
# Top 5 countries by shipment count
print(df['Country'].value_counts().head(5))
plt.figure(figsize=(8,4))
df['Country'].value_counts().head(5).plot(kind='bar', color="orange")
plt.title("Top 5 Countries by Shipments")
plt.ylabel("Count")
plt.show()
# Top 5 Vendors
print("\nTop 5 Vendors:\n")
print(df['Vendor'].value_counts().head(5))
plt.figure(figsize=(8,4))
df['Vendor'].value_counts().head(5).plot(kind='bar', color="teal")
plt.title("Top 5 Vendors")
plt.ylabel("Count")
plt.show()

In [None]:
print(df['Freight Cost (USD)'].unique()[:20])

In [None]:
# Convert both columns to numeric; text entries that cannot be parsed become NaN
df['Freight Cost (USD)'] = pd.to_numeric(df['Freight Cost (USD)'], errors='coerce')
df['Line Item Value'] = pd.to_numeric(df['Line Item Value'], errors='coerce')
# Correlation between Freight Cost and Line Item Value
# (corr() skips NaN pairs, so rows without a numeric freight cost are not counted as zero-cost shipments)
print("\nCorrelation between Freight Cost & Line Item Value:\n")
print(df[['Freight Cost (USD)', 'Line Item Value']].corr())

plt.figure(figsize=(6,4))
sns.scatterplot(x="Line Item Value", y="Freight Cost (USD)", data=df, alpha=0.5)
plt.title("Freight Cost vs Line Item Value")
plt.show()
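
`errors='coerce'` turns any entry that cannot be parsed as a number into NaN. The sketch below shows this on made-up values; the text placeholder is illustrative and not copied from the dataset.

```python
import pandas as pd

# Made-up column mixing numeric strings with a text placeholder
raw = pd.Series(["780.34", "4521.5", "Freight included", "1653.78"])

# Unparseable entries become NaN instead of raising an error
freight = pd.to_numeric(raw, errors="coerce")

n_unparsed = int(freight.isna().sum())
share_unparsed = n_unparsed / len(freight)
print(freight)
print(f"{n_unparsed} of {len(freight)} entries could not be parsed")
```

Checking this share before filling or dropping the NaNs shows how much of the column the correlation actually rests on.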

### Shipment Mode vs On-Time Delivery

In [None]:
# 1. Shipment Mode vs On-Time Delivery (which mode is more reliable?)
# Parse the date columns; values that are not valid dates become NaT
df["PO Sent to Vendor Date"] = pd.to_datetime(df["PO Sent to Vendor Date"], errors="coerce")
df["Scheduled Delivery Date"] = pd.to_datetime(df["Scheduled Delivery Date"], errors="coerce")
df["Delivered to Client Date"] = pd.to_datetime(df["Delivered to Client Date"], errors="coerce")

# Flag each shipment as on time (True) or delayed (False); rows with a missing date compare as False
df["On_Time"] = df["Delivered to Client Date"] <= df["Scheduled Delivery Date"]

# Plot
plt.figure(figsize=(8,5))
sns.countplot(data=df, x="Shipment Mode", hue="On_Time")
plt.title("Shipment Mode vs On-Time Delivery")
plt.xlabel("Shipment Mode")
plt.ylabel("Count of Shipments")
plt.legend(title="On Time", labels=["Delayed","On Time"])
plt.show()
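
The count plot shows volumes, so modes with more shipments dominate visually. An on-time rate per mode is easier to compare. A sketch on a made-up frame, reusing the `Shipment Mode` and `On_Time` column names from the cell above:

```python
import pandas as pd

# Made-up rows for illustration only
demo = pd.DataFrame({
    "Shipment Mode": ["Air", "Air", "Air", "Truck", "Truck", "Ocean"],
    "On_Time": [True, True, False, True, False, True],
})

# The mean of a boolean column is the share of True values, i.e. the on-time rate
mode_summary = (
    demo.groupby("Shipment Mode")["On_Time"]
        .agg(shipments="count", on_time_rate="mean")
        .sort_values("on_time_rate", ascending=False)
)
print(mode_summary)
```

Keeping the shipment count next to the rate helps avoid over-reading a high rate that rests on only a few shipments.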

### Top 10 Countries with Highest Delivery Delays

In [None]:
# Identify which countries receive late deliveries most often
df['On_Time'] = df['Delivered to Client Date'] <= df['Scheduled Delivery Date']

# Delay % by country: share of shipments delivered after the scheduled date, scaled to 0-100
country_delay = (1 - df.groupby('Country')['On_Time'].mean()) * 100
country_delay.sort_values(ascending=False).head(10).plot(kind='barh', color='tomato')
plt.title("Top 10 Countries with Highest Delivery Delays")
plt.xlabel("Delay %")
plt.ylabel("Country")
plt.show()
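
A country with only a handful of shipments can top this ranking by chance. One hedge is to require a minimum shipment count before ranking. The helper name `delay_rate_by_country` and the threshold `min_shipments` are assumptions for illustration, and the data is made up.

```python
import pandas as pd

def delay_rate_by_country(frame, min_shipments=30):
    """Percent of late shipments per country, keeping only countries with enough volume."""
    stats = frame.groupby("Country")["On_Time"].agg(shipments="count", on_time="mean")
    stats["delay_pct"] = (1 - stats["on_time"]) * 100
    enough = stats[stats["shipments"] >= min_shipments]
    return enough.sort_values("delay_pct", ascending=False)

# Made-up data: country B has a single (late) shipment
demo = pd.DataFrame({
    "Country": ["A"] * 4 + ["B"],
    "On_Time": [True, True, True, False, False],
})
ranked = delay_rate_by_country(demo, min_shipments=2)
print(ranked)
```

Here country B would show a 100% delay rate from one shipment, so the volume filter removes it from the ranking.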

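### On-Time Delivery Rate by Management Team

In [None]:
# On-time delivery rate (%) per managing office.
# A bar chart shows each rate directly; pie slices would rescale the rates to sum to 100%.
team_perf = df.groupby('Managed By')['On_Time'].mean() * 100
team_perf.sort_values().plot(kind='barh', figsize=(6,4), color='skyblue')
plt.title("On-Time Delivery Rate by Management Team")
plt.xlabel("On-Time Rate (%)")
plt.ylabel("Managed By")
plt.show()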

### Shipment Weight vs Insurance Cost

In [None]:
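# Coerce both columns to numeric in case they contain text placeholders; such entries become NaN
df['Weight (Kilograms)'] = pd.to_numeric(df['Weight (Kilograms)'], errors='coerce')
df['Line Item Insurance (USD)'] = pd.to_numeric(df['Line Item Insurance (USD)'], errors='coerce')
plt.figure(figsize=(6,4))
sns.scatterplot(x='Weight (Kilograms)', y='Line Item Insurance (USD)', data=df, alpha=0.6)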
plt.title("Shipment Weight vs Insurance Cost")
plt.xlabel("Weight (Kilograms)")
plt.ylabel("Insurance Cost (USD)")
plt.show()
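
A scatter plot suggests a relationship but does not quantify it. The sketch below computes a Pearson coefficient and a rank-based (Spearman) coefficient, assuming both columns are already numeric. Spearman is included because cost data is often skewed. The values are made up.

```python
import pandas as pd

# Made-up values for illustration only
demo = pd.DataFrame({
    "Weight (Kilograms)": [10, 50, 120, 400, 900],
    "Line Item Insurance (USD)": [1.2, 4.0, 9.5, 35.0, 60.0],
})
weight = demo["Weight (Kilograms)"]
insurance = demo["Line Item Insurance (USD)"]

# Pearson measures linear association
pearson = weight.corr(insurance)

# Spearman is the Pearson correlation of the ranks, so it captures any monotonic trend
spearman = weight.rank().corr(insurance.rank())

print(f"Pearson: {pearson:.3f}, Spearman: {spearman:.3f}")
```

A large gap between the two coefficients on the real data would hint that a few very heavy shipments are driving the linear fit.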

### Lead Time vs Delivery Performance

In [None]:
df['PO Sent to Vendor Date'] = pd.to_datetime(df['PO Sent to Vendor Date'], errors='coerce')
df['Scheduled Delivery Date'] = pd.to_datetime(df['Scheduled Delivery Date'], errors='coerce')

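# Calculate lead time in days (scheduled delivery date minus PO sent date)
df['Lead_Time'] = (df['Scheduled Delivery Date'] - df['PO Sent to Vendor Date']).dt.days
plt.figure(figsize=(10,6))
# common_norm=False normalizes each group's curve separately so their shapes are comparable
sns.kdeplot(data=df, x='Lead_Time', hue='On_Time', fill=True, common_norm=False)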
plt.title("Lead Time vs Delivery Performance")
plt.xlabel("Lead Time (Days)")
plt.ylabel("Density")
plt.show()
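
Overlapping density curves can be hard to read. Binning lead time and computing the on-time rate per bin gives a more direct comparison. The bin edges below are arbitrary assumptions and the rows are made up.

```python
import pandas as pd

# Made-up rows for illustration only
demo = pd.DataFrame({
    "Lead_Time": [5, 12, 20, 45, 60, 95, 130, 200],
    "On_Time": [False, False, True, True, True, True, False, True],
})

# Assumed bin edges in days; each interval includes its right edge
bins = [0, 30, 90, 180, 365]
labels = ["0-30", "31-90", "91-180", "181-365"]
demo["Lead_Time_Bin"] = pd.cut(demo["Lead_Time"], bins=bins, labels=labels)

# On-time rate within each lead-time bin
rate_by_bin = demo.groupby("Lead_Time_Bin", observed=False)["On_Time"].mean()
print(rate_by_bin)
```

Lead times outside the bin range, including negative values where the scheduled date precedes the PO date, fall into no bin and are left out of the rates.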

# **Conclusion**

The analysis of FedEx Logistics supply chain data highlights critical trends that influence operational efficiency and delivery performance.

1. Shipment Mode Impact – Air shipments show the highest likelihood of on-time delivery, outperforming Sea and Road modes. Although costlier, Air freight proves more reliable for urgent and high-priority shipments. FedEx can use this insight to balance cost and timeliness when allocating shipment modes.

2. Country-wise Delays – Certain countries consistently face higher delays, often due to customs clearance issues, regional infrastructure challenges, or vendor inefficiencies. By identifying these regions, FedEx can take targeted actions such as partnering with local logistics providers, redesigning routes, or improving vendor contracts.

3. Team / Region Performance – Performance varies across teams and regions. For example, some management offices like PMO–US deliver better on-time performance compared to others. This highlights an opportunity to replicate the practices of high-performing teams and provide additional training or resources where delays are frequent.

4. Weight vs Insurance Costs – Heavier shipments show a strong correlation with higher insurance costs. This is expected since heavier goods carry greater risks. FedEx can optimize by consolidating smaller loads, improving packaging, or renegotiating insurance terms to manage these expenses.

5. Lead Time vs Delivery Performance – Longer lead times generally correlate with better on-time performance, while very short lead times increase the risk of delays. This indicates that unrealistic scheduling creates bottlenecks. By setting optimal lead times, FedEx can balance customer expectations with operational feasibility.

The findings show that shipment mode, country-specific challenges, regional management practices, shipment weight, and lead time planning are the most influential factors in FedEx’s delivery performance. Addressing these areas will help reduce delays, cut costs, and improve customer satisfaction. With growing global eCommerce, these insights position FedEx to optimize its logistics network and maintain a strong competitive advantage.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***