# **Project Name**    -  FedEx Logistics EDA



##### **Project Type**    - EDA
##### **Contribution**    - Individual

# **Project Summary -**

This project focuses on performing Exploratory Data Analysis (EDA) on a FedEx logistics dataset, aiming to uncover insights and optimize logistics processes. The dataset consists of 10,324 entries and 33 columns, containing detailed shipment, product, and logistics information such as project codes, shipment modes, product descriptions, freight costs, and delivery dates. Key columns include shipment-related details like `Shipment Mode`, `Scheduled Delivery Date`, and `Freight Cost (USD)`, along with product-specific information such as `Brand`, `Dosage`, and `Item Description`.

EDA steps in this project include cleaning the data by handling missing values, converting incorrect data types, and managing outliers. Data visualization techniques are applied to identify patterns and correlations between different attributes, especially focusing on shipment costs, product types, and delivery performance. Descriptive statistics are used to summarize the key metrics, while deeper exploratory analysis helps identify areas for improvement in the logistics and supply chain process. The project uses tools like Pandas for data manipulation, Matplotlib and Seaborn for visualization, and Jupyter Notebook for analysis documentation. Insights gained from the analysis can potentially lead to improved decision-making and cost optimization in logistics operations.

# **GitHub Link -**

https://github.com/kush-agra-soni/12_fedex_logistics_eda.git

# **Problem Statement**


The problem statement for this project is to perform Exploratory Data Analysis (EDA) on a FedEx logistics dataset to identify patterns, trends, and insights that can optimize logistics processes and improve decision-making. The dataset contains information on over 10,000 shipments, including shipment details, product descriptions, freight costs, delivery dates, and more. The goal is to analyze the relationships between various attributes such as shipment modes, delivery performance, product types, and associated costs. The project aims to:

1. Clean the dataset by handling missing values, correcting data types, and addressing any inconsistencies.
2. Explore and visualize key metrics such as shipment costs, delivery times, and product characteristics.
3. Identify factors that impact freight costs, delivery delays, and shipment efficiency.
4. Provide actionable insights and recommendations for optimizing FedEx’s logistics operations, such as cost reduction strategies, improvements in shipment scheduling, and enhancements in product categorization.

By conducting thorough EDA, the project seeks to uncover valuable insights that can help FedEx enhance their logistics strategy, streamline operations, and ultimately deliver more efficient and cost-effective services to their clients.

#### **Define Your Business Objective?**

The business objective of this project is to leverage data-driven insights to optimize the logistics operations of FedEx, focusing on improving efficiency, reducing costs, and enhancing customer satisfaction. By performing Exploratory Data Analysis (EDA) on the logistics dataset, the aim is to:

1. **Optimize Freight Costs**: Identify patterns and factors that influence freight costs to develop strategies for reducing shipping expenses while maintaining service quality.

2. **Enhance Delivery Efficiency**: Analyze delivery timelines and factors contributing to delays, enabling improvements in shipment scheduling, route optimization, and resource allocation to ensure timely deliveries.

3. **Improve Product Categorization**: Explore how product classifications and descriptions impact logistics processes, leading to better inventory management and streamlined handling of shipments.

4. **Identify Operational Bottlenecks**: Pinpoint areas of inefficiency or inconsistencies in the logistics chain, allowing for process improvements that increase operational productivity.

5. **Support Data-Driven Decision Making**: Provide actionable insights to decision-makers, enabling them to make informed choices that optimize logistics operations, reduce costs, and improve overall service quality.

Ultimately, the business objective is to transform data into valuable insights that lead to more cost-effective, efficient, and reliable logistics solutions, benefiting both FedEx and its clients.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that each chart follows the format below and answers the listed questions.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objs as go
import missingno as msno
from sklearn.preprocessing import StandardScaler
from scipy import stats
import plotly.io as pio

### Dataset Loading

In [None]:
# Load Dataset

# Raw GitHub URL of the dataset CSV
dataset_url = "https://raw.githubusercontent.com/kush-agra-soni/12_fedex_logistics_eda/refs/heads/main/SCMS_Delivery_History_Dataset.csv"

df = pd.read_csv(dataset_url)

### Dataset First View

In [None]:
# Dataset First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()

### What did you know about your dataset?

The dataset in question is a comprehensive FedEx logistics dataset containing 10,324 entries and 33 columns, providing detailed information about shipments, product attributes, and logistics operations. The dataset includes both categorical and numerical data, covering various aspects of the shipment process, product details, and costs.

Key characteristics of the dataset include:

1. **Shipment Details**:
   - Columns such as `Project Code`, `PQ #`, `PO / SO #`, and `ASN/DN #` provide unique identifiers for each shipment, allowing for tracking and linking of relevant data.
   - `Shipment Mode` captures the mode of shipment (Air, Truck, Air Charter, or Ocean), and `Scheduled Delivery Date`, `Delivered to Client Date`, and `Delivery Recorded Date` provide information on shipment timelines.

2. **Product Information**:
   - Columns like `Product Group`, `Item Description`, `Brand`, and `Dosage` describe the nature of the products being shipped, including any pharmaceutical or medical items.
   - `Weight (Kilograms)` and `Unit of Measure (Per Pack)` provide details about the size and quantity of the products.

3. **Cost Information**:
   - `Freight Cost (USD)` and `Line Item Insurance (USD)` are critical for understanding shipping costs and any additional expenses related to insurance coverage.
   - `Pack Price`, `Unit Price`, and `Line Item Value` reflect the monetary aspects of the shipped items.

4. **Missing Data**:
   - Some columns have missing values, particularly in `Shipment Mode`, `Dosage`, `Line Item Insurance (USD)`, and `Weight (Kilograms)`, which need to be handled during data cleaning.

5. **Data Types**:
   - The dataset includes a mix of numerical (e.g., `Weight`, `Freight Cost`), categorical (e.g., `Country`, `Vendor`), and date-based (e.g., `PQ First Sent to Client Date`, `Scheduled Delivery Date`) data types.

Understanding these elements of the dataset is crucial for performing a thorough analysis, identifying trends, and providing actionable insights that can improve FedEx's logistics operations.
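
To make the mix of data types described above concrete, here is a minimal sketch that separates numeric columns from the rest and parses a date column. The `sample` frame, its values, and the date format string are assumptions made for illustration; they are not taken from the actual dataset.

```python
import pandas as pd

# Hypothetical miniature frame; column names follow the dataset, values are invented.
sample = pd.DataFrame({
    "Country": ["Vietnam", "Nigeria"],
    "Line Item Quantity": [19, 1000],
    "Unit Price": [0.97, 0.03],
    "Scheduled Delivery Date": ["02-Jun-06", "14-Nov-06"],
})

# Split columns by inferred dtype: numeric versus everything else.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
other_cols = sample.select_dtypes(exclude="number").columns.tolist()

# Date columns are read as text and need an explicit conversion.
# The format string is an assumption about how the dates are written.
parsed_dates = pd.to_datetime(sample["Scheduled Delivery Date"], format="%d-%b-%y")

print("Numeric:", numeric_cols)
print("Other:", other_cols)
print(parsed_dates)
```

Listing the columns this way shows early which ones need type conversion before any statistics are computed.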

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### Variables Description

The dataset includes several key variables that provide detailed insights into the logistics operations:

- **Identifiers**: **ID** is the unique identifier for each shipment entry, while **Project Code**, **PQ #**, and **PO / SO #** represent various order and project identifiers. **ASN/DN #** refers to the advance shipment notice or delivery note number.
- **Logistics**: **Country** indicates the destination country of the shipment, and **Managed By** specifies the person or team responsible for the shipment. **Fulfill Via** defines the method of shipment fulfillment, and **Vendor INCO Term** outlines the shipping terms. **Shipment Mode** denotes the mode of transportation (Air, Truck, Air Charter, or Ocean).
- **Dates**: **PQ First Sent to Client Date**, **PO Sent to Vendor Date**, **Scheduled Delivery Date**, **Delivered to Client Date**, and **Delivery Recorded Date** track various stages of the shipment process.
- **Product**: **Product Group**, **Sub Classification**, **Vendor**, **Item Description**, **Molecule/Test Type**, and **Brand** describe the product. **Dosage** and **Dosage Form** provide specifics on the product's dosage and form, with some missing values. **Manufacturing Site** and **First Line Designation** refer to the manufacturing location and classification.
- **Quantity and cost**: **Unit of Measure (Per Pack)**, **Line Item Quantity**, and **Line Item Value** reflect the shipment's unit details and monetary value. **Pack Price**, **Unit Price**, and **Freight Cost (USD)** give cost-related information. **Weight (Kilograms)** and **Line Item Insurance (USD)** indicate shipment weight and insurance cost, respectively.

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
unique_values = df.nunique()
unique_values

## ***3. Data Wrangling***

### Data Wrangling Code

In [None]:
# Remove specified columns
df = df.drop(columns=['PQ #', 'PO / SO #', 'ASN/DN #', 'PQ First Sent to Client Date', 'PO Sent to Vendor Date'])

# Convert 'Weight (Kilograms)' and 'Freight Cost (USD)' to numeric, handling errors (non-numeric values)
df['Weight (Kilograms)'] = pd.to_numeric(df['Weight (Kilograms)'], errors='coerce')
df['Freight Cost (USD)'] = pd.to_numeric(df['Freight Cost (USD)'], errors='coerce')

# Fill missing values

# For 'Shipment Mode', since it's categorical, use the mode (most frequent value)
shipment_mode_mode = df['Shipment Mode'].mode()[0]
df['Shipment Mode'] = df['Shipment Mode'].fillna(shipment_mode_mode)

# For 'Line Item Insurance (USD)', use the median (since it's a numeric value)
line_item_insurance_median = df['Line Item Insurance (USD)'].median()
df['Line Item Insurance (USD)'] = df['Line Item Insurance (USD)'].fillna(line_item_insurance_median)

# For 'Dosage' (categorical), fill missing values with the mode if any are missing
if df['Dosage'].isnull().any():
    df['Dosage'] = df['Dosage'].fillna(df['Dosage'].mode()[0])

# Fill values that are missing (or became NaN during numeric conversion) with the column mean;
# the median is a more robust alternative if the distribution is skewed
df['Weight (Kilograms)'] = df['Weight (Kilograms)'].fillna(df['Weight (Kilograms)'].mean())
df['Freight Cost (USD)'] = df['Freight Cost (USD)'].fillna(df['Freight Cost (USD)'].mean())

# Round 'Weight (Kilograms)' and 'Freight Cost (USD)' to 2 decimal places
df['Weight (Kilograms)'] = df['Weight (Kilograms)'].round(2)
df['Freight Cost (USD)'] = df['Freight Cost (USD)'].round(2)

### What all manipulations have you done and insights you found?

Here’s a breakdown of the data manipulations performed and the insights gained:

### 1. **Column Removal:**
   - **Removed Unnecessary Columns:**
     - Columns like `PQ #`, `PO / SO #`, `ASN/DN #`, `PQ First Sent to Client Date`, and `PO Sent to Vendor Date` were removed as they seemed to be either identifiers or irrelevant to the analysis at hand. These columns likely contain unique or repetitive data that wouldn't add value for the analysis, improving the focus on more useful attributes.

### 2. **Data Type Conversion:**
   - **Converting Non-Numeric Columns to Numeric:**
     - **`Weight (Kilograms)` and `Freight Cost (USD)`** columns were initially in non-numeric formats (possibly due to incorrect data types or special characters). These were converted into numeric types using `pd.to_numeric()`. This step ensures proper handling of these columns for mathematical and statistical operations like calculating mean and filling missing values.
     - `errors='coerce'` was used to convert any non-numeric values to `NaN`, ensuring that errors due to invalid data would not stop the process.
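
As a small illustration of the conversion step above, the sketch below shows how `pd.to_numeric(..., errors='coerce')` treats a text entry. The `raw` values are invented for the example; counting how many entries were coerced is an extra check that is not part of the original wrangling code.

```python
import pandas as pd

# Hypothetical raw column: numeric strings, one text entry, one true missing value.
raw = pd.Series(["13", "358.5", "not recorded", None])

# Non-numeric strings become NaN instead of raising an error.
numeric = pd.to_numeric(raw, errors="coerce")

# Entries that were present as text but could not be parsed as numbers.
n_coerced = numeric.isna().sum() - raw.isna().sum()

print(numeric)
print("Values coerced to NaN:", n_coerced)
```

Comparing the missing count before and after conversion shows how much information the coercion discards, which matters when those gaps are later filled with a summary statistic.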

### 3. **Handling Missing Data:**
   - **Filling Missing Data for Categorical Columns:**
     - **`Shipment Mode`** (a categorical column) had missing values. The most frequent value (mode) of this column was used to fill the missing entries. This is a common technique for categorical data, as the mode represents the most common or "default" category.
     
     - **`Dosage`** (another categorical column) had missing values. We used the mode of this column (if missing values exist) to fill the gaps. If the column had no missing values, this step is skipped.
   
   - **Filling Missing Data for Numeric Columns:**
     - **`Line Item Insurance (USD)`** had missing values, so the median was used to fill the gaps. The median is often preferred for numeric columns when the data might be skewed because it is less affected by outliers.
     - **`Weight (Kilograms)` and `Freight Cost (USD)`** had missing values and were filled with their respective means. For numeric data that is relatively normally distributed, using the mean is a reasonable approach. However, the data might be checked later for skewness or outliers if further accuracy is needed.
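
The difference between the fill strategies described above can be seen on a toy example. This is a sketch with invented values, not the project's data: one extreme cost pulls the mean far above the typical value, while the median stays close to it.

```python
import numpy as np
import pandas as pd

# Hypothetical cost column with one missing value and one extreme value.
costs = pd.Series([100.0, 120.0, 110.0, np.nan, 5000.0])

mean_filled = costs.fillna(costs.mean())      # fill value is pulled up by 5000.0
median_filled = costs.fillna(costs.median())  # fill value stays near the typical cost

# Hypothetical categorical column filled with its most frequent value.
modes = pd.Series(["Air", "Air", "Truck", None])
mode_filled = modes.fillna(modes.mode()[0])

print("Mean fill value:", mean_filled.iloc[3])
print("Median fill value:", median_filled.iloc[3])
print(mode_filled.tolist())
```

When a column is strongly skewed, the median-based fill is usually the safer default, since the mean-based fill inserts values that few real rows resemble.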

### 4. **Rounding Values:**
   - After filling missing values, **`Weight (Kilograms)`** and **`Freight Cost (USD)`** were rounded to **2 decimal places**. Rounding gives the values a consistent format and improves readability; two decimal places are sufficient for weights and monetary amounts used in analysis or reporting.

### 5. **Missing Data Check:**
   - At the end of the process, the remaining missing values across all columns should be checked, for example with `df.isnull().sum()`. This confirms that all missing data has been properly handled. If any column still contains missing values, further treatment may be needed.
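
One possible way to run the final missing-data check is sketched below. The helper name `report_missing` and the `cleaned` frame are hypothetical and exist only for this example; in the notebook the function would be called on `df`.

```python
import pandas as pd

def report_missing(frame: pd.DataFrame) -> pd.Series:
    """Return the missing-value count for each column that still has gaps."""
    counts = frame.isnull().sum()
    return counts[counts > 0].sort_values(ascending=False)

# Hypothetical cleaned frame in which one column was overlooked.
cleaned = pd.DataFrame({
    "Shipment Mode": ["Air", "Truck", "Air"],
    "Freight Cost (USD)": [780.34, 4521.5, 1653.78],
    "Dosage Form": ["Tablet", None, "Capsule"],
})

remaining = report_missing(cleaned)
print(remaining if not remaining.empty else "No missing values remain.")
```

Returning only the columns that still have gaps keeps the output short on a 33-column dataset.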

### Insights:

- **Data Quality Improvement:**
  - The removal of unnecessary columns reduces noise in the dataset, focusing on more relevant features for analysis.
  - Converting the weight and cost columns into numeric types ensures they can be used in mathematical calculations without errors.
  
- **Handling of Missing Values:**
  - For categorical data like `Shipment Mode` and `Dosage`, filling with the mode ensures that the most common value is used, maintaining the consistency of the dataset.
  - For numerical data, using the mean (for normal distribution) or median (for skewed data) helps ensure that the missing values don’t bias the analysis.

- **Data Precision:**
  - Rounding `Weight (Kilograms)` and `Freight Cost (USD)` ensures a more standardized and easily interpretable dataset, especially when dealing with financial and physical quantities where extra precision is not necessary.

- **Next Steps:**
  - Now that missing values are handled and the data is cleaned, it’s easier to perform deeper statistical analysis, model training, or reporting.
  - Further exploration could include checking for outliers or data trends, particularly in `Weight (Kilograms)` and `Freight Cost (USD)`, as these are critical columns for the analysis of shipping and logistics.
  
This approach leads to a cleaner, more reliable dataset for the visual analysis that follows.
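
The outlier check suggested under the next steps could be sketched with the interquartile range (IQR) rule. This is one common convention, not something the notebook itself applies, and the `freight` values below are invented.

```python
import pandas as pd

# Hypothetical freight costs with one extreme value.
freight = pd.Series([120.0, 150.0, 180.0, 200.0, 220.0, 250.0, 9000.0])

# IQR rule: flag values more than 1.5 * IQR outside the middle 50% of the data.
q1 = freight.quantile(0.25)
q3 = freight.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = freight[(freight < lower) | (freight > upper)]
print(f"Bounds: [{lower}, {upper}]")
print(outliers)
```

Flagged rows are better inspected than dropped automatically, since an expensive shipment can be a legitimate record.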

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1 Distribution of Freight Cost

In [None]:
plt.figure(figsize=(10, 6))
sns.histplot(df['Freight Cost (USD)'], kde=True, bins=30)
plt.title('Distribution of Freight Cost (USD)')
plt.xlabel('Freight Cost (USD)')
plt.ylabel('Frequency')
plt.show()

##### 1. Why did you pick the specific chart?

This histogram is appropriate for visualizing the distribution of a continuous numerical variable like freight cost. It effectively displays the frequency of different cost ranges, allowing us to quickly identify patterns and trends.

##### 2. What is/are the insight(s) found from the chart?

- Right-skewed distribution: The majority of freight costs are concentrated in the lower range, with a long tail towards higher costs. This indicates that most shipments have relatively low freight costs, while a smaller proportion incurs significantly higher expenses.
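- Peak in the lowest range: The highest frequency of shipments falls within the lowest cost range, representing shipments with low freight charges. Because missing and non-numeric freight costs were filled with the column mean during data wrangling, the histogram also contains imputed values concentrated at the mean, which should be kept in mind when reading the peak.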
- Outliers: There are a few data points with exceptionally high freight costs, deviating from the general distribution. These could be due to factors like long distances, special handling requirements, or high-value goods.
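
The right skew described above can also be checked numerically. The sketch below uses invented values rather than the project's data: a positive skewness coefficient and a mean well above the median both indicate a long right tail.

```python
import pandas as pd

# Hypothetical freight costs: mostly small, with one very large value.
costs = pd.Series([50.0, 80.0, 100.0, 120.0, 150.0, 200.0, 4000.0])

skewness = costs.skew()  # sample skewness; > 0 means a longer right tail
print(f"Skewness: {skewness:.2f}")
print(f"Mean: {costs.mean():.2f}, Median: {costs.median():.2f}")
```

On the real column, the same two lines would show whether filling gaps with the mean is reasonable or whether the median is the better choice.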

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights from the distribution of freight costs can positively impact business operations by:

- Identifying cost-saving opportunities: The right-skewed distribution suggests that focusing on reducing costs for the smaller proportion of high-cost shipments could yield significant savings.
- Optimizing shipping strategies: Understanding the factors contributing to high-cost shipments can help implement strategies like consolidating shipments, negotiating better rates with carriers, or optimizing routes to reduce freight expenses.
- Setting realistic pricing and budgeting: The distribution provides valuable information for setting competitive pricing and allocating budgets for shipping costs.
> While the insights themselves don't directly lead to negative growth, potential issues could arise from:

- Ignoring cost-saving opportunities: Failure to address the high-cost segment of shipments could result in unnecessary expenses and reduced profitability.
- Overlooking the impact of outliers: Neglecting the impact of outliers on overall shipping costs might lead to inaccurate budgeting and forecasting.
- Misinterpreting the distribution: Incorrectly interpreting the distribution could lead to suboptimal decision-making regarding shipping strategies and pricing.



#### Chart - 2 Relationship between Weight and Freight Cost

In [None]:
# Remove a single extreme outlier (the row with ID 3972) before plotting
df = df[df['ID'] != 3972]

plt.figure(figsize=(10, 6))
sns.scatterplot(x=df['Weight (Kilograms)'], y=df['Freight Cost (USD)'])
plt.title('Relationship between Weight and Freight Cost')
plt.xlabel('Weight (Kilograms)')
plt.ylabel('Freight Cost (USD)')
plt.show()

##### 1. Why did you pick the specific chart?

A scatter plot is the ideal choice for visualizing the relationship between two continuous numerical variables like weight and freight cost. It helps us identify patterns, trends, and potential correlations between the two variables.

##### 2. What is/are the insight(s) found from the chart?

- Positive correlation: There is a general positive trend between weight and freight cost, indicating that heavier shipments tend to have higher costs. However, the relationship is not perfectly linear.
- Outliers: Several data points deviate significantly from the general trend, representing shipments with either unusually high costs for their weight or unusually low costs for their weight. These outliers might be due to factors like special handling requirements, high-value goods, or negotiated discounts.
- Clusters: The data points seem to form clusters, suggesting that different groups of shipments might have distinct cost structures based on factors other than weight alone.
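
To put a number on the positive trend noted above, a correlation coefficient can be computed alongside the scatter plot. This is a sketch on invented values. Spearman's coefficient is computed here as the Pearson correlation of the ranks, which avoids any dependency beyond pandas.

```python
import pandas as pd

# Hypothetical shipments: cost grows with weight, but not linearly.
shipments = pd.DataFrame({
    "Weight (Kilograms)": [10, 50, 100, 500, 1000, 2000],
    "Freight Cost (USD)": [200, 450, 800, 2500, 4200, 9000],
})

weight = shipments["Weight (Kilograms)"]
cost = shipments["Freight Cost (USD)"]

pearson = weight.corr(cost)                 # strength of the linear relationship
spearman = weight.rank().corr(cost.rank())  # strength of the monotonic relationship

print(f"Pearson: {pearson:.3f}, Spearman: {spearman:.3f}")
```

A Spearman value clearly higher than the Pearson value would suggest that cost rises with weight but not in a straight line.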

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

1. The insights from the scatter plot can positively impact business operations by:

- Improving pricing strategies: By understanding the relationship between weight and cost, businesses can set more accurate and competitive pricing for shipments of different weights.
- Optimizing shipping strategies: Identifying outliers and clusters can help identify opportunities for cost reduction through consolidation, negotiation, or alternative shipping methods.
- Improving forecasting: Analyzing the patterns in the scatter plot can help businesses predict future shipping costs based on the weight of shipments.
2. While the insights themselves don't directly lead to negative growth, potential issues could arise from:

- Ignoring outliers: Neglecting the impact of outliers on pricing and cost estimation could lead to inaccurate calculations and financial losses.
- Oversimplifying the relationship: Assuming a perfectly linear relationship between weight and cost might lead to suboptimal decision-making.
- Failing to consider other factors: Overemphasizing weight as the sole determinant of cost might overlook other important factors like distance, handling requirements, and carrier rates.

#### Chart - 3  Shipment Mode vs. Freight Cost

In [None]:
plt.figure(figsize=(10, 6))
sns.barplot(x='Shipment Mode', y='Freight Cost (USD)', data=df, errorbar=None)
plt.title('Average Freight Cost by Shipment Mode')
plt.xlabel('Shipment Mode')
plt.ylabel('Average Freight Cost (USD)')
plt.xticks(rotation=45)
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart is an appropriate choice for visualizing the average freight cost across different shipment modes. It effectively compares the average cost for each mode, making it easy to identify the most and least expensive options.

##### 2. What is/are the insight(s) found from the chart?

- Air Charter is the most expensive: Air charter has the highest average freight cost, significantly exceeding the costs of other modes.
- Truck is the most economical: Truck shipments have the lowest average cost, making it a cost-effective option for many businesses.
- Air and Ocean are similar: The average costs for air and ocean shipments are relatively close, suggesting that the choice between these modes might depend on factors other than cost, such as speed and reliability.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

1. The insights from the bar chart can positively impact business operations by:

- Optimizing shipping costs: By understanding the cost differences between shipment modes, businesses can choose the most cost-effective option for their specific needs.
- Improving decision-making: The insights can help businesses make informed decisions about trade-offs between cost, speed, and reliability when selecting shipment modes.
- Negotiating better rates: The knowledge of average costs can empower businesses to negotiate more favorable rates with carriers, especially for high-volume shipments.
2. While the insights themselves don't directly lead to negative growth, potential issues could arise from:

- Ignoring other factors: Focusing solely on cost might overlook other important factors like transit time, reliability, and cargo handling requirements.
- Overlooking the impact of outliers: If the average cost is significantly influenced by a few outliers, it might not accurately represent the typical cost for a particular shipment mode.
- Misinterpreting the chart: Incorrectly interpreting the chart might lead to suboptimal decisions, such as choosing a more expensive mode when a cheaper option is available.
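
Since the bar chart shows averages, comparing the mean with the median per mode is a quick way to see whether outliers drive the result. The values below are invented for illustration.

```python
import pandas as pd

# Hypothetical shipments; one very expensive Air shipment inflates that mode's mean.
shipments = pd.DataFrame({
    "Shipment Mode": ["Air", "Air", "Air", "Truck", "Truck", "Truck"],
    "Freight Cost (USD)": [1000.0, 1200.0, 20000.0, 800.0, 900.0, 1000.0],
})

summary = (
    shipments.groupby("Shipment Mode")["Freight Cost (USD)"]
    .agg(["mean", "median", "count"])
)
print(summary)
```

In this toy data the Air mean is several times its median, while the Truck mean and median coincide, so only the Air average is distorted by a single shipment.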

#### Chart - 5 Line Item Quantity vs. Line Item Value

In [None]:
plt.figure(figsize=(10, 6))
sns.scatterplot(x=df['Line Item Quantity'], y=df['Line Item Value'])
plt.title('Line Item Quantity vs. Line Item Value')
plt.xlabel('Line Item Quantity')
plt.ylabel('Line Item Value (USD)')
plt.show()

##### 1. Why did you pick the specific chart?

A scatter plot is the appropriate choice for visualizing the relationship between two continuous numerical variables like line item quantity and line item value. It helps us identify patterns, trends, and potential correlations between the two variables.

##### 2. What is/are the insight(s) found from the chart?

- Positive correlation: There is a general positive trend between line item quantity and line item value, indicating that items with higher quantities tend to have higher values. However, the relationship is not perfectly linear.
- Outliers: Several data points deviate significantly from the general trend, representing items with either unusually high values for their quantity or unusually low values for their quantity. These outliers might be due to factors like high-value items, discounts, or promotional offers.
- Clusters: The data points seem to form clusters, suggesting that different groups of items might have distinct pricing structures or quantity-based discounts.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

1. The insights from the scatter plot can positively impact business operations by:

- Improving pricing strategies: By understanding the relationship between quantity and value, businesses can set more accurate and competitive pricing for items based on their quantity.
- Identifying opportunities for cost reduction: Identifying outliers and clusters can help identify opportunities for cost reduction through negotiation, consolidation, or alternative pricing strategies.
- Improving forecasting: Analyzing the patterns in the scatter plot can help businesses predict future sales revenue based on the quantity of items sold.
2. While the insights themselves don't directly lead to negative growth, potential issues could arise from:

- Ignoring outliers: Neglecting the impact of outliers on pricing and revenue estimation could lead to inaccurate calculations and financial losses.
- Oversimplifying the relationship: Assuming a perfectly linear relationship between quantity and value might lead to suboptimal decision-making.
- Failing to consider other factors: Overemphasizing quantity as the sole determinant of value might overlook other important factors like product category, brand, and customer preferences.

#### Chart - 6  Shipment Mode Distribution

In [None]:
shipment_mode_count = df['Shipment Mode'].value_counts()
shipment_mode_count.plot(kind='pie', figsize=(8, 8), autopct='%1.1f%%', startangle=90)
plt.title('Shipment Mode Distribution')
plt.ylabel('')
plt.show()

##### 1. Why did you pick the specific chart?

A pie chart is an appropriate choice for visualizing the distribution of categorical data like shipment modes. It effectively shows the proportion of each mode in relation to the total, making it easy to compare the relative frequencies.

##### 2. What is/are the insight(s) found from the chart?

- Air is the dominant mode: Air shipments account for the majority (70.1%) of the total shipments, indicating that air is the most frequently used mode of transportation.
- Truck is the second most common: Truck shipments make up 22.5% of the total, making it the second most popular mode.
- Air charter and ocean are less frequent: Air charter and ocean shipments have significantly lower proportions (2.6% and 4.7%, respectively), indicating that these modes are used less frequently.
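
The percentages shown on the pie chart can be reproduced as a table with `value_counts(normalize=True)`. The sketch below uses an invented series of modes, so its numbers do not match the chart.

```python
import pandas as pd

# Hypothetical shipment modes: 7 Air, 2 Truck, 1 Ocean.
modes = pd.Series(["Air"] * 7 + ["Truck"] * 2 + ["Ocean"])

# Share of each mode in percent, sorted from most to least frequent.
share = (modes.value_counts(normalize=True) * 100).round(1)
print(share)
```

A table like this is easier to quote exactly than labels read off a chart, especially for the small slices.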

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

1. The insights from the pie chart can positively impact business operations by:

- Improving shipping strategy: Understanding the distribution of shipment modes can help businesses identify areas where they can optimize their shipping strategy, such as consolidating shipments, negotiating better rates, or exploring alternative modes.
- Allocating resources effectively: The insights can help businesses allocate resources appropriately to different shipment modes based on their relative importance and cost-effectiveness.
- Identifying potential cost-saving opportunities: By analyzing the distribution, businesses can identify modes that are underutilized or inefficient and explore ways to reduce costs associated with those modes.
2. While the insights themselves don't directly lead to negative growth, potential issues could arise from:

- Ignoring the effect of imputation: Missing `Shipment Mode` values were filled with the most frequent mode during data wrangling, which inflates the share of the dominant category, so the chart may overstate how often that mode is actually used.
- Overlooking the importance of other factors: Focusing solely on the distribution of modes might overlook other important factors like cost, speed, and reliability.
- Misinterpreting the chart: Incorrectly interpreting the chart might lead to suboptimal decisions, such as over-reliance on a particular mode or underutilization of a cost-effective option.

#### Chart - 7  Total Line Item Value by Vendor

In [None]:
vendor_value = df.groupby('Vendor')['Line Item Value'].sum().nlargest(10)
vendor_value.plot(kind='barh', figsize=(14, 6), color='purple')
plt.title('Top 10 Vendors by Line Item Value')
plt.xlabel('Total Line Item Value (USD)')
plt.ylabel('Vendor')
plt.show()

##### 1. Why did you pick the specific chart?

A horizontal bar chart is appropriate for visualizing the total line item value by vendor. It effectively compares the relative values of different vendors, making it easy to identify the top-performing vendors.

##### 2. What is/are the insight(s) found from the chart?

- SCMS from RDC is the top vendor: SCMS from RDC has the highest total line item value, significantly exceeding the values of other vendors.
- Orgenics, Ltd. is the second-highest: Orgenics, Ltd. has the second-highest total line item value, indicating a strong contribution to the overall business.
- Other vendors have lower values: The remaining vendors have significantly lower total line item values compared to the top two, suggesting that they might have a smaller impact on the overall business.
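
How concentrated the value is among the top vendors can be expressed as each vendor's share of the total. The sketch below uses placeholder vendor names and invented values.

```python
import pandas as pd

# Hypothetical order lines; vendor names A to D are placeholders.
orders = pd.DataFrame({
    "Vendor": ["A", "A", "B", "C", "C", "D"],
    "Line Item Value": [500.0, 300.0, 100.0, 50.0, 30.0, 20.0],
})

# Total value per vendor, largest first, and each vendor's share of the overall total.
totals = orders.groupby("Vendor")["Line Item Value"].sum().sort_values(ascending=False)
share_pct = (totals / totals.sum() * 100).round(1)

print(share_pct)
```

A single vendor holding a large share, as vendor A does here, is the kind of concentration that the overreliance risk refers to.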

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

- Improving vendor relationships: Identifying the top-performing vendors can help businesses build stronger relationships with them, potentially leading to better deals, discounts, and improved service.
- Optimizing procurement strategies: Analyzing the distribution of line item values can help businesses identify opportunities for consolidation, standardization, or alternative sourcing strategies to reduce costs.
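- Identifying potential risks: Knowing how much value is concentrated in each vendor helps businesses anticipate the impact of supply chain disruptions or quality issues at that vendor. The chart covers only the ten largest vendors by value, so spotting low-performing vendors or negative trends requires additional metrics.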
> While the insights themselves don't directly lead to negative growth, potential issues could arise from:

- Overreliance on a single vendor: Overreliance on a single vendor, especially if they have a significant market share, could create risks in case of supply disruptions or price increases.
- Ignoring the impact of outliers: If the total line item value is significantly influenced by a few outliers, it might not accurately represent the typical performance of a vendor.
- Misinterpreting the chart: Incorrectly interpreting the chart might lead to suboptimal decisions, such as underinvesting in high-performing vendors or overinvesting in low-performing ones.
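
The overreliance risk described above can be quantified by computing each vendor's share of the total line item value. The snippet below is a minimal sketch on a small made-up frame: the vendor labels and values are illustrative and not taken from the dataset, and `vendor_share` is a hypothetical helper. Only the `Vendor` and `Line Item Value` column names come from the notebook.

```python
import pandas as pd

def vendor_share(frame: pd.DataFrame, top_n: int = 10) -> pd.DataFrame:
    """Return total value, share of total, and cumulative share for the top vendors."""
    totals = (
        frame.groupby('Vendor')['Line Item Value']
        .sum()
        .sort_values(ascending=False)
    )
    summary = pd.DataFrame({
        'total_value': totals,
        'share': totals / totals.sum(),  # fraction of the overall value
    })
    summary['cumulative_share'] = summary['share'].cumsum()
    return summary.head(top_n)

# Illustrative data only; pass the notebook's `df` instead of `sample`.
sample = pd.DataFrame({
    'Vendor': ['A', 'A', 'B', 'C'],
    'Line Item Value': [600.0, 200.0, 150.0, 50.0],
})
shares = vendor_share(sample)
print(shares)
```

A top vendor whose `share` is close to 1 signals high concentration. What counts as too high is a business decision, not something the data can settle.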

#### Chart - 8 Delivery Time Analysis

In [None]:
# Convert date columns; entries that cannot be parsed become NaT
df['Scheduled Delivery Date'] = pd.to_datetime(df['Scheduled Delivery Date'], errors='coerce')
df['Delivered to Client Date'] = pd.to_datetime(df['Delivered to Client Date'], errors='coerce')

# Days between scheduled and actual delivery (positive = late, negative = early)
df['Delivery Time'] = (df['Delivered to Client Date'] - df['Scheduled Delivery Date']).dt.days

# Keep on-time and late shipments in a separate frame so that `df` itself is
# not overwritten and the later charts still see every row
delivery_df = df[df['Delivery Time'] >= 0]

# Plot the data (seaborn draws the mean per date with a confidence band)
plt.figure(figsize=(10, 6))
sns.lineplot(data=delivery_df, x='Scheduled Delivery Date', y='Delivery Time')
plt.title('Delivery Time vs Scheduled Delivery Date')
plt.xlabel('Scheduled Delivery Date')
plt.ylabel('Delivery Time (Days past Scheduled Date)')
plt.xticks(rotation=45)
plt.show()

##### 1. Why did you pick the specific chart?

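A line chart is an appropriate choice for visualizing how delivery time changes over the years. Delivery time here means the number of days between the scheduled and the actual delivery date, with early deliveries filtered out. The chart helps identify patterns, trends, and potential anomalies in delivery performance.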

##### 2. What is/are the insight(s) found from the chart?

- Fluctuating delivery times: The delivery times fluctuate significantly over the years, with periods of higher and lower delivery times. This indicates that there might be factors influencing delivery performance that are not consistent throughout the years.
- Spikes in delivery time: There are several instances of spikes in delivery time, indicating potential disruptions or challenges in the delivery process during those periods. These spikes might be due to factors like seasonal demand, supply chain issues, or unforeseen events.
- Overall downward trend: Despite the fluctuations, there seems to be a general downward trend in delivery times over the years, suggesting that there might be improvements in the delivery process or a reduction in factors affecting delivery performance.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

- Identifying areas for improvement: By analyzing the fluctuations and spikes in delivery time, businesses can identify specific areas where improvements can be made, such as inventory management, transportation logistics, or supplier relationships.
- Improving forecasting and planning: Understanding the historical trends and patterns in delivery times can help businesses improve their forecasting and planning processes, leading to better resource allocation and inventory management.
- Enhancing customer satisfaction: Reducing delivery times can lead to improved customer satisfaction, increased repeat business, and positive brand reputation.
> While the insights themselves don't directly lead to negative growth, potential issues could arise from:

- Ignoring the impact of outliers: If the trend is significantly influenced by a few outliers, it might not accurately represent the overall delivery performance.
- Oversimplifying the analysis: Focusing solely on the trend might overlook other important factors like seasonality, economic conditions, or industry-specific challenges.
- Misinterpreting the chart: Incorrectly interpreting the chart might lead to suboptimal decisions, such as implementing changes that do not address the root causes of delivery delays.
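
To limit the influence of outliers and make seasonality easier to see, the delay can be aggregated per month with the median instead of plotting every shipment. The following is a minimal sketch, assuming the two date columns hold parseable date strings. The sample rows and the `monthly_delay` helper are hypothetical and only illustrate the idea.

```python
import pandas as pd

def monthly_delay(frame: pd.DataFrame) -> pd.DataFrame:
    """Median, mean, and count of delivery delay (in days) per scheduled month."""
    scheduled = pd.to_datetime(frame['Scheduled Delivery Date'], errors='coerce')
    delivered = pd.to_datetime(frame['Delivered to Client Date'], errors='coerce')
    delay = (delivered - scheduled).dt.days  # positive = late, negative = early
    valid = pd.DataFrame({
        'month': scheduled.dt.to_period('M'),
        'delay_days': delay,
    }).dropna()  # drop rows where either date could not be parsed
    return valid.groupby('month')['delay_days'].agg(['median', 'mean', 'count'])

# Illustrative data only; pass the notebook's `df` instead of `sample`.
sample = pd.DataFrame({
    'Scheduled Delivery Date': ['2014-01-05', '2014-01-20', '2014-01-25',
                                '2014-02-10', 'not a date'],
    'Delivered to Client Date': ['2014-01-05', '2014-01-22', '2014-03-01',
                                 '2014-02-09', '2014-02-11'],
})
monthly = monthly_delay(sample)
print(monthly)
```

Comparing the `median` and `mean` columns shows how strongly a few very late shipments pull the average. A large gap between the two suggests the trend line is driven by outliers.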

#### Chart - 9   Top 10 Brands by Line Item Value

In [None]:
top_brands = df.groupby('Brand')['Line Item Value'].sum().nlargest(10)
top_brands.plot(kind='bar', figsize=(10, 6), color='brown')
plt.title('Top 10 Brands by Line Item Value')
plt.xlabel('Brand')
plt.ylabel('Total Line Item Value (USD)')
plt.xticks(rotation=45)
plt.show()

##### 1. Why did you pick the specific chart?

A vertical bar chart is appropriate for visualizing the total line item value by brand. It effectively compares the relative values of different brands, making it easy to identify the top-performing brands.

##### 2. What is/are the insight(s) found from the chart?

- Generic brand dominates: The generic brand has the highest total line item value, significantly exceeding the values of other brands. This suggests that generic products are popular or have higher sales volumes.
- Determine is the second-highest: Determine is the second-highest brand in terms of total line item value, indicating a strong market presence.
- Other brands have lower values: The remaining brands have significantly lower total line item values compared to the top two, suggesting that they might have a smaller market share or lower average selling prices.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

1. The insights from the bar chart can positively impact business operations by:

- Improving inventory management: Identifying the top-performing brands can help businesses optimize their inventory levels and avoid stockouts or overstocking.
- Optimizing product assortment: Analyzing the distribution of line item values can help businesses identify underperforming products and make informed decisions about product discontinuation or promotional activities.
- Identifying opportunities for growth: By analyzing the performance of different brands, businesses can identify opportunities for growth through product launches, marketing campaigns, or strategic partnerships.
2. While the insights themselves don't directly lead to negative growth, potential issues could arise from:

- Overreliance on a single brand: Overreliance on a single brand, especially if it has a significant market share, could create risks in case of supply disruptions or changes in consumer preferences.
- Ignoring the impact of outliers: If the total line item value is significantly influenced by a few outliers, it might not accurately represent the typical performance of a brand.
- Misinterpreting the chart: Incorrectly interpreting the chart might lead to suboptimal decisions, such as underinvesting in high-performing brands or overinvesting in low-performing ones.
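
One way to check whether a brand's total is driven by a few very large line items is to place the sum next to the item count, the median, and the largest single item. This is a sketch under assumptions: `brand_summary` is a hypothetical helper and the sample values are invented. Only the `Brand` and `Line Item Value` column names come from the notebook.

```python
import pandas as pd

def brand_summary(frame: pd.DataFrame, top_n: int = 10) -> pd.DataFrame:
    """Total, count, median, and largest line item per brand, sorted by total."""
    grouped = frame.groupby('Brand')['Line Item Value']
    summary = grouped.agg(total='sum', n_items='count', median='median', largest='max')
    # Fraction of the brand's total that comes from its single largest line item
    summary['largest_share'] = summary['largest'] / summary['total']
    return summary.sort_values('total', ascending=False).head(top_n)

# Illustrative data only; pass the notebook's `df` instead of `sample`.
sample = pd.DataFrame({
    'Brand': ['Generic', 'Generic', 'Generic', 'Determine', 'Determine'],
    'Line Item Value': [100.0, 100.0, 800.0, 300.0, 300.0],
})
brands = brand_summary(sample)
print(brands)
```

A high `largest_share` together with a low `median` indicates that one order accounts for most of the brand's total, so the bar height alone overstates its typical performance.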

#### Chart - 10 Distribution of Freight Cost (USD) by Product Group

In [None]:
plt.figure(figsize=(12, 6))
sns.violinplot(x='Product Group', y='Freight Cost (USD)', data=df, hue='Product Group', palette='Set3', legend=False)

# Set title and labels
plt.title('Distribution of Freight Cost (USD) by Product Group', fontsize=14)
plt.xlabel('Product Group', fontsize=12)
plt.ylabel('Freight Cost (USD)', fontsize=12)

plt.xticks(rotation=45)
plt.show()

##### 1. Why did you pick the specific chart?

A violin plot is an excellent choice for visualizing the distribution of a continuous numerical variable (freight cost) across different categorical groups (product groups). It combines the features of a box plot and a kernel density plot, showing the shape of the distribution together with the median and interquartile range. Individual outliers are not drawn as separate points; they appear only as long, thin tails.

##### 2. What is/are the insight(s) found from the chart?

- Variation in distribution: The violin plots show that the distribution of freight costs varies across different product groups. Some groups have a wider range of costs, while others are more concentrated.
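- Long tails: The thin tails stretching away from the body of each violin show that some shipments have exceptionally high freight costs within each product group. A violin plot does not mark individual outliers the way a box plot does, and seaborn by default extends the density slightly beyond the observed minimum and maximum, so the tails indicate where extreme values lie rather than how many there are.
- Median differences: The position of the median markers inside each violin suggests differences in the central tendency of freight costs across product groups. Some groups have higher median costs than others.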

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

- Identifying cost-saving opportunities: By understanding the distribution of freight costs for each product group, businesses can identify areas where costs can be reduced, such as through negotiation with carriers, consolidation of shipments, or optimization of packaging.
- Improving pricing strategies: The insights can help businesses set more accurate and competitive pricing for different product groups based on their typical freight costs.
- Optimizing inventory management: By understanding the variability in freight costs, businesses can make informed decisions about inventory levels and stocking strategies to minimize transportation expenses.
> While the insights themselves don't directly lead to negative growth, potential issues could arise from:

- Ignoring the impact of outliers: Neglecting the impact of outliers on freight costs might lead to inaccurate cost estimations and budgeting.
- Oversimplifying the analysis: Focusing solely on the median or mean values might overlook important variations within each product group.
- Misinterpreting the chart: Incorrectly interpreting the violin plot might lead to suboptimal decisions, such as implementing changes that do not address the root causes of high freight costs.
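
Because the violin plot does not count outliers, a small table per product group can complement it. The sketch below uses the common 1.5 × IQR rule, which is an assumption of this example rather than a method used in the notebook. `freight_outliers`, the group labels, and the sample values are hypothetical. The function also coerces the freight cost to numbers in case the column still contains text entries.

```python
import pandas as pd

def freight_outliers(frame: pd.DataFrame) -> pd.DataFrame:
    """Summarise freight cost per product group and count 1.5*IQR outliers."""
    # Coerce to numbers; text entries (if any) become NaN and are dropped
    cost = pd.to_numeric(frame['Freight Cost (USD)'], errors='coerce')
    data = pd.DataFrame({'group': frame['Product Group'], 'cost': cost}).dropna()

    rows = []
    for group, values in data.groupby('group')['cost']:
        q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
        iqr = q3 - q1
        is_outlier = (values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)
        rows.append({
            'Product Group': group,
            'n': len(values),
            'median': median,
            'iqr': iqr,
            'n_outliers': int(is_outlier.sum()),
        })
    return pd.DataFrame(rows).set_index('Product Group')

# Illustrative data only; pass the notebook's `df` instead of `sample`.
sample = pd.DataFrame({
    'Product Group': ['Group A'] * 5 + ['Group B'] * 4,
    'Freight Cost (USD)': [100, 110, 120, 130, 5000, 200, 'not a number', 220, 240],
})
outlier_table = freight_outliers(sample)
print(outlier_table)
```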

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain Briefly.

**Key Recommendations:**

* **Data Quality:** Ensure data accuracy and consistency for reliable insights.
* **Continuous Improvement:** Monitor KPIs and implement data-driven decisions.
* **Collaboration and Communication:** Foster teamwork and effective communication.

**Specific Actions:**

* **Optimize Freight Costs:** Focus on high-cost shipments, negotiate rates, and optimize mode selection.
* **Refine Pricing and Inventory:** Align pricing with costs, consider quantity discounts, and optimize inventory levels.
* **Strengthen Vendor Relationships:** Build stronger relationships with top-performing vendors and diversify the supplier base.
* **Improve Delivery Performance:** Identify root causes of delays and set realistic delivery targets.
* **Optimize Product Portfolio:** Analyze brand performance and make informed decisions about product launches and discontinuations.

By implementing these recommendations and continuously monitoring performance, the client can achieve their business objectives and improve their overall efficiency and profitability.
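
As one possible KPI for the monitoring recommended above, the share of shipments delivered on or before the scheduled date can be tracked per shipment mode. This is a minimal sketch, not part of the original analysis. `on_time_rate` is a hypothetical helper, the sample rows are invented, and treating "on time" as delivered on or before the scheduled date is an assumption.

```python
import pandas as pd

def on_time_rate(frame: pd.DataFrame, by: str = 'Shipment Mode') -> pd.DataFrame:
    """Number of shipments and on-time rate for each value of the `by` column."""
    scheduled = pd.to_datetime(frame['Scheduled Delivery Date'], errors='coerce')
    delivered = pd.to_datetime(frame['Delivered to Client Date'], errors='coerce')
    valid = scheduled.notna() & delivered.notna()  # ignore rows with missing dates
    # 1.0 = delivered on or before the scheduled date, 0.0 = late
    on_time = (delivered[valid] <= scheduled[valid]).astype(float)
    return on_time.groupby(frame.loc[valid, by]).agg(
        shipments='count', on_time_rate='mean'
    )

# Illustrative data only; pass the notebook's `df` instead of `sample`.
sample = pd.DataFrame({
    'Shipment Mode': ['Air', 'Air', 'Air', 'Truck', 'Truck'],
    'Scheduled Delivery Date': ['2014-01-05', '2014-01-10', '2014-01-15',
                                '2014-01-20', '2014-01-25'],
    'Delivered to Client Date': ['2014-01-05', '2014-01-12', '2014-01-14',
                                 '2014-01-20', None],
})
kpi = on_time_rate(sample)
print(kpi)
```

Recomputing this table at regular intervals gives a simple baseline against which delivery targets can be set and reviewed.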


# **Conclusion**

By leveraging data-driven insights, the client can significantly enhance their business operations and achieve sustainable growth. The analysis of various charts and plots has provided valuable insights into areas such as freight costs, shipment modes, vendor performance, and product distribution. By addressing the identified issues and implementing the recommended strategies, the client can optimize their supply chain, improve customer satisfaction, and ultimately increase profitability.

It is crucial to note that data analysis is an ongoing process. Continuous monitoring and evaluation of key performance indicators will be essential to identify emerging trends and adapt strategies accordingly. By embracing a data-driven approach, the client can stay ahead of the competition and achieve long-term success.
