# **Project Name**    -



FedEx Logistics Performance Analysis(Exploratory Data Analysis)



# **Project Summary -**

Business Context: FedEx Logistics manages a complex global supply chain, dealing with shipments across various regions, countries, and industries. This dataset provides an in-depth look at their logistics processes, capturing important information on purchase orders (POs), shipment methods, vendor agreements (INCO terms), delivery schedules, and product-specific details such as item descriptions and dosage forms. Effective management of these processes ensures timely delivery, minimizes freight costs, and improves customer satisfaction.

Given the rise of eCommerce and global distribution, companies like FedEx must continuously optimize their logistics operations to maintain a competitive edge. The dataset is designed to provide insights into how shipments are managed, identify bottlenecks or delays, and ensure cost-effectiveness. By analyzing the data, FedEx Logistics aims to streamline supply chain operations, improving delivery timelines and reducing costs for both the company and its customers.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


Efficient Approach to the Problem Statement

Clarity in defining the problem and the approach to solving it:
Innovative and efficient methodologies employed to achieve the project’s objectives.

Data Exploration Techniques and Logic:
Depth of exploratory data analysis (EDA) conducted to understand the dataset.
Logical use of tools and techniques to uncover trends, patterns, and anomalies.

Handling of Missing Values and Outliers:
Effectiveness of methods used to manage missing data and outliers.
Rationale behind the approach chosen, such as imputation or removal of anomalies.

Visualization Logic:
Quality and relevance of visualizations used to represent data insights.
Use of appropriate charts, plots, and graphs to communicate findings effectively.

Forming Insights and Understandings:
Ability to generate actionable insights from the data.
Logical deductions and the formation of conclusions that align with the project goals.

Stakeholder Usefulness:
Clear understanding of how the project results are beneficial to stakeholders.
Demonstration of how insights can drive decision-making and add value to the business or target audience.

#### **Define Your Business Objective?**

Answer Here.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that each chart answers the following format.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go



### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive')
df=pd.read_csv('/content/drive/MyDrive/Projects/project2/SCMS_Delivery_History_Dataset.csv')


In [None]:
df.columns

### Dataset First View

In [None]:
# Dataset First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
num_duplicates = df.duplicated().sum()
print("Number of duplicate rows:", num_duplicates)


#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
missing_per_col = df.isnull().sum()
print(missing_per_col)
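
Raw counts are easier to compare when they are expressed as a share of all rows. The following is a minimal sketch of that idea; the helper name `missing_summary` and the small `toy` frame are assumptions made for illustration and stand in for `df`.

```python
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing counts and percentages, worst first."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "missing_pct": (counts / len(frame) * 100).round(2),
    })
    # Keep only columns that actually have gaps, sorted by severity
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Hypothetical stand-in for df; replace with the loaded DataFrame
toy = pd.DataFrame({
    "Shipment Mode": ["Air", None, "Truck", "Air"],
    "Dosage": [None, None, "10mg", None],
    "Line Item Value": [100.0, 250.0, 75.0, 310.0],
})
print(missing_summary(toy))
```

Calling `missing_summary(df)` on the loaded dataset would give the same table for the real columns.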


In [None]:
# Visualizing the missing values
import missingno as msno
import matplotlib.pyplot as plt

# assuming df is your DataFrame
msno.bar(df)            # bar chart: count of non-missing vs missing per column
plt.show()

# msno.matrix(df)         # matrix view: where missing values are (by row/column)
# plt.show()

# msno.heatmap(df)        # correlation among missingness of columns
# plt.show()

# msno.dendrogram(df)     # clusters columns by similarity of missingness pattern
# plt.show()


In [None]:
df.columns

### What did you know about your dataset?

The dataset has about 10,324 rows and the following 33 columns:

['ID', 'Project Code', 'PQ #', 'PO / SO #', 'ASN/DN #', 'Country',
       'Managed By', 'Fulfill Via', 'Vendor INCO Term', 'Shipment Mode',
       'PQ First Sent to Client Date', 'PO Sent to Vendor Date',
       'Scheduled Delivery Date', 'Delivered to Client Date',
       'Delivery Recorded Date', 'Product Group', 'Sub Classification',
       'Vendor', 'Item Description', 'Molecule/Test Type', 'Brand', 'Dosage',
       'Dosage Form', 'Unit of Measure (Per Pack)', 'Line Item Quantity',
       'Line Item Value', 'Pack Price', 'Unit Price', 'Manufacturing Site',
       'First Line Designation', 'Weight (Kilograms)', 'Freight Cost (USD)',
       'Line Item Insurance (USD)']

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

In [None]:
df.describe(include='object').columns

In [None]:
df.describe(include=['int64','float64']).columns

### Variables Description

`df.describe(include='object')` reports the following columns as categorical (object dtype):

['Project Code', 'PQ #', 'PO / SO #', 'ASN/DN #', 'Country',
       'Managed By', 'Fulfill Via', 'Vendor INCO Term', 'Shipment Mode',
       'PQ First Sent to Client Date', 'PO Sent to Vendor Date',
       'Scheduled Delivery Date', 'Delivered to Client Date',
       'Delivery Recorded Date', 'Product Group', 'Sub Classification',
       'Vendor', 'Item Description', 'Molecule/Test Type', 'Brand', 'Dosage',
       'Dosage Form', 'Manufacturing Site', 'First Line Designation',
       'Weight (Kilograms)', 'Freight Cost (USD)']

The date columns, 'Weight (Kilograms)', and 'Freight Cost (USD)' appear in this list because pandas reads them as text, not because they are true categories. They need to be converted before any date or numerical analysis.

The numerical columns (int64 or float64) are:

['ID', 'Unit of Measure (Per Pack)', 'Line Item Quantity',
       'Line Item Value', 'Pack Price', 'Unit Price',
       'Line Item Insurance (USD)']
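
One way to check why a quantity column was read as text is to coerce it to numbers and look at what fails to parse. This is a sketch under assumptions: the `toy` frame and its text entries are hypothetical placeholders, not values taken from the dataset.

```python
import pandas as pd

# Hypothetical values; the text entries are illustrative placeholders
toy = pd.DataFrame({
    "Weight (Kilograms)": ["13", "358", "not recorded", "171"],
    "Freight Cost (USD)": ["780.34", "4521.5", "1653.78", "see other record"],
})

for col in ["Weight (Kilograms)", "Freight Cost (USD)"]:
    # Entries that cannot be parsed become NaN instead of raising an error
    as_number = pd.to_numeric(toy[col], errors="coerce")
    # Present in the raw column but not parseable as a number
    non_numeric = toy.loc[as_number.isna() & toy[col].notna(), col]
    print(f"{col}: {len(non_numeric)} non-numeric of {len(toy)} rows")
    print(non_numeric.value_counts().head())
```

Running the same loop on `df` lists the most frequent text entries, which helps decide whether to treat them as missing or handle them separately.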

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
# import pandas as pd

# # load your dataset
# df = pd.read_csv('/content/drive/MyDrive/Projects/project2/SCMS_Delivery_History_Dataset.csv')

# For each column, print the number of unique values and optionally a few examples
for col in df.columns:
    unique_vals = df[col].unique()
    num_unique = df[col].nunique(dropna=False)  # include NaN in count
    # sample up to first 10 unique values to display
    sample_vals = unique_vals[:10]
    print(f"Column: {col}")
    print(f"  Unique count: {num_unique}")
    print(f"  Sample values: {sample_vals}")
    print("-" * 40)
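
The per-column printout can also be condensed into one table. The sketch below is an assumption-laden illustration on a hypothetical `toy` frame: a unique ratio near 1 suggests an identifier, and a single unique value suggests a constant column.

```python
import pandas as pd

# Hypothetical stand-in for df
toy = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Country": ["Nigeria", "Nigeria", "Vietnam", None],
    "Fulfill Via": ["Direct Drop", "Direct Drop", "Direct Drop", "Direct Drop"],
})

cardinality = pd.DataFrame({
    "dtype": toy.dtypes.astype(str),
    "n_unique": toy.nunique(dropna=True),   # NaN is not counted as a value
    "n_missing": toy.isnull().sum(),
})
# Share of rows that carry a distinct value in each column
cardinality["unique_ratio"] = (cardinality["n_unique"] / len(toy)).round(2)
print(cardinality.sort_values("n_unique", ascending=False))
```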


## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.


def standardize_column_names(df):

    df = df.copy()
    df.columns = (
        df.columns
        .str.strip()
        .str.lower()
        .str.replace(' ', '_')
        .str.replace(r'[^\w]', '_', regex=True)
    )
    return df

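def clean_text_columns(df):
    """Strip surrounding whitespace from text columns, leaving missing values as NaN."""
    df = df.copy()  # avoid mutating the caller's DataFrame
    for col in df.select_dtypes(include=['object']).columns:
        stripped = df[col].astype(str).str.strip()
        # keep the original NaN instead of turning it into the string 'nan'
        df[col] = stripped.where(df[col].notna(), np.nan)
    return df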

def convert_data_types(df, dtype_map=None):

    df = df.copy()
    if dtype_map is None:
        dtype_map = {}
    for col, dtype in dtype_map.items():
        try:
            df[col] = df[col].astype(dtype)
        except Exception:
            # fallback: use pd.to_datetime or pd.to_numeric with errors='coerce'
            if dtype == 'datetime':
                df[col] = pd.to_datetime(df[col], errors='coerce', dayfirst=True)
            elif dtype == 'numeric':
                df[col] = pd.to_numeric(df[col], errors='coerce')
            else:
                # try generic conversion
                df[col] = df[col].apply(dtype)
    return df

def drop_fully_empty_rows_and_cols(df):

    df = df.dropna(axis=0, how='all')
    df = df.dropna(axis=1, how='all')
    return df

def drop_duplicates(df, keep='first'):

    df2 = df.drop_duplicates(keep=keep).reset_index(drop=True)
    return df2

def impute_missing_values(df, strategy_map=None):

    df = df.copy()
    if strategy_map is None:
        strategy_map = {}
    for col, (method, fill_value) in strategy_map.items():
        if method == 'mean':
            df[col] = df[col].fillna(df[col].mean())
        elif method == 'median':
            df[col] = df[col].fillna(df[col].median())
        elif method == 'mode':
            mode = df[col].mode()
            if not mode.empty:
                df[col] = df[col].fillna(mode.iloc[0])
        elif method == 'ffill':
            # fillna(method=...) is deprecated in recent pandas; use ffill()/bfill()
            df[col] = df[col].ffill()
        elif method == 'bfill':
            df[col] = df[col].bfill()
        elif method == 'constant':
            df[col] = df[col].fillna(fill_value)
        elif callable(method):
            df[col] = df[col].fillna(method(df[col]))
        else:
            # fallback: constant
            df[col] = df[col].fillna(fill_value)
    return df

def encode_categorical(df, cat_cols=None, drop_first=True):

    df = df.copy()
    if cat_cols is None:
        # heuristically pick object / category dtypes
        cat_cols = df.select_dtypes(include=['object', 'category']).columns.tolist()
    # Option: convert to category dtype
    for col in cat_cols:
        df[col] = df[col].astype('category')
    # One-hot (dummy) encoding if needed
    df = pd.get_dummies(df, columns=cat_cols, drop_first=drop_first)
    return df

def detect_and_handle_outliers(df, numeric_cols=None, method='iqr', factor=1.5):

    df = df.copy()
    if numeric_cols is None:
        numeric_cols = df.select_dtypes(include=[np.number]).columns.tolist()
    for col in numeric_cols:
        col_data = df[col]
        if method == 'iqr':
            q1 = col_data.quantile(0.25)
            q3 = col_data.quantile(0.75)
            iqr = q3 - q1
            lower = q1 - factor * iqr
            upper = q3 + factor * iqr
            df[col] = col_data.clip(lower=lower, upper=upper)
        elif method == 'zscore':
            mean = col_data.mean()
            std = col_data.std()
            z = (col_data - mean) / std
            # keep rows within the threshold; rows with a missing value are not dropped
            df = df[(z.abs() <= factor) | z.isna()]
    return df

def feature_engineering(df):

    df = df.copy()
    # Example: if you have datetime columns, you can compute differences
    # E.g.,
    if 'scheduled_delivery_date' in df.columns and 'delivered_to_client_date' in df.columns:
        df['delay_days'] = (
            df['delivered_to_client_date'] - df['scheduled_delivery_date']
        ).dt.days
    # Extract month/year from date columns
    for date_col in df.select_dtypes(include=['datetime64[ns]', 'datetime64']).columns:
        df[f'{date_col}_month'] = df[date_col].dt.month
        df[f'{date_col}_year'] = df[date_col].dt.year
    return df

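def full_pipeline(path, dtype_map=None, strategy_map=None):
    """Load the CSV at `path` and run the cleaning steps in order."""
    # Load the raw data (the original version never read the file)
    df = pd.read_csv(path)
    df = standardize_column_names(df)
    df = clean_text_columns(df)
    df = convert_data_types(df, dtype_map=dtype_map)
    df = drop_fully_empty_rows_and_cols(df)
    df = drop_duplicates(df)
    df = impute_missing_values(df, strategy_map=strategy_map)
    # Possibly encode or leave categorical columns as categories
    # df = encode_categorical(df, cat_cols=[...])
    df = feature_engineering(df)
    return df

# Example usage (keys are the standardized column names):
# clean_df = full_pipeline(
#     '/content/drive/MyDrive/Projects/project2/SCMS_Delivery_History_Dataset.csv',
#     dtype_map={'scheduled_delivery_date': 'datetime',
#                'delivered_to_client_date': 'datetime'},
# )
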

### What all manipulations have you done and insights you found?

1-Make column names consistent: lowercase, underscores, no leading/trailing spaces.

2-Clean whitespace, weird encoding, strings in object columns.

3-Convert columns to appropriate data types.
  #dtype_map: dict {col_name: dtype or conversion function}

4-Remove rows or columns that are entirely NaN.

5-Drop duplicate rows (exact match).

6-Fill or impute missing values.
  strategy_map: dict={col: (method, fill_value)}
  method can be 'mean', 'median', 'mode', 'ffill', 'bfill', 'constant', or custom function.

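7-One-hot encode categorical variables or convert to category dtype (defined in encode_categorical; the call is commented out in full_pipeline, so it is not applied by default).

8-Optionally detect and handle outliers (defined in detect_and_handle_outliers; not called in full_pipeline).
  method: 'iqr' caps values at the fences, 'zscore' removes rows
  factor: multiplier for the IQR, or the z-score threshold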

9-Add derived / useful columns (e.g. lead time, delay, month, year).
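
Steps 3, 6 and 9 above can be sketched on a few rows as follows. This is an illustration under assumptions, not the notebook's pipeline itself: the `toy` frame and its values are hypothetical, and the column names are the ones `standardize_column_names` would produce.

```python
import pandas as pd

# Hypothetical rows using the standardized column names
toy = pd.DataFrame({
    "scheduled_delivery_date": ["2014-03-01", "2014-03-10", "not available"],
    "delivered_to_client_date": ["2014-03-04", "2014-03-08", "2014-03-20"],
    "line_item_insurance__usd_": [12.5, None, 30.0],
})

# Step 3: parse dates; unparseable text becomes NaT instead of raising
for col in ["scheduled_delivery_date", "delivered_to_client_date"]:
    toy[col] = pd.to_datetime(toy[col], errors="coerce")

# Step 6: median imputation for a numeric column
insurance = "line_item_insurance__usd_"
toy[insurance] = toy[insurance].fillna(toy[insurance].median())

# Step 9: positive delay_days means late, negative means early
toy["delay_days"] = (
    toy["delivered_to_client_date"] - toy["scheduled_delivery_date"]
).dt.days
print(toy)
```

Rows whose scheduled or delivered date could not be parsed end up with a missing `delay_days`, so they drop out of any delay statistics instead of distorting them.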



## ***4. Data Visualization, Storytelling & Experimenting with charts: Understand the relationships between variables***

Pie Chart on Dependent Variable (Univariate)

In [None]:
# Chart - 1 visualization code
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder data for illustration only; it is kept in sample_df so the
# loaded dataset in df is not overwritten
sample_df = pd.DataFrame({
    'dependent_variable': ['A', 'B', 'A', 'C', 'B', 'A', 'C', 'A', 'B', 'C']
})

# Count occurrences of each category in the dependent variable
category_counts = sample_df['dependent_variable'].value_counts()

# Plotting the pie chart
plt.figure(figsize=(8, 8))
category_counts.plot(kind='pie', autopct='%1.1f%%', startangle=90, colors=plt.cm.Paired.colors)
plt.title('Distribution of Dependent Variable')
plt.ylabel('')  # Hide the y-label
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

I chose a pie chart to visualize the distribution of the dependent variable because it shows each category's share of the whole. It works best with a small number of categories, where the contribution of each one to the total is easy to compare.

##### 2. What is/are the insight(s) found from the chart?

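The pie chart shows each category's share of the dependent variable. With the placeholder labels used in the code, category A accounts for 4 of 10 records (40%), while B and C account for 3 each (30%). These values only illustrate how to read the chart; shares for a real column such as delivery status, product type, or vendor have to be read from a chart drawn on the loaded dataset.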
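
The shares shown on a pie chart can also be computed directly, which makes them easier to quote. The sketch below reuses the placeholder labels from the chart cell; the 5% cut-off for "small" categories is an arbitrary assumption.

```python
import pandas as pd

# Placeholder labels from the chart cell; not real shipment data
labels = pd.Series(["A", "B", "A", "C", "B", "A", "C", "A", "B", "C"],
                   name="dependent_variable")

# Percentage share of each category
shares = (labels.value_counts(normalize=True) * 100).round(1)
print(shares)

# Flag the dominant category and any category below the chosen threshold
dominant = shares.idxmax()
small = shares[shares < 5].index.tolist()  # 5% is an arbitrary cut-off
print(f"Largest share: {dominant} ({shares[dominant]}%)")
print(f"Categories under 5%: {small}")
```

Categories that fall under the cut-off are often merged into a single "Other" slice so the chart stays readable.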

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

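Yes, provided the chart is drawn on a real column. Knowing which categories dominate shows where most of the volume sits, so planning effort can be focused there. A pie chart cannot show negative growth by itself because it has no time dimension: a category with a small share may still be growing, so shares should be compared across periods before drawing that conclusion.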

Bar Chart

In [None]:
from google.colab import drive
drive.mount('/content/drive')
df=pd.read_csv('/content/drive/MyDrive/Projects/project2/SCMS_Delivery_History_Dataset.csv')

In [None]:
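# Chart - 2 visualization code
import matplotlib.pyplot as plt
import seaborn as sns

# Bar height is the mean 'Line Item Value' per product group (seaborn's default estimator)
plt.figure(figsize=(10, 6))
# errorbar=None hides the confidence-interval bars; it replaces the deprecated ci=None (seaborn >= 0.12)
sns.barplot(x='Product Group', y='Line Item Value', data=df, errorbar=None, palette='viridis')
plt.title('Average Line Item Value by Product Group')
plt.xlabel('Product Group')
plt.ylabel('Average Line Item Value')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()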


##### 1. Why did you pick the specific chart?

I selected a bar chart because it compares a numerical variable ('Line Item Value') across the levels of a categorical variable ('Product Group'). Each bar's height is the group mean, so the groups can be ranked at a glance.

##### 2. What is/are the insight(s) found from the chart?

Variation in Average Line Item Values:

The chart reveals that different product groups exhibit varying average line item values. For instance, 'Product Group A' may have a significantly higher average value compared to 'Product Group B'.

Identification of High-Value Product Groups:

Product groups with higher average line item values are easily identifiable. This information is crucial for prioritizing inventory management and sales strategies.

Potential for Profitability Analysis:

Understanding which product groups contribute more to revenue can aid in profitability analysis. Higher average values might indicate premium products or higher demand, suggesting areas for increased focus.

Strategic Decision Making:

The insights gained can inform strategic decisions such as pricing adjustments, promotional efforts, and resource allocation to maximize revenue and market share.
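
Because a bar of averages can be driven by a handful of very large orders, it helps to look at the count and median next to the mean. The sketch below assumes a hypothetical `toy` frame; the group labels `G1` to `G3` and all values are placeholders.

```python
import pandas as pd

# Hypothetical rows; group labels and values are placeholders
toy = pd.DataFrame({
    "Product Group": ["G1", "G1", "G1", "G2", "G2", "G3"],
    "Line Item Value": [100.0, 120.0, 5000.0, 300.0, 340.0, 900.0],
})

stats = (
    toy.groupby("Product Group")["Line Item Value"]
    .agg(["count", "mean", "median"])
    .sort_values("mean", ascending=False)
)
# A mean far above the median points to a few very large orders
stats["mean_to_median"] = (stats["mean"] / stats["median"]).round(2)
print(stats)
```

A group with a high mean, a low median, and few rows deserves a closer look before it is labelled a high-value group.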



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights derived from the bar chart—specifically, the variation in average 'Line Item Value' across different 'Product Group' categories—can significantly contribute to positive business outcomes.

Box Plot

In [None]:
# Chart - 3 visualization code
plt.figure(figsize=(10, 6))
sns.boxplot(x='Product Group', y='Line Item Value', data=df, palette='viridis')
plt.title('Distribution of Line Item Value by Product Group')
plt.xlabel('Product Group')
plt.ylabel('Line Item Value')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

When comparing multiple groups, box plots facilitate easy visualization of differences in medians, variability, and the presence of outliers, aiding in comparative analysis

##### 2. What is/are the insight(s) found from the chart?

Variation in Central Tendency:

The median 'Line Item Value' varies across 'Product Groups', indicating differing central tendencies. For instance, 'Product Group A' may have a higher median compared to 'Product Group B', suggesting that certain product categories consistently generate higher revenue.

Differences in Data Spread:

The interquartile range (IQR), represented by the box, varies among 'Product Groups'. A wider IQR indicates greater variability in 'Line Item Value' within that group, while a narrower IQR suggests more consistency. For example, 'Product Group C' might have a wider IQR, implying a diverse range of transaction values.

Presence of Outliers:

Outliers, depicted as individual points outside the whiskers, are present in some 'Product Groups'. These outliers could represent exceptional transactions, such as bulk orders or premium-priced items. Identifying these can help in understanding rare but impactful sales events.

Skewness Indication:

The symmetry of the box and whiskers provides insight into data skewness. If the upper whisker is longer than the lower one, it suggests a right-skewed distribution, where higher 'Line Item Values' are less frequent but more extreme. Conversely, a longer lower whisker indicates a left-skewed distribution.
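
The quantities a box plot draws can be computed per group to back up these readings with numbers. This is a minimal sketch: the helper `box_stats`, the `toy` frame, and the group labels are assumptions for illustration, and the fences use the common 1.5 × IQR rule.

```python
import pandas as pd


def box_stats(values: pd.Series, factor: float = 1.5) -> pd.Series:
    """Quartiles, IQR fences and outlier count for one group."""
    q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - factor * iqr, q3 + factor * iqr
    # Points outside the fences are the ones drawn individually on a box plot
    n_out = int(((values < lower) | (values > upper)).sum())
    return pd.Series({"q1": q1, "median": median, "q3": q3, "iqr": iqr,
                      "lower_fence": lower, "upper_fence": upper,
                      "n_outliers": n_out})


# Hypothetical rows; labels and values are placeholders
toy = pd.DataFrame({
    "Product Group": ["G1"] * 5 + ["G2"] * 5,
    "Line Item Value": [10.0, 20.0, 30.0, 40.0, 500.0,
                        100.0, 110.0, 120.0, 130.0, 140.0],
})

groups = toy.groupby("Product Group")["Line Item Value"]
summary = pd.DataFrame({name: box_stats(vals) for name, vals in groups}).T
print(summary)
```

The whiskers on the chart end at the last data point inside each fence, so they can be shorter than the fence values in this table.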

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights derived from the box plot can significantly contribute to creating a positive business impact.

(Univariate + Bivariate)

In [None]:
df.columns

In [None]:
# Chart - 4 visualization code
import seaborn as sns
import matplotlib.pyplot as plt

sns.histplot(df['Line Item Value'], kde=True)
plt.title('Distribution of Line Item Value')
plt.show()
sns.boxplot(x=df['Line Item Value'])
plt.title('Box Plot of Line Item Value')
plt.show()
sns.countplot(x='Country', data=df)
plt.title('Count of Entries by Country')
plt.xticks(rotation=90)
plt.show()
country_counts = df['Country'].value_counts()
country_counts.plot.pie(autopct='%1.1f%%')
plt.title('Proportion of Entries by Country')
plt.ylabel('')
plt.show()
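# 'Freight Cost (USD)' is stored as text, so convert it to numbers first (unparseable entries become NaN)
freight_cost = pd.to_numeric(df['Freight Cost (USD)'], errors='coerce')
sns.scatterplot(x=df['Line Item Value'], y=freight_cost)
plt.title('Line Item Value vs. Freight Cost')
plt.show()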
sns.boxplot(x='Product Group', y='Line Item Value', data=df)
plt.title('Line Item Value by Product Group')
plt.xticks(rotation=90)
plt.show()


##### 1. Why did you pick the specific chart?

Univariate Analysis
Objective: Describe one variable at a time.

1. Numerical Variables (e.g., 'Line Item Value', 'Pack Price', 'Unit Price', 'Freight Cost (USD)', 'Weight (Kilograms)'):
Histogram: Ideal for visualizing the frequency distribution of a numerical variable. It helps in understanding the spread and skewness of the data.

Box Plot: Provides a summary of the data distribution, highlighting the median, quartiles, and potential outliers.

2. Categorical Variables (e.g., 'Country', 'Product Group', 'Vendor'):
Bar Chart: Displays the count of each category, making it easy to compare the frequency of categories.

Pie Chart: Shows the proportion of each category relative to the whole, useful for understanding the composition of categorical data.

Bivariate Analysis
Objective: Explore the relationship between two variables to identify patterns, correlations, or trends.

1. Numerical vs. Numerical (e.g., 'Line Item Value' vs. 'Freight Cost (USD)'):
Scatter Plot: Effective for visualizing the correlation between two numerical variables. It helps in identifying linear or non-linear relationships.

2. Categorical vs. Numerical (e.g., 'Product Group' vs. 'Line Item Value'):
Box Plot: Compares the distribution of a numerical variable across different categories, highlighting differences in medians and variability.

Bar Plot: Shows the average of a numerical variable for each category, useful for comparing central tendencies.
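
Skewness seen in a histogram can be quantified, and a log transform is one common way to make a long right tail readable. The values below are hypothetical and only stand in for 'Line Item Value'.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed values standing in for 'Line Item Value'
values = pd.Series([50.0, 80.0, 120.0, 200.0, 450.0, 900.0, 25000.0],
                   name="Line Item Value")

# Sample skewness: values well above 0 indicate a long right tail
raw_skew = values.skew()
# log1p compresses large values and is defined at 0
log_skew = np.log1p(values).skew()

print(f"Skewness before log transform: {raw_skew:.2f}")
print(f"Skewness after log1p:          {log_skew:.2f}")
```

If the transformed values are plotted, the axis label should say so, since the bars no longer show amounts in original units.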

##### 2. What is/are the insight(s) found from the chart?

Both bar and box plots are instrumental in uncovering insights that can drive business improvements. Bar charts facilitate comparison across categories, while box plots provide a deeper understanding of data distribution and variability. Together, they enable businesses to make data-driven decisions that enhance performance and efficiency.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights gained from bar and box plots can significantly contribute to positive business impact by enhancing decision-making processes, optimizing operations, and identifying areas for improvement.

#### Chart - 5

In [None]:
# Chart - 5 visualization code
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

# Placeholder data for illustration only; it is kept in sample_df so the
# loaded dataset in df is not overwritten
data = {
    'Line Item Value': [100, 200, 300, 400, 500],
    'Pack Price': [10, 20, 30, 40, 50],
    'Unit Price': [5, 10, 15, 20, 25],
    'Freight Cost (USD)': [50, 100, 150, 200, 250],
    'Line Item Insurance (USD)': [5, 10, 15, 20, 25],
    'Product Group': ['A', 'B', 'A', 'B', 'A']
}

# Create DataFrame
sample_df = pd.DataFrame(data)

# Plot
plt.figure(figsize=(10, 6))
parallel_coordinates(sample_df, 'Product Group', color=('#556270', '#4ECDC4'))
plt.title('Parallel Coordinate Plot')
plt.xlabel('Variables')
plt.ylabel('Values')
plt.grid(True)
plt.show()


##### 1. Why did you pick the specific chart?

Handling Multivariate Data: It allows for the visualization of high-dimensional datasets, making it easier to identify relationships between variables.

Identifying Correlations and Patterns: By connecting data points across parallel axes, it facilitates the detection of trends, clusters, and outliers.

Comparing Multiple Variables: It enables the comparison of several variables at once, providing a comprehensive view of the data.

##### 2. What is/are the insight(s) found from the chart?

Identification of Correlations:

In a parallel coordinate plot each axis is a variable and each line is one observation. If the line segments between two adjacent axes, such as 'Line Item Value' and 'Pack Price', run roughly parallel without crossing, the two variables are positively correlated. If most segments cross each other, the correlation is negative. Because pandas draws all axes on one shared scale, variables with very different magnitudes should be rescaled first.

Detection of Clusters:

The plot can reveal clusters of similar observations. By grouping lines that follow similar paths across the axes, one can identify segments of the data that exhibit similar characteristics, aiding in segmentation and targeted analysis.
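
Since `parallel_coordinates` draws every variable against a single y-axis, columns with large values dominate the picture unless they are rescaled. The sketch below applies min-max scaling to a hypothetical `toy` frame; the values and the labels `G1` and `G2` are placeholders.

```python
import pandas as pd

# Hypothetical values; column names follow the dataset, numbers do not
toy = pd.DataFrame({
    "Line Item Value": [100.0, 200.0, 300.0, 400.0],
    "Pack Price": [10.0, 20.0, 30.0, 40.0],
    "Unit Price": [0.25, 0.20, 0.15, 0.10],
    "Product Group": ["G1", "G2", "G1", "G2"],
})

numeric_cols = ["Line Item Value", "Pack Price", "Unit Price"]
scaled = toy.copy()

# Min-max scale each column to [0, 1] so every axis is comparable.
# A constant column would give a zero range and produce NaN here.
col_min = toy[numeric_cols].min()
col_range = toy[numeric_cols].max() - col_min
scaled[numeric_cols] = (toy[numeric_cols] - col_min) / col_range

print(scaled)
```

The `scaled` frame can then be passed to `pandas.plotting.parallel_coordinates(scaled, 'Product Group')` in place of the raw values.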

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights derived from a parallel coordinate plot can significantly contribute to positive business outcomes. However, if misinterpreted or overlooked, certain patterns may inadvertently lead to negative growth.

#### Chart - 6

In [None]:
# Chart - 6 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 7

In [None]:
# Chart - 7 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 8

In [None]:
# Chart - 8 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 9

In [None]:
# Chart - 9 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 10

In [None]:
# Chart - 10 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 11

In [None]:
# Chart - 11 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 12

In [None]:
# Chart - 12 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 13

In [None]:
# Chart - 13 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

# Placeholder data for illustration only; it is kept in sample_df so the
# loaded dataset in df is not overwritten
data = {
    'Line Item Quantity': [100, 200, 150, 300, 250],
    'Line Item Value': [1000, 2000, 1500, 3000, 2500],
    'Pack Price': [10, 20, 15, 30, 25],
    'Unit Price': [1, 2, 1.5, 3, 2.5],
    'Weight (Kilograms)': [5, 10, 7.5, 15, 12.5],
    'Freight Cost (USD)': [50, 100, 75, 150, 125],
    'Line Item Insurance (USD)': [5, 10, 7.5, 15, 12.5]
}

sample_df = pd.DataFrame(data)

# Compute the correlation matrix
corr_matrix = sample_df.corr()

# Set up the matplotlib figure
plt.figure(figsize=(10, 8))

# Generate a custom diverging colormap
cmap = sns.diverging_palette(230, 20, as_cmap=True)

# Generate a mask for the upper triangle (the diagonal is masked as well)
mask = np.triu(np.ones_like(corr_matrix, dtype=bool))

# Draw the heatmap with the mask on a symmetric -1 to 1 color scale
sns.heatmap(corr_matrix, mask=mask, cmap=cmap, vmin=-1.0, vmax=1.0, center=0,
            annot=True, annot_kws={"size": 10}, fmt='.2f',
            linewidths=0.5, cbar_kws={"shrink": 0.8})

# Add title
plt.title('Correlation Heatmap', fontsize=16)

# Display the heatmap
plt.show()


##### 1. Why did you pick the specific chart?

Quick Identification of Relationships: Correlation heatmaps provide an immediate visual representation of how variables relate to each other. The color intensity in each cell indicates the strength and direction of the correlation, making it easy to spot strong positive or negative relationships.

##### 2. What is/are the insight(s) found from the chart?

Identifying Strong Positive and Negative Correlations

Variables with correlation coefficients close to +1 indicate a strong positive relationship, meaning as one variable increases, the other tends to increase as well.

Conversely, coefficients close to -1 signify a strong negative relationship, where an increase in one variable corresponds to a decrease in the other.

A coefficient near 0 suggests little to no linear relationship between the variables.

With the placeholder data used in the code, every coefficient is 1.00 because each column is an exact multiple of 'Line Item Quantity'. The chart therefore demonstrates the technique only and says nothing about the real shipments.

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
import seaborn as sns
import matplotlib.pyplot as plt

# Uses the placeholder sample_df defined for the correlation heatmap
numerical_columns = ['Line Item Quantity', 'Line Item Value', 'Pack Price', 'Unit Price',
                     'Weight (Kilograms)', 'Freight Cost (USD)', 'Line Item Insurance (USD)']
sns.pairplot(sample_df[numerical_columns])
plt.show()


##### 1. Why did you pick the specific chart?

Comprehensive Visualization: A pair plot displays scatter plots for each pair of numerical variables, along with histograms or kernel density estimates (KDEs) on the diagonal. This layout allows for a quick assessment of how variables relate to each other and their individual distributions.

Correlation Detection: By examining the scatter plots, you can identify linear or nonlinear relationships between variables, which is crucial for understanding dependencies and multicollinearity.

Outlier Identification: The scatter plots can highlight data points that deviate significantly from the general trend, aiding in the detection of outliers that may require further investigation.

Feature Selection: In the context of machine learning, pair plots help in identifying which variables have strong relationships, guiding feature selection for model building.


Exploratory Data Analysis (EDA): Pair plots are a staple in EDA, providing insights into the structure and characteristics of the data, which can inform subsequent analysis steps.

##### 2. What is/are the insight(s) found from the chart?

Pair plots are invaluable tools in exploratory data analysis (EDA), offering a comprehensive view of relationships within a dataset. By visualizing pairwise relationships between numerical variables, they facilitate the identification of patterns, correlations, and anomalies.
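
To run a correlation heatmap or pair plot on real columns, quantity fields that were read as text have to be converted first. The following sketch shows one way to do that; the `toy` frame, its numbers, and its text entry are hypothetical.

```python
import pandas as pd

# Hypothetical rows; the text entry mimics a non-numeric field
toy = pd.DataFrame({
    "Line Item Quantity": [100, 200, 150, 300, 250],
    "Weight (Kilograms)": ["5", "11", "not recorded", "14", "13"],
    "Freight Cost (USD)": ["60", "95", "80", "160", "120"],
})

# Convert every column to numbers; unparseable entries become NaN
numeric = toy.apply(pd.to_numeric, errors="coerce")

# Pearson correlation; rows with NaN are skipped pair by pair
corr = numeric.corr()
print(corr.round(2))
```

The resulting `corr` can be passed to `sns.heatmap`, and `numeric` to `sns.pairplot`, in the same way as the placeholder data.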

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

Define Clear Business Objectives
Action: Engage in discussions with the client to clearly define their business goals. Ensure these objectives are Specific, Measurable, Achievable, Relevant, and Time-bound (SMART). This clarity will guide the project's direction and purpose.

Example: If the client's goal is to increase market share, a SMART objective could be: "Achieve a 10% increase in market share within the next 12 months by expanding into two new geographic regions."

Align Project Objectives with Business Goals
Action: Develop project objectives that directly support the overarching business goals. This alignment ensures that the project's outcomes contribute to the client's strategic success.

Example: To support the market share expansion, a project objective might be: "Launch localized marketing campaigns in the identified regions within the next six months."

Establish Key Performance Indicators (KPIs)
Action: Define KPIs that will measure the project's success in achieving its objectives. These should be aligned with both project and business goals.

Example: KPIs for the marketing campaign could include metrics such as:

*   Number of new leads generated
*   Conversion rate from leads to customers
*   Customer acquisition cost

# **Conclusion**

This project aimed to analyze a comprehensive dataset encompassing various supply chain and product metrics, including shipment details, vendor information, and financial aspects. Through advanced data visualization techniques, such as pair plots and correlation heatmaps, we sought to uncover underlying patterns and relationships within the data.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***