# Adidas EDA and Sales Strategy

Sales data analysis is essential to strategic decision-making in a dynamic retail market. This project examines Adidas sales data to identify patterns, draw out insights, and point to areas for improvement. Using Python and its data analysis libraries, it presents a study of Adidas' US sales performance across 2020 and 2021.

# Data Description

* Retailer: The company or group that sells Adidas goods.
* Retailer ID: A unique number assigned to each retailer.
* Invoice Date: The date when the sales transaction occurred.
* Region: The geographic territory in which the retailer operates.
* State: The state within the region where the retailer is located.
* City: The city where the retailer is located.
* Product: The Adidas product category sold.
* Price per Unit: The price of a single Adidas product unit.
* Units Sold: The number of units sold in a given transaction.
* Total Sales: The revenue from the sale of Adidas goods in a given transaction.
* Operating Profit: The profit earned from the sale of Adidas goods in a given transaction.
* Operating Margin: The ratio of operating profit to total sales.
* Sales Method: The process used to carry out the sales transaction.
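
The relationship between the three money columns can be illustrated with a small sketch. The `toy` frame and its values are made up for illustration and are not taken from the dataset; only the column names match.

```python
import pandas as pd

# Hypothetical transactions; the values are invented for illustration only.
toy = pd.DataFrame({
    "Total Sales": [1000.0, 2500.0, 400.0],
    "Operating Profit": [300.0, 1000.0, 100.0],
})

# Operating Margin is the ratio of operating profit to total sales.
toy["Operating Margin"] = toy["Operating Profit"] / toy["Total Sales"]
print(toy)
```

A check like this could be run against the real columns after cleaning to confirm that the stored margin agrees with the ratio.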

In [None]:
# Importing libraries

import warnings

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go

warnings.filterwarnings('ignore')


In [None]:
# Reading the Excel file
df = pd.read_excel("/kaggle/input/adidas-sales-dataset/Adidas US Sales Datasets.xlsx")

In [None]:
df.head()

# Data Cleaning

Data cleaning entails finding and fixing errors or inconsistencies in a dataset. Accurate, clean data is necessary for trustworthy analysis and decision-making. Here, the Excel sheet has empty leading rows and an unnamed first column, so the real header row has to be promoted to column names before analysis.

In [None]:
df = df.drop(df.index[0:3]) # Removing the first 3 rows

In [None]:
df.drop("Unnamed: 0", axis = 1, inplace = True) # Dropping the first column

In [None]:
df.columns = df.iloc[0] # Using the first remaining row as the column names

In [None]:
df = df.drop(df.index[0]) # Dropping the row that held the column names

In [None]:
df = df.reset_index(drop=True) # Resetting the index

In [None]:
df.duplicated().sum() # Checking whether the dataset has duplicated rows

In [None]:
df.head()

The cleaned data does not have duplicated values and the index is reset.
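
The cleaning steps above can also be wrapped in a single helper. The following is a minimal sketch on a toy frame rather than the real workbook; `promote_header_row` and the sample rows are hypothetical names and values introduced only for illustration.

```python
import pandas as pd

def promote_header_row(raw: pd.DataFrame, header_row: int) -> pd.DataFrame:
    """Use the row at position `header_row` as column names and keep the rows below it."""
    cleaned = raw.iloc[header_row + 1:].copy()
    cleaned.columns = raw.iloc[header_row].tolist()
    # Drop columns whose header cell was empty (for example a blank leading column).
    keep = [pd.notna(name) for name in cleaned.columns]
    cleaned = cleaned.loc[:, keep]
    return cleaned.reset_index(drop=True)

# Toy stand-in for a sheet with two junk rows and a blank first column.
raw = pd.DataFrame([
    [None, "Adidas Sales Database", None],
    [None, None, None],
    [None, "Retailer", "Units Sold"],
    [None, "Foot Locker", 1200],
    [None, "West Gear", 900],
])
clean = promote_header_row(raw, header_row=2)
print(clean)
```

Keeping the steps in one function makes it harder to run them twice by accident, which matters in a notebook where cells can be re-executed out of order.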

In [None]:
df.info()

In [None]:
df['Invoice Date']=pd.to_datetime(df['Invoice Date']) #Changing datatype of Invoice Date to datetime

In [None]:
df.columns

In [None]:
# Changing the datatype of the numeric columns to float
num_cols = ['Price per Unit', 'Units Sold', 'Total Sales', 'Operating Profit', 'Operating Margin']
df[num_cols] = df[num_cols].astype("float")

In [None]:
df.info()

In [None]:
df['Year'] = df['Invoice Date'].dt.year
df['Month'] = df['Invoice Date'].dt.month
df['Day'] = df['Invoice Date'].dt.day

In [None]:
df.head()

# Data Visualization

The exploratory data analysis covers the following:
1. Top selling product category
2. Operating Profit by Retailer
3. Least and most sales - State and City wise
4. Price Per Product Distribution
5. Price per Unit vs Total Sales

# Top Selling Product Category

In [None]:
Top_prod = df.groupby('Product').agg({"Units Sold" : "sum"}).sort_values(by = "Units Sold", ascending = False).reset_index()
Top_prod

In [None]:
plt.figure(figsize=(8, 4))
sns.barplot(x=Top_prod["Product"], y=Top_prod["Units Sold"], palette = "deep")
plt.title('Top Selling Product Category ')
plt.xlabel('Products')
plt.ylabel('Units Sold')
plt.xticks(rotation = "vertical")
plt.show()

**Observations**

* The top selling category is Men's Street Footwear, followed by Men's Athletic Footwear.
* Men's Apparel holds last place.

# Operating Profit by Retailer

In [None]:
profit_by_retailer = df.groupby('Retailer').agg({'Operating Profit' : "sum"}).reset_index().sort_values(by='Operating Profit', ascending=False)
profit_by_retailer

In [None]:
plt.figure(figsize=(8, 4))
sns.barplot(x='Retailer', y='Operating Profit', data=profit_by_retailer,palette = "mako")
plt.title('Operating Profit by Retailer')
plt.xlabel('Retailer')
plt.ylabel('Operating Profit')
plt.xticks(rotation="vertical")
plt.show()

**Observations**

* West Gear is in first place with an operating profit of $85.67 million.
* Foot Locker ($80.72 million) and Sports Direct ($74.33 million) take second and third place and are in close competition with one another.

# Least and most sales - State and City wise

In [None]:
df['City_State'] = df['City'] + ', ' + df['State'] #Considering city names alone does not make sense, as some states have common city names.

In [None]:
top_states = df.groupby('State')['Total Sales'].sum().nlargest(5).reset_index()
bottom_states = df.groupby('State')['Total Sales'].sum().nsmallest(5).reset_index()

top_cities = df.groupby('City_State')['Total Sales'].sum().nlargest(5).reset_index()
bottom_cities = df.groupby('City_State')['Total Sales'].sum().nsmallest(5).reset_index()

fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(20, 12))

sns.barplot(x='State', y='Total Sales', data=top_states,palette='magma', ax=axes[0, 0])
axes[0, 0].set_title('Top 5 States')
axes[0, 0].tick_params(axis='x', rotation=90)

sns.barplot(x='State', y='Total Sales', data=bottom_states,palette='magma_r', ax=axes[0, 1])
axes[0, 1].set_title('Bottom 5 States')
axes[0, 1].tick_params(axis='x', rotation=90)

sns.barplot(x='City_State', y='Total Sales', data=top_cities,palette='crest', ax=axes[1, 0])
axes[1, 0].set_title('Top 5 Cities')
axes[1, 0].tick_params(axis='x', rotation=90) 

sns.barplot(x='City_State', y='Total Sales', data=bottom_cities, palette='crest_r' , ax=axes[1, 1])
axes[1, 1].set_title('Bottom 5 Cities')
axes[1, 1].tick_params(axis='x', rotation=90)

plt.show()

**Observations**

* With total sales of 64.22 million dollars, New York State is in first place.
* Florida and California are closely vying for second place.
* With 5.92 million dollars in sales, Nebraska is the lowest-selling state.
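
The reason for grouping on the combined `City_State` key rather than on `City` alone can be shown with a small sketch. The rows below are hypothetical and only illustrate what happens when two states share a city name.

```python
import pandas as pd

# Hypothetical rows: two different cities that share the name "Portland".
toy = pd.DataFrame({
    "City": ["Portland", "Portland", "Boston"],
    "State": ["Oregon", "Maine", "Massachusetts"],
    "Total Sales": [100.0, 40.0, 70.0],
})

# Grouping on the city name alone merges both Portlands into one bucket.
by_city = toy.groupby("City")["Total Sales"].sum()

# A combined key keeps them apart.
toy["City_State"] = toy["City"] + ", " + toy["State"]
by_city_state = toy.groupby("City_State")["Total Sales"].sum()

print(by_city)
print(by_city_state)
```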




# Price Per Product Distribution

In [None]:
plt.figure(figsize=(10, 6))
sns.histplot(df['Price per Unit'], bins = 20, kde = True, color='brown')
plt.title('Distribution of Prices per Unit')
plt.xlabel('Price per Unit')
plt.ylabel('Frequency')
plt.show()

**Observations**

* The price per unit distribution is roughly bell-shaped. It peaks at around 40 dollars, which makes that the most common price point.
* The majority of products are priced between 20 and 80 dollars.
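
A histogram alone does not establish normality, so a quick numeric check of the shape can help. The sketch below uses synthetic prices (an assumption, not the Adidas data) and reports skewness, excess kurtosis, and the most populated bin; the same three lines could be applied to `df['Price per Unit']`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic prices, NOT the Adidas data: a bell-shaped sample centred near 45.
prices = pd.Series(rng.normal(loc=45, scale=15, size=5000)).clip(lower=5)

skewness = prices.skew()          # close to 0 for a symmetric distribution
excess_kurtosis = prices.kurt()   # close to 0 for a normal distribution

# Most populated price bin, using the same number of bins as the histogram.
counts = pd.cut(prices, bins=20).value_counts()
modal_bin = counts.idxmax()

print(f"skewness={skewness:.2f}, excess kurtosis={excess_kurtosis:.2f}, modal bin={modal_bin}")
```

Values far from zero for either statistic would suggest describing the distribution as skewed or heavy-tailed rather than normal.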

# Price per Unit vs Total Sales

In [None]:
px.scatter(df, x='Price per Unit', y='Total Sales', title='Price per Unit vs Total Sales', color = "Total Sales")

**Observations**

* The scatter plot suggests a positive correlation between price per unit and total sales.
* Profit tends to increase along with price per unit.
* Men's Street Footwear remains the highest-selling category overall.
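
The strength of a relationship like this can be quantified with a Pearson correlation coefficient instead of being read off the scatter plot. This is a minimal sketch on hypothetical values; on the real data the same call would be `df['Price per Unit'].corr(df['Total Sales'])`.

```python
import pandas as pd

# Hypothetical transactions, for illustration only.
toy = pd.DataFrame({
    "Price per Unit": [20.0, 35.0, 50.0, 65.0, 80.0],
    "Units Sold": [300.0, 280.0, 310.0, 290.0, 320.0],
})
toy["Total Sales"] = toy["Price per Unit"] * toy["Units Sold"]

# Pearson correlation: +1 is a perfect increasing linear relation, 0 is none.
r = toy["Price per Unit"].corr(toy["Total Sales"])
print(f"Pearson r = {r:.3f}")
```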

# Monthly Total Sales Over Years

In [None]:
yearly_sales = df.groupby(['Year','Month'])['Total Sales'].sum().reset_index()
yearly_sales

In [None]:
px.line(yearly_sales, x='Month', y='Total Sales', color='Year',title='Monthly Total Sales Over Years', markers=True, template= "none")

In [None]:
df.groupby('Year')['Total Sales'].sum().reset_index()

**Observations**

Sales have shown significant growth from 2020 to 2021.

**2021**

* Total Sales 717.82 million dollars

**2020**

* Total Sales 182.08 million dollars

# Monthly Total Profit Over Years

In [None]:
yearly_profit = df.groupby(['Year','Month'])['Operating Profit'].sum().reset_index()
yearly_profit

In [None]:
px.line(yearly_profit, x='Month', y='Operating Profit', color='Year',title='Monthly Total Profit Over Years', markers=True, template= "simple_white")

In [None]:
pd.options.display.float_format = '{:.0f}'.format

df.groupby('Year').agg({"Operating Profit" : "sum"})

**Observations**

Profit has shown significant growth from 2020 to 2021.

**2021** 


* Total profit 268.75 million dollars

**2020**


* Total profit 63.37 million dollars
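
The size of the growth can be made explicit with a short calculation. The sketch below uses only the yearly totals reported in the observations; `pct_growth` is a helper name introduced here for illustration.

```python
# Yearly totals in millions of dollars, copied from the observations.
sales = {2020: 182.08, 2021: 717.82}
profit = {2020: 63.37, 2021: 268.75}

def pct_growth(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100

sales_growth = pct_growth(sales[2020], sales[2021])
profit_growth = pct_growth(profit[2020], profit[2021])

print(f"Sales growth:  {sales_growth:.1f}%")
print(f"Profit growth: {profit_growth:.1f}%")
```

With these figures, 2021 sales come out at a little under four times the 2020 level and 2021 profit at a little over four times.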

In [None]:
# Reload the raw workbook and repeat the cleaning in a compact form
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_excel('/kaggle/input/adidas-sales-dataset/Adidas US Sales Datasets.xlsx')
df.columns = df.iloc[3]                       # Row 3 holds the real column names
df = df.iloc[4:, 1:].reset_index(drop=True)   # Keep the data rows, drop the empty first column
df['Invoice Date'] = pd.to_datetime(df['Invoice Date'])
df['year'] = df['Invoice Date'].dt.year
df['month'] = df['Invoice Date'].dt.month
df['day'] = df['Invoice Date'].dt.day

In [None]:
df

In [None]:
df.describe(include='all')

In [None]:
retailer_counts = df['Retailer'].value_counts()
# Sort the retailer counts in descending order
retailer_counts = retailer_counts.sort_values(ascending=False)
# Plot the retailer counts as a bar chart
retailer_counts.plot(kind='bar')
# Show the plot
plt.show()

In [None]:
import plotly.express as px

# Group the data by retailer and sum the total sales for each retailer
retailer_sales = df.groupby('Retailer')['Total Sales'].sum()

# Calculate the total sales of all retailers
total_sales = retailer_sales.sum()

# Calculate the market share of each retailer by dividing their total sales by the total sales of all retailers
market_share = retailer_sales / total_sales

# Create a pie chart using plotly
fig = px.pie(market_share, values=market_share, names=market_share.index, title='Market Share of Retailers')

# Show the plot
fig.show()

# Pair Plot for Multiple Metrics

In [None]:
metrics = ['Total Sales', 'Units Sold', 'Price per Unit', 'Operating Profit', 'Operating Margin']
# Cast to float so that every metric is treated as numeric after the reload
sns.pairplot(df[metrics].astype(float), diag_kind='kde')
plt.suptitle('Pair Plot for Multiple Metrics', y=1.02)
plt.show()

**Product Category Analysis:**

**Top Selling Product Category:**

* Men's Street Footwear is the top-selling category, followed by Men's Athletic Footwear.

**Retailer Performance:**

* West Gear leads in operating profit, followed closely by Foot Locker and Sports Direct.

**State and City Sales Analysis:**

* New York state has the highest total sales, and New York City is the top-selling city.
* Nebraska is the least-selling state, and Omaha is the least-selling city.

**Monthly Sales and Profit Trends:**

* Sales and profit show significant growth from 2020 to 2021.

**Overall Insights:**

Strategic decisions and demand forecasting require knowledge of the best-performing product categories, regions, and retailers. A static forecasting method can be used here because demand for a product category is relatively stable, so historical patterns can be used to predict future demand. Static methods are simple and rely on historical data alone, without considering external factors that may influence demand. They suit cases where historical data is the only information available, where demand fluctuations are small enough that regular patterns can be captured without more complex models, and where short-term planning is the main priority. They need no in-depth analysis and give quick estimates of demand.

After forecasting, inventory modelling techniques can be applied to each product category to derive values such as the reorder point, economic order quantity, number of orders per year, safety stock, base stock level, and reorder level.
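
As an illustration of the forecasting step, one simple reading of a static method is a constant level equal to the historical mean. That reading is an assumption made here, not a method stated by the analysis, and the demand history below is hypothetical.

```python
import pandas as pd

# Hypothetical monthly units sold for one product category (not real data).
history = pd.Series(
    [520, 480, 510, 495, 505, 490],
    index=pd.period_range("2021-01", periods=6, freq="M"),
    name="Units Sold",
)

# Static level forecast: the historical mean is used for every future month.
level = history.mean()
future = pd.period_range(history.index[-1] + 1, periods=3, freq="M")
forecast = pd.Series(level, index=future, name="Forecast")

# Mean absolute deviation of past demand from the level, a rough error gauge.
mad = (history - level).abs().mean()

print(forecast)
print(f"MAD = {mad:.2f}")
```

If the deviation is large relative to the level, or the history shows a clear trend or seasonality, a constant level is probably too crude and the static model would need trend and seasonal terms.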

**Assumptions**

* Lead time is 14 days.
* Order cost is 150 dollars per order.
* Holding cost per unit is assumed to be 20 percent of the product cost. Taking the product cost as the average product cost gives a holding cost of 9.04 dollars.
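
The assumptions above can be plugged into the standard economic order quantity and reorder point formulas. The following is a sketch, not the analysis' own computation: the annual demand, daily demand variability, and service factor `z` are hypothetical inputs, and the holding cost is assumed to be per unit per year.

```python
import math

# From the assumptions above.
LEAD_TIME_DAYS = 14
ORDER_COST = 150.0     # dollars per order
HOLDING_COST = 9.04    # dollars per unit, assumed here to be per year

# Hypothetical inputs, not taken from the data.
annual_demand = 12_000     # units per year
daily_demand_std = 10.0    # units per day
z = 1.65                   # roughly a 95% cycle service level

daily_demand = annual_demand / 365

# Economic order quantity: sqrt(2 * D * S / H)
eoq = math.sqrt(2 * annual_demand * ORDER_COST / HOLDING_COST)
orders_per_year = annual_demand / eoq

# Safety stock covers demand variability over the lead time.
safety_stock = z * daily_demand_std * math.sqrt(LEAD_TIME_DAYS)

# Reorder point: expected lead-time demand plus safety stock.
reorder_point = daily_demand * LEAD_TIME_DAYS + safety_stock

print(f"EOQ: {eoq:.0f} units, orders per year: {orders_per_year:.1f}")
print(f"Safety stock: {safety_stock:.0f} units, reorder point: {reorder_point:.0f} units")
```

In practice the demand inputs would come from the forecast for each product category, so the calculation would be repeated per category.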







**Notes**

* The price per unit and overall sales have a positive correlation. When overall sales rise, the number of units sold often follows suit.
* The price per unit varies between $7 and $110.
* Most unit prices fall in the range of $35 to $55.
* Profit often rises in tandem with sales.
* The profit margin usually ranges from 25% to 75%.
