## Data Description

### Key Data Features
`Date` : Daily records from [start_date] to [end_date].
    
`Store ID & Product ID`: Unique identifiers for stores and products.

`Category`: Product category: Groceries, Toys, Electronics, Furniture, or Clothing.

`Region`: Geographic region of the store.

`Inventory Level`: Stock available at the beginning of the day.
    
`Units Sold`: Units sold during the day.
    
`Demand Forecast`: Predicted demand based on past trends.

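`Weather Condition`: Daily weather: Rainy, Sunny, Cloudy, or Snowy.

`Holiday/Promotion`: Flag for a holiday or promotion on that day (0.0 or 1.0 in the sample rows).

The dataset also contains `Units Ordered`, `Price`, `Discount`, `Competitor Pricing`, and `Seasonality`, which appear in the outputs below.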
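The `[start_date]` and `[end_date]` placeholders can be read off the data once `Date` is parsed. A minimal sketch, using a few hypothetical date strings in the same form as `1/1/2022` from the raw file and assuming month-first order (in the notebook the input would be `df['Date']`):

```python
import pandas as pd

# Hypothetical sample; in the notebook this would be df['Date']
raw_dates = pd.Series(["1/1/2022", "1/2/2022", "12/31/2023", "1/1/2024"])

# Assumes month-first dates such as 1/1/2022
dates = pd.to_datetime(raw_dates, format="%m/%d/%Y")

start_date, end_date = dates.min(), dates.max()
n_days = (end_date - start_date).days + 1  # inclusive count of calendar days

print(f"Daily records from {start_date:%Y-%m-%d} to {end_date:%Y-%m-%d} ({n_days} days)")
```

If the file were day-first instead, `format="%d/%m/%Y"` would be the matching choice; the two cannot be told apart from `1/1/2022` alone.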

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
```

```python
df = pd.read_csv('retail_store_inventory.csv')
```
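The `Date` column can also be parsed while reading, instead of converting it afterwards. A minimal sketch, using the first three rows from the `df.head(10)` output as an inline CSV (the index column is omitted) so it runs without the file:

```python
import io

import pandas as pd

# First three rows of the file, copied from the df.head(10) output (index omitted)
sample_csv = """Date,Store ID,Product ID,Category,Region,Inventory Level,Units Sold,Units Ordered,Demand Forecast,Price,Discount,Weather Condition,Holiday/Promotion,Competitor Pricing,Seasonality
1/1/2022,S001,P0001,Groceries,North,231,127,55.0,135.47,33.5,20,Rainy,0.0,29.69,Autumn
1/1/2022,S001,P0002,Toys,South,204,150,66.0,144.04,63.01,20,Sunny,0.0,66.16,Autumn
1/1/2022,S001,P0003,Toys,West,102,65,51.0,74.02,27.99,10,,1.0,31.32,Summer
"""

# parse_dates converts Date to datetime64 during the read;
# the empty Weather Condition field is read as NaN
sample = pd.read_csv(io.StringIO(sample_csv), parse_dates=["Date"])

print(sample.dtypes)
```

On the real file the call would be `pd.read_csv('retail_store_inventory.csv', parse_dates=['Date'])`.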

```python
df.head(10)
```

| | Date | Store ID | Product ID | Category | Region | Inventory Level | Units Sold | Units Ordered | Demand Forecast | Price | Discount | Weather Condition | Holiday/Promotion | Competitor Pricing | Seasonality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1/1/2022 | S001 | P0001 | Groceries | North | 231 | 127 | 55.0 | 135.47 | 33.5 | 20 | Rainy | 0.0 | 29.69 | Autumn |
| 1 | 1/1/2022 | S001 | P0002 | Toys | South | 204 | 150 | 66.0 | 144.04 | 63.01 | 20 | Sunny | 0.0 | 66.16 | Autumn |
| 2 | 1/1/2022 | S001 | P0003 | Toys | West | 102 | 65 | 51.0 | 74.02 | 27.99 | 10 | NaN | 1.0 | 31.32 | Summer |
| 3 | 1/1/2022 | S001 | P0004 | Toys | North | 469 | 61 | NaN | 62.18 | 32.72 | 10 | Cloudy | 1.0 | 34.74 | Autumn |
| 4 | 1/1/2022 | S001 | P0005 | Electronics | East | 166 | 14 | 135.0 | 9.26 | 73.64 | 0 | Sunny | 0.0 | 68.95 | Summer |
| 5 | 1/1/2022 | S001 | P0006 | Groceries | South | 138 | 128 | 102.0 | 139.82 | 76.83 | 10 | Sunny | 1.0 | 79.35 | Winter |
| 6 | 1/1/2022 | S001 | P0007 | Furniture | East | 359 | 97 | 167.0 | 108.92 | 34.16 | 10 | Rainy | 1.0 | 36.55 | Winter |
| 7 | 1/1/2022 | S001 | P0008 | Clothing | North | 380 | 312 | 54.0 | 329.73 | 97.99 | 5 | NaN | 0.0 | 100.09 | Spring |
| 8 | 1/1/2022 | S001 | P0009 | Electronics | West | 183 | 175 | 135.0 | 174.15 | 20.74 | 10 | Cloudy | 0.0 | 17.66 | Autumn |
| 9 | 1/1/2022 | S001 | P0010 | Toys | South | 108 | 28 | 196.0 | 24.47 | 59.99 | 0 | Rainy | 1.0 | 61.21 | Winter |


```python
df.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 73100 entries, 0 to 73099
Data columns (total 15 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   Date                73100 non-null  object
 1   Store ID            73100 non-null  object
 2   Product ID          73100 non-null  object
 3   Category            73100 non-null  object
 4   Region              73079 non-null  object
 5   Inventory Level     73100 non-null  int64
 6   Units Sold          73100 non-null  int64
 7   Units Ordered       73069 non-null  float64
 8   Demand Forecast     73100 non-null  float64
 9   Price               73100 non-null  float64
 10  Discount            73100 non-null  int64
 11  Weather Condition   73073 non-null  object
 12  Holiday/Promotion   73094 non-null  float64
 13  Competitor Pricing  73100 non-null  float64
 14  Seasonality         73093 non-null  object
dtypes: float64(5), int64(3), object(7)
memory usage: 8.4+ MB
```

## Counting missing values per column

```python
df.isnull().sum()
```

```text
Date                   0
Store ID               0
Product ID             0
Category               0
Region                21
Inventory Level        0
Units Sold             0
Units Ordered         31
Demand Forecast        0
Price                  0
Discount               0
Weather Condition     27
Holiday/Promotion      6
Competitor Pricing     0
Seasonality            7
dtype: int64
```
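These counts are small: the largest, 31 in `Units Ordered`, is about 0.04% of the 73,100 rows, and at most 92 rows (about 0.13%) are affected in total. A minimal sketch for reporting the gaps as percentages and, as one possible strategy rather than the one this analysis has settled on, filling numeric gaps with the column median and text gaps with the most frequent value. `missing_report` and `fill_missing` are hypothetical helper names, and the toy frame stands in for `df`:

```python
import numpy as np
import pandas as pd


def missing_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values, for columns that have any."""
    counts = frame.isnull().sum()
    report = pd.DataFrame({"missing": counts, "percent": 100 * counts / len(frame)})
    return report[report["missing"] > 0].sort_values("missing", ascending=False)


def fill_missing(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with numeric NaNs set to the column median and
    non-numeric NaNs set to the column mode (one possible strategy)."""
    filled = frame.copy()
    for col in [c for c in filled.columns if filled[c].isnull().any()]:
        if pd.api.types.is_numeric_dtype(filled[col]):
            filled[col] = filled[col].fillna(filled[col].median())
        else:
            # mode() ignores NaN; take the first value if there are ties
            filled[col] = filled[col].fillna(filled[col].mode().iloc[0])
    return filled


# Small hypothetical frame mimicking the affected columns
toy = pd.DataFrame({
    "Region": ["North", "South", np.nan, "North"],
    "Units Ordered": [55.0, np.nan, 51.0, 135.0],
    "Weather Condition": ["Rainy", "Sunny", "Rainy", np.nan],
})
print(missing_report(toy))
print(fill_missing(toy))
```

With so few affected rows, simply dropping them with `df.dropna()` would be an equally defensible choice.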

## Inspecting the data type of each column

```python
df.dtypes
```

```text
Date                   object
Store ID               object
Product ID             object
Category               object
Region                 object
Inventory Level         int64
Units Sold              int64
Units Ordered         float64
Demand Forecast       float64
Price                 float64
Discount                int64
Weather Condition      object
Holiday/Promotion     float64
Competitor Pricing    float64
Seasonality            object
dtype: object
```
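Several `object` columns (`Category`, `Region`, `Weather Condition`, `Seasonality`) hold only a handful of distinct strings. Converting them to pandas' `category` dtype is a common way to reduce memory and make the set of allowed values explicit. The sketch below treats this as an optional step and uses a synthetic column rather than `df`:

```python
import pandas as pd

# Hypothetical column: 10,000 rows built from the five categories seen in the data
labels = ["Groceries", "Toys", "Electronics", "Furniture", "Clothing"]
as_strings = pd.Series(labels * 2000, name="Category", dtype=object)
as_category = as_strings.astype("category")

# deep=True counts the Python string objects, not just the pointers to them
object_bytes = as_strings.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(as_category.dtype)                 # category
print(list(as_category.cat.categories))  # distinct labels, sorted
print(object_bytes, category_bytes)
```

On the notebook's data the equivalent would be `df[cols] = df[cols].astype('category')` for a chosen list `cols`; missing values stay `NaN` and are not counted as a category.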

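#### Summary statistics for key sales columns

The function below computes the statistics reported by `describe()` (count, mean, standard deviation, minimum, quartiles, and maximum) for the main numeric columns related to sales performance. It keeps only the columns that exist in `df`, so a missing column is skipped instead of raising a `KeyError`. Counts exclude missing values, which is why `Units Ordered` shows 73,069 rather than 73,100.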

```python
def summary_statistics(df):
    cols = ["Units Sold", "Units Ordered", "Demand Forecast", "Price", "Discount"]
    selected_cols = [col for col in cols if col in df.columns]  # ensure columns exist
    return df[selected_cols].describe()

summary_statistics(df)
```

| | Units Sold | Units Ordered | Demand Forecast | Price | Discount |
|---|---|---|---|---|---|
| count | 73100.0 | 73069.0 | 73100.0 | 73100.0 | 73100.0 |
| mean | 136.46487 | 109.999589 | 141.49472 | 55.135108 | 10.009508 |
| std | 108.919406 | 52.276573 | 109.254076 | 26.021945 | 7.083746 |
| min | 0.0 | 20.0 | -9.99 | 10.0 | 0.0 |
| 25% | 49.0 | 65.0 | 53.67 | 32.65 | 5.0 |
| 50% | 107.0 | 110.0 | 113.015 | 55.05 | 10.0 |
| 75% | 203.0 | 155.0 | 208.0525 | 77.86 | 15.0 |
| max | 499.0 | 200.0 | 518.55 | 100.0 | 20.0 |
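One value in this table stands out: `Demand Forecast` has a minimum of -9.99, which cannot be a real quantity of demand. A minimal sketch for finding such rows and, if that is judged appropriate, clipping them at zero; `check_negative` is a hypothetical helper and the frame is synthetic:

```python
import pandas as pd


def check_negative(frame: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return the rows where `column` is below zero."""
    return frame[frame[column] < 0]


# Hypothetical rows; the last one mimics the -9.99 minimum
toy = pd.DataFrame({
    "Units Sold": [127, 14, 0],
    "Demand Forecast": [135.47, 9.26, -9.99],
})

negatives = check_negative(toy, "Demand Forecast")
print(len(negatives), "row(s) with a negative forecast")

# One option: clip at zero instead of dropping the rows
toy["Demand Forecast"] = toy["Demand Forecast"].clip(lower=0)
```

Whether to clip, drop, or keep these rows depends on how the forecast was produced, which the data description does not say.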


## Checking the unique values of categorical columns

```python
df['Category'].unique()
```

```text
array(['Groceries', 'Toys', 'Electronics', 'Furniture', 'Clothing'],
      dtype=object)
```

```python
df['Region'].unique()
```

```text
array(['North', 'South', 'West', 'East', nan], dtype=object)
```

```python
df['Weather Condition'].unique()
```

```text
array(['Rainy', 'Sunny', nan, 'Cloudy', 'Snowy'], dtype=object)
```

```python
df['Store ID'].unique()
```

```text
array(['S001', 'S002', 'S003', 'S004', 'S005'], dtype=object)
```

```python
df['Product ID'].unique()
```

```text
array(['P0001', 'P0002', 'P0003', 'P0004', 'P0005', 'P0006', 'P0007',
       'P0008', 'P0009', 'P0010', 'P0011', 'P0012', 'P0013', 'P0014',
       'P0015', 'P0016', 'P0017', 'P0018', 'P0019', 'P0020'], dtype=object)
```
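Rather than calling `unique()` one column at a time, a loop over the non-numeric columns can report the number of distinct values and their frequencies in one pass. `value_counts(dropna=False)` keeps `NaN` as its own entry, so the gaps counted earlier stay visible. A sketch on a small synthetic frame standing in for `df`:

```python
import numpy as np
import pandas as pd

# Hypothetical rows standing in for df
toy = pd.DataFrame({
    "Region": ["North", "South", "North", np.nan],
    "Weather Condition": ["Rainy", "Sunny", np.nan, "Cloudy"],
    "Units Sold": [127, 150, 65, 61],
})

summary = {}
for col in toy.select_dtypes(exclude="number").columns:
    counts = toy[col].value_counts(dropna=False)  # NaN kept as its own entry
    summary[col] = counts
    # nunique() ignores NaN by default
    print(f"{col}: {toy[col].nunique()} distinct values")
    print(counts, end="\n\n")
```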

## Converting the ID and date columns

The `Store ID` and `Product ID` columns prefix each numeric code with a letter (`S` or `P`). To make them numeric for modeling and analysis, we remove the prefix and cast the remaining digits to float. We also convert the `Date` column to the datetime type.

```python
# Convert Date column to datetime
df['Date'] = pd.to_datetime(df['Date'])

# Remove 'P' from Product ID and convert to float
df['Product ID'] = df['Product ID'].astype(str).str.replace('P', '', regex=False).astype(float)

# Remove the 'S' from Store ID and convert to float
df['Store ID'] = df['Store ID'].astype(str).str.replace('S', '', regex=False).astype(float)

# Check data types
print(df.dtypes)
```

```text
Date                  datetime64[ns]
Store ID                     float64
Product ID                   float64
Category                      object
Region                        object
Inventory Level                int64
Units Sold                     int64
Units Ordered                float64
Demand Forecast              float64
Price                        float64
Discount                       int64
Weather Condition             object
Holiday/Promotion            float64
Competitor Pricing           float64
Seasonality                   object
dtype: object
```
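`str.replace('P', '')` removes every `P` in the string, not only the leading one. That is harmless for codes like `P0001`, but `Series.str.removeprefix` (available in pandas 1.4 and later) states the intent more precisely. Because the IDs are labels rather than measurements, an integer dtype may also fit better than float. A sketch of that variant on hypothetical values, not a change to the conversion used above:

```python
import pandas as pd

# Hypothetical IDs in the same format as the dataset
ids = pd.DataFrame({
    "Store ID": ["S001", "S002", "S005"],
    "Product ID": ["P0001", "P0010", "P0020"],
})

# Strip only the leading letter, then store the codes as integers
ids["Store ID"] = ids["Store ID"].str.removeprefix("S").astype("int64")
ids["Product ID"] = ids["Product ID"].str.removeprefix("P").astype("int64")

print(ids.dtypes)
print(ids)
```

If the IDs are only used for grouping and joining, keeping them as strings or converting them to `category` avoids implying any numeric order.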
