### Sales Performance Analysis with Walmart Data

Focus:
* Exploring the dataset using pandas functions
* Practicing groupby(), merge(), join(), and concat()

Objective:
* Analyze sales performance across stores and departments: use groupby() to aggregate sales and surface trends, then combine the datasets with merge() and concat().

Skills Practiced:
* Aggregation with groupby() (e.g. total sales by store or department)
* Merging different datasets (e.g. sales with features)
* Concatenating data (e.g. appending data from different weeks)
* Basic EDA (describe(), value_counts(), filtering)

In [1]:
import pandas as pd

In [None]:
train = pd.read_csv("data/train.csv")
train.head()


In [None]:
features = pd.read_csv("data/features.csv")
features.head()

Unnamed: 0,Store,Date,Temperature,Fuel_Price,MarkDown1,MarkDown2,MarkDown3,MarkDown4,MarkDown5,CPI,Unemployment,IsHoliday
0,1,2010-02-05,42.31,2.572,,,,,,211.096358,8.106,False
1,1,2010-02-12,38.51,2.548,,,,,,211.24217,8.106,True
2,1,2010-02-19,39.93,2.514,,,,,,211.289143,8.106,False
3,1,2010-02-26,46.63,2.561,,,,,,211.319643,8.106,False
4,1,2010-03-05,46.5,2.625,,,,,,211.350143,8.106,False


In [None]:
stores = pd.read_csv("data/stores.csv")
stores.head()

Unnamed: 0,Store,Type,Size
0,1,A,151315
1,2,A,202307
2,3,B,37392
3,4,A,205863
4,5,B,34875


In [None]:
train.describe()

Unnamed: 0,Store,Dept,Weekly_Sales
count,421570.0,421570.0,421570.0
mean,22.200546,44.260317,15981.258123
std,12.785297,30.492054,22711.183519
min,1.0,1.0,-4988.94
25%,11.0,18.0,2079.65
50%,22.0,37.0,7612.03
75%,33.0,74.0,20205.8525
max,45.0,99.0,693099.36
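
The summary above reports a minimum Weekly_Sales of -4988.94, so some rows are negative. The skills list mentions filtering, and a minimal sketch of a boolean-mask filter is shown here. The frame `toy_train` and its values are made up for illustration; only the column names come from the dataset.

```python
import pandas as pd

# Hypothetical toy frame using train's column names; the values are invented.
toy_train = pd.DataFrame({
    "Store": [1, 1, 2, 2],
    "Dept": [1, 2, 1, 2],
    "Weekly_Sales": [24924.50, -120.00, 13740.12, 0.00],
})

# Boolean mask: True where the weekly figure is below zero.
negative_mask = toy_train["Weekly_Sales"] < 0

# Keep only the negative rows for inspection.
negative_rows = toy_train[negative_mask]

# Count and share of negative rows (the mean of a boolean mask is a proportion).
n_negative = int(negative_mask.sum())
share_negative = negative_mask.mean()

print(negative_rows)
print(n_negative, share_negative)
```

Whether negative weeks should be kept, dropped, or treated separately depends on what they represent, which the data shown here does not say.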


In [None]:
stores['Type'].value_counts()

Type
A    22
B    17
C     6
Name: count, dtype: int64
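
value_counts() gives the number of stores per type. A possible next step is to look at how large each type is. The sketch below rebuilds only the five rows shown by stores.head() (so the numbers describe those five rows, not all 45 stores) and combines a count with the mean Size per Type.

```python
import pandas as pd

# Only the five stores displayed by stores.head(); not the full table.
toy_stores = pd.DataFrame({
    "Store": [1, 2, 3, 4, 5],
    "Type": ["A", "A", "B", "A", "B"],
    "Size": [151315, 202307, 37392, 205863, 34875],
})

# Count and mean Size for each store type in one pass.
type_summary = toy_stores.groupby("Type")["Size"].agg(["count", "mean"])

# Proportion of stores per type instead of raw counts.
type_share = toy_stores["Type"].value_counts(normalize=True)

print(type_summary)
print(type_share)
```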

In [None]:
train.isnull().sum()

Store           0
Dept            0
Date            0
Weekly_Sales    0
IsHoliday       0
dtype: int64

In [None]:
features.isnull().sum()

Store              0
Date               0
Temperature        0
Fuel_Price         0
MarkDown1       4158
MarkDown2       5269
MarkDown3       4577
MarkDown4       4726
MarkDown5       4140
CPI              585
Unemployment     585
IsHoliday          0
dtype: int64
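
The MarkDown columns, CPI, and Unemployment all contain missing values. One way to handle them is sketched below on a toy frame: it reports the share of missing values per column, then fills only the MarkDown columns with 0. Treating a missing markdown as "no markdown" is an assumption made for this example, not something the dataset confirms. The values in `toy_features` are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame using column names from features; values are illustrative.
toy_features = pd.DataFrame({
    "Store": [1, 1, 1, 1],
    "MarkDown1": [np.nan, np.nan, 4842.29, 9090.48],
    "CPI": [211.096358, 211.24217, np.nan, 211.319643],
})

# Share of missing values per column (mean of the boolean null mask).
missing_share = toy_features.isnull().mean()

# Select the MarkDown columns by name prefix.
markdown_cols = [c for c in toy_features.columns if c.startswith("MarkDown")]

# ASSUMPTION: a missing markdown means no markdown, so fill with 0.
# CPI is left untouched because 0 would not be a meaningful value there.
filled = toy_features.copy()
filled[markdown_cols] = filled[markdown_cols].fillna(0)

print(missing_share)
print(filled)
```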

In [None]:
stores.columns

Index(['Store', 'Type', 'Size'], dtype='object')

In [None]:
features.columns

Index(['Store', 'Date', 'Temperature', 'Fuel_Price', 'MarkDown1', 'MarkDown2',
       'MarkDown3', 'MarkDown4', 'MarkDown5', 'CPI', 'Unemployment',
       'IsHoliday'],
      dtype='object')

In [None]:
train.columns

Index(['Store', 'Dept', 'Date', 'Weekly_Sales', 'IsHoliday'], dtype='object')

In [None]:
train.groupby('Store')['Weekly_Sales'].sum()

Store
1     2.224028e+08
2     2.753824e+08
3     5.758674e+07
4     2.995440e+08
5     4.547569e+07
6     2.237561e+08
7     8.159828e+07
8     1.299512e+08
9     7.778922e+07
10    2.716177e+08
11    1.939628e+08
12    1.442872e+08
13    2.865177e+08
14    2.889999e+08
15    8.913368e+07
16    7.425243e+07
17    1.277821e+08
18    1.551147e+08
19    2.066349e+08
20    3.013978e+08
21    1.081179e+08
22    1.470756e+08
23    1.987506e+08
24    1.940160e+08
25    1.010612e+08
26    1.434164e+08
27    2.538559e+08
28    1.892637e+08
29    7.714155e+07
30    6.271689e+07
31    1.996139e+08
32    1.668192e+08
33    3.716022e+07
34    1.382498e+08
35    1.315207e+08
36    5.341221e+07
37    7.420274e+07
38    5.515963e+07
39    2.074455e+08
40    1.378703e+08
41    1.813419e+08
42    7.956575e+07
43    9.056544e+07
44    4.329309e+07
45    1.123953e+08
Name: Weekly_Sales, dtype: float64
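
The per-store totals are listed in store order, which makes the best and worst stores hard to spot. A minimal sketch on made-up data: compute several aggregates at once with agg() and sort by the total. `toy_train` and its numbers are hypothetical.

```python
import pandas as pd

# Hypothetical weekly rows; only the column names match train.
toy_train = pd.DataFrame({
    "Store": [1, 1, 2, 2, 3],
    "Weekly_Sales": [100.0, 200.0, 300.0, 400.0, 50.0],
})

# Total, average, and number of rows per store, ranked by total sales.
store_summary = (
    toy_train.groupby("Store")["Weekly_Sales"]
    .agg(["sum", "mean", "count"])
    .sort_values("sum", ascending=False)
)

# The first index label is the store with the highest total.
top_store = store_summary.index[0]

print(store_summary)
print(top_store)
```

On the real data the same pattern would be `train.groupby('Store')['Weekly_Sales'].agg([...])`.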

In [None]:
train.groupby('Dept')['Weekly_Sales'].sum()

Dept
1     1.236388e+08
2     2.806112e+08
3     7.589245e+07
4     1.671467e+08
5     1.356074e+08
          ...     
95    4.493202e+08
96    7.383392e+07
97    8.949651e+07
98    3.982892e+07
99    3.581498e+05
Name: Weekly_Sales, Length: 81, dtype: float64
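
With 81 departments the output is truncated, so it can help to pull out the largest ones and to break sales down by store and department together. The following sketch uses invented numbers; `toy_train` is hypothetical.

```python
import pandas as pd

# Hypothetical rows; only the column names match train.
toy_train = pd.DataFrame({
    "Store": [1, 1, 1, 2, 2],
    "Dept": [1, 2, 95, 1, 95],
    "Weekly_Sales": [10.0, 20.0, 50.0, 15.0, 60.0],
})

# Total sales per department, then the two largest departments.
dept_totals = toy_train.groupby("Dept")["Weekly_Sales"].sum()
top_depts = dept_totals.nlargest(2)

# Group by two keys and pivot Dept into columns; combinations that never
# occur (here Store 2, Dept 2) are filled with 0.
store_by_dept = (
    toy_train.groupby(["Store", "Dept"])["Weekly_Sales"]
    .sum()
    .unstack(fill_value=0)
)

print(top_depts)
print(store_by_dept)
```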

In [None]:
train.dtypes

Store             int64
Dept              int64
Date             object
Weekly_Sales    float64
IsHoliday          bool
dtype: object

In [None]:
features.dtypes

Store             int64
Date             object
Temperature     float64
Fuel_Price      float64
MarkDown1       float64
MarkDown2       float64
MarkDown3       float64
MarkDown4       float64
MarkDown5       float64
CPI             float64
Unemployment    float64
IsHoliday          bool
dtype: object

In [None]:
# Parse the Date strings into datetime64 so both frames share a proper date type.
train['Date'] = pd.to_datetime(train['Date'])
features['Date'] = pd.to_datetime(features['Date'])

In [None]:
train.dtypes

Store                    int64
Dept                     int64
Date            datetime64[ns]
Weekly_Sales           float64
IsHoliday                 bool
dtype: object

In [None]:
features.dtypes

Store                    int64
Date            datetime64[ns]
Temperature            float64
Fuel_Price             float64
MarkDown1              float64
MarkDown2              float64
MarkDown3              float64
MarkDown4              float64
MarkDown5              float64
CPI                    float64
Unemployment           float64
IsHoliday                 bool
dtype: object
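
Once Date is a datetime64 column, the `.dt` accessor becomes available, which makes time-based grouping possible. A small sketch, assuming made-up sales figures (the dates are taken from the rows shown earlier):

```python
import pandas as pd

# Hypothetical sales values; the dates appear in the outputs above.
toy = pd.DataFrame({
    "Date": ["2010-02-05", "2010-02-12", "2010-03-05"],
    "Weekly_Sales": [100.0, 150.0, 80.0],
})

# Same conversion as in the notebook.
toy["Date"] = pd.to_datetime(toy["Date"])

# Group weekly rows into calendar months via a monthly Period.
monthly = toy.groupby(toy["Date"].dt.to_period("M"))["Weekly_Sales"].sum()

# Month labels as plain strings, e.g. "2010-02".
month_labels = [str(p) for p in monthly.index]

print(monthly)
```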

In [None]:
# Left merge: keep every row of features and attach each store's Type and Size.
f_left = features.merge(stores, how='left', on='Store')
f_left

Unnamed: 0,Store,Date,Temperature,Fuel_Price,MarkDown1,MarkDown2,MarkDown3,MarkDown4,MarkDown5,CPI,Unemployment,IsHoliday,Type,Size
0,1,2010-02-05,42.31,2.572,,,,,,211.096358,8.106,False,A,151315
1,1,2010-02-12,38.51,2.548,,,,,,211.242170,8.106,True,A,151315
2,1,2010-02-19,39.93,2.514,,,,,,211.289143,8.106,False,A,151315
3,1,2010-02-26,46.63,2.561,,,,,,211.319643,8.106,False,A,151315
4,1,2010-03-05,46.50,2.625,,,,,,211.350143,8.106,False,A,151315
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8185,45,2013-06-28,76.05,3.639,4842.29,975.03,3.00,2449.97,3169.69,,,False,B,118221
8186,45,2013-07-05,77.50,3.614,9090.48,2268.58,582.74,5797.47,1514.93,,,False,B,118221
8187,45,2013-07-12,79.37,3.614,3789.94,1827.31,85.72,744.84,2150.36,,,False,B,118221
8188,45,2013-07-19,82.84,3.737,2961.49,1047.07,204.19,363.00,1059.46,,,False,B,118221
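
The skills list also mentions merging sales with features. Since train and features both carry Store, Date, and IsHoliday, one option is to join on all three so IsHoliday is not duplicated with `_x`/`_y` suffixes. The sketch below uses toy frames with invented sales values; `validate="many_to_one"` is an optional safety check that each key appears at most once on the features side.

```python
import pandas as pd

# Hypothetical sales rows: several departments per store and week.
toy_train = pd.DataFrame({
    "Store": [1, 1, 1],
    "Dept": [1, 2, 1],
    "Date": pd.to_datetime(["2010-02-05", "2010-02-05", "2010-02-12"]),
    "Weekly_Sales": [100.0, 50.0, 120.0],
    "IsHoliday": [False, False, True],
})

# One features row per store and week.
toy_features = pd.DataFrame({
    "Store": [1, 1],
    "Date": pd.to_datetime(["2010-02-05", "2010-02-12"]),
    "Temperature": [42.31, 38.51],
    "IsHoliday": [False, True],
})

# Left merge keeps every sales row and looks up the matching features row.
sales_with_features = toy_train.merge(
    toy_features,
    how="left",
    on=["Store", "Date", "IsHoliday"],
    validate="many_to_one",  # raises if features has duplicate keys
)

print(sales_with_features)
```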


In [None]:
# Right merge: keep every row of stores and attach the matching features rows.
f_right = features.merge(stores, how='right', on='Store')
f_right

Unnamed: 0,Store,Date,Temperature,Fuel_Price,MarkDown1,MarkDown2,MarkDown3,MarkDown4,MarkDown5,CPI,Unemployment,IsHoliday,Type,Size
0,1,2010-02-05,42.31,2.572,,,,,,211.096358,8.106,False,A,151315
1,1,2010-02-12,38.51,2.548,,,,,,211.242170,8.106,True,A,151315
2,1,2010-02-19,39.93,2.514,,,,,,211.289143,8.106,False,A,151315
3,1,2010-02-26,46.63,2.561,,,,,,211.319643,8.106,False,A,151315
4,1,2010-03-05,46.50,2.625,,,,,,211.350143,8.106,False,A,151315
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8185,45,2013-06-28,76.05,3.639,4842.29,975.03,3.00,2449.97,3169.69,,,False,B,118221
8186,45,2013-07-05,77.50,3.614,9090.48,2268.58,582.74,5797.47,1514.93,,,False,B,118221
8187,45,2013-07-12,79.37,3.614,3789.94,1827.31,85.72,744.84,2150.36,,,False,B,118221
8188,45,2013-07-19,82.84,3.737,2961.49,1047.07,204.19,363.00,1059.46,,,False,B,118221
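
The left and right merges display the same rows, which is what one would expect when every Store value appears in both tables. The toy example below shows where the two would differ, and uses `indicator=True` on an outer merge to see which side each row came from. The frames and the extra store 3 are hypothetical.

```python
import pandas as pd

# Hypothetical frames: store 3 exists in toy_stores but has no features rows.
toy_features = pd.DataFrame({
    "Store": [1, 1, 2],
    "Fuel_Price": [2.572, 2.548, 2.6],
})
toy_stores = pd.DataFrame({
    "Store": [1, 2, 3],
    "Type": ["A", "A", "B"],
})

# Left keeps all features rows; store 3 is absent from the result.
left = toy_features.merge(toy_stores, how="left", on="Store")

# Right keeps all stores; store 3 appears with NaN in the features columns.
right = toy_features.merge(toy_stores, how="right", on="Store")

# Outer merge with a _merge column naming the source of each row.
outer = toy_features.merge(toy_stores, how="outer", on="Store", indicator=True)
n_right_only = int((outer["_merge"] == "right_only").sum())

print(left)
print(right)
print(outer["_merge"].value_counts())
```

If an outer merge with the indicator shows only `both`, left and right merges return the same set of rows.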


In [None]:
# Stack the two merged frames row-wise; original index labels are kept, so they repeat.
pd.concat([f_left, f_right])

Unnamed: 0,Store,Date,Temperature,Fuel_Price,MarkDown1,MarkDown2,MarkDown3,MarkDown4,MarkDown5,CPI,Unemployment,IsHoliday,Type,Size
0,1,2010-02-05,42.31,2.572,,,,,,211.096358,8.106,False,A,151315
1,1,2010-02-12,38.51,2.548,,,,,,211.242170,8.106,True,A,151315
2,1,2010-02-19,39.93,2.514,,,,,,211.289143,8.106,False,A,151315
3,1,2010-02-26,46.63,2.561,,,,,,211.319643,8.106,False,A,151315
4,1,2010-03-05,46.50,2.625,,,,,,211.350143,8.106,False,A,151315
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8185,45,2013-06-28,76.05,3.639,4842.29,975.03,3.00,2449.97,3169.69,,,False,B,118221
8186,45,2013-07-05,77.50,3.614,9090.48,2268.58,582.74,5797.47,1514.93,,,False,B,118221
8187,45,2013-07-12,79.37,3.614,3789.94,1827.31,85.72,744.84,2150.36,,,False,B,118221
8188,45,2013-07-19,82.84,3.737,2961.49,1047.07,204.19,363.00,1059.46,,,False,B,118221
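
Row-wise concat keeps each input's index, so stacking two frames repeats index labels, and stacking identical frames duplicates rows. A minimal sketch of appending data from different weeks follows, with `ignore_index=True` for a fresh index and `drop_duplicates()` for repeated rows. The weekly frames are invented.

```python
import pandas as pd

# Hypothetical data for two separate weeks.
week1 = pd.DataFrame({
    "Store": [1, 2],
    "Date": pd.to_datetime(["2010-02-05", "2010-02-05"]),
    "Weekly_Sales": [100.0, 200.0],
})
week2 = pd.DataFrame({
    "Store": [1, 2],
    "Date": pd.to_datetime(["2010-02-12", "2010-02-12"]),
    "Weekly_Sales": [110.0, 190.0],
})

# Default behaviour: index labels 0, 1 appear twice.
stacked = pd.concat([week1, week2])

# ignore_index=True renumbers the rows 0..n-1.
clean = pd.concat([week1, week2], ignore_index=True)

# Concatenating the same frame twice doubles the rows; drop_duplicates undoes it.
doubled = pd.concat([week1, week1], ignore_index=True)
deduped = doubled.drop_duplicates()

print(stacked.index.tolist())
print(clean)
```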


In [None]:
# axis=1 places the two Series side by side as columns, aligned on the index.
pd.concat([train['Date'], train['IsHoliday']], axis=1)

Unnamed: 0,Date,IsHoliday
0,2010-02-05,False
1,2010-02-12,True
2,2010-02-19,False
3,2010-02-26,False
4,2010-03-05,False
...,...,...
421565,2012-09-28,False
421566,2012-10-05,False
421567,2012-10-12,False
421568,2012-10-19,False
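
When both Series come from the same frame, column-wise concat gives the same result as selecting the two columns directly. The difference shows up when the indexes do not match, because `axis=1` aligns on index labels. A small sketch on a hypothetical frame:

```python
import pandas as pd

# Hypothetical two-row frame using train's column names.
toy = pd.DataFrame({
    "Store": [1, 1],
    "Date": pd.to_datetime(["2010-02-05", "2010-02-12"]),
    "IsHoliday": [False, True],
})

# Column-wise concat of two Series from the same frame...
side_by_side = pd.concat([toy["Date"], toy["IsHoliday"]], axis=1)

# ...matches plain column selection.
selected = toy[["Date", "IsHoliday"]]
same = side_by_side.equals(selected)

# With a Series that lacks index label 0, alignment leaves a missing value there.
partial = toy["IsHoliday"].iloc[1:]
aligned = pd.concat([toy["Date"], partial], axis=1)
n_missing = int(aligned["IsHoliday"].isnull().sum())

print(same)
print(aligned)
```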
