# Sales Performance Analysis with Walmart Data 

#### **Focus:**
* Exploring the dataset using pandas functions.
* Practicing groupby(), merge(), join(), and concat().

#### **Objective:**
* Analyze sales performance across different stores and departments. Use groupby to find trends and combine data using merging and concatenation techniques.

#### **Skills Practiced:**
* Aggregation with groupby() (e.g. total sales by store or department).
* Merging different datasets (e.g. sales with features).
* Concatenating data (e.g. appending data from different weeks).
* Basic EDA (describe(), value_counts(), filtering).

In [74]:
import pandas as pd

In [75]:
path="data/stores.csv"
df=pd.read_csv(path)
df

Unnamed: 0,Store,Type,Size
0,1,A,151315
1,2,A,202307
2,3,B,37392
3,4,A,205863
4,5,B,34875
5,6,A,202505
6,7,B,70713
7,8,A,155078
8,9,B,125833
9,10,B,126512


In [76]:
path="data/features.csv"
df1=pd.read_csv(path)
df1

Unnamed: 0,Store,Date,Temperature,Fuel_Price,MarkDown1,MarkDown2,MarkDown3,MarkDown4,MarkDown5,CPI,Unemployment,IsHoliday
0,1,2010-02-05,42.31,2.572,,,,,,211.096358,8.106,False
1,1,2010-02-12,38.51,2.548,,,,,,211.242170,8.106,True
2,1,2010-02-19,39.93,2.514,,,,,,211.289143,8.106,False
3,1,2010-02-26,46.63,2.561,,,,,,211.319643,8.106,False
4,1,2010-03-05,46.50,2.625,,,,,,211.350143,8.106,False
...,...,...,...,...,...,...,...,...,...,...,...,...
8185,45,2013-06-28,76.05,3.639,4842.29,975.03,3.00,2449.97,3169.69,,,False
8186,45,2013-07-05,77.50,3.614,9090.48,2268.58,582.74,5797.47,1514.93,,,False
8187,45,2013-07-12,79.37,3.614,3789.94,1827.31,85.72,744.84,2150.36,,,False
8188,45,2013-07-19,82.84,3.737,2961.49,1047.07,204.19,363.00,1059.46,,,False


In [77]:
path="data/train.csv"
df2=pd.read_csv(path)
df2

Unnamed: 0,Store,Dept,Date,Weekly_Sales,IsHoliday
0,1,1,2010-02-05,24924.50,False
1,1,1,2010-02-12,46039.49,True
2,1,1,2010-02-19,41595.55,False
3,1,1,2010-02-26,19403.54,False
4,1,1,2010-03-05,21827.90,False
...,...,...,...,...,...
421565,45,98,2012-09-28,508.37,False
421566,45,98,2012-10-05,628.10,False
421567,45,98,2012-10-12,1061.02,False
421568,45,98,2012-10-19,760.01,False


## Group By

#### Stores

In [78]:
# Sums store sizes by type to compare total space per store type.
df.groupby('Type')['Size'].sum()

Type
A    3899450
B    1720242
C     243250
Name: Size, dtype: int64
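
To compare store types on several statistics at once, `groupby` can be combined with `agg`. The sketch below is only an illustration: `stores_demo` is a hypothetical hand-made frame with invented values that mimics the `Type` and `Size` columns of `df`, and the output column names are made up for the example.

```python
import pandas as pd

# Hypothetical toy data mirroring the stores table (values invented for illustration).
stores_demo = pd.DataFrame({
    "Store": [1, 2, 3, 4, 5],
    "Type": ["A", "A", "B", "B", "C"],
    "Size": [150000, 200000, 40000, 60000, 40000],
})

# Named aggregation: each keyword becomes one output column.
size_by_type = stores_demo.groupby("Type")["Size"].agg(
    n_stores="count",
    total_size="sum",
    mean_size="mean",
)
print(size_by_type)
```

Reading the count next to the total helps show whether a type has more total space because it has more stores or because its stores are larger.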

In [79]:
# Sums store sizes per store and type to analyze size by store-type pair.
df.groupby(['Store', 'Type'])['Size'].sum()

Store  Type
1      A       151315
2      A       202307
3      B        37392
4      A       205863
5      B        34875
6      A       202505
7      B        70713
8      A       155078
9      B       125833
10     B       126512
11     A       207499
12     B       112238
13     A       219622
14     A       200898
15     B       123737
16     B        57197
17     B        93188
18     B       120653
19     A       203819
20     A       203742
21     B       140167
22     B       119557
23     B       114533
24     A       203819
25     B       128107
26     A       152513
27     A       204184
28     A       206302
29     B        93638
30     C        42988
31     A       203750
32     A       203007
33     A        39690
34     A       158114
35     B       103681
36     A        39910
37     C        39910
38     C        39690
39     A       184109
40     A       155083
41     A       196321
42     C        39690
43     C        41062
44     C        39910
45     B       118221

#### Features

In [80]:
# Finds the highest fuel price for each store.
df1.groupby('Store')['Fuel_Price'].max()

Store
1     3.907
2     3.907
3     3.907
4     3.881
5     3.907
6     3.907
7     3.936
8     3.907
9     3.907
10    4.468
11    3.907
12    4.468
13    3.845
14    4.066
15    4.211
16    3.936
17    3.845
18    4.101
19    4.211
20    4.066
21    3.907
22    4.101
23    4.101
24    4.211
25    4.066
26    4.101
27    4.211
28    4.468
29    4.101
30    3.907
31    3.907
32    3.936
33    4.468
34    3.881
35    4.066
36    3.934
37    3.907
38    4.468
39    3.907
40    4.101
41    3.936
42    4.468
43    3.907
44    3.845
45    4.066
Name: Fuel_Price, dtype: float64

In [81]:
# Calculates the average unemployment rate for each store.
df1.groupby('Store')['Unemployment'].mean()

Store
1      7.440994
2      7.403959
3      7.006006
4      5.647450
5      6.163166
6      6.412568
7      8.378556
8      5.947349
9      5.929456
10     8.137373
11     7.006006
12    12.637716
13     6.762024
14     8.640467
15     7.994580
16     6.335308
17     6.365296
18     8.712793
19     7.994580
20     7.368166
21     7.403959
22     7.963166
23     4.668562
24     8.481408
25     7.368166
26     7.749970
27     7.990077
28    12.637716
29     9.681391
30     7.403959
31     7.403959
32     8.378556
33     8.269621
34     9.772509
35     8.757343
36     7.617491
37     7.617491
38    12.637716
39     7.617491
40     4.668562
41     6.807852
42     8.137373
43     9.772509
44     6.475822
45     8.640467
Name: Unemployment, dtype: float64

In [82]:
# Counts the non-null CPI values for each store on each date (0 means the CPI is missing for that row).
df1.groupby(['Store', 'Date'])['CPI'].count()

Store  Date      
1      2010-02-05    1
       2010-02-12    1
       2010-02-19    1
       2010-02-26    1
       2010-03-05    1
                    ..
45     2013-06-28    0
       2013-07-05    0
       2013-07-12    0
       2013-07-19    0
       2013-07-26    0
Name: CPI, Length: 8190, dtype: int64
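
The zeros in the output above come from the fact that `count()` skips missing values, whereas `size()` counts every row. A minimal sketch of the difference, assuming a hypothetical toy frame `features_demo` with invented values:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: store 2 has no CPI value recorded.
features_demo = pd.DataFrame({
    "Store": [1, 1, 2, 2],
    "Date": ["2010-02-05", "2010-02-12", "2010-02-05", "2010-02-12"],
    "CPI": [211.1, 211.2, np.nan, np.nan],
})

grouped = features_demo.groupby("Store")["CPI"]
non_null = grouped.count()  # counts only non-missing CPI values
n_rows = grouped.size()     # counts all rows, missing or not

print(pd.DataFrame({"non_null": non_null, "rows": n_rows}))
```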

#### Train

In [83]:
# Finds the highest weekly sales for each department.
df2.groupby('Dept')['Weekly_Sales'].max()

Dept
1     172225.55
2     151090.50
3     131564.25
4      72179.92
5     259955.82
        ...    
95    213042.66
96     63978.78
97     49034.16
98     33759.90
99     12550.00
Name: Weekly_Sales, Length: 81, dtype: float64

In [84]:
# Calculates the average weekly sales for each store.
df2.groupby('Store')['Weekly_Sales'].mean()

Store
1     21710.543621
2     26898.070031
3      6373.033983
4     29161.210415
5      5053.415813
6     21913.243624
7      8358.766148
8     13133.014768
9      8772.890379
10    26332.303819
11    19276.762751
12    14867.308619
13    27355.136891
14    28784.851727
15     9002.493073
16     7863.224124
17    12954.393636
18    15733.313136
19    20362.126734
20    29508.301592
21    11283.435496
22    15181.218886
23    19776.180881
24    18969.106500
25    10308.157810
26    14554.129672
27    24826.984536
28    18714.889803
29     8158.810609
30     8764.237719
31    19681.907464
32    16351.621855
33     5728.414053
34    13522.081671
35    13803.596986
36     8584.412563
37    10297.355026
38     7492.478460
39    21000.763562
40    13763.632803
41    17976.004648
42    11443.370118
43    13415.114118
44     6038.929814
45    11662.897315
Name: Weekly_Sales, dtype: float64

In [85]:
# Counts the number of weekly sales records for each store and department.
df2.groupby(['Store', 'Dept'])['Weekly_Sales'].count()

Store  Dept
1      1       143
       2       143
       3       143
       4       143
       5       143
              ... 
45     94      134
       95      143
       96        2
       97      143
       98      135
Name: Weekly_Sales, Length: 3331, dtype: int64
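
The same pattern gives total sales by store or by department, and sorting the result makes the top performers easy to spot. This is a sketch on a hypothetical toy frame (`sales_demo`, invented values), not on the real `df2`:

```python
import pandas as pd

# Hypothetical toy data with the same column names as the sales table.
sales_demo = pd.DataFrame({
    "Store": [1, 1, 2, 2, 3],
    "Dept": [1, 2, 1, 2, 1],
    "Weekly_Sales": [100.0, 50.0, 300.0, 25.0, 80.0],
})

# Total sales per store, largest first.
total_by_store = (
    sales_demo.groupby("Store")["Weekly_Sales"]
    .sum()
    .sort_values(ascending=False)
)

# Total sales per department.
total_by_dept = sales_demo.groupby("Dept")["Weekly_Sales"].sum()

print(total_by_store)
print(total_by_dept)
```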

## Merge / Join

#### Inner Merge 

In [86]:
# Joins df and df1 on Store, keeping only matching rows (inner join).
inner = pd.merge(df, df1, on='Store', how='inner')
inner

Unnamed: 0,Store,Type,Size,Date,Temperature,Fuel_Price,MarkDown1,MarkDown2,MarkDown3,MarkDown4,MarkDown5,CPI,Unemployment,IsHoliday
0,1,A,151315,2010-02-05,42.31,2.572,,,,,,211.096358,8.106,False
1,1,A,151315,2010-02-12,38.51,2.548,,,,,,211.242170,8.106,True
2,1,A,151315,2010-02-19,39.93,2.514,,,,,,211.289143,8.106,False
3,1,A,151315,2010-02-26,46.63,2.561,,,,,,211.319643,8.106,False
4,1,A,151315,2010-03-05,46.50,2.625,,,,,,211.350143,8.106,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8185,45,B,118221,2013-06-28,76.05,3.639,4842.29,975.03,3.00,2449.97,3169.69,,,False
8186,45,B,118221,2013-07-05,77.50,3.614,9090.48,2268.58,582.74,5797.47,1514.93,,,False
8187,45,B,118221,2013-07-12,79.37,3.614,3789.94,1827.31,85.72,744.84,2150.36,,,False
8188,45,B,118221,2013-07-19,82.84,3.737,2961.49,1047.07,204.19,363.00,1059.46,,,False
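
An inner join silently drops any store that appears in only one of the two tables. One way to check what a merge matched is the `indicator` and `validate` parameters of `pd.merge`. The sketch below assumes two hypothetical toy frames (`stores_demo`, `features_demo`) with invented values:

```python
import pandas as pd

# Hypothetical toy data: store 3 has no features, store 4 has no store record.
stores_demo = pd.DataFrame({"Store": [1, 2, 3], "Type": ["A", "A", "B"]})
features_demo = pd.DataFrame({
    "Store": [1, 1, 2, 4],
    "Date": ["2010-02-05", "2010-02-12", "2010-02-05", "2010-02-05"],
    "Fuel_Price": [2.57, 2.55, 2.60, 2.70],
})

checked = pd.merge(
    stores_demo,
    features_demo,
    on="Store",
    how="outer",
    validate="one_to_many",  # raises MergeError if Store repeats in stores_demo
    indicator=True,          # adds a _merge column: both / left_only / right_only
)

print(checked["_merge"].value_counts())
```

Rows marked `left_only` or `right_only` are exactly the ones an inner join would have discarded.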


#### Left Merge

In [None]:
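# Joins df1 with df2 on Store and Date, keeping all rows from df1 (left join).
# Merging on Store alone would pair every features row with every sales row of the same store.
left = pd.merge(df1, df2, on=['Store', 'Date'], how='left')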
left

#### Right Merge

In [None]:
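# Joins df1 with df2 on Store and Date, keeping all rows from df2 (right join).
# Merging on Store alone would pair every features row with every sales row of the same store.
right = pd.merge(df1, df2, on=['Store', 'Date'], how='right')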
right
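
When both tables contain many rows per store, the choice of join keys determines the size of the result: joining on `Store` alone is a many-to-many join, so the row count multiplies. The sketch below illustrates this with hypothetical toy frames (`features_demo`, `sales_demo`, invented values) rather than the full data:

```python
import pandas as pd

# Hypothetical toy data: 3 feature rows and 4 sales rows, all for store 1.
features_demo = pd.DataFrame({
    "Store": [1, 1, 1],
    "Date": ["2010-02-05", "2010-02-12", "2010-02-19"],
    "Fuel_Price": [2.57, 2.55, 2.51],
})
sales_demo = pd.DataFrame({
    "Store": [1, 1, 1, 1],
    "Dept": [1, 1, 2, 2],
    "Date": ["2010-02-05", "2010-02-12", "2010-02-05", "2010-02-12"],
    "Weekly_Sales": [100.0, 110.0, 50.0, 55.0],
})

# Store only: every feature row pairs with every sales row of the store.
on_store = pd.merge(features_demo, sales_demo, on="Store", how="left")

# Store and Date: each feature row pairs only with sales from the same week.
on_store_date = pd.merge(features_demo, sales_demo, on=["Store", "Date"], how="left")

print(len(on_store), len(on_store_date))
```

In the second result the week without sales is kept, with missing values in the sales columns, which is the usual behaviour of a left join.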

#### Outer Merge

In [None]:
# Joins df and df1 on Store, keeping all rows from both (outer join).
outer = pd.merge(df, df1, on='Store', how='outer')
outer

## Concatenation

In [None]:
# Stacks df and df1 vertically; columns that exist in only one frame are filled with NaN.
pd.concat([df, df1])

In [None]:
# Stacks df and df2 vertically; columns that exist in only one frame are filled with NaN.
pd.concat([df, df2])

In [None]:
# Stacks df1 and df2 vertically, appending df2's rows below df1's; unshared columns are filled with NaN.
pd.concat([df1, df2])
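
Concatenation is most useful when the frames share the same columns, for example when appending data from different weeks. The sketch below uses hypothetical toy frames (`week1`, `week2`, `stores_demo`, invented values) to contrast that case with stacking frames whose columns differ:

```python
import pandas as pd

# Hypothetical weekly extracts with identical columns.
week1 = pd.DataFrame({
    "Store": [1, 2],
    "Date": ["2010-02-05", "2010-02-05"],
    "Weekly_Sales": [100.0, 200.0],
})
week2 = pd.DataFrame({
    "Store": [1, 2],
    "Date": ["2010-02-12", "2010-02-12"],
    "Weekly_Sales": [110.0, 190.0],
})

# ignore_index=True renumbers the rows 0..n-1 instead of repeating 0, 1, 0, 1.
all_weeks = pd.concat([week1, week2], ignore_index=True)

# Frames with different columns still stack, but the gaps are filled with NaN.
stores_demo = pd.DataFrame({"Store": [1], "Type": ["A"]})
mixed = pd.concat([stores_demo, week1], ignore_index=True)

print(all_weeks)
print(mixed)
```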

## Describe (Summary Statistics)

In [None]:
# Shows summary statistics (count, mean, std, min, max, etc.) for numeric columns in the DataFrame.
df.describe()

In [None]:
# Displays summary statistics (like mean, std, min, max) for numeric columns in df1.
df1.describe()

In [None]:
# Displays summary statistics (like mean, std, min, max) for numeric columns in df2.
df2.describe()

In [None]:
# Shows summary statistics for all columns in df, including numeric, categorical, and object types.
df.describe(include='all')

In [None]:
# Shows summary statistics for all columns in df1, including numeric, categorical, and object types.
df1.describe(include='all')

In [None]:
# Shows summary statistics for all columns in df2, including numeric, categorical, and object types.
df2.describe(include='all')

## value_counts (Frequency of Categories)

In [None]:
# Counts the number of occurrences of each store type in the Type column.
df['Type'].value_counts()

In [None]:
# Displays a bar chart showing the frequency of each store type in the Type column.
df['Type'].value_counts().plot(kind='bar')

In [None]:
# Counts how many times each store appears in the Store column of df1.
df1['Store'].value_counts()

In [None]:
# Displays a bar chart of how many records each store has in the Store column of df1.
df1['Store'].value_counts().plot(kind='bar')

In [None]:
# Counts how many times each department appears in the Dept column of df2.
df2['Dept'].value_counts()

In [None]:
# Displays a bar chart showing how often each department appears in the Dept column of df2.
df2['Dept'].value_counts().plot(kind='bar')
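
Raw counts can be turned into proportions with `normalize=True`, which is often easier to compare across categories. A minimal sketch on a hypothetical toy Series (`types`, invented values):

```python
import pandas as pd

# Hypothetical store types (invented for illustration).
types = pd.Series(["A", "A", "B", "A", "C", "B"], name="Type")

counts = types.value_counts()                # absolute frequency per type
shares = types.value_counts(normalize=True)  # fraction of rows per type

print(counts)
print(shares)
```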

## Filtering Rows Based on Conditions

In [None]:
# Filters and shows all rows in df where the store type is 'A'.
df[df['Type'] == 'A']

In [None]:
# Displays all rows in df where the store size is greater than 200,000.
df[df['Size'] > 200000]

In [None]:
# Filters df1 to show only the rows where IsHoliday is False (non-holiday weeks).
df1[df1['IsHoliday'] == False]

In [None]:
# Filters df2 to show rows where Store is 1 and IsHoliday is False.
df2[(df2['Store'] == 1) & (df2['IsHoliday'] == False)]
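
Besides boolean masks combined with `&`, pandas offers `isin`, `between`, and `query` for common filter shapes. The sketch below shows each on a hypothetical toy frame (`stores_demo`, invented values):

```python
import pandas as pd

# Hypothetical toy data mirroring the stores table.
stores_demo = pd.DataFrame({
    "Store": [1, 2, 3, 4],
    "Type": ["A", "B", "C", "A"],
    "Size": [150000, 120000, 40000, 205000],
})

# Rows whose Type is one of several values.
a_or_b = stores_demo[stores_demo["Type"].isin(["A", "B"])]

# Rows whose Size lies in a range (both ends inclusive by default).
mid_sized = stores_demo[stores_demo["Size"].between(100000, 200000)]

# The same kind of combined condition written as a query string.
large_a = stores_demo.query("Type == 'A' and Size > 200000")

print(a_or_b, mid_sized, large_a, sep="\n\n")
```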

## Check for missing values

In [None]:
# Returns a DataFrame of the same shape as df, with True for missing (NaN) values and False otherwise.
df.isnull()

In [None]:
# Counts the missing (NaN) values in each column of df1.
df1.isnull().sum()

In [None]:
# Counts the missing (NaN) values in each column of df2.
df2.isnull().sum()
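
Missing counts are easier to judge as a share of all rows. Because `True` is treated as 1, taking the mean of `isnull()` gives that share directly. A minimal sketch on a hypothetical toy frame (`features_demo`, invented values):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps in MarkDown1 and CPI.
features_demo = pd.DataFrame({
    "Store": [1, 1, 2, 2],
    "MarkDown1": [np.nan, np.nan, np.nan, 4842.29],
    "CPI": [211.1, 211.2, np.nan, 190.0],
})

missing_count = features_demo.isnull().sum()   # number of NaN values per column
missing_share = features_demo.isnull().mean()  # fraction of NaN values per column

print(pd.DataFrame({"missing": missing_count, "share": missing_share}))
```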

## Check column data types

In [None]:
# Shows the data type of each column in the DataFrame df.
df.dtypes

In [None]:
# Shows the data type of each column in the DataFrame df1.
df1.dtypes

In [None]:
# Shows the data type of each column in the DataFrame df2.
df2.dtypes
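
When `read_csv` is called without date parsing, a column such as `Date` is typically loaded as text rather than as datetimes, which limits date-based analysis. One possible fix is `pd.to_datetime`; the sketch below assumes a hypothetical toy frame (`sales_demo`, invented values) and the `YYYY-MM-DD` format seen in the tables above:

```python
import pandas as pd

# Hypothetical toy data with dates stored as strings.
sales_demo = pd.DataFrame({
    "Date": ["2010-02-05", "2010-02-12", "2011-02-04"],
    "Weekly_Sales": [100.0, 110.0, 90.0],
})

# Convert the text column to a datetime dtype.
sales_demo["Date"] = pd.to_datetime(sales_demo["Date"], format="%Y-%m-%d")
print(sales_demo.dtypes)

# With a datetime column, the .dt accessor allows grouping by year, month, etc.
sales_by_year = sales_demo.groupby(sales_demo["Date"].dt.year)["Weekly_Sales"].sum()
print(sales_by_year)
```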

## Basic info on the dataset

In [None]:
# Shows a summary of columns, non-null counts, and data types in df.
df.info()

In [None]:
# Shows a summary of columns, non-null counts, and data types in df1.
df1.info()

In [None]:
# Shows a summary of columns, non-null counts, and data types in df2.
df2.info()