# Walmart Sales Analysis

## Objective
This project is practice in turning raw data into insights a business can act on.  
We’re looking at Walmart’s weekly sales data and asking: *what patterns can we find that would actually help a manager make decisions?*


## Business Questions
1. **Concentration:** How many stores are responsible for 50% and 80% of Walmart’s total sales?  
   → Why: If most sales come from just a few stores, those stores need more focus (inventory, promotions, staff).

2. **Holiday Impact:** Do sales really jump during holiday weeks, and by how much?  
   → Why: Managers can use this to plan ahead: schedule more staff and stock more products.

3. **Seasonality:** Which months or seasons bring in the most sales?  
   → Why: This helps understand yearly cycles and plan marketing campaigns.

4. **Trend Direction:** Over the years, are sales going up or down overall?  
   → Why: Shows whether demand is growing or cooling off in the long run.

5. **Macro Factors:** Do things like temperature, fuel price, CPI, or unemployment affect sales?  
   → Why: External conditions often shift customer spending, so it helps to know which ones matter most.
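
To make question 1 concrete, here is a minimal sketch of one way to count how many stores it takes to reach a given share of total sales. The store labels and totals are made up for illustration, and `stores_needed` is a hypothetical helper, not part of the dataset or of pandas.

```python
import pandas as pd

# Made-up total sales per store (illustrative numbers only)
totals = pd.Series({"A": 40, "B": 25, "C": 20, "D": 10, "E": 5})

def stores_needed(store_totals: pd.Series, share: float) -> int:
    """Smallest number of top-selling stores whose combined sales reach `share` of the total."""
    ranked = store_totals.sort_values(ascending=False)
    cumulative_share = ranked.cumsum() / ranked.sum()
    # argmax on a boolean array gives the position of the first True
    return int((cumulative_share >= share).to_numpy().argmax()) + 1

print("Stores for 50% of sales:", stores_needed(totals, 0.50))
print("Stores for 80% of sales:", stores_needed(totals, 0.80))
```

On the real data, the same idea would apply to `Weekly_Sales` summed per `Store`.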


## Dataset

- **Source:** Kaggle, *Walmart Sales* dataset (https://www.kaggle.com/datasets/mikhail1681/walmart-sales/data)
- **File Used:** `Walmart_Sales.csv`  

## Load Data

The goal here is just to **make sure the file loads correctly** and then **get a quick idea** of what’s inside:


In [6]:
import pandas as pd

# Load CSV
df = pd.read_csv("../data/Walmart_Sales.csv")

# Look at the first few rows
df.head()

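|   | Store | Date | Weekly_Sales | Holiday_Flag | Temperature | Fuel_Price | CPI | Unemployment |
|---|-------|------|--------------|--------------|-------------|------------|-----|--------------|
| 0 | 1 | 05-02-2010 | 1643690.9 | 0 | 42.31 | 2.572 | 211.096358 | 8.106 |
| 1 | 1 | 12-02-2010 | 1641957.44 | 1 | 38.51 | 2.548 | 211.24217 | 8.106 |
| 2 | 1 | 19-02-2010 | 1611968.17 | 0 | 39.93 | 2.514 | 211.289143 | 8.106 |
| 3 | 1 | 26-02-2010 | 1409727.59 | 0 | 46.63 | 2.561 | 211.319643 | 8.106 |
| 4 | 1 | 05-03-2010 | 1554806.68 | 0 | 46.5 | 2.625 | 211.350143 | 8.106 |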


In [7]:
# Convert the Date column to proper datetime format (dataset uses dd-mm-YYYY)
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")
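
Passing the format explicitly matters because a string like `05-02-2010` is ambiguous. The sketch below uses the first three `Date` values from the preview above to show that day-first parsing gives February dates exactly one week apart, which is what weekly data should look like. The variable names are only illustrative.

```python
import pandas as pd

# First three Date values from the df.head() preview
raw_dates = pd.Series(["05-02-2010", "12-02-2010", "19-02-2010"])

# Explicit day-first format, same as in the conversion cell
parsed = pd.to_datetime(raw_dates, format="%d-%m-%Y")

# Month of each parsed date, and the gap in days between consecutive rows
months = parsed.dt.month.tolist()
gaps_in_days = parsed.diff().dropna().dt.days.tolist()
print("Months:", months)
print("Gaps in days:", gaps_in_days)

# Without a format, pandas reads a single ambiguous string month-first by default
ambiguous = pd.to_datetime("05-02-2010")
print("Month when no format is given:", ambiguous.month)
```

If the gaps were not all 7 days, that would be a hint that the day and month had been swapped during parsing.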

In [11]:
# Shape = number of rows, number of columns
print("Shape of dataset:", df.shape)

# Data types of each column
print("\n")
print("Column info:")
print(df.dtypes)

# Check for missing values
print("\nMissing values per column:")
print(df.isna().sum())

Shape of dataset: (6435, 8)


Column info:
Store                    int64
Date            datetime64[ns]
Weekly_Sales           float64
Holiday_Flag             int64
Temperature            float64
Fuel_Price             float64
CPI                    float64
Unemployment           float64
dtype: object

Missing values per column:
Store           0
Date            0
Weekly_Sales    0
Holiday_Flag    0
Temperature     0
Fuel_Price      0
CPI             0
Unemployment    0
dtype: int64
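
Beyond shape, dtypes, and missing values, a few extra checks can help confirm the file loaded as expected. The following is a rough sketch, not part of the original notebook: `sanity_report` is a hypothetical helper, and it runs here on a small frame rebuilt from the five preview rows so that it is self-contained. On the full data it would be called as `sanity_report(df)`.

```python
import pandas as pd

# Small stand-in frame rebuilt from the five rows in the df.head() preview
sample = pd.DataFrame({
    "Store": [1, 1, 1, 1, 1],
    "Date": ["05-02-2010", "12-02-2010", "19-02-2010", "26-02-2010", "05-03-2010"],
    "Weekly_Sales": [1643690.9, 1641957.44, 1611968.17, 1409727.59, 1554806.68],
    "Holiday_Flag": [0, 1, 0, 0, 0],
})
sample["Date"] = pd.to_datetime(sample["Date"], format="%d-%m-%Y")

def sanity_report(frame: pd.DataFrame) -> dict:
    """Collect a few basic integrity checks for the weekly sales table."""
    return {
        # Each store should appear at most once per week
        "duplicate_store_weeks": int(frame.duplicated(subset=["Store", "Date"]).sum()),
        "n_stores": int(frame["Store"].nunique()),
        "first_week": frame["Date"].min(),
        "last_week": frame["Date"].max(),
        # The flag is expected to take only the values 0 and 1
        "holiday_flag_values": sorted(frame["Holiday_Flag"].unique().tolist()),
        # Zero or negative weekly sales would deserve a closer look
        "non_positive_sales": int((frame["Weekly_Sales"] <= 0).sum()),
    }

report = sanity_report(sample)
for name, value in report.items():
    print(f"{name}: {value}")
```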
