## Regression Project - Walmart Sales Prediction

- This dataset contains weekly sales for 99 departments across 45 different stores.
- The objective of this case study is to forecast weekly sales for a particular department, based on historical data.
- The data also records holidays and the promotional markdowns offered by various stores and departments throughout the year.
- Markdowns are crucial for promoting sales, especially before key events such as the Super Bowl, Christmas and Thanksgiving.
- Developing an accurate model will enable us to make informed decisions and recommendations to improve business processes in the future.
- The data consists of three sheets: 
    - Stores
    - Features
    - Sales

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import zipfile


from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import r2_score, mean_squared_error
from sklearn.ensemble import RandomForestRegressor

import warnings
warnings.filterwarnings("ignore")

In [2]:
# Load the three CSV files using pandas
feature = pd.read_csv('Features_data_set.csv')
sales = pd.read_csv('sales_data_set.csv')
stores = pd.read_csv('stores_data_set.csv')

### 2.1 stores

In [3]:
# The "stores" dataframe contains information about the 45 stores, such as the type and size of each store.
stores.tail()

Unnamed: 0,Store,Type,Size
40,41,A,196321
41,42,C,39690
42,43,C,41062
43,44,C,39910
44,45,B,118221


### 2.2 feature

The "feature" dataframe contains additional data related to the store, department, and regional activity for the given dates.

* Store: store number
* Date: week
* Temperature: average temperature in the region
* Fuel_Price: cost of fuel in the region
* MarkDown1-5: anonymized data related to promotional markdowns. 
* CPI: consumer price index
* Unemployment: unemployment rate
* IsHoliday: whether the week is a special holiday week or not

In [4]:
feature.tail()

Unnamed: 0,Store,Date,Temperature,Fuel_Price,MarkDown1,MarkDown2,MarkDown3,MarkDown4,MarkDown5,CPI,Unemployment,IsHoliday
8185,45,28/06/2013,76.05,3.639,4842.29,975.03,3.0,2449.97,3169.69,,,False
8186,45,05/07/2013,77.5,3.614,9090.48,2268.58,582.74,5797.47,1514.93,,,False
8187,45,12/07/2013,79.37,3.614,3789.94,1827.31,85.72,744.84,2150.36,,,False
8188,45,19/07/2013,82.84,3.737,2961.49,1047.07,204.19,363.0,1059.46,,,False
8189,45,26/07/2013,76.06,3.804,212.02,851.73,2.06,10.88,1864.57,,,False

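The rows printed above have no CPI or Unemployment values. As a minimal sketch of how such gaps could be counted, the snippet below rebuilds those five rows by hand (a subset of columns only) and counts missing values per column. The names `tail` and `missing` are illustrative; on the full data the equivalent call would be `feature.isna().sum()`.

```python
import numpy as np
import pandas as pd

# Last five rows of the feature table as printed above (subset of columns).
tail = pd.DataFrame({
    "Store": [45, 45, 45, 45, 45],
    "Date": ["28/06/2013", "05/07/2013", "12/07/2013", "19/07/2013", "26/07/2013"],
    "MarkDown1": [4842.29, 9090.48, 3789.94, 2961.49, 212.02],
    "CPI": [np.nan] * 5,
    "Unemployment": [np.nan] * 5,
})

# Count missing values per column and keep only the columns that have any.
missing = tail.isna().sum()
print(missing[missing > 0])
```

Whether these gaps should be dropped or imputed depends on how the columns are used later, so the sketch only reports them.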

### 2.3 sales

The "sales" dataframe contains historical sales data covering 2010-02-05 to 2012-11-01.
* Store: store number
* Dept: department number
* Date: the week
* Weekly_Sales: sales for the given department in the given store
* IsHoliday: whether the week is a special holiday week

In [5]:
# We have over 421 thousand records here
sales.tail()

Unnamed: 0,Store,Dept,Date,Weekly_Sales,IsHoliday
421565,45,98,28/09/2012,508.37,False
421566,45,98,05/10/2012,628.1,False
421567,45,98,12/10/2012,1061.02,False
421568,45,98,19/10/2012,760.01,False
421569,45,98,26/10/2012,1076.8,False


In [6]:
# Convert the 'Date' column to datetime; the dates are stored day-first (DD/MM/YYYY)
feature['Date'] = pd.to_datetime(feature['Date'], format='%d/%m/%Y')
sales['Date'] = pd.to_datetime(sales['Date'], format='%d/%m/%Y')
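
The dates in the printed tables are day-first (for example `28/06/2013`), so a value such as `05/07/2013` is ambiguous unless the format is stated. The following is a small sketch of parsing with an explicit format, using two dates copied from the feature table; it assumes every date in the files follows `DD/MM/YYYY`, and the names `raw` and `parsed` are illustrative.

```python
import pandas as pd

# Two consecutive weekly dates copied from the feature table (DD/MM/YYYY).
raw = pd.Series(["05/07/2013", "12/07/2013"])

# An explicit format removes the ambiguity: 05/07/2013 is 5 July, not 7 May.
parsed = pd.to_datetime(raw, format="%d/%m/%Y")

# Both dates fall in July and are exactly one week apart.
print(parsed.dt.month.tolist())
print((parsed.iloc[1] - parsed.iloc[0]).days)
```

A one-week gap between consecutive rows is a quick sanity check that the weekly dates were parsed as intended.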