# Mandatory Challenge
## Context
You work in the data analysis team of a very important company. On Monday, the company shares some good news with you: it has just been hired by a major retail company! So get ready for a huge amount of work.

Then you get to work with your team and define the following tasks to perform:   
1. You need to start your analysis using data from the past.  
2. You need to define a process that takes your daily data as an input and integrates it.  

You are in charge of the second part, so you are provided with a sample file that you will have to read daily. To complete your task, you need the following aggregates:
* One aggregate per store that adds up the rest of the values.
* One aggregate per item that adds up the rest of the values.
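
One reading of "adds up the rest of the values" is a grouped sum of units sold and revenue, since summing IDs or unit prices carries no meaning. The sketch below uses made-up rows and assumes the column names of the sample file (`shop_id`, `item_id`, `item_price`, `item_cnt_day`); the `revenue` column and the helper `aggregate_by` are illustrative names, not part of the assignment.

```python
import pandas as pd

# Made-up rows that mimic the layout of the sample file.
sales = pd.DataFrame({
    "shop_id": [1, 1, 2],
    "item_id": [10, 11, 10],
    "item_price": [100.0, 50.0, 100.0],
    "item_cnt_day": [2.0, 1.0, 3.0],
})
# Revenue per row: unit price times units sold.
sales["revenue"] = sales["item_price"] * sales["item_cnt_day"]


def aggregate_by(df, key):
    """Sum units sold and revenue for each value of `key` (hypothetical helper)."""
    return df.groupby(key)[["item_cnt_day", "revenue"]].sum()


per_shop = aggregate_by(sales, "shop_id")
per_item = aggregate_by(sales, "item_id")
```

Selecting the two numeric columns before calling `sum()` keeps the result independent of how pandas treats non-numeric columns such as `date`.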

You can import the dataset `retail_sales` from Ironhack's database. 

## Your task
Therefore, your process will consist of the following steps:
1. Read the sample file that a daily process will save in your folder. 
2. Clean up the data.
3. Create the aggregates.
4. Write three tables in your local database: 
    - A table for the cleaned data.
    - A table for the aggregate per store.
    - A table for the aggregate per item.
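
The brief leaves the clean-up step open, so the following is only a sketch of what it might look like. The rules used here (drop exact duplicates, drop rows with missing values, keep only positive prices and quantities) are assumptions, and `clean_sales` is a hypothetical helper name.

```python
import pandas as pd


def clean_sales(df):
    """Return a cleaned copy of `df`; the rules below are assumptions."""
    required = ["shop_id", "item_id", "item_price", "item_cnt_day"]
    cleaned = df.drop_duplicates()
    cleaned = cleaned.dropna(subset=required)
    # Assumed rule: a valid sale has a positive price and a positive quantity.
    cleaned = cleaned[(cleaned["item_price"] > 0) & (cleaned["item_cnt_day"] > 0)]
    return cleaned.reset_index(drop=True)


# Made-up rows: one exact duplicate, one missing price, one negative price.
raw = pd.DataFrame({
    "shop_id": [1, 1, 2, 3],
    "item_id": [10, 10, 11, 12],
    "item_price": [100.0, 100.0, None, -5.0],
    "item_cnt_day": [1.0, 1.0, 2.0, 1.0],
})
cleaned = clean_sales(raw)
```

Whether negative quantities should be dropped or kept (for example as returns) depends on the business rules, which the brief does not state.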

## Instructions
* Read the csv you can find in Ironhack's database.
* Clean the data and create the aggregates as you consider.
* Create the tables in your local database.
* Populate them with your process.
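
The last two steps can be sketched with `DataFrame.to_sql`. To stay self-contained, this example writes to an in-memory SQLite database, which is an assumption; the table names (`sales_clean`, `sales_per_shop`, `sales_per_item`) and the sample rows are also made up.

```python
import sqlite3

import pandas as pd

# Made-up cleaned data standing in for the daily file.
cleaned = pd.DataFrame({
    "shop_id": [1, 1, 2],
    "item_id": [10, 11, 10],
    "item_price": [100.0, 50.0, 100.0],
    "item_cnt_day": [2.0, 1.0, 3.0],
})
cleaned["revenue"] = cleaned["item_price"] * cleaned["item_cnt_day"]

value_columns = ["item_cnt_day", "revenue"]
per_shop = cleaned.groupby("shop_id")[value_columns].sum().reset_index()
per_item = cleaned.groupby("item_id")[value_columns].sum().reset_index()

# In-memory SQLite database (assumption, used so the sketch runs anywhere).
conn = sqlite3.connect(":memory:")
tables = {
    "sales_clean": cleaned,
    "sales_per_shop": per_shop,
    "sales_per_item": per_item,
}
for name, frame in tables.items():
    # "replace" recreates the table on every run; "append" would accumulate days.
    frame.to_sql(name, conn, if_exists="replace", index=False)

# Read back the row counts to confirm that the tables were populated.
row_counts = {
    name: conn.execute(f"SELECT COUNT(*) FROM {name}").fetchone()[0]
    for name in tables
}
```

`to_sql` also accepts a SQLAlchemy engine, so the same loop should carry over to a local MySQL database by passing an engine instead of `conn`.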

In [12]:
import pymysql
from sqlalchemy import create_engine
import pandas as pd

In [25]:
driver = 'mysql+pymysql'
ip = '34.65.10.136'
username = 'data-students'
password = 'iR0nH@cK-D4T4B4S3'
db = 'retail_sales'
connection_string  = f'{driver}://{username}:{password}@{ip}/{db}'
engine = create_engine(connection_string)
# Keep the query text and the resulting DataFrame in separate variables
query = 'SELECT * FROM raw_sales;'
raw_sales = pd.read_sql(query, engine)

In [26]:
raw_sales

Unnamed: 0,date,shop_id,item_id,item_price,item_cnt_day
0,2015-01-04,29,1469,1199.0,1.0
1,2015-01-04,28,21364,479.0,1.0
2,2015-01-04,28,21365,999.0,2.0
3,2015-01-04,28,22104,249.0,2.0
4,2015-01-04,28,22091,179.0,1.0
...,...,...,...,...,...
4540,2015-01-04,15,4240,1299.0,1.0
4541,2015-01-04,14,21922,99.0,1.0
4542,2015-01-04,15,1969,3999.0,1.0
4543,2015-01-04,14,22091,179.0,1.0


In [27]:
raw_sales2=raw_sales.copy()

In [28]:
raw_sales2

Unnamed: 0,date,shop_id,item_id,item_price,item_cnt_day
0,2015-01-04,29,1469,1199.0,1.0
1,2015-01-04,28,21364,479.0,1.0
2,2015-01-04,28,21365,999.0,2.0
3,2015-01-04,28,22104,249.0,2.0
4,2015-01-04,28,22091,179.0,1.0
...,...,...,...,...,...
4540,2015-01-04,15,4240,1299.0,1.0
4541,2015-01-04,14,21922,99.0,1.0
4542,2015-01-04,15,1969,3999.0,1.0
4543,2015-01-04,14,22091,179.0,1.0


In [31]:
raw_sales2["revenue"]=raw_sales2["item_price"]*raw_sales2["item_cnt_day"]

In [32]:
raw_sales2[["revenue"]]

Unnamed: 0,revenue
0,1199.0
1,479.0
2,1998.0
3,498.0
4,179.0
...,...
4540,1299.0
4541,99.0
4542,3999.0
4543,179.0


In [34]:
raw_sales2

Unnamed: 0,date,shop_id,item_id,item_price,item_cnt_day,revenue
0,2015-01-04,29,1469,1199.0,1.0,1199.0
1,2015-01-04,28,21364,479.0,1.0,479.0
2,2015-01-04,28,21365,999.0,2.0,1998.0
3,2015-01-04,28,22104,249.0,2.0,498.0
4,2015-01-04,28,22091,179.0,1.0,179.0
...,...,...,...,...,...,...
4540,2015-01-04,15,4240,1299.0,1.0,1299.0
4541,2015-01-04,14,21922,99.0,1.0,99.0
4542,2015-01-04,15,1969,3999.0,1.0,3999.0
4543,2015-01-04,14,22091,179.0,1.0,179.0


In [42]:
#One aggregate per store that adds up units sold and revenue:
#(summing item_id, item_price or date is not meaningful, so only these two columns are selected)
sales_shop_id=raw_sales2.groupby("shop_id")[["item_cnt_day", "revenue"]].sum()

In [43]:
sales_shop_id.rename(columns={'item_cnt_day':'Items Sold/Day',
                          'revenue':'Revenue/Day'}, inplace=True)
sales_shop_id

shop_id,Items Sold/Day,Revenue/Day
2,81.0,103746.0
3,33.0,67443.0
4,39.0,29361.0
5,45.0,33138.0
6,150.0,138678.0
7,63.0,52371.0
10,30.0,22716.0
12,216.0,295173.0
14,51.0,57450.0
15,93.0,125139.0


In [48]:
sales_shop_id.index.name='Shop ID'
sales_shop_id

Shop ID,Items Sold/Day,Revenue/Day
2,81.0,103746.0
3,33.0,67443.0
4,39.0,29361.0
5,45.0,33138.0
6,150.0,138678.0
7,63.0,52371.0
10,30.0,22716.0
12,216.0,295173.0
14,51.0,57450.0
15,93.0,125139.0


In [None]:
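# Per-shop summary built with named aggregations on the row-level data
shop_summary = raw_sales2.groupby("shop_id").agg(
    items_sold=("item_cnt_day", "sum"),
    revenue=("revenue", "sum"),
    avg_price=("item_price", "mean"),
    price_range=("item_price", lambda x: x.max() - x.min()),
)
shop_summary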


In [49]:
raw_sales2

Unnamed: 0,date,shop_id,item_id,item_price,item_cnt_day,revenue
0,2015-01-04,29,1469,1199.0,1.0,1199.0
1,2015-01-04,28,21364,479.0,1.0,479.0
2,2015-01-04,28,21365,999.0,2.0,1998.0
3,2015-01-04,28,22104,249.0,2.0,498.0
4,2015-01-04,28,22091,179.0,1.0,179.0
...,...,...,...,...,...,...
4540,2015-01-04,15,4240,1299.0,1.0,1299.0
4541,2015-01-04,14,21922,99.0,1.0,99.0
4542,2015-01-04,15,1969,3999.0,1.0,3999.0
4543,2015-01-04,14,22091,179.0,1.0,179.0


In [50]:
#One aggregate per item that adds up units sold and revenue:
sales_item_id=raw_sales2.groupby("item_id")[["item_cnt_day", "revenue"]].sum()

In [51]:
sales_item_id

item_id,item_cnt_day,revenue
30,3.0,507.0
31,3.0,1089.0
32,3.0,447.0
42,3.0,897.0
59,3.0,747.0
...,...,...
22091,6.0,1074.0
22092,3.0,537.0
22104,6.0,1494.0
22140,3.0,652.5


In [52]:
sales_item_id.index.name="Item ID"

In [53]:
sales_item_id

Item ID,item_cnt_day,revenue
30,3.0,507.0
31,3.0,1089.0
32,3.0,447.0
42,3.0,897.0
59,3.0,747.0
...,...,...
22091,6.0,1074.0
22092,3.0,537.0
22104,6.0,1494.0
22140,3.0,652.5


In [54]:
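# rename() returns a new DataFrame, so assign it back to keep the new column names
sales_item_id = sales_item_id.rename(columns={"item_cnt_day": "Units Sold/Day", "revenue": "Revenue/Day"})
sales_item_id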

Item ID,Units Sold/Day,Revenue/Day
30,3.0,507.0
31,3.0,1089.0
32,3.0,447.0
42,3.0,897.0
59,3.0,747.0
...,...,...
22091,6.0,1074.0
22092,3.0,537.0
22104,6.0,1494.0
22140,3.0,652.5


---

In [None]:
#use "describe"

In [None]:
raw_sales2.describe()

In [None]:
#raw_sales_shop=raw_sales.groupby("shop_id").count()

In [None]:
# Minimum and maximum price observed for each item
item_price_deets=raw_sales2.groupby("item_id")[["item_price"]].agg(["min", "max"])
item_price_deets.head()

In [None]:
item_price_deets.loc[:, ("item_price", "min")]

In [None]:
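# Keep only items whose price varied; .copy() avoids SettingWithCopyWarning on later assignments
temp = item_price_deets.loc[item_price_deets[("item_price", "min")] != item_price_deets[("item_price", "max")]].copy()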

In [None]:
temp.loc[:,("item_price","diff")] = temp.loc[:, ("item_price", "max")] - temp.loc[:,("item_price", "min")]

In [None]:
temp["diff"] = temp[("item_price", "max")] - temp[("item_price", "min")]

In [None]:
temp.columns.drop("item_price", level=0)

In [None]:
temp["min"] = temp[("item_price", "min")]

In [None]:
temp

In [None]:
# True only if every item has the same minimum and maximum price
item_price_deets[("item_price", "min")].equals(item_price_deets[("item_price", "max")])
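
The price check explored in the cells above can also be written without MultiIndex columns. A minimal sketch on made-up rows: selecting the `item_price` Series before calling `agg` yields flat `min` and `max` columns, so the difference is a plain column assignment. The names `price_stats` and `varying` are illustrative.

```python
import pandas as pd

# Made-up rows: item 10 changes price, item 11 does not.
sales = pd.DataFrame({
    "item_id": [10, 10, 11, 11],
    "item_price": [100.0, 120.0, 50.0, 50.0],
})

# Aggregating a single Series gives flat column names ("min", "max").
price_stats = sales.groupby("item_id")["item_price"].agg(["min", "max"])
price_stats["diff"] = price_stats["max"] - price_stats["min"]

# Items sold at more than one price.
varying = price_stats[price_stats["diff"] > 0]
```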