# Mandatory Challenge
## Context
You work in the data analysis team of a very important company. On Monday, the company shares some good news with you: your team has just been hired by a major retail company! So, get ready for a huge amount of work!

Then you get to work with your team and define the following tasks to perform:   
1. You need to start your analysis using data from the past.  
2. You need to define a process that takes your daily data as an input and integrates it.  

You are in charge of the second part, so you are provided with a sample file that you will have to read daily. To complete your task, you need the following aggregates:
* One aggregate per store that adds up the rest of the values.
* One aggregate per item that adds up the rest of the values.
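
As a minimal sketch of what these two aggregates look like, the toy DataFrame below is grouped once per store and once per item. The rows are made up, and the `revenue` column is an assumption added for illustration; only the names `shop_id`, `item_id`, `item_price`, and `item_cnt_day` come from the dataset.

```python
import pandas as pd

# Toy rows with made-up values; column names follow the raw_sales table.
sales = pd.DataFrame({
    "shop_id": [28, 28, 29],
    "item_id": [1469, 21364, 1469],
    "item_price": [1199.0, 479.0, 1199.0],
    "item_cnt_day": [1.0, 2.0, 1.0],
})

# Assumed helper column: price times units sold for each row.
sales["revenue"] = sales["item_price"] * sales["item_cnt_day"]

# One row per store: add up units and revenue.
per_store = sales.groupby("shop_id")[["item_cnt_day", "revenue"]].sum()

# One row per item: the same sums, grouped by item instead.
per_item = sales.groupby("item_id")[["item_cnt_day", "revenue"]].sum()

print(per_store)
print(per_item)
```

Selecting the columns to sum explicitly keeps identifiers such as `id` or `shop_id` out of the totals, where a sum would carry no meaning.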

You can import the dataset `retail_sales` from Ironhack's database. 

## Your task
Therefore, your process will consist of the following steps:
1. Read the sample file that a daily process will save in your folder. 
2. Clean up the data.
3. Create the aggregates.
4. Write three tables in your local database: 
    - A table for the cleaned data.
    - A table for the aggregate per store.
    - A table for the aggregate per item.

## Instructions
* Read the CSV file available in Ironhack's database.
* Clean the data and create the aggregates as you see fit.
* Create the tables in your local database.
* Populate them with your process.
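
One possible shape for such a daily process is sketched below. The function names (`clean`, `aggregate`, `run_daily`) are hypothetical, the cleaning rules (drop exact duplicate rows and rows with missing values) are assumptions rather than requirements of the challenge, and an in-memory string stands in for the daily file.

```python
import io
import pandas as pd

REQUIRED = ["shop_id", "item_id", "item_price", "item_cnt_day"]

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows and rows missing a required value (assumed rules)."""
    df = df.drop_duplicates()
    return df.dropna(subset=REQUIRED)

def aggregate(df: pd.DataFrame, key: str) -> pd.DataFrame:
    """Sum the units sold for each value of `key`."""
    return df.groupby(key, as_index=False)["item_cnt_day"].sum()

def run_daily(source) -> dict:
    """Read one daily file and return the three tables to be stored."""
    cleaned = clean(pd.read_csv(source))
    return {
        "cleaned": cleaned,
        "per_store": aggregate(cleaned, "shop_id"),
        "per_item": aggregate(cleaned, "item_id"),
    }

# Stand-in for the daily file, with made-up rows:
# one exact duplicate and one row with a missing count.
sample = io.StringIO(
    "shop_id,item_id,item_price,item_cnt_day\n"
    "28,21364,479.0,1.0\n"
    "28,21364,479.0,1.0\n"
    "29,1469,1199.0,\n"
    "28,22104,249.0,2.0\n"
)
tables = run_daily(sample)
print(tables["per_store"])
```

Whether exact duplicate rows are errors or genuine repeated sales depends on the data source, so that rule should be checked before it is relied on.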

In [1]:
# your code here
from sqlalchemy import create_engine
import pymysql  # MySQL driver used by the 'mysql+pymysql' dialect
import pandas as pd

# Connection settings for the local MySQL server
driver = 'mysql+pymysql'
ip = '127.0.0.1'
username = 'root'
password = 's2msung2'
db = 'retail_sales'
connection_string = f'{driver}://{username}:{password}@{ip}/{db}'
engine = create_engine(connection_string)

# Join the per-item index with the raw daily sales on item_id
query = 'select * from sales_by_item_index si join raw_sales rs on si.item_id = rs.item_id'

In [2]:
retail_sales = pd.read_sql(query,engine)
retail_sales

,id,item_id,item_earnings,total_items_sold,date,date,shop_id,item_id,item_price,item_cnt_day
0,43,1469,3597.0,3.0,03/12/2019,2015-01-04,29,1469,1199.0,1.0
1,940,21364,7616.0,22.0,03/12/2019,2015-01-04,28,21364,479.0,1.0
2,941,21365,9890.0,12.0,03/12/2019,2015-01-04,28,21365,999.0,2.0
3,983,22104,249.0,2.0,03/12/2019,2015-01-04,28,22104,249.0,2.0
4,981,22091,358.0,2.0,03/12/2019,2015-01-04,28,22091,179.0,1.0
...,...,...,...,...,...,...,...,...,...,...
4540,202,4240,1299.0,1.0,03/12/2019,2015-01-04,15,4240,1299.0,1.0
4541,973,21922,198.0,2.0,03/12/2019,2015-01-04,14,21922,99.0,1.0
4542,72,1969,63384.0,22.0,03/12/2019,2015-01-04,15,1969,3999.0,1.0
4543,981,22091,358.0,2.0,03/12/2019,2015-01-04,14,22091,179.0,1.0
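
Because `select *` over a join returns `item_id` and `date` from both tables, the result carries repeated column names. A possible alternative, sketched here under assumptions, is to name each column once in the query. In-memory SQLite stands in for the MySQL database, the table layouts are inferred from the output above, and the aliases `index_date` and `sale_date` are hypothetical.

```python
import sqlite3
import pandas as pd

# In-memory SQLite stands in for the MySQL database (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table sales_by_item_index (
        id integer, item_id integer, item_earnings real,
        total_items_sold real, date text);
    create table raw_sales (
        date text, shop_id integer, item_id integer,
        item_price real, item_cnt_day real);
    insert into sales_by_item_index values (43, 1469, 3597.0, 3.0, '03/12/2019');
    insert into raw_sales values ('2015-01-04', 29, 1469, 1199.0, 1.0);
""")

# Name each column once and alias the two dates so they stay distinguishable.
query = """
    select si.id, si.item_id, si.item_earnings, si.total_items_sold,
           si.date as index_date, rs.date as sale_date,
           rs.shop_id, rs.item_price, rs.item_cnt_day
    from sales_by_item_index si
    join raw_sales rs on si.item_id = rs.item_id
"""
df = pd.read_sql(query, conn)
conn.close()

print(df.columns.tolist())
```

With unique names coming out of the query, no de-duplication step is needed afterwards and both dates remain available.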


In [32]:
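# Cleaned data
# Keep only the first occurrence of each repeated column name (item_id, date)
retail_sales = retail_sales.loc[:, ~retail_sales.columns.duplicated()]
# Drop the remaining date column
retail_sales = retail_sales.drop(columns=['date'])
retail_sales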

,id,item_id,item_earnings,total_items_sold,shop_id,item_price,item_cnt_day
0,43,1469,3597.0,3.0,29,1199.0,1.0
1,940,21364,7616.0,22.0,28,479.0,1.0
2,941,21365,9890.0,12.0,28,999.0,2.0
3,983,22104,249.0,2.0,28,249.0,2.0
4,981,22091,358.0,2.0,28,179.0,1.0
...,...,...,...,...,...,...,...
4540,202,4240,1299.0,1.0,15,1299.0,1.0
4541,973,21922,198.0,2.0,14,99.0,1.0
4542,72,1969,63384.0,22.0,15,3999.0,1.0
4543,981,22091,358.0,2.0,14,179.0,1.0
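
To illustrate how the `columns.duplicated()` mask used in the cleaning step behaves, here is a small sketch on a one-row toy frame with repeated column names. The values are copied from the first row of the joined output; the frame itself is an assumption built for the demo.

```python
import pandas as pd

# Toy frame with repeated column names, as produced by `select *` over a join.
df = pd.DataFrame(
    [[1469, "03/12/2019", "2015-01-04", 1469, 1199.0]],
    columns=["item_id", "date", "date", "item_id", "item_price"],
)

# duplicated() flags every repeat of a name after its first occurrence.
mask = df.columns.duplicated()
print(mask.tolist())  # [False, False, True, True, False]

# Keep only the first occurrence of each column name.
deduped = df.loc[:, ~mask]
print(deduped.columns.tolist())  # ['item_id', 'date', 'item_price']
```

The mask works on names only, so the column that survives is whichever one appears first in the query result, regardless of which table it came from.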


In [39]:
# Aggregate per store
# Drop item_id so it is not summed, then add up the remaining columns per shop
store_c = retail_sales.drop(columns=['item_id'])
store = store_c.groupby(['shop_id']).sum()
store.head()

shop_id,id,item_earnings,total_items_sold,item_price,item_cnt_day
2,42957,439950.0,294.0,99070.5,81.0
3,15165,250596.0,147.0,67443.0,33.0
4,22488,133782.0,132.0,29361.0,39.0
5,28080,118863.0,138.0,33138.0,45.0
6,57111,496640.1,582.0,116352.0,150.0


In [45]:
# Aggregate per item
# Drop id and item_cnt_day, then add up the remaining columns per item
item_c = retail_sales.drop(columns=['id', 'item_cnt_day'])
item_id = item_c.groupby(['item_id']).sum()
item_id.head()

item_id,item_earnings,total_items_sold,shop_id,item_price
30,507.0,3.0,84,507.0
31,1089.0,3.0,18,1089.0
32,447.0,3.0,93,447.0
42,897.0,3.0,162,897.0
59,747.0,3.0,171,747.0
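
The remaining step of the task, writing the three tables to the local database, could be sketched with `DataFrame.to_sql` as follows. This is a sketch under assumptions: the table names are hypothetical, the DataFrames are toy stand-ins for the ones built in the notebook, and in-memory SQLite replaces MySQL so that the example runs on its own.

```python
import sqlite3
import pandas as pd

# Toy stand-ins for the three DataFrames built in the notebook.
cleaned = pd.DataFrame({
    "item_id": [1469, 21364],
    "shop_id": [29, 28],
    "item_price": [1199.0, 479.0],
    "item_cnt_day": [1.0, 1.0],
})
per_store = cleaned.groupby("shop_id")[["item_price", "item_cnt_day"]].sum()
per_item = cleaned.groupby("item_id")[["item_price", "item_cnt_day"]].sum()

# In-memory SQLite keeps the sketch runnable; with MySQL, pass the
# SQLAlchemy engine instead of `conn`.
conn = sqlite3.connect(":memory:")

# Hypothetical table names. if_exists="replace" overwrites the table on each
# run; "append" would accumulate the daily loads instead.
cleaned.to_sql("sales_cleaned", conn, if_exists="replace", index=False)
# For the aggregates the index (shop_id / item_id) is written as a column.
per_store.to_sql("sales_per_store", conn, if_exists="replace")
per_item.to_sql("sales_per_item", conn, if_exists="replace")

# Read one table back to check what was stored.
stored = pd.read_sql("select * from sales_per_store order by shop_id", conn)
conn.close()
print(stored)
```

Choosing between `replace` and `append` depends on whether each daily run should rebuild the tables or add to them; the challenge text does not say which is expected.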
