## Context
You work in the data analysis team of a very important company. On Monday, the company shares some good news with you: it has just been hired by a major retail company! So get ready for a huge amount of work.

Then you get to work with your team and define the following tasks to perform:   
1. You need to start your analysis using data from the past.  
2. You need to define a process that takes your daily data as an input and integrates it.  

You are in charge of the second part, so you are provided with a sample file that you will have to read daily. To complete your task, you need the following aggregates:
* One aggregate per store that adds up the rest of the values.
* One aggregate per item that adds up the rest of the values.

You can import the `raw_sales` table from the `retail_sales` database, one of Ironhack's databases.

## Your task
Therefore, your process will consist of the following steps:
1. Read the sample file that a daily process will save in your folder. 
2. Clean up the data.
3. Create the aggregates.
4. Write three tables in your local database: 
    - A table for the cleaned data.
    - A table for the aggregate per store.
    - A table for the aggregate per item.
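
Steps 1 to 3 can be sketched as a single function, which makes the process easy to rerun every day. This is only a minimal sketch: the function name `build_tables`, the cleaning rules (drop incomplete rows, integer counts) and the three-row sample are assumptions made for illustration. The column names and the `;` separator follow the sample file used later in this notebook.

```python
import io

import pandas as pd


def build_tables(source):
    """Steps 1 to 3: read the daily file, clean it, and build both aggregates."""
    raw = pd.read_csv(source, sep=";")

    # Cleaning rules assumed for this sketch: drop incomplete rows, use integer counts
    cleaned = raw.dropna().copy()
    cleaned["item_cnt_day"] = cleaned["item_cnt_day"].astype("int64")

    # Only the value columns are summed; adding up IDs would not mean anything
    value_columns = ["item_price", "item_cnt_day"]
    per_store = cleaned.groupby("shop_id")[value_columns].sum()
    per_item = cleaned.groupby("item_id")[value_columns].sum()
    return {"cleaned": cleaned, "per_store": per_store, "per_item": per_item}


# Hypothetical three-row file standing in for the daily sample
sample_file = io.StringIO(
    "date;shop_id;item_id;item_price;item_cnt_day\n"
    "2015-01-04;28;21364;479.0;1.0\n"
    "2015-01-04;28;22091;179.0;1.0\n"
    "2015-01-04;14;22091;179.0;1.0\n"
)
tables = build_tables(sample_file)
```

In a daily run, `source` would be the path of the file dropped in your folder instead of the in-memory sample.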

## Instructions
* Clean the data and create the aggregates as you see fit.
* Create the tables in your local database.
* Populate them with your process.

In [68]:
import pandas as pd 
import numpy as np


In [69]:
retail_raw = pd.read_csv("../data/retail_sales-raw_sales.csv", sep = ";")

In [70]:
# Price and count will be converted to integers in the next cell.
# First, rename the columns.

new_names = ["Date", "Shop_Id",  "Item_ID", "Item_Price", "Item_count_Day"]
columns_1 = ["date", "shop_id", "item_id", "item_price", "item_cnt_day"]

retail_raw.rename(columns=dict(zip(columns_1, new_names)), inplace=True)
retail_raw

Unnamed: 0,Date,Shop_Id,Item_ID,Item_Price,Item_count_Day
0,2015-01-04 00:00:00,29,1469,1199.0,1.0
1,2015-01-04 00:00:00,28,21364,479.0,1.0
2,2015-01-04 00:00:00,28,21365,999.0,2.0
3,2015-01-04 00:00:00,28,22104,249.0,2.0
4,2015-01-04 00:00:00,28,22091,179.0,1.0
...,...,...,...,...,...
4540,2015-01-04 00:00:00,15,4240,1299.0,1.0
4541,2015-01-04 00:00:00,14,21922,99.0,1.0
4542,2015-01-04 00:00:00,15,1969,3999.0,1.0
4543,2015-01-04 00:00:00,14,22091,179.0,1.0


In [71]:
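# Prices and daily counts are read as floats; cast them to integers
# (this assumes both columns hold whole numbers and no missing values)
retail_raw["Item_Price"] = retail_raw["Item_Price"].astype("int32")
retail_raw["Item_count_Day"] = retail_raw["Item_count_Day"].astype("int32")

# Take an explicit copy so later changes to retail_raw do not alter the cleaned data
retail_raw_cleaned = retail_raw.copy()

retail_raw_cleaned.to_csv("Cleaned_data.csv")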
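
One cleaning step that may be worth adding is a check for repeated rows. Every total shown in the aggregate outputs of this notebook is a multiple of three, which would be consistent with rows being repeated in the sample file. That is only an inference from the printed figures, so check it against the real data. The sketch below uses a made-up frame named `toy` with the renamed columns.

```python
import pandas as pd

# Made-up rows: the first sale appears three times, the second once
toy = pd.DataFrame({
    "Date": ["2015-01-04"] * 4,
    "Shop_Id": [28, 28, 28, 14],
    "Item_ID": [21364, 21364, 21364, 22091],
    "Item_Price": [479, 479, 479, 179],
    "Item_count_Day": [1, 1, 1, 1],
})

# duplicated() flags every repeat of an earlier, fully identical row
n_repeated = int(toy.duplicated().sum())

# Keep the first occurrence of each row and rebuild a clean 0..n-1 index
deduplicated = toy.drop_duplicates().reset_index(drop=True)
```

Note that `drop_duplicates` would also remove two genuine, identical sales recorded as separate rows, so it is only safe if the source system never records sales that way.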

In [72]:
# Aggregate the sales by item ID into a new data frame

# numeric_only=True keeps the non-numeric Date column out of the sum
Raw_sales_by_item = retail_raw.groupby("Item_ID").sum(numeric_only=True)



Raw_sales_by_item.to_csv('Raw_sales_by_Item.csv')

Raw_sales_by_item

Item_ID,Shop_Id,Item_Price,Item_count_Day
30,84,507,3
31,18,1089,3
32,93,447,3
42,162,897,3
59,171,747,3
...,...,...,...
22091,126,1074,6
22092,144,537,3
22104,84,747,6
22140,117,651,3
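
The table above adds up every numeric column, so the `Shop_Id` total has no business meaning and the `Item_Price` total equals revenue only when each row sold a single unit. A possible alternative, sketched here with made-up rows, is to name each aggregate explicitly. The output column names `Units_Sold`, `Revenue` and `Shops` are assumptions chosen for this example.

```python
import pandas as pd

# Made-up rows using the renamed columns
toy = pd.DataFrame({
    "Shop_Id": [28, 14, 28],
    "Item_ID": [22091, 22091, 22104],
    "Item_Price": [179, 179, 249],
    "Item_count_Day": [1, 1, 2],
})

# Revenue per row is the unit price times the units sold that day
toy["Revenue"] = toy["Item_Price"] * toy["Item_count_Day"]

# Named aggregation: each output column states its source column and function
per_item = toy.groupby("Item_ID").agg(
    Units_Sold=("Item_count_Day", "sum"),
    Revenue=("Revenue", "sum"),
    Shops=("Shop_Id", "nunique"),
)
```

The same pattern works per store by grouping on `Shop_Id` and counting distinct `Item_ID` values instead.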


In [73]:
# Aggregate the sales by store ID into a new data frame

# numeric_only=True keeps the non-numeric Date column out of the sum
Raw_sales_by_store = retail_raw.groupby("Shop_Id").sum(numeric_only=True)



Raw_sales_by_store.to_csv('Raw_sales_by_Store.csv')

Raw_sales_by_store

Shop_Id,Item_ID,Item_Price,Item_count_Day
2,966879,99069,81
3,335745,67443,33
4,498624,29361,39
5,620868,33138,45
6,1266894,116349,150
7,669045,52371,63
10,310137,22707,30
12,1647339,212193,216
14,421977,33456,51
15,1210026,125139,93
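
The task asks for three tables in a local database, while the cells above write CSV files. A minimal sketch of that last step follows, assuming an in-memory SQLite database as a stand-in for whatever local database you use. The table names and the made-up rows are also assumptions.

```python
import sqlite3

import pandas as pd

# Made-up cleaned data with the renamed columns
cleaned = pd.DataFrame({
    "Date": ["2015-01-04"] * 3,
    "Shop_Id": [28, 14, 28],
    "Item_ID": [22091, 22091, 22104],
    "Item_Price": [179, 179, 249],
    "Item_count_Day": [1, 1, 2],
})
value_columns = ["Item_Price", "Item_count_Day"]
by_store = cleaned.groupby("Shop_Id")[value_columns].sum()
by_item = cleaned.groupby("Item_ID")[value_columns].sum()

# Stand-in connection; a real run would connect to the local database instead
con = sqlite3.connect(":memory:")

# if_exists="replace" rebuilds each table on every run
cleaned.to_sql("raw_sales_cleaned", con, if_exists="replace", index=False)
# The group key is the index here, so it is written as a regular column
by_store.to_sql("sales_by_store", con, if_exists="replace")
by_item.to_sql("sales_by_item", con, if_exists="replace")

# Read one table back to confirm the write
stored = pd.read_sql("SELECT * FROM sales_by_store ORDER BY Shop_Id", con)
```

For a process that accumulates data day after day, `if_exists="append"` on the cleaned table may be a better fit, with the aggregates recomputed from the full table.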
