## Context
You work in the data analysis team of a very important company. On Monday, the company shares some good news with you: it has just been hired by a major retail company! So, let's get prepared for a huge amount of work!

Then you get to work with your team and define the following tasks to perform:   
1. You need to start your analysis using data from the past.  
2. You need to define a process that takes your daily data as an input and integrates it.  

You are in charge of the second part, so you are provided with a sample file that you will have to read daily. To complete your task, you need the following aggregates:
* One aggregate per store that adds up the rest of the values.
* One aggregate per item that adds up the rest of the values.

You can import the `raw_sales` table from the `retail_sales` database, one of Ironhack's databases.

## Your task
Therefore, your process will consist of the following steps:
1. Read the sample file that a daily process will save in your folder. 
2. Clean up the data.
3. Create the aggregates.
4. Write three tables in your local database: 
    - A table for the cleaned data.
    - A table for the aggregate per store.
    - A table for the aggregate per item.
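
Step 4 above can be sketched as follows. This is a minimal illustration, not the required solution: it assumes a SQLite database accessed through the standard-library `sqlite3` module (your local database may be a different engine), and the helper `write_tables` and the table names `clean_sales`, `sales_per_store` and `sales_per_item` are hypothetical. The sample rows are made up and only mirror the columns of the daily file.

```python
import sqlite3

import pandas as pd


def write_tables(tables, conn):
    """Write each DataFrame to a table named after its dict key (hypothetical helper)."""
    for name, df in tables.items():
        # if_exists="replace" drops and recreates the table on every run
        df.to_sql(name, conn, if_exists="replace", index=False)


# Made-up sample with the same columns as the daily file
sample = pd.DataFrame({
    "date": ["2015-01-04", "2015-01-04", "2015-01-04"],
    "shop_id": [28, 28, 29],
    "item_id": [21364, 21365, 1469],
    "item_price": [479.0, 999.0, 1199.0],
    "item_cnt_day": [1.0, 2.0, 1.0],
})
sample["item_revenue"] = sample["item_price"] * sample["item_cnt_day"]

# One aggregate per store and one per item
per_store = sample.groupby("shop_id", as_index=False)[["item_cnt_day", "item_revenue"]].sum()
per_item = sample.groupby("item_id", as_index=False)[["item_cnt_day", "item_revenue"]].sum()

# In-memory database for illustration; use a file path or your own connection instead
conn = sqlite3.connect(":memory:")
write_tables(
    {"clean_sales": sample, "sales_per_store": per_store, "sales_per_item": per_item},
    conn,
)

# Read one table back to check that the write worked
stored_per_store = pd.read_sql("SELECT * FROM sales_per_store ORDER BY shop_id", conn)
```

Because the file arrives daily, `if_exists="append"` may suit the cleaned-data table better than `"replace"`, depending on whether each run should add to the history or overwrite it.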

## Instructions
* Clean the data and create the aggregates as you see fit.
* Create the tables in your local database.
* Populate them with your process.

In [29]:
# Importing raw sales
import pandas as pd
raw_sales=pd.read_csv("../data/retail_sales-raw_sales.csv", sep=";")
# Exploring the DataFrame: column types, non-null counts and a preview
print(raw_sales.info())
raw_sales

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4545 entries, 0 to 4544
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   date          4545 non-null   object 
 1   shop_id       4545 non-null   int64  
 2   item_id       4545 non-null   int64  
 3   item_price    4545 non-null   float64
 4   item_cnt_day  4545 non-null   float64
dtypes: float64(2), int64(2), object(1)
memory usage: 159.8+ KB
None


,date,shop_id,item_id,item_price,item_cnt_day
0,2015-01-04 00:00:00,29,1469,1199.0,1.0
1,2015-01-04 00:00:00,28,21364,479.0,1.0
2,2015-01-04 00:00:00,28,21365,999.0,2.0
3,2015-01-04 00:00:00,28,22104,249.0,2.0
4,2015-01-04 00:00:00,28,22091,179.0,1.0
...,...,...,...,...,...
4540,2015-01-04 00:00:00,15,4240,1299.0,1.0
4541,2015-01-04 00:00:00,14,21922,99.0,1.0
4542,2015-01-04 00:00:00,15,1969,3999.0,1.0
4543,2015-01-04 00:00:00,14,22091,179.0,1.0
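
The `info()` output shows `date` stored as `object` rather than as a datetime, so a cleaning step could start there. The sketch below is one possible approach under stated assumptions, not a prescribed method: the function `clean_sales` is hypothetical, and treating exact duplicate rows and non-positive prices or counts as invalid is an assumption you should check against the business rules (negative counts might, for example, represent returns). The example rows are made up.

```python
import pandas as pd


def clean_sales(df):
    """Return a cleaned copy of the sales data (hypothetical helper)."""
    cleaned = df.copy()
    # Parse the date strings into proper datetime values
    cleaned["date"] = pd.to_datetime(cleaned["date"])
    # Assumption: exact duplicate rows are loading errors
    cleaned = cleaned.drop_duplicates()
    # Assumption: non-positive counts or prices are invalid records
    cleaned = cleaned[(cleaned["item_cnt_day"] > 0) & (cleaned["item_price"] > 0)]
    return cleaned.reset_index(drop=True)


# Made-up rows: one exact duplicate and one negative count
example = pd.DataFrame({
    "date": ["2015-01-04 00:00:00"] * 4,
    "shop_id": [29, 29, 28, 28],
    "item_id": [1469, 1469, 21364, 21365],
    "item_price": [1199.0, 1199.0, 479.0, 999.0],
    "item_cnt_day": [1.0, 1.0, 1.0, -1.0],
})
cleaned_example = clean_sales(example)
```

Here the duplicate row and the row with a negative count are dropped, leaving two rows with a datetime `date` column.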


In [30]:
# Generate a column with the revenue per product in each store
raw_sales["item_revenue"]=raw_sales["item_price"]*raw_sales["item_cnt_day"]

In [48]:
# Create a new dataframe with the information relevant to analyze shops
shop_info=raw_sales[["shop_id", "item_id","item_price", "item_cnt_day", "item_revenue"]]
# Group by shop_id and calculate the mean selling price, the items sold and the total revenue generated per shop
shop_info.groupby("shop_id").aggregate({"item_price":"mean", "item_cnt_day":"sum", "item_revenue":"sum"})

shop_id,item_price,item_cnt_day,item_revenue
2,1320.94,81.0,103746.0
3,2043.727273,33.0,67443.0
4,752.846154,39.0,29361.0
5,736.4,45.0,33138.0
6,923.428571,150.0,138678.0
7,831.285714,63.0,52371.0
10,841.0,30.0,22716.0
12,1473.586111,216.0,295173.0
14,743.466667,51.0,57450.0
15,1345.580645,93.0,125139.0


In [51]:
# Create a new dataframe with the information relevant to analyze items
item_info=raw_sales[["item_id","item_price", "item_cnt_day", "item_revenue"]]
# Group by item_id and price, calculate the items sold and the total revenue generated per item
item_info.groupby(["item_id", "item_price"]).aggregate({"item_cnt_day":"sum", "item_revenue":"sum"})

item_id,item_price,item_cnt_day,item_revenue
30,169.0,3.0,507.0
31,363.0,3.0,1089.0
32,149.0,3.0,447.0
42,299.0,3.0,897.0
59,249.0,3.0,747.0
...,...,...,...
22091,179.0,6.0,1074.0
22092,179.0,3.0,537.0
22104,249.0,6.0,1494.0
22140,217.5,3.0,652.5
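
Grouping by both `item_id` and `item_price` produces one row per item and price combination, so an item sold at several prices would appear more than once. If exactly one row per item is wanted, a possible alternative is to group by `item_id` alone and average the price. The sketch below uses made-up rows and hypothetical output column names (`mean_price`, `items_sold`, `total_revenue`); it relies on pandas named aggregation.

```python
import pandas as pd

# Made-up rows: item 1 is sold at two different prices
example = pd.DataFrame({
    "item_id": [1, 1, 2],
    "item_price": [200.0, 235.0, 179.0],
    "item_cnt_day": [1.0, 2.0, 6.0],
})
example["item_revenue"] = example["item_price"] * example["item_cnt_day"]

# One row per item: mean price, units sold and total revenue
item_summary = example.groupby("item_id", as_index=False).agg(
    mean_price=("item_price", "mean"),
    items_sold=("item_cnt_day", "sum"),
    total_revenue=("item_revenue", "sum"),
)
```

Using `as_index=False` keeps `item_id` as a regular column, which makes the result easier to write to a database table.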
