# Mandatory Challenge
## Context
You work in the data analysis team of a very important company. On Monday, the company shares some good news with you: it has just been hired by a major retail company! So get ready for a huge amount of work!

You get to work with your team and define the following tasks:
1. Start the analysis using historical data.
2. Define a process that takes the daily data as input and integrates it into the database.

You are in charge of the second part, so you are given a sample of the file that your process will have to read every day. To complete your task, you need the following aggregates:
* One aggregate per store (`shop_id`) that sums the remaining values.
* One aggregate per item (`item_id`) that sums the remaining values.

You can import the `raw_sales` table from the `retail_sales` database, one of Ironhack's databases.

## Your task
Therefore, your process will consist of the following steps:
1. Read the sample file that a daily process will save in your folder. 
2. Clean up the data.
3. Create the aggregates.
4. Write three tables in your local database: 
    - A table for the cleaned data.
    - A table for the aggregate per store.
    - A table for the aggregate per item.
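The four steps can be sketched as a single function that a scheduler could call once per day. This is a minimal sketch, not the notebook's own code: `run_daily_process` is a hypothetical name, the cleaning rule (drop missing values and exact duplicates) is an assumption, and the inline rows are the first five rows of the sample file shown further down, standing in for the real file.

```python
import io

import pandas as pd
from sqlalchemy import create_engine


def run_daily_process(csv_source, engine):
    """Hypothetical daily job: read, clean, aggregate and store one file."""
    # 1. Read the daily file, parsing the date column as datetime
    sales = pd.read_csv(csv_source, parse_dates=["date"])
    # 2. Clean (assumed rules): drop missing values and exact duplicates, add revenue
    sales = (
        sales.dropna()
        .drop_duplicates()
        .assign(revenue=lambda d: d["item_price"] * d["item_cnt_day"])
    )
    # 3. Aggregate: sum units sold and revenue per shop and per item
    value_cols = ["item_cnt_day", "revenue"]
    per_shop = sales.groupby("shop_id")[value_cols].sum()
    per_item = sales.groupby("item_id")[value_cols].sum()
    # 4. Write the three tables (the aggregates keep their id index as a column)
    sales.to_sql("raw_sales", con=engine, if_exists="append", index=False)
    per_shop.to_sql("revenue_per_shop", con=engine, if_exists="append")
    per_item.to_sql("revenue_per_item", con=engine, if_exists="append")
    return sales, per_shop, per_item


# First rows of the sample file, as shown in the output of the reading step
sample = io.StringIO(
    "date,shop_id,item_id,item_price,item_cnt_day\n"
    "2015-01-04 00:00:00,29,1469,1199.0,1\n"
    "2015-01-04 00:00:00,28,21364,479.0,1\n"
    "2015-01-04 00:00:00,28,21365,999.0,2\n"
    "2015-01-04 00:00:00,28,22104,249.0,2\n"
    "2015-01-04 00:00:00,28,22091,179.0,1\n"
)
engine = create_engine("sqlite://")  # in-memory database, as in the notebook
sales, per_shop, per_item = run_daily_process(sample, engine)
print(per_shop)
```

Keeping all four steps in one function makes the daily run a single call with the file path and the engine as its only inputs.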

## Instructions
* Clean the data and create the aggregates as you see fit.
* Create the tables in your local database.
* Populate them with your process.

```python
# Importing needed packages
import pandas as pd
import numpy as np
import pymysql
```

## Reading the sample file

```python
raw_sales = pd.read_csv(r'/Users/francesco/Desktop/DataFrames/raw_sales.csv')
raw_sales.head()
```

|   | date | shop_id | item_id | item_price | item_cnt_day |
|---|------|---------|---------|------------|--------------|
| 0 | 2015-01-04 00:00:00 | 29 | 1469 | 1199.0 | 1 |
| 1 | 2015-01-04 00:00:00 | 28 | 21364 | 479.0 | 1 |
| 2 | 2015-01-04 00:00:00 | 28 | 21365 | 999.0 | 2 |
| 3 | 2015-01-04 00:00:00 | 28 | 22104 | 249.0 | 2 |
| 4 | 2015-01-04 00:00:00 | 28 | 22091 | 179.0 | 1 |
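Parsing `date` as a datetime and pinning the numeric dtypes at read time makes a malformed daily file fail early instead of being loaded silently. A minimal sketch, assuming the file always has exactly the five columns shown above; the inline CSV reuses those rows in place of the real file path.

```python
import io

import pandas as pd

# First rows of the sample file, as shown in the output above
csv_text = """date,shop_id,item_id,item_price,item_cnt_day
2015-01-04 00:00:00,29,1469,1199.0,1
2015-01-04 00:00:00,28,21364,479.0,1
2015-01-04 00:00:00,28,21365,999.0,2
2015-01-04 00:00:00,28,22104,249.0,2
2015-01-04 00:00:00,28,22091,179.0,1
"""

# Parse the date column and fix the numeric dtypes while reading
sales = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["date"],
    dtype={
        "shop_id": "int64",
        "item_id": "int64",
        "item_price": "float64",
        "item_cnt_day": "int64",
    },
)
print(sales.dtypes)
```

With an explicit `int64` dtype, `read_csv` raises a `ValueError` if an integer column contains missing values, which surfaces a bad file at the reading step.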


## Cleaning the data

### Checking for missing values

```python
raw_sales.isnull().values.any()
```

```
False
```

### Counting the missing values

```python
raw_sales.isnull().sum()
```

```
date            0
shop_id         0
item_id         0
item_price      0
item_cnt_day    0
dtype: int64
```
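Checking for nulls covers only part of cleaning a daily file. The sketch below adds two more rules, both assumptions rather than requirements from the brief: exact duplicate rows are treated as a row loaded twice, and rows with a non-positive price are dropped. `clean_sales` is a hypothetical helper, and the sample rows come from the output above with one row deliberately repeated.

```python
import pandas as pd


def clean_sales(sales: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning step; each rule is an assumption, not part of the brief."""
    cleaned = sales.dropna()                       # rows with any missing value
    cleaned = cleaned.drop_duplicates()            # exact duplicates, e.g. a double load
    cleaned = cleaned[cleaned["item_price"] > 0]   # non-positive prices
    return cleaned.reset_index(drop=True)


# Rows from the sample output, with the third row repeated to mimic a double load
sales = pd.DataFrame({
    "date": ["2015-01-04 00:00:00"] * 6,
    "shop_id": [29, 28, 28, 28, 28, 28],
    "item_id": [1469, 21364, 21365, 22104, 22091, 21365],
    "item_price": [1199.0, 479.0, 999.0, 249.0, 179.0, 999.0],
    "item_cnt_day": [1, 1, 2, 2, 1, 2],
})
cleaned = clean_sales(sales)
print(len(sales), "->", len(cleaned))  # 6 -> 5
```

Since `item_cnt_day` suggests each row is already a daily count per shop and item, an exact duplicate is plausibly a double load; if the file could legitimately repeat identical rows, the duplicate rule would need revisiting.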

## Creating a revenue column per transaction

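```python
# Revenue of each row = unit price x units sold that day
raw_sales["revenue"] = raw_sales.item_price * raw_sales.item_cnt_day
raw_sales.head()
```

|   | date | shop_id | item_id | item_price | item_cnt_day | revenue |
|---|------|---------|---------|------------|--------------|---------|
| 0 | 2015-01-04 00:00:00 | 29 | 1469 | 1199.0 | 1 | 1199.0 |
| 1 | 2015-01-04 00:00:00 | 28 | 21364 | 479.0 | 1 | 479.0 |
| 2 | 2015-01-04 00:00:00 | 28 | 21365 | 999.0 | 2 | 1998.0 |
| 3 | 2015-01-04 00:00:00 | 28 | 22104 | 249.0 | 2 | 498.0 |
| 4 | 2015-01-04 00:00:00 | 28 | 22091 | 179.0 | 1 | 179.0 |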
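In formula form, the new column and the two aggregates computed in the next section are as follows, where $i$ indexes the rows of the file, $s$ a shop and $j$ an item (this notation is introduced here for illustration):

$$
\text{revenue}_i = \text{item\_price}_i \times \text{item\_cnt\_day}_i
$$

$$
R_s = \sum_{i \,:\, \text{shop\_id}_i = s} \text{revenue}_i,
\qquad
R_j = \sum_{i \,:\, \text{item\_id}_i = j} \text{revenue}_i
$$

For example, row 2 gives $999.0 \times 2 = 1998.0$, and over the five rows shown, shop 28 contributes $479 + 1998 + 498 + 179 = 3154$.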


## Creating aggregates

### Aggregates per shop

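```python
# Named aggregation yields a single-level 'revenue' column; the dict-of-lists form
# {'revenue': ['sum']} yields a two-level ('revenue', 'sum') header, awkward in SQL
revenue_per_shop = raw_sales.groupby("shop_id").agg(revenue=("revenue", "sum"))
revenue_per_shop.head(10)
```

| shop_id | revenue |
|---------|---------|
| 2 | 103746.0 |
| 3 | 67443.0 |
| 4 | 29361.0 |
| 5 | 33138.0 |
| 6 | 138678.0 |
| 7 | 52371.0 |
| 10 | 22716.0 |
| 12 | 295173.0 |
| 14 | 57450.0 |
| 15 | 125139.0 |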
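The brief asks for an aggregate per store that sums the remaining values, while the cell above sums only `revenue`. A minimal sketch of one interpretation, using named aggregation to sum both additive value columns: `item_cnt_day` (renamed `units_sold`, a hypothetical name) and `revenue`. `item_id` and `item_price` are left out, on the assumption that summing an identifier or a unit price is not meaningful.

```python
import pandas as pd

# The five sample rows shown earlier, with the revenue column added
sales = pd.DataFrame({
    "shop_id": [29, 28, 28, 28, 28],
    "item_id": [1469, 21364, 21365, 22104, 22091],
    "item_price": [1199.0, 479.0, 999.0, 249.0, 179.0],
    "item_cnt_day": [1, 1, 2, 2, 1],
})
sales["revenue"] = sales["item_price"] * sales["item_cnt_day"]

# Named aggregation: one flat output column per (source column, function) pair
per_shop = sales.groupby("shop_id").agg(
    units_sold=("item_cnt_day", "sum"),
    revenue=("revenue", "sum"),
)
print(per_shop)
```

The same call with `groupby("item_id")` gives the per-item aggregate.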


### Aggregates per item

```python
# Named aggregation keeps a single-level 'revenue' column, ready for to_sql
revenue_per_item = raw_sales.groupby("item_id").agg(revenue=("revenue", "sum"))
revenue_per_item.head(10)
```

| item_id | revenue |
|---------|---------|
| 30 | 507.0 |
| 31 | 1089.0 |
| 32 | 447.0 |
| 42 | 897.0 |
| 59 | 747.0 |
| 74 | 1497.0 |
| 109 | 747.0 |
| 259 | 747.0 |
| 464 | 897.0 |
| 482 | 39600.0 |
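Because every row belongs to exactly one shop and one item, both aggregates should add up to the same total revenue as the cleaned data. A cheap check like the sketch below could run before the tables are written; `aggregates_reconcile` is a hypothetical helper, and the relative tolerance guards against floating-point rounding.

```python
import math

import pandas as pd


def aggregates_reconcile(sales, per_shop, per_item, rel_tol=1e-9):
    """Hypothetical sanity check: per-shop and per-item totals match the row total."""
    total = sales["revenue"].sum()
    return (math.isclose(per_shop["revenue"].sum(), total, rel_tol=rel_tol)
            and math.isclose(per_item["revenue"].sum(), total, rel_tol=rel_tol))


# The five sample rows shown earlier (revenue already computed)
sales = pd.DataFrame({
    "shop_id": [29, 28, 28, 28, 28],
    "item_id": [1469, 21364, 21365, 22104, 22091],
    "revenue": [1199.0, 479.0, 1998.0, 498.0, 179.0],
})
per_shop = sales.groupby("shop_id").agg(revenue=("revenue", "sum"))
per_item = sales.groupby("item_id").agg(revenue=("revenue", "sum"))
print(aggregates_reconcile(sales, per_shop, per_item))  # True
```

By default `groupby` drops rows whose key is missing, so a file with a null `shop_id` or `item_id` would make this check return `False`.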


## Storing the tables

```python
# Create an in-memory SQLite database (its contents are lost when the process ends)
from sqlalchemy import create_engine
engine = create_engine('sqlite://', echo=False)
```
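`sqlite://` creates an in-memory database, so the tables vanish when the Python process ends, which does not fit a process meant to run every day. A minimal sketch of a file-based SQLite engine instead; the file location is a hypothetical temporary path. Since the notebook imports `pymysql`, a MySQL target may have been intended; SQLAlchemy's URL for that driver has the form `mysql+pymysql://user:password@host/database`, where all four parts are placeholders.

```python
import os
import tempfile

import pandas as pd
from sqlalchemy import create_engine

# Hypothetical location; a daily job would use a fixed path instead of a temp dir
db_path = os.path.join(tempfile.mkdtemp(), "retail_sales.db")
engine = create_engine(f"sqlite:///{db_path}")

# Per-shop revenue computed from the five sample rows shown earlier
per_shop = pd.DataFrame({"shop_id": [28, 29], "revenue": [3154.0, 1199.0]})
per_shop.to_sql("revenue_per_shop", con=engine, if_exists="replace", index=False)
engine.dispose()  # release pooled connections, as if this run had finished

# A new engine, like the next day's run, still finds the table on disk
next_run = create_engine(f"sqlite:///{db_path}")
stored = pd.read_sql("SELECT * FROM revenue_per_shop", next_run)
print(stored)
```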

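```python
# Aggregates: the index (item_id / shop_id) is written as a key column
revenue_per_item.to_sql('revenue_per_item', con=engine, if_exists='append')
revenue_per_shop.to_sql('revenue_per_shop', con=engine, if_exists='append')

# Cleaned data: index=False skips the default 0..n-1 index, which carries no information
raw_sales.to_sql('raw_sales', con=engine, if_exists='append', index=False)
```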
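With `if_exists='append'`, re-running the process for a day that was already loaded duplicates its rows, and the aggregate tables carry no date, so totals from different days cannot be told apart. The sketch below is one way to address both, under assumptions that are not in the original: the aggregates are grouped by `date` as well, and rows already stored for the batch's dates are deleted in the same transaction before the new rows are appended. `load_daily` is a hypothetical helper.

```python
import pandas as pd
from sqlalchemy import create_engine, inspect, text


def load_daily(sales: pd.DataFrame, engine) -> None:
    """Hypothetical loader: re-running a day replaces its rows instead of duplicating them."""
    value_cols = ["item_cnt_day", "revenue"]
    per_shop = sales.groupby(["date", "shop_id"], as_index=False)[value_cols].sum()
    per_item = sales.groupby(["date", "item_id"], as_index=False)[value_cols].sum()
    batch_dates = sorted(sales["date"].unique())
    # engine.begin() commits on success and rolls back if anything fails
    with engine.begin() as conn:
        for table, frame in (("raw_sales", sales),
                             ("revenue_per_shop", per_shop),
                             ("revenue_per_item", per_item)):
            if inspect(conn).has_table(table):
                for day in batch_dates:
                    # table names come from the fixed tuple above, never from input
                    conn.execute(text(f"DELETE FROM {table} WHERE date = :day"),
                                 {"day": day})
            frame.to_sql(table, con=conn, if_exists="append", index=False)


# The five sample rows shown earlier; dates kept as text for simple comparison
sales = pd.DataFrame({
    "date": ["2015-01-04 00:00:00"] * 5,
    "shop_id": [29, 28, 28, 28, 28],
    "item_id": [1469, 21364, 21365, 22104, 22091],
    "item_price": [1199.0, 479.0, 999.0, 249.0, 179.0],
    "item_cnt_day": [1, 1, 2, 2, 1],
})
sales["revenue"] = sales["item_price"] * sales["item_cnt_day"]

engine = create_engine("sqlite://")
load_daily(sales, engine)
load_daily(sales, engine)  # simulate an accidental second run for the same day
with engine.connect() as conn:
    n_raw = conn.execute(text("SELECT COUNT(*) FROM raw_sales")).scalar()
    n_shop = conn.execute(text("SELECT COUNT(*) FROM revenue_per_shop")).scalar()
print(n_raw, n_shop)  # 5 2
```

Doing the delete and the insert in one transaction means a failed run leaves the previous day's data untouched rather than half-replaced.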