# Mandatory Challenge
## Context
You work in the data analysis team of a very important company. On Monday, the company shares some good news with you: it has just been hired by a major retail company! So get ready for a huge amount of work!

Then you get to work with your team and define the following tasks to perform:   
1. You need to start your analysis using data from the past.  
2. You need to define a process that takes your daily data as an input and integrates it.  

You are in charge of the second part, so you are provided with a sample file that you will have to read daily. To complete your task, you need the following aggregates:
* One aggregate per supplier that adds up the remaining numeric values.
* One aggregate per item that adds up the remaining numeric values.

You can import the dataset `warehouse_and_retail_sales` from Ironhack's database. 

## Your task
Therefore, your process will consist of the following steps:
1. Read the sample file that a daily process will save in your folder. 
2. Clean up the data.
3. Create the aggregates.
4. Write three tables in your local database: 
    - A table for the cleaned data.
    - A table for the aggregate per supplier.
    - A table for the aggregate per item.

## Instructions
* Read the CSV file that you can find in Ironhack's database.
* Clean the data and create the aggregates as you see fit.
* Create the tables in your local database.
* Populate them with your process.

In [68]:
import pandas as pd
ware_reta = pd.read_csv('Warehouse_and_Retail_Sales.csv')
ware_reta.head()

Unnamed: 0,YEAR,MONTH,SUPPLIER,ITEM CODE,ITEM DESCRIPTION,ITEM TYPE,RETAIL SALES,RETAIL TRANSFERS,WAREHOUSE SALES
16671,2017,5,SOUTHERN GLAZERS WINE AND SPIRITS,77904,VIDA TEQUILA ANEJO\t - 750ML,LIQUOR,4.25,1.0,0.0
30209,2017,6,SOUTHERN GLAZERS WINE AND SPIRITS,77904,VIDA TEQUILA ANEJO - 750ML,LIQUOR,2.2,3.0,0.0
43858,2017,8,SOUTHERN GLAZERS WINE AND SPIRITS,77904,VIDA TEQUILA ANEJO - 750ML,LIQUOR,2.03,0.0,0.0
57256,2017,9,SOUTHERN GLAZERS WINE AND SPIRITS,77904,VIDA TEQUILA ANEJO - 750ML,LIQUOR,1.02,0.0,0.0
70867,2017,10,SOUTHERN GLAZERS WINE AND SPIRITS,77904,VIDA TEQUILA ANEJO - 750ML,LIQUOR,1.7,1.0,0.0
85365,2017,11,SOUTHERN GLAZERS WINE AND SPIRITS,77904,VIDA TEQUILA ANEJO - 750ML,LIQUOR,2.38,1.0,0.0
99879,2017,12,SOUTHERN GLAZERS WINE AND SPIRITS,77904,VIDA TEQUILA ANEJO - 750ML,LIQUOR,4.4,0.0,0.0
113319,2018,1,SOUTHERN GLAZERS WINE AND SPIRITS,77904,VIDA TEQUILA ANEJO - 750ML,LIQUOR,2.03,2.0,0.0
126408,2018,2,SOUTHERN GLAZERS WINE AND SPIRITS,77904,VIDA TEQUILA ANEJO - 750ML,LIQUOR,1.19,0.0,0.0
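The first row above shows `VIDA TEQUILA ANEJO\t - 750ML`, while the other rows show the same description without the `\t`. Assuming that `\t` is a real tab character in the data, a minimal sketch of one way to normalize whitespace in `ITEM DESCRIPTION` could look like this. The `descriptions` series is a hypothetical two-row sample built from the output above, not the full dataset.

```python
import pandas as pd

# Hypothetical two-row sample copied from the output above;
# "\t" is assumed to be a real tab character in the data.
descriptions = pd.Series([
    "VIDA TEQUILA ANEJO\t - 750ML",
    "VIDA TEQUILA ANEJO - 750ML",
])

# Collapse any run of whitespace (tabs, repeated spaces) into one space
# and trim the ends, so both spellings compare as equal.
normalized = descriptions.str.replace(r"\s+", " ", regex=True).str.strip()

print(normalized.nunique())
```

On the real frame, the same expression could be assigned back to `ware_reta['ITEM DESCRIPTION']` before any grouping that uses the description.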


In [16]:
null_cols = ware_reta.isnull().sum()
null_cols[null_cols > 0]

SUPPLIER     24
ITEM TYPE     1
dtype: int64

In [19]:
null_displ = ware_reta[ware_reta['SUPPLIER'].isnull() | ware_reta['ITEM TYPE'].isnull()]
null_displ = null_displ[['YEAR', 'MONTH', 'SUPPLIER', 'ITEM CODE', 'ITEM DESCRIPTION','ITEM TYPE','RETAIL SALES', 'RETAIL TRANSFERS', 'WAREHOUSE SALES']]
null_displ.head(24)

Unnamed: 0,YEAR,MONTH,SUPPLIER,ITEM CODE,ITEM DESCRIPTION,ITEM TYPE,RETAIL SALES,RETAIL TRANSFERS,WAREHOUSE SALES
19483,2017,6,,1279,EMPTY WINE KEG - KEGS,DUNNAGE,0.0,0.0,-9.0
20056,2017,8,,1279,EMPTY WINE KEG - KEGS,DUNNAGE,0.0,0.0,-5.0
32282,2017,6,,BC,BEER CREDIT,REF,0.0,0.0,-58.0
32283,2017,6,,WC,WINE CREDIT,REF,0.0,0.0,-8.0
45871,2017,8,,BC,BEER CREDIT,REF,0.0,0.0,-699.0
45872,2017,8,,WC,WINE CREDIT,REF,0.0,0.0,-5.0
46518,2017,9,,1279,EMPTY WINE KEG - KEGS,DUNNAGE,0.0,0.0,-9.0
59259,2017,9,,BC,BEER CREDIT,REF,0.0,0.0,-502.0
59260,2017,9,,WC,WINE CREDIT,REF,0.0,0.0,-15.0
59920,2017,10,,1279,EMPTY WINE KEG - KEGS,DUNNAGE,0.0,0.0,-6.0


In [20]:
ware_reta.dtypes

YEAR                  int64
MONTH                 int64
SUPPLIER             object
ITEM CODE            object
ITEM DESCRIPTION     object
ITEM TYPE            object
RETAIL SALES        float64
RETAIL TRANSFERS    float64
WAREHOUSE SALES     float64
dtype: object

In [22]:
ware_reta.head(60)

Unnamed: 0,YEAR,MONTH,SUPPLIER,ITEM CODE,ITEM DESCRIPTION,ITEM TYPE,RETAIL SALES,RETAIL TRANSFERS,WAREHOUSE SALES
0,2017,4,ROYAL WINE CORP,100200,GAMLA CAB - 750ML,WINE,0.0,1.0,0.0
1,2017,4,SANTA MARGHERITA USA INC,100749,SANTA MARGHERITA P/GRIG ALTO - 375ML,WINE,0.0,1.0,0.0
2,2017,4,JIM BEAM BRANDS CO,10103,KNOB CREEK BOURBON 9YR - 100P - 375ML,LIQUOR,0.0,8.0,0.0
3,2017,4,HEAVEN HILL DISTILLERIES INC,10120,J W DANT BOURBON 100P - 1.75L,LIQUOR,0.0,2.0,0.0
4,2017,4,ROYAL WINE CORP,101664,RAMON CORDOVA RIOJA - 750ML,WINE,0.0,4.0,0.0
5,2017,4,REPUBLIC NATIONAL DISTRIBUTING CO,101680,MANISCHEWITZ CREAM WH CONCORD - 1.5L,WINE,0.0,1.0,0.0
6,2017,4,ROYAL WINE CORP,101753,BARKAN CLASSIC PET SYR - 750ML,WINE,0.0,1.0,0.0
7,2017,4,JIM BEAM BRANDS CO,10197,KNOB CREEK BOURBON 9YR - 100P - 1.75L,LIQUOR,0.0,32.0,0.0
8,2017,4,STE MICHELLE WINE ESTATES,101974,CH ST MICH P/GRIS - 750ML,WINE,0.0,26.0,0.0
9,2017,4,MONSIEUR TOUTON SELECTION,102083,CH DE LA CHESNAIE MUSCADET - 750ML,WINE,0.0,1.0,0.0


In [35]:
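# Rows with the placeholder supplier 'Default' that are typed as wine
test = ware_reta[(ware_reta['SUPPLIER']=='Default') & (ware_reta['ITEM TYPE']=='WINE')]
# Assign the column selection back; as a bare expression mid-cell its result was discarded
test = test[['YEAR', 'MONTH', 'SUPPLIER', 'ITEM CODE', 'ITEM DESCRIPTION','ITEM TYPE','RETAIL SALES', 'RETAIL TRANSFERS', 'WAREHOUSE SALES']]
test.head()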

Unnamed: 0,YEAR,MONTH,SUPPLIER,ITEM CODE,ITEM DESCRIPTION,ITEM TYPE,RETAIL SALES,RETAIL TRANSFERS,WAREHOUSE SALES
28651,2017,6,Default,60453,LUNA DI LUNA P/GRIG P/BIANCO - 1.5L,WINE,0.0,0.5,0.0
55675,2017,9,Default,60453,LUNA DI LUNA P/GRIG P/BIANCO - 1.5L,WINE,0.17,0.0,0.0
69311,2017,10,Default,60453,LUNA DI LUNA P/GRIG P/BIANCO - 1.5L,WINE,0.17,0.0,0.0


In [41]:
ware_reta.loc[(ware_reta['SUPPLIER']=='Default') & (ware_reta['ITEM TYPE']=='WINE'), 'SUPPLIER'] = 'SOUTHERN GLAZERS WINE AND SPIRITS'
test1 = ware_reta[(ware_reta['SUPPLIER']=='Default')]
test1.head()

Unnamed: 0,YEAR,MONTH,SUPPLIER,ITEM CODE,ITEM DESCRIPTION,ITEM TYPE,RETAIL SALES,RETAIL TRANSFERS,WAREHOUSE SALES
24,2017,4,Default,104,FOUR BOTTLE WINE TOTE,STR_SUPPLIES,0.0,2.0,0.0
48,2017,4,Default,106,SIX BOTTLE WINE TOTE (NO LOGO),STR_SUPPLIES,0.0,2.0,0.0
111,2017,4,Default,112,CORKSCREW,REF,0.0,13.0,0.0
136,2017,4,Default,114,WINE PAPER GIFT TOTE SINGLE BOTTLE,STR_SUPPLIES,0.0,56.0,0.0
2913,2017,10,Default,59978,STORE SPECIAL BEER QUART,REF,0.25,0.0,0.0


In [52]:
#print(set(ware_reta['SUPPLIER']))
ware_reta['SUPPLIER'] = ware_reta['SUPPLIER'].str.replace('WI,INC', 'WI INC')
set(ware_reta['SUPPLIER'])

{'8 VINI INC',
 'A HARDY USA LTD',
 'A I G WINE & SPIRITS',
 'A VINTNERS SELECTIONS',
 'A&E INC',
 'A&W BORDERS LLC',
 'ADAMBA IMPORTS INTL',
 'AIKO IMPORTERS INC',
 'ALLAGASH BREWING COMPANY',
 'ALLIED IMPORTERS USA LTD',
 'ALTITUDE SPIRITS INC',
 'AMERICAN BEVERAGE CORPORATION',
 'AMERICAN BEVERAGE MARKETERS',
 'AMERICAN FIDELITY TRADING',
 'AMERICAN VINTAGE BEVERAGE INC',
 'ANHEUSER BUSCH INC',
 'ARCHER ROOSE LLC',
 'AREL GROUP WINE & SPIRITS',
 'ARIS A ZISSIS',
 'ARTISANS & VINES LLC',
 'ASAHI BEER USA INC',
 'ATLANTIC WINE & SPIRITS',
 'ATLAS BREW WORKS LLC',
 'AW DIRECT LLC',
 'AZIZ SHAFI TANNIC TONGUE',
 'BACARDI USA INC',
 'BACCHUS IMPORTERS LTD',
 'BACKUP BEVERAGE',
 'BANFI PRODUCTS CORP',
 'BANVILLE & JONES WINE MERCHANTS',
 'BARON FRANCOIS LTD',
 'BARREL ONE INC',
 'BASIGNANI WINERY',
 'BINDING BRAUEREI USA INC',
 'BLACK ANKLE VINEYARDS LLC',
 'BOND DISTRIBUTING CO',
 'BOORDY VINEYARDS',
 'BORVIN BEVERAGE',
 'BOSTON BEER CORPORATION',
 'BOUTIQUE VINEYARDS LLC',
 'BRONCO WINE

In [58]:
print(set(ware_reta['ITEM TYPE']))

{nan, 'BEER', 'REF', 'KEGS', 'WINE', 'NON-ALCOHOL', 'STR_SUPPLIES', 'LIQUOR', 'DUNNAGE'}


In [64]:
# Rows whose ITEM TYPE is not one of the known categories (this includes NaN)
known_types = ['BEER', 'DUNNAGE', 'LIQUOR', 'NON-ALCOHOL', 'KEGS', 'REF', 'STR_SUPPLIES', 'WINE']
test2 = ware_reta[~ware_reta['ITEM TYPE'].isin(known_types)]
test2 = test2[['YEAR', 'MONTH', 'SUPPLIER', 'ITEM CODE', 'ITEM DESCRIPTION','ITEM TYPE','RETAIL SALES', 'RETAIL TRANSFERS', 'WAREHOUSE SALES']]
test2

Unnamed: 0,YEAR,MONTH,SUPPLIER,ITEM CODE,ITEM DESCRIPTION,ITEM TYPE,RETAIL SALES,RETAIL TRANSFERS,WAREHOUSE SALES
66439,2017,10,REPUBLIC NATIONAL DISTRIBUTING CO,347939,FONTANAFREDDA BAROLO SILVER LABEL 750 ML,,0.0,0.0,1.0


In [69]:
ware_reta.loc[(ware_reta['ITEM CODE']=='347939') & (ware_reta['SUPPLIER']=='REPUBLIC NATIONAL DISTRIBUTING CO'), 'ITEM TYPE'] = 'WINE'
test3 = ware_reta.loc[(ware_reta['ITEM CODE']=='347939') & (ware_reta['SUPPLIER']=='REPUBLIC NATIONAL DISTRIBUTING CO')]
test3.head()

Unnamed: 0,YEAR,MONTH,SUPPLIER,ITEM CODE,ITEM DESCRIPTION,ITEM TYPE,RETAIL SALES,RETAIL TRANSFERS,WAREHOUSE SALES
66439,2017,10,REPUBLIC NATIONAL DISTRIBUTING CO,347939,FONTANAFREDDA BAROLO SILVER LABEL 750 ML,WINE,0.0,0.0,1.0


In [72]:
before = len(ware_reta)
ware_reta = ware_reta.drop_duplicates()
after = len(ware_reta)
print('Number of duplicate records dropped: ', str(before - after))

Number of duplicate records dropped:  0


In [73]:
select_columns = ['YEAR', 'MONTH', 'SUPPLIER', 'ITEM CODE', 'ITEM DESCRIPTION','ITEM TYPE','RETAIL SALES', 'RETAIL TRANSFERS', 'WAREHOUSE SALES']
ware_reta = ware_reta[select_columns].drop_duplicates()
after = len(ware_reta)
print('Number of duplicate records dropped: ', str(before - after))

Number of duplicate records dropped:  0


In [74]:
import numpy as np
# pandas is already imported as pd at the top of the notebook
# Write three tables in your local database:
#   - A table for the cleaned data.
#   - A table for the aggregate per supplier.
#   - A table for the aggregate per item.

In [77]:
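# Aggregate per supplier: sum the three numeric columns.
# Selecting several columns after groupby needs a list (double brackets);
# the tuple form raised a FutureWarning and fails in recent pandas versions.
ware_suppliers = ware_reta.groupby(['SUPPLIER'])[['RETAIL SALES', 'RETAIL TRANSFERS', 'WAREHOUSE SALES']].sum().reset_index()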
ware_suppliers.head(60)




Unnamed: 0,SUPPLIER,RETAIL SALES,RETAIL TRANSFERS,WAREHOUSE SALES
0,8 VINI INC,2.78,2.0,1.0
1,A HARDY USA LTD,0.4,0.0,0.0
2,A I G WINE & SPIRITS,12.52,5.92,134.0
3,A VINTNERS SELECTIONS,8640.57,8361.1,29776.67
4,A&E INC,11.52,2.0,0.0
5,A&W BORDERS LLC,0.8,1.0,0.0
6,ADAMBA IMPORTS INTL,32.2,40.49,0.0
7,AIKO IMPORTERS INC,11.24,11.0,3.0
8,ALLAGASH BREWING COMPANY,304.09,339.0,1742.92
9,ALLIED IMPORTERS USA LTD,7.63,11.0,18.0


In [78]:
# Aggregate per item type. Note: the task asks for an aggregate per item,
# which would mean grouping by 'ITEM CODE' instead of 'ITEM TYPE'.
ware_itemtype = ware_reta.groupby(['ITEM TYPE'])[['RETAIL SALES', 'RETAIL TRANSFERS', 'WAREHOUSE SALES']].sum().reset_index()
ware_itemtype.head(60)


Unnamed: 0,ITEM TYPE,RETAIL SALES,RETAIL TRANSFERS,WAREHOUSE SALES
0,BEER,209763.11,234924.44,2437617.32
1,DUNNAGE,0.0,0.0,-45331.0
2,KEGS,0.0,0.0,43558.0
3,LIQUOR,309847.85,334176.41,33173.32
4,NON-ALCOHOL,8109.97,9058.37,8656.72
5,REF,281.34,171.92,-6754.0
6,STR_SUPPLIES,995.98,3594.7,0.0
7,WINE,313400.42,340710.51,433010.47
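The table above is grouped by `ITEM TYPE`, whereas the task statement asks for an aggregate per item. A minimal sketch of a per-item aggregate, assuming `ITEM CODE` identifies an item, is shown below. The `sample` frame is a hypothetical four-row excerpt built from rows that appear earlier in this notebook; on the real data the same `groupby` would run on `ware_reta`.

```python
import pandas as pd

value_cols = ["RETAIL SALES", "RETAIL TRANSFERS", "WAREHOUSE SALES"]

# Illustrative excerpt: three rows for item 60453 and one for item 100200.
# ITEM CODE is kept as a string because the real column also holds codes like 'BC'.
sample = pd.DataFrame(
    [
        ["60453", 0.0, 0.5, 0.0],
        ["60453", 0.17, 0.0, 0.0],
        ["60453", 0.17, 0.0, 0.0],
        ["100200", 0.0, 1.0, 0.0],
    ],
    columns=["ITEM CODE"] + value_cols,
)

# One row per item code, with the three numeric columns summed.
per_item = sample.groupby("ITEM CODE", as_index=False)[value_cols].sum()

print(per_item)
```

If the description should appear in the item table as well, it would need cleaning first, since one code can carry several spellings of the same description.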


In [None]:
# TODO: write the three tables to my local database
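The remaining step could be sketched as follows. This is only an illustration under stated assumptions: it uses an in-memory SQLite database from the standard library instead of whatever local database the assignment intends, and the table names `sales_clean`, `sales_by_supplier` and `sales_by_item` are hypothetical. The `cleaned` frame is a three-row stand-in for the cleaned `ware_reta`.

```python
import sqlite3
import pandas as pd

value_cols = ["RETAIL SALES", "RETAIL TRANSFERS", "WAREHOUSE SALES"]

# Stand-in for the cleaned data (rows taken from the outputs in this notebook).
cleaned = pd.DataFrame(
    [
        ["ROYAL WINE CORP", "100200", 0.0, 1.0, 0.0],
        ["ROYAL WINE CORP", "101664", 0.0, 4.0, 0.0],
        ["JIM BEAM BRANDS CO", "10103", 0.0, 8.0, 0.0],
    ],
    columns=["SUPPLIER", "ITEM CODE"] + value_cols,
)
by_supplier = cleaned.groupby("SUPPLIER", as_index=False)[value_cols].sum()
by_item = cleaned.groupby("ITEM CODE", as_index=False)[value_cols].sum()

# Assumption: SQLite in memory; a file path would make the database persistent.
conn = sqlite3.connect(":memory:")

# Hypothetical table names; if_exists="replace" makes the daily run idempotent.
tables = {
    "sales_clean": cleaned,
    "sales_by_supplier": by_supplier,
    "sales_by_item": by_item,
}
for name, frame in tables.items():
    frame.to_sql(name, conn, if_exists="replace", index=False)

# Read back what was written as a quick check.
table_names = pd.read_sql(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name", conn
)["name"].tolist()
row_count = conn.execute("SELECT COUNT(*) FROM sales_clean").fetchone()[0]
conn.close()

print(table_names, row_count)
```

For a server database such as MySQL or PostgreSQL, `to_sql` expects a SQLAlchemy engine or connection in place of the `sqlite3` connection. If the daily file should accumulate instead of overwrite, `if_exists="append"` on the cleaned table is the alternative to consider, with the aggregates recomputed afterwards.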