# Plotting Multiple Data Series

Complete the following exercises to solidify your knowledge of plotting multiple data series with pandas, matplotlib, and seaborn. Part of the challenge of plotting multiple data series is transforming the data into the shape the visualization needs. For some of the exercises in this lab, you will need to reshape the data first and then create the plot. The data can be found [here](https://drive.google.com/file/d/1tgx8nnEXLcqy1ds_99T_14-2B9TM-Gne/view?usp=sharing); please download it and place it in your local data folder so it can be read from there.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# Read the data
data = pd.read_csv('../data/liquor_store_sales.csv')
data.head()

## 1. Create a bar chart with bars for total Retail Sales, Retail Transfers, and Warehouse Sales by Item Type.

In [None]:
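# Total each sales measure per item type; the string 'sum' avoids the warning newer pandas versions emit for the builtin
Sales_per_Item = data.pivot_table(index=['ItemType'], values=['RetailSales','RetailTransfers','WarehouseSales'], aggfunc='sum')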

Sales_per_Item.plot.barh()
plt.show()
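
The reshaping step in the cell above can also be written as a `groupby` followed by `sum`. The sketch below compares the two on a small made-up frame; the values are invented for illustration and only the column names mirror the lab data.

```python
import pandas as pd

# Toy data, invented for illustration; only the column names mirror the lab data.
toy = pd.DataFrame({
    "ItemType": ["WINE", "BEER", "WINE", "BEER"],
    "RetailSales": [1.0, 2.0, 3.0, 4.0],
    "RetailTransfers": [0.5, 0.0, 1.5, 1.0],
    "WarehouseSales": [10.0, 20.0, 30.0, 40.0],
})

cols = ["RetailSales", "RetailTransfers", "WarehouseSales"]

# Both results have one row per ItemType and one column per sales measure.
by_pivot = toy.pivot_table(index="ItemType", values=cols, aggfunc="sum")
by_group = toy.groupby("ItemType")[cols].sum()

print(by_pivot)
print(by_group)
```

Either frame can be passed straight to `.plot.bar()` or `.plot.barh()`, since pandas draws one group of bars per row and one bar per column.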

## 2. Create a horizontal bar chart showing sales mix for the top 10 suppliers with the most total sales. 

In [None]:
# Work on a copy so the helper column does not leak into `data`
Top_10 = data.copy()

# Create a column for total sales
Top_10['Total_Sale'] = Top_10['RetailSales'] + Top_10['RetailTransfers'] + Top_10['WarehouseSales']
# Aggregate per supplier
Top_10 = Top_10.pivot_table(index=['Supplier'], values=['RetailSales','RetailTransfers','WarehouseSales','Total_Sale'], aggfunc='sum')
# Sort by total sales and keep the 10 biggest suppliers
Top_10 = Top_10.sort_values(by='Total_Sale', ascending=False).iloc[0:10, :]
# Drop the total, as it is no longer needed (the positional axis argument was removed in pandas 2.0)
Top_10 = Top_10.drop(columns='Total_Sale')
# Plot
Top_10.plot.barh()
plt.show()
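
Since the exercise asks for the sales mix, a stacked bar per supplier may read more easily than grouped bars. This is a minimal sketch on invented data, not the lab's reference solution: it ranks suppliers by their row-wise total with `nlargest` (here the top 2 rather than the top 10) and stacks the three measures.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy data, invented for illustration; only the column names mirror the lab data.
toy = pd.DataFrame({
    "Supplier": ["A", "A", "B", "C", "C"],
    "RetailSales": [5.0, 1.0, 2.0, 8.0, 1.0],
    "RetailTransfers": [1.0, 1.0, 0.0, 2.0, 0.0],
    "WarehouseSales": [10.0, 0.0, 3.0, 1.0, 1.0],
})

cols = ["RetailSales", "RetailTransfers", "WarehouseSales"]
totals = toy.groupby("Supplier")[cols].sum()

# Rank suppliers by the sum across the three measures, without a helper column.
top = totals.loc[totals.sum(axis=1).nlargest(2).index]

# One horizontal bar per supplier, split into the three sales measures.
fig, ax = plt.subplots()
top.plot.barh(stacked=True, ax=ax)
ax.set_xlabel("Total sales")
```

In a notebook the figure renders on its own; in a script, follow it with `plt.show()` as in the cells above.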

## 3. Create a bar chart that shows average Retail Sales, Retail Transfers, and Warehouse Sales per month over time.

In [None]:
# Average each sales measure per (Year, Month); pivot_table returns a new frame, so `data` is not modified
per_month = data.pivot_table(index=['Year','Month'], values=['RetailSales','RetailTransfers','WarehouseSales'], aggfunc='mean')

per_month.plot.bar()
plt.title("Average sales per month")
plt.show()

## 4. Create a multi-line chart that shows Retail Sales summed by Item Type over time (Year & Month).

*Hint: There should be a line representing each Item Type.*

In [None]:
# Set the figure size before plotting; changing rcParams afterwards does not resize an existing figure
plt.rcParams["figure.figsize"] = (16, 4)

# One column per item type, one row per (Year, Month)
item_per_month = data.pivot_table(index=['Year','Month'], columns=['ItemType'], values=['RetailSales'], aggfunc='sum')
item_per_month.plot()
plt.show()
# Observation: sales peak during the Christmas season
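
Plotting against a `(Year, Month)` tuple index works, but a real date axis is often easier to read. The sketch below is one possible variation on invented data: it builds a hypothetical `Date` column from the two columns with `pd.to_datetime` and pivots on it.

```python
import pandas as pd

# Toy data, invented for illustration; only the column names mirror the lab data.
toy = pd.DataFrame({
    "Year": [2017, 2017, 2017, 2017, 2018],
    "Month": [11, 12, 12, 12, 1],
    "ItemType": ["WINE", "WINE", "WINE", "BEER", "WINE"],
    "RetailSales": [1.0, 2.0, 0.5, 3.0, 4.0],
})

# pd.to_datetime can assemble dates from year/month/day columns;
# the day is fixed to 1 because the data is monthly.
parts = toy.rename(columns={"Year": "year", "Month": "month"})[["year", "month"]].assign(day=1)
toy["Date"] = pd.to_datetime(parts)  # "Date" is a helper column introduced for this sketch

# Passing `values` as a string (not a list) yields flat column labels, one per item type.
wide = toy.pivot_table(index="Date", columns="ItemType", values="RetailSales", aggfunc="sum")
print(wide)
```

Calling `wide.plot()` then draws one line per item type against a datetime x-axis.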

## 6. Plot the same information as above (i.e. Retail Sales summed by Item Type over time) but as a bar chart.

In [None]:
item_per_month.plot.bar()
plt.show()


## 7. Create a scatter plot showing the relationship between Retail Sales (x-axis) and Retail Transfers (y-axis) with the plot points color-coded according to their Item Type.

*Hint: Seaborn's lmplot is the easiest way to generate the scatter plot.*

In [None]:
sns.scatterplot(x="RetailSales", y="RetailTransfers", data=data, hue='ItemType')
plt.show()
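
The hint mentions `lmplot`, while the cell above uses `scatterplot`. For comparison, here is a sketch of the `lmplot` route on invented data; `fit_reg=False` turns off the regression line so that only the color-coded points remain.

```python
import pandas as pd
import seaborn as sns

# Toy data, invented for illustration; only the column names mirror the lab data.
toy = pd.DataFrame({
    "RetailSales": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "RetailTransfers": [0.5, 1.5, 2.0, 3.5, 4.0, 6.5],
    "ItemType": ["WINE", "WINE", "WINE", "BEER", "BEER", "BEER"],
})

# lmplot is figure-level: it creates its own figure and returns a FacetGrid.
g = sns.lmplot(data=toy, x="RetailSales", y="RetailTransfers",
               hue="ItemType", fit_reg=False)
```

Because `lmplot` is a figure-level function, its size is controlled with its `height` and `aspect` arguments rather than with `figsize`.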



## 8. Create a scatter matrix using all the numeric fields in the data set with the plot points color-coded by Item Type.

*Hint: Seaborn's pairplot may be your best option here.*

In [None]:
sns.pairplot(data, hue='ItemType')
plt.show()
# This cell raises an error. The likely cause: when `hue` is set, pairplot draws a
# kernel density estimate of each variable on the diagonal of the plot matrix, and
# the estimate cannot always be fitted. For example, the Year column only contains
# 2017 and 2018, which is too little variation to fit a density.
# A workaround is to draw plain histograms on the diagonal instead; see the cell below.


In [None]:
# Draw histograms on the diagonal instead of density estimates
sns.pairplot(data, hue='ItemType', diag_kind='hist')
plt.show()
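
Another possible workaround is to pick the numeric columns explicitly and leave out those with too little variation. The sketch below is an assumption-laden illustration on invented data: the rule "at least two distinct values within every item type" is a heuristic chosen for this example, not something seaborn prescribes.

```python
import pandas as pd

# Toy data, invented for illustration; only the column names mirror the lab data.
toy = pd.DataFrame({
    "ItemType": ["WINE", "WINE", "BEER", "BEER"],
    "Year": [2017, 2017, 2017, 2018],
    "Month": [1, 2, 3, 3],
    "RetailSales": [1.0, 2.0, 3.0, 4.0],
    "WarehouseSales": [10.0, 20.0, 30.0, 40.0],
})

# All numeric fields in the data set.
numeric_cols = toy.select_dtypes(include="number").columns

# Heuristic (an assumption of this sketch): keep a column only if every
# item type has at least two distinct values in it.
varying = [
    col for col in numeric_cols
    if toy.groupby("ItemType")[col].nunique().min() > 1
]
print(varying)
```

The resulting list can be passed to `sns.pairplot` through its `vars` argument, for example `sns.pairplot(toy, vars=varying, hue='ItemType')`.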