<a href="https://colab.research.google.com/github/iakob12345/Wine-Project/blob/master/Wine_3_descriptive_statistics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Introduction**

## Libraries Imported

In [None]:
from google.colab import drive
import pandas as pd
import numpy as np
from plotly import graph_objs as go
import plotly.figure_factory as ff
import plotly.express as px
from numpy import arange

drive.mount('/content/drive/')
path='/content/drive/My Drive/Colab Notebooks/winedata_df_after_data_wrangling.pkl'

# Load the pickled DataFrame produced by the data-wrangling step
winedata_df = pd.read_pickle(path)

Mounted at /content/drive/


## Useful Functions

In [None]:
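def groupby_count_mean(groupby_col, agg_col):
  """Group winedata_df by groupby_col; return the row count and the mean of agg_col, rounded to whole numbers."""
  return winedata_df.groupby(by=groupby_col)[agg_col].agg(counts='count', mean='mean').round(0).reset_index()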
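
To illustrate what this helper returns, here is a minimal sketch that runs the same named aggregation on a small, made-up frame. `toy_df` and its values are hypothetical stand-ins for `winedata_df`; only the `groupby(...).agg(counts='count', mean='mean')` pattern comes from the notebook.

```python
import pandas as pd

# Hypothetical toy data standing in for winedata_df
toy_df = pd.DataFrame({
    "price_category": ["Low", "Low", "Medium", "Medium", "Medium"],
    "value": [183.78, 142.26, 24.4, 25.6, 22.0],
})

# Same pattern as groupby_count_mean: one row per group,
# with the number of rows and the mean rounded to whole numbers
summary = (
    toy_df.groupby(by="price_category")["value"]
    .agg(counts="count", mean="mean")
    .round(0)
    .reset_index()
)
print(summary)
```

The result has one row per group and three columns: the grouping key, `counts`, and `mean`.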

## Data Introduction

In [None]:
winedata_df.head(5)

Unnamed: 0,catalog,wine_type,winery,grape,harvest_year,country,region,price,price_category,rating_avg,total_reviews,value,category,link,country_iso3,group
0,red_portugal,red,Bacalhôa,Alentejano Monte das Ânforas Tinto,2014,Portugal,Alentejano,1.85,Low,3.4,740,183.78,Red-Low,https://www.vivino.com/bacalhoa-vinhos-de-port...,PRT,Portugal-Red-Low
1,red_portugal,red,AR - Adega de Redondo,Porta Da Ravessa Alentejo Tinto,2017,Portugal,Alentejo,2.39,Low,3.4,1016,142.26,Red-Low,https://www.vivino.com/adega-cooperativa-de-re...,PRT,Portugal-Red-Low
2,red_portugal,red,AR - Adega de Redondo,Porta Da Ravessa Alentejo Tinto,2018,Portugal,Alentejo,2.39,Low,3.4,479,142.26,Red-Low,https://www.vivino.com/adega-cooperativa-de-re...,PRT,Portugal-Red-Low
3,Rose,rose,Canals & Nubiola,Rosado,No Year Detected,Spain,Catalunya,2.65,Low,2.9,360,109.43,Rose-Low,https://www.vivino.com/canals-and-nubiola-rosa...,ESP,Spain-Rose-Low
4,red_portugal,red,Alfacinha,Tinto,2018,Portugal,Lisboa,2.8,Low,3.5,385,125.0,Red-Low,https://www.vivino.com/alfacinha-tinto/w/68775...,PRT,Portugal-Red-Low
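
The `value` column is not defined in this notebook. In the rows shown above it matches 100 × `rating_avg` / `price`, rounded to two decimals, so the sketch below assumes that formula. The name `value_score` is hypothetical, and the formula is inferred from the sample rows rather than taken from the data-wrangling step.

```python
def value_score(rating_avg: float, price: float) -> float:
    """Rating points per euro, scaled by 100 (formula inferred from the sample rows)."""
    return round(100 * rating_avg / price, 2)

# (rating_avg, price) pairs copied from the head() output above
sample_rows = [(3.4, 1.85), (3.4, 2.39), (2.9, 2.65), (3.5, 2.8)]
scores = [value_score(rating, price) for rating, price in sample_rows]
print(scores)
```

If this reading is right, a higher `value` means more rating per euro spent, which is why cheap wines with average ratings score so high.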


In [None]:
groupby_count_mean('group','value').sort_values(by='counts',ascending=True)

Unnamed: 0,group,counts,mean
47,Italy-Dessert-Premium,51,12.0
67,South africa-Rose-Low,51,51.0
11,Australia-White-Premium,54,10.0
81,United states-White-Luxury,55,2.0
63,Portugal-Sparkling-Medium,56,25.0
...,...,...,...
26,France-Red-Low,890,46.0
27,France-Red-Luxury,931,2.0
28,France-Red-Medium,979,24.0
29,France-Red-Premium,1073,8.0
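
The `group` labels in the table above look like the country name with only its first letter capitalized, followed by the capitalized wine type and the price category (for example `South africa-Rose-Low`). The sketch below reproduces that pattern. `make_group` is a hypothetical helper, and the rule is inferred from the visible labels, not from the wrangling code.

```python
def make_group(country: str, wine_type: str, price_category: str) -> str:
    # str.capitalize() upper-cases the first character and lower-cases the rest,
    # which would explain labels such as "South africa" and "United states".
    return "-".join([country.capitalize(), wine_type.capitalize(), price_category])

print(make_group("Portugal", "red", "Low"))
print(make_group("South Africa", "rose", "Low"))
```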


# **Country Distribution**

## **Country Representation**

In [None]:
country_iso3_groupby = groupby_count_mean('country_iso3','value')

fig = go.Figure(data=go.Choropleth(
    locations = country_iso3_groupby['country_iso3'],
    z = country_iso3_groupby['counts'],
    colorscale = 'Reds',
    autocolorscale=False,
    reversescale=False,
    marker_line_color='darkgray',
    marker_line_width=0.5,
    colorbar_title = 'Wine Bottles',
))

fig.update_layout(
    title_text='Country Representation of Wines',
    geo=dict(
        showframe=False,
        showcoastlines=False,
        projection_type='equirectangular'
    )
)

fig.show()

## **Country Representation for Wine Types**

In [None]:
wine_country_df = groupby_count_mean(['wine_type','country'],'value')

# Sort once, largest counts first, so bars within each wine type are drawn in descending order
fig = px.bar(wine_country_df.sort_values(by='counts',ascending=False),
             x="wine_type", 
             y="counts", 
             color="country", 
             title="Country representation for each Wine Type",
             barmode='group',
             text = 'country',
             # assumes lowercase labels, as in the 'red'/'rose' values shown by head()
             category_orders = {'wine_type':['red','white','sparkling','dessert','rose']})
fig.show()

## **Country Representation for Price Categories**

In [None]:
wine_country_price_df = groupby_count_mean(['price_category','country'],'value')

fig = px.bar(wine_country_price_df.sort_values(by='counts',ascending=False),
             x="price_category", 
             y="counts", 
             color="country", 
             title="Country representation for each Price Category",
             barmode='group',
             text = 'country',
             category_orders = {'price_category':['Low','Medium','Premium','Luxury']})
fig.show()

# **Value Exploration**

## **Avg Value for Price Categories**

*Price categories were defined as follows*:
- Low: 𝑃 < 10€
- Medium: 10€ ≤ 𝑃 < 25€
- Premium: 25€ ≤ 𝑃 < 100€
- Luxury: 𝑃 ≥ 100€
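
The thresholds above can be sketched as a small function. This is only an illustration: `price_category` here is a hypothetical helper name, and the notebook's actual binning was done in the earlier data-wrangling step, which is not shown.

```python
def price_category(price_eur: float) -> str:
    """Map a price in euros to the category thresholds listed above."""
    if price_eur < 10:
        return "Low"
    if price_eur < 25:
        return "Medium"
    if price_eur < 100:
        return "Premium"
    return "Luxury"

# Boundary prices fall into the higher category (10 -> Medium, 25 -> Premium, 100 -> Luxury)
for price in (1.85, 10, 24.99, 25, 100):
    print(price, price_category(price))
```

On a DataFrame, the same function could be applied to the `price` column with `Series.apply`.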



In [None]:
fig = px.bar(groupby_count_mean(['price_category'],'value'), 
             x="mean", 
             y="price_category", 
             orientation='h',
             category_orders = {'price_category':['Low','Medium','Premium','Luxury']},
             color = 'mean',
             text = "mean",
             title = "Mean Values for Price Categories")
fig.show()

## **Avg Values for Wine Types**

In [None]:
fig = px.bar(groupby_count_mean(['wine_type'],'value').sort_values('mean'), 
             x="mean", 
             y="wine_type", 
             orientation='h',
             color='mean',
             text = "mean",
             title = "Mean Values for different Wine Types")
fig.show()

## **Avg Values for Price Categories & Countries**

In [None]:
fig = px.bar(groupby_count_mean(['country','price_category'],'value').sort_values(by='mean',ascending=False), 
             x="price_category", 
             y="mean", 
             color="country", 
             title="Value distribution across price categories and countries",
             barmode='group',
             text = 'country',
             category_orders = {'price_category':['Low','Medium','Premium','Luxury']})
fig.show()

**Quick Inferences**
<br>
Even without statistical testing, the dataset confirms some common assumptions and surfaces a few less obvious patterns:

*   By price: low-priced wines have the highest mean value
*   By wine type: Rose has the highest mean value and Dessert has the lowest

