# PourTaste - Project 1

Project Title: PourTaste
Team Members: Erin Ford, Elizabeth Brown, and Dylan Grimm
Project Description: With no experience as sommeliers or oenophiles, we would like to study wine data. Our project examines the relationships between wine prices, reviews, regions, provinces, points, and variety.
Research Questions: First, we would like to know whether it is possible to get highly rated wines for less than $15 a bottle. We also want to know which regions and countries have the most wineries, and which have the best reviews and prices.
Data Sets Used: 
•	Kaggle - https://www.kaggle.com/zynicide/wine-reviews
Breakdown of Tasks:
•	Erin – Compare price vs. reviews and price vs. variety. Build a word cloud based on descriptions of wines.
•	Dylan – Evaluate the number of wineries in given regions and countries. Compare price vs. country/region and create mapping for given regions/countries.
•	Elizabeth – Pull main CSV into the repository and create the main DataFrame. Lastly, compare price vs. year/vintage.


In [18]:
#Dependencies
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
pd.options.display.max_colwidth = 100


In [19]:
#File Location
csvfile = "Output/Resources/winemag-data-130k-v2.csv"
#Read csv file and store in Pandas dataframe, preview head (skip first column)
wine_data_raw = pd.read_csv(csvfile, usecols=range(1, 14))
wine_data_raw.head(2)

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimstone and dried herb. The palate isn't overly expressi...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,Portugal,"This is ripe and fruity, a wine that is smooth while still structured. Firm tannins are filled o...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos


In [20]:
wine_data_raw.describe()

Unnamed: 0,points,price
count,129971.0,120975.0
mean,88.447138,35.363389
std,3.03973,41.022218
min,80.0,4.0
25%,86.0,17.0
50%,88.0,25.0
75%,91.0,42.0
max,100.0,3300.0


In [21]:
#Since price is our main comparison factor, we are only including data instances with a price. 
#Removing all records with NaN for price
wine_data = wine_data_raw.dropna(subset = ['price'])
wine_data.describe()

Unnamed: 0,points,price
count,120975.0,120975.0
mean,88.421881,35.363389
std,3.044508,41.022218
min,80.0,4.0
25%,86.0,17.0
50%,88.0,25.0
75%,91.0,42.0
max,100.0,3300.0
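
Comparing the two `describe()` outputs, the row count falls from 129,971 to 120,975, so 8,996 reviews had no price. The sketch below shows how that share could be measured before dropping anything, and that `dropna(subset=...)` only looks at the named column. The `toy` frame and its values are hypothetical, not taken from the wine data.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the wine table; values are illustrative only.
toy = pd.DataFrame({
    "title": ["Wine A", "Wine B", "Wine C", "Wine D"],
    "points": [87, 90, 92, 85],
    "price": [15.0, np.nan, 42.0, np.nan],
    "region_2": [np.nan, "X", np.nan, np.nan],
})

# Share of rows with no price, computed before anything is dropped.
missing_share = toy["price"].isna().mean()

# Drop rows only where price is missing; NaN in other columns is kept.
priced = toy.dropna(subset=["price"])
rows_dropped = len(toy) - len(priced)

print(f"{missing_share:.0%} of rows lack a price; dropped {rows_dropped}")
```

Note that the surviving rows keep their original index labels, which is why the cleaned wine table starts at index 1 rather than 0.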


Note: Keep column titles lowercase and free of spaces so they are easier to reference in pandas DataFrames.
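
The wine CSV already follows this naming rule, but if a new source arrives with untidy headers, something like the following could normalize them. This is a minimal sketch; the `messy` frame and its column names are made up for illustration.

```python
import pandas as pd

# Hypothetical frame with untidy headers; not taken from the wine data.
messy = pd.DataFrame({"Taster Name": ["Roger Voss"], " Price ": [15.0]})

# Trim whitespace, lowercase, and replace inner spaces with underscores.
messy.columns = (
    messy.columns.str.strip().str.lower().str.replace(" ", "_", regex=False)
)

# Tidy names work with both bracket and attribute access.
print(messy.taster_name[0], messy["price"][0])
```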

In [22]:
wine_data.head()

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
1,Portugal,"This is ripe and fruity, a wine that is smooth while still structured. Firm tannins are filled o...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,US,"Tart and snappy, the flavors of lime flesh and rind dominate. Some green pineapple pokes through...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,US,"Pineapple rind, lemon pith and orange blossom start off the aromas. The palate is a bit more opu...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling (Lake Michigan Shore),Riesling,St. Julian
4,US,"Much like the regular bottling from 2012, this comes across as rather rough and tannic, with rus...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child Block Pinot Noir (Willamette Valley),Pinot Noir,Sweet Cheeks
5,Spain,"Blackberry and raspberry aromas show a typical Navarran whiff of green herbs and, in this case, ...",Ars In Vitro,87,15.0,Northern Spain,Navarra,,Michael Schachner,@wineschach,Tandem 2011 Ars In Vitro Tempranillo-Merlot (Navarra),Tempranillo-Merlot,Tandem


In [23]:
# What are the column datatypes?
wine_data.dtypes

country                   object
description               object
designation               object
points                     int64
price                    float64
province                  object
region_1                  object
region_2                  object
taster_name               object
taster_twitter_handle     object
title                     object
variety                   object
winery                    object
dtype: object

In [24]:
#Save the clean dataframe to a new csv file
#File Location
cleancsvfile = "Output/Resources/wine_data.csv"
#Save dataframe as a new csv file
wine_data.to_csv(cleancsvfile)
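
By default `to_csv` also writes the DataFrame index, which shows up as an unnamed first column when the file is read back (the loading cell above skips such a column with `usecols`). The sketch below compares the options using an in-memory buffer instead of a real file; the `toy` frame is hypothetical.

```python
import io
import pandas as pd

# Hypothetical frame with a non-default index, like the cleaned wine data.
toy = pd.DataFrame({"points": [87, 94], "price": [15.0, 20.0]}, index=[1, 5])

# Default to_csv writes the index as an unnamed first column.
buf = io.StringIO()
toy.to_csv(buf)
buf.seek(0)
reread_default = pd.read_csv(buf)

# index_col=0 turns that column back into the index on re-read.
buf.seek(0)
reread_indexed = pd.read_csv(buf, index_col=0)

# index=False leaves the index out of the file entirely.
buf_no_index = io.StringIO()
toy.to_csv(buf_no_index, index=False)
buf_no_index.seek(0)
reread_no_index = pd.read_csv(buf_no_index)

print(list(reread_default.columns))
```

Which option fits depends on whether teammates need the original row labels: `index_col=0` preserves them, while `index=False` gives a cleaner file.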


# Wines at $20 or less

In [25]:
#create dataframe of wines priced at $20 or less with at least 94 points, sorted by points
cheap_wine_data = wine_data[(wine_data["price"] <= 20) & (wine_data["points"] >= 94)].sort_values("points",ascending = False)
cheap_wine_data

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
101580,US,"Superb fruit highlights this tight, sculpted Syrah. It's extremely dense, with deep and intense ...",,96,20.0,Washington,Columbia Valley (WA),Columbia Valley,Paul Gregutt,@paulgwine,Rulo 2007 Syrah (Columbia Valley (WA)),Syrah,Rulo
5011,US,"Truly stunning, the Lewis Estate Riesling from Dunham explodes with fragrant fruits—citrus, melo...",Lewis Estate Vineyard,95,20.0,Washington,Columbia Valley (WA),Columbia Valley,Paul Gregutt,@paulgwine,Dunham 2010 Lewis Estate Vineyard Riesling (Columbia Valley (WA)),Riesling,Dunham
76570,US,"Spectacular is the word that comes to mind here. Tasted over many hours, this stunning effort pr...",Bacchus Vineyard,95,20.0,Washington,Columbia Valley (WA),Columbia Valley,Paul Gregutt,@paulgwine,Januik 2012 Bacchus Vineyard Riesling (Columbia Valley (WA)),Riesling,Januik
126242,US,"With residual sugar at about 1.35%, this balances the acidity without seeming sweet. Hand-picked...",,95,20.0,Washington,Columbia Valley (WA),Columbia Valley,Paul Gregutt,@paulgwine,Poet's Leap 2009 Riesling (Columbia Valley (WA)),Riesling,Poet's Leap
15196,France,"The home vineyard of Madiran's star, Alain Brumont, has yielded a superb wine that is full of da...",Château Bouscassé,95,20.0,Southwest France,Madiran,,Roger Voss,@vossroger,Alain Brumont 2010 Château Bouscassé Red (Madiran),Red Blend,Alain Brumont
17983,France,This is one of the classics among Provence rosés. Produced from organically grown grapes and com...,,94,20.0,Provence,Coteaux d'Aix-en-Provence,,Roger Voss,@vossroger,Château Vignelaure 2016 Rosé (Coteaux d'Aix-en-Provence),Rosé,Château Vignelaure
19136,Spain,"A spectacularly sweet and rich bruiser, and one that delivers the essence of raisins, toffee and...",Pedro Ximenez 1827 Sweet Sherry,94,14.0,Andalucia,Jerez,,Michael Schachner,@wineschach,Osborne NV Pedro Ximenez 1827 Sweet Sherry Sherry (Jerez),Sherry,Osborne
123776,US,"Don't let the cartoony label fool you, this is fantastic wine. The color is almost black to the ...",Petite Petit,94,18.0,California,Lodi,Central Valley,Jim Gordon,@gordone_cellars,Michael David 2012 Petite Petit Petite Sirah (Lodi),Petite Sirah,Michael David
117857,France,"From one of the top estates in Cahors, this complex, dense wine is both structured and packed wi...",Cèdre Héritage,94,20.0,Southwest France,Cahors,,Roger Voss,@vossroger,Château du Cèdre 2012 Cèdre Héritage Malbec (Cahors),Malbec,Château du Cèdre
117839,France,"Bouscassé is the home estate of Alain Brumont, who also owns Château Montus. The wine may be six...",Château Bouscassé,94,20.0,Southwest France,Madiran,,Roger Voss,@vossroger,Alain Brumont 2009 Château Bouscassé Red (Madiran),Red Blend,Alain Brumont
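
The research question asks about bottles under $15, while the cell above uses a $20 cutoff. A small helper makes it easy to try both thresholds. This is a sketch only: `value_wines` is a hypothetical name, and `sample` holds four rows copied from the outputs above rather than the full dataset.

```python
import pandas as pd

def value_wines(df, max_price, min_points):
    """Return wines priced at or below max_price with at least min_points."""
    mask = (df["price"] <= max_price) & (df["points"] >= min_points)
    # Highest score first; among equal scores, cheapest first.
    return df[mask].sort_values(["points", "price"], ascending=[False, True])

# Four rows copied from the outputs shown earlier in this notebook.
sample = pd.DataFrame({
    "title": [
        "Rulo 2007 Syrah (Columbia Valley (WA))",
        "Osborne NV Pedro Ximenez 1827 Sweet Sherry Sherry (Jerez)",
        "Michael David 2012 Petite Petit Petite Sirah (Lodi)",
        "Quinta dos Avidagos 2011 Avidagos Red (Douro)",
    ],
    "points": [96, 94, 94, 87],
    "price": [20.0, 14.0, 18.0, 15.0],
})

under_20 = value_wines(sample, max_price=20, min_points=94)
under_15 = value_wines(sample, max_price=15, min_points=94)
print(under_15[["title", "points", "price"]])
```

On the real data the call would be `value_wines(wine_data, 15, 94)`; the 94-point floor is the same assumption the cell above makes about what counts as highly rated.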


In [26]:
cheap_wine_data = cheap_wine_data[["title","variety", "price", "points", "province"]]

In [27]:
cheap_wine_data.variety.count()

34

In [28]:
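#Select columns with a list (tuple-style selection is no longer supported in recent pandas)
#Note: max() is taken per column, so values in one row may come from different wines
cheap_wine_data_grouped = pd.DataFrame(cheap_wine_data.groupby(['variety'])[['title','province','points','price']].max())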
cheap_wine_data_grouped

variety,title,province,points,price
Chardonnay,Joseph Carr 2009 Dijon Clone Chardonnay (Sonoma Coast),California,94,20.0
Gamay,Domaines Dominique Piron 2015 Vieilles Vignes (Moulin-à-Vent),Beaujolais,94,20.0
Grüner Veltliner,Geyerhof 2016 Rosensteig Grüner Veltliner (Kremstal),Kremstal,94,20.0
Malbec,Château du Cèdre 2012 Cèdre Héritage Malbec (Cahors),Southwest France,94,20.0
Moscato Giallo,Uvaggio 2010 Secco Moscato Giallo (Lodi),California,94,16.0
Petit Manseng,Domaine Cauhapé 2011 Symphonie de Novembre (Jurançon),Southwest France,94,18.0
Petite Sirah,Michael David 2012 Petite Petit Petite Sirah (Lodi),California,94,18.0
Port,Brian Carter Cellars 2009 Opulento Dessert Wine Touriga-Souzao-Tinto Cão Port (Yakima Valley),Washington,94,20.0
Portuguese Red,Quinta dos Murças 2011 Assobio Red (Douro),Douro,94,20.0
Red Blend,Le Fraghe 2015 Brol Grande (Bardolino),Veneto,95,20.0


In [29]:
cheap_wine_data_grouped.to_excel("Output/Top_wines.xlsx")