# Data Science Assignment - Pandas 
---
There are tons of online resources for learning pandas. Feel free to use any of them for this lesson.

Here's a handy cheat sheet: https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf

The basics: https://www.w3schools.com/python/pandas/default.asp

---

Start by importing pandas (as pd). 

In [2]:
import pandas as pd

Recall that a "DataFrame" is the heart and soul of Pandas. It is a class of object that represents tabular data.

There are 2 common ways to manually create a DataFrame. You can pass either a dictionary of lists or a list of lists into the DataFrame constructor. With either method you can pass in the index values (if you leave them out, pandas numbers the rows starting from 0). With a list of lists, you will also need to pass in the column names, because the inner lists carry no labels of their own.

In [3]:
# Run this code to see that the 2 DataFrames are equivalent. (You must run the import above first!)

# A dictionary of lists
data1 = {"a": [4, 5, 6], "b": [7, 8, 9], "c": [10, 11, 12]}
df1 = pd.DataFrame(data1, index=[1, 2, 3])

# A list of lists, with columns added
data2 = [[4, 7, 10], [5, 8, 11], [6, 9, 12]]
df2 = pd.DataFrame(data2, index=[1, 2, 3], columns=['a', 'b', 'c'])

print(df1)
print(df2)

   a  b   c
1  4  7  10
2  5  8  11
3  6  9  12
   a  b   c
1  4  7  10
2  5  8  11
3  6  9  12
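
Comparing the two printouts by eye works for a table this small. As a hedged alternative, the sketch below uses DataFrame.equals to compare the two frames in one call, and also shows what happens when the index is left out. The name df_default is introduced here for illustration only.

```python
import pandas as pd

# Same data as above, built both ways
data1 = {"a": [4, 5, 6], "b": [7, 8, 9], "c": [10, 11, 12]}
df1 = pd.DataFrame(data1, index=[1, 2, 3])

data2 = [[4, 7, 10], [5, 8, 11], [6, 9, 12]]
df2 = pd.DataFrame(data2, index=[1, 2, 3], columns=["a", "b", "c"])

# equals() is True when shape, labels, values, and dtypes all match
same = df1.equals(df2)
print(same)

# Without an explicit index, pandas numbers the rows from 0
df_default = pd.DataFrame(data1)
print(list(df_default.index))
```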


## Now for some real slicing and dicing!

Let's use the wine review data from the Kaggle lesson.
Read the data into a DataFrame using pd.read_csv() and this URL:
https://raw.githubusercontent.com/davestroud/Wine/master/winemag-data_first150k.csv

Check that your variable (which should be called "reviews") contains the data.
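
One possible way to do that check is sketched below, using a small made-up DataFrame (reviews_demo, a hypothetical stand-in so the example runs without downloading anything). The same three calls should work on the real reviews variable.

```python
import pandas as pd

# Small made-up stand-in for the wine data (not the real file)
reviews_demo = pd.DataFrame(
    {"country": ["US", "Spain", "France"], "points": [96, 96, 95]}
)

print(reviews_demo.shape)          # (number of rows, number of columns)
print(list(reviews_demo.columns))  # column labels
print(reviews_demo.head(2))        # first two rows
```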

If you notice a duplicate index column, add index_col=0 to the read_csv call to indicate that column 0 of the file is your index. (See https://www.kaggle.com/residentmario/creating-reading-and-writing#Reading-data-files if you are unsure.)
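
To see what the duplicate index column looks like, here is a minimal sketch that reads a tiny made-up CSV from memory instead of the wine URL. The CSV text and the names with_dup and no_dup are assumptions for illustration; the file's first column is an unnamed saved index, much like the one in the wine file.

```python
import io
import pandas as pd

# Tiny made-up CSV whose first column is an unnamed, previously saved index
csv_text = ",country,points\n0,US,96\n1,Spain,95\n"

# Default read: the saved index shows up as an extra "Unnamed: 0" column
with_dup = pd.read_csv(io.StringIO(csv_text))
print(list(with_dup.columns))

# index_col=0 tells pandas to use the first column as the row index instead
no_dup = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(no_dup.columns))
```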


In [5]:
# Declare a variable named "reviews" and assign it the DataFrame using pd.read_csv()

reviews = pd.read_csv("https://raw.githubusercontent.com/davestroud/Wine/master/winemag-data_first150k.csv")
reviews

,Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,variety,winery
0,0,US,This tremendous 100% varietal wine hails from ...,Martha's Vineyard,96,235.0,California,Napa Valley,Napa,Cabernet Sauvignon,Heitz
1,1,Spain,"Ripe aromas of fig, blackberry and cassis are ...",Carodorum Selección Especial Reserva,96,110.0,Northern Spain,Toro,,Tinta de Toro,Bodega Carmen Rodríguez
2,2,US,Mac Watson honors the memory of a wine once ma...,Special Selected Late Harvest,96,90.0,California,Knights Valley,Sonoma,Sauvignon Blanc,Macauley
3,3,US,"This spent 20 months in 30% new French oak, an...",Reserve,96,65.0,Oregon,Willamette Valley,Willamette Valley,Pinot Noir,Ponzi
4,4,France,"This is the top wine from La Bégude, named aft...",La Brûlade,95,66.0,Provence,Bandol,,Provence red blend,Domaine de la Bégude
...,...,...,...,...,...,...,...,...,...,...,...
150925,150925,Italy,Many people feel Fiano represents southern Ita...,,91,20.0,Southern Italy,Fiano di Avellino,,White Blend,Feudi di San Gregorio
150926,150926,France,"Offers an intriguing nose with ginger, lime an...",Cuvée Prestige,91,27.0,Champagne,Champagne,,Champagne Blend,H.Germain
150927,150927,Italy,This classic example comes from a cru vineyard...,Terre di Dora,91,20.0,Southern Italy,Fiano di Avellino,,White Blend,Terredora
150928,150928,France,"A perfect salmon shade, with scents of peaches...",Grand Brut Rosé,90,52.0,Champagne,Champagne,,Champagne Blend,Gosset


Find a list of the highest rated wines. (First find the highest points rating then get all wines that have that rating.)
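
The two steps in the parenthetical can be sketched on a toy DataFrame. The data and the names wines, top_score, and top_wines are made up for illustration, so this shows the pattern rather than the expected answer.

```python
import pandas as pd

# Made-up ratings; two wines share the top score
wines = pd.DataFrame(
    {"winery": ["A", "B", "C", "D"], "points": [88, 95, 91, 95]}
)

# Step 1: find the highest points rating
top_score = wines["points"].max()

# Step 2: keep every row whose rating equals that maximum
top_wines = wines.loc[wines["points"] == top_score]

print(top_score)
print(top_wines)
```

Computing the maximum first means the filter keeps working if the data changes, which a hard-coded number would not.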

In [12]:
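# First, get the highest rating (instead of hard-coding 100),
# then keep every wine that has that rating

highest = reviews.points.max()
hr = reviews.loc[reviews.points == highest]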
hr

,Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,variety,winery
2145,2145,France,"Full of ripe fruit, opulent and concentrated, ...",,100,848.0,Bordeaux,Pessac-Léognan,,Bordeaux-style White Blend,Château Haut-Brion
19354,19354,US,"In a stunning lineup of Cayuse Syrahs, the En ...",En Chamberlin Vineyard,100,65.0,Oregon,Walla Walla Valley (OR),Oregon Other,Syrah,Cayuse
19355,19355,Australia,Not a Cellar Selection in the traditional sens...,Rare,100,300.0,Victoria,Rutherglen,,Muscat,Chambers Rosewood Vineyards
24151,24151,Italy,"A perfect wine from a classic vintage, the 200...",Masseto,100,460.0,Tuscany,Toscana,,Merlot,Tenuta dell'Ornellaia
26296,26296,France,A wine that has created its own universe. It h...,Clos du Mesnil,100,1400.0,Champagne,Champagne,,Chardonnay,Krug
28954,28954,Italy,"This small, family-run estate in the heart of ...",Guado de' Gemoli,100,195.0,Tuscany,Bolgheri Superiore,,Red Blend,Giovanni Chiappini
41521,41521,Italy,"A perfect wine from a classic vintage, the 200...",Masseto,100,460.0,Tuscany,Toscana,,Merlot,Tenuta dell'Ornellaia
51886,51886,France,A wine that has created its own universe. It h...,Clos du Mesnil,100,1400.0,Champagne,Champagne,,Chardonnay,Krug
78004,78004,Italy,"This small, family-run estate in the heart of ...",Guado de' Gemoli,100,195.0,Tuscany,Bolgheri Superiore,,Red Blend,Giovanni Chiappini
83536,83536,France,A wine that has created its own universe. It h...,Clos du Mesnil,100,1400.0,Champagne,Champagne,,Chardonnay,Krug


In [17]:
# Next, get all the wines with that rating


Use this "best wines" DataFrame to create a "best_US_wines" DataFrame with the following columns: country, winery, price. Only show the US wines!
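
A row filter and a column selection can be combined in a single .loc call. The sketch below assumes a small made-up frame (best_demo, with placeholder winery names) in place of the real "best wines" DataFrame.

```python
import pandas as pd

# Made-up stand-in for a "best wines" DataFrame
best_demo = pd.DataFrame({
    "country": ["US", "France", "US"],
    "winery": ["Winery A", "Winery B", "Winery C"],
    "points": [100, 100, 100],
    "price": [65.0, 848.0, 200.0],
})

# Rows: boolean mask on country. Columns: a list of the labels to keep.
us_only = best_demo.loc[
    best_demo["country"] == "US", ["country", "winery", "price"]
]
print(us_only)
```

Passing the column labels as a list (square brackets) is the usual way to ask .loc for several columns at once.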

In [15]:
# Keep only the US rows of the "best wines" DataFrame (hr) and the three requested columns
best_US_wines = hr.loc[hr.country == "US", ["country", "winery", "price"]]
best_US_wines

,country,winery,price
19354,US,Cayuse,65.0
84034,US,Cayuse,65.0
89399,US,Cardinale,200.0
92916,US,Shafer,215.0
98647,US,Williams Selyem,100.0
114272,US,Sloan,245.0
119194,US,Cayuse,65.0
122767,US,Williams Selyem,100.0
137099,US,Cardinale,200.0
143522,US,Sloan,245.0


Rerun this entire Notebook (to get a "clean" version), then commit and push it to GitHub. Let me know when it's there for me to check.