# Pandas II

## More indexing tricks

We'll start with some review data from Beer Advocate (see [Tom Augspurger](https://github.com/TomAugspurger/pydata-chi-h2t/blob/master/3-Indexing.ipynb) for details on how he extracted it).

In [8]:
import numpy as np
import pandas as pd
pd.options.display.max_rows = 10

In [4]:
df = pd.read_csv('data/beer_subset.csv.gz', parse_dates=['time'], compression='gzip')
df.head()

Unnamed: 0,abv,beer_id,brewer_id,beer_name,beer_style,review_appearance,review_aroma,review_overall,review_palate,profile_name,review_taste,text,time
0,7.0,2511,287,Bell's Cherry Stout,American Stout,4.5,4.0,4.5,4.0,blaheath,4.5,Batch 8144\tPitch black in color with a 1/2 f...,2009-10-05 21:31:48
1,5.7,19736,9790,Duck-Rabbit Porter,American Porter,4.5,4.0,4.5,4.0,GJ40,4.0,Sampled from a 12oz bottle in a standard pint...,2009-10-05 21:32:09
2,4.8,11098,3182,Fürstenberg Premium Pilsener,German Pilsener,4.0,3.0,3.0,3.0,biegaman,3.5,Haystack yellow with an energetic group of bu...,2009-10-05 21:32:13
3,9.5,28577,3818,Unearthly (Imperial India Pale Ale),American Double / Imperial IPA,4.0,4.0,4.0,4.0,nick76,4.0,"The aroma has pine, wood, citrus, caramel, an...",2009-10-05 21:32:37
4,5.8,398,119,Wolaver's Pale Ale,American Pale Ale (APA),4.0,3.0,4.0,3.5,champ103,3.0,A: Pours a slightly hazy golden/orange color....,2009-10-05 21:33:14


### Boolean indexing

Boolean indexing works like a `WHERE` clause in SQL: it keeps only the rows for which a condition is true.

The indexer (or boolean mask) should be 1-dimensional and the same length as the thing being indexed.
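As a minimal sketch of the rule above, the following uses a small made-up frame (the `toy` data and its values are hypothetical, not taken from the beer file) to show a mask with one entry per row, and what happens when the lengths do not match.

```python
import pandas as pd

# Hypothetical toy data, not the Beer Advocate file.
toy = pd.DataFrame({
    "beer_name": ["A", "B", "C", "D"],
    "abv": [7.0, 4.8, 5.7, 4.6],
})

# The comparison yields a 1-dimensional boolean Series, one entry per row.
mask = toy["abv"] < 5

# .loc keeps the rows where the mask is True.
low_abv = toy.loc[mask]

# A boolean indexer of the wrong length is rejected rather than padded.
try:
    toy.loc[[True, False]]
    wrong_length_accepted = True
except (IndexError, ValueError):
    wrong_length_accepted = False
```

Here `low_abv` holds only the rows for beers `B` and `D`, and the two-element list fails because the frame has four rows.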

In [10]:
df.abv < 5

0      False
1      False
2       True
3      False
4      False
       ...  
994    False
995    False
996    False
997    False
998    False
Name: abv, dtype: bool

In [12]:
df.loc[df.abv < 5].head()

Unnamed: 0,abv,beer_id,brewer_id,beer_name,beer_style,review_appearance,review_aroma,review_overall,review_palate,profile_name,review_taste,text,time
2,4.8,11098,3182,Fürstenberg Premium Pilsener,German Pilsener,4.0,3.0,3.0,3.0,biegaman,3.5,Haystack yellow with an energetic group of bu...,2009-10-05 21:32:13
7,4.8,1669,256,Great White,Witbier,4.5,4.5,4.5,4.5,n0rc41,4.5,"Ok, for starters great white I believe will b...",2009-10-05 21:34:29
21,4.6,401,118,Dark Island,Scottish Ale,4.0,4.0,3.5,4.0,abuliarose,4.0,"Poured into a snifter, revealing black opaque...",2009-10-05 21:47:36
22,4.9,5044,18968,Kipona Fest,Märzen / Oktoberfest,4.0,3.5,4.0,4.0,drcarver,4.0,A - a medium brown body with an off white hea...,2009-10-05 21:47:56
28,4.6,401,118,Dark Island,Scottish Ale,4.0,4.0,4.5,4.0,sisuspeed,4.0,The color of this beer fits the name well. Op...,2009-10-05 21:53:38


In [15]:
df.loc[((df.abv < 5) & (df.time > pd.Timestamp('2009-06'))) | (df.review_overall >= 4.5)].head()

Unnamed: 0,abv,beer_id,brewer_id,beer_name,beer_style,review_appearance,review_aroma,review_overall,review_palate,profile_name,review_taste,text,time
0,7.0,2511,287,Bell's Cherry Stout,American Stout,4.5,4.0,4.5,4.0,blaheath,4.5,Batch 8144\tPitch black in color with a 1/2 f...,2009-10-05 21:31:48
1,5.7,19736,9790,Duck-Rabbit Porter,American Porter,4.5,4.0,4.5,4.0,GJ40,4.0,Sampled from a 12oz bottle in a standard pint...,2009-10-05 21:32:09
2,4.8,11098,3182,Fürstenberg Premium Pilsener,German Pilsener,4.0,3.0,3.0,3.0,biegaman,3.5,Haystack yellow with an energetic group of bu...,2009-10-05 21:32:13
6,6.2,53128,1114,Smokin' Amber Kegs Gone Wild,American Amber / Red Ale,3.5,4.0,4.5,4.0,Deuane,4.5,An American amber with the addition of smoked...,2009-10-05 21:34:24
7,4.8,1669,256,Great White,Witbier,4.5,4.5,4.5,4.5,n0rc41,4.5,"Ok, for starters great white I believe will b...",2009-10-05 21:34:29


See [the docs](http://pandas.pydata.org/pandas-docs/stable/timeseries.html) for more information on pandas' time and date functionality.
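The filter above compares a datetime column against `pd.Timestamp('2009-06')`. As a rough illustration on made-up rows (the `toy` frame is an assumption, with column names borrowed from the beer data), a partial date string resolves to the first instant of that month, and the comparison is then applied element-wise:

```python
import pandas as pd

# A year-month string is parsed as midnight on the first day of that month.
cutoff = pd.Timestamp("2009-06")

# Hypothetical rows; only the column names mirror the beer data.
toy = pd.DataFrame({
    "abv": [4.8, 4.6, 7.0],
    "time": pd.to_datetime([
        "2009-05-31 23:59:59",
        "2009-10-05 21:32:13",
        "2009-10-05 21:31:48",
    ]),
})

# Both conditions must hold: low ABV and reviewed after the cutoff.
recent_low = toy.loc[(toy["abv"] < 5) & (toy["time"] > cutoff)]
```

Only the second row survives: the first is low in ABV but too early, and the third is recent but too strong.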

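Be careful with the order of operations: `&` binds more tightly than comparison operators such as `>`, so the expression below is evaluated as `2 > (1 & 0)`.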

In [16]:
2 > 1 & 0

True

It is safest to use parentheses around each comparison:

In [17]:
(2 > 1) & 0

0
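The same precedence rule matters when combining conditions on a Series. The sketch below uses a hypothetical integer Series named `ratings` to contrast the parenthesised form with the unparenthesised one, which Python reads as the chained comparison `ratings > (1 & ratings) < 4` and which therefore fails when pandas is asked for the truth value of a whole Series.

```python
import pandas as pd

ratings = pd.Series([1, 2, 3, 4])  # hypothetical values

# Parenthesised: each comparison runs first, then & combines them element-wise.
in_range = (ratings > 1) & (ratings < 4)

# Unparenthesised: & is applied before the comparisons, producing a chained
# comparison that needs bool(Series), which pandas refuses to compute.
try:
    ratings > 1 & ratings < 4
    unparenthesised_failed = False
except (ValueError, TypeError):
    unparenthesised_failed = True
```

The error in the second form is arguably the lucky case; with plain scalars, as in the cell above, the wrong grouping silently returns a different answer.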

Select just the rows where the `beer_style` contains `'IPA'`:

In [19]:
df.beer_style.str?

In IPython, typing `df.beer_style.str.` and pressing Tab lists the available string methods. (Running the incomplete expression as a cell raises a `SyntaxError`.)

In [21]:
df.beer_style.str.contains('IPA')

0      False
1      False
2      False
3       True
4      False
       ...  
994    False
995    False
996    False
997    False
998    False
Name: beer_style, dtype: bool

In [24]:
df.loc[df.beer_style.str.contains('IPA')].head()

Unnamed: 0,abv,beer_id,brewer_id,beer_name,beer_style,review_appearance,review_aroma,review_overall,review_palate,profile_name,review_taste,text,time
3,9.5,28577,3818,Unearthly (Imperial India Pale Ale),American Double / Imperial IPA,4.0,4.0,4.0,4.0,nick76,4.0,"The aroma has pine, wood, citrus, caramel, an...",2009-10-05 21:32:37
8,6.7,6549,140,Northern Hemisphere Harvest Wet Hop Ale,American IPA,4.0,4.0,4.0,4.0,david18,4.0,I like all of Sierra Nevada's beers but felt ...,2009-10-05 21:34:31
16,8.0,36179,3818,Hoppe (Imperial Extra Pale Ale),American Double / Imperial IPA,4.0,3.0,4.0,3.5,nick76,3.0,"The aroma is papery with citrus, yeast, and s...",2009-10-05 21:43:23
23,6.5,44727,596,Portsmouth 5 C's IPA,American IPA,4.5,5.0,5.0,4.5,ALeF,5.0,As a devoted drinker of American and English ...,2009-10-05 21:48:46
26,5.9,37477,140,Sierra Nevada Anniversary Ale (2007-2009),American IPA,4.5,4.5,4.5,4.5,n0rc41,4.5,Poured a great dark color with great smell! t...,2009-10-05 21:51:33
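Two details of `str.contains` are worth a quick sketch: by default the pattern is treated as a regular expression, and missing values produce missing results that cannot be used directly as a mask. The example below uses a hypothetical `styles` Series (not the real column) and the `regex`, `case`, and `na` parameters to handle both.

```python
import pandas as pd

# Hypothetical style labels, including a missing value.
styles = pd.Series([
    "American IPA",
    "German Pilsener",
    None,
    "English India Pale Ale (IPA)",
    "american ipa",
])

# Literal, case-sensitive match; missing values count as "no match".
plain = styles.str.contains("IPA", regex=False, na=False)

# Same, but ignoring case.
any_case = styles.str.contains("ipa", case=False, regex=False, na=False)
```

Passing `na=False` keeps the result a clean boolean mask, so it can go straight into `.loc[...]`.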


Find the rows where the beer style is either `'American IPA'` or `'Pilsner'`:

In [25]:
df[(df.beer_style == 'American IPA') | (df.beer_style == 'Pilsner')].head()

Unnamed: 0,abv,beer_id,brewer_id,beer_name,beer_style,review_appearance,review_aroma,review_overall,review_palate,profile_name,review_taste,text,time
8,6.7,6549,140,Northern Hemisphere Harvest Wet Hop Ale,American IPA,4.0,4.0,4.0,4.0,david18,4.0,I like all of Sierra Nevada's beers but felt ...,2009-10-05 21:34:31
23,6.5,44727,596,Portsmouth 5 C's IPA,American IPA,4.5,5.0,5.0,4.5,ALeF,5.0,As a devoted drinker of American and English ...,2009-10-05 21:48:46
26,5.9,37477,140,Sierra Nevada Anniversary Ale (2007-2009),American IPA,4.5,4.5,4.5,4.5,n0rc41,4.5,Poured a great dark color with great smell! t...,2009-10-05 21:51:33
32,7.5,6076,651,Flower Power India Pale Ale,American IPA,3.5,4.5,4.0,3.5,OnThenIn,4.0,Appearance: The beer pours a rather cloudy da...,2009-10-05 22:02:11
48,6.7,44749,140,Sierra Nevada Chico Estate Harvest Wet Hop Ale...,American IPA,4.5,3.5,4.0,4.5,mikey711,4.0,I love this concept. Way to go Sierra Nevada!...,2009-10-05 22:19:33


Or more succinctly:

In [26]:
df[df.beer_style.isin(['American IPA', 'Pilsner'])].head()

Unnamed: 0,abv,beer_id,brewer_id,beer_name,beer_style,review_appearance,review_aroma,review_overall,review_palate,profile_name,review_taste,text,time
8,6.7,6549,140,Northern Hemisphere Harvest Wet Hop Ale,American IPA,4.0,4.0,4.0,4.0,david18,4.0,I like all of Sierra Nevada's beers but felt ...,2009-10-05 21:34:31
23,6.5,44727,596,Portsmouth 5 C's IPA,American IPA,4.5,5.0,5.0,4.5,ALeF,5.0,As a devoted drinker of American and English ...,2009-10-05 21:48:46
26,5.9,37477,140,Sierra Nevada Anniversary Ale (2007-2009),American IPA,4.5,4.5,4.5,4.5,n0rc41,4.5,Poured a great dark color with great smell! t...,2009-10-05 21:51:33
32,7.5,6076,651,Flower Power India Pale Ale,American IPA,3.5,4.5,4.0,3.5,OnThenIn,4.0,Appearance: The beer pours a rather cloudy da...,2009-10-05 22:02:11
48,6.7,44749,140,Sierra Nevada Chico Estate Harvest Wet Hop Ale...,American IPA,4.5,3.5,4.0,4.5,mikey711,4.0,I love this concept. Way to go Sierra Nevada!...,2009-10-05 22:19:33
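Note that `isin` tests for exact equality, not substrings. A small sketch on a hypothetical `styles` Series shows that `'Pilsner'` would not match a label such as `'German Pilsener'` (the spelling seen in the first table above), and that `~` inverts a mask:

```python
import pandas as pd

# Hypothetical style labels; spellings follow those seen in the sample rows.
styles = pd.Series(["American IPA", "German Pilsener", "Witbier", "American IPA"])

wanted = ["American IPA", "Pilsner"]

# True only where the whole value equals one of the wanted strings.
mask = styles.isin(wanted)

# ~ flips the mask, selecting everything that is NOT in the list.
others = styles[~mask]
```

If substring matching is what you want, `str.contains` is the tool to reach for instead.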
