This lesson shows some more of the interesting things you can do with the pandas library.
pandas is a very big library, so this lesson cannot cover all of it.
We will use some of the basic features from the earlier lesson, but it should be easy to follow along even if you missed it.

In [1]:
import pandas as pd

The data we're going to use today is drinks by country, originally from http://fivethirtyeight.com/datalab/dear-mona-followup-where-do-people-drink-the-most-beer-wine-and-spirits/ and compiled into a CSV by https://github.com/justmarkham, who has great Python tutorials for pandas and machine learning.

In [55]:
drinks = pd.read_csv('http://bit.ly/drinksbycountry')

head gives us the first 5 rows so we can preview the data.

In [4]:
drinks.head()

Unnamed: 0,country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
0,Afghanistan,0,0,0,0.0,Asia
1,Albania,89,132,54,4.9,Europe
2,Algeria,25,0,14,0.7,Africa
3,Andorra,245,138,312,12.4,Europe
4,Angola,217,57,45,5.9,Africa


describe gives us summary statistics for the numeric columns. Passing extra parameters such as include or percentiles makes it show more.

In [29]:
drinks.describe()

Unnamed: 0,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol
count,193.0,193.0,193.0,193.0
mean,106.160622,80.994819,49.450777,4.717098
std,101.143103,88.284312,79.697598,3.773298
min,0.0,0.0,0.0,0.0
25%,20.0,4.0,1.0,1.3
50%,76.0,56.0,8.0,4.2
75%,188.0,128.0,59.0,7.2
max,376.0,438.0,370.0,14.4
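
As a rough sketch of those extra parameters, here is describe called with include='all' and with custom percentiles. To keep it runnable offline, it uses a small dataframe called sample (a name made up for this example) built from the five rows that head printed above, so the numbers are not the ones for the full dataset.

```python
import pandas as pd

# Hypothetical stand-in for drinks: the five rows shown by drinks.head().
sample = pd.DataFrame({
    'country': ['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola'],
    'beer_servings': [0, 89, 25, 245, 217],
    'spirit_servings': [0, 132, 0, 138, 57],
    'wine_servings': [0, 54, 14, 312, 45],
    'total_litres_of_pure_alcohol': [0.0, 4.9, 0.7, 12.4, 5.9],
    'continent': ['Asia', 'Europe', 'Africa', 'Europe', 'Africa'],
})

# include='all' summarises text columns too (count, unique, top, freq);
# statistics that do not apply to a column show up as NaN.
full_summary = sample.describe(include='all')

# percentiles picks which quantiles are reported for the numeric columns.
tail_summary = sample.describe(percentiles=[0.1, 0.9])

print(full_summary)
print(tail_summary)
```

With include='all' the text columns get rows such as unique and top, which are often a quick way to spot how many distinct continents or countries are in the data.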


You can sort the dataframe (table) by a specific series (column) as follows:

In [36]:
drinks.sort_values('country',ascending=False).head()

Unnamed: 0,country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
192,Zimbabwe,64,18,4,4.7,Africa
191,Zambia,32,19,4,2.5,Africa
190,Yemen,6,0,0,0.1,Asia
189,Vietnam,111,2,1,2.0,Asia
188,Venezuela,333,100,3,7.7,South America
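
sort_values also accepts a list of columns, with a matching list for ascending. The sketch below sorts by continent first and then by beer servings within each continent. It uses a small made-up dataframe called sample, holding the five rows from the head preview, rather than the full drinks data.

```python
import pandas as pd

# Hypothetical stand-in for drinks: a few columns from the five preview rows.
sample = pd.DataFrame({
    'country': ['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola'],
    'beer_servings': [0, 89, 25, 245, 217],
    'continent': ['Asia', 'Europe', 'Africa', 'Europe', 'Africa'],
})

# Continents A to Z, and within each continent the biggest beer drinkers first.
by_continent = sample.sort_values(
    ['continent', 'beer_servings'],
    ascending=[True, False],
)

print(by_continent)
```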


Exercise time: use sort_values to figure out the top countries in terms of beer consumption, spirits consumption, and wine consumption.

Suppose we don't want the continent series (column) any more. Here's how we would drop it:

In [49]:
drinks.drop('continent', axis=1).head()

Unnamed: 0,country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol
0,Afghanistan,0,0,0,0.0
1,Albania,89,132,54,4.9
2,Algeria,25,0,14,0.7
3,Andorra,245,138,312,12.4
4,Angola,217,57,45,5.9


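We need to specify the axis as 1, since the drop method can drop rows or columns. axis=1 tells it that we want to drop a column, and axis=0 would drop a row, identified by its index label. Note that drop returns a new dataframe and leaves drinks itself unchanged.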

In [48]:
drinks.drop(3, axis=0).head()

Unnamed: 0,country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
0,Afghanistan,0,0,0,0.0,Asia
1,Albania,89,132,54,4.9,Europe
2,Algeria,25,0,14,0.7,Africa
4,Angola,217,57,45,5.9,Africa
5,Antigua & Barbuda,102,128,45,4.9,North America
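
If the axis numbers are hard to remember, drop also takes columns= and index= keywords that say the same thing by name. A minimal sketch, using a tiny made-up dataframe called sample instead of the full drinks data:

```python
import pandas as pd

# Hypothetical stand-in for drinks: three of the preview rows.
sample = pd.DataFrame({
    'country': ['Afghanistan', 'Albania', 'Algeria'],
    'beer_servings': [0, 89, 25],
    'continent': ['Asia', 'Europe', 'Africa'],
})

# Same effect as sample.drop('continent', axis=1).
no_continent = sample.drop(columns='continent')

# Same effect as sample.drop(1, axis=0); 1 is an index label, not a position.
no_albania = sample.drop(index=1)

# drop returned new dataframes, so sample still has every row and column.
print(no_continent)
print(no_albania)
print(sample.shape)
```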


Suppose we wanted to uppercase all the country names. Here's how we would do it:

In [58]:
upper_drinks = drinks.copy()
upper_drinks['country'] = drinks['country'].str.upper()
upper_drinks.head()

Unnamed: 0,country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
0,AFGHANISTAN,0,0,0,0.0,Asia
1,ALBANIA,89,132,54,4.9,Europe
2,ALGERIA,25,0,14,0.7,Africa
3,ANDORRA,245,138,312,12.4,Europe
4,ANGOLA,217,57,45,5.9,Africa


Suppose we wanted to list only the countries that have 'United' in their names:

In [60]:
united_drinks = drinks[drinks['country'].str.contains('United')]
united_drinks.head()

Unnamed: 0,country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
181,United Arab Emirates,16,135,5,2.8,Asia
182,United Kingdom,219,126,195,10.4,Europe
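
str.contains has a few options that are handy when the text is messier than this dataset. The sketch below uses a short made-up series called countries, with a missing value added on purpose (the real data may not have one), to show a case-insensitive, literal match that treats missing names as non-matches.

```python
import pandas as pd

# Hypothetical country names; the None is added only to illustrate na=False.
countries = pd.Series(
    ['United Arab Emirates', 'United Kingdom', 'USA', 'Tanzania', None]
)

# case=False ignores capitalisation, regex=False treats the pattern as plain
# text, and na=False makes missing values count as "does not contain".
mask = countries.str.contains('united', case=False, na=False, regex=False)

matches = countries[mask]
print(matches)
```

Without na=False the mask would contain a missing value for the None entry, and using it to filter would raise an error.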


Wat!?! United States of America isn't listed? Oh, it's USA in the data. Well, time to replace that:

In [64]:
trump_drinks = drinks.copy()
trump_drinks['country'] = drinks['country'].str.replace('USA', 'United States of America')
trump_drinks.tail(10)

Unnamed: 0,country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
183,Tanzania,36,6,1,5.7,Africa
184,United States of America,249,158,84,8.7,North America
185,Uruguay,115,35,220,6.6,South America
186,Uzbekistan,25,101,8,2.4,Asia
187,Vanuatu,21,18,11,0.9,Oceania
188,Venezuela,333,100,3,7.7,South America
189,Vietnam,111,2,1,2.0,Asia
190,Yemen,6,0,0,0.1,Asia
191,Zambia,32,19,4,2.5,Africa
192,Zimbabwe,64,18,4,4.7,Africa
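
One thing to watch: str.replace swaps the text wherever it appears inside a value, while the plain replace method on a series only swaps values that match exactly. A small sketch of the difference, using a made-up series called countries where 'USA Virgin Islands' is a hypothetical entry that is not in the drinks data:

```python
import pandas as pd

# Hypothetical values; 'USA Virgin Islands' exists only to show the difference.
countries = pd.Series(['USA', 'Uruguay', 'USA Virgin Islands'])

# Substring replacement: every occurrence of 'USA' inside any value changes.
substring_replaced = countries.str.replace(
    'USA', 'United States of America', regex=False
)

# Whole-value replacement: only entries equal to 'USA' change.
exact_replaced = countries.replace('USA', 'United States of America')

print(substring_replaced.tolist())
print(exact_replaced.tolist())
```

For this dataset both give the same result, so the choice only matters when the short name can also show up inside a longer one.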
