# NumPy + Pandas

### Learning Objectives

- Use NumPy in a Jupyter notebook to perform mathematical operations
- Import data from CSV files using Pandas
- Alter a Pandas dataframe by filtering or slicing to get a subset of data
- Use a combination of NumPy and Pandas to clean data

### Overview
- NumPy and Pandas are two common Python libraries used for statistical analysis, data wrangling, and advanced mathematical operations
- To put it simply, they are your connection to the data
- Pandas is the most important tool, as you'll spend the most time using it

## NumPy

In [1]:
# import the NumPy library
import numpy as np

The most fundamental data object in NumPy is the multidimensional array. Think of an array as a table of elements, similar to an Excel spreadsheet, consisting of items all of the same type (usually numbers).

In [2]:
# Let's turn a list into an array

data = [1, 3, 5, 7, 9, -1]
array = np.array(data)

In [3]:
# Call the array
array

array([ 1,  3,  5,  7,  9, -1])

In [4]:
# Print out the type of the array
type(array)

numpy.ndarray

How does an array differ from a list?

In [5]:
# Can you perform a mathematical operation with a list and an integer?

data + 5

TypeError: can only concatenate list (not "int") to list

In [6]:
# What about with an array and an integer?

array + 5

array([ 6,  8, 10, 12, 14,  4])

NumPy arrays support element-wise arithmetic that plain lists lack, which makes them far more convenient for numerical work.
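As a small sketch of the difference, the snippet below adds 5 to every element twice: once with a list comprehension and once with array arithmetic. The names `values`, `shifted_list`, and `shifted_array` are illustrative only.

```python
import numpy as np

values = [1, 3, 5, 7, 9, -1]

# With a plain list, element-wise math needs an explicit loop or comprehension
shifted_list = [v + 5 for v in values]

# With an array, the operation is applied to every element at once
shifted_array = np.array(values) + 5

print(shifted_list)    # [6, 8, 10, 12, 14, 4]
print(shifted_array)   # [ 6  8 10 12 14  4]
```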

In [7]:
# Calculate the mean of the array
array.mean()

4.0

In [8]:
# Find the max value in the array
array.max()

9

In [9]:
# Find the min value in the array
array.min()

-1

In [10]:
# Sum all of the values in the array
array.sum()

24

In [11]:
# Find the standard deviation of the values in the array
array.std()

3.415650255319866

In [13]:
# What happens when you do this?

dir(array) # lists every attribute and method available on the array (i.e. what you can access or call)

['T',
 '__abs__',
 '__add__',
 '__and__',
 '__array__',
 '__array_finalize__',
 '__array_interface__',
 '__array_prepare__',
 '__array_priority__',
 '__array_struct__',
 '__array_wrap__',
 '__class__',
 '__complex__',
 '__contains__',
 '__copy__',
 '__deepcopy__',
 '__delattr__',
 '__delitem__',
 '__delslice__',
 '__div__',
 '__divmod__',
 '__doc__',
 '__eq__',
 '__float__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getslice__',
 '__gt__',
 '__hash__',
 '__hex__',
 '__iadd__',
 '__iand__',
 '__idiv__',
 '__ifloordiv__',
 '__ilshift__',
 '__imod__',
 '__imul__',
 '__index__',
 '__init__',
 '__int__',
 '__invert__',
 '__ior__',
 '__ipow__',
 '__irshift__',
 '__isub__',
 '__iter__',
 '__itruediv__',
 '__ixor__',
 '__le__',
 '__len__',
 '__long__',
 '__lshift__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__nonzero__',
 '__oct__',
 '__or__',
 '__pos__',
 '__pow__',
 '__radd__',
 '__rand__',
 '__rdiv__',
 '__rdivmod__',


You can also use NumPy itself to call certain functions.

In [14]:
# Get the absolute value of the items in the array
np.abs(array)

array([1, 3, 5, 7, 9, 1])

In [15]:
# Calculate the median of the array

np.median(array)

4.0

In [16]:
# How would you square the values in the array?
np.square(array)

array([ 1,  9, 25, 49, 81,  1])

In [17]:
# How would you get the square root of the values in the array?
np.sqrt(array)

array([ 1.        ,  1.73205081,  2.23606798,  2.64575131,  3.        ,
               nan])

Hmm... NumPy warns us that taking the square root of -1 is invalid for a real-valued array, so it returns `nan` for that element. Let's suppress the warning (only do this if you know what you're doing!).

In [18]:
import warnings
warnings.filterwarnings('ignore')
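Silencing every warning globally can hide real problems later in a notebook. One narrower alternative, sketched here, is to suppress only NumPy's invalid-value warning for a single block with `np.errstate`. Casting to a complex dtype is another option when imaginary results are actually wanted. The variable names are illustrative.

```python
import numpy as np

values = np.array([1, 3, 5, 7, 9, -1])

# Suppress the "invalid value" warning only inside this block
with np.errstate(invalid='ignore'):
    roots = np.sqrt(values)

# The square root of -1 is still reported as nan
print(np.isnan(roots[-1]))   # True

# Casting to complex lets NumPy return an imaginary result instead
complex_roots = np.sqrt(values.astype(complex))
print(complex_roots[-1])     # the imaginary unit
```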

So far, we've been using an array with a single dimension, similar to just using one column in Excel. However, arrays can also be multi-dimensional.

In [19]:
# Generate a multi-dimensional array with 64 items using np.arange and .reshape
array = np.arange(64).reshape(8, 8)

In [20]:
array

array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29, 30, 31],
       [32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47],
       [48, 49, 50, 51, 52, 53, 54, 55],
       [56, 57, 58, 59, 60, 61, 62, 63]])

Sometimes we want to extract a subset of data from an array. This process is called **slicing**.

In [21]:
# Get a slice of the rows
array[:5] # returns the first 5 rows (from index 0 to 4) and all columns

array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29, 30, 31],
       [32, 33, 34, 35, 36, 37, 38, 39]])

In [22]:
# Get a slice of the columns
array[:, :3] # returns all rows and the first three columns

array([[ 0,  1,  2],
       [ 8,  9, 10],
       [16, 17, 18],
       [24, 25, 26],
       [32, 33, 34],
       [40, 41, 42],
       [48, 49, 50],
       [56, 57, 58]])

In [23]:
# Get a slice of both the rows and the columns
array[1:3, 2:6] # returns the 2nd and 3rd rows and the 3rd through 6th columns

array([[10, 11, 12, 13],
       [18, 19, 20, 21]])

In [24]:
# Get a specific value
array[5] # the "value" of a multi-dimensional array is an array with one less dimension

array([40, 41, 42, 43, 44, 45, 46, 47])

**REMEMBER:** If you slice with colons, NumPy will give you an object back with the same number of dimensions as your original array. If you don't slice with colons, NumPy will give you back an object with one less dimension.
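A quick way to check this rule is to compare shapes. The sketch below rebuilds the same 8x8 array under the illustrative name `grid`.

```python
import numpy as np

grid = np.arange(64).reshape(8, 8)

row_as_1d = grid[5]        # plain integer index: drops a dimension
row_as_2d = grid[5:6]      # slice with a colon: keeps both dimensions
single_value = grid[5, 2]  # two integer indices: drops both dimensions

print(row_as_1d.shape)   # (8,)
print(row_as_2d.shape)   # (1, 8)
print(single_value)      # 42
```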

Here's a very good tutorial on NumPy: https://www.datacamp.com/community/tutorials/python-numpy-tutorial

## Pandas

Pandas is a high-performance, open source library for data analysis in Python, developed by Wes McKinney in 2008. Over the years, it has become the de facto standard library for data analysis in Python, with wide adoption, a large community behind it, and a steady stream of new features and enhancements.

- It can process a variety of data sets in different formats: time series, tabular heterogeneous, and matrix data.
- It facilitates loading/importing data from varied sources such as CSV and DB/SQL.
- It can handle a myriad of operations on data sets: subsetting, slicing, filtering, merging, grouping, re-ordering, and re-shaping.
- It can deal with missing data according to rules defined by the user/developer: ignore, convert to 0, and so on.
- It can be used for parsing and munging (conversion) of data as well as modeling and statistical analysis.
- It integrates well with other Python libraries such as statsmodels, SciPy, and scikit-learn.

In [25]:
# import the pandas library
import pandas as pd

In [26]:
# create a pandas series

series = pd.Series(
    [0.25, 0.5, 0.75, 1.0], 
    index=['a', 'b', 'c', 'd']
)

In [27]:
series

a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64

In [28]:
# return a value from the series by its index label
series['a']

0.25

In [29]:
# change a value in the series
series['b'] = 2.0

In [30]:
series

a    0.25
b    2.00
c    0.75
d    1.00
dtype: float64

In [31]:
# turn a Python dictionary into a pandas series

population_dict = {
    'California': 38332521,
    'Texas': 26448193,
    'New York': 19651127,
    'Florida': 19552860,
    'Illinois': 12882135,
}

population = pd.Series(population_dict)

In [32]:
population

California    38332521
Florida       19552860
Illinois      12882135
New York      19651127
Texas         26448193
dtype: int64

In [33]:
# get the population for California
population['California']

38332521

In [34]:
# turn a Python dictionary into a pandas data frame

data = {
    'feature_one': [1, 2, 4, 8, -3],
    'feature_two': ['haight', 'mission', 'geary', 'castro', 'soma'],
    'feature_three': [True, True, False, True, False],
}

df = pd.DataFrame(data)

In [35]:
# take a look at the first few rows of the data frame
df.head() 

| | feature_one | feature_three | feature_two |
| --- | --- | --- | --- |
| 0 | 1 | True | haight |
| 1 | 2 | True | mission |
| 2 | 4 | False | geary |
| 3 | 8 | True | castro |
| 4 | -3 | False | soma |


In [36]:
# return the columns of a data frame
df.columns

Index([u'feature_one', u'feature_three', u'feature_two'], dtype='object')

In [37]:
# return the index of a dataframe
df.index

RangeIndex(start=0, stop=5, step=1)

In [38]:
# return the numpy array version of the data frame (also works on a series)
df.values

array([[1, True, 'haight'],
       [2, True, 'mission'],
       [4, False, 'geary'],
       [8, True, 'castro'],
       [-3, False, 'soma']], dtype=object)
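The `dtype=object` in this output appears because the columns hold mixed types (integers, booleans, strings), so NumPy falls back to generic Python objects. As a rough sketch, selecting only numeric columns first yields a numeric array. Newer pandas versions also offer `.to_numpy()` as an alternative to `.values`. The frame below is a trimmed, illustrative copy of `df`.

```python
import pandas as pd

frame = pd.DataFrame({
    'feature_one': [1, 2, 4, 8, -3],
    'feature_two': ['haight', 'mission', 'geary', 'castro', 'soma'],
})

mixed = frame.to_numpy()                     # strings + ints -> object dtype
numeric = frame[['feature_one']].to_numpy()  # ints only -> integer dtype

print(mixed.dtype)     # object
print(numeric.shape)   # (5, 1)
```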

In [39]:
# get the type of the data frame
type(df)

pandas.core.frame.DataFrame

In [40]:
# get the feature_one column
df['feature_one'] # you can also do df.feature_one

0    1
1    2
2    4
3    8
4   -3
Name: feature_one, dtype: int64

In [41]:
# get the type of the feature_one column
type(df.feature_one)

pandas.core.series.Series

In [42]:
# get multiple columns from the data frame
df[['feature_one', 'feature_two']]

| | feature_one | feature_two |
| --- | --- | --- |
| 0 | 1 | haight |
| 1 | 2 | mission |
| 2 | 4 | geary |
| 3 | 8 | castro |
| 4 | -3 | soma |


In [43]:
# add a new column to the data frame
df['new_feature'] = 5

In [44]:
df

| | feature_one | feature_three | feature_two | new_feature |
| --- | --- | --- | --- | --- |
| 0 | 1 | True | haight | 5 |
| 1 | 2 | True | mission | 5 |
| 2 | 4 | False | geary | 5 |
| 3 | 8 | True | castro | 5 |
| 4 | -3 | False | soma | 5 |


In [45]:
# you can also provide a series or list of values with the correct number of rows
df['new_new_column'] = [1, 2, 3, 4, 5]

In [46]:
df

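| | feature_one | feature_three | feature_two | new_feature | new_new_column |
| --- | --- | --- | --- | --- | --- |
| 0 | 1 | True | haight | 5 | 1 |
| 1 | 2 | True | mission | 5 | 2 |
| 2 | 4 | False | geary | 5 | 3 |
| 3 | 8 | True | castro | 5 | 4 |
| 4 | -3 | False | soma | 5 | 5 |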


The first dataset we'll work with is the drinks dataset.

In [47]:
# here's the location of the drinks.csv file
path = '../../data/drinks.csv'

In [48]:
# read the drinks dataset into a pandas data frame
drinks = pd.read_csv(path)
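Since `drinks.csv` may not sit at that path on every machine, here is a small self-contained sketch of what `pd.read_csv` does, using an in-memory string in place of a file. The three rows are copied from the `drinks.head()` output in this lesson; `csv_text` and `sample` are illustrative names.

```python
import io
import pandas as pd

csv_text = """country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
Afghanistan,0,0,0,0.0,AS
Albania,89,132,54,4.9,EU
Algeria,25,0,14,0.7,AF
"""

# read_csv accepts a file path or any file-like object
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)    # (3, 6)
print(sample.dtypes)   # column types are inferred from the values
```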

In [49]:
# take a look at some of the data
drinks.head()

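| | country | beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol | continent |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | Afghanistan | 0 | 0 | 0 | 0.0 | AS |
| 1 | Albania | 89 | 132 | 54 | 4.9 | EU |
| 2 | Algeria | 25 | 0 | 14 | 0.7 | AF |
| 3 | Andorra | 245 | 138 | 312 | 12.4 | EU |
| 4 | Angola | 217 | 57 | 45 | 5.9 | AF |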


In [50]:
# let's make the country column the index using .set_index()
drinks.set_index('country', inplace=True) # inplace=True modifies drinks directly instead of returning a new data frame
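As a minimal sketch of the two styles, the snippet below builds a tiny frame (values copied from this lesson's output) and sets the index both ways. The names `sample`, `indexed`, and `result` are illustrative.

```python
import pandas as pd

sample = pd.DataFrame({
    'country': ['Albania', 'Algeria'],
    'beer_servings': [89, 25],
})

# Without inplace, set_index returns a new frame and leaves `sample` alone
indexed = sample.set_index('country')
print(list(sample.columns))    # ['country', 'beer_servings']
print(list(indexed.columns))   # ['beer_servings']

# With inplace=True, `sample` itself is modified and None is returned
result = sample.set_index('country', inplace=True)
print(result)                  # None
print(sample.index.name)       # country
```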

In [51]:
# let's take another look at our data -- this time, let's look at the last few rows
drinks.tail()

| country | beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol | continent |
| --- | --- | --- | --- | --- | --- |
| Venezuela | 333 | 100 | 3 | 7.7 | SA |
| Vietnam | 111 | 2 | 1 | 2.0 | AS |
| Yemen | 6 | 0 | 0 | 0.1 | AS |
| Zambia | 32 | 19 | 4 | 2.5 | AF |
| Zimbabwe | 64 | 18 | 4 | 4.7 | AF |


In [52]:
# How many rows and columns are there in the dataset?
drinks.shape # output is (num_rows, num_columns)

(193, 5)

In [53]:
# let's take a look at some of the details of this dataset using .info()
drinks.info()

<class 'pandas.core.frame.DataFrame'>
Index: 193 entries, Afghanistan to Zimbabwe
Data columns (total 5 columns):
beer_servings                   193 non-null int64
spirit_servings                 193 non-null int64
wine_servings                   193 non-null int64
total_litres_of_pure_alcohol    193 non-null float64
continent                       170 non-null object
dtypes: float64(1), int64(3), object(1)
memory usage: 9.0+ KB


In [54]:
# we can also calculate some summary statistics
drinks.describe()

| | beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol |
| --- | --- | --- | --- | --- |
| count | 193.0 | 193.0 | 193.0 | 193.0 |
| mean | 106.160622 | 80.994819 | 49.450777 | 4.717098 |
| std | 101.143103 | 88.284312 | 79.697598 | 3.773298 |
| min | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 20.0 | 4.0 | 1.0 | 1.3 |
| 50% | 76.0 | 56.0 | 8.0 | 4.2 |
| 75% | 188.0 | 128.0 | 59.0 | 7.2 |
| max | 376.0 | 438.0 | 370.0 | 14.4 |


## Slicing Data Frames

In [55]:
# select the values in the Peru row using .loc[]
drinks.loc['Peru']

beer_servings                   163
spirit_servings                 160
wine_servings                    21
total_litres_of_pure_alcohol    6.1
continent                        SA
Name: Peru, dtype: object

In [56]:
# select values in the wine_servings column
drinks.loc[:, 'wine_servings']

country
Afghanistan               0
Albania                  54
Algeria                  14
Andorra                 312
Angola                   45
Antigua & Barbuda        45
Argentina               221
Armenia                  11
Australia               212
Austria                 191
Azerbaijan                5
Bahamas                  51
Bahrain                   7
Bangladesh                0
Barbados                 36
Belarus                  42
Belgium                 212
Belize                    8
Benin                    13
Bhutan                    0
Bolivia                   8
Bosnia-Herzegovina        8
Botswana                 35
Brazil                   16
Brunei                    1
Bulgaria                 94
Burkina Faso              7
Burundi                   0
Cote d'Ivoire             7
Cabo Verde               16
                       ... 
Suriname                  7
Swaziland                 2
Sweden                  186
Switzerland             280
Syria       

In [65]:
# slice the data frame by rows and columns
drinks.loc['Germany':'Iceland', 'beer_servings':'spirit_servings']

| country | beer_servings | spirit_servings |
| --- | --- | --- |
| Germany | 346 | 117 |
| Ghana | 31 | 3 |
| Greece | 133 | 112 |
| Grenada | 199 | 438 |
| Guatemala | 53 | 69 |
| Guinea | 9 | 0 |
| Guinea-Bissau | 28 | 31 |
| Guyana | 93 | 302 |
| Haiti | 1 | 326 |
| Honduras | 69 | 98 |


In [66]:
# use .iloc[] to get the row at index 48
drinks.iloc[48]

beer_servings                    224
spirit_servings                   81
wine_servings                    278
total_litres_of_pure_alcohol    10.4
continent                         EU
Name: Denmark, dtype: object
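One detail worth keeping in mind: a `.loc[]` slice uses labels and includes both endpoints, whereas an `.iloc[]` slice uses integer positions and excludes the stop position, like ordinary Python slicing. The sketch below shows this on a small illustrative frame whose values are copied from this lesson's output.

```python
import pandas as pd

sample = pd.DataFrame(
    {'beer_servings': [0, 89, 25, 245], 'wine_servings': [0, 54, 14, 312]},
    index=['Afghanistan', 'Albania', 'Algeria', 'Andorra'],
)

by_label = sample.loc['Albania':'Andorra']   # includes 'Andorra'
by_position = sample.iloc[1:3]               # stops before position 3

print(list(by_label.index))      # ['Albania', 'Algeria', 'Andorra']
print(list(by_position.index))   # ['Albania', 'Algeria']
```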

In [67]:
# return the column at index 1
drinks.iloc[:, 1]

country
Afghanistan               0
Albania                 132
Algeria                   0
Andorra                 138
Angola                   57
Antigua & Barbuda       128
Argentina                25
Armenia                 179
Australia                72
Austria                  75
Azerbaijan               46
Bahamas                 176
Bahrain                  63
Bangladesh                0
Barbados                173
Belarus                 373
Belgium                  84
Belize                  114
Benin                     4
Bhutan                    0
Bolivia                  41
Bosnia-Herzegovina      173
Botswana                 35
Brazil                  145
Brunei                    2
Bulgaria                252
Burkina Faso              7
Burundi                   0
Cote d'Ivoire             1
Cabo Verde               56
                       ... 
Suriname                178
Swaziland                 2
Sweden                   60
Switzerland             100
Syria       

In [68]:
# return a slice of rows and columns
drinks.iloc[48:55, 1:3]

| country | spirit_servings | wine_servings |
| --- | --- | --- |
| Denmark | 81 | 278 |
| Djibouti | 44 | 3 |
| Dominica | 286 | 26 |
| Dominican Republic | 147 | 9 |
| Ecuador | 74 | 3 |
| Egypt | 4 | 1 |
| El Salvador | 69 | 2 |


## Conditional Selection

In [70]:
# check if the row's continent is 'EU'
drinks.continent == 'EU'

country
Afghanistan             False
Albania                  True
Algeria                 False
Andorra                  True
Angola                  False
Antigua & Barbuda       False
Argentina               False
Armenia                  True
Australia               False
Austria                  True
Azerbaijan               True
Bahamas                 False
Bahrain                 False
Bangladesh              False
Barbados                False
Belarus                  True
Belgium                  True
Belize                  False
Benin                   False
Bhutan                  False
Bolivia                 False
Bosnia-Herzegovina       True
Botswana                False
Brazil                  False
Brunei                  False
Bulgaria                 True
Burkina Faso            False
Burundi                 False
Cote d'Ivoire           False
Cabo Verde              False
                        ...  
Suriname                False
Swaziland               False
Sw

In [71]:
# check if the row's wine_servings > 20
drinks.wine_servings > 20

country
Afghanistan             False
Albania                  True
Algeria                 False
Andorra                  True
Angola                   True
Antigua & Barbuda        True
Argentina                True
Armenia                 False
Australia                True
Austria                  True
Azerbaijan              False
Bahamas                  True
Bahrain                 False
Bangladesh              False
Barbados                 True
Belarus                  True
Belgium                  True
Belize                  False
Benin                   False
Bhutan                  False
Bolivia                 False
Bosnia-Herzegovina      False
Botswana                 True
Brazil                  False
Brunei                  False
Bulgaria                 True
Burkina Faso            False
Burundi                 False
Cote d'Ivoire           False
Cabo Verde              False
                        ...  
Suriname                False
Swaziland               False
Sw

Each of those comparisons returns a boolean Series. We can pass that Series into the drinks data frame to keep only the rows where it is `True`, giving us a subset of the data.
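Here is a sketch of the idea on a tiny illustrative frame: a comparison produces a boolean Series (often called a mask), and indexing with it keeps only the matching rows. When combining conditions, each comparison needs its own parentheses because `&` and `|` bind more tightly than `==` and `>`.

```python
import pandas as pd

sample = pd.DataFrame(
    {'wine_servings': [0, 54, 14, 312], 'continent': ['AS', 'EU', 'AF', 'EU']},
    index=['Afghanistan', 'Albania', 'Algeria', 'Andorra'],
)

is_europe = sample['continent'] == 'EU'      # boolean mask
likes_wine = sample['wine_servings'] > 100   # another mask

print(list(sample[is_europe].index))               # ['Albania', 'Andorra']
print(list(sample[is_europe & likes_wine].index))  # ['Andorra']

# Written inline, the parentheses around each comparison are required
both = sample[(sample['continent'] == 'EU') | (sample['wine_servings'] > 100)]
print(list(both.index))                            # ['Albania', 'Andorra']
```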

In [72]:
# get the data for countries in Europe
drinks[drinks.continent == 'EU']

| country | beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol | continent |
| --- | --- | --- | --- | --- | --- |
| Albania | 89 | 132 | 54 | 4.9 | EU |
| Andorra | 245 | 138 | 312 | 12.4 | EU |
| Armenia | 21 | 179 | 11 | 3.8 | EU |
| Austria | 279 | 75 | 191 | 9.7 | EU |
| Azerbaijan | 21 | 46 | 5 | 1.3 | EU |
| Belarus | 142 | 373 | 42 | 14.4 | EU |
| Belgium | 295 | 84 | 212 | 10.5 | EU |
| Bosnia-Herzegovina | 76 | 173 | 8 | 4.6 | EU |
| Bulgaria | 231 | 252 | 94 | 10.3 | EU |
| Croatia | 230 | 87 | 254 | 10.2 | EU |


In [73]:
# get the data for non-European countries
drinks[drinks.continent != 'EU']

| country | beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol | continent |
| --- | --- | --- | --- | --- | --- |
| Afghanistan | 0 | 0 | 0 | 0.0 | AS |
| Algeria | 25 | 0 | 14 | 0.7 | AF |
| Angola | 217 | 57 | 45 | 5.9 | AF |
| Antigua & Barbuda | 102 | 128 | 45 | 4.9 | NaN |
| Argentina | 193 | 25 | 221 | 8.3 | SA |
| Australia | 261 | 72 | 212 | 10.4 | OC |
| Bahamas | 122 | 176 | 51 | 6.3 | NaN |
| Bahrain | 42 | 63 | 7 | 2.0 | AS |
| Bangladesh | 0 | 0 | 0 | 0.0 | AS |
| Barbados | 143 | 173 | 36 | 6.3 | NaN |


In [74]:
# get the data for countries with more than 20 wine servings
drinks[drinks.wine_servings > 20]

| country | beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol | continent |
| --- | --- | --- | --- | --- | --- |
| Albania | 89 | 132 | 54 | 4.9 | EU |
| Andorra | 245 | 138 | 312 | 12.4 | EU |
| Angola | 217 | 57 | 45 | 5.9 | AF |
| Antigua & Barbuda | 102 | 128 | 45 | 4.9 | NaN |
| Argentina | 193 | 25 | 221 | 8.3 | SA |
| Australia | 261 | 72 | 212 | 10.4 | OC |
| Austria | 279 | 75 | 191 | 9.7 | EU |
| Bahamas | 122 | 176 | 51 | 6.3 | NaN |
| Barbados | 143 | 173 | 36 | 6.3 | NaN |
| Belarus | 142 | 373 | 42 | 14.4 | EU |


In [75]:
# get the data for countries in Europe with more than 20 wine servings
drinks[(drinks.continent == 'EU') & (drinks.wine_servings > 20)]

| country | beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol | continent |
| --- | --- | --- | --- | --- | --- |
| Albania | 89 | 132 | 54 | 4.9 | EU |
| Andorra | 245 | 138 | 312 | 12.4 | EU |
| Austria | 279 | 75 | 191 | 9.7 | EU |
| Belarus | 142 | 373 | 42 | 14.4 | EU |
| Belgium | 295 | 84 | 212 | 10.5 | EU |
| Bulgaria | 231 | 252 | 94 | 10.3 | EU |
| Croatia | 230 | 87 | 254 | 10.2 | EU |
| Cyprus | 192 | 154 | 113 | 8.2 | EU |
| Czech Republic | 361 | 170 | 134 | 11.8 | EU |
| Denmark | 224 | 81 | 278 | 10.4 | EU |


In [76]:
# get the data for countries in Europe OR countries with more than 20 wine servings
drinks[(drinks.continent == 'EU') | (drinks.wine_servings > 20)]

| country | beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol | continent |
| --- | --- | --- | --- | --- | --- |
| Albania | 89 | 132 | 54 | 4.9 | EU |
| Andorra | 245 | 138 | 312 | 12.4 | EU |
| Angola | 217 | 57 | 45 | 5.9 | AF |
| Antigua & Barbuda | 102 | 128 | 45 | 4.9 | NaN |
| Argentina | 193 | 25 | 221 | 8.3 | SA |
| Armenia | 21 | 179 | 11 | 3.8 | EU |
| Australia | 261 | 72 | 212 | 10.4 | OC |
| Austria | 279 | 75 | 191 | 9.7 | EU |
| Azerbaijan | 21 | 46 | 5 | 1.3 | EU |
| Bahamas | 122 | 176 | 51 | 6.3 | NaN |


In [77]:
# call index to return just the names of the countries with more wine servings than beer servings
drinks[drinks.wine_servings > drinks.beer_servings].index

Index([u'Andorra', u'Argentina', u'Chile', u'Cook Islands', u'Croatia',
       u'Denmark', u'Equatorial Guinea', u'France', u'Georgia', u'Greece',
       u'Italy', u'Laos', u'Lebanon', u'Luxembourg', u'Montenegro',
       u'Portugal', u'Qatar', u'Sao Tome & Principe', u'Slovenia', u'Sweden',
       u'Switzerland', u'Syria', u'Timor-Leste', u'Turkmenistan', u'Tuvalu',
       u'Uruguay'],
      dtype='object', name=u'country')

Boolean values are treated as numbers in arithmetic (`True` as 1, `False` as 0), so we can sum them to count how many rows satisfy a condition.

In [78]:
# How many countries consume no beer at all?
(drinks.beer_servings == 0).sum()

15
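Because `True` counts as 1 and `False` as 0, the same trick also gives proportions: the mean of a boolean Series is the fraction of `True` values. A small sketch with made-up numbers:

```python
import pandas as pd

beer_sample = pd.Series([0, 89, 25, 0], index=['A', 'B', 'C', 'D'])  # made-up data

no_beer = beer_sample == 0

print(no_beer.sum())    # 2   -> number of True values
print(no_beer.mean())   # 0.5 -> share of True values
```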

## Pandas Series

In [79]:
# assign the beer_servings column to the variable 'beer'
beer = drinks.beer_servings

In [80]:
# we can do math operations, similar to numpy arrays
beer * 2

country
Afghanistan               0
Albania                 178
Algeria                  50
Andorra                 490
Angola                  434
Antigua & Barbuda       204
Argentina               386
Armenia                  42
Australia               522
Austria                 558
Azerbaijan               42
Bahamas                 244
Bahrain                  84
Bangladesh                0
Barbados                286
Belarus                 284
Belgium                 590
Belize                  526
Benin                    68
Bhutan                   46
Bolivia                 334
Bosnia-Herzegovina      152
Botswana                346
Brazil                  490
Brunei                   62
Bulgaria                462
Burkina Faso             50
Burundi                 176
Cote d'Ivoire            74
Cabo Verde              288
                       ... 
Suriname                256
Swaziland               180
Sweden                  304
Switzerland             370
Syria       

In [81]:
# What's the average number of beer servings per country?
beer.mean()

106.16062176165804

In [82]:
# What's the median number of beer servings per country?
beer.median()

76.0

In [83]:
# What's the sum of beer servings across all countries?
beer.sum()

20489L

In [86]:
# we can add series together -- create a new column called 'total_servings'
drinks['total_servings'] = drinks.beer_servings + drinks.wine_servings + drinks.spirit_servings
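Series arithmetic lines values up by index label, not by position. Adding columns of the same data frame is safe because they share an index. The sketch below, with illustrative values, shows what happens when two Series have different labels.

```python
import pandas as pd

beer_sample = pd.Series({'Albania': 89, 'Algeria': 25})
wine_sample = pd.Series({'Algeria': 14, 'Andorra': 312})

# Labels present in only one Series produce NaN in the result
total = beer_sample + wine_sample
print(total)

# .add() with fill_value treats a missing label as 0 instead
total_filled = beer_sample.add(wine_sample, fill_value=0)
print(total_filled)
```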

In [87]:
drinks.head()

| country | beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol | continent | total_servings |
| --- | --- | --- | --- | --- | --- | --- |
| Afghanistan | 0 | 0 | 0 | 0.0 | AS | 0 |
| Albania | 89 | 132 | 54 | 4.9 | EU | 275 |
| Algeria | 25 | 0 | 14 | 0.7 | AF | 39 |
| Andorra | 245 | 138 | 312 | 12.4 | EU | 695 |
| Angola | 217 | 57 | 45 | 5.9 | AF | 319 |


In [94]:
# let's find out how many null values there are in continent
cont = drinks.continent
drinks.continent.isnull().sum()

23
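A likely reason for these nulls, stated here as an assumption about `drinks.csv` rather than something shown in this lesson: if North American countries are coded as `NA`, `pd.read_csv` treats that string as a missing value by default. The sketch below reproduces the effect on an in-memory CSV with hypothetical continent codes and shows how `keep_default_na=False` preserves the string.

```python
import io
import pandas as pd

# Hypothetical two-row file; the 'NA' code for Bahamas is an assumption
csv_text = "country,continent\nBahamas,NA\nAlbania,EU\n"

default_read = pd.read_csv(io.StringIO(csv_text))
literal_read = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

print(default_read['continent'].isnull().sum())   # 1 -> 'NA' became NaN
print(literal_read['continent'].tolist())         # ['NA', 'EU']
```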

In [97]:
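# replace every null value with "No Continent" using .fillna()
# (assigning the result back updates the data frame reliably across pandas versions)
drinks['continent'] = drinks.continent.fillna('No Continent')
cont = drinks.continent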

In [98]:
# How many unique continents are there? (use .nunique())
cont.nunique()

6

In [99]:
# How many countries are from each continent? (use .value_counts())
cont.value_counts()

AF              53
EU              45
AS              44
No Continent    23
OC              16
SA              12
Name: continent, dtype: int64

In [100]:
# What share of the data belongs to each continent? (normalize=True returns fractions that sum to 1)
cont.value_counts(normalize=True)

AF              0.274611
EU              0.233161
AS              0.227979
No Continent    0.119171
OC              0.082902
SA              0.062176
Name: continent, dtype: float64
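To read these fractions as percentages, one option is to multiply by 100 and round. A sketch on a small made-up Series:

```python
import pandas as pd

continents = pd.Series(['EU', 'EU', 'AF', 'AS'])  # made-up sample

shares = continents.value_counts(normalize=True)
percentages = (shares * 100).round(1)

print(percentages['EU'])   # 50.0
print(shares.sum())        # 1.0
```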

In [101]:
# sort countries from most to fewest total servings using .sort_values() (chain .head(5) to keep only the top 5)
drinks.sort_values('total_servings', ascending=False)

| country | beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol | continent | total_servings |
| --- | --- | --- | --- | --- | --- | --- |
| Andorra | 245 | 138 | 312 | 12.4 | EU | 695 |
| Grenada | 199 | 438 | 28 | 11.9 | No Continent | 665 |
| Czech Republic | 361 | 170 | 134 | 11.8 | EU | 665 |
| France | 127 | 151 | 370 | 11.8 | EU | 648 |
| Russian Federation | 247 | 326 | 73 | 11.5 | AS | 646 |
| Lithuania | 343 | 244 | 56 | 12.9 | EU | 643 |
| Luxembourg | 236 | 133 | 271 | 11.4 | EU | 640 |
| Germany | 346 | 117 | 175 | 11.3 | EU | 638 |
| Hungary | 234 | 215 | 185 | 11.3 | EU | 634 |
| Poland | 343 | 215 | 56 | 10.9 | EU | 614 |
