_Exercises taken from Pandas for Everyone_

The goal of this notebook is to introduce Pandas using the [Gapminder data](https://www.gapminder.org/data/).

A copy of the data is available in this [code repo](./data/gapminder.tsv).

1- Load the dataset in the notebook

In [1]:
import pandas as pd

In [3]:
df = pd.read_csv('./data/gapminder.tsv', sep='\t')

In [4]:
df.head()

,country,continent,year,lifeExp,pop,gdpPercap
0,Afghanistan,Asia,1952,28.801,8425333,779.445314
1,Afghanistan,Asia,1957,30.332,9240934,820.85303
2,Afghanistan,Asia,1962,31.997,10267083,853.10071
3,Afghanistan,Asia,1967,34.02,11537966,836.197138
4,Afghanistan,Asia,1972,36.088,13079460,739.981106
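The file is tab-separated, which is why `sep='\t'` is passed to `read_csv`. As a minimal sketch that runs without the data file, the snippet below parses a small in-memory string instead; `tsv_text` and `sample` are illustrative names, and the two rows are copied from the output above.

```python
import io

import pandas as pd

# A tiny tab-separated sample standing in for ./data/gapminder.tsv
# (assumption: same column layout as the real file).
tsv_text = (
    "country\tcontinent\tyear\tlifeExp\tpop\tgdpPercap\n"
    "Afghanistan\tAsia\t1952\t28.801\t8425333\t779.445314\n"
    "Afghanistan\tAsia\t1957\t30.332\t9240934\t820.85303\n"
)

# sep="\t" makes read_csv split fields on tabs instead of commas
sample = pd.read_csv(io.StringIO(tsv_text), sep="\t")
print(sample.shape)  # (2, 6)
```

Without `sep="\t"`, each line would be read as a single column, since the default separator is a comma.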


2- How many rows and columns does the data contain?

In [5]:
df.columns

Index(['country', 'continent', 'year', 'lifeExp', 'pop', 'gdpPercap'], dtype='object')

In [6]:
df.index

RangeIndex(start=0, stop=1704, step=1)

In [7]:
len(df.columns)

6

In [8]:
len(df.index)

1704
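Both counts are also available at once through the `shape` attribute, which returns a `(rows, columns)` tuple. A small sketch on a hand-built frame (the `sample` data is made up for illustration):

```python
import pandas as pd

# Hypothetical three-row frame, only used to illustrate .shape
sample = pd.DataFrame({
    "country": ["Afghanistan", "Afghanistan", "Albania"],
    "year": [1952, 1957, 1952],
})

# shape is an attribute, not a method: no parentheses
n_rows, n_cols = sample.shape
print(n_rows, n_cols)  # 3 2
```

On the Gapminder frame loaded above, `df.shape` should agree with `len(df.index)` and `len(df.columns)`.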

In [10]:
# More information on the DataFrame: the dtype of each column
df.dtypes

country       object
continent     object
year           int64
lifeExp      float64
pop            int64
gdpPercap    float64
dtype: object

In [12]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1704 entries, 0 to 1703
Data columns (total 6 columns):
country      1704 non-null object
continent    1704 non-null object
year         1704 non-null int64
lifeExp      1704 non-null float64
pop          1704 non-null int64
gdpPercap    1704 non-null float64
dtypes: float64(2), int64(2), object(2)
memory usage: 80.0+ KB


3- Select the country column from the dataset

In [13]:
subset = df['country']

In [14]:
subset.head()

0    Afghanistan
1    Afghanistan
2    Afghanistan
3    Afghanistan
4    Afghanistan
Name: country, dtype: object

4- Select the country and continent columns

In [15]:
subset = df[['country', 'continent']]

In [16]:
subset.head()

,country,continent
0,Afghanistan,Asia
1,Afghanistan,Asia
2,Afghanistan,Asia
3,Afghanistan,Asia
4,Afghanistan,Asia
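The difference between the two selections above is the return type: a single column name gives a Series, while a list of names gives a DataFrame, even when the list holds one name. A short sketch on a hand-built frame (`sample` is illustrative, not the real data):

```python
import pandas as pd

# Hypothetical two-row frame for illustration
sample = pd.DataFrame({
    "country": ["Afghanistan", "Albania"],
    "continent": ["Asia", "Europe"],
    "year": [1952, 1952],
})

as_series = sample["country"]    # single name -> Series
as_frame = sample[["country"]]   # list with one name -> one-column DataFrame

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
```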


5- Select the row with index label 1

In [17]:
subset = df.loc[1]

In [18]:
subset

country      Afghanistan
continent           Asia
year                1957
lifeExp           30.332
pop              9240934
gdpPercap        820.853
Name: 1, dtype: object

In [19]:
subset['country']

'Afghanistan'
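Note that `loc` selects by index label, not by position. With the default `RangeIndex` the two coincide, so the distinction is easy to miss. The sketch below uses a hand-built frame with made-up labels 10, 11, 12 to show `loc` next to its positional counterpart `iloc`:

```python
import pandas as pd

# Hypothetical frame whose index labels differ from row positions
sample = pd.DataFrame(
    {"country": ["Afghanistan", "Albania", "Algeria"],
     "year": [1952, 1952, 1952]},
    index=[10, 11, 12],
)

by_label = sample.loc[11]      # row whose index label is 11
by_position = sample.iloc[1]   # second row, whatever its label

print(by_label["country"])     # Albania
print(by_position["country"])  # Albania
```

Here `sample.loc[1]` would raise a `KeyError`, because no row carries the label 1.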

6- Select rows 1 to 4

In [22]:
subset = df.loc[[1,2,3,4]]

In [23]:
subset

,country,continent,year,lifeExp,pop,gdpPercap
1,Afghanistan,Asia,1957,30.332,9240934,820.85303
2,Afghanistan,Asia,1962,31.997,10267083,853.10071
3,Afghanistan,Asia,1967,34.02,11537966,836.197138
4,Afghanistan,Asia,1972,36.088,13079460,739.981106
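A list of labels is one way to pick these rows; a slice is another. One thing to keep in mind is that label slices with `loc` include the end label, whereas positional slices with `iloc` exclude the end position. A sketch on a hand-built frame (`sample` is illustrative):

```python
import pandas as pd

# Hypothetical six-row frame with the default 0..5 index
sample = pd.DataFrame({"year": [1952, 1957, 1962, 1967, 1972, 1977]})

by_list = sample.loc[[1, 2, 3, 4]]  # explicit list of labels
by_label_slice = sample.loc[1:4]    # label slice: end label 4 is included
by_pos_slice = sample.iloc[1:5]     # positional slice: position 5 is excluded

print(by_label_slice["year"].tolist())  # [1957, 1962, 1967, 1972]
```

All three selections return the same four rows here.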


In [24]:
# loc can subset columns as well; ':' keeps every row
subset = df.loc[:, ['country', 'continent']]

In [25]:
subset.head()

,country,continent
0,Afghanistan,Asia
1,Afghanistan,Asia
2,Afghanistan,Asia
3,Afghanistan,Asia
4,Afghanistan,Asia
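Row and column selectors can be combined in a single `loc` call, with rows first and columns second. A sketch on a hand-built frame (`sample` and its values are illustrative):

```python
import pandas as pd

# Hypothetical three-row frame for illustration
sample = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria"],
    "continent": ["Asia", "Europe", "Africa"],
    "year": [1952, 1952, 1952],
})

# rows labelled 0 and 2, restricted to two columns
block = sample.loc[[0, 2], ["country", "continent"]]

# a single label on each axis returns one scalar value
cell = sample.loc[1, "continent"]

print(block.shape)  # (2, 2)
print(cell)         # Europe
```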


7- Select the rows pertaining to Albania

In [26]:
subset = df[df.country == "Albania"]

In [30]:
subset

,country,continent,year,lifeExp,pop,gdpPercap
12,Albania,Europe,1952,55.23,1282697,1601.056136
13,Albania,Europe,1957,59.28,1476505,1942.284244
14,Albania,Europe,1962,64.82,1728137,2312.888958
15,Albania,Europe,1967,66.22,1984060,2760.196931
16,Albania,Europe,1972,67.69,2263554,3313.422188
17,Albania,Europe,1977,68.93,2509048,3533.00391
18,Albania,Europe,1982,70.42,2780097,3630.880722
19,Albania,Europe,1987,72.0,3075321,3738.932735
20,Albania,Europe,1992,71.581,3326498,2497.437901
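21,Albania,Europe,1997,72.95,3428038,3193.054604
22,Albania,Europe,2002,75.651,3508512,4604.211737
23,Albania,Europe,2007,76.423,3600523,5937.029526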
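The expression `df.country == "Albania"` is itself a Series of booleans, one per row; indexing the frame with it keeps the rows where the value is `True`. A sketch on a hand-built frame (`sample` and `mask` are illustrative names):

```python
import pandas as pd

# Hypothetical four-row frame for illustration
sample = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Albania", "Algeria"],
    "year": [2007, 2002, 2007, 2007],
})

# element-wise comparison gives one boolean per row
mask = sample["country"] == "Albania"
print(mask.tolist())  # [False, True, True, False]

# indexing with the mask keeps only the True rows, original labels preserved
albania = sample[mask]
print(albania.index.tolist())  # [1, 2]
```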


8- Select the rows pertaining to Albania in the 21st century

In [32]:
# Combine both conditions in one mask; chaining df[...][...] triggers a reindexing warning
subset = df[(df.country == "Albania") & (df.year >= 2000)]


In [33]:
subset

,country,continent,year,lifeExp,pop,gdpPercap
22,Albania,Europe,2002,75.651,3508512,4604.211737
23,Albania,Europe,2007,76.423,3600523,5937.029526
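When filtering on several conditions, each comparison needs its own parentheses and the masks are joined with `&` (and) or `|` (or), because Python's `and`/`or` do not work element-wise on Series. The sketch below uses a hand-built frame (`sample` is illustrative) and also shows `DataFrame.query` as an alternative spelling of the same filter:

```python
import pandas as pd

# Hypothetical four-row frame for illustration
sample = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Albania", "Albania"],
    "year": [2002, 1997, 2002, 2007],
})

# parentheses are required: & binds tighter than == and >=
both = sample[(sample["country"] == "Albania") & (sample["year"] >= 2000)]

# the same filter written as a query string
same = sample.query("country == 'Albania' and year >= 2000")

print(both.index.tolist())  # [2, 3]
print(same.index.tolist())  # [2, 3]
```

The single-mask form is evaluated against the full frame in one step, so the mask and the frame always share the same index.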
