## Exercise 1.06: Indexing, Slicing, and Iterating Using pandas

To gain clear insights from our dataset, we need to be able to explicitly index, slice, and iterate over our data, for example to compare the population density growth of several countries.

After looking at the individual operations, we want to display the population density of Germany, Singapore, United States, and India for the years 1970, 1990, and 2010.

#### Loading the dataset

In [None]:
# importing the necessary dependencies
import pandas as pd

In [None]:
# loading the dataset
dataset = pd.read_csv('../../Datasets/world_population.csv', index_col=0)

In [None]:
# looking at the first 2 rows of the dataset
dataset.head(2)

---

#### Indexing

Since we need several rows and columns of our dataset to complete the given task, we have to use indexing to get the right rows and columns.   
Use indexing to get: 
- the row of the USA
- the second to last row
- the column of year 2000 as Series
- the population density for India in 2000

In [None]:
# indexing the USA row
dataset.loc[["United States"]].head()

In [None]:
# indexing the second to last row by position
dataset.iloc[[-2]]

In [None]:
# indexing the column of 2000 as a Series
dataset["2000"].head()

In [None]:
# indexing the population density of India in 2000 (Dataframe)
dataset[["2000"]].loc[["India"]]

**Note:**   
Using single brackets to index a column (as with NumPy) returns a pandas Series object.   
Using double brackets returns a DataFrame. This way we can also select several columns with one query.

Comparing the output of the DataFrame query above with the Series query below shows the difference between a Series and a DataFrame.

In [None]:
# indexing the population density of India in 2000 (Series)
dataset["2000"].loc["India"]
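The difference between the two bracket styles can also be checked directly on the returned types. The following is a minimal sketch on a toy DataFrame; the frame `toy`, its labels, and its values are made-up placeholders, not values from `world_population.csv`.

```python
import pandas as pd

# Toy frame with placeholder values (NOT real population densities)
toy = pd.DataFrame(
    {"Country Code": ["AAA", "BBB"], "2000": [10.0, 20.0]},
    index=["Country A", "Country B"],
)

as_series = toy["2000"]      # single brackets -> Series
as_frame = toy[["2000"]]     # double brackets -> one-column DataFrame

# The same pattern applies to the row lookup with .loc
scalar_value = toy["2000"].loc["Country B"]       # a single scalar value
frame_value = toy[["2000"]].loc[["Country B"]]    # a 1x1 DataFrame

print(type(as_series).__name__, type(as_frame).__name__)
print(frame_value.shape)
```

The Series form is convenient when a single value is needed for further computation, while the DataFrame form keeps the row and column labels attached for display.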

---

#### Slicing

Besides single rows and columns, we also need to get some subsets of the dataset.   
Use slicing for:
- the countries in rows 2 to 5
- countries Germany, Singapore, United States, and India
- Germany, Singapore, United States, and India with their population density of years 1970, 1990, 2010

In [None]:
# slicing countries of rows 2 to 5
dataset.iloc[1:5]
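Position-based slices with `iloc` exclude the end position, which is why `1:5` yields rows 2 to 5 when counting from one. Label-based slices with `loc`, in contrast, include the end label. A small sketch on a hypothetical toy frame (the name `toy` and its contents are assumptions for illustration only):

```python
import pandas as pd

# Toy frame with six rows labelled a..f (placeholder data)
toy = pd.DataFrame({"value": [1, 2, 3, 4, 5, 6]}, index=list("abcdef"))

by_position = toy.iloc[1:5]    # positions 1, 2, 3, 4 -> rows b, c, d, e
by_label = toy.loc["b":"e"]    # labels b through e, end label included

print(list(by_position.index))
print(by_position.equals(by_label))
```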

In [None]:
# slicing rows Germany, Singapore, United States, and India 
dataset.loc[["Germany", "Singapore", "United States", "India"]]

In [None]:
# slicing a subset of Germany, Singapore, United States, and India 
# for years 1970, 1990, 2010
country_list = ["Germany", "Singapore", "United States", "India"]

dataset.loc[country_list][["1970", "1990", "2010"]]
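Row and column labels can also be passed to `.loc` in a single call instead of chaining two selections. The sketch below compares both forms on a toy frame; `toy`, `rows`, and `cols` are hypothetical names, and the numbers are placeholders rather than real densities.

```python
import pandas as pd

# Toy frame with placeholder values (NOT real population densities)
toy = pd.DataFrame(
    {"1970": [1.0, 2.0, 3.0], "1990": [4.0, 5.0, 6.0], "2010": [7.0, 8.0, 9.0]},
    index=["Country A", "Country B", "Country C"],
)

rows = ["Country C", "Country A"]
cols = ["1970", "2010"]

chained = toy.loc[rows][cols]       # select rows first, then columns
single_call = toy.loc[rows, cols]   # select rows and columns together

print(chained.equals(single_call))
print(single_call)
```

Both forms return the same subset for reading. The single-call form is usually the safer choice when values are assigned to the selection, because assigning through a chained selection may act on a temporary copy.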

---

#### Iterating

As the last task of this exercise, we want to iterate over the first three countries of our dataset and print:   
- name
- country code 
- years 1970, 1990, 2010 

In [None]:
# iterating over the first three countries (row by row)
for index, row in dataset.head(3).iterrows():
    # printing the country name followed by the selected columns
    print(index, '\n', row[["Country Code", "1970", "1990", "2010"]], '\n')

**Note:**   
`iterrows()` returns each row as a Series. Since a Series has a single dtype, the dtypes of the individual columns are not preserved across the row.   
If you need to preserve the dtypes of the columns, use the `itertuples()` method.
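The dtype behaviour described in the note can be observed on a small example. This is a minimal sketch with a toy frame; the column names `people` and `density` and all values are assumptions for illustration.

```python
import pandas as pd

# Toy frame with one integer and one float column (placeholder values)
toy = pd.DataFrame(
    {"people": [1, 2], "density": [1.5, 2.5]},
    index=["Country A", "Country B"],
)

# iterrows(): each row becomes one Series, so the integer is upcast to float
_, first_row = next(toy.iterrows())
print(first_row.dtype)

# itertuples(): each row becomes a namedtuple, values keep their column type
first_tuple = next(toy.itertuples())
print(first_tuple.Index, first_tuple.people, first_tuple.density)
```

With `itertuples()`, the index label is available as the `Index` field and each column as an attribute, provided the column name is a valid Python identifier. Column names such as `"1970"` are not, so pandas falls back to positional field names for them.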