# Selection

## Import pandas

In [9]:
import pandas as pd

## Import data

In [10]:
# URL of data
URL = "https://raw.githubusercontent.com/kirenz/datasets/master/height_clean_cols.csv"

In [11]:
df = pd.read_csv(URL)

df["gender"] = df["gender"].astype("category")  # store gender as a categorical variable
df["id"] = df["id"].astype(str)  # treat id as a label, not as a number
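
To see what the two conversions above do, here is a minimal sketch on a hand-typed stand-in for the first three rows (an assumption, so the block runs without downloading the CSV). It checks that `gender` becomes a categorical column and that the `id` values become strings.

```python
import pandas as pd

# Hand-typed stand-in for the first three rows of the CSV (assumption: no download)
df = pd.DataFrame({
    "name": ["Stefanie", "Peter", "Stefanie"],
    "id": [1, 2, 3],
    "gender": ["female", "male", "female"],
})

df["gender"] = df["gender"].astype("category")  # categorical column with inferred categories
df["id"] = df["id"].astype(str)                  # id values are now strings

print(df["gender"].dtype)                     # category
print(df["gender"].cat.categories.to_list())  # ['female', 'male']
print(df["id"].to_list())                     # ['1', '2', '3']
```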

## Selection

### Getting [[]]

Selecting a single column with `[[]]` (a list containing one column name) returns a DataFrame rather than a Series:

- Select the column `height` and save it as a new pandas DataFrame `df_height`

Hint:

```python
df_height = df[[___]]
```

In [27]:
### BEGIN SOLUTION
df_height = df[["height"]]
### END SOLUTION

In [33]:
"""Check if your code returns the correct output"""
assert df_height.columns.to_list() == ['height']
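
The difference between single and double brackets is easiest to see by checking the type and shape of each result. This sketch uses a small hand-typed stand-in for the data (an assumption, not the downloaded CSV).

```python
import pandas as pd

# Hypothetical stand-in for the data (first three rows, typed by hand)
df = pd.DataFrame({
    "name": ["Stefanie", "Peter", "Stefanie"],
    "height": [162, 163, 163],
})

s = df["height"]    # a single label returns a Series
d = df[["height"]]  # a list of labels returns a DataFrame

print(type(s).__name__, s.shape)  # Series (3,)
print(type(d).__name__, d.shape)  # DataFrame (3, 1)
```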

Slicing with `[]` selects rows by position (the endpoint is not included) and returns all columns:

In [7]:
df[0:1]

|   | name | id | height | average_height_parents | gender | number | height_m | weight | bmi | date |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Stefanie | 1 | 162 | 161.5 | female | 42 | 1.62 | 84.58 | 32.23 | 2022-10-08 |
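
A minimal sketch of row slicing with `[]`, using four hand-typed rows as a stand-in for the data (an assumption). With the default integer index (`RangeIndex`), the slice picks rows by position, excludes the stop position, and matches the same slice taken with `.iloc`.

```python
import pandas as pd

# Hypothetical stand-in for the first four rows (typed by hand)
df = pd.DataFrame({
    "name": ["Stefanie", "Peter", "Stefanie", "Manuela"],
    "height": [162, 163, 163, 164],
})

rows = df[1:3]  # positions 1 and 2; position 3 is excluded
print(rows["name"].to_list())  # ['Peter', 'Stefanie']

# With the default RangeIndex this is the same selection as positional .iloc
print(rows.equals(df.iloc[1:3]))  # True
```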


- Select rows 1, 2 and 3 and save the result as `df_sr`

In [34]:
### BEGIN SOLUTION
df_sr = df[1:4]
### END SOLUTION

In [40]:
"""Check if your code returns the correct output"""
assert df_sr.iloc[0,0] == 'Peter'
assert df_sr.iloc[2,0] == 'Manuela'
assert len(df_sr) == 3

### By label .loc

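The `.loc` (location) attribute is the primary method for selecting data by label. Valid inputs include a single label, a list of labels, a slice of labels, and a boolean array. The most common cases are shown below.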

For getting a cross section using a label:

In [8]:
# select the row with label 0 (passing a list of labels returns a DataFrame)
df.loc[[0]]

|   | name | id | height | average_height_parents | gender | number | height_m | weight | bmi | date |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Stefanie | 1 | 162 | 161.5 | female | 42 | 1.62 | 84.58 | 32.23 | 2022-10-08 |


Selecting on both axes (rows and columns) by label:

In [42]:
# select the value at row label 0 in column "name"
df.loc[0, 'name']

'Stefanie'

In [43]:
# select rows 2 to 4 of column "name" (with .loc, both endpoints are included)
df.loc[2:4, 'name']

2    Stefanie
3     Manuela
4       Simon
Name: name, dtype: object
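
The inclusive endpoint is the main practical difference between a `.loc` slice and an `.iloc` slice. This sketch compares both on four hand-typed rows (a stand-in for the data, assumed for illustration).

```python
import pandas as pd

# Hypothetical stand-in for the first four rows (typed by hand)
df = pd.DataFrame({
    "name": ["Stefanie", "Peter", "Stefanie", "Manuela"],
    "height": [162, 163, 163, 164],
})

by_label = df.loc[1:3, "name"]  # labels 1, 2 and 3 (stop label included)
by_position = df.iloc[1:3, 0]   # positions 1 and 2 (stop position excluded)

print(by_label.to_list())     # ['Peter', 'Stefanie', 'Manuela']
print(by_position.to_list())  # ['Peter', 'Stefanie']
```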

In [44]:
# select rows 2 to 4 of columns "name" and "height" (with .loc, both endpoints are included)
df.loc[2:4, ['name', 'height']]

|   | name | height |
|---|---|---|
| 2 | Stefanie | 163 |
| 3 | Manuela | 164 |
| 4 | Simon | 164 |


In [25]:
# select all rows for columns "name" and "height"
df.loc[:, ["name", "height"]]

|   | name | height |
|---|---|---|
| 0 | Stefanie | 162 |
| 1 | Peter | 163 |
| 2 | Stefanie | 163 |
| 3 | Manuela | 164 |
| 4 | Simon | 164 |
| 5 | Sophia | 164 |
| 6 | Ellen | 164 |
| 7 | Emilia | 165 |
| 8 | Lina | 165 |
| 9 | Marie | 165 |
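
Label slices also work on the column axis, and they include the stop label there too. A minimal sketch on a hand-typed stand-in with four of the dataset's columns (an assumption for illustration):

```python
import pandas as pd

# Hypothetical stand-in with four of the dataset's columns (typed by hand)
df = pd.DataFrame({
    "name": ["Stefanie", "Peter"],
    "id": ["1", "2"],
    "height": [162, 163],
    "gender": ["female", "male"],
})

# Label slice on the column axis: from "name" up to and including "height"
subset = df.loc[:, "name":"height"]
print(subset.columns.to_list())  # ['name', 'id', 'height']
```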


- Select rows 0 and 1 for the columns 'name' and 'height' and save the result as `df_loc1`

In [46]:
### BEGIN SOLUTION
df_loc1 = df.loc[0:1, ["name", "height"]]
### END SOLUTION

In [47]:
"""Check if your code returns the correct output"""
assert df_loc1.columns.to_list() == ['name', 'height']
assert len(df_loc1) == 2
assert df_loc1.iloc[0,0] == "Stefanie"


Passing a single row label (instead of a list of labels) reduces the dimension of the returned object, so the result is a Series:

In [27]:
df.loc[0, ["name", "height"]]

name      Stefanie
height         162
Name: 0, dtype: object
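
The effect of a scalar label versus a one-element list can be checked directly through the type and shape of the result. This sketch uses two hand-typed rows as a stand-in for the data (an assumption).

```python
import pandas as pd

# Hypothetical stand-in for the first two rows (typed by hand)
df = pd.DataFrame({"name": ["Stefanie", "Peter"], "height": [162, 163]})

row = df.loc[0, ["name", "height"]]      # scalar row label -> Series
frame = df.loc[[0], ["name", "height"]]  # list with one row label -> DataFrame

print(type(row).__name__, row.shape)      # Series (2,)
print(type(frame).__name__, frame.shape)  # DataFrame (1, 2)
```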

For getting a scalar value, pass a single row label and a single column label:

In [28]:
df.loc[0, "height"]

162
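
pandas also provides `.at`, which is intended for accessing a single value by label. The sketch below (on a hand-typed stand-in for the data, an assumption) compares it with `.loc` and shows that wrapping the row label in a list keeps a Series instead of a scalar.

```python
import pandas as pd

# Hypothetical stand-in for the first two rows (typed by hand)
df = pd.DataFrame({"name": ["Stefanie", "Peter"], "height": [162, 163]})

value = df.at[0, "height"]      # scalar access by row label and column label
same = df.loc[0, "height"]      # equivalent .loc call
series = df.loc[[0], "height"]  # a list of row labels returns a Series

print(value, same)            # 162 162
print(type(series).__name__)  # Series
```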

### By position .iloc

pandas also supports purely integer-based (positional) indexing.

Here, the `.iloc` attribute is the primary access method.

*When slicing with `.iloc`, the endpoint is not included, as in standard Python slicing.*

In [29]:
df.iloc[0]

name                        Stefanie
id                                 1
height                           162
average_height_parents         161.5
gender                        female
number                            42
height_m                        1.62
weight                         84.58
bmi                            32.23
date                      2022-10-08
Name: 0, dtype: object
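
`.iloc[0]` returns the first row as a Series whose index holds the column names. Like Python lists, `.iloc` also accepts negative positions that count from the end. A minimal sketch on four hand-typed rows (an assumed stand-in for the data):

```python
import pandas as pd

# Hypothetical stand-in for the first four rows (typed by hand)
df = pd.DataFrame({
    "name": ["Stefanie", "Peter", "Stefanie", "Manuela"],
    "height": [162, 163, 163, 164],
})

first = df.iloc[0]  # Series indexed by the column names
last = df.iloc[-1]  # negative positions count from the end

print(first.index.to_list())  # ['name', 'height']
print(last["name"])           # Manuela
```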

By integer slices:

In [49]:
df.iloc[0:2, 0:2]

|   | name | id |
|---|---|---|
| 0 | Stefanie | 1 |
| 1 | Peter | 2 |
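
As in plain Python slicing, an `.iloc` slice whose stop lies beyond the end is handled gracefully, while a single out-of-range position raises an `IndexError`. A sketch on four hand-typed rows (an assumed stand-in for the data):

```python
import pandas as pd

# Hypothetical stand-in for the first four rows (typed by hand)
df = pd.DataFrame({
    "name": ["Stefanie", "Peter", "Stefanie", "Manuela"],
    "height": [162, 163, 163, 164],
})

part = df.iloc[2:100]  # stop beyond the end: only rows 2 and 3 are returned
print(len(part))       # 2

raised = False
try:
    df.iloc[100]       # a single position beyond the end
except IndexError:
    raised = True
print(raised)          # True
```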


By lists of integer position locations:

In [50]:
df.iloc[[0, 2], [0, 2]]

|   | name | height |
|---|---|---|
| 0 | Stefanie | 162 |
| 2 | Stefanie | 163 |
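
When you know the column names but want to select by position, `Index.get_loc` translates a label into its integer position, which can then be passed to `.iloc`. This sketch uses three hand-typed rows with the first three columns of the dataset (an assumed stand-in) and reproduces the selection above.

```python
import pandas as pd

# Hypothetical stand-in with the dataset's first three columns (typed by hand)
df = pd.DataFrame({
    "name": ["Stefanie", "Peter", "Stefanie"],
    "id": ["1", "2", "3"],
    "height": [162, 163, 163],
})

# Translate column labels into integer positions for use with .iloc
cols = [int(df.columns.get_loc(c)) for c in ["name", "height"]]
print(cols)  # [0, 2]

sub = df.iloc[[0, 2], cols]
print(sub.columns.to_list())  # ['name', 'height']
```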


For slicing rows explicitly:

In [51]:
df.iloc[1:3, :]

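|   | name | id | height | average_height_parents | gender | number | height_m | weight | bmi | date |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Peter | 2 | 163 | 163.5 | male | 42 | 1.63 | 70.57 | 26.56 | 2022-10-08 |
| 2 | Stefanie | 3 | 163 | 163.2 | female | 42 | 1.63 | 75.48 | 28.41 | 2022-10-08 |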


For slicing columns explicitly:

In [52]:
df.iloc[:, 1:3]

|   | id | height |
|---|---|---|
| 0 | 1 | 162 |
| 1 | 2 | 163 |
| 2 | 3 | 163 |
| 3 | 4 | 164 |
| 4 | 5 | 164 |
| 5 | 6 | 164 |
| 6 | 7 | 164 |
| 7 | 8 | 165 |
| 8 | 9 | 165 |
| 9 | 10 | 165 |


For getting a value explicitly:

In [53]:
df.iloc[0, 0]

'Stefanie'
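
For a single value by position, pandas also offers `.iat`, the positional counterpart of `.at`. A minimal sketch on a hand-typed stand-in for the data (an assumption):

```python
import pandas as pd

# Hypothetical stand-in for the first two rows (typed by hand)
df = pd.DataFrame({"name": ["Stefanie", "Peter"], "height": [162, 163]})

value = df.iat[0, 0]           # scalar access by row position and column position
print(value)                   # Stefanie
print(value == df.iloc[0, 0])  # True
```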

Use `.iloc` to select the rows at positions 4 to 6 and the columns at positions 3 and 4. Save the result as `df_iloc1`.

In [56]:
### BEGIN SOLUTION
df_iloc1 = df.iloc[4:7, 3:5]
### END SOLUTION

In [58]:
"""Check if your code returns the correct output"""
assert df_iloc1.columns.to_list() == ['average_height_parents', 'gender']
assert len(df_iloc1) == 3
assert df_iloc1.iloc[0,1] == "male"