
Objectives

- Add a new column to a DataFrame with a chosen name
- Understand that operations work element-wise (no loops required)
- Rename an existing column

Content to cover

- df["..."] = f(other column)
- df["..."] = f(other columns), e.g. df["..."] + df["..."]
- df.rename


In [15]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [16]:
air_quality = pd.read_csv("../data/air_quality_no2.csv", 
                          index_col=0, parse_dates=True)
air_quality.head()

datetime,station_antwerp,station_paris,station_london
2019-04-04 03:00:00,37.5,44.5,20.0
2019-04-04 04:00:00,31.5,44.8,20.0
2019-04-04 05:00:00,,35.7,19.0
2019-04-04 06:00:00,,53.6,19.0
2019-04-04 07:00:00,,63.2,32.0


## Add new column

![](../schemas/05_newcolumn_1.svg)

> I want to express the $NO_2$ concentration of the station in London in mg/m$^3$

(Assume a temperature of 25 degrees Celsius and a pressure of 1013 hPa. The molecular weight of $NO_2$ is 46.01 g/mol, so 1 ppm $NO_2$ is equivalent to 1.882 mg/m$^3$.)

In [19]:
air_quality["london_mg_per_cubic"] = air_quality["station_london"] * 1.882
air_quality.head()

datetime,station_antwerp,station_paris,station_london,london_mg_per_cubic
2019-04-04 03:00:00,37.5,44.5,20.0,37.64
2019-04-04 04:00:00,31.5,44.8,20.0,37.64
2019-04-04 05:00:00,,35.7,19.0,35.758
2019-04-04 06:00:00,,53.6,19.0,35.758
2019-04-04 07:00:00,,63.2,32.0,60.224


To create a new column, use the `[]` brackets with the new column name on the left side of the assignment. If you are familiar with Python dictionaries, the syntax will feel similar.
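The dictionary analogy can be made concrete with a minimal sketch. The small DataFrame below is hypothetical toy data, not the tutorial's air quality file; only the column names are borrowed from it.

```python
import pandas as pd

# Hypothetical toy data standing in for the air quality table
df = pd.DataFrame({"station_london": [20.0, 19.0, 32.0]})

# Assigning to a name that does not exist yet appends a new column ...
df["london_mg_per_cubic"] = df["station_london"] * 1.882
print(list(df.columns))

# ... while assigning to an existing name overwrites that column
df["station_london"] = df["station_london"].round()

# Membership tests look at the column names, much like dictionary keys
print("london_mg_per_cubic" in df)
```

As with a dictionary, assignment never raises an error for an unknown key, so a typo in the column name silently creates an extra column instead of updating the intended one.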

<div class="alert alert-info">
    
__Note__: The calculation of the values is done __element-wise__. This means all values in the given column are multiplied by the value 1.882 at once. You do not need to use a loop to iterate over the rows!

</div>
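To illustrate what "element-wise" means, the sketch below compares the single vectorized multiplication with an explicit Python loop over the same values. The three values are hypothetical sample data; the factor 1.882 is the one used above.

```python
import pandas as pd

# Hypothetical sample of London measurements
london = pd.Series([20.0, 19.0, 32.0], name="station_london")

# Element-wise: one expression multiplies every value
vectorized = london * 1.882

# The same result written as an explicit loop, for comparison only
looped = pd.Series([value * 1.882 for value in london], name="station_london")

print(vectorized.equals(looped))
```

Both approaches produce the same values, but the vectorized form is shorter and typically much faster on large columns.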

![](../schemas/05_newcolumn_2.svg)

> I want to check the ratio of the values in Paris versus Antwerp and save the result in a new column

In [22]:
air_quality["ratio_paris_antwerp"] = air_quality["station_paris"] / air_quality["station_antwerp"]
air_quality.head()

datetime,station_antwerp,station_paris,station_london,london_mg_per_cubic,ratio_paris_antwerp
2019-04-04 03:00:00,37.5,44.5,20.0,37.64,1.186667
2019-04-04 04:00:00,31.5,44.8,20.0,37.64,1.422222
2019-04-04 05:00:00,,35.7,19.0,35.758,
2019-04-04 06:00:00,,53.6,19.0,35.758,
2019-04-04 07:00:00,,63.2,32.0,60.224,


The calculation is again element-wise, so the `/` is applied _to the values in each row_. Other mathematical operators (`+`, `-`, `*`, `/`) and comparison operators (`<`, `>`, `==`, ...) also work element-wise. The latter were already used in the [subset data tutorial](./3_subset_data.ipynb) to filter rows of a table using a conditional expression. Note that rows with a missing value in either column yield a missing value in the result, as the empty ratio entries above show.
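As a small sketch of other element-wise operators, the example below stores a difference and a comparison result as new columns. The data and the new column names (`paris_minus_antwerp`, `paris_above_40`) are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing Antwerp measurement
df = pd.DataFrame({
    "station_antwerp": [37.5, 31.5, np.nan],
    "station_paris": [44.5, 44.8, 35.7],
})

# Arithmetic between two columns is applied row by row;
# a missing input gives a missing (NaN) result
df["paris_minus_antwerp"] = df["station_paris"] - df["station_antwerp"]

# A comparison is also element-wise and returns a boolean column
df["paris_above_40"] = df["station_paris"] > 40

print(df)
```

The boolean column has the same form as the conditions used to filter rows, so it could also be passed directly to `df[...]` to select the matching rows.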

> I want to add a new column with a given list of values, one for each row

In [31]:
air_quality["ID"] = ["ID_" + str(value) for value in range(air_quality.shape[0])]

In [32]:
air_quality["ID"]

datetime
2019-04-04 03:00:00       ID_0
2019-04-04 04:00:00       ID_1
2019-04-04 05:00:00       ID_2
2019-04-04 06:00:00       ID_3
2019-04-04 07:00:00       ID_4
2019-04-04 08:00:00       ID_5
2019-04-04 09:00:00       ID_6
2019-04-04 10:00:00       ID_7
2019-04-04 11:00:00       ID_8
2019-04-04 12:00:00       ID_9
2019-04-04 13:00:00      ID_10
2019-04-04 14:00:00      ID_11
2019-04-04 15:00:00      ID_12
2019-04-04 16:00:00      ID_13
2019-04-04 17:00:00      ID_14
2019-04-04 18:00:00      ID_15
2019-04-05 01:00:00      ID_16
2019-04-05 03:00:00      ID_17
2019-04-05 05:00:00      ID_18
2019-04-05 06:00:00      ID_19
2019-04-05 07:00:00      ID_20
2019-04-05 08:00:00      ID_21
2019-04-05 09:00:00      ID_22
2019-04-05 13:00:00      ID_23
2019-04-05 14:00:00      ID_24
2019-04-05 15:00:00      ID_25
2019-04-05 16:00:00      ID_26
2019-04-05 17:00:00      ID_27
2019-04-05 18:00:00      ID_28
2019-04-05 19:00:00      ID_29
                        ...   
2019-06-17 23:00:00    ID_1785
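The objectives also list renaming an existing column, which `DataFrame.rename` handles. The sketch below is one possible illustration: the data is a hypothetical stand-in for the air quality table, and the new names (`antwerp`, `paris`, `london`) are chosen only for the example.

```python
import pandas as pd

# Hypothetical stand-in for the air quality table
df = pd.DataFrame({
    "station_antwerp": [37.5, 31.5],
    "station_paris": [44.5, 44.8],
    "station_london": [20.0, 20.0],
})

# Map old column names to new ones; columns not listed keep their name
renamed = df.rename(columns={
    "station_antwerp": "antwerp",
    "station_paris": "paris",
    "station_london": "london",
})
print(list(renamed.columns))

# A function can be passed instead of a dictionary to transform every name
upper = df.rename(columns=str.upper)
print(list(upper.columns))
```

By default `rename` returns a new DataFrame and leaves the original untouched, so the result has to be assigned to a variable to keep it.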

## REMEMBER

- Create a new column by assigning the output to the DataFrame with a new column name 
- Operations are element-wise
- Rename existing columns with `rename`