# Lecture 3: Pandas [`DataFrame`](https://pandas.pydata.org/pandas-docs/stable/reference/frame.html) 2

* How to modify a `DataFrame`

## Setup

In [1]:
import pandas as pd

Let's continue with the fully indexed `DataFrame` from the previous notebook:

In [2]:
df = pd.DataFrame(data=list(zip(range(5), range(5,10), range(10, 15))),
                  index=list('abcde'),
                  columns=['col0', 'col1', 'col2'])
df

Unnamed: 0,col0,col1,col2
a,0,5,10
b,1,6,11
c,2,7,12
d,3,8,13
e,4,9,14


## Modifying a `DataFrame`

The same indexing APIs that select data (`[]` and `.loc`) can also modify a `DataFrame` or insert new elements.

### Add a Column

New column with a single value for all rows:

In [3]:
df['col3'] = 0
df

Unnamed: 0,col0,col1,col2,col3
a,0,5,10,0
b,1,6,11,0
c,2,7,12,0
d,3,8,13,0
e,4,9,14,0


New column with a different value for each row:

In [4]:
df['col4'] = list('ghjkl')
df

Unnamed: 0,col0,col1,col2,col3,col4
a,0,5,10,0,g
b,1,6,11,0,h
c,2,7,12,0,j
d,3,8,13,0,k
e,4,9,14,0,l


New column computed element-wise from existing columns (universal functions):

In [5]:
df['col5'] = df['col0'] / df['col2']
df

Unnamed: 0,col0,col1,col2,col3,col4,col5
a,0,5,10,0,g,0.0
b,1,6,11,0,h,0.090909
c,2,7,12,0,j,0.166667
d,3,8,13,0,k,0.230769
e,4,9,14,0,l,0.285714
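
Besides assignment with `[]`, pandas also offers `DataFrame.assign()` and `DataFrame.insert()` for adding columns. The following is a minimal sketch on a smaller, hypothetical frame (the names `df_new` and `ratio` are chosen for illustration only):

```python
import pandas as pd

df = pd.DataFrame({'col0': list(range(5)), 'col2': list(range(10, 15))},
                  index=list('abcde'))

# assign() returns a new DataFrame; df itself is left unchanged.
df_new = df.assign(col3=0, ratio=df['col0'] / df['col2'])

# insert() modifies df in place and puts the new column at position 1.
df.insert(1, 'col1', list(range(5, 10)))

print(df_new.columns.tolist())
print(df.columns.tolist())
```

`assign()` is convenient when the original frame should stay untouched, while `insert()` is useful when the column position matters.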


### Add a Row

Add a row with a single value:

In [6]:
df.loc['f', 'col0'] = 15
df

Unnamed: 0,col0,col1,col2,col3,col4,col5
a,0.0,5.0,10.0,0.0,g,0.0
b,1.0,6.0,11.0,0.0,h,0.090909
c,2.0,7.0,12.0,0.0,j,0.166667
d,3.0,8.0,13.0,0.0,k,0.230769
e,4.0,9.0,14.0,0.0,l,0.285714
f,15.0,,,,,


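All other columns become `NaN` in the new row. Because `NaN` is a floating-point value, the integer columns are converted to floats, which is why the existing values now display as `0.0`, `5.0`, and so on.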
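
The dtype change can be checked directly. This is a small sketch on a hypothetical two-column frame, not the lecture's `df`:

```python
import pandas as pd

df = pd.DataFrame({'col0': [0, 1], 'col1': [5, 6]}, index=['a', 'b'])
dtype_before = df['col1'].dtype   # an integer dtype

# Setting a value for a row label that does not exist yet adds the row.
df.loc['c', 'col0'] = 15

dtype_after = df['col1'].dtype    # a float dtype, because the new row holds NaN
print(dtype_before, '->', dtype_after)
print(df['col1'].isna())
```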

Add a whole row:

In [7]:
df.loc['g'] = [16, 17, 18, 0, 'z', None]
df

Unnamed: 0,col0,col1,col2,col3,col4,col5
a,0.0,5.0,10.0,0.0,g,0.0
b,1.0,6.0,11.0,0.0,h,0.090909
c,2.0,7.0,12.0,0.0,j,0.166667
d,3.0,8.0,13.0,0.0,k,0.230769
e,4.0,9.0,14.0,0.0,l,0.285714
f,15.0,,,,,
g,16.0,17.0,18.0,0.0,z,


Alternatively, pass the row as a `Series`, created here from a dictionary:

In [8]:
df.loc['h'] = pd.Series({'col0': 19,
                         'col1': 20,
                         'col2': 21,
                         'col3': 0,
                         'col4': 'x',
                         'col5': None})
df

Unnamed: 0,col0,col1,col2,col3,col4,col5
a,0.0,5.0,10.0,0.0,g,0.0
b,1.0,6.0,11.0,0.0,h,0.090909
c,2.0,7.0,12.0,0.0,j,0.166667
d,3.0,8.0,13.0,0.0,k,0.230769
e,4.0,9.0,14.0,0.0,l,0.285714
f,15.0,,,,,
g,16.0,17.0,18.0,0.0,z,
h,19.0,20.0,21.0,0.0,x,
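
When several rows need to be added at once, one option is to build them as a second `DataFrame` and combine both with `pd.concat()`. A minimal sketch, assuming a reduced frame and a hypothetical `new_rows` frame with matching columns:

```python
import pandas as pd

df = pd.DataFrame({'col0': [0, 1], 'col1': [5, 6]}, index=['a', 'b'])

# Hypothetical new rows, built as their own DataFrame with the same columns.
new_rows = pd.DataFrame({'col0': [16, 19], 'col1': [17, 20]},
                        index=['g', 'h'])

# concat() returns a new DataFrame; neither input is modified.
df_all = pd.concat([df, new_rows])
print(df_all)
```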


### Modify Single Value

In [9]:
df.loc['a', 'col0'] = -1.0
df

Unnamed: 0,col0,col1,col2,col3,col4,col5
a,-1.0,5.0,10.0,0.0,g,0.0
b,1.0,6.0,11.0,0.0,h,0.090909
c,2.0,7.0,12.0,0.0,j,0.166667
d,3.0,8.0,13.0,0.0,k,0.230769
e,4.0,9.0,14.0,0.0,l,0.285714
f,15.0,,,,,
g,16.0,17.0,18.0,0.0,z,
h,19.0,20.0,21.0,0.0,x,
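
For reading or writing exactly one cell, pandas also provides the `.at` accessor, which takes a single row label and a single column label. A short sketch on a hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({'col0': [0.0, 1.0], 'col1': [5.0, 6.0]}, index=['a', 'b'])

# .at addresses a single cell by row label and column label.
df.at['a', 'col0'] = -1.0
value = df.at['a', 'col0']
print(value)
```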


### Modify Column

One value:

In [10]:
df['col4'] = 1
df

Unnamed: 0,col0,col1,col2,col3,col4,col5
a,-1.0,5.0,10.0,0.0,1,0.0
b,1.0,6.0,11.0,0.0,1,0.090909
c,2.0,7.0,12.0,0.0,1,0.166667
d,3.0,8.0,13.0,0.0,1,0.230769
e,4.0,9.0,14.0,0.0,1,0.285714
f,15.0,,,,1,
g,16.0,17.0,18.0,0.0,1,
h,19.0,20.0,21.0,0.0,1,


With masking:

In [11]:
df.loc[df['col1'] < 7, 'col3'] = 3
df

Unnamed: 0,col0,col1,col2,col3,col4,col5
a,-1.0,5.0,10.0,3.0,1,0.0
b,1.0,6.0,11.0,3.0,1,0.090909
c,2.0,7.0,12.0,0.0,1,0.166667
d,3.0,8.0,13.0,0.0,1,0.230769
e,4.0,9.0,14.0,0.0,1,0.285714
f,15.0,,,,1,
g,16.0,17.0,18.0,0.0,1,
h,19.0,20.0,21.0,0.0,1,
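
Masks can be combined with `&` (and), `|` (or), and `~` (not); each condition needs its own parentheses. A minimal sketch on a hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({'col1': [5, 6, 7, 8, 9], 'col3': [0] * 5},
                  index=list('abcde'))

# Select rows where col1 lies strictly between 5 and 9.
mask = (df['col1'] > 5) & (df['col1'] < 9)

# Only the selected rows of col3 are overwritten.
df.loc[mask, 'col3'] = 3
print(df)
```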


Different values:

In [12]:
df['col4'] = range(8)
df

Unnamed: 0,col0,col1,col2,col3,col4,col5
a,-1.0,5.0,10.0,3.0,0,0.0
b,1.0,6.0,11.0,3.0,1,0.090909
c,2.0,7.0,12.0,0.0,2,0.166667
d,3.0,8.0,13.0,0.0,3,0.230769
e,4.0,9.0,14.0,0.0,4,0.285714
f,15.0,,,,5,
g,16.0,17.0,18.0,0.0,6,
h,19.0,20.0,21.0,0.0,7,
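
A list or `range` must have exactly one value per row. A `Series`, in contrast, is matched to the rows by index label, and rows without a matching label become `NaN`. The sketch below uses a hypothetical three-row frame to show both behaviours:

```python
import pandas as pd

df = pd.DataFrame({'col0': [0, 1, 2]}, index=['a', 'b', 'c'])

# A Series is aligned on the index labels, not on position.
df['col4'] = pd.Series([10, 20], index=['c', 'a'])

# A plain list of the wrong length is rejected.
length_error = None
try:
    df['too_short'] = [1, 2]
except ValueError as err:
    length_error = str(err)

print(df)
print(length_error)
```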


Drop a column using the [`drop()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html) method, which returns a new `DataFrame`:

In [13]:
df = df.drop(columns=['col4'])
df

Unnamed: 0,col0,col1,col2,col3,col5
a,-1.0,5.0,10.0,3.0,0.0
b,1.0,6.0,11.0,3.0,0.090909
c,2.0,7.0,12.0,0.0,0.166667
d,3.0,8.0,13.0,0.0,0.230769
e,4.0,9.0,14.0,0.0,0.285714
f,15.0,,,,
g,16.0,17.0,18.0,0.0,
h,19.0,20.0,21.0,0.0,
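
Since `drop()` returns a new `DataFrame`, the original stays unchanged unless the result is assigned back. The sketch below, on a hypothetical frame, also shows the `errors='ignore'` option for labels that may not exist:

```python
import pandas as pd

df = pd.DataFrame({'col0': [0, 1], 'col4': ['g', 'h']}, index=['a', 'b'])

# The result is a new frame; df still contains col4.
dropped = df.drop(columns=['col4'])

# By default a missing label raises a KeyError; errors='ignore' skips it.
same = df.drop(columns=['missing'], errors='ignore')

print(dropped.columns.tolist())
print(df.columns.tolist())
```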


Columns as a function of other columns:

In [14]:
df['col5'] = df['col2'] / df['col0']
df

Unnamed: 0,col0,col1,col2,col3,col5
a,-1.0,5.0,10.0,3.0,-10.0
b,1.0,6.0,11.0,3.0,11.0
c,2.0,7.0,12.0,0.0,6.0
d,3.0,8.0,13.0,0.0,4.333333
e,4.0,9.0,14.0,0.0,3.5
f,15.0,,,,
g,16.0,17.0,18.0,0.0,1.125
h,19.0,20.0,21.0,0.0,1.105263
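
Row `f` shows that a missing input produces a missing result. Division by zero does not raise an error either; for floating-point columns it yields `inf`. A small sketch with hypothetical values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col0': [0.0, 2.0, np.nan],
                   'col2': [10.0, 12.0, 14.0]},
                  index=list('abc'))

# Element-wise division: a divides by zero, c divides by NaN.
ratio = df['col2'] / df['col0']
print(ratio)
```

It can therefore be worth checking the result with `np.isinf()` or `Series.isna()` before using it further.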


Rename column:
* [`DataFrame.rename()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rename.html)
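* Pass a dictionary that maps old column names to new ones.
* Alternatively, pass a function (e.g. a lambda) that is applied to every column name.
* Either modify the `DataFrame` in place with `inplace=True` or reassign the result (`df = df.rename(...)`).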

In [15]:
df.rename(columns={'col5': 'ratio'},
          inplace=True)
df

Unnamed: 0,col0,col1,col2,col3,ratio
a,-1.0,5.0,10.0,3.0,-10.0
b,1.0,6.0,11.0,3.0,11.0
c,2.0,7.0,12.0,0.0,6.0
d,3.0,8.0,13.0,0.0,4.333333
e,4.0,9.0,14.0,0.0,3.5
f,15.0,,,,
g,16.0,17.0,18.0,0.0,1.125
h,19.0,20.0,21.0,0.0,1.105263
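
The function variant and the reassignment variant can be sketched as follows, using a hypothetical two-column frame and `str.upper` as an example function:

```python
import pandas as pd

df = pd.DataFrame({'col0': [0], 'col5': [0.5]})

# Without inplace=True, rename() returns a new frame and leaves df as it is.
renamed = df.rename(columns={'col5': 'ratio'})

# A function is applied to every column name.
upper = df.rename(columns=str.upper)

print(renamed.columns.tolist())
print(upper.columns.tolist())
```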


### Modify Row

One value:

In [16]:
df.loc['a'] = 1
df

Unnamed: 0,col0,col1,col2,col3,ratio
a,1.0,1.0,1.0,1.0,1.0
b,1.0,6.0,11.0,3.0,11.0
c,2.0,7.0,12.0,0.0,6.0
d,3.0,8.0,13.0,0.0,4.333333
e,4.0,9.0,14.0,0.0,3.5
f,15.0,,,,
g,16.0,17.0,18.0,0.0,1.125
h,19.0,20.0,21.0,0.0,1.105263


Different values:

In [17]:
df.loc['b'] = (1, 5, 10, 'x', True)
df

Unnamed: 0,col0,col1,col2,col3,ratio
a,1.0,1.0,1.0,1,1
b,1.0,5.0,10.0,x,True
c,2.0,7.0,12.0,0,6
d,3.0,8.0,13.0,0,4.33333
e,4.0,9.0,14.0,0,3.5
f,15.0,,,,
g,16.0,17.0,18.0,0,1.125
h,19.0,20.0,21.0,0,1.10526
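
In the output above, `col3` and `ratio` no longer display as floats: storing a string and a boolean turned them into `object` columns. Recent pandas releases warn about assignments that change a column's dtype in this way, and newer versions may reject them. A more explicit approach is to convert the affected columns first. The following is a sketch under that assumption, using a reduced, hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({'col0': [1.0, 1.0],
                   'col3': [1.0, 3.0],
                   'ratio': [1.0, 11.0]},
                  index=['a', 'b'])

# Convert the columns that will hold mixed types to object explicitly.
df = df.astype({'col3': object, 'ratio': object})

# Now the row can hold a float, a string, and a boolean.
df.loc['b'] = [1.0, 'x', True]

print(df.dtypes)
```

Note that `object` columns lose the speed and numeric operations of float columns, so mixing types within a column is best avoided where possible.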


Drop row:

In [18]:
df = df.drop(index=['g', 'h'])
df

Unnamed: 0,col0,col1,col2,col3,ratio
a,1.0,1.0,1.0,1,1
b,1.0,5.0,10.0,x,True
c,2.0,7.0,12.0,0,6
d,3.0,8.0,13.0,0,4.33333
e,4.0,9.0,14.0,0,3.5
f,15.0,,,,
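
Rows can also be dropped by condition instead of by explicit labels: select the matching index labels with a mask and pass them to `drop()`. A minimal sketch that removes rows with a missing `col1`, using hypothetical values and the illustrative names `incomplete` and `cleaned`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col0': [1.0, 2.0, 15.0],
                   'col1': [1.0, 7.0, np.nan]},
                  index=['a', 'c', 'f'])

# Index labels of all rows where col1 is missing.
incomplete = df.index[df['col1'].isna()]

# drop() returns a new frame without those rows.
cleaned = df.drop(index=incomplete)
print(cleaned)
```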


© 2023 Philipp Cornelius