# Filling records that have missing values with `fillna`

`DataFrame.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None)`

value:
- The value used to fill the missing (NaN) entries
- Can be a scalar, dict, Series, or DataFrame
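
The `value` forms listed above behave slightly differently. The following is a minimal sketch on a small made-up Series (the labels and numbers are illustrative, not taken from the datasets used below): a scalar fills every gap, while a dict or Series fills only the labels it names.

```python
import pandas as pd

# Hypothetical data: two missing entries at labels "b" and "d".
s = pd.Series([1.0, None, 3.0, None], index=["a", "b", "c", "d"])

# Scalar: every missing entry gets the same value.
filled_scalar = s.fillna(0)

# Dict: keys are index labels for a Series (column names for a DataFrame).
# Labels that are not listed stay missing.
filled_dict = s.fillna({"b": 20.0})

# Series: replacement values are matched by index label, not by position.
replacements = pd.Series({"d": 40.0, "b": 20.0})
filled_series = s.fillna(replacements)

print(filled_scalar.tolist())
print(filled_dict.tolist())
print(filled_series.tolist())
```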

method:
- `pad` / `ffill` (forward fill): propagate the last valid value forward into the missing entries that follow it (top to bottom)
- `backfill` / `bfill` (backward fill): use the next valid value to fill the missing entries that precede it (bottom to top)


In [1]:
import pandas as pd

In [2]:
df = pd.read_csv("./datasets/df_na.csv")

df

Unnamed: 0,Name,Score,Grades
0,Paul,98.0,
1,Aaron,,AB
2,Krista,99.0,AA
3,Veronica,87.0,
4,Paxton,90.0,AC
5,Madison,,BA
6,Aurora,82.0,BB


## 1. Fill all NaN values with zeros

In [3]:
df.fillna(0)

Unnamed: 0,Name,Score,Grades
0,Paul,98.0,0
1,Aaron,0.0,AB
2,Krista,99.0,AA
3,Veronica,87.0,0
4,Paxton,90.0,AC
5,Madison,0.0,BA
6,Aurora,82.0,BB
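
Note that `fillna(0)` also writes the number 0 into the text column `Grades`. If only the numeric columns should receive zeros, one possible approach (a sketch on a shortened, in-memory copy of the table, not the original CSV) is to build the fill dict from the numeric columns:

```python
import pandas as pd

# Shortened stand-in for the table above, built in memory.
df = pd.DataFrame({
    "Name": ["Paul", "Aaron", "Krista"],
    "Score": [98.0, None, 99.0],
    "Grades": [None, "AB", "AA"],
})

# Select the numeric columns and fill only those with 0.
numeric_cols = df.select_dtypes(include="number").columns
filled = df.fillna({col: 0 for col in numeric_cols})

print(filled)
```

The missing entry in `Grades` is left as is, so it can be handled separately with a more suitable placeholder.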


## 2. Fill NaN values with the column mean

In [6]:
df['Score'].fillna(df['Score'].mean())

0    98.0
1    91.2
2    99.0
3    87.0
4    90.0
5    91.2
6    82.0
Name: Score, dtype: float64
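
The fill value 91.2 comes from the five known scores only, because `mean()` skips NaN by default. A small sketch that rebuilds the `Score` column in memory (same values as shown above) makes the arithmetic explicit:

```python
import pandas as pd

# Same values as the Score column above, rebuilt without the CSV.
score = pd.Series([98.0, None, 99.0, 87.0, 90.0, None, 82.0], name="Score")

# mean() ignores NaN by default (skipna=True): (98 + 99 + 87 + 90 + 82) / 5
mean_score = score.mean()

# Fill the two gaps with that mean.
filled = score.fillna(mean_score)

print(round(mean_score, 1))
print(filled.isna().sum())
```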

In [19]:
df_cities = pd.read_csv("./datasets/cities_temp.csv", na_values="")

df_cities

Unnamed: 0,City,State (Abbrev),Temp (C)
0,Anaheim,CA,32.0
1,Ann Arbor,MI,
2,Baltimore,MD,
3,Savannah,GA,24.0
4,Davenport,IA,
5,Baton Rouge,LA,28.0


In [21]:
# df_cities['Temp (C)'] = df_cities['Temp (C)'].fillna(df_cities['Temp (C)'].mean())
df_cities['Temp (C)'].fillna(df_cities['Temp (C)'].mean())

# df_cities

0    32.0
1    28.0
2    28.0
3    24.0
4    28.0
5    28.0
Name: Temp (C), dtype: float64

Fill each column with its own value by passing a dict

In [22]:
df.fillna({'Score' : 95, 'Grades' : 'AA'})

Unnamed: 0,Name,Score,Grades
0,Paul,98.0,AA
1,Aaron,95.0,AB
2,Krista,99.0,AA
3,Veronica,87.0,AA
4,Paxton,90.0,AC
5,Madison,95.0,BA
6,Aurora,82.0,BB
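
Columns that are not named in the dict are left untouched. As a sketch on a shortened in-memory table (and using the median as an illustrative choice of statistic, which the original does not use), filling only `Score` looks like this:

```python
import pandas as pd

# Shortened stand-in for the table above.
df = pd.DataFrame({
    "Name": ["Paul", "Aaron", "Krista", "Veronica"],
    "Score": [98.0, None, 99.0, 87.0],
    "Grades": [None, "AB", "AA", None],
})

# Only 'Score' is listed, so 'Grades' keeps its missing values.
partial = df.fillna({"Score": df["Score"].median()})

print(partial)
```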


## 3. Forward fill (top to bottom)

In [23]:
df_cities['Temp (C)'].fillna(method='ffill')

0    32.0
1    32.0
2    32.0
3    24.0
4    24.0
5    28.0
Name: Temp (C), dtype: float64

## 4. Backward fill (bottom to top)

In [26]:
df_cities['Temp (C)'].fillna(method='bfill')

0    32.0
1    24.0
2    24.0
3    24.0
4    28.0
5    28.0
Name: Temp (C), dtype: float64
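
pandas also provides dedicated `ffill()` and `bfill()` methods that do the same job. Recent pandas releases appear to deprecate the `method` keyword of `fillna` in favour of them, so it is worth checking the documentation for the installed version. The sketch below uses made-up temperatures with a gap at each end to show one edge case: a forward fill cannot fill a leading NaN, and a backward fill cannot fill a trailing one.

```python
import pandas as pd

# Hypothetical readings with a missing value at the start and at the end.
temps = pd.Series([None, 32.0, None, None, 24.0, None], name="Temp (C)")

forward = temps.ffill()    # nothing precedes index 0, so it stays NaN
backward = temps.bfill()   # nothing follows index 5, so it stays NaN

# Chaining both directions fills every gap in this example.
both = temps.ffill().bfill()

print(forward.tolist())
print(backward.tolist())
print(both.tolist())
```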

## 5. Axis

In [27]:
df_cities

Unnamed: 0,City,State (Abbrev),Temp (C)
0,Anaheim,CA,32.0
1,Ann Arbor,MI,
2,Baltimore,MD,
3,Savannah,GA,24.0
4,Davenport,IA,
5,Baton Rouge,LA,28.0


Fill along the rows, top to bottom (`axis=0`)

In [28]:
df_cities['Temp (C)'].fillna(method='ffill', axis=0)

0    32.0
1    32.0
2    32.0
3    24.0
4    24.0
5    28.0
Name: Temp (C), dtype: float64

Fill along the rows, bottom to top (`axis=0`)

In [32]:
df_cities['Temp (C)'].fillna(method='bfill', axis=0)

0    32.0
1    24.0
2    24.0
3    24.0
4    28.0
5    28.0
Name: Temp (C), dtype: float64

Fill across the columns, left to right (`axis=1`); here the state abbreviation is copied into the empty temperature cells

In [30]:
df_cities.fillna(method='ffill', axis=1)

Unnamed: 0,City,State (Abbrev),Temp (C)
0,Anaheim,CA,32.0
1,Ann Arbor,MI,MI
2,Baltimore,MD,MD
3,Savannah,GA,24.0
4,Davenport,IA,IA
5,Baton Rouge,LA,28.0


Fill across the columns, right to left (`axis=1`); here the grade is copied into the empty score cells

In [31]:
df.fillna(method='bfill', axis=1)

Unnamed: 0,Name,Score,Grades
0,Paul,98.0,
1,Aaron,AB,AB
2,Krista,99.0,AA
3,Veronica,87.0,
4,Paxton,90.0,AC
5,Madison,BA,BA
6,Aurora,82.0,BB
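
In the two outputs above, filling across columns copies text into a numeric column, which is rarely what is wanted. `axis=1` tends to make more sense when every column holds the same kind of measurement. A minimal sketch, assuming a hypothetical table of daily readings per city (the column names and numbers are invented for illustration):

```python
import pandas as pd

# Hypothetical data: one row per city, one column per day.
readings = pd.DataFrame(
    {"Mon": [20.0, 18.0], "Tue": [None, 19.0], "Wed": [22.0, None]},
    index=["Anaheim", "Savannah"],
)

# Carry the previous day's reading forward within each row.
across = readings.ffill(axis=1)

print(across)
```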


## 6. Inplace

Modify the existing object in place instead of returning a new, filled copy

In [33]:
df_cities['Temp (C)'].fillna(method='ffill', inplace=True)

In [34]:
df_cities

Unnamed: 0,City,State (Abbrev),Temp (C)
0,Anaheim,CA,32.0
1,Ann Arbor,MI,32.0
2,Baltimore,MD,32.0
3,Savannah,GA,24.0
4,Davenport,IA,24.0
5,Baton Rouge,LA,28.0
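
Calling an in-place method on a column selected with `df[...]` is a form of chained assignment. Depending on the pandas version and its copy-on-write settings, it may raise a warning or leave the parent DataFrame unchanged. A version-independent alternative, sketched here on a shortened in-memory copy of the table, is to assign the result back to the column:

```python
import pandas as pd

# Shortened stand-in for df_cities.
df_cities = pd.DataFrame({
    "City": ["Anaheim", "Ann Arbor", "Baltimore", "Savannah"],
    "Temp (C)": [32.0, None, None, 24.0],
})

# ffill() returns a new Series; assigning it back updates the column.
df_cities["Temp (C)"] = df_cities["Temp (C)"].ffill()

print(df_cities)
```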


## 7. Limit

Limit the maximum number of consecutive missing values that are filled in each gap (here, from top to bottom)

In [35]:
df_cities_2 = pd.read_csv("./datasets/cities_temp.csv")

In [36]:
df_cities_2

Unnamed: 0,City,State (Abbrev),Temp (C)
0,Anaheim,CA,32.0
1,Ann Arbor,MI,
2,Baltimore,MD,
3,Savannah,GA,24.0
4,Davenport,IA,
5,Baton Rouge,LA,28.0


Fill at most one NaN after each valid value (the second NaN in a run stays missing)

In [37]:
df_cities_2.fillna(method='ffill', limit=1)

Unnamed: 0,City,State (Abbrev),Temp (C)
0,Anaheim,CA,32.0
1,Ann Arbor,MI,32.0
2,Baltimore,MD,
3,Savannah,GA,24.0
4,Davenport,IA,24.0
5,Baton Rouge,LA,28.0
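
The limit applies to each run of consecutive NaN values, not to the column as a whole, which is why rows 1 and 4 are both filled above while row 2 is not. The same behaviour can be reproduced on the temperature values alone, rebuilt in memory:

```python
import pandas as pd

# Same values as the Temp (C) column above.
temps = pd.Series([32.0, None, None, 24.0, None, 28.0], name="Temp (C)")

# Fill at most one missing value per gap.
limited = temps.ffill(limit=1)

# Index 2 is the second NaN in its run, so it remains missing.
print(limited.isna().tolist())
```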


## 8. Downcast

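- takes a dict (column -> dtype) or the string 'infer'
- after filling, tries to downcast the result to a smaller or more specific dtype where possible (for example float64 to int64 once no NaN remains)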

In [38]:
df_down = pd.DataFrame({'a' : [1, None]})

df_down

Unnamed: 0,a
0,1.0
1,


In [40]:
df_down_1 = df_down.fillna(0, downcast='infer')

df_down_1

Unnamed: 0,a
0,1
1,0


In [41]:
df_down_2 = df_down.fillna({'a' : 0}, downcast='infer')

df_down_2

Unnamed: 0,a
0,1
1,0
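
The `downcast` keyword appears to be deprecated in recent pandas releases, so it may warn or be unavailable depending on the installed version. An explicit conversion after filling gives a similar result; the sketch below uses `astype`, which is a swapped-in alternative rather than what `downcast='infer'` does internally.

```python
import pandas as pd

df_down = pd.DataFrame({"a": [1, None]})

# The missing value forces the column to float64.
print(df_down["a"].dtype)

# Fill first, then convert explicitly once no NaN remains.
filled = df_down.fillna(0).astype({"a": "int64"})

print(filled["a"].dtype)
print(filled["a"].tolist())
```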
