## __Data Pre-processing: Filling Null Values__

## Step 1: Import Required Libraries

Let's learn how to fill null values in pandas DataFrames.

- Import the pandas and NumPy libraries


In [None]:
import pandas as pd
import numpy as np

## Step 2: Create a Sample Dataset with Null Values

- Create a dictionary containing data with null values
- Convert the dictionary to a DataFrame


In [None]:
data = {'a':[1,2,np.nan],'b':[1,np.nan,3],'c':[1,np.nan,np.nan]}
df = pd.DataFrame(data)

In [None]:
df

     a    b    c
0  1.0  1.0  1.0
1  2.0  NaN  NaN
2  NaN  3.0  NaN


**Observation:**

- The DataFrame contains four null (NaN) values: one in column 'a', one in column 'b', and two in column 'c'.
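To confirm the observation above, we can count the null values rather than read them off the table. The following is a minimal sketch; it rebuilds the sample data under the illustrative name `demo` (not part of the original notebook) so that it does not interfere with `df`.

```python
import numpy as np
import pandas as pd

# Same sample data as in Step 2, under a separate illustrative name
demo = pd.DataFrame({'a': [1, 2, np.nan],
                     'b': [1, np.nan, 3],
                     'c': [1, np.nan, np.nan]})

# isna() marks each null cell as True; sum() counts the True values per column
null_counts = demo.isna().sum()

# Summing the per-column counts gives the total for the whole DataFrame
total_nulls = int(null_counts.sum())

print(null_counts)
print("Total nulls:", total_nulls)
```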

## Step 3: Fill the Missing Values Using a Predefined Value

We can fill the missing values with a predefined value.

- Use fillna() to fill null values

- Pass the value with which the null values are to be filled

In [None]:
df.fillna(value=3)

     a    b    c
0  1.0  1.0  1.0
1  2.0  3.0  3.0
2  3.0  3.0  3.0


**Observation:**

- All the missing values are filled with 3, the value we passed. Note that fillna() returns a new DataFrame, so df itself still contains the null values.
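A single fill value is not always appropriate for every column. As a hedged sketch, `fillna()` also accepts a dictionary that maps column names to fill values. The specific values used here (0, -1, and 99) and the name `demo` are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# Same sample data as in Step 2, under a separate illustrative name
demo = pd.DataFrame({'a': [1, 2, np.nan],
                     'b': [1, np.nan, 3],
                     'c': [1, np.nan, np.nan]})

# Each key is a column name; each value fills the nulls in that column only
filled = demo.fillna(value={'a': 0, 'b': -1, 'c': 99})

print(filled)
```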

## Step 4: Fill the Missing Values Using Forward Fill and Backward Fill

We can fill the null values using forward fill and backward fill.

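- In forward fill, each null value is replaced with the previous non-null value in the same column.

- In backward fill, each null value is replaced with the next non-null value in the same column.

- We use the **method = 'ffill'** parameter for forward fill and **method = 'bfill'** for backward fill. This parameter is deprecated as of pandas 2.1; the equivalent **df.ffill()** and **df.bfill()** methods are preferred in newer versions.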

Let's try to fill the null values with forward fill.

In [None]:
df.fillna(method='ffill')

     a    b    c
0  1.0  1.0  1.0
1  2.0  1.0  1.0
2  2.0  3.0  1.0


**Observation:**

- We can see that each null value is replaced by the previous non-null value in the same column.
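Forward fill is also available as the dedicated `ffill()` method, which gives the same result as `fillna(method='ffill')`. The sketch below also uses the `limit` argument to cap how many consecutive nulls are filled; `demo`, `ffilled`, and `limited` are illustrative names.

```python
import numpy as np
import pandas as pd

# Same sample data as in Step 2, under a separate illustrative name
demo = pd.DataFrame({'a': [1, 2, np.nan],
                     'b': [1, np.nan, 3],
                     'c': [1, np.nan, np.nan]})

# Forward fill: carry the last non-null value down each column
ffilled = demo.ffill()

# limit=1 fills at most one consecutive null per gap,
# so the second null in column 'c' stays NaN
limited = demo.ffill(limit=1)

print(ffilled)
print(limited)
```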

Now, let's try to fill the null values with backward fill.

In [None]:
df.fillna(method='bfill')

     a    b    c
0  1.0  1.0  1.0
1  2.0  3.0  NaN
2  NaN  3.0  NaN


**Observation:**

- We can see that some null values are replaced by the next value. However, some are still **NaN** because no non-null value follows them in the column.
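One possible way to handle the nulls that backward fill leaves behind is to chain a forward fill after it. This is a sketch of one option, not the only approach; `demo` and `both` are illustrative names.

```python
import numpy as np
import pandas as pd

# Same sample data as in Step 2, under a separate illustrative name
demo = pd.DataFrame({'a': [1, 2, np.nan],
                     'b': [1, np.nan, 3],
                     'c': [1, np.nan, np.nan]})

# Backward fill first, then forward fill whatever is still null
# (trailing nulls have no later value, so bfill alone cannot fill them)
both = demo.bfill().ffill()

print(both)
```

The order matters: with `bfill()` first, the null in column 'b' takes the next value (3), whereas `ffill()` first would give it the previous value (1).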

## Step 5: Using Statistics to Fill Null Values in a DataFrame

We can also fill the null values with statistical values like mean or median.

- Fill in the missing values using the mean of the column


In [None]:
df['a'].fillna(value=df['a'].mean())

0    1.0
1    2.0
2    1.5
Name: a, dtype: float64

**Observation:**

- The null value in the 'a' column is replaced by the column mean, 1.5. The mean is computed from the non-null values (1 and 2) only, since NaN is skipped.
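The same idea extends to every column at once. As a minimal sketch, passing the Series returned by `mean()` (or `median()`) to `fillna()` fills each column with its own statistic, because the Series index is matched against the column names. The names `demo`, `mean_filled`, and `median_filled` are illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as in Step 2, under a separate illustrative name
demo = pd.DataFrame({'a': [1, 2, np.nan],
                     'b': [1, np.nan, 3],
                     'c': [1, np.nan, np.nan]})

# demo.mean() returns one mean per column, skipping NaN by default
column_means = demo.mean()

# Each column's nulls are filled with that column's own mean
mean_filled = demo.fillna(column_means)

# The median works the same way; for this small sample it equals the mean
median_filled = demo.fillna(demo.median())

print(column_means)
print(mean_filled)
```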

## Step 6: Modify the DataFrame In Place

- Fill the missing values with a specific value, and modify the DataFrame in place by passing inplace=True


In [None]:
df.fillna(value=2,inplace=True)

In [None]:
df

     a    b    c
0  1.0  1.0  1.0
1  2.0  2.0  2.0
2  2.0  3.0  2.0


**Observation:**

- The missing values in the DataFrame are replaced by the given value (2).
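For comparison, here is a sketch of the alternative to `inplace=True`: assigning the result of `fillna()` to a new name. This keeps the original data available; `original` and `filled` are illustrative names.

```python
import numpy as np
import pandas as pd

# Same sample data as in Step 2, under a separate illustrative name
original = pd.DataFrame({'a': [1, 2, np.nan],
                         'b': [1, np.nan, 3],
                         'c': [1, np.nan, np.nan]})

# Without inplace=True, fillna() returns a new DataFrame
filled = original.fillna(value=2)

# The original still has its nulls; the new DataFrame has none
print(int(original.isna().sum().sum()))
print(int(filled.isna().sum().sum()))
```

Which style to use is a design choice: reassignment keeps the unfilled data around for comparison, while `inplace=True` avoids introducing a second name.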