# Handle Missing Data
Handling missing data is an important part of data analysis, and Pandas provides a number of methods for dealing with missing values. In this notebook, we will cover some common techniques for handling missing data using Pandas.

In [1]:
import pandas as pd

In [2]:
# Creating a dataframe from CSV
df1 = pd.read_csv("weather_data.csv")
df1

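| | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 01-01-2020 | 32.0 | 6.0 | Rain |
| 1 | 01-02-2020 | NaN | 7.0 | Sunny |
| 2 | 01-03-2020 | 28.0 | NaN | NaN |
| 3 | 01-04-2020 | 24.0 | 7.0 | Snow |
| 4 | 01-05-2020 | NaN | 4.0 | Rain |
| 5 | 01-06-2020 | 32.0 | NaN | Sunny |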


## 01. Convert String Column into Date Type
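In Pandas, it is common to work with data that includes dates. However, files such as CSVs often store dates as strings, and Pandas then treats them as plain text. Date-based operations such as chronological sorting, comparing dates, extracting the month, or selecting a date range do not work correctly on strings, so the column should be converted to a datetime type.

Pandas offers two ways to do this: pass the column name to the parse_dates parameter of read_csv() so the column is parsed while the file is loaded (as in the cells below), or convert an existing column with the pd.to_datetime() function. to_datetime() accepts many string formats and an optional format argument for an explicit pattern.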
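As a minimal sketch of the second option, the snippet below converts an existing string column with pd.to_datetime(). The three-row DataFrame is a hypothetical stand-in for df1, and format="%m-%d-%Y" is an assumption based on the month-day-year layout visible in the output above.

```python
import pandas as pd

# Hypothetical stand-in for df1: 'day' holds month-day-year strings
df = pd.DataFrame({"day": ["01-01-2020", "01-02-2020", "01-03-2020"]})
print(type(df["day"][0]))  # <class 'str'>

# Convert the existing column; an explicit format removes any
# ambiguity between month-first and day-first dates
df["day"] = pd.to_datetime(df["day"], format="%m-%d-%Y")

print(pd.api.types.is_datetime64_any_dtype(df["day"]))  # True

# The .dt accessor now exposes date components
print(df["day"].dt.day.tolist())    # [1, 2, 3]
print(df["day"].dt.month.tolist())  # [1, 1, 1]
```

Passing format explicitly is optional, but it makes the parsing deterministic instead of relying on Pandas to infer the layout.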

In [3]:
# Check the type of the first value in the 'day' column
type(df1.day[0])

str

In [4]:
df2 = pd.read_csv("weather_data.csv", parse_dates=["day"])
df2

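| | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 2020-01-01 | 32.0 | 6.0 | Rain |
| 1 | 2020-01-02 | NaN | 7.0 | Sunny |
| 2 | 2020-01-03 | 28.0 | NaN | NaN |
| 3 | 2020-01-04 | 24.0 | 7.0 | Snow |
| 4 | 2020-01-05 | NaN | 4.0 | Rain |
| 5 | 2020-01-06 | 32.0 | NaN | Sunny |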


In [5]:
# Check the type of the first value in the 'day' column
type(df2.day[0])

pandas._libs.tslibs.timestamps.Timestamp

## 02. Use Date as Index of DataFrame
To use a date column as the index of a DataFrame, call the DataFrame's set_index() method with the name of the date column. By default, set_index() returns a new DataFrame with that column as the index and leaves the original DataFrame unchanged.

Passing inplace=True modifies the existing DataFrame instead; in that case set_index() returns None, so its result should not be assigned back to a variable.
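A small sketch of these points, assuming an in-memory stand-in for weather_data.csv built with io.StringIO (same values as above, but with dates written in ISO format so no format inference is involved). It shows that set_index(..., inplace=True) returns None, that read_csv's index_col parameter can set the date index while loading, and that a DatetimeIndex lets .loc select rows by date strings.

```python
import io
import pandas as pd

# Hypothetical stand-in for weather_data.csv: same values as above,
# but with ISO-formatted dates
csv_text = """day,temperature,windspeed,event
2020-01-01,32,6,Rain
2020-01-02,,7,Sunny
2020-01-03,28,,
2020-01-04,24,7,Snow
2020-01-05,,4,Rain
2020-01-06,32,,Sunny
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["day"])

# inplace=True mutates df and returns None, so don't assign the result
result = df.set_index("day", inplace=True)
print(result)  # None

# Alternative: make 'day' the index while reading the file
df_alt = pd.read_csv(io.StringIO(csv_text), parse_dates=["day"], index_col="day")

# A DatetimeIndex supports slicing rows by date strings (both ends included)
print(df_alt.loc["2020-01-02":"2020-01-04", "temperature"])
```

Setting the index at read time avoids a separate set_index() step when the date column is always meant to be the index.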

In [6]:
df3 = pd.read_csv("weather_data.csv", parse_dates=["day"])
df3.set_index("day", inplace=True)
df3

| day | temperature | windspeed | event |
|---|---|---|---|
| 2020-01-01 | 32.0 | 6.0 | Rain |
| 2020-01-02 | NaN | 7.0 | Sunny |
| 2020-01-03 | 28.0 | NaN | NaN |
| 2020-01-04 | 24.0 | 7.0 | Snow |
| 2020-01-05 | NaN | 4.0 | Rain |
| 2020-01-06 | 32.0 | NaN | Sunny |


## 03. Use fillna() Method
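In Pandas, fillna() fills missing (NaN) values in a DataFrame, either with a single value for every column or with a dict that maps column names to their own fill values. By default it returns a new DataFrame and leaves the original unchanged, which makes it convenient for cleaning data before further processing.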
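The sketch below uses a hypothetical two-column DataFrame to count missing values per column with isna().sum() before filling, and to show that fillna() leaves the original untouched and that a dict fills only the columns it names.

```python
import numpy as np
import pandas as pd

# Hypothetical small frame with gaps in a numeric and a text column
df = pd.DataFrame({
    "temperature": [32.0, np.nan, 28.0],
    "event": ["Rain", "Sunny", np.nan],
})

# Count missing values per column before filling
print(df.isna().sum().tolist())  # [1, 1]

# fillna() returns a new DataFrame; df itself is unchanged
filled = df.fillna({"temperature": 0})
print(filled["temperature"].tolist())  # [32.0, 0.0, 28.0]

# Columns not named in the dict keep their NaN values
print(filled["event"].isna().sum())    # 1
print(df["temperature"].isna().sum())  # 1
```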

In [7]:
# Replace every NaN with 0 (the text 'event' column also receives the number 0)
df4 = df3.fillna(0)
df4

| day | temperature | windspeed | event |
|---|---|---|---|
| 2020-01-01 | 32.0 | 6.0 | Rain |
| 2020-01-02 | 0.0 | 7.0 | Sunny |
| 2020-01-03 | 28.0 | 0.0 | 0 |
| 2020-01-04 | 24.0 | 7.0 | Snow |
| 2020-01-05 | 0.0 | 4.0 | Rain |
| 2020-01-06 | 32.0 | 0.0 | Sunny |


In [8]:
# Use a different fill value for each column
df5 = df3.fillna({
    "temperature": 0,
    "windspeed": 0,
    "event": "no event"
})
df5

| day | temperature | windspeed | event |
|---|---|---|---|
| 2020-01-01 | 32.0 | 6.0 | Rain |
| 2020-01-02 | 0.0 | 7.0 | Sunny |
| 2020-01-03 | 28.0 | 0.0 | no event |
| 2020-01-04 | 24.0 | 7.0 | Snow |
| 2020-01-05 | 0.0 | 4.0 | Rain |
| 2020-01-06 | 32.0 | 0.0 | Sunny |


## 04. Use fillna(method="ffill"/"bfill") Method
Instead of a fixed value, missing values can be filled by propagating neighboring values. Forward filling ("ffill") replaces each NaN with the last valid value above it in the same column, and backward filling ("bfill") replaces it with the next valid value below it. A NaN with no valid value in the fill direction stays NaN; for example, the last windspeed entry remains missing after a backward fill. Older code selects these strategies with fillna(method="ffill") or fillna(method="bfill"), but the method parameter has been deprecated since pandas 2.1 in favor of the equivalent DataFrame.ffill() and DataFrame.bfill() methods.
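A minimal sketch on a hypothetical Series illustrates both fill directions with Series.ffill() and Series.bfill(), the trailing NaN that a backward fill cannot reach, and the limit parameter, which caps how many consecutive NaNs are filled.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan])

# Forward fill: copy the last valid value downward
print(s.ffill().tolist())         # [1.0, 1.0, 1.0, 4.0, 4.0]

# Backward fill: copy the next valid value upward; the trailing NaN stays NaN
print(s.bfill().tolist())         # [1.0, 4.0, 4.0, 4.0, nan]

# limit=1 fills at most one NaN in each run of consecutive NaNs
print(s.ffill(limit=1).tolist())  # [1.0, 1.0, nan, 4.0, 4.0]
```

Setting a limit can be useful when carrying a reading forward over a long gap would be misleading.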

In [11]:
# Forward fill; equivalent to the deprecated fillna(method="ffill")
df6 = df3.ffill()
df6

| day | temperature | windspeed | event |
|---|---|---|---|
| 2020-01-01 | 32.0 | 6.0 | Rain |
| 2020-01-02 | 32.0 | 7.0 | Sunny |
| 2020-01-03 | 28.0 | 7.0 | Sunny |
| 2020-01-04 | 24.0 | 7.0 | Snow |
| 2020-01-05 | 24.0 | 4.0 | Rain |
| 2020-01-06 | 32.0 | 4.0 | Sunny |


In [12]:
# Backward fill; equivalent to the deprecated fillna(method="bfill")
df7 = df3.bfill()
df7

| day | temperature | windspeed | event |
|---|---|---|---|
| 2020-01-01 | 32.0 | 6.0 | Rain |
| 2020-01-02 | 28.0 | 7.0 | Sunny |
| 2020-01-03 | 28.0 | 7.0 | Snow |
| 2020-01-04 | 24.0 | 7.0 | Snow |
| 2020-01-05 | 32.0 | 4.0 | Rain |
| 2020-01-06 | 32.0 | NaN | Sunny |
