# Handle Missing Data
In data analysis, missing data is common, and handling it properly is essential for accurate results. Pandas, a powerful Python library, provides several methods for handling missing data. One of them is the replace() method.

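The replace() method in Pandas replaces a specific value with another value. It works in both directions: it can turn placeholder values (such as -99999) into missing values, and it can replace missing or null values with something else.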
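
The sketch below shows both directions. It uses a small hypothetical DataFrame, not weather_data2.csv. In it, -99999 is a placeholder and np.nan is a real missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical readings: -99999 is a placeholder ("sentinel") for a missing value,
# while np.nan is an actual missing value.
readings = pd.DataFrame({"temperature": [32.0, -99999.0, np.nan, 31.0]})

# Direction 1: turn the sentinel into a real missing value.
as_missing = readings.replace(-99999.0, np.nan)
print(as_missing["temperature"].isna().sum())  # 2

# Direction 2: replace existing missing values with a default.
filled = readings.replace(np.nan, 0.0)
print(filled["temperature"].tolist())  # [32.0, -99999.0, 0.0, 31.0]
```

Only the second cell is missing in the original data. After the first replacement, isna() also flags the former sentinel.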

In [1]:
import pandas as pd

In [2]:
df1 = pd.read_csv("weather_data2.csv")
df1

|   | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 01-01-2017 | 32F | 6 mph | Rain |
| 1 | 01-02-2017 | -99999 | 7 mph | Sunny |
| 2 | 01-03-2017 | 28 | -99999 | Snow |
| 3 | 01-04-2017 | -99999 | 7 | No Event |
| 4 | 01-05-2017 | 32 | -88888 | Rain |
| 5 | 01-06-2017 | 31 | 2 | Sunny |
| 6 | 01-06-2017 | 34 | 5 | 0 |


## 01. Use replace() Method to Replace Values
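Pandas provides the replace() method to replace values in a DataFrame. It can replace a single value or several values at once.

Keep in mind that matching is type-sensitive. The temperature and windspeed columns contain unit suffixes such as "32F" and "6 mph", so pandas reads them as text. The sentinels in those columns are therefore the strings "-99999" and "-88888", not numbers. Replacing the integer -99999 leaves them unchanged, which is why the sentinels are still visible in several outputs below.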

In [3]:
# Replace a single value. pd.NA is pandas' missing-value marker;
# the string "NaN" would just be ordinary text.
df2 = df1.replace(-99999, pd.NA)
df2

|   | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 01-01-2017 | 32F | 6 mph | Rain |
| 1 | 01-02-2017 | -99999 | 7 mph | Sunny |
| 2 | 01-03-2017 | 28 | -99999 | Snow |
| 3 | 01-04-2017 | -99999 | 7 | No Event |
| 4 | 01-05-2017 | 32 | -88888 | Rain |
| 5 | 01-06-2017 | 31 | 2 | Sunny |
| 6 | 01-06-2017 | 34 | 5 | 0 |

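The type mismatch can be reproduced on a small inline CSV. This sketch assumes a few rows shaped like weather_data2.csv; the real file is not reproduced here. It reads them through io.StringIO, then compares matching the integer sentinel with matching its string form.

```python
import io

import pandas as pd

# Hypothetical rows in the style of weather_data2.csv: the unit suffixes
# ("32F", "6 mph") make pandas read both columns as text (object dtype).
csv_text = """day,temperature,windspeed
01-01-2017,32F,6 mph
01-02-2017,-99999,7 mph
01-03-2017,28,-99999
"""
df = pd.read_csv(io.StringIO(csv_text))

# The integer -99999 never equals the string "-99999", so nothing is replaced.
unchanged = df.replace(-99999, pd.NA)
print(unchanged.isna().sum().sum())  # 0

# Matching the string form replaces both sentinels.
fixed = df.replace("-99999", pd.NA)
print(fixed.isna().sum().sum())  # 2
```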

In [4]:
# Replace several values at once
df3 = df1.replace([-99999, -88888], pd.NA)
df3

|   | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 01-01-2017 | 32F | 6 mph | Rain |
| 1 | 01-02-2017 | -99999 | 7 mph | Sunny |
| 2 | 01-03-2017 | 28 | -99999 | Snow |
| 3 | 01-04-2017 | -99999 | 7 | No Event |
| 4 | 01-05-2017 | 32 | -88888 | Rain |
| 5 | 01-06-2017 | 31 | 2 | Sunny |
| 6 | 01-06-2017 | 34 | 5 | 0 |

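If the sentinels should never be treated as data, you can declare them missing while reading the file, using read_csv's na_values parameter. The sketch below uses hypothetical rows without unit suffixes, so the columns parse as numbers. It compares that approach with the list form of replace().

```python
import io

import numpy as np
import pandas as pd

# Hypothetical rows without unit suffixes, so both columns parse as integers.
csv_text = """day,temperature,windspeed
01-01-2017,32,6
01-02-2017,-99999,7
01-03-2017,28,-99999
01-05-2017,32,-88888
"""

# Option 1: read normally, then replace the integer sentinels using a list.
raw = pd.read_csv(io.StringIO(csv_text))
via_replace = raw.replace([-99999, -88888], np.nan)

# Option 2: tell read_csv which raw strings mean "missing".
via_na_values = pd.read_csv(io.StringIO(csv_text), na_values=["-99999", "-88888"])

print(via_replace.isna().sum().tolist())    # [0, 1, 2]
print(via_na_values.isna().sum().tolist())  # [0, 1, 2]
```

Option 2 keeps the sentinels from ever entering the DataFrame. Option 1 is useful when the data is already loaded.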

In [5]:
# Replace values only in specific columns
df4 = df1.replace({
    "temperature": -99999,
    "windspeed": [-99999, -88888],
    "event": "0"
}, pd.NA)
df4

|   | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 01-01-2017 | 32F | 6 mph | Rain |
| 1 | 01-02-2017 | -99999 | 7 mph | Sunny |
| 2 | 01-03-2017 | 28 | -99999 | Snow |
| 3 | 01-04-2017 | -99999 | 7 | No Event |
| 4 | 01-05-2017 | 32 | -88888 | Rain |
| 5 | 01-06-2017 | 31 | 2 | Sunny |
| 6 | 01-06-2017 | 34 | 5 |  |

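The {column: value} form matters when the same value is legitimate in one column but a missing-value code in another. The sketch below uses hypothetical data in which 0 is a valid wind speed but means "missing" in a hypothetical event_code column.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 0 is a real wind speed, but a "missing" code in event_code.
df = pd.DataFrame({"windspeed": [6, 0, 7], "event_code": [1, 0, 2]})

# A blanket replacement hits every column, including the real 0 in windspeed.
blanket = df.replace(0, np.nan)

# {column: value} restricts the match to the named column.
targeted = df.replace({"event_code": 0}, np.nan)

print(blanket["windspeed"].tolist())           # [6.0, nan, 7.0]
print(targeted["windspeed"].tolist())          # [6, 0, 7]
print(targeted["event_code"].isna().tolist())  # [False, True, False]
```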

In [6]:
# Map each old value to its own new value
df5 = df1.replace({
    -99999: pd.NA,
    "No Event": "Sunny"
})
df5

|   | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 01-01-2017 | 32F | 6 mph | Rain |
| 1 | 01-02-2017 | -99999 | 7 mph | Sunny |
| 2 | 01-03-2017 | 28 | -99999 | Snow |
| 3 | 01-04-2017 | -99999 | 7 | Sunny |
| 4 | 01-05-2017 | 32 | -88888 | Rain |
| 5 | 01-06-2017 | 31 | 2 | Sunny |
| 6 | 01-06-2017 | 34 | 5 | 0 |

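A flat {old: new} dictionary, passed without a value argument, applies each pair to every column of the DataFrame. The sketch below uses hypothetical data to show one call mapping two different values across two columns.

```python
import pandas as pd

# Hypothetical data with the same label appearing in two columns.
df = pd.DataFrame({
    "event": ["Rain", "No Event", "Snow", "No Event"],
    "note": ["No Event", "ok", "ok", "late"],
})

# Each {old: new} pair is applied to every column.
mapped = df.replace({"No Event": "Sunny", "ok": "fine"})
print(mapped["event"].tolist())  # ['Rain', 'Sunny', 'Snow', 'Sunny']
print(mapped["note"].tolist())   # ['Sunny', 'fine', 'fine', 'late']
```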

## 02. The regex (Regular Expression) Parameter
The regex parameter of replace() controls whether to_replace is interpreted as a regular expression. When regex=True, to_replace is treated as a regular-expression pattern, and every matching part of a string is replaced with value. For example, the pattern "[A-Za-z]" with value "" deletes each letter, turning "32F" into "32".

Regular expressions are a powerful tool for pattern matching in text data. They allow us to search for patterns in text, and can be used to extract specific parts of a string or replace certain parts of a string with another value.

In Pandas, regular expressions are very useful for cleaning and transforming data. Two typical uses are extracting email addresses from a text column (with Series.str.extract) and replacing certain characters with others (with replace()).
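
The sketch below strips units from hypothetical values in the style of weather_data2.csv and then converts the result to numbers. It uses [^0-9.\-] instead of the notebook's [A-Za-z]. The reason: [A-Za-z] removes the letters of "6 mph" but leaves the space, whereas this pattern keeps only digits, dots, and minus signs. This is a choice made for the example, not the only correct pattern.

```python
import pandas as pd

# Hypothetical values in the same style as weather_data2.csv.
df = pd.DataFrame({
    "temperature": ["32F", "28", "31F"],
    "windspeed": ["6 mph", "7", "2 mph"],
})

# regex=True: every substring matching the pattern is replaced with "".
# [^0-9.\-] matches any character that is not a digit, a dot, or a minus sign.
stripped = df.replace(r"[^0-9.\-]", "", regex=True)

# replace() still leaves strings behind, so convert them to numbers explicitly.
numeric = stripped.apply(pd.to_numeric)
print(numeric["temperature"].tolist())  # [32, 28, 31]
print(numeric["windspeed"].tolist())    # [6, 7, 2]
```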

In [7]:
# Remove every letter from every string cell (this also blanks out the event labels)
df6 = df1.replace("[A-Za-z]", "", regex=True)
df6

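|   | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 01-01-2017 | 32 | 6 |  |
| 1 | 01-02-2017 | -99999 | 7 |  |
| 2 | 01-03-2017 | 28 | -99999 |  |
| 3 | 01-04-2017 | -99999 | 7 |  |
| 4 | 01-05-2017 | 32 | -88888 |  |
| 5 | 01-06-2017 | 31 | 2 |  |
| 6 | 01-06-2017 | 34 | 5 | 0.0 |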


In [8]:
# Apply the regex only to temperature and windspeed, leaving event intact
df7 = df1.replace({
    "temperature": "[A-Za-z]",
    "windspeed": "[A-Za-z]"
}, "", regex=True)
df7

|   | day | temperature | windspeed | event |
|---|---|---|---|---|
| 0 | 01-01-2017 | 32 | 6 | Rain |
| 1 | 01-02-2017 | -99999 | 7 | Sunny |
| 2 | 01-03-2017 | 28 | -99999 | Snow |
| 3 | 01-04-2017 | -99999 | 7 | No Event |
| 4 | 01-05-2017 | 32 | -88888 | Rain |
| 5 | 01-06-2017 | 31 | 2 | Sunny |
| 6 | 01-06-2017 | 34 | 5 | 0 |

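Once the units are stripped and the columns are numeric, the integer sentinels from section 01 can finally be matched. The sketch below chains the three steps on a few hypothetical rows shaped like weather_data2.csv. The pattern "[A-Za-z ]" adds a space to the notebook's pattern so that "6 mph" becomes "6"; that addition is an assumption of this example.

```python
import numpy as np
import pandas as pd

# Hypothetical rows shaped like weather_data2.csv (all values are strings).
df = pd.DataFrame({
    "temperature": ["32F", "-99999", "28"],
    "windspeed": ["6 mph", "7 mph", "-88888"],
    "event": ["Rain", "Sunny", "Snow"],
})

# 1. Strip units only from the measurement columns; event keeps its letters.
df = df.replace({"temperature": r"[A-Za-z ]", "windspeed": r"[A-Za-z ]"}, "", regex=True)

# 2. Convert the cleaned text to numbers.
df[["temperature", "windspeed"]] = df[["temperature", "windspeed"]].apply(pd.to_numeric)

# 3. Now that the sentinels are integers, replace() can match them.
df = df.replace([-99999, -88888], np.nan)
print(df)
```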

## 03. Replace a List of Values with Another List of Values
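In Pandas, we can use the replace() method to replace a list of values in a DataFrame column with another list of values. Each value in the first list is replaced by the value at the same position in the second list, so the two lists must have the same length. This is handy for encoding categories as numbers, as in the example below, where each score label becomes a rating from 1 to 4. To replace several values with a single value instead, pass one scalar rather than a second list, as in section 01.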

In [9]:
# Create a new DataFrame
df = pd.DataFrame({
    "score": ["exceptional", "average", "good", "poor", "average", "exceptional"],
    "student": ["rob", "maya", "parthiv", "tom", "julian", "erica"]
})
df

|   | score | student |
|---|---|---|
| 0 | exceptional | rob |
| 1 | average | maya |
| 2 | good | parthiv |
| 3 | poor | tom |
| 4 | average | julian |
| 5 | exceptional | erica |


In [10]:
# Map by position: poor -> 1, average -> 2, good -> 3, exceptional -> 4
new_df = df.replace(["poor", "average", "good", "exceptional"], [1, 2, 3, 4])
new_df

|   | score | student |
|---|---|---|
| 0 | 4 | rob |
| 1 | 2 | maya |
| 2 | 3 | parthiv |
| 3 | 1 | tom |
| 4 | 2 | julian |
| 5 | 4 | erica |

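Because the two lists are paired by position, the list form behaves like a dictionary built with zip(). The sketch below works on a shortened, hypothetical copy of the scores above. It checks that both forms give the same result and shows that lists of different lengths are rejected with a ValueError.

```python
import pandas as pd

# Shortened, hypothetical copy of the score column.
df = pd.DataFrame({"score": ["exceptional", "average", "good", "poor"]})
labels = ["poor", "average", "good", "exceptional"]
ratings = [1, 2, 3, 4]

# The lists are paired by position, exactly like this dict.
mapping = dict(zip(labels, ratings))

from_lists = df.replace(labels, ratings)
from_dict = df.replace(mapping)
print(from_lists["score"].tolist())  # [4, 2, 3, 1]
print(from_dict["score"].tolist())   # [4, 2, 3, 1]

# Lists of different lengths cannot be paired, so pandas raises ValueError.
try:
    df.replace(labels, ratings[:3])
    length_error = False
except ValueError:
    length_error = True
print(length_error)  # True
```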