# Lec 10: Pandas Tutorial for Handling Missing Data (Replace and Interpolate)

![image.png](attachment:e2f4fe34-5bf0-4b46-869d-aa5b93e2265c.png)

It looks like you are using the pandas `replace` method to replace values in a DataFrame. Two things stand out:

1. In the code snippet, `replace` is called but its result is not assigned to a variable, so the changes are not stored. By default, `replace` returns a new DataFrame and leaves the original unchanged.

2. To keep the replaced values, either assign the result back to the DataFrame (or to another variable) or pass `inplace=True`.

Here's a modified version of your code with corrections:

```python
import pandas as pd

# Reading the CSV file into a DataFrame
df = pd.read_csv("movies.csv")

# Displaying the original DataFrame
print("Original DataFrame:")
print(df)

# Replace specific value "PK" with "Praveen Kumar Singh"
df.replace(to_replace="PK", value="Praveen Kumar Singh", inplace=True)

# Displaying the DataFrame after the first replacement
print("\nDataFrame after replacing 'PK' with 'Praveen Kumar Singh':")
print(df)

# Replace all uppercase letters with "Python"
df.replace("[A-Z]", "Python", regex=True, inplace=True)

# Displaying the DataFrame after replacing uppercase letters with "Python"
print("\nDataFrame after replacing uppercase letters with 'Python':")
print(df)

# Replace every lowercase letter with "Python", including the lowercase
# letters of any "Python" strings inserted by the previous step
df.replace("[a-z]", "Python", regex=True, inplace=True)

# Displaying the DataFrame after replacing lowercase letters with 'Python'
print("\nDataFrame after replacing lowercase letters with 'Python':")
print(df)

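# Replace every "Movie Title" cell that contains an uppercase letter with 143.
# Assign the result back: inplace=True on a selected column is a chained
# operation that may not update df in newer pandas versions.
df['Movie Title'] = df['Movie Title'].replace('[A-Z]', 143, regex=True)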

# Displaying the DataFrame after the last replacement
print("\nDataFrame after replacing values in 'Movie Title' column with 143:")
print(df)
```

This corrected code includes appropriate assignments and prints to demonstrate the changes at each step.
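
To make the difference between these replacement modes concrete, here is a minimal sketch on a small in-memory DataFrame. The data (including the made-up title "PK 2") is hypothetical and only stands in for `movies.csv`, whose contents are not shown here. The column is created with `dtype=object` as an assumption, to mirror what pandas 2.x produces for text columns.

```python
import pandas as pd

# Hypothetical stand-in for movies.csv; the real file's contents are not shown.
titles = pd.Series(["PK", "PK 2", "dangal"], dtype=object, name="Movie Title")
df = pd.DataFrame({"Movie Title": titles, "Year": [2014, 2020, 2016]})

# Without regex, only cells that are exactly equal to "PK" are replaced.
exact = df.replace(to_replace="PK", value="Praveen Kumar Singh")

# With regex=True and a string value, only the matched text is substituted.
substring = df.replace("PK", "Praveen Kumar Singh", regex=True)

# With regex=True and a non-string value, the whole matching cell is replaced.
whole_cell = df["Movie Title"].replace("[A-Z]", 143, regex=True)

print(exact["Movie Title"].tolist())      # ['Praveen Kumar Singh', 'PK 2', 'dangal']
print(substring["Movie Title"].tolist())  # ['Praveen Kumar Singh', 'Praveen Kumar Singh 2', 'dangal']
print(whole_cell.tolist())                # [143, 143, 'dangal']

# None of the calls above used inplace=True, so df itself is unchanged.
print(df["Movie Title"].tolist())         # ['PK', 'PK 2', 'dangal']
```

Because each call returns a new object, the original `df` survives all three replacements, which is why the result has to be assigned somewhere if it is needed later.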

In [None]:
import pandas as pd
df = pd.read_csv("finnew.csv")
df

![image.png](attachment:9a7fb7ae-0cd4-4665-81fe-d7fe2dd99e8a.png)

It appears you are using the pandas `interpolate` method to fill missing values in a DataFrame. Two things to note:

1. In your code, the result of each operation is neither displayed nor assigned to a variable.

2. `inplace=True` is used, which modifies the original DataFrame directly. In pandas 2.x and earlier the call then returns `None`, so a notebook cell shows no output, and the changes become visible only when the DataFrame is displayed again.

Here's a modified version of your code with appropriate assignments and prints:

```python
import pandas as pd

# Reading the CSV file into a DataFrame
df = pd.read_csv("your_csv_file.csv")

# Displaying the original DataFrame
print("Original DataFrame:")
print(df)

# Interpolate missing values linearly down each column (axis=0)
df_interpolated_linear = df.interpolate(method="linear", axis=0)

# Displaying the DataFrame after linear interpolation
print("\nDataFrame after linear interpolation along columns:")
print(df_interpolated_linear)

# Fill at most 2 consecutive missing values per gap (default direction: forward)
df_interpolated_limit = df.interpolate(limit=2)

# Displaying the DataFrame after interpolation with a limit
print("\nDataFrame after interpolation with a limit of 2:")
print(df_interpolated_limit)

# Interpolate missing values forward with a limit of 2
df_interpolated_forward = df.interpolate(limit_direction="forward", limit=2)

# Displaying the DataFrame after forward interpolation with a limit
print("\nDataFrame after forward interpolation with a limit of 2:")
print(df_interpolated_forward)

# Interpolate missing values backward with a limit of 2
df_interpolated_backward = df.interpolate(limit_direction="backward", limit=2)

# Displaying the DataFrame after backward interpolation with a limit
print("\nDataFrame after backward interpolation with a limit of 2:")
print(df_interpolated_backward)

# Interpolate missing values in both directions with a limit of 1
df_interpolated_both = df.interpolate(limit_direction="both", limit=1)

# Displaying the DataFrame after bidirectional interpolation with a limit
print("\nDataFrame after bidirectional interpolation with a limit of 1:")
print(df_interpolated_both)

# Fill only missing values that are surrounded by valid values (limit_area="inside")
df_interpolated_limit_area_inside = df.interpolate(limit_area="inside")

# Displaying the DataFrame after interpolation inside the limit area
print("\nDataFrame after interpolation inside the limit area:")
print(df_interpolated_limit_area_inside)

# Fill only missing values outside the range of valid values (limit_area="outside");
# with the default forward direction this reaches trailing gaps only
df_interpolated_limit_area_outside = df.interpolate(limit_area="outside")

# Displaying the DataFrame after interpolation outside the limit area
print("\nDataFrame after interpolation outside the limit area:")
print(df_interpolated_limit_area_outside)

# Inplace interpolation with a limit in both directions
df.interpolate(limit_direction="both", limit=2, inplace=True)

# Displaying the DataFrame after inplace interpolation
print("\nDataFrame after inplace interpolation:")
print(df)
```

This corrected code includes appropriate assignments and prints to demonstrate the changes at each step. Adjust the file name ("your_csv_file.csv") accordingly based on your actual CSV file.
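
The effect of `limit`, `limit_direction`, and `limit_area` is easier to see on a tiny Series than on a full CSV file. The following is a minimal sketch with hypothetical values (not taken from the notebook's data), using the default linear method. The series has one leading gap, one interior gap of three values, and one trailing gap.

```python
import numpy as np
import pandas as pd

# Hypothetical data: leading gap, three-value interior gap, trailing gap.
s = pd.Series([np.nan, 1.0, np.nan, np.nan, np.nan, 5.0, np.nan])

# limit=1 fills at most one consecutive NaN per gap; the direction defaults to forward.
forward = s.interpolate(limit=1)
backward = s.interpolate(limit_direction="backward", limit=1)
both = s.interpolate(limit_direction="both", limit=1)

# limit_area restricts filling to gaps inside or outside the valid values.
inside = s.interpolate(limit_area="inside")
outside = s.interpolate(limit_direction="both", limit_area="outside")

print(forward.tolist())   # [nan, 1.0, 2.0, nan, nan, 5.0, 5.0]
print(backward.tolist())  # [1.0, 1.0, nan, nan, 4.0, 5.0, nan]
print(both.tolist())      # [1.0, 1.0, 2.0, nan, 4.0, 5.0, 5.0]
print(inside.tolist())    # [nan, 1.0, 2.0, 3.0, 4.0, 5.0, nan]
print(outside.tolist())   # [1.0, 1.0, nan, nan, nan, 5.0, 5.0]
```

Note that the linear method does not extrapolate a trend beyond the data: leading and trailing gaps, when they are filled at all, simply receive the nearest valid value.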

In [19]:
!jupyter --version

Selected Jupyter core packages...
IPython          : 8.22.1
ipykernel        : 6.29.2
ipywidgets       : 8.1.2
jupyter_client   : 8.6.0
jupyter_core     : 5.7.1
jupyter_server   : 2.12.5
jupyterlab       : 4.1.2
nbclient         : 0.9.0
nbconvert        : 7.16.1
nbformat         : 5.9.2
notebook         : 7.1.0
qtconsole        : 5.5.1
traitlets        : 5.14.1
