# Handling Missing Data in Pandas Coding Practice Questions

1. Create a Pandas DataFrame named `df` with values `[[1, np.nan], [2, 3], [np.nan, 4]]` and columns `['A', 'B']`. Display the DataFrame.

2. Check if there are any missing values in the DataFrame `df`.

3. Count the number of missing values in each column of `df`.

4. Drop all rows containing missing values from `df`.

5. Drop all columns containing missing values from `df`.

6. Fill all missing values in `df` with 0.

7. Fill missing values in column 'A' of `df` with the mean of the column.

8. Forward-fill missing values in `df`.

9. Backward-fill missing values in `df`.

10. Use the `interpolate` method to fill missing values in `df`.

11. Replace all occurrences of the value 3 in `df` with `NaN`.

12. Check if `df` has any infinite values.

13. Replace any infinite values in `df` with `NaN`.

14. Convert column 'A' of `df` to an integer dtype, filling missing values with 0 first, since a NumPy integer column cannot hold `NaN`.

15. Create a mask for `df` that indicates which entries are missing.

16. Count the number of non-missing values in each column of `df`.

17. Replace missing values in `df` with the median of each column.

18. Fill missing values in `df` using a specified fill value for each column. Use 0 for column 'A' and 1 for column 'B'.

19. Drop rows from `df` where all values are missing.

20. Drop rows from `df` where any value is missing.

# Solutions to Handling Missing Data in Pandas Coding Practice Questions

In [1]:
import pandas as pd
import numpy as np

In [2]:
# Solution to Question 1
df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [np.nan, 3, 4]})
df

     A    B
0  1.0  NaN
1  2.0  3.0
2  NaN  4.0


In [3]:
# Solution to Question 2
df.isnull().any().any()

True

In [4]:
# Solution to Question 3
df.isnull().sum()

A    1
B    1
dtype: int64

In [5]:
# Solution to Question 4
df.dropna()

     A    B
1  2.0  3.0


In [6]:
# Solution to Question 5
df.dropna(axis=1)

Empty DataFrame
Columns: []
Index: [0, 1, 2]


In [7]:
# Solution to Question 6
df.fillna(0)

     A    B
0  1.0  0.0
1  2.0  3.0
2  0.0  4.0


In [8]:
# Solution to Question 7
df['A'].fillna(df['A'].mean())

0    1.0
1    2.0
2    1.5
Name: A, dtype: float64
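
Note that `fillna` returns a new Series and leaves `df` unchanged. The sketch below shows one way to keep the filled column; `filled` and `df_filled` are illustrative names that do not appear in the original notebook.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [np.nan, 3, 4]})

# fillna returns a new Series; df['A'] still contains its NaN afterwards
filled = df['A'].fillna(df['A'].mean())

# assign builds a new DataFrame with column 'A' replaced by the filled values
df_filled = df.assign(A=filled)
print(df_filled)
```

Using `assign` keeps the original `df` intact, so the later solutions still operate on the data with its missing values.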

In [9]:
# Solution to Question 8
df.ffill()

     A    B
0  1.0  NaN
1  2.0  3.0
2  2.0  4.0


In [10]:
# Solution to Question 9
df.bfill()

     A    B
0  1.0  3.0
1  2.0  3.0
2  NaN  4.0


In [11]:
# Solution to Question 10
df.interpolate()

     A    B
0  1.0  NaN
1  2.0  3.0
2  2.0  4.0
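
With the defaults, `interpolate` fills forward only, so the leading `NaN` in column 'B' is left as is. As a minimal sketch, passing `limit_direction='both'` also fills leading gaps; with the linear method the edge values are repeated rather than extrapolated. The name `both` is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [np.nan, 3, 4]})

# Fill gaps in both directions; leading and trailing NaN take the nearest valid value
both = df.interpolate(limit_direction='both')
print(both)
```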


In [12]:
# Solution to Question 11
df.replace(3, np.nan)

     A    B
0  1.0  NaN
1  2.0  NaN
2  NaN  4.0


In [13]:
# Solution to Question 12
np.isinf(df).any().any()

False

In [14]:
# Solution to Question 13
df.replace([np.inf, -np.inf], np.nan)

     A    B
0  1.0  NaN
1  2.0  3.0
2  NaN  4.0
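
Because `df` contains no infinite values, the two solutions above leave it unchanged. The sketch below repeats both steps on a small frame that does contain infinities; `df_inf` and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with positive and negative infinity plus one NaN
df_inf = pd.DataFrame({'A': [1.0, np.inf, 3.0], 'B': [-np.inf, 2.0, np.nan]})

# np.isinf flags only infinities, not NaN
has_inf = np.isinf(df_inf).any().any()

# Replace both signs of infinity with NaN
cleaned = df_inf.replace([np.inf, -np.inf], np.nan)
print(has_inf)
print(cleaned)
```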


In [15]:
# Solution to Question 14
df['A'].fillna(0).astype(int)

0    1
1    2
2    0
Name: A, dtype: int64
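
Filling with 0 discards the information that a value was missing, and calling `astype(int)` directly on a column containing `NaN` raises an error. As an alternative sketch, pandas' nullable integer dtype `'Int64'` (capital I) stores integers alongside `<NA>`. The name `as_nullable` is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [np.nan, 3, 4]})

# 'Int64' is the nullable integer extension dtype; missing entries become <NA>
as_nullable = df['A'].astype('Int64')
print(as_nullable)
```

This conversion only works here because every non-missing value in 'A' is a whole number.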

In [16]:
# Solution to Question 15
df.isnull()

       A      B
0  False   True
1  False  False
2   True  False


In [17]:
# Solution to Question 16
df.notnull().sum()

A    2
B    2
dtype: int64
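
As a brief aside, `DataFrame.count()` returns the same per-column totals of non-missing values in a single call. The name `counts` is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [np.nan, 3, 4]})

# count() skips missing values, so it matches notnull().sum()
counts = df.count()
print(counts)
```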

In [18]:
# Solution to Question 17
df.fillna(df.median())

     A    B
0  1.0  3.5
1  2.0  3.0
2  1.5  4.0


In [19]:
# Solution to Question 18
df.fillna({'A': 0, 'B': 1})

     A    B
0  1.0  1.0
1  2.0  3.0
2  0.0  4.0


In [20]:
# Solution to Question 19
df.dropna(how='all')

     A    B
0  1.0  NaN
1  2.0  3.0
2  NaN  4.0
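
No row of `df` is entirely missing, so `how='all'` returns it unchanged. The sketch below uses a hypothetical frame, `df_all`, whose middle row is all `NaN`, to contrast `how='all'` with `how='any'`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 1 is missing in every column
df_all = pd.DataFrame({'A': [1, np.nan, np.nan], 'B': [np.nan, np.nan, 4]})

# how='all' drops only rows where every value is missing
kept = df_all.dropna(how='all')

# how='any' (the default) drops rows with at least one missing value
strict = df_all.dropna(how='any')
print(kept)
print(strict)
```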


In [21]:
# Solution to Question 20
df.dropna()

     A    B
1  2.0  3.0
