<a href="https://colab.research.google.com/github/Tanu-N-Prabhu/Python/blob/master/How_to_Handle_Missing_Data_in_Pandas_Like_a_Pro.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# How to Handle Missing Data in Pandas Like a Pro (Python for Data Science)

## Master the most efficient techniques to clean and impute missing values in real-world datasets.

| ![space-1.jpg](https://github.com/Tanu-N-Prabhu/Python/blob/master/Img/choong-deng-xiang--WXQm_NTK0U-unsplash.jpg?raw=true) |
|:--:|
| Photo by <a href="https://unsplash.com/@dengxiangs?utm_content=creditCopyText&utm_medium=referral&utm_source=unsplash">Choong Deng Xiang</a> on <a href="https://unsplash.com/photos/graphical-user-interface--WXQm_NTK0U?utm_content=creditCopyText&utm_medium=referral&utm_source=unsplash">Unsplash</a> |


### Introduction
Missing data is one of the most common challenges in real-world machine learning pipelines. Whether you're dealing with financial records, customer surveys, or healthcare data, missing values can cause errors or skew your models if they are not handled properly. In this article, you'll learn how to deal with missing data using Pandas, the most popular data manipulation library in Python.

---

### Problem
You have a dataset with several missing values and want to clean or impute them without losing valuable information. Manually checking and filling `NaN` values is inefficient, especially for large datasets.

---

### Code Implementation






In [2]:
import pandas as pd
import numpy as np

# Sample DataFrame with missing values
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', None],
    'Age': [25, np.nan, 30, 22, 28],
    'Salary': [50000, 60000, None, 52000, 58000]
})

df

|   | Name | Age | Salary |
|:--:|:--:|:--:|:--:|
| 0 | Alice | 25.0 | 50000.0 |
| 1 | Bob | NaN | 60000.0 |
| 2 | Charlie | 30.0 | NaN |
| 3 | David | 22.0 | 52000.0 |
| 4 | None | 28.0 | 58000.0 |
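
Rather than checking for `NaN` values by eye, it usually helps to summarize the missingness first. The following is a minimal sketch on the same sample data; the variable names `missing_counts`, `missing_pct`, and `incomplete_rows` are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Same sample data as in the cell above
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', None],
    'Age': [25, np.nan, 30, 22, 28],
    'Salary': [50000, 60000, None, 52000, 58000]
})

# Number of missing values in each column
missing_counts = df.isna().sum()

# Share of missing values in each column, as a percentage
missing_pct = df.isna().mean() * 100

# Rows that contain at least one missing value
incomplete_rows = df[df.isna().any(axis=1)]

print(missing_counts)
print(missing_pct)
print(incomplete_rows)
```

A summary like this makes it easier to decide, column by column, whether dropping or imputing is the more reasonable choice.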


In [3]:
# Drop rows with any missing values (returns a new DataFrame; df itself is unchanged)
clean_df = df.dropna()

# Fill missing Age with mean
df['Age'] = df['Age'].fillna(df['Age'].mean())

# Fill missing Salary with median
df['Salary'] = df['Salary'].fillna(df['Salary'].median())

# Fill missing Name with 'Unknown'
df['Name'] = df['Name'].fillna('Unknown')

df

|   | Name | Age | Salary |
|:--:|:--:|:--:|:--:|
| 0 | Alice | 25.0 | 50000.0 |
| 1 | Bob | 26.25 | 60000.0 |
| 2 | Charlie | 30.0 | 55000.0 |
| 3 | David | 22.0 | 52000.0 |
| 4 | Unknown | 28.0 | 58000.0 |


### Code Explanation

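* `dropna()` returns a new DataFrame without the rows that contain at least one missing value. Here the result is stored in `clean_df`, and `df` is left untouched.

* `fillna()` with a column statistic replaces `NaN` in numeric columns: the mean for `Age` and the median for `Salary`. The median is less sensitive to outliers than the mean.

* String columns are filled with a constant placeholder such as `'Unknown'`.

* Each step is a single vectorized call, so the approach is concise and works well as a preprocessing step before feeding data into ML models.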
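
The same ideas can be expressed more selectively. The sketch below is not from the original notebook; it assumes the same sample data and shows `dropna()` with its `subset` and `thresh` parameters, plus a single `fillna()` call that takes a column-to-value mapping. The names `named_only`, `complete_rows`, and `filled` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', None],
    'Age': [25, np.nan, 30, 22, 28],
    'Salary': [50000, 60000, None, 52000, 58000]
})

# Drop a row only when 'Name' is missing; gaps in other columns are kept
named_only = df.dropna(subset=['Name'])

# Keep only rows that have at least 3 non-missing values
complete_rows = df.dropna(thresh=3)

# Fill several columns in one call with a column -> fill value mapping
filled = df.fillna({
    'Age': df['Age'].mean(),
    'Salary': df['Salary'].median(),
    'Name': 'Unknown',
})

print(named_only)
print(complete_rows)
print(filled)
```

Because `fillna()` returns a new DataFrame here, the original `df` keeps its missing values, which is convenient when you want to compare strategies side by side.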


---

### Why it’s so important

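* Many machine learning algorithms cannot handle missing values directly and raise errors or produce unreliable results when they encounter `NaN`.

* Imputing instead of deleting preserves the size of the dataset.

* Automating the cleanup saves manual effort, especially for large or dirty datasets.

* Explicit, repeatable cleaning steps fit naturally into automated ML workflows.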
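
To make the point about dataset size concrete, here is a small sketch comparing how many rows survive dropping versus imputing. As a simplification (an assumption of this example, not the article's exact recipe), it fills every numeric column with its mean by passing `df.mean(numeric_only=True)` to `fillna()`, and leaves the `Name` column alone.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', None],
    'Age': [25, np.nan, 30, 22, 28],
    'Salary': [50000, 60000, None, 52000, 58000]
})

# Dropping: every row with at least one gap is removed
rows_after_drop = len(df.dropna())

# Imputing: fill each numeric column with its own mean;
# fillna matches the Series index to the column names
imputed = df.fillna(df.mean(numeric_only=True))
rows_after_impute = len(imputed)

print(f"Rows after dropna: {rows_after_drop} of {len(df)}")
print(f"Rows after imputation: {rows_after_impute} of {len(df)}")
```

On this sample, three of the five rows have a gap somewhere, so dropping discards most of the data while imputing keeps every row.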


---

### Applications

* Data cleaning and preprocessing pipelines.

* ETL (Extract, Transform, Load) operations in data engineering.

* Feature engineering and transformation for AI models.

* Preparing inputs for Scikit-learn, TensorFlow, and PyTorch, which generally expect numeric data without missing values.
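
As a rough illustration of the last point, the sketch below turns the numeric columns into a NaN-free NumPy array, which is the kind of input these libraries typically accept. It does not call any of those libraries; the feature selection and the names `numeric` and `X` are assumptions made for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', None],
    'Age': [25, np.nan, 30, 22, 28],
    'Salary': [50000, 60000, None, 52000, 58000]
})

# Select the numeric features and fill each column with its median
numeric = df[['Age', 'Salary']]
numeric = numeric.fillna(numeric.median())

# Convert to a plain float array for downstream libraries
X = numeric.to_numpy(dtype='float64')

# Sanity check before handing the data to a model
if np.isnan(X).any():
    raise ValueError("Feature matrix still contains missing values")

print(X.shape)
```

A check like this at the boundary between cleaning and modeling catches columns that were accidentally left unimputed.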

---

### Conclusion
Handling missing data effectively is a critical skill in data science and AI. Pandas provides concise, efficient methods to clean and transform your datasets so that they are ready for modeling. Choosing between dropping and imputing, and picking a sensible fill value for each column, lets you keep as much of your data as possible while limiting the distortion that missing values introduce. Thanks for reading! If you have suggestions or similar implementations, let me know in the comment section. Until then, see you next time. Happy coding!

---

### Before you go
* Be sure to like this article and connect with me
* Follow me: [Medium](https://medium.com/@tanunprabhu95) | [GitHub](https://github.com/Tanu-N-Prabhu) | [LinkedIn](https://ca.linkedin.com/in/tanu-nanda-prabhu-a15a091b5) | [Python Hub](https://github.com/Tanu-N-Prabhu/Python)
* [Check out my latest articles on Programming](https://medium.com/@tanunprabhu95)
* Check out my [GitHub](https://github.com/Tanu-N-Prabhu) for code and [Medium](https://medium.com/@tanunprabhu95) for deep dives!

