# Topic 02 - Problem 8: Identify Columns with High Missing Value Percentage

---

## 1. About the Problem

This problem asks me to identify which columns in a dataset contain too many missing values.  
In real-world projects, columns with excessive missing data are often dropped because they add noise instead of useful information.  
To solve this problem, I will calculate the percentage of missing values for each column and flag the columns whose percentage meets or exceeds a given threshold.  
This helps in making informed decisions during data preprocessing.
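
As a quick illustration of the calculation described above, here is a minimal sketch for a single column. The values and the 50% threshold are made up for this example and are not part of the problem statement.

```python
# Hypothetical values for one column; None marks a missing entry.
values = [25, None, 30, None]

# Count the missing entries and convert the count to a percentage.
missing = sum(1 for v in values if v is None)
missing_percentage = missing / len(values) * 100

threshold_percent = 50  # assumed threshold for this example
is_flagged = missing_percentage >= threshold_percent

print(missing_percentage, is_flagged)  # prints: 50.0 True
```

A column that sits exactly on the threshold is flagged here, because the comparison is "greater than or equal to".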

---


## 2. Solution Code

```python
def columns_with_high_missing(data, threshold_percent):
    """Return the columns whose share of None values is at least threshold_percent."""
    missing_count_each_column = {}
    total = len(data)

    # Count the None values in each column across all records.
    for record in data:
        for key, value in record.items():
            if value is None:
                missing_count_each_column[key] = missing_count_each_column.get(key, 0) + 1

    high_missing_cols = []

    # Convert each count to a percentage and keep columns at or above the threshold.
    for col, count in missing_count_each_column.items():
        missing_percentage = (count / total) * 100
        if missing_percentage >= threshold_percent:
            high_missing_cols.append(col)

    return high_missing_cols


data = [
    {"age": 25, "salary": None, "city": "Dhaka", "Name": "John"},
    {"age": None, "salary": None, "city": None, "Name": "Anna"},
    {"age": 30, "salary": None, "city": "Chittagong", "Name": "Jacob"},
    {"age": None, "salary": None, "city": None, "Name": None},
]

print("Columns with high missing values:", columns_with_high_missing(data, threshold_percent=50))
```

Output:

```
Columns with high missing values: ['salary', 'age', 'city']
```
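
The function above returns only the names of the flagged columns, and it counts a value as missing only when the key is present with `None`. The sketch below is one possible variant, not part of the original solution: a hypothetical helper `missing_percentages` that reports the percentage for every column and also treats a key that is absent from a record as missing. That last choice is an assumption about how the data should be interpreted.

```python
def missing_percentages(data):
    """Return {column: percent missing} for every column seen in any record."""
    if not data:
        return {}  # avoid dividing by zero on an empty dataset

    # Collect every column name, preserving first-seen order.
    columns = []
    for record in data:
        for key in record:
            if key not in columns:
                columns.append(key)

    total = len(data)
    result = {}
    for col in columns:
        # record.get(col) is None both when the key is absent
        # and when its value is None (assumed to mean "missing").
        missing = sum(1 for record in data if record.get(col) is None)
        result[col] = missing / total * 100
    return result


sample = [
    {"age": 25, "salary": None},
    {"age": None},  # the "salary" key is absent here
]
print(missing_percentages(sample))  # prints: {'age': 50.0, 'salary': 100.0}
```

Seeing the actual percentages can make it easier to pick a sensible threshold before deciding which columns to flag.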


---

## 3. Summary / Takeaways

By solving this problem, I learned how to quantify missing data instead of guessing.  
I understood how missing value percentage influences feature selection decisions.  
Dropping highly incomplete columns can simplify a model and may improve its performance.  
This logic is commonly used before feature engineering and modeling.  
Next, I want to automatically drop these columns from the dataset.
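
As a possible starting point for that next step, the sketch below removes a given list of columns from every record. The helper name `drop_columns` is hypothetical, and building new dictionaries instead of modifying the input is a design choice made for this example.

```python
def drop_columns(data, columns_to_drop):
    """Return new records without the given columns; the input is left unchanged."""
    to_drop = set(columns_to_drop)  # a set gives fast membership checks
    return [
        {key: value for key, value in record.items() if key not in to_drop}
        for record in data
    ]


records = [
    {"age": 25, "salary": None, "Name": "John"},
    {"age": None, "salary": None, "Name": "Anna"},
]
cleaned = drop_columns(records, ["salary"])
print(cleaned)
# prints: [{'age': 25, 'Name': 'John'}, {'age': None, 'Name': 'Anna'}]
```

The list passed as `columns_to_drop` could be the result of `columns_with_high_missing`, so the two steps can be chained.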
