# Topic 02 - Problem 9: Remove Columns with Too Many Missing Values

---

## 1. About the Problem

This problem asks me to remove columns that contain too many missing values.  
Columns with a high percentage of missing values carry little usable information, so keeping them can reduce data quality and harm model performance.  
To solve this problem, I first detect the columns whose missing percentage reaches a given threshold and then remove them from every record.  
This leaves a cleaned dataset that contains only the more complete and reliable features.
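
The detection step can be looked at on its own before any column is removed. The sketch below is only an illustration: the helper name `missing_percentages` and the `sample` records are hypothetical and not part of the original solution. It assumes, as the solution does, that a missing value is stored as `None`.

```python
def missing_percentages(data):
    """Return {column: percent of records whose value is None} (hypothetical helper)."""
    total = len(data)
    if total == 0:
        return {}
    counts = {}
    for record in data:
        for key, value in record.items():
            counts.setdefault(key, 0)  # register the column even if it is never missing
            if value is None:
                counts[key] += 1
    return {key: count / total * 100 for key, count in counts.items()}


# Made-up sample records, for illustration only.
sample = [
    {"age": 25, "salary": None},
    {"age": None, "salary": None},
    {"age": 30, "salary": 4000},
    {"age": 41, "salary": None},
]
report = missing_percentages(sample)
print(report)  # {'age': 25.0, 'salary': 75.0}
```

Looking at such a report first makes it easier to pick a threshold: with a threshold of 50, only `salary` would be dropped here.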

---


## 2. Solution Code

In [7]:
def drop_high_missing_columns(data, threshold_percent):
    """Drop every column whose share of None values is >= threshold_percent."""
    # Count how many records hold None for each column.
    missing_counts = {}
    total = len(data)
    for record in data:
        for key, value in record.items():
            if value is None:
                missing_counts[key] = missing_counts.get(key, 0) + 1

    # Mark the columns whose missing percentage reaches the threshold.
    drop_columns = set()
    for column, count in missing_counts.items():
        missing_percent = (count / total) * 100
        if missing_percent >= threshold_percent:
            drop_columns.add(column)

    # Rebuild each record without the dropped columns.
    cleaned_records = []
    for record in data:
        cleaned_record = {}
        for key, value in record.items():
            if key not in drop_columns:
                cleaned_record[key] = value
        cleaned_records.append(cleaned_record)

    return cleaned_records


data = [
    {"age": 25, "salary": None, "city": "Dhaka", "Name": "Anna"},
    {"age": None, "salary": None, "city": None, "Name": "Bob"},
    {"age": 30, "salary": None, "city": "Chittagong", "Name": "Jacob"},
    {"age": None, "salary": None, "city": None, "Name": None},
]

print("Cleaned dataset:", drop_high_missing_columns(data, threshold_percent=50))

Cleaned dataset: [{'Name': 'Anna'}, {'Name': 'Bob'}, {'Name': 'Jacob'}, {'Name': None}]
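
One limitation of the solution is that it only counts explicit `None` values, so a key that is simply absent from some records is never counted as missing. The following is a minimal sketch of one possible variant, assuming absent keys should also count as missing. The function name `drop_sparse_columns` and the `ragged` records are hypothetical.

```python
def drop_sparse_columns(data, threshold_percent):
    """Variant sketch: a key absent from a record counts as missing, like None."""
    total = len(data)
    if total == 0:
        return []

    # Collect every column name that appears in at least one record.
    all_columns = []
    for record in data:
        for key in record:
            if key not in all_columns:
                all_columns.append(key)

    # record.get(column) returns None for both absent keys and explicit None values.
    drop_columns = set()
    for column in all_columns:
        missing = sum(1 for record in data if record.get(column) is None)
        if missing / total * 100 >= threshold_percent:
            drop_columns.add(column)

    return [
        {key: value for key, value in record.items() if key not in drop_columns}
        for record in data
    ]


# Made-up records with an inconsistent set of keys, for illustration only.
ragged = [
    {"age": 25, "city": "Dhaka"},
    {"age": 31},
    {"age": None},
]
result = drop_sparse_columns(ragged, threshold_percent=50)
print(result)  # [{'age': 25}, {'age': 31}, {'age': None}]
```

Here `city` is missing from two of three records, so it is dropped, while `age` is missing only once and is kept. The original function would keep `city`, because it never sees a `None` stored under that key.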


---

## 3. Summary / Takeaways

By solving this problem, I learned how to remove low-quality features automatically, based on a missing-value threshold.  
Removing such features simplifies the dataset and reduces noise.  
This step is commonly done before feature engineering and modeling.  
Dropping mostly empty columns can improve training stability and performance, although the right threshold depends on the dataset.  
Next, I want to combine multiple missing-value strategies.
