Cleaning data using decision tree and k-nn techniques

isadays/Co2emissions

Co2emissions

The dataset 'visualizing_global_co2_data.csv' contains a high percentage of missing values in many of its columns.

Column                         Percentage Missing (%)
country                         0.000000
year                            0.000000
iso_code                       16.712123
population                     20.929681
gdp                            71.216254
...                            ...
temperature_change_from_n2o    25.649235
total_ghg                      87.847346
total_ghg_excluding_lucf       87.847346
trade_co2                      91.582671
trade_co2_share                91.584648
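
A summary like the one above can be produced with a few lines of pandas. The following is a minimal sketch, not necessarily how the table was generated: the helper name `missing_percentage` is hypothetical, and the toy frame stands in for the real CSV with made-up values.

```python
import numpy as np
import pandas as pd


def missing_percentage(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its share of missing values, in percent."""
    pct = df.isna().mean() * 100  # mean of a boolean mask = fraction missing
    return pct.rename("Percentage Missing").rename_axis("Column").reset_index()


# Toy frame standing in for the real dataset (values are illustrative only).
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "year": [2000, 2001, 2000, 2001],
    "gdp": [1.0, np.nan, np.nan, np.nan],
})

report = missing_percentage(toy)
print(report)
```

For the real data, the same helper could be applied to `pd.read_csv('visualizing_global_co2_data.csv')`, assuming the file is in the working directory.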

We then tackle the problem with the following approach, which groups columns by how much data they are missing:

  • Columns with Low Missing Values (e.g., less than 20%): For columns with relatively few missing values, we might consider simple imputation (mean, median, or mode) or more sophisticated methods such as KNN imputation or interpolation.

  • Columns with Moderate Missing Values (e.g., between 20% and 50%): For columns with a moderate percentage of missing values, we could use more advanced imputation methods like KNN imputation, iterative imputation (e.g., MICE), or consider building predictive models to estimate missing values based on other features.

  • Columns with High Missing Values (e.g., over 50%): Columns with a high percentage of missing values require careful consideration. Depending on their importance to the analysis, we might decide to drop them if they are not critical.
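
The three tiers above can be sketched as a single function. This is one possible reading of the approach, under stated assumptions: the function name `clean_by_missingness`, the choice of median for the low tier and scikit-learn's `KNNImputer` for the moderate tier, the restriction to numeric columns, and the toy column names are all illustrative and not taken from the repository's code.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer


def clean_by_missingness(df, low=20.0, high=50.0, n_neighbors=5):
    """Tiered cleaning of numeric columns (thresholds in percent).

    below `low`        -> median imputation
    `low` to `high`    -> KNN imputation
    above `high`       -> column is dropped (assumed non-critical)
    """
    out = df.copy()
    numeric = out.select_dtypes(include="number").columns
    pct = out[numeric].isna().mean() * 100

    # High missingness: drop the column.
    out = out.drop(columns=pct[pct > high].index)

    # Low missingness: fill with the column median.
    for col in pct[(pct > 0) & (pct < low)].index:
        out[col] = out[col].fillna(out[col].median())

    # Moderate missingness: KNN imputation, using every remaining
    # numeric column as a feature for the neighbour search.
    mid_cols = pct[(pct >= low) & (pct <= high)].index
    if len(mid_cols) > 0:
        features = out.select_dtypes(include="number").columns
        values = KNNImputer(n_neighbors=n_neighbors).fit_transform(out[features])
        imputed = pd.DataFrame(values, columns=features, index=out.index)
        for col in mid_cols:
            out[col] = imputed[col]
    return out


# Toy data with one column per tier (names and values are made up).
toy = pd.DataFrame({
    "year": [2000, 2001, 2002, 2003, 2004, 2005],
    "low_col": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0],                # ~17% missing
    "mid_col": [10.0, np.nan, 30.0, np.nan, 50.0, 60.0],         # ~33% missing
    "high_col": [np.nan, np.nan, np.nan, np.nan, 0.5, np.nan],   # ~83% missing
})

cleaned = clean_by_missingness(toy, n_neighbors=2)
print(cleaned)
```

KNN imputation is distance-based, so on real data it is usually worth scaling the features first; the sketch skips that step to stay short. Columns in the high tier that turn out to be critical would need separate handling rather than being dropped.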
