Cleaning dataset #22

Closed
mawada-sweis opened this issue Feb 22, 2023 · 1 comment · Fixed by #21
@mawada-sweis
Owner

Clean the data by removing columns that are unnecessary or contain a large amount of missing data, and by handling noisy or inconsistent data.

@mawada-sweis mawada-sweis self-assigned this Feb 22, 2023
@mawada-sweis mawada-sweis added the enhancement New feature or request label Feb 22, 2023
This was linked to pull requests Feb 24, 2023
@mawada-sweis mawada-sweis removed a link to a pull request Feb 24, 2023
@mawada-sweis
Owner Author

  • ✅ Cleared direct missing data by removing any feature with more than 75% missing values and filling the remaining features with their mode.

  • ✅ Cleaned indirect missing data by removing any feature with more than 60% missing values and filling the remaining features according to feature type: encoded features were handled by encoding the label of the missing type, and numeric features were handled by replacing the label with a constant or the mean, depending on the type of missing value. ❗ The label features are not handled yet and will be handled during the transformation process.
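
The two steps above could be sketched roughly as follows. This is a minimal illustration, not the code from the linked pull request: it assumes the data lives in a pandas `DataFrame`, and the function names, the `missing_codes` mapping (which sentinel values stand for "missing" in each column), the `constants` mapping, and the `"missing"` category label are all hypothetical. In particular, treating "encoding the label of the missing type" as collapsing the sentinels into one explicit category is an assumption.

```python
import numpy as np
import pandas as pd


def clean_direct_missing(df, max_missing=0.75):
    """Drop columns whose NaN share exceeds max_missing, then mode-fill the rest."""
    missing_share = df.isna().mean()
    kept = df.loc[:, missing_share <= max_missing].copy()
    for col in kept.columns:
        modes = kept[col].mode(dropna=True)
        if not modes.empty:
            # mode() can return several values; take the first one.
            kept[col] = kept[col].fillna(modes.iloc[0])
    return kept


def clean_indirect_missing(df, missing_codes, max_missing=0.60, constants=None):
    """Handle sentinel values (e.g. -1, "unknown") that stand in for missing data.

    missing_codes: hypothetical mapping {column: [sentinel values]}.
    constants: hypothetical mapping {column: fill value}; columns not listed
               fall back to the mean of their non-sentinel values.
    """
    constants = constants or {}
    out = df.copy()
    for col, codes in missing_codes.items():
        is_missing = out[col].isin(codes)
        if is_missing.mean() > max_missing:
            out = out.drop(columns=col)
        elif pd.api.types.is_numeric_dtype(out[col]):
            fill = constants.get(col, out.loc[~is_missing, col].mean())
            out[col] = out[col].astype(float).mask(is_missing, fill)
        else:
            # Assumption: keep one explicit category for missing values.
            out[col] = out[col].mask(is_missing, "missing")
    return out


# Example usage on a small made-up frame.
raw = pd.DataFrame({
    "age": [20, -1, 40, 30],
    "city": ["a", "unknown", "b", "a"],
    "score": [np.nan, 1.0, 1.0, 2.0],
})
cleaned = clean_indirect_missing(
    clean_direct_missing(raw),
    missing_codes={"age": [-1], "city": ["unknown"]},
)
```

The thresholds default to the 75% and 60% cut-offs mentioned in the comment; label features are deliberately left untouched here, since they are deferred to the transformation step.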
