Imports and cleans real-world data sets for use in machine learning experiments

simonharris/pycleandata

pycleandata

Retrieve and clean data sets for use in machine learning experiments.

Usage

To process all configured data sets:

$ python3 cleandata.py

Or to specify a single data set using the key from data.yml:

$ python3 cleandata.py <dataset_key>
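
The two invocations above amount to "process everything" versus "process one entry looked up by key". The sketch below illustrates that selection step only. It is not taken from cleandata.py: the function names are hypothetical, and it assumes data.yml is a top-level mapping from data set key to settings, which this README does not confirm.

```python
def load_config(path="data.yml"):
    """Read the YAML config into a dict.

    Assumption: the file is a top-level mapping of key -> settings.
    """
    import yaml  # PyYAML; imported lazily so select_datasets works without it

    with open(path, encoding="utf-8") as fh:
        return yaml.safe_load(fh) or {}


def select_datasets(config, args):
    """Return the entries to process.

    With no arguments every configured data set is returned; with one
    argument only the entry stored under that key is returned.
    """
    if not args:
        return dict(config)
    key = args[0]
    if key not in config:
        raise KeyError(f"unknown data set key: {key!r}")
    return {key: config[key]}


def main(argv):
    """Hypothetical entry point: argv is sys.argv[1:]."""
    for key in select_datasets(load_config(), argv):
        print(f"would process {key}")
```

Keeping the selection logic separate from file loading means an unknown key fails early with a clear error, before any data is downloaded or cleaned.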

Requirements

  • Python 3
  • numpy
  • pandas
  • PyYAML

Future work

  • better documentation, especially this file
  • rename cd_data/ to output/ for consistency with pygendata
  • specify config file as command-line option, and place output in a subdirectory of output/ named after it
  • add a suitable license
  • more flexibility regarding normalisation (and remove the term standardisation)
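
For context on the last item, "normalisation" and "standardisation" are commonly used for two different column rescalings: min-max scaling to [0, 1] and z-score scaling to zero mean and unit variance. The sketch below shows both with numpy. The function names are hypothetical, and this README does not say which scaling the project currently applies.

```python
import numpy as np


def min_max_scale(values):
    """Rescale each column to the [0, 1] range."""
    values = np.asarray(values, dtype=float)
    lo = values.min(axis=0)
    span = values.max(axis=0) - lo
    # Constant columns have zero span; map them to 0 rather than divide by zero.
    span = np.where(span == 0, 1.0, span)
    return (values - lo) / span


def z_score_scale(values):
    """Centre each column on 0 with unit (population) standard deviation."""
    values = np.asarray(values, dtype=float)
    std = values.std(axis=0)
    # Constant columns have zero deviation; leave them centred at 0.
    std = np.where(std == 0, 1.0, std)
    return (values - values.mean(axis=0)) / std
```

Both functions take a 2-D array-like with one row per sample and one column per feature, and return a float array of the same shape.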
