pycleandata

Retrieve and clean data sets for use in machine learning experiments. See also pygendata.

Usage

To process all configured data sets:

$ python3 cleandata.py

To process a single data set, pass its key from data.yml:

$ python3 cleandata.py <dataset_key>
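
The two invocations above differ only in whether a key is given. As a rough illustration of that behaviour, and not the actual code in cleandata.py, the selection could be sketched as below. It assumes data.yml is a mapping whose top-level keys are the data set keys; the function names load_config, select_datasets and main are hypothetical.

```python
def load_config(path="data.yml"):
    """Load the data set configuration. Assumes a top-level mapping of keys."""
    import yaml  # PyYAML; imported here so the selection logic runs without it

    with open(path, encoding="utf-8") as fh:
        return yaml.safe_load(fh) or {}


def select_datasets(config, key=None):
    """Return every configured data set, or only the one named by `key`."""
    if key is None:
        # No key given: process everything that is configured.
        return dict(config)
    if key not in config:
        known = ", ".join(sorted(config)) or "(none)"
        raise KeyError(f"unknown data set {key!r}; known keys: {known}")
    return {key: config[key]}


def main(argv):
    """Hypothetical entry point: argv[1], if present, is the data set key."""
    requested = argv[1] if len(argv) > 1 else None
    for name in select_datasets(load_config(), requested):
        print(f"would process {name}")
```

Calling main(sys.argv) from a script would mirror the two commands shown above; an unknown key fails early with a message listing the known keys.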

Requirements

  • Python 3
  • numpy
  • pandas
  • PyYAML

Future work

  • better documentation, especially this file
  • rename cd_data/ to output/ for consistency with pygendata
  • specify config file as command-line option
  • ...and place output in a subdirectory of output/ named after the config file
  • add a suitable license
  • more flexibility regarding normalisation (and remove the term standardisation)
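
The config-file items in the list above could be approached with argparse from the standard library. The following is a minimal sketch under stated assumptions, not existing behaviour: the --config option, the build_parser and output_dir_for names, and the choice to name the subdirectory after the config file's stem (data.yml gives output/data/) are all hypothetical.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build a parser with an optional data set key and a config file option."""
    parser = argparse.ArgumentParser(description="Retrieve and clean data sets.")
    parser.add_argument(
        "dataset_key",
        nargs="?",
        default=None,
        help="key of a single data set; omit to process all configured sets",
    )
    parser.add_argument(
        "-c",
        "--config",
        type=Path,
        default=Path("data.yml"),
        help="path to the YAML config file (default: data.yml)",
    )
    return parser


def output_dir_for(config_path, root=Path("output")):
    """Return the output subdirectory named after the config file's stem."""
    return root / Path(config_path).stem
```

With this sketch, an invocation such as `python3 cleandata.py --config experiments/small.yml iris` would parse to the key iris and write under output/small/. Keeping data.yml as the default leaves the current commands working unchanged.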