- Python 3.8
- matplotlib
- numpy
- pandas
- plotly
- scikit-learn
- TensorFlow 2.0
- pyyaml
- scipy
- dvc
- graphviz, for creating a plot of the model architecture.
Stages:
- Restructure: Restructure raw data into dataframes.
- Featurize: Add features to data set.
- Split: Split data set into training and test data.
- Scale: Scale input data.
- Sequentialize: Split data into input/output sequences.
- Train: Train model.
- Evaluate: Evaluate model.
The Restructure stage restructures the raw data into dataframes.
The Featurize stage consists of these steps:
- Remove data that should not be a part of the model:
  - Time (the time of the workout does not matter).
  - Calories. Calories are currently not a part of the model, but might be interesting as target values later on.
- Optionally scale input features.
- Feature engineering. Examples:
  - Rolling range of breathing data.
  - Gradient of breathing data.
  - Slope angle of the breathing pattern.
- Optionally delete raw inputs that may be less important, since we prefer a simpler model. One example is the raw breathing data, if the engineered features work better.
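The feature engineering examples above can be sketched in plain Python. This is a minimal illustration, not the code used in the Featurize stage: the function names, the window size, and the sample values are hypothetical, and the definitions are assumptions (rolling range as max minus min over a trailing window, gradient as the difference between consecutive samples, slope angle as the arctangent of that difference with unit time steps).

```python
import math


def rolling_range(values, window):
    """Max minus min over a trailing window; None until the window is full."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(max(chunk) - min(chunk))
    return out


def gradient(values):
    """Difference between consecutive samples; the first entry is 0.0."""
    return [0.0 if i == 0 else values[i] - values[i - 1] for i in range(len(values))]


def slope_angle(values):
    """Angle in radians of the line between consecutive samples (unit time steps assumed)."""
    return [math.atan(g) for g in gradient(values)]


# Hypothetical breathing samples, for illustration only.
breathing = [1.0, 3.0, 2.0, 5.0, 4.0]
print(rolling_range(breathing, window=3))  # [None, None, 2.0, 3.0, 3.0]
print(gradient(breathing))                 # [0.0, 2.0, -1.0, 3.0, -1.0]
print(slope_angle(breathing))
```

The actual stage may well use pandas or numpy for the same calculations; the plain-Python version is only meant to make the definitions explicit.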
The Split stage splits the data set into a training set and a test set.
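As a rough sketch of such a split, the snippet below cuts a list of rows into a training part and a test part without shuffling, which keeps time-ordered data in order. The function name and the `train_fraction` parameter are hypothetical, and the stage itself may split differently (for example per file).

```python
def train_test_split_ordered(rows, train_fraction=0.75):
    """Split rows into (train, test) without shuffling, preserving order."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Hypothetical rows, for illustration only.
rows = list(range(8))
train, test = train_test_split_ordered(rows, train_fraction=0.75)
print(train)  # [0, 1, 2, 3, 4, 5]
print(test)   # [6, 7]
```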
In the Scale stage, the data is scaled.
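One common way to scale data is min-max scaling, sketched below. Which scaling method the stage actually uses is not stated here, so min-max is an assumption chosen for illustration, and the function names are hypothetical. The sketch fits the scaling bounds on the training data only and reuses them for the test data, so no information from the test set leaks into the scaling.

```python
def fit_min_max(train_values):
    """Return the (min, max) of the training values."""
    return min(train_values), max(train_values)


def apply_min_max(values, lo, hi):
    """Map values to [0, 1] relative to lo and hi; constant data maps to 0.0."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Hypothetical values, for illustration only.
train = [10.0, 20.0, 30.0]
test = [25.0, 40.0]
lo, hi = fit_min_max(train)
print(apply_min_max(train, lo, hi))  # [0.0, 0.5, 1.0]
print(apply_min_max(test, lo, hi))   # [0.75, 1.5]
```

Note that test values outside the training range end up outside [0, 1], as the last value shows.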
In the Sequentialize stage, the data is divided into input/output sequences based on a chosen history size.
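The idea of a history size can be sketched as a sliding window: each input sequence holds `history_size` consecutive rows, and it is paired with one target value. The function name is hypothetical, and pairing each window with the target at its last time step is an assumption; the stage may instead predict a later step.

```python
def split_sequences(inputs, targets, history_size):
    """Pair each window of history_size input rows with the target at the window's last step."""
    if history_size < 1:
        raise ValueError("history_size must be at least 1")
    X, y = [], []
    for end in range(history_size, len(inputs) + 1):
        X.append(inputs[end - history_size : end])
        y.append(targets[end - 1])
    return X, y


# Hypothetical inputs and targets, for illustration only.
inputs = [10, 20, 30, 40]
targets = [1, 2, 3, 4]
X, y = split_sequences(inputs, targets, history_size=3)
print(X)  # [[10, 20, 30], [20, 30, 40]]
print(y)  # [3, 4]
```

A data set with n rows yields n - history_size + 1 sequences, so a larger history size gives the model more context per sample but fewer samples.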
All stages are defined in the file dvc.yaml, and the parameters to be used are saved in params.yaml.
To run/reproduce an experiment with any given parameters specified in params.yaml, run:
dvc repro
If a model already exists and you want to test it on a test set, run:
python3 src/evaluate.py
This requires that the test data is already present in the correct folder. Because of this, it is usually better to use the command dvc repro when evaluating models.
To run experiments with another data set, replace the contents of assets/data/raw/ with the files you want to use.
The data set can be visualized by running:
python3 visualize.py