This DVC pipeline trains a CNN to classify images of Pokémon. It will predict whether a Pokémon is of a predetermined type (default: water).
Note: because the dataset is small, the evaluation set is the same data as the training and test sets, so the reported metrics are likely optimistic. Take the results of the model with a grain of salt.
This project walks through the transformation from a notebook to a DVC pipeline. Three stages of this process are available in separate branches:
- snapshot-jupyter: a prototype as you might build it in a Jupyter Notebook
- papermill-dvc: a DVC pipeline with a single stage that runs a parameterized notebook using Papermill
- dvc-pipeline: a pure DVC pipeline built from Python modules
- Create a new virtual environment with:
  virtualenv -p python3 .venv
- Activate the virtual environment with:
  source .venv/bin/activate
- Install the dependencies with:
  pip install -r requirements.txt
- Download the datasets from Kaggle into the data/external/ directory:
  wget https://www.kaggle.com/datasets/robdewit/pokemon-images -O data/external/pokemon-gen-1-8
  wget https://www.kaggle.com/datasets/rounakbanik/pokemon -O data/external/stats/pokemon-gen-1-8.csv
- Run the pipeline with:
  dvc repro
  or run it as an experiment with:
  dvc exp run
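Kaggle generally requires a signed-in account to download datasets, so fetching a dataset page URL with wget may save an HTML page rather than the data itself. A minimal alternative sketch, assuming the official Kaggle CLI is installed (pip install kaggle) and an API token is configured in ~/.kaggle/kaggle.json; the helper name download_pokemon_data is hypothetical:

```shell
# Hypothetical helper: fetch both Kaggle datasets into data/external/
# using the official Kaggle CLI (assumed installed and authenticated).
download_pokemon_data() {
  local dest="${1:-data/external}"
  mkdir -p "$dest/pokemon-gen-1-8" "$dest/stats"
  # Image dataset, unzipped into the directory named in the step above
  kaggle datasets download robdewit/pokemon-images -p "$dest/pokemon-gen-1-8" --unzip || return 1
  # Stats dataset (CSV), unzipped into the stats directory
  kaggle datasets download rounakbanik/pokemon -p "$dest/stats" --unzip || return 1
}
```

Paste or source it into your shell and run download_pokemon_data. The extracted file names depend on the archives' contents, so you may still need to move or rename files to match data/external/pokemon-gen-1-8 and data/external/stats/pokemon-gen-1-8.csv.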
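dvc exp run accepts -S / --set-param to override a parameter for a single experiment, which is a convenient way to train for a type other than the default (water). A minimal sketch, assuming the target type is stored in params.yaml under a key named train.pokemon_type; both that key and the helper name run_type_experiment are hypothetical, so check the repository's params.yaml for the real parameter name:

```shell
# Hypothetical helper: run one DVC experiment for a given Pokémon type.
# Assumes params.yaml has a key train.pokemon_type (hypothetical name).
run_type_experiment() {
  local pokemon_type="$1"
  if [ -z "$pokemon_type" ]; then
    echo "usage: run_type_experiment <type>" >&2
    return 2
  fi
  # Override the parameter for this experiment only
  dvc exp run --set-param "train.pokemon_type=$pokemon_type"
}
```

For example, run run_type_experiment Fire and then dvc exp show to compare its metrics with the default run. The exact spelling of type names depends on the stats dataset.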
The requirements specify tensorflow-macos and tensorflow-metal, which are the appropriate packages for a Mac with Apple silicon (M1 or later). On any other system, replace these two packages with a single tensorflow requirement.
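The swap can be scripted. A minimal sketch that rewrites requirements.txt on anything other than an Apple-silicon Mac (uname -s reports Darwin and uname -m reports arm64 there), renaming tensorflow-macos to tensorflow and dropping tensorflow-metal. The helper name adapt_tf_requirements is hypothetical, and any version pin on the tensorflow-macos line is kept as-is, so check that it also exists as a tensorflow release:

```shell
# Hypothetical helper: adapt requirements.txt for systems other than Apple-silicon Macs.
adapt_tf_requirements() {
  local req="${1:-requirements.txt}"
  # Apple-silicon Macs report Darwin/arm64; leave the file untouched there
  if [ "$(uname -s)" = "Darwin" ] && [ "$(uname -m)" = "arm64" ]; then
    return 0
  fi
  local tmp
  tmp="$(mktemp)"
  # Drop tensorflow-metal and rename tensorflow-macos to tensorflow (keeping any version pin)
  grep -v '^tensorflow-metal' "$req" | sed 's/^tensorflow-macos/tensorflow/' > "$tmp"
  # Write back in place so the original file's permissions are preserved
  cat "$tmp" > "$req"
  rm -f "$tmp"
}
```

Run adapt_tf_requirements before pip install -r requirements.txt; on an Apple-silicon Mac it does nothing.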