ScalarStop is a Python framework for reproducible machine learning research.
It was written and open-sourced at Neocrym, where it is used to train thousands of models every week.
ScalarStop can help you:
- organize datasets and models with content-addressable names.
- save/load datasets and models to/from the filesystem.
- record hyperparameters and metrics to a relational database.
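To make the idea of a content-addressable name concrete, here is a minimal sketch in which the name is derived from a hash of the hyperparameters. Identical configurations always get the same name, and any change to a hyperparameter produces a new one. The helper `content_addressable_name` and its naming format are hypothetical illustrations, not ScalarStop's actual naming scheme.

```python
import hashlib
import json
from typing import Any, Mapping


def content_addressable_name(group_name: str, hyperparams: Mapping[str, Any]) -> str:
    """Return a deterministic name derived from a group name and its hyperparameters.

    Hypothetical illustration only; ScalarStop's real naming scheme may differ.
    """
    # sort_keys makes the serialization independent of dict insertion order.
    serialized = json.dumps(hyperparams, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(serialized.encode("utf-8")).hexdigest()
    # A 16-character prefix keeps names readable while making collisions unlikely.
    return f"{group_name}-{digest[:16]}"


name_a = content_addressable_name("MyDataBlob", {"rows": 10, "cols": 5})
name_b = content_addressable_name("MyDataBlob", {"cols": 5, "rows": 10})
name_c = content_addressable_name("MyDataBlob", {"rows": 20, "cols": 5})
print(name_a == name_b)  # True: same hyperparameters, different key order
print(name_a == name_c)  # False: one hyperparameter changed
```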
ScalarStop is a Python package that requires Python 3.8 or newer.
Currently, ScalarStop only supports tracking tf.data.Dataset datasets and tf.keras.Model models. As such, ScalarStop requires TensorFlow 2.8.0 or newer.
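One way to confirm that an environment meets these requirements is a short check like the following. It is a sketch that assumes TensorFlow was installed under the `tensorflow` or `tensorflow-gpu` distribution name; the helpers `parse_version` and `installed_version` are hypothetical, and a real project might prefer `packaging.version.Version` for full version parsing. Note that `importlib.metadata` is itself only available on Python 3.8 or newer.

```python
import re
import sys
from importlib import metadata
from typing import List, Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '2.8.0' or '2.10.0rc1' into a 3-tuple of ints such as (2, 10, 0)."""
    parts: List[int] = []
    for piece in version.split(".")[:3]:
        # Keep only leading digits, so pre-release suffixes like 'rc1' are ignored.
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    # Pad short versions such as '2.8' so tuple comparisons behave as expected.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


python_ok = sys.version_info >= (3, 8)
tf_version = installed_version("tensorflow") or installed_version("tensorflow-gpu")
tf_ok = tf_version is not None and parse_version(tf_version) >= (2, 8, 0)
print(f"Python 3.8 or newer: {python_ok}")
print(f"TensorFlow 2.8.0 or newer: {tf_ok} (found: {tf_version})")
```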
We encourage anybody who would like to add support for other machine learning frameworks to contribute to ScalarStop. :)
ScalarStop is available on PyPI.
If you are using TensorFlow on a CPU, you can install ScalarStop with the command:
python3 -m pip install scalarstop[tensorflow]
If you are using TensorFlow with GPUs, you can install ScalarStop with the command:
python3 -m pip install scalarstop[tensorflow-gpu]
If you intend to use ScalarStop with PostgreSQL, you should also install either psycopg2-binary (which works out of the box) or psycopg2 (which you compile from source).
Therefore, your installation command could look like one of the following:
python3 -m pip install scalarstop[tensorflow,psycopg2]
python3 -m pip install scalarstop[tensorflow,psycopg2-binary]
python3 -m pip install scalarstop[tensorflow-gpu,psycopg2]
python3 -m pip install scalarstop[tensorflow-gpu,psycopg2-binary]
If you would like to make changes to ScalarStop, you can clone the repository from GitHub.
git clone https://github.com/scalarstop/scalarstop.git
cd scalarstop
python3 -m pip install .
Read the ScalarStop Tutorial to learn the core concepts behind ScalarStop and how to structure your datasets and models.
Afterwards, you might want to dig deeper into the ScalarStop Documentation. In general, a typical ScalarStop workflow involves four steps:
1. Organize your datasets with scalarstop.datablob.
2. Describe your machine learning model architectures using scalarstop.model_template.
3. Load, train, and save machine learning models with scalarstop.model.
4. Save hyperparameters and training metrics to a SQLite or PostgreSQL database using scalarstop.train_store.
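Step 4 amounts to keeping a queryable record of which hyperparameters produced which metrics. The sketch below illustrates that idea with Python's built-in `sqlite3` module; the table layout and the helpers `record_model`, `record_epoch`, and `metric_history` are hypothetical and do not reflect the TrainStore API or its schema. Hyperparameters and metrics are stored as JSON text and decoded in Python, so the sketch does not depend on SQLite's JSON1 extension.

```python
import json
import sqlite3
from typing import Any, Dict, List

# Hypothetical schema for illustration; ScalarStop's TrainStore defines its own tables.
SCHEMA = """
CREATE TABLE IF NOT EXISTS models (
    name TEXT PRIMARY KEY,
    hyperparams TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS epochs (
    model_name TEXT NOT NULL REFERENCES models(name),
    epoch INTEGER NOT NULL,
    metrics TEXT NOT NULL,
    PRIMARY KEY (model_name, epoch)
);
"""


def record_model(conn: sqlite3.Connection, name: str, hyperparams: Dict[str, Any]) -> None:
    """Store a model's hyperparameters once; re-inserting the same name is a no-op."""
    conn.execute(
        "INSERT OR IGNORE INTO models (name, hyperparams) VALUES (?, ?)",
        (name, json.dumps(hyperparams, sort_keys=True)),
    )


def record_epoch(
    conn: sqlite3.Connection, model_name: str, epoch: int, metrics: Dict[str, float]
) -> None:
    """Store the metrics from one training epoch."""
    conn.execute(
        "INSERT INTO epochs (model_name, epoch, metrics) VALUES (?, ?, ?)",
        (model_name, epoch, json.dumps(metrics, sort_keys=True)),
    )


def metric_history(conn: sqlite3.Connection, model_name: str, metric: str) -> List[float]:
    """Return one metric's values for a model, ordered by epoch."""
    rows = conn.execute(
        "SELECT metrics FROM epochs WHERE model_name = ? ORDER BY epoch",
        (model_name,),
    ).fetchall()
    return [json.loads(row[0])[metric] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
with conn:  # commits the transaction if no exception is raised
    record_model(conn, "MyModel-abc123", {"learning_rate": 0.001, "layers": 2})
    record_epoch(conn, "MyModel-abc123", 1, {"loss": 0.9, "val_loss": 1.1})
    record_epoch(conn, "MyModel-abc123", 2, {"loss": 0.5, "val_loss": 0.7})
print(metric_history(conn, "MyModel-abc123", "val_loss"))  # [1.1, 0.7]
```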
We warmly welcome contributions to ScalarStop. Here are the technical details for getting started with adding code to ScalarStop.
First, clone this repository from GitHub. All development happens on the main branch.
git clone https://github.com/scalarstop/scalarstop.git
Then, run make install to install Python dependencies in a Poetry virtualenv.
You can run make help to see the other commands that are available.
Run make fmt to automatically format code.
Run make lint to run Pylint and MyPy to check for errors.
Documentation is important! Here is how to add to it.
You can generate a local copy of our Sphinx documentation, which is hosted at scalarstop.com, with make docs.
The generated documentation can be found at docs/_build/dirhtml. To view it, start an HTTP server in this directory, for example:
make docs
cd docs/_build/dirhtml
python3 -m http.server 5000
Then visit http://localhost:5000 in your browser to preview changes to the documentation.
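If you prefer to serve the documentation from Python code, for example from a helper script, the same result can be sketched with the standard library's `http.server` module. The function names `make_docs_server` and `serve_docs` are hypothetical. Unlike `python3 -m http.server 5000`, which listens on all network interfaces, this sketch binds only to 127.0.0.1.

```python
import functools
import http.server


def make_docs_server(directory: str, port: int = 5000) -> http.server.ThreadingHTTPServer:
    """Create an HTTP server that serves files from `directory` on 127.0.0.1."""
    # SimpleHTTPRequestHandler accepts a `directory` argument (Python 3.7+),
    # so there is no need to chdir into the docs folder first.
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


def serve_docs(directory: str, port: int = 5000) -> None:
    """Serve `directory` until interrupted with Ctrl+C."""
    server = make_docs_server(directory, port)
    print(f"Serving {directory} at http://localhost:{port}")
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        pass
    finally:
        server.server_close()
```

Calling `serve_docs("docs/_build/dirhtml")` from the repository root blocks until you press Ctrl+C.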
If you want Sphinx to automatically generate hyperlinks to the Sphinx documentation of other Python projects, configure the intersphinx settings in docs/conf.py. If you need to download an objects.inv file, make sure to update the make update-sphinx command in the Makefile.
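As an illustration, an intersphinx configuration in docs/conf.py generally takes the shape below: the extension is enabled, and `intersphinx_mapping` maps a short name to a tuple of the other project's documentation URL and the location of its objects.inv inventory (`None` means Sphinx fetches the inventory from that URL). The projects listed are examples, and the commented-out TensorFlow entry with a local inventory path is hypothetical; check the existing docs/conf.py before changing it.

```python
# docs/conf.py (excerpt): a sketch of an intersphinx setup.
# Merge these into the existing `extensions` list rather than replacing it.
extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.intersphinx",
]

# Each entry maps a short name to (base URL of the docs, inventory location).
# None tells Sphinx to download objects.inv from the base URL.
intersphinx_mapping = {
    "python": ("https://docs.python.org/3", None),
    "numpy": ("https://numpy.org/doc/stable", None),
    # For a project that does not publish objects.inv at its docs URL, point the
    # second element at a downloaded file instead (this path is hypothetical):
    # "tensorflow": ("https://www.tensorflow.org/api_docs/python", "_intersphinx/tensorflow.inv"),
}
```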
The main ScalarStop tutorial is in a Jupyter notebook. If you have made changes to ScalarStop, you should rerun the Jupyter notebook on your machine with your changes to make sure that it still runs without error.
Run make test to run all unit tests.
If you want to run a specific unit test, try running:
python3 -m poetry run python -m unittest -k {name of your test}
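`-k` selects tests by name: a pattern without a `*` is matched as a case-sensitive substring of the full test name (for example `module.ClassName.test_method`), and a pattern containing `*` is matched with shell-style wildcards. The sketch below reproduces that behavior programmatically with `unittest.TestLoader.testNamePatterns`; the test class and the `load_matching` helper are hypothetical examples, not part of ScalarStop's test suite.

```python
import unittest


class ExampleTests(unittest.TestCase):
    """Hypothetical tests used only to demonstrate name-based selection."""

    def test_sqlite_roundtrip(self) -> None:
        self.assertEqual(1 + 1, 2)

    def test_postgres_roundtrip(self) -> None:
        self.assertTrue(True)


def load_matching(pattern: str) -> unittest.TestSuite:
    """Load only the ExampleTests methods whose names match `pattern`, like `-k`."""
    loader = unittest.TestLoader()
    # Mirror the command line: a pattern without '*' becomes a '*pattern*' match.
    loader.testNamePatterns = [pattern if "*" in pattern else f"*{pattern}*"]
    return loader.loadTestsFromTestCase(ExampleTests)


suite = load_matching("sqlite")
print(suite.countTestCases())  # 1
```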
If you are running tests using a Python interpreter that does not have the SQLite3 JSON1 extension, then TrainStore unit tests involving SQLite3 will be skipped. This is likely to happen if you are using Python 3.8 on Windows. If you suspect that you are missing the SQLite3 JSON1 extension, the Django documentation has some suggestions for how to fix it.
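A quick way to check whether your interpreter's SQLite build supports JSON1 is to call one of its JSON functions and see whether SQLite reports the function as missing. This is a sketch using only the standard library; ScalarStop's test suite may detect the extension differently, and the helper name `has_sqlite_json1` is hypothetical.

```python
import sqlite3


def has_sqlite_json1() -> bool:
    """Return True if Python's sqlite3 module can call SQLite's JSON functions."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("SELECT json('{}')").fetchone()
        return True
    except sqlite3.OperationalError:
        # SQLite raises "no such function: json" when JSON1 is unavailable.
        return False
    finally:
        conn.close()


print(f"SQLite {sqlite3.sqlite_version}, JSON1 available: {has_sqlite_json1()}")
```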
By default, tests involving PostgreSQL are skipped. To enable them, run make test in a shell where the environment variable TRAIN_STORE_CONNECTION_STRING is set to a SQLAlchemy database connection URL, which looks something like "postgresql://scalarstop:changeme@localhost:5432/train_store". The connection URL should point to a running PostgreSQL server with an existing database and user.
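Before running the tests, you may want to confirm which database the variable points at. This sketch reads TRAIN_STORE_CONNECTION_STRING and summarizes it with the standard library's `urllib.parse.urlsplit`, leaving out the password. The helper `describe_connection_string` is hypothetical, and SQLAlchemy's own URL parsing (which also understands forms such as `postgresql+psycopg2://`) is more complete.

```python
import os
from urllib.parse import urlsplit

ENV_VAR = "TRAIN_STORE_CONNECTION_STRING"


def describe_connection_string(url: str) -> str:
    """Summarize a SQLAlchemy-style database URL without revealing the password."""
    parts = urlsplit(url)
    database = parts.path.lstrip("/")
    return f"{parts.scheme}://{parts.username}@{parts.hostname}:{parts.port}/{database}"


url = os.environ.get(ENV_VAR)
if url is None:
    print(f"{ENV_VAR} is not set, so PostgreSQL tests will be skipped.")
else:
    print(f"PostgreSQL tests will use {describe_connection_string(url)}")
```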
The docker-compose.yml file in the root of this repository can set up a PostgreSQL instance on your local machine. If you have Docker and Docker Compose installed, you can start the PostgreSQL database by running docker-compose up in the same directory as the docker-compose.yml file.
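The container may take a few seconds to start accepting connections after docker-compose up. The following sketch polls a TCP port until it opens. The helper `wait_for_port` is hypothetical, the host and port in the usage note come from the example connection URL above, and an open port does not by itself guarantee that PostgreSQL has finished initializing.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: nothing is listening yet.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 5432)` returns True once the database port is reachable, or False after 30 seconds.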
You can run make test-with-coverage to collect Python line and branch coverage information. Afterwards, run make coverage-html to generate an HTML report of unit test coverage. You can view the report in a web browser at the path htmlcov/index.html.
ScalarStop's documentation is built with Sphinx using @pradyunsg's Furo theme and is hosted by Read the Docs.