Getting started with PyPads |
---|
|
PyPads is a tracking framework for your Python programs. It provides infrastructure for:
* Community driven mapping files |
* Logging injection by importlib extension |
* Timekeeping |
* Full access to the current state in logging functions |
* Prefabricated tracking functions and formats |
* Data and control flow manipulation with actuators |
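
The "logging injection by importlib extension" item can be illustrated with a minimal, standard-library-only sketch. This is not PyPads' implementation: the names `TracingFinder`, `TracingLoader`, `traced` and `CALLS` are hypothetical, and the stdlib module `colorsys` merely stands in for a tracked library. The idea is that a finder placed on `sys.meta_path` wraps selected functions the moment their module is imported.

```python
import functools
import importlib.abc
import importlib.util
import sys

CALLS = []  # hypothetical in-memory log of tracked calls


def traced(fn, name):
    """Wrap fn so that every call is recorded before it runs."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        CALLS.append(name)
        return fn(*args, **kwargs)
    return wrapper


class TracingLoader(importlib.abc.Loader):
    """Delegate to the real loader, then wrap the targeted functions."""

    def __init__(self, wrapped, targets):
        self._wrapped = wrapped
        self._targets = targets

    def create_module(self, spec):
        return self._wrapped.create_module(spec)

    def exec_module(self, module):
        self._wrapped.exec_module(module)
        for attr in self._targets:
            fn = getattr(module, attr, None)
            if callable(fn):
                setattr(module, attr, traced(fn, f"{module.__name__}.{attr}"))


class TracingFinder(importlib.abc.MetaPathFinder):
    """Intercept imports of mapped modules and swap in the tracing loader."""

    def __init__(self, mapping):
        self._mapping = mapping      # module name -> function names to wrap
        self._resolving = False      # guards against re-entering ourselves

    def find_spec(self, fullname, path, target=None):
        if self._resolving or fullname not in self._mapping:
            return None
        self._resolving = True
        try:
            spec = importlib.util.find_spec(fullname)
        finally:
            self._resolving = False
        if spec is None or spec.loader is None:
            return None
        spec.loader = TracingLoader(spec.loader, self._mapping[fullname])
        return spec


finder = TracingFinder({"colorsys": ["rgb_to_hsv"]})
sys.meta_path.insert(0, finder)
sys.modules.pop("colorsys", None)  # force a fresh import through the finder

import colorsys

hsv = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)  # recorded in CALLS, then executed
sys.meta_path.remove(finder)
```

The experiment code itself only contains the plain `import colorsys`; the tracking is attached from the outside, which is the property the list item refers to.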
|
The framework was developed for machine learning experiments and is based on mlflow. PyPads focuses on an unobtrusive but Pythonic manner of use. It aims to harmonize the results of a multitude of libraries in a structured way, while stepping out of the way if needed. Most dependencies of PyPads are optional and are only used to extend it with more sophisticated logging functions.
In its core app, PyPads allows for registering plugin |
Quick start |
Install PyPads assuming Python 3 is already installed: |
|
Usage |
==================== |
Activating PyPads for tracking in its default setting is as easy as adding two lines to your experiment. |
A simple example looks like the following. |
|
|
|
|
|
|
|
|
|
|
|
|
Results |
==================== |
By default results can be found in the |
Concepts |
PyPads builds on a set of concepts. Some of them must be followed for technical reasons, while others only carry semantic meaning.
Actuators |
========= |
Actuators are features of PyPads that manipulate experiments. Using an actuator may, or will, change the result of the experiment. Actuators can change the underlying machine learning code, the setup and more. An example is an actuator that enforces a random seed. Custom actuators can be added to an IActuators plugin, which exposes them to PyPads.
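
To make the random seed example concrete, here is a minimal sketch of what an actuator does conceptually. The `Actuators` class below is a hypothetical stand-in written for this illustration, not the PyPads plugin interface, and it only seeds Python's built-in `random` module.

```python
import random


class Actuators:
    """Hypothetical container for functions that change how an experiment runs."""

    def __init__(self):
        self.applied = []  # record of the actuators that touched the run

    def set_random_seed(self, seed):
        # Seeding changes every later draw, so the experiment result is affected.
        random.seed(seed)
        self.applied.append(("set_random_seed", seed))


actuators = Actuators()

actuators.set_random_seed(42)
first = [random.random() for _ in range(3)]

actuators.set_random_seed(42)
second = [random.random() for _ in range(3)]
```

Because the seed is enforced before each batch of draws, `first` and `second` are identical, and `applied` documents that the run was manipulated.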
|
|
|
|
|
|
|
|
|
|
To call an actuator you can use the app. |
|
|
|
API |
========= |
The PyPads API delivers the standard functionality of PyPads and also passes through some mlflow features. You can start and stop runs, log artifacts, metrics or parameters, set tags and write meta information about them. Additionally, the PyPads API introduces setup and teardown (also called pre and post run) functions, and lets you manually mark functions for tracking. A full documentation can be found
|
|
|
Validators
==========
Validators are used when the experimental state or code has to be checked for certain properties. They should normally log nothing but a validation report, which should be an optional tag or at most a text file. In general, validators should inform the user at runtime about errors and problems. The possibility to interrupt an execution when a validator fails is planned for the future. Some validators will be logging functions bound to library functions. An example of a validator bound to the use of pytorch is the determinism check for pytorch.
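
The validator idea can be sketched without any library. The example below is a generic stand-in, not the pytorch determinism check mentioned above: `check_determinism` is a hypothetical helper that calls a function repeatedly with the same input and returns a small report instead of raising, so the run is not interrupted.

```python
import itertools


def check_determinism(fn, *args, runs=2, **kwargs):
    """Call fn repeatedly with the same input and report whether outputs agree."""
    outputs = [fn(*args, **kwargs) for _ in range(runs)]
    deterministic = all(out == outputs[0] for out in outputs[1:])
    return {
        "validator": "determinism",
        "function": fn.__name__,
        "passed": deterministic,
    }


def stable(x):
    return x * 2


_counter = itertools.count()


def unstable(x):
    # Depends on hidden state, so repeated calls return different values.
    return x + next(_counter)


report_ok = check_determinism(stable, 3)
report_bad = check_determinism(unstable, 3)
```

The report is a plain dict, which matches the recommendation that a validator should produce at most a tag or a small text artifact.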
|
|
|
To call the api you can use the app. |
|
|
|
Setup / Teardown functions
==========================
Setup and teardown functions are called when a run starts or ends. They are mostly used to log meta information about the experiment, including data about git, hardware and the environment. A list of currently defined decorators can be found
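
As a rough illustration of the pre and post run idea, the following sketch runs a list of setup functions when a run starts and a list of teardown functions when it ends. Everything here (`tracked_run`, `log_environment`, `log_duration`, the `run` dict) is hypothetical and only uses the standard library; it is not how PyPads registers these functions.

```python
import platform
import time
from contextlib import contextmanager


def log_environment(run):
    # Setup function: record meta information about the environment.
    run["tags"]["python_version"] = platform.python_version()
    run["tags"]["system"] = platform.system()


def log_duration(run):
    # Teardown function: record how long the run took.
    run["tags"]["duration_s"] = time.perf_counter() - run["started"]


@contextmanager
def tracked_run(setup_fns=(), teardown_fns=()):
    run = {"tags": {}, "started": time.perf_counter()}
    for fn in setup_fns:
        fn(run)
    try:
        yield run
    finally:
        # Teardown runs even if the experiment raised an exception.
        for fn in teardown_fns:
            fn(run)


with tracked_run([log_environment], [log_duration]) as run:
    total = sum(range(1000))  # stands in for the actual experiment
```

Keeping setup and teardown as plain lists of callables makes it easy to configure them in one place, for example when the run is created.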
|
|
|
|
|
|
|
|
|
|
|
Configuring setup or teardown functions can be done via the app constructor or api. |
|
|
|
Mapping files
=============
Mapping files deliver hooks into libraries to trigger tracking functionality. They are written in yml and define a syntax to mark up functions, classes and modules.
Decorators
==========
Decorators can be used instead of a mapping file to denote hooks in code. Because most libraries should not be changed directly, decorators are currently used sparingly. Decorators defined in PyPads can be found
Logging functions
=================
Logging functions are the generic functions that perform tracking tasks and are bound to hooked functions of libraries. Everything that does not fit into another concept is simply called a logging function. The following function would track the input to the hooked function.
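
A minimal sketch of an input-tracking logging function, written in plain Python, may help here. It does not use the PyPads base classes; `log_input` and `TRACKED` are hypothetical names, and `fit` stands in for a hooked library function.

```python
import functools

TRACKED = []  # hypothetical store for the tracked inputs


def log_input(fn):
    """Record the positional and keyword arguments of every call to fn."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        TRACKED.append({"function": fn.__name__, "args": args, "kwargs": kwargs})
        return fn(*args, **kwargs)
    return wrapper


def fit(data, epochs=1):
    # Stand-in for a hooked library function.
    return len(data) * epochs


fit = log_input(fit)  # bind the logging function to the hooked function
result = fit([1, 2, 3], epochs=2)
```

The hooked function still returns its normal result; the logging function only observes the call.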
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Logging functions are configured by providing mappings to the constructor of the app. Mapping files provide hooks (generally prefixed with "pypads" in their naming), and logging functions are mapped to events. A hook can therefore trigger multiple events and thus multiple logging functions. To pass an event-to-function mapping, a simple dict can be used.
|
|
|
|
|
|
|
Additionally a hook to event mapping can be defined. |
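
How the two mappings interact can be sketched with plain dicts. The dispatcher `trigger`, the dict names `event_functions` and `hook_events`, and the two logging functions are assumptions made for this illustration; only the `{"on": [...]}` shape of the hook-to-event entries is taken from the example later in this document.

```python
LOG = []


def log_metric(hook, value):
    LOG.append(("metric", hook, value))


def log_time(hook, value):
    LOG.append(("time", hook, value))


# Event-to-function mapping: each event names the logging functions it runs.
event_functions = {"Metrics": [log_metric], "Timing": [log_time]}

# Hook-to-event mapping: each event lists the hooks it listens on.
hook_events = {
    "Metrics": {"on": ["pypads_metric"]},
    "Timing": {"on": ["pypads_metric", "pypads_fit"]},
}


def trigger(hook, value):
    """Run every logging function whose event listens on the given hook."""
    for event, spec in hook_events.items():
        if hook in spec["on"]:
            for fn in event_functions.get(event, []):
                fn(hook, value)


trigger("pypads_metric", 0.93)  # fires both Metrics and Timing
trigger("pypads_fit", None)     # fires only Timing
```

One hook firing two events, and therefore two logging functions, is the fan-out described in the paragraph above.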
|
|
|
|
|
|
|
Defining hooks can be done via api, mappings, mapping files or decorators. Decorators are a sensible approach for local custom code. |
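
The decorator approach for local custom code can be sketched as follows. The decorators `track` and `on`, and the registries `LISTENERS` and `SEEN`, are hypothetical and defined entirely in this snippet; they only illustrate how marking a function with a hook name lets registered callbacks react to its calls.

```python
import functools

LISTENERS = {}  # hook name -> list of callbacks
SEEN = []


def on(hook):
    """Register a callback for a hook."""
    def register(callback):
        LISTENERS.setdefault(hook, []).append(callback)
        return callback
    return register


def track(hook):
    """Mark a function so that calling it fires the given hook."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            for callback in LISTENERS.get(hook, []):
                callback(fn.__name__, result)
            return result
        return wrapper
    return decorator


@on("pypads_metric")
def remember(name, result):
    SEEN.append((name, result))


@track("pypads_metric")
def accuracy(correct, total):
    return correct / total


score = accuracy(9, 10)
```

The hook name lives next to the function it marks, which is convenient for your own code but would require editing third-party libraries, hence the preference for mapping files there.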
|
|
|
|
|
The same holds true for API-based tracking.
|
|
|
|
|
Mapping files or mappings are a more permanent, shareable and modular approach. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Check points
============
Check points are currently not implemented. They will introduce a structured way to denote cacheable states. By defining check points, we hope to be able to define marks from which an experiment can be rerun in the future.
Examples |
Sklearn DecisionTree example |
==================== |
The following shows how PyPads can be used to track the parameters, input and output of a sklearn experiment.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The hooks used for each event are defined in the mapping yml file, where each event lists the functions to listen to.
Mapping file example |
For the previous example, the sklearn mapping yml file would look like the following.
files/sklearn_example.yml
For example, "pypads_fit" is an event listener on any fit, fit_predict and fit_transform call made by any tracked class with those methods.
A hook can be defined in the mapping file via the "hooks" attribute. It is composed of the given name and the path defined by the keys in the yml file. Multiple hooks can use the same name and therefore trigger the same functions.
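
The effect of such a name-based hook can be sketched in plain Python. The dict `MAPPING` is a hypothetical stand-in for a parsed mapping file (the real yml syntax is not reproduced here), and `apply_mapping`, `Scaler` and `CALLS` are illustrative names only.

```python
import functools

CALLS = []

# Hypothetical stand-in for a parsed mapping file: hook name -> method names.
MAPPING = {"pypads_fit": ["fit", "fit_predict", "fit_transform"]}


def _with_hook(method, hook):
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        CALLS.append((hook, method.__name__))
        return method(self, *args, **kwargs)
    return wrapper


def apply_mapping(cls, mapping):
    """Wrap every mapped method that cls defines so calls report their hook."""
    for hook, method_names in mapping.items():
        for name in method_names:
            original = cls.__dict__.get(name)
            if callable(original):
                setattr(cls, name, _with_hook(original, hook))
    return cls


class Scaler:
    def fit(self, data):
        self.max_ = max(data)
        return self

    def transform(self, data):
        return [x / self.max_ for x in data]

    def fit_transform(self, data):
        return self.fit(data).transform(data)


apply_mapping(Scaler, MAPPING)
scaled = Scaler().fit_transform([1, 2, 4])
```

Both `fit_transform` and the nested `fit` call report the same hook name, while `transform` is left untouched because the mapping does not list it.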
Once the hooks are defined, they are linked to the events they should trigger. In the example below, the hook pypads_metric is linked to an event called Metrics. This is done by passing a dictionary as the parameter config to the PyPads class.
events = {
    "Metrics": {"on": ["pypads_metric"]}
}
PyPads has a set of built-in logging functions that are mapped by default to some pre-defined events. Check the default setting of PyPads for details. The user can also define custom logging functions for custom events; details on how to do that can be found in the section on logging functions.
Currently there are unfortunately not many external resources available for PyPads. Additional examples will be added in the next steps of the road map. You can find an IPython notebook and a code example in these repositories.
TODO Please add links to two repositories with example code (We can use the stuff for the data science lab)