Labs can be configured in many different ways. We often want to specify which datasets should be used and where they come from, which Machine Learning framework or library our code should run on, or our lab may be a script that needs command line parameters to work. This and many more things can be configured in every lab's `ml.yaml` configuration file, and in this section we'll explore what configurations MachineLabs supports.
Every created lab comes with an `ml.yaml` configuration file in which all configuration is done. An `ml.yaml` file is by no means more special than other `.yaml` files, which means we can use normal YAML syntax to configure our labs.
A lab must have an `ml.yaml` file, otherwise it won't be executable.
A configuration setting usually comes with a name followed by one or more values. For example, if we want to configure the environment in which our lab is executed, we can use MachineLabs' `dockerImageId` property with a dedicated value:
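A minimal sketch of such a setting in `ml.yaml` (the value is one of the prebuilt image ids listed further down in this section):

```yaml
dockerImageId: keras_v2-0-x_python_2-1
```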
If a configuration property can have multiple values, we specify this as a list like this:
```yaml
inputs:
  - name: some-value
    url: https://some-url.com
```
In this case we're configuring a lab input, which can be a dataset that will be fetched from a specified `url`. Notice that `inputs` is the configuration property, while `url` is part of a single configuration item, which itself is part of a list of values.
Let's take a look at what configuration options are available in MachineLabs.
Every lab we execute runs in a certain environment. The environment describes things like what programming language and libraries will be available at execution time. MachineLabs uses docker containers to run labs in isolated and reproducible environments. It comes with a set of prebuilt images that we can use to execute our code. To specify a lab's environment, we use the `dockerImageId` configuration property.
The following docker images are supported (more will be added in the near future):
- `keras_v2-0-x_python_3-1` - Keras 2.0.x and Python 3.1
- `keras_v2-0-x_python_2-1` - Keras 2.0.x and Python 2.1
- `tensorflow_v1-4-x-gpu_python_3-1` - GPU-enabled TensorFlow 1.4.x, Keras 2.1.x and Python 3.1
- `tensorflow_v1-4-x-gpu_python_2-1` - GPU-enabled TensorFlow 1.4.x, Keras 2.1.x and Python 2.1
This means that if we want to run our lab with Keras version 2.0.x and Python version 3.1, our configuration would look like this:
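A minimal sketch of that configuration (assuming `dockerImageId` is a top-level key in `ml.yaml`):

```yaml
dockerImageId: keras_v2-0-x_python_3-1
```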
Configuring custom docker images is a feature that is planned for future versions.
## Lab inputs (datasets)
Obviously there's no Machine Learning without data. Labs need access to datasets and one way to get hold of them is using inputs. Inputs are basically metadata that describe what data needs to be downloaded before a lab is executed.
An input has a `url`, which is the endpoint the data should be downloaded from, and a `name`, which specifies under what name the data is stored on the file system in a special `inputs` directory, so it can later be accessed by the lab. MachineLabs has full internet access, so we can download from basically any reachable place.
All inputs are downloaded into a special `inputs` directory at the root level of where the lab's code lives, and can be accessed from within labs using common file system operations.
For example, the following configuration sets up an input that fetches the famous MNIST dataset and saves it as `mnist.npz` in the `inputs` directory (note that the `name` property is mandatory, even though we might not always need it).
```yaml
inputs:
  - name: mnist.npz
    url: https://s3.amazonaws.com/img-datasets/mnist.npz
```
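Inside the lab, the downloaded file can then be opened with ordinary file system operations. A minimal Python sketch (the `inputs` directory name comes from the docs above; `input_path` is a hypothetical helper, not something MachineLabs provides):

```python
import os

def input_path(name):
    """Resolve a configured input inside the lab's 'inputs' directory."""
    return os.path.join('inputs', name)

# e.g. read the MNIST dataset configured above:
# with open(input_path('mnist.npz'), 'rb') as f:
#     data = f.read()
```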
If there are multiple datasets to download, no problem. We simply add more inputs to the list, and all inputs will be downloaded concurrently:
```yaml
inputs:
  - name: mnist.npz
    url: https://s3.amazonaws.com/img-datasets/mnist.npz
  - name: reuters.npz
    url: https://s3.amazonaws.com/text-datasets/reuters.npz
  - name: imdb.npz
    url: https://s3.amazonaws.com/text-datasets/imdb.npz
```
Soon MachineLabs will support **mounting custom datasets** that can be uploaded and shared across labs, so data doesn't have to be downloaded every single time a lab is executed.
Every now and then we'd like to allow configurable parameters for our own scripts, or execute third-party scripts that expect parameters. Script parameters can be configured using the `parameters` property.

`parameters` is a list of `pass-as` properties that gives us all the freedom we need to configure different kinds of script parameters. They will be passed to our entry file (e.g. `main.py`) in the same order they are specified, which means positional arguments work as well!
```yaml
parameters:
  - pass-as: '--foo=bar'
  - pass-as: 'some-value'
```
In a Python environment, this would be equivalent to executing (assuming our script is called `main.py`):

```shell
$ python main.py --foo=bar some-value
```
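On the receiving side, the entry file sees these values as plain command line arguments. Here's a minimal sketch of a hypothetical `main.py` that splits them into flag-style and positional parameters (the parsing logic is purely illustrative; real labs might use `argparse` or similar):

```python
import sys

def parse_args(argv):
    """Split arguments into '--key=value' flags and positional values."""
    flags, positional = {}, []
    for arg in argv:
        if arg.startswith('--') and '=' in arg:
            key, value = arg[2:].split('=', 1)
            flags[key] = value
        else:
            positional.append(arg)
    return flags, positional

if __name__ == '__main__':
    # With the `parameters` example above, sys.argv[1:] would be
    # ['--foo=bar', 'some-value'].
    print(parse_args(sys.argv[1:]))
```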
A very good example of script parameters in action is MachineLabs' Neural Style Transfer Lab.
Obviously, the hardware on which we execute our experiments has a big impact on how much time is being spent on, for example, training a neural net. That's why we often want to make sure our code is executed on GPU-accelerated machines. MachineLabs supports CPU and GPU hardware.
We will open up more hardware configuration options in the future, but for now every lab is executed on CPU hardware by default. To run our labs on GPU-accelerated machines, all we have to do is set the `hardwareType` configuration to `gpu`, like this:
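A minimal sketch of that setting in `ml.yaml` (assuming `hardwareType` is a top-level key):

```yaml
hardwareType: gpu
```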
Once that is done, we need to make sure that we're configuring a GPU-enabled lab environment as well. For example:
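A minimal sketch combining the GPU hardware type with one of the GPU-enabled prebuilt images listed earlier:

```yaml
dockerImageId: tensorflow_v1-4-x-gpu_python_3-1
hardwareType: gpu
```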
GPU support isn't available to everyone yet. However, it can easily be enabled by becoming a Patreon backer of the MachineLabs project.
**GPU support is only enabled for Patreon backers**. Thank you for your support!
## A note on host maintenance
Since we're using Google Cloud Platform, GPU instances will terminate for host maintenance events and restart automatically. These maintenance events typically occur once per week.
This means that there is a slight chance that an execution may be unexpectedly terminated by the system every once in a while. We're working on ironing these things out.
For more information about maintenance events, read the official GPU Compute Engine documentation.
Here's a quick overview of all supported configuration properties:

| Property | Description |
|---|---|
| `dockerImageId` | Specifies the docker container environment for lab execution. |
| `inputs` | Specifies which data(sets) need to be downloaded before the lab executes. |
| `parameters` | Configures script parameters passed to the entry script (e.g. `main.py`). |
| `hardwareType` | Sets the hardware on which your lab is executed. This can be either `cpu` or `gpu`. |