Quickstart

Justin Fu edited this page Feb 14, 2020 · 6 revisions

This page covers simple examples for getting started with Doodad. For more detailed documentation on the API, see Detailed Documentation.

Table of Contents

  1. Hello World
  2. Adding Dependencies
  3. Retrieving Output
  4. Script Arguments
  5. Remote Workflow

Hello World

We can quickly run shell commands using run_command from the launch_api:

import doodad

doodad.run_command(
    command='echo helloworld',
)

This command will launch a docker container and execute the command echo helloworld inside.

Next, we can try running a simple python program which replicates the above behavior. First, we can write the following hello world script:

print('helloworld')

Save this script and remember the filename. We can run this script with the run_python command:

import doodad

doodad.run_python(
    target='path/to/hello_world.py',
)

Likewise, this will start a docker container and run the python script inside.

Adding Dependencies

The hello world programs do not specify any data dependencies, so there is no mechanism for sending data to, or retrieving data from, the running script. We can specify these dependencies using "mount" objects. In the next example, let's include a text file and read it.

Save the following data as 'foo/secret.txt':

apple

Now, we need to tell doodad that the 'foo' folder is a dependency. We can do this by creating a MountLocal object:

mnt = doodad.MountLocal(
    local_dir='foo',
    mount_point='./mymount',
    output=False
)

The mount_point argument specifies where this folder will be available to the running script.

Now, let us finish writing the script.

import doodad

doodad.run_command(
    command='cat ./mymount/secret.txt',
    mounts=[mnt]
)

When run, this should now print out the contents of secret.txt!

Mount objects are used for both specifying outputs and code/data dependencies. See the documentation for available options.

Retrieving Output

Sometimes you will need to collect data or log files generated by your program. To retrieve outputs from the container, set the output=True flag on MountLocal as follows:

import os
import doodad

# exist_ok=True avoids a FileExistsError when the script is run more than once
os.makedirs('testing_dir', exist_ok=True)
mnt = doodad.MountLocal(
    local_dir='testing_dir',
    mount_point='/mymount',
    output=True
)

doodad.run_command(
    command='echo hello123 > /mymount/secret.txt',
    mounts=[mnt]
)

This script will write 'hello123' into the text file testing_dir/secret.txt.

When running remotely, you will need to use either MountGCP or MountS3 instead of MountLocal in order to sync to cloud storage services. Again, see the documentation for more details.

Script Arguments

Command-line arguments can be passed to the script through the cli_args argument of the launch_api.run_command or launch_api.run_python functions. cli_args should be formatted as a single string, for example: '--arg1 10 --arg2'. Basic hyperparameter sweeps can be built on top of command-line arguments.
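As a rough illustration of sweeping over command-line arguments, the sketch below builds one cli_args string per combination of values in a grid. It is plain Python and does not use any sweeping helper that doodad itself may provide; the function name sweep_cli_args and the lr/seed grid are hypothetical.

```python
import itertools


def sweep_cli_args(grid):
    """Yield one cli_args string per combination of the values in grid.

    grid maps an argument name to the list of values to try.
    """
    names = list(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        yield ' '.join('--{} {}'.format(n, v) for n, v in zip(names, values))


# Hypothetical hyperparameter grid: 2 learning rates x 3 seeds = 6 runs.
grid = {'lr': [0.01, 0.1], 'seed': [0, 1, 2]}
all_cli_args = list(sweep_cli_args(grid))

for cli_args in all_cli_args:
    # Each string has the '--name value' form expected for cli_args.
    print(cli_args)
```

Each generated string could then be passed as cli_args in a separate run_python or run_command call.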

Python's argparse module is recommended for retrieving arguments inside the script.
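A minimal sketch of the receiving side, assuming the example string '--arg1 10 --arg2' where arg1 is an integer and arg2 is a boolean flag (the types are assumptions made for illustration):

```python
import argparse


def parse_args(argv=None):
    """Parse the script's arguments; argv=None means read sys.argv."""
    parser = argparse.ArgumentParser()
    # Assumed types: --arg1 takes an integer, --arg2 is an on/off flag.
    parser.add_argument('--arg1', type=int, default=0)
    parser.add_argument('--arg2', action='store_true')
    return parser.parse_args(argv)


# Parse the example string explicitly so the sketch runs on its own.
args = parse_args('--arg1 10 --arg2'.split())
print(args.arg1, args.arg2)
```

Inside a real script, call parse_args() with no argument so that the values are read from the command line.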

Remote Workflow

In order to launch jobs remotely, we need to specify a launch mode. Our previous examples have been using the local run mode, mode.LocalMode(), by default.

Let's take our example from the Adding Dependencies section. To run a job remotely, we simply need to pass in the appropriate launch mode.

import doodad

local = doodad.LocalMode()
gcp_mode = doodad.GCPMode(<fill in arguments>)

mnt = doodad.MountLocal(
    local_dir='foo',
    mount_point='./mymount',
    output=False
)

# This will run locally
doodad.run_command(
    command='cat ./mymount/secret.txt',
    mounts=[mnt],
    mode=local,
)

# This will run remotely
doodad.run_command(
    command='cat ./mymount/secret.txt',
    mounts=[mnt],
    mode=gcp_mode
)

A good workflow is to first test your code locally, before launching jobs on remote services (and potentially incurring charges):

  1. Launch your code with mode.LocalMode()
  2. Launch your code with the appropriate remote service.
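One way to follow this workflow without editing the launcher between steps is to switch modes with a flag. The sketch below is only an illustration: the --remote flag and the helper names are hypothetical, and string stand-ins replace the real mode objects so that it runs without doodad installed.

```python
import argparse


def parse_launcher_args(argv=None):
    """Parse the launcher's own flags; local launch is the default."""
    parser = argparse.ArgumentParser(description='Launch a job locally or remotely.')
    parser.add_argument('--remote', action='store_true',
                        help='launch on the remote service instead of locally')
    return parser.parse_args(argv)


def select_mode(remote, make_local, make_remote):
    """Build only the mode that will actually be used."""
    return make_remote() if remote else make_local()


# Stand-in factories. In a real launcher these would construct
# doodad.LocalMode() and the remote mode with its own arguments.
mode = select_mode(
    parse_launcher_args([]).remote,
    make_local=lambda: 'local',
    make_remote=lambda: 'remote',
)
print(mode)
```

Passing factories rather than ready-made mode objects means the remote mode is only constructed when --remote is given.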