# Introduction

In this tutorial we'll take a working [JobConfig](https://github.com/openlema/lema/tree/main/configs/lema/jobs) and deploy it remotely on a cluster of your choice.

This guide dovetails nicely with our [Finetuning Tutorial](https://github.com/openlema/lema/blob/main/notebooks/LeMa%20-%20Finetuning%20Tutorial.ipynb) where you create your own TrainingConfig and run it locally. Give it a try if you haven't already!

We'll cover the following topics:
1. Prerequisites
1. Choosing a Cloud
1. Preparing Your JobConfig
1. Launching Your Job
1. \[Advanced\] Deploying a Training Config

## Prerequisites


### LeMa Installation
First, let's install LeMa. Detailed instructions are in the [README](https://github.com/openlema/lema/blob/main/README.md), but from the root of your local clone of the repository it should be as simple as:

```bash
pip install -e ".[dev,train]"
```


### Creating our working directory
For our experiments, we'll use the following folder to save our configs.

In [2]:
from pathlib import Path

tutorial_dir = "deploy_training_tutorial"

Path(tutorial_dir).mkdir(parents=True, exist_ok=True)

## Choosing a Cloud

We'll use the LeMa Launcher to run remote training. The launcher needs to know which cloud to run on, so let's start by listing the available clouds:

In [8]:
import oumi.launcher as launcher

# Print all available clouds
print(launcher.which_clouds())

['local', 'polaris', 'runpod', 'gcp', 'lambda']


#### Local Cloud
If you don't have any clouds set up yet, feel free to use the `local` cloud. It executes your job on your current machine as if it were a remote cluster, and it ignores hardware requirements.
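
If you want your code to fall back to `local` automatically when a preferred cloud is unavailable, a small helper can do that. The sketch below is only an illustration: `choose_cloud` is a hypothetical helper (not part of the launcher), and the list is copied from the `which_clouds()` output shown above rather than fetched live.

```python
def choose_cloud(preferred, available, fallback="local"):
    """Return `preferred` if it is available, otherwise `fallback`."""
    if preferred in available:
        return preferred
    if fallback not in available:
        raise ValueError(
            f"Neither {preferred!r} nor {fallback!r} is available: {available}"
        )
    return fallback


# Copied from the `launcher.which_clouds()` output earlier in this tutorial.
available_clouds = ["local", "polaris", "runpod", "gcp", "lambda"]

print(choose_cloud("gcp", available_clouds))  # prints: gcp
print(choose_cloud("aws", available_clouds))  # "aws" is not in the list; prints: local
```

In a live session you could pass `launcher.which_clouds()` as `available` instead of the hard-coded list.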

#### Other Providers
Note that to use a cloud you must already have an account registered with that cloud provider.

For example, GCP, RunPod, and Lambda require accounts with billing enabled. Polaris requires an account set up with [ALCF](https://www.alcf.anl.gov/polaris).

Once you've picked a cloud, move on to the next step.

## Preparing Your JobConfig

Let's get started by creating your JobConfig. We'll create a config specifically for this tutorial, but there are many other pre-made configs readily available in our [jobs directory](https://github.com/openlema/lema/tree/main/configs/lema/jobs).

In the config below, feel free to change `cloud: local` to the cloud you chose in the previous step.



In [9]:
%%writefile $tutorial_dir/job.yaml

name: job-tutorial
resources:
  cloud: local
  # `accelerators` is ignored for the local cloud.
  accelerators: A100

# Upload working directory to remote.
# If on the local cloud, we CD into the working directory before running the job.
working_dir: .

envs:
  TEST_ENV_VARIABLE: '"Hello, World!"'

# `setup` will always be executed before `run`.
# No setup is required for this job.
#setup: |
#  echo "Running setup..."

run: |
  set -e  # Exit if any command fails.

  echo "$TEST_ENV_VARIABLE"

Overwriting deploy_training_tutorial/job.yaml
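
`%%writefile` is an IPython cell magic, so it only works inside a notebook. If you are following along in a plain Python script, one way to produce the same file is sketched below. It uses only the standard library and reproduces the config above without the comments; the variable names are just for this example.

```python
import textwrap
from pathlib import Path

tutorial_dir = Path("deploy_training_tutorial")
tutorial_dir.mkdir(parents=True, exist_ok=True)

# Same fields as the job.yaml written by the notebook cell above.
job_yaml = textwrap.dedent(
    """\
    name: job-tutorial
    resources:
      cloud: local
      accelerators: A100

    working_dir: .

    envs:
      TEST_ENV_VARIABLE: '"Hello, World!"'

    run: |
      set -e  # Exit if any command fails.

      echo "$TEST_ENV_VARIABLE"
    """
)

job_path = tutorial_dir / "job.yaml"
job_path.write_text(job_yaml)
print(f"Wrote {job_path}")
```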


## Launching Your Job

First let's load your JobConfig:

In [None]:
# Read our JobConfig from the YAML file.
job = launcher.JobConfig.from_yaml(str(Path(tutorial_dir) / "job.yaml"))

At any point you can easily change the cloud where your job will run by modifying the job's `resources.cloud` parameter:

In [None]:
# Manually set the cloud to use.
job.resources.cloud = "local"

Once you have a job config, kicking off your job is simple:

In [10]:
# You can optionally specify a cluster name here. If not specified, a random name will
# be generated. This is also useful for launching multiple jobs on the same cluster.
cluster_name = None

# Launch the job!
cluster, job_status = launcher.up(job, cluster_name)
print(f"Job status: {job_status}")



Job status: JobStatus(name='job-tutorial', id='0', status='QUEUED', cluster='local', metadata='')


Don't worry if `launcher.up` raises an error: you may need to configure permissions before you can run a job on your chosen cloud. The error message should include the command to run to authenticate (for GCP this is often `gcloud auth application-default login`).

We can quickly check on the status of our job using the `cluster` returned in the previous command:

In [11]:
print(cluster.get_job(job_status.id))

JobStatus(name='job-tutorial', id='0', status='COMPLETED', cluster='local', metadata='Job finished at 2024-08-09T14:17:42.537384')
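
A job usually moves from `QUEUED` to a finished state over time, so you may want to poll rather than check once. The following is a minimal sketch of such a loop. It is written against a generic `get_status` callable so it runs without a cluster; `wait_for_job` is a hypothetical helper, and only `QUEUED` and `COMPLETED` are status names confirmed by the output above (the others are assumptions).

```python
import time

# Only QUEUED and COMPLETED appear in this tutorial's output; FAILED and
# CANCELLED are assumed names, included for illustration.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED"}


def wait_for_job(get_status, poll_seconds=5.0, timeout_seconds=600.0):
    """Call `get_status()` until it returns a terminal status or time runs out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still {status!r} after {timeout_seconds} seconds")
        time.sleep(poll_seconds)


# Stand-in status source so the sketch runs without a real cluster.
fake_statuses = iter(["QUEUED", "QUEUED", "COMPLETED"])
final = wait_for_job(lambda: next(fake_statuses), poll_seconds=0.01)
print(final)  # prints: COMPLETED
```

With a real cluster, `get_status` could be something like `lambda: cluster.get_job(job_status.id).status`, assuming the returned `JobStatus` exposes the `status` field shown in its printed form.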


Now that we're done with the cluster, let's tear it down so that non-local clouds stop billing us.

In [12]:
cluster.down()
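
Because a forgotten cluster keeps billing, it can help to guarantee that `down()` runs even when something fails in between. One possible pattern is a small context manager, sketched below. `torn_down_afterwards` and `FakeCluster` are hypothetical names for this example; the only thing assumed about a real cluster object is the `down()` method used above.

```python
from contextlib import contextmanager


@contextmanager
def torn_down_afterwards(cluster):
    """Yield `cluster`, then call `cluster.down()` even if the body raises."""
    try:
        yield cluster
    finally:
        cluster.down()


class FakeCluster:
    """Hypothetical stand-in exposing only the `down()` method."""

    def __init__(self):
        self.is_down = False

    def down(self):
        self.is_down = True


fake = FakeCluster()
try:
    with torn_down_afterwards(fake):
        raise RuntimeError("simulated failure while the cluster is up")
except RuntimeError:
    pass

print(fake.is_down)  # prints: True
```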

## \[Advanced\] Deploying a Training Config

In our [Finetuning Tutorial](https://github.com/openlema/lema/blob/main/notebooks/LeMa%20-%20Finetuning%20Tutorial.ipynb), we created and saved a TrainingConfig. We then invoked training by running:
```shell
oumi-train -c "$tutorial_dir/train.yaml"
```

You can also run that command as a job! Simply set the `run` field of your JobConfig to the desired command:


In [None]:
path_to_your_train_config = Path(tutorial_dir) / "train.yaml"  # Make sure this exists!

# Set the `run` command to run your training script.
job.run = f'oumi-train -c "{path_to_your_train_config}"'
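
If the config path might be missing or contain spaces, you can check and quote it before building the command. The sketch below uses a hypothetical helper, `build_train_command`, and a placeholder file in a temporary directory so that it runs on its own; only the `oumi-train -c` invocation comes from this tutorial.

```python
import shlex
import tempfile
from pathlib import Path


def build_train_command(config_path):
    """Return the training command, failing early if the config is missing."""
    config_path = Path(config_path)
    if not config_path.is_file():
        raise FileNotFoundError(f"Training config not found: {config_path}")
    # shlex.quote keeps the path intact in the shell even if it contains spaces.
    return f"oumi-train -c {shlex.quote(str(config_path))}"


# Demonstrate with a placeholder config so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    placeholder = Path(tmp) / "train.yaml"
    placeholder.write_text("# placeholder config\n")
    command = build_train_command(placeholder)

print(command)
```

You could then assign the result to `job.run`. Note that this check only confirms the file exists locally; on a remote cloud the path also needs to resolve inside the uploaded working directory.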

And now your job will run your training config when executed!

For a more in-depth overview of the fields in JobConfig, please see our [Running Jobs Remotely tutorial](https://github.com/openlema/lema/blob/main/notebooks/LeMa%20-%20Running%20Jobs%20Remotely.ipynb).