# Development Ray submission

Generally, `lm-buddy` is installed as a pip requirement in the runtime environment of the Ray job.
During development, however, it can be helpful to execute a job from a local branch 
that has not been published to PyPI.

This example notebook shows how to bypass the pip requirements section of the Ray runtime environment
and instead upload a local copy of the `lm_buddy` Python module directly to Ray.

## File-based submission

This section demonstrates the basic workflow for submitting an LM Buddy job to Ray
from a configuration stored as a local file.

The job configuration is stored as a YAML file in the local `configs` directory,
and that directory is specified as the working directory of the Ray runtime environment upon submission.

In [None]:
# Required imports
import os
from pathlib import Path

from ray.job_submission import JobSubmissionClient

In [None]:
# Create a submission client bound to a Ray cluster
# If using a remote cluster, replace 127.0.0.1 with the head node's IP address.
client = JobSubmissionClient("http://127.0.0.1:8265")

In [None]:
# Determine local module path for the LM Buddy repo
# In theory this workflow is possible without having the LM Buddy package installed locally,
# but this is a convenient means to access the local module path
import lm_buddy

lm_buddy_module = Path(lm_buddy.__file__).parent
root_dir = Path(lm_buddy.__file__).parents[2]
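To make the `parent` and `parents[2]` lookups above concrete, here is a small sketch using a hypothetical file location. The path below is an assumption for illustration only (an editable install of a repository with a `src/` layout); your actual `lm_buddy.__file__` may differ, in which case the index passed to `parents` would need adjusting.

```python
from pathlib import PurePosixPath

# Hypothetical value of lm_buddy.__file__ (an assumption, not taken from a real install)
module_file = PurePosixPath("/home/user/lm-buddy/src/lm_buddy/__init__.py")

# The directory containing the package's __init__.py, i.e. the module to upload
module_dir = module_file.parent

# Two more levels up: past the package directory and past src/ to the repository root
repo_root = module_file.parents[2]

print(module_dir)
print(repo_root)
```

With this assumed layout, `repo_root / "examples" / "configs" / "finetuning"` would point at the example config directory used as the working directory below.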

In [None]:
# Construct the runtime environment for your job submission
# py_modules contains the path to the local LM Buddy module directory
# pip contains an export of the dependencies for the LM Buddy package (see CONTRIBUTING.md for how to generate)

runtime_env = {
    "working_dir": f"{root_dir}/examples/configs/finetuning",
    "env_vars": {"WANDB_API_KEY": os.environ["WANDB_API_KEY"]},  # If running a job that uses W&B
    "py_modules": [str(lm_buddy_module)],
    "pip": "requirements.txt",  # See CONTRIBUTING.md for how to generate this
}
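Note that `os.environ["WANDB_API_KEY"]` raises a `KeyError` when the variable is not set locally, which can be inconvenient for jobs that do not use W&B. One possible workaround is sketched below. The helper name `collect_env_vars` is hypothetical and is not part of LM Buddy or Ray; it simply forwards only those variables that are actually defined.

```python
import os


def collect_env_vars(names, environ=None):
    """Return a dict of the requested variables that are set, skipping unset ones.

    `collect_env_vars` is a hypothetical helper for illustration only.
    `environ` defaults to the current process environment.
    """
    environ = os.environ if environ is None else environ
    return {name: environ[name] for name in names if name in environ}


# Only variables present in the local environment are forwarded to the Ray job
env_vars = collect_env_vars(["WANDB_API_KEY"])
```

The result could then be passed as the `"env_vars"` entry of `runtime_env` in place of the hard-coded dictionary.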

In [None]:
# Submit the job to the Ray cluster
# Note: LM Buddy is invoked by 'python -m lm_buddy run ...' since the CLI is not installed in the environment
submission_id = client.submit_job(
    entrypoint="python -m lm_buddy run finetuning --config finetuning_config.yaml",
    runtime_env=runtime_env,
)

In [None]:
# submit_job returns the submission ID of the job as a string
# Jobs can be inspected and terminated via client methods using this ID
client.stop_job(submission_id)
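Besides stopping a job, it is often useful to wait until it reaches a final state. The following is a minimal sketch, not part of LM Buddy. The helper name `wait_for_job` is hypothetical, and it assumes that the value returned by `client.get_job_status` compares equal to its string name (for example `"SUCCEEDED"`), which should hold for Ray's string-valued `JobStatus` enum.

```python
import time

# Status names treated as final (assumed to match Ray's JobStatus values)
TERMINAL_STATES = ("SUCCEEDED", "FAILED", "STOPPED")


def wait_for_job(client, submission_id, poll_seconds=5.0, timeout_seconds=600.0):
    """Poll the job status until it is terminal and return it as a plain string.

    `wait_for_job` is a hypothetical helper; `client` only needs a
    `get_job_status(submission_id)` method.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = client.get_job_status(submission_id)
        if status in TERMINAL_STATES:
            # Enum members expose `.value`; plain strings are returned unchanged
            return str(getattr(status, "value", status))
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job {submission_id} did not finish in time")
        time.sleep(poll_seconds)
```

For example, `wait_for_job(client, submission_id)` could be called after `submit_job` or `stop_job`, followed by `client.get_job_logs(submission_id)` to inspect the output.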

## Iterative submission with temporary config files

It is also possible to submit LM Buddy jobs using a fully Jupyter-driven workflow without external file dependencies.
In this case, the job configuration is instantiated in your Python script and written to a temporary directory for submission.
The Ray working directory is the temporary directory that contains this YAML file.

This approach is convenient if you want to run sweeps over parameter ranges or need to modify your config frequently, and you use a Python script or Jupyter notebook as the local "driver" for the workflow.


In [None]:
# Required imports
from pathlib import Path
from ray.job_submission import JobSubmissionClient

In [None]:
# Create a submission client bound to a Ray cluster
# If using a remote cluster, replace 127.0.0.1 with the head node's IP address.
client = JobSubmissionClient("http://127.0.0.1:8265")

In [None]:
# Determine local module path for the LM Buddy repo
# In theory this workflow is possible without having the LM Buddy package installed locally,
# but this is a convenient means to access the local module path
import lm_buddy

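lm_buddy_module = Path(lm_buddy.__file__).parent
# Repository root, used below to locate the base config file
root_dir = Path(lm_buddy.__file__).parents[2]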

In [None]:
import os

from lm_buddy.jobs.configs import FinetuningJobConfig

# Parameters for a programmatic sweep
learning_rates = [1e-5, 1e-4, 1e-3, 1e-2]


# Load a "base" config from file with some suitable defaults
base_config = FinetuningJobConfig.from_yaml_file(
    f"{root_dir}/examples/configs/finetuning/finetuning_config.yaml"
)

for lr in learning_rates:
    # Copy the base config and set the learning rate for this iteration
    job_config = base_config.model_copy(deep=True)
    job_config.trainer.learning_rate = lr

    # `config_path` is the fully qualified path to the config file on your local filesystem
    with job_config.to_tempfile(name="config.yaml") as config_path:
        # `config_path.parent` is the working directory
        runtime_env = {
            "working_dir": str(config_path.parent),
            "env_vars": {"WANDB_API_KEY": os.environ["WANDB_API_KEY"]},
            "py_modules": [str(lm_buddy_module)],
            "pip": "requirements.txt",  # See CONTRIBUTING.md for how to generate this
        }

        # `config_path.name` is the file name within the working directory, i.e., "config.yaml"
        client.submit_job(
            entrypoint=f"python -m lm_buddy run finetuning --config {config_path.name}",
            runtime_env=runtime_env,
        )
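The loop above discards the submission ID returned by `client.submit_job`, so the jobs cannot easily be monitored or stopped afterwards. A minimal sketch of one way to keep track of them follows. The helper name `submit_and_track` and the `registry` dictionary are hypothetical and only for illustration.

```python
def submit_and_track(client, label, entrypoint, runtime_env, registry):
    """Submit a job and record its submission ID under `label` in `registry`.

    `submit_and_track` is a hypothetical helper; `client` only needs a
    `submit_job(entrypoint=..., runtime_env=...)` method that returns an ID.
    """
    submission_id = client.submit_job(entrypoint=entrypoint, runtime_env=runtime_env)
    registry[label] = submission_id
    return submission_id
```

Inside the sweep, the `client.submit_job(...)` call could be replaced by `submit_and_track(client, lr, entrypoint, runtime_env, submissions)` with `submissions = {}` defined before the loop, so that each learning rate maps to its submission ID.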