1 change: 1 addition & 0 deletions docs.json
@@ -249,6 +249,7 @@
{
  "group": "Guides",
  "pages": [
    "integrations/transformer-lab",
    "integrations/dstack",
    "integrations/mods",
    "integrations/skypilot"
1 change: 1 addition & 0 deletions integrations/dstack.mdx
@@ -1,6 +1,7 @@
---
title: "Manage Pods with dstack on Runpod"
sidebarTitle: "dstack"
description: "Use dstack to automate Pod orchestration for AI and ML workloads on Runpod."
---

[dstack](https://dstack.ai/) is an open-source tool that automates Pod orchestration for AI and ML workloads. It lets you define your application and resource requirements in YAML files, then handles provisioning and managing cloud resources on Runpod so you can focus on your application instead of infrastructure.
1 change: 1 addition & 0 deletions integrations/mods.mdx
@@ -1,6 +1,7 @@
---
title: "Running Runpod on Mods"
sidebarTitle: "Mods"
description: "Use Mods to interact with language models hosted on Runpod from the command line."
---

[Mods](https://github.com/charmbracelet/mods) is a command-line tool for interacting with language models. It integrates with Unix pipelines, letting you send command output directly to LLMs from your terminal.
1 change: 1 addition & 0 deletions integrations/skypilot.mdx
@@ -1,6 +1,7 @@
---
title: "Running Runpod on SkyPilot"
sidebarTitle: "SkyPilot"
description: "Use SkyPilot to run LLMs, AI, and batch jobs on Runpod Pods and Serverless endpoints."
---

[SkyPilot](https://skypilot.readthedocs.io/en/latest/) is a framework for running LLMs, AI, and batch jobs on any cloud.
190 changes: 190 additions & 0 deletions integrations/transformer-lab.mdx
@@ -0,0 +1,190 @@
---
title: "Run ML experiments on Runpod with Transformer Lab"
sidebarTitle: "Transformer Lab"
description: "Configure Transformer Lab to run ML training and inference workloads on Runpod GPUs."
---

[Transformer Lab](https://lab.cloud/) is an open-source research environment for AI researchers to train, fine-tune, and evaluate models. It lets you scale training from local hardware to cloud GPUs, provides a unified interface to all of your compute resources, and simplifies experiment and checkpoint tracking, job scheduling, auto-recovery, and centralized artifact storage.

This guide shows you how to configure Transformer Lab to run ML workloads on Runpod GPUs.

## Requirements

You'll need:

* [A Runpod account with an API key](/get-started/api-keys).
* macOS, Linux, or Windows with WSL2.
* Python 3.8 or higher.
* Git and curl installed.

<Note>

**Windows users**

Transformer Lab requires [WSL2 (Windows Subsystem for Linux)](https://docs.microsoft.com/en-us/windows/wsl/install). Install WSL2 first, then follow the Linux instructions within your WSL2 environment.

</Note>

## Install Transformer Lab

<Steps>
<Step title="Run the install script">
Open a terminal and run:

```bash
curl -fsSL https://lab.cloud/install.sh | bash -s -- multiuser_setup
```

This installs Transformer Lab to `~/.transformerlab`, sets up a conda environment with all dependencies, and enables the Team Settings features needed for cloud provider configuration.
</Step>

<Step title="Launch Transformer Lab">
Start the Transformer Lab server:

```bash
cd ~/.transformerlab/src
./run.sh
```

Open your browser to `http://localhost:8338`.
</Step>

<Step title="Log in">
Use the default credentials:

* **Email**: `admin@example.com`
* **Password**: `admin123`

<Warning>
Change these credentials after your first login for security.
</Warning>
</Step>
</Steps>

## Configure shared storage

For remote task execution, Transformer Lab requires shared storage so your local instance can communicate with remote Pods. Configure one of the following:

- **Amazon S3**: Create an S3 bucket and configure credentials.
- **Google Cloud Storage**: Create a GCS bucket and configure a service account.
- **Azure Blob Storage**: Create a storage container and configure credentials.

Refer to the [Transformer Lab documentation](https://lab.cloud/for-teams/advanced-install/cloud-storage/) for detailed shared storage setup instructions.

## Configure Runpod as a compute provider

<Steps>
<Step title="Get your Runpod API key">
In the Runpod console, go to [Settings](https://www.console.runpod.io/user/settings) and create an API key with **All** permissions or **Restricted** permissions that include Pod access.

Copy the API key. Runpod doesn't store it, so save it securely.
</Step>

<Step title="Open Team Settings">
In Transformer Lab, click your profile icon in the top right corner and select **Team Settings**.
</Step>

<Step title="Add Runpod as a provider">
Navigate to **Compute Providers** and click **Add Provider**.

In the modal that opens:

1. Enter a name for your provider (for example, "runpod-provider"). Remember this name; you'll use it in your task.yaml files.
2. Select **Runpod** as the provider type.
3. In the configuration JSON field, add your [Runpod API key](/get-started/api-keys):

```json
{
  "api_key": "YOUR_RUNPOD_API_KEY",
  "api_base_url": "https://rest.runpod.io/v1"
}
```

Leave the base URL as is.

Click **Add Compute Provider** to save the provider.
</Step>
</Steps>
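The provider configuration is plain JSON, so you can sanity-check it before pasting it into the modal. A minimal sketch (the API key is a placeholder):

```python
import json

# Provider configuration from the step above; the key is a placeholder.
config = {
    "api_key": "YOUR_RUNPOD_API_KEY",
    "api_base_url": "https://rest.runpod.io/v1",
}

# Round-trip through json to confirm the snippet is valid JSON and
# contains the two fields shown in the configuration step.
parsed = json.loads(json.dumps(config))
assert {"api_key", "api_base_url"} <= parsed.keys()
print(json.dumps(parsed, indent=2))
```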

## Run a task on Runpod

Transformer Lab uses task files to define cloud workloads. Tasks specify the resources, setup commands, and run commands for your job.

For detailed information on task configuration, see the [Task YAML Structure](https://lab.cloud/for-teams/running-a-task/task-yaml-structure) documentation. You can also browse the [Task Gallery](https://lab.cloud/for-teams/running-a-task/quick-start#4-import-a-task-from-tasks-gallery) for pre-built templates.

### Create a task

<Steps>
<Step title="Open the Tasks menu">
In the Transformer Lab sidebar, click **Tasks** to open the task management interface.
</Step>

<Step title="Create a new task">
Click **New** to add a new task. Select **Start with a blank task template**, then click **Submit**.
</Step>

<Step title="Configure the task">
In the task editor, paste the following YAML configuration:

```yaml
name: hello-runpod
resources:
  compute_provider: runpod-provider
  cpus: 4
  memory: 16
  accelerators: "A40:1"
setup: |
  echo "Setting up environment..."
  pip install torch
run: |
  echo "Hello from Runpod!"
  python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}'); print(f'GPU: {torch.cuda.get_device_name(0)}' if torch.cuda.is_available() else 'GPU: None')"
```

Replace `runpod-provider` with the name you gave your Runpod provider in Team Settings.

This configuration requests a single NVIDIA A40 GPU on Runpod, installs PyTorch, and runs a simple script to verify GPU access.
</Step>

<Step title="Queue the task">
Click **Queue** to submit the task you just created. Select your Runpod compute provider and click **Submit** to start the job. Transformer Lab provisions a Pod on Runpod, runs your task, and displays the output in the task logs.
</Step>
</Steps>
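If you keep task files in version control, a quick structural check can catch a missing top-level key before you queue a job. A minimal sketch that scans for the four keys used in the example task (no YAML library required; the checked field names come from the task above):

```python
# A trimmed copy of the example task file.
task_yaml = """\
name: hello-runpod
resources:
  compute_provider: runpod-provider
  cpus: 4
  memory: 16
  accelerators: "A40:1"
setup: |
  echo "Setting up environment..."
run: |
  echo "Hello from Runpod!"
"""

# Top-level keys are the lines that start without indentation.
top_level = {
    line.split(":", 1)[0]
    for line in task_yaml.splitlines()
    if line and not line[0].isspace()
}
missing = {"name", "resources", "setup", "run"} - top_level
assert not missing, f"task file is missing keys: {missing}"
print("task file looks structurally complete")
```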

### Monitor task progress

Once queued, your task appears in the Tasks list with its current status. Click **Output** to view the task logs.

The output modal has two tabs:

- **Lab SDK Output**: Shows output from scripts that use the `transformerlab` Python package.
- **Machine Logs**: Shows raw stdout/stderr from the Pod. Use this tab to see output from standard `print()` statements.

For the examples in this guide, check the **Machine Logs** tab to see your task output.

### Stop a running task

To stop a task before it completes, click the stop button (square icon). This terminates the Runpod Pod and releases the resources.

You can also verify that no Pods are running by checking the [Runpod console](https://www.console.runpod.io/pods).
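You can also script that check. A sketch using Python's standard library to build a request against the REST base URL from the provider configuration (the `/pods` path and Bearer auth scheme are assumptions about Runpod's REST API; substitute a real key before sending):

```python
import urllib.request

# Base URL from the provider configuration earlier in this guide.
RUNPOD_API_BASE = "https://rest.runpod.io/v1"

def list_pods_request(api_key):
    """Build (but don't send) a GET request for your Pods."""
    return urllib.request.Request(
        f"{RUNPOD_API_BASE}/pods",
        headers={"Authorization": f"Bearer {api_key}"},
    )

req = list_pods_request("YOUR_RUNPOD_API_KEY")
# Send with urllib.request.urlopen(req) once a real key is set.
print(req.full_url)
```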

## Specify GPU types

Use the `accelerators` field to specify the GPU type:

| Accelerator | Description |
|-------------|-------------|
| `"RTX4090:1"` | NVIDIA GeForce RTX 4090 (24GB) |
| `"A40:1"` | NVIDIA A40 (48GB) |
| `"A100:1"` | NVIDIA A100 (40GB or 80GB) |
| `"A100-80GB:1"` | NVIDIA A100 80GB |
| `"H100:1"` | NVIDIA H100 (80GB) |
| `"L40S:1"` | NVIDIA L40S (48GB) |

For multiple GPUs, change the count: `"A100:4"` for 4x A100 GPUs.
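Accelerator strings follow a consistent `TYPE:COUNT` pattern, which is easy to split apart if you generate task files programmatically. A small helper sketch (illustrative only, not part of the Transformer Lab SDK):

```python
def parse_accelerator(spec):
    """Split an accelerator spec like "A100:4" into (gpu_type, count)."""
    gpu_type, _, count = spec.partition(":")
    # Default to a single GPU when no count is given.
    return gpu_type, int(count) if count else 1

print(parse_accelerator("A40:1"))   # ('A40', 1)
print(parse_accelerator("A100:4"))  # ('A100', 4)
```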

## Clean up

When your tasks complete, Transformer Lab automatically releases the Runpod resources. To manually stop a running task, select it from the Tasks list and click **Stop**.

You can also verify that no Pods are running by checking the [Runpod console](https://www.console.runpod.io/pods).