Add Trackio Integration for Ray #61632

Open
ParagEkbote wants to merge 40 commits into ray-project:master from ParagEkbote:add-trackio-integration-for-ray

Conversation

@ParagEkbote

@ParagEkbote ParagEkbote commented Mar 10, 2026

Description

As described in the issue, this PR adds a Trackio integration for Ray. Trackio has a drop-in API similar to W&B. The main difference between wandb and Trackio is that Trackio does not rely on global mutable state: each run is created explicitly and has a fixed lifetime. The API docs also helped to define the scope of the callback. Runs can be pushed to the HF Hub as a dataset or as a Space, as seen below, and you can also view them locally.

Could you please review?

Related issues

Fixes #60708

Additional information

You can test this with the following script:

import random
import time

import numpy as np
import trackio

import ray
from ray import tune
from ray.air.integrations.trackio import (
    TrackioLoggerCallback,
    setup_trackio,
)
from ray.train import RunConfig, ScalingConfig
from ray.train.torch import TorchTrainer


PROJECT_NAME = "trackio-ray-demo"

HF_DATASET_ID = "AINovice2005/ray-trackio-experiments"
HF_SPACE_ID = "AINovice2005/ray-trackio-dashboard"

NUM_STEPS = 15


def tune_trainable(config):

    for step in range(NUM_STEPS):

        loss = (config["lr"] * 10) / (step + 1) + random.random()
        accuracy = 1 / (loss + 1e-3)

        # Example artifact
        image = np.random.rand(64, 64, 3)

        tune.report(
            {
                "loss": loss,
                "accuracy": accuracy,
                "image_mean": float(image.mean()),
                "step": step,
            }
        )

        time.sleep(0.2)


def run_tune_example():

    tuner = tune.Tuner(
        tune_trainable,
        param_space={
            "lr": tune.grid_search([0.001, 0.01, 0.1]),
        },
        run_config=tune.RunConfig(
            name="trackio-ray-tune-demo",
            callbacks=[
                TrackioLoggerCallback(
                    project=PROJECT_NAME,
                    # Trackio capabilities
                    auto_log_gpu=True,
                    gpu_log_interval=5,
                    dataset_id=HF_DATASET_ID,
                    space_id=HF_SPACE_ID,
                )
            ],
        ),
    )

    results = tuner.fit()

    print("\nTune finished\n")
    print(results)


def train_loop(config):

    run = setup_trackio(
        config=config,
        project=PROJECT_NAME,
        auto_log_gpu=True,
        gpu_log_interval=5,
        dataset_id=HF_DATASET_ID,
        space_id=HF_SPACE_ID,
    )

    for step in range(NUM_STEPS):

        loss = 5 / (step + 1) + random.random()
        throughput = random.uniform(50, 150)

        # Log metrics
        if run:
            run.log(
                {
                    "loss": loss,
                    "throughput": throughput,
                    "step": step,
                },
                step=step,
            )

        # Example artifact
        sample_image = np.random.rand(64, 64, 3)

        if run:
            run.log(
                {
                    "image_mean": float(sample_image.mean()),
                    "image_std": float(sample_image.std()),
                }
            )

        time.sleep(0.2)

    if run:
        run.finish()


def run_train_example():

    trainer = TorchTrainer(
        train_loop_per_worker=train_loop,
        train_loop_config={"lr": 0.01},
        scaling_config=ScalingConfig(num_workers=1),
        run_config=RunConfig(name="trackio-ray-train-demo"),
    )

    trainer.fit()


def launch_dashboard():

    print("\nLaunching Trackio dashboard...\n")

    trackio.show(
        project=PROJECT_NAME,
        open_browser=True,
    )


if __name__ == "__main__":

    ray.init()

    print("\nRunning Ray Tune experiment\n")
    run_tune_example()

    print("\nRunning Ray Train experiment\n")
    run_train_example()

    print("\nOpening dashboard\n")
    print("Run dashboard manually with:")
    print('trackio show --project "trackio-ray-demo"')

    trackio.finish()
    ray.shutdown()

    print("\nDemo completed\n")

Local usage

image

HF Space: https://huggingface.co/spaces/AINovice2005/ray-trackio-dashboard

HF Dataset: https://huggingface.co/datasets/AINovice2005/ray-trackio-experiments

Signed-off-by: Parag Ekbote <thecoolekbote189@gmail.com>
@ParagEkbote ParagEkbote requested a review from a team as a code owner March 10, 2026 18:46
Comment thread python/ray/air/integrations/trackio.py Outdated
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a new integration for Trackio with Ray, providing both a setup_trackio function for Ray Train and a TrackioLoggerCallback for Ray Tune. The implementation is clean and follows existing patterns for logger integrations in Ray.

I've identified a few areas for improvement:

  • The example in the setup_trackio docstring can lead to an AttributeError and should be corrected.
  • The error handling in TrackioLoggerCallback can be improved by logging exceptions instead of silently passing, which will make debugging easier for users.
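
The second point can be illustrated with a minimal sketch. It is not the PR's code: `safe_finish` and the two stand-in run classes are hypothetical names, and the only assumption about a run is that it exposes a `finish()` method. The idea is to replace a bare `except: pass` with `logger.exception`, which records the message together with the traceback.

```python
import logging

logger = logging.getLogger(__name__)


def safe_finish(run):
    """Finish a run, logging any failure instead of silently swallowing it.

    `run` is any object with a `finish()` method. The helper name is
    hypothetical and not taken from the PR.
    """
    try:
        run.finish()
        return True
    except Exception:
        # logger.exception logs at ERROR level and appends the traceback.
        logger.exception("Failed to finish run %r", run)
        return False


class GoodRun:
    """Stand-in run whose finish() succeeds."""

    def finish(self):
        pass


class BrokenRun:
    """Stand-in run whose finish() always fails."""

    def finish(self):
        raise RuntimeError("network unavailable")
```

With this shape, a failing `finish()` no longer aborts the callback, but the user still sees what went wrong in the logs.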

Overall, this is a great addition. My comments are focused on improving robustness and user experience.

Comment thread python/ray/air/integrations/trackio.py Outdated
Comment thread python/ray/air/integrations/trackio.py Outdated
Comment thread python/ray/air/integrations/trackio.py Outdated
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Parag Ekbote <thecoolekbote189@gmail.com>
Comment thread python/ray/air/integrations/trackio.py Outdated
@ray-gardener ray-gardener Bot added the tune, train, and community-contribution labels Mar 10, 2026
Signed-off-by: Parag Ekbote <thecoolekbote189@gmail.com>
Signed-off-by: Parag Ekbote <thecoolekbote189@gmail.com>
Comment thread python/ray/air/integrations/trackio.py Outdated
Comment thread python/ray/air/integrations/trackio.py Outdated
Signed-off-by: Parag Ekbote <thecoolekbote189@gmail.com>
Comment thread python/ray/air/integrations/trackio.py Outdated
Comment thread python/ray/air/integrations/trackio.py Outdated
Signed-off-by: Parag Ekbote <thecoolekbote189@gmail.com>
Comment thread python/ray/air/integrations/trackio.py Outdated
Signed-off-by: Parag Ekbote <thecoolekbote189@gmail.com>
Comment thread python/ray/air/integrations/trackio.py Outdated
Comment thread python/ray/air/integrations/trackio.py
Signed-off-by: Parag Ekbote <thecoolekbote189@gmail.com>
Comment thread python/ray/air/integrations/trackio.py
@ParagEkbote
Author

Could you please review the changes?

cc: @alexeykudinkin

@ParagEkbote
Author

Which maintainer should I reach out to for a first review of this integration?

A gentle ping to @alexeykudinkin

Member

@pseudo-rnd-thoughts pseudo-rnd-thoughts left a comment

Thanks for the PR and integration.

I've made a couple of edits based on personal style: I reduced the number of blank lines between code (IMO a blank line should separate groups of code that work together) and changed Dict[Trial, object] to Dict[Trial, trackio.Run].
Ray uses Google docstring styling, so I've moved the docstrings to start on the same line as the opening """ rather than on the next line.

The PR is missing two important things:

  • Documentation - The other experiment trackers each have a page that explains how to use them, for example Comet. To add one, create a file at doc/source/tune/examples/tune-trackio.ipynb
  • Testing - Comet, WandB, etc. have their own tests to ensure that the implementation keeps working. Create a couple of tests in python/ray/air/tests/test_integration_trackio.py

I've also left some comments on the rest of the file; could you answer them?

Comment thread python/ray/air/integrations/trackio.py Outdated

self._effective_excludes = list(self._exclude_results)
if not self.log_config:
    self._effective_excludes.append("config")
Member

What's the difference between excludes and exclude_results? Could we combine the two?

Author

That's a good point. I have updated the integration to combine them.
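
One way the combined option could look is sketched below. This is an illustration only, not the code that landed in the PR: `build_excludes` and `filter_result` are hypothetical helper names, and the sketch assumes a single `excludes` argument plus the `log_config` flag from the snippet under review.

```python
from typing import Any, Dict, Iterable, List, Optional


def build_excludes(
    excludes: Optional[Iterable[str]] = None, log_config: bool = False
) -> List[str]:
    """Return one exclude list from the user's keys and the log_config flag."""
    combined = list(excludes or [])
    # When the config should not be logged, treat "config" as one more
    # excluded key rather than tracking it in a second list.
    if not log_config and "config" not in combined:
        combined.append("config")
    return combined


def filter_result(result: Dict[str, Any], excludes: Iterable[str]) -> Dict[str, Any]:
    """Drop every excluded top-level key from a trial result."""
    excluded = set(excludes)
    return {key: value for key, value in result.items() if key not in excluded}
```

Keeping a single list means the callback has one place to consult when deciding what to drop from a result.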


# Only log supported metric types
if isinstance(value, (int, float)):
    metrics[key] = value
Member

This will silently ignore values that can't be logged, without telling the user why.
I think we should log a warning to the user that explains why.

Author

I have added a warning log for this.
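
A minimal sketch of what such a warning could look like follows. It is not the PR's implementation: `select_loggable_metrics` is a hypothetical name, and the type check simply mirrors the `isinstance(value, (int, float))` test from the snippet above.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger(__name__)


def select_loggable_metrics(result: Dict[str, Any]) -> Dict[str, Any]:
    """Keep numeric values and warn about everything that is skipped."""
    metrics = {}
    skipped = []
    for key, value in result.items():
        # Same check as the reviewed snippet. Note that bool passes too,
        # because bool is a subclass of int.
        if isinstance(value, (int, float)):
            metrics[key] = value
        else:
            skipped.append(f"{key} ({type(value).__name__})")
    if skipped:
        logger.warning(
            "Skipping values that are not int or float: %s", ", ".join(skipped)
        )
    return metrics
```

Collecting the skipped keys and emitting one warning per result keeps the log readable when many fields are unsupported.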

Comment thread python/ray/air/integrations/trackio.py Outdated
from ray.tune.experiment import Trial
from ray.tune.logger import LoggerCallback
from ray.tune.result import TRAINING_ITERATION
from ray.tune.utils import flatten_dict
Member

This function is now deprecated. Is it necessary, or can we find a replacement?

Author

I have removed the import and added a helper function in its place. The helper is needed because the Trackio callback requires flat keys for the trials.
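
For readers unfamiliar with the flattening step, here is a minimal sketch of such a helper. The name `flatten_result` and the `/` delimiter are assumptions made for illustration; the helper actually added in the PR may differ.

```python
from typing import Any, Dict


def flatten_result(
    nested: Dict[str, Any], delimiter: str = "/", prefix: str = ""
) -> Dict[str, Any]:
    """Flatten nested dicts into a single level with joined keys."""
    flat: Dict[str, Any] = {}
    for key, value in nested.items():
        full_key = f"{prefix}{delimiter}{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into sub-dicts, carrying the joined key as the prefix.
            flat.update(flatten_result(value, delimiter, full_key))
        else:
            flat[full_key] = value
    return flat


print(flatten_result({"config": {"lr": 0.01}, "loss": 0.5}))
# {'config/lr': 0.01, 'loss': 0.5}
```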

run = self._trial_runs.get(trial)

# Lazy initialization after experiment restore
if run is None:
Member

When can the run be None? Looking through the code, I can't see anywhere that a trial's run is set to None. If the run is never None, could you remove all the checks?

Author

@ParagEkbote ParagEkbote Mar 27, 2026

There are several edge cases where a trial's run does not get recorded by the callback. First, if a trial errors or only partially initializes, its run can be None. Second, if trials are resumed after a fault, log_trial_result may be called before log_trial_start, leading to a race condition. The lazy initialization ensures that metrics are not dropped in these cases.
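
The lazy-initialization pattern described here can be sketched without Ray or Trackio. Everything below is a stand-in: `FakeRun` and `LazyRunCallback` are hypothetical classes that only mimic the `_trial_runs` lookup from the snippet under review.

```python
class FakeRun:
    """Stand-in for a tracker run; it only records what is logged."""

    def __init__(self, name):
        self.name = name
        self.logged = []

    def log(self, metrics):
        self.logged.append(metrics)


class LazyRunCallback:
    """Creates a run on first use, whichever hook fires first."""

    def __init__(self):
        self._trial_runs = {}

    def _start_run(self, trial):
        run = FakeRun(name=str(trial))
        self._trial_runs[trial] = run
        return run

    def log_trial_start(self, trial):
        # Do not create a second run if a result already arrived.
        if trial not in self._trial_runs:
            self._start_run(trial)

    def log_trial_result(self, trial, result):
        run = self._trial_runs.get(trial)
        # Lazy initialization: a restored trial may report a result
        # before log_trial_start has been called for it.
        if run is None:
            run = self._start_run(trial)
        run.log(result)
```

In this sketch a result that arrives before the start hook still lands in a run, and the later start hook does not create a duplicate.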

Member

Ahh ok thanks

@ParagEkbote
Author

Looking at the CI, the error is:

[2026-04-02T09:42:19Z] #7 111.7 The conflict is caused by:
--
[2026-04-02T09:42:19Z] #7 111.7     The user requested gradio<7.0.0 and >=6.8.0
[2026-04-02T09:42:19Z] #7 111.7     The user requested (constraint) gradio==3.50.2

Therefore, it looks like we need to wait for the gradio pin in this repo to be updated to v6 before we can merge this PR.
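
The conflict in the log can be checked by hand. The sketch below uses plain tuple comparison rather than pip's resolver, and it only handles simple dotted numeric versions; the helper names are hypothetical.

```python
def parse_version(text):
    """Turn a dotted numeric version such as '3.50.2' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, lower, upper):
    """Return True if lower <= version < upper."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


# The constraint file pins gradio==3.50.2, while the other requirement
# asks for gradio>=6.8.0,<7.0.0, so no single version satisfies both.
print(satisfies("3.50.2", "6.8.0", "7.0.0"))  # False
```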

In this repo, a separate Ray Serve integration uses gradio v3. We could update that module to the gradio v6 public API and bump the gradio dependency in requirements.txt.

Can I do this in a separate PR?

@pseudo-rnd-thoughts
Member

@ParagEkbote One of my colleagues is in the process of updating Gradio to v5 and will look at updating it to v6 afterwards.

@github-actions

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions Bot added the stale label Apr 18, 2026
@ParagEkbote
Author

Not Stale.

@github-actions github-actions Bot added the unstale label and removed the stale label Apr 18, 2026
@pseudo-rnd-thoughts
Member

@ParagEkbote We've updated gradio to 5.50.0; however, gradio==6.x will require a breaking gradio-client 1.x -> 2.x change. Could you research what changes are necessary for us, both to the packages and to the implementations?

Comment thread python/ray/air/integrations/trackio.py
Signed-off-by: Parag Ekbote <thecoolekbote189@gmail.com>
@ParagEkbote
Author

ParagEkbote commented Apr 22, 2026

> @ParagEkbote We've updated gradio to 5.50.0 however gradio==6.x will require a breaking gradio-client 1.x -> 2.x change. Could you research to understand what changes are necessary both on the packages and implementations for us?

Hi, thank you for continuing to reach out and for supporting this PR with your input.

I've rechecked the main branch, and the gradio version bump is not visible in the integration module or in the requirements.txt file. Could you let me know whether the gradio dependency has been updated elsewhere? I'm only aware of these files because they appear in the Buildkite logs.

@ParagEkbote
Author

As for upgrading the existing gradio integration, could you please point me to the module or file that uses gradio-client as a dependency? I'm unable to locate any imports of it within this repo.

cc: @pseudo-rnd-thoughts

@pseudo-rnd-thoughts
Member

pseudo-rnd-thoughts commented Apr 24, 2026

gradio-client is listed in our requirements; however, I can't see it used anywhere in our Python code.
@elliot-barn Could you clarify where we need gradio-client and what needs updating?

@ParagEkbote
Author

> Gradio-client is one of our requirements code however I can't see its use in Python. @elliot-barn Could you clarify where we needed to use gradio-client and what needs updating

As far as I know, trackio pulls in gradio-client through its own fixed version pin. Similarly, gradio pins a fixed version of gradio-client in its repo. Does this clarification help?

@ParagEkbote
Author

Do you think we could enable the tests in CI in a separate PR once the gradio dependency issue is resolved? In the meantime, I would add a TODO note in BUILD.bazel so that this working implementation of trackio can be merged.

cc: @pseudo-rnd-thoughts

@pseudo-rnd-thoughts
Member

@ParagEkbote Reviewing the CI, your PR is failing due to:

[2026-05-08T09:51:39Z]     from ray.air.integrations.trackio import (
[2026-05-08T09:51:39Z]   File "/rayci/python/ray/air/integrations/trackio.py", line 7, in <module>
[2026-05-08T09:51:39Z]     import trackio
[2026-05-08T09:51:39Z]   File "/opt/miniforge/lib/python3.10/site-packages/trackio/__init__.py", line 8, in <module>
[2026-05-08T09:51:39Z]     from trackio.ui import demo
[2026-05-08T09:51:39Z]   File "/opt/miniforge/lib/python3.10/site-packages/trackio/ui.py", line 68, in <module>
[2026-05-08T09:51:39Z]     with gr.Sidebar() as sidebar:
[2026-05-08T09:51:39Z] AttributeError: module 'gradio' has no attribute 'Sidebar'

How important is Sidebar? Do you know the minimum gradio version that contains it?
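
A quick way to answer this kind of question in a given environment is to probe the installed module directly. The helper below is a hypothetical sketch that uses only the standard library; note that it really imports the named module, so any import-time side effects of that module will run.

```python
import importlib


def module_has_attr(module_name: str, attr: str) -> bool:
    """Return True if the module can be imported and exposes the attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module is not installed or cannot be imported.
        return False
    return hasattr(module, attr)
```

Running `module_has_attr("gradio", "Sidebar")` in the CI image would show whether the pinned gradio exposes `Sidebar` before `import trackio` is attempted.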

Signed-off-by: Parag Ekbote <thecoolekbote189@gmail.com>
Signed-off-by: Parag Ekbote <thecoolekbote189@gmail.com>
@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Reviewed by Cursor Bugbot for commit 5deb5ae.

Comment thread python/requirements_compiled.txt
Comment thread python/ray/air/integrations/trackio.py
Signed-off-by: Parag Ekbote <thecoolekbote189@gmail.com>
Signed-off-by: Parag Ekbote <thecoolekbote189@gmail.com>
Signed-off-by: Parag Ekbote <thecoolekbote189@gmail.com>
@ParagEkbote
Author

> @ParagEkbote Reviewing the CI, your PR is failing due to
>
> [2026-05-08T09:51:39Z]     from ray.air.integrations.trackio import (
> [2026-05-08T09:51:39Z]   File "/rayci/python/ray/air/integrations/trackio.py", line 7, in <module>
> [2026-05-08T09:51:39Z]     import trackio
> [2026-05-08T09:51:39Z]   File "/opt/miniforge/lib/python3.10/site-packages/trackio/__init__.py", line 8, in <module>
> [2026-05-08T09:51:39Z]     from trackio.ui import demo
> [2026-05-08T09:51:39Z]   File "/opt/miniforge/lib/python3.10/site-packages/trackio/ui.py", line 68, in <module>
> [2026-05-08T09:51:39Z]     with gr.Sidebar() as sidebar:
> [2026-05-08T09:51:39Z] AttributeError: module 'gradio' has no attribute 'Sidebar'
>
> How important is Sidebar? Do you know what the minimal gradio version contains this?

As of the latest trackio release, gradio has been removed as a dependency, so we only need to set gradio-client>=2.0.0.

I've therefore pinned the latest version of trackio, but we still need to update gradio-client to the required version.

cc: @pseudo-rnd-thoughts

Labels

community-contribution, train, tune, unstale

Development

Successfully merging this pull request may close these issues.

Add Trackio as a New Backend for Experiment Visualization

2 participants