Add Trackio Integration for Ray#61632
Conversation
Signed-off-by: Parag Ekbote <thecoolekbote189@gmail.com>
There was a problem hiding this comment.
Code Review
This pull request introduces a new integration for Trackio with Ray, providing both a setup_trackio function for Ray Train and a TrackioLoggerCallback for Ray Tune. The implementation is clean and follows existing patterns for logger integrations in Ray.
I've identified a few areas for improvement:
- The example in the
setup_trackiodocstring can lead to anAttributeErrorand should be corrected. - The error handling in
TrackioLoggerCallbackcan be improved by logging exceptions instead of silently passing, which will make debugging easier for users.
Overall, this is a great addition. My comments are focused on improving robustness and user experience.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Parag Ekbote <thecoolekbote189@gmail.com>
Signed-off-by: Parag Ekbote <thecoolekbote189@gmail.com>
Signed-off-by: Parag Ekbote <thecoolekbote189@gmail.com>
Signed-off-by: Parag Ekbote <thecoolekbote189@gmail.com>
Signed-off-by: Parag Ekbote <thecoolekbote189@gmail.com>
Signed-off-by: Parag Ekbote <thecoolekbote189@gmail.com>
Signed-off-by: Parag Ekbote <thecoolekbote189@gmail.com>
Signed-off-by: Parag Ekbote <thecoolekbote189@gmail.com>
|
Could you please review the changes? cc: @alexeykudinkin |
|
Which maintainer do I have to reach out to get a first review for this integration? A gentle ping to @alexeykudinkin |
Signed-off-by: Mark Towers <mark.m.towers@gmail.com>
pseudo-rnd-thoughts
left a comment
There was a problem hiding this comment.
Thanks for the PR and integration.
I've made a couple of edits based on personal style, reducing the number of new line between code (IMO new line should show a group of code that works together), changed Dict[Trial, object] to Dict[Trial, trackio.Run].
Ray uses Google docstring styling, so I've moved the docsrings to start on the """ rather than next line.
The PR is missing two important things
- Documentation - The other experiment trackers have a page to explain how to use it. For example comet. To do this, you need to create a file in doc/source/tune/examples/tune-trackio.ipynb
- Testing - Comet, WandB, etc have their own testing to ensure that the implementation continues working. Create a couple of tests in
ray/python/air/tests/test_integration_trackio.py
I've also put some comments on the rest of the file if you could answer them
|
|
||
| self._effective_excludes = list(self._exclude_results) | ||
| if not self.log_config: | ||
| self._effective_excludes.append("config") |
There was a problem hiding this comment.
Whats the difference between the excludes and exclude_results? Could we combine the two?
There was a problem hiding this comment.
That's a good point, I have updated the integration to combine them.
|
|
||
| # Only log supported metric types | ||
| if isinstance(value, (int, float)): | ||
| metrics[key] = value |
There was a problem hiding this comment.
This will silently ignore values to be logged without telling the user why.
I think we should log a warning to the user and why.
There was a problem hiding this comment.
I have added a logging warning about the same.
| from ray.tune.experiment import Trial | ||
| from ray.tune.logger import LoggerCallback | ||
| from ray.tune.result import TRAINING_ITERATION | ||
| from ray.tune.utils import flatten_dict |
There was a problem hiding this comment.
This function is now deprecated, is it necessary / can we find a replacement?
There was a problem hiding this comment.
I have removed the import and added a helper function for it, the function is needed since trackio callback needs flat keys for the trials.
| run = self._trial_runs.get(trial) | ||
|
|
||
| # Lazy initialization after experiment restore | ||
| if run is None: |
There was a problem hiding this comment.
When can the run be None? Looking through the code I can't see anywhere that a trial's run is set to None? If the run is never None, then could you remove all the checking
There was a problem hiding this comment.
There are several edge cases where the trial runs do not get recorded for the callback, especially if a trial errors or partially initializes, the run can be None. Secondly, if trials are resumed after a fault, log_trial_result may be called before log_trial_start, leading to a race condition. This lazy initialization ensures that metrics are not dropped in these cases.
In this repo, a separate integration of ray serve has gradio v3 used. We could update this module with gradio v6 public API and bump the gradio deps in requirements.txt Can I do this in a separate PR? |
|
@ParagEkbote One of my colleagues are in the process of updating Gradio to v5 and will look at updating it to v6 afterwards. |
|
This pull request has been automatically marked as stale because it has not had You can always ask for help on our discussion forum or Ray's public slack channel. If you'd like to keep this open, just leave any comment, and the stale label will be removed. |
|
Not Stale. |
|
@ParagEkbote We've updated gradio to 5.50.0 however |
Signed-off-by: Parag Ekbote <thecoolekbote189@gmail.com>
Hi, I'd like to thank you for continuing to reach out and supporting this PR with your inputs. I've rechecked the main branch and the gradio version bump is not visible in the integration module and the requirements.txt file. Could you let me know if the gradio dependency has been updated elsewhere, since I'm only aware about these files since they are visible in the buildkite logs. |
|
As for upgrading the existing gradio integration, could you please point me to the module or file which uses |
|
Gradio-client is one of our requirements code however I can't see its use in Python. |
As far as I know, trackio pulls in |
|
Do you think that we can enable the tests in CI in a separate PR until the gradio dependency issue is resolved, I'll add a TODO note in BUILD.baze and merge this working implementation of trackio? |
|
@ParagEkbote Reviewing the CI, your PR is failing due to How important is |
Signed-off-by: Parag Ekbote <thecoolekbote189@gmail.com>
Signed-off-by: Parag Ekbote <thecoolekbote189@gmail.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Reviewed by Cursor Bugbot for commit 5deb5ae. Configure here.
Signed-off-by: Parag Ekbote <thecoolekbote189@gmail.com>
…ParagEkbote/ray into add-trackio-integration-for-ray
Signed-off-by: Parag Ekbote <thecoolekbote189@gmail.com>
Signed-off-by: Parag Ekbote <thecoolekbote189@gmail.com>
As per the latest version of trackio release, gradio as a dependency has been removed and we need to only set So, I've pinned the latest version of trackio, but we still need to update |

Description
As described in the issue, this PR adds the integration of trackio with ray. Trackio has a drop-in API similar to W&B, the main difference between the wandb and trackio is that trackio does not rely on a global mutable state, each run is explicitly created, and each run has a fixed lifetime. The API docs have also helped to define the scope of the callback. The runs can be pushed to HF Hub as a dataset or as a Space as seen below, you can view them locally.
Could you please review?
Related issues
Fixes #60708
Additional information
You can the test the same with the following script:
Local usage
HF Space: https://huggingface.co/spaces/AINovice2005/ray-trackio-dashboard
HF Dataset: https://huggingface.co/datasets/AINovice2005/ray-trackio-experiments