
[Misc] Migrate to use XET-Core Rust based HF Hub in ome-agent replica #282

Merged
slin1237 merged 2 commits into main from migrate-hf-download-for-ome-agent
Sep 25, 2025

@beiguo218
Collaborator

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR migrates the ome-agent replica from the Go-based hfutil package to the new Rust-based XET-core library for Hugging Face Hub downloads. It preserves the existing interface while leveraging the performance benefits of the Rust implementation. The changes include:

  1. Replaced hfutil with the xet package in the ome-agent replica: all Hugging Face download operations now use the new Rust-based XET-core client;
  2. Modified the replica agent's dependency injection to use xet.Module instead of the old hub client;
  3. Added and updated unit tests.

Special notes for your reviewer:

Tested. Triggered an OME agent replica Job with 30 pods in parallel to import the Llama 3.1 8B model from HF. No 429 errors were observed, and all pods succeeded on their first try.

@gemini-code-assist
Contributor

Summary of Changes

Hello @beiguo218, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly upgrades the Hugging Face model download capabilities within the ome-agent replica. By transitioning from a custom Go-based utility to the high-performance Rust-based XET-core library, the system is expected to handle model downloads more efficiently and reliably. The changes encompass core dependency updates, configuration adjustments, and comprehensive testing to validate the new integration.

Highlights

  • Migration to XET-Core: The ome-agent replica has been migrated from using the Go-based hfutil package to the new Rust-based XET-core library for Hugging Face Hub downloads, aiming for improved performance.
  • Dependency Injection Update: The replica agent's dependency injection mechanism has been updated to utilize xet.Module instead of the older hub client, reflecting the new library integration.
  • Configuration Enhancements: New configuration parameters have been introduced in ome-agent.yaml to support XET-core specific features, including enable_dedup, max_concurrent_downloads, and enable_progress_reporting.
  • Test Coverage: Existing unit tests have been updated and new ones added to ensure the correct functionality and integration of the XET-core library.

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request successfully migrates the ome-agent replica from the Go-based hfutil to the new Rust-based XET-core library for Hugging Face Hub downloads. The changes are comprehensive, covering dependency updates, configuration changes, and refactoring of related code to use the new xet package. The introduction of a dedicated fx module for the xet client is well-structured and includes new tests. Overall, this is a solid improvement. I have a couple of suggestions to enhance error handling observability and code clarity.

Comment on lines 75 to 77
  if err != nil {
-     // Check error type for better handling
-     var rateLimitErr *hub.RateLimitError
-     var httpErr *hub.HTTPError
-     if errors.As(err, &rateLimitErr) ||
-         errors.As(err, &httpErr) && httpErr.StatusCode == 429 ||
-         strings.Contains(err.Error(), "429") ||
-         strings.Contains(err.Error(), "rate limit") {
-         logger.Warnf("Rate limited while downloading HuggingFace model %s: %v", input.Source.BucketName, err)
-     } else {
-         logger.Errorf("Failed to download HuggingFace model %s: %v", input.Source.BucketName, err)
-     }
-     return downloadPath, err
+     logger.Errorf("Failed to download snapshot: %v", err)
  }

medium

The error handling for downloadSnapHook has been simplified. The previous implementation had specific logic to detect rate-limiting errors from Hugging Face and log them as warnings. This new implementation logs all errors from the download as a generic "Failed to download snapshot" error.

While the new xet-core library is likely more resilient to rate-limiting, as mentioned in the PR description, this change could represent a loss of observability if such errors can still occur.

Could you please check if the xet library provides a way to distinguish different error types (e.g., transient network errors, rate-limiting, fatal errors)? If so, it would be beneficial to restore more nuanced error logging. For example:

if err != nil {
    var xetErr *xet.XetError
    if errors.As(err, &xetErr) && isRateLimitError(xetErr) {
        logger.Warnf("Rate limited while downloading snapshot: %v", err)
    } else {
        logger.Errorf("Failed to download snapshot: %v", err)
    }
}

(Assuming isRateLimitError can be implemented based on xetErr.Code or xetErr.Message)

Comment thread pkg/xet/config.go
Comment on lines +97 to +101
+ func WithAppParams(params HubParams) Option {
+     return func(c *Config) error {
+         return nil
+     }
+ }

medium

The function WithAppParams is currently a no-op. It is called in pkg/xet/module.go but doesn't perform any configuration. This can be confusing for future developers. If this function is a placeholder for future functionality, it would be helpful to add a // TODO comment explaining its purpose. If it's not needed, it should be removed, along with its corresponding call in module.go, to avoid dead code.

Suggested change
- func WithAppParams(params HubParams) Option {
-     return func(c *Config) error {
-         return nil
-     }
- }
+ // WithAppParams applies configuration parameters from Hub params.
+ // TODO: Implement logic to apply parameters from HubParams to the config if needed.
+ func WithAppParams(params HubParams) Option {
+     return func(c *Config) error {
+         return nil
+     }
+ }
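
If the TODO is eventually implemented, a functional option along these lines might be one shape for it. This is only a sketch: the fields on Config and HubParams, the default values, and the NewConfig helper are all hypothetical, as the real definitions in pkg/xet are not shown in this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// Config and HubParams reuse the names from the review, but every field
// here is hypothetical and chosen only for illustration.
type Config struct {
	Endpoint               string
	MaxConcurrentDownloads int
}

type HubParams struct {
	Endpoint               string
	MaxConcurrentDownloads int
}

// Option mutates a Config and may reject invalid input.
type Option func(c *Config) error

// WithAppParams copies any non-zero values from params onto the config,
// leaving existing defaults in place for fields that were not set.
func WithAppParams(params HubParams) Option {
	return func(c *Config) error {
		if params.MaxConcurrentDownloads < 0 {
			return errors.New("max concurrent downloads must not be negative")
		}
		if params.Endpoint != "" {
			c.Endpoint = params.Endpoint
		}
		if params.MaxConcurrentDownloads > 0 {
			c.MaxConcurrentDownloads = params.MaxConcurrentDownloads
		}
		return nil
	}
}

// NewConfig is a hypothetical constructor that starts from assumed defaults
// and applies each option in order, stopping at the first error.
func NewConfig(opts ...Option) (*Config, error) {
	c := &Config{Endpoint: "https://huggingface.co", MaxConcurrentDownloads: 4}
	for _, opt := range opts {
		if err := opt(c); err != nil {
			return nil, fmt.Errorf("apply option: %w", err)
		}
	}
	return c, nil
}

func main() {
	cfg, err := NewConfig(WithAppParams(HubParams{MaxConcurrentDownloads: 8}))
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("endpoint=%s max_concurrent_downloads=%d\n", cfg.Endpoint, cfg.MaxConcurrentDownloads)
}
```

Treating zero values as "not set" keeps the option safe to call with a partially filled HubParams, which matches how it is already wired into module.go as an always-applied option.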

@beiguo218
Collaborator Author

Will fix the build.

Comment thread pkg/xet/libxet.a Outdated
Collaborator


Should we delete this or add it to .gitignore?

Collaborator Author


Yes, will do.

Collaborator Author


Deleted

@beiguo218
Collaborator Author

Will fix the build.

Fixed

@beiguo218 beiguo218 requested a review from slin1237 September 25, 2025 17:48
@slin1237 slin1237 merged commit 99485e1 into main Sep 25, 2025
24 checks passed
@zhyncs zhyncs deleted the migrate-hf-download-for-ome-agent branch October 4, 2025 03:50