[Bugfix] Fix list_files fn in XET-Core Rust Hub implementation - implement recursive file listing #286

Merged: slin1237 merged 1 commit into main from fix-bug-in-listfiles-hf on Sep 26, 2025


@beiguo218
Collaborator

What type of PR is this?

/kind bug

Which issue(s) this PR fixes:

Previously, the list_files fn returned only the top-level files and ignored all directories. This change traverses every subdirectory of a model repo so that no files are missed.
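
The traversal described above can be sketched as a worklist loop. This is a minimal illustration under stated assumptions, not the code from this PR: `TreeItem`, `list_files_recursive`, `demo_repo`, and the `fetch_tree` closure are hypothetical stand-ins for the real types and for the HTTP call that lists one directory level.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified tree entry (the real struct likely has more fields).
#[derive(Clone, Debug)]
pub struct TreeItem {
    pub item_type: String, // "file" or "directory"
    pub path: String,      // assumed to be relative to the repo root
}

/// Collects every file path by walking directories with an explicit work stack.
/// `fetch_tree` stands in for the request that lists one directory level;
/// it receives "" for the repo root.
pub fn list_files_recursive<F>(mut fetch_tree: F) -> Vec<String>
where
    F: FnMut(&str) -> Vec<TreeItem>,
{
    let mut all_files = Vec::new();
    // Start at the repo root, represented by the empty path.
    let mut directories_to_process = vec![String::new()];

    while let Some(current_path) = directories_to_process.pop() {
        for item in fetch_tree(&current_path) {
            match item.item_type.as_str() {
                "file" => all_files.push(item.path),
                "directory" => directories_to_process.push(item.path),
                _ => {} // any other entry type is skipped
            }
        }
    }
    all_files
}

/// In-memory repo for demonstration: two top-level files and one nested file.
pub fn demo_repo() -> HashMap<String, Vec<TreeItem>> {
    let item = |t: &str, p: &str| TreeItem {
        item_type: t.to_string(),
        path: p.to_string(),
    };
    let mut repo = HashMap::new();
    repo.insert(
        String::new(),
        vec![
            item("file", "README.md"),
            item("file", "config.json"),
            item("directory", "subdir"),
        ],
    );
    repo.insert("subdir".to_string(), vec![item("file", "subdir/config.json")]);
    repo
}
```

Calling `list_files_recursive` with a closure that looks up `demo_repo()` by directory path returns all three files, including the nested one that a top-level-only listing would miss.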

Special notes for your reviewer:

Tested locally.

Before this fix, for model Qwen/Qwen3-Embedding-8B, only 16 files were listed and downloaded; one file in a subdirectory was missed.
Screenshot 2025-09-26 at 1 52 40 PM

After this fix, for model Qwen/Qwen3-Embedding-8B, all 17 files are listed and downloaded.
Screenshot 2025-09-26 at 2 15 17 PM

Does this PR introduce a user-facing change?

No

@beiguo218 requested a review from slin1237 as a code owner on September 26, 2025 21:15
@github-actions (bot) added the bug ("Something isn't working") label on Sep 26, 2025
@gemini-code-assist
Contributor

Summary of Changes

Hello @beiguo218, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical bug in the XET-Core Rust Hub implementation where the list_files function failed to provide a complete list of files for Hugging Face models, specifically omitting files located within subdirectories. The core change involves refactoring the file listing logic to perform a recursive traversal of the repository's directory tree, guaranteeing that all files are discovered and returned, thus resolving the issue of incomplete file listings.

Highlights

  • Bugfix: Recursive File Listing: The list_files function in the XET-Core Rust Hub implementation has been fixed to recursively traverse model repository directories. Previously, it only listed top-level files, missing those in subdirectories.
  • Improved File Discovery: This change ensures that all files within a model repository, regardless of their depth in the directory structure, are correctly identified and included in the file listing.

Contributor

@gemini-code-assist (bot) left a comment

Code Review

This pull request correctly implements recursive file listing in the list_files function by traversing the directory tree. The implementation is sound. I've provided a couple of suggestions to improve code maintainability and robustness by reducing code duplication and using more idiomatic Rust constructs.

Comment thread pkg/xet/src/hf_adapter.rs
Comment on lines +156 to +160

    let url = if current_path.is_empty() {
        format!("{}/api/models/{}/tree/{}", self.endpoint, repo_id, revision)
    } else {
        format!("{}/api/models/{}/tree/{}/{}", self.endpoint, repo_id, revision, current_path)
    };


medium

To improve maintainability and reduce code duplication, you can refactor the URL construction. The base URL is constructed in both branches of the if statement. It would be cleaner to construct the base URL once and then append the current_path if it's not empty. This will make future changes to the URL structure easier and less error-prone.

Suggested change

    - let url = if current_path.is_empty() {
    -     format!("{}/api/models/{}/tree/{}", self.endpoint, repo_id, revision)
    - } else {
    -     format!("{}/api/models/{}/tree/{}/{}", self.endpoint, repo_id, revision, current_path)
    - };
    + let base_url = format!("{}/api/models/{}/tree/{}", self.endpoint, repo_id, revision);
    + let url = if current_path.is_empty() {
    +     base_url
    + } else {
    +     format!("{}/{}", base_url, current_path)
    + };
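
As a standalone illustration of this suggestion, the URL construction could be pulled into a small helper that is easy to unit test. This is only a sketch: `tree_url` is a hypothetical function name, and the endpoint is passed as a plain argument here rather than read from `self.endpoint`.

```rust
/// Hypothetical helper mirroring the suggested refactor: build the base tree
/// URL once, then append the subdirectory path only when one is given.
pub fn tree_url(endpoint: &str, repo_id: &str, revision: &str, current_path: &str) -> String {
    let base_url = format!("{}/api/models/{}/tree/{}", endpoint, repo_id, revision);
    if current_path.is_empty() {
        // Repo root: no trailing path segment.
        base_url
    } else {
        // Subdirectory: append the path as given (no percent-encoding is applied).
        format!("{}/{}", base_url, current_path)
    }
}
```

For example, `tree_url("https://huggingface.co", "Qwen/Qwen3-Embedding-8B", "main", "")` yields the root listing URL, while passing a non-empty `current_path` appends it after a single `/`. The sketch does not escape special characters, so paths containing them would presumably need encoding before being appended.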

Comment thread pkg/xet/src/hf_adapter.rs
Comment on lines +174 to +187

    for item in tree_items {
        if item.item_type == "file" {
            // Add file to results
            all_files.push(HfFileInfo {
                path: item.path,
                hash: item.oid.clone(), // Git OID
                size: item.size,
                xet_hash: item.xet_hash, // XET hash if available
            });
        } else if item.item_type == "directory" {
            // Add directory to processing queue
            directories_to_process.push(item.path);
        }
    }


medium

Using a match statement here instead of if/else if would be more idiomatic in Rust and more robust. It makes the code's intent clearer about which item_type values are handled and allows for easily handling or ignoring other types. The Hugging Face API can also return other types like submodule, which this change would explicitly ignore.

            for item in tree_items {
                match item.item_type.as_str() {
                    "file" => {
                        // Add file to results
                        all_files.push(HfFileInfo {
                            path: item.path,
                            hash: item.oid.clone(), // Git OID
                            size: item.size,
                            xet_hash: item.xet_hash, // XET hash if available
                        });
                    }
                    "directory" => {
                        // Add directory to processing queue
                        directories_to_process.push(item.path);
                    }
                    _ => {
                        // Other types like 'submodule' are ignored.
                    }
                }
            }
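
To make the behavior of the `match` version concrete, here is a self-contained sketch that can be compiled on its own. The struct definitions are assumptions made for illustration (the field types in the real `hf_adapter.rs` may differ), and `partition_items` is a hypothetical helper that wraps the loop so it can be tested in isolation.

```rust
/// Assumed shape of one tree entry; field types are guesses for illustration.
#[derive(Debug)]
pub struct TreeItem {
    pub item_type: String,
    pub path: String,
    pub oid: String,
    pub size: u64,
    pub xet_hash: Option<String>,
}

/// Assumed shape of the result record; field types are guesses for illustration.
#[derive(Debug, PartialEq)]
pub struct HfFileInfo {
    pub path: String,
    pub hash: String,
    pub size: u64,
    pub xet_hash: Option<String>,
}

/// Hypothetical helper: splits one directory listing into files to return
/// and subdirectories still to visit. Unknown entry types are dropped.
pub fn partition_items(tree_items: Vec<TreeItem>) -> (Vec<HfFileInfo>, Vec<String>) {
    let mut all_files = Vec::new();
    let mut directories_to_process = Vec::new();

    for item in tree_items {
        match item.item_type.as_str() {
            "file" => all_files.push(HfFileInfo {
                path: item.path,
                hash: item.oid, // Git OID, moved rather than cloned
                size: item.size,
                xet_hash: item.xet_hash, // XET hash if available
            }),
            "directory" => directories_to_process.push(item.path),
            _ => {
                // Any other entry type is ignored explicitly.
            }
        }
    }
    (all_files, directories_to_process)
}
```

Given a listing with one `"file"`, one `"directory"`, and one entry of some other type, the helper returns a single file record and a single directory path, and the third entry is silently skipped by the catch-all arm.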

@slin1237 merged commit d921a04 into main on Sep 26, 2025
24 checks passed
@zhyncs deleted the fix-bug-in-listfiles-hf branch on October 4, 2025 03:50
slin1237 pushed a commit that referenced this pull request Dec 22, 2025