[Bugfix] Fix list_files fn in XET-Core Rust Hub implementation - implement recursive file listing#286
Conversation
… implement recursive file listing
Summary of ChangesHello @beiguo218, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical bug in the XET-Core Rust Hub implementation where the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request correctly implements recursive file listing in the list_files function by traversing the directory tree. The implementation is sound. I've provided a couple of suggestions to improve code maintainability and robustness by reducing code duplication and using more idiomatic Rust constructs.
| let url = if current_path.is_empty() { | ||
| format!("{}/api/models/{}/tree/{}", self.endpoint, repo_id, revision) | ||
| } else { | ||
| format!("{}/api/models/{}/tree/{}/{}", self.endpoint, repo_id, revision, current_path) | ||
| }; |
There was a problem hiding this comment.
To improve maintainability and reduce code duplication, you can refactor the URL construction. The base URL is constructed in both branches of the if statement. It would be cleaner to construct the base URL once and then append the current_path if it's not empty. This will make future changes to the URL structure easier and less error-prone.
| let url = if current_path.is_empty() { | |
| format!("{}/api/models/{}/tree/{}", self.endpoint, repo_id, revision) | |
| } else { | |
| format!("{}/api/models/{}/tree/{}/{}", self.endpoint, repo_id, revision, current_path) | |
| }; | |
| let base_url = format!("{}/api/models/{}/tree/{}", self.endpoint, repo_id, revision); | |
| let url = if current_path.is_empty() { | |
| base_url | |
| } else { | |
| format!("{}/{}", base_url, current_path) | |
| }; |
| for item in tree_items { | ||
| if item.item_type == "file" { | ||
| // Add file to results | ||
| all_files.push(HfFileInfo { | ||
| path: item.path, | ||
| hash: item.oid.clone(), // Git OID | ||
| size: item.size, | ||
| xet_hash: item.xet_hash, // XET hash if available | ||
| }); | ||
| } else if item.item_type == "directory" { | ||
| // Add directory to processing queue | ||
| directories_to_process.push(item.path); | ||
| } | ||
| }) | ||
| .collect(); | ||
| } |
There was a problem hiding this comment.
Using a match statement here instead of if/else if would be more idiomatic in Rust and more robust. It makes the code's intent clearer about which item_type values are handled and allows for easily handling or ignoring other types. The Hugging Face API can also return other types like submodule, which this change would explicitly ignore.
for item in tree_items {
match item.item_type.as_str() {
"file" => {
// Add file to results
all_files.push(HfFileInfo {
path: item.path,
hash: item.oid.clone(), // Git OID
size: item.size,
xet_hash: item.xet_hash, // XET hash if available
});
}
"directory" => {
// Add directory to processing queue
directories_to_process.push(item.path);
}
_ => {
// Other types like 'submodule' are ignored.
}
}
}…ement recursive file listing (#286)
What type of PR is this?
/kind bug
Which issue(s) this PR fixes:
Previously list_files fn only returns the files on the top level, will ignore all directories. Make this change so that all subdirectories under a model repo can be traversed, no files will be missed.
Special notes for your reviewer:
Tested in local.
Before this fix, for model

Qwen/Qwen3-Embedding-8B, it only lists and downloads 16 files, and one file under subdirectory got missed;After this fix, for model

Qwen/Qwen3-Embedding-8B, all 17 files being listed and downloaded.Does this PR introduce a user-facing change?
No