Skip to content

Fix _is_package_available reporting available without a version#46125

Merged
vasqu merged 4 commits into
huggingface:mainfrom
blipbyte:fix-is-package-available-na-version
Jun 2, 2026
Merged

Fix _is_package_available reporting available without a version#46125
vasqu merged 4 commits into
huggingface:mainfrom
blipbyte:fix-is-package-available-na-version

Conversation

@blipbyte
Copy link
Copy Markdown
Contributor

@blipbyte blipbyte commented May 20, 2026

_is_package_available confuses a directory on sys.path with
an installed PyPI package, returning (True, "N/A"). Callers
that compare versions (is_kernels_available for example) then
crash on version.parse("N/A").

Fix: if no real version can be read, mark the package as
unavailable.

Found while running tests in PyCharm — its pytest runner adds
tests/ to sys.path, where tests/kernels/ shadows the real
PyPI kernels package.

Tests
Added regression test in tests/utils/test_import_utils.py — patches find_spec + import_module to simulate the namespace shadow; asserts (False, "N/A"). Crash before; passes after.
make check-code-quality && make typing && make check-repo pass.

AI
Drafted with Claude. I reviewed every line and ran the repro
locally. No existing issue or open PR.

@Rocketknight1
Copy link
Copy Markdown
Member

Yeah, I think the bug is real, but I'm not sure about this fix. I'm fine with a small agent PR to fix this, but is there a more robust approach, maybe using PACKAGE_DISTRIBUTION_MAPPING? We have a lot of code that does something like _is_package_available("optimum")[0] and I think that would still fail with this fix if there's a name conflict in sys.path.

@blipbyte blipbyte force-pushed the fix-is-package-available-na-version branch 2 times, most recently from 69137a7 to 9338c2a Compare May 21, 2026 13:41
@blipbyte
Copy link
Copy Markdown
Contributor Author

blipbyte commented May 21, 2026

Update: I tried the PACKAGE_DISTRIBUTION_MAPPING approach, but it caused a regression — packages_distributions() does not index several installed packages in my local env
(tokenizers, safetensors, numpy, packaging, filelock), so the gate rejected them and CI failed (e.g. TokenizersBackend could no longer be imported because
is_tokenizers_available() returned False).

Reverted to the original narrow fix in the editable-install fallback. It addresses the reported is_kernels_available() crash but doesn't cover
_is_package_available("foo")[0]-style callers against sys.path shadows, as you noted. Open to suggestions for a more robust check that doesn't depend on packages_distributions() coverage.

@blipbyte blipbyte force-pushed the fix-is-package-available-na-version branch 2 times, most recently from ea7e3c8 to c56dc32 Compare May 22, 2026 12:18
Comment thread src/transformers/utils/import_utils.py Outdated
# Note that this branch will almost never be run, so we do not import packages for nothing here
package = importlib.import_module(pkg_name)
package_version = getattr(package, "__version__", "N/A")
if package_version == "N/A":
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not happy with this because some packages just have messed up versioning tbh and it could affect more packages then

Iiuc the problem arises for a name clash with the kernels package? Is that correct? I think we should be more explicit here then if that's the case without affecting potentially other packages

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Gating on version was too broad — it even dropped rich for me locally, since it's installed but has no version attribute.

The thing that actually signals a sys.path directory shadow is that the import gives back neither a version nor a file. So I now only mark it unavailable when both are missing:

   if package_version == "N/A" and getattr(package, "__file__", None) is None:
       package_exists = False

That way installed-but-version-less packages stay available, and frozen builds (PyInstaller etc.) are fine too - they may not set file, but real packages still expose version. And since this lives in the metadata fallback, detection-only checks like google/protobuf aren't affected.

Let me know what you think!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that's interesting 🤔 do we have some reference for the __file__ being necessary? It sounds reasonable but would be nice if there was something official about this

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, it's in PEP 420: "Namespace packages have no __file__ attribute." A bare dir shadowing the package on sys.path imports as a namespace package, so it has no version and no __file__, while a real install always has __file__ set. That's why I check both, not just the missing version.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM then, if we additionally add this as comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I've read a bit through the pep (sorry shouldve done this before). We could also overshadow namespace packages with this, no? E.g. if I look into the specifications, then those are allowed to not have __file__. Or am I understanding something wrong?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No — this check only runs after importlib.metadata.version() has already failed. A real namespace package with any installed sub-distribution is resolved via metadata first and never reaches it. The "no __version__ + no __file__" combo only survives for an empty shadow dir.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But couldn't there be a situation where the namespace package also doesn't have a version? Versions are not mandatory afaik?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question! You're right that versions are optional. But I think we only reach this branch when no package is installed under that name at all. So if there's also no __file__, it should just be an empty folder on sys.path rather than a real namespace package.

Worth mentioning too: this whole branch only runs when return_version=True, so bool-only callers like _is_package_available("foo")[0] are unaffected. Keeps the change scoped to the version-reporting bug from the issue.

Happy to adjust if I'm missing a case though.

@blipbyte
Copy link
Copy Markdown
Contributor Author

Pushed: added the PEP 420 comment + the __file__ gate and two regression tests. Ready for re-review

@blipbyte
Copy link
Copy Markdown
Contributor Author

The 8 failures all look unrelated — they're in cohere2_moe (TP multiprocess + an overfit flake), nothing touching _is_package_available. Happy to rebase if a re-run would help

@vasqu
Copy link
Copy Markdown
Contributor

vasqu commented Jun 1, 2026

The ci should be fixed on main, no worries not on you

Copy link
Copy Markdown
Contributor

@vasqu vasqu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets summarize the tests but should be fine to merge then, thanks for bearing with me on this. indeed its a super edge case where we want the version to be returned and it clashes with the sys path

Comment thread tests/utils/test_import_utils.py Outdated
assert modeling_auto.__name__ == "transformers.models.auto.modeling_auto"


def test_is_package_available_namespace_shadow_marked_unavailable():
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's make this one test with all the cases. imo its a bit overengineered to have all of these split

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consolidated the three tests into one parametrized case as requested, and rebased on main. Thanks for the review!

@vasqu
Copy link
Copy Markdown
Contributor

vasqu commented Jun 2, 2026

The failing tests should be fixed on main, just need merge/rebase on main

bob added 4 commits June 2, 2026 16:11
When find_spec resolves to a namespace package (e.g. a directory on
sys.path that shadows an optional PyPI dependency), the metadata
fallback returns getattr(package, "__version__", "N/A"). Returning
(True, "N/A") then breaks downstream callers that pass the value to
version.parse().
A versionless-but-installed package (e.g. rich) has no __version__ but
still sets __file__; only a PEP 420 namespace package (a bare dir
shadowing on sys.path) has neither. Gate on both so real installs and
frozen builds stay available.
@blipbyte blipbyte force-pushed the fix-is-package-available-na-version branch from 63bb7bb to 8b50558 Compare June 2, 2026 13:15
@vasqu vasqu enabled auto-merge June 2, 2026 13:37
@vasqu
Copy link
Copy Markdown
Contributor

vasqu commented Jun 2, 2026

Merging, thanks a lot!

@vasqu vasqu added this pull request to the merge queue Jun 2, 2026
@HuggingFaceDocBuilderDev
Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jun 2, 2026
@vasqu vasqu added this pull request to the merge queue Jun 2, 2026
Merged via the queue into huggingface:main with commit 18d845c Jun 2, 2026
29 checks passed
@blipbyte blipbyte deleted the fix-is-package-available-na-version branch June 3, 2026 10:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants