Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Uncaught Exception when reading Egg information #455

Closed
HonakerM opened this issue Apr 25, 2023 · 6 comments · Fixed by #482
Closed

Uncaught Exception when reading Egg information #455

HonakerM opened this issue Apr 25, 2023 · 6 comments · Fixed by #482

Comments

@HonakerM
Copy link

HonakerM commented Apr 25, 2023

Hello,

I manage a project using poetry and since importlib-metadata 6.5.1 my install has been failing with the following error. I'm trying to install an updated version of ansible while a current one is already installed. I think the issue is that pathlib is having trouble finding a relative path between the ansible binary /usr/local/bin/ansible and site-packages.

ansible [core 2.13.9]
  config file = None
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.8/site-packages/ansible
  ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/local/bin/ansible
  python version = 3.8.16 (default, Feb 11 2023, 02:53:15) [GCC 10.2.1 20210110]
  jinja version = 3.1.2
  libyaml = True
---
00:40:59  #12 42.71   ValueError
00:40:59  #12 42.71 
00:40:59  #12 42.71   '/usr/local/bin/ansible' does not start with '/usr/local/lib/python3.8/site-packages'
00:40:59  #12 42.71 
00:40:59  #12 42.71   at /usr/lib64/python3.8/pathlib.py:908 in relative_to
00:40:59  #12 42.83        904│         n = len(to_abs_parts)
00:40:59  #12 42.83        905│         cf = self._flavour.casefold_parts
00:40:59  #12 42.83        906│         if (root or drv) if n == 0 else cf(abs_parts[:n]) != cf(to_abs_parts):
00:40:59  #12 42.83        907│             formatted = self._format_parsed_parts(to_drv, to_root, to_parts)
00:40:59  #12 42.83     →  908│             raise ValueError("{!r} does not start with {!r}"
00:40:59  #12 42.83        909│                              .format(str(self), str(formatted)))
00:40:59  #12 42.83        910│         return self._from_parsed_parts('', root if n == 1 else '',
00:40:59  #12 42.83        911│                                        abs_parts[n:])
00:40:59  #12 42.83        912│ 

This error used to be caught by a way to broad broad contextlib.suppress(Exception) suppression which @jaraco's recent PR removed. I'm not that familiar with pythons package/importing mechanisms but could we add back a specific suppress(ValueError) to _read_files_egginfo_installed? We could also completely avoid the error by manually loop through and checking the path before finding its relative. I'm happy to open a PR with either change

@jaraco
Copy link
Member

jaraco commented Jun 19, 2023

could we add back a specific suppress(ValueError) to _read_files_egginfo_installed

Yes, maybe. I'd first want to concoct a test that captures the expectation that's being missed by not trapping it. That is, I want to have a better understanding of the circumstances that lead to the failure, confirm that those circumstances are legitimate and supported, and then replicate them in a test to ensure that the library continues to honor that circumstance.

Can you investigate what it is about the interactions between ansible and poetry that lead to the failure and create a (ideally minimal) reproducer that replicates that state without using poetry or ansible?

@dan-blanchard
Copy link
Contributor

I just encountered a similar instance of this issue, and the root cause is that pathlib.Path.relative_to() is not smart enough to add leading .. to make a path relative to site-packages even when it should. (See https://bugs.python.org/issue23082#msg366433)

If you switch to using os.path.relpath, it will add the leading .. instead of crashing.

I subclassed Distribution locally to work around this and changed Distribution._read_files_egginfo_installed to:

    def _read_files_egginfo_installed(self):
        """
        Read installed-files.txt and return lines in a similar
        CSV-parsable format as RECORD: each file should be placed
        relative to the site-packages directory and must be
        quoted (since file names can contain literal commas).

        This file is written when the package is installed by pip,
        but it might not be written for other installation methods.
        Assume the file is accurate if it exists.
        """
        text = self.read_text("installed-files.txt")
        # Prepend the .egg-info/ subdir to the lines in this file.
        # But this subdir is only available from PathDistribution's
        # self._path.
        subdir = getattr(self, "_path", None)
        if not text or not subdir:
            return

        site_pkgs_path = self.locate_file("").resolve()
        for name in text.splitlines():
            path = (subdir / name).resolve()
            try:
                path = path.relative_to(site_pkgs_path)
            except ValueError:
                # relpath will add .. to a path to make it relative to site-packages
                path = Path(os.path.relpath(path, site_pkgs_path))
            yield f'"{path.as_posix()}"'

I may change it to just always use os.path.relpath instead of handling the exception. Happy to make a PR if this seems reasonable.

@jaraco
Copy link
Member

jaraco commented Dec 3, 2023

Thanks dan-blanchard for the additional explanation. It does sound as if there may be a bug here. What I still don't understand is what are the conditions under which this failure occurs? Is there a way to replicate the conditions that trigger the failure? I'd like to avoid rewriting that function in a way that introduces extra branching and looping and mutated variables. If there is a deficiency in Path.relative_to, I'd like to see that deficiency documented (in a bug if a bug else in the docs) and a wrapper function replacing the .relative_to() call that encapsulates the workaround... something like:

diff --git a/importlib_metadata/__init__.py b/importlib_metadata/__init__.py
index 312d6966..0cfd1f47 100644
--- a/importlib_metadata/__init__.py
+++ b/importlib_metadata/__init__.py
@@ -524,14 +524,23 @@ class Distribution(DeprecatedNonAbstract):
             return
 
         paths = (
-            (subdir / name)
-            .resolve()
-            .relative_to(self.locate_file('').resolve())
-            .as_posix()
+            self._relative_to(
+                (subdir / name).resolve(),
+                self.locate_file('').resolve(),
+            ).as_posix()
             for name in text.splitlines()
         )
         return map('"{}"'.format, paths)
 
+    def _relative_to(self, path, root):
+        """
+        Workaround for ... where ".." isn't added.
+        """
+        try:
+            return path.relative_to(root)
+        except ValueError:
+            return pathlib.Path(os.path.relpath(path, root))
+
     def _read_files_egginfo_sources(self):
         """
         Read SOURCES.txt and return lines in a similar CSV-parsable

Such an approach would be minimally invasive to the current behavior and provide context for a future reader to understand why that method exists (and possibly when it can be removed).

But more important is to find an example case that fails and write a test from it that captures the missed expectation.

If you'd like to start on a PR with that test, that would be fantastic. Or if you're only able to devise a set of steps to recreate the failure, that would be useful for someone else to create such a test.

@dan-blanchard
Copy link
Contributor

I just put up a PR that I think should fix (and test) this.

@P403n1x87
Copy link

Based on a report that we had, and that is now linked in this issue, I don't think a missing .. is all there is to it. From the following traceback

Traceback (most recent call last):
  File "/Users/econnolly/source/repos/rivt/personalization/.venv/lib/python3.9/site-packages/ddtrace/internal/packages.py", line 101, in _package_file_mapping
    if ilmd_d is not None and ilmd_d.files is not None:
  File "/Users/econnolly/source/repos/rivt/personalization/.venv/lib/python3.9/site-packages/importlib_metadata/__init__.py", line 520, in files
    return skip_missing_files(
  File "/Users/econnolly/source/repos/rivt/personalization/.venv/lib/python3.9/site-packages/importlib_metadata/_functools.py", line 102, in wrapper
    return func(param, *args, **kwargs)
  File "/Users/econnolly/source/repos/rivt/personalization/.venv/lib/python3.9/site-packages/importlib_metadata/__init__.py", line 518, in skip_missing_files
    return list(filter(lambda path: path.locate().exists(), package_paths))
  File "/Users/econnolly/source/repos/rivt/personalization/.venv/lib/python3.9/site-packages/importlib_metadata/__init__.py", line 555, in <genexpr>
    (subdir / name)
  File "/opt/homebrew/bin/Cellar/pyenv/2.3.33/versions/3.9.7/lib/python3.9/pathlib.py", line 939, in relative_to
    raise ValueError("{!r} is not in the subpath of {!r}"
ValueError: '/Users/econnolly/source/repos/rivt/personalization/.venv/bin/futurize' is not in the subpath of '/Users/econnolly/source/repos/rivt/personalization/.venv/lib/python3.9/site-packages' OR one path is relative and the other is absolute.

we can see a path in the bin area of a venv, and the other in the lib area. My hunch is that the distribution is shipping entry points as part of its file contents, which would be located under the bin area. Trying to match that under the site-customize area will fail.

@dan-blanchard
Copy link
Contributor

Based on a report that we had, and that is now linked in this issue, I don't think a missing .. is all there is to it. From the following traceback

I think we're both saying the same thing in different ways. Some packages include absolute paths in installed-files.txt that are outside of site-packages. When I was looking into the issue, this seems to happen pretty commonly with jupyterlab extension packages. pathlib.Path.relative_to() does not support trying to figure out how two absolute paths are relative to each other (unless one is a subpath of the other), whereas os.path.relpath does, so using that instead seems to solve the issue.

For one of Coiled's projects where we kept running into this issue, I already worked around it by subclassing Distribution as mentioned in #455 (comment), and things generally worked as expected. #482 should fix it as well.

majorgreys added a commit to DataDog/dd-trace-py that referenced this issue Jan 25, 2024
## Description

- Pin importlib to 6.5.0 to avoid issue
python/importlib_metadata#455
- Fix a potential unbound local error.

Fixes #8068 

## Checklist

- [X] Change(s) are motivated and described in the PR description.
- [X] Testing strategy is described if automated tests are not included
in the PR.
- [X] Risk is outlined (performance impact, potential for breakage,
maintainability, etc).
- [X] Change is maintainable (easy to change, telemetry, documentation).
- [X] [Library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
are followed. If no release note is required, add label
`changelog/no-changelog`.
- [X] Documentation is included (in-code, generated user docs, [public
corp docs](https://github.com/DataDog/documentation/)).
- [X] Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist

- [x] Title is accurate.
- [x] No unnecessary changes are introduced.
- [x] Description motivates each change.
- [x] Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes unless absolutely necessary.
- [x] Testing strategy adequately addresses listed risk(s).
- [x] Change is maintainable (easy to change, telemetry, documentation).
- [x] Release note makes sense to a user of the library.
- [x] Reviewer has explicitly acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment.
- [x] Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
- [x] If this PR touches code that signs or publishes builds or
packages, or handles credentials of any kind, I've requested a review
from `@DataDog/security-design-and-guidance`.
- [x] This PR doesn't touch any of that.

---------

Signed-off-by: Juanjo Alvarez <juanjo.alvarezmartinez@datadoghq.com>
Co-authored-by: Tahir H. Butt <tahir.butt@datadoghq.com>
Co-authored-by: Zachary Groves <32471391+ZStriker19@users.noreply.github.com>
github-actions bot pushed a commit to DataDog/dd-trace-py that referenced this issue Jan 25, 2024
## Description

- Pin importlib to 6.5.0 to avoid issue
python/importlib_metadata#455
- Fix a potential unbound local error.

Fixes #8068

## Checklist

- [X] Change(s) are motivated and described in the PR description.
- [X] Testing strategy is described if automated tests are not included
in the PR.
- [X] Risk is outlined (performance impact, potential for breakage,
maintainability, etc).
- [X] Change is maintainable (easy to change, telemetry, documentation).
- [X] [Library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
are followed. If no release note is required, add label
`changelog/no-changelog`.
- [X] Documentation is included (in-code, generated user docs, [public
corp docs](https://github.com/DataDog/documentation/)).
- [X] Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist

- [x] Title is accurate.
- [x] No unnecessary changes are introduced.
- [x] Description motivates each change.
- [x] Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes unless absolutely necessary.
- [x] Testing strategy adequately addresses listed risk(s).
- [x] Change is maintainable (easy to change, telemetry, documentation).
- [x] Release note makes sense to a user of the library.
- [x] Reviewer has explicitly acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment.
- [x] Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
- [x] If this PR touches code that signs or publishes builds or
packages, or handles credentials of any kind, I've requested a review
from `@DataDog/security-design-and-guidance`.
- [x] This PR doesn't touch any of that.

---------

Signed-off-by: Juanjo Alvarez <juanjo.alvarezmartinez@datadoghq.com>
Co-authored-by: Tahir H. Butt <tahir.butt@datadoghq.com>
Co-authored-by: Zachary Groves <32471391+ZStriker19@users.noreply.github.com>
(cherry picked from commit c8d5e9f)
github-actions bot pushed a commit to DataDog/dd-trace-py that referenced this issue Jan 25, 2024
## Description

- Pin importlib to 6.5.0 to avoid issue
python/importlib_metadata#455
- Fix a potential unbound local error.

Fixes #8068

## Checklist

- [X] Change(s) are motivated and described in the PR description.
- [X] Testing strategy is described if automated tests are not included
in the PR.
- [X] Risk is outlined (performance impact, potential for breakage,
maintainability, etc).
- [X] Change is maintainable (easy to change, telemetry, documentation).
- [X] [Library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
are followed. If no release note is required, add label
`changelog/no-changelog`.
- [X] Documentation is included (in-code, generated user docs, [public
corp docs](https://github.com/DataDog/documentation/)).
- [X] Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist

- [x] Title is accurate.
- [x] No unnecessary changes are introduced.
- [x] Description motivates each change.
- [x] Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes unless absolutely necessary.
- [x] Testing strategy adequately addresses listed risk(s).
- [x] Change is maintainable (easy to change, telemetry, documentation).
- [x] Release note makes sense to a user of the library.
- [x] Reviewer has explicitly acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment.
- [x] Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
- [x] If this PR touches code that signs or publishes builds or
packages, or handles credentials of any kind, I've requested a review
from `@DataDog/security-design-and-guidance`.
- [x] This PR doesn't touch any of that.

---------

Signed-off-by: Juanjo Alvarez <juanjo.alvarezmartinez@datadoghq.com>
Co-authored-by: Tahir H. Butt <tahir.butt@datadoghq.com>
Co-authored-by: Zachary Groves <32471391+ZStriker19@users.noreply.github.com>
(cherry picked from commit c8d5e9f)
github-actions bot pushed a commit to DataDog/dd-trace-py that referenced this issue Jan 25, 2024
## Description

- Pin importlib to 6.5.0 to avoid issue
python/importlib_metadata#455
- Fix a potential unbound local error.

Fixes #8068

## Checklist

- [X] Change(s) are motivated and described in the PR description.
- [X] Testing strategy is described if automated tests are not included
in the PR.
- [X] Risk is outlined (performance impact, potential for breakage,
maintainability, etc).
- [X] Change is maintainable (easy to change, telemetry, documentation).
- [X] [Library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
are followed. If no release note is required, add label
`changelog/no-changelog`.
- [X] Documentation is included (in-code, generated user docs, [public
corp docs](https://github.com/DataDog/documentation/)).
- [X] Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist

- [x] Title is accurate.
- [x] No unnecessary changes are introduced.
- [x] Description motivates each change.
- [x] Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes unless absolutely necessary.
- [x] Testing strategy adequately addresses listed risk(s).
- [x] Change is maintainable (easy to change, telemetry, documentation).
- [x] Release note makes sense to a user of the library.
- [x] Reviewer has explicitly acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment.
- [x] Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
- [x] If this PR touches code that signs or publishes builds or
packages, or handles credentials of any kind, I've requested a review
from `@DataDog/security-design-and-guidance`.
- [x] This PR doesn't touch any of that.

---------

Signed-off-by: Juanjo Alvarez <juanjo.alvarezmartinez@datadoghq.com>
Co-authored-by: Tahir H. Butt <tahir.butt@datadoghq.com>
Co-authored-by: Zachary Groves <32471391+ZStriker19@users.noreply.github.com>
(cherry picked from commit c8d5e9f)
gnufede pushed a commit to DataDog/dd-trace-py that referenced this issue Jan 29, 2024
Backport c8d5e9f from #8075 to 2.5.

## Description

- Pin importlib to 6.5.0 to avoid issue
python/importlib_metadata#455
- Fix a potential unbound local error.

Fixes #8068 

## Checklist

- [X] Change(s) are motivated and described in the PR description.
- [X] Testing strategy is described if automated tests are not included
in the PR.
- [X] Risk is outlined (performance impact, potential for breakage,
maintainability, etc).
- [X] Change is maintainable (easy to change, telemetry, documentation).
- [X] [Library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
are followed. If no release note is required, add label
`changelog/no-changelog`.
- [X] Documentation is included (in-code, generated user docs, [public
corp docs](https://github.com/DataDog/documentation/)).
- [X] Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist

- [x] Title is accurate.
- [x] No unnecessary changes are introduced.
- [x] Description motivates each change.
- [x] Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes unless absolutely necessary.
- [x] Testing strategy adequately addresses listed risk(s).
- [x] Change is maintainable (easy to change, telemetry, documentation).
- [x] Release note makes sense to a user of the library.
- [x] Reviewer has explicitly acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment.
- [x] Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
- [x] If this PR touches code that signs or publishes builds or
packages, or handles credentials of any kind, I've requested a review
from `@DataDog/security-design-and-guidance`.
- [x] This PR doesn't touch any of that.

Co-authored-by: Juanjo Alvarez Martinez <juanjo.alvarezmartinez@datadoghq.com>
ZStriker19 pushed a commit to DataDog/dd-trace-py that referenced this issue Feb 1, 2024
Backport c8d5e9f from #8075 to 2.3.

## Description

- Pin importlib to 6.5.0 to avoid issue
python/importlib_metadata#455
- Fix a potential unbound local error.

Fixes #8068 

## Checklist

- [X] Change(s) are motivated and described in the PR description.
- [X] Testing strategy is described if automated tests are not included
in the PR.
- [X] Risk is outlined (performance impact, potential for breakage,
maintainability, etc).
- [X] Change is maintainable (easy to change, telemetry, documentation).
- [X] [Library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
are followed. If no release note is required, add label
`changelog/no-changelog`.
- [X] Documentation is included (in-code, generated user docs, [public
corp docs](https://github.com/DataDog/documentation/)).
- [X] Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist

- [x] Title is accurate.
- [x] No unnecessary changes are introduced.
- [x] Description motivates each change.
- [x] Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes unless absolutely necessary.
- [x] Testing strategy adequately addresses listed risk(s).
- [x] Change is maintainable (easy to change, telemetry, documentation).
- [x] Release note makes sense to a user of the library.
- [x] Reviewer has explicitly acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment.
- [x] Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
- [x] If this PR touches code that signs or publishes builds or
packages, or handles credentials of any kind, I've requested a review
from `@DataDog/security-design-and-guidance`.
- [x] This PR doesn't touch any of that.

Co-authored-by: Juanjo Alvarez Martinez <juanjo.alvarezmartinez@datadoghq.com>
gnufede pushed a commit to DataDog/dd-trace-py that referenced this issue Feb 1, 2024
Backport c8d5e9f from #8075 to 2.4.

## Description

- Pin importlib to 6.5.0 to avoid issue
python/importlib_metadata#455
- Fix a potential unbound local error.

Fixes #8068 

## Checklist

- [X] Change(s) are motivated and described in the PR description.
- [X] Testing strategy is described if automated tests are not included
in the PR.
- [X] Risk is outlined (performance impact, potential for breakage,
maintainability, etc).
- [X] Change is maintainable (easy to change, telemetry, documentation).
- [X] [Library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
are followed. If no release note is required, add label
`changelog/no-changelog`.
- [X] Documentation is included (in-code, generated user docs, [public
corp docs](https://github.com/DataDog/documentation/)).
- [X] Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist

- [x] Title is accurate.
- [x] No unnecessary changes are introduced.
- [x] Description motivates each change.
- [x] Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes unless absolutely necessary.
- [x] Testing strategy adequately addresses listed risk(s).
- [x] Change is maintainable (easy to change, telemetry, documentation).
- [x] Release note makes sense to a user of the library.
- [x] Reviewer has explicitly acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment.
- [x] Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
- [x] If this PR touches code that signs or publishes builds or
packages, or handles credentials of any kind, I've requested a review
from `@DataDog/security-design-and-guidance`.
- [x] This PR doesn't touch any of that.

Co-authored-by: Juanjo Alvarez Martinez <juanjo.alvarezmartinez@datadoghq.com>
Co-authored-by: Emmett Butler <723615+emmettbutler@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants