Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFE: emit a warning when a module in a traceback shadows a stdlib module #67997

Open
ncoghlan opened this issue Mar 30, 2015 · 8 comments
Open
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@ncoghlan
Copy link
Contributor

BPO 23809
Nosy @brettcannon, @birkenfeld, @terryjreedy, @ncoghlan, @larryhastings, @rbtcollins, @ericsnowcurrently, @zooba

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2015-03-30.01:00:31.468>
labels = ['type-feature']
title = 'RFE: emit a warning when a module in a traceback shadows a stdlib module'
updated_at = <Date 2015-04-14.14:12:57.788>
user = 'https://github.com/ncoghlan'

bugs.python.org fields:

activity = <Date 2015-04-14.14:12:57.788>
actor = 'steve.dower'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = []
creation = <Date 2015-03-30.01:00:31.468>
creator = 'ncoghlan'
dependencies = []
files = []
hgrepos = []
issue_num = 23809
keywords = []
message_count = 7.0
messages = ['239543', '239544', '239545', '239546', '239547', '239549', '240032']
nosy_count = 8.0
nosy_names = ['brett.cannon', 'georg.brandl', 'terry.reedy', 'ncoghlan', 'larry', 'rbcollins', 'eric.snow', 'steve.dower']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue23809'
versions = ['Python 3.5']

@ncoghlan
Copy link
Contributor Author

A colleague just ran into the issue where they created a "json.py" module in their local directory and broke a previously working program. I picked up on the mistake when I saw the following traceback snippet:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.7/site-packages/praw/__init__.py", line 27, in <module>
        import json
      File "json.py", line 1, in <module>
        r = requests.get("http://www.reddit.com/user/spilcm/about/.json")

However, actually making the connection required thinking "Wait, why is the stdlib JSON module calling out to Reddit?" followed immediately by "Oh, that's not the stdlib json module".

That connection would potentially be easier for new Python users to make if there was an inline notification printed after the traceback warning of the stdlib module shadowing. If nothing else, it would give them something specific to search for to lead them to a discussion of name shadowing and the problems that can arise when you name one of your local modules the same as a standard library module.

Offering such a feature would necessarily rely on having a standard way of getting a programmatic list of the standard library modules available to the running interpreter version without actually loading them all, a question which is discussed further in https://stackoverflow.com/questions/6463918/how-can-i-get-a-list-of-all-the-python-standard-library-modules and https://github.com/jackmaney/python-stdlib-list

Since the contents of the standard library for a given release is immutable, and we'd like to keep any such mechanism (if we decide to provide it) very simple so we can easily access it from the low level interpreter traceback machinery, my suggestion would be along the lines of:

  1. Update the interpreter build process to emit a flat text file containing a complete list of fully qualified stdlib module names (including submodules), one name per line (this will be generated at build time, so it *won't* contain filenames, just the module names - however, it could theoretically be combined with importlib.util.find_spec to generate a PEP-376 style RECORD file for the non-builtin parts of the standard library at installation time)
  2. Rather than exposing the underlying machinery to end users directly, instead expose a function in sysconfig to read that list of modules into memory via a read-only line based iterator. (There may be an internal API to allow access to the content without going through sysconfig)
  3. Updating the two main traceback display mechanisms (the builtin traceback display and the traceback module) to check that list for potential conflicts after displaying a traceback

(This idea could benefit from discussion on python-ideas and potentially even escalate into a full PEP, but I don't currently have the personal bandwidth available to pursue that course. Accordingly, I'm just filing the idea here in case someone else finds it intriguing and has more time available to pursue it)

@ncoghlan ncoghlan added the type-feature A feature request or enhancement label Mar 30, 2015
@rbtcollins
Copy link
Member

Why limit this to just stdlib shadowing?

A local module can shadow a top level module-or-package across the board. If we don't limit it to stdlib names, it becomes a lot easier to implement.

@rbtcollins
Copy link
Member

Implementation wise: this is not part of the generic rendering-of-tracebacks; I'd like to make the traceback new stuff be tastefully extensible - I'd be inclined to do this with a per-frame-callback on render (so we don't pay overhead on unrendered tb's) and a global variable that controls the default hooks loaded into the objects.

That would:

  • allow folk using the raw API to control what hooks are at play
  • allow third-parties to install additional things globally, to influence code using the existing old tb apis
  • make testing of the warning specific code be very straight forward and isolated

@ncoghlan
Copy link
Contributor Author

I proposed limiting it to stdlib names as that's the case where we see the most beginner confusion (experimenting with sockets in a file named "socket.py", etc), and the one where we can generate a comprehensive list of known module names ahead of time (so in most cases we won't need to touch the filesystem to determine there's no shadowing happening once the list of stdlib modules has been pulled in to memory).

Handling the general case means we have to instead rely on rescanning the import system at runtime. On the other hand, we're already doing something along those lines through linecache to handle retrieving source lines, and the import system itself already has caches everywhere (hence importlib.invalidate_caches()), so adding yet-another-cache to be able to check for module shadowing efficiently wouldn't be that big a deal.

Adding Brett & Eric to get some additional perspectives on the idea of doing checks for name shadowing through a new importlib.util API rather than a separate stdlib specific mechanism.

@ncoghlan
Copy link
Contributor Author

On the implementation front, +1 for looking at a per-frame callback API rather than hardcoding this directly into the existing traceback rendering code.

@ncoghlan
Copy link
Contributor Author

I had an idea for a possible importlib.util API to support this capability: an "ignore_entries=0" default arg to https://docs.python.org/dev/library/importlib.html#importlib.util.find_spec

The new arg would say how many found entries to skip when looking for the module, defaulting to returning the first hit as it does now.

The challenging part would be that all the implicit import system caching is currently built around the notion of "there can be only one", so we'd likely want to introduce something like a new name shadowing cache that mapped "(modname, ignored_entries)" pairs to module specs (and was cleared by importlib.invalidate_caches() along with everything else).

@terryjreedy
Copy link
Member

This is a fairly common newbie bug. random.py seems to be favorite name.

@terryjreedy
Copy link
Member

The traceback snippet is not from executing the broken program but from interactive import json. The snippet is missing any following lines, including the essential exception and message. How could requests.get on the first line of json.py execute unless requests were added to the builtins module?

A quite different case is from mod import ob. If the shadow mod does not define ob, as is usual, the current message is
ImportError: cannot import name 'ob' from 'mod' (<path to mod>) This specific error could trigger a check whether mod is a non-stdlib module shadowing an stdlib module and if so, add a hint.

The presented case where import mod succeeds and some code sometime later raises some exception seems much harder to do anything about.

@iritkatriel iritkatriel added the stdlib Python modules in the Lib dir label Dec 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

4 participants