Python: Allow absolute imports in directories with scripts #5506

tausbn · 2021-03-23T17:42:19Z

Fixes the import logic to account for absolute imports.

We do this by classifying which files and folders may serve as the
entry point for execution, based on a few simple heuristics. If the
file module.py is in the same folder as a file main.py that may be
executed directly, then we allow module to be a valid name for
module.py so that import module will work as expected.

There are still some issues around namespace packages, but the fix for these is beyond the scope of the present PR.

The first commit message goes into quite some detail about the peculiarities of the test setup.

As this is fixing a minor but annoying bug, I don't think a change note is necessary (but I will gladly be convinced otherwise).

This one will require some explanation... First, the file structure. This commit adds a test consisting representing a few different kinds of imports. - Absolute imports, from `module.py` to `main.py` when the latter is executed directly. - A package (contained in the `package` folder) - A namespace package (contained in the `namespace_package` folder) All of these are inside a folder called `code` for reasons I will detail later. The file `main.py` is identified as a script, by the presence of the `!#` comment in its first line. The files themselves are executable, and `python3 main.py` will print out all modules in the order they are imported. The test itself is very simple. It simply lists all modules and their corresponding names. As is plainly visible, without modification we only pick up `package` and its component modules as having names. This is the bit that needs to be fixed. Convincing the test runner to extract this test in a way that mimics reality is, unfortunately, a bit complicated. By default, the test runner itself includes any Python files in the test directory as modules in the invocation of the extractor, and so we must hide everything in the `code` subdirectory. Secondly, a `--path` argument (set to the test directory) is automatically added, and this would also interfere with extraction, and hence we must prevent this. Luckily, if we supply our own `--path` argument -- even if it doesn't make any sense -- then the other argument is left out. Finally, we must actually tell the extractor to extract the files (or it would just happily pass the test with zero files extracted), so the `-R .` argument ensures that we recurse over the files in the test directory after all.

Fixes the import logic to account for absolute imports. We do this by classifying which files and folders may serve as the entry point for execution, based on a few simple heuristics. If the file `module.py` is in the same folder as a file `main.py` that may be executed directly, then we allow `module` to be a valid name for `module.py` so that `import module` will work as expected.

yoff

Seems very reasonable

yoff · 2021-03-24T07:57:03Z

python/ql/src/semmle/python/Files.qll

+      // The file doesn't have the extension `.py` but still contains Python statements
+      not this.getExtension() = "py" and
+      exists(Stmt s | s.getLocation().getFile() = this)
+      or


Have we seen this? Can we have a test for this?

I haven't personally seen it in the wild, but I based the above on this similar construction in ObjectAPI.qll, so I assume there's a reason for it. (I added the check that the file contains Python statements, because we don't want random non-Python files to affect the status of the Python files in the same directory.)

That being said, I'm not sure we can test this without modifying the extractor. I tried renaming main.py to have a non-.py extension, and I can include it directly using --filter:include=..., but even though the file appears in the database, it seems we're not interpreting it as Python code. My best guess is that it is being filtered out by a more restrictive "is this a script" check in the extractor, which seems unfortunate.

I'm also thinking that the above should read this.getExtension().matches("py%"), since there are many Python-related files that have extensions starting with py, and we certainly don't want to have those affecting the analysis, even if they do contain Python statements.

Ok, I think it is fine to have this structure in place.

Oh, I do this every now and then if I write a small helper script in Python, that just needs to look like an executable. With the #!/usr/bin/env python everything works out anyway 👍

OHH, but that is covered by the exists(Comment c | case... So I guess this is for python my-script-without-extension, which does seem strange.

I'm also thinking that the above should read this.getExtension().matches("py%"), since there are many Python-related files that have extensions starting with py, and we certainly don't want to have those affecting the analysis, even if they do contain Python statements.

Hmm, is this about .pyc, .pyo or .pyd files? I'm under the impression that since they contain byte-code, that wouldn't be valid Python statements.

(This last point is not super import to me, just curious)

Indeed, not all .py? files contain Python code, so those would be filtered out anyway. I was thinking more of stub files like .pyi which do contain valid Python, but not code that is actually executed as a script.

(I also found this random list of Python extensions. There are more than I had expected.)

Good point about .pyi 👍 I personally didn't know about .py3 usage either 😄

Note that the test in `no_py_extension` isn't complete, since we're not extracting the `main` file there.

Sadly, it seems we're not interpreting this as Python code, even if we explicitly ask to have it included.

yoff

LGTM

yoff · 2021-03-24T13:42:57Z

Merging this now as I believe it fixes a bunch of grievances. We can always refine it later..

…ute-imports-from-source-directory" This reverts commit 8d15680, reversing changes made to 63831cc. This PR caused performance problems, so reverting now to clear up immediate problems.

Python: Revert #5506 due to bad performance

tausbn added 2 commits March 23, 2021 18:21

tausbn added the no-change-note-required This PR does not need a change note label Mar 23, 2021

tausbn requested a review from a team as a code owner March 23, 2021 17:42

github-actions bot added the Python label Mar 23, 2021

yoff previously approved these changes Mar 24, 2021

View reviewed changes

tausbn added 3 commits March 24, 2021 13:15

Python: Test all cases

6d86239

Note that the test in `no_py_extension` isn't complete, since we're not extracting the `main` file there.

Python: Include unmarked Python file in snapshot

8d30ee5

Sadly, it seems we're not interpreting this as Python code, even if we explicitly ask to have it included.

Python: Disregard all files matching .py%

47686a6

tausbn dismissed yoff’s stale review via 47686a6 March 24, 2021 13:03

yoff approved these changes Mar 24, 2021

View reviewed changes

yoff merged commit 8d15680 into github:main Mar 24, 2021

tausbn deleted the python-allow-absolute-imports-from-source-directory branch March 24, 2021 16:41

RasmusWL mentioned this pull request Mar 29, 2021

Python: Revert #5506 due to bad performance #5552

Merged

calumgrant added a commit that referenced this pull request Mar 31, 2021

Merge pull request #5552 from RasmusWL/revert-import-change

49d1937

Python: Revert #5506 due to bad performance

tausbn mentioned this pull request Apr 6, 2021

Python: Allow absolute imports from source directory #5614

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Python: Allow absolute imports in directories with scripts #5506

Python: Allow absolute imports in directories with scripts #5506

Uh oh!

tausbn commented Mar 23, 2021

Uh oh!

yoff left a comment

Uh oh!

yoff Mar 24, 2021

Uh oh!

tausbn Mar 24, 2021

Uh oh!

yoff Mar 24, 2021

Uh oh!

RasmusWL Mar 24, 2021

Uh oh!

tausbn Mar 24, 2021

Uh oh!

RasmusWL Mar 24, 2021

Uh oh!

yoff left a comment

Uh oh!

yoff commented Mar 24, 2021

Uh oh!

Uh oh!

Python: Allow absolute imports in directories with scripts #5506

Python: Allow absolute imports in directories with scripts #5506

Uh oh!

Conversation

tausbn commented Mar 23, 2021

Uh oh!

yoff left a comment

Choose a reason for hiding this comment

Uh oh!

yoff Mar 24, 2021

Choose a reason for hiding this comment

Uh oh!

tausbn Mar 24, 2021

Choose a reason for hiding this comment

Uh oh!

yoff Mar 24, 2021

Choose a reason for hiding this comment

Uh oh!

RasmusWL Mar 24, 2021

Choose a reason for hiding this comment

Uh oh!

tausbn Mar 24, 2021

Choose a reason for hiding this comment

Uh oh!

RasmusWL Mar 24, 2021

Choose a reason for hiding this comment

Uh oh!

yoff left a comment

Choose a reason for hiding this comment

Uh oh!

yoff commented Mar 24, 2021

Uh oh!

Uh oh!