Import inconsistencies between languages #975
Comments
Yep, in both cases they should really match the entire path. |
I think it would be useful to apply consistent behavior to all supported languages. It seems we have a few other open issues for this, I'll include them here so we can track in one place: |
@DrewDennison @aryx Thoughts on where/when to prioritize this work? |
For what it's worth, I've had this bite me a few times, in Java and Python. If it's not too hard to support matching the full path of imports, I would be happy 🙏 |
Do you have a semgrep.dev example of what you would like? Just so I can use that later as a set of tests. |
Hm, to be honest I'm not totally sure what behavior Semgrep should have. There are 2 main contenders in my mind:
I think either is OK, as long as we’re consistent across languages.
So maybe this comes down to some meta questions about Semgrep’s philosophy:
Import aliasing and matching function annotations or keyword dict args regardless of order are great examples of when Semgrep abstracts things for users in a way that’s intuitive (does the expected thing) and makes users’ lives easier. I could see this case going either way |
I think it's worth going over all the different import styles in Python. Python is fairly representative here, so we can extrapolate to other languages as well. To my knowledge, here are the different ways you can import something in Python:
import foo.bar.baz
import foo.bar as baz
from foo.bar import baz
from foo.bar import baz as qux
from foo import bar, baz
from foo import bar as baz, qux as quine
# We can avoid wildcard imports for now
from foo import *
# Let's also avoid all relative imports for now
from . import foo
IMO, we should always return the FQIN (fully-qualified import name) as the metavariable
The last example is another shortcoming of the automagic import detection initially highlighted in #806. We can put that aside for now. Here are the reasons I think always returning the FQIN is superior:
IMO it's fine to use the aliased import name when showing results on the CLI. Users will naturally be looking at code side-by-side with CLI Semgrep results. Using the aliased name makes matching those easier. However, users will typically be feeding JSON results into some post-processing functionality. It's unlikely that post-processing functionality is referencing back to the code, so returning the FQIN makes sense in JSON output. |
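To make the FQIN proposal concrete, here is a minimal sketch using Python's standard `ast` module. It is not how Semgrep implements import matching; the helper name `fully_qualified_imports` is hypothetical and only illustrates which name each import style would yield. Wildcard and relative imports are skipped, as suggested above.

```python
import ast


def fully_qualified_imports(source):
    """Return (fqin, local_name) pairs for each absolute import in `source`.

    Hypothetical helper for illustration only; wildcard and relative
    imports are skipped.
    """
    results = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # `import foo.bar.baz` binds the top-level name `foo`;
                # `import foo.bar as baz` binds `baz`.
                local = alias.asname or alias.name.split(".")[0]
                results.append((alias.name, local))
        elif isinstance(node, ast.ImportFrom):
            if node.level > 0:
                continue  # relative import, e.g. `from . import foo`
            for alias in node.names:
                if alias.name == "*":
                    continue  # wildcard import
                fqin = f"{node.module}.{alias.name}"
                results.append((fqin, alias.asname or alias.name))
    return results
```

Each imported name produces its own pair, so a line that imports two names yields two entries. Under the proposal above, the second element (the local or aliased name) is what a CLI display could show next to the code, while the first element (the FQIN) is what JSON output would carry.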
@mschwager Thanks so much for the details and examples, this comment is 💯 I agree with your reasoning, I think having the metavariable return the fully-qualified import name makes the most sense. In the last two examples you provided:
from foo import bar, baz
from foo import bar as baz, qux as quine
To me, I think the expected behavior should be what you've proposed, 2 matches for each line:
|
Ok I'll have a try at it next week. |
Fixes #1771
This should also help #975
Test plan:
$ semgrep -f /tmp/test2.yml tests/python/import_metavar_fullpath.py
running 1 rules...
tests/python/import_metavar_fullpath.py
severity:error rule:tmp.import-auth: spooky import
12:import a.auth
14:import b.auth
16:import a.b.auth
ran 1 rules on 1 files: 3 findings
Here are results on develop with the latest fix:
Note that $X contains the right thing internally (if you look at the value bound to $X), but the message we output sometimes shows the wrong thing because of the new way we print the content of a metavariable. We used to rely on abstract_content, but there were some problems with missing tokens, so now @brendongo is using the range of the matched code. Here, if you do from foo.bar import baz, then $X will be bound to foo.bar.baz and the range will run from foo to baz, which unfortunately includes the 'import' token ... |
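The mismatch between the bound value and the printed range can be reproduced with plain string slicing. This is only an illustration of the effect described above, not Semgrep's printing code; `text_of_range` is a hypothetical helper that treats tokens as substrings of a single line.

```python
def text_of_range(line, first_token, last_token):
    """Return the source text spanning first_token through last_token.

    Hypothetical helper: tokens are located by substring search on one line.
    """
    start = line.index(first_token)
    end = line.index(last_token, start) + len(last_token)
    return line[start:end]


line = "from foo.bar import baz"
bound_value = "foo.bar.baz"                   # what $X holds internally
printed = text_of_range(line, "foo", "baz")   # what a range-based printer shows
```

Here `printed` covers everything between the two tokens in the source, so it picks up the `import` keyword and differs from `bound_value`.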
Still, there's progress, we now bind the fully qualified name in Python. |
And now we bind the fully qualified name in Java |
w00t, slowly but surely! |
This issue is being marked |
Wave |
wave |
Looks like this is fixed actually -- at least all the issues in the original comment! @aryx I'm going to close but feel free to re-open if there's another thing we're waiting for |
Consider the following Python code:
When using a pattern looking for imports we get the following results:
Here the $X metavariable content is os. This doesn't feel correct looking at the code, I'd expect it to be os.path.join. An argument could be made for this behavior, but it doesn't even seem consistent across languages. For example, consider the following Java code:
Running a similar import check gives the following results:
Here the $X metavariable is statement: the last module, rather than the first one that Python uses. Personally I think the full module path should be used in both cases, but whatever we choose it should at least be consistent.
This feels similar to #806.