Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add --python-infer-string-imports #10609

Merged
merged 2 commits into from Aug 13, 2020

Conversation

Eric-Arellano
Copy link
Contributor

Certain Python projects, such as Django apps, make frequent use of dynamic imports. Dep inference needs to be able to understand those to work well with those code bases.

We turn string imports inference off by default because it could be surprising and it's much more likely to have false positives.

Rejected design (for now) - global ignores

With Toolchain's buildgen (which inspired dep inference), false positives were handled via a subsystem option that allowed users to put regex strings to ignore globally. At the time, there was no ! ignore syntax in the dependencies field, so this was the only option.

Now that we have ! ignores, we start with the simplest solution to only use that for false positives. If we get user feedback that they want a decentralized mechanism, we can easily add a subsystem.

(Also note that we will only infer dependencies on Python targets, whereas Toolchain's Buildgen would infer on files() and resources(). Many of the ignores in Toolchain were for static assets, like .css files.)

[ci skip-rust]
[ci skip-build-wheels]

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
result = (
owner.address
for owner in owner_per_import
if (
owner.address
and owner.address.maybe_convert_to_base_target() != request.sources_field.address
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally, we would not infer for a base target any dependencies on its own generated subtargets. That used to make sense. Now, though, with the model change, it makes sense for a base target to have a dependency on its generated subtarget.

@Eric-Arellano
Copy link
Contributor Author

@asherf I added you as an FYI, and if you have any feedback on the decision to not add a global option for now. No worries about reviewing the code.

@asherf
Copy link
Member

asherf commented Aug 13, 2020

honestly, I don't like this way of doing things.

Specifying stuff on a repo level won't scale very well and is confusing to a random user trying to add a new file with some string references to python file, I have been down this road and ended up having a regex that looks like this:

  '^([\\/]?\\w*\\/)+settings(_base|_util|_extra(_worker)?)?\\.py$',
  '^([\\/]?\\w*\\/)+settings/[a-z]*\\.py',
  '^([\\/]?\\w*\\/)+constants\\.py$',

UGH...
and this is for a relatively small repo (less than 60K loc of python code)

I think it is much better if we can add something at the python_library() target level.
either specify globs for python files from which to infer string dependencies from or specify a string regex on how for strings that represents references to python modules.

@Eric-Arellano
Copy link
Contributor Author

@asherf, to clarify, we are not adding global excludes like you had to use with Toolchain. The PR description justifies that decision, and explains that we can add it down the road if we want.

Right now, the only option is to use decentralized excludes like !bad/import/ignore_me:tgt in the dependencies field. It sounds like we're in agreement here.

@asherf
Copy link
Member

asherf commented Aug 13, 2020

What we had in Toolchain is not a global exclude, but a global include, i.e. a regex for filenames in which we should be looking for strings that looks like python imports.

OTOH, treating every string that looks like aaa.bbb.ccc as a python import is also a bad idea (you will pick up strings that point to domain names).

The other way is bad too because it will miss imports of external packages (used with django).
In many case, django packages are added as django apps (into the INSTALLED_APPS list) and as middleware (MIDDLWARE list):

https://django-csp.readthedocs.io/en/latest/installation.html#installing-django-csp
https://github.com/jdelic/django-postgresql-setrole#how-do-i-use-this-django-application
https://django-extensions.readthedocs.io/en/latest/installation_instructions.html#configuration
https://github.com/korfuri/django-prometheus#quickstart
and basically every other third party django package works like that, and most django project will use at least 5 of those (if I am being conservative).
see: https://djangopackages.org/ there are ton of extensions for django and most larger project will use a lot of external packages (internally, we use about ~10 packages)

Needing to specify those in the pants config is also not ideal.

The only thing that makes sense to have in the top level pants config is the default file names (regex) in which imports should be looked for (which is usually conftest.py) everything else should only be specified on the target level.

@coveralls
Copy link

Coverage Status

Coverage remained the same at 0.0% when pulling b432da7 on Eric-Arellano:import-strings into 8e73969 on pantsbuild:master.

@Eric-Arellano
Copy link
Contributor Author

treating every string that looks like aaa.bbb.ccc as a python import is also a bad idea (you will pick up strings that point to domain names).

This would be perfectly fine. While Pants will think that it's an import, it will fail to find an owning target for that domain name, so Pants will no-op. That owning target also must be a Python target (like python_library), so Pants won't try adding a dependency on imports for things that are actually files or resources.

Pants only infers a dependency if a) there is an owning target, b) that owning target is a Python target, and c) that is the sole owning target, i.e. there are not >1 owners.

specified on the target level

This would be achieved by adding a normal explicit dependency to the target like you would without inference enabled. That won't change.

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants