Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix MyPy to not directly run over dependencies #11553

Closed
Eric-Arellano opened this issue Feb 12, 2021 · 2 comments
Closed

Fix MyPy to not directly run over dependencies #11553

Eric-Arellano opened this issue Feb 12, 2021 · 2 comments

Comments

@Eric-Arellano
Copy link
Contributor

Right now, we instruct MyPy to run over not only the specified files, but all transitive deps. A user found that this results in type checking more than the user actually intended. It is true that we need all transitive deps to be in the chroot, but we shouldn't be telling MyPy to run directly on those.

FYI, how it works now

We partition MyPy based on interpreter constraints, which allows for you to run over a codebase that might have some parts only Py2 and another part only Py3:

@dataclass(frozen=True)
class MyPyPartition:
field_set_addresses: FrozenOrderedSet[Address]
closure: FrozenOrderedSet[Target]
interpreter_constraints: PexInterpreterConstraints
python_version_already_configured: bool

All the interesting work happens on partitions, in this rule:

@rule
async def mypy_typecheck_partition(partition: MyPyPartition, mypy: MyPy) -> TypecheckResult:

We get all the source files for the partition here, which will continue to be the same so that we can include the files in our chroot:

typechecked_sources_request = Get(
PythonSourceFiles, PythonSourceFilesRequest(partition.closure)
)

What will change is this:

typechecked_srcs_snapshot = typechecked_sources.source_files.snapshot
file_list_path = "__files.txt"
python_files = "\n".join(
determine_python_files(typechecked_sources.source_files.snapshot.files)
)

We won't run on all of typechecked_sources anymore, only what was specified. (FYI, see https://www.pantsbuild.org/docs/rules-api-file-system#core-abstractions-digest-and-snapshot for what the "snaphsot" part means.)

"roots" vs. "closure"

Instead, we need to figure out which files were specified by the user. How? Looking back at the below definition, the field_set_addresses represent the roots, meaning what the user explicitly said to run on. Meanwhile, closure represents the transitive closure: all roots + transitive deps.

@dataclass(frozen=True)
class MyPyPartition:
field_set_addresses: FrozenOrderedSet[Address]
closure: FrozenOrderedSet[Target]
interpreter_constraints: PexInterpreterConstraints
python_version_already_configured: bool

We want to operate on the roots, i.e. the field_set_addresses. We need to get the Python source files that belong specifically to those roots, whereas now we're getting it for the whole closure. We'll use the same approach as before, sayingGet(PythonSourceFiles, PythonSourceFilesRequest(..)), which will run a rule that gets us the relevant Python source files; only, we'll run it over just the roots, rather than the whole closure.

Part 1: refactor

First, there's a refactor to facilitate this change. Rather than storing on MyPyPartition the field field_set_addresses: FrozenOrderedSet[Address], change it to root_targets: FrozenOrderedSet[Target]. We're going to need Target objects in a moment, and this is also a clearer name.

Then, fix our rule that creates the partitions. Update to combined_roots.update(transitive_targets.roots):

for transitive_targets in all_transitive_targets:
combined_roots.update(tgt.address for tgt in transitive_targets.roots)
combined_closure.update(transitive_targets.closure)

You'll need to fix this too, use tgt.address for tgt in partitions.root_targets to convert from Target -> Address:

(addr for addr in partition.field_set_addresses),

Check that everything works by running ./pants typecheck src/python/pants/util/strutil.py.

Part 2: resolve the roots' files

See this:

plugin_sources_request = Get(
PythonSourceFiles, PythonSourceFilesRequest(plugin_transitive_targets.closure)
)
typechecked_sources_request = Get(
PythonSourceFiles, PythonSourceFilesRequest(partition.closure)
)

First, that name typechecked_sources_request is confusing now. Really, it should probably be something like closure_sources_request. We're not going to typecheck every one of those files anymore.

Then, add something like roots_sources_request = Get(PythonSourceFiles, PythonSourceFilesRequest(partition.root_targets)). Feel free to use a better var name.

Add to this await MultiGet, which is what will cause the engine to actually execute the relevant code:

(
plugin_sources,
typechecked_sources,
mypy_pex,
requirements_pex,
config_digest,
) = await MultiGet(
plugin_sources_request,
typechecked_sources_request,
mypy_pex_request,
requirements_pex_request,
config_digest_request,
)

Finally, update this to point to your new PythonSourceFiles object with just the roots, rather than the original one with the closure.

typechecked_srcs_snapshot = typechecked_sources.source_files.snapshot

Update this:

description=f"Run MyPy on {pluralize(len(typechecked_srcs_snapshot.files), 'file')}.",

That should be it! Check that things are working how you want. It can be helpful to add logging statements to debug. https://www.pantsbuild.org/docs/rules-api-tips#tip-add-logging

--

This function will still stay the same to normalize file names, but the docstring should be tweaked as we won't run over every file anymore:

def determine_python_files(files: Iterable[str]) -> Tuple[str, ...]:
"""We run over all .py and .pyi files, but .pyi files take precedence.
MyPy will error if we say to run over the same module with both its .py and .pyi files, so we
must be careful to only use the .pyi stub.
"""
result: OrderedSet[str] = OrderedSet()
for f in files:
if f.endswith(".pyi"):
py_file = f[:-1] # That is, strip the `.pyi` suffix to be `.py`.
result.discard(py_file)
result.add(f)
elif f.endswith(".py"):
pyi_file = f + "i"
if pyi_file not in result:
result.add(f)
return tuple(result)

Testing

You'll also want to check that mypy/rules_integration_test.py works correctly still. You may need to tweak some tests.

It'd be helpful to add a simple test that ensures this behavior works correctly. We're going to be changing this MyPy code in the future and we don't want to regress. Look at how the other tests are doing it to see how to set up, and also refer to https://www.pantsbuild.org/docs/rules-api-testing.

christophermaier added a commit to grapl-security/grapl that referenced this issue Mar 3, 2021
This adds MyPy to Pants for typechecking Pulumi code. We're not
introducing MyPy more broadly at this time until
pantsbuild/pants#11553 has been
addressed.

Signed-off-by: Christopher Maier <chris@graplsecurity.com>
@Eric-Arellano
Copy link
Contributor Author

Thanks to @hephex for fixing this!

@wimax-grapl
Copy link
Contributor

Thanks for fixing this <3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants