
@seanh seanh commented Jul 13, 2022

This re-adds the non-boilerplate files that #5 (which this PR is based on) removed in order to facilitate code review. Also adds tests.

Once again a review of the meat is required (README.md, src/* and tests/*) and the cookiecutter stuff (.cookiecutter/*) can be reviewed when we review the cookiecutter itself.

How to test manually

The pip-sync-faster concept is pretty simple: it calls pip-sync and stashes hashes of the requirements files in a pip_sync_faster.json file within the venv. If you run it again with the same requirements files and none of them has changed, it won't call pip-sync again. If any of the requirements files has changed, or if you call it with a different set of requirements files (even a subset of the ones you called it with last time), it will call pip-sync again.

The CLI of pip-sync-faster is the same as that of pip-sync: it passes all command line options and arguments blindly through to pip-sync, and it exits with pip-sync's exit status.
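The decision logic above can be sketched roughly like this (the function name `files_changed` and the cache handling here are illustrative, not the actual source; in the real tool the hashes would only be saved after pip-sync succeeds):

```python
import hashlib
import json
from pathlib import Path


def files_changed(src_files, cache_path):
    """Return True if the hashes of src_files differ from the cached ones.

    When they differ, store the new hashes so the next run can skip pip-sync.
    """
    hashes = {
        str(path): hashlib.sha512(Path(path).read_bytes()).hexdigest()
        for path in src_files
    }
    try:
        cached = json.loads(Path(cache_path).read_text(encoding="utf-8"))
    except FileNotFoundError:
        cached = None

    if hashes == cached:
        # Same set of files, same contents: nothing to do.
        return False

    Path(cache_path).write_text(json.dumps(hashes), encoding="utf-8")
    return True
```

Because the cache maps file paths to hashes, calling it with a different *set* of files (even a subset) also makes the comparison fail, matching the behaviour described above.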

To test it:

  • Create and activate a venv

  • Create some test requirements files

  • Install pip-sync-faster in the venv:

    pip install -e /path/to/pip-sync-faster
    
  • It should be obvious from the output that it's calling pip-sync:

    pip-sync-faster requirements.txt
    
  • There is one testing gotcha: pip-sync-faster must itself be in the requirements file, otherwise it'll uninstall itself (it'll call pip-sync requirements.txt which'll uninstall anything that's not in requirements.txt, including pip-sync-faster).

    In testing this is a double gotcha: even if requirements.txt contains pip-sync-faster, pip-sync will replace your local development version with a copy from PyPI.

    So in testing a local development version you have to reinstall it every time:

    pip install -e /path/to/pip-sync-faster
    
  • If you run pip-sync-faster again with the same requirements file it won't call pip-sync:

    pip-sync-faster requirements.txt
    

Other things to test:

  • If you modify the requirements file and then re-call pip-sync-faster it will call pip-sync

  • If you call it with a different requirements file it will call pip-sync. If you then call it with the first requirements file again it'll call pip-sync again

  • Call it with multiple requirements files. If you call it again without changing any of the files it won't call pip-sync; if you change any of them it will

  • If you call it with a different set of requirements files (even a subset of what you previously called it with) it will call pip-sync

@seanh seanh changed the title from Revert "Remove all non-boilerplate files" to Re-add non-boilerplate code and add tests Jul 14, 2022
@seanh seanh force-pushed the re-add-non-boilerplate-files-and-add-tests branch from 87995cd to b429198 on July 14, 2022 at 19:56
@seanh seanh requested a review from jon-betts July 14, 2022 20:01
@seanh seanh marked this pull request as ready for review July 14, 2022 20:01
@seanh seanh marked this pull request as draft July 15, 2022 09:58
@seanh seanh force-pushed the re-add-non-boilerplate-files-and-add-tests branch from b429198 to 89f5331 on July 15, 2022 at 14:44
@seanh seanh requested a review from marcospri July 15, 2022 14:51
@seanh seanh marked this pull request as ready for review July 15, 2022 14:52
@seanh seanh force-pushed the re-add-non-boilerplate-files-and-add-tests branch from d2bab8e to 368d67b on July 18, 2022 at 11:12

@jon-betts jon-betts left a comment


Would this benefit from following a more standard script pattern?

I think there's an opportunity here to make it clearer that this is a script, which would help with understanding how it works and would also avoid all of the testing woes noted in the ticket.

We have a pretty common pattern in a lot of our scripts (along with chmod +x for extra points) which I think we can follow here:

#!/usr/bin/env python

from argparse import ArgumentParser

parser = ArgumentParser(description="A tool for ...")
...

def main():
    args = parser.parse_args()
    ...

if __name__ == '__main__':
    main()

That won't work verbatim here as there are more complex arg parsing requirements, but the general structure of the script could be followed. This has a few nice features:

  • It's obvious it's intended to be used as a script (not a library)
  • The docs for how to use it are integrated and front and center
  • You can use it directly on the command line, or test it

This would avoid having to install it into the env in order to test it.
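A minimal sketch of that pattern applied here (names and help strings are illustrative, not the PR's actual code); `parse_known_args()` keeps any unrecognised options in a separate list so they could be forwarded to pip-sync untouched:

```python
#!/usr/bin/env python
"""Run pip-sync, but skip it when no requirements file has changed."""
from argparse import ArgumentParser

parser = ArgumentParser(
    description="Run pip-sync, but skip it when no requirements file has changed."
)
parser.add_argument(
    "src_files", nargs="*", help="the requirements.txt files to synchronize"
)


def main(argv=None):
    # parse_known_args() returns (known_args, unrecognised_args); the
    # unrecognised ones can be passed straight through to pip-sync.
    args, extra = parser.parse_known_args(argv)
    return args.src_files, extra


if __name__ == "__main__":
    main()
```

Run directly on the command line it parses sys.argv; tests can call `main([...])` with an explicit argument list instead.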

Other stuff

The rest is mostly just random naming suggestions.

The only thing I'd say is definitely worth thinking about is whether pre-baked hashes in the tests are a better way to go. At the moment the code that is indirectly under test is used to set up the fixture.

Another approach might be to have tests which explicitly drive it from the outside and call the method twice, changing things in between or not. This way you'd only be testing the externally observable behavior, without having to bake in any hashes if that made you uncomfortable.
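Such an outside-in test could look roughly like this, with a stand-in sync() and unittest.mock counting the pip-sync invocations (all names here are hypothetical, not the PR's test code):

```python
import hashlib
import json
import subprocess
from pathlib import Path
from unittest import mock


def sync(src_files, cache_path):
    """Stand-in for pip_sync_faster's sync(): call pip-sync only when
    the requirements files' hashes differ from the cached ones."""
    hashes = {f: hashlib.sha512(Path(f).read_bytes()).hexdigest() for f in src_files}
    try:
        cached = json.loads(Path(cache_path).read_text(encoding="utf-8"))
    except FileNotFoundError:
        cached = None
    if hashes != cached:
        subprocess.run(["pip-sync", *src_files], check=True)
        Path(cache_path).write_text(json.dumps(hashes), encoding="utf-8")


def pip_sync_call_count(src_files, cache_path, runs=2):
    """Drive sync() from the outside and count how often pip-sync was run."""
    with mock.patch("subprocess.run") as run:
        for _ in range(runs):
            sync(src_files, cache_path)
    return run.call_count
```

Only the externally observable behaviour (how many times pip-sync is invoked) is asserted on; no hashes appear in the test at all.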

parallel = true
source = ["pip_sync_faster", "tests/unit"]
omit = [
"*/pip_sync_faster/__main__.py",
Contributor Author


This'll go into the cookiecutter: __main__.py will always be omitted from coverage.


seanh commented Jul 23, 2022

This is ready for re-review:

  • I've added a __main__.py so that the package can be executed directly with python3 -m pip_sync_faster <args>. Added manual testing instructions to HACKING.md that use this.
  • Renamed pip_sync_maybe() to just sync()
  • Added help strings to the CLI (see tox -qe dev --run-command 'pip-sync-faster --help')
  • The tests now use pre-baked hashes
  • Accepted various naming suggestions (handle etc)
  • Accepted various other minor suggestions

File layout

I spent some time thinking about the best file layout for the cookiecutter to use for packages like this one that have a command line interface. This is what I came up with:

pip-sync-faster/
  setup.cfg     <- Declares a console_script that calls cli.py::cli()
  pip_sync_faster/
    __init__.py <- Empty (except maybe some imports)
    __main__.py <- Just imports and calls cli.py::cli()
    cli.py      <- Contains the argparse code and imports and calls something from core.py
    core.py     <- The actual code
  tests/
    unit/
      pip_sync_faster/
        cli_test.py
        core_test.py

If you install the package and run pip-sync-faster it goes setup.cfg -> cli.py::cli() -> core.py, and __main__.py isn't involved. If you run the package directly with python3 -m pip_sync_faster it's __main__.py -> cli.py::cli() -> core.py and setup.cfg isn't involved.
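Concretely, the console script half of this wiring is a setup.cfg fragment like the following (matching the entry point shown in the diff below):

```ini
# setup.cfg fragment: the installed pip-sync-faster command calls
# cli.py::cli() directly; __main__.py is not involved on this path.
[options.entry_points]
console_scripts =
    pip-sync-faster = pip_sync_faster.cli:cli
```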

For packages that don't have a CLI the cookiecutter will use the same layout but with the CLI-related files missing:

pip-sync-faster/
  setup.cfg
  pip_sync_faster/
    __init__.py
    core.py
  tests/
    unit/
      pip_sync_faster/
        core_test.py

Here's why I went for this layout:

  • Empty __init__.py file. I know that some people don't like code in __init__.py, and people new to Python often find it surprising. I don't want to have a file named __init___test.py, and I don't want to have to open __init__.py often because it plays badly with autocomplete (since there are lots of files with that name). Packages might want to put some imports in __init__.py in order to hoist them into the package's top-level namespace; the cookiecutter won't add any of these for you, but it won't delete any that you put there.

  • Minimal __main__.py with no tests. This is what the Python docs say is the idiomatic use of __main__.py: when Python developers see a __main__.py file their eyes can pass right over it, because they can expect it to contain no code except to import and call an entry point function. This is how the runnable stdlib packages do it. It's also nice to avoid a __main___test.py.

  • Separate cli.py file that contains the argparse code and imports and calls things from core.py. Reasons for separating cli.py out from core.py:

    • Two smaller modules instead of one bigger one. Not really necessary for a small package like pip-sync-faster but scales well as the code size grows
    • A file named cli.py makes it obvious where the CLI code is going to be found
    • You can cookiecutter a package without a CLI and then add one later: change console_script to "yes" in cookiecutter.json and run make template. All the cookiecutter has to do is add the console_script to the templated setup.cfg file and add the __main__.py, cli.py and cli_test.py files. It doesn't need to modify core.py and core_test.py, which it can't really touch because the user will have changed those files. If the CLI code were in core.py you couldn't really use make template to add a CLI later like this.
  • I chose the name core.py because main.py (what it was previously called) would clash with __main__.py. I also didn't want to call it pip_sync_faster.py: that name would have to vary between projects, and it makes for awkward imports, especially if the package wants to have a function named pip_sync_faster(), since the existence of a module with the same name blocks hoisting the function into __init__.py. You can end up with stuff like from pip_sync_faster.pip_sync_faster import pip_sync_faster.

    core.py is expected to be changed by the user including perhaps splitting it up into multiple files or just keeping it as a single file but renaming it. Here in pip-sync-faster I've renamed it to sync.py as that seems a more appropriate name specific to this project. In tox plugins it might get renamed to plugin.py, etc.

I like that we can change around the internals of the package without breaking its interface. For example we could change the contents of cli.py or rename that file etc and as long as we update setup.cfg and __main__.py the pip-sync-faster and python3 -m pip_sync_faster commands won't change.

@seanh seanh force-pushed the re-add-non-boilerplate-files-and-add-tests branch from a9bf7c3 to dddaf1b on July 23, 2022 at 12:55
@seanh seanh requested a review from jon-betts July 23, 2022 13:14
seanh added 4 commits July 23, 2022 14:51
It's enough for a console script entry point to just return an int and
setuptools makes sure that the command exits with that int as its
status.

This does mean that we have to add a sys.exit() to __main__.py so that
it mirrors the same behaviour.
A more appropriate/specific name for this particular project.
[options.entry_points]
console_scripts =
-    pip-sync-faster = pip_sync_faster.main:entry_point
+    pip-sync-faster = pip_sync_faster.cli:cli
Contributor Author


I'll update this in the cookiecutter: main.py::entry_point() renamed to cli.py::cli()

Comment on lines +1 to +5
import sys

from pip_sync_faster.cli import cli

sys.exit(cli())
Contributor Author

@seanh seanh Jul 25, 2022


It'll be a good idea to move this __main__.py into the cookiecutter where it can be implemented correctly once and for all. For one thing we can implement __main__.py in the idiomatic way (as here), and we can omit it from test coverage. Also it's actually not as trivial to implement as you might expect:

cli() is used as both the setuptools entry point function (specified in setup.cfg) and the top-level function for __main__.py to call. A setuptools entry point function is expected to return something appropriate for being passed to sys.exit(): None or 0 to exit successfully, an int to exit with that error code, or any printable object (such as a string) that will be printed to stderr before exiting with 1. You can see this by looking at the pip-sync-faster script that setuptools generates when you install the package:

$ cat .tox/dev/bin/pip-sync-faster
#!/home/seanh/Projects/pip-sync-faster/.tox/dev/bin/python
# -*- coding: utf-8 -*-
import re
import sys
from pip_sync_faster.cli import cli
if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
    sys.exit(cli())

A __main__.py or an if __name__ == "__main__" block should do the same thing: sys.exit(cli()).

This means that cli() itself shouldn't call sys.exit(): it should just return None or 0 to exit successfully, or return 1 or return "Some error message" to exit with an error.
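A minimal sketch of that return-value convention (cli here is a toy stand-in, not the real pip_sync_faster.cli):

```python
import sys


def cli(argv=None):
    # A setuptools entry point function returns values rather than
    # calling sys.exit() itself:
    #   None or 0 -> exit successfully
    #   an int    -> exit with that status code
    #   a string  -> printed to stderr, then exit status 1
    if argv and argv[0] == "--fail":
        return "something went wrong"
    return 0


def main():
    # Both the generated console script and __main__.py wrap the call
    # in sys.exit() so the two paths behave identically.
    sys.exit(cli(sys.argv[1:]))
```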

Contributor


TIL about sys.exit with a function, nice.

from pip_sync_faster.sync import sync


def cli(_argv=None): # pylint:disable=inconsistent-return-statements
Contributor Author


PyLint will force you to put return 0 or return None in every branch of the function (including at the very bottom) which just seems annoying for an entry point function.

"src_files", nargs="*", help="the requirements.txt files to synchronize"
)

args = parser.parse_known_args(_argv)
Contributor Author

@seanh seanh Jul 25, 2022


_argv (which is used by the tests) defaults to None, and parse_known_args(None) causes argparse to parse the args from sys.argv.

# Replace the cached hashes file with one containing the correct hashes for
# the requirements files that pip-sync-faster was called with this time.
with open(cached_hashes_path, "w", encoding="utf-8") as handle:
json.dump(hashes, handle)
Contributor Author


When you see it split out into its own file like this the actual core logic of pip-sync-faster is very simple (and only about 40 lines of code without comments and docstrings)

Contributor

@marcospri marcospri left a comment


This works as expected. It's easy to understand and comes with good docs 👍




def sync(src_files):
cached_hashes_path = Path(environ["VIRTUAL_ENV"]) / "pip_sync_faster.json"
Contributor


I reckon environ["VIRTUAL_ENV"] is the right place to store it. It would be just noise in the project's root.


def get_hash(path):
"""Return the hash of the given file."""
hashobj = hashlib.sha512()
Contributor


You could probably squeeze out a few microseconds by switching to sha256; if git considers that future-proof, it should be enough for this.

It probably doesn't matter, though.
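A chunked file-hashing helper along those lines, with the algorithm parameterised so sha512 and sha256 are easy to compare (a sketch, not the PR's exact code):

```python
import hashlib


def get_hash(path, algorithm="sha512", chunk_size=65536):
    """Return the hex digest of the file at `path`.

    The file is read in chunks so large files don't have to fit in memory.
    """
    hashobj = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            hashobj.update(chunk)
    return hashobj.hexdigest()
```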

@seanh seanh merged commit 3eb4740 into remove-non-boilerplate-files Jul 30, 2022
@seanh seanh deleted the re-add-non-boilerplate-files-and-add-tests branch July 30, 2022 15:50
seanh added a commit that referenced this pull request Jul 30, 2022
This commit applies suggestions from the code review in #6