OEP-0018: Python Dependency Management #56

Merged (1 commit into master, May 2, 2018)

Conversation

jmbowman (Contributor):

Recommendations for managing Python package dependencies which should help minimize problems over time. The key technology choice involved is pip-tools, which we already use in many of our repositories. The current draft specifies that we should use a service to auto-generate PRs with requirement updates, but stops short of recommending which one to use (largely because we're still in the process of finalizing that decision, and the choice may change over time anyway).

* An attempt to pin dependencies in setup.py (or parse its dependencies
automatically from a requirements file) forced us to change that package
before we could upgrade one of those dependencies in a downstream
repository.


Regarding this bullet, perhaps you can say "downstream repository, and in another case an upstream repository" given that openedx/edx-drf-extensions#26 was done for edx-platform.

contexts. For example:

* Just the standard set of core dependencies for execution on a production
server to perform its primary purpose (``base.in``)
Contributor:


You have not explained what "base.in" is, or why you are mentioning it here. Maybe a paragraph at the top giving the overview ("We'll use pip-tools to manage multiple contexts."), then you can tie these filenames into that.

Contributor Author:


Yeah, I was hoping to avoid front-loading too much description of the overall process, but there probably does need to be some advance mention of what these names are for.
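To illustrate what those names refer to (a hypothetical layout; the exact set of files varies by repository):

    requirements/
        base.in        # core dependencies needed in all contexts
        test.in        # additions needed to run the test suite
        doc.in         # additions needed to build the documentation

pip-tools then compiles each ``.in`` file into a fully pinned ``.txt`` file of the same name.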

`cookiecutter-django-app`_ repository.
* Each listed dependency should have a brief end-of-line comment explaining
its primary purpose(s) in this context. These comments typically start at
the 27th character, but this is just a convention for consistency with files
Contributor:


27th!? We live in a 4-space-indent world, and pip-compile goes to the 27th character? :)

Contributor:


About the purpose: wouldn't this often be obvious or trivial? It sounds kind of like requirement comments on every line of a program in an intro programming course: they don't end up being useful.

Contributor Author:


Most of those leading 26 characters are often the package name and optional version constraint. pip-tools lines up the start of the comments for ease of reading, I figured it just makes sense to match that for consistency.

Unlike imports in a Python file, where you can see later in the file what they're used for, seeing something like "singledispatch" in a requirements file tells you very little about why it's there. The main reason for adding the comments is to make it easier to identify requirements that are no longer needed, which is tough if you're not even sure what half of them do. I'm thinking of things like "celery task broker" for redis and "Used in shoppingcart for extracting/parsing pdf text" for pdfminer; the reason for including it as a dependency, not the overview from the package metadata. I've typically found it very tough to prune out obsolete dependencies without this kind of comment in the file, but I'm open to arguments against it.
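For example, along the lines of (version numbers here are hypothetical):

    redis==2.10.6             # celery task broker
    pdfminer==20140328        # Used in shoppingcart for extracting/parsing pdf text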

Contributor:


It seems like something that would easily get out of date. What would the comment for "singledispatch" be? "# So that I can do single dispatch"? If I add it, and put "# used in the foobar manager" in the requirements file, how will that comment get updated if it's also used elsewhere, or if it's no longer used in its original spot? Wouldn't grepping the source for imports be a better way to understand whether and how packages are being used?

Contributor Author:


The way I've done this before is that I put the main thing the package is currently used for in there. If I later notice when scanning the file that the listed use is no longer relevant, I look to see if it's obsolete or still used somewhere else. If the former, I delete it; if the latter, I update the comment with the current usage info. If I have to hunt down all uses of every single package I don't recognize to see if any of them are obsolete, I'm basically never going to bother...it just takes too long.

Contributor:


About the indentation: looking at the .in files in the edx-platform pull request, on a few lines, 26 isn't enough, and the alignment is bumped out. I'd choose a larger number that is a multiple of four, and let pip-tools do what it wants. In some ways, it's better to have a clear distinction between the tool-generated and human-generated comments anyway.

Contributor:


About the comments themselves:

lxml==3.8.0               # XML parser

I don't find this useful. I don't know how to get the useful comments without requiring people to add useless ones like this.

Contributor Author:


I struggled a bit with the edx-sandbox/shared.in file comments because it seems like a grab bag of "make this stuff available to instructors in case they want to use it". So yeah, those ended up being pretty generic, but I think that's more the exception than the rule.

Contributor:


I guess I am skeptical that these comments will be accurate and specific enough to be useful in the long run. But we can give it a try. At least requiring a comment will slow people down from adding too many dependencies! :)

``-r base.txt``) and explain in an end-of-line comment why that set of
dependencies is also needed in this context. Note that the ``pip-compile``
output file should be included rather than the looser requirements file it
was generated from, as this ensures that the same versions of packages are
Contributor:


I'm confused by the "include the output file rather than the requirements file." An example would help.

Contributor Author:


Yeah, this is getting kind of long and hard to follow for a bullet point. I'll see if I can clarify it a bit.
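A sketch of what I mean, using a hypothetical ``test.in``:

    # requirements/test.in
    -r base.txt               # the compiled base pins, not base.in
    pytest                    # test runner

Including ``base.txt`` rather than ``base.in`` is what guarantees the test run sees the exact versions pinned for production.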

was generated from, as this ensures that the same versions of packages are
installed in the different contexts. We don't want to run tests with
different versions of dependencies than we use in production, for example.
* Avoid direct links to packages local directories, GitHub, or other version
Contributor:


s/packages/packages'/ ?

Contributor Author:


I meant to write "...packages in local directories, ..."; fixing.

``requirements/base.in`` with a simple Python function declared in
``setup.py`` itself, but for packages with few base dependencies it may be
better to just list them in both places. Just add comment reminders in both
places to also update the other dependency listing when making any changes.
Contributor:


As big a fan as I am of not overengineering, it seems like better advice to always use the requirements files if they exist. "Few base dependencies" doesn't seem like a good reason to duplicate the information, especially since adding the reminder comments will lose the terseness benefits of the few dependencies.

Contributor Author:


Yeah, just not thrilled about copying another function into all setup.py files. Better than the alternatives, I suppose.

installing a package from a URL is appropriate, see the notes below on
`Installing Dependencies from URLs`_

If the repository contains a Python package, base dependencies also need to
Contributor:


What else could a repository contain? IDAs don't have setup.py I guess? Maybe better is: "if the repository contains a setup.py"

Contributor Author:


Sure, how about: "If the repository contains a setup.py file defining a Python package, the
base dependencies also need to be specified there."

the packages listed in the given requirements files, installing, upgrading,
and uninstalling packages as needed.

Open edX packages use an ``upgrade`` make target to use ``pip-compile`` to
Contributor:


Is this, "do use," or "should use"?

Contributor Author:


To be technical, "many already do use, and the rest probably should use".
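For reference, a minimal sketch of such a target (file names are illustrative; the exact flags vary by repository, and recipe lines must be tab-indented in a real Makefile):

    upgrade:  ## re-compile the requirements/*.txt pins from requirements/*.in
    	pip-compile --upgrade -o requirements/base.txt requirements/base.in
    	pip-compile --upgrade -o requirements/test.txt requirements/test.in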

`AllMyChanges.com`_ can make this easier.

.. _requires.io: https://requires.io/
.. _AllMyChanges.com: https://allmychanges.com/
Contributor:


Not sure why, but AllMyChanges doesn't know about the last five releases of coverage.py: https://allmychanges.com/p/python/coverage/

Contributor Author:


It looks like a number of packages haven't been getting updated for a year or so; the issue was reported 6 days ago: https://github.com/AllMyChanges/allmychanges.com/issues/65 . Do you think it's still worth mentioning here, or should we pull it until that gets resolved? It's missing the last several Django releases as well. requires.io is current on these, but missing changelog links for several packages that are included on AllMyChanges.com.

Contributor:


Maybe just a sentence like, "At the time of writing, these services were still in development. Carefully evaluate which is currently reliable."


Interesting. Does requires.io automatically pull in GitHub's release history? From what I can tell, it looks like it needs a CHANGES.txt file or similar.

Contributor:


These services parse CHANGES.txt (or some other file). GitHub histories are not good sources of changes, they are too noisy.


Releases are not noisy at all in general: https://github.com/jgm/pandoc/releases, https://github.com/tqdm/tqdm/releases, etc.

Contributor:


Oh, sorry, I misread your comment as "commit history." I don't know if they use GitHub releases. Frankly, I hope they don't. A text file in the repo is a low-tech way to make the information available to everyone, regardless of how the source is stored or transmitted. Unless I've misunderstood something, GitHub releases are a proprietary feature that separates information from the source tree.


I agree changes.txt should take precedence, but where none is available it would be nice to fall back to things like GitHub release notes. It's convenient to use GitHub's release system to link to issues, upload binaries, etc. Anyway, this is probably a discussion not meant for here. I was only asking if anyone knows of good crawlers of https://api.github.com/repos/{REPO}/releases, not whether this is a good idea :P

Contributor Author:


The requires.io changelog parsing, at least as of when it was first introduced, is summarized in this blog post. At least at the time, it only looked in PyPI and for appropriately named files in the repo, not at the GitHub releases list or commit history.

There's also the changelog parser for Gemnasium at https://github.com/tech-angels/vandamme , it seems somewhat more comprehensive but I haven't looked specifically to see what it scrapes from GitHub.


``scripts/post-pip-compile.sh``.

If you do need to include a package at a URL, it should be editable (start with
"-e ") and have both the package name and version specified (end with
Contributor:


The "should be editable" advice is contrary to our current instructions. You might want to include here, "(because pip-tools doesn't otherwise support URL installations)" or something.

Contributor Author:


Sure, I'll reference the bugs noted above.

nedbat commented Apr 3, 2018

Something I still don't understand after reading this (maybe because I was too focused on paragraphs, and not enough on the big picture): who is supposed to run "make upgrade" to update the version numbers in the .txt files, and when are they supposed to do it?

installed.
2. For each of these contexts, declare the direct dependencies that will be
needed. Use the least restrictive constraints which should allow pip to
install a working set of dependencies.
Contributor:


I think I missed a key point here: our philosophy now is to not explicitly pin versions, and instead to let pip-tools pin them for us. This is a big shift from how we used to do things, and needs to be highlighted much more strongly.

Contributor Author:


Sure, rewording to try to make things like that more clear.
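For instance, the contrast in practice looks something like this (entries hypothetical): the ``.in`` file stays loose, and the compiled ``.txt`` file carries the exact pins.

    # base.in -- human-edited, mostly unpinned
    Django>=1.11,<2.0         # constrain only where we know we must
    djangorestframework       # let pip-compile choose and pin the version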

jmbowman commented Apr 3, 2018

@nedbat The section on automating updates recommends using a service to auto-generate PRs which run make upgrade when appropriate, and having a designated maintainer for each repo to keep tabs on those and merge, fix, or reject as appropriate. This could probably be the "owner" from OEP-2's openedx.yaml by default. But more generally, anytime somebody needs to add, upgrade, or remove a dependency, they should edit an input requirements file (if necessary) and run make upgrade. It's possible to upgrade just a specific target package in a particular file, but that opens up enough ways for the files to fall out of sync with each other that I hesitate to recommend doing that; I'd rather upgrade a few other things at the same time and only add version constraints if tests fail than never upgrade anything by default.

I'll add some notes to this effect to the doc.

nedbat commented Apr 3, 2018

IIUC, this means that I as a developer might need to add a dependency, so I edit a .in file. Then I run make upgrade. This can update the pins of an arbitrary number of other packages, some of which might break existing code. What do I do then?

jmbowman commented Apr 3, 2018

  1. If you aren't in an incredible hurry, scan the changelogs of the changed packages to look for potential problems. If there's a change which seems like it will require nontrivial work to adapt to, put an upper version limit on that package in the .in file, rerun "make upgrade", and file a ticket to deal with it later.
  2. Push the changes and wait for tests to finish. If one of the upgrades caused a problem, add a version cap, rerun "make upgrade", and ticket as above.

The automation is important here to make sure that we always get package upgrades in small batches. We should really never go more than a week or so without updating all the dependencies which we're basically current on (unless miraculously no new versions of packages we use were released in that time; more likely to happen for small repos). And I'd argue that someone who doesn't feel qualified to determine if it's probably safe to upgrade a few dependencies probably shouldn't be adding new dependencies without assistance or review either.
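The cap itself is just one line in the relevant .in file, e.g. (package and ticket ID are made up):

    celery<4.0                # TODO: remove cap once ABC-1234 is resolved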

nedbat commented Apr 3, 2018

That needs to go into the document. :)

@jmbowman jmbowman force-pushed the jmbowman/oep-0018 branch 2 times, most recently from 300a58e to 7bc91ce Compare April 12, 2018 20:09
@nasthagiri nasthagiri changed the title OEP-18: Python Dependency Management OEP-0018: Python Dependency Management Apr 14, 2018
dependencies).
* An automated test suite with reasonably good code coverage, configured to
be run on new GitHub pull requests.
* A service configured to periodically auto-generate a GitHub pull request
Contributor:


How often would such a service be expected to run? Daily, weekly, monthly?

Contributor:


Am I understanding this correctly: Until the next time this service runs, the automatically generated requirements .txt file may not match the output of the .in file, right?

Contributor Author:


Anytime a developer updates one of the .in files, they should run make upgrade to update all of the .txt files. The periodic auto-generated pull requests (probably weekly or so) are just to notify us of updates to packages that we aren't deliberately holding back, so we don't fall too far behind. Those auto-generated PRs would only update the .txt files, since the input constraints won't have changed (and the suggested changes still need to be reviewed and deliberately merged).

* The need for an old dependency was removed.
* A version constraint needs to be added to prevent upgrading to a
backwards-incompatible release of a required package until appropriate code
changes can be made.
Contributor:


What is the best practice when a development team needs to make a backward incompatible change to an API that is implemented in 1 repo but used in another? If the team didn't pin down the version number prior to this, what are the ramifications (open edX community, failure on master branch, etc.) if they pin down the version only afterward?

Contributor Author:


Nothing ever gets automatically upgraded; changes to the generated .txt files should always be reviewed before merging to master. I think in this scenario, best practice is to leave packages unpinned by default, but clearly document backwards-incompatible changes in the changelog and release notes; when an update of the .txt files indicates that the package would be upgraded, that should be noticed so the developer can either constrain the version or make the changes needed to use the new version, as appropriate.


Many of the Open edX repositories have already begun to comply with the
recommendations outlined here. In particular, repositories generated using
`cookiecutter-django-app`_ should be configured correctly from the outset.


very minor nit. Might be clearer to say "should already be configured correctly..."

johnbaldwin:

What level of pip knowledge do you expect from the readers of this OEP? I ask because it covers a lot of ground and reads like two documents in one.

A) This is how we plan to structure our python package dependencies for our projects

B) Here are some tips and guidelines for working with the more advanced features of pip and the pip ecosystem which we use for A, above.

I don't want to suggest unneeded work, but perhaps the OEP could be narrowed to A, with a bonus of linking to additional reading resources for engineers to get up to speed on the tools and techniques to better work with Python package dependencies?

johnbaldwin:

BTW, I've learned a bit of new stuff about pip just by reading it :)

jmbowman (Contributor Author):

@johnbaldwin The intended level of detail is to be sufficient to help a developer create a new code repository that follows the guidelines, and keep it that way as changes are made over time. A lot of the seemingly esoteric points covered here come up depressingly often in routine package maintenance, but I'm open to suggestions on specific content that could be harmlessly extracted.

I could move about half of the "Installing Dependencies from URLs" section into the Rationale section (which seems more appropriate for background info), and localize the references to pip-compile and pip-sync to the "Generate Exact Dependency Specifications" section (where that information is important in explaining how to create the make targets), and just reference the make targets elsewhere. Do you think that would help? Is there anything else that seems distinctly secondary to the main content?

johnbaldwin:

@jmbowman Thanks for your reply and additional context. It sounds like you're intentionally using this OEP as both a spec and a guide, and I can see why now. Given your comments, and that you need to keep within the structure of the OEP template, I'm not sure restructuring would be worth the effort. I think the thing to do is for the community to start using this as a reference; then perhaps folks will create derivative guides, focusing on aspects of package maintenance and dependency management for different audience experience levels.

* Just the standard set of core dependencies for execution on a production
server to perform its primary purpose (``base.in``)
* Additional dependencies which are only needed when optional extra features
of a package are desired
Contributor:


This is the only bullet point that doesn't have a filename attached. Is that an error?

Contributor Author:


The filename would typically be based on the name of the extra in setup.py; I'll add a note to that effect.
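E.g., an extra declared as ``extras_require={'sandbox': [...]}`` in setup.py would get a ``requirements/sandbox.in`` file (the name here is purely illustrative).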

base dependencies also need to be specified there. These can be derived from
``requirements/base.in`` with a simple Python function declared in
``setup.py`` itself. An example can be found in the
`setup.py file for edx-completion`_.
Contributor:


Would it be better to inline the function definition in this document? That would give a more canonical best-version, it seems like.

Contributor Author:


Fair enough. The first draft of the OEP didn't particularly recommend this approach, but earlier review discussions concluded that it was best not to explicitly list the core dependencies in two places, so having a canonical implementation of loading them makes sense. The existing code is about 30 lines; a little long, but probably short enough to include.
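For reference, the helper is roughly along these lines (a condensed sketch of the linked implementation, not a verbatim copy):

    def load_requirements(*requirements_paths):
        """Load all requirements from the specified requirements files."""
        requirements = set()
        for path in requirements_paths:
            with open(path) as reqs:
                requirements.update(
                    line.split('#')[0].strip() for line in reqs
                    if is_requirement(line.strip())
                )
        return list(requirements)

    def is_requirement(line):
        """True for actual requirements; False for blanks, comments, URLs, and included files."""
        return line and not line.startswith(('-r', '#', '-e', 'git+'))

setup.py would then use ``install_requires=load_requirements('requirements/base.in')``.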

In most other circumstances, the package should be added to PyPI instead. If
you do need to include a package at a URL, it should be editable (start with
"-e ") and have both the package name and version specified (end with
"#egg=NAME==VERSION").
Contributor:


This seems like a worthwhile place to have an example of a fully-baked url include line.

Contributor Author:


Sure, I'll add an example.
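For example (repository, package name, and version are made up):

    -e git+https://github.com/edx/example-package.git@1.2.3#egg=example-package==1.2.3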

cpennington (Contributor) left a comment:


The discussion on this seems to have reached a conclusion with no hanging issues, so as the arbiter, I'm happy to call this Accepted.

@jmbowman jmbowman merged commit 0b55225 into master May 2, 2018
@jmbowman jmbowman deleted the jmbowman/oep-0018 branch May 2, 2018 17:39