Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deduplicate requirements #28

Merged
merged 8 commits into from
Sep 18, 2019
Merged

Deduplicate requirements #28

merged 8 commits into from
Sep 18, 2019

Conversation

jpmckinney
Copy link
Member

Builds on #27

@jpmckinney
Copy link
Member Author

Hmm, not using setup.py properly has really created a dependency hell in trying to get kingfisher-views to simply build on Travis. Self-merging my PRs as they don't really affect the code, but they do use dependency management properly…

@jpmckinney jpmckinney merged commit 108c29f into master Sep 18, 2019
@jpmckinney jpmckinney deleted the deduplicate-requirements branch September 18, 2019 20:53
@robredpath
Copy link
Contributor

@jpmckinney we were surprised to see a change in dependency management happen without any discussion, especially as we're currently working on CoVE for OCDS!

No worries, though; is this new approach documented somewhere so that we can pick it up and work with the new way of managing dependencies - what your expectations are around changes, ways of working and so on? There doesn't seem to be a corresponding commit to standard-development-handbook, but it may be somewhere else that I'm unaware of. For reference, the previous documentation that we worked to is at https://github.com/OpenDataServices/developer-docs/blob/45d68f708328538b9a69808304f0e699e1d90de5/code-python.md ; we'll update that to link to the new way of working when it's documented.

@jpmckinney
Copy link
Member Author

jpmckinney commented Sep 19, 2019

Hi @robredpath, I would generally have waited to discuss, but my changes ended up affecting many repositories, and it wasn't obvious to me that we'd find a way to quickly discuss and merge the changes before other commits were made (requiring resolving conflicts, etc.).

In short, the prior dependency management approach (using pip requirements files primarily, and only sometimes using setup.py files) didn't take advantage of Python's standard packaging tools (where packages determine a hierarchy of dependencies by looking at setup.py files).

The prior dependency management (putting most dependencies in requirements.txt) meant that each repo needed to declare its own dependencies directly, instead of inheriting dependencies indirectly from its direct dependencies, and then calculating a commonly compatible version number. For example, under the old approach, if lib-cove used packagexyz==1.0, then lib-cove-ocds would have to declare packagexyz==1.0, and if lib-cove changed the version number, then the maintainer of lib-cove-ocds would have to know to update their version number, etc.

As part of this, I created pypi packages for libcoveocds, libcoveoc4ids, and also libcove (though it's not an OCP repo). For all those I added kindly and odscjames as owners on pypi. Feel free to remove me as an owner of libcove if you like, but please always push releases to pypi so that we can use Python's regular dependency management instead of a home-grown approach. It is only one more CLI command to update pypi: python setup.py sdist upload (you can also use twine, but I still use this command).

In terms of documentation, you'll find lots of documentation on the web about the standard approach (which is just updating setup.py), e.g. https://packaging.python.org/overview/

Generally speaking, for libraries, you should not pin dependencies in the way described at https://github.com/OpenDataServices/developer-docs/blob/master/code-python.md#pinned-dependencies There are many blogs explaining why: a first result I found was http://blog.chrisgorgolewski.org/2017/12/to-pin-or-not-to-pin-dependencies.html

Note that the update_requirements.sh script doesn't work on macOS, and doesn't work if the user uses pyenv or similar to manage virtualenvs. Ideally, even for non-libraries, we'd follow a simpler approach.

@odscjames
Copy link
Contributor

As part of this, I created pypi packages for libcoveocds, libcoveoc4ids, and also libcove (though it's not an OCP repo). For all those I added kindly and odscjames as owners on pypi. Feel free to remove me as an owner of libcove if you like, but please always push releases to pypi so that we can use Python's regular dependency management

We are happy to do this; I have made a Trello card to update all relevant documentation and will try and do that soon.

Generally speaking, for libraries, you should not pin dependencies

I fully agree, but 2 things:

This approach creates problems too; now our Travis builds aren’t consistent with each other and with how the library may be used in other repos. A Travis build may pass on master and then a few months later when we create a new branch to do some work, later versions of the libraries have been released and so Travis fails for no reason that is related to the current branch. This is not theoretical; we have seen this happen with new versions of Flake8 before. It creates a frustrating and problematic work environment.

I don’t know what a good answer is here; Travis has options where you can run a matrix of builds with different variables, and I wonder if something could be done here to make the tests run against a range of library versions, but a consistent range. This also better approaches real world usage; a library may be used in apps with different versions of it’s requirements.

I have thought about this problem before; but the lack of a good solution has meant I haven’t done anything. I have it on a list of topics to discuss with the dev team here to improve our working practices and as usual, we document our working practices publically and are happy to have this as a more open discussion.

Requirments has also been removed from https://github.com/open-contracting/kingfisher-views - this is not a library, it’s an app. In this context I’d argue that version locking is the correct thing to do.

Note that the update_requirements.sh script doesn't work on macOS

I can check that out; I have access to MacOS.

In terms of a way forwards, can I suggest

A)

We will make a task to investigate the inconsistent build problem in libraries and prioritise it?

B)

We put back the locking mechanism in Apps and make tasks to

  • Fix the script on Macs
  • To investigate and maybe switch to a different locking mechanism than requirements.txt. Tho that is a common pattern there are many options these days, like pipenv.

@jpmckinney
Copy link
Member Author

jpmckinney commented Sep 20, 2019

For update_requirements.sh on macOS, I think it's just that sed -i needs to be sed -i "". I'm not sure if that then creates issues on Linux.

The issue remains that the first three lines assume that the developer created a virtualenv in .ve with the virtualenv command from the virtualenv package. (I instead use pyenv.) I don't think repos should have an opinion on how developers manage their virtualenvs.

For (B), since we have a setup.py file in Views anyway (I assume there was a reason for it, even though it is an app?), can we instead pin to specific versions in install_requires? Having both setup.py and requirements files is confusing. If we want to use requirements files in Views, then can we remove setup.py?

For (A), this is a common problem, but it's basically the price paid for giving users flexibility in the versions of other dependencies. The usual solution is to do something like Django<1.12. If you know that your library works with Django 1.11, and you also know that Django typically introduces breaking changes in X.Y versions, then you can set your required version as strictly less than the possible breaker.

For things like flake8, I prefer not to set a maximum version, because I am likely to never check for a new, improved version of something like flake8. I'm happy to blindly accept new versions at the cost of there being a new version that breaks things (rarely), in which case I need to interrupt my planned work in order to make a few changes.

For other things, I also prefer not to set a maximum version, because it's generally good to know whether a library still works with the latest versions of dependencies. For example, with pytest, I prefer to have to deal with small breaks every now and then, instead of having to dedicate a few days to multiple major version upgrades if I were to have locked the version many years ago (this happened with Rspec). Similarly, I know many projects that are stuck on unsupported and insecure versions of dependencies, because there was never any pressure to upgrade in the short term, and after many years it became too great an effort to upgrade. Having small breaks every now and then helps to avoid that. And if it's too much work to upgrade a library to work with new versions of dependencies, it's always an option to set a maximum version (easy!).

In short, I see the 'problem' of having to contend with changing dependencies as a good thing.

In any case, ultimately, these libraries are used in apps, and deployments of CoVE, etc. should definitely continue to lock versions and to have predictable builds.

@odscjames
Copy link
Contributor

@odscjames
Copy link
Contributor

Feel free to remove me as an owner of libcove if you like,

Will do this now, just for good data practices.

Can you add us as maintainers to https://pypi.org/project/libcoveoc4ids/ ?

Thanks

@odscjames
Copy link
Contributor

@jpmckinney as you'll probably get emails, just to confirm it was me who created the "OpenDataServices" generic account and added that to projects

@odscjames
Copy link
Contributor

I also added other staff members you will recognise. All the emails. Sorry.

@jpmckinney
Copy link
Member Author

jpmckinney commented Oct 2, 2019 via email

@odscjames
Copy link
Contributor

Shall we have only OpenDataServices and myself as “Owner” and then others
as “Maintainer”? Seems like a safer protocol than everyone as owner.

That seems sensible, yes

@jpmckinney
Copy link
Member Author

Shall we have only OpenDataServices and myself as “Owner” and then others
as “Maintainer”? Seems like a safer protocol than everyone as owner.

That seems sensible, yes

Done – I documented the user policy here: https://ocds-standard-development-handbook.readthedocs.io/en/latest/systems/services.html#pypi

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants