-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrade numpy to add support for M1 processors #990
Conversation
I need help with the |
Hi @MattiSG ! Thanks for this PR.
The purpose of those type annotations in general is to enforce the contracts of OpenFisca internals, and in particular with Numpy to provide an idea of the type of data we're expecting —an array of As Numpy provided no type annotations prior 1.20.x, those annotations were not useful programatically: MyPy just defaults them to With 1.20.x, taking a look at the documentation, Numpy now provides the generic The current type annotations being invalid, for this update we would have to either remove them or use the new valid one — (Ideally if we remove them we could improve the documentation of the expected parameters of those functions and methods, so to keep their informative value). There's just one caveat regarding the latter, from Numpy's documentation:
Given those type annotations are not actually programmatically linted, I'd say the former would be better as we would have to eventually deprecate Python 3.7 support without justifiable reason. Besides, because of the amount of deprecations listed in the release note I think we should take this update carefully. In fact, last Numpy update took several pull requests spread among different OpenFisca packages in order to deal with the deprecations and make them compatible:
I think we should definitely do not lag with this update, and a quick assessment of the impact of the deprecations would be much welcome. Have you tried it already for example with OpenFisca France? |
Thank you very much @maukoquiroga for this information! 😊
Yes, this changeset is the minimal one that brings M1 compatibility, which I backported from a local branch I experimented with over the last two weeks, and matches the setup I currently use and that enabled me to deliver openfisca/openfisca-france#1489 among others. |
Nope, can't get it to compile before 1.20.0-rc2, even from source and with options such as no-PEP-517. You can find detailed information on numpy/numpy#17807, and in a more readable format on https://stackoverflow.com/a/65581354. |
😍 just push on this branch! |
README.md
Outdated
@@ -26,7 +26,7 @@ If you want to contribute to OpenFisca-Core itself, welcome! To install it local | |||
```bash | |||
git clone https://github.com/openfisca/openfisca-core.git | |||
cd openfisca-core | |||
pip install --editable .[dev] --use-deprecated=legacy-resolver | |||
pip install --editable .[dev] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I invite you to propose this change in a separate pull request, in order to have a specific place to discuss about the new dependency resolver, and the best strategy to use it —or not.
Spoiler alert: it can be way too slow, which has proven to be painful in time-sensitive contexts as CI and test deployments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just opened this pull request to fix this very same issue on the doc openfisca/openfisca-doc#235
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This very branch takes 2 seconds to install dependencies in CI. Based on the issue you linked to (pypa/pip#9187), I understand that long builds could happen from a specific set of requirements, not randomly. If I understood correctly, is there anything that makes us suspect our set of requirements could generate such long builds? 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, pip tries to download all the intervals, and when it comes to NumPy and the like that adds a lot of overhead.
The problem is with the first run, then libraries should normally be saved to cache in CircleCI, per branch —one possible workaround being a stricter version interval (like ==
) but aren't we then defeating the purpose of the tool? It looks like next version of pip is going to adresses this backtracking problem.
I know @sarafalamaki and @sandcha run into problems as well with the new resolver —I don't know if the same problem or others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just in case my point is not to not use the new resolver but to do so in a new pull request so to document and to address the problems users may have encountered that led to the actual using of the legacy resolver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW found this previous discussion, very interesting for reference openfisca/extension-template#10.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This very branch takes 2 seconds to install dependencies in CI.
That build is using the legacy resolver.
I've also found the strategies proposed by the Pip folks https://pip.pypa.io/en/latest/user_guide/#dependency-resolution-backtracking, which leds me to think that, before PyPi provides package's metadata as an API, we will have to find a workaround for this problem.
Good news is there are some dependencies we know as a fact can be pinned, like numpy
, numexpr
, scipy
, and so on, which could be a solution.
I run the following test —I had to cancel at some point:
First attempt
As it is now.
general_requirements = [
'dpath >= 1.5.0, < 2.0.0',
'pytest >= 4.4.1, < 6.0.0', # For openfisca test
'numpy >= 1.11, < 1.19',
'psutil >= 5.4.7, < 6.0.0',
'PyYAML >= 3.10',
'sortedcontainers == 2.2.2',
'numexpr >= 2.7.0, <= 3.0',
]
api_requirements = [
'werkzeug >= 1.0.0, < 2.0.0',
'flask == 1.1.2',
'flask-cors == 3.0.10',
'gunicorn >= 20.0.0, < 21.0.0',
]
dev_requirements = [
'autopep8 >= 1.4.0, < 1.6.0',
'flake8 >= 3.9.0, < 4.0.0',
'flake8-bugbear >= 19.3.0, < 20.0.0',
'flake8-print >= 3.1.0, < 4.0.0',
'pytest-cov >= 2.6.1, < 3.0.0',
'mypy >= 0.701, < 0.800',
'openfisca-country-template >= 3.10.0, < 4.0.0',
'openfisca-extension-template >= 1.2.0rc0, < 2.0.0'
] + api_requirements
pip freeze | grep -v "^-e" | xargs pip uninstall -y
pip cache purge
pip install --upgrade pip
time pip install --editable .[dev] --upgrade
...
ERROR: Operation cancelled by user
real 17m37.289s
user 2m36.219s
sys 0m17.993s
Second attempt
Tightening dependency versions.
general_requirements = [
'dpath ~= 1.5.0',
'pytest ~= 5.4.0', # For openfisca test
'numpy == 1.18.5',
'psutil ~= 5.8.0',
'PyYAML ~= 5.4.0',
'sortedcontainers == 2.2.2',
'numexpr ~= 2.7.0',
]
api_requirements = [
'werkzeug ~= 1.0.0',
'flask == 1.1.2',
'flask-cors == 3.0.10',
'gunicorn ~= 20.1.0',
]
dev_requirements = [
'autopep8 ~= 1.5.0',
'flake8 ~= 3.9.0',
'flake8-bugbear ~= 19.8.0',
'flake8-print ~= 3.1.0',
'pytest-cov ~= 2.11.0',
'mypy == 0.790',
'openfisca-country-template >= 3.10.0, < 4.0.0',
'openfisca-extension-template >= 1.2.0rc0, < 2.0.0',
] + api_requirements
pip freeze | grep -v "^-e" | xargs pip uninstall -y
pip cache purge
pip install --upgrade pip
time pip install --editable .[dev] --upgrade
...
ERROR: Operation cancelled by user
real 7m22.893s
user 6m25.452s
sys 0m2.349s
Third attempt
As it is but pre-calculating some dependencies with pip-tools
.
It works! I mean, it fails, but the resolver works: we have an incompatible circular dependency.
Even if we reduce the time it take pip to resolve to a couple of minutes, we have core depending on country-template, and country-template depending on core, so it'll inevitably fail.
pip freeze | grep -v "^-e" | xargs pip uninstall -y
pip cache purge
pip install --upgrade pip
pip install pip-tools
time ( echo "--editable .[dev]" | pip-compile --output-file=- > /tmp/requirements.txt - -qo- && pip install -r /tmp/requirements.txt --upgrade )
...
Requirement already satisfied: OpenFisca-Core[web-api]<36.0.0,>=35.0.0 in /Users/hyperion/Sites/openfisca/openfisca-core (from openfisca-country-template==3.12.5->-r /tmp/requirements.txt (line 65)) (35.3.0)
Collecting OpenFisca-Core[web-api]<36.0.0,>=35.0.0
Downloading OpenFisca_Core-35.2.0-py3-none-any.whl (162 kB)
|████████████████████████████████| 162 kB 1.3 MB/s
Downloading OpenFisca_Core-35.1.1-py3-none-any.whl (162 kB)
|████████████████████████████████| 162 kB 1.0 MB/s
Downloading OpenFisca_Core-35.1.0-py3-none-any.whl (162 kB)
|████████████████████████████████| 162 kB 265 kB/s
Downloading OpenFisca_Core-35.0.5-py3-none-any.whl (162 kB)
|████████████████████████████████| 162 kB 2.1 MB/s
Downloading OpenFisca_Core-35.0.4-py3-none-any.whl (162 kB)
|████████████████████████████████| 162 kB 1.2 MB/s
Downloading OpenFisca_Core-35.0.3-py3-none-any.whl (162 kB)
|████████████████████████████████| 162 kB 2.6 MB/s
Downloading OpenFisca_Core-35.0.1-py3-none-any.whl (162 kB)
|████████████████████████████████| 162 kB 1.9 MB/s
Downloading OpenFisca_Core-35.0.0-py3-none-any.whl (162 kB)
|████████████████████████████████| 162 kB 2.0 MB/s
...
ERROR: Cannot install openfisca-core 35.3.0 (from /Users/hyperion/Sites/openfisca/openfisca-core), openfisca-core[web-api]==35.0.0, openfisca-core[web-api]==35.0.1, openfisca-core[web-api]==35.0.3, openfisca-core[web-api]==35.0.4, openfisca-core[web-api]==35.0.5, openfisca-core[web-api]==35.1.0, openfisca-core[web-api]==35.1.1, openfisca-core[web-api]==35.2.0 and openfisca-core[web-api]==35.3.0 because these package versions have conflicting dependencies.
The conflict is caused by:
The user requested openfisca-core 35.3.0 (from /Users/hyperion/Sites/openfisca/openfisca-core)
openfisca-core[web-api] 35.3.0 depends on openfisca-core 35.3.0 (Installed)
...
real 0m49.232s
user 0m23.754s
sys 0m1.487s
Considering the results of those tests, I'm afraid this change breaks things for users, not just because of the time overhead but as we've just discovered also because of the circular dependencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understood correctly, is there anything that makes us suspect our set of requirements could generate such long builds? 🙂
You're right, pip tries to download all the intervals, and when it comes to NumPy and the like that adds a lot of overhead.
Well... when I pinned NumPy's version I discovered the resolver backtracks following dependencies dependencies, which presents two problems:
- Backtracking is exponential.
- We don't have a have any control over the way our dependencies manage their own dependencies.
The following are the solutions proposed by the Pip folks:
- ALLOW PIP TO COMPLETE ITS BACKTRACKING
Already tried, it doesn't work unless we solve our two major problems (wild dependencies' dependencies and circular dependencies).
- REDUCE THE NUMBER OF VERSIONS PIP WILL TRY TO BACKTRACK THROUGH
Also already tried, cf. supra.
- USE CONSTRAINT FILES OR LOCKFILES
This could be easily done with pip-tools
, pipgrip
, Pipenv
, Poetry
, etc.
We'd still have to solve the circular dependency problem, though, unless we ignore them —given pip
allows us to do so, or using pipgrip
as an alternative.
- BE MORE STRICT ON PACKAGE DEPENDENCIES DURING DEVELOPMENT
Same as 1. and 2.
😃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for this thorough and documented exploration of the problem space @maukoquiroga! 🙏😃
Done! New type hints look very promising on Numpy 1.20 and 1.21 to help refactor OpenFisca Core —as a first step as unit tests are really hard to write today. @cesco-fran in phase with your draft #976. @MattiSG regarding the bump itself, as this version comprises some expired deprecations, I would mark it as a major one. |
Indeed, these elements are removed from the public NumPy API: |
Hello! I was thinking more on the libraries depending on core: as some use NumPy directly and extensively (I'm particularly thinking on OpenFisca France, OpenFisca France Data, LexImpact Server...) an upgrade of NumPy in core will force them to upgrade NumPy as well. However, I haven't checked whether those libraries rely in their codebase on the expired deprecations you mention. I know that we do have no contract with the depending libraries in terms of the public NumPy API, which could indeed make this just be a minor release. On the other hand, I feel inclined at the same time to warn depending libraries' maintainers of possible but unchecked impacts on their codebase, hence a breaking-change for them (but related to NumPy not to core). What do you think? |
Mmh I'm not familiar enough with Python packaging rules to understand how much each dependent sharing a dependency is impacted by conflicting versions 🤔 Spontaneously, I would hope that dependents can have their own version of Numpy without relying on ours! Otherwise, I understand that we should have a breaking change for every major library upgrade, including stuff like |
I tried to rebase this branch on top of |
Of course, most of them are commits of my own. |
f3d77c1
to
be6ebf8
Compare
Done! it think this is no longer a draft 😃 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM at least for me! If you agree we keep aside the resolver commit the time we find a solution for that (just tried again one last time to no avail...).
Needed for M1 processors
Resolver legacy option removal removed. I'm going with a minor version bump as I expect that the API surface forwarded from NumPy to country packages by OF-Core has not changed with this version change. |
I made a mistake 😞
|
Thanks @MattiSG, I made a mistake as well, as you said, I should have waited for your answer and for the version bump before approving it, or have had a way to chime in and to check your last commits. I think automatic review dismissal is a good idea. |
This PR has more negative impact then anticipated. #1012 |
@MattiSG I think that we have a specific situation for |
To give some historical context, there have been three NumPy bumps recently:
And my 2c regarding the situation:
@MattiSG said justly:
They do rely on ours, but on all the range of them, not just the latest. Upgrading NumPy does not force anyone to use that version, unless we force them to do it —which we could— like in #1012. However, the fact that we're not currently testing that they can actually continue to use the version they're already using, makes it impossible to ourselves to know whether we're introducing a breaking-change or not. In that specific case, I think we should not defer the responsibility of knowing so to the users, knowing how to avoid it —we're literally saying "we mark it as a breaking change because, as we can know if it is, but we haven't tested so, you will have to do it by yourself". Said otherwise, impact of an upgrade is completely foreseeable, because users can always opt-out. Of course, we can always assume that:
In any of those cases it would indeed make sense to mark any NumPy bump as a major.
|
The experiment of auto-dismissing stale reviews turns out to be too much of a hassle in our process where we almost systematically rebase before merging and where reviews are very asynchronous (see #1020 (comment)). |
Technical changes
numpy
can be compiled natively on darwin-arm64 from v1.20.1 on. OpenFisca thus needs to allow this version.Some additional work is needed for installing on Apple Silicon M1 processors, which is documented here. I considered updating the README but concluded that the install procedure should be normalised in the coming months, and that it was not worth documenting transitory instructions in the README, and that this PR discussion would be sufficient.
I'm open to requests to include instructions in the README 🙂