Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade numpy to add support for M1 processors #990

Merged
merged 6 commits into from
Apr 16, 2021
Merged

Upgrade numpy to add support for M1 processors #990

merged 6 commits into from
Apr 16, 2021

Conversation

MattiSG
Copy link
Member

@MattiSG MattiSG commented Mar 23, 2021

Technical changes

  • Upgrade numpy dependency.

numpy can be compiled natively on darwin-arm64 from v1.20.1 on. OpenFisca thus needs to allow this version.
Some additional work is needed for installing on Apple Silicon M1 processors, which is documented here. I considered updating the README but concluded that the install procedure should be normalised in the coming months, and that it was not worth documenting transitory instructions in the README, and that this PR discussion would be sufficient.

I'm open to requests to include instructions in the README 🙂

@MattiSG
Copy link
Member Author

MattiSG commented Mar 23, 2021

I need help with the error: "ndarray" expects no type arguments, but 1 given errors given in CI. I tried to look up Numpy types, but the expected usage in OpenFisca is not clear to me.

@MattiSG MattiSG marked this pull request as draft March 23, 2021 21:29
@bonjourmauko bonjourmauko added the kind:build Pull requests that update a dependency file label Mar 25, 2021
@bonjourmauko
Copy link
Member

Hi @MattiSG ! Thanks for this PR.

I need help with the error: "ndarray" expects no type arguments, but 1 given errors given in CI. I tried to look up Numpy types, but the expected usage in OpenFisca is not clear to me.

The purpose of those type annotations in general is to enforce the contracts of OpenFisca internals, and in particular with Numpy to provide an idea of the type of data we're expecting —an array of int, float, and so on.

As Numpy provided no type annotations prior 1.20.x, those annotations were not useful programatically: MyPy just defaults them to Any.

With 1.20.x, taking a look at the documentation, Numpy now provides the generic ArrayLike via the numpy.typing module.

The current type annotations being invalid, for this update we would have to either remove them or use the new valid one —ArrayLike.

(Ideally if we remove them we could improve the documentation of the expected parameters of those functions and methods, so to keep their informative value).

There's just one caveat regarding the latter, from Numpy's documentation:

Some of the types in this module rely on features only present in the standard library in Python 3.8 and greater. If you want to use these types in earlier versions of Python, you should install the typing-extensions package.

Given those type annotations are not actually programmatically linted, I'd say the former would be better as we would have to eventually deprecate Python 3.7 support without justifiable reason.

Besides, because of the amount of deprecations listed in the release note I think we should take this update carefully.

In fact, last Numpy update took several pull requests spread among different OpenFisca packages in order to deal with the deprecations and make them compatible:

I think we should definitely do not lag with this update, and a quick assessment of the impact of the deprecations would be much welcome.

Have you tried it already for example with OpenFisca France?

@MattiSG
Copy link
Member Author

MattiSG commented Mar 26, 2021

Thank you very much @maukoquiroga for this information! 😊

Have you tried it already for example with OpenFisca France?

Yes, this changeset is the minimal one that brings M1 compatibility, which I backported from a local branch I experimented with over the last two weeks, and matches the setup I currently use and that enabled me to deliver openfisca/openfisca-france#1489 among others.

@bonjourmauko
Copy link
Member

Thanks @MattiSG, looking at the StackOverflow link you shared, I still have one question: NumPy prior to 1.20 does not work at all on M1 processors?

BTW, I got the tests passing here.

@MattiSG
Copy link
Member Author

MattiSG commented Mar 27, 2021

NumPy prior to 1.20 does not work at all on M1 processors?

Nope, can't get it to compile before 1.20.0-rc2, even from source and with options such as no-PEP-517. You can find detailed information on numpy/numpy#17807, and in a more readable format on https://stackoverflow.com/a/65581354.

@MattiSG
Copy link
Member Author

MattiSG commented Mar 27, 2021

BTW, I got the tests passing here.

😍 just push on this branch!

README.md Outdated
@@ -26,7 +26,7 @@ If you want to contribute to OpenFisca-Core itself, welcome! To install it local
```bash
git clone https://github.com/openfisca/openfisca-core.git
cd openfisca-core
pip install --editable .[dev] --use-deprecated=legacy-resolver
pip install --editable .[dev]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I invite you to propose this change in a separate pull request, in order to have a specific place to discuss about the new dependency resolver, and the best strategy to use it —or not.

Spoiler alert: it can be way too slow, which has proven to be painful in time-sensitive contexts as CI and test deployments.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just opened this pull request to fix this very same issue on the doc openfisca/openfisca-doc#235

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This very branch takes 2 seconds to install dependencies in CI. Based on the issue you linked to (pypa/pip#9187), I understand that long builds could happen from a specific set of requirements, not randomly. If I understood correctly, is there anything that makes us suspect our set of requirements could generate such long builds? 🙂

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, pip tries to download all the intervals, and when it comes to NumPy and the like that adds a lot of overhead.

The problem is with the first run, then libraries should normally be saved to cache in CircleCI, per branch —one possible workaround being a stricter version interval (like ==) but aren't we then defeating the purpose of the tool? It looks like next version of pip is going to adresses this backtracking problem.

I know @sarafalamaki and @sandcha run into problems as well with the new resolver —I don't know if the same problem or others.

Copy link
Member

@bonjourmauko bonjourmauko Mar 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just in case my point is not to not use the new resolver but to do so in a new pull request so to document and to address the problems users may have encountered that led to the actual using of the legacy resolver.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW found this previous discussion, very interesting for reference openfisca/extension-template#10.

Copy link
Member

@bonjourmauko bonjourmauko Mar 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This very branch takes 2 seconds to install dependencies in CI.

That build is using the legacy resolver.


I've also found the strategies proposed by the Pip folks https://pip.pypa.io/en/latest/user_guide/#dependency-resolution-backtracking, which leds me to think that, before PyPi provides package's metadata as an API, we will have to find a workaround for this problem.

Good news is there are some dependencies we know as a fact can be pinned, like numpy, numexpr, scipy, and so on, which could be a solution.

I run the following test —I had to cancel at some point:

First attempt

As it is now.

general_requirements = [
    'dpath >= 1.5.0, < 2.0.0',
    'pytest >= 4.4.1, < 6.0.0',  # For openfisca test
    'numpy >= 1.11, < 1.19',
    'psutil >= 5.4.7, < 6.0.0',
    'PyYAML >= 3.10',
    'sortedcontainers == 2.2.2',
    'numexpr >= 2.7.0, <= 3.0',
    ]

api_requirements = [
    'werkzeug >= 1.0.0, < 2.0.0',
    'flask == 1.1.2',
    'flask-cors == 3.0.10',
    'gunicorn >= 20.0.0, < 21.0.0',
    ]

dev_requirements = [
    'autopep8 >= 1.4.0, < 1.6.0',
    'flake8 >= 3.9.0, < 4.0.0',
    'flake8-bugbear >= 19.3.0, < 20.0.0',
    'flake8-print >= 3.1.0, < 4.0.0',
    'pytest-cov >= 2.6.1, < 3.0.0',
    'mypy >= 0.701, < 0.800',
    'openfisca-country-template >= 3.10.0, < 4.0.0',
    'openfisca-extension-template >= 1.2.0rc0, < 2.0.0'
    ] + api_requirements
pip freeze | grep -v "^-e" | xargs pip uninstall -y
pip cache purge
pip install --upgrade pip
time pip install --editable .[dev] --upgrade

...

ERROR: Operation cancelled by user

real	17m37.289s
user	2m36.219s
sys	0m17.993s
Second attempt

Tightening dependency versions.

general_requirements = [
    'dpath ~= 1.5.0',
    'pytest ~= 5.4.0',  # For openfisca test
    'numpy == 1.18.5',
    'psutil ~= 5.8.0',
    'PyYAML ~= 5.4.0',
    'sortedcontainers == 2.2.2',
    'numexpr ~= 2.7.0',
    ]

api_requirements = [
    'werkzeug ~= 1.0.0',
    'flask == 1.1.2',
    'flask-cors == 3.0.10',
    'gunicorn ~= 20.1.0',
    ]

dev_requirements = [
    'autopep8 ~= 1.5.0',
    'flake8 ~= 3.9.0',
    'flake8-bugbear ~= 19.8.0',
    'flake8-print ~= 3.1.0',
    'pytest-cov ~= 2.11.0',
    'mypy == 0.790',
    'openfisca-country-template >= 3.10.0, < 4.0.0',
    'openfisca-extension-template >= 1.2.0rc0, < 2.0.0',
    ] + api_requirements
pip freeze | grep -v "^-e" | xargs pip uninstall -y
pip cache purge
pip install --upgrade pip
time pip install --editable .[dev] --upgrade

...

ERROR: Operation cancelled by user

real	7m22.893s
user	6m25.452s
sys	0m2.349s

Third attempt

As it is but pre-calculating some dependencies with pip-tools.

It works! I mean, it fails, but the resolver works: we have an incompatible circular dependency.

Even if we reduce the time it take pip to resolve to a couple of minutes, we have core depending on country-template, and country-template depending on core, so it'll inevitably fail.

pip freeze | grep -v "^-e" | xargs pip uninstall -y
pip cache purge
pip install --upgrade pip
pip install pip-tools
time ( echo "--editable .[dev]" | pip-compile --output-file=- > /tmp/requirements.txt - -qo- && pip install -r /tmp/requirements.txt --upgrade )

...

Requirement already satisfied: OpenFisca-Core[web-api]<36.0.0,>=35.0.0 in /Users/hyperion/Sites/openfisca/openfisca-core (from openfisca-country-template==3.12.5->-r /tmp/requirements.txt (line 65)) (35.3.0)
Collecting OpenFisca-Core[web-api]<36.0.0,>=35.0.0
  Downloading OpenFisca_Core-35.2.0-py3-none-any.whl (162 kB)
     |████████████████████████████████| 162 kB 1.3 MB/s 
  Downloading OpenFisca_Core-35.1.1-py3-none-any.whl (162 kB)
     |████████████████████████████████| 162 kB 1.0 MB/s 
  Downloading OpenFisca_Core-35.1.0-py3-none-any.whl (162 kB)
     |████████████████████████████████| 162 kB 265 kB/s 
  Downloading OpenFisca_Core-35.0.5-py3-none-any.whl (162 kB)
     |████████████████████████████████| 162 kB 2.1 MB/s 
  Downloading OpenFisca_Core-35.0.4-py3-none-any.whl (162 kB)
     |████████████████████████████████| 162 kB 1.2 MB/s 
  Downloading OpenFisca_Core-35.0.3-py3-none-any.whl (162 kB)
     |████████████████████████████████| 162 kB 2.6 MB/s 
  Downloading OpenFisca_Core-35.0.1-py3-none-any.whl (162 kB)
     |████████████████████████████████| 162 kB 1.9 MB/s 
  Downloading OpenFisca_Core-35.0.0-py3-none-any.whl (162 kB)
     |████████████████████████████████| 162 kB 2.0 MB/s 

...

ERROR: Cannot install openfisca-core 35.3.0 (from /Users/hyperion/Sites/openfisca/openfisca-core), openfisca-core[web-api]==35.0.0, openfisca-core[web-api]==35.0.1, openfisca-core[web-api]==35.0.3, openfisca-core[web-api]==35.0.4, openfisca-core[web-api]==35.0.5, openfisca-core[web-api]==35.1.0, openfisca-core[web-api]==35.1.1, openfisca-core[web-api]==35.2.0 and openfisca-core[web-api]==35.3.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested openfisca-core 35.3.0 (from /Users/hyperion/Sites/openfisca/openfisca-core)
    openfisca-core[web-api] 35.3.0 depends on openfisca-core 35.3.0 (Installed)

...

real	0m49.232s
user	0m23.754s
sys	0m1.487s

Considering the results of those tests, I'm afraid this change breaks things for users, not just because of the time overhead but as we've just discovered also because of the circular dependencies.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understood correctly, is there anything that makes us suspect our set of requirements could generate such long builds? 🙂

You're right, pip tries to download all the intervals, and when it comes to NumPy and the like that adds a lot of overhead.

Well... when I pinned NumPy's version I discovered the resolver backtracks following dependencies dependencies, which presents two problems:

  1. Backtracking is exponential.
  2. We don't have a have any control over the way our dependencies manage their own dependencies.

The following are the solutions proposed by the Pip folks:

  1. ALLOW PIP TO COMPLETE ITS BACKTRACKING

Already tried, it doesn't work unless we solve our two major problems (wild dependencies' dependencies and circular dependencies).

  1. REDUCE THE NUMBER OF VERSIONS PIP WILL TRY TO BACKTRACK THROUGH

Also already tried, cf. supra.

  1. USE CONSTRAINT FILES OR LOCKFILES

This could be easily done with pip-tools, pipgrip, Pipenv, Poetry, etc.

We'd still have to solve the circular dependency problem, though, unless we ignore them —given pip allows us to do so, or using pipgrip as an alternative.

  1. BE MORE STRICT ON PACKAGE DEPENDENCIES DURING DEVELOPMENT

Same as 1. and 2.

😃

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for this thorough and documented exploration of the problem space @maukoquiroga! 🙏😃

@bonjourmauko
Copy link
Member

Done! New type hints look very promising on Numpy 1.20 and 1.21 to help refactor OpenFisca Core —as a first step as unit tests are really hard to write today.

@cesco-fran in phase with your draft #976.

@MattiSG regarding the bump itself, as this version comprises some expired deprecations, I would mark it as a major one.

@MattiSG
Copy link
Member Author

MattiSG commented Apr 7, 2021

as this version comprises some expired deprecations, I would mark it as a major one.

Indeed, these elements are removed from the public NumPy API: Complex64, Complex32, sctypeNA, typeNA and ctypeslib.ctypes_load_library. However, they are not used in the codebase. Did Core expose any of them? 🤔

@bonjourmauko
Copy link
Member

Hello!

I was thinking more on the libraries depending on core: as some use NumPy directly and extensively (I'm particularly thinking on OpenFisca France, OpenFisca France Data, LexImpact Server...) an upgrade of NumPy in core will force them to upgrade NumPy as well.

However, I haven't checked whether those libraries rely in their codebase on the expired deprecations you mention.

I know that we do have no contract with the depending libraries in terms of the public NumPy API, which could indeed make this just be a minor release.

On the other hand, I feel inclined at the same time to warn depending libraries' maintainers of possible but unchecked impacts on their codebase, hence a breaking-change for them (but related to NumPy not to core).

What do you think?

@MattiSG
Copy link
Member Author

MattiSG commented Apr 7, 2021

Mmh I'm not familiar enough with Python packaging rules to understand how much each dependent sharing a dependency is impacted by conflicting versions 🤔 Spontaneously, I would hope that dependents can have their own version of Numpy without relying on ours! Otherwise, I understand that we should have a breaking change for every major library upgrade, including stuff like flake8.

@MattiSG
Copy link
Member Author

MattiSG commented Apr 7, 2021

I tried to rebase this branch on top of master but after #992 every modified file has disappeared 😭
@maukoquiroga do you feel like re-applying your changes? I can try to do it but it'd be quite literally monkey-patching as in “I have no idea what I'm doing” 😅

@bonjourmauko
Copy link
Member

I tried to rebase this branch on top of master but after #992 every modified file has disappeared 😭
@maukoquiroga do you feel like re-applying your changes? I can try to do it but it'd be quite literally monkey-patching as in “I have no idea what I'm doing” 😅

Of course, most of them are commits of my own.

@bonjourmauko
Copy link
Member

Done! it think this is no longer a draft 😃

@bonjourmauko bonjourmauko requested a review from a team April 9, 2021 19:00
@MattiSG MattiSG marked this pull request as ready for review April 12, 2021 13:12
Copy link
Member

@bonjourmauko bonjourmauko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM at least for me! If you agree we keep aside the resolver commit the time we find a solution for that (just tried again one last time to no avail...).

@bonjourmauko bonjourmauko requested a review from a team April 13, 2021 20:02
@MattiSG
Copy link
Member Author

MattiSG commented Apr 16, 2021

Resolver legacy option removal removed.

I'm going with a minor version bump as I expect that the API surface forwarded from NumPy to country packages by OF-Core has not changed with this version change.

@MattiSG MattiSG merged commit 6c05349 into master Apr 16, 2021
@MattiSG MattiSG deleted the m1 branch April 16, 2021 16:43
@MattiSG
Copy link
Member Author

MattiSG commented Apr 16, 2021

I made a mistake 😞
The version bump that I made in the code was not a minor version bump but a patch version bump.

  • Expected impact: minimal. Dependents will have to use a specific patch version as minimum if they expect to support M1 processors, instead of a minimum minor version number, which would be more semantic.
  • Worst case impact: minor. If the API surface is impacted by breaking changes in NumPy, updating the library could break some use cases. However, the intention was already to publish a minor version and not a major one, so the main foreseen impact would be that dependents would have to specify an upper version number lock that would prevent patches from being applied in their older, locked-down version. However, backporting patches is very rare in OpenFisca, which has a very linear deployment process.
  • Avoiding similar mistakes in the future: (1) re-request a review after version bump; or (2) do not approve PRs that don't include a version bump commit. Either seem hard to achieve in country packages considering the amount of small-scale PRs and associated administrative costs. However, it could be made mandatory in Core, for example by enabling automatic review dismissal after new commits are pushed.

@bonjourmauko
Copy link
Member

Thanks @MattiSG, I made a mistake as well, as you said, I should have waited for your answer and for the version bump before approving it, or have had a way to chime in and to check your last commits. I think automatic review dismissal is a good idea.

@bonjourmauko
Copy link
Member

#1004

@guillett
Copy link
Member

This PR has more negative impact then anticipated. #1012

@sandcha
Copy link
Collaborator

sandcha commented Apr 28, 2021

Mmh I'm not familiar enough with Python packaging rules to understand how much each dependent sharing a dependency is impacted by conflicting versions 🤔 Spontaneously, I would hope that dependents can have their own version of Numpy without relying on ours! Otherwise, I understand that we should have a breaking change for every major library upgrade, including stuff like flake8.

@MattiSG I think that we have a specific situation for numpy updates as you can have some code written in openfisca syntax that is not in the code base of a country model as it is, for example, the case for the reforms. And we frequently have user scripts in python that create an openfisca simulation and also use numpy to manage the numpy arrays of the simulation results. In those two examples, I think that we need to inform the users that getting an openfisca with a bump on numpy might break their code. And to do so, the only way that I know to communicate is by a major bump on openfisca-core.
We can also discuss this on #1010 that currently needs a version bump 🙂

@bonjourmauko
Copy link
Member

bonjourmauko commented Apr 28, 2021

To give some historical context, there have been three NumPy bumps recently:

And my 2c regarding the situation:

  1. The actual problem we have in master is not at all related to NumPy, but MyPy. This problem could have been avoided, if I didn't accept my own code as I now avoided in Extend NumPy compatibility to v1.20 #1010, if we tested against the versions we declare as being compatible, which we do not do, or not (we cannot just foresee and avoid everything and every single bug). In any case, that fact alone does not justify by itself a major version IMHO.

  2. When we consider supporting a new NumPy version, or in general a new version of any "general" dependency (for everyone, not just a "dev" one, like flake8), we can rightly say, as it was done here, that it is a minor change, given we ensure compatibility of new changes to the core with not just the new added version but at least the older one supported (in this case 1.11.0 and 1.20.0).

@MattiSG said justly:

I would hope that dependents can have their own version of Numpy without relying on ours!

They do rely on ours, but on all the range of them, not just the latest. Upgrading NumPy does not force anyone to use that version, unless we force them to do it —which we could— like in #1012.

However, the fact that we're not currently testing that they can actually continue to use the version they're already using, makes it impossible to ourselves to know whether we're introducing a breaking-change or not.

In that specific case, I think we should not defer the responsibility of knowing so to the users, knowing how to avoid it —we're literally saying "we mark it as a breaking change because, as we can know if it is, but we haven't tested so, you will have to do it by yourself". Said otherwise, impact of an upgrade is completely foreseeable, because users can always opt-out.

Of course, we can always assume that:

  • by default all users will be willing to upgrade their dependencies when we upgrade ours (contrary to what happened here)
  • we use versioning more as a dissuasive tool than an informative one (let's just not jump on that NumPy upgrade so easily before we can test backwards compatibility)

In any of those cases it would indeed make sense to mark any NumPy bump as a major.

  1. When we consider dropping support of an older NumPy version, it's the exact opposite, and I 100% agree with @sandcha's comment supra: we do absolutely not know how that will impact users, and contrary to a version bump I can just ignore, here I'm being forced to upgrade against my will. In that case I'd systematically assume a breaking change.

@MattiSG
Copy link
Member Author

MattiSG commented Oct 7, 2021

it could be made mandatory in Core, for example by enabling automatic review dismissal after new commits are pushed

The experiment of auto-dismissing stale reviews turns out to be too much of a hassle in our process where we almost systematically rebase before merging and where reviews are very asynchronous (see #1020 (comment)).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:build Pull requests that update a dependency file
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants