Python3 Migration Strategy #11468

Closed
citibeth opened this issue May 15, 2019 · 6 comments · Fixed by #10319

Comments

@citibeth
Member

@liamrpowell @adamjstewart @tgamblin @alalazo @scheibelp
Comments / discussion encouraged. We need to get this figured out, and we're running out of time.

Problem Summary

In just 7.5 months (2020-01-01), Python2 will be officially retired, and Spack currently lacks a coherent plan for dealing with this. At that point, a number of well-known Python libraries will require Python3. If Spack cannot easily install Python3 stacks out of the box by then, I believe we stand to lose users to systems that CAN do that (e.g. Anaconda); see #11396. I believe that January 1, 2020 is the date on which we should officially switch Spack to default to Python3.

The purpose of this thread is to come up with a roadmap for doing so. The core problem is that, with the current concretizer, it is not possible to have packages support Python2 and Python3 in a totally transparent manner. For example, in py-mypackage.py we need (but cannot use) statements such as:

depends_on('py-backported_stuff', when='^python@2')

General Solutions

This problem has prevented Spack from supporting Python3 in a "just works" manner, so far. There are two known solutions to it:

  1. #7926 (Successfully Concretize Python3-Based DAGs) uses a +python3 variant as a manual way for users to specify that Python3 is being used. This variant is applied to app packages (where applicable) using the all: functionality in packages.yaml. The above dependency can then be rewritten as:
depends_on('py-backported_stuff', when='~python3')

A user working in Python3 land can pretty much set-and-forget the all: +python3 directive in packages.yaml.

  2. One might consider #7926 (Successfully Concretize Python3-Based DAGs) to be a dirty "hack." A new concretizer that fixes this problem promises to be a more elegant long-term solution, allowing Python packages to specify dependencies directly.
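The reason the variant approach in option 1 sidesteps the concretizer problem can be sketched with a toy model. This is not Spack code: `CONDITIONAL_DEPS` and `active_dependencies` are hypothetical names, and the assumption that an unset `python3` variant defaults to off is made only for illustration. The point is that the condition becomes a plain boolean the user sets up front, instead of something that depends on a Python version that has not been resolved yet.

```python
from typing import Dict, List, Tuple

# Hypothetical declarations: (dependency, variant name, value the variant must have).
# This mirrors depends_on('py-backported_stuff', when='~python3') from the thread.
CONDITIONAL_DEPS: List[Tuple[str, str, bool]] = [
    ("py-backported_stuff", "python3", False),
]


def active_dependencies(variants: Dict[str, bool]) -> List[str]:
    """Return the conditional dependencies enabled by the given variant settings.

    Assumption for this sketch: a variant that is not set counts as False.
    """
    return [
        name
        for name, variant, wanted in CONDITIONAL_DEPS
        if variants.get(variant, False) == wanted
    ]


# A user who set +python3 globally never pulls in the backport.
print(active_dependencies({"python3": True}))   # []
# A user building a Python2 stack still gets it.
print(active_dependencies({"python3": False}))  # ['py-backported_stuff']
```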

So far, a long-term preference for (2) has prevented us from merging (1). However, I believe the time to wait for an improved concretizer is over: We need to be ready for an all-Python3 world by the end of this year. And currently, we are NOT EVEN CLOSE. Therefore, I believe we should formulate a plan along the lines of (1), using technology we already have. That will give us enough time to fix all the Python packages that need fixing, and to test them with Python3.

Python3 Transition Plan

This is a transition plan that I think makes sense.

  1. Re-work #7926 (Successfully Concretize Python3-Based DAGs) to use a python2 variant instead of python3. This makes sense because starting 2020, Python3 needs to be the default, in accordance with the rest of the Python community.

  2. Review all Python packages for possible Python2-vs-3 issues, and code them in appropriately. Use the examples in #7926 (Successfully Concretize Python3-Based DAGs) as a guide. Also see #10319 (Default to Python 3.7).

  3. Test that spack install py-xyz works for all Python packages. Test "out of the box" (Python3), and also with the all: +python2 variant enabled. We expect that some packages might not work for Python2 or Python3; and they should be updated appropriately. Sometimes a newer upstream version might be needed; in other cases, a package might really be Python2-only and be headed for the dustbin. Do this all on a branch that's periodically kept up-to-date with Spack's develop.

  4. Fall 2019: Provide ABUNDANT notification to the Spack community that starting January 1 2020, Spack will go Python3 by default. Provide instructions on how to continue building Python2 stacks after that date.

  5. Tag the last version of Spack BEFORE Python3 is made default; and notify users of that tag, if they wish to keep using it for a while (without re-doing their config to continue using Python2).

  6. January 1 2020: Merge the branch into Spack's develop.

  7. When the new concretizer comes out, remove the +python2 hack.
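Step 3 of the plan above could be driven by a small script that builds the list of install commands to try. The following is only a sketch: `install_matrix` is a hypothetical helper, the `+python2` variant is the one proposed in this thread and does not exist yet, and the plan itself enables it through packages.yaml, not on the command line. The script only builds the commands; actually running them is left to the caller.

```python
from typing import Iterable, List


def install_matrix(packages: Iterable[str],
                   python2_variant: str = "+python2") -> List[List[str]]:
    """Build two install commands per package.

    One command is the "out of the box" case (Python3 by default under the
    plan); the other adds the proposed, not-yet-existing +python2 variant.
    """
    commands = []
    for name in packages:
        commands.append(["spack", "install", name])
        commands.append(["spack", "install", name, python2_variant])
    return commands


if __name__ == "__main__":
    # py-xyz is the placeholder package name used in the plan.
    for cmd in install_matrix(["py-xyz"]):
        print(" ".join(cmd))
```

Each command is kept as an argument list so it could be passed to `subprocess.run` unchanged if the matrix were executed on a test branch.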

Alternate Plan

The above plan maintains backwards compatibility with Python2 systems. Alternately, we could decide that this is too much effort. In that case, the branch we prepare would only be tested for Python3, and the +python2 variant would not be needed. Python2-specific dependencies could simply be eliminated from Packages, making them simpler.

The alternate plan would be less work. But it would also mean that Spack users still using Python2 would no longer be able to upgrade to the latest Spack. My guess is, the additional effort to maintain backwards compatibility is probably worth it.

@citibeth citibeth changed the title Python3: Migration Strategy Python3 Migration Strategy May 15, 2019
@adamjstewart
Member

I'm personally fine with the alternate plan of simply making sure that (almost) every package works correctly for Python 3 and not officially maintaining Python 2 support, but letting users submit bug reports and PRs to make that happen until the concretizer is fixed. I'm also fine with adding a +python2 variant if that seems like a better option to others. I agree with the timeline of switching to Python 3 by default by January 1st, 2020.

If anyone has time, it would be great to go through the list of Python 2-only packages in #10319 and see how many now support Python 3 in newer releases, or how many are projects that are no longer maintained. We could even add patches for Python 3 support for smaller libraries.

@svenevs may also be interested in this discussion. I really shouldn't work on this until I get some publications out and appease my advisors, but if push comes to shove and we need someone to push this through, I would be interested in making this happen.

@scheibelp
Member

I think people will be using Python 2 past the "expiration date", so ideally we can make it easy to use both.

We need (but cannot use) statements such as:
depends_on('py-backported_stuff', when='^python@2')

Note that you can use statements like these, but there are cases where you can end up picking up the backport library unexpectedly unless you explicitly specify the use of Python 3, e.g. like py-dependent ^python@3:.

At least one case where it can get confused is if any spec constrains Python to be a list of version ranges which includes 2.x versions, for example: 2.7:2.8,3.4:; in those cases (e.g. for py-ipython) it will add the backports library because python is constrained to be either 2.7:2.8 or 3.4:. I would characterize this as a bug and need to look around for whether it's covered by an existing issue.

As a matter of preparation though, adding constraints like when='^python@:2.9.99' should work in many cases and I think this issue can be resolved by handling bugs like the above rather than introducing a general python2/python3 variant.

@citibeth
Member Author

citibeth commented May 22, 2019

I think people will be using Python 2 past the "expiration date", so ideally we can make it easy to use both.

Do you think that's possible with #11396? If not, then we should drop Python2 support; users who need it can use the last Spack tag/release/version from 2019. Realistically, I think the likelihood that people NEED to use a 2020-era Spack with Python2 is going to be low, because new versions of packages are increasingly Python3-only, and that will turn into a tidal wave of Python3-only packages next year. If people can't use any of the latest versions of packages (because they're doing Python2), then why do they need the latest Spack that includes those versions?

We need (but cannot use) statements such as:
depends_on('py-backported_stuff', when='^python@2')

Note that you can use statements like these, but there are cases where you can end up picking up the backport library unexpectedly unless you explicitly specify the use of Python 3, e.g. like py-dependent ^python@3:.

Can you be more specific here? At some time in the past, I concluded that this construct didn't work; but when I looked in the code this week, I saw a lot of uses of it. Do you still need to explicitly specify Python3 in the specs if you set python@3 in your packages.yaml?

At least one case where it can get confused is if any spec constrains Python to be a list of version ranges which includes 2.x versions, for example: 2.7:2.8,3.4:; in those cases (e.g. for py-ipython) it will add the backports library because python is constrained to be either 2.7:2.8 or 3.4:. I would characterize this as a bug and need to look around for whether it's covered by an existing issue.

I see the problem; but don't understand the cause and effect relationship implied by the word "because" above. Does the technique from #7926 work around this problem?

As a matter of preparation though, adding constraints like when='^python@:2.9.99' should work in many cases and I think this issue can be resolved by handling bugs like the above rather than introducing a general python2/python3 variant.

I'd love to see it "just work." But these bugs in the concretizer are longstanding, and we have only 6 months to fix this now. If we are going to meet that timetable, we need the basic technology in place just about today. If we wait 4 months for a concretizer bug fix (which could easily be optimistic), then we miss the deadline. Todd is (AFAIK) the only person who can address these bugs, and he's REALLY busy; and any concretizer changes could introduce other bugs / instabilities elsewhere in the system, just because the concretizer is so central to Spack.

In other words... unless we can see a clear and immediate path to getting a concretizer that does what we need, I think we should go with some form of Plan B: either the python2/python3 hack, or dropping support for Python2. We can always leave in the constraints like above (in code or in comment); and restore them and get rid of the hack once we get a better concretizer.

I'm looking for opinions of which of those realistic options we should adopt. Do we use a variant hack to support Python3 by default and Python2 with a bit of effort? Or do we drop Python2 support and streamline our Python3 conversion? (Basically, drop all old versions of Python, and all versions of packages that use Python2). The longer we wait, the more likely we will need to just drop Python2 support, since that is clearly the least labor-intensive approach.

@scheibelp
Member

Can you be more specific here? At some time in the past, I concluded that this construct didn't work; but when I looked in the code this week, I saw a lot of uses of it.

If you explicitly specify ^python@3: it will work because the depends_on won't be processed until the concretizer adds some python dependency to the DAG, and if you specify ^python@3: it is guaranteed that any recorded python dependency will be immediately merged with this constraint, so any backports library dependency registered as

depends_on('backports-lib', when='^python@2')

will never have this constraint satisfied. I don't know when packages started adding when constraints like ^python@2 but my points are based on looking at the concretizer algorithm as it behaves currently.
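The effect described above, where an explicit ^python@3: is merged into the Python constraint before the when= condition is looked at, can be illustrated with a toy model of version ranges. This is not the concretizer's code: versions are simplified to (major, minor) tuples, `intersect` and `could_be_python2` are hypothetical helpers, and the example ranges are the ones quoted in this thread.

```python
from typing import List, Optional, Tuple

Version = Tuple[int, int]                   # (major, minor), a simplification
Range = Tuple[Version, Optional[Version]]   # inclusive bounds; None = no upper bound


def intersect(a: List[Range], b: List[Range]) -> List[Range]:
    """Merge two constraints, keeping only versions allowed by both."""
    merged = []
    for lo_a, hi_a in a:
        for lo_b, hi_b in b:
            lo = max(lo_a, lo_b)
            uppers = [h for h in (hi_a, hi_b) if h is not None]
            hi = min(uppers) if uppers else None
            if hi is None or lo <= hi:
                merged.append((lo, hi))
    return merged


def could_be_python2(constraint: List[Range]) -> bool:
    """True if the constraint still admits some 2.x version."""
    return any(
        lo <= (2, 99) and (hi is None or hi >= (2, 0))
        for lo, hi in constraint
    )


# python@2.7:2.8,3.4: as requested by some dependent
from_dependent = [((2, 7), (2, 8)), ((3, 4), None)]
# ^python@3: given explicitly by the user
from_user = [((3, 0), None)]

print(could_be_python2(from_dependent))                        # True
print(could_be_python2(intersect(from_dependent, from_user)))  # False
```

In this model, once the user's constraint has been merged in, no 2.x version remains, so a condition on Python 2 has nothing left to match.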

Ideally you would not have to specify that explicitly though. #11468 (comment) highlights a case where it won't work if you don't specify it explicitly. I think resolving that would resolve this issue.

I'd love to see it "just work." But these bugs in the concretizer are longstanding, and we have only 6 months to fix this now.
...
If we wait 4 months for a concretizer bug fix (which could easily be optimistic), then we miss the deadline. Todd is (AFAIK) the only person who can address these bugs

I have fixed several bugs in the concretizer; admittedly there are other more-ambitious changes to the concretizer which I haven't pulled off, but I don't currently think this issue falls into that category. To be clear I also don't think this needs to be attached to the timeline of SAT-based concretization.

At least one case where it can get confused is if any spec constrains Python to be a list of version ranges which includes 2.x versions, for example: 2.7:2.8,3.4:; in those cases (e.g. for py-ipython) it will add the backports library because python is constrained to be either 2.7:2.8 or 3.4:. I would characterize this as a bug and need to look around for whether it's covered by an existing issue.

I see the problem; but don't understand the cause and effect relationship implied by the word "because" above. Does the technique from #7926 work around this problem?

Some dependents require 2.7:2.8 or 3:. If the depends_on(..., when='^python@2') is checked when the python spec is constrained like this, it will succeed (erroneously). This relates to why specifying ^python@3: explicitly resolves the issue.
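One way to picture why the check succeeds erroneously is to compare two possible meanings of "the python spec matches python@2": the two constraints merely overlap, or every version the spec still allows is a 2.x version. The sketch below is a toy model under that assumption, not a description of how the concretizer is implemented; `overlaps` and `strictly_satisfies` are hypothetical helpers and versions are simplified to (major, minor) tuples.

```python
from typing import List, Optional, Tuple

Version = Tuple[int, int]                   # (major, minor), a simplification
Range = Tuple[Version, Optional[Version]]   # inclusive bounds; None = no upper bound


def overlaps(a: List[Range], b: List[Range]) -> bool:
    """True if at least one version is allowed by both constraints."""
    for lo_a, hi_a in a:
        for lo_b, hi_b in b:
            lo = max(lo_a, lo_b)
            uppers = [h for h in (hi_a, hi_b) if h is not None]
            if not uppers or lo <= min(uppers):
                return True
    return False


def strictly_satisfies(constraint: List[Range], condition: List[Range]) -> bool:
    """True if every version allowed by `constraint` is allowed by `condition`.

    Simplification: each range must fit inside a single range of `condition`.
    """
    def inside(r: Range, c: Range) -> bool:
        (lo_r, hi_r), (lo_c, hi_c) = r, c
        if lo_r < lo_c:
            return False
        if hi_c is None:
            return True
        return hi_r is not None and hi_r <= hi_c

    return all(any(inside(r, c) for c in condition) for r in constraint)


PYTHON2 = [((2, 0), (2, 99))]                # stands in for python@2
mixed = [((2, 7), (2, 8)), ((3, 4), None)]   # python@2.7:2.8,3.4:

print(overlaps(mixed, PYTHON2))              # True: the backport gets added
print(strictly_satisfies(mixed, PYTHON2))    # False: Python 3 is still possible
```

Under the overlap reading the condition fires for the mixed constraint, which matches the unexpected backport dependency described in the thread; under the strict reading it would not.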

Regarding #7926, I think that's the +python3 variant you also propose in the issue description (#11468 (comment)), and yes that should also work. I'm inclined to avoid a variant though.

I think people will be using Python 2 past the "expiration date", so ideally we can make it easy to use both.

Do you think that's possible with #11396?

I'm not sure how #11396 relates (as in, I'm not sure if we have gotten to the true problem there yet). Do you mean #7926 by chance?

@citibeth
Member Author

I have fixed several bugs in the concretizer

@scheibelp this is new (to me) information, and really good news. It looks like we should be able to come up with a (hopefully short) list of concretizer bugs to be fixed that would support universal Python2/Python3 operation; and then come up with a timeline for them. IMHO this depends entirely on you, because you're the only one who claims any ability at fixing concretizer bugs. If you can get them fixed soon enough (for some as-yet-ill-defined notion of "soon"), then we should be able to go forward with Plan A, which I think everybody agrees is the best in the long run.

From the above, it looks like the (known) bugs so far are:

Dependencies for Python3 and some Python2

Some dependents require 2.7:2.8 or 3:. If the depends_on(..., when='^python@2') is checked when the python spec is constrained like this, it will succeed (erroneously). This relates to why specifying ^python@3: explicitly resolves the issue.

Could this be worked around by converting lines like:

depends_on('py-xyz', when='^python@2.7:2.8,3:')

into two lines..?

depends_on('py-xyz', when='^python@2.7:2.8')
depends_on('py-xyz', when='^python@3:')

Or... I'm still not sure I understand. Can you give an explicit example? Something like "package A looks like this, package B looks like that, the user provides a spec like so... and then this bad thing happens."


Do you think that's possible with #11396?
I'm not sure how #11396 relates (as in, I'm not sure if we have gotten to the true problem there yet).

I meant, do you think that the problem in #11396 can be solved in a way that allows the package to work equally for Python2 and Python3?

@citibeth
Member Author

See #11531 for how to configure the default YAML files.
