Add a build script for Python 2.7.12 with UCS-4 Unicode support #322

Closed
wants to merge 2 commits into from

Conversation

edmorley
Member

@edmorley edmorley commented Aug 5, 2016

1) Clean up the latest Python 2.7 and 3.5 build scripts:

  • Switches to the canonical source tarball URL, to avoid the redirect
  • Avoids the repetition of the version number & outputs it to console
  • Whitespace cleanup

2) Add a build script for Python 2.7.13 with UCS-4 Unicode support:

The default Python build's Unicode mode (until Python 3.3, when PEP 393 landed) is UCS-2. However, Ubuntu's system Python 2.7 is compiled in UCS-4 mode. This new build script overrides the default for parity with system Python, and to avoid hard-to-debug Unicode bugs.

Fixes #305.

@edmorley edmorley changed the title from "Python2.7.12 ucs4" to "Add a build script for Python 2.7.12 with UCS-4 Unicode support" on Aug 5, 2016
@edmorley
Member Author

edmorley commented Aug 5, 2016

Tested building with --enable-unicode=ucs4 on a cedar-14 dyno:

~ $ mkdir /app/test
~ $ curl -Ls "https://www.python.org/ftp/python/2.7.12/Python-2.7.12.tgz" | tar xz
~ $ mv "Python-2.7.12" src
~ $ cd src/
~/src $ ./configure --prefix=/app/test --with-ensurepip=no --enable-unicode=ucs4
...
~/src $ make install
...
~/src $ /app/test/bin/python -c "import sys; print sys.maxunicode"
1114111

...which is UCS-4 mode: 1114111 is 0x10FFFF, the highest Unicode code point, whereas a UCS-2 build reports 65535 (0xFFFF).

For comparison, this is a cedar-14 dyno's system Python vs the buildpack's current Python:

~ $ /usr/bin/python -c "import sys; print sys.maxunicode"
1114111
~ $ python -c "import sys; print sys.maxunicode"
65535
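
As a rough sketch of how the two numbers above map to build modes, the check can be written so that it runs on both Python 2.7 and Python 3. The helper name `unicode_build_mode` is hypothetical and not part of the buildpack:

```python
import sys


def unicode_build_mode(maxunicode):
    """Map a sys.maxunicode value to the Unicode build mode it implies."""
    if maxunicode == 0x10FFFF:  # 1114111: the full Unicode range
        return "UCS-4"
    if maxunicode == 0xFFFF:  # 65535: Basic Multilingual Plane only
        return "UCS-2"
    raise ValueError("unexpected sys.maxunicode: %r" % (maxunicode,))


if __name__ == "__main__":
    # Report the mode of whichever interpreter runs this script.
    print(unicode_build_mode(sys.maxunicode))
```

Note that on Python 3.3 and later `sys.maxunicode` is always 1114111, so the distinction only matters for Python 2.7 and Python 3.0 to 3.2 builds.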

@kennethreitz
Contributor

I have concerns about this regarding wheels — they are complex, but speaking with @dstufft gave me some great insight into the behavior and availability of wheels of both types within the community.

@edmorley
Member Author

I have concerns about this regarding wheels — they are complex, but speaking with @dstufft gave me some great insight into the behavior and availability of wheels of both types within the community.

A few points to note about this:

  • Unlike on Windows, wheels for Linux for anything but pure Python packages are virtually non-existent at this point - PEP 513 is still pretty new and not fully supported everywhere yet.
  • PEP 513 actually covers the UCS-2 vs UCS-4 case (https://www.python.org/dev/peps/pep-0513/#ucs-2-vs-ucs-4-builds).
  • Since this PR adds the UCS-4 runtime with its own suffix, this will:
    • only affect people who specifically choose to switch to it
    • ensure that site-packages is purged when people switch, avoiding compatibility issues with already installed non-pure packages
    • give us a good way to test compatibility and decide whether to enable UCS-4 by default for the next Python 2.7 point release
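
To illustrate why the two modes matter for compiled wheels: under the PEP 425 / PEP 3149 tagging scheme, a CPython 2.7 wide (UCS-4) build carries a `u` flag in its ABI tag (`cp27mu`), while a narrow (UCS-2) build does not (`cp27m`), so a binary wheel built for one is not installable on the other. A minimal sketch of that derivation follows; the function name `cp27_abi_tag` and its parameters are made up for this example, and it assumes a pymalloc, non-debug build by default:

```python
def cp27_abi_tag(maxunicode, pymalloc=True, debug=False):
    """Build the CPython 2.7 ABI tag implied by the given build options.

    Flags are appended in the order d (debug), m (pymalloc), u (wide unicode).
    """
    flags = ""
    if debug:
        flags += "d"
    if pymalloc:
        flags += "m"
    if maxunicode == 0x10FFFF:  # wide / UCS-4 build
        flags += "u"
    return "cp27" + flags


print(cp27_abi_tag(0x10FFFF))  # cp27mu
print(cp27_abi_tag(0xFFFF))    # cp27m
```

Because the tags differ, installers treat the two builds as distinct ABIs, which is consistent with purging site-packages when switching runtimes.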

Given it won't break anyone else, could we just merge this and give it a go? :-)

@kennethreitz
Contributor

Yes, this is the information I collected. There are a few subtleties.

As for the suffix, there are a few others available that use this mechanism (e.g. -shared). There are reasons that make it less than useful to do it, but it can be useful for a handful of people (approx. <5). It's something that isn't really necessary unless for testing purposes.

I am thinking about it.

@kennethreitz
Contributor

How will having these specific builds available for you, today, help you? I have no intention of releasing one of each ongoing in the future. This would be a one-time thing.

@edmorley
Member Author

edmorley commented Aug 19, 2016

I absolutely agree - we shouldn't release one of each for future Python versions - UCS-4 should be the default long term. I explained the reasoning in the OP, but perhaps some more context would help.

Reasons for making UCS-4 the default:

  • It prevents many unicode edge-cases/bugs, and as far as I'm aware doesn't introduce any others.
  • It is the default for Ubuntu/Debian/Redhat/Travis's Python 2.7 (and no doubt others), so switching increases consistency between testing/development and what runs on Heroku.
  • The system Python in the cedar-14 stack itself uses UCS-4, which is inconsistent with this buildpack's Python. This means anyone using another language buildpack (eg heroku-buildpack-nodejs) but using Python during the compile, who later adds the python buildpack as a second buildpack will have actually switched UCS mode without realising (eg the scenario in Support Python 3.5 #246 (comment)).
  • As of Python 3.3 (PEP 393), stock Python always supports the full Unicode range (sys.maxunicode is always 1114111), making UCS-2 effectively the legacy mode.
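
One example of the edge cases mentioned in the first bullet: on a UCS-2 build, a character outside the Basic Multilingual Plane is stored as a surrogate pair, so `len()` and indexing see two units instead of one character. The sketch below approximates the narrow-build view by counting UTF-16 code units; `utf16_code_units` is an illustrative helper, not something from the buildpack:

```python
def utf16_code_units(text):
    """Count the UTF-16 code units in text, i.e. what len() reports on a UCS-2 build."""
    return len(text.encode("utf-16-le")) // 2


cake = u"\U0001F370"  # SHORTCAKE, an astral-plane character
print(utf16_code_units(cake))    # 2: stored as one surrogate pair
print(utf16_code_units(u"abc"))  # 3: BMP characters take one unit each
```

On a UCS-4 build (or any Python 3.3+), `len(cake)` is 1, so code that slices or counts characters can behave differently between the two modes.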

Ways to do this:

  1. Wait until the Python 2.7.13 release, and just make that (and any future versions) UCS-4
  2. Replace the existing Python 2.7.12 release with a UCS-4 version (and make future releases UCS-4 too)
  3. Release a one-off separate UCS-4 release of Python 2.7.12, with the intention of 2.7.13+ being UCS-4

Option (1) would mean us having to wait until December 2016 for the Python 2.7.13 release (schedule) or else fork the buildpack, build a custom Python 2.7.12 UCS-4 release and upload to our own S3 bucket, which in reality is too much of a hassle (politics with getting S3 bucket access etc). We could of course build Python during initial compile, however that still means forking the buildpack and longer initial build times in case of clearing the cache etc.

Option (2) seemed less likely to be merged/accepted, due to possible risk of changing existing pinned Python runtimes. (Though I guess people would only redownload the new UCS4 archive if their build cache were cleared, so existing users would mostly be unaware?)

Which is why I went with (3), however I'm also open to (2) :-)

@edmorley
Member Author

edmorley commented Sep 6, 2016

@kennethreitz any thoughts on the last comment? :-)

@YenTheFirst

This PR doesn't appear to work, at the moment.

Setting runtime.txt to python-2.7.12-ucs4, and the buildpack to 'https://github.com/edmorley/heroku-buildpack-python.git#python2.7.12-ucs4',

the build process fails with

-----> Python app detected
-----> Found python-2.7.12, removing
-----> Installing python-2.7.12-ucs4
 !     Requested runtime (python-2.7.12-ucs4) is not available for this stack (cedar-14).
 !     Aborting.  More info: https://devcenter.heroku.com/articles/python-support
 !     Push rejected, failed to compile Python app.
 !     Push failed

https://github.com/edmorley/heroku-buildpack-python/blob/python2.7.12-ucs4/bin/steps/python#L31

seems to be the issue.

@edmorley
Member Author

edmorley commented Sep 27, 2016

After this PR is merged, a new python archive has to be created using the script, and uploaded to S3 - so it's expected it won't work as-is.

@kennethreitz, any news on my replies to your questions above?

Many thanks!

@kennethreitz
Contributor

UCS-4 will be added in 2.7.13

@afritzler

@kennethreitz any timeline when 2.7.13 based buildpack with UCS-4 will be released?

@cclauss

cclauss commented Nov 17, 2016

https://www.python.org/dev/peps/pep-0373/#maintenance-releases says that Python 2.7.13 is due to be released in December 2016.

Ed Morley added 2 commits November 17, 2016 12:44
* Switches to the canonical source tarball URL, to avoid the redirect
* Avoids the repetition of the version number & outputs it to console
* Whitespace cleanup
Unlike the previous Heroku Python 2.7.x builds, this one enables the
UCS-4 unicode mode (instead of using UCS-2, which was the default until
Python 3.3 when PEP393 landed), for parity with Ubuntu system Python.

Fixes #305.
@edmorley
Member Author

@kennethreitz many thanks :-)

PR updated to use 2.7.13, ready for December.

@kennethreitz
Contributor

haha, excellent

@kennethreitz
Contributor

Can't wait! ✨🍰✨

@kennethreitz
Contributor

Thanks for your due diligence on this. I think this will be a big improvement for the platform (mostly for wheels, but also for "astral-plane" unicode support).

@kennethreitz
Contributor

this should be released over the weekend — 2.7.13 is already released

@kennethreitz
Contributor

released!

@kennethreitz
Contributor

the latest python, python-2.7.13 on heroku uses ucs-4 now. thanks, @edmorley! ✨🍰✨

@edmorley edmorley deleted the python2.7.12-ucs4 branch March 2, 2017 20:11
5 participants