New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] PyPy support #11010

Merged
merged 64 commits into from Jul 20, 2018

Conversation

Projects
None yet
@rlamy
Contributor

rlamy commented Apr 22, 2018

Reference Issues/PRs

See #8625

What does this implement/fix? Explain your changes.

Working on adding PyPy support.

Disable Cython modules using array.array and reimplement in pure Python
This allows pypy3 to run and pass most tests.
@rth

This comment has been minimized.

Show comment
Hide comment
@rth

rth Apr 22, 2018

Member

Thanks for working on this @rlamy !

For future reference to create a Python 3.5 environment with PyPy 5.10.1, I used the following steps,

  1. Installed PyPy 5.10 (from my Linux distribution),
  2. Run,
    virtualenv -p /usr/bin/pypy3 pypy-env
    source pypy-env/bin/activate
    pip install numpy Cython Tempita
    pip install scipy==1.1.0rc1
    # pandas is used in benchmarks
    pip install pandas --no-build-isolation
    cd scikit-learn
    pip install -e .
Member

rth commented Apr 22, 2018

Thanks for working on this @rlamy !

For future reference to create a Python 3.5 environment with PyPy 5.10.1, I used the following steps,

  1. Installed PyPy 5.10 (from my Linux distribution),
  2. Run,
    virtualenv -p /usr/bin/pypy3 pypy-env
    source pypy-env/bin/activate
    pip install numpy Cython Tempita
    pip install scipy==1.1.0rc1
    # pandas is used in benchmarks
    pip install pandas --no-build-isolation
    cd scikit-learn
    pip install -e .
@rth

This comment has been minimized.

Show comment
Hide comment
@rth

rth Apr 22, 2018

Member

I added CI setup for PyPy and disabled other builds temporarly to avoid uncessary runs. Also copied the cython code that includes array.array back into separate files _hashing_pypy.py and _svmlight_format_pypy.py so it's easier to diff with the original version. Also added a IS_PYPY global flag similarly to scipy.

Unfortunately,

  • despite what the Travis CI documentation says pypy for Python 2 does not seem to be included anymore,
     $ ls ~/virtualenv/
     pypy3.5		python2.7     python2.7_with_system_site_packages  python3.6
     pypy3.5-5.10.1	python2.7.14  python3.4_with_system_site_packages  python3.6.3
    
    (specifying python: pypy2.7 returns an error about not found virtualenv). Unless I'm looking in wrong places
  • on Python 3, the build timeouts when building scipy at
    Processing scipy/spatial/_voronoi.pyx
    No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.
    

generally I'm not sure if it's worth spending time on fixing this as locally the test suite with PyPy 3 takes ~30min for me (without the examples), so accounting for the time to build scipy I am not sure we will be able to run the test suite within the 50 min limit on Travis CI.

The output of the test suite I am getting locally with PyPy 3 - 5.10.1 can be found in
scikit-learn-pypy3-5.10.log; as you noted @rlamy there are ~105 failing tests out of 8727 which is surprisingly good for a first try. Most _load_svmlight_file tests fails so the included changes have issues (it seems I have missed a few imports while moving code - this should be fixed now).

Benchmark wise, we need to think which existing benchmarks under benchmarks/ could be used.

cc @jnothman @lesteve

Member

rth commented Apr 22, 2018

I added CI setup for PyPy and disabled other builds temporarly to avoid uncessary runs. Also copied the cython code that includes array.array back into separate files _hashing_pypy.py and _svmlight_format_pypy.py so it's easier to diff with the original version. Also added a IS_PYPY global flag similarly to scipy.

Unfortunately,

  • despite what the Travis CI documentation says pypy for Python 2 does not seem to be included anymore,
     $ ls ~/virtualenv/
     pypy3.5		python2.7     python2.7_with_system_site_packages  python3.6
     pypy3.5-5.10.1	python2.7.14  python3.4_with_system_site_packages  python3.6.3
    
    (specifying python: pypy2.7 returns an error about not found virtualenv). Unless I'm looking in wrong places
  • on Python 3, the build timeouts when building scipy at
    Processing scipy/spatial/_voronoi.pyx
    No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.
    

generally I'm not sure if it's worth spending time on fixing this as locally the test suite with PyPy 3 takes ~30min for me (without the examples), so accounting for the time to build scipy I am not sure we will be able to run the test suite within the 50 min limit on Travis CI.

The output of the test suite I am getting locally with PyPy 3 - 5.10.1 can be found in
scikit-learn-pypy3-5.10.log; as you noted @rlamy there are ~105 failing tests out of 8727 which is surprisingly good for a first try. Most _load_svmlight_file tests fails so the included changes have issues (it seems I have missed a few imports while moving code - this should be fixed now).

Benchmark wise, we need to think which existing benchmarks under benchmarks/ could be used.

cc @jnothman @lesteve

@njsmith

This comment has been minimized.

Show comment
Hide comment
@njsmith

njsmith Apr 22, 2018

despite what the Travis CI documentation says pypy for Python 2 does not seem to be included anymore

The Travis CI docs are always misleading about what Pythons are actually available. If you want to actually know, the secret trick is to look directly at their s3 bucket:

s3cmd ls s3://travis-python-archives/binaries/ubuntu/16.04/x86_64/

So for example, right now that list includes pypy2.7-5.10.0.tar.bz2 and pypy3.5-5.10.1.tar.bz2, which means you can use python: pypy2.7-5.10.0 or python: pypy3.5-5.10.1 in your .travis.yml.

njsmith commented Apr 22, 2018

despite what the Travis CI documentation says pypy for Python 2 does not seem to be included anymore

The Travis CI docs are always misleading about what Pythons are actually available. If you want to actually know, the secret trick is to look directly at their s3 bucket:

s3cmd ls s3://travis-python-archives/binaries/ubuntu/16.04/x86_64/

So for example, right now that list includes pypy2.7-5.10.0.tar.bz2 and pypy3.5-5.10.1.tar.bz2, which means you can use python: pypy2.7-5.10.0 or python: pypy3.5-5.10.1 in your .travis.yml.

@rth

This comment has been minimized.

Show comment
Hide comment
@rth

rth Apr 22, 2018

Member

Thanks for the trick @njsmith! will use pypy2.7-5.10.0

On another topic, for future reference, I opened an issue at PyPy regarding array.array in Cython https://bitbucket.org/pypy/pypy/issues/2807/arrayarray-fails-to-import-in-cython

Member

rth commented Apr 22, 2018

Thanks for the trick @njsmith! will use pypy2.7-5.10.0

On another topic, for future reference, I opened an issue at PyPy regarding array.array in Cython https://bitbucket.org/pypy/pypy/issues/2807/arrayarray-fails-to-import-in-cython

rth added some commits Apr 22, 2018

@rth

This comment has been minimized.

Show comment
Hide comment
@rth

rth Apr 23, 2018

Member

Using scipy 1.1.rc1 worked much better in CI, thank you.

Current status:

  • pypy2.7-5.10.0 : 1 error at test collection time
  • pypy3.5-5.10 : Tavis CI timouts after 50min at 70% of the test suite

CircleCI seems to have a larger maximum allowed build time (5hours) so I'll try to use that instead.

Member

rth commented Apr 23, 2018

Using scipy 1.1.rc1 worked much better in CI, thank you.

Current status:

  • pypy2.7-5.10.0 : 1 error at test collection time
  • pypy3.5-5.10 : Tavis CI timouts after 50min at 70% of the test suite

CircleCI seems to have a larger maximum allowed build time (5hours) so I'll try to use that instead.

@rlamy

This comment has been minimized.

Show comment
Hide comment
@rlamy

rlamy Apr 23, 2018

Contributor

Using @antocuni 's pypy-wheels should make the install much faster. Unfortunately, there is no suitable scipy wheel yet, but we can at least get numpy.

Contributor

rlamy commented Apr 23, 2018

Using @antocuni 's pypy-wheels should make the install much faster. Unfortunately, there is no suitable scipy wheel yet, but we can at least get numpy.

@rlamy

This comment has been minimized.

Show comment
Hide comment
@rlamy

rlamy Apr 23, 2018

Contributor

Well, numpy seems to be pre-installed by Travis, so this doesn't change anything.

Contributor

rlamy commented Apr 23, 2018

Well, numpy seems to be pre-installed by Travis, so this doesn't change anything.

@rth

This comment has been minimized.

Show comment
Hide comment
@rth

rth Apr 24, 2018

Member

Migrated setup to CircleCI. Thanks, using the pre-build numpy wheels works better and also fixing a class without the __init__ as you mentionned in #11016 makes test collection work on PyPy2 as well.

So now we have,

  • PyPy2: 160 failed, 8911 passed, 81 skipped, 750 warnings in 1546.51 seconds
  • PyPy3: 11 failed, 9062 passed, 79 skipped, 8599 warnings in 1767.54 seconds

The CI uses PyPy docker images, and the setup can be run locally with,

$ cd scikit-learn
$ docker run --rm -v $(pwd):/scikit-learn/ -w /scikit-learn pypy:3-5.10.1 \
          bash ./build_tools/circle/install_pypy.sh

@rlamy Do you think there is anything that can be done about array.arrray short of keeping a copy of the code in pure Python? The issue only appears in combination with Cython, right? So it's possible to use mostly the same files, after removing cdefs etc; in this PR the array.array was removed altogether is this necessary?

Member

rth commented Apr 24, 2018

Migrated setup to CircleCI. Thanks, using the pre-build numpy wheels works better and also fixing a class without the __init__ as you mentionned in #11016 makes test collection work on PyPy2 as well.

So now we have,

  • PyPy2: 160 failed, 8911 passed, 81 skipped, 750 warnings in 1546.51 seconds
  • PyPy3: 11 failed, 9062 passed, 79 skipped, 8599 warnings in 1767.54 seconds

The CI uses PyPy docker images, and the setup can be run locally with,

$ cd scikit-learn
$ docker run --rm -v $(pwd):/scikit-learn/ -w /scikit-learn pypy:3-5.10.1 \
          bash ./build_tools/circle/install_pypy.sh

@rlamy Do you think there is anything that can be done about array.arrray short of keeping a copy of the code in pure Python? The issue only appears in combination with Cython, right? So it's possible to use mostly the same files, after removing cdefs etc; in this PR the array.array was removed altogether is this necessary?

@pv

This comment has been minimized.

Show comment
Hide comment
@pv

pv Apr 24, 2018

Contributor
Contributor

pv commented Apr 24, 2018

@pv

This comment has been minimized.

Show comment
Hide comment
@pv

pv Apr 24, 2018

Contributor
Contributor

pv commented Apr 24, 2018

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Apr 24, 2018

Member
Member

jnothman commented Apr 24, 2018

@antocuni

This comment has been minimized.

Show comment
Hide comment
@antocuni

antocuni Apr 24, 2018

could cython's pure python mode facilitate keeping only a single version of
the code? could some indirection in the import of array that special-cases
for pypy work?

FWIW, I think that using cython's pure-python mode together with an external pxd (and/or @cython.* annotations) it's a very effective way of using without hurting PyPy performance.

I have used this pattern in capnpy and it worked very well.

antocuni commented Apr 24, 2018

could cython's pure python mode facilitate keeping only a single version of
the code? could some indirection in the import of array that special-cases
for pypy work?

FWIW, I think that using cython's pure-python mode together with an external pxd (and/or @cython.* annotations) it's a very effective way of using without hurting PyPy performance.

I have used this pattern in capnpy and it worked very well.

@rlamy

This comment has been minimized.

Show comment
Hide comment
@rlamy

rlamy Apr 24, 2018

Contributor

@rth I removed array.array due to the use of the Cython-specific resize_smart() but I guess that using array.append() would allow the pure-Python and Cython code to be more similar.

Contributor

rlamy commented Apr 24, 2018

@rth I removed array.array due to the use of the Cython-specific resize_smart() but I guess that using array.append() would allow the pure-Python and Cython code to be more similar.

@pv

This comment has been minimized.

Show comment
Hide comment
@pv

pv Apr 25, 2018

Contributor

... or write a new custom class in Cython in 200 lines to do the stuff here --- it's just used for expandable temporary buffers.

Contributor

pv commented Apr 25, 2018

... or write a new custom class in Cython in 200 lines to do the stuff here --- it's just used for expandable temporary buffers.

@rth

This comment has been minimized.

Show comment
Hide comment
@rth

rth Jul 15, 2018

Member

Thanks a lot @amueller !

(I just realized the FAQ entry about PyPy needs to be updated too, will add another commit).

Member

rth commented Jul 15, 2018

Thanks a lot @amueller !

(I just realized the FAQ entry about PyPy needs to be updated too, will add another commit).

@ogrisel

This comment has been minimized.

Show comment
Hide comment
@ogrisel

ogrisel Jul 15, 2018

Member

We should find a way to avoid building numpy and scipy at each push in each pull request. Either there is an easy way to Dockerfile it and use the circle-ci cache.

Or we run the PyPy tests in a cron job rather than a regular build job.

Member

ogrisel commented Jul 15, 2018

We should find a way to avoid building numpy and scipy at each push in each pull request. Either there is an easy way to Dockerfile it and use the circle-ci cache.

Or we run the PyPy tests in a cron job rather than a regular build job.

@lesteve

This comment has been minimized.

Show comment
Hide comment
@lesteve

lesteve Jul 15, 2018

Member

It would be good if there was a cron job for testing this, we certainly do not want to test this in PRs: here is the doc I found for what seems the equivalent of CircleCI cron jobs: https://circleci.com/docs/2.0/configuration-reference/#triggers.

Alternatively trying on Travis and crossing our fingers hoping that it fits within the time limit may be an option?

Member

lesteve commented Jul 15, 2018

It would be good if there was a cron job for testing this, we certainly do not want to test this in PRs: here is the doc I found for what seems the equivalent of CircleCI cron jobs: https://circleci.com/docs/2.0/configuration-reference/#triggers.

Alternatively trying on Travis and crossing our fingers hoping that it fits within the time limit may be an option?

@ogrisel

This comment has been minimized.

Show comment
Hide comment
@ogrisel

ogrisel Jul 15, 2018

Member

We should find a way to avoid building numpy and scipy at each push in each pull request. Either there is an easy way to Dockerfile it and use the circle-ci cache.

Actually no need for docker: if we can just configure the wheel cache folder of pip to be cached by circle ci this is probably enough.

Member

ogrisel commented Jul 15, 2018

We should find a way to avoid building numpy and scipy at each push in each pull request. Either there is an easy way to Dockerfile it and use the circle-ci cache.

Actually no need for docker: if we can just configure the wheel cache folder of pip to be cached by circle ci this is probably enough.

Show outdated Hide outdated sklearn/utils/__init__.py
Show outdated Hide outdated sklearn/feature_extraction/setup.py
Show outdated Hide outdated conftest.py
Show outdated Hide outdated sklearn/feature_extraction/hashing.py
Show outdated Hide outdated setup.py

rth added some commits Jul 16, 2018

@ogrisel

LGTM.

@ogrisel

This comment has been minimized.

Show comment
Hide comment
@ogrisel

ogrisel Jul 17, 2018

Member

pypy3 build setup is failing:

Installing collected packages: scipy, MarkupSafe, Jinja2, Pygments, docutils, snowballstemmer, pytz, babel, alabaster, imagesize, chardet, idna, urllib3, certifi, requests, pyparsing, packaging, sphinxcontrib-websupport, sphinx, numpydoc
Successfully installed Jinja2-2.10 MarkupSafe-1.0 Pygments-2.2.0 alabaster-0.7.11 babel-2.6.0 certifi-2018.4.16 chardet-3.0.4 docutils-0.14 idna-2.7 imagesize-1.0.0 numpydoc-0.8.0 packaging-17.1 pyparsing-2.2.0 pytz-2018.5 requests-2.19.1 scipy-1.1.0 snowballstemmer-1.2.1 sphinx-1.7.6 sphinxcontrib-websupport-1.1.0 urllib3-1.23
+ ccache -M 512M
./build_tools/circle/build_test_pypy.sh: line 24: ccache: command not found
Exited with code 127
Member

ogrisel commented Jul 17, 2018

pypy3 build setup is failing:

Installing collected packages: scipy, MarkupSafe, Jinja2, Pygments, docutils, snowballstemmer, pytz, babel, alabaster, imagesize, chardet, idna, urllib3, certifi, requests, pyparsing, packaging, sphinxcontrib-websupport, sphinx, numpydoc
Successfully installed Jinja2-2.10 MarkupSafe-1.0 Pygments-2.2.0 alabaster-0.7.11 babel-2.6.0 certifi-2018.4.16 chardet-3.0.4 docutils-0.14 idna-2.7 imagesize-1.0.0 numpydoc-0.8.0 packaging-17.1 pyparsing-2.2.0 pytz-2018.5 requests-2.19.1 scipy-1.1.0 snowballstemmer-1.2.1 sphinx-1.7.6 sphinxcontrib-websupport-1.1.0 urllib3-1.23
+ ccache -M 512M
./build_tools/circle/build_test_pypy.sh: line 24: ccache: command not found
Exited with code 127

rth added some commits Jul 17, 2018

Show outdated Hide outdated doc/faq.rst

rth added some commits Jul 17, 2018

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Jul 20, 2018

Member

Let's merge.

Member

jnothman commented Jul 20, 2018

Let's merge.

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Jul 20, 2018

Member

Thank you @rlamy and @rth.

Member

jnothman commented Jul 20, 2018

Thank you @rlamy and @rth.

@jnothman jnothman merged commit 5592a2e into scikit-learn:master Jul 20, 2018

3 of 4 checks passed

codecov/patch 75.86% of diff hit (target 95.31%)
Details
codecov/project 95.3% (-0.02%) compared to 2489035
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@rth

This comment has been minimized.

Show comment
Hide comment
@rth

rth Jul 20, 2018

Member

Thanks @jnothman , the issue was that Circle CI was down, and so it didn't trigger at the last commit, and the PyPy build didn't pass on master.

I'll try to fix it in a separate PR, but a temporary workaround could be to disable the corresponding pypy3 build in CircleCI to avoid failing CI status on master...

Member

rth commented Jul 20, 2018

Thanks @jnothman , the issue was that Circle CI was down, and so it didn't trigger at the last commit, and the PyPy build didn't pass on master.

I'll try to fix it in a separate PR, but a temporary workaround could be to disable the corresponding pypy3 build in CircleCI to avoid failing CI status on master...

@jorisvandenbossche

This comment has been minimized.

Show comment
Hide comment
@jorisvandenbossche

jorisvandenbossche Jul 20, 2018

Contributor

but a temporary workaround could be to disable the corresponding pypy3 build in CircleCI to avoid failing CI status on master..

+1

Contributor

jorisvandenbossche commented Jul 20, 2018

but a temporary workaround could be to disable the corresponding pypy3 build in CircleCI to avoid failing CI status on master..

+1

@qinhanmin2014

This comment has been minimized.

Show comment
Hide comment
@qinhanmin2014

qinhanmin2014 Jul 20, 2018

Member

Seems that we're running pypi (which fails) in PRs, which might confuse contributors and block codecov from providing results. Maybe we need a temporary workaround ASAP.

Member

qinhanmin2014 commented Jul 20, 2018

Seems that we're running pypi (which fails) in PRs, which might confuse contributors and block codecov from providing results. Maybe we need a temporary workaround ASAP.

@rth

This comment has been minimized.

Show comment
Hide comment
@rth

rth Jul 20, 2018

Member

Temporary workaround applied in 6e94a63

Member

rth commented Jul 20, 2018

Temporary workaround applied in 6e94a63

@giodegas

This comment has been minimized.

Show comment
Hide comment
@giodegas

giodegas Sep 2, 2018

Hi to all. I just participated to the EuroSciPy 2018 sprint about PyPy. I am trying to build a dockerized pypy/numpy/scipy/scikitlearn set with jupyter notebook image.

docker pull giodegas/pypy-jupyter:scipy

( source repo here , please open issues )

Result of

pytest --pyargs sklearn > pytest.log

pytest.log

How can I quickly fix the error:

ImportError: cannot import name '_load_svmlight_file'

Thank's.

giodegas commented Sep 2, 2018

Hi to all. I just participated to the EuroSciPy 2018 sprint about PyPy. I am trying to build a dockerized pypy/numpy/scipy/scikitlearn set with jupyter notebook image.

docker pull giodegas/pypy-jupyter:scipy

( source repo here , please open issues )

Result of

pytest --pyargs sklearn > pytest.log

pytest.log

How can I quickly fix the error:

ImportError: cannot import name '_load_svmlight_file'

Thank's.

@rth

This comment has been minimized.

Show comment
Hide comment
@rth

rth Sep 2, 2018

Member

@giodegas Great to know you are working on this! Please make sure you are using the just released 0.20.rc1 -- version 0.19 doesn't support PyPy. Also it's expected that you would get some tests failures on Python 2, there shouldn't be any on Python 3. If you do experience other problems please open a new issue.

Member

rth commented Sep 2, 2018

@giodegas Great to know you are working on this! Please make sure you are using the just released 0.20.rc1 -- version 0.19 doesn't support PyPy. Also it's expected that you would get some tests failures on Python 2, there shouldn't be any on Python 3. If you do experience other problems please open a new issue.

@rth

This comment has been minimized.

Show comment
Hide comment
@rth

rth Sep 2, 2018

Member

Also I would like to thank @rlamy for starting this effort, and all reviewers for very helpful suggestions!

(I still need to re-enable PyPy CI probably in a Circle CI Cron job)

Member

rth commented Sep 2, 2018

Also I would like to thank @rlamy for starting this effort, and all reviewers for very helpful suggestions!

(I still need to re-enable PyPy CI probably in a Circle CI Cron job)

@giodegas

This comment has been minimized.

Show comment
Hide comment
@giodegas

giodegas Sep 2, 2018

I also get the error:

ValueError: array.array has the wrong size, try recompiling. Expected 24, got 72

giodegas commented Sep 2, 2018

I also get the error:

ValueError: array.array has the wrong size, try recompiling. Expected 24, got 72
@giodegas

This comment has been minimized.

Show comment
Hide comment
@giodegas

giodegas Sep 2, 2018

@rth done the upgrade to 0.20.rc1.
The new pytest.log

New error showing up...

giodegas commented Sep 2, 2018

@rth done the upgrade to 0.20.rc1.
The new pytest.log

New error showing up...

@giodegas

This comment has been minimized.

Show comment
Hide comment
@giodegas

giodegas Sep 2, 2018

@rth I am testing starting from latest pypy3 docker image.

giodegas commented Sep 2, 2018

@rth I am testing starting from latest pypy3 docker image.

@rth

This comment has been minimized.

Show comment
Hide comment
@rth

rth Sep 2, 2018

Member

@giodegas Could you please open a new issue with this information?

Member

rth commented Sep 2, 2018

@giodegas Could you please open a new issue with this information?

@giodegas

This comment has been minimized.

Show comment
Hide comment
@giodegas

giodegas Sep 2, 2018

Here suggests to do:

import pickle

instead of

import _pickle

giodegas commented Sep 2, 2018

Here suggests to do:

import pickle

instead of

import _pickle
@giodegas

This comment has been minimized.

Show comment
Hide comment
@giodegas

giodegas commented Sep 2, 2018

moved to #11971

@rth rth referenced this pull request Sep 3, 2018

Closed

PyPy support #160

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment