[MRG] PyPy support #11010

Merged 64 commits on Jul 20, 2018
Changes from 52 commits
6e2f4df
Disable Cython modules using array.array and reimplement in pure Python
rlamy Apr 22, 2018
13c2469
Hack all_estimators() to avoid the modules that are broken on PyPy
rlamy Apr 22, 2018
be622fa
PyPy CI setup
rth Apr 22, 2018
5d7d25d
PyPy specific copy of _hashing.pyx and _svmlight_format.pyx
rth Apr 22, 2018
2857205
Fix _load_svmlight_file failures
rth Apr 22, 2018
cb02f29
Fix imports in _svmlight_format_pypy.py
rth Apr 22, 2018
cb56013
Use pypy2.7-5.10.0 on Travis CI
rth Apr 22, 2018
7f381fc
Use scipy==1.1.0rc1 in Travis CI
rth Apr 22, 2018
40b12e7
Update pip on Travis CI
rth Apr 22, 2018
f482cee
Install BLAS on Travis CI
rth Apr 22, 2018
85e4ee7
Change pip install verbosity to prevent CI timeout
rth Apr 23, 2018
dd3647d
Use pypy-wheels to speed up numpy install
rlamy Apr 23, 2018
7222ce7
Run PyPy tests on CircleCI
rth Apr 23, 2018
11cc6fb
Fix error in CircleCI
rth Apr 23, 2018
f86c60e
Another attempt to fix CircleCI
rth Apr 23, 2018
3d99fca
Factorize PyPy CI installation setup
rth Apr 23, 2018
37a0294
chmod a+x build_tools/circle/install_pypy.sh
rth Apr 23, 2018
8e90d56
Use the correct python executable
rth Apr 23, 2018
275c6d7
Install pytest
rth Apr 23, 2018
7e80717
Merge branch 'master' into pypy-testing
rth Apr 23, 2018
46878e2
Fix merge conflict resolution issue
rth Apr 24, 2018
4b239d6
Add __init__ to KernelCenterer
rth Apr 24, 2018
c4c2995
Use PyPy 6.0
rth Apr 27, 2018
456f94d
Skip standard _svmlight_format and _hashing imports tests
rth Apr 27, 2018
1b95ac9
Replace array.resize by np.resize in gradient boosting
rth Apr 27, 2018
02454b9
Revert CI
rth May 2, 2018
9a5d71e
Use a virtualenv in CI
rth May 2, 2018
5faf55e
Fix test_svmlight_format.py::test_load_compressed
rth May 2, 2018
b5fbe28
Fix check_no_attributes_set_in_init
rth May 2, 2018
5958a23
Disable PyPy2 in Circle CI
rth Jun 3, 2018
ef14a60
Merge branch 'master' into pypy-testing
rth Jun 3, 2018
2b98923
Remove duplicate pure python files and mark corresponding tests as xfail
rth Jun 3, 2018
0bc6753
Use scipy 1.1.0 for PyPy and fix PEP8
rth Jun 3, 2018
a300228
Use pre-compiled scipy for PyPy
rth Jun 3, 2018
9c29561
Use quotes for pip version specifications
rth Jun 3, 2018
ac8c286
Skip the check that segfaults with RO memmaps in AgglomerativeClustering
rth Jun 3, 2018
2faea96
Revert "Skip the check that segfaults with RO memmaps in Agglomerativ…
rth Jun 4, 2018
006ecc0
Revert to building scipy from sources
rth Jun 4, 2018
218c529
Use pytestmark to mark modules as xfail on PyPy
rth Jun 9, 2018
3634eaf
TST Check PyPy3 5.10
rth Jun 9, 2018
695585b
TST Check PyPy + numpy 1.13.3 and scipy 1.1.0
rth Jun 9, 2018
a7dd224
TST Check PyPy + numpy 1.14.4 and scipy 1.0.0
rth Jun 9, 2018
4636266
TST Check PyPy + numpy 1.14.0 and scipy 1.0.0
rth Jun 11, 2018
f3007a3
Add what's new
rth Jun 11, 2018
eb85b96
Remove Tempita from build_test_pypy.sh
rth Jun 12, 2018
9cc0970
Use scipy 1.1.0+ as minimal dependency with PyPy
rth Jun 13, 2018
7cdc822
Address Joel's comments (documentation and packaging)
rth Jun 13, 2018
0d971d4
Also run test-doc and test-sphinxext
rth Jun 14, 2018
6c95101
Merge branch 'master' into pypy-testing
rth Jun 14, 2018
5702aac
Add missing packages for doctests
rth Jun 14, 2018
82302d6
Skip tutorial/text_analytics/working_with_text_data.rst on PyPy
rth Jun 14, 2018
dbc192c
Merge branch 'master' into pypy-testing
rth Jul 15, 2018
4931d94
Olivier's comments
rth Jul 16, 2018
47f9f76
Cache pip and use ccache
rth Jul 16, 2018
b538e8e
Fix NUMPY_MIN_VERSION for PyPy
rth Jul 16, 2018
122f38b
Fix skipped modules in setup.py
rth Jul 17, 2018
71ec01c
Fix circle ci
rth Jul 17, 2018
9f9aebd
PEP8 and remove redundant line in setup.py
rth Jul 17, 2018
33f2f3a
Actually install ccache
rth Jul 17, 2018
66fcbe7
Update FAQ entry
rth Jul 17, 2018
40ada79
Improve FAQ entry formulation
rth Jul 17, 2018
b71e79f
Merge branch 'master' into pypy-testing
rth Jul 17, 2018
6390cf6
Merge branch 'master' into pypy-testing
rth Jul 19, 2018
ae76d01
PEP8 after merge
rth Jul 19, 2018
8 changes: 8 additions & 0 deletions .circleci/config.yml
@@ -65,6 +65,13 @@ jobs:
path: ~/log.txt
destination: log.txt

pypy3:
docker:
- image: pypy:3-6.0.0
steps:
- checkout
- run: ./build_tools/circle/build_test_pypy.sh

deploy:
docker:
- image: circleci/python:3.6.1
@@ -88,6 +95,7 @@ workflows:
jobs:
- python3
- python2
- pypy3
- deploy:
requires:
- python3
25 changes: 25 additions & 0 deletions build_tools/circle/build_test_pypy.sh
@@ -0,0 +1,25 @@
#!/usr/bin/env bash
set -x
set -e

apt-get -yq update
apt-get -yq install libatlas-dev libatlas-base-dev liblapack-dev gfortran

pip install virtualenv

if command -v pypy3; then
virtualenv -p $(command -v pypy3) pypy-env
elif command -v pypy; then
virtualenv -p $(command -v pypy) pypy-env
fi

source pypy-env/bin/activate

python --version
which python

pip install --extra-index https://antocuni.github.io/pypy-wheels/ubuntu numpy==1.14.4 Cython pytest
pip install "scipy>=1.1.0" sphinx numpydoc docutils
pip install -e .

make test
10 changes: 10 additions & 0 deletions conftest.py
@@ -5,13 +5,23 @@
# doc/modules/clustering.rst and use sklearn from the local folder rather than
# the one from site-packages.

import sys
from distutils.version import LooseVersion

import pytest
from _pytest.doctest import DoctestItem


def pytest_collection_modifyitems(config, items):

# FeatureHasher is not compatible with PyPy
if '__pypy__' in sys.modules:
Member:
platform.python_implementation() == 'PyPy'

skip_marker = pytest.mark.skip(
reason='FeatureHasher is not compatible with PyPy')
for item in items:
if item.name == 'sklearn.feature_extraction.hashing.FeatureHasher':
item.add_marker(skip_marker)

# numpy changed the str/repr formatting of numpy arrays in 1.14. We want to
# run doctests only for numpy >= 1.14.
skip_doctests = True
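The interpreter check used in this conftest.py hunk is easy to reproduce outside scikit-learn; the review comment above suggests `platform.python_implementation()` as an equivalent, arguably cleaner test. A minimal sketch of both checks, which should agree on any interpreter:

```python
import platform
import sys

# The check used in the PR's conftest.py: the '__pypy__' module
# only exists on the PyPy interpreter.
is_pypy = '__pypy__' in sys.modules

# The alternative suggested in review; equivalent on CPython and PyPy.
is_pypy_alt = platform.python_implementation() == 'PyPy'
```

The `sys.modules` form has the advantage of working in `setup.py` before any third-party imports are available.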
6 changes: 6 additions & 0 deletions doc/conftest.py
@@ -1,8 +1,10 @@
import os
from os.path import exists
from os.path import join

import numpy as np

from sklearn.utils import IS_PYPY
from sklearn.utils.testing import SkipTest
from sklearn.utils.testing import check_skip_network
from sklearn.datasets import get_data_home
@@ -55,6 +57,8 @@ def setup_twenty_newsgroups():


def setup_working_with_text_data():
if IS_PYPY and os.environ.get('CI', None):
raise SkipTest('Skipping too slow test with PyPy on CI')
check_skip_network()
cache_path = _pkl_filepath(get_data_home(), CACHE_NAME)
if not exists(cache_path):
@@ -91,6 +95,8 @@ def pytest_runtest_setup(item):
setup_working_with_text_data()
elif fname.endswith('modules/compose.rst') or is_index:
setup_compose()
elif IS_PYPY and fname.endswith('modules/feature_extraction.rst'):
raise SkipTest('FeatureHasher is not compatible with PyPy')
elif fname.endswith('modules/impute.rst'):
setup_impute()

6 changes: 6 additions & 0 deletions doc/developers/advanced_installation.rst
@@ -38,6 +38,12 @@ Scikit-learn requires:
- NumPy (>= 1.8.2),
- SciPy (>= 0.13.3).

.. note::

For installing on PyPy, PyPy3-v5.10+, Numpy 1.14.0+, and scipy 1.1.0+
are required. For PyPy, only installation instructions with pip apply.


Building Scikit-learn also requires

- Cython >=0.23
2 changes: 1 addition & 1 deletion doc/developers/contributing.rst
@@ -352,7 +352,7 @@ and Cython optimizations.

* Travis is used for testing on Linux platforms
* Appveyor is used for testing on Windows platforms
* CircleCI is used to build the docs for viewing
* CircleCI is used to build the docs for viewing and for testing with PyPy on Linux

Please note that if one of the following markers appear in the latest commit
message, the following actions are taken.
6 changes: 6 additions & 0 deletions doc/install.rst
@@ -46,6 +46,12 @@ it as ``scikit-learn[alldeps]``. The most common use case for this is in a
application or a Docker image. This option is not intended for manual
installation from the command line.

.. note::

For installing on PyPy, PyPy3-v5.10+, Numpy 1.14.0+, and scipy 1.1.0+
are required.


For installation instructions for more distributions see
:ref:`other distributions <install_by_distribution>`.
For compiling the development version from source, or building the package
6 changes: 6 additions & 0 deletions doc/whats_new/v0.20.rst
@@ -408,6 +408,12 @@ Miscellaneous
:issue:`9101` by :user:`alex-33 <alex-33>`
and :user:`Maskani Filali Mohamed <maskani-moh>`.

- Add almost complete PyPy 3 support. Known unsupported functionalities are
:func:`datasets.load_svmlight_file`, :class:`feature_extraction.FeatureHasher` and
:class:`feature_extraction.text.HashingVectorizer`. For running on PyPy, PyPy3-v5.10+,
Numpy 1.14.0+, and scipy 1.1.0+ are required.
:issue:`11010` by :user:`Ronan Lamy <rlamy>` and `Roman Yurchak`_.

Bug fixes
.........

12 changes: 10 additions & 2 deletions setup.py
@@ -41,8 +41,12 @@

VERSION = sklearn.__version__

SCIPY_MIN_VERSION = '0.13.3'
NUMPY_MIN_VERSION = '1.8.2'
if '__pypy__' in sys.modules:
SCIPY_MIN_VERSION = '1.1.0'
NUMPY_MIN_VERSION = '1.4.0'
@pv (Contributor), Jul 16, 2018:
1.14.0?

else:
SCIPY_MIN_VERSION = '0.13.3'
NUMPY_MIN_VERSION = '1.8.2'


# Optional setuptools features
@@ -185,6 +189,10 @@ def setup_package():
'Programming Language :: Python :: 3.4',
'Programming Language :: Python :: 3.5',
'Programming Language :: Python :: 3.6',
('Programming Language :: Python :: '
'Implementation :: CPython'),
('Programming Language :: Python :: '
'Implementation :: PyPy')
],
cmdclass=cmdclass,
install_requires=[
8 changes: 5 additions & 3 deletions sklearn/datasets/setup.py
@@ -1,6 +1,7 @@

import numpy
import os
import sys


def configuration(parent_package='', top_path=None):
@@ -10,9 +11,10 @@ def configuration(parent_package='', top_path=None):
config.add_data_dir('descr')
config.add_data_dir('images')
config.add_data_dir(os.path.join('tests', 'data'))
config.add_extension('_svmlight_format',
sources=['_svmlight_format.pyx'],
include_dirs=[numpy.get_include()])
if '__pypy__' not in sys.modules:
config.add_extension('_svmlight_format',
sources=['_svmlight_format.pyx'],
include_dirs=[numpy.get_include()])
config.add_subpackage('tests')
return config

10 changes: 8 additions & 2 deletions sklearn/datasets/svmlight_format.py
@@ -22,12 +22,18 @@
import numpy as np
import scipy.sparse as sp

from ._svmlight_format import _load_svmlight_file
from .. import __version__
from ..externals import six
from ..externals.six import u, b
from ..externals.six.moves import range, zip
from ..utils import check_array
from ..utils import check_array, IS_PYPY

if not IS_PYPY:
from ._svmlight_format import _load_svmlight_file
else:
def _load_svmlight_file(*args, **kwargs):
raise NotImplementedError('load_svmlight_file is currently not '
'compatible with PyPy.')


def load_svmlight_file(f, n_features=None, dtype=np.float64,
9 changes: 7 additions & 2 deletions sklearn/datasets/tests/test_svmlight_format.py
@@ -18,6 +18,7 @@
from sklearn.utils.testing import assert_raises
from sklearn.utils.testing import assert_raises_regex
from sklearn.utils.testing import assert_in
from sklearn.utils.testing import fails_if_pypy
from sklearn.utils.fixes import sp_version

import sklearn
@@ -30,6 +31,8 @@
invalidfile = os.path.join(currdir, "data", "svmlight_invalid.txt")
invalidfile2 = os.path.join(currdir, "data", "svmlight_invalid_order.txt")

pytestmark = fails_if_pypy


def test_load_svmlight_file():
X, y = load_svmlight_file(datafile)
@@ -119,7 +122,8 @@ def test_load_compressed():
with NamedTemporaryFile(prefix="sklearn-test", suffix=".gz") as tmp:
tmp.close() # necessary under windows
with open(datafile, "rb") as f:
shutil.copyfileobj(f, gzip.open(tmp.name, "wb"))
with gzip.open(tmp.name, "wb") as fh_out:
shutil.copyfileobj(f, fh_out)
Member:
For the record, not using a context manager here results in an empty file when load_svmlight_file is called. Probably due to GC differences,

For files that are opened for writing, data can be left sitting in their output buffers for a while, making the on-disk file appear empty or truncated.

Xgz, ygz = load_svmlight_file(tmp.name)
# because we "close" it manually and write to it,
# we need to remove it manually.
@@ -130,7 +134,8 @@
with NamedTemporaryFile(prefix="sklearn-test", suffix=".bz2") as tmp:
tmp.close() # necessary under windows
with open(datafile, "rb") as f:
shutil.copyfileobj(f, BZ2File(tmp.name, "wb"))
with BZ2File(tmp.name, "wb") as fh_out:
shutil.copyfileobj(f, fh_out)
Xbz, ybz = load_svmlight_file(tmp.name)
# because we "close" it manually and write to it,
# we need to remove it manually.
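The context-manager fix in this hunk matters because data written to a compressed stream can sit in its output buffers until the handle is closed; on PyPy, where finalizers run late, the file the reader sees may be empty or truncated. A minimal round-trip sketch (file names hypothetical):

```python
import gzip
import os
import shutil
import tempfile

payload = b"1 1:2.5 2:-1\n"

tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "data.txt")
dst = os.path.join(tmpdir, "data.txt.gz")
with open(src, "wb") as f:
    f.write(payload)

# Closing the gzip handle (here via the context manager) flushes its
# buffers and writes the gzip trailer. Relying on garbage collection to
# close it, as the old code did, is only safe under CPython's prompt
# refcounting, not under PyPy's deferred GC.
with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
    shutil.copyfileobj(f_in, f_out)

with gzip.open(dst, "rb") as f:
    roundtrip = f.read()

shutil.rmtree(tmpdir)
```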
8 changes: 5 additions & 3 deletions sklearn/ensemble/gradient_boosting.py
@@ -931,12 +931,14 @@ def _resize_state(self):
raise ValueError('resize with smaller n_estimators %d < %d' %
(total_n_estimators, self.estimators_[0]))

self.estimators_.resize((total_n_estimators, self.loss_.K))
self.train_score_.resize(total_n_estimators)
self.estimators_ = np.resize(self.estimators_,
(total_n_estimators, self.loss_.K))
@rth (Member), Jun 3, 2018:

I'm not sure the difference here between numpy.ndarray.resize and numpy.resize matters, particularly with respect to performance and

[numpy.resize docstring]
If the new array is larger than the original array, then the new array is filled with repeated copies of a. Note that this behavior is different from a.resize(new_shape) which fills with zeros instead of repeated copies of a.

Maybe @amueller or @glemaitre would have an opinion?

ndarray.reshape ndarray.resize is not supported on PyPy..

Comment:

Your comment mixes up reshape and resize... AFAIK reshape is totally supported on PyPy. resize is also supported, for the common case where the new shape has the same total size as the old shape. But if you try change the size of an array, that requires reallocated the underlying data, which is a pretty delicate operation – it swaps out the actual memory buffer underlying the array, so if there are any views on the array, they'll end up pointing into free'd memory and causing a segfault or worse.

Probably numpy should never have supported such a dangerous operation in the first place, but these things happen...

On CPython, numpy can use the refcount to check if the operation is even possibly-maybe safe, and only allows it if there are no other references to the array. On PyPy, there are no refcounts, so numpy is conservative and assumes views might exist.

If you're really sure that this is what you want to do, and are prepared to accept the risks, then you can pass refcheck=False. If you know that no-one is holding a view on self.estimators_, and that this will always be true in the future, then this should be safe... and if CPython would have allowed the operation, then passing refcheck=False on PyPy shouldn't be any less safe.

@rth (Member), Jun 6, 2018:

Thanks for the insightful comment!

Your comment mixes up reshape and resize

Yes, definitely I meant resize in my comment (edited it).

self.train_score_ = np.resize(self.train_score_, total_n_estimators)
if (self.subsample < 1 or hasattr(self, 'oob_improvement_')):
# if do oob resize arrays or create new if not available
if hasattr(self, 'oob_improvement_'):
self.oob_improvement_.resize(total_n_estimators)
self.oob_improvement_ = np.resize(self.oob_improvement_,
total_n_estimators)
else:
self.oob_improvement_ = np.zeros((total_n_estimators,),
dtype=np.float64)
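The behavioural difference discussed in the review thread can be seen in a few lines: `np.resize` returns a new array padded with repeated copies of the input, while `ndarray.resize` grows in place, pads with zeros, and refuses to run when other references might exist unless `refcheck=False` is passed — the safety check PyPy cannot perform, since it has no refcounts.

```python
import numpy as np

a = np.arange(4)
grown = np.resize(a, 6)      # new array, padded with repeated copies of a

b = np.arange(4)
b.resize(6, refcheck=False)  # in-place, padded with zeros; refcheck=False
# skips the reference-count check that always fails on PyPy, at the cost
# of segfault risk if a view on b exists — which is why the PR switched
# to the copying np.resize instead.
```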
10 changes: 8 additions & 2 deletions sklearn/feature_extraction/hashing.py
@@ -7,9 +7,15 @@
import numpy as np
import scipy.sparse as sp

from . import _hashing
from ..utils import IS_PYPY
from ..base import BaseEstimator, TransformerMixin

if not IS_PYPY:
from ._hashing import transform as _hashing_transform
else:
def _hashing_transform(*args, **kwargs):
raise NotImplementedError('FeatureHasher is not compatible with PyPy.')
Member:
I think it would help if this error message could also include the URL of a github issue that explains why this cannot be currently be implemented with the current versions of PyPy & Cython.



def _iteritems(d):
"""Like d.iteritems, but accepts any collections.Mapping."""
@@ -155,7 +161,7 @@ def transform(self, raw_X):
elif self.input_type == "string":
raw_X = (((f, 1) for f in x) for x in raw_X)
indices, indptr, values = \
_hashing.transform(raw_X, self.n_features, self.dtype,
_hashing_transform(raw_X, self.n_features, self.dtype,
self.alternate_sign)
n_samples = indptr.shape[0] - 1

10 changes: 6 additions & 4 deletions sklearn/feature_extraction/setup.py
@@ -1,4 +1,5 @@
import os
import sys


def configuration(parent_package='', top_path=None):
@@ -10,10 +11,11 @@ def configuration(parent_package='', top_path=None):
if os.name == 'posix':
libraries.append('m')

config.add_extension('_hashing',
sources=['_hashing.pyx'],
include_dirs=[numpy.get_include()],
libraries=libraries)
if "__pypy__" not in sys.modules:
Member:
should this be IS_PYPY

Member:

oh wait this is setup, never mind...

Member:

Still, the following is cleaner:

platform.python_implementation() == 'PyPy'

config.add_extension('_hashing',
sources=['_hashing.pyx'],
include_dirs=[numpy.get_include()],
libraries=libraries)
config.add_subpackage("tests")

return config
4 changes: 3 additions & 1 deletion sklearn/feature_extraction/tests/test_feature_hasher.py
@@ -5,7 +5,9 @@

from sklearn.feature_extraction import FeatureHasher
from sklearn.utils.testing import (assert_raises, assert_true, assert_equal,
ignore_warnings)
ignore_warnings, fails_if_pypy)

pytestmark = fails_if_pypy


def test_feature_hasher_dicts():
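A module-level `pytestmark` assignment, as in this test file, applies its marker to every test collected from the module. `fails_if_pypy` is presumably an `xfail` marker conditioned on the interpreter; a self-contained approximation (the real helper lives in `sklearn.utils.testing`):

```python
import platform

import pytest

# Rough stand-in for sklearn.utils.testing.fails_if_pypy (assumption):
# mark tests as expected failures only when running under PyPy.
fails_if_pypy = pytest.mark.xfail(
    platform.python_implementation() == 'PyPy',
    reason='not compatible with PyPy')

# Assigning to the special name `pytestmark` applies the marker to
# every test function and class in this module.
pytestmark = fails_if_pypy
```

Using `xfail` rather than `skip` means the tests still run, so an accidental fix or regression under PyPy shows up as XPASS/XFAIL in the report.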
16 changes: 12 additions & 4 deletions sklearn/feature_extraction/tests/test_text.py
@@ -26,12 +26,13 @@
import numpy as np
from numpy.testing import assert_array_almost_equal
from numpy.testing import assert_array_equal
from sklearn.utils import IS_PYPY
from sklearn.utils.testing import (assert_equal, assert_false, assert_true,
assert_not_equal, assert_almost_equal,
assert_in, assert_less, assert_greater,
assert_warns_message, assert_raise_message,
clean_warning_registry, ignore_warnings,
SkipTest, assert_raises,
SkipTest, assert_raises, fails_if_pypy,
assert_allclose_dense_sparse)
from sklearn.utils.fixes import _Mapping as Mapping
from collections import defaultdict
@@ -502,6 +503,7 @@ def test_tfidf_vectorizer_setters():
assert_true(tv._tfidf.sublinear_tf)


@fails_if_pypy
@ignore_warnings(category=DeprecationWarning)
def test_hashing_vectorizer():
v = HashingVectorizer()
@@ -684,6 +686,7 @@ def test_count_binary_occurrences():
assert_equal(X_sparse.dtype, np.float32)


@fails_if_pypy
@ignore_warnings(category=DeprecationWarning)
def test_hashed_binary_occurrences():
# by default multiple occurrences are counted as longs
@@ -819,6 +822,7 @@ def test_vectorizer_pipeline_cross_validation():
assert_array_equal(cv_scores, [1., 1., 1.])


@fails_if_pypy
@ignore_warnings(category=DeprecationWarning)
def test_vectorizer_unicode():
# tests that the count vectorizer works with cyrillic.
@@ -886,9 +890,12 @@ def test_pickling_vectorizer():
copy = pickle.loads(s)
assert_equal(type(copy), orig.__class__)
assert_equal(copy.get_params(), orig.get_params())
assert_array_equal(
copy.fit_transform(JUNK_FOOD_DOCS).toarray(),
orig.fit_transform(JUNK_FOOD_DOCS).toarray())
if IS_PYPY and isinstance(orig, HashingVectorizer):
continue
else:
assert_array_equal(
copy.fit_transform(JUNK_FOOD_DOCS).toarray(),
orig.fit_transform(JUNK_FOOD_DOCS).toarray())


def test_countvectorizer_vocab_sets_when_pickling():
@@ -990,6 +997,7 @@ def test_non_unique_vocab():
assert_raises(ValueError, vect.fit, [])


@fails_if_pypy
def test_hashingvectorizer_nan_in_docs():
# np.nan can appear when using pandas to load text fields from a csv file
# with missing values.