TST Test failures with OpenML for PyPy #18906

Closed
alfaro96 opened this issue Nov 25, 2020 · 2 comments

@alfaro96
Member

alfaro96 commented Nov 25, 2020

The following tests are failing for PyPy:

FAILED datasets/tests/test_openml.py::test_fetch_openml_iris_pandas - TypeErr...
FAILED datasets/tests/test_openml.py::test_fetch_openml_iris_pandas_equal_to_no_frame
FAILED datasets/tests/test_openml.py::test_fetch_openml_iris_multitarget_pandas
FAILED datasets/tests/test_openml.py::test_fetch_openml_anneal_pandas - TypeE...
FAILED datasets/tests/test_openml.py::test_fetch_openml_cpu_pandas - TypeErro...
FAILED datasets/tests/test_openml.py::test_fetch_openml_as_frame_auto - TypeE...
FAILED datasets/tests/test_openml.py::test_convert_arff_data_dataframe_warning_low_memory_pandas
FAILED datasets/tests/test_openml.py::test_fetch_openml_adultcensus_pandas_return_X_y
FAILED datasets/tests/test_openml.py::test_fetch_openml_adultcensus_pandas - ...
FAILED datasets/tests/test_openml.py::test_fetch_openml_miceprotein_pandas - ...
FAILED datasets/tests/test_openml.py::test_fetch_openml_emotions_pandas - Typ...
FAILED datasets/tests/test_openml.py::test_fetch_openml_titanic_pandas - Type...
FAILED datasets/tests/test_openml.py::test_fetch_openml_verify_checksum[True]

with the following error traceback:

2020-11-24T16:19:24.2263007Z self = RangeIndex(start=0, stop=1, step=1)
2020-11-24T16:19:24.2263373Z 
2020-11-24T16:19:24.2263663Z     @cache_readonly
2020-11-24T16:19:24.2264148Z     def nbytes(self) -> int:
2020-11-24T16:19:24.2264461Z         """
2020-11-24T16:19:24.2264877Z         Return the number of bytes in the underlying data.
2020-11-24T16:19:24.2265280Z         """
2020-11-24T16:19:24.2265578Z         rng = self._range
2020-11-24T16:19:24.2265961Z >       return getsizeof(rng) + sum(
2020-11-24T16:19:24.2266404Z             getsizeof(getattr(rng, attr_name))
2020-11-24T16:19:24.2266869Z             for attr_name in ["start", "stop", "step"]
2020-11-24T16:19:24.2267216Z         )
2020-11-24T16:19:24.2267567Z E       TypeError: getsizeof(...)
2020-11-24T16:19:24.2268134Z E           getsizeof(object, default) -> int
2020-11-24T16:19:24.2268491Z E       
2020-11-24T16:19:24.2268855Z E           Return the size of object in bytes.
2020-11-24T16:19:24.2269205Z E       
2020-11-24T16:19:24.2269695Z E       sys.getsizeof(object, default) will always return default on PyPy, and
2020-11-24T16:19:24.2270318Z E       raise a TypeError if default is not provided.
2020-11-24T16:19:24.2270721Z E       
2020-11-24T16:19:24.2271186Z E       First note that the CPython documentation says that this function may
2020-11-24T16:19:24.2271859Z E       raise a TypeError, so if you are seeing it, it means that the program
2020-11-24T16:19:24.2272447Z E       you are using is not correctly handling this case.
2020-11-24T16:19:24.2272840Z E       
2020-11-24T16:19:24.2273307Z E       On PyPy, though, it always raises TypeError.  Before looking for
2020-11-24T16:19:24.2274004Z E       alternatives, please take a moment to read the following explanation as
2020-11-24T16:19:24.2274687Z E       to why it is the case.  What you are looking for may not be possible.
2020-11-24T16:19:24.2275112Z E       
2020-11-24T16:19:24.2275573Z E       A memory profiler using this function is most likely to give results
2020-11-24T16:19:24.2276218Z E       inconsistent with reality on PyPy.  It would be possible to have
2020-11-24T16:19:24.2276874Z E       sys.getsizeof() return a number (with enough work), but that may or
2020-11-24T16:19:24.2277656Z E       may not represent how much memory the object uses.  It doesn't even
2020-11-24T16:19:24.2278343Z E       make really sense to ask how much *one* object uses, in isolation
2020-11-24T16:19:24.2278956Z E       with the rest of the system.  For example, instances have maps,
2020-11-24T16:19:24.2279765Z E       which are often shared across many instances; in this case the maps
2020-11-24T16:19:24.2280459Z E       would probably be ignored by an implementation of sys.getsizeof(),
2020-11-24T16:19:24.2281121Z E       but their overhead is important in some cases if they are many
2020-11-24T16:19:24.2281768Z E       instances with unique maps.  Conversely, equal strings may share
2020-11-24T16:19:24.2282610Z E       their internal string data even if they are different objects---or
2020-11-24T16:19:24.2283287Z E       empty containers may share parts of their internals as long as they
2020-11-24T16:19:24.2284099Z E       are empty.  Even stranger, some lists create objects as you read
2020-11-24T16:19:24.2284800Z E       them; if you try to estimate the size in memory of range(10**6) as
2020-11-24T16:19:24.2285661Z E       the sum of all items' size, that operation will by itself create one
2020-11-24T16:19:24.2286395Z E       million integer objects that never existed in the first place.
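The failing call above is `getsizeof(rng)` with no `default` argument, which the PyPy message says always raises `TypeError`. As a minimal sketch (not the pandas fix, and with hypothetical helper names `safe_getsizeof` and `range_nbytes`), passing a `default` avoids the exception on both interpreters:

```python
import sys


def safe_getsizeof(obj, default=0):
    """Return sys.getsizeof(obj), falling back to `default`.

    Hypothetical helper for illustration. Per the PyPy message quoted in
    the traceback, PyPy returns `default` whenever it is provided, while
    CPython returns the real size for objects that support it.
    """
    return sys.getsizeof(obj, default)


def range_nbytes(rng):
    """Mirror the structure of the failing `nbytes` body, using the safe helper."""
    return safe_getsizeof(rng) + sum(
        safe_getsizeof(getattr(rng, attr_name))
        for attr_name in ["start", "stop", "step"]
    )


# Does not raise on either interpreter; the value itself is only
# meaningful on CPython.
size = range_nbytes(range(0, 1, 1))
```

On PyPy the result is just the sum of the defaults, so it should not be read as a memory measurement.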

I will open a PR marking them as XFAIL so as not to block #18879, but I think that we should investigate the underlying issue.
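For reference, a conditional XFAIL can be sketched as follows. This is only an illustration of the approach: the names `IS_PYPY`, `xfail_on_pypy`, and the placeholder test are assumptions and are not taken from the scikit-learn test suite or from the PR.

```python
import platform
import sys

import pytest

# Assumption: detect PyPy through the standard library rather than any
# project-specific flag.
IS_PYPY = platform.python_implementation() == "PyPy"

# Reusable marker: the test is expected to fail only when running on PyPy.
xfail_on_pypy = pytest.mark.xfail(
    IS_PYPY,
    reason="sys.getsizeof raises TypeError on PyPy",
)


@xfail_on_pypy
def test_getsizeof_placeholder():
    # Hypothetical stand-in for the failing pandas-backed tests: this call
    # succeeds on CPython and raises TypeError on PyPy.
    assert sys.getsizeof(range(1)) > 0
```

Because the condition is evaluated at collection time, the same tests keep running normally on CPython.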

@thomasjpfan
Member

As a data point, I was not able to reproduce this issue on OSX using pypy3.6 from conda-forge.

@lesteve
Member

lesteve commented Jun 3, 2024

Closing, since official PyPy support has been dropped (#29128).

lesteve closed this as not planned on Jun 3, 2024