MAINT: PyPy3 compatibility: sys.getsizeof() #8586

Merged
merged 1 commit into from
Feb 9, 2017

Conversation

rlamy
Contributor

@rlamy rlamy commented Feb 8, 2017

I'm currently working on ensuring compatibility of the upcoming pypy3.5 with numpy[*].

This use of sys.getsizeof() causes many spurious test failures on pypy3, because sys.getsizeof() is CPython-specific. Since there doesn't seem to be a pure-Python way of getting the size of the internal PEP393 Unicode representation, I'm recomputing it using documented invariants instead.
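
A minimal sketch of what "recomputing it using documented invariants" could look like, assuming the invariant in question is the PEP 393 rule that a string is stored with 1, 2, or 4 bytes per character depending on its largest code point. The helper names `pep393_char_width` and `pep393_buffer_length` are hypothetical and are not taken from the merged commit:

```python
def pep393_char_width(s):
    """Bytes per character that PEP 393 prescribes for the string s."""
    # PEP 393 picks the narrowest storage that fits the largest code point.
    # default=0 covers the empty string, which is treated as 1 byte wide.
    largest = max(map(ord, s), default=0)
    if largest < 0x100:       # ASCII and Latin-1 range
        return 1
    if largest < 0x10000:     # rest of the Basic Multilingual Plane
        return 2
    return 4                  # astral code points


def pep393_buffer_length(s):
    """Size in bytes of the character data (header and terminator excluded)."""
    return pep393_char_width(s) * len(s)
```

Because this relies only on `ord()` and `len()`, it should give the same answer on any interpreter that follows PEP 393 semantics, without asking the runtime for object sizes.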

[*]: If you're interested, you can grab a Linux nightly from here and check for yourself. Most things already work, barring the occasional puzzling segfault.

…ode representation size

This is for PyPy3 compatibility: sys.getsizeof() is CPython-specific and
there doesn't seem to be a pure-Python way of getting the size of the
internal PEP393 Unicode representation, so recompute it using documented
invariants.
@charris
Member

charris commented Feb 8, 2017

Ugh, that's ugly. How does pypy store unicode strings?

@charris charris closed this Feb 8, 2017
@charris
Member

charris commented Feb 8, 2017

It's not even clear to me that the original is correct ;) Maybe for testing purposes.

@charris charris reopened this Feb 8, 2017
@charris
Member

charris commented Feb 8, 2017

Sorry, fat-fingered the comment and close button.

@rlamy
Contributor Author

rlamy commented Feb 9, 2017

  • In cpyext, pypy3 emulates PEP393, so the C-visible objects are similar to CPython.
  • Indeed, if I understand the CPython source correctly, the correctness of the original relies on nothing changing between s and s + 'a' except for the length of the string, and on the UTF-8 representation not having been computed.
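
The second point can be illustrated with a rough sketch of the `sys.getsizeof()` trick being discussed. It is CPython-specific by construction, and the function name `width_via_getsizeof` is hypothetical. It assumes, as described above, that `s` and `s + 'a'` share the same internal layout and that no UTF-8 copy has been cached on either object:

```python
import sys


def width_via_getsizeof(s):
    """Estimate bytes per character from the size difference of s and s + 'a'.

    CPython-only assumption: appending one ASCII character grows the object
    by exactly one storage unit and changes nothing else about its layout.
    """
    return sys.getsizeof(s + "a") - sys.getsizeof(s)


def buffer_length_via_getsizeof(s):
    """Character-data size in bytes, derived from the estimate above."""
    return width_via_getsizeof(s) * len(s)
```

On an interpreter where `sys.getsizeof()` does not reflect the PEP 393 layout, the difference need not be 1, 2, or 4, which is consistent with the spurious test failures reported in this PR.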

@njsmith
Member

njsmith commented Feb 9, 2017 via email

@charris
Member

charris commented Feb 9, 2017

I suppose it would also be possible to try ascii, latin1, utf-16, and utf-32 encodings and see which one first had a compatible length, {8,16,32}*number_characters.
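
A rough sketch of that alternative, under a few assumptions that are not in the comment itself: the function name `width_by_encoding` is hypothetical, the little-endian codec variants are used so that no byte-order mark is counted in the length, and strings containing lone surrogates are not handled:

```python
def width_by_encoding(s):
    """Return the bytes per character of the first encoding that fits s."""
    # Candidates in increasing width; the "-le" codecs emit no BOM.
    candidates = [("ascii", 1), ("latin-1", 1), ("utf-16-le", 2), ("utf-32-le", 4)]
    for encoding, width in candidates:
        try:
            encoded = s.encode(encoding)
        except UnicodeEncodeError:
            # The string has characters this encoding cannot represent.
            continue
        # A fixed-width fit means exactly `width` bytes for every character.
        # UTF-16 fails this check when surrogate pairs are needed.
        if len(encoded) == width * len(s):
            return width
    raise ValueError("no candidate encoding matched")
```

The length check matters for UTF-16: a string with astral characters encodes successfully but uses 4 bytes for those characters, so it falls through to UTF-32.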

@charris charris merged commit 520e498 into numpy:master Feb 9, 2017
@charris charris changed the title PyPy3 compatibility: sys.getsizeof() MAINT: PyPy3 compatibility: sys.getsizeof() Feb 9, 2017
@charris
Member

charris commented Feb 9, 2017

Thanks @rlamy

@charris
Member

charris commented Feb 9, 2017

@njsmith It occurs to me that perhaps we should discourage this usage of unicode strings in Python 3. It was justified in Python 2, which only offered UCS-2 and UCS-4, but in Python 3 the corresponding functionality would (almost) be byte strings encoded in UTF-16 or UTF-32.
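
One possible reading of the "(almost)" is that UTF-16 is not a fixed-width encoding the way UCS-2 was. A small illustration, with the hypothetical helper `code_unit_counts` and the little-endian codecs chosen so that no byte-order mark is included:

```python
def code_unit_counts(text):
    """Return (code points, UTF-16 code units, UTF-32 code units) for text."""
    utf16 = text.encode("utf-16-le")  # 2 bytes per code unit
    utf32 = text.encode("utf-32-le")  # 4 bytes per code unit
    return len(text), len(utf16) // 2, len(utf32) // 4


# One BMP character followed by one astral character: UTF-16 needs a
# surrogate pair for the second one, UTF-32 does not.
print(code_unit_counts("a\U0001F600"))  # (2, 3, 2)
```

UTF-32 keeps one code unit per code point, while UTF-16 only does so for strings confined to the Basic Multilingual Plane.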

@rlamy rlamy deleted the pypy3-getsizeof branch October 22, 2021 19:25