Too much data before UTF-8 encoding #204

Closed
igor-wl opened this issue Mar 28, 2023 · 2 comments

Comments

@igor-wl

igor-wl commented Mar 28, 2023

I'm trying to push an 8 KB message to Pusher. It works with ASCII characters, but it breaks if the dictionary contains even a single "🦄" character, because `sys.getsizeof` then reports the size of the string object as 34 KB.

Since the final dictionary is encoded to UTF-8 before it is sent to Pusher, I think the length check should be performed on the UTF-8 encoded string rather than on the Python object.

The issue is with this code:

```python
if sys.getsizeof(data) > 30720:
    raise ValueError("Too much data")
```
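A minimal sketch of the check proposed above, measuring the UTF-8 encoded payload instead of the in-memory `str` object. The helper name `check_payload_size` and the constant `MAX_PAYLOAD_BYTES` are hypothetical and not part of the Pusher library; the 30720-byte limit is copied from the snippet above.

```python
import json

# Hypothetical constant: the limit copied from the check quoted in the issue.
MAX_PAYLOAD_BYTES = 30720


def check_payload_size(data: str, limit: int = MAX_PAYLOAD_BYTES) -> None:
    """Raise ValueError if the UTF-8 encoding of `data` exceeds `limit` bytes.

    Hypothetical helper: it measures the bytes that would actually be sent,
    not the size of the Python str object in memory.
    """
    size = len(data.encode("utf-8"))
    if size > limit:
        raise ValueError(f"Too much data: {size} bytes (limit {limit})")


if __name__ == "__main__":
    # About 8 KB of ASCII plus one emoji: well under the limit once encoded.
    payload = json.dumps({"message": "a" * 8000 + "🦄"}, ensure_ascii=False)
    check_payload_size(payload)  # does not raise
```

With this approach, a single non-ASCII character adds only its UTF-8 length (4 bytes for "🦄") to the measured size, regardless of how CPython stores the string internally.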

Here is a small demo of how many more bytes the Python object consumes compared to the UTF-8 encoded string:
[Screenshot of the demo]
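The screenshot was not preserved, so the snippet below is a rough reconstruction of this kind of comparison, not the original demo. It relies on CPython's flexible string representation (PEP 393): a `str` is stored with 1, 2, or 4 bytes per character depending on its widest character, so one non-BMP character such as "🦄" makes every character in the string occupy 4 bytes in memory, while the UTF-8 encoding grows by only 4 bytes. Exact `sys.getsizeof` values depend on the CPython version and platform.

```python
import sys

ascii_only = "a" * 8192          # 8 KiB of ASCII text
with_emoji = ascii_only + "🦄"   # the same text plus one non-BMP character

for label, s in (("ascii only", ascii_only), ("with emoji", with_emoji)):
    in_memory = sys.getsizeof(s)        # size of the str object (CPython-specific)
    on_wire = len(s.encode("utf-8"))    # bytes produced by UTF-8 encoding
    print(f"{label}: getsizeof={in_memory}, utf-8={on_wire}")
```

On CPython, the emoji version's `sys.getsizeof` exceeds 32 KB and therefore fails a 30720 check, even though it encodes to only 8196 UTF-8 bytes.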

@benjamin-tang-pusher
Contributor

Hi, I was able to send that character from my Python backend.

[Screenshot 2023-07-10 at 18 20 23]

Are you using the latest version of our library? We made a change so that `ensure_ascii=False` is passed here:

```python
return json.dumps(data, cls=json_encoder, ensure_ascii=False)
```

As a result, UTF-8 characters passed to our library remain as-is. Previously we accidentally escaped Unicode characters, which made the size balloon, but this shouldn't be the case anymore.
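To illustrate what the `ensure_ascii` flag changes, here is a small standalone comparison using only the standard `json` module (not the library's own code path; the sample dictionary is made up). With `ensure_ascii=True` (the `json.dumps` default), "🦄" is escaped as the surrogate pair `\ud83e\udd84`, which is 12 ASCII characters; with `ensure_ascii=False` it stays a single character that encodes to 4 UTF-8 bytes.

```python
import json
import sys

data = {"message": "🦄"}  # hypothetical sample payload

escaped = json.dumps(data)                        # default: ensure_ascii=True
unescaped = json.dumps(data, ensure_ascii=False)  # keep non-ASCII characters

print(escaped)    # {"message": "\ud83e\udd84"}
print(unescaped)  # {"message": "🦄"}

for label, s in (("escaped", escaped), ("unescaped", unescaped)):
    # Compare the bytes sent on the wire with the in-memory object size.
    print(label, "utf-8 bytes:", len(s.encode("utf-8")), "getsizeof:", sys.getsizeof(s))
```

The two metrics move in opposite directions here: escaping makes the UTF-8 payload larger (27 vs 19 bytes), but on CPython the unescaped string has the larger `sys.getsizeof`, because it contains a non-BMP character.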

@benjamin-tang-pusher
Contributor

Closing due to no response.

@benjamin-tang-pusher closed this as not planned on Sep 1, 2023.