Result of dumps is corrupt in 3.8.4 (windows only) #331

Closed
ryuuji opened this issue Jan 6, 2023 · 5 comments

ryuuji commented Jan 6, 2023

After updating to 3.8.4, I started seeing errors.
The output of dumps is not valid JSON: passing it back to loads raises "JSONDecodeError".
After reverting to 3.8.3, the problem does not occur.

I have tried it on Windows and Mac, and it only occurs on Windows. It is possible that this problem is specific to my environment, but I am reporting it just in case.

Which inputs produce corrupt output appears to be random, but some data is corrupted with high probability. I collected such data in a file (errors.json) and wrote a simple program to test it.

import orjson

idx = 0    # number of lines processed
count = 0  # number of lines whose round trip failed
with open('errors.json', 'rt', encoding='utf-8') as f:
    for line in f:
        try:
            # Round trip: parse the line, serialize it, then parse the result.
            orjson.loads(orjson.dumps(orjson.loads(line)))
        except orjson.JSONDecodeError:
            print(f'line.#{idx} JSONDecodeError')
            count += 1
        idx += 1
print(f'{count}/{idx} error happen.')

PS R:\orjson_tests> poetry run python .\main.py
line.#4 JSONDecodeError
line.#8 JSONDecodeError
line.#16 JSONDecodeError
line.#17 JSONDecodeError
line.#19 JSONDecodeError
line.#21 JSONDecodeError
line.#23 JSONDecodeError
line.#24 JSONDecodeError
line.#25 JSONDecodeError
line.#28 JSONDecodeError
line.#29 JSONDecodeError
line.#30 JSONDecodeError
line.#31 JSONDecodeError
line.#32 JSONDecodeError
line.#33 JSONDecodeError
line.#34 JSONDecodeError
line.#36 JSONDecodeError
line.#37 JSONDecodeError
line.#38 JSONDecodeError
line.#39 JSONDecodeError
line.#41 JSONDecodeError
line.#43 JSONDecodeError
line.#46 JSONDecodeError
line.#47 JSONDecodeError
line.#49 JSONDecodeError
line.#54 JSONDecodeError
line.#55 JSONDecodeError
line.#59 JSONDecodeError
line.#61 JSONDecodeError
29/69 error happen.

testcase.zip

Python version is 3.11.1 on Windows 11.

ryuuji changed the title from "Breaking dumps results in 3.8.4 (windows only)" to "Result of dumps is corrupt in 3.8.4 (windows only)" on Jan 6, 2023
Owner

ijl commented Jan 6, 2023

Is there a line in the test cases that consistently fails on that machine? Was this a prebuilt wheel? Those are built on Windows VMs and exercise the library a good amount.

Author

ryuuji commented Jan 7, 2023

Yes, I use the prebuilt wheel installed via pip (orjson-3.8.4-cp311-none-win_amd64.whl).
I have not found any data that fails consistently. However, I have uploaded data that fails at a very high rate (about 50%). Every time I run it, a different row fails.

Looking at the output JSON that results in an error in a binary editor, it appears that a NULL is inserted before the key.

[Screenshot: 2023-01-07 11 04 22]
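For anyone who wants to check their own output without a binary editor, here is a minimal sketch that scans serialized bytes for raw NUL bytes. Valid JSON never contains an unescaped 0x00, so any hit indicates corruption. The helper names `find_nul_offsets` and `show_context` and the sample bytes are made up for illustration; they are not taken from the attached data.

```python
def find_nul_offsets(payload: bytes) -> list[int]:
    """Return the offset of every raw 0x00 byte in payload."""
    return [i for i, byte in enumerate(payload) if byte == 0]


def show_context(payload: bytes, offset: int, width: int = 8) -> str:
    """Return a hex view of the bytes surrounding offset."""
    start = max(0, offset - width)
    return payload[start:offset + width].hex(" ")


# Hypothetical corrupted output: a NUL byte sits just before the key "b".
sample = b'{"a":1,\x00"b":2}'
for offset in find_nul_offsets(sample):
    print(offset, show_context(sample, offset))
```

In practice `payload` would be the bytes returned by `orjson.dumps(...)` for a row that fails to decode.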

Author

ryuuji commented Jan 7, 2023

I ran the same data in a loop and captured samples of the output both when it was handled correctly and when it was corrupted.

data.zip

Another Windows environment with a clean install gives the same result.
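The looping approach described above could be sketched roughly as follows. This is not the script used to produce data.zip; `collect_samples` is a hypothetical helper, and it takes the serializer and parser as arguments so the same loop can be run with `orjson.dumps` / `orjson.loads` or, as in the demo here, with the standard `json` module. It relies on `orjson.JSONDecodeError` being a subclass of `ValueError`.

```python
import json
from typing import Any, Callable, Optional, Tuple


def collect_samples(
    doc: Any,
    dumps: Callable[[Any], bytes],
    loads: Callable[[bytes], Any],
    attempts: int = 1000,
) -> Tuple[Optional[bytes], Optional[bytes]]:
    """Serialize doc repeatedly; return one decodable and one undecodable output."""
    good: Optional[bytes] = None
    bad: Optional[bytes] = None
    for _ in range(attempts):
        out = dumps(doc)
        try:
            loads(out)
        except ValueError:  # json and orjson decode errors both derive from ValueError
            if bad is None:
                bad = out
        else:
            if good is None:
                good = out
        if good is not None and bad is not None:
            break  # one sample of each kind is enough
    return good, bad


if __name__ == "__main__":
    # Demo with the standard library; swap in orjson.dumps / orjson.loads to test orjson.
    doc = {"key": "value"}
    good, bad = collect_samples(doc, lambda d: json.dumps(d).encode(), json.loads, attempts=10)
    print("good:", good)
    print("bad:", bad)
```

Writing `good` and `bad` to separate files makes it easy to diff the two outputs byte by byte.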

Owner

ijl commented Jan 10, 2023

Ok, I can reproduce it on CI. Thank you for the research. I've released 3.8.5 with a fix for Windows at the cost of performance. If someone wants to investigate this further, please do. I'm not up on Windows.
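For projects that cannot upgrade right away, a startup check along these lines might help flag the affected combination. It is only a sketch: `is_affected` is a hypothetical helper, and it treats 3.8.4 on Windows as the sole affected case because that is the only combination reported in this thread.

```python
import sys
from importlib import metadata

# Assumption: only this release is affected, as reported in this thread.
AFFECTED_VERSION = "3.8.4"


def is_affected(version: str, platform: str = sys.platform) -> bool:
    """Return True for the version/platform pair reported as producing corrupt output."""
    return platform == "win32" and version == AFFECTED_VERSION


try:
    installed = metadata.version("orjson")
except metadata.PackageNotFoundError:
    installed = None  # orjson is not installed in this environment

if installed is not None and is_affected(installed):
    print(f"orjson {installed} on Windows may produce corrupt output; upgrade to 3.8.5 or later.")
```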

ijl closed this as completed on Jan 10, 2023
Author

ryuuji commented Jan 11, 2023

Thanks. I have confirmed that 3.8.5 solves the problem in some Windows environments.
