
Is the yajl_c backend supported on PyPy? #82

Closed
jpmckinney opened this issue Jan 12, 2023 · 7 comments
Labels
question Further information is requested


@jpmckinney

I think yajl2_cffi worked for me the last time I tested, but yajl2_c was causing C errors.

The docs mention "python: pure Python parser, good to use with PyPy"

Do you happen to know the difference in performance between a YAJL and pure-Python backend when using PyPy?

Also, should the backend selection code use different options/ordering on PyPy?

for backend in ('yajl2_c', 'yajl2_cffi', 'yajl2', 'yajl', 'python'):

@jpmckinney jpmckinney added the question Further information is requested label Jan 12, 2023
@rtobar

rtobar commented Jan 13, 2023

@jpmckinney all good questions! So:

  • I know it compiles with PyPy (we even publish binary wheels built with cibuildwheel). As for running it: I'm not a PyPy user myself, so I've only tried it sporadically to check that the tests pass, and they have. I don't do it very often, though, so there could be issues I don't know of (but the CI tests that run when building the wheels always pass with PyPy).
  • I don't know about performance; I've never measured it properly (again, not a PyPy user). Since PyPy's implementation of the Python C API is slower than CPython's, I'd assume yajl2_c runs slower on PyPy than on CPython, but I don't know how it fares against the other backends. If you are willing to gather some numbers I'd be interested to see them, and could potentially change the order in which backends are tried when picking the default on PyPy. You can use the benchmark.py utility in the top-level directory to run one of the built-in synthetic scenarios, or to run against your own JSON files.
  • I think the "python: pure Python parser, good to use with PyPy" phrase is mostly a historical relic, as all backends should really be usable with PyPy. Depending on the benchmark results, it could still turn out that this backend is the fastest on PyPy, though.
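For anyone gathering such numbers, the sketch below shows the basic measurement behind the columns benchmark.py prints (mbytes, time, mb_per_sec). It is a hypothetical stand-alone harness, not the actual benchmark.py code: `measure` and `Result` are names introduced here, `parse` is assumed to take a binary file object and return an iterable of events, and treating a megabyte as 1024 × 1024 bytes is an assumption.

```python
import io
import time
from collections import deque
from typing import Callable, Iterable, NamedTuple


class Result(NamedTuple):
    mbytes: float
    seconds: float
    mb_per_sec: float


def measure(parse: Callable[[io.BytesIO], Iterable], data: bytes, repeat: int = 3) -> Result:
    """Time `parse` over an in-memory copy of `data`, keeping the fastest run.

    `parse` is assumed to return an iterable of events (ijson's basic_parse has
    that shape); deque(..., maxlen=0) drains it without keeping the events.
    """
    best = float("inf")
    for _ in range(repeat):
        source = io.BytesIO(data)  # fresh file object so every run starts at offset 0
        start = time.perf_counter()
        deque(parse(source), maxlen=0)
        best = min(best, time.perf_counter() - start)
    # Assumption: "megabytes" means MiB (1024 * 1024 bytes).
    mbytes = len(data) / (1024 * 1024)
    return Result(mbytes, best, mbytes / best if best > 0 else float("inf"))
```

Assuming ijson is installed and exposes `ijson.get_backend`, something like `measure(ijson.get_backend('yajl2_cffi').basic_parse, data)` would time a single backend, and running the same script under PyPy and CPython makes the results comparable. Keeping the fastest of several runs discards the slower first iterations, which matters on PyPy because its JIT needs time to warm up.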

@jpmckinney

jpmckinney commented Jan 13, 2023

Thanks!

That jogged my memory a bit – I do something unusual in my code, where I build a dict in which some values are generators. I then use this code when I need to serialize the dict to JSON.

https://github.com/open-contracting/ocdskit/blob/9984b80b524c0a57222f10f76a209bc906c09799/ocdskit/util.py#L40-L68

Somewhere in there, the combination of generators and ijson caused a C error.
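The pattern described above (a dict in which some values are generators, serialized to JSON) can be handled with the standard library's `json` module alone. The following is a minimal sketch under that assumption, not the ocdskit code linked above; `SerializableGenerator`, `GeneratorEncoder` and `dump_with_generators` are hypothetical names. It relies on `json.dump` going through `JSONEncoder.iterencode`, whose pure-Python encoder checks a list's truthiness and then iterates over it.

```python
import io
import json
from types import GeneratorType


class SerializableGenerator(list):
    """A list subclass that streams a generator's items when JSON-encoded.

    json's pure-Python encoder does `if not lst: ...` and then `for value in lst`,
    so overriding __len__ and __iter__ is enough for it to emit the items.
    """

    def __init__(self, generator):
        super().__init__()
        self._iterator = iter(generator)
        # Pull the first item eagerly so an empty generator still encodes as [].
        self._head = []
        for item in self._iterator:
            self._head.append(item)
            break

    def __iter__(self):
        yield from self._head
        yield from self._iterator

    def __len__(self):
        return len(self._head)


class GeneratorEncoder(json.JSONEncoder):
    def default(self, obj):
        # default() is only called for objects json cannot encode natively.
        if isinstance(obj, GeneratorType):
            return SerializableGenerator(obj)
        return super().default(obj)


def dump_with_generators(data, fp):
    # json.dump drives JSONEncoder.iterencode(), i.e. the pure-Python encoder,
    # and writes chunks to fp as the generators are consumed.
    json.dump(data, fp, cls=GeneratorEncoder)
```

Note that `isinstance(obj, GeneratorType)` only matches Python generator objects; other lazy iterators (for example `map` objects, or iterators implemented in C) would need an additional check such as `collections.abc.Iterator`.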

Anyway, I'm trying to reproduce it now, but I can't get pip to find the YAJL headers when using PyPy (I can use the yajl2_c backend in CPython, but I think that's because it's bundled in the wheel). Running python -c 'import ijson; print(ijson.backend)' just prints 'python' in my PyPy environment.
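When a compiled backend silently falls back to python, as described here, it helps to see which backend modules actually import and which copy of ijson is being picked up. A minimal diagnostic sketch follows; it assumes ijson keeps its backends under `ijson.backends.<name>` and that a missing C extension, cffi or YAJL library surfaces as `ImportError` or `OSError`. The helper names are hypothetical.

```python
import importlib
import importlib.util

# Backend names from ijson's selection loop quoted at the top of this thread.
BACKENDS = ("yajl2_c", "yajl2_cffi", "yajl2", "yajl", "python")


def ijson_location():
    """Return the file ijson would be imported from, or None if it is not installed."""
    spec = importlib.util.find_spec("ijson")
    return None if spec is None else spec.origin


def importable_backends(names=BACKENDS):
    """Return the names whose ijson.backends.<name> module imports successfully."""
    if ijson_location() is None:
        return []
    found = []
    for name in names:
        try:
            importlib.import_module("ijson.backends." + name)
        except (ImportError, OSError):
            # e.g. C extension not built, cffi missing, or YAJL library not loadable
            continue
        found.append(name)
    return found


if __name__ == "__main__":
    print("ijson loaded from:", ijson_location())
    print("importable backends:", importable_backends())
```

Run it with the same interpreter that showed the fallback (for example `pypy diagnose.py`; the file name is arbitrary). The location check also catches the situation mentioned later in the thread, where running from a source checkout imports that copy of ijson instead of the installed one.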

@rtobar

rtobar commented Jan 13, 2023

That's interesting about the backends available to you. I just double-checked one of the latest wheels I published the other day for ijson 3.2.0 under https://pypi.org/project/ijson/#files (pypy39, manylinux, x86_64), and it contained both the compiled yajl library and the yajl2_c backend. I also gave it a quick whirl:

$ sudo apt install pypy3-venv
$ pypy3 -mvenv lala
$ source lala/bin/activate
(lala) $ pypy -c 'import ijson; print(ijson.backend)'
yajl2_c

Boom!

And as a tiny benchmark:

(lala) $ cp ~/scm/git/ijson/benchmark.py . # otherwise it uses *that* copy of ijson and doesn't load all backends properly
(lala) $ pypy benchmark.py 
#mbytes,method,test_case,backend,time,mb_per_sec
0.191, basic_parse, long_list, python, 0.036, 5.326
0.191, basic_parse, long_list, yajl2, 0.196, 0.973
0.191, basic_parse, long_list, yajl2_cffi, 0.030, 6.262
0.191, basic_parse, long_list, yajl2_c, 0.062, 3.061
1.886, basic_parse, big_int_object, python, 0.107, 17.704
1.886, basic_parse, big_int_object, yajl2, 0.319, 5.905
1.886, basic_parse, big_int_object, yajl2_cffi, 0.054, 35.115
1.886, basic_parse, big_int_object, yajl2_c, 0.146, 12.930
2.077, basic_parse, big_decimal_object, python, 0.236, 8.783
2.077, basic_parse, big_decimal_object, yajl2, 0.379, 5.475
2.077, basic_parse, big_decimal_object, yajl2_cffi, 0.100, 20.775
2.077, basic_parse, big_decimal_object, yajl2_c, 0.332, 6.248
1.801, basic_parse, big_null_object, python, 0.094, 19.090
1.801, basic_parse, big_null_object, yajl2, 0.273, 6.598
1.801, basic_parse, big_null_object, yajl2_cffi, 0.040, 44.615
1.801, basic_parse, big_null_object, yajl2_c, 0.101, 17.829
1.849, basic_parse, big_bool_object, python, 0.078, 23.842
1.849, basic_parse, big_bool_object, yajl2, 0.288, 6.426
1.849, basic_parse, big_bool_object, yajl2_cffi, 0.044, 42.343
1.849, basic_parse, big_bool_object, yajl2_c, 0.096, 19.163
2.649, basic_parse, big_str_object, python, 0.095, 27.807
2.649, basic_parse, big_str_object, yajl2, 0.353, 7.501
2.649, basic_parse, big_str_object, yajl2_cffi, 0.057, 46.466
2.649, basic_parse, big_str_object, yajl2_c, 0.147, 18.059
8.000, basic_parse, big_longstr_object, python, 0.146, 54.769
8.000, basic_parse, big_longstr_object, yajl2, 0.480, 16.654
8.000, basic_parse, big_longstr_object, yajl2_cffi, 0.057, 141.468
8.000, basic_parse, big_longstr_object, yajl2_c, 0.164, 48.791
19.264, basic_parse, object_with_10_keys, python, 0.764, 25.209
19.264, basic_parse, object_with_10_keys, yajl2, 3.049, 6.318
19.264, basic_parse, object_with_10_keys, yajl2_cffi, 0.461, 41.819
19.264, basic_parse, object_with_10_keys, yajl2_c, 1.902, 10.128
0.381, basic_parse, empty_lists, python, 0.036, 10.482
0.381, basic_parse, empty_lists, yajl2, 0.113, 3.375
0.381, basic_parse, empty_lists, yajl2_cffi, 0.026, 14.803
0.381, basic_parse, empty_lists, yajl2_c, 0.051, 7.532
0.381, basic_parse, empty_objects, python, 0.021, 18.226
0.381, basic_parse, empty_objects, yajl2, 0.282, 1.355
0.381, basic_parse, empty_objects, yajl2_cffi, 0.022, 17.367
0.381, basic_parse, empty_objects, yajl2_c, 0.050, 7.614

So yajl2_cffi seems to be the winner in this case.

It'd be good to see more evidence that establishes a natural order in which we can recommend these backends under PyPy.

For reference, this is the same benchmark with CPython 3.10:

(ijson) $ python benchmark.py 
#mbytes,method,test_case,backend,time,mb_per_sec
0.191, basic_parse, long_list, python, 0.154, 1.235
0.191, basic_parse, long_list, yajl2, 0.091, 2.093
0.191, basic_parse, long_list, yajl2_cffi, 0.089, 2.154
0.191, basic_parse, long_list, yajl2_c, 0.008, 24.960
1.886, basic_parse, big_int_object, python, 0.327, 5.764
1.886, basic_parse, big_int_object, yajl2, 0.177, 10.642
1.886, basic_parse, big_int_object, yajl2_cffi, 0.167, 11.311
1.886, basic_parse, big_int_object, yajl2_c, 0.017, 107.875
2.077, basic_parse, big_decimal_object, python, 0.343, 6.053
2.077, basic_parse, big_decimal_object, yajl2, 0.192, 10.839
2.077, basic_parse, big_decimal_object, yajl2_cffi, 0.177, 11.746
2.077, basic_parse, big_decimal_object, yajl2_c, 0.028, 74.584
1.801, basic_parse, big_null_object, python, 0.270, 6.667
1.801, basic_parse, big_null_object, yajl2, 0.101, 17.869
1.801, basic_parse, big_null_object, yajl2_cffi, 0.111, 16.208
1.801, basic_parse, big_null_object, yajl2_c, 0.014, 131.166
1.849, basic_parse, big_bool_object, python, 0.272, 6.803
1.849, basic_parse, big_bool_object, yajl2, 0.106, 17.429
1.849, basic_parse, big_bool_object, yajl2_cffi, 0.117, 15.738
1.849, basic_parse, big_bool_object, yajl2_c, 0.026, 70.817
2.649, basic_parse, big_str_object, python, 0.312, 8.488
2.649, basic_parse, big_str_object, yajl2, 0.151, 17.525
2.649, basic_parse, big_str_object, yajl2_cffi, 0.142, 18.710
2.649, basic_parse, big_str_object, yajl2_c, 0.016, 163.509
8.000, basic_parse, big_longstr_object, python, 0.323, 24.801
8.000, basic_parse, big_longstr_object, yajl2, 0.153, 52.134
8.000, basic_parse, big_longstr_object, yajl2_cffi, 0.143, 56.138
8.000, basic_parse, big_longstr_object, yajl2_c, 0.016, 510.421
19.264, basic_parse, object_with_10_keys, python, 3.236, 5.954
19.264, basic_parse, object_with_10_keys, yajl2, 1.582, 12.178
19.264, basic_parse, object_with_10_keys, yajl2_cffi, 1.490, 12.932
19.264, basic_parse, object_with_10_keys, yajl2_c, 0.168, 114.446
0.381, basic_parse, empty_lists, python, 0.159, 2.398
0.381, basic_parse, empty_lists, yajl2, 0.041, 9.251
0.381, basic_parse, empty_lists, yajl2_cffi, 0.073, 5.217
0.381, basic_parse, empty_lists, yajl2_c, 0.010, 36.912
0.381, basic_parse, empty_objects, python, 0.160, 2.390
0.381, basic_parse, empty_objects, yajl2, 0.041, 9.342
0.381, basic_parse, empty_objects, yajl2_cffi, 0.073, 5.203
0.381, basic_parse, empty_objects, yajl2_c, 0.010, 36.672

@jpmckinney

Ah, I'm on macOS arm64, so that might be the reason: there's no arm64 wheel for PyPy on macOS.

So it looks like on PyPy (on that benchmark): yajl2_cffi > python > yajl2_c > yajl2.

That said, yajl2_c on CPython seems to be the fastest all around.
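If the default order were to differ on PyPy, the selection could branch on `platform.python_implementation()`. The sketch below is hypothetical, not ijson's code: `backend_order`, `first_available` and the PyPy ordering are assumptions, the latter following the PyPy throughput numbers above (where yajl2_c beats yajl2 in every test case) and putting the unbenchmarked `yajl` backend last. The loader is passed in so the fallback logic mirrors the loop quoted at the top of the thread without importing ijson.

```python
import platform
from typing import Callable, Iterable, Optional, Tuple, TypeVar

T = TypeVar("T")

# Order of ijson's default selection loop, as quoted at the top of this thread.
DEFAULT_ORDER = ("yajl2_c", "yajl2_cffi", "yajl2", "yajl", "python")

# Hypothetical PyPy order, following the PyPy benchmark in this thread.
# "yajl" was not benchmarked and is placed last as a guess.
PYPY_ORDER = ("yajl2_cffi", "python", "yajl2_c", "yajl2", "yajl")


def backend_order(implementation: Optional[str] = None) -> Tuple[str, ...]:
    """Pick a backend preference order for the given (or current) interpreter."""
    impl = implementation or platform.python_implementation()  # "CPython", "PyPy", ...
    return PYPY_ORDER if impl == "PyPy" else DEFAULT_ORDER


def first_available(load: Callable[[str], T], order: Iterable[str]) -> T:
    """Return load(name) for the first backend that imports successfully."""
    for name in order:
        try:
            return load(name)
        except ImportError:
            continue  # backend not built or its library is missing; try the next
    raise ImportError("no backends available")
```

With ijson installed, `first_available(ijson.get_backend, backend_order())` would mimic the default selection, assuming `ijson.get_backend` is available in your version. Since the pure-Python backend always imports, the entries after it in `PYPY_ORDER` would never be reached; they only record the measured ranking.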

@rtobar

rtobar commented Jan 18, 2023

Yes, that seems to be more or less the order. Still, I'd hesitate to make a decision based on those numbers alone; if you (or someone else) could provide more real-life numbers, that would be great. Things might be different on macOS arm64, for example.

@jpmckinney

I probably won't be able to, as I can't figure out how to make ijson find YAJL headers on PyPy. Feel free to close the issue.

@rtobar

rtobar commented Jan 20, 2023

OK, thanks for the feedback! I'll close this now, but this issue should be a good reference for future PyPy users.

@rtobar rtobar closed this as completed Jan 20, 2023
jpmckinney added a commit to open-contracting/ocdskit that referenced this issue Jan 30, 2024