Optimize json parsing #6160
$ time python manage.py reindex_elasticsearch
real 0m24.272s
user 0m20.457s
sys 0m0.337s

$ time python manage.py reindex_elasticsearch
real 0m14.907s
user 0m10.274s
sys 0m0.336s
davidfischer left a comment
There's two pieces here:
In both cases, we are looking to swap to more performant libraries. One thing I'd like to see is where this performance benefit is coming from. Was JSON parsing the slow part and
My biggest concern is that selectolax has a single developer, even though it does appear to be actively developed. The Modest engine has a bit more developer backing. I've used orjson in the past where JSON parsing speed was really important, and it worked reasonably well.
# with `json` and `selectolax`
real 0m20.142s
user 0m14.789s
sys 0m0.405s

# with `orjson` and `selectolax`
real 0m17.934s
user 0m13.506s
sys 0m0.412s

# with `json` and `pyquery`
real 0m31.838s
user 0m25.688s
sys 0m0.414s

# with `orjson` and `pyquery`
real 0m26.549s
user 0m21.444s
sys 0m0.314s
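For anyone who wants to isolate the JSON share of these numbers, here is a rough sketch that times parsing alone, outside the reindex command. The payload shape and the `time_loads` helper are made up for illustration and are not taken from this PR; `orjson` is only timed if it happens to be installed.

```python
import json
import timeit


def time_loads(loads, payload: bytes, number: int = 1000) -> float:
    """Return the total seconds spent calling loads(payload) `number` times."""
    return timeit.timeit(lambda: loads(payload), number=number)


# Hypothetical payload: the real indexed documents will look different.
payload = json.dumps(
    {"sections": [{"id": i, "content": "text " * 50} for i in range(200)]}
).encode()

stdlib_seconds = time_loads(json.loads, payload)
print(f"json.loads:   {stdlib_seconds:.3f}s")

try:
    import orjson  # third-party, optional for this sketch
except ImportError:
    orjson = None

if orjson is not None:
    print(f"orjson.loads: {time_loads(orjson.loads, payload):.3f}s")
```

A micro-benchmark like this only shows the relative cost of the two parsers on one payload; the full-command timings above remain the better measure of the end-to-end effect.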
I believe both
humitos left a comment
I do not have much experience with this code. Still, it seems like a good change, and I think it's easy to revert if we find out in the next deploy (reindex) that it does not behave better than the current implementation.
My only concern about this PR is that it seems we are also changing some small logic in it (I'm not sure, though). If that's the case, I'd recommend making those changes in a different PR.
Also, if we care about being able to keep different implementations around, this could be refactored to use a class with a specific API so we can swap from one to another. I suppose it's not worth the effort at this point, though.
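To make the "class with a specific API" idea concrete, here is a minimal sketch for the JSON side only. All the names (`JSONBackend`, `StdlibJSONBackend`, `OrjsonBackend`, `get_json_backend`) are hypothetical and do not exist in this PR or in the codebase; the same pattern could in principle wrap the HTML parsers too.

```python
import json
from typing import Any, Protocol


class JSONBackend(Protocol):
    """Hypothetical interface that every JSON implementation would satisfy."""

    def loads(self, data: bytes) -> Any: ...


class StdlibJSONBackend:
    """Backend built on the standard library's json module."""

    def loads(self, data: bytes) -> Any:
        return json.loads(data)


class OrjsonBackend:
    """Backend built on orjson, imported lazily so it stays optional."""

    def __init__(self) -> None:
        import orjson  # third-party; raises ImportError if not installed

        self._orjson = orjson

    def loads(self, data: bytes) -> Any:
        return self._orjson.loads(data)


def get_json_backend(prefer_orjson: bool = True) -> JSONBackend:
    """Pick orjson when available and requested, else fall back to json."""
    if prefer_orjson:
        try:
            return OrjsonBackend()
        except ImportError:
            pass
    return StdlibJSONBackend()


backend = get_json_backend(prefer_orjson=False)
parsed = backend.loads(b'{"title": "Install", "sections": [1, 2]}')
```

With something like this, reverting would mean changing which backend the factory returns rather than touching every call site, at the cost of one more layer of indirection.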