This is very very slow on my computer #24
What backend are you using? See what happens when you try to import the other backends explicitly (see the README file for instructions). I suspect you are using the python backend, which is most likely the cause of the trouble. Also, what platform are you on, and how did you get ijson? There are binary wheels on PyPI for most Linux/Mac combinations.
Separately, do you need to iterate over the whole file that first time? If you are extracting/filtering only some information out of it, you could break sooner and not read all of it -- unless of course the filtering itself depends on the length of the locations array.
@rtobar I think I might have jumped in a bit early on how to use this. I did not look at any backend stuff because I thought it was for special cases. Right now I'm running it on a Mac, and I'm aiming to run it in a Docker container, where the VM has a memory restriction of 2GB. I will come back with some questions after I read up on your suggestion.
@rtobar I can't really get the backends to load in any different way. I tried import ijson.backends.yajl2_cffi as ijson, and the same with yajl2_c, but I was not able to load it; it failed. Is there anything else one should do to load the backend properly?
@vongohren sorry, but I couldn't understand what exactly worked and what didn't. Did both fail? A different test you can try out is running the https://github.com/ICRAR/ijson/blob/master/benchmark.py tool. Download that file and run it.
@rtobar thanks for the patience 😁 And sorry for the sparse communication. I'm very new to the Python environment, so I still need to learn how all the different pieces fit together 🤓 So I'm not sure how I add yajl2 on my Mac, or in the Docker container this is eventually going to run in. That benchmarking tool did give some insights!
I guess I have very few backends available. Is there a way to run the benchmark inside the venv where I pip installed ijson? It did finish the test, but the time is not optimal. Do you think it can be faster?
Ok, so a bit silly, but I just ran brew install yajl, and the cloned benchmark now gives yajl2 as a possible backend. But is it supposed to show yajl2_c as a possible backend as well, if I can use it? Because I see yajl2 is slower than the other two versions?
But it is still quite time consuming. Should it be this high? Or is there any other tweaking I can do?
@vongohren thanks for all the details, now things are becoming clear. Indeed you were using the python backend, which was my initial suspicion. What version of Python (and macOS) are you running? If it's 3.8 that might explain it, as I think (from memory) I had to skip generating binary wheels for that version. This is not the case for Linux wheels, which are generated for all Python versions correctly. Now that you have yajl installed, you could try to compile the package yourself, hoping that you end up with a usable yajl2_c backend for your tests (again, when building your container this shouldn't be a problem, as the package installed with pip should have it). The yajl2_c backend is usually ~10x faster than yajl2 and yajl2_cffi, so you should be down to reasonable times.
I'm running: …
I'm getting my code to run when I add …
I'm running this simple code: …
It takes …
I've also found this library: https://pypi.org/project/jsonslicer/#description, which was able to get through the file; I could handle all entries in about 98.12s, without any special configuration. I would love to understand whether ijson is not showing its true speed because I'm not able to run yajl2_c, or whether this is a limit. I'm basically trying to map the JSON file with just a couple of the map_keys included.
I also tried this code on this many entries: 1062126.
So maybe I'm taking some wrong approach to your lib?
I might have found the culprit. Suddenly my function was blazingly fast: ijson got the running time down to 10.8 seconds. But still, I got it faster with jsonslicer.
Yes. I'm still puzzled: you said that in your code you do …
It seems you can do better than what you are doing. You mentioned a couple of times that you just want to take some map keys out of the JSON stream, and in that case … Or try also using ijson.items(f, 'locations.item.timestampMs'). That should return only those values and nothing else, rather than building loads of objects that you end up discarding anyway.
If you’re just looking for the best performance, try pip install ijson==3.0rc2, which is faster than JsonSlicer on their own benchmark. |
Thanks @jpmckinney, that is interesting, I will look at it!
@rtobar cool, thanks, I will check it out. It takes some time because this is a hobby project, but I appreciate the feedback. The jsonslicer project has not provided feedback yet, so I would prefer to use this repo, which does answer :)
@rtobar @jpmckinney thanks for the follow-up. I will close this as I'm moving onwards with a satisfying result. The feedback and assistance is much appreciated.
So I have a JSON file, 330-ish MB.
The content is like this: …
Meaning an array of locations.
If I run this through json.load, then iterate over the result and pull out the two map_keys I want, it takes about 20 seconds. That is doable.
But I cannot load the whole thing into memory anymore; it is too big for my infrastructure, so I found this lib. But when I run, for example, …
it takes many, many minutes. I don't know how long exactly, because I quit it every time it goes too far.
Why is this? I'm just trying to get the length of that list,
to see how many points I'm working with.
Afterwards I want to pull out 3 map_keys and combine them into a smaller object. I just need to make sure this software is fast enough.
Anyone with some insight on this?