
Cache #670

Merged: 3 commits merged into develop from cache on Jul 6, 2021

Conversation

@ohler55 (Owner) commented Jul 6, 2021

Adds hash-key caching as well as string caching. Both are optional. Together they give about a 20% performance boost.
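A hedged usage sketch of the two options this PR adds. The option names cache_keys and cache_str come from this thread; treat the exact value semantics of cache_str as an assumption (in later Oj releases it is an integer cutoff on the string lengths that get cached), and check the docs for your installed version.

```ruby
# Sketch: enabling the new caching options when parsing with Oj.
# cache_keys / cache_str are the options named in this PR; the value 6 for
# cache_str is an assumed length cutoff, not a documented recommendation.
begin
  require "oj"

  Oj.default_options = { cache_keys: true, cache_str: 6 }
  doc = Oj.load('{"status":"ok"}', mode: :strict)
  result = doc["status"]
rescue LoadError
  # The oj gem is not installed in this environment; skip gracefully.
  result = "skipped"
end
puts result
```

For files with many distinct keys, the thread below suggests passing `cache_keys: false` instead, since a cache that never hits is pure overhead.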

@ohler55 ohler55 merged commit 78074e4 into develop Jul 6, 2021
29 of 31 checks passed
@ohler55 ohler55 deleted the cache branch Jul 6, 2021
@PikachuEXE commented Jul 7, 2021

Will the new options cache_str & cache_keys be documented?

@ohler55 (Owner, Author) commented Jul 7, 2021

I guess I should upload the new docs. :-)

@ohler55 (Owner, Author) commented Jul 7, 2021

Ok, updated the docs on the web site.

@xokocodo commented Jul 7, 2021

Hi, I noticed some performance problems with large JSON files (many distinct keys) after the latest change. I tried the cache_str & cache_keys options, but they don't seem to help either. Any ideas?

@ohler55 (Owner, Author) commented Jul 7, 2021

If a file uses many distinct keys, it is best to turn cache_keys off. With it off, performance should match previous versions. Are you seeing something different?

@xokocodo commented Jul 8, 2021

Yeah, with some of my large JSON files I am seeing roughly a 10x running time versus version 3.11.8, even with cache_keys turned off. I haven't looked into it deeply, but I wanted to check whether this is a known issue or whether I'm missing something.

@ohler55 (Owner, Author) commented Jul 8, 2021

Do you have a file I can test against, and which mode are you using?

@xokocodo commented Jul 8, 2021

Sure, here is a link to one of the files (19MB): https://drive.google.com/file/d/10sZ-8ksIfgDsq-kfPJUECa9WffzXx8u5/view?usp=sharing

In my environment this takes ~2.4 seconds with 3.11.8 and ~25 seconds with 3.12.0

@ohler55 (Owner, Author) commented Jul 8, 2021

Thank you. What mode are you using? :compat, :object, etc.?

I'll dig into it tomorrow and get it resolved one way or another.

@xokocodo commented Jul 8, 2021

Thank you! I was using the default (:object) mode, but the slowdown looks consistent across the different modes.

@ohler55 (Owner, Author) commented Jul 8, 2021

Pushed a "cache-fix" branch. The primary fix was to have :compat mode honor the :cache_keys option. I suspect that was the mode you were using. :object mode does not use the cache and :strict was already honoring the option.

The secondary change was to increase the cache size. Being a cache, it only shows an improvement when keys repeat, either within a single file or across multiple files. Even then, the number of distinct keys should stay below something like 100,000, and probably more like 20,000. It really depends on the situation; benchmarking with the option on and off is worthwhile for any given workload.
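The repetition requirement above can be illustrated with a small pure-Ruby sketch. This is a hypothetical stand-in for the C-level cache, not Oj's actual implementation: interning only pays off when the same key string shows up again.

```ruby
# Hypothetical pure-Ruby stand-in for a key cache (not Oj's C code):
# intern returns one shared frozen String per distinct key.
cache = {}
intern = ->(key) { cache[key] ||= key.dup.freeze }

# Heavy repetition: 10,000 lookups but only 50 distinct keys.
Array.new(10_000) { |i| "key#{i % 50}" }.each { |k| intern.call(k) }
repeated_entries = cache.size   # 50 -- nearly every lookup reuses a cached key

cache.clear

# All-unique keys (like the 19MB file above): every lookup is a miss
# followed by an insert, which is pure overhead.
Array.new(10_000) { |i| "key#{i}" }.each { |k| intern.call(k) }
unique_entries = cache.size     # 10000 -- the cache never gets a hit

puts repeated_entries
puts unique_entries
```

With all-unique keys the cache grows without ever serving a hit, which is why turning the option off (or bounding the cache size, as this fix does) matters for files like the one linked above.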

Anyway, please let me know what you think about the branch.

@xokocodo commented Jul 9, 2021

The branch is performing much better again (back to around ~3 seconds). I tried a few different modes again and they are all much faster. I'm not really sure why that would be the case, but maybe I'm doing something wrong.

Anyway, this resolves the issue I was seeing. Thank you very much for taking a look at it!

@ohler55 (Owner, Author) commented Jul 9, 2021

For that JSON, and parsing it only one time, I would not turn caching on. Each key is unique and there are around 700,000 of them, far outside the recommended limits.

Thanks for bringing the issue to my attention.
