analyze performance of user_agent_parser gem #5
Comments
Interesting option to improve performance for slow filters:
Do you think the Java version could be faster? Cheers,
@jsvd looked into this a little today. I guess step 1 here would be answering:
As far as I can see, the linked PRs only attempted using some other Java lib and not specifically the Java version of the lib currently used (https://github.com/ua-parser/uap-java). I'd be very surprised if that didn't yield a serious speedup. And even if it doesn't, it shouldn't be so hard to fix whatever is holding the Java library back (I'm still seeing some nasty things in the Java version too, so there's room for improvement: lots of redundant parsing of parts of the UA string, for example). If you want, I can take a stab at integrating the Java version + setting up a realistic benchmark to judge it. That shouldn't take much time :)
@original-brownbear ++ on experimenting with uap-java, especially since user-agent-utils was EOL'd.
@jsvd should I go ahead on this one? :)
yep. regardless of the outcome it will be an interesting exercise and will help us understand the performance nuances of this kind of problem
@jsvd @suyograo so I set up the Java version and a benchmark in https://github.com/original-brownbear/logstash-filter-useragent/tree/5. I used the test datasets that uap-core provides here (https://github.com/ua-parser/uap-core/tree/master/test_resources) and just ran over them (43k samples) in two runs: Good news:
Not so exciting news:
Parse all sequentially:
Parse all sequentially and repeat each String 10 times:
=> I'll see if I can make the Java version faster with reasonable effort
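For readers who want to reproduce this kind of comparison, the two runs described above (one pass over all samples, and one pass with each string parsed 10 times in a row) could be sketched roughly as follows. This is not the benchmark from the linked branch: `run_benchmark`, the sample strings, and the stand-in parser lambda are all hypothetical, and only Ruby's standard `Benchmark.realtime` is assumed.

```ruby
require "benchmark"

# Hypothetical harness: parses every sample `repeats` times in a row and
# returns [number_of_parse_calls, elapsed_seconds].
def run_benchmark(samples, repeats: 1, &parser)
  calls = 0
  elapsed = Benchmark.realtime do
    samples.each do |user_agent|
      repeats.times do
        parser.call(user_agent)
        calls += 1
      end
    end
  end
  [calls, elapsed]
end

# Placeholder data and parser (assumptions for illustration only);
# swap in the real sample set and the parser under test.
samples = ["Mozilla/5.0 Firefox/50.0", "Mozilla/5.0 Chrome/55.0"]
parser = ->(ua) { ua[/(Firefox|Chrome)\/\d+/] }

calls, seconds = run_benchmark(samples, repeats: 1, &parser)
puts "sequential: #{calls} parses in #{seconds.round(6)}s"

calls, seconds = run_benchmark(samples, repeats: 10, &parser)
puts "10x repeat: #{calls} parses in #{seconds.round(6)}s"
```

The repeated run matters mostly once a cache sits in front of the parser, since it turns nine out of ten parses into potential cache hits.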
Did you try this one: https://rubygems.org/gems/logstash-filter-useragent2/versions/3.0.0-java ? It uses a different Java UA parser.
@ebuildy no, I tried https://github.com/ua-parser/uap-java here. Is that the version you used to get the 2.5x speedup in #23 (comment) ?
Ya, but the fields are different. In my use case (many different browsers), this helped a lot. I haven't caught up on the latest news about this plugin; do you plan to do an official Java version? Thank you
I think for now (step 1) we need to keep the fields (and unfortunately the underlying regular expressions) exactly the same for compatibility reasons.
I think so: if it actually does improve performance, it is my understanding that we will move to a Java version.
Very nice! Keep me posted if you want to test it on a real environment (10m hits per hour): ebuildy at gmail dot com. Many thanks
@jsvd @suyograo so this is what I found out/created:
=> I think we may be good (enough) here with the above. Realistically speaking, I feel like we could simply advise users to set
This is great, @original-brownbear! The speedup + lower cache footprint are definitely enough gains to move to creating a PR.
@original-brownbear nice work! Bummer about Given your analysis, aggressive caching + UAP in java seems like a good step, so +1 to turn this into a PR.
* Speedups in UAP-Java code
* Output format adjustments to UAP-Java code
* Refactored Ruby code to work with UAP-Java code

Fixes logstash-plugins#5
Currently this plugin can be a major source of CPU usage during data ingestion.
On my 13" MBP (Core i5, 16 GB RAM, SSD), adding this plugin to a stdin -> grok -> date -> geoip -> elasticsearch pipeline slows the ingestion of 300k events by 30-40%.
This is due to the high number of Regexp#match operations it has to perform for every single event.
Possible improvements: carefully introducing an LRU cache, or reorganizing the yml file without losing the "specific to general" regexp pattern matching.
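To make the LRU cache idea concrete, here is a minimal sketch of how a cache could sit in front of an ordered, first-match regexp list. It is not the plugin's or the gem's actual code: `LruCache`, `PATTERNS`, `parse_uncached`, and `parse` are hypothetical names, and the three patterns are toy stand-ins for the real yml file. It relies only on Ruby's `Hash` preserving insertion order.

```ruby
# Minimal LRU cache sketch. Ruby's Hash preserves insertion order, so the
# first key in @store is always the least recently used one.
class LruCache
  def initialize(max_size)
    raise ArgumentError, "max_size must be positive" unless max_size.positive?
    @max_size = max_size
    @store = {}
  end

  # Returns the cached value for key, computing it with the block on a miss.
  def fetch(key)
    if @store.key?(key)
      value = @store.delete(key) # delete + re-insert marks it most recently used
      @store[key] = value
      return value
    end
    value = yield(key)
    @store[key] = value
    @store.shift if @store.size > @max_size # evict the least recently used entry
    value
  end

  def size
    @store.size
  end
end

# Hypothetical pattern list, ordered from specific to general.
PATTERNS = [
  [/Firefox\/(\d+)/, "Firefox"],
  [/Chrome\/(\d+)/, "Chrome"],
  [/Mozilla/, "Other"]
].freeze

# Slow path: try each regexp in order and stop at the first match.
def parse_uncached(user_agent)
  PATTERNS.each do |regexp, family|
    match = regexp.match(user_agent)
    return { "name" => family, "major" => match[1] } if match
  end
  nil
end

CACHE = LruCache.new(1_000)

# Fast path: identical user agent strings skip the regexp scan entirely.
def parse(user_agent)
  CACHE.fetch(user_agent) { |ua| parse_uncached(ua) }
end
```

Because the cache only wraps the lookup, the "specific to general" ordering of the patterns is untouched. Note that cached hashes are shared between callers, so a real filter would need to copy the fields into the event rather than mutate the cached result, and the cache size would have to be tuned against the variety of user agents seen.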