Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework multi date formatter merging causes a throughput drop for http_logs #36602

Closed
danielmitterdorfer opened this issue Dec 13, 2018 · 4 comments · Fixed by #36814
Closed
Assignees

Comments

@danielmitterdorfer
Copy link
Member

danielmitterdorfer commented Dec 13, 2018

#36447 (merged to master in c4f4378) has caused a performance regression the http_logs benchmarks on Dec 12, 2018. The numbers below are for our nightly benchmarking hardware for a single node with 4GB heap.

  • Before: median indexing throughput ~ 197000 docs/s
  • After: median indexing throughput ~ 155000 docs/s

Steps to reproduce

Run a benchmark with Rally and have a look at indexing throughput (note that this will download a benchmark data set with ~ 1.2GB):

esrally --runtime-jdk=8 --car="4gheap" --pipeline=from-sources-complete --revision=c4f437800680a8858131b5a5e7bb370c41d109a9 --track=http_logs --challenge=append-no-conflicts-index-only

To get the baseline numbers for comparison, point Rally to the previous commit 1bb6f84:

esrally --runtime-jdk=8 --car="4gheap" --pipeline=from-sources-complete --revision=1bb6f844fee92d1c240a124499f0c299c09a1daf --track=http_logs --challenge=append-no-conflicts-index-only

Note: If you are using Oracle JDK you can also get a profile by specifying --telemetry=jfr.

@rjernst can you please take a look at this one?

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra

@rjernst
Copy link
Member

rjernst commented Dec 13, 2018

@spinscale This is concerning to me because the new way is mimicking what we must do with java 8 time. So I would expect this same issue to appear when using java 8 time, because it does not have the equivalent to Joda's multiple parsers within a date formatter. Thoughts?

@rjernst
Copy link
Member

rjernst commented Dec 13, 2018

To clarify my previous statement: @dm reached out to me on another channel with this info:

it seems the problem might be due to a lot of IllegalArgumentException instances getting thrown during date parsing

With java 8 time, and what the referenced PR does, is try each format in a multi date format one at a time, catching IllegalArgumentException. In joda, multiple parsers could be set within a DateTimeFormatter. This can be done (sort of) in java time by using DateTimeFormatterBuilder and using appendOptional with each DateTimeFormatter. However, there is no builtin or easy way to make a DateTimeFormatter for epoch seconds or milliseconds. The first is possible, but kludgy, and the second is not possible with the ChronoFields built into java, although it should be possible in theory. But this seems to be our only option to avoid having many IAEs thrown internally.

I'm going to try reworking the multi date formats back to having a single parser as described above.

@spinscale
Copy link
Contributor

@rjernst ++ that outlined approach makes sense to me! Thanks for checking

rjernst added a commit to rjernst/elasticsearch that referenced this issue Dec 19, 2018
This commit partially reverts elastic#36447 by using the ability of Joda time's
DateTimeFormatterBuilder to append multiple parsers instead of using the
MergedDateFormatter. The MergedDateFormatter will be removed in a future
change, as it is not as performant due to creating potentially many
exceptions during heavy date parsing. This change is a stop-gap until
that followup is ready.

closes elastic#36602
@jasontedor jasontedor added v6.7.0 and removed v6.6.0 labels Dec 19, 2018
@jasontedor jasontedor added v6.6.0 and removed v6.7.0 labels Dec 19, 2018
rjernst added a commit that referenced this issue Dec 19, 2018
This commit partially reverts #36447 by using the ability of Joda time's
DateTimeFormatterBuilder to append multiple parsers instead of using the
MergedDateFormatter. The MergedDateFormatter will be removed in a future
change, as it is not as performant due to creating potentially many
exceptions during heavy date parsing. This change is a stop-gap until
that followup is ready.

closes #36602
rjernst added a commit that referenced this issue Dec 19, 2018
This commit partially reverts #36447 by using the ability of Joda time's
DateTimeFormatterBuilder to append multiple parsers instead of using the
MergedDateFormatter. The MergedDateFormatter will be removed in a future
change, as it is not as performant due to creating potentially many
exceptions during heavy date parsing. This change is a stop-gap until
that followup is ready.

closes #36602
rjernst added a commit that referenced this issue Dec 19, 2018
This commit partially reverts #36447 by using the ability of Joda time's
DateTimeFormatterBuilder to append multiple parsers instead of using the
MergedDateFormatter. The MergedDateFormatter will be removed in a future
change, as it is not as performant due to creating potentially many
exceptions during heavy date parsing. This change is a stop-gap until
that followup is ready.

closes #36602
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants