[HWKMETRICS-692] Remove unnecessary calls to findMetric #840

Merged
jsanda merged 2 commits into hawkular:master on Aug 23, 2017

Conversation

burmanm
Contributor

@burmanm burmanm commented Jul 4, 2017

This PR requires PR #818 to be merged first. It removes the calls to metricsService.findMetric() where they are not needed. This reduces the number of Cassandra calls in read requests that use tags (except for requests that fetch the full metric definition).

@gbaufake
Member

gbaufake commented Jul 6, 2017

@burmanm

I started a soak test on this PR like #818.

Results:

Beginning of Soak Test

  • 2017-07-05 19:37:54,547 UTC time

First Error appears (~2 minutes after test started)

2017-07-05 19:39:54,338: a NullPointerException appears:

ERROR [org.hawkular.metrics.api.jaxrs.util.ApiUtils] (default task-6) HAWKMETRICS200010: Failed to process request: java.lang.NullPointerException
    at org.hawkular.metrics.core.service.MetricsServiceImpl.findDataPoints(MetricsServiceImpl.java:651)
    at org.hawkular.metrics.core.service.MetricsServiceImpl.findDataPoints(MetricsServiceImpl.java:644)
    at org.hawkular.metrics.core.service.MetricsServiceImpl.findGaugeStats(MetricsServiceImpl.java:908)
Caused by: rx.exceptions.OnErrorThrowable$OnNextValue: OnError while emitting onNext value: org.hawkular.metrics.api.jaxrs.param.TimeAndBucketParams.class

Temporary table creator is scheduled (~3 minutes after the test started)

  • 2017-07-05 19:40:01,908 INFO [org.hawkular.metrics.core.jobs.JobsServiceImpl] (metricsservice-lifecycle-thread) Scheduled temporary table creator JobDetailsImpl{jobId=b034ba03-b800-47ab-b6cf-e4e2e2f362bc, jobType=TEMP_TABLE_CREATOR, jobName=TEMP_TABLE_CREATOR, parameters={}, trigger=RepeatingTrigger{triggerTime=1499298120000, interval=7200000, delay=60000}, status=NONE

TempDataCompressor starts and results in error (~83 minutes or 1 hour and 23 minutes after the test started)

2017-07-05 21:00:02,568 INFO [org.hawkular.metrics.core.jobs.TempDataCompressor] (RxIoScheduler-4) Starting to process temp table for starting time of 2017-07-05T22:00:00.000Z
2017-07-05 21:00:58,739 ERROR [org.hawkular.metrics.api.jaxrs.util.ApiUtils] (RxComputationScheduler-2) HAWKMETRICS200010: Failed to process request: java.nio.BufferUnderflowException
    at java.nio.Buffer.nextGetIndex(Buffer.java:500)
    at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:135)
    at fi.iki.yak.ts.compression.gorilla.ByteBufferBitInput.flipByte(ByteBufferBitInput.java:87)
    at fi.iki.yak.ts.compression.gorilla.ByteBufferBitInput.readBit(ByteBufferBitInput.java:37)
    at fi.iki.yak.ts.compression.gorilla.ByteBufferBitInput.nextClearBit(ByteBufferBitInput.java:74)
    at fi.iki.yak.ts.compression.gorilla.Decompressor.bitsToRead(Decompressor.java:60)
    at fi.iki.yak.ts.compression.gorilla.Decompressor.nextTimestamp(Decompressor.java:86)
    at fi.iki.yak.ts.compression.gorilla.Decompressor.next(Decompressor.java:55)
    at fi.iki.yak.ts.compression.gorilla.Decompressor.readPair(Decompressor.java:37)
    at org.hawkular.metrics.core.service.transformers.DataPointDecompressTransformer.lambda$call$1(DataPointDecompressTransformer.java:87)
    at rx.internal.operators.OnSubscribeMap$MapSubscriber.onNext(OnSubscribeMap.java:69)
    at rx.internal.operators.OperatorMerge$MergeSubscriber.emitScalar(OperatorMerge.java:395)
    at rx.internal.operators.OperatorMerge$MergeSubscriber.tryEmit(OperatorMerge.java:355)
    at rx.internal.operators.OperatorMerge$InnerSubscriber.onNext(OperatorMerge.java:846)
    at org.hawkular.rx.cassandra.driver.ResultSetToRowsTransformer$RowProducer.produce(ResultSetToRowsTransformer.java:111)
    at org.hawkular.rx.cassandra.driver.ResultSetToRowsTransformer$RowProducer.lambda$execute$0(ResultSetToRowsTransformer.java:154)
    at rx.internal.schedulers.EventLoopsScheduler$EventLoopWorker$1.call(EventLoopsScheduler.java:172)
    at rx.internal.schedulers.ScheduledAction.run(ScheduledAction.java:55)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: rx.exceptions.OnErrorThrowable$OnNextValue: OnError while emitting onNext value: com.datastax.driver.core.ArrayBackedRow.class
    at rx.exceptions.OnErrorThrowable.addValueAsLastCause(OnErrorThrowable.java:118)

@burmanm
Contributor Author

burmanm commented Jul 6, 2017

@gbaufake Those line numbers make no sense. Are you using a very old build? Line 651 in MetricsServiceImpl has been a comment line since a commit made on June 19.

@burmanm
Contributor Author

burmanm commented Jul 7, 2017

Rebased against current master.

@jshaughn
Contributor

jshaughn commented Jul 7, 2017

@burmanm :
(2:26:01 PM) gaYak: Well, when you wake up.. can you check if in metrics PR #840 the alert parts still needs Metric and not MetricId?
(2:26:09 PM) gaYak: GroupTriggerManager
(3:57:33 PM) jshaughn: We do need the Metric (well, we need more than the metricId) because we depend on the metric tags in this logic.

@burmanm
Contributor Author

burmanm commented Jul 10, 2017

retest this please

@burmanm
Contributor Author

burmanm commented Jul 10, 2017

(this should have no effect on the write performance as this doesn't touch the write path)

@burmanm
Contributor Author

burmanm commented Jul 12, 2017

retest this please

1 similar comment
@FilipB

FilipB commented Jul 12, 2017

retest this please

@burmanm
Contributor Author

burmanm commented Jul 27, 2017

Rebased from master

@burmanm
Contributor Author

burmanm commented Aug 2, 2017

@jsanda Can you review and merge?

@@ -149,7 +149,8 @@ public void getMetrics(

 Observable<Metric<Double>> metricObservable = null;
 if (tags != null) {
-    metricObservable = metricsService.findMetricsWithFilters(getTenant(), GAUGE, tags);
+    metricObservable = metricsService.findMetricIdentifiersWithFilters(getTenant(), GAUGE, tags)
+            .flatMap(metricsService::findMetric);

Contributor

All of these findMetric calls are against the same partition. We ought to consider collecting all of the ids and doing a single query.

Contributor Author

True, or at least make that request happen in a single unlogged batch.
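
To make the round-trip argument concrete, here is a minimal sketch that does not touch Cassandra at all. `Partition`, `findOne`, and `findMany` are hypothetical names invented for this illustration and are not part of Hawkular Metrics or the DataStax driver. The class only counts how many lookups are issued when ids are resolved one at a time versus collected first and resolved together.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class BatchedLookupSketch {

    // Hypothetical in-memory stand-in for a single tenant partition:
    // metric id -> tags. It only counts how many "queries" are issued.
    static final class Partition {
        private final Map<String, String> rows = new HashMap<>();
        int queryCount = 0;

        void put(String metricId, String tags) {
            rows.put(metricId, tags);
        }

        // One round trip per metric id, like calling findMetric for each id.
        String findOne(String metricId) {
            queryCount++;
            return rows.get(metricId);
        }

        // One round trip for the whole collected set of ids.
        Map<String, String> findMany(Set<String> metricIds) {
            queryCount++;
            Map<String, String> found = new HashMap<>();
            for (String id : metricIds) {
                String tags = rows.get(id);
                if (tags != null) {
                    found.put(id, tags);
                }
            }
            return found;
        }
    }

    public static void main(String[] args) {
        Partition partition = new Partition();
        partition.put("m1", "a=b");
        partition.put("m2", "c=d");
        partition.put("m3", "e=f");
        List<String> ids = Arrays.asList("m1", "m2", "m3");

        // Per-id lookups: the query count grows with the number of ids.
        for (String id : ids) {
            partition.findOne(id);
        }
        System.out.println("per-id lookups: " + partition.queryCount + " queries");

        // Collected lookup: a single query regardless of the number of ids.
        partition.queryCount = 0;
        partition.findMany(new LinkedHashSet<>(ids));
        System.out.println("collected lookup: " + partition.queryCount + " query");
    }
}
```

Whether a real implementation would use a single multi-key query or an unlogged batch is left open here, as it is in the discussion above.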

 .doOnError(Throwable::printStackTrace)
 .filter(row -> tenantId.equals(row.getString(0)))
 .distinct()
-.compose(new MetricFromFullDataRowTransformer(defaultTTL))
+.compose(new MetricIdentifierFromFullDataRowTransformer(defaultTTL))
+.flatMap(this::findMetric)

Contributor

jsanda commented Aug 2, 2017

Why do you do this findMetric call here? I am not sure why it is needed since findMetricsInMetricsIndex gets called below.

Contributor Author

How do you define which gets selected in the distinct operator? Say they both return:

Metric(Gauge, "t1", "m1"), but the one from metricsIndex returns with Tags: a=b and the one from data doesn't. We do a distinct and drop one. I'd definitely want to drop the one from findAllMetricIdentifiersInData but how does one select correct one? Ideas? This should be fixed in any case.

Contributor

I am not sure I understand what you are saying. I think the distinct call at line 528 is not needed. See HWKMETRICS-715. I think it should come after line 529.

We do not store metric tags in the data tables. If the metric id is found in the data tables, it is not necessarily in metrics_idx. Line 530 is doing the same thing as the call to findMetricsInMetricsIndex at line 539, except it is getting executed for every metric id.

Contributor Author

Ah, we were referring to different distinct operators. I was thinking of the one at line 545, which is the reason line 530 calls this::findMetric.

Contributor Author

Without that "this::findMetric", we could inadvertently remove the tags from the end result (as we compare MetricId equality, but one object does not have tags)

Contributor Author

Actually, no.. because of concatWith + distinct we should first let the metricsIndex survive and then check the data against duplicates.

Contributor

Right, because of the concatWith + distinct, the metrics emitted from the metricsIndex observable will survive. So can we remove the findMetric call on the setFromData observable?

@burmanm
Contributor Author

burmanm commented Aug 3, 2017

Rebased from master at the same time (due to changes related to this PR).

@burmanm
Contributor Author

burmanm commented Aug 16, 2017

Rebased from master

@jsanda
Contributor

jsanda commented Aug 22, 2017

@burmanm what is the status of this PR? We need this bug fix, and it probably needs to be backported as well.

@burmanm
Contributor Author

burmanm commented Aug 22, 2017

Just like PR #843 and #854, it's just waiting for someone to merge these. At least I haven't seen any comments here.

@jsanda
Contributor

jsanda commented Aug 22, 2017

I thought we had discussed and agreed that the findMetric call at line 536 in MetricsServiceImpl is not needed because line 551 is

setFromMetricsIndex.concatWith(setFromData).distinct(Metric::getMetricId)

If the order of the concatenation is reversed, then it would break. I tested this locally to verify for myself as well.
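
The ordering argument can be illustrated without RxJava. The following is a minimal sketch, assuming a simplified `Metric` class and a `distinctById` helper, both hypothetical and written only for this illustration. The helper mimics `first.concatWith(second).distinct(Metric::getMetricId)` by visiting elements in concatenation order and keeping the first one seen for each id.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ConcatDistinctSketch {

    // Hypothetical, simplified stand-in for a metric definition: an id plus tags.
    static final class Metric {
        final String id;
        final Map<String, String> tags;

        Metric(String id, Map<String, String> tags) {
            this.id = id;
            this.tags = tags;
        }
    }

    // Mimics first.concatWith(second).distinct(Metric::getMetricId):
    // the first element seen for each id wins, later duplicates are dropped.
    static List<Metric> distinctById(List<Metric> first, List<Metric> second) {
        Map<String, Metric> seen = new LinkedHashMap<>();
        for (Metric m : first) {
            seen.putIfAbsent(m.id, m);
        }
        for (Metric m : second) {
            seen.putIfAbsent(m.id, m);
        }
        return new ArrayList<>(seen.values());
    }

    public static void main(String[] args) {
        // From the metrics index: the definition carries tags.
        List<Metric> fromIndex = Collections.singletonList(
                new Metric("m1", Collections.singletonMap("a", "b")));
        // From the data tables: ids only, no tags.
        List<Metric> fromData = Collections.singletonList(
                new Metric("m1", Collections.<String, String>emptyMap()));

        // Index first: the tagged definition survives.
        System.out.println(distinctById(fromIndex, fromData).get(0).tags); // prints {a=b}
        // Data first: the untagged entry wins and the tags are lost.
        System.out.println(distinctById(fromData, fromIndex).get(0).tags); // prints {}
    }
}
```

With the index-first order, the entry from the data tables is only a fallback for ids that the index does not contain, so it does not need tags of its own.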

@burmanm
Contributor Author

burmanm commented Aug 22, 2017

Although it would seem so, a large number of our tests break with that change.

@jsanda
Contributor

jsanda commented Aug 22, 2017

I think some of the tests might need to be refactored. I see some failures happening in BaseMetricsITest.assertMetricsIndexMatches. That method was written when metric definitions were only read from the metrics_idx table, and I do not think it was ever updated to take into account that we now also fetch ids from the data table(s). I will do some more investigation.

@burmanm
Contributor Author

burmanm commented Aug 23, 2017

That should fix it

@jsanda
Contributor

jsanda commented Aug 23, 2017

retest this please

@jsanda jsanda merged commit 586d254 into hawkular:master Aug 23, 2017
5 participants