Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add response body for partial 207 errors for aggregated payload #751

Merged
merged 2 commits into from Nov 10, 2016

Conversation

VinnyQ
Copy link
Contributor

@VinnyQ VinnyQ commented Nov 8, 2016

What

Add response body for aggregated/multi payload ingestion with partial invalid metrics (207 response code).
Fix missing metricName for aggregated payload validation response.
Fix and refactor integration tests.

Why

Currently partial failure during ingestion for aggregated metrics return a 207 response code with an empty body. A response body is returned to indicate which metrics failed and the reason for the failure.

How

Use the new sendErrorResponse() method in the HttpAggregatedMultiIngestionHandler.java class.

and various validation error message clean up.
@VinnyQ
Copy link
Contributor Author

VinnyQ commented Nov 8, 2016

@ChandraAddala @usnavi @shintasmith please review, thanks!

@coveralls
Copy link

coveralls commented Nov 8, 2016

Coverage Status

Coverage decreased (-0.02%) to 76.12% when pulling 0dfe983 on VinnyQ:response_body_for_partial_failures into fb65eef on rackerlabs:master.

@@ -113,6 +113,10 @@ private static void validate(AggregatedPayload payload) {
metricName = ((BluefloodSet) leafBean).getName();
}

if ((metricName.isEmpty()) && (payload.getAllMetricNames().size() > 0)) {
metricName = payload.getAllMetricNames().get(0);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does the first metricName in the list always relate to the current violation? If there isn't a metricName, does that mean that it couldn't be processed, and should be empty?

Copy link
Contributor Author

@VinnyQ VinnyQ Nov 9, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I tested this. Each "single" aggregatedPayload will have the metricName of the offending metric as the first element. And actually, it'll always be the case that each instance of AggregatedPayload will only contain one leaf "metric". Here's an example of the part of a payload this class represent:

{
      "tenantId":"333333",
      "timestamp":1478711535000,
      "flushInterval": 15000,
      "counters":[
          {
              "name":"agg.count.ten",
              "value":1,
              "rate":1
          }
      ]
    }

Even if the metric above is part of a multi-metrics payload like this:

curl -i -H "content-type: application/json" -X POST 'http://localhost:2440/v2.0/836986/ingest/aggregated/multi' -d \
  '[
    {
      "tenantId":"333333",
      "timestamp":1478711535000,
      "flushInterval": 15000,
      "counters":[
          {
              "name":"agg.count.nine",
              "value":1,
              "rate":1
          }
      ]
    },
    {
      "tenantId":"333333",
      "timestamp":1478711535000,
      "flushInterval": 15000,
      "counters":[
          {
              "name":"agg.count.ten",
              "value":2,
              "rate":1
          }
      ]
    },
    {
      "tenantId":"333333",
      "timestamp":1478711535000,
      "flushInterval": 15000,
      "counters":[
          {
              "name":"agg.count.eleven",
              "value":1,
              "rate":1
          },
          {
              "name":"agg.count.twelve",
              "value":1,
              "rate":1
          }
      ]
    }
  ]'

This is because if you look in the HttpAggregatedMultiIngestionHandler class, it first parse the entire json payload to each individual element (stored in bundleList:
https://github.com/rackerlabs/blueflood/blob/master/blueflood-http/src/main/java/com/rackspacecloud/blueflood/inputs/handlers/HttpAggregatedMultiIngestionHandler.java#L76
and
https://github.com/rackerlabs/blueflood/blob/master/blueflood-http/src/main/java/com/rackspacecloud/blueflood/inputs/handlers/HttpAggregatedMultiIngestionHandler.java#L176)

Then for each of the element in bundleList (an AggregatedPayload), it does the validation and create the error payload if there's violation: https://github.com/rackerlabs/blueflood/blob/master/blueflood-http/src/main/java/com/rackspacecloud/blueflood/inputs/handlers/HttpAggregatedMultiIngestionHandler.java#L85

In the majority of the cases, the metricName is set from the previous block in the AggregatedPayload.validate() here: https://github.com/rackerlabs/blueflood/blob/master/blueflood-core/src/main/java/com/rackspacecloud/blueflood/inputs/formats/AggregatedPayload.java#L105-L114

with the assumption that leafBean was created correctly from the json parser.

However, there were cases where the parser itself didn't create the AggregatedPayload element correctly (e.g. a missing "value" field that is required), resulting in the type of the element not being set and an empty metricName. In this case, the only way to retrieve the metricName is through the single json element payload itself.

Hope that make sense. But do know that I tried different cases during testing, including one with multiple aggregated payloads, with the first few being valid and the last one being invalid, and the error response has the correct metricName for the one failing the validation (and not ingested), not the first metricName in the request payload. You'll see a few of this cases in the system integration tests itself.

assertEquals("Invalid error source", "collectionTime", errorResponse.getErrors().get(i).getSource());
assertEquals("Invalid error message", "Out of bounds. Cannot be more than " + BEFORE_CURRENT_COLLECTIONTIME_MS + " milliseconds into the past." +
" Cannot be more than " + AFTER_CURRENT_COLLECTIONTIME_MS + " milliseconds into the future", errorResponse.getErrors().get(i).getMessage());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@VinnyQ
Copy link
Contributor Author

VinnyQ commented Nov 9, 2016

nevermind you're right, good catch @usnavi , we really shouldn't return any metricNames for an entire payload if the collectionTime is invalid, since it's invalid for all of them, just not the first one.

I added the timestamp to the error response, which is more helpful for the user to pinpoint which metrics failed validation and ingestion.

@usnavi @shintasmith @ChandraAddala @izrik please review latest when you have a chance.

@coveralls
Copy link

coveralls commented Nov 9, 2016

Coverage Status

Coverage decreased (-0.1%) to 76.048% when pulling cb13313 on VinnyQ:response_body_for_partial_failures into fb65eef on rackerlabs:master.

@usnavi
Copy link
Contributor

usnavi commented Nov 10, 2016

Thanks @VinnyQ!

Copy link
Contributor

@shintasmith shintasmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

other than one minor comment to add comment, changes look good. 👍

private String metricName;
private String source;
private String message;

public ErrorData() {
}

public ErrorData(String tenantId, String metricName, String source, String message) {
public ErrorData(String tenantId, String metricName, String source, String message, Long timestamp) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe more comment on what this timestamp represent? Is this ingest time? Collection time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that. Where should I put the comment?

@VinnyQ VinnyQ force-pushed the response_body_for_partial_failures branch from cb13313 to e00a7e2 Compare November 10, 2016 21:05
@coveralls
Copy link

Coverage Status

Coverage increased (+0.01%) to 76.154% when pulling e00a7e2 on VinnyQ:response_body_for_partial_failures into fb65eef on rackerlabs:master.

@coveralls
Copy link

coveralls commented Nov 10, 2016

Coverage Status

Coverage decreased (-0.08%) to 76.067% when pulling e00a7e2 on VinnyQ:response_body_for_partial_failures into fb65eef on rackerlabs:master.

@VinnyQ VinnyQ merged commit 674bd0a into rax-maas:master Nov 10, 2016
@VinnyQ VinnyQ deleted the response_body_for_partial_failures branch November 10, 2016 21:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants