Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updating headers to use new scheme #265

Merged
merged 3 commits into from Jun 8, 2017

Conversation

MSLaguana
Copy link
Member

Request IDs are expected to follow https://github.com/lmolkova/correlation/blob/master/hierarchical_request_id.md
and https://github.com/lmolkova/correlation/blob/master/http_protocol_proposal_v1.md

This change defaults to using the new hierarchical ID format, but continues
to interoperate with the legacy format as well.

Request IDs are expected to follow https://github.com/lmolkova/correlation/blob/master/hierarchical_request_id.md
and https://github.com/lmolkova/correlation/blob/master/http_protocol_proposal_v1.md

This change defaults to using the new hierarchical ID format, but continues
to interoperate with the legacy format as well.
@@ -20,6 +20,8 @@ class AutoCollectClientRequests {

public static INSTANCE: AutoCollectClientRequests;

private static requestNumber = 1;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you ever reset this counter? It is possible that uniqueParentId will not be unique. For instance, if Node server restarts and the counter goes to 1, or if there are more than one Node servers and each has a different requestNumber. Will it cause any problems?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parentId is a hierarchical Id of the form |root.suffix. or |freshRoot., and suffix and freshRoot are randomly generated for this particular request. Assuming that parentId is unique for this particular incoming request (which is reasonable, a collision of suffix or newly created GUID is unlikely) the uniqueParentId will be unique for each outgoing request unless requestNumber is ever reset. If the node server restarts or there are multiple servers, they should all be generating different parentIds

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should not assume that parentId is indeed unique. It could not be unique because of

  • bug on the sender side
  • redirects (headers are copied as it on redirects)
  • some proxy that retries requests with the same headers

The spec suggests to append small e.g. 32-bits random integer (hex encoded to reduce length) for the parent id

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realized that the implementation does exactly what spec suggests: for the incoming requests it appends random int suffix, for the outgoing requests, it appends monotonically incremented number. So the outgoing request will have unique Id.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small note here: I've used a single counter in the whole process, while the .NET sdk seems to use one counter for each request. This means that the IDs could get large in a long-lived process.

parentId = parentId || operationId;

if (this.enabled) {
return {
operation: {
name: operationName,
id: operationId,
parentId: parentId
parentId: parentId,
correlationContextHeader: correlationContextHeader
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this necessary here? If possible, I would like to keep this operation object free of properties not present on https://github.com/Microsoft/ApplicationInsights-dotnet/blob/de488729c28b721a758e0024ea2d4f79909ff797/src/Core/Managed/Shared/Extensibility/Implementation/OperationContext.cs so that the API surface is somewhat consistent

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that if an incoming request has a Correlation-Context header, then we must forward that value on in any outgoing requests. Where would you suggest that we store the value, if not here with the other values that have the same (whole-request) lifetime?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that the property bag header? It should fill CorrelationContext.customProperties. That was the placeholder object intended for this. The idea is to deserialize it into a dictionary so that its user-editable, and then re-serialize it when outgoing

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll have to update the comment there. Today it says

Properties here can be exposed in a future SDK release via outgoing HTTP headers for correlating data cross-component.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to my reading of https://github.com/lmolkova/correlation/blob/master/http_protocol_proposal_v1.md#correlation-context the Correlation-Context header should only be writable in the originating service, and should be read-only in all others. If that's the case, does deserializing into a dictionary still make sense? Although I see at the end of that section it does also talk about adding to it, which is a little confusing.

Copy link
Member

@lmolkova lmolkova Jun 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec says that the very first service may add properties to the Correlation-Context.
Any further service MUST consider properties as read only i.e. MUST NOT modify existing properties. And SHOULD NOT add new properties.
This is a guidance, but AppInsights SDK does not know which service is the first one, so you should just allow adding properties to the Correlation-Context.

If the requirement to keep properties came from upstream as read-only is too strict/hard/inefficient to implement and you would rather use simple dictionary - go ahead, make them mutable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can't deserialize the header into a map, since the spec explicitly says that keys may be duplicated, and we must not aggregate or deduplicate values. I'd prefer to keep this as an opaque string (that we potentially add , key=value to the end of), but I'd also be reasonably happy with converting it to an array of key=value strings, splitting on ', ' and potentially appending more in future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no way to reconcile the API surface here while keeping it an opaque string, unless we ignore the spec requirement to support adding properties and make the entire functionality private/internal to the SDK (which is maybe fine? but it would be great to get this working if the only issue is on how the end user interacts with it).

We have a key-value store already in the public API surface designated for this use, so if we keep this as an opaque string and publicly make it available on the correlation context we now have an easy to use storage that says its for correlation but doesn't work, and a complicated to use one that does work but you have to handle formatting yourself and know you cant use any illegal characters for headers.

We can replace the already designated object perhaps with one with custom get/setters. Avoiding the edge cases about duplication shouldn't be too difficult, this was already handled in .NET https://github.com/Microsoft/ApplicationInsights-dotnet-server/blob/c64fc1d28773c229b4351e6acf7dc6788229dd31/Src/Common/HeadersUtilities.cs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how .NET is handling duplicates in that code; they essentially aggregate all duplicates together and ignore all but the first value (and when they write a value, they erase all previous instances of the key and write the new key/value to the end). That doesn't seem like it matches the spec at all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is strange isn't it.. The get returning only the first value seems perfectly fine (our SDK doesn't allow you to create duplicate keys, so if you have them you must have special code to read them. I don't think this is really an aggregation, nothing is being combined) That set seems totally broken though.

To strictly follow the spec (which as @lmolkova mentioned isn't totally necessary if it is too difficult) I think setting would instead work on a different data set that only you can initialize (and cannot conflict with any values that were prepopulated into the context), and before sending the header this different data set is appended on to the header string


if (request.headers[RequestResponseHeaders.requestIdHeader]) {
this.parentId = request.headers[RequestResponseHeaders.requestIdHeader];
this.requestId = CorrelationIdManager.generateRequestId(this.parentId);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the manually supplied parentId ignored in the new format?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also not sure this approach is correct. I would think our request ids for putting into AI telemetry envelopes do not change format but only the id sent out in the header does. This replaces the id for the envelope with the hierarchical id if I'm reading correctly.

@SergeyKanzhelev What's the correct approach here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that I understand the purpose of the code-supplied parentId. From what I understand of the spec, if an incoming request specifies a Request-Id, then this request is part of a larger tracked operation, and we should take that Id and extend it appropriately, which is what I do here. What I don't understand is how to combine an incoming Request-Id with an ambient/passed-in Request-Id. If the incoming request does not specify a Request-Id then we will go into the else case which checks for legacy headers, and if they are not found it will use the passed-in parentId.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The manually passed in parent id was to allow filling requestId from correlationContext.operation.parentId when manually calling trackRequestSync. TrackRequestSync is too late to get the value from headers (since it should be the last call in the request handler). If you do, you won't be correlated with any of the dependency telemetry that used a different parentId.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also noticing the parameter is renamed here from requestid to parentid. It's a bit confusing to do that as the parameter should set requestid, we just happen to set the requestid with the value of parentid (to resolve the situation above).

EDIT: For clarity, requestId here does not mean the contents of the header Request-Id but the id field on the envelope generated in getRequestData (the plain, non-hierarchical id from the AI schema to associate children who have parentId tags matching this value)

@@ -110,6 +106,10 @@ class ServerRequestParser extends RequestParser {
return this.requestId;
}

public getCorrelationContext() {
return this.correlationContext;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fairly confusing naming, given that this is not a CorrelationContext object

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I'm open to suggestions; perhaps getCorrelationContextHeader? It is "whatever value is specified in the correlation-context header, if any"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could get removed entirely since it should be CorrelationContext.customProperties

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now this getter is how we get access to the value to put it in the custom properties; all the header parsing happens in this class.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, sorry I thought this class had access to the correlation context and could fill it directly. It would make sense since that is part of the header parsing

this.requestId = CorrelationIdManager.generateRequestId(rootId || this.parentId || parentId);
this.correlationContext = null;
}
this.operationId = CorrelationIdManager.getRootId(this.requestId);
Copy link
Member

@lmolkova lmolkova Jun 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if there were nor Request-Id nor legacy headers?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If no headers are present, then this.parentId will be null, but this.requestId will be CorrelationIdManager.generateRequestId(parentId) which will generate a hierarchical ID from the passed-in identifier (if any) or from nothing, so this.requestId should always be set.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, I thought there is the else-if branch for legacy headers. never mind

Copy link
Member

@OsvaldoRosado OsvaldoRosado left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove side-effects this has on AI schema constructs. Notably the id field in getRequestData should continue to be a plain guid. ParentId restoration logic for TrackRequestSync should be unaffected as well. In general, there should be no changes to the form of the data sent out of the SDK to the AI backend.

Edit: Cleared up offline that this id logic will actually work okay. TrackRequestSync still needs to be addressed however

Additionally adding 'Id' data to tracked Dependency events.
@MSLaguana
Copy link
Member Author

I believe I have addressed these issues now.

@MSLaguana
Copy link
Member Author

Looks like one of the tests failed, but only on one of the node versions. It seems that client.sendPendingData invoked its callback twice? Nothing I touched should impact that though.

@@ -91,6 +91,9 @@ class AutoCollectClientRequests {

let requestParser = new ClientRequestParser(requestOptions, request);

const currentContext = CorrelationContextManager.getCurrentContext();
const uniqueParentId = currentContext && currentContext.operation && (currentContext.operation.parentId + AutoCollectClientRequests.requestNumber++ + '.');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Isn't this a request id not a parent id?

addHeaderData(header: string): void;
getProperty(prop: string): string;
setProperty(prop: string, val: string): void;
serializeToHeader(): string;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think anything other than get and set should be part of the public API

@@ -8,13 +8,17 @@ export interface CorrelationContext {
name: string;
id: string;
parentId: string; // Always used for dependencies, may be ignored in favor of incoming headers for requests
correlationContextHeader?: string;
};

/** Do not store sensitive information here.
* Properties here can be exposed in a future SDK release via outgoing HTTP headers for correlating data cross-component.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment should be updated since we now support this 😄

Copy link
Member

@OsvaldoRosado OsvaldoRosado left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some minor comments to address. Looks awesome!

@OsvaldoRosado
Copy link
Member

@MSLaguana Looks like a re-run of the unit tests removed the failure. Might have already been a flaky test

Also addressing other code review comments.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants