Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ref(node): Remove query strings from transaction and span names #2857

Merged
merged 14 commits into from
Sep 2, 2020

Conversation

lobsterkatie
Copy link
Member

@lobsterkatie lobsterkatie commented Aug 28, 2020

Including query strings in the URLs we use to name transactions and spans has at least two drawbacks:

  • It makes them harder to group effectively
  • It exposes potentially sensitive data

This removes the query string from the URLs we use to name both transactions representing incoming requests and spans representing outgoing requests. It also fixes a special case in which some outgoing requests to external servers were missing the protocol, and adds tests not only for this behavior but for the tracing handler in general.

Before:

image

After:

image

@github-actions
Copy link
Contributor

github-actions bot commented Aug 28, 2020

size-limit report

Path Size
@sentry/browser - CDN Bundle (gzipped) 17.68 KB (+0.01% 🔺)
@sentry/browser - Webpack 18.47 KB (0%)
@sentry/react - Webpack 18.47 KB (0%)
@sentry/browser + @sentry/tracing - CDN Bundle (gzipped) 22.79 KB (+0.01% 🔺)

Copy link
Contributor

@rhcarvalho rhcarvalho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're removing the query string, we should also remove fragments.

The way this is implemented so far, it will remove query string and fragments when a query string exists (http://example.com/?q=1#top => http://example.com/), but will not remove fragments from URLs like http://example.com/#top.

@rhcarvalho
Copy link
Contributor

If we're talking about grouping, we may also consider whether the protocol should be part of the name.

let u = new URL(url);
return u.host + u.pathname;
urlOrPath => {
  let i = Math.min(urlOrPath.indexOf('#'), urlOrPath.indexOf('?'));
  return i == -1 ? urlOrPath : urlOrPath.slice(0, i);
}

@HazAT
Copy link
Member

HazAT commented Aug 28, 2020

We should also do this for Angular https://app.asana.com/0/1169125731075107/1191128420238986
Maybe it's worth moving this function into @sentry/utils

Also @rhcarvalho I would go with just pathname we never had the protocol anywhere yet and it will also mostly always be the same.

@lobsterkatie
Copy link
Member Author

If we're talking about grouping, we may also consider whether the protocol should be part of the name.

let u = new URL(url);
return u.host + u.pathname;

Also @rhcarvalho I would go with just pathname we never had the protocol anywhere yet and it will also mostly always be the same.

I think it's a different conversation depending on whether we're talking about incoming or outgoing requests.

Incoming requests (for which we create transactions) will presumably always be to <http or https, whichever is my standard>://<my domain, which is known because that's the server I'm monitoring>/<some route>, and so in that case, I do think it should just be pathname*. This is the current behavior, both before and in this PR.

Outgoing requests (for which we create spans) are highly likely to be to outside domains, so we have historically taken (and I think we should continue to take) the full URL.

* @HazAT, you mentioned parameterizing the path (/users/<user-id> rather than /users/1226), which I think is also a good idea, but which I'm going to punt to a future PR in order to get this out.

@lobsterkatie
Copy link
Member Author

lobsterkatie commented Aug 28, 2020

If we're removing the query string, we should also remove fragments.

The way this is implemented so far, it will remove query string and fragments when a query string exists (http://example.com/?q=1#top => http://example.com/), but will not remove fragments from URLs like http://example.com/#top.

Good point. Thanks for catching that!

urlOrPath => {
  let i = Math.min(urlOrPath.indexOf('#'), urlOrPath.indexOf('?'));
  return i == -1 ? urlOrPath : urlOrPath.slice(0, i);
}

Wouldn't this no-op on any url that didn't have both a ? and #, since .indexOf() returns -1 if it doesn't find something? (I'll admit I haven't tested it.)

Anyway, this seems to work:

if (urlPath.includes('?') || urlPath.includes('#')) {
  // in theory, the fragment should always come after the query string, but you never know, so make
  // the # a ? so the split will always happen if either is there
  return urlPath.replace('#', '?').split('?')[0];
}

Comment on lines +181 to +182
// internal routes end up with too many slashes
return `${protocol}//${hostname}${port}${path}`.replace('///', '/');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very poor-man's slash normalization.

We'd get this for free if using new Url(...).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you say more here? I couldn't find any reference to a Url class in the MDN docs, and the URL class doesn't seem to be able to deal with just a path (even if it starts with three slashes):

image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When do we even have the case when we have too many slashes? Afaik ClientRequest is printing http or https, without any slashes

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my reply below: #2857 (comment)

packages/node/src/handlers.ts Outdated Show resolved Hide resolved
@@ -32,7 +32,7 @@ export function tracingHandler(): (
// TODO: At this point `req.route.path` (which we use in `extractTransaction`) is not available
// but `req.path` or `req.url` should do the job as well. We could unify this here.
const reqMethod = (req.method || '').toUpperCase();
const reqUrl = req.url;
const reqUrl = req.url && stripUrlPath(req.url);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about dropping req.url && here and making stripUrlPath return the input directly if it's not a string?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would mean adding undefined to both the input and output types, which then would just push the &&ing to the other end (because it the result wouldn't be guaranteed to be defined) - no? Is that better? (Happy to do it if you think so.)

Comment on lines +181 to +182
// internal routes end up with too many slashes
return `${protocol}//${hostname}${port}${path}`.replace('///', '/');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When do we even have the case when we have too many slashes? Afaik ClientRequest is printing http or https, without any slashes

describe('tracingHandler', () => {
const urlString = 'http://dogs.are.great:1231/yay/';
const queryString = '?furry=yes&funny=very';
const fragment = '#adoptnotbuy';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<3

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When do we even have the case when we have too many slashes? Afaik ClientRequest is printing http or https, without any slashes

If it's an internal route, there's no protocol or hostname, and it looks like this:

image

packages/utils/src/misc.ts Outdated Show resolved Hide resolved
@lobsterkatie lobsterkatie marked this pull request as ready for review September 1, 2020 17:30
@kamilogorek kamilogorek merged commit 4d8abee into master Sep 2, 2020
@kamilogorek kamilogorek deleted the kmclb-simplify-transaction-names branch September 2, 2020 09:55
lobsterkatie added a commit that referenced this pull request Sep 10, 2020
…ons (#2882)

Reverts the `span` half of #2857, which added the stripping.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants