Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ISSUE-2674 Strip sensitive data from the url #6417

Merged
merged 20 commits into from Oct 7, 2022

Conversation

MALPI
Copy link
Contributor

@MALPI MALPI commented Aug 3, 2022

Fixes #2674 by replacing basic auth information as part of the URL with username:password.

@MALPI MALPI requested a review from a team as a code owner August 3, 2022 14:12
@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Aug 3, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

@mateuszrzeszutek
Copy link
Member

Hey @MALPI ,
Can you sign the CLA?

@MALPI
Copy link
Contributor Author

MALPI commented Aug 4, 2022

hey @mateuszrzeszutek,

thanks for reaching out. We would need to sing a Company CLA. Could you provide us a PDF or something like that?

@mateuszrzeszutek
Copy link
Member

mateuszrzeszutek commented Aug 4, 2022

Could you provide us a PDF or something like that?

I honestly have no idea where to get it from. Maybe you can try the "please submit a support request ticket" link in the CLA bot message above; or maybe you can ask around in the CNCF #opentelemetry slack.

@MALPI
Copy link
Contributor Author

MALPI commented Aug 5, 2022

I am trying to sort this out with the legal department @mateuszrzeszutek. Is there anything that I can improve on the PR itself meanwhile?

return url;
}
// replace username & password
return url.replaceFirst("(?<=\\/\\/)(.+)(?=@)", "username:password");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regex pattern matching is rather expensive; even more so when you're not using a precompiled Pattern.

How about using indexOf() to find out the positions of // and @ instead?
Also, you could just simply remove the credentials (together with the @ character); this would be more in line with the spec:

http.url MUST NOTcontain credentials passed via URL in form of https://username:password@www.example.com/. In such case the attribute's value should be https://www.example.com/

MALPI and others added 2 commits August 5, 2022 15:01
…trumentation/api/instrumenter/http/HttpClientAttributesExtractor.java

Co-authored-by: Mateusz Rzeszutek <mrzeszutek@splunk.com>
…trumentation/api/instrumenter/http/HttpClientAttributesExtractor.java

Co-authored-by: Mateusz Rzeszutek <mrzeszutek@splunk.com>
@MALPI MALPI closed this Aug 22, 2022
@MALPI MALPI reopened this Aug 22, 2022
@MALPI MALPI closed this Aug 23, 2022
@MALPI MALPI reopened this Aug 23, 2022
Comment on lines 67 to 68
if(url.contains("@")) {
return url.substring(0, url.indexOf("//")+2) + url.substring(url.indexOf("@")+1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: no need to lookup "@" twice:

Suggested change
if(url.contains("@")) {
return url.substring(0, url.indexOf("//")+2) + url.substring(url.indexOf("@")+1);
int atIndex = url.indexOf("@");
if (atIndex != -1) {
return url.substring(0, url.indexOf("//")+2) + url.substring(atIndex+1);

@mateuszrzeszutek
Copy link
Member

@MALPI can you run ./gradlew spotlessApply? I think the formatting might need to be fixed

Copy link
Member

@mateuszrzeszutek mateuszrzeszutek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍
Thanks @MALPI !

int atIndex = url.indexOf("@");
// replace username & password
if (atIndex != -1) {
return url.substring(0, url.indexOf("//") + 2) + url.substring(atIndex + 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the url is http://somehost?foo=b@r or http://somehost/p@ge? I believe you should check that @ occurs before any of /?# and strip only then. Secondly can we be certain that the url actually contains //?

Copy link
Contributor Author

@MALPI MALPI Aug 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, will add the check for query parameters. I know that basic auth works as well with ftp, but this would require a protocol prefix too? I could add a check for // before @?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mateuszrzeszutek mateuszrzeszutek dismissed their stale review September 6, 2022 13:23

Lauri pointed out a scenario that needs to be implemented

@MALPI MALPI closed this Sep 19, 2022
@MALPI MALPI reopened this Sep 19, 2022
@MALPI
Copy link
Contributor Author

MALPI commented Sep 21, 2022

@laurit @mateuszrzeszutek @trask

Can you have another look?

MALPI and others added 5 commits September 27, 2022 15:49
…trumentation/api/instrumenter/http/HttpClientAttributesExtractor.java

Co-authored-by: Trask Stalnaker <trask.stalnaker@gmail.com>
…trumentation/api/instrumenter/http/HttpClientAttributesExtractor.java

Co-authored-by: Trask Stalnaker <trask.stalnaker@gmail.com>
…trumentation/api/instrumenter/http/HttpClientAttributesExtractor.java

Co-authored-by: Trask Stalnaker <trask.stalnaker@gmail.com>
…trumentation/api/instrumenter/http/HttpClientAttributesExtractor.java

Co-authored-by: Trask Stalnaker <trask.stalnaker@gmail.com>
@MALPI
Copy link
Contributor Author

MALPI commented Oct 4, 2022

hey @trask @mateuszrzeszutek anything else?

}

int atIndex = url.lastIndexOf("@", index - 1);
int questionMarkIndex = url.indexOf("?");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is questionMarkIndex really needed?

int questionMarkIndex = url.indexOf("?");

// '@' char is present and is not a query param
if (atIndex == -1 || (questionMarkIndex != -1 && atIndex > index)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should also check whether @ is the last character and if it is return the url as is. Try with https://github.com@ java.net.URI parses it so that github.com@ is authority not userinfo.

int questionMarkIndex = url.indexOf("?");

// '@' char is present and is not a query param
if (atIndex == -1 || (questionMarkIndex != -1 && atIndex > index)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

atIndex > index will always be false -- as Lauri's mentioned, you don't need that questionMarkIndex variable.

@mateuszrzeszutek
Copy link
Member

hey @trask @mateuszrzeszutek anything else?

Sorry, I was on vacation, I didn't check GH last week at all.

@MALPI
Copy link
Contributor Author

MALPI commented Oct 5, 2022

All good. Hope you enjoyed your vacation.

To be honest this PR got slightly out of hands and I am wondering if it was really the smartest decision to change the implementation from the Regex I had in the beginning to this.

We now have to exclude all the cases where an @ is not expected, and I'm by no means expert on URLs. Whereas we were looking for what we expect beforehand which makes it way more deterministic.

@mateuszrzeszutek
Copy link
Member

To be honest this PR got slightly out of hands and I am wondering if it was really the smartest decision to change the implementation from the Regex I had in the beginning to this.

Unfortunately regex patterns can be quite slow; and since this code runs on the hot path of every HTTP client call, it needs to be as fast as possible, and create as little garbage as possible. While manual iterating over the string can be inconvenient, it's faster.

MALPI and others added 3 commits October 5, 2022 13:53
…trumentation/api/instrumenter/http/HttpClientAttributesExtractor.java

Co-authored-by: Mateusz Rzeszutek <mrzeszutek@splunk.com>
# Conflicts:
#	instrumentation-api-semconv/src/main/java/io/opentelemetry/instrumentation/api/instrumenter/http/HttpClientAttributesExtractor.java
@mateuszrzeszutek mateuszrzeszutek merged commit 7117f42 into open-telemetry:main Oct 7, 2022
@mateuszrzeszutek
Copy link
Member

Thanks @MALPI ! 🚀

LironKS pushed a commit to helios/opentelemetry-java-instrumentation that referenced this pull request Oct 23, 2022
Fixes open-telemetry#2674 by replacing basic auth information as part of the URL with
`username:password`.

Co-authored-by: Malte <MALPI@users.noreply.github.com>
Co-authored-by: Mateusz Rzeszutek <mrzeszutek@splunk.com>
Co-authored-by: Trask Stalnaker <trask.stalnaker@gmail.com>
LironKS pushed a commit to helios/opentelemetry-java-instrumentation that referenced this pull request Oct 31, 2022
Fixes open-telemetry#2674 by replacing basic auth information as part of the URL with
`username:password`.

Co-authored-by: Malte <MALPI@users.noreply.github.com>
Co-authored-by: Mateusz Rzeszutek <mrzeszutek@splunk.com>
Co-authored-by: Trask Stalnaker <trask.stalnaker@gmail.com>
LironKS pushed a commit to helios/opentelemetry-java-instrumentation that referenced this pull request Dec 4, 2022
Fixes open-telemetry#2674 by replacing basic auth information as part of the URL with
`username:password`.

Co-authored-by: Malte <MALPI@users.noreply.github.com>
Co-authored-by: Mateusz Rzeszutek <mrzeszutek@splunk.com>
Co-authored-by: Trask Stalnaker <trask.stalnaker@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HTTP Span Attributes: http.url must not contain username / password
4 participants