Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a more general (and faster) method to sanitize URLs with credentials #19239

Merged
merged 16 commits into from
Mar 31, 2022

Conversation

wxiaoguang
Copy link
Contributor

As discussed in

We can use a more general (and faster) method to sanitize URLs with credentials

Benefits:

  • Simple and intuitive
    • before: util.NewStringURLSanitizer(opts.Remote, true).Replace(opts.Remote)
    • now: util.SanitizeCredentialURLs(opts.Remote)
  • For errors:
    • before: get a remoteAddr; util.NewURLSanitizedError(err, remoteAddr, true)
    • now: util.SanitizeErrorCredentialURLs(err)
  • Faster, fast return when no credential, one time allocation for a bytes buffer for most cases.
  • The new implemention can remove all credentials in all URLs, while the old one could only remove the only one targeted URL.

@wxiaoguang wxiaoguang added type/refactoring Existing code has been cleaned up. There should be no new functionality. skip-changelog This PR is irrelevant for the (next) changelog, for example bug fixes for unreleased features. labels Mar 28, 2022
@wxiaoguang wxiaoguang added this to the 1.17.0 milestone Mar 28, 2022
"https://" + userPlaceholder + "@github.com/go-gitea/test_repo.git",
},
// case 3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still need this normal cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which case? I think I have covered most.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean why we delete these original test cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All code are rewritten, all new cases cover old ones. If you feel which is missing, please just point out and add it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And many cases are indeed the old cases, for example, these github.com cases.

@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Mar 28, 2022
modules/util/sanitize.go Outdated Show resolved Hide resolved
Co-authored-by: zeripath <art27@cantab.net>
Copy link
Contributor

@Gusted Gusted left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems okay... The current code seems to be indeed faster:

BenchmarkNew-12          3575206               368.0 ns/op           320 B/op          2 allocs/op
BenchmarkOld-12           373596              2956 ns/op            3288 B/op          9 allocs/op

A relative minor issue that comes with this, is that the current code will "handle" URL's like ://@ into -> ://sanitized-credential@ whereby the old function would return a error that it is a incorrect URL, but like I said relative minor issue.

@GiteaBot GiteaBot added lgtm/need 1 This PR needs approval from one additional maintainer to be merged. and removed lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. labels Mar 28, 2022
@wxiaoguang
Copy link
Contributor Author

Seems okay... The current code seems to be indeed faster:

BenchmarkNew-12          3575206               368.0 ns/op           320 B/op          2 allocs/op
BenchmarkOld-12           373596              2956 ns/op            3288 B/op          9 allocs/op

A relative minor issue that comes with this, is that the current code will "handle" URL's like ://@ into -> ://sanitized-credential@ whereby the old function would return a error that it is a incorrect URL, but like I said relative minor issue.

Thank you for the benchmark!

In my mind the major purpose is to make code more simple and intuitive, the performance is just a bonus.

And the edge case ://@ is fixed now, the minimal pattern is "s://u@h" now

@Gusted
Copy link
Contributor

Gusted commented Mar 29, 2022

Why the skip-changelog?

@wxiaoguang
Copy link
Contributor Author

Why the skip-changelog?

this PR is just a refactoring, it doesn't change the expected output, and doesn't improve performance obviously (because the sanitizer isn't called too much), so I think it's not necessary to mention it in the changelog, it doesn't provide useful information to end users.

feel free to adjust the labels.

u.User = url.User(userPlaceholder)
} else {
u.User = nil
// SanitizeCredentialURLs remove all credentials in URLs for the input string: "https://user:pass@domain.com" => "https://sanitized-credential@domain.com"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code only removes the credential from the first URL it finds in the string

If we want to strip out credentials from all urls in the string we're gonna need to do something else. (Possibly iterate across all "://" in the string however see below...)

One other issue is that there are URLs which do not have the ":", for example //username:password@www.google.com which says use the current protocol. I think /\username:password@www.google.com also works.

IMO I don't think we need to necessarily sanitize these - we should probably just make it clear that we're not going to do this.

Copy link
Contributor Author

@wxiaoguang wxiaoguang Mar 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I will add it to comment. IMO "//xxxx" couldn't be treated as a valid URL in strings. It's only valid in a context with scheme already (say, inside a browser)

And /\ is not standard, it's just a browser's dirty hack. Golang's url.Parse doesn't support /\ either.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// SanitizeCredentialURLs remove all credentials in URLs for the input string: "https://user:pass@domain.com" => "https://sanitized-credential@domain.com"
// SanitizeCredentialURLs remove all credentials in URLs (starting with "scheme://") for the input string: "https://user:pass@domain.com" => "https://sanitized-credential@domain.com"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"This code only removes the credential from the first URL it finds in the string"

No, it removes all. See the test case

		{
			"URLs in log https://u:b@h and https://u:b@h:80/, with https://h.com and u@h.com",
			"URLs in log https://" + userPlaceholder + "@h and https://" + userPlaceholder + "@h:80/, with https://h.com and u@h.com",
		},

Co-authored-by: zeripath <art27@cantab.net>
@GiteaBot GiteaBot added lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. and removed lgtm/need 1 This PR needs approval from one additional maintainer to be merged. labels Mar 30, 2022
@wxiaoguang wxiaoguang merged commit c831681 into go-gitea:main Mar 31, 2022
@wxiaoguang wxiaoguang deleted the refactor-url-sanitizer branch March 31, 2022 02:25
zjjhot added a commit to zjjhot/gitea that referenced this pull request Mar 31, 2022
* giteaofficial/main:
  Update reserved usernames list (go-gitea#18438)
  Configure OpenSSH log level via Environment in Docker (go-gitea#19274)
  Use a more general (and faster) method to sanitize URLs with credentials (go-gitea#19239)
  [skip ci] Updated translations via Crowdin
  fix link to package registry docs (go-gitea#19268)
  Add Redis Sentinel Authentication Support (go-gitea#19213)
@go-gitea go-gitea locked and limited conversation to collaborators Apr 28, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. skip-changelog This PR is irrelevant for the (next) changelog, for example bug fixes for unreleased features. type/refactoring Existing code has been cleaned up. There should be no new functionality.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants