Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

repospec: support ssh urls with ssh certificates #4741

Closed
wants to merge 9 commits into from
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
33 changes: 23 additions & 10 deletions api/internal/git/repospec.go
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@ import (
"fmt"
"net/url"
"path/filepath"
"regexp"
"strconv"
"strings"
"time"
Expand Down Expand Up @@ -212,18 +213,26 @@ func peelQuery(arg string) (string, string, time.Duration, bool) {
return parsed.Path, ref, duration, submodules
}

var userRegex = regexp.MustCompile(`^[a-zA-Z][a-zA-Z0-9-]*@`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm for sure being pedantic, but:

  • Should the * be a +? I'd expect we wouldn't want to match an @ with nothing before it.
  • Should the ^ be \A?

Copy link
Contributor Author

@mightyguava mightyguava Sep 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The character class [a-zA-Z] already ensures a character before @. It is standalone from the second character class because the first character cannot be a number or -.

I've never seen \A before. Looking at the docs, its behavior is the same as ^ when multiline is not enabled. I think it's better to stick with ^ as it's more common.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're totally right about the initial character class 🤦‍♀️

I think \A is perfectly standard. I first learned about it in the context of writing safe regexes in a web app context (see the very first entry here for example). Testing some examples with Go indicates that it unexpectedly (to me) doesn't allow the newline already, and indeed it looks like it treats ^ as \A by default: https://github.com/golang/go/blob/master/src/regexp/syntax/parse.go#L819-L825


func parseHostSpec(n string) (string, string) {
var host string
// Start accumulating the host part.
for _, p := range []string{
// Order matters here.
"git::", "gh:", "ssh://", "https://", "http://", "file://",
"git@", "github.com:", "github.com/"} {
if len(p) < len(n) && strings.ToLower(n[:len(p)]) == p {
n = n[len(p):]
host += p
consumeHostStrings := func(parts []string) {
for _, p := range parts {
if len(p) < len(n) && strings.ToLower(n[:len(p)]) == p {
n = n[len(p):]
host += p
}
}
}
// Start accumulating the host part.
// Order matters here.
consumeHostStrings([]string{"git::", "gh:", "ssh://", "https://", "http://", "file://"})
if p := userRegex.FindString(n); p != "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we end up forever supporting everything that we make work, I'm trying to wrap my head around everything this makes possible in addition to what if anything it could break.

  • Since we're relying on the @ in effect, what other valid paths might contain it, and does our parsing of them change before/after? For example, https://username:password@github.com/username/repository.git seems to parse currently, albeit discarding the username/password parts.
  • For the purposes of the feature you need, we only need to add support for arbitrary usernames when SSH in particular is being used with Github specifically, but will this make anything else new work? It seems to me arbitrary protocol prefixes will be accepted with arbitrary usernames, but only for Github. Other hosts will continue to only work with the Git user. Should we further restrict this? Any new combos that would work should get test coverage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://username:password@github.com/username/repository.git seems to parse currently, albeit discarding the username/password parts.

I have added a test to preserve the behavior.

I've also updated the regexp to specifically match on SSH only for GitHub.com

n = n[len(p):]
host += p
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm right about the simplification from the git protocol not being valid with Github, then I think the remaining source of correctness risk is for any non-Github URLs that work if they contain usernames today. Notably, I found an interaction between this code and the non-Github handling at L261, which expects the protocol to have been the last thing parsed. For example, the following smoke test passes on master and fails (with poor host parsing) on this branch:

		{
			name:      "t71",
			input:     "ssh://myusername@bitbucket.org/ourteamname/ourrepositoryname.git//path?ref=branch",
			cloneSpec: "ssh://myusername@bitbucket.org/ourteamname/ourrepositoryname.git",
			absPath:   notCloned.Join("path"),
			repoSpec: RepoSpec{
				Host:      "ssh://myusername@bitbucket.org/",
				OrgRepo:   "ourteamname/ourrepositoryname",
				Path:      "/path",
				Ref:       "branch",
				GitSuffix: ".git",
			},
		},

I suspect we need to make more changes to this method to both make this work and leave the method in a state that makes some sense.

}
consumeHostStrings([]string{"github.com:", "github.com/"})
if host == "git@" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Playing devil's advocate with my own comment above: arguably this conditional here is our current username handling, and we could reduce special casing in the implementation by changing it to apply whenever we've found any username, and allow it to consume any domain, rather than just github.com above. That would add risk in the form of additional supported permutations we can't specifically integration test... but in another way also reduce it by making use cases like the new one in this PR use the code paths we already have integration coverage for (not to mention reducing the difficulty of understanding this code). Wdyt?

cc @natasha41575 for another opinion on this tradeoff

Copy link
Contributor Author

@mightyguava mightyguava Sep 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledging I read this comment but don't have an opinion. I do like the idea! I'd prefer not to include that change in this PR though, since it would probably be much bigger.

i := strings.Index(n, "/")
if i > -1 {
Expand Down Expand Up @@ -257,10 +266,14 @@ func parseHostSpec(n string) (string, string) {

func normalizeGitHostSpec(host string) string {
s := strings.ToLower(host)
m := userRegex.FindString(host)
if strings.Contains(s, "github.com") {
if strings.Contains(s, "git@") || strings.Contains(s, "ssh:") {
switch {
case m != "":
host = m + "github.com:"
case strings.Contains(s, "ssh:"):
host = "git@github.com:"
} else {
default:
host = "https://github.com/"
}
}
Expand Down
2 changes: 2 additions & 0 deletions api/internal/git/repospec_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,8 @@ func TestNewRepoSpecFromUrl_Permute(t *testing.T) {
{"git::https://git.example.com/", "https://git.example.com/"},
KnVerey marked this conversation as resolved.
Show resolved Hide resolved
{"git@github.com:", "git@github.com:"},
{"git@github.com/", "git@github.com:"},
{"org-12345@github.com:", "org-12345@github.com:"},
{"org-12345@github.com/", "org-12345@github.com:"},
KnVerey marked this conversation as resolved.
Show resolved Hide resolved
}
var orgRepos = []string{"someOrg/someRepo", "kubernetes/website"}
var pathNames = []string{"README.md", "foo/krusty.txt", ""}
Expand Down