Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Repo name mappings are not universally supported #462

Open
sqs opened this Issue Oct 20, 2018 · 9 comments

Comments

Projects
None yet
7 participants
@sqs
Copy link
Member

sqs commented Oct 20, 2018

  • As a site admin, I want to be able to make my repositories on Sourcegraph have "nice names". (Not all site admins want this, but some do.) By that I mean I want them to be no longer than needed, like foo/bar instead github.com/foo/bar, or baz/qux instead of my-corporate-github-enterprise.internal.northamerica.my-long-company-name.net/baz/qux.

    It is important to make them short because

    1. it makes Sourcegraph URLs shorter, which makes them easier to read and share
    2. it makes many repo: filters easier to type
    3. places in the Sourcegraph UI that show the full repo name don't overflow due to taking up too much screen width.
  • As a user, I want to be able to jump from the repository on my code host or in my editor to the repository on Sourcegraph, even if the repository is not the full long name on Sourcegraph.

Current state:

  • repositoryPathPattern has too much power and flexibility. It was a mistake to add such an option (it is my (@sqs) fault). The only cases we need to support are (1) the "fully-qualified" repo names (with hostname, eg my-corporate-github-enterprise.internal.northamerica.my-long-company-name.net/baz/qux) or the path components only (eg baz/qux).
    • I (@sqs) am not aware of any customers using more complex configs than that. If we discover any are, we can publish docs for adding nginx rewrite rules instead of needing to handle backcompat for these more complex cases.

Properties of solution:

  • It is acceptable but not necessary for both the fully-qualified name and the path-only name to work.
  • It is acceptable for all repositories from a given external service to have the same kind of naming (i.e., fully-qualified or path-only). It is not necessary to be able to set this on a per-repository basis.
  • If the site admin changes how repositories are named, it must not leave a bunch of stray repositories around with the old naming scheme.
  • Every instance in code where we check for a github.com/ string prefix to a repo name should probably be fixed.
  • In all external service types other than OTHER, the source of truth of the name should be the code host. We should not allow users to come up with a custom alias for a repository that is hard-coded, because this will go stale if the repository is renamed.

Useful initial decisions:

  • What is the API that the browser extension uses to map from a repository on a code host to the repository on Sourcegraph? What are the inputs?
  • Same, but for editor extensions.
  • Should the work for this issue only consist of the backend work, or also the work to actually update the browser extension and editor extensions to work with this new way of naming?

Related:

Customers (not exhaustive):

@sqs sqs added this to the November 2018 milestone Oct 20, 2018

@sqs sqs added the plan label Oct 20, 2018

@sqs sqs unassigned sqs and beyang Oct 25, 2018

@sqs sqs removed plan labels Nov 9, 2018

@sqs sqs modified the milestones: November 2018, 3.0-preview Nov 10, 2018

@nicksnyder nicksnyder modified the milestones: 3.0-preview, 3.0, 3.1 Dec 14, 2018

@nicksnyder nicksnyder assigned tsenart and unassigned keegancsmith Dec 14, 2018

@nicksnyder

This comment has been minimized.

Copy link
Member

nicksnyder commented Feb 11, 2019

I don't think this should be prioritized for 3.1.

  1. It feels like less of a priority than #2025
  2. It isn't clear to me what the solution is
  3. It isn't clear to me that this can be solved by 3.1 (i.e. this week).
@tsenart

This comment has been minimized.

Copy link
Contributor

tsenart commented Feb 12, 2019

  1. It isn't clear to me that this can be solved by 3.1 (i.e. this week).

Thanks for the estimation. Let's postpone it then.

@tsenart tsenart modified the milestones: 3.1, Backlog Feb 12, 2019

@keegancsmith keegancsmith self-assigned this Feb 12, 2019

@keegancsmith

This comment has been minimized.

Copy link
Member

keegancsmith commented Feb 12, 2019

Do we have any customers relying on this feature? Is there potential for removing this feature and maybe adding it back in another way. For example we have been thinking about implications of #2025, and one of our mentioned solutions is aliases instead of user controlled names in the repo table. IE multiple names mapping to a repo, instead of a unique name.

@nicksnyder

This comment has been minimized.

Copy link
Member

nicksnyder commented Feb 12, 2019

I believe that we do have customers who use this.

Personally, I like the idea of always having a full "canonical" name, and then allowing sites to configure a (single) alias.

When an alias is configured, both should work, but I would probably expect the full canonical name to redirect to the alias.

@dobrou

This comment has been minimized.

Copy link

dobrou commented Apr 15, 2019

Hi,
I'd like to vote for fixing this issue.

Better said, to fix browser extension integration when "repositoryPathPattern" is configured to non-default pattern on sourcegraph.

In bitbucket configuration, we define: "repositoryPathPattern": "{projectKey}/{repositorySlug}"
Reason is that we have only one source code server, so presense of it's url in repo id is useless.
While we have many repositories (100+) and many searches returns results from 10+ repositories.

So BIG advantages of custom repo id pattern are:

  • much cleaner search results page - eg. repo filters buttons are smaller with shorter repo names
  • it's easier to write "repo:..." in search query

However we really miss the browser extension integration with bitbucket server, caused by "bug" described here.

As I read the 'alias' proposal by @nicksnyder , this would work perfectly in our case.

Thank you

@sqs

This comment has been minimized.

Copy link
Member Author

sqs commented Apr 15, 2019

Thank you, @dobrou. This is the 2nd highest priority issue after we ship #2025 and follow up on any other directly related tasks from that. We will have more details and (likely) a timeline next week after we discuss it.

@keegancsmith

This comment has been minimized.

Copy link
Member

keegancsmith commented Apr 18, 2019

I have in mind a PoC I'd like to experiment with. It is inspired by the updated description from @sqs. It will both help me understand the scope of a solution and is quick to implement (probably an hour to implement, full day of testing our different clients / use cases).

PoC Proposal

  1. Put the name without repositoryPathPattern into the unused uri column. This will be done via the syncer.
  2. Only update db.Repos.Get() to fallback looking up by uri iff looking up by name fails. In the case lookup via uri works, return a redirect error to name.

I'm unsure what would break, but on the face of it this would work quite well. I believe some of our language servers make assumptions on the name. In that case they would still fail, but possibly we can include uri as part of the repo API.

Potentially uri would have a uniqueness constraint and the syncer handles renames/deletions on it like we do for name. However, I will avoid this in the PoC and just deterministically pick a winner in the case of multiple rows with the same uri. (In theory this shouldn't happen thanks to constraints on the external_* columns).

@sqs, @tsenart: Can you poke holes into this idea in two ways? Does this solution match up with the constraints laid out? What technically would not work?

@sqs

This comment has been minimized.

Copy link
Member Author

sqs commented Apr 18, 2019

I believe some of our language servers make assumptions on the name. In that case they would still fail, but possibly we can include uri as part of the repo API.

What kinds of assumptions do you have in mind, and what failure case are you thinking of? Can you give a specific example to help me understand? Like Go import paths being (by default) the repo name?

I don't have a good understanding of the technical details anymore, but the high-level idea of falling back to the "fully qualified" name and returning a redirect in that case makes sense and sounds like it would work.

@keegancsmith

This comment has been minimized.

Copy link
Member

keegancsmith commented Apr 18, 2019

@sqs

If the site admin changes how repositories are named, it must not leave a bunch of stray repositories around with the old naming scheme.

FYI this is already supported in 3.3 release due to the changes in #2025

Every instance in code where we check for a github.com/ string prefix to a repo name should probably be fixed.

Most instances of this will be solved by the RepoLookup cleanup work we plan to do in 3.4.

Next Steps: Write down a test plan to validate PoC. I'll implement it today and leave it in a state where @tsenart could possibly pick it up when I go for vacation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.