Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support GitHub advanced search for repository selection #482

Merged
merged 7 commits into from Oct 27, 2018

Conversation

@faec
Copy link
Contributor

commented Oct 22, 2018

Resolves #123.

With this change, if github.repositoryQuery is set to anything other than "public", "affiliated" or "none" it is treated as a GitHub advanced repository search.

This PR updates CHANGELOG.md

@faec faec requested review from beyang and sqs as code owners Oct 22, 2018

@sourcegraph-bot

This comment has been minimized.

Copy link

commented Oct 22, 2018

Thanks for the contribution, @faec!

You should receive feedback on your pull request within a few days. If you haven't already, please read through the contributing guide, and ensure that you've signed the CLA.

Did you run into any issues when creating this PR? Please describe them in an issue so we can make the experience better for the next contributor.

@faec faec changed the title WIP: Add organization-level repo selection for GitHub hosts Support GitHub advanced search for repository selection Oct 23, 2018

@faec

This comment has been minimized.

Copy link
Contributor Author

commented Oct 23, 2018

Tests are still pending, but the implementation is ready for review.

var repos []*github.Repository
var rateLimitCost int
var err error
repos, hasNextPage, rateLimitCost, err = c.client.ListRepositoriesForSearch(ctx, repositoryQuery, page)

This comment has been minimized.

Copy link
@nicksnyder

nicksnyder Oct 23, 2018

Member

If the search endpoint has a separate rate limit then I think there is a problem here because the remaining rate limit is stored in the client. If a single client is shared between search and non-search requests, then the rate limit will keep getting overwritten.

This comment has been minimized.

Copy link
@faec

faec Oct 23, 2018

Author Contributor

Ah, you're right. Fundamentally the client assumes that there's only one rate limit to track, and it'll use whichever one it last processed a response from.

How fancy do we want to get to do the right thing here? On the simple end is just using a fixed delay for the search queries, since we only make them from one place so far. At the fancier end the client could maintain multiple rate limits, maybe keyed by the API being called. It might also be possible to factor rate information out of the client and get it from individual responses as needed.

My inclination is to go with a simple fixed delay for the initial checkin, but I don't have much context for the importance here, so let me know if you'd prefer something more precise.

This comment has been minimized.

Copy link
@nicksnyder

nicksnyder Oct 23, 2018

Member

A simple suggestion is to have two clients in the githubConnection struct. For this change you would add a new client c.searchClient that is only used right here. That way the rate limits are tracked independently.

This comment has been minimized.

Copy link
@faec

faec Oct 24, 2018

Author Contributor

When I made that change I realized that this also creates two independent repository caches, which is probably undesirable. I think either the client needs to be aware there are multiple rate limits, or the cache needs to be moved up to the connection struct. Or we could punt and put in a one-off delay here unless / until we have another caller for the search API.

This comment has been minimized.

Copy link
@nicksnyder

nicksnyder Oct 24, 2018

Member

My instinct would be to share the cache across two clients.

This comment has been minimized.

Copy link
@faec

faec Oct 25, 2018

Author Contributor

Done. I tried to do it with minimal exposure of client internals to the connection, see if this approach looks right to you.

@faec

This comment has been minimized.

Copy link
Contributor Author

commented Oct 24, 2018

As for tests, I tried the existing tests but the build seems to be broken:

fae@faefriend ~/g/s/g/s/s/c/r/repos> go test
# github.com/sourcegraph/docsite/markdown
../../../../docsite/markdown/markdown.go:40:18: *blackfriday.Markdown is not a type
../../../../docsite/markdown/markdown.go:79:2: undefined: blackfriday.HTMLRenderer
# github.com/sourcegraph/sourcegraph/pkg/rcache
../../../pkg/rcache/mutex.go:37:35: cannot use pool (type *"github.com/garyburd/redigo/redis".Pool) as type redsync.Pool in array or slice literal:
        *"github.com/garyburd/redigo/redis".Pool does not implement redsync.Pool (wrong type for Get method)
                have Get() "github.com/garyburd/redigo/redis".Conn
                want Get() "github.com/gomodule/redigo/redis".Conn
FAIL    github.com/sourcegraph/sourcegraph/cmd/repo-updater/repos [build failed]

Is this a genuine problem in the current repo or am I misconfigured somehow?

@nicksnyder

This comment has been minimized.

Copy link
Member

commented Oct 24, 2018

It looks like you have Sourcegraph checked out in GOPATH, which means Go modules are turned off by default. I filed #512 to fix our documentation.

In the meantime you can try GO111MODULE=on go test or just move your sourcegraph/sourcegraph checkout outside of gopath (I recommend the latter).

@faec

This comment has been minimized.

Copy link
Contributor Author

commented Oct 25, 2018

Moving the repo out of GOPATH fixed the testing problem, thanks! There is still a broken test at HEAD though:

-- FAIL: TestUnmarshal (0.00s)
    client_test.go:53: Unexpected error message [{"FieldA": 1}]
        got:  graphql: cannot unmarshal at offset 13: before "[{\"FieldA\": 1"; after "}]": json: cannot unmarshal number into Go struct field result.FieldA of type string
        want: graphql: cannot unmarshal at offset 13: before "[{\"FieldA\": 1"; after "}]": json: cannot unmarshal number into Go value of type string
    client_test.go:53: Unexpected error message [{"FieldA": "hi", "FieldB": "bye"},{"FieldA": "hi"...
        got:  graphql: cannot unmarshal at offset 3414: before ", \"FieldB\": \"bye\"},{\"FieldA\": \"hi\", \"FieldB\": \"bye\"},{\"FieldA\": \"hi\", \"FieldB\": \"bye\"}, {\"FieldA\": 1"; after "}, {\"FieldA\": \"hi\", \"FieldB\": \"bye\"},{\"FieldA\": \"hi\", \"FieldB\": \"bye\"},{\"FieldA\": \"hi\", \"FieldB\": \"b": json: cannot unmarshal number into Go struct field result.FieldA of type string
        want: graphql: cannot unmarshal at offset 3414: before ", \"FieldB\": \"bye\"},{\"FieldA\": \"hi\", \"FieldB\": \"bye\"},{\"FieldA\": \"hi\", \"FieldB\": \"bye\"}, {\"FieldA\": 1"; after "}, {\"FieldA\": \"hi\", \"FieldB\": \"bye\"},{\"FieldA\": \"hi\", \"FieldB\": \"bye\"},{\"FieldA\": \"hi\", \"FieldB\": \"b": json: cannot unmarshal number into Go value of type string
FAIL
exit status 1
FAIL    github.com/sourcegraph/sourcegraph/cmd/repo-updater/internal/externalservice/github     0.026s

That is no longer a blocking problem for me though, so I'll proceed.

@nicksnyder

This comment has been minimized.

Copy link
Member

commented Oct 25, 2018

I filed an issue for the failing test
#523

@faec

This comment has been minimized.

Copy link
Contributor Author

commented Oct 26, 2018

Addressed comments and added unit tests.

I tried adding a test around the independent rate limits too, since listAllRepositories is hard to test in its current form but that was the most subtle part of the change, but got stalled: I can only mock the Client responses from within the github package which defines it, but the multiple rate limits are only visible from the repos package that defines githubConnection. I can refactor further to make it possible, but wanted to check in (and see if you had alternate suggestions) before putting more hours into it.

@nicksnyder
Copy link
Member

left a comment

Thanks for adding tests!

@nicksnyder

This comment has been minimized.

Copy link
Member

commented Oct 26, 2018

Can you update the changelong and PR description?

@faec

This comment has been minimized.

Copy link
Contributor Author

commented Oct 26, 2018

@nicksnyder

This comment has been minimized.

Copy link
Member

commented Oct 26, 2018

Support GitHub advanced search for repository selection

That is the title, I am referring to the first comment on this thread (which will become part of the commit message when we squash merge this into master).

For example, you can delete this part at a minimum:

This is still in progress. The initial implementation uses the "orgs" property suggested by @nicksnyder, but the plan now is to revise it to follow @sqs's ideas later in the thread.

And optionally add a short description.

@faec

This comment has been minimized.

Copy link
Contributor Author

commented Oct 26, 2018

@nicksnyder

This comment has been minimized.

Copy link
Member

commented Oct 26, 2018

Thanks! Tests for listAllRepositories would be nice but seems out of scope for this change. The fact that the clients are different makes it clear that the rate limits are different.

@sqs any comments?

@sqs

sqs approved these changes Oct 26, 2018

Copy link
Member

left a comment

Looks great!

@nicksnyder nicksnyder merged commit 9a434e6 into sourcegraph:master Oct 27, 2018

beyang added a commit that referenced this pull request Oct 27, 2018

nicksnyder added a commit that referenced this pull request Oct 30, 2018

nicksnyder added a commit that referenced this pull request Oct 30, 2018

nicksnyder added a commit that referenced this pull request Oct 30, 2018

nicksnyder added a commit that referenced this pull request Oct 30, 2018

nicksnyder added a commit that referenced this pull request Oct 30, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.