Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support GitHub advanced search for repository selection #482

Merged
merged 7 commits into from Oct 27, 2018
Merged

Support GitHub advanced search for repository selection #482

merged 7 commits into from Oct 27, 2018

Conversation

faec
Copy link

@faec faec commented Oct 22, 2018

Resolves #123.

With this change, if github.repositoryQuery is set to anything other than "public", "affiliated" or "none" it is treated as a GitHub advanced repository search.

This PR updates CHANGELOG.md

@sourcegraph-bot
Copy link
Contributor

Thanks for the contribution, @faec!

You should receive feedback on your pull request within a few days. If you haven't already, please read through the contributing guide, and ensure that you've signed the CLA.

Did you run into any issues when creating this PR? Please describe them in an issue so we can make the experience better for the next contributor.

@faec faec changed the title WIP: Add organization-level repo selection for GitHub hosts Support GitHub advanced search for repository selection Oct 23, 2018
@faec
Copy link
Author

faec commented Oct 23, 2018

Tests are still pending, but the implementation is ready for review.

var repos []*github.Repository
var rateLimitCost int
var err error
repos, hasNextPage, rateLimitCost, err = c.client.ListRepositoriesForSearch(ctx, repositoryQuery, page)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the search endpoint has a separate rate limit then I think there is a problem here because the remaining rate limit is stored in the client. If a single client is shared between search and non-search requests, then the rate limit will keep getting overwritten.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you're right. Fundamentally the client assumes that there's only one rate limit to track, and it'll use whichever one it last processed a response from.

How fancy do we want to get to do the right thing here? On the simple end is just using a fixed delay for the search queries, since we only make them from one place so far. At the fancier end the client could maintain multiple rate limits, maybe keyed by the API being called. It might also be possible to factor rate information out of the client and get it from individual responses as needed.

My inclination is to go with a simple fixed delay for the initial checkin, but I don't have much context for the importance here, so let me know if you'd prefer something more precise.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A simple suggestion is to have two clients in the githubConnection struct. For this change you would add a new client c.searchClient that is only used right here. That way the rate limits are tracked independently.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I made that change I realized that this also creates two independent repository caches, which is probably undesirable. I think either the client needs to be aware there are multiple rate limits, or the cache needs to be moved up to the connection struct. Or we could punt and put in a one-off delay here unless / until we have another caller for the search API.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My instinct would be to share the cache across two clients.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I tried to do it with minimal exposure of client internals to the connection, see if this approach looks right to you.

@faec
Copy link
Author

faec commented Oct 24, 2018

As for tests, I tried the existing tests but the build seems to be broken:

fae@faefriend ~/g/s/g/s/s/c/r/repos> go test
# github.com/sourcegraph/docsite/markdown
../../../../docsite/markdown/markdown.go:40:18: *blackfriday.Markdown is not a type
../../../../docsite/markdown/markdown.go:79:2: undefined: blackfriday.HTMLRenderer
# github.com/sourcegraph/sourcegraph/pkg/rcache
../../../pkg/rcache/mutex.go:37:35: cannot use pool (type *"github.com/garyburd/redigo/redis".Pool) as type redsync.Pool in array or slice literal:
        *"github.com/garyburd/redigo/redis".Pool does not implement redsync.Pool (wrong type for Get method)
                have Get() "github.com/garyburd/redigo/redis".Conn
                want Get() "github.com/gomodule/redigo/redis".Conn
FAIL    github.com/sourcegraph/sourcegraph/cmd/repo-updater/repos [build failed]

Is this a genuine problem in the current repo or am I misconfigured somehow?

@nicksnyder
Copy link
Contributor

It looks like you have Sourcegraph checked out in GOPATH, which means Go modules are turned off by default. I filed #512 to fix our documentation.

In the meantime you can try GO111MODULE=on go test or just move your sourcegraph/sourcegraph checkout outside of gopath (I recommend the latter).

@faec
Copy link
Author

faec commented Oct 25, 2018

Moving the repo out of GOPATH fixed the testing problem, thanks! There is still a broken test at HEAD though:

-- FAIL: TestUnmarshal (0.00s)
    client_test.go:53: Unexpected error message [{"FieldA": 1}]
        got:  graphql: cannot unmarshal at offset 13: before "[{\"FieldA\": 1"; after "}]": json: cannot unmarshal number into Go struct field result.FieldA of type string
        want: graphql: cannot unmarshal at offset 13: before "[{\"FieldA\": 1"; after "}]": json: cannot unmarshal number into Go value of type string
    client_test.go:53: Unexpected error message [{"FieldA": "hi", "FieldB": "bye"},{"FieldA": "hi"...
        got:  graphql: cannot unmarshal at offset 3414: before ", \"FieldB\": \"bye\"},{\"FieldA\": \"hi\", \"FieldB\": \"bye\"},{\"FieldA\": \"hi\", \"FieldB\": \"bye\"}, {\"FieldA\": 1"; after "}, {\"FieldA\": \"hi\", \"FieldB\": \"bye\"},{\"FieldA\": \"hi\", \"FieldB\": \"bye\"},{\"FieldA\": \"hi\", \"FieldB\": \"b": json: cannot unmarshal number into Go struct field result.FieldA of type string
        want: graphql: cannot unmarshal at offset 3414: before ", \"FieldB\": \"bye\"},{\"FieldA\": \"hi\", \"FieldB\": \"bye\"},{\"FieldA\": \"hi\", \"FieldB\": \"bye\"}, {\"FieldA\": 1"; after "}, {\"FieldA\": \"hi\", \"FieldB\": \"bye\"},{\"FieldA\": \"hi\", \"FieldB\": \"bye\"},{\"FieldA\": \"hi\", \"FieldB\": \"b": json: cannot unmarshal number into Go value of type string
FAIL
exit status 1
FAIL    github.com/sourcegraph/sourcegraph/cmd/repo-updater/internal/externalservice/github     0.026s

That is no longer a blocking problem for me though, so I'll proceed.

@nicksnyder
Copy link
Contributor

I filed an issue for the failing test
#523

@faec
Copy link
Author

faec commented Oct 26, 2018

Addressed comments and added unit tests.

I tried adding a test around the independent rate limits too, since listAllRepositories is hard to test in its current form but that was the most subtle part of the change, but got stalled: I can only mock the Client responses from within the github package which defines it, but the multiple rate limits are only visible from the repos package that defines githubConnection. I can refactor further to make it possible, but wanted to check in (and see if you had alternate suggestions) before putting more hours into it.

Copy link
Contributor

@nicksnyder nicksnyder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding tests!

@nicksnyder
Copy link
Contributor

Can you update the changelong and PR description?

@faec
Copy link
Author

faec commented Oct 26, 2018 via email

@nicksnyder
Copy link
Contributor

Support GitHub advanced search for repository selection

That is the title, I am referring to the first comment on this thread (which will become part of the commit message when we squash merge this into master).

For example, you can delete this part at a minimum:

This is still in progress. The initial implementation uses the "orgs" property suggested by @nicksnyder, but the plan now is to revise it to follow @sqs's ideas later in the thread.

And optionally add a short description.

@faec
Copy link
Author

faec commented Oct 26, 2018 via email

@nicksnyder
Copy link
Contributor

Thanks! Tests for listAllRepositories would be nice but seems out of scope for this change. The fact that the clients are different makes it clear that the rate limits are different.

@sqs any comments?

Copy link
Member

@sqs sqs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@nicksnyder nicksnyder merged commit 9a434e6 into sourcegraph:master Oct 27, 2018
beyang added a commit that referenced this pull request Oct 27, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants