Enable indexed search by default on sourcegraph/server #2176

Open
sqs opened this Issue Feb 6, 2019 · 14 comments

@sqs
Member

sqs commented Feb 6, 2019

Indexed search is currently not enabled by default for sourcegraph/server because it has unbounded resource requirements that @keegancsmith (rightly) deemed likely to overwhelm single-node instances. It is enabled by default on local dev and on cluster deployments. See https://sourcegraph.slack.com/archives/C07KZF47K/p1549416905712500?thread_ts=1549416905.712500 for context.

It would be good to enable it by default on sourcegraph/server (and do the work necessary to ensure that does not overwhelm a server).

Related to #2165 (that is about docs).

cc @slimsag

Customers: https://app.hubspot.com/contacts/2762526/company/887065580

@sqs sqs added this to the Backlog milestone Feb 6, 2019

@slimsag

Member

slimsag commented Mar 21, 2019

Note: Enabling indexed search on a sourcegraph/server instance and gradually adding repositories is mostly OK because it then consumes resources gradually. However, if you enable it after adding a lot of repositories to your instance, it can quickly consume all resources and effectively take down the instance or render it unusable.

@keegancsmith

Member

keegancsmith commented Apr 1, 2019

Why would zoekt-sourcegraph-indexserver have a hard time if indexing is enabled after thousands of repos have been added to an instance?

zoekt-sourcegraph-indexserver is bounded in the amount of resources it uses. In particular, it has a tunable maximum number of index jobs it runs at one time (related to the number of cores). Each job tries to use as much CPU as possible but is bounded in the amount of RAM it will use: it essentially shards a repo into chunks of roughly 100mb of index output, so it should use roughly 100mb * jobs when indexing at capacity. Note: I may be remembering the maximum shard size incorrectly.
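
To make the bound concrete, here is a back-of-envelope sketch; the ~100mb shard size and the jobs-per-core assumption come from my recollection above, not from the actual indexserver configuration:

```go
package main

import (
	"fmt"
	"runtime"
)

// Assumed maximum index-shard output size (~100 MB); the real value may differ.
const assumedMaxShardBytes = 100 << 20

func main() {
	// Assume the number of concurrent index jobs tracks the number of cores.
	jobs := runtime.NumCPU()

	// Peak indexing RAM is then roughly shard size times concurrent jobs.
	peak := int64(jobs) * assumedMaxShardBytes
	fmt.Printf("jobs=%d, estimated peak indexing RAM ≈ %d MB\n", jobs, peak>>20)
}
```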

The webserver is problematic after indexing is complete because its memory usage is bounded by the size of the working copy of all cloned repositories. It keeps the index (trigram -> posting lists) in memory, and it builds result sets in memory (result sets are bounded, though). I can't remember what the usual factor is offhand, but the number in my head is 0.7x the size of the working copy. Note: indexing excludes a lot of files (binary files, large generated files, etc.), so the indexed working copy size can actually be quite reasonable compared to what du -h tells you locally.
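
As a rough illustration of that factor (the 0.7x is my recollection, not a documented constant, and the 20 GB figure is just an example):

```go
package main

import "fmt"

// Rough zoekt-webserver memory estimate: some fraction of the indexed
// working copy size. The 0.7 factor is a recollection, not a documented value.
func estimateWebserverRAM(workingCopyBytes int64) int64 {
	return int64(0.7 * float64(workingCopyBytes))
}

func main() {
	// Example: ~20 GB of indexable content across all cloned repositories.
	const workingCopy int64 = 20 << 30
	ram := estimateWebserverRAM(workingCopy)
	fmt.Printf("estimated webserver RAM ≈ %.1f GB\n", float64(ram)/float64(1<<30))
}
```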

Now the issue boils down to this: the top of our funnel is people trying out Sourcegraph on their dev laptop in Docker (possibly on a Mac). A laptop getting pummelled by CPU-intensive index jobs can leave a bad taste / make the laptop unusable. Once indexing is done, the Docker container may just continually OOM.

cc @ijt

@ijt

Contributor

ijt commented Apr 1, 2019

Now the issue boils down to this: the top of our funnel is people trying out Sourcegraph on their dev laptop in Docker (possibly on a Mac). A laptop getting pummelled by CPU-intensive index jobs can leave a bad taste / make the laptop unusable. Once indexing is done, the Docker container may just continually OOM.

@keegancsmith, would it prevent OOMing if we enabled indexing from the start of the instance?

@keegancsmith

Member

keegancsmith commented Apr 2, 2019

No. The webserver memory usage has to do with the size of the working copies of all cloned repos. So the issue is that in resource-constrained environments this is not acceptable, and it is the only place in Sourcegraph where we use that much memory without any related user action. I think the solution here is to either change zoekt so that it can rely on FS paging (maybe mmap the indexes?) or somehow make zoekt adapt to the amount of memory available so it avoids using too much. If it needs too much, show the admin a warning.
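
For the mmap idea, something like the following (Unix-only) sketch is what I have in mind; the shard path and layout are placeholders, not how zoekt actually opens its shards:

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// mmapShard maps an index shard read-only so the kernel can page it in and
// out under memory pressure, instead of the whole index living on the Go heap.
func mmapShard(path string) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	fi, err := f.Stat()
	if err != nil {
		return nil, err
	}

	// PROT_READ + MAP_SHARED: pages are backed by the file and can be evicted.
	return syscall.Mmap(int(f.Fd()), 0, int(fi.Size()), syscall.PROT_READ, syscall.MAP_SHARED)
}

func main() {
	data, err := mmapShard("/data/index/example.zoekt") // placeholder path
	if err != nil {
		fmt.Println("mmap failed:", err)
		return
	}
	defer syscall.Munmap(data)
	fmt.Printf("mapped %d bytes without loading them into the heap\n", len(data))
}
```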

@sqs

Member Author

sqs commented Apr 19, 2019

Idea: With #3374, I think we could just enable indexed search by default on all instances, including single-node sourcegraph/server instances.

If there are > 100 repositories, it will show a perf warning. That perf warning is intended for non-indexed search right now, but it also makes sense for indexed search (because < 100 repositories is probably fine for single-node indexed search, but > 100 is probably getting into OOM danger zone).

The one change we should make is to ALSO show that alert at the top of the page (in a global alert) for all users if > 100 repositories are present (because indexed search can affect the system in the background even if the user isn't on the search page).

The argument for this is basically the same as #3262 (comment).

@nicksnyder @ijt What do you think?

@slimsag

Member

slimsag commented Apr 19, 2019

That logic makes sense to me.

I will later make a proposal / argument for introducing a third official deployment option, between our Server and Kubernetes cluster offerings, based on docker-compose. It would make these problems moot and address other issues, but I haven't had a chance to do this yet.

@ijt

Contributor

ijt commented Apr 19, 2019

I buy the argument and I'm in favor of doing something like this.

At the same time, a risk is that this heuristic won't detect potential issues with a small number of large repositories. The index size grows with the number of locations where trigrams are found, not the number of repos, although those things are correlated.
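
A size-based check would catch that case. Here is a hypothetical sketch; the 5 GB threshold and the per-repo content sizes are made up for illustration and are not product values:

```go
package main

import "fmt"

// Assumed warning threshold on total indexable content, not a product value.
const contentWarnThresholdBytes int64 = 5 << 30

// shouldWarnBySize triggers on total repository content size, which tracks
// index size (trigram occurrences) better than a raw repository count does.
func shouldWarnBySize(repoContentBytes []int64) bool {
	var total int64
	for _, b := range repoContentBytes {
		total += b
	}
	return total > contentWarnThresholdBytes
}

func main() {
	// A single huge repository trips the warning even though the repo count is 1.
	fmt.Println(shouldWarnBySize([]int64{8 << 30})) // true
	// Many tiny repositories do not.
	fmt.Println(shouldWarnBySize([]int64{1 << 20, 2 << 20, 3 << 20})) // false
}
```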

@slimsag

Member

slimsag commented Apr 19, 2019

@ijt I think the same issue is true of non-indexed search: large repositories (which would have a large index size) would also cause searcher to consume more resources. The scaling is probably pretty similar between the two.

@sqs

Member Author

sqs commented Apr 19, 2019

@ijt Now that there is a way forward here in the near term, this is higher priority than other longer-term search work. Would you be comfortable getting #2176 (comment) into 3.3.1 (i.e., doing this soon and getting it out in a patch release)? By "comfortable" I specifically want to make sure you feel it would not be a risky/big change to add in a patch release. I was hoping it is just a matter of showing the alert in another place, in https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/web/src/global/GlobalAlerts.tsx#L31:14, with no search backend changes.

Accounting for the upgrade path, here is what I would propose:

  • On existing instances, do NOT change the current effective value of search.index.enabled.

  • On new instances, default to search.index.enabled = true.

  • On all instances, if there are >100 repositories AND search.index.enabled is true AND the deployment type is sourcegraph/server AND the viewer settings search.suppressPerformanceWarning property is false, show a dismissible global alert to ALL users (condition sketched after this list):

    This Sourcegraph instance has a large number of repositories but is deployed as a single node. For performance during searches and indexing, we recommend that instances with 100+ repositories deploy to a cluster. A site admin can contact us for support or to learn more.

(The search.suppressPerformanceWarning setting would be how we let specific customers who we know to have provisioned appropriate hardware for a single node to avoid having their users be confused by the warning after the 3.3.1 upgrade.)
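
The gating condition from the bullets above, as a minimal sketch; the struct fields and the "docker-container" deploy-type string are placeholders for whatever the real site-config/settings plumbing exposes:

```go
package main

import "fmt"

// instance bundles the inputs the proposed alert depends on. Field names are
// placeholders; only the condition itself comes from the proposal above.
type instance struct {
	deployType                 string // e.g. "docker-container" for sourcegraph/server (assumed value)
	repoCount                  int
	searchIndexEnabled         bool // effective value of search.index.enabled
	suppressPerformanceWarning bool // viewer setting search.suppressPerformanceWarning
}

func showSingleNodePerfAlert(i instance) bool {
	return i.repoCount > 100 &&
		i.searchIndexEnabled &&
		i.deployType == "docker-container" &&
		!i.suppressPerformanceWarning
}

func main() {
	// Example: a sourcegraph/server instance with 250 repos and indexing enabled.
	show := showSingleNodePerfAlert(instance{
		deployType:         "docker-container",
		repoCount:          250,
		searchIndexEnabled: true,
	})
	fmt.Println(show) // true: show the dismissible global alert to all users
}
```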

@ijt

Contributor

ijt commented Apr 19, 2019

@sqs, sure, I'm fine with that. When would I need to check that in for 3.3.1?

@nicksnyder

Member

nicksnyder commented Apr 19, 2019

Patch releases are on demand so as soon as your change is merged into master then @slimsag (the release captain) can help you cherry-pick it onto the release branch and tag a release.

@ijt ijt assigned ijt and unassigned ijsnow Apr 19, 2019

@slimsag

Member

slimsag commented Apr 19, 2019

Anybody can cut a patch release; it doesn't involve the release captain. You just need to follow the steps outlined here: https://github.com/sourcegraph/sourcegraph/blob/master/doc/dev/patch_release_issue_template.md

@slimsag

Member

slimsag commented Apr 19, 2019

I am of course happy to help / assist as needed
