
Parallelize text search #16286

Merged: 28 commits, Dec 1, 2016

Conversation

roblourens (Member)

Fixes #15384 (comment); the comment is updated with final timings.

This PR rearranges the existing text search code to run using multiple processes. This doesn't include work to show results sooner - users still need to wait for a batch of 512 file matches to be filled. That work is tracked in #16284. Here's a brain dump for posterity -

RawSearchService creates an instance of textSearch.Engine and passes it a FileWalker which is producing paths in the workspace. This is the same.

textSearch.Engine creates some number of Search Worker processes. The number is determined by require('os').cpus().length, which returns 8 on a MacBook Pro because it has 4 physical cores with hyperthreading.

The fileWalker produces file paths along with the sizes in bytes of those files. They are grouped into batches of roughly equal cumulative size. The number of files in each batch may vary a lot, but each batch should represent files that add up to about 1MB. Hopefully this makes the batches likely to be processed in a similar amount of time, with a similar number of results, so that no process runs much longer than the others. This might sound like a large batch - smaller workspaces won't benefit much from parallelization, but they don't need it either. I did some testing on a few workspaces of different sizes with different batch sizes, and 1MB seemed like the sweet spot. I also experimented with a batch size that starts small and grows exponentially, thinking it might help on medium-sized workspaces, but it didn't seem to make a difference.
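The byte-based batching described above can be sketched roughly like this. This is a simplified illustration, not the actual vscode code; `makeBatches` and `FileItem` are made-up names, and the real implementation streams walker results rather than taking an array:

```typescript
interface FileItem {
  path: string;
  size: number; // bytes, as reported by the file walker
}

const BATCH_FLUSH_BYTES = 2 ** 20; // ~1MB per batch

// Group walker results into batches of roughly equal cumulative size.
// A batch may hold many small files or a single large one.
function makeBatches(files: FileItem[]): string[][] {
  const batches: string[][] = [];
  let batch: string[] = [];
  let batchBytes = 0;
  for (const file of files) {
    batch.push(file.path);
    batchBytes += file.size || 1; // count empty files as at least 1 byte
    if (batchBytes >= BATCH_FLUSH_BYTES) {
      batches.push(batch); // flush once we cross the ~1MB threshold
      batch = [];
      batchBytes = 0;
    }
  }
  if (batch.length) {
    batches.push(batch); // flush the final partial batch
  }
  return batches;
}
```

The point of flushing on cumulative bytes rather than file count is that search time scales with how much content a worker has to scan, not with how many files it opens.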

Batches are sent to the workers as soon as they're ready, in a round-robin fashion. I experimented with sending one batch at a time to each process, with another when it finished, but that was slower because the worker isn't doing anything while it waits for a new batch. I also tried sending a couple of batches up front, and more as needed, but it was no better than sending them ASAP. Looking at timings, I think this is because the workers are slow to handle the first couple of batches they receive - the fileWalker is often done by the time the workers send back their first batch of results. I need to investigate more to figure out why this is - maybe because the new processes are still loading code, JITting, etc. They process the rest of the batches much faster after those first few. It may be faster to keep the worker processes around between searches, but I don't know that we want to persist 8 (or more) processes per vscode window.
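The round-robin dispatch can be sketched as below. This is a toy version, not the actual vscode code; the `Worker` interface and `RoundRobinDispatcher` name are illustrative, and in the real code the worker count comes from `require('os').cpus().length`:

```typescript
// A worker that accepts a batch of absolute file paths to search.
interface Worker {
  search(batch: string[]): void;
}

// Hand each batch to the next worker in the list as soon as it is ready,
// cycling through the workers so none sits idle waiting for a batch.
class RoundRobinDispatcher {
  private next = 0;

  constructor(private workers: Worker[]) {}

  dispatch(batch: string[]): void {
    this.workers[this.next].search(batch);
    this.next = (this.next + 1) % this.workers.length;
  }
}
```

Pushing batches eagerly like this trades queueing fairness for keeping every worker busy, which matches the timing observation above that demand-driven dispatch left workers idle.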

The worker searches the files using the same code that we already had; this hasn't changed much. It does include the change to eliminate Buffer.slice calls that is already checked in. The only new change is that I limited the number of files the worker will open at a time, because without it I kept running into the OS open file limit.
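Capping the number of concurrently open files is a standard concurrency-limit pattern. Here is a minimal sketch, assuming a promise-returning per-file search function; `mapWithLimit` is a made-up helper, not the vscode implementation:

```typescript
// Run `fn` over `items` with at most `limit` invocations in flight at once.
// Used here to keep the number of simultaneously open files under the
// OS file-descriptor limit.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let nextIndex = 0;

  // Each runner repeatedly claims the next unprocessed item.
  async function runner(): Promise<void> {
    while (nextIndex < items.length) {
      const i = nextIndex++; // claimed synchronously, so no two runners share an index
      results[i] = await fn(items[i]);
    }
  }

  const runnerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

Because JavaScript is single-threaded, the `nextIndex++` claim is race-free; the concurrency being limited is the number of pending file operations, not threads.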

The batch includes the current maxResults, i.e. the initial max minus the number of results received so far. The main process might receive more than the max number of results from the workers, so it has to trim the result set. This is kind of annoying to get exactly right, because the result set is not a flat list but a tree of files with matches, lines in those files with matches, and the matches on each line. So it just takes batches until the total is greater than the max. It can go over the max, but this is much easier.
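The "take whole units until we pass the max" trimming can be illustrated like this. A simplified sketch, not the actual vscode code: it trims at file-match granularity, whereas the description above works at batch granularity, but the shape of the logic is the same:

```typescript
interface FileMatch {
  path: string;
  numMatches: number; // total line matches within this file
}

// Accept whole file matches until the running total reaches maxResults.
// The last accepted item may push the total past the max - deliberately,
// because splitting a file's match tree exactly at the limit is fiddly.
function takeUntilMax(matches: FileMatch[], maxResults: number): FileMatch[] {
  const taken: FileMatch[] = [];
  let total = 0;
  for (const match of matches) {
    if (total >= maxResults) {
      break; // limit reached; drop the rest
    }
    taken.push(match);
    total += match.numMatches;
  }
  return taken;
}
```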

@roblourens roblourens added the search Search widget and operation issues label Nov 30, 2016
@roblourens roblourens added this to the November 2016 milestone Nov 30, 2016
@@ -114,7 +118,7 @@ suite('TextSearch performance', () => {
  const finishedEvents = [];
  return runSearch() // Warm-up first
    .then(() => {
-     if (testWorkspaceArg) { // Don't measure by default
+     if (testWorkspacePath) { // Don't measure by default
Contributor:

This is always set. I'd revert this change so unit tests run without additional output.

Contributor:

Btw, are there any unit tests that would make sense to add, or are the existing ones still sufficient?

Contributor:

We have some unit tests in search.test.ts.


// Emit progress() unless we got canceled or hit the limit
if (processed && !this.isDone && !this.isCanceled && !this.limitReached) {
progress();
}

// Emit done()
- if (this.worked === this.total && this.walkerIsDone && !this.isDone) {
+ if (!this.isDone && this.processedBytes === this.totalBytes && this.walkerIsDone) {
Contributor:

I wonder if this.processedBytes === this.totalBytes is a good condition to rely on. If a file changes its size between the file traversal and the text search it would never hold, would it?

Member Author:

No, we only check the file size once. This is based on the size that the fileWalker provides. I could change it to >= though.

Contributor:

Got it. Should be fine as-is then.

let nextBatchBytes = 0;
const batchFlushBytes = 2 ** 20; // 1MB
this.walker.walk(this.config.rootFolders, this.config.extraFiles, result => {
let bytes = result.size || 1;
Contributor:

Any idea why we || 1? Would an empty file make the condition this.processedBytes === this.totalBytes further up never hold?

Member Author:

Maybe so we continue to report progress while searching empty files. It won't break that check though.
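The reason the `|| 1` doesn't break the completion check can be shown with a toy example (not the actual vscode code): as long as the same `size || 1` accounting is applied when accumulating both the total and the processed byte counts, empty files contribute 1 byte to each side, so the equality still holds.

```typescript
// File sizes as reported by the walker; two of the files are empty.
const sizes = [0, 1024, 0, 2048];

let totalBytes = 0;
let processedBytes = 0;

// Accumulate the expected total, counting empty files as 1 byte.
for (const size of sizes) {
  totalBytes += size || 1;
}

// The processing side uses the identical accounting, so the two
// counters converge even with empty files in the workspace.
for (const size of sizes) {
  processedBytes += size || 1;
}

const done = processedBytes === totalBytes;
```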

env: {
AMD_ENTRYPOINT: 'vs/workbench/services/search/node/worker/searchWorkerApp',
PIPE_LOGGING: 'true',
VERBOSE_LOGGING: 'true'
Contributor:

Others seem to be using environmentService.verbose or !environmentService.isBuilt || environmentService.verbose. Not sure if it matters in this case, but we should avoid being noisy.


public search(onResult: (match: ISerializedFileMatch) => void, onProgress: (progress: IProgress) => void, done: (error: Error, complete: ISerializedSearchComplete) => void): void {
let resultCounter = 0;
this.workers.forEach(w => w.cancel());
Contributor:

Add error handler to returned promises?

this.isDone = true;
this.disposeWorkers();
Contributor:

Why is this different from cancel() above? Should the walker also be canceled/disposed?

Contributor:

To answer my question: The walker already stopped, either because it finished, errored or was canceled. If I understand correctly?

Member Author:

At that point, all the work has been completed, so the walker and workers are already done, and we just need to kill the processes.

if (this.limitReached || this.isCanceled) {
    return; // return early if canceled or limit reached
}

const maxResults = this.config.maxResults - this.numResults;
worker.search({ absolutePaths: batch, maxResults }).then(result => {
Contributor:

Add error handler to arguments of .then()?

Member Author:

Previously we didn't surface errors from the text search, just from the file walker. I wasn't sure whether to change that. I guess I should do something with the error.

Contributor:

I agree, let's propagate and log it. We can still put a limit on it, if there are cases where too many are reported.
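The shape of the fix the review settles on - attach a rejection handler that logs and lets the search continue - might look something like this. A hedged sketch only: `searchBatch`, the `SearchWorker` interface, and the numeric result are illustrative stand-ins, not the vscode API:

```typescript
// Stand-in for the worker IPC proxy; the real one returns a richer result.
interface SearchWorker {
  search(args: { absolutePaths: string[]; maxResults: number }): Promise<number>;
}

// Search one batch, propagating errors to a logger instead of leaving
// the rejection unhandled. A failed batch contributes zero results but
// does not abort the overall search.
function searchBatch(
  worker: SearchWorker,
  batch: string[],
  maxResults: number,
  onError: (err: Error) => void
): Promise<number> {
  return worker.search({ absolutePaths: batch, maxResults }).then(
    (numResults) => numResults,
    (err) => {
      onError(err); // log/propagate rather than swallow
      return 0;
    }
  );
}
```

Passing the rejection handler as the second argument of `.then()` (rather than chaining `.catch()`) keeps errors thrown by the success handler itself from being caught here, which is usually the intent for IPC failures.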

const channel = ipc.getNextTickChannel(client.getChannel<ISearchWorkerChannel>('searchWorker'));
const channelClient = new SearchWorkerChannelClient(channel);
const config: ISearchWorkerConfig = { pattern: this.config.contentPattern, id, fileEncoding: this.config.fileEncoding };
channelClient.initialize(config);
Contributor:

Add error handler.


call(command: string, arg?: any): TPromise<any> {
switch(command) {
case 'initialize': return TPromise.wrap(this.worker.initialize(arg));
Contributor:

TPromise.wrap is already used in initialize().

@bpasero (Member) commented Dec 1, 2016

Unassigning myself, I have full trust in @chrmarti's review. If there are specific questions, ping me.

@roblourens (Member Author):

Thanks, merging

@roblourens roblourens merged commit c751c0c into microsoft:master Dec 1, 2016
@roblourens roblourens deleted the roblou/parallelSearch branch December 1, 2016 05:48
@sallar commented Dec 14, 2016

Hi @roblourens, just wanted to say that this code reads like poetry to me. Thank you.

@roblourens (Member Author):

Thanks @sallar :)

@github-actions github-actions bot locked and limited conversation to collaborators Mar 27, 2020
Linked issue: File Search - Improve performance of file search