[LIT] Work around the 60-process limit on Windows #157759

joker-eph · 2025-09-09T22:14:43Z

On Windows, Python multiprocessing is limited to at most 60 workers per pool:

https://github.com/python/cpython/blob/6bc65c30ff1fd0b581a2c93416496fc720bc442c/Lib/concurrent/futures/process.py#L669-L672

Because the limit is per process pool, we can work around it by using multiple pools on Windows when we want to use more workers.

Copilot

Pull Request Overview

This PR implements a workaround for Windows multiprocessing limitations that cap worker processes at 60 per pool. Instead of restricting the total number of workers, the solution creates multiple process pools when more than 60 workers are requested on Windows.

Key changes:

Removed the hard limit of 60 workers from usable_core_count() function
Implemented multi-pool architecture in test execution that distributes workers and tests across multiple pools
Added logging to inform users when the workaround is active

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
llvm/utils/lit/lit/util.py	Removes Windows-specific worker count limitation from usable_core_count()
llvm/utils/lit/lit/run.py	Implements multi-pool workaround for Windows 60-worker limit with test distribution logic

Comments suppressed due to low confidence (1)

llvm/utils/lit/lit/run.py:1

The test indexing is incorrect when using multiple pools. The idx from async_results enumeration doesn't correspond to the correct index in self.tests because tests are distributed across pools with different starting indices.

import multiprocessing
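
To make the indexing concern above easier to check, here is a minimal sketch that models only the contiguous slicing arithmetic from the diff posted in this thread. The helper name `chunk_like_diff` and the sample test names are hypothetical, and nothing here touches lit or multiprocessing. Under these assumptions, flattening the per-pool slices in pool order gives back the original list, so a position in the flattened list is also a position in the original one.

```python
def chunk_like_diff(tests, num_pools):
    """Split `tests` into contiguous slices, one per pool (hypothetical helper)."""
    tests_per_pool = (len(tests) + num_pools - 1) // num_pools  # ceiling division
    chunks = []
    for pool_idx in range(num_pools):
        start = pool_idx * tests_per_pool
        end = min(start + tests_per_pool, len(tests))
        chunks.append(tests[start:end])  # may be empty for trailing pools
    return chunks


tests = ["t%d" % i for i in range(7)]
chunks = chunk_like_diff(tests, 3)

# Flatten the slices in pool order; the order of `tests` is preserved.
flattened = [t for chunk in chunks for t in chunk]
print(flattened == tests)
```

This says nothing about how results are consumed afterwards; it only shows that the slicing step on its own keeps the test order intact.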

llvm/utils/lit/lit/run.py

llvmbot · 2025-09-09T22:15:19Z

@llvm/pr-subscribers-testing-tools

Author: Mehdi Amini (joker-eph)

Changes

On Windows, Python multiprocessing is limited to at most 60 workers per pool:

https://github.com/python/cpython/blob/6bc65c30ff1fd0b581a2c93416496fc720bc442c/Lib/concurrent/futures/process.py#L669-L672

Because the limit is per process pool, we can work around it by using multiple pools on Windows when we want to use more workers.

Full diff: https://github.com/llvm/llvm-project/pull/157759.diff

2 Files Affected:

(modified) llvm/utils/lit/lit/run.py (+45-10)
(modified) llvm/utils/lit/lit/util.py (-5)

diff --git a/llvm/utils/lit/lit/run.py b/llvm/utils/lit/lit/run.py
index 62070e824e87f..b24a50911eb71 100644
--- a/llvm/utils/lit/lit/run.py
+++ b/llvm/utils/lit/lit/run.py
@@ -72,25 +72,60 @@ def _execute(self, deadline):
             if v is not None
         }
 
-        pool = multiprocessing.Pool(
-            self.workers, lit.worker.initialize, (self.lit_config, semaphores)
+        # Windows has a limit of 60 workers per pool, so we need to use multiple pools
+        # if we have more than 60 workers requested
+        max_workers_per_pool = 60 if os.name == "nt" else self.workers
+        num_pools = max(
+            1, (self.workers + max_workers_per_pool - 1) // max_workers_per_pool
         )
+        workers_per_pool = min(self.workers, max_workers_per_pool)
 
-        async_results = [
-            pool.apply_async(
-                lit.worker.execute, args=[test], callback=self.progress_callback
+        if num_pools > 1:
+            self.lit_config.note(
+                "Using %d pools with %d workers each (Windows worker limit workaround)"
+                % (num_pools, workers_per_pool)
             )
-            for test in self.tests
-        ]
-        pool.close()
+
+        # Create multiple pools
+        pools = []
+        for i in range(num_pools):
+            pool = multiprocessing.Pool(
+                workers_per_pool, lit.worker.initialize, (self.lit_config, semaphores)
+            )
+            pools.append(pool)
+
+        # Distribute tests across pools
+        tests_per_pool = (len(self.tests) + num_pools - 1) // num_pools
+        async_results = []
+        test_to_pool_map = {}
+
+        for pool_idx, pool in enumerate(pools):
+            start_idx = pool_idx * tests_per_pool
+            end_idx = min(start_idx + tests_per_pool, len(self.tests))
+            pool_tests = self.tests[start_idx:end_idx]
+
+            for test in pool_tests:
+                ar = pool.apply_async(
+                    lit.worker.execute, args=[test], callback=self.progress_callback
+                )
+                async_results.append(ar)
+                test_to_pool_map[ar] = pool
+
+        # Close all pools
+        for pool in pools:
+            pool.close()
 
         try:
             self._wait_for(async_results, deadline)
         except:
-            pool.terminate()
+            # Terminate all pools on exception
+            for pool in pools:
+                pool.terminate()
             raise
         finally:
-            pool.join()
+            # Join all pools
+            for pool in pools:
+                pool.join()
 
     def _wait_for(self, async_results, deadline):
         timeout = deadline - time.time()
diff --git a/llvm/utils/lit/lit/util.py b/llvm/utils/lit/lit/util.py
index b03fd8bc22693..b1552385ccc53 100644
--- a/llvm/utils/lit/lit/util.py
+++ b/llvm/utils/lit/lit/util.py
@@ -121,11 +121,6 @@ def usable_core_count():
     except AttributeError:
         n = os.cpu_count() or 1
 
-    # On Windows with more than 60 processes, multiprocessing's call to
-    # _winapi.WaitForMultipleObjects() prints an error and lit hangs.
-    if platform.system() == "Windows":
-        return min(n, 60)
-
     return n
 
 def abs_path_preserve_drive(path):
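
As a reading aid for the diff above, the sketch below reproduces only its pool arithmetic in a standalone function. The name `pools_as_in_diff` and the `is_windows` and `cap` parameters are hypothetical stand-ins for `os.name == "nt"` and the literal 60. With this revision of the arithmetic, every pool gets the same size, so a request that is not a multiple of 60 appears to start more workers than requested (for example, 100 requested gives 2 pools of 60). That may be what the later "Distribute workers more evenly" commit addresses, though the thread does not say so explicitly.

```python
def pools_as_in_diff(workers, is_windows=True, cap=60):
    """Reproduce the pool arithmetic from the diff for a given worker count."""
    max_workers_per_pool = cap if is_windows else workers
    num_pools = max(
        1, (workers + max_workers_per_pool - 1) // max_workers_per_pool
    )
    workers_per_pool = min(workers, max_workers_per_pool)
    return num_pools, workers_per_pool


# Print: requested workers, pools, workers per pool, total workers started.
for requested in (32, 60, 61, 100, 128):
    num_pools, per_pool = pools_as_in_diff(requested)
    print(requested, num_pools, per_pool, num_pools * per_pool)
```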

Copilot

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

llvm/utils/lit/lit/run.py

boomanaiden154

Are you able to add unit tests for the logic that figures out the number of pools to create/workers per pool, and the logic that distributes the tests? It looks correct to me after spending some time thinking about it, but I would rather not rely on my thinking being correct.

This adds a bit of complexity, but I think it's very much worth it given that high-core-count systems are becoming increasingly popular. Thanks for working on this.

llvm/utils/lit/lit/run.py

joker-eph · 2025-09-17T14:58:14Z

> Are you able to add unit tests for the logic that figures out the number of pools to create/workers per pool, and the logic that distributes the tests? It looks correct to me after spending some time thinking about it, but I would rather not rely on my thinking being correct.

Right, I tried it locally by changing the WINDOWS_MAX_WORKERS_PER_POOL value to a smaller number, using various numbers of workers, and checking the distribution of workers in the pools :)

I added a test!
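
For readers who want to try the kind of local check described above, here is a minimal sketch. It is not the test added in llvm/utils/lit/tests/windows-pools.py, and `split_workers` and `check_distribution` are hypothetical helpers rather than lit functions. It assumes an even split of workers across pools and lowers the cap so the multi-pool path is exercised without a machine that has more than 60 cores.

```python
def split_workers(workers, max_per_pool):
    """Hypothetical helper: pool sizes that sum to exactly `workers`."""
    num_pools = max(1, -(-workers // max_per_pool))  # ceiling division
    base, extra = divmod(workers, num_pools)
    # The first `extra` pools take one additional worker.
    return [base + 1 if i < extra else base for i in range(num_pools)]


def check_distribution(max_per_pool, up_to):
    """Check the split invariants for every worker count from 1 to `up_to`."""
    for workers in range(1, up_to + 1):
        sizes = split_workers(workers, max_per_pool)
        assert sum(sizes) == workers         # no worker lost or duplicated
        assert max(sizes) <= max_per_pool    # the per-pool cap is respected
        assert max(sizes) - min(sizes) <= 1  # sizes differ by at most one
    return True


# A small cap stands in for the real limit; lowered for illustration only.
WINDOWS_MAX_WORKERS_PER_POOL = 4
check_distribution(WINDOWS_MAX_WORKERS_PER_POOL, up_to=50)
```

Sweeping every worker count up to a bound is cheap here and catches off-by-one mistakes at the boundaries, such as exactly one more worker than a multiple of the cap.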

llvm/utils/lit/tests/windows-pools.py

boomanaiden154

LGTM. Thanks for pushing this through.

If you have benchmarking numbers from a high-core-count Windows machine, it would be nice if you could post them in this thread.

joker-eph requested a review from Copilot September 9, 2025 22:14

llvmbot added llvm-lit testing-tools labels Sep 9, 2025

Copilot AI reviewed Sep 9, 2025

joker-eph force-pushed the lit_windows branch from b206f2f to 6c7d15a Compare September 16, 2025 20:21

joker-eph force-pushed the lit_windows branch from 6c7d15a to 5328d27 Compare September 16, 2025 20:23

joker-eph requested review from boomanaiden154 and Copilot September 16, 2025 20:24

Copilot AI reviewed Sep 16, 2025

joker-eph force-pushed the lit_windows branch 2 times, most recently from 2ba9107 to b3e4997 Compare September 16, 2025 20:46

joker-eph requested a review from Copilot September 16, 2025 20:47

Copilot AI reviewed Sep 16, 2025

Distribute workers more evenly

1935344

joker-eph force-pushed the lit_windows branch from b3e4997 to 1935344 Compare September 17, 2025 07:51

joker-eph requested a review from Copilot September 17, 2025 07:55

Copilot AI reviewed Sep 17, 2025

boomanaiden154 reviewed Sep 17, 2025

joker-eph requested a review from boomanaiden154 September 17, 2025 14:58

boomanaiden154 reviewed Sep 17, 2025

joker-eph force-pushed the lit_windows branch from af5904a to 6238641 Compare September 17, 2025 21:25

Add a test

c9e4f40

joker-eph force-pushed the lit_windows branch from 6238641 to c9e4f40 Compare September 17, 2025 21:26

boomanaiden154 approved these changes Sep 17, 2025
