-
-
Notifications
You must be signed in to change notification settings - Fork 33k
gh-139482: speed up os.environ.clear
by 2x
#139483
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
The implicit implementation of environ.clear() is MutableMapping.clear(), which calls iter(self) in a loop, once per deleted key. But because iter(self) for environ creates a snapshot of all keys, this results in O(N^2) complexity for environ.clear(). This problem is especially evident on large environments. On my M3 MacBook Pro, it takes 500ms to clear an environment with only 10K variables. A more extreme example: 100K variables take 23s to clear. Environments with thousands of environment variables are rare, but they do exist. The new implementation avoids creating a snapshot of the keys on each iteration, and instead repeatedly tries to delete keys until the environment is empty. This mirrors the current behavior of environ.clear(), while being more efficient asymptotically. Further improvement on Linux/FreeBSD could be achieved by using clearenv() which is part of the standard C library.
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add tests and a blurb.
os.environ.clear
Misc/NEWS.d/next/Library/2025-10-01-17-45-27.gh-issue-139482.yDMeEa.rst
Outdated
Show resolved
Hide resolved
…DMeEa.rst Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
@StanFromIreland I'm happy to add tests; do you have any specific test cases in mind? I see that |
Usually, we have some tests when we know that inputs will take a certain amount of time to complete (we do that for int->str conversion) but the tests may be flaky, so we only test for cases when we know that an input was meant to produce an infinite loop somewhere. Here I don't think tests are needed. If you can provide some small benchmarks to be sure that we're not doing something worse, I think we can merge it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind adding an explicit test for os.environ.clear()
in test_os.test_os.EnvironTests
? Something like:
diff --git a/Lib/test/test_os/test_os.py b/Lib/test/test_os/test_os.py
index 623d0523583..b15ee1d4d3e 100644
--- a/Lib/test/test_os/test_os.py
+++ b/Lib/test/test_os/test_os.py
@@ -1494,6 +1494,13 @@ def test_reload_environ(self):
self.assertNotIn(b'test_env', os.environb)
self.assertNotIn('test_env', os.environ)
+ def test_clear(self):
+ os.environ.clear()
+ self.assertEqual(os.environ, {})
+
+ self.assertRaises(TypeError, os.environ.clear, None)
+
+
class WalkTests(unittest.TestCase):
"""Tests for os.walk()."""
is_fwalk = False
After doing a benchmark, I made a fascinaning discovery - the In other words, we avoid quadratic time complexity in one place, but still have it in another. Having said that, the current change still makes the clear() method 2x faster. Before:
After:
Benchmark code: https://gist.github.com/gukoff/9a05412f193ba61798251f5ca0bc1906 @picnixz @StanFromIreland @vstinner what do you think? |
self.assertEqual(os.environ, {}) | ||
self.assertEqual(os.environb, {}) | ||
|
||
def test_clear_empties_process_environment(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced that this test is useful.
self.assertEqual(os.environb, {}) | ||
|
||
@unittest.skipUnless(os.supports_bytes_environ, "os.environb required for this test.") | ||
def test_clear_empties_environb(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is more or less the same as test_clear_empties_environ() but on os.environb. I'm not sure that it's useful.
os.environ.clear() | ||
|
||
self.assertEqual(os.environ, {}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
os.environ.clear() | |
self.assertEqual(os.environ, {}) | |
os.environ.clear() | |
self.assertEqual(os.environ, {}) |
os.environ.clear() | ||
|
||
self.assertEqual(os.environ, {}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
os.environ.clear() | |
self.assertEqual(os.environ, {}) | |
os.environ.clear() | |
self.assertEqual(os.environ, {}) |
os.environ["test_key_to_clear1"] = "test_value_to_clear1" | ||
os.environ["test_key_to_clear2"] = "test_value_to_clear2" | ||
os.environ["test_key_to_clear3"] = "test_value_to_clear3" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that 3 variables are useful. IMO a single variable to make sure that os.environ is non-empty is enough.
os.environ["test_key_to_clear1"] = "test_value_to_clear1" | |
os.environ["test_key_to_clear2"] = "test_value_to_clear2" | |
os.environ["test_key_to_clear3"] = "test_value_to_clear3" | |
os.environ["test_key_to_clear"] = "test_value_to_clear" |
os.environ.clear
os.environ.clear
by 2x
In this case, I'm not really sure it's worth the change. There is an extension Now, I wouldn't mind a 2x speed-up but I think we should indicate in the docs that Btw, please update the NEWS entry as well. |
I'm not really impressed by benchmark numbers. I expected a x10 or x100 difference since you announced that So I'm not fully convinced that this change is required. |
Issue with a detailed description: #139482
The new implementation avoids creating a snapshot of all keys on each iteration of
os.environ.clear()
,and instead repeatedly tries to delete keys until the environment is empty.
This closely mirrors the current behavior of environ.clear(), while being implemented more efficiently both for small and (especially) for large collections of environment variables.
os.environ.clear()
has a quadratic time complexity #139482