gukoff commented Oct 1, 2025

Issue with a detailed description: #139482


The new implementation avoids creating a snapshot of all keys on every iteration of the loop inside os.environ.clear(),
and instead repeatedly tries to delete keys until the environment is empty.

This closely mirrors the current behavior of environ.clear(), while being implemented more efficiently both for small and (especially) for large collections of environment variables.

os._Environ does not define its own clear(), so environ.clear() is the inherited MutableMapping.clear(),
which calls popitem() in a loop; each popitem() calls next(iter(self)), so iter(self) runs once per deleted key.
But because iter(self) for environ creates a snapshot of all keys,
this results in O(N^2) complexity for environ.clear().
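The mechanism can be reproduced outside os with a toy mapping. In this sketch, SnapshotMapping and snapshot_cost are hypothetical names; __iter__ copies all keys into a list the way os._Environ.__iter__ does (assumption: this matches Lib/os.py at the time of the PR), and clear() is the unmodified MutableMapping.clear(), whose popitem() calls next(iter(self)) once per key. Counting the copied keys exposes the N(N+1)/2 total.

```python
from collections.abc import MutableMapping


class SnapshotMapping(MutableMapping):
    """Toy mapping whose __iter__ snapshots all keys (hypothetical stand-in for os._Environ)."""

    def __init__(self, data):
        self._data = dict(data)
        self.keys_copied = 0  # total number of keys copied into snapshots

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __len__(self):
        return len(self._data)

    def __iter__(self):
        keys = list(self._data)  # O(N) snapshot on every iter() call
        self.keys_copied += len(keys)
        yield from keys


def snapshot_cost(n: int) -> int:
    """Clear a mapping of n keys via the inherited MutableMapping.clear()."""
    m = SnapshotMapping({f"VAR{i}": "x" for i in range(n)})
    m.clear()  # popitem() -> next(iter(self)) once per key
    return m.keys_copied
```

For example, snapshot_cost(1000) returns 500500: each of the 1,000 popitem() calls copies every key that is still present.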

This problem is especially evident on large environments.
On my M3 MacBook Pro, it takes 500ms to clear an environment with only 10K variables.
A more extreme example: 100K variables take 23s to clear.

Environments with thousands of environment variables are rare, but they do exist.

The new implementation avoids creating a snapshot of the keys on each iteration,
and instead repeatedly tries to delete keys until the environment is empty.
This mirrors the current behavior of environ.clear(), while being more efficient asymptotically.
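A minimal sketch of the replacement strategy, again on a toy mapping rather than os._Environ: FastClearMapping is a hypothetical name and this is not the code from the PR (the real os.environ also calls unsetenv() on every deletion through __delitem__). Here clear() takes one snapshot per pass and keeps looping while keys remain, so keys added while a pass is running are still removed.

```python
from collections.abc import MutableMapping


class FastClearMapping(MutableMapping):
    """Toy mapping with a clear() that avoids one snapshot per key (hypothetical)."""

    def __init__(self, data):
        self._data = dict(data)
        self.snapshots = 0  # number of key snapshots taken by clear()

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __len__(self):
        return len(self._data)

    def __iter__(self):
        yield from list(self._data)

    def clear(self):
        # One snapshot per pass instead of one per deleted key; loop again
        # in case other code added keys while this pass was running.
        while self._data:
            self.snapshots += 1
            for key in list(self._data):
                try:
                    del self[key]
                except KeyError:
                    pass  # already removed elsewhere
```

With a single thread one pass is enough, so clearing 1,000 keys copies 1,000 keys instead of 500,500.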

Further improvement on Linux/FreeBSD could be achieved by using clearenv(),
which their C libraries provide (it is not part of POSIX).
bedevere-app bot commented Oct 1, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

python-cla-bot bot commented Oct 1, 2025

All commit authors signed the Contributor License Agreement.

CLA signed

StanFromIreland (Member) left a comment

Please add tests and a blurb.

picnixz changed the title "gh-139482: Make environ.clear() efficient (O(N^2) -> O(N))." to "gh-139482: avoid quadratic complexity in os.environ.clear" on Oct 1, 2025
…DMeEa.rst

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
gukoff (Author) commented Oct 1, 2025

@StanFromIreland I'm happy to add tests; do you have any specific test cases in mind?

I see that os.environ already has test coverage, and this PR is focused on performance rather than on changing the already-tested behavior.

picnixz (Member) commented Oct 1, 2025

> I see that os.environ already has test coverage, and this PR is focused on performance rather than on changing the already-tested behavior.

Usually, we add tests when we know that certain inputs take a long time to complete (we do that for int->str conversion), but such tests can be flaky, so we only add them for cases where an input was known to cause an infinite loop somewhere.

Here I don't think tests are needed. If you can provide some small benchmarks to be sure that we're not doing something worse, I think we can merge it.
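A small benchmark of the kind requested here could look like the following sketch. bench_environ_clear is a hypothetical helper (not the benchmark script linked later in this thread); it adds count variables, times a single os.environ.clear(), and restores the original environment after every run, because clear() really empties the process environment.

```python
import os
import statistics
import time


def bench_environ_clear(count: int, repeat: int = 5) -> tuple[float, float]:
    """Time os.environ.clear() with `count` extra variables; return (mean_ms, stdev_ms)."""
    saved = dict(os.environ)
    timings = []
    try:
        for _ in range(repeat):
            for i in range(count):
                os.environ[f"BENCH_VAR_{i}"] = "x"  # hypothetical variable names
            start = time.perf_counter()
            os.environ.clear()
            timings.append((time.perf_counter() - start) * 1000)
            os.environ.update(saved)  # restore before the next run
    finally:
        # Safety net: always leave the original environment in place.
        os.environ.clear()
        os.environ.update(saved)
    stdev = statistics.stdev(timings) if len(timings) > 1 else 0.0
    return statistics.mean(timings), stdev
```

Calling it, e.g. bench_environ_clear(1000), on interpreters built with and without the change gives comparable (mean, stdev) pairs in milliseconds.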

vstinner (Member) left a comment

Would you mind adding an explicit test for os.environ.clear() in test_os.test_os.EnvironTests? Something like:

diff --git a/Lib/test/test_os/test_os.py b/Lib/test/test_os/test_os.py
index 623d0523583..b15ee1d4d3e 100644
--- a/Lib/test/test_os/test_os.py
+++ b/Lib/test/test_os/test_os.py
@@ -1494,6 +1494,13 @@ def test_reload_environ(self):
             self.assertNotIn(b'test_env', os.environb)
             self.assertNotIn('test_env', os.environ)
 
+    def test_clear(self):
+        os.environ.clear()
+        self.assertEqual(os.environ, {})
+
+        self.assertRaises(TypeError, os.environ.clear, None)
+
+
 class WalkTests(unittest.TestCase):
     """Tests for os.walk()."""
     is_fwalk = False
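For running the check outside the CPython test suite, here is a standalone sketch of the suggested test. EnvironClearTests is a hypothetical class name, and the setUp()/tearDown() pair that saves and restores os.environ is an assumption added so that clearing the environment does not leak into other tests (the real EnvironTests class may already take care of this). It also sets one variable first so the environment is guaranteed to be non-empty before clear().

```python
import os
import unittest


class EnvironClearTests(unittest.TestCase):
    """Standalone sketch of the suggested os.environ.clear() test."""

    def setUp(self):
        self._saved = dict(os.environ)

    def tearDown(self):
        # Put the original environment back for any later tests.
        os.environ.clear()
        os.environ.update(self._saved)

    def test_clear(self):
        os.environ["test_key_to_clear"] = "test_value_to_clear"
        os.environ.clear()
        self.assertEqual(os.environ, {})
        # clear() takes no arguments.
        self.assertRaises(TypeError, os.environ.clear, None)
```

It can be run with python -m unittest on the file that contains it.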

gukoff (Author) commented Oct 2, 2025

After running a benchmark, I made a fascinating discovery: the unsetenv function itself has linear complexity. On every call, it scans the whole environment array and shifts the entries after the removed one down. For example, in glibc: https://codebrowser.dev/glibc/glibc/stdlib/setenv.c.html#258
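The effect can be illustrated with a simplified Python model of the loop in the linked glibc unsetenv() source. unsetenv_model and clear_cost are hypothetical names, and the model counts array slots visited rather than measuring time: each call walks the whole remaining array (it keeps going after a match in case NAME appears twice), so unsetting N variables one by one visits N(N+1)/2 slots no matter how the Python side picks the keys.

```python
def unsetenv_model(environ: list[str], name: str) -> int:
    """Simplified model of glibc's unsetenv() loop; returns slots visited."""
    prefix = name + "="
    visited = 0
    i = 0
    while i < len(environ):
        visited += 1
        if environ[i].startswith(prefix):
            del environ[i]  # like glibc shifting later pointers down by one
        else:
            i += 1  # keep scanning: NAME may appear more than once
    return visited


def clear_cost(n: int) -> int:
    """Total slots visited when unsetting n variables one at a time."""
    environ = [f"VAR{i}=x" for i in range(n)]
    total = 0
    for i in range(n):
        total += unsetenv_model(environ, f"VAR{i}")
    return total
```

clear_cost(100) returns 5050, and ten times more variables costs roughly a hundred times more work.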

In other words, we avoid quadratic time complexity in one place, but still have it in another.


Having said that, the current change still makes the clear() method more than 2x faster:

Before:

root@9f106f251c44:/mnt# ./python.exe bench_clearenv.py --count 100
Mean: 0.2ms ± 0.0ms
root@9f106f251c44:/mnt# ./python.exe bench_clearenv.py --count 1000
Mean: 6.5ms ± 0.2ms
root@9f106f251c44:/mnt# ./python.exe bench_clearenv.py --count 10000
Mean: 568.9ms ± 37.2ms

After:

root@43b65a4aaaa9:/mnt# ./python.exe bench_clearenv.py --count 100
Mean: 0.0ms ± 0.0ms
root@43b65a4aaaa9:/mnt# ./python.exe bench_clearenv.py --count 1000
Mean: 2.3ms ± 0.1ms
root@43b65a4aaaa9:/mnt# ./python.exe bench_clearenv.py --count 10000
Mean: 217.7ms ± 8.1ms
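The reported means can be checked with a little arithmetic. The numbers below are copied from the output above; the 100-variable row is left out because the "after" mean rounds to 0.0ms. Since the means have a single decimal place, the ratios are approximate.

```python
# Mean times in ms, copied from the benchmark output above.
before = {1000: 6.5, 10000: 568.9}
after = {1000: 2.3, 10000: 217.7}

for n in (1000, 10000):
    print(f"{n} vars: {before[n] / after[n]:.1f}x faster")

# Growth when going from 1,000 to 10,000 variables (10x more):
# quadratic scaling predicts about 100x, linear scaling about 10x.
growth_before = before[10000] / before[1000]
growth_after = after[10000] / after[1000]
print(f"growth before: {growth_before:.0f}x, after: {growth_after:.0f}x")
```

It prints speed-ups of about 2.8x and 2.6x, and growth factors of about 88x and 95x for a 10x increase in variables: close to the 100x that quadratic scaling predicts in both versions, consistent with the unsetenv() observation above.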

Benchmark code: https://gist.github.com/gukoff/9a05412f193ba61798251f5ca0bc1906


@picnixz @StanFromIreland @vstinner what do you think?

self.assertEqual(os.environ, {})
self.assertEqual(os.environb, {})

def test_clear_empties_process_environment(self):
Member

I'm not convinced that this test is useful.

self.assertEqual(os.environb, {})

@unittest.skipUnless(os.supports_bytes_environ, "os.environb required for this test.")
def test_clear_empties_environb(self):
Member

This test is more or less the same as test_clear_empties_environ() but on os.environb. I'm not sure that it's useful.

Comment on lines +1503 to +1505
os.environ.clear()

self.assertEqual(os.environ, {})
Member

Suggested change
-os.environ.clear()
-
-self.assertEqual(os.environ, {})
+os.environ.clear()
+self.assertEqual(os.environ, {})

Comment on lines +1510 to +1512
os.environ.clear()

self.assertEqual(os.environ, {})
Member

Suggested change
-os.environ.clear()
-
-self.assertEqual(os.environ, {})
+os.environ.clear()
+self.assertEqual(os.environ, {})

Comment on lines +1498 to +1500
os.environ["test_key_to_clear1"] = "test_value_to_clear1"
os.environ["test_key_to_clear2"] = "test_value_to_clear2"
os.environ["test_key_to_clear3"] = "test_value_to_clear3"
Member

I don't think that 3 variables are useful. IMO a single variable to make sure that os.environ is non-empty is enough.

Suggested change
-os.environ["test_key_to_clear1"] = "test_value_to_clear1"
-os.environ["test_key_to_clear2"] = "test_value_to_clear2"
-os.environ["test_key_to_clear3"] = "test_value_to_clear3"
+os.environ["test_key_to_clear"] = "test_value_to_clear"

picnixz changed the title "gh-139482: avoid quadratic complexity in os.environ.clear" to "gh-139482: speed up os.environ.clear by 2x" on Oct 3, 2025
picnixz (Member) commented Oct 3, 2025

> In other words, we avoid quadratic time complexity in one place, but still have it in another.

In this case, I'm not really sure the change is worth it. clearenv() is an extension that may be available, but it's not part of POSIX, so it could only be used conditionally. I also don't think adding an os.clearenv() instead of looping over os.environ is a good idea: for instance, os.clearenv() could clear the process environment while os.environ would still hold the old keys until it is updated separately, so it's likely better to remove the keys one by one.
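Python has no os.clearenv(), so the one mentioned here is hypothetical, but the desynchronization it would cause can be seen with os.unsetenv(), which the os documentation notes does not update os.environ. In this sketch, demo_stale_environ and the variable name are hypothetical; the function removes a variable at the C level only and reports whether os.environ still lists it.

```python
import os


def demo_stale_environ(name: str = "HYPOTHETICAL_DEMO_VAR") -> bool:
    """Return True if os.environ still lists `name` after a C-level unsetenv()."""
    os.environ[name] = "1"  # updates both os.environ and the process environment
    try:
        os.unsetenv(name)  # updates only the process environment
        return name in os.environ  # os.environ still holds the stale key
    finally:
        os.environ.pop(name, None)  # remove it through the mapping to resync
```

It returns True: the stale key stays in os.environ until it is deleted through the mapping, which is why removing keys one by one through os.environ keeps both views consistent.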

Now, I wouldn't mind a 2x speed-up, but I think we should document that os.environ.clear() has quadratic time complexity because of the underlying glibc implementation. I don't know what the situation is on Windows.

Btw, please update the NEWS entry as well.

vstinner (Member) commented Oct 8, 2025

> @picnixz @StanFromIreland @vstinner what do you think?

I'm not really impressed by the benchmark numbers. I expected a 10x or 100x difference, since you announced that os.environ.clear() has "quadratic time complexity".

So I'm not fully convinced that this change is required.
