-
-
Notifications
You must be signed in to change notification settings - Fork 29.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tarfile.open(mode="r") race condition when importing lzma #83611
Comments
Hey guys, We have a component that archives and unarchives multiple files in separate threads that started to We have noticed a bunch of What is unfortunate, is that given the traceback: Traceback (most recent call last):
File "test.py", line 18, in <module>
list(pool.map(test_lzma, range(100)))
File "/opt/lang/python37/lib/python3.7/concurrent/futures/_base.py", line 598, in result_iterator
yield fs.pop().result()
File "/opt/lang/python37/lib/python3.7/concurrent/futures/_base.py", line 428, in result
return self.__get_result()
File "/opt/lang/python37/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
File "/opt/lang/python37/lib/python3.7/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "test.py", line 14, in test_lzma
tarfile.open(fileobj=buf, mode="r")
File "/opt/lang/python37/lib/python3.7/tarfile.py", line 1573, in open
return func(name, "r", fileobj, **kwargs)
File "/opt/lang/python37/lib/python3.7/tarfile.py", line 1699, in xzopen
fileobj = lzma.LZMAFile(fileobj or name, mode, preset=preset)
AttributeError: module 'lzma' has no attribute 'LZMAFile' the last line of the traceback is right AFTER this block (tarfile.py:1694):
Importing lzma in ipython fails properly:
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-2-c0255b54beb9> in <module>
----> 1 import lzma
When trying to debug the problem, we have noticed it's not deterministic. In order to reproduce it, Race condition occurs both on Python 3.7.3 and 3.7.6. I know that the test script writes uncompressed archives and during opening tries to guess the compression. |
Uploading fixed file (the former had a typo) |
PR 18161 fixes race condition by using "from ... import ..." which waits until the module be completely initialized if the specified names are not set. |
Correct me if I'm wrong, but I think the behavior of 'import lzma' in In general, I understand that due to how dynamic Python is, it might not be But, looking at the stacktrace from ipython the problem lies in a sequence of I think a race condition caused by simply using Haven't checked if the cause is isolated to how tarfile works, or works in general, though :-( |
By the way, thanks a lot for the fix <3 |
It is intended to support circular imports. Let foo.py contains "import bar" and bar.py contains "import foo". When you execute "import foo", the import machinery first creates an empty module foo, adds it to sys.modules, reads foo.py and executes it in the namespace of module foo. When the interpreter encounters "import bar" in foo.py, the import machinery creates an empty module bar, adds it to sys.modules, reads bar.py and executes it in the namespace of module bar. When the interpreter encounters "import foo" in bar.py, the import machinery takes the module foo from sys.modules. So you break an infinite cycle and can import modules with cyclic dependencies. You can argue that cyclic import does not look as a good practice, but actually it is pretty common case when you import a submodule in a package. If foo/init.py contains "from .bar import Bar", the foo module must be imported before you import foo.bar, but is not completely initialized at that time yet. |
This is a HUGE eye opener! Didn't know of that 'import' vs 'from x import pt., 24 sty 2020, 15:08 użytkownik Serhiy Storchaka <report@bugs.python.org>
|
It's covered in the language reference where import semantics are explained. It isn't explicitly called out in higher-level docs that I'm aware of because it only really comes up in edge cases like importing in a thread which we actively discourage unless there's a compelling reason not to. |
Thanks, Serhiy! I had no idea of those changes to import. This does raise the question, though, of whether accessing an empty module's attributes should either:
It seems like this is a trap waiting for anybody using threaded code and on-demand imports. |
Brett:
When was this change made? Assuming it is, in fact, a change. It seems like most of the stdlib could be called in a thread, so the practice of lazily importing a module wholesale might need to be audited and proactively fixed. Maybe that could be a newcomer's easy(ish) project? |
I suppose it fixed the bug. But I cannot confirm because I cannot reproduce the original bug. |
In msg360620 Serhiy wrote:
The following pdb session that is run when executing the foo.py script, shows that this is not quite accurate. When the interpreter encounters "import foo" in bar.py, the import machinery instead starts executing foo.py once again and detects that at line 9 of foo.py the 'import bar' statement does not have to be executed since the import machinery is already in the process of importing this module. So it is at that point that the infinite cycle is broken. $ cat -n foo.py
1 import pdb
2 debug = 1
3
4 # Prevent starting a new pdb session when foo is imported by bar.
5 if debug and not hasattr(pdb, 'is_pdb_started'):
6 pdb.Pdb(skip=['importlib*']).set_trace()
7 pdb.is_pdb_started = True
8
9 import bar
10 FOO = 'foo'
11 try:
12 print(bar.BAR)
13 except AttributeError as e:
14 # The exception is raised with an explicit reason when foo is imported by
15 # bar due to partially initialized module 'bar'.
16 print(e)
$ cat -n bar.py
1 import foo
2 BAR = 'bar'
3 print(foo.FOO)
$ python foo.py
> /tmp/foo.py(7)<module>()
-> pdb.is_pdb_started = True
(Pdb) step
> /tmp/foo.py(9)<module>()
-> import bar
(Pdb) step
--Call--
> /tmp/bar.py(1)<module>()
-> import foo
(Pdb) step
> /tmp/bar.py(1)<module>()
-> import foo
(Pdb) step
--Call--
> /tmp/foo.py(1)<module>()
-> import pdb
(Pdb) step
> /tmp/foo.py(1)<module>()
-> import pdb
(Pdb) step
> /tmp/foo.py(2)<module>()
-> debug = 1
(Pdb) step
> /tmp/foo.py(5)<module>()
-> if debug and not hasattr(pdb, 'is_pdb_started'):
(Pdb) step
> /tmp/foo.py(9)<module>()
-> import bar
(Pdb) step
> /tmp/foo.py(10)<module>()
-> FOO = 'foo'
(Pdb) step
> /tmp/foo.py(11)<module>()
-> try:
(Pdb) step
> /tmp/foo.py(12)<module>()
-> print(bar.BAR)
(Pdb) step
AttributeError: partially initialized module 'bar' has no attribute 'BAR' (most likely due to a circular import)
> /tmp/foo.py(12)<module>()
-> print(bar.BAR)
(Pdb) step
> /tmp/foo.py(13)<module>()
-> except AttributeError as e:
(Pdb) step
> /tmp/foo.py(16)<module>()
-> print(e)
(Pdb) step
partially initialized module 'bar' has no attribute 'BAR' (most likely due to a circular import)
--Return--
> /tmp/foo.py(16)<module>()->None
-> print(e)
(Pdb) step
> /tmp/bar.py(2)<module>()
-> BAR = 'bar'
(Pdb) step
> /tmp/bar.py(3)<module>()
-> print(foo.FOO)
(Pdb) step
foo
--Return--
> /tmp/bar.py(3)<module>()->None
-> print(foo.FOO)
(Pdb) step
> /tmp/foo.py(10)<module>()
-> FOO = 'foo'
(Pdb) step
> /tmp/foo.py(11)<module>()
-> try:
(Pdb) step
> /tmp/foo.py(12)<module>()
-> print(bar.BAR)
(Pdb) step
bar
--Return--
> /tmp/foo.py(12)<module>()->None
-> print(bar.BAR)
(Pdb) step
$ |
This fixed the bug. To reproduce the bug, substitute the "from _lzma import *" statement with "raise ImportError" |
It has always been this way, so no change here. We have actually improved the situation over the years with attempts at better locking in importlib itself. |
Cool. I appreciate all the work in this area, thank you! |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: