Glob-related error appeared on 0.3.3 #97
@avibrazil Thanks for the issue. How can I reproduce it?
Same issue here, also threading-related. Relevant stack trace:

The simplified code goes like this:

```py
from pathlib import Path

from s3path import S3Path

src = Path('some-local')
dst = S3Path.from_uri('s3://something') / 'foo.csv'
dst.write_bytes(src.read_bytes())
```

And run this massively parallel. Today I found out that creating boto3 clients is not thread safe, and I changed my own code accordingly. s3path creates its own boto3 resource in `_initial_setup()`.
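What that change typically looks like, as a hedged sketch: boto3 documents that `Session` objects should not be shared across threads, so a common pattern is one session (and resource) per thread. The helper names below are illustrative, not the commenter's actual code or part of s3path:

```py
import threading

import boto3

# One boto3 session/resource per thread, since Session objects are not
# thread safe. `_thread_local` and `get_s3_resource` are illustrative
# names invented for this sketch.
_thread_local = threading.local()

def get_s3_resource():
    if not hasattr(_thread_local, 's3'):
        session = boto3.session.Session()
        _thread_local.s3 = session.resource('s3')
    return _thread_local.s3
```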
The issue was there all the time, but became critical in 9950dd0 with version 0.3.3. `class _S3ConfigurationMap` is not thread safe because both `set/get_configuration()` check whether `self.arguments` and `self.resources` are `None`. However, since the configuration map is a singleton, when using the library in a multithreaded environment it can happen that half of `_initial_setup()` has run and `self.arguments` is indeed not `None`, but `self.resources` is still `None`, because in another thread the creation of the boto3 resource is still ongoing. Then our thread would *not* call `_initial_setup()`, but would still die on `self.resources` being `None` when trying to access it, leading to the error described in liormizr#97.

I could not make the bug appear without the IO delay introduced by creating the resource in `_initial_setup()`. Maybe it is a CPython implementation detail that makes `_initial_setup()` atomic, or incredibly rare to not be atomic, without the creation of the boto3 resource. Anyway, the setup is now synchronised with a lock, which is required, again, given that `_S3ConfigurationMap` is a singleton.

Here is the code that reliably triggers the bug without the change:

```py
from multiprocessing.pool import ThreadPool

from s3path import S3Path


def do(i):
    dst = S3Path.from_uri('s3://something') / f'hello_{i}.txt'
    dst.write_bytes(b'hello')


ThreadPool(processes=20).map(do, range(100))
```
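A minimal sketch of the locking approach described above. The class and attribute names follow the comment, but the body is an assumption, not the actual patch; the real class in s3path holds more state:

```py
import threading

import boto3


class _S3ConfigurationMap:
    # Hedged sketch: guard the lazy setup with a lock so only one
    # thread creates the boto3 resource.
    def __init__(self):
        self.arguments = None
        self.resources = None
        self._lock = threading.Lock()

    def _initial_setup(self):
        with self._lock:
            # Re-check under the lock: another thread may have finished
            # the setup while we were waiting to acquire it.
            if self.arguments is None or self.resources is None:
                self.arguments = {}
                self.resources = boto3.resource('s3')  # the slow, racy part

    def get_configuration(self):
        # Fast path without the lock; only the actual setup is synchronised.
        if self.arguments is None or self.resources is None:
            self._initial_setup()
        return self.arguments, self.resources
```

The re-check inside the lock (double-checked locking) matters: without it, every thread that saw `None` before acquiring the lock would redo the setup.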
I submitted a fix. Since I understand the issue better now, here is a workaround that should fix the issue until a new version is released: basically, use any feature of the library, without threads, at the beginning of your application. It will make the library initialise without the concurrency issue that is the cause of the exception. For example:

```py
S3Path.from_uri('s3://some-bucket-you-have-access-to/').exists()
```

There are probably other methods that are even shorter, but this one works.
I’m not sure I got it. You mean, call the lib somehow before any threading, and then the lib can be used later in threading contexts? Is this the temporary fix?
@avibrazil Sorry, I realise I was not clear 😞 Take the example from the pull request:

```py
from multiprocessing.pool import ThreadPool

from s3path import S3Path


def do(i):
    dst = S3Path.from_uri('s3://something') / f'hello_{i}.txt'
    dst.write_bytes(b'hello')


# This line will "fix" the issue because it does ultimately call
# _S3ConfigurationMap.get_configuration()
S3Path.from_uri('s3://some-bucket-you-have-access-to/').exists()

# Now, do your threaded stuff.
ThreadPool(processes=20).map(do, range(100))
```

The thing is, it needs to call `_S3ConfigurationMap.get_configuration()` once before any threads start.

EDIT: The important part is calling something that ultimately triggers `get_configuration()` before the threaded code runs; the `.exists()` call is just one convenient way to do that.
Hi all, I merged the fix. Thanks!
@liormizr Anything we can help with? This issue is blocking my upgrade to Python 3.10, so I am eager to see the new release 🙂
@sbrandtb Will it help if I deploy a new version now with only this fix?
@liormizr yes 🙂
Version 0.3.4 deployed to PyPI. I'll update tomorrow when I deploy to Conda.
Version deployed to Conda.
Oh, that's great!
Update: Unrelated. This issue was related to this workaround.

After updating and running my project in our CI, I am seemingly randomly getting an error. I did not observe it while testing the change locally, and I am not sure whether it is connected to another change made when migrating to Python 3.10 and updating packages, but I want to drop this message here so that anyone else who encounters it can +1. After all, it looks like some other non-deterministic thing.
0.3.2 was working fine until I upgraded to 0.3.3, and then the following error started to happen. I don't know why or where the problem is, but going back to 0.3.2 fixes it. The `filename` variable contains something like `random • * • *.pkl*`. And I don't think this is relevant information, but this code is a static method of a class, and it was running twice, in two parallel threads, where the first thread succeeded and the second failed with the above error.
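For context, a hedged reconstruction of the reporter's scenario: the original code was not included, so the bucket name, the glob pattern, and the threading setup below are assumptions, chosen only to show how a glob running in two parallel threads could hit the half-initialised configuration map before 0.3.4:

```py
from multiprocessing.pool import ThreadPool

from s3path import S3Path


def find_pickles(_):
    # Placeholder bucket and pattern; the real pattern looked like
    # random • * • *.pkl*
    bucket = S3Path('/some-bucket')
    return list(bucket.glob('random*.pkl*'))


# Two parallel threads, as in the report: the first could succeed while
# the second crashed inside the configuration map.
results = ThreadPool(processes=2).map(find_pickles, range(2))
```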