Glob-related error appeared on 0.3.3 #97

avibrazil · 2022-01-05T03:07:44Z

0.3.2 was working fine until I upgraded to 0.3.3 and then the following error started to happen. I don't know why and where is the problem but going back to 0.3.2 fixes the problem.

  File "/home/sagemaker-user/Kroft/my_avm/avm/avm.py", line 1063, in load
    available=[f for f in list(resolved_path.glob(filename)) if '.dvc' not in str(f)]
  File "/home/sagemaker-user/.local/lib/python3.9/site-packages/s3path.py", line 706, in glob
    yield from super().glob(pattern)
  File "/home/sagemaker-user/.pyenv/versions/3.9.9/lib/python3.9/pathlib.py", line 1177, in glob
    for p in selector.select_from(self):
  File "/home/sagemaker-user/.pyenv/versions/3.9.9/lib/python3.9/pathlib.py", line 523, in select_from
    if not is_dir(parent_path):
  File "/home/sagemaker-user/.local/lib/python3.9/site-packages/s3path.py", line 678, in is_dir
    return self._accessor.is_dir(self)
  File "/home/sagemaker-user/.local/lib/python3.9/site-packages/s3path.py", line 169, in is_dir
    resource, _ = self.configuration_map.get_configuration(path)
  File "/home/sagemaker-user/.local/lib/python3.9/site-packages/s3path.py", line 90, in get_configuration
    if resources is None and path in self.resources:
TypeError: argument of type 'NoneType' is not iterable

filename variable contains something like random • * • *.pkl*. And I don't think this is relevant information but this code is a static method of a class and it was running twice, in 2 parallel threads, where first thread succeeded and second failed with above error.

The text was updated successfully, but these errors were encountered:

liormizr · 2022-01-06T14:40:15Z

@avibrazil thanks for the issue

How can I reproduce it?
can you please give me a line to run or a small script?

sbrandtb · 2022-01-10T13:53:55Z

Same issue here, also threading-related. Relevant stack trace:

  File "/usr/local/lib/python3.9/pathlib.py", line 1264, in write_bytes
    with self.open(mode='wb') as f: 
  File "/usr/local/lib/python3.9/site-packages/s3path.py", line 721, in open
    return self._accessor.open(         
  File "/usr/local/lib/python3.9/site-packages/s3path.py", line 205, in open
    resource, config = self.configuration_map.get_configuration(path)
  File "/usr/local/lib/python3.9/site-packages/s3path.py", line 90, in get_configuration
    if resources is None and path in self.resources:                                                      
TypeError: argument of type 'NoneType' is not iterable

The simplified code goes like this:

src = Path('some-local')
dst = S3Path.from_uri('s3://something') / 'foo.csv'
dst.write_bytes(src.read_bytes())

And run this massively parallel. Today I found out that creating boto3 clients is not thread safe and changed my own code like this. s3path is creating its own boto3 resource in class _S3Accessor, maybe that is the culprit.

The issue was there all the time, but became critical in 9950dd0 with version 0.3.3. `class _S3ConfigurationMap` is not thread safe because both `set/get_configuration()` ask for `self.arguments` and `self.resources` being `None`. However, since the configuration map is a singleton, when using the library in a multithreaded environment, it can happen that half the part of `_initial_setup` has run and `self.arguments` is indeed not `None`, but `self.resources` is still `None` because in another thread, creation of the boto3 resource is still ongoing. Then, our thread would *not* call `_initial_setup()`, but still die on `self.resources` being `None` when trying to access it, leading to the error described in liormizr#97. I could not make the bug appear without introducing the IO delay by creating the resource in `_initial_setup()`. Maybe it is a CPython implementation detail that makes `_initial_setup()` atomic of incredibly rare to not be atomic without the creation of the boto resource. Anyways, the setup is now synchronised with a lock, which is required, again, given that `_S3ConfigurationMap` is a singleton. Here is the code that reliably triggers the bug without the change: ```py from multiprocessing.pool import ThreadPool from s3path import S3Path def do(i): dst = S3Path.from_uri('s3://something') / f'hello_{i}.txt' dst.write_bytes(b'hello') ThreadPool(processes=20).map(do, range(100)) ```

sbrandtb · 2022-03-01T16:49:22Z

I submitted a fix. Since I understand the issue better now, here is a workaround that should fix the issue until a new version is released: Basically, use any feature of the library without threads at the beginning of your application. It will make the library initialise without the concurrency issue that is the cause of the exception. I.e.:

S3Path.from_uri('s3://some-bucket-you-have-access-to/').exists()

There are probably other methods that are even shorter, but this one works.

avibrazil · 2022-03-02T10:57:34Z

I’m not sure I got it.

You mean, call the lib somehow before any threading and then the lib can be used later in threading contexts?

Is this the temporary fix?

sbrandtb · 2022-03-02T12:26:13Z

@avibrazil Sorry, I realise I was not clear 😞

Take the example from the pull request:

from multiprocessing.pool import ThreadPool

from s3path import S3Path

def do(i):
    dst = S3Path.from_uri('s3://something') / f'hello_{i}.txt'
    dst.write_bytes(b'hello')

# This line will "fix" the issue because it does ultimately call
# _S3ConfigurationMap.get_configuration()
S3Path.from_uri('s3://some-bucket-you-have-access-to/').exists()

# Now, do your threaded stuff.
ThreadPool(processes=20).map(do, range(100))

The thing is, it needs to call _S3ConfigurationMap.get_configuration(). Pardon me that I do not compile a list of possible functions that do this, because my time is limited 🙂 With the code above you can experiment and find one that also works, if the provided one is not good for you, i.e. because it requires some permission you do not have or so.

EDIT: The important part is calling exists(). How you construct S3Path does not matter.

Fix #97: Concurrency Issue

liormizr · 2022-03-21T09:01:25Z

Hi all,

I merged the fix
I'll update in this issue when I'll create a new release

Thanks!

sbrandtb · 2022-03-22T16:31:30Z

@liormizr Anything we can help with? This issue is blocking my upgrade to Python 3.10, so I am eager to see the new release 🙂

liormizr · 2022-03-22T16:44:33Z

@sbrandtb will it help if I'll deploy a new version now only with this fix?

sbrandtb · 2022-03-22T17:47:53Z

@liormizr yes 🙂

liormizr · 2022-03-22T18:03:01Z

Version 0.3.4 deployed to PyPi
With only this issue fix
Sorry for the delay

I'll update tomorrow when I'll deploy to Conda

liormizr · 2022-03-23T09:15:17Z

Version deployed to Conda

avibrazil · 2022-03-23T13:48:54Z

Oh, that's great!
I'll test it later.
But having it published, maybe it is time to close this bug.

sbrandtb · 2022-03-24T10:11:09Z

Update: Unrelated. This issue was related to this workaround

After updating and running my project in our CI, I am seemingly randomly getting

ValueError: the bucket 'xxx does not exist, or is forbidden for access (ClientError('An error occurred (InvalidAccessKeyId) when calling the CreateMultipartUpload operation: The AWS Access Key Id you provided does not exist in our records.'))

I did not observe this while testing the change locally and I am not sure if if is not connected to another change when migrating to Python 3.10 and updating packages, but want to drop this message here so if anyone else encounters it, he can +1. After all, it looks like some other non-deterministic thing.

avibrazil mentioned this issue Feb 8, 2022

glob is broken #101

Closed

sbrandtb mentioned this issue Mar 1, 2022

Fix #97: Concurrency Issue #102

Merged

liormizr closed this as completed in 21c1cef Mar 21, 2022

liormizr added a commit that referenced this issue Mar 21, 2022

Merge pull request #102 from FRI-DAY/97_fix_concurrency_issue

261aabc

Fix #97: Concurrency Issue

liormizr reopened this Mar 21, 2022

liormizr closed this as completed Mar 28, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Glob-related error appeared on 0.3.3 #97

Glob-related error appeared on 0.3.3 #97

avibrazil commented Jan 5, 2022 •

edited

Loading

liormizr commented Jan 6, 2022

sbrandtb commented Jan 10, 2022

sbrandtb commented Mar 1, 2022

avibrazil commented Mar 2, 2022

sbrandtb commented Mar 2, 2022 •

edited

Loading

liormizr commented Mar 21, 2022

sbrandtb commented Mar 22, 2022

liormizr commented Mar 22, 2022

sbrandtb commented Mar 22, 2022

liormizr commented Mar 22, 2022

liormizr commented Mar 23, 2022

avibrazil commented Mar 23, 2022

sbrandtb commented Mar 24, 2022 •

edited

Loading

Glob-related error appeared on 0.3.3 #97

Glob-related error appeared on 0.3.3 #97

Comments

avibrazil commented Jan 5, 2022 • edited Loading

liormizr commented Jan 6, 2022

sbrandtb commented Jan 10, 2022

sbrandtb commented Mar 1, 2022

avibrazil commented Mar 2, 2022

sbrandtb commented Mar 2, 2022 • edited Loading

liormizr commented Mar 21, 2022

sbrandtb commented Mar 22, 2022

liormizr commented Mar 22, 2022

sbrandtb commented Mar 22, 2022

liormizr commented Mar 22, 2022

liormizr commented Mar 23, 2022

avibrazil commented Mar 23, 2022

sbrandtb commented Mar 24, 2022 • edited Loading

avibrazil commented Jan 5, 2022 •

edited

Loading

sbrandtb commented Mar 2, 2022 •

edited

Loading

sbrandtb commented Mar 24, 2022 •

edited

Loading