Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

azure: use a custom event loop for authentication #5958

Merged
merged 5 commits into from
May 5, 2021

Conversation

isidentical
Copy link
Contributor

Resolves #5556

@isidentical
Copy link
Contributor Author

@friggog can you try this patch to confirm whether it works for you or not? I've tested it by myself, and didn't hit any issues

@friggog
Copy link
Contributor

friggog commented May 4, 2021

Unfortunately not fixed:

DVC version: 2.0.18+59dbea
---------------------------------
Platform: Python 3.7.10 on Windows-10-10.0.19041-SP0
Supports: azure, http, https
Cache types: hardlink
Cache directory: NTFS on C:\
Caches: local
Remotes: azure
Workspace directory: NTFS on C:\
Repo: dvc (subdir), git

Same error as before when running dvc pull/fetch

Traceback (most recent call last):
  File "...lib\site-packages\dvc\remote\base.py", line 35, in wrapper
    func(from_info, to_info, *args, **kwargs)
  File "...lib\site-packages\dvc\fs\base.py", line 284, in download
    return self._download_file(from_info, to_info, name, no_progress_bar,)
  File "...lib\site-packages\dvc\fs\base.py", line 341, in _download_file
    from_info, tmp_file, name=name, no_progress_bar=no_progress_bar
  File "...lib\site-packages\dvc\fs\fsspec_wrapper.py", line 150, in _download
    with self.open(from_info, "rb") as fobj:
  File "...lib\site-packages\dvc\fs\fsspec_wrapper.py", line 86, in open
    return self.fs.open(self._with_bucket(path_info), mode=mode)
  File "...lib\site-packages\fsspec\spec.py", line 946, in open
    **kwargs,
  File "...lib\site-packages\adlfs\spec.py", line 1551, in _open
    **kwargs,
  File "...lib\site-packages\adlfs\spec.py", line 1615, in __init__
    fs.service_client.get_container_client(self.container_name)
  File "...lib\site-packages\azure\storage\blob\aio\_blob_service_client_async.py", line 591, in get_container_client
    key_resolver_function=self.key_resolver_function, loop=self._loop)
  File "...lib\site-packages\azure\storage\blob\aio\_container_client_async.py", line 118, in __init__
    **kwargs)
  File "...lib\site-packages\azure\storage\blob\_container_client.py", line 149, in __init__
    super(ContainerClient, self).__init__(parsed_url, service='blob', credential=credential, **kwargs)
  File "...lib\site-packages\azure\storage\blob\_shared\base_client.py", line 114, in __init__
    self._config, self._pipeline = self._create_pipeline(self.credential, storage_sdk=service, **kwargs)
  File "...lib\site-packages\azure\storage\blob\_shared\base_client_async.py", line 71, in _create_pipeline
    self._credential_policy = AsyncBearerTokenCredentialPolicy(credential, STORAGE_OAUTH_SCOPE)
  File "...lib\site-packages\azure\core\pipeline\policies\_authentication_async.py", line 24, in __init__
    self._lock = asyncio.Lock()
  File "...lib\asyncio\locks.py", line 161, in __init__
    self._loop = events.get_event_loop()
  File "...lib\asyncio\events.py", line 644, in get_event_loop
    % threading.current_thread().name)
RuntimeError: There is no current event loop in thread 'ThreadPoolExecutor-10_0'.

Using azure-identity 1.5.0 and adlfs 0.7.4 if that's relevant

I think the issue arises when using the filesystem from within a ThreadPoolExecutor (e.g. here and here) rather than when initializing it

@isidentical
Copy link
Contributor Author

I think the issue arises when using the filesystem from within a ThreadPoolExecutor (e.g. here and here) rather than when initializing it

Generally, the methods of adlfs are always wrapped with an event loop, though there are 2 exceptions that I can see. One was the initialization, and now by looking at your traceback I also noticed that the open() calls are implemented separately and not wrapped with event loops. Can you try again with the latest version of this PR?

@friggog
Copy link
Contributor

friggog commented May 5, 2021

@isidentical seems to be working locally using DefaultAzureCredential! I'll check with SPA later, but seems like this resolves the issue 🎉

@isidentical
Copy link
Contributor Author

Thanks for trying it out! We can probably wait for the SPA test, and then merge it.

@friggog
Copy link
Contributor

friggog commented May 5, 2021

Also works using SPA - I get these warnings after the command completes though:

Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x7f9a4869db90>
Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x7f9a48670510>
Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x7f9a48616050>
Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x7f9a486097d0>
Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x7f9a48619c10>
Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x7f9a485cb7d0>

This didn't happen locally interestingly

@isidentical
Copy link
Contributor Author

This is probably related to adlfs (unlikely) or the azure's python bindings (likely), where they don't close something somewhere. Honestly I think that we could ignore them just like we ignore normal asyncio warnings, since we are seeing those when they are getting deallocated so that means that is no leak or something like that.

dvc/fs/azure.py Show resolved Hide resolved
dvc/fs/azure.py Show resolved Hide resolved
@efiop efiop merged commit a234f53 into iterative:master May 5, 2021
@efiop efiop added the bugfix fixes bug label May 31, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix fixes bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Issue with fetch from azure remote "no current event loop in thread"
3 participants