GCSFileSystem() hangs when called from multiple processes #379

Open
JackKelly opened this issue Apr 26, 2021 · 44 comments

@JackKelly

JackKelly commented Apr 26, 2021

What happened:
In the last two versions of gcsfs (versions 2021.04.0 and 0.8.0), calling gcsfs.GCSFileSystem() from multiple processes hangs without any error messages if gcsfs.GCSFileSystem() has been called previously in the same Python interpreter session.

This bug was not present in gcsfs version 0.7.2 (with fsspec 0.8.7). All the code examples below work perfectly with gcsfs version 0.7.2 (with fsspec 0.8.7).

Minimal Complete Verifiable Example:

The examples below assume gcsfs version 2021.04.0 (with fsspec 2021.04.0) or gcsfs version 0.8.0 (with fsspec 0.9.0) is installed.

Create a fresh conda environment: conda create --name test_gcsfs python=3.8 gcsfs ipykernel

The last block of this code hangs:

from concurrent.futures import ProcessPoolExecutor
import gcsfs

# This line works fine!  (And it's fine to repeat this line multiple times.)
gcs = gcsfs.GCSFileSystem() 

# This block hangs, with no error messages:
with ProcessPoolExecutor() as executor:
    for i in range(8):
        future = executor.submit(gcsfs.GCSFileSystem)

But, if we don't do gcs = gcsfs.GCSFileSystem(), then the code works fine. The next code example works perfectly if run in a fresh Python interpreter. The only difference from the previous example is that I've removed gcs = gcsfs.GCSFileSystem().

from concurrent.futures import ProcessPoolExecutor
import gcsfs

# This works fine:
with ProcessPoolExecutor() as executor:
    for i in range(8):
        future = executor.submit(gcsfs.GCSFileSystem)

Likewise, running the ProcessPoolExecutor block multiple times works the first time but hangs on subsequent attempts:

from concurrent.futures import ProcessPoolExecutor
import gcsfs

def process_pool():
    with ProcessPoolExecutor(max_workers=1) as executor:
        for i in range(8):
            future = executor.submit(gcsfs.GCSFileSystem)

# The first attempt works fine:
process_pool()

# This second attempt hangs:
process_pool()

Anything else we should know:

Thank you so much for all your hard work on gcsfs - it's a hugely useful tool! Sorry to be reporting a bug!

I tested all this code in a Jupyter Lab notebook.

This issue might be related to this Stack Overflow issue: https://stackoverflow.com/questions/66283634/use-gcsfilesystem-with-multiprocessing

Environment:

  • Dask version: Not installed
  • Python version: 3.8
  • Operating System: Ubuntu 20.10
  • Install method: conda, from conda-forge, using a fresh conda environment.
@martindurant
Member

Please try changing your subprocess execution method to spawn or forkserver https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods
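
A sketch of that change applied to the first example (in a fresh interpreter; note that set_start_method can only be called once per session, and scripts need the usual if __name__ == '__main__' guard):

import multiprocessing
from concurrent.futures import ProcessPoolExecutor

import gcsfs

# 'spawn' (or 'forkserver') starts workers in a fresh interpreter instead of
# forking one that already holds gcsfs's event loop and IO thread.
multiprocessing.set_start_method('spawn')

gcs = gcsfs.GCSFileSystem()

with ProcessPoolExecutor() as executor:
    for i in range(8):
        future = executor.submit(gcsfs.GCSFileSystem)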

@JackKelly
Author

Wow - thank you for the very quick reply! You're right, adding the line multiprocessing.set_start_method('forkserver') or multiprocessing.set_start_method('spawn') fixes the issue in my minimal example...

...Let me check if this also fixes the issue in my 'real' code project (loading data from a Zarr store in parallel from a PyTorch DataLoader)....

Is this expected behaviour?

@martindurant
Member

Fork is known to cause problems when there are any other threads or event loops already active. I thought there was a guard in the code to give a reasonable error in this case, but it seems not.

@JackKelly
Author

Unfortunately setting the start method to either spawn or forkserver doesn't work for my PyTorch code.

Here's the error when I add multiprocessing.set_start_method('spawn') at the start of the code:

------------------------------------------------------------------------
RuntimeError                           Traceback (most recent call last)
~/miniconda3/envs/predict_pv_yield/lib/python3.8/site-packages/torch/utils/data/dataloader.py in _try_get_data(self, timeout)
    985         try:
--> 986             data = self._data_queue.get(timeout=timeout)
    987             return (True, data)

~/miniconda3/envs/predict_pv_yield/lib/python3.8/multiprocessing/queues.py in get(self, block, timeout)
    106                     timeout = deadline - time.monotonic()
--> 107                     if not self._poll(timeout):
    108                         raise Empty

~/miniconda3/envs/predict_pv_yield/lib/python3.8/multiprocessing/connection.py in poll(self, timeout)
    256         self._check_readable()
--> 257         return self._poll(timeout)
    258 

~/miniconda3/envs/predict_pv_yield/lib/python3.8/multiprocessing/connection.py in _poll(self, timeout)
    423     def _poll(self, timeout):
--> 424         r = wait([self], timeout)
    425         return bool(r)

~/miniconda3/envs/predict_pv_yield/lib/python3.8/multiprocessing/connection.py in wait(object_list, timeout)
    930             while True:
--> 931                 ready = selector.select(timeout)
    932                 if ready:

~/miniconda3/envs/predict_pv_yield/lib/python3.8/selectors.py in select(self, timeout)
    414         try:
--> 415             fd_event_list = self._selector.poll(timeout)
    416         except InterruptedError:

~/miniconda3/envs/predict_pv_yield/lib/python3.8/site-packages/torch/utils/data/_utils/signal_handling.py in handler(signum, frame)
     65         # Python can still get and update the process status successfully.
---> 66         _error_if_any_worker_fails()
     67         if previous_handler is not None:

RuntimeError: DataLoader worker (pid 67567) exited unexpectedly with exit code 1. Details are lost due to multiprocessing. Rerunning with num_workers=0 may give better error trace.

The above exception was the direct cause of the following exception:

RuntimeError                           Traceback (most recent call last)
<timed exec> in <module>

~/miniconda3/envs/predict_pv_yield/lib/python3.8/site-packages/torch/utils/data/dataloader.py in __next__(self)
    515             if self._sampler_iter is None:
    516                 self._reset()
--> 517             data = self._next_data()
    518             self._num_yielded += 1
    519             if self._dataset_kind == _DatasetKind.Iterable and \

~/miniconda3/envs/predict_pv_yield/lib/python3.8/site-packages/torch/utils/data/dataloader.py in _next_data(self)
   1180 
   1181             assert not self._shutdown and self._tasks_outstanding > 0
-> 1182             idx, data = self._get_data()
   1183             self._tasks_outstanding -= 1
   1184             if self._dataset_kind == _DatasetKind.Iterable:

~/miniconda3/envs/predict_pv_yield/lib/python3.8/site-packages/torch/utils/data/dataloader.py in _get_data(self)
   1146         else:
   1147             while True:
-> 1148                 success, data = self._try_get_data()
   1149                 if success:
   1150                     return data

~/miniconda3/envs/predict_pv_yield/lib/python3.8/site-packages/torch/utils/data/dataloader.py in _try_get_data(self, timeout)
    997             if len(failed_workers) > 0:
    998                 pids_str = ', '.join(str(w.pid) for w in failed_workers)
--> 999                 raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
   1000             if isinstance(e, queue.Empty):
   1001                 return (False, None)

RuntimeError: DataLoader worker (pid(s) 67567) exited unexpectedly

Here's the error when I add multiprocessing.set_start_method('forkserver') at the start of the code:

------------------------------------------------------------------------
Empty                                  Traceback (most recent call last)
~/miniconda3/envs/predict_pv_yield/lib/python3.8/site-packages/torch/utils/data/dataloader.py in _try_get_data(self, timeout)
    985         try:
--> 986             data = self._data_queue.get(timeout=timeout)
    987             return (True, data)

~/miniconda3/envs/predict_pv_yield/lib/python3.8/multiprocessing/queues.py in get(self, block, timeout)
    107                     if not self._poll(timeout):
--> 108                         raise Empty
    109                 elif not self._poll():

Empty: 

The above exception was the direct cause of the following exception:

RuntimeError                           Traceback (most recent call last)
<timed exec> in <module>

~/miniconda3/envs/predict_pv_yield/lib/python3.8/site-packages/torch/utils/data/dataloader.py in __next__(self)
    515             if self._sampler_iter is None:
    516                 self._reset()
--> 517             data = self._next_data()
    518             self._num_yielded += 1
    519             if self._dataset_kind == _DatasetKind.Iterable and \

~/miniconda3/envs/predict_pv_yield/lib/python3.8/site-packages/torch/utils/data/dataloader.py in _next_data(self)
   1180 
   1181             assert not self._shutdown and self._tasks_outstanding > 0
-> 1182             idx, data = self._get_data()
   1183             self._tasks_outstanding -= 1
   1184             if self._dataset_kind == _DatasetKind.Iterable:

~/miniconda3/envs/predict_pv_yield/lib/python3.8/site-packages/torch/utils/data/dataloader.py in _get_data(self)
   1146         else:
   1147             while True:
-> 1148                 success, data = self._try_get_data()
   1149                 if success:
   1150                     return data

~/miniconda3/envs/predict_pv_yield/lib/python3.8/site-packages/torch/utils/data/dataloader.py in _try_get_data(self, timeout)
    997             if len(failed_workers) > 0:
    998                 pids_str = ', '.join(str(w.pid) for w in failed_workers)
--> 999                 raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
   1000             if isinstance(e, queue.Empty):
   1001                 return (False, None)

RuntimeError: DataLoader worker (pid(s) 67640, 67641, 67642, 67643, 67644, 67645, 67646, 67647) exited unexpectedly

@martindurant
Member

I'm afraid those tracebacks are not coming from the child process, so they don't contain any useful information.

@martindurant
Member

Discussion on fork and event loops: https://bugs.python.org/issue21998
Note that threads suffer from this separately. You should, at the very minimum, clear fsspec's instance cache before launching subprocesses:

gcs = fsspec.filesystem("gcs", ...)
# use gcs
gcs.clear_instance_cache()
# call subprocesses

and do not pass the instance gcs to the child processes.
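
Roughly, applied to the ProcessPoolExecutor example from the issue description (a sketch, assuming the default 'fork' start method; credentials and other arguments omitted):

from concurrent.futures import ProcessPoolExecutor

import fsspec
import gcsfs

gcs = fsspec.filesystem("gcs")  # use gcs in the parent process here
gcs.clear_instance_cache()      # drop cached instances before forking

with ProcessPoolExecutor() as executor:
    for i in range(8):
        # each child builds its own filesystem; the parent's gcs is never passed in
        future = executor.submit(gcsfs.GCSFileSystem)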

@JackKelly
Author

Thanks loads for the additional help!

Unfortunately, PyTorch doesn't log error messages from the child processes. I've tried (briefly) and I can't seem to peer into the child processes (at least, not without hacking the PyTorch library itself).

I've tried setting the start method using torch.multiprocessing.set_start_method and/or using dataloader.multiprocessing_context = 'spawn' but these still result in the same error messages above :(

Right now, the only fix I'm aware of for my code is to use gcsfs version 0.7.2.

Do you know what might have changed within gcsfs between version 0.7.2 and version 0.8.0 to cause this to stop working in 0.8.0? Version 0.7.2 works fine for the code above :)

@martindurant
Member

You might also try v 2021.04.0.

@JackKelly
Author

Unfortunately this is broken for me in the two most recent versions of gcsfs (i.e. it's broken in both 0.8.0 and 2021.04.0).

@martindurant
Member

To be sure, if you don't use gcs/fsspec in the main process, all is well?

@martindurant
Member

Pytorch claims to use spawn, so if you can make code that causes a crash with spawn, I'll have something specific to fix.
The fsspec tests try HTTPFileSystem (another one that is async) with both spawn and forkserver.

@JackKelly
Author

To be sure, if you don't use gcs/fsspec in the main process, all is well?

That's almost the case in the minimal code snippet at the top of this thread (although the code hangs the second time we run the ProcessPoolExecutor block).

In my PyTorch code, I haven't yet tried removing calls to gcsfs.GCSFileSystem() from the main process (but that shouldn't be too much work).

Pytorch claims to use spawn, so if you can make code that causes a crash with spawn, I'll have something specific to fix.

OK, sounds good. Yeah, I think that perhaps my minimal code snippet isn't perfectly capturing the issue I'm experiencing with PyTorch + gcsfs. I'll need to dig through the PyTorch library code to try to come up with a better minimal code snippet... unfortunately that'll take a little while as the rest of this week is full of meetings :(

@martindurant
Member

That's almost the case in the minimal code snippet at the top of this thread (although the code hangs the second time we run the ProcessPoolExecutor block).

Right, but only with fork - which is known to be troublesome, and which we attempt to warn the user against. It is interesting that it works once and then not - I'm not sure what that means.

@JackKelly
Author

Right, but only with fork

Yeah, exactly.

I've had a quick look at the PyTorch library code and tried to create a simple minimal example. I've failed so far! TBH, the PyTorch multiprocessing code looks too complex for me to wrap my head round any time soon. So, instead, I'm going to push ahead with my machine learning research using gcsfs 0.7.2 (which works perfectly for my needs!). If this ML research pans out and we build a production service then I may revisit this issue to try to help fix it! Sorry, I appreciate that's not ideal for gcsfs!

For reference, here's the PyTorch loop which constructs the worker processes:
https://github.com/pytorch/pytorch/blob/2f598b53ddfbd2dbbaddb76a0f30018a713f3c7a/torch/utils/data/dataloader.py#L898

@JackKelly
Author

JackKelly commented May 4, 2021

Quick update:

I've just tried opening my Zarr file on Google Cloud like this:

import xarray as xr
dataset = xr.open_zarr('gs://<bucket>/<path>')

instead of like this:

    gcs = gcsfs.GCSFileSystem(access='read_only')
    store = gcsfs.GCSMap(root='<bucket>/<path>', gcs=gcs)
    dataset = xr.open_zarr(store, consolidated=True)

I've also tried updating all my other Python packages.

Unfortunately, neither of these changes fixes the crash when using gcsfs 2021.4.0 with a PyTorch multi-process data loader. But, as before, this isn't fatal for my work, because I can continue using gcsfs 0.7.2 and all is well :)

@martindurant
Member

dataset = xr.open_zarr('gs://<bucket>/<path>')

This is essentially the same as before; but you are not using multiprocessing now?

@JackKelly
Author

Ah, sorry for not being explicit: I'm still using multiprocessing (via PyTorch), and the code still hangs if I use multiprocessing with gcsfs 2021.4.0 or 0.8, sorry.

In terms of next steps... would one or both of these actions be useful?

  1. Whilst it's probably beyond me (!) to create a minimal code example demonstrating this bug without PyTorch (because that would require me to understand PyTorch's innards well enough to re-create the essential parts of the PyTorch code!), I should be able to make a minimal code example with PyTorch and Zarr. Would that be useful?
  2. And/or I could post on the PyTorch GitHub issue queue to ask them to help.

@JackKelly
Author

Hi @martindurant,

Re-reading this issue, I'm starting to wonder if I'm doing something stupid in my code :)

In my code, every sub-process does this when the sub-process starts up:

gcs = gcsfs.GCSFileSystem(access='read_only')
store = gcsfs.GCSMap(root=filename, gcs=gcs)
dataset = xr.open_zarr(store, consolidated=consolidated)
gcs.clear_instance_cache()

Is that wrong?! Should only the parent process run the code above, and then pass a copy of dataset into each child process?!

@martindurant
Member

Maybe call gcs.clear_instance_cache() before the block instead of at the end, or include skip_instance_cache=True in the constructor; but this still doesn't clear the reference to the loop and thread. You could do that with

fsspec.asyn.iothread[0] = None
fsspec.asyn.loop[0] = None

and that is what any fork-detecting code should be doing.
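
One way to arrange that in user code (a sketch of the idea) is an os.register_at_fork hook, so every forked child starts with clean references:

import os

import fsspec.asyn

def _reset_fsspec_loop():
    # Discard the IO thread and event loop inherited from the parent process.
    fsspec.asyn.iothread[0] = None
    fsspec.asyn.loop[0] = None

os.register_at_fork(after_in_child=_reset_fsspec_loop)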

@JackKelly
Author

Oooh... some good news! I've got my multiprocessing code working with PyTorch and gcsfs 2021.4.0 :) I've made quite a few changes so I need to do some more tinkering to figure out exactly which change fixed the issue! (Thanks for all your help btw!)

@martindurant
Member

Very glad to hear it

@JackKelly
Author

I'm pretty sure it's the trick you suggested in your comment above which fixed the issue :)

    fsspec.asyn.iothread[0] = None
    fsspec.asyn.loop[0] = None

I run that in every process. To be even more specific, I run this in every process:

from pathlib import Path
from typing import Union

import fsspec.asyn
import gcsfs
import xarray as xr

def open_zarr_on_gcp(filename: Union[str, Path]) -> xr.Dataset:
    """Lazily opens the Zarr store on Google Cloud Storage (GCS)."""
    gcs = gcsfs.GCSFileSystem(
        access='read_only',
        skip_instance_cache=True  # Why skip_instance_cache?  See https://github.com/dask/gcsfs/issues/379#issuecomment-839929801
    )

    # Clear references to the loop and thread.
    # See https://github.com/dask/gcsfs/issues/379#issuecomment-839929801
    # Only relevant for fsspec >= 0.9.0
    fsspec.asyn.iothread[0] = None
    fsspec.asyn.loop[0] = None

    store = gcsfs.GCSMap(root=filename, gcs=gcs)
    return xr.open_zarr(store)

@JackKelly
Author

JackKelly commented May 13, 2021

I've done a few more experiments (in the hopes that this might be of use to other people in a similar situation; or maybe useful to help understand what's going on!)

It turns out that fsspec.asyn.iothread[0] = None; fsspec.asyn.loop[0] = None needs to be run in every worker process. It's not sufficient to just do this in the parent process.

It doesn't matter if the code does fsspec.asyn.iothread[0] = None; fsspec.asyn.loop[0] = None before or after gcs = gcsfs.GCSFileSystem().

When using fsspec.asyn.iothread[0] = None; fsspec.asyn.loop[0] = None, it's no longer necessary to do skip_instance_cache=True or gcs.clear_instance_cache().

Each worker process has to open the Zarr store. If I try lazily opening the Zarr store in the main process and passing this object into each worker process then fsspec throws an error saying it's not thread safe. That's fine, it's no problem for my code to open the Zarr store in each worker process.
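
For reference, another way to make sure the reset runs in every worker is PyTorch's DataLoader worker_init_fn hook (a sketch, not necessarily what my code above does; my_dataset is a placeholder):

import fsspec.asyn
from torch.utils.data import DataLoader

def worker_init_fn(worker_id):
    # Runs once in each worker process, before any data loading:
    # drop the loop/IO-thread references inherited from the parent.
    fsspec.asyn.iothread[0] = None
    fsspec.asyn.loop[0] = None

# loader = DataLoader(my_dataset, num_workers=8, worker_init_fn=worker_init_fn)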

@martindurant
Member

OK, that more or less confirms what I thought. I still don't know how the code can be in a place to set those to None after fork and before any instance tries to use the defunct objects. Or maybe we can simply document this situation?

@JackKelly
Author

Yeah, I agree - I think it's sufficient to just document this somewhere. Happy to have a go at drafting a paragraph or two if you can recommend where best to document this! (But also more than happy for someone else to write it!)

@martindurant
Member

I suppose it should mainly be documented in fsspec along with the rest of the async docs, but also mentioned here in gcsfs (which has far less documentation in general).

@ShethR

ShethR commented Jul 20, 2021

Downgrading to gcsfs==0.7.2 and fsspec==0.8.0 worked for me.

@martindurant
Member

@ShethR : did you try avoiding fork, as discussed above?

@ShethR

ShethR commented Jul 20, 2021

No I did not! I used the default fork method.
Just for added information, I am using torch.multiprocessing instead of native multiprocessing. But I guess they both should have similar behavior.

@martindurant
Member

i guess they both should have similar behavior.

They will be executing the same low-level OS calls, yes. Fork is a long-standing, highly efficient, but problematic mechanism.

@lhoestq

lhoestq commented May 19, 2022

Same issue for HTTPFileSystem, and fsspec.asyn.iothread[0] = None; fsspec.asyn.loop[0] = None also does the job.

I still don't know how the code can be in a place to set those to None after fork and before any instance tries to use the defunct objects.

It's been a few months since the last message; do you have a better idea of how it could be done?

@martindurant
Member

Again, I would urge everyone simply not to do this! I'll comment shortly in fsspec/filesystem_spec#963 on what needs to happen.

@swt2c

swt2c commented May 19, 2022

Again, I would urge everyone simply not to do this! I'll comment shortly in fsspec/filesystem_spec#963 on what needs to happen.

There are situations where fork() is useful, though, such as sharing a large read-only object between multiple processes without having to copy it / duplicate memory usage. It sure would be nice if fsspec was fork-friendly. :-)

@martindurant
Member

It sure would be nice if fsspec was fork-friendly

In practice, it's hard, and there's a reason that dask, for instance, doesn't do this. Following the prescription in the linked thread, I think it would be fair for fsspec to require the calling code to make sure that fork safety requirements are met, rather than attempt to automatically detect the change of PID and arrange the appropriate discarding of objects itself.

There are actually many python constructs that are (surprisingly) not fork safe, such as a requests Session or an open file object - never mind anything to do with async or threads.

sharing a large read-only object between multiple processes

I intend on working on (single node) shared memory for dask distributed this summer, if that helps.

@frankShih

I am facing the same issue when using concurrent.futures.ThreadPoolExecutor.
Any possible solutions?

@martindurant
Member

@frankShih , please provide a reproducer. gcsfs is routinely used from multiple threads, particularly by dask.

@fg91

fg91 commented Feb 22, 2023

I recommend this blog post if you want to understand why using the fork start method in combination with multi-threading is generally a bad idea.

@SaravananSathyanandhaQC

SaravananSathyanandhaQC commented May 26, 2023

I came across the same issue - I'm running FastAPI on Gunicorn with Uvicorn workers, 4 workers running. What's the recommended way to use GCSFileSystem in such a setup? I use fsspec.filesystem("gs") to initialize in case that's relevant.

@fg91

fg91 commented May 26, 2023

I came across the same issue - I'm running FastAPI on Gunicorn with Uvicorn workers, 4 workers running. What's the recommended way to use GCSFileSystem in such a setup? I use fsspec.filesystem("gs") to initialize in case that's relevant.

You could try this (assuming that you fork the worker processes causing deadlocks):

import os
import fsspec.asyn

os.register_at_fork(
    after_in_child=fsspec.asyn.reset_lock,
)

@SaravananSathyanandhaQC

I came across the same issue - I'm running FastAPI on Gunicorn with Uvicorn workers, 4 workers running. What's the recommended way to use GCSFileSystem in such a setup? I use fsspec.filesystem("gs") to initialize in case that's relevant.

You could try this (assuming that you fork the worker processes causing deadlocks):

os.register_at_fork(
    after_in_child=fsspec.asyn.reset_lock,
)

No difference unfortunately. I'm using gunicorn with --preload, so putting that line at the start of my app.py should get picked up before the fork happens, but I'm not certain how gunicorn handles its forking.

@martindurant
Member

The best solution is not to use gcsfs in the parent process before fork, sorry :|

You may wish to try the nascent GCS support in rfsspec, which does not use an IO thread. It has not been tested against fork at all.

@fg91

fg91 commented May 28, 2023

I came across the same issue - I'm running FastAPI on Gunicorn with Uvicorn workers, 4 workers running. What's the recommended way to use GCSFileSystem in such a setup? I use fsspec.filesystem("gs") to initialize in case that's relevant.

You could try this (assuming that you fork the worker processes causing deadlocks):

os.register_at_fork(
    after_in_child=fsspec.asyn.reset_lock,
)

No difference unfortunately. I'm using gunicorn with --preload so putting that line in at the start of my app.py should get picked up before the fork happens, but I'm not certain about how gunicorn handles its forking

If you are not sure whether the registered hook gets picked up before the fork you could also just directly call fsspec.asyn.reset_lock() at the beginning of your code. This should release the lock that is causing the deadlock.

@SaravananSathyanandhaQC

I came across the same issue - I'm running FastAPI on Gunicorn with Uvicorn workers, 4 workers running. What's the recommended way to use GCSFileSystem in such a setup? I use fsspec.filesystem("gs") to initialize in case that's relevant.

You could try this (assuming that you fork the worker processes causing deadlocks):

os.register_at_fork(
    after_in_child=fsspec.asyn.reset_lock,
)

No difference unfortunately. I'm using gunicorn with --preload so putting that line in at the start of my app.py should get picked up before the fork happens, but I'm not certain about how gunicorn handles its forking

If you are not sure whether the registered hook gets picked up before the fork you could also just directly call fsspec.asyn.reset_lock() at the beginning of your code. This should release the lock that is causing the deadlock.

This did the trick for me! As you say, I had to find the right place to put it (there was some ThreadPoolExecutor code pre-fork as well which was messing with things). By making sure I called reset_lock (in the parent process) after fsspec.filesystem(...) had been called in the parent, things worked nicely in all the forked workers.
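
In code, that ordering looks roughly like this (a sketch; the module layout under gunicorn --preload is illustrative):

import fsspec
import fsspec.asyn

# In the pre-fork parent process (e.g. a module imported under gunicorn --preload):
fs = fsspec.filesystem("gs")  # any fsspec use in the parent happens first
# ... use fs ...
fsspec.asyn.reset_lock()      # then release the lock before the workers are forked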

@arnavmehta7

Hey guys, did anyone figure out how to use it in threads?

I am getting this error:

  File "/usr/local/lib/python3.10/site-packages/fsspec/asyn.py", line 81, in sync
    raise RuntimeError("Loop is not running")
RuntimeError: Loop is not running

I have the following block in "each" thread, and new threads might spawn:

fsspec.asyn.iothread[0] = None
