Skip to content

Conversation

@isidentical
Copy link
Contributor

@isidentical isidentical commented Sep 14, 2021

When an async filesystem (fsspec-level) makes a call to set_size() callback with None (the size is not inferrable), we call the info() on the dvc-level sync filesystem which delates this call to fsspec's async function run in a separate loop. This situation is creating a deadlock where we make a call from an async function (e.g _get_file) to another async function (_info) with a wrapped sync execution (info()) between them.

For the simplicity, this patch drops the current behavior of consulting to info() on set_size(None).

@isidentical isidentical self-assigned this Sep 14, 2021
@isidentical isidentical requested a review from a team as a code owner September 14, 2021 09:47
@efiop
Copy link
Contributor

efiop commented Sep 14, 2021

@isidentical Personally, the second option seems to be way too convoluted and looks out of place in the callback itself. The first option seems nicer. But I'm not sure I fully understand

But for example, for goole storage, it might not find the size among the headers but can find it on a HEAD_OBJECT call.

could you elaborate, please?

@skshetry
Copy link
Collaborator

I have gone through other filesystems and have been doing fs.getsize for callback support (we were doing this anyway for tqdm progress bars in upload/download). I think we should be able to remove this and replace with explicit calls in the future.

@isidentical
Copy link
Contributor Author

Personally, the second option seems to be way too convoluted and looks out of place in the callback itself.

I also feel that way.

could you elaborate, please?

image
For example this is how gcsfs calls the set_size callback, which might get None if the server doesn't include headers. The same might go to HTTP, where the GET request might not contain the content-length, but the HEAD can. So if the set_size() is called with None, we call info() to retrieve the size in the proper way (for gcsfs, we make an HEAD_OBJECT API call, and retrieve the object's size from that call's result, not from header).

@skshetry
Copy link
Collaborator

Hmm, that's weird. I'd expect Google Cloud Storage to provide a uniform response, unlike HTTP servers.

@isidentical
Copy link
Contributor Author

From what I have observed, it does on the cases I have tried. This was just an example of what could have happen if it didn't (or another provider didn't, e.g a mock server).

@efiop
Copy link
Contributor

efiop commented Sep 14, 2021

@isidentical if mock server doesn't provide it - it is a bug in the mock server and it should be fixed to follow the behaviour of the real thing. So we probably don't need to overengineer it like that and could use a simpler option.

@isidentical isidentical force-pushed the remove-progress-updates branch from 236a730 to e62caa5 Compare September 14, 2021 14:32
dvc/progress.py Outdated
Comment on lines +177 to +178
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So looks like we no longer need self.fs and self.path_info anymore?

@efiop
Copy link
Contributor

efiop commented Sep 15, 2021

btw, please rebase on. top of master, we've increased timeouts for now (though windows taking half an hour is not normal)

@isidentical isidentical force-pushed the remove-progress-updates branch from e62caa5 to c9db3d9 Compare September 15, 2021 10:38
@isidentical isidentical changed the title [WIP] progress: do not block on async filesystems progress: do not block on async filesystems Sep 15, 2021
@isidentical isidentical force-pushed the remove-progress-updates branch from c9db3d9 to e2c9ba6 Compare September 15, 2021 10:41
@efiop efiop merged commit a478a0e into iterative:master Sep 21, 2021
@efiop efiop added the bugfix fixes bug label Sep 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugfix fixes bug

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants