'Blob.exists()' does not work within batch context #31

Closed
ludwigschubert opened this issue May 25, 2018 · 5 comments
Labels
api: storage Issues related to the googleapis/python-storage API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@ludwigschubert

The Blob.exists() method does not work when run within a Batch context. The normal behavior of exists() is to return True unless a NotFound exception occurs. Within the Batch context the exception seems to be suppressed and the function returns True. After leaving the Batch context, an Exception is then thrown.

This is how I expected to be able to use the exists() function:

blobs = [storage.blob.Blob(path, bucket) for path in paths]
with client.batch():
  bools = [blob.exists() for blob in blobs]

Without the Batch context manager this code works, if inefficiently. With the Batch context manager, the code returns True for every blob and raises an exception when leaving the context.

This behavior seems unintuitive to me. Please let me know if the API is meant to be used differently. If it is meant to be used as in the provided code sample, I'd be happy to attempt a fix if one of the maintainers could point me in the right direction.

Environment configuration just in case:

  • macOS 10.13.3
  • Python 3.6.5
  • google-cloud-storage==1.8.0
@tseaver
Contributor

tseaver commented May 25, 2018

@ludwigschubert Under the covers, Blob.exists() makes a GET request to the blob's resource URL and converts a 404 response into a False, 20x into a True. This implementation does not fit well with the current Batch design, which fakes the responses inside the context manager, and then applies the sub-responses to the individual targets at exit. See the comment in Blob.exists.
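The mismatch described above can be illustrated with a toy model. This is only a sketch of the deferred-response idea, not the library's code: `ToyBatch`, its `get` method, the local `NotFound` class, and the `exists` helper are all hypothetical stand-ins. It shows why a try/except around a queued request cannot see a 404 that is only discovered when the context exits.

```python
class NotFound(Exception):
    """Stand-in for a 404 error; not the library's exception class."""


class ToyBatch:
    """Toy deferred batch: queues requests and resolves them all on exit."""

    def __init__(self, existing_names):
        self._existing = set(existing_names)  # pretend server-side state
        self._queued = []

    def get(self, name):
        # Inside the batch nothing is sent yet, so nothing can fail here.
        self._queued.append(name)
        return {}  # placeholder standing in for a future result

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            return False  # do not hide errors raised inside the block
        # Only now are the queued requests "sent" and 404s discovered.
        missing = [n for n in self._queued if n not in self._existing]
        if missing:
            raise NotFound(missing[0])
        return False


def exists(batch, name):
    """Same shape as described above: 404 becomes False, anything else True."""
    try:
        batch.get(name)
    except NotFound:
        return False
    return True


def demo():
    results, raised = [], False
    try:
        with ToyBatch(existing_names={"a.txt"}) as batch:
            results = [exists(batch, n) for n in ["a.txt", "missing.txt"]]
    except NotFound:
        raised = True
    return results, raised
```

In this model `demo()` reports True for both names and then records the exception raised at exit, which matches the behavior reported in the original post.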

@IlyaFaer IlyaFaer self-assigned this Aug 2, 2019
@crwilcox crwilcox transferred this issue from googleapis/google-cloud-python Jan 31, 2020
@product-auto-label product-auto-label bot added the api: storage Issues related to the googleapis/python-storage API. label Jan 31, 2020
@yoshi-automation yoshi-automation added 🚨 This issue needs some love. triage me I really want to be triaged. labels Feb 3, 2020
@frankyn frankyn added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. and removed 🚨 This issue needs some love. triage me I really want to be triaged. labels Feb 4, 2020
@davidbernat

To be clear: when this was moved to "feature request" by @frankyn, the implied statement is:

there is no way to use blob.exists() within a batch context

Is this correct?

Thanks!

@davidbernat

For @ludwigschubert and those who arrive via Google.

I found a simple fix for this: wrap each call to blob.exists() in a try/except.

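import multiprocessing

def to_parallelize_catch_exceptions(f, on_error_value, *args):
    # Call f(*args); return on_error_value instead of propagating any error.
    try:
        return f(*args)
    except Exception:
        return on_error_value

def parallelize(blobs, function, args_lists=None, on_error_value=None, n_threads=50):
    # Build one argument tuple per blob, appending any extra per-blob arguments.
    if args_lists is None:
        args = [(function, on_error_value, blob) for blob in blobs]
    else:
        args = [(function, on_error_value, blob, *args_lists[i]) for i, blob in enumerate(blobs)]
    # Note: despite the name n_threads, multiprocessing.Pool starts worker processes.
    with multiprocessing.Pool(min(n_threads, len(blobs))) as p:
        values = p.starmap(to_parallelize_catch_exceptions, args)
    return values

def blob_exists(blob):
    # Return the result so the caller actually receives True/False.
    return blob.exists()

blobs = [....]
with client.batch():
    results = parallelize(blobs, blob_exists, on_error_value=False)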

Of course, there are two layers of abstraction above which are not expressly necessary, but I've copy-pasted my internal code that handles the parallelization for all the cloud client functions. Feel free to comment or DM me for more information as necessary. Cheers.
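For readers who only need the existence checks to run concurrently, a smaller variant without the batch context might look like the sketch below. It is not from the thread: `exists_many` and `FakeBlob` are hypothetical names, each call is still its own GET request, and the sketch assumes the blob objects can safely be used from worker threads.

```python
from concurrent.futures import ThreadPoolExecutor


def exists_many(blobs, max_workers=16):
    """Return [blob.exists(), ...] in input order, one request per blob."""
    if not blobs:
        return []
    workers = min(max_workers, len(blobs))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the input sequence.
        return list(pool.map(lambda blob: blob.exists(), blobs))


class FakeBlob:
    """Hypothetical stand-in so the sketch runs without network access."""

    def __init__(self, present):
        self._present = present

    def exists(self):
        return self._present


checks = exists_many([FakeBlob(True), FakeBlob(False), FakeBlob(True)])
```

With real Blob objects in place of `FakeBlob`, the 404 handling inside `exists()` applies per call, so no extra try/except is needed for missing objects.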

@tseaver tseaver changed the title Storage: Blob.exists() does not work within Batch context 'Blob.exists()' does not work within batch context Aug 17, 2020
@tseaver
Contributor

tseaver commented Aug 17, 2020

Possible implementation at googleapis/google-cloud-python#8618

@cojenco
Contributor

cojenco commented Jun 6, 2023

Thank you, folks, for providing the above workarounds!

The current batch design does not support library methods whose return values depend on the response payload. In this case, the error handling and conversion in blob.exists() is not fully supported in a Batch context.

However, note that a new raise_exception flag has been added to Batch via #1043 (pending release in v2.10.0). Setting raise_exception=False allows all exceptions to be included in the list of returned responses. Although not recommended, it is now possible to get a list of 404 responses by calling blob.exists() with storage_client.batch(raise_exception=False).
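Once such a list of sub-responses is in hand, turning it into booleans could look like the sketch below. How the list is retrieved from the batch is not shown in this thread and is deliberately left out. The helper name `statuses_to_exists` is hypothetical, and the sketch assumes each sub-response exposes an integer `status_code` attribute, with `SimpleNamespace` objects standing in for real responses.

```python
from types import SimpleNamespace


def statuses_to_exists(responses):
    """Map sub-responses to booleans: 404 is False, any 2xx is True."""
    results = []
    for response in responses:
        code = response.status_code  # assumed attribute, see note above
        if code == 404:
            results.append(False)
        elif 200 <= code < 300:
            results.append(True)
        else:
            # Anything else (403, 500, ...) says nothing reliable about existence.
            raise RuntimeError(f"unexpected status {code}")
    return results


# Stand-in responses, in the same order as the queued exists() calls.
fake_responses = [
    SimpleNamespace(status_code=200),
    SimpleNamespace(status_code=404),
]
flags = statuses_to_exists(fake_responses)
```

Raising on other status codes is a design choice in this sketch: a permission or server error is kept distinct from "the blob does not exist".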

In addition, we've also added clarifications on the limited Batch support in the python client, see details in #1045

@cojenco cojenco closed this as completed Jun 6, 2023