AttributeError: 'Blob' object has no attribute 'download_as_bytes' #601

Closed
3 tasks done
Cartman75 opened this issue Apr 5, 2021 · 11 comments · Fixed by #687

Comments

@Cartman75

Cartman75 commented Apr 5, 2021

Problem description

Be sure your description clearly answers the following questions:

  • What are you trying to achieve?
    Trying to pull a file from gcp storage

  • What is the expected result?
    I can read a file from gcp storage

  • What are you seeing instead?
    Error: AttributeError: 'Blob' object has no attribute 'download_as_bytes'

Steps/code to reproduce the problem

Failed on this line: for line in open(file, 'r'):
I think it gets the file open and begins reading for the loop when it fails.

In order for us to be able to solve your problem, we have to be able to reproduce it on our end.
Without reproducing the problem, it is unlikely that we'll be able to help you.

Include full tracebacks, logs and datasets if necessary.
Please keep the examples minimal (minimal reproducible example).

Apr 05 16:19:30 docker[2234166]: File "/usr/local/lib/python3.7/site-packages/smart_open/gcs.py", line 325, in read1
Apr 05 16:19:30 docker[2234166]: return self.read(size=size)
Apr 05 16:19:30 docker[2234166]: File "/usr/local/lib/python3.7/site-packages/smart_open/gcs.py", line 320, in read
Apr 05 16:19:30 docker[2234166]: self._fill_buffer(size)
Apr 05 16:19:30 docker[2234166]: File "/usr/local/lib/python3.7/site-packages/smart_open/gcs.py", line 374, in _fill_buffer
Apr 05 16:19:30 docker[2234166]: bytes_read = self._current_part.fill(self._raw_reader)
Apr 05 16:19:30 docker[2234166]: File "/usr/local/lib/python3.7/site-packages/smart_open/bytebuffer.py", line 152, in fill
Apr 05 16:19:30 docker[2234166]: new_bytes = source.read(size)
Apr 05 16:19:30 docker[2234166]: File "/usr/local/lib/python3.7/site-packages/smart_open/gcs.py", line 178, in read
Apr 05 16:19:30 docker[2234166]: binary = self._download_blob_chunk(size)
Apr 05 16:19:30 docker[2234166]: File "/usr/local/lib/python3.7/site-packages/smart_open/gcs.py", line 194, in _download_blob_chunk
Apr 05 16:19:30 docker[2234166]: binary = self._blob.download_as_bytes(start=start, end=end)
Apr 05 16:19:30 docker[2234166]: AttributeError: 'Blob' object has no attribute 'download_as_bytes'
Apr 05 16:19:30 docker[2234166]: I0405 16:19:30.721222 139676694796096 gcs.py:241] close: called

Versions

Please provide the output of:

import platform, sys, smart_open
print(platform.platform())
print("Python", sys.version)
print("smart_open", smart_open.__version__)

Python 3.7.10 (default, Apr 2 2021, 22:24:51)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import platform, sys, smart_open
>>> print(platform.platform())
Linux-5.4.89+-x86_64-with-debian-10.9
>>> print("Python", sys.version)
Python 3.7.10 (default, Apr 2 2021, 22:24:51)
[GCC 8.3.0]
>>> print("smart_open", smart_open.__version__)
smart_open 5.0.0

Tried on the latest 4.x version too, same thing.

Checklist

Before you create the issue, please make sure you have:

  • Described the problem clearly
  • Provided a minimal reproducible example, including any required data
  • Provided the version numbers of the relevant software
@mpenkov
Collaborator

mpenkov commented Apr 5, 2021

What version of google-cloud-storage are you using?
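
One way to answer this from inside the failing container is to ask Python which distribution is installed and whether the importable `Blob` class has the method. This is a minimal sketch, not part of smart_open: `installed_version` and `blob_has_download_as_bytes` are hypothetical helper names, and on Python 3.7 it assumes the `importlib_metadata` backport is available.

```python
try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:
    # Assumption: the importlib_metadata backport is installed on Python 3.7.
    from importlib_metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def blob_has_download_as_bytes():
    """Return True/False if Blob can be imported, or None if it cannot."""
    try:
        from google.cloud.storage.blob import Blob
    except ImportError:
        return None
    return hasattr(Blob, "download_as_bytes")


# Run this with the same interpreter that raises the AttributeError.
print("google-cloud-storage:", installed_version("google-cloud-storage"))
print("smart_open:", installed_version("smart_open"))
print("Blob.download_as_bytes present:", blob_has_download_as_bytes())
```

Running it with the interpreter that actually executes the job matters, because the version reported at build time and the version imported at run time can differ.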

@Cartman75
Author

Whatever gets installed when I install smart-open[gcs]. This is a clean Docker image that's created specifically for this job. Does it lock in a version?

@Cartman75
Author

Looked up the docker build:
Successfully installed PyYAML-5.4.1 Pygments-2.8.1 aws-shell-0.2.2 awscli-1.19.44 boto3-1.17.44 botocore-1.20.44 cachetools-4.2.1 certifi-2020.12.5 cffi-1.14.5 chardet-4.0.0 colorama-0.4.3 configobj-5.0.6 docutils-0.15.2 google-api-core-1.26.3 google-auth-1.28.0 google-cloud-core-1.6.0 google-cloud-storage-1.37.1 google-crc32c-1.1.2 google-resumable-media-1.2.0 googleapis-common-protos-1.53.0 idna-2.10 jmespath-0.10.0 packaging-20.9 prompt-toolkit-1.0.18 protobuf-3.15.7 pyasn1-0.4.8 pyasn1-modules-0.2.8 pycparser-2.20 pyparsing-2.4.7 python-dateutil-2.8.1 pytz-2021.1 requests-2.25.1 rsa-4.7.2 s3transfer-0.3.6 six-1.15.0 slacker-0.13.0 slacker-cli-0.4.2 smart-open-5.0.0 urllib3-1.26.4 wcwidth-0.2.5

@mpenkov
Collaborator

mpenkov commented Apr 6, 2021

Very odd, the download_as_bytes method is still definitely there:

https://github.com/googleapis/python-storage/blob/7fb2ee452604a9945342a7c7b986b240ba91b6ab/google/cloud/storage/blob.py#L1215

https://googleapis.dev/python/storage/latest/blobs.html#google.cloud.storage.blob.Blob.download_as_bytes

@petedannemann Can you please investigate?

Does it lock in a version?

No, we don't pin any versions for this package.

@petedannemann
Contributor

I can't reproduce this. @Cartman75 can you post exactly what code you are running? What is your file?

@Cartman75
Author

Essentially I am taking mysqldump .sql files from gcp and loading them in a side table:

    process = subprocess.Popen(
        cmd, stdin=subprocess.PIPE, stdout=subprocess.DEVNULL, universal_newlines=True)
    for line in open(file, 'r'):
        process.stdin.write(line)
        process.stdin.flush()
        if line[0:6] == 'INSERT':
            process.stdin.write('commit;')
    process.stdin.close()
    return_code = process.wait()

@petedannemann
Contributor

petedannemann commented Apr 6, 2021

OK, I still can't reproduce this at all and still have no idea what your file actually is. From this log line:

Apr 05 16:19:30 docker[2234166]: I0405 16:19:30.721222 139676694796096 gcs.py:241] close: called

it looks like you somehow already closed the Blob prior to reading from it?

Can you try a minimal reproducible example like this? This is a publicly available file in GCS.

path = "gs://tensorflow-nightly/prod/tensorflow/release/ubuntu_16/gpu_py37_full/nightly_release/18/20190813-010608/github/tensorflow/pip_pkg/tf_nightly_gpu-1.15.0.dev20190813-cp37-cp37m-linux_x86_64.whl"

import smart_open
import google.cloud.storage

client = google.cloud.storage.Client.create_anonymous_client()
with smart_open.open(path, transport_params=dict(client=client)) as f:
    for line in f:
        print(line)

@Cartman75
Author

I don't have the transport_params part on my open statement. Is that maybe why I am seeing this? What's that doing? I have to log in to get access to the gs buckets. It's via a server-side service account. So how you're doing it is much different from what I am doing. Not sure if it matters.
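
For context on the `transport_params` question: in the example above it hands `smart_open` an explicit Google Cloud Storage client (an anonymous one, because the file is public). The sketch below shows the same call shape for an authenticated setup. It assumes the service account is discoverable through Application Default Credentials, and `open_gcs_text` is a hypothetical helper name, not a smart_open function.

```python
def open_gcs_text(uri, client=None):
    """Open a gs:// URI for reading text through smart_open.

    Hypothetical helper: when no client is passed, one is built from
    Application Default Credentials (for example a service account).
    """
    # Imported lazily so this module loads even without the GCS extras.
    import google.cloud.storage
    import smart_open

    if client is None:
        client = google.cloud.storage.Client()
    # Same keyword as in the anonymous example: the client goes in transport_params.
    return smart_open.open(uri, "r", transport_params=dict(client=client))
```

Passing a client this way only changes how the connection is authenticated; it would not by itself add a missing `download_as_bytes` method to `Blob`.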

@Cartman75
Author

Cartman75 commented Apr 12, 2021

I just downgraded to smart_open[gcp]==2.2.1 and it's working again. The plot thickens.

The latest version I can use is 4.0.1. All newer versions break with the above error.

@gelioz
Contributor

gelioz commented Jun 18, 2021

Had the same issue. Pinning to the latest version of google-cloud-storage fixed the problem.

It would be nice to pin google-cloud-storage>=1.31.0 in smart_open[gcs] dependencies, as there was no Blob.download_as_bytes method before this release: https://github.com/googleapis/python-storage/releases/tag/v1.31.0
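
Until such a pin exists, the same constraint can be checked at startup. This is a rough sketch under stated assumptions: `parse_release` and `meets_minimum` are hypothetical helpers, the 1.31.0 floor is taken from the comment above, and pre-release suffixes such as `rc1` are simply ignored rather than ordered properly.

```python
import re

# Floor taken from the comment above: Blob.download_as_bytes arrived in 1.31.0.
MINIMUM = (1, 31, 0)


def parse_release(version_string):
    """Turn 'X.Y.Z' into an (X, Y, Z) tuple of ints, ignoring any suffix."""
    parts = []
    for piece in version_string.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad short versions such as '2' or '1.31' with zeros.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(version_string, minimum=MINIMUM):
    """Return True if the version is at least the required minimum."""
    return parse_release(version_string) >= minimum
```

For example, `meets_minimum("1.30.0")` is false and `meets_minimum("1.37.1")` is true, so an environment reporting an older google-cloud-storage could fail fast with a clear message instead of an `AttributeError` mid-read.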

@mpenkov
Collaborator

mpenkov commented Jun 18, 2021

@gelioz Sounds reasonable. Are you interested in making a PR?

PLPeeters added a commit to PLPeeters/smart_open that referenced this issue Feb 7, 2022
mpenkov added a commit that referenced this issue Feb 18, 2022
* Pin google-cloud-storage to >=1.31.1 in extras

Fixes #601

* Update CHANGELOG.md

Co-authored-by: Michael Penkov <m@penkov.dev>