S3 FileField returns zero bytes when read #4098

Open
flooie opened this issue Jun 5, 2024 · 2 comments

Comments

@flooie
Contributor

flooie commented Jun 5, 2024

Summary

When attempting to read a file stored on AWS S3 using Django’s FieldFile object, the file content is returned as an empty binary string (b''), despite the file existing and having a non-zero size.

Steps to Reproduce

Example RECAP document ID: 102021417

In [26]: r.filepath_local
Out[26]: <FieldFile: recap/gov.uscourts.cacd.683100/gov.uscourts.cacd.683100.178.0.pdf>

In [27]: r.filepath_local.read()
Out[27]: b''

In [28]: r.filepath_local.size
Out[28]: 100549
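One explanation worth ruling out (an assumption; nothing in this thread confirms it): Django's FieldFile keeps the file object it opened between calls, and a file object that has already been read to the end returns b'' on every further read(), while .size still reports the stored length from the storage backend. The sketch below separates that case from a genuinely empty download. `diagnose_empty_read` is a hypothetical helper, and io.BytesIO stands in for the S3-backed file.

```python
import io


def diagnose_empty_read(fileobj, expected_size):
    """Hypothetical helper: classify a read() that came back short.

    `fileobj` needs read() and seek(); `expected_size` is the size reported
    by the storage backend (for a FieldFile, its .size attribute).
    """
    data = fileobj.read()
    if len(data) == expected_size:
        return "ok"
    if data == b"" and expected_size > 0:
        # An exhausted file object keeps returning b"", so rewind and retry
        # once to tell a cursor-at-EOF problem from an empty download.
        fileobj.seek(0)
        if len(fileobj.read()) == expected_size:
            return "cursor was at EOF"
        return "storage returned no data"
    return "size mismatch"


# Simulate a file that an earlier caller already read to the end.
f = io.BytesIO(b"%PDF-1.6 example")
f.read()
print(diagnose_empty_read(f, 16))  # cursor was at EOF
```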

Expected Results

The content of the file should be read and returned as a non-empty binary string.

Actual Results

The FieldFile.read() method returns an empty binary string (b'').

This is causing the extract-RECAP-into-opinions process to crash somewhere in the microservice, since it is being sent zero bytes.
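Whatever the root cause turns out to be, a guard that refuses to forward an empty payload would turn the crash in the microservice into a clear error at the source. A minimal sketch; `EmptyFileError` and `read_nonempty` are hypothetical names, not existing CourtListener code.

```python
import io


class EmptyFileError(ValueError):
    """Hypothetical: raised when a stored file yields no bytes."""


def read_nonempty(fileobj, name):
    """Read all of `fileobj`, raising instead of returning b"".

    `name` is only used in the error message (e.g. FieldFile.name).
    """
    data = fileobj.read()
    if not data:
        raise EmptyFileError(f"{name} returned 0 bytes; not sending it on")
    return data


print(read_nonempty(io.BytesIO(b"%PDF-1.6"), "example.pdf"))  # b'%PDF-1.6'
```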

On the good news front, the retry works as expected for timeouts or legitimate failures of DOCTOR, although I want to bump it from 3 retries to at least 5.
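The thread doesn't show how the retry is configured. As a generic sketch only (not the project's actual retry mechanism), a retry loop that takes the limit as a parameter makes the change from 3 to 5 a one-argument edit. `call_with_retries` is a hypothetical name, and the exponential backoff is an added assumption.

```python
import time


def call_with_retries(func, max_retries=5, base_delay=1.0,
                      retry_on=(TimeoutError,), sleep=time.sleep):
    """Call func(), retrying up to max_retries times on `retry_on` errors.

    Delays between attempts grow exponentially: base_delay, 2 * base_delay,
    4 * base_delay, ... The final failure is re-raised to the caller.
    """
    for attempt in range(max_retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_retries:
                raise
            sleep(base_delay * 2 ** attempt)
```

With max_retries=5 the function is called at most six times (one initial attempt plus five retries).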

@mlissner if you have any insights into how this could be occurring I would appreciate it.

@mlissner
Member

mlissner commented Jun 5, 2024

Is this some of the time or all of the time?

mlissner changed the title from "Extract Recap to Opinions Bug" to "S3 FileField returns zero bytes when read" on Jun 5, 2024
@quevon24
Member

quevon24 commented Jun 6, 2024

I think it is a different document: using that ID gives me another document, https://www.courtlistener.com/api/rest/v3/recap-documents/102021417/ (gov.uscourts.cacd.572997.50.0.pdf).

I cloned everything from that RECAP document ID (docket entry, docket, and RECAP document) and uploaded the file to the dev bucket using the admin: https://dev-com-courtlistener-storage.s3.amazonaws.com/recap/dev.gov.uscourts.cacd.572997/gov.uscourts.cacd.572997.50.0.pdf

I was able to read the file correctly:

filepath_local.read()
b'%PDF-1.6\r%\xe2\xe3\xcf\xd3\r\n83 0 obj\r<</Filter/FlateDecode/First 5/Length 177/N 1/Type/ObjStm>>stream\r\nh\xded\xcc\xb1\n\xc20\x14@\xd1_y\x9b\xc9\xd0\xf65!m*\xa5P\x0cn\x82\x88\xd8E\x90\xdaD\x0c\x94>H"\xfa\xf9:\x88\x8b\xfb\xb9Wj@h\xdb\xa2\x7f\xa4;\x05\x16\xa3\xd4\xb5\x92\xbc\xd8\x047&O\x8b\x19\x93cf-\xb0lPc\x8d\x8d.................

filepath_local.read() 207714

Maybe some permissions problem is preventing the file from being read? Does it read other files correctly?
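To separate a permissions problem from a Django-level one, the object can be read with boto3 directly. If access were denied, boto3 would normally raise a botocore ClientError (for example AccessDenied) rather than return an empty body. A minimal sketch; `check_s3_object` is a hypothetical helper, and `client` is assumed to be a boto3 S3 client.

```python
def check_s3_object(client, bucket, key):
    """Compare the size S3 reports with the bytes actually downloaded.

    `client` is assumed to be a boto3 S3 client, e.g. boto3.client("s3");
    only head_object() and get_object() are called. Access problems
    surface as botocore.exceptions.ClientError raised by those calls.
    """
    reported = client.head_object(Bucket=bucket, Key=key)["ContentLength"]
    downloaded = len(client.get_object(Bucket=bucket, Key=key)["Body"].read())
    return reported, downloaded
```

For the dev copy linked in the previous comment, the call would be `check_s3_object(boto3.client("s3"), "dev-com-courtlistener-storage", "recap/dev.gov.uscourts.cacd.572997/gov.uscourts.cacd.572997.50.0.pdf")`; if the object is readable, both numbers should match.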
