
Using st_size on FIFO files or stdin will cause super_len to return 0 as the length #2215

Closed
cecton opened this issue Sep 10, 2014 · 14 comments


@cecton

cecton commented Sep 10, 2014

Hello,

Use Case

I run a process with subprocess and pass its stdout as the data of a POST request. Since it is a real file object, it has a fileno(), so super_len() stats the file descriptor and uses st_size to determine the length.

The problem is that an st_size of 0 can have other causes; in this case, the length is simply not determinable (even though st_size is 0). Yet requests sends a header specifying a Content-Length of 0 while the length is actually unknown.
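A simplified sketch of that length-detection logic (not requests' actual source) shows why a pipe trips it up:

```python
import os

def super_len_sketch(o):
    """Simplified sketch of requests' length detection, not the real code.

    If the object exposes fileno(), fstat().st_size is trusted as the
    length -- which is 0 for pipes and FIFOs even when data will arrive.
    """
    if hasattr(o, "fileno"):
        return os.fstat(o.fileno()).st_size
    if hasattr(o, "__len__"):
        return len(o)
    return None

# A pipe reports st_size == 0 even though 5 bytes are already buffered in it:
r, w = os.pipe()
os.write(w, b"hello")
with os.fdopen(r, "rb") as pipe_reader:
    print(super_len_sketch(pipe_reader))  # 0, despite pending data
os.close(w)
```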

Work Around

Wrap the stream in a generator, so that the fileno() method can no longer be accessed:

requests.post(url, data=(x for x in p.stdout))
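A runnable sketch of this workaround (the subprocess command is illustrative, and the commented-out requests.post call with its url is hypothetical):

```python
import subprocess
import sys

# Hypothetical example: stream a subprocess's stdout without exposing fileno().
p = subprocess.Popen(
    [sys.executable, "-c", "print('line1'); print('line2')"],
    stdout=subprocess.PIPE,
)

# Wrapping stdout in a generator hides fileno(), so requests cannot stat it
# and falls back to chunked transfer encoding instead of Content-Length: 0.
body = (chunk for chunk in p.stdout)

# requests.post(url, data=body)   # url is hypothetical; not executed here
data = b"".join(body)
p.wait()
print(data)
```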

Another Use Case

[14:05:31] ~ > mkfifo /tmp/a
[14:05:35] ~ > stat /tmp/a
  File: ‘/tmp/a’
  Size: 0               Blocks: 0          IO Block: 4096   fifo
Device: 1eh/30d Inode: 514910      Links: 1
Access: (0644/prw-r--r--)  Uid: ( 1001/ openerp)   Gid: (  100/   users)
Access: 2014-09-10 14:05:35.415835558 +0200
Modify: 2014-09-10 14:05:35.415835558 +0200
Change: 2014-09-10 14:05:35.415835558 +0200
 Birth: -

The size of a named pipe is 0.
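The same can be observed from Python, assuming a POSIX system with os.mkfifo available:

```python
import os
import stat
import tempfile

# A freshly created FIFO reports st_size == 0, mirroring the shell stat above.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "a")
    os.mkfifo(path)
    st = os.stat(path)
    print(stat.S_ISFIFO(st.st_mode), st.st_size)  # True 0
```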

Fix

According to man 2 stat:

For most files under the /proc directory, stat() does not return the file size in the st_size field; instead the field is returned with the value 0.

I guess super_len() should return st_size if st_size is greater than 0, and None otherwise.
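A minimal sketch of that proposal (hypothetical names, not a patch against requests itself):

```python
import os

def proposed_len(o):
    """Sketch of the proposed behaviour: trust st_size only when positive.

    Returning None signals "length unknown", letting the caller fall back
    to chunked transfer encoding instead of sending Content-Length: 0.
    """
    if hasattr(o, "fileno"):
        size = os.fstat(o.fileno()).st_size
        return size if size > 0 else None
    return None

# A pipe's st_size of 0 now yields None ("unknown") rather than 0:
r, w = os.pipe()
with os.fdopen(r, "rb") as pipe_reader:
    print(proposed_len(pipe_reader))  # None -> chunked upload
os.close(w)
```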

@Lukasa
Member

Lukasa commented Sep 10, 2014

Thanks for reporting this issue!

This is an interesting case. Here are my thoughts. First, on the positive side, the effect of this change is that we'd switch to chunked encoding for the upload. This is good, that's clearly the correct behaviour when passed a named pipe.

On the downside, how should we react to a user passing us a file of zero length to upload? Sending content-length 0 has always been fairly clear. Sending a chunked upload of zero length is a bit weird, but not the end of the world, and it works fine.

In practice I'm +0.5 on the proposed change. I think we gain more than we lose.

@cecton
Author

cecton commented Sep 10, 2014

Maybe we should check whether the user has specified Transfer-Encoding in the headers because, according to the RFC, it conflicts with Content-Length, so there is no point in checking the file size.

My code will look like this instead:

requests.post(url, data=p.stdout,
              headers={
                  'Content-Type': "application/octet-stream",
                  'Transfer-Encoding': "chunked",
              })

...which makes a lot more sense.

Note: I actually just tested with curl, and curl automatically removed the Content-Length because I specified Transfer-Encoding.

@sigmavirus24
Contributor

Maybe we should check whether the user has specified Transfer-Encoding in the headers because, according to the RFC, it conflicts with Content-Length, so there is no point in checking the file size.

Except we have a policy of not doing that. We will check for the existence of headers specified by the user before trying to set those same headers (e.g., if you specify 'Content-Length': '10', we won't set that), but checking values of other headers is not currently acceptable.
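A toy sketch of that policy (a hypothetical helper, not requests' actual code, which also treats header names case-insensitively):

```python
# Sketch of the policy described above: only set a header if the user did not
# already supply it, but never inspect the *values* of other headers to make
# decisions. prepare_headers is a hypothetical name for illustration.
def prepare_headers(user_headers, computed_length):
    headers = dict(user_headers)
    if "Content-Length" not in headers and computed_length is not None:
        headers["Content-Length"] = str(computed_length)
    return headers

print(prepare_headers({"Content-Length": "10"}, 42))  # the user's value wins
print(prepare_headers({}, 42))                        # computed value is set
```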

That said, I'm +0 on the proposal of never returning anything non-positive from super_len. I think the narrower restriction, not returning non-positive values of fd.st_size, is better though.

Finally,

Note: actually I just tested with curl and curl automatically removed the Content-Length because I specified Transfer-Encoding.

I hate reiterating this, but there are many things that cURL does. Most of those things we do not do. The list is pretty long, so this holds little weight in this discussion (in my opinion).

@cecton
Author

cecton commented Sep 10, 2014

Ok @sigmavirus24, I didn't know about that policy, but it sounds consistent.

That said, I'm +0 on the proposal of not returning anything that is non-positive from super_len.

I totally agree with you. Doing this globally would definitely not be acceptable: super_len has specific behavior depending on the object it gets. So the non-positive check should only apply when using the st_size from stat-ing the fileno.

Sending a chunked upload of zero length is a bit weird, but not the end of the world, and it works fine.

Maybe that's the best solution after all...

...or not guessing the size from a file descriptor at all, no matter what, and letting the user deal with it? This superpower may look handy, but in the end the behavior is not accurate.

@Lukasa
Member

Lukasa commented Sep 10, 2014

Letting the user deal with it is not the requests way. If you know how big it is, by all means set the size, but we're going to be as helpful as we can be. I guarantee to you that most requests users passing something with a file descriptor are passing a real file: the case of passing a named pipe is certainly not the norm. =)

I'm happy with the proposal of not allowing non-positive values of fd.st_size.

@cecton
Author

cecton commented Sep 11, 2014

Ah, last update...

Actually, now I have a problem with the server. I had never really tried a chunked POST, and now that I'm testing it, it seems it's not implemented in werkzeug's server. I will have to parse the chunked content on my own...

So maybe I will fall back to a temporary file on disk (^_^;) what a shame...
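For what it's worth, parsing a chunked body by hand is not much code. A minimal decoder sketch (no trailer or chunk-extension support, so a simplification of what a real server needs):

```python
import io

def decode_chunked(stream):
    """Minimal chunked transfer-encoding decoder (sketch, trailers ignored)."""
    out = bytearray()
    while True:
        size_line = stream.readline().strip()
        # The chunk size is hexadecimal and may carry ";ext=..." extensions.
        size = int(size_line.split(b";")[0], 16)
        if size == 0:
            break  # terminating zero-size chunk
        out += stream.read(size)
        stream.read(2)  # consume the CRLF that follows each chunk's data
    return bytes(out)

raw = io.BytesIO(b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n")
print(decode_chunked(raw))  # b'hello world'
```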

@Lukasa
Member

Lukasa commented Sep 11, 2014

=P Chunked upload is frequently extremely poorly supported by servers, which is odd, because it's really not that complicated.

Note that you can force us to do a streamed upload by wrapping the file object in something with a known size. For example, you can use an intermediate file-like object (BytesIO is tempting). That only works if you can read all of stdin into memory, of course: if you can't, then chunking is really the only way to go.
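A sketch of that BytesIO approach (the commented-out requests.post call and its url are hypothetical):

```python
import io

# Hypothetical sketch: buffer a small stream fully so it has a known length.
# A BytesIO object has a determinable size, so requests would send a plain
# Content-Length upload instead of chunking. Only viable when the whole
# payload fits in memory.
payload = io.BytesIO(b"small enough to buffer")
print(len(payload.getvalue()))  # 22 -> usable as Content-Length
# requests.post(url, data=payload)   # url is hypothetical; not executed here
```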

@cecton
Author

cecton commented Sep 11, 2014

No, there is no way it would fit in memory... They are actually database dumps, and it seems some of them are huge, around 13 GB.

But you're right, it doesn't look complicated to implement. I will think about it if I want to implement it for the server. With werkzeug I can use request.input_stream to access the raw body, so it's definitely possible, but I prefer to keep the code clean of very specific handling like this.

But thank you very much for your help @Lukasa here and on urllib3. (^_^)/

@sigmavirus24
Contributor

@cecton is there any way that you can determine the size in advance? If so, I may have something you can use without having to implement anything yourself.

@sigmavirus24
Contributor

@cecton are you okay with me making the name of this bug a bit more clear/descriptive?

@cecton
Author

cecton commented Sep 12, 2014

Sure!


@sigmavirus24 sigmavirus24 changed the title Using st_size of fd could be a bad idea. Using st_size on FIFO files or stdin will cause super_len to return 0 as the length Sep 12, 2014
@sigmavirus24
Contributor

Thanks @cecton

@Lukasa
Member

Lukasa commented Nov 5, 2015

Ok, I think we should fix this. Basically right now the behaviour is a bug, because you get this (using a named pipe that contains a few characters from a separate python file):

POST /post HTTP/1.1
Host: http2bin.org
Content-Length: 0
User-Agent: python-requests/2.7.0 CPython/2.7.10 Darwin/15.0.0
Connection: keep-alive
Accept: */*
Accept-Encoding: gzip, deflate

from test_pb2.py import *

That's an invalid request, so this is just a straight up bug.

@Lukasa
Member

Lukasa commented Nov 5, 2015

Given that the fallback for zero-length files is safe (basically an empty chunked body), I think that we're probably ok having that fallback. I'm going to use that as the fix.
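To illustrate why the fallback is safe: an empty chunked body is just the terminating zero-size chunk. A toy encoder sketch (not requests' actual implementation, which delegates wire formatting to urllib3):

```python
# encode_chunked is a hypothetical helper showing what chunked bodies look
# like on the wire: each chunk is a hex size line, the data, and a CRLF,
# followed by a final zero-size chunk.
def encode_chunked(chunks):
    wire = b""
    for chunk in chunks:
        if chunk:
            wire += b"%x\r\n%s\r\n" % (len(chunk), chunk)
    return wire + b"0\r\n\r\n"

print(encode_chunked([]))          # b'0\r\n\r\n' -- a valid, empty body
print(encode_chunked([b"hello"]))  # b'5\r\nhello\r\n0\r\n\r\n'
```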
