
Using st_size on FIFO files or stdin will cause super_len to return 0 as the length #2215

Closed
cecton opened this issue Sep 10, 2014 · 14 comments


@cecton

cecton commented Sep 10, 2014

Hello,

Use Case

I run a process with subprocess and pass its stdout as the data of a POST request. Since it is a real file object, it has a fileno(), so super_len() stats the file descriptor and uses st_size to determine the length.

The problem is that an st_size of 0 can have other causes; in this case, the length is simply not determinable (even though st_size is 0). Yet requests sends a header specifying a Content-Length of 0 while the length is actually unknown.
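A simplified sketch of that length-detection logic (not requests' actual source) shows why a pipe trips it up:

```python
import os

def super_len_sketch(o):
    """Simplified sketch of requests' length detection, not the real code.

    If the object exposes fileno(), fstat().st_size is trusted as the
    length -- which is 0 for pipes and FIFOs even when data will arrive.
    """
    if hasattr(o, "fileno"):
        return os.fstat(o.fileno()).st_size
    if hasattr(o, "__len__"):
        return len(o)
    return None

# A pipe reports st_size == 0 even though 5 bytes are already buffered in it:
r, w = os.pipe()
os.write(w, b"hello")
with os.fdopen(r, "rb") as pipe_reader:
    print(super_len_sketch(pipe_reader))  # 0, despite pending data
os.close(w)
```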

Work Around

Wrap the stream in a generator, so that the fileno() method can no longer be accessed:

requests.post(url, data=(x for x in p.stdout))
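A runnable sketch of this workaround (the subprocess command is illustrative, and the commented-out requests.post call with its url is hypothetical):

```python
import subprocess
import sys

# Hypothetical example: stream a subprocess's stdout without exposing fileno().
p = subprocess.Popen(
    [sys.executable, "-c", "print('line1'); print('line2')"],
    stdout=subprocess.PIPE,
)

# Wrapping stdout in a generator hides fileno(), so requests cannot stat it
# and falls back to chunked transfer encoding instead of Content-Length: 0.
body = (chunk for chunk in p.stdout)

# requests.post(url, data=body)   # url is hypothetical; not executed here
data = b"".join(body)
p.wait()
print(data)
```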

Another Use Case

[14:05:31] ~ > mkfifo /tmp/a
[14:05:35] ~ > stat /tmp/a
  File: ‘/tmp/a’
  Size: 0               Blocks: 0          IO Block: 4096   fifo
Device: 1eh/30d Inode: 514910      Links: 1
Access: (0644/prw-r--r--)  Uid: ( 1001/ openerp)   Gid: (  100/   users)
Access: 2014-09-10 14:05:35.415835558 +0200
Modify: 2014-09-10 14:05:35.415835558 +0200
Change: 2014-09-10 14:05:35.415835558 +0200
 Birth: -

The size of a named pipe is 0.
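The same can be observed from Python, assuming a POSIX system with os.mkfifo available:

```python
import os
import stat
import tempfile

# A freshly created FIFO reports st_size == 0, mirroring the shell stat above.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "a")
    os.mkfifo(path)
    st = os.stat(path)
    print(stat.S_ISFIFO(st.st_mode), st.st_size)  # True 0
```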

Fix

According to man 2 stat:

For most files under the /proc directory, stat() does not return the file size in the st_size field; instead the field is returned with the value 0.

I guess super_len() should return st_size if st_size is greater than 0, and None otherwise.
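A minimal sketch of that proposal (hypothetical names, not a patch against requests itself):

```python
import os

def proposed_len(o):
    """Sketch of the proposed behaviour: trust st_size only when positive.

    Returning None signals "length unknown", letting the caller fall back
    to chunked transfer encoding instead of sending Content-Length: 0.
    """
    if hasattr(o, "fileno"):
        size = os.fstat(o.fileno()).st_size
        return size if size > 0 else None
    return None

# A pipe's st_size of 0 now yields None ("unknown") rather than 0:
r, w = os.pipe()
with os.fdopen(r, "rb") as pipe_reader:
    print(proposed_len(pipe_reader))  # None -> chunked upload
os.close(w)
```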

@Lukasa
Member

Lukasa commented Sep 10, 2014

Thanks for reporting this issue!

This is an interesting case. Here are my thoughts. First, on the positive side, the effect of this change is that we'd switch to chunked encoding for the upload. This is good, that's clearly the correct behaviour when passed a named pipe.

On the downside, how should we react to a user passing us a file of zero length to upload? Sending content-length 0 has always been fairly clear. Sending a chunked upload of zero length is a bit weird, but not the end of the world, and it works fine.

In practice I'm +0.5 on the proposed change. I think we gain more than we lose.

@cecton
Author

cecton commented Sep 10, 2014

Maybe we should check whether the user has specified Transfer-Encoding in the headers because, according to the RFC, it conflicts with Content-Length, so there is no point in checking the file size.

My code will look like this instead:

requests.post(url, data=p.stdout,
              headers={
                  'Content-Type': "application/octet-stream",
                  'Transfer-Encoding': "chunked",
              })

...which makes a lot more sense.

Note: I actually just tested with curl, and curl automatically removed the Content-Length because I specified Transfer-Encoding.

@sigmavirus24
Contributor

Maybe we should check whether the user has specified Transfer-Encoding in the headers because, according to the RFC, it conflicts with Content-Length, so there is no point in checking the file size.

Except we have a policy of not doing that. We will check for the existence of headers specified by the user before trying to set those same headers (e.g., if you specify 'Content-Length': '10', we won't set that), but checking values of other headers is not currently acceptable.
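A toy sketch of that policy (a hypothetical helper, not requests' actual code, which also treats header names case-insensitively):

```python
# Sketch of the policy described above: only set a header if the user did not
# already supply it, but never inspect the *values* of other headers to make
# decisions. prepare_headers is a hypothetical name for illustration.
def prepare_headers(user_headers, computed_length):
    headers = dict(user_headers)
    if "Content-Length" not in headers and computed_length is not None:
        headers["Content-Length"] = str(computed_length)
    return headers

print(prepare_headers({"Content-Length": "10"}, 42))  # the user's value wins
print(prepare_headers({}, 42))                        # computed value is set
```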

That said, I'm +0 on the proposal of never returning anything non-positive from super_len. I think the narrower restriction, not returning non-positive values of fd.st_size, is better though.

Finally,

Note: actually I just tested with curl and curl automatically removed the Content-Length because I specified Transfer-Encoding.

I hate reiterating this, but there are many things that cURL does. Most of those things we do not do. The list is pretty long, so this holds little weight in this discussion (in my opinion).

@cecton
Author

cecton commented Sep 10, 2014

Ok @sigmavirus24, I didn't know about that policy, but it sounds consistent.

That said, I'm +0 on the proposal of not returning anything that is non-positive from super_len.

I totally agree with you. Doing this globally would definitely not be acceptable: super_len has specific behavior depending on the object it gets. So the non-positive check should only apply when using the st_size from stat-ing the fileno.

Sending a chunked upload of zero length is a bit weird, but not the end of the world, and it works fine.

Maybe that's the best solution after all...

...or not guessing the size from a file descriptor at all, no matter what, and letting the user deal with it? This superpower may look handy, but in the end the behavior is not accurate.

@Lukasa
Member

Lukasa commented Sep 10, 2014

Letting the user deal with it is not the requests way. If you know how big it is, by all means set the size, but we're going to be as helpful as we can be. I guarantee to you that most requests users passing something with a file descriptor are passing a real file: the case of passing a named pipe is certainly not the norm. =)

I'm happy with the proposal of not allowing non-positive values of fd.st_size.

@cecton
Author

cecton commented Sep 11, 2014

Ah, last update...

Actually, now I have a problem with the server. I had never really tried a chunked POST, and now that I'm testing it, it seems it's not implemented in werkzeug's server. I will have to parse the chunked content on my own...

So maybe I will fall back to a temporary file on disk (^_^;) what a shame...
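For what it's worth, parsing a chunked body by hand is not much code. A minimal decoder sketch (no trailer or chunk-extension support, so a simplification of what a real server needs):

```python
import io

def decode_chunked(stream):
    """Minimal chunked transfer-encoding decoder (sketch, trailers ignored)."""
    out = bytearray()
    while True:
        size_line = stream.readline().strip()
        # The chunk size is hexadecimal and may carry ";ext=..." extensions.
        size = int(size_line.split(b";")[0], 16)
        if size == 0:
            break  # terminating zero-size chunk
        out += stream.read(size)
        stream.read(2)  # consume the CRLF that follows each chunk's data
    return bytes(out)

raw = io.BytesIO(b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n")
print(decode_chunked(raw))  # b'hello world'
```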

@Lukasa
Member

Lukasa commented Sep 11, 2014

=P Chunked upload is frequently extremely poorly supported by servers, which is odd, because it's really not that complicated.

Note that you can force us to do a streamed upload by wrapping the file object in something with a known size. For example, you can use an intermediate file-like object (BytesIO is tempting). That only works if you can read all of stdin into memory, of course: if you can't, then chunking is really the only way to go.
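A sketch of that BytesIO approach (the commented-out requests.post call and its url are hypothetical):

```python
import io

# Hypothetical sketch: buffer a small stream fully so it has a known length.
# A BytesIO object has a determinable size, so requests would send a plain
# Content-Length upload instead of chunking. Only viable when the whole
# payload fits in memory.
payload = io.BytesIO(b"small enough to buffer")
print(len(payload.getvalue()))  # 22 -> usable as Content-Length
# requests.post(url, data=payload)   # url is hypothetical; not executed here
```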

@cecton
Author

cecton commented Sep 11, 2014

No, there is no way it would fit in memory... They are actually database dumps, and it seems some of them are huge, around 13 GB.

But you're right, it doesn't look complicated to implement. I will think about it if I want to implement it for the server. With werkzeug I can use request.input_stream to access the raw body, so it's definitely possible, but I prefer to keep the code clean of very specific handling like this.

But thank you very much for your help @Lukasa here and on urllib3. (^_^)/

@sigmavirus24
Contributor

@cecton is there any way that you can determine the size in advance? If so, I may have something you can use without having to implement anything yourself.

@sigmavirus24
Contributor

@cecton are you okay with me making the name of this bug a bit more clear/descriptive?

@cecton
Author

cecton commented Sep 12, 2014

Sure!


@sigmavirus24 sigmavirus24 changed the title Using st_size of fd could be a bad idea. Using st_size on FIFO files or stdin will cause super_len to return 0 as the length Sep 12, 2014
@sigmavirus24
Contributor

Thanks @cecton

@Lukasa
Member

Lukasa commented Nov 5, 2015

Ok, I think we should fix this. Basically right now the behaviour is a bug, because you get this (using a named pipe that contains a few characters from a separate python file):

POST /post HTTP/1.1
Host: http2bin.org
Content-Length: 0
User-Agent: python-requests/2.7.0 CPython/2.7.10 Darwin/15.0.0
Connection: keep-alive
Accept: */*
Accept-Encoding: gzip, deflate

from test_pb2.py import *

That's an invalid request, so this is just a straight up bug.

@Lukasa
Member

Lukasa commented Nov 5, 2015

Given that the fallback for zero-length files is safe (basically an empty chunked body), I think that we're probably ok having that fallback. I'm going to use that as the fix.
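To illustrate why the fallback is safe: an empty chunked body is just the terminating zero-size chunk. A toy encoder sketch (not requests' actual implementation, which delegates wire formatting to urllib3):

```python
# encode_chunked is a hypothetical helper showing what chunked bodies look
# like on the wire: each chunk is a hex size line, the data, and a CRLF,
# followed by a final zero-size chunk.
def encode_chunked(chunks):
    wire = b""
    for chunk in chunks:
        if chunk:
            wire += b"%x\r\n%s\r\n" % (len(chunk), chunk)
    return wire + b"0\r\n\r\n"

print(encode_chunked([]))          # b'0\r\n\r\n' -- a valid, empty body
print(encode_chunked([b"hello"]))  # b'5\r\nhello\r\n0\r\n\r\n'
```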
