Add a new socket.sendfile() method #61752
Comments
This is based on progress made in bpo-13564 and aims to provide a new sendfile() method for the socket class, taking advantage of the high-performance "zero-copy" os.sendfile() available on most POSIX platforms.

=== THE API ===
Attached is a draft patch proposing an API which:

=== ALTERNATE API ===

=== WINDOWS SUPPORT ===
Further development may include Windows support by using TransmitFile (http://msdn.microsoft.com/en-us/library/windows/desktop/ms740565(v=vs.85).aspx). Once we agree on an API I might start looking into Windows code (which appears to be quite tricky btw). Any feedback is very welcome. |
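To make the starting point concrete, here is a minimal sketch of the kind of loop such a method would wrap: repeated os.sendfile() calls on a blocking socket until EOF. It assumes a platform where os.sendfile() accepts a socket as the output descriptor (for example Linux); the names `sendfile_loop`, `demo` and the 65536 default are illustrative and are not taken from the attached patch.

```python
import os
import socket
import tempfile


def sendfile_loop(sock, file, blocksize=65536):
    """Send a whole regular file over a blocking socket via os.sendfile().

    Returns the total number of bytes sent. The function name and the
    blocksize default are hypothetical, chosen only for this sketch.
    """
    out_fd = sock.fileno()
    in_fd = file.fileno()
    offset = 0
    while True:
        # os.sendfile() copies inside the kernel and returns 0 at EOF.
        sent = os.sendfile(out_fd, in_fd, offset, blocksize)
        if sent == 0:
            break
        offset += sent
    return offset


def demo(size=10000):
    """Push `size` bytes through a local socket pair and read them back."""
    a, b = socket.socketpair()
    with a, b, tempfile.TemporaryFile() as f:
        f.write(b"x" * size)
        f.flush()  # os.sendfile() reads from the descriptor, not Python's buffer
        total = sendfile_loop(a, f)
        a.shutdown(socket.SHUT_WR)  # let the reader see EOF
        received = bytearray()
        while chunk := b.recv(65536):
            received += chunk
    return total, bytes(received)
```

The payload in `demo` is kept small so that it fits in the socket buffer and the single-threaded example does not block; a real transfer would have a peer reading concurrently.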
A couple of comments:
|
|
The 'exception' member can be useful to know the reason why sendfile() failed and send() was used as fallback.
Do you mean that if the user sets a timeout=2 and the whole transfer takes longer than that you expect a timeout exception?
I see little value in supporting non-blocking sockets because someone willing to do that usually wants to use sendfile() directly (no wrappers involved), as they are forced to handle errors and keep count of transmitted bytes in their own code anyway.
Will turn that into "regular files".
Right. Will fix that.
No idea. I made a quick tour on snakebite and apparently all of the provided platforms have poll().
I will make some actual benchmarks and fix that later if it makes sense. |
The new patch in attachment includes a new 'offset' parameter and new tests, and also updates the file offset on return or in case of error, so that file.tell() can be used to tell how many bytes were transmitted at any time. In summary, the API looks like this. Transfer ok:
>>> file = open('somefile', 'rb')
>>> sock = socket.socket()
>>> sock.sendfile(file)
(True, None)
>>> file.tell()
20000000
...and in case sendfile() could not be used internally because file was not a regular file:
>>> file = io.BytesIO(b'x' * 1*1024*1024)
>>> sock.sendfile(file)
(False, UnsupportedOperation('fileno',))
>>> file.tell()
1048576
I still haven't looked into TransmitFile on Windows as I have to figure out how to compile Python 3.4 on Windows. |
New patch in attachment. Changes:
I've tried to work on Windows TransmitFile support but I got stuck as I'm not sure how to convert a file object into a HANDLE in C. I suppose Windows support can also be added later as a separate ticket and in the meantime I'd like to push this forward. Open questions:
|
Attached is a simple benchmark script transmitting a 100MB file.

send()
real 0.0613s
user 0.0100s
sys 0.0900s
total 0.1000s
sendfile()
real 0.0318s
user 0.0000s
sys 0.0500s
total 0.0500s |
For TransmitFile support, the Windows function to turn an integer file descriptor into a WinAPI file HANDLE should be _get_osfhandle: http://msdn.microsoft.com/en-us/library/ks2530z6.aspx |
Should socket.sendfile() always return the number of bytes sent, because file.tell() may be changed by something else that uses the same file descriptor? What happens if the file grows? Instead of returning If possible, always include the number of bytes sent in any error that is raised. |
Good idea, thanks, that is much better indeed. Updated patch is in attachment.
I would say that is a use case we should explicitly not support as it probably implies you're doing something you're not supposed to.
That's similar to my recent (rejected) proposal for socket.sendall(): |
Yet another patch fixing some problems on Windows. |
You can have a look at |
My initial thought was that the user might want to know *why* a file cannot be sent by using the fastest method and hence wants to see the original exception. Anyway, I have no strong opinions about this so I guess we can also drop it.
Have you read my patch? This is already provided by the "offset" parameter.
Both sendfile() and TransmitFile provide a "blocksize" parameter for very good reasons therefore it seems natural that an API built on top of them exposes the same parameter as well. |
Note: my example about limiting the transfer speed does not really apply 'cause as this stands right now it cannot be used with non-blocking sockets. Other arguments do though and I hope it's clear that we need "blocksize". |
I initially also thought so. But I've suggested the parameter to replace
The reason sendfile exists is performance. Otherwise socket.makefile and shutil.copyfileobj could be used instead. The use_fallback parameter provides a way to assert that an inefficient fallback is not used by accident. It may be ignored by most users. An alternative is a new separate public method that doesn't use the fallback. |
Considering the current indecision about certain design aspects I started a discussion on python-ideas: https://mail.python.org/pipermail/python-ideas/2014-April/027752.html |
Can you also think about how this would be wrapped in asyncio? |
I think asyncio would be better off using os.sendfile() / TransmitFile directly, in fact the current patch explicitly does not support non-blocking sockets (I couldn't see any sane approach to do that). |
Of course I read your patch ;-)
No, they expose a *count* parameter: You're mixing up blocksize, which is the maximum number of bytes to Here, you basically implement sendall() on top of sendfile() (in
while <remaining data to send>:
    socket.sendfile(data[offset:offset+chunksize])
The way it's supposed to be used is simply: That's how everyone uses sendfile(), that's how Java exposes it, To sum up, I think there's a fundamental confusion between blocksize |
use_fallback parameter is mostly a debugging tool. If it helps to avoid the It seems the patch assumes *offset == nbytes_sent*, which is false in general::
_SEND_BLOCKSIZE = 262144 # ???
def sendfile(self, file, offset=None, nbytes=None,
*, nbytes_per_send=_SEND_BLOCKSIZE) -> nbytes_sent:
"""
Send *nbytes* bytes from regular *file* starting at *offset* position.
|
Ah OK, I see what you mean now. It seems we didn't understand each other. =) As for what to do, here's what I propose:
I'm -1 about adding "count" *and* "blocksize" parameters. "blocksize" alone is good enough IMO and considering what I've just described it is a better name than "count". |
count and blocksize are completely different. *count* specifies how many bytes at most socket.sendfile should send overall. It may change the result, i.e., it may not be necessary that the file is read until EOF. It has the same meaning as the *nbytes* parameter for os.sendfile or *nbytes* in msg217121. *blocksize* doesn't change how many bytes are read in the end by socket.sendfile. At most it may affect time performance. It is *nbytes_per_send* in msg217121. |
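The distinction drawn above can be illustrated with a send()-based copy loop in which `count` bounds the total number of bytes transferred while `blocksize` only controls how that total is chunked. This is a sketch under assumed names (`send_fallback`, `demo`) and an assumed 8192 default; it is not the code from any of the attached patches.

```python
import io
import socket


def send_fallback(sock, file, offset=0, count=None, blocksize=8192):
    """send()-based copy: `count` bounds the total, `blocksize` only chunks it.

    Hypothetical helper for illustration; returns the number of bytes sent.
    """
    file.seek(offset)
    total = 0
    while count is None or total < count:
        # Never read past the requested total.
        want = blocksize if count is None else min(blocksize, count - total)
        data = file.read(want)
        if not data:  # EOF reached before `count` bytes were available
            break
        sock.sendall(data)
        total += len(data)
    return total


def demo(count, blocksize):
    """Send from a 20000-byte in-memory file; return (sent, received) sizes."""
    a, b = socket.socketpair()
    with a, b:
        source = io.BytesIO(b"y" * 20000)
        sent = send_fallback(a, source, count=count, blocksize=blocksize)
        a.shutdown(socket.SHUT_WR)  # let the reader see EOF
        received = b""
        while chunk := b.recv(65536):
            received += chunk
    return sent, len(received)
```

Calling `demo(5000, 512)` and `demo(5000, 4096)` transfers the same 5000 bytes either way; only the number of send calls differs, which is the sense in which blocksize "doesn't change how many bytes are read in the end".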
I'm confused. Why is "blocksize" necessary at all?
Why not fstat(fd) ? |
My guess, it may be used to implement the socket.send()-based fallback. Its meaning could be the same as the *length* parameter in shutil.copyfileobj. The fallback is useful if os.sendfile doesn't exist or it doesn't accept the given parameters, e.g., if *file* is not mmap-like enough for os.sendfile.
os.path.getsize(file.name) in msg217121 is a pseudo-code (as said In real code, if *nbytes is None*; I would just call os.sendfile It assumes socket.sendfile doesn't specify its behaviour if the file The pseudo-code in msg217121 is my opinion about the public interface for socket.sendfile -- It is different from the one in the current socket-sendfile5.patch |
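One way to read the fstat(fd) suggestion is sketched below: query the descriptor rather than file.name, both to decide whether the object is a regular file and to learn its size. The helper names `is_sendfile_candidate`, `regular_file_size` and `demo` are hypothetical, and this is only one possible shape for the check, not what the patch does.

```python
import io
import os
import stat
import tempfile


def is_sendfile_candidate(file):
    """Return True if `file` is backed by a regular file descriptor."""
    try:
        fd = file.fileno()
    except (AttributeError, io.UnsupportedOperation):
        return False  # e.g. io.BytesIO has no real descriptor
    return stat.S_ISREG(os.fstat(fd).st_mode)


def regular_file_size(file):
    """Size in bytes according to fstat(), independent of file.name."""
    return os.fstat(file.fileno()).st_size


def demo(size=1234):
    """Compare a temporary regular file with an in-memory file object."""
    with tempfile.TemporaryFile() as f:
        f.write(b"z" * size)
        f.flush()  # make the size visible to fstat()
        return (
            is_sendfile_candidate(f),
            regular_file_size(f),
            is_sendfile_candidate(io.BytesIO(b"abc")),
        )
```

Working from the descriptor avoids depending on a `name` attribute, which objects such as anonymous temporary files or wrapped descriptors may not provide as a usable path.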
Given the opinions expressed so far I:
I'm attaching socket-sendfile6.patch which includes docs and many new tests. |
Looking back at this I think a "send_blocksize" argument is necessary after all. shutil.copyfileobj() has it, and so do ftplib.FTP.storbinary() and httplib (bpo-13559), which will both be using socket.sendfile() once it gets included. |
Those APIs are really poor, please don't cripple sendfile() to mirror them. Once again, a send_blocksize argument doesn't make sense, you won't |
I agree it is not necessary for sendfile() (you were right).
|
...speaking of which, now that I look back at those benchmarks it looks like 65536 bytes is the best compromise (in my latest patch I used 16384). |
Charles, Antoine, any final thoughts about this given the reasons I stated above? If you're still -1 about adding the 'send_blocksize' argument I guess I can get rid of it and perhaps reintroduce it later if we see there's demand for it. |
Good we agree :-)
Honestly, we should deprecate the whole ftplib module :-) If you have time and it interests you, trying to improve this module
I agree, but both points are addressed by sendfile(): internally, the So to reply to your above question, I wouldn't feel too bad about So I'd really like it if you could push the version without the |
I'm talking about send(), not sendfile(). sendfile(self, file, offset=0, count=None, send_blocksize=16384):
ftplib module API may be a bit rusty but there's a reason why it was designed like that. |
Which makes even less sense if it's not needed for sendfile() :-) |
OK then, I'll trust your judgement. I'll use 8K as the default and will commit the patch soon. |
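For context, the method as documented since Python 3.5 is `socket.sendfile(file, offset=0, count=None)`: it returns the total number of bytes sent, has no block-size argument, falls back to send() when os.sendfile() cannot be used, and updates the file position on return. The sketch below exercises it over a local socket pair; the `demo` helper and its default sizes are illustrative only.

```python
import socket
import tempfile


def demo(size=10000, offset=100, count=500):
    """Send `count` bytes starting at `offset` and read them back."""
    payload = bytes(i % 256 for i in range(size))
    a, b = socket.socketpair()
    with a, b, tempfile.TemporaryFile() as f:
        f.write(payload)
        f.flush()  # the fast path reads from the descriptor
        sent = a.sendfile(f, offset=offset, count=count)
        position = f.tell()  # file position is updated on return
        a.shutdown(socket.SHUT_WR)  # let the reader see EOF
        received = b""
        while chunk := b.recv(65536):
            received += chunk
    return sent, position, received, payload[offset:offset + count]
```

The socket must be a blocking SOCK_STREAM socket and the file must be opened in binary mode, matching the restrictions discussed earlier in the thread.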
New changeset 001895c39fea by Giampaolo Rodola' in branch 'default': |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.