Invalidate rather than update cache #34
Conversation
Sounds like a more sensible solution for multiple writes than mine. I can look at it later, if you need. |
Mostly I wanted to make sure that this was a sane approach, and to cheaply check whether the errors have an obvious cause to you: https://travis-ci.org/dask/s3fs/jobs/126181303 If not, then I can continue diving in here, unless you're particularly free of other work. |
I am in a meeting now for at least 90 min. I don't think current() can cause any problems; I don't mind keeping it here. |
We should remove the |
Under the many-small-files case this helps by about a factor of two, but there is still more performance work to do. Here is a profile from a zarr computation:

In [1]: import zarr
In [2]: from s3fs import S3Map, S3FileSystem
In [3]: d = S3Map('zarr-test/test-1')
In [4]: x = zarr.empty(shape=(10000, 10000), chunks=(1000, 1000), dtype='f4', store=d)
In [5]: %time x[:] = 1
CPU times: user 800 ms, sys: 35.5 ms, total: 835 ms
Wall time: 15.6 s
In [6]: %prun x[:] = 2
@martindurant do you see any obvious gains that can be made? Is there a way with boto3 to maintain live connections that might benefit us? |
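(Not from the original thread: a minimal sketch of the client reuse being asked about. botocore pools HTTP connections per client, so holding on to one client and issuing every put through it keeps connections alive across requests; the key names here are hypothetical.)

import boto3

# One client, and therefore one underlying HTTP connection pool, reused
# for every request; creating a fresh client per put would redo the
# TCP/TLS setup each time.
s3 = boto3.client('s3')

for i in range(100):
    s3.put_object(Bucket='zarr-test', Key='test-1/chunk-%d' % i,  # hypothetical keys
                  Body=b'\x00' * 4000000)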
This is a profile from master (before this PR) to show the difference.
|
One thing is that we're really doing just 100 puts, but we seem to incur 3000 messages. Are some of these extra messages avoidable? |
cc @alimanfoo in case he's interested |
Thanks, very interesting. |
The write interface is designed around big files and uses multi-part upload even for small ones (so it needs to create the multi-part upload, write the chunks, and finalize, every time). We could add an option to turn multi-part off for small files. I'll look into the latter and see if it's simpler than I first imagine. |
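(For illustration only, not the s3fs code path: the two boto3 call sequences being weighed here. The multi-part path costs at least three API round trips even for a tiny object, while a plain put is a single round trip; bucket, key, and data are hypothetical.)

import boto3

s3 = boto3.client('s3')
bucket, key, data = 'zarr-test', 'test-1/0.0', b'x' * 4000000  # hypothetical names

# Multi-part path: create, upload the part(s), finalize -- three round
# trips minimum, paid even when the object is small.
mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
part = s3.upload_part(Bucket=bucket, Key=key, PartNumber=1,
                      UploadId=mpu['UploadId'], Body=data)
s3.complete_multipart_upload(
    Bucket=bucket, Key=key, UploadId=mpu['UploadId'],
    MultipartUpload={'Parts': [{'PartNumber': 1, 'ETag': part['ETag']}]})

# Small-file path: one round trip.
s3.put_object(Bucket=bucket, Key=key, Body=data)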
Perhaps the

Regardless, uploading many small-to-medium sized files is an application that would be nice to support efficiently. Creating a benchmark similar to the one above and then optimizing it down could be a satisfying and valuable exercise. |
@mrocklin, please benchmark now; it appears to be substantially faster for small files, and I didn't break anything. |
Away from an easy testing location at the moment. Will try this out later tonight. |
In [1]: from s3fs import S3Map, S3FileSystem
In [2]: d = S3Map('zarr-test/test-1')
In [3]: d.clear()
In [4]: import zarr
In [5]: x = zarr.empty(shape=(10000, 10000), chunks=(1000, 1000), dtype='f4', store=d)
In [6]: %prun x[:] = 1
In [7]: %prun -s cumtime x[:] = 2

Sorted by total time
Sorted by cumulative time
|
Writing non-trivial data to the array shows how these times relate to other times.

In [10]: %time a = np.random.random(size=x.shape).astype(dtype=x.dtype)
CPU times: user 1.28 s, sys: 231 ms, total: 1.51 s
Wall time: 1.51 s
In [11]: %prun x[:] = a
In [12]: %time a = np.random.random(size=x.shape).astype(dtype=x.dtype)
CPU times: user 1.27 s, sys: 233 ms, total: 1.5 s
Wall time: 1.5 s
In [13]: %prun -s cumtime x[:] = a

Sorted by total time

Sorted by cumulative time

621571 function calls (620671 primitive calls) in 29.433 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 29.434 29.434 {built-in method builtins.exec}
1 0.000 0.000 29.434 29.434 <string>:1(<module>)
1 0.003 0.003 29.434 29.434 core.py:193(__setitem__)
100 0.003 0.000 29.429 0.294 core.py:284(_chunk_setitem)
100 0.002 0.000 28.703 0.287 mapping.py:64(__setitem__)
100 0.000 0.000 28.598 0.286 core.py:869(__exit__)
200 0.002 0.000 28.598 0.143 core.py:818(close)
100 0.001 0.000 28.496 0.285 client.py:244(_api_call)
100 0.003 0.000 28.495 0.285 client.py:514(_make_api_call)
100 0.000 0.000 27.670 0.277 endpoint.py:114(make_request)
100 0.001 0.000 27.670 0.277 endpoint.py:140(_send_request)
100 0.002 0.000 27.562 0.276 endpoint.py:163(_get_response)
100 0.003 0.000 27.547 0.275 sessions.py:539(send)
100 0.002 0.000 27.522 0.275 adapters.py:323(send)
100 0.002 0.000 27.490 0.275 connectionpool.py:421(urlopen)
100 0.003 0.000 27.471 0.275 connectionpool.py:317(_make_request)
100 0.001 0.000 21.255 0.213 client.py:1130(getresponse)
100 0.002 0.000 21.250 0.213 client.py:275(begin)
2600 0.005 0.000 21.234 0.008 socket.py:561(readinto)
2600 0.005 0.000 21.223 0.008 ssl.py:913(recv_into)
2600 0.003 0.000 21.217 0.008 ssl.py:778(read)
100 0.002 0.000 21.214 0.212 client.py:242(_read_status)
2600 0.002 0.000 21.213 0.008 ssl.py:563(read)
801 0.003 0.000 21.212 0.026 {method 'readline' of '_io.BufferedReader' objects}
2600 21.211 0.008 21.211 0.008 {method 'read' of '_ssl._SSLSocket' objects}
100 0.000 0.000 6.195 0.062 client.py:1081(request)
100 0.001 0.000 6.195 0.062 awsrequest.py:121(_send_request)
100 0.002 0.000 6.194 0.062 client.py:1109(_send_request)
100 0.000 0.000 6.176 0.062 client.py:1066(endheaders)
100 0.002 0.000 6.176 0.062 awsrequest.py:146(_send_output)
100 3.918 0.039 3.918 0.039 {built-in method select.select}
100 0.002 0.000 2.247 0.022 awsrequest.py:195(_handle_expect_response)
200 0.001 0.000 2.218 0.011 awsrequest.py:236(send)
200 0.083 0.000 2.218 0.011 client.py:846(send)
100 0.000 0.000 2.212 0.022 awsrequest.py:232(_send_message_body)
49000 0.123 0.000 2.055 0.000 ssl.py:876(sendall)
49000 0.064 0.000 1.906 0.000 ssl.py:849(send)
49000 0.035 0.000 1.832 0.000 ssl.py:575(write)
49000 1.796 0.000 1.796 0.000 {method 'write' of '_ssl._SSLSocket' objects}
800/600 0.006 0.000 0.854 0.001 hooks.py:175(_emit)
200 0.000 0.000 0.792 0.004 hooks.py:228(emit_until_response)
100 0.001 0.000 0.788 0.008 handlers.py:151(conditionally_calculate_md5)
100 0.001 0.000 0.788 0.008 handlers.py:125(calculate_md5)
100 0.002 0.000 0.786 0.008 handlers.py:142(_calculate_md5_from_file)
800 0.735 0.001 0.735 0.001 {method 'update' of '_hashlib.HASH' objects}
100 0.000 0.000 0.642 0.006 compression.py:73(compress)
100 0.585 0.006 0.642 0.006 {zarr.blosc.compress}
49600 0.228 0.000 0.228 0.000 {method 'read' of '_io.BytesIO' objects}
100 0.001 0.000 0.100 0.001 endpoint.py:119(create_request)
100 0.001 0.000 0.100 0.001 core.py:739(write)
100 0.100 0.001 0.100 0.001 {method 'write' of '_io.BytesIO' objects}
100 0.000 0.000 0.079 0.001 numeric.py:535(ascontiguousarray)
100 0.078 0.001 0.078 0.001 {built-in method numpy.core.multiarray.array}
600/500 0.001 0.000 0.063 0.000 hooks.py:215(emit)
100 0.057 0.001 0.057 0.001 __init__.py:487(string_at)
100 0.000 0.000 0.050 0.000 signers.py:85(handler)
100 0.001 0.000 0.050 0.000 signers.py:92(sign)
500 0.001 0.000 0.048 0.000 handlers.py:145(<lambda>)
100 0.000 0.000 0.044 0.000 endpoint.py:136(prepare_request)
100 0.001 0.000 0.039 0.000 awsrequest.py:356(prepare)
100 0.001 0.000 0.036 0.000 auth.py:624(add_auth)
100 0.003 0.000 0.031 0.000 client.py:179(parse_headers)
100 0.001 0.000 0.031 0.000 auth.py:612(get_signature)
200 0.004 0.000 0.029 0.000 {method 'readline' of '_io._IOBase' objects}
100 0.002 0.000 0.028 0.000 client.py:546(_convert_to_request_dict)

It's odd how much time we spend reading |
So all in all this is a huge improvement. I'm still generally curious what is taking up time here. Any ideas? |
I'll look into it, but as a first guess: would this include the time spent waiting for the server to supply a response? Every call to S3 must be an HTTP PUT/POST, but the response is always checked to see whether the operation was successful. |
There appear to be precisely 2600 calls to |
I looked into S3 connections, and I can’t find any way to keep them open. (Ideally we would not be writing tiny chunks; would this come up in a real scenario?) |
In this comment we dump 400MB of 1000x1000 chunks into S3 over 30s from a box on EC2. This is roughly 10MB/s, which is quite a bit slower than I would expect. Most of the time comes from

I consider 4MB chunks to be perhaps a bit small, but not that small. Assuming we have 100MB/s of write bandwidth and this read overhead is around 300ms, we would need to be writing chunks in the hundreds of megabytes before we stop noticing the overhead. That seems too large to me. |
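(A worked version of the arithmetic above, using the assumed numbers from this comment: a fixed ~300ms cost per put and ~100MB/s of write bandwidth.)

# At what chunk size does a fixed per-put overhead stop dominating?
overhead = 0.3      # seconds per request (assumed above)
bandwidth = 100e6   # bytes/second (assumed above)
target = 0.10       # accept overhead at 10% of total request time

# overhead <= target * (overhead + size / bandwidth)
# =>  size >= bandwidth * overhead * (1 - target) / target
size = bandwidth * overhead * (1 - target) / target
print('%.0f MB' % (size / 1e6))  # 270 MB -- hundreds of megabytes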
We were not the first to think about this: http://stackoverflow.com/questions/8650625/efficiently-move-many-small-files-to-amazon-s3 (refers to a previous python-fuse s3fs I didn't know about). |
It's odd that the overhead for a single put is 300ms. I wonder if there is a hidden API somewhere in |
Experiment with threads: |
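(The experiment itself is not shown above; this is a minimal sketch of what a threaded upload might look like, assuming an S3FileSystem instance and hypothetical key names. Each put still pays the fixed request latency, but the waits overlap across threads instead of accumulating serially.)

from concurrent.futures import ThreadPoolExecutor
from s3fs import S3FileSystem

s3 = S3FileSystem()
data = b'\x00' * 4000000  # one hypothetical 4MB chunk

def put(key):
    # The ~300ms of per-request waiting happens concurrently per thread.
    with s3.open('zarr-test/test-1/%s' % key, 'wb') as f:
        f.write(data)

with ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(put, ['chunk-%d' % i for i in range(100)]))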
Any reason not to merge the work here so far (minus the experimental commit with threads)? Anything fancy to deal with latency can come separately. |
Sure. Should I take care of the rebase, or would you like to? If you do it, note that I have a commit in there titled |
If you want to restructure the commits at all, then please go ahead. |
Previously on touch or write events we called `_ls` to update the local cache. When we called touch repeatedly this caused a slowdown unless we used the `no_refresh` context manager. Now we just invalidate the cache on write events so that the next read event will correctly update the cache. This helps with many successive writes.
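(Not the actual s3fs code: a minimal sketch of the invalidate-instead-of-refresh pattern described above, with hypothetical names.)

class CachedListing:
    # Sketch of invalidate-on-write; fetch_listing and do_write stand in
    # for the real S3 calls.
    def __init__(self, fetch_listing, do_write):
        self._fetch = fetch_listing  # callable: one round trip to S3
        self._write = do_write       # callable: actually store the bytes
        self._cache = None           # cached listing; None means stale

    def ls(self):
        # Read path: rebuild the listing only when it is stale.
        if self._cache is None:
            self._cache = self._fetch()
        return self._cache

    def write(self, key, data):
        self._write(key, data)
        # Old behavior: self._cache = self._fetch()  (a round trip per write)
        # New behavior: mark stale; the next ls() pays the cost once.
        self._cache = None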
a better way to solve the many-writes problem; no need to check the filelist until the next time we need the listing/detail.
cc @martindurant
This still has a couple of failures. I'm probably doing something a bit wacky.