Performance comparison between s3fs and riofs #6

Closed
henningpeters opened this issue Mar 18, 2013 · 12 comments
Comments

@henningpeters
Contributor

It would be great to see the performance improvements and eventually see where we still have to catch up.

The benchmark could be scripted such that we can reproduce it regularly.

Most important would be list loading/caching.
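
A scripted, repeatable benchmark of this kind could be sketched as follows. This is a minimal illustration, not the project's benchmark: the function names, the file-name pattern, and the `/mnt/riofs` path mentioned in the comments are all hypothetical, and the listing step makes no attempt to defeat caching.

```python
import os
import statistics
import tempfile
import time


def timed(op):
    """Return the wall-clock seconds taken by op()."""
    start = time.perf_counter()
    op()
    return time.perf_counter() - start


def create_files(directory, count):
    """Create `count` empty files; each close() is part of the timing."""
    for i in range(count):
        with open(os.path.join(directory, f"bench_{i}"), "w"):
            pass


def unlink_files(directory, count):
    """Remove the files made by create_files()."""
    for i in range(count):
        os.unlink(os.path.join(directory, f"bench_{i}"))


def run_benchmark(mount_point, count=100, repeats=3):
    """Return {operation: (mean_seconds, stdev_seconds)} over `repeats` runs."""
    samples = {"create": [], "ls": [], "unlink": []}
    for _ in range(repeats):
        samples["create"].append(timed(lambda: create_files(mount_point, count)))
        samples["ls"].append(timed(lambda: os.listdir(mount_point)))
        samples["unlink"].append(timed(lambda: unlink_files(mount_point, count)))
    return {
        name: (statistics.mean(v), statistics.stdev(v) if len(v) > 1 else 0.0)
        for name, v in samples.items()
    }


if __name__ == "__main__":
    # A local temp directory stands in for a real mount such as /mnt/riofs
    # (hypothetical path); pass the real mount point to benchmark a file system.
    with tempfile.TemporaryDirectory() as target:
        for name, (mean, stdev) in run_benchmark(target, count=100).items():
            print(f"{name}: {mean:.3f}+/-{stdev:.3f} s")
```

Running the same script against an s3fs mount and a riofs mount of the same bucket would give comparable numbers, provided both are configured and cached (or uncached) the same way.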

@zenwanger

Definitely interested in this too.

@kahing

kahing commented Oct 17, 2015

I did not know about riofs and ended up writing goofys, partially motivated by the poor performance of s3fs. I have a rudimentary benchmark against s3fs using https://github.com/kahing/goofys/blob/master/bench.sh, which I think covers the most common operations. I try hard to measure uncached performance because that's the most interesting use case for me.

btw glancing at the code it looks like flush() in riofs does nothing? That's cheating ;-)

@henningpeters
Contributor Author

flush() has no semantics on AWS S3, hence we left it intentionally unimplemented (leaky abstraction). Can you explain why you think this should be implemented?

@kahing

kahing commented Oct 19, 2015

Apologies for the comment about the lack of flush(). Like I said on HN, I didn't go through the code enough and that was the first thing I saw. I hadn't heard of riofs previously, so I just assumed it wasn't at a stage where it was ready yet. Sorry!

@henningpeters
Contributor Author

No worries. riofs has been in production use internally at Skoobe for over 18 months now, and I have also heard of some others using and liking it, but we haven't spent any energy on making it more popular.

I would be very interested in seeing performance numbers so that we can learn more about which underlying strategies really work. Unfortunately, I am not with Skoobe anymore and don't have enough cycles to work on this right now...

Getting caching right took us the most time, but I believe the non-caching part is also pretty neat.

@kahing

kahing commented Oct 20, 2015

I am trying to run some benchmarks, but the numbers seem too good to be true. On closer examination, it looks like rfuse_release() does not wait for the S3 response before returning. It's probably okay for most use cases, but that makes it hard to do apples-to-apples comparisons.

@wizzard
Member

wizzard commented Oct 20, 2015

Just want to give you a hint on how I tested RioFS: start two instances of RioFS and mount the same bucket. Using a script, create a file (or many files / folders) in the first mounted folder, then check how soon it appears in the second mounted folder. It should give you some idea of the performance.
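
The two-mount test described above could be scripted roughly like this. It is a sketch under assumptions: `propagation_delay`, the probe file name, the polling interval, and the mount paths in the comments are hypothetical, and a directory cache on the second mount may delay when the file becomes visible.

```python
import os
import tempfile
import time
import uuid


def propagation_delay(write_dir, read_dir, timeout=60.0, poll_interval=0.05):
    """Create a file under write_dir and return the seconds until it is
    visible under read_dir. Raises TimeoutError if it never shows up."""
    name = f"probe_{uuid.uuid4().hex}"
    src = os.path.join(write_dir, name)
    dst = os.path.join(read_dir, name)
    start = time.perf_counter()
    with open(src, "w") as f:
        f.write("x")
    try:
        # Poll the second mount until the new file can be seen there.
        while not os.path.exists(dst):
            if time.perf_counter() - start > timeout:
                raise TimeoutError(f"{name} not visible after {timeout} s")
            time.sleep(poll_interval)
        return time.perf_counter() - start
    finally:
        # Remove the probe file through the mount that created it.
        os.unlink(src)


if __name__ == "__main__":
    # With real mounts this would be something like
    # propagation_delay("/mnt/riofs-a", "/mnt/riofs-b") (hypothetical paths).
    # Here one local temp directory plays both roles so the script runs anywhere.
    with tempfile.TemporaryDirectory() as d:
        print(f"visible after {propagation_delay(d, d):.3f} s")
```

Because the clock keeps running until the file is visible from the second mount, this measurement includes the upload itself, even if the call that closes the file returns early.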

@kahing

kahing commented Oct 21, 2015

Here are the results. I didn't put any effort into optimizing the number of concurrent connections, so maybe that explains why the read/write perf isn't up to par. riofs config here

| Operation | s3fs | riofs † | Speedup |
|---|---|---|---|
| Create 100 files | 33.7+/-2.5 | 1.43+/-0.21 | 23.6+/-3.8x |
| Unlink 100 files | 6.6+/-0.6 | 3.63+/-0.33 | 1.81+/-0.23x |
| Create 100 files (parallel) | 29.4+/-1.7* | 1.25+/-0.16 | 23.6+/-3.4x |
| Unlink 100 files (parallel) | 9.7+/-1.7 | 4.2+/-0.4 | 2.3+/-0.5x |
| ls with 1000 files | 9.6+/-1.4 | 0.21+/-0.09* | 44.6+/-20.8x |
| Write 1GB | 38.4+/-6.2* | 117.1+/-3.7 | 0.33+/-0.05x |
| Read 1GB | 22.0+/-6.7* | 25.2+/-1.0 | 0.87+/-0.27x |
| Time to 1st byte | 1.1+/-0.4 | 0.275+/-0.018* | 4.1+/-1.6x |

(*) indicates the number of outliers removed
(†) see the above caveat regarding release()
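
The speedup column looks consistent with dividing the s3fs mean by the riofs mean and combining the two relative uncertainties in quadrature. That is an inference from the numbers, not something stated in the thread; the `speedup` helper below is a hypothetical illustration of that calculation.

```python
import math


def speedup(baseline, baseline_err, candidate, candidate_err):
    """Return (ratio, ratio_err) for baseline/candidate, assuming the two
    measurements are independent and their errors combine in quadrature."""
    ratio = baseline / candidate
    rel_err = math.hypot(baseline_err / baseline, candidate_err / candidate)
    return ratio, ratio * rel_err


# "Unlink 100 files" row: s3fs 6.6+/-0.6, riofs 3.63+/-0.33
ratio, err = speedup(6.6, 0.6, 3.63, 0.33)
print(f"{ratio:.2f}+/-{err:.2f}x")  # prints 1.82+/-0.23x
```

The result differs from the table's 1.81 in the last digit, presumably because the table was computed from unrounded timings.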

@henningpeters
Contributor Author

Nice work, and interesting figures indeed. Your critique regarding rfuse_release sounds valid, but the configuration is certainly not ideal for a benchmark (capped at max. 2/2/4 concurrent connections); I would increase it substantially for a production setup. Most likely the bottleneck hasn't been RioFS. Can you try again with x10 and x100 for all numbers?

@kahing

kahing commented Oct 21, 2015

I can but I don't want to, because then I would have to find the optimal configuration for s3fs as well. I challenge you guys to default to a reasonable value instead of having everyone independently figure that out.

@henningpeters
Contributor Author

Understood. You are probably highlighting an issue: we haven't done a good job of thinking from a user's perspective here.

ThePythonicCow added a commit to ThePythonicCow/riofs that referenced this issue Jan 13, 2017
This is the easiest of the three old consistency
checking methods to remove - based on file size
checking.
ThePythonicCow added a commit to ThePythonicCow/riofs that referenced this issue Jan 14, 2017
This is the easiest of the three old consistency
checking methods to remove - based on file size
checking.
ThePythonicCow added a commit to ThePythonicCow/riofs that referenced this issue Jan 15, 2017
This is the easiest of the three old consistency
checking methods to remove - based on file size
checking.
@deepthi

deepthi commented Jan 28, 2019

This should be re-opened because it has not been resolved. It was accidentally closed because PR#137 used #6 in the title.
