
benchmark on memcached #4

Open
posutsai opened this issue Jul 8, 2020 · 2 comments


posutsai commented Jul 8, 2020

I have tried to reproduce the performance measurements on memcached. However, I can't find clearer details about how you benchmarked it. Could you provide more details, such as:

  • What tool did you use?
  • What's the config of the benchmarking tool?

I have already tried memtier_benchmark, since it is officially recommended. Here is my benchmarking command:
memtier_benchmark -x 5 -c 50 -t 100 -d 32 --ratio=1:0 --pipeline=1 --key-pattern S:S -P memcache_binary --hide-histogram
Yet the results after applying linker interposition to memcached are quite unstable. I would appreciate some comments from the author.

Apart from benchmarking, I ran into several questions while reading the paper. Please correct me if I misunderstand its content. According to Table 4-11 on page 60, the memcached-new column shows that the hmcs lock performs worst. Since pthread has a performance gain of 103 relative to hmcs, locks whose gain is greater than 103 should presumably outperform the original pthread_mutex version, such as mcs_stp with its gain of 582. Nevertheless, the performance is not as expected; it is even worse than pthread_mutex.

Furthermore, I would like to know whether the pthread_mutex results are based on the original version with LITL's linker interposition, or on replacing the symbols with libpthreadinterpose_original.sh.

Last but not least, thank you for providing such a great tool for trying out various locks so easily. It has helped me a lot. Please let me know if you need any further description to understand my issue; I will provide it ASAP.


HugoGuiroux commented Aug 30, 2020

Hi @posutsai.

Sections 4.1.3 and 4.1.4 contain more information about Memcached benchmarking:

For the Memcached-* experiments where some nodes are dedicated to network injection, memory is interleaved only on the nodes dedicated to the server.

For Memcached, similarly to other setups used in the literature [72, 42], the workload runs on a single machine: we dedicate one socket of the machine where we run memaslap to inject network traffic to the Memcached instance, the two running on two distinct sets of cores.

Note that the memcached version does play a role in the results; cf. the footnote on page 46:

Memcached 1.4.15 uses a global lock to synchronize all accesses to a shared hash table. This lock is known to be the main bottleneck. Newer versions use per-bucket locks, thus suffer less from contention.

Regarding your question about pthread_mutex: we use libpthreadinterpose_original.sh for pthread. Close to the end of Section 4.1.4:

Besides, in order to make fair comparisons among applications, the results presented for the Pthread locks are obtained using the same library interposition mechanism (see Chapter 3) as with the other locks.


posutsai commented Sep 7, 2020

Thank you for replying. I guess the main reason I can't reproduce the experiment is the hardware. Anyway, I have figured out what kind of lock suits my application best.

By the way, I would like to know whether you ever considered doing this kind of lock replacement through static source-code analysis instead of linker interposition.
