Generate mixed workload with Get, Put, Seek in db_bench #4788

zhichao-cao · 2018-12-17T04:15:09Z

Based on specific workload models (the key access distribution, value size distribution, iterator scan length distribution, and QPS variation), the MixGraph benchmark generates a synthetic workload that follows these distributions and therefore reflects real-world workload characteristics.

After enabling tracing, the user gets a trace file. Analyzing the trace file with the trace_analyzer tool produces a set of statistics files. The *_accessed_key_stats.txt, *-accessed_value_size_distribution.txt, *-iterator_length_distribution.txt, and *-qps_stats.txt files are the main inputs for model fitting in Matlab. From the fitted models, the user obtains the parameters of the workload distributions (the modeling details are described: here).

The key access distribution follows the two-term power model. The probability density function is f(x) = a*x^{b} + c. The corresponding db_bench parameters are key_dist_a, key_dist_b, and key_dist_c.
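As an illustration of how fitted power-model parameters could drive key selection, the sketch below uses inverse-transform sampling. It assumes c = 0 and reads a*x^b as the cumulative relation between a uniform draw u and a key rank x, so that x = (u/a)^(1/b). The helper name `PowerInverseSample` is hypothetical, and this is not necessarily how db_bench itself maps the parameters to keys.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: inverts F(x) = a * x^b for a uniform draw u in (0, 1].
// Assumes a > 0, b > 0, and c = 0 (the constant term is ignored in this sketch).
inline int64_t PowerInverseSample(double u, double a, double b) {
  assert(u > 0.0 && u <= 1.0);
  assert(a > 0.0 && b > 0.0);
  // Solve u = a * x^b for x, then round up to an integer key rank.
  return static_cast<int64_t>(std::ceil(std::pow(u / a, 1.0 / b)));
}
```

Larger draws map to larger ranks, so a caller would typically feed this a uniform random number and then reduce the result into the valid key space.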

The value size distribution and the iterator scan length distribution both follow the Generalized Pareto Distribution. The probability density function is f(x) = (1/sigma)*(1 + k*(x-theta)/sigma)^{-1-1/k}. The parameters are value_k, value_theta, value_sigma and iter_k, iter_theta, iter_sigma. For more information about the Generalized Pareto Distribution, see the wiki and Matlab pages.
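A minimal sketch of drawing from a Generalized Pareto Distribution by inverting its CDF, F(x) = 1 - (1 + k*(x-theta)/sigma)^{-1/k}, which is the CDF that matches the density above. The function name `GeneralizedParetoQuantile` is an assumption made for illustration; the k = 0 branch uses the exponential limit of the distribution.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: quantile (inverse CDF) of the Generalized Pareto
// Distribution with shape k, scale sigma, and location theta.
// u is a uniform draw in [0, 1).
inline double GeneralizedParetoQuantile(double u, double k, double sigma,
                                        double theta) {
  assert(u >= 0.0 && u < 1.0);
  assert(sigma > 0.0);
  if (k == 0.0) {
    // Limit case: shifted exponential distribution.
    return theta - sigma * std::log(1.0 - u);
  }
  return theta + sigma * (std::pow(1.0 - u, -k) - 1.0) / k;
}
```

With the example parameters from this PR (value_k=0.9233, value_sigma=226.4092, theta left at 0), a caller would round the result and clamp it to the allowed value size range before using it.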

The QPS follows a diurnal pattern, so a sine function is a good model for it: F(x) = sine_a*sin(sine_b*x + sine_c) + sine_d. The trace_analyzer reports the average QPS in its printed results, which is sine_d. After fitting "*-qps_stats.txt" to the Matlab model, the user gets sine_a, sine_b, and sine_c. These 4 parameters control the QPS variation, including its period, average, and the amplitude of its changes.
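To make the role of the four sine parameters concrete, here is a small sketch that evaluates the model and derives its period. The function names `SineRate` and `SinePeriod` are hypothetical, and the unit of x is left open because the description does not state it.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: target rate at position x for the model
// F(x) = a * sin(b * x + c) + d.
inline double SineRate(double x, double a, double b, double c, double d) {
  return a * std::sin(b * x + c) + d;
}

// Hypothetical helper: period of the sine model, in the same unit as x.
// The period depends only on b; a sets the amplitude and d the average.
inline double SinePeriod(double b) {
  assert(b != 0.0);
  const double kPi = std::acos(-1.0);
  return 2.0 * kPi / std::fabs(b);
}
```

For the example flags in this description (sine_a=15000, sine_b=1, sine_d=20000), the modeled rate swings between 5000 and 35000 around an average of 20000.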

To use the benchmark, the user can specify parameters such as the following:

-benchmarks="mixgraph" -key_dist_a=0.002312 -key_dist_b=0.3467 -value_k=0.9233 -value_sigma=226.4092 -iter_k=2.517 -iter_sigma=14.236 -mix_get_ratio=0.7 -mix_put_ratio=0.25 -mix_seek_ratio=0.05 -sine_mix_rate_interval_milliseconds=500 -sine_a=15000 -sine_b=1 -sine_d=20000
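The mix_get_ratio, mix_put_ratio, and mix_seek_ratio flags describe how often each operation type should appear. One way to turn such ratios into a per-query decision is sketched below; the enum `OpType` and the function `PickOp` are hypothetical names, and db_bench may implement the choice differently.

```cpp
#include <cassert>

// Hypothetical operation types for the mixed workload.
enum class OpType { kGet, kPut, kSeek };

// Hypothetical helper: picks an operation from a uniform draw u in [0, 1)
// by walking the cumulative ratios. Ratios are normalized by their sum,
// so they do not have to add up to exactly 1.
inline OpType PickOp(double u, double get_ratio, double put_ratio,
                     double seek_ratio) {
  const double total = get_ratio + put_ratio + seek_ratio;
  assert(total > 0.0);
  const double x = u * total;
  if (x < get_ratio) return OpType::kGet;
  if (x < get_ratio + put_ratio) return OpType::kPut;
  return OpType::kSeek;
}
```

With the ratios from the example (0.7, 0.25, 0.05), draws below 0.7 become Get, draws from 0.7 up to 0.95 become Put, and the rest become Seek.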

facebook-github-bot

@zhichao-cao has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

sagar0

Thanks @zhichao-cao .
Could you provide more information about the distributions that you added in the summary section, so that RocksDB users can know what type of models/equations/distributions are used to generate the workload, without delving into the code?

tools/db_bench_tool.cc

sagar0 · 2019-01-04T00:49:22Z

I believe it is quite difficult for users to come up with values for these parameters. Is there an easy way a user can figure out what values to provide for these parameters based on some other data? (say from trace_analyzer?)

zhichao-cao · 2019-01-04T02:40:03Z

> I believe it is quite difficult for users to come up with values for these parameters. Is there an easy way a user can figure out what values to provide for these parameters based on some other data? (say from trace_analyzer?)

The trace analyzer provides the statistics files for the users. Users need to use Matlab to fit the statistics data to the models. The fitting functions are complex, and I think using the well-developed toolboxes in Matlab is a better way for users to figure out these parameters. I have written instructions in the intro of "Tracing, analyzing, and Modeling" on the RocksDB wiki, which cover the files generated by trace_analyzer and how these files can be used in Matlab for model fitting. The Matlab scripts are also listed there.

facebook-github-bot · 2019-01-04T05:02:32Z

@zhichao-cao has updated the pull request. Re-import the pull request

sagar0 · 2019-01-04T19:47:42Z

> > I believe it is quite difficult for users to come up with values for these parameters. Is there an easy way a user can figure out what values to provide for these parameters based on some other data? (say from trace_analyzer?)

> The trace analyzer provides the statistics files for the users. Users need to use Matlab to fit the statistics data to the models. The fitting functions are complex, and I think using the well-developed toolboxes in Matlab is a better way for users to figure out these parameters. I have written instructions in the intro of "Tracing, analyzing, and Modeling" on the RocksDB wiki, which cover the files generated by trace_analyzer and how these files can be used in Matlab for model fitting. The Matlab scripts are also listed there.

That's great. I haven't seen the "Model the Workloads" section on that wiki page before.

facebook-github-bot

@sagar0 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

sagar0

Thanks, lgtm.
Let's get this in and you can iterate on it.

xkszltl · 2019-01-24T10:09:55Z

rocksdb/tools/db_bench_tool.cc

Line 4675 in e07aa86

char value_buffer[2 * value_max];

This appears to be a variable-length array, which is a GNU extension in C++.

sagar0 · 2019-01-24T19:22:26Z

I wonder why the appveyor build didn't catch this 😕 .

Summary: In the MixGraph benchmark of db_bench (#4788), a char array is sized with a value taken from user input, which can cause a build error on some platforms. Also, the msg char array can be smaller than the data printed into it, so its size is extended from 100 to 256. Tested with make check.
Pull Request resolved: #4918
Differential Revision: D13844298
Pulled By: sagar0
fbshipit-source-id: 33c4809c5c4438f0a9f7b289d3f42e20c545bbab
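For readers unfamiliar with the portability problem, the sketch below shows one standard-conforming way to allocate a buffer whose size is known only at run time, plus a bounded write into a fixed-size message buffer. It is an illustration only: `MakeValueBuffer`, `FormatOpCounts`, and the message format are hypothetical, and the actual fix in #4918 may take a different approach.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: heap-allocates the buffer instead of declaring
// `char value_buffer[2 * value_max];`, which needs a compile-time size
// in standard C++ and is rejected by MSVC.
inline std::vector<char> MakeValueBuffer(std::size_t value_max) {
  assert(value_max > 0);
  return std::vector<char>(2 * value_max);
}

// Hypothetical helper: snprintf never writes past sizeof(msg), so output
// longer than the buffer is truncated instead of overflowing it.
inline std::string FormatOpCounts(long gets, long puts, long seeks) {
  char msg[256];
  std::snprintf(msg, sizeof(msg), "Gets: %ld Puts: %ld Seeks: %ld", gets,
                puts, seeks);
  return std::string(msg);
}
```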

Summary: In the previous PR #4788, the user can use the db_bench mixgraph benchmark to generate a workload derived from the social graph. The key is generated based on the key access hotness. In this PR, the user can further model the key-range hotness and fit it to a two-term exponential distribution. First, the user cuts the whole key space into small key ranges (e.g., key ranges of equal size, with the number of key ranges equal to the number of SST files). Then, the user calculates the average access count per key of each key range as the key-range hotness. Next, the user fits the key-range hotness to the two-term exponential distribution f(x) = a*exp(b*x) + c*exp(d*x) and obtains the values of a, b, c, and d. They are the db_bench parameters prefix_dist_a, prefix_dist_b, prefix_dist_c, and prefix_dist_d, which the example command below passes as keyrange_dist_a, keyrange_dist_b, keyrange_dist_c, and keyrange_dist_d. Finally, the user can run db_bench by specifying the parameters. For example: `./db_bench --benchmarks="mixgraph" -use_direct_io_for_flush_and_compaction=true -use_direct_reads=true -cache_size=268435456 -key_dist_a=0.002312 -key_dist_b=0.3467 -keyrange_dist_a=14.18 -keyrange_dist_b=-2.917 -keyrange_dist_c=0.0164 -keyrange_dist_d=-0.08082 -keyrange_num=30 -value_k=0.2615 -value_sigma=25.45 -iter_k=2.517 -iter_sigma=14.236 -mix_get_ratio=0.85 -mix_put_ratio=0.14 -mix_seek_ratio=0.01 -sine_mix_rate_interval_milliseconds=5000 -sine_a=350 -sine_b=0.0105 -sine_d=50000 --perf_level=2 -reads=1000000 -num=5000000 -key_size=48`
Pull Request resolved: #5953
Test Plan: run db_bench with different parameters and checked the results.
Differential Revision: D18053527
Pulled By: zhichao-cao
fbshipit-source-id: 171f8b3142bd76462f1967c58345ad7e4f84bab7
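The key-range steps described in the summary can be sketched as follows: evaluate the fitted two-term exponential for each key range, normalize the results into probabilities, and pick a range from a uniform draw. The function names are hypothetical, and indexing the ranges from x = 1 is an assumption of this sketch rather than something the summary states.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: hotness f(x) = a*exp(b*x) + c*exp(d*x) for key ranges
// x = 1..n (assumed indexing), normalized so the values sum to 1.
inline std::vector<double> KeyRangeProbabilities(int n, double a, double b,
                                                 double c, double d) {
  assert(n > 0);
  std::vector<double> probs(static_cast<std::size_t>(n));
  double sum = 0.0;
  for (int i = 0; i < n; ++i) {
    const double x = static_cast<double>(i + 1);
    probs[i] = a * std::exp(b * x) + c * std::exp(d * x);
    assert(probs[i] >= 0.0);  // A fitted hotness should not be negative.
    sum += probs[i];
  }
  assert(sum > 0.0);
  for (double& p : probs) p /= sum;
  return probs;
}

// Hypothetical helper: maps a uniform draw u in [0, 1] to a key-range index
// by walking the cumulative probabilities.
inline std::size_t PickKeyRange(const std::vector<double>& probs, double u) {
  assert(!probs.empty());
  double acc = 0.0;
  for (std::size_t i = 0; i < probs.size(); ++i) {
    acc += probs[i];
    if (u < acc) return i;
  }
  return probs.size() - 1;  // Guards against rounding in the running sum.
}
```

With the example flags (keyrange_num=30, keyrange_dist_a=14.18, keyrange_dist_b=-2.917, keyrange_dist_c=0.0164, keyrange_dist_d=-0.08082), the first range receives by far the largest share of accesses and the hotness decays across the remaining ranges.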

facebook-github-bot added the CLA Signed label Dec 17, 2018

zhichao-cao added 3 commits January 3, 2019 10:23

Implemented the basic model generater which can generate Get, Put, Seek based on ratio, value size distribution, scan length distribution and qps

273b3ca

Fixed the bugs, and it can generate the workload as model defined

a8bb99c

Adjusted the parameters of the mix workload model

90eaea9

zhichao-cao force-pushed the model_generator branch from 57478f4 to 90eaea9 Compare January 3, 2019 18:37

facebook-github-bot reviewed Jan 3, 2019

Fixed the build error

acf96ff

facebook-github-bot reviewed Jan 3, 2019

sagar0 reviewed Jan 4, 2019

Removed the unused parameters and changed the int to int64_t

c49945a

facebook-github-bot reviewed Jan 4, 2019

sagar0 changed the title ~~Generate the mix workload with Get, Put, Seek in db_bench~~ Generate mix workload with Get, Put, Seek in db_bench Jan 8, 2019

facebook-github-bot reviewed Jan 8, 2019

sagar0 changed the title ~~Generate mix workload with Get, Put, Seek in db_bench~~ Generate mixed workload with Get, Put, Seek in db_bench Jan 8, 2019

sagar0 approved these changes Jan 8, 2019

facebook-github-bot reviewed Jan 8, 2019

facebook-github-bot closed this in ce8e88d Jan 22, 2019

xkszltl mentioned this pull request Jan 24, 2019

[MSVC] Build failure with error C2131 error #4917

Closed

zhichao-cao mentioned this pull request Jan 24, 2019

Fix the build error caused by the dynamic array #4918

Closed

zhichao-cao mentioned this pull request Oct 22, 2019

Workload generator (Mixgraph) based on prefix hotness #5953

Closed
