Skip to content


Subversion checkout URL

You can clone with
Download ZIP
tree: 63e4824838
Fetching contributors…

Cannot retrieve contributors at this time

99 lines (88 sloc) 4.683 kb
600-Bucket is for testing scribe configuration which has a large number
of buffer stores. E.g. 600 buffer stores within a top level bucket store.
In this kind of setup, if down stream receivers are unreachable, buffer
stores will start backing incoming messages which could incur such large amount
of file i/o that it could bring down the scribe server during
buffer store's buffering, and SENDING_BUFFER phases. The test is used to
verify the changes and newly added options that are designed to deal
with this situation.
Network Topology
You need at least 3 servers: a sender, a mid tier, and a receiver. Scribe
servers will be running on all of them. On the receiver, we will run
two scribe servers to simulate bucketing to different hosts.
Configuration files for scribe instances will be generated via
script. We made some assumptions about the configurations. If you are
testing someting differently, or of different scale, please update's configuration accordingly. Run again to generate
updated scribe.confs.
Configuration generation
Before you run, you need to collect the following information:
* midtierhost: host name /ip of the mid tier server
* midtierport: port number of the mid tier server
* receiverhost: hostname/ip of the receiver server
* receiverporteven: port number of the receiver server
* numbuckets: how many buckets
* outputdir: there will be five configuration file generated. outputdir
specify under which directory will the output conf file
generated. I usually output them to one of the directory
under my home nfs mount so that I can use it anywhere.
* category: scribe category
* scribefilepath: root directory for scribe file store. File basename
will be the same as the category.
Now that you have all those info, run
perl -h
to view the latest syntax. Plug in the information listed above into
the command line argument. Run "perl" with arguments
supplied, and you have generated all necessary configurations.
The following files are generated:
* scribe.sender.conf
* scribe.midtier.conf
* scribe.sinker.conf
* scribe.sinker.file.even.conf
* scribe.sinker.file.odd.conf
You need first three conf files for load test. You need the first
two conf files and even/odd sinker conf files if you want to test
whether messages are bucketed correctly. scribe.sinker.file.even.conf
and scribe.sinker.file.odd.conf use file stores to write messages
onto the disk so that you can verify bucketing results.
Now, copy those conf file to sender, mid tier, and receiver box.
(sinker means receiver, if you don't already noticed) Or, copy
them to a mounted directory.
How to test:
1. Start mid tier: scribed -p midtierport scribe.midtier.conf.
2. Start sender: scribed -p senderport scribe.sender.conf.
3. Use your favorite scribe client to generate scribe messages.
Since we are testing bucket store messages, make sure that
you read the scribe.midtier.conf file's relevant bucket
store configuration regarding which delimiter to use
and how many buckets are there. By default, we use ^A, or
ascii 1, as delimiter, and there are 600 buckets. You can
run your client any where. However, a convenient place is
on the sender box. Tune your message load to the max load
specified in scribe.conf. Directions are given in
within each conf file template.
4. Use ganglia to monitor mid tier and sender servers. Observe
the following:
4.1. sender output bandwidth
4.2. mid tier input bandwidth
4.3. mid tier memory usage
4.4. mid tier disk usage. (Avail disk should keep going down)
4.5. mid tier output bandwidth (it should be close to zero
as we haven't brought up receivers yet)
5. Let it run for some extended period of time. Watch available
disk starts getting lower and lower overtime.
6. On receiver servers, starts two scribe sinker servers:
6.1. perl -p 9998 scribe.conf, and
6.2. perl -p 9999 scribe.conf.
7. Monitor ganglia for sender, mid tier, and receiver hosts.
Pay attention to network, memory utilitzation, and avail free
disk. With proper tuning, free available disk recovery should be
as fast as when it was getting filled in step 4.
1. is a shell command that show disk i/o. It mainly use iostat but
with a little bit of formatting to only show interesting i/o stats.
2. is a shell program that automates periodic shutting down
sinkers and then bring it up repeatly, every 10 min.
More Information:
Jump to Line
Something went wrong with that request. Please try again.