DD TESTSUITE
============

This package contains a set of scripts that can be used to perform dd-based 
I/O tests. The tests conduct simultaneous read and write requests in a 1:1 
ratio, using 128MiB files by default.

First, in order to perform read tests, an initial set of input files needs to 
be created. The create_inputfiles.sh script does this; its usage is as follows:

create_inputfiles.sh <number of files>
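
For illustration, a minimal sketch of what such a script could look like (the 
"file.N" naming scheme and the use of /dev/urandom as a data source are 
assumptions, not taken from the actual script):

    #!/bin/bash
    # Sketch only: create the requested number of 128MiB input files.
    # The "file.N" naming and /dev/urandom source are assumptions.
    NUM_FILES=$1
    for i in $(seq 1 "$NUM_FILES"); do
        dd if=/dev/urandom of="file.$i" bs=1M count=128
    done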

The script doing the reads will pick one of these files at random. The idea is 
that a fairly large number of files is created up front; during the test, each 
read process picks one of them at random. A file size of 128MiB was chosen 
because this is a typical size for a virtual block in Hadoop. To reduce the 
benefit of the filesystem buffer cache, the total size of all files combined 
should be equal to or greater than 25 times the total amount of memory on the 
host where the test is run, which determines how many files should be created. 
For example, using 128MiB files on a machine with 64GiB of memory you would 
need 12800 files. The block size used by dd is 4KiB, which is equal to the 
default EXT3 block size. Both the block size and the file size are 
configurable in the "config" file.
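
A single read iteration could then look roughly like this sketch (again, the 
file naming and variable names are assumptions):

    # Sketch only: one read iteration. NUM_FILES and the "file.N"
    # naming are assumptions; bs=4k matches the default block size.
    N=$(( RANDOM % NUM_FILES + 1 ))
    dd if="file.$N" of=/dev/null bs=4k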

During the write test a fairly large number of files is written. For each 
write, a number between 1 and MAX_FILES is chosen at random and the file with 
that number is written. This, too, reduces filesystem buffer cache effects. 
MAX_FILES is configurable through a variable in the "config" file, as is the 
file size.
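
A single write iteration could look roughly like the following sketch (the 
"outfile.N" naming and the use of /dev/zero as a data source are assumptions):

    # Sketch only: one write iteration. The "outfile.N" naming and
    # /dev/zero source are assumptions; 32768 blocks of 4KiB = 128MiB.
    N=$(( RANDOM % MAX_FILES + 1 ))
    dd if=/dev/zero of="outfile.$N" bs=4k count=32768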

When the test starts, reads and writes are started simultaneously in a 1:1 
ratio by running the "start_test.sh" script. The number of read and write 
processes is configurable with a commandline flag of "start_test.sh". These 
read and write processes run indefinitely and should eventually be stopped 
using the "stop_test.sh" script. However, they should run over a sufficiently 
long period of time to get an accurate measurement. This is especially needed 
for writes, which initially show a very high rate due to filesystem buffer 
cache effects. It is usually sufficient to let the processes continue for, 
say, a few hundred iterations of reads and writes. The "result.sh" script 
parses the output files and displays the throughput performance. This script 
can also be run during the test; when the variation in throughput becomes 
small, the test can be stopped.
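
Put together, a session on a 64GiB host could look like this (the flag 
selecting the number of readers and writers is omitted here, since only its 
existence is described above):

    ./create_inputfiles.sh 12800   # ~1.6TiB of input files for a 64GiB host
    ./start_test.sh                # plus the flag for the number of readers/writers
    ./result.sh                    # inspect throughput while the test runs
    ./stop_test.sh                 # stop once the throughput has stabilised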
