
Got new version of goraci.sh working. Updated README, copying goraci jar to hadoop lib dir should not be necessary.
1 parent f9f420a commit e6896b5a493320341f27846079776db8a65f663d @keith-turner committed Mar 3, 2012
Showing with 42 additions and 67 deletions.
  1. +19 −21 README
  2. +22 −45 goraci.sh
  3. +1 −1 src/main/resources/gora.properties
40 README
@@ -7,19 +7,19 @@
BACKGROUND
------------
-Apache Accumulo [0] has a simple test suite that verifies that data is not lost at scale.
-This test suite is called continuous ingest. This test runs many ingest
-clients that continually create linked lists containing 25 million nodes. At
-some point the clients are stopped and a map reduce job is run to ensure no
-linked list has a hole. A hole indicates data was lost.
+Apache Accumulo [0] has a simple test suite that verifies that data is not lost
+at scale. This test suite is called continuous ingest. This test runs many
+ingest clients that continually create linked lists containing 25 million
+nodes. At some point the clients are stopped and a map reduce job is run to
+ensure no linked list has a hole. A hole indicates data was lost.
The nodes in the linked list are random. This causes each linked list to
spread across the table. Therefore if one part of a table loses data, then it
will be detected by references in another part of the table.
-This project is a version of the test suite written using Apache Gora [1]. Theoretically
-it could run against other column stores, however currently it has only been
-tested at scale using Apache Accumulo.
+This project is a version of the test suite written using Apache Gora [1].
+Theoretically it could run against other column stores, however currently it
+has only been tested at scale using Apache Accumulo.
THE ANATOMY OF GORACI TESTS
----------------------------
@@ -39,7 +39,7 @@ never reference a missing node, even if the ingest client is killed at any
point in time.
When running this test suite w/ Accumulo there is a script running in parallel
-called the Adgitator that randomly and continuously kills server processes.
+called the Aggitator that randomly and continuously kills server processes.
The outcome was that many data loss bugs were found in Accumulo by doing this.
This test suite can also help find bugs that impact uptime and stability when
run for days or weeks.
@@ -52,22 +52,20 @@ This test suite consists the following
BUILDING GORACI
---------------
-To build the code, you may need to edit the maven script to point to the gora
+To build the code, you may need to edit the maven script to point to the gora
datastore that you want to use. This will require you to edit commented out
dependencies, which will reflect which datastore you wish to test against.
-Alternatively just use the maven script to build the java code, and copy whatever
-dependencies you need into lib.
+Alternatively just use the maven script to build the java code, and copy
+whatever dependencies you need into lib.
+
To compile, do
$mvn compile package
-The current maven build script depends on an unreleased version of Accumulo and an
-un released version of gora-accumulo. Both of these can be downloaded and
+The current maven build script depends on an unreleased version of Accumulo and
+an un released version of gora-accumulo. Both of these can be downloaded and
installed in your local maven repo using mvn install.
-Once the gora goraci-${version}-SNAPSHOT.jar has been built copy it to your $HADDOP_HOME/lib
-directory.
-
JAVA CLASS DESCRIPTION
-----------------
@@ -91,10 +89,10 @@ You can just run "./goraci.sh Generator", below is an example.
Usage : Generator <num mappers> <num nodes>
For Gora to work, it needs a gora.properties file on the classpath and a
-mapping file on the classpath, the contents of both are datastore specific, more
-details can be found here [2]. You can edit the ones in src/main/resources and
-build the goraci-${version}-SNAPSHOT.jar with those. Alternatively remove those
-and put them on the classpath through some other means.
+mapping file on the classpath, the contents of both are datastore specific,
+more details can be found here [2]. You can edit the ones in src/main/resources
+and build the goraci-${version}-SNAPSHOT.jar with those. Alternatively remove
+those and put them on the classpath through some other means.
CONCLUSIONS
------------
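The README background in this diff describes the check in words: nodes form linked lists, and a reference to a node that is not in the table is a hole. The goraci.sh usage text names the resulting counts REFERENCED, UNREFERENCED and UNDEFINED. The following is a minimal sketch of that classification, not the project's Verify job; the `classify` function and the "node prev" line format are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Sketch only: each input line is "node prev" (a hypothetical format).
# A prev of "-" means the node points at nothing (first node of a list).
classify() {
  awk '
    {
      defined[$1] = 1
      if ($2 != "-") {
        referenced[$2] = 1
      }
    }
    END {
      for (n in defined) {
        if (n in referenced) {
          ref++
        } else {
          unref++
        }
      }
      # A node that is pointed at but never defined is a hole.
      for (n in referenced) {
        if (!(n in defined)) {
          undef++
        }
      }
      printf "REFERENCED=%d UNREFERENCED=%d UNDEFINED=%d\n", ref, unref, undef
    }'
}

# Intact list: c -> b -> a.
INTACT=`printf 'a -\nb a\nc b\n' | classify`
# Same list with the line for node b lost: c now points at a hole.
BROKEN=`printf 'a -\nc b\n' | classify`

echo "intact: $INTACT"
echo "broken: $BROKEN"
```

In the intact case the UNDEFINED count stays at zero. Dropping one node raises it, which matches the usage text's statement that any UNDEFINED counts are bad.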
67 goraci.sh
@@ -28,18 +28,18 @@ done
if [ $# = 0 ]; then
echo "Usage: run COMMAND [COMMAND options]"
echo "where COMMAND is one of:"
- echo " generator A map only job that generates data."
- echo " verify A map reduce job that looks for holes.
+ echo " Generator A map only job that generates data."
+ echo " Verify A map reduce job that looks for holes.
Look at the counts after running.
REFERENCED and UNREFERENCED are ok,
any UNDEFINED counts are bad. Do not
run at the same time as the Generator."
- echo " walker A standalong program that starts
+ echo " Walker A standalong program that starts
following a linked list and emits
timing info."
- echo " print A standalone program that prints nodes
+ echo " Print A standalone program that prints nodes
in the linked list."
- echo " delete A standalone program that deletes a
+ echo " Delete A standalone program that deletes a
single node."
echo " or"
echo " CLASSNAME run the class named CLASSNAME"
@@ -53,71 +53,48 @@ shift
# some directories
THIS_DIR=`dirname "$THIS"`
-GORACI_HOME=`cd "$THIS_DIR/.." ; pwd`
+GORACI_HOME=`cd "$THIS_DIR" ; pwd`
# cath when JAVA_HOME is not set
if [ "$JAVA_HOME" = "" ]; then
echo "Error: JAVA_HOME is not set."
exit 1
fi
-
-JAVA=$JAVA_HOME/bin/java
-JAVA_HEAP_MAX=-Xmx1024m
-
-# check envvars which might override default args
-if [ "$GORACI_HEAPSIZE" != "" ]; then
- #echo "run with heapsize $GORACI_HEAPSIZE"
- JAVA_HEAP_MAX="-Xmx""$GORACI_HEAPSIZE""m"
- #echo $JAVA_HEAP_MAX
-fi
-
-# initial CLASSPATH
-CLASSPATH=$JAVA_HOME/lib/tools.jar
-
# so that filenames w/ spaces are handled correctly in loops below
IFS=
# restore ordinary behaviour
unset IFS
-# default log directory & file
-if [ "$GORACI_LOG_DIR" = "" ]; then
- GORACI_LOG_DIR="$GORACI_HOME/logs"
-fi
-if [ "$GORACI_LOGFILE" = "" ]; then
- GORACI_LOGFILE='goraci.log'
-fi
-
-if [ "x$JAVA_LIBRARY_PATH" != "x" ]; then
- JAVA_OPTS="$JAVA_OPTS -Djava.library.path=$JAVA_LIBRARY_PATH"
-fi
-
-#GORACI_OPTS="$GORACI_OPTS -Dhadoop.log.dir=$GORACI_LOG_DIR"
-#GORACI_OPTS="$GORACI_OPTS -Dhadoop.log.file=$GORACI_LOGFILE"
-
# figure out which class to run
-if [ "$COMMAND" = "generator" ] ; then
+if [ "$COMMAND" = "Generator" ] ; then
CLASS=goraci.Generator
-elif [ "$COMMAND" = "verify" ] ; then
+elif [ "$COMMAND" = "Verify" ] ; then
CLASS=goraci.Verify
-elif [ "$COMMAND" = "walker" ] ; then
+elif [ "$COMMAND" = "Walker" ] ; then
CLASS=goraci.Walker
-elif [ "$COMMAND" = "print" ] ; then
- CLASS=goraci.Print
-elif [ "$COMMAND" = "delete" ] ; then
+elif [ "$COMMAND" = "Print" ] ; then
CLASS=goraci.Print
+elif [ "$COMMAND" = "Delete" ] ; then
+ CLASS=goraci.Delete
else
- MODULE="$COMMAND"
CLASS=$1
shift
fi
+# initial CLASSPATH
+CLASSPATH=""
+
# add libs to CLASSPATH
-for f in $GORA_HOME/$MODULE/lib/*.jar; do
- CLASSPATH=${CLASSPATH}:$f;
+SEP=""
+for f in $GORACI_HOME/lib/*.jar; do
+ CLASSPATH=${CLASSPATH}$SEP$f;
+ SEP=":"
done
#run it
-hadoop jar "$GORACI_HOME/lib/goraci-0.0.1-SNAPSHOT.jar" -classpath "$CLASSPATH" $CLASS "$@"
+export HADOOP_CLASSPATH="$CLASSPATH"
+LIBJARS=`echo $HADOOP_CLASSPATH | tr : ,`
+hadoop jar "$GORACI_HOME/lib/goraci-0.0.1-SNAPSHOT.jar" $CLASS -libjars "$LIBJARS" "$@"
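The new tail of goraci.sh builds a colon-separated classpath without a leading separator, then converts it to the comma-separated form that `-libjars` takes. The sketch below reproduces only that string handling so it can be tried without Hadoop; `LIB_DIR` and the jar names are hypothetical stand-ins for `$GORACI_HOME/lib` and its contents. Note that `-libjars` is a Hadoop generic option, so it presumably only takes effect if the goraci classes parse generic options (for example through ToolRunner), which this diff does not show.

```shell
#!/bin/sh
# Sketch only: LIB_DIR and the jar names are hypothetical stand-ins
# for $GORACI_HOME/lib and the jars it contains.
LIB_DIR=`mktemp -d`
touch "$LIB_DIR/a.jar" "$LIB_DIR/b.jar"

# Join every jar with ":". SEP starts empty, so the result has no
# leading ":" (an empty classpath entry).
CLASSPATH=""
SEP=""
for f in "$LIB_DIR"/*.jar; do
  CLASSPATH="${CLASSPATH}${SEP}${f}"
  SEP=":"
done

# -libjars takes a comma-separated list, so translate ":" to ",".
LIBJARS=`echo "$CLASSPATH" | tr : ,`

echo "HADOOP_CLASSPATH=$CLASSPATH"
echo "LIBJARS=$LIBJARS"

# Remove the temporary directory created for this sketch.
rm -rf "$LIB_DIR"
```

The `tr` step assumes that no jar path contains a colon or a comma, which also holds for the script in the diff.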
2 src/main/resources/gora.properties
@@ -2,7 +2,7 @@
# Default Accumulo properties #
###############################
gora.datastore.default=org.apache.gora.accumulo.store.AccumuloStore
-gora.datastore.accumulo.instance=a14
+gora.datastore.accumulo.instance=test14
gora.datastore.accumulo.zookeepers=localhost
gora.datastore.accumulo.user=root
gora.datastore.accumulo.password=secret
