
initial import

0 parents commit 1d95b95b38831ff2b58f03e8b5045985598b9240 @keith-turner committed Dec 20, 2011
59 README
@@ -0,0 +1,59 @@
+Accumulo has a simple test suite that verifies that data is not lost at scale.
+This test suite is called continuous ingest. This test runs many ingest
+clients that continually create linked lists containing 25 million nodes. At
+some point the clients are stopped and a map reduce job is run to ensure no
+linked list has a hole. A hole indicates data was lost.
+
+This project is a version of the test suite written using Gora. Theoretically
+it could run against other column stores. Currently I have only tested it at
+scale using Accumulo.
+
+Below is a rough sketch of how data is written. For specific details look at the
+Generator code.
+
+ 1 Write out 1 million nodes
+ 2 Flush
+ 3 Write out 1 million nodes that reference the previous million
+ 4 If this is the 25th set of 1 million nodes, then update the 1st set of
+ 1 million to point to the last
+ 5 goto 1
+
+The key is that nodes only reference flushed nodes. Therefore a node should
+never reference a missing node, even if the ingest client is killed at any
+point in time.
+
+When running this test suite with Accumulo we also run a script called the
+Agitator that randomly and continuously kills server processes. We found many
+data loss bugs in Accumulo by doing this. This test suite can also help find
+bugs that impact uptime and stability when run for days or weeks.
+
+This test suite consists of a few Java programs, a little helper script to run
+the Java programs, and a Maven script to build it. To build the code, you may
+need to edit the Maven script to point to the Gora data store that you want to
+use. Or just use the Maven script to build this Java code, and copy whatever
+dependencies you need into lib. To compile, do "mvn compile package". The
+current Maven build script depends on an unreleased version of Accumulo and an
+unreleased version of gora-accumulo. Both of these can be downloaded and
+installed in your local Maven repo using mvn install.
+
+Below is a description of the Java programs
+
+ * goraci.Generator - A map only job that generates data.
+ * goraci.Verify - A map reduce job that looks for holes. Look at the
+ counts after running. REFERENCED and UNREFERENCED are
+ ok, any UNDEFINED counts are bad. Do not run at the
+ same time as the Generator.
+ * goraci.Walker - A standalone program that starts following a linked list
+ and emits timing info.
+ * goraci.Print - A standalone program that prints nodes in the linked list.
+
+goraci.sh is a helper script that you can use to run the above programs. It
+assumes all needed jars are in the lib dir. It does not need the package name.
+You can just run "./goraci.sh Generator". Below is an example.
+
+ $ ./goraci.sh Generator
+ Usage : Generator <num mappers> <num nodes>
+
+This test suite does not do everything that the Accumulo test suite does;
+mainly, it does not collect statistics or generate reports.
+
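The write pattern described in the README can be illustrated with a small in-memory sketch. This is not the Generator code from this commit: the class name `ContinuousIngestSketch`, the `HashMap` standing in for the data store, the `NO_PREV` sentinel, and the tiny batch sizes are all assumptions made for illustration. It shows only the ordering rule, namely that a batch is "flushed" before any later node references it, and a simple check for holes.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Random;
import java.util.Set;

public class ContinuousIngestSketch {
    /** Assumed sentinel meaning "this node does not reference anything yet". */
    static final long NO_PREV = -1L;

    /**
     * Writes `depth` batches of `width` nodes. Each batch references the
     * previously flushed batch; after the last batch, the first batch is
     * updated to point to the last one. Returns the store (key -> prev).
     */
    static Map<Long, Long> generate(int width, int depth, long seed) {
        Map<Long, Long> store = new HashMap<>(); // stands in for flushed data
        Random rand = new Random(seed);
        long[] first = null;
        long[] prev = null;
        for (int d = 0; d < depth; d++) {
            long[] current = new long[width];
            Map<Long, Long> buffer = new HashMap<>(); // unflushed writes
            for (int i = 0; i < width; i++) {
                long key;
                do {
                    key = rand.nextLong();
                } while (key == NO_PREV || store.containsKey(key) || buffer.containsKey(key));
                current[i] = key;
                // Only keys from an already flushed batch are referenced.
                buffer.put(key, prev == null ? NO_PREV : prev[i]);
            }
            store.putAll(buffer); // the "flush" step
            if (first == null) {
                first = current;
            }
            prev = current;
        }
        // Close each list: the first batch now points to the (flushed) last batch.
        for (int i = 0; i < width; i++) {
            store.put(first[i], prev[i]);
        }
        return store;
    }

    /** Returns every referenced key that is missing from the store (a hole). */
    static Set<Long> findHoles(Map<Long, Long> store) {
        Set<Long> holes = new HashSet<>();
        for (long p : store.values()) {
            if (p != NO_PREV && !store.containsKey(p)) {
                holes.add(p);
            }
        }
        return holes;
    }

    public static void main(String[] args) {
        Map<Long, Long> store = generate(4, 3, 42L);
        System.out.println("holes before loss: " + findHoles(store).size());
        Long victim = store.keySet().iterator().next();
        store.remove(victim); // simulate a lost node
        System.out.println("holes after loss: " + findHoles(store).size());
    }
}
```

In the real test the batches hold 1 million nodes each and there are 25 of them per loop; the sketch uses 4 and 3 only to keep the run instant.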
@@ -0,0 +1,10 @@
+{
+ "type": "record",
+ "name": "CINode",
+ "namespace": "goraci.generated",
+ "fields" : [
+ {"name": "prev", "type": "long"},
+ {"name": "client", "type": "string"},
+ {"name": "count", "type": "long"}
+ ]
+}
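The `prev` field in the schema above is what the README's Verify job relies on. As a rough illustration of the three counters the README names, the sketch below classifies nodes held in a plain map. It is not the goraci.Verify map reduce job: the class `VerifySketch`, the `Counts` enum, and the `NO_PREV` value of -1 are assumptions, and the counter meanings are inferred from the README (a defined node is REFERENCED or UNREFERENCED, while a key that is referenced but never written is UNDEFINED).

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class VerifySketch {
    enum Counts { REFERENCED, UNREFERENCED, UNDEFINED }

    /** Assumed sentinel for a node whose prev field points at nothing. */
    static final long NO_PREV = -1L;

    /** Classifies nodes given a map of node key -> prev key. */
    static Map<Counts, Integer> classify(Map<Long, Long> prevByKey) {
        // Collect every key that some node points at.
        Set<Long> referenced = new HashSet<>();
        for (long prev : prevByKey.values()) {
            if (prev != NO_PREV) {
                referenced.add(prev);
            }
        }
        Map<Counts, Integer> counts = new EnumMap<>(Counts.class);
        for (Counts c : Counts.values()) {
            counts.put(c, 0);
        }
        // Defined nodes are either referenced by another node or not.
        for (long key : prevByKey.keySet()) {
            Counts c = referenced.contains(key) ? Counts.REFERENCED : Counts.UNREFERENCED;
            counts.merge(c, 1, Integer::sum);
        }
        // A referenced key with no node behind it is a hole.
        for (long ref : referenced) {
            if (!prevByKey.containsKey(ref)) {
                counts.merge(Counts.UNDEFINED, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<Long, Long> nodes = new HashMap<>();
        nodes.put(1L, NO_PREV);
        nodes.put(2L, 1L);
        nodes.put(3L, 2L);
        nodes.put(4L, 99L); // 99 was never written, so it is a hole
        System.out.println(classify(nodes));
    }
}
```

With the four sample nodes, keys 1 and 2 are referenced, keys 3 and 4 are unreferenced, and key 99 is undefined, so any nonzero UNDEFINED count signals lost data.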
@@ -0,0 +1,14 @@
+#!/bin/bash
+
+GORACI_HOME=`dirname "$0"`
+export HADOOP_CLASSPATH=$(JARS=("$GORACI_HOME/lib"/*.jar); IFS=:; echo "${JARS[*]}")
+LIBJARS=`echo $HADOOP_CLASSPATH | tr : ,`
+
+
+PACKAGE="goraci"
+
+CMD=$1
+shift
+
+hadoop jar "$GORACI_HOME/lib/goraci-0.0.1-SNAPSHOT.jar" "$PACKAGE.$CMD" -libjars "$LIBJARS" "$@"
+
111 pom.xml
@@ -0,0 +1,111 @@
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+ xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+ <modelVersion>4.0.0</modelVersion>
+ <groupId>goraci</groupId>
+ <artifactId>goraci</artifactId>
+ <version>0.0.1-SNAPSHOT</version>
+
+
+ <dependencies>
+ <dependency>
+ <groupId>org.apache.gora</groupId>
+ <artifactId>gora-core</artifactId>
+ <version>0.2-SNAPSHOT</version>
+ </dependency>
+
+ <dependency>
+ <groupId>org.apache.hadoop</groupId>
+ <artifactId>avro</artifactId>
+ <version>1.3.3</version>
+ </dependency>
+
+ <!-- begin dependencies for accumulo.... all needed runtime deps are specified
+ because enabling transitive deps brings in too much junk. Comment out if not
+ using accumulo -->
+ <!-- see https://issues.apache.org/jira/browse/GORA-65 to obtain source
+ for gora-accumulo -->
+ <dependency>
+ <groupId>org.apache.gora</groupId>
+ <artifactId>gora-accumulo</artifactId>
+ <version>0.2-SNAPSHOT</version>
+ <scope>runtime</scope>
+ </dependency>
+ <dependency>
+ <groupId>org.apache.accumulo</groupId>
+ <artifactId>accumulo-core</artifactId>
+ <version>1.4.0-incubating-SNAPSHOT</version>
+ <scope>runtime</scope>
+ </dependency>
+ <dependency>
+ <groupId>org.apache.accumulo</groupId>
+ <artifactId>cloudtrace</artifactId>
+ <version>1.4.0-incubating-SNAPSHOT</version>
+ <scope>runtime</scope>
+ </dependency>
+ <dependency>
+ <groupId>org.apache.thrift</groupId>
+ <artifactId>libthrift</artifactId>
+ <version>0.6.1</version>
+ <scope>runtime</scope>
+ </dependency>
+ <dependency>
+ <groupId>org.apache.hadoop</groupId>
+ <artifactId>zookeeper</artifactId>
+ <version>3.3.1</version>
+ <scope>runtime</scope>
+ </dependency>
+ <!-- end accumulo deps -->
+
+ </dependencies>
+
+ <build>
+ <plugins>
+ <plugin>
+ <artifactId>maven-dependency-plugin</artifactId>
+ <version>2.4</version>
+ <executions>
+ <execution>
+ <id>copy-dependencies</id>
+ <phase>package</phase>
+ <goals>
+ <goal>copy-dependencies</goal>
+ </goals>
+ <configuration>
+ <outputDirectory>lib</outputDirectory>
+ <overWriteReleases>false</overWriteReleases>
+ <overWriteSnapshots>true</overWriteSnapshots>
+ <overWriteIfNewer>true</overWriteIfNewer>
+ <excludeTransitive>true</excludeTransitive>
+ </configuration>
+ </execution>
+ </executions>
+ </plugin>
+
+
+ <plugin>
+ <artifactId>maven-jar-plugin</artifactId>
+ <version>2.3</version>
+ <configuration>
+ <outputDirectory>lib</outputDirectory>
+ </configuration>
+ </plugin>
+
+ <plugin>
+ <artifactId>maven-clean-plugin</artifactId>
+ <version>2.4.1</version>
+ <configuration>
+ <filesets>
+ <fileset>
+ <directory>lib</directory>
+ <includes>
+ <include>**/*.jar</include>
+ </includes>
+ <followSymlinks>false</followSymlinks>
+ </fileset>
+ </filesets>
+ </configuration>
+ </plugin>
+
+ </plugins>
+ </build>
+</project>
@@ -0,0 +1,36 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package goraci;
+
+import goraci.generated.CINode;
+
+import java.io.IOException;
+
+import org.apache.gora.store.DataStore;
+import org.apache.gora.store.DataStoreFactory;
+import org.apache.hadoop.conf.Configuration;
+
+/**
+ * Deletes all data in the CINode data store by truncating its schema.
+ */
+public class Clear {
+ public static void main(String[] args) throws IOException {
+ DataStore<Long,CINode> store = DataStoreFactory.getDataStore(Long.class, CINode.class, new Configuration());
+ store.truncateSchema();
+ store.close();
+ }
+}