Switch branches/tags
Nothing to show
Find file History
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
..
Failed to load latest commit information.
cassandra
java
lib
multiple_outputs
pig
scala
README
presentation.pdf

README

PREREQUISITES:
- Cassandra 1.1
- Hadoop 1.0.1+
- JDK 1.6
- Scala 2.9 (if building the Scala job)

1. Create the Cassandra schema with the following command:

         cassandra-cli -h <cass_host> -f cassandra/cassandra_schema.txt

2. Use cassandra/seed_data.txt to populate some sample data (using the same command as above), or add your own data.

3. Increase the max heap size for the Hadoop client in hadoop-env.sh:

         export HADOOP_CLIENT_OPTS="-Xmx1g $HADOOP_CLIENT_OPTS"

4. Copy jars from lib directory into $HADOOP_HOME/lib or otherwise ensure they are on the Hadoop classpath

5. cd into either java or scala, and run the build script to generate the jar file.

6. Execute the run script as follows:

	./run <cass_host> <num_reducers> <keyspace> <input_column_family> <output_column_family>

   Example if running locally using the schema script above:

	./run localhost 1 HadoopTest TestInput TestOutput