Cassandra Tap for Cascading!
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.
src
README.md
cascading.cassandra.xml

README.md

This is an effort to implement a Cassandra Tap for Cascading.

At the moment, only Source is supported, although the tap does automatic type detection from the CF metadata.

Here's the cascading Wordcount example running on Cassandra:

    Tap source = new CassandraTap("127.0.0.1", 9160, "wordcount", "input_words",
      new Fields("line", "bible"));

    Scheme sinkScheme = new TextLine(new Fields("word", "count"));
    Tap sink = new StdoutTap();

    // the 'head' of the pipe assembly
    Pipe assembly = new Pipe("wordcount");

    // For each input Tuple
    // parse out each word into a new Tuple with the field name "word"
    // regular expressions are optional in Cascading
    String regex = "(?<!\\pL)(?=\\pL)[^ ]*(?<=\\pL)(?!\\pL)";
    Function function = new RegexGenerator(new Fields("word"), regex);
    assembly = new Each(assembly, new Fields("line"), function);

    // group the Tuple stream by the "word" value
    assembly = new GroupBy(assembly, new Fields("word"));

    // For every Tuple group
    // count the number of occurrences of "word" and store result in
    // a field named "count"
    Aggregator count = new Count(new Fields("count"));
    assembly = new Every(assembly, count);

    // initialize app properties, tell Hadoop which jar file to use
    Properties properties = new Properties();
    FlowConnector.setApplicationJarClass(properties, Main.class);

    // plan a new Flow from the assembly using the source and sink Taps
    // with the above properties
    FlowConnector flowConnector = new FlowConnector(properties);
    Flow flow = flowConnector.connect("word-count", source, sink, assembly);

    // execute the flow, block until complete
    flow.complete();