Cassandra Tap for Cascading!
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.

This is an effort to implement a Cassandra Tap for Cascading.

At the moment, only Source is supported, although the tap does automatic type detection from the CF metadata.

Here's the cascading Wordcount example running on Cassandra:

    Tap source = new CassandraTap("", 9160, "wordcount", "input_words",
      new Fields("line", "bible"));

    Scheme sinkScheme = new TextLine(new Fields("word", "count"));
    Tap sink = new StdoutTap();

    // the 'head' of the pipe assembly
    Pipe assembly = new Pipe("wordcount");

    // For each input Tuple
    // parse out each word into a new Tuple with the field name "word"
    // regular expressions are optional in Cascading
    String regex = "(?<!\\pL)(?=\\pL)[^ ]*(?<=\\pL)(?!\\pL)";
    Function function = new RegexGenerator(new Fields("word"), regex);
    assembly = new Each(assembly, new Fields("line"), function);

    // group the Tuple stream by the "word" value
    assembly = new GroupBy(assembly, new Fields("word"));

    // For every Tuple group
    // count the number of occurrences of "word" and store result in
    // a field named "count"
    Aggregator count = new Count(new Fields("count"));
    assembly = new Every(assembly, count);

    // initialize app properties, tell Hadoop which jar file to use
    Properties properties = new Properties();
    FlowConnector.setApplicationJarClass(properties, Main.class);

    // plan a new Flow from the assembly using the source and sink Taps
    // with the above properties
    FlowConnector flowConnector = new FlowConnector(properties);
    Flow flow = flowConnector.connect("word-count", source, sink, assembly);

    // execute the flow, block until complete