This is an effort to implement a Cassandra Tap for Cascading.
At the moment the tap can only be used as a source; sinks are not supported yet. It detects column types automatically from the column family (CF) metadata.
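As a rough illustration of what type detection from CF metadata can look like, here is a minimal sketch that maps a Cassandra validator class name to a Java type. It is not the tap's actual code: the class `ValidatorTypes`, the method `javaTypeFor`, and the choice of which validators to cover are assumptions made for this example. Only the `org.apache.cassandra.db.marshal.*` class names are taken from Cassandra itself.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the tap: picks a Java type for a column
// based on the validator class name reported in the column family metadata.
class ValidatorTypes {
    private static final String MARSHAL = "org.apache.cassandra.db.marshal.";
    private static final Map<String, Class<?>> TYPES = new HashMap<>();

    static {
        TYPES.put(MARSHAL + "UTF8Type", String.class);
        TYPES.put(MARSHAL + "AsciiType", String.class);
        TYPES.put(MARSHAL + "LongType", Long.class);
        // Cassandra's IntegerType is an arbitrary-precision integer.
        TYPES.put(MARSHAL + "IntegerType", BigInteger.class);
    }

    // Unknown validators (including BytesType) fall back to raw bytes.
    static Class<?> javaTypeFor(String validatorClass) {
        Class<?> type = TYPES.get(validatorClass);
        return type != null ? type : ByteBuffer.class;
    }

    public static void main(String[] args) {
        System.out.println(javaTypeFor(MARSHAL + "UTF8Type").getSimpleName());
        System.out.println(javaTypeFor(MARSHAL + "BytesType").getSimpleName());
    }
}
```

Falling back to raw bytes for anything unrecognized keeps the sketch safe for column families whose validators are not in the table.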
Here's the Cascading word count example, reading its input from Cassandra:
    Tap source = new CassandraTap("127.0.0.1", 9160, "wordcount", "input_words", new Fields("line", "bible"));
    Scheme sinkScheme = new TextLine(new Fields("word", "count"));
    Tap sink = new StdoutTap();

    // the 'head' of the pipe assembly
    Pipe assembly = new Pipe("wordcount");

    // For each input Tuple
    // parse out each word into a new Tuple with the field name "word"
    // regular expressions are optional in Cascading
    String regex = "(?<!\\pL)(?=\\pL)[^ ]*(?<=\\pL)(?!\\pL)";
    Function function = new RegexGenerator(new Fields("word"), regex);
    assembly = new Each(assembly, new Fields("line"), function);

    // group the Tuple stream by the "word" value
    assembly = new GroupBy(assembly, new Fields("word"));

    // For every Tuple group
    // count the number of occurrences of "word" and store result in
    // a field named "count"
    Aggregator count = new Count(new Fields("count"));
    assembly = new Every(assembly, count);

    // initialize app properties, tell Hadoop which jar file to use
    Properties properties = new Properties();
    FlowConnector.setApplicationJarClass(properties, Main.class);

    // plan a new Flow from the assembly using the source and sink Taps
    // with the above properties
    FlowConnector flowConnector = new FlowConnector(properties);
    Flow flow = flowConnector.connect("word-count", source, sink, assembly);

    // execute the flow, block until complete
    flow.complete();
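To see what the word-splitting regular expression in the example picks out of a line, the following standalone sketch applies the same pattern with plain `java.util.regex`, without Cascading or Cassandra. The class name `WordRegexDemo` and the sample sentence are made up for illustration, and looping over `Matcher.find()` is only meant to approximate how one tuple is emitted per match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: applies the word count regex to a single line.
class WordRegexDemo {
    // Same pattern as in the example above: a match starts at a letter that is
    // not preceded by a letter, runs over non-space characters, and ends at a
    // letter that is not followed by a letter.
    static final Pattern WORD =
        Pattern.compile("(?<!\\pL)(?=\\pL)[^ ]*(?<=\\pL)(?!\\pL)");

    // Collects every match found in the line, in order.
    static List<String> words(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = WORD.matcher(line);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [Hello, world, It's, apples]
        System.out.println(words("Hello, world! It's 42 apples."));
    }
}
```

Leading and trailing punctuation is dropped, purely numeric tokens such as `42` are skipped, and punctuation inside a word (the apostrophe in `It's`) is kept.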