Demos of various testing strategies in Cascalog.

This project contains examples of the right way to test Cascalog workflows using midje-cascalog. The tests and code mirror the discussion at this blog post. All tests can be found here.


For the past few weeks I've been working on a Cascalog testing suite, an extension to Brian Marick's Midje, that eases much of the pain of testing MapReduce workflows. I think a lot of the dull work we see in the Hadoop community is a direct result of fear. Without proper tests, Hadoop developers can't help but be scared of making changes to production code. When creativity might bring down a workflow, it's easiest to get it working once and leave it alone.

The antidote to all of this fear is a functional testing suite. As I discussed in Getting Creative with MapReduce, Hadoop workflows are difficult to test at all; testing application logic in isolation from data storage is impossible.

Cascalog is free of this weakness. midje-cascalog allows you to test Cascalog queries as pure functions, both in isolation and as components of more complicated workflows. The resulting tests are truly beautiful.
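
To make "queries as pure functions" concrete, here is a minimal sketch of what such a test can look like. The names `user-followers` and `popular-users`, the namespace, and the sample tuples are hypothetical and chosen only for illustration; `fact`, `provided`, and `produces` come from Midje and midje-cascalog. The sketch assumes that an in-memory vector of tuples can stand in for the storage-backed subquery, which is what lets the query logic be checked without touching any data store.

```clojure
(ns example.followers-test
  (:use cascalog.api
        midje.sweet
        midje.cascalog))

;; Hypothetical subquery. In a real workflow this would read
;; [user follower-count] tuples from storage; here it is a placeholder
;; that the test below replaces with in-memory tuples.
(defn user-followers [path]
  (throw (UnsupportedOperationException.
          "reads from storage; mocked out in tests")))

;; Hypothetical query under test: keeps users with at least
;; `min-followers` followers.
(defn popular-users [path min-followers]
  (let [src (user-followers path)]
    (<- [?user ?count]
        (src ?user ?count)
        (>= ?count min-followers))))

;; The storage-backed subquery is swapped for a vector of tuples,
;; so only the filtering logic is exercised.
(fact "only users at or above the threshold are returned"
  (popular-users :path 1000) => (produces [["richhickey" 2961]])
  (provided
    (user-followers :path) => [["sritchie09" 429]
                               ["richhickey" 2961]]))
```

The `provided` clause is the design choice that matters here: because the subquery is mocked at the function boundary, the same style of test works whether `popular-users` is tested alone or as one stage of a larger workflow.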