Cascalog allows you to query Hadoop in Clojure with an expressive language inspired by Datalog. Follow the getting started steps, check out the tutorial, and you'll be running Cascalog queries on your local computer within 5 minutes.
Cascalog also features a wrapper around Cascading to define dataflows in cascalog.workflow. Custom operations defined in Cascalog can be used both for Cascalog queries and Cascalog dataflows.
Getting Started
- Make sure you have Java 1.6
- export JAVA_OPTS=-Xmx768m
- Install Leiningen
- git clone git://github.com/nathanmarz/cascalog.git
- cd cascalog && lein deps && lein compile-java && lein compile
- Optionally run "lein test" to make sure the tests pass
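The steps above can be collected into a small shell script. This is only a sketch: the function name `build_cascalog` is made up for illustration, and it assumes `java`, `git`, and `lein` are already on your PATH. If the `git://` URL is refused, the `https://` form of the same repository path is the usual substitute.

```shell
#!/bin/sh
# Heap size for the JVM that Leiningen launches (value taken from the steps above).
export JAVA_OPTS=-Xmx768m

# Hypothetical helper: clone and build Cascalog.
# The body runs in a subshell, so the caller's working directory is unchanged.
build_cascalog() (
    java -version &&                                        # fails early if Java is missing
    git clone git://github.com/nathanmarz/cascalog.git &&
    cd cascalog &&
    lein deps && lein compile-java && lein compile
)
```

Call `build_cascalog` to run the build, then optionally `cd cascalog && lein test`.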
Tutorial
- Introducing Cascalog
- New Cascalog features: outer joins, combiners, sorting, and more
- News Feed in 38 lines of code using Cascalog
Running Cascalog queries on a Hadoop cluster
- Cascalog includes Hadoop as a dependency so that you can experiment with it easily. When you build a jar to run on a cluster, don't bundle the Hadoop jars inside it.
- Cascalog requires Cascading 1.1
- Any custom operations must be compiled into the jar you give to Hadoop for running jobs
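One way to check the first point before submitting a job is to scan the jar's file listing for Hadoop classes or nested Hadoop jars. The sketch below is an assumption about how you might do that, not part of Cascalog: `has_hadoop_entries` is a hypothetical helper that reads a listing (for example the output of `jar tf`) on standard input.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the jar listing on stdin contains
# Hadoop classes (org/apache/hadoop/...) or a bundled hadoop-*.jar.
has_hadoop_entries() {
    grep -Eq '(^|/)hadoop-[^/]*\.jar$|^org/apache/hadoop/'
}
```

For example, with a job jar named `myjob.jar` (a placeholder name), `jar tf myjob.jar | has_hadoop_entries && echo "Hadoop is bundled; rebuild without it"` prints the warning only when Hadoop entries are found.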
Questions or Concerns?
Google group: cascalog-user
IRC: Come chat in the #cascading room on freenode
Acknowledgements
Cascalog is based on a very early branch of the cascading-clojure project. Special thanks to Bradford Cross and Mark McGranaghan for their work on that project. Much of that code appears within Cascalog in either its original or a modified form.