Cascalog is hosted on the Clojars Maven repository.
- Make sure you have Java 1.6 installed
Nathan's tech talk at LinkedIn goes through an in-depth example of using Cascalog to perform a complex query on real-world data. Watching this talk in full is highly recommended.
After you've gone through the tutorials, read through the documentation on this wiki.
Cascalog can be run from the REPL on your local machine. In this case Hadoop runs in "local mode", meaning the whole job executes inside a single process on your machine rather than on a cluster. This is useful for experimentation and for local analysis of small datasets.
Cascalog comes with some "playground" datasets which are useful for learning how to use the tool. These datasets are used in the introductory tutorials and you can see them by looking at the playground.clj file in the Cascalog source.
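As a rough sketch of what a local-mode session might look like, the snippet below loads the playground datasets at the REPL and runs a single query against them. It assumes the REPL was started from a project that already has Cascalog on its classpath, and that your Cascalog version provides the `bootstrap` macro and the `age` dataset in `cascalog.playground`, as the introductory tutorials do; names may differ in other releases.

```clojure
;; Assumption: the REPL was launched from a project that depends on Cascalog.
(use 'cascalog.playground)

;; Pull the Cascalog API into the current namespace so the query
;; operators below can be used without qualification.
(bootstrap)

;; Query the playground `age` dataset for every person whose age is 25
;; and print the matching tuples to standard output.
(?<- (stdout) [?person] (age ?person 25))
```

Because Hadoop is running in local mode here, the query executes entirely in the REPL process and no cluster configuration is needed.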
Running Cascalog queries on a Hadoop cluster
See this tutorial for information about developing and running a Cascalog query on a cluster.