Cascalog is hosted on Clojars, the Maven repository for Clojure libraries.
Nathan Marz's tech talk at LinkedIn walks through an in-depth example of using Cascalog to run a complex query on real-world data. Watching the talk in full is highly recommended.
After you've gone through the tutorials, read through the documentation on this wiki.
You can run Cascalog from a REPL on your local machine. In this case Hadoop runs in "local mode", which simply means the entire job executes inside the REPL's own process, with no cluster involved. This is useful for experimentation and for local analysis of small datasets.
Cascalog ships with some "playground" datasets that are useful for learning the tool. The introductory tutorials use these datasets, and you can inspect them in the playground.clj file in the Cascalog source.
See this tutorial for how to develop a Cascalog query and run it on a cluster.