The forma-clj project is a FORMA implementation written in the open-source Clojure programming language.
At the heart of this implementation is Cascalog, a fully-featured data processing and querying library for Clojure. It lets us process FORMA data in a reliable, scalable, and distributed way using MapReduce badassery courtesy of Hadoop. And without the hassle!
Head to the Project Wiki for more details.
FORMA stands for Forest Monitoring for Action. It uses freely available satellite data to generate rapidly updated online maps of tropical forest clearing. These maps provide useful information for local and national forest conservation programs, as well as for international efforts to curb greenhouse gas emissions by paying to keep forests intact.
FORMA was originally a project of the Center for Global Development, an economics think tank in Washington, DC. It is now part of World Resources Institute's Global Forest Watch. WRI is an environmental think tank based in Washington, DC.
To get started, you'll need to install a few tools, but it's painless.
- forma-clj (this project)
- Leiningen (build tool for Clojure, hosted on GitHub)
- GDAL (translator and processing library for working with geospatial data formats)
- Plugins
Fire up your command line and:
git clone https://github.com/sritchie/forma-clj.git
cd forma-clj
Next, install Leiningen, the build tool for Clojure. These instructions are copied from the Leiningen README:
- Download this script, which is named lein.
- Place it on your path so that you can execute it. (I like to use ~/bin.)
- Set it to be executable. (chmod 755 ~/bin/lein)
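The three steps above can be sketched as a small shell helper. This is only an illustration, assuming the lein script has already been downloaded to a local path; `install_script` is a hypothetical name, not something Leiningen provides.

```shell
# Hypothetical helper: install an already-downloaded script into a bin directory.
install_script() {
    src="$1"                      # path to the downloaded script
    bin_dir="${2:-$HOME/bin}"     # target directory, defaults to ~/bin
    mkdir -p "$bin_dir" || return 1
    cp "$src" "$bin_dir/" || return 1
    chmod 755 "$bin_dir/$(basename "$src")" || return 1
    # Add the directory to PATH for this session if it is not already there.
    case ":$PATH:" in
        *":$bin_dir:"*) ;;
        *) PATH="$bin_dir:$PATH"; export PATH ;;
    esac
}
```

For example, `install_script ./lein` would copy a downloaded lein into ~/bin and make it executable. The PATH change only lasts for the current session, so add ~/bin to your shell profile to make it permanent.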
OK, so forma-clj requires GDAL 1.8.0's native Java bindings. GDAL (pronounced "guhdal") is a translator and processing library for working with geospatial data formats. The native bindings can be a bit of a pain to acquire, but they must be built for the system you plan on using.
If you're using Linux though, we made it easy!
- Download the native bindings
- Decompress them into a directory like /opt/linuxnative
export LD_LIBRARY_PATH=/opt/linuxnative
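The export above replaces whatever LD_LIBRARY_PATH already held. If you have other entries there, a variant along these lines keeps them. It is a sketch, not part of the project: `GDAL_NATIVE_DIR` is a made-up variable name, and /opt/linuxnative is just the example directory from the step above.

```shell
# Assumed location of the decompressed bindings; adjust to your directory.
GDAL_NATIVE_DIR="${GDAL_NATIVE_DIR:-/opt/linuxnative}"

# Prepend the directory, keeping any entries that were already present.
export LD_LIBRARY_PATH="$GDAL_NATIVE_DIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Warn (without failing) if the directory holds no shared libraries.
if ! ls "$GDAL_NATIVE_DIR"/*.so* >/dev/null 2>&1; then
    echo "warning: no shared libraries found in $GDAL_NATIVE_DIR" >&2
fi
```

Putting these lines in your shell profile saves you from re-exporting the variable in every new terminal.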
Finally, install the plugins using the lein command. This part's easy!
lein plugin install swank-clojure "1.4.0-SNAPSHOT"
lein plugin install lein-marginalia "0.6.1"
lein plugin install lein-midje "1.0.7"
And then, just run lein deps to download the dependencies, and run lein deps a second time to install them.
And you are DONE. As a sanity check, try compiling via lein compile.
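If the compile fails right away, a quick preflight can tell you whether a tool is simply missing from your path. This is a rough sketch; `missing_tools` is a hypothetical helper, not part of forma-clj or Leiningen.

```shell
# Hypothetical helper: print each named command that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

Running `missing_tools git lein` prints nothing when both commands are available, and prints the name of each one that is not.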
See the forma-deploy project.
For project task management, use the Pivotal Tracker.
;; TODO: Run the hansen and vcf special dataset stuff, for diff
;; between big-set and little-set.
;;
;; TODO: Run the ecoid special dataset stuff.
;;
;; TODO: Re-run all timeseries -- might have to jack up the open file
;; limit.
;;
;; TODO: Re-run forma for more countries!
;; :BGD :LAO :IDN :IND :MMR :MYS :PHL :THA :VNM :BOL :CHN :CIV
;; hadoop jar /home/danhammer/forma.jar
;; forma.hadoop.jobs.preprocess.PreprocessAscii "border"
;; /user/hadoop/border.txt s3n://pailbucket/rawstore/ "[11 10]" "[12
;; 11]" "[11 11]" "[30 7]" "[27 5]" "[28 6]" "[29 7]" "[27 6]" "[28
;; 7]" "[26 6]" "[27 7]" "[24 5]" "[25 6]" "[26 7]" "[24 6]" "[25 7]"
;; "[24 7]" "[25 8]" "[17 8]" "[11 9]" "[12 10]"
Added integration for booting spot EMR clusters, based on our usual configurations. I think these will work with GDAL as well. This is nice, as it'll give us cluster compute support, and bump the number of machines we can use way up.
# This needs Homebrew: http://mxcl.github.com/homebrew/
brew install cloc
# Source Lines of Code:
cloc src/ --force-lang="lisp",clj
# Test Lines of Code
cloc test/ --force-lang="lisp",clj
FORMA's making it happen in 2011. Clojure, Cascalog, Hadoop... What the hell? Head to the develop branch or the Project Wiki for more details.