OCaml may seem to be a weird choice. It's not a very popular language. Nevertheless, in the "static analysis world", it is one of the most popular languages. Researchers at Berkeley and Stanford use OCaml to analyze C code (e.g. Necula with CIL at Berkeley, Aiken with Saturn at Stanford), some type inferencers for Ruby are written in OCaml (DRuby), the Windows device verifier (based on the SLAM work by Thomas Ball at Microsoft Research) is written in OCaml, and people working on the Linux kernel use the Coccinelle tool, also written in OCaml, to perform complex refactorings and to find bugs. In fact, one of Microsoft's latest programming languages, F#, is directly inspired by (one could say copied from) OCaml.
OCaml is good for code analysis. Trust me.
You can use the code in lang_lisp/ (the simplest language) as a source of inspiration. Then do the following, in order:
1. a simple lexer (in lang_xxx/parsing/lexer_xxx.mll)
2. a -tokens_xxx command line action
3. a skeleton for -parse_xxx
4. an AST with just the tokens (in lang_xxx/parsing/ast_xxx.ml)
5. a token-based highlighter => instant gratification (in lang_xxx/analyze/highlight_xxx.ml and a few modifications in visual/parsing2.ml)
   - mostly keyword highlighting à la emacs
   - highlight entities (function definitions, globals)
6. a simple grammar and parser (in lang_xxx/parsing/parser_xxx.mly)
7. an AST-based highlighter => can convey more information (again in lang_xxx/analyze/highlight_xxx.ml)
8. a simple global analysis to compute the most important functions (in lang_xxx/analyze/database_light_xxx.ml)
9. a semantic-based highlighter => can convey even more information (in visual/analyze/highlight_xxx.ml)
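To give a feel for the first steps (a lexer, then a token-based highlighter), here is a minimal, self-contained sketch in plain OCaml. It is not the code from lang_lisp/: the token type, the category type, the keyword list, and the functions `tokens` and `highlight` are all hypothetical names invented for illustration, and the lexer is hand-written so the example runs without ocamllex, whereas the real lexer would live in a .mll file.

```ocaml
(* Hypothetical token type for a tiny Lisp-like language. *)
type token =
  | TOParen
  | TCParen
  | TIdent of string
  | TNumber of string
  | TSpace of string

(* Hypothetical highlight categories, in the spirit of emacs font-lock. *)
type category = Keyword | Entity | Number | Punctuation | Normal

(* Assumed keyword list, for illustration only. *)
let keywords = ["defun"; "let"; "if"; "lambda"]

let is_space c = c = ' ' || c = '\t' || c = '\n'
let is_digit c = c >= '0' && c <= '9'
let is_delim c = is_space c || c = '(' || c = ')'

(* True if every character of [w] is a digit. *)
let all_digits w =
  let ok = ref true in
  String.iter (fun c -> if not (is_digit c) then ok := false) w;
  !ok

(* Hand-written stand-in for an ocamllex lexer: string -> token list. *)
let tokens (s : string) : token list =
  let n = String.length s in
  (* Index of the first character at or after [i] that does not satisfy [p]. *)
  let rec span p i = if i < n && p s.[i] then span p (i + 1) else i in
  let rec loop i acc =
    if i >= n then List.rev acc
    else
      match s.[i] with
      | '(' -> loop (i + 1) (TOParen :: acc)
      | ')' -> loop (i + 1) (TCParen :: acc)
      | c when is_space c ->
        let j = span is_space i in
        loop j (TSpace (String.sub s i (j - i)) :: acc)
      | _ ->
        let j = span (fun c -> not (is_delim c)) i in
        let word = String.sub s i (j - i) in
        let tok = if all_digits word then TNumber word else TIdent word in
        loop j (tok :: acc)
  in
  loop 0 []

(* Token-based highlighter: keywords first, then entities. The identifier
   that follows "defun" is tagged as an Entity (a function definition). *)
let highlight (toks : token list) : (token * category) list =
  let rec loop after_defun = function
    | [] -> []
    | tok :: rest ->
      (match tok with
       | TSpace _ -> (tok, Normal) :: loop after_defun rest
       | TOParen | TCParen -> (tok, Punctuation) :: loop false rest
       | TNumber _ -> (tok, Number) :: loop false rest
       | TIdent id when List.mem id keywords ->
         (tok, Keyword) :: loop (id = "defun") rest
       | TIdent _ ->
         (tok, (if after_defun then Entity else Normal)) :: loop false rest)
  in
  loop false toks
```

Keeping whitespace as tokens (`TSpace`) is a deliberate choice in this sketch: a highlighter working only from tokens needs every character of the file to be accounted for, which is also why an "AST with just the tokens" is a reasonable intermediate step before writing a real grammar.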
Last edited by pad,