Part of this project runs cg-llvm on the LLVM IR test code that comes with the LLVM library. In retrospect, cg-llvm was a complete waste of time: I spent it fiddling around with its inane source code, even though the LLVM library already comes with a parser for IR, so why rewrite it? On the plus side, I learned how to use packrat parsers to parse grammars.
The LLVM manual is in HTML. I used plump to extract the grammar rules for LLVM IR, since there was no standalone specification anywhere. I think this method of parsing the documentation can be generalized to other projects, for example parsing the FFmpeg documentation in order to generate C bindings. Normally one would use Clang or c2ffi, but what if they are not available? What happens when a project has regularly structured documentation and no other way to extract the information, so that you need to parse the documentation in order to create wrappers?
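
To make the idea concrete, here is a minimal sketch of pulling syntax rules out of HTML documentation. It is not the code used in this project: it is written in Python with the standard-library `html.parser` standing in for plump, and it assumes (hypothetically) that each rule lives in a `<pre>` element. The `SAMPLE` markup, the `PreBlockExtractor` class, and the `extract_rules` function are made up for illustration; a real manual would likely need extra filtering, such as only keeping blocks that follow a "Syntax:" heading.

```python
from html.parser import HTMLParser


class PreBlockExtractor(HTMLParser):
    """Collect the text content of every <pre> element (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &lt; into plain text.
        super().__init__(convert_charrefs=True)
        self.blocks = []    # finished <pre> bodies, in document order
        self._depth = 0     # greater than 0 while inside a <pre>
        self._buffer = []   # text chunks of the <pre> being read

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                self.blocks.append("".join(self._buffer).strip())
                self._buffer = []

    def handle_data(self, data):
        # Text inside inline tags nested in <pre> is collected as well.
        if self._depth > 0:
            self._buffer.append(data)


def extract_rules(html):
    """Return the text of each <pre> block found in the given HTML string."""
    parser = PreBlockExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


# Made-up fragment in the style of an instruction reference page.
SAMPLE = """
<h4>Syntax:</h4>
<pre>&lt;result&gt; = add &lt;ty&gt; &lt;op1&gt;, &lt;op2&gt;</pre>
<p>Some prose that should be ignored.</p>
<pre>ret &lt;type&gt; &lt;value&gt;
ret void</pre>
"""

rules = extract_rules(SAMPLE)
for rule in rules:
    print(rule)
    print("---")
```

The extracted strings are still raw text; turning them into grammar rules for a parser generator is a separate step that depends on how consistently the documentation is written.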