codemap codegraph oud 2013 notes
-*- org -*-
pfff: PHP program analysis at FB
- deadcode reaper (integrated with code review process)
- use/def checker (undefined function/class/constant)
- variable checker (unused variables)
- sgrep lint (make it easy for people to write lint rules)
- tainting analysis via abstract interpreter (XSS, code injection)
#- type inference
- type checker daemon (CUFP talk julien)
- separation logic? (monoidics ocaml startup acquisition)
#- datalog interprocedural analysis?
tools to help understand large codebase
subtitle: codemap, codegraph, codequery
terrified when I joined: 5M lines of PHP, IMHO badly organized
visualization + program analysis, 30" screen
- huge challenge: understanding a large codebase written by other people. In fact the common solution for people is to start from scratch; there have to be tools to help
first thing I did: try to visualize the whole codebase. If you cannot visualize the mess, you cannot understand it
google maps for code
treemap, color = aspect, e.g. test code (=> visual clue of test coverage), core code
helps see huge subdirectories => I actually deleted 1M LOC at FB with that :)
## treemap is code oriented: filter .o, etc; also skip code
as you zoom in, render the content of the file
precise identifier highlighting, bad smells, tiling multi column (xmonad wm), use eyes to scroll
emacs integration
semantic feedback (bigger road = bigger functions, more important)
- pattern of purple => type defs. You do not even have to open the file to see an overview of its content
use your eyes to scroll, not keys
layers (age, number of authors, coverage, cyclomatic complexity, etc) layer nb users?
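A minimal sketch of the treemap idea, assuming the simplest "slice and dice" layout (codemap's actual layout algorithm is not described in these notes). The tree, file names and sizes are hypothetical; rectangle area is proportional to the size of each file.

```python
def size(node):
    """Total size of a node: a file is (name, loc), a directory is (name, [children])."""
    _name, payload = node
    if isinstance(payload, list):
        return sum(size(child) for child in payload)
    return payload


def layout(node, x, y, w, h, horizontal=True, out=None):
    """Slice-and-dice treemap: split the rectangle among children, alternating direction."""
    if out is None:
        out = {}
    name, payload = node
    out[name] = (x, y, w, h)  # sketch only: assumes names are unique in the tree
    if isinstance(payload, list):
        total = size(node)
        offset = 0.0
        for child in payload:
            frac = size(child) / total if total else 0.0
            if horizontal:
                layout(child, x + offset * w, y, frac * w, h, False, out)
            else:
                layout(child, x, y + offset * h, w, frac * h, True, out)
            offset += frac
    return out


# Hypothetical codebase: sizes are lines of code.
tree = ("www", [
    ("lib", [("core.php", 300), ("util.php", 100)]),
    ("tests", [("core_test.php", 400)]),
])
rects = layout(tree, 0.0, 0.0, 800.0, 400.0)
```

A color per rectangle (test code, core code, etc.) would then be one more lookup keyed on the file name.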
the more I was using it, the more I realized I wanted to understand the “software architecture”. focus not on source code, but on code relationships! what are the entry points? what is the core code? what is all the code depending on that? etc
package mode, external mode, module mode, gephi, flibotomy, lots of attempts. graph? I tried but it does not scale, and I need flexibility: visualize at a different granularity, with a different focus => DSM.
same entities on the left and on the top; a cell holds the number of times x uses y (call, import, etc), aggregated. hypertree
good structure = layers = empty upper right (enforced by the ocaml linker actually)
core code at the top, entry points at the bottom
unfold, reslice
see patterns more easily, visualize the mess (can’t fix what you can’t easily see)
usually backward deps => ugly hacks, things that need to be documented anyway
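A small sketch of the DSM described above, assuming rows are the users and columns are the used entities (the orientation is my assumption; it matches "empty upper right" with core code first). Module names and deps are hypothetical.

```python
def build_dsm(order, deps):
    """dsm[i][j] = number of times order[i] uses order[j] (call, import, etc)."""
    index = {name: i for i, name in enumerate(order)}
    n = len(order)
    dsm = [[0] * n for _ in range(n)]
    for user, used in deps:
        dsm[index[user]][index[used]] += 1
    return dsm


def backward_deps(dsm):
    """Sum of the upper right triangle: core code using less core code."""
    n = len(dsm)
    return sum(dsm[i][j] for i in range(n) for j in range(i + 1, n))


# Hypothetical modules, core code first, entry point last.
order = ["commons", "parsing", "analysis", "main"]
deps = [
    ("parsing", "commons"),
    ("analysis", "parsing"),
    ("analysis", "commons"),
    ("main", "analysis"),
    ("commons", "main"),  # backward dep: lands in the upper right
]
dsm = build_dsm(order, deps)
print(backward_deps(dsm))  # prints 1
```

A perfectly layered codebase gives 0 here; each non-zero upper right cell is a candidate ugly hack to look at.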
PLUG: NP problem reorganize minimal so that more layers
hard to see the value, but when I plan to change something, I look at the deps quickly; it helps evaluate the difficulty.
demo codemap + codegraph
more semantic feedback in codemap
layer bottomup (good macro level, good also micro level) layer nb users?
uses, users, file level (dead code)
uses, users, fine grained level, e.g. fields (when something is immediate (not running git grep), it is a game changer: can scroll a set of fields and immediately see if each is used or not, and where)
reslice => focus on current task
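Sketch of the file level uses/users idea, assuming the uses relation is already extracted as a map from a file to the files it uses (file names are hypothetical). Files with no users that are not entry points are dead code candidates.

```python
def invert(uses):
    """Turn a file -> used files map into a file -> users map."""
    users = {name: set() for name in uses}
    for src, targets in uses.items():
        for dst in targets:
            users.setdefault(dst, set()).add(src)
    return users


def dead_candidates(uses, entry_points):
    """Files nobody uses and that are not known entry points."""
    users = invert(uses)
    return sorted(
        name for name, who in users.items()
        if not who and name not in entry_points
    )


# Hypothetical file level deps.
uses = {
    "index.php": {"lib.php"},
    "lib.php": set(),
    "old.php": {"lib.php"},
}
print(dead_candidates(uses, entry_points={"index.php"}))  # prints ['old.php']
```

The same inversion works at the fine grained level by keying on fields or methods instead of files.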
other tool to help understand, navigate
- codequery, prolog on codebase (e.g. call(X, 'foo'), not(children(X, 'bar'))) (some nice queries by engineers, e.g. yesterday abe)
- stags (many people need that)
- sgrep/spatch (fuzzy level) also for sgreplint
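To make the codequery example concrete, a sketch of the same query over plain tuples. Assumptions: call(X, 'foo') means X calls foo, and children(X, 'bar') means X is a direct or transitive child of bar; the exact codequery predicates may differ. All facts are hypothetical.

```python
def children(extends):
    """Transitive closure of direct (child, parent) pairs."""
    closure = set(extends)
    changed = True
    while changed:
        changed = False
        for child, parent in list(closure):
            for child2, parent2 in list(closure):
                if parent == child2 and (child, parent2) not in closure:
                    closure.add((child, parent2))
                    changed = True
    return closure


def query(calls, extends, callee, ancestor):
    """call(X, callee), not(children(X, ancestor))"""
    descendants = {c for c, p in children(extends) if p == ancestor}
    return sorted(x for x, f in calls if f == callee and x not in descendants)


# Hypothetical facts.
calls = {("A", "foo"), ("B", "foo"), ("C", "baz")}
extends = {("A", "mid"), ("mid", "bar"), ("B", "qux")}
print(query(calls, extends, "foo", "bar"))  # prints ['B']
```

In Prolog the whole thing is the one line from the notes, which is the point of codequery.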
syncweb? it also helps for large codebase understanding in some sense
ocamltarzan?
PLUG: uncaught exceptions are a recurring pb, in cron especially, change something and boom, later have to capture it
ocaml side notes
side notes during the talk about ocaml, complaints or positive stuff:
- ocamltarzan, visitor, mapper, dumper (thx to 'a. )
- ocaml.ml generic dumper, gazagnaire
need ocamltarzan, -dump_xxx ast useful for beginners. need syncweb? :)
common thread: Huge codebase, terrified => set of tools to help. idea: intuition that maybe visual could help; huge screen, make use of it
google maps on code => treemap + code thumbnails
- when fully zoomed, column layout, maximize space
better than emacs, identifier coloring
syntactical use of refs := or <- in big and purple
light db => semantic info, important stuff
layers
codegraph, focus on deps, understand global orga, software architecture understand “layers”
QUESTION for audience: tool to help find better orga, NP complete probably, but heuristics? minimize elts in upper triangle, and property is hierarchical orga so operations are move parent, move children. monte-carlo?
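One possible heuristic for the question above, as a sketch only: plain local search that moves one element at a time while the number of upper triangle entries decreases. It ignores the hierarchy (no move parent / move children), so it is a flat simplification, not what codegraph does. For a flat ordering, minimizing backward deps is the minimum feedback arc set problem, which is NP-hard, so this can get stuck in a local optimum. Names are hypothetical.

```python
def count_backward(order, deps):
    """Number of deps landing in the upper triangle (user placed before used)."""
    pos = {name: i for i, name in enumerate(order)}
    return sum(1 for user, used in deps if pos[user] < pos[used])


def improve_order(order, deps):
    """Move single elements to a better position until no move helps."""
    order = list(order)
    best = count_backward(order, deps)
    improved = True
    while improved:
        improved = False
        for name in list(order):
            rest = [n for n in order if n != name]
            for i in range(len(rest) + 1):
                candidate = rest[:i] + [name] + rest[i:]
                score = count_backward(candidate, deps)
                if score < best:
                    order, best, improved = candidate, score, True
    return order, best


# Hypothetical modules, deliberately listed in the worst order.
start = ["main", "analysis", "parsing", "commons"]
deps = [
    ("parsing", "commons"),
    ("analysis", "parsing"),
    ("analysis", "commons"),
    ("main", "analysis"),
]
new_order, score = improve_order(start, deps)
```

A monte-carlo variant would accept some worsening moves at random to escape local optima.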
last iteration: codemap + codegraph integration
- layer bottom up, (todo special key to see what is the structure)
- file level deps, refocus treemap!
- fine grained level
bottom up layer is super nice.
- prolog queries bill, scale of code => need tools to automate. e.g. all children of a class implement I => can pull I up into the parent.
- sgrep, spatch
- precise tags
demo on ocaml source code, they are familiar with that.
4 years at FB
take what I presented at IRISA? Lessons learned :) Engler work … wait first need to detect easy bugs.
common thread: Huge codebase, terrified => set of tools to help. (reaper, t, coverage, lint, … codemap … codegraph)
stat #lines removed
stat #bugfixes diffs
stat #lint rules, sgrepLint
pfff_logger stats?
tags (stats? hsh on all servers and look if www has a symlink to TAGS?)
prolog
sgrep, spatch (git log and search for codemod/spatch ?)
codemap, codegraph
software architecture!!
failures: codemap because X11? => pfff-web. weird, but a small barrier and boom, they don't use it. also no action.
FBIDE focused on a few core things, that was useful, especially for beginners (search entity + string search, completion, goto defs, no config for good color)
failures? codegraph needs X11? not enough marketing? => pfff-web better? we have no software architect at FB.
success: prolog, spatch, tags
lessons: need to push for your ideas a lot. Cf. hack.
see google’s paper, similarity: cmf -n
skip_code => big improvement on my process. gradual fixing made tractable and reviewable.