Analyze how a Git repo grows over time
Latest commit 56ef43e Feb 17, 2017 @ferologics ferologics committed with Add --upgrade for convenience (#31)

A set of scripts for analyzing Git repos. They produce cool-looking graphs like this one (generated by running them on git itself):

[Figure: example graph for the git repository]

How to run

  1. Run git clone https://github.com/erikbern/git-of-theseus and cd git-of-theseus
  2. Run virtualenv . and then . bin/activate (optional, only if you don't want to install the dependencies as root or in your local pip installation folder)
  3. Run pip install -r requirements.txt to install dependencies
  4. Run python analyze.py <path to repo> (see python analyze.py --help for a bunch of config)
  5. Run python stack_plot.py cohorts.json which will write to stack_plot.png
  6. Run python survival_plot.py survival.json which will write to survival_plot.png (see python survival_plot.py --help for some options)

If you want to plot multiple repositories, you have to run python analyze.py separately for each project and store the data in separate directories using the --outdir flag. Then you can run python survival_plot.py <foo/survival.json> <bar/survival.json> (optionally with the --exp-fit flag to fit an exponential decay).
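The multi-repository workflow can be scripted. The following is a minimal sketch, not part of the project: the helper names (analyze_command, survival_plot_command, run_all) are hypothetical, it assumes it is run from inside the git-of-theseus checkout, and it assumes --outdir takes the directory as a separate argument and that --exp-fit can be placed after the input files. By default it only prints the commands so you can check them first.

```python
import os
import subprocess
import sys


def analyze_command(repo_path, outdir):
    # Assumption: --outdir accepts the directory as the next argument.
    return [sys.executable, "analyze.py", repo_path, "--outdir", outdir]


def survival_plot_command(outdirs, exp_fit=False):
    # One survival.json per output directory, as described above.
    cmd = [sys.executable, "survival_plot.py"]
    cmd += [os.path.join(d, "survival.json") for d in outdirs]
    if exp_fit:
        # Assumption: the flag may follow the positional arguments.
        cmd.append("--exp-fit")
    return cmd


def run_all(repos, exp_fit=False, dry_run=True):
    """repos maps an output directory name to the path of a Git repo."""
    commands = [analyze_command(path, outdir) for outdir, path in repos.items()]
    commands.append(survival_plot_command(list(repos), exp_fit))
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))  # show the command without running it
        else:
            subprocess.run(cmd, check=True)  # stop on the first failure
    return commands
```

For example, run_all({"foo": "/path/to/foo", "bar": "/path/to/bar"}, exp_fit=True) prints three commands; pass dry_run=False to actually execute them.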

Help

If you see AttributeError: Unknown property labels, upgrade matplotlib: pip install matplotlib --upgrade

Some pics

Survival of a line of code in a set of interesting repos:

[Figure: survival curves for a set of repositories]

This curve is produced by the survival_plot.py script and shows the percentage of lines in a commit that are still present after x years. The curve is aggregated over all commits, regardless of when they were made. At x=0 every commit is included, whereas for x>0 only commits that are at least x years old can be counted (for newer ones we would have to look into the future). Because the set of commits changes with x, the aggregate percentage can occasionally go up.
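To see why the aggregate can rise, here is a toy calculation. It is only an illustration of the aggregation idea, not the code in survival_plot.py: the function name and the input layout (one list per commit, holding the number of surviving lines at year 0, 1, 2, ...) are assumptions made for this example.

```python
def survival_curve(commits):
    """Aggregate survival over commits.

    Each commit is a list: element x is the number of its lines still
    present x years later, so element 0 is the number of lines added.
    A commit with fewer elements simply has not been observed that long.
    """
    max_age = max(len(c) for c in commits)
    curve = []
    for x in range(max_age):
        # Only commits that are at least x years old can contribute.
        eligible = [c for c in commits if len(c) > x]
        added = sum(c[0] for c in eligible)
        alive = sum(c[x] for c in eligible)
        curve.append(alive / added)
    return curve


old_commit = [100, 50, 40]  # observed for two years
new_commit = [100, 10]      # observed for one year only
print(survival_curve([old_commit, new_commit]))
```

In this made-up data the value at x=1 is 60/200 and the value at x=2 is 40/100, so the curve rises once the short-lived newer commit drops out of the denominator.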

You can also add an exponential fit:

[Figure: survival curves with an exponential fit]
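As a rough illustration of what an exponential fit means here, the sketch below fits y = exp(-k * t) to survival fractions with a least-squares line through the origin in log space. This is one simple way to do it and is an assumption of this example; the source does not say which fitting method --exp-fit uses. The function names are hypothetical.

```python
import math


def fit_decay_rate(ages, fractions):
    """Estimate k in y = exp(-k * t) by least squares on ln(y) = -k * t."""
    num = 0.0
    den = 0.0
    for t, y in zip(ages, fractions):
        # Points at t = 0 carry no information about k; y <= 0 has no log.
        if t > 0 and y > 0:
            num += t * math.log(y)
            den += t * t
    if den == 0:
        raise ValueError("need at least one point with age > 0 and fraction > 0")
    return -num / den


def half_life(k):
    """Years until half of the lines are gone, under the fitted model."""
    return math.log(2) / k


ages = [0, 1, 2, 3]
fractions = [math.exp(-0.5 * t) for t in ages]  # synthetic data with k = 0.5
k = fit_decay_rate(ages, fractions)
print(k, half_life(k))
```

The half-life is a convenient single number for comparing repositories: a smaller value means code is rewritten faster.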

Linux – stack plot:

[Figure: stack plot for Linux]

This plot is produced by the stack_plot.py script and shows the total number of lines in a repo, broken down into cohorts by the year the code was added.
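The cohort breakdown can be pictured with a small sketch. It assumes a hypothetical in-memory input (for each snapshot of the repo, the year in which each surviving line was added) and does not reflect the actual layout of cohorts.json, which is not described here.

```python
from collections import Counter


def stack_series(snapshots):
    """Turn snapshots into one series per cohort.

    snapshots is a list of (label, line_years) pairs, where line_years
    holds the year each line present in that snapshot was added.
    Returns {cohort_year: [line count in each snapshot, in order]}.
    """
    counts = [Counter(years) for _, years in snapshots]
    cohorts = sorted(set().union(*counts))
    return {c: [cnt.get(c, 0) for cnt in counts] for c in cohorts}


snapshots = [
    ("2015-12", [2015, 2015, 2015]),
    ("2016-12", [2015, 2015, 2016, 2016, 2016, 2016]),
]
for cohort, series in stack_series(snapshots).items():
    print(cohort, series)
```

Stacking these series on top of each other gives the total line count at each snapshot, with each band showing how much of a given year's code is still around.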

Node – stack plot:

[Figure: stack plot for Node]

Rails – stack plot:

[Figure: stack plot for Rails]

Other stuff

Markovtsev Vadim implemented a very similar analysis that claims to be 20% to 6x faster than Git of Theseus. It's named Hercules, and there's a great blog post about all the complexity that goes into analyzing Git history.