Ohm: history miner.

Ohm is a repository miner that was developed for use in our work:

  • Modeling the Ownership of Source Code Topics
    C.S. Corley, E.A. Kammer, and N.A. Kraft
    IEEE International Conference on Program Comprehension (ICPC'12)
    [Tech report]

If you use Ohm in your own work, we would greatly appreciate it if you cited our original paper. Please send us an email if you do! We would love to read the work that uses our software.

Note: Ohm is a work in progress!

Software Requirements

Ohm currently requires the following:

  • Python 2.6
    • pysvn
    • psycopg2
    • ANTLR 3.1.2
  • PostgreSQL 8.4
  • Apache Ant

Configuration how-to

Ohm currently reads repository information from a configuration file located at src/python/ohm/ Projects are given as a namedtuple, with parameters: name, url, type, lexers, and parsers.

For example, our ArgoUML configuration resembles:

    Project(name='argouml'
    , url='svn://localhost/argouml/trunk'
    , type='svn'
    , lexers={'.java' : [
                        (13020, Java5Lexer)
                      , (8295, Java4Lexer)
                      , (0, JavaLexer)
                      ]}
    , parsers={'.java' : [
                         (0, JavaParser)
                         ]}
    )
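
The five parameters map naturally onto a namedtuple. The following is a minimal sketch, assuming a definition built with `collections.namedtuple`; it is not copied from Ohm's source, and the lexer and parser classes are empty placeholders standing in for the ANTLR-generated classes.

```python
from collections import namedtuple

# Field names follow the parameters listed in this README. The definition
# itself is an illustrative assumption, not Ohm's actual code.
Project = namedtuple('Project', ['name', 'url', 'type', 'lexers', 'parsers'])


# Placeholder classes standing in for the ANTLR-generated lexers and parsers.
class JavaLexer(object):
    pass


class Java4Lexer(object):
    pass


class Java5Lexer(object):
    pass


class JavaParser(object):
    pass


argouml = Project(
    name='argouml',
    url='svn://localhost/argouml/trunk',
    type='svn',
    lexers={'.java': [(13020, Java5Lexer), (8295, Java4Lexer), (0, JavaLexer)]},
    parsers={'.java': [(0, JavaParser)]},
)

# Fields are available by name, which keeps the configuration readable.
print('%s (%s) at %s' % (argouml.name, argouml.type, argouml.url))
```

Because a namedtuple is immutable, a project entry defined this way cannot be modified by accident while mining is in progress.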

Parameters of Project namedtuples


Name

string parameter

This parameter sets the project name that you pass with the -n option when you begin mining. In our example, we used 'argouml', so you would run python2 -n argouml.


URL

string parameter

This parameter gives the location of the repository. We highly recommend making a local copy of the repository you are using, as mining from public servers can be very slow and may get your IP blacklisted.


Type

string parameter

This parameter identifies what kind of repository is located at url. At the moment, Ohm can only access Subversion repositories.


Lexers

dict parameter

This parameter expects a dict whose keys are file extensions, given as strings in .ext format, and whose values are lists of tuples. Each tuple pairs a commit identifier with a lexer class. In our example, for .java files, we begin with JavaLexer at revision 0 (the default lexer), switch to Java4Lexer at revision 8295, and switch to Java5Lexer at revision 13020. These classes must appear as imports in the

Over time, a repository will begin using new language features that may require a new lexer or parser. You will have to manually identify the commit at which this switch begins.
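
To illustrate how a list of (revision, class) pairs can be read, here is a minimal sketch of one possible lookup. The helper `pick_for_revision` is hypothetical and not part of Ohm; it assumes the entry with the largest starting revision that does not exceed the revision being mined is the one that applies, and that the switch takes effect at exactly the listed revision. The classes are empty placeholders.

```python
# Placeholder classes standing in for the ANTLR-generated lexers.
class JavaLexer(object):
    pass


class Java4Lexer(object):
    pass


class Java5Lexer(object):
    pass


def pick_for_revision(candidates, revision):
    """Return the class whose starting revision is the largest one that is
    less than or equal to `revision`. Hypothetical helper, not Ohm's API."""
    best = None
    for start, cls in candidates:
        if start <= revision and (best is None or start > best[0]):
            best = (start, cls)
    if best is None:
        raise LookupError('no entry covers revision %d' % revision)
    return best[1]


java_lexers = [(13020, Java5Lexer), (8295, Java4Lexer), (0, JavaLexer)]

# Revisions before 8295 use the default lexer; later ones use newer lexers.
print(pick_for_revision(java_lexers, 100).__name__)
print(pick_for_revision(java_lexers, 9000).__name__)
print(pick_for_revision(java_lexers, 15000).__name__)
```

The lookup does not depend on the order of the list, so entries can be added as new switch points are identified without re-sorting the configuration.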


Parsers

dict parameter

This parameter expects a dict with the same kinds of keys and values as discussed above in Lexers, with parser classes in place of lexer classes.


Basic usage includes three steps:

  1. Begin the database building process:
    • python2 -n argouml -b
  2. Step away, and make yourself a sandwich. (Step 1 mines the entire repository for changes.)
  3. Once building is complete and you have had lunch, generate the ownership profiles:
    • python2 -n argouml -g

By default, your generated profiles will appear in /tmp/ohm/argouml-r#### where #### is the last commit mined.

Note: python2 -h contains full usage details. Some of these parameters are or were experimental and may be broken or of no practical use.


Please see the LICENSE file for further information.

Copyright (c) 2012 The Board of Trustees of The University of Alabama. All rights reserved.

