A repository for work on analyzing open source mailing lists.
Nothing much formalized, but here's what we've got:
This directory contains some mbox dumps from Mozilla mailing lists.
Mozilla has a lot of forums. Each is mirrored as a Mailman mailing list, a newsgroup, and a Google Group.
The dumps in this directory were retrieved using the nntp-pull
tool that's part of the [sinntp](http://manpages.ubuntu.com/manpages/lucid/man1/sinntp.1.html)
package.
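
As a rough illustration of how one of these mbox dumps could be read for analysis, here is a minimal sketch using Python's standard `mailbox` module. The function name `read_post_dates` and the file name in the usage comment are hypothetical and do not come from the scripts in this repository.

```python
import mailbox
from email.utils import parsedate_to_datetime


def read_post_dates(mbox_path):
    """Return one datetime per message that has a parseable Date header.

    Hypothetical helper for illustration; not part of this repository.
    """
    dates = []
    # create=False: fail instead of silently creating an empty mbox file.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            raw_date = message.get("Date")
            if raw_date is None:
                continue  # skip messages without a Date header
            try:
                dates.append(parsedate_to_datetime(str(raw_date)))
            except (TypeError, ValueError):
                continue  # skip malformed Date headers
    finally:
        box.close()
    return dates


# Usage (the file name is an assumption):
# dates = read_post_dates("some-list.mbox")
```

Messages with a missing or malformed `Date` header are skipped here; a real analysis might prefer to count or log them instead.
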
This directory has a couple of basic scripts I've been playing with or trying to port over.
- news-sources.perl
  A script from Mozilla's Gervase Markham that gets the subscription information from the list.
- Threader.java
  Jamie Zawinski's code for message threading, ported from C in 1997. See his description of the algorithm.
- thread.py
  An (incomplete!) port of the Threader algorithm to Python.
- timeseries.py
  Plots a histogram of the timing of posts to a mailing list.
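
To give a feel for the binning behind a histogram of post timing, here is a minimal sketch that counts posts per calendar month. It is not the code in timeseries.py: the function name `posts_per_month` and the choice of monthly buckets are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime


def posts_per_month(dates):
    """Count posts per (year, month) bucket, in chronological order.

    Hypothetical helper; monthly buckets are an assumption.
    """
    counts = Counter((d.year, d.month) for d in dates)
    return sorted(counts.items())


# Example with made-up post dates:
sample = [
    datetime(2014, 1, 5),
    datetime(2014, 1, 20),
    datetime(2014, 3, 1),
]
for (year, month), count in posts_per_month(sample):
    # Print a crude text bar per month; a plotting library could be used instead.
    print(f"{year}-{month:02d} {'#' * count}")
```

Months with no posts are simply absent from the result, so a plot built from it would need to fill those gaps with zeros.
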
Contains a couple of saved visualizations output by the analysis scripts. Currently, these are time series histograms of mailing lists.
Some attempts to use IPython notebooks for this work.
Some scripts, analysis, and charts from a related study using version control data.