This repository contains the code for my undergraduate thesis at Wesleyan University, "A Model of Outbound Client Behavior on the Tor Anonymity Network." The abstract of the thesis is given below:
Tor is a popular low-latency anonymity system. It works by routing web traffic through a series of relays operated by volunteers across the globe. Academic research on Tor frequently involves proposing new attacks on the network, creating defenses against these attacks, and designing more efficient methods for routing traffic. For ethical and practical reasons, it is often necessary to perform this research on a simulated version of the live Tor network. ExperimenTor and Shadow are two simulation environments that realistically model many aspects of the extant network. Both of these environments, however, offer only crude models of the outgoing web traffic generated by actual Tor users. In this thesis, we seek to improve on these models by collecting traffic data from the live network, clustering it into groups with similar behavior, and then training a Hidden Markov Model on each cluster.
The code here was used to process, visualize, and model time series data collected from a live Tor router. Though the workflow is currently specific to my thesis, the code could be adapted into a general-purpose tool for exploring and modeling time series datasets.
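
To give a feel for the clustering step mentioned in the abstract, here is a minimal sketch that is not taken from the thesis code. It assumes each time series is summarized by two features (mean and standard deviation) and grouped with a plain k-means loop. The function names (`features`, `kmeans`, `cluster_series`), the feature choice, and the toy data are all hypothetical, chosen only for illustration. Fitting a Hidden Markov Model to each resulting cluster (for example with ghmm) is not shown.

```python
import math
import statistics


def features(series):
    """Summarize one series as (mean, population standard deviation)."""
    return (statistics.fmean(series), statistics.pstdev(series))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    # Assumption for reproducibility: seed the centroids with the first k
    # points. Real code would likely use a better initializer.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    statistics.fmean(dim) for dim in zip(*members)
                )
    return labels


def cluster_series(series_list, k):
    """Group the series by the k-means label of their feature vectors."""
    labels = kmeans([features(s) for s in series_list], k)
    clusters = {}
    for s, lab in zip(series_list, labels):
        clusters.setdefault(lab, []).append(s)
    return clusters


# Toy data (made up): two low-volume and two high-volume series.
series = [
    [1, 2, 1, 2],
    [100, 120, 90, 110],
    [2, 1, 2, 1],
    [95, 105, 115, 100],
]
clusters = cluster_series(series, 2)
for label, members in sorted(clusters.items()):
    print(label, members)
```

In a workflow like the one described here, each entry of `clusters` would then serve as the training set for one model, so that every model captures a single kind of client behavior.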
Dependencies, available via pip:
- scikit-learn 0.13.1+
- scipy 0.10+
- ghmm (http://ghmm.org)