Where's the Party?
Where's the Party is a scalable, censorship-resistant mirror network for the web. It aims to make mirroring more accessible for both clients and hosts, allowing several orders of magnitude greater participation. WTP is different from previous mirroring efforts because it works entirely with software users already have - no additional packages need to be installed for either clients or hosts.
Where's the Party enables the distribution of static HTML content to users on censored networks. It aims to be resistant against several forms of censorship: host/IP blocking, content-aware filtering (DPI) and legal takedowns (DMCA).
The extremely low rates of uptake for browser plugins and other installable software is well known. At the height of the Egypt crisis, usage of TOR skyrocketed - to just over 2000 users, out of a population of 83 million. Peer-to-peer solutions such as BitTorrent often fail the "grandmother test", restricting access to the technologically savvy. For clients, WTP leverages the familiar web browser that users already have installed. It will support all browsers back through IE6 - a particular benefit in China, where IE6 usage rates remain as high as 35%.
Working on the streisand.me mirror project, we observed similar problems for volunteers who wish to host a mirror. Apache configuration requires a fair degree of technological ability, and is simply not available for many popular hosting platforms (S3, Dropbox). The sheer size of content, often including videos, is an obstacle to hosting. For example, the HBGary leaks site was 9 GB in size. While not enormous, a tarball this large will take several hours to download and may exceed the capacity of many hosting plans. WTP makes mirroring easier by serving content from any directory on a dumb web host. An individual host may mirror only a portion of the total content.
Some features of an ideal mirroring system need to be sacrificed for greater accessibility. There is no support for dynamic content such as a blog or CMS (though this may be mitigated by static snapshots). WTP does not provide anonymity or privacy, leaving clients who surf mirrors vulnerable to detection. Such concerns are best left to special-purpose tools like TOR & VPNs.
WTP does not specify how to securely find an initial entry node. Rather, users find the network the same way as any other website - from a link, a URI written on a sidewalk, or a message from $DEITY (navigation between nodes is cryptographically verified). I call this approach a "rumor net" - as a user, you find the party via out-of-band channels. Once there, you can safely stay at the party and tell others where it is.
Finally WTP may result in a slightly slower browsing experience, as resources may need to be loaded multiple times. This is faster than completely unavailable, and therefore acceptable.
WTP is resistant to a variety of different threats. As an entirely open system, some attacks cannot be completely prevented. In these cases, WTP aims to raise the technical and economic costs of disabling the mirror network.
Censors may cause content to be removed from a host using the legal system, such as DMCA takedown notices. WTP makes such takedowns impossible or costly, by utilizing a large number of hosts in multiple legal jurisdictions (some of which may be immune to takedowns). Attempts at domain seizure face similar challenges, and are further complicated by the co-hosting of targeted and innocuous content.
Hostname & IP blocking
Network operators may block traffic to particular IP addresses or hostnames. WTP resist such coarse filtering with the use of a large number of hosts. The ease with which new mirrors may be added should help party operators stay ahead of censors. Standby mirrors may be pre-loaded with content but not published, ready to be brought online quickly if the need arises.
Network operators can block traffic using particular protocols (most commonly SSL, but also TOR & VPN). This form of coarse filtering has been observed in China and Iran. WTP only utilizes plain HTTP, making such efforts ineffective.
Deep packet inspection
Network operators have blocked traffic based on semantic content using deep packet inspection. Traffic can be blocked or monitored on the basis of particular keywords, such as "revolution". WTP obfuscates content through the use of encryption and filename mutation. All filenames are replaced with hashes. Each instance of a mirrored resource is encrypted with a different key, so that filtering on the basis of ciphertext is no longer possible. In the unlikely (expensive) event that HTTP traffic is monitored for anything that looks like ciphertext, steganography could be used to further hide encrypted content.
Network operators could attempt to hijack a browser by injecting false payloads into the data stream (changing content) or by redirecting a session to adversary-controlled hosts. All resources used by WTP are cryptographically signed, making this impossible.
Since WTP is an open source protocol and software, attackers can easily learn how the system operates. This knowledge can be used to block traffic based on WTP's unique signature. However, such efforts would require replicating the full WTP protocol. Doing so would be extremely expensive, both technically and economically, as the protocol is both stateful and uses multiple levels of encryption. It is extremely unlikely that an adversary would apply such analysis to all HTTP traffic on their network.
Honeypots and Monitoring
Adversaries may run mirrors as honeypots to gather information on visitors. Similarly, mirrors which are detected can be monitored rather than blocked. WTP does not address this problem directly. Party creators can choose to only host content with trusted volunteers. Users will also be notified about the possibility of monitoring.
How it Works
The content of a website is copied across the mirror network. An individual node may host only a fraction of the total pages; some resources, such as CSS or JS may be present on all mirrors. Each mirror has a list of the root URIs for some (not all) of the other nodes, and the public half of a keypair (the "verification keys"). A cryptographic signature is stored next to each resource (index.html.sig).
Several errors are possible:
- the resource may not exist on the current server (404)
- the current server timeouts. This can occur if the user leaves a browser window open and the node is taken down or blocked.
- the verification signature is invalid
For (3), the user is alerted via popup, and given the option to load the resource from the current host or from a different node. XXX user choice here is lame
Images and Binary Resources
Images and other binary resources, including PDFs, videos, etc. pose a challenge. Signature checking code cannot be executed by such resources. To compensate, binary data may be embedded directly in HTML using data: URIs or MHTML for older versions of Internet Explorer. Further investigation is needed to determine if these methods can be used for all binary formats, such as video and audio.
Alternately, PDFs could be converted to HTML using pdftohtml. Large files such as video pose a particular challenge, as the entire content must be loaded into memory to perform signature checks.
An inner layer of encryption uses an unique keypair (the "instance keys") for each instance of a document on a mirror; no two copies of a resource have the same instance key. This guarantees that the ciphertext sent over the wire by a particular mirror for a given resource are different than those sent by any other mirror. The private instance key is prefixed to the ciphertext.
An outer layer of encryption uses a unique keypair (the "resource keys") for each document. The private key is appended to the anchor (hash) of URIs referring to the resource. It is transmitted in documents that link to the resource, but not with the resource itself. As anchors are not transmitted by browsers in HTTP requests, this outer encryption further complicates filtering. Censors can no longer examine HTTP requests in isolation to detect WTP traffic, as would be the case if only the inner encryption is used. Rather, they must run a complete, stateful implementation of WTP.
Note these techniques provide only obfuscation, not security (as publicly-accessible mirrors have the private keys). It may be possible to detect the presence of ciphertext sent over HTTP (by looking for a high degree of randomness); steganography could be employed in this case.
Proof of Authorship
A collapsible sidebar or dropdown widget can be optionally added to each page, with an explanation of the WTP technology, information about the party creator and keypairs in use, how to volunteer to host a mirror, etc..
To protect content authors, WTP can optionally purge identifying metadata from content (EXIF, PDF author, etc.).
As the website is highly likely to be blocked, its use is entirely optional. However, as content creators need access to BitTorrent, not the site itself, this problem is somewhat mitigated.
By signing the content tarball using author keys (described in Proof of Authorship), the party creator gains the ability to update content in the future. To update a party, the author creates an update tarball with new/changed files and a manifest of deletions. This file is signed using the author private key, and the tarball and signature are served through BitTorrent as described above. mirrorparty.org can download this new torrent, verify the signature and update the mirrors as necessary. Note that the public author key can be included in the torrent and need not be uploaded to an external keyserver.
Several difficulties arise from a fully-automated mirroring system. There may be more content than hosting space available. Some content may expose mirror owners to local legal or political liability. The existence of free storage is an attractive target for spammers and trolls.
These problems can be mitigated with the use of collaborative decision making systems (a la Reddit). A small subset of content from a potential party will be unpacked and served to browsers (either by direct hosting or on nodes willing to host unreviewed content). Users can help provide a brief description and other metadata (political relevance, legal risks), as well as flag potential parties as spam or inappropriate. They will be able to vote on whether that content should be mirrored on mirrorparty.org. Additional weight will be given to the votes of users who:
- provide more mirror space (logarithmic, so that small mirrors are not overwhelmed)
- have a longer history of mirroring (again logarithmic, so that new users are not automatically outvoted)
- mirror content on under served countries, languages and topics
- mirror under-replicated content (see below)
The actual content mirrored on a particular node is left up to that node's owner. Volunteers may allocate space to parties selected by the community, subject to constraints they specify (i.e., "exclude content that is legally risky in my jurisdiction"). Alternately, they may prefer individual parties, authors, topics or countries. Extra voting weight will be given to volunteers who mirror scarce (i.e., under-replicated) content.
System administrators may set reasonable limits on the number of mirrors for popular parties. For example, the world probably doesn't need any more WikiLeaks mirrors at present.
Other Content & Services
mirrorparty.org will provide a list of known parties, instructions on how to use the software and links to information about communications safety. It could run a spider as described in Health Checks and use the reports to improve the redundancy of the networks it manages. Note that mirrorparty.org will not host parties itself, as this would significantly increase its exposure to legal and technological threats.
For mirrorparty.org, the main site could be written in Django or another of the many Python web frameworks. Screen scrapers for The Pirate Bay would be written with standard library modules, lxml or scrapy. The original BitTorrent client could be used for downloads. For uploading to mirrors, there is ftplib for FTP, paramiko for ssh/sftp, pysync for rsync. (several alternatives available for all of these, including wrappers around commandline utilities). scipy and NLTK can be used for automated language and topic identification, and spam filtering. Google Translate links will be present on sample pages.
- Is there a better domain than wherestheparty.net? All the good ones are taken.
- Are there other ways of getting content into mirrorparty.org? Searching for tags/links/named files on Google, file hosting services or links on pastebins perhaps?
- Elliptic curve DSA would be preferable to RSA, but SJCL doesn't currently support it.
- mirrorparty.org could generate tarballs on demand for users who do not want to supply login credentials. This makes updating their mirror lists more difficult, but maybe a small mirror-list-update script could be provided.
- Should mirrorparty.org have a keypair so that tarballs can be transmitted to it securely? Motivation is to prevent content filtering on upload to a file hosting site.
- Things may simplified by using a single entry point URI on each mirror and referencing individual documents using anchors (a la Gmail or Twitter). Mirror hopping (switching between entry points on several mirrors) is desirable here, as the browser's back button can be used to find a working mirror if the current one goes down. May also help with memory consumption/leak issues.