Fight Internet censorship with user-generated content
Switch branches/tags
Nothing to show
Clone or download
Pull request Compare This branch is even with sburnett:master.
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.


Collage: Fight Internet Censorship with User-Generated Content

Collage is a new platform for constructing censorship-resistant applications. This document gives an overview of how to build new applications using Collage.

For general information on Collage, please see

What Collage Provides

Collage provides a framework for constructing censorship-resistant message channels, by storing data inside user-generated content. Some examples of applications you could build using Collage:

  • Publish daily news articles hidden inside photos on Flickr.
  • Send secret emails hidden inside your tweets on Twitter.
  • Distribute Tor bridge node addresses using blog posts.

What Collage Does Not Provide

Collage only provides tools that you can use to build these message channels. It doesn't by itself allow for communication. Each application using Collage must provide:

  • A set of user-generated content (e.g., photos on Flickr, videos on YouTube, etc.)
  • A mechanism for storing data inside this user-generated content. We provide some example techniques, e.g., for storing data inside JPEGs.
  • A set of tasks that upload and download content from user-generated content hosts

This may look like a lot of work. Luckily, we include examples of these components with Collage that you can modify for your application. Collage uses these components to construct a custom message channel for your application.

What's Included in This Source Tree

This is a brief summary of the code tree.

  • collage/: The core of Collage.
  • collage_apps/: A collection of sample Collage applications.
    • proxy/: An application for fetching news articles stored hidden in user-generated content.
      • taskmodules/: Pluggable modules for fetching vectors on content hosts.
    • vectors/: For embedding messages inside user-generated content vectors (e.g., JPEGs, tweets, etc.).
    • providers/: For obtaining sets of vectors, e.g., from a directory, from generous users, etc.
    • tasks/: For uploading and downloading vectors to and from user-generated content hosts.
  • collage_donation/: A system for accepting donations of user-generated content from end-users.
    • server/: The server backend for the donation system, which stores vectors while they are encoded with censored messages.
    • client/: The donation clients, which accept donations (e.g., photos) from generous users.
      • flickr_web_client/: A Web application that accepts photo donations from Flickr users.
      • tk_client/: A standalone GUI application that accepts photos from Flickr users.

Building Collage Message Channels

This section describes how to build a Collage message channel. You must specify three components:

  • A Vector, which represents user-generated content (e.g., photos, videos, etc).
  • A VectorProvider, which obtains Vectors from an external source.
  • A set of Tasks that upload and download Vectors to and from user-generated content hosts (e.g., Flickr, YouTube, etc.)

These components fit together as follows:

  • One user (the "sender") composes a message and stores it inside a set of Vectors he obtains from a VectorProvider.
  • The sender executes the send method of some Tasks to upload his Vectors to a user-generated content host.
  • Later, another user (the "receiver") executes the receive method of the same Tasks to download the sender's Vectors from the content host.
  • The receiver recovers the sender's message from the downloaded Vectors.

Vector, VectorProvider and Task are Python classes which I now describe in more detail. I encourage you to also look at the source code for these classes.


A Vector represents user-generated content, e.g., a single JPEG image. Aside from storing vector's data (e.g., an in-memory representation of the JPEG), the Vector must provide an encode method that hides a message inside the Vector (or throws an exception if that message will not fit). It must also provide a corresponding decode method to extract the hidden message.

Typically these encode and decode methods are implemented using an information hiding algorithm like steganography or watermarking. For example, we use OutGuess to store messages inside JPEG data.

Important: The encode method must store the message inside the vector without altering its visual appearence. Otherwise, it would be easy to detect uses of Collage and make it much easier to block.

  • Examples: collage_apps/vectors
  • What you must do: Provide a new Vector subclass, with encode, decode and is_encoded methods.
  • For more information: collage/


A VectorProvider obtains Vectors from somewhere. This can be from a directory on disk, a photo sharing program such as iPhoto, from an external source such as donations from Flickr users, or pretty much anywhere else.

Typically, when developing a Collage application you use DirectoryVectorProvider to fetch Vectors from a local repository of vectors. However, when you deploy your application in the real world, you will need to get your Vectors from somewhere else.

  • Examples: collage_apps/providers, specifically
  • What you must do: Provide a new VectorProvider subclass, with a _find_vector method.
  • For more information: collage/


A Task uploads and downloads Vectors from a user-generated content host, e.g., Flickr. Each Task performs a specific action on a content host that yield some Vectors, for example:

  • Search for "blue flowers" on Flickr and download the top 20 resulting photos.
  • Download the videos from user "JohnDoe"'s on YouTube.
  • Look at trending tweets on Twitter.

Each Task has two key methods: send and receive. send uploads a Vector to a user-generated content host, while receive downloads a set of Vectors from that content host, some of which might contain Collage data.

Important: For each Task, the Vectors uploaded by send must be accessible by receive. For example, if send uploads pictures of "pink dinosaurs" to Flickr, then receive should, e.g., perform a keyword search for "pink dinosaurs".

To construct these tasks, we recommend you use Selenium, a Web automation framework bundled with Collage. The example in collage_apps/proxy/taskmodules/ demonstrates how to do this.

  • Examples: collage_apps/tasks, collage_apps/proxy/taskmodules
  • What you must do: Provide new Task subclasses, with send, receive and can_embed methods.
  • For more information: collage/ (the Task class, near the bottom of the file.)