Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP
Reimplementing TOSBack using git as a database layer!
C Shell Python Perl C++ JavaScript Other
Branch: master

README

This is TOSBack version 2, a clean redesign & reimplementation of EFF's
TOSBack project.

It uses Git as an inherently and efficiently versioned backend storage
database.

After cloning the git repository, you need to execute this command:

git submodule update --init --recursive

That will fetch a recent version of the GitPython code, which we depend upon.

*BUGS IN WGET*

If you want to actually run the crawler yourself (not really necessary unless
you're testing something), be aware that TOSBack2 also exposes a number of
bugs in common versions of wget.  As of December 2011, there are two bugs you
might need to patch yourself!

(FOR YOUR CONVENIENCE, a patched version of the wget source can be found in
lib/wget-1.13.4/ .  There is also a binary .deb that Debian and Ubuntu users
can try in lib/.  More hints on building from source below) 

1. Versions of wget built against
   gnutls may suffer from fatal memory leaks 
   https://lists.gnu.org/archive/html/bug-wget/2011-10/msg00050.html
   (so apply that patch, or build against openssl using ./configure --with-ssl=openssl).

2. You should also apply the following patch 
   https://savannah.gnu.org/support/download.php?file_id=24473
   to fix this bug: https://savannah.gnu.org/bugs/?21714

HINTS FOR BUILDING WGET FROM SOURCE ON DEBIAN OR UBUNTU

sudo apt-get build-dep wget
cd lib/wget-1.13.4/
fakeroot debian/rules binary
an installable .deb file *should* be written to the lib/ directory
Something went wrong with that request. Please try again.