Reimplementing TOSBack using git as a database layer!

This is TOSBack version 2, a clean redesign & reimplementation of EFF's
TOSBack project.

It uses Git as an inherently and efficiently versioned backend storage
database.

After cloning the git repository, you need to execute this command:

git submodule update --init --recursive

That will fetch a recent version of the GitPython code, which we depend upon.
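To confirm that the submodule step worked, the output of "git submodule status" can be inspected: git prefixes a submodule that has not been initialized with "-". The following is only a minimal sketch, not part of TOSBack2; the helper names uninitialized_submodules and check_repo are hypothetical, and it assumes Python 3.7+ with git on the PATH.

```python
import subprocess


def uninitialized_submodules(status_output):
    """Return paths that `git submodule status` marks as uninitialized.

    Git prefixes each status line with '-' when the submodule has not
    been initialized, '+' when the checked-out commit differs from the
    recorded one, and a space otherwise.
    """
    missing = []
    for line in status_output.splitlines():
        if line.startswith("-"):
            # Line format: "-<sha1> <path>"
            parts = line[1:].split()
            if len(parts) >= 2:
                missing.append(parts[1])
    return missing


def check_repo(repo_dir="."):
    """Hypothetical helper: run git in repo_dir and report missing submodules."""
    result = subprocess.run(
        ["git", "submodule", "status", "--recursive"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return uninitialized_submodules(result.stdout)
```

Calling check_repo() from the top of the clone should return an empty list once the submodule command above has been run. Paths containing spaces are not handled by this sketch.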

*BUGS IN WGET*

If you want to run the crawler yourself (not necessary unless you're testing
something), be aware that TOSBack2 exposes a number of bugs in common
versions of wget. As of December 2011, there are two bugs you might need to
patch yourself.

(For your convenience, a patched version of the wget source can be found in
lib/wget-1.13.4/ . There is also a binary .deb in lib/ that Debian and Ubuntu
users can try. More hints on building from source are below.)

1. Versions of wget built against gnutls may suffer from fatal memory leaks:
   https://lists.gnu.org/archive/html/bug-wget/2011-10/msg00050.html
   Either apply that patch, or build against openssl using
   ./configure --with-ssl=openssl

2. You should also apply the following patch 
   https://savannah.gnu.org/support/download.php?file_id=24473
   to fix this bug: https://savannah.gnu.org/bugs/?21714
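Before patching anything, it can help to know which TLS library the installed wget was built against, since the memory leak in item 1 concerns gnutls builds. "wget --version" prints a list of compile-time features; the sketch below looks for a gnutls or openssl entry in that text. It is a rough illustration only: the function names are hypothetical, the exact spelling of the feature entry ("+gnutls" or "+ssl/gnutls") is assumed to vary between wget versions, and a None result means the sketch could not tell.

```python
import re
import subprocess


def wget_tls_backend(version_output):
    """Guess the TLS library from the text printed by `wget --version`.

    Returns "gnutls", "openssl", or None if neither feature entry is
    found. The entry format is an assumption: both "+gnutls" and
    "+ssl/gnutls" style spellings are accepted.
    """
    match = re.search(r"\+(?:ssl/)?(gnutls|openssl)\b", version_output)
    return match.group(1) if match else None


def installed_wget_backend(wget="wget"):
    """Hypothetical helper: run the given wget binary and classify it."""
    result = subprocess.run(
        [wget, "--version"],
        capture_output=True,
        text=True,
        check=True,
    )
    return wget_tls_backend(result.stdout)
```

If installed_wget_backend() returns "gnutls", the leak described in item 1 may apply to that build.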

HINTS FOR BUILDING WGET FROM SOURCE ON DEBIAN OR UBUNTU

sudo apt-get build-dep wget
cd lib/wget-1.13.4/
fakeroot debian/rules binary

An installable .deb file *should* then be written to the lib/ directory.