Replicating HTTP Proxy Server
Python Shell
Latest commit 46167f7 Dec 28, 2013 Gertjan van Zwieten merge release notes with changelog

HTTP Replicator

HTTP Replicator is a general-purpose caching proxy server written in Python. It reduces bandwidth by merging concurrent downloads and building a local 'replicated' file hierarchy, similar to wget -r. Access to the cache through a web interface is planned but currently unsupported.

The following example session demonstrates basic usage.

~$ mkdir /tmp/cache
~$ http-replicator -r /tmp/cache -p 8888 --daemon /tmp/replicator.log
[process id]
~$ http_proxy=localhost:8888 wget http://www.python.org/index.html
100%[====================================>] 15,978
~$ find /tmp/cache
/tmp/cache
/tmp/cache/www.python.org:80
/tmp/cache/www.python.org:80/index.html
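
The same request can be made from a Python client instead of wget. The following is a minimal sketch using only the standard library; it assumes a replicator is already listening on localhost:8888 as in the session above, and the helper names make_proxy_opener and fetch_via_proxy are illustrative, not part of HTTP Replicator.

```python
import urllib.request


def make_proxy_opener(proxy="http://localhost:8888"):
    """Build an opener that sends plain-http requests through the proxy.

    The proxy address is an assumption taken from the example session.
    """
    handler = urllib.request.ProxyHandler({"http": proxy})
    return urllib.request.build_opener(handler)


def fetch_via_proxy(url, proxy="http://localhost:8888", timeout=15):
    """Download url through the proxy and return the response body as bytes."""
    opener = make_proxy_opener(proxy)
    with opener.open(url, timeout=timeout) as response:
        return response.read()


# Example call (needs a running replicator and network access):
# body = fetch_via_proxy("http://www.python.org/index.html")
```

Passing the proxy explicitly to ProxyHandler keeps the example independent of the http_proxy environment variable.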

Replicator has reasonable defaults for all its settings, so it can be run without command line arguments. In that case it listens on port 8080, does not detach from the terminal, and uses the current directory as the cache root. Files are cached under a top-level directory named host:port, where port defaults to 80 for http and 21 for ftp, followed by the path taken from the URL. The following arguments change this default behaviour:
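
The cache layout described above can be illustrated with a short Python sketch. This is not the replicator's own code: the function cache_path and the DEFAULT_PORTS table are hypothetical names, and how the real server treats query strings or URLs without a file name is not covered here.

```python
import posixpath
from urllib.parse import urlsplit

# Default ports per scheme, as stated in the text above.
DEFAULT_PORTS = {"http": 80, "ftp": 21}


def cache_path(root, url, flat=False):
    """Map a URL to a file location below the cache root (illustrative only)."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    path = parts.path.lstrip("/")
    if flat:
        # Assumed reading of flat mode: keep only the file name,
        # so files from different hosts can collide.
        return posixpath.join(root, posixpath.basename(path))
    return posixpath.join(root, f"{parts.hostname}:{port}", path)


print(cache_path("/tmp/cache", "http://www.python.org/index.html"))
# /tmp/cache/www.python.org:80/index.html
```

The printed path matches the file found in the example session.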

-h --help
 Show this help message and exit.

-p --port PORT
 Listen on this port for incoming connections, default 8080.

-r --root DIR
 Set cache root directory, default current.

-v --verbose
 Show HTTP headers and other info.

-t --timeout SEC
 Break connection after so many seconds of inactivity, default 15.

-6 --ipv6
 Try IPv6 addresses if available.

--flat
 Flat mode; cache all files in root directory (dangerous!).

--static
 Static mode; assume files never change.

--offline
 Offline mode; never connect to server.

--limit RATE
 Limit download rate to a fixed number of K/s.

--daemon LOG
 Route output to log and detach.

--debug
 Switch from gather to debug output module.
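
For readers who want to wrap or script the command line, the options above can be mirrored with Python's argparse. This is a sketch under stated assumptions, not the parser used by http-replicator itself: the function name build_parser, the numeric types chosen for SEC and RATE, and the use of "." for the current directory are all illustrative choices.

```python
import argparse


def build_parser():
    """Return a parser mirroring the documented options (illustrative only)."""
    # argparse adds -h/--help on its own.
    p = argparse.ArgumentParser(prog="http-replicator")
    p.add_argument("-p", "--port", type=int, default=8080,
                   help="listen on this port for incoming connections")
    p.add_argument("-r", "--root", default=".",
                   help="cache root directory")
    p.add_argument("-v", "--verbose", action="store_true",
                   help="show HTTP headers and other info")
    p.add_argument("-t", "--timeout", type=int, default=15, metavar="SEC",
                   help="seconds of inactivity before breaking a connection")
    p.add_argument("-6", "--ipv6", action="store_true",
                   help="try IPv6 addresses if available")
    p.add_argument("--flat", action="store_true",
                   help="cache all files in the root directory")
    p.add_argument("--static", action="store_true",
                   help="assume files never change")
    p.add_argument("--offline", action="store_true",
                   help="never connect to a server")
    p.add_argument("--limit", type=float, metavar="RATE",
                   help="download rate limit in K/s (float is an assumption)")
    p.add_argument("--daemon", metavar="LOG",
                   help="route output to this log file and detach")
    p.add_argument("--debug", action="store_true",
                   help="switch to the debug output module")
    return p


# The arguments from the example session:
args = build_parser().parse_args(
    ["-r", "/tmp/cache", "-p", "8888", "--daemon", "/tmp/replicator.log"])
print(args.root, args.port, args.daemon)
```

Options that are not given keep the documented defaults, for example port 8080 and a timeout of 15 seconds.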