Python library for reading and writing warc files

warc: Python library to work with WARC files

WARC (Web ARChive) is a file format for storing web crawls.

http://bibnum.bnf.fr/WARC/

The warc library makes it easy to work with WARC files. For example, to print the target URI and content length of every record in a file:

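import warc

# Open the archive and iterate over its records.
f = warc.open("test.warc")
for record in f:
    # Formatting with % keeps the output identical whether print is a statement or a function.
    print("%s %s" % (record['WARC-Target-URI'], record['Content-Length']))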
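
To illustrate what header fields such as WARC-Target-URI and Content-Length in the loop above refer to, here is a minimal sketch that parses one uncompressed record using only the standard library. It is not how the warc library is implemented: SAMPLE_RECORD is a hand-written, simplified record (a real record carries further mandatory fields such as WARC-Record-ID and WARC-Date), and parse_record is a hypothetical helper introduced only for this example.

```python
# Hypothetical, simplified WARC record written by hand for illustration.
# Layout: version line, named header fields, blank line, content block,
# then two CRLFs that terminate the record.
SAMPLE_RECORD = (
    b"WARC/1.0\r\n"
    b"WARC-Type: response\r\n"
    b"WARC-Target-URI: http://example.com/\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
    b"\r\n\r\n"
)


def parse_record(data):
    """Split one uncompressed record into (version, headers, payload).

    Hypothetical helper for this sketch; not part of the warc library.
    """
    # The first blank line separates the header from the content block.
    head, _, rest = data.partition(b"\r\n\r\n")
    lines = head.decode("utf-8").split("\r\n")
    version = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    # Content-Length gives the size of the content block in bytes.
    length = int(headers["Content-Length"])
    payload = rest[:length]
    return version, headers, payload


version, headers, payload = parse_record(SAMPLE_RECORD)
print(headers["WARC-Target-URI"], headers["Content-Length"])
```

The sketch ignores gzip compression, multi-line header values, and files containing more than one record, all of which a real reader has to handle.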

Documentation

The documentation of the warc library is available at http://warc.readthedocs.org/.

License

This software is licensed under the GPL v2. See the LICENSE file for details.
