A Python library for dealing with Web ARChive (WARC) files.

odie5533/pylibwarc

pylibwarc

pylibwarc is a Python library for dealing with Web ARChive (WARC) files. It provides a WARC reader, a CDX reader, and a WARC-to-CDX converter. pylibwarc requires the Twisted networking library and the Python dateutils library.

WARC to CDX

To create a CDX index file from a WARC file, use:

python warctocdx.py [-c <output.cdx.gz>] <warc file>
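
To illustrate what a WARC-to-CDX conversion does conceptually, here is a minimal standalone sketch in plain Python. It does not use pylibwarc and is not how warctocdx.py is implemented. The function names (iter_warc_records, index_lines) are hypothetical, the input is assumed to be an uncompressed WARC stream, and the output is a simplified "URL timestamp offset" line rather than a full CDX record.

```python
import io


def iter_warc_records(stream):
    """Yield (offset, headers) for each record in an uncompressed WARC stream.

    Hypothetical helper for illustration; not part of pylibwarc.
    """
    while True:
        offset = stream.tell()
        version = stream.readline()
        if not version:
            return  # end of stream
        if not version.strip():
            continue  # blank separator lines between records
        if not version.startswith(b"WARC/"):
            raise ValueError("not a WARC record at offset %d" % offset)
        headers = {}
        while True:
            line = stream.readline()
            if not line.strip():
                break  # blank line ends the header block
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip().lower()] = value.strip()
        # Skip the record body; Content-Length is a mandatory WARC header.
        stream.seek(int(headers["content-length"]), io.SEEK_CUR)
        yield offset, headers


def index_lines(stream):
    """Yield one simplified index line per 'response' record."""
    for offset, headers in iter_warc_records(stream):
        if headers.get("warc-type") != "response":
            continue
        # WARC-Date is ISO 8601; keep only its digits as a 14-digit timestamp.
        digits = "".join(ch for ch in headers["warc-date"] if ch.isdigit())
        yield "%s %s %d" % (headers["warc-target-uri"], digits[:14], offset)


# A tiny in-memory WARC file with a single made-up response record.
body = b"HTTP/1.1 200 OK\r\n\r\nhello"
sample = b"".join([
    b"WARC/1.0\r\n",
    b"WARC-Type: response\r\n",
    b"WARC-Target-URI: http://example.com/\r\n",
    b"WARC-Date: 2012-01-01T12:00:00Z\r\n",
    b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n",
    b"\r\n",
    body,
    b"\r\n\r\n",
])

for line in index_lines(io.BytesIO(sample)):
    print(line)
```

Each index line records where a capture starts in the archive, so a reader can seek straight to it instead of scanning the whole file. Real CDX files carry more fields, and for gzipped WARC files the offsets typically refer to positions in the compressed file, which this sketch does not handle.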
