WebWar Archiver

WebWar is a proof-of-concept web archival tool that saves all data flowing through an HTTP proxy.

Right now I'm just cheap and using mitmproxy for the heavy lifting. B3

Usage

Archival

Edit mitm_archive_http.py and change the DB path to wherever you want the archive saved
Run mitmproxy with --script (short form: -s) set to the path of mitm_archive_http.py
Set up your web browser to use the proxy (probably 127.0.0.1 port 8080)
Browse around to save your shit :3
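For a rough idea of what an archiving script looks like, here is a minimal sketch of a mitmproxy script. It is not the contents of mitm_archive_http.py. mitmproxy calls a module-level response(flow) hook for each completed response; the CAPTURES list is a made-up in-memory stand-in for the real on-disk DB.

```python
import time

# Hypothetical in-memory stand-in for the on-disk DB (not what the real script does).
CAPTURES = []


def response(flow):
    """Event hook: mitmproxy calls this once per completed HTTP response."""
    CAPTURES.append({
        "url": flow.request.pretty_url,                 # full URL as requested
        "time": int(time.time()),                       # UNIX timestamp of the capture
        "body": flow.response.content or b"",           # content can be None
        "headers": dict(flow.response.headers.items()),
    })
```

Saved as, say, sketch.py (a hypothetical name), it would be loaded with mitmproxy -s sketch.py.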

Browsing

I wrote a really shitty browser that can read back the saved files. Edit the path in there to point to your DB and then run python ./netwar_browser.py :)

You can then visit sites like http://localhost:8000/https://furaffinity.net/user/knot126.

Note that you need to get the page name exactly right (e.g. www.furaffinity.net != furaffinity.net and example.com/ != example.com). I'll add some way to correct this later.

Format

The archive "database" is a simple content-addressed storage system, organized per domain, with a map.json file mapping URIs and time of archival to content and headers. Content files are named after the hex of their SHA-256 hash and stored in the domain folder, that is, alongside the map.json.
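As a sketch of how a capture could be written into this layout (not the actual code from mitm_archive_http.py): store_blob and save_capture are hypothetical names, the domain folder is assumed to be named after the URL's host, and serializing the headers as JSON before hashing is an assumption, since the format above only says that a hash of the headers is stored.

```python
import hashlib
import json
import time
from pathlib import Path
from urllib.parse import urlsplit


def store_blob(domain_dir: Path, data: bytes) -> str:
    """Write data to a file named after its SHA-256 hex digest; return the digest."""
    digest = hashlib.sha256(data).hexdigest()
    domain_dir.mkdir(parents=True, exist_ok=True)
    blob_path = domain_dir / digest
    if not blob_path.exists():  # identical content is only stored once
        blob_path.write_bytes(data)
    return digest


def save_capture(archive_root: Path, url: str, body: bytes, headers: dict) -> dict:
    """Store body and headers as blobs and append an entry to the domain's map.json."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"URL has no host: {url!r}")
    domain_dir = Path(archive_root) / host  # assumption: folder name == host

    # Assumption: headers are serialized as JSON before being hashed and stored.
    header_bytes = json.dumps(headers, sort_keys=True).encode("utf-8")

    entry = {
        "url": url,
        "time": int(time.time()),
        "content": store_blob(domain_dir, body),
        "headers": store_blob(domain_dir, header_bytes),
    }

    map_path = domain_dir / "map.json"
    captures = json.loads(map_path.read_text(encoding="utf-8")) if map_path.exists() else []
    captures.append(entry)
    map_path.write_text(json.dumps(captures, indent=2), encoding="utf-8")
    return entry
```

Because files are named by hash, saving the same response body twice adds a second map.json entry but no second content file.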

`map.json`

map.json is a simple array of objects with the following properties:

url: URL for this capture
time: UNIX timestamp of the capture
content: Hash of the saved contents
headers: Hash of the saved response headers (should be optional, but currently required by the browser)
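Reading a capture back is then a matter of finding the right map.json entry and opening the two files it points to. This is a minimal sketch, not the code in netwar_browser.py: load_latest_capture is a hypothetical name, the domain folder is assumed to be named after the URL's host, and the headers are returned as raw bytes because how they are serialized is not specified here.

```python
import json
from pathlib import Path
from urllib.parse import urlsplit


def load_latest_capture(archive_root, url):
    """Return (body, headers_blob) for the newest capture of exactly `url`, or None."""
    host = urlsplit(url).hostname
    if host is None:
        return None
    domain_dir = Path(archive_root) / host  # assumption: folder name == host
    map_path = domain_dir / "map.json"
    if not map_path.exists():
        return None

    captures = json.loads(map_path.read_text(encoding="utf-8"))
    # Exact string match: "https://example.com/" and "https://example.com"
    # are different keys, so only one of them will be found.
    matches = [c for c in captures if c["url"] == url]
    if not matches:
        return None

    newest = max(matches, key=lambda c: c["time"])
    body = (domain_dir / newest["content"]).read_bytes()
    headers_blob = (domain_dir / newest["headers"]).read_bytes()
    return body, headers_blob
```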

Example

<archive root>
	/www.furaffinity.net
		/f0e4c2f76c58916ec258f246851bea091d14d4247a2fc3e18694461b1816e13b
		/13954213a197701957f334ace6845c1ebcd0a329053c790a8b31c47bc18c83de
		/b0eb9b2e16cd79eb4471af9f7d34de90b69d79b5de4177604e0109f82a83bc54
		/ ...
		/map.json
	/example.com
		/a379a6f6eeafb9a55e378c118034e2751e682fab9f2d30ab13d2125586ce1947
		/0efb0ab6e3a4e54c1a3ed2633c8a542125a9945498ae491dfb5d15d9648342d1
		/map.json
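One nice side effect of naming files after their hash is that an archive laid out like this can be checked for corruption without any extra metadata. A minimal sketch, with find_corrupt_blobs being a hypothetical helper that is not part of the repo:

```python
import hashlib
from pathlib import Path


def find_corrupt_blobs(archive_root):
    """Return paths of content files whose name does not match their SHA-256 digest."""
    bad = []
    for domain_dir in sorted(Path(archive_root).iterdir()):
        if not domain_dir.is_dir():
            continue
        for blob in sorted(domain_dir.iterdir()):
            # map.json is the index, not a content-addressed file
            if blob.name == "map.json" or not blob.is_file():
                continue
            if hashlib.sha256(blob.read_bytes()).hexdigest() != blob.name:
                bad.append(blob)
    return bad
```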

Notes

For portability, archives can be compressed into a ZIP file. Domain folders should be stored directly at the root of the archive, and the resulting ZIP file should retain a .zip file extension.
One major pillar of this design is that most of the formats should be easy to understand and based on widely known standards, so that even if this spec document were lost, it would be easy to get content out of the archive files. After all, an archive is useless if it can't be understood!
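Packing an archive for transport could look something like the sketch below. pack_archive is a hypothetical helper, not something that ships with the repo. The only layout requirement it relies on is the one stated above: domain folders sit directly at the root of the ZIP.

```python
import zipfile
from pathlib import Path


def pack_archive(archive_root, zip_path):
    """Compress an archive folder into a ZIP with domain folders at the ZIP root."""
    archive_root = Path(archive_root)
    zip_path = Path(zip_path)
    if zip_path.suffix != ".zip":
        raise ValueError("archive file should keep a .zip extension")

    # Collect files first so the output ZIP is never picked up by accident
    # (zip_path should live outside archive_root anyway).
    files = [p for p in sorted(archive_root.rglob("*")) if p.is_file()]

    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            # Paths relative to the archive root put each domain folder
            # directly at the top level of the ZIP.
            zf.write(path, arcname=path.relative_to(archive_root).as_posix())
    return zip_path
```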

