Really hacky proof of concept http archival using mitmproxy

knot126/WebWar

WebWar Archiver

WebWar is a proof-of-concept web archival tool that saves all data flowing through an HTTP proxy.

Right now I'm just cheap and using mitmproxy for the heavy lifting. B3

Usage

Archival

  1. Edit mitm_archive_http.py and change the DB to the location where you want the archive saved
  2. Run mitmproxy with --script set to the path of mitm_archive_http.py
  3. Set up your web browser to use the proxy (probably 127.0.0.1 port 8080)
  4. Browse around to save your shit :3

Browsing

I wrote a really shitty browser that can read back the saved files. Edit the path in it to point to your DB, then run python ./netwar_browser.py :)

You can then visit sites like http://localhost:8000/https://furaffinity.net/user/knot126.

Note that you need to get the URL exactly right (e.g. www.furaffinity.net != furaffinity.net and example.com/ != example.com). There will be some way to correct this later.
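To make the exact-match rule concrete, here is a minimal sketch of how a replay URL is put together and why near-misses find nothing. It is not taken from netwar_browser.py: REPLAY_BASE, replay_url and find_exact are hypothetical names, and the plain string comparison is only an assumption based on the note above.

```python
# Assumed from the example URL above; the real port may differ.
REPLAY_BASE = "http://localhost:8000/"


def replay_url(original_url: str) -> str:
    """Build the local URL for an archived page by prefixing the browser address."""
    return REPLAY_BASE + original_url


def find_exact(captured_urls, requested_url):
    """Return the captured URLs that equal the request character for character."""
    return [url for url in captured_urls if url == requested_url]
```

With this kind of comparison, a capture saved as https://example.com/ is not found when asking for https://example.com, which matches the behaviour described above.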

Format

The archive "database" is a simple content-addressed storage system, organised per domain, with a map.json file mapping URIs and time of archival to content and headers. Content files are named after the hex of their SHA-256 hash and stored in the domain folder, that is, alongside the map.json.
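A rough sketch of what storing one file in this layout could look like. This is not the code from mitm_archive_http.py; store_blob and its parameters are hypothetical names, and skipping files that already exist is my own assumption about how deduplication might work.

```python
import hashlib
from pathlib import Path


def store_blob(archive_root, domain: str, data: bytes) -> str:
    """Save data under <archive_root>/<domain>/<sha256 hex> and return the hash."""
    digest = hashlib.sha256(data).hexdigest()
    domain_dir = Path(archive_root) / domain
    domain_dir.mkdir(parents=True, exist_ok=True)
    blob_path = domain_dir / digest
    # Identical content always hashes to the same name, so it is written only once.
    if not blob_path.exists():
        blob_path.write_bytes(data)
    return digest
```

Because the file name is derived from the content, saving the same response twice costs no extra disk space; only map.json needs a new entry.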

map.json

map.json is a simple array of objects with the following properties:

  • url: URL for this capture
  • time: UNIX timestamp of the capture
  • content: Hash of the saved contents
  • headers: Hash of the saved response headers (should be optional, but the browser currently requires it)
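As an illustration of the properties listed above, here is a small sketch that appends a capture to map.json and looks up the newest capture for a URL. record_capture and latest_capture are hypothetical helpers, not functions from this repo. Storing time as an integer and picking the newest match are assumptions; the format above only says "UNIX timestamp".

```python
import json
import time
from pathlib import Path


def record_capture(domain_dir, url, content_hash, headers_hash, when=None):
    """Append one capture object to <domain_dir>/map.json and return it."""
    map_path = Path(domain_dir) / "map.json"
    entries = json.loads(map_path.read_text()) if map_path.exists() else []
    entry = {
        "url": url,
        "time": int(time.time()) if when is None else when,  # assumed integer seconds
        "content": content_hash,
        "headers": headers_hash,
    }
    entries.append(entry)
    map_path.write_text(json.dumps(entries, indent=2))
    return entry


def latest_capture(domain_dir, url):
    """Return the newest capture whose url matches exactly, or None."""
    map_path = Path(domain_dir) / "map.json"
    if not map_path.exists():
        return None
    entries = json.loads(map_path.read_text())
    matches = [e for e in entries if e["url"] == url]
    return max(matches, key=lambda e: e["time"], default=None)
```

The content and headers values returned by latest_capture are the names of the files to read from the same domain folder.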

Example

<archive root>
	/www.furaffinity.net
		/f0e4c2f76c58916ec258f246851bea091d14d4247a2fc3e18694461b1816e13b
		/13954213a197701957f334ace6845c1ebcd0a329053c790a8b31c47bc18c83de
		/b0eb9b2e16cd79eb4471af9f7d34de90b69d79b5de4177604e0109f82a83bc54
		/ ...
		/map.json
	/example.com
		/a379a6f6eeafb9a55e378c118034e2751e682fab9f2d30ab13d2125586ce1947
		/0efb0ab6e3a4e54c1a3ed2633c8a542125a9945498ae491dfb5d15d9648342d1
		/map.json

Notes

  • For portability, archives can be compressed into a ZIP file. Domain folders should be stored directly at the root of the archive, and the resulting ZIP file should retain a .zip file extension.
  • One major pillar of this design is that most of the formats should be easy to understand and based on widely known standards, so that even if this spec document were lost, it would be easy to get content out of the archive files. After all, an archive is useless if it can't be understood!
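The ZIP note above could be followed with something like this sketch, which uses Python's standard zipfile module. pack_archive is a hypothetical helper, not part of this repo, and the choice of DEFLATE compression is an assumption.

```python
import zipfile
from pathlib import Path


def pack_archive(archive_root, zip_path):
    """Zip an archive so that domain folders sit directly at the ZIP root."""
    root = Path(archive_root)
    zip_path = Path(zip_path)
    if zip_path.suffix.lower() != ".zip":
        raise ValueError("archive file should keep a .zip extension")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            # Skip directories, and the output file itself if it lives inside root.
            if not path.is_file() or path.resolve() == zip_path.resolve():
                continue
            # Paths relative to the archive root keep <domain>/<file> at the top level.
            zf.write(path, arcname=path.relative_to(root).as_posix())
    return zip_path
```

Any standard unzip tool can then recover the domain folders and map.json files, which fits the goal of sticking to widely known formats.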
