Web Archiving Community

Nick Sweeting edited this page Feb 13, 2019 · 32 revisions

▶️ Want to learn more about why Web Archiving is important? Check out this article.

Start with the master list: the Awesome Web Archiving List.

Community Forums

Related Projects

From the Archive.org & Archive-It teams

  • Archive.org The O.G. Wayback Machine provided publicly by the Internet Archive (Archive.org)
  • Archive-It commercial Wayback Machine solution
  • Brozzler headless Chrome crawler + WARC archiver maintained by Archive.org
  • WarcProx WARC proxy recording and playback utility
  • WarcTools utilities for dealing with WARCs

From the Webrecorder team

  • Webrecorder.io An open-source personal archiving server that uses pywb under the hood
  • pywb The Python Wayback Machine, the codebase forked off Archive.org that powers Webrecorder
  • ipwb A distributed web archiving solution using pywb with IPFS for storage
  • warcit Create a WARC file out of a folder full of assets

Bookmarking + Archiving

  • Pocket Premium Bookmarking tool that provides an archiving service in its paid version, run by Mozilla
  • Pinboard Bookmarking tool that provides archiving in a paid version, run by a single independent developer
  • Wallabag / Wallabag.it Self-hostable web archiving server that can import via RSS
  • Shaarli Self-hostable bookmark tagging, archiving, and sharing service

Other Public Archiving Services

ArchiveBox alternatives

  • Polarized a desktop application for bookmarking, annotating, and archiving articles offline
  • Hypothes.is a web/PDF/ebook annotation tool that also archives content
  • Reminiscence extremely similar to ArchiveBox, uses a Django backend + UI and provides auto-tagging and summary features with NLTK
  • Shaarchiver very similar project that archives Firefox, Shaarli, or Delicious bookmarks and all linked media, generating a Markdown/HTML index
  • ReadableWebProxy A proxying archiver that downloads content from sites and can snapshot multiple versions of sites over time
  • Memex by Worldbrain.io a browser extension that saves all your history and does full-text search
  • Perkeep "Perkeep lets you permanently keep your stuff, for life."
  • Fetching.io A personal search engine/archiver that lets you search through all archived websites that you've bookmarked
  • Fossilo A commercial archiving solution that appears to be very similar to ArchiveBox
  • Archivematica web GUI for institutional long-term archiving of web and other content
  • Headless Chrome Crawler distributed web crawler built on Puppeteer with screenshots
  • ZeroNet a replacement P2P internet powered by Namecoin and a cryptocurrency for hosting
  • WWWOFFLE old proxying recorder software similar to ArchiveBox
  • Erised Super simple CLI utility to bookmark and archive webpages
  • Zotero collect, organize, cite, and share research (mainly for technical/scientific papers & citations)

Smaller Utilities

Reading List

If any of these links are dead, you can find an archived version on https://archive.sweeting.me.

ArchiveBox in the News
