A social media open post web archiving tool

Munin - a Facebook and Instagram indexer and archiver

This tool monitors open Facebook and Instagram account seeds for new posts and archives the posts that are available on the open web. Posts are archived in the WARC file format using the excellent Squidwarc package. A playback tool and a simple dashboard are available to monitor collections.

Munin dashboard screenshot

System overview

Munin builds on great software by other people. Indexing of post items is done with snscrape. Archiving of individual pages is done with Squidwarc. Playback of WARC files is enabled by pywb.

System overview - a Django application manages seeds and post URLs in a PostgreSQL database. A queue for indexing finds more post URLs for the seeds. A queue for archiving makes sure post URLs are archived.
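The flow in the overview can be sketched in a few lines of Python. This is only an illustration of the two-queue idea, not Munin's actual code: `Store`, `run_indexing`, `run_archiving` and `fake_find_posts` are hypothetical names, the in-memory `Store` stands in for the PostgreSQL database, and the two callables stand in for snscrape and Squidwarc.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Store:
    """Stand-in for the database: seeds plus the post URLs found for them."""
    seeds: list
    post_urls: dict = field(default_factory=dict)  # post URL -> archived yet?


def run_indexing(store, find_posts, archive_queue):
    """Indexing step: look up each seed's posts and queue the ones not seen before."""
    for seed in store.seeds:
        for url in find_posts(seed):
            if url not in store.post_urls:
                store.post_urls[url] = False
                archive_queue.append(url)


def run_archiving(store, archive_page, archive_queue):
    """Archiving step: drain the queue and mark each URL once it has been archived."""
    while archive_queue:
        url = archive_queue.popleft()
        archive_page(url)
        store.post_urls[url] = True


# Hypothetical stand-in for the indexer; the post URLs are made up.
def fake_find_posts(seed):
    return [f"{seed}p/1/", f"{seed}p/2/"]


archived = []  # the stand-in "archiver" just records the URLs it was given
store = Store(seeds=["https://www.instagram.com/visit_berlin/"])
archive_queue = deque()

run_indexing(store, fake_find_posts, archive_queue)
run_indexing(store, fake_find_posts, archive_queue)  # second pass queues nothing new
run_archiving(store, archived.append, archive_queue)
```

Keeping the archived flag next to each post URL is what lets the indexing step run repeatedly without queueing the same post twice.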

Install

Create an empty data directory for PostgreSQL called data.

$ mkdir data

Copy example_env_file to env_file and update it with your settings.
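If you prefer to script this step, a minimal sketch could look like the following. The helper name `prepare_env_file` is hypothetical, and the sketch assumes the file uses the usual `KEY=VALUE` lines with `#` comments; it does not know which variables Munin actually expects, it only lists the names it finds so you can review them.

```python
import shutil
from pathlib import Path


def prepare_env_file(directory="."):
    """Copy example_env_file to env_file unless env_file already exists.

    Returns the variable names found in env_file so they can be reviewed.
    """
    base = Path(directory)
    target = base / "env_file"
    if not target.exists():
        shutil.copyfile(base / "example_env_file", target)

    names = []
    for line in target.read_text().splitlines():
        line = line.strip()
        # Skip blank lines and comments; keep the part before the first "=".
        if line and not line.startswith("#") and "=" in line:
            names.append(line.split("=", 1)[0].strip())
    return names
```

Calling `prepare_env_file()` from the repository root leaves an existing env_file untouched, so it is safe to run more than once.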

Start everything:

$ docker-compose up -d

Set up a superuser:

$ docker-compose exec web python manage.py createsuperuser

Log in to the admin dashboard with the newly created superuser at http://0.0.0.0:4444/admin

Start by adding your first Collection item in the admin interface. Then add one or more seed URLs to the collection (e.g. https://www.instagram.com/visit_berlin/). You can bulk add multiple seeds (one per line) from the dashboard.
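Before pasting a long seed list, it can help to clean it up first. The sketch below is not part of Munin: `parse_seed_lines` and `ALLOWED_HOSTS` are hypothetical, the host list is an assumption about what counts as a valid seed, and the Facebook URL is a made-up example. It splits the text on lines, drops blanks and duplicates, and sets aside anything that is not an http(s) URL on an allowed host.

```python
from urllib.parse import urlparse

# Assumption: only these hosts count as valid seeds in this sketch.
ALLOWED_HOSTS = {
    "facebook.com",
    "www.facebook.com",
    "instagram.com",
    "www.instagram.com",
}


def parse_seed_lines(text):
    """Split a pasted block into unique seed URLs, one per non-empty line."""
    seeds, rejected = [], []
    for line in text.splitlines():
        url = line.strip()
        if not url:
            continue
        parsed = urlparse(url)
        if parsed.scheme in ("http", "https") and parsed.hostname in ALLOWED_HOSTS:
            if url not in seeds:  # keep the first occurrence only
                seeds.append(url)
        else:
            rejected.append(url)
    return seeds, rejected


pasted = """
https://www.instagram.com/visit_berlin/
https://www.facebook.com/example_page

https://www.instagram.com/visit_berlin/
not-a-url
"""
seeds, rejected = parse_seed_lines(pasted)
print("\n".join(seeds))  # ready to paste, one seed per line
print("rejected:", rejected)
```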

After a couple of minutes, archived pages are available for playback at http://0.0.0.0:4445/munin/
