Scheduled Archiving

Nick Sweeting edited this page Jan 23, 2019 · 5 revisions

Schedule daily importing of new links into your archive

To schedule regular archiving, you can use any task scheduler, such as cron, at, or systemd.

ArchiveBox ignores links that are imported multiple times (keeping the earliest version that it's seen). This means you can add cron jobs that regularly poll the same file or URL for new links, adding only new ones as necessary.
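The keep-first behaviour described above can be pictured with a small shell sketch. This is only an illustration of the idea, not ArchiveBox's actual implementation; the function name `dedupe_keep_first` and the example.com URLs are hypothetical.

```shell
#!/bin/bash

# Hypothetical helper (not part of ArchiveBox): print each line from stdin
# once, keeping the first occurrence and dropping later repeats.
dedupe_keep_first() {
    awk '!seen[$0]++'
}

# Simulate polling the same source twice: the second poll repeats one
# link that was already seen and adds one new link.
printf '%s\n' \
    'https://example.com/a' \
    'https://example.com/b' \
    'https://example.com/a' \
    'https://example.com/c' | dedupe_keep_first
```

Only the three distinct URLs are printed, in the order they were first seen, which is why polling the same file or feed repeatedly is harmless.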

For some example configs, see the etc/cron.d and etc/supervisord folders.

Example: Import Firefox browser history every 24 hours

This example exports your browser history and archives it once a day:

Create /opt/ArchiveBox/bin/firefox_custom.sh:

#!/bin/bash

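# Stop if the ArchiveBox directory is missing, so the relative paths below never run from the wrong place
cd /opt/ArchiveBox || exit 1
# Export the Firefox history to a JSON file, then import it and append the output to a log
./bin/archivebox-export-browser-history --firefox ./output/sources/firefox_history.json
./bin/archivebox ./output/sources/firefox_history.json >> /var/log/ArchiveBox.log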

Then create a new file /etc/cron.d/ArchiveBox-Firefox to tell cron to run your script every 24 hours:

0 0 * * * www-data /opt/ArchiveBox/bin/firefox_custom.sh
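The two steps above (create the script, then create the cron file) can be sketched as one helper. This is a minimal sketch under stated assumptions: `install_cron_job` is a hypothetical function, not part of ArchiveBox, and it assumes the script must carry the execute bit because cron runs the path directly. Note that the hour field of a cron schedule accepts 0 to 23, so a daily run at midnight is written `0 0 * * *`.

```shell
#!/bin/bash

# Hypothetical helper (not part of ArchiveBox): make a script executable and
# write a one-line cron.d-style entry that runs it as the given user.
# Usage: install_cron_job "<schedule>" <user> <script_path> <cron_file>
install_cron_job() {
    local schedule="$1" user="$2" script="$3" cron_file="$4"

    # cron executes the script path directly, so it needs the execute bit.
    chmod +x "$script" || return 1

    # Files in /etc/cron.d take a user field between the schedule and the command.
    printf '%s %s %s\n' "$schedule" "$user" "$script" > "$cron_file"
}

# Example call (on a real system this would need root):
# install_cron_job "0 0 * * *" www-data /opt/ArchiveBox/bin/firefox_custom.sh /etc/cron.d/ArchiveBox-Firefox
```

The example call is left commented out so that sourcing the snippet does not touch system files.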

Example: Import an RSS feed from Pocket every 12 hours

This example imports your Pocket bookmark feed and archives any new links every 12 hours:

First, set your Pocket RSS feed to "public" under https://getpocket.com/privacy_controls.

Create /opt/ArchiveBox/bin/pocket_custom.sh:

#!/bin/bash

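# Stop if the ArchiveBox directory is missing, so the relative path below never runs from the wrong place
cd /opt/ArchiveBox || exit 1
# Import the Pocket feed (replace yourusernamegoeshere with your username) and append the output to a log
./bin/archivebox https://getpocket.com/users/yourusernamegoeshere/feed/all >> /var/log/ArchiveBox.log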

Then create a new file /etc/cron.d/ArchiveBox-Pocket to tell cron to run your script every 12 hours:

0 */12 * * * www-data /opt/ArchiveBox/bin/pocket_custom.sh
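An "every 12 hours" schedule is usually written with a step value in the hour field, `*/12`, whereas a plain `12` in that field means once a day at 12:00. The sketch below lists the hours a `*/N` step would match; `expand_hour_step` is a hypothetical helper written only to illustrate the arithmetic and is not a cron or ArchiveBox command.

```shell
#!/bin/bash

# Hypothetical helper: list the hours (0-23) matched by a cron hour field
# of the form */STEP, i.e. 0, STEP, 2*STEP, ... below 24.
expand_hour_step() {
    local step="$1" hour out=""

    # Reject anything that is not a positive integer.
    [[ "$step" =~ ^[1-9][0-9]*$ ]] || return 1

    for ((hour = 0; hour < 24; hour += step)); do
        out+="${out:+ }$hour"
    done
    echo "$out"
}

expand_hour_step 12   # prints: 0 12
expand_hour_step 6    # prints: 0 6 12 18
```

With a minute field of `0`, `*/12` therefore fires at 00:00 and 12:00 each day.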