archive reddit data as offline web pages

reddit html archiver

pulls reddit data from the pushshift api and renders offline compatible html pages


requires python 3 on linux, OSX, or Windows

sudo apt-get install python3-pip
pip install psaw
git clone
cd snudown
sudo python install
cd ..
git clone [this repo]
cd reddit-html-archiver
chmod u+x *.py

Windows users may need to run

chcp 65001

before running the scripts, or to resolve encoding errors such as 'codec can't encode character'.
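
a minimal sketch of why this error appears, assuming the console uses a legacy code page such as cp437 (that code page name is an assumption for illustration; yours may differ). characters that are common in reddit posts often have no mapping in such code pages, while UTF-8 (code page 65001) can encode all of them.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if `text` can be written using `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        # This is the "codec can't encode character" failure.
        return False


# Accented letter, star symbol, and CJK characters.
sample = "caf\u00e9 \u2605 \u65e5\u672c"

print(can_encode(sample, "cp437"))  # legacy console code page (assumed)
print(can_encode(sample, "utf-8"))  # the encoding that code page 65001 selects
```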

fetch reddit data

data is fetched by subreddit and date range and is stored as csv files in the data directory.

./ politics 2017-1-1 2017-2-1
# or add some link/post request filters
./ --self_only --score "> 2000" politics 2015-1-1 2016-1-1
./ -h
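
to check what a fetch produced, the csv files can be inspected with a short script. this is a sketch only: it assumes the files are UTF-8 encoded and makes no assumption about column names or how the files are nested under data. `count_rows` is a hypothetical helper, not part of this repo.

```python
import csv
from pathlib import Path


def count_rows(data_dir: str = "data") -> dict:
    """Map each csv file under `data_dir` to its number of csv records."""
    counts = {}
    for path in sorted(Path(data_dir).rglob("*.csv")):
        # newline="" lets the csv module handle line breaks inside quoted fields.
        with path.open(newline="", encoding="utf-8") as f:
            key = path.relative_to(data_dir).as_posix()
            counts[key] = sum(1 for _ in csv.reader(f))
    return counts
```

calling `count_rows()` from the repo directory returns a dict such as `{"<file>.csv": <records>}`; a file with far fewer records than expected may point to a fetch that was cut short.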

if you are getting connection errors, you may need to decrease your date range or adjust pushshift_rate_limit_per_minute.
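
one way to decrease the date range is to split a long range into smaller windows and fetch them one at a time. the sketch below only computes the windows; `split_range` is a hypothetical helper, and whether the fetch script treats the end date as inclusive is not stated here, so adjacent windows share a boundary date and may overlap by a day.

```python
from datetime import date, timedelta


def split_range(start: date, end: date, days: int = 7):
    """Yield (window_start, window_end) pairs covering start..end in `days`-sized steps."""
    if days < 1:
        raise ValueError("days must be at least 1")
    step = timedelta(days=days)
    cur = start
    while cur < end:
        nxt = min(cur + step, end)  # the last window may be shorter
        yield cur, nxt
        cur = nxt


for a, b in split_range(date(2017, 1, 1), date(2017, 2, 1), days=10):
    # Print dates without zero padding, in the same style as the commands above.
    print(f"{a.year}-{a.month}-{a.day} {b.year}-{b.month}-{b.day}")
```

each printed pair can be passed as the start and end date of a separate fetch.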

write web pages

write html files for all subreddits to r.

# or add some output filtering
./ --min-score 100 --min-comments 100 --hide-deleted-comments
./ -h

your html archive has been written to r. once you are satisfied with your archive, feel free to copy or move the contents of r elsewhere and to delete the git repos you have created. everything in r is fully self-contained.

to update an html archive, delete everything in r aside from r/static and re-run to regenerate everything.
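
the cleanup step can be scripted. a minimal sketch, assuming it is run from the directory that contains r; `clear_archive` is a hypothetical helper, not part of this repo, and it deletes files permanently.

```python
import shutil
from pathlib import Path


def clear_archive(root: str = "r", keep: str = "static") -> list:
    """Delete every entry directly inside `root` except `keep`; return removed names."""
    removed = []
    for entry in sorted(Path(root).iterdir()):
        if entry.name == keep:
            continue  # leave r/static in place
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed.append(entry.name)
    return removed
```

after `clear_archive("r")` returns, only r/static remains and the pages can be regenerated.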

hosting the archived pages

copy the contents of the r directory to a web root or appropriately served git repo.
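
the copy can also be done from python. a sketch under assumptions: `publish` is a hypothetical helper, the web root path is whatever your server uses, and `dirs_exist_ok` requires python 3.8 or later.

```python
import shutil
from pathlib import Path


def publish(archive_dir: str, web_root: str) -> int:
    """Copy the contents of `archive_dir` into `web_root`; return the number of files copied."""
    # dirs_exist_ok=True merges into an existing web root instead of failing.
    shutil.copytree(archive_dir, web_root, dirs_exist_ok=True)
    return sum(1 for p in Path(archive_dir).rglob("*") if p.is_file())
```

existing files in the web root are kept unless the archive contains a file with the same relative path, in which case the archive copy overwrites it.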

potential improvements

  • fetch_links
    • num_comments filtering
    • thumbnails or thumbnail urls
    • media posts
    • score update
    • scores from reddit with praw
  • real templating
  • choose Bootswatch theme
  • specify subreddits to output
  • show link domain/post type
  • user pages
    • add pagination, posts sorted by score, comments, date, sub
    • too many files in one directory
  • view on
  • js powered search page, show no links by default
  • js inline media embeds/expandos
  • links

see also

