reddit html archiver

pulls reddit data from the pushshift api and renders offline compatible html pages. uses the reddit markdown renderer.


requires python 3 on linux, OSX, or Windows.

warning: if $ python --version outputs a python 2 version on your system, then you need to replace all occurrences of python with python3 in the commands below.

$ sudo apt-get install pip
$ pip install psaw -U
$ git clone
$ cd snudown
$ sudo python install
$ cd ..
$ git clone [this repo]
$ cd reddit-html-archiver
$ chmod u+x *.py

Windows users may need to run

> chcp 65001

before running the scripts, to resolve encoding errors such as 'codec can't encode character'.
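
a minimal sketch of where this kind of error comes from, assuming the text being printed contains characters that the console's legacy code page cannot represent. the `can_encode` helper and the choice of `cp437` as the example code page are illustrative assumptions, not part of this repo:

```python
def can_encode(text, encoding):
    """Return True if `text` can be represented in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        # This is the same failure Python reports as
        # "codec can't encode character" when writing to a console.
        return False
    return True


# "café ☃": the snowman (U+2603) is not in the cp437 code page.
sample = "caf\u00e9 \u2603"
print("cp437:", can_encode(sample, "cp437"))
print("utf-8:", can_encode(sample, "utf-8"))
```

code page 65001 is the Windows identifier for UTF-8, which can represent any character reddit text is likely to contain.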

fetch reddit data

fetch data by subreddit and date range, writing to csv files in data:

$ python ./ politics 2017-1-1 2017-2-1

or you can filter links/posts to download less data:

$ python ./ --self_only --score "> 2000" politics 2015-1-1 2016-1-1

to show all available options and filters run:

$ python ./ -h

if you are getting connection errors, decrease your date range or adjust pushshift_rate_limit_per_minute.
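
one way to work with smaller date ranges is to split a long range into short windows and fetch each window with a separate command. the sketch below is only an illustration: `split_date_range` and `parse_date` are hypothetical helpers, not functions from this repo, and treating the end date as exclusive is an assumption.

```python
from datetime import date, timedelta


def parse_date(text):
    """Parse a date written like 2017-1-1 (zero padding not required)."""
    year, month, day = (int(part) for part in text.split("-"))
    return date(year, month, day)


def split_date_range(start, end, days=7):
    """Yield (window_start, window_end) pairs covering start..end.

    Each window spans at most `days` days, and the end of one window
    is the start of the next (end treated as exclusive, an assumption).
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    step = timedelta(days=days)
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)
        yield cursor, window_end
        cursor = window_end


if __name__ == "__main__":
    # Print one start/end pair per line for the range used in the example above.
    for window_start, window_end in split_date_range(
        parse_date("2017-1-1"), parse_date("2017-2-1")
    ):
        print(window_start.isoformat(), window_end.isoformat())
```

each printed pair can then be passed as the two date arguments of a fetch command.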

write web pages

write html files for all subreddits to r:

$ python ./

you can add some output filtering to get fewer empty posts and a smaller archive size:

$ python ./ --min-score 100 --min-comments 100 --hide-deleted-comments
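
as a rough picture of what these filters mean, the sketch below applies the same three ideas to plain dictionaries. it is not the repo's implementation: the function names, the `score`, `num_comments` and `body` field names, the inclusive thresholds, and the `[deleted]`/`[removed]` markers are all assumptions made for illustration.

```python
# Marker strings assumed to stand in for deleted or removed comment bodies.
DELETED_MARKERS = ("[deleted]", "[removed]")


def keep_post(post, min_score=0, min_comments=0):
    """Return True if a post meets both thresholds (inclusive, by assumption)."""
    return post["score"] >= min_score and post["num_comments"] >= min_comments


def visible_comments(comments, hide_deleted=False):
    """Return the comments to render, optionally dropping deleted ones."""
    if not hide_deleted:
        return list(comments)
    return [c for c in comments if c["body"] not in DELETED_MARKERS]


if __name__ == "__main__":
    posts = [
        {"score": 150, "num_comments": 120},
        {"score": 99, "num_comments": 500},
    ]
    # Only posts passing both thresholds are kept.
    print([p for p in posts if keep_post(p, min_score=100, min_comments=100)])
```

a post has to pass both thresholds to be written, which is why raising either value shrinks the archive.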

to show all available filters run:

$ python ./ -h

your html archive has been written to r. once you are satisfied with your archive, feel free to copy or move the contents of r elsewhere and delete the git repos you have created. everything in r is fully self-contained.

to update an html archive, delete everything in r aside from r/static, then re-run the write step above to regenerate everything.
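
the cleanup step can be scripted. the sketch below is a hypothetical helper, not part of this repo: `clear_archive` and its `keep` parameter are names invented for illustration, and it assumes `static` is the only top-level entry in r that should survive.

```python
import shutil
from pathlib import Path


def clear_archive(root, keep=("static",)):
    """Delete every top-level entry in `root` except the names in `keep`.

    Returns the names that were removed, in sorted order.
    """
    root = Path(root)
    removed = []
    for entry in sorted(root.iterdir()):
        if entry.name in keep:
            continue
        if entry.is_dir() and not entry.is_symlink():
            # Real directories are removed recursively.
            shutil.rmtree(entry)
        else:
            # Files and symlinks are unlinked directly.
            entry.unlink()
        removed.append(entry.name)
    return removed
```

calling `clear_archive("r")` from the repo directory would leave only r/static in place. it deletes files permanently, so check the path before running it.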

hosting the archived pages

copy the contents of the r directory to a web root or appropriately served git repo.

potential improvements

  • fetch_links
    • num_comments filtering
    • thumbnails or thumbnail urls
    • media posts
    • score update
    • scores from reddit with praw
  • real templating
  • choose Bootswatch theme
  • specify subreddits to output
  • show link domain/post type
  • user pages
    • add pagination, posts sorted by score, comments, date, sub
    • too many files in one directory
  • view on
  • js powered search page, show no links by default
  • js inline media embeds/expandos
  • links

see also



archive reddit data as offline friendly web pages