(More info on this blog post)
Adapted rank-es, proof of concept.
Or: can anyone easily create a news aggregator without any users around to score the news articles? And: can we add comments to it? And: can we do it in a very, very easy way?
Only two scripts are needed: one to populate the database and another one to generate static HTML files.

- retriever.py: downloads new articles from the sources and stores them in the database. It also removes expired links and re-scores the ones still alive.
- html_generator.py: generates the static HTML files from the database contents.

That's it.
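The storage step of the retriever can be sketched roughly as follows. This is only an illustration: the table name, columns, and the `store_entries` helper are assumptions for the example, not the actual layout in schema.sqlite3.

```python
import sqlite3

def store_entries(conn, source, entries):
    # Hypothetical sketch: insert new entries, skipping links already seen.
    # Table and column names are illustrative, not the real schema.
    cur = conn.cursor()
    cur.execute("""CREATE TABLE IF NOT EXISTS links
                   (url TEXT PRIMARY KEY, title TEXT, source TEXT, score REAL)""")
    for e in entries:
        # PRIMARY KEY on url plus INSERT OR IGNORE deduplicates articles.
        cur.execute("INSERT OR IGNORE INTO links VALUES (?, ?, ?, 0.0)",
                    (e['link'], e['title'], source))
    conn.commit()

conn = sqlite3.connect(":memory:")
store_entries(conn, "guardian",
              [{'link': 'http://example.com/a', 'title': 'A'},
               {'link': 'http://example.com/a', 'title': 'A'}])
print(conn.execute("SELECT COUNT(*) FROM links").fetchone()[0])  # duplicate ignored: 1
```

In the real scripts the entries would come from feedparser rather than hard-coded dicts, but the insert-or-ignore pattern is what keeps repeated cron runs from duplicating articles.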
You will need the following Python modules: feedparser, jinja2, sqlite3, urllib2, json. You can install them locally in your user directory with the --user option if you use pip: pip install --user module. I think this is much more convenient than creating a full environment. This should also work on shared hostings (I am on DreamHost and it works like a charm.)
- Create the database: mkdir db and then sqlite3 db/reranker.db < schema.sqlite3, or any other DB file you want.
- Copy config-sample.py to config.py and edit the relevant variables so that they reflect the actual paths on your system. Use full paths whenever possible. If you want to use the commenting system, register a new site on Disqus and set DISQUS to the appropriate value.
- Add a cron entry to your system to run first the retriever and then the generator. I have mine set at every 30 minutes.
And that's basically it.
It should not be very difficult. Let's try with The Guardian.
```python
import feedparser
test = feedparser.parse("http://www.theguardian.com/international/rss")
test['entries'][0]
```
And then simply inspect the output. This feed, for instance, provides both an 'id' and a 'link':
```
[...]
'guidislink': False,
'id': u'http://www.theguardian.com/politics/2015/jun/28/theresa-may-tunisia-gunman-did-not-target-britons-andrew-marr',
'link': u'http://www.theguardian.com/politics/2015/jun/28/theresa-may-tunisia-gunman-did-not-target-britons-andrew-marr',
'links': [{'href': u'http://www.theguardian.com [...]
```
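Not every feed fills in both fields, so when keying articles it is safer to prefer 'id' and fall back on 'link'. A minimal helper along those lines (the function name is made up for illustration; it works on the plain dict-like entries feedparser returns):

```python
def entry_uid(entry):
    # Prefer the feed-supplied id; fall back to the article link.
    # entry.get() returns None for missing keys, so `or` picks the fallback.
    return entry.get('id') or entry.get('link')

print(entry_uid({'id': 'http://example.com/guid',
                 'link': 'http://example.com/a'}))  # -> http://example.com/guid
print(entry_uid({'link': 'http://example.com/a'}))  # -> http://example.com/a
```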