Feedparser-based feed aggregation django app

Intro (original feedjack).

Feedjack is a feed aggregator written in Python using the Django web development framework.

Like the Planet feed aggregator:

  • It downloads feeds and aggregates their contents in a single site
  • The new aggregated site has a feed of its own (atom and rss)
  • It uses Mark Pilgrim’s excellent FeedParser
  • The subscribers list can be exported as OPML and FOAF

Original FeedJack also has some advantages:

  • Handles historical data, you can read old posts
  • Parses a lot more info, including post categories
  • Generates pages with posts of a certain category
  • Generates pages with posts from a certain subscriber
  • Generates pages with posts of a certain category from a certain subscriber
  • A cloud tag/folksonomy (hype 2.0 compliant) for every page and every subscriber
  • Uses Django templates
  • The administration is done via the web (using Django's kickass autogenerated and magical admin site), and can handle multiple planets
  • Extensive use of Django’s cache system. Most of the time you will have no database hits when serving pages.

The original feedjack project looks abandoned, though: no real updates since 2008 (with quite a lively history before that).

Fork

  • (fixed) Bugs:

    • hashlib warning
    • field lengths
    • non-unique date sort criteria
    • Always-incorrect date_modified setting (by treating UTC as localtime)
    • Misc unicode handling fixes.
  • Features:

    • Proper transactional updates, so single feed failure is guaranteed not to produce inconsistency or crash the parser.

    • Simple individual Post filters - python callables (accepting Post object and optional parameters, returning True/False), attached (to individual Feeds) and configured (additional parameters to pass) via database (or admin interface).

    • Cross-referencing filters, as complex as needed, for tasks like site-wide elimination of duplicate entries from different feeds (using arbitrary comparison functions as well), with an automatic mechanism for invalidating their results.

    • Sane, configurable logging in feedjack_update, without re-inventing the wheel via encode calls, prints and tons of ifs. feedjack_update is also available as a command in the django-admin suite.

    • Ability to use adaptive feed-check interval, based on average feed activity, so feeds that get updated once a year won't be polled every hour.

    • "immutable" flag for feeds, so their posts won't be re-fetched if their content or date changes (for feeds that have a "comments: N" thing).

    • Dropped a chunk of obsolete code (ripped from old Django) - ObjectPaginator in favor of native Paginator.

    • "bootstrap", "fern" and "plain" (merged from another fork) themes, image feed oriented "fern_grid" theme.

    • Keeping track of already-read entries, either client-side with localStorage (only in the "bootstrap" and "fern*" themes) or server-side with remoteStorage ("bootstrap" theme), on any compatible server other than the feedjack instance itself (a django-unhosted instance, for example).

    • Quite a few code optimizations.

    • ...and there's usually more stuff in the CHANGES file.
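To illustrate the kind of callable the Post filters feature above expects (a Post object plus optional parameters in, True/False out), here is a minimal sketch. The function name, its parameters, and the reading of True as "keep this post" are assumptions for illustration only, not feedjack's shipped API; check the filters that come with the fork for the exact calling convention. FakePost is a hypothetical stand-in for the Post model so the sketch runs without Django.

```python
import re


def title_regex_filter(post, pattern=r'.', field='title'):
    """Hypothetical post filter (not part of feedjack itself).

    Returns True if `pattern` matches the post's `field` attribute
    (case-insensitively), False otherwise. Keyword parameters like
    `pattern` stand in for the per-feed parameters that feedjack
    stores in the database and lets you edit in the admin."""
    value = getattr(post, field, None) or u''
    return re.search(pattern, value, re.IGNORECASE) is not None


class FakePost(object):
    """Minimal stand-in for feedjack's Post model, for trying the filter."""
    def __init__(self, title):
        self.title = title


posts = [FakePost(u'Django 1.4 released'), FakePost(u'Weekly cat pictures')]
# Keep only the posts for which the filter returns True.
kept = [p.title for p in posts if title_regex_filter(p, pattern=r'django')]
```

Keeping filters as plain functions with keyword parameters makes them easy to test in isolation like this, before attaching them to a Feed.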

Installation

This feedjack fork is a regular package for Python 2.7 (not 3.X), but it is not on PyPI, so it can be installed from a checkout with something like this:

python setup.py install

That will install feedjack to a python site-path, so it can be used as a Django app.

A better way would be to use pip, which installs all the necessary dependencies as well:

% pip install 'git+https://github.com/mk-fg/feedjack.git#egg=feedjack'

Note that to install stuff in system-wide PATH and site-packages, elevated privileges are often required. Use "install --user", ~/.pydistutils.cfg or virtualenv to do unprivileged installs into custom paths.

After that you must make Feedjack's static directory available inside the directory served at your Django STATIC_URL. It must be set up in a way that Feedjack’s static directory can be reached at "STATIC_URL/feedjack/".

For instance, if your STATIC_URL is served from "/var/www/htdocs", and Feedjack was installed in /usr/lib/python2.7/site-packages/feedjack, just type this:

% ln -s /usr/lib/python2.7/site-packages/feedjack/static/feedjack /var/www/htdocs/feedjack
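If you prefer to do this from a Python deployment script instead of the shell, the `ln -s` example can be sketched as below. The helper name is hypothetical; it only assumes the package layout described above (a `static/feedjack` directory inside the installed feedjack package).

```python
import os


def link_feedjack_static(package_dir, static_root):
    """Hypothetical helper: symlink <package_dir>/static/feedjack
    to <static_root>/feedjack, like the ln -s example, and return
    the link path."""
    src = os.path.join(package_dir, 'static', 'feedjack')
    dst = os.path.join(static_root, 'feedjack')
    if not os.path.isdir(src):
        raise IOError('no feedjack static files in %r' % src)
    # Leave an existing link or directory alone instead of clobbering it.
    if not os.path.lexists(dst):
        os.symlink(src, dst)
    return dst
```

With feedjack importable, it could be called as `link_feedjack_static(os.path.dirname(feedjack.__file__), '/var/www/htdocs')`.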

Alternatively, the standard django.contrib.staticfiles app can be used to copy/link static files with the "./manage.py collectstatic" command.

You must also add 'feedjack' to INSTALLED_APPS in your settings.py and then run "./manage.py syncdb" from the command line.

Make sure to add/uncomment the "django.contrib.admin" app (Django admin interface) before running syncdb as well, since it's the most convenient and supported way to configure and control feedjack. Otherwise, the next best way would be to manipulate models from Python code directly, which might be desirable for some kind of migration or other automatic configuration.

If South app is available (highly recommended), make sure to add it to INSTALLED_APPS as well, so it'd be able to apply future database schema updates effortlessly. Don't forget to run "./manage.py migrate feedjack" in addition to syncdb in that case.
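Putting the settings from the steps above together, a settings.py fragment might look roughly like the sketch below. It assumes a Django 1.x project of the syncdb/South era; the contrib apps listed as admin prerequisites and the example paths are assumptions to adapt to your project, which will likely have more apps of its own.

```python
# settings.py (fragment), assuming a Django 1.x project with South.
INSTALLED_APPS = (
    'django.contrib.auth',          # used by the admin
    'django.contrib.contenttypes',  # used by the admin
    'django.contrib.sessions',      # used by the admin
    'django.contrib.messages',      # used by the admin for notifications
    'django.contrib.admin',         # web interface to configure feedjack
    'django.contrib.staticfiles',   # optional: "./manage.py collectstatic"
    'south',                        # optional, recommended for schema migrations
    'feedjack',
)

# collectstatic copies feedjack/static/feedjack/* to STATIC_ROOT/feedjack/,
# which the web server should then serve at STATIC_URL + 'feedjack/'.
STATIC_URL = '/static/'
STATIC_ROOT = '/var/www/htdocs'  # example path, as in the ln -s example
```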

Then you must add an entry for feedjack.urls in your Django "urls.py" file, so it'd look something like this (with admin interface also enabled on "/admin/"):

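from django.conf.urls import patterns, include, url
from django.contrib import admin
admin.autodiscover() # register ModelAdmin classes, including feedjack's
urlpatterns = patterns( '',
    url(r'^admin/', include(admin.site.urls)),
    url(r'', include('feedjack.urls')) )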

After that you might want to check out the /admin section (if the django.contrib.admin app was enabled) to create a feedjack site; otherwise a sample default site will be created for you on the first request.

Requirements

Updating from older versions

The only non-backwards-compatible changes should be in the database schema, thus requiring migration, but it's much easier (automatic, even) than it sounds.

Feedjack uses South for database migration, so it has to be installed if database schema migrations are necessary. Don't forget to add "south" to INSTALLED_APPS afterwards.

After that, use something like this to see current database schema version and which migrations are necessary:

% ./manage.py migrate --list

feedjack
  ...
  (*) 0013_auto__add_field_filterbase_crossref_rebuild__add_field_filterbase_cros
  ( ) 0014_auto__add_field_post_hidden
  ( ) 0015_auto__add_field_feed_skip_errors
  ( ) 0016_auto__chg_field_post_title__chg_field_post_link
  ( ) 0017_auto__chg_field_tag_name

Here you can see at which version the current schema is and how far it's behind what code (models.py) expects it to be.
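If you want to check this from a deploy script, the listing can be parsed as in the following sketch. The function name is hypothetical, and it assumes the "(*)" (applied) / "( )" (pending) line format shown above, which may differ between South versions.

```python
import re

# Matches lines like "  (*) 0013_auto__..." (applied) and "  ( ) 0014_auto__..." (pending).
MIGRATION_LINE = re.compile(r'^\s*\((\*| )\)\s+(\S+)\s*$')


def split_migrations(listing):
    """Hypothetical helper: return (applied, pending) migration names
    parsed from the text output of "./manage.py migrate --list"."""
    applied, pending = [], []
    for line in listing.splitlines():
        m = MIGRATION_LINE.match(line)
        if not m:
            continue  # app name headers, "...", blank lines
        (applied if m.group(1) == '*' else pending).append(m.group(2))
    return applied, pending
```

Here `applied[-1]` is the current schema version and `pending` is what `./manage.py migrate feedjack` would apply.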

If South was just installed, you may have to specify the initial schema version manually, using a command like ./manage.py migrate feedjack 0013 --fake. The best way to find which model version was used before South is probably to inspect the git history of models.py and find the first not-yet-applied change to the model classes.

All the necessary migrations can be applied with a single ./manage.py migrate feedjack command:

% ./manage.py migrate feedjack

Running migrations for feedjack:
 - Migrating forwards to 0017_auto__chg_field_tag_name.
 > feedjack:0014_auto__add_field_post_hidden
 > feedjack:0015_auto__add_field_feed_skip_errors
 > feedjack:0016_auto__chg_field_post_title__chg_field_post_link
 > feedjack:0017_auto__chg_field_tag_name
 - Loading initial data for feedjack.
Installed 4 object(s) from 1 fixture(s)

In case of any issues and for more advanced usage information, please refer to South project documentation.

Configuration

The first thing you want to do is add a Site.

To do this, open Django admin interface and create your first planet. You must use a valid address in the URL field, since it will be used to identify the current planet when there are multiple planets in the same instance and to generate all the links.

Then you should add subscribers to your first planet. A subscriber is a relation between a Feed and a Site, so when you add your first subscriber, you must also add your first Feed by clicking the “+” button to the right of the Feed combobox.

Feedjack is designed to use the Django cache system to store database-intensive data like pages of posts and tag clouds, so it is highly recommended to configure CACHES in Django settings (memcached, database or file-based backend).
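A CACHES setting for this might look like the sketch below. The backend paths are Django's built-in ones; the memcached address, table name and directory are example values, and the memcached backend assumes a running memcached daemon plus the python-memcached module.

```python
# settings.py (fragment): pick ONE backend for the "default" cache.
CACHES = {
    'default': {
        # memcached, assuming a local daemon on the default port
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': '127.0.0.1:11211',
    },
}

# Alternatives for the 'default' entry:
#   database: BACKEND 'django.core.cache.backends.db.DatabaseCache',
#             LOCATION = a table name, created with
#             "./manage.py createcachetable <table name>"
#   file:     BACKEND 'django.core.cache.backends.filebased.FileBasedCache',
#             LOCATION = an absolute directory path, e.g. '/var/tmp/django_cache'
```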

Now that you have everything set up, run ./manage.py feedjack_update (or something like DJANGO_SETTINGS_MODULE=myproject.settings feedjack_update) to retrieve the actual data from the feeds. This script should be set up to run periodically (to retrieve new posts from the feeds), which is usually a task for the unix cron daemon.

In case of some missing or inaccessible functionality, feedjack may issue (once per runtime) Python warnings, which can (and most likely should) be captured by the logging system, so they can be handled by Django (e.g. a notification mail sent to ADMINS). To do that, add the following code to settings.py:

import logging
logging.captureWarnings(True)
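With captureWarnings enabled, the standard library routes warnings to the "py.warnings" logger, so whatever handlers are attached to that logger receive them. The sketch below demonstrates this with a hypothetical in-memory handler standing in for a real one (such as Django's mail_admins handler); the warning text is made up for the demo.

```python
import logging
import warnings


class ListHandler(logging.Handler):
    """Hypothetical handler that keeps log records in memory, standing in
    for e.g. Django's mail_admins (django.utils.log.AdminEmailHandler)."""
    def __init__(self):
        logging.Handler.__init__(self)
        self.records = []

    def emit(self, record):
        self.records.append(record)


logging.captureWarnings(True)  # same call as in settings.py
handler = ListHandler()
# Captured warnings are logged through the standard "py.warnings" logger.
logging.getLogger('py.warnings').addHandler(handler)

with warnings.catch_warnings():
    warnings.simplefilter('always')  # don't let the demo warning be deduplicated
    warnings.warn('hypothetical feedjack warning: optional module missing')
```

In a Django project, the equivalent is typically a LOGGING entry that attaches the mail_admins handler to the "py.warnings" logger.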

Bugs, development, support

All the issues with this fork should probably be reported to respective github project/fork, since code here can be quite different from the original project.

Until 2012, the fork was kept in a fossil repo here.

The original version is available at the feedjack site.

Links