Permalink
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
176 lines (118 sloc) 6.26 KB

Getting started

Installation

Before you begin, you should have a Django project created and configured.

In­stall our library from PyPI, like so:

$ pip install django-memento-framework

Edit your settings.py and add this app to your INSTALLED_APPS list.

IN­STALLED_APPS = (
    # ...
    # other apps would be above this of course
    # ...
    'memento',
)

Example project

For the purposes of this introduction, imagine a Django project that archives a list of URLs each hour. A simple set of models could look something like this.

from django.db import models


class Page(models.Model):
    """
    A web page that is archived each hour.
    """
    url = models.URLField()


class ArchivedHTML(model.Model):
    """
    A HTML snapshot of a single web page.
    """
    page = models.ForeignKey(Page)
    timestamp = models.DateTimeField()
    html = models.TextField()

    def get_absolute_url(self):
        return '/%s/' % self.pk

We won't go into all the code you'd need for this archive to actually work, but I bet you can imagine the sort of thing you'd need. Imagine it's there and it's been archiving URLs every hour for a few years.

TimeMaps

A good first step is create a TimeMap that will index all of the archives in your database. This is much like the sitemaps created for traditional search engines, but expanded to include datetime metadata.

This library includes a generic view that will publish a queryset in Memento's link format, that can be optionally paginated. It is designed to emulate Django's built-in feed framework and operates in much the same way.

You could start by create a pattern in urls.py that will accept a web page's URL with the goal of returning all the snapshots for that resource in our example archive.

url(
    r'^timemap/(?P<url>.*)$',
    views.ExampleTimemapLinkList(),
    name="timemap-archivedhtml"
),

That URL now needs to connect to a view named ExampleTimemapLinkList so let's go make it in your views.py file. It will inherit from our custom class and include configuration to pull the Page object from the database that matches the submitted URL and convert a list of all its archived HTML into a Timemap.

from memento.timemap import TimemapLinkList


class ExampleTimemapLinkList(TimemapLinkList):
    paginate_by = 1000

    def get_object(self, request, url):
        """
        Take the URL from the request and check if we have it in our Page model.
        """
        return get_object_or_404(Page, url=url)

    def get_original_url(self, obj):
        """
        Return the original URL being archived from the Page model after its been pulled.
        """
        return obj.url

    def memento_list(self, obj):
        """
        Pull a queryset set of all the ArchivedHTML records of the Page object.
        """
        return ArchivedHTML.objects.filter(page=obj)

    def memento_datetime(self, item):
        """
        Return the datetime when the archived was saved from each ArchivedURL object.
        """
        return item.timestamp

Now if you fired up your test server and visited http://localhost:8000/timemap/link/http://example.com/ You'd get a paginated index of all the archives saved from the page example.com.

TimeGate

The natural next step is to create the other pillar of the Memento system, a TimeGate. It is a view that when given a request that includes a URL and a timestamp will redirect to the detail page for the nearest archive.

A good first step is to create an URL in urls.py.

url(
    r'^timegate/(?P<url>.*)$',
    views.ExampleTimeGateView.as_view(),
    name="timegate"
),

Next you need to create the view included there.

class ExampleTimeGateView(TimeGateView):
    model = ArchivedHTML
    url_field = 'page__url' # You can walk across ForeignKeys like normal
    datetime_field = 'timestamp'
    timemap_pattern_name = "timemap-archived" # The name of the timemap URL we created above

This will now return a 302 redirect when given a URL in the request with a header of Accept-Datetime. A good way to check out if this is working is to use cURL on the command line.

$ url -X HEAD -i http://localhost:8000/timegate/http://archivedsite.com/ --header "Accept-Datetime: Fri, 1 May 2015 00:01:00 GMT"
HTTP/1.1 302 Moved Temporarily
Server: Apache/2.2.22 (Ubuntu)
Link: <http://archivedsite.com/>; rel="original", <http://localhost:800/timemap/link/http://archivedsite.com/>; rel="timemap"; type="application/link-format"
Location: http://www.example.com/screenshot/100/
Vary: accept-datetime

Memento detail view

The detail page for each snapshot in the archive should also be enriched to include Memento metadata in its response. This is done by using our extended version of Django's generic DetailView.

from memento.timegate import MementoDetailView


class ExampleMementoDetailView(MementoDetailView):
    model = ArchivedHTML
    datetime_field = 'timestamp'
    timemap_pattern_name = "timemap-archivedhtml"
    timegate_pattern_name = "timegate"

    def get_original_url(self, obj):
        return obj.page.url

That can be linked to a pattern in the urls.py file like any other Django detail view.

url(
    r'^archived-html/(?P<pk>\d+)/$',
    views.ExampleMementoDetailView.as_view(),
    name='archivedhtml-detail'
)

And that's it! Get all that going and your archive is ready to be indexed by http://mementoweb.org/ and integrated into Memento-based services.