phpwaves/wikifetch

Introduction

This tutorial walks through building a simple Wikipedia scraper that showcases a Web API written with Python and the Django REST Framework.

Along the way you will learn how to create a new Django REST based application that dynamically fetches the revision history of any Wikipedia article and displays the article content for the revision you select.


Note: A demo application is available here.


Requirements

  • Python (2.6.5+, 2.7, 3.2, 3.3)
  • Django-1.5.4
  • djangorestframework ( pip install djangorestframework )
  • BeautifulSoup ( pip install beautifulsoup4)

Getting started

Okay, we're ready to get coding. To get started, let's create a new project to work with.

cd ~
django-admin.py startproject hitachi
cd hitachi

Once that's done we can create an app that we'll use to build the wikifetch Web API.

python manage.py startapp wikifetch

The simplest way to get up and running is probably to define STATIC_ROOT and TEMPLATE_DIRS inside the settings.py configuration file.


For STATIC_ROOT:

import os
ROOT_PATH = os.path.dirname(__file__)
PROJECT_PATH = os.path.dirname(os.path.abspath(__file__))

STATIC_ROOT = os.path.join(ROOT_PATH, 'static')
STATIC_URL = '/static/'
STATICFILES_DIRS = (
    os.path.join('keep your static files dir path', 'static'),
)

For TEMPLATE_DIRS:

TEMPLATE_DIRS = (
    # Put strings here, like "/home/html/django_templates" or "C:/www/django/templates".
)
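As a rough illustration of how the path expressions above resolve, the sketch below applies the same calls to a hypothetical settings.py location. The path /home/user/hitachi/hitachi/settings.py is an assumption made for this example, and posixpath is used in place of os.path only so the result looks the same on every platform.

```python
import posixpath

# Hypothetical location of settings.py; your project path will differ.
settings_file = '/home/user/hitachi/hitachi/settings.py'

# Mirrors ROOT_PATH = os.path.dirname(__file__)
root_path = posixpath.dirname(settings_file)

# Mirrors STATIC_ROOT = os.path.join(ROOT_PATH, 'static')
static_root = posixpath.join(root_path, 'static')

print(root_path)
print(static_root)
```

With this layout STATIC_ROOT ends up as a static directory next to settings.py, so the value follows the project wherever it is checked out.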

We'll also need to add our new wikifetch app and the rest_framework app to INSTALLED_APPS.

INSTALLED_APPS = (
    ...
    'rest_framework',
    'wikifetch',
)

Okay, we're ready to roll.

Creating a Serializer class

The first thing we need for our Web API is a way of serializing and deserializing wikifetch instances into representations such as JSON. We can do this by declaring serializers, which work very much like Django's forms. Create a file named serializers.py in the wikifetch directory and add the following.


# Create your serializers here.
from django.forms import widgets
from rest_framework import serializers


class Wikifetch(object):
    """Plain container for one revision entry (not a Django model)."""
    def __init__(self, title, versions, url):
        self.title = title
        self.versions = versions
        self.url = url


class WikiSerializer(serializers.Serializer):
    title = serializers.CharField(required=False)
    versions = serializers.ChoiceField(required=False)
    url = serializers.CharField(required=False)

    def restore_object(self, attrs, instance=None):
        """
        Restore object for json response
        """
        if instance:
            # Update existing instance, keeping current values as defaults
            instance.title = attrs.get('title', instance.title)
            instance.versions = attrs.get('versions', instance.versions)
            instance.url = attrs.get('url', instance.url)
            return instance

        # Create new instance
        return Wikifetch(**attrs)

Because the serializer is not bound to a Django model, we define a plain Wikifetch class to hold the data.

The first part of WikiSerializer class defines the fields that get serialized/deserialized. The restore_object method defines how fully fledged instances get created when deserializing data.
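To make the two directions concrete, here is a minimal framework-free sketch of the same idea. It does not use the REST framework at all: to_primitive and restore are hypothetical helper names invented for this illustration, standing in for what the field declarations and restore_object do, and the revision values are sample data.

```python
import json


class Wikifetch(object):
    def __init__(self, title, versions, url):
        self.title = title
        self.versions = versions
        self.url = url


FIELDS = ('title', 'versions', 'url')


def to_primitive(instance):
    """Serialize: object -> dict of plain values (hypothetical helper)."""
    return dict((name, getattr(instance, name)) for name in FIELDS)


def restore(attrs, instance=None):
    """Deserialize: dict -> object, updating `instance` if one is given."""
    if instance is not None:
        for name in FIELDS:
            setattr(instance, name, attrs.get(name, getattr(instance, name)))
        return instance
    return Wikifetch(**attrs)


# Sample values, not real Wikipedia data.
revision = Wikifetch(title='12:00, 1 January 2013', versions='1.1',
                     url='/w/index.php?title=Django&oldid=1')

payload = json.dumps(to_primitive(revision), sort_keys=True)  # object -> JSON text
round_tripped = restore(json.loads(payload))                  # JSON text -> object
print(payload)
```

Passing an existing instance to restore updates only the attributes present in the incoming data, which is the behaviour the update branch of restore_object aims for.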

Writing Django views using our Serializer

We'll start off by creating a subclass of HttpResponse that we can use to render any data we return into json.

Edit the wikifetch/views.py file, and add the following.


# Create your views here.
from rest_framework.renderers import JSONRenderer
from rest_framework.parsers import JSONParser
from wikifetch.serializers import WikiSerializer, Wikifetch
from django.template import Context, RequestContext
from django.shortcuts import render_to_response, get_object_or_404, render
from django import http
from django.http import HttpResponseRedirect, HttpResponse
from django.views.decorators.csrf import csrf_exempt
import urllib2, urllib
from bs4 import BeautifulSoup
from django.utils import simplejson


class JSONResponse(HttpResponse):
    """
    An HttpResponse that renders its content into JSON.
    """
    def __init__(self, data, **kwargs):
        content = JSONRenderer().render(data)
        kwargs['content_type'] = 'application/json'
        super(JSONResponse, self).__init__(content, **kwargs)


def home(request):
    return render_to_response('index.html', {'data': "hello"},
                              context_instance=RequestContext(request))

We'll also need a couple of views that return the version history details and the related article content for any submitted URL.


@csrf_exempt
def fetchWiki(request):
    """
    An HttpResponse that renders its version history details into JSON.
    We use addheaders just to recover forbidden errors from wikipedia
    """
    content = []
    opener = urllib2.build_opener()
    opener.addheaders = [('User-agent', 'Mozilla/24.0')]
    url_id = request.GET['wiki']
    url = 'http://en.wikipedia.org/w/index.php?title='+url_id+'&action=history'
    infile = opener.open(url)
    page = infile.read()
    soup = BeautifulSoup(page)
    data = soup.find_all('a', {'class': 'mw-changeslist-date'})
    for i in data:
        info = {}
        info['title'] = i.string
        info['versions'] = '1.1'
        info['url'] = i['href']
        info[i.string] = Wikifetch(title=info['title'], versions=info['versions'], url=info['url'])
        content.append(info)

    serializer = WikiSerializer(content)
    return JSONResponse(serializer.data)

@csrf_exempt
def fetchArticle(request):
    """
    An HttpResponse that renders its article content into JSON.
    """
    opener = urllib2.build_opener()
    opener.addheaders = [('User-agent', 'Mozilla/24.0')]
    url_id = request.GET['url']
    oldid = request.GET['oldid']
    url = url_id+'&oldid='+oldid
    infile = opener.open(url)
    page = infile.read()
    soup = BeautifulSoup(page)
    data = soup.find_all('div', attrs={'id': 'content'})
    response_dict = {}
    for content in data:
        response_dict.update({'article': content.text})

    json_data = simplejson.dumps(response_dict)
    return HttpResponse(json_data, mimetype='application/json')

def home(request):
    """
    Simple landing page view
    """
    return render_to_response('index.html', context_instance=RequestContext(request))
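The link extraction step in fetchWiki can be tried without a network connection. The sketch below is an illustration under stated assumptions rather than the code the views use: it swaps BeautifulSoup for Python 3's standard-library html.parser so it runs with no third-party packages, RevisionLinkParser is a hypothetical name, and the markup is a made-up stand-in for a downloaded history page.

```python
from html.parser import HTMLParser


class RevisionLinkParser(HTMLParser):
    """Collect the text and href of every <a class="mw-changeslist-date">."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the matching <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get('class') or '').split()
        if tag == 'a' and 'mw-changeslist-date' in classes:
            self._href = attrs.get('href') or ''
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            self.links.append({'title': ''.join(self._text), 'url': self._href})
            self._href = None


# Made-up markup standing in for a downloaded history page.
sample = (
    '<ul>'
    '<li><a class="mw-changeslist-date" href="/w/index.php?title=Django&amp;oldid=2">'
    '10:15, 2 January 2013</a> <a href="/wiki/User:A">A</a></li>'
    '<li><a class="mw-changeslist-date" href="/w/index.php?title=Django&amp;oldid=1">'
    '09:00, 1 January 2013</a></li>'
    '</ul>'
)

parser = RevisionLinkParser()
parser.feed(sample)
parser.close()
revisions = parser.links
print(revisions)
```

Only anchors carrying the mw-changeslist-date class are collected, which mirrors the filter passed to find_all in fetchWiki; the user link in the first list item is skipped.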

Finally we need to wire these views up. Update the urls.py file:

from django.conf.urls import patterns, url

urlpatterns = patterns('',
    url(r'^$', 'wikifetch.views.home', name='home'),
    url(r'^fetchWiki/$', 'wikifetch.views.fetchWiki', name='fetchWiki'),
    url(r'^fetchArticle/$', 'wikifetch.views.fetchArticle', name='fetchArticle'),
)
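To see which request paths these regular expressions accept, the following sketch tries them with Python's re module. The resolve function is a hypothetical helper written for this illustration, not Django's resolver; it only mimics dropping the leading slash and testing the patterns in order.

```python
import re

# The same regular expressions as in urls.py, paired with the view names.
ROUTES = [
    (r'^$', 'home'),
    (r'^fetchWiki/$', 'fetchWiki'),
    (r'^fetchArticle/$', 'fetchArticle'),
]


def resolve(path):
    """Return the first view name whose pattern matches, or None."""
    # The patterns are written without the leading slash of the request path.
    if path.startswith('/'):
        path = path[1:]
    for pattern, view_name in ROUTES:
        if re.search(pattern, path):
            return view_name
    return None


for candidate in ('/', '/fetchWiki/', '/fetchWiki', '/fetchArticle/'):
    print(candidate, resolve(candidate))
```

Because each pattern ends in /$, a path without the trailing slash such as /fetchWiki does not match any of them in this sketch.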

The fetchWiki and fetchArticle URLs are requested through jQuery AJAX calls; that code lives in wikifetch.js inside the js directory of the static files.
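The contents of wikifetch.js are not reproduced here, but the query parameters the two views read (wiki, url and oldid) show what a client has to send. As a hedged sketch in Python 3 rather than jQuery, the request URLs could be assembled as follows; fetch_wiki_url and fetch_article_url are hypothetical helper names, and the article title and revision id are sample values.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE = 'http://127.0.0.1:8000'  # development server address used in this tutorial


def fetch_wiki_url(title):
    """URL for the revision history view; `wiki` is read by fetchWiki."""
    return BASE + '/fetchWiki/?' + urlencode({'wiki': title})


def fetch_article_url(article_url, oldid):
    """URL for the article view; `url` and `oldid` are read by fetchArticle."""
    return BASE + '/fetchArticle/?' + urlencode({'url': article_url, 'oldid': oldid})


# Sample values for illustration only.
history_request = fetch_wiki_url('Django (web framework)')
article_request = fetch_article_url(
    'http://en.wikipedia.org/w/index.php?title=Django_(web_framework)', '123456')

# Decoding the query string recovers the values the views will see.
print(parse_qs(urlsplit(history_request).query))
print(parse_qs(urlsplit(article_request).query))
```

Encoding the values with urlencode keeps spaces, ampersands and question marks inside a parameter from being mistaken for separators in the outer URL.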

Testing our first attempt at a Web API

Now we can start up Django's development server to serve our wikifetch app.

python manage.py runserver

Validating models...

0 errors found
Django version 1.4.3, using settings 'tutorial.settings'
Development server is running at http://127.0.0.1:8000/
Quit the server with CONTROL-C.

Finally, we can access the revision history details from the landing page in a browser:

http://127.0.0.1:8000
