# What does the public want?

## Identifying the main themes in UK Parliament e-petitions

The ability to petition Parliament gives people the **power to join together in support of causes and effect change**. In the UK, British citizens and UK residents can create petitions on the UK Parliament petitions website. At 10,000 signatures a petition gets a response from the government; at 100,000 signatures it is considered for a debate in Parliament (though petitions are of course not always successful; take [Revoke Article 50 and remain in the EU](https://petition.parliament.uk/archived/petitions/241584) for example).

**But what, you ask, are the themes that keep appearing?** In an age of information overload, it can be hard to keep track. In this notebook I use topic modelling to pull out the key themes in petitions submitted by members of the public to the UK Parliament e-petitions website.

Data is sourced from the [UK Parliament petitions website](https://petition.parliament.uk/), available under the Open Government Licence v3.0 ([OGL v3.0](https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)). More information on the history of the UK Parliament petitions website can be found in its [Wikipedia entry](https://en.wikipedia.org/wiki/UK_Parliament_petitions_website).

### Import libraries

In [108]:
import glob
import pickle
import pandas as pd

import urllib3
from bs4 import BeautifulSoup
import json
import time

http = urllib3.PoolManager()

#### Functions to read and write to pickle

In [111]:
def write_pkl(fname, obj):
    with open(fname,'wb') as pklf:
        pickle.dump((obj), pklf)
        
def read_pkl(fname):
    with open(fname,'rb') as pklf:
        obj = pickle.load(pklf)
    return obj
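
As a quick sanity check, the two helpers can be exercised with a round trip through a temporary file. This is a minimal sketch: the `sample` list is hypothetical data, shaped like the `(index, dict)` tuples collected further down, and the helpers are repeated here only so the snippet runs on its own.

```python
import os
import pickle
import tempfile


def write_pkl(fname, obj):
    # Serialise obj to fname in binary mode
    with open(fname, 'wb') as pklf:
        pickle.dump(obj, pklf)


def read_pkl(fname):
    # Load and return whatever object was pickled to fname
    with open(fname, 'rb') as pklf:
        return pickle.load(pklf)


# Hypothetical sample: one successful entry and one failed placeholder
sample = [(0, {'action': 'Ban unpaid internships'}), (1, '')]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'details.pkl')
    write_pkl(path, sample)
    restored = read_pkl(path)

# The restored object should compare equal to what was written
round_trip_ok = restored == sample
print(round_trip_ok)
```

Note that pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.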

### Get the data

First, I download lists (CSV format) of current and archived petitions from the [UK Parliament petitions website](https://petition.parliament.uk/). At the time of access, the archive covers 2010-2019 and the current list runs through to mid-2021.

#### Read the lists of petitions

Stored locally in `data/`.

In [112]:
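# Read every CSV in data/ and combine them into one dataframe
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
frames = [pd.read_csv(f, header=0) for f in glob.glob('data/*.csv')]
df = pd.concat(frames, ignore_index=True)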
    
# Drop rejected petitions
rejects = df[df['State'] == 'rejected']
df = df[~df['URL'].isin(rejects['URL'])].reset_index(drop=True)

print('%d petitions (%d rejected petitions discarded)' % (len(df), len(rejects)))
df.head()

57727 petitions (23456 rejected petitions discarded)


| Unnamed: 0 | Petition | URL | State | Signatures Count |
|---|---|---|---|---|
| 0 | Introduce a rate increase cap on pay TV pricin... | https://petition.parliament.uk/archived/petiti... | closed | 22 |
| 1 | Impose a heavy extra tax on foreign buyers of ... | https://petition.parliament.uk/archived/petiti... | closed | 383 |
| 2 | Hold a referendum on electoral reform with the... | https://petition.parliament.uk/archived/petiti... | closed | 4767 |
| 3 | Make the 'Steam' refund policy the law for all... | https://petition.parliament.uk/archived/petiti... | closed | 94 |
| 4 | Ban unpaid internships | https://petition.parliament.uk/archived/petiti... | closed | 438 |
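
The combine-and-filter step can be tried on a small in-memory example. This is a sketch under stated assumptions: the two CSV strings and their `example.org` URLs are made up for illustration, and the rejected rows are dropped with a boolean mask on `State`, which gives the same result as filtering by URL as long as each URL appears only once.

```python
import io

import pandas as pd

# Two hypothetical CSV exports standing in for the files in data/
archived_csv = io.StringIO(
    "Petition,URL,State,Signatures Count\n"
    "Example A,https://example.org/petitions/1,closed,22\n"
    "Example B,https://example.org/petitions/2,rejected,0\n"
)
current_csv = io.StringIO(
    "Petition,URL,State,Signatures Count\n"
    "Example C,https://example.org/petitions/3,open,438\n"
)

# Read each list and stack them into a single dataframe
frames = [pd.read_csv(f, header=0) for f in (archived_csv, current_csv)]
toy = pd.concat(frames, ignore_index=True)

# Drop rejected petitions and renumber the remaining rows
is_rejected = toy['State'] == 'rejected'
n_rejected = int(is_rejected.sum())
kept = toy[~is_rejected].reset_index(drop=True)

print('%d petitions (%d rejected petitions discarded)' % (len(kept), n_rejected))
```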


#### Get the petition details
The details of each petition (a JSON entry) are pulled from the Parliament site using the URL in the lists. I save all fields except the signature fields. *Note that web scraping should be done considerately, and only after checking both the site's T&Cs and its robots.txt.*

In [None]:
errlog, details = [], []

# iterate over dataframe rows (NB this method is slow, but that's okay as we want slow in this case)
for index, row in df.iterrows():

    url = row['URL'] + '.json'
    try:
        # get url json; the response body is JSON, so parse it directly
        response = http.request('GET', url)
        petition_dict = json.loads(response.data.decode('utf-8'))['data']['attributes']

        # keep every attribute except the signature fields and add to list
        details.append((index, {key: value for key, value in petition_dict.items() if 'signature' not in key}))

    except Exception:
        # log the failure and keep a placeholder so details stays aligned with df
        errlog.append((index, row['URL']))
        details.append((index, ''))

    # scrape considerately(!) - add a delay of one second between requests
    time.sleep(1)

    # save the lists approximately every half hour (1800 requests at one per second)
    if index % 1800 == 0:
        write_pkl('data/details.pkl', details)
        write_pkl('data/errlog.pkl', errlog)

# save once more so the rows after the last checkpoint are not lost
write_pkl('data/details.pkl', details)
write_pkl('data/errlog.pkl', errlog)
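
On the point about referring to robots.txt: the standard library's `urllib.robotparser` can check whether a URL may be fetched. The sketch below parses a hypothetical robots.txt supplied as text, so it runs offline; the rules, the `example.org` URLs, and the `petition-notebook` user agent name are all assumptions for illustration and say nothing about the real site's rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only
sample_robots = """\
User-agent: *
Disallow: /admin/
Crawl-delay: 1
""".splitlines()

parser = RobotFileParser()
parser.parse(sample_robots)

# Hypothetical user agent name for this notebook
agent = 'petition-notebook'

# Ask whether each path is permitted under the sample rules
allowed = parser.can_fetch(agent, 'https://example.org/petitions/1.json')
blocked = parser.can_fetch(agent, 'https://example.org/admin/settings')

# Requested delay between requests in seconds, or None if not specified
delay = parser.crawl_delay(agent)

print(allowed, blocked, delay)
```

Against a live site, `parser.set_url(...)` followed by `parser.read()` would fetch the real file instead of the sample text.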
        

Add the details dicts to the dataframe.

In [102]:
indices, dicts = list(zip(*details)) # unzip the (index, dict) tuples
df['Details Dict'] = list(dicts)
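
To make the two steps concrete, here is a small pure-Python sketch of the signature filter and the unzip. The `petition_dict` below is a hypothetical attributes dict; its field names are assumptions chosen to show which keys the `'signature' not in key` test removes.

```python
# Hypothetical attributes dict for one petition
petition_dict = {
    'action': 'Ban unpaid internships',
    'state': 'closed',
    'signature_count': 438,
    'signatures_by_country': [],
}

# Keep every field whose name does not contain 'signature'
filtered = {key: value for key, value in petition_dict.items() if 'signature' not in key}

# One successful entry and one failed request (marked with an empty string)
details = [(0, filtered), (1, '')]

# zip(*...) transposes the list of pairs into a tuple of indices and a tuple of dicts
indices, dicts = zip(*details)

print(indices)
print(dicts)
```

Assigning `list(dicts)` to a column relies on `details` holding exactly one entry per dataframe row, in row order, which holds when the scraping loop has run over every row.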

In [103]:
df

| Unnamed: 0 | Petition | URL | State | Signatures Count | Details Dict |
|---|---|---|---|---|---|
| 0 | Introduce a rate increase cap on pay TV pricin... | https://petition.parliament.uk/archived/petiti... | closed | 22 | {'action': 'Introduce a rate increase cap on p... |
| 1 | Impose a heavy extra tax on foreign buyers of ... | https://petition.parliament.uk/archived/petiti... | closed | 383 | {'action': 'Impose a heavy extra tax on foreig... |
