# legislation.gov.uk API


Published legislation is available from [legislation.gov.uk](https://www.legislation.gov.uk/) via an API. The [API guidance](https://legislation.github.io/data-documentation/api/overview.html) covers it in full; this notebook demonstrates basic usage.

Data is available in several [file formats](https://legislation.github.io/data-documentation/formats/overview.html). The recommended format is [XML](https://legislation.github.io/data-documentation/api/xml-intro.html), and the XML dialect is documented in the [CLML schema user guide](https://legislation.github.io/clml-schema/userguide.html).

A [bulk download service](http://leggovuk-ldn.s3-website.eu-west-2.amazonaws.com/) is also available, but note that the data it provides is a couple of years out of date.

## Get data via API

This section shows how to obtain the [Theft Act 1968](https://www.legislation.gov.uk/ukpga/1968/60/contents) via the API.

In [None]:
import requests

url = "https://www.legislation.gov.uk/ukpga/1968/60/data.xml"

In [None]:
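# Request the XML; the timeout stops the call from hanging indefinitely
response = requests.get(url, timeout=30)

if response.ok:
    print("Ok!")
    print(response.url)
else:
    print(f"Request failed with status {response.status_code}")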
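
The URL above follows a type/year/number pattern with `data.xml` appended. As a minimal sketch, assuming other items of legislation follow the same pattern, a small helper can build these URLs. The names `BASE_URL` and `build_data_url` are hypothetical and not part of the API.

```python
BASE_URL = "https://www.legislation.gov.uk"


def build_data_url(doc_type: str, year: int, number: int) -> str:
    """Return the XML data URL for one item of legislation.

    Assumes the /{type}/{year}/{number}/data.xml pattern seen in the
    Theft Act 1968 URL also applies to the item requested.
    """
    return f"{BASE_URL}/{doc_type}/{year}/{number}/data.xml"


# Reproduces the Theft Act 1968 URL used in this section
print(build_data_url("ukpga", 1968, 60))
```

The result can be passed straight to `requests.get` in place of the hard-coded `url`.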

## Parse XML data using BeautifulSoup

There are various Python libraries available for parsing XML data. Here's an example using BeautifulSoup.

In [None]:
from bs4 import BeautifulSoup

# The "xml" parser requires the lxml package to be installed
soup = BeautifulSoup(response.text, "xml")

In [None]:
# Get the Act title
soup.title.string

In [None]:
# The URL of the document on the legislation.gov.uk website
soup.identifier.string

In [None]:
# Act introduction
soup.description.string

In [None]:
# Preview text
soup.get_text(" ", strip=True)[:1000]
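
As one alternative to BeautifulSoup, the standard library's `xml.etree.ElementTree` can read the same metadata. The sketch below assumes the title and identifier are stored as Dublin Core elements (namespace `http://purl.org/dc/elements/1.1/`). The helper `dc_field` and the tiny `sample` document are hypothetical stand-ins for illustration, not the real response.

```python
import xml.etree.ElementTree as ET

# Dublin Core namespace; assumed to be the one used for title/identifier
DC_NS = "http://purl.org/dc/elements/1.1/"


def dc_field(xml_data, name):
    """Return the text of the first Dublin Core element called `name`, or None."""
    root = ET.fromstring(xml_data)
    element = root.find(f".//{{{DC_NS}}}{name}")
    return element.text if element is not None else None


# Hypothetical, heavily simplified document used only to show the call
sample = """<Legislation xmlns:dc="http://purl.org/dc/elements/1.1/">
  <Metadata>
    <dc:title>Theft Act 1968</dc:title>
  </Metadata>
</Legislation>"""

print(dc_field(sample, "title"))
```

Against the live document you would pass `response.content` instead of `sample`. Unlike BeautifulSoup, ElementTree requires the namespace to be spelled out in the search path.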

## Extract text from specific Sections

In [None]:
# Get the ids of all Sections in the Act, skipping any group without a P1 or an id
section_ids = [
    group.P1.get("id")
    for group in soup.find_all("P1group")
    if group.P1 is not None and group.P1.get("id") is not None
]
section_ids

In [None]:
# Get text from a specific Section
soup.find(id=section_ids[0]).get_text(" ")
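
Joining text nodes with a space separator may leave stray spaces around punctuation and repeated whitespace. The following is a minimal sketch of one way to tidy the extracted string. The helper `tidy_text` and the `sample` string are hypothetical, and the rules cover only a few common punctuation marks.

```python
import re


def tidy_text(text: str) -> str:
    """Collapse whitespace and remove stray spaces around punctuation."""
    # Collapse runs of whitespace (including newlines) into single spaces
    tidied = re.sub(r"\s+", " ", text).strip()
    # Drop spaces before closing punctuation such as "." or ")"
    tidied = re.sub(r"\s+([,.;:)])", r"\1", tidied)
    # Drop spaces after an opening bracket
    return re.sub(r"\(\s+", "(", tidied)


# Hypothetical string imitating the spacing that get_text(" ") can produce
sample = "Basic  definition of theft .\n ( 1 ) A person is guilty ;"
print(tidy_text(sample))
```

It can be applied directly to a Section, for example `tidy_text(soup.find(id=section_ids[0]).get_text(" "))`.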