<a href="https://colab.research.google.com/github/samtaylor54321/digital-drafters/blob/main/problem_1_notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Setting up

In [None]:
import json
import requests
from anthropic import Anthropic
from google.colab import userdata
from bs4 import BeautifulSoup

client = Anthropic(
    api_key=userdata.get('anthropic_api'),  # Colab secret name; change this to wherever your key is stored
)

## User input

Provide the legislation URL and the number of sections to parse, counted from the start of the Act. The prompt can also be customised here.

In [None]:
url = "https://www.legislation.gov.uk/ukpga/1968/60/data.xml"
sections = 5
prompt = "Please summarise the following text so it's understandable for a policy expert who is not a legal expert. In your output, only include the summary. Do not include an acknowledgement of this prompt:\n"

## Processing

Fetch the legislation XML and parse it.

In [None]:
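response = requests.get(url)
response.raise_for_status()  # stop here if the request failed, before trying to parse

soup = BeautifulSoup(response.text, 'xml')  # the 'xml' parser requires lxml to be installed

print("Successfully retrieved the following:")
print(soup.title.string)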

Successfully retrieved the following:
Theft Act 1968


Collect the section ids and keep only the first `sections` of them.

In [None]:
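# Skip any P1group without a P1 child or without an id, then keep the first `sections` ids
section_ids = [group.P1.get("id") for group in soup.find_all("P1group") if group.P1 is not None]
section_ids = [section_id for section_id in section_ids if section_id is not None][:sections]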
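
Whether the slice happens before or after dropping `None` values changes how many ids survive. The sketch below illustrates this with made-up ids; the helper name `first_n_ids` is hypothetical and not part of the notebook.

```python
def first_n_ids(ids, n):
    """Return the first n ids that are not None, preserving order."""
    return [i for i in ids if i is not None][:n]


# Made-up ids for illustration; None stands for a group without a usable id.
sample_ids = ["section-1", None, "section-2", "section-3"]

# Slicing first can leave fewer than n ids once None values are dropped.
slice_then_filter = [i for i in sample_ids[:3] if i is not None]

# Filtering first returns n ids whenever at least n valid ids exist.
filter_then_slice = first_n_ids(sample_ids, 3)

print(slice_then_filter)  # ['section-1', 'section-2']
print(filter_then_slice)  # ['section-1', 'section-2', 'section-3']
```

If exactly `sections` summaries are expected downstream, filtering before slicing is the safer order.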

Call the API once per section and store the results in a nested dictionary, with one field for the original text and one for the explanatory text.

In [None]:
nested_output = {}

# Iterate over the ids actually found, so a short list cannot cause an IndexError
for i, section_id in enumerate(section_ids):
  original_text = soup.find(id=section_id).get_text(" ")
  prompt_text = f"{prompt}{original_text}"

  message = client.messages.create(
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": prompt_text,
        }
    ],
    model="claude-3-haiku-20240307",
  )

  out_text = message.to_dict()['content'][0].get('text', '')

  nested_output[i] = {
      'original': original_text,
      'explanation': out_text
  }
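
The shape of the nested dictionary can be checked without calling the API. The following is a minimal sketch that assumes a stand-in summariser; `build_output` and `fake_summarise` are hypothetical names used only for illustration, and the section texts are placeholders.

```python
def build_output(texts, summarise):
    """Map each section index to its original text and its explanation."""
    return {
        i: {"original": text, "explanation": summarise(text)}
        for i, text in enumerate(texts)
    }


def fake_summarise(text):
    """Stand-in for the API call so the sketch runs offline."""
    return f"Summary of: {text}"


# Placeholder texts, not real legislation.
result = build_output(["first section text", "second section text"], fake_summarise)

print(sorted(result.keys()))
print(result[0]["explanation"])
```

Passing the summariser in as a function makes it straightforward to swap the stand-in for the real API call later.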



In [None]:
# Store the output as JSON
with open('output_problem1.txt', 'w') as file:
    file.write(json.dumps(nested_output))
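
One thing to keep in mind when reading the file back: JSON object keys are always strings, so the integer section indices come back as `'0'`, `'1'`, and so on. A small sketch using a temporary directory and made-up content:

```python
import json
import os
import tempfile

# Made-up content with the same shape as the notebook's output.
nested = {0: {"original": "text", "explanation": "summary"}}

# Write to a temporary directory so the sketch leaves the working directory untouched.
path = os.path.join(tempfile.mkdtemp(), "output_problem1.txt")
with open(path, "w") as f:
    f.write(json.dumps(nested))

with open(path) as f:
    loaded = json.load(f)

print(list(loaded.keys()))  # ['0']
```

Converting the keys with `int(key)` after loading restores the original indexing if that is needed.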