# Exploring the Legislation Corpus with Neo4j

In [5]:
# Load dotenv
from dotenv import load_dotenv
import os

load_dotenv()

NEO4J_URI = os.getenv("NEO4J_URI")
NEO4J_USER = os.getenv("NEO4J_USER")
NEO4J_PASSWORD = os.getenv("NEO4J_PASSWORD")
NEO4J_DATABASE = os.getenv("NEO4J_DATABASE", "neo4j")

In [6]:
from neo4j_analysis import Neo4jAnalysis

# Initialize the analysis helper
analysis = Neo4jAnalysis(NEO4J_URI, NEO4J_USER, NEO4J_PASSWORD, NEO4J_DATABASE)

In [7]:
colors = {
    "Legislation": "#1f77b4",  # Blue for Legislation
    "Part": "#ff7f0e",  # Orange for Parts
    "Chapter": "#2ca02c",  # Green for Chapters
    "Section": "#d62728",  # Red for Sections
    "Paragraph": "#9467bd",  # Purple for Paragraphs
    "Schedule": "#8c564b",  # Brown for Schedules
    "ScheduleParagraph": "#e377c2",  # Pink for Schedule Paragraphs
    "ScheduleSubparagraph": "#7f7f7f",  # Gray for Schedule Subparagraphs
    "Commentary": "#bcbd22",  # Olive for Commentaries
    "Citation": "#17becf",  # Cyan for Citations
    "CitationSubRef": "#aec7e8",  # Light Blue for Citation Sub References
    "ExplanatoryNotes": "#ffbb78",  # Light Orange for Explanatory Notes
    "ExplanatoryNotesParagraph": "#98df8a",  # Light Green for Explanatory Notes Paragraphs
}

## The complete graph schema

We will start by visualising the complete graph schema of the Legislation Corpus. This will give us an overview of the different types of nodes and relationships that exist in the dataset.

In [8]:
# Show the graph schema
from neo4j_viz.neo4j import from_neo4j, ColorSpace

query = """
CALL db.schema.visualization()
"""
results = analysis.run_query_viz(query)

VG = from_neo4j(results)
VG.color_nodes(
    field="caption",  # Using the internal labels property
    color_space=ColorSpace.DISCRETE,
    colors=colors,
)

generated_html = VG.render(layout="forcedirected")
await analysis.capture_graph_to_png(
    generated_html, "renderings/schema_graph.png", width=1080, height=1080
)

![Graph Schema](renderings/schema_graph.png)

## The corpus

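The following query retrieves every `Legislation` node in the corpus, along with its category, status, title, URI, and enactment date. We then keep the documents whose status is `final` and show the ten most recently enacted.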

In [9]:
query = """
MATCH (l:Legislation)
RETURN l.category AS Category, l.status AS Status, l.title AS Title, l.uri AS URI, l.enactment_date AS Enactment
ORDER BY Enactment
"""

corpus_df = analysis.run_query_df(query)
# filter by Status="final" and sort by enactment date
corpus_df[corpus_df["Status"] == "final"].sort_values(
    "Enactment", ascending=False
).head(10)

| | Category | Status | Title | URI | Enactment |
|---|---|---|---|---|---|
| 777 | primary | final | General Cemetery Act 2025 | http://www.legislation.gov.uk/ukla/2025/2/enacted | 2025-10-27 |
| 729 | primary | final | Housing (Amendment) Act (Northern Ireland) 2020 | http://www.legislation.gov.uk/nia/2020/5/enacted | 2020-08-28 |
| 700 | primary | final | City of London Corporation (Open Spaces) Act 2018 | http://www.legislation.gov.uk/ukla/2018/1/enacted | 2018-03-15 |
| 633 | primary | final | Humber Bridge Act 2013 | http://www.legislation.gov.uk/ukla/2013/6/enacted | 2013-12-18 |
| 267 | primary | final | Social Security (Contributions) Act 1982 | http://www.legislation.gov.uk/ukpga/1982/2/ena... | 1988-02-02 |
| 253 | primary | final | Social Fund (Maternity and Funeral Expenses) A... | http://www.legislation.gov.uk/ukpga/1987/7/ena... | 1987-03-17 |
| 237 | primary | final | Statute Law (Repeals) Act 1986 | http://www.legislation.gov.uk/ukpga/1986/12/en... | 1986-05-02 |
| 180 | primary | final | Social Security (Contributions) Act 1981 | http://www.legislation.gov.uk/ukpga/1981/1/ena... | 1981-01-29 |
| 172 | primary | final | Social Security (No. 2) Act 1980 | http://www.legislation.gov.uk/ukpga/1980/39/en... | 1980-07-17 |
| 153 | primary | final | Housing (Homeless Persons) Act 1977 | http://www.legislation.gov.uk/ukpga/1977/48/en... | 1977-07-29 |


## A piece of legislation down to the section level

With the graph in place, we can start exploring the corpus. We begin with a single piece of legislation, the **Corporation Tax Act 2010**, and explore its structure down to the section level.

In [10]:
query = """
MATCH p=(l:Legislation)-[:HAS_PART]->(:Part)-[:HAS_CHAPTER]->(:Chapter)-[:HAS_SECTION]->(:Section)
WHERE l.uri CONTAINS "ukpga/2010/4"
RETURN p
"""

results = analysis.run_query_viz(query)

VG = from_neo4j(results)
VG.color_nodes(
    field="caption",
    color_space=ColorSpace.DISCRETE,
    colors=colors,
)

generated_html = VG.render(layout="forcedirected")
await analysis.capture_graph_to_png(
    generated_html, "renderings/legislation_example.png", width=1080, height=1080
)

![Legislation Example](renderings/legislation_example.png)

## Focusing on a single part down to paragraphs and commentaries

Let us focus on a single part of the **Corporation Tax Act 2010** and explore the network of paragraphs and commentaries within it. The query selects the part with `part.order=2`; the `order` property of the `Paragraph` nodes likewise lets us reconstruct the original order of the paragraphs in the legislation.

In [11]:
query = """
MATCH p=(l:Legislation)-[:HAS_PART]->(part:Part)-[:HAS_CHAPTER]->(:Chapter)-[:HAS_SECTION]->(section:Section)-[:HAS_PARAGRAPH]->(para:Paragraph)-[:HAS_COMMENTARY]->(comm:Commentary)
WHERE l.uri CONTAINS "ukpga/2010/4" AND part.order=2
RETURN p
"""

results = analysis.run_query_viz(query)

VG = from_neo4j(results)
VG.color_nodes(
    field="caption",
    color_space=ColorSpace.DISCRETE,
    colors=colors,
)

generated_html = VG.render(layout="forcedirected", initial_zoom=1.0)
await analysis.capture_graph_to_png(
    generated_html, "renderings/legislation_example_detail.png", width=1080, height=1080
)



![Legislation Example](renderings/legislation_example_detail.png)
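
The role of the `order` property can be illustrated without a database. The sketch below is a minimal example on made-up rows: the field names `section_order` and `paragraph_order` are assumptions for illustration (only `part.order` appears in the query above), and the function simply sorts rows back into reading order.

```python
def in_reading_order(rows: list[dict]) -> list[dict]:
    """Sort rows by section first, then by paragraph within each section."""
    return sorted(rows, key=lambda r: (r["section_order"], r["paragraph_order"]))


# Made-up rows, deliberately shuffled; the keys are hypothetical.
rows = [
    {"section_order": 2, "paragraph_order": 1, "text": "s2 p1"},
    {"section_order": 1, "paragraph_order": 2, "text": "s1 p2"},
    {"section_order": 1, "paragraph_order": 1, "text": "s1 p1"},
]

ordered_texts = [r["text"] for r in in_reading_order(rows)]
```

The same idea could be pushed into Cypher with an `ORDER BY` clause on the relevant `order` properties, provided the query returns them as columns.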

## Commentaries

Let us now retrieve the network of commentaries that cite a specific piece of legislation, here the **Data Protection Act 2018**. We filter on the `uri` property of the `Legislation` node, which uniquely identifies each piece of legislation in the corpus; note that the query uses a `CONTAINS` substring match on the URI rather than strict equality. You could also perform an exact or fuzzy match on the title of the legislation.

In [12]:
query = """
MATCH p=(:Commentary)-[:HAS_CITATION]->(:Citation)-[:CITES_ACT]->(l:Legislation)
WHERE l.uri CONTAINS "ukpga/2018/12"
RETURN p
"""

results = analysis.run_query_viz(query)

VG = from_neo4j(results)
VG.color_nodes(
    field="caption",
    color_space=ColorSpace.DISCRETE,
    colors=colors,
)

generated_html = VG.render(layout="forcedirected", initial_zoom=1.0)
await analysis.capture_graph_to_png(
    generated_html, "renderings/commentary_network.png", width=1080, height=1080
)

![Commentary Network](renderings/commentary_network.png)
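
The title-based matching mentioned above could look roughly like the following. This is a sketch under assumptions: `title_match_query` is a hypothetical helper, the "fuzzy" variant is only a case-insensitive substring match, and because it is not shown whether `Neo4jAnalysis` accepts query parameters, the function only builds the Cypher text and a parameter dictionary.

```python
def title_match_query(title: str, exact: bool = True) -> tuple[str, dict[str, str]]:
    """Build a Cypher query and parameters selecting Legislation by title."""
    if exact:
        condition = "l.title = $title"
    else:
        # Case-insensitive substring match: a simple stand-in for fuzzy matching.
        condition = "toLower(l.title) CONTAINS toLower($title)"
    query = (
        "MATCH p=(:Commentary)-[:HAS_CITATION]->(:Citation)-[:CITES_ACT]->(l:Legislation)\n"
        f"WHERE {condition}\n"
        "RETURN p"
    )
    return query, {"title": title}


exact_query, exact_params = title_match_query("Data Protection Act 2018")
loose_query, loose_params = title_match_query("data protection", exact=False)
```

Passing the title as a parameter rather than interpolating it into the query string avoids quoting problems with titles that contain apostrophes or brackets.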

## Schedules

Schedules appear in legislation as a means to include additional information that is relevant to the legislation but does not fit into the main body of the text. They are often used to provide details, examples, or exceptions related to specific sections of the legislation. In our graph, schedules are represented as `Schedule` nodes. The query below reaches them from the `Legislation` node via `HAS_SCHEDULE` and then descends into their paragraphs and subparagraphs.

In [13]:
query = """
MATCH p=(l:Legislation)-[:HAS_SCHEDULE]->(sc:Schedule)-[:HAS_PARAGRAPH]->(scp:ScheduleParagraph)-[:HAS_SUBPARAGRAPH]->(scsp:ScheduleSubparagraph)
WHERE l.uri CONTAINS "ukpga/2010/4"
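// Bind the optional commentary/citation path so it is returned alongside the schedule structure
OPTIONAL MATCH c=(scp)-[:HAS_COMMENTARY]-(:Commentary)-[:HAS_CITATION]-(:Citation)-[:HAS_SUBREF]->(:CitationSubRef)
RETURN p, c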
"""

results = analysis.run_query_viz(query)

VG = from_neo4j(results)
VG.color_nodes(
    field="caption",
    color_space=ColorSpace.DISCRETE,
    colors=colors,
)

generated_html = VG.render(layout="forcedirected", initial_zoom=1.0)
await analysis.capture_graph_to_png(
    generated_html, "renderings/schedules.png", width=1080, height=1080
)

![Schedule Example](renderings/schedules.png)

## Creating synthetic relationships

Let us now create synthetic `CITES_LEGISLATION` relationships between legislation nodes where one cites the other, and store the number of citations in a `weight` property.

In [None]:
query = """
// Assign the structural traversal to a path variable 'p'
MATCH p = (source:Legislation)-[:HAS_PART|HAS_CHAPTER|HAS_SECTION|HAS_PARAGRAPH|HAS_SCHEDULE|HAS_SUBPARAGRAPH|HAS_COMMENTARY|HAS_CITATION|HAS_SUBREF*1..10]->(citation_link)
// Match the final jump to the target legislation
MATCH (citation_link)-[:CITES_ACT|REFERENCES]->(target:Legislation)
// Prevent self-citations
WHERE source.uri <> target.uri
// Ensure no intermediate node is another Legislation node.
// nodes(p)[1..] skips the starting node (source) and checks everything else.
  AND ALL(n IN nodes(p)[1..] WHERE NOT n:Legislation)
// Aggregate and create the synthetic relationship
WITH source, target, count(citation_link) AS citation_count
MERGE (source)-[rel:CITES_LEGISLATION]->(target)
SET rel.weight = citation_count
"""

results = analysis.run_query(query)
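
The aggregation performed by the Cypher query above can be illustrated in plain Python. This is only a sketch on made-up identifiers (`act/A`, `act/B`, `act/C` are hypothetical): each tuple stands for one citation link found under a source document, self-citations are dropped, and the remaining links are counted per (source, target) pair to give the weight.

```python
from collections import Counter


def citation_weights(links: list[tuple[str, str]]) -> Counter:
    """Count citation links per (source, target) pair, skipping self-citations."""
    return Counter((source, target) for source, target in links if source != target)


# Made-up citation links: (source URI, target URI).
links = [
    ("act/A", "act/B"),
    ("act/A", "act/B"),
    ("act/A", "act/A"),  # self-citation, ignored
    ("act/B", "act/C"),
]

weights = citation_weights(links)
```

Each key of `weights` corresponds to one synthetic relationship, and its count to the `weight` property set by the query.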

We can then visualise the resulting graph, which shows the network of citations between Acts in our corpus.

In [15]:
query = """
MATCH p = (l1:Legislation)-[r:CITES_LEGISLATION]->(l2:Legislation)
RETURN p
LIMIT 1000
"""

results = analysis.run_query_viz(query)

VG = from_neo4j(results)
VG.color_nodes(
    field="caption",
    color_space=ColorSpace.DISCRETE,
    colors=colors,
)

generated_html = VG.render(layout="forcedirected", initial_zoom=1.0)
await analysis.capture_graph_to_png(
    generated_html, "renderings/citation_network.png", width=1080, height=1080
)

![Citation Network](renderings/citation_network.png)

Alternatively, a similar query returns the citations as a table, sorted by citation count.

In [16]:
query = """
MATCH p = (l1:Legislation)-[r:CITES_LEGISLATION]->(l2:Legislation)
RETURN l1.title AS Source, l2.title AS Target, r.weight AS CitationCount
ORDER BY CitationCount DESC
"""

citations_df = analysis.run_query_df(query)
citations_df.head(10)

| | Source | Target | CitationCount |
|---|---|---|---|
| 0 | The Insolvency (Northern Ireland) Order 1989 | | 22287 |
| 1 | Value Added Tax Act 1994 | Taxation (Cross-border Trade) Act 2018 | 16139 |
| 2 | The Insolvency (Northern Ireland) Order 1989 | The Payment and Electronic Money Institution I... | 11221 |
| 3 | Value Added Tax Act 1994 | The Value Added Tax (Miscellaneous and Transit... | 10764 |
| 4 | Insolvency Act 1986 | | 8586 |
| 5 | Value Added Tax Act 1994 | Taxation (Post-transition Period) Act 2020 | 5607 |
| 6 | Value Added Tax Act 1994 | The Value Added Tax (Miscellaneous Amendments,... | 5463 |
| 7 | Finance Act 2008 | Finance Act 2004 | 3499 |
| 8 | Gambling Act 2005 | The Gambling Act 2005 (Commencement No. 6 and ... | 3080 |
| 9 | National Health Service Act 1977 | National Health Service (Consequential Provisi... | 2761 |
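
Some `Target` values in the table above are blank, presumably because the target node has no title. The sketch below shows one way to summarise such rows: it sums the weights per target and falls back to the URI when the title is missing. The row keys (`target_title`, `target_uri`, `weight`) and the sample values are assumptions for illustration, not columns returned by the query above.

```python
from collections import defaultdict


def most_cited(rows: list[dict], top: int = 3) -> list[tuple[str, int]]:
    """Sum citation weights per target and return the highest totals first."""
    totals: dict[str, int] = defaultdict(int)
    for row in rows:
        # Fall back to the URI when the title is missing (None or empty).
        label = row.get("target_title") or row["target_uri"]
        totals[label] += row["weight"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:top]


# Made-up rows; one target has no title.
rows = [
    {"target_title": "Act B", "target_uri": "act/B", "weight": 5},
    {"target_title": None, "target_uri": "act/C", "weight": 7},
    {"target_title": "Act B", "target_uri": "act/B", "weight": 4},
]

ranking = most_cited(rows)
```

In Cypher, a comparable fallback could be expressed with `coalesce(l2.title, l2.uri)` in the `RETURN` clause.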


## Superseded legislation

Over time, legislation can be superseded by newer legislation. The graph records which pieces of legislation have been superseded and by which newer ones. We can query this information and visualise the resulting network.

In [17]:
query = """
MATCH p=(:Legislation)-[:SUPERSEDED_BY|SUPERSEDES]-(:Legislation)
RETURN p
"""

results = analysis.run_query_viz(query)
VG = from_neo4j(results)
VG.color_nodes(
    field="caption",
    color_space=ColorSpace.DISCRETE,
    colors=colors,
)

generated_html = VG.render(layout="forcedirected", initial_zoom=1.0)
await analysis.capture_graph_to_png(
    generated_html, "renderings/superseded_network.png", width=1080, height=1080
)

![Superseded Legislation Network](renderings/superseded_network.png)
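
Once supersession links are available, a common follow-up is to find the most recent version of a given document. The sketch below works on a plain dictionary of made-up identifiers and assumes each document has at most one direct successor, which may not hold in the real graph; `latest_version` is a hypothetical helper, not part of `Neo4jAnalysis`.

```python
def latest_version(uri: str, superseded_by: dict[str, str]) -> str:
    """Follow supersession links until reaching a document with no successor."""
    seen = {uri}
    current = uri
    while current in superseded_by:
        current = superseded_by[current]
        if current in seen:
            # Guard against circular data rather than looping forever.
            raise ValueError(f"cycle detected at {current}")
        seen.add(current)
    return current


# Made-up chain: act/1980 -> act/1995 -> act/2010.
superseded_by = {"act/1980": "act/1995", "act/1995": "act/2010"}

newest = latest_version("act/1980", superseded_by)
```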


## Rebuilding the full text of a piece of legislation

Because we hold the full text of legislation in the graph, we can run a query to retrieve all the text associated with a piece of legislation, and rebuild part or the whole of the text.

In [18]:
query = """
MATCH (l:Legislation)-[:HAS_PART|HAS_CHAPTER|HAS_SECTION|HAS_PARAGRAPH|HAS_SCHEDULE|HAS_SUBPARAGRAPH|HAS_EXPLANATORY_NOTES*0..6]->(node)
WHERE l.uri CONTAINS 'ukpga/2010/4'
  // Only keep nodes that actually hold readable content (ignoring empty structural containers)
  AND (node.text IS NOT NULL OR node.title IS NOT NULL OR node.description IS NOT NULL)
RETURN labels(node)[0] AS node_type,
       node.id AS node_id,
       node.number AS number,
       coalesce(node.text, node.title, node.description) AS content
"""

full_text_df = analysis.run_query_df(query)
full_text = full_text_df["content"].str.cat(sep="\n\n")
print(full_text[:2000])  # Print the first 2000 characters of the full text

Corporation Tax Act 2010

Introduction

Overview of Act

Part 2 is about calculation of the corporation tax chargeable on a company's profits, in particular— the rates at which corporation tax on profits is charged (see Chapter 2), ascertaining the amount of profits to which the rates of tax are applied (see Chapter 3), and the currency in which profits are to be calculated and expressed (see Chapter 4).

Parts 3A to 7 make provision for the following reliefs— . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . relief for companies with small profits (other than ring-fence profits) (see Part 3A), relief for trade losses (see Chapters 2 and 3 of Part 4), relief for losses from property businesses (see Chapter 4 of Part 4), relief for losses on a disposal of shares (see Chapter 5 of Part 4), relief for losses from miscellaneous transactions (see Chapter 6 of Part 4), group relief (see Part 5), group relief for carried-forward losses (see Part 5A), relief for qualifying charit
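
The content selection and joining steps used above can be mirrored in plain Python. This is a sketch on made-up nodes: `coalesce` imitates the Cypher function of the same name by returning the first non-null value, and `rebuild_text` is a hypothetical helper that skips nodes with no readable content. It does not address ordering; since the query has no `ORDER BY`, the order of the returned rows is not guaranteed.

```python
def coalesce(*values):
    """Return the first value that is not None, mirroring Cypher's coalesce()."""
    return next((value for value in values if value is not None), None)


def rebuild_text(nodes: list[dict], separator: str = "\n\n") -> str:
    """Join the readable content of each node, skipping nodes that have none."""
    contents = (
        coalesce(node.get("text"), node.get("title"), node.get("description"))
        for node in nodes
    )
    return separator.join(content for content in contents if content is not None)


# Made-up nodes standing in for query results.
nodes = [
    {"title": "Example Act"},
    {"title": "Introduction", "text": None},
    {},  # empty structural container, skipped
    {"text": "Section text.", "title": "Overview"},
]

rebuilt = rebuild_text(nodes)
```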