# Ultimate Beneficial Owner (UBO) Analysis

UBO analysis is a critical component of KYC (Know Your Customer) processes, enabling organizations to identify the natural persons who ultimately own or control a company. This notebook demonstrates how to perform UBO analysis with Neo4j on Companies House data, focusing on indirect control through majority share ownership and voting rights.

Company and Persons with Significant Control (PSC) data from Companies House is used to identify the relationships between individuals and companies, allowing us to determine the ultimate beneficial owners. This data is highly connected and ideal for graph analysis. The notebook demonstrates how to query it, visualize the relationships, and interpret the results to identify UBOs effectively.

In [67]:
import dotenv
import os

dotenv.load_dotenv()

NEO4J_URI = os.getenv("NEO4J_URI")
NEO4J_USER = os.getenv("NEO4J_USER")
NEO4J_PASSWORD = os.getenv("NEO4J_PASSWORD")
NEO4J_DATABASE = os.getenv("NEO4J_DATABASE")

In [68]:
from neo4j_analysis import Neo4jAnalysis


# Initialize the analysis helper
analysis = Neo4jAnalysis(NEO4J_URI, NEO4J_USER, NEO4J_PASSWORD, NEO4J_DATABASE)

In [69]:
colors = {
    "Country": "#1f77b4",  # Blue for Countries
    "Address": "#ff7f0e",  # Orange for Addresses
    "Person": "#2ca02c",  # Green for Persons
    "PreviousName": "#d62728",  # Red for Previous Names
    "SICCode": "#9467bd",  # Purple for SIC Codes
    "Company": "#8c564b",  # Brown for Companies
    "CompanyCategory": "#e377c2",  # Pink for Company Categories
    "CompanyStatus": "#7f7f7f",  # Gray for Company Statuses
    "SupervisoryAuthority": "#bcbd22",  # Olive for Supervisory Authorities
    "AuthorisedCorporateServiceProvider": "#17becf",  # Cyan for Authorised Corporate Service Providers
    "Organization": "#aec7e8",  # Light Blue for Organizations
}

## Basic UBO Analysis: Direct Control

One of the simplest forms of UBO analysis is to identify individuals or organizations who have direct control over a company.

> Note the `r.ceased_on IS NULL` condition in the query, which ensures that we are only considering active control relationships. Without this condition, we might include relationships that have ended, which could lead to inaccurate conclusions about current UBOs.
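The effect of the `ceased_on IS NULL` filter can be sketched outside the database. The `ControlEdge` record, the `active_controllers` helper and the sample data below are hypothetical and only mimic the shape of a `CONTROLS` relationship; this is an illustration, not how `Neo4jAnalysis` works internally.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ControlEdge:
    """Hypothetical stand-in for one CONTROLS relationship."""

    controller: str  # name of the controlling Person or Organization
    company_number: str  # Company.number of the controlled company
    ceased_on: Optional[str] = None  # None while control is still active


def active_controllers(edges: List[ControlEdge], company_number: str) -> List[str]:
    """Return, sorted, the controllers whose control of company_number has not ceased."""
    return sorted(
        e.controller
        for e in edges
        if e.company_number == company_number and e.ceased_on is None
    )


edges = [
    ControlEdge("Alice", "SL035117"),
    ControlEdge("Bob", "SL035117", ceased_on="2021-06-30"),  # ended: excluded
    ControlEdge("Acme Ltd", "SL035117"),
    ControlEdge("Alice", "00000001"),  # a different company
]
print(active_controllers(edges, "SL035117"))  # ['Acme Ltd', 'Alice']
```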

In [70]:
from neo4j_viz.neo4j import from_neo4j, ColorSpace

voting_rights_query = """
MATCH p=(ctrl:Organization|Person)-[r:CONTROLS]->(c:Company)
WHERE c.number = "SL035117" AND r.ceased_on IS NULL
MATCH p1=(ctrl)-[:BASED_IN|RESIDES_IN]->(co:Country)
RETURN p, p1
"""

results = analysis.run_query_viz(voting_rights_query)

VG = from_neo4j(results)
VG.color_nodes(
    field="caption",  # Using the internal labels property
    color_space=ColorSpace.DISCRETE,
    colors=colors,
)

label_to_property = {
    "Organization": "name",
    "Person": "name",
    "Company": "name",
    "Country": "name",
}

analysis.set_caption_by_label(VG, label_to_property)

generated_html = VG.render(layout="forcedirected", initial_zoom=1.9)
await analysis.capture_graph_to_png(
    generated_html, "renderings/voting_rights_graph.png"
)

![Voting Rights Graph](renderings/voting_rights_graph.png)

## Visually distinguishing levels of control

We can visually distinguish levels of control by styling relationships according to their voting and share rights. The following query assigns each `CONTROLS` relationship a `thickness` tier from 1 to 4 (low to high control) based on its minimum voting-rights and share-ownership percentages; later visualizations use this property for both the width and the color of relationships.
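As a rough illustration of the banding, the hypothetical `control_tier` helper below maps the two percentages to a tier. It assumes a relationship is tiered by the higher of its voting-rights and share-ownership minimums, with missing values treated as 0 (the role `coalesce(..., 0)` plays in Cypher).

```python
from typing import Optional


def control_tier(voting_min: Optional[float], shares_min: Optional[float]) -> int:
    """Map a CONTROLS relationship to a thickness tier from 1 (low) to 4 (high).

    Assumption: the tier is driven by the higher of the two minimum
    percentages, and a missing value counts as 0.
    """
    highest = max(voting_min or 0, shares_min or 0)
    if highest <= 25:
        return 1
    if highest <= 50:
        return 2
    if highest <= 75:
        return 3
    return 4


for v, s in [(None, None), (25, 10), (None, 50), (75, 30), (80, None)]:
    print(v, s, control_tier(v, s))
```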

In [71]:
thickness_query = """
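MATCH (:Person|Organization)-[r:CONTROLS]->(:Company)
// Batch the incoming relationships: each row of the outer MATCH feeds the subquery
CALL (r) {
    // Tier by the higher of the voting and share percentages; missing values count as 0
    SET r.thickness = CASE
        WHEN coalesce(r.voting_rights_min, 0) <= 25 AND coalesce(r.ownership_of_shares_min, 0) <= 25 THEN 1
        WHEN coalesce(r.voting_rights_min, 0) <= 50 AND coalesce(r.ownership_of_shares_min, 0) <= 50 THEN 2
        WHEN coalesce(r.voting_rights_min, 0) <= 75 AND coalesce(r.ownership_of_shares_min, 0) <= 75 THEN 3
        ELSE 4
    END
} IN TRANSACTIONS OF 50000 ROWS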
"""

results = analysis.run_query(thickness_query)

## Extended control

Let's now extend the prior example to show which other companies the controllers of `SL035117` also control, and how those companies are connected. Degrees of control are now visually distinct through the thickness and color of the relationships.
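In plain Python, the same "other interests" lookup amounts to grouping control edges by controller. This is a minimal sketch over hypothetical `(controller, company_number)` pairs that are assumed to be active already; as with the `OPTIONAL MATCH`, a controller with no other interests still appears, with an empty list.

```python
from collections import defaultdict

# Hypothetical active CONTROLS edges as (controller, company_number) pairs.
edges = [
    ("Alice", "SL035117"),
    ("Alice", "SC111111"),
    ("Acme Ltd", "SL035117"),
    ("Acme Ltd", "01234567"),
    ("Acme Ltd", "SC111111"),
    ("Carol", "09999999"),  # unrelated controller
]


def other_interests(edges, company_number):
    """For each controller of company_number, list the other companies it controls."""
    controlled_by = defaultdict(set)
    for controller, number in edges:
        controlled_by[controller].add(number)
    return {
        controller: sorted(numbers - {company_number})
        for controller, numbers in controlled_by.items()
        if company_number in numbers
    }


print(other_interests(edges, "SL035117"))
# {'Alice': ['SC111111'], 'Acme Ltd': ['01234567', 'SC111111']}
```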

In [72]:
voting_rights_expanded_query = """
MATCH p=(ctrl:Organization|Person)-[r1:CONTROLS]->(c:Company)
WHERE c.number = "SL035117" AND r1.ceased_on IS NULL
MATCH p1=(ctrl)-[:BASED_IN|RESIDES_IN]->(co:Country)
OPTIONAL MATCH p2=(ctrl)-[r2:CONTROLS]->(other_c:Company)
WHERE other_c <> c AND r2.ceased_on IS NULL
RETURN p, p1, p2
"""

results = analysis.run_query_viz(voting_rights_expanded_query)

VG = from_neo4j(results)
VG.color_nodes(
    field="caption",  # Using the internal labels property
    color_space=ColorSpace.DISCRETE,
    colors=colors,
)
VG.resize_relationships(
    property="thickness",  # Using the thickness property set by the query
)
VG.color_relationships(
    property="thickness",  # Using the thickness property to color relationships
    color_space=ColorSpace.DISCRETE,
    colors={
        1: "blue",  # Blue for low control (<=25% voting and share rights)
        2: "orange",  # Orange for medium control (26-50% voting and share rights)
        3: "red",  # Red for high control (51-75% voting and share rights)
        4: "purple",  # Purple for anything else (>75% voting and share rights)
    },
)

label_to_property = {
    "Organization": "name",
    "Person": "name",
    "Company": "name",
    "Country": "name",
}

analysis.set_caption_by_label(VG, label_to_property)

generated_html = VG.render(layout="forcedirected", initial_zoom=1.3)
await analysis.capture_graph_to_png(
    generated_html, "renderings/voting_rights_expanded_graph.png"
)

![Expanded Voting Rights Graph](renderings/voting_rights_expanded_graph.png)

## Indirect control

Indirect control isn't always explicitly represented in the data. In this graph, a controller that is itself a company appears as an `Organization` node, separate from the `Company` node for the same legal entity, and Companies House data may record control through share ownership without recording voting control (or vice versa). In such cases we can infer indirect control by linking the two nodes (matching `Organization.registration_number` to `Company.number`) and applying rules to the ownership structure (e.g., a person holding more than 50% of an intermediary's shares has effective control over it). The inferred links can then be added to the graph, either as virtual relationships at query time or as persistent ones.

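What follows is such an example, where company `02188631` acts as the "bridge": a person holds majority control of it (more than 50% of votes or shares), and the same entity, recorded as an `Organization`, holds at least 25% of votes or shares in other companies. The resulting graph highlights these indirect control relationships, with a virtual `SAME_AS` relationship joining the two nodes that represent the bridge, allowing us to better understand the control dynamics within the corporate structure.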
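The two-step rule can be sketched in plain Python. The sample records, company numbers such as `TARGET-A`, and the `meets`/`infer_indirect_control` helpers are hypothetical; they mirror the query's thresholds (more than 50% for person to bridge, at least 25% for bridge to target) and its handling of missing percentages, which never satisfy a comparison.

```python
# Hypothetical sample data mirroring the graph:
#   person_controls: (person, company_number, voting_min, shares_min)
#   org_controls:    (org_registration_number, target_company, voting_min, shares_min)
person_controls = [
    ("Alice", "02188631", 75, None),
    ("Bob", "02188631", None, 30),
]
org_controls = [
    ("02188631", "TARGET-A", None, 25),
    ("02188631", "TARGET-B", 10, 10),
]


def meets(voting_min, shares_min, threshold, strict):
    """True if either known percentage passes the threshold (> if strict, else >=)."""
    values = [v for v in (voting_min, shares_min) if v is not None]
    return any(v > threshold if strict else v >= threshold for v in values)


def infer_indirect_control(person_controls, org_controls, second_hop_min=25, strict_second=False):
    """Infer (person, bridge company, target) paths.

    Step 1 requires majority (>50%) control of the bridge company.
    Step 2 requires the bridge, acting as an Organization whose registration
    number equals the company number, to pass second_hop_min on the target.
    """
    paths = []
    for person, bridge, v1, s1 in person_controls:
        if not meets(v1, s1, 50, strict=True):
            continue
        for reg_no, target, v2, s2 in org_controls:
            if reg_no == bridge and meets(v2, s2, second_hop_min, strict_second):
                paths.append((person, bridge, target))
    return paths


print(infer_indirect_control(person_controls, org_controls))
# [('Alice', '02188631', 'TARGET-A')]
```

Calling `infer_indirect_control(person_controls, org_controls, second_hop_min=50, strict_second=True)` applies a strict majority rule to both steps, which returns an empty list for this sample.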

In [73]:
indirect_control_query = """
MATCH (p:Person)-[r1:CONTROLS]->(inter_co:Company)
WHERE inter_co.number = "02188631" AND r1.ceased_on IS NULL
MATCH (inter_org:Organization)-[r2:CONTROLS]->(target_c:Company)
WHERE inter_org.registration_number = inter_co.number AND r2.ceased_on IS NULL
  AND (r1.voting_rights_min > 50 OR r1.ownership_of_shares_min > 50)
  AND (r2.voting_rights_min >= 25 OR r2.ownership_of_shares_min >= 25)

// Group everything by the "Bridge" (Company + Org)
WITH inter_co, inter_org, collect({
    person: p, 
    control_r1: r1, 
    control_r2: r2, 
    target: target_c
}) AS ownership_paths

// Create the virtual relationship ONCE per bridge
WITH inter_co, inter_org, ownership_paths, 
     apoc.create.vRelationship(inter_co, 'SAME_AS', {type: 'identity_link'}, inter_org) AS vRel

// Unwind the paths back out to show the full graph
UNWIND ownership_paths AS path
RETURN path.person, 
       path.control_r1, 
       inter_co, 
       vRel,
       inter_org, 
       path.control_r2, 
       path.target
"""

results = analysis.run_query_viz(indirect_control_query)

VG = from_neo4j(results)
VG.color_nodes(
    field="caption",
    color_space=ColorSpace.DISCRETE,
    colors=colors,
)
VG.resize_relationships(
    property="thickness",
)
VG.color_relationships(
    property="thickness",  # Using the thickness property to color relationships
    color_space=ColorSpace.DISCRETE,
    colors={
        1: "blue",  # Blue for low control
        2: "orange",  # Orange for medium control
        3: "red",  # Red for high control
        4: "purple",  # Purple for anything else
    },
)

analysis.set_caption_by_label(VG, label_to_property)

generated_html = VG.render(layout="forcedirected", initial_zoom=2.1)
await analysis.capture_graph_to_png(
    generated_html, "renderings/voting_rights_indirect_control_graph.png"
)

![Indirect Control Graph](renderings/voting_rights_indirect_control_graph.png)

We can also look for indirect control structures starting from a specific Person (identified by `uid`) rather than from an intermediary company.

In [74]:
person_indirect_control_query = """
MATCH (p:Person)-[r1:CONTROLS]->(inter_co:Company)
WHERE p.uid="025f044720e0cd9c0f42fb06fbbb9f77" AND r1.ceased_on IS NULL
MATCH (inter_org:Organization)-[r2:CONTROLS]->(target_c:Company)
WHERE inter_org.registration_number = inter_co.number AND r2.ceased_on IS NULL
  AND (r1.voting_rights_min > 50 OR r1.ownership_of_shares_min > 50)
  AND (r2.voting_rights_min >= 25 OR r2.ownership_of_shares_min >= 25)

WITH inter_co, inter_org, collect({
    person: p, 
    control_r1: r1, 
    control_r2: r2, 
    target: target_c
}) AS ownership_paths

WITH inter_co, inter_org, ownership_paths, 
     apoc.create.vRelationship(inter_co, 'SAME_AS', {type: 'identity_link'}, inter_org) AS vRel

UNWIND ownership_paths AS path
RETURN path.person, 
       path.control_r1, 
       inter_co, 
       vRel,            // This single object is now shared by all rows
       inter_org, 
       path.control_r2, 
       path.target
"""

results = analysis.run_query_viz(person_indirect_control_query)

VG = from_neo4j(results)
VG.color_nodes(
    field="caption",
    color_space=ColorSpace.DISCRETE,
    colors=colors,
)
VG.resize_relationships(
    property="thickness",
)
VG.color_relationships(
    property="thickness",  # Using the thickness property to color relationships
    color_space=ColorSpace.DISCRETE,
    colors={
        1: "blue",  # Blue for low control
        2: "orange",  # Orange for medium control
        3: "red",  # Red for high control
        4: "purple",  # Purple for anything else
    },
)

analysis.set_caption_by_label(VG, label_to_property)

generated_html = VG.render(layout="forcedirected", initial_zoom=0.8)
await analysis.capture_graph_to_png(
    generated_html, "renderings/person_voting_rights_indirect_control_graph.png"
)

![Person's Indirect Control Graph](renderings/person_voting_rights_indirect_control_graph.png)

We can also restrict the previous query to **majority control** only, requiring more than 50% of shares or voting rights at *both* steps: person → intermediary and intermediary → target. This focuses the analysis on the most significant control relationships while still capturing both direct and indirect influence of the person of interest.

This lets us uncover [**"Russian doll"**](https://www.mondaq.com/unitedstates/securities/1424234/the-unravelling-of-the-matryoshka-doll-impact-of-the-cta-on-entities-having-nexus-to-the-us) structures, where a person controls a company through multiple layers of ownership. Identifying these layers is crucial for UBO analysis and for understanding who ultimately controls a company.

In [75]:
person_majority_indirect_control_query = """
MATCH (p:Person)-[r1:CONTROLS]->(inter_co:Company)
WHERE p.uid="025f044720e0cd9c0f42fb06fbbb9f77" AND r1.ceased_on IS NULL
MATCH (inter_org:Organization)-[r2:CONTROLS]->(target_c:Company)
WHERE inter_org.registration_number = inter_co.number AND r2.ceased_on IS NULL

// STRICT CONTROL: Apply >50% (Majority) Rule to BOTH steps
  // Person must control the Intermediary (>50%)
  AND (r1.voting_rights_min > 50 OR r1.ownership_of_shares_min > 50)
  // Intermediary must control the Target (>50%)
  AND (r2.voting_rights_min > 50 OR r2.ownership_of_shares_min > 50)

// Deduplication Logic (for clean visualization)
WITH inter_co, inter_org, collect({
    person: p, 
    control_r1: r1, 
    control_r2: r2, 
    target: target_c
}) AS ownership_paths

// Create the virtual relationship ONCE per bridge
WITH inter_co, inter_org, ownership_paths, 
     apoc.create.vRelationship(inter_co, 'SAME_AS', {type: 'identity_link'}, inter_org) AS vRel

// Unwind back to individual paths
UNWIND ownership_paths AS path
RETURN path.person, 
       path.control_r1, 
       inter_co, 
       vRel,
       inter_org, 
       path.control_r2, 
       path.target
"""

results = analysis.run_query_viz(person_majority_indirect_control_query)

VG = from_neo4j(results)
VG.color_nodes(
    field="caption",
    color_space=ColorSpace.DISCRETE,
    colors=colors,
)
VG.resize_relationships(
    property="thickness",
)
VG.color_relationships(
    property="thickness",  # Using the thickness property to color relationships
    color_space=ColorSpace.DISCRETE,
    colors={
        1: "blue",  # Blue for low control
        2: "orange",  # Orange for medium control
        3: "red",  # Red for high control
        4: "purple",  # Purple for anything else
    },
)

analysis.set_caption_by_label(VG, label_to_property)

generated_html = VG.render(layout="forcedirected", initial_zoom=0.8)
await analysis.capture_graph_to_png(
    generated_html, "renderings/person_majority_indirect_control_graph.png"
)

![Majority Indirect Control Graph](renderings/person_majority_indirect_control_graph.png)

## Deep "Russian Dolls"

Because graph queries can follow variable-length paths, we can easily extend the analysis to uncover deeper "Russian doll" structures, where a person controls a company through multiple layers of ownership. Instead of writing one pattern per layer, we apply the indirect control logic to every hop of a path of arbitrary length, allowing us to identify UBOs even in complex ownership structures.

Let us run a query which uncovers such deeper structures.

> Note that there is no assumption of malicious intent in such structures; they can be perfectly legitimate. However, identifying them is crucial for transparency and for understanding the true control behind a company, which is the essence of UBO analysis.

First we need to create the necessary persistent links between companies and organizations, so we can then recursively traverse them to uncover deeper structures. This is done in the following query.
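The bridge is an equality join on registration number. The sketch below, using hypothetical records, builds the same Company → Organization pairs with a dictionary keyed by company number, so each organization costs one lookup rather than a comparison against every company; in Neo4j, an index on `Company.number` would likely play a similar role.

```python
# Hypothetical node records: Company.number and Organization.registration_number.
companies = [{"number": "02188631"}, {"number": "SL035117"}, {"number": "01234567"}]
organizations = [
    {"name": "Bridge Holdings", "registration_number": "02188631"},
    {"name": "Foreign Trust", "registration_number": None},  # no UK number: no link
    {"name": "Other Ltd", "registration_number": "01234567"},
]


def bridge_links(companies, organizations):
    """Return (company_number, organization_name) pairs, directed Company -> Organization."""
    by_number = {c["number"]: c for c in companies}  # one pass to index companies
    return [
        (o["registration_number"], o["name"])
        for o in organizations
        if o["registration_number"] in by_number  # O(1) lookup per organization
    ]


print(bridge_links(companies, organizations))
# [('02188631', 'Bridge Holdings'), ('01234567', 'Other Ltd')]
```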

In [76]:
create_bridge_relationships_query = """
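// Create persistent links between Companies and Orgs.
// The direction (Company -> Organization) matches the direction
// traversed by the recursive query that follows.
MATCH (o:Organization)
WHERE o.registration_number IS NOT NULL
MATCH (c:Company {number: o.registration_number})
MERGE (c)-[:SAME_AS]->(o);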
"""

results = analysis.run_query(create_bridge_relationships_query)

Now we can run the recursive query to uncover deeper "Russian doll" structures and visualize their layers of ownership and control. This way we can identify UBOs even when there are many layers of ownership. In the following case, we visualize "Russian doll" structures about five layers deep: paths of nine or more hops alternating between `CONTROLS` and `SAME_AS`.
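The traversal can be sketched as a depth-first search. Because `CONTROLS` and `SAME_AS` hops alternate, a chain of h hops from a person to a company passes through (h + 1) / 2 companies, so nine hops corresponds to five companies. The edge list and `deep_chains` helper below are hypothetical; the sample edges are assumed active (no `ceased_on`), and the search forbids revisiting nodes, which is stricter than Cypher's rule of never reusing a relationship within one path.

```python
from collections import defaultdict

# Hypothetical edges: (source, target, type, voting_min).
# CONTROLS goes Person/Organization -> Company; SAME_AS goes Company -> Organization.
edges = [
    ("P:Alice", "C:1", "CONTROLS", 75),
    ("C:1", "O:1", "SAME_AS", None),
    ("O:1", "C:2", "CONTROLS", 100),
    ("C:2", "O:2", "SAME_AS", None),
    ("O:2", "C:3", "CONTROLS", 60),
    ("O:2", "C:9", "CONTROLS", 30),  # minority stake: pruned
]


def deep_chains(edges, start, min_hops):
    """Find chains of majority voting control starting at `start`.

    Each CONTROLS hop must carry voting_min > 50; SAME_AS hops always pass.
    A chain is reported when it ends on a company with no outgoing SAME_AS
    link (a final target) and has at least min_hops relationships.
    """
    out = defaultdict(list)
    for src, dst, rel_type, voting in edges:
        out[src].append((dst, rel_type, voting))

    chains = []

    def visit(node, path):
        hops = len(path) - 1
        is_final = node.startswith("C:") and not any(t == "SAME_AS" for _, t, _ in out[node])
        if is_final and hops >= min_hops:
            chains.append(path)
        for dst, rel_type, voting in out[node]:
            if dst in path:  # never revisit a node, so cycles cannot loop forever
                continue
            if rel_type == "SAME_AS" or (voting is not None and voting > 50):
                visit(dst, path + [dst])

    visit(start, [start])
    return chains


chains = deep_chains(edges, "P:Alice", min_hops=5)
print(chains)  # [['P:Alice', 'C:1', 'O:1', 'C:2', 'O:2', 'C:3']]
```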

In [77]:
deep_russian_dolls_query = """
MATCH path = (p:Person)-[:CONTROLS|SAME_AS*9..]->(end_c:Company)

// Filter: Ensure the path is valid (active majority voting control)
WHERE ALL(r IN relationships(path) WHERE 
    type(r) = 'SAME_AS' OR 
    (r.voting_rights_min > 50 AND r.ceased_on IS NULL)
)

// Filter: Ensure the chain stops at a final target company,
// i.e. one that is not itself registered as a controlling Organization
AND NOT (end_c)-[:SAME_AS]->() 

RETURN path
LIMIT 50;
"""

results = analysis.run_query_viz(deep_russian_dolls_query)

VG = from_neo4j(results)
VG.color_nodes(
    field="caption",
    color_space=ColorSpace.DISCRETE,
    colors=colors,
)
VG.resize_relationships(
    property="thickness",
)
VG.color_relationships(
    property="thickness",  # Using the thickness property to color relationships
    color_space=ColorSpace.DISCRETE,
    colors={
        1: "blue",  # Blue for low control
        2: "orange",  # Orange for medium control
        3: "red",  # Red for high control
        4: "purple",  # Purple for anything else
    },
)

analysis.set_caption_by_label(VG, label_to_property)

generated_html = VG.render(layout="forcedirected", initial_zoom=0.4)
await analysis.capture_graph_to_png(
    generated_html, "renderings/deep_russian_dolls_graph.png"
)

![Deep Russian Dolls Graph](renderings/deep_russian_dolls_graph.png)

## Geographical "Russian Dolls"

Because we have geolocation data for companies and controlling organizations and persons, we can also visualize the "Russian Doll" structures in a geographical context. This can help identify patterns of control across different jurisdictions, which is particularly relevant for UBO analysis in the context of offshore companies and tax havens.

As an example, let us visualize one such "Russian Doll" structure (a single deep chain controlled by a British national and ending at an active company) in a geographical context.

To interpret this visualization, think of it as tracking the "flight path" of corporate control across the globe:

- Follow the Color (Red to Blue): The path begins in Red, representing the Ultimate Beneficial Owner (UBO) or the source of control. As the ownership chain passes through intermediaries, the color shifts through purple and finally "cools down" to Blue, which marks the final Target Company or asset.

- Watch the Altitude: The height of the arcs decreases with depth in the chain. The initial control relationships (from the UBO) arc high above the map, representing overarching influence, while the final links "land" at the specific location of the asset.

- Spotting the Loop: A normal business structure might show a direct line (e.g., London → Manchester). A "Russian Doll" structure is revealed by the geographic detour: a path that starts in the UK, "hops" to a tax haven (like the BVI, Jersey or the Isle of Man), and then loops back to own a property just a few miles from where it started.
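The "geographic detour" can also be scored numerically. As a hypothetical heuristic that this notebook does not otherwise use, the sketch below divides the distance travelled along a chain (great-circle distance via the haversine formula) by the direct distance between its first and last nodes: a direct structure scores 1, while a chain that hops offshore and loops back scores noticeably higher. Coordinates are approximate and given as `(lat, lon)`.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))  # 6371 km: mean Earth radius


def detour_ratio(points):
    """Distance travelled along the chain divided by the direct start-to-end distance."""
    travelled = sum(haversine_km(p, q) for p, q in zip(points, points[1:]))
    direct = haversine_km(points[0], points[-1])
    return travelled / direct if direct > 0 else float("inf")


london = (51.5074, -0.1278)  # approximate coordinates
douglas = (54.15, -4.48)  # Douglas, Isle of Man (approximate)
manchester = (53.4808, -2.2426)

print(round(detour_ratio([london, manchester]), 2))  # 1.0
print(round(detour_ratio([london, douglas, manchester]), 2))
```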

In [78]:
geo_russian_dolls_query = """
MATCH path = (p:Person)-[:CONTROLS|SAME_AS*..10]->(target:Company)
MATCH (target)-[:HAS_STATUS]->(cs:CompanyStatus)
WHERE 
  // Anchor to a specific high-risk profile or random sample
  p.nationality = 'British' AND cs.name="Active"
  AND length(path) >= 6 // Ensure it's deep: CONTROLS and SAME_AS hops alternate, so this means 7+ hops (3+ intermediary companies)

// Coordinates are extracted below for every node in the chain. Nodes without an
// address yield null and are skipped when segments are built in Python
// (intermediate nodes may lack coordinates, e.g. purely foreign entities).
WITH path, nodes(path) as n
RETURN 
    [node in n | labels(node)[0]] as node_types,
    [node in n | coalesce(node.name, 'Unknown')] as names,
    // Extract Lat/Lons for every node that has an address
    [node in n | 
        CASE 
            WHEN node:Person THEN head([(node)-[:LIVES_AT]->(a) | a.latitude])
            WHEN node:Company THEN head([(node)-[:REGISTERED_AT]->(a) | a.latitude])
            WHEN node:Organization THEN head([(node)-[:REGISTERED_AT]->(a) | a.latitude])
            ELSE null 
        END
    ] as lats,
    [node in n | 
        CASE 
            WHEN node:Person THEN head([(node)-[:LIVES_AT]->(a) | a.longitude])
            WHEN node:Company THEN head([(node)-[:REGISTERED_AT]->(a) | a.longitude])
            WHEN node:Organization THEN head([(node)-[:REGISTERED_AT]->(a) | a.longitude])
            ELSE null 
        END
    ] as lons
LIMIT 1;
"""

df = analysis.run_query_df(geo_russian_dolls_query)

In [79]:
import pandas as pd

# Container for the flat segments
segments = []

for index, row in df.iterrows():
    lats = row["lats"]
    lons = row["lons"]
    names = row["names"]
    types = row["node_types"]

    # We need to find the sequence of valid coordinates
    # Filter out nodes that have no lat/lon
    valid_nodes = []
    for i in range(len(lats)):
        if lats[i] is not None and lons[i] is not None:
            valid_nodes.append(
                {
                    "lat": lats[i],
                    "lon": lons[i],
                    "name": names[i],
                    "type": types[i],
                    "original_index": i,
                }
            )

    # Create segments between consecutive valid nodes
    # This automatically "bridges" gaps if an intermediate node has no address
    for i in range(len(valid_nodes) - 1):
        source = valid_nodes[i]
        target = valid_nodes[i + 1]

        segments.append(
            {
                "source": [source["lon"], source["lat"]],
                "target": [target["lon"], target["lat"]],
                "source_name": source["name"],
                "target_name": target["name"],
                "step": i,  # How deep in the chain are we? (0 = Start)
                "total_steps": len(valid_nodes) - 1,
                "path_id": index,  # To group arcs if needed
            }
        )

# Convert to DataFrame for easier inspection if needed
segments_df = pd.DataFrame(segments)
print(f"Generated {len(segments_df)} geospatial hops from {len(df)} control chains.")

Generated 7 geospatial hops from 1 control chains.


In [None]:
import pydeck as pdk

# Define the view state (centered on the UK and Irish Sea, where UK-offshore chains often start and end)
view_state = pdk.ViewState(
    latitude=54.5,
    longitude=-3.0,
    zoom=6,
    pitch=45,
    bearing=0,
)

geo_data_dict = segments_df.to_dict(orient="records")

layer = pdk.Layer(
    "ArcLayer",
    data=geo_data_dict,
    get_source_position="source",
    get_target_position="target",
    # Variable Width: Closer to the asset, the line gets slightly thicker
    get_width="2 + step",
    # Dynamic Coloring (RGBA)
    # We color-code based on the 'step' relative to 'total_steps'
    # Start (Step 0) -> Red [255, 50, 0]
    # End (Last Step) -> Blue [0, 50, 255], passing through purple in between
    get_source_color="[255 * (1 - (step/total_steps)), 50, 255 * (step/total_steps), 160]",
    get_target_color="[255 * (1 - ((step+1)/total_steps)), 50, 255 * ((step+1)/total_steps), 160]",
    # Dynamic Height (The "Landing" Effect)
    # get_height is a multiplier on the arc's default height, not an altitude:
    # early steps (controllers) arc up to 5x, later steps (assets) arc lower
    get_height="5 * (1 - (step / (total_steps + 1)))",
    get_tilt=15,
    pickable=True,
)

r = pdk.Deck(
    layers=[layer],
    initial_view_state=view_state,
    map_style=pdk.map_styles.CARTO_DARK,
    tooltip={
        "html": "<b>Hop:</b> {step}<br/>"
        "<b>From:</b> {source_name}<br/>"
        "<b>To:</b> {target_name}"
    },
)

html_path = "renderings/geospatial_russian_dolls.html"
r.to_html(html_path, notebook_display=False)

# Capture a static PNG of the rendered map (optional), as done for the graph renderings above
await analysis.capture_graph_to_png(
    html_content=None,
    output_path="renderings/geospatial_russian_dolls.png",
    scale=1,
    width=1500,
    height=1920,
    html_file=html_path,
)

![Geospatial Russian Dolls](renderings/geospatial_russian_dolls.png)
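The color and height expressions passed to the `ArcLayer` above can be checked by evaluating them in plain Python. This hypothetical `arc_style` helper mirrors `get_source_color`, `get_target_color` and `get_height`, rounding the color channels for readability; it shows the path running from red `[255, 50, 0]` through purple to blue `[0, 50, 255]`, with the height multiplier shrinking from 5 at the first hop.

```python
def arc_style(step: int, total_steps: int):
    """Evaluate the ArcLayer color and height expressions for one segment."""

    def rgba(fraction):
        # Red fades out and blue fades in as the chain progresses; green stays at 50
        return [round(255 * (1 - fraction)), 50, round(255 * fraction), 160]

    source = rgba(step / total_steps)
    target = rgba((step + 1) / total_steps)
    height = 5 * (1 - step / (total_steps + 1))  # same formula as get_height
    return source, target, height


total = 7  # e.g. the 7 hops generated for the sample chain above
for step in range(total):
    source, target, height = arc_style(step, total)
    print(step, source, target, round(height, 2))
```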