# Unraveling the UK Corporate Landscape: Geospatial Intelligence with Neo4j

This notebook focuses on **Geospatial Intelligence**, leveraging the enriched address data within the Neo4j graph to visualize the physical footprint of the UK corporate landscape. By combining graph traversals with high-fidelity mapping libraries like [PyDeck](https://pydeck.gl/), we move beyond simple record retrieval to uncover spatial patterns, industry clusters, and geographic risk concentrations.

The analysis answers critical questions about the physical distribution of economic activity. We visualize the density of registered businesses to identify commercial hubs, isolate "graveyards" of liquidated companies to detect potential fraud hotspots, and map the flow of control from UK assets to foreign jurisdictions. This spatial perspective is essential for understanding regional economic health, identifying shell company farms, and tracing cross-border beneficial ownership networks that may indicate capital flight or tax avoidance.

In [1]:
import dotenv
import os

dotenv.load_dotenv()

NEO4J_URI = os.getenv("NEO4J_URI")
NEO4J_USER = os.getenv("NEO4J_USER")
NEO4J_PASSWORD = os.getenv("NEO4J_PASSWORD")
NEO4J_DATABASE = os.getenv("NEO4J_DATABASE")

## Analysis and Visualization Wrapper

To streamline interactions with the database and rendering engines, we define a helper class, `Neo4jAnalysis`. This class abstracts the complexity of the Neo4j Python Driver, providing simplified methods for executing Cypher queries and returning results directly as Pandas DataFrames.

Crucially, this class also includes a rendering utility, `capture_graph_to_png`. This method integrates with [Playwright](https://playwright.dev/python/) to launch a headless browser, render complex interactive PyDeck visualizations, and capture high-resolution screenshots. This capability allows us to automate the generation of static assets for reports while retaining the interactive capabilities of the underlying HTML visualizations.

In [2]:
from neo4j_analysis import Neo4jAnalysis

# Initialize the analysis helper
analysis = Neo4jAnalysis(NEO4J_URI, NEO4J_USER, NEO4J_PASSWORD, NEO4J_DATABASE)

## Extracting Company Coordinates

We execute a Cypher query to retrieve the geospatial coordinates of all companies in the graph. By matching `(:Company)` nodes with their connected `(:Address)` nodes, we filter for records where latitude and longitude properties are present. This dataset forms the baseline for our density analysis, representing the "ground truth" of where companies are physically registered across the UK.

In [3]:
company_query = """
MATCH (c:Company)-[:REGISTERED_AT]->(a:Address)
WHERE a.latitude IS NOT NULL AND a.longitude IS NOT NULL
RETURN c.name AS name, a.latitude AS latitude, a.longitude AS longitude
"""

df = analysis.run_query_df(company_query)

## Visualizing Company Density

Using the extracted coordinate data, we generate a 3D Column Map to visualize company density. We group the data by unique coordinate pairs to calculate the volume of companies at each location. Using `pydeck`, we render these counts as extruded columns, where both the height and color of the column correspond to the number of companies. This visualization immediately highlights major economic centers, such as London, Manchester, and Birmingham, appearing as towering peaks in the topography of UK business.

In [4]:
import pydeck as pdk
import matplotlib
import matplotlib.colors as mcolors
import json

grouped_df = (
    df.groupby(["latitude", "longitude"]).size().reset_index(name="company_count")
)


custom_colors = [
    "#4c8bf5",
    "#9b59b6",
    "#e74c3c",
]  # (Google Blue, Amethyst, Alizarin Red)
cmap = mcolors.LinearSegmentedColormap.from_list("BluePurpleRed", custom_colors)

norm = mcolors.LogNorm(
    vmin=grouped_df["company_count"].min(), vmax=grouped_df["company_count"].max()
)


def get_color(count):
    rgba = cmap(norm(count))
    return [
        int(rgba[0] * 255),
        int(rgba[1] * 255),
        int(rgba[2] * 255),
        220,
    ]  # Alpha 220 of 255: mostly opaque


grouped_df["fill_color"] = grouped_df["company_count"].apply(get_color)

# Sanitize Data
geo_data_dict = json.loads(grouped_df.to_json(orient="records"))

# Define the Camera (UK Wide)
view_state = pdk.ViewState(
    latitude=54.5,
    longitude=-3.0,
    zoom=6.5,
    pitch=45,
    bearing=0,
)

layer = pdk.Layer(
    "ColumnLayer",
    data=geo_data_dict,
    get_position=["longitude", "latitude"],
    # Height mapped to Count
    get_elevation="company_count",
    elevation_scale=1,
    radius=400,
    get_fill_color="fill_color",
    pickable=True,
    auto_highlight=True,
    extruded=True,
    material={
        "ambient": 0.6,
        "diffuse": 0.9,
    },
)

r = pdk.Deck(
    layers=[layer], initial_view_state=view_state, map_style=pdk.map_styles.CARTO_DARK
)

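# Make sure the output directory exists before writing (os is imported in the first cell)
os.makedirs("renderings", exist_ok=True)
html_path = "renderings/company_density_columns_3d.html"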
r.to_html(html_path, notebook_display=False)

await analysis.capture_graph_to_png(
    html_content=None,
    output_path="renderings/company_density_columns_3d.png",
    scale=1,
    width=1500,
    height=1920,
    html_file=html_path,
)

![Company Density Map](renderings/company_density_columns_3d.png)
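
The log-scaled coloring used for the columns can be checked in isolation. The following is a minimal standard-library sketch, not the notebook's implementation: `log_norm`, `interpolate`, and `column_color` are hypothetical helpers that approximate what `LogNorm` and the three-stop colormap do. Matplotlib quantizes its colormaps, so its output may differ from this sketch by a unit or so per channel.

```python
import math

# Color stops copied from the cell above: blue -> purple -> red.
STOPS = [(0x4C, 0x8B, 0xF5), (0x9B, 0x59, 0xB6), (0xE7, 0x4C, 0x3C)]


def log_norm(value, vmin, vmax):
    """Map value in [vmin, vmax] to [0, 1] on a log scale (vmin must be > 0)."""
    if vmax == vmin:
        return 0.0
    return (math.log(value) - math.log(vmin)) / (math.log(vmax) - math.log(vmin))


def interpolate(stops, t):
    """Piecewise-linear blend across evenly spaced color stops."""
    t = min(max(t, 0.0), 1.0)
    pos = t * (len(stops) - 1)
    i = min(int(pos), len(stops) - 2)
    frac = pos - i
    return [round(a + (b - a) * frac) for a, b in zip(stops[i], stops[i + 1])]


def column_color(count, vmin, vmax, alpha=220):
    """Return an [R, G, B, A] list for a location with `count` companies."""
    return interpolate(STOPS, log_norm(count, vmin, vmax)) + [alpha]


# With counts from 1 to 10,000, a location with 100 companies sits halfway
# along the log scale, so it gets the middle (purple) stop.
for count in (1, 100, 10_000):
    print(count, column_color(count, 1, 10_000))
```

On a linear scale the same location would sit at 1% of the range and be almost indistinguishable from a single-company address, which is why a log scale suits heavily skewed counts like these.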


## Extracting Liquidation Data

Next, we target a specific risk segment: companies that have entered liquidation. We extend the graph query to keep only companies linked to a `(:CompanyStatus)` node named 'Liquidation'. Extracting the coordinates of these entities lets us move from general density to risk mapping, setting the stage for identifying geographic clusters of business failure.

In [5]:
company_query = """
MATCH (c:Company)-[:REGISTERED_AT]->(a:Address)
WHERE a.latitude IS NOT NULL AND a.longitude IS NOT NULL
MATCH (c)-[:HAS_STATUS]->(s:CompanyStatus)
WHERE s.name = 'Liquidation'
RETURN c.name AS name, a.latitude AS latitude, a.longitude AS longitude
"""

df = analysis.run_query_df(company_query)

## Heatmap of Liquidation Clusters

We visualize the liquidation data using a Gaussian Heatmap. Unlike the column map, which shows exact counts per location, the heatmap blends the weights of nearby points into a continuous surface of "risk." We define a custom color gradient, ranging from cool blues to intense reds, to represent the density of failed companies. This visualization helps analysts spot potential "phoenixing" hubs or areas with unusually dense concentrations of insolvencies that may warrant further investigation.

In [6]:
import json

grouped_df = (
    df.groupby(["latitude", "longitude"]).size().reset_index(name="company_count")
)

geo_data_dict = json.loads(grouped_df.to_json(orient="records"))

# 1. Define the Camera (UK Wide)
view_state = pdk.ViewState(
    latitude=54.5,
    longitude=-3.0,
    zoom=6.5,
    pitch=45,  # Tilted to match the other maps; pitch=0 gives a flat, top-down heatmap
    bearing=0,
)

# 2. Define the Heatmap Layer
layer = pdk.Layer(
    "HeatmapLayer",
    data=geo_data_dict,
    get_position=["longitude", "latitude"],
    get_weight="company_count",
    radiusPixels=30,  # Radius of the "glow" around each point (in pixels, not meters)
    intensity=1.0,  # Multiplier for the heat value
    threshold=0.05,  # Cutoff: Hide areas with very low density (reduces visual noise)
    pickable=True,
    color_range=[
        [65, 182, 196],  # Light Blue (Low Density)
        [44, 127, 184],  # Medium Blue
        [37, 52, 148],  # Deep Blue
        [128, 0, 128],  # Purple
        [220, 20, 60],  # Crimson
        [255, 0, 0],  # Bright Red (High Density)
    ],
)

# 3. Render
r = pdk.Deck(
    layers=[layer],
    initial_view_state=view_state,
    map_style=pdk.map_styles.CARTO_DARK,  # Dark maps make heatmaps glow properly
)

html_path = "renderings/liquidation_density_heatmap_3d.html"
r.to_html(html_path, notebook_display=False)

await analysis.capture_graph_to_png(
    html_content=None,
    output_path="renderings/liquidation_density_heatmap_3d.png",
    scale=1,
    width=1500,
    height=1920,
    html_file=html_path,
)

![Liquidation Density Heatmap](renderings/liquidation_density_heatmap_3d.png)
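
The heatmap plots raw liquidation counts, so a busy business district will glow whether or not its failure rate is unusual. One way to separate "busy" from "risky" is to divide liquidations by the total number of companies at each location. The sketch below is an illustration under assumptions: `liquidation_rates` and the `min_companies` threshold are hypothetical, the coordinates are toy data, and in the notebook the two inputs would come from the all-company and liquidation query results (which would need to be kept in separate variables, since `df` is reused).

```python
from collections import Counter


def liquidation_rates(all_points, liquidated_points, min_companies=50):
    """Return {(lat, lon): share of companies in liquidation}.

    Locations with fewer than `min_companies` registrations are skipped,
    because a rate computed from a handful of companies is mostly noise.
    """
    totals = Counter(all_points)
    failed = Counter(liquidated_points)
    return {
        loc: failed[loc] / total
        for loc, total in totals.items()
        if total >= min_companies
    }


# Toy data: one (lat, lon) tuple per company.
all_points = [(51.5, -0.1)] * 200 + [(53.4, -2.2)] * 60 + [(55.9, -3.2)] * 10
liquidated_points = [(51.5, -0.1)] * 10 + [(53.4, -2.2)] * 30 + [(55.9, -3.2)] * 5

rates = liquidation_rates(all_points, liquidated_points)
for location, rate in sorted(rates.items(), key=lambda item: -item[1]):
    print(location, f"{rate:.0%}")
```

In this toy example the first location has the most companies but the second has the far higher rate, which a count-based heatmap would not reveal.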

## Industry Sector Dominance

This complex query analyzes the industrial character of different locations. For every coordinate cluster, it retrieves the Standard Industrial Classification (SIC) codes of the registered companies. It then aggregates this data to determine the *dominant* industry at that specific location, essentially asking, "What is the primary business of this street or building?" The query extracts the sector division (the first two digits of the SIC code) to categorize the location into broad industries like Technology, Manufacturing, or Finance.

In [7]:
sic_codes_query = """
MATCH (c:Company)-[:REGISTERED_AT]->(a:Address)
WHERE a.latitude IS NOT NULL AND a.longitude IS NOT NULL

// 1. Pick one representative SIC code per company (the lowest code after sorting)
MATCH (c)-[:HAS_SIC]->(s:SICCode)
WITH c, a, s
ORDER BY s.code
WITH c, a, head(collect(s.code)) as company_sic

// 2. Count the frequency of each SIC code at each specific coordinate
//    We group by Lat, Lon, and SIC here.
WITH a.latitude as lat, a.longitude as lon, company_sic, count(*) as frequency
ORDER BY frequency DESC

// 3. Group by Coordinate only, collecting the (SIC, frequency) pairs.
//    Since we ordered by frequency DESC above, the first item in the list is the dominant one.
WITH lat, lon, collect({code: company_sic, count: frequency}) as sic_stats
WITH lat, lon, sic_stats[0] as dominant_stat

RETURN 
    lat as latitude, 
    lon as longitude, 
    dominant_stat.code as sic_code,
    // Extract the 'Division' from the dominant SIC
    substring(dominant_stat.code, 0, 2) as sector_id,
    // Optional: Return the count so you can visualize 'strength' (e.g., opacity)
    dominant_stat.count as dominance_strength
LIMIT 20000
"""

df = analysis.run_query_df(sic_codes_query)
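
To make the aggregation in the query easier to follow, here is a rough pure-Python equivalent of steps 2 and 3, assuming step 1 has already reduced each company to a single SIC code. The function name `dominant_sic_by_location` and the sample rows are illustrative only.

```python
from collections import Counter, defaultdict


def dominant_sic_by_location(rows):
    """rows: iterable of (latitude, longitude, company_sic), one per company."""
    per_location = defaultdict(Counter)
    for lat, lon, sic in rows:
        per_location[(lat, lon)][sic] += 1

    result = []
    for (lat, lon), counts in per_location.items():
        # most_common(1) plays the role of ORDER BY frequency DESC + sic_stats[0].
        sic, strength = counts.most_common(1)[0]
        result.append(
            {
                "latitude": lat,
                "longitude": lon,
                "sic_code": sic,
                "sector_id": sic[:2],  # same as substring(code, 0, 2)
                "dominance_strength": strength,
            }
        )
    return result


# Toy data: three companies at one coordinate, one at another.
rows = [
    (51.52, -0.08, "62012"),
    (51.52, -0.08, "62020"),
    (51.52, -0.08, "62012"),
    (51.50, -0.02, "64110"),
]
dominant = dominant_sic_by_location(rows)
for entry in dominant:
    print(entry)
```

Note that, as in the query, dominance is decided on the full SIC code and only then reduced to a two-digit sector. A location with many companies spread across several codes in one division can therefore be labelled with a different division whose single code happens to be the most frequent.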

## Visualizing Industrial Hubs

We map the dominant industries using a Scatterplot Layer. We define a custom color palette that assigns distinct hues to specific economic sectors, for example, mapping "Information & Communication" to Cyan and "Finance" to Emerald Green. The radius of each point is scaled by the "dominance strength" (the number of companies in that cluster). This creates a vivid map of economic specialization, clearly distinguishing the Tech City in Shoreditch from the financial strongholds of Canary Wharf.

In [8]:
# Example: '62' = Computer programming, '70' = Head office, etc.
def get_sector_color(sector_id):
    s = str(sector_id).strip()

    # Agriculture, Forestry, Fishing (01-03) -> Green/Brown
    if s in ["01", "02", "03"]:
        return [34, 139, 34]
    # Mining & Quarrying (05-09) -> Dark Grey
    if s in ["05", "06", "07", "08", "09"]:
        return [105, 105, 105]

    # Manufacturing (10-33) -> Purple/Slate
    if "10" <= s <= "33":
        return [147, 112, 219]
    # Energy, Water, Waste (35-39) -> Electric Blue
    if s in ["35", "36", "37", "38", "39"]:
        return [0, 191, 255]
    # Construction (41-43) -> Yellow/Orange (High Vis)
    if s in ["41", "42", "43"]:
        return [255, 215, 0]

    # Wholesale & Retail Trade (45-47) -> Red
    if s in ["45", "46", "47"]:
        return [220, 20, 60]
    # Transport & Storage (49-53) -> Navy Blue
    if s in ["49", "50", "51", "52", "53"]:
        return [0, 0, 128]
    # Accommodation & Food Service (55-56) -> Pink/Coral
    if s in ["55", "56"]:
        return [255, 127, 80]

    # Information & Communication (58-63) -> Cyan/Teal
    # Note: 62 (Programming) dominates this group; give it its own branch if it needs a distinct color
    if s in ["58", "59", "60", "61", "62", "63"]:
        return [0, 255, 255]

    # Financial & Insurance (64-66) -> Emerald Green
    if s in ["64", "65", "66"]:
        return [46, 139, 87]
    # Real Estate (68) -> Magenta
    if s == "68":
        return [255, 0, 255]

    # Professional, Scientific & Technical (69-75) -> Orange
    if s in ["69", "70", "71", "72", "73", "74", "75"]:
        return [255, 165, 0]
    # Administrative & Support Service (77-82) -> Light Grey/Blue
    if s in ["77", "78", "79", "80", "81", "82"]:
        return [176, 196, 222]

    # Education (85) -> Blue (Academic)
    if s == "85":
        return [65, 105, 225]
    # Health & Social Work (86-88) -> Hot Pink
    if s in ["86", "87", "88"]:
        return [255, 105, 180]
    # Arts, Entertainment & Recreation (90-93) -> Violet
    if s in ["90", "91", "92", "93"]:
        return [138, 43, 226]

    # Default (Unknown/Other) -> Dark Grey
    return [80, 80, 80]


df["color"] = df["sector_id"].apply(lambda x: get_sector_color(str(x)) + [200])

# Scale the radius so big clusters are visible but don't cover the map
#    We use a simple square root scale so 100x companies = 10x width
df["radius"] = df["dominance_strength"].apply(lambda x: (x**0.5) * 20)

view_state = pdk.ViewState(
    latitude=54.5,
    longitude=-3.0,
    zoom=6.5,
    pitch=45,
    bearing=0,
)

geo_data_dict = json.loads(df.to_json(orient="records"))

layer = pdk.Layer(
    "ScatterplotLayer",
    data=geo_data_dict,
    get_position=["longitude", "latitude"],
    get_fill_color="color",
    get_radius="radius",
    pickable=True,
    opacity=0.8,
    stroked=True,
    filled=True,
    radius_min_pixels=3,
    radius_max_pixels=50,
)

r = pdk.Deck(
    layers=[layer],
    initial_view_state=view_state,
    map_style=pdk.map_styles.CARTO_DARK,
    tooltip={
        "html": "<b>Sector:</b> {sector_id}<br/>"
        "<b>Dominant Type:</b> {sic_code}<br/>"
        "<b>Cluster Size:</b> {dominance_strength} companies"
    },
)

html_path = "renderings/sic_code_distribution.html"
r.to_html(html_path, notebook_display=False)

await analysis.capture_graph_to_png(
    html_content=None,
    output_path="renderings/sic_code_distribution.png",
    scale=1,
    width=1500,
    height=1920,
    html_file=html_path,
)

![SIC Code Distribution](renderings/sic_code_distribution.png)
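
The tooltip in the map above shows only the numeric `sector_id`. A small range table is one way to turn that number into a readable label for a legend or tooltip. This is a sketch rather than part of the notebook: `SECTOR_RANGES` and `sector_label` are hypothetical names, and the labels are taken from the comments in `get_sector_color`.

```python
# (first division, last division, label), taken from the color function's comments.
SECTOR_RANGES = [
    (1, 3, "Agriculture, Forestry, Fishing"),
    (5, 9, "Mining & Quarrying"),
    (10, 33, "Manufacturing"),
    (35, 39, "Energy, Water, Waste"),
    (41, 43, "Construction"),
    (45, 47, "Wholesale & Retail Trade"),
    (49, 53, "Transport & Storage"),
    (55, 56, "Accommodation & Food Service"),
    (58, 63, "Information & Communication"),
    (64, 66, "Financial & Insurance"),
    (68, 68, "Real Estate"),
    (69, 75, "Professional, Scientific & Technical"),
    (77, 82, "Administrative & Support Service"),
    (85, 85, "Education"),
    (86, 88, "Health & Social Work"),
    (90, 93, "Arts, Entertainment & Recreation"),
]


def sector_label(sector_id):
    """Return a readable sector name for a two-digit SIC division."""
    try:
        division = int(str(sector_id).strip())
    except ValueError:
        return "Unknown/Other"  # e.g. "None", "nan", or non-numeric codes
    for low, high, label in SECTOR_RANGES:
        if low <= division <= high:
            return label
    return "Unknown/Other"


for sample in ("62", "09", "34", "None"):
    print(sample, "->", sector_label(sample))
```

Comparing integers rather than strings means a value such as `"9"` is treated the same as `"09"`, which differs slightly from the string-based checks in `get_sector_color`.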

## Financial District Analysis

We narrow our focus to the financial sector within the Greater London area. The query keeps companies whose SIC code text contains "finance" or "financial" and uses Neo4j's spatial function `point.distance` to retain only those registered within 50 km of central London. We then aggregate the counts by postcode district (e.g., "EC2M"), preparing the data for a choropleth map that compares the density of financial companies across postcode districts.

In [9]:
finance_district_query = """
MATCH (c:Company)-[:HAS_SIC]->(sc:SICCode)
WHERE toLower(sc.code) CONTAINS 'finance' OR toLower(sc.code) CONTAINS 'financial'
MATCH (c)-[:REGISTERED_AT]->(a:Address)
WHERE a.latitude IS NOT NULL AND a.longitude IS NOT NULL
  AND point.distance(
      point({latitude: a.latitude, longitude: a.longitude}),
      point({latitude: 51.5074, longitude: -0.1278})
  ) < 50000  // 50km Radius
// Extract the District (e.g., 'EC2M' from 'EC2M 7PP')
// DISTINCT avoids double-counting companies with several matching SIC codes
WITH split(a.postcode, ' ')[0] AS district, count(DISTINCT c) AS company_count
RETURN district, company_count
ORDER BY company_count DESC
"""

df = analysis.run_query_df(finance_district_query)
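
For WGS-84 points, `point.distance` returns a great-circle distance in meters. The sketch below reproduces the radius filter and the district extraction in plain Python so the logic can be sanity-checked without a database. The Earth radius constant and the sample coordinates for Canary Wharf and Oxford are assumptions chosen for illustration, so results may differ slightly from Neo4j's.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; a common spherical approximation
LONDON = (51.5074, -0.1278)  # same center point as the query


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude pairs."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def postcode_district(postcode):
    """Return the outward part of a postcode, e.g. 'EC2M' from 'EC2M 7PP'."""
    return postcode.split(" ")[0]


# Approximate coordinates, for illustration only.
canary_wharf = (51.5054, -0.0235)
oxford = (51.7520, -1.2577)

for name, point in (("Canary Wharf", canary_wharf), ("Oxford", oxford)):
    distance = haversine_m(*LONDON, *point)
    print(name, round(distance / 1000, 1), "km", "inside" if distance < 50_000 else "outside")

print(postcode_district("EC2M 7PP"))
```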

## Geospatial Boundary Integration

To render a choropleth map, we require the actual geometric shapes of the postcode districts. This cell dynamically fetches GeoJSON boundary files from an external repository for the districts identified in our query. It merges these geometric shapes with our financial company counts into a unified GeoDataFrame. We also compute the color values for each district in Python, applying a logarithmic scale to ensure that the intense density of the City of London doesn't visually drown out smaller financial hubs in the periphery.

In [None]:
import geopandas as gpd
import pandas as pd
import requests
import io
import re


def get_postcode_area(district):
    # Extracts the leading letters (e.g., "SW" from "SW1A", "B" from "B1")
    match = re.match(r"([A-Z]+)", district, re.I)
    return match.group(1) if match else None


df["area"] = df["district"].apply(get_postcode_area)

unique_areas = df["area"].dropna().unique()
base_url = "https://raw.githubusercontent.com/missinglink/uk-postcode-polygons/refs/heads/master/geojson"

gdf_list = []

for area in unique_areas:
    url = f"{base_url}/{area.upper()}.geojson"
    try:
        # Fetch the file from GitHub
        response = requests.get(url, timeout=30)  # Avoid hanging on a stalled connection
        if response.status_code == 200:
            # Read GeoJSON from memory
            area_gdf = gpd.read_file(io.BytesIO(response.content))
            gdf_list.append(area_gdf)
        else:
            print(f"Warning: Could not find {url}")
    except Exception as e:
        print(f"Error fetching {area}: {e}")

if not gdf_list:
    raise ValueError("No GeoJSON data could be retrieved.")

# Combine all downloaded areas into one main GeoDataFrame
full_gdf = pd.concat(gdf_list, ignore_index=True)
if full_gdf.crs and full_gdf.crs.to_string() != "EPSG:4326":
    full_gdf = full_gdf.to_crs(epsg=4326)

# The GeoJSONs usually have a 'name' property matching the district (e.g., "AB10")
merged_gdf = full_gdf.merge(df, left_on="name", right_on="district", how="left")
merged_gdf["company_count"] = merged_gdf["company_count"].fillna(0)

# We calculate colors in Python to keep PyDeck fast
cmap = matplotlib.colormaps["YlOrRd"]
norm = mcolors.LogNorm(vmin=1, vmax=merged_gdf["company_count"].max())


def get_fill_color(count):
    if count == 0:
        return [30, 30, 155, 150]  # Semi-transparent dark blue for empty districts
    rgba = cmap(norm(count))
    return [int(rgba[0] * 255), int(rgba[1] * 255), int(rgba[2] * 255), 200]


merged_gdf["fill_color"] = merged_gdf["company_count"].apply(get_fill_color)
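
Because the merge uses `how="left"`, every boundary polygon is kept, but any district from the query that has no polygon with the same name is silently dropped. A quick check like the following sketch can surface those cases. The helper `attach_counts` and the sample values are hypothetical stand-ins for `full_gdf["name"]` and the query result.

```python
def attach_counts(boundary_names, district_counts):
    """Mimic the left merge: every boundary keeps a count, defaulting to 0."""
    counts = {name: district_counts.get(name, 0) for name in boundary_names}
    # Districts returned by the query that have no polygon with the same name.
    unmatched = sorted(set(district_counts) - set(boundary_names))
    return counts, unmatched


boundary_names = ["EC2M", "EC2V", "E14"]  # toy stand-in for full_gdf["name"]
district_counts = {"EC2M": 120, "E14": 45, "EC2": 7}  # toy stand-in for df

counts, unmatched = attach_counts(boundary_names, district_counts)
print(counts)
print(unmatched)
```

A non-empty `unmatched` list usually points to malformed postcodes or to a naming mismatch between the two sources, and those companies would be missing from the choropleth.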

## Finance Sector Choropleth

We render the financial density data using a `GeoJsonLayer`. This visualization paints each postcode district according to its calculated heat value and extrudes the shape based on the company count. The result is a 3D landscape of the financial sector, providing a clear, district-level view of how the industry radiates outward from the City of London and Canary Wharf into the surrounding boroughs.

In [12]:
geo_data_dict = json.loads(merged_gdf.to_json())

view_state = pdk.ViewState(
    latitude=51.5074,  # Centered on London
    longitude=-0.1278,
    zoom=8.5,  # Zoomed out to see Oxford, Cambridge, and Brighton
    pitch=45,  # Tilted to show the 3D height of the bars
    bearing=-35,  # Rotated 35 degrees away from north-up to show the extrusions at an angle
)

layer = pdk.Layer(
    "GeoJsonLayer",
    data=geo_data_dict,
    opacity=0.8,
    stroked=True,
    filled=True,
    extruded=True,
    wireframe=False,
    get_fill_color="properties.fill_color",
    get_line_color=[255, 255, 255, 50],
    get_elevation="properties.company_count * 3",  # Scale elevation for better visibility
    get_line_width=50,
    pickable=True,
    auto_highlight=True,
)

r = pdk.Deck(
    layers=[layer],
    initial_view_state=view_state,
    map_style=pdk.map_styles.CARTO_DARK,
    tooltip={"html": "<b>{name}</b><br/>Finance Companies: {company_count}"},
)

html_path = "renderings/finance_choropleth_dynamic.html"
r.to_html(html_path, notebook_display=False)

await analysis.capture_graph_to_png(
    html_content=None,
    output_path="renderings/finance_choropleth_dynamic.png",
    scale=1,
    width=1500,
    height=1920,
    html_file=html_path,
)

![Financial Industry Choropleth](renderings/finance_choropleth_dynamic.png)

## Cross-Border Outflow Extraction

This cell begins the analysis of international control networks. We run two analogous queries to sample "outflows": UK companies controlled by Persons residing abroad, and UK companies controlled by Organizations based abroad. By excluding the United Kingdom as the controller's country, we isolate the connections that cross international borders. Each query returns a random sample of up to 2,000 rows containing the coordinates of the UK registered office (the source) and the name of the controller's country (the target), which the next cell converts to coordinates.

In [13]:
outflows_persons_query = """
MATCH (p:Person)-[:LIVES_AT]->(pa:Address)
MATCH (p)-[:RESIDES_IN]->(co:Country)
MATCH (p)-[:CONTROLS]->(c:Company)-[:REGISTERED_AT]->(ca:Address)
WHERE co.name <> 'United Kingdom'
  AND pa.latitude IS NOT NULL AND pa.longitude IS NOT NULL
  AND ca.latitude IS NOT NULL AND ca.longitude IS NOT NULL
WITH p, pa, c, ca, co
ORDER BY rand()
LIMIT 2000
RETURN
    p.name as controller,
    c.name as company,
    ca.latitude as src_lat, 
    ca.longitude as src_lon,
    co.name as country,
    'Person' as type
"""

outflows_org_query = """
MATCH (o:Organization)-[:REGISTERED_AT]->(pa:Address)
MATCH (o)-[:BASED_IN]->(co:Country)
MATCH (o)-[:CONTROLS]->(c:Company)-[:REGISTERED_AT]->(ca:Address)
WHERE co.name <> 'United Kingdom'
  AND pa.latitude IS NOT NULL AND pa.longitude IS NOT NULL
  AND ca.latitude IS NOT NULL AND ca.longitude IS NOT NULL
WITH o, pa, c, ca, co
ORDER BY rand()
LIMIT 2000
RETURN
    o.name as controller,
    c.name as company,
    ca.latitude as src_lat, 
    ca.longitude as src_lon,
    co.name as country,
    'Organization' as type
"""

df_persons = analysis.run_query_df(outflows_persons_query)

df_orgs = analysis.run_query_df(outflows_org_query)

df = pd.concat([df_persons, df_orgs], ignore_index=True)

## Mapping Global Control Networks

We visualize the international ownership connections using an Arc Layer. We define a dictionary of approximate centroids for key countries to serve as the destination points for our arcs; rows whose country is not in the dictionary are dropped. The visualization draws an arc from each UK company to its controller's country, with the color transitioning from faint white at the source to a color representing that country's volume of connections at the target. Arcs ending in **red** point to the jurisdictions with the most connections in the sample, as opposed to arcs ending in **blue**, which point to those with the fewest. This "spider web" map highlights the jurisdictions that exert the most control over UK assets, making it easy to spot heavy concentrations of control in tax havens or specific foreign jurisdictions.

> Note the heavy concentration of arcs pointing towards Jersey in the Channel Islands, a notorious tax haven.

In [14]:
COUNTRY_CENTROIDS = {
    # Europe
    "Austria": [14.5501, 47.5162],
    "Belgium": [4.4699, 50.5039],
    "Bulgaria": [25.4858, 42.7339],
    "Croatia": [15.2000, 45.1000],
    "Cyprus": [33.4299, 35.1264],
    "Czech Republic": [15.4730, 49.8175],
    "Denmark": [9.5018, 56.2639],
    "Estonia": [25.0136, 58.5953],
    "Finland": [25.7482, 61.9241],
    "France": [2.2137, 46.2276],
    "Germany": [10.4515, 51.1657],
    "Greece": [21.8243, 39.0742],
    "Hungary": [19.5033, 47.1625],
    "Ireland": [-8.2439, 53.4129],
    "Italy": [12.5674, 41.8719],
    "Latvia": [24.6032, 56.8796],
    "Lithuania": [23.8813, 55.1694],
    "Luxembourg": [6.1296, 49.8153],
    "Malta": [14.3754, 35.9375],
    "Netherlands": [5.2913, 52.1326],
    "Poland": [19.1451, 51.9194],
    "Portugal": [-8.2245, 39.3999],
    "Romania": [24.9668, 45.9432],
    "Slovakia": [19.6990, 48.6690],
    "Slovenia": [14.9955, 46.1512],
    "Spain": [-3.7492, 40.4637],
    "Sweden": [18.6435, 60.1282],
    "Switzerland": [8.2275, 46.8182],
    # Tax Havens / Crown Dependencies
    "Jersey": [-2.1312, 49.2144],
    "Guernsey": [-2.5853, 49.4482],
    "Isle of Man": [-4.5481, 54.2361],
    "British Virgin Islands": [-64.6399, 18.4207],
    "Cayman Islands": [-81.2546, 19.3133],
    "Bermuda": [-64.7574, 32.3078],
    "Bahamas": [-77.3963, 25.0343],
    "Panama": [-80.7821, 8.5380],
    "Seychelles": [55.4920, -4.6796],
    # Asia-Pacific, Russia & Middle East
    "China": [104.1954, 35.8617],
    "Hong Kong": [114.1694, 22.3193],
    "Japan": [138.2529, 36.2048],
    "Singapore": [103.8198, 1.3521],
    "Australia": [133.7751, -25.2744],
    "India": [78.9629, 20.5937],
    "Russia": [105.3188, 61.5240],
    "Turkey": [35.2433, 38.9637],
    "United Arab Emirates": [53.8478, 23.4241],
    # Northern Africa
    "Morocco": [-7.0926, 31.7917],
    "Algeria": [1.6596, 28.0339],
    "Tunisia": [9.5375, 33.8869],
}


def get_centroid(country_name):
    return COUNTRY_CENTROIDS.get(str(country_name).strip(), [None, None])


# Apply Centroids
coords = df["country"].apply(get_centroid)
df["tgt_lon"] = coords.apply(lambda x: x[0])
df["tgt_lat"] = coords.apply(lambda x: x[1])

df_clean = df.dropna(subset=["tgt_lat", "tgt_lon"]).copy()

country_counts = df_clean["country"].value_counts()
count_map = country_counts.to_dict()
min_val = country_counts.min()
max_val = country_counts.max()


# Define a function to generate a Blue -> Red gradient based on count
def get_heat_color(country):
    count = count_map.get(country, 0)

    # Normalize the count to a 0.0 - 1.0 scale
    # Prevent division by zero if only one country exists or min == max
    if max_val == min_val:
        ratio = 0.5
    else:
        ratio = (count - min_val) / (max_val - min_val)

    # Linear Interpolation:
    # Low (0.0) = Blue [0, 0, 255]
    # High (1.0) = Red [255, 0, 0]

    r = int(255 * ratio)
    g = 0  # Keeping green at 0 for a sharp Blue-Purple-Red transition
    b = int(255 * (1 - ratio))

    return [r, g, b, 160]  # Add Alpha channel (transparency)


# Apply the color and store the count for the tooltip
df_clean["color"] = df_clean["country"].apply(get_heat_color)
df_clean["count"] = df_clean["country"].map(count_map)

geo_data_dict = df_clean.to_dict(orient="records")

view_state = pdk.ViewState(
    latitude=50.5, longitude=12.5, zoom=4.4, pitch=55, bearing=30
)

layer = pdk.Layer(
    "ArcLayer",
    data=geo_data_dict,
    # SOURCE: UK Company
    get_source_position=["src_lon", "src_lat"],
    # TARGET: Foreign Controller
    get_target_position=["tgt_lon", "tgt_lat"],
    get_source_color=[255, 255, 255, 60],  # Faint white source
    get_target_color="color",  # Colored target
    get_width=2,
    get_tilt=15,
    pickable=True,
)

r = pdk.Deck(
    layers=[layer],
    initial_view_state=view_state,
    map_style=pdk.map_styles.CARTO_DARK,
    tooltip={
        "html": "<b>Controller:</b> {controller} ({type})<br/><b>Location:</b> {country}<br/><b>Company:</b> {company}"
    },
)

html_path = "renderings/unified_outflows.html"
r.to_html(html_path, notebook_display=False)

await analysis.capture_graph_to_png(
    html_content=None,
    output_path="renderings/unified_outflows.png",
    scale=1,
    width=1500,
    height=1920,
    html_file=html_path,
)

![Outflows](renderings/unified_outflows.png)
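
Two details of the arc map are worth checking independently: the min-max color ramp and the rows lost because a country has no centroid. The sketch below restates the ramp as a pure function and adds a hypothetical helper, `unmapped_countries`, that counts the dropped rows. The sample country names are illustrative, and the real values depend on how countries are spelled in the graph.

```python
from collections import Counter


def heat_color(count, min_val, max_val, alpha=160):
    """Blue (fewest connections) to red (most), as in get_heat_color."""
    if max_val == min_val:
        ratio = 0.5  # single-valued data: fall back to the midpoint
    else:
        ratio = (count - min_val) / (max_val - min_val)
    return [int(255 * ratio), 0, int(255 * (1 - ratio)), alpha]


def unmapped_countries(countries, centroids):
    """Count rows whose country has no centroid and would be dropped."""
    cleaned = (str(name).strip() for name in countries)
    return Counter(name for name in cleaned if name not in centroids)


# Toy data: a two-entry centroid table and a handful of sampled rows.
centroids = {"Jersey": [-2.1312, 49.2144], "France": [2.2137, 46.2276]}
sampled = ["Jersey", "United States", " France ", "United States"]

print(heat_color(5, 5, 100), heat_color(100, 5, 100))
print(unmapped_countries(sampled, centroids))
```

Running `unmapped_countries` against the full `COUNTRY_CENTROIDS` dictionary would show how much of the sample never reaches the map. As written, the dictionary has no entry for the United States, for example, so any controllers recorded under that name would be excluded.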