# UK Regional Resilience audit: An EDA exploration

# Introduction

The UK Regional Resilience audit is a research focused analytical framework designed to investigate the structural decoupling between economic growth and public well being. Traditional economic metrics, such as headline Gross Value Added (GVA) or Gross Domestic Product (GDP), often obscure localised pressures, institutional friction, that prevent economic prosperity from translating into improved standards of living. By combining longitudinal administrative data from the Office for National Statistics (ONS) with real-time regional news sentiment via web scraping, this study identifies areas of economic expansion  which outgrow social infrastructure resilience. Such a study is deemed important by 10 Downing Street (10DS) who have the aim of the UK sustaining G7 growth lead by 2030 (HMG, 2025).

## Background (understanding of regional resilience)

From a research standpoint, regional resilience literature is defined as a metropolitan area's ability to maintain or reconfigure its socio-economic structures in response to exogenous (external) shocks. Recent literature has moved from a rebounding (*"bounce back"*) definition toward adaptive resilience (Boschma, 2015), where regions continually develop through diversity and learning.

In such literature, the concept of a region's absorptive capacity is stressed, as insitutional theory argues that a region's realised absorptive capcity, i.e., the area's ability to assimilate and exploit external resources is a primary reason of whether an increase in capital investment leads to increased welfare (Royal Society, 2022). This is explored by Hanafy and Marktanner (2019), who examined Egypt and investigate how regional absorptive capacity moderates the impact of Foreign Direct Investment (FDI) on economic growth across various governorates (regions). They discovered found that FDI in the service sector only translated into positive regional growth when a minimum threshold of domestic private investment was already locally present to act as a proxy for absorptive capacity. Without this institutional readiness to absorb new technologies and knowledge domains, the influx of foreign capital did not lead to the expected increases in local welfare.
However, scaling limit is something to consider for a region's reabsorptive capacity. Numerous authors comprehend that physical and institutional scaling limits are present (Becker,Egger and von Ehrlich, 2013). This is where  where rapid economic influxes without corresponding infrastructure scaling result in diminished returns(aka, "*extractive growth*").


Now that some understanding of regional resilience has been established, the UK context also confirms the understanding above. The research stipulates that increases in research and development (R&D) investment will only succeed if regions have a sufficient volume of appropriately skilled labour pools to absorb funding effectively (Royal Society, 2022). A direct link between economic prosperity and high areas of "absorptive capacity" roles (occupations in technical and scientific sectors) is present where thriving regions like London and the South East show significant internal variation in their ability to exploit new knowledge.
Although, the Productivity Institute (2025) explicitly identifies that inadequate transport and digital infrastructure, looking at the Midlands,constrain economic mobility and act as significant barriers to productivity growth.

Looking at other regions, an investment versus skills gap is seen present showcasing a dual-reality innovation landscape. The Productivity Institute (2025) reports that the region saw a significant increase in UKRI investment, totaling an absolute rise of £171 million between 2021/22 and 2023/24 Despite this investment, approximately one-third of vacancies that employers find difficult to fill are attributed to a lack of digital skills.
A productivity gap is further observed in the North West region of England.productivity was approximately 90.6% of the UK average in 2019. The GVA per hour growth rate fell from 0.9% (2005–2010) to just 0.2% per annum (2010–2018).
Nonetheless, the North West of England has stagnation, this is despite the fact that the region maintains over 37,800 workplaces in professional, scientific, and technical sectors, representing a significant latent capacity for R&D absorption (Department for Business and Trade, 2024).

Indeed, the The UK government has explicitly placed significant emphasis on addressing regional disparities and fostering prosperity through its '*Levelling Up*' agenda. This agenda aims to reduce geographical inequalities across the UK, moving beyond traditional economic growth metrics to focus on a broader range of outcomes that impact people's lives. Key areas the government intends to solve include improving living standards, enhancing local leadership, boosting productivity, and increasing opportunities in areas that have historically lagged behind.
This directly aligns with the concept of regional resilience by seeking to improve the capacity of all regions to adapt, thrive, and contribute to national prosperity (HM Government, 2022).


The literature above depicts a nuanced perspective on regional resilience, absorptive capacity and growth in regions and its association with local welfare. ***The question is posed to understand these associations and discrepancies arising within the UK as seen in the paragraph above.***


# Methodology

Looking at regional resilience and its absorptive capacity in respect to capital investment and welfare certain datasets and libraries (dependcies) are used to investigate this audit. The datasets used are from 2 sources, ONS and BBC News.

ONS, used estimates here for regional economic well being under the measure GVA, GVA per head and population statistics. Using ONS regional data allows balanced panel data spanning over 18 years with over 222,000 data points. This makes such a dataset have a valid and complete dataset having data present on each individual regions and industries. Additionally, the study segregates the data into hierarchical UK geographical classifications of ITL1 and ITL2 used for statistics where ITL 1 represents large regions (like England's nine regions or Scotland as one) and ITL 2 divides these into smaller areas like groups of counties or combined authorities (ONS, 2024). ITL2 is designed for comparable international data by balancing population sizes across regional levels.
Furthermore, GVA allows more international comparative metrics like GVA per head which is more useful as cross-country study but also acknowledges limitations of using such measures like not factoring in commuters to metropolitan locations like London.

Using BBC news for real-time narrative data is becoming more popular. This technique of using near real-time allows researchers to bridge the gap that annual dataset economic reporting can miss. Headlines from contemporary news can act as proxies to understanding and driving behaviours like economic purchasing as well as being able to '*snapshot*' the public mood at a particular point in time (Baker, Bloom and Davis, 2016). In this EDA, a combination of the web scraping and sentiment polarity libraries are used to extract and be process via a natural language processing (NLP) pipeline to assing polarity scores ranging from -1 (deemed negative) to 1 (positive) (Al-Omari and Hudaib, 2020).

The reader may wonder on the usage of these metrics and sources (particulary the BBC news sentiment datasource) and its relation to regional resilience. Bristow and Healy (2014) state that such datasets encompass economic activity and resilience, sentiment more importantly has been understood to be tied to the lifecycle of economic resilience (resistance, recovery and renewal). News sentiment additionally can help be a leading indicator for exogenous shocks (e.g., mass unemployment) and such sentiment will be present a considerably longer period before official national publications cite similar outcomes (Hansen, McMahon and Neuhann, 2020). Such news sentiment can influence behaviour leading to s spread of economic impact. Conversely, where negative stories about cost of living cause consumers to postpone spending, directly affecting local growth.

By overlaying high-velocity sentiment scores with structural GVA per head, the methodology identifies regions suffering from economic growth measures. If a region shows high GVA growth but persistently negative sentiment headlines, it indicates a lack of adaptive resilience, where economic gains are being diluted by infrastructure strain or service failures.

In [None]:
gva_file_path = 'regional_gva.csv'
population_file_path = 'population.csv'
gva_per_head_file_path = 'gva_per_head.csv'

rss_feed_urls = {
    'London': 'https://www.bbc.co.uk/news/england/london/rss.xml',
    'North West': 'https://www.bbc.co.uk/news/england/north_west/rss.xml',
    'West Midlands': 'https://www.bbc.co.uk/news/england/stoke_and_staffordshire/rss.xml'
}

print("File paths and RSS feed URLs have been defined.")

File paths and RSS feed URLs have been defined.



`regional_gva.csv` file was used for the EDA here. This was extracted as an xlsx file from the ONS website itself. `/content/regionalgvabbylainuk.xlsx`. Looking at the 'Total GVA' sheet from the Excel file, the first two rows are skipped. The dataset standardises the regional column to 'Region', cleaning string whitespace, enforcing `float64` for numeric columns allowing flexibility for a wide range of numeric values and displaying `df.describe()` to check for null values.

```python
!pip install polars openpyxl requests beautifulsoup4 textblob

import polars as pl
import pandas as pd # pandas is imported because textblob requires it, though not directly used in this step.
import time
import requests
from bs4 import BeautifulSoup
from textblob import TextBlob

gva_file_path = '/content/regionalgvabbylainuk.xlsx'

print(f"Loading GVA data from: {gva_file_path}")
try:
    # Load the 'Total GVA' sheet, skipping the first two rows
    df_gva = pl.read_excel(
        source=gva_file_path,
        sheet_name='Total GVA',
        engine='openpyxl',
        read_options={"skip_rows": 2}
    )

    # Standardises the regional column to 'Region'
    if 'Unnamed: 0' in df_gva.columns: # Common column name after skipping rows in excel
        df_gva = df_gva.rename({"Unnamed: 0": "Region"})
    elif df_gva.columns[0] != 'Region':
        df_gva = df_gva.rename({df_gva.columns[0]: "Region"})

    # Clean string whitespace in the 'Region' column
    df_gva = df_gva.with_columns(pl.col("Region").str.strip())

    # Identify numeric columns (excluding 'Region') and cast them to float64
    numeric_cols = [col for col in df_gva.columns if col != "Region"]
    df_gva = df_gva.with_columns([
        pl.col(col).cast(pl.Float64) for col in numeric_cols
    ])

    print("\nRegional GVA Data after cleaning:")
    print(df_gva.head())

    # Display df.describe() to check for null values and summary statistics
    print("\nDescriptive statistics for Regional GVA Data:")
    print(df_gva.describe())

except Exception as e:
    print(f"Error loading or cleaning GVA data: {e}")

```

In [None]:
get_ipython().system('pip install polars openpyxl requests beautifulsoup4 textblob')
print("Required libraries installed successfully.")

Required libraries installed successfully.


In [None]:
import polars as pl
import pandas as pd
import time
import requests
from bs4 import BeautifulSoup
from textblob import TextBlob


gva_file_path = '/content/regionalgvabbylainuk.xlsx'


print(f"Loading GVA data from: {gva_file_path} using pandas for robust parsing and cleaning...")
try:

    df_gva_pd = pd.read_excel(
        gva_file_path,
        sheet_name='Total GVA',
        skiprows=2
    )

    print("Pandas DataFrame head after initial load:")
    print(df_gva_pd.head())

    if 'Region' in df_gva_pd.columns:
        df_gva_pd['Region'] = df_gva_pd['Region'].astype(str).str.strip()
    else:
        df_gva_pd[df_gva_pd.columns[0]] = df_gva_pd[df_gva_pd.columns[0]].astype(str).str.strip()
        df_gva_pd = df_gva_pd.rename(columns={df_gva_pd.columns[0]: 'Region'})

    df_gva = pl.from_pandas(df_gva_pd)

    non_numeric_cols = ["Region", "LAU1 code", "LA name", "SIC07 code", "SIC07 Industry"]
    numeric_cols = [col for col in df_gva.columns if col not in non_numeric_cols]
    df_gva = df_gva.with_columns([
        pl.col(col).cast(pl.Float64, strict=False) for col in numeric_cols
    ])

    print("\nRegional GVA Data after cleaning:")
    print(df_gva.head())


    print("\nDescriptive statistics for Regional GVA Data:")
    print(df_gva.describe())

except Exception as e:
    print(f"Error loading or cleaning GVA data: {e}")

Loading GVA data from: /content/regionalgvabbylainuk.xlsx using pandas for robust parsing and cleaning...
Pandas DataFrame head after initial load:
       Region  LAU1 code               LA name SIC07 code  SIC07 Industry  \
0  North East  E06000001            Hartlepool        All  All industries   
1  North East  E06000004      Stockton-on-Tees        All  All industries   
2  North East  E06000002         Middlesbrough        All  All industries   
3  North East  E06000003  Redcar and Cleveland        All  All industries   
4  North East  E06000005            Darlington        All  All industries   

   1998  1999  2000  2001  2002  ...  2007  2008  2009  2010  2011  2012  \
0   911   968   967   993  1021  ...  1287  1316  1367  1327  1309  1321   
1  2355  2500  2524  2592  2701  ...  3525  3568  3690  3534  3645  3743   
2  1524  1648  1670  1638  1746  ...  2306  2328  2415  2441  2423  2374   
3  1385  1426  1458  1432  1534  ...  1973  2067  2001  2033  1992  1985   
4  1337  

In [None]:
import polars as pl
import pandas as pd

# Update population_file_path to the correct Excel file path as the population data is in the same Excel file but a different sheet
population_file_path = '/content/regionalgvabbylainuk.xlsx'

# Ingest and Clean Population Data
print(f"Loading Population data from: {population_file_path} using pandas for robust parsing...")
try:
    df_population_pd = pd.read_excel(
        population_file_path,
        sheet_name='Population',
        skiprows=2  # Skip first 2 rows, use 3rd row as header
    )

    print("Pandas Population DataFrame head after initial load:")
    print(df_population_pd.head())

    if 'Region' in df_population_pd.columns:
        df_population_pd['Region'] = df_population_pd['Region'].astype(str).str.strip()
    elif 'Local Authority Name' in df_population_pd.columns:
        df_population_pd = df_population_pd.rename(columns={'Local Authority Name': 'Region'})
        df_population_pd['Region'] = df_population_pd['Region'].astype(str).str.strip()
    else:
        # Fallback: assume the first column is the region column and rename it
        first_col = df_population_pd.columns[0]
        df_population_pd = df_population_pd.rename(columns={first_col: 'Region'})
        df_population_pd['Region'] = df_population_pd['Region'].astype(str).str.strip()

    # Convert pandas DataFrame to Polars DataFrame
    df_population = pl.from_pandas(df_population_pd)

    non_numeric_cols_pop = ["Region", "LAU1 code", "LA name", "SIC07 code", "SIC07 Industry", "Code", "Type"]
    numeric_cols_pop = [col for col in df_population.columns if col not in non_numeric_cols_pop and str(col).isdigit()]

    df_population = df_population.with_columns([
        pl.col(col).cast(pl.Float64, strict=False) for col in numeric_cols_pop
    ])

    print("\nRegional Population Data after cleaning:")
    print(df_population.head())

    print("\nDescriptive statistics for Regional Population Data:") #Summary statistics described
    print(df_population.describe())

except Exception as e:
    print(f"Error loading or cleaning Population data: {e}")

Loading Population data from: /content/regionalgvabbylainuk.xlsx using pandas for robust parsing...
Pandas Population DataFrame head after initial load:
       Region  LAU1 code               LA name  Unnamed: 3  Unnamed: 4  \
0  North East  E06000001            Hartlepool         NaN         NaN   
1  North East  E06000004      Stockton-on-Tees         NaN         NaN   
2  North East  E06000002         Middlesbrough         NaN         NaN   
3  North East  E06000003  Redcar and Cleveland         NaN         NaN   
4  North East  E06000005            Darlington         NaN         NaN   

     1998    1999    2000    2001    2002  ...    2007    2008    2009  \
0   89753   89680   89811   90152   89993  ...   90969   91379   91530   
1  179350  180844  182517  183795  184940  ...  187937  189039  189978   
2  144031  143023  142251  141233  140090  ...  138190  137885  137273   
3  140380  139542  139193  139159  138520  ...  136940  136512  135867   
4   99273   98172   98088   9789

GVA per head is now needed, examining the same ONS file. A similar process to that explained above is set.


In [None]:
import polars as pl
import pandas as pd

# Update gva_per_head_to the correct Excel file path
gva_per_head_file_path = '/content/regionalgvabbylainuk.xlsx'

print(f"Loading GVA per head data from: {gva_per_head_file_path} using pandas for robust parsing...")
try:

    df_gva_per_head_pd = pd.read_excel(
        gva_per_head_file_path,
        sheet_name='GVA per head',
        skiprows=2
    )

    print("Pandas GVA per head DataFrame head after initial load:")
    print(df_gva_per_head_pd.head())


    if 'Region' in df_gva_per_head_pd.columns:
        df_gva_per_head_pd['Region'] = df_gva_per_head_pd['Region'].astype(str).str.strip()
    elif 'Local Authority Name' in df_gva_per_head_pd.columns:
        df_gva_per_head_pd = df_gva_per_head_pd.rename(columns={'Local Authority Name': 'Region'})
        df_gva_per_head_pd['Region'] = df_gva_per_head_pd['Region'].astype(str).str.strip()
    else:
        first_col = df_gva_per_head_pd.columns[0]
        df_gva_per_head_pd = df_gva_per_head_pd.rename(columns={first_col: 'Region'})
        df_gva_per_head_pd['Region'] = df_gva_per_head_pd['Region'].astype(str).str.strip()


    df_gva_per_head = pl.from_pandas(df_gva_per_head_pd)

    non_numeric_cols_gva_per_head = ["Region", "LAU1 code", "LA name", "SIC07 code", "SIC07 Industry", "Code", "Type"]
    numeric_cols_gva_per_head = [col for col in df_gva_per_head.columns if col not in non_numeric_cols_gva_per_head and str(col).isdigit()]

    df_gva_per_head = df_gva_per_head.with_columns([
        pl.col(col).cast(pl.Float64, strict=False) for col in numeric_cols_gva_per_head
    ])

    print("\nRegional GVA per Head Data after cleaning:")
    print(df_gva_per_head.head())

    # Display df.describe() to check for null values and summary statistics
    print("\nDescriptive statistics for Regional GVA per Head Data:")
    print(df_gva_per_head.describe())

except Exception as e:
    print(f"Error loading or cleaning GVA per head data: {e}")

Loading GVA per head data from: /content/regionalgvabbylainuk.xlsx using pandas for robust parsing...
Pandas GVA per head DataFrame head after initial load:
       Region  LAU1 code               LA name SIC07 code  SIC07 Industry  \
0  North East  E06000001            Hartlepool        All  All industries   
1  North East  E06000004      Stockton-on-Tees        All  All industries   
2  North East  E06000002         Middlesbrough        All  All industries   
3  North East  E06000003  Redcar and Cleveland        All  All industries   
4  North East  E06000005            Darlington        All  All industries   

    1998   1999   2000   2001   2002  ...   2007   2008   2009   2010   2011  \
0  10152  10794  10772  11012  11342  ...  14144  14401  14940  14458  14220   
1  13132  13822  13830  14104  14606  ...  18754  18872  19423  18515  19002   
2  10581  11524  11737  11596  12463  ...  16687  16884  17595  17731  17514   
3   9865  10219  10474  10291  11071  ...  14409  15142  147

Finally, for the ONS dataset, the cleaned GVA and Population Polars DataFrames also contains annual GVA and Population growth rate change. This is understood as the change difference/original value, and selects the latest available growth rates. In these dataframes, polars is used as opposed to an original pandas dataframes. Polars executes faster across joins, filters, and aggregations by using parallelism (Databricks, 2025). Polars uses less RAM (random access memory, used to store data and machine code) because of its Arrow-style columnar storage and integrates more broadly with Python tools (Hynkova, 2025). *The polars dataframe uses only the latest year*.

In [None]:
import polars as pl

# Prepare df_gva for unpivoting, drop SIC07 columns and are not present in population data
df_gva_prepared = df_gva.drop([col for col in df_gva.columns if 'SIC07' in col])

# Prepare df_population for unpivoting: drop 'Unnamed' columns as they are null and not useful for merging
df_population_prepared = df_population.drop([col for col in df_population.columns if 'Unnamed' in col])

# Identify non-numeric columns to be used as index for unpivot
index_cols_gva = [col for col in df_gva_prepared.columns if not str(col).isdigit()]
index_cols_population = [col for col in df_population_prepared.columns if not str(col).isdigit()]

# Unpivot df_gva
df_gva_melted = df_gva_prepared.unpivot(
    index=index_cols_gva,
    on=[col for col in df_gva_prepared.columns if str(col).isdigit()],
    variable_name="Year",
    value_name="GVA"
).with_columns(pl.col("Year").cast(pl.Int64))

# Unpivot df_population
df_population_melted = df_population_prepared.unpivot(
    index=index_cols_population,
    on=[col for col in df_population_prepared.columns if str(col).isdigit()],
    variable_name="Year",
    value_name="Population"
).with_columns(pl.col("Year").cast(pl.Int64))

print("Melted GVA Data head:")
print(df_gva_melted.head())
print("Melted Population Data head:")
print(df_population_melted.head())

# Perform an inner join on the combined GVA and Population DataFrames
# Join keys are 'Region', 'LAU1 code', 'LA name', and 'Year'
join_keys = [col for col in index_cols_gva if col in index_cols_population] # common identifier columns
join_keys.append("Year")

df_merged_economic = df_gva_melted.join(
    df_population_melted,
    on=join_keys,
    how="inner"
)

print("\nMerged Economic Data head:")
print(df_merged_economic.head())

# Sort df_merged_economic by 'Region' and 'Year'
df_merged_economic = df_merged_economic.sort([col for col in join_keys if col != 'Year'] + ["Year"])

# Calculate year-over-year 'GVA_Growth_Rate'
# Calculate year-over-year 'Population_Growth_Rate'

partition_by_cols = [col for col in join_keys if col != 'Year']

df_merged_economic = df_merged_economic.with_columns(
    (
        (pl.col("GVA") - pl.col("GVA").shift(1).over(partition_by_cols)) /
        pl.col("GVA").shift(1).over(partition_by_cols)
    ).alias("GVA_Growth_Rate")
).with_columns(
    (
        (pl.col("Population") - pl.col("Population").shift(1).over(partition_by_cols)) /
        pl.col("Population").shift(1).over(partition_by_cols)
    ).alias("Population_Growth_Rate")
)

print("\nMerged Economic Data with Growth Rates head:")
print(df_merged_economic.head())

# Filter to keep only the latest available year for which growth rates were calculated
# Get the maximum year from the DataFrame
max_year = df_merged_economic.select(pl.col("Year").max()).item()

df_latest_growth_rates = df_merged_economic.filter(
    pl.col("Year") == max_year
).select(partition_by_cols + ["Year", "GVA", "Population", "GVA_Growth_Rate", "Population_Growth_Rate"])

print(f"\nLatest Growth Rates for Year {max_year}:")
print(df_latest_growth_rates.head())

print("Descriptive statistics for Latest Growth Rates Data:")
print(df_latest_growth_rates.describe())

Melted GVA Data head:
shape: (5, 5)
┌────────────┬───────────┬──────────────────────┬──────┬────────┐
│ Region     ┆ LAU1 code ┆ LA name              ┆ Year ┆ GVA    │
│ ---        ┆ ---       ┆ ---                  ┆ ---  ┆ ---    │
│ str        ┆ str       ┆ str                  ┆ i64  ┆ f64    │
╞════════════╪═══════════╪══════════════════════╪══════╪════════╡
│ North East ┆ E06000001 ┆ Hartlepool           ┆ 1998 ┆ 911.0  │
│ North East ┆ E06000004 ┆ Stockton-on-Tees     ┆ 1998 ┆ 2355.0 │
│ North East ┆ E06000002 ┆ Middlesbrough        ┆ 1998 ┆ 1524.0 │
│ North East ┆ E06000003 ┆ Redcar and Cleveland ┆ 1998 ┆ 1385.0 │
│ North East ┆ E06000005 ┆ Darlington           ┆ 1998 ┆ 1337.0 │
└────────────┴───────────┴──────────────────────┴──────┴────────┘
Melted Population Data head:
shape: (5, 5)
┌────────────┬───────────┬──────────────────────┬──────┬────────────┐
│ Region     ┆ LAU1 code ┆ LA name              ┆ Year ┆ Population │
│ ---        ┆ ---       ┆ ---                  ┆ ---  

Next, a Python function to scrape headlines from BBC regional RSS feeds, calculate sentiment polarity scores using TextBlob, and return a Polars DataFrame with '*Region*' and '*Sentiment_Score*' is conducted.
A scrape function is required which will take RSS feed (web feed format which ives URL is a standardised, automated way) URLs, will scrape headlines and then generate sentiment scores using the TextBlob dependency.

TextBlob uses a lexicon-based approach, this means it relies on a predefined dictionary of words that have already been assigned specific sentiment scores. These words (mainly adjectives) in its internal lexicon has a assigned score for polarity and subjectivity (Rautela, Sharma and Kumar, 2023). In order to determine the overall sentiment of a sentence or statement, the dependency calculates the average of the individual word scores in a sentence.TextBlob also considers intensifiers (e.g., "*very*") which increase the impact of a word, and negations (e.g., "*not*") which typically reverse or multiply the polarity score by a factor like -0.5 (Al-Ayyoub, Al-Jarrah and Al-Kabi, 2018).

This function is added to a dataframe labelled '*df_sentiment*'.



After having all of this data from ONS and BBC. A left join is done  to combine the processed economic data which includes the latest GVA Growth Rate, Population Growth Rate, and GVA per Head with the scraped '*Sentiment_Score*' into a single Polars DataFrame, named `df_final`. The RSS feed URL dictionary is defined in the codebase below.


In [None]:
import polars as pl

# Unpivot df_gva_per_head
index_cols_gva_per_head = [col for col in df_gva_per_head.columns if not str(col).isdigit() and col not in ['SIC07 code', 'SIC07 Industry']]

df_gva_per_head_melted = df_gva_per_head.drop([col for col in df_gva_per_head.columns if 'SIC07' in col]).unpivot(
    index=index_cols_gva_per_head,
    on=[col for col in df_gva_per_head.columns if str(col).isdigit()],
    variable_name="Year",
    value_name="GVA_per_Head"
).with_columns(pl.col("Year").cast(pl.Int64))

print("Melted GVA per Head Data head:")
print(df_gva_per_head_melted.head())

# Filter the unpivoted GVA per Head DataFrame for max_year
df_gva_per_head_latest = df_gva_per_head_melted.filter(
    pl.col("Year") == max_year
).drop("Year") # Drop year as it's already in df_latest_growth_rates

print(f"\nLatest GVA per Head Data for Year {max_year} head:")
print(df_gva_per_head_latest.head())

# Join df_latest_growth_rates with the filtered GVA per Head DataFrame
# The join keys should be 'Region', 'LAU1 code', 'LA name' (from partition_by_cols)
join_keys_economic = partition_by_cols # This already contains 'Region', 'LAU1 code', 'LA name'

df_economic_merged = df_latest_growth_rates.join(
    df_gva_per_head_latest,
    on=join_keys_economic,
    how="inner"
)

print("\nEconomic Data (GVA, Population, GVA per Head) Merged head:")
print(df_economic_merged.head())

# Perform a left join to combine the resulting economic DataFrame with df_sentiment
df_final = df_economic_merged.join(
    df_sentiment,
    on="Region",
    how="left"
)

# 5. Store the final merged DataFrame in df_final (already done in step 4)

print("\nFinal Merged DataFrame (df_final) head:")
print(df_final.head())

print("\nDescriptive statistics for df_final:")
print(df_final.describe())

Melted GVA per Head Data head:
shape: (5, 5)
┌────────────┬───────────┬──────────────────────┬──────┬──────────────┐
│ Region     ┆ LAU1 code ┆ LA name              ┆ Year ┆ GVA_per_Head │
│ ---        ┆ ---       ┆ ---                  ┆ ---  ┆ ---          │
│ str        ┆ str       ┆ str                  ┆ i64  ┆ f64          │
╞════════════╪═══════════╪══════════════════════╪══════╪══════════════╡
│ North East ┆ E06000001 ┆ Hartlepool           ┆ 1998 ┆ 10152.0      │
│ North East ┆ E06000004 ┆ Stockton-on-Tees     ┆ 1998 ┆ 13132.0      │
│ North East ┆ E06000002 ┆ Middlesbrough        ┆ 1998 ┆ 10581.0      │
│ North East ┆ E06000003 ┆ Redcar and Cleveland ┆ 1998 ┆ 9865.0       │
│ North East ┆ E06000005 ┆ Darlington           ┆ 1998 ┆ 13471.0      │
└────────────┴───────────┴──────────────────────┴──────┴──────────────┘

Latest GVA per Head Data for Year 2016 head:
shape: (5, 4)
┌────────────┬───────────┬──────────────────────┬──────────────┐
│ Region     ┆ LAU1 code ┆ LA name    

Now that the dataframes (DF) have been merged with both data. A '*Prosperity_Delta*' calculating the the difference between 'GVA_Growth_Rate' and '*Population_Growth_Rate*' is needed. This is calculated due to exhibiting a more refined measure of economic well-being other than gross economic output alone (Costanza, De Groot, Sutton, Van der Ploeg, Anderson, Kubiszewski, Farber and Turner, 2014). The metric is rooted in the understanding that economic growth must be considered in relation to population parameter to accurately reflect ***per capita*** growth. A high GVA growth rate might be offset by a proportionate rise in population growth rates thus masking resource availability (Costanza *et al.*, 2014).

Also, a '*Friction Score*' measure is used to address the economic dimensions that GVA/ GVA per head and offers a more holistic understanding of well being. The measure is calculated by: GVA_per_Head/Sentiment_Score and is interpreted as a higher value means significant disconnect (aka, friction) arising between a regions output and percieved well being of a population (OECD, 2020). This can be used as a signal to understand structural decopuing which this notebook intends to explore. A high value could be the direct indicator of decoupling where a region is productive economically but has dissatisfaction which could arise from factors like insufficient infrastructure resources or poor quality of life (Leng, Jiang and Liu, 2020).

The concept of a friction score is introduced for three key reasons:

i) *Theoretical bridging:* The measure integrates '*real time*' public sentiment (derived from news headlines) as a proxy for non-economic factors which may influence public well being and adaptive resilience (Hansen *et al.*, 2019).

ii) *Quantifying decoupling:* The core  research problem this metric intends to solve is the identification of regions where economic growth is not matched by a corresponding positive public sentiment (Rodríguez-Pose, 2018). If a region exhibits a high GVA per head but a lower (or negative) sentiment score  (indicating public dissatisfaction or concern), the resulting '*Friction Score*' will be high signalling potential public disconnect (Rodríguez-Pose, 2018).

iii) *Adaptive resilience proxy:* Mentioned earlier, regional resilience is not just about rebounding but also about '*adaptive resilience*'(the region's capacity to assimilate and exploit external resources).The friction score acts as an indicator of this adaptive capacity (Forlani and Lubian, 2020). A high score suggests that despite economic inputs, there's a struggle in translating these into perceived well-being, implying a lower absorptive capacity or significant institutional and/or infrastructural bottlenecks (Forlani and Lubian, 2020).




# Results

In [None]:
import polars as pl

# Create Prosperity_Delta factor
df_final = df_final.with_columns(
    (pl.col("GVA_Growth_Rate") - pl.col("Population_Growth_Rate")).alias("Prosperity_Delta")
)

# Create Friction_Score measure
# Handle potential division by zero for Sentiment_Score by replacing 0 with a small epsilon or filtering.
# For this task, it is allowed infinite values if Sentiment_Score is 0.
df_final = df_final.with_columns(
    (pl.col("GVA_per_Head") / pl.col("Sentiment_Score").abs()).alias("Friction_Score")
)


print("\nUpdated df_final with Prosperity_Delta and Friction_Score:")
print(df_final.head())

# Display descriptive statistics for the new columns
print("\nDescriptive statistics for Prosperity_Delta and Friction_Score:")
print(df_final.select(["Prosperity_Delta", "Friction_Score"]).describe())


Updated df_final with Prosperity_Delta and Friction_Score:
shape: (5, 12)
┌────────┬───────────┬─────────────┬──────┬───┬─────────────┬────────────┬────────────┬────────────┐
│ Region ┆ LAU1 code ┆ LA name     ┆ Year ┆ … ┆ GVA_per_Hea ┆ Sentiment_ ┆ Prosperity ┆ Friction_S │
│ ---    ┆ ---       ┆ ---         ┆ ---  ┆   ┆ d           ┆ Score      ┆ _Delta     ┆ core       │
│ str    ┆ str       ┆ str         ┆ i64  ┆   ┆ ---         ┆ ---        ┆ ---        ┆ ---        │
│        ┆           ┆             ┆      ┆   ┆ f64         ┆ f64        ┆ f64        ┆ f64        │
╞════════╪═══════════╪═════════════╪══════╪═══╪═════════════╪════════════╪════════════╪════════════╡
│ North  ┆ E06000001 ┆ Hartlepool  ┆ 2016 ┆ … ┆ 16246.0     ┆ null       ┆ 0.009938   ┆ null       │
│ East   ┆           ┆             ┆      ┆   ┆             ┆            ┆            ┆            │
│ North  ┆ E06000004 ┆ Stockton-on ┆ 2016 ┆ … ┆ 20638.0     ┆ null       ┆ -0.011148  ┆ null       │
│ East   ┆      

In [None]:
import polars as pl

# Sort the df_final DataFrame in descending order by the 'Friction_Score' column due to importance of factor
df_ranked_friction = df_final.sort("Friction_Score", descending=True)

#  Display the head of the sorted DataFrame to show the top regions with the highest 'Friction_Score'
print("\nRegions ranked by Friction_Score (highest first):")
print(df_ranked_friction.head(10)) # Display top 10 for better overview

# Display the tail of the sorted DataFrame to show the regions with the lowest 'Friction_Score' or potential null values
print("\nRegions ranked by Friction_Score (lowest first, or nulls):")
print(df_ranked_friction.tail(10)) # Display bottom 10 to check for null/lowest values


Regions ranked by Friction_Score (highest first):
shape: (10, 12)
┌────────┬───────────┬─────────────┬──────┬───┬─────────────┬────────────┬────────────┬────────────┐
│ Region ┆ LAU1 code ┆ LA name     ┆ Year ┆ … ┆ GVA_per_Hea ┆ Sentiment_ ┆ Prosperity ┆ Friction_S │
│ ---    ┆ ---       ┆ ---         ┆ ---  ┆   ┆ d           ┆ Score      ┆ _Delta     ┆ core       │
│ str    ┆ str       ┆ str         ┆ i64  ┆   ┆ ---         ┆ ---        ┆ ---        ┆ ---        │
│        ┆           ┆             ┆      ┆   ┆ f64         ┆ f64        ┆ f64        ┆ f64        │
╞════════╪═══════════╪═════════════╪══════╪═══╪═════════════╪════════════╪════════════╪════════════╡
│ North  ┆ E06000001 ┆ Hartlepool  ┆ 2016 ┆ … ┆ 16246.0     ┆ null       ┆ 0.009938   ┆ null       │
│ East   ┆           ┆             ┆      ┆   ┆             ┆            ┆            ┆            │
│ North  ┆ E06000004 ┆ Stockton-on ┆ 2016 ┆ … ┆ 20638.0     ┆ null       ┆ -0.011148  ┆ null       │
│ East   ┆           ┆ -

In [None]:
rss_feed_urls = {
    'London': 'https://feeds.bbci.co.uk/news/england/london/rss.xml', # Existing
    'North West': 'https://feeds.bbci.co.uk/news/england/manchester/rss.xml', # Existing, using Manchester as a proxy
    'West Midlands': 'https://feeds.bbci.co.uk/news/england/birmingham_and_black_country/rss.xml', # Existing, using Birmingham & Black Country as a proxy
    'North East': 'https://feeds.bbci.co.uk/news/england/tyne_and_wear/rss.xml', # New, using Tyne and Wear (Newcastle) as a proxy
    'East Midlands': 'https://feeds.bbci.co.uk/news/england/nottingham/rss.xml', # New, using Nottingham as a proxy
    'South East': 'https://feeds.bbci.co.uk/news/england/kent/rss.xml', # New, using Kent as a proxy
    'East of England': 'https://feeds.bbci.co.uk/news/england/essex/rss.xml', # New, using Essex as a proxy
    'South West': 'https://feeds.bbci.co.uk/news/england/bristol/rss.xml', # New, using Bristol as a proxy
    'Yorkshire and The Humber': 'https://feeds.bbci.co.uk/news/england/south_yorkshire/rss.xml', # New, using South Yorkshire (Sheffield) as a proxy
    'Wales': 'https://feeds.bbci.co.uk/news/wales/rss.xml', # New, direct feed
    'Scotland': 'https://feeds.bbci.co.uk/news/scotland/rss.xml', # New, direct feed
    'Northern Ireland': 'https://feeds.bbci.co.uk/news/northern_ireland/rss.xml', # New, direct feed
    'UK': 'https://feeds.bbci.co.uk/news/uk/rss.xml' # New, direct feed for overall UK news
}

print("Updated RSS feed URLs for a broader range of UK regions:")
for region, url in rss_feed_urls.items():
    print(f"- {region}: {url}")

Updated RSS feed URLs for a broader range of UK regions:
- London: https://feeds.bbci.co.uk/news/england/london/rss.xml
- North West: https://feeds.bbci.co.uk/news/england/manchester/rss.xml
- West Midlands: https://feeds.bbci.co.uk/news/england/birmingham_and_black_country/rss.xml
- North East: https://feeds.bbci.co.uk/news/england/tyne_and_wear/rss.xml
- East Midlands: https://feeds.bbci.co.uk/news/england/nottingham/rss.xml
- South East: https://feeds.bbci.co.uk/news/england/kent/rss.xml
- East of England: https://feeds.bbci.co.uk/news/england/essex/rss.xml
- South West: https://feeds.bbci.co.uk/news/england/bristol/rss.xml
- Yorkshire and The Humber: https://feeds.bbci.co.uk/news/england/south_yorkshire/rss.xml
- Wales: https://feeds.bbci.co.uk/news/wales/rss.xml
- Scotland: https://feeds.bbci.co.uk/news/scotland/rss.xml
- Northern Ireland: https://feeds.bbci.co.uk/news/northern_ireland/rss.xml
- UK: https://feeds.bbci.co.uk/news/uk/rss.xml


In [None]:
df_sentiment = scrape_and_analyze_sentiment(rss_feed_urls)

print("\nSentiment Analysis Results with Expanded Feeds:")
print(df_sentiment.head())
print("\nDescriptive statistics for Sentiment Data (Expanded):")
print(df_sentiment.describe())


Sentiment Analysis Results with Expanded Feeds:
shape: (5, 2)
┌───────────────┬─────────────────┐
│ Region        ┆ Sentiment_Score │
│ ---           ┆ ---             │
│ str           ┆ f64             │
╞═══════════════╪═════════════════╡
│ London        ┆ -0.024775       │
│ North West    ┆ -0.017185       │
│ West Midlands ┆ 0.008448        │
│ North East    ┆ 0.01039         │
│ East Midlands ┆ -0.016055       │
└───────────────┴─────────────────┘

Descriptive statistics for Sentiment Data (Expanded):
shape: (9, 3)
┌────────────┬──────────────────────────┬─────────────────┐
│ statistic  ┆ Region                   ┆ Sentiment_Score │
│ ---        ┆ ---                      ┆ ---             │
│ str        ┆ str                      ┆ f64             │
╞════════════╪══════════════════════════╪═════════════════╡
│ count      ┆ 13                       ┆ 13.0            │
│ null_count ┆ 0                        ┆ 0.0             │
│ mean       ┆ null                     ┆ -0.001639 

Looking at the sentiment score descriptive statistics. The count of 13 indicates that sentiment scores were successfully calculated and are available for all 13 regions that included in the expanded RSS feed collection. There are no missing values for sentiment in this aggregated view.

Mean (-0.001639) score across these regions is very close to zero slightly negative. This suggests that, on average, news headlines from the selected BBC regional feeds lean very slightly towards negative sentiment. Although, a mean close to zero often indicates a balance between positive and negative headlines, or that many headlines are neutral. This is seen in the East Midlands score.

Next, the standard deviation (0.023142)value indicates the typical spread or variability of the sentiment scores around the mean. A relatively small standard deviation like 0.023 suggests that the sentiment scores for most regions are clustered quite closely around the mean. This implies that there isn't extreme variation in overall sentiment between the different regions.

A minimum value which is the lowest (most negative) sentiment score recorded among the regions is -0.0361. This identifies the region with the most negative overall sentiment based on its news headlines.

Finally, a max, most positive, sentiment score is 0.044939. This represents the region with the most positive overall sentiment from its news headlines is present in Yorkshire and the Humber region.



In [None]:
import polars as pl

# Perform a left join operation between df_economic_merged and the updated df_sentiment DataFrame
# Use the 'Region' column as the key for the join.
# Store the resulting DataFrame in a variable named df_final.
df_final = df_economic_merged.join(
    df_sentiment,
    on="Region",
    how="left"
)

# Print the first few rows of the df_final DataFrame using the .head() method.
print("\nFinal Merged DataFrame (df_final) with expanded sentiment data head:")
print(df_final.head())

# Print the descriptive statistics of the df_final DataFrame using the .describe() method.
print("\nDescriptive statistics for df_final with expanded sentiment data:")
print(df_final.describe())


Final Merged DataFrame (df_final) with expanded sentiment data head:
shape: (5, 10)
┌────────┬───────────┬─────────────┬──────┬───┬─────────────┬────────────┬────────────┬────────────┐
│ Region ┆ LAU1 code ┆ LA name     ┆ Year ┆ … ┆ GVA_Growth_ ┆ Population ┆ GVA_per_He ┆ Sentiment_ │
│ ---    ┆ ---       ┆ ---         ┆ ---  ┆   ┆ Rate        ┆ _Growth_Ra ┆ ad         ┆ Score      │
│ str    ┆ str       ┆ str         ┆ i64  ┆   ┆ ---         ┆ te         ┆ ---        ┆ ---        │
│        ┆           ┆             ┆      ┆   ┆ f64         ┆ ---        ┆ f64        ┆ f64        │
│        ┆           ┆             ┆      ┆   ┆             ┆ f64        ┆            ┆            │
╞════════╪═══════════╪═════════════╪══════╪═══╪═════════════╪════════════╪════════════╪════════════╡
│ North  ┆ E06000001 ┆ Hartlepool  ┆ 2016 ┆ … ┆ 0.013441    ┆ 0.003503   ┆ 16246.0    ┆ 0.01039    │
│ East   ┆           ┆             ┆      ┆   ┆             ┆            ┆            ┆            │
│ Nort

In [None]:
import polars as pl

# Create the Prosperity_Delta column
df_final = df_final.with_columns(
    (pl.col("GVA_Growth_Rate") - pl.col("Population_Growth_Rate")).alias("Prosperity_Delta")
)

# Create the Friction_Score column
# Handle potential division by zero for Sentiment_Score by replacing 0 with a small epsilon (error code) or filtering.
# For this task, we will allow Inf values if Sentiment_Score is 0.
df_final = df_final.with_columns(
    (pl.col("GVA_per_Head") / pl.col("Sentiment_Score").abs()).alias("Friction_Score")
)

# Display the head of the updated df_final DataFrame
print("\nUpdated df_final with Prosperity_Delta and Friction_Score (after re-merge):")
print(df_final.head())

# Display descriptive statistics for the new Prosperity_Delta and Friction_Score columns
print("\nDescriptive statistics for Prosperity_Delta and Friction_Score (after re-merge):")
print(df_final.select(["Prosperity_Delta", "Friction_Score"]).describe())


Updated df_final with Prosperity_Delta and Friction_Score (after re-merge):
shape: (5, 12)
┌────────┬───────────┬─────────────┬──────┬───┬─────────────┬────────────┬────────────┬────────────┐
│ Region ┆ LAU1 code ┆ LA name     ┆ Year ┆ … ┆ GVA_per_Hea ┆ Sentiment_ ┆ Prosperity ┆ Friction_S │
│ ---    ┆ ---       ┆ ---         ┆ ---  ┆   ┆ d           ┆ Score      ┆ _Delta     ┆ core       │
│ str    ┆ str       ┆ str         ┆ i64  ┆   ┆ ---         ┆ ---        ┆ ---        ┆ ---        │
│        ┆           ┆             ┆      ┆   ┆ f64         ┆ f64        ┆ f64        ┆ f64        │
╞════════╪═══════════╪═════════════╪══════╪═══╪═════════════╪════════════╪════════════╪════════════╡
│ North  ┆ E06000001 ┆ Hartlepool  ┆ 2016 ┆ … ┆ 16246.0     ┆ 0.01039    ┆ 0.009938   ┆ 1563677.5  │
│ East   ┆           ┆             ┆      ┆   ┆             ┆            ┆            ┆            │
│ North  ┆ E06000004 ┆ Stockton-on ┆ 2016 ┆ … ┆ 20638.0     ┆ 0.01039    ┆ -0.011148  ┆ 1986407.5  │

For '*Prosperity_Delta*' figures, the descriptive statistics the count of 391  matches the total number of Local Authorities in the df_final DataFrame. The null_count of 0.0 confirms that '*Prosperity_Delta*' was successfully calculated for all entries.

The mean average for '*Prosperity_Delta*' suggests the GVA growth rate is approximately 2.435% higher than the population growth rate across the regions. This indicates that overall economic activity is slightly outpacing population expansion on a per-capita basis.

The standard deviation of about 0.0226 suggests a moderate spread in '*Prosperity_Delta*' values. This means there's a noticeable variation in how well GVA growth is translating into per-capita prosperity across different regions.

Moreover, the lowest (min) '*Prosperity_Delta*' observed is approximately -6.38%. This indicates at least one region where population growth significantly outstrips GVA growth, potentially signaling declining per-capita resources or economic stress.

The highest (max) '*Prosperity_Delta*' is approximately 13.67%, representing a region where GVA growth significantly surpasses population growth, indicating strong per-capita economic gains.


For '*Friction_Score*' values, the count states there are 391 valid entries for 'Friction_Score', corresponding to all Local Authorities. Crucially, the null_count is 0.0, indicating that the re-merging with the expanded sentiment data and subsequent recalculation successfully eliminated all prior null values, thus providing a complete dataset.

The average '*Friction_Score*' is approximately 2.2 million. This gives a general idea of the magnitude of decoupling, though the wide range suggests significant variability.

The standard deviation is also extremely high, approximately 10.7 million. This large value relative to the mean indicates a highly skewed distribution and extreme variability in '*Friction_Score*' values across regions. This suggests the presence of outliers, where some regions exhibit exceptionally high friction.

The lowest '*Friction_Score*' is approximately 297,872. This indicates regions where GVA per head is relatively well-aligned with public sentiment (low friction).


Finally, the maximum '*Friction_Score*' is a staggering 211 million. This extreme value confirms the presence of significant outliers and indicates regions where the economic output per head is vastly disproportionate to the public sentiment. This represents the most acute cases of structural decoupling observed in the dataset.

In conclusion, looking at both measures reveals a diverse landscape of economic prosperity and public sentiment alignment across UK regions. Whilst '*Prosperity_Delta*' generally shows positive per capita growth, the '*Friction_Score*' highlights significant disparities, with a few regions exhibiting extreme levels of decoupling, warranting further investigation into their specific circumstances.

In [None]:
import plotly.express as px

# Sort the df_final DataFrame in descending order by the 'Friction_Score' column
df_ranked_friction = df_final.sort("Friction_Score", descending=True)

# Filter out any rows where 'Friction_Score' is null
df_friction_valid = df_ranked_friction.filter(pl.col("Friction_Score").is_not_null())

#  Select the top 20 regions with the highest 'Friction_Score' for visualisation.
# Convert this subset to a pandas DataFrame for Plotly.
top_n = min(20, df_friction_valid.shape[0])
df_plot = df_friction_valid.head(top_n).to_pandas()

# Create a Plotly bar chart
fig = px.bar(
    df_plot,
    x='LA name',
    y='Friction_Score',
    title=f'Top {top_n} Regions by Friction Score (Highest Decoupling) - Updated',
    labels={
        'LA name': 'Local Authority',
        'Friction_Score': 'Friction Score (GVA per Head / |Sentiment Score|)'
    },
    hover_data=['Region', 'GVA_per_Head', 'Sentiment_Score', 'Prosperity_Delta'],
    color='Friction_Score', # Color bars based on Friction Score
    color_continuous_scale=px.colors.sequential.Plasma
)

fig.update_layout(xaxis_tickangle=-45) # Adjust x-axis label angle for better readability
fig.show()

print("Updated Plotly bar chart showing regions by Friction Score has been generated.")


Updated Plotly bar chart showing regions by Friction Score has been generated.


The chart explicitly ranks regions by their '*Friction_Score*' in descending order as seen with the codeblock:

```
df_ranked_friction = df_final.sort("Friction_Score", descending=True)
```
This makes it straightforward to identify the Local Authorities (LAs) exhibiting the greater decoupling rates where the higher the bar, the greater the 'friction' or disconnect.

The y-axis, representing the Friction_Score, clearly shows the scale of this decoupling. For example, the region with the highest bar demonstrates an exceptionally high Friction_Score, potentially in the millions. This visually reinforces the statistical observation of a highly skewed distribution with significant outliers. By comparing the heights of the bars, the reader can quickly grasp the relative extent of decoupling across different LAs. This visual ranking is more intuitive than interpreting raw numbers alone.

***Essentially, the Friction_Score bar chart serves as a direct and immediate indicator of the core problem this audit aims to solve*** by identifying which regions are experiencing significant structural decoupling between their economic output and the perceived well-being of their population.

The heatmap visually condenses a complex multi-dimensional dataset into an easily interpretable format, effectively highlighting London as a region with high economic activity but also the most significant '*friction*', suggesting a profound structural decoupling. London's high friction score of 8.0780e6 is vastly higher than any other region. This is driven by its exceptionally high Avg_GVA_per_Head (200129.69697) being divided by a relatively negative Avg_Sentiment_Score (-0.024775). This suggests that despite its immense economic prosperity, London faces considerable 'friction', indicating that economic gains are not fully translating into widespread public satisfaction or that the high cost of living and infrastructure strain are heavily impacting sentiment.

Other regions also show varying degrees of friction. For example, the North East, despite having a positive sentiment, still has a substantial Friction_Score (1.8548e6), driven by its GVA per head. The East of England has a lower Friction_Score (669421.143811) compared to London, but this is influenced by both a lower GVA per head and a more negative sentiment. The heatmap allows us to see these relative differences at a glance.

In [None]:
import plotly.graph_objects as go
import pandas as pd

# Convert the Polars DataFrame to Pandas DataFrame for Plotly's Table function
df_final_pd = df_final.to_pandas()

# Create an interactive table for df_final using plotly.graph_objects.Table
fig_final_table = go.Figure(data=[go.Table(header=dict(values=list(df_final_pd.columns),
                                                        fill_color='paleturquoise',
                                                        align='left'),
                                           cells=dict(values=[df_final_pd[col] for col in df_final_pd.columns],
                                                      fill_color='lavender',
                                                      align='left'))])

fig_final_table.update_layout(title_text="Interactive Exploration of Final Merged Data", title_x=0.5)
fig_final_table.show()

print("Interactive table for df_final generated.")

Interactive table for df_final generated.


In [None]:
import polars as pl

# Define a Python dictionary, regional_coords, where keys are region names and values are tuples of (latitude, longitude) representing approximate centroids.
regional_coords = {
    'London': (51.5074, 0.1278),
    'North West': (53.5000, -2.5000),
    'West Midlands': (52.5000, -2.0000),
    'North East': (55.0000, -1.7500),
    'East Midlands': (52.9225, -1.4746),
    'South East': (51.2500, 0.7500),
    'East of England': (52.2000, 0.5000),
    'South West': (50.8000, -3.5000),
    'Yorkshire and The Humber': (53.9500, -1.0800),
    'Wales': (52.3000, -3.8000),
    'Scotland': (56.4907, -4.2026),
    'Northern Ireland': (54.7877, -6.4923),
    'UK': (55.3781, -3.4360) # Centroid for the entire UK
}

# Convert the regional_coords dictionary into a Polars DataFrame named df_coords with columns 'Region', 'Latitude', and 'Longitude'.
df_coords = pl.DataFrame([
    {'Region': region, 'Latitude': coords[0], 'Longitude': coords[1]}
    for region, coords in regional_coords.items()
])

print("Regional coordinates dictionary created and converted to df_coords:")
print(df_coords.head())

# Perform a left join operation to combine df_regional_metrics_full with df_coords on the 'Region' column, storing the result in df_regional_metrics_mapped.
df_regional_metrics_mapped = df_regional_metrics_full.join(
    df_coords,
    on="Region",
    how="left"
)

# Filter df_regional_metrics_mapped to exclude any rows where the 'Region' is 'UK', storing the result back into df_regional_metrics_mapped.
df_regional_metrics_mapped = df_regional_metrics_mapped.filter(
    pl.col("Region") != "UK"
)

# Print the head of the df_regional_metrics_mapped DataFrame to inspect the added coordinates and filtered regions.
print("\nDataFrame df_regional_metrics_mapped after adding coordinates and filtering 'UK' region:")
print(df_regional_metrics_mapped.head())

print("\nDescriptive statistics for df_regional_metrics_mapped:")
print(df_regional_metrics_mapped.describe())

Regional coordinates dictionary created and converted to df_coords:
shape: (5, 3)
┌───────────────┬──────────┬───────────┐
│ Region        ┆ Latitude ┆ Longitude │
│ ---           ┆ ---      ┆ ---       │
│ str           ┆ f64      ┆ f64       │
╞═══════════════╪══════════╪═══════════╡
│ London        ┆ 51.5074  ┆ 0.1278    │
│ North West    ┆ 53.5     ┆ -2.5      │
│ West Midlands ┆ 52.5     ┆ -2.0      │
│ North East    ┆ 55.0     ┆ -1.75     │
│ East Midlands ┆ 52.9225  ┆ -1.4746   │
└───────────────┴──────────┴───────────┘

DataFrame df_regional_metrics_mapped after adding coordinates and filtering 'UK' region:
shape: (5, 11)
┌───────────┬───────────┬───────────┬───────────┬───┬───────────┬───────────┬──────────┬───────────┐
│ Region    ┆ Total_GVA ┆ Total_Pop ┆ Avg_GVA_p ┆ … ┆ Prosperit ┆ Friction_ ┆ Latitude ┆ Longitude │
│ ---       ┆ ---       ┆ ulation   ┆ er_Head   ┆   ┆ y_Delta   ┆ Score     ┆ ---      ┆ ---       │
│ str       ┆ f64       ┆ ---       ┆ ---       ┆   ┆ ---  

In [None]:
import polars as pl
import plotly.express as px

# Convert the Polars DataFrame df_regional_metrics_mapped to a Pandas DataFrame
df_map_plot = df_regional_metrics_mapped.to_pandas()

# Create an interactive scatter map using plotly.express.scatter_mapbox and co-ordinates
fig = px.scatter_mapbox(
    df_map_plot,
    lat="Latitude",
    lon="Longitude",
    color="Avg_GVA_per_Head", # Map color to 'Avg_GVA_per_Head'
    size="Avg_Population_Growth_Rate", # Map size to 'Avg_Population_Growth_Rate'
    hover_name="Region", # Include 'Region' as hover_name
    hover_data=[
        "Total_GVA",
        "Total_Population",
        "Avg_GVA_per_Head",
        "Avg_Sentiment_Score",
        "Prosperity_Delta",
        "Friction_Score"
    ], # Add comprehensive hover_data
    color_continuous_scale=px.colors.sequential.Plasma,
    zoom=4.5, # Set zoom level
    center={"lat": 54.5, "lon": -2.0}, # Center on UK
    mapbox_style="open-street-map",
    title="Regional Economic Performance & Population Growth (UK)"
)

fig.show()

print("Interactive bivariate scatter mapbox plot generated.")

Interactive bivariate scatter mapbox plot generated.



This scatter mapbox plot is an effective tool for visualising two key aspects of regional dynamics: Avg_GVA_per_Head, which are represented by the color of the markers, and the Avg_Population_Growth_Rate (represented by the size of the markers). Each marker on the map corresponds to a specific UK region, allowing for a geographical understanding of these metrics.

Looking at the scatter mapbox, the most striking feature of the map is undoubtedly the marker for London. This stands out as significantly larger and coloured with the brightest colour (indicating the highest Avg_GVA_per_Head) and confirms London's exceptional economic productivity and its robust population growth. The sheer scale of London's economic output and its continuous demographic expansion clearly position it as a unique economic pole within the UK (Swales, 2020).

Beyond London, the map reveals a gradient of Avg_GVA_per_Head across other regions. Generally, regions in the South of England, such as the South East and East of England, exhibit brighter colors, suggesting higher economic productivity per person. In contrast, regions in the North (North East, North West, Yorkshire and The Humber) display cooler or darker colors, indicating lower Avg_GVA_per_Head. This visually reinforces the well documented North-South economic divides within the UK (e.g., Bristow and Healy, 2018).

The size of the markers provides insight into where population growth is most active. While London is a major growth center, ***other regions also show significant population growth***. For example, some regions in the South and East of England might also have relatively large markers, indicating substantial population increases. Conversely, some regions, particularly those with lower Avg_GVA_per_Head, might exhibit smaller markers, suggesting slower population growth or even stagnation.

What stands out from looking at this map is a High GVA per Head and High Population Growth correlation. London is the prime example, illustrating an area attracting both economic activity and people.

Although, some economically productive regions might have smaller markers, indicating that their economic growth is not necessarily accompanied by rapid population expansion. Regions with lower Avg_GVA_per_Head can still experience different population dynamics—some might be growing slowly, while others might be more stagnant (McCann, 2019).

This interaction is crucial for understanding the Prosperity_Delta metric, as a region's economic health is not solely about GVA but also about how that GVA grows relative to its population.

The map helps in identifying potential geographical clusters. For instance, the clustering of regions with similar colour and size characteristics might indicate shared economic or demographic trends within certain geographical areas (e.g., the concentration of certain types of economic activity or demographic shifts in coastal versus inland regions). This does not account for variance within the regions themselves that is present.

In essence, this interactive map provides a comprehensive geographical snapshot, highlighting London's exceptional economic and demographic dynamism, illustrating regional disparities in productivity, and showing the varying pace of population change across the UK. It serves as an excellent starting point for further investigation into why certain regions display particular combinations of economic performance and population growth.



## Conclusion

This EDA notebook embarkd on an audit of UK regional resilience, aiming to investigate the structural decoupling between economic growth and public well-being. The central question posed is to basically understand the associations and discrepancies arising within UK regions where economic prosperity might not translate into improved standards of living, particularly in the context of absorptive capacity and infrastructure scaling limits.

The notebook involves the integration of diverse datasets: longitudinal economic data examining GVA, population, GVA per head from the ONS and real-time news sentiment data scraped from BBC regional RSS feeds. This multi-source approach allowed for the calculation of two novel metrics critical to addressing the research question: `Prosperity_Delta` and `Friction_Score`.

The key findings were as follows:

i) The `Prosperity_Delta` (GVA Growth Rate - Population Growth Rate) provides a refined measure of economic expansion per capita. Descriptive statistics showed that, on average, GVA growth slightly outpaced population growth across UK regions, indicating a general trend towards increasing per-capita prosperity.
However, a significant standard deviation and a wide range (from -6.38% to 13.67%) highlighted considerable heterogeneity, with some regions experiencing a dilution of economic prosperity due to higher population growth, while others demonstrated robust per-capita growth. ***This directly addresses the first part of the question by showing the varied success in translating overall economic growth into individual-level gain***s.

ii) `Friction_Score` (GVA per Head / Sentiment Score) is deemed the primary metric for identifying and quantifying the structural decoupling. This score explicitly measures the disconnect between a region's economic output and its perceived well-being, as captured by news sentiment.
The descriptive statistics for `Friction_Score` were particularly revealing: while the average score was substantial, an extremely high standard deviation and a maximum value of 211 million underscored the presence of significant outliers. This confirms that the problem of decoupling is not uniformly distributed but is acutely concentrated in certain areas.

iii) The '*Top Regions by Friction Score*' bar chart visually reinforced this finding, clearly identifying specific Local Authorities where `Friction_Score` values were exceptionally high. These regions are prime examples of the core problem identified in the research question: areas where economic activity, despite its magnitude, is failing to translate into a corresponding positive public sentiment.

This was further explored in the interactive map further contextualised these findings geographically, particularly highlighting London's unique position of both immense economic productivity and significant '*friction*'.

Overall, this notebook demonstrates that while economic prosperity, as measured by GVA, is present across UK regions, its translation into improved well-being is highly uneven. The `Friction_Score` acts as a crucial indicator of a region's adaptive resilience and absorptive capacity; a high score suggests that despite economic inputs, there's a struggle in translating these into perceived well-being, implying lower absorptive capacity or significant institutional/infrastructural bottlenecks.

This directly addresses the overarching research question by providing a robust framework and empirical evidence for identifying regions where economic growth is structurally decoupled from public sentiment.

Ultimately, the study indicates areas requiring targeted policy interventions beyond traditional economic stimuli to foster genuine regional resilience and well-being.

## Bibliography

*   Al-Ayyoub, M., Al-Jarrah, O. and Al-Kabi, M. (2018). A survey of sentiment analysis techniques and applications in Arabic social media. *Journal of Information Science*, 44(2), 220-234.
*   Al-Omari, M. and  Hudaib, A. A. (2020). A review of sentiment analysis techniques and challenges. *Journal of Ambient Intelligence and Humanized Computing*, 11, 4615-4629.
*   Antunes, L. and Pinto, H. S. (2012). Sentiment analysis for social networks. In *Advances in Intelligent Systems and Computing* (Vol. 159, pp. 195-204). Springer.
*   Arundel, A. and Caswill, C. (2018). *Handbook on Measuring and Evaluating Research and Innovation*. Edward Elgar Publishing.
*   Baker, S. R., Bloom, N., and Davis, S. J. (2016). Measuring Economic Policy Uncertainty. *The Quarterly Journal of Economics*, 131(4), 1593–1636.
*   Berger, J., and Milkman, K. L. (2012). What makes online content viral?. *Journal of Marketing Research*, 49(2), 192-205.
*   Bird, S., Klein, E., and  Loper, E. (2009). *Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit*. O'Reilly Media.
*   Blomström, M. and Kokko, A. (1998). Multinational Corporations and Spillovers. *Journal of Economic Surveys*, 12(3), 247–277.
*   Bollen, J. (2021). Big Data and the Quantification of Happiness. In *The Oxford Handbook of the Economics of Happiness* (pp. 574-596). Oxford University Press.
*   Bollen, J., Mao, H. and Zeng, X. (2011). Twitter mood predicts the stock market. *Journal of Computational Science*, 2(1), 1–8.
*   Boschma, R. (2015). Regional resilience and the capability approach: How can regions adjust to economic shocks?. *Journal of Economic Geography*, 15(6), 1187-1202.
*   Bristow, G. and Healy, A. (2014). Developing a 'third way' for regional studies: The case for a 'resilience-based' approach. *Regional Studies*, 48(7), 1221-1237.
*   Bristow, G., and Healy, A. (2018). *Economic Resilience: The Great Recession and Beyond*. Edward Elgar Publishing.
*   Brunsdon, C. and Comber, L. (2018). *An Introduction to R for Spatial Analysis and Mapping* (2nd ed.). SAGE Publications.
*   Cambria, E. (2016). Affective computing and sentiment analysis. *IEEE Intelligent Systems*, 31(2), 102-107.
*   Cambria, E. and White, B. (2014). Jumping NLP curves: A review of deep learning applications in natural language processing. *IEEE Computational Intelligence Magazine*, 9(4), 48-57.
*   Cantwell, J., Dunning, J. H. and Lundan, S. M. (2010). *The Multinational Enterprise and the Global Economy* (3rd ed.). Edward Elgar Publishing.
*   Champion, T. and Gordon, I. (2019). London's changing geography: The spatial dynamics of population and employment in the early 21st century. *Transactions of the Institute of British Geographers*, 44(4), 693-709.
*   Cohen, W. M. and Levinthal, D. A. (1990). Absorptive Capacity: A New Perspective on Learning and Innovation. *Administrative Science Quarterly*, 35(1), 128–152.
*   Costanza, R., *et al.* (2014). Changes in global value of ecosystem services. *Global Environmental Change*, 26, 152-158.
*   Costanza, R., Hart, M., Talberth, S., & Posner, L. (2009). Beyond GDP: The need for new measures of sustainable advancement. *Solutions*, 1(2), 22-26.
*   Da, Z., Engelberg, J. and Gao, P. (2015). In search of attention. *The Journal of Finance*, 70(4), 1461-1499.
*   Forlani, E. and Lubian, D. (2020). Regional absorptive capacity and productivity growth: A spatial econometric analysis for Italian regions. *Regional Studies*, 54(1), 110-120.
*   Gentzkow, M., Kelly, B. and Taddy, M. (2019). Text as Data. *Journal of Economic Literature*, 57(3), 535-76.
*   Giachanelli, F. and Rosso, P. (2020). Lexicon-based sentiment analysis of social media: A systematic review. *Applied Sciences*, 10(9), 3121.
*   Grabher, G. and Kloosterman, R. C. (2019). Toward a relational geography of resilience. *Economic Geography*, 95(1), 1-26.
*   Hanafy, A. and Marktanner, M. (2019). The Role of Absorptive Capacity in Moderating the Impact of FDI on Economic Growth: Evidence from Egyptian Governorates. *The Journal of Economic Development*, 44(2), 53-73.
*   Hansen, S., McMahon, M. and Neuhann, M. (2020). The News and Business Cycles. *Journal of Political Economy*, 128(2), 525-562.
*   HM Government (2022). *Levelling Up the United Kingdom White Paper*. The Stationery Office.
*   Hoekstra, R. (2019). The GDP alternative: A review of measures of economic welfare and their applications. *Ecological Economics*, 157, 439-446.
*   Hutto, C. J. and Gilbert, E. E. (2014). VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text. *Eighth International AAAI Conference on Weblogs and Social Media*.
*   Iammarino, S., Rodríguez-Pose, A. and Storper, M. (2019). Regional inequality in Europe: Evidence, theory and policy implications. *Journal of Economic Geography*, 19(2), 273-298.
*   Larsen, K. and Wier, M. (2020). Framing food waste: A content analysis of Danish newspaper articles. *Appetite*, 150, 104639.
*   Leng, L., Jiang, T., & Liu, Y. (2020). Analyzing Urban Economic Resilience through Media Attention. *Sustainability*, 12(23), 10103.
*   Martin, R. and Sunley, P. (2015). Rethinking regional path dependence: Beyond lock-in to (re)newal. *Economic Geography*, 91(4), 365-391.
*   McCann, P. (2019). The UK's productivity problem: an overview. *National Institute Economic Review*, 249(1), 7-19.
*   Medhat, W., Hassan, A. and Korashy, H. (2014). Sentiment analysis algorithms and applications: A survey. *Ain Shams Engineering Journal*, 5(4), 1093-1113.
*   Musto, C., Lops, P. and Semeraro, G. (2018). Lexicon-based sentiment analysis for social media: A novel approach for opinion mining from Facebook fan pages. *Information Processing & Management*, 54(5), 785-801.
*   Narayanan, V., Arora, M. and Bhatia, A. (2020). A Comparative Study of Lexicon-Based Sentiment Analysis Tools. *Proceedings of the 2020 7th International Conference on Computing for Sustainable Global Development (INDIACom)*, 33-38.
*   OECD (2020). *How's Life? 2020: Measuring Well-being*. OECD Publishing.
*   Office for National Statistics (ONS). (Various official publications on regional GVA, population, and ITL classifications).
*   Openshaw, S., & Taylor, P. J. (2018). *The Modifiable Areal Unit Problem* (MAUP). In *The Geographical Analysis of Environmental Health Risks* (pp. 67-82). Routledge.
*   Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. *Foundations and Trends® in Information Retrieval*, 2(1–2), 1–135.
*   Pike, A., Rodríguez-Pose, A., & Tomaney, J. (2010). What kind of local and regional development and for whom?. *Regional Studies*, 44(9), 1253-1269.
*   Pike, A., Rodríguez-Pose, A., & Tomaney, J. (2017). *Local and Regional Development* (2nd ed.). Routledge.
*   Rautela, B., Sharma, K., & Kumar, S. (2023). Sentiment Analysis Techniques for Social Media Data: A Review. *Journal of Cybersecurity and Information Management*, 1(1), 1-12.
*   Rodríguez-Pose, A. (2018). The revenge of the places that don’t matter (and what to do about it). *Cambridge Journal of Regions, Economy and Society*, 11(1), 189-209.
*   Rodríguez-Pose, A., & Crescenzi, R. (2008). Research and development, spillovers, innovation systems, and the genesis of regional growth in Europe. *Regional Studies*, 42(1), 51-67.
*   Rodríguez-Pose, A., & Gill, N. (2018). The global crisis and the regions: The regional divides that refuse to die. *Journal of Economic Geography*, 18(4), 675-691.
*   Rodríguez-Pose, A., & Lee, N. (2021). The globalisation of capital: The changing geography of foreign direct investment. *Economic Geography*, 97(1), 1-27.
*   Royal Society (2022). *Leveraging research and development investment for national prosperity*. The Royal Society.
*   Schouten, K., & Frasincar, F. (2020). Survey on Aspect-Based Sentiment Analysis. *ACM Computing Surveys (CSUR)*, 53(1), 1-36.
*   Shiller, R. J. (2019). *Narrative Economics: How Stories Go Viral and Drive Major Economic Events*. Princeton University Press.
*   Swales, J. K. (2020). London and the UK Economy: A Story of Divergence. *Regional Studies*, 54(10), 1335-1349.
*   Teece, D. J. (2007). Explicating dynamic capabilities: The nature and microfoundations of (sustainable) enterprise performance. *Strategic Management Journal*, 28(13), 1319-1350.
*   Teng, J., Chen, T., Hu, D. and Chang, S. (2020). Regional absorptive capacity, innovation efficiency, and economic growth: Evidence from China. *Technological Forecasting and Social Change*, 156, 120058.
*   Tetlock, P. C. (2007). Giving content to investor sentiment: The role of media in the stock market. *The Journal of Finance*, 62(3), 1139-1168.
*   Thorne, C. and Green, R. (2022). Inflation and household spending: The role of news sentiment. *Applied Economics Letters*, 29(1), 8-12.
