### Housing cost impact on poverty
In this notebook we examine the impact of housing costs on poverty. We use Households Below Average Income (HBAI) data from https://stat-xplore.dwp.gov.uk/webapi/jsf/login.xhtml?invalidSession=true&reason=Session+not+established.


Stat-Xplore provides HBAI data both before housing costs (BHC) and after housing costs (AHC). By comparing the low-income population after housing costs with the low-income population before housing costs, we can see how housing costs affect low-income groups in each region and over a long time frame.
We could use these observations to find the best approach to helping low-income groups.
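
As a minimal sketch of the comparison described above, the snippet below computes the housing cost impact for one region and one year. The figures and the column names (`low_income_bhc`, `low_income_ahc`, `total_bhc`) are hypothetical and chosen only for illustration; they are not HBAI values.

```python
import pandas as pd

# Hypothetical figures for a single region and year (not real HBAI values).
toy = pd.DataFrame(
    {
        "low_income_bhc": [400_000],  # low-income population before housing costs
        "low_income_ahc": [550_000],  # low-income population after housing costs
        "total_bhc": [3_000_000],     # total regional population (BHC table)
    },
    index=pd.Index(["2019-20"], name="Financial Year"),
)

# Extra low-income population after housing costs, as a percentage of the total.
toy["impact_pct"] = (
    (toy["low_income_ahc"] - toy["low_income_bhc"]) / toy["total_bhc"] * 100
)
print(toy["impact_pct"])
```

With these made-up numbers, 150,000 more people count as low income after housing costs, which is 5% of the regional total.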

In [2]:
import pandas as pd

import plotly.express as px
import sys



def read_csv(data_file):

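    # Read the CSV file, skipping the first line so that the second line is used as the header.
    # skipfooter=0 keeps all trailing rows; non-data rows are dropped by the Financial Year filter below.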
    df = pd.read_csv(data_file, skiprows=1, skipfooter=0, engine='python')

    # Drop empty columns and rows
    df = df.dropna(how='all', axis=1).dropna(how='all', axis=0)
    #print(df)
    # Extract the year column and clean it
    df['Financial Year'] = df['Financial Year'].str.extract(r'(\d{4}-\d{2})')
    df = df.dropna(subset=['Financial Year'])
    print(df.head())
    # Set the year as the index
    df.set_index('Financial Year', inplace=True)

    # extract data columns.
    total_col_index = df.columns.str.contains('Total').argmax()
    not_low_income = df.iloc[:, :total_col_index].add_suffix('-not low income')

    # Strip the '.1', '.2' suffixes that pandas appends to duplicated column names
    df.columns = df.columns.str.replace(r'\.\d+$', '', regex=True)
    
    low_income = df.iloc[:, total_col_index + 1: total_col_index *2+1 ].add_suffix('-low income')

    total_columns = df.iloc[:, total_col_index *2 +2:].add_suffix('-total')

    # Combine the three groups of columns
    result = pd.concat([not_low_income, low_income, total_columns], axis=1)

    # Save or display the resulting DataFrame
    #print(result.head())  # Display the first few rows
    result.to_csv(data_file+"_processed.csv")  # Save to a new CSV file
    #print(df.head())
    #print("----")    

    # convert to numbers
    result = result.apply(pd.to_numeric, errors='coerce')
    print(result.head(10))
    return result, low_income.columns
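
The positional slicing in `read_csv` relies on pandas renaming repeated headers with `.1`, `.2` suffixes. The sketch below reproduces that behaviour on a tiny in-memory CSV. The layout (region columns repeated for not low income, low income, and total, each group closed by a `Total` column) is an assumption inferred from the slices above, and the numbers are invented.

```python
import io
import pandas as pd

# Hypothetical miniature of the export layout: the same region headers repeat
# once per income group, with a "Total" column closing each group.
raw = io.StringIO(
    "Financial Year,Wales,London,Total,Wales,London,Total,Wales,London,Total\n"
    "1994-95,10,20,30,1,2,3,11,22,33\n"
)
toy = pd.read_csv(raw).set_index("Financial Year")

# pandas disambiguates the repeated headers with numeric suffixes.
print(list(toy.columns))

# Position of the first "Total" column equals the number of regions per group.
n = toy.columns.str.contains("Total").argmax()

# Strip the ".1"/".2" suffixes, then slice each group by position.
toy.columns = toy.columns.str.replace(r"\.\d+$", "", regex=True)
not_low = toy.iloc[:, :n]
low = toy.iloc[:, n + 1 : 2 * n + 1]
total = toy.iloc[:, 2 * n + 2 : 3 * n + 2]
print(low)
```

Unlike `read_csv` above, this sketch stops the last slice before the final `Total` column, so only region columns remain in `total`.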






In [3]:
AHC, regions=read_csv("./HBAI-AHC-Region.csv")
BHC, regions=read_csv("./HBAI-BHC-Region.csv")

  Financial Year Northern Ireland (N92000002) Scotland (S92000003)  \
0        1994-95                           ..              3872082   
1        1995-96                           ..              3857882   
2        1996-97                           ..              3768287   
3        1997-98                           ..              3872566   
4        1998-99                           ..              3848457   

  Wales (W92000004) South West (E12000009) South East (E12000008)  \
0           2092898                3555797                6080289   
1           2054994                3632870                6106879   
2           2084972                3576151                6143522   
3           2071710                3590371                6280984   
4           2133126                3657530                6263626   

  London (E12000007) East (E12000006) North East (E12000001)  \
0            4870625          3939325                1820345   
1            4808260          410560

In [4]:
region_list=list(regions.str.replace('-low income',''))

region_list

['Northern Ireland (N92000002)',
 'Scotland (S92000003)',
 'Wales (W92000004)',
 'North East (E12000001)',
 'North West (E12000002)',
 'Yorkshire and The Humber (E12000003)',
 'East Midlands (E12000004)',
 'West Midlands (E12000005)',
 'East (E12000006)',
 'London (E12000007)',
 'South East (E12000008)',
 'South West (E12000009)']

In [5]:
def plot_hbai_trends(ahc_df, bhc_df, region_list):
    merged_data = pd.merge(ahc_df, bhc_df, on='Financial Year', suffixes=('_AHC', '_BHC'))


    # Calculate the impact for each region:
    # (low income AHC - low income BHC) as a percentage of the total BHC population
    for region in region_list:
        print(region)
        low_income_diff = merged_data[f'{region}-low income_AHC'] - merged_data[f'{region}-low income_BHC']
        merged_data[f'{region} Impact'] = low_income_diff / merged_data[f'{region}-total_BHC'] * 100


    merged_data.to_csv("merged.csv")
    region_impact_cols = [f'{region} Impact' for region in region_list]
    merged_data=merged_data.reset_index()
    # Melt the data to long format for easier plotting (if needed)
    impact_data = merged_data.melt(
        id_vars=['Financial Year'],
        value_vars=region_impact_cols,
        var_name='Region',
        value_name='Housing Cost Impact'
    )

    # Create the line chart
    fig = px.line(impact_data, 
                x='Financial Year', 
                y='Housing Cost Impact', 
                color='Region',
                title='Impact of Housing Costs on Low-Income Populations (AHC - BHC) by Region, 1994-2023',
                labels={'Housing Cost Impact': 'Difference in Low-Income Population (AHC - BHC)', 'Financial Year': 'Year'},
                hover_data=['Region'])

    # Customize the layout (optional)
    fig.update_layout(
        xaxis_title="Year",
        yaxis_title="Change in low-income population due to housing costs (% of regional total)",
        legend_title="Region",
        hovermode="x unified"  # Shows data for all regions at a given year
    )
    fig.add_annotation(
        text="Note: percentage is calculated as (low income AHC in region - low income BHC in region) / total BHC in region.",
        xref="paper", yref="paper",  # Relative to the entire plot
        x=0.5, y=-0.2,  # Position below the plot
        showarrow=False,
        font=dict(size=12, color="gray")
    )

    # Show the plot
    fig.show()
    fig.write_html("pct_impacted_by_housing_cost.html")





In [6]:

plot_hbai_trends(AHC,BHC, region_list)

Northern Ireland (N92000002)
Scotland (S92000003)
Wales (W92000004)
North East (E12000001)
North West (E12000002)
Yorkshire and The Humber (E12000003)
East Midlands (E12000004)
West Midlands (E12000005)
East (E12000006)
London (E12000007)
South East (E12000008)
South West (E12000009)


### Observations
London shows the largest impact of housing costs on low-income populations, peaking at 12-13% in the late 2010s, reflecting its high housing costs. 

Focusing on actions that help with housing costs would benefit the low-income population there. For example, partner with local housing charities (e.g., Shelter) to provide rental assistance, emergency housing funds, or advocacy for affordable housing policies.



Northern Ireland and North East show the smallest impact, generally below 3%, indicating lower housing cost pressures in these regions.

In these regions, we could focus resources on helping the low-income population in other ways, e.g. digital awareness.
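
The regional comparisons above were read off the chart. One way to check them numerically is to summarise the `<region> Impact` columns by mean, peak value, and peak year. The sketch below uses a hypothetical stand-in DataFrame with invented numbers; the same summary could be applied to the Impact columns that `plot_hbai_trends` saves to `merged.csv`.

```python
import pandas as pd

# Hypothetical stand-in for the "<region> Impact" columns computed in
# plot_hbai_trends; the numbers are illustrative, not HBAI results.
impact = pd.DataFrame(
    {
        "London Impact": [10.0, 12.0, 13.0],
        "Wales Impact": [4.0, 5.0, 4.5],
        "North East Impact": [2.0, 2.5, 3.0],
    },
    index=pd.Index(["2015-16", "2016-17", "2017-18"], name="Financial Year"),
)

# Rank regions by average impact and record when each one peaked.
summary = pd.DataFrame(
    {
        "mean_impact": impact.mean(),
        "peak_impact": impact.max(),
        "peak_year": impact.idxmax(),
    }
).sort_values("mean_impact", ascending=False)
print(summary)
```

Sorting by the mean rather than the peak keeps a single unusual year from dominating the ranking.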
