# ST449 Assignment 3 – Group Project (Autumn Term 2024)


## Table of Contents

1. Introduction
   - Justification for Predicting Housing Prices in London
   - Challenges and Opportunities
2. Related Work and Motivation
   - Literature Review
   - Motivation for Using Gradient Boosting Models
3. Problem Breakdown
   - Overview of the Housing Market in London
   - Specific Challenges in Data and Prediction
4. Articulation of Contributions
   - Dataset Selection and Curation
   - Novel Feature Engineering Techniques
   - Model Benchmarking and Optimization
5. Self-Containedness
   - Description of Dataset Variables
   - Mathematical Framework for the Prediction Model
   - Principles of Gradient Boosting
6. Problem Formulation
   - Framing the Task as a Supervised Regression Problem
   - Mathematical Representation of Gradient Boosting
7. Solution
   - Dataset Description and Preprocessing
   - Feature Engineering
   - Model Training and Optimization Workflow
   - Evaluation Metrics
8. Numerical Experiments
   - Model Benchmarking and Results
   - Visual Analysis of Results (Charts and Tables)
   - Discussion on Model Performance
9. Limitations and Future Work
   - Identified Limitations of the Current Approach
   - Proposed Directions for Future Research
10. References
    - Academic and Industry Sources
    - Relevant Datasets and Tools
11. Appendices
    - Additional Plots or Data Descriptions
    - Hyperparameter Optimization Details


# Paper which uses LSTM for house price prediction:

https://www.sciencedirect.com/science/article/pii/S1029313223000623


# STEP 1 - DATA CLEANING  - 6th JANUARY MONDAY

# 1) House Price Index (2020 to 2024)

In [12]:
import pandas as pd

# Load the dataset
HPI_Index_Data = pd.read_csv('/Users/firishtah/Desktop/LSE Autumn 24/ST 449 Artificial Intelligence/Data/UK House price index.csv')

# Stripping whitespace from column names
HPI_Index_Data.columns = HPI_Index_Data.columns.str.strip()

# Renaming the 'Unnamed: 0' column to 'Date'
HPI_Index_Data.rename(columns={'Unnamed: 0': 'Date'}, inplace=True)

# Parsing the 'Date' column using the correct format for 'MMM-YY'
HPI_Index_Data['Date'] = pd.to_datetime(HPI_Index_Data['Date'], format='%b-%y', errors='coerce')

# Checking for any NaT  values after conversion
print(HPI_Index_Data['Date'].isnull().sum())  # To check how many invalid dates exist

# Dropping rows with invalid dates (NaT values)
HPI_Index_Data = HPI_Index_Data.dropna(subset=['Date'])

# Dropping Empty Column 
HPI_Index_Data = HPI_Index_Data.drop(columns=['Unnamed: 47'])

# Extracting Year, Month, and Quarter from the 'Date' column
HPI_Index_Data['Year'] = HPI_Index_Data['Date'].dt.year
HPI_Index_Data['Month'] = HPI_Index_Data['Date'].dt.month
HPI_Index_Data['Quarter'] = HPI_Index_Data['Date'].dt.to_period('Q').dt.strftime('Q%q')  

# Dropping the 'Date' column
HPI_Index_Data = HPI_Index_Data.drop(columns=['Date'])

# Filtering for the third month of each quarter
third_month_data = HPI_Index_Data[HPI_Index_Data['Month'].isin([3, 6, 9, 12])]

# Dropping the 'Month' column after filtering
third_month_data = third_month_data.drop(columns=['Month'])

# Filtering for years 2020 to 2024
filtered_data = third_month_data[third_month_data['Year'].isin([2020, 2021, 2022, 2023, 2024])]

# Display 
print(filtered_data)


1
    City of London Barking & Dagenham   Barnet   Bexley    Brent  Bromley  \
303        847,240            300,166  522,194  339,110  467,597  431,300   
306        871,336            298,994  521,636  341,318  486,162  425,411   
309        800,735            300,833  532,669  345,487  522,852  437,115   
312        797,973            308,256  528,283  352,551  498,195  452,778   
315        703,795            304,797  544,008  362,419  492,811  455,847   
318        753,197            310,184  549,050  366,458  494,369  469,054   
321        910,270            308,365  568,815  374,948  486,748  472,067   
324      1,006,720            313,770  580,077  382,834  489,429  477,399   
327        889,061            327,669  582,333  388,419  493,298  486,891   
330        885,382            332,345  597,335  398,990  516,404  502,723   
333        910,396            338,673  602,206  410,345  555,888  514,974   
336        897,720            345,399  607,032  417,739  532,528  520,825 
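
The prices printed above still carry thousands separators (for example `847,240`), which suggests the borough columns are stored as strings rather than numbers. The following is a minimal sketch of how they could be converted and reshaped to one row per borough and quarter. The small frame is a stand-in for `filtered_data`: the price strings are copied from the output above, while the `Year`/`Quarter` labels attached to them and the helper name `to_long_prices` are assumptions made for illustration.

```python
import pandas as pd

# Stand-in for `filtered_data`: two boroughs, two quarters.
# Price strings come from the printed output; Year/Quarter labels are assumed.
sample = pd.DataFrame({
    "City of London": ["847,240", "871,336"],
    "Barking & Dagenham": ["300,166", "298,994"],
    "Year": [2020, 2020],
    "Quarter": ["Q1", "Q2"],
})


def to_long_prices(df, id_cols=("Year", "Quarter")):
    """Hypothetical helper: strip thousands separators and reshape to long format."""
    long_df = df.melt(id_vars=list(id_cols), var_name="Borough", value_name="Average_Price")
    long_df["Average_Price"] = pd.to_numeric(
        long_df["Average_Price"].astype(str).str.replace(",", "", regex=False),
        errors="coerce",  # anything that is still not a number becomes NaN
    )
    return long_df


hpi_long = to_long_prices(sample)
print(hpi_long)
```

A long layout of this kind keys every price by `Year`, `Quarter` and `Borough`, which is typically easier to join with the quarterly and borough-level tables cleaned in the later steps.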

# 2) Economic Growth (in %)

In [6]:
import pandas as pd

# Loading the dataset
file_path = '/Users/firishtah/Desktop/LSE Autumn 24/ST 449 Artificial Intelligence/Data/EconomicGrowth_UK.csv'
Economic_Growth_Data = pd.read_csv(file_path)

# Renaming columns
Economic_Growth_Data.rename(columns={
    'Gross domestic product (GDP) growth rate in the United Kingdom 2029': 'Year',
    'Unnamed: 1': 'GDP Growth',
    'Unnamed: 2': 'Percentage'
}, inplace=True)

# Converting 'Year' to string and stripping leading/trailing spaces
Economic_Growth_Data['Year'] = Economic_Growth_Data['Year'].astype(str).str.strip()

# Removing non-numeric characters 
Economic_Growth_Data['Year'] = Economic_Growth_Data['Year'].replace(r'\D', '', regex=True)

# Converting 'Year' to numeric and coerce errors
Economic_Growth_Data['Year'] = pd.to_numeric(Economic_Growth_Data['Year'], errors='coerce')

# Removing rows where 'Year' is NaN
Economic_Growth_Data.dropna(subset=['Year'], inplace=True)

# Removing any rows where 'GDP Growth' is NaN
Economic_Growth_Data.dropna(subset=['GDP Growth'], inplace=True)

# Converting 'GDP Growth' to numeric 
Economic_Growth_Data['GDP Growth'] = pd.to_numeric(Economic_Growth_Data['GDP Growth'], errors='coerce')

# Filtering the data to include only years from 2020 to 2024
Filtered_Economic_Growth_Data = Economic_Growth_Data[(Economic_Growth_Data['Year'] >= 2020) & (Economic_Growth_Data['Year'] <= 2024)].copy()

# Dropping the 'Percentage' column
Filtered_Economic_Growth_Data.drop(columns=['Percentage'], inplace=True)

# Converting the 'Year' column to integer to remove the decimal part
Filtered_Economic_Growth_Data['Year'] = Filtered_Economic_Growth_Data['Year'].astype(int)

# Display
print(Filtered_Economic_Growth_Data)


   Year  GDP Growth
3  2020      -10.30
4  2021        8.58
5  2022        4.84
6  2023        0.34
7  2024        1.08
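
GDP growth is annual, whereas most of the other series in this step are quarterly. One possible way to align them, sketched below, is to repeat each annual figure across the four quarters of its year with a left join on `Year`. The values are copied from the output above; the quarterly skeleton and the name `gdp_quarterly` are assumptions for illustration, and repeating the annual figure is only one of several reasonable choices.

```python
import pandas as pd

# Annual GDP growth, copied from the output above
gdp = pd.DataFrame({
    "Year": [2020, 2021, 2022, 2023, 2024],
    "GDP Growth": [-10.30, 8.58, 4.84, 0.34, 1.08],
})

# Quarterly skeleton: every (Year, Quarter) combination for the same years
quarters = pd.MultiIndex.from_product(
    [gdp["Year"], ["Q1", "Q2", "Q3", "Q4"]], names=["Year", "Quarter"]
).to_frame(index=False)

# The left join repeats each annual figure across its four quarters
gdp_quarterly = quarters.merge(gdp, on="Year", how="left")
print(gdp_quarterly.head(8))
```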


# 3) Council Tax per Borough

In [14]:
import pandas as pd

# Load the dataset
file_path = '/Users/firishtah/Desktop/LSE Autumn 24/ST 449 Artificial Intelligence/Data/Council Tax per dwelling.csv'
Council_Tax_Data = pd.read_csv(file_path)

# Renaming columns to remove commas and extra spaces from the year column names
Council_Tax_Data.columns = ['Authority', '2020', '2021', '2022', '2023', '2024', 'Unnamed: 6']

# Dropping the unnecessary 'Unnamed: 6' column
Council_Tax_Data.drop(columns=['Unnamed: 6'], inplace=True)

# Cleaning the data by converting to numeric and handling potential non-numeric characters
for year in ['2020', '2021', '2022', '2023', '2024']:
    Council_Tax_Data[year] = Council_Tax_Data[year].replace({',': ''}, regex=True)  
    Council_Tax_Data[year] = pd.to_numeric(Council_Tax_Data[year], errors='coerce') 

# Filling missing values using forward fill (each gap takes the value from the row above, i.e. the previous borough)
Council_Tax_Data.ffill(inplace=True)

# Removing unnecessary decimal places by rounding and converting to integers
for year in ['2020', '2021', '2022', '2023', '2024']:
    Council_Tax_Data[year] = Council_Tax_Data[year].round().astype(int)  

# Display 
print("Cleaned Council Tax Data:\n", Council_Tax_Data)


Cleaned Council Tax Data:
                Authority  2020  2021  2022  2023  2024
0     Barking & Dagenham  1108  1164  1233  1313  1405
1                 Barnet  1587  1669  1731  1831  1955
2                 Bexley  1471  1546  1605  1716  1821
3                  Brent  1334  1402  1442  1521  1630
4                Bromley  1521  1606  1661  1771  1885
5                 Camden  1419  1419  1511  1649  1717
6         City of London  1140  1171  1214  1347  1398
7                Croydon  1509  1547  1668  1897  1983
8                 Ealing  1359  1421  1483  1582  1664
9                Enfield  1340  1342  1427  1513  1691
10             Greenwich  1107  1149  1196  1290  1370
11               Hackney  1002  1020  1076  1188  1267
12  Hammersmith & Fulham  1041  1098  1137  1218  1313
13              Haringey  1236  1269  1361  1435  1541
14                Harrow  1771  1850  1942  2045  2154
15              Havering  1533  1597  1677  1784  1891
16            Hillingdon  1402  1467  

# 4) Average Mortgage Interest Rate (in %)

In [17]:
import pandas as pd

# Load the dataset
file_path = '/Users/firishtah/Desktop/LSE Autumn 24/ST 449 Artificial Intelligence/Data/Average-mortgage-interest-rate-in-the-uk-2010-2024-per-quarter.csv'
mortgage_data = pd.read_csv(file_path)

# Dropping rows where all values are missing or irrelevant 
mortgage_data.dropna(how='all', inplace=True)

# Resetting the column names to clear labels
mortgage_data.columns = ['Quarter', 'Average Mortgage Interest Rate', 'Year']

# Dropping any rows where 'Quarter' or 'Average Mortgage Interest Rate' are missing
mortgage_data.dropna(subset=['Quarter', 'Average Mortgage Interest Rate'], inplace=True)

# Extracting the year from the 'Quarter' column 
mortgage_data['Year'] = mortgage_data['Quarter'].str.extract(r'(\d{4})')

# Cleaning the 'Quarter' column 
mortgage_data['Quarter'] = mortgage_data['Quarter'].str.extract(r'(Q\d)')

# Remove the 'in %' string from the 'Average Mortgage Interest Rate' and converting it to numeric
mortgage_data['Average Mortgage Interest Rate'] = mortgage_data['Average Mortgage Interest Rate'].replace('in %', '', regex=True)
mortgage_data['Average Mortgage Interest Rate'] = pd.to_numeric(mortgage_data['Average Mortgage Interest Rate'], errors='coerce')

# Reordering the columns to match the desired format
mortgage_data = mortgage_data[['Quarter', 'Year', 'Average Mortgage Interest Rate']]

# Removing the first 40 rows so that the series starts at Q1 2020, then resetting the index
mortgage_data_cleaned = mortgage_data.iloc[40:].reset_index(drop=True)

# Display 
print(mortgage_data_cleaned)



   Quarter  Year  Average Mortgage Interest Rate
0       Q1  2020                            1.84
1       Q2  2020                            1.77
2       Q3  2020                            1.74
3       Q4  2020                            1.85
4       Q1  2021                            1.91
5       Q2  2021                            1.92
6       Q3  2021                            1.82
7       Q4  2021                            1.57
8       Q1  2022                            1.64
9       Q2  2022                            1.98
10      Q3  2022                            2.59
11      Q4  2022                            3.38
12      Q1  2023                            4.20
13      Q2  2023                            4.56
14      Q3  2023                            4.85
15      Q4  2023                            5.31
16      Q1  2024                            4.96
17      Q2  2024                            4.80
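
The positional slice `iloc[40:]` only yields 2020 onwards if the file has exactly 40 earlier rows. A sketch of a filter on the extracted year, which does not depend on row positions, is shown below. The raw label format (`'Q4 2019'`) is an assumption, and the 2019 rate is a placeholder rather than real data; the two 2020 rates are taken from the output above.

```python
import pandas as pd

raw = pd.DataFrame({
    "Quarter": ["Q4 2019", "Q1 2020", "Q2 2020"],          # label format is assumed
    "Average Mortgage Interest Rate": [0.0, 1.84, 1.77],   # 0.0 is a placeholder, not real data
})

# Same regular expressions as in the cell above, returning a Series directly
raw["Year"] = raw["Quarter"].str.extract(r"(\d{4})", expand=False).astype(int)
raw["Quarter"] = raw["Quarter"].str.extract(r"(Q\d)", expand=False)

# Keeping 2020 to 2024 by value instead of by row position
recent = raw[raw["Year"].between(2020, 2024)].reset_index(drop=True)
print(recent)
```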


# 5) Consumer Price Index

In [20]:
import pandas as pd

# Load the dataset
file_path = '/Users/firishtah/Desktop/LSE Autumn 24/ST 449 Artificial Intelligence/Data/CPI-2000-2024.csv'
cpi_data = pd.read_csv(file_path)

# Renaming the columns for clarity
cpi_data.columns = ['Date', 'CPI']

# Dropping rows where 'Date' is NaN 
cpi_data = cpi_data.dropna(subset=['Date'])

# Resetting index after dropping rows
cpi_data.reset_index(drop=True, inplace=True)

# Extracting year and quarter information
cpi_data['Year'] = cpi_data['Date'].str.extract(r'(\d{4})')
cpi_data['Quarter'] = cpi_data['Date'].str.extract(r'(Q\d)')

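# Creating a new column 'Datetime' to represent the start of each quarter
# (mapping Q1-Q4 to their first months: January, April, July, October)
quarter_start_month = cpi_data['Quarter'].map({'Q1': '01', 'Q2': '04', 'Q3': '07', 'Q4': '10'})
cpi_data['Datetime'] = pd.to_datetime(cpi_data['Year'] + '-' + quarter_start_month + '-01', format='%Y-%m-%d')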

# Converting 'CPI' column to numeric values, coercing errors to NaN
cpi_data['CPI'] = pd.to_numeric(cpi_data['CPI'], errors='coerce')

# Dropping any rows with NaN in the 'CPI' column
cpi_data = cpi_data.dropna(subset=['CPI'])

# Resetting index after cleaning
cpi_data.reset_index(drop=True, inplace=True)

# Dropping the 'Date' and 'Datetime' columns
cpi_data = cpi_data.drop(columns=['Date', 'Datetime'])

# Rearranging columns to have 'Year', 'Quarter', and 'CPI'
cpi_data = cpi_data[['Year','Quarter', 'CPI']]

# Converting 'Year' column to integer type
cpi_data['Year'] = cpi_data['Year'].astype(int)

# Filtering data for the years 2020 to 2024
cpi_data = cpi_data[(cpi_data['Year'] >= 2020) & (cpi_data['Year'] <= 2024)]

# Display the filtered data
print(cpi_data)


    Year Quarter    CPI
80  2020      Q1  108.5
81  2020      Q2  108.5
82  2020      Q3  108.9
83  2020      Q4  109.0
84  2021      Q1  109.2
85  2021      Q2  110.7
86  2021      Q3  111.9
87  2021      Q4  114.4
88  2022      Q1  115.9
89  2022      Q2  120.9
90  2022      Q3  123.2
91  2022      Q4  126.7
92  2023      Q1  127.7
93  2023      Q2  131.1
94  2023      Q3  131.4
95  2023      Q4  132.0
96  2024      Q1  132.3
97  2024      Q2  133.8
98  2024      Q3  134.1
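
The CPI values above are index levels. If an inflation rate is wanted instead, the year-on-year change for a quarter can be computed as `(CPI_t / CPI_{t-4} - 1) * 100`, i.e. relative to the same quarter one year earlier. A minimal sketch using the 2020 and 2021 rows from the output follows; the column name `Inflation_YoY_pct` is an assumption, and the calculation presumes the rows are consecutive quarters in chronological order.

```python
import pandas as pd

# 2020 and 2021 rows copied from the CPI output above
cpi = pd.DataFrame({
    "Year": [2020] * 4 + [2021] * 4,
    "Quarter": ["Q1", "Q2", "Q3", "Q4"] * 2,
    "CPI": [108.5, 108.5, 108.9, 109.0, 109.2, 110.7, 111.9, 114.4],
})

# Percentage change against the same quarter of the previous year (4 rows back)
cpi["Inflation_YoY_pct"] = cpi["CPI"].pct_change(periods=4) * 100
print(cpi)
```

The first four rows have no value because there is no observation four quarters earlier within the sample.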


# 6) FTSE 100 Index

In [23]:
import pandas as pd

# Load the dataset
file_path = '/Users/firishtah/Desktop/LSE Autumn 24/ST 449 Artificial Intelligence/Data/FTSE 100 Index-1995-2023.csv'
ftse_data = pd.read_csv(file_path, header=None)

# Dropping the first two rows (metadata) and any completely empty rows
ftse_data = ftse_data.drop(index=[0, 1])
ftse_data = ftse_data.dropna(how='all')  # Remove rows where all columns are NaN

# Renaming columns for clarity
ftse_data.columns = ['Year', 'FTSE_100_Index']

# Resetting the index
ftse_data.reset_index(drop=True, inplace=True)

# Converting the 'Year' column to integer and 'FTSE_100_Index' to float
ftse_data['Year'] = ftse_data['Year'].astype(int)
ftse_data['FTSE_100_Index'] = ftse_data['FTSE_100_Index'].str.replace(',', '').astype(float)

# Display 
ftse_data.head()

   Year  FTSE_100_Index
0  1995          3689.3
1  1996          4118.5
2  1997          5135.5
3  1998          5882.6
4  1999          6743.0


# 7) Households Debt to Income Ratio (percentage, NSA)

In [26]:
import pandas as pd

# Load the dataset
file_path = '/Users/firishtah/Desktop/LSE Autumn 24/ST 449 Artificial Intelligence/Data/Households Debt to Income .csv'
household_debt_data = pd.read_csv(file_path)

# Renaming columns to make them clearer
household_debt_data.columns = ['Title', 'Debt_to_Income_Ratio']

# Removing the first few metadata rows
household_debt_data_cleaned = household_debt_data.iloc[5:].reset_index(drop=True)

# Converting 'Debt_to_Income_Ratio' to numeric, coercing invalid entries to NaN
household_debt_data_cleaned['Debt_to_Income_Ratio'] = pd.to_numeric(household_debt_data_cleaned['Debt_to_Income_Ratio'], errors='coerce')

# Dropping rows where 'Debt_to_Income_Ratio' is NaN 
household_debt_data_cleaned.dropna(subset=['Debt_to_Income_Ratio'], inplace=True)

# Extracting the 'Quarter' information from the 'Title' column
household_debt_data_cleaned['Quarter'] = household_debt_data_cleaned['Title']

# Dropping the 'Title' column as it's no longer needed
household_debt_data_cleaned.drop(columns=['Title'], inplace=True)

# Extracting Year from the 'Quarter' column and create a new 'Year' column
household_debt_data_cleaned['Year'] = household_debt_data_cleaned['Quarter'].str.split(' ').str[0]

# Converting the 'Year' column to an integer for numerical processing
household_debt_data_cleaned['Year'] = household_debt_data_cleaned['Year'].astype(int)

# Keeping only the quarter information in the 'Quarter' column
household_debt_data_cleaned['Quarter'] = household_debt_data_cleaned['Quarter'].str.split(' ').str[1]

# Rearranging columns to the desired order
household_debt_data_cleaned = household_debt_data_cleaned[['Year', 'Quarter', 'Debt_to_Income_Ratio']]

# Filtering for rows where the Year is between 2020 and 2024
filtered_data = household_debt_data_cleaned[(household_debt_data_cleaned['Year'] >= 2020) & (household_debt_data_cleaned['Year'] <= 2024)]

# Display 
filtered_data

     Year Quarter  Debt_to_Income_Ratio
134  2020      Q1                 134.7
135  2020      Q2                 135.4
136  2020      Q3                 136.7
137  2020      Q4                 137.8
138  2021      Q1                 137.8
139  2021      Q2                 137.7
140  2021      Q3                 137.3
141  2021      Q4                 136.2
142  2022      Q1                 136.4
143  2022      Q2                 136.0


# 8) Crime Data 

In [28]:
import pandas as pd

# File path for the Crime Data by Boroughs CSV file
file_path_crime = '/Users/firishtah/Desktop/LSE Autumn 24/ST 449 Artificial Intelligence/Data/Crime Data by boroughs.csv'
crime_data = pd.read_csv(file_path_crime)

# Checking that the expected 'Month_Year' column is present before processing
if 'Month_Year' not in crime_data.columns:
    raise KeyError(f"Column 'Month_Year' not found. Available columns are: {list(crime_data.columns)}")
else:
    # Converting 'Month_Year' to datetime format
    crime_data['Month_Year'] = pd.to_datetime(crime_data['Month_Year'])

    # Splitting Month_Year into Year and Quarter
    crime_data['Year'] = crime_data['Month_Year'].dt.year
    crime_data['Quarter'] = crime_data['Month_Year'].dt.to_period('Q').dt.strftime('Q%q')

    # Filtering for rows where the month is the last month of each quarter 
    last_month_of_quarter = crime_data[crime_data['Month_Year'].dt.month.isin([3, 6, 9, 12])]

    # Ensuring the dataset has only one entry for each quarter (Q1, Q2, Q3, Q4) per year
    filtered_data = last_month_of_quarter.groupby(['Year', 'Quarter', 'Borough_SNT'], as_index=False).first()

    # Dropping the 'Month_Year' column
    filtered_data = filtered_data.drop(columns=['Month_Year'])

    # Reordering columns
    filtered_data = filtered_data[['Year', 'Quarter', 'Borough_SNT', 'Count']]

    # Display the filtered dataset
    #print(filtered_data.head(50))

# Sorting the data by Year, Quarter, and Borough
filtered_data = filtered_data.sort_values(by=['Year', 'Quarter', 'Borough_SNT'])

# Pivoting the data for a better arrangement 
pivoted_data = filtered_data.pivot_table(index=['Year', 'Quarter'], columns='Borough_SNT', values='Count', aggfunc='sum')

# Resetting index for better readability
pivoted_data.reset_index(inplace=True)

# Display 
print(pivoted_data.head(150))



Borough_SNT  Year Quarter  Barking and Dagenham  Barnet  Bexley   Brent  \
0            2020      Q4                1641.0  2368.0  1325.0  2521.0   
1            2021      Q1                1777.0  2732.0  1375.0  2485.0   
2            2021      Q2                1917.0  2676.0  1405.0  2641.0   
3            2021      Q3                1773.0  2430.0  1409.0  2562.0   
4            2021      Q4                1712.0  2399.0  1424.0  2374.0   
5            2022      Q1                1955.0  2646.0  1593.0  2957.0   
6            2022      Q2                1942.0  2439.0  1508.0  2815.0   
7            2022      Q3                1803.0  2464.0  1509.0  2721.0   
8            2022      Q4                1899.0  2264.0  1257.0  2422.0   
9            2023      Q1                1960.0  2595.0  1503.0  2787.0   
10           2023      Q2                2072.0  2559.0  1472.0  3138.0   
11           2023      Q3                2130.0  2393.0  1563.0  2933.0   
12           2023      Q4
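
The crime output labels the borough as `Barking and Dagenham`, while the House Price Index and council tax outputs use `Barking & Dagenham`. Joining on borough names would therefore need the labels harmonised first. The sketch below shows one possible normalisation; `normalise_borough` is a hypothetical helper, and only names visible in the outputs above are used.

```python
import pandas as pd


def normalise_borough(name):
    """Hypothetical helper: replace '&' with 'and' and collapse repeated whitespace."""
    return " ".join(str(name).replace("&", "and").split())


# Labels as they appear in the House Price Index and crime outputs above
hpi_boroughs = pd.Series(["Barking & Dagenham", "Barnet", "Bexley", "Brent"])
crime_boroughs = pd.Series(["Barking and Dagenham", "Barnet", "Bexley", "Brent"])

hpi_clean = hpi_boroughs.map(normalise_borough)
crime_clean = crime_boroughs.map(normalise_borough)

# Names present in one source but not the other after normalisation
unmatched = set(hpi_clean) ^ set(crime_clean)
print(sorted(unmatched))
```

Checking `unmatched` on the full borough lists before merging would flag any remaining spelling differences that this simple rule does not cover.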

# 9) School Quality Data

In [34]:
import pandas as pd

file_path = '/Users/firishtah/Desktop/LSE Autumn 24/ST 449 Artificial Intelligence/Data/cleaned_school_data.csv'

# Loading the dataset
school_data = pd.read_csv(file_path)

# Removing the specified columns
columns_to_remove = ['URN', 'LOCALITY', 'TOWN', 'ISPRIMARY', 'ISSECONDARY', 'ISPOST16', 'GENDER']
school_data_cleaned = school_data.drop(columns=columns_to_remove)

# Renaming columns
school_data_cleaned = school_data_cleaned.rename(columns={'LANAME': 'Borough Name', 'SCHNAME': 'School Name'})

# Display 
school_data_cleaned


        Borough Name                       School Name  OFSTEDRATING                     STREET
0     City of London                The Aldgate School   Outstanding         St James's Passage
1     City of London   City of London School for Girls           NaN          St Giles' Terrace
2     City of London        St Paul's Cathedral School           NaN               2 New Change
3     City of London             City of London School           NaN  107 Queen Victoria Street
4             Camden             Argyle Primary School          Good           Tonbridge Street
...              ...                               ...           ...                        ...
1766          Barnet             Osidge Primary School           NaN                 Chase Side
1767           Brent       St Mary's RC Primary School           NaN            Canterbury Road
1768        Lewisham    The Anchor SENDfriendly Centre           NaN      303 Hither Green Lane
1769     Westminster  Millbank Gardens Primary Academy           NaN             Erasmus Street
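
The school table is at school level, while the other datasets are at borough level. One way to bring it to the same granularity, sketched below, is to map the Ofsted rating to an ordinal score and average it per borough. The three rows are taken from the output above. The numeric scores, the labels `Requires improvement` and `Inadequate` (which do not appear in the visible output), and the output column names are assumptions for illustration.

```python
import pandas as pd

# Three rows copied from the output above (missing rating kept as missing)
schools = pd.DataFrame({
    "Borough Name": ["City of London", "City of London", "Camden"],
    "School Name": ["The Aldgate School", "City of London School for Girls", "Argyle Primary School"],
    "OFSTEDRATING": ["Outstanding", None, "Good"],
})

# Assumed ordinal mapping (higher is better); unrated schools stay NaN
rating_scores = {"Outstanding": 4, "Good": 3, "Requires improvement": 2, "Inadequate": 1}
schools["Rating_Score"] = schools["OFSTEDRATING"].map(rating_scores)

# Borough-level summary: mean score over rated schools, plus counts for context
borough_quality = (
    schools.groupby("Borough Name")
    .agg(
        Mean_Rating=("Rating_Score", "mean"),
        Rated_Schools=("Rating_Score", "count"),
        Total_Schools=("School Name", "count"),
    )
    .reset_index()
)
print(borough_quality)
```

Keeping `Rated_Schools` next to `Total_Schools` makes it visible how many schools actually contribute to each borough mean, which matters here because many ratings are missing.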


# 10) Environment Data 

In [63]:
import pandas as pd

# Load the dataset
green_belt_file_path = '/Users/firishtah/Desktop/LSE Autumn 24/ST 449 Artificial Intelligence/Data/cleaned-green-belt-land-borough.csv'
green_belt_data = pd.read_csv(green_belt_file_path)

# Dropping rows where all elements are NaN
green_belt_data = green_belt_data.dropna(how='all')

# Dropping unwanted columns whose names start with 'Unnamed'
green_belt_data = green_belt_data.loc[:, ~green_belt_data.columns.str.contains('^Unnamed')]

# Renaming columns to meaningful names
green_belt_data = green_belt_data.rename(columns={
    'Area': 'Borough Names',
    '2020/21': '2021',
    '2021/22': '2022',
    '2022/23': '2023',
    '2023/24': '2024',
    'Net Change 2007 to 2023/24': 'Net Change 2007 to 2024'
})

# Display 
green_belt_data.head()


   Code         Borough Names  2021  2022  2023  2024  Net Change 2007 to 2024
1  00AB  Barking and Dagenham   530   530   530   530                       90
2  00AC                Barnet  2380  2380  2380  2380                      -90
3  00AD                Bexley  1110  1110  1110  1120                        0
4  00AF               Bromley  7660  7660  7660  7660                      -70
5  00AH               Croydon  2190  2190  2190  2190                     -120


# 11) Transport Accessibility Data

In [66]:
import pandas as pd

# URL for the dataset
url = "https://raw.githubusercontent.com/ukonward/network_effects/refs/heads/main/job_access_by_journey_time.csv"

# Loading the data with low_memory=False to suppress dtype warnings
data = pd.read_csv(url, low_memory=False)

# List of columns to drop that are less relevant 
columns_to_drop = [
    'COUNTRY', 'RGN11CD', 'RGN11NM', 'AREAEHECT', 'D15_jobspersqkm', 
    'D30_jobspersqkm', 'D60_jobspersqkm', 'D90_jobspersqkm'
]

# Dropping the specified columns
filtered_data = data.drop(columns=columns_to_drop)

# Display 
print(filtered_data.head())


   LSOA_code   LAD_code              LAD_name  MSOA_code  POP2019  BRESEMP19  \
0  E01000001  E09000001        City of London  E02000001     1636      15000   
1  E01000002  E09000001        City of London  E02000001     1558      42000   
2  E01000003  E09000001        City of London  E02000001     1786       1750   
3  E01000005  E09000001        City of London  E02000001     1888      20000   
4  E01000006  E09000002  Barking and Dagenham  E02000017     2094        150   

   DRIVE15  DRIVE30  DRIVE60  DRIVE90  ...  PT60_jobspersqkm  \
0  1031100  2310650  3565415  5526105  ...       5500.395979   
1  1030425  2278645  3557210  5535875  ...       5607.047528   
2  1088775  2257855  3595790  5638845  ...       5755.598966   
3   828220  1916185  3516830  5490115  ...       5424.903531   
4    25790   152060  2262760  6446535  ...       7114.077619   

   PT90_jobspersqkm    3mile    5mile   10mile   35mile  ratio_5m_pt60  \
0       2449.081472  2224830  2961515  4279825  8105545     
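
The accessibility data is reported per LSOA, so it would need to be aggregated before it can sit alongside the borough-level tables. A minimal sketch of a population-weighted mean per local authority follows, using the five rows and the `LAD_name`, `POP2019` and `DRIVE30` columns shown in the output above. Weighting by `POP2019` and the output column name `DRIVE30_popweighted` are assumptions; a plain mean or median would be equally defensible.

```python
import pandas as pd

# First five rows of the output above, restricted to the columns used here
lsoa = pd.DataFrame({
    "LAD_name": ["City of London"] * 4 + ["Barking and Dagenham"],
    "POP2019": [1636, 1558, 1786, 1888, 2094],
    "DRIVE30": [2310650, 2278645, 2257855, 1916185, 152060],
})

# Population-weighted mean = sum(value * population) / sum(population) per borough
lsoa["weighted"] = lsoa["DRIVE30"] * lsoa["POP2019"]
sums = lsoa.groupby("LAD_name")[["weighted", "POP2019"]].sum()
borough_access = (
    (sums["weighted"] / sums["POP2019"])
    .rename("DRIVE30_popweighted")
    .reset_index()
)
print(borough_access)
```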