# Methods at Manchester Session 2

This file provides the R code for the empirical part of the workshop on Topological Data Analysis Ball Mapper (TDABM) held at the University of Manchester on the 18th November 2025, so that the analysis can be replicated in R.

```r
library(BallMapper)
library(tidyverse)
library(sf)
library(tmap)
```

```
package 'ggplot2' was built under R version 4.3.3
-- Attaching core tidyverse packages ------------------------ tidyverse 2.0.0 --
v dplyr     1.1.4     v readr     2.1.5
v forcats   1.0.0     v stringr   1.5.1
v ggplot2   3.5.2     v tibble    3.2.1
v lubridate 1.9.3     v tidyr     1.3.0
v purrr     1.0.2
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
i Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
package 'sf' was built under R version 4.3.3
Linking to GEOS 3.13.0, GDAL 3.8.5, PROJ 9.5.1; sf_use_s2() is TRUE
```

## Loading Data

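We load the data for the session with the read.csv() function: fulldata.csv holds the full data and normalised.csv holds the normalised data used later for the TDABM analysis.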

```r
df1 <- read.csv("fulldata.csv")
nd1 <- read.csv("normalised.csv")
```

We then create a reduced data frame containing only the relevant measures.

```r
# Keep only the IMD score and the deprivation measures used in the analysis
df2 <- df1[, c("IMD", "INC", "EMP", "EST", "HDD", "CRM", "BHS", "LIV", "CYP", "ADS")]
```

Next, we create an empty summary statistics data frame with one row per variable.

```r
sdep <- as.data.frame(matrix(0, nrow = ncol(df2), ncol = 8))
names(sdep) <- c("Var", "Mean", "SD", "Min", "q25", "q50", "q75", "Max")
```

We then fill in the summary statistics with a loop over the columns of df2.

```r
for (i in seq_len(ncol(df2))) {
    sdep$Var[i]  <- names(df2)[i]
    sdep$Mean[i] <- round(mean(df2[, i]), 3)
    sdep$SD[i]   <- round(sd(df2[, i]), 3)
    sdep$Min[i]  <- round(min(df2[, i]), 3)
    sdep$q25[i]  <- round(quantile(df2[, i], 0.25), 3)
    sdep$q50[i]  <- round(quantile(df2[, i], 0.5), 3)
    sdep$q75[i]  <- round(quantile(df2[, i], 0.75), 3)
    sdep$Max[i]  <- round(max(df2[, i]), 3)
}

sdep
```

| Var | Mean | SD | Min | q25 | q50 | q75 | Max |
|-----|-----:|---:|----:|----:|----:|----:|----:|
| IMD | 28.45 | 18.886 | 0.604 | 11.861 | 24.956 | 43.077 | 82.633 |
| INC | 0.283 | 0.187 | 0.011 | 0.111 | 0.254 | 0.435 | 0.812 |
| EMP | 0.17 | 0.098 | 0.007 | 0.085 | 0.154 | 0.239 | 0.555 |
| EST | 24.576 | 18.988 | 0.093 | 7.611 | 20.949 | 38.149 | 84.579 |
| HDD | 0.561 | 0.802 | -1.97 | -0.032 | 0.617 | 1.149 | 2.962 |
| CRM | 0.49 | 0.781 | -1.95 | -0.033 | 0.563 | 1.051 | 2.575 |
| BHS | 19.583 | 9.732 | 3.044 | 12.352 | 17.954 | 24.481 | 60.945 |
| LIV | 25.288 | 14.942 | 0.46 | 13.504 | 22.754 | 34.32 | 80.329 |
| CYP | -0.012 | 0.925 | -2.612 | -0.646 | 0.083 | 0.651 | 2.752 |
| ADS | 0.264 | 0.114 | 0.022 | 0.177 | 0.253 | 0.344 | 0.601 |
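The loop above can also be written in vectorised form. This is a minimal sketch, assuming every column is numeric with no missing values; summary_table() is a hypothetical helper, and on the workshop data it would be called as summary_table(df2).

```r
# Vectorised alternative to the summary-statistics loop (base R only).
# Returns one row per column of `d`, rounded to `digits` decimal places.
summary_table <- function(d, digits = 3) {
  stats <- t(sapply(d, function(x) {
    c(Mean = mean(x), SD = sd(x), Min = min(x),
      q25 = unname(quantile(x, 0.25)),
      q50 = unname(quantile(x, 0.50)),
      q75 = unname(quantile(x, 0.75)),
      Max = max(x))
  }))
  rownames(stats) <- NULL
  data.frame(Var = names(d), round(stats, digits), stringsAsFactors = FALSE)
}

# Toy usage; with the workshop data this would be summary_table(df2)
toy <- data.frame(a = 1:5, b = c(2, 4, 6, 8, 10))
summary_table(toy)
```

The unname() calls drop the "25%"-style names that quantile() attaches, so the columns come out as q25, q50 and q75 rather than q25.25%.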


We also construct the correlation matrix; R's built-in cor() function suffices for this.

```r
Corr_Matrix <- round(cor(df2), 3)
Corr_Matrix
```

| | IMD | INC | EMP | EST | HDD | CRM | BHS | LIV | CYP | ADS |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| IMD | 1.0 | 0.975 | 0.962 | 0.927 | 0.888 | 0.841 | 0.564 | 0.21 | 0.771 | 0.895 |
| INC | 0.975 | 1.0 | 0.951 | 0.912 | 0.83 | 0.776 | 0.537 | 0.164 | 0.733 | 0.916 |
| EMP | 0.962 | 0.951 | 1.0 | 0.896 | 0.865 | 0.791 | 0.433 | 0.055 | 0.768 | 0.859 |
| EST | 0.927 | 0.912 | 0.896 | 1.0 | 0.81 | 0.745 | 0.446 | 0.125 | 0.856 | 0.918 |
| HDD | 0.888 | 0.83 | 0.865 | 0.81 | 1.0 | 0.832 | 0.441 | 0.131 | 0.812 | 0.751 |
| CRM | 0.841 | 0.776 | 0.791 | 0.745 | 0.832 | 1.0 | 0.404 | 0.25 | 0.75 | 0.682 |
| BHS | 0.564 | 0.537 | 0.433 | 0.446 | 0.441 | 0.404 | 1.0 | 0.116 | 0.301 | 0.525 |
| LIV | 0.21 | 0.164 | 0.055 | 0.125 | 0.131 | 0.25 | 0.116 | 1.0 | 0.066 | 0.196 |
| CYP | 0.771 | 0.733 | 0.768 | 0.856 | 0.812 | 0.75 | 0.301 | 0.066 | 1.0 | 0.694 |
| ADS | 0.895 | 0.916 | 0.859 | 0.918 | 0.751 | 0.682 | 0.525 | 0.196 | 0.694 | 1.0 |


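Most correlations are strong and all are positive, but those involving Barriers to Housing and Services (BHS) and Living Environment (LIV) are weaker. Every correlation with LIV is at most 0.25, and BHS has correlations below 0.5 with EMP, EST, HDD, CRM, LIV and CYP.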
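To list the weak correlations directly rather than reading them off the matrix, one option is the hypothetical helper low_corr_pairs() below. It scans the upper triangle of a correlation matrix for absolute values below a threshold. The small matrix uses three entries copied from the correlation matrix above; on the full data the call would be low_corr_pairs(Corr_Matrix).

```r
# List variable pairs whose absolute correlation is below `threshold`.
# Only the upper triangle is scanned, so each pair appears once.
low_corr_pairs <- function(cm, threshold = 0.5) {
  idx <- which(upper.tri(cm) & abs(cm) < threshold, arr.ind = TRUE)
  out <- data.frame(var1 = rownames(cm)[idx[, "row"]],
                    var2 = colnames(cm)[idx[, "col"]],
                    r = cm[idx],
                    stringsAsFactors = FALSE)
  out[order(out$r), , drop = FALSE]   # weakest pairs first
}

# IMD, BHS and LIV entries taken from the correlation matrix above
cm <- matrix(c(1.000, 0.564, 0.210,
               0.564, 1.000, 0.116,
               0.210, 0.116, 1.000),
             nrow = 3,
             dimnames = list(c("IMD", "BHS", "LIV"), c("IMD", "BHS", "LIV")))
low_corr_pairs(cm)
```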

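For the analysis we create three additional dummy variables that flag the most deprived Lower Layer Super Output Areas (LSOAs): those above the 75th percentile of the IMD score, of the Income Deprivation Affecting Children Index (IDACI) and of the Income Deprivation Affecting Older People Index (IDAOPI).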

```r
# 1 if the LSOA is above the 75th percentile (most deprived quartile), else 0
df1$IMD25    <- as.numeric(df1$IMD > quantile(df1$IMD, 0.75))
df1$IDACI25  <- as.numeric(df1$IDACI > quantile(df1$IDACI, 0.75))
df1$IDAOPI25 <- as.numeric(df1$IDAOPI > quantile(df1$IDAOPI, 0.75))
```
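The dummy construction can be checked on a toy vector. This sketch wraps the same rule in a hypothetical helper, top_quartile_dummy(). With R's default quantile definition and the strict inequality, values equal to the 75th percentile are not flagged, so with ties the share of flagged areas can fall slightly below 25%. Note also that quantile() stops with an error if the variable contains missing values.

```r
# Hypothetical helper mirroring the dummies above: 1 if a value lies strictly
# above the `prob` quantile (R's default quantile type 7), else 0.
top_quartile_dummy <- function(x, prob = 0.75) {
  as.numeric(x > quantile(x, prob))
}

scores <- c(3.2, 8.1, 1.4, 9.7, 5.5, 6.0, 2.8, 7.3)  # toy scores
d <- top_quartile_dummy(scores)
d          # 0 1 0 1 0 0 0 0  (the 75th percentile here is 7.5)
mean(d)    # share of areas flagged: 0.25
```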

To see the structure of the data we may use scatterplots.
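A minimal sketch of such a scatterplot matrix using base R's pairs(). The helper plot_scatter_matrix() is hypothetical and the toy data are simulated. The small semi-transparent points are a design choice to reduce overplotting when there are many LSOAs. On the workshop data the call would be plot_scatter_matrix(df2).

```r
# Scatterplot matrix of every pair of columns (base R graphics).
# When `file` is given, the plot is written to a PDF instead of the current device.
plot_scatter_matrix <- function(d, file = NULL) {
  if (!is.null(file)) {
    pdf(file, width = 10, height = 10)
    on.exit(dev.off())
  }
  pairs(d, pch = 20, cex = 0.3, col = rgb(0, 0, 1, alpha = 0.2))
  invisible(NULL)
}

# Simulated stand-in for df2: y is correlated with x, z is independent
set.seed(1)
toy <- data.frame(x = rnorm(100))
toy$y <- toy$x + rnorm(100, sd = 0.5)
toy$z <- rnorm(100)
out_file <- tempfile(fileext = ".pdf")
plot_scatter_matrix(toy, out_file)
```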

# Mapping

In this section we plot the data to obtain a spatial impression of the information in the .csv files. The first step ensures that the names used to join the files (the LSOA codes) are of the same type in the map data and in the data file; we then merge the map data with the data file. There is a wealth of libraries available for mapping, and if you are undertaking research with geographic data it is worth spending more time producing effective maps. Here we use the maps for illustration only.
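The type-matching and merge steps can be sketched in base R. The helper join_on_code() and the column names code and lsoa are hypothetical stand-ins for the real identifier columns. Because sf provides a merge() method for sf objects, passing the map read with st_read() as the first argument should keep its geometry, ready for plotting with tmap.

```r
# Hypothetical helper: join a map attribute table to a data table on an area code,
# first converting both key columns to character so that their types match.
join_on_code <- function(map_df, data_df, map_key, data_key) {
  map_df[[map_key]] <- as.character(map_df[[map_key]])
  data_df[[data_key]] <- as.character(data_df[[data_key]])
  # all.x = TRUE keeps every map area, with NA where no data row matches
  merge(map_df, data_df, by.x = map_key, by.y = data_key, all.x = TRUE)
}

# Toy inputs: the codes and column names are made up for illustration
map_tab <- data.frame(code = factor(c("E01", "E02", "E03")))
dat_tab <- data.frame(lsoa = c("E02", "E01"), IMD = c(40.1, 12.5))
joined <- join_on_code(map_tab, dat_tab, "code", "lsoa")
joined
```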

# Topological Data Analysis Ball Mapper

The elements needed for a TDABM plot are the axis variables, the outcome variable and a choice of radius. Because the data are on different scales, we use the normalised.csv file for this part. The LSOA code will help us later. Before getting started, ensure that the normalised data are sorted by LSOA code in the same order as fulldata.csv: the colouring variables are taken from df1 and are matched to the rows of nd1 by position. We then add a row index, called point, ready for merging with the TDABM output.

```r
# Row index used to merge the TDABM output back to the data
nd1$point <- seq_len(nrow(nd1))
```
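The sorting and alignment check can be sketched as follows. The helpers sort_and_index() and same_row_order(), and the identifier column name lsoa, are hypothetical. Since BallMapper() matches the colouring values to the axis rows by position, both files should pass such a check before the graph is built.

```r
# Hypothetical helper: sort by an identifier column and add a running index `point`.
sort_and_index <- function(d, key) {
  d <- d[order(as.character(d[[key]])), , drop = FALSE]
  rownames(d) <- NULL
  d$point <- seq_len(nrow(d))
  d
}

# TRUE when two data frames list the same identifiers in the same row order.
same_row_order <- function(a, b, key_a, key_b = key_a) {
  identical(as.character(a[[key_a]]), as.character(b[[key_b]]))
}

# Toy stand-ins for fulldata.csv and normalised.csv
full <- data.frame(lsoa = c("E03", "E01", "E02"), IMD = c(5, 1, 3))
norm <- data.frame(lsoa = c("E02", "E03", "E01"), IMD = c(0, 1, -1))
same_row_order(full, norm, "lsoa")       # FALSE before sorting
full_s <- sort_and_index(full, "lsoa")
norm_s <- sort_and_index(norm, "lsoa")
same_row_order(full_s, norm_s, "lsoa")   # TRUE after sorting
```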

BallMapper() takes the axis variables and the colouring variable as data frames, so we create one data frame for the axes (adf) and four colouring data frames: the IMD score (cdf1) and the three top-quartile dummies (cdf2 to cdf4).

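```r
# Axis variables: nd1 without its first column, then without the 10th remaining column
adf <- nd1[, -1]
adf <- adf[, -10]

# Colouring variables from df1 (rows must be in the same order as nd1)
cdf1 <- as.data.frame(df1$IMD)       # IMD score
cdf2 <- as.data.frame(df1$IMD25)     # top-quartile IMD dummy
cdf3 <- as.data.frame(df1$IDACI25)   # top-quartile IDACI dummy
cdf4 <- as.data.frame(df1$IDAOPI25)  # top-quartile IDAOPI dummy
```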

The Ball Mapper graph is constructed by calling BallMapper() with the axis data frame, the colouring data frame and the radius (here 5), which returns a BallMapper object.

```r
# Axes adf, colouring by IMD score (cdf1), radius 5
bm1 <- BallMapper(adf, cdf1, 5)
```
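To make the role of the radius concrete, here is a minimal base-R sketch of how a Ball Mapper graph can be built. It is an illustration under simplifying assumptions (Euclidean distance, landmarks picked greedily in row order, each ball coloured by the mean outcome of its points), not the BallMapper package's own code; the function name ball_mapper_sketch() is hypothetical. A larger radius gives fewer, larger balls and a coarser graph.

```r
# Minimal sketch of the Ball Mapper idea (not the BallMapper package's code):
#   1. greedily pick landmarks so every point is within `eps` of one of them;
#   2. each landmark's ball holds all points within `eps` of it;
#   3. two balls are joined by an edge when they share at least one point;
#   4. each ball is coloured by the mean outcome `y` of the points it contains.
ball_mapper_sketch <- function(X, y, eps) {
  X <- as.matrix(X)
  n <- nrow(X)
  # Euclidean distance from row i of X to every row of X
  dist_from <- function(i) sqrt(colSums((t(X) - X[i, ])^2))
  covered <- rep(FALSE, n)
  landmarks <- integer(0)
  for (i in seq_len(n)) {
    if (!covered[i]) {
      landmarks <- c(landmarks, i)
      covered[dist_from(i) <= eps] <- TRUE
    }
  }
  # membership[j, k] is 1 when point j lies in ball k
  membership <- matrix(0, nrow = n, ncol = length(landmarks))
  for (k in seq_along(landmarks)) {
    membership[, k] <- as.numeric(dist_from(landmarks[k]) <= eps)
  }
  shared <- crossprod(membership)  # number of points shared by each pair of balls
  edges <- which(upper.tri(shared) & shared > 0, arr.ind = TRUE)
  list(landmarks = landmarks,
       size = colSums(membership),
       colour = colSums(membership * y) / colSums(membership),
       edges = unname(edges))
}

# Toy usage: four points on a line, radius 1
res <- ball_mapper_sketch(matrix(c(0, 0.8, 1.6, 5), ncol = 1),
                          y = c(1, 2, 3, 10), eps = 1)
res$edges   # balls 1 and 2 are joined because both contain the point at 0.8
```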

The graph is plotted with the ColorIgraphPlot() function. Its seed input changes how the abstract graph is laid out on the page. Because the balls and edges are the same for every seed, the plot is topologically faithful whichever seed is used.