<a href="https://colab.research.google.com/github/srudkin12/RegionalBallMapper/blob/main/Session_3_TDABM_Regional_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This Google Colab document has been set up to run in R and contains the code for Parts A and B of the Topological Data Analysis Ball Mapper for Regional Analysis workshop, held at the University of Manchester on Thursday 14th July 2022.

For those unfamiliar with the operation of Jupyter Notebook based systems, to run a cell you need to either click the play button (the little triangle) or press SHIFT and ENTER at the same time.

Although this document runs all of the code you need, you are encouraged to also refer to the commentary documents available on the workshop GitHub page.

In [None]:
install.packages("dplyr")
install.packages("BallMapper")
library(dplyr)
library(BallMapper)

The following lines differ from the code in the commentary: they read the file directly from GitHub into Google Colab.

In [None]:
url = 'https://raw.githubusercontent.com/srudkin12/RegionalBallMapper/main/region1.csv'
dtx<-read.table(url,sep=",",header=TRUE)


A first step is always to view the data to make sure that it appears as you would expect

In [None]:
head(dtx)

Let us view the correlations within the dataset

In [None]:
cor(dtx[,3:ncol(dtx)])
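With many variables the full correlation matrix is hard to scan by eye. The following is a minimal sketch of one way to list only the highly correlated pairs. It runs on made-up toy data rather than dtx, and the 0.8 threshold is an assumption chosen for illustration, not a value from the workshop.

```r
# Toy data standing in for dtx[, 3:ncol(dtx)]; all values are made up
set.seed(1)
toy <- data.frame(a = rnorm(50))
toy$b <- toy$a + rnorm(50, sd = 0.1)  # constructed to track a closely
toy$c <- rnorm(50)                    # generated independently of a and b

cm <- cor(toy)
threshold <- 0.8                      # hypothetical cut-off

# Look only at the upper triangle so that each pair is reported once
idx <- which(abs(cm) > threshold & upper.tri(cm), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(cm)[idx[, "row"]],
  var2 = colnames(cm)[idx[, "col"]],
  r    = cm[idx]
)
high_pairs
```

Replacing toy with the numeric columns of dtx should give the corresponding list for the workshop data.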

Given the high correlations between many pairs of variables, we will create a small subset of variables and call this dty. The final line creates a dummy variable, QL4, which equals 1 for local authority districts where more than 33% of households have at least one resident with a university degree, and 0 otherwise.

In [None]:
dty<-cbind(dtx[,1:2],dtx$QualLevel4,dtx$Deprivation0,dtx$Accommodation,dtx$Married,dtx$HealthVeryGood,dtx$OwnedMortgage)
names(dty)<-c("geog","geogcode","QualLevel4","Deprivation0","Accommodation","Married","HealthVeryGood","OwnedMortgage")
dty$QL4<-as.numeric(dty$QualLevel4>33) 
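To see how the dummy behaves at the boundary, here is a small sketch on made-up percentages (the vector qual is hypothetical). The comparison is strict, so a value of exactly 33 is coded as 0.

```r
qual <- c(20.5, 33.0, 33.1, 45.2)  # made-up percentages
ql4 <- as.numeric(qual > 33)       # TRUE/FALSE converted to 1/0
ql4
table(ql4)                         # how many observations fall in each group
```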

Here we use a user-defined function called sstatsmat to create a very simple summary statistics table. The remainder of the block converts the result to a data frame and displays it. In this code we do not save the table to a .csv file.

In [None]:

sstatsmat<-function(characteristics,decp){
 if(missing(decp)) decp <- 2
 a001<-ncol(characteristics)
 sstats<-matrix(0,nrow=a001,ncol=5)
 for(i in 1:a001){
  j<-i
  sstats[i,1]<-names(characteristics)[j]
  sstats[i,2]<-round(mean(characteristics[,j]),decp)
  sstats[i,3]<-round(sd(characteristics[,j]),decp)
  sstats[i,4]<-round(min(characteristics[,j]),decp)
  sstats[i,5]<-round(max(characteristics[,j]),decp)
 }
 return(sstats)
}

s001<-sstatsmat(dty[,3:8]) # Creates an object with the summary statistics
s001<-as.data.frame(s001) # Convert to data frame
names(s001)<-c("Variable","Mean","s.d.","Min","Max")
s001
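Because sstatsmat stores the variable names in the same matrix as the numbers, R coerces every entry to a character string. If a numeric table is wanted for further calculation, one alternative is sketched below using sapply. This is not the approach used in the commentary, and the toy data frame is made up for illustration.

```r
toy <- data.frame(x = c(1, 2, 3, 4), y = c(10, 20, 30, 50))  # made-up values

# One row per variable, one column per statistic; entries stay numeric
stats <- t(sapply(toy, function(v) {
  c("Mean" = mean(v), "s.d." = sd(v), "Min" = min(v), "Max" = max(v))
}))
stats <- round(stats, 2)
stats
```

The variable names are held as row names here, so the columns can be used directly in arithmetic.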

Create the correlation matrix for the reduced data set

In [None]:
c001<-cor(dty[,3:8])
c001

**Plotting**

In this section we will be producing graphs. To save these please right click and then select the "Save image as..." option.

We start with a basic scatter plot, setting both axis limits to 0 and 60 to reflect the ranges in the summary statistics table.

In [None]:
plot(dty$Deprivation0,dty$Accommodation,pch=16,xlim=c(0,60),ylim=c(0,60),xlab="Deprivation 0",ylab="Accommodation")
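Plots can also be written to a file from code rather than by right-clicking. The following is a minimal sketch using the base R pdf() graphics device. The file name and the plotted values are made up for illustration, and png() follows the same open, plot, close pattern.

```r
# Hypothetical output file in R's temporary directory
out_file <- file.path(tempdir(), "scatter_example.pdf")

x <- c(5, 20, 35, 50)   # made-up values
y <- c(10, 25, 30, 55)

pdf(out_file, width = 6, height = 6)  # open the file device
plot(x, y, pch = 16, xlim = c(0, 60), ylim = c(0, 60),
     xlab = "Deprivation 0", ylab = "Accommodation")
invisible(dev.off())                  # close the device so the file is written

file.exists(out_file)
```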

To make the data easier to visualise, we will create two subsets: one for districts with QL4 equal to 0 and one for districts with QL4 equal to 1.

In [None]:
dty0<-subset(dty,dty$QL4==0)
dty1<-subset(dty,dty$QL4==1)

Now we produce a graph on the 0 to 60 axis range with colouration according to the QL4 dummy created earlier

In [None]:
plot(dty0$Deprivation0,dty0$Accommodation,pch=16,xlim=c(0,60),ylim=c(0,60),xlab="Deprivation 0",ylab="Accommodation") # Note the limits are set based on summary statistics
points(dty1$Deprivation0,dty1$Accommodation,pch=16,col="blue")
leg.text=c("Below 33%","Above 33%")
legend("bottomleft",leg.text,pch=16,col=c("black","blue"))

In the second version of the plot we allow R to set the range based on the full set of data for the two axis variables. We then colour the points using the subsets. Remember that with all the plots you can right click and select "Save image as..."

In [None]:
plot(dty$Deprivation0,dty$Accommodation,pch=16,xlab="Deprivation 0",ylab="Accommodation") # Note the limits are set based on the full data dty and not either dty0 or dty1
points(dty1$Deprivation0,dty1$Accommodation,pch=16,col="blue")
points(dty0$Deprivation0,dty0$Accommodation,pch=16,col="red")
leg.text=c("Below 33%","Above 33%")
legend("bottomleft",leg.text,pch=16,col=c("red","blue"))
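The same colouring can be obtained without splitting the data, by passing a vector of colours to plot. A minimal sketch on a made-up data frame (the column names dep and acc are hypothetical stand-ins for the workshop variables):

```r
toy <- data.frame(dep = c(10, 20, 30, 40),
                  acc = c(15, 25, 35, 45),
                  QL4 = c(0, 1, 0, 1))      # made-up values

# One colour per row, chosen from the dummy
cols <- ifelse(toy$QL4 == 1, "blue", "red")

plot(toy$dep, toy$acc, pch = 16, col = cols,
     xlab = "Deprivation 0", ylab = "Accommodation")
legend("bottomleft", c("Below 33%", "Above 33%"), pch = 16,
       col = c("red", "blue"))
```

With this approach the axis limits are taken from the full data automatically, since all points are drawn in a single call.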

# **Part B: Ball Mapper**

From this point forward the example follows Part B of the commentary. Here we introduce BallMapper (Dlotko, 2019) and demonstrate some very basic usage on the dty dataset that was created by following the Part A code above.

The first step is to make sure that our outcomes and axis variables are in data.frame objects ready for the BallMapper function

In [None]:
y1<-as.data.frame(dty$QualLevel4)
y2<-as.data.frame(dty$QL4)
x1<-as.data.frame(dty[,4:8])

We know already that Accommodation does not have the same range as some of the other variables. Therefore we use the normalisation function within the BallMapper package to put the variables in x1 onto the scale [0,1]

In [None]:
x2<-normalize_to_min_0_max_1(x1)
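As its name suggests, the normalisation function is assumed here to rescale each column separately so that its minimum becomes 0 and its maximum becomes 1. A base R sketch of that rescaling on made-up data is shown below. The helper min_max is hypothetical and is not part of the BallMapper package.

```r
toy <- data.frame(a = c(2, 4, 6), b = c(10, 40, 20))  # made-up values

# Hypothetical helper: (value - column minimum) / (column range)
min_max <- function(v) (v - min(v)) / (max(v) - min(v))

toy_scaled <- as.data.frame(lapply(toy, min_max))
toy_scaled
sapply(toy_scaled, range)  # each column should now run from 0 to 1
```

Note that this helper divides by zero when a column is constant, so such columns would need separate handling.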

We may now create our first BallMapper plot. For this we need to specify the axis variables (x2), the outcome that we want to use for the colouration (y1) and the radius for the balls. Here we choose 0.3.

Again if you wish to save the image you can right click and select "Save image as..."

In [None]:
bm1<-BallMapper(x2,y1,0.3)
ColorIgraphPlot(bm1,seed_for_plotting=123)
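To give an intuition for what the radius does, here is a simplified base R sketch of the general idea behind a ball cover: pick landmark points greedily, collect the points within the radius of each landmark, join balls that share points, and colour each ball by the average outcome of its points. This is an illustration on made-up data under those assumptions, not the BallMapper package code, and details such as the order of landmark selection may differ.

```r
pts <- data.frame(x = c(0, 0.2, 0.4, 0.6, 2), y = 0)  # made-up points on a line
outcome <- c(10, 20, 30, 40, 50)                      # made-up outcome values
eps <- 0.25                                           # ball radius

d <- as.matrix(dist(pts))          # pairwise Euclidean distances
covered <- rep(FALSE, nrow(pts))
landmarks <- integer(0)

# Greedy selection: any point not yet covered becomes a new landmark
for (i in seq_len(nrow(pts))) {
  if (!covered[i]) {
    landmarks <- c(landmarks, i)
    covered[d[i, ] <= eps] <- TRUE
  }
}

# Each ball holds every point within eps of its landmark
balls <- lapply(landmarks, function(l) unname(which(d[l, ] <= eps)))

# Two balls are joined by an edge when they share at least one point
edges <- list()
n_b <- length(balls)
for (i in seq_len(n_b - 1)) {
  for (j in (i + 1):n_b) {
    if (length(intersect(balls[[i]], balls[[j]])) > 0) {
      edges[[length(edges) + 1]] <- c(i, j)
    }
  }
}

# Colour: average outcome of the points in each ball
ball_means <- sapply(balls, function(b) mean(outcome[b]))

landmarks
balls
edges
ball_means
```

In this toy example the first two balls overlap at the second point and so are connected, while the distant fifth point forms a ball on its own.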

We will now create BallMapper plots with different radii. Note that you will not see any output from this block of code and it may take a few moments to run

In [None]:
bm125<-BallMapper(x2,y1,0.25)
bm130<-BallMapper(x2,y1,0.30)
bm135<-BallMapper(x2,y1,0.35)
bm140<-BallMapper(x2,y1,0.40)
bm145<-BallMapper(x2,y1,0.45)
bm150<-BallMapper(x2,y1,0.50)
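When many radii are compared, the repeated calls can be written as a loop that stores the results in a named list. The sketch below uses made-up random data in place of x2 and y1, and assumes the object returned by BallMapper contains a landmarks element, so that its length gives the number of balls. The code only runs the BallMapper step if the package is installed.

```r
radii <- c(0.25, 0.30, 0.35, 0.40, 0.45, 0.50)

# Toy inputs standing in for x2 and y1; all values are random
set.seed(42)
pts  <- data.frame(a = runif(30), b = runif(30))
vals <- data.frame(v = runif(30))

if (requireNamespace("BallMapper", quietly = TRUE)) {
  # One BallMapper object per radius, stored under names such as "r0.25"
  bm_list <- lapply(radii, function(r) BallMapper::BallMapper(pts, vals, r))
  names(bm_list) <- paste0("r", radii)

  # Assumed: each result has a landmarks element, one entry per ball
  n_balls <- sapply(bm_list, function(bm) length(bm$landmarks))
  print(n_balls)
}
```

An individual graph can then be plotted by selecting an element of the list, for example bm_list[["r0.25"]].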


To view the BM graphs that have been created you can simply edit the bm125 in the block below to one of the other bm numbers (e.g. bm150). 

In [None]:

ColorIgraphPlot(bm125,seed_for_plotting=123)

We can repeat the BallMapper graphs but this time use the QL4 dummy. The colour now represents the proportion of local authority districts within the ball that have a value of 1 on the QL4 dummy.

In [None]:
bm225<-BallMapper(x2,y2,0.25)
bm230<-BallMapper(x2,y2,0.30)
bm235<-BallMapper(x2,y2,0.35)
bm240<-BallMapper(x2,y2,0.40)
bm245<-BallMapper(x2,y2,0.45)
bm250<-BallMapper(x2,y2,0.50)
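The colour can be read as a proportion because the average of a 0/1 variable is the share of ones. A short sketch with made-up dummy values for the districts in two hypothetical balls:

```r
# Made-up QL4 values for the districts falling in each of two balls
ql4_by_ball <- list(ball1 = c(1, 0, 1, 1), ball2 = c(0, 0, 0, 1, 0))

# The mean of a 0/1 dummy is the proportion of districts with QL4 equal to 1
prop_ql4 <- sapply(ql4_by_ball, mean)
prop_ql4
```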

As with the first set of BM plots here we will only write one plotting line and then leave it for editing to see the other radii. In the commentary bm230 is used.

In [None]:
ColorIgraphPlot(bm225,seed_for_plotting=123)

This is all of the material covered in Session 3 of the Topological Data Analysis Ball Mapper for Regional Analysis workshop. You can find a Colab notebook to accompany Session 4 on the GitHub page.