Code to support "Map line-ups: effects of spatial structure on graphical inference".
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.

Supplementary details for Map LineUps: effects of spatial structure on graphical inference

Roger Beecham (contact
Jason Dykes
Wouter Meulemans
Aidan Slingsby
Cagatay Turkay
Jo Wood

This document contains supplementary information for our paper, Map LineUps: effects of spatial structure on graphical inference. It outlines the procedure for the experiment (on which the paper is based), code that can be used to run the experiment locally and code used in the data analysis. Note that this draws heavily on work published in Harrison et al. and Kay & Heer.


Below is some code and discussion of how we generate the stimuli used in the experiment. The survey software can be found in experiment/. Instructions for trying this locally are at the end of this section.

All stimuli used in the experiment were created using R.

Configure R

The following libraries are required for generating the maps used as stimuli.

# For spatial data handling
# For charting
# For data munging

Load shapefiles


Generating real study regions

Motivating our study is the need to evaluate Wickham et al's line-up protocol when applied to choropleth maps. We want to come up with recommendations for constructing line-up tests in real data analysis scenarios. So we want ecological validity in the stimuli we create. Because of this, we believe there's a strong argument for using real geographic regions as well as testing against more contrived situations (e.g. regular grids).

Below are two approaches to generating these real regions using English Output Areas (OAs):

  1. Load a SpatialDataFrame containing all OAs, sample an OA, then find its 50 nearest neighbours.
  2. Load a spatial data frame containing all Middle Super Output Areas (MSOAs). MSOAs contain on average 25 OAs. Sample an MSOA and find its nearest neighbour -- thus we end up with c. 50 geographic units, but their grouping is more real since MSOAs are a genuine administrative geography.

Next, we need to decide on different geometries of these regions to use in the testing. We want to see if ability at performing line-up tests varies with regions of increasingly irregularity. But we again want these definitions of irregular geometry to be plausible. One approach may be to find this distribution of plausible regions empirically: generate c.1000 regions using approach 2, calculate summary statistics on these and sample from different parts of this distribution.

The two summary statistics we explored are Coefficient of Variation in area and Nearest Neighbour Index (NNI).

# Function for exploring study regions of varying geography
find.geography <- function(england_msoas, msoas, england_OAs, coef_var_min, coef_var_max, nni_min, nni_max, calculate_coef_var)
    sample_msoa <- england_msoas[sample(england_msoas@data$row, 1),]
    sample_msoas <- england_msoas@data[get.knnx(coordinates(england_msoas),coordinates(sample_msoa), k=2)$nn.index,]
    sample_OAs <- msoas[which( msoas$msoa %in% sample_msoas$msoa ),]
      sample_geoms <- england_OAs[england_OAs@data$CODE %in%  sample_OAs$oa,]
      centroids <- gCentroid(sample_geoms, byid=TRUE)
      nni<- mean(nndist(centroids@coords))/(0.5*sqrt(gArea(sample_geoms, byid=FALSE)/length(sample_geoms@data$row)))
      coef_var <- sd(gArea(sample_geoms, byid=TRUE)/1000/1000)/mean(gArea(sample_geoms, byid=TRUE)/1000/1000)
          if(coef_var > coef_var_min && coef_var < coef_var_max && length(sample_geoms) > 44 && length(sample_geoms) < 56 )
        if(nni > nni_min && nni < nni_max && length(sample_geoms) > 44 && length(sample_geoms) < 56 )
regular <- find.geography(england_msoas, msoas, england_OAs, 0, 0.5, NULL, NULL, TRUE)
less_regular <- find.geography(england_msoas, msoas, england_OAs, 0.9, 1.0, NULL, NULL, TRUE)
irregular <- find.geography(england_msoas, msoas, england_OAs, 1.3, 1.4, NULL, NULL, TRUE)
more_irregular <- find.geography(england_msoas, msoas, england_OAs, 1.8, 1.9, NULL, NULL, TRUE)

plot of chunk generating_geographies

Generating autocorrelated maps

We first define a function for generating synthetic attribute data: rectangular distributions in this case. <- function(data)
  data@data$value <- NULL
  nrows <- nrow(data@data)
  temp <- data.frame(matrix(ncol = 2, nrow = nrows))
  temp[,1] <- 1:nrows
  dist <- runif(nrows, min=1, max=10)
  temp[,2] <- dist
  colnames(temp) <- c("index","value")
  temp[,1] <- data@data$CODE
  colnames(temp) <- c("CODE","value")
  data@data <- merge(data@data, temp, by="CODE")

Next we develop a function for creating maps with a stated Moran's I. The simplest means is the permutation based approach used in Wickham et al.. The problem is that this becomes very slow where we wish to generate even moderate Moran's I. An alternative option (more of an edit): randomly pick pairs of OAs, swap the attribute values and if difference in I to the target Moran's I decreases, keep the values swapped. <- function(data, min, max)
  data_nb <- poly2nb(data)
  data_dsts <- nbdists(data_nb, coordinates(data))
  idw <- lapply(data_dsts, function(x) 1/x)
  data_lw <- nb2listw(data_nb, glist = idw)
    # Start with a new permutation
    permutation <- data@data[sample(nrow(data@data)),]
    i_now <- moran.test(permutation$value, data_lw)$estimate[1]
    i_old <- i_now
    d_old <-NULL
    d_old <- sqrt((i_now-max)^2)
    # If after 5000 attempts, still not reached desired Moran's I, then try a new permutation.
    for(j in 1:5000)
      # Break if reached desired I.
      if(i_now >= min && i_now < max)
      # Sample two distinct positions  
      swap_index <- sample(1:nrow(permutation),2, replace=FALSE)
      # Get values corresponding to these positions
      swap_values <- sapply(swap_index,function(row_index){return(permutation$value[row_index])})
      # Swap the values
      # Calculate new Moran's I  
      i_now <- moran.test(permutation$value, data_lw)$estimate[1]
      # Revert back if it's not reduced distance to targets
      if(d_now < d_old)
       i_old <- i_now
       d_old <- d_now
      else # Revert if no nearer target
    #break if reached desired I.
    if(i_now > min && i_now <=max)
  data@data <- cbind(data@data,permutation$value);
  colnames(data@data)[4] <- "permutation"
  # Draw maps
  map <- tm_shape(data) +
    tm_fill(c("permutation"),style="cont", palette="YlOrBr")+
    tm_borders(col="gray80", lwd=1)+
regular <-
map1 <-, 0.9,0.91)
map2 <-, 0.7,0.71)
map3 <-, 0.5,0.51)
map4 <-, 0.3,0.31)
map5 <-, 0.1,0.11)
print(map1, vp=viewport(layout.pos.col=1))
print(map2, vp=viewport(layout.pos.col=2))
print(map3, vp=viewport(layout.pos.col=3))
print(map4, vp=viewport(layout.pos.col=4))
print(map5, vp=viewport(layout.pos.col=5))

plot of chunk generating_geographies

R scripts for generating maps in the directory structure used by our survey software can be found in src/: generate_tests.R.

Trying the survey

You will need to install MySQL and MAMP -- the latter of which enables PHP and MySQL to be run locally. After installing MAMP, select preferences, web server and navigate to ../experiment/. After start servers, open a browser and enter localhost:8888/admin_initialize.php. As well as creating the MySQL dbase (called maplineups) and required tables, this reads the pushed sample maps stored as .png files in tests/ and creates a ../stimuli/ folder, which is used by the dbase and survey software. To reduce the size of this repository regular grids only have been uploaded. Open a MySQL dbase connection via root and use the recently created maplineups dbase. This should contain five tables: lineup, lineupanswer, map, participantgroup, user. To try the survey enter into a browser: localhost:8888.

Do get in touch if you encounter any problems following these (rather terse) instructions:


# Required packages
# Data
data <- read.csv("data/data.csv", header = TRUE)

Data cleaning

We are only able to publish aggregate JND scores rather than the raw interactions on which the JNDs are based. These can be compared with those published at Harrison et al.'s github.

Our data exhibit substantial compression of JNDs due to approach: where the approach is from above and baseline Moran's I is high, scores are overly squashed; and the same happens when the base is low and the approach is from below.

# Plot raw data and fit linear regression as described in Figure 2 of Kay & Heer -- highlights the problem of compression due to approach.
colours <- brewer.pal(5,"Set1")
names(colours) <- levels(data$approach)
col_scale <- scale_colour_manual(name = "aproach",values = colours)
data  %>%
  ggplot(aes(x=base, y=jnd, color=approach)) +
  geom_point(size=3, alpha=.25) +  
  stat_smooth(method=lm, se=FALSE, size=1, linetype="dashed") +
  stat_smooth(method=lm, se=FALSE, aes(group=NA), size=1, color="black", linetype="dashed") +
  scale_color_manual(values=c(colours[2], colours[1]))+
  scale_x_continuous(limits=c(0.1,1), breaks=seq(0,1, by=0.2))+

plot of chunk raw_jnds_compressed

Such a large compression of scores is not observed in Harrison et al.'s derived JNDs. We also see evidence of substantial between-participant variability (also observed in Harrison et al.). Notice that variability in JNDs increases as the geography becomes more irregular.

Harrison et al. discuss the problem of outliers and use a threshold of 3 absolute deviations from the median for each base x approach x visualization-type combination. This does not solve the substantial outliers we observe for the irregular real geography and a base of 0.7/0.8. Harrison et al. also identify a chance boundary for JND – the JND in the staircase procedure that would result from participants randomly guessing through the staircase (JND = 0.45). We also calculate a chance boundary for JND by simulating the staircase procedure, but pay attention to how this boundary varies by each test-case (approach x base pair).

# Calculate chance by running the simulate_chance.R script
data <- merge(data, chance_by_condition)
colnames(chance_by_condition)<- c("base","approach","chance_jnd")
data <- merge(data, chance_by_condition)
data <- data %>% mutate(is_chance = jnd>chance_jnd)

Clearly, chance in the staircase will vary for different approach x base pairs and will tend towards the ceilings where the base is high and the approach is from above and the floors where the approach is from below and the base is low. The censoring method described in Kay & Heer may be one approach to treating outliers where scores are not artificially compressed – for example where the base is 0.8, the approach is from below and the estimated JND is 0.7: an obvious outlier. This score would be censored to 0.4, min(base−0.05,0.4). Given the precision with which we estimate JND, simply censoring to these thresholds would not, as we understand it, remove the observed compression effect. As an example, if the approach is from above and the baseline Moran’s I is 0.7, then Kay & Heer’s censoring would limit JNDs to min(0.95 − base, 0.4) → 0.25 – too small given the JNDs we estimate for 0.7 using the below approach.

The method described in the paper for removing the artificial compression effect due to approach is to remove approach x base pairs where there is not enough data difference to play with - below with a base of 0.3 and 0.2, and above with a base of 0.7, 0.8 and 0.9:

# Remove ceiling and floors due to approach
data <- data %>%
  mutate(exclude = ifelse( (base < 0.4  & approach == "below")
                           | (base> 0.6 & approach == "above"),
                         TRUE, FALSE))
# And resample midbases to prevent giving greater weight to these test cases in the analysis.

Finally, we decide on how to clean outliers. Given that our test is comparatively more challenging than the non-spatial equivalent – it is conceivable that, for the irregular geography, participants could not distinguish between a Moran’s I of 0.4 and 0.8 – we decide against pinning outliers to the chance threshold used in Kay & Heer (of ~0.4). Instead we remove all estimated JNDs where the accuracy rate on which the score is based begins to approach chance (< 0.55).

Model specification

We first compare differences in mean JND observed for each geography and find that as the geometry of our study regions becomes more irregular, JND increases.

cohens_data <- data_model %>% filter(accuracy>0.55)
cohen.d(cohens_data[cohens_data$geography=="1_grid",]$jnd, cohens_data[cohens_data$geography=="3_irreg" ,]$jnd)
## d estimate: -0.7060453 (medium)
## 95 percent confidence interval:
##        inf        sup
## -0.9107018 -0.5013888

cohen.d(cohens_data[cohens_data$geography=="1_grid",]$jnd, cohens_data[cohens_data$geography=="2_reg" ,]$jnd)
## d estimate: -0.3071126 (small)
## 95 percent confidence interval:
##        inf        sup
## -0.5037327 -0.1104925

cohen.d(cohens_data[cohens_data$geography=="2_reg",]$jnd, cohens_data[cohens_data$geography=="3_irreg" ,]$jnd)
## d estimate: -0.4094879 (small)
## 95 percent confidence interval:
##       inf        sup
## -0.5964104 -0.2225653

Following Kay & Heer, we compare linear models with and without log transformation of the outcome (JND). We borrow the plotting function provided at Kay & Heer's github. Log transformation improves problems of skew and kurtosis in residuals for the regular real geography.

m_linear_reg <- gamlss(jnd ~ base,data=data_model%>% filter(geography=="2_reg", accuracy>0.55)
m_loglinear_reg <- gamlss(jnd ~ base, data=data_model %>% filter(geography=="2_reg", accuracy>0.55), family=LOGNO)
# This uses the plotting function provided by Kay & Heer.

plot of chunk residuals_linear_vs_log

Kay & Heer also identify the problem of participant effects. Each participant contributes up to four data points and we might expect two randomly sampled observations from the same participant to be more similar than from a different participant. We therefore add a varying-intercept random effect on participant. We define one model for each geography.

m_grid <- lmer(log(jnd) ~ base + (1|user_id), data=data_model %>% filter(geography=="1_grid", accuracy>0.55))
m_reg <- lmer(log(jnd) ~ base + (1|user_id), data=data_model %>% filter(geography=="2_reg", accuracy>0.55))
m_irreg <- lmer(log(jnd) ~ base + (1|user_id), data=data_model %>% filter(geography=="3_irreg", accuracy>0.55))
library(MuMIn) # for pseudo-rsquared

We evaluate the models using estimated regression coefficients, model fits and prediction intervals. Note that direct comparisons between regular grid and the real geographies should be treated cautiously since we do not have data at bases of 0.2 and 0.9 for the regular grid case.

complexity exp(intercept) exp(slope) pseudo R2 fixed pseudo R2 fixed+random
regular grid 0.311 -0.344 0.15 0.16
regular real 0.312 -0.438 0.12 0.42
irregular real 0.424 -0.370 0.13 0.51

plot of chunk plot_prediction_intervals

Relevant literature

A reference list can be found in the full paper. Whilst the work described here relates most directly to Harrison et al. (2014) and Kay & Heer (2016), listed below are two important publications that have come to our attention that explicitly consider visual power in graphical inference tests of non-spatial data. Power estimates are arrived at empirically -- also through large scale crowd-sourced (AMT) experiments.

  • Hofmann, H., Follet, L., Majumder, M. & Cook, D. (2012) Graphical Tests for Power Comparison of Competing Designs, IEEE Transactions on Visualization & Computer Graphics, 18(12): 2441-2448, doi: 10.1109/TVCG.2012.230.

  • Majumder, M., Hofmann, H., Cook, D. (2013) Validation of Visual Statistical Inference, Applied to Linear Models, Journal of the American Statistical Association, 108(503): 942-956, doi: 10.1080/01621459.2013.808157