Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement global moran bivariate #93

Merged
merged 3 commits into from
Sep 7, 2022
Merged

implement global moran bivariate #93

merged 3 commits into from
Sep 7, 2022

Conversation

JosiahParry
Copy link
Contributor

@JosiahParry JosiahParry commented Sep 2, 2022

This PR accompanies #92 for the global counterpart.

This PR implements the global bivariate moran.

  • It adds the same R/utils-cond-permute.R helper functions.
  • Adds another function find_xj() to R/utils.R which creates a list of neighboring values of a vector x based on a neighbor list object
  • Implements the bivariate global moran using the folded pseudo p-value approach taken in pysal
── R CMD check results ──────────────────────────────────────── spdep 1.2-6 ────
Duration: 2m 1.4s

❯ checking package dependencies ... NOTE
  Packages suggested but not available for checking: 'spatialreg', 'RSpectra'

❯ checking installed package size ... NOTE
    installed size is  9.3Mb
    sub-directories of 1Mb or more:
      doc   6.7Mb
      etc   1.4Mb

0 errors ✔ | 0 warnings ✔ | 2 notes ✖

R CMD check succeeded

@rsbivand
Copy link
Member

rsbivand commented Sep 2, 2022

Hi @JosiahParry Could you please test this against lee.test() https://r-spatial.github.io/spdep/reference/lee.test.html and lee.mc() https://r-spatial.github.io/spdep/reference/lee.mc.html ? If they do the same, this is just about lee being less easy to find than moran_bv, isn't it?

@JosiahParry
Copy link
Contributor Author

Definitely! Will update you.

@JosiahParry
Copy link
Contributor Author

They seem to be very similar but not exactly. Take these two separate cases. With the Boston data we see a closer number than we do with Guerry. I haven't given Lee (2001) an exceptionally thorough read so I can't compare the two too well.

devtools::load_all()
#> ℹ Loading spdep
#> Loading required package: sp
#> Loading required package: spData
#> To access larger datasets in this package, install the spDataLarge
#> package with: `install.packages('spDataLarge',
#> repos='https://nowosad.github.io/drat/', type='source')`
#> Loading required package: sf
#> Linking to GEOS 3.9.1, GDAL 3.2.3, PROJ 7.2.1; sf_use_s2() is TRUE
data(boston, package = "spData")
x <- boston.c$CRIM
y <- boston.c$NOX
nb <- boston.soi
listw <- nb2listw(nb)

lee.test(x, y, listw)
#> 
#>  Lee's L statistic randomisation
#> 
#> data:  x ,  y 
#> weights: listw  
#> 
#> Lee's L statistic standard deviate = 16.025, p-value < 2.2e-16
#> alternative hypothesis: greater
#> sample estimates:
#> Lee's L statistic       Expectation          Variance 
#>      0.3993476438      0.1135035797      0.0003181587
moran_bv(x, y, listw)
#> $Ib
#> [1] 0.4082073
#> 
#> $p_sim
#> [1] 0.01

g <- Guerry::gfrance85
nb <- poly2nb(g)
listw <- nb2listw(nb)


x <- g$Crime_pers
y <- g$Literacy


lee.test(x, y, listw)
#> 
#>  Lee's L statistic randomisation
#> 
#> data:  x ,  y 
#> weights: listw  
#> 
#> Lee's L statistic standard deviate = 1.299, p-value = 0.09698
#> alternative hypothesis: greater
#> sample estimates:
#> Lee's L statistic       Expectation          Variance 
#>       0.045863776      -0.004572276       0.001507618
moran_bv(x, y, listw)
#> $Ib
#> [1] -0.02359931
#> 
#> $p_sim
#> [1] 0.26

Created on 2022-09-02 by the reprex package (v2.0.1)

@rsbivand
Copy link
Member

rsbivand commented Sep 2, 2022

And for boston:

> set.seed(1)
> lee.mc(x, y, listw, nsim=99)

	Monte-Carlo simulation of Lee's L

data:  x ,  y 
weights: listw  
number of simulations + 1: 100 

statistic = 0.39935, observed rank = 100, p-value = 0.01
alternative hypothesis: greater

For Guerry:

> set.seed(1)
> lee.mc(x, y, listw, nsim=99)

	Monte-Carlo simulation of Lee's L

data:  x ,  y 
weights: listw  
number of simulations + 1: 100 

statistic = 0.045864, observed rank = 95, p-value = 0.05
alternative hypothesis: greater

The existing lee.mc() does not (yet) accept alternative="two.sided", which it should.

@rsbivand rsbivand mentioned this pull request Sep 2, 2022
@JosiahParry
Copy link
Contributor Author

JosiahParry commented Sep 4, 2022

I'm struggling to identify the lineage of the Global Bivariate Moran.

Anselin defineds the formula in the Geoda Workbook Ch. 5b without a citation but includes Lee 2001 in references.

Lee, 2001 indicates that Wartenberg, 1985 is the earliest attempt at defining a bivariate measure of Moran's I. However, I cannot identify the formulation in the Wartenberg paper. Lee also refers to Griffith 1993 calling the bivariate moran the cross-MC.

A book chapter by the USDA
from ~1990s walks through the derivation as well.

A paper (DOI 10.1117/12.760796) cites Lee 2001 as the formulation for the bivariate moran with $I_{kl} = \frac{Z'_kWZ_l}{Z'_kZ_k}$

So, with all that, I am still unsure as to which paper should be cited as having the definitive definition of the global bivarite moran.

@JosiahParry
Copy link
Contributor Author

See https://stackoverflow.com/questions/45177590/map-of-bivariate-spatial-correlation-in-r-bivariate-lisa for discussion of availability of global moran vs lee in spdep

@rsbivand
Copy link
Member

rsbivand commented Sep 5, 2022

The SO questioner mixes up local and global. The global Lee test was written by Virgilio Gómex-Rubio and contributed in April 2014, so perhaps @becarioprecario would like to comment? Stéphane Dray @sdray was also working on the Wartenberg approach.

@sdray
Copy link

sdray commented Sep 5, 2022

Hi,

for me, the proposition is simply the Wartenberg's approach. However, the problem is that this approach can be hard to interpret when W is asymmetric (e.g., row standardized) as it looks at the link between one variable and the lagged version for the second one. Hence, moran_bv(x, y, listw) and moran_bv(y, x, listw) should provide different values when W is asymmetric. How to interpret the results of such test is, for me, an open question. One solution is to use 1/2 * (W + t(W)) instead of W to obtain a symmetric index. To be honest, for me, it is not clear what is expected for these bivariate measures and I think that some work should be done to evaluate the different approaches proposed in the literature.

@JosiahParry
Copy link
Contributor Author

Thank you, @sdray! I've updated the citation.

Given that there isn't any easily replicable data from that paper, I've gone ahead and compared this implementation to rgeoda. Note that rgeoda doesn't provide the global bivariate morans I but only the local. So the below example calculates the global from the local.

library(sf)
library(spdep)
library(rgeoda)

guerry_path <- system.file("extdata", "Guerry.shp", package = "rgeoda")
guerry <- st_read(guerry_path)

queen_w <- queen_weights(guerry)
lisa <- local_bimoran(queen_w, guerry[c('Crm_prs','Litercy')])
lms <- lisa_values(lisa)

nb <- poly2nb(guerry)
listw <- nb2listw(nb)

# calulate bivariate morans i
moran_bv(guerry$Crm_prs, guerry$Litercy, listw)
#> $Ib
#> [1] -0.02359931
#> 
#> $p_sim
#> [1] 0.29

# taking local bv moran and calculating global from rgeoda
sum(lisa$lisa_vals) / sum(scale(guerry$Crm_prs)^2)
#> [1] -0.02359931

Created on 2022-09-06 by the reprex package (v2.0.1)

@rsbivand, do you have a preference as to what the example should contain?

@rsbivand rsbivand merged commit 8766f49 into r-spatial:main Sep 7, 2022
@JosiahParry JosiahParry deleted the bv-moran-global branch September 7, 2022 14:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants