Moments for Moran's I_i following Sokal 1998 #159
Conversation
Some visualizations and reasoning are in this gist: https://gist.github.com/ljwolf/649cdc99778d9cb018748e8d23571eaf. Basically, I think our computational scores are right on the money, so long as we use the right currency! When we use the analytical moments for conditional permutation, our Z-scores are nearly perfectly correlated. When we use the analytical versions based on the total randomization assumption, the Z-scores don't match.
That means we need to find out which version Bivand and Wong (2018) used, and whether that explains the difference in their results.
I would generally recommend against using analytical derivations. These are large sample results,
and with local statistics there is no growing sample if the number of neighbors is kept small. Unlike
infill asymptotics, which are typically the basis for the properties of “local” estimators, we have
expanding domain asymptotics. So, in practice, the large sample properties do not kick in.
Not sure I understand the distinction between total and conditional. In the implementation of
conditional randomization in GeoDa, the value at i is kept out of the sample, and k (= number of
neighbors) other values are sampled from the rest.
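That conditional scheme can be sketched in a few lines of numpy. This is a toy illustration of the sampling logic described above, not GeoDa's actual implementation; the function name and the five-observation example are invented here, and row-standardized weights (each neighbor weighted 1/k) are assumed:

```python
import numpy as np

def conditional_permutation_Ii(z, neighbor_idx, i, nperm=999, seed=12345):
    """Conditional randomization for local Moran's I_i: the value at i is
    held fixed, and its k neighbor values are drawn (without replacement)
    from the remaining n - 1 observations.

    z            : 1-D array of standardized values (mean 0)
    neighbor_idx : indices of i's neighbors
    """
    rng = np.random.default_rng(seed)
    k = len(neighbor_idx)
    pool = np.delete(np.arange(len(z)), i)  # every observation except i
    sims = np.empty(nperm)
    for p in range(nperm):
        draw = rng.choice(pool, size=k, replace=False)
        # row-standardized weights: each of the k neighbors gets weight 1/k
        sims[p] = z[i] * z[draw].mean()
    return sims

# toy example: 5 observations, site 2 has two neighbors
x = np.random.default_rng(0).normal(size=5)
z = (x - x.mean()) / x.std()
sims = conditional_permutation_Ii(z, [1, 3], i=2)
```

Because the standardized values sum to zero, the mean of the held-out pool is -z[i]/(n-1), so the simulated reference distribution centers near -z[i]**2/(n-1) rather than at the "total" expectation -1/(n-1).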
Oh, most definitely! I agree that computing the conditional randomization is probably best. But I'm chasing an answer to the speculation in Bivand and Wong (2018), towards the very end of the paper, that "local heteroscedasticity" might affect the results from conditional permutation. The analytical versions here are the two from Sokal (1998), one of which is supposed to be the true analytical expectation and variance for conditional randomization. My thought is that the "analytical" version Bivand and Wong (2018) examine uses Sokal's "total", not "conditional", randomization hypothesis. Once we correct for that, our simulated Z(I_i) values are nearly perfectly correlated with the analytical ones that use conditional randomization.
@ljwolf spent some time looking into this and may have a tentative answer (will still need to confirm). I loaded the Guerry data in both R and Python and computed Local Moran's I on the Donations variable in each.

R:

```r
library(Guerry)
library(spdep)
data("gfrance85")
lw <- nb2listw(poly2nb(gfrance85))
localm_donatns <- localmoran(x = gfrance85$Donations,
                             listw = lw,
                             mlvar = FALSE)
head(localm_donatns)
#>            Ii        E.Ii    Var.Ii       Z.Ii  Pr(z > 0)
#> 0  0.24243326 -0.01190476 0.2242134  0.5371313 0.29558845
#> 1 -0.12456087 -0.01190476 0.1460390 -0.2947951 0.61592479
#> 2  0.11347193 -0.01190476 0.1460390  0.3280819 0.37142486
#> 3  0.56550399 -0.01190476 0.2242134  1.2194179 0.11134282
#> 4 -0.03542538 -0.01190476 0.3023878 -0.0427727 0.51705864
#> 5  0.59003577 -0.01190476 0.1237035  1.7114436 0.04349963
```

Python:

```python
import libpysal
import geopandas as gpd

guerry = libpysal.examples.load_example('Guerry')
guerry_ds = gpd.read_file(guerry.get_path('Guerry.shp'))
w = libpysal.weights.Queen.from_dataframe(guerry_ds)

# Run Levi's updated Moran_Local(...)
ml_output = Moran_Local(guerry_ds['Donatns'], w, transformation='r', permutations=9999)

ml_output.Is[0:5]  # matches R localmoran() output in column 1 (Ii) exactly
# array([ 0.24243326, -0.12456087,  0.11347193,  0.56550399, -0.03542538])

ml_output.VI[0:5]  # variance under TOTAL randomization - matches R localmoran() column 3 (Var.Ii) nearly exactly
# array([0.24369033, 0.15834854, 0.15833713, 0.24367567, 0.32903211])

ml_output.VIc[0:5]  # variance under CONDITIONAL randomization - does not seem to match R localmoran() output
# array([0.02752142, 0.03207976, 0.12133663, 0.16476215, 0.00080231])
```
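As a sanity check on the R columns: Z.Ii and Pr(z > 0) follow directly from Ii, E.Ii, and Var.Ii under a normal approximation. Using the first row of the localmoran() output copied above:

```python
import math

# first row of the R localmoran() output above
Ii, EIi, VarIi = 0.24243326, -0.01190476, 0.2242134

z = (Ii - EIi) / math.sqrt(VarIi)      # standard normal deviate
p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail p-value, 1 - Phi(z)

print(z)  # ~0.5371313, the Z.Ii column
print(p)  # ~0.29558845, the Pr(z > 0) column
```

So the disagreement between implementations reduces entirely to which expectation and variance feed that z-score, which is why the choice between the "total" and "conditional" moments matters.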
Excellent, yes, perfect @jeffcsauer! We'll chat later today about the writeup, and I'll work to get this merged.
Codecov Report

```diff
@@            Coverage Diff             @@
##           master     #159      +/-   ##
==========================================
+ Coverage   80.10%   80.45%   +0.34%
==========================================
  Files          37       37
  Lines        3750     3816      +66
==========================================
+ Hits         3004     3070      +66
  Misses        746      746
```

Continue to review full report at Codecov.
Sokal (1998) provides a pretty solid argument about the differences between analytical moments derived from the "total" randomization null and the "conditional" randomization null for local Moran's I. The original article by @lanselin uses the "total" null, as does spdep.
This adds both the "total" and the "conditional" implementations to the `Moran_Local` class, in hopes of shedding a bit more light on the Bivand and Wong (2018) results about heteroskedasticity and our conditional permutation method. I also have a numba-fied version of the w_i(kh) quantity from the original paper. Tests are in progress.
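Assuming w_i(kh) denotes the usual double sum from Anselin (1995), w_i(kh) = Σ_{k≠i} Σ_{h≠i, h≠k} w_ik w_ih, there is an algebraic shortcut for dense weight matrices that avoids both the explicit double loop and numba. A minimal numpy sketch (the function name and toy matrix are invented here, and a zero diagonal is assumed):

```python
import numpy as np

def wikh(W):
    """w_i(kh) = sum over k != i, h != i, h != k of w_ik * w_ih, per row i.
    Uses the identity sum_{k != h} w_ik w_ih = (sum_j w_ij)^2 - sum_j w_ij^2,
    valid when the diagonal of W is zero."""
    W = np.asarray(W, dtype=float)
    row_sums = W.sum(axis=1)
    return row_sums ** 2 - (W ** 2).sum(axis=1)

# tiny check against the brute-force double sum
W = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [1.0, 0.0, 0.0]])
brute = np.array([sum(W[i, k] * W[i, h]
                      for k in range(3) for h in range(3)
                      if k != i and h != i and h != k)
                  for i in range(3)])
```

For very large or sparse weights the same identity applies row-wise to the nonzero entries, so a compiled double loop is mostly useful when W cannot be materialized this way.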