Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Moments for Moran's I_i following Sokal 1998 #159

Merged
merged 11 commits into from Jan 20, 2021

Conversation

ljwolf
Copy link
Member

@ljwolf ljwolf commented Jan 19, 2021

Sokal (1998) provides a pretty solid argument about the differences between analytical moments derived from the "total" randomization null and the "conditional" randomization null for local Moran's I. The original article by @lanselin uses the "total" null, as does spdep.

This adds both the "total" and the "conditional" implementations to the Moran_Local class, in hopes of shedding a bit more light on the Bivand and Wong (2018) results about heteroskedasticity and our conditional permutation method. I also have a numba-fied version of the w_i(kh) quantity from the original paper.

Tests are in progress.

esda/moran.py Outdated Show resolved Hide resolved
esda/moran.py Outdated Show resolved Hide resolved
@ljwolf
Copy link
Member Author

ljwolf commented Jan 19, 2021

Some visualizations and reasoning are in this gist.

Basically, I think our computational scores are right on the money, so long as we use the right currency!

When we use the analytical versions of I for conditional permutation, our Z scores are nearly perfectly correlated. When we use the analytical versions based on the total randomization assumption, the Z-scores don't match.

@ljwolf
Copy link
Member Author

ljwolf commented Jan 19, 2021

That means we need to find out the version used in Bivand and Wong (2018) and identify whether that's the difference in their data.

@lanselin
Copy link
Member

lanselin commented Jan 19, 2021 via email

@ljwolf
Copy link
Member Author

ljwolf commented Jan 19, 2021

Oh, most definitely! I agree that computing the conditional randomization is probably best.

But, I'm chasing an answer for the speculation in Bivand and Wong (2018), that "local heteroscedasticity" might affect the results from conditional permutation (towards the very end)

The analytical versions here are two versions from Sokal (1998), one of which is supposed to be the true "analytical" expectation and variance for conditional randomization. My thought is that the "analytical" one that Bivand and Wong (2018) examine uses the "total," not "conditional" randomization hypothesis from Sokal.

Once we correct for that, our simulated Z(I_i) values are nearly perfectly correlated with the analytical ones that use conditional randomization.

@jeffcsauer
Copy link
Collaborator

jeffcsauer commented Jan 19, 2021

@ljwolf spent some time looking into this and may have a tentative answer (will still need to confirm). I loaded the Guerry data in both R and Python. I computed Local Moran's I on the Donatns variable with a row-standardized Queens weight matrix in both, and it seems that the spdep::localmoran() Var.Ii column matches your new esda.Moran_Local() VI column near exactly. This would suggest that spdep::localmoran() is using total randomization, correct?

I'm inferring this as your new code shows that VIc is for variance under the conditional randomization, whereas VI is for variance under the total randomization. Reprex below - let me know if I'm off base!

Edit: Also from the spdep documentation (see Details) "The variance of the local Moran statistic is taken from Sokal et al. (1998), equation 5 p. 334 and A4*, p. 351. By default, the implementation divides by n, not (n-1) in calculating the variance and higher moments."

Edit 2: These lines of localmoran() are relevant to the present discussion.

R

library(Guerry)
library(spdep)
data("gfrance85")
lw <- nb2listw(poly2nb(gfrance85))
localm_donatns <- localmoran(x = gfrance85$Donations, 
                             listw = lw, 
                             mlvar = FALSE)
head(localm_donatns)

           Ii        E.Ii    Var.Ii       Z.Ii  Pr(z > 0)
0  0.24243326 -0.01190476 0.2242134  0.5371313 0.29558845
1 -0.12456087 -0.01190476 0.1460390 -0.2947951 0.61592479
2  0.11347193 -0.01190476 0.1460390  0.3280819 0.37142486
3  0.56550399 -0.01190476 0.2242134  1.2194179 0.11134282
4 -0.03542538 -0.01190476 0.3023878 -0.0427727 0.51705864
5  0.59003577 -0.01190476 0.1237035  1.7114436 0.04349963

Python

import libpysal
import geopandas as gpd
guerry = libpysal.examples.load_example('Guerry')
guerry_ds = gpd.read_file(guerry.get_path('Guerry.shp'))
w = libpysal.weights.Queen.from_dataframe(guerry_ds)
# Run Levi's updatedMoran_Local(...)
ml_output = Moran_Local(guerry_ds['Donatns'], w, transformation='r', permutations=9999)

ml_output.Is[0:5] # Matches R exactly localmoran() output in column 1 exactly
array([ 0.24243326, -0.12456087,  0.11347193,  0.56550399, -0.03542538])

ml_output.VI[0:5] # Variance under TOTAL randomization - matches R localmoran() output in column 3 (Var.Ii) near exactly
array([0.24369033, 0.15834854, 0.15833713, 0.24367567, 0.32903211])

ml_output.VIc[0:5] # Variance under CONDITIONAL randomization - does not seem to match R localmoran() output
array([0.02752142, 0.03207976, 0.12133663, 0.16476215, 0.00080231])

@ljwolf
Copy link
Member Author

ljwolf commented Jan 20, 2021

Excellent, yes, perfect @jeffcsauer! We'll chat later today about the writeup, and I'll work to get this merged.

@codecov-io
Copy link

codecov-io commented Jan 20, 2021

Codecov Report

Merging #159 (3b0bcf9) into master (5e7e38d) will increase coverage by 0.34%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #159      +/-   ##
==========================================
+ Coverage   80.10%   80.45%   +0.34%     
==========================================
  Files          37       37              
  Lines        3750     3816      +66     
==========================================
+ Hits         3004     3070      +66     
  Misses        746      746              
Impacted Files Coverage Δ
esda/moran.py 78.20% <100.00%> (+3.00%) ⬆️
esda/tests/test_moran.py 76.47% <100.00%> (+2.74%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5e7e38d...3b0bcf9. Read the comment docs.

@ljwolf ljwolf merged commit ed8b575 into pysal:master Jan 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants