Moments for Moran's I_i following Sokal 1998 #159
Conversation
Some visualizations and reasoning are in this gist: https://gist.github.com/ljwolf/649cdc99778d9cb018748e8d23571eaf. Basically, I think our computational scores are right on the money, so long as we use the right currency! When we use the analytical moments for conditional permutation, our Z-scores are nearly perfectly correlated. When we use the analytical versions based on the total randomization assumption, the Z-scores don't match.
That means we need to find out which version Bivand and Wong (2018) used, and whether that explains the difference in their results.
I would generally recommend against using analytical derivations. These are large sample results,
and with local statistics there is no growing sample if the number of neighbors is kept small. Unlike
infill asymptotics, which are typically the basis for the properties of “local” estimators, we have
expanding domain asymptotics. So, in practice, the large sample properties do not kick in.
Not sure I understand the distinction between total and conditional. In the implementation of
conditional randomization in GeoDa, the value at i is kept out of the sample, and k (= number of
neighbors) other values are sampled from the rest.
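That conditional scheme can be sketched in a few lines of numpy. This is a toy illustration of the sampling logic described above, not GeoDa's actual implementation; the function name and the five-observation example are invented here, and row-standardized weights (each neighbor weighted 1/k) are assumed:

```python
import numpy as np

def conditional_permutation_Ii(z, neighbor_idx, i, nperm=999, seed=12345):
    """Conditional randomization for local Moran's I_i: the value at i is
    held fixed, and its k neighbor values are drawn (without replacement)
    from the remaining n - 1 observations.

    z            : 1-D array of standardized values (mean 0)
    neighbor_idx : indices of i's neighbors
    """
    rng = np.random.default_rng(seed)
    k = len(neighbor_idx)
    pool = np.delete(np.arange(len(z)), i)  # every observation except i
    sims = np.empty(nperm)
    for p in range(nperm):
        draw = rng.choice(pool, size=k, replace=False)
        # row-standardized weights: each of the k neighbors gets weight 1/k
        sims[p] = z[i] * z[draw].mean()
    return sims

# toy example: 5 observations, site 2 has two neighbors
x = np.random.default_rng(0).normal(size=5)
z = (x - x.mean()) / x.std()
sims = conditional_permutation_Ii(z, [1, 3], i=2)
```

Because the standardized values sum to zero, the mean of the held-out pool is -z[i]/(n-1), so the simulated reference distribution centers near -z[i]**2/(n-1) rather than at the "total" expectation -1/(n-1).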
Oh, most definitely! I agree that computing the conditional randomization is probably best. But I'm chasing an answer to the speculation in Bivand and Wong (2018), towards the very end of the paper, that "local heteroscedasticity" might affect the results from conditional permutation. The analytical versions here are the two from Sokal (1998), one of which is supposed to be the true analytical expectation and variance for conditional randomization. My thought is that the "analytical" version Bivand and Wong (2018) examine uses Sokal's "total", not "conditional", randomization hypothesis. Once we correct for that, our simulated Z(I_i) values are nearly perfectly correlated with the analytical ones that use conditional randomization.
@ljwolf spent some time looking into this and may have a tentative answer (will still need to confirm). I loaded the Guerry data in both R and Python and computed Local Moran's I on the Donations variable in each.

R:

```r
library(Guerry)
library(spdep)
data("gfrance85")
lw <- nb2listw(poly2nb(gfrance85))
localm_donatns <- localmoran(x = gfrance85$Donations,
                             listw = lw,
                             mlvar = FALSE)
head(localm_donatns)
#>            Ii        E.Ii    Var.Ii       Z.Ii  Pr(z > 0)
#> 0  0.24243326 -0.01190476 0.2242134  0.5371313 0.29558845
#> 1 -0.12456087 -0.01190476 0.1460390 -0.2947951 0.61592479
#> 2  0.11347193 -0.01190476 0.1460390  0.3280819 0.37142486
#> 3  0.56550399 -0.01190476 0.2242134  1.2194179 0.11134282
#> 4 -0.03542538 -0.01190476 0.3023878 -0.0427727 0.51705864
#> 5  0.59003577 -0.01190476 0.1237035  1.7114436 0.04349963
```

Python:

```python
import libpysal
import geopandas as gpd

guerry = libpysal.examples.load_example('Guerry')
guerry_ds = gpd.read_file(guerry.get_path('Guerry.shp'))
w = libpysal.weights.Queen.from_dataframe(guerry_ds)

# Run Levi's updated Moran_Local(...)
ml_output = Moran_Local(guerry_ds['Donatns'], w, transformation='r', permutations=9999)

ml_output.Is[0:5]  # matches R localmoran() output in column 1 (Ii) exactly
# array([ 0.24243326, -0.12456087,  0.11347193,  0.56550399, -0.03542538])

ml_output.VI[0:5]  # variance under TOTAL randomization - matches R localmoran() column 3 (Var.Ii) nearly exactly
# array([0.24369033, 0.15834854, 0.15833713, 0.24367567, 0.32903211])

ml_output.VIc[0:5]  # variance under CONDITIONAL randomization - does not seem to match R localmoran() output
# array([0.02752142, 0.03207976, 0.12133663, 0.16476215, 0.00080231])
```
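As a sanity check on the R columns: Z.Ii and Pr(z > 0) follow directly from Ii, E.Ii, and Var.Ii under a normal approximation. Using the first row of the localmoran() output copied above:

```python
import math

# first row of the R localmoran() output above
Ii, EIi, VarIi = 0.24243326, -0.01190476, 0.2242134

z = (Ii - EIi) / math.sqrt(VarIi)      # standard normal deviate
p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail p-value, 1 - Phi(z)

print(z)  # ~0.5371313, the Z.Ii column
print(p)  # ~0.29558845, the Pr(z > 0) column
```

So the disagreement between implementations reduces entirely to which expectation and variance feed that z-score, which is why the choice between the "total" and "conditional" moments matters.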
Excellent, yes, perfect @jeffcsauer! We'll chat later today about the writeup, and I'll work to get this merged.
Codecov Report

```diff
@@            Coverage Diff             @@
##           master     #159      +/-   ##
==========================================
+ Coverage   80.10%   80.45%   +0.34%
==========================================
  Files          37       37
  Lines        3750     3816      +66
==========================================
+ Hits         3004     3070      +66
  Misses        746      746
```

Continue to review full report at Codecov.
Sokal (1998) provides a pretty solid argument about the differences between analytical moments derived from the "total" randomization null and the "conditional" randomization null for local Moran's I. The original article by @lanselin uses the "total" null, as does spdep.
This adds both the "total" and the "conditional" implementations to the `Moran_Local` class, in hopes of shedding a bit more light on the Bivand and Wong (2018) results about heteroskedasticity and our conditional permutation method. I also have a numba-fied version of the w_i(kh) quantity from the original paper. Tests are in progress.
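Assuming w_i(kh) denotes the usual double sum from Anselin (1995), w_i(kh) = Σ_{k≠i} Σ_{h≠i, h≠k} w_ik w_ih, there is an algebraic shortcut for dense weight matrices that avoids both the explicit double loop and numba. A minimal numpy sketch (the function name and toy matrix are invented here, and a zero diagonal is assumed):

```python
import numpy as np

def wikh(W):
    """w_i(kh) = sum over k != i, h != i, h != k of w_ik * w_ih, per row i.
    Uses the identity sum_{k != h} w_ik w_ih = (sum_j w_ij)^2 - sum_j w_ij^2,
    valid when the diagonal of W is zero."""
    W = np.asarray(W, dtype=float)
    row_sums = W.sum(axis=1)
    return row_sums ** 2 - (W ** 2).sum(axis=1)

# tiny check against the brute-force double sum
W = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [1.0, 0.0, 0.0]])
brute = np.array([sum(W[i, k] * W[i, h]
                      for k in range(3) for h in range(3)
                      if k != i and h != i and h != k)
                  for i in range(3)])
```

For very large or sparse weights the same identity applies row-wise to the nonzero entries, so a compiled double loop is mostly useful when W cannot be materialized this way.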