ML Lag indexing error on optimization result #587
Would it be possible for you to share the data and specification you are running? That would help us debug this.
I'm seeing this as well, though I'm using GeoPandas (rather than the native pysal shapefile reading). I do some cleanup and quick calculation of some things that aren't in the shapefile, but I don't see why that would matter; the inputs to ML_Lag are in the right format. Specifically, I'm running:
Unfortunately, I'm not really in a position to share my shapefile, as our results haven't been published yet.
Hmm... I wondered whether this had to do with some weirdness in input array shaping, so I tried what I could with your code, but I could not replicate it.
If you're able, can you post the exact traceback?
Anonymized the paths a bit, but otherwise this is the same. I had cleaned up the original to remove the method='FULL' piece, because this error happens either way, so ignore that piece in the top-level call.
Cool, thanks, that helps!
Great! I'd love a bug-fix release, if this actually lets you solve the problem!
I notice in your traceback, you have … If you have relatively large data, this results in computing … Do you get this error if you use …?
Yes, in all three cases. My dataset isn't huge; I have ~1,000 census tracts, so it may be within the range of what you're talking about. That said, given that all three values produce the same error, my guess is that's not the issue?
Hmm... Okay. Bummer. I'll keep pushing on this, but I need to be able to replicate the error to solve the problem.
I'm not sure what's going on. I think it's related to some joining I was doing between dataframes, but I started over with a different way of accessing the data, and now I cannot replicate it either. From my perspective, I've solved my problem, but I'm not clear on what is causing this.
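Joins between dataframes are one plausible way for missing values to slip into the model inputs (a guess at this point; the thread only later traces the error to NaN in y and X). A minimal pandas sketch with made-up tables and column names, showing how a left merge leaves NaN in rows that find no match and how merge's indicator=True flags those rows:

```python
import pandas as pd

# Hypothetical tract-level tables; the column names are made up for illustration.
tracts = pd.DataFrame({"gid": [1, 2, 3, 4]})
stats = pd.DataFrame({"gid": [1, 2, 4], "num_rides": [10.0, 25.0, 7.0]})

# A left join keeps every tract, so tract 3 (absent from `stats`) gets NaN.
merged = tracts.merge(stats, on="gid", how="left", indicator=True)

# The "_merge" column added by indicator=True marks rows that found no match.
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched[["gid", "num_rides"]])
print(merged["num_rides"].isna().sum())
```

Checking for "left_only" rows (or simply calling isna().sum()) right after a join catches the problem before the data reach the estimator.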
Good, I'm glad you fixed it! My search is based on @ratishm1's optimize result. If … The only way I see for those all to …
Can't reproduce.
Hi all, I ran into this again with a different dataset, and I wanted to provide it as a reproduction of the bug. See reproducing_bug_587.zip. I've included the requisite data files and a Jupyter notebook (which expects Python 3) that shows the error above.
Awesome, now we're cooking; I can replicate using the notebook provided. Will hopefully identify & patch if necessary.
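For a …:
import pysal as ps
# Queen-contiguity weights built from the shapefile, keyed on the 'gid' column
weights = ps.queen_from_shapefile('shapefile_for_weights.shp', idVariable='gid')
# Read the shapefile's attribute table into a pandas DataFrame
yellow = ps.pdio.read_files('yellow.shp')
# spreg expects y as an (n, 1) array and X as an (n, k) array
y = yellow[['total_spen']].values
X = yellow[['num_rides']].values
ps.spreg.ML_Lag(y, X, weights)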
OK, I think this is the cause:
82]> np.isnan(X).sum() > 0
True
83]> np.isnan(y).sum() > 0
True
84]> yellow[np.isnan(X)]
#omitted, but will show obs. w/ nan fields
Currently, there is no masking of …
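Since nothing is masked, one workaround is to keep only complete cases before estimating. A minimal numpy sketch, assuming y and X are the (n, 1) and (n, k) arrays built above; complete_case_mask is a hypothetical helper and the toy arrays stand in for the shapefile data:

```python
import numpy as np

def complete_case_mask(y, X):
    """Hypothetical helper: True for rows where neither y nor any column of X is NaN."""
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    return ~(np.isnan(y).any(axis=1) | np.isnan(X).any(axis=1))

# Toy data: observation 2 is missing y, observation 1 is missing X.
y = np.array([[1.0], [2.0], [np.nan], [4.0]])
X = np.array([[0.5], [np.nan], [1.5], [2.0]])

mask = complete_case_mask(y, X)
print(mask)            # [ True False False  True]
y_clean, X_clean = y[mask], X[mask]
print(y_clean.shape)   # (2, 1)
```

Dropping rows also means the spatial weights must be restricted to the same observations (and re-standardized) so that W still lines up with y and X; this numpy-only sketch does not handle that step.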
Sorry, to be clear: you're saying this is known/intended behavior, because there is no masking of …? I upgraded to 1.11.1, and I'm still getting the error, but if you're saying this is expected behavior because the data has …
I believe this is known/intended behavior, since we don't do any masking anywhere. This also happens in other econometrics packages, like statsmodels:
import numpy as np
import statsmodels.api as sm
import pysal as ps
# Columbus example data shipped with PySAL
data = ps.pdio.read_files(ps.examples.get_path('columbus.shp'))
y = data[['HOVAL']].values
X = data[['CRIME', 'INC']].values
# Set observation 5 to NaN in y and in every column of X
y[5] = X[5] = np.nan
sm.OLS(y, X).fit().summary()
[...] will yield a summary output entirely of …
In light of our other pre-estimation checks, it might make sense to add a check for …
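A pre-estimation check along the lines suggested above might look roughly like this. It is a hypothetical sketch, not an existing spreg function: check_no_nan and its error message are made up for illustration, and it only looks for NaN (not inf):

```python
import numpy as np

def check_no_nan(y, X):
    """Hypothetical pre-estimation check: fail loudly instead of returning NaN estimates."""
    for name, arr in (("y", y), ("x", X)):
        bad = np.isnan(np.asarray(arr, dtype=float))
        if bad.any():
            # Report the affected row indices so the user can locate them.
            rows = np.unique(np.nonzero(bad)[0])
            raise ValueError(f"{name} contains NaN in rows {rows.tolist()}; "
                             "drop or impute these observations before estimation.")

y = np.array([[1.0], [2.0], [3.0]])
X = np.array([[0.1, 0.2], [np.nan, 0.4], [0.5, 0.6]])
try:
    check_no_nan(y, X)
except ValueError as err:
    print(err)
```

Raising with row indices, rather than only a boolean flag, points the user straight at the observations to fix.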
@jtsmn Yes, … Also note that the … @ljwolf +1 to add a check for …
This assumption is not particularly clear anywhere I've found (either in the overview documentation or in the API documentation itself). Thank you for clarifying, though. I agree that, at a minimum, pre-estimation checks and clearer documentation of this requirement would be good.
When running ML_Lag, I get the following error:
IndexError: invalid index to scalar variable
This comes from the line self.rho = res.x[0][0]. I went into the ml_lag.py file in spreg and printed out res, which is the OptimizeResult object from scipy. This was the following output: …
res.x should be a solution array, but here it is just a scalar value. I changed
rho = res.x[0][0]
to
rho = res.x
However, that gave me problems elsewhere.
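The IndexError itself can be reproduced without PySAL: indexing a NumPy scalar raises it, so if res.x comes back as a plain scalar rather than the (1, 1) array that res.x[0][0] assumes, that line fails exactly this way. The first_element helper below is hypothetical, not what spreg does; and since the thread traced the trigger to NaN in y and X, a more tolerant extraction would only hide that underlying problem.

```python
import numpy as np

# A NumPy scalar, standing in for a res.x that came back without array dimensions.
x_scalar = np.float64(0.37)

try:
    x_scalar[0][0]          # same indexing pattern as self.rho = res.x[0][0]
except IndexError as err:
    print("IndexError:", err)

def first_element(x):
    """Hypothetical helper: return a float from a scalar, 1-D, or (1, 1) result."""
    return float(np.asarray(x).ravel()[0])

print(first_element(x_scalar))             # 0.37
print(first_element(np.array([[0.37]])))   # 0.37
```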