ml_lag example failing? #33

Closed · knaaptime opened this issue Feb 20, 2020 · 17 comments · Fixed by #34

@knaaptime (Member) commented Feb 20, 2020

So here's something weird: in a brand-new environment (numpy 1.18.1, scipy 1.4.1, spreg 1.0.4), I have all tests passing locally. But if I try to recreate one of the examples:

import libpysal
import numpy as np
from spreg import ML_Lag

db = libpysal.io.open(libpysal.examples.get_path("baltim.dbf"), 'r')
ds_name = "baltim.dbf"
y_name = "PRICE"
y = np.array(db.by_col(y_name)).T
y.shape = (len(y),1)
x_names = ["NROOM","AGE","SQFT"]
x = np.array([db.by_col(var) for var in x_names]).T
ww = libpysal.io.open(libpysal.examples.get_path("baltim_q.gal"))
w = ww.read()
ww.close()
w_name = "baltim_q.gal"
w.transform = 'r'

reg = ML_Lag(y, x, w=w,
             name_y=y_name, name_x=x_names,
             name_w=w_name, name_ds=ds_name)

it fails when calling scipy.stats.pearsonr:

/Users/knaaptime/anaconda3/envs/spreg/lib/python3.7/site-packages/scipy/optimize/_minimize.py:770: RuntimeWarning: Method 'bounded' does not support relative tolerance in x; defaulting to absolute tolerance.
  "defaulting to absolute tolerance.", RuntimeWarning)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-c6233bd63bd0> in <module>
----> 1 mllag = ML_Lag(y,x,w,name_y=y_name,name_x=x_names, name_w=w_name,name_ds=ds_name)

~/anaconda3/envs/spreg/lib/python3.7/site-packages/spreg/ml_lag.py in __init__(self, y, x, w, method, epsilon, spat_diag, vm, name_y, name_x, name_w, name_ds)
    561         self.aic = DIAG.akaike(reg=self)
    562         self.schwarz = DIAG.schwarz(reg=self)
--> 563         SUMMARY.ML_Lag(reg=self, w=w, vm=vm, spat_diag=spat_diag)
    564 
    565 def lag_c_loglik(rho, n, e0, e1, W):

~/anaconda3/envs/spreg/lib/python3.7/site-packages/spreg/summary_output.py in ML_Lag(reg, w, vm, spat_diag, regimes)
    195     reg.__summary = {}
    196     # compute diagnostics and organize summary output
--> 197     beta_diag_lag(reg, robust=None, error=False)
    198     reg.__summary['summary_r2'] += "%-20s:%12.3f                %-22s:%12.3f\n" % (
    199         'Sigma-square ML', reg.sig2, 'Log likelihood', reg.logll)

~/anaconda3/envs/spreg/lib/python3.7/site-packages/spreg/summary_output.py in beta_diag_lag(reg, robust, error)
    723     reg.std_err = diagnostics.se_betas(reg)
    724     reg.z_stat = diagnostics.t_stat(reg, z_stat=True)
--> 725     reg.pr2 = diagnostics_tsls.pr2_aspatial(reg)
    726     # organize summary output
    727     reg.__summary['summary_std_err'] = robust

~/anaconda3/envs/spreg/lib/python3.7/site-packages/spreg/diagnostics_tsls.py in pr2_aspatial(tslsreg)
    215     y = tslsreg.y
    216     predy = tslsreg.predy
--> 217     pr = pearsonr(y, predy)[0]
    218     pr2_result = float(pr ** 2)
    219     return pr2_result

~/anaconda3/envs/spreg/lib/python3.7/site-packages/scipy/stats/stats.py in pearsonr(x, y)
   3517         return dtype(np.sign(x[1] - x[0])*np.sign(y[1] - y[0])), 1.0
   3518 
-> 3519     xmean = x.mean(dtype=dtype)
   3520     ymean = y.mean(dtype=dtype)
   3521 

~/anaconda3/envs/spreg/lib/python3.7/site-packages/numpy/core/_methods.py in _mean(a, axis, dtype, out, keepdims)
    149             is_float16_result = True
    150 
--> 151     ret = umr_sum(arr, axis, dtype, out, keepdims)
    152     if isinstance(ret, mu.ndarray):
    153         ret = um.true_divide(

TypeError: No loop matching the specified signature and casting was found for ufunc add

From my own testing, that error usually comes from passing pearsonr arrays of the wrong shape, but everything is correct in the spreg code and hasn't been touched; as I said before, tests are passing. Can others reproduce this? It happens for me every time.
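
A minimal sketch of the shape sensitivity, independent of spreg (assuming scipy 1.4.x):

import numpy as np
from scipy.stats import pearsonr

y = np.arange(10, dtype=float).reshape(-1, 1)  # (n, 1) column vector, the shape spreg stores y in
# pearsonr(y, y)                 # raises the TypeError shown in the traceback above
pearsonr(y.ravel(), y.ravel())   # fine once both inputs are 1-D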

@knaaptime (Member, Author)

I should add that the example is copied directly from this test, and that by "every time" I mean in new environments on three different machines (Linux/macOS).

@ljwolf (Member) commented Feb 20, 2020

I can replicate locally

@pedrovma (Member)

I can look into this today or tomorrow.

Just out of curiosity, why are you using ML_Lag instead of GM_Lag? We didn't pay much attention to the ML code, because the only situation in which GM is not preferable is when the data are small and happen to be normally distributed, which is a very rare combination of events. =P

@knaaptime (Member, Author)

I wasn't actually after ML_Lag specifically, just following up on https://gist.github.com/knaaptime/9b56bf676f116472efc20cde03e6c4ec to see if I could estimate any model. It turns out I can't: almost all the model classes use pearsonr, so this bug affects everything I tried except OLS.

@ljwolf (Member) commented Feb 20, 2020

@pedrovma it doesn't matter; it happens with GM_Lag, GM_Error, GM_Combo, ML_Error... anywhere we use diagnostics_tsls.pr2_aspatial(reg) ☹️

@knaaptime, to really reproduce the issue, use the BaseML_Lag class:

from spreg import BaseML_Lag
basemodel = BaseML_Lag(y, x, w=w)

from scipy.stats import pearsonr
pearsonr(basemodel.y, basemodel.predy)

If you flatten the array, it works:

pearsonr(basemodel.y.flatten(), basemodel.predy.flatten())

It looks like scipy has started getting strict about column vectors versus flat vectors. This is indicated in the documentation and appears to have been the case for a while, but I'm not sure why the behavior finally changed.

@pedrovma (Member)

I've had other issues with the column-vector rule every now and then... We do require in spreg that all of our column vectors have two explicit dimensions. This was basically done centuries ago to make sure no silly np.dot or sparse multiply would occur. Given the way scipy behaves now, and that a pandas column (a Series) has only one dimension, do you think it may be time to rethink this two-dimensions requirement?
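
For illustration, the shape mismatch in question (a small sketch, not spreg code):

import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0])        # a (geo)pandas column
np.asarray(s).shape                   # (3,): one dimension
np.asarray(s).reshape(-1, 1).shape    # (3, 1): the shape spreg requires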

@knaaptime (Member, Author)

I figured it was something like that. But why in the world are the tests passing? And don't these lines handle the reshaping the way scipy needs?

@pedrovma (Member)

I didn't mean to close this, just merge. =(
I would really like your input on rethinking the two-dimensions requirement for all vector arrays. Do you see any potential issue with dropping it?

@knaaptime (Member, Author)

It might be useful to relax the 2-dim requirement. My impression is that most people will have their data in (geo)DataFrames, so the most convenient interface either (a) accepts dataframes directly or (b) accepts data that's easily converted from a dataframe. A pandas Series is a 1-D array, so allowing 1-D input for y would remove an extra conversion step. I can't foresee any case where that change would be problematic, but my knowledge here is admittedly pretty shallow.

Alternatively, we could leave it as is and handle some of the conversion in something like the patsy dispatcher (the simple version of which is now working great after the flatten fix).

@knaaptime (Member, Author)

Related: do you think the flatten fix is worth a quick bugfix release? Otherwise I think the currently released version is unusable.

@ljwolf (Member) commented Feb 23, 2020

I think a point release makes sense. I'll try to cut one today or tomorrow. If another maintainer can do it faster, have at it!

Also, I think forcing (n, 1) is fine, but we should use the .squeeze() method any time we call into a function like the R-squared.
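
Something like this sketch, as an illustration of the idea (not the merged patch):

from scipy.stats import pearsonr

def pr2_aspatial(reg):
    # squeeze the (n, 1) column vectors down to 1-D before calling scipy
    pr = pearsonr(reg.y.squeeze(), reg.predy.squeeze())[0]
    return float(pr ** 2)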

@pedrovma (Member)

I've submitted an "in-between" approach to the (n, 1) issue: the verification class no longer requires an (n, 1) array, but it will try to coerce any one-dimensional array to (n, 1). This way we make life easier for [geo]pandas users without changing our main code.
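
Roughly along these lines (a sketch of the idea only; check_y is a hypothetical name, not the actual spreg verification function):

import numpy as np

def check_y(y, n):
    # accept 1-D input and coerce it to the (n, 1) column vector
    # that spreg uses internally
    y = np.asarray(y)
    if y.ndim == 1:
        y = y.reshape(-1, 1)
    if y.shape != (n, 1):
        raise ValueError("y must be an n-by-1 array")
    return y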

I have no idea how to make a point release, so if you could do it, I would really appreciate it. =|

@knaaptime (Member, Author)

Awesome. I don't think I have maintainer rights on PyPI, so I can't do the release, but if you want to add me I'll be happy to do it.

@knaaptime (Member, Author)

Interesting note while I prepare this: v1.0.4 is the latest release on PyPI and conda, but we're on 1.1.0 (which is currently on RTD). This release will bump us to 1.1.1.

@pedrovma (Member)

Wow. The latest changelog in git is also 1.0.4 (https://github.com/pysal/spreg/blob/master/changelog_1.0.4.md)
Maybe RTD was too hasty?

@knaaptime (Member, Author)

1.1.0 was released on GitHub; it just didn't make it to PyPI.

@knaaptime (Member, Author)

Everything required by a release is scriptable:

  • bumping the version (bumper or bump2version)
  • running the changelog notebook (papermill, using the current date as a parameter)
  • pushing a GitHub release (hub can't do it yet, but gh can)
  • pushing a PyPI release (twine or bumper/bump2version)

So I'm pretty sure I can wrap that logic up in a makefile; a rough sketch of the steps is below. Will try some experiments this weekend.
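
As a sketch of those steps in Python rather than make (the notebook path, tag, and release notes are illustrative placeholders):

import glob
import subprocess

def run(cmd):
    subprocess.run(cmd, check=True)

run(["bump2version", "patch"])                       # bump the version
run(["papermill", "release.ipynb", "out.ipynb",
     "-p", "date", "2020-02-23"])                    # run the changelog notebook
run(["python", "setup.py", "sdist", "bdist_wheel"])  # build the distributions
run(["gh", "release", "create", "v1.1.1",
     "--notes", "bugfix release"])                   # push a GitHub release
run(["twine", "upload", *glob.glob("dist/*")])       # push a PyPI release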
