ml_lag example failing? #33

Closed · knaaptime opened this issue Feb 20, 2020 · 17 comments · Fixed by #34

@knaaptime (Member) commented Feb 20, 2020

So here's something weird: in a brand-new environment (numpy 1.18.1, scipy 1.4.1, spreg 1.0.4), I have all tests passing locally. But if I try to recreate one of the examples:

import libpysal
import numpy as np
from spreg import ML_Lag

db = libpysal.io.open(libpysal.examples.get_path("baltim.dbf"), 'r')
ds_name = "baltim.dbf"
y_name = "PRICE"
y = np.array(db.by_col(y_name)).T
y.shape = (len(y),1)
x_names = ["NROOM","AGE","SQFT"]
x = np.array([db.by_col(var) for var in x_names]).T
ww = libpysal.io.open(libpysal.examples.get_path("baltim_q.gal"))
w = ww.read()
ww.close()
w_name = "baltim_q.gal"
w.transform = 'r'

reg = ML_Lag(y, x, w=w,
             name_y=y_name, name_x=x_names,
             name_w=w_name, name_ds=ds_name)

it fails when calling scipy.stats.pearsonr:

/Users/knaaptime/anaconda3/envs/spreg/lib/python3.7/site-packages/scipy/optimize/_minimize.py:770: RuntimeWarning: Method 'bounded' does not support relative tolerance in x; defaulting to absolute tolerance.
  "defaulting to absolute tolerance.", RuntimeWarning)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-c6233bd63bd0> in <module>
----> 1 mllag = ML_Lag(y,x,w,name_y=y_name,name_x=x_names, name_w=w_name,name_ds=ds_name)

~/anaconda3/envs/spreg/lib/python3.7/site-packages/spreg/ml_lag.py in __init__(self, y, x, w, method, epsilon, spat_diag, vm, name_y, name_x, name_w, name_ds)
    561         self.aic = DIAG.akaike(reg=self)
    562         self.schwarz = DIAG.schwarz(reg=self)
--> 563         SUMMARY.ML_Lag(reg=self, w=w, vm=vm, spat_diag=spat_diag)
    564 
    565 def lag_c_loglik(rho, n, e0, e1, W):

~/anaconda3/envs/spreg/lib/python3.7/site-packages/spreg/summary_output.py in ML_Lag(reg, w, vm, spat_diag, regimes)
    195     reg.__summary = {}
    196     # compute diagnostics and organize summary output
--> 197     beta_diag_lag(reg, robust=None, error=False)
    198     reg.__summary['summary_r2'] += "%-20s:%12.3f                %-22s:%12.3f\n" % (
    199         'Sigma-square ML', reg.sig2, 'Log likelihood', reg.logll)

~/anaconda3/envs/spreg/lib/python3.7/site-packages/spreg/summary_output.py in beta_diag_lag(reg, robust, error)
    723     reg.std_err = diagnostics.se_betas(reg)
    724     reg.z_stat = diagnostics.t_stat(reg, z_stat=True)
--> 725     reg.pr2 = diagnostics_tsls.pr2_aspatial(reg)
    726     # organize summary output
    727     reg.__summary['summary_std_err'] = robust

~/anaconda3/envs/spreg/lib/python3.7/site-packages/spreg/diagnostics_tsls.py in pr2_aspatial(tslsreg)
    215     y = tslsreg.y
    216     predy = tslsreg.predy
--> 217     pr = pearsonr(y, predy)[0]
    218     pr2_result = float(pr ** 2)
    219     return pr2_result

~/anaconda3/envs/spreg/lib/python3.7/site-packages/scipy/stats/stats.py in pearsonr(x, y)
   3517         return dtype(np.sign(x[1] - x[0])*np.sign(y[1] - y[0])), 1.0
   3518 
-> 3519     xmean = x.mean(dtype=dtype)
   3520     ymean = y.mean(dtype=dtype)
   3521 

~/anaconda3/envs/spreg/lib/python3.7/site-packages/numpy/core/_methods.py in _mean(a, axis, dtype, out, keepdims)
    149             is_float16_result = True
    150 
--> 151     ret = umr_sum(arr, axis, dtype, out, keepdims)
    152     if isinstance(ret, mu.ndarray):
    153         ret = um.true_divide(

TypeError: No loop matching the specified signature and casting was found for ufunc add

From my own testing, that error usually comes from passing pearsonr arrays of the wrong shape, but everything is correct in the spreg code and hasn't been touched; as I said before, tests are passing. Can others reproduce this? It happens for me every time.
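
A minimal sketch of the shape sensitivity, independent of spreg (assuming scipy 1.4.x):

import numpy as np
from scipy.stats import pearsonr

y = np.arange(10, dtype=float).reshape(-1, 1)  # (n, 1) column vector, the shape spreg stores y in
# pearsonr(y, y)                 # raises the TypeError shown in the traceback above
pearsonr(y.ravel(), y.ravel())   # fine once both inputs are 1-D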

@knaaptime (Member, Author)

I should add that the example is copied directly from this test, and that by "every time" I mean in new environments on three different machines (Linux/macOS).

@ljwolf (Member) commented Feb 20, 2020

I can replicate locally

@pedrovma (Member)

I can look into this today or tomorrow.

Just out of curiosity, why are you using ML_Lag instead of GM_Lag? We didn't pay much attention to the ML code, because the only situation in which GM is not preferable is when the data are small and happen to be normally distributed, which is a very rare combination of events. =P

@knaaptime (Member, Author)

I wasn't actually after ML_Lag specifically, just following up on https://gist.github.com/knaaptime/9b56bf676f116472efc20cde03e6c4ec to see if I could estimate any model. It turns out I can't: almost all the model classes use pearsonr, so this bug affects everything I tried except OLS.

@ljwolf (Member) commented Feb 20, 2020

@pedrovma it doesn't matter; it happens with GM_Lag, GM_Error, GM_Combo, ML_Error... anywhere we use diagnostics_tsls.pr2_aspatial(reg) ☹️

@knaaptime, to really reproduce the issue, use the BaseML_Lag class:

from spreg import BaseML_Lag
basemodel = BaseML_Lag(y, x, w=w)

from scipy.stats import pearsonr
pearsonr(basemodel.y, basemodel.predy)

If you flatten the array, it works:

pearsonr(basemodel.y.flatten(), basemodel.predy.flatten())

It looks like scipy has started getting strict about column vectors versus flat vectors. This is indicated in the documentation and appears to have been the case for a while, but I'm not sure why the behavior finally changed.

@pedrovma (Member)

I've had other issues with the column-vector rule every now and then... We do require in spreg that all of our column vectors have two explicit dimensions. This was basically done centuries ago to make sure no silly np.dot or sparse multiply would occur. Given the way scipy behaves now, and that a pandas column (a Series) has only one dimension, do you think it may be time to rethink this two-dimensions requirement?
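
For illustration, the shape mismatch in question (a small sketch, not spreg code):

import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0])        # a (geo)pandas column
np.asarray(s).shape                   # (3,): one dimension
np.asarray(s).reshape(-1, 1).shape    # (3, 1): the shape spreg requires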

@knaaptime (Member, Author)

I figured it was something like that. But why in the world are the tests passing? And don't these lines handle the reshaping the way scipy needs?

@pedrovma (Member)

I didn't mean to close this, just merge. =(
I would really like your input on rethinking the two-dimensions requirement for all vector arrays. Do you see any potential issue with dropping it?

@knaaptime (Member, Author)

It might be useful to relax the 2-dim requirement. My impression is that most people will have their data in (geo)DataFrames, so the most convenient interface either (a) accepts dataframes directly or (b) accepts data that's easily converted from a dataframe. A pandas Series is a 1-D array, so allowing 1-D input for y would remove an extra conversion step. I can't foresee any case where that change would be problematic, but my knowledge here is admittedly pretty shallow.

Alternatively, we could leave it as is and handle some of the conversion in something like the patsy dispatcher (the simple version of which is now working great after the flatten fix).

@knaaptime (Member, Author)

Related: do you think the flatten fix is worth a quick bugfix release? Otherwise I think the currently released version is unusable.

@ljwolf (Member) commented Feb 23, 2020

I think a point release makes sense. I'll try to cut one today or tomorrow. If another maintainer can do it faster, have at it!

Also, I think forcing (n, 1) is fine, but we should use the .squeeze() method any time we call into a function like the R-squared.
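
Something like this sketch, as an illustration of the idea (not the merged patch):

from scipy.stats import pearsonr

def pr2_aspatial(reg):
    # squeeze the (n, 1) column vectors down to 1-D before calling scipy
    pr = pearsonr(reg.y.squeeze(), reg.predy.squeeze())[0]
    return float(pr ** 2)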

@pedrovma (Member)

I've submitted an "in-between" approach to the (n, 1) issue: the verification class no longer requires an (n, 1) array, but it will try to coerce any one-dimensional array to (n, 1). This way we make life easier for [geo]pandas users without changing our main code.
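
Roughly along these lines (a sketch of the idea only; check_y is a hypothetical name, not the actual spreg verification function):

import numpy as np

def check_y(y, n):
    # accept 1-D input and coerce it to the (n, 1) column vector
    # that spreg uses internally
    y = np.asarray(y)
    if y.ndim == 1:
        y = y.reshape(-1, 1)
    if y.shape != (n, 1):
        raise ValueError("y must be an n-by-1 array")
    return y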

I have no idea how to make a point release, so if you could do it, I would really appreciate it. =|

@knaaptime (Member, Author)

Awesome. I don't think I have maintainer rights on PyPI, so I can't do the release, but if you want to add me I'll be happy to do it.

@knaaptime (Member, Author)

Interesting note while I prepare this: v1.0.4 is the latest release on PyPI and conda, but we're on 1.1.0 (which is currently on RTD). This release will bump us to 1.1.1.

@pedrovma (Member)

Wow. The latest changelog in git is also 1.0.4 (https://github.com/pysal/spreg/blob/master/changelog_1.0.4.md)
Maybe RTD was too hasty?

@knaaptime (Member, Author)

1.1.0 was released on GitHub; it just didn't make it to PyPI.

@knaaptime (Member, Author)

Everything required by a release is scriptable:

  • bumping the version (bumper or bump2version)
  • running the changelog notebook (papermill, using the current date as a parameter)
  • pushing a GitHub release (hub can't do it yet, but gh can)
  • pushing a PyPI release (twine or bumper/bump2version)

So I'm pretty sure I can wrap that logic up in a makefile; a rough sketch of the steps is below. Will try some experiments this weekend.
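
As a sketch of those steps in Python rather than make (the notebook path, tag, and release notes are illustrative placeholders):

import glob
import subprocess

def run(cmd):
    subprocess.run(cmd, check=True)

run(["bump2version", "patch"])                       # bump the version
run(["papermill", "release.ipynb", "out.ipynb",
     "-p", "date", "2020-02-23"])                    # run the changelog notebook
run(["python", "setup.py", "sdist", "bdist_wheel"])  # build the distributions
run(["gh", "release", "create", "v1.1.1",
     "--notes", "bugfix release"])                   # push a GitHub release
run(["twine", "upload", *glob.glob("dist/*")])       # push a PyPI release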
