BUG: Got error log like "** On entry to XXXXparameter number xxx had an illegal value" on RISCV platform #20423

Open
wallace-hu opened this issue Apr 9, 2024 · 20 comments
Labels
defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.linalg

Comments

@wallace-hu

wallace-hu commented Apr 9, 2024

Describe your issue.

Hi, I am running a pmdarima example (https://github.com/alkaline-ml/pmdarima/blob/master/examples/arima/example_auto_arima.py) on my RISC-V platform. It fails with the traceback below:

 ** On entry to DGELSD parameter number  7 had an illegal value
  File "/home/example_auto_arima.py", line 13, in <module>
    modl = pm.auto_arima(train, start_p=1, start_q=1, start_P=1, start_Q=1,max_p=5, max_q=5, max_P=5, max_Q=5, seasonal=True,stepwise=True, suppress_warnings=True, D=10, max_D=10,error_action='ignore')
  File "/usr/local/lib/python3.10/dist-packages/pmdarima/arima/auto.py", line 547, in auto_arima
    d = ndiffs(
  File "/usr/local/lib/python3.10/dist-packages/pmdarima/arima/utils.py", line 189, in ndiffs
    pval, dodiff = testfunc(x)
  File "/usr/local/lib/python3.10/dist-packages/pmdarima/arima/stationarity.py", line 186, in should_diff
    lm = LinearRegression().fit(t, x)
  File "/usr/local/lib/python3.10/dist-packages/sklearn/base.py", line 1474, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_base.py", line 653, in fit
    self.coef_, _, self.rank_, self.singular_ = linalg.lstsq(X, y)
  File "/usr/local/lib/python3.10/dist-packages/scipy/linalg/_basic.py", line 1212, in lstsq
    lwork, iwork = _compute_lwork(lapack_lwork, m, n, nrhs, cond)
  File "/usr/local/lib/python3.10/dist-packages/scipy/linalg/lapack.py", line 1004, in _compute_lwork
    raise ValueError("Internal work array size computation failed: -7

But this example runs well on an x86 Linux platform with the same package versions. When I debug into the script, I find that it eventually calls the _compute_lwork function in lapack.py, enters the code below, and gets a return value of -7.

----
    dtype = getattr(routine, 'dtype', None)
    int_dtype = getattr(routine, 'int_dtype', None)
    ret = routine(*args, **kwargs)
    if ret[-1] != 0:
        raise ValueError("Internal work array size computation failed: "
                         "%d" % (ret[-1],))
-----

Here I print out the values of *args; they are:
90 1 1 2.220446049250313e-16

It looks like it calls functions such as DGELSD from _flapack.cpython-310-riscv64-linux-gnu.so and then gets the error. The same call works well on the x86 Linux platform.

Can anyone indicate how I can continue debugging such an "On entry to xxxxx parameter number xx had an illegal value" error?
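
One way to read this kind of message: by LAPACK convention, the error handler XERBLA reports the 1-based position of the offending argument in the Fortran routine's argument list, and the routine sets INFO to minus that position. The following is a minimal sketch that decodes the message, assuming the reference LAPACK argument order for DGELSD. The helper name decode_xerbla and the table are hypothetical and not part of SciPy.

```python
import re

# Argument order of DGELSD in reference LAPACK (1-based positions).
DGELSD_ARGS = [
    "M", "N", "NRHS", "A", "LDA", "B", "LDB", "S",
    "RCOND", "RANK", "WORK", "LWORK", "IWORK", "INFO",
]

XERBLA_RE = re.compile(
    r"On entry to\s+(?P<routine>\w+)\s+parameter number\s+(?P<pos>\d+)"
)


def decode_xerbla(message, arg_table):
    """Return (routine, position, argument name) for an XERBLA message.

    The argument name is None when the routine is not in arg_table.
    """
    match = XERBLA_RE.search(message)
    if match is None:
        raise ValueError("not an XERBLA 'illegal value' message")
    routine = match.group("routine")
    pos = int(match.group("pos"))
    names = arg_table.get(routine.upper())
    name = names[pos - 1] if names and 1 <= pos <= len(names) else None
    return routine, pos, name


msg = " ** On entry to DGELSD parameter number  7 had an illegal value"
print(decode_xerbla(msg, {"DGELSD": DGELSD_ARGS}))
```

For the message in this report, position 7 is LDB, the leading dimension of the right-hand-side array B, so that is the value worth inspecting in the call that reaches dgelsd_.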

Reproducing Code Example

Find a RISC-V platform and build a riscv64/ubuntu22.04 Docker image. Install the requirements listed below:
pip3 install Cython==3.0.9 
pip3 install et-xmlfile==1.1.0
pip3 install joblib==1.3.2
pip3 install numpy==1.26.4
pip3 install openpyxl==3.1.2
pip3 install packaging==24.0
pip3 install patsy==0.5.6
pip3 install pefile==2023.2.7
pip3 install pyinstaller==6.5.0
pip3 install pyinstaller-hooks-contrib==2024.3
pip3 install python-dateutil==2.9.0.post0
pip3 install pytz==2024.1

pip3 install scipy==1.12.0
pip3 install six==1.16.0
pip3 install some-package==0.1
pip3 install threadpoolctl==3.4.0
pip3 install tzdata==2024.1
pip3 install urllib3==2.2.1

pip3 install xgboost==2.0.3 
pip3 install pandas==2.2.1 
pip3 install scikit-learn==1.4.1.post1 
pip3 install statsmodels==0.14.1
pip3 install lightgbm==4.3.0 
pip3 install pmdarima==2.0.4 


Then run the pmdarima example in https://github.com/alkaline-ml/pmdarima/blob/master/examples/arima/example_auto_arima.py

Error message

File "/home/example_auto_arima.py", line 13, in <module>
    modl = pm.auto_arima(train, start_p=1, start_q=1, start_P=1, start_Q=1,max_p=5, max_q=5, max_P=5, max_Q=5, seasonal=True,stepwise=True, suppress_warnings=True, D=10, max_D=10,error_action='ignore')
  File "/usr/local/lib/python3.10/dist-packages/pmdarima/arima/auto.py", line 547, in auto_arima
    d = ndiffs(
  File "/usr/local/lib/python3.10/dist-packages/pmdarima/arima/utils.py", line 189, in ndiffs
    pval, dodiff = testfunc(x)
  File "/usr/local/lib/python3.10/dist-packages/pmdarima/arima/stationarity.py", line 186, in should_diff
    lm = LinearRegression().fit(t, x)
  File "/usr/local/lib/python3.10/dist-packages/sklearn/base.py", line 1474, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_base.py", line 653, in fit
    self.coef_, _, self.rank_, self.singular_ = linalg.lstsq(X, y)
  File "/usr/local/lib/python3.10/dist-packages/scipy/linalg/_basic.py", line 1212, in lstsq
    lwork, iwork = _compute_lwork(lapack_lwork, m, n, nrhs, cond)
  File "/usr/local/lib/python3.10/dist-packages/scipy/linalg/lapack.py", line 1004, in _compute_lwork
    raise ValueError("Internal work array size computation failed: -7

SciPy/NumPy/Python version and system information

scipy==1.12.0
numpy==1.26.4
python3==3.10.11
@wallace-hu wallace-hu added the defect A clear bug or issue that prevents SciPy from being installed or used as expected label Apr 9, 2024
@ev-br
Member

ev-br commented Apr 9, 2024

What are the inputs to the lstsq routine, X and y?
You can add a breakpoint(), step through and print the inputs and their shapes.
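
A minimal sketch of what such an inspection could print, assuming NumPy arrays as inputs. The helper describe_array is hypothetical, and the y values below are placeholders rather than the data from the report.

```python
import numpy as np


def describe_array(name, arr):
    """Summarize the properties that matter when an array is handed to LAPACK."""
    arr = np.asarray(arr)
    return (
        f"{name}: shape={arr.shape} dtype={arr.dtype} "
        f"C={arr.flags['C_CONTIGUOUS']} F={arr.flags['F_CONTIGUOUS']} "
        f"finite={bool(np.isfinite(arr).all())}"
    )


X = np.zeros((90, 1), dtype=np.float64)
y = np.linspace(-1.0, 1.0, 90)  # placeholder values, not the reported data

print(describe_array("X", X))
print(describe_array("y", y))
```

Running the same lines on both platforms and comparing the output makes it easy to confirm that the inputs really are identical before looking further down the stack.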

@wallace-hu
Author

wallace-hu commented Apr 9, 2024

X is
[[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]
[0.]]
y is [-1220.63333333 -1168.63333333 -904.63333333 -618.63333333
-14.63333333 1331.36666667 2438.36666667 4453.36666667
3460.36666667 1087.36666667 -966.63333333 -1391.63333333
-1305.63333333 -1210.63333333 -1080.63333333 795.36666667
1195.36666667 1919.36666667 334.36666667 -1080.63333333
-1338.63333333 -1444.63333333 -1421.63333333 -1276.63333333
-943.63333333 -456.63333333 639.36666667 1046.36666667
-532.63333333 -1128.63333333 -1112.63333333 -1264.63333333
-1129.63333333 -758.63333333 148.36666667 1235.36666667
1381.36666667 629.36666667 -805.63333333 -1190.63333333
-1253.63333333 -1244.63333333 -937.63333333 133.36666667
1821.36666667 5231.36666667 2764.36666667 -802.63333333
-1234.63333333 -1016.63333333 -1131.63333333 -705.63333333
104.36666667 186.36666667 761.36666667 -63.63333333
-733.63333333 -1190.63333333 -1288.63333333 -1260.63333333
-1020.63333333 -753.63333333 552.36666667 1321.36666667
2941.36666667 1021.36666667 -1100.63333333 -1416.63333333
-1450.63333333 -1440.63333333 -1430.63333333 -1301.63333333
-1112.63333333 -197.63333333 2541.36666667 2005.36666667
-902.63333333 -1384.63333333 -1336.63333333 -1102.63333333
-731.63333333 -182.63333333 1975.36666667 5501.36666667
4823.36666667 2304.36666667 346.36666667 -1144.63333333
-1107.63333333 -681.63333333]

These values are identical on my x86 and RISC-V platforms. When I debug into /usr/local/lib/python3.10/dist-packages/scipy/linalg/lapack.py, in the code below,

dtype = getattr(routine, 'dtype', None)
int_dtype = getattr(routine, 'int_dtype', None)
ret = routine(*args, **kwargs)
if ret[-1] != 0:
    raise ValueError("Internal work array size computation failed: "
                     "%d" % (ret[-1],))

the RISC-V platform returns the -7 error, but the x86 platform has no problem.

@ilayn
Member

ilayn commented Apr 9, 2024

Can you just run this on your platform?

import numpy as np
import scipy.linalg as la

X = np.zeros([90, 1], dtype=np.float64)
y = np.array([
-1220.63333333,  -1168.63333333,  -904.63333333,  -618.63333333,  -14.63333333,  1331.36666667,  2438.36666667,
  4453.36666667,  3460.36666667,  1087.36666667,  -966.63333333,  -1391.63333333,  -1305.63333333,  -1210.63333333,
  -1080.63333333,  795.36666667,  1195.36666667,  1919.36666667,  334.36666667,  -1080.63333333,  -1338.63333333,
  -1444.63333333,  -1421.63333333,  -1276.63333333,  -943.63333333,  -456.63333333,  639.36666667,  1046.36666667,
  -532.63333333,  -1128.63333333,  -1112.63333333,  -1264.63333333,  -1129.63333333,  -758.63333333,  148.36666667,
  1235.36666667,  1381.36666667,  629.36666667,  -805.63333333,  -1190.63333333,  -1253.63333333,  -1244.63333333,
  -937.63333333,  133.36666667,  1821.36666667,  5231.36666667,  2764.36666667,  -802.63333333,  -1234.63333333,
  -1016.63333333,  -1131.63333333,  -705.63333333,  104.36666667,  186.36666667,  761.36666667,  -63.63333333,
  -733.63333333,  -1190.63333333,  -1288.63333333,  -1260.63333333,  -1020.63333333,  -753.63333333,  552.36666667,
  1321.36666667,  2941.36666667,  1021.36666667,  -1100.63333333,  -1416.63333333,  -1450.63333333,  -1440.63333333,
  -1430.63333333,  -1301.63333333,  -1112.63333333,  -197.63333333,  2541.36666667,  2005.36666667,  -902.63333333,
  -1384.63333333,  -1336.63333333,  -1102.63333333,  -731.63333333,  -182.63333333,  1975.36666667,  5501.36666667,
  4823.36666667,  2304.36666667,  346.36666667,  -1144.63333333,  -1107.63333333,  -681.63333333])

la.lstsq(X, y)

@wallace-hu
Author

I also have the issue. Below is the error log:

** On entry to DGELSD parameter number 7 had an illegal value
Traceback (most recent call last):
  File "/home/aa.py", line 20, in <module>
    la.lstsq(X, y)
  File "/usr/local/lib/python3.10/dist-packages/scipy/linalg/_basic.py", line 1212, in lstsq
    lwork, iwork = _compute_lwork(lapack_lwork, m, n, nrhs, cond)
  File "/usr/local/lib/python3.10/dist-packages/scipy/linalg/lapack.py", line 1009, in _compute_lwork
    raise ValueError("Internal work array size computation failed: "
ValueError: Internal work array size computation failed: -7
root@1d253a791c90:/home#

@ilayn
Member

ilayn commented Apr 9, 2024

Yes nice. That code should give you

(array([0.]), array([], dtype=float64), 0, array([0.]))

as a result. So something is not right with your NumPy/SciPy installation, because -7 indicates that the array passed through f2py to the LAPACK function does not have the right shape.
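
For reference, the documented constraint behind INFO = -7 in reference LAPACK's DGELSD is LDB >= max(1, max(M, N)). Below is a small sketch of the dimension checks that precede the computation, written from the documented constraints. The function name is hypothetical, only the dimension arguments are covered, and which LDB value actually reaches dgelsd_ on the RISC-V build is not known from this thread.

```python
def dgelsd_dimension_info(m, n, nrhs, lda, ldb):
    """Mimic the dimension checks of DGELSD and return the INFO code.

    Returns 0 when all dimension arguments are acceptable, otherwise
    minus the 1-based position of the first offending argument.
    """
    if m < 0:
        return -1
    if n < 0:
        return -2
    if nrhs < 0:
        return -3
    if lda < max(1, m):
        return -5
    if ldb < max(1, max(m, n)):
        return -7
    return 0


# Sizes from the report: a 90 x 1 matrix with one right-hand side.
print(dgelsd_dimension_info(90, 1, 1, lda=90, ldb=90))
# An LDB smaller than max(M, N) triggers the reported code.
print(dgelsd_dimension_info(90, 1, 1, lda=90, ldb=1))
```

With the reported sizes, any LDB below 90 would produce -7, so a corrupted or wrongly sized integer argument is one possible explanation worth ruling out.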

@wallace-hu
Author

The versions are:
scipy==1.12.0
numpy==1.26.4
python3==3.10.11

Any suggestions for correcting my installation?

@ilayn
Member

ilayn commented Apr 9, 2024

I don't know anything about RISC-V versions. But, stating the obvious (and admittedly crude) attempt, I would uninstall numpy, scipy, and scikit-learn, and then reinstall numpy, scipy, scikit-learn, and the other packages, in this order.

Sometimes things get mixed up when you try to use locally installed packages with the system-wide Python and/or virtual environments (which it seems you are not using).
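
A quick way to check for that kind of mix-up is to print which copy of each package Python would actually import, and from where. This is a minimal sketch using only the standard library. The helper describe_package is hypothetical, and it assumes the distribution name and the import name are the same, which holds for numpy and scipy but not for scikit-learn.

```python
import importlib.util
from importlib import metadata


def describe_package(name):
    """Return the installed version and import location of a package.

    Both values are None when the package cannot be found.
    """
    try:
        version = metadata.version(name)
    except metadata.PackageNotFoundError:
        version = None
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        spec = None
    location = spec.origin if spec is not None else None
    return {"name": name, "version": version, "location": location}


for pkg in ("numpy", "scipy"):
    print(describe_package(pkg))
```

If the reported locations point to different site-packages or dist-packages trees than expected, that suggests more than one installation is in play.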

@wallace-hu
Author

Is there any way I can force the program to pass the array to the LAPACK function through f2py?

@wallace-hu
Author

I will try uninstalling numpy, scipy, and scikit-learn, and then reinstalling them in order.

@ev-br
Member

ev-br commented Apr 9, 2024

If this persists for a clean reinstall (you don't need scikit-learn, just numpy and scipy), I'd

  1. step through lstsq in pdb to make sure the arrays are what I think they are, save the inputs to the lapack call
  2. make a repro which calls the f2py wrapped function directly with exactly these inputs (EDIT: e.g., using scipy.linalg.lapack)
  3. step it through with $ gdb --args python your_repro.py. This will tell you whether the error is in f2py-generated C wrappers or in LAPACK itself.
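
Step 2 can be sketched as follows, assuming the arguments printed earlier in this thread (90, 1, 1, and double-precision machine epsilon) and the dgelsd_lwork wrapper exposed by scipy.linalg.lapack. The last element of the returned tuple is read as the INFO code, in the same way _compute_lwork does.

```python
import numpy as np
from scipy.linalg import lapack

# Arguments reported for the failing workspace query.
m, n, nrhs = 90, 1, 1
cond = np.finfo(np.float64).eps  # 2.220446049250313e-16

# Call the f2py-wrapped workspace query directly, bypassing lstsq.
ret = lapack.dgelsd_lwork(m, n, nrhs, cond)
info = ret[-1]

print("returned:", ret)
print("info:", info)
```

On a working build info is 0. If this short script alone reproduces -7, then lstsq and its input handling are out of the picture and only the wrapper and the LAPACK library remain.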

It's possible that the problem is indeed within the RISC-V OpenBLAS kernels. If so, switching to a different LAPACK library could help isolate the issue. See this page: https://scipy.github.io/devdocs/dev/contributor/debugging_linalg_issues.html#debugging-linalg-issues
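
Before and after switching libraries, it helps to confirm which BLAS/LAPACK shared object the process actually loaded. Here is a minimal, Linux-only sketch that filters a /proc/<pid>/maps listing. The helper name is hypothetical, and the sample addresses and inode numbers are made up for illustration; only the libopenblas path is taken from this thread.

```python
import re

BLAS_RE = re.compile(r"(openblas|lapack|blas|mkl|blis)", re.IGNORECASE)


def blas_libraries_from_maps(maps_text):
    """Return sorted unique paths of BLAS/LAPACK-like shared objects.

    maps_text is the content of /proc/<pid>/maps, whose lines have the
    form: address perms offset dev inode pathname.
    """
    paths = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) != 6:
            continue  # anonymous mapping without a pathname
        path = fields[5].strip()
        filename = path.rsplit("/", 1)[-1]
        if ".so" in filename and BLAS_RE.search(filename):
            paths.add(path)
    return sorted(paths)


# Illustrative listing; addresses and inode numbers are invented.
sample = (
    "3ff6600000-3ff6e00000 r-xp 00000000 08:01 1234 "
    "/usr/lib/riscv64-linux-gnu/openblas-pthread/libopenblas.so.0\n"
    "3ff7000000-3ff7100000 r-xp 00000000 08:01 5678 "
    "/usr/lib/riscv64-linux-gnu/libc.so.6\n"
)
print(blas_libraries_from_maps(sample))
```

In a real session, import scipy.linalg first and then pass the text read from /proc/self/maps to the function.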

@wallace-hu
Author

With the GDB method, I can't step into routine(*args, **kwargs) in the code below. Is there any simple way I can find out whether the error is in the f2py-generated C wrappers or in LAPACK itself?


dtype = getattr(routine, 'dtype', None)
int_dtype = getattr(routine, 'int_dtype', None)
ret = routine(*args, **kwargs)
if ret[-1] != 0:

@ev-br
Member

ev-br commented Apr 9, 2024

One trick is to add a pair of breakpoints: a Python one plus a C breakpoint:

$ cat lstsq_repro.py 
import numpy as np
from scipy.linalg import lstsq

a = np.eye(2, dtype=float)
b = np.ones(2, dtype=float)

breakpoint()                # <<<<< HERE
res = lstsq(a, b)

print(res)
br@gonzales:~/sweethome/temp/chol$ gdb --args python lstsq_repro.py
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...
(gdb) b dgelsd_                         # <<<<< HERE
Function "dgelsd_" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (dgelsd_) pending.
(gdb) run
Starting program: /home/br/mambaforge/envs/scipy-dev/bin/python lstsq_repro.py
[Thread debugging using libthread_db enabled]
...
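
A non-interactive variant of the same idea can be sketched as a small helper that builds the gdb command line. It assumes gdb is on PATH and that the script does not itself call breakpoint(), since batch mode leaves no terminal for pdb. The function name gdb_command is hypothetical; the gdb options used (-batch, -ex, --args, set breakpoint pending on) are standard.

```python
import shlex
import sys


def gdb_command(script, symbol="dgelsd_", python=sys.executable):
    """Build a gdb invocation that stops at `symbol` and prints a backtrace."""
    return [
        "gdb", "-batch",
        # Accept breakpoints on symbols from libraries that load later.
        "-ex", "set breakpoint pending on",
        "-ex", f"break {symbol}",
        "-ex", "run",
        "-ex", "backtrace",
        "--args", python, script,
    ]


cmd = gdb_command("lstsq_repro.py")
print(shlex.join(cmd))
# To execute it: import subprocess; subprocess.run(cmd, check=False)
```

The printed backtrace shows which frame called dgelsd_, which is enough to tell whether the f2py wrapper or another caller issued the failing call.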

@wallace-hu
Author

wallace-hu commented Apr 9, 2024

Thanks for the help. Now I can stop at the dgelsd_ entry in the .so. Does the message below give any clue?
root@1d253a791c90:/home# gdb --args python3 aa.py
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "riscv64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
https://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python3...
(No debugging symbols found in python3)
(gdb) b dgelsd_
Function "dgelsd_" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (dgelsd_) pending.
(gdb) run aa.py
Starting program: /usr/bin/python3 aa.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/riscv64-linux-gnu/libthread_db.so.1".
[New Thread 0x3ff62e7100 (LWP 89844)]
[New Thread 0x3ff5ae6100 (LWP 89845)]
[New Thread 0x3ff12e5100 (LWP 89846)]
[Detaching after vfork from child process 89847]
name: openblas
found: True
version: 0.3.20
detection method: pkgconfig
include directory: /usr/include/riscv64-linux-gnu/openblas-pthread/
lib directory: /usr/lib/riscv64-linux-gnu/openblas-pthread/
openblas configuration: USE_64BITINT= DYNAMIC_ARCH= DYNAMIC_OLDER= NO_CBLAS= NO_LAPACK= NO_LAPACKE=1 NO_AFFINITY=1 USE_OPENMP=0 RISCV64_GENERIC MAX_THREADS=64
pc file directory: /usr/lib/riscv64-linux-gnu/pkgconfig

> /usr/local/lib/python3.10/dist-packages/scipy/linalg/lapack.py(1004)_compute_lwork()
-> dtype = getattr(routine, 'dtype', None)
(Pdb) n
> /usr/local/lib/python3.10/dist-packages/scipy/linalg/lapack.py(1005)_compute_lwork()
-> int_dtype = getattr(routine, 'int_dtype', None)
(Pdb) n
> /usr/local/lib/python3.10/dist-packages/scipy/linalg/lapack.py(1006)_compute_lwork()
-> ret = routine(*args, **kwargs)
(Pdb) n

Thread 1 "python3" hit Breakpoint 1, 0x0000003ff669e990 in dgelsd_ () from /usr/lib/riscv64-linux-gnu/openblas-pthread/libopenblas.so.0
(gdb) n
Single stepping until exit from function dgelsd_,
which has no line number information.
** On entry to DGELSD parameter number 7 had an illegal value
0x0000003fee8b858c in f2py_rout__flapack_dgelsd_lwork () from /usr/local/lib/python3.10/dist-packages/scipy/linalg/_flapack.cpython-310-riscv64-linux-gnu.so
(gdb)

@wallace-hu
Author

My debug script is:
root@1d253a791c90:/home# cat aa.py
import scipy
import numpy as np
import scipy.linalg as la

blas_dep = scipy.show_config(mode='dicts')['Build Dependencies']['blas']
for key in blas_dep:
    print(f"{key}: {blas_dep[key]}")

X = np.zeros([90, 1], dtype=np.float64)
y = np.array([
-1220.63333333, -1168.63333333, -904.63333333, -618.63333333, -14.63333333, 1331.36666667, 2438.36666667,
4453.36666667, 3460.36666667, 1087.36666667, -966.63333333, -1391.63333333, -1305.63333333, -1210.63333333,
-1080.63333333, 795.36666667, 1195.36666667, 1919.36666667, 334.36666667, -1080.63333333, -1338.63333333,
-1444.63333333, -1421.63333333, -1276.63333333, -943.63333333, -456.63333333, 639.36666667, 1046.36666667,
-532.63333333, -1128.63333333, -1112.63333333, -1264.63333333, -1129.63333333, -758.63333333, 148.36666667,
1235.36666667, 1381.36666667, 629.36666667, -805.63333333, -1190.63333333, -1253.63333333, -1244.63333333,
-937.63333333, 133.36666667, 1821.36666667, 5231.36666667, 2764.36666667, -802.63333333, -1234.63333333,
-1016.63333333, -1131.63333333, -705.63333333, 104.36666667, 186.36666667, 761.36666667, -63.63333333,
-733.63333333, -1190.63333333, -1288.63333333, -1260.63333333, -1020.63333333, -753.63333333, 552.36666667,
1321.36666667, 2941.36666667, 1021.36666667, -1100.63333333, -1416.63333333, -1450.63333333, -1440.63333333,
-1430.63333333, -1301.63333333, -1112.63333333, -197.63333333, 2541.36666667, 2005.36666667, -902.63333333,
-1384.63333333, -1336.63333333, -1102.63333333, -731.63333333, -182.63333333, 1975.36666667, 5501.36666667,
4823.36666667, 2304.36666667, 346.36666667, -1144.63333333, -1107.63333333, -681.63333333])

la.lstsq(X, y)

@ev-br
Member

ev-br commented Apr 9, 2024

Step through the Python code of lstsq and find which Python line fails. From it, find the corresponding C symbol and add a gdb breakpoint on it. It might be that the failure is at the lwork call, so the C symbol may be dgelsd_lwork or dgelsd_lwork_ (use nm on the .so file to find out).

@wallace-hu
Author

wallace-hu commented Apr 9, 2024

Per my debugging, it gets the dgelsd_lwork function attribute from _flapack.cpython-310-riscv64-linux-gnu.so, then falls into dgelsd_lwork and reports the error "** On entry to DGELSD parameter number 7 had an illegal value". I don't know what happens inside dgelsd_lwork.

If I set a breakpoint with "b dgelsd_lwork", it does not trigger. If I set a breakpoint with "b f2py_rout__flapack_dgelsd_lwork", it does trigger.

Below is the .so info about dgelsd_lwork:
root@1d253a791c90:/home# nm /usr/local/lib/python3.10/dist-packages/scipy/linalg/_flapack.cpython-310-riscv64-linux-gnu.so |grep dgelsd_lwork
00000000001572a8 d doc_f2py_rout__flapack_dgelsd_lwork
0000000000072400 t f2py_rout__flapack_dgelsd_lwork

So can it be a problem in the f2py-generated C wrappers? Where can I get the source code of _flapack.cpython-310-riscv64-linux-gnu.so? Thank you so much @ev-br
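
The nm listing above can be read mechanically. In nm output a lowercase type letter marks a local symbol and an uppercase one a global symbol; "t" is code in the text section and "d" is initialized data. Below is a minimal sketch that parses such lines. The helper parse_nm is hypothetical, and the sample text is the two lines shown above.

```python
def parse_nm(text, needle):
    """Parse `nm` output and return (name, type, address) for matching symbols.

    Defined symbols have three columns (address, type, name); undefined
    ones have only type and name, so their address is reported as None.
    """
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            address, kind, name = int(parts[0], 16), parts[1], parts[2]
        elif len(parts) == 2:
            address, kind, name = None, parts[0], parts[1]
        else:
            continue
        if needle in name:
            symbols.append((name, kind, address))
    return symbols


nm_output = """\
00000000001572a8 d doc_f2py_rout__flapack_dgelsd_lwork
0000000000072400 t f2py_rout__flapack_dgelsd_lwork
"""

for name, kind, address in parse_nm(nm_output, "dgelsd_lwork"):
    scope = "local" if kind.islower() else "global"
    print(f"{name}: type {kind!r} ({scope}), address {address:#x}")
```

The listing contains no symbol named dgelsd_lwork, only the wrapper f2py_rout__flapack_dgelsd_lwork and its docstring, which is consistent with "b dgelsd_lwork" never triggering. The gdb session earlier in the thread shows that wrapper reaching dgelsd_ in libopenblas.so.0, so those two symbols are the useful breakpoints.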

@wallace-hu
Author

No response, so I am closing the issue. Thanks to @ev-br @ilayn

@wallace-hu
Author

Close

@lucascolley
Member

Probably best to leave this open if the problem is still unresolved. Maintainers are generally busy and don't always have the time to reply quickly to every issue :)

@lucascolley lucascolley reopened this Apr 11, 2024
@ev-br
Member

ev-br commented Apr 11, 2024

The point is, RISC-V is currently supported on a best-effort basis. Meaning, we are happy to accept patches which do not break our supported set of platforms (e.g., do not break our CI). We are also happy to offer advice and otherwise discuss problems which arise for "non-standard" platforms like RISC-V. But at the end of the day somebody needs to provide a patch, to either SciPy or OpenBLAS, depending on where the problem is. ATM it's not even clear this is a SciPy issue.
