Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix contruction of np.array from custom types #240

Merged
merged 30 commits into from
Jul 30, 2021

Conversation

jcarpent
Copy link
Contributor

No description provided.

@jcarpent
Copy link
Contributor Author

@nim65s Could you check the reasons of the failure on gitlab? As it is passing on Ubuntu/conda, this should work too on the LAAS build farm.

@nim65s
Copy link
Contributor

nim65s commented Jul 17, 2021

Hi @jcarpent,

I think this is a numpy version issue.

For example, in 20.04, this will fail with the same error message as in gitlab CI:

FROM ubuntu:20.04                                                                                                      

ENV DEBIAN_FRONTEND=noninteractive CTEST_OUTPUT_ON_FAILURE=true

RUN apt-get update -y \
 && apt-get install -y \
    build-essential \
    cmake \
    git \
    libboost-all-dev \
    libeigen3-dev \
    python-is-python3 \
    python3-numpy \
    python3-pip \
 && rm -rf /var/lib/apt/lists/*

#RUN python -m pip install --upgrade numpy
RUN git clone --recursive -j4 -b devel https://github.com/jcarpent/eigenpy.git
WORKDIR eigenpy/build
RUN cmake ..
RUN make -sj16
RUN make test

Uncommenting numpy upgrade from 1.17.4 (ubuntu) to 1.21.0 (PyPI) make the tests pass as in conda CI.

@proyan
Copy link
Member

proyan commented Jul 20, 2021

@nim65s I tried this branch on my local installation, with Ubuntu20.04, and numpy version 1.21.1, but couldn't reproduce this error. Perhaps this might be a numpy bug fixed between 1.21.0 and 1.21.1 🤞🏽

@jcarpent
Copy link
Contributor Author

as mentioned by @nim65s, the issue is coming for numpy < 1.21

@proyan
Copy link
Member

proyan commented Jul 20, 2021

For Ubuntu20.04, numpy installation by apt (1.17.4) to and PyPI (1.21.1) both seem to be supported (if the bug really was in 1.21.0). If so, do we still need to provide a patch for 1.17.4< numpy version <1.21.1? I guess we could simply ask the user to update the installed numpy version

@jcarpent
Copy link
Contributor Author

We should not ask people to update their NumPy version to get the latest one.
It would be more robust to support any recent versions coming natively with Ubuntu for instance.
So, we should provide a patch.

@proyan
Copy link
Member

proyan commented Jul 23, 2021

The CI failure for

mat[:] = ...

The issue appears in earlier version of numpy (1.17.x), and while later versions have fixed this assignment by access, the earlier version still has this bug.

In the numpy version 1.17.4, which corresponds to the installation on Ubuntu18.04, the same memory manipulation that was discussed in numpy/numpy#19553 is carried out here:

https://github.com/numpy/numpy/blob/v1.17.4/numpy/core/src/multiarray/arrayobject.c#L296

and then the badly formed source pointer is given to copyswapn for assignment. The corresponding function PyArray_CopyObject has been updated in the future releases, and uses PyArray_DATA for data pointer access.

I don't see any way to resolve this issue.

@proyan
Copy link
Member

proyan commented Jul 29, 2021

I think we can remove the array assignment from the unit-test, merge this PR, and create an issue with label won't-fix to keep track of the numpy bug.

@jcarpent
Copy link
Contributor Author

I'm almost done with some updates to allow casting between types. Is it fine for you?

@proyan
Copy link
Member

proyan commented Jul 30, 2021

For the simple test of :

import user_type                                                                                                                                                                                               
v = user_type.CustomDouble(1)                                                                                                                                                                                  
print("address of v:",hex(id(v)))                                                                                                                                                                              

the following is the backtrace:

address of v: 0x7fffdc229170

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x000000000053acc5 in gc_list_remove (node=0x7fffdc229160, node=0x7fffdc229160) at ../Modules/gcmodule.c:2056
2056	../Modules/gcmodule.c: No such file or directory.
(gdb) bt
#0  0x000000000053acc5 in gc_list_remove (node=0x7fffdc229160, node=0x7fffdc229160) at ../Modules/gcmodule.c:2056
#1  PyObject_GC_Del (op=0x7fffdc229170) at ../Modules/gcmodule.c:2056
#2  0x00000000004d3bc4 in subtype_dealloc (self=<CustomDouble at remote 0x7fffdc229170>) at ../Objects/typeobject.c:1192

It looks like PyGC_Head is undefined/ill-defined for the newly created objects. Perhaps it is unable to understand the size of the new type. As a result, when the gc tries to search for the memory address here, it simply decreases the pointer reference (here) and that is invalid.

This is with Python 3.8, numpy 1.21.1. It would change with different versions.

@jcarpent
Copy link
Contributor Author

jcarpent commented Jul 30, 2021

Thanks for the report.
It seems that we need to register again the type, restore with 65ed89c.
Could you check whether 65ed89c solves the issue on your side?

@jcarpent
Copy link
Contributor Author

Gitlab seems to be happy (fantastic).

@jcarpent jcarpent merged commit e4e506a into stack-of-tasks:devel Jul 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants