pyximport.install() before import api crash #1195

Closed
Carreau opened this Issue Nov 19, 2013 · 16 comments


@Carreau
Carreau commented Nov 19, 2013
In [1]: import pyximport
In [2]: pyximport.install()
Out[2]: (None, <pyximport.pyximport.PyxImporter at 0x10e3bdfd0>)

In [3]: import statsmodels.api as sm
/Users/matthiasbussonnier/.pyxbld/temp.macosx-10.7-x86_64-2.7/pyrex/statsmodels/tsa/kalmanf/kalman_loglike.c:321:10: fatal error: 'numpy/arrayobject.h' file not found
#include "numpy/arrayobject.h"
         ^
1 error generated.
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
...

Full trace here. I can work around it by calling pyximport.install() after the import, but I just want to record the bug. I should be mostly on recent stable releases but don't have time to try master now. Thanks.

@josef-pkt
Member

Can you provide a bit of background, I haven't tried pyximport in years.

Why is pyximport trying to build the extension again?
Was statsmodels correctly built with extensions before?

There won't be anything different with statsmodels master since we haven't changed anything in building extensions recently.

If pyximport wants to install any .pyx files that it finds, then we would have to remove those files. In that case I think this will be a "wont-fix".

I have no idea if pyximport could be made to work with an installed statsmodels.

For comparison: What happens if you do pyximport.install() before importing pandas?

@josef-pkt
Member

Thanks for the report.
Even if we cannot do anything about it, it is good to know what potential problems there are.

@jseabold
Member

The numpy_include is done in our custom build_ext, though I don't know why this should matter.

>>> import pyximport
>>> import numpy
>>> pyximport.install(setup_args={"include_dirs":numpy.get_include()})
>>> import statsmodels.api as sm
@jseabold
Member

It's unclear to me whether this is a local issue or our problem.

https://groups.google.com/forum/#!topic/cython-users/_PyzG5F4LoA

@josef-pkt
Member

The last time I was playing with pyximport I had to patch it to be able to use it on Windows with numpy extensions.
I didn't pay any attention to whether anything changed in Cython in the last few years with this.

I don't see any advantage of using pyximport for statsmodels, since we have all the cythonizing and build set up with setup.py

@jseabold
Member

You mean, you had to define distutils.cfg or actually patch it?

@josef-pkt
Member

I didn't remember correctly.

pyximport was fixed in 2010 to have a way to use user specified config options (even if it doesn't pick up numpy automatically)

The last time I had problems, I needed to patch the source code of cython.inline. I just tried to track down my message to cython-dev, but it seems I had sent it to a defunct cython-dev@codespeak
The issue I had looks like https://github.com/cython/cython/pull/129/files (based on a quick check)
pyinline didn't check the config files.

@Carreau
Carreau commented Nov 19, 2013

For comparison: What happens if you do pyximport.install() before importing pandas?

Nothing, the import seems to work correctly.

Can you provide a bit of background, I haven't tried pyximport in years.

Long story short, I got that by trying seaborn in a long notebook.
I prototype using IPython's %%cython magic and move things to a .pyx file once things settle.
Hence this particular notebook had a pyximport.install() before import seaborn (which imports scikit-learn, which pulls in statsmodels, which crashes, IIRC).

This is basically my extent of (relevant) knowledge of cython and pyximport.

So I'm not an expert on cython or pyximport. I just came across this, made a minimal non-working example and reported it. I can try to move this to cython, but I guess they will say to check with the library authors first.

I'll try to do that on another machine and report. Also FYI, the current one is on OSX 10.7.5, probably with stable releases of everything involved here.

@jseabold
Member

To be clear, I'm able to reproduce, but if you add the numpy include in the pyximport.install, then everything is fine. It's unclear to me why this is the case for us and not, say, pandas or sklearn though.

@josef-pkt
Member

Backing up a bit:

The installed directory of statsmodels doesn't have any pyx nor c files (spot checking several of my python versions)
So if statsmodels is imported from the installed version, then pyximport has nothing to do.

So in this case pyximport needs to have access to the statsmodels source. I have no idea how that can be the case except adding the source directory directly to the python path.
Since we use largely the same setup.py as pandas, we shouldn't have any different problems

@Carreau Do you have the statsmodels source directory on the python path, and not the one of pandas?

@jseabold Did you try with the source directories of pandas and sklearn on the python path?
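
One way to answer the "which directory is it imported from" question without importing the package (and so without triggering any pyximport hook) is to ask the import system for the package's spec. This is only a minimal sketch; the helper name package_dirs is made up for illustration and is not part of statsmodels, pandas or pyximport.

```python
import importlib.util


def package_dirs(name):
    """Return the directories a top-level package would be imported from.

    find_spec() locates the package without executing it. An empty list
    means the name was not found or does not refer to a package.
    (package_dirs is a hypothetical helper, named here for illustration.)
    """
    spec = importlib.util.find_spec(name)
    if spec is None or spec.submodule_search_locations is None:
        return []
    return list(spec.submodule_search_locations)
```

Calling package_dirs("statsmodels") and package_dirs("pandas") shows whether each one resolves to site-packages or to a source checkout that happens to be on the python path.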

@Carreau
Carreau commented Nov 19, 2013

Do you have the statsmodels source directory on the python path, and not the one of pandas?

Nope, I just pip installed it, and:

$ cd /usr/local/lib/python2.7/site-packages/statsmodels
$ ls -1 -R **/*.c **/*.pyx
nonparametric/_smoothers_lowess.c
nonparametric/_smoothers_lowess.pyx
nonparametric/linbin.c
nonparametric/linbin.pyx
src/bspline_ext.c
src/bspline_impl.c
tsa/kalmanf/kalman_loglike.c
tsa/kalmanf/kalman_loglike.pyx
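
The same check can be scripted so it behaves the same on any platform or shell. The following is a minimal sketch using only the standard library; find_cython_sources is a made-up name, and the directory passed in is whatever install location you want to inspect.

```python
from pathlib import Path


def find_cython_sources(package_dir, suffixes=(".pyx", ".c")):
    """Return sorted relative paths of files under package_dir that end
    in one of the given suffixes.

    (Hypothetical helper: it only lists files, it does not tell whether
    pyximport will actually try to build them.)
    """
    root = Path(package_dir)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix in suffixes
    )
```

An empty result for an installed package means pyximport has no .pyx file there to pick up.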
@Carreau
Carreau commented Nov 19, 2013

The pandas dir has no .c or .pyx files. I just reinstalled statsmodels and still have the .pyx and .c files (I checked that they were gone after removal).

@Carreau
Carreau commented Nov 19, 2013

Reinstalled pandas too, and its dir is also clean of .c and .pyx files ... I looked at the setup.py of the two projects and can't find the difference...

@jseabold
Member

I don't have any source directories on my path.

@josef-pkt
Member

I think the problem might be in MANIFEST.in: statsmodels has the .c and .pyx patterns in there, pandas doesn't.

I have now also found .pyx and .c files in one of my virtualenvs (python 2.6), but I don't have them in my python 2.7 and python 3.3 virtualenvs.
I have no idea what would be different, but I created and updated the virtualenvs at different times.

What you could try is to remove *.pyx *.c from the first line of MANIFEST.in and try the install from an unzipped statsmodels source.
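
For anyone who wants to try that edit mechanically, here is a rough sketch. The real contents of the statsmodels MANIFEST.in are not shown in this thread, so the function is written against the general MANIFEST.in syntax only; the name drop_patterns is made up, and it only touches include and global-include lines, where every argument is a pattern.

```python
# Commands whose arguments are all glob patterns (no leading directory).
PATTERN_ONLY_COMMANDS = {"include", "global-include"}


def drop_patterns(manifest_text, patterns=("*.pyx", "*.c")):
    """Remove the given glob patterns from include/global-include lines.

    Lines using other commands (recursive-include, graft, ...) and
    comments are returned unchanged. A command left with no patterns
    is dropped. (Hypothetical helper, for illustration only.)
    """
    out = []
    for line in manifest_text.splitlines():
        tokens = line.split()
        if tokens and tokens[0] in PATTERN_ONLY_COMMANDS:
            kept = [t for t in tokens[1:] if t not in patterns]
            if not kept:
                continue  # nothing left to include, so drop the command
            line = " ".join([tokens[0]] + kept)
        out.append(line)
    return "\n".join(out)
```

Run it on a copy of MANIFEST.in first and compare the result by eye before reinstalling from the unzipped source.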

There are so many ways to install statsmodels that it's often difficult to figure out what makes the difference. And we don't have tests for OSX on a regular basis.

@jseabold
Member

Yep, not installing *.c and *.pyx does the trick.

@jseabold jseabold closed this in #1200 Nov 23, 2013
@jseabold jseabold added a commit that referenced this issue Nov 23, 2013
@jseabold jseabold Backport PR #1200: BLD: do not install *.pyx *.c MANIFEST.in
remove *.pyx *.c  MANIFEST.in

this should fix and close #1195  cython pyximport tries to recompile the statsmodels extensions
8e07d34