Packages cannot have both pandas and statsmodels in install_requires #1267

Closed
mwaskom opened this Issue Dec 23, 2013 · 22 comments

5 participants

@mwaskom
mwaskom commented Dec 23, 2013

Hi,

I am testing a fresh install of seaborn, which includes both pandas and statsmodels in its install_requires.

I ran into some trouble trying to install seaborn into a virtual environment with only numpy, scipy and matplotlib. The problem is that pip goes to install seaborn's dependencies, and even though it installs pandas and then statsmodels, the statsmodels installation crashes when performing some kind of check for pandas, killing the whole operation.

Sorry, I know Python packaging is a huge PITA, but this is pretty annoying.

Here are the four commands to reproduce. The output, on my machine, is copied below.

conda create -n testenv pip
source activate testenv
conda install numpy scipy matplotlib
pip install seaborn

$ conda create -n testenv pip

Package plan for creating environment at /Users/mwaskom/anaconda/envs/testenv:

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    pip-1.4.1                  |           py27_1         644 KB
    setuptools-1.4             |           py27_0         490 KB

The following packages will be linked:

    package                    |            build
    ---------------------------|-----------------
    pip-1.4.1                  |           py27_1   hard-link
    python-2.7.6               |                0   hard-link
    readline-6.2               |                1   hard-link
    setuptools-1.4             |           py27_0   hard-link
    sqlite-3.7.13              |                1   hard-link
    tk-8.5.13                  |                1   hard-link
    zlib-1.2.7                 |                1   hard-link

Proceed ([y]/n)? y

Fetching packages ...
pip-1.4.1-py27_1.tar.bz2 100% |######################| Time: 0:00:09  68.79 kB/s
setuptools-1.4-py27_0.tar.bz2 100% |#################| Time: 0:00:04 120.80 kB/s
Extracting packages ...
[      COMPLETE      ] |##################################################| 100%
Linking packages ...
[      COMPLETE      ] |##################################################| 100%
#
# To activate this environment, use:
# $ source activate testenv
#
# To deactivate this environment, use:
# $ source deactivate
#


$ source activate testenv
prepending /Users/mwaskom/anaconda/envs/testenv/bin to PATH


(testenv)$ conda install numpy scipy matplotlib

Package plan for installation in environment /Users/mwaskom/anaconda/envs/testenv:

The following packages will be linked:

    package                    |            build
    ---------------------------|-----------------
    dateutil-2.1               |           py27_2   hard-link
    freetype-2.4.10            |                1   hard-link
    libpng-1.5.13              |                1   hard-link
    matplotlib-1.3.1           |       np17py27_0   hard-link
    numpy-1.7.1                |           py27_2   hard-link
    pyparsing-1.5.6            |           py27_0   hard-link
    pytz-2013b                 |           py27_0   hard-link
    scipy-0.13.2               |       np17py27_1   hard-link
    six-1.4.1                  |           py27_0   hard-link

Proceed ([y]/n)? y

Linking packages ...
[      COMPLETE      ] |##################################################| 100%


(testenv)$ pip install seaborn
Downloading/unpacking seaborn
  Downloading seaborn-0.2.0.tar.gz
  Running setup.py egg_info for package seaborn

Downloading/unpacking husl (from seaborn)
  Downloading husl-2.1.0.tar.gz
  Running setup.py egg_info for package husl

Downloading/unpacking moss>0.1 (from seaborn)
  Downloading moss-0.2.0.tar.gz
  Running setup.py egg_info for package moss
    /Users/mwaskom/anaconda/envs/testenv/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'URL'
      warnings.warn(msg)

Downloading/unpacking patsy (from seaborn)
  Downloading patsy-0.2.1.tar.gz (316kB): 316kB downloaded
  Running setup.py egg_info for package patsy

    no previously-included directories found matching 'doc/_build'
Downloading/unpacking pandas (from seaborn)
  Downloading pandas-0.12.0.tar.gz (3.2MB): 3.2MB downloaded
  Running setup.py egg_info for package pandas

    warning: no files found matching 'TODO.rst'
    warning: no files found matching 'setupegg.py'
    no previously-included directories found matching 'doc/build'
    warning: no previously-included files matching '*.so' found anywhere in distribution
    warning: no previously-included files matching '*.pyd' found anywhere in distribution
    warning: no previously-included files matching '*.pyc' found anywhere in distribution
    warning: no previously-included files matching '.git*' found anywhere in distribution
    warning: no previously-included files matching '.DS_Store' found anywhere in distribution
    warning: no previously-included files matching '*.png' found anywhere in distribution
Downloading/unpacking statsmodels (from seaborn)
  Downloading statsmodels-0.5.0.tar.gz (5.5MB): 5.5MB downloaded
  Running setup.py egg_info for package statsmodels
    Traceback (most recent call last):
      File "<string>", line 16, in <module>
      File "/private/var/folders/rr/svlz5yhs0tsdqnsp8g4qg65m0000gn/T/pip_build_mwaskom/statsmodels/setup.py", line 463, in <module>
        check_dependency_versions(min_versions)
      File "/private/var/folders/rr/svlz5yhs0tsdqnsp8g4qg65m0000gn/T/pip_build_mwaskom/statsmodels/setup.py", line 118, in check_dependency_versions
        raise ImportError("statsmodels requires pandas")
    ImportError: statsmodels requires pandas
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
  File "<string>", line 16, in <module>
  File "/private/var/folders/rr/svlz5yhs0tsdqnsp8g4qg65m0000gn/T/pip_build_mwaskom/statsmodels/setup.py", line 463, in <module>
    check_dependency_versions(min_versions)
  File "/private/var/folders/rr/svlz5yhs0tsdqnsp8g4qg65m0000gn/T/pip_build_mwaskom/statsmodels/setup.py", line 118, in check_dependency_versions
    raise ImportError("statsmodels requires pandas")
ImportError: statsmodels requires pandas

----------------------------------------
Cleaning up...
Command python setup.py egg_info failed with error code 1 in /private/var/folders/rr/svlz5yhs0tsdqnsp8g4qg65m0000gn/T/pip_build_mwaskom/statsmodels
Storing complete log in /Users/mwaskom/.pip/pip.log

@josef-pkt
Member

Essentially, you need to install pandas before statsmodels, that is, before statsmodels' setup.py is run.

You could run pip install pandas patsy before pip install seaborn, or before pip install statsmodels.
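A minimal sketch of that two-step workaround, driven from Python and assuming pip is available as `python -m pip` in the target environment. The helper names `pip_install_commands` and `two_stage_install` are hypothetical; the point is only that the first pip run completes (pandas and patsy are actually installed) before the second run executes any setup.py that imports them.

```python
import subprocess
import sys


def pip_install_commands(first, then):
    """Build two pip command lines (hypothetical helper).

    `first` holds packages that other packages' setup.py scripts import
    (e.g. pandas and patsy for statsmodels); `then` holds everything else.
    """
    pip = [sys.executable, "-m", "pip", "install"]
    return [pip + list(first), pip + list(then)]


def two_stage_install(first, then):
    """Run each pip command to completion before starting the next."""
    for cmd in pip_install_commands(first, then):
        # check_call raises CalledProcessError if pip exits non-zero,
        # so the second stage never runs after a failed first stage.
        subprocess.check_call(cmd)


# Example (this actually installs packages, so it is not run here):
# two_stage_install(["pandas", "patsy"], ["seaborn"])
```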

statsmodels doesn't declare proper requirements because I hate pip's default behavior of reinstalling/upgrading all dependencies, which has several times made a big mess of my carefully installed numpy and scipy; those are not easy to reinstall in a virtualenv on Windows.
Maybe this will change with newer versions of pip and the availability of wheels.

@rgommers
Member

You don't have to add requirements to statsmodels setup.py to make this work, something like https://github.com/scipy/scipy/blob/master/setup.py#L205 should do it. The essential bit is to not try to import anything for the egg_info command.
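A sketch of the idea, assuming setup.py can decide from its command-line arguments whether it is only being asked for metadata. It is modeled loosely on the kind of check in the linked scipy setup.py but does not reproduce it; the command list and the names `is_metadata_only`, `require_importable` and `maybe_check_dependencies` are illustrative assumptions, not statsmodels code.

```python
# Commands that only need package metadata. This list is an assumption,
# modeled loosely on the kind of check in the linked scipy setup.py.
METADATA_COMMANDS = ("egg_info", "--help-commands", "--version", "clean")


def is_metadata_only(argv):
    """Return True if this setup.py invocation only needs metadata."""
    args = argv[1:]
    if not args:
        return False
    return "--help" in args or args[0] in METADATA_COMMANDS


def require_importable(names):
    """Hypothetical stand-in for a setup-time dependency check."""
    for name in names:
        try:
            __import__(name)
        except ImportError:
            raise ImportError("statsmodels requires %s" % name)


def maybe_check_dependencies(argv, names):
    """Skip import-based checks for egg_info and similar commands.

    Returns True if the check ran (and passed), False if it was skipped.
    """
    if is_metadata_only(argv):
        return False
    require_importable(names)
    return True


# In a setup.py this would run before setup(...), e.g.:
# maybe_check_dependencies(sys.argv, ["numpy", "scipy", "pandas", "patsy"])
```

With this guard, pip's egg_info pass never imports pandas, so metadata collection succeeds even when pandas has not been installed yet.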

@rgommers
Member

And I fully agree with the upgrade sentiment, of course :(

@josef-pkt
Member

Thanks Ralf, we can try that. If pip only runs egg_info (or something similar) at that stage, then we could skip the dependency version checks for that command.

However, I suspect that pip might still try to install statsmodels out of sequence, because it doesn't know the dependencies and might not have the information to install pandas and patsy before statsmodels.

We could switch to warnings for dependencies instead of errors, but those get lost in the usual install noise.
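A sketch of the warn-instead-of-raise alternative. The helper `check_dependencies` and its `strict` flag are hypothetical; with `strict=False` a missing module produces a `UserWarning` rather than aborting setup.py, which is exactly the kind of message that is easy to miss in pip's output.

```python
import importlib
import warnings


def check_dependencies(names, strict=False):
    """Report dependencies that cannot be imported (hypothetical helper).

    strict=True raises ImportError; strict=False only emits a warning.
    Returns the list of missing module names.
    """
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    if missing:
        msg = "statsmodels requires %s" % ", ".join(missing)
        if strict:
            raise ImportError(msg)
        warnings.warn(msg, UserWarning)
    return missing
```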

@mwaskom I think it will be useful for statsmodels to use seaborn as an optional dependency. @phobson said he will look into it.
This will not create problems on install because we don't check for optional dependencies and we wouldn't get into circular dependency problems.
However, I expect that eventually we will need to watch out for circular imports across packages.
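A sketch of how an optional dependency like seaborn could be used without any install-time check, assuming it is imported lazily inside the function that needs it. `optional_import` and `plotting_backend` are hypothetical names; deferring the import to call time also helps avoid import cycles if seaborn itself imports statsmodels.

```python
import importlib


def optional_import(name):
    """Return the module if it can be imported, else None."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def plotting_backend(importer=optional_import):
    """Pick a plotting backend at call time (hypothetical example).

    Importing inside the function, rather than at module import time,
    keeps seaborn out of setup.py entirely.
    """
    sns = importer("seaborn")
    return "seaborn" if sns is not None else "matplotlib"
```

The `importer` parameter exists only so the fallback can be exercised without seaborn installed.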

@mwaskom
mwaskom commented Dec 23, 2013

Yeah, I think the problem is just that the installation procedure tries to import pandas (and patsy, as I later learned), and moreover that no install is "finished" from pip's perspective until the whole command completes. So even though pip was installing pandas first, statsmodels couldn't see it.
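The failure mode can be illustrated with a toy model. It assumes, as the log above suggests, that pip 1.4 runs setup.py egg_info for every requirement before installing any of them; `simulate_old_pip` and its inputs are hypothetical and do not reproduce pip's actual code.

```python
def simulate_old_pip(requirements, installed, setup_imports):
    """Toy model of old pip: run egg_info for every requirement, then install.

    requirements: ordered list of package names to install.
    installed: set of packages already present in the environment.
    setup_imports: dict mapping a package to the packages its setup.py
        imports during egg_info (hypothetical data for illustration).
    Returns (ok, message).
    """
    available = set(installed)
    # Phase 1: collect metadata for everything; nothing new is installed yet.
    for pkg in requirements:
        for dep in setup_imports.get(pkg, ()):
            if dep not in available:
                return False, "%s egg_info failed: requires %s" % (pkg, dep)
    # Phase 2: install in order (only reached if every egg_info succeeded).
    for pkg in requirements:
        available.add(pkg)
    return True, "installed " + ", ".join(requirements)
```

With `{"statsmodels": ["pandas", "patsy"]}` as the setup-time imports, the model fails at statsmodels' egg_info step, as in the log; with an empty mapping (no imports during egg_info) both phases complete.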

@rgommers
Member

It could be you have two problems here, but for a requirements file with numpy + scipy we had the exact same issue, and the change to scipy that I linked to fixed it.

@thatneat

My team is having this same issue with using statsmodels in an application where our dependency tree is managed by pip. We'd greatly appreciate it if you could take another look at this issue and see if you can allow pip's dependency management to do its thing. Otherwise we won't be able to use statsmodels without an ugly hack or two.

@jseabold
Member

Patches that try to implement the above-mentioned fix would be welcome.

@jseabold
Member

Can you try #1381?

@jseabold
Member

Is it possible to replicate this without Anaconda? If so, can someone post a minimal working example (MWE)?

@mwaskom
mwaskom commented Feb 12, 2014

I can put together an example using Miniconda, based on my Travis builds, if that works. It should run in just a minute or two.

@jseabold
Member

Sure that would be helpful. Is it trivial to point the Travis build at that PR, or would it help if I merged it to see if it works?

@mwaskom
mwaskom commented Feb 12, 2014

This gets most of the way, although it now crashes when trying to build statsmodels from the git egg. I'll try to make that work, but you might have more success.

#! /bin/bash

if [ -f requirements.txt ]; then
    rm requirements.txt
fi 
echo pandas >> requirements.txt
echo patsy >> requirements.txt

# EDIT THIS TO TEST
# +++++++++++++++++
#echo statsmodels >> requirements.txt
echo git+git://github.com/jseabold/statsmodels.git@fix-1267#egg=statsmodels >> requirements.txt

wget http://repo.continuum.io/miniconda/Miniconda-2.2.2-Linux-x86_64.sh -O miniconda.sh
chmod +x miniconda.sh
./miniconda.sh -b
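# Use $HOME: "~" is not tilde-expanded inside double quotes, so child processes would see a literal ~
export PATH="$HOME/anaconda/bin:$PATH"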

if [ -d  ~/anaconda/envs/test_statsmodels_install ]; then
    rm -r ~/anaconda/envs/test_statsmodels_install
fi
conda create -n test_statsmodels_install --yes pip
source activate test_statsmodels_install
conda install --yes numpy==1.8.0 scipy matplotlib cython
pip install -r requirements.txt && pip install seaborn

@jseabold
Member

Traceback?

@josef-pkt
Member

AFAIU:
That will not work. You cannot merge the requirements of patsy and pandas with those of statsmodels.

I think pip install -r requirements.txt tries to install everything at the same time, but pip doesn't know it should install statsmodels last. If you split those into two pip installs, it should be fine.

@mwaskom
mwaskom commented Feb 12, 2014

> Traceback?

Oops, of course: http://pastebin.com/xtKD7qZD

@mwaskom
mwaskom commented Feb 12, 2014

> AFAIU:
> That will not work. You cannot merge the requirements of patsy and pandas with those of statsmodels.

Isn't that the problem we're trying to solve here? My use case is having statsmodels (and pandas/patsy) in the install_requires field of seaborn's setup.py, which I can't split into two parts.

@jseabold
Member

This works fine for me locally AFAICT. Do you have Cython installed on that machine? Cython is also required to build from the GitHub source.

@jseabold
Member

I split the last line into two commands, though it shouldn't matter because of the &&.

<snip>
Successfully installed pandas statsmodels
Cleaning up...
<snip>
Successfully installed seaborn husl moss scikit-learn
Cleaning up...

@mwaskom
mwaskom commented Feb 12, 2014

Oh, nice catch. I added cython to the install script (edited above) and it now works for me too.

Looks like the fix works, thanks!

@jseabold
Member

Great. I'll merge.

@jseabold
Member

Thanks for the pointer, @rgommers.

@jseabold closed this in f184b8b Feb 12, 2014
@rgommers added the build label Aug 17, 2014
@PierreBdR pushed a commit to PierreBdR/statsmodels that referenced this issue Sep 2, 2014
@jseabold BLD: Don't check dependencies on egg_info for pip. Closes #1267. 818997b