updated examples and tests that use scipy.misc.lena() #5920

Closed

nelson-liu
Contributor

scipy.misc.lena() will be removed in scipy version 0.17, so this PR changes all tests and examples that use it to use scipy.misc.face() instead. Addresses issue #5739. I've verified that all the scripts work on my local machine, except for what was formerly plot_lena_segmentation.py (now plot_face_segmentation.py). plot_face_segmentation.py runs extremely slowly; I let it run for around 6 hours as part of "make html" before I gave up and interrupted it. This is probably related to bug #1966.

…removed in scipy version 0.17, to use scipy.misc.face() instead.
@nelson-liu
Contributor Author

It seems like the tests on circleCI and travis-CI are failing because they don't have the version of scipy needed to access the face image (v 0.12). How should I be getting around this?

@GaelVaroquaux
Member

GaelVaroquaux commented Nov 25, 2015 via email

@nelson-liu
Contributor Author

@GaelVaroquaux sorry if I was unclear, but it does work with version 0.12 (that's the version in which the face image was added [see: https://github.com/scipy/scipy/pull/351]). travisci / circleci are running earlier versions.

@GaelVaroquaux
Member

GaelVaroquaux commented Nov 26, 2015 via email

@ogrisel
Member

ogrisel commented Dec 3, 2015

do we drop support for scipy 0.12? 0.13 is in Ubuntu LTS 14.04. I am not terribly happy with that, but the other option is accepting that some of our examples will be broken in 0.17.

I would rather upgrade the circleci configuration to have a more recent version of scipy to build the examples.

I think it's ok if some examples are broken on very old scipy (as 0.9) as long as the tests pass and the examples run correctly on more recent scipy.

@GaelVaroquaux
Member

I would rather upgrade the circleci configuration to have a more recent
version of scipy to build the examples.

I think it's ok if some examples are broken on very old scipy (as 0.9) as long
as the tests pass and the examples run correctly on more recent scipy.

I agree.

There remains the issue that the segmentation example takes forever. Maybe
that can be solved by working on a cropped subpart of the image. That might
work better than downsampling (because the linear algebra problem will be
better conditioned).
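
For a rough sense of how much either option shrinks the problem, the sketch below counts the nodes and edges of the pixel graph for the full image, a 10% downsample, and a centered crop. It assumes a 768 x 1024 grayscale face image and a 4-connected pixel grid; `grid_graph_size` and `center_crop_box` are hypothetical helpers written for illustration, not scikit-learn functions.

```python
def grid_graph_size(height, width):
    """Count nodes and undirected edges of a 4-connected pixel grid."""
    nodes = height * width
    # Horizontal neighbour pairs plus vertical neighbour pairs.
    edges = height * (width - 1) + width * (height - 1)
    return nodes, edges


def center_crop_box(height, width, fraction):
    """Return (top, bottom, left, right) of a centered crop.

    `fraction` is the share of each side that is kept (0 < fraction <= 1).
    """
    crop_h = max(1, int(height * fraction))
    crop_w = max(1, int(width * fraction))
    top = (height - crop_h) // 2
    left = (width - crop_w) // 2
    return top, top + crop_h, left, left + crop_w


if __name__ == "__main__":
    full_h, full_w = 768, 1024  # assumed shape of the face image
    print("full image:", grid_graph_size(full_h, full_w))

    # Downsampling to 10% per side keeps the whole scene at low resolution.
    small = grid_graph_size(int(full_h * 0.1), int(full_w * 0.1))
    print("10% downsample:", small)

    # Cropping keeps full resolution but only a centered window.
    top, bottom, left, right = center_crop_box(full_h, full_w, 0.25)
    print("25% crop:", grid_graph_size(bottom - top, right - left))
    # With a NumPy array the crop itself would be face[top:bottom, left:right].
```

Both options reduce the graph size; whether the cropped problem is actually better conditioned, as suggested above, would still have to be checked by timing the example.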

@ogrisel
Member

ogrisel commented Dec 3, 2015

@nelson-liu how long does it take to run this example on your machine if you install PyAMG? Maybe we could skip this example if PyAMG is not installed?

@nelson-liu
Contributor Author

@ogrisel I have pyAMG installed, and it segfaulted out of execution (#5908)

@amueller
Member

amueller commented Dec 9, 2015

I think we should just raise a SkipTest in this one test if scipy is too old. I want travis running old scipy.

@nelson-liu
Contributor Author

@amueller how would I raise a SkipTest? Or is that not something I should fiddle with?

@ogrisel
Member

ogrisel commented Dec 28, 2015

I think we should just raise a SkipTest in this one test if scipy is too old. I want travis running old scipy.

+1

@nelson-liu use git grep -i skiptest to find other usage examples in the scikit-learn code base.
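
As a minimal sketch of the idea (not the code that ended up in scikit-learn), a version-gated skip could look like the following. It uses the standard library's `unittest.SkipTest`; the helpers `parse_version` and `skip_if_older` are hypothetical names introduced here, and the scipy import is left as a comment so the sketch runs without scipy installed.

```python
import re
from unittest import SkipTest


def parse_version(version):
    """Return the leading numeric components of a version string as a tuple.

    Parsing stops at the first component that does not start with a digit,
    so "0.17.0.dev0+abc" becomes (0, 17, 0).
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def skip_if_older(package, installed, minimum):
    """Raise SkipTest when `installed` is older than `minimum`."""
    if parse_version(installed) < parse_version(minimum):
        raise SkipTest(
            "%s %s is too old, version >= %s is required"
            % (package, installed, minimum)
        )


# Hypothetical usage at the top of a test that needs scipy.misc.face():
#     import scipy
#     skip_if_older("scipy", scipy.__version__, "0.12")
```

Gating on the imported scipy version, rather than on the CI service, keeps the test skipped on any machine with an old scipy while still running everywhere else.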

@ogrisel
Member

ogrisel commented Dec 28, 2015

To speed up the computation you can try to resize the image, e.g. to resize to 10%:

face = sp.misc.imresize(face, 0.10)

and also play with the eigen_tol parameter of spectral_clustering.

@ogrisel
Member

ogrisel commented Dec 28, 2015

Actually, increasing the tolerance of the arpack solver leads to poor results. I got better results by tweaking the other parameters of the example:

diff --git a/examples/cluster/plot_face_segmentation.py b/examples/cluster/plot_face_segmentation.py
index 43e1da8..27d4c76 100644
--- a/examples/cluster/plot_face_segmentation.py
+++ b/examples/cluster/plot_face_segmentation.py
@@ -31,8 +31,12 @@ import matplotlib.pyplot as plt
 from sklearn.feature_extraction import image
 from sklearn.cluster import spectral_clustering

+# Load the racoon face as a numpy array
 face = sp.misc.face(gray=True)

+# Resize it to 10% of the original size to speed up the processing
+face = sp.misc.imresize(face, 0.10) / 255.
+
 # Convert the image into a graph with the value of the gradient on the
 # edges.
 graph = image.img_to_graph(face)
@@ -42,18 +46,19 @@ graph = image.img_to_graph(face)
 # actual image. For beta=1, the segmentation is close to a voronoi
 beta = 5
 eps = 1e-6
-graph.data = np.exp(-beta * graph.data / face.std()) + eps
+graph.data = np.exp(-beta * graph.data / graph.data.std()) + eps

 # Apply spectral clustering (this step goes much faster if you have pyamg
 # installed)
-N_REGIONS = 11
+N_REGIONS = 25

 ###############################################################################
 # Visualize the resulting regions

 for assign_labels in ('kmeans', 'discretize'):
     t0 = time.time()
-    labels = spectral_clustering(graph, n_clusters=N_REGIONS, assign_labels=assign_labels, random_state=1)
+    labels = spectral_clustering(graph, n_clusters=N_REGIONS,
+                                 assign_labels=assign_labels, random_state=1)
     t1 = time.time()
     labels = labels.reshape(face.shape)

@@ -64,6 +69,8 @@ for assign_labels in ('kmeans', 'discretize'):
                     colors=[plt.cm.spectral(l / float(N_REGIONS)), ])
     plt.xticks(())
     plt.yticks(())
-    plt.title('Spectral clustering: %s, %.2fs' % (assign_labels, (t1 - t0)))
+    title = 'Spectral clustering: %s, %.2fs' % (assign_labels, (t1 - t0))
+    print(title)
+    plt.title(title)

 plt.show()

It runs in ~8s total on my laptop without pyamg.
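
To illustrate what the changed weighting line in the diff does, here is a small standard-library sketch that applies the same formula, exp(-beta * g / std) + eps, to a toy list of gradient values. The function name `edge_weights` and the sample values are made up for illustration; `statistics.pstdev` is used because it computes the population standard deviation, which is what NumPy's `std()` returns by default.

```python
import math
from statistics import pstdev


def edge_weights(gradients, beta=5.0, eps=1e-6):
    """Map gradient magnitudes to similarity weights in (eps, 1 + eps].

    Small gradients (similar neighbouring pixels) give weights close to 1,
    large gradients give weights close to eps.
    """
    scale = pstdev(gradients)
    if scale == 0:
        raise ValueError("all gradients are equal, cannot normalise")
    return [math.exp(-beta * g / scale) + eps for g in gradients]


if __name__ == "__main__":
    toy_gradients = [0.0, 0.1, 0.5, 1.0]  # made-up values
    for g, w in zip(toy_gradients, edge_weights(toy_gradients)):
        print("gradient %.1f -> weight %.6f" % (g, w))
```

Normalising by the spread of the gradients themselves, rather than by the spread of the pixel values, makes the effect of `beta` independent of how the image intensities were rescaled.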

Here are the results:

kmeans_raccoon
discretized_raccoun

@nelson-liu
Contributor Author

@ogrisel that actually doesn't look too bad! Shall I keep tweaking the parameters, or should we go with that diff?

@ogrisel
Member

ogrisel commented Dec 29, 2015

@ogrisel that actually doesn't look too bad! Shall I keep tweaking the parameters, or should we go with that diff?

Feel free to reuse the parameters I gave in the diff. If you tweak further and find more interesting-looking results that take less than 20s to compute, feel free to use them in your PR.

@ogrisel
Member

ogrisel commented Dec 29, 2015

@nelson-liu if you don't have the time to finish addressing the comments of this PR, let me know and I can take over from here.

@nelson-liu
Contributor Author

@ogrisel, sorry, I've been busy lately with the holiday season, but that's mostly passed. I'll get this example into the PR by tomorrow night. Was a decision ever reached on scipy versions for circleci and travisci?

…n, dramatically improving runtime of the example.
@nelson-liu
Contributor Author

@ogrisel I've added your changes to the PR. I feel like that is the last code-based change left? The only thing is to decide how to deal with circleci / travisci in terms of them running older versions of scipy that do not have the face() image.

@ogrisel
Member

ogrisel commented Dec 29, 2015

As @amueller said, raise SkipTest in the failing travis test on scipy < 0.12.

For circle ci we need a more recent version of scipy, either by using an apt repository such as neurodebian that provides recent packages for numpy and scipy on old distributions such as ubuntu 12.04 (see http://neuro.debian.net/), or by finding a way to configure circle ci to use a more recent distribution such as ubuntu 14.04, or by using anaconda instead of the system packages.

@ogrisel
Member

ogrisel commented Dec 29, 2015

As of now changing the version of the base image of circle ci is not yet possible:

https://discuss.circleci.com/t/when-are-we-going-to-get-ubuntu-14-04/331/4

So the best bet is either to use miniconda (as done in the travis configuration) or to add the neurodebian repo to the ubuntu 12.04 distro used in the base circle ci image.

@nelson-liu
Contributor Author

@ogrisel sorry, I'm not very familiar with unit testing on travis. Would I just do something like this: http://stackoverflow.com/questions/30867145/tell-travis-to-skip-a-test-but-continue-to-include-it-in-my-main-test-suite/30874941#30874941
Since the script has no def, I'm assuming I should put this line after all the imports?

edit: rather, should I just raise a skiptest on the version of scipy imported instead of specifically targeting travis?

@ogrisel
Member

ogrisel commented Dec 30, 2015

No, have a look at other usages of SkipTest in scikit-learn by using git grep, as I said earlier.

@ogrisel
Member

ogrisel commented Dec 30, 2015

edit: rather, should I just raise a skiptest on the version of scipy imported instead of specifically targeting travis?

Yes, exactly.

@nelson-liu
Contributor Author

Ah apologies, didn't see your comment about git grep. thanks!

- sudo apt-get update
- sudo apt-get install libatlas-dev libatlas3gf-base
- sudo apt-get install build-essential python-dev python-setuptools
- sudo apt-get install python-numpy python-scipy python-dev python-matplotlib
- sudo apt-get install python-nose python-coverage
- sudo apt-get install python-sphinx
- pip install cython
Member

This is no longer needed as it can be installed via conda directly.

@nelson-liu
Contributor Author

edit: I didn't read your previous comment before posting this. I'll implement the changes you highlighted in the diff and see what happens. Thanks for the input!
aha! It seems like there are some virtualenv troubles. For example, echoing $PATH in circle.yml after "entering" testenv returns (there are no signs of conda anywhere):

/usr/local/heroku/bin:/opt/google-cloud-sdk/bin:/home/ubuntu/virtualenvs/venv-system/bin:/home/ubuntu/.pyenv/shims:/home/ubuntu/.pyenv/bin:/home/ubuntu/.local/bin:/home/ubuntu/.go_workspace/bin:/usr/local/go_workspace/bin:/usr/local/go/bin:/opt/ghc/7.6.3/bin:/opt/happy/1.19.3/bin:/opt/alex/3.1.3/bin:/opt/cabal/1.22/bin:/opt/google-cloud-sdk/bin:/home/ubuntu/.m2/apache-maven-3.2.5/bin:/home/ubuntu/nvm/v0.10.33/bin:/home/ubuntu/.phpenv/shims:/home/ubuntu/.phpenv/bin:/home/ubuntu/.rvm/gems/ruby-1.9.3-p448/bin:/home/ubuntu/.rvm/gems/ruby-1.9.3-p448@global/bin:/home/ubuntu/.rvm/rubies/ruby-1.9.3-p448/bin:/home/ubuntu/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/android-sdk-linux/tools:/usr/local/android-sdk-linux/platform-tools:/home/ubuntu/.rvm/bin:/home/ubuntu/.composer/vendor/bin:/usr/local/gradle-1.10/bin:/usr/local/heroku/bin:/home/ubuntu/.rvm/bin

I enabled an ssh-build, and ssh'd into the machine after the build failed. The venv-system of circleci was still active. I deactivated the venv-system, activated testenv, and ran a conda list. This command ran successfully, and all the dependencies were there. Then, from within testenv, I ran setup.py install --user, which also went off without a hitch.

In summary, I believe it's an issue with not properly deactivating the venv that comes with circleci. However, it could simply be that the venv-system is automatically activated upon ssh (I logged out and logged back in and it was active again). It could possibly be related to this: https://discuss.circleci.com/t/disable-autodetection-of-project-or-application-of-python-venv/235

Any thoughts?
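
One way to check this kind of mix-up from inside a build step is to inspect the environment variables that the activation scripts set. The sketch below is only an illustration: `python_env_report` is a hypothetical helper, it relies on `VIRTUAL_ENV` (set by virtualenv's activate script) and `CONDA_DEFAULT_ENV` (assumed to be set when a conda env is activated), and the "virtualenvs" substring check is specific to the CircleCI paths shown in the PATH output above.

```python
import os


def python_env_report(environ, path_sep=os.pathsep):
    """Summarise which Python environment tools are visible in `environ`."""
    path_entries = [p for p in environ.get("PATH", "").split(path_sep) if p]
    return {
        # Set by virtualenv's activate script while a virtualenv is active.
        "virtualenv": environ.get("VIRTUAL_ENV"),
        # Assumed to be set by conda while a conda env is active.
        "conda_env": environ.get("CONDA_DEFAULT_ENV"),
        "conda_dirs_on_path": [p for p in path_entries if "conda" in p.lower()],
        # Matches directories such as /home/ubuntu/virtualenvs/venv-system/bin.
        "virtualenv_dirs_on_path": [p for p in path_entries if "virtualenvs" in p],
    }


if __name__ == "__main__":
    for key, value in python_env_report(os.environ).items():
        print(key, "=", value)
```

If the report shows a virtualenv directory on PATH and no conda directory after testenv is supposedly active, the conda activation did not take effect in that shell.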

@ogrisel
Member

ogrisel commented Dec 31, 2015

In summary, I believe it's an issue with not properly deactivating the venv that comes with circleci.

Indeed, it might be a good idea to get rid of it. Creating a conda env from within a virtualenv seems to cause the "conda command not found" problem and can probably cause other issues.

@nelson-liu
Contributor Author

@ogrisel is there any way to go about getting rid of it completely?
Additionally, when running the build now, there is an exception when running sphinx

Exception occurred:
  File "/home/ubuntu/scikit-learn/doc/sphinxext/gen_rst.py", line 676, in make_thumbnail
    import Image
ImportError: No module named Image

Should Pillow be a dependency? In the file (gen_rst.py) it has:

def make_thumbnail(in_fname, out_fname, width, height):
    """Make a thumbnail with the same aspect ratio centered in an
       image with a given width and height
    """
    # local import to avoid testing dependency on PIL:
    try:
        from PIL import Image
    except ImportError:
        import Image

@ogrisel
Member

ogrisel commented Dec 31, 2015

Please add the pillow package when creating the conda env.
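
Before rebuilding the docs, it can help to confirm which of the two import names tried in make_thumbnail is actually importable in the active environment. The following is a small sketch using only the standard library; `first_available` is a hypothetical helper, and it only handles top-level module names such as `PIL` (the package that Pillow installs) and the legacy top-level `Image` module.

```python
import importlib.util


def first_available(module_names):
    """Return the first top-level module name that can be imported, or None.

    The modules are located but not imported, so this has no side effects.
    """
    for name in module_names:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


if __name__ == "__main__":
    found = first_available(["PIL", "Image"])
    if found is None:
        print("Neither PIL nor Image is importable; install pillow.")
    else:
        print("Thumbnail generation can import from:", found)
```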

@ogrisel
Member

ogrisel commented Dec 31, 2015

@ogrisel is there any way to go about getting rid of it completely?

It should be possible to just delete the virtualenv folder. Calling deactivate prior to installing miniconda and creating the conda env might be enough though.

@nelson-liu
Contributor Author

hmm, make html seems to have segfaulted. Here's the full trace

plotting plot_species_distribution_modeling.py

=============================
Species distribution modeling
=============================

Modeling species' geographic distributions is an important
problem in conservation biology. In this example we
model the geographic distribution of two south american
mammals given past observations and 14 environmental
variables. Since we have only positive examples (there are
no unsuccessful observations), we cast this problem as a
density estimation problem and use the `OneClassSVM` provided
by the package `sklearn.svm` as our modeling tool.
The dataset is provided by Phillips et. al. (2006).
If available, the example uses
`basemap <http://matplotlib.sourceforge.net/basemap/doc/html/>`_
to plot the coast lines and national boundaries of South America.

The two species are:

 - `"Bradypus variegatus"
   <http://www.iucnredlist.org/apps/redlist/details/3038/0>`_ ,
   the Brown-throated Sloth.

 - `"Microryzomys minutus"
   <http://www.iucnredlist.org/apps/redlist/details/13408/0>`_ ,
   also known as the Forest Small Rice Rat, a rodent that lives in Peru,
   Colom/home/ubuntu/scikit-learn/sklearn/covariance/graph_lasso_.py:230: RuntimeWarning: invalid value encountered in multiply
  * coefs)
/home/ubuntu/scikit-learn/sklearn/covariance/graph_lasso_.py:232: RuntimeWarning: invalid value encountered in multiply
  * coefs)

# deactivate circleci virtualenv and setup miniconda env.
deactivate
sudo apt-get update
sudo apt-get install libatlas-dev libatlas3gf-base build-essential
# Use the miniconda installer for faster download / install of conda
# itself
pushd .
cd
mkdir -p download
cd download
echo "Cached in $HOME/download :"
ls -l
echo
if [[ ! -f miniconda.sh ]]
   then
   wget http://repo.continuum.io/miniconda/Miniconda-3.6.0-Linux-x86_64.sh \
   -O miniconda.sh
fi
chmod +x miniconda.sh && ./miniconda.sh -b -p $HOME/miniconda
cd ..
export PATH="$HOME/miniconda/bin:$PATH"
conda update --yes conda
popd

# Configure the conda environment and put it in the path using the
# provided versions
conda create -n testenv --yes python numpy scipy \
  cython=$CYTHON_VERSION nose coverage matplotlib sphinx pillow
source /home/ubuntu/miniconda/envs/testenv/bin/activate testenv
# Install scikit-learn in development mode.
# The pipefail is requested to propagate exit code
python setup.py develop
set -o pipefail && cd doc && make html 2>&1 | tee ~/log.txt
 returned exit code 2

Action failed: # deactivate circleci virtualenv and setup miniconda env.
deactivate
sudo apt-get update
sudo apt-get install libatlas-dev libatlas3gf-base build-essential
# Use the miniconda installer for faster download / install of conda
# itself
pushd .
cd
mkdir -p download
cd download
echo "Cached in $HOME/download :"
ls -l
echo
if [[ ! -f miniconda.sh ]]
   then
   wget http://repo.continuum.io/miniconda/Miniconda-3.6.0-Linux-x86_64.sh \
   -O miniconda.sh
fi
chmod +x miniconda.sh && ./miniconda.sh -b -p $HOME/miniconda
cd ..
export PATH="$HOME/miniconda/bin:$PATH"
conda update --yes conda
popd

# Configure the conda environment and put it in the path using the
# provided versions
conda create -n testenv --yes python numpy scipy \
  cython=$CYTHON_VERSION nose coverage matplotlib sphinx pillow
source /home/ubuntu/miniconda/envs/testenv/bin/activate testenv
# Install scikit-learn in development mode.
# The pipefail is requested to propagate exit code
python setup.py develop
make: *** [html] Segmentation fault (core dumped) set -o pipefail && cd doc && make html 2>&1 | tee ~/log.txt

Any ideas as to why this might be happening?

@nelson-liu
Contributor Author

so I ssh'd into the box after the segfault, and tried to run make html. However, it would segfault on the stock market example. I also tried to run the script directly, and got a segfault. Could this possibly be related to: #5724?
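
To narrow down which Python call triggers a crash like this, one option is to run the example in a child interpreter with the standard library's faulthandler enabled, which prints the Python traceback when a fatal signal arrives. This is a sketch under the assumption that a Python 3 interpreter is used (the `-X faulthandler` option does not exist on Python 2); `run_with_faulthandler` is a hypothetical helper and the script path in the usage comment is a placeholder.

```python
import signal
import subprocess
import sys


def run_with_faulthandler(script_path, timeout=600):
    """Run a script in a child interpreter with faulthandler enabled.

    Returns (returncode, crashed, stderr_text), where `crashed` is True
    when the child was killed by SIGSEGV.
    """
    result = subprocess.run(
        [sys.executable, "-X", "faulthandler", script_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        timeout=timeout,
    )
    # On POSIX, a negative return code means the child died from a signal.
    crashed = result.returncode == -signal.SIGSEGV
    return result.returncode, crashed, result.stderr


# Hypothetical usage (replace the placeholder with the crashing example):
#     code, crashed, err = run_with_faulthandler("path/to/example.py")
#     if crashed:
#         print(err)  # traceback dumped by faulthandler
```

Running the example in a separate process also keeps the crash from taking down the shell session or the rest of the doc build.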

@amueller amueller added this to the 0.17.1 milestone Jan 26, 2016
@amueller
Member

@ogrisel not sure we want to include this in the release?

@ogrisel
Member

ogrisel commented Jan 28, 2016

Yeah that would be great. Need to fix the doc build configuration though.

@ogrisel
Member

ogrisel commented Feb 1, 2016

closing in favor of the squashed and rebased version at #6260.

@ogrisel ogrisel closed this Feb 1, 2016