unexpected behavior from scipy.spatial.distances with different dtypes (uint8) #9961

Open
MarvinT opened this issue Mar 19, 2019 · 2 comments · May be fixed by #10102
Labels: defect, scipy.spatial

Comments

@MarvinT

MarvinT commented Mar 19, 2019

scipy.spatial.distance.cosine and scipy.spatial.distance.euclidean behave unexpectedly with non-float dtypes. The example here uses uint8.

Both cosine and euclidean return a float, and nowhere does the documentation mention that they expect float input. correlation does not seem to be affected.

Reproducing code example:

import numpy as np
from scipy.spatial.distance import cosine, euclidean, correlation

np.random.seed(31415)

a = np.random.randint(100, size=100, dtype='uint8')
b = np.random.randint(100, size=100, dtype='uint8')

print(cosine(a, b))
print(cosine(a.astype(float), b.astype(float)))

print(euclidean(a, b))
print(euclidean(a.astype(float), b.astype(float)))

print(correlation(a, b))
print(correlation(a.astype(float), b.astype(float)))

Which returns:

-0.14902052975578384
0.2763095875084359

1602.4755848374102
407.597841015

0.9621755273397231
0.9621755273397231

Casting the inputs to float beforehand should be an easy fix, since I doubt anyone relies on the current behavior in a meaningful way. Otherwise we could add a note to the documentation that these functions expect float input.
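
A likely explanation, offered as an assumption rather than a confirmed diagnosis: NumPy integer arrays keep their dtype under arithmetic, so uint8 differences and products wrap modulo 256 before the distance is computed. The sketch below uses only NumPy to show the wraparound and the cast-first workaround; `euclidean_cast` is a hypothetical helper, not a SciPy function.

```python
import numpy as np

a = np.array([10, 200, 30], dtype=np.uint8)
b = np.array([20, 100, 40], dtype=np.uint8)

# uint8 arithmetic wraps modulo 256 instead of going negative or overflowing.
diff_uint8 = a - b   # [246, 100, 246] rather than [-10, 100, -10]
prod_uint8 = a * b   # [200, 32, 176] rather than [200, 20000, 1200]

# Casting first gives the mathematically expected values.
diff_float = a.astype(np.float64) - b.astype(np.float64)


def euclidean_cast(u, v):
    """Hypothetical workaround: convert to float64 before differencing."""
    u = np.asarray(u, dtype=np.float64)
    v = np.asarray(v, dtype=np.float64)
    return float(np.sqrt(np.sum((u - v) ** 2)))


print(diff_uint8, prod_uint8, diff_float)
print(euclidean_cast(a, b))
```

Until the library handles this itself, calling `.astype(float)` on the inputs, as in the reproducing example, has the same effect.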

Scipy/Numpy/Python version information:

('1.1.0', '1.15.1', sys.version_info(major=2, minor=7, micro=15, releaselevel='final', serial=0))
@MarvinT
Author

MarvinT commented Mar 20, 2019

I can submit a PR for either of these solutions if a maintainer comments which is preferable.

@rgommers
Member

Hmm. Most distance functions (including cosine) have

    u = _validate_vector(u)
    v = _validate_vector(v)

This converts array-like inputs (lists etc.) to float arrays, but preserves the dtype of arrays that are passed in.

sqeuclidean on the other hand has:

    # Preserve float dtypes, but convert everything else to np.float64
    # for stability.
    utype, vtype = None, None
    if not (hasattr(u, "dtype") and np.issubdtype(u.dtype, np.inexact)):
        utype = np.float64
    if not (hasattr(v, "dtype") and np.issubdtype(v.dtype, np.inexact)):
        vtype = np.float64

    u = _validate_vector(u, dtype=utype)
    v = _validate_vector(v, dtype=vtype)
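
To make the effect of that `np.issubdtype(..., np.inexact)` check concrete, here is a small NumPy-only sketch. The helper name `float_dtype_or_none` is hypothetical and exists only for illustration; it isolates the dtype decision from the snippet above and applies it to a few inputs.

```python
import numpy as np


def float_dtype_or_none(x):
    """Hypothetical helper: None keeps x's dtype, np.float64 forces a cast.

    Mirrors the check quoted above: only inexact (float or complex)
    dtypes are preserved; everything else is promoted to float64.
    """
    if hasattr(x, "dtype") and np.issubdtype(x.dtype, np.inexact):
        return None
    return np.float64


samples = [
    np.zeros(3, dtype=np.uint8),    # integer array: cast to float64
    np.zeros(3, dtype=np.float32),  # float array: dtype preserved
    [1, 2, 3],                      # plain list has no .dtype: cast to float64
]

for x in samples:
    converted = np.asarray(x, dtype=float_dtype_or_none(x))
    print(type(x).__name__, getattr(x, "dtype", None), "->", converted.dtype)
```

Note that bool arrays are not inexact either, so under this rule they would also be cast to float64.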

It looks like a similar issue was reported for it and fixed in commit 05378cb. That approach should be extended to all/most functions I think. The right approach is probably:

  • move that bit of code into _validate_vector
  • check all distance functions using _validate_vector to see if any explicitly expect non-float (e.g. bool special case IIRC)
  • add some unit tests for all distance functions similar to the one for sqeuclidean in the commit I linked to above
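
The steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not SciPy's actual code: `validate_vector` is a hypothetical stand-in for the private `_validate_vector`, `euclidean` here is a plain NumPy reimplementation, and `check_dtype_invariance` is a made-up test helper. An explicit `dtype` argument is left untouched so that functions needing bool input can still ask for it.

```python
import numpy as np


def validate_vector(u, dtype=None):
    """Hypothetical validator with the float64 promotion folded in."""
    # Promote only when the caller did not request a dtype explicitly
    # and the input is not already a float or complex array.
    if dtype is None and not (
        hasattr(u, "dtype") and np.issubdtype(u.dtype, np.inexact)
    ):
        dtype = np.float64
    u = np.atleast_1d(np.asarray(u, dtype=dtype).squeeze())
    if u.ndim > 1:
        raise ValueError("Input vector should be 1-D.")
    return u


def euclidean(u, v):
    """Plain NumPy Euclidean distance built on the validator above."""
    u, v = validate_vector(u), validate_vector(v)
    return float(np.linalg.norm(u - v))


def check_dtype_invariance(dist, dtypes=(np.uint8, np.int16, np.int64, np.float32)):
    """Assert that dist gives the same answer regardless of input dtype."""
    rng = np.random.RandomState(31415)
    x = rng.randint(100, size=50)
    y = rng.randint(100, size=50)
    expected = dist(x.astype(np.float64), y.astype(np.float64))
    for dt in dtypes:
        got = dist(x.astype(dt), y.astype(dt))
        assert np.isclose(got, expected), (dt, got, expected)


check_dtype_invariance(euclidean)
```

A test of this shape could be run against each distance function in turn, skipping the ones that deliberately operate on bool input.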

rgommers added the defect label Mar 20, 2019