Why does stats.binned_statistic_2d convert its values argument from array into a list? #13608

Closed
Warren-Porter opened this issue Feb 24, 2021 · 4 comments · Fixed by #13633
Labels: defect, scipy.stats
Milestone: 1.7.0

Comments

@Warren-Porter

Warren-Porter commented Feb 24, 2021

Consider this toy example:

Reproducing code example:

import numpy as np
import scipy as sp
import scipy.stats  # load the submodule explicitly; a bare `import scipy` may not expose sp.stats

def ToyFunction(Data):
    # Computes the mean of only the observations within one standard deviation of the mean
    assert isinstance(Data, np.ndarray)  # This fails when called by binned_statistic_2d
    Z = sp.stats.zscore(Data)
    non_outlier_index = np.logical_and(-1 < Z, Z < 1)
    non_outlier_values = Data[non_outlier_index]
    Quality_Control_Mean = np.mean(non_outlier_values)
    return Quality_Control_Mean

Xtoy = np.random.uniform(-180, 180, size=1000000)
Ytoy = np.random.uniform(-90, 90, size=1000000)
Ttoy = np.random.uniform(200, 300, size=1000000)

# Works because Ttoy is a NumPy array
Trivial_Test_Result = ToyFunction(Ttoy)

# Fails because the binned values are passed to ToyFunction as a list
Toy_Result = sp.stats.binned_statistic_2d(
    Xtoy, Ytoy, Ttoy,
    statistic=ToyFunction,
    bins=(range(-180, 181, 1), range(-90, 91, 1)),
    expand_binnumbers=True,
)

Looking at the latest documentation, I read the following describing how the statistic argument can be a user-defined function:

function : a user-defined function which takes a 1D array of values, and outputs a single numerical statistic. This function will be called on the values in each bin. Empty bins will be represented by function([]), or NaN if this returns an error.

Evidently this isn't what actually happens. Looking at the source code, I see the culprit:

def _create_binned_data(bin_numbers, unique_bin_numbers, values, vv):
    """ Create hashmap of bin ids to values in bins
    key: bin number
    value: list of binned data
    """
    bin_map = dict()
    for i in unique_bin_numbers:
        bin_map[i] = []
    for i in builtins.range(len(bin_numbers)):
        bin_map[bin_numbers[i]].append(values[vv, i])
    return bin_map

Clearly, the values stored in bin_map are going to be Python lists rather than NumPy arrays. What exactly is the intended behavior here?
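
To make the point concrete, here is a minimal sketch that reproduces the grouping logic outside SciPy. The helper name `group_values_by_bin` and the toy inputs are hypothetical, and the loop is a simplified stand-in for `_create_binned_data` (1D values, no `vv` index), not SciPy's actual code. It shows that appending to per-bin lists yields `list` objects, and that wrapping each bin in `np.asarray` would yield arrays instead.

```python
import numpy as np

def group_values_by_bin(bin_numbers, values):
    """Hypothetical, simplified stand-in for the grouping loop quoted above."""
    bin_map = {}
    for bin_number, value in zip(bin_numbers, values):
        # Values are appended one by one, so each bin ends up as a Python list
        bin_map.setdefault(int(bin_number), []).append(value)
    return bin_map

values = np.array([10.0, 20.0, 30.0, 40.0])
bin_numbers = np.array([1, 2, 1, 2])

bin_map = group_values_by_bin(bin_numbers, values)
types_as_built = {k: type(v).__name__ for k, v in bin_map.items()}

# Converting each bin before calling the statistic restores array semantics
bin_arrays = {k: np.asarray(v) for k, v in bin_map.items()}
types_converted = {k: type(v).__name__ for k, v in bin_arrays.items()}

print(types_as_built)
print(types_converted)
```

A statistic that relies on array-only features, such as boolean-mask indexing, would fail on the first form and work on the second.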

Scipy/Numpy/Python version information:

SciPy 1.6.0, NumPy 1.20.0, Python sys.version_info(major=3, minor=9, micro=1, releaselevel='final', serial=0)

@Warren-Porter changed the title from "Why does stats.binned_statistic_2d converts its values argument from array into a list?" to "Why does stats.binned_statistic_2d convert its values argument from array into a list?" Feb 24, 2021
@tupui
Member

tupui commented Feb 24, 2021

Hi! Thanks for noticing this. Indeed, it seems the function receives a list instead of an array. Would you open a PR to improve the docs?

In the meantime, if you really need an array, you can mitigate this by adding this at the top of your function: Data = np.asarray(Data)
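
A minimal sketch of that workaround, applied to the toy function from the report. The smaller sample size, the coarse bin edges, and the fixed seed are assumptions chosen only to keep the example quick. `np.asarray` returns its input unchanged when it is already an array, so the conversion should be harmless on SciPy versions that pass arrays.

```python
import numpy as np
from scipy import stats

def ToyFunction(Data):
    # Workaround: accept either a list or an array
    Data = np.asarray(Data)
    Z = stats.zscore(Data)
    non_outlier_index = np.logical_and(-1 < Z, Z < 1)
    return np.mean(Data[non_outlier_index])

# Assumed toy data: far fewer points and bins than in the original report
rng = np.random.default_rng(0)
Xtoy = rng.uniform(-180, 180, size=2000)
Ytoy = rng.uniform(-90, 90, size=2000)
Ttoy = rng.uniform(200, 300, size=2000)

Toy_Result = stats.binned_statistic_2d(
    Xtoy, Ytoy, Ttoy,
    statistic=ToyFunction,
    bins=(np.linspace(-180, 180, 5), np.linspace(-90, 90, 3)),  # 4 x 2 bins
)
print(Toy_Result.statistic.shape)
```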

@tupui added the Documentation and scipy.stats labels Feb 24, 2021
@Warren-Porter
Author

I don't know how to do a PR?

@tupui
Member

tupui commented Feb 24, 2021

I don't know how to do a PR?

You can find some instructions here: https://docs.scipy.org/doc/scipy/reference/hacking.html
You have to create a fork of the project, then create a branch and make the update there. Then you can propose a PR. Thanks for being willing to contribute 😃

Feel free to drop me an email if you need assistance.

@tupui
Member

tupui commented Mar 1, 2021

Thanks @Warren-Porter. FYI, I created a PR to move this forward.

@rgommers added this to the 1.7.0 milestone Mar 13, 2021
@rgommers added the defect label and removed the Documentation label Mar 13, 2021