
Getting "ValueError: setting an array element with a sequence" for user-defined splits on categorical variables #229

Closed
hermanashley opened this issue Jan 31, 2023 · 5 comments
Labels: question (Further information is requested)


hermanashley commented Jan 31, 2023

Hi there,

I'm using manual bins that previously worked, and I can't figure out what I'm doing wrong. The issue occurs only for categorical variables. I define the features and bins like so:

```python
features = ['LOCATION']

binning_parameters = {
    'LOCATION': {
        "dtype": 'categorical',
        "user_splits": [
            ['Canada', 'USA', 'Mexico'],
            ['Taiwan', 'China', 'India', 'Thailand', 'Malaysia']
        ],
        "user_splits_fixed": [False, False],
        "special_codes": ['missing'],
        "min_prebin_size": 10e-6,
    },
}
```

And I'm creating my binning process object like so:

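```python
from optbinning import BinningProcess

categorical_variables = data[features].select_dtypes(
    include=["object", "category", "string"]
).columns.values

binning_process = BinningProcess(
    categorical_variables=categorical_variables,
    variable_names=features,
    min_prebin_size=10e-6,
    # Per-variable settings, including the user_splits defined above
    binning_fit_params=binning_parameters,
)
```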

I get the error when fitting the data (I've tried both fit_transform() and fit()):

```python
binned_data = binning_process.fit_transform(
    X=data[features],
    y=data[target].tolist(),
    metric=metric,
)
```

The full error is as follows:

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

I can't figure out what I'm missing here. I'm using OptBinning 0.15.1 and Python 3.8.

Thanks!

hermanashley (Author) commented:

Update: I was also using NumPy 1.24.1, and the binning parameters are passed in a format that this NumPy version no longer supports. Downgrading to NumPy 1.23.1 fixes the issue for now.
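The failure can be reproduced with NumPy alone. Starting with NumPy 1.24, building an array from nested lists of unequal length without `dtype=object` raises exactly this `ValueError`; older releases fell back to an object array (with a deprecation warning in the releases just before 1.24). A minimal sketch, assuming only NumPy is installed; `try_ragged_asarray` is a hypothetical helper name used for illustration:

```python
import numpy as np

# The user_splits from the report: two bins with 3 and 5 categories.
ragged_splits = [
    ["Canada", "USA", "Mexico"],
    ["Taiwan", "China", "India", "Thailand", "Malaysia"],
]


def try_ragged_asarray(splits):
    """Hypothetical helper: return (ok, message) for np.asarray on ragged input."""
    try:
        np.asarray(splits)
    except ValueError as exc:
        # NumPy >= 1.24 refuses to infer an object array for ragged sequences.
        return False, str(exc)
    # Older NumPy versions produced a 1-D object array instead of raising.
    return True, ""


ok, message = try_ragged_asarray(ragged_splits)
print(np.__version__, ok)
print(message)
```

On NumPy 1.24.1 the printed message should match the error from the original report.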

guillermo-navas-palencia (Owner) commented:

Thanks for reporting this issue, @hermanashley. I will look into it.

lassilehtonen (Contributor) commented Sep 5, 2023

Hi, I submitted pull request #261, which should fix this issue. Without an explicit dtype, the function np.asarray() does not accept input like

```python
'user_splits': [
    ['Canada', 'USA', 'Mexico'],
    ['Taiwan', 'China', 'India', 'Thailand', 'Malaysia']
]
```

where the bins have different lengths (3 and 5 categories in this case). Converting the list to a pd.Series should work.
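Wrapping the ragged list in a `pd.Series` avoids the problem because pandas stores each inner list as a single Python object, so converting the Series back to NumPy yields a 1-D object array with one element per bin. A minimal sketch, assuming pandas and NumPy are installed; it illustrates the idea and is not necessarily the exact change made in #261:

```python
import numpy as np
import pandas as pd

ragged_splits = [
    ["Canada", "USA", "Mexico"],
    ["Taiwan", "China", "India", "Thailand", "Malaysia"],
]

# pandas keeps each inner list as one object element of the Series.
splits_series = pd.Series(ragged_splits)

# Converting back to NumPy gives a 1-D object array; no ragged-shape inference happens.
splits_array = np.asarray(splits_series)

print(splits_series.dtype)  # object
print(splits_array.shape)   # (2,)
print(splits_array[1])
```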

guillermo-navas-palencia (Owner) commented:

Hi. I think the solution is as simple as:

```python
import numpy as np

user_splits = np.array([
    ['Canada', 'USA', 'Mexico'],
    ['Taiwan', 'China', 'India', 'Thailand', 'Malaysia']
], dtype=object)
```

as shown in the tutorial (screenshot not reproduced here). Tested with NumPy 1.24.1.
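Passing `dtype=object` tells NumPy not to infer a rectangular string array, so the two bins become the two elements of a 1-D object array. One NumPy caveat: if every bin happened to contain the same number of categories, `np.array(..., dtype=object)` would build a 2-D array instead, which downstream code may not expect. The helper `splits_to_object_array` below is a hypothetical sketch (not part of OptBinning) that always produces a 1-D array by filling a preallocated object array one element at a time:

```python
import numpy as np

ragged = [
    ["Canada", "USA", "Mexico"],
    ["Taiwan", "China", "India", "Thailand", "Malaysia"],
]

# The fix from the thread: an explicit object dtype gives one element per bin.
user_splits = np.array(ragged, dtype=object)
print(user_splits.shape)  # (2,)


def splits_to_object_array(splits):
    """Hypothetical helper: always return a 1-D object array, one bin per element."""
    out = np.empty(len(splits), dtype=object)
    for i, categories in enumerate(splits):
        out[i] = list(categories)  # store each bin as a single Python list
    return out


# With equal-length bins, np.array(..., dtype=object) builds a 2-D array,
# whereas the helper still yields one element per bin.
equal = [["Canada", "USA"], ["Taiwan", "China"]]
print(np.array(equal, dtype=object).shape)  # (2, 2)
print(splits_to_object_array(equal).shape)  # (2,)
```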

lassilehtonen (Contributor) commented:

That's true; I have changed that line in the PR accordingly.
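Putting the thread's resolution together, the `binning_parameters` from the original report can be rewritten with an explicit object array for `user_splits`. A minimal sketch, assuming the dictionary is then passed to `BinningProcess` through its `binning_fit_params` argument; the OptBinning call itself is left out so the snippet runs with NumPy alone:

```python
import numpy as np

binning_parameters = {
    "LOCATION": {
        "dtype": "categorical",
        # Object dtype keeps the two bins (3 and 5 categories) as separate elements.
        "user_splits": np.array(
            [
                ["Canada", "USA", "Mexico"],
                ["Taiwan", "China", "India", "Thailand", "Malaysia"],
            ],
            dtype=object,
        ),
        "user_splits_fixed": [False, False],
        "special_codes": ["missing"],
        "min_prebin_size": 10e-6,
    },
}

splits = binning_parameters["LOCATION"]["user_splits"]
print(splits.shape, splits.dtype)  # (2,) object
```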
