Numpy overflow with discrete data #197

@Matyasch

Dear maintainers, thank you for the great package.

Package version:
0.1.3.8

Problem description:
I have encountered the following issue when using the chisq CI test. If the number of nodes and/or the number of discrete classes is sufficiently large, numpy's fixed-width integers (int64) can overflow, which results in an error. In particular, np.prod(cardSXY) at line 291 in the _Fill3DCountTable function can overflow and return a negative result (related stackoverflow question). Then, in the function _Fill3DCountTableByBincount, the product cardS * cardX * cardY at line 248 also overflows, so a negative number is passed to the minlength argument of np.bincount. This throws the following error: ValueError: 'minlength' must not be negative.
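
A minimal sketch that should reproduce the same kind of failure outside the package. The cardinalities here (63 binary variables) are made up for illustration and are not taken from my data; only np.prod and np.bincount are real calls, the variable names are placeholders.

```python
import numpy as np

# Hypothetical cardinalities: 63 binary variables (S, X and Y combined),
# so the true product is 2**63, one more than the int64 maximum.
cardSXY = np.full(63, 2, dtype=np.int64)

# Array reductions on int64 wrap around silently instead of raising.
wrapped = np.prod(cardSXY)
print(wrapped < 0)  # True

# Passing the wrapped value on as minlength makes np.bincount fail.
error_raised = False
try:
    np.bincount(np.array([0, 1, 1]), minlength=wrapped)
except ValueError as err:
    error_raised = True
    print(err)
```

No warning is emitted at the point of the overflow, so the first visible symptom is the ValueError from np.bincount.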

Possible solution:
Python's built-in integers have arbitrary precision ("unlimited integers"), which makes them better suited to this scenario. Thus, replacing np.prod at line 291 with prod from the math module, called as prod(cardSXY.tolist()), solves this issue. I have not noticed any performance problems with this change so far.
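
A small sketch of the proposed replacement, again with made-up cardinalities (63 binary variables) rather than values from the package. It only shows that the product itself no longer wraps around; how the rest of the function handles such a large value is not covered here.

```python
import math
import numpy as np

# Hypothetical cardinalities, chosen so the true product is 2**63.
cardSXY = np.full(63, 2, dtype=np.int64)

# .tolist() converts the int64 entries to plain Python ints, so
# math.prod multiplies with arbitrary precision and cannot overflow.
exact = math.prod(cardSXY.tolist())
print(exact == 2**63)  # True
print(exact > np.iinfo(np.int64).max)  # True: too large for int64
```

Note that math.prod requires Python 3.8 or newer.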

Let me know what you think,
Mátyás
