benfordslaw.fit doesn't work on pandas Series with Int64Dtype (nullable) #8

Closed
ThomasOfferman opened this issue May 24, 2022 · 1 comment

ThomasOfferman commented May 24, 2022

Hi Erdogan,

I was recently working with a pandas DataFrame that had a column with an Int64Dtype, which is nullable. The column didn't actually contain any null values. Calling benfordslaw.fit on it gave me the following error:

  File "/usr/local/lib/python3.8/dist-packages/benfordslaw/benfordslaw.py", line 293, in _count_digit
    digits[Iloc] = list(map(lambda x: int(str(x)[d]), data[Iloc]))
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid `indices`

I looked into it: comparing a nullable integer Series produces a nullable boolean Series, so the variable Iloc was actually a nullable boolean mask, which I guess NumPy doesn't accept as an index. See below for a small reproducible example.

import pandas
from benfordslaw import benfordslaw

bl = benfordslaw(alpha=0.05)

data = pandas.DataFrame({'value': [1, 2, 3, 4, 5]})
bl.fit(data['value'].astype(int))  # this works fine
bl.fit(data['value'].astype(pandas.Int64Dtype()))  # this throws an error
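To make the dtype difference visible without involving benfordslaw at all, here is a minimal sketch that only compares the masks produced by the two column types. The variable names are illustrative and not taken from the library.

```python
import numpy as np
import pandas as pd

# Same values, stored once as a NumPy-backed int column and once as nullable Int64.
plain = pd.Series([1, 25, 300]).astype(int)
nullable = pd.Series([1, 25, 300]).astype(pd.Int64Dtype())

# The same kind of comparison the traceback points at: data >= 10**d.
plain_mask = plain >= np.power(10, 1)
nullable_mask = nullable >= np.power(10, 1)

print(plain_mask.dtype)     # NumPy-backed boolean mask
print(nullable_mask.dtype)  # pandas nullable boolean mask
```

The second mask has pandas' nullable `boolean` dtype rather than NumPy's `bool`, which matches the diagnosis above.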

I think something like this would solve it (not tested):

# Get the ith digit
digits = np.zeros_like(data)
Iloc = data>=np.power(10, d)
# ignore nulls and cast to non-nullable dtype just in case
Iloc = Iloc.fillna(False).astype(bool)
digits[Iloc] = list(map(lambda x: int(str(x)[d]), data[Iloc]))
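For anyone who wants to try the idea outside the library, below is a standalone sketch of the same approach. The function name `digit_at_position` is hypothetical and this is not the actual `_count_digit` implementation; it only assumes that `d` is a 0-based digit position counted from the left, as the snippet above suggests.

```python
import numpy as np
import pandas as pd


def digit_at_position(data, d):
    """Hypothetical helper: return the d-th digit (0-based, from the left)
    of each value, leaving 0 where the value is null or has too few digits."""
    data = pd.Series(data)
    # The comparison may yield a nullable boolean mask; treat nulls as False
    # and cast to a plain NumPy bool array so it can index a NumPy array.
    mask = (data >= np.power(10, d)).fillna(False).astype(bool).to_numpy()
    digits = np.zeros(len(data), dtype=int)
    digits[mask] = [int(str(x)[d]) for x in data[mask]]
    return digits


values = pd.Series([1, 25, 300], dtype="Int64")
second_digits = digit_at_position(values, 1)

with_null = pd.Series([7, None, 42], dtype="Int64")
first_digits = digit_at_position(with_null, 0)
```

Converting the mask with `.to_numpy()` after the cast keeps the indexing step independent of how a given pandas version exposes nullable booleans to NumPy.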

I'd be happy to open a pull request with some test cases, but I'll leave that up to you. I can imagine this isn't a high priority, since I think the nullable Int64Dtype is still fairly experimental.

Kind regards,
Thomas

Owner

erdogant commented Jun 5, 2022

Thank you! I implemented your solution.
Update with: pip install -U benfordslaw
