Index out of bounds error when a col has all different values #8

Closed
fndjjx opened this issue Jan 12, 2017 · 6 comments

Comments

@fndjjx
Contributor

fndjjx commented Jan 12, 2017

Hi,
I found an issue in datacleaner. When I use this tool on my dataset, it raises an index out of bounds error. I checked the code and found this line in the function autoclean:

input_dataframe[column].fillna(input_dataframe[column].mode()[0], inplace=True)

When a column has no repeated values, mode() returns an empty result, so indexing it with [0] goes out of bounds.
I think this is the cause; could you confirm it? Thank you!
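A minimal sketch of a guard for this, assuming pandas is installed. `safe_mode` is a hypothetical helper, not datacleaner's API or the patch that was eventually merged, and its fallback (the first non-null value) is an assumption. Whether `Series.mode()` comes back empty for an all-distinct column depends on the pandas version (newer releases return every tied value instead), so the helper checks the length of the result rather than relying on either behavior, and it uses `.iloc[0]` for positional access instead of a label lookup.

```python
import pandas as pd


def safe_mode(series: pd.Series):
    """Return a fill value for `series` without indexing an empty mode() result.

    Hypothetical helper: falls back to the first non-null value when mode()
    is empty, and returns None when the column has no non-null values at all.
    """
    modes = series.mode()  # may be empty in some pandas versions if no value repeats
    if len(modes) > 0:
        return modes.iloc[0]  # positional access, independent of index labels
    non_null = series.dropna()
    if len(non_null) > 0:
        return non_null.iloc[0]
    return None


# A column with all-distinct values plus one missing entry.
col = pd.Series(["b", "c", "a", None])
filled = col.fillna(safe_mode(col))
print(filled.tolist())
```

The exact value used to fill the gap may differ between pandas versions, but the call no longer raises for an all-distinct column.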

@rhiever
Owner

rhiever commented Jan 12, 2017

Can you please share a minimal example that reproduces this issue?

@fndjjx
Contributor Author

fndjjx commented Jan 13, 2017

For example, create data.csv with these contents:
a,b,c,d
b,b,b,b
c,c,c,c
a,a,a,a

and test.py like this:

from datacleaner import autoclean
import pandas as pd

raw_data = pd.read_csv("data.csv")
clean_data = autoclean(raw_data)
clean_data.to_csv("new_data.csv", sep=',', index=False)

then run it, which gives the following error:

ly@ly-VirtualBox:/tmp$ python test.py 
Traceback (most recent call last):
  File "/home/ly/anaconda3/lib/python3.5/site-packages/pandas/core/nanops.py", line 100, in f
    result = alt(values, axis=axis, skipna=skipna, **kwds)
  File "/home/ly/anaconda3/lib/python3.5/site-packages/pandas/core/nanops.py", line 319, in nanmedian
    values = values.astype('f8')
ValueError: could not convert string to float: 'a'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ly/anaconda3/lib/python3.5/site-packages/pandas/core/nanops.py", line 103, in f
    result = alt(values, axis=axis, skipna=skipna, **kwds)
  File "/home/ly/anaconda3/lib/python3.5/site-packages/pandas/core/nanops.py", line 319, in nanmedian
    values = values.astype('f8')
ValueError: could not convert string to float: 'a'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ly/anaconda3/lib/python3.5/site-packages/datacleaner/datacleaner.py", line 77, in autoclean
    input_dataframe[column].fillna(input_dataframe[column].median(), inplace=True)
  File "/home/ly/anaconda3/lib/python3.5/site-packages/pandas/core/generic.py", line 5310, in stat_func
    numeric_only=numeric_only)
  File "/home/ly/anaconda3/lib/python3.5/site-packages/pandas/core/series.py", line 2245, in _reduce
    return op(delegate, skipna=skipna, **kwds)
  File "/home/ly/anaconda3/lib/python3.5/site-packages/pandas/core/nanops.py", line 44, in _f
    return f(*args, **kwargs)
  File "/home/ly/anaconda3/lib/python3.5/site-packages/pandas/core/nanops.py", line 111, in f
    raise TypeError(e)
TypeError: could not convert string to float: 'a'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ly/anaconda3/lib/python3.5/site-packages/pandas/indexes/base.py", line 1980, in get_value
    tz=getattr(series.dtype, 'tz', None))
  File "pandas/index.pyx", line 103, in pandas.index.IndexEngine.get_value (pandas/index.c:3332)
  File "pandas/index.pyx", line 111, in pandas.index.IndexEngine.get_value (pandas/index.c:3035)
  File "pandas/index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas/index.c:4018)
  File "pandas/hashtable.pyx", line 303, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:6610)
  File "pandas/hashtable.pyx", line 309, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:6554)
KeyError: 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test.py", line 6, in <module>
    clean_data = autoclean(raw_data)
  File "/home/ly/anaconda3/lib/python3.5/site-packages/datacleaner/datacleaner.py", line 85, in autoclean
    input_dataframe[column].fillna(input_dataframe[column].mode()[0], inplace=True)
  File "/home/ly/anaconda3/lib/python3.5/site-packages/pandas/core/series.py", line 583, in __getitem__
    result = self.index.get_value(self, key)
  File "/home/ly/anaconda3/lib/python3.5/site-packages/pandas/indexes/base.py", line 1986, in get_value
    return tslib.get_value_box(s, key)
  File "pandas/tslib.pyx", line 777, in pandas.tslib.get_value_box (pandas/tslib.c:17017)
  File "pandas/tslib.pyx", line 793, in pandas.tslib.get_value_box (pandas/tslib.c:16774)
IndexError: index out of bounds
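The traceback shows the control flow inside autoclean: it first calls median() (datacleaner.py line 77), which raises TypeError on a string column, and then falls back to mode()[0] (line 85), which fails because mode() is empty. Note that data.csv has no missing values at all, yet the statistic is still computed. The following is a hypothetical rework of that step, not datacleaner's actual fix: `fill_missing` is an illustrative name, it chooses median or mode by dtype instead of catching exceptions, skips columns with nothing to fill, and assigns the filled column back rather than calling fillna with inplace=True on a column selection. The input is built with io.StringIO so it matches data.csv without touching the filesystem.

```python
import io

import pandas as pd
from pandas.api.types import is_numeric_dtype

# Same contents as data.csv in the report: every column holds three distinct values.
CSV = "a,b,c,d\nb,b,b,b\nc,c,c,c\na,a,a,a\n"


def fill_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical fill step: median for numeric columns, mode otherwise.

    Unlike the code in the traceback, it never indexes an empty mode()
    result, and it skips columns that have nothing to fill.
    """
    out = df.copy()
    for column in out.columns:
        series = out[column]
        if not series.isna().any():
            continue  # nothing to fill, so no statistic is needed
        if is_numeric_dtype(series):
            fill_value = series.median()
        else:
            modes = series.mode()
            if len(modes) == 0:
                continue  # no usable mode; leave this column unchanged
            fill_value = modes.iloc[0]
        out[column] = series.fillna(fill_value)  # assign back instead of inplace
    return out


raw_data = pd.read_csv(io.StringIO(CSV))
clean_data = fill_missing(raw_data)
print(clean_data)
```

Checking the dtype up front avoids relying on which exception a failed median() raises; in the traceback above, a ValueError is re-raised as TypeError before the mode fallback runs.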

@rhiever
Owner

rhiever commented Jan 13, 2017

That does indeed seem like a bug, albeit a strange one! Can you send a PR with a patch to fix it?

@rhiever
Owner

rhiever commented Jan 18, 2017

Merged the PR - thanks for your help!

@rhiever rhiever closed this as completed Jan 18, 2017
@rhiever
Owner

rhiever commented Jan 18, 2017

datacleaner v0.1.5 has your changes.
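Since the fix shipped in datacleaner v0.1.5, one way to tell whether an environment already has it is to compare the installed version. A minimal sketch, assuming Python 3.8+ (for importlib.metadata) and that the package was installed under the distribution name `datacleaner`; `parse_version` and `has_fix` are hypothetical helpers that only handle plain numeric versions such as 0.1.5.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text: str) -> tuple:
    """Turn a plain release string such as '0.1.5' into a comparable tuple of ints.

    Assumes purely numeric, dot-separated releases; pre-release tags are not handled.
    """
    return tuple(int(part) for part in text.split("."))


def has_fix(installed: str, fixed_in: str = "0.1.5") -> bool:
    """True if `installed` is at or after the release that included the fix."""
    return parse_version(installed) >= parse_version(fixed_in)


try:
    installed = version("datacleaner")
except PackageNotFoundError:
    installed = None  # package not present in this environment

if installed is None:
    print("datacleaner is not installed")
else:
    print(installed, "includes the fix" if has_fix(installed) else "predates the fix; upgrade")
```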

@yw2817

yw2817 commented Nov 6, 2020

Hi, just checking in: was this issue solved? I have exactly the same bug as you. How did you address it in the end? Many thanks! @fndjjx
