Integrate unit tests #5

Closed
rhiever opened this issue Mar 3, 2016 · 4 comments

Comments

@rhiever
Owner

rhiever commented Mar 3, 2016

Test both autoclean() and autoclean_cv(), each with 5 test cases:

  1. Simulated data, no NaNs, all columns numerical

  2. Simulated data, with NaNs, all columns numerical

  3. Simulated data, no NaNs, some columns with strings

  4. Simulated data, with NaNs, some columns with strings

  5. Real data (adult.csv.gz) with some NaNs placed into it
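The four simulated cases above could be generated along these lines. This is only a rough sketch: `simulate_rows`, `count_nans`, the column names, and the roughly 10% NaN rate are all made up for illustration, and it uses plain lists of dicts so it runs without extra dependencies. Real tests would presumably build pandas DataFrames and pass them to autoclean() and autoclean_cv(). The fifth case needs the actual adult.csv.gz file, so it is left out here.

```python
import math
import random


def simulate_rows(n_rows=50, with_nans=False, with_strings=False, seed=0):
    """Hypothetical helper: build simulated rows for one test case."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        # Two numerical columns (names are illustrative only).
        row = {"age": float(rng.randint(18, 90)), "score": rng.random()}
        if with_strings:
            # One string-valued column for the "some columns with strings" cases.
            row["color"] = rng.choice(["red", "green", "blue"])
        rows.append(row)
    if with_nans:
        # Blank out roughly 10% of the numerical cells.
        for row in rows:
            for col in ("age", "score"):
                if rng.random() < 0.1:
                    row[col] = math.nan
        # Guarantee at least one missing value regardless of the seed.
        rows[0]["age"] = math.nan
    return rows


def count_nans(rows):
    """Count NaN cells across all rows."""
    return sum(
        1
        for row in rows
        for value in row.values()
        if isinstance(value, float) and math.isnan(value)
    )


# Test cases 1-4 from the list above, keyed by a descriptive name.
SIMULATED_CASES = {
    "no_nans_numerical": {"with_nans": False, "with_strings": False},
    "nans_numerical": {"with_nans": True, "with_strings": False},
    "no_nans_strings": {"with_nans": False, "with_strings": True},
    "nans_strings": {"with_nans": True, "with_strings": True},
}
```

Each entry in `SIMULATED_CASES` can then drive one test for autoclean() and one for autoclean_cv(), which keeps the two sets of tests in step.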

@myselfHimanshu

I simulated a dataset with NaNs in which all columns are numerical.
The dataset has an 'Age' feature, which is a continuous variable, so its missing values should be filled with the median.
But when I run the script with autoclean(), the missing values are filled with the mode of the column.
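To make the difference concrete, here is a small sketch of median versus mode filling on a made-up 'Age' column. The values are hypothetical and are not taken from the attached test.csv, and `fill_missing` is an illustrative helper, not part of datacleaner.

```python
import math
import statistics

# Hypothetical 'Age' values with two missing entries.
ages = [22.0, 22.0, 35.0, 41.0, 58.0, math.nan, 63.0, math.nan]

# Compute the candidate fill values from the observed entries only.
observed = [a for a in ages if not math.isnan(a)]
median_fill = statistics.median(observed)
mode_fill = statistics.mode(observed)


def fill_missing(values, fill_value):
    """Illustrative helper: replace every NaN with fill_value."""
    return [fill_value if math.isnan(v) else v for v in values]


filled_with_median = fill_missing(ages, median_fill)
filled_with_mode = fill_missing(ages, mode_fill)

print("median fill:", median_fill)  # 38.0
print("mode fill:", mode_fill)      # 22.0
```

With these values the mode pulls the filled entries down to the most frequent age (22.0), while the median (38.0) sits in the middle of the observed range, which is why the median seems the better default for a continuous column.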

@rhiever
Owner Author

rhiever commented Mar 5, 2016

Can you please share the data or the code that generated that data?

@myselfHimanshu

I have attached the code (Datacleaner.py) and the dataset (test.csv).
dataclean.zip

rhiever added a commit that referenced this issue Mar 5, 2016
rhiever added a commit that referenced this issue Mar 6, 2016
Related to #5
rhiever added a commit that referenced this issue Mar 6, 2016
rhiever added a commit that referenced this issue Mar 6, 2016
@rhiever rhiever closed this as completed in 9032894 Mar 6, 2016
@rhiever
Owner Author

rhiever commented Mar 6, 2016

@myselfHimanshu, please give the latest version (datacleaner-0.1.4) a try and see if that has corrected your issue. While building the unit tests, I found a bug in the software and have since corrected it in the latest release. You can get the latest version of datacleaner with the following command:

pip install --upgrade datacleaner
