[MRG+2] Basic version of MICE Imputation #8478

Merged 170 commits on Apr 16, 2018
Commits
931f938
initial commit
sergeyf Feb 27, 2017
87da5df
init bug fix
sergeyf Feb 27, 2017
e576400
fixing pep8 errors
sergeyf Feb 27, 2017
164ba16
more pep8 fixes
sergeyf Feb 27, 2017
9f0b3c9
reseting diabetes as default in examples
sergeyf Feb 27, 2017
3747ebb
fixing build failures
sergeyf Feb 27, 2017
70d0309
fixing error for _statistics in Imputer
sergeyf Feb 27, 2017
b7456f6
fixing failed test by skipping MICEImputer
sergeyf Feb 28, 2017
a6892f9
fixing circular import issue. Questionable style?
sergeyf Feb 28, 2017
bae44ef
one flake left
sergeyf Feb 28, 2017
5416534
initial commit
sergeyf Feb 27, 2017
e7ab5ce
init bug fix
sergeyf Feb 27, 2017
2c64972
fixing pep8 errors
sergeyf Feb 27, 2017
dfb3c77
more pep8 fixes
sergeyf Feb 27, 2017
f5ccaee
fixing build failures
sergeyf Feb 27, 2017
020d121
fixing error for _statistics in Imputer
sergeyf Feb 27, 2017
c296da2
fixing failed test by skipping MICEImputer
sergeyf Feb 28, 2017
ac61666
fixing circular import issue. Questionable style?
sergeyf Feb 28, 2017
84a3552
one flake left
sergeyf Feb 28, 2017
d1bac87
removing errant commit?
Feb 28, 2017
db25789
fixing some requests
sergeyf Feb 28, 2017
ef9f19a
Fixing the single flake issue
sergeyf Mar 1, 2017
312c83c
addressing many requested changes + dummpy.py additional test
sergeyf Mar 7, 2017
ef76f3f
Merge branch 'mice' of https://github.com/sergeyf/scikit-learn into mice
sergeyf Mar 7, 2017
1bd909c
removing the word 'column' from everywhere plus some feature renaming…
sergeyf Mar 7, 2017
fd02840
minibug
sergeyf Mar 7, 2017
1eca006
minibug
sergeyf Mar 7, 2017
4a69c7d
check that model's predict supports return_std
sergeyf Mar 7, 2017
e6c2d11
fixing some requests
sergeyf Feb 28, 2017
4e64083
Fixing the single flake issue
sergeyf Mar 1, 2017
c529634
addressing many requested changes + dummpy.py additional test
sergeyf Mar 7, 2017
bbe74b7
removing the word 'column' from everywhere plus some feature renaming…
sergeyf Mar 7, 2017
c3ea941
minibug
sergeyf Mar 7, 2017
5838e1f
minibug
sergeyf Mar 7, 2017
e1376f6
check that model's predict supports return_std
sergeyf Mar 7, 2017
a7c5f6b
Merge branch 'mice' of https://github.com/sergeyf/scikit-learn into mice
sergeyf Aug 20, 2017
5f4406a
fixing some requests
sergeyf Feb 28, 2017
e5ce12e
Fixing the single flake issue
sergeyf Mar 1, 2017
0bfceae
addressing many requested changes + dummpy.py additional test
sergeyf Mar 7, 2017
9e1ae43
removing the word 'column' from everywhere plus some feature renaming…
sergeyf Mar 7, 2017
195fbdc
minibug
sergeyf Mar 7, 2017
a082e29
minibug
sergeyf Mar 7, 2017
c9fac3e
check that model's predict supports return_std
sergeyf Mar 7, 2017
f6aded8
fixing some requests
sergeyf Feb 28, 2017
eb93105
Fixing the single flake issue
sergeyf Mar 1, 2017
a9fd034
addressing many requested changes + dummpy.py additional test
sergeyf Mar 7, 2017
3aaa4dd
removing the word 'column' from everywhere plus some feature renaming…
sergeyf Mar 7, 2017
51d1407
check that model's predict supports return_std
sergeyf Mar 7, 2017
bb32770
fixing merge conflict
sergeyf Aug 20, 2017
b47b00e
fixing merge conflict with upstream master
sergeyf Aug 20, 2017
2fd937f
fixing bug
sergeyf Aug 20, 2017
5919cca
adding a few tests
sergeyf Aug 20, 2017
a92ebae
more test coverage
sergeyf Aug 21, 2017
2ac2500
fixing flakes
sergeyf Aug 21, 2017
150ed6d
slight test expansion
sergeyf Aug 21, 2017
bcd1f6e
changing name correct name
sergeyf Aug 22, 2017
b6d7aa0
lots of doc fixes, as per reviewer comments
sergeyf Aug 30, 2017
d6cebda
lots of doc fixes, as per reviewer comments
sergeyf Aug 30, 2017
a5ca5f9
spurious random_state
sergeyf Aug 30, 2017
6b7c513
nearest bugfix
sergeyf Aug 30, 2017
3fe8481
fixing None/nan issue
sergeyf Aug 30, 2017
8c991b1
trying to fix tests again
sergeyf Aug 30, 2017
23a0201
fixing mutability error
sergeyf Aug 30, 2017
e97e618
Merge branch 'master' into mice
sergeyf Sep 22, 2017
2c4d8f9
addressing some reviewer comments
Oct 31, 2017
79146ee
minifix
Oct 31, 2017
16090ea
minifix v2
Oct 31, 2017
55fa426
addressing more comments
Oct 31, 2017
8986020
trying to avoid strange test failure
Oct 31, 2017
961e298
i'm baffled
Oct 31, 2017
6e23dd7
another fix attempt
Oct 31, 2017
50aa24d
printing debug statement
Oct 31, 2017
c15d46a
sigma nan fix
Oct 31, 2017
da365e4
more attempts to avoid odd test failures
Nov 1, 2017
afa698b
should be fixed now
Nov 2, 2017
9d3fed8
renaming _inds to _idx
Nov 2, 2017
4070baf
Updating Y for MICE test
sergeyf Nov 8, 2017
57d128c
addressing some reviewer comments
Nov 9, 2017
4c75546
making estimator=None more explicit
Nov 9, 2017
aec7e70
fixing pep8
Nov 9, 2017
3b5057c
more pep8
Nov 9, 2017
7a08fca
missed comment
Nov 9, 2017
d63da48
fixing bug and flaky test
Nov 9, 2017
89e2cfe
flakes
Nov 9, 2017
3c27090
fixing init errors
Nov 9, 2017
e0912be
nits
Nov 9, 2017
0e0add8
added dummyregressor back in
Nov 9, 2017
a1c9613
fixing edge case for sampling without replacement
Nov 9, 2017
1d78534
flake
Nov 10, 2017
b4089c9
inflight sparse support
Nov 12, 2017
5877b24
fixing merge error
Nov 12, 2017
f7dc80d
Merge remote-tracking branch 'upstream/master' into mice
Nov 12, 2017
28e8aa1
reverting some changes
Nov 12, 2017
b38cbc2
fixing pep8 error
Nov 12, 2017
892ba0c
flakes
Nov 12, 2017
d432bd4
lots of reviewer comments and new tests
Nov 13, 2017
ef901cf
avoiding test failuire?
Nov 13, 2017
a8dcdbf
more robust
Nov 13, 2017
06d149a
adding two more tests
Nov 16, 2017
e4287c2
fixing comments
Nov 16, 2017
a52c199
make tests easier
Nov 16, 2017
659a76c
making test tougher again but with a random seed
Nov 16, 2017
f3c2815
Merge branch 'master' of git://github.com/scikit-learn/scikit-learn
Nov 16, 2017
0c321db
resolving merge conflict
Nov 16, 2017
df1152b
pep8 fixes
Nov 16, 2017
6600655
fixing seed for other test
Nov 17, 2017
677ede4
triggering tests
Nov 17, 2017
e1a20c3
fixing merge message
Nov 23, 2017
8952e6f
fixing merge message
Nov 23, 2017
93d93cf
updating test
Nov 23, 2017
9717276
lots of reviewer comments
Dec 27, 2017
1dc20e5
fixing test errors
Dec 27, 2017
5e1f535
a few minor updates plus a clipping test
Dec 27, 2017
fd9dd12
adding user guide snippet
Dec 27, 2017
cf8bdb3
fixing random state in rst file
Dec 27, 2017
9d9f794
minibug
Dec 27, 2017
3173575
minibug
Dec 27, 2017
bbdff73
addressing latest comments
Jan 15, 2018
0a79f05
Removing redundant tests
sergeyf Jan 16, 2018
c602b27
trying to fix mysterious test failure
Jan 17, 2018
12b2a86
trying to fix mysterious test failure
Jan 17, 2018
be19222
merging
Jan 17, 2018
4ed3668
Merge remote-tracking branch 'upstream/master'
sergeyf Jan 22, 2018
2d13128
merging
sergeyf Jan 22, 2018
135bd86
check_array change to comply with chages #10459
sergeyf Jan 22, 2018
062e777
removing test that no longer applies: test_imputation_mean_median_onl…
sergeyf Jan 22, 2018
f229aa2
Merge branch 'master' into mice
sergeyf Feb 12, 2018
9793ace
Adding imports back in.
sergeyf Feb 12, 2018
8350981
Fixing change to Imputer to get tests to pass
sergeyf Feb 12, 2018
4abdbb0
Fixing flake
sergeyf Feb 12, 2018
8b415b9
Fixing merge mistake
sergeyf Feb 12, 2018
3e2786e
addressing comments
Feb 13, 2018
15aa6e2
a truly silly bug
Feb 13, 2018
d72bd9f
fix bug with allow nan
Feb 13, 2018
4e0d9ac
allow broader dtypes for check_array
Feb 13, 2018
386e311
Merge branch 'master' into mice
sergeyf Feb 14, 2018
e13b62a
Undoing change to Imputer
sergeyf Feb 14, 2018
2cb176c
Undoing `_axis` changes
sergeyf Feb 14, 2018
e2a5428
Removing deprecation test
sergeyf Feb 14, 2018
e876810
Merge branch 'master' into mice
sergeyf Feb 14, 2018
781029b
Flake fix
sergeyf Feb 15, 2018
2c95818
merging upstream
sergeyf Feb 15, 2018
8303987
merging upstream
sergeyf Feb 15, 2018
58a6315
merging
Feb 16, 2018
3cd5fd6
Expanding check_array nan
sergeyf Feb 27, 2018
e6fcd21
Merge branch 'master' into mice
sergeyf Feb 27, 2018
48dbcd6
refactoring into impute.py
sergeyf Feb 28, 2018
288e7c0
flake
sergeyf Feb 28, 2018
63f21ea
addressing review
Mar 2, 2018
4636124
missing fix
Mar 2, 2018
24c8047
fix?
Mar 2, 2018
d0b0d4a
flaaaake!
Mar 2, 2018
f06945b
fixing nits
sergeyf Mar 15, 2018
f4af676
fixing bug + estimator_checks
sergeyf Mar 15, 2018
77feb5a
minibug
sergeyf Mar 15, 2018
63e7b71
Merge branch 'master' into mice
glemaitre Mar 29, 2018
a014e66
Merge branch 'master' into mice
sergeyf Mar 30, 2018
eaa13ff
Fixing spacing test failure in impute.rst
sergeyf Mar 30, 2018
1b9b861
removing axis where necessary
sergeyf Mar 30, 2018
afe536b
flake
sergeyf Mar 30, 2018
ae25023
Remove '# doctest: +ELLIPSIS' from impute.rst
sergeyf Apr 3, 2018
7963763
Deleting caveat in impute.rst
sergeyf Apr 3, 2018
3b68471
Updating impute.rst
sergeyf Apr 11, 2018
e22ff37
Loosen test `test_mice_transform_recovery`
sergeyf Apr 11, 2018
baa437c
merge fix
Apr 11, 2018
8b0f6c7
merge fix
Apr 11, 2018
a851ab3
reverting dummy.py changes
Apr 11, 2018
a3a98f0
variable rename
Apr 11, 2018
dfc33e5
Changing rank in `test_mice_transform_recovery` and hoping for test p…
sergeyf Apr 14, 2018
1c38fd6
Added random_state to all MICE tests
sergeyf Apr 14, 2018
+11 −6
more pep8 fixes

sergeyf committed Feb 27, 2017
commit 164ba16558b888240d2c2e8ca1a22498e9cda2c3
@@ -3,9 +3,10 @@
 Imputing missing values before building an estimator
 ====================================================
-This example shows that imputing the missing values can give better results
-than discarding the samples containing any missing value.
-Imputing does not always improve the predictions, so please check via cross-validation.
+This example shows that imputing the missing values can give
+better results than discarding the samples containing any missing value.
+Imputing does not always improve the predictions,
+so please check via cross-validation.
 Sometimes dropping rows or using marker values is more effective.
 Missing values can be replaced by the mean, the median or the most frequent
@@ -28,8 +29,8 @@
 In this case, imputing helps the classifier match the original score.
 Note that MICE will not always be better than, e.g., simple mean imputation.
-To see an example of this, swap out ``load_diabetes()`` for ``load_boston``.
+To see an example of this, swap in ``diabetes`` for ``boston``.
 """
 import numpy as np

@@ -41,7 +42,11 @@

 rng = np.random.RandomState(0)

-dataset = load_diabetes() # load_boston() for another example
+dataset_name = 'boston' # 'diabetes' for another examples
+if dataset_name == 'boston':
+    dataset = load_boston()
+elif dataset_name == 'diabetes':
+    dataset = load_diabetes()
 X_full, y_full = dataset.data, dataset.target
 n_samples = X_full.shape[0]
 n_features = X_full.shape[1]
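
The docstring in this diff advises checking by cross-validation whether imputing beats discarding incomplete samples. A minimal NumPy-only sketch of that comparison follows. It is not the example file from this PR and does not use `MICEImputer` or any scikit-learn API: the data is synthetic, the model is plain least squares, and `cv_mse` is a hypothetical helper introduced only for illustration. Column means are computed once over all rows for brevity, whereas a pipeline would refit them inside each fold.

```python
import numpy as np

rng = np.random.RandomState(0)

# Synthetic regression data (an assumption; not the diabetes or boston data).
n_samples, n_features = 200, 5
X_full = rng.normal(size=(n_samples, n_features))
y_full = X_full @ rng.normal(size=n_features) + 0.1 * rng.normal(size=n_samples)

# Blank out one randomly chosen feature in 75% of the rows.
n_missing = int(0.75 * n_samples)
missing_rows = rng.choice(n_samples, n_missing, replace=False)
missing_cols = rng.randint(0, n_features, n_missing)
X_missing = X_full.copy()
X_missing[missing_rows, missing_cols] = np.nan


def cv_mse(X, y, n_folds=5):
    """Mean squared error of ordinary least squares, averaged over k folds."""
    folds = np.array_split(np.arange(X.shape[0]), n_folds)
    errors = []
    for test_idx in folds:
        train_mask = np.ones(X.shape[0], dtype=bool)
        train_mask[test_idx] = False
        # Append a constant column so the fit includes an intercept.
        A_train = np.column_stack([X[train_mask], np.ones(train_mask.sum())])
        A_test = np.column_stack([X[test_idx], np.ones(len(test_idx))])
        coef, *_ = np.linalg.lstsq(A_train, y[train_mask], rcond=None)
        errors.append(np.mean((A_test @ coef - y[test_idx]) ** 2))
    return float(np.mean(errors))


# Strategy 1: discard every sample that contains a missing value.
complete = ~np.isnan(X_missing).any(axis=1)
X_drop, y_drop = X_missing[complete], y_full[complete]

# Strategy 2: replace each missing entry with its column mean.
col_means = np.nanmean(X_missing, axis=0)
X_imputed = np.where(np.isnan(X_missing), col_means, X_missing)

mse_drop = cv_mse(X_drop, y_drop)
mse_impute = cv_mse(X_imputed, y_full)
print("rows kept after dropping:", X_drop.shape[0])
print("CV MSE, dropped rows:  %.4f" % mse_drop)
print("CV MSE, mean imputed:  %.4f" % mse_impute)
```

Neither strategy is assumed to win here. As the docstring says, the outcome depends on the dataset, so the two scores should be compared rather than presumed.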