Conversation

glemaitre
Member

@glemaitre glemaitre commented Jun 15, 2017

Reference Issue

closes #253

What does this implement/fix? Explain your changes.

Any other comments?

  • Over-sampling
  • Under-sampling
  • Combination of algorithms
  • Ensemble of algorithms
  • Datasets module
  • Application example:
    • Text classification or something like that.

@pep8speaks

pep8speaks commented Jun 15, 2017

Hello @glemaitre! Thanks for updating the PR.

  • In the file doc/conf.py, the following are the PEP8 issues:

Line 30:1: E722 do not use bare 'except'
Line 41:1: E402 module level import not at top of file
Line 311:80: E501 line too long (86 > 79 characters)
Line 338:80: E501 line too long (84 > 79 characters)

Comment last updated on August 11, 2017 at 23:05 Hours UTC

@codecov

codecov bot commented Jun 15, 2017

Codecov Report

Merging #295 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #295   +/-   ##
=======================================
  Coverage   98.33%   98.33%           
=======================================
  Files          66       66           
  Lines        3848     3848           
=======================================
  Hits         3784     3784           
  Misses         64       64
Impacted Files Coverage Δ
imblearn/ensemble/easy_ensemble.py 100% <ø> (ø) ⬆️
...g/prototype_selection/edited_nearest_neighbours.py 100% <ø> (ø) ⬆️
.../under_sampling/prototype_selection/tomek_links.py 100% <ø> (ø) ⬆️
imblearn/pipeline.py 97.8% <ø> (ø) ⬆️
...mpling/prototype_selection/random_under_sampler.py 100% <ø> (ø) ⬆️
imblearn/metrics/classification.py 96.77% <ø> (ø) ⬆️
imblearn/ensemble/balance_cascade.py 100% <ø> (ø) ⬆️
imblearn/over_sampling/random_over_sampler.py 100% <ø> (ø) ⬆️
...sampling/prototype_generation/cluster_centroids.py 100% <ø> (ø) ⬆️
imblearn/combine/smote_tomek.py 100% <ø> (ø) ⬆️
... and 9 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e9c2756...98e920e. Read the comment docs.

@glemaitre
Member Author

@chkoar @massich Would you have time to split the task? It is huge.
I would think that ensemble and combination methods are not so hard to explain.

For the moment it is how it looks:
https://699-36019880-gh.circle-artifacts.com/0/home/ubuntu/imbalanced-learn/doc/_build/html/user_guide.html

What needs to be done carefully is to design an example whenever we need some images, such that we generate them automatically. Then we need cross-referencing from the User Guide to the API doc, and as well from the API doc to the examples.

Let me know if you would have some time to put on that.

@glemaitre
Member Author

@pep8speaks

@glemaitre glemaitre changed the title [WIP] User Guide [MRG] User Guide Aug 9, 2017
@glemaitre
Member Author

@massich @chkoar @mrastgoo Can you review it so that we can get it done?

@chkoar
Member

chkoar commented Aug 9, 2017

Could you provide us the link for the artifacts?

@glemaitre
Member Author

doc/combine.rst Outdated

We previously presented :class:`SMOTE` and showed that this method can generate
noisy samples by interpolating new points between marginal outliers and
inliers. This issue can be solved by cleaning the resulting space obtained

cleaning the resulting space obtained from / after over-sampling.

@chkoar
Copy link
Member

chkoar commented Aug 9, 2017

In general this work is quite good. We can merge it in order to enhance the documentation, and correct things afterwards. Some notes:

  • I think that all figures could be smaller.
  • I would place a title on all images (probably under the plots). For instance, what do we see here without reading the text?

Thanks @glemaitre

------------------

While the :class:`RandomOverSampler` is over-sampling by repeating some of the
original samples, :class:`SMOTE` and :class:`ADASYN` generate new samples in by

I would say:

by duplicating some of the original samples of the minority class
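
To illustrate the distinction being discussed, here is a toy sketch in plain Python (not the imblearn implementation; the data is made up): random over-sampling only duplicates existing minority samples, so no new point is ever created.

```python
import random
from collections import Counter

random.seed(0)
X = [[i] for i in range(8)]
y = [0] * 6 + [1] * 2            # 6 majority vs 2 minority samples

# Random over-sampling: draw minority samples with replacement until the
# class counts match -- every added point is an exact copy of an old one.
minority = [xi for xi, yi in zip(X, y) if yi == 1]
extra = random.choices(minority, k=4)
X_res, y_res = X + extra, y + [1] * 4
print(Counter(y_res))            # → Counter({0: 6, 1: 6})
```

SMOTE and ADASYN instead synthesize points that did not exist in the original data, by interpolating between neighbors.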


@massich massich left a comment


I would move the 1.1 and make it a 0. where we state the problem.

If I land in under-sampling, the example of what I'm trying to understand is under the title of over-sampling, so I won't be able to find it. However, if in the index there's a "0 - The problem of imbalance", it's more likely that I realize that I might want to read that first.

I didn't find this ("It is also possible to bootstrap the data when resampling by setting replacement to True.") but I would change it to: "...resampling using bootstrap by setting..." and I would actually not use "it is also possible" but "RandomUnderSampler allows"

generated considering its k neareast-neighbors (corresponding to
``k_neighbors``). For instance, the 3 nearest-neighbors are included in the
blue circle as illustrated in the figure below. Then, one of these
nearest-neighbors :math:`x_{zi}` will be selected and a sample will be

Using "will" in the previous sentence is fine, because it is something that appears as a consequence. But here it confuses me and I had to read it twice (actually more :P). I would say:
"is selected and the new sample is generated as follows:"
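
For reference, the interpolation step being discussed can be sketched in a few lines of plain Python (a toy sketch, not the imblearn implementation; the name `smote_sample` and the data are made up):

```python
import random

def smote_sample(x_i, neighbors, rng=random):
    """SMOTE interpolation step: pick one of the k nearest neighbors
    x_zi of x_i and return x_new = x_i + lam * (x_zi - x_i), with lam
    drawn uniformly in [0, 1], i.e. a point on the segment joining
    the two samples."""
    x_zi = rng.choice(neighbors)
    lam = rng.random()
    return [a + lam * (b - a) for a, b in zip(x_i, x_zi)]

random.seed(42)
x_i = [1.0, 1.0]
neighbors = [[2.0, 1.0], [1.0, 2.0], [2.0, 2.0]]  # its 3 nearest neighbors
x_new = smote_sample(x_i, neighbors)
# x_new falls inside the box spanned by x_i and its neighbors
```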


###############################################################################
# The algorithm performing prototype selection can be subdivided into two
# groups: (i) the controlled unde-sampling methods and (ii) the cleaning

under-sampling (misses an r)

doc/combine.rst Outdated
.. currentmodule:: imblearn.combine

In this regard, Tomek's link and edited nearest-neighbours are the two cleaning
methods which have been pipeline after SMOTE over-sampling to obtain a cleaner

which have can be added to the pipeline

doc/combine.rst Outdated
pipeline both over- and under-sampling methods: (i) :class:`SMOTETomek`
and (ii) :class:`SMOTEENN`.

Those two classes can be used as any other sampler with identical parameters

These

doc/combine.rst Outdated
>>> print(Counter(y_resampled))
Counter({1: 4566, 0: 4499, 2: 4413})

We can also see in the example below that :class:`SMOTEENN` tend to clean more

tends


.. currentmodule:: imblearn.datasets

The ``imblearn.datasets`` package is complementing the the

one extra "the"

.. currentmodule:: imblearn.datasets

The ``imblearn.datasets`` package is complementing the the
``sklearn.datasets`` package. The package provide both: (i) a set of

provides

Controlled under-sampling techniques
------------------------------------

:class:`RandomUnderSampler` is a fast and easy to balance the data by randomly

easy way to balance
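
The idea behind random under-sampling (including the `replacement` parameter quoted elsewhere in this thread) can be sketched with the stdlib alone; this is a hypothetical helper, not the imblearn implementation:

```python
import random
from collections import Counter

def random_under_sample(X, y, replacement=False, seed=0):
    """Sketch of random under-sampling: keep every minority sample and
    draw the same number of samples from each other class at random."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    n_min = min(len(v) for v in by_class.values())
    X_res, y_res = [], []
    for label, samples in by_class.items():
        if replacement:                 # bootstrap: draw with replacement
            picked = rng.choices(samples, k=n_min)
        else:                           # plain subsampling, no duplicates
            picked = rng.sample(samples, n_min)
        X_res.extend(picked)
        y_res.extend([label] * n_min)
    return X_res, y_res

X = [[i] for i in range(12)]
y = [0] * 9 + [1] * 3                   # 9 majority vs 3 minority samples
X_res, y_res = random_under_sample(X, y)
print(Counter(y_res))                   # → Counter({0: 3, 1: 3})
```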


As later stated in the next section, :class:`NearMiss` heuristic rules are
based on nearest neighbors algorithm. Therefore, the parameters ``n_neighbors``
and ``n_neighbors_ver3`` accepts classifier derived from ``KNeighborsMixin``

accept ?

will be selected. NearMiss-2 will not have this effect since it does not focus
on the nearest samples but rather on the farthest samples. We can imagine that
the presence of noise can also altered the sampling mainly in the presence of
marginal outliers. NearMiss-3 is probably the version which will be the less

which will be less affected

In the contrary, :class:`OneSidedSelection` will use :class:`TomekLinks` to
remove noisy samples. In addition, more samples will be kept since it will not
iterate over the samples of the majority class but all samples which do not
agree with the 1 nearest neighbor rule will be added at once. The class can be

Two sentences, maybe; "will be added at once" at the end of the sentence doesn't make sense.

This class has 2 important parameters. ``estimator`` will accept any
scikit-learn classifier which has a method ``predict_proba``. The classifier
training is performed using a cross-validation and the parameter ``cv`` can set
the number of fold to use.

folds

@glemaitre
Member Author

resampling using bootstrap by setting

bootstrap is a verb which means resampling with replacement. So I would be inclined to use:

RandomUnderSampler allows to bootstrap the data by setting ....

@glemaitre
Member Author

@chkoar @massich I added the backreferencing of sphinx-gallery. I think this is good for merging and nitpicking can come in another PR.

I'll let you do the merging if you agree.

@glemaitre glemaitre force-pushed the master branch 2 times, most recently from 1b22868 to 33660d4 Compare August 11, 2017 14:43
@chkoar chkoar merged commit ca5452c into scikit-learn-contrib:master Aug 12, 2017
@glemaitre
Member Author

Finally we got a user guide :D

Successfully merging this pull request may close these issues.

Create a User Guide