Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(steps): implement DiscretizeKBins transform #57

Merged
merged 5 commits into from
Apr 24, 2024
Merged

feat(steps): implement DiscretizeKBins transform #57

merged 5 commits into from
Apr 24, 2024

Conversation

jitingxu1
Copy link
Collaborator

Added KBinsDiscretizer
It supports:

  • Uniform strategy: Divides data into bins of equal width based on the minimum and maximum values of each column.
  • Quantile strategy: Divides data into bins with an equal number of samples in each bin.

Examples:

import ibis
import ibisml as ml
from ibisml.core import Metadata
ibis.options.interactive = True

p = ibis.examples.penguins.fetch()
step = ml.KBinsDiscretizer(
    ml.numeric(),
    strategy="quantile",
    overwrite=True,
    )
step.fit_table(p, Metadata())
step.transform_table(p)

Outputs:

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━┩
│ string  │ string    │ int8           │ int8          │ int8              │ int8        │ string │ int8 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼──────┤
│ Adelie  │ Torgersen │              1 │             3 │                 0 │           1 │ male   │    0 │
│ Adelie  │ Torgersen │              1 │             2 │                 0 │           1 │ female │    0 │
│ Adelie  │ Torgersen │              1 │             3 │                 2 │           0 │ female │    0 │
│ Adelie  │ Torgersen │           NULL │          NULL │              NULL │        NULL │ NULL   │    0 │
│ Adelie  │ Torgersen │              0 │             4 │                 1 │           0 │ female │    0 │
│ Adelie  │ Torgersen │              1 │             4 │                 1 │           1 │ male   │    0 │
│ Adelie  │ Torgersen │              1 │             2 │                 0 │           1 │ female │    0 │
│ Adelie  │ Torgersen │              1 │             4 │                 2 │           3 │ male   │    0 │
│ Adelie  │ Torgersen │              0 │             3 │                 1 │           0 │ NULL   │    0 │
│ Adelie  │ Torgersen │              1 │             4 │                 1 │           2 │ NULL   │    0 │
│ …       │ …         │              … │             … │                 … │           … │ …      │    … │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴──────

Resolves #44

@jitingxu1 jitingxu1 requested a review from deepyaman April 9, 2024 22:21
@jitingxu1 jitingxu1 mentioned this pull request Apr 11, 2024
1 task
ibisml/steps/discretization.py Outdated Show resolved Hide resolved
ibisml/steps/discretization.py Outdated Show resolved Hide resolved
ibisml/steps/discretization.py Outdated Show resolved Hide resolved
Comment on lines 96 to 100
else:
warnings.warn(
f"No column selected for discretization - {self.inputs!r}.",
stacklevel=2,
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
else:
warnings.warn(
f"No column selected for discretization - {self.inputs!r}.",
stacklevel=2,
)

I don't know that this is necessary, and I don't see any other step with it. The existing behavior is to just skip doing anything if there's no columns selected.

I'd be open to considering a warning across the board, but I think it should be done in a separate issue then.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will create an separate issue for this.

ibisml/steps/discretization.py Outdated Show resolved Hide resolved
tests/test_discretization.py Outdated Show resolved Hide resolved
tests/test_discretization.py Outdated Show resolved Hide resolved
@deepyaman deepyaman changed the title feat(steps): KBinsDiscretizer feat(steps): implement DiscretizeKBins transform Apr 24, 2024
@deepyaman deepyaman dismissed their stale review April 24, 2024 21:08

Changes addressed, and not re-reviewing right now to unblock testing

@deepyaman deepyaman merged commit 54d5d6f into ibis-project:main Apr 24, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feat: K-bins discretization
2 participants