Add noise parameter to LabelEncoder #500

Closed
amontanez24 opened this issue Apr 25, 2022 · 0 comments · Fixed by #503
Labels
feature request Request for a new feature
Problem Description

Most distributions used by GaussianCopula and CopulaGAN fit continuous variables better than discrete ones. To improve the fitting process, we can add noise to the output of the LabelEncoder so that the transformed categorical variables become continuous.

Expected behavior

  • Add a new boolean parameter, add_noise, to __init__ for LabelEncoder.
    • If value is false, no changes should be made
    • If value is true, then
      • On the forward transform: Perform the label encoding as usual, and then add uniform noise so that each value falls within the interval [label, label + 1). For example:
        1. Label=1 is noised to anything in the interval [1, 2) → 1.002, 1.3, 1.9999, ...
        2. Label=2 is noised to anything in the interval [2, 3)
        3. Label=3 is noised to anything in the interval [3, 4)
      • On the reverse transform: Floor the values, i.e. round down to the nearest integer (e.g. 3.9 becomes 3, 4.321 becomes 4), and then continue the normal reverse transformation.
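
The expected behavior above could be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual RDT implementation: the class name `NoisyLabelEncoder`, the `seed` argument, and the choice to number labels from 0 in order of first appearance are all assumptions made for the example. Only the `add_noise` parameter and the noise/floor behavior come from this issue.

```python
import numpy as np


class NoisyLabelEncoder:
    """Hypothetical sketch of a label encoder with an ``add_noise`` flag.

    Not the RDT LabelEncoder; names other than ``add_noise`` are illustrative.
    """

    def __init__(self, add_noise=False, seed=None):
        self.add_noise = add_noise
        self._rng = np.random.default_rng(seed)  # assumption: seeding is optional
        self._value_to_label = {}
        self._label_to_value = {}

    def fit(self, values):
        # Assumption: labels are 0, 1, 2, ... in order of first appearance.
        for value in values:
            if value not in self._value_to_label:
                label = len(self._value_to_label)
                self._value_to_label[value] = label
                self._label_to_value[label] = value
        return self

    def transform(self, values):
        labels = np.array([self._value_to_label[v] for v in values], dtype=float)
        if not self.add_noise:
            return labels  # add_noise=False: plain label encoding, unchanged

        # Uniform noise in [0, 1) moves each label into [label, label + 1).
        noisy = labels + self._rng.uniform(0.0, 1.0, size=labels.shape)
        # Guard against floating-point rounding pushing a value up to label + 1.
        upper = np.nextafter(labels + 1.0, labels)
        return np.minimum(noisy, upper)

    def reverse_transform(self, encoded):
        # Floor recovers the integer label: 3.9 -> 3, 4.321 -> 4.
        labels = np.floor(np.asarray(encoded, dtype=float)).astype(int).tolist()
        return [self._label_to_value[label] for label in labels]


encoder = NoisyLabelEncoder(add_noise=True, seed=0).fit(["a", "b", "c", "a"])
encoded = encoder.transform(["a", "b", "c", "a"])
decoded = encoder.reverse_transform(encoded)
```

Because the noise is drawn from a half-open interval and the reverse transform floors, the round trip returns the original categories regardless of the noise drawn.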