Quantile encoder #303

Merged
merged 32 commits into from Oct 20, 2021
Commits
284f378
#302 quantileEncoder and SummaryEncoder
cmougan May 31, 2021
55e00f4
#302 test for QE and SE - passing
cmougan May 31, 2021
591257d
Quantile Encoder and Summary Encoder
cmougan May 31, 2021
21bbb24
Quantile Encoder and Summary Encoder update docs
cmougan May 31, 2021
da6de04
#302 Quantile Encoder and Summary Encoder update docs
cmougan May 31, 2021
56ca905
doc QE
cmougan Jun 17, 2021
e3ea3e7
remove summary encoder
cmougan Jun 17, 2021
40a8a1c
Update quantile_encoder.py
cmougan Oct 10, 2021
b06f108
remove unnecesary imports
cmougan Oct 10, 2021
c72e73f
qe cosmetic issues
cmougan Oct 10, 2021
828e518
m bio
cmougan Oct 10, 2021
4df3bf1
formatting
cmougan Oct 10, 2021
64d1d5c
summary encoder
cmougan Oct 10, 2021
2032815
summary encoder
cmougan Oct 10, 2021
7a6da5b
e
cmougan Oct 10, 2021
ae7478a
change name Summary Encoder
cmougan Oct 12, 2021
a7bc033
cosmetic docs
cmougan Oct 12, 2021
d9ff993
Merge branch 'master' into quantileEncoder
cmougan Oct 12, 2021
70d46e5
test_summary_quantile
cmougan Oct 12, 2021
3d11c8e
Throw error in case of two quantiles with same percentile
david26694 Oct 12, 2021
3d5c91c
Merge pull request #1 from david26694/quantileEncoder
cmougan Oct 12, 2021
bbe1a15
Refactor summary encoder
david26694 Oct 15, 2021
86173c6
Fix failing tests QE
david26694 Oct 15, 2021
60ddb4f
Add default arguments to SE
david26694 Oct 15, 2021
979c774
Parametrise summary encoder
david26694 Oct 15, 2021
29f7c0d
Add summary encoder in all QE tests
david26694 Oct 15, 2021
741a21e
Merge pull request #2 from david26694/quantileEncoder
cmougan Oct 15, 2021
6618f26
fixed tests for summary encoder
Oct 17, 2021
b4b814f
Merge pull request #3 from PaulWestenthanner/quantileEncoder
cmougan Oct 17, 2021
b9bf00f
add future string to support python3.5 for summary encoder test
cmougan Oct 19, 2021
3292dd1
remove fstring from QE tests
cmougan Oct 19, 2021
c88e53e
handling coinciding quantiles for SE
cmougan Oct 20, 2021
3 changes: 3 additions & 0 deletions README.md
@@ -34,6 +34,8 @@ __Supervised:__
 * M-estimator [7]
 * Target Encoding [7]
 * Weight of Evidence [8]
+* Quantile Encoder [13]
+* Summary Encoder [13]

 Installation
 ------------
@@ -142,4 +144,5 @@ References
 10. Simple Count or Frequency Encoding. From https://www.datacamp.com/community/tutorials/encoding-methodologies
 11. Transforming categorical features to numerical features. From https://tech.yandex.com/catboost/doc/dg/concepts/algorithm-main-stages_cat-to-numberic-docpage/
 12. Andrew Gelman and Jennifer Hill (2006). Data Analysis Using Regression and Multilevel/Hierarchical Models. From https://faculty.psau.edu.sa/filedownload/doc-12-pdf-a1997d0d31f84d13c1cdc44ac39a8f2c-original.pdf
+13. Carlos Mougan, David Masip, Jordi Nin and Oriol Pujol (2021). Quantile Encoder: Tackling High Cardinality Categorical Features in Regression Problems. From https://arxiv.org/abs/2105.13783

41 changes: 22 additions & 19 deletions category_encoders/__init__.py
@@ -23,27 +23,30 @@
 from category_encoders.james_stein import JamesSteinEncoder
 from category_encoders.cat_boost import CatBoostEncoder
 from category_encoders.glmm import GLMMEncoder
+from category_encoders.quantile_encoder import QuantileEncoder, SummaryEncoder

-__version__ = '2.2.2'
+__version__ = "2.2.2"

-__author__ = 'willmcginnis'
+__author__ = "willmcginnis", "cmougan"

 __all__ = [
-'BackwardDifferenceEncoder',
-'BinaryEncoder',
-'CountEncoder',
-'HashingEncoder',
-'HelmertEncoder',
-'OneHotEncoder',
-'OrdinalEncoder',
-'SumEncoder',
-'PolynomialEncoder',
-'BaseNEncoder',
-'LeaveOneOutEncoder',
-'TargetEncoder',
-'WOEEncoder',
-'MEstimateEncoder',
-'JamesSteinEncoder',
-'CatBoostEncoder',
-'GLMMEncoder'
+"BackwardDifferenceEncoder",
+"BinaryEncoder",
+"CountEncoder",
+"HashingEncoder",
+"HelmertEncoder",
+"OneHotEncoder",
+"OrdinalEncoder",
+"SumEncoder",
+"PolynomialEncoder",
+"BaseNEncoder",
+"LeaveOneOutEncoder",
+"TargetEncoder",
+"WOEEncoder",
+"MEstimateEncoder",
+"JamesSteinEncoder",
+"CatBoostEncoder",
+"GLMMEncoder",
+"QuantileEncoder",
+"SummaryEncoder",
 ]
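
For readers unfamiliar with the two encoders exported above, the following is a minimal pure-Python sketch of the general idea of quantile encoding: each category is replaced by a quantile of the target observed within that category, pulled toward the global quantile for small categories. It is not the code merged in this PR. The helper names `quantile`, `fit_quantile_encoding`, and `encode`, and the smoothing rule with weight `m`, are assumptions made for illustration only.

```python
from collections import defaultdict


def quantile(values, q):
    """Quantile of a non-empty sequence by linear interpolation, q in [0, 1]."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)  # fractional index into the sorted data
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def fit_quantile_encoding(categories, targets, q=0.5, m=1.0):
    """Return (mapping, prior): smoothed per-category quantile and global quantile.

    Assumption for illustration: each category value is the weighted average of
    its own quantile (weight n, the category count) and the global quantile
    (weight m).
    """
    groups = defaultdict(list)
    for cat, y in zip(categories, targets):
        groups[cat].append(y)
    prior = quantile(targets, q)
    mapping = {
        cat: (len(ys) * quantile(ys, q) + m * prior) / (len(ys) + m)
        for cat, ys in groups.items()
    }
    return mapping, prior


def encode(categories, mapping, prior):
    """Replace each category by its fitted value; unseen categories get the prior."""
    return [mapping.get(cat, prior) for cat in categories]


cats = ["a", "a", "a", "b", "b"]
ys = [1.0, 2.0, 3.0, 10.0, 20.0]
mapping, prior = fit_quantile_encoding(cats, ys, q=0.5, m=1.0)
print(mapping)                             # {'a': 2.25, 'b': 11.0}
print(encode(["a", "z"], mapping, prior))  # [2.25, 3.0]
```

Under the same assumptions, a summary-style encoding could call `fit_quantile_encoding` once per quantile (for example 0.25, 0.5, and 0.75) and emit one output column for each.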