New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Quantile encoder #303
Quantile encoder #303
Conversation
Hi @cmougan, |
Many thanks for your answer. We will be very happy to contribute to your package. It makes all the sense to have it as another method in category_encoders. So, yes please, let's merge. The only reason to keep it alive in stools it's because the congress proceedings paper points there. Still we will like to merge. Let us know what we can do. |
Hi @cmougan |
Many thanks @PaulWestenthanner for the detailed review. I just pushed the Summary Encoder. If I have not missed anything, everything should be there. |
for quantile in self.quantiles: | ||
for col in self.cols: | ||
percentile = round(quantile * 100) | ||
X[f"{col}_{percentile}"] = X[col] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you just add this in the for loop below (either before or after the col-names.append(,,,)
you don't need the two loops here and don't need to calculate the percentile.
Note that the comment on coinciding quantiles below also applies here.
Thanks for the update. I've added a second round of comments. |
Throw error in case of two quantiles with same percentile
Hi @PaulWestenthanner I believe we just covered your comments. We also did a review of possible things that we might have left out. Do you see anything else that might be worth to improve? Or that we missed out? Many thanks for your suggestions, we really appreciate them :) |
The code looks fine to me now. I'm happy to merge as soon as we get the pipeline to run. The issue at the moment is that we support python 3.5 which does not support f-strings yet. Since f-strings are awesome I'd recommend using the fstring package from future. Check this SO post as reference https://stackoverflow.com/questions/55182209/f-string-invalid-syntax-in-python-3-5 |
Maybe we should drop support for python 3.5 since it is no longer officially supported anyway. But that would be out of scope for this PR |
@PaulWestenthanner thanks again! |
After some discussion and testing with @david26694 we notice that the summary encoder does not pass most of the test.
|
Thanks for adding the tests. This was a good catch! I'd suggest we merge my branch into yours and then into master? |
fixed tests for summary encoder
Hi @PaulWestenthanner @david26694 Waw, impresive coding. Just merged your branch. |
The tests still fail for python 3.5 since f-strings are used. I fixed that in the code but forgot the tests. Would you mind fixing this @cmougan ? |
@PaulWestenthanner and now? |
Sorry I missed one of the tests. I've added a comment on the appropriate position |
@PaulWestenthanner I was trying to debug that very unsuccessful. Many thanks |
Great! Thanks for your effort. LGTM & Merge |
This PR (#302), implements two methods from a recently published paper at a conference (MDAI 2021).
Encoding methods, full technical development can be followed in the paper:
Tests are implemented and passed.
Scikit learn API semantics
Docs is extended
If I missed something or any comments, please let me know :)