-
Notifications
You must be signed in to change notification settings - Fork 179
added zero padding for column names in MinHashEnocder #1069
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hi @Shree7676 , thanks a lot for handling this! I like that we pad only as much as is necessary and it's great that you added a test case. There are always a couple of setup details to take care of in a first Pull Request, but we only need to do it once and then we can merge this one and you'll be able to open many more :) :
addressing these 2 points should take care of the 2 Continuous Integration checks that are complaining, and then we can merge. thanks again! |
|
Hi @Shree7676, thank you for this PR! Out of curiosity, have you found the contributing page? It's ok if you didn't, we want to improve the contributing experience in skrub and are eager to get your feedback :) |
1646553 to
d330ca0
Compare
|
Hi @Vincent-Maladiere , thank you for asking! I did go through the contributing page, and I found it clear and to the point. During EuroSciPy, I learned that many projects use tools like pytest for testing, so I knew to look for it in skrub as well. Similarly, through this PR, I learned that most projects also have specific formatting requirements, like pre-commit, which I now know to check before pushing changes. I think these are general things that beginners, especially those without a computer science background like myself, might not be familiar with right away. If you wanted to make the contributing page more beginner-friendly, it could be helpful to mention tools like pytest and pre-commit as part of the development process. However, if you feel this is already common knowledge for most contributors, it might not be necessary. Overall, I think the page is well-written and informative. Thanks again for your support! |
|
Hi @jeromedockes, |
|
I still have to work on gap encoder but I am thinking to open a separate pr for it. |
|
Thanks!
we don't want to assume that, and for new contributors at any level of experience each project has its own quirks and getting started can be a hassle so improving the contributor guide is always time well spent. thanks for your suggestions!
that's a good idea, I would indeed recommend doing it in a separate one |
Co-authored-by: Jérôme Dockès <jerome@dockes.org>
|
I think I just messed up was trying to resolve the conflict. |
|
for commit number 7dd566f (resolved conflict) |
|
Hey @Shree7676, github can be tricky sometimes! As far as I understand, you want to revert your last 6 commits to start from "Update skrub/tests/test_minhash_encoder.py" ( You can do it with: git reset HEAD~6 # revert to the 6th last commit
git push -f origin add_zero_padding # push and force the changes |
7dd566f to
d2d1680
Compare
|
alright we can merge it now! thanks a lot @Shree7676 !! 🎉 |
|
Thanks @jeromedockes and @Vincent-Maladiere for the guidance |

#1050
Adding zero padding for MinHashEncoder
Column names are now able to get zero padding based on the units length of the column
Added the test case