@Shree7676
Contributor

#1050
Adding zero padding for MinHashEncoder
Output column names are now zero-padded, with the padding width based on the number of digits needed
Added the test case
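
For readers following along, here is a minimal sketch of what zero-padded output column names can look like. It is an illustration only, not the code from this PR: the helper name `padded_column_names` and its parameters are hypothetical, and the width rule (digits of the largest index) is an assumption about what "pad only as much as necessary" means.

```python
def padded_column_names(input_name: str, n_components: int) -> list[str]:
    """Hypothetical helper: build names such as 'city_00', ..., 'city_11'.

    Assumption: the padding width is the number of digits in the largest
    index, so no more zeros are added than needed.
    """
    # Largest index is n_components - 1; guard against n_components == 0.
    width = len(str(max(n_components - 1, 0)))
    return [f"{input_name}_{i:0{width}d}" for i in range(n_components)]


names = padded_column_names("city", 12)
print(names[:3], names[-1])
```

One practical effect of padding is that sorting the names as strings gives the same order as sorting them by index, which is not the case for unpadded names such as `city_2` and `city_10`.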

@jeromedockes
Member

Hi @Shree7676, thanks a lot for handling this! I like that we pad only as much as is necessary and it's great that you added a test case.

There are always a couple of setup details to take care of in a first Pull Request, but we only need to do it once and then we can merge this one and you'll be able to open many more :) :

  • We use pre-commit to handle some linting and code formatting. Using it ensures that the code in your local copy gets automatically formatted in the same way as the main branch, which avoids spurious diff lines that are only due to formatting changes (such as those we see at the beginning of the minhash module). You can install it with `pip install pre-commit` and then activate the hooks that run when you create a git commit by running `pre-commit install` inside the skrub directory (you only need to do this once).
  • When making changes that are visible to users, we usually add an entry in CHANGES.rst (in this case it would be a "minor" change). It is a brief statement about the addition and a link to the corresponding pull request; you can follow the format of the other entries in there.

Addressing these 2 points should take care of the 2 Continuous Integration checks that are complaining, and then we can merge. Thanks again!

@Vincent-Maladiere
Member

Hi @Shree7676, thank you for this PR! Out of curiosity, have you found the contributing page? It's ok if you didn't, we want to improve the contributing experience in skrub and are eager to get your feedback :)

@Shree7676
Contributor Author

Hi @Vincent-Maladiere, thank you for asking! I did go through the contributing page, and I found it clear and to the point. During EuroSciPy, I learned that many projects use tools like pytest for testing, so I knew to look for it in skrub as well. Similarly, through this PR, I learned that most projects also have specific formatting requirements, like pre-commit, which I now know to check before pushing changes.

I think these are general things that beginners, especially those without a computer science background like myself, might not be familiar with right away. If you wanted to make the contributing page more beginner-friendly, it could be helpful to mention tools like pytest and pre-commit as part of the development process. However, if you feel this is already common knowledge for most contributors, it might not be necessary.

Overall, I think the page is well-written and informative. Thanks again for your support!

@Shree7676
Contributor Author

Hi @jeromedockes,
Thanks for the guidance :)

@Shree7676
Contributor Author

I still have to work on the GapEncoder, but I'm thinking of opening a separate PR for it.

@jeromedockes
Member

Thanks!

> you feel this is already common knowledge for most contributors

We don't want to assume that. For new contributors at any level of experience, each project has its own quirks and getting started can be a hassle, so improving the contributor guide is always time well spent. Thanks for your suggestions!

> I still have to work on the GapEncoder, but I'm thinking of opening a separate PR for it.

That's a good idea; I would indeed recommend doing it in a separate one.

Co-authored-by: Jérôme Dockès <jerome@dockes.org>
@Shree7676
Contributor Author

I think I messed something up while trying to resolve the conflict.
I can see that some additional, unwanted files have changed.
I need help!

@Shree7676
Contributor Author

Shree7676 commented Sep 12, 2024

For commit 7dd566f (resolved conflict), I only added the CHANGES.rst file and the test_minhash_encoder.py file and then committed, as you can see in the screenshot below.
But after git push, why do other files show up as changed here?
[screenshot]

@Vincent-Maladiere
Member

Hey @Shree7676, GitHub can be tricky sometimes! As far as I understand, you want to undo your last 6 commits and start again from "Update skrub/tests/test_minhash_encoder.py" (d2d16808278b1daffb7bb036cf9068e03931ce49).

You can do it with:

git reset HEAD~6  # move the branch back by 6 commits (file changes are kept, unstaged)
git push -f origin add_zero_padding  # force-push the rewritten branch

@jeromedockes
Member

Alright, we can merge it now! Thanks a lot @Shree7676!! 🎉

@jeromedockes jeromedockes enabled auto-merge (squash) September 13, 2024 12:16
@jeromedockes jeromedockes merged commit 338f235 into skrub-data:main Sep 13, 2024
@Shree7676
Contributor Author

Shree7676 commented Sep 13, 2024

Thanks @jeromedockes and @Vincent-Maladiere for the guidance

jeromedockes pushed a commit to jeromedockes/skrub that referenced this pull request Sep 25, 2024