[ENH] Pandas String Methods in a Single Class #694
Conversation
@samukweku thanks for the PR! I have to admit my brain is struggling a bit to decide whether this goes in The other piece I'm noticing is that some of the names are inconsistent with the string methods. Would it make sense to simply wrap them verbatim?
@ericmjl Thanks for the feedback! For the names, the only one is The other thought I had, which I want feedback on, applies to chaining. An example could be:
Does it look pleasing to the eye? the multiple
Is this meant to handle a subset of
@hectormz Yes, handle all of them. So, just one class to access all string methods
Okay got it. My concern is that there are about 53
I agree with you, @samukweku, this should be the end goal, being able to chain up all of the string methods.
To make it easier to refer to, here's all of the string methods. I agree with @hectormz and @szuckerman - an automated thing might be better. @samukweku, I think we can approach this task in a slightly smarter way than manually wrapping each function manually. This should probably be done over a few PRs, but the general idea probably would involve the following functions:
From my prior experience, you might end up using the Now, I just wanted to also mention that even though there may be a more automated way of approaching this task, your work thus far in this PR is still valuable, @samukweku, because it prompted the conversation and realization of how much work it would be to wrap all of the methods manually. So thank you for taking the initiative here!
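A minimal sketch of the automated approach discussed above: rather than wrapping each method by hand, look the method up by name with `getattr` on the column's `.str` accessor. The function name `apply_str_method` and its parameters are hypothetical, chosen for illustration; they are not taken from this PR.

```python
import pandas as pd


def apply_str_method(df, column_name, string_function, **kwargs):
    """Apply a pandas string method, looked up by name, to one column.

    Hypothetical helper for illustration only.
    """
    # Look up the method on the column's .str accessor by name.
    method = getattr(df[column_name].str, string_function)
    # Work on a copy so the caller's DataFrame is left unchanged.
    result = df.copy()
    result[column_name] = method(**kwargs)
    return result


df = pd.DataFrame({"name": ["  Alice ", " bob"]})

# Because the helper takes and returns a DataFrame, calls chain via .pipe.
cleaned = (
    df.pipe(apply_str_method, "name", "strip")
    .pipe(apply_str_method, "name", "upper")
)
print(cleaned["name"].tolist())
```

Dispatching by name keeps the wrapper to a few lines no matter how many string methods pandas offers, at the cost of losing per-method signatures and docstrings.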
Hi Team. Made some updates based on the suggestions. As always, love to get your feedback and suggestions for further improvement.
Codecov Report
@@           Coverage Diff           @@
##              dev     #694   +/-   ##
=======================================
  Coverage   93.16%   93.16%
=======================================
  Files          16       16
  Lines         600      600
=======================================
  Hits          559      559
  Misses         41       41
Wonderful, wonderful effort, @samukweku! Thank you for handling this. I will be reviewing it this evening, after I finish my work today :).
Some suggested edits, @samukweku, for consistency. Otherwise, great work, thanks for doing the wrapping piece!
Let's get another review in before merging.
janitor/functions.py
Outdated
    data = [
        func.__name__
        for _, func in inspect.getmembers(pd.Series.str, inspect.isfunction)
        if not func.__name__.startswith("_")
    ]

    if string_function not in data:
        raise KeyError(f"{string_function} is not a Pandas string method.")
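For reference, a self-contained sketch of the check under review, with the imports it needs. The helper name `validate_string_function` is hypothetical; the `inspect.getmembers` call and the `KeyError` message mirror the snippet above.

```python
import inspect

import pandas as pd


def validate_string_function(string_function):
    """Raise KeyError unless the name is a public pandas string method."""
    # Accessed on the class, pd.Series.str yields the accessor class itself,
    # so getmembers can list the plain functions defined on it.
    valid_names = {
        func.__name__
        for _, func in inspect.getmembers(pd.Series.str, inspect.isfunction)
        if not func.__name__.startswith("_")
    }
    if string_function not in valid_names:
        raise KeyError(f"{string_function} is not a Pandas string method.")


validate_string_function("upper")  # a known method passes silently

try:
    validate_string_function("not_a_method")
except KeyError as err:
    print(err)
```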
WOW! 🎉
Thanks @samukweku ! I echo Eric.
I'll take his docstring requests one small step further by asking for periods at the end of each.
Currently, pandas will raise an exception if an invalid kwarg is provided, right? Should we leave that as is, or add our JanitorError into the mix as well?
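To illustrate the behavior being asked about: when a keyword argument is forwarded to a string method that does not accept it, the call fails with Python's ordinary `TypeError`. A minimal sketch, with `bogus` as a made-up argument name:

```python
import pandas as pd

s = pd.Series(["a", "b"])

try:
    # `bogus` is not a parameter of str.upper, so the call is rejected.
    s.str.upper(bogus=True)
except TypeError as err:
    print(type(err).__name__)
```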
I'm assuming that the tests are the same ones you made when you started to implement this manually. Do you think they are sufficient, or are there other tests for other functions that would cover some corner cases? It would be too much to test them all, but just a thought that came to mind.
@ericmjl @HectorM14 Thanks! Got some guidance from @VPerrollaz's script, so it's a community effort :) I will add more tests, and possibly check Stack Overflow or some blogs for some edge cases. I'm not familiar with JanitorError, so I can't say much on that; I default to Pandas exception handling. As always, more feedback is welcome.
@samukweku You don't have to add new tests. I was just curious if they were picked for a reason, or if there is an ideal set that covers different issues. Do all the string methods return a series?
@HectorM14, yes, all the string methods return a series, same as in Pandas. The tests were picked at random: some for methods that need extra parameters, and just one for string methods that require no parameters.
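One caveat worth checking when relying on the return type: in pandas itself, a few string methods return a DataFrame rather than a Series when they expand their result into columns. A minimal sketch using `str.partition`, whose `expand` parameter defaults to `True`:

```python
import pandas as pd

s = pd.Series(["a-b", "c-d"])

upper = s.str.upper()         # element-wise transform of each value
parts = s.str.partition("-")  # splits each value into three columns

print(type(upper).__name__)
print(type(parts).__name__)
```

A wrapper that assigns the result back to a single column would need to account for such methods.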
mamba IRL! 🤩
If the conda/mamba solving is fixed, I just added some requested changes to tighten up the docstrings and it's all good to go @samukweku !
Co-authored-by: Hector <23343812+hectormz@users.noreply.github.com>
@ericmjl Also, how do I fix the deepsource error?
@samukweku when I looked at the build log, I saw that only that particular build stalled at the docker container build step in which the conda env solving piece looked slow. As such, I had the following hypotheses:
At first, I thought it was conda stalling. In writing the response to you, I then realized, oh, the first conda step actually worked, so maybe the docker build was the real issue. But I wasn’t sure how to debug what’s going on in a Docker container build on a remote machine, so I thought, let’s just disable it for now, since it’s not 100% crucial for the package. It’s merely helpful for development. That was my thought process! Hopefully that’s informative for you too :).
As for the current errors, let’s see. For DeepSource: does the deep source config file ( For the Azure builds, looks like one was an HTTP error. That just needs a restart. I think I have added you to the pipelines, @samukweku. Can you see if you can restart the build? (You might need to login to Azure again.)
Didn't have to do anything except push new commit. Fingers crossed.
Everything looks good to me. @hectormz would you like to do the honors?
@ericmjl will do! just fixed one last thing in .rst's. Will wait for the CI to finish and then will merge!
🎉
PR Description
Please describe the changes proposed in the pull request:
This PR resolves #360.