[ENH] updates to update_where and find_replace functions #673
Made the updates to the update_where function to use an expression, similar to |
@zbarry, I will need your guidance on how the pyspark and update_where functions are intertwined.
@samukweku some changes requested. It's mostly good, though I think we should refrain from commenting out tests without discussion first - they're like contracts by our historical selves to our future selves about things we want to ensure are working.
Codecov Report
@@            Coverage Diff             @@
##              dev     #673      +/-   ##
==========================================
- Coverage   93.26%   93.15%   -0.12%
==========================================
  Files          16       16
  Lines         609      599      -10
==========================================
- Hits          568      558      -10
  Misses         41       41
Some additional changes requested, @samukweku. Hope you don't mind them. Thanks for taking this PR on! 😄
janitor/functions.py
Outdated
in dataframe, a new column will be created; note that entries that do
not get set in the new column will be null.
in the dataframe, a new column will be created;
note that entries that do not get set
in the new column will be null.
Indentation.
janitor/functions.py
Outdated
:raises: IndexError if **conditions** does not have the same length as
**df**.
**df**.
:raises: TypeError if **conditions** is not a pandas-compatible string
query.
Indentation.
OK, I'll fix it. I have just been shifting the indentation manually to line things up. Any suggestions for VS Code extensions for this?
Not that I know of. A simple 4-space indent (by tabbing) is probably simpler than making things line up. It also preserves the visual distinction between each param while leaving enough room for longer text within the 79-character line length, which keeps things readable in split-screen view.
Co-authored-by: Eric Ma <ericmjl@users.noreply.github.com>
@samukweku something's really wonky: I'm seeing lots of indentation changes that shouldn't be happening. Do you know what's going on? Be careful when committing changes too; only commit what you intend to commit.
It looks like some code formatter other than Black was applied to the codebase.
Haven't actually worked on any of the pyspark code myself (I don't use Spark), but judging from the code: https://github.com/ericmjl/pyjanitor/blob/dev/janitor/spark/functions.py#L97 , there's nothing that needs to be changed; they are seemingly completely independent functions with different implementations. Also, it actually wasn't my intent to deprecate the old way of using |
@zbarry no worries, I actually think moving to query strings is better long-term for chain-ability. I am happy to be corrected, though!
Thanks for the contribution! This will be really nice to have.
I personally like query strings a lot better myself, especially for chain-ability, but I'm not sure that's necessarily a good justification for taking the old functionality away. It's also not a complex code path to maintain going forward, so there's not really any additional burden on us to leave it there. If it were, then I think it'd be a different story.
Morning everyone. I only worked on the update_where section of the functions module. I am also working within the dev container setup, so no formatter other than what comes preinstalled in the container was used. My apologies for the inconvenience.
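To make the two calling styles under discussion concrete, here is a minimal pandas-only sketch that accepts either a query string or a boolean mask. The helper name `update_where_sketch` and its signature are assumptions made for illustration; this is not pyjanitor's actual implementation, and it assumes the dataframe has a unique index.

```python
import pandas as pd


def update_where_sketch(df, conditions, target_column_name, target_val):
    """Hypothetical helper; not pyjanitor's actual update_where."""
    df = df.copy()  # leave the caller's dataframe untouched
    if isinstance(conditions, str):
        # Query-string style: pandas evaluates the expression, and we
        # reuse the index labels of the matching rows (unique index assumed).
        matched = df.query(conditions).index
        df.loc[matched, target_column_name] = target_val
    else:
        # Boolean-mask style: one True/False entry per row.
        if len(conditions) != len(df):
            raise IndexError("conditions must have the same length as df")
        df.loc[conditions, target_column_name] = target_val
    return df


df = pd.DataFrame({"a": [1, 2, 3], "b": [5, 6, 7]})

# Both styles fit into a method chain through DataFrame.pipe.
by_query = df.pipe(update_where_sketch, "a > 1", "c", 10)
by_mask = df.pipe(update_where_sketch, df["a"] > 1, "c", 10)
```

In both calls the target column `c` does not exist beforehand, so it is created and the rows that do not match are left null, as the docstring under review describes. Keeping both branches is cheap, which is the point made above about not removing the mask-based path.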
G'morning @samukweku! No problem, there is one convenience function that should get you where you need to - if you close all text editors and then in the terminal use |
@samukweku I did one code change, and also took the liberty of directly fixing the indentation. Keep in mind for the future that the convention used in pyjanitor is that for the params section, a text continuation indent uses 4 spaces, and not an alignment to the start of the text.
Don't forget to pull down the changes, otherwise you'll probably get some git errors.
Finally, since you and @zbarry have volunteered to do some infrastructure stuff, you'll need to start signing up for the myriad of services to see what's going on behind-the-scenes. Can you both sign up for CodeCov, Travis and Azure first, and report back whether pyjanitor shows up for you on those services? If not, I'm going to have to dig into the settings to see what happens.
Co-authored-by: Eric Ma <ericmjl@users.noreply.github.com>
Yep. I will let the formatter do its thing with indentation and params. I definitely have to let go of lining things up manually.
@samukweku I'm going to approve. I'm not sure why the code coverage went down by 0.12%, but the other codecov piece tells me that 100% of the diff was hit, so in practice nothing has decreased in test coverage. @zbarry and @szuckerman, if either of you have a moment, would you be kind enough to do a review so there are more eyes on the code?
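One plausible reading of the Codecov numbers above, worked through as arithmetic: the PR removed 10 lines that were all covered, while the 41 uncovered lines stayed put, so the same misses now weigh against a smaller total. The sketch below only uses the Lines, Hits, and Misses figures from the report; the function name is made up for illustration, and the result is not rounded the way Codecov rounds it.

```python
def coverage_percent(hits, lines):
    """Coverage as a percentage of executable lines that were hit."""
    return 100.0 * hits / lines


# Figures taken from the Codecov report in this thread.
lines_before, hits_before = 609, 568
lines_after, hits_after = 599, 558

misses_before = lines_before - hits_before
misses_after = lines_after - hits_after

before = coverage_percent(hits_before, lines_before)
after = coverage_percent(hits_after, lines_after)

print(f"misses: {misses_before} -> {misses_after}")
print(f"coverage: {before:.3f}% -> {after:.3f}% (change {after - before:.3f})")
```

The miss count is unchanged, so the drop comes entirely from the smaller denominator and not from any newly untested code, which is consistent with 100% of the diff being hit.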
Just one little formatting thing. Otherwise, looks good to me. Thanks again!
Hi @samukweku sorry for the late comments on an otherwise finished PR! Let me know what you think about the requested changes.
Thanks @samukweku! 👍
@zbarry corrections made. Kindly review, if you have a moment.
Tremendous work, @samukweku, thank you for handling this! Final thing, can we get the changelog updated? Want to make sure every contribution is acknowledged properly! 😄 I won't need to submit a further review after this. And we can also ignore @zbarry 😜 - I jest, I see you've done that docstring change, so it's all good. Once we've merged, I'll go ahead and cut another release manually (for now). Later on, we should build more automation w.r.t. releases. Finally, I'd love for others to get into the habit of hitting the merge button (as long as it's not your own PR!). So @zbarry or @hectormz, please do the honors when the Changelog is updated.
@ericmjl Added information to changelog. However, there is an issue with tests, specifically , |
@samukweku also not worried about that one, as the function actually depends on live access to the web service, so in a way, we need to know when the service goes down. (Thankfully for now it's been ok.) I restarted the tests; are you able to see an option to do so? If not, this weekend I will see how I can add you to the Azure pipelines.
Do you mean I should rerun |
@samukweku sorry! I should have been a bit clearer. What I meant was whether you see the option in the GitHub checks interface to "Re-run" a particular check. I can point that out to you going forward.
PR Description
Please describe the changes proposed in the pull request:
This PR resolves #663.