Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] updates to update_where and find_replace functions #673

Merged
merged 24 commits into from Jun 20, 2020
Merged

[ENH] updates to update_where and find_replace functions #673

merged 24 commits into from Jun 20, 2020

Conversation

samukweku
Copy link
Collaborator

@samukweku samukweku commented Jun 15, 2020

PR Description

Please describe the changes proposed in the pull request:

  • Changed update_where conditions to query style mode
  • Removed find_replace dependency on update_where function

This PR resolves #663 .

PR Checklist

Please ensure that you have done the following:

  1. PR in from a fork off your branch. Do not PR from <your_username>:dev, but rather from <your_username>:<feature-branch_name>.
  1. If you're not on the contributors list, add yourself to AUTHORS.rst.
  1. Add a line to CHANGELOG.rst under the latest version header (i.e. the one that is "on deck") describing the contribution.
    • Do use some discretion here; if there are multiple PRs that are related, keep them in a single line.

Quick Check

To do a very quick check that everything is correct, follow these steps below:

  • Run the command make check from pyjanitor's top-level directory. This will automatically run:
    • black formatting
    • flake8 checking
    • running the test suite
    • docs build

Once done, please check off the check-box above.

If make check does not work for you, you can execute the commands listed in the Makefile individually.

Code Changes

If you are adding code changes, please ensure the following:

  • Ensure that you have added tests.
  • Run all tests ($ pytest .) locally on your machine.
    • Check to ensure that test coverage covers the lines of code that you have added.
    • Ensure that all tests pass.

Documentation Changes

If you are adding documentation changes, please ensure the following:

  • Build the docs locally.
  • View the docs to check that it renders correctly.

Relevant Reviewers

Please tag maintainers to review.

@samukweku
Copy link
Collaborator Author

Made the updates to the update_where function to use an expression, similar to pandas query style. Also, since find_replace was initially based on update_where, I removed the dependency and based it solely on loc

@samukweku samukweku removed the request for review from hectormz June 15, 2020 00:45
@samukweku
Copy link
Collaborator Author

@zbarry, I will need your guidance on how pyspark and update_where functions are intertwined

Copy link
Member

@ericmjl ericmjl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@samukweku some changes requested. It's mostly good, though I think we should refrain from commenting out tests without discussion first - they're like contracts by our historical selves to our future selves about things we want to ensure are working.

tests/functions/test_update_where.py Outdated Show resolved Hide resolved
janitor/functions.py Outdated Show resolved Hide resolved
janitor/functions.py Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Jun 16, 2020

Codecov Report

Merging #673 into dev will decrease coverage by 0.11%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##              dev     #673      +/-   ##
==========================================
- Coverage   93.26%   93.15%   -0.12%     
==========================================
  Files          16       16              
  Lines         609      599      -10     
==========================================
- Hits          568      558      -10     
  Misses         41       41              

Copy link
Member

@ericmjl ericmjl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some additional changes requested, @samukweku. Hope you don't mind them. Thanks for taking this PR on! 😄

janitor/functions.py Outdated Show resolved Hide resolved
Comment on lines 3146 to 3148
in dataframe, a new column will be created; note that entries that do
not get set in the new column will be null.
in the dataframe, a new column will be created;
note that entries that do not get set
in the new column will be null.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation.

Comment on lines 3151 to 3154
:raises: IndexError if **conditions** does not have the same length as
**df**.
**df**.
:raises: TypeError if **conditions** is not a pandas-compatible string
query.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation.

@samukweku
Copy link
Collaborator Author

Ok. I’ll fix it. I just manually shift the indentations to line up. Any suggestions for Extensions on VSCode for this?

@ericmjl
Copy link
Member

ericmjl commented Jun 16, 2020

Any suggestions for Extensions on VSCode for this?

Not that I know of. A simple 4-space indent (by tabbing) is probably simpler than making things line up, but also preserves the visual distinction between each param, while also leaving enough room for longer text, bounded within the 79 char line length for split screen readability, in case it's needed.

@ericmjl
Copy link
Member

ericmjl commented Jun 16, 2020

@samukweku something's really wonky, I'm seeing lots of indentation changes that shouldn't be happening. Do you know what's going on? Be careful when committing changes too, only commit what you intend to commit.

@zbarry
Copy link
Collaborator

zbarry commented Jun 16, 2020

It looks like some code formatter other than Black was applied to the codebase.

@zbarry
Copy link
Collaborator

zbarry commented Jun 16, 2020

@zbarry, I will need your guidance on how pyspark and update_where functions are intertwined

Haven't actually worked on any of the pyspark code myself (I don't use Spark), but judging from the code: https://github.com/ericmjl/pyjanitor/blob/dev/janitor/spark/functions.py#L97 , there's nothing that needs to be changed; they are seemingly completely independent functions with different implementations.

Also, it actually wasn't my intent to deprecate the old way of using update_where. I think that's something that can be left intact - based on the type of conditions, use either the original or the .query path.

@ericmjl
Copy link
Member

ericmjl commented Jun 16, 2020

Also, it actually wasn't my intent to deprecate the old way of using update_where. I think that's something that can be left intact - based on the type of conditions, use either the original or the .query path.

@zbarry no worries, I actually think moving to query strings is better long-term for chain-ability. I am happy to be corrected, though!

Copy link
Collaborator

@zbarry zbarry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! This will be really nice to have.

janitor/functions.py Outdated Show resolved Hide resolved
tests/functions/test_update_where.py Outdated Show resolved Hide resolved
tests/functions/test_update_where.py Outdated Show resolved Hide resolved
@zbarry
Copy link
Collaborator

zbarry commented Jun 16, 2020

@zbarry no worries, I actually think moving to query strings is better long-term for chain-ability. I am happy to be corrected, though!

I personally like query strings a lot better, myself, especially for chain-ability, but I'm not sure that that's necessarily a good justification for taking the old functionality away. It's also not a complex code path to maintain going forward, so there's not really any additional burden on us to leave it there. If it was, then I think it'd be a different story.

@samukweku
Copy link
Collaborator Author

Morning everyone. I worked only on the update where section in the functions module. Also working within the dev containers setup, so no other formatter except what comes preinstalled in the containers. My apologies for the inconvenience

@ericmjl
Copy link
Member

ericmjl commented Jun 16, 2020

Morning everyone. I worked only on the update where section in the functions module. Also working within the dev containers setup, so no other formatter except what comes preinstalled in the containers. My apologies for the inconvenience

G'morning @samukweku! No problem, there is one convenience function that should get you where you need to - if you close all text editors and then in the terminal use make format to apply all of the opinionated formatting automatically, that should restore the unintended changes back to the "correct" formatting. You can then push them up with no problems!

janitor/functions.py Outdated Show resolved Hide resolved
janitor/functions.py Outdated Show resolved Hide resolved
janitor/functions.py Outdated Show resolved Hide resolved
Copy link
Member

@ericmjl ericmjl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@samukweku I did one code change, and also took the liberty of directly fixing the indentation. Keep in mind for the future that the convention used in pyjanitor is that for the params section, a text continuation indent uses 4 spaces, and not an alignment to the start of the text.

Don't forget to pull down the changes, otherwise you'll probably get some git errors.

Finally, since you and @zbarry have volunteered to do some infrastructure stuff, you'll need to start signing up for the myriad of services to see what's going on behind-the-scenes. Can you both sign up for CodeCov, Travis and Azure first, and report back whether pyjanitor shows up for you on those services? If not, I'm going to have to dig into the settings to see what happens.

janitor/functions.py Outdated Show resolved Hide resolved
Co-authored-by: Eric Ma <ericmjl@users.noreply.github.com>
@samukweku
Copy link
Collaborator Author

Yep. I will let the formatter do its thing with indentation and params. I definitely have to let go of that alignment concept

@ericmjl
Copy link
Member

ericmjl commented Jun 18, 2020

@samukweku I'm going to approve. I'm not sure why the code coverage went down by 0.12%, but the other codecov piece tells me that 100% of diff was hit, so nothing in practice has decreased in code testing coverage.

@zbarry and @szuckerman, if either of you have a moment, would you be kind enough to do a review so there's more eyes on the code?

Copy link
Collaborator

@zbarry zbarry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one little formatting thing. Otherwise, looks good to me. Thanks again!

janitor/functions.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@hectormz hectormz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @samukweku sorry for the late comments on an otherwise finished PR! Let me know what you think about the requested changes.

tests/functions/test_update_where.py Outdated Show resolved Hide resolved
tests/functions/test_update_where.py Outdated Show resolved Hide resolved
@hectormz
Copy link
Collaborator

Thanks @samukweku ! 👍

@samukweku
Copy link
Collaborator Author

@zbarry corrections made. Kindly review, if you have a moment.

@ericmjl
Copy link
Member

ericmjl commented Jun 19, 2020

Tremendous work, @samukweku, thank you for handling this! Final thing, can we get the changelog updated? Want to make sure every contribution is acknowledged properly! 😄 I won't need to submit a further review after this. And we can also ignore @zbarry 😜 - I jest, I see you've done that docstring change, so it's all good.

Once we've merged, I'll go ahead and cut another release manually (for now). Later on, we should build more automation w.r.t. releases.

Finally, I'd love for others to get into the habit of hitting the merge button (as long as it's not your own PR!). So @zbarry or @hectormz, please do the honors when the Changelog is updated.

@samukweku
Copy link
Collaborator Author

@ericmjl Added information to changelog. However, there is an issue with tests, specifically , test_inflate_currency function. requests is failing to get the data from the url - it returns an error 443, bad SSL handshake .... I suggest getting the raw data and putting it into the tests - I feel it will be safer and better to manage than the current scenario. Your thoughts. Hope you are having a lovely weekend. @rhosbach

@ericmjl
Copy link
Member

ericmjl commented Jun 20, 2020

@samukweku also not worried about that one, as the function actually depends on live access to the web service, so in a way, we need to know when the service goes down. (Thankfully for now it's been ok.) I restarted the tests; are you able to see an option to do so? If not, this weekend I will see how I can add you to the Azure pipelines.

@samukweku
Copy link
Collaborator Author

I restarted the tests; are you able to see an option to do so?>

Do you mean I should rerun make tests in my dev container?

@zbarry zbarry merged commit f78a462 into pyjanitor-devs:dev Jun 20, 2020
@samukweku samukweku deleted the update_where_update branch June 20, 2020 12:53
@ericmjl
Copy link
Member

ericmjl commented Jun 20, 2020

@samukweku sorry! I should have been a bit more clear. What I meant was whether you see the option in the GitHub checks interface to "Re-run" a particular check. I can point that out to you going forward.

@hectormz hectormz mentioned this pull request Jun 22, 2020
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ENH] support .query()-style strings in df.update_where()
4 participants