Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: update the pandas.DataFrame.clip_lower docstring #20289

Conversation

adatasetaday
Copy link
Contributor

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

  • PR title is "DOC: update the docstring"
  • The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python doc/make.py --single <your-function-or-method>
  • It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.DataFrame.clip_lower" correct. :)

If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.

@datapythonista
Copy link
Member

This PR is for clip which is quite similar: #20212. Your changes look great, but I'd take a look at the discussion there to see if you see anything you can improve.

The Return section is probably worth changing to the format in the first example here:
https://python-sprints.github.io/pandas/guide/pandas_docstring.html#section-4-returns-or-yields

Return copy of the input with values below given value(s) truncated.
Return copy of the input with values below given value(s) trimmed.

If an element value is below the threshold, the threshold value is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since threshold is an object here you can just say "value is below `threshold`..." (note backticks)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also changing the wording a little to match #20212 as per @datapythonista suggestion.

Align object with threshold along the given axis.
inplace : boolean, default False
Whether to perform the operation in place on the data
.. versionadded:: 0.21.0
.. versionadded:: 0.21.0.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No period after versionadded. Also, does this render? I think it needs to be down one more line and aligned with text directly above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It rendered OK, but for consistency with other docstrings I am adding the extra line and shifting it to the left.

Re the period, the validation script complains if it's not there. See discussion in #20227.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm OK. @datapythonista @jorisvandenbossche another issue of where I think the script is incorrectly calling out a missing period - we shouldn't be adding this to the end of a versionadded directive right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the script is wrong when versionadded is present. Please ignore the error in the validation.

Return copy of the input with values below given value(s) trimmed.

If an element value is below the threshold, the threshold value is
returned instead.

Parameters
----------
threshold : float or array_like
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

array-like instead of array_like

DataFrame.clip_upper: Return copy of input with values above given
value(s) trimmed.
Series.clip_lower: Return copy of the input with values below given
value(s) trimmed.

Returns
-------
clipped : same type as input
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to be explicit about return types here. I'd add the type (Series or DataFrame?) and then on the line below say "Returned object type is determined by caller"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see other docstrings say "type of caller" or "same type as caller". Should I use one of these for consistency or is the plan change all to the wording you provided?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I would at the very least put the type declaration on the first line (I think it is only Series or DataFrames that can call this, so if that's right just put Series or DataFrame). On the next line describing the type then go ahead and put Same type as caller


Returns
-------
clipped : same type as input

Examples
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job here - I think the examples are very clear


Parameters
----------
threshold : float or array_like
axis : int or string axis name, optional
Lower value(s) to which the input value(s) will be trimmed.
axis : {0 or 'index', 1 or 'columns'}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

axis : {0 or ‘index’, 1 or ‘columns’, None}, default None


See Also
--------
clip
DataFrame.clip: Trim values at input threshold(s).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Insert space before each colon

DataFrame.clip: Trim values at input threshold(s).
DataFrame.clip_upper: Return copy of input with values above given
value(s) trimmed.
Series.clip_lower: Return copy of the input with values below given
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For comprehensiveness and consistency add the Series clip and clip_lower methods as well

Lower value(s) to which the input value(s) will be trimmed.
axis : {0 or 'index', 1 or 'columns'}
axis : {0 or index, 1 or columns’, None}, default None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I think this was a copy/paste issue but you should be fine with normal single quotes around the values

@@ -5718,30 +5718,34 @@ def clip_lower(self, threshold, axis=None, inplace=False):
"""
Return copy of the input with values below given value(s) trimmed.

If an element value is below the threshold, the threshold value is
returned instead.
Elements below the `threshold` will be changed to `threshold`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"changed to match the threshold value(s)"


See Also
--------
DataFrame.clip: Trim values at input threshold(s).
DataFrame.clip_upper: Return copy of input with values above given
DataFrame.clip : Trim values at input threshold(s).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some suggested better wording:

Series.clip : General purpose method to trim DataFrame values to given threshold(s)
DataFrame.clip : General purpose method to trim Series values to given threshold(s)
Series.clip_lower : Trim Series values below given threshold(s)
Series.clip_upper : Trim Series values above given threshold(s)
DataFrame.clip_upper : Trim DataFrame values above given threshold(s)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, but changed the order to "more related" to "less related".

@codecov
Copy link

codecov bot commented Mar 12, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@1986dbe). Click here to learn what that means.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #20289   +/-   ##
=========================================
  Coverage          ?   91.73%           
=========================================
  Files             ?      150           
  Lines             ?    49148           
  Branches          ?        0           
=========================================
  Hits              ?    45087           
  Misses            ?     4061           
  Partials          ?        0
Flag Coverage Δ
#multiple 90.12% <100%> (?)
#single 41.9% <71.42%> (?)
Impacted Files Coverage Δ
pandas/core/generic.py 95.85% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1986dbe...e48d49b. Read the comment docs.

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 minor comments otherwise lgtm

2 0.710404 0.760925
3 0.800000 0.800000

Clip to an array column the index axis
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe for consistency here you meant to say "Clip to an array along the column axis"

@@ -5716,24 +5716,86 @@ def clip_upper(self, threshold, axis=None, inplace=False):

def clip_lower(self, threshold, axis=None, inplace=False):
"""
Return copy of the input with values below given value(s) truncated.
Return copy of the input with values below given value(s) trimmed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use "caller" instead of "input". FWIW I guess it's not technically true that it returns a copy of the caller because inplace would modify directly. Any idea on how to make this statement more accurate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something along the line of

Return a copy of the caller (or the caller itself if inplace == True) with values below given threshold trimmed.

?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm well I don't think you need to get in the discussion of return values since that's documented later in the docstring. First line is supposed to be very simple so I would mirror what's there for clip and say something like Trim the caller's values below a given threshold

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Since Series.clip_lower uses the same docstring (the function is defined in generic.py) I thought that I should change the "See Also" description for Series.clip_lower to be the same. But when I tested the generation of the HTML I realized that if we use the "See Also" as it is right now, for Series.clip_lower we will not have a reference to DataFrame.clip_lower and it will have a reference to itself.

Does this make sense? Is there an elegant way to solve it?

Copy link
Member

@WillAyd WillAyd Mar 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that makes sense. Typically in this case you could use the @Substitution decorator - you'll see some other functions using in that same module.

That said, since your method is currently just inherited by Series and DataFrame and not necessarily overriden, I don't think there's a good way without overriding those implementation (if even just to call super) to implement Substitution. Unless you can think of a better way, I'd suggest you remove the clip_lower reference in See Also and open up a separate issue to try and make substitution work

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, I can also only think of overriding in order for it to work. But is it worth the effort just to make the "See Also" complete? Would a reference to just clip_lower (without DataFrame. or Series. prefixes) be a good enough option?

I'll sleep through it to see if I can think of something better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say just remove but if you come up with a better idea then by all means

See Also
--------
clip
.. versionadded:: 0.21.0

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if I forgot some discussion about it, but don't we want a See Also section? I think clip and clip_upper should be listed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes good spot we definitely still want this - @adatasetaday can you add back in?

I know we had some back and forth about Series.clip_lower being self-referential with the shared doc system. If you find that confusing just remove that one entry, but all the rest should remain


Returns
-------
clipped : same type as input
Series or DataFrame
Same type as caller.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this method may be it's a bit obvious, but for consistency I'd say in the description what this method returns (e.g. Original data with values trimmed). The note about the time is all right (not sure if other methods have it as part of the type or the description).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would any of these work?

Series or DataFrame (same type as caller)
Original data with values trimmed.

Series or DataFrame
Original data (in same type as caller) with values trimmed .

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They look great. If you want to take a look at the latest merged pull requests, I think this happens in few, and may be there is a way more standard than another. If you don't find any, from my side you can choose whatever you want.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on what I see in the code, this seems to be the most common format

type of caller
Original data with values trimmed.

I was thinking that there would be some value in creating PRs for standarizing these kind of things. For example, do one that fixes the axis parameters in ALL methods to match the format agreed in the discussions in this sprint. Another one could be done to make sure there is no period after the versionadded (which a lot of them have it, probably due to the validating script complaining if it was not there). Would this be a good idea? Who should I contact if I want to take care of some of these?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to do it. I'd probably wait until most of the PRs from the sprint are closed, to avoid conflicts. And more than contacting someone, you can create an issue (or one per change) with what you're planning to do, you can get some discussion and feedback before doing all the work.

Examples
--------
>>> df = pd.DataFrame({'a': [0.740518, 0.450228, 0.710404, -0.771225],
... 'b': [0.040507, -0.45121, 0.760925, 0.010624]})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I find the data used in #20212 easier to understand. It requires a decent amount of concentration to see what the examples are doing with so many decimal numbers.

threshold(s)
Series.clip : General purpose method to trim `Series` values to given
threshold(s)
Series.clip_upper : Trim `Series` values above given threshold(s)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, for DataFrame.clip_lower we would like to have in See Also:

  • Series.clip_lower
  • DataFrame.clip
  • DataFrame.clip_upper

So, the same in Series and nothing else from Series. And the similar from the same class (DataFrame). As this is being reused by both Series and DataFrame, I'd probably list all clip, clip_lower and clip_upper, but I'm not sure. @jreback ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only caveat here is that the user is working on a doc that is shared by Series and DataFrame. Mentioned earlier that without further updates (i.e. implementing substitution) that the clip_lower reference would be self-referential half of the time.

I advised just removing it to not get hung up on it for now, the other option would be to explicitly list both DataFrame.clip_lower and Series.clip_lower and live with the fact that one will be self referencing. No strong preference on my end.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was waiting for a little clarification, since I'm not sure how to proceed. I can go either way, but I haven't seen a conclusive definition. Was a decision made on how to proceed with these?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm not wrong, in another ticket the final implementation just listed everything. As @WillAyd says in some cases the same method being documented will be self-referenced, but it's not a big issue.

@jorisvandenbossche jorisvandenbossche merged commit 28378fd into pandas-dev:master Jul 8, 2018
@jorisvandenbossche
Copy link
Member

Rebased (another PR also already updated this docstring, so the diff became somewhat smaller).

Thanks @adatasetaday !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants