Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify (quite a bit...) _preprocess_data #10928

Merged
merged 5 commits into from Dec 24, 2018

Conversation

anntzer
Copy link
Contributor

@anntzer anntzer commented Mar 31, 2018

PR Summary

Public API change: step no longer defaults to using y as
label_namer. This is consistent with other functions that wrap plot
(plot itself, loglog, etc.). (Alternatively, we could make all
these functions use y as label_namer; I don't really care either way.)

The plot-specific data kwarg logic was moved to
_process_plot_var_args, dropping the need for
general callable positional_parameter_names,
_plot_args_replacer, and positional_parameter_names.
test_positional_parameter_names_as_function and tests using
plot_func_varargs were removed as a consequence.

replace_all_args can be replaced by making replace_names=None
trigger replacement of all args, even the "unknown" ones. There was
no real use of "replace all known args but not unknown ones" (even if
there was, this can easily be handled by explicitly listing the args in
replace_names). test_function_call_with_replace_all_args was removed
as a consequence.

replace_names no longer complains if some argument names it is given
are not present in the "explicit" signature, as long as the function
accepts **kwargs -- because it may find the arguments in kwargs
instead.
edit: not anymore, we still error out in that case.

label_namer no longer triggers if data is not passed (if the
argument specified by label_namer was a string, then it is
likely a categorical and shouldn't be considered as a label
anyways). test_label_problems_at_runtime was renamed to
test_label_namer_only_if_data and modified accordingly.

Calling data-replaced functions used to trigger RuntimeError in some
cases of mismatched arguments; they now trigger TypeError similarly to
how normal functions do (test_more_args_than_pos_parameters).

PR Checklist

  • Has Pytest style unit tests
  • Code is PEP 8 compliant
  • New features are documented, with examples if plot related
  • Documentation is sphinx and numpydoc compliant
  • Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)
  • Documented in doc/api/api_changes.rst if API changed in a backward-incompatible way

@anntzer anntzer added this to the v3.0 milestone Mar 31, 2018
@anntzer anntzer force-pushed the process-plot-args branch 3 times, most recently from 2a2e584 to ebcc968 Compare March 31, 2018 02:42
@tacaswell
Copy link
Member

(Alternatively, we could make all
these functions use y as label_namer; I don't really care either way.)

I would rather do this, it seems like an oversight that we did not do this for plot and friends originally.

@tacaswell
Copy link
Member

This is easier to review with https://github.com/matplotlib/matplotlib/pull/10928/files?w=1 Most of the whitespace change is due to using partial instead of an extra layer of inner function to create the paramterized decorator.

@tacaswell
Copy link
Member

Calling data-replaced functions used to trigger RuntimeError in some
cases of mismatched arguments; they now trigger TypeError similarly to
how normal functions do (test_more_args_than_pos_parameters).

Please document this as an API change.

It seems a good fraction of the simplification comes from using signature.bind and not re-implementing functional that core python provides?

The docstring and signature for plot and friends needs to be updated as they are no longer done by the decorator.

Over all 👍 !

Copy link
Member

@tacaswell tacaswell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Document API change for return types
  • Update the docstring and signature of plot and friends

Anyone can dismiss this review.

@anntzer
Copy link
Contributor Author

anntzer commented Mar 31, 2018

The simplications come from

  1. don't do the work that signature.bind does
  2. move the plot-specific logic to its own implementation rather than attempt to have a single overgeneralized version
  3. dropping replace_all_args and simplifying the replace_names logic (either replace this, this, this and this named argument, or replace all of them) to restrict them to what we need them to do

@anntzer
Copy link
Contributor Author

anntzer commented Mar 31, 2018

Question: Currently,

plot([1, 2], "foo", data={"foo": "y"})

works as expected but

plot([1, 2], [3, 4], "foo", data={"foo": "y"})

crashes ("unrecognized character f in format string"). This PR makes both forms "work", although I'm tempted to make neither work instead (I think the shorthand format string should, well, be a shorthand format string). Thoughts?

@anntzer
Copy link
Contributor Author

anntzer commented Mar 31, 2018

The data kwarg actually already has its own documentation in the docstring of plot:

        data : indexable object, optional
            An object with labelled data. If given, provide the label names to
            plot in *x* and *y*.

            .. note::
                Technically there's a slight ambiguity in calls where the
                second label is a valid *fmt*. `plot('n', 'o', data=obj)`
                could be `plt(x, y)` or `plt(y, fmt)`. In such cases,
                the former interpretation is chosen, but a warning is issued.
                You may suppress the warning by adding an empty format string
                `plot('n', 'o', '', data=obj)`.

The same note, without the PS, is present in step; it is not present at all in loglog and friends. While these could be rationalized I think this is out of the scope of this PR.

Added label_namer support for plot and friends. While doing so, moved from using the terribly named get_label to a private _label_from_arg which checks whether its second argument is a string, and, if not so, defaults to None instead (to share that logic between the general and plot-specific implementations).

@timhoffm
Copy link
Member

timhoffm commented Jun 6, 2018

Any chances that this will go in soon? Otherwise I propose to preliminary merge #10872 to fix #11389.

@anntzer
Copy link
Contributor Author

anntzer commented Jun 7, 2018

rebased

@@ -4533,7 +4533,7 @@ def barbs(self, *args, **kw):

# Uses a custom implementation of data-kwarg handling in
# _process_plot_var_args.
def fill(self, *args, **kwargs):
def fill(self, *args, data=None, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data should be added to the Parameters section in the docstring

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@jklymak
Copy link
Member

jklymak commented Jul 9, 2018

What is the status of this and #10872. Should #10872 go in as a stopgap?

@@ -75,36 +58,6 @@ def func_no_ax_args(*args, **kwargs): pass
with pytest.raises(AssertionError):
_preprocess_data(label_namer="z")(func_args)

# but "ok-ish", if func has kwargs -> will show up at runtime :-(
Copy link
Contributor Author

@anntzer anntzer Sep 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted these as we don't support "implicit" label_name ("hoping that they'll show up in kwargs") anymore (this was basically present only to support plot() and fill() which don't have y explicitly in the signature but always have a y argument; this PR changes these functions to have their own preprocessor).

@@ -205,42 +158,6 @@ def func_replace_all(ax, x, y, ls="x", label=None, w="NOT"):
func_replace_all(None, x="a", y="b", w="x", label="text", data=data) ==
"x: [1, 2], y: [8, 9], ls: x, w: xyz, label: text")

@_preprocess_data(label_namer="y")
def func_varags_replace_all(ax, *args, **kwargs):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above re: "implicit" label_namer. Otherwise this test function is redundant with func_replace_all just above.

@anntzer
Copy link
Contributor Author

anntzer commented Sep 10, 2018

As a side point, after this PR, we may want to consider the possibility of (... being very prudent here :) ...) making _preprocess_data part of the public API (perhaps under a better name, e.g. process_data_kwarg, up to bikeshedding).
This will both allow others to reuse the functionality, and possibly to move out the spectral plotting functions out to their own mpl_toolkit :p (like everyone else they depend on _preprocess data to massage their arguments).

@tacaswell
Copy link
Member

Sold on this in general and on making preprocess_data public.

@anntzer
Copy link
Contributor Author

anntzer commented Nov 3, 2018

rebased

@timhoffm
Copy link
Member

timhoffm commented Nov 3, 2018

Sold on this in general and on making preprocess_data public.

I'm a bit hesitant about this. _preprocess_data seems still a bit too clever to be released in the wild.

  1. I'd rather strip out the signature manipulation and instead simply require that the wrapped function has a data parameter.
  2. Also It shouldn't manipulate the docstring. We cannot assume that users would want the data note as we have it now. - Actually I'm not a fan of this for the matplotlib docs as well. The parameter should appear in the Parameters or Additional Parameters section like any other parameter.

@anntzer
Copy link
Contributor Author

anntzer commented Nov 3, 2018

I'd rather strip out the signature manipulation and instead simply require that the wrapped function has a data parameter.

But the inner function actually doesn't take data as parameter.

Also It shouldn't manipulate the docstring. We cannot assume that users would want the data note as we have it now. - Actually I'm not a fan of this for the matplotlib docs as well. The parameter should appear in the Parameters or Additional Parameters section like any other parameter.

This has basically been discussed to death in the other docstring manipulation issues.

In any case this PR is not making the function public; let's keep that discussion for when the PR shows up (if ever).

@timhoffm
Copy link
Member

timhoffm commented Nov 3, 2018

I'd rather strip out the signature manipulation and instead simply require that the wrapped function has a data parameter.

But the inner function actually doesn't take data as parameter.

Yep, sorry. I was just repeating a thought from long time ago. The above comment was not the full story. Essentially, when stripping off all the fancy manipulation magic of signatures and docstrings, this could be a simple function x, y = preprocess_data(x, y, data=data). But as you said, let's keep that discussion out of here.

Public API change: `step` no longer defaults to using `y` as
label_namer.  This is consistent with other functions that wrap `plot`
(`plot` itself, `loglog`, etc.).  (Alternatively, we could make all
these functions use `y` as label_namer; I don't really care either way.)

The plot-specific data kwarg logic was moved to
`_process_plot_var_args`, dropping the need for
general callable `positional_parameter_names`,
`_plot_args_replacer`, and `positional_parameter_names`.
`test_positional_parameter_names_as_function` and tests using
`plot_func_varargs` were removed as a consequence.

`replace_all_args` can be replaced by making `replace_names=None`
trigger replacement of all args, even the "unknown" ones.  There was
no real use of "replace all known args but not unknown ones" (even if
there was, this can easily be handled by explicitly listing the args in
replace_names). `test_function_call_with_replace_all_args` was removed
as a consequence.

`replace_names` no longer complains if some argument names it is given
are not present in the "explicit" signature, as long as the function
accepts `**kwargs` -- because it may find the arguments in kwargs
instead.

label_namer no longer triggers if `data` is not passed (if the
argument specified by label_namer was a string, then it is
likely a categorical and shouldn't be considered as a label
anyways). `test_label_problems_at_runtime` was renamed to
`test_label_namer_only_if_data` and modified accordingly.

Calling data-replaced functions used to trigger RuntimeError in some
cases of mismatched arguments; they now trigger TypeError similarly to
how normal functions do (`test_more_args_than_pos_parameters`).
Assert that label_namer (if given) is a parameter in the function
signature (not "possibly" present via `**kwargs`).  This seems more
consistent with the expectations of the check later; in fact we can
now just remove that check.
We only support passing them positionally; right now passing them as
kwargs silently does nothing.  The error message is the standard one for
unknown kwargs.
@anntzer
Copy link
Contributor Author

anntzer commented Nov 8, 2018

Added a commit to make plot() error out when x/y are passed as kwargs (this silently does nothing currently, xref #12106).

@timhoffm timhoffm mentioned this pull request Dec 10, 2018
1 task
@tacaswell tacaswell merged commit 3826ee5 into matplotlib:master Dec 24, 2018
@tacaswell
Copy link
Member

Thanks @anntzer ! Sorry for the slow review on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants