Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion: introducing some keyword-only args for NX 3.0 #5063

Closed
wants to merge 3 commits into from

Conversation

rossbar
Copy link
Contributor

@rossbar rossbar commented Sep 4, 2021

Related to #4100

As an experiment, I converted the convert_matrix module to use keyword-only args to see what it would take and how disruptive the changes would be within NX (as a proxy for how disruptive the change would be for general users).

I naively chose to be very aggressive in specifying what must be kwarg-only: basically, I made it so that every keyword argument (i.e. that has a default value) must be named when calling the function. Some observations from this experiment:

Switching to keyword-only arguments is likely to break people

As expected, making this switch did break some tests and some internal usages of e.g. nx.to_numpy_array in other NetworkX functions. For example, about half of the tests in the test_pandas.py suite fail due to the keyword-only requirement on things like:

nx.from_pandas_edgelist(df, 0, "b", ["weight", "cost"])

Of course, failing on things like this is kind of the point, because the above code is not easy to understand without looking up the from_pandas_edgelist signature, whereas the function call that obeys the kwarg-only requirements is much more readable:

nx.from_pandas_edgelist(df, source=0, target="b", edge_attr=["weight", "cost"])

Nevertheless, I think this indicates that switching to keyword-only arguments is likely to catch people out and raise a lot of new TypeErrors on previously working code.

Deciding which arguments exactly should be keyword only is going to be tricky

In this experiment, I made all of the arguments after the first kwarg-only. This is because all of the conversion functions follow the same general pattern where the first argument is the main data structure to be acted upon (a graph, a numpy array, scipy sparse matrix, etc.) and all of the other arguments are optional. For cases like the pandas conversion functions above I think this made sense, as it's not at all clear what the 2nd and 3rd positional arguments represent without additional context.

However, for e.g. the to_numpy_array case, making the nodelist argument keyword only caused quite a few failures on internal uses of to_numpy_array in other nx functions. There, the common pattern seemed to be:

nodelist = [1, 4]
ary = nx.to_numpy_array(G, nodelist)

Forcing nodelist to be kwarg only causes the above to fail, requiring nx.to_numpy_array(G, nodelist=nodelist) instead. Based on the number of places this usage pattern shows up, it seems like there is a pretty strong convention that the 2nd positional argument of the to_* conversion functions is nodelist. This highlights one of the challenges in deciding where the kwarg-only cutoff should be. My general sense is that we should not be overly aggressive in making things kwarg-only if it will be disruptive without any obvious benefit.

Takeaways

My thoughts after conducting this experiment:

  • I think there should be an NXEP to clearly layout the motivation and general approach to introducing the keyword only requirement. I suspect that even the most seemingly-obvious/innocuous changes are going to cause new failures for some users, and it would be very beneficial to have a design doc to help frame any discussions.
  • Also, because of the potential for breakage, any kwarg-only modifications to signatures should coincide with a major release
  • My feeling is that we should be cognizant of how disruptive any change could be, and when in doubt to error on the side of caution. For instance, having conducted this experiment I would advocate for to_numpy_array(G, nodelist=None, *, ...) instead of to_numpy_array(G, *, nodelist=None, ...).

@rossbar rossbar added this to the networkx-3.0 milestone Sep 4, 2021
@rossbar
Copy link
Contributor Author

rossbar commented Sep 16, 2021

An important question to answer is: is this worth it? Even the mildest changes to the API are likely to break at least some existing code somewhere. There may be ways to make this less noisy (e.g. a decorator to convert kwarg-only exceptions to warnings), but no matter what it's likely to be a noisy change for existing code. In addition to the benefits mentioned above, having a slightly more strict interface can help prevent issues like e.g. #4845 . However, similar issues are mitigated by careful use of the existing API, and introducing more strict rules about how functions can be called will likely break users who are not experiencing such errors.

However, for new additions to the API, there are no such concerns about breaking existing usage, so it would definitely make sense to consider adding kwarg-only specifiers to the signatures of new functions where appropriate. Definitely something to keep in mind when reviewing contributions!

@rossbar rossbar added the Question A question about NetworkX or network science in general label Sep 16, 2021
@mjschwenne
Copy link
Contributor

As I originally read this, I was a bit confused as to why this would be needed, then I realized that it was probably because all of my recent work with NetworkX has been from a developer prospective were I can read all of the docstrings and implementations without needing to refer to the website or GitHub repo rather than a user prospective.

That being said, I agree that there is a possibility to increase the overall readability of NetworkX code across the board. This is still a new area of discussion for me, I'm not used to having to think about the overall API in programs that I am normally writing, but I think that we should be careful about your comment about new additions to the API. I don't think that we would want inconsistent keyword requirements based on something as seemingly arbitrary to an end-user as when a function was added to the library. At least to me, this is something that we would have to commit to entirely or not enforce at all and maybe try to encourage 'proper' usage by using keyword arguments if that is at all possible.

(If I'm totally off base here, just let me know)

@rossbar
Copy link
Contributor Author

rossbar commented Oct 14, 2021

(If I'm totally off base here, just let me know)

Definitely not - you've made a lot of nice observations that get at the core of the problem!

then I realized that it was probably because all of my recent work with NetworkX has been from a developer prospective were I can read all of the docstrings and implementations without needing to refer to the website or GitHub repo rather than a user prospective.

This is an insightful comment 👍 it's very easy to lose sight of other perspectives when you're used to reading the source code any time you don't understand something!

I don't think that we would want inconsistent keyword requirements based on something as seemingly arbitrary to an end-user

Agreed, I think deciding which (if any) new kwargs should be keyword-only is the tricky part. You're right that we definitely don't want it to seem arbitrary and there's a lot of subjectivity built-in here. For example, a usage-pattern that may feel natural to some might feel overly verbose to others, and unfortunately there is no "right" answer. IMO flexibility is one of the strengths of Python, but it's a double-edged sword and can catch unwary users out when not paying close attention to function signatures: see e.g. #4845 .

At least to me, this is something that we would have to commit to entirely or not enforce at all and maybe try to encourage 'proper' usage by using keyword arguments if that is at all possible.

I have a slightly different opinion here - IMO the penalty for breaking backwards compatibility is much higher than that of making new function signatures overly-restrictive. If a new function is introduced with keyword-only arguments and users raise issues about it, it's very easy to back the restriction out. OTOH, if we modify existing signatures we risk breaking user code that already runs. Thus I think if we're going to consider adopting keyword-only arguments at all, that doing so in new functions (that have no backwards compatibility concerns) is the place to start

@rossbar rossbar modified the milestones: networkx-3.0, networkx-3.1 Nov 29, 2022
@rossbar
Copy link
Contributor Author

rossbar commented Feb 27, 2023

I'm going to go ahead and close this as the purpose was to have collection point for discussion and it has served that purpose!

IMO the next step here is probably to formalize a plan via NXEP.

@rossbar rossbar closed this Feb 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Question A question about NetworkX or network science in general
Development

Successfully merging this pull request may close these issues.

None yet

2 participants