-
Notifications
You must be signed in to change notification settings - Fork 166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] Naming conventions #59
Comments
Thanks for raising this issue, @szuckerman! I’m also in favor of consistent naming schemes. Breaking this down a bit: ColumnsI vote in favor of “col_name”. It is more explicit than “column”. Verbose vs Concise FunctionsZen of Python states that explicit is better than implicit, with no qualifiers afterwards. For that reason, I vote in favor of verbose names. |
Labelling this a "good first issue" for newcomers, as it basically involves a find-and-replace. |
I'd be interested in taking on this issue if no one has done so yet. Would this need some support to warn users that names have changed, and that previous names will no longer work in a future version? |
@HectorM14 yes, that would be great! Please do go ahead.
Yes agreed. Since we're at version 0.16.x right now, the responsible way to evolve the API is to keep it through the next two minor versions, and the deprecate it at the next major or 3rd minor version. Perhaps raising a warning if and only if certain functions are called/keyword arguments are used. Personally, I would prefer one function at a time, which makes it easier to review, but at the same time, I think we should trust your judgment - especially if there's things that are logically groupable, feel free to PR in them together. Looking forward to what you've got; also, don't hesitate to ping in here! |
Great, I'll use my best judgement for PR scope. Is there a standard way to deprecate/discourage old argument names? I was thinking of using decorators to keep the method APIs clear about their usage going forward. |
Thanks @HectorM14!
To the best of my knowledge, the standard way to do this should be to insert a conditional checking to see if that old argument name was called or not, and then inside the conditional:
For example: from warning import warning
def deprecation_warning(old_arg, new_arg):
"""generic warning function."""
warning(f"{argname.__name__} is deprecated, and will be replaced with {new_arg.__name__}")
def func(df, old_arg1, arg2, arg3, new_arg):
"""docstring here"""
if old_arg1 is not None:
deprecation_warning(old_arg1, new_arg)
new_arg = old_arg1
# continue code
That might actually be better than what I had up there! I'd like to see the pattern in the first PR, it'd be educational for myself. |
Relevant PyCon talk on refactoring and adding |
Oh cool, thanks for sharing that, @szuckerman! |
Following advice in that talk, we could update def expand_column(df, sep, column_name=None, concat=True, **kwargs):
"""
Expand a categorical column with multiple labels into dummy-coded columns.
Super sugary syntax that wraps :py:meth:`pandas.Series.str.get_dummies`.
Functional usage example:
.. code-block:: python
df = expand_column(df, column_name='col_name',
sep=', ') # note space in sep
Method chaining example:
.. code-block:: python
import pandas as pd
import janitor
df = pd.DataFrame(...).expand_column(df, column_name='col_name', sep=', ')
:param df: A pandas DataFrame.
:param column_name: A `str` indicating which column to expand.
:param sep: The delimiter. Example delimiters include `|`, `, `, `,` etc.
:param bool concat: Whether to return the expanded column concatenated to
the original dataframe (`concat=True`), or to return it standalone
(`concat=False`).
"""
if kwargs and column_name is not None:
raise TypeError('Mixed usage of column and column_name')
if column_name is None:
warnings.warn('Use of column is deprecated. You should use column_name.')
column_name = kwargs['column']
expanded = df[column_name].str.get_dummies(sep=sep)
if concat:
df = df.join(expanded)
return df
else:
return expanded One notable consequence of this, is having to set column_name as None, which requires us to move its position in the list of arguments (during the deprecation period). The way it's written also assumes that additional arguments received would only relate to deprecated usage, and include Another issue is to decide what is the proper variable form of "column". Even when the variable names are explicit, |
@HectorM14 those are great points you've raised 😄.
I am guilty of being inconsistent here, as I sometimes use |
I like column_name too. I can aim for being explicit with these names unless they become too unwieldy. I'll start making individual PRs for each updated function. Updating variables relating columns is easy, but if anyone finds other variables that should be updated, they can list them here. |
Updating names is becoming a bit more complicated. For example we currently have: def coalesce(df, columns, new_column_name): Following the current style we've been using, we would update this to: def coalesce(df, new_column_name, columns=None, **kwargs): But quick, intuitive usage of this function from df = (
df
.coalesce(["certification", "certification_1"], "certification") With the proposed changes, this would become (due to rearranging of argument order): df = (
df
.coalesce(column_names=["certification", "certification_1"], new_column_name="certification")
) Which is excessively more complicated. So our fix will have to be more complex/clever to keep it clean for the user. |
@HectorM14 I wouldn't worry too much about the kwargs. In Python, explicit is better than implicit, generally speaking. I find myself writing out kwarg names even though it is sometimes easier to not do so, because I recognize that it makes the code more readable, actually. I would encourage you to keep going with the PRs 😄. |
Thanks @ericmjl , explicit is good! import functools
import warnings
def _deprecated_alias(**aliases):
def deco(f):
@functools.wraps(f)
def wrapper(*args, **kwargs):
_rename_kwargs(f.__name__, kwargs, aliases)
return f(*args, **kwargs)
return wrapper
return deco
def _rename_kwargs(func_name, kwargs, aliases):
for alias, new in aliases.items():
if alias in kwargs:
if new in kwargs:
raise TypeError('{} received both {} and {}'.format(
func_name, alias, new))
warnings.warn('{} is deprecated; use {}'.format(alias, new),
DeprecationWarning)
kwargs[new] = kwargs.pop(alias) using this as a decorator we would be able to do: @_deprecated_alias(a='alpha', b='beta')
def simple_sum(alpha, beta):
return alpha + beta
print(simple_sum(alpha=6, beta=9))
print(simple_sum(a=6, b=9)) With this method, updating the argument names would be much faster, and when the deprecation period has expired, we only need to delete one/two lines instead of 6+ lines currently. If we used this approach, I would make the above functions more human-readable. What do you think? |
This is a really great find! I'm all for reducing the maintenance burden 😄. Perhaps this function can be inserted into Also, does this mean you'd redo the name changes already PR'd? I don't want to put too much pressure on you, but if you'd do so, that'd be superb for maintenance purposes! |
Cool! I'll link to the SO post in the |
I have been merging from |
Rebasing works as well! It's a suggestion from @zbarry, which, frankly speaking, I rarely do, because it's just easier to remember That said, good practices are good to use, so I'm going to try moving to rebasing now. |
@HectorM14 - that decorator is a great find. Thanks for adding it! |
Yes! It's a very clean method that I would have been happy to find in the Python standard library. I assume most projects go through name changes and deprecation at some point. 🤷♂️ |
I didn't think of this previously, but should I add some tag comment to the decorators to indicate under which release the names were deprecated? After the decided major/minor release, it'll be easy to search for |
I wanted to discuss naming conventions for the various functions and arguments for consistency.
expand_column
has a parametercolumn
whereasadd_column
hadcol_name
.Also, is
_column
necessary in each function? Would it be ok to just have anadd()
ortransform()
method? In general I'm more on the side of more verbose function names, but just wanted to throw the idea out there.Similarly, when following the format of
limit_column_characters
functions likechange_type
should probably be namedchange_column_type
.I'm not married to any of this (except the function arguments, those should be consistent) but wanted to get peoples' ideas.
The text was updated successfully, but these errors were encountered: