Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Naming conventions #59

Closed
szuckerman opened this issue Nov 27, 2018 · 21 comments · Fixed by #168, #319, #321, #322 or #323
Closed

[ENH] Naming conventions #59

szuckerman opened this issue Nov 27, 2018 · 21 comments · Fixed by #168, #319, #321, #322 or #323
Labels
enhancement New feature or request good first issue Good for newcomers reference Comments and suggestions from external sources

Comments

@szuckerman
Copy link
Collaborator

I wanted to discuss naming conventions for the various functions and arguments for consistency. expand_column has a parameter column whereas add_column had col_name.

Also, is _column necessary in each function? Would it be ok to just have an add() or transform() method? In general I'm more on the side of more verbose function names, but just wanted to throw the idea out there.

Similarly, when following the format of limit_column_characters functions like change_type should probably be named change_column_type.

I'm not married to any of this (except the function arguments, those should be consistent) but wanted to get peoples' ideas.

@ericmjl
Copy link
Member

ericmjl commented Nov 27, 2018

Thanks for raising this issue, @szuckerman! I’m also in favor of consistent naming schemes.

Breaking this down a bit:

Columns

I vote in favor of “col_name”. It is more explicit than “column”.

Verbose vs Concise Functions

Zen of Python states that explicit is better than implicit, with no qualifiers afterwards. For that reason, I vote in favor of verbose names.

@ericmjl ericmjl added reference Comments and suggestions from external sources enhancement New feature or request good first issue Good for newcomers labels Dec 4, 2018
@ericmjl
Copy link
Member

ericmjl commented Dec 4, 2018

Labelling this a "good first issue" for newcomers, as it basically involves a find-and-replace.

@hectormz
Copy link
Collaborator

I'd be interested in taking on this issue if no one has done so yet. Would this need some support to warn users that names have changed, and that previous names will no longer work in a future version?

@ericmjl
Copy link
Member

ericmjl commented Apr 23, 2019

@HectorM14 yes, that would be great! Please do go ahead.

Would this need some support to warn users that names have changed, and that previous names will no longer work in a future version?

Yes agreed. Since we're at version 0.16.x right now, the responsible way to evolve the API is to keep it through the next two minor versions, and the deprecate it at the next major or 3rd minor version. Perhaps raising a warning if and only if certain functions are called/keyword arguments are used.

Personally, I would prefer one function at a time, which makes it easier to review, but at the same time, I think we should trust your judgment - especially if there's things that are logically groupable, feel free to PR in them together. Looking forward to what you've got; also, don't hesitate to ping in here!

@hectormz
Copy link
Collaborator

Great, I'll use my best judgement for PR scope. Is there a standard way to deprecate/discourage old argument names? I was thinking of using decorators to keep the method APIs clear about their usage going forward.

@ericmjl
Copy link
Member

ericmjl commented Apr 24, 2019

Thanks @HectorM14!

Is there a standard way to deprecate/discourage old argument names?

To the best of my knowledge, the standard way to do this should be to insert a conditional checking to see if that old argument name was called or not, and then inside the conditional:

  1. Raise a warning.
  2. Assign the value of the old argument to a new variable that has the same name as the new argument.

For example:

from warning import warning

def deprecation_warning(old_arg, new_arg):
    """generic warning function."""
    warning(f"{argname.__name__} is deprecated, and will be replaced with {new_arg.__name__}")

def func(df, old_arg1, arg2, arg3, new_arg):
    """docstring here"""
    if old_arg1 is not None:
        deprecation_warning(old_arg1, new_arg)
        new_arg = old_arg1
    # continue code

I was thinking of using decorators to keep the method APIs clear about their usage going forward.

That might actually be better than what I had up there! I'd like to see the pattern in the first PR, it'd be educational for myself.

@szuckerman
Copy link
Collaborator Author

Relevant PyCon talk on refactoring and adding warnings for legacy code. The actual code for warnings comes in around 17:00.

@ericmjl
Copy link
Member

ericmjl commented Apr 24, 2019

Oh cool, thanks for sharing that, @szuckerman!

@hectormz
Copy link
Collaborator

hectormz commented May 5, 2019

Following advice in that talk, we could update expand_column() as follows:

def expand_column(df, sep, column_name=None, concat=True, **kwargs):
    """
    Expand a categorical column with multiple labels into dummy-coded columns.

    Super sugary syntax that wraps :py:meth:`pandas.Series.str.get_dummies`.

    Functional usage example:

    .. code-block:: python

        df = expand_column(df, column_name='col_name',
                           sep=', ')  # note space in sep

    Method chaining example:

    .. code-block:: python

        import pandas as pd
        import janitor
        df = pd.DataFrame(...).expand_column(df, column_name='col_name', sep=', ')

    :param df: A pandas DataFrame.
    :param column_name: A `str` indicating which column to expand.
    :param sep: The delimiter. Example delimiters include `|`, `, `, `,` etc.
    :param bool concat: Whether to return the expanded column concatenated to
        the original dataframe (`concat=True`), or to return it standalone
        (`concat=False`).
    """
    if kwargs and column_name is not None:
        raise TypeError('Mixed usage of column and column_name')
    if column_name is None:
        warnings.warn('Use of column is deprecated. You should use column_name.')
        column_name = kwargs['column']
    expanded = df[column_name].str.get_dummies(sep=sep)
    if concat:
        df = df.join(expanded)
        return df
    else:
        return expanded

One notable consequence of this, is having to set column_name as None, which requires us to move its position in the list of arguments (during the deprecation period).

The way it's written also assumes that additional arguments received would only relate to deprecated usage, and include column.

Another issue is to decide what is the proper variable form of "column". Even when the variable names are explicit, *_col_* and *_column_* are both used throughout the code.

@ericmjl
Copy link
Member

ericmjl commented May 5, 2019

@HectorM14 those are great points you've raised 😄.

Another issue is to decide what is the proper variable form of "column". Even when the variable names are explicit, *_col_* and *_column_* are both used throughout the code.

I am guilty of being inconsistent here, as I sometimes use col and sometimes use column and sometimes use colname. Yes, being consistent is a great thing. I would personally vote for column_name, in adherence to "explicit is better than implicit", but if there are better reasons (brevity, perhaps?) for other conventions, not a problem! I have no strong opinions on this one, as long as we can make it towards consistency in the end!

@hectormz
Copy link
Collaborator

hectormz commented May 5, 2019

I like column_name too. I can aim for being explicit with these names unless they become too unwieldy. I'll start making individual PRs for each updated function. Updating variables relating columns is easy, but if anyone finds other variables that should be updated, they can list them here.

@hectormz
Copy link
Collaborator

hectormz commented May 7, 2019

Updating names is becoming a bit more complicated. For example we currently have:

def coalesce(df, columns, new_column_name):

Following the current style we've been using, we would update this to:

def coalesce(df, new_column_name, columns=None, **kwargs):

But quick, intuitive usage of this function from example.py is:

df = (
    df
    .coalesce(["certification", "certification_1"], "certification")

With the proposed changes, this would become (due to rearranging of argument order):

df = (
    df
    .coalesce(column_names=["certification", "certification_1"], new_column_name="certification")
)

Which is excessively more complicated. So our fix will have to be more complex/clever to keep it clean for the user.

@ericmjl
Copy link
Member

ericmjl commented May 8, 2019

@HectorM14 I wouldn't worry too much about the kwargs. In Python, explicit is better than implicit, generally speaking. I find myself writing out kwarg names even though it is sometimes easier to not do so, because I recognize that it makes the code more readable, actually. I would encourage you to keep going with the PRs 😄.

@ericmjl ericmjl changed the title Naming conventions [ENH] Naming conventions May 8, 2019
@hectormz
Copy link
Collaborator

hectormz commented May 8, 2019

Thanks @ericmjl , explicit is good!
I actually came across this example on SO:

import functools
import warnings

def _deprecated_alias(**aliases):
    def deco(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            _rename_kwargs(f.__name__, kwargs, aliases)
            return f(*args, **kwargs)
        return wrapper
    return deco

def _rename_kwargs(func_name, kwargs, aliases):
    for alias, new in aliases.items():
        if alias in kwargs:
            if new in kwargs:
                raise TypeError('{} received both {} and {}'.format(
                    func_name, alias, new))
            warnings.warn('{} is deprecated; use {}'.format(alias, new),
                          DeprecationWarning)
            kwargs[new] = kwargs.pop(alias)

using this as a decorator we would be able to do:

@_deprecated_alias(a='alpha', b='beta')
def simple_sum(alpha, beta):
    return alpha + beta

print(simple_sum(alpha=6, beta=9))
print(simple_sum(a=6, b=9))

With this method, updating the argument names would be much faster, and when the deprecation period has expired, we only need to delete one/two lines instead of 6+ lines currently. If we used this approach, I would make the above functions more human-readable.

What do you think?

@ericmjl
Copy link
Member

ericmjl commented May 8, 2019

This is a really great find! I'm all for reducing the maintenance burden 😄.

Perhaps this function can be inserted into utils.py? Be sure to acknowledge the source by linking to SO!

Also, does this mean you'd redo the name changes already PR'd? I don't want to put too much pressure on you, but if you'd do so, that'd be superb for maintenance purposes!

@hectormz
Copy link
Collaborator

hectormz commented May 8, 2019

Cool! utils.py sounds like a good place.

I'll link to the SO post in the _deprecated_alias docstring.
I'll start with a PR that includes these new functions, and include some tests to make sure it works/fails/warns as expected. After that, I'm fine updating the previously changed functions and moving forward. I actually was considering revisiting those functions anyway after learning that there was an actual DeprecationWarning subclass.

@hectormz
Copy link
Collaborator

hectormz commented May 8, 2019

I have been merging from upstream before committing, should I be rebasing instead?

@ericmjl
Copy link
Member

ericmjl commented May 8, 2019

Rebasing works as well! It's a suggestion from @zbarry, which, frankly speaking, I rarely do, because it's just easier to remember git fetch upstream; git merge upstream/dev. 😄

That said, good practices are good to use, so I'm going to try moving to rebasing now.

@zbarry
Copy link
Collaborator

zbarry commented May 11, 2019

@HectorM14 - that decorator is a great find. Thanks for adding it!

@hectormz
Copy link
Collaborator

Yes! It's a very clean method that I would have been happy to find in the Python standard library. I assume most projects go through name changes and deprecation at some point. 🤷‍♂️

@hectormz
Copy link
Collaborator

Yes agreed. Since we're at version 0.16.x right now, the responsible way to evolve the API is to keep it through the next two minor versions, and the deprecate it at the next major or 3rd minor version. Perhaps raising a warning if and only if certain functions are called/keyword arguments are used.

I didn't think of this previously, but should I add some tag comment to the decorators to indicate under which release the names were deprecated? After the decided major/minor release, it'll be easy to search for #0.16x etc and remove those lines. In a similar vein, in the unlikely event a function has multiple rounds of deprecation at different releases, a second decorator could be added for the new deprecation (unless that's just too complicated).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment