Rename ak.choose and ak.cross. #201

Merged
merged 10 commits into master from bug/0200-rename-choose-and-cross on Apr 3, 2020


jpivarski
Member

This is for #199, and I think I'm going to go with

  • choose → combinations
  • diagonal → replacement
  • cross → product

for the sake of following itertools as much as possible. cartesian is also a good word for the last one, but I think if we're going to follow itertools for one, we should follow it for both. (These two functions are related; rules for one should apply to the other.) Besides, someone might not know who Descartes is.
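For reference, the three itertools functions these names borrow from behave as follows on plain Python lists. This is only a sketch of the standard-library behavior the naming follows; it says nothing about how the Awkward functions are implemented or what arguments they take.

```python
import itertools

items = ["a", "b", "c"]

# choose -> combinations: unordered pairs, no element paired with itself
pairs = list(itertools.combinations(items, 2))

# diagonal -> replacement: same, but an element may pair with itself
pairs_with_self = list(itertools.combinations_with_replacement(items, 2))

# cross -> product: one element from each of two lists, every pairing
crossed = list(itertools.product([1, 2], ["x", "y"]))

print(pairs)            # [('a', 'b'), ('a', 'c'), ('b', 'c')]
print(pairs_with_self)  # [('a', 'a'), ('a', 'b'), ('a', 'c'), ('b', 'b'), ('b', 'c'), ('c', 'c')]
print(crossed)          # [(1, 'x'), (1, 'y'), (2, 'x'), (2, 'y')]
```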

@jpivarski
Member Author

Something to consider:

  • rpad → padna

It can't just be "pad" because of np.pad. We (with @ianna) added an "r" because the None values go on the right, not the left. But really, unequal length lists always align left like this—it has to do with the fact that the leftmost objects can be identified with each other while the rightmost cannot. Putting "r" for "right" in the name is (1) a little obscure; I wouldn't guess "right", and (2) begs the question, "why not left?" I don't want to entertain that question.

So instead of saying which side should get padded, since this function isn't actually different in that regard, let's use the disambiguator to say what it should be padded with. NumPy's pad fills with zeros; Pandas's pd.Series.str.pad pads with spaces. The fact that Awkward fills with None is a bit unusual—the other libraries don't have the opportunity to do that.

So maybe this:

  • rpad → pad_none and
  • fillna → fill_none

This would be more in keeping with the use of snake_case and whole words, which was a request from many people (scikit-hep/awkward-0.x#240). @nsmith- suggested fillna for agreement with Pandas, but that's a method on DataFrame, not a free function like ours, and the "na" comes from R, which doesn't really have an equivalent in Python.
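To make the two names concrete, here is a pure-Python sketch of the behavior being described: padding short lists on the right with None, then replacing None with a value. The helper functions below are hypothetical stand-ins written for this illustration; their signatures are assumptions and are not the Awkward API.

```python
def pad_none(lists, target):
    """Pad each inner list on the right with None up to `target` items.

    Hypothetical helper: lists already longer than `target` are left as-is.
    """
    return [lst + [None] * max(0, target - len(lst)) for lst in lists]


def fill_none(lists, value):
    """Replace every None in each inner list with `value` (hypothetical helper)."""
    return [[value if x is None else x for x in lst] for lst in lists]


ragged = [[1.1, 2.2, 3.3], [], [4.4, 5.5]]
padded = pad_none(ragged, 3)
filled = fill_none(padded, 0.0)

print(padded)  # [[1.1, 2.2, 3.3], [None, None, None], [4.4, 5.5, None]]
print(filled)  # [[1.1, 2.2, 3.3], [0.0, 0.0, 0.0], [4.4, 5.5, 0.0]]
```

The None values land on the right because the lists are aligned on their leftmost items, which is the point made above about not needing an "r" in the name.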

@jpivarski
Member Author

jpivarski commented Apr 2, 2020

Ach! But ak.Array must have a method named isna or else it can't be a Pandas column!

Okay, so there's a difference between free functions and methods:

  • function ak.pad_none
  • function ak.fill_none
  • function ak.is_none (no function ak.not_none; that's gratuitous)
  • function ak.to_list
  • method ak.Array.isna because Pandas requires it
  • method ak.Array.dropna might be required for performance in Pandas
  • method ak.Array.fillna might be required for performance in Pandas
  • method ak.Array.tolist to be like NumPy; it's just natural to type
  • method ak.Array.tojson for symmetry

If the People ask for it, synonyms ak.tolist → ak.to_list and ak.tojson → ak.to_json can be added later, but only if it's a true desire path.
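The split between free functions and methods could look roughly like the sketch below: the snake_case free functions do the work, and the methods are thin wrappers whose spellings match what Pandas or NumPy users expect. The Array class here is a hypothetical stand-in for illustration only, not the real ak.Array.

```python
class Array:
    """Hypothetical stand-in for ak.Array holding possibly-missing values."""

    def __init__(self, values):
        self._values = list(values)

    def tolist(self):
        # NumPy-style spelling, kept as a method because it is natural to type.
        return to_list(self)

    def isna(self):
        # Pandas-style spelling, kept as a method because Pandas requires it.
        return is_none(self)


def to_list(array):
    """Free function with the snake_case name."""
    return list(array._values)


def is_none(array):
    """Free function with the snake_case name."""
    return [v is None for v in array._values]


arr = Array([1, None, 3])
print(to_list(arr), arr.tolist())  # [1, None, 3] [1, None, 3]
print(is_none(arr), arr.isna())    # [False, True, False] [False, True, False]
```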

To do:

  • Document the ak.Array methods that exist only for Pandas compatibility.

@jpivarski
Member Author

Hmm. NumPy has an np.product, but it's going to be deprecated. That could count as an argument against it—a NumPy function of that name technically exists—but it could count as an argument for it—NumPy will never use that name again, for fear of conflicts with old code.

@jpivarski
Member Author

As long as ak.covar and ak.corr are not quite the same as their NumPy counterparts, I'll keep the different names. That, at least, will prevent accidental misidentification. The NumPy functions seem to be designed for a single large matrix, while the Awkward functions are reducer-like, which favors computing many small covariances and correlations (e.g. tracking or vertex-finding or something).

Also, the ak.covar name has symmetry with ak.var, and they are much closer in behavior than np.cov is to np.var.
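A rough illustration of what "reducer-like" means here: one covariance per inner list, rather than one covariance matrix for the whole dataset as np.cov returns. The functions below are hypothetical pure-Python helpers, and the choice of population covariance (dividing by n) is an assumption of this sketch, not a statement about what ak.covar computes.

```python
def mean(xs):
    return sum(xs) / len(xs)


def covar(xs, ys):
    """Population covariance of two equal-length lists (assumed normalization)."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])


def var(xs):
    """Variance as the covariance of a list with itself."""
    return covar(xs, xs)


# Two ragged arrays with matching inner lengths, e.g. per-event quantities.
x = [[1.0, 2.0, 3.0], [4.0, 6.0]]
y = [[2.0, 4.0, 6.0], [1.0, 0.0]]

# Reducer-like use: each pair of inner lists collapses to one number.
per_list_covar = [covar(a, b) for a, b in zip(x, y)]
per_list_var = [var(a) for a in x]

print(per_list_covar)
print(per_list_var)
```

Defining var in terms of covar shows the symmetry between the two names mentioned above.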

@jpivarski
Member Author

jpivarski commented Apr 2, 2020

Manually document

  • ak.numba.register
  • ak.pandas.register
  • ak.pandas.df
  • ak.pandas.dfs
  • ak.numexpr.evaluate
  • ak.numexpr.re_evaluate
  • ak.autograd.elementwise_grad

@jpivarski jpivarski changed the title from "[WIP] Rename ak.choose and ak.cross." to "Rename ak.choose and ak.cross." on Apr 3, 2020
@jpivarski
Member Author

Final names:

old                 new
ak.choose           ak.combinations
ak.diagonal         ak.replacement
ak.cross            ak.cartesian
ak.rpad             ak.pad_none
ak.fillna           ak.fill_none
ak.isna             ak.is_none
ak.notna            gone!
ak.Array.to_list    ak.Array.tolist
ak.Array.to_json    ak.Array.tojson

ak.covar and ak.corr are unchanged, as are the methods that are required for Pandas.

@jpivarski jpivarski merged commit 5ef9c9f into master Apr 3, 2020
@jpivarski jpivarski deleted the bug/0200-rename-choose-and-cross branch April 3, 2020 17:49