Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove Awkward-as-a-Pandas-column feature, as discussed in #350. #460

Merged
merged 1 commit into from
Sep 18, 2020

Conversation

jpivarski
Copy link
Member

No description provided.

@jpivarski jpivarski linked an issue Sep 18, 2020 that may be closed by this pull request
@jpivarski jpivarski merged commit 3d461fc into master Sep 18, 2020
@jpivarski jpivarski deleted the jpivarski/remove-awkward-as-pandas-column branch September 18, 2020 20:21
@nsmith-
Copy link
Contributor

nsmith- commented Sep 18, 2020

This PR removed Array.columns, is this intended? How do we enumerate the items of a highlevel record array in its absence?

@jpivarski
Copy link
Member Author

That was intended. Wasn't there a deprecation message? The proper way is

ak.keys(array)

@jpivarski
Copy link
Member Author

I also forgot that array.tojson should be dropped in favor of ak.to_json(array). The only method that stays is array.tolist (no space) because it's so useful, NumPy has it, and it's not going to conflict with a domain-specific method.

Is a non-method for keys too cumbersome? Would array.keys() conflict with any physics cases? ak.Array isn't going to be a Mapping, and I don't want to give that impression. array.columns was one of the unintended side-effects of supporting Pandas, which confused at least one person in a tutorial.

@nsmith-
Copy link
Contributor

nsmith- commented Sep 18, 2020

On master,

>>> import awkward1
>>> a = awkward1.Array([{"x": 1, "y": 2}])
>>> a.columns
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ncsmith/src/awkward-1.0/awkward1/highlevel.py", line 1058, in __getattr__
    raise AttributeError(
AttributeError: no field named 'columns'

(https://github.com/scikit-hep/awkward-1.0/blob/0.3.0/src/awkward1/highlevel.py#L1060)

We've been using a.columns in coffea documentation for a while, and it was available in ak0..
Why drop this and not ['Mask', 'behavior', 'cache', 'layout', 'mask', 'nbytes', 'numba_type', 'slot0', 'slot1', 'slot2', 'slot3', 'slot4', 'slot5', 'slot6', 'slot7', 'slot8', 'slot9', 'tojson', 'tolist'] ? It seems to me the well-intentioned separation between awkward operations and behaviors is becoming overzealous.

@jpivarski
Copy link
Member Author

columns was never intended or desired. The high-level name for them everywhere is keys. (Although internally, I do slip into fields.)

tojson was a generalization of tolist, but having tolist as a method is far more useful.

nbytes is because caches use this to determine how much of the cache an object is using (not just Uproot's).

The slot* form a pattern. It's not really taking up 10 names from the namespace, but 1-ish.

layout and behavior were intended from the beginning, as intentional gaps in the namespace.

numba_type was a later addition, but unlikely to hit anyone's domain-specific data because it has a product name in it.

mask is specifically for the array[cut]array.mask[cut] generalization, so it has to be a property. Mask is the same thing, capitalized, so the two of these are 1-ish.

cache was originally metadata so that any more such things would be hidden inside that one spot in the namespace. But you know how that turned out.

That's all of them. They all had reasons. If it's convenient to want to add .keys() or .fields as an afterthought, after typing the array, then I'm willing to add it in the same spirit as array.tolist(). You've pointed out how I'm not 100% purist about using this namespace; I'm willing to make this a method/property, but not named "columns." We don't use the word "columns" anywhere else to refer to this concept, and it is good to reign in the number of different names for the same thing.

@nsmith-
Copy link
Contributor

nsmith- commented Sep 18, 2020

I guess now or never to change. I scanned code I have and it looks like switching x.columns to ak.keys(x) is palatable. The fact that awkward0 arrays were near enough to pandas (or Spark) dataframes with respect to this property was, I guess, an unintended feature that we've gotten used to.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Should Awkward Arrays be usable as Pandas columns?
2 participants