Remove Awkward-as-a-Pandas-column feature, as discussed in #350. #460

jpivarski · 2020-09-18T19:39:57Z

No description provided.

nsmith- · 2020-09-18T21:19:21Z

This PR removed Array.columns, is this intended? How do we enumerate the items of a highlevel record array in its absence?

jpivarski · 2020-09-18T21:25:45Z

That was intended. Wasn't there a deprecation message? The proper way is

ak.keys(array)

jpivarski · 2020-09-18T21:31:47Z

I also forgot that array.tojson should be dropped in favor of ak.to_json(array). The only method that stays is array.tolist (no space) because it's so useful, NumPy has it, and it's not going to conflict with a domain-specific method.

Is a non-method for keys too cumbersome? Would array.keys() conflict with any physics cases? ak.Array isn't going to be a Mapping, and I don't want to give that impression. array.columns was one of the unintended side-effects of supporting Pandas, which confused at least one person in a tutorial.

nsmith- · 2020-09-18T21:35:20Z

On master,

>>> import awkward1
>>> a = awkward1.Array([{"x": 1, "y": 2}])
>>> a.columns
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ncsmith/src/awkward-1.0/awkward1/highlevel.py", line 1058, in __getattr__
    raise AttributeError(
AttributeError: no field named 'columns'

(https://github.com/scikit-hep/awkward-1.0/blob/0.3.0/src/awkward1/highlevel.py#L1060)

We've been using a.columns in coffea documentation for a while, and it was available in ak0..
Why drop this and not ['Mask', 'behavior', 'cache', 'layout', 'mask', 'nbytes', 'numba_type', 'slot0', 'slot1', 'slot2', 'slot3', 'slot4', 'slot5', 'slot6', 'slot7', 'slot8', 'slot9', 'tojson', 'tolist'] ? It seems to me the well-intentioned separation between awkward operations and behaviors is becoming overzealous.

jpivarski · 2020-09-18T21:45:12Z

columns was never intended or desired. The high-level name for them everywhere is keys. (Although internally, I do slip into fields.)

tojson was a generalization of tolist, but having tolist as a method is far more useful.

nbytes is because caches use this to determine how much of the cache an object is using (not just Uproot's).

The slot* form a pattern. It's not really taking up 10 names from the namespace, but 1-ish.

layout and behavior were intended from the beginning, as intentional gaps in the namespace.

numba_type was a later addition, but unlikely to hit anyone's domain-specific data because it has a product name in it.

mask is specifically for the array[cut] → array.mask[cut] generalization, so it has to be a property. Mask is the same thing, capitalized, so the two of these are 1-ish.

cache was originally metadata so that any more such things would be hidden inside that one spot in the namespace. But you know how that turned out.

That's all of them. They all had reasons. If it's convenient to want to add .keys() or .fields as an afterthought, after typing the array, then I'm willing to add it in the same spirit as array.tolist(). You've pointed out how I'm not 100% purist about using this namespace; I'm willing to make this a method/property, but not named "columns." We don't use the word "columns" anywhere else to refer to this concept, and it is good to reign in the number of different names for the same thing.

nsmith- · 2020-09-18T22:09:24Z

I guess now or never to change. I scanned code I have and it looks like switching x.columns to ak.keys(x) is palatable. The fact that awkward0 arrays were near enough to pandas (or Spark) dataframes with respect to this property was, I guess, an unintended feature that we've gotten used to.

Remove Awkward-as-a-Pandas-column feature, as discussed in #350.

37c890d

jpivarski linked an issue Sep 18, 2020 that may be closed by this pull request

Should Awkward Arrays be usable as Pandas columns? #350

Closed

jpivarski merged commit 3d461fc into master Sep 18, 2020

jpivarski deleted the jpivarski/remove-awkward-as-pandas-column branch September 18, 2020 20:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove Awkward-as-a-Pandas-column feature, as discussed in #350. #460

Remove Awkward-as-a-Pandas-column feature, as discussed in #350. #460

jpivarski commented Sep 18, 2020

nsmith- commented Sep 18, 2020

jpivarski commented Sep 18, 2020

jpivarski commented Sep 18, 2020

nsmith- commented Sep 18, 2020

jpivarski commented Sep 18, 2020

nsmith- commented Sep 18, 2020

Remove Awkward-as-a-Pandas-column feature, as discussed in #350. #460

Remove Awkward-as-a-Pandas-column feature, as discussed in #350. #460

Conversation

jpivarski commented Sep 18, 2020

nsmith- commented Sep 18, 2020

jpivarski commented Sep 18, 2020

jpivarski commented Sep 18, 2020

nsmith- commented Sep 18, 2020

jpivarski commented Sep 18, 2020

nsmith- commented Sep 18, 2020