Add __repr__ for Column and ColumnAccessor #7531

Merged 2 commits on Mar 12, 2021

shwina commented Mar 8, 2021

Summary:

  • Add a __repr__ for Column (thin wrapper around the __repr__ of the underlying pa.Array)
  • Add a __repr__ for ColumnAccessor (similar to pa.Table, shows the names/types of the columns of the ColumnAccessor)
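
As a rough illustration of the first bullet, here is a minimal sketch of a column `__repr__` that delegates its body to the repr of an Arrow-like array. `FakeArrowArray` and `ToyColumn` are hypothetical stand-ins written for this example; they are not the cuDF or pyarrow classes, and the actual change in this PR may be structured differently.

```python
class FakeArrowArray:
    """Hypothetical stand-in for an Arrow array; it only mimics the printed form."""

    def __init__(self, values):
        self.values = list(values)

    def __repr__(self):
        # One element per line, with None rendered as null.
        items = ",\n".join(
            "  null" if v is None else f"  {v}" for v in self.values
        )
        return f"[\n{items}\n]"


class ToyColumn:
    """Hypothetical column type used only for illustration."""

    def __init__(self, values, dtype):
        self._values = list(values)
        self.dtype = dtype

    def to_arrow(self):
        # Assumption: converting to an Arrow-like array is acceptable for debugging output.
        return FakeArrowArray(self._values)

    def __repr__(self):
        # Keep the default object header, then delegate the body to the array's repr.
        return (
            f"{object.__repr__(self)}\n"
            f"{self.to_arrow()!r}\n"
            f"dtype: {self.dtype}"
        )


col = ToyColumn([1, 2, None, 3], "int64")
print(repr(col))
```

Delegating to the array's repr keeps the column class free of its own formatting logic, at the cost of building the array each time the object is printed.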

Additional info:

Debugging is sometimes painful because columns and column accessors have no __repr__. For example, here is what a ColumnAccessor and a Column currently look like when printed:

In [2]: cudf.DataFrame({'a': [1, 2, 3], "b": [4, 5, 6], "z_1": [2, 3, 4]})._data
Out[2]: ColumnAccessor(OrderedColumnDict([('a', <cudf.core.column.numerical.NumericalColumn object at 0x7f0306336f80>), ('b', <cudf.core.column.numerical.NumericalColumn object at 0x7f03062a05f0>), ('z_1', <cudf.core.column.numerical.NumericalColumn object at 0x7f03062a0e60>)]), multiindex=False, level_names=(None,))

In [3]: cudf.Series([1, 2, None, 3])._column
Out[3]: <cudf.core.column.numerical.NumericalColumn at 0x7f2190746710>

After this PR:

In [2]: cudf.DataFrame({'a': [1, 2, 3], "b": [4, 5, 6], "z_1": [2, 3, 4]})._data
Out[2]:
ColumnAccessor(multiindex=False, level_names=(None,))
a: int64
b: int64
z_1: int64

In [3]: cudf.Series([1, 2, None, 3])._column
Out[3]:
<cudf.core.column.numerical.NumericalColumn object at 0x7f3e90c2ac20>
[
  1,
  2,
  null,
  3
]
dtype: int64
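
The ColumnAccessor layout shown above (a header line followed by one `name: dtype` line per column) could be produced along these lines. `ToyColumnAccessor` is a hypothetical class that stores dtype strings directly; it is a sketch of the output format only, not the cuDF implementation.

```python
class ToyColumnAccessor:
    """Hypothetical mapping of column names to dtype strings."""

    def __init__(self, dtypes, multiindex=False, level_names=(None,)):
        self._dtypes = dict(dtypes)  # a dict keeps insertion order
        self.multiindex = multiindex
        self.level_names = level_names

    def __repr__(self):
        # Header line with the accessor-level settings.
        header = (
            f"{type(self).__name__}("
            f"multiindex={self.multiindex}, level_names={self.level_names})"
        )
        # One "name: dtype" line per column, in insertion order.
        body = [f"{name}: {dtype}" for name, dtype in self._dtypes.items()]
        return "\n".join([header, *body])


acc = ToyColumnAccessor({"a": "int64", "b": "int64", "z_1": "int64"})
print(repr(acc))
```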

shwina requested a review from a team as a code owner on March 8, 2021 19:24
github-actions bot added the Python (Affects Python cuDF API) label on Mar 8, 2021

shwina commented Mar 8, 2021

Should ColumnAccessor print more like a dictionary to reinforce that it's a mapping?
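
For comparison, a dictionary-style repr might look like the sketch below. `DictLikeAccessor` is a hypothetical class built on `collections.abc.Mapping`; it only illustrates the alternative being asked about and is not what this PR implements.

```python
from collections.abc import Mapping


class DictLikeAccessor(Mapping):
    """Hypothetical read-only mapping of column names to dtype strings."""

    def __init__(self, dtypes):
        self._dtypes = dict(dtypes)

    def __getitem__(self, name):
        return self._dtypes[name]

    def __iter__(self):
        return iter(self._dtypes)

    def __len__(self):
        return len(self._dtypes)

    def __repr__(self):
        # Reuse the dict repr so the output reads like a mapping literal.
        return f"{type(self).__name__}({self._dtypes!r})"


dict_like = DictLikeAccessor({"a": "int64", "b": "int64"})
print(repr(dict_like))
```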

shwina added the improvement (Improvement / enhancement to an existing function) and non-breaking (Non-breaking change) labels on Mar 8, 2021
shwina added the 3 - Ready for Review (Ready for review by team) label on Mar 8, 2021

isVoid commented Mar 8, 2021

Does {'a': 1, 'b': 2} imply that the keys are unordered in Python? I suggest using extra symbols to indicate the ordering as well.

shwina commented Mar 8, 2021

@isVoid Currently we don't print ColumnAccessor like a dict, so nothing implies that the keys are unordered.
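
As background for the ordering question: built-in dicts have preserved insertion order as a language guarantee since Python 3.7, and their repr lists keys in that order. A small check, using made-up column names:

```python
# Insert keys in a deliberately non-alphabetical order.
columns = {}
columns["z_1"] = "int64"
columns["a"] = "int64"
columns["b"] = "int64"

# Iteration and repr both follow insertion order, not sorted order.
print(list(columns))
print(columns)
```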

shwina commented Mar 11, 2021

rerun tests

codecov bot commented Mar 11, 2021

Codecov Report

Merging #7531 (0afb9a2) into branch-0.19 (7871e7a) will increase coverage by 0.51%.
The diff coverage is 92.00%.

@@               Coverage Diff               @@
##           branch-0.19    #7531      +/-   ##
===============================================
+ Coverage        81.86%   82.38%   +0.51%     
===============================================
  Files              101      101              
  Lines            16884    17340     +456     
===============================================
+ Hits             13822    14285     +463     
+ Misses            3062     3055       -7     
Impacted Files                                Coverage Δ
python/cudf/cudf/core/column/column.py        87.83% <83.33%> (+0.07%) ⬆️
python/cudf/cudf/core/column/decimal.py       93.33% <90.90%> (-1.54%) ⬇️
python/cudf/cudf/core/column/string.py        86.76% <100.00%> (+0.26%) ⬆️
python/cudf/cudf/core/column_accessor.py      95.45% <100.00%> (+0.14%) ⬆️
python/cudf/cudf/utils/cudautils.py           52.94% <100.00%> (+2.55%) ⬆️
python/cudf/cudf/utils/gpu_utils.py           53.65% <0.00%> (-4.88%) ⬇️
python/cudf/cudf/core/abc.py                  87.23% <0.00%> (-1.14%) ⬇️
python/cudf/cudf/core/column/numerical.py     94.85% <0.00%> (-0.17%) ⬇️
python/cudf/cudf/io/feather.py                100.00% <0.00%> (ø)
... and 48 more
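
The overall percentages in the coverage diff above can be checked by hand as hits divided by lines. The helper name `coverage_pct` is made up for this example, and the sketch makes no claim about how Codecov rounds the displayed delta.

```python
def coverage_pct(hits, lines):
    """Return line coverage as a percentage."""
    return 100.0 * hits / lines


# Figures copied from the coverage diff: base is branch-0.19, head is this PR.
base = coverage_pct(13822, 16884)
head = coverage_pct(14285, 17340)
print(f"{base:.2f}% -> {head:.2f}%")
```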

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 9618a81...0afb9a2.

kkraus14 commented

@gpucibot merge

rapids-bot merged commit 5c4fa28 into rapidsai:branch-0.19 on Mar 12, 2021
hyperbolic2346 pushed a commit to hyperbolic2346/cudf that referenced this pull request Mar 25, 2021

Authors:
  - Ashwin Srinath (@shwina)

Approvers:
  - Keith Kraus (@kkraus14)

URL: rapidsai#7531