Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend __str__ and __repr__ methods of BaseTableCoordinate, ExtraCoords, GlobalCoords #453

Merged
merged 5 commits into from
Oct 22, 2021

Conversation

dhomeier
Copy link
Contributor

@dhomeier dhomeier commented Aug 4, 2021

Description

This is a WIP to improve and expand the string representations to begin addressing #413.
For a start adding physical_types information to ExtraCoords and get line breaks better in line with numpy standards for long arrays.

>>> cube.extra_coords
<ndcube.extra_coords.extra_coords.ExtraCoords object at 0x121c60b80>
ExtraCoords(time (0) ['custom:time:creation']: ['2310-03-22T12:30:30.990' '2416-03-22T12:30:32.000'
 '2416-03-23T12:30:33.000'],
            exposure time (0) ['custom:time:duration']: (<Quantity [0., 1., 2., 3., 4., 5., 6., 7., 8., 9.] s>,))
>>> np.set_printoptions(linewidth=160)
>>> ndc.extra_coords
<ndcube.extra_coords.extra_coords.ExtraCoords object at 0x121c60b80>
ExtraCoords(time (0) ['custom:time:creation']: ['2310-03-22T12:30:30.990' '2416-03-22T12:30:32.000' '2416-03-23T12:30:33.000'],
            exposure time (0) ['custom:time:duration']: (<Quantity [0., 1., 2., 3., 4., 5., 6., 7., 8., 9.] s>,))
>>> np.set_printoptions(linewidth=300)
>>> ndc.extra_coords
<ndcube.extra_coords.extra_coords.ExtraCoords object at 0x121c60b80>
ExtraCoords(time (0) ['custom:time:creation']: ['2310-03-22T12:30:30.990' '2416-03-22T12:30:32.000' '2416-03-23T12:30:33.000'], exposure time (0) ['custom:time:duration']: (<Quantity [0., 1., 2., 3., 4., 5., 6., 7., 8., 9.] s>,))

The output could still be slightly prettified for empty physical_types, but having None printed might be useful, too.

Some information on NDCube.data should be added next; do we want to keep this at metadata or also print the content in a similar manner?

@DanRyanIrish DanRyanIrish added this to the 2.0 milestone Aug 4, 2021
@DanRyanIrish
Copy link
Member

Hi @dhomeier. Thanks so much for tackling this issue. Any more progress recently?

@pep8speaks
Copy link

pep8speaks commented Aug 12, 2021

Hello @dhomeier! Thanks for updating this PR.

Line 477:56: W504 line break after binary operator
Line 242:48: W504 line break after binary operator

Line 70:70: W504 line break after binary operator

Line 661:38: E741 ambiguous variable name 'l'

Comment last updated at 2021-08-13 13:46:03 UTC

ndcube/ndcube.py Outdated
@@ -667,7 +667,8 @@ def __str__(self):
NDCube
------
Dimensions: {self.dimensions}
Physical Types of Axes: {self.array_axis_physical_types}""")
Physical Types of Axes: {self.array_axis_physical_types}
Data: Array{self.data.shape} [{self.unit}] '{self.data.dtype}'""")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Data: Array{self.data.shape} [{self.unit}] '{self.data.dtype}'""")
Unit: {self.unit}
Data Type: {self.data.dtype}""")

I suggest his change because the data shape is already captured by the dimensions line, above.

@DanRyanIrish
Copy link
Member

I like what you've done wit ExtraCoords and GlobalCoords @dhomeier . My one question is what happens if an coord is very long? Is the __str__ still readable?

@DanRyanIrish
Copy link
Member

This should have a changelog file named 453.doc.rst in the changelog directory.

@DanRyanIrish
Copy link
Member

See what my comment today on #413 for a roadmap for this PR.

@dhomeier
Copy link
Contributor Author

dhomeier commented Aug 13, 2021

I like what you've done wit ExtraCoords and GlobalCoords @dhomeier . My one question is what happens if an coord is very long? Is the __str__ still readable?

That would be handled by the Quantity / numpy print settings, e.g. with np.set_printoptions(threshold=300, linewidth=100)

GlobalCoords(length ['custom:physical_type1']:
<Quantity [400., 401., 402., ..., 897., 898., 899.] nm>,
             time ['custom:physical_type2']:
<Quantity [ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12., 13., 14., 15., 16., 17.,
           18., 19., 20., 21., 22., 23., 24., 25., 26., 27., 28., 29., 30., 31., 32., 33., 34., 35.,
           36., 37., 38., 39.] s>)

I have not tried to match the indentation level for the values part but rather left that to Quantity; indenting additionally to the GlobalCoords position would also consume too much space imo.

Or did you mean very long (multiple?) physical_type strings?

@DanRyanIrish
Copy link
Member

No, that's exactly what I meant. I think your decision makes sense.

@dhomeier
Copy link
Contributor Author

dhomeier commented Aug 13, 2021

I followed the same for the *TableCoordinate representations, slightly inconsistent indenting but hopefully readable.
I also just stripped the tuple representation of self.table as it introduced further offsets; for a BaseTableCoordinate with multiple quantities it's still not wrapping well; might improve this further with manually putting the elements together as in ExtraCoords, but I think NDCubeSequence should have higher priority now.

<ndcube.extra_coords.table_coord.QuantityTableCoordinate object at 0x107a2c670>
QuantityTableCoordinate ['length', 'width'] ['custom:distance1', 'custom:distance2']:
<Quantity [0., 1., 2., 3., 4., 5., 6., 7., 8., 9.] km>, <Quantity [ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12., 13.,
           14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25., 26., 27.,
           28., 29., 30., 31., 32., 33., 34., 35., 36., 37., 38., 39., 40., 41.,
           42., 43., 44., 45., 46., 47., 48., 49.] m>

<ndcube.extra_coords.table_coord.MultipleTableCoordinate object at 0x107b61d00>
MultipleTableCoordinate(tables=[QuantityTableCoordinate [None]:
<Quantity [0., 1., 2., 3., 4., 5., 6., 7., 8., 9.] km>, <Quantity [ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12.,
           13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25.,
           26., 27., 28., 29., 30., 31., 32., 33., 34., 35., 36., 37., 38.,
           39., 40., 41., 42., 43., 44., 45., 46., 47., 48., 49.] m>,
                                QuantityTableCoordinate [None]:
<Quantity [ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12.,
           13., 14., 15., 16., 17., 18., 19.] s>])

@DanRyanIrish
Copy link
Member

Keep up the good work!

@Cadair
Copy link
Member

Cadair commented Aug 19, 2021

Some information on NDCube.data should be added next; do we want to keep this at metadata or also print the content in a similar manner?

I don't think printing the content, or in fact doing anything which makes it assume it's a numpy array is a good idea. Printing the actual type of the array i.e. dask vs numpy might be useful though?

@wtbarnes
Copy link
Member

wtbarnes commented Aug 19, 2021

I've been meaning to comment on this to say this would be a good opportunity to have support for the dask html repr in cases where the underlying array is a dask array. I'm not sure you would even need to special case for dask. You could presumably just pass through the html repr for the underlying array if it exists.

@Cadair
Copy link
Member

Cadair commented Aug 19, 2021

Yeah that was one of the things I was hoping to achieve as well.

@dhomeier
Copy link
Contributor Author

dhomeier commented Aug 23, 2021

Thanks for the comments on Dask arrays; I did have the pointer to its _repr_html but wasn't sure if we wanted also a method for that or just equivalent information through the __str__ and __repr__ methods. Adding a thin wrapper should in fact be quite easy; constructing something similar for ndarray probably feasible. Though looking at the examples in dask/dask#7763 maybe the question is how much there is to display for plain ndarray data.

But in fact I had not realised that NDCube.data would take Dask arrays as input objects; the general docs imply that it might accept anything (sufficiently) "ndarray-like", but the API reference states numpy.ndarray. nddata also does not advertise this very clearly, though it looks again like anything indexable with shape and __array__ attributes might get in untouched.
Do you have any further pointers where or how such usage is implemented?

@Cadair
Copy link
Member

Cadair commented Aug 24, 2021

Do you have any further pointers where or how such usage is implemented?

We pretty much don't do anything with the array at all in most places, so any "array-like" should work. The only real exceptions to this are reproject and the upcoming superpixel methods where we are actually modifying the data. If there are obvious cases in the docs where we are saying numpy array we should change that, and probably make more of a point of selling (and maintaining) good dask support.

@DanRyanIrish
Copy link
Member

@dhomeier, FYI I posted a comment relevant to this PR on the associated issue, #413

@dhomeier
Copy link
Contributor Author

We pretty much don't do anything with the array at all in most places, so any "array-like" should work. The only real exceptions

Sounds reasonable. I certainly don't think the representations would need anything beyond that. I am a bit less sure what other (inherited) NDData methods might use additional ndarray attributes.

@Cadair
Copy link
Member

Cadair commented Aug 24, 2021

Yeah it's decidedly possible that subclasses will assume ndarray, but that's ok :D

@dhomeier
Copy link
Contributor Author

I was thinking more about methods people might use that NDCube has inherited, and not overwritten, from NDData potentially using other ndarray attributes. But the only instance springing to my eye was in fact __repr__ using np.array2string in the latter... ;-)

@Cadair
Copy link
Member

Cadair commented Aug 24, 2021

Yeah, the __repr__ one is something I meant to patch in astropy a long time ago, and was the original motivation for overriding it in NDCube!

I don't think there are any other examples, but if there are we should fix them.

@dhomeier
Copy link
Contributor Author

Are there any tests using dask arrays as data?

@Cadair
Copy link
Member

Cadair commented Aug 24, 2021

I am not sure, I don't think they are that comprehensive if they exist.

@dhomeier dhomeier marked this pull request as ready for review October 21, 2021 12:27
@dhomeier dhomeier changed the title Extend __str__ and __repr__ methods Extend __str__ and __repr__ methods of BaseTableCoordinate, ExtraCoords, GlobalCoords` Oct 21, 2021
@dhomeier
Copy link
Contributor Author

Marking this as ready now, since any real overhaul of NDCube depends on #474, so this should work as is for the classes covered here.

@dhomeier dhomeier changed the title Extend __str__ and __repr__ methods of BaseTableCoordinate, ExtraCoords, GlobalCoords` Extend __str__ and __repr__ methods of BaseTableCoordinate, ExtraCoords, GlobalCoords Oct 21, 2021
@Cadair
Copy link
Member

Cadair commented Oct 21, 2021

🤔 Do we really have no examples of the __repr__ or __str__ methods in the docs at all? This surprises me. (Also means I can't see any examples of the output 🤣 )

@Cadair Cadair modified the milestones: 2.0, 2.0.0rc3 Oct 21, 2021
@dhomeier
Copy link
Contributor Author

Quite possible; not sure what documentation is needed on how to use it (it's what pops up when you just type cube.extra_coords anyway). 😜
What was not clear to me before I started this is the difference between __str__ and __repr__, but that is mostly generic Python knowledge.

@DanRyanIrish DanRyanIrish merged commit b73edc4 into sunpy:main Oct 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants