Features/89 str repr #614
Conversation
…ule. Added temporary implementation with empty string for the content but correct array meta information
…tion to account correctly for profile=None
…use of torch print options instead of own. Draft implementation of __repr__ done, split case above threshold is still todo. Test are partially incomplete dummies
…ray in test cases
""" | ||
Computes a string representation of the passed DNDarray. | ||
""" | ||
return printing.__str__(self) |
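The snippet under review delegates string construction to a separate printing module. A minimal sketch of that pattern, with hypothetical helper and class names (not the actual HeAT source), could look like this:

```python
# Sketch of delegating __str__ to a printing helper, as in the reviewed
# snippet. format_array and Array are illustrative names, not HeAT code.

def format_array(values, dtype, device, split):
    """Build an informal string carrying the array meta information."""
    return "DNDarray({}, dtype={}, device={}, split={})".format(
        values, dtype, device, split
    )

class Array:
    def __init__(self, values, dtype="ht.int32", device="cpu:0", split=None):
        self.values = values
        self.dtype = dtype
        self.device = device
        self.split = split

    def __str__(self):
        # Delegate to the printing helper instead of formatting inline.
        return format_array(self.values, self.dtype, self.device, self.split)

print(Array([0, 1, 2], split=0))
# DNDarray([0, 1, 2], dtype=ht.int32, device=cpu:0, split=0)
```

Keeping the formatting logic in one module means `__str__` and `__repr__` can later share it.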
Should the output resemble torch or numpy? In numpy there is a slightly different output between repr() and str().
Yes, I am aware of this. I was also wondering how to proceed with this and whether to choose NumPy's behaviour (i.e. different outputs) or Torch's (identical outputs). I think there is no discussion for `__repr__()`, as according to the Python documentation it is supposed to

> Return a string containing a printable representation of an object. For many types, this function makes an attempt to return a string that would yield an object with the same value when passed to `eval()`.

For `__str__()`, though, the Python documentation says that `str(object)` returns `object.__str__()`, which is the "informal" or nicely printable string representation of the object.

In my opinion this also entails the extra meta information of split axis and device, as they are integral for understanding everything about the DNDarray's state.
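The contract quoted above can be illustrated with a small toy class (plain Python, not HeAT code): `repr()` produces a string that round-trips through `eval()`, while `str()` stays informal:

```python
class Point:
    """Toy class illustrating the formal repr() vs. informal str() contract."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)

    def __repr__(self):
        # Formal: evaluating this string yields an object with the same value.
        return "Point({!r}, {!r})".format(self.x, self.y)

    def __str__(self):
        # Informal: nicely printable, not meant to be eval()-able.
        return "({}, {})".format(self.x, self.y)

p = Point(1, 2)
assert eval(repr(p)) == p       # repr() round-trips through eval()
assert str(p) == "(1, 2)"       # str() is the informal form
```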
…/heat into features/89-str-repr
…a portion at high indices is smaller than the number of edgeitems to be displayed
Codecov Report
```
@@            Coverage Diff             @@
##           master     #614      +/-   ##
==========================================
+ Coverage   97.44%   97.48%   +0.03%
==========================================
  Files          75       77       +2
  Lines       15291    15429     +138
==========================================
+ Hits        14901    15041     +140
+ Misses        390      388       -2
==========================================
```
Continue to review full report at Codecov.
This all looks fine to me. I have no vested interest in making HeAT look exactly like torch or numpy. As long as it's legible and easy to use (which this is), I think it's great.

However, until the docstring reformatting is done, I think we should avoid the type hints. Since these are already done, they could be moved into a comment / into the docstring. That way, once the docstring reformatting is done, they can be copied directly.
This is brilliant, @Markus-Goetz !
I have a problem with printing slices, though. Example: this one works:

```python
a = ht.arange(3 * 4, split=0).reshape((3, 2, 2))
print(a)
```

Run with `mpirun -n 2`:

```
[1,0]<stdout>:DNDarray([[[ 0,  1],
[1,0]<stdout>:            [ 2,  3]],
[1,0]<stdout>:
[1,0]<stdout>:           [[ 4,  5],
[1,0]<stdout>:            [ 6,  7]],
[1,0]<stdout>:
[1,0]<stdout>:           [[ 8,  9],
[1,0]<stdout>:            [10, 11]]], dtype=ht.int32, device=cpu:0, split=0)
[1,1]<stdout>:
```
This one doesn't:

```python
a = ht.arange(3 * 4, split=0).reshape((3, 2, 2))
print(a[0])
```

```
[1,0]<stderr>:Traceback (most recent call last):
[1,0]<stderr>:  File "local_test.py", line 327, in <module>
[1,0]<stderr>:    print(a[0])
[1,0]<stderr>:  File "/Users/c.comito/HAF/heat/heat/core/dndarray.py", line 3357, in __str__
[1,0]<stderr>:    return printing.__str__(self)
[1,0]<stderr>:  File "/Users/c.comito/HAF/heat/heat/core/printing.py", line 74, in __str__
[1,0]<stderr>:    tensor_string = _tensor_str(dndarray, __INDENT + 1)
[1,0]<stderr>:  File "/Users/c.comito/HAF/heat/heat/core/printing.py", line 172, in _tensor_str
[1,0]<stderr>:    torch_data = _torch_data(dndarray, summarize)
[1,0]<stderr>:  File "/Users/c.comito/HAF/heat/heat/core/printing.py", line 99, in _torch_data
[1,0]<stderr>:    data = dndarray.copy().resplit_(None)._DNDarray__array
[1,0]<stderr>:  File "/Users/c.comito/HAF/heat/heat/core/dndarray.py", line 2692, in resplit_
[1,0]<stderr>:    self.comm.Allgatherv(self.__array, (gathered, counts, displs), recv_axis=self.split)
[1,0]<stderr>:  File "/Users/c.comito/HAF/heat/heat/core/communication.py", line 674, in Allgatherv
[1,0]<stderr>:    self.handle.Allgatherv, sendbuf, recvbuf, recv_axis
[1,0]<stderr>:  File "/Users/c.comito/HAF/heat/heat/core/communication.py", line 642, in __allgather_like
[1,0]<stderr>:    exit_code = func(mpi_sendbuf, mpi_recvbuf, **kwargs)
[1,0]<stderr>:  File "mpi4py/MPI/Comm.pyx", line 652, in mpi4py.MPI.Comm.Allgatherv
[1,0]<stderr>:mpi4py.MPI.Exception: MPI_ERR_TRUNCATE: message truncated
```
(same error message on both ranks)
Slightly better, but still wrong (Alltoall feeling...):

```python
a = ht.arange(3 * 4, split=0).reshape((3, 2, 2), axis=1)
print(a[0])
```

```
[1,0]<stdout>:DNDarray([[0, 2],
[1,0]<stdout>:           [1, 3]], dtype=ht.int32, device=cpu:0, split=1)
[1,1]<stdout>:
```
And finally the expected result:

```python
a = ht.arange(3 * 4, split=0).reshape((3, 2, 2), axis=2)
print(a[0])
```

```
[1,0]<stdout>:DNDarray([[0, 1],
[1,0]<stdout>:           [2, 3]], dtype=ht.int32, device=cpu:0, split=1)
[1,1]<stdout>:
```
Printing out slices of other dimensions works better, although I've seen some quirks there as well.
Sorry about the lengthy feedback.
I had the same issue during testing. It is actually not an issue with the printing functionality itself, but rather with Alltoallv, which is a known issue. I don't think I should be fixing this in this PR. Any suggestions on how to proceed?
…explanatory comments to correctly state concatenate instead of stacking (different behavior)
I figured, but I'm a bit puzzled, because I thought the current version of resplit bypasses Alltoallv. And actually, the error message above comes from resplit_(None), i.e. Allgatherv, not Alltoallv. Why doesn't this work? We're using it all over the library. (I can see that my 3rd example is an Alltoallv problem, though.) Or am I misinterpreting the error message?
…e PR due to move towards doc branch
Can it be a problem with get_item? The shape of a[0] above is (2, 2), but the local shapes come back as (2, 2) and (0,).
Which shape are you talking about?

```python
a = ht.arange(3 * 4, split=0).reshape((3, 2, 2), axis=1)
b = a[0]
print(b.shape)
print(b.lshape)
print(b)
```
…where the split axis is zero. On normal DNDarrays the local shapes are more like (2, 2) and (0, 2), i.e. there are no elements in the tensor on that process.
The local shape shows there is something that was omitted there:

```python
a = ht.arange(3 * 4, split=0).reshape((3, 2, 2), axis=0)
b = a[0]
print(b.shape)
print(b.lshape)
```
Here is a working script:

```python
a = ht.arange(3 * 4, split=0).reshape((3, 2, 2), axis=0)
b = a[0]
b.balance_()
print(b.shape)
print(b.lshape)
print(b)
```

The balance is key there; that is why it was failing.
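Why `balance_()` helps can be sketched without MPI (illustrative numbers only, based on the lshapes discussed above): a gather needs every rank's receive count to match what that rank actually sends, and after slicing, one process may hold zero elements while the counts still assume a balanced split:

```python
def balanced_counts(total, nprocs):
    """Element counts per rank for an evenly balanced split along axis 0."""
    base, rem = divmod(total, nprocs)
    return [base + (1 if rank < rem else 0) for rank in range(nprocs)]

# a[0] has global shape (2, 2), i.e. 4 elements gathered across 2 ranks.
expected = balanced_counts(4, 2)
print(expected)            # [2, 2] -- what a balanced gather would expect

# After the slice, rank 0 actually holds all 4 elements and rank 1 holds
# none (lshapes (2, 2) and (0, 2)), so the real send sizes are:
actual = [4, 0]
print(expected == actual)  # False -> mismatched buffers (MPI_ERR_TRUNCATE)

# balance_() redistributes the local chunks so the real counts are
# balanced again and match what the gather expects.
```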
This has my approval. Unbalanced arrays do not print properly (i.e. slices), but that is expected, as unbalanced arrays will break lots of things.

I'm not amused, but I am willing to wait until the first users complain about this.
Description
Added the capability to print out DNDarrays
Issue/s resolved: #89
Changes proposed:
Type of change
Examples
Due Diligence
Does this change modify the behaviour of other functions? If so, which?
no