Features/89 str repr #614

Merged · 19 commits merged into master from features/89-str-repr on Jul 1, 2020

Conversation

@Markus-Goetz (Member)

Description

Added the capability to print out DNDarrays

Issue/s resolved: #89

Changes proposed (see the usage sketch below):

  • Added a __str__ function to the DNDarray class
  • Got rid of __repr__, as it only serves as a fallback when __str__ is not implemented
  • Introduced a printing module containing the string construction logic
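For illustration, a minimal usage sketch of the new behaviour; the exact output format is taken from the examples further down in this thread:

    import heat as ht

    a = ht.arange(4, split=0)
    # print() now invokes DNDarray.__str__, which gathers the data and
    # appends the dtype, device, and split meta-information:
    print(a)
    # DNDarray([0, 1, 2, 3], dtype=ht.int32, device=cpu:0, split=0)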

Type of change

  • Breaking change - requires all documentation to be updated with respect to the new printing format in the Examples sections.
  • Documentation update

Due Diligence

  • All split configurations tested
  • Multiple dtypes tested in relevant functions
  • Documentation updated (if needed)
  • Updated changelog.md under the title "Pending Additions"

Does this change modify the behaviour of other functions? If so, which?

no

"""
Computes a string representation of the passed DNDarray.
"""
return printing.__str__(self)
@mtar (Collaborator) commented on Jun 30, 2020:

Should the output resemble torch or numpy? In numpy there is a slightly different output between repr() and str().

@Markus-Goetz (Member, Author) replied:

Yes, I am aware of this. I was also wondering how to proceed with this and whether to choose NumPy's behaviour (i.e. different) or Torch's (same). I think there is no discussion for __repr__(), as it is supposed to

Return a string containing a printable representation of an object. For many types, this function makes an attempt to return a string that would yield an object with the same value when passed to eval()

according to the Python documentation. For __str__(), though, the Python documentation says

str(object) returns object.__str__(), which is the “informal” or nicely printable string representation of object.

In my opinion, this also entails the extra meta-information of split axis and device, as they are integral for understanding the DNDarray's state.
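To make the fallback direction concrete, a plain-Python sketch of the semantics quoted above:

    class OnlyRepr:
        def __repr__(self):
            return "OnlyRepr()"

    class OnlyStr:
        def __str__(self):
            return "only str"

    # str() falls back to __repr__ when __str__ is missing ...
    print(str(OnlyRepr()))   # OnlyRepr()
    # ... but repr() never falls back to __str__:
    print(repr(OnlyStr()))   # <__main__.OnlyStr object at 0x...>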

@Markus-Goetz added this to the 2-week sprint milestone on Jun 30, 2020
codecov bot commented on Jun 30, 2020:

Codecov Report

Merging #614 into master will increase coverage by 0.03%.
The diff coverage is 99.50%.


@@            Coverage Diff             @@
##           master     #614      +/-   ##
==========================================
+ Coverage   97.44%   97.48%   +0.03%     
==========================================
  Files          75       77       +2     
  Lines       15291    15429     +138     
==========================================
+ Hits        14901    15041     +140     
+ Misses        390      388       -2     
Impacted Files Coverage Δ
heat/core/tests/test_communication.py 97.96% <ø> (-0.01%) ⬇️
heat/core/tests/test_constants.py 100.00% <ø> (ø)
heat/core/tests/test_indexing.py 100.00% <ø> (ø)
heat/core/tests/test_io.py 93.03% <ø> (-0.03%) ⬇️
heat/core/tests/test_manipulations.py 99.90% <ø> (-0.01%) ⬇️
heat/core/tests/test_relational.py 100.00% <ø> (ø)
heat/core/tests/test_stride_tricks.py 100.00% <ø> (ø)
heat/core/tests/test_tiling.py 100.00% <ø> (ø)
heat/core/dndarray.py 96.93% <80.00%> (+<0.01%) ⬆️
heat/core/__init__.py 100.00% <100.00%> (ø)
... and 19 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f4fb25f...56a3349.

(3 resolved review threads on heat/core/printing.py, marked outdated)
@coquelin77 (Member) left a comment:

This all looks fine to me. I have no vested interest in making HeAT look exactly like torch or numpy; as long as it's legible and easy to use (which this is), I think it's great.

However, until the docstring reformatting is done, I think we should avoid the type hints. Since these are already written, they could be moved into a comment or into the docstring; that way, once the docstring reformatting is done, they can be copied back directly.
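A sketch of what that move might look like (illustrative only; the signature is taken from the snippet above):

    # With the inline type hint (to be avoided for now):
    def __str__(self) -> str:
        ...

    # Interim form: the same information kept in the docstring, ready to
    # be copied back once the docstring reformatting lands:
    def __str__(self):
        """
        Computes a string representation of the passed DNDarray.

        Returns
        -------
        str
        """
        ...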

(resolved review thread on CHANGELOG.md, marked outdated)
@ClaudiaComito (Contributor) left a comment:

This is brilliant, @Markus-Goetz!
I have a problem with printing slices, though. Example: this one works:

import heat as ht

a = ht.arange(3 * 4, split=0).reshape((3, 2, 2))
print(a)

mpirun -n 2

[1,0]<stdout>:DNDarray([[[ 0,  1],
[1,0]<stdout>:           [ 2,  3]],
[1,0]<stdout>:
[1,0]<stdout>:          [[ 4,  5],
[1,0]<stdout>:           [ 6,  7]],
[1,0]<stdout>:
[1,0]<stdout>:          [[ 8,  9],
[1,0]<stdout>:           [10, 11]]], dtype=ht.int32, device=cpu:0, split=0)
[1,1]<stdout>:

This one doesn't:

a = ht.arange(3 * 4, split=0).reshape((3, 2, 2))
print(a[0])
[1,0]<stderr>:Traceback (most recent call last):
[1,0]<stderr>:  File "local_test.py", line 327, in <module>
[1,0]<stderr>:    print(a[0])
[1,0]<stderr>:  File "/Users/c.comito/HAF/heat/heat/core/dndarray.py", line 3357, in __str__
[1,0]<stderr>:    return printing.__str__(self)
[1,0]<stderr>:  File "/Users/c.comito/HAF/heat/heat/core/printing.py", line 74, in __str__
[1,0]<stderr>:    tensor_string = _tensor_str(dndarray, __INDENT + 1)
[1,0]<stderr>:  File "/Users/c.comito/HAF/heat/heat/core/printing.py", line 172, in _tensor_str
[1,0]<stderr>:    torch_data = _torch_data(dndarray, summarize)
[1,0]<stderr>:  File "/Users/c.comito/HAF/heat/heat/core/printing.py", line 99, in _torch_data
[1,0]<stderr>:    data = dndarray.copy().resplit_(None)._DNDarray__array
[1,0]<stderr>:  File "/Users/c.comito/HAF/heat/heat/core/dndarray.py", line 2692, in resplit_
[1,0]<stderr>:    self.comm.Allgatherv(self.__array, (gathered, counts, displs), recv_axis=self.split)
[1,0]<stderr>:  File "/Users/c.comito/HAF/heat/heat/core/communication.py", line 674, in Allgatherv
[1,0]<stderr>:    self.handle.Allgatherv, sendbuf, recvbuf, recv_axis
[1,0]<stderr>:  File "/Users/c.comito/HAF/heat/heat/core/communication.py", line 642, in __allgather_like
[1,0]<stderr>:    exit_code = func(mpi_sendbuf, mpi_recvbuf, **kwargs)
[1,0]<stderr>:  File "mpi4py/MPI/Comm.pyx", line 652, in mpi4py.MPI.Comm.Allgatherv
[1,0]<stderr>:mpi4py.MPI.Exception: MPI_ERR_TRUNCATE: message truncated

(same error message on both ranks)

Slightly better but still wrong (Alltoall feeling...):

a = ht.arange(3 * 4, split=0).reshape((3, 2, 2), axis=1)
print(a[0])
[1,0]<stdout>:DNDarray([[0, 2],
[1,0]<stdout>:          [1, 3]], dtype=ht.int32, device=cpu:0, split=1)
[1,1]<stdout>:

And finally the expected result:

a = ht.arange(3 * 4, split=0).reshape((3, 2, 2), axis=2)
print(a[0])
[1,0]<stdout>:DNDarray([[0, 1],
[1,0]<stdout>:          [2, 3]], dtype=ht.int32, device=cpu:0, split=1)
[1,1]<stdout>:

Printing out slices of other dimensions works better, although I've seen some quirks there as well.

Sorry about the lengthy feedback.
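As context for the MPI_ERR_TRUNCATE in the first traceback above: that error is raised when a receive count understates what a rank actually sends. A hypothetical minimal mpi4py reproducer, not heat code (assumes 2 processes; the counts deliberately understate rank 1's payload):

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    send = np.full(2, rank, dtype=np.int64)  # every rank sends 2 elements
    recv = np.empty(3, dtype=np.int64)       # but we only budget 2 + 1
    counts = (2, 1)                          # wrong count for rank 1
    displs = (0, 2)
    # Rank 1's 2-element message does not fit its advertised slot of 1:
    comm.Allgatherv(send, (recv, counts, displs))  # raises MPI_ERR_TRUNCATE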

(resolved review thread on heat/core/printing.py, marked outdated)
@Markus-Goetz (Member, Author) commented:

I had the same issue during testing. It is actually not an issue with the printing functionality itself, but rather with resplit_ and the usage of alltoallv, see also:

[1,0]<stderr>: File "/Users/c.comito/HAF/heat/heat/core/dndarray.py", line 2692, in resplit_

This is a known issue with alltoallv. I don't think I should be fixing this in this PR. Any suggestions on how to proceed?

(commit pushed: "…explanatory comments to correctly state concatenate instead of stacking (different behavior…")
@ClaudiaComito (Contributor) commented on Jul 1, 2020:

> I had the same issue during testing. It is actually not an issue with the printing functionality itself, but rather with resplit_ and the usage of alltoallv, see also:
>
> [1,0]<stderr>: File "/Users/c.comito/HAF/heat/heat/core/dndarray.py", line 2692, in resplit_
>
> This is a known issue with alltoallv. I don't think I should be fixing this in this PR. Any suggestions on how to proceed?

I figured. I'm a bit puzzled, because I thought the current version of resplit bypasses Alltoallv.

And actually, the error message above comes from resplit_(None). It's Allgatherv, not Alltoallv. Why doesn't this work? We're using it all over the library.

(I can see that my 3rd example is an Alltoallv problem though)

Or am I misinterpreting the error message?

@mtar (Collaborator) commented on Jul 1, 2020:

Can it be a problem with get_item? The shape of a[0] above returns (2,2) and (0,).

@coquelin77 (Member) commented:

> Can it be a problem with get_item? The shape of a[0] above returns (2,2) and (0,).

Which shape are you talking about?

    a = ht.arange(3 * 4, split=0).reshape((3, 2, 2), axis=1)
    b = a[0]
    print(b.shape)
    print(b.lshape)
    print(b)
[0] (2, 2)
[0] (1, 2)
[0] DNDarray([[0, 2],
[0]           [1, 3]], dtype=ht.int32, device=cpu:0, split=1)
[1] (2, 2)
[1] (1, 2)
[1] 

@mtar (Collaborator) commented on Jul 1, 2020:

Where the split axis is zero:

b.shape
(2, 2)
(2, 2)
b.lshape
(2, 2)
(0,)

On normal DNDarrays, the local shape is more like (2, 2) and (0, 2).

@coquelin77 (Member) commented:

> where the split axis is zero

There are no elements in the tensor on that process. lshape returns the size of the local torch.Tensor; it does not know anything about the global size. This could be the source of the problem for Allgatherv.
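A small sketch of the shape/lshape distinction being described (illustrative; assumes two MPI processes):

    import heat as ht

    a = ht.zeros((4, 3), split=0)
    # shape is the global shape and is identical on every rank:
    print(a.shape)   # (4, 3)
    # lshape is the shape of the process-local torch.Tensor chunk:
    print(a.lshape)  # (2, 3) on each of the two ranks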

@coquelin77 (Member) commented on Jul 1, 2020:

The local shape is (0,). This is simply what torch does; it does not know that the tensor is used anywhere else. This could be changed, but it may have far-reaching implications. Also, even with monkey-patching this to test it, it fails.

There is something that was omitted there:

    a = ht.arange(3 * 4, split=0).reshape((3, 2, 2), axis=0)
    b = a[0]
    print(b.shape)
    print(b.lshape)
[0] (2, 2)
[0] (2, 2)
[1] /home/daniel/.git/heat/heat/core/dndarray.py:1583: ResourceWarning: This process (rank: 1) is without data after slicing, running the .balance_() function is recommended
[1]   ResourceWarning,
[1] (2, 2)
[1] (0,)

Here is a working script:

    a = ht.arange(3 * 4, split=0).reshape((3, 2, 2), axis=0)
    b = a[0]
    b.balance_()
    print(b.shape)
    print(b.lshape)
    print(b)
[1] /home/daniel/.git/heat/heat/core/dndarray.py:1583: ResourceWarning: This process (rank: 1) is without data after slicing, running the .balance_() function is recommended
[1]   ResourceWarning,
[0] (2, 2)
[0] (1, 2)
[0] DNDarray([[0, 1],
[0]           [2, 3]], dtype=ht.int32, device=cpu:0, split=0)
[1] (2, 2)
[1] (1, 2)
[1] 

The balance is key there; that is why it was failing.
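As an aside on the (0,) shape discussed above, a minimal torch-only sketch (illustrative):

    import torch

    # A rank left without data ends up holding an empty tensor; torch
    # reports its shape as (0,), with no knowledge of the global shape:
    local = torch.tensor([])
    print(local.shape)  # torch.Size([0])

    # By contrast, slicing an existing 2-D tensor down to zero rows keeps
    # the trailing dimension, which is the (0, 2) form mentioned earlier:
    t = torch.arange(6).reshape(3, 2)
    print(t[3:].shape)  # torch.Size([0, 2])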

@coquelin77 (Member) left a review:

This has my approval. Unbalanced arrays do not print properly (i.e. slices), but that is expected, as unbalanced arrays will break lots of things.

@coquelin77 merged commit afd1968 into master on Jul 1, 2020
@coquelin77 deleted the features/89-str-repr branch on July 1, 2020 13:23
@ClaudiaComito (Contributor) commented:

> The balance is key there; that is why it was failing.

I'm not amused, but willing to wait until the first users complain about this.

@ClaudiaComito added this to Done in Current sprint on Jul 2, 2020
Successfully merging this pull request may close these issues: Add custom __str__ and __repr__ functions for tensors (#89).

4 participants