Features/89 str repr #614

Merged · 19 commits merged into master from features/89-str-repr on Jul 1, 2020

Conversation

@Markus-Goetz (Member)

Description

Added the capability to print out DNDarrays

Issue/s resolved: #89

Changes proposed (see the usage sketch below):

  • Added a __str__ function to the DNDarray class
  • Got rid of __repr__, as it only serves as a fallback when __str__ is not implemented
  • Introduced a printing module containing the string construction logic
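For illustration, a minimal usage sketch of the new behaviour; the exact output format is taken from the examples further down in this thread:

    import heat as ht

    a = ht.arange(4, split=0)
    # print() now invokes DNDarray.__str__, which gathers the data and
    # appends the dtype, device, and split meta-information:
    print(a)
    # DNDarray([0, 1, 2, 3], dtype=ht.int32, device=cpu:0, split=0)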

Type of change

  • Breaking change - requires all documentation to be updated with respect to the new printing format in the Examples sections.
  • Documentation update

Due Diligence

  • All split configurations tested
  • Multiple dtypes tested in relevant functions
  • Documentation updated (if needed)
  • Updated changelog.md under the title "Pending Additions"

Does this change modify the behaviour of other functions? If so, which?

no

"""
Computes a string representation of the passed DNDarray.
"""
return printing.__str__(self)
@mtar (Collaborator) commented on Jun 30, 2020:

Should the output resemble torch or numpy? In numpy there is a slightly different output between repr() and str().

@Markus-Goetz (Member, Author) replied:

Yes, I am aware of this. I was also wondering how to proceed with this and whether to choose NumPy's behaviour (i.e. different) or Torch's (same). I think there is no discussion for __repr__(), as it is supposed to

Return a string containing a printable representation of an object. For many types, this function makes an attempt to return a string that would yield an object with the same value when passed to eval()

according to the Python documentation. For __str__(), though, the Python documentation says

str(object) returns object.__str__(), which is the “informal” or nicely printable string representation of object.

In my opinion, this also entails the extra meta-information of split axis and device, as they are integral for understanding the DNDarray's state.
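To make the fallback direction concrete, a plain-Python sketch of the semantics quoted above:

    class OnlyRepr:
        def __repr__(self):
            return "OnlyRepr()"

    class OnlyStr:
        def __str__(self):
            return "only str"

    # str() falls back to __repr__ when __str__ is missing ...
    print(str(OnlyRepr()))   # OnlyRepr()
    # ... but repr() never falls back to __str__:
    print(repr(OnlyStr()))   # <__main__.OnlyStr object at 0x...>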

@Markus-Goetz added this to the 2-week sprint milestone on Jun 30, 2020
codecov bot commented on Jun 30, 2020:

Codecov Report

Merging #614 into master will increase coverage by 0.03%.
The diff coverage is 99.50%.


@@            Coverage Diff             @@
##           master     #614      +/-   ##
==========================================
+ Coverage   97.44%   97.48%   +0.03%     
==========================================
  Files          75       77       +2     
  Lines       15291    15429     +138     
==========================================
+ Hits        14901    15041     +140     
+ Misses        390      388       -2     
Impacted Files Coverage Δ
heat/core/tests/test_communication.py 97.96% <ø> (-0.01%) ⬇️
heat/core/tests/test_constants.py 100.00% <ø> (ø)
heat/core/tests/test_indexing.py 100.00% <ø> (ø)
heat/core/tests/test_io.py 93.03% <ø> (-0.03%) ⬇️
heat/core/tests/test_manipulations.py 99.90% <ø> (-0.01%) ⬇️
heat/core/tests/test_relational.py 100.00% <ø> (ø)
heat/core/tests/test_stride_tricks.py 100.00% <ø> (ø)
heat/core/tests/test_tiling.py 100.00% <ø> (ø)
heat/core/dndarray.py 96.93% <80.00%> (+<0.01%) ⬆️
heat/core/__init__.py 100.00% <100.00%> (ø)
... and 19 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f4fb25f...56a3349.

(3 resolved review threads on heat/core/printing.py, marked outdated)
@coquelin77 (Member) left a comment:

This all looks fine to me. I have no vested interest in making HeAT look exactly like torch or numpy; as long as it's legible and easy to use (which this is), I think it's great.

However, until the docstring reformatting is done, I think we should avoid the type hints. Since these are already written, they could be moved into a comment or into the docstring; that way, once the docstring reformatting is done, they can be copied back directly.
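A sketch of what that move might look like (illustrative only; the signature is taken from the snippet above):

    # With the inline type hint (to be avoided for now):
    def __str__(self) -> str:
        ...

    # Interim form: the same information kept in the docstring, ready to
    # be copied back once the docstring reformatting lands:
    def __str__(self):
        """
        Computes a string representation of the passed DNDarray.

        Returns
        -------
        str
        """
        ...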

(resolved review thread on CHANGELOG.md, marked outdated)
@ClaudiaComito (Contributor) left a comment:

This is brilliant, @Markus-Goetz!
I have a problem with printing slices, though. Example: this one works:

import heat as ht

a = ht.arange(3 * 4, split=0).reshape((3, 2, 2))
print(a)

mpirun -n 2

[1,0]<stdout>:DNDarray([[[ 0,  1],
[1,0]<stdout>:           [ 2,  3]],
[1,0]<stdout>:
[1,0]<stdout>:          [[ 4,  5],
[1,0]<stdout>:           [ 6,  7]],
[1,0]<stdout>:
[1,0]<stdout>:          [[ 8,  9],
[1,0]<stdout>:           [10, 11]]], dtype=ht.int32, device=cpu:0, split=0)
[1,1]<stdout>:

This one doesn't:

a = ht.arange(3 * 4, split=0).reshape((3, 2, 2))
print(a[0])
[1,0]<stderr>:Traceback (most recent call last):
[1,0]<stderr>:  File "local_test.py", line 327, in <module>
[1,0]<stderr>:    print(a[0])
[1,0]<stderr>:  File "/Users/c.comito/HAF/heat/heat/core/dndarray.py", line 3357, in __str__
[1,0]<stderr>:    return printing.__str__(self)
[1,0]<stderr>:  File "/Users/c.comito/HAF/heat/heat/core/printing.py", line 74, in __str__
[1,0]<stderr>:    tensor_string = _tensor_str(dndarray, __INDENT + 1)
[1,0]<stderr>:  File "/Users/c.comito/HAF/heat/heat/core/printing.py", line 172, in _tensor_str
[1,0]<stderr>:    torch_data = _torch_data(dndarray, summarize)
[1,0]<stderr>:  File "/Users/c.comito/HAF/heat/heat/core/printing.py", line 99, in _torch_data
[1,0]<stderr>:    data = dndarray.copy().resplit_(None)._DNDarray__array
[1,0]<stderr>:  File "/Users/c.comito/HAF/heat/heat/core/dndarray.py", line 2692, in resplit_
[1,0]<stderr>:    self.comm.Allgatherv(self.__array, (gathered, counts, displs), recv_axis=self.split)
[1,0]<stderr>:  File "/Users/c.comito/HAF/heat/heat/core/communication.py", line 674, in Allgatherv
[1,0]<stderr>:    self.handle.Allgatherv, sendbuf, recvbuf, recv_axis
[1,0]<stderr>:  File "/Users/c.comito/HAF/heat/heat/core/communication.py", line 642, in __allgather_like
[1,0]<stderr>:    exit_code = func(mpi_sendbuf, mpi_recvbuf, **kwargs)
[1,0]<stderr>:  File "mpi4py/MPI/Comm.pyx", line 652, in mpi4py.MPI.Comm.Allgatherv
[1,0]<stderr>:mpi4py.MPI.Exception: MPI_ERR_TRUNCATE: message truncated

(same error message on both ranks)

Slightly better but still wrong (Alltoall feeling...):

a = ht.arange(3 * 4, split=0).reshape((3, 2, 2), axis=1)
print(a[0])
[1,0]<stdout>:DNDarray([[0, 2],
[1,0]<stdout>:          [1, 3]], dtype=ht.int32, device=cpu:0, split=1)
[1,1]<stdout>:

And finally the expected result:

a = ht.arange(3 * 4, split=0).reshape((3, 2, 2), axis=2)
print(a[0])
[1,0]<stdout>:DNDarray([[0, 1],
[1,0]<stdout>:          [2, 3]], dtype=ht.int32, device=cpu:0, split=1)
[1,1]<stdout>:

Printing out slices of other dimensions works better, although I've seen some quirks there as well.

Sorry about the lengthy feedback.
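As context for the MPI_ERR_TRUNCATE in the first traceback above: that error is raised when a receive count understates what a rank actually sends. A hypothetical minimal mpi4py reproducer, not heat code (assumes 2 processes; the counts deliberately understate rank 1's payload):

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    send = np.full(2, rank, dtype=np.int64)  # every rank sends 2 elements
    recv = np.empty(3, dtype=np.int64)       # but we only budget 2 + 1
    counts = (2, 1)                          # wrong count for rank 1
    displs = (0, 2)
    # Rank 1's 2-element message does not fit its advertised slot of 1:
    comm.Allgatherv(send, (recv, counts, displs))  # raises MPI_ERR_TRUNCATE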

(resolved review thread on heat/core/printing.py, marked outdated)
@Markus-Goetz (Member, Author) commented:

I had the same issue during testing. It is actually not an issue with the printing functionality itself, but rather with resplit_ and the usage of alltoallv, see also:

[1,0]<stderr>: File "/Users/c.comito/HAF/heat/heat/core/dndarray.py", line 2692, in resplit_

This is a known issue with alltoallv. I don't think I should be fixing this in this PR. Any suggestions on how to proceed?

(commit pushed: "…explanatory comments to correctly state concatenate instead of stacking (different behavior…")
@ClaudiaComito (Contributor) commented on Jul 1, 2020:

> I had the same issue during testing. It is actually not an issue with the printing functionality itself, but rather with resplit_ and the usage of alltoallv, see also:
>
> [1,0]<stderr>: File "/Users/c.comito/HAF/heat/heat/core/dndarray.py", line 2692, in resplit_
>
> This is a known issue with alltoallv. I don't think I should be fixing this in this PR. Any suggestions on how to proceed?

I figured. I'm a bit puzzled, because I thought the current version of resplit bypasses Alltoallv.

And actually, the error message above comes from resplit_(None). It's Allgatherv, not Alltoallv. Why doesn't this work? We're using it all over the library.

(I can see that my 3rd example is an Alltoallv problem though)

Or am I misinterpreting the error message?

@mtar (Collaborator) commented on Jul 1, 2020:

Can it be a problem with get_item? The shape of a[0] above returns (2,2) and (0,).

@coquelin77 (Member) commented:

> Can it be a problem with get_item? The shape of a[0] above returns (2,2) and (0,).

Which shape are you talking about?

    a = ht.arange(3 * 4, split=0).reshape((3, 2, 2), axis=1)
    b = a[0]
    print(b.shape)
    print(b.lshape)
    print(b)
[0] (2, 2)
[0] (1, 2)
[0] DNDarray([[0, 2],
[0]           [1, 3]], dtype=ht.int32, device=cpu:0, split=1)
[1] (2, 2)
[1] (1, 2)
[1] 

@mtar (Collaborator) commented on Jul 1, 2020:

Where the split axis is zero:

b.shape
(2, 2)
(2, 2)
b.lshape
(2, 2)
(0,)

On normal DNDarrays, the local shape is more like (2, 2) and (0, 2).

@coquelin77 (Member) commented:

> where the split axis is zero

There are no elements in the tensor on that process. lshape returns the size of the local torch.Tensor; it does not know anything about the global size. This could be the source of the problem for Allgatherv.
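A small sketch of the shape/lshape distinction being described (illustrative; assumes two MPI processes):

    import heat as ht

    a = ht.zeros((4, 3), split=0)
    # shape is the global shape and is identical on every rank:
    print(a.shape)   # (4, 3)
    # lshape is the shape of the process-local torch.Tensor chunk:
    print(a.lshape)  # (2, 3) on each of the two ranks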

@coquelin77 (Member) commented on Jul 1, 2020:

The local shape is (0,). This is simply what torch does; it does not know that the tensor is used anywhere else. This could be changed, but it may have far-reaching implications. Also, even with monkey-patching this to test it, it fails.

There is something that was omitted there:

    a = ht.arange(3 * 4, split=0).reshape((3, 2, 2), axis=0)
    b = a[0]
    print(b.shape)
    print(b.lshape)
[0] (2, 2)
[0] (2, 2)
[1] /home/daniel/.git/heat/heat/core/dndarray.py:1583: ResourceWarning: This process (rank: 1) is without data after slicing, running the .balance_() function is recommended
[1]   ResourceWarning,
[1] (2, 2)
[1] (0,)

Here is a working script:

    a = ht.arange(3 * 4, split=0).reshape((3, 2, 2), axis=0)
    b = a[0]
    b.balance_()
    print(b.shape)
    print(b.lshape)
    print(b)
[1] /home/daniel/.git/heat/heat/core/dndarray.py:1583: ResourceWarning: This process (rank: 1) is without data after slicing, running the .balance_() function is recommended
[1]   ResourceWarning,
[0] (2, 2)
[0] (1, 2)
[0] DNDarray([[0, 1],
[0]           [2, 3]], dtype=ht.int32, device=cpu:0, split=0)
[1] (2, 2)
[1] (1, 2)
[1] 

The balance is key there; that is why it was failing.
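As an aside on the (0,) shape discussed above, a minimal torch-only sketch (illustrative):

    import torch

    # A rank left without data ends up holding an empty tensor; torch
    # reports its shape as (0,), with no knowledge of the global shape:
    local = torch.tensor([])
    print(local.shape)  # torch.Size([0])

    # By contrast, slicing an existing 2-D tensor down to zero rows keeps
    # the trailing dimension, which is the (0, 2) form mentioned earlier:
    t = torch.arange(6).reshape(3, 2)
    print(t[3:].shape)  # torch.Size([0, 2])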

@coquelin77 (Member) left a review:

This has my approval. Unbalanced arrays do not print properly (i.e. slices), but that is expected, as unbalanced arrays will break lots of things.

@coquelin77 merged commit afd1968 into master on Jul 1, 2020
@coquelin77 deleted the features/89-str-repr branch on July 1, 2020 13:23
@ClaudiaComito (Contributor) commented:

> The balance is key there; that is why it was failing.

I'm not amused, but willing to wait until the first users complain about this.

@ClaudiaComito added this to Done in Current sprint on Jul 2, 2020
Successfully merging this pull request may close these issues: Add custom __str__ and __repr__ functions for tensors (#89).

4 participants