
Features/880 binop ben bou #902

Merged: 62 commits merged into master from features/880-binop_ben-bou on Jan 31, 2022

Conversation

@ben-bou (Collaborator) commented on Jan 18, 2022

Description

Issue/s resolved: #880

Changes proposed:

  • Add None / newaxis indexing to getitem (see the sketch after this list)
  • Fix the ht.equal method: previously it used the binop interface, which was wrong because equal isn't a binop (it returns a single boolean)
  • Make the stack function compatible with unbalanced arrays
  • Check the lshape-map before the binop instead of relying on try...except afterwards: not all processes necessarily fail, so a synchronization is needed anyway
  • Redistribute OUT-OF-PLACE: binops should not alter their arguments
  • Add support for an unbalanced array when the other operand is not split
  • Check that arrays have the same split axis AFTER (shape) broadcasting has added empty dimensions (?)
  • Restructure input sanitation; reduce nested ifs
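
An editorial sketch of the new None / newaxis indexing (shapes and values are illustrative assumptions, not taken from the PR's tests):

import heat as ht

# a small distributed array, split along the first axis
a = ht.zeros((3, 4), split=0)

# None (np.newaxis is simply an alias for None) inserts a new axis of length 1,
# mirroring the NumPy behaviour
b = a[None, :, :]   # expected shape: (1, 3, 4)
c = a[:, None, :]   # expected shape: (3, 1, 4)

print(b.shape, c.shape)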

Type of change

  • New feature (breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Performance

Checks lshape-maps inside every binop between distributed DNDarrays -> more communication
Redistributes out-of-place -> more memory

If the DNDarrays are balanced (i.e. the features of this PR are not used), the introduced overhead is small.
Binops between two balanced DNDarrays of shape (30, 100, 100) with 24 MPI processes spend >90% of the runtime in the respective torch functions:

         53713 function calls (53203 primitive calls) in 1.866 seconds

   Ordered by: internal time
   List reduced from 51 to 30 due to restriction <30>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      100    0.378    0.004    0.378    0.004 {built-in method mul}
      100    0.375    0.004    0.375    0.004 {built-in method add}
      100    0.375    0.004    0.375    0.004 {built-in method true_divide}
      100    0.370    0.004    0.370    0.004 {built-in method sub}
      100    0.286    0.003    0.286    0.003 {built-in method eq}
      500    0.017    0.000    1.858    0.004 /p/project/cslts/local/juwels/HeAT/binop_Branch/lib/python3.8/site-packages/heat/core/_operations.py:24(__binary_op)
      500    0.008    0.000    0.008    0.000 /p/project/cslts/local/juwels/HeAT/binop_Branch/lib/python3.8/site-packages/heat/core/dndarray.py:63(__init__)
      500    0.007    0.000    0.009    0.000 /p/project/cslts/local/juwels/HeAT/binop_Branch/lib/python3.8/site-packages/heat/core/stride_tricks.py:12(broadcast_shape)
      500    0.006    0.000    0.009    0.000 /p/project/cslts/local/juwels/HeAT/binop_Branch/lib/python3.8/site-packages/heat/core/sanitation.py:31(sanitize_distribution)
 1000/500    0.005    0.000    0.007    0.000 /p/project/cslts/local/juwels/HeAT/binop_Branch/lib/python3.8/site-packages/heat/core/types.py:889(result_type_rec)
     2500    0.005    0.000    0.010    0.000 /p/software/juwels/stages/2020/software/SciPy-Stack/2020-gcccoremkl-9.3.0-2020.2.254-Python-3.8.5/lib/python3.8/site-packages/numpy-1.19.1-py3.8-linux-x86_64.egg/numpy/core/numeric.py:1816(isscalar)
      100    0.004    0.000    1.866    0.019 profile-binop.py:10(test_function)
     8500    0.003    0.000    0.005    0.000 {built-in method builtins.isinstance}
     1500    0.003    0.000    0.004    0.000 /p/project/cslts/local/juwels/HeAT/binop_Branch/lib/python3.8/site-packages/heat/core/types.py:495(canonical_heat_type)
     1000    0.002    0.000    0.002    0.000 {method 'type' of 'torch._C._TensorBase' objects}
      500    0.002    0.000    0.012    0.000 /p/project/cslts/local/juwels/HeAT/binop_Branch/lib/python3.8/site-packages/heat/core/_operations.py:114(__get_out_params)
      500    0.002    0.000    0.005    0.000 /p/project/cslts/local/juwels/HeAT/binop_Branch/lib/python3.8/site-packages/heat/core/types.py:565(heat_type_of)
     2500    0.001    0.000    0.001    0.000 {built-in method _abc._abc_instancecheck}
     1500    0.001    0.000    0.001    0.000 {built-in method builtins.issubclass}
     2000    0.001    0.000    0.001    0.000 {built-in method builtins.max}
     1000    0.001    0.000    0.001    0.000 /p/project/cslts/local/juwels/HeAT/binop_Branch/lib/python3.8/site-packages/heat/core/dndarray.py:941(is_balanced)
      500    0.001    0.000    0.008    0.000 /p/project/cslts/local/juwels/HeAT/binop_Branch/lib/python3.8/site-packages/heat/core/types.py:868(result_type)
     2500    0.001    0.000    0.002    0.000 /p/software/juwels/stages/2020/software/Python/3.8.5-GCCcore-9.3.0/lib/python3.8/abc.py:96(__instancecheck__)
      100    0.001    0.000    0.301    0.003 /p/project/cslts/local/juwels/HeAT/binop_Branch/lib/python3.8/site-packages/heat/core/relational.py:35(eq)
     5000    0.001    0.000    0.001    0.000 /p/project/cslts/local/juwels/HeAT/binop_Branch/lib/python3.8/site-packages/heat/core/dndarray.py:293(shape)
      500    0.001    0.000    0.001    0.000 /p/project/cslts/local/juwels/HeAT/binop_Branch/lib/python3.8/site-packages/heat/core/dndarray.py:279(lshape_map)
     5500    0.001    0.000    0.001    0.000 {built-in method builtins.len}
      100    0.001    0.000    0.385    0.004 /p/project/cslts/local/juwels/HeAT/binop_Branch/lib/python3.8/site-packages/heat/core/arithmetics.py:904(sub)
      100    0.001    0.000    0.390    0.004 /p/project/cslts/local/juwels/HeAT/binop_Branch/lib/python3.8/site-packages/heat/core/arithmetics.py:63(add)
      100    0.001    0.000    0.390    0.004 /p/project/cslts/local/juwels/HeAT/binop_Branch/lib/python3.8/site-packages/heat/core/arithmetics.py:430(div)
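
The profile-binop.py script itself is not included in this PR; the following is an editorial sketch of a setup that could produce output of the above shape (the array size matches the numbers quoted above, everything else is an assumption):

import cProfile
import pstats

import heat as ht


def test_function(a, b):
    # one round of element-wise binops between two distributed DNDarrays
    a + b
    a - b
    a * b
    a / b
    a == b


a = ht.random.rand(30, 100, 100, split=0)
b = ht.random.rand(30, 100, 100, split=0)

profiler = cProfile.Profile()
profiler.enable()
for _ in range(100):
    test_function(a, b)
profiler.disable()

# "Ordered by: internal time" corresponds to sorting by tottime
pstats.Stats(profiler).sort_stats("tottime").print_stats(30)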

Due Diligence

  • All split configurations tested
  • Multiple dtypes tested in relevant functions
  • Documentation updated (if needed)
  • Updated changelog.md under the title "Pending Additions"

Does this change modify the behaviour of other functions? If so, which?

Yes, every function using binops.

@ClaudiaComito (Contributor): run tests

@ClaudiaComito (Contributor): run tests

@@ -582,15 +582,15 @@ def create_lshape_map(self, force_check: bool = False) -> torch.Tensor:
             result. Otherwise, create the lshape_map
         """
         if not force_check and self.__lshape_map is not None:
-            return self.__lshape_map
+            return self.__lshape_map.clone()
Contributor:

I can see why you do it, but the lshape_map can get quite big with many nodes. Why not clone or copy as the need arises?

Collaborator Author (@ben-bou):

Well, somehow it must be ensured that the attribute is immutable, and as far as I know, there are no read-only Tensors.
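
(Editorial aside: a minimal, self-contained illustration of the aliasing problem the clone() guards against; the values are made up.)

import torch

# stand-in for the cached self.__lshape_map
cached = torch.tensor([[10, 4], [10, 4]])

alias = cached            # returned without clone()
alias[0, 0] = 0           # the caller edits "their" map in place ...
print(cached[0, 0])       # ... and the cache now reads 0 instead of 10

safe = cached.clone()     # returned with clone()
safe[0, 1] = 0
print(cached[0, 1])       # the cache still reads 4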

Member:

Most of the time this shouldn't be a problem; the lshape_map is generally much smaller than the other tensors.

@@ -601,7 +601,7 @@ def create_lshape_map(self, force_check: bool = False) -> torch.Tensor:
         self.comm.Allreduce(MPI.IN_PLACE, lshape_map, MPI.SUM)

         self.__lshape_map = lshape_map
-        return lshape_map
+        return lshape_map.clone()
Contributor:

see above

Comment on lines 766 to 769
# for dim in expand:
# self = self.expand_dims(dim)
# if len(expand):
# return self[tuple(key)]
Contributor:

dead code?

key = tuple(key)
self_proxy = self.__torch_proxy__()
# None and newaxis indexing
# expand = []
Contributor:

dead code?

array_dtype = types.canonical_heat_type(t_array_dtype)
target = arrays[0]
try:
# arrays[1:] = sanitation.sanitize_distribution(*arrays[1:], target=target) # error in unpacking
Contributor:

dead code?

Collaborator Author (@ben-bou):

Thanks. Another thing about this part: the try ... except is only needed to transform a NotImplementedError into a ValueError. At some point it would be good to have unified errors within HeAT, such that, e.g., a wrong split axis always yields the same exception.
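
(Editorial aside: a minimal sketch of the exception-translation pattern described above; the helper names are hypothetical.)

def check_split(a_split, b_split):
    # stand-in for the distribution sanitation discussed in this thread
    if a_split != b_split:
        raise NotImplementedError("differing split axes are not supported")


def binop(a_split, b_split):
    # the surrounding try ... except only changes the exception type
    try:
        check_split(a_split, b_split)
    except NotImplementedError as e:
        raise ValueError(str(e)) from e


binop(0, 0)  # fine

try:
    binop(0, 1)
except ValueError as err:
    print("translated:", err)  # the NotImplementedError surfaces as a ValueError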


def __get_out_params(target, other=None, map=None):
"""
Getter for the output parameters of a binop with target.
Contributor:

"binop with target" -> "binary operation with target distribution"

def __get_out_params(target, other=None, map=None):
"""
Getter for the output parameters of a binop with target.
If other is provided, it's distribution will be matched to target or, if provided,
Contributor:

other -> other
it's -> its
target -> target

"""
Getter for the output parameters of a binop with target.
If other is provided, it's distribution will be matched to target or, if provided,
redistributed according to map.
Contributor:

map -> map

other : DNDarray
DNDarray to be adapted
map : Tensor
Lshape-Map other should be matched to. Defaults to target's lshape_map
Contributor:

Lshape-Map -> lshape_map
other -> other
target's lshape_map -> target.lshape_map

Comment on lines 104 to 111
# result_tensor = _operations.__binary_op(torch.equal, x, y)
#
# if result_tensor.larray.numel() == 1:
# result_value = result_tensor.larray.item()
# else:
# result_value = True
#
# return result_tensor.comm.allreduce(result_value, MPI.LAND)
Contributor:

dead code?
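
(Editorial aside: the commented-out block above is the old binop-based implementation; a minimal sketch of the semantic difference, with made-up example arrays.)

import heat as ht

a = ht.array([1, 2, 3], split=0)
b = ht.array([1, 2, 3], split=0)

# ht.eq is an element-wise binop: it returns a DNDarray of booleans
print(ht.eq(a, b))      # expected: a DNDarray of three True values

# ht.equal is not a binop: it reduces to a single Python bool,
# which is why routing it through the binop interface was wrong
print(ht.equal(a, b))   # expected: True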

) -> Union[DNDarray, Tuple(DNDarray)]:
"""
Distribute every arg according to target.lshape_map or, if provided, diff_map.
After this sanitation, the lshapes are compatible along the split dimension.
Contributor:

arg, target.lshape_map, diff_map

@ClaudiaComito (Contributor) left a comment:

@ben-bou thanks again for so much work. The default cloning of the lshape_map is something we should rethink. Otherwise I've got mostly editorial changes.

@coquelin77 (Member): run tests

@ClaudiaComito (Contributor) left a comment:

Brilliant @ben-bou, thank you so much!

@ClaudiaComito ClaudiaComito merged commit dbb8300 into master Jan 31, 2022
@ClaudiaComito ClaudiaComito deleted the features/880-binop_ben-bou branch January 31, 2022 09:02
Successfully merging this pull request may close these issues:

  • Binary operations: Support differing lshape-maps (#880)

4 participants