# Block matrices and emulated parallelism

</br>
</br>



You all should know how to multiply matrices. In terms of indices, 

$$
\large
\left[ \, \mathbf{A}\, \mathbf{B}\, \right]_{\,i,\,k} \ = \ \sum_{j} [\, \mathbf{A} \, ]_{\,i,\,j}\, [\,\mathbf{B} \,]_{\,j,\,k}.
$$

The notation here is that $[\,\mathbf{A} \,]_{i,j}$ is the $(i,j)$-th element of the matrix $\mathbf{A}$. Of course, the number of columns of $\mathbf{A}$ must match the number of rows of $\mathbf{B}$.
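As a quick sanity check, the index formula can be transcribed literally into nested loops and compared with NumPy's `@` operator. This is only an illustrative sketch: `matmul_by_indices` is a hypothetical name introduced here, and the loops are far slower than `@`.

```python
import numpy as np

# Hypothetical helper (not part of the notebook): a literal transcription of
# [AB]_{i,k} = sum_j [A]_{i,j} [B]_{j,k}.
def matmul_by_indices(A, B):
    n_rows, n_inner = A.shape
    n_inner_b, n_cols = B.shape
    if n_inner != n_inner_b:
        raise ValueError("columns of A must match rows of B")
    C = np.zeros((n_rows, n_cols))
    for i in range(n_rows):
        for k in range(n_cols):
            C[i, k] = sum(A[i, j] * B[j, k] for j in range(n_inner))
    return C

A = np.arange(6.0).reshape(2, 3)    # 2 x 3
B = np.arange(12.0).reshape(3, 4)   # 3 x 4
print(np.allclose(matmul_by_indices(A, B), A @ B))   # True
```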


</br>
</br>

A powerful fact about matrices is that matrix multiplication also works in *block* form. For example, 

$$
\large
\left(
\begin{array}{cc}
 \mathbf{A} & \mathbf{B} \\
 \mathbf{C} & \mathbf{D} \\
\end{array}
\right)\left(
\begin{array}{cc}
 \mathbf{E} & \mathbf{F} \\
 \mathbf{G} & \mathbf{H} \\
\end{array}
\right) \ = \ \left(
\begin{array}{cc}
 \mathbf{A\, E + B\, G} & \mathbf{A\, F + B\, H} \\
 \mathbf{C\, E + D\, G} & \mathbf{C\, F + D\, H} \\
\end{array}
\right)
$$

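Each entry in each matrix can represent an entire matrix (a *block*), and each product is a matrix multiplication. We must take care to keep the factors in the order shown, because matrix multiplication is not commutative: in general $\mathbf{A}\,\mathbf{E} \neq \mathbf{E}\,\mathbf{A}$.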
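A quick numerical check of the block formula, using equal $3\times 3$ blocks and `np.block` to assemble the composite matrices. The data are random and the seed is an arbitrary choice for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3  # every block is n x n in this sketch
A, B, C, D, E, F, G, H = (rng.standard_normal((n, n)) for _ in range(8))

left = np.block([[A, B], [C, D]])
right = np.block([[E, F], [G, H]])
# Each output block is a sum of matrix products, with the factors kept in order.
blockwise = np.block([[A @ E + B @ G, A @ F + B @ H],
                      [C @ E + D @ G, C @ F + D @ H]])

print(np.allclose(left @ right, blockwise))   # True
print(np.allclose(A @ E, E @ A))              # generally False: order matters
```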

The block matrix multiplication works straight away if each block is a square matrix of the same size. However, there are many more possibilities.

**Question 0:** What are the *most general* possible size conditions on each of the matrices $\mathbf{A}, \ldots, \mathbf{H}$? What are the respective sizes of each output block? What is the total size of each composite matrix on the left- and right-hand sides?

**Answer 0: (click on cell to put your answer in Markdown here)**
<br>

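The blocks need not be square, nor all the same size. Every product in the output is defined, and the blocks fit together, exactly when blocks in the same block-row have the same number of rows, blocks in the same block-column have the same number of columns, and the column partition of the left factor matches the row partition of the right factor. In terms of six arbitrary positive integers $m_0, m_1, n_0, n_1, p_0, p_1$:

* $\mathbf{A}$ is $m_0\times n_0$, $\mathbf{B}$ is $m_0\times n_1$, $\mathbf{C}$ is $m_1\times n_0$, $\mathbf{D}$ is $m_1\times n_1$;
* $\mathbf{E}$ is $n_0\times p_0$, $\mathbf{F}$ is $n_0\times p_1$, $\mathbf{G}$ is $n_1\times p_0$, $\mathbf{H}$ is $n_1\times p_1$.

The output blocks are then $m_0\times p_0$, $m_0\times p_1$, $m_1\times p_0$ and $m_1\times p_1$. On the left-hand side the composite matrices are $(m_0+m_1)\times(n_0+n_1)$ and $(n_0+n_1)\times(p_0+p_1)$; the right-hand side is $(m_0+m_1)\times(p_0+p_1)$.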
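The size conditions can be checked numerically with deliberately unequal, non-square blocks. The particular values of $m_0, \ldots, p_1$ below are arbitrary choices for this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
m0, m1 = 2, 3   # row partition of the left factor
n0, n1 = 4, 2   # shared inner partition: columns on the left, rows on the right
p0, p1 = 5, 2   # column partition of the right factor

A, B = rng.random((m0, n0)), rng.random((m0, n1))
C, D = rng.random((m1, n0)), rng.random((m1, n1))
E, F = rng.random((n0, p0)), rng.random((n0, p1))
G, H = rng.random((n1, p0)), rng.random((n1, p1))

left = np.block([[A, B], [C, D]])    # (m0+m1) x (n0+n1)
right = np.block([[E, F], [G, H]])   # (n0+n1) x (p0+p1)
out = np.block([[A @ E + B @ G, A @ F + B @ H],
                [C @ E + D @ G, C @ F + D @ H]])

print(left.shape, right.shape, out.shape)   # (5, 6) (6, 7) (5, 7)
print(np.allclose(left @ right, out))       # True
```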


**Question 1:** The above block matrices are based on $2\times 2$ row/column partitions in each case. Now suppose 

$$\large
\mathbf{A}\, \mathbf{B} \ = \ \mathbf{C},
$$

where $\mathbf{A},\, \mathbf{B}, \, \mathbf{C}$ are three compatible matrices. What is the *most general* partitioning of all three matrices into block structures?

**Answer 1: (click on cell to put your answer in Markdown here)**

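Nothing forces the blocks to be equal in size. Partition the rows of $\mathbf{A}$ as $[m_0, \ldots, m_{r-1}]$ and its columns as $[n_0, \ldots, n_{s-1}]$. The rows of $\mathbf{B}$ must then use the same partition $[n_0, \ldots, n_{s-1}]$ as the columns of $\mathbf{A}$, while its columns may be partitioned arbitrarily as $[p_0, \ldots, p_{t-1}]$. The product $\mathbf{C}$ inherits the row partition of $\mathbf{A}$ and the column partition of $\mathbf{B}$. Writing $\mathbf{A}_{i,j}$ for the $(i,j)$ block of $\mathbf{A}$ (and similarly for $\mathbf{B}$ and $\mathbf{C}$),

$$\large
\mathbf{C}_{i,k} \ = \ \sum_{j=0}^{s-1} \mathbf{A}_{i,j}\, \mathbf{B}_{j,k},
$$

and $\mathbf{C}_{i,k}$ has size $m_i \times p_k$. The numbers of parts $r, s, t$ and the individual part sizes are otherwise free. Splitting into equal blocks is just a convenient choice for giving each core the same amount of work.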
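The general rule can be sketched directly: cut $\mathbf{A}$ and $\mathbf{B}$ along partitions that share the inner list $[n_0, \ldots, n_{s-1}]$, multiply block by block, and compare with the ordinary product. `split_blocks` and `block_matmul` are hypothetical helper names introduced for this sketch, and the partitions are deliberately unequal.

```python
import numpy as np

# Hypothetical helpers (not part of the notebook).
def split_blocks(M, row_parts, col_parts):
    """Cut M into a nested list of blocks; block [i][j] is row_parts[i] x col_parts[j]."""
    row_cuts = np.cumsum(row_parts)[:-1]
    col_cuts = np.cumsum(col_parts)[:-1]
    return [np.split(band, col_cuts, axis=1) for band in np.split(M, row_cuts, axis=0)]

def block_matmul(A_blocks, B_blocks):
    """C[i][k] = sum_j A[i][j] @ B[j][k]; needs len(A_blocks[0]) == len(B_blocks)."""
    s = len(B_blocks)
    return [[sum(A_blocks[i][j] @ B_blocks[j][k] for j in range(s))
             for k in range(len(B_blocks[0]))]
            for i in range(len(A_blocks))]

m, n, p = [2, 1, 3], [4, 2], [1, 5, 2, 1]   # row, inner and column partitions
rng = np.random.default_rng(2)
A = rng.random((sum(m), sum(n)))
B = rng.random((sum(n), sum(p)))

C_blocks = block_matmul(split_blocks(A, m, n), split_blocks(B, n, p))
print(np.allclose(np.block(C_blocks), A @ B))   # True
```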

<br>

<br>

## Block matrix data structures


`numpy` has built-in tools for assembling block matrices, for example `np.block`, which builds an ordinary `ndarray` from a nested list of blocks (the older `np.bmat` does the same but returns a `np.matrix`).
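For example, `np.block` assembles a composite matrix from a nested list of blocks, which is also handy for checking distributed results against a single global array. A minimal sketch:

```python
import numpy as np

A = np.ones((2, 2))
B = np.zeros((2, 3))
C = np.full((1, 2), 2.0)
D = np.full((1, 3), 3.0)

# Blocks in a block-row must share a height; blocks in a block-column must share a width.
M = np.block([[A, B],
              [C, D]])
print(M.shape)   # (3, 5)
```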

If you were working with block matrices in serial, you would want to use these optimised built-in tools. Here, however, we are interested in learning how to work *in parallel*. 

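A natural approach is to put different matrix blocks on different processor cores, so that each core works only on the blocks it holds and operations on separate blocks can proceed simultaneously.  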

### Faux-`Cores` and faux-`Local` arrays 

In this set of exercises, you'll construct block matrices on fake local processor cores. 

In [196]:
import numpy as np
#from eMPI import Local, Core

In [197]:
class NotLocalData(Exception):
    pass

class Local(np.ndarray):

    def __new__(cls, input_array, proc):
        obj = np.asarray(input_array).view(cls)
        obj.proc = proc
        return obj
        
    def __array_finalize__(self, obj):
        # Propagate the owning core to views and results that numpy creates from a Local.
        if obj is None: return
        self.proc = getattr(obj, 'proc', None)
    
    def __str__(self):
        s = np.ndarray.__str__(self)
        if s[0] not in ('[','(','{'):          # 0-d array: wrap the bare scalar
            return f'Local({s}, proc={self.proc})'
        return f"{s}, proc={self.proc}"
        
    def __repr__(self):
        s = (np.ndarray.__repr__(self))[:-1]
        return f"{s}, proc={self.proc})"
    
    def __getitem__(self,args):
        args = self.check(args)
        return Local(np.ndarray.__getitem__(self, args), self.proc)
    
    def __setitem__(self,args,value):
        args  = self.check(args)
        value = self.check(value)
        np.ndarray.__setitem__(self, args, value)   # in place; returns None like ndarray
    
    @property
    def T(self):
        return Local(np.ndarray.transpose(self),self.proc)
    
    def transpose(self):
        return self.T
    
    def trace(self):
        return Local(np.ndarray.trace(self),self.proc)
    
    def __neg__(self):
        return Local(np.ndarray.__neg__(self),self.proc)
    
    def __pos__(self):
        return Local(np.ndarray.__pos__(self),self.proc)
    
    def __abs__(self):
        return Local(np.ndarray.__abs__(self),self.proc)
    
    def __add__(self,other):
        other = self.check(other)
        return Local(np.ndarray.__add__(self,other),self.proc)
    
    def __sub__(self,other):
        other = self.check(other)
        return Local(np.ndarray.__sub__(self,other),self.proc)
    
    def __mul__(self,other):
        other = self.check(other)
        return Local(np.ndarray.__mul__(self,other),self.proc)
    
    def __truediv__(self,other):
        other = self.check(other)
        return Local(np.ndarray.__truediv__(self,other),self.proc)
    
    def __floordiv__(self,other):
        other = self.check(other)
        return Local(np.ndarray.__floordiv__(self,other),self.proc)
    
    def __mod__(self,other):
        other = self.check(other)
        return Local(np.ndarray.__mod__(self,other),self.proc)
    
    def __pow__(self,other):
        other = self.check(other)
        return Local(np.ndarray.__pow__(self,other),self.proc)
    
    def __matmul__(self,other):
        other = self.check(other)
        return Local(np.ndarray.__matmul__(self,other),self.proc)
    
    def __eq__(self,other):
        other = self.check(other)
        return Local(np.ndarray.__eq__(self,other),self.proc)
    
    def __ge__(self,other):
        other = self.check(other)
        return Local(np.ndarray.__ge__(self,other),self.proc)
    
    def __le__(self,other):
        other = self.check(other)
        return Local(np.ndarray.__le__(self,other),self.proc)
    
    def __gt__(self,other):
        other = self.check(other)
        return Local(np.ndarray.__gt__(self,other),self.proc)
    
    def __lt__(self,other):
        other = self.check(other)
        return Local(np.ndarray.__lt__(self,other),self.proc)
    
    def __radd__(self,other):
        other = self.check(other)
        return Local(np.ndarray.__radd__(self,other),self.proc)
    
    def __rsub__(self,other):
        other = self.check(other)
        return Local(np.ndarray.__rsub__(self,other),self.proc)
    
    def __rmul__(self,other):
        other = self.check(other)
        return Local(np.ndarray.__rmul__(self,other),self.proc)
    
    def __rtruediv__(self,other):
        other = self.check(other)
        return Local(np.ndarray.__rtruediv__(self,other),self.proc)
    
    def __rfloordiv__(self,other):
        other = self.check(other)
        return Local(np.ndarray.__rfloordiv__(self,other),self.proc)
    
    def __rmod__(self,other):
        other = self.check(other)
        return Local(np.ndarray.__rmod__(self,other),self.proc)
    
    def __rpow__(self,other):
        other = self.check(other)
        return Local(np.ndarray.__rpow__(self,other),self.proc)
    
    # Python has no reflected comparison methods (__req__, __rge__, __rle__, ...):
    # `a > b` falls back to `b < a` automatically, so __eq__, __ge__, __le__,
    # __gt__ and __lt__ above are sufficient.
    
    def check(self,other):
        if other is None:
            return other
        
        if isinstance(other,(int, float, complex, np.number)):
            return Local(other,self.proc)
        
        if isinstance(other, slice):
            for e in [other.start, other.stop, other.step]:
                self.check(e)
            return other
        
        if isinstance(other, (list, tuple)):
            return type(other)([self.check(e) for e in other])
        
        if not isinstance(other, Local):
            raise NotLocalData('Data is not local to any core.')
        
        if self.proc != other.proc:
            raise NotLocalData('The data is not local.')
        
        return other


class Core():
    
    def __init__(self,proc):
        self.proc = proc 
        self.memory = {}
        self.buffer = {}
    
    def __str__(self):
        return "Core("+str(self.proc)+')'
    
    def __repr__(self):
        return str(self)
    
    def __getitem__(self,item):
        return self.memory[item]
    
    def __setitem__(self,item,value):
        self.memory[item] = Local(value,self.proc)
        
    def send(self,other,data):
        other.buffer[self.proc,data] = Local(np.array(self[data]),other.proc)  # copy, as a real message would be
        
    def receive(self,other,data,out=None):
        if out is None: out = data
        self[out] = self.buffer[other.proc,data]
        del self.buffer[other.proc,data]


<br>
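Stripped of the numpy subclassing, the rule that `Local` and `Core` enforce is small: every array is tagged with the core that owns it, arithmetic between arrays on different cores is refused, and data only moves between cores through an explicit send. The sketch below isolates that rule; `Tagged`, `NotLocal` and `send_to` are hypothetical names used only here, not part of the notebook's classes.

```python
import numpy as np

class NotLocal(Exception):
    pass

class Tagged:
    """Hypothetical stand-in for `Local`: an array plus the id of the core that owns it."""

    def __init__(self, data, proc):
        self.data = np.asarray(data)
        self.proc = proc

    def __add__(self, other):
        # Arithmetic is only allowed between data held by the same core.
        if other.proc != self.proc:
            raise NotLocal(f"core {self.proc} cannot read data held by core {other.proc}")
        return Tagged(self.data + other.data, self.proc)

    def send_to(self, proc):
        # Explicit communication: a copy of the data, now owned by core `proc`.
        return Tagged(self.data.copy(), proc)

x0, y0, x1 = Tagged([1, 2], 0), Tagged([3, 4], 0), Tagged([5, 6], 1)
print((x0 + y0).data.tolist())             # [4, 6]
try:
    x0 + x1                                # data on different cores
except NotLocal as err:
    print(err)                             # core 0 cannot read data held by core 1
print((x0 + x1.send_to(0)).data.tolist())  # [6, 8]
```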

**TASK 0.a:**

Consider a block matrix with row partitions $[m_{0}, \ldots, m_{r-1}]$ and column partitions $[n_{0}, \ldots, n_{c-1}]$. 

Make a Python function that creates a distributed block matrix across $r \times c$ faux-`Core` objects (as we saw in lecture).

The result should be a `dictionary` of `Core` objects. Each `Core` should have an $m_{i} \times n_{j}$ array filled with zeros.  
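When filling or checking a distributed matrix it can help to know which global rows and columns each core's block covers. A sketch, assuming the same row and column partition lists as in the task (`block_ranges` is a hypothetical helper, not something the notebook defines):

```python
import numpy as np

# Hypothetical helper (not part of the notebook).
def block_ranges(row_parts, col_parts):
    """Map block index (i, j) to the global (row slice, column slice) it covers."""
    r_edges = np.concatenate(([0], np.cumsum(row_parts)))
    c_edges = np.concatenate(([0], np.cumsum(col_parts)))
    return {(i, j): (slice(int(r_edges[i]), int(r_edges[i + 1])),
                     slice(int(c_edges[j]), int(c_edges[j + 1])))
            for i in range(len(row_parts)) for j in range(len(col_parts))}

ranges = block_ranges([2, 3], [1, 2, 3, 4])
print(ranges[(1, 2)])   # (slice(2, 5, None), slice(3, 6, None))
```

Slicing a global array with `global_matrix[ranges[(i, j)]]` then yields exactly the block that core `(i, j)` should hold, which gives a simple way to scatter a known matrix onto the cores and to gather it back for testing.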

In [184]:
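# Row and column partitions: 2 block-rows and 4 block-columns.
rPart = [2, 3]
cPart = [1, 2, 3, 4]
# Core (i, j) gets processor id i * (number of block-columns) + j.
machine = {(i, j): Core(i * len(cPart) + j) for i in range(len(rPart)) for j in range(len(cPart))}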

In [185]:
def distributed_block_matrix(name, row_partitions, column_partitions):
    """Store an m_i x n_j zero block under `name` on core (i, j) of the global `machine`."""
    for i, m in enumerate(row_partitions):
        for j, n in enumerate(column_partitions):
            machine[(i, j)][name] = np.zeros((m, n))
    return machine

In [186]:
# Test with Toy Example

# machine = {(i, j): Core(i * total_cols + j) for i in range(len(rPart)) for j in range(len(cPart))}

distributed_block_matrix('A', rPart, cPart)
for key in machine:
    print(machine[key].memory)
# machine

{'A': Local([[0.],
       [0.]], proc=0)}
{'A': Local([[0., 0.],
       [0., 0.]], proc=1)}
{'A': Local([[0., 0., 0.],
       [0., 0., 0.]], proc=2)}
{'A': Local([[0., 0., 0., 0.],
       [0., 0., 0., 0.]], proc=3)}
{'A': Local([[0.],
       [0.],
       [0.]], proc=4)}
{'A': Local([[0., 0.],
       [0., 0.],
       [0., 0.]], proc=5)}
{'A': Local([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]], proc=6)}
{'A': Local([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]], proc=7)}


<br>

**TASK 0.b:**

Write a Python function that fills your `distributed_block_matrix` with numbers. You can do this any way you like. You can put random numbers in, or some interesting structure you might like.
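If you want fills that can be reproduced exactly between runs (useful when debugging `send`/`receive` logic), one option is NumPy's seeded `Generator`. A sketch on a plain dictionary of blocks; `random_blocks` is a hypothetical helper name, and the value range 0 to 9 is an arbitrary choice.

```python
import numpy as np

# Hypothetical helper (not part of the notebook).
def random_blocks(row_parts, col_parts, seed=0, high=10):
    """One block per (i, j) holding random integers in [0, high), stored as floats."""
    rng = np.random.default_rng(seed)
    return {(i, j): rng.integers(0, high, size=(m, n)).astype(float)
            for i, m in enumerate(row_parts) for j, n in enumerate(col_parts)}

a = random_blocks([2, 3], [1, 2, 3, 4], seed=42)
b = random_blocks([2, 3], [1, 2, 3, 4], seed=42)
print(all(np.array_equal(a[k], b[k]) for k in a))   # True: same seed, same fill
```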

In [187]:
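def fill_block_matrix(A):
    """Replace every block stored under name `A` with random integers 0..8, kept as floats."""
    for core in machine.values():
        # Core.__setitem__ re-wraps the new array as a Local on this core's own proc.
        core[A] = np.random.randint(0, 9, size=core[A].shape).astype(float)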

This function replaces the zeros with random integers from $0$ to $8$ inclusive (`np.random.randint` excludes its upper bound of $9$), stored as floats.

In [188]:
# Test with Toy Example
fill_block_matrix('A')
for key in machine:
    print(machine[key].memory)

{'A': Local([[1.],
       [0.]], proc=0)}
{'A': Local([[6., 3.],
       [3., 1.]], proc=1)}
{'A': Local([[0., 7., 6.],
       [3., 1., 8.]], proc=2)}
{'A': Local([[8., 4., 7., 6.],
       [6., 4., 6., 3.]], proc=3)}
{'A': Local([[7.],
       [3.],
       [5.]], proc=4)}
{'A': Local([[0., 8.],
       [8., 3.],
       [6., 5.]], proc=5)}
{'A': Local([[8., 8., 0.],
       [6., 2., 2.],
       [0., 2., 4.]], proc=6)}
{'A': Local([[0., 0., 6., 0.],
       [3., 0., 7., 2.],
       [3., 5., 5., 8.]], proc=7)}


<br>

**TASK 1:**

Create a function that element-wise adds two `distributed_block_matrix` objects.

In TASK 1, make your function assume the blocks of the two matrices are the same shape. 
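Because element-wise addition pairs block $(i,j)$ of one matrix only with block $(i,j)$ of the other, each core can add its own blocks without any communication. The sketch below uses hypothetical helpers `gather` and `add_blocks` that operate on plain dictionaries of blocks rather than `Core` objects, and checks that claim against the global sum.

```python
import numpy as np

# Hypothetical helpers (not part of the notebook).
def gather(blocks):
    """Reassemble {(i, j): block} into one global array (for testing only)."""
    n_r = 1 + max(i for i, _ in blocks)
    n_c = 1 + max(j for _, j in blocks)
    return np.block([[blocks[(i, j)] for j in range(n_c)] for i in range(n_r)])

def add_blocks(A, B):
    """Element-wise A + B: each block pair is added where it lives."""
    if A.keys() != B.keys() or any(A[k].shape != B[k].shape for k in A):
        raise ValueError("the two block layouts differ")
    return {k: A[k] + B[k] for k in A}

rng = np.random.default_rng(3)
A = {(i, j): rng.random((m, n)) for i, m in enumerate([2, 3]) for j, n in enumerate([1, 4])}
B = {k: rng.random(v.shape) for k, v in A.items()}
print(np.allclose(gather(add_blocks(A, B)), gather(A) + gather(B)))   # True
```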

In [189]:
def add_block_matrix(A, B):
    """Element-wise A + B, computed locally on every core and stored under the name 'A+B'."""
    for core in machine.values():
        core[f'{A}+{B}'] = core[A] + core[B]

In [190]:
distributed_block_matrix('B', rPart, cPart)
fill_block_matrix('B')
distributed_block_matrix('A+B', rPart, cPart)
add_block_matrix('A','B')
for key in machine:
    print(machine[key].memory)

{'A': Local([[1.],
       [0.]], proc=0), 'B': Local([[6.],
       [1.]], proc=0), 'A+B': Local([[7.],
       [1.]], proc=0)}
{'A': Local([[6., 3.],
       [3., 1.]], proc=1), 'B': Local([[6., 6.],
       [0., 1.]], proc=1), 'A+B': Local([[12.,  9.],
       [ 3.,  2.]], proc=1)}
{'A': Local([[0., 7., 6.],
       [3., 1., 8.]], proc=2), 'B': Local([[4., 7., 0.],
       [3., 5., 6.]], proc=2), 'A+B': Local([[ 4., 14.,  6.],
       [ 6.,  6., 14.]], proc=2)}
{'A': Local([[8., 4., 7., 6.],
       [6., 4., 6., 3.]], proc=3), 'B': Local([[8., 0., 8., 2.],
       [4., 7., 1., 0.]], proc=3), 'A+B': Local([[16.,  4., 15.,  8.],
       [10., 11.,  7.,  3.]], proc=3)}
{'A': Local([[7.],
       [3.],
       [5.]], proc=4), 'B': Local([[8.],
       [4.],
       [1.]], proc=4), 'A+B': Local([[15.],
       [ 7.],
       [ 6.]], proc=4)}
{'A': Local([[0., 8.],
       [8., 3.],
       [6., 5.]], proc=5), 'B': Local([[1., 3.],
       [5., 4.],
       [2., 1.]], proc=5), 'A+B': Local([[ 1., 11.],
       

<br>

**TASK 2:**

Make a Python `class` that contains `distributed_block_matrix` objects and computes `__add__` using your implementation from TASK 1. 

In [191]:
A._machine

{(0,
  0): Local([[0.],
        [7.]], proc=0),
 (0,
  1): Local([[6., 5.],
        [2., 5.]], proc=1),
 (0,
  2): Local([[0., 7., 7.],
        [4., 8., 8.]], proc=2),
 (0,
  3): Local([[2., 4., 4., 8.],
        [2., 2., 6., 4.]], proc=3),
 (1,
  0): Local([[8.],
        [5.]], proc=4),
 (1,
  1): Local([[8., 4.],
        [4., 3.]], proc=5),
 (1,
  2): Local([[4., 5., 8.],
        [6., 6., 3.]], proc=6),
 (1,
  3): Local([[8., 6., 3., 4.],
        [7., 5., 3., 1.]], proc=7),
 (2,
  0): Local([[5.],
        [2.]], proc=8),
 (2,
  1): Local([[3., 6.],
        [3., 6.]], proc=9),
 (2,
  2): Local([[0., 5., 3.],
        [4., 7., 7.]], proc=10),
 (2,
  3): Local([[6., 6., 5., 3.],
        [2., 3., 7., 8.]], proc=11),
 (3,
  0): Local([[3.],
        [0.],
        [6.]], proc=12),
 (3,
  1): Local([[3., 3.],
        [5., 8.],
        [3., 4.]], proc=13),
 (3,
  2): Local([[6., 1., 6.],
        [0., 5., 8.],
        [2., 5., 5.]], proc=14),
 (3,
  3): Local([[4., 6., 5., 6.],
        [0., 4., 

In [141]:
class BlockMatrix():
    """A named block matrix whose blocks live on the cores of the global `machine`."""
    def __init__(self, x):
        self.x = x
        self._machine = {key: core[x] for key, core in machine.items()}

    def __getitem__(self, key):
        # Return the block held by core `key`, e.g. A[(1, 2)].
        return machine[key][self.x]
    
    def __add__(self, other):
        new_key = self.x + '+' + other.x
        for key in machine:
            machine[key][new_key] = machine[key][self.x] + machine[key][other.x]
        return BlockMatrix(new_key)
    
    def __neg__(self):
        new_key = '-(' + self.x + ')'
        for key in machine:
            machine[key][new_key] = machine[key][self.x] * -1
        return BlockMatrix(new_key)

    def __sub__(self, other):
        new_key = self.x + '-' + other.x
        for key in machine:
            machine[key][new_key] = machine[key][self.x] - machine[key][other.x]
        return BlockMatrix(new_key)

    def __mul__(self, other):
        new_key = self.x + other.x
        for key in machine:
            machine[key][new_key] = machine[key][self.x] * machine[key][other.x]
        return BlockMatrix(new_key)


In [13]:
A = BlockMatrix('A')  # In MACHINE 
B = BlockMatrix('B')

C = A + B
D = C + B
[core.memory for core in machine.values()]

[{'A': Local([[0.],
         [7.]], proc=0),
  'B': Local([[1.],
         [1.]], proc=0),
  'A+B': Local([[1.],
         [8.]], proc=0),
  'A+B+B': Local([[2.],
         [9.]], proc=0)},
 {'A': Local([[6., 5.],
         [2., 5.]], proc=1),
  'B': Local([[7., 8.],
         [4., 8.]], proc=1),
  'A+B': Local([[13., 13.],
         [ 6., 13.]], proc=1),
  'A+B+B': Local([[20., 21.],
         [10., 21.]], proc=1)},
 {'A': Local([[0., 7., 7.],
         [4., 8., 8.]], proc=2),
  'B': Local([[8., 8., 0.],
         [3., 6., 2.]], proc=2),
  'A+B': Local([[ 8., 15.,  7.],
         [ 7., 14., 10.]], proc=2),
  'A+B+B': Local([[16., 23.,  7.],
         [10., 20., 12.]], proc=2)},
 {'A': Local([[2., 4., 4., 8.],
         [2., 2., 6., 4.]], proc=3),
  'B': Local([[3., 5., 3., 2.],
         [5., 0., 2., 2.]], proc=3),
  'A+B': Local([[ 5.,  9.,  7., 10.],
         [ 7.,  2.,  8.,  6.]], proc=3),
  'A+B+B': Local([[ 8., 14., 10., 12.],
         [12.,  2., 10.,  8.]], proc=3)},
 {'A': Local([[8.],
    

<br>

**TASK 3:**

Include all other element-wise matrix operations you can think of, e.g., 

* `__neg__`, i.e., `-A`


* `__sub__` i.e., `A - B`


* `__mul__` i.e., `A * B`


* etc

Note, `__mul__` is *not* matrix multiplication. It is element-wise multiplication. 
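Writing one method per operator leads to a lot of near-identical code. One way to avoid the repetition, sketched here on a hypothetical dictionary-based `Blocks` class rather than the notebook's `BlockMatrix`, is to route every binary element-wise operator through a single helper that takes a function from Python's `operator` module.

```python
import operator
import numpy as np

class Blocks:
    """Hypothetical block container (not the notebook's class): {(i, j): ndarray}."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)

    def _elementwise(self, other, op):
        # Apply `op` block by block; a plain scalar is combined with every block.
        if isinstance(other, Blocks):
            return Blocks({k: op(v, other.blocks[k]) for k, v in self.blocks.items()})
        return Blocks({k: op(v, other) for k, v in self.blocks.items()})

    def __add__(self, other):
        return self._elementwise(other, operator.add)

    def __sub__(self, other):
        return self._elementwise(other, operator.sub)

    def __mul__(self, other):  # element-wise (Hadamard) product, not matrix multiplication
        return self._elementwise(other, operator.mul)

    def __truediv__(self, other):
        return self._elementwise(other, operator.truediv)

    def __pow__(self, other):
        return self._elementwise(other, operator.pow)

    def __neg__(self):
        return Blocks({k: -v for k, v in self.blocks.items()})

    def __abs__(self):
        return Blocks({k: abs(v) for k, v in self.blocks.items()})

A = Blocks({(0, 0): np.array([[1.0, -2.0]]), (0, 1): np.array([[3.0]])})
B = Blocks({(0, 0): np.array([[4.0, 5.0]]), (0, 1): np.array([[6.0]])})
print((A * B).blocks[(0, 0)].tolist())   # [[4.0, -10.0]]
print((-A + 1).blocks[(0, 1)].tolist())  # [[-2.0]]
print(abs(A).blocks[(0, 0)].tolist())    # [[1.0, 2.0]]
```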

In [14]:
E = -A
# F = A.__sub__(B)
F = A - B
G = A * B
[core.memory for core in machine.values()]

[{'A': Local([[0.],
         [7.]], proc=0),
  'B': Local([[1.],
         [1.]], proc=0),
  'A+B': Local([[1.],
         [8.]], proc=0),
  'A+B+B': Local([[2.],
         [9.]], proc=0),
  '-(A)': Local([[-0.],
         [-7.]], proc=0),
  'A-B': Local([[-1.],
         [ 6.]], proc=0),
  'AB': Local([[0.],
         [7.]], proc=0)},
 {'A': Local([[6., 5.],
         [2., 5.]], proc=1),
  'B': Local([[7., 8.],
         [4., 8.]], proc=1),
  'A+B': Local([[13., 13.],
         [ 6., 13.]], proc=1),
  'A+B+B': Local([[20., 21.],
         [10., 21.]], proc=1),
  '-(A)': Local([[-6., -5.],
         [-2., -5.]], proc=1),
  'A-B': Local([[-1., -3.],
         [-2., -3.]], proc=1),
  'AB': Local([[42., 40.],
         [ 8., 40.]], proc=1)},
 {'A': Local([[0., 7., 7.],
         [4., 8., 8.]], proc=2),
  'B': Local([[8., 8., 0.],
         [3., 6., 2.]], proc=2),
  'A+B': Local([[ 8., 15.,  7.],
         [ 7., 14., 10.]], proc=2),
  'A+B+B': Local([[16., 23.,  7.],
         [10., 20., 12.]], proc=2),
  

<br>

**TASK 4:**

Write a Python function that implements block *matrix multiplication* on a `distributed_block_matrix` object.


This will require `send` and `receive`.

In [17]:
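def matrix_multiply_block_matrix(A, B):
    # Assumes rPart == cPart (square block grid), so every block product A[i,j] @ B[j,k] is conformable.
    A, B = getattr(A, 'x', A), getattr(B, 'x', B)     # accept BlockMatrix objects or block names
    out, nb = f'{A}@{B}', 1 + max(j for _, j in machine)
    for (i, k), core in machine.items():              # core (i, k) computes C[i, k]
        core[out] = np.zeros((core[A].shape[0], core[B].shape[1]))
        for j in range(nb):                           # C[i, k] = sum_j A[i, j] @ B[j, k]
            machine[(i, j)].send(core, A)
            core.receive(machine[(i, j)], A, out='tmpA')
            machine[(j, k)].send(core, B)
            core.receive(machine[(j, k)], B, out='tmpB')
            core[out] = core[out] + core['tmpA'] @ core['tmpB']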


In [117]:
A = BlockMatrix('A')  # In MACHINE 
B = BlockMatrix('B')
# matrix_multiply_block_matrix(A, B)

In [150]:
A._machine[(1,2)]

Local([[4., 5., 8.],
       [6., 6., 3.]], proc=6)

In [None]:
    # def __matmul__(self, multiplier):
    #     if self.order[1] != multiplier.order[0]:
    #         raise ValueError("The multiplier was non-conformable under multiplication.")
    #     return [[sum(a*b for a,b in zip(srow,mcol)) for mcol in zip(*multiplier.mat)] for srow in self.mat]

    # def __imatmul__(self, multiplier):
    #     self.mat = self @ multiplier
    #     return self.mat

    # def __rmatmul__(self, multiplicand):
    #     if multiplicand.order[1] != self.order[0]:
    #         raise ValueError("The multiplier was non-conformable under multiplication.")
    #     return [[sum(a*b for a,b in zip(mrow,scol)) for scol in zip(*self.mat)] for mrow in multiplicand.mat]

In [105]:
# machine[(1,2)].memory
# machine[(3,1)]['A']
# machine[(3,1)].send(machine[(1,2)],'A')
# machine[(1,2)].buffer
# machine[(1,2)].receive(machine[(3,1)], 'A', out='tmp')
machine[(1,2)].memory
# del machine[(1,2)].memory['tmp']
# machine.keys()[[machine.index(machine[(1,2)]['A'].proc)]
# print([k for k,v in machine.items() if v == ("Core("+str(machine[(1,2)]['A'].proc)+")")])
# [k for k,v in machine.items() if v == Core(1)]
# machine.values()
# [k for k, v in machine.items()][machine[(1,2)]['A'].proc]
# # machine.keys()[list(machine.values()).index(machine[(1,2)['A'].proc])]

{'A': Local([[4., 5., 8.],
        [6., 6., 3.]], proc=6),
 'B': Local([[4., 3., 2.],
        [8., 2., 2.]], proc=6),
 'A+B': Local([[ 8.,  8., 10.],
        [14.,  8.,  5.]], proc=6),
 'A+B+B': Local([[12., 11., 12.],
        [22., 10.,  7.]], proc=6),
 '-(A)': Local([[-4., -5., -8.],
        [-6., -6., -3.]], proc=6),
 'A-B': Local([[ 0.,  2.,  6.],
        [-2.,  4.,  1.]], proc=6),
 'AB': Local([[16., 15., 16.],
        [48., 12.,  6.]], proc=6)}

In [111]:
# machine[(1,2)]['A']
# machine[(1,2)]['B'].T
matmul(machine[(1,2)]['A'], machine[(1,2)]['B'].T)

[[Local(47., proc=6), Local(58., proc=6)],
 [Local(48., proc=6), Local(66., proc=6)]]

In [113]:
machine[(1,2)]['A'] @ machine[(1,2)]['B']

ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 2 is different from 3)

In [74]:
# list(zip(*zip(*machine[1,2]['B'])))
# list(zip(*machine[1,2]['B'].T))
[Local(5,proc=6),Local(6, proc=6)]

[Local(5, proc=6), Local(6, proc=6)]

In [17]:
for row_a in machine[(1,2)]['A']:
    # print(row_a)
    rowprod = []
    for col_b in list(zip(*machine[1,2]['B'].T)):
        # print(col_b)
        crossprod = 0
        for ele_a, ele_b in zip(row_a, col_b):
            # print(ele_a, ele_b)
            crossprod += ele_a * ele_b
        rowprod.append(crossprod)
    print(rowprod)
# rowprod

[Local(133., proc=6), Local(98., proc=6)]
[Local(128., proc=6), Local(100., proc=6)]


In [19]:
def matmul(a,b):
    # Columns of b as tuples, materialised as a list so they can be reused for every row of a.
    zip_b = list(zip(*b))
    return [[sum(ele_a*ele_b for ele_a, ele_b in zip(row_a, col_b))
            for col_b in zip_b] for row_a in a]

In [30]:
x = [[1,2,3],[4,5,6],[7,8,9],[10,11,12]]
y = [[1,2],[1,2],[3,4]]
print(list(zip(*y)))
matmul(x, y)

[(1, 1, 3), (2, 2, 4)]


[[12, 18], [27, 42], [42, 66], [57, 90]]

<br>

**TASK 5:**

Write a Python function that implements global block matrix *transpose* on a `distributed_block_matrix` object.


This will require `send` and `receive`. 
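The key identity is that block $(i,j)$ of $\mathbf{A}^{T}$ is the transpose of block $(j,i)$ of $\mathbf{A}$. In the faux-core setting that means core $(j,i)$ sends its block to core $(i,j)$, which transposes it locally, while diagonal blocks never move. A serial sketch on a dictionary of blocks, using a hypothetical helper `transpose_blocks`, that checks the identity:

```python
import numpy as np

# Hypothetical helper (not part of the notebook).
def transpose_blocks(blocks):
    """Blocks of A^T: block (i, j) of the result is A[(j, i)].T."""
    return {(j, i): block.T for (i, j), block in blocks.items()}

rng = np.random.default_rng(4)
A = {(i, j): rng.random((m, n))
     for i, m in enumerate([2, 3]) for j, n in enumerate([1, 2, 4])}
At = transpose_blocks(A)

full_A = np.block([[A[(i, j)] for j in range(3)] for i in range(2)])
full_At = np.block([[At[(i, j)] for j in range(2)] for i in range(3)])
print(np.array_equal(full_At, full_A.T))   # True
```

Note that a $2\times 3$ block grid becomes a $3\times 2$ grid, so the transposed blocks map onto a fixed grid of cores only when the block grid is square; with equal row and column partitions the block shapes also stay the same.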

In [None]:
def transpose_block_matrix(A):
    
    pass

<br>

**TASK 6:**

Write a Python function that implements global block matrix *trace* on a `distributed_block_matrix` object.


This will require `send` and `receive`.
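Only the diagonal blocks contribute: $\operatorname{tr}(\mathbf{A}) = \sum_i \operatorname{tr}(\mathbf{A}_{i,i})$, which requires each diagonal block to be square (row partition equal to column partition). Each diagonal core can compute its partial trace locally, and only those scalars need to be sent to one core and summed. A serial sketch with a hypothetical `trace_blocks` helper:

```python
import numpy as np

# Hypothetical helper (not part of the notebook).
def trace_blocks(blocks):
    """Sum of the traces of the diagonal blocks (i, i)."""
    n_diag = 1 + max(i for i, _ in blocks)
    partial = []
    for i in range(n_diag):
        block = blocks[(i, i)]
        if block.shape[0] != block.shape[1]:
            raise ValueError(f"diagonal block {(i, i)} is not square")
        partial.append(np.trace(block))   # local work on core (i, i)
    return sum(partial)                   # the reduction: gather the scalars on one core and add

parts = [2, 3, 1]
rng = np.random.default_rng(5)
A = {(i, j): rng.random((m, n)) for i, m in enumerate(parts) for j, n in enumerate(parts)}
full_A = np.block([[A[(i, j)] for j in range(3)] for i in range(3)])
print(np.isclose(trace_blocks(A), np.trace(full_A)))   # True
```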

In [None]:
def trace_block_matrix(A):

    pass

<br>

**TASK 7:**

Include transpose and trace in your `BlockMatrix` class using the same syntax as `numpy`. I.e., `A.T` for transpose and `A.trace()` for trace.
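In NumPy, `.T` is a property (no parentheses) while `.trace()` is a method. A minimal sketch of the same interface on a hypothetical dictionary-based class; `BlockMatrixSketch` and `to_array` are names introduced here, and in the notebook's `BlockMatrix` the bodies would instead call your TASK 5 and TASK 6 functions.

```python
import numpy as np

class BlockMatrixSketch:
    """Hypothetical dict-of-blocks container showing numpy-style `.T` and `.trace()`."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)

    @property
    def T(self):
        # Accessed without parentheses, like ndarray.T.
        return BlockMatrixSketch({(j, i): b.T for (i, j), b in self.blocks.items()})

    def trace(self):
        # Called with parentheses, like ndarray.trace(); assumes square diagonal blocks.
        n = 1 + max(i for i, _ in self.blocks)
        return sum(np.trace(self.blocks[(i, i)]) for i in range(n))

    def to_array(self):
        # Gather into one global array, for testing only.
        n_r = 1 + max(i for i, _ in self.blocks)
        n_c = 1 + max(j for _, j in self.blocks)
        return np.block([[self.blocks[(i, j)] for j in range(n_c)] for i in range(n_r)])

M = BlockMatrixSketch({(i, j): np.full((m, n), 10.0 * i + j)
                       for i, m in enumerate([1, 2]) for j, n in enumerate([1, 2])})
print(M.T.to_array().shape, M.trace())   # (3, 3) 22.0
```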

<br>

**TASK 8++:**

Include anything else useful you might think of regarding matrices in your `BlockMatrix` class. Create some tests and have some fun seeing what you can do. 

# End Notebook