# Final Assignment

## Question 1 - Sub-NumPy

### SubClass / Function definition

In [1]:
# Creating the class and defining the different functions

class SNumPy:
    
    # Define the ones function 
        # The allowed input is an integer i
        # The function returns a list of length i containing only 1's 
        
    def ones(i: int):
        
        ''' 
        The ones function takes an int parameter i and returns an array (list) of length i that contains only ones.
        '''
        
        return [1] * i 
    
    
    # Define the zeros function
 
    def zeros(i):
        
        ''' 
        The zeros function takes an int parameter i and returns an array (list) of length i that contains only zeros.
        '''
        
        return [0] * i 
    
    
    
    # Define the reshape function
    ''' 
        Example: In the first iteration of the (nested) for loop, r is 0 and c is 0, so we take the value of the initial array at position [0], no matter how many columns are desired.
                This value is appended to the first row. For every additional column in the row, the loop proceeds in the same way.
                Once the required number of columns is reached, the row is appended to the final matrix result. Then the same happens for the second row. 
                By adding "shape[1] * r" (number of columns times the row index) to the column index, we make sure to take the next values for the next row.
    '''
    
    def reshape(array, shape: tuple):
        
        '''
        The reshape function takes an array and converts it into the dimensions specified by the tuple (row, column). Hence, this function converts from a vector to a matrix.
        '''
        
        # check that shape has exactly two entries (rows, columns) and raise a ValueError if not
        if len(shape) != 2:
            raise ValueError("The desired shape must be entered as tuple in the following format: (rows, columns)")
       
        # check if given array (list) can be reshaped with the specified number of rows / columns and raise a ValueError if not
        elif len(array) != shape[0] * shape[1]:
            raise ValueError("The array cannot be reshaped with the given shape because its elements cannot be equally distributed over the specified number of rows.")
        
        # create empty list for the result
        else:
            result = []
    
            # (nested) for loop to create the necessary number of rows (and columns)
            # to get the actual values for the rows, the initial array is subsetted and the resulting values are appended to the created row
            # in the end, all created rows are appended to the result
            for r in range(shape[0]):
                row_new = []
                for c in range(shape[1]):
                    row_new.append(array[c + shape[1] * r])
                result.append(row_new)
        return result
    

    # Define the shape function
    def shape(array):
        
        '''
        The shape function returns a tuple with the matrix / vector's dimensions, e.g. (#rows, #columns)        
        '''
        
        # check if the inserted array is a matrix (i.e. first element of array is a list)
        if isinstance(array[0], list) is True:
            
            # the number of rows equals the length of the inserted array because each element is a list (i.e. row)
            # the number of columns equals the length of the first element of the inserted array
            shape = (len(array),len(array[0]))
            
            return shape
        
        # if vector (i.e. first element of array is not a list)
        else:
            # if the array is a vector, there are no columns but only the length of the vector that we get through len(array)
            shape = (1,len(array))
            return shape
    
    
    
    # Define the append function
    def append(array1, array2):
        
        '''
        The append function returns a new vector / matrix that is the combination of the two input vectors / matrices.
        '''
        
        # store shape 1 and shape 2 variables to ease reusage within the function
        shape1 = SNumPy.shape(array1)
        shape2 = SNumPy.shape(array2)
        
        # if both arrays are matrices
        # check if for each array the first element is a list (i.e. the array is a list of lists and therefore a matrix)
        if isinstance(array1[0], list) is True and isinstance(array2[0], list) is True:
            
            # check if the number of rows of the matrices is identical, otherwise they cannot be appended side by side
            if shape1[0] == shape2[0]:
                
                # create empty list for the new matrix
                new_matrix = []
                
                # the for loop concatenates the first row of array1 with the first row of array2 and so on
                # the newly built rows are then appended to the new matrix
                for i in range(len(array1)):
                    new_row = array1[i] + array2[i]
                    new_matrix.append(new_row)
                return new_matrix
            
            # raise a ValueError if the inserted arrays do not have the same dimension        
            else:
                raise ValueError("The matrices do not have the same number of rows and thus cannot be appended.")

        
        # if both arrays are vectors
        # check if for each array the first element is an integer (i.e. the array is a simple list and therefore a vector)
        elif isinstance(array1[0], int) is True and isinstance(array2[0], int) is True:
             
            # The lists are then concatenated
            new_vector = array1 + array2
            return new_vector
            
        # if trying to append a matrix to a vector and vice versa
        else:
            raise ValueError("A matrix cannot be appended to a vector and vice versa.")
    
    
    
    
    # Define the get function
        # note zero indexing for the shape tuple
    
    def get(array, shape: tuple):
        
        '''
        The get function returns the value specified by the coordinate point (row, column) of the array provided (can be vector or matrix).
        '''
        
        # in case of matrix
        if isinstance(array[0], list) is True:
            
            # raise an IndexError if either the row number or column number is out of the matrix dimension
            if shape[0] >= len(array) or shape[1] >= len(array[0]):
                raise IndexError("Out of matrix dimension.")
            else:
                # subset list with the specified row and column number
                return array[shape[0]][shape[1]]
        
        # in case of vector
        elif isinstance(array[0], int) is True:
            
            # raise an IndexError if the requested value is not within the length of the vector
            if shape[1] >= len(array):
                raise IndexError("Out of vector dimension.")
            elif shape[0] != 0:
                raise IndexError("The number of rows for the vector must be one which translates to 0 due to zero indexing.")
            else:
                # subset list with the specified column number
                return array[shape[1]]
    
    
    
    # Define the add function
    def add(array1, array2):
        
        '''
        The add function adds two vectors / matrices element-wise. 
        '''
        
        # store shape 1 and shape 2 variables to ease reusage within the function
        shape1 = SNumPy.shape(array1)
        shape2 = SNumPy.shape(array2)
        
        # if the arrays are matrices
        if isinstance(array1[0], list) is True and shape1 == shape2:
            new_matrix = []
            # going through the rows
            for i in range(shape1[0]):
                # going through the columns
                for j in range(shape1[1]):
                    # appending the sums to the new list
                    new_matrix.append(array1[i][j] + array2[i][j])
                    
            # creating the final output matrix by reshaping new_matrix with the given dimension and outputting result
            result = []
            for r in range(shape1[0]):
                row_new = []
                for c in range(shape1[1]):
                    row_new.append(new_matrix[c + shape1[1] * r])
                result.append(row_new)
            return result 
        
        # if the arrays are vectors
        elif isinstance(array1[0], int) is True and shape1 == shape2:
            new_vector = []
            
            # iterating over the vector length and appending the sum of the respective elements to the new vector           
            for i in range(len(array1)):
                new_vector.append(array1[i] + array2[i])
            return new_vector
            
        
        else: 
            # raise a ValueError if the arrays do not have the same dimension and thus cannot be added
            raise ValueError("The arrays do not have the same dimension and thus cannot be added.")
            
    
    
    # Define the subtract function
    def subtract(array1, array2):
        
        '''
        The subtract function subtracts two vectors / matrices element-wise (array1 - array2). 
        '''
        
        # store shape 1 and shape 2 variables to ease reusage within the function
        shape1 = SNumPy.shape(array1)
        shape2 = SNumPy.shape(array2)

        # if the arrays are matrices
        if isinstance(array1[0], list) and shape1 == shape2:
            new_matrix = []
            # going through the rows
            for i in range(shape1[0]):
                # going through the columns
                for j in range(shape1[1]):
                    # appending the new values to the new list
                    new_matrix.append(array1[i][j] - array2[i][j])
                    
            # creating the final output matrix by reshaping new_matrix with the given dimension and outputting result
            result = []
            for r in range(shape1[0]):
                row_new = []
                for c in range(shape1[1]):
                    row_new.append(new_matrix[c + shape1[1] * r])
                result.append(row_new)
            return result 

        # if the arrays are vectors
        elif isinstance(array1[0], int) and shape1 == shape2:
            new_vector = []
            for i in range(len(array1)):
                new_vector.append(array1[i] - array2[i])
            return new_vector

        else: 
            # raise a ValueError if the arrays do not have the same dimension and thus cannot be subtracted
            raise ValueError("The arrays do not have the same dimension and thus cannot be subtracted.") 
    
    
    
    # Define the dotproduct function
    def dotproduct(array1, array2):
        
        '''
        The dotproduct function computes the dot product between two arrays (which could be vectors or / and matrices) and returns an appropriate value. 
        '''
        
        # store shape 1 and shape 2 variables to ease reusage within the function
        shape1 = SNumPy.shape(array1)
        shape2 = SNumPy.shape(array2)
        
        # if both arrays are vectors
        if isinstance(array1[0], int) is True and shape1 == shape2:
            products = []
            
            # for loop iterates over the arrays and appends the product of the respective elements to the new list products
            for i in range(len(array1)):
                products.append(array1[i] * array2[i])
                
            # we return the sum of the products list
            return sum(products)
            
        # if both arrays are matrices
        # check that first element is list and that the number of columns of the first matrix equals the number of rows of the second matrix
        elif isinstance(array1[0], list) is True and shape1[1] == shape2[0]:
            result = []
            
            # iterating over the rows of the first array
            for i in range(len(array1)):
                rows = []
                
                # iterating over the columns of the second array
                for j in range(len(array2[0])):
                    elements = 0
                    
                    # iterating over the columns of the first array
                    for k in range(len(array1[0])):
                        
                        # calculating the elements                      
                        elements += array1[i][k] * array2[k][j]
                        
                    # append the elements to the rows
                    rows.append(elements)
                    
                # append the rows to the result
                result.append(rows)
                
            return result
        
        # if the arrays are of mixed type (e.g. array1 being a matrix and array2 being a vector)
        
        elif isinstance(array1[0], list) is True or isinstance(array2[0], list):
            
            # first array is matrix
            if isinstance(array1[0], list) is True and isinstance(array2[0], int) and len(array1[0]) == len(array2):
                
                # iterate over the number of rows of the matrix (i.e. array1)
                result = []
                for i in range(len(array1)):
                    elements = 0
                    
                    # iterate over the length of the vector
                    for j in range(len(array2)):
                        # calculate elements
                        elements += array1[i][j] * array2[j]
                    # append elements to result
                    result.append(elements)
                return result
            
            # second array is matrix
            # the vector length must equal the number of rows of the matrix
            elif isinstance(array1[0], int) is True and isinstance(array2[0], list) and len(array1) == len(array2):
                
                # iterate over the number of columns of the matrix (i.e. array2)
                result = []
                for i in range(len(array2[0])):
                    elements = 0
                    
                    # iterate over the length of the vector
                    for j in range(len(array1)):
                        # calculate elements
                        elements += array2[j][i] * array1[j]
                    # append elements to result
                    result.append(elements)
                return result
            
            else:
                # raise ValueError for the dimensions
                raise ValueError("The dot product of the given arrays cannot be calculated. Please check the dimensions.")
        
        else:
            # raise ValueError for the dimensions
            raise ValueError("The dot product of the given arrays cannot be calculated. Please check the dimensions.")
    


### Test Functionality

In [2]:
import numpy as np

# Ones 

ex1 = 5
ex2 = 14

print("Example 1: " + str(SNumPy.ones(ex1)))
print("Example 2: " + str(SNumPy.ones(ex2)))



Example 1: [1, 1, 1, 1, 1]
Example 2: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]


In [3]:
# Zeros

ex3 = 8
ex4 = 19

# first function alternative
print("Example 3: " + str(SNumPy.zeros(ex3)))
print("Example 4: " + str(SNumPy.zeros(ex4)))



Example 3: [0, 0, 0, 0, 0, 0, 0, 0]
Example 4: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]


In [4]:
# Reshape

lst1 = [1,2,3,4,5,6,7,8,9]
ex_shape1 = (3,3)

print("SNumPy Example: " + str(SNumPy.reshape(lst1,ex_shape1)))

print()

# comparison with NumPy
print("NumPy Example:")
print(np.reshape(lst1,(3,3)))

SNumPy Example: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

NumPy Example:
[[1 2 3]
 [4 5 6]
 [7 8 9]]
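
The index rule used inside `reshape` can also be looked at in isolation. The following is a minimal sketch and not part of the `SNumPy` class; the helper names `flat_index` and `reshape_rows` are hypothetical and assume row-major order, as in the nested loop above.

```python
def flat_index(r, c, n_cols):
    """Map a (row, column) position to its index in the flat list (row-major)."""
    return c + n_cols * r


def reshape_rows(values, shape):
    """Rebuild the matrix row by row using the same index rule as the nested loop."""
    n_rows, n_cols = shape
    if len(values) != n_rows * n_cols:
        raise ValueError("Number of elements does not match the requested shape.")
    return [[values[flat_index(r, c, n_cols)] for c in range(n_cols)]
            for r in range(n_rows)]


matrix = reshape_rows([1, 2, 3, 4, 5, 6], (2, 3))
print(matrix)  # [[1, 2, 3], [4, 5, 6]]
```

For a 2 x 3 target, the element at row 1, column 0 is taken from flat position 0 + 3 * 1 = 3, which is the value 4.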


In [5]:
# Shape

lst2 = [[1,2,3],[6,4,2],[9,6,3],[1,8,2]]
lst_2 = [1,2,3,4,5]

print("SNumPy Matrix Example: " + str(SNumPy.shape(lst2)))
print("SNumPy Vector Example: " + str(SNumPy.shape(lst_2)))  

print()

# comparison with NumPy
print("NumPy Matrix Example: " + str(np.shape(lst2)))
print("NumPy Vector Example: " + str(np.shape(lst_2)))

SNumPy Matrix Example: (4, 3)
SNumPy Vector Example: (1, 5)

NumPy Matrix Example: (4, 3)
NumPy Vector Example: (5,)
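
The vector results differ because NumPy treats a flat list as a one-dimensional array, while `SNumPy.shape` reports it as a single row. The short sketch below illustrates the difference using only NumPy; the variable names are illustrative.

```python
import numpy as np

vector = [1, 2, 3, 4, 5]

one_dim = np.array(vector)            # 1-D array: a single axis of length 5
row_vector = one_dim.reshape(1, -1)   # 2-D array with one row; -1 lets NumPy infer the column count

print(one_dim.shape)     # (5,)
print(row_vector.shape)  # (1, 5)
```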


In [6]:
# Append

# Example: Matrices
lst3 = [[1,2,3],[1,2,3],[1,2,3]]
lst4 = [[7,7,7],[5,2,8],[4,1,9]]

print("SNumPy Matrix Example: " + str(SNumPy.append(lst3,lst4)))

# Example: Vectors
lst5 = [1,2,3,4,5,6,7]
lst6 = [3,4,1,6,8,7,9]

print("SNumPy Vector Example: " + str(SNumPy.append(lst5,lst6)))

print()

# comparison with NumPy
print("NumPy Matrix Example:")
print(np.append(lst3,lst4,axis=1))
print("NumPy Vector Example:")
print(np.append(lst5,lst6))

SNumPy Matrix Example: [[1, 2, 3, 7, 7, 7], [1, 2, 3, 5, 2, 8], [1, 2, 3, 4, 1, 9]]
SNumPy Vector Example: [1, 2, 3, 4, 5, 6, 7, 3, 4, 1, 6, 8, 7, 9]

NumPy Matrix Example:
[[1 2 3 7 7 7]
 [1 2 3 5 2 8]
 [1 2 3 4 1 9]]
NumPy Vector Example:
[1 2 3 4 5 6 7 3 4 1 6 8 7 9]


In [7]:
# Get (note: zero indexing)

lst7 = [[7,7,7,7],[5,2,8,9],[4,1,9,22],[2,3,1,13]]
lst_7 = [1,2,3,4]

# get value from matrix
print("SNumPy Matrix Example: " + str(SNumPy.get(lst7,(2,3))))

# get value from vector (row = 0 as row-vector and zero-indexing)
print("SNumPy Vector Example: " + str(SNumPy.get(lst_7,(0,3))))

print()

# comparison with NumPy
print("NumPy Matrix Example: " + str(np.array([[7,7,7,7],[5,2,8,9],[4,1,9,22],[2,3,1,13]]).item(2,3)))
print("NumPy Vector Example: " + str(np.array([1,2,3,4]).item(3,)))

SNumPy Matrix Example: 22
SNumPy Vector Example: 4

NumPy Matrix Example: 22
NumPy Vector Example: 4


In [8]:
# Add

lst7 = [[7,7,7,7],[5,2,8,9],[4,1,9,22]]
lst8 = [[5,2,8,9],[4,1,9,22],[2,3,1,13]]

lst9 = [1,9,2,8,3,7,4,6,5]
lst10 = [3,3,3,3,3,3,3,3,3]

# Example: Matrices
print("SNumPy Matrix Example: " + str(SNumPy.add(lst7,lst8)))

# Example: Vectors
print("SNumPy Vector Example: " + str(SNumPy.add(lst9,lst10)))

print()

# comparison with NumPy
print("NumPy Matrix Example:")
print(np.add(lst7,lst8))
print("NumPy Vector Example:")
print(np.add(lst9,lst10))

SNumPy Matrix Example: [[12, 9, 15, 16], [9, 3, 17, 31], [6, 4, 10, 35]]
SNumPy Vector Example: [4, 12, 5, 11, 6, 10, 7, 9, 8]

NumPy Matrix Example:
[[12  9 15 16]
 [ 9  3 17 31]
 [ 6  4 10 35]]
NumPy Vector Example:
[ 4 12  5 11  6 10  7  9  8]


In [9]:
# Subtract

lst11 = [[7,7,7,7],[3,1,7,6],[5,2,8,9],[4,1,9,22]]
lst12 = [[5,2,8,9],[4,1,9,22],[2,3,1,13],[9,4,1,2]]

lst13 = [1,9,2,8,3,7,4,6,5,7,8,9]
lst14 = [3,3,3,3,3,3,3,3,3,1,5,4]

# Example: Matrices
print("SNumPy Matrix Example: " + str(SNumPy.subtract(lst11,lst12)))

# Example: Vectors
print("SNumPy Vector Example: " + str(SNumPy.subtract(lst13,lst14)))

print()

# comparison with NumPy
print("NumPy Matrix Example:")
print(np.subtract(lst11,lst12))
print("NumPy Vector Example:")
print(np.subtract(lst13,lst14))

SNumPy Matrix Example: [[2, 5, -1, -2], [-1, 0, -2, -16], [3, -1, 7, -4], [-5, -3, 8, 20]]
SNumPy Vector Example: [-2, 6, -1, 5, 0, 4, 1, 3, 2, 6, 3, 5]

NumPy Matrix Example:
[[  2   5  -1  -2]
 [ -1   0  -2 -16]
 [  3  -1   7  -4]
 [ -5  -3   8  20]]
NumPy Vector Example:
[-2  6 -1  5  0  4  1  3  2  6  3  5]


In [10]:
# Dotproduct

lst15 = [2,2]
lst16 = [1,2]

lst17 = [[1,2],[4,5]]
lst18 = [[1,5],[7,2]]
lst19 = [[4,3,4],[1,2,3]]


# both arrays are vectors
print("SNumPy Vectors Example: " + str(SNumPy.dotproduct(lst15,lst16)))

# both arrays are matrices with the same dimensions 
print("SNumPy Matrices (same dimensions) Example: " + str(SNumPy.dotproduct(lst17,lst18)))

# both arrays are matrices, but with different dimensions (however, fulfilling the shape requirements for dot product)
print("SNumPy Matrices (diff. dimensions) Example: " + str(SNumPy.dotproduct(lst17,lst19)))

# mixed arrays (matrix, vector)
print("SNumPy Matrix - Vector Example: " + str(SNumPy.dotproduct(lst17,lst15)))

# mixed arrays (vector, matrix)
print("SNumPy Vector - Matrix Example: " + str(SNumPy.dotproduct(lst15,lst17)))

print()

# comparison with NumPy
print("NumPy Vectors Example: " + str(np.dot(lst15,lst16)))
print("NumPy Matrices (same dimensions) Example:")
print(np.dot(lst17,lst18))
print("NumPy Matrices (diff. dimensions) Example:")
print(np.dot(lst17,lst19))
print("NumPy Matrix - Vector Example:")
print(np.dot(lst17,lst15)) 
print("NumPy Vector - Matrix Example:")
print(np.dot(lst15,lst17)) 

SNumPy Vectors Example: 6
SNumPy Matrices (same dimensions) Example: [[15, 9], [39, 30]]
SNumPy Matrices (diff. dimensions) Example: [[6, 7, 10], [21, 22, 31]]
SNumPy Matrix - Vector Example: [6, 18]
SNumPy Vector - Matrix Example: [10, 14]

NumPy Vectors Example: 6
NumPy Matrices (same dimensions) Example:
[[15  9]
 [39 30]]
NumPy Matrices (diff. dimensions) Example:
[[ 6  7 10]
 [21 22 31]]
NumPy Matrix - Vector Example:
[ 6 18]
NumPy Vector - Matrix Example:
[10 14]
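
Both mixed examples use a square matrix, which hides the dimension rule for the vector-matrix case: a vector of length n multiplied with an n x m matrix yields m values, one per matrix column. The following plain-Python sketch illustrates that rule with a non-square matrix; `vector_times_matrix` is a hypothetical helper and not a method of `SNumPy`.

```python
def vector_times_matrix(vector, matrix):
    """Multiply a row vector (length n) with an n x m matrix given as a list of rows."""
    if len(vector) != len(matrix):
        raise ValueError("Vector length must equal the number of matrix rows.")
    n_cols = len(matrix[0])
    # one output value per matrix column
    return [sum(vector[k] * matrix[k][j] for k in range(len(vector)))
            for j in range(n_cols)]


print(vector_times_matrix([2, 2], [[4, 3, 4], [1, 2, 3]]))  # [10, 10, 14]
```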


### Test error handling

In [11]:
# reshape

exc_1 = [1,2,3,4,5,6,7,8,9,10]

# Case 1: Desired matrix has different number of elements than the input array

print(SNumPy.reshape(exc_1,(4,3)))

ValueError: The array cannot be reshaped with the given shape because its elements cannot be equally distributed over the specified number of rows.

In [12]:
# Case 2: Shape not entered in tuple form
print(SNumPy.reshape(exc_1,(3,4,5)))

ValueError: The desired shape must be entered as tuple in the following format: (rows, columns)

In [13]:
# append

exc_2 = [[2,3,4],[1,5,7],[4,8,7]]
exc_3 = [[1,5,7],[4,8,7]]

exc_4 = [1,2,3,4,5]
exc_5 = [1,2,3]

# Case 1: Different matrix dimensions
print(SNumPy.append(exc_2,exc_3))

ValueError: The matrices do not have the same number of rows and thus cannot be appended.

In [14]:
# Case 2: Trying to append matrix and vector
print(SNumPy.append(exc_2,exc_4))

ValueError: A matrix cannot be appended to a vector and vice versa.

In [15]:
# get

exc_6 = [[2,2,2],[3,3,3],[4,4,4]]
exc_7 = [1,2,3,4,5,6]

# Case 1: Out of matrix dimension
print(SNumPy.get(exc_6,(3,1)))

IndexError: Out of matrix dimension.

In [16]:
# Case 2: Out of vector dimension
print(SNumPy.get(exc_7,(0,6)))

IndexError: Out of vector dimension.

In [17]:
# add

exc_8 = [[2,3,4],[2,3,4],[6,7,8]]
exc_9 = [[2,3,4],[2,3,4]]

exc_10 = [1,2,3,4]
exc_11 = [1,2,3,4,5]

# Case 1: Matrices do not have the same dimension
print(SNumPy.add(exc_8,exc_9))

ValueError: The arrays do not have the same dimension and thus cannot be added.

In [18]:
# Case 2: Vectors do not have the same dimension
print(SNumPy.add(exc_10,exc_11))

ValueError: The arrays do not have the same dimension and thus cannot be added.

In [19]:
# Case 3: Trying to add matrix and vector
print(SNumPy.add(exc_8,exc_10))

ValueError: The arrays do not have the same dimension and thus cannot be added.

In [20]:
# subtract

exc_8 = [[2,3,4],[2,3,4],[6,7,8]]
exc_9 = [[2,3,4],[2,3,4]]

exc_10 = [1,2,3,4]
exc_11 = [1,2,3,4,5]

# Case 1: Matrices do not have the same dimension
print(SNumPy.subtract(exc_8,exc_9))

ValueError: The arrays do not have the same dimension and thus cannot be subtracted.

In [21]:
# Case 2: Vectors do not have the same dimension
print(SNumPy.subtract(exc_10,exc_11))

ValueError: The arrays do not have the same dimension and thus cannot be subtracted.

In [22]:
# Case 3: Trying to subtract matrix and vector
print(SNumPy.subtract(exc_8,exc_10))

ValueError: The arrays do not have the same dimension and thus cannot be subtracted.

In [23]:
# dotproduct

exc12 = [2,2]
exc13 = [1,2,4,5]

exc14 = [[1,2],[4,5]]
exc15 = [[1,5,3],[7,2,1],[1,2,1]]


# Case 1: Both arrays are matrices but number of columns of array 1 is not equal to number of rows of array2
print(SNumPy.dotproduct(exc15,exc14))

ValueError: The dot product of the given arrays cannot be calculated. Please check the dimensions.

In [24]:
# Case 2: Both arrays are vectors, but they have a different length 
print(SNumPy.dotproduct(exc12,exc13))

ValueError: The dot product of the given arrays cannot be calculated. Please check the dimensions.

In [25]:
# Case 3: Mixed arrays, but the dimension requirements are not fulfilled  
print(SNumPy.dotproduct(exc13,exc15))

ValueError: The dot product of the given arrays cannot be calculated. Please check the dimensions.

## Task 2 - Hamming’s Code

For the implementation, we chose the NumPy library because it provides dedicated, efficient functions for vector-matrix calculations.

In [1]:
import numpy as np

Multiplying a message with the generator matrix transforms it into the Hamming-encoded message, which now also carries the parity check bits. At the receiving end, multiplying the encoded message with the decoder matrix returns the original message, provided there has been no bit flip. To check for bit flips, the parity check matrix is used: if no bit was flipped, multiplying the parity check matrix with the encoded message (modulo 2) yields the zero vector. The generator matrix, parity check matrix and decoder matrix are defined below as 'gen', 'ham' and 'dec'.

### Define Generator, Parity Check, Decoder matrix

In [2]:
# generator matrix G, here referred to as gen
gen = np.array([[1,1,0,1],[1,0,1,1],[1,0,0,0],[0,1,1,1],[0,1,0,0],[0,0,1,0],[0,0,0,1]])

# parity-check matrix H, here referred to as ham
ham = np.array([[1,0,1,0,1,0,1],[0,1,1,0,0,1,1],[0,0,0,1,1,1,1]])

# decoder matrix R, here referred to as dec
dec = np.array([[0,0,1,0,0,0,0],[0,0,0,0,1,0,0],[0,0,0,0,0,1,0],[0,0,0,0,0,0,1]])
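
As a quick sanity check of the three matrices, the sketch below verifies two properties that the description relies on: the parity check of every codeword vanishes modulo 2, and the decoder matrix undoes the generator matrix. The matrices are copied from the cell above; the names `parity_of_codewords` and `recovered` are illustrative.

```python
import numpy as np

gen = np.array([[1,1,0,1],[1,0,1,1],[1,0,0,0],[0,1,1,1],[0,1,0,0],[0,0,1,0],[0,0,0,1]])
ham = np.array([[1,0,1,0,1,0,1],[0,1,1,0,0,1,1],[0,0,0,1,1,1,1]])
dec = np.array([[0,0,1,0,0,0,0],[0,0,0,0,1,0,0],[0,0,0,0,0,1,0],[0,0,0,0,0,0,1]])

# H * G modulo 2 should be the 3 x 4 zero matrix: every codeword passes the parity check
parity_of_codewords = np.mod(ham.dot(gen), 2)

# R * G should be the 4 x 4 identity: decoding an unmodified codeword returns the message
recovered = dec.dot(gen)

print(np.array_equal(parity_of_codewords, np.zeros((3, 4), dtype=int)))  # True
print(np.array_equal(recovered, np.eye(4, dtype=int)))                   # True
```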

In order to generate the encoded message, the original message is multiplied with the generator matrix 'gen' using the NumPy dot product function, followed by a modulo 2 operation to reduce the result to bits (0 or 1).

As the `encode()` function is designed to only take one-dimensional arrays of shape (4,), the checks below raise exceptions if that requirement is not met. 

The first check makes sure the input vector is a numpy array. If that is not the case a TypeError is raised. 

The second check makes sure the input vector only has 4 values. If that is not the case a ValueError is raised. 

The third check makes sure the input vector does not contain other values than 0 or 1. If it does a ValueError is raised. 

If no exception is raised, the input vector can be used to generate the encoded message.

### Question 2.1 

In [3]:
'''Encodes a message'''

def encode(w):
    
    # check whether w is numpy array and not ordinary array; if not raise exception
    if type(w) != np.ndarray:
        raise TypeError('The input needs to be a numpy array.')
    
    # check whether w has length of 4; if not raise exception
    elif w.shape != (4,):
        raise ValueError('The array needs to have exactly 4 digits.')
    
    # check whether all values in w are 0 or 1; if not raise exception
    elif not np.isin(w,[0,1]).all():
        raise ValueError('Elements in array must either be 0 or 1')
        
    # if none of the above are true, then the encoding can proceed
    else:
        # multiply the message vector with the generation matrix
        c1 = gen.dot(w)

        #modulo 2 in order to turn the result into binary
        c2 = np.mod(c1,2)

        return c2
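
The matrix product can be cross-checked against the parity equations that can be read off the rows of 'gen'. The sketch below is an illustration under that reading and not the assignment's implementation; `encode_with_xor` is a hypothetical helper, and the generator matrix is copied from the cell above.

```python
import numpy as np
from itertools import product

gen = np.array([[1,1,0,1],[1,0,1,1],[1,0,0,0],[0,1,1,1],[0,1,0,0],[0,0,1,0],[0,0,0,1]])


def encode_with_xor(d1, d2, d3, d4):
    """Compute the codeword (P1, P2, B1, P3, B2, B3, B4) with explicit XOR parity bits."""
    p1 = d1 ^ d2 ^ d4   # row 1 of gen: [1,1,0,1]
    p2 = d1 ^ d3 ^ d4   # row 2 of gen: [1,0,1,1]
    p3 = d2 ^ d3 ^ d4   # row 4 of gen: [0,1,1,1]
    return [p1, p2, d1, p3, d2, d3, d4]


# compare both approaches for all 16 possible 4-bit messages
all_match = all(
    encode_with_xor(*bits) == np.mod(gen.dot(np.array(bits)), 2).tolist()
    for bits in product([0, 1], repeat=4)
)

print(all_match)                    # True
print(encode_with_xor(0, 1, 0, 1))  # [0, 1, 0, 0, 1, 0, 1]
```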

For the parity check, similar checks make sure that only appropriate inputs are used. The parity check receives an encoded message, i.e. a NumPy array of 7 digits containing only 0s and 1s. As in the encoding function, we check for the right type, the right shape ((7,) this time) and binary values only.

Once the input is known to have the correct form, the encoded (and potentially manipulated) message is checked by multiplying the parity check matrix with it and applying modulo 2 to bring all values down to 0s and 1s. If the result is the zero vector, no bit flip occurred. Otherwise, the result read as a binary number (first entry = least significant bit) gives the position of the flipped bit.

In [4]:
'''Checks whether there has been a bit flip'''
def parity(c):
    
    # check whether c is numpy array and not ordinary array; if not raise exception
    if type(c) != np.ndarray:
        raise TypeError('The input needs to be a numpy array.')
    
    # check whether c has length of 7; if not raise exception
    elif c.shape != (7,):
        raise ValueError('The array needs to have exactly 7 digits.')
    
    # check whether all values in c are 0 or 1; if not raise exception
    elif not np.isin(c,[0,1]).all():
        raise ValueError('Elements in array must either be 0 or 1')
        
    # if none of the above are true, then the parity-check can proceed
    else:

        # multiplying c with parity check matrix
        x1 = ham.dot(c)

        # modulo 2 in order to turn the result into binary 
        x2 = np.mod(x1,2)

        # check whether all numbers are 0
        nullvector = True
        for e in x2:
            if e != 0:
                nullvector = False

        # translation of bit flip position from binary to decimal
        counter = 0   
        if x2[0] == 1:
            counter += 1
        if x2[1] == 1:
            counter += 2
        if x2[2] == 1:
            counter += 4
            
        # print check result
        if (nullvector): return ("Successful transmission")
        else: return ("Unsuccesful transmission: \t\nA bit flip was discovered at position " + str(counter) + " (left to right, [1-7])")
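
The translation from the parity check result to a bit position works because column p of 'ham' is the binary representation of p, least significant bit first. The following sketch illustrates this for a single flipped bit, using the codeword of the message [0,1,0,1]; the loop and the variable names are illustrative and not part of `parity()`.

```python
import numpy as np

ham = np.array([[1,0,1,0,1,0,1],[0,1,1,0,0,1,1],[0,0,0,1,1,1,1]])

codeword = np.array([0, 1, 0, 0, 1, 0, 1])  # valid codeword of the message [0, 1, 0, 1]

positions = []
for p in range(1, 8):
    received = codeword.copy()
    received[p - 1] ^= 1                     # flip the bit at position p (1-based)
    syndrome = np.mod(ham.dot(received), 2)  # parity check result
    # read the syndrome as a binary number, first entry = least significant bit
    positions.append(int(syndrome[0] + 2 * syndrome[1] + 4 * syndrome[2]))

print(positions)  # [1, 2, 3, 4, 5, 6, 7]
```

This only holds for a single flipped bit; with two or more flips the computed position no longer identifies the errors.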

Similar checks are conducted in the decoding function. It also receives a 7-bit vector that must be a NumPy array containing only 0s and 1s. To transform the received encoded message back into the original message, it is multiplied with the decoder matrix.

### Question 2.2

In [5]:
def decode(c_w):
        
    # check whether c_w is numpy array and not ordinary array; if not raise exception
    if type(c_w) != np.ndarray:
        raise TypeError('The input needs to be a numpy array.')
    
    # check whether c_w has length of 7; if not raise exception
    elif c_w.shape != (7,):
        raise ValueError('The array needs to have exactly 7 digits.')
    
    # check whether all values in c_w are 0 or 1; if not raise exception
    elif not np.isin(c_w,[0,1]).all():
        raise ValueError('Elements in array must either be 0 or 1')
        
    # if none of the above are true, then the decoding can proceed
    else:

        # decode c_w into w with multiplying c_w with the decode matrix
        w1 = dec.dot(c_w)

        # modulo 2 in order to turn the result into binary
        w2 = np.mod(w1,2)

        return w2
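
Note that `decode()` reads out the message bits without repairing a flipped bit. One possible extension, shown here only as a sketch, first corrects a single bit flip using the parity check result and then decodes. The function `correct_and_decode` is hypothetical and assumes at most one flipped bit; the matrices are copied from the cells above.

```python
import numpy as np

ham = np.array([[1,0,1,0,1,0,1],[0,1,1,0,0,1,1],[0,0,0,1,1,1,1]])
dec = np.array([[0,0,1,0,0,0,0],[0,0,0,0,1,0,0],[0,0,0,0,0,1,0],[0,0,0,0,0,0,1]])


def correct_and_decode(received):
    """Correct at most one flipped bit, then extract the 4 message bits."""
    received = np.array(received)            # copy, so the caller's data stays untouched
    syndrome = np.mod(ham.dot(received), 2)
    position = int(syndrome[0] + 2 * syndrome[1] + 4 * syndrome[2])
    if position != 0:
        received[position - 1] ^= 1          # flip the faulty bit back
    return np.mod(dec.dot(received), 2)


# codeword of [0, 1, 0, 1] with the 2nd bit flipped
print(correct_and_decode([0, 0, 0, 0, 1, 0, 1]))  # [0 1 0 1]
```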

### Question 2.3

#### Test with [0,1,0,1]

In [8]:
'''encoding a message'''

# P1 = parity bit 1; B1 = message bit 1
encoded = encode(np.array([0,1,0,1]))
print(str(encoded) + " (P1, P2, B1, P3, B2, B3, B4))")

[0 1 0 0 1 0 1] (P1, P2, B1, P3, B2, B3, B4))


In [9]:
'''decoding a message'''

decoded = decode(np.array([0,1,0,0,1,0,1]))
print(decoded)

[0 1 0 1]


In [14]:
'''check for bit flip (2nd position was flipped)'''

result = parity(np.array([0,0,0,0,1,0,1]))
print(result)

Unsuccesful transmission: 	
A bit flip was discovered at position 2 (left to right, [1-7])


In [28]:
'''check for bit flip (there was none)'''

result = parity(np.array([0,1,0,0,1,0,1]))
print(result)

Successful transmission


#### Test with [1,1,1,0]

In [19]:
'''encoding a message'''

# P1 = parity bit 1; B1 = message bit 1
encoded = encode(np.array([1,1,1,0]))
print(str(encoded) + " (P1, P2, B1, P3, B2, B3, B4))")

[0 0 1 0 1 1 0] (P1, P2, B1, P3, B2, B3, B4))


In [20]:
'''decoding a message'''

decoded = decode(np.array([0,0,1,0,1,1,0]))
print(decoded)

[1 1 1 0]


In [22]:
'''check for bit flip (5th position was flipped)'''

result = parity(np.array([0,0,1,0,0,1,0]))
print(result)

Unsuccesful transmission: 	
A bit flip was discovered at position 5 (left to right, [1-7])


In [29]:
'''check for bit flip (there was none)'''

result = parity(np.array([0,0,1,0,1,1,0]))
print(result)

Successful transmission


#### Test with [0,1,1,0]

In [24]:
'''encoding a message'''

# P1 = parity bit 1; B1 = message bit 1
encoded = encode(np.array([0,1,1,0]))
print(str(encoded) + " (P1, P2, B1, P3, B2, B3, B4))")

[1 1 0 0 1 1 0] (P1, P2, B1, P3, B2, B3, B4))


In [26]:
'''decoding a message'''

decoded = decode(np.array([1,1,0,0,1,1,0]))
print(decoded)

[0 1 1 0]


In [30]:
'''check for bit flip (there was none)'''

result = parity(np.array([1,1,0,0,1,1,0]))
print(result)

Successful transmission
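
The three codewords printed in the tests above are consistent with a standard Hamming(7,4) code whose parity bits sit at positions 1, 2 and 4. As a rough cross-check, the sketch below rebuilds the encode/decode round trip under that assumption. The matrices `G` and `R` and the function names are hypothetical and may differ from the `enc` and `dec` matrices used in the notebook.

```python
import numpy as np

# Assumed generator matrix: rows are (P1, P2, B1, P3, B2, B3, B4), columns are (B1, B2, B3, B4).
G = np.array([
    [1, 1, 0, 1],  # P1 = B1 + B2 + B4
    [1, 0, 1, 1],  # P2 = B1 + B3 + B4
    [1, 0, 0, 0],  # B1
    [0, 1, 1, 1],  # P3 = B2 + B3 + B4
    [0, 1, 0, 0],  # B2
    [0, 0, 1, 0],  # B3
    [0, 0, 0, 1],  # B4
])

# Assumed decode matrix: picks the message bits at positions 3, 5, 6 and 7.
R = np.array([
    [0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 1],
])

def encode_sketch(w):
    """Multiply the generator matrix with the 4-bit message and reduce modulo 2."""
    return G.dot(w) % 2

def decode_sketch(c_w):
    """Extract the 4 message bits from a 7-bit codeword (no error correction)."""
    return R.dot(c_w) % 2

for message in ([0, 1, 0, 1], [1, 1, 1, 0], [0, 1, 1, 0]):
    codeword = encode_sketch(np.array(message))
    print(message, "->", codeword, "->", decode_sketch(codeword))
```

Each parity bit is the sum modulo 2 of three message bits, which is why the encoder needs only one matrix product followed by `% 2`.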


#### Other tests

In [37]:
'''Test with incorrect type'''

array = encode([0,1,0,1])

TypeError: The input needs to be a numpy array.

In [38]:
'''Test with incorrect length'''

array = encode(np.array([0,1,0,1,1]))

ValueError: The array needs to have exactly 4 digits.

In [39]:
'''Test with incorrect digits'''

array = encode(np.array([0,1,3,2]))

ValueError: Elements in array must either be 0 or 1

## Question 3 - Text Document Similarity

### Function Definition and Output Control

In [40]:
import pandas as pd
import re
import numpy as np

In [42]:
# initialize a text corpus (in this case a list of strings) using news headlines from "The New York Times" website

textCorpus = [
            "What Time Do the Polls Close?",
            "New Voting Laws Add Difficulties for People With Disabilities Disabilities ",
            "African Countries Say Richer Nations Fall Short on Climate Pledges",
            "Switzerland is paying poorer countries to cut emissions on its behalf, raising concerns that other nations will follow.",
            "Apple Built Its Empire With China. Now Its Foundation Is Showing Cracks.",
            "China’s Business Elite See the Country That Let Them Thrive Slipping Away",
            "AFP claims Medibank hackers are Russian cybercriminals",
            "States record spike in COVID-19 cases",
            "Australia marks Remembrance Day",
            "Democracy and the End of Roe Shaped the Midterm Results",
            "An air of suspense is hanging over the Capitol as the vote counting continues, Carl Hulse writes.",
            "Amid Chaos and Explosions, Russia Says Retreat From Kherson City Is Complete",
            "Most emissions are produced by industrialized countries",
            "Emissions emissions Emissions Emissions rich rich rich rich rich rich rich rich rich"
            ]


# initialize the string that should be compared to the others

inputText = "Most emissions are produced by industrialized, rich countries and not other, poorer countries."

In [43]:
# set up by creating functions for each step in the text vectorization + text similarity calculation

# 1. create function for creating a word dictionary from the text corpus
def createWordDict(textCorpus: list):
    uniqueWords = {}
    for sentence in textCorpus:
        # use regex to remove everything that is not a letter, digit or space
        sentence = re.sub(r'[^A-Za-z0-9 ]+', '', sentence)
        # convert the sentence to lowercase
        sentence = sentence.lower()
        # split the sentence into words on the space character
        # (note: consecutive or trailing spaces produce empty strings, which end up as the '' key)
        words = sentence.split(" ")
        # drop words that are already in the dictionary
        words = [x for x in words if x not in uniqueWords]
        # add the remaining words to the dictionary
        for word in words:
            uniqueWords[word] = 1
    return uniqueWords


# Testing the function and displaying functionality by printing the output

wordDict = createWordDict(textCorpus)
print(wordDict)

{'what': 1, 'time': 1, 'do': 1, 'the': 1, 'polls': 1, 'close': 1, 'new': 1, 'voting': 1, 'laws': 1, 'add': 1, 'difficulties': 1, 'for': 1, 'people': 1, 'with': 1, 'disabilities': 1, '': 1, 'african': 1, 'countries': 1, 'say': 1, 'richer': 1, 'nations': 1, 'fall': 1, 'short': 1, 'on': 1, 'climate': 1, 'pledges': 1, 'switzerland': 1, 'is': 1, 'paying': 1, 'poorer': 1, 'to': 1, 'cut': 1, 'emissions': 1, 'its': 1, 'behalf': 1, 'raising': 1, 'concerns': 1, 'that': 1, 'other': 1, 'will': 1, 'follow': 1, 'apple': 1, 'built': 1, 'empire': 1, 'china': 1, 'now': 1, 'foundation': 1, 'showing': 1, 'cracks': 1, 'chinas': 1, 'business': 1, 'elite': 1, 'see': 1, 'country': 1, 'let': 1, 'them': 1, 'thrive': 1, 'slipping': 1, 'away': 1, 'afp': 1, 'claims': 1, 'medibank': 1, 'hackers': 1, 'are': 1, 'russian': 1, 'cybercriminals': 1, 'states': 1, 'record': 1, 'spike': 1, 'in': 1, 'covid19': 1, 'cases': 1, 'australia': 1, 'marks': 1, 'remembrance': 1, 'day': 1, 'democracy': 1, 'and': 1, 'end': 1, 'of': 1,
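
The dictionary printed above contains an empty-string key (`'': 1`). It comes from `split(" ")`, which returns an empty string when a sentence ends in a space or contains two spaces in a row. The sketch below shows one possible way to avoid it. The helper name `tokenize` is hypothetical and not part of the notebook.

```python
import re

def tokenize(text):
    """Lowercase the text, drop everything except letters, digits and spaces, and split on whitespace."""
    cleaned = re.sub(r'[^A-Za-z0-9 ]+', '', text).lower()
    # split() without an argument collapses runs of whitespace and ignores leading/trailing spaces
    return cleaned.split()

headline = "New Voting Laws Add Difficulties for People With Disabilities Disabilities "
print('' in headline.lower().split(" "))  # True: the trailing space yields an empty string
print('' in tokenize(headline))           # False
```

All outputs shown in this notebook were produced with `split(" ")`. Switching to `split()` would remove the `''` entry, so the vectors would become one element shorter and some of the printed distances could change.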

In [44]:
# 2. function to calculate word vectors from the textCorpus strings against the dictionary
def textToVectors(textCorpus: list, uniqueWords: dict):
    corpusVectors = {}
    for sentence in textCorpus:
        # storing original sentence for later use as the key
        originalSentence = sentence
        # initialize wordVector
        wordVector = np.array([])
        # clean the sentence, keeping only letters, digits and spaces
        sentence = re.sub(r'[^A-Za-z0-9 ]+', '', sentence)
        # convert the sentence to lowercase
        sentence = sentence.lower()
        # split the sentence into words
        sentence = sentence.split(" ")
        # build the vector: for each dictionary word, store how often it occurs in the sentence (0 if absent)
        for word in uniqueWords:
            if word in sentence:
                wordVector = np.append(wordVector, uniqueWords[word] * sentence.count(word))
            else:
                wordVector = np.append(wordVector, 0)
        corpusVectors[originalSentence] = wordVector
    return corpusVectors

# Testing the function and displaying functionality by printing the output
corpusVect = textToVectors(textCorpus, wordDict)
print(corpusVect)

{'What Time Do the Polls Close?': array([1., 1., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]), 'New Voting Laws Add Difficulties for People With Disabilities Disabilities ': array([0., 0., 0., 0., 0., 0., 1., 1., 1., 1., 1., 1., 1., 1., 2., 1., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0.

In [45]:
# 3. function to calculate a word vector from the input text
def inputToVector(inputText: str, uniqueWords: dict):
    # store original input sentence
    originalInput = inputText
    wordVector = np.array([])
    # clean input text, keeping only letters, digits and spaces
    inputText = re.sub(r'[^A-Za-z0-9 ]+', '', inputText)
    # convert inputText to lowercase
    inputText = inputText.lower()
    # split input text into words
    inputText = inputText.split(" ")
    # build the vector: for each dictionary word, store how often it occurs in the input text (0 if absent)
    for word in uniqueWords:
        if word in inputText:
            wordVector = np.append(wordVector, uniqueWords[word] * inputText.count(word))
        else:
            wordVector = np.append(wordVector, 0)
    inputVector = {originalInput:wordVector}
    return inputVector

# Testing the function and displaying functionality by printing the output
inpVector = inputToVector(inputText, wordDict)
print(inpVector)

{'Most emissions are produced by industrialized, rich countries and not other, poorer countries.': array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       2., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 1., 0.,
       0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 1., 1., 1., 1., 1.])}
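
Steps 2 and 3 apply the same cleaning and counting logic, once per corpus sentence and once for the input text. A minimal sketch of a shared helper is shown below. The name `sentence_to_vector` and the toy vocabulary are assumptions for illustration, not code from the notebook.

```python
import re
from collections import Counter

import numpy as np

def sentence_to_vector(sentence, unique_words):
    """Count how often each dictionary word occurs in the sentence (hypothetical helper)."""
    # same cleaning steps as in the notebook: strip special characters, lowercase, split on spaces
    words = re.sub(r'[^A-Za-z0-9 ]+', '', sentence).lower().split(" ")
    counts = Counter(words)
    # a Counter returns 0 for missing words, so no if/else branch is needed
    return np.array([counts[word] for word in unique_words], dtype=float)

vocabulary = {"most": 1, "emissions": 1, "rich": 1, "countries": 1}
vector = sentence_to_vector("Rich countries, poorer countries.", vocabulary)
print(vector)  # [0. 0. 1. 2.]
```

Building the array once from a list also avoids calling `np.append` inside the loop, which copies the whole array on every call. The multiplication by `uniqueWords[word]` is omitted here because every dictionary value is 1.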


In [46]:
# 4. function to calculate the similarity and keep track of the results in a dictionary
def calculateSimilarity(corpusVectors, inputVector, inputText):

    # let the user choose between the dot product and the Euclidean distance as the similarity measure
    # the while loop ensures that the input is valid
    check = False
    # as long as there was no valid input, request input
    while not check:
        # get input
        distMeasure = input("Which distance measure do you want to use? Choose between 'Euclidean Distance' (type 1) or 'Dot Product' (type 2) or type 'Stop'")
        distMeasure = str(distMeasure)
        # check if input is valid
        if distMeasure == "1" or distMeasure == "2":
            # if it is valid, set check to True, which ends the while loop
            check = True
        elif distMeasure.lower() == "stop":
            # the user wants to stop: leave the loop without a valid measure, so no results are calculated
            break
        else:
            # else print error message and stay in loop
            print("There was a problem with your input, try again! (Either 1 or 2, or 'Stop' to end the program)")
            

    # initiate dict to store results
    results = {}
    
    for originalSentence in corpusVectors:
        checkArr = corpusVectors[originalSentence]
        if distMeasure == "2":
            dotProd = inputVector[inputText] @ checkArr
            results[originalSentence] = dotProd 
        elif distMeasure == "1":
            # calculate euclidean distance between vectors
            euclideanDist = np.linalg.norm(inputVector[inputText] - checkArr)
            results[originalSentence] = euclideanDist
    return results, distMeasure

# Testing the function and displaying functionality by printing the output
similarity, distMeasure = calculateSimilarity(corpusVect, inpVector, inputText)
print(similarity)

Which distance measure do you want to use? Choose between 'Euclidean Distance' (type 1) or 'Dot Product' (type 2) or type 'Stop'1
{'What Time Do the Polls Close?': 4.47213595499958, 'New Voting Laws Add Difficulties for People With Disabilities Disabilities ': 5.196152422706632, 'African Countries Say Richer Nations Fall Short on Climate Pledges': 4.47213595499958, 'Switzerland is paying poorer countries to cut emissions on its behalf, raising concerns that other nations will follow.': 4.69041575982343, 'Apple Built Its Empire With China. Now Its Foundation Is Showing Cracks.': 5.291502622129181, 'China’s Business Elite See the Country That Let Them Thrive Slipping Away': 5.0990195135927845, 'AFP claims Medibank hackers are Russian cybercriminals': 4.358898943540674, 'States record spike in COVID-19 cases': 4.47213595499958, 'Australia marks Remembrance Day': 4.242640687119285, 'Democracy and the End of Roe Shaped the Midterm Results': 4.898979485566356, 'An air of suspense is hanging 
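
Because `calculateSimilarity` asks for the measure through `input()`, it is hard to call from a test or a script. The sketch below shows one possible non-interactive variant that takes the measure as a parameter. The function name `similarity_scores`, the measure labels and the toy vectors are assumptions for illustration only.

```python
import numpy as np

def similarity_scores(corpus_vectors, input_vector, measure):
    """Score every corpus vector against input_vector; measure is "euclidean" or "dot"."""
    if measure not in ("euclidean", "dot"):
        raise ValueError("measure must be 'euclidean' or 'dot'")
    results = {}
    for sentence, vector in corpus_vectors.items():
        if measure == "dot":
            results[sentence] = float(input_vector @ vector)
        else:
            results[sentence] = float(np.linalg.norm(input_vector - vector))
    return results

# toy vectors over a three-word vocabulary (emissions, rich, countries)
corpus = {
    "short headline": np.array([1., 0., 1.]),
    "repetitive headline": np.array([4., 9., 0.]),
}
query = np.array([1., 1., 2.])

print(similarity_scores(corpus, query, "dot"))
print(similarity_scores(corpus, query, "euclidean"))
```

In this toy data the repetitive headline has the larger dot product (13 versus 3) but also the larger Euclidean distance. Repeated words inflate the dot product, whereas the Euclidean distance penalizes every count that differs from the input.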

In [47]:
def TextDocumentSimilarity(textCorpus, inputText):

    # 1. word dictionary
    uniqueWords = createWordDict(textCorpus)

    # 2. transform individual strings in textCorpus to vectors
    corpusVectors = textToVectors(textCorpus, uniqueWords)

    # 3. calculate a word vector from the input
    inputVector = inputToVector(inputText, uniqueWords)

    # 4. calculate the similarity and keep track of results in a dictionary, also saving chosen distance measure
    results, measure = calculateSimilarity(corpusVectors, inputVector, inputText)

    # 5. finally print the results
    if measure == "2":
        print("Dot product chosen - Interpretation help: The higher the value, the more similar the sentences.")
    elif measure == "1":
        print("Euclidean Distance chosen - Interpretation help: The maximum similarity value for the chosen method would be 0, the higher the value the more different the sentences.")
    print()
    print("The input text: '",inputText, "' is similar to the following sentences in descending order:")
    print()
    resultsList = []
    if measure == "2":
        for arr in sorted(results, key = results.get, reverse = True):
            resultsList.append(str(arr + " - Similarity: " + str(results[arr])))
        print(*resultsList, sep="\n")
    elif measure == "1":
        for arr in sorted(results, key = results.get, reverse = False):
            resultsList.append(str(arr + " - Distance: " + str(results[arr])))
        print(*resultsList, sep="\n")
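
The two sorting branches above differ only in the sort direction. As a rough sketch, the ranking could be separated from the printing so that it returns data instead of text. The helper `rank_results` is hypothetical and reuses the notebook's convention of `"1"` for Euclidean distance and `"2"` for dot product.

```python
def rank_results(results, measure):
    """Return (sentence, score) pairs, most similar first."""
    # dot product ("2"): larger is more similar; Euclidean distance ("1"): smaller is more similar
    descending = (measure == "2")
    return sorted(results.items(), key=lambda item: item[1], reverse=descending)

distances = {"a": 4.47, "b": 2.24, "c": 5.1}
print(rank_results(distances, "1"))  # [('b', 2.24), ('a', 4.47), ('c', 5.1)]
```

Returning a list of pairs keeps the ordering logic in one place and leaves the formatting of each line to the caller.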

### Testing Functionality

In [48]:
# Similarity using Euclidean Distance
TextDocumentSimilarity(textCorpus, inputText)

Which distance measure do you want to use? Choose between 'Euclidean Distance' (type 1) or 'Dot Product' (type 2) or type 'Stop'1
Euclidean Distance chosen - Interpretation help: The maximum similarity value for the chosen method would be 0, the higher the value the more different the sentences.

The input text: ' Most emissions are produced by industrialized, rich countries and not other, poorer countries. ' is similar to the following sentences in descending order:

Most emissions are produced by industrialized countries - Distance: 2.23606797749979
Australia marks Remembrance Day - Distance: 4.242640687119285
AFP claims Medibank hackers are Russian cybercriminals - Distance: 4.358898943540674
What Time Do the Polls Close? - Distance: 4.47213595499958
African Countries Say Richer Nations Fall Short on Climate Pledges - Distance: 4.47213595499958
States record spike in COVID-19 cases - Distance: 4.47213595499958
Switzerland is paying poorer countries to cut emissions on its behalf, ra

In [49]:
# Similarity using dot product
TextDocumentSimilarity(textCorpus, inputText)

Which distance measure do you want to use? Choose between 'Euclidean Distance' (type 1) or 'Dot Product' (type 2) or type 'Stop'2
Dot product chosen - Interpretation help: The higher the value, the more similar the sentences.

The input text: ' Most emissions are produced by industrialized, rich countries and not other, poorer countries. ' is similar to the following sentences in descending order:

Emissions emissions Emissions Emissions rich rich rich rich rich rich rich rich rich - Similarity: 13.0
Most emissions are produced by industrialized countries - Similarity: 8.0
Switzerland is paying poorer countries to cut emissions on its behalf, raising concerns that other nations will follow. - Similarity: 5.0
African Countries Say Richer Nations Fall Short on Climate Pledges - Similarity: 2.0
AFP claims Medibank hackers are Russian cybercriminals - Similarity: 1.0
Democracy and the End of Roe Shaped the Midterm Results - Similarity: 1.0
Amid Chaos and Explosions, Russia Says Retreat Fro