# 5.7 Exercises

## Reinforcement

### R-5.1
Execute the experiment from Code Fragment 5.1 and compare the results
on your system to those we report in Code Fragment 5.2.

![Code Fragment 5.1](Ch_5_img/5.1.png)

![Code Fragment 5.2](Ch_5_img/5.2.png)

In [3]:
import sys  # provides getsizeof function


n = 1000
data = []
for k in range(n):  # NOTE: must fix choice of n
    a = len(data)  # number of elements
    b = sys.getsizeof(data)  # actual size in bytes
    print("Length: {:4d}; Size in bytes: {:5d}".format(a, b))
    data.append(None)


Length:    0; Size in bytes:    56
Length:    1; Size in bytes:    88
Length:    2; Size in bytes:    88
Length:    3; Size in bytes:    88
Length:    4; Size in bytes:    88
Length:    5; Size in bytes:   120
Length:    6; Size in bytes:   120
Length:    7; Size in bytes:   120
Length:    8; Size in bytes:   120
Length:    9; Size in bytes:   184
Length:   10; Size in bytes:   184
Length:   11; Size in bytes:   184
Length:   12; Size in bytes:   184
Length:   13; Size in bytes:   184
Length:   14; Size in bytes:   184
Length:   15; Size in bytes:   184
Length:   16; Size in bytes:   184
Length:   17; Size in bytes:   248
Length:   18; Size in bytes:   248
Length:   19; Size in bytes:   248
Length:   20; Size in bytes:   248
Length:   21; Size in bytes:   248
Length:   22; Size in bytes:   248
Length:   23; Size in bytes:   248
Length:   24; Size in bytes:   248
Length:   25; Size in bytes:   312
Length:   26; Size in bytes:   312
Length:   27; Size in bytes:   312
Length:   28; Size i

### R-5.2 
In Code Fragment 5.1, we perform an experiment to compare the length of
a Python list to its underlying memory usage. Determining the sequence
of array sizes requires a manual inspection of the output of that program.
Redesign the experiment so that the program outputs only those values of
k at which the existing capacity is exhausted. For example, on a system
consistent with the results of Code Fragment 5.2, your program should
output that the sequence of array capacities is 0, 4, 8, 16, 25, ...

In [1]:
import sys  # Provides access to system-specific parameters and functions, including `getsizeof` to check memory usage.

n = 1000  # Number of iterations (elements) to be added to the list
data = []  # Initialize an empty list to store elements

# Loop to append elements to the list and monitor changes in memory usage
for k in range(n):
    a = len(data)  # Current number of elements in the list
    b = sys.getsizeof(data)  # Current size of the list in bytes

    data.append(None)  # Append a `None` element to the list, which will cause the list to resize when necessary
    c = sys.getsizeof(data)  # New size of the list in bytes after appending the element

    # Check if the list has resized (i.e., if the size in bytes has increased)
    if c > b:
        # Print the length of the list and its size in bytes before the resize occurred
        print("Length: {:6d}; Size in bytes: {:6d}".format(a, b))


Length:      0; Size in bytes:     56
Length:      4; Size in bytes:     88
Length:      8; Size in bytes:    120
Length:     16; Size in bytes:    184
Length:     24; Size in bytes:    248
Length:     32; Size in bytes:    312
Length:     40; Size in bytes:    376
Length:     52; Size in bytes:    472
Length:     64; Size in bytes:    568
Length:     76; Size in bytes:    664
Length:     92; Size in bytes:    792
Length:    108; Size in bytes:    920
Length:    128; Size in bytes:   1080
Length:    148; Size in bytes:   1240
Length:    172; Size in bytes:   1432
Length:    200; Size in bytes:   1656
Length:    232; Size in bytes:   1912
Length:    268; Size in bytes:   2200
Length:    308; Size in bytes:   2520
Length:    352; Size in bytes:   2872
Length:    400; Size in bytes:   3256
Length:    456; Size in bytes:   3704
Length:    520; Size in bytes:   4216
Length:    592; Size in bytes:   4792
Length:    672; Size in bytes:   5432
Length:    760; Size in bytes:   6136
Length:    8

### R-5.3
Modify the experiment from Code Fragment 5.1 in order to demonstrate
that Python’s list class occasionally shrinks the size of its underlying array
when elements are popped from a list.

In [2]:
import sys  # Importing sys to use the getsizeof function

n = 1000  # Set n to 1000
data = [num for num in range(n)]  # Create a list of numbers from 0 to n-1

# Iterate from 0 to n-1
for k in range(n):  
    a = len(data)  # Get the current number of elements in the list
    b = sys.getsizeof(data)  # Get the current size of the list in bytes
    
    data.pop()  # Remove the last element from the list
    
    c = sys.getsizeof(data)  # Get the new size of the list in bytes after popping an element
    
    # If the new size is less than the previous size, print the details
    if c < b:
        print("Length: {:4d}; Size in bytes: {:5d}".format(a, b))


Length:  550; Size in bytes:  8856
Length:  310; Size in bytes:  5016
Length:  176; Size in bytes:  2872
Length:  100; Size in bytes:  1656
Length:   58; Size in bytes:   984
Length:   34; Size in bytes:   600
Length:   20; Size in bytes:   376
Length:   12; Size in bytes:   248
Length:    8; Size in bytes:   184
Length:    6; Size in bytes:   152
Length:    2; Size in bytes:   120
Length:    1; Size in bytes:    88


### R-5.4 
Our DynamicArray class, as given in Code Fragment 5.3, does not support
use of negative indices with `__getitem__` . Update that method to better
match the semantics of a Python list.

![Code Fragment 5.3](Ch_5_img/5.3.png)

In [7]:
import ctypes  # provides low-level arrays

class DynamicArray:
    """A dynamic array class akin to a simplified Python list."""

    def __init__(self):
        """Create an empty array."""
        self.n = 0  # count actual elements
        self.capacity = 1  # default array capacity
        self.A = self._make_array(self.capacity)  # low-level array

    def __len__(self):
        """Return number of elements stored in the array."""
        return self.n

# ===================================MY-CONTRIBUTION==================================

    def __getitem__(self, k):
        """Return element at index k; a negative k counts from the end."""
        if k < 0:
            k += self.n  # convert a negative index to its nonnegative equivalent
        if not 0 <= k < self.n:
            raise IndexError('invalid index')
        return self.A[k]  # retrieve from array
    
# ==================================END-MY-CONTRIBUTION================================

    def append(self, obj):
        """Add object to end of the array."""
        if self.n == self.capacity:  # not enough room
            self._resize(2 * self.capacity)  # so double capacity
        self.A[self.n] = obj
        self.n += 1

    def _resize(self, c):  # nonpublic utility
        """Resize internal array to capacity c."""
        B = self._make_array(c)  # new (bigger) array
        for k in range(self.n):  # for each existing value
            B[k] = self.A[k]
        self.A = B  # use the bigger array
        self.capacity = c

    def _make_array(self, c):  # nonpublic utility
        """Return new array with capacity c."""
        return (c * ctypes.py_object)()  # see ctypes documentation
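
The index convention that Python's built-in list follows can be sketched as a standalone helper. The name `normalize_index` is hypothetical and not part of the book's `DynamicArray`; it assumes an index `k` and a length `n`, maps a negative `k` to `n + k`, and raises `IndexError` when the result falls outside the valid range.

```python
def normalize_index(k, n):
    """Map a possibly negative index k to a position in a sequence of length n.

    Hypothetical helper for illustration only: -1 refers to the last element,
    and anything outside -n .. n-1 is rejected, as with a built-in list.
    """
    if k < 0:
        k += n  # -1 becomes n-1, -n becomes 0
    if not 0 <= k < n:
        raise IndexError('invalid index')
    return k


print(normalize_index(-1, 5))  # 4
print(normalize_index(2, 5))   # 2
```

Checking the range after the adjustment means an index such as `-6` on a length-5 sequence is rejected instead of silently reading an unrelated cell.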


### R-5.6 
Our implementation of insert for the DynamicArray class, as given in
Code Fragment 5.5, has the following inefficiency. In the case when a resize
occurs, the resize operation takes time to copy all the elements from
an old array to a new array, and then the subsequent loop in the body of
insert shifts many of those elements. Give an improved implementation
of the insert method, so that, in the case of a resize, the elements are
shifted into their final position during that operation, thereby avoiding the
subsequent shifting.

![Code Fragment 5.5](Ch_5_img/5.5.png)

In [7]:
import ctypes  # provides low-level arrays

class DynamicArray:
    """A dynamic array class akin to a simplified Python list."""

    def __init__(self):
        """Create an empty array."""
        self.n = 0  # count actual elements
        self.capacity = 1  # default array capacity
        self.A = self.make_array(self.capacity)  # low-level array

    def __len__(self):
        """Return number of elements stored in the array."""
        return self.n

    def __getitem__(self, k):
        """Return element at index k."""
        if not 0 <= k < self.n:
            raise IndexError('invalid index')
        return self.A[k]  # retrieve from array

    def append(self, obj):
        """Add object to end of the array."""
        if self.n == self.capacity:  # not enough room
            self.resize(2 * self.capacity)  # so double capacity
        self.A[self.n] = obj
        self.n += 1

    def resize(self, c):  # nonpublic utility
        """Resize internal array to capacity c."""
        B = self.make_array(c)  # new (bigger) array
        for k in range(self.n):  # for each existing value
            B[k] = self.A[k]
        self.A = B  # use the bigger array
        self.capacity = c

    def make_array(self, c):  # nonpublic utility
        """Return new array with capacity c."""
        return (c * ctypes.py_object)()  # see ctypes documentation


    def insert(self, k, value):
        """Insert value at index k, shifting subsequent values rightward."""
        if self.n == self.capacity:  # not enough room
            # Resize and shift elements directly into their final positions
            new_capacity = 2 * self.capacity
            B = self.make_array(new_capacity)
            for i in range(self.n):  # Copy elements to new array
                if i < k:
                    B[i] = self.A[i]  # Elements before index k
                else:
                    B[i + 1] = self.A[i]  # Elements from index k onwards are shifted right
            self.A = B
            self.capacity = new_capacity
        else:
            for j in range(self.n, k, -1):  # Shift elements to the right
                self.A[j] = self.A[j - 1]
        self.A[k] = value  # Store newest element
        self.n += 1  # Increment number of elements



### R-5.10 
The constructor for the CaesarCipher class in Code Fragment 5.11 can
be implemented with a two-line body by building the forward and backward
strings using a combination of the join method and an appropriate
comprehension syntax. Give such an implementation.

![Code Fragment 5.11](Ch_5_img/5.11.png)

In [10]:
class CaesarCipher:
    """Class for doing encryption and decryption using a Caesar cipher."""

    def __init__(self, shift):
        """Construct Caesar cipher using given integer shift for rotation."""
    # ===================================MY-CONTRIBUTION==================================
        self.forward = ''.join(chr((k + shift) % 26 + ord('A')) for k in range(26))
        self.backward = ''.join(chr((k - shift) % 26 + ord('A')) for k in range(26))
    # ================================END-MY-CONTRIBUTION==================================

    def encrypt(self, message):
        """Return string representing encrypted message."""
        return self.transform(message, self.forward)

    def decrypt(self, secret):
        """Return decrypted message given encrypted secret."""
        return self.transform(secret, self.backward)

    def transform(self, original, code):
        """Utility to perform transformation based on given code string."""
        msg = list(original)
        for k in range(len(msg)):
            if msg[k].isupper():
                j = ord(msg[k]) - ord('A')  # index from 0 to 25
                msg[k] = code[j]  # replace this character
        return ''.join(msg)

if __name__ == "__main__":
    cipher = CaesarCipher(3)
    message = "THE EAGLE IS IN PLAY; MEET AT JOE S."
    coded = cipher.encrypt(message)
    print("Secret:", coded)
    answer = cipher.decrypt(coded)
    print("Message:", answer)


Secret: WKH HDJOH LV LQ SODB; PHHW DW MRH V.
Message: THE EAGLE IS IN PLAY; MEET AT JOE S.


### R-5.12 
Describe how the built-in sum function can be combined with Python’s
comprehension syntax to compute the sum of all numbers in an n×n data
set, represented as a list of lists.

In [4]:
# Define a 3x3 matrix
matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]

def sum_all(lst):
    # Calculate the sum of each sublist (row) in the matrix
    sums = list(sum(num) for num in lst)
    
    # Sum up all the row sums to get the total sum of all elements in the matrix
    total = sum(sums)
    
    # Print the total sum
    print(total)

# Call the function with the matrix as input
sum_all(matrix)

45
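
The same total can also be written as a single expression, which is closer to what the exercise asks for. The sketch below assumes the data set is a list of equally long rows; the variable names are illustrative.

```python
data = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

# Sum each row with a generator expression, then sum the row totals.
total_by_rows = sum(sum(row) for row in data)

# Equivalent: flatten with a nested comprehension and sum the values directly.
total_flat = sum(value for row in data for value in row)

print(total_by_rows, total_flat)  # 45 45
```

Both forms visit each of the n×n values once and avoid building an intermediate list of row sums.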


## Creativity

### C-5.13 
In the experiment of Code Fragment 5.1, we begin with an empty list. If
data were initially constructed with nonempty length, does this affect the
sequence of values at which the underlying array is expanded? Perform
your own experiments, and comment on any relationship you see between
the initial length and the expansion sequence.

In [5]:
import sys  # Importing the sys module to use the getsizeof function, which returns the size of an object in bytes.

n = 30  # Defining the number of iterations.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # Initializing a list with 10 elements.

for k in range(n):  # Iterating n times (in this case, 30 times).
    a = len(data)  # Getting the current number of elements in the list.
    b = sys.getsizeof(data)  # Getting the current size of the list in bytes.

    data.append(None)  # Appending None to the list, which increases its length by 1.
    c = sys.getsizeof(data)  # Getting the new size of the list in bytes after appending.

    # If the size in bytes has increased, print the length of the list and its size in bytes before the increase.
    if c > b:
        print("Length: {:6d}; Size in bytes: {:6d}".format(a, b))  

Length:     10; Size in bytes:    136
Length:     16; Size in bytes:    184
Length:     24; Size in bytes:    248
Length:     32; Size in bytes:    312


In [8]:
import sys  # provides getsizeof function


n = 30
data = [1, 2, 3]
for k in range(n):  # NOTE: must fix choice of n
    a = len(data)  # number of elements
    b = sys.getsizeof(data)  # actual size in bytes

    data.append(None) # Appending None to the list, which increases its length by 1.
    c = sys.getsizeof(data)  # Getting the new size of the list in bytes after appending.
    
    # If the size in bytes has increased, print the length of the list and its size in bytes before the increase.
    if c > b:
        print("Length: {:6d}; Size in bytes: {:6d}".format(a, b))

Length:      4; Size in bytes:     88
Length:      8; Size in bytes:    120
Length:     16; Size in bytes:    184
Length:     24; Size in bytes:    248
Length:     32; Size in bytes:    312


The initial length affects where the first expansion happens, but not the later ones. The 10-element list starts with no spare room (136 bytes) and expands on the very next append, after which it resizes at lengths 16, 24, and 32, the same values seen when starting from an empty list. The 3-element list follows the empty-list sequence (4, 8, 16, 24, 32) from the start. The underlying array expands when the current capacity is exceeded, leading to a new allocation that often increases the capacity by a factor

### C-5.15 

Consider an implementation of a dynamic array, but instead of copying
the elements into an array of double the size (that is, from N to 2N) when
its capacity is reached, we copy the elements into an array with 
$ \lceil N/4 \rceil $ additional cells, going from capacity $ N $ to capacity $ N + \lceil N/4 \rceil $. Prove that performing a sequence of n append operations still runs in O(n) time in this case.

### Analysis of Append Operations

- **Appending without Resizing**: Each append operation that does not trigger a resize is an $ O(1) $ operation. If we append an element and the array has enough capacity, we simply place the new element at the next available index.

- **Appending with Resizing**: When the array reaches its capacity, a resize operation occurs. According to the given strategy, the array is resized from $ N $ to $ N + \left\lceil \frac{N}{4} \right\rceil $.

### Cost of Resizing

When resizing, we need to copy all existing elements to the new array. Let's analyze the resizing costs:

- Initial array size: $ N_0 $
- After the first resize: $ N_1 = N_0 + \left\lceil \frac{N_0}{4} \right\rceil \approx 1.25 N_0 $
- After the second resize: $ N_2 = N_1 + \left\lceil \frac{N_1}{4} \right\rceil \approx 1.25^2 N_0 $
- And so on.

The $ k $-th resize approximately occurs when the array size is $ N_k \approx N_0 \cdot 1.25^k $.

### Total Cost of Resizing

To find the total cost of resizing, we sum the cost of each resizing step:


$ \text{Total resizing cost} \approx \sum_{k=0}^{m} N_k \approx \sum_{k=0}^{m} N_0 \cdot 1.25^k$


Where $ m $ is the number of resizes needed until the array reaches size $ n $. Since $ N_m \approx n $,


$ m \approx \log_{1.25} \left( \frac{n}{N_0} \right)$


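The sum $ \sum_{k=0}^{m} 1.25^k = \frac{1.25^{m+1} - 1}{1.25 - 1} < 5 \cdot 1.25^m $ is a geometric series dominated by its last term. The total resizing cost is therefore at most about $ 5 N_0 \cdot 1.25^m \approx 5n $, which is $ O(n) $.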

### Conclusion

Since the total resizing cost is $ O(n) $ and each append operation is $ O(1) $, the overall time complexity for $ n $ append operations remains $ O(n) $.
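
The bound can be checked empirically with a small simulation. This is a sketch that tracks only sizes (no real array) and assumes an initial capacity of 1 and growth by $ \lceil N/4 \rceil $; the function name `count_copies` is illustrative.

```python
def count_copies(n):
    """Count element copies made by n appends when capacity grows by ceil(N/4).

    Only the element count and the capacity are tracked; the initial
    capacity of 1 is an assumption of this sketch.
    """
    size, capacity, copies = 0, 1, 0
    for _ in range(n):
        if size == capacity:
            copies += size                 # a resize copies every stored element
            capacity += -(-capacity // 4)  # ceil(capacity / 4) with integer math
        size += 1
    return copies


for n in (1_000, 10_000, 100_000):
    copies = count_copies(n)
    print(f"n = {n:7d}; copies = {copies:8d}; copies per append = {copies / n:.2f}")
```

The number of copies per append stays below 5 for every `n`, matching the geometric-series bound above.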



### C-5.16 
Implement a pop method for the DynamicArray class, given in Code Fragment
5.3, that removes the last element of the array, and that shrinks the
capacity, $N$, of the array by half any time the number of elements in the
array goes below $N/4$.

In [3]:
import ctypes  # Provides low-level arrays

class DynamicArray:
    """A dynamic array class akin to a simplified Python list."""

    def __init__(self):
        """Create an empty array."""
        self.n = 0  # Count actual elements
        self.capacity = 1  # Default array capacity
        self.A = self.make_array(self.capacity)  # Low-level array

    def __len__(self):
        """Return number of elements stored in the array."""
        return self.n

    def __getitem__(self, k):
        """Return element at index k."""
        if not 0 <= k < self.n:
            raise IndexError("Invalid index")
        return self.A[k]  # Retrieve from array

    def append(self, obj):
        """Add object to end of the array."""
        if self.n == self.capacity:  # Not enough room
            self.resize(2 * self.capacity)  # Double capacity
        self.A[self.n] = obj
        self.n += 1

    def resize(self, c):  # Non-public utility
        """Resize internal array to capacity c."""
        B = self.make_array(c)  # New (bigger) array
        for k in range(self.n):  # For each existing value
            B[k] = self.A[k]
        self.A = B  # Use the bigger array
        self.capacity = c

    def make_array(self, c):  # Non-public utility
        """Return new array with capacity c."""
        return (c * ctypes.py_object)()  # See ctypes documentation
    
    def pop(self):
        """Remove and return the last element of the array."""
        if self.n == 0:
            raise IndexError("Pop from empty array")

        last_element = self.A[self.n - 1]  # Get the last element
        self.A[self.n - 1] = None  # Clear the reference
        self.n -= 1  # Decrement the count of elements

        # Shrink the capacity if necessary
        if self.n < self.capacity // 4 and self.capacity > 1:  # Ensure capacity does not go below 1
            self.resize(self.capacity // 2)  # Halve the capacity

        return last_element


### C-5.19
Consider a variant of Exercise C-5.16, in which an array of capacity N is
resized to capacity precisely that of the number of elements, any time the
number of elements in the array goes strictly below N/4. Give a formal
proof that any sequence of n append or pop operations on an initially
empty dynamic array takes O(n) time.

### 1. Append Operation
**Case 1: No Resize Needed**
When there is sufficient capacity in the array (i.e., $ n < N $), the append operation takes $ O(1) $ time.

**Case 2: Resize Needed**
When the array is full (i.e., $ n = N $), a resize is needed. The time complexity for resizing involves creating a new array of size $ 2N $ and copying all existing elements to the new array. This operation takes $ O(N) $ time. However, since this resize occurs infrequently, we need to analyze the amortized cost.

**Amortized Analysis**
- Suppose we perform $ k $ append operations, leading to $ N $ being the capacity before a resize.
- The first $ N $ appends take $ O(N) $ time in total due to the single resize.
- The next $ N $ appends will again cause only one resize, taking $ O(2N) $ time in total.
- This pattern continues, so for every $ N $ appends, we perform at most $ O(N) $ work.
- Thus, the amortized cost of each append is $ O(1) $.

### 2. Pop Operation

**Case 1: No Resize Needed**
When the array is not empty and the number of elements does not drop below $ \frac{N}{4} $, the pop operation takes $ O(1) $ time.

**Case 2: Resize Needed**
When the number of elements drops below $ \frac{N}{4} $, we resize the array to the exact number of elements. Suppose we perform $ k $ pop operations leading to $ n $ being the number of elements.
- The resizing operation, which matches the capacity to the number of elements, may take $ O(N) $ time.

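**Amortized Analysis**
- A shrink at capacity $ N $ happens only when fewer than $ \frac{N}{4} $ elements remain, so it copies fewer than $ \frac{N}{4} $ elements.
- The capacity $ N $ was set by the previous resize (or is the constant initial capacity), and right after that resize the array held more than $ \frac{N}{2} $ elements: just over $ \frac{N}{2} $ after a doubling, exactly $ N $ after a shrink. At least $ \frac{N}{4} $ pops therefore separate the two resizes, and charging the shrink to them costs $ O(1) $ per pop.
- A shrink leaves the array completely full, so the next append triggers a doubling that again copies fewer than $ \frac{N}{4} $ elements. Charging that doubling to the same pops keeps the charge per pop at $ O(1) $.
- No pop is charged for more than one shrink and one such doubling, so the total time spent on all resizing triggered by pops is $ O(n) $ over a sequence of $ n $ operations.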

### Conclusion
Combining the analyses for both operations, we find:
- Each append operation has an amortized cost of $ O(1) $.
- Each pop operation has an amortized cost of $ O(1) $ as well.

Thus, the total time for $ n $ append and pop operations is $ O(n) $, proving that the entire sequence of operations takes $ O(n) $ time.
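
A small simulation can illustrate the charging argument. This is a sketch under stated assumptions: capacity starts at 1, doubles when the array is full, and is reset to the element count (never below 1) when that count drops strictly below a quarter of the capacity. The function name `simulate` and the `'a'`/`'p'` encoding of operations are illustrative.

```python
def simulate(ops):
    """Return the number of element copies caused by a string of operations.

    Each character of ops is 'a' (append) or 'p' (pop). Only the element
    count and the capacity are tracked, not the stored values.
    """
    size, capacity, copies = 0, 1, 0
    for op in ops:
        if op == 'a':
            if size == capacity:        # full: double the capacity
                copies += size
                capacity *= 2
            size += 1
        else:
            if size == 0:
                raise IndexError('pop from empty array')
            size -= 1
            if 4 * size < capacity:     # strictly below N/4: shrink to fit
                copies += size
                capacity = max(1, size)
    return copies


ops = 'a' * 16 + 'p' * 13 + 'a'
print(len(ops), simulate(ops))  # 30 21
```

In this run the shrink at 3 elements and the doubling forced by the next append both copy 3 elements, and the 13 pops that precede them are enough to pay for both.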


### C-5.21 
In Section 5.4.2, we described four different ways to compose a long
string: 
1. repeated concatenation 
2. appending to a temporary list and then joining 
3. using list comprehension with join
4. using generator comprehension with join.

Develop an experiment to test the efficiency of all four of these approaches and report your findings.

In [8]:
import timeit

# Define the number of iterations and the length of the string to be created
num_iterations = 10000
string_length = 100

# Create a sample string to work with
sample_string = 'x' * string_length

# Method 1: Repeated Concatenation
def repeated_concatenation():
    result = ''
    for _ in range(num_iterations):
        result += sample_string
    return result

# Method 2: Appending to a Temporary List and then Joining
def append_and_join():
    parts = []
    for _ in range(num_iterations):
        parts.append(sample_string)
    return ''.join(parts)

# Method 3: List Comprehension with Join
def list_comprehension_join():
    return ''.join([sample_string for _ in range(num_iterations)])

# Method 4: Generator Comprehension with Join
def generator_comprehension_join():
    return ''.join(sample_string for _ in range(num_iterations))

# Timing the methods
time_method1 = timeit.timeit(repeated_concatenation, number=1)
print(f"Repeated Concatenation: {time_method1:.6f} seconds")

time_method2 = timeit.timeit(append_and_join, number=1)
print(f"Append and Join: {time_method2:.6f} seconds")

time_method3 = timeit.timeit(list_comprehension_join, number=1)
print(f"List Comprehension Join: {time_method3:.6f} seconds")

time_method4 = timeit.timeit(generator_comprehension_join, number=1)
print(f"Generator Comprehension Join: {time_method4:.6f} seconds")


Repeated Concatenation: 0.002653 seconds
Append and Join: 0.000704 seconds
List Comprehension Join: 0.000711 seconds
Generator Comprehension Join: 0.001121 seconds


### C-5.22 
Develop an experiment to compare the relative efficiency of the extend
method of Python’s list class versus using repeated calls to append to
accomplish the equivalent task.

In [9]:
import timeit

# Function to measure the time taken by the extend method
def measure_extend():
    my_list = []
    elements_to_add = list(range(1000))
    my_list.extend(elements_to_add)

# Function to measure the time taken by repeated calls to append
def measure_append():
    my_list = []
    elements_to_add = list(range(1000))
    for elem in elements_to_add:
        my_list.append(elem)

# Measure the time taken by the extend method
time_extend = timeit.timeit(measure_extend, number=1000)
print(f"Time taken by extend method: {time_extend:.6f} seconds")

# Measure the time taken by repeated calls to append
time_append = timeit.timeit(measure_append, number=1000)
print(f"Time taken by repeated calls to append: {time_append:.6f} seconds")


Time taken by extend method: 0.019603 seconds
Time taken by repeated calls to append: 0.056977 seconds


### C-5.23 
Based on the discussion of page 207, develop an experiment to compare
the efficiency of Python’s list comprehension syntax versus the construction
of a list by means of repeated calls to append.

In [10]:
import time

# Define the list sizes to test
list_sizes = [1000, 10000, 100000, 1000000]

def list_comprehension(n):
    return [k * k for k in range(1, n + 1)]

def list_append(n):
    result = []
    for k in range(1, n + 1):
        result.append(k * k)
    return result

for size in list_sizes:
    # Measure time for list comprehension
    start_time = time.time()
    list_comprehension(size)
    time_comprehension = time.time() - start_time

    # Measure time for list construction using append
    start_time = time.time()
    list_append(size)
    time_append = time.time() - start_time

    print(f"List size: {size}")
    print(f"Time taken by list comprehension: {time_comprehension:.6f} seconds")
    print(f"Time taken by list append: {time_append:.6f} seconds")
    print()


List size: 1000
Time taken by list comprehension: 0.000000 seconds
Time taken by list append: 0.000000 seconds

List size: 10000
Time taken by list comprehension: 0.001000 seconds
Time taken by list append: 0.000999 seconds

List size: 100000
Time taken by list comprehension: 0.008434 seconds
Time taken by list append: 0.007773 seconds

List size: 1000000
Time taken by list comprehension: 0.100091 seconds
Time taken by list append: 0.108430 seconds
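
The `0.000000` readings for the smaller sizes suggest that the clock used above is too coarse for runs this short. A variant based on `timeit.repeat`, which uses a high-resolution timer and repeats each measurement, is sketched below; the sizes and repeat counts are arbitrary choices, and no timings are shown because they depend on the machine.

```python
import timeit


def list_comprehension(n):
    return [k * k for k in range(1, n + 1)]


def list_append(n):
    result = []
    for k in range(1, n + 1):
        result.append(k * k)
    return result


for size in (1_000, 10_000, 100_000):
    # Take the best of several repeats to reduce noise from other processes.
    t_comp = min(timeit.repeat(lambda: list_comprehension(size), number=5, repeat=3))
    t_app = min(timeit.repeat(lambda: list_append(size), number=5, repeat=3))
    print(f"List size: {size}; comprehension: {t_comp:.6f} s; append: {t_app:.6f} s")
```

Reporting the minimum over the repeats is a common convention, since slower runs usually reflect interference rather than the code being measured.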



### C-5.25 
The syntax data.remove(value) for Python list data removes only the first
occurrence of element value from the list. Give an implementation of a
function, with signature remove_all(data, value), that removes all occurrences
of value from the given list, such that the worst-case running time
of the function is O(n) on a list with n elements. Note that it is not efficient
enough in general to rely on repeated calls to remove.

In [11]:
def remove_all(data, value):
    # Create a new list with all elements that are not equal to value
    new_data = [item for item in data if item != value]
    
    # Clear the original list and extend it with new_data
    data.clear()
    data.extend(new_data)

# Example usage
data = [1, 2, 3, 4, 2, 5, 2, 6]
value_to_remove = 2
remove_all(data, value_to_remove)
print(data)  # Output should be [1, 3, 4, 5, 6]


[1, 3, 4, 5, 6]
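
The solution above builds a temporary list of the retained elements. An alternative that compacts the list in place, sketched here with the hypothetical name `remove_all_in_place`, keeps the O(n) running time while using only constant extra space.

```python
def remove_all_in_place(data, value):
    """Remove every occurrence of value from data in O(n) time, in place."""
    keep = 0  # index of the next slot to fill with a retained element
    for item in data:
        if item != value:
            data[keep] = item  # keep <= current position, so nothing unread is overwritten
            keep += 1
    del data[keep:]  # drop the leftover tail in a single step


data = [1, 2, 3, 4, 2, 5, 2, 6]
remove_all_in_place(data, 2)
print(data)  # [1, 3, 4, 5, 6]
```

Each element is examined once and moved at most once, which is what distinguishes this from repeated calls to `remove`, where every call may shift the rest of the list.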


### C-5.29 
A useful operation in databases is the natural join. If we view a database
as a list of ordered pairs of objects, then the natural join of databases A
and B is the list of all ordered triples (x,y, z) such that the pair (x,y) is in
A and the pair (y, z) is in B. Describe and analyze an efficient algorithm
for computing the natural join of a list A of n pairs and a list B of m pairs.

In [12]:
def natural_join(A, B):
    join_dict = {}
    
    # Step 1: Build a dictionary from list A
    for (x, y) in A:
        if y not in join_dict:
            join_dict[y] = []
        join_dict[y].append(x)
    
    # Step 2: Iterate through list B and create the result
    result = []
    for (y, z) in B:
        if y in join_dict:
            for x in join_dict[y]:
                result.append((x, y, z))
    
    return result

# Example usage
A = [(1, 'a'), (2, 'b'), (3, 'a')]
B = [('a', 10), ('b', 20), ('a', 30)]
print(natural_join(A, B))
# Output should be [(1, 'a', 10), (3, 'a', 10), (2, 'b', 20), (1, 'a', 30), (3, 'a', 30)]


[(1, 'a', 10), (3, 'a', 10), (2, 'b', 20), (1, 'a', 30), (3, 'a', 30)]
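
For the analysis part of the exercise: grouping A by its second component takes expected O(n) time with a dictionary, and probing once per pair of B takes O(m) plus the number of triples produced, so the join runs in expected O(n + m + r) time for r output triples. The sketch below compares that approach with a brute-force O(n·m) baseline; both function names are illustrative.

```python
from collections import defaultdict


def natural_join_brute(A, B):
    """Baseline: compare every pair in A with every pair in B, O(n * m) time."""
    return [(x, y, z) for (x, y) in A for (y2, z) in B if y == y2]


def natural_join_hashed(A, B):
    """Group A by its second component, then probe once per pair in B."""
    by_y = defaultdict(list)
    for x, y in A:
        by_y[y].append(x)
    return [(x, y, z) for (y, z) in B for x in by_y.get(y, ())]


A = [(1, 'a'), (2, 'b'), (3, 'a')]
B = [('a', 10), ('b', 20), ('a', 30)]

# The two versions emit the triples in different orders, so compare them sorted.
print(sorted(natural_join_brute(A, B)) == sorted(natural_join_hashed(A, B)))  # True
```

The output size r can itself reach n·m when all pairs share the same middle value, so the hashed version helps most when matches are sparse.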


## Projects

### P-5.32 

Write a Python function that takes two three-dimensional numeric data
sets and adds them componentwise.
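
One compact way to meet this specification, independent of the class-based solution in this section, is a nested loop over planes and rows. The sketch below assumes each data set is a list of lists of lists of numbers with identical shapes; the function name `add_3d` is illustrative.

```python
def add_3d(a, b):
    """Return the componentwise sum of two equally shaped 3-D nested lists."""
    if len(a) != len(b):
        raise ValueError("data sets must have the same shape")
    result = []
    for plane_a, plane_b in zip(a, b):
        if len(plane_a) != len(plane_b):
            raise ValueError("data sets must have the same shape")
        plane = []
        for row_a, row_b in zip(plane_a, plane_b):
            if len(row_a) != len(row_b):
                raise ValueError("data sets must have the same shape")
            plane.append([x + y for x, y in zip(row_a, row_b)])
        result.append(plane)
    return result


print(add_3d([[[1, 2], [3, 4]]], [[[10, 20], [30, 40]]]))  # [[[11, 22], [33, 44]]]
```

Raising `ValueError` on a shape mismatch, rather than printing a message, lets the caller decide how to handle the problem.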

In [1]:
class Matrices:

    def __init__(self):
        self.var = 0

    def row_cal(self, matrix):               # Function to calculate the number of rows in a matrix
        rows = 0
        for row in range(len(matrix)):
            rows += 1
        return rows

    def column_cal(self, matrix):           # Function to calculate the number of columns in a matrix
        columns = 0
        for column in range(len(matrix[0])):
            columns += 1
        return columns


    def product(self, number):              # Function to create the formula for calculating the dot product

        product_formula = ""

        for num in range(number):

            formula = f" + matrix_1[i][n+{num}] * matrix_2[m+{num}][j]"

            product_formula = product_formula + formula

        return product_formula[3:]              # Return the formula without the leading " + "


    def matrix_dot_product(self, matrix_1, matrix_2):              # Function to calculate the dot product of two matrices
        row_1 = self.row_cal(matrix_1)
        column_1 = self.column_cal(matrix_1)
                                                                   # Calculate the dimensions of both matrices
        row_2 = self.row_cal(matrix_2)
        column_2 = self.column_cal(matrix_2)

        print(f"Dimensions of matrix_1: {row_1} by {column_1}")
        print(f"Dimensions of matrix_2: {row_2} by {column_2}")
        final_matrix = []

        if column_1 == row_2:
            agreement = True                                   # Check if the matrices can be multiplied (columns of the first matrix should be equal to rows of the second matrix)
        else:
            agreement = False

        if agreement == True:
                                                                  # If the matrices can be multiplied
            i = 0
            j = 0
            m = 0
            n = 0


            matrix_dim = column_1                                # The dot product sums over the shared dimension: columns of matrix_1 (= rows of matrix_2)

            formula = self.product(matrix_dim)                          # Get the dot product formula
            for row_num in range(row_1):
                i = row_num
                products = []

                for col_num in range(column_2):                          # Calculate the dot product
                    j = col_num
                    # dot_product = eval(formula)
                    variable = "self.var = "
                    final_script = variable + formula
                    exec(final_script)
                    products.append(self.var)

                final_matrix.append(products)

            return final_matrix

        else:
            return print("Can't take dot product of these matrices because their dimensions do not agree.")


    def matrix_addition(self, matrix_1, matrix_2):                  # Function to add two matrices
        row_1 = self.row_cal(matrix_1)
        column_1 = self.column_cal(matrix_1)
                                                                     # Calculate the dimensions of both matrices
        row_2 = self.row_cal(matrix_2)
        column_2 = self.column_cal(matrix_2)

        print(f"Dimensions of matrix_1: {row_1} by {column_1}")
        print(f"Dimensions of matrix_2: {row_2} by {column_2}")

        additions_matrix = []

        if row_1 == row_2 and column_1 == column_2:                    # Check if the matrices have the same dimensions
            a = 0
            b = 0
            for num_row in range(row_1):                       # Calculate the addition of each corresponding element
                a = num_row
                additions_list = []

                for num_col in range(column_2):
                    b = num_col
                    addition = matrix_1[a][b] + matrix_2[a][b]

                    additions_list.append(addition)

                additions_matrix.append(additions_list)

            return additions_matrix

        else:
            print("Can't do addition because the matrices have different dimensions.")
            return None


    def addition_3D_matrix(self, dataset_1, dataset_2):  # Function to add two 3_dimensional datasets

        res_matrix_3D = []

        if len(dataset_1) != len(dataset_2):
            print("The dimensions of the datasets do not agree.")

        else:

            for i in range(len(dataset_1)):
                matrix_1 = dataset_1[i]
                matrix_2 = dataset_2[i]

                result = self.matrix_addition(matrix_1, matrix_2)
                res_matrix_3D.append(result)

            return res_matrix_3D


data_1 = [
    [
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ],
    [
        [10, 11, 12],
        [13, 14, 15],
        [16, 17, 18]
    ],
    [
        [19, 20, 21],
        [22, 23, 24],
        [25, 26, 27]
    ],
    [
        [28, 29, 30],
        [31, 32, 33],
        [34, 35, 36]
    ]
]

data_2 = [
    [
        [101, 102, 103],
        [104, 105, 106],
        [107, 108, 109]
    ],
    [
        [110, 111, 112],
        [113, 114, 115],
        [116, 117, 118]
    ],
    [
        [119, 120, 121],
        [122, 123, 124],
        [125, 126, 127]
    ],
    [
        [128, 129, 130],
        [131, 132, 133],
        [134, 135, 136]
    ]
]

matrices = Matrices()
resultant_matrix = matrices.addition_3D_matrix(data_1, data_2)

# Loop through and print each 2D matrix in the resulting 3D dataset
for matrix in resultant_matrix:
    for row in matrix:
        print(row)
    print()  # Adds an empty line between 3x3 matrices


Dimensions of matrix_1: 3 by 3
Dimensions of matrix_2: 3 by 3
Dimensions of matrix_1: 3 by 3
Dimensions of matrix_2: 3 by 3
Dimensions of matrix_1: 3 by 3
Dimensions of matrix_2: 3 by 3
Dimensions of matrix_1: 3 by 3
Dimensions of matrix_2: 3 by 3
[102, 104, 106]
[108, 110, 112]
[114, 116, 118]

[120, 122, 124]
[126, 128, 130]
[132, 134, 136]

[138, 140, 142]
[144, 146, 148]
[150, 152, 154]

[156, 158, 160]
[162, 164, 166]
[168, 170, 172]
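
The same element-wise sum can also be sketched without the class by pairing up matrices, rows, and entries with `zip`. This is only an illustrative alternative, not the solution above: the function name `add_3d` and the small test data are hypothetical, and the sketch assumes both datasets are nested lists of numbers.

```python
def add_3d(dataset_1, dataset_2):
    """Return the element-wise sum of two 3D nested lists of equal shape."""
    if len(dataset_1) != len(dataset_2):
        raise ValueError("datasets must contain the same number of matrices")

    result = []
    for matrix_1, matrix_2 in zip(dataset_1, dataset_2):
        if len(matrix_1) != len(matrix_2):
            raise ValueError("matrices must have the same number of rows")

        summed = []
        for row_1, row_2 in zip(matrix_1, matrix_2):
            if len(row_1) != len(row_2):
                raise ValueError("rows must have the same length")
            # Add corresponding entries of the two rows
            summed.append([x + y for x, y in zip(row_1, row_2)])
        result.append(summed)

    return result


# Hypothetical 2 x 2 x 2 data, chosen only to keep the example short
small_1 = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
small_2 = [[[10, 20], [30, 40]], [[50, 60], [70, 80]]]
print(add_3d(small_1, small_2))
# [[[11, 22], [33, 44]], [[55, 66], [77, 88]]]
```

Raising `ValueError` on a shape mismatch, rather than printing a message, keeps the caller from silently receiving `None`.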



### P-5.33 
Write a Python program for a matrix class that can add and multiply two-dimensional
arrays of numbers, assuming the dimensions agree appropriately
for the operation.
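
One possible shape for such a class, given here as a minimal sketch separate from the solution that follows, is to wrap a list of lists and overload `+` and `*`. The class name `Matrix`, the `shape` property, and the `to_lists` helper are assumptions made for illustration; they do not come from the exercise.

```python
class Matrix:
    """Minimal 2D matrix wrapper around a list of lists (illustrative only)."""

    def __init__(self, rows):
        if not rows or not rows[0] or any(len(r) != len(rows[0]) for r in rows):
            raise ValueError("rows must be non-empty and of equal length")
        self._rows = [list(r) for r in rows]  # copy so the input is not aliased

    @property
    def shape(self):
        """Return (number of rows, number of columns)."""
        return len(self._rows), len(self._rows[0])

    def __add__(self, other):
        if self.shape != other.shape:
            raise ValueError("addition requires equal dimensions")
        return Matrix([[a + b for a, b in zip(r1, r2)]
                       for r1, r2 in zip(self._rows, other._rows)])

    def __mul__(self, other):
        n_rows, inner = self.shape
        other_rows, n_cols = other.shape
        if inner != other_rows:
            raise ValueError("columns of the left matrix must equal rows of the right")
        # Entry (i, j) is the dot product of row i of self and column j of other
        return Matrix([[sum(self._rows[i][k] * other._rows[k][j] for k in range(inner))
                        for j in range(n_cols)]
                       for i in range(n_rows)])

    def to_lists(self):
        """Return a copy of the entries as a list of lists."""
        return [list(r) for r in self._rows]


a = Matrix([[1, 2, 3], [4, 5, 6]])        # 2 x 3
b = Matrix([[7, 8], [9, 10], [11, 12]])   # 3 x 2
print((a * b).to_lists())  # [[58, 64], [139, 154]]
print((a + a).to_lists())  # [[2, 4, 6], [8, 10, 12]]
```

The product loops over the shared inner dimension only, so non-square operands such as the 2 x 3 and 3 x 2 matrices above work as long as the dimensions agree.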

In [2]:
class Matrices:

    def __init__(self):
        self.var = 0

    def row_cal(self, matrix):               # Function to calculate the number of rows in a matrix
        return len(matrix)

    def column_cal(self, matrix):           # Function to calculate the number of columns in a matrix
        return len(matrix[0])


    def product(self, number):              # Function to create the formula for calculating the dot product

        # Build one "matrix_1[i][n+k] * matrix_2[m+k][j]" term per index k
        terms = [f"matrix_1[i][n+{num}] * matrix_2[m+{num}][j]" for num in range(number)]

        return " + ".join(terms)                # Join the terms with " + " between them


    def matrix_dot_product(self, matrix_1, matrix_2):              # Function to calculate the dot product of two matrices
        row_1 = self.row_cal(matrix_1)
        column_1 = self.column_cal(matrix_1)
                                                                   # Calculate the dimensions of both matrices
        row_2 = self.row_cal(matrix_2)
        column_2 = self.column_cal(matrix_2)

        print(f"Dimensions of matrix_1: {row_1} by {column_1}")
        print(f"Dimensions of matrix_2: {row_2} by {column_2}")
        final_matrix = []

        # The matrices can be multiplied only if the number of columns of the
        # first matrix equals the number of rows of the second matrix.
        agreement = column_1 == row_2

        if agreement:                                             # If the matrices can be multiplied
            i = 0
            j = 0
            m = 0
            n = 0


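            # Each dot product has one term per inner index, so the formula length
            # must equal the shared dimension (columns of matrix_1 == rows of matrix_2).
            # Using the larger of row_1 and column_1 would index out of range
            # whenever matrix_1 has more rows than columns.
            matrix_dim = column_1

            formula = self.product(matrix_dim)                          # Get the dot product formula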
            for row_num in range(row_1):
                i = row_num
                products = []

                for col_num in range(column_2):                          # Calculate the dot product
                    j = col_num
                    # Evaluate the generated formula with the current i and j;
                    # eval returns the value directly, so no helper attribute is needed
                    products.append(eval(formula))

                final_matrix.append(products)

            return final_matrix

        else:
            print("Can't take dot product of these matrices because their dimensions do not agree.")
            return None


    def matrix_addition(self, matrix_1, matrix_2):                  # Function to add two matrices
        row_1 = self.row_cal(matrix_1)
        column_1 = self.column_cal(matrix_1)
                                                                     # Calculate the dimensions of both matrices
        row_2 = self.row_cal(matrix_2)
        column_2 = self.column_cal(matrix_2)

        print(f"Dimensions of matrix_1: {row_1} by {column_1}")
        print(f"Dimensions of matrix_2: {row_2} by {column_2}")

        additions_matrix = []

        if row_1 == row_2 and column_1 == column_2:                    # Check if the matrices have the same dimensions
            a = 0
            b = 0
            for num_row in range(row_1):                       # Calculate the addition of each corresponding element
                a = num_row
                additions_list = []

                for num_col in range(column_2):
                    b = num_col
                    addition = matrix_1[a][b] + matrix_2[a][b]

                    additions_list.append(addition)

                additions_matrix.append(additions_list)

            return additions_matrix

        else:
            print("Can't do addition because the matrices have different dimensions.")
            return None


# Manually created 10x10 array 1
array1 = [
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
    [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
    [31, 32, 33, 34, 35, 36, 37, 38, 39, 40],
    [41, 42, 43, 44, 45, 46, 47, 48, 49, 50],
    [51, 52, 53, 54, 55, 56, 57, 58, 59, 60],
    [61, 62, 63, 64, 65, 66, 67, 68, 69, 70],
    [71, 72, 73, 74, 75, 76, 77, 78, 79, 80],
    [81, 82, 83, 84, 85, 86, 87, 88, 89, 90],
    [91, 92, 93, 94, 95, 96, 97, 98, 99, 100]
]

# Manually created 10x10 array 2
array2 = [
    [100, 99, 98, 97, 96, 95, 94, 93, 92, 91],
    [90, 89, 88, 87, 86, 85, 84, 83, 82, 81],
    [80, 79, 78, 77, 76, 75, 74, 73, 72, 71],
    [70, 69, 68, 67, 66, 65, 64, 63, 62, 61],
    [60, 59, 58, 57, 56, 55, 54, 53, 52, 51],
    [50, 49, 48, 47, 46, 45, 44, 43, 42, 41],
    [40, 39, 38, 37, 36, 35, 34, 33, 32, 31],
    [30, 29, 28, 27, 26, 25, 24, 23, 22, 21],
    [20, 19, 18, 17, 16, 15, 14, 13, 12, 11],
    [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
]


my_matrix = Matrices()      # Creates an instance of class Matrices



# Testing the matrix_dot_product method

matrix_multiplication = my_matrix.matrix_dot_product(array1, array2)
for item in matrix_multiplication:
    print(item)



# Testing the matrix_addition method

addition_of_matrix = my_matrix.matrix_addition(array1, array2)
for item in addition_of_matrix:
    print(item)

Dimensions of matrix_1: 10 by 10
Dimensions of matrix_2: 10 by 10
[2200, 2145, 2090, 2035, 1980, 1925, 1870, 1815, 1760, 1705]
[7700, 7545, 7390, 7235, 7080, 6925, 6770, 6615, 6460, 6305]
[13200, 12945, 12690, 12435, 12180, 11925, 11670, 11415, 11160, 10905]
[18700, 18345, 17990, 17635, 17280, 16925, 16570, 16215, 15860, 15505]
[24200, 23745, 23290, 22835, 22380, 21925, 21470, 21015, 20560, 20105]
[29700, 29145, 28590, 28035, 27480, 26925, 26370, 25815, 25260, 24705]
[35200, 34545, 33890, 33235, 32580, 31925, 31270, 30615, 29960, 29305]
[40700, 39945, 39190, 38435, 37680, 36925, 36170, 35415, 34660, 33905]
[46200, 45345, 44490, 43635, 42780, 41925, 41070, 40215, 39360, 38505]
[51700, 50745, 49790, 48835, 47880, 46925, 45970, 45015, 44060, 43105]
Dimensions of matrix_1: 10 by 10
Dimensions of matrix_2: 10 by 10
[101, 101, 101, 101, 101, 101, 101, 101, 101, 101]
[101, 101, 101, 101, 101, 101, 101, 101, 101, 101]
[101, 101, 101, 101, 101, 101, 101, 101, 101, 101]
[101, 101, 101, 101, 101,