### Chapter 5: Array-Based Sequences

In [1]:
import sys
import pandas as pd
import random
from time import time
import ctypes
from typing import List, TypeVar
Num = TypeVar('Num', int, float)

#### Reinforcement

#### R-5.1
Execute the experiment from Code Fragment 5.1 and compare the results
on your system to those we report in Code Fragment 5.2.

In [2]:
def test_array_1(n=27):
    data = []
    for _ in range(n):
        a = len(data)
        b = sys.getsizeof(data)
        print('Length: {0:3d}; Size in bytes: {1:4d}'.format(a, b))
        data.append(None)

In [3]:
test_array_1()

Length:   0; Size in bytes:   64
Length:   1; Size in bytes:   96
Length:   2; Size in bytes:   96
Length:   3; Size in bytes:   96
Length:   4; Size in bytes:   96
Length:   5; Size in bytes:  128
Length:   6; Size in bytes:  128
Length:   7; Size in bytes:  128
Length:   8; Size in bytes:  128
Length:   9; Size in bytes:  192
Length:  10; Size in bytes:  192
Length:  11; Size in bytes:  192
Length:  12; Size in bytes:  192
Length:  13; Size in bytes:  192
Length:  14; Size in bytes:  192
Length:  15; Size in bytes:  192
Length:  16; Size in bytes:  192
Length:  17; Size in bytes:  264
Length:  18; Size in bytes:  264
Length:  19; Size in bytes:  264
Length:  20; Size in bytes:  264
Length:  21; Size in bytes:  264
Length:  22; Size in bytes:  264
Length:  23; Size in bytes:  264
Length:  24; Size in bytes:  264
Length:  25; Size in bytes:  264
Length:  26; Size in bytes:  344


#### R-5.2
In Code Fragment 5.1, we perform an experiment to compare the length of
a Python list to its underlying memory usage. Determining the sequence
of array sizes requires a manual inspection of the output of that program.
Redesign the experiment so that the program outputs only those values of
k at which the existing capacity is exhausted. For example, on a system
consistent with the results of Code Fragment 5.2, your program should
output that the sequence of array capacities are 0, 4, 8, 16, 25, . . . .

In [4]:
def test_array_2(n=27):
    data = []
    max_size = 0
    for _ in range(n):
        a = len(data)
        b = sys.getsizeof(data)
        # First iteration: record the initial size
        if max_size == 0:
            max_size = b
        if b > max_size:
            print('Length: {0:3d}; Size in bytes: {1:4d}'.format(a-1, max_size))
            max_size = b
        data.append(None)

In [5]:
test_array_2()

Length:   0; Size in bytes:   64
Length:   4; Size in bytes:   96
Length:   8; Size in bytes:  128
Length:  16; Size in bytes:  192
Length:  25; Size in bytes:  264
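
The byte sizes above can also be turned into slot counts. The following is a minimal sketch, assuming CPython, where the reported size is the empty-list overhead plus one pointer per allocated slot. The helper name `capacity_changes` is hypothetical, and the numbers it returns will differ between Python versions and platforms.

```python
import ctypes
import sys

def capacity_changes(n=27):
    """Return (length, estimated_capacity) pairs, one per observed size change."""
    slot = ctypes.sizeof(ctypes.c_void_p)   # bytes per reference (assumption: CPython)
    base = sys.getsizeof([])                # overhead of an empty list
    data = []
    changes = []
    last_size = None
    for _ in range(n):
        size = sys.getsizeof(data)
        if size != last_size:
            # Record the length at which a new capacity is first observed.
            changes.append((len(data), (size - base) // slot))
            last_size = size
        data.append(None)
    return changes

changes = capacity_changes()
```

Each pair holds the list length at which a new size was first seen and the estimated number of slots available at that point.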


#### R-5.3 
Modify the experiment from Code Fragment 5.1 in order to demonstrate
that Python’s list class occasionally shrinks the size of its underlying array
when elements are popped from a list.

In [6]:
def test_array_3(n=27):
    data = [None] * n
    for _ in range(n):
        a = len(data)
        b = sys.getsizeof(data)
        print('Length: {0:3d}; Size in bytes: {1:4d}'.format(a, b))
        data.pop()

In [7]:
test_array_3()

Length:  27; Size in bytes:  280
Length:  26; Size in bytes:  280
Length:  25; Size in bytes:  280
Length:  24; Size in bytes:  280
Length:  23; Size in bytes:  280
Length:  22; Size in bytes:  280
Length:  21; Size in bytes:  280
Length:  20; Size in bytes:  280
Length:  19; Size in bytes:  280
Length:  18; Size in bytes:  280
Length:  17; Size in bytes:  280
Length:  16; Size in bytes:  280
Length:  15; Size in bytes:  280
Length:  14; Size in bytes:  280
Length:  13; Size in bytes:  280
Length:  12; Size in bytes:  216
Length:  11; Size in bytes:  216
Length:  10; Size in bytes:  216
Length:   9; Size in bytes:  216
Length:   8; Size in bytes:  160
Length:   7; Size in bytes:  160
Length:   6; Size in bytes:  160
Length:   5; Size in bytes:  128
Length:   4; Size in bytes:  128
Length:   3; Size in bytes:  112
Length:   2; Size in bytes:  104
Length:   1; Size in bytes:   96
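
To make the shrinking easier to spot, the experiment can record only the lengths at which the reported size drops. This is a sketch under the assumption that it runs on CPython; `shrink_points` is a hypothetical helper name, and the exact sizes depend on the interpreter version.

```python
import sys

def shrink_points(n=27):
    """Return (length, size_in_bytes) pairs recorded whenever the size drops."""
    data = [None] * n
    points = []
    last_size = sys.getsizeof(data)
    while data:
        data.pop()
        size = sys.getsizeof(data)
        if size < last_size:
            # The underlying array was reallocated to a smaller block.
            points.append((len(data), size))
        last_size = size
    return points

points = shrink_points(1000)
```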


#### R-5.4
Our DynamicArray class, as given in Code Fragment 5.3, does not support
use of negative indices with `__getitem__`. Update that method to better
match the semantics of a Python list.

In [8]:
class DynamicArray:
    """A dynamic array class akin to a simplified Python list"""

    def __init__(self):
        """Create an empty array"""
        self._n = 0                                       # count actual elements
        self._capacity = 1                                # default array capacity
        self._A = self._make_array(self._capacity)        # low-level array
    
    def __len__(self):
        """Return number of elements stored in the array"""
        return self._n

    def __getitem__(self, k):
        """Return element at index k"""
        # Support negative indices, as Python lists do
        if k < 0:
            k += self._n
        # Bounds check: valid positions are 0 .. n-1
        if not 0 <= k < self._n:
            raise IndexError('invalid index')
        return self._A[k]

    # Added for easier inspection of the contents
    def __repr__(self):
        if self._n == 0:
            return 'Array[]'
        return 'Array[' + ', '.join(str(self._A[i]) for i in range(self._n)) + ']'

    def append(self, obj):
        """Add object to end of array"""
        if self._n == self._capacity:
            self._resize(2 * self._capacity)
        self._A[self._n] = obj
        self._n += 1

    def _resize(self, c):
        """Resize internal array to capacity c."""
        B = self._make_array(c)
        for k in range(self._n):
            B[k] = self._A[k]
        self._A = B
        self._capacity = c

    def _make_array(self, c):
        """Return new array with capacity c"""
        return (c * ctypes.py_object)()

In [9]:
arr = DynamicArray()
arr.append(0)
arr.append(1)
arr

Array[0, 1]

In [10]:
arr[0], arr[-1]

(0, 1)
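
A Python list of length n accepts indices from -n to n-1 and raises `IndexError` for anything else. The following is a minimal sketch of that rule as a standalone helper; the name `normalize_index` is hypothetical and not part of the book's class.

```python
def normalize_index(k, n):
    """Map a possibly negative index k into 0..n-1, or raise IndexError."""
    if k < 0:
        k += n                      # -1 refers to the last element, and so on
    if not 0 <= k < n:
        raise IndexError('invalid index')
    return k

# Compare against the behaviour of a built-in list of the same length.
reference = ['a', 'b']
checked = [reference[normalize_index(k, len(reference))] for k in (-2, -1, 0, 1)]
```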

#### R-5.6
Our implementation of insert for the DynamicArray class, as given in
Code Fragment 5.5, has the following inefficiency. In the case when a resize
occurs, the resize operation takes time to copy all the elements from
an old array to a new array, and then the subsequent loop in the body of
insert shifts many of those elements. Give an improved implementation
of the insert method, so that, in the case of a resize, the elements are
shifted into their final position during that operation, thereby avoiding
the subsequent shifting.

In [11]:
class DynamicArrayInsert(DynamicArray):
    """A dynamic array class akin to a simplified Python list"""

    def __init__(self):
        """Create an empty array"""
        super().__init__()

    def insert(self, k, value):
        """Insert value at index k, shifting subsequent value rightward"""
        if self._n == self._capacity:
            B = self._make_array(self._capacity * 2)
            for i in range(k):
                B[i] = self._A[i]          # elements before k keep their index
            B[k] = value
            for j in range(k+1, self._n+1):
                B[j] = self._A[j-1]
            self._A = B
            self._n += 1
            self._capacity *= 2
        else:
            for i in range(self._n, k, -1):
                self._A[i] = self._A[i-1]
            self._A[k] = value
            self._n += 1

In [12]:
arr = DynamicArrayInsert()
arr.insert(0, 1)
arr.insert(0, 0)
arr

Array[0, 1]
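
The single-pass idea can be checked in isolation. The sketch below uses a plain Python list as a stand-in for the fixed-capacity low-level array; `insert_with_growth` is a hypothetical helper, not the book's method.

```python
def insert_with_growth(old, n, k, value):
    """Return a new array of twice the capacity with value inserted at index k.

    old is a fixed-capacity array (here a plain list) whose first n slots are used.
    """
    new = [None] * (2 * len(old))
    for i in range(k):              # elements before k keep their index
        new[i] = old[i]
    new[k] = value
    for j in range(k, n):           # elements from k onward land one slot to the right
        new[j + 1] = old[j]
    return new

grown = insert_with_growth([1, 2, 3, 4], 4, 1, 9)
```

Every element is copied exactly once, directly into its final slot, so no second shifting loop is needed after the resize.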

#### R-5.7
Let A be an array of size $n ≥ 2$ containing integers from 1 to n − 1, inclusive,
with exactly one repeated. Describe a fast algorithm for finding the
integer in A that is repeated.

In [13]:
def find_dup(nums):
    n = len(nums)
    return sum(nums) - n*(n-1) // 2

In [14]:
find_dup([1, 2, 3, 2])

2
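
The arithmetic solution works because 1 + 2 + ... + (n − 1) = n(n − 1)/2, so whatever the sum exceeds that by is the repeated value. As a cross-check, here is a sketch of an alternative that marks values as seen; `find_dup_seen` is a hypothetical name, and it trades O(n) extra space for not relying on the sum formula.

```python
def find_dup_seen(nums):
    """Return the repeated value using a boolean table instead of arithmetic."""
    seen = [False] * len(nums)      # values lie in 1..n-1, so n slots suffice
    for x in nums:
        if seen[x]:
            return x
        seen[x] = True
    raise ValueError('no repeated value found')
```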

#### R-5.8 
Experimentally evaluate the efficiency of the pop method of Python’s list
class when using varying indices as a parameter, as we did for insert on
page 205. Report your results akin to Table 5.5.

In [15]:
def benchmark(test_func):
    insert_df = pd.DataFrame(index=['start', 'middle', 'end'],
                        columns=['100', '1000', '10000', '100000'])
    insert_df.index.name = 'Time(microseconds)'
    for n in list(insert_df.columns):
        insert_df[n] = [test_func(int(n), mode) for mode in insert_df.index]
    return insert_df

In [16]:
# insert benchmark
def insert_average(n, mode='start'):
    data = []
    start = time()
    if mode == 'start':
        for _ in range(n):
            data.insert(0, None)
    elif mode == 'middle':
        for _ in range(n):
            data.insert(n//2, None)
    elif mode == 'end':
        for _ in range(n):        
            data.insert(n, None)
    end = time()
    return (end - start) * 1000000 / n

benchmark(insert_average)

| Time (microseconds) | 100 | 1000 | 10000 | 100000 |
|---|---|---|---|---|
| start | 0.221729 | 0.432253 | 3.013754 | 33.644259 |
| middle | 0.188351 | 0.269175 | 0.768781 | 7.560563 |
| end | 0.143051 | 0.116348 | 0.123477 | 0.13386 |


In [17]:
# pop benchmark
def pop_average(n, mode='start'):
    data = [None] * n
    start = time()
    if mode == 'start':
        for _ in range(n):
            data.pop(0)
    elif mode == 'middle':
        count = n
        while count > 0:
            data.pop(count // 2)
            count -= 1
    elif mode == 'end':
        for _ in range(n):
            data.pop(-1)
    end = time()
    return (end - start) * 1000000 / n

benchmark(pop_average)

| Time (microseconds) | 100 | 1000 | 10000 | 100000 |
|---|---|---|---|---|
| start | 0.166893 | 0.24271 | 1.876187 | 21.675179 |
| middle | 0.214577 | 0.255585 | 0.927758 | 9.610832 |
| end | 0.119209 | 0.114441 | 0.182676 | 0.115101 |
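
For short intervals, `time.perf_counter` is generally a better clock than `time.time`. The following is a sketch of the same pop measurement built on it; `pop_average_perf` and the `positions` table are hypothetical names, and the resulting timings depend entirely on the machine.

```python
from time import perf_counter

def pop_average_perf(n, index_of):
    """Average time per pop in microseconds; index_of maps current length to an index."""
    data = [None] * n
    start = perf_counter()
    while data:
        data.pop(index_of(len(data)))
    return (perf_counter() - start) * 1_000_000 / n

positions = {
    'start': lambda size: 0,
    'middle': lambda size: size // 2,
    'end': lambda size: -1,
}
results = {name: pop_average_perf(1000, f) for name, f in positions.items()}
```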


#### R-5.10
The constructor for the CaesarCipher class in Code Fragment 5.11 can
be implemented with a two-line body by building the forward and backward
strings using a combination of the join method and an appropriate
comprehension syntax. Give such an implementation.

In [18]:
class CaesarCipher:
    def __init__(self, shift):
        self._forward  = ''.join(chr((k + shift) % 26 + ord('A')) for k in range(26))
        self._backward = ''.join(chr((k - shift) % 26 + ord('A')) for k in range(26))
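
To see the two lookup strings in action, here is a self-contained sketch that pairs the same constructor with encrypt and decrypt helpers. The class name `CaesarCipherDemo` and the `_transform` helper are assumptions made for illustration; they are not copied from Code Fragment 5.11.

```python
class CaesarCipherDemo:
    """Two-line constructor plus encrypt/decrypt helpers for uppercase text."""

    def __init__(self, shift):
        self._forward = ''.join(chr((k + shift) % 26 + ord('A')) for k in range(26))
        self._backward = ''.join(chr((k - shift) % 26 + ord('A')) for k in range(26))

    def encrypt(self, message):
        return self._transform(message, self._forward)

    def decrypt(self, secret):
        return self._transform(secret, self._backward)

    @staticmethod
    def _transform(text, code):
        # Replace uppercase letters via the lookup string; leave everything else as-is.
        return ''.join(code[ord(c) - ord('A')] if 'A' <= c <= 'Z' else c for c in text)

cipher = CaesarCipherDemo(3)
secret = cipher.encrypt('HELLO, WORLD')    # 'KHOOR, ZRUOG'
plain = cipher.decrypt(secret)             # 'HELLO, WORLD'
```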

#### R-5.11
Use standard control structures to compute the sum of all numbers in an
n × n data set, represented as a list of lists.

In [19]:
def sum_matrix(matrix: List[List[Num]]) -> Num:
    result = 0
    for row in matrix:
        for num in row:
            result += num
    return result

In [20]:
sum_matrix([[1, 2], [3, 4]])

10

#### R-5.12
Describe how the built-in sum function can be combined with Python’s
comprehension syntax to compute the sum of all numbers in an n × n data
set, represented as a list of lists.

In [21]:
def sum_matrix_plus(matrix: List[List[Num]]) -> Num:
    return sum(num for row in matrix for num in row)

In [22]:
sum_matrix_plus([[1, 2], [3, 4]])

10

#### Creativity

##### C-5.14
The shuffle method, supported by the random module, takes a Python
list and rearranges it so that every possible ordering is equally likely.
Implement your own version of such a function. You may rely on the
`randrange(n)` function of the random module, which returns a random
number between 0 and `n − 1` inclusive.

In [23]:
def shuffle(nums: List[Num]) -> List[Num]:
    """Return a shuffled copy of nums by sorting on random keys"""
    return sorted(nums, key=lambda x: random.random())

In [24]:
l = list(range(8))
print(shuffle(l))
print(shuffle(l))
print(shuffle(l))

[1, 2, 7, 4, 0, 6, 5, 3]
[0, 1, 3, 2, 5, 7, 4, 6]
[1, 3, 4, 0, 6, 7, 2, 5]
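
The exercise suggests building on `randrange`, whereas the solution above sorts on random keys. A minimal sketch of the Fisher-Yates approach, which shuffles in place in O(n) time using only `randrange`, could look like this (`fisher_yates_shuffle` is a hypothetical name):

```python
from random import randrange

def fisher_yates_shuffle(items):
    """Shuffle items in place so that each ordering is equally likely."""
    for i in range(len(items) - 1, 0, -1):
        j = randrange(i + 1)            # uniform index in 0..i
        items[i], items[j] = items[j], items[i]

data = list(range(8))
fisher_yates_shuffle(data)
```

Position i receives a uniformly chosen element from the not-yet-fixed prefix, which is what makes all n! orderings equally likely.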


##### C-5.16
Implement a pop method for the `DynamicArray` class, given in Code
Fragment 5.3, that removes the last element of the array, and that shrinks the
capacity, N, of the array by half any time the number of elements in the
array goes below N/4.

In [25]:
class DynamicArrayInsertPop(DynamicArrayInsert):
    """
    Implement a pop method for the DynamicArray class.
    """
    def __init__(self):
        """Create an empty array"""
        super().__init__()

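    def pop(self):
        """Remove and return the last element, halving capacity when n < N/4"""
        if self._n == 0:
            raise IndexError('pop from empty array')
        self._n -= 1
        element = self._A[self._n]
        self._A[self._n] = None                           # drop the stale reference
        if self._n < self._capacity // 4:
            self._resize(self._capacity // 2)
        return element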
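
The shrinking rule can be exercised on its own with a compact, self-contained stand-in. This is a sketch only: `ShrinkingArray` is a hypothetical class that keeps just the append, resize, and pop logic needed to watch the capacity change.

```python
import ctypes

class ShrinkingArray:
    """Compact stand-in for DynamicArray with append and a shrinking pop."""

    def __init__(self):
        self._n = 0
        self._capacity = 1
        self._A = (self._capacity * ctypes.py_object)()

    def _resize(self, c):
        B = (c * ctypes.py_object)()
        for k in range(self._n):
            B[k] = self._A[k]
        self._A, self._capacity = B, c

    def append(self, obj):
        if self._n == self._capacity:
            self._resize(2 * self._capacity)
        self._A[self._n] = obj
        self._n += 1

    def pop(self):
        if self._n == 0:
            raise IndexError('pop from empty array')
        self._n -= 1
        element = self._A[self._n]
        self._A[self._n] = None             # drop the stale reference
        if self._n < self._capacity // 4:
            self._resize(max(1, self._capacity // 2))
        return element

arr = ShrinkingArray()
for value in range(16):
    arr.append(value)
capacity_when_full = arr._capacity          # 16 after sixteen appends
popped = [arr.pop() for _ in range(16)]
```

Waiting until the array is below one quarter full before halving leaves slack after a shrink, so alternating appends and pops near a boundary do not trigger a resize on every call.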

##### C-5.21
In Section 5.4.2, we described four different ways to compose a long
string: (1) repeated concatenation, (2) appending to a temporary list and
then joining, (3) using list comprehension with join, and (4) using generator
comprehension with join. Develop an experiment to test the efficiency
of all four of these approaches and report your findings.

In [26]:
document = 'Hello, World!' * 1000

In [27]:
def w1_concatenation():
    letters = ''
    for c in document:
        if c.isalpha():
            letters += c
    return letters


def w2_appending():
    temp = []
    for c in document:
        if c.isalpha():
            temp.append(c)
    letters = ''.join(temp)
    return letters


def w3_list_comp():
    letters = ''.join([c for c in document if c.isalpha()])
    return letters


def w4_generator():
    letters = ''.join(c for c in document if c.isalpha())
    return letters

In [28]:
%timeit w1_concatenation()

1.72 ms ± 51.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [31]:
%timeit w2_appending()

1.51 ms ± 53.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [32]:
%timeit w3_list_comp()

1.08 ms ± 44.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [33]:
%timeit w4_generator()

1.39 ms ± 15 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


As the timings show, the list comprehension is the fastest, followed by the generator expression; the other two approaches are slower.
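
Outside a notebook, the same comparison can be run with the standard `timeit` module. The following is a sketch with hypothetical function names that take the text as a parameter; it also checks that all four approaches produce the same string before timing them. Absolute timings depend on the machine and Python version.

```python
from timeit import timeit

document = 'Hello, World!' * 1000

def concatenation(text):
    letters = ''
    for c in text:
        if c.isalpha():
            letters += c
    return letters

def appending(text):
    temp = []
    for c in text:
        if c.isalpha():
            temp.append(c)
    return ''.join(temp)

def list_comp(text):
    return ''.join([c for c in text if c.isalpha()])

def generator(text):
    return ''.join(c for c in text if c.isalpha())

approaches = [concatenation, appending, list_comp, generator]
expected = 'HelloWorld' * 1000
assert all(f(document) == expected for f in approaches)   # same output from all four

# Seconds for 100 calls of each approach; absolute values depend on the machine.
timings = {f.__name__: timeit(lambda: f(document), number=100) for f in approaches}
```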

##### C-5.31
Describe a way to use recursion to add all the numbers in an n × n data
set, represented as a list of lists.

In [34]:
def binary_sum(S: List[List[float]], start: int, stop: int) -> float:
    if start >= stop:
        return 0
    if start == stop - 1:
        return sum(S[start])
    else:
        mid = (start + stop) // 2
        return binary_sum(S, start, mid) + binary_sum(S, mid, stop)

In [36]:
binary_sum([[1, 2], [3, 4]], 0, 2)

10
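
The solution above still relies on the built-in `sum` for each row. A sketch that uses recursion for both levels, with no call to `sum`, might look like the following. `recursive_sum` is a hypothetical name, and the slicing makes it a readability-first illustration rather than an efficient implementation.

```python
def recursive_sum(data):
    """Recursively add every number in a list of lists (or any nested list)."""
    if not isinstance(data, list):      # base case: a single number
        return data
    if not data:                        # base case: empty list
        return 0
    # Sum the first item (a row or a number), then recurse on the rest.
    return recursive_sum(data[0]) + recursive_sum(data[1:])

total = recursive_sum([[1, 2], [3, 4]])   # 10
```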