## 5.1 Python's Sequence Types 

- In this chapter, we explore Python's various "sequence" classes, namely the built-in list, tuple, and str classes. There is significant commonality between these classes, most notably: each supports indexing to access an individual element of a sequence, using syntax such as seq[k], and each uses a low-level concept known as an array to represent the sequence. However, there are significant differences in the abstractions that these classes represent, and in the way that instances of these classes are represented internally by Python. Because these classes are used so widely in Python programs, and because they will become building blocks upon which we will develop more complex data structures, it is imperative that we establish a clear understanding of both the public behavior and inner workings of these classes.

**Public Behaviors**

- A proper understanding of the outward semantics for a class is a necessity for a good programmer. While the basic usage of lists, strings, and tuples may seem straightforward, there are several important subtleties regarding the behaviors associated with these classes (such as what it means to make a copy of a sequence, or to take a slice of a sequence). Misunderstanding a behavior can easily lead to inadvertent bugs in a program. Therefore, we establish an accurate mental model for each of these classes. These models will help when exploring more advanced usage, such as representing a multidimensional data set as a list of lists.

**Implementation Details**

- A focus on the internal implementations of these classes seems to go against our stated principles of object-oriented programming. In Section 2.1.2, we emphasized the principle of encapsulation, noting that the user of a class need not know about the internal details of the implementation. While it is true that one only needs to understand the syntax and semantics of a class's public interface in order to write legal and correct code that uses instances of the class, the efficiency of a program depends greatly on the efficiency of the components upon which it relies.

**Asymptotic and Experimental Analyses**

- In describing the efficiency of various operations for Python's sequence classes, we will rely on the formal asymptotic analysis notations established in Chapter 3. We will also perform experimental analyses of the primary operations to provide empirical evidence that is consistent with the more theoretical asymptotic analyses.

## 5.2 Low-Level Arrays

- To accurately describe the way in which Python represents the sequence types, we must first discuss aspects of the low-level computer architecture. The primary memory of a computer is composed of bits of information, and those bits are typically grouped into larger units that depend upon the precise system architecture. A typical such unit is a byte, which is equivalent to 8 bits.

- A computer system will have a huge number of bytes of memory, and to keep track of what information is stored in what byte, the computer uses an abstraction known as a memory address. In effect, each byte of memory is associated with a unique number that serves as its address (more formally, the binary representation of the number serves as the address). In this way, the computer system can refer to the data in "byte #2150" versus the data in "byte #2157", for example. Memory addresses are typically coordinated with the physical layout of the memory system, and so we often portray the numbers in sequential fashion. Figure 5.1 provides such a diagram, with the designated memory address for each byte.

- Despite the sequential nature of the numbering system, computer hardware is designed, in theory, so that any byte of the main memory can be efficiently accessed based upon its memory address. In this sense, we say that a computer's main memory performs as random access memory (RAM). That is, it is just as easy to retrieve byte #8675309 as it is to retrieve byte #309. (In practice, there are complicating factors including the use of caches and external memory; we address some of those issues in Chapter 15.) Using the notation for asymptotic analysis, we say that any individual byte of memory can be stored or retrieved in O(1) time.

- In general, a programming language keeps track of the association between an identifier and the memory address in which the associated value is stored. For example, identifier x might be associated with one value stored in memory, while y is associated with another value stored in memory. A common programming task is to keep track of a sequence of related objects. For example, we may want a video game to keep track of the top ten scores for that game. Rather than use ten different variables for this task, we would prefer to use a single name for the group and use index numbers to refer to the high scores in that group.

- A group of related variables can be stored one after another in a contiguous portion of the computer's memory. We will denote such a representation as an array. As a tangible example, a text string is stored as an ordered sequence of individual characters. In Python, each character is represented using the Unicode character set, and on most computing systems, Python internally represents each Unicode character with 16 bits (i.e., 2 bytes). Therefore, a six-character string, such as "SAMPLE", would be stored in 12 consecutive bytes of memory, as diagrammed in Figure 5.2.

- We describe this as an array of six characters, even though it requires 12 bytes of memory. We will refer to each location within an array as a cell, and will use an integer index to describe its location within the array, with cells numbered starting with 0, 1, 2, and so on. For example, in Figure 5.2, the cell of the array with index 4 has contents L and is stored in bytes 2154 and 2155 of memory.

- Each cell of an array must use the same number of bytes. This requirement is what allows an arbitrary cell of the array to be accessed in constant time based on its index. In particular, if one knows the memory address at which an array starts (e.g., 2146 in Figure 5.2), the number of bytes per element (e.g., 2 for a Unicode character), and a desired index within the array, the appropriate memory address can be computed using the calculation start + cellsize \* index. By this formula, the cell at index 0 begins precisely at the start of the array, the cell at index 1 begins precisely cellsize bytes beyond the start of the array, and so on. As an example, cell 4 of Figure 5.2 begins at memory location 2146 + 2 \* 4 = 2146 + 8 = 2154.
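The address arithmetic described above can be written out as a small sketch. The function name cell_address is hypothetical; the starting address 2146 and the 2-byte cell size are the values quoted for the "SAMPLE" example.

```python
def cell_address(start, cellsize, index):
    """Return the memory address at which the cell with the given index begins."""
    return start + cellsize * index

# Values from the "SAMPLE" example: the array starts at 2146, 2 bytes per character.
for index, ch in enumerate("SAMPLE"):
    print(index, ch, cell_address(2146, 2, index))
```

Because the calculation is a single multiplication and addition, its cost does not depend on the index, which is the basis for constant-time access.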

- Of course, the arithmetic for calculating memory addresses within an array can be handled automatically. Therefore, a programmer can envision a more typical high-level abstraction of an array of characters as diagrammed in Figure 5.3.

### 5.2.1 Referential Arrays

- As another motivating example, assume that we want a medical information system to keep track of the patients currently assigned to beds in a certain hospital. If we assume that the hospital has 200 beds, and conveniently that those beds are numbered from 0 to 199, we might consider using an array-based structure to maintain the names of the patients currently assigned to those beds. For example, in Python we might use a list of names, such as:

['Rene', 'Joseph', 'Janet', 'Jonas', 'Helen', 'Virginia', ...]

- To represent such a list with an array, Python must adhere to the requirement that each cell of the array use the same number of bytes. Yet the elements are strings, and strings naturally have different lengths. Python could attempt to reserve enough space for each cell to hold the maximum length string (not just of currently stored strings, but of any string we might ever want to store), but that would be wasteful.

- Instead, Python represents a list or tuple instance using an internal storage mechanism of an array of object references. At the lowest level, what is stored is a consecutive sequence of memory addresses at which the elements of the sequence reside. A high-level diagram of such a list is shown in Figure 5.4.

- Although the relative size of the individual elements may vary, the number of bits used to store the memory address of each element is fixed (e.g., 64 bits per address). In this way, Python can support constant-time access to a list or tuple element based on its index.

- In Figure 5.4, we characterize a list of strings that are the names of the patients in a hospital. It is more likely that a medical information system would manage more comprehensive information on each patient, perhaps represented as an instance of a Patient class. From the perspective of the list implementation, the same principle applies: the list will simply keep a sequence of references to those objects. Note as well that a reference to the None object can be used as an element of the list to represent an empty bed in the hospital.

- The fact that lists and tuples are referential structures is significant to the semantics of these classes. A single list instance may include multiple references to the same object as elements of the list, and it is possible for a single object to be an element of two or more lists, as those lists simply store references back to that object. As an example, when you compute a slice of a list, the result is a new list instance, but that new list has references to the same elements that are in the original list, as portrayed in Figure 5.5.

- When the elements of the list are immutable objects, as with the integer instances in Figure 5.5, the fact that the two lists share elements is not that significant, as neither of the lists can cause a change to the shared object. If, for example, the command temp[2] = 15 were executed from this configuration, that does not change the existing integer object; it changes the reference in cell 2 of the temp list to reference a different object. The resulting configuration is shown in Figure 5.6.
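The slicing behavior just described can be checked directly with the is operator. This is a minimal sketch: the names primes and temp follow the text, while the particular list contents and the slice bounds 3:6 are assumptions chosen for illustration.

```python
primes = [2, 3, 5, 7, 11, 13, 17, 19]
temp = primes[3:6]            # a new list holding [7, 11, 13]

# The slice is a distinct list object...
assert temp is not primes
# ...whose cells refer to the very same element objects as the original.
assert all(temp[k] is primes[3 + k] for k in range(len(temp)))

temp[2] = 15                  # rebinds cell 2 of temp; the integer 13 is untouched
print(temp)                   # [7, 11, 15]
print(primes)                 # [2, 3, 5, 7, 11, 13, 17, 19]
```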

- The same semantics is demonstrated when making a new list as a copy of an existing one, with a syntax such as backup = list(primes). This produces a new list that is a shallow copy, in that it references the same elements as in the first list. With immutable elements, this point is moot. If the contents of the list were of a mutable type, a deep copy, meaning a new list with new elements, can be produced by using the deepcopy function from the copy module.
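The difference between a shallow and a deep copy only becomes visible with mutable elements. The following sketch uses a hypothetical list of lists named rows to show it.

```python
from copy import deepcopy

# Hypothetical data: a list whose elements are themselves mutable lists.
rows = [[1, 2], [3, 4]]

shallow = list(rows)       # new outer list, same inner lists
deep = deepcopy(rows)      # new outer list and new inner lists

rows[0].append(99)         # mutate an element shared with the shallow copy

print(shallow[0])          # [1, 2, 99]: the change is visible through the shallow copy
print(deep[0])             # [1, 2]: the deep copy is unaffected
```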

- As a more striking example, it is a common practice in Python to initialize an array of integers using a syntax such as counters = [0] \* 8. This syntax produces a list of length eight, with all eight elements being the value zero. Technically, all eight cells of the list reference the same object, as portrayed in Figure 5.7.

- At first glance, the extreme level of aliasing in this configuration may seem alarming. However, we rely on the fact that the referenced integer is immutable. Even a command such as counters[2] += 1 does not technically change the value of the existing integer instance. This computes a new integer, with value 0 + 1, and sets cell 2 to reference the newly computed value. The resulting configuration is shown in Figure 5.8.
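A short sketch of this aliasing follows. The counters example comes from the text; the grid example with a mutable element goes slightly beyond it and is included only as a contrast, to show why immutability of the shared element matters.

```python
counters = [0] * 8
# All eight cells start out referencing one and the same integer object.
assert all(cell is counters[0] for cell in counters)

counters[2] += 1           # computes a new integer and rebinds cell 2 only
print(counters)            # [0, 0, 1, 0, 0, 0, 0, 0]

# Contrast (an assumption beyond the text): replicating a mutable element.
grid = [[0] * 3] * 2       # both rows are references to the same inner list
grid[0][0] = 7
print(grid)                # [[7, 0, 0], [7, 0, 0]]
```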

![main](no_new_elements.png "main")

- As a final manifestation of the referential nature of lists, we note that the extend command is used to add all elements from one list to the end of another list. The extended list does not receive copies of those elements; it receives references to those elements. Figure 5.9 portrays the effect of a call to extend.
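The effect of extend can be observed in the same way. In this sketch the lists first and second are hypothetical, and their elements are small mutable lists so that the sharing is observable.

```python
first = [['a'], ['b']]
second = [['c'], ['d']]

first.extend(second)             # first now has four cells

# The new cells of first refer to the same objects that second refers to.
assert first[2] is second[0]
assert first[3] is second[1]

second[0].append('!')            # mutate an object through the second list
print(first[2])                  # ['c', '!']
```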

### 5.2.2 Compact Arrays in Python

- In the introduction to this section, we emphasized that strings are represented using an array of characters (not an array of references). We will refer to this more direct representation as a compact array because the array is storing the bits that represent the primary data (characters, in the case of strings). 

- Compact arrays have several advantages over referential structures in terms of computing performance. Most significantly, the overall memory usage will be much lower for a compact structure because there is no overhead devoted to the explicit storage of the sequence of memory references (in addition to the primary data). That is, a referential structure will typically use 64 bits for the memory address stored in the array, on top of whatever number of bits are used to represent the object that is considered the element. Also, each Unicode character stored in a compact array within a string typically requires 2 bytes. If each character were stored independently as a one-character string, there would be significantly more bytes used.

- As another case study, suppose we wish to store a sequence of one million 64-bit integers. In theory, we might hope to use only 64 million bits. However, we estimate that a Python list will use four to five times as much memory. Each element of the list will result in a 64-bit memory address being stored in the primary array, and an int instance being stored elsewhere in memory. Python allows you to query the actual number of bytes being used for the primary storage of any object. This is done using the getsizeof function of the sys module. On our system, the size of a typical int object requires 14 bytes of memory (well beyond the 4 bytes needed for representing the actual 64-bit number). In all, the list will be using 18 bytes per entry, rather than the 4 bytes that a compact list of integers would require.
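The measurement described above can be repeated on any machine. The sketch below uses only sys.getsizeof; the byte counts it prints depend on the Python version and platform, so no particular values are assumed here.

```python
import sys

n = 1_000_000
data = list(range(n))

# getsizeof on the list covers the primary array of references, not the elements.
list_bytes = sys.getsizeof(data)
# The int objects live elsewhere in memory and are measured one by one.
int_bytes = sum(sys.getsizeof(x) for x in data)

print('bytes for the list structure:', list_bytes)
print('bytes for the int objects:   ', int_bytes)
print('approximate bytes per entry: ', (list_bytes + int_bytes) / n)
```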


- Another important advantage to a compact structure for high-performance computing is that the primary data are stored consecutively in memory. Note well that this is not the case for a referential structure. That is, even though a list maintains careful ordering of the sequence of memory addresses, where those elements reside in memory is not determined by the list. Because of the workings of the cache and memory hierarchies of computers, it is often advantageous to have data stored in memory near other data that might be used in the same computations.

- Despite the apparent inefficiencies of referential structures, we will generally be content with the convenience of Python's lists and tuples in this book. The only place in which we consider alternatives will be in Chapter 15, which focuses on the impact of memory usage on data structures and algorithms. Python provides several means for creating compact arrays of various types.

- Primary support for compact arrays is in a module named array. That module defines a class, also named array, providing compact storage for arrays of primitive data types. A portrayal of such an array of integers is shown in Figure 5.10.

- The public interface for the array class conforms mostly to that of a Python list. However, the constructor for the array class requires a type code as a first parameter, which is a character that designates the type of data that will be stored in the array. As a tangible example, the type code 'i' designates an array of (signed) integers, typically represented using at least 16 bits each. We can declare the array shown in Figure 5.10 as:

In [2]:
from array import array
primes = array('i', [2, 3, 5, 7, 11, 13, 17, 19])

- The type code allows the interpreter to determine precisely how many bits are needed per element of the array. The type codes supported by the array module, as shown in Table 5.1, are formally based upon the native data types used by the C programming language (the language in which the most widely used distribution of Python is implemented). The precise number of bits for the C data types is system-dependent, but typical ranges are shown in the table.
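Since the sizes are system-dependent, one way to see them on a given machine is to ask the array module itself. The sketch below relies on the typecode and itemsize attributes of array instances; the selection of type codes in the loop is an arbitrary sample, not the full table.

```python
from array import array

primes = array('i', [2, 3, 5, 7, 11, 13, 17, 19])
print(primes.typecode, primes.itemsize)   # itemsize is reported in bytes

# Much of the list interface carries over to the compact array.
primes.append(23)
print(primes[3], len(primes))

# Bytes per element for a sample of type codes on this system.
for code in 'bhilqfd':
    print(code, array(code).itemsize)
```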

- The array module does not provide support for making compact arrays of user-defined data types. Compact arrays of such structures can be created with the lower-level support of a module named ctypes.
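As a rough illustration of what ctypes makes possible, the sketch below declares a compact array of a small record type. The Point structure and its fields are hypothetical; only ctypes.Structure, the element-type-times-length array syntax, and ctypes.sizeof are taken from the ctypes module.

```python
import ctypes

class Point(ctypes.Structure):
    """Hypothetical record with two C integers stored side by side."""
    _fields_ = [('x', ctypes.c_int), ('y', ctypes.c_int)]

# Multiplying a ctypes type by a length yields an array type; calling it allocates storage.
points = (Point * 3)()
points[1].x = 4
points[1].y = 5

print(points[1].x, points[1].y)
print(ctypes.sizeof(Point), ctypes.sizeof(points))   # sizes in bytes, system-dependent
```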

## 5.3 Dynamic Arrays and Amortization


- When creating a low-level array in a computer system, the precise size of that array must be explicitly declared in order for the system to properly allocate a consecutive piece of memory for its storage. For example, Figure 5.11 displays an array of 12 bytes that might be stored in memory locations 2146 through 2157.

- Because the system might dedicate neighboring memory locations to store other data, the capacity of an array cannot trivially be increased by expanding into subsequent cells. In the context of representing a Python tuple or str instance, this constraint is no problem. Instances of those classes are immutable, so the correct size for an underlying array can be fixed when the object is instantiated.

- Python's list class presents a more interesting abstraction. Although a list has a particular length when constructed, the class allows us to add elements to the list, with no apparent limit on the overall capacity of the list. To provide this abstraction, Python relies on an algorithmic sleight of hand known as a dynamic array.

- The first key to providing the semantics of a dynamic array is that a list instance maintains an underlying array that often has greater capacity than the current length of the list. For example, while a user may have created a list with five elements, the system may have reserved an underlying array capable of storing eight object references (rather than only five). This extra capacity makes it easy to append a new element to the list by using the next available cell of the array.

- If a user continues to append elements to a list, any reserved capacity will eventually be exhausted. In that case, the class requests a new, larger array from the system, and initializes the new array so that its prefix matches that of the existing smaller array. At that point in time, the old array is no longer needed, so it is reclaimed by the system. Intuitively, this strategy is much like that of the hermit crab, which moves into a larger shell when it outgrows its previous one.

- We give empirical evidence that Python's list class is based upon such a strategy. The source code for our experiment is displayed in Code Fragment 5.1, and a sample output of that program is given in Code Fragment 5.2. We rely on a function named getsizeof that is available from the sys module. This function reports the number of bytes that are being used to store an object in Python. For a list, it reports the number of bytes devoted to the array and other instance variables of the list, but not any space devoted to elements referenced by the list.

In [4]:
import sys
data = []
for k in range(26):
    a = len(data)
    b = sys.getsizeof(data)
    print('Length: {0:3d}; Size in bytes: {1:4d}'.format(a, b))
    data.append(None)

Length:   0; Size in bytes:   64
Length:   1; Size in bytes:   96
Length:   2; Size in bytes:   96
Length:   3; Size in bytes:   96
Length:   4; Size in bytes:   96
Length:   5; Size in bytes:  128
Length:   6; Size in bytes:  128
Length:   7; Size in bytes:  128
Length:   8; Size in bytes:  128
Length:   9; Size in bytes:  192
Length:  10; Size in bytes:  192
Length:  11; Size in bytes:  192
Length:  12; Size in bytes:  192
Length:  13; Size in bytes:  192
Length:  14; Size in bytes:  192
Length:  15; Size in bytes:  192
Length:  16; Size in bytes:  192
Length:  17; Size in bytes:  264
Length:  18; Size in bytes:  264
Length:  19; Size in bytes:  264
Length:  20; Size in bytes:  264
Length:  21; Size in bytes:  264
Length:  22; Size in bytes:  264
Length:  23; Size in bytes:  264
Length:  24; Size in bytes:  264
Length:  25; Size in bytes:  264


- In evaluating the results of the experiment, we draw attention to the first line of output from Code Fragment 5.2. We see that an empty list instance already requires a certain number of bytes of memory (64 on our system). In fact, each object in Python maintains some state, for example, a reference to denote the class to which it belongs. Although we cannot directly access the private instance variables of a list, we can speculate that in some form it maintains state information akin to the number of elements currently stored, the capacity of the currently allocated array, and a reference to that array (the roles played by \_n, \_capacity, and \_A in the DynamicArray class of Section 5.3.1).

- As soon as the first element is inserted into the list, we detect a change in the underlying size of the structure. In particular, we see the number of bytes jump from 64 to 96, an increase of exactly 32 bytes. Our experiment was run on a 64-bit machine architecture, meaning that each memory address is a 64-bit number (i.e., 8 bytes). We speculate that the increase of 32 bytes reflects the allocation of an underlying array capable of storing four object references. This hypothesis is consistent with the fact that we do not see any underlying change in the memory usage after inserting the second, third, or fourth element into the list.

- After the fifth element has been added to the list, we see the memory usage jump from 96 bytes to 128 bytes. If we assume the original base usage of 64 bytes for the list, the total of 128 suggests an additional 64 = 8 x 8 bytes that provide capacity for up to eight object references. Again, this is consistent with the experiment, as the memory usage does not increase again until the ninth insertion. At that point, the 192 bytes can be viewed as the original 64 plus an additional 128-byte array to store 16 object references. The 17th insertion pushes the overall memory usage to 264 = 64 + 200 = 64 + 25 x 8, hence enough to store up to 25 element references.

- Because a list is a referential structure, the result of getsizeof for a list instance only includes the size for representing its primary structure; it does not account for memory used by the objects that are elements of the list. In our experiment, we repeatedly append None to the list, because we do not care about the contents, but we could append any type of object without affecting the number of bytes reported by getsizeof(data).

- If we were to continue such an experiment for further iterations, we might try to discern the pattern for how large of an array Python creates each time the capacity of the previous array is exhausted. Before exploring the precise sequence of capacities used by Python, we continue in this section by describing a general approach for implementing dynamic arrays and for performing an asymptotic analysis of their performance.
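One way to continue the experiment is to report only the lengths at which the reported size changes. This is a sketch under the assumption that a change in getsizeof signals a new underlying array; the loop bound of 200 is arbitrary, and the sizes printed will vary by Python version and platform.

```python
import sys

data = []
last_size = sys.getsizeof(data)
changes = []                       # (length, size) recorded at each size change

for k in range(200):
    data.append(None)
    size = sys.getsizeof(data)
    if size != last_size:          # the underlying array was presumably replaced
        changes.append((len(data), size))
        last_size = size

for length, size in changes:
    print('Size changed to {0:4d} bytes at length {1:3d}'.format(size, length))
```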

### 5.3.1 Implementing a Dynamic Array

- Although the Python list class provides a highly optimized implementation of dynamic arrays, upon which we rely for the remainder of this book, it is instructive to see how such a class might be implemented.

- The key is to provide means to grow the array A that stores the elements of a list. Of course, we cannot actually grow that array, as its capacity is fixed. If an element is appended to a list at a time when the underlying array is full, we perform the following steps:

1. Allocate a new array B with larger capacity.
2. Set B[i] = A[i] for i = 0, ..., n - 1, where n denotes the current number of items.
3. Set A = B, that is, we henceforth use B as the array supporting the list.
4. Insert the new element in the new array.

- An illustration of this process is shown in Figure 5.12.

- The remaining issue to consider is how large of a new array to create. A commonly used rule is for the new array to have twice the capacity of the existing array that has been filled. In Section 5.3.2, we will provide a mathematical analysis to justify such a choice.

- In Code Fragment 5.3, we offer a concrete implementation of dynamic arrays in Python. Our DynamicArray class is designed using ideas described in this section. While consistent with the interface of a Python list class, we provide only limited functionality in the form of an append method, and accessors \_\_len\_\_ and \_\_getitem\_\_. Support for creating low-level arrays is provided by a module named ctypes. Because we will not typically use such a low-level structure in the remainder of this book, we omit a detailed explanation of the ctypes module. Instead, we wrap the necessary command for declaring the raw array within a private utility method \_make\_array. The hallmark expansion procedure is performed in our nonpublic \_resize method.

In [5]:
import ctypes

class DynamicArray:
    """A dynamic array class akin to a simplified Python list."""
    
    def __init__(self):
        """Create an empty array."""
        self._n = 0
        self._capacity = 1
        self._A = self._make_array(self._capacity)
        
    def __len__(self):
        """Return number of elements stored in the array."""
        return self._n
    
    def __getitem__(self, k):
        """Return element at index k."""
        if not 0 <= k < self._n:
            raise IndexError('invalid index')
        return self._A[k]
    
    def append(self, obj):
        """Add object to end of the array."""
        if self._n == self._capacity:
            self._resize(2 * self._capacity)
        self._A[self._n] = obj
        self._n += 1
        
    def _resize(self, c):
        """Resize internal array to capacity c."""
        B = self._make_array(c)
        for k in range(self._n):
            B[k] = self._A[k]
        self._A = B
        self._capacity = c
        
    def _make_array(self, c):
        """Return new array with capacity c."""
        return (c * ctypes.py_object)()

### 5.3.2 Amortized Analysis of Dynamic Arrays

- In this section, we perform a detailed analysis of the running time of operations on dynamic arrays. We use the big-Omega notation introduced in Section 3.3.1 to give an asymptotic lower bound on the running time of an algorithm or step within it.
- The strategy of replacing an array with a new, larger array might at first seem slow, because a single append operation may require $\Omega(n)$ time to perform, where n is the current number of elements in the array. However, notice that by doubling the capacity during an array replacement, our new array allows us to add n new elements before the array must be replaced again. In this way, there are many simple append operations for each expensive one. This fact allows us to show that performing a series of operations on an initially empty dynamic array is efficient in terms of its total running time.

- Using an algorithmic design pattern called amortization, we can show that performing a sequence of such append operations on a dynamic array is actually quite efficient. To perform an amortized analysis, we use an accounting technique where we view the computer as a coin-operated appliance that requires the payment of one cyber-dollar for a constant amount of computing time. When an operation is executed, we should have enough cyber-dollars available in our current "bank account" to pay for that operation's running time. Thus, the total amount of cyber-dollars spent for any computation will be proportional to the total time spent on that computation. The beauty of using this analysis method is that we can overcharge some operations in order to save up cyber-dollars to pay for others.

**Proposition 5.1:** Let S be a sequence implemented by means of a dynamic array with initial capacity one, using the strategy of doubling the array size when full. The total time to perform a series of n append operations in S, starting from S being empty, is O(n).

**Justification:** Let us assume that one cyber-dollar is enough to pay for the execution of each append operation in S, excluding the time spent for growing the array. Also, let us assume that growing the array from size k to size 2k requires k cyber-dollars for the time spent initializing the new array. We shall charge each append operation three cyber-dollars. Thus, we overcharge each append operation that does not cause an overflow by two cyber-dollars. Think of the two cyber-dollars profited in an insertion that does not grow the array as being "stored" with the cell in which the element was inserted. An overflow occurs when S has $2^i$ elements, for some integer $i \ge 0$, and the size of the array representing S is $2^i$. Thus, doubling the size of the array will require $2^i$ cyber-dollars. Fortunately, these cyber-dollars can be found stored in cells $2^{i-1}$ through $2^i - 1$. Note that the previous overflow occurred when the number of elements became larger than $2^{i-1}$ for the first time, and thus the cyber-dollars stored in cells $2^{i-1}$ through $2^i - 1$ have not yet been spent. Therefore, we have a valid amortization scheme in which each operation is charged three cyber-dollars and all the computing time is paid for. That is, we can pay for the execution of n append operations using 3n cyber-dollars. In other words, the amortized running time of each append operation is O(1); hence, the total running time of n append operations is O(n).
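The accounting argument can be checked numerically with a small simulation. The sketch below does not allocate any arrays; it only counts element copies under the doubling rule, assuming an initial capacity of one. The function name count_copies_doubling is hypothetical.

```python
def count_copies_doubling(n):
    """Return the number of element copies made by n appends with capacity doubling."""
    capacity, size, copies = 1, 0, 0
    for _ in range(n):
        if size == capacity:       # overflow: every element moves to an array twice as big
            copies += size
            capacity *= 2
        size += 1
    return copies

for n in (10, 100, 1000, 10**6):
    copies = count_copies_doubling(n)
    # One unit of work per append plus one unit per copied element.
    print(n, copies, (n + copies) / n)
```

In each case the total work, n appends plus the copies, stays below 3n, in line with the three cyber-dollars charged per operation.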

**Geometric Increase in Capacity**

- Although the proof of Proposition 5.1 relies on the array being doubled each time we expand, the O(1) amortized bound per operation can be proven for any geometrically increasing progression of array sizes. When choosing the geometric base, there exists a trade-off between run-time efficiency and memory usage. With a base of 2, if the last insertion causes a resize event, the array essentially ends up twice as large as it needs to be. If we instead increase the array by only 25% of its current size, we do not risk wasting as much memory in the end, but there will be more intermediate resize events along the way. Still, it is possible to prove an O(1) amortized bound, using a constant factor greater than the 3 cyber-dollars per operation used in the proof of Proposition 5.1. The key to the performance is that the amount of additional space is proportional to the current size of the array.

**Beware of Arithmetic Progression**

- To avoid reserving too much space at once, it might be tempting to implement a dynamic array with a strategy in which a constant number of additional cells are reserved each time an array is resized. Unfortunately, the overall performance of such a strategy is significantly worse. At an extreme, an increase of only one cell causes each append operation to resize the array, leading to a familiar 1 + 2 + 3 + ... + n summation and $\Omega(n^2)$ overall cost. Using increases of 2 or 3 at a time is slightly better, as portrayed in Figure 5.13, but the overall cost remains quadratic.

- Using a fixed increment for each resize, and thus an arithmetic progression of intermediate array sizes, results in an overall time that is quadratic in the number of operations, as shown in the following proposition. Intuitively, even an increase of 1000 cells per resize will become insignificant for large data sets.

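**Proposition 5.2:** Performing a series of n append operations on an initially empty dynamic array using a fixed increment with each resize takes $\Omega(n^2)$ time.

**Justification:** Let $c > 0$ represent the fixed increment in capacity that is used for each resize event. During the series of n append operations, time will have been spent initializing arrays of size $c, 2c, 3c, \ldots, mc$ for $m = \lceil n/c \rceil$, and therefore, the overall time would be proportional to $c + 2c + 3c + \cdots + mc$. By Proposition 3.3, this sum is

$$\sum_{i=1}^{m} ci = c \cdot \sum_{i=1}^{m} i = c \cdot \frac{m(m+1)}{2} \ge c \cdot \frac{\frac{n}{c}\left(\frac{n}{c}+1\right)}{2} \ge \frac{n^2}{2c}.$$

- Therefore, performing the n append operations takes $\Omega(n^2)$ time.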
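The quadratic growth can also be seen numerically. The sketch below follows the cost model of the justification, charging each resize the size of the newly initialized array, and assumes the array starts with capacity zero. The function name resize_cost_fixed_increment is hypothetical.

```python
def resize_cost_fixed_increment(n, c):
    """Total size of all arrays initialized during n appends, growing by c cells per resize."""
    capacity, size, cost = 0, 0, 0
    for _ in range(n):
        if size == capacity:       # array is full: allocate one that is c cells larger
            capacity += c
            cost += capacity       # time to initialize the new array
        size += 1
    return cost

for n in (1000, 2000, 4000, 8000):
    # Doubling n roughly quadruples the cost, even with a generous increment.
    print(n, resize_cost_fixed_increment(n, 1), resize_cost_fixed_increment(n, 100))
```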

- A lesson to be learned from Propositions 5.1 and 5.2 is that a subtle difference in an algorithm design can produce a drastic difference in the asymptotic performance, and that a careful analysis can provide important insights into the design of a data structure.

**Memory Usage and Shrinking an Array**

- Another consequence of the rule of a geometric increase in capacity when appending to a dynamic array is that the final array size is guaranteed to be proportional to the overall number of elements. That is, the data structure uses O(n) memory. This is a very desirable property for a data structure.

- If a container, such as a Python list, provides operations that cause the removal of one or more elements, greater care must be taken to ensure that a dynamic array guarantees O(n) memory usage. The risk is that repeated insertions may cause the underlying array to grow arbitrarily large, and that there will no longer be a proportional relationship between the actual number of elements, and the array capacity after many elements are removed.


- A robust implementation of such a data structure will shrink the underlying array, on occasion, while maintaining the O(1) amortized bound on individual operations. However, care must be taken to ensure that the structure cannot rapidly oscillate between growing and shrinking the underlying array, in which case the amortized bound would not be achieved. In Exercise C-5.16, we explore a strategy in which the array capacity is halved whenever the number of actual elements falls below one fourth of that capacity, thereby guaranteeing that the array capacity is at most four times the number of elements; we explore the amortized analysis of such a strategy in Exercises C-5.17 and C-5.18.
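The halving rule mentioned above might be sketched as follows. This is only an illustration under stated assumptions, not a reference solution to the exercise: the class name ShrinkingArray, its capacity method, and the use of a fixed-length Python list as a stand-in for a raw array are all hypothetical choices.

```python
class ShrinkingArray:
    """Hypothetical sketch of a dynamic array that halves its capacity when sparse."""

    def __init__(self):
        self._n = 0
        self._capacity = 1
        self._A = [None] * self._capacity   # a fixed-length list stands in for a raw array

    def __len__(self):
        return self._n

    def capacity(self):
        return self._capacity

    def append(self, obj):
        if self._n == self._capacity:       # full: double the capacity
            self._resize(2 * self._capacity)
        self._A[self._n] = obj
        self._n += 1

    def pop(self):
        if self._n == 0:
            raise IndexError('pop from empty array')
        self._n -= 1
        obj = self._A[self._n]
        self._A[self._n] = None             # drop the stale reference
        # Shrink only when fewer than one fourth of the cells are in use.
        if self._capacity > 1 and self._n < self._capacity // 4:
            self._resize(self._capacity // 2)
        return obj

    def _resize(self, c):
        B = [None] * c
        for k in range(self._n):
            B[k] = self._A[k]
        self._A = B
        self._capacity = c

arr = ShrinkingArray()
for k in range(16):
    arr.append(k)
print(len(arr), arr.capacity())
for _ in range(13):
    arr.pop()
print(len(arr), arr.capacity())
```

Halving at one fourth rather than one half leaves slack after every resize, so a single append or pop right after a resize cannot immediately trigger another one.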

### 5.3.3 Python's List Class

- The experiments of Code Fragments 5.1 and 5.2, at the beginning of Section 5.3, provide empirical evidence that Python's list class is using a form of dynamic arrays for its storage. Yet, a careful examination of the intermediate array capacities suggests that Python is not using a pure geometric progression, nor is it using an arithmetic progression.
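
- The capacities can be observed indirectly with a short experiment such as the following sketch, which records the list lengths at which the value reported by sys.getsizeof changes. The function name capacity_changes is illustrative, and the exact byte counts are an implementation detail of CPython that varies by version and platform.

```python
import sys

def capacity_changes(n):
    """Return (length, size in bytes) pairs at which the reported size changes."""
    data = []
    changes = []
    last = sys.getsizeof(data)
    for _ in range(n):
        data.append(None)
        size = sys.getsizeof(data)
        if size != last:                 # the underlying array was reallocated
            changes.append((len(data), size))
            last = size
    return changes

for length, size in capacity_changes(100):
    print('length {0:3d} -> {1:4d} bytes'.format(length, size))
```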

- With that said, it is clear that Python's implementation of the append method exhibits amortized constant-time behavior. We can demonstrate this fact experimentally. A single append operation typically executes so quickly that it would be difficult for us to accurately measure the time elapsed at that granularity, although we should notice some of the more expensive operations in which a resize is performed. We can get a more accurate measure of the amortized cost per operation by performing a series of n append operations on an initially empty list and determining the average cost of each. A function to perform that experiment is given in Code Fragment 5.4.

In [1]:
from time import time
def compute_average(n):
    """Perform n appends to an empty list and return average time elapsed."""
    data = []
    start = time()
    for k in range(n):
        data.append(None)
    end = time()
    return (end - start) / n 

- Technically, the time elapsed between the start and end includes the time to manage the iteration of the for loop, in addition to the append calls. The empirical results of the experiment, for increasingly large values of n, are shown in Table 5.2. We see higher average cost for the smaller data sets, perhaps in part due to the overhead of the loop range. There is also natural variance in measuring the amortized cost in this way, because of the impact of the final resize event relative to n. Taken as a whole, there seems clear evidence that the amortized time for each append is independent of n.
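
- A small driver for this experiment might look like the following sketch. It repeats the function so that the block is self-contained, and it swaps in time.perf_counter for time.time as an assumption of this sketch, since perf_counter is intended for measuring short intervals. The chosen values of n are arbitrary, and the timings will differ from machine to machine.

```python
from time import perf_counter

def compute_average(n):
    """Perform n appends to an empty list and return average time elapsed."""
    data = []
    start = perf_counter()
    for k in range(n):
        data.append(None)
    end = perf_counter()
    return (end - start) / n

# report the average cost per append, in microseconds, for growing n
for n in (100, 1000, 10000, 100000, 1000000):
    microseconds = compute_average(n) * 1e6
    print('n = {0:8d}   average = {1:.3f} microseconds'.format(n, microseconds))
```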

## 5.4 Efficiency of Python's Sequence Types

### 5.4.1 Python's List and Tuple Classes

- The nonmutating behaviors of the list class are precisely those that are supported by the tuple class. We note that tuples are typically more memory efficient than lists because they are immutable; therefore, there is no need for an underlying dynamic array with surplus capacity. We summarize the asymptotic efficiency of the nonmutating behaviors of the list and tuple classes in Table 5.3. An explanation of this analysis follows.

** Constant-Time Operations **

- The length of an instance is returned in constant time because an instance explicitly maintains such state information. The constant-time efficiency of syntax data[j] is assured by the underlying access into an array.

** Searching for Occurrences of a Value **

- Each of the count, index, and \_\_contains\_\_ methods proceeds through iteration of the sequence from left to right. In fact, Code Fragment 2.14 of Section 2.4.3 demonstrates how those behaviors might be implemented. Notably, the loop for computing the count must proceed through the entire sequence, while the loops for checking containment of an element or determining the index of an element immediately exit once they find the leftmost occurrence of the desired value, if one exists. So while count always examines the n elements of the sequence, index and \_\_contains\_\_ examine n elements in the worst case, but may be faster. Empirical evidence can be found by setting data = list(range(10000000)) and then comparing the efficiency of the test, 5 in data, relative to the test, 999999 in data, or even the failed test, -5 in data.
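
- The difference between the full scan and the early exit can be made visible without timing anything. The sketch below re-implements the three behaviors as plain functions that also report how many elements were examined; the function names and the extra return value are illustrative assumptions, not the signatures of the built-in methods.

```python
def count_with_cost(data, value):
    """Return (number of matches, number of elements examined)."""
    matches = 0
    examined = 0
    for item in data:                  # always scans the whole sequence
        examined += 1
        if item == value:
            matches += 1
    return matches, examined

def index_with_cost(data, value):
    """Return (leftmost index of value, number of elements examined)."""
    examined = 0
    for j, item in enumerate(data):
        examined += 1
        if item == value:
            return j, examined         # exit at the leftmost occurrence
    raise ValueError('value not in sequence')

def contains_with_cost(data, value):
    """Return (True or False, number of elements examined)."""
    examined = 0
    for item in data:
        examined += 1
        if item == value:
            return True, examined      # exit at the first match
    return False, examined             # a failed search examines everything

data = list(range(1000))
print(contains_with_cost(data, 5))
print(contains_with_cost(data, 999))
print(contains_with_cost(data, -5))
print(count_with_cost(data, 5))
```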

** Lexicographic Comparisons **

- Comparisons between two sequences are defined lexicographically. In the worst case, evaluating such a condition requires an iteration taking time proportional to the length of the shorter of the two sequences (because when one sequence ends, the lexicographic result can be determined). However, in some cases the result of the test can be evaluated more efficiently. For example, if evaluating [7, 3, ...] < [7, 5, ...], it is clear that the result is True without examining the remainders of those lists, because the second element of the left operand is strictly less than the second element of the right operand.
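
- As a rough illustration of this rule, the following sketch evaluates a < b for two sequences and also reports how many positions were compared before the answer was known. The function less_than is hypothetical and is meant to mirror the definition above, not to reproduce Python's internal implementation.

```python
def less_than(a, b):
    """Return (a < b lexicographically, number of positions compared)."""
    compared = 0
    for x, y in zip(a, b):             # stops at the end of the shorter sequence
        compared += 1
        if x != y:                     # the first difference decides the result
            return x < y, compared
    # one sequence is a prefix of the other, so the shorter one is smaller
    return len(a) < len(b), compared

print(less_than([7, 3, 9, 9], [7, 5, 1, 1]))
print(less_than([1, 2], [1, 2, 3]))
```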

**Creating New Instances **

- The final three behaviors in Table 5.3 are those that construct a new instance based on one or more existing instances. In all cases, the running time depends on the construction and initialization of the new result, and therefore the asymptotic behavior is proportional to the length of the result. Therefore, we find that slice data[6000000:6000008] can be constructed almost immediately because it has only eight elements, while slice data[6000000:7000000] has one million elements, and thus is more time-consuming to create.

** Mutating Behaviors **

- The efficiency of the mutating behaviors of the list class is described in Table 5.4. The simplest of those behaviors has syntax data[j] = val, and is supported by the special \_\_setitem\_\_ method. This operation has worst-case O(1) running time because it simply replaces one element of a list with a new value. No other elements are affected and the size of the underlying array does not change. The more interesting behaviors to analyze are those that add or remove elements from the list.

** Adding Elements to a List **

- In Section 5.3 we fully explored the append method. In the worst case, it requires Ω(n) time because the underlying array is resized, but it uses O(1) time in the amortized sense. Lists also support a method, with signature insert(k, value), that inserts a given value into the list at index 0 <= k <= n while shifting all subsequent elements back one slot to make room. For the purpose of illustration, Code Fragment 5.5 provides an implementation of that method, in the context of our DynamicArray class introduced in Code Fragment 5.3. There are two complicating factors in analyzing the efficiency of such an operation. First, we note that the addition of one element may require a resizing of the dynamic array. That portion of the work requires Ω(n) worst-case time but only O(1) amortized time, as per append. The other expense for insert is the shifting of elements to make room for the new item.

In [1]:
def insert(self, k, value):
    """Insert value at index k, shifting subsequent values rightward."""
    # for simplicity, we assume 0 <= k <= n in this version
    if self._n == self._capacity:          # not enough room, so double capacity
        self._resize(2 * self._capacity)
    for j in range(self._n, k, -1):        # shift rightmost first
        self._A[j] = self._A[j-1]
    self._A[k] = value                     # store newest element
    self._n += 1

- The time for that process depends upon the index of the new element, and thus the number of other elements that must be shifted. That loop copies the reference that had been at index n - 1 to index n, then the reference that had been at index n - 2 to n - 1, continuing until copying the reference that had been at index k to k + 1, as illustrated in Figure 5.16. Overall this leads to an amortized O(n - k + 1) performance for inserting at index k.

- When exploring the efficiency of Python's append method in Section 5.3.3, we performed an experiment that measured the average cost of repeated calls on varying sizes of lists (see Code Fragment 5.4 and Table 5.2). We have repeated that experiment with the insert method, trying three different access patterns:

> In the first case, we repeatedly insert at the beginning of a list, 


In [None]:
for n in range(N):
    data.insert(0, None)

> In a second case, we repeatedly insert near the middle of a list,

In [None]:
for n in range(N):
    data.insert(n // 2, None)


> In a third case, we repeatedly insert at the end of the list, 

In [None]:
for n in range(N):
    data.insert(n, None)

- The results of our experiment are given in Table 5.5, reporting the average time per operation (not the total time for the entire loop). As expected, we see that inserting at the beginning of a list is most expensive, requiring linear time per operation. Inserting at the middle requires about half the time of inserting at the beginning, yet is still Ω(n) time. Inserting at the end displays O(1) behavior, akin to append.
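
- The same three access patterns can also be compared by counting element shifts rather than measuring time, which gives a deterministic result. In the sketch below, the functions insert_with_count and average_shifts are illustrative names; a Python list plays the role of the array, and the shifting is done by an explicit loop so that it can be counted.

```python
def insert_with_count(array, k, value):
    """Insert value at index k by explicit shifting; return the number of shifts."""
    array.append(None)                         # make room for one more element
    shifts = 0
    for j in range(len(array) - 1, k, -1):     # shift rightmost first
        array[j] = array[j - 1]
        shifts += 1
    array[k] = value
    return shifts

def average_shifts(N, index_for):
    """Perform N inserts at index index_for(n); return average shifts per insert."""
    data = []
    total = 0
    for n in range(N):
        total += insert_with_count(data, index_for(n), None)
    return total / N

N = 1000
print('beginning:', average_shifts(N, lambda n: 0))
print('middle:   ', average_shifts(N, lambda n: n // 2))
print('end:      ', average_shifts(N, lambda n: n))
```

- For N = 1000 the averages are 499.5 shifts per insert at the beginning, 250.0 near the middle, and 0.0 at the end, matching the roughly two-to-one ratio between the first two patterns.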

** Removing Elements from a List **

- Python's list class offers several ways to remove an element from a list. A call to pop() removes the last element from a list. This is most efficient, because all other elements remain in their original location. This is effectively an O(1) operation, but the bound is amortized because Python will occasionally shrink the underlying dynamic array to conserve memory.

- The parameterized version, pop(k), removes the element that is at index k < n of a list, shifting all subsequent elements leftward to fill the gap that results from the removal. The efficiency of this operation is O(n - k), as the amount of shifting depends upon the choice of index k, as illustrated in Figure 5.17. Note well that this implies that pop(0) is the most expensive call, using Ω(n) time.
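
- The leftward shifting performed by pop(k) can be sketched as follows. The class MiniArray is a hypothetical stand-in used only for this illustration (it is not the DynamicArray class of Code Fragment 5.3), and it ignores any shrinking of the underlying array.

```python
class MiniArray:
    """Hypothetical array wrapper used only to illustrate pop(k)."""

    def __init__(self, items):
        self._A = list(items)          # stand-in for the underlying array
        self._n = len(self._A)         # number of actual elements

    def pop(self, k):
        """Remove and return the element at index k, shifting later ones leftward."""
        if not 0 <= k < self._n:
            raise IndexError('invalid index')
        value = self._A[k]
        for j in range(k, self._n - 1):    # n - k - 1 elements move one slot left
            self._A[j] = self._A[j + 1]
        self._A[self._n - 1] = None        # clear the vacated last cell
        self._n -= 1
        return value

    def contents(self):
        """Return the actual elements as a new list."""
        return self._A[:self._n]

a = MiniArray('abcde')
print(a.pop(1), a.contents())
```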

- The list class offers another method, named remove, that allows the caller to specify the value that should be removed (not the index at which it resides). Formally, it removes only the first occurrence of such a value from a list, or raises a ValueError if no such value is found. An implementation of such behavior is given in Code Fragment 5.6, again using our DynamicArray class for illustration.

- Interestingly, there is no "efficient" case for remove; every call requires Ω(n) time. One part of the process searches from the beginning until finding the value at index k, while the rest iterates from k to the end in order to shift elements leftward. This linear behavior can be observed experimentally.

In [2]:
def remove(self, value):
    """Remove first occurrence of value (or raise ValueError)."""
    # note: we do not consider shrinking the dynamic array in this version
    for k in range(self._n):
        if self._A[k] == value:
            for j in range(k, self._n - 1):
                self._A[j] = self._A[j+1]
            self._A[self._n - 1] = None
            self._n -= 1
            return 
    raise ValueError('value not found')

** Extending a List **

- Python provides a method named extend that is used to add all elements of one list to the end of a second list. In effect, a call to data.extend(other) produces the same outcome as the code,

In [3]:
for element in other:
    data.append(element)

- In either case, the running time is proportional to the length of the other list, and amortized because the underlying array for the first list may be resized to accommodate the additional elements.

- In practice, the extend method is preferable to repeated calls to append because the constant factors hidden in the asymptotic analysis are significantly smaller. The greater efficiency of extend is threefold. First, there is always some advantage to using an appropriate Python method, because those methods are often implemented natively in a compiled language (rather than as interpreted Python code). Second, there is less overhead to a single function call that accomplishes all the work, versus many individual function calls. Finally, increased efficiency of extend comes from the fact that the resulting size of the updated list can be calculated in advance. If the second data set is quite large, there is some risk that the underlying dynamic array might be resized multiple times when using repeated calls to append. With a single call to extend, at most one resize operation will be performed. Exercise C-5.22 explores the relative efficiency of these two approaches experimentally.
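
- A rough way to compare the two approaches is sketched below using the standard timeit module. The helper names build_with_append and build_with_extend, the list size, and the repetition count are arbitrary choices made for this sketch; the measured times depend on the machine and the Python version.

```python
from timeit import timeit

def build_with_append(other):
    """Copy other into a new list using repeated calls to append."""
    data = []
    for element in other:
        data.append(element)
    return data

def build_with_extend(other):
    """Copy other into a new list using a single call to extend."""
    data = []
    data.extend(other)
    return data

other = list(range(100000))
t_append = timeit(lambda: build_with_append(other), number=20)
t_extend = timeit(lambda: build_with_extend(other), number=20)
print('append loop: {0:.4f} seconds'.format(t_append))
print('extend:      {0:.4f} seconds'.format(t_extend))
```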

** Constructing New Lists**

- There are several syntaxes for constructing new lists. In almost all cases, the asymptotic efficiency of the behavior is linear in the length of the list that is created. However, as was the case in the preceding discussion of extend, there are significant differences in the practical efficiency.

- Section 1.9.2 introduces the topic of list comprehension, using an example such as squares = [k*k for k in range(1, n+1)] as a shorthand for 

In [None]:
squares = []
for k in range(1, n+1):
    squares.append(k*k)

- Experiments should show that the list comprehension syntax is significantly faster than building the list by repeatedly appending.

- Similarly, it is a common Python idiom to initialize a list of constant values using the multiplication operator, as in [0] * n to produce a list of length n with all values equal to zero. Not only is this succinct for the programmer; it is more efficient than building such a list incrementally.
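
- These claims about practical efficiency can be checked with a sketch such as the following, which times each construction with the standard timeit module. The function names, the value of n, and the repetition count are assumptions of this sketch, and the timings will vary between systems.

```python
from timeit import timeit

def squares_by_append(n):
    """Build the list of squares by repeated appends."""
    squares = []
    for k in range(1, n + 1):
        squares.append(k * k)
    return squares

def squares_by_comprehension(n):
    """Build the list of squares with a list comprehension."""
    return [k * k for k in range(1, n + 1)]

def zeros_by_append(n):
    """Build a list of n zeros by repeated appends."""
    zeros = []
    for _ in range(n):
        zeros.append(0)
    return zeros

def zeros_by_multiplication(n):
    """Build a list of n zeros with the multiplication operator."""
    return [0] * n

n = 100000
for build in (squares_by_append, squares_by_comprehension,
              zeros_by_append, zeros_by_multiplication):
    elapsed = timeit(lambda: build(n), number=20)
    print('{0:26s} {1:.4f} seconds'.format(build.__name__, elapsed))
```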

### 5.4.2 Python's String Class


- Strings are very important in Python. We introduced their use in Chapter 1, with a discussion of various operator syntaxes in Section 1.3. A comprehensive summary of the named methods of the class is given in Tables A.1 through A.4 of Appendix A. We will not formally analyze the efficiency of each of those behaviors in this section, but we do wish to comment on some notable issues. In general, we let n denote the length of a string. For operations that rely on a second string as a pattern, we let m denote the length of that pattern string.

- The 