# Arrays

An **array** is simply <u>a contiguous block of memory</u>, where each cell in the block holds either a value directly or a _reference_ to an object.

The block is often visualized as a numbered sequence of cells, where the number denotes the position of a cell within the sequence.

```{figure} https://i.ibb.co/FnWvV70/array3.png
---
name: array3
width: 40%
align: center
---
A higher-level abstraction of a string stored as an array: a numbered sequence of cells.
```

Arrays serve as low-level data structures that Python uses internally to implement higher-level sequence types such as `list`, `tuple`, and `str`. To describe accurately how Python represents these sequence types, we must first discuss aspects of low-level computer architecture.


The primary memory of a computer is composed of bits of information, and those bits are typically grouped into larger units that depend upon the precise system architecture. One such typical unit is a **byte**, which is equivalent to 8 bits.

Each byte in the computer’s memory has a unique **memory address**, which is a number that identifies the byte’s location in memory. In this way, the computer system can refer to the data in “byte #2150” versus the data in “byte #2157,” for example.


Memory addresses are typically coordinated with the physical layout of the memory system, and so we often portray the numbers in sequential fashion. The figure below provides such a diagram, with the designated memory address for each byte.


```{figure} https://i.ibb.co/DLyLmTw/array1.png
---
name: array
width: 90%
align: center
---
A representation of a portion of a computer’s memory, with individual bytes labeled with consecutive memory addresses.
```



Despite the sequential nature of the numbering system, computer hardware is designed, in theory, so that any byte of the main memory can be efficiently accessed based upon its memory address. In this sense, we say that a computer’s main memory performs as **Random Access Memory (RAM)**. 

That is, in theory, it is just as easy to retrieve byte #8675309 as it is to retrieve byte #309. Using the notation for asymptotic analysis, we say that any individual byte of memory can be stored or retrieved in $O(1)$ time.

In general, a programming language keeps track of the association between an identifier and the memory address in which the associated value is stored. For example, identifier `x` might be associated with one value stored in memory, while `y` is associated with another value stored in memory. 
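As a small illustration, in CPython the built-in `id()` returns the memory address of the object an identifier refers to. This is an implementation detail of CPython rather than a language guarantee, and the variable names below are chosen only for the example.

```python
# In CPython, id() reports the memory address of the object a name refers to.
x = [1, 2, 3]
y = [1, 2, 3]   # an equal, but separately stored, list
z = x           # a second identifier for the same stored object

print(hex(id(x)), hex(id(y)))  # two distinct addresses (values vary per run)
print(z is x)   # True: both names are associated with the same address
print(y is x)   # False: equal contents, different location in memory
```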


A group of related values can be stored one after another in a contiguous portion of the computer’s memory, that is, in an **array**. For example, a text string is stored as an ordered sequence of individual characters. In Python, each character is represented using the Unicode character set. For this discussion we assume that each character occupies 16 bits (i.e., 2 bytes); current CPython versions actually choose 1, 2, or 4 bytes per character depending on the characters the string contains. Under the 2-byte assumption, a six-character string such as `SAMPLE` would be stored in 12 consecutive bytes of memory, as diagrammed in the figure below.

```{figure} https://i.ibb.co/fQGDLsG/array2.png
---
name: array2
width: 90%
align: center
---
A Python string embedded as an array of characters in the computer’s memory. We assume that each Unicode character of the string requires two bytes of memory. The numbers below the entries are indices into the string.
```
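The 12-byte count can be checked with a short sketch. It encodes the string explicitly as UTF-16 to match the two-bytes-per-character assumption of the diagram; it does not inspect how the interpreter actually stores the string internally.

```python
text = 'SAMPLE'

# Encode with exactly 2 bytes per character (UTF-16, little-endian, no byte-order mark).
raw = text.encode('utf-16-le')

print(len(text))  # 6 characters
print(len(raw))   # 12 bytes
print(list(raw))  # each character appears as a pair of bytes
```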

We describe this as _an array of six characters_, even though it requires 12 bytes of memory. We will refer to each location within an array as a **cell**, and will use an integer **index** to describe its location within the array, with cells numbered starting with 0, 1, 2, and so on. For example, in the figure above, the cell of the array with index 4 has contents `L` and is stored in bytes 2154 and 2155 of memory.

Each cell of an array must use the same number of bytes. This requirement is what allows an arbitrary cell of the array to be accessed in constant time based on its index. In particular, if one knows the memory address at which an array starts (e.g., $2146$ in the figure above), the number of bytes per element (e.g., 2 for each character in the figure above), and a desired index within the array, the appropriate memory address can be computed as

$$ \text{start} + \text{cellsize} \times \text{index} $$

By this formula, the cell at index 0 begins precisely at the start of the array, the cell at index 1 begins precisely `cellsize` bytes beyond the start of the array, and so on. As an example, cell 4 of the figure above begins at memory location $2146 + 2 \cdot 4 = 2146 + 8 = 2154$.
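The address calculation can be sketched in a few lines of Python. The function name `cell_address` and the byte buffer `memory` are hypothetical, introduced only for illustration; the buffer stands in for the 12 bytes that start at address 2146 in the figure.

```python
def cell_address(start, cellsize, index):
    """Return the address of the first byte of the cell at the given index."""
    return start + cellsize * index

START = 2146      # address of the first byte of the array (from the figure)
CELLSIZE = 2      # bytes per character, as assumed in the figure

# Stand-in for the bytes at addresses 2146..2157: 'SAMPLE' with 2 bytes per character.
memory = 'SAMPLE'.encode('utf-16-le')

address = cell_address(START, CELLSIZE, 4)
print(address)    # 2154

# Translate the address into an offset within the buffer and read one cell.
offset = address - START
cell = memory[offset:offset + CELLSIZE].decode('utf-16-le')
print(cell)       # L
```

Because the computation is a single multiplication and addition, its cost does not depend on the index or on the length of the array.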



As another motivating example, assume that we want a medical information system to keep track of the patients currently assigned to beds in a certain hospital. If we assume that the hospital has 200 beds, and conveniently that those beds are numbered from 0 to 199, we might consider using an array-based structure to maintain the names of the patients currently assigned to those beds. For example, in Python we might use a list of names, such as:

```python
data = ['Rene', 'Joseph', 'Janet', 'Jonas', 'Helen', 'Virginia']
```

To represent such a list with an array, Python must adhere to the requirement that each cell of the array use the same number of bytes. Yet the elements are strings, and strings naturally have different lengths. Python could attempt to reserve enough space for each cell to hold the _maximum_ length string (not just of currently stored strings, but of any string we might ever want to store), but that would be wasteful.

Instead, Python represents a list or tuple instance using an internal storage mechanism of an array of object **references**. At the lowest level, what is stored is a consecutive sequence of memory addresses at which the elements of the sequence reside. A high-level diagram of such a list is shown in the figure below.

```{figure} https://i.ibb.co/CJmjccS/array4.png
---
name: array4
width: 90%
align: center
---
An array storing references to strings.
```

Although the relative size of the individual elements may vary, the number of bits used to store the memory address of each element is fixed (e.g., 64 bits per address). In this way, Python can support constant-time access to a list or tuple element based on its index.
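A brief sketch of this behavior, assuming CPython: `sys.getsizeof` reports only the container's own storage (the array of references plus a small header), not the objects referred to. The variable names are illustrative.

```python
import sys

short_name = 'Rene'
long_name = 'Virginia' * 1000   # a much longer string

# Two tuples with the same number of cells but very different element sizes.
small = (short_name, short_name)
large = (long_name, long_name)

# The containers occupy the same space: each cell holds one fixed-size reference.
print(sys.getsizeof(small) == sys.getsizeof(large))   # True

# A cell refers to the existing string object; no copy is stored in the list.
data = [short_name, long_name]
print(data[1] is long_name)                            # True
```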
