# Arrays

## RAM

A data structure is a way of organizing data so that it can be stored, accessed and modified efficiently.

We're going to be structuring data inside of RAM. RAM is where all of our variables are stored.

The "giga" in "gigabytes" means approximately $10^9$ - a billion. ("Mega" means approximately $10^6$ - a million; "kilo" means approximately $10^3$ - a thousand.)

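**Note:** When talking about RAM, these units are traditionally powers of 2: a "kilobyte" is $2^{10} = 1024$ bytes; a "megabyte" is $1024 \times 1024 = 2^{20}$ bytes (1024 kilobytes); a "gigabyte" is $1024 \times 1024 \times 1024 = 2^{30}$ bytes (1024 megabytes). (The unambiguous IEC names for these binary units are kibibyte (KiB), mebibyte (MiB) and gibibyte (GiB).)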

A byte is 8 bits.

A bit can be thought of as a *position* that stores a single binary digit: either a 0 or a 1.

Integers, floating point numbers, characters, booleans, etc. are stored in RAM as bytes. It's pretty common for integers to be represented as 4 bytes (32 bits).
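As an illustration, Python's standard `struct` module can pack an integer into the raw bytes that would sit in RAM. This sketch assumes the `'<i'` format (a 4-byte, little-endian, signed int); the byte order used by real hardware varies.

```python
import struct

# Pack the integer 258 as a 4-byte, little-endian, signed int ('<i').
raw = struct.pack('<i', 258)

print(len(raw))   # 4 bytes = 32 bits
print(list(raw))  # [2, 1, 0, 0], since 258 = 1 * 256 + 2

# Unpacking the same 4 bytes recovers the original integer.
(value,) = struct.unpack('<i', raw)
print(value)      # 258
```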

**Note:** In PyTorch, full precision floats are **typically** represented as 4 bytes (32 bits), whereas ints are **typically** represented as 8 bytes (64 bits). The 8 byte integer data type is called a *'long'*. Floats can also be represented in half precision - 2 bytes (16 bits) or in double precision - 8 bytes (64 bits). The 8 byte floating point data type is called a *'double'*.
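PyTorch isn't needed to check these byte counts. The sketch below uses the standard `struct` module's format codes as a stand-in (an assumption on our part: `'e'` for half precision, `'f'` for full precision, `'d'` for double, `'i'` for a 4-byte int, `'q'` for an 8-byte 'long'). The `'='` prefix makes `struct` use standard sizes rather than platform-native ones.

```python
import struct

# struct format codes for each data type; '=' requests standard sizes.
formats = {
    'float16 (half)': '=e',
    'float32 (full)': '=f',
    'float64 (double)': '=d',
    'int32': '=i',
    'int64 (long)': '=q',
}

sizes = {name: struct.calcsize(fmt) for name, fmt in formats.items()}
for name, size in sizes.items():
    print(f'{name}: {size} bytes')
```

In PyTorch itself, `tensor.element_size()` reports the number of bytes per element for a tensor's dtype.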

Once each int is represented as (say) 4 bytes, it can be stored in RAM.

RAM can be thought of as a **contiguous** block of data. It has two components: values and addresses. The distinct location at which each value (e.g., an int) is stored in RAM is its *'address'*. Addresses are counted in **bytes**, so if 4-byte ints are stored one after another starting at address `$0`, the first int is at `$0`, the second at `$4`, the third at `$8`, and so on. See diagram in video.
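To see byte addressing on real (virtual) memory, the sketch below uses the standard `ctypes` module to build a C array of four ints and then reads each element back from the address `base + i * size`. The array contents are arbitrary; the actual base address is chosen by the interpreter and operating system, so only the offsets from it are predictable.

```python
import ctypes

IntArray4 = ctypes.c_int * 4          # type: a C array of 4 ints
arr = IntArray4(10, 20, 30, 40)       # stored contiguously in memory

base = ctypes.addressof(arr)          # address of the first element
size = ctypes.sizeof(ctypes.c_int)    # bytes per int (usually 4)

values = []
for i in range(len(arr)):
    address = base + i * size         # byte address of element i
    # Read the int stored at that address.
    values.append(ctypes.c_int.from_address(address).value)
    print(f'index {i}: offset {address - base} bytes, value {values[-1]}')
```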

When we store an array in RAM, we don't get to choose what location it's going to be at. But what's certain is that arrays are **always** contiguous. This is the unique thing about arrays: in memory, they look exactly the same as the way we use them.

A character takes only one byte to store in memory. At least, that's the typical case with ASCII characters. We can store values contiguously regardless of how big or small they are, as long as we increment the address by the size of the value.
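A quick check with plain Python: encoding an ASCII string gives exactly one byte per character, so element `i` sits at offset `i * 1`. The last line shows that a non-ASCII character can need more than one byte in UTF-8.

```python
# Encode a short ASCII string: each character becomes exactly one byte.
text = 'cab'
data = bytearray(text.encode('ascii'))

print(len(data))   # 3 bytes for 3 characters
print(list(data))  # [99, 97, 98], the ASCII codes of 'c', 'a', 'b'

# With 1-byte elements, element i sits at offset i * 1 from the start.
for i, byte in enumerate(data):
    print(f'offset {i}: {chr(byte)!r}')

# A non-ASCII character such as 'é' takes 2 bytes in UTF-8.
print(len('é'.encode('utf-8')))  # 2
```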

## Static Arrays

With any data structure, the two most common operations are (i) reading the data and (ii) writing to the data.

To read (say) the first element of an array, the intuitive thing is to go to the address of the first element in RAM and read the value stored there. That's exactly what's happening under the hood, but as programmers we don't need to know the exact addresses. Instead, we use indexes to access values. When we do so, the programming language computes the element's address from the address of the first element, $\text{address}(i) = \text{address}(0) + i \times (\text{size of one element})$, and reads the value there. In other words, the programming language can automatically map ANY index in our array to its address in memory, in $O(1)$ time.

We don't get to decide exactly where to store our array (or any variable for that matter) in RAM.

Static arrays are fixed-size arrays. When we delete an element from a static array, we don't actually de-allocate the memory associated with that element; instead, the value is overwritten by a default value such as `0`, `-1` or `None`.

Reading from an arbitrary position, writing to an arbitrary position (i.e., overwriting the current value), inserting at the end (provided there is spare capacity) and deleting from the end are the efficient operations with a static array; each of these has a time complexity of $O(1)$.

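**Length vs. size of a static array:** Length is the number of elements currently stored, whereas size (also called *capacity*) is the number of slots allocated for the array.

Inserting an element at an arbitrary position is an $O(n)$ operation, where $n$ is the length of the array, because every element after that position has to shift one slot to the right. The same is true for deleting from an arbitrary position, where every later element shifts one slot to the left.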

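**Note:** In big $O$ notation, $n$ denotes the size of the input (here, the length of the array), and the bound we quote is usually the worst case. For example, inserting at index 0 shifts all $n$ elements, whereas inserting at the end shifts none. (Since we don't want to go through every possible scenario, we generalize to the worst case.)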
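To make the worst case concrete, this sketch counts how many elements move when inserting into an array of length 5 at each possible index. `insert_and_count` is a hypothetical helper for illustration; its shifting loop is the same back-to-front shift used by `insert_middle` further down.

```python
def insert_and_count(arr, length, i, value):
    """Insert value at index i of a static array that has spare capacity.

    Returns the number of element moves performed.
    """
    moves = 0
    for index in range(length - 1, i - 1, -1):  # shift right, back to front
        arr[index + 1] = arr[index]
        moves += 1
    arr[i] = value
    return moves


n = 5
moves_by_index = []
for i in range(n + 1):
    arr = [10, 20, 30, 40, 50, None]  # length 5, capacity 6
    moves_by_index.append(insert_and_count(arr, n, i, 99))

print(moves_by_index)  # [5, 4, 3, 2, 1, 0]: index 0 is the worst case
```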

In [None]:
my_array = [1, 3, 5]

**Note:** Here, we're pretending that `my_array` is a static array.

In [None]:
i = 1
my_array[i]

3

This algorithm has a time complexity of $O(1)$.

In [None]:
for i in range(len(my_array)):
    print(my_array[i])

1
3
5


In [None]:
i = 0
while i < len(my_array):
    print(my_array[i])
    i += 1

1
3
5


This algorithm has a time complexity of $O(n)$.

In [None]:
def remove_end(arr, length):
    if length > 0:
        arr[length - 1] = None
    # We will also consider the length to be decreased by 1.

This algorithm has a time complexity of $O(1)$.

In [None]:
length = 3 # The number of elements that aren't None.

**Note:** The length of a static array needs to be maintained separately.

In [None]:
remove_end(my_array, length)
length -= 1
print(my_array)
print(length)

[1, 3, None]
2


**Note:** Python passes `my_array` to `remove_end` as a reference to the same list object, not as a copy. Since lists are mutable, any modification made to `arr` inside the function is therefore visible through `my_array` outside it.
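A small sketch of the distinction: mutating the list through the parameter is visible to the caller, but rebinding the parameter to a new list is not. `modify` and `rebind` are hypothetical names used only for illustration.

```python
def modify(arr):
    arr[0] = 'changed'   # mutates the list object the caller also sees


def rebind(arr):
    arr = ['new list']   # rebinds the local name only; the caller is unaffected


data = [1, 2, 3]
modify(data)
print(data)  # ['changed', 2, 3]

rebind(data)
print(data)  # still ['changed', 2, 3]
```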

In [None]:
# Again:
remove_end(my_array, length)
length -= 1
print(my_array)
print(length)

[1, None, None]
1


In [None]:
def remove_middle(arr, i, length):
    for index in range(i + 1, length):
        arr[index - 1] = arr[index]
    arr[length - 1] = None
    # We will also consider the length to be decreased by 1.

This algorithm has a time complexity of $O(n)$.

In [None]:
my_array = [4, 5, 6]
length = 3

In [None]:
remove_middle(my_array, 0, length)
length -= 1
print(my_array)
print(length)

[5, 6, None]
2


In [None]:
# Again:
remove_middle(my_array, 0, length)
length -= 1
print(my_array)
print(length)

[6, None, None]
1


In [None]:
def insert_end(arr, n, length, capacity):
    if length < capacity:
        arr[length] = n
    # We will also consider the length to be increased by 1.

This algorithm has a time complexity of $O(1)$.

In [None]:
insert_end(my_array, 7, length, capacity=3)
length += 1
print(my_array)
print(length)

[6, 7, None]
2


In [None]:
my_array = [1, 5, 7, 9, None, None]
length = 4

Let's say that we want to insert `3` at index `1`.

In [None]:
def insert_middle(arr, i, n, length):
    for index in range(length - 1, i - 1, -1):
        arr[index + 1] = arr[index]
    arr[i] = n
    # We will also consider the length to be increased by 1.

This algorithm has a time complexity of $O(n)$.

In [None]:
insert_middle(my_array, 1, 3, length)
length += 1
print(my_array)
print(length)

[1, 3, 5, 7, 9, None]
5


In [None]:
insert_middle(my_array, 3, 6, length)
length += 1
print(my_array)
print(length)

[1, 3, 5, 6, 7, 9]
6


### Remove Duplicates From Sorted Array

In [None]:
def remove_middle(arr, i, length):
    for index in range(i + 1, length):
        arr[index - 1] = arr[index]
    arr[length - 1] = None
    return length - 1

In [None]:
def remove_duplicates(nums):
    k = len(nums)
    i = 0
    while i < k - 1: # This will go up to the second-to-last element in the array.
        while nums[i] == nums[i + 1]:
            # Remove next element.
            k = remove_middle(nums, i + 1, k)
        i += 1
    return k

In [None]:
# Test:
nums = [1, 1, 2]
k = remove_duplicates(nums)
print(nums)
print(k)

[1, 2, None]
2


In [None]:
# Test:
nums = [0, 0, 1, 1, 1, 2, 2, 3, 3, 4]
k = remove_duplicates(nums)
print(nums)
print(k)

[0, 1, 2, 3, 4, None, None, None, None, None]
5
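Each call to `remove_middle` above shifts the rest of the array, so this approach can take $O(n^2)$ time in the worst case. A common alternative (not the approach used above) is the two pointers technique: a `read` index scans the sorted list while a `write` index marks where the next unique value goes, giving $O(n)$ time. `remove_duplicates_two_pointers` is a hypothetical name; unlike the version above, it leaves the slots past `k` unchanged rather than setting them to `None`.

```python
def remove_duplicates_two_pointers(nums):
    """Keep the unique values of a sorted list at the front; return their count.

    Slots at index k and beyond are left as-is (not set to None).
    """
    if not nums:
        return 0
    write = 1  # next slot for a new unique value
    for read in range(1, len(nums)):
        if nums[read] != nums[write - 1]:  # a value not kept yet
            nums[write] = nums[read]
            write += 1
    return write


nums = [0, 0, 1, 1, 1, 2, 2, 3, 3, 4]
k = remove_duplicates_two_pointers(nums)
print(nums[:k], k)  # [0, 1, 2, 3, 4] 5
```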


### Remove Element

In [None]:
def remove_middle(arr, i, length):
    for index in range(i + 1, length):
        arr[index - 1] = arr[index]
    arr[length - 1] = None
    return length - 1

In [None]:
def remove_element(nums, val):
    k = len(nums)
    i = 0
    while i < k:
        while nums[i] == val:
            # Remove nums[i].
            k = remove_middle(nums, i, k)
        i += 1
    return k

In [None]:
# Test:
nums = [3, 2, 2, 3]
val = 3
k = remove_element(nums, val)
print(nums)
print(k)

[2, 2, None, None]
2


In [None]:
# Test:
nums = [0, 1, 2, 2, 3, 0, 4, 2]
val = 2
k = remove_element(nums, val)
print(nums)
print(k)

[0, 1, 3, 0, 4, None, None, None]
5
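The same two pointers idea applies here: copy every value that isn't `val` forward to the `write` position, which avoids the repeated shifting done by `remove_middle` and runs in $O(n)$ time. This is a sketch with a hypothetical name; it also leaves the slots past `k` unchanged instead of writing `None`.

```python
def remove_element_two_pointers(nums, val):
    """Move every value != val to the front of nums; return how many there are."""
    write = 0  # next slot for a kept value
    for read in range(len(nums)):
        if nums[read] != val:
            nums[write] = nums[read]
            write += 1
    return write


nums = [0, 1, 2, 2, 3, 0, 4, 2]
k = remove_element_two_pointers(nums, 2)
print(nums[:k], k)  # [0, 1, 3, 0, 4] 5
```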


## Dynamic Arrays

Adding elements to the end of an array is called *'pushing'* to the array. In Python, it's done with the `append` method. As we shall see, pushing to a dynamic array has a time complexity of amortized $O(1)$.

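In the internal implementation of a dynamic array, a *'pointer'* (an integer) is maintained that marks the end of the occupied part of the array: the index of the last element or, equivalently, the length (one past that index). This is why the `len` function can return the length of a dynamic array in $O(1)$ time, without counting the elements. (CPython's list, for instance, stores its length and its allocated capacity as separate fields.)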

**Note:** The term *'pointer'* has two distinct meanings (depending on the context). For example, in the context of a linked list, it's a reference to another node object. However, in the context of an array, it's an index of an element. You will see the latter meaning being used in the '*two pointers*' pattern.

Removing elements from the end of the array is called *'popping'* from the array. In Python, it's done with the `pop` method. Popping from a dynamic array has a time complexity of $O(1)$. When we pop, the pointer maintaining the index of the last element of the array is shifted to the left by 1 position.

In [None]:
my_list = [6, 7, 8]
el = my_list.pop()
print(el)
print(my_list)

8
[6, 7]
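Below is a minimal sketch of what a dynamic array might do internally, built on a fixed-size Python list that stands in for a block of RAM. `DynamicArray` and its starting capacity of 1 are assumptions for illustration; CPython's real list uses a different over-allocation rule, not plain doubling.

```python
class DynamicArray:
    """Sketch of a dynamic array that doubles its capacity when full."""

    def __init__(self, capacity=1):
        self.capacity = max(1, capacity)
        self.length = 0                       # one past the index of the last element
        self.arr = [None] * self.capacity     # stands in for a fixed-size block of RAM

    def push(self, value):
        if self.length == self.capacity:      # out of space: double, then copy
            self._resize(2 * self.capacity)
        self.arr[self.length] = value
        self.length += 1

    def pop(self):
        if self.length == 0:
            raise IndexError('pop from empty array')
        self.length -= 1                      # move the pointer one position left
        value = self.arr[self.length]
        self.arr[self.length] = None          # overwrite the freed slot
        return value

    def _resize(self, new_capacity):
        new_arr = [None] * new_capacity       # allocate the bigger block
        for i in range(self.length):          # copy every element: O(n)
            new_arr[i] = self.arr[i]
        self.arr = new_arr
        self.capacity = new_capacity

    def __len__(self):
        return self.length                    # O(1): no counting needed

    def __getitem__(self, i):
        if not 0 <= i < self.length:
            raise IndexError('index out of range')
        return self.arr[i]


da = DynamicArray()
for x in [6, 7, 8]:
    da.push(x)
print(len(da), da.capacity)  # 3 4
popped = da.pop()
print(popped, len(da))       # 8 2
```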


**Definition:** Amortized time complexity is the average time taken per operation over a sequence of operations.

Let's think about the overall time complexity when a dynamic array runs out of space and all of its elements need to be copied over to a new array of double the original size. Let the original size be $n$. Allocating memory for an array of size $2n$ has a time complexity of $O(2n)$, copying all the elements from the original array to the new array is another $O(n)$ operation, and appending the new element is an $O(1)$ operation. So the overall time complexity of that particular append is $O(3n + 1) = O(n)$ (since we can ignore constants).

However, we don't have to resize the array every time we insert an element; we only have to do it when the array runs out of space, which is an infrequent event. Because the capacity doubles each time, starting from a capacity of 1 the resizes copy $1, 2, 4, \ldots$ elements, and this geometric series sums to less than $2n$ over $n$ appends. Spreading that $O(n)$ total work over the $n$ appends gives $O(1)$ per append on average, so we say that appending a single element to a dynamic array is amortized $O(1)$. (The formal proof relies on exactly this geometric series.)
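Written out as a short derivation, assuming the capacity starts at 1, doubles on every resize, and $n = 2^m$ appends are performed (a simplifying assumption), the total work counted as element writes, element copies and allocated slots is:

$$
\underbrace{n}_{\text{writes}} + \underbrace{\sum_{j=0}^{m-1} 2^j}_{\text{copies}} + \underbrace{\sum_{j=1}^{m} 2^j}_{\text{allocations}} = n + (2^m - 1) + (2^{m+1} - 2) = n + (n - 1) + (2n - 2) = 4n - 3 = O(n)
$$

Dividing by the $n$ appends gives fewer than 4 units of work per append, i.e., amortized $O(1)$. (The initial allocation of capacity 1 is not counted.)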

Why double the capacity when the array runs out of space? This is a middle ground between (a) having to resize the array & copy over all the elements every time (which is an expensive operation in terms of time) and (b) allocating an excessive amount of empty space (which is costly in terms of memory).
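To put numbers on option (a), this sketch counts the element copies made while pushing 1024 items when the array grows by doubling versus growing by just one slot per resize. `total_copies` is a hypothetical helper, and both strategies are assumed to start at capacity 1.

```python
def total_copies(n, grow):
    """Count element copies made while pushing n items, starting at capacity 1.

    `grow` maps the old capacity to the new one when the array is full.
    """
    capacity, length, copies = 1, 0, 0
    for _ in range(n):
        if length == capacity:
            copies += length          # copy every existing element
            capacity = grow(capacity)
        length += 1
    return copies


n = 1024
doubling = total_copies(n, lambda c: 2 * c)
plus_one = total_copies(n, lambda c: c + 1)
print(doubling)  # 1023, i.e. 1 + 2 + 4 + ... + 512
print(plus_one)  # 523776, i.e. 1 + 2 + ... + 1023
```

Doubling keeps total copying linear in $n$, while growing by a constant makes it quadratic; growing by a larger factor would reduce copies further but leave more unused memory.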

### Concatenation of Array

In [None]:
def get_concatenation(nums):
    ans = []
    for _ in range(2): # Two passes over nums.
        for num in nums:
            ans.append(num)
    return ans

In [None]:
# Test:
nums = [1, 2, 1]
ans = get_concatenation(nums)
print(ans)

[1, 2, 1, 1, 2, 1]


In [None]:
# Test:
nums = [1, 3, 2, 1]
ans = get_concatenation(nums)
print(ans)

[1, 3, 2, 1, 1, 3, 2, 1]
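For comparison, Python lists also support this directly with the `+` and `*` operators. Both build a new list and leave `nums` unchanged.

```python
nums = [1, 3, 2, 1]
print(nums + nums)  # [1, 3, 2, 1, 1, 3, 2, 1]
print(nums * 2)     # [1, 3, 2, 1, 1, 3, 2, 1]
```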


## Stacks

A stack supports three operations: push, pop and peek. All three are efficient: pop and peek are $O(1)$, and push is $O(1)$ (amortized $O(1)$ when the stack is backed by a dynamic array).

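A stack is a LIFO (last in, first out) data structure, and it can be implemented with a dynamic array: push appends to the end of the array, pop removes from the end, and peek reads the last element.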

One application of a stack is to reverse a sequence, e.g., reversing `['a', 'b', 'c']` to get `['c', 'b', 'a']`. However, there are other ways of doing the same thing.
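A minimal sketch of reversing with a stack: push every item, then pop until the stack is empty. `reverse_with_stack` is a hypothetical name; slicing (`items[::-1]`) is one of the other ways to get the same result.

```python
def reverse_with_stack(items):
    stack = []
    for item in items:      # push every item: the last one ends up on top
        stack.append(item)
    result = []
    while stack:            # pop until empty: items come off in reverse order
        result.append(stack.pop())
    return result


print(reverse_with_stack(['a', 'b', 'c']))  # ['c', 'b', 'a']
```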

There are many other use cases for stacks, some of which are a lot more complex.

The *top* pointer of a stack is the index of the last item. It is equal to `len(stack) - 1`.

It's good practice to check whether the stack is empty before popping (or peeking) to avoid errors: popping from an empty Python list raises an `IndexError`.

In [None]:
stack = []
try:
    el = stack.pop()
    print(el)
except IndexError as er:
    print(er)

pop from empty list
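One way to avoid the exception is to wrap the list in a small class that checks for emptiness first. This is a sketch; returning `None` from `pop` and `peek` on an empty stack is a design choice assumed here, and raising a custom error would be an equally valid option.

```python
class Stack:
    """A minimal stack on top of a Python list (a dynamic array)."""

    def __init__(self):
        self.items = []

    def push(self, value):
        self.items.append(value)      # amortized O(1)

    def pop(self):
        if self.is_empty():           # check first instead of catching IndexError
            return None
        return self.items.pop()       # O(1)

    def peek(self):
        if self.is_empty():
            return None
        return self.items[-1]         # the top: index len(self.items) - 1

    def is_empty(self):
        return not self.items


s = Stack()
print(s.pop())   # None instead of an IndexError
s.push(1)
s.push(2)
print(s.peek())  # 2
print(s.pop())   # 2
print(s.peek())  # 1
```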


### Valid Parentheses

In [None]:
open_stack = []
if open_stack:
    print("Yay!")
if not open_stack:
    print("Nay!")

Nay!


An empty list is *falsy*: it behaves like `False` within an `if` condition, while a non-empty list behaves like `True`.

In [None]:
def is_valid(s):
    open_stack = []
    close_to_open = {')': '(', '}': '{', ']': '['}
    for c in s:
        if c in close_to_open:
            if open_stack: # Not empty.
                if open_stack[-1] != close_to_open[c]:
                    return False
                else:
                    open_stack.pop()
            else:
                return False
        else:
            open_stack.append(c)
    return not open_stack # True only if every opening bracket was closed.

In [None]:
# Test:
s = "([{}])"
is_valid(s)

True

In [None]:
# Test:
s = "()[]{}"
is_valid(s)

True

In [None]:
# Test:
s = "[(])"
is_valid(s)

False

### Min Stack

In [None]:
float('-inf')

-inf

In [None]:
class MinStack:
    def __init__(self):
        self.stack = []
        self.min_idxs = []

    def push(self, val):
        self.stack.append(val)
        if not self.min_idxs: # Empty.
            self.min_idxs.append(0)
        else:
            if val < self.stack[self.min_idxs[-1]]:
                self.min_idxs.append(len(self.stack) - 1) # Append the top pointer.
            else:
                self.min_idxs.append(self.min_idxs[-1]) # Append the previous pointer.

    def pop(self):
        self.stack.pop()
        self.min_idxs.pop()

    def peek(self):
        return self.stack[-1]

    def get_min(self):
        return self.stack[self.min_idxs[-1]]

In [None]:
min_stack = MinStack()
min_stack.push(1)
min_stack.push(2)
min_stack.push(0)
print(min_stack.get_min()) # Prints 0.
min_stack.pop()
print(min_stack.peek())    # Prints 2.
print(min_stack.get_min()) # Prints 1.

0
2
1
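A common variant (not the approach used above) stores, alongside each value, the minimum of that value and everything below it, so no indexes into `self.stack` are needed. `MinStackPairs` is a hypothetical name; every operation is still $O(1)$ (push is amortized $O(1)$).

```python
class MinStackPairs:
    """Variant of MinStack that stores (value, minimum so far) pairs."""

    def __init__(self):
        self.stack = []  # each entry: (value, min of this entry and all below it)

    def push(self, val):
        current_min = min(val, self.stack[-1][1]) if self.stack else val
        self.stack.append((val, current_min))

    def pop(self):
        self.stack.pop()

    def peek(self):
        return self.stack[-1][0]

    def get_min(self):
        return self.stack[-1][1]


min_stack = MinStackPairs()
min_stack.push(1)
min_stack.push(2)
min_stack.push(0)
print(min_stack.get_min())  # 0
min_stack.pop()
print(min_stack.peek())     # 2
print(min_stack.get_min())  # 1
```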
