# Fundamental Data Structures and Algorithms 04a - Arrays and Linked Lists

## Unit 3: Basic Data Structures
---

### Objective

- Differentiate data structures from data types
- Introduce basic data structures
 - Arrays
   - Python `list` (recap)
   - NumPy arrays (recap)
 - Linked Lists

---

### Data Types vs Data Structures

| Property       | Data Type                                                    | Data Structure                                               |
| -------------- | ------------------------------------------------------------ | ------------------------------------------------------------ |
| Definition     | A data type describes the nature of a value: which values are allowed and which operations apply to them. For example, an integer data type describes every integer that the computer can handle. | A data structure is a collection that holds data and organizes it so that operations and algorithms can be applied more easily. For example, tree data structures often allow efficient searching. |
| Implementation | Abstract: each language defines its data types in its own way. | Concrete: the implementation determines what data the structure stores and how that data is handled. |
| Storage        | A data type stores no value itself; it only describes the kind of value that can be stored. | A data structure holds the actual values, which occupy space in main memory. A single data structure can also hold data of different types. |
| Assignment     | Because the data type already describes the kind of value that can be stored, values can be assigned directly to variables of that type. | Data is added and removed through a set of operations such as push and pop. |
| Performance    | Only the type and nature of the data are of concern, so time complexity is not an issue. | Time complexity matters, because a data structure is mainly about manipulating the data it stores. |
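
As a small illustration of the Assignment row, the sketch below contrasts assigning a value of a built-in data type with using a Python `list` as a stack, where data enters and leaves through `append` (push) and `pop`. The variable names are illustrative only.

```python
# A value of a data type is assigned directly.
count = 42
print(type(count), count)

# A data structure is filled and emptied through its operations.
stack = []            # a list used as a stack
stack.append('a')     # push
stack.append('b')     # push
top = stack.pop()     # pop removes and returns the last item pushed
print(type(stack), stack, top)
```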

***Recap***  
> Discuss some of the data types and data structures that you have learnt.

---

### Introduction to Arrays

- Arrays generally structure other (fundamental) objects in dimensions like 2-dimensional rows and columns
 - In the simplest case, a one-dimensional array then represents, mathematically speaking, a *vector* of numbers. We can think of it as a single row or a single column of numbers.
 - In a more common case, an array represents an $(i × j)$ matrix of elements.
 - This concept generalizes to $(i × j × k)$ cubes of elements in three dimensions as well as to general *n*-dimensional arrays of shape $(i × j × k × l × …)$.
 - Most importantly, arrays store data in a **contiguous** block of memory.
  
> What about lists? Are arrays and lists the same? What about NumPy arrays?

---

### Arrays with the built-in `list`

***Recap***  
A simple list can already be considered a one-dimensional array:

In [None]:
'''empty list'''
a = list()                     # list constructor
print('a     :', type(a), a)   # type()
print('len(a):', len(a), '\n') # len() returns the number of objects

b = []
print('b     :', type(b), b)
print('len(b):', len(b))

# Note: brackets [] are the preferred method of initializing a list
# Use list() only when you need to convert from other sequences, e.g. tuple -> list

In [None]:
'''list of integers + accessing index'''
c = [1, 2, 3]
print('c     :', type(c), c)
print('c[1]  :', type(c[1]), c[1])  # c[1] accesses index 1, i.e. the second element
print('len(c):', len(c))

In [None]:
'''list of floats + negative index'''
d = [1.0, 2.0, 3.0]
print('d   :', type(d), d)
print('d[-1]:', type(d[-1]), d[-1])

In [None]:
'''list of strings'''

'''which is correct?'''
e = [x, y, z]
print(type(e), e)
print(type(e[1]), e[1])

f = ['123']
print(type(f), f)
print(type(f[0]), f[0])

In [None]:
'''list of mixed objects'''
g = [1, 'a', b, {'myKey':99}]
print(type(g), g)
print(type(g[2]), g[2])
print(type(g[3]), g[3])

In [None]:
'''
accessing range of index

syntax -> [start:stop:step]
Example:
          +---+---+---+---+---+---+
          | P | y | t | h | o | n |
          +---+---+---+---+---+---+
index       0   1   2   3   4   5
index      -6  -5  -4  -3  -2  -1
'''
g = list('Python')
print(g[1:])
print(g[:2])
print(g[1:2])

In [None]:
'''accessing index beyond range'''
print(g[10])
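
Indexing past the end of a list raises an `IndexError`, while a slice that runs past the end is silently clipped. A minimal sketch of both behaviours, using a fresh list named `letters` (an illustrative name) so that it runs on its own:

```python
letters = list('Python')

# Indexing beyond the last valid index (5) raises IndexError.
try:
    print(letters[10])
except IndexError as err:
    print('IndexError:', err)

# Slicing beyond the range does not raise; the result is clipped.
print(letters[4:10])   # ['o', 'n']
print(letters[10:])    # []
```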

---

Since list objects can contain arbitrary other objects, they can also contain other list objects. In that way, two- and higher-dimensional arrays are easily constructed by nested list objects:

In [None]:
'''nested empty list'''
h = [[]]
print('h     :', h)
print('len(h):', len(h)) # note that a nested empty list has length 1

In [None]:
'''nested list using previously defined "g" '''
i = [g, g, g]
print('i     :', i)
print('len(i):', len(i))

In [None]:
'''accessing nested list'''
print('i[0]   : ', i[0])
print('i[0][0]: ',i[0][0])

---

Note that combining objects in the way just presented generally works with reference pointers to the original objects. What does that mean in practice? Let us have a look at the following operations:

In [None]:
'''modify g'''
g[0] = 123
print('i:', i) # note that i also changes if g changes
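
If independent rows are needed, the inner lists have to be copied rather than referenced. The sketch below uses `copy.deepcopy` from the standard library; the names `row`, `shared` and `independent` are illustrative.

```python
import copy

row = list('abc')

shared = [row, row, row]                               # three references to one list
independent = [copy.deepcopy(row) for _ in range(3)]   # three separate copies

row[0] = 123
print('shared     :', shared)       # every row shows the change
print('independent:', independent)  # the copies are unaffected

print(shared[0] is shared[1])            # True: same object
print(independent[0] is independent[1])  # False: different objects
```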

Now let's look at basic arithmetic operations with lists:

In [None]:
'''add 2 lists together'''
j = c + d
print('j:', j)

In [None]:
'''add list to another object (not supposed to work)'''
k = j + 3
print('k:', k)

In [None]:
'''multiply a list with an integer'''
l = j * 2
print('l:', l)

In [None]:
'''multiply a list with a float (not supposed to work)'''
m = j * 2.0
print('m:', m)

In [None]:
'''division of a list by an integer (not supposed to work)'''
n = j / 2
print('n:', n)

In [None]:
'''multiply a list with another list (not supposed to work)'''
o = [1, 2] * [3, 4]
print('o:', o)
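
As the cells above show, `+` concatenates and `*` repeats; neither operates element by element. With plain lists, element-wise arithmetic has to be written out explicitly, for example with a list comprehension and `zip`. A minimal sketch with illustrative names:

```python
xs = [1, 2, 3]
ys = [1.0, 2.0, 3.0]

print(xs + ys)      # concatenation: [1, 2, 3, 1.0, 2.0, 3.0]
print(xs * 2)       # repetition:    [1, 2, 3, 1, 2, 3]

# Element-wise operations need an explicit loop or comprehension.
sums = [x + y for x, y in zip(xs, ys)]
halves = [x / 2 for x in xs]
products = [x * y for x, y in zip(xs, ys)]
print(sums)         # [2.0, 4.0, 6.0]
print(halves)       # [0.5, 1.0, 1.5]
print(products)     # [1.0, 4.0, 9.0]
```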

---

Python has a set of built-in methods (functions) that you can use on lists. Example:

In [None]:
'''append()'''
p = [9,4,2,6,8,1,3,7]
print('original p         :', p)

p.append(5)
print('p after appending 5:', p)

In [None]:
'''sort()'''
p.sort()
print('p.sort():', p)

In [None]:
'''pop()'''
p.pop()
print('p.pop():', p)

'''pop(index)'''
p.pop(2)
print('p.pop(2):', p)

In [None]:
'''reverse()'''
p.reverse()
print('p.reverse():',p)

Other methods include `copy()`, `count()`, `extend()`, `index()`, `insert()` and `remove()`. You are encouraged to explore them in the [official documentation](https://docs.python.org/3/tutorial/datastructures.html).

The table below lists the time complexities of various list functions and operations:

|Operation     | Example                  | Big O Notation     | Notes                                              |
|:-------------|:-------------------------|:-------------------|:---------------------------------------------------|
|access        | `l[i]`                   | $O(1)$             |                                                    |
|assignment    | `l[i] = 0`               | $O(1)$             |                                                    |
|length        | `len(l)`                 | $O(1)$             |                                                    |
|append        | `l.append(x)`            | $O(1)$             | amortized; equivalent to `l[len(l):] = [x]`        |
|pop           | `l.pop()`                | $O(1)$             | equivalent to `l.pop(-1)`, popping at end          |
|clear         | `l.clear()`              | $O(1)$             | similar to `l = []`                                |
|slice         | `l[a:b]`                 | $O(b-a)$           | depends on the number of elements in the slice     |
|extend        | `l.extend(iterable)`     | $O(len(iterable))$ | equivalent to `l[len(l):] = iterable`;             |
|              |                          |                    | depends on number of elements in iterable          |
|constructor   | `list(iterable)`         | $O(len(iterable))$ | depends on number of elements in iterable          |
|comparison    | `l1 == l2` or `l1 != l2` | $O(n)$             |                                                    |
|index         | `l.index(i)`             | $O(n)$             |                                                    |
|insert        | `l.insert(i, x)`         | $O(n)$             |                                                    |
|delete        | `del l[i]`               | $O(n)$             |                                                    |
|containment   | `x in l` or `x not in l` | $O(n)$             | linearly searches list                             |
|copy          | `l.copy()`               | $O(n)$             | equivalent to `l[:]`                               |
|count         | `l.count(x)`             | $O(n)$             |                                                    |
|remove        | `l.remove(x)`            | $O(n)$             |                                                    |
|pop           | `l.pop(i)`               | $O(n)$             |                                                    |
|min/max       | `min(l)/max(l)`          | $O(n)$             | linearly searches list for value                   |
|reverse       | `l.reverse()`            | $O(n)$             |                                                    |
|iteration     | `for v in l:`            | $O(n)$             |                                                    |
|sort          | `l.sort()`               | $O(n\log{n})$      | [Timsort](http://svn.python.org/projects/python/trunk/Objects/listsort.txt)                                                  |
|multiply      | `k*l`                    | $O(k\ n)$          | example 1: `5*l` $\rightarrow O(n)$;               |
|              |                          |                    | example 2: `len(l)*l` $\rightarrow O(n^2)$         |
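
One way to see the difference between an $O(1)$ and an $O(n)$ operation is to time appending at the end against inserting at the front. The sketch below uses `timeit` from the standard library; the helper names and the size `n` are illustrative, and absolute timings vary by machine.

```python
import timeit

n = 20_000

def build_with_append():
    out = []
    for x in range(n):
        out.append(x)       # amortized O(1) per call
    return out

def build_with_insert_front():
    out = []
    for x in range(n):
        out.insert(0, x)    # O(n) per call: every element shifts right
    return out

t_append = timeit.timeit(build_with_append, number=5)
t_insert = timeit.timeit(build_with_insert_front, number=5)
print('append      :', t_append)
print('insert(0, x):', t_insert)
```

On most machines the second figure should be noticeably larger, and the gap widens as `n` grows.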

---

### Array with NumPy

https://www3.ntu.edu.sg/home/ehchua/programming/webprogramming/Python4_DataAnalysis.html#numpy


- The `list` class has been built with a broad and general scope.
- Array-specific work can benefit from a more specialized class, such as the one provided by the NumPy library:
 - largely written in C
 - able to handle n-dimensional arrays conveniently and efficiently

In [None]:
'''syntax to invoke the NumPy library'''
import numpy

Let's illustrate with examples

In [None]:
'''initializing a numpy array'''
q = numpy.array([0, 0.5, 1.0, 1.5, 2.0])
print('q', type(q), q) # note the difference in the output

Alternatively we can import a library using a shorthand. The example below is commonly used:

In [None]:
'''import as np'''
import numpy as np

q = np.array([0, 0.5, 1.0, 1.5, 2.0])
print('q', type(q), q) # note that this is equivalent to the one above

![array](http://jalammar.github.io/images/numpy/create-numpy-array-1.png)

---

NumPy also provides many functions to create arrays:

In [None]:
'''create an array of all zeros'''
r = np.zeros((2,3))
print('np.zeros((2,3)):\n', r)

In [None]:
'''create an array of all ones'''
s = np.ones((3,2))
print('np.ones((3,2)):\n', s)

In [None]:
'''create a constant array'''
t = np.full((2,2), 7)
print('np.full((2,2), 7):\n', t)

In [None]:
'''create a 4x4 identity matrix'''
u = np.eye(4)
print('np.eye(4):\n', u)

In [None]:
'''create an array of random values'''
v = np.random.random((3,3))
print('np.random.random((3,3)):\n', v)

**Visualizing NumPy array creation**

![create](http://jalammar.github.io/images/numpy/create-numpy-array-ones-zeros-random.png)

![create2](http://jalammar.github.io/images/numpy/numpy-array-create-2d.png)

![create3](http://jalammar.github.io/images/numpy/numpy-matrix-ones-zeros-random.png)

You can read about other methods of array creation in the [documentation](https://numpy.org/doc/stable/user/basics.creation.html#arrays-creation).

---

**NumPy Array index**

In [None]:
'''accessing index and slice similar to python list'''
print('v[2]     :\n', v[2])
print('\nv[1:2] :\n', v[1:2])
print('\nv[1][2]:\n', v[1][2])
print('\nv[1, 2]:\n', v[1, 2]) # note that array[row,col] is equivalent to array[row][col] in numpy

**Visualizing NumPy array index**

Example 1:
![index](http://jalammar.github.io/images/numpy/numpy-array-slice.png)

Example 2:
![index2](http://jalammar.github.io/images/numpy/numpy-matrix-indexing.png)

---

**NumPy Array Arithmetic Operations**

NumPy array arithmetic operations <u>differ</u> from those of the Python `list`:

In [None]:
pythonList = [3,2,1]
numpyArray = np.array([3,2,1])

In [None]:
'''addition'''
print('python list addition:')
print(pythonList + pythonList)       

print('\nnumpy array addition:')
print(numpyArray + numpyArray, '(method 1)')       
print(np.add(numpyArray, numpyArray), '(method 2)') 
print(np.add(pythonList, pythonList), '(method 3)') # numpy.add() works with Python list and NumPy arrays

In [None]:
'''subtraction 1 (not supposed to work)'''
print('python list subtraction:')
print(pythonList - pythonList)  

In [None]:
'''subtraction 2'''
print('\nnumpy array subtraction:')
print(numpyArray - np.sort(numpyArray), '(method 1)')       
print(np.subtract(numpyArray,np.sort(numpyArray)), '(method 2)') 

In [None]:
'''multiplication 1'''
print('python list x integer multiplication:')
print(3*pythonList)

print('\nnumpy array x integer multiplication:')
print(3*numpyArray, '(method 1)')
print(np.multiply(3,numpyArray), '(method 2)')

In [None]:
'''multiplication 2 (not supposed to work)'''
print('python list x python list multiplication:')
print(pythonList * pythonList)

In [None]:
'''multiplication 3'''
print('numpy array x numpy array multiplication:')
print(numpyArray * numpyArray, '(method 1)') # element-wise multiplication
print(np.multiply(numpyArray, numpyArray), '(method 2)') # equivalent to * operator

In [None]:
'''division 1 (not supposed to work)'''
print('python list / integer division:')
print(pythonList/2)

In [None]:
'''division 2'''
print('numpy array / integer division:')
print(numpyArray/2, '(method 1)')
print(np.divide(numpyArray, 2), '(method 2)')

In [None]:
'''division 3'''
print('numpy array / numpy array division:')
print(numpyArray / np.sort(numpyArray), '(method 1)') # element-wise division
print(np.divide(numpyArray, np.sort(numpyArray)), '(method 2)') # equivalent to / operator

In [None]:
'''power'''
print('numpyArray power 2:')
print(numpyArray**2, '(method 1)')
print(np.power(numpyArray, 2), '(method 2)')

In [None]:
'''square root:'''
print('numpyArray square root:')
print(numpyArray**0.5, '(method 1)')
print(np.sqrt(numpyArray), '(method 2)')

---

**Visualizing Arithmetic Operations on NumPy arrays (Example 1)**

Assume:
![arith1](http://jalammar.github.io/images/numpy/numpy-arrays-example-1.png)

Adding them up element-wise (i.e. adding the values at each matching position) is as simple as typing `data + ones`:
![arith2](http://jalammar.github.io/images/numpy/numpy-arrays-adding-1.png)

And it’s not only addition that we can do this way:
![arith3](http://jalammar.github.io/images/numpy/numpy-array-subtract-multiply-divide.png)

Scalar multiplication:
![arith4](http://jalammar.github.io/images/numpy/numpy-array-broadcast.png)

---

**Visualizing Arithmetic Operations on NumPy arrays (Example 2)**

We can add and multiply matrices using arithmetic operators (`+-*/`) if the two matrices are the same size. NumPy handles those as position-wise operations:
![arith5](http://jalammar.github.io/images/numpy/numpy-matrix-arithmetic.png)

We can get away with doing these arithmetic operations on matrices of different size only if the different dimension is one (e.g. the matrix has only one column or one row), in which case NumPy uses its broadcast rules for that operation:
![arith6](http://jalammar.github.io/images/numpy/numpy-matrix-broadcast.png)
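
The two cases above can be sketched as follows. The array names and values are illustrative; the calls used (`np.array`, `np.ones` and the arithmetic operators) are standard NumPy.

```python
import numpy as np

data = np.array([[1, 2], [3, 4], [5, 6]])      # shape (3, 2)
ones = np.ones((3, 2))                         # same shape as data

print(data + ones)          # position-wise addition

# Broadcasting: a (1, 2) row is stretched across all three rows of data.
ones_row = np.array([[1, 1]])                  # shape (1, 2)
print(data + ones_row)

# Incompatible shapes raise a ValueError.
try:
    data + np.ones((2, 3))
except ValueError as err:
    print('ValueError:', err)
```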

---

**Other NumPy Methods**

A major feature of NumPy is that it includes many built-in methods. Here are some examples:

In [None]:
w = np.array([1,2,3,4,5])

In [None]:
'''shape returns a tuple of dimension sizes, e.g. (row, col) for a 2-D array; note that this 1-D array gives (5,)'''
print('w.shape    :', w.shape)     # method 1
print('np.shape(w):', np.shape(w)) # method 2

In [None]:
'''sum()'''
print('w.sum()  :', w.sum())
print('np.sum(w):', np.sum(w))

![sum](http://jalammar.github.io/images/numpy/numpy-matrix-dot-product-2.png)

In [None]:
'''mean()'''
print('w.mean()  :', w.mean())
print('np.mean(w):', np.mean(w))

In [None]:
'''standard deviation'''
print('w.std()  :', w.std())
print('np.std(w):', np.std(w))

In [None]:
'''min/max'''
print('w.min()  :', w.min())
print('np.max(w):', np.max(w))

In [None]:
'''dot product (recall matrix)'''
print('w.dot(w)   :', w.dot(w))
print('np.dot(w,w):', np.dot(w,w))

![dot](http://jalammar.github.io/images/numpy/numpy-matrix-dot-product-1.png)

---

**Axis Parameter**

The axis parameter is best explained using visuals:

![axis1](http://community.datacamp.com.s3.amazonaws.com/community/production/ckeditor_assets/pictures/332/content_arrays-axes.png)

Example:
![axis2](http://jalammar.github.io/images/numpy/numpy-matrix-aggregation-4.png)
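
In code, the same idea looks like the sketch below. The array values are illustrative. With `axis=0` the aggregation runs down the rows and returns one value per column; with `axis=1` it runs across the columns and returns one value per row.

```python
import numpy as np

data = np.array([[1, 2], [5, 3], [4, 6]])   # shape (3, 2)

print(data.max())         # 6, over all elements
print(data.max(axis=0))   # [5 6], one value per column
print(data.max(axis=1))   # [2 5 6], one value per row
print(data.sum(axis=0))   # [10 11]
```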

---

**Transpose and Reshape**

A common need when dealing with matrices is the need to rotate them. This is often the case when we need to take the dot product of two matrices and need to align the dimension they share. NumPy arrays have a convenient attribute called `T` to get the transpose of a matrix:
![transpose](http://jalammar.github.io/images/numpy/numpy-transpose.png)

In [None]:
'''transpose'''
data = np.array([[1,2],[3,4],[5,6]])
print('data :\n', data)
print('shape:', data.shape)

print('\ndata.T:\n', data.T)
print('shape:', data.T.shape)      

In more advanced use cases, you may find yourself needing to switch the dimensions of a certain matrix. This is often the case in machine learning applications where a certain model expects a certain shape for the inputs that is different from your dataset. NumPy's `reshape()` method is useful in these cases. You just pass it the new dimensions you want for the matrix. You can pass `-1` for one dimension and NumPy will infer its size from the number of elements in your matrix:
![reshape](http://jalammar.github.io/images/numpy/numpy-reshape.png)

In [None]:
'''reshape'''
data = np.array([[1],[2],[3],[4],[5],[6]])
print('data:\n', data)
print('shape:\n', data.shape )

print('\ndata.reshape(2,3):\n', data.reshape(2,3))
print('\ndata.reshape(3,2):\n', data.reshape(3,2))
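
The `-1` placeholder mentioned above is not used in the cell, so here is a minimal sketch. NumPy infers the missing dimension from the total number of elements; a shape that does not divide the element count evenly raises a `ValueError`.

```python
import numpy as np

data = np.arange(1, 7)            # [1 2 3 4 5 6], shape (6,)

print(data.reshape(2, -1).shape)  # (2, 3): 6 elements / 2 rows
print(data.reshape(-1, 2).shape)  # (3, 2)
print(data.reshape(-1, 1).shape)  # (6, 1): a single column

try:
    data.reshape(4, -1)           # 6 is not divisible by 4
except ValueError as err:
    print('ValueError:', err)
```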

---

**N-Dimensional Arrays**

NumPy can do everything we’ve mentioned in any number of dimensions. Its central data structure is called ndarray (N-Dimensional Array) for a reason.

![ndarray](http://jalammar.github.io/images/numpy/numpy-3d-array.png)

In a lot of ways, dealing with a new dimension is just adding a comma to the parameters of a NumPy function:

![ndarray2](http://jalammar.github.io/images/numpy/numpy-3d-array-creation.png)
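
A minimal sketch of a three-dimensional array, with an illustrative name and shape, shows how the extra dimension appears as one more entry in the shape tuple and one more index:

```python
import numpy as np

cube = np.ones((4, 3, 2))     # 4 blocks, each with 3 rows and 2 columns
print(cube.ndim)              # 3
print(cube.shape)             # (4, 3, 2)
print(cube.size)              # 24 elements in total

print(cube[0].shape)          # (3, 2): the first 2-D block
print(cube[0, 1, 1])          # 1.0: a single element needs three indices
print(cube.sum(axis=0).shape) # (3, 2): aggregating along the first axis
```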

---

## Python `list` vs NumPy Array

**Memory**

The main benefits of using NumPy arrays should be smaller memory consumption and better runtime behavior. We can use the `getsizeof()` function from the `sys` module to determine the size of an object in bytes.

In [None]:
import sys, random
import numpy as np

'''for n = 1'''
n = 1
print('for n = 1:')
myList = random.sample(range(1000), n) # statement to generate list of random values
print('size of python list in bytes:', sys.getsizeof(myList))
myArray = np.array(random.sample(range(1000), n))
print('size of numpy array in bytes:', sys.getsizeof(myArray))

'''for n = 10'''
n = 10
print('\nfor n = 10:')
myList = random.sample(range(1000), n)
print('size of python list in bytes:', sys.getsizeof(myList))
myArray = np.array(random.sample(range(1000), n))
print('size of numpy array in bytes:', sys.getsizeof(myArray))

'''for n = 100'''
print('\nfor n = 100:')
n = 100
myList = random.sample(range(1000), n)
print('size of python list in bytes:', sys.getsizeof(myList))
myArray = np.array(random.sample(range(1000), n))
print('size of numpy array in bytes:', sys.getsizeof(myArray))

'''for n = 1000'''
print('\nfor n = 1000:')
n = 1000
myList = random.sample(range(1000), n)
print('size of python list in bytes:', sys.getsizeof(myList))
myArray = np.array(random.sample(range(1000), n))
print('size of numpy array in bytes:', sys.getsizeof(myArray))

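From the example above, we can see that memory consumption grows with the input size. Note that `sys.getsizeof()` on a list counts only the list object and its references, not the integer objects it refers to, so the list figures understate its real footprint. Once the elements are taken into account, a NumPy array holding a large number of values takes up less memory than the equivalent Python list.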
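
A fuller, still approximate, comparison adds the size of each element to the size of the list itself. The sketch below assumes distinct integer elements and an explicit `int64` dtype; the variable names are illustrative.

```python
import sys
import numpy as np

n = 1000
values = list(range(1000, 1000 + n))    # distinct int objects

list_container = sys.getsizeof(values)                  # list header + references
list_elements = sum(sys.getsizeof(v) for v in values)   # the int objects themselves
list_total = list_container + list_elements

arr = np.array(values, dtype=np.int64)
array_data = arr.nbytes                                 # raw element buffer: n * 8 bytes

print('list container only :', list_container)
print('list incl. elements :', list_total)
print('numpy data buffer   :', array_data)
```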

**Speed**

The cell below times the same element-wise addition on Python lists and on NumPy arrays.

In [None]:
import time
import numpy as np

n = 1000000

def python_list_version():
    t1 = time.time()
    X = list(range(n))  # build real lists; range() alone is lazy in Python 3
    Y = list(range(n))
    Z = [X[i] + Y[i] for i in range(len(X))] # list comprehension
    return time.time() - t1

def numpy_array_version():
    t1 = time.time()
    X = np.arange(n)
    Y = np.arange(n)
    Z = X + Y
    return time.time() - t1

t1 = python_list_version()
t2 = numpy_array_version()
print('python_list_version time taken, t1:', t1)
print('numpy_array_version time taken, t2:', t2)
print("NumPy array is " + str(t1/t2) + " times faster!")

Notice that the second number (NumPy) is smaller, meaning the NumPy version runs faster than its pure-Python list counterpart.

---

## Linked List

*Think of trains!*

A **linked list** is a data structure in which objects are arranged linearly. Like a train, it is made of a single row of cabins (**nodes**), each connected to the next. Each node consists of 2 fields:
1. `data` - containing the data to be stored in the node
2. `next` - containing the reference to the next node in the list (represented by arrows).

<a name="linked_list"></a>

| ![linked_list](https://i.ibb.co/gzXZfrm/Slide7.png) |
| :----------------------------------------------------------: |
|      Fig 3.1. Visual representation of linked list       |

*Class Discussion*
> What does <u>reference</u> mean?

---

### Basic Implementation of a Singly Linked List

Let's define the `Node` class:

In [None]:
'''node class definition'''
class Node:
    # constructor
    def __init__(self, data):
        self.data = data
        self.next = None 

Once we have the Node class, we can implement any *linked list* as follows:

In [None]:
'''creating node objects independently'''
node1 = Node("This is the first node and also the head node.")
node2 = Node("This is the second node. I can basically put any data I want.")
node3 = Node(43)
node4 = Node(1.618)
node5 = Node(Node("this is a node inside a node"))

At this point, the nodes are created but they are unlinked. To link them up, we can do the following:

In [None]:
'''linking nodes'''
node1.next = node2
node2.next = node3
node3.next = node4
node4.next = node5

We can now display the entire linked list by passing the first node, also known as the **head node**, as the argument. This is also known as *linked list traversal*.

In [None]:
'''printing a linked list'''
def printLinkedList(node):
    while node is not None:
        print(node.data)
        node = node.next
    print()
    
printLinkedList(node1) # starting from first node, also called 'head' node

*Class Discussion*
> Explain the output. 
> - Why does the last print statement display a weird message?
> - What do I write to print the last node?

In [None]:
'''Accessing the last node'''
print(node5.data)
print((node5.data).data) # note the difference

There are various ways of implementing a linked list in Python. The example above is a simplified version to demonstrate the workings of a linked list.
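
One such alternative is to wrap the nodes in a class that keeps track of the head. The sketch below is one possible design, not a standard library API; the `LinkedList` class and its method names are hypothetical, and `Node` is repeated so the block runs on its own.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None


class LinkedList:
    """Hypothetical wrapper that stores the head node."""

    def __init__(self):
        self.head = None

    def prepend(self, data):
        # O(1): the new node becomes the head.
        node = Node(data)
        node.next = self.head
        self.head = node

    def __iter__(self):
        # Traverse from the head, yielding each node's data.
        node = self.head
        while node is not None:
            yield node.data
            node = node.next


numbers = LinkedList()
for value in (3, 2, 1):
    numbers.prepend(value)
print(list(numbers))   # [1, 2, 3]
```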

*Thought Activity*
> 1. How do we insert a node in between existing nodes?  
>
> 2. What are the big O notations for the search, insertion and deletion operation of a node? Explain.
>
> 3. Write a function to append a node at the end of a linked list recursively. Note: a linked list is made up of smaller linked lists.

*Solution*

1. To insert between existing nodes

In [None]:
'''Insert between existing nodes'''
nodePi = Node(3.142)
node3.next = nodePi
nodePi.next = node4
 
printLinkedList(node1) # to test

2. Big O Notation for search, insert, delete

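|Operation|Time Complexity| Notes                                                                |
|:--------|:-------------:|:---------------------------------------------------------------------|
|$search$ | $O(n)$        | traverses from the head node                                         |
|$insert$ | $O(1)$        | given a reference to the preceding node; finding that node is $O(n)$ |
|$delete$ | $O(1)$        | given a reference to the preceding node; finding that node is $O(n)$ |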
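
The sketch below illustrates where these costs come from. `search` has to walk the list from the head, while `delete_after` only rewires one reference once the preceding node is known. Both helper names are hypothetical, and `Node` is repeated so the block runs on its own.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None


def search(head, target):
    """Return the first node holding target, or None. O(n)."""
    node = head
    while node is not None:
        if node.data == target:
            return node
        node = node.next
    return None


def delete_after(prev):
    """Unlink the node that follows prev. O(1)."""
    if prev is not None and prev.next is not None:
        prev.next = prev.next.next


# Build a -> b -> c
a, b, c = Node('a'), Node('b'), Node('c')
a.next, b.next = b, c

print(search(a, 'c') is c)   # True
delete_after(a)              # removes b
print(a.next is c)           # True
print(search(a, 'b'))        # None
```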

3. Append recursively.

In [None]:
'''recursive append'''
def append(currentNode, newData):
    # If we have reached the end of the list
    # (or the list is empty), create a new
    # node holding the data.
    if currentNode is None:
        return Node(newData)
    # If we have not reached the end,
    # keep traversing recursively.
    else:
        currentNode.next = append(currentNode.next, newData)
    return currentNode

append(node1, 'this is added recursively')
printLinkedList(node1) # to test
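
The recursive version makes one call per node, so a long list can exceed Python's recursion limit (`sys.getrecursionlimit()`, 1000 by default in CPython). An iterative alternative, sketched here with the hypothetical name `append_iterative`, avoids that. `Node` is repeated so the block runs on its own.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None


def append_iterative(head, new_data):
    """Append new_data at the end and return the head of the list."""
    new_node = Node(new_data)
    if head is None:               # empty list: the new node is the head
        return new_node
    node = head
    while node.next is not None:   # walk to the last node
        node = node.next
    node.next = new_node
    return head


head = None
for value in range(2000):          # longer than the default recursion limit
    head = append_iterative(head, value)

# Count the nodes to confirm that every append succeeded.
length = 0
node = head
while node is not None:
    length += 1
    node = node.next
print(length)   # 2000
```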

---

## Arrays vs Linked Lists

- Arrays are made up of a contiguous block of memory, whereas linked lists are dynamically allocated. This has several implications:

  1. Accessing an element in an array is fast, $O(1)$, whereas in a linked list it takes linear time, $O(n)$, because the traversal has to start from the head node.

  2. A fixed-size array reserves memory for every index up front, even for slots that are never used, while a linked list allocates memory only for the nodes that exist. The trade-off is that every node also has to store a `next` reference.
  
- Insertion and deletion are inefficient in arrays: to insert an element, we either have to allocate a new block of memory big enough to hold the extra element, or shift every element after the insertion point up by one index, which is $O(n)$. In a linked list, on the other hand, we just need to manipulate the `next` values.

- Some operations on arrays benefit from *cache locality*, which improves performance.

---