<div align="center">
    <h1>DS-210: Programming for Data Science</h1>
    <h1>Lecture 15</h1>
</div>

1. Popular implementation: binary heap
2. Applications of priority queues: sorting and shortest paths
3. Slices

# 1. Popular implementation: binary heap

## Binary heaps

* Data organized into a binary tree (one root, each node with 0, 1 or 2 children)
* Every internal node is not smaller than its children (max-heap) or not greater than its children (min-heap)
* Every layer is filled before the next layer starts

**Basic property:**
The root has the current maximum (minimum), i.e., the answer to next `pop`

<div align="center">
    <img src="order.png" alt="[picture of basic binary heap with heap ordering]" width="70%">
</div>

## Binary heaps

**Efficient storage:**
* Tree levels filled from left to right
* Can be mapped to a vector

* Easy to move to the parent or children using vector indices

<div align="center">
    <img src="layers.png" alt="[picture of basic binary heap with heap ordering]" width="90%">
</div>

<div align="center">
    <img src="indices.png" alt="[picture of basic binary heap with heap ordering]" width="50%">
</div>
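
The index mapping can be checked with a few lines of code. The sketch below is only an illustration: the function name `is_max_heap` is made up for this example, and it assumes the root is stored at index 0, so that the parent of node `i` (for `i > 0`) sits at index `(i - 1) / 2`.

```rust
// Check the max-heap property for a tree stored level by level in a slice.
// Assumes 0-based indexing: the parent of node i (i > 0) is at (i - 1) / 2.
fn is_max_heap(data: &[i32]) -> bool {
    // every node except the root must be <= its parent
    (1..data.len()).all(|i| data[(i - 1) / 2] >= data[i])
}
```

For instance, `is_max_heap(&[9, 7, 8, 1, 5])` is `true`, while `is_max_heap(&[9, 7, 8, 10])` is `false` because the value 10 at index 3 is larger than its parent 7 at index 1.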

## How are operations implemented?

### Push

* add at the end of the vector
* fix the ordering by swapping with the parent if needed

<div align="center">
    <img src="push_swap.png" width="60%">
</div>

<div align="center">
    <b>Question: What is the maximum number of swaps you will have to do? Why?</b>
</div>

<br><br><br><br>

$$\log_2(n)$$

### Pop

* remove and return the root
* replace with the last element
* fix the ordering by comparing with the children and swapping with the larger child if it is greater; repeat down the tree

### Complexity of push and pop

* Proportional to the number of levels

* So $O(\log n)$
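
To see the logarithmic bound in numbers, one can count the swaps during a push. The following is a minimal sketch with made-up helper names (`push_counting_swaps`, `worst_case_swaps`), assuming a max-heap of `i32` stored in a `Vec` with the root at index 0. Inserting increasing values forces every new value to climb all the way to the root, which is the worst case.

```rust
// Push `val` onto a max-heap stored in `heap` and return how many swaps
// were needed to restore the ordering.
fn push_counting_swaps(heap: &mut Vec<i32>, val: i32) -> usize {
    heap.push(val);
    let mut i = heap.len() - 1;
    let mut swaps = 0;
    while i > 0 {
        let parent = (i - 1) / 2;
        if heap[parent] >= heap[i] {
            break; // ordering already holds
        }
        heap.swap(parent, i);
        i = parent;
        swaps += 1;
    }
    swaps
}

// Insert 1, 2, ..., n in increasing order and report the number of swaps
// used by the last insertion (each new value is the largest so far).
fn worst_case_swaps(n: usize) -> usize {
    let mut heap = Vec::new();
    let mut last = 0;
    for x in 1..=n {
        last = push_counting_swaps(&mut heap, x as i32);
    }
    last
}
```

With this setup `worst_case_swaps(8)` and `worst_case_swaps(15)` both give 3, and `worst_case_swaps(1000)` gives 9, matching $\lfloor \log_2 n \rfloor$.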


### Implementation

#### Utility methods

In [3]:
#[derive(Debug)]
struct MyBinaryHeap<T> {
    heap: Vec<T>,
    heap_size: usize,
}

impl<T> MyBinaryHeap<T> {
    fn new() -> MyBinaryHeap<T> {
        let heap: Vec<T> = vec![];
        let heap_size = 0;
        MyBinaryHeap { heap, heap_size }
    }

    // left child of node i
    fn left(i: usize) -> usize {
        2 * i + 1
    }

    // right child of node i
    fn right(i: usize) -> usize {
        2 * i + 2
    }

    // parent of node i (requires i > 0: i - 1 would underflow for the root)
    fn parent(i: usize) -> usize {
        (i - 1) / 2  // integer divide
    }
}

#### Heapify -- Put everything in proper order

Make it so children are $\le$ parents.

In [4]:
impl<T:PartialOrd+PartialEq+Copy> MyBinaryHeap<T> {
    fn heapify(&mut self, loc: usize) {
        let l = Self::left(loc);
        let r: usize = Self::right(loc);
        
        let mut largest = loc; // index of largest
        
        if l < self.heap_size && self.heap[l] > self.heap[largest] {
            largest = l;
        }
        if r < self.heap_size && self.heap[r] > self.heap[largest] {
            largest = r;
        }
        
        if largest != loc {
            // swap with child
            let tmp = self.heap[loc];
            self.heap[loc] = self.heap[largest];
            self.heap[largest] = tmp;
            
            self.heapify(largest);
        }
    }
}
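
The `heapify` method above needs `T: Copy` because it swaps through a temporary variable. As a possible alternative, here is a sketch of the same sift-down step written as a free function that uses `swap` on a slice, so it also works for types such as `String`. The name `sift_down` and the explicit `size` parameter are choices made for this example, not part of the lecture code.

```rust
// Sift the element at `loc` down until it is >= both of its children.
// Only the first `size` elements of `heap` are treated as part of the heap.
// Uses `swap`, so T does not have to be Copy.
fn sift_down<T: PartialOrd>(heap: &mut [T], size: usize, mut loc: usize) {
    loop {
        let l = 2 * loc + 1;
        let r = 2 * loc + 2;
        let mut largest = loc;
        if l < size && heap[l] > heap[largest] {
            largest = l;
        }
        if r < size && heap[r] > heap[largest] {
            largest = r;
        }
        if largest == loc {
            break; // heap property holds at this node
        }
        heap.swap(loc, largest);
        loc = largest;
    }
}
```

Calling `sift_down(&mut v, 5, 0)` on `v = vec![1, 9, 8, 4, 5]` moves the 1 down two levels and leaves `[9, 5, 8, 4, 1]`. The loop replaces the recursion, which is a matter of taste here since the depth is only logarithmic.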

#### Insert and Extract


In [5]:
impl<T:PartialOrd+PartialEq+Copy> MyBinaryHeap<T> {
    
    fn insert_val(&mut self, val: T) {
        // extract_max leaves stale slots past heap_size; reuse one if present
        if self.heap_size < self.heap.len() {
            self.heap[self.heap_size] = val;
        } else {
            self.heap.push(val);
        }
        self.heap_size += 1;
        let mut i = self.heap_size - 1;

        // loop while we are not at the root and the parent is less than the current node
        while i != 0 && self.heap[Self::parent(i)] < self.heap[i] {

            // swap node with parent
            let tmp = self.heap[Self::parent(i)];
            self.heap[Self::parent(i)] = self.heap[i];
            self.heap[i] = tmp;

            // update node number
            i = Self::parent(i);  // Self is a stand-in for the data structure MyBinaryHeap
        }
    }
    
    fn extract_max(&mut self) -> Option<T> {
        if self.heap_size == 0 {
            return None;
        }
        
        if self.heap_size == 1 {
            self.heap_size -= 1;
            return Some(self.heap[0]);
        }
        
        let root = self.heap[0];
        self.heap[0] = self.heap[self.heap_size - 1]; // copy last element
        self.heap_size -= 1;
        self.heapify(0);
        return Some(root);
    }
}

### Let's run the code

In [8]:
:dep rand="0.8.5"
use rand::Rng;

let mut h:MyBinaryHeap::<i32> = MyBinaryHeap::new();

// Generate 10 random numbers in the range -1000..1000 and insert them
for _i in 0..10 {
    let x = rand::thread_rng().gen_range(-1000..1000) as i32;
    h.insert_val(x);
}

println!("Print the BinaryHeap structure.");
println!("{:?}", h);

println!("\nExtract max values.");
let size = h.heap_size;
for _j in 0..size {
    let z = h.extract_max().unwrap();
    print!("{} ", z);
}

println!("\n\nPrint what's left of the BinaryHeap structure");
println!("{:?}", h);


Print the BinaryHeap structure.
MyBinaryHeap { heap: [983, 827, 654, 135, 757, -686, 191, -822, -349, 263], heap_size: 10 }

Extract max values.
983 827 757 654 263 191 135 -349 -686 -822 

Print what's left of the BinaryHeap structure
MyBinaryHeap { heap: [-822, -822, -686, -822, -349, -686, -822, -822, -349, 263], heap_size: 0 }


### What is the property of the list of values we extracted?

<br><br><br><br>

### Or use the built-in one from `std::collections`

In [9]:
use std::collections::BinaryHeap;
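
A short sketch of how the standard `BinaryHeap` can be used. The helper names `drain_max` and `drain_min` are invented for this example. `BinaryHeap` is a max-heap, and wrapping values in `std::cmp::Reverse` is a common way to get min-heap behavior.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Max-heap: pop returns the largest remaining element.
fn drain_max(values: &[i32]) -> Vec<i32> {
    let mut heap = BinaryHeap::new();
    for &x in values {
        heap.push(x);
    }
    let mut out = Vec::new();
    while let Some(x) = heap.pop() {
        out.push(x);
    }
    out
}

// Min-heap: wrap each value in Reverse so the smallest is popped first.
fn drain_min(values: &[i32]) -> Vec<i32> {
    let mut heap = BinaryHeap::new();
    for &x in values {
        heap.push(Reverse(x));
    }
    let mut out = Vec::new();
    while let Some(Reverse(x)) = heap.pop() {
        out.push(x);
    }
    out
}
```

For the input `[3, 1, 4, 1, 5]`, `drain_max` returns `[5, 4, 3, 1, 1]` and `drain_min` returns `[1, 1, 3, 4, 5]`.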



## Application 1: Sorting a.k.a. HeapSort

* Put everything into a priority queue
* Remove items in order

In [10]:
use std::collections::BinaryHeap;

fn heap_sort(v:&mut Vec<i32>) {
    let mut pq = BinaryHeap::new();
    for x in v.iter() {
        pq.push(*x);
    }

    // to sort smallest to largest we iterate in reverse
    for i in (0..v.len()).rev() {
        v[i] = pq.pop().unwrap();
    }
}

In [11]:
let mut v = vec![23,12,-11,-9,7,37,14,11];
heap_sort(&mut v);
v

[-11, -9, 7, 11, 12, 14, 23, 37]

**Total running time:** $O(n \log n)$ for $n$ numbers

## More direct, using Rust operations

In [12]:
fn heap_sort_2(v:Vec<i32>) -> Vec<i32> {
   BinaryHeap::from(v).into_sorted_vec()
}

No extra memory is allocated: the initial vector, the intermediate binary heap, and the final vector all reuse the same allocation on the heap.
* `BinaryHeap::from(v)` consumes `v`
* `into_sorted_vec()` consumes the intermediate binary heap
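
To make the "same space" idea concrete, here is a sketch of an in-place HeapSort on a slice. It is an illustration of the technique under the usual textbook formulation, not the standard library's actual code, and the names `heap_sort_in_place` and `restore_heap` are made up for this example.

```rust
// In-place HeapSort: the heap lives in the front part of the slice,
// the sorted output grows from the back.
fn heap_sort_in_place(v: &mut [i32]) {
    let n = v.len();
    // 1. Build a max-heap bottom-up by fixing every internal node.
    for i in (0..n / 2).rev() {
        restore_heap(v, n, i);
    }
    // 2. Repeatedly move the current maximum behind the shrinking heap.
    for end in (1..n).rev() {
        v.swap(0, end);
        restore_heap(v, end, 0);
    }
}

// Sift v[loc] down, looking only at the first `size` elements.
fn restore_heap(v: &mut [i32], size: usize, mut loc: usize) {
    loop {
        let l = 2 * loc + 1;
        let r = 2 * loc + 2;
        let mut largest = loc;
        if l < size && v[l] > v[largest] {
            largest = l;
        }
        if r < size && v[r] > v[largest] {
            largest = r;
        }
        if largest == loc {
            break;
        }
        v.swap(loc, largest);
        loc = largest;
    }
}
```

Sorting `[7, 17, 3, 1, 8, 11]` this way yields `[1, 3, 7, 8, 11, 17]` without allocating a second vector.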

In [13]:
let mut v = vec![7,17,3,1,8,11];
heap_sort_2(v)

[1, 3, 7, 8, 11, 17]

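Sorting is already provided for vectors through `sort` and `sort_unstable` (both currently use other algorithms).

The first `heap_sort` above copies every element into a separate priority queue, so it temporarily needs about twice the memory. HeapSort guarantees $O(n \log n)$ time in the worst case, but the built-in sorts are typically at least as fast in practice.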

In [16]:
let mut v = vec![7,17,3,1,8,11];
v.sort();
v

[1, 3, 7, 8, 11, 17]

In [17]:
let mut v = vec![7,17,3,1,8,11];
v.sort_unstable(); // doesn't preserve order for equal elements
v

[1, 3, 7, 8, 11, 17]
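
The difference between the two only shows up when elements compare as equal but are still distinguishable. A small sketch, using made-up `(score, name)` records and the stable `sort_by_key`:

```rust
// Sort (score, name) pairs by score only, using the stable sort.
// Records with equal scores keep their original relative order.
fn sort_by_score(mut records: Vec<(u32, &'static str)>) -> Vec<(u32, &'static str)> {
    records.sort_by_key(|&(score, _)| score);
    records
}
```

Given `[(2, "b"), (1, "x"), (2, "a"), (1, "y")]`, the stable sort returns `[(1, "x"), (1, "y"), (2, "b"), (2, "a")]`: within each score the input order survives. With `sort_unstable_by_key` the records with equal scores could come out in either order.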

## Application 2: Shortest weighted paths (Dijkstra's algorithm)

* **Input graph:** edges with *positive* values, directed or undirected
* **Edge** is now (starting node, ending node, cost)
* **Goal:** Compute all distances from a given vertex $v$

### Some quotes from Edsger Dijkstra

Pioneer in computer science. Very opinionated.

<em> The use of COBOL cripples the mind; its teaching should, therefore, be regarded as a criminal offence. </em>
 
<em> It is practically impossible to teach good programming to students that have had a prior exposure to BASIC: as potential programmers they are mentally mutilated beyond hope of regeneration. </em>

<em> One morning I was shopping in Amsterdam with my young fiancée, and tired, we sat down on the café terrace to drink a cup of coffee and I was just thinking... Eventually, that algorithm became to my great amazement, one of the cornerstones of my fame </em>

### Shortest Weighted Path Description

1. Mark all nodes unvisited. Create a set of all the unvisited nodes called the unvisited set.

2. Assign to every node a tentative distance value:

    1. Set it to zero for our initial node and to infinity for all other nodes.
    2. During the run of the algorithm, the **tentative distance** of a node v is the length of the shortest path discovered so far between the node v and the starting node.
    3. Since initially no path is known to any vertex other than the source itself (which is a path of length zero), all other tentative distances are initially set to infinity.
    4. Set the initial node as current.

3. For the current node, consider all of its unvisited neighbors and calculate their tentative distances through the current node.

    1. Compare the newly calculated tentative distance to the one currently assigned to the neighbor and assign it the smaller one.
    2. For example, if the current **node A** is marked with a distance of 6, and the edge connecting it with a neighbor **B** has length 2, then the distance to B through A will be 6 + 2 = 8.
    3. If **B** was previously marked with a distance greater than 8 then change it to 8. Otherwise, the current value will be kept.

4. **When we are done considering all of the unvisited neighbors of the current node, mark the current node as visited and remove it from the unvisited set.**

    1. A visited node will never be checked again.
    2. This is valid and optimal because of the behavior in step 6: the next node to visit is always the one with the smallest distance from the initial node, so any later visit would have a greater distance.

5. If all nodes have been marked visited or if the smallest tentative distance among the nodes in the unvisited set is infinity (when planning a complete traversal; occurs when there is no connection between the initial node and remaining unvisited nodes), then stop. The algorithm has finished.

6. Otherwise, **select the unvisited node that is marked with the smallest tentative distance**, set it as the new current node, and go back to step 3.


**How it works:**

* Greedily take the closest unprocessed vertex
  * Its distance must be correct
  
* Keep updating distances of unprocessed vertices
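
The description above can be followed almost literally without any priority queue. The sketch below does that with plain vectors. It is an illustration only: the name `dijkstra_simple` and the adjacency format (`adj[u]` is a list of `(neighbor, length)` pairs) are assumptions of this example, and `None` plays the role of infinity.

```rust
// Array-based Dijkstra that follows the description directly (no heap).
fn dijkstra_simple(adj: &[Vec<(usize, usize)>], start: usize) -> Vec<Option<usize>> {
    let n = adj.len();
    let mut visited = vec![false; n]; // all nodes start unvisited
    let mut dist: Vec<Option<usize>> = vec![None; n]; // tentative distances
    dist[start] = Some(0);

    loop {
        // Select the unvisited node with the smallest finite tentative distance.
        let mut current: Option<(usize, usize)> = None; // (distance, node)
        for v in 0..n {
            if let Some(d) = dist[v] {
                if !visited[v] && current.map_or(true, |(best, _)| d < best) {
                    current = Some((d, v));
                }
            }
        }
        // Stop when no reachable unvisited node is left.
        let (d, u) = match current {
            Some(pair) => pair,
            None => break,
        };

        // Update the tentative distances of the unvisited neighbors.
        for &(v, len) in &adj[u] {
            let candidate = d + len;
            if !visited[v] && dist[v].map_or(true, |old| candidate < old) {
                dist[v] = Some(candidate);
            }
        }
        visited[u] = true; // never checked again
    }
    dist
}
```

Each round scans all nodes to find the next one, so this version does on the order of $V^2$ work. Replacing that scan is exactly where a priority queue helps.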


### Example 
Let's illustrate by way of example.

Take a directed graph with weighted edges.

Pick 0 as the starting node and assign its distance as 0.

Assign distance of $\infty$ to all other nodes.

<div align="center">
    <img src="dijkstra_graph2.png" width="80%">
</div>

From 0, travel to each connected node and update the tentative distances.

<div align="center">
    <img src="dijkstra_graph3.png" width="80%">
</div>

Now go to the "closest" node (node 2) and update the distances to its immediate neighbors.

Mark node 2 as visited.

<div align="center">
    <img src="dijkstra_graph4.png" width="80%">
</div>

Pick arbitrarily between nodes 4 and 1, since both now have tentative distance 3.

Pick node 1.

Update the distances to its immediate neighbors.

Mark node 1 as visited.

<div align="center">
    <img src="dijkstra_graph5.png" width="80%">
</div>

Now pick the unvisited node with the lowest distance (node 4) and update the distances to its immediate neighbors.

Distances to nodes 3 and 5 improved.

<div align="center">
    <img src="dijkstra_graph6.png" width="80%">
</div>

Then go to node 3 and update the distance to its neighbor.

Nowhere else to go so mark everything as done.

<div align="center">
    <img src="dijkstra_graph7.png" width="80%">
</div>

### BinaryHeap to the rescue

Since we always want to pick the cheapest node, we can use a BinaryHeap to find the next node to check.

## Auxiliary graph definitions

In [18]:
use std::collections::BinaryHeap;

type Vertex = usize;
type Distance = usize;
type Edge = (Vertex, Vertex, Distance);  // Updated edge definition.

#[derive(Debug,Copy,Clone)]
struct Outedge {
    vertex: Vertex,
    length: Distance,
}

type AdjacencyList = Vec<Outedge>;   // Adjacency list of Outedge's

#[derive(Debug)]
struct Graph {
    n: usize,
    outedges: Vec<AdjacencyList>,
}

impl Graph {
    fn create_directed(n:usize,edges:&Vec<Edge>) -> Graph {
        let mut outedges = vec![vec![];n];
        for (u, v, length) in edges {
            outedges[*u].push(Outedge{vertex: *v, length: *length});
        }
        Graph{n,outedges}
    }
}

## Load our graph

In [19]:
let n = 6;
let edges: Vec<Edge> = vec![(0,1,5),(0,2,2),(2,1,1),(2,4,1),(1,3,5),(4,3,1),(1,5,11),(3,5,5),(4,5,8)];
let graph = Graph::create_directed(n, &edges);
graph

Graph { n: 6, outedges: [[Outedge { vertex: 1, length: 5 }, Outedge { vertex: 2, length: 2 }], [Outedge { vertex: 3, length: 5 }, Outedge { vertex: 5, length: 11 }], [Outedge { vertex: 1, length: 1 }, Outedge { vertex: 4, length: 1 }], [Outedge { vertex: 5, length: 5 }], [Outedge { vertex: 3, length: 1 }, Outedge { vertex: 5, length: 8 }], []] }

## Our implementation

In [21]:
let start: Vertex = 0;

let mut distances: Vec<Option<Distance> > = vec![None; graph.n];  // use None instead of infinity
distances[start] = Some(0);

In [22]:
use core::cmp::Reverse;

let mut pq = BinaryHeap::<Reverse<(Distance,Vertex)>>::new();  // make a min-heap with Reverse
pq.push(Reverse((0,start)));

In [23]:

// loop while pop returns Some, destructuring the entry into (dist, v)
while let Some(Reverse((dist,v))) = pq.pop() { // pattern match: the loop ends when pop returns None
    
    for Outedge{vertex,length} in graph.outedges[v].iter() {
        
        let new_dist = dist + *length;

        
        let update = match distances[*vertex] { // match used as an expression
            None => {true}
            Some(d) => {new_dist < d}
        };

        // update the distance of the node
        if update {
            distances[*vertex] = Some(new_dist);  // record the new distance
            pq.push(Reverse((new_dist,*vertex)));
        }
    }
};

In [24]:
distances

[Some(0), Some(3), Some(2), Some(4), Some(3), Some(9)]
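
The cells above can also be packaged as one function. The following sketch is a possible variant rather than the lecture's code: the name `dijkstra_heap` is made up, it uses plain `(neighbor, length)` tuples instead of `Outedge` to stay self-contained, and it adds one check that skips heap entries that became outdated after a shorter distance was recorded.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Heap-based Dijkstra. `adj[u]` holds (neighbor, length) pairs;
// `None` in the result means the node is unreachable from `start`.
fn dijkstra_heap(adj: &[Vec<(usize, usize)>], start: usize) -> Vec<Option<usize>> {
    let mut dist: Vec<Option<usize>> = vec![None; adj.len()];
    dist[start] = Some(0);
    let mut pq = BinaryHeap::new();
    pq.push(Reverse((0usize, start))); // min-heap ordered by distance

    while let Some(Reverse((d, u))) = pq.pop() {
        // Outdated entry: a shorter distance to u was found after this push.
        if dist[u].map_or(false, |best| d > best) {
            continue;
        }
        for &(v, len) in &adj[u] {
            let candidate = d + len;
            if dist[v].map_or(true, |old| candidate < old) {
                dist[v] = Some(candidate);
                pq.push(Reverse((candidate, v)));
            }
        }
    }
    dist
}
```

The skip does not change the result. It only avoids rescanning the outgoing edges of a node each time one of its older heap entries is popped.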

### Complexity and properties of Dijkstra's algorithm
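* $O(V^2)$ when the next node is found by scanning all nodes: up to $V-1$ candidates are examined for each of the $V$ nodes
* $O((V + E) \log V)$ with a binary heap as the priority queue, provided outdated heap entries are skipped
* Works just as well with undirected graphs
* Doesn't work if edge weights can be negative (why?)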

## Traveling salesman

On the surface similar to shortest paths

**Given an undirected graph with weighted non-negative edges find the shortest path that starts at a specified vertex, traverses every vertex in the graph and returns to the starting point.**

Applications:
1. Amazon delivery route optimization
2. Drilling holes in circuit boards -- minimize stepping motor travel.

#### BUT

Much harder to solve (will not cover here). The problem is NP-hard (its decision version is NP-complete), so no polynomial-time algorithm is known.

The Held-Karp algorithm is one of the best exact algorithms, with complexity $O(n^2 2^n)$.  
Many heuristics run fast but yield suboptimal results.

### Greedy Heuristic

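Mark all nodes as unvisited.  
Begin at the starting node, then repeatedly move to the closest unvisited node and mark it visited, until all nodes have been visited; finally return to the starting node. In effect this is a depth-first walk that always follows the minimum-distance edge.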
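
A minimal sketch of this greedy heuristic, assuming a complete graph given as a symmetric distance matrix with at least one node. The function name `nearest_neighbor_tour` and the matrix representation are choices made for this example.

```rust
// Nearest-neighbor tour on a complete graph given as a distance matrix.
// Returns the visiting order (beginning at `start`) and the total length,
// including the edge back to `start`. The result may be suboptimal.
fn nearest_neighbor_tour(dist: &[Vec<u32>], start: usize) -> (Vec<usize>, u32) {
    let n = dist.len();
    let mut visited = vec![false; n];
    let mut tour = vec![start];
    visited[start] = true;
    let mut total = 0;
    let mut current = start;

    for _ in 1..n {
        // closest unvisited node (ties go to the lowest index)
        let next = (0..n)
            .filter(|&v| !visited[v])
            .min_by_key(|&v| dist[current][v])
            .unwrap(); // safe: at least one unvisited node remains
        total += dist[current][next];
        visited[next] = true;
        tour.push(next);
        current = next;
    }
    total += dist[current][start]; // return to the starting point
    (tour, total)
}
```

On the four-node matrix `[[0,1,4,3],[1,0,2,5],[4,2,0,1],[3,5,1,0]]` with start 0, the tour is `[0, 1, 2, 3]` with total length 1 + 2 + 1 + 3 = 7.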

### Minimum Spanning Tree Heuristic

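https://en.wikipedia.org/wiki/Minimum_spanning_tree  
Building the minimum spanning tree takes $O(E \log V)$ time. When the edge weights satisfy the triangle inequality, the tour obtained by walking the tree is at most twice the optimal length; Christofides' refinement (the tree plus a minimum-weight matching) is guaranteed to be no more than 50% worse than the optimal.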


## Slices (§4.3)

Slice = reference to a contiguous sub-sequence of elements in a collection

Slices of an array:
 * array of type `[T; N]`
 * slice of type `&[T]` (immutable) or `&mut [T]` (mutable)

In [18]:
{
    // immutable slice of an array
    let arr: [i32; 5] = [0,1,2,3,4];
    let slice: &[i32] = &arr[1..3];
    println!("{:?}",slice);
    println!("{}", slice[0]);
};

[1, 2]
1


In [19]:
{
    // mutable slice of an array
    let mut arr = [0,1,2,3,4];
    let slice = &mut arr[2..4];  // the binding itself does not need to be mut
    println!("{:?}",slice);
    slice[0] = slice[0] * slice[0];
    println!("{}", slice[0]);
    println!("{:?}",arr);
};

[2, 3]
4
[0, 1, 4, 3, 4]


## Slices

Work for vectors too!

In [20]:
let mut v = vec![0,1,2,3,4];
{
    let slice = &v[1..3];
    println!("{:?}",slice);
};

[1, 2]


In [21]:
{
    let slice = &mut v[1..3];
    
    // iterating over slices works as well
    for x in slice.iter_mut() {
        *x *= 1000;
    }
};
v

[0, 1000, 2000, 3, 4]
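
Because arrays and vectors both produce the same slice types, a function written against `&[T]` or `&mut [T]` accepts either. A small sketch with made-up helper names `sum` and `scale`:

```rust
// Accepts a whole array, a whole vector, or any part of either.
fn sum(values: &[i32]) -> i32 {
    values.iter().sum()
}

// A mutable slice lets the function modify the caller's data in place.
fn scale(values: &mut [i32], factor: i32) {
    for x in values.iter_mut() {
        *x *= factor;
    }
}
```

For example, with `arr = [0,1,2,3,4]` and `v = vec![0,1,2,3,4]`, `sum(&arr)` is 10 and `sum(&v[1..3])` is 3, and `scale(&mut v[1..3], 1000)` turns `v` into `[0, 1000, 2000, 3, 4]`.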

## Slices are references: all borrowing rules still apply!

* At most one mutable reference at a time
* No immutable references allowed with a mutable reference
* Many immutable references allowed simultaneously

In [None]:
// this won't work!
let mut v = vec![1,2,3,4,5,6,7];
{
    let ref_1 = &mut v[2..5];
    let ref_2 = &v[1..3];
    ref_1[0] = 7;
    println!("{}",ref_2[1]);
}

Error: cannot borrow `v` as immutable because it is also borrowed as mutable

In [23]:
// and this reordering will
let mut v = vec![1,2,3,4,5,6,7];
{
    let ref_1 = &mut v[2..5];
    ref_1[0] = 7;   // last use of ref_1, so its mutable borrow ends here
    let ref_2 = &v[1..3];
    println!("{}",ref_2[1]);
};

7
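
When two mutable views into the same vector are really needed at once, one option is the standard `split_at_mut` method, which returns two non-overlapping mutable slices. The sketch below is only an illustration, and the function name `add_first_half_to_second` is made up.

```rust
// Add each element of the first half to the matching element of the second half.
// `split_at_mut` yields two disjoint mutable slices, so the borrow checker is satisfied.
fn add_first_half_to_second(v: &mut [i32]) {
    let mid = v.len() / 2;
    let (left, right) = v.split_at_mut(mid);
    for (l, r) in left.iter().zip(right.iter_mut()) {
        *r += *l;
    }
}
```

Applied to `[1, 2, 3, 10, 20, 30]` this gives `[1, 2, 3, 11, 22, 33]`.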


## Memory representation of slices

* Pointer (to heap or stack)
* Length

**Compared to vector:** no capacity (cannot be extended)
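
The sizes can be inspected with `std::mem::size_of`. This is a sketch with made-up helper names; the two-word size of a slice reference is what the pointer-plus-length layout implies, while the three-word size of `Vec` is what current Rust implementations use rather than a documented guarantee.

```rust
use std::mem::size_of;

// A slice reference stores an address and a length: two machine words.
fn slice_ref_words() -> usize {
    size_of::<&[i32]>() / size_of::<usize>()
}

// A Vec additionally stores its capacity: three machine words in practice.
fn vec_words() -> usize {
    size_of::<Vec<i32>>() / size_of::<usize>()
}
```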

<div align="center">
    <img src="rust_container_cheat_sheet_cropped.png" alt="[Cropped Rust container cheat sheet]" width="35%">
</div>

<div class="small">
<font size="2">Cropped from "Rust container cheat sheet" by Raph Levien, Copyright Google Inc. 2017
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

Source: <a href="https://docs.google.com/presentation/d/1q-c7UAyrUlM-eZyTo1pd8SZ0qwA_wYxmPZVOQkoDmH4/edit#slide=id.p">https://docs.google.com/presentation/d/1q-c7UAyrUlM-eZyTo1pd8SZ0qwA_wYxmPZVOQkoDmH4/edit#slide=id.p</a>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

License: CC BY (<a href="https://creativecommons.org/licenses/by/4.0">https://creativecommons.org/licenses/by/4.0</a>)</font>
<br>
</div>

# In-Class Poll

https://piazza.com/class/m5qyw6267j12cj/post/339