# DS210 Final Exam Notes

## General Knowledge

### Complexity of an algorithm/code (as a function of the input)

- Complexity of the graph algorithms we covered.

[Amortization](../lecture_10/lecture_10.ipynb#sketch-of-analysis-amortization)

- Example for re-allocation when growing Vec capacity.
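
A minimal sketch of the amortization argument. The helper name `count_reallocations` is hypothetical, and the exact growth policy of `Vec` is not documented by the standard library, so the count it returns may vary between Rust versions. If capacity grows geometrically, the number of re-allocations grows roughly like $\log n$, which is why the average cost per `push` stays $O(1)$.

```rust
/// Pushes `n` integers into an empty `Vec` and returns how many times
/// the capacity changed, i.e. how many re-allocations were observed.
fn count_reallocations(n: usize) -> usize {
    let mut v: Vec<u64> = Vec::new();
    let mut reallocations = 0;
    let mut last_capacity = v.capacity();
    for i in 0..n {
        v.push(i as u64);
        if v.capacity() != last_capacity {
            // The buffer was re-allocated (and its contents moved).
            reallocations += 1;
            last_capacity = v.capacity();
        }
    }
    reallocations
}
```

Calling `count_reallocations(1_000)` should report far fewer re-allocations than pushes.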

[Dominant terms and constants in Big-O notation](../lecture_10/lecture_10.ipynb#Dominant-terms-and-constants-in-O-notation)

### ~~Networking concepts and network layers.~~


### Multithreading, concurrency, parallelism, Amdahl's law about speedup.  Limits of parallel programming.

[Lecture 24 -- Parallel and Concurrent Programming](../lecture_24/lecture_24.ipynb)
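
Amdahl's law can be written as $S(n) = \frac{1}{(1 - p) + p/n}$, where $p$ is the fraction of the work that can run in parallel and $n$ is the number of workers. The function name and parameter names below are illustrative assumptions, not taken from the lecture.

```rust
/// Speedup predicted by Amdahl's law.
/// `parallel_fraction` is p (between 0 and 1), `workers` is n.
fn amdahl_speedup(parallel_fraction: f64, workers: f64) -> f64 {
    let serial_fraction = 1.0 - parallel_fraction;
    1.0 / (serial_fraction + parallel_fraction / workers)
}
```

With $p = 0.9$ the speedup can never exceed $1 / (1 - p) = 10$, no matter how many workers are added. That bound is the limit of parallel programming this topic refers to.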




## Rust Language Knowledge

### Structs and Methods

- [enums](../lecture_07/lecture_07.ipynb#enums)
- [struct syntax](../lecture_07/lecture_07.ipynb#structs)
- a method is a function defined in an `impl` block for a struct (or enum) whose first parameter is `self`, `&self`, or `&mut self`
- [method syntax](../lecture_11/lecture_11.ipynb#method-syntax)
- [more on methods](../lecture_08/lecture_08.ipynb#revisit-methods)
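
A small sketch tying the bullets together. The `Unit` and `Rectangle` types are made up for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Unit {
    Meters,
    Feet,
}

struct Rectangle {
    width: f64,
    height: f64,
    unit: Unit,
}

impl Rectangle {
    // Associated function (no `self`): called as `Rectangle::new(...)`.
    fn new(width: f64, height: f64, unit: Unit) -> Self {
        Rectangle { width, height, unit }
    }

    // Method that only reads the struct: borrows it as `&self`.
    fn area(&self) -> f64 {
        self.width * self.height
    }

    // Method that modifies the struct: needs `&mut self`.
    fn scale(&mut self, factor: f64) {
        self.width *= factor;
        self.height *= factor;
    }

    // `match` on an enum must cover every variant.
    fn unit_label(&self) -> &'static str {
        match self.unit {
            Unit::Meters => "m",
            Unit::Feet => "ft",
        }
    }
}
```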

### Traits (kinda like interfaces)

[Traits](../lecture_09/lecture_09.ipynb#1-traits)
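
An illustrative sketch of a trait with one required method and one default method. The `Shape`, `Circle`, and `Square` names are hypothetical.

```rust
trait Shape {
    // Required: every implementor must provide this.
    fn area(&self) -> f64;

    // Default: implementors get this for free but may override it.
    fn describe(&self) -> String {
        format!("shape with area {:.1}", self.area())
    }
}

struct Circle {
    radius: f64,
}

struct Square {
    side: f64,
}

impl Shape for Circle {
    fn area(&self) -> f64 {
        std::f64::consts::PI * self.radius * self.radius
    }
}

impl Shape for Square {
    fn area(&self) -> f64 {
        self.side * self.side
    }
}

// Trait objects (`dyn Shape`) let one collection hold different types.
fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}
```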

### Modules and splitting programs into separate files.  Crates.

[Modules](../lecture_12/lecture_12.ipynb#1-modules)

- the `mod` keyword
- the `pub` keyword
- nesting modules and `::` syntax
- keyword `crate` as shorthand for the root of the crate's module tree
- keyword `super` as shorthand for one level up in nested modules
- keyword `use` to import into current scope
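
A sketch of the keywords above using inline modules. The module and function names are invented for illustration. In a real project each `mod` would usually live in its own file.

```rust
mod shapes {
    // `pub` makes an item visible outside its module.
    pub fn unit_side() -> f64 {
        1.0
    }

    // No `pub`: private, visible only inside `shapes` and its children.
    fn double(x: f64) -> f64 {
        2.0 * x
    }

    // Nested module, reached from outside as `shapes::square`.
    pub mod square {
        pub fn area(side: f64) -> f64 {
            side * side
        }

        pub fn doubled_unit_area() -> f64 {
            // `super` goes one level up, to `shapes`.
            // If `shapes` sits at the crate root, `crate::shapes::unit_side()`
            // would name the same function.
            super::double(area(super::unit_side()))
        }
    }
}

// `use` brings the nested module into the current scope.
use shapes::square;

fn total_area(sides: &[f64]) -> f64 {
    sides.iter().map(|&s| square::area(s)).sum()
}
```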

[Crates](../lecture_12/lecture_12.ipynb#3-what-are-crates)

- binary crate, must have a `main()` function
- library crate, functionality to share -- does _not_ have a `main()`

[Using Multiple Libraries or Binaries in Project](../lecture_12/lecture_12.ipynb#4-using-multiple-libraries-or-binaries-in-you-project)

### File I/O

[File I/O](../lecture_12/lecture_12.ipynb#5-file-io)

- `std::fs::File`
- `std::io`
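
A minimal sketch of writing and then reading a text file line by line. The helper names are hypothetical. Errors are propagated with `?` rather than handled.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader, Write};
use std::path::Path;

/// Creates (or truncates) the file at `path` and writes one line per entry.
fn write_lines(path: &Path, lines: &[&str]) -> io::Result<()> {
    let mut file = File::create(path)?;
    for line in lines {
        writeln!(file, "{}", line)?;
    }
    Ok(())
}

/// Reads the file at `path` into a vector of lines (without newlines).
fn read_lines(path: &Path) -> io::Result<Vec<String>> {
    let file = File::open(path)?;
    // `lines()` yields `io::Result<String>`; collecting into
    // `io::Result<Vec<String>>` stops at the first error.
    BufReader::new(file).lines().collect()
}
```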

### Slices (References into parts of collections).  &str and Strings

[Slices](../lecture_15/lecture_15.ipynb#slices)
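
A short sketch of slices as borrowed views: `&str` is a slice of a `String` (or of a string literal), and `&[T]` is a slice of a `Vec<T>` or array. The function names are illustrative.

```rust
/// Returns the part of `s` before the first space (or all of `s`).
/// The result borrows from `s`; no new string is allocated.
fn first_word(s: &str) -> &str {
    match s.find(' ') {
        Some(i) => &s[..i],
        None => s,
    }
}

/// Returns everything except the first and last element.
fn middle(values: &[i32]) -> &[i32] {
    if values.len() <= 2 {
        &[]
    } else {
        &values[1..values.len() - 1]
    }
}
```

A `&String` coerces to `&str` automatically, so `first_word(&owned)` works for an owned `String` as well as for a literal.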

### Lifetimes of references in functions.

[Lifetimes](../lecture_16/lecture_16.ipynb#2-lifetimes)

- Lifetime annotation syntax

Three rules of lifetime elision (what the compiler infers when annotations are omitted):
1. each parameter that is a reference gets its own lifetime
2. if there is exactly one input lifetime, it is assigned to all output lifetimes
3. on methods, the lifetime of `&self` (or `&mut self`) is assigned to all output lifetimes
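
A sketch showing when an annotation is needed and when the rules make it unnecessary. The function and struct names are illustrative.

```rust
// Two input references: the rules cannot decide which one the output
// borrows from, so an explicit lifetime `'a` is required.
fn longest<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() { a } else { b }
}

// One input reference: its lifetime is assigned to the output
// automatically, so nothing needs to be written.
fn trimmed(s: &str) -> &str {
    s.trim_start()
}

// A struct that holds a reference needs a lifetime parameter.
struct Excerpt<'a> {
    text: &'a str,
}

impl<'a> Excerpt<'a> {
    // Method: the output gets the lifetime of `&self`.
    fn first_word(&self) -> &str {
        self.text.split_whitespace().next().unwrap_or("")
    }
}
```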

### Closures and Iterators in Rust

[Closures](../lecture_16/lecture_16.ipynb#3-closures-anonymous-functions)

- anonymous functions
- closure syntax (with and without explicit types)
- capturing a variable from the environment
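
A sketch of the three bullets above: closure syntax with and without types, and capturing from the environment. All names are illustrative.

```rust
/// Closure syntax with explicit types and with inferred types.
fn closure_syntax_demo() -> (i32, i32) {
    let square_typed = |x: i32| -> i32 { x * x };
    let square_inferred = |x| x * x;
    (square_typed(3), square_inferred(4))
}

/// Returns a closure that captures `offset` by value (`move`).
fn make_adder(offset: i32) -> impl Fn(i32) -> i32 {
    move |x| x + offset
}

/// Accepts any closure (or function) from `i32` to `i32`.
fn apply_twice<F: Fn(i32) -> i32>(f: F, x: i32) -> i32 {
    f(f(x))
}

/// A closure that mutates a captured variable must be declared `mut`.
fn count_calls() -> i32 {
    let mut counter = 0;
    let mut bump = || counter += 1;
    bump();
    bump();
    counter
}
```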

[Iterators](../lecture_17/lecture_17.ipynb)

- works on collections and provides a `next()` method
- returns `Option` enum variants `Some(value)` or `None`
- many useful methods, some return iterators

[Iterator + Closure Magic](../lecture_17/lecture_17.ipynb#iterator-+-closure-magic)

- `.for_each(|x| { ... })`
- `.map()`
- ...
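
A sketch of `next()` and of chaining iterator adapters with closures. The function names are illustrative.

```rust
/// Calls `next()` by hand: `Some(value)` while items remain, then `None`.
fn first_two(values: &[i32]) -> (Option<i32>, Option<i32>) {
    let mut it = values.iter().copied();
    (it.next(), it.next())
}

/// `filter` and `map` return new iterators; `sum` consumes the chain.
fn sum_of_even_squares(values: &[i32]) -> i32 {
    values
        .iter()
        .filter(|&&x| x % 2 == 0)
        .map(|&x| x * x)
        .sum()
}

/// `for_each` runs a closure on every item for its side effect.
fn for_each_total(values: &[i32]) -> i32 {
    let mut total = 0;
    values.iter().for_each(|x| total += x);
    total
}
```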

### ~~Rayon and~~ parallelism primitives (thread, mpsc, parallel iterators, scopes, locks)

> Rayon is a Rust crate that introduces parallelized iterations, among other things. 

[Lecture 24 -- Parallel and Concurrent Programming](../lecture_24/lecture_24.ipynb)

- threads
- message passing, multi-producer, single-consumer channels
- shared memory using mutexes
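
A sketch of the two communication styles listed above, using only the standard library. The function names and the way work is split into chunks are assumptions made for illustration.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Message passing: each worker sums one chunk and sends the partial
/// result over a multi-producer, single-consumer channel.
fn sum_with_channel(chunks: Vec<Vec<u64>>) -> u64 {
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();
    for chunk in chunks {
        let tx = tx.clone(); // one sender per producer thread
        handles.push(thread::spawn(move || {
            let partial: u64 = chunk.iter().sum();
            tx.send(partial).expect("receiver should still be alive");
        }));
    }
    // Drop the original sender so `rx.iter()` ends once all workers finish.
    drop(tx);
    let total: u64 = rx.iter().sum();
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    total
}

/// Shared memory: threads increment one counter protected by a mutex.
fn count_with_mutex(num_threads: usize, increments: usize) -> usize {
    let counter = Arc::new(Mutex::new(0usize));
    let handles: Vec<_> = (0..num_threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..increments {
                    // The lock is released at the end of this statement.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}
```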


## Basic Algorithms Knowledge

### Graph Representations (list of edges, adjacency lists, adjacency matrix)

[Graph Representations](../lecture_11/lecture_11.ipynb#graph-representations-variious-options)
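
A sketch converting a list of edges into the other two representations, assuming an undirected graph with vertices numbered `0..n`. The function names are illustrative.

```rust
/// Adjacency lists: `adj[u]` holds the neighbors of `u`. Space O(V + E).
fn adjacency_list(n: usize, edges: &[(usize, usize)]) -> Vec<Vec<usize>> {
    let mut adj = vec![Vec::new(); n];
    for &(u, v) in edges {
        adj[u].push(v);
        adj[v].push(u); // undirected: store both directions
    }
    adj
}

/// Adjacency matrix: `m[u][v]` is true if the edge exists. Space O(V^2).
fn adjacency_matrix(n: usize, edges: &[(usize, usize)]) -> Vec<Vec<bool>> {
    let mut m = vec![vec![false; n]; n];
    for &(u, v) in edges {
        m[u][v] = true;
        m[v][u] = true;
    }
    m
}
```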

### Counting triangles

[Count Triangles](../lecture_11/lecture_11.ipynb#count-triangles), solution 1 is sufficient
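
A brute-force sketch over an adjacency matrix. This may differ from the lecture's solution 1. It checks every triple $i < j < k$, so it runs in $O(V^3)$ time.

```rust
/// Counts triangles in an undirected graph given as an adjacency matrix.
fn count_triangles(adj: &[Vec<bool>]) -> usize {
    let n = adj.len();
    let mut count = 0;
    for i in 0..n {
        for j in (i + 1)..n {
            if !adj[i][j] {
                continue; // no edge i-j, so no triangle through this pair
            }
            for k in (j + 1)..n {
                if adj[i][k] && adj[j][k] {
                    count += 1;
                }
            }
        }
    }
    count
}
```

Requiring $i < j < k$ counts each triangle exactly once.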

### Stacks/Queues

[Linked Lists](../lecture_13/lecture_13.ipynb#1linked-lists)

- singly linked list (SLL), doubly linked list (DLL)
- cost of operations 'Insert to Front/Back/Middle', 'Remove from Front/Back/Middle' for SLL or DLL

[Stacks](../lecture_13/lecture_13.ipynb#2-stacks)

- Last in First Out (LIFO) -- the Pez dispenser
- Cost of operations for push, pop, top/peek

[Queues](../lecture_13/lecture_13.ipynb#3-queues)

- First In First Out (FIFO)
- Add at the end (enqueue), get from the front (dequeue)

[The VecDeque Container in Rust](../lecture_13/lecture_13.ipynb#the-vecdeque-container-in-rust----stdcollectionsvecdequet)

- generalization of queue and stack
- accessing front: `push_front(x)` and `pop_front()`
- accessing back: `push_back(x)` and `pop_back()`
- `pop_*` returns `Option<T>`
- use `VecDeque` as a stack: `push_back` and `pop_back`
- use `VecDeque` as a queue: `push_back` and `pop_front`
- implementation: use an array allocated on the heap as a circular buffer (wrap around)
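
A sketch of the two usage patterns above. The function names are illustrative.

```rust
use std::collections::VecDeque;

/// Stack (LIFO): push and pop at the same end.
fn stack_order(items: &[i32]) -> Vec<i32> {
    let mut deque = VecDeque::new();
    for &x in items {
        deque.push_back(x);
    }
    let mut out = Vec::new();
    // `pop_back` returns `Option<T>`: `None` once the deque is empty.
    while let Some(x) = deque.pop_back() {
        out.push(x);
    }
    out
}

/// Queue (FIFO): push at the back, pop from the front.
fn queue_order(items: &[i32]) -> Vec<i32> {
    let mut deque = VecDeque::new();
    for &x in items {
        deque.push_back(x);
    }
    let mut out = Vec::new();
    while let Some(x) = deque.pop_front() {
        out.push(x);
    }
    out
}
```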

### BFS, DFS, Connected Components, Strongly connected components.

[BFS and DFS](../lecture_14/lecture_14.ipynb#bfs-and-dfs)

- BFS uses a queue
- DFS uses a stack
- BFS good for computing distances
- Computational complexity of BFS: O(V+E)
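
A sketch of BFS computing distances (in number of edges) from a start vertex, assuming the graph is given as adjacency lists. Unreachable vertices are reported as `None`.

```rust
use std::collections::VecDeque;

fn bfs_distances(adj: &[Vec<usize>], start: usize) -> Vec<Option<usize>> {
    let mut dist = vec![None; adj.len()];
    let mut queue = VecDeque::new();
    dist[start] = Some(0);
    queue.push_back(start);
    while let Some(u) = queue.pop_front() {
        let next = dist[u].map(|d| d + 1);
        for &v in &adj[u] {
            // The first visit is along a shortest path, so set it only once.
            if dist[v].is_none() {
                dist[v] = next;
                queue.push_back(v);
            }
        }
    }
    dist
}
```

Each vertex enters the queue at most once and each adjacency list is scanned once, which gives the O(V+E) bound.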

[Connected Components](../lecture_14/lecture_14.ipynb#connected-components-via-bfs)

- Both BFS and DFS are good for finding connected components
- Connected Components complexity: O(V+E) for both BFS and DFS
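
A sketch that labels connected components of an undirected graph by starting a BFS from every vertex that has not been labeled yet. The function name and the labeling scheme (0, 1, 2, ... in order of discovery) are assumptions.

```rust
use std::collections::VecDeque;

/// Returns one component label per vertex.
fn component_labels(adj: &[Vec<usize>]) -> Vec<usize> {
    let n = adj.len();
    let mut label: Vec<Option<usize>> = vec![None; n];
    let mut next_label = 0;
    for start in 0..n {
        if label[start].is_some() {
            continue; // already reached from an earlier start vertex
        }
        label[start] = Some(next_label);
        let mut queue = VecDeque::from([start]);
        while let Some(u) = queue.pop_front() {
            for &v in &adj[u] {
                if label[v].is_none() {
                    label[v] = Some(next_label);
                    queue.push_back(v);
                }
            }
        }
        next_label += 1;
    }
    label
        .into_iter()
        .map(|l| l.expect("every vertex was labeled"))
        .collect()
}
```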

[strongly connected components](../lecture_14/lecture_14.ipynb#4strongly-connected-components-for-directed-graphs)

- Strongly Connected Components (only applies to directed graphs)


### Priority queues, binary heap implementation.

[Priority Queues](../lecture_14/lecture_14.ipynb#priority-queues)

- highest priority returned first
- implemented in Rust as `BinaryHeap<T>`
- the datatype used must support ordering (the `Ord` trait)
- `BinaryHeap` is a max-heap; wrap elements in `std::cmp::Reverse` to pop the lowest priority (smallest) element first
- complexity: Push - O(1) expected (O(log n) worst case), Pop - O(log n), Peek - O(1)
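
A sketch of `BinaryHeap` as a max-heap and, with `Reverse`, as a min-heap. The function names are illustrative.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Max-heap: `pop` returns the largest element (or `None` if empty).
fn largest(values: &[i32]) -> Option<i32> {
    let mut heap = BinaryHeap::new();
    for &v in values {
        heap.push(v);
    }
    heap.pop()
}

/// Min-heap: wrapping in `Reverse` flips the ordering.
fn smallest(values: &[i32]) -> Option<i32> {
    let mut heap = BinaryHeap::new();
    for &v in values {
        heap.push(Reverse(v));
    }
    // Unwrap the `Reverse` to get the plain value back.
    heap.pop().map(|Reverse(v)| v)
}
```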

[Binary Heap](../lecture_15/lecture_15.ipynb)

- Binary tree: 1 root, each node has 0-2 children

[HeapSort](../lecture_15/lecture_15.ipynb#application-1-sorting-aka-heapsort)

- put everything into a priority queue, remove items in order
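
A sketch of that idea using the standard `BinaryHeap` rather than a hand-written heap: $n$ pushes followed by $n$ pops at $O(\log n)$ each give $O(n \log n)$ overall.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Sorts in ascending order by popping from a min-heap.
fn heap_sort(values: &[i32]) -> Vec<i32> {
    // Put everything into the priority queue.
    let mut heap: BinaryHeap<Reverse<i32>> =
        values.iter().map(|&v| Reverse(v)).collect();
    // Remove items in order: smallest first because of `Reverse`.
    let mut sorted = Vec::with_capacity(values.len());
    while let Some(Reverse(v)) = heap.pop() {
        sorted.push(v);
    }
    sorted
}
```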

### Binary search trees and their uses (range searching).

[Binary Trees](../lecture_18/lecture_18.ipynb#binary-trees)

[Tree Traversal](../lecture_18/lecture_18.ipynb#tree-traversal)

- BFS level order traversal

[Depth Traversal](../lecture_18/lecture_18.ipynb#depth-traversals)

- pre-order: visit node, recursively traverse left subtree, recursively traverse right subtree
- in-order: recursively traverse left subtree, visit node, recursively traverse right subtree
- post-order: recursively traverse left subtree, recursively traverse right subtree, visit node
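
A sketch of the three depth-first traversals on a binary tree built from `Option<Box<Node>>`. The type and function names are illustrative.

```rust
struct Node {
    value: i32,
    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
}

type Tree = Option<Box<Node>>;

/// Convenience constructor for a node without children.
fn leaf(value: i32) -> Tree {
    Some(Box::new(Node { value, left: None, right: None }))
}

fn pre_order(tree: &Tree, out: &mut Vec<i32>) {
    if let Some(node) = tree {
        out.push(node.value); // visit node first
        pre_order(&node.left, out);
        pre_order(&node.right, out);
    }
}

fn in_order(tree: &Tree, out: &mut Vec<i32>) {
    if let Some(node) = tree {
        in_order(&node.left, out);
        out.push(node.value); // visit node between the subtrees
        in_order(&node.right, out);
    }
}

fn post_order(tree: &Tree, out: &mut Vec<i32>) {
    if let Some(node) = tree {
        post_order(&node.left, out);
        post_order(&node.right, out);
        out.push(node.value); // visit node last
    }
}
```

On a binary search tree, the in-order traversal visits the keys in sorted order.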


[Binary Search Trees](../lecture_18/lecture_18.ipynb#binary-search-trees)

- recall binary heaps: parents are greater (or lesser) than children
- binary search tree: (left subtree) < parent node < (right subtree)
- enables _binary search_ for efficient lookup, addition or removal of items
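- complexity for search, insert or delete is $O(h)$ for a tree of height $h$: $O(\log n)$ when the tree is balanced, $O(n)$ in the worst case (a degenerate, list-like tree)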
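
A sketch of an unbalanced binary search tree with recursive insert and lookup. The names are illustrative, and no rebalancing is attempted. Each call follows one path from the root, so the cost is proportional to the height of the tree.

```rust
struct BstNode {
    key: i32,
    left: Option<Box<BstNode>>,
    right: Option<Box<BstNode>>,
}

/// Inserts `key`; returns false if it was already present.
fn insert(tree: &mut Option<Box<BstNode>>, key: i32) -> bool {
    match tree {
        None => {
            // Found the empty spot where the key belongs.
            *tree = Some(Box::new(BstNode { key, left: None, right: None }));
            true
        }
        Some(node) => {
            if key < node.key {
                insert(&mut node.left, key)
            } else if key > node.key {
                insert(&mut node.right, key)
            } else {
                false
            }
        }
    }
}

/// Binary search: go left for smaller keys, right for larger ones.
fn contains(tree: &Option<Box<BstNode>>, key: i32) -> bool {
    match tree {
        None => false,
        Some(node) => {
            if key == node.key {
                true
            } else if key < node.key {
                contains(&node.left, key)
            } else {
                contains(&node.right, key)
            }
        }
    }
}
```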

[B-Trees](../lecture_18/lecture_18.ipynb#b-trees)

- binary search trees are inefficient on modern computer architectures
- B-trees are balanced search trees where each node contains between $B$ and $2B$ keys
- Rust `std::collections` provides `BTreeMap` (keys and values like HashMap) and `BTreeSet` (keys like HashSet)

[BTreeMap vs HashMap](../lecture_18/lecture_18.ipynb#btreemap-vs-hashmap)

- ordered versus unordered
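
Because a `BTreeMap` keeps its keys sorted, it supports range searching, which a `HashMap` does not. A sketch with made-up data (scores mapped to names):

```rust
use std::collections::BTreeMap;

/// Returns the names whose score lies in `low..=high`, in key order.
fn names_in_range(scores: &BTreeMap<u32, String>, low: u32, high: u32) -> Vec<String> {
    scores
        .range(low..=high)
        .map(|(_score, name)| name.clone())
        .collect()
}
```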

### Tries

[Prefix Tree](../lecture_18/lecture_18.ipynb#prefix-tree-trie)

- efficient data structure for dictionary search
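
A minimal trie sketch that stores children in a `HashMap<char, TrieNode>`. This is one possible layout and not necessarily the one used in lecture. Lookup cost depends on the length of the word, not on the number of words stored.

```rust
use std::collections::HashMap;

#[derive(Default)]
struct TrieNode {
    children: HashMap<char, TrieNode>,
    is_word: bool, // true if a stored word ends at this node
}

#[derive(Default)]
struct Trie {
    root: TrieNode,
}

impl Trie {
    fn insert(&mut self, word: &str) {
        let mut node = &mut self.root;
        for ch in word.chars() {
            // Follow the edge for `ch`, creating the child if needed.
            node = node.children.entry(ch).or_default();
        }
        node.is_word = true;
    }

    /// Walks down the trie along `s`; `None` if the path does not exist.
    fn find(&self, s: &str) -> Option<&TrieNode> {
        let mut node = &self.root;
        for ch in s.chars() {
            node = node.children.get(&ch)?;
        }
        Some(node)
    }

    fn contains(&self, word: &str) -> bool {
        self.find(word).map_or(false, |node| node.is_word)
    }

    fn starts_with(&self, prefix: &str) -> bool {
        self.find(prefix).is_some()
    }
}
```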

### Shortest Paths (Dijkstra's algorithms)

[Dijkstra's Algorithm](../lecture_15/lecture_15.ipynb#application-2-shortest-weighted-paths-dijkstras-algorithm)

- greedily take the closest unprocessed vertex
- keep updating distances of unprocessed vertices
- uses a binary heap to get the nearest unvisited node
- with all edge weights equal, result is equivalent to BFS
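
A sketch of the steps above, assuming non-negative integer weights and a graph stored as adjacency lists of `(neighbor, weight)` pairs. Instead of updating entries inside the heap, it pushes a new entry for every improvement and skips stale ones when they are popped. That is a common simplification and an assumption here, not necessarily the lecture's version.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

fn dijkstra(adj: &[Vec<(usize, u64)>], start: usize) -> Vec<Option<u64>> {
    let mut dist = vec![None; adj.len()];
    let mut heap = BinaryHeap::new();
    dist[start] = Some(0);
    heap.push(Reverse((0u64, start))); // min-heap ordered by distance
    while let Some(Reverse((d, u))) = heap.pop() {
        // Stale entry: a shorter path to `u` was already processed.
        if dist[u].map_or(false, |best| d > best) {
            continue;
        }
        for &(v, w) in &adj[u] {
            let candidate = d + w;
            if dist[v].map_or(true, |best| candidate < best) {
                dist[v] = Some(candidate);
                heap.push(Reverse((candidate, v)));
            }
        }
    }
    dist
}
```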


### Greedy algorithms (Minimum Spanning Tree), Divide and Conquer algorithms (MergeSort, QuickSort)

[Binary Search](../lecture_17/lecture_17.ipynb#binary-search)

[Greedy Algorithms](../lecture_19/lecture_19.ipynb#2-greedy-algorithms)

[Minimum Spanning Tree](../lecture_19/lecture_19.ipynb#another-example-minimum-spanning-tree)

- find cheapest subset of edges for connected graph
- Kruskal's algorithm: add cheapest edge to connect disconnected groups of vertices
- Complexity $O(E \log(E))$
- use weighted graph as reference
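
A sketch of Kruskal's algorithm on an edge list of `(u, v, weight)` triples. The groups of vertices are tracked with a small union-find (disjoint-set) helper, which is an assumption about the bookkeeping rather than something stated in these notes. Sorting the edges dominates the running time, giving $O(E \log E)$.

```rust
/// Finds the representative of `x`'s group, compressing the path.
fn find(parent: &mut [usize], x: usize) -> usize {
    let mut root = x;
    while parent[root] != root {
        root = parent[root];
    }
    let mut cur = x;
    while parent[cur] != root {
        let next = parent[cur];
        parent[cur] = root;
        cur = next;
    }
    root
}

/// Returns the total weight and the chosen edges of a minimum spanning
/// tree (or forest, if the graph is disconnected).
fn kruskal(n: usize, edges: &[(usize, usize, u64)]) -> (u64, Vec<(usize, usize, u64)>) {
    let mut sorted = edges.to_vec();
    sorted.sort_by_key(|&(_, _, w)| w); // cheapest edges first
    let mut parent: Vec<usize> = (0..n).collect();
    let mut total = 0;
    let mut tree = Vec::new();
    for (u, v, w) in sorted {
        let (ru, rv) = (find(&mut parent, u), find(&mut parent, v));
        if ru != rv {
            // The edge connects two different groups: keep it and merge them.
            parent[ru] = rv;
            total += w;
            tree.push((u, v, w));
        }
    }
    (total, tree)
}
```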

[Traveling Salesman Approximation using MST](../lecture_19/lecture_19.ipynb#traveling-salesman-approximation-using-mst)

- Make a MST then run DFS, look for shortcuts
- not optimal but no worse than 2x the optimal tour

[Divide and Conquer](../lecture_19/lecture_19.ipynb#3-divide-and-conquer)

- recursively subdivide problems

[Merge Sort](../lecture_19/lecture_19.ipynb#merge-sort)

- recursively sort 1st half, sort 2nd half, merge the results
- complexity $O(n \log n)$ overall
- requires extra memory allocation
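
A sketch of merge sort that returns a new vector, which makes the extra memory allocation explicit.

```rust
fn merge_sort(values: &[i32]) -> Vec<i32> {
    if values.len() <= 1 {
        return values.to_vec();
    }
    let mid = values.len() / 2;
    let left = merge_sort(&values[..mid]);
    let right = merge_sort(&values[mid..]);

    // Merge two sorted halves in linear time.
    let mut merged = Vec::with_capacity(values.len());
    let (mut i, mut j) = (0, 0);
    while i < left.len() && j < right.len() {
        if left[i] <= right[j] {
            merged.push(left[i]);
            i += 1;
        } else {
            merged.push(right[j]);
            j += 1;
        }
    }
    // At most one of these still has elements left.
    merged.extend_from_slice(&left[i..]);
    merged.extend_from_slice(&right[j..]);
    merged
}
```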

[Quick Sort](../lecture_19/lecture_19.ipynb#quick-sort)

- select arbitrary element $x$ from the array
- partition vector
    - move elements less than $x$ to its left
    - move elements greater than $x$ to its right
    - $x$ will be in the correct location
- recurse on left and right partitions
- expected complexity $O(n \log n)$
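
A sketch of in-place quicksort. Picking the middle element as the pivot and using a Lomuto-style partition are assumptions made for brevity; a randomly chosen pivot is what the expected $O(n \log n)$ bound usually refers to.

```rust
/// Partitions `values` around a pivot and returns the pivot's final index.
fn partition(values: &mut [i32]) -> usize {
    let last = values.len() - 1;
    // Move the chosen pivot (middle element) out of the way, to the end.
    values.swap(values.len() / 2, last);
    let pivot = values[last];
    let mut store = 0;
    for i in 0..last {
        if values[i] < pivot {
            values.swap(i, store); // smaller elements go to the left part
            store += 1;
        }
    }
    values.swap(store, last); // the pivot lands in its correct location
    store
}

fn quick_sort(values: &mut [i32]) {
    if values.len() <= 1 {
        return;
    }
    let pivot_index = partition(values);
    let (left, right) = values.split_at_mut(pivot_index);
    quick_sort(left);
    quick_sort(&mut right[1..]); // skip the pivot itself
}
```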

## Data Science In Rust

### Ndarray, basic operations (dot, ndarray/scale ops, slices)

[NDArray](../lecture_20/lecture_20.ipynb#ndarray)

- Supports 1D, 2D, 3D, ... dimension arrays
- array! macro
- indexing and slicing
- element-wise operations like +, -, *, /
- vector and matrix operations like `.dot()`
- simple linear algebra operations
- NDArray `s![]` macro for taking slices of arrays

### Neural Networks.  Forward and backward propagation and how it maps to matrix/matrix, matrix/vector, vector operations.

[Neural Networks](../lecture_20/lecture_20.ipynb#ndarray-and-neural-networks)

- artificial neuron is a weighted sum of inputs plus a bias, followed by a nonlinear activation function
- multilayer neural network is called "fully connected" or "multi-layer perceptron"
- can be implemented with matrix arithmetic
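
A sketch of the forward pass of one fully connected layer, $\mathbf{y} = \mathrm{ReLU}(W\mathbf{x} + \mathbf{b})$. To stay dependency-free it uses plain `Vec<f64>` instead of `ndarray`, and ReLU is an assumed choice of activation function.

```rust
fn relu(x: f64) -> f64 {
    x.max(0.0)
}

/// One dense layer: each row of `weights` is one neuron.
fn dense_forward(weights: &[Vec<f64>], bias: &[f64], input: &[f64]) -> Vec<f64> {
    weights
        .iter()
        .zip(bias)
        .map(|(row, b)| {
            // Weighted sum of the inputs: a row-vector dot product.
            let weighted_sum: f64 = row.iter().zip(input).map(|(w, x)| w * x).sum();
            relu(weighted_sum + b)
        })
        .collect()
}
```

Computing all rows at once is a matrix/vector product, which is the part an array library would handle with a single `dot` call.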

### Decision Trees.  Metrics to optimize when creating a DT (Gini, Entropy).  Properties of decision Trees.

[Decision Trees](../lecture_22/lecture_22.ipynb)

- Supervised learning
- Used for regression or classification
- split on features
- "impurity" measures: Gini and Entropy
- try to make the split that maximally decreases impurity
- reduce overfitting by limiting tree depth (hyperparameter)
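
A sketch of the two impurity measures for a node whose class counts are known: Gini is $1 - \sum_i p_i^2$ and entropy is $-\sum_i p_i \log_2 p_i$, where $p_i$ is the fraction of samples in class $i$. The function names are illustrative.

```rust
/// Gini impurity from per-class sample counts.
fn gini(counts: &[usize]) -> f64 {
    let total: usize = counts.iter().sum();
    if total == 0 {
        return 0.0;
    }
    let sum_of_squares: f64 = counts
        .iter()
        .map(|&c| {
            let p = c as f64 / total as f64;
            p * p
        })
        .sum();
    1.0 - sum_of_squares
}

/// Entropy in bits from per-class sample counts.
fn entropy(counts: &[usize]) -> f64 {
    let total: usize = counts.iter().sum();
    if total == 0 {
        return 0.0;
    }
    let sum: f64 = counts
        .iter()
        .filter(|&&c| c > 0) // treat 0 * log(0) as 0
        .map(|&c| {
            let p = c as f64 / total as f64;
            p * p.log2()
        })
        .sum();
    -sum
}
```

Both measures are 0 for a pure node and largest when the classes are evenly mixed.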

### Linear Regression. 

[Linear Regression](../lecture_23/lecture_23.ipynb)

- single variable (univariate)
- multi-variable (multivariate)
- convert categorical data to one-hot encoded
- Use $R^2$ coefficient of determination to measure fit
- Ordinary Least Squares (OLS) linear in inputs and parameters 
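
A sketch of single-variable OLS using the closed-form solution, slope $= \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$ and intercept $= \bar{y} - \text{slope} \cdot \bar{x}$, plus $R^2 = 1 - SS_{res}/SS_{tot}$. The function names are illustrative. It assumes the inputs are non-empty, have equal length, and that $x$ is not constant.

```rust
fn mean(values: &[f64]) -> f64 {
    values.iter().sum::<f64>() / values.len() as f64
}

/// Returns `(slope, intercept)` of the least-squares line.
fn fit_line(x: &[f64], y: &[f64]) -> (f64, f64) {
    let (x_bar, y_bar) = (mean(x), mean(y));
    let sxy: f64 = x.iter().zip(y).map(|(xi, yi)| (xi - x_bar) * (yi - y_bar)).sum();
    let sxx: f64 = x.iter().map(|xi| (xi - x_bar).powi(2)).sum();
    let slope = sxy / sxx;
    (slope, y_bar - slope * x_bar)
}

/// Coefficient of determination for the line `slope * x + intercept`.
fn r_squared(x: &[f64], y: &[f64], slope: f64, intercept: f64) -> f64 {
    let y_bar = mean(y);
    let ss_res: f64 = x
        .iter()
        .zip(y)
        .map(|(xi, yi)| (yi - (slope * xi + intercept)).powi(2))
        .sum();
    let ss_tot: f64 = y.iter().map(|yi| (yi - y_bar).powi(2)).sum();
    1.0 - ss_res / ss_tot
}
```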

[Generalized Least Squares](../lecture_23/lecture_23.ipynb)

- generalized least squares, linear in parameters but not necessarily in inputs
- Design matrix definition and construction.

[Bias and Variance](../lecture_23/bias-and-variance)

- linear regression: high bias, lower variance
- decision trees: lower bias, higher variance

### Error metrics and their properties (MSE, MAE, others)

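- Regression minimizes a chosen error metric (OLS minimizes squared error)
- MSE squares each error, so it weights large errors (outliers) more heavily than MAE
- MAE weights errors in proportion to their size, so small errors count relatively more than under MSE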
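
A sketch of the two metrics, $\text{MSE} = \frac{1}{n}\sum_i (y_i - \hat{y}_i)^2$ and $\text{MAE} = \frac{1}{n}\sum_i |y_i - \hat{y}_i|$, assuming non-empty inputs of equal length.

```rust
/// Mean squared error.
fn mse(actual: &[f64], predicted: &[f64]) -> f64 {
    let n = actual.len() as f64;
    actual
        .iter()
        .zip(predicted)
        .map(|(a, p)| (a - p).powi(2))
        .sum::<f64>()
        / n
}

/// Mean absolute error.
fn mae(actual: &[f64], predicted: &[f64]) -> f64 {
    let n = actual.len() as f64;
    actual
        .iter()
        .zip(predicted)
        .map(|(a, p)| (a - p).abs())
        .sum::<f64>()
        / n
}
```

For errors of 1, 1, 1, and 5 the MAE is 2 while the MSE is 7, because the single large error contributes 25 of the 28 squared units.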

### Cross-validation concepts.

[Cross Validation](../lecture_23/lecture_23.ipynb#challenges-of-training-and-cross-validation)

- train/test and train/validate/test splits
- k-fold cross validation
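
A sketch of how k-fold cross validation splits sample indices: each of the $k$ folds serves once as the validation set while the remaining samples are used for training. Assigning sample `i` to fold `i % k` without shuffling is a simplification; in practice the data is usually shuffled first.

```rust
/// Returns `k` pairs of `(training indices, validation indices)`.
fn k_fold_indices(n: usize, k: usize) -> Vec<(Vec<usize>, Vec<usize>)> {
    assert!(k >= 2 && k <= n, "need 2 <= k <= n");
    (0..k)
        .map(|fold| {
            // Samples whose index falls in this fold are held out.
            let (train, validate): (Vec<usize>, Vec<usize>) =
                (0..n).partition(|i| i % k != fold);
            (train, validate)
        })
        .collect()
}
```

Every sample appears in exactly one validation set, so averaging the $k$ validation scores uses all of the data for evaluation.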