Data Structure & Algorithm

Thought:

空间 换 时间
正确率 换 空间

Math foundation

Monotonously increase / decrease
Floor(x):   小于或者等于x的最大整数
Ceiling(x): 大于或者等于x的最小整数
Modular arithmetic: 取模运算
Polylogarithmical, Polynomial, Exponential
    a > 1, n^b = o(a^n)
    a > 0, b > 0, (lg(n))^b = o(n^a)
Factorial: n! = o(n^n), lg(n!) = θ(n*lg n)
Fibonacci: 指数增长

Complexity Analysis:

Big-O notation
    A function of input size
    f(n) = O(g(n)):
        there exist constants c > 0, n0 > 0, such that 0 <= f(n) <= c*g(n) for all n >= n0;
        '=' like 'is';
    upper bounds & lower bounds
    1. Substitution method
        (1) Guess the form of the solution
        (2) Verify by induction
        (3) Solve for constants
    2. Recursion-tree method
    3. Master method
        ...

Divide & Conquer:

将问题分成一些独立的子问题，递归地求解各子问题，然后合并子问题的解而得到原问题的解

Dynamic Programming:

适合于子问题不是独立的情况，各子问题包含公共的子子问题

Greedy:

步骤
    1. 将优化问题转化为，即先做选择，再解决剩下的问题
    2. 证明原问题总是有一个最优解是做贪心选择得到的，从而说明贪心选择的安全
    3. 说明在做贪心选择后，剩余的子问题具有这样的一个性质。即如果将子问题的最优解和我们所作的贪心选择联合起来，
       可以得出原问题的一个最优解。
贪心选择性质: 一个全局最优解可以通过局部最优(贪心)选择来达到
最优子结构: 如果一个最优解包含了其子问题的最优解，则称该问题具有最优子结构
证明
    1. 假设一个最优解，对该解加以修改，使其采用贪心，这个选择将原问题变成一个相似的，或者更好的解
    2. 对子问题采用归纳法，证明每个步骤中所做的贪心选择最终会产生一个最优解

动态规划 vs. 贪心
    0-1背包问题 vs. 部分背包问题

1. 活动选择问题
    选择出由互相兼容问题组成的最大子集
    
2. Huffman Codes

Search:

BinarySearch
QuickSelect

Sort:

BubbleSort
    Compare continuous A & B, swap if A > B, final max in bottom;
SelectSort
    Find min in unordered part; insert to ordered part; ...
InsertSort
    Insert one from unordered into proper position in ordered; ...
ShellSort
    Multi-pass, each is insert, k is number of gap h
MergeSort
HeapSort
QuickSort
CountSort
RadixSort
BucketSort
ExternalSort

String:

...

List:

Single linked
Double linked
Circular

Priority Queue:

MAX-HEAPIFY(A, i): O(lg n)
BUILD-MAX-HEAP(A): O(n)
HEAPSORT(A): 对一个数组原地进行排序; O(n*lg n)
INSERT(S, x): 把元素x插入集合S; O(lg n)
MAXIMUM(S): 返回S中具有最大关键字的元素; O(1)
EXTRACT-MAX(S): 去掉并返回S中具有最大关键字的元素; O(lg n)
INCREASE-KEY(S, x, k): 将元素x的关键字的值增加到k，这里k >= 原关键字的值; O(lg n)

Stack:

LIFO
underflow vs. overflow
Attributes
    top[S]: 指向栈顶元素; = 0, empty; > n, full
Operations: O(1)
    EMPTY(S)
    PUSH(S, x)
    POP(S)

Queue:

FIFO
Attributes
    head[Q]: 指向队列头
    tail[Q]: 指向新元素将会被插入的位置
    head[Q] = tail[Q], empty;
    head[Q] = tail[Q] + 1, full;
Operations: O(1)
    Enqueue(Q, x)
    Dequeue(Q)

Tree

length of path = the number of edges in the path
depth:  node u to root; the length of unique path; depth[root]  = 0;
height: node u to one leaf; the length of longest path; height[leaf] = 0;

Binary search tree

complete binary tree: worst time θ(log n);
n nodes linear list:  worst time θ(n);

expected height:         O(log n);
expected operation time: O(log n);

for dictionary / priority queue

height of tree: h

node: key, satellite data, left, right, p [parent]

key[left child] <= key[current node]
key[right child] >= key[current node]

INORDER_TRAVERSAL:   time θ(n)
PREORDER_TRAVERSAL 
POSTORDER_TRAVERSAL

SEARCH:
    time O(h)
    recursive / iterative
    compare key of current node, if k < key, left; otherwise right
MINIMUM
    time O(h)
    find the most left from root
MAXIMUM
    time O(h)
    find the most right from root
PREDECESSOR
    与SUCCESSOR对称
SUCCESSOR
    time O(h), 因为向上或向下找; key不完全相同也适用;
    if right[x]存在, 返回 MINIMUM(right[x])
    else 从x向上找，直到遇到某个是其父结点的左儿子为止，返回改父结点
INSERT
    time O(h)
    从root开始沿树下降，指针x跟踪这条路直到nil，y始终为x的parent
DELETE
    time O(h)
    in: node z
    1. z no children, 修改其父结点parent[z],使nil为其子女;
    2. z one child, 连接child[z] 和 parent[z], 删除z;
    3. z two children, 删除z的successor y, 然后用y的内容代替z的内容.
在n个keys上随记构造的BST期望高度为O(lg n)

AVL Tree:

AVL tree T is a binary search tree in which every node p satisfies:
    The subtree TL rooted by p's left child, and subtree TR rooted by p's right child;
    With depths dL and dR;
    Must the case that dL <= dR + 1 and dR <= dL + 1.

n nodes AVL => depth O(log n).

Insert:
    case 1: p add to X; [pic]
    case 2: p add to Y; [pic]

Delete:
    case 1: depth of X >= the depth of Y;     as Insert case 1
    case 2: depth of Y == the depth of X + 1; as Insert case 2

Rotation:
    time O(1)

Hash table

INSERT
SEARCH
    worst time:    θ(n)
    expected time: O(1)
DELETE

DIRECT ADDRESS:
    数组, space O(n)
    SEARCH, INSERT, DELETE: worst/average O(1)
    
HASH TABLE:
    k为关键字集合，space O(|k|)
    散列函数 h: 关键字域到数组下标
    n 元素数; m 数组大小
    load factor: a = n/m
   
    Collision: 两个不同的key到相同的下标
        Chaining:
            用list存同一下标中的元素
            INSERT(T, x): insert at the head of list;
                worst case: O(1) 如果不检测是否在list中
                            O(length of list): 如果检测
            SEARCH(T, k): O(length of list)
            DELETE(T, x): 双链表 O(1), 单链表 O(length of list)
            simple uniform hashing
                任何元素散列到m个槽中的每一个的可能性是相同，且与其他元素已被散列到什么位置无关
                fail:    expected time θ(1+a)
                success: expected time θ(1+a)
                => if n = O(m), SEARCH θ(1+a)
            除法: h(k) = k mod m; m 选择接近n/a，又不接近2的任何幂次的质数
            乘法: h(k) = floor(m*(kA mod 1)), 0 < A < 1
                m 选择无要求; w 计算机字长
                e.g. m = 2^p, w = 32, A = 2,654,435,769/2^w, kA = r1*2^w + r0;
                    h(k) = r0 的最高p位
            全域??
        Open addressing:
            所有元素存放在散列表中
            INSERT(k, i): 第一个nil的槽, i = 0 .. n-1
            SEARCH(k, i): 没找到表示遇到nil或i＝n还是没找到
            DELETE: 删除一般用chaining
            uniform hashing
                每个关键字的probing sequence是<0, 1, ..., m - 1>的 m! 种排列中的任何一种的可能性是相同的
            1. linear probing: h(k, i) = (h'(k) + i) mod m, i = 0, 1, ..., m - 1
               primary clustering问题: 连续被占用槽增加，平均查找时间增加
               用了θ(m)种探查序列
            2. quadratic probing: h(k, i) = (h'(k) + c1*i + c2*i^2) mod m, i = 0, 1, ..., m - 1, c2 != 0
               secondary clustering问题: 因为h(k1, 0) = h(k2, 0) -> h(k1, i) = h(k2, i)
               用了θ(m)种探查序列
            3. double hash: h(k, i) = (h1(k) + i * h2(k)) mod m, i = 0, 1, ..., m - 1
               h2(k) 和 m 互质，即不存在最大公约数
               1. m为2的幂， h2 总产生奇数;
               2. m为质数，h2 总产生比m小的正整数; e.g. h1(k) = k mod m; h2(k) = 1 + k mod (m - 1)
               用了θ(m^2)中探查序列

Graph

图 G = (V, E)
顶点数 V
边数   E
顶点集 V[G]
边集   V[E]
Sparse E << V^2 [远小于]
Dense  E <  V^2 [接近]

图表示
    Adjacent Table:  if sparse
        包含|V|个列表的数组Adj组成，每个列表对应V的一个顶点;
        对每一个u属于V, Adj[u]包含满足条件(u, v)属于E的顶点v.
        顶点为任意顺序
        无向图: 所有邻接表的长度为2E
        有向图: 所有邻接表的长度为E
        space O(V + E)
        不足: 搜索边存在与否，需要遍历list

    Adjacent Matrix: if dense or 很快判定两个顶点是否存在连接边
        space O(V^2)
        转置矩阵 transposed matrix
        对角线   diagonal line
        另外 图小 或者 不加权时用bit表示元素可以使用

    Breadth-first search
        E.g. Prim spanning tree or Dijkstra
        一个源顶点
        ancestor / descendant
        color[u] 颜色; 黑色 所有相邻顶点均被发现; 白色 未被发现; 灰色 可能有一些白色的相邻点;
        p[u]     父母
        d[u]     距离
        Queue    队列存储灰色点
        time O(V + E), 是邻接表大小的线性函数
        得到
            1. 从已知源顶点s到v之间的最短路径距离ð(s, v)为从s到v的任何路径中最少的边数;
            2. predecessor subgraph => 广度优先树;
            3. s到v的最短路径上所有顶点: PRINT-PATH(G, s, v), time O(length of path(s, v))
    Depth-first search:
        多个源顶点
        color[u] 颜色
        p[u] 父母
        d[u] 发现事件的时间戳
        f[u] 完成事件的时间戳
        得到
            1. predecessor subgraph => 深度优先森林;

Comparison of Data Structure

NOTE: worst/average; DeleteMin is after find; 
        | Insert  | Delete    |   Find    | Sort      | DeleteMin | Build     | FindMin
Heap    | O(lg n) | O(lg n)   | O(n) !    | O(n*lg n) | O(lg n)   | O(n)      | O(1)
Hashing | O(1)    | O(n)/O(1) | O(n)/O(1) | /         | /         | O(n)      | /
AVL     | O(lg n) | O(lg n)   | O(lg n)   | O(n)      | O(1)      | O(n*lg n) | O(lg n)

Delete:
    Heap: delete and use the last one
    AVL:  delete and use the successor or predecessor
Constant time:
    O(1) in Heap is smaller than that in AVL; if no find, Heap is better!

Comparison of Sort

NOTE: Worst / Average
         |  Time            | In place  | Stability  | On-line
Bubble   | O(n^2)           | Y         | Y          |
Select   | O(n^2)           | Y         | Y          |
Insert   | O(n^2)           | Y         | Y          | Y
Shell's  | depend on gaps   | Y         | N          |
Merge    | O(n*lg n)        | N         | N          |
Heap     | O(n*lg n)        | Y         | N          | 
Quick    | O(n^2)/O(n*lg n) | Y         | N          | 
Count    | O(n)             | N         | Y          |
Radix    | O(k*n)           | N         | it depends |
Bucket   | O(n^2)/O(n+k)    | N         |            |
External | ...

Data Structure & Algorithm

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally