Skip to content

Data Structure & Algorithm

jwyx edited this page Jan 17, 2013 · 14 revisions

Thought:

空间 换 时间
正确率 换 空间

Math foundation

Monotonously increase / decrease
Floor(x):   小于或者等于x的最大整数
Ceiling(x): 大于或者等于x的最小整数
Modular arithmetic: 取模运算
Polylogarithmical, Polynomial, Exponential
    a > 1, n^b = o(a^n)
    a > 0, b > 0, (lg(n))^b = o(n^a)
Factorial: n! = o(n^n), lg(n!) = θ(n*lg n)
Fibonacci: 指数增长

Complexity Analysis:

Big-O notation
    A function of input size
    f(n) = O(g(n)):
        there exist constants c > 0, n0 > 0, such that 0 <= f(n) <= c*g(n) for all n >= n0;
        '=' like 'is';
    upper bounds & lower bounds
    1. Substitution method
        (1) Guess the form of the solution
        (2) Verify by induction
        (3) Solve for constants
    2. Recursion-tree method
    3. Master method
        ...

Divide & Conquer:

将问题分成一些独立的子问题,递归地求解各子问题,然后合并子问题的解而得到原问题的解

Dynamic Programming:

适合于子问题不是独立的情况,各子问题包含公共的子子问题

Greedy:

步骤
    1. 将优化问题转化为,即先做选择,再解决剩下的问题
    2. 证明原问题总是有一个最优解是做贪心选择得到的,从而说明贪心选择的安全
    3. 说明在做贪心选择后,剩余的子问题具有这样的一个性质。即如果将子问题的最优解和我们所作的贪心选择联合起来,
       可以得出原问题的一个最优解。
贪心选择性质: 一个全局最优解可以通过局部最优(贪心)选择来达到
最优子结构: 如果一个最优解包含了其子问题的最优解,则称该问题具有最优子结构
证明
    1. 假设一个最优解,对该解加以修改,使其采用贪心,这个选择将原问题变成一个相似的,或者更好的解
    2. 对子问题采用归纳法,证明每个步骤中所做的贪心选择最终会产生一个最优解

动态规划 vs. 贪心
    0-1背包问题 vs. 部分背包问题

1. 活动选择问题
    选择出由互相兼容问题组成的最大子集
    
2. Huffman Codes

Search:

BinarySearch
QuickSelect

Sort:

BubbleSort
    Compare continuous A & B, swap if A > B, final max in bottom;
SelectSort
    Find min in unordered part; insert to ordered part; ...
InsertSort
    Insert one from unordered into proper position in ordered; ...
ShellSort
    Multi-pass, each is insert, k is number of gap h
MergeSort
HeapSort
QuickSort
CountSort
RadixSort
BucketSort
ExternalSort

String:

...

List:

Single linked
Double linked
Circular

Priority Queue:

MAX-HEAPIFY(A, i): O(lg n)
BUILD-MAX-HEAP(A): O(n)
HEAPSORT(A): 对一个数组原地进行排序; O(n*lg n)
INSERT(S, x): 把元素x插入集合S; O(lg n)
MAXIMUM(S): 返回S中具有最大关键字的元素; O(1)
EXTRACT-MAX(S): 去掉并返回S中具有最大关键字的元素; O(lg n)
INCREASE-KEY(S, x, k): 将元素x的关键字的值增加到k,这里k >= 原关键字的值; O(lg n)

Stack:

LIFO
underflow vs. overflow
Attributes
    top[S]: 指向栈顶元素; = 0, empty; > n, full
Operations: O(1)
    EMPTY(S)
    PUSH(S, x)
    POP(S)

Queue:

FIFO
Attributes
    head[Q]: 指向队列头
    tail[Q]: 指向新元素将会被插入的位置
    head[Q] = tail[Q], empty;
    head[Q] = tail[Q] + 1, full;
Operations: O(1)
    Enqueue(Q, x)
    Dequeue(Q)

Tree

length of path = the number of edges in the path
depth:  node u to root; the length of unique path; depth[root]  = 0;
height: node u to one leaf; the length of longest path; height[leaf] = 0;

Binary search tree

complete binary tree: worst time θ(log n);
n nodes linear list:  worst time θ(n);

expected height:         O(log n);
expected operation time: O(log n);

for dictionary / priority queue

height of tree: h

node: key, satellite data, left, right, p [parent]

key[left child] <= key[current node]
key[right child] >= key[current node]

INORDER_TRAVERSAL:   time θ(n)
PREORDER_TRAVERSAL 
POSTORDER_TRAVERSAL

SEARCH:
    time O(h)
    recursive / iterative
    compare key of current node, if k < key, left; otherwise right
MINIMUM
    time O(h)
    find the most left from root
MAXIMUM
    time O(h)
    find the most right from root
PREDECESSOR
    与SUCCESSOR对称
SUCCESSOR
    time O(h), 因为向上或向下找; key不完全相同也适用;
    if right[x]存在, 返回 MINIMUM(right[x])
    else 从x向上找,直到遇到某个是其父结点的左儿子为止,返回改父结点
INSERT
    time O(h)
    从root开始沿树下降,指针x跟踪这条路直到nil,y始终为x的parent
DELETE
    time O(h)
    in: node z
    1. z no children, 修改其父结点parent[z],使nil为其子女;
    2. z one child, 连接child[z] 和 parent[z], 删除z;
    3. z two children, 删除z的successor y, 然后用y的内容代替z的内容.
在n个keys上随记构造的BST期望高度为O(lg n)

AVL Tree:

AVL tree T is a binary search tree in which every node p satisfies:
    The subtree TL rooted by p's left child, and subtree TR rooted by p's right child;
    With depths dL and dR;
    Must the case that dL <= dR + 1 and dR <= dL + 1.

n nodes AVL => depth O(log n).

Insert:
    case 1: p add to X; [pic]
    case 2: p add to Y; [pic]

Delete:
    case 1: depth of X >= the depth of Y;     as Insert case 1
    case 2: depth of Y == the depth of X + 1; as Insert case 2

Rotation:
    time O(1)

Hash table

INSERT
SEARCH
    worst time:    θ(n)
    expected time: O(1)
DELETE

DIRECT ADDRESS:
    数组, space O(n)
    SEARCH, INSERT, DELETE: worst/average O(1)
    
HASH TABLE:
    k为关键字集合,space O(|k|)
    散列函数 h: 关键字域到数组下标
    n 元素数; m 数组大小
    load factor: a = n/m
   
    Collision: 两个不同的key到相同的下标
        Chaining:
            用list存同一下标中的元素
            INSERT(T, x): insert at the head of list;
                worst case: O(1) 如果不检测是否在list中
                            O(length of list): 如果检测
            SEARCH(T, k): O(length of list)
            DELETE(T, x): 双链表 O(1), 单链表 O(length of list)
            simple uniform hashing
                任何元素散列到m个槽中的每一个的可能性是相同,且与其他元素已被散列到什么位置无关
                fail:    expected time θ(1+a)
                success: expected time θ(1+a)
                => if n = O(m), SEARCH θ(1+a)
            除法: h(k) = k mod m; m 选择接近n/a,又不接近2的任何幂次的质数
            乘法: h(k) = floor(m*(kA mod 1)), 0 < A < 1
                m 选择无要求; w 计算机字长
                e.g. m = 2^p, w = 32, A = 2,654,435,769/2^w, kA = r1*2^w + r0;
                    h(k) = r0 的最高p位
            全域??
        Open addressing:
            所有元素存放在散列表中
            INSERT(k, i): 第一个nil的槽, i = 0 .. n-1
            SEARCH(k, i): 没找到表示遇到nil或i=n还是没找到
            DELETE: 删除一般用chaining
            uniform hashing
                每个关键字的probing sequence是<0, 1, ..., m - 1>的 m! 种排列中的任何一种的可能性是相同的
            1. linear probing: h(k, i) = (h'(k) + i) mod m, i = 0, 1, ..., m - 1
               primary clustering问题: 连续被占用槽增加,平均查找时间增加
               用了θ(m)种探查序列
            2. quadratic probing: h(k, i) = (h'(k) + c1*i + c2*i^2) mod m, i = 0, 1, ..., m - 1, c2 != 0
               secondary clustering问题: 因为h(k1, 0) = h(k2, 0) -> h(k1, i) = h(k2, i)
               用了θ(m)种探查序列
            3. double hash: h(k, i) = (h1(k) + i * h2(k)) mod m, i = 0, 1, ..., m - 1
               h2(k) 和 m 互质,即不存在最大公约数
               1. m为2的幂, h2 总产生奇数;
               2. m为质数,h2 总产生比m小的正整数; e.g. h1(k) = k mod m; h2(k) = 1 + k mod (m - 1)
               用了θ(m^2)中探查序列

Graph

图 G = (V, E)
顶点数 V
边数   E
顶点集 V[G]
边集   V[E]
Sparse E << V^2 [远小于]
Dense  E <  V^2 [接近]

图表示
    Adjacent Table:  if sparse
        包含|V|个列表的数组Adj组成,每个列表对应V的一个顶点;
        对每一个u属于V, Adj[u]包含满足条件(u, v)属于E的顶点v.
        顶点为任意顺序
        无向图: 所有邻接表的长度为2E
        有向图: 所有邻接表的长度为E
        space O(V + E)
        不足: 搜索边存在与否,需要遍历list

    Adjacent Matrix: if dense or 很快判定两个顶点是否存在连接边
        space O(V^2)
        转置矩阵 transposed matrix
        对角线   diagonal line
        另外 图小 或者 不加权时用bit表示元素可以使用

    Breadth-first search
        E.g. Prim spanning tree or Dijkstra
        一个源顶点
        ancestor / descendant
        color[u] 颜色; 黑色 所有相邻顶点均被发现; 白色 未被发现; 灰色 可能有一些白色的相邻点;
        p[u]     父母
        d[u]     距离
        Queue    队列存储灰色点
        time O(V + E), 是邻接表大小的线性函数
        得到
            1. 从已知源顶点s到v之间的最短路径距离ð(s, v)为从s到v的任何路径中最少的边数;
            2. predecessor subgraph => 广度优先树;
            3. s到v的最短路径上所有顶点: PRINT-PATH(G, s, v), time O(length of path(s, v))
    Depth-first search:
        多个源顶点
        color[u] 颜色
        p[u] 父母
        d[u] 发现事件的时间戳
        f[u] 完成事件的时间戳
        得到
            1. predecessor subgraph => 深度优先森林;

Comparison of Data Structure

NOTE: worst/average; DeleteMin is after find; 
        | Insert  | Delete    |   Find    | Sort      | DeleteMin | Build     | FindMin
Heap    | O(lg n) | O(lg n)   | O(n) !    | O(n*lg n) | O(lg n)   | O(n)      | O(1)
Hashing | O(1)    | O(n)/O(1) | O(n)/O(1) | /         | /         | O(n)      | /
AVL     | O(lg n) | O(lg n)   | O(lg n)   | O(n)      | O(1)      | O(n*lg n) | O(lg n)

Delete:
    Heap: delete and use the last one
    AVL:  delete and use the successor or predecessor
Constant time:
    O(1) in Heap is smaller than that in AVL; if no find, Heap is better!

Comparison of Sort

NOTE: Worst / Average
         |  Time            | In place  | Stability  | On-line
Bubble   | O(n^2)           | Y         | Y          |
Select   | O(n^2)           | Y         | Y          |
Insert   | O(n^2)           | Y         | Y          | Y
Shell's  | depend on gaps   | Y         | N          |
Merge    | O(n*lg n)        | N         | N          |
Heap     | O(n*lg n)        | Y         | N          | 
Quick    | O(n^2)/O(n*lg n) | Y         | N          | 
Count    | O(n)             | N         | Y          |
Radix    | O(k*n)           | N         | it depends |
Bucket   | O(n^2)/O(n+k)    | N         |            |
External | ...
Clone this wiki locally