Data Structure & Algorithm
Thought:
Trade space for time
Trade accuracy for space
Math foundation
Monotonically increasing / decreasing
Floor(x): the largest integer less than or equal to x
Ceiling(x): the smallest integer greater than or equal to x
Modular arithmetic: the remainder (mod) operation
Polylogarithmical, Polynomial, Exponential
a > 1, n^b = o(a^n)
a > 0, b > 0, (lg(n))^b = o(n^a)
Factorial: n! = o(n^n), lg(n!) = θ(n*lg n)
Fibonacci: grows exponentially
Complexity Analysis:
Big-O notation
A function of input size
f(n) = O(g(n)):
there exist constants c > 0, n0 > 0, such that 0 <= f(n) <= c*g(n) for all n >= n0;
'=' reads as 'is' (membership), not equality;
upper bounds & lower bounds
1. Substitution method
(1) Guess the form of the solution
(2) Verify by induction
(3) Solve for constants
2. Recursion-tree method
3. Master method
...
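For instance, applying the master method to the standard merge-sort recurrence (an illustrative example, not from these notes): T(n) = 2T(n/2) + θ(n) has a = 2, b = 2, f(n) = θ(n) = θ(n^(log_b a)), so case 2 gives T(n) = θ(n*lg n).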
Divide & Conquer:
Split the problem into independent subproblems, solve each subproblem recursively, then combine the subproblem solutions to obtain a solution to the original problem
Dynamic Programming:
Suited to cases where the subproblems are not independent, i.e., the subproblems share common sub-subproblems
Greedy:
Steps
1. Cast the optimization problem as one in which we make a choice first and are then left with a single subproblem to solve
2. Prove that there is always an optimal solution to the original problem that makes the greedy choice, so the greedy choice is safe
3. Show that, having made the greedy choice, the remaining subproblem has the property that combining an optimal
solution to the subproblem with the greedy choice yields an optimal solution to the original problem.
Greedy-choice property: a globally optimal solution can be reached by making locally optimal (greedy) choices
Optimal substructure: a problem has optimal substructure if an optimal solution contains optimal solutions to its subproblems
Proof
1. Take an optimal solution and modify it to make the greedy choice; this turns the original problem into a similar one whose solution is at least as good
2. Use induction on the subproblems to show that the greedy choice made at each step ultimately produces an optimal solution
Dynamic programming vs. greedy
0-1 knapsack problem vs. fractional knapsack problem
1. Activity-selection problem
select a maximum-size subset of mutually compatible activities (a sketch follows this list)
2. Huffman Codes
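A minimal sketch of the greedy activity-selection idea, assuming activities are given as (start, finish) pairs (Python; names and input format are illustrative):

```python
def select_activities(activities):
    """Greedy activity selection: sort by finish time, then repeatedly take
    the earliest-finishing activity compatible with the last one chosen."""
    chosen = []
    last_finish = float("-inf")
    for start, finish in sorted(activities, key=lambda a: a[1]):
        if start >= last_finish:          # compatible with everything chosen so far
            chosen.append((start, finish))
            last_finish = finish
    return chosen

# e.g. select_activities([(1, 4), (3, 5), (0, 6), (5, 7), (8, 11), (12, 14)])
# -> [(1, 4), (5, 7), (8, 11), (12, 14)]
```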
Search:
BinarySearch
QuickSelect
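Minimal sketches of both (Python, illustrative; this quickselect uses extra lists rather than in-place partitioning):

```python
import random

def binary_search(a, target):
    """Binary search in a sorted list; returns an index of target, or -1 if absent."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == target:
            return mid
        if a[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

def quickselect(a, k):
    """k-th smallest element (0-based) of an unsorted list, partitioning around a random pivot."""
    pivot = random.choice(a)
    lo = [x for x in a if x < pivot]
    eq = [x for x in a if x == pivot]
    if k < len(lo):
        return quickselect(lo, k)
    if k < len(lo) + len(eq):
        return pivot
    return quickselect([x for x in a if x > pivot], k - len(lo) - len(eq))
```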
Sort:
BubbleSort
Compare adjacent elements A & B, swap if A > B; after each pass the current maximum sinks to the end;
SelectSort
Find the minimum of the unordered part and append it to the ordered part; repeat ...
InsertSort
Take one element from the unordered part and insert it into its proper position in the ordered part; repeat ...
ShellSort
Multiple passes, each an insertion sort over elements a gap h apart; k is the number of gaps in the gap sequence
MergeSort
HeapSort
QuickSort
CountSort
RadixSort
BucketSort
ExternalSort
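As one representative of the comparison sorts listed above, a minimal merge-sort sketch (illustrative, not in-place):

```python
def merge_sort(a):
    """Top-down merge sort: split in half, sort each half, merge. O(n lg n), stable."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:          # <= keeps equal keys in order (stability)
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]
```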
String:
...
List:
Singly linked
Doubly linked
Circular
Priority Queue:
MAX-HEAPIFY(A, i): O(lg n)
BUILD-MAX-HEAP(A): O(n)
HEAPSORT(A): sort an array in place; O(n*lg n)
INSERT(S, x): insert element x into set S; O(lg n)
MAXIMUM(S): return the element of S with the largest key; O(1)
EXTRACT-MAX(S): remove and return the element of S with the largest key; O(lg n)
INCREASE-KEY(S, x, k): increase the key of element x to k, where k >= the current key; O(lg n)
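A hedged sketch of MAX-HEAPIFY and EXTRACT-MAX on a 0-indexed Python list (illustrative):

```python
def max_heapify(a, i, n):
    """Sift a[i] down until the subtree rooted at i satisfies the max-heap property."""
    while True:
        left, right, largest = 2 * i + 1, 2 * i + 2, i
        if left < n and a[left] > a[largest]:
            largest = left
        if right < n and a[right] > a[largest]:
            largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest

def extract_max(a):
    """Remove and return the maximum of a non-empty heap: swap root with the
    last element, shrink the array, then re-heapify from the root."""
    a[0], a[-1] = a[-1], a[0]
    top = a.pop()
    max_heapify(a, 0, len(a))
    return top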
Stack:
LIFO
underflow vs. overflow
Attributes
top[S]: index of the top element; top[S] = 0 means empty; top[S] > n means overflow (full)
Operations: O(1)
EMPTY(S)
PUSH(S, x)
POP(S)
Queue:
FIFO
Attributes
head[Q]: points to the head of the queue
tail[Q]: points to the position where the next element will be inserted
head[Q] = tail[Q], empty;
head[Q] = tail[Q] + 1, full;
Operations: O(1)
Enqueue(Q, x)
Dequeue(Q)
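A minimal circular-buffer queue following the head/tail convention above (one slot is deliberately left unused so the full test works; class and field names are illustrative):

```python
class CircularQueue:
    """Fixed-size circular queue: head == tail means empty,
    head == (tail + 1) % size means full."""
    def __init__(self, capacity):
        self.a = [None] * (capacity + 1)   # one spare slot distinguishes full from empty
        self.head = self.tail = 0

    def is_empty(self):
        return self.head == self.tail

    def is_full(self):
        return self.head == (self.tail + 1) % len(self.a)

    def enqueue(self, x):
        if self.is_full():
            raise OverflowError("queue overflow")
        self.a[self.tail] = x
        self.tail = (self.tail + 1) % len(self.a)

    def dequeue(self):
        if self.is_empty():
            raise IndexError("queue underflow")
        x = self.a[self.head]
        self.head = (self.head + 1) % len(self.a)
        return x
```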
Tree
length of path = the number of edges in the path
depth: node u to root; the length of unique path; depth[root] = 0;
height: node u to one leaf; the length of longest path; height[leaf] = 0;
Binary search tree
complete binary tree: worst time θ(log n);
n nodes linear list: worst time θ(n);
expected height: O(log n);
expected operation time: O(log n);
for dictionary / priority queue
height of tree: h
node: key, satellite data, left, right, p [parent]
key[left child] <= key[current node]
key[right child] >= key[current node]
INORDER_TRAVERSAL: time θ(n)
PREORDER_TRAVERSAL
POSTORDER_TRAVERSAL
SEARCH:
time O(h)
recursive / iterative
compare k with the key of the current node; if k < key go left, otherwise go right
MINIMUM
time O(h)
follow left children from the root to the leftmost node
MAXIMUM
time O(h)
follow right children from the root to the rightmost node
PREDECESSOR
symmetric to SUCCESSOR
SUCCESSOR
time O(h), since the search walks up or down the tree; also works when keys are not all distinct;
if right[x] exists, return MINIMUM(right[x])
else walk up from x until reaching a node that is the left child of its parent; return that parent
INSERT
time O(h)
descend from the root; pointer x traces the path down until it reaches nil, while y is maintained as x's parent
DELETE
time O(h)
in: node z
1. z has no children: modify its parent parent[z] so that nil replaces z as its child;
2. z has one child: splice z out by linking child[z] to parent[z];
3. z has two children: delete z's successor y, then replace z's contents with y's contents.
A BST built by random insertions of n keys has expected height O(lg n)
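A hedged sketch of SEARCH and SUCCESSOR using the node fields listed above (key, left, right, p; the Node class itself is illustrative):

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.p = None

def tree_search(x, k):
    """Walk down from x, going left or right by key comparison. O(h)."""
    while x is not None and k != x.key:
        x = x.left if k < x.key else x.right
    return x

def tree_minimum(x):
    """Leftmost node of the subtree rooted at x."""
    while x.left is not None:
        x = x.left
    return x

def tree_successor(x):
    """Next node in sorted order: minimum of the right subtree if it exists,
    otherwise the lowest ancestor whose left child is also an ancestor of x."""
    if x.right is not None:
        return tree_minimum(x.right)
    y = x.p
    while y is not None and x is y.right:
        x, y = y, y.p
    return y
```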
AVL Tree:
An AVL tree T is a binary search tree in which every node p satisfies:
let TL be the subtree rooted at p's left child and TR the subtree rooted at p's right child,
with depths dL and dR;
it must be the case that dL <= dR + 1 and dR <= dL + 1.
An AVL tree with n nodes has depth O(log n).
Insert:
case 1: p is added to subtree X; [pic]
case 2: p is added to subtree Y; [pic]
Delete:
case 1: depth of X >= depth of Y; handled like Insert case 1
case 2: depth of Y == depth of X + 1; handled like Insert case 2
Rotation:
time O(1)
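A minimal left-rotation sketch (O(1); parent pointers omitted for brevity, node fields as in the BST sketch above):

```python
def rotate_left(x):
    """Rotate the edge between x and its right child y; returns the new subtree root y.
    BST order is preserved: x adopts y's left subtree as its new right subtree."""
    y = x.right
    x.right = y.left
    y.left = x
    return y
```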
Hash table
INSERT
SEARCH
worst time: θ(n)
expected time: O(1)
DELETE
DIRECT ADDRESS:
an array indexed directly by the key; space O(|U|) for the key universe U
SEARCH, INSERT, DELETE: worst/average O(1)
HASH TABLE:
K is the set of keys actually stored; space O(|K|)
hash function h: maps keys to array indices
n = number of elements; m = array size
load factor: a = n/m
Collision: two distinct keys hash to the same index
Chaining:
store all elements that hash to the same slot in a linked list
INSERT(T, x): insert at the head of the list;
worst case O(1) if we do not check whether x is already in the list
O(length of list) if we do check
SEARCH(T, k): O(length of list)
DELETE(T, x): O(1) with a doubly linked list, O(length of list) with a singly linked list
simple uniform hashing
any element is equally likely to hash into any of the m slots, independently of where other elements have hashed
unsuccessful search: expected time θ(1+a)
successful search: expected time θ(1+a)
=> if n = O(m), then a = O(1), so SEARCH takes constant expected time
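A minimal chained hash table sketch, using Python's built-in hash and lists as the chains (illustrative):

```python
class ChainedHashTable:
    """Hash table with chaining; each slot holds a list of (key, value) pairs."""
    def __init__(self, m=8):
        self.slots = [[] for _ in range(m)]

    def _slot(self, key):
        return self.slots[hash(key) % len(self.slots)]   # division-style bucket choice

    def insert(self, key, value):
        self._slot(key).insert(0, (key, value))          # insert at the head of the chain

    def search(self, key):
        for k, v in self._slot(key):                     # O(length of chain)
            if k == key:
                return v
        return None

    def delete(self, key):
        chain = self._slot(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return
```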
Division method: h(k) = k mod m; choose m as a prime close to n/a and not too close to any power of 2
Multiplication method: h(k) = floor(m*(kA mod 1)), 0 < A < 1
no special requirement on the choice of m; w is the machine word size
e.g. m = 2^p, w = 32, A = 2,654,435,769/2^w, kA = r1*2^w + r0;
h(k) = the p most significant bits of r0
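A small sketch of the multiplication method with the constants from the example above (the function name and the assumption m = 2^p are illustrative):

```python
def mul_hash(k, p, w=32, s=2654435769):
    """Multiplication-method hash with A = s / 2^w and m = 2^p:
    take the p most significant bits of r0, the low w bits of k*s."""
    r0 = (k * s) & ((1 << w) - 1)    # low word of k*s, i.e. the fractional part of kA scaled by 2^w
    return r0 >> (w - p)             # the p most significant bits of r0

# e.g. mul_hash(123456, p=14) -> 67
```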
Universal hashing??
Open addressing:
all elements are stored in the hash table itself (no separate lists)
INSERT(k): probe i = 0 .. m-1 and place k in the first nil slot
SEARCH(k): k is absent if a nil slot is reached or all m probes fail
DELETE: deletion is awkward here, so chaining is generally used when deletion is needed
uniform hashing
each key's probe sequence is equally likely to be any of the m! permutations of <0, 1, ..., m - 1>
1. linear probing: h(k, i) = (h'(k) + i) mod m, i = 0, 1, ..., m - 1
primary clustering: runs of occupied slots grow longer, increasing the average search time
only θ(m) distinct probe sequences are used
2. quadratic probing: h(k, i) = (h'(k) + c1*i + c2*i^2) mod m, i = 0, 1, ..., m - 1, c2 != 0
secondary clustering: h(k1, 0) = h(k2, 0) implies h(k1, i) = h(k2, i) for all i
only θ(m) distinct probe sequences are used
3. double hashing: h(k, i) = (h1(k) + i * h2(k)) mod m, i = 0, 1, ..., m - 1
h2(k) and m must be relatively prime, i.e., gcd(h2(k), m) = 1
1. make m a power of 2 and design h2 to always produce an odd number;
2. make m prime and design h2 to always produce a positive integer less than m; e.g. h1(k) = k mod m, h2(k) = 1 + (k mod (m - 1))
θ(m^2) distinct probe sequences are used
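A hedged open-addressing sketch using the double-hashing probe of formula 3 above with the example h1, h2 (table layout and names are illustrative; m is assumed prime):

```python
def probe(k, i, m):
    """Double-hashing probe: h(k, i) = (h1(k) + i*h2(k)) mod m, with m prime."""
    h1 = k % m
    h2 = 1 + (k % (m - 1))       # always in 1 .. m-1, hence relatively prime to a prime m
    return (h1 + i * h2) % m

def oa_insert(table, k):
    """Place k in the first empty (None) slot along its probe sequence."""
    m = len(table)
    for i in range(m):
        j = probe(k, i, m)
        if table[j] is None:
            table[j] = k
            return j
    raise OverflowError("hash table overflow")

def oa_search(table, k):
    """Follow the probe sequence until k or an empty slot is found."""
    m = len(table)
    for i in range(m):
        j = probe(k, i, m)
        if table[j] is None:
            return None
        if table[j] == k:
            return j
    return None

# e.g. t = [None] * 13; oa_insert(t, 79); oa_search(t, 79)
```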
Graph
Graph G = (V, E)
number of vertices: V
number of edges: E
vertex set: V[G]
edge set: E[G]
Sparse: E << V^2 (far fewer edges)
Dense: E close to V^2
Graph representations
Adjacency list: use when the graph is sparse
an array Adj of |V| lists, one list per vertex of V;
for each u in V, Adj[u] contains every vertex v such that (u, v) is in E.
vertices appear in arbitrary order
undirected graph: the adjacency-list lengths sum to 2E
directed graph: the adjacency-list lengths sum to E
space O(V + E)
drawback: testing whether an edge exists requires scanning a list
Adjacency matrix: use when the graph is dense, or when we need to test quickly whether an edge joins two vertices
space O(V^2)
transpose matrix
diagonal
also fine when the graph is small, or, for unweighted graphs, when each entry is stored as a single bit
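A minimal adjacency-list construction in Python (dict of lists; names illustrative):

```python
from collections import defaultdict

def build_adjacency_list(edges, directed=False):
    """Build Adj from an edge list; for an undirected graph each edge appears in two lists."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        if not directed:
            adj[v].append(u)
    return adj

# e.g. build_adjacency_list([("a", "b"), ("b", "c")])
# -> {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
```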
Breadth-first search
the same idea underlies Prim's minimum-spanning-tree algorithm and Dijkstra's algorithm
a single source vertex
ancestor / descendant
color[u]: black = all adjacent vertices have been discovered; white = not yet discovered; gray = may still have white neighbors;
p[u]: parent
d[u]: distance from the source
a FIFO queue holds the gray vertices
time O(V + E), linear in the size of the adjacency-list representation
Results
1. the shortest-path distance δ(s, v) from the source s to v, i.e., the fewest edges on any path from s to v;
2. the predecessor subgraph forms a breadth-first tree;
3. all vertices on a shortest path from s to v: PRINT-PATH(G, s, v), time O(length of the path from s to v)
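A hedged BFS sketch over such an adjacency-list dict, returning distances and parents (illustrative):

```python
from collections import deque

def bfs(adj, s):
    """Breadth-first search from s; d[v] is the edge-count distance, p[v] the BFS-tree parent."""
    d, p = {s: 0}, {s: None}
    q = deque([s])                      # the queue holds the gray frontier
    while q:
        u = q.popleft()
        for v in adj.get(u, ()):
            if v not in d:              # white vertex: discover it
                d[v] = d[u] + 1
                p[v] = u
                q.append(v)
    return d, p
```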
Depth-first search:
may start from multiple source vertices
color[u]: color
p[u]: parent
d[u]: timestamp when u is discovered
f[u]: timestamp when u is finished
Results
1. the predecessor subgraph forms a depth-first forest;
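A hedged DFS sketch with discovery/finishing timestamps (recursive; illustrative):

```python
def dfs(adj):
    """Depth-first search over all vertices; records discovery time d[u],
    finishing time f[u], and parent p[u]."""
    d, f, p = {}, {}, {}
    time = 0

    def visit(u):
        nonlocal time
        time += 1
        d[u] = time                      # u is discovered (turns gray)
        for v in adj.get(u, ()):
            if v not in d:               # v is still white
                p[v] = u
                visit(v)
        time += 1
        f[u] = time                      # u is finished (turns black)

    for u in list(adj):                  # multiple sources: restart from every undiscovered vertex
        if u not in d:
            p[u] = None
            visit(u)
    return d, f, p
```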
Comparison of Data Structures
NOTE: worst/average; DeleteMin is after find;
|         | Insert  | Delete    | Find      | Sort      | DeleteMin | Build     | FindMin |
|---------|---------|-----------|-----------|-----------|-----------|-----------|---------|
| Heap    | O(lg n) | O(lg n)   | O(n) !    | O(n*lg n) | O(lg n)   | O(n)      | O(1)    |
| Hashing | O(1)    | O(n)/O(1) | O(n)/O(1) | /         | /         | O(n)      | /       |
| AVL     | O(lg n) | O(lg n)   | O(lg n)   | O(n)      | O(1)      | O(n*lg n) | O(lg n) |
Delete:
Heap: delete by overwriting with the last element, then re-heapify
AVL: delete by replacing with the successor or predecessor
Constant factors:
the constants for Heap operations are smaller than those for AVL; if Find is not needed, Heap is the better choice!
Comparison of Sorts
NOTE: Worst / Average
|          | Time             | In place | Stability  | On-line |
|----------|------------------|----------|------------|---------|
| Bubble   | O(n^2)           | Y        | Y          |         |
| Select   | O(n^2)           | Y        | N          |         |
| Insert   | O(n^2)           | Y        | Y          | Y       |
| Shell's  | depends on gaps  | Y        | N          |         |
| Merge    | O(n*lg n)        | N        | Y          |         |
| Heap     | O(n*lg n)        | Y        | N          |         |
| Quick    | O(n^2)/O(n*lg n) | Y        | N          |         |
| Count    | O(n+k)           | N        | Y          |         |
| Radix    | O(k*n)           | N        | it depends |         |
| Bucket   | O(n^2)/O(n+k)    | N        |            |         |
| External | ...              |          |            |         |