add LICENSE & add CLRS C11 & reconstruct README
Louis1992 committed Jul 12, 2015
1 parent 16333df commit fbe940c
Showing 10 changed files with 395 additions and 1 deletion.
33 changes: 33 additions & 0 deletions C11-Hash-Tables/11.1.md
@@ -0,0 +1,33 @@
### Exercises 11.1-1
***
Suppose that a dynamic set S is represented by a direct-address table T of length m. Describe a procedure that finds the maximum element of S. What is the worst-case performance of your procedure?

### `Answer`
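Traverse the entire table, for example from index m - 1 downward, and return the first non-NIL element found. The worst case is O(m).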


### Exercises 11.1-2
***
A **bit vector** is simply an array of bits (0's and 1's). A bit vector of length m takes much less space than an array of m pointers. Describe how to use a bit vector to represent a dynamic set of distinct elements with no satellite data. Dictionary operations should run in O(1) time.

### `Answer`
Use the bit vector to represent integer keys: bit k is 1 if key k is in the set and 0 if it is not.
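
A minimal Python sketch of the bit-vector idea above. The class name `BitVectorSet` and its method names are illustrative assumptions, not part of the exercise, and keys are assumed to be integers in the range 0..m-1.

```python
class BitVectorSet:
    """Dynamic set of distinct integer keys in range(m), one bit per key."""

    def __init__(self, m: int) -> None:
        self.m = m
        # One bit per possible key, packed eight to a byte.
        self.bits = bytearray((m + 7) // 8)

    def insert(self, k: int) -> None:
        # Set bit k to 1.
        self.bits[k >> 3] |= 1 << (k & 7)

    def delete(self, k: int) -> None:
        # Clear bit k; the mask is kept within one byte.
        self.bits[k >> 3] &= ~(1 << (k & 7)) & 0xFF

    def search(self, k: int) -> bool:
        # Key k is present exactly when bit k is 1.
        return bool(self.bits[k >> 3] & (1 << (k & 7)))
```

Each operation touches a single byte, so INSERT, DELETE, and SEARCH all run in O(1) time.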

### Exercises 11.1-3
***
Suggest how to implement a direct-address table in which the keys of stored elements do not need to be distinct and the elements can have satellite data. All three dictionary operations (INSERT, DELETE, and SEARCH) should run in O(1) time. (Don't forget that DELETE takes as an argument a pointer to an object to be deleted, not a key.)

### `Answer`
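Map each key to its own list: slot T[k] points to a doubly linked list of all elements with key k. INSERT adds at the head of the list, SEARCH returns the head, and DELETE unlinks the given element through its pointer, so each operation takes O(1) time.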

### Exercises 11.1-4
***
We wish to implement a dictionary by using direct addressing on a huge array. At the start, the array entries may contain garbage, and initializing the entire array is impractical because of its size. Describe a scheme for implementing a direct-address dictionary on a huge array. Each stored object should use O(1) space; the operations SEARCH, INSERT, and DELETE should take O(1) time each; and the initialization of the data structure should take O(1) time. (Hint: Use an additional stack, whose size is the number of keys actually stored in the dictionary, to help determine whether a given entry in the huge array is valid or not.)

### `Answer`
UNSOLVED (I do not understand the problem...)


***
Follow [@louis1992](https://github.com/gzc) on github to help finish this task.

42 changes: 42 additions & 0 deletions C11-Hash-Tables/11.2.md
@@ -0,0 +1,42 @@
### Exercises 11.2-1
***
Suppose we use a hash function h to hash n distinct keys into an array T of length m. Assuming simple uniform hashing, what is the expected number of collisions? More precisely, what is the expected cardinality of {{k, l} : k ≠ l and h(k) = h(l)}?

### `Answer`
![](http://latex.codecogs.com/gif.latex? C_n^2 \\cdot \\frac{1}{m})
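
As a sanity check on the formula above, here is a small Python sketch that compares the closed form C(n, 2) / m with a brute-force average over every possible assignment of keys to slots. The function names are illustrative assumptions, and the brute force is only practical for very small n and m.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb


def expected_collisions(n: int, m: int) -> Fraction:
    """Closed form: each of the C(n, 2) pairs collides with probability 1/m."""
    return Fraction(comb(n, 2), m)


def brute_force_expectation(n: int, m: int) -> Fraction:
    """Average the number of colliding pairs over all m**n equally likely assignments."""
    total = 0
    for slots in product(range(m), repeat=n):
        total += sum(1 for a, b in combinations(slots, 2) if a == b)
    return Fraction(total, m ** n)


print(expected_collisions(4, 3), brute_force_expectation(4, 3))
```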


### Exercises 11.2-2
***
Demonstrate the insertion of the keys 5, 28, 19, 15, 20, 33, 12, 17, 10 into a hash table with collisions resolved by chaining. Let the table have 9 slots, and let the hash function be h(k) = k mod 9.

### `Answer`
![](./repo/s1/1.png)
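
The insertions can also be reproduced with a short Python sketch. It assumes each new key is placed at the head of its chain, as CHAINED-HASH-INSERT does in the text; the function name `chained_insert_all` is a hypothetical helper.

```python
def chained_insert_all(keys, m):
    """Insert keys into m chains using h(k) = k mod m, newest key first."""
    table = [[] for _ in range(m)]
    for k in keys:
        # Head insertion keeps each INSERT at O(1).
        table[k % m].insert(0, k)
    return table


table = chained_insert_all([5, 28, 19, 15, 20, 33, 12, 17, 10], 9)
for slot, chain in enumerate(table):
    print(slot, " -> ".join(map(str, chain)) if chain else "NIL")
```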

### Exercises 11.2-3
***
Professor Marley hypothesizes that substantial performance gains can be obtained if we modify the chaining scheme so that each list is kept in sorted order. How does the professor's modification affect the running time for successful searches, unsuccessful searches, insertions, and deletions?

### `Answer`
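* successful searches: no effect
* unsuccessful searches: faster when there is a lot of data, because the search can stop as soon as it passes the position where the key would have to be
* insertions: slower, because the list must be traversed to insert the key at the right position
* deletions: no effect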

### Exercises 11.2-4
***
Suggest how storage for elements can be allocated and deallocated within the hash table itself by linking all unused slots into a free list. Assume that one slot can store a flag and either one element plus a pointer or two pointers. All dictionary and free-list operations should run in O(1) expected time. Does the free list need to be doubly linked, or does a singly linked free list suffice?

### `Answer`
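A doubly linked list is needed. Each slot has a flag that records whether it is allocated; if it is not allocated, its pointers link it into its own position in the free list. To delete, clear the flag and add the slot at the head of the free list, with its pointer pointing to the old head. To insert, take a slot by following the pointers.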

### Exercises 11.2-5
***
Show that if |U| > nm, there is a subset of U of size n consisting of keys that all hash to the same slot, so that the worst-case searching time for hashing with chaining is Θ(n).
### `Answer`
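Suppose |U| > nm and hash every key of U into the m slots. If every slot received at most n - 1 keys, U would contain at most (n - 1)m < nm keys, so by the pigeonhole principle at least one slot receives at least n keys. Choose n of those keys as the set; searching that chain takes Θ(n) time.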


***
Follow [@louis1992](https://github.com/gzc) on github to help finish this task.

72 changes: 72 additions & 0 deletions C11-Hash-Tables/11.3.md
@@ -0,0 +1,72 @@
### Exercises 11.3-1
***
Suppose we wish to search a linked list of length n, where each element contains a key k along with a hash value h(k). Each key is a long character string. How might we take advantage of the hash values when searching the list for an element with a given key?

### `Answer`
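First compute the hash value of the given key. For each element in the list, compare the hash values first, and compare the strings only when the hash values match.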
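
A minimal Python sketch of this search. The list is modelled as an iterable of `(key, stored_hash)` pairs and the hash function is passed in as `h`; both choices are assumptions made for illustration.

```python
def search_with_hash(items, key, h):
    """Return the matching key from items, or None if it is absent.

    items yields (stored_key, stored_hash) pairs, where stored_hash == h(stored_key).
    """
    target = h(key)  # hash the long search string once
    for stored_key, stored_hash in items:
        # The cheap integer comparison runs first; `and` short-circuits,
        # so the long string comparison runs only when the hashes agree.
        if stored_hash == target and stored_key == key:
            return stored_key
    return None
```

The string comparison is still required after a hash match, because two different keys may share the same hash value.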


### Exercises 11.3-2
***
Suppose that a string of r characters is hashed into m slots by treating it as a radix-128 number and then using the division method. The number m is easily represented as a 32-bit computer word, but the string of r characters, treated as a radix-128 number, takes many words. How can we apply the division method to compute the hash value of the character string without using more than a constant number of words of storage outside the string itself?

### `Answer`
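Take m to be a 31-bit prime and reduce modulo m after each character is processed, so the intermediate value never needs more than a constant number of words.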
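
One way to keep the storage constant is Horner's rule with a reduction modulo m after every character. The Python sketch below assumes ASCII input (every character code below 128); the function name `string_hash` and the modulus used in the example are illustrative assumptions.

```python
def string_hash(s: str, m: int) -> int:
    """Return (s read as a radix-128 number) mod m using O(1) extra storage."""
    h = 0
    for ch in s:
        # Horner's rule: h always stays below m, so h * 128 + ord(ch)
        # stays below 128 * m and fits in a constant number of words.
        h = (h * 128 + ord(ch)) % m
    return h


M = 2**31 - 1  # a 31-bit prime, used here only as an example modulus
print(string_hash("hash tables", M))
```

This works because (a * 128 + c) mod m depends only on a mod m, so reducing early never changes the final remainder.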

### Exercises 11.3-3
***
Consider a version of the division method in which h(k) = k mod m, where m = 2^p - 1 and k is a character string interpreted in radix 2^p. Show that if string x can be derived from string y by permuting its characters, then x and y hash to the same value. Give an example of an application in which this property would be undesirable in a hash function.

### `Answer`
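This follows from a simple number-theory fact. First, an example:

3 * 128^3 mod 127 = 3 * 128^2 * (128 mod 127) mod 127 = 3 * 128^2 mod 127 = 3 mod 127 = 3

No matter how the characters of the string are reordered, the effect of the radix disappears, because (2^p)^n mod (2^p - 1) = 1 for every n. The hash value is therefore the sum of the character values mod (2^p - 1), which does not depend on their order.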
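
The property can be checked with a short Python sketch. It assumes p = 7 (radix 128, modulus 127) and ASCII strings; the helper names `radix_value` and `h` are hypothetical.

```python
def radix_value(s: str, p: int) -> int:
    """Interpret s as a number in radix 2**p, most significant character first."""
    value = 0
    for ch in s:
        value = value * (1 << p) + ord(ch)
    return value


def h(s: str, p: int = 7) -> int:
    """Division-method hash with m = 2**p - 1."""
    return radix_value(s, p) % ((1 << p) - 1)


# Two anagrams hash to the same slot, which equals the digit sum mod 2**p - 1.
print(h("listen"), h("silent"), sum(map(ord, "listen")) % 127)
```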

### Exercises 11.3-4
***
Consider a hash table of size m = 1000 and a corresponding hash function h(k) = ⌊m (kA mod 1)⌋ for
![](http://latex.codecogs.com/gif.latex? A = \\frac{\\sqrt{5}-1}{2}). Compute the locations to which the keys 61, 62, 63, 64, and 65 are mapped.

### `Answer`
key | value
:----:|:----:
61 | 700
62 | 318
63 | 936
64 | 554
65 | 172
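
The table can be reproduced with a few lines of Python. This is a sketch using double-precision floating point, which is accurate enough for these five keys; the function name `mult_hash` is an assumption.

```python
from math import floor, sqrt

A = (sqrt(5) - 1) / 2  # the constant from the exercise


def mult_hash(k: int, m: int = 1000) -> int:
    """Multiplication method: floor(m * (k * A mod 1))."""
    return floor(m * ((k * A) % 1))


for k in range(61, 66):
    print(k, mult_hash(k))
```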

### Exercises 11.3-5
***
Define a family
![](http://latex.codecogs.com/gif.latex?\\mathscr{H} )
of hash functions from a finite set U to a finite set B to be **ϵ-universal** if for all pairs of distinct elements k and l in U,

![](http://latex.codecogs.com/gif.latex?\\Pr\\{h\(k\) = h\(l\)\\} \\le \\epsilon)

where the probability is over the choice of the hash function **h** drawn at random from the family ![](http://latex.codecogs.com/gif.latex?\\mathscr{H} ). Show that an ϵ-universal family of hash functions must have

![](http://latex.codecogs.com/gif.latex?\\epsilon \\ge \\frac{1}{|B|} - \\frac{1}{|U|})

### `Answer`
UNSOLVED


### Exercises 11.3-6
***
Let U be the set of n-tuples of values drawn from Z_p, and let B = Z_p, where p is prime. Define the hash function h_b : U → B for b in Z_p on an input n-tuple ⟨a_0, a_1, ..., a_{n-1}⟩ from U as

![](http://latex.codecogs.com/gif.latex?h_b\(\\langle a_0, a_1, \\ldots, a_{n-1} \\rangle\) = \\sum_{j=0}^{n-1} a_j b^j)

and let H = {h_b : b ∈ Z_p}. Argue that H is ((n−1)/p)-universal according to the definition of ϵ-universal in Exercise 11.3-5. (Hint: See Exercise 31.4-4.)


### `Answer`
UNSOLVED

***
Follow [@louis1992](https://github.com/gzc) on github to help finish this task.

114 changes: 114 additions & 0 deletions C11-Hash-Tables/11.4.md
@@ -0,0 +1,114 @@
### Exercises 11.4-1
***
Consider inserting the keys 10, 22, 31, 4, 15, 28, 17, 88, 59 into a hash table of length m = 11 using open addressing with the primary hash function h'(k) = k mod m. Illustrate the result of inserting these keys using linear probing, using quadratic probing with c1 = 1 and c2 = 3, and using double hashing with h2(k) = 1 + (k mod (m - 1)).

### `Answer`
index | linear probing | quadratic probing | double hashing
:----: | :----: | :----: | :----:
0 | 22 | 22 | 22
1 | 88 | |
2 | | 88 | 59
3 | | 17 | 17
4 | 4 | 4 | 4
5 | 15 | | 15
6 | 28 | 28 | 28
7 | 17 | 59 | 88
8 | 59 | 15 |
9 | 31 | 31 | 31
10 | 10 | 10 | 10
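
The three columns can be checked with a small Python simulation. This is a sketch: `insert_all` and the three probe functions are hypothetical helper names, and each probe function encodes the parameters given in the exercise.

```python
def insert_all(keys, m, probe):
    """Insert keys into an open-addressed table of size m using probe(k, i, m)."""
    table = [None] * m
    for k in keys:
        for i in range(m):
            j = probe(k, i, m)
            if table[j] is None:
                table[j] = k
                break
        else:
            raise RuntimeError("hash table overflow")
    return table


def linear(k, i, m):
    return (k + i) % m


def quadratic(k, i, m):
    # c1 = 1 and c2 = 3, as in the exercise.
    return (k + i + 3 * i * i) % m


def double(k, i, m):
    # h2(k) = 1 + (k mod (m - 1)).
    return (k + i * (1 + k % (m - 1))) % m


KEYS = [10, 22, 31, 4, 15, 28, 17, 88, 59]
for probe in (linear, quadratic, double):
    print(probe.__name__, insert_all(KEYS, 11, probe))
```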


### Exercises 11.4-2
***
Write pseudocode for HASH-DELETE as outlined in the text, and modify HASH-INSERT to handle the special value DELETED.

### `Answer`

    HASH-DELETE(T, k):
        i <- 0
        repeat
            j <- h(k, i)
            if T[j] == k:
                T[j] <- DELETED
                return j
            else if T[j] == NIL:
                break
            else:
                i <- i + 1
        until i == m
        error "k is not in T"

    HASH-INSERT(T, k):
        i <- 0
        repeat
            j <- h(k, i)
            if T[j] == NIL or T[j] == DELETED:
                T[j] <- k
                return j
            else:
                i <- i + 1
        until i == m
        error "hash table overflow"
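
A runnable Python sketch of the same idea. Linear probing is used for `h` purely as an assumption for illustration, since any probe sequence would do, and `DELETED` is modelled as a unique sentinel object.

```python
NIL = None
DELETED = object()  # sentinel marking a slot whose key was removed


def h(k, i, m):
    """Probe function; linear probing is chosen only for illustration."""
    return (k + i) % m


def hash_insert(T, k):
    m = len(T)
    for i in range(m):
        j = h(k, i, m)
        # A DELETED slot can be reused just like an empty one.
        if T[j] is NIL or T[j] is DELETED:
            T[j] = k
            return j
    raise OverflowError("hash table overflow")


def hash_search(T, k):
    m = len(T)
    for i in range(m):
        j = h(k, i, m)
        if T[j] is NIL:
            return None  # a truly empty slot ends the probe sequence
        if T[j] is not DELETED and T[j] == k:
            return j
    return None


def hash_delete(T, k):
    j = hash_search(T, k)
    if j is None:
        raise KeyError(k)
    # Mark the slot instead of emptying it so later searches keep probing.
    T[j] = DELETED
    return j
```

Marking the slot as DELETED rather than NIL matters: a search for a key inserted after a collision must be able to probe past the removed slot.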


### Exercises 11.4-3
***
Suppose that we use double hashing to resolve collisions; that is, we use the hash function h(k, i) = (h1(k)+ih2(k)) mod m. Show that if m and h2(k) have greatest common divisor d ≥ 1 for some key k, then an unsuccessful search for key k examines (1/d)th of the hash table before returning to slot h1(k). Thus, when d = 1, so that m and h2(k) are relatively prime, the search may examine the entire hash table. (Hint: See Chapter 31.)

### `Answer`
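A brief sketch: this follows from the structure of the additive group Z_m (Chapter 31), in which the multiples of a form a subgroup of size m / gcd(a, m).

Take 3 and 7 as an example. Under addition modulo 7, the generator 3 produces a subgroup of order 7, that is, all of Z_7: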

3*1 mod 7 = 3

3*2 mod 7 = 6

3*3 mod 7 = 2

3*4 mod 7 = 5

3*5 mod 7 = 1

3*6 mod 7 = 4

3*7 mod 7 = 0

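If two numbers are coprime, the multiples of one modulo the other cycle through every residue before repeating; that is, 3 * n mod 7 = 3 * (n + 7) mod 7.

For this exercise, if d = 1 the search may examine the whole table, because the probe sequence visits all m slots.

If d > 1, then h2(k)/d and m/d are coprime, so the probe sequence visits m/d slots, (1/d)th of the table, before returning to slot h1(k).
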
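
The claim can be checked empirically with a Python sketch that counts the distinct slots a double-hashing probe sequence visits before it repeats. The function name `slots_visited` is an assumption, and h1(k) and h2(k) are passed in directly as integers.

```python
from math import gcd


def slots_visited(h1: int, h2: int, m: int) -> int:
    """Count distinct slots in the sequence (h1 + i * h2) mod m, i = 0, 1, 2, ..."""
    seen = set()
    i = 0
    j = h1 % m
    while j not in seen:
        seen.add(j)
        i += 1
        j = (h1 + i * h2) % m
    return len(seen)


# With m = 12 and h2 = 8, d = gcd(12, 8) = 4, so only 12 / 4 = 3 slots are probed.
print(slots_visited(5, 8, 12), 12 // gcd(12, 8))
```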
### Exercises 11.4-4
***
Consider an open-address hash table with uniform hashing. Give upper bounds on the expected number of probes in an unsuccessful search and on the expected number of probes in a successful search when the load factor is 3/4 and when it is 7/8.

### `Answer`
Theorem 11.6. Given an open address hash table with load factor α = n/m < 1, the
expected number of probes in an unsuccessful search is at most 1/(1-α), assuming
uniform hashing.

α = ¾, then the upper bound on the number of probes = 1 / (1 - ¾ ) = 4 probes

α = 7/8, then the upper bound on the number of probes = 1 / (1-7/8) = 8 probes

Theorem 11.8. Given an open address hash table with load factor α = n/m < 1, the
expected number of probes in a successful search is at most (1/α) ln (1/(1-α)), assuming
uniform hashing and assuming that each key in the table is equally likely to be searched
for.

α = ¾: (1/¾) ln(1/(1 − ¾)) ≈ 1.85 probes

α = 7/8: (1/0.875) ln(1/(1 − 0.875)) ≈ 2.38 probes
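
The four bounds can be recomputed with a short Python sketch of the formulas in Theorems 11.6 and 11.8. The function names are illustrative assumptions.

```python
from math import log


def unsuccessful_bound(alpha: float) -> float:
    """Theorem 11.6: at most 1 / (1 - alpha) expected probes."""
    return 1 / (1 - alpha)


def successful_bound(alpha: float) -> float:
    """Theorem 11.8: at most (1 / alpha) * ln(1 / (1 - alpha)) expected probes."""
    return (1 / alpha) * log(1 / (1 - alpha))


for alpha in (3 / 4, 7 / 8):
    print(alpha, unsuccessful_bound(alpha), round(successful_bound(alpha), 2))
```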

### Exercises 11.4-5
***
Consider an open-address hash table with a load factor α. Find the nonzero value α for which the expected number of probes in an unsuccessful search equals twice the expected number of probes in a successful search. Use the upper bounds given by Theorems 11.6 and 11.8 for these expected numbers of probes.

### `Answer`
1/(1-α) = ln(1/(1-α)) * 2/α

Solving numerically gives α ≈ 0.715.
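
The equation has no closed-form solution, so it can be solved numerically. Below is a sketch using bisection in Python; the bracketing interval [0.5, 0.9] is an assumption, chosen because the difference of the two sides changes sign there.

```python
from math import log


def f(alpha: float) -> float:
    """Unsuccessful-search bound minus twice the successful-search bound."""
    return 1 / (1 - alpha) - (2 / alpha) * log(1 / (1 - alpha))


def solve(lo: float = 0.5, hi: float = 0.9, tol: float = 1e-9) -> float:
    """Bisection: f(lo) < 0 < f(hi), so a root lies between them."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


print(solve())
```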


***
Follow [@louis1992](https://github.com/gzc) on github to help finish this task.

12 changes: 12 additions & 0 deletions C11-Hash-Tables/11.5.md
@@ -0,0 +1,12 @@
### Exercises 11.5-1
***
Suppose that we insert n keys into a hash table of size m using open addressing and uniform hashing. Let p(n, m) be the probability that no collisions occur. Show that p(n, m) ≤ e^(-n(n-1)/2m). (Hint: See equation (3.11).) Argue that when n exceeds sqrt(m), the probability of avoiding collisions goes rapidly to zero.

### `Answer`
UNSOLVED



***
Follow [@louis1992](https://github.com/gzc) on github to help finish this task.

8 changes: 8 additions & 0 deletions C11-Hash-Tables/README.md
@@ -0,0 +1,8 @@
UNSOLVED

[11.1.4](./11.1.md#exercises-111-4)

[11.3.5](./11.3.md#exercises-113-5)
[11.3.6](./11.3.md#exercises-113-6)

[11.5.1](./11.5.md#exercises-115-1)
50 changes: 50 additions & 0 deletions C11-Hash-Tables/problem.md
@@ -0,0 +1,50 @@
### Problems 1 : Longest-probe bound for hashing
***
A hash table of size m is used to store n items, with n ≤ m/2. Open addressing is used for collision resolution.

**a.** Assuming uniform hashing, show that for i = 1, 2, …, n, the probability is at most 2^−k that the ith insertion requires strictly more than k probes.

**b.** Show that for i = 1, 2, …, n, the probability is O(1/n^2) that the ith insertion requires more than 2 lg n probes.

Let the random variable X_i denote the number of probes required by the ith insertion. You have shown in part (b) that
![](http://latex.codecogs.com/gif.latex?\\Pr\\{X_i > 2\\lg{n}\\} = O\(1/n^2\) ). Let the random variable
![](http://latex.codecogs.com/gif.latex?X = max_{1 \\le i \\le n}X_i) denote the maximum number of probes required by any of the n insertions.

**c.** Show that Pr{X > 2 lg n} = O(1/n).

**d.** Show that the expected length E[X] of the longest probe sequence is O(lg n).

### `Answer`
**a.**

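The ith insertion needs more than k probes only if its first k probes all hit occupied slots. At most n of the m slots are occupied, so

P ≤ (n/m)^k ≤ (1/2)^k = 2^-k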

**b.**

Substitute k = 2 lg n into the result of part (a): 2^(-2 lg n) = 1/n^2.

**c.**

![](http://latex.codecogs.com/gif.latex?P = \\prod_{i=0}^{2\\lg{n}}\\frac{m/2-i}{m} < \\prod_{i=0}^{2\\lg{n}} \\frac{1}{2} = \\frac{1}{2}^{2\\lg{n}} = \\frac{1}{4}^{\\lg{n}} = 4^{\\lg{n^{-1}}} = O\(n^{-1}\) )

**d.**

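This part can follow the results on **balls and bins** in Section 5.4.2 and on **streaks** in Section 5.4.3.

We enlarge the probability at each step to 1/2 (it is actually ≤ 1/2).

So the bound is O(lg n).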



### Problems 2 : Slot-size bound for chaining
***
Suppose that we have a hash table with n slots, with collisions resolved by chaining, and suppose that n keys are inserted into the table. Each key is equally likely to be hashed to each slot. Let M be the maximum number of keys in any slot after all the keys have been inserted. Your mission is to prove an O(lg n/lg lg n) upper bound on E[M], the expected value of M.

**a.**
Argue that the probability Qk that exactly k keys hash to a particular slot is given by

![](http://latex.codecogs.com/gif.latex? Q_k = \(\\frac{1}{n}^k \) \(1-\\frac{1}{n}\)^{n-k} C_k^n)
### `Answer`



***
Follow [@louis1992](https://github.com/gzc) on github to help finish this task.

Binary file added C11-Hash-Tables/repo/s1/1.png
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
The MIT License (MIT)

Copyright (c) 2015 Zhenchao Gan

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
