# 1. Heap (data structure)
#### Important links
* Wiki link: https://en.wikipedia.org/wiki/Heap_(data_structure)



## 1.1. Abstract:

* In computer science, a **heap** is a specialized **tree-based** data structure that satisfies the **heap property**:
    * **heap property**: if $P$ is a parent node of $C$,
        - then the key (the value) of $P$ is either greater than or equal to (in a max heap) the key of $C$.
        - or less than or equal to (in a min heap) the key of $C$.
    - **Figure**: Example of a binary __max-heap__ (there is also a min-heap) with node keys being integers from 1 to 100
        ![Example](https://upload.wikimedia.org/wikipedia/commons/thumb/3/38/Max-Heap.svg/320px-Max-Heap.svg.png)
        
    - **Basic concept**: The node at the "top" of the heap (with no parents) is called the root node.
    
    - **Use: priority queue**: 
        - The heap is one **maximally efficient implementation** of an abstract data type called a **priority queue**, and in fact priority queues are often referred to as "heaps", regardless of how they may be implemented. 
    - **Implementation: binary heap/tree**: A common implementation of a heap is the binary heap, in which the tree is a binary tree (see figure)
    

- Properties:
    - In a heap, the highest (or lowest) priority element is always stored at the root. A heap is __not a sorted__ structure and can be regarded as **partially ordered**.
    - When a heap is a **complete binary tree**, it has the smallest possible height: a heap with $N$ nodes in which each node has $a$ children has height on the order of $\log_a N$.
    - A heap is a useful data structure when you need to remove the object with the highest (or lowest) priority.
    - The maximum number of children each node can have depends on the type of heap, but in many types it is at most two, which is known as a __binary heap__.
    - Example of a complete binary max-heap with node keys being integers from 1 to 100 and how it would be stored in an array. https://en.wikipedia.org/wiki/Heap_(data_structure)#/media/File:Heap-as-array.svg
    - Heaps are __usually implemented in an array__ (fixed size or dynamic array), and __do not require pointers between elements__. [See the Implementation section of the Wikipedia article for details.]
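
The array layout described above can be sketched as follows. This is a minimal illustration and not a reference implementation: the helper names (`parent`, `children`, `is_max_heap`, `push`, `pop_max`) are made up for this note, and zero-based indexing is assumed, so the children of index `i` sit at `2*i + 1` and `2*i + 2`.

```python
# Illustrative array-backed binary max-heap (all names are hypothetical).

def parent(i):
    return (i - 1) // 2

def children(i):
    return 2 * i + 1, 2 * i + 2

def is_max_heap(a):
    """True if every parent key is >= the key of each of its children."""
    return all(a[parent(i)] >= a[i] for i in range(1, len(a)))

def push(a, key):
    """Append key, then sift it up while it is larger than its parent."""
    a.append(key)
    i = len(a) - 1
    while i > 0 and a[parent(i)] < a[i]:
        a[parent(i)], a[i] = a[i], a[parent(i)]
        i = parent(i)

def pop_max(a):
    """Remove and return the root, then sift the relocated key down."""
    top = a[0]
    last = a.pop()
    if a:
        a[0] = last
        i = 0
        while True:
            left, right = children(i)
            largest = i
            if left < len(a) and a[left] > a[largest]:
                largest = left
            if right < len(a) and a[right] > a[largest]:
                largest = right
            if largest == i:
                break
            a[i], a[largest] = a[largest], a[i]
            i = largest
    return top

heap = []
for key in [17, 100, 3, 36, 19, 25, 1, 2, 7]:
    push(heap, key)

heap_ok = is_max_heap(heap)      # the array satisfies the heap property
root_key = heap[0]               # the largest key sits at index 0
drained = [pop_max(heap) for _ in range(len(heap))]  # keys in descending order
```

No pointers are needed: the tree shape is implied entirely by the index arithmetic, which is why a complete binary tree maps so cleanly onto an array.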

## 1.2.  heapq — Heap queue algorithm
Python LINK: https://docs.python.org/2/library/heapq.html



- This section discusses the __heap queue__ algorithm, also known as the __priority queue__ algorithm. Here "priority" refers to the heap ordering, i.e. min or max.

- Heap property:
    - Heaps are binary trees in which every parent node has a value less than or equal to that of any of its children. 
    - $heap[k] \leq heap[2k + 1]$ and $heap[k] \leq heap[2k + 2]$, for all $k$, __counting elements from zero__.
    
- Two differences from the textbook:
    - We use __zero-based indexing__. This makes the relationship between the index for a node and the indexes for its children slightly less obvious, but is more suitable since Python uses zero-based indexing.
    - Our pop method returns the smallest item, not the largest (called a “min heap” in textbooks; a “max heap” is more common in texts because of its suitability for in-place sorting).
 
- These two make it possible to view the heap as a __regular Python list__ without surprises: __heap[0]__ is the smallest item, and __heap.sort()__ maintains the heap invariant. Note that a heap is only partially ordered rather than almost sorted, so sorting an arbitrary heap still takes $O(n \log n)$ time in general. 
    - To create a heap, use a list initialized to __[ ]__, 
    - or you can transform a populated list into a heap via function __heapify()__.
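
The invariant and the two ways of creating a heap can be checked with a short sketch. The helper `satisfies_heap_invariant` is a hypothetical name introduced here for illustration; only `heapq.heapify` comes from the standard library.

```python
import heapq

def satisfies_heap_invariant(heap):
    """Check heap[k] <= heap[2k+1] and heap[k] <= heap[2k+2] for all k."""
    n = len(heap)
    for k in range(n):
        for child in (2 * k + 1, 2 * k + 2):
            if child < n and heap[k] > heap[child]:
                return False
    return True

data = [9, 4, 7, 1, 8, 2]
was_heap = satisfies_heap_invariant(data)    # False, because data[0] > data[1]

heapq.heapify(data)                          # in-place, linear time
is_heap_now = satisfies_heap_invariant(data)
smallest = data[0]                           # peek without popping

# A sorted list always satisfies the invariant, which is why
# heap.sort() keeps the list a valid heap.
sorted_is_heap = satisfies_heap_invariant(sorted(data))
```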

## 1.3. The following functions are provided

In [None]:
heapq.heappush(heap, item)
# Push the value item onto the heap, maintaining the heap invariant.

heapq.heappop(heap)
# Pop and return the smallest item from the heap, maintaining the heap invariant.
# If the heap is empty, IndexError is raised. To access the smallest item without 
# popping it, use heap[0].

heapq.heappushpop(heap, item)
# Push item on the heap, then pop and return the smallest item from the heap.
# The combined action runs more efficiently than heappush() followed by a separate
# call to heappop().

heapq.heapify(x)
# Transform list x into a heap, in-place, in linear time.

heapq.heapreplace(heap, item)
# Pop and return the smallest item from the heap, and also push the new item. 
# The heap size doesn't change. If the heap is empty, IndexError is raised.
# This one-step operation is more efficient than a heappop() followed by heappush()
# and can be more appropriate when using a fixed-size heap.
# Similar to heapq.heappushpop(heap, item), except that it pops before pushing.

heapq.merge(*iterables)
# Merge multiple sorted inputs into a single sorted output.  
# Returns an iterator over the sorted values.

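heapq.nsmallest(n, iterable[, key])
heapq.nlargest(n, iterable[, key])
# Return a list with the n smallest (nsmallest) or n largest (nlargest) elements
# from the dataset defined by iterable. key, if provided, specifies a function of
# one argument that is used to extract a comparison key from each element.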
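
A short usage sketch of the functions listed above may help to see how they differ, in particular `heappushpop` (push first, then pop) versus `heapreplace` (pop first, then push). The input values are arbitrary examples chosen for this note.

```python
import heapq

heap = [1, 3, 5]                     # already a valid min-heap

# heappushpop pushes first, so the new item itself can come straight back out.
a = heapq.heappushpop(heap, 0)

# heapreplace pops first, so it returns the old minimum even though
# the new item is larger.
b = heapq.heapreplace(heap, 4)
remaining = sorted(heap)

# merge expects inputs that are already sorted and returns a lazy iterator.
merged = list(heapq.merge([1, 4, 7], [2, 5, 8]))

smallest = heapq.nsmallest(2, [5, 1, 4, 2])
largest = heapq.nlargest(2, [5, 1, 4, 2])
shortest = heapq.nsmallest(1, ["bbb", "a", "cc"], key=len)
```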

## 1.4. Basic Examples

#### Heap Sort
A heapsort can be implemented by pushing all values onto a heap and then popping off the smallest values one at a time:

In [2]:
from heapq import heappush, heappop, heapify

def heapsort(iterable):
    # Push every value onto an empty heap, then pop them off in ascending order.
    heap = []
    for value in iterable:
        heappush(heap, value)
    return [heappop(heap) for _ in range(len(heap))]

def heapsort2(nums):
    # Heapify a copy in linear time so that the caller's list is left untouched.
    heap = list(nums)
    heapify(heap)
    return [heappop(heap) for _ in range(len(heap))]

print(heapsort([1, 3, 5, 7, 9, 2, 4, 6, 8, 0]))
print(heapsort2([1, 3, 5, 7, 9, 2, 4, 6, 8, 0]))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


In [None]:
>>> from heapq import heappush, heappop
>>> heap = []
>>> data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
>>> for item in data:
...     heappush(heap, item)
...
>>> ordered = []
>>> while heap:
...     ordered.append(heappop(heap))
...
>>> # Equivalent one-liner. Use it instead of the loop above, not after it,
>>> # because the loop leaves the heap empty:
>>> # ordered = [heappop(heap) for _ in range(len(heap))]


## Using a heap to insert items at the correct place in a priority queue:
>>> heap = []
>>> data = [(1, 'J'), (4, 'N'), (3, 'H'), (2, 'O')]
>>> for item in data:
...     heappush(heap, item)
...
>>> while heap:
...     print(heappop(heap)[1])
J
O
H
N
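
The tuple-based priority queue above relies on tuple comparison, so two entries with equal priority fall back to comparing their payloads. A common workaround is to add an insertion counter as a tie-breaker. The sketch below assumes that approach; the class name `PriorityQueue` and the task strings are hypothetical and only serve as an illustration.

```python
import heapq
import itertools

class PriorityQueue:
    """Hypothetical wrapper: lower priority numbers are served first,
    and entries with equal priority come out in insertion order."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()   # monotonically increasing tie-breaker

    def push(self, priority, task):
        # The counter guarantees that the tasks themselves are never compared.
        heapq.heappush(self._heap, (priority, next(self._counter), task))

    def pop(self):
        priority, _, task = heapq.heappop(self._heap)
        return task

    def __len__(self):
        return len(self._heap)

pq = PriorityQueue()
for priority, task in [(2, "write tests"), (1, "fix bug"), (2, "update docs")]:
    pq.push(priority, task)

order = [pq.pop() for _ in range(len(pq))]
```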

# 2. Tree
#### **Important links**
* Wiki link: https://en.wikipedia.org/wiki/Tree_(data_structure)

## 2.1. Abstract

- A tree is a widely used __abstract data type (ADT)__
    - it has a hierarchical tree structure
        - with a root value and subtrees of children,
        - represented as a set of __linked nodes__.
        
- __A tree is a directed acyclic graph__: A tree data structure can be defined __recursively__ as a collection of nodes (starting at a root node),
    - where each node is a data structure consisting of a value, __together with__ a list of references to nodes (the "children"), 
    - with the constraints that no reference is duplicated, and none points to the root. [No cycles may exist.]
    
- See the __Definition__ section of [wiki_link].

## 2.2. Main text
Terminology
- Root, Child, Parent, Siblings (same parent), Descendant, Ancestor,
- Leaf (A node with no children), Edge, Path
- Degree: the number of subtrees of a node.
- Level: The level of a node is defined by 1 + (the number of connections between the node and the root).
- Height of node: the number of edges on the longest path between that node and a leaf.
- Height of tree:  the height of its root node.
- Depth: the number of edges from the tree's root node to the node.
- Forest: a forest is a set of $n \geq 0$ disjoint trees.
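
The terminology above can be made concrete with a small sketch. The class `TreeNode` and the functions below are illustrative names, assuming each node simply stores a value and a list of children.

```python
class TreeNode:
    def __init__(self, value, children=None):
        self.value = value
        self.children = children or []

def degree(node):
    """Number of subtrees (children) of a node."""
    return len(node.children)

def height(node):
    """Edges on the longest downward path from node to a leaf."""
    if not node.children:
        return 0
    return 1 + max(height(child) for child in node.children)

def depth(root, target):
    """Edges from root to target, or None if target is not in the tree."""
    if root is target:
        return 0
    for child in root.children:
        d = depth(child, target)
        if d is not None:
            return d + 1
    return None

def level(root, target):
    """Level = 1 + depth, following the definition in the list above."""
    d = depth(root, target)
    return None if d is None else 1 + d

#       a
#      / \
#     b   c
#     |
#     d
d_node = TreeNode("d")
b_node = TreeNode("b", [d_node])
c_node = TreeNode("c")
a_node = TreeNode("a", [b_node, c_node])
```

Here `height(a_node)` is the height of the whole tree, and a leaf such as `c_node` has height 0 and degree 0.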


    
Three recursive readings of a tree:
- Recursively, a tree $t$ consists of a value $v$ and a list of other trees:
$$t: v \quad \text{and} \quad [ t[1], ..., t[k] ]$$

- A forest (a list of trees), where a tree consists of a value and a forest (the subtrees of its children):
\begin{align*}
f: & \  [t[1],...,t[k]]\\
t: & \ v\quad \text{and}\quad f
\end{align*}
- A node $n$ consists of a value $v$ and a list of references to other nodes
$$ n:  v \quad \text{and} \quad [ \&n[1], ..., \&n[k] ] $$

- A tree defines a directed graph that must satisfy:
    - every node (other than the root) must have exactly one parent, and the root must have no parents.
    - at most one reference can point to any given node (a node has at most a single parent), and no node in the tree points to the root.
 [wiki_link]:https://en.wikipedia.org/wiki/Tree_(data_structure)
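
The constraints above (no node referenced twice, nothing pointing back to the root) can be checked mechanically. The following is a minimal sketch under the "node = value + list of references" reading; `Node` and `is_tree` are hypothetical names, and node identity is tracked with `id()`.

```python
class Node:
    def __init__(self, value, children=None):
        self.value = value
        self.children = children if children is not None else []

def is_tree(root):
    """True if no node is referenced twice and nothing points back to the root."""
    seen = {id(root)}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            if id(child) in seen:      # a second parent, or a cycle
                return False
            seen.add(id(child))
            stack.append(child)
    return True

leaf = Node(3)
proper = Node(1, [Node(2, [leaf])])

# The same leaf referenced from two parents: a DAG, but not a tree.
shared_leaf = Node(3)
shared = Node(1, [Node(2, [shared_leaf]), Node(4, [shared_leaf])])

# A child that points back to the root: a cycle.
cyclic_root = Node(1)
cyclic_root.children.append(Node(2, [cyclic_root]))
```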


## 2.3. Binary Trees [wiki](https://en.wikipedia.org/wiki/Binary_tree)
* A binary tree, in which each node has no more than two children, is a good way to understand trees.
* Example: https://pythonschool.net/data-structures-algorithms/binary-tree/

#### Important Links:
* http://www.cs.cmu.edu/~clo/www/CMU/DataStructures/Lessons/lesson4_1.htm
* https://en.wikipedia.org/wiki/Binary_tree


#### [Important Videos on Data structures](https://www.youtube.com/playlist?list=PL2_aWCzGMAwI3W_JlcBbtYTwiQSsOTa6P)
* [Data structures: Introduction to Trees](https://www.youtube.com/watch?v=qH6yxkw0u78)
* [Data structures: Binary Tree](https://www.youtube.com/watch?v=H5JubkIy_p8)
* [Data structures: Binary Search Tree](https://www.youtube.com/watch?v=pYT9F8_LFTM)
* [Delete a node from Binary Search Tree](https://www.youtube.com/watch?v=gcULXE7ViZw)

* A binary tree is a tuple $(L, S, R)$, where $L$ and $R$ are binary trees or the empty set and $S$ is a singleton set (containing the root).

* Use: Binary trees are seldom used solely for their structure. 
    - Much more typical is to define a labeling function on the nodes, which associates some value to each node.
    - Binary trees labelled this way are used to implement __binary search trees__ and __binary heaps__, and are used for efficient __searching__ and __sorting__. 
    
   
### 2.3.1. [Types of trees](https://www.quora.com/What-are-the-types-of-trees-in-data-structures)

#### Binary tree subtypes:
- __full binary tree__ (also called a proper or plane binary tree)
    - a tree in which every node has either 0 or 2 children.
- __perfect binary tree__
    - a binary tree where all interior nodes have two children and all leaves have the same depth or same level.
- __complete binary tree__ 
    - Requirements:
        - every level, except possibly the last, is completely filled, 
        - and all nodes in the last level are as far left as possible. 
    - A complete binary tree can be efficiently represented using an array.
    
* Containment: every perfect binary tree is both full and complete. Full and complete binary trees do not contain one another: a tree can be full without being complete, and complete without being full.

- __balanced binary tree__
    - The Wikipedia definition is still unclear to me. One common definition is that, for every node, the heights of the left and right subtrees differ by at most 1.

- [__Full vs. Complete vs. Perfect__ binary tree comparisons](https://www.google.com/search?safe=active&biw=1327&bih=714&tbm=isch&sa=1&ei=FIpAWt2qC8HejwSuu5WQDA&q=Types+of+Binary+Tree&oq=Types+of+Binary+Tree&gs_l=psy-ab.12...0.0.0.204839.0.0.0.0.0.0.0.0..0.0....0...1c..64.psy-ab..0.0.0....0.2FAtdBPx-vU#imgrc=LAbdouxreudmKM)
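
To see how the full, perfect and complete definitions relate, here is a minimal sketch of one checker per definition. The names (`BinaryNode`, `is_full`, `is_complete`, `is_perfect`) are made up for this note, and the three small trees are arbitrary examples.

```python
from collections import deque

class BinaryNode:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def is_full(node):
    """Every node has either 0 or 2 children."""
    if node is None:
        return True
    if (node.left is None) != (node.right is None):
        return False
    return is_full(node.left) and is_full(node.right)

def is_complete(root):
    """Level-order scan: once a gap appears, no further node may follow."""
    queue, seen_gap = deque([root]), False
    while queue:
        node = queue.popleft()
        if node is None:
            seen_gap = True
        else:
            if seen_gap:
                return False
            queue.append(node.left)
            queue.append(node.right)
    return True

def is_perfect(root):
    """Interior nodes have two children and all leaves share one depth."""
    def depth_if_perfect(node):
        # Returns the number of levels if the subtree is perfect, else -1.
        if node is None:
            return 0
        left = depth_if_perfect(node.left)
        right = depth_if_perfect(node.right)
        if left < 0 or left != right:
            return -1
        return left + 1
    return depth_if_perfect(root) >= 0

perfect = BinaryNode(1, BinaryNode(2), BinaryNode(3))
complete_not_full = BinaryNode(1, BinaryNode(2, BinaryNode(4)), BinaryNode(3))
full_not_complete = BinaryNode(1, BinaryNode(2),
                               BinaryNode(3, BinaryNode(4), BinaryNode(5)))
```

The three example trees show that a perfect tree passes all three checks, while "full" and "complete" can each hold without the other.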

### 2.3.2. Binary search tree (BST)
* [Definition] A BST is a binary tree where the key in each node must be greater than or equal to any key stored in the left sub-tree, and less than or equal to any key stored in the right sub-tree. In short, every node must satisfy $L \leq P \leq R$.
* [Properties] A BST allows __fast lookup, addition and removal of items__.
    - A BST can be used to implement either __dynamic sets of items__, or __lookup tables__ that allow finding an item by its key (e.g., finding the phone number of a person by name)
    - On average, each comparison in a BST allows the operations to skip about half of the tree, so each lookup, insertion or deletion takes time proportional to $\log n$, where $n$ is the number of items in the tree.
    - This is better than the linear time required to find items by key in an (unsorted) array, but slower than the corresponding operations on hash tables.
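
The definition above constrains whole sub-trees, not just a node and its direct children. One way to check it, sketched here with hypothetical names (`BSTNode`, `is_bst`), is to pass a lower and an upper bound down the tree.

```python
class BSTNode:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def is_bst(node, low=None, high=None):
    """Check low <= key <= high for every node, tightening bounds on the way down."""
    if node is None:
        return True
    if low is not None and node.key < low:
        return False
    if high is not None and node.key > high:
        return False
    return (is_bst(node.left, low, node.key)
            and is_bst(node.right, node.key, high))

valid = BSTNode(50, BSTNode(30, BSTNode(20), BSTNode(40)), BSTNode(70))

# 60 is a legal right child of 30 locally, but it sits in the left
# sub-tree of 50, so the tree as a whole is not a BST.
invalid = BSTNode(50, BSTNode(30, None, BSTNode(60)), BSTNode(70))
```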
    
#### Two implementations of search in a BST
- Because in the worst case this algorithm must search from the root of the tree to the leaf farthest from the root, the search operation takes time proportional to the tree's height. 
- On average, binary search trees with $n$ nodes have $O(\log n)$ height. 
- However, in the worst case, binary search trees can have $O(n)$ height, when the unbalanced tree resembles a linked list (degenerate tree).

In [None]:
# 1-Searching: recursive solver ----------------------
def BST_search_recursively (key, node):
    
    # 1. Base case: empty subtree, or the key has been found
    if node is None or node.key == key:
        return node
    
    # 2. Recursive case: descend into the half that can contain the key
    if key < node.key:
        return BST_search_recursively (key, node.left)
    else: # key > node.key
        return BST_search_recursively (key, node.right)
    

# 2-Searching: iterative solver ---------------------
def BST_search_iteratively (key, node):
    
    curr = node
    while curr is not None:
        if key == curr.key:
            return curr
        elif key < curr.key:
            curr = curr.left
        else: # key > curr.key:
            curr = curr.right
    return curr
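
Since search time is proportional to the tree height, the effect of insertion order can be observed directly. The sketch below is an illustration under simple assumptions: an unbalanced BST with no rebalancing, an iterative `insert`, and helper names (`build`, `height`) invented for this note. It compares inserting keys in sorted order with inserting the same keys in shuffled order.

```python
import random

class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Iterative insert, so a degenerate tree cannot hit the recursion limit."""
    new = Node(key)
    if root is None:
        return new
    curr = root
    while True:
        if key < curr.key:
            if curr.left is None:
                curr.left = new
                return root
            curr = curr.left
        else:
            if curr.right is None:
                curr.right = new
                return root
            curr = curr.right

def height(root):
    """Height in edges, computed level by level (-1 for an empty tree)."""
    h = -1
    level = [root] if root is not None else []
    while level:
        h += 1
        level = [c for n in level for c in (n.left, n.right) if c is not None]
    return h

def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root

n = 500
sorted_height = height(build(range(n)))     # degenerate: one long right chain

keys = list(range(n))
random.Random(0).shuffle(keys)              # fixed seed for repeatability
shuffled_height = height(build(keys))       # far smaller than n - 1
```

With sorted input every key becomes a right child, so the height is `n - 1` and search degrades to a linear scan. Self-balancing trees exist to rule this case out.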

#### BST's insertion, inorder print, deletion operation: [Code examples in GeeksforGeeks](https://www.geeksforgeeks.org/binary-tree-data-structure/)
* BST deletion illustration video is shown in https://www.youtube.com/watch?v=puyl7MBqPIg.
* Delete a node from Binary Search Tree: https://www.youtube.com/watch?v=gcULXE7ViZw

In [2]:
# Python program to demonstrate the delete operation
# in a binary search tree

# A binary tree node
class Node:
    # Constructor to create a new node
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


# A utility function to do an inorder traversal of the BST
def inorder(root):
    if root is not None:
        inorder(root.left)
        print(root.key, end=" ")
        inorder(root.right)


# A utility function to insert a new node with the given key into the BST
def insert(node, key):
    # If the tree is empty, return a new node
    if node is None:
        return Node(key)

    # Otherwise recurse down the tree
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)

    # Return the (unchanged) node reference
    return node


# Given a non-empty binary search tree, return the node
# with the minimum key value found in that tree. Note that the
# entire tree does not need to be searched.
def minValueNode(node):
    current = node

    # Loop down to find the leftmost node
    while current.left is not None:
        current = current.left

    return current


# Given a binary search tree and a key, this function
# deletes the key and returns the new root
def deleteNode(root, key):
    # Base case
    if root is None:
        return root

    # If the key to be deleted is smaller than the root's
    # key, then it lies in the left subtree
    if key < root.key:
        root.left = deleteNode(root.left, key)

    # If the key to be deleted is greater than the root's key,
    # then it lies in the right subtree
    elif key > root.key:
        root.right = deleteNode(root.right, key)

    # If the key is the same as the root's key, then this is the node
    # to be deleted
    else:
        # Node with only one child or no child
        if root.left is None:
            return root.right
        elif root.right is None:
            return root.left

        # Node with two children: get the inorder successor
        # (smallest in the right subtree)
        temp = minValueNode(root.right)

        # Copy the inorder successor's content to this node
        root.key = temp.key

        # Delete the inorder successor
        root.right = deleteNode(root.right, temp.key)

    return root


# Driver program to test the functions above.
# Let us create the following BST:
#               50
#            /      \
#           30      70
#          /  \    /  \
#        20   40  60   80

root = None
root = insert(root, 50)
root = insert(root, 30)
root = insert(root, 20)
root = insert(root, 40)
root = insert(root, 70)
root = insert(root, 60)
root = insert(root, 80)

print("Inorder traversal of the given tree")
inorder(root)

print("\nDelete 20")
root = deleteNode(root, 20)
print("Inorder traversal of the modified tree")
inorder(root)

print("\nDelete 30")
root = deleteNode(root, 30)
print("Inorder traversal of the modified tree")
inorder(root)

print("\nDelete 50")
root = deleteNode(root, 50)
print("Inorder traversal of the modified tree")
inorder(root)

Inorder traversal of the given tree
20 30 40 50 60 70 80 
Delete 20
Inorder traversal of the modified tree
30 40 50 60 70 80 
Delete 30
Inorder traversal of the modified tree
40 50 60 70 80 
Delete 50
Inorder traversal of the modified tree
40 60 70 80 

* Delete a node from Binary Search Tree: https://www.youtube.com/watch?v=gcULXE7ViZw&t=39s
* Find min and max element in a binary search tree: https://www.youtube.com/watch?v=Ut90klNN264&index=30&list=PL2_aWCzGMAwI3W_JlcBbtYTwiQSsOTa6P

#### Deletion in BST [wiki_links](https://en.wikipedia.org/wiki/Binary_search_tree#Insertion)

##### The method that changes the heights of the affected subtrees by at most one after deletion. Three cases:
* Deleting a node with no children: simply remove the node from the tree.
* Deleting a node with one child: remove the node and replace it with its child.
* Deleting a node with two children: 
    * Call the node to be deleted D. 
    * Do not delete D. Instead, choose either its in-order predecessor (max in the left sub-tree) node or its in-order successor (min in the right sub-tree) node as replacement node E (see the figure in the Wikipedia article).
    * Copy the user values of E to D. If E does not have a child, simply remove E from its previous parent G. If E has a child, say F, it is a right child (when E is the in-order successor). Replace E with F at E's parent.

In [None]:
## code is as below
class TreeNode:  # illustrative wrapper class so that the methods below are runnable
    def __init__(self, key, parent=None):
        self.key = key
        self.parent = parent
        self.left_child = None
        self.right_child = None

    def find_min(self):   # Gets minimum node in a subtree
        current_node = self
        while current_node.left_child:
            current_node = current_node.left_child
        return current_node

    def replace_node_in_parent(self, new_value=None):
        # Note: a parentless root with fewer than two children cannot unlink
        # itself; the caller has to update its own root reference in that case.
        if self.parent:
            if self is self.parent.left_child:
                self.parent.left_child = new_value
            else:
                self.parent.right_child = new_value
        if new_value:
            new_value.parent = self.parent

    def binary_tree_delete(self, key):
        if key < self.key:
            if self.left_child:   # a missing subtree means the key is absent
                self.left_child.binary_tree_delete(key)
        elif key > self.key:
            if self.right_child:
                self.right_child.binary_tree_delete(key)
        else: # delete the key here
            if self.left_child and self.right_child: # if both children are present
                successor = self.right_child.find_min()
                self.key = successor.key
                successor.binary_tree_delete(successor.key)
            elif self.left_child:   # if the node has only a *left* child
                self.replace_node_in_parent(self.left_child)
            elif self.right_child:  # if the node has only a *right* child
                self.replace_node_in_parent(self.right_child)
            else: # this node has no children
                self.replace_node_in_parent(None)

### 2.3.3. Other Trees
#### Link (I have not yet had time to read the following in detail): https://www.quora.com/What-are-the-types-of-trees-in-data-structures
* __AVL tree or height balanced binary tree__
    * It is a variation of the Binary tree where height difference between left and right sub tree can be at most 1. If at any time they differ by more than one, rebalancing is done to restore this property. Lookup, insertion, and deletion all take $O(\log n)$ time in both the average and worst cases, where n is the number of nodes in the tree prior to the operation.
* __Red-Black tree__:
    * Another variant of the binary tree. Like the AVL tree, it is a self-balancing binary search tree. In this tree, each node is colored either red or black.
* __Splay tree__:
    * A splay tree is a self-adjusting binary search tree with the additional property that recently accessed elements are quick to access again. All normal operations on a binary search tree are combined with one basic operation, called splaying. Splaying the tree for a certain element rearranges the tree so that the element is placed at the root of the tree.
    
* __N-ary tree__:
    * In this tree the two-children limitation of the binary tree is removed: a node can have at most n children. Like a binary tree, it can be a full, complete or perfect n-ary tree. An N-ary tree should not be confused with a forest, which is a set of disjoint trees.
    
* __Trie Structure__:
    * In computer science, a trie, also called digital tree and sometimes radix tree or prefix tree (as they can be searched by prefixes), is an ordered tree data structure that is used to store a dynamic set or associative array where the keys are usually strings. All the descendants of a node have a common prefix of the string associated with that node, and the root is associated with the empty string.
* __Heap Structure__:
    * The heap is another widely used tree structure with a specific ordering property. There are two types of heap: min heap and max heap. In a min heap, the key of a parent must be less than or equal to the keys of all its children. Similarly, in a max heap the key of a parent is always greater than or equal to the keys of all its children. One common implementation is the binary heap, where each parent can have at most two children.
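
Of the structures listed above, the trie is perhaps the easiest to sketch in a few lines. The following is a minimal illustration, not a production implementation: the class `Trie`, its method names and the nested-dict representation are all assumptions made for this note.

```python
class Trie:
    """Minimal prefix tree; each node is a dict from character to child node."""

    _END = object()   # sentinel key marking that a complete word ends here

    def __init__(self):
        self.root = {}   # the root corresponds to the empty string

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[Trie._END] = True

    def _walk(self, prefix):
        """Follow prefix from the root; return the node reached, or None."""
        node = self.root
        for ch in prefix:
            node = node.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        node = self._walk(word)
        return node is not None and Trie._END in node

    def starts_with(self, prefix):
        return self._walk(prefix) is not None

trie = Trie()
for word in ["tea", "ten", "to"]:
    trie.insert(word)
```

All words stored below a node share the prefix spelled out by the path to that node, which is why prefix queries such as `trie.starts_with("te")` only need to walk as many steps as the prefix has characters.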

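### Thoughts after reading WSISA (CVPR'17)
#### Questions:
* Why can survival analysis only be carried as far as a ranking, which in turn ties the data of different users together?
* How is $h_0(t)$ in the Cox hazard function estimated?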