# Data Structure Basics


## List 
### Definition
* Stores data elements based on a sequential, most commonly 0-based, index.
* Based on tuples from set theory (a tuple plus mutability gives a mutable tuple).
* They are one of the oldest, most commonly used data structures. 

### Key points
* Advantages: indexing
* Disadvantages: inserting (except append) and deleting (except pop).
* Linear data structures are the most basic; the simplest one is the linear array.
    * Stack
    * Queue
* Non-linear data structures.
    * Graphs, e.g. [[1,2], [3, 4]]
    * Trees (in-order traversal), e.g. [1,[2,4,5],3]

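### Operations (insert, delete, update, search, sort) and time complexity
* Append: O(1) amortized
* Insertion: O(N) (`l.insert(i, x)` or `l[a:b] = ...`)
* Pop: O(1) from the end
* Delete: depends on the index i; O(N) in the worst case
* Indexing: O(1)
* Search: O(N), by traversing the list
* Sort: O(NlogN)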



## Linked list
### Definition
* Stores data with nodes that point to other nodes.
    * Node: at its most basic, a node has one datum and one reference (to another node).
    * Linked list: a linked list chains nodes together by pointing one node's reference to another node.

### Key points
* Advantages: adding and deleting
* Disadvantages: indexing and searching.
* A doubly linked list has nodes that also reference the previous node, so each node holds two references: one to the previous node and one to the next.
* A circularly linked list is a simple linked list whose tail (the last node) references the head (the first node).
* Can be used to implement stacks and queues.

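### Operations (insert, delete, update, search, sort) and time complexity
* Add node: O(1), given a reference to the insertion point (e.g. the head)
* Deletion: O(1), given a reference to the node before the one being removed (for a singly linked list)
* Indexing: O(N)
* Search: O(N), by traversing the linked list
* Sort: O(NlogN)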
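
To make these costs concrete, here is a minimal sketch of a singly linked list. The names `Node`, `LinkedList`, `push_front`, `pop_front` and `find` are hypothetical and chosen only for illustration: adding or removing at the head touches one reference, while searching has to walk the chain.

```python
class Node:
    """One node: a datum plus a reference to the next node (or None)."""
    def __init__(self, data, next=None):
        self.data = data
        self.next = next


class LinkedList:
    def __init__(self):
        self.head = None

    def push_front(self, data):
        """Add a node at the head: O(1)."""
        self.head = Node(data, self.head)

    def pop_front(self):
        """Remove the head node and return its datum: O(1)."""
        if self.head is None:
            raise IndexError("pop from empty linked list")
        node = self.head
        self.head = node.next
        return node.data

    def find(self, target):
        """Return the index of the first node holding target, or -1: O(N)."""
        index, node = 0, self.head
        while node is not None:
            if node.data == target:
                return index
            index, node = index + 1, node.next
        return -1


ll = LinkedList()
for value in (3, 2, 1):
    ll.push_front(value)  # the chain is now 1 -> 2 -> 3
```

Using only `push_front` and `pop_front` gives stack (LIFO) behaviour; a queue would additionally need a reference to the tail.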

## Stack, Queue and Heap queue (priority queue)
### Stack
* Last-In-First-Out (LIFO) concept
* append() / pop()
* can be made from a **list**
* can be made from a **linked list**, by having the head be the only place for insertion and removal. To insert, create a new node and link it in as the new head; to remove, detach the head node and make the next node the head.
    * See <链表实现stack和Queue>
    
### Queue
* First-In-First-Out (FIFO) principle
* Lists are not efficient for implementing a queue, because removing from the front of a list is O(N).
    * Use the double-ended queue container `deque` (pronounced "deck") instead.
    * `from collections import deque`, then `d = deque(iterable)`. If iterable is not specified, the new deque is empty.
    * append(x) / appendleft(x) / pop() / popleft() are O(1) at either end; extend(iterable) / extendleft(iterable) cost O(1) per added item.
* can be made from a **linked list** that only removes from the head and adds to the tail.
    * See <链表实现stack和Queue>

### Heap queue
* The smallest value sits at the top of the heap (Python's `heapq` is a min-heap).

In [None]:
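# Stack (list)
A = [1, 2, 3]
A.append([4])  # append one object: [1, 2, 3, [4]]
A.pop()  # remove and return the last item: [1, 2, 3]
A.extend([4])  # append every item of an iterable: [1, 2, 3, 4]
A.insert(1, 4)  # insert value 4 at index 1: [1, 4, 2, 3, 4]
A.remove(1)  # remove the first item equal to 1: [4, 2, 3, 4]
x = A.pop(1)  # remove and return the item at index 1: x = 2, A = [4, 3, 4]
x = A.index(4)  # index of the first 4: x = 0
A = [3, 1, 5]
A.reverse()  # reverse in place: [5, 1, 3]
A.sort(reverse=True)  # sort in descending order: [5, 3, 1]
from copy import copy, deepcopy
a = copy(A)  # shallow copy
a = deepcopy(A)  # deep copy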

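# Queue (deque)
from collections import deque
B = 'desk'  # any iterable
d = deque()  # create a new, empty deque
d = deque(B)  # build a deque from B: deque(['d', 'e', 's', 'k'])
d.append('c')  # add to the right end
d.appendleft('e')  # add to the left end; the deque now spells 'edeskc'
d.extend(a)  # append every item of a to the right end
d.extendleft(a)  # add the items of a to the left end one by one, so they end up reversed
x = d.pop()  # remove and return the rightmost item; deque.pop() takes no index
d.popleft()  # remove and return the leftmost item, i.e. the first one "in"; takes no index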


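# Heap queue (heapq, a min-heap)
from heapq import heapify, heappop, heappush, heappushpop, heapreplace, nlargest, nsmallest
C = []  # an empty list is already a valid heap
C = [2, 3, 4]
heapify(C)  # turn C into a heap in place
heappush(C, 1)  # push 1 onto C; items must be mutually comparable, so do not mix ints and lists
x = heappop(C)  # pop and return the smallest item: x = 1
heappushpop(C, 5)  # push first, then pop and return the smallest item
heapreplace(C, 1)  # pop and return the smallest item first, then push
nlargest(2, C)  # nlargest(n, iterable, key=None): the n largest items ranked by key (e.g. key=lambda x: x[0]); same as sorted(iterable, key=key, reverse=True)[:n]
nsmallest(2, C)  # nsmallest(n, iterable, key=None): the n smallest items; same as sorted(iterable, key=key, reverse=False)[:n]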

## Hash Table and string
### Definition
* Structure: Stores data with **key value pairs**.
* Function: A hash function accepts a key and returns an output that is ideally unique to that specific key.
    * This is known as hashing: the idea that an input and an output have, ideally, **a one-to-one correspondence** to map information.
    * The hash function's output determines **a unique address** in memory for that data.

### Key points
* Advantages: insertion, deletion, and searching
* Hash collisions occur when a hash function returns the **same output** for **two distinct inputs**.
    * All hash functions have this problem.
    * This is often accommodated by **making the hash table very large**.
* Hashes are important for **dictionaries** and **database indexing**.

### Operations and time complexity (average case)
* Store: O(1) `d[k] = v`
* Pop: O(1) **d.pop(k)** deletes the pair for key k and returns its value
* Pop item: O(1) **d.popitem()** deletes and returns the last inserted pair
* Delete: O(1) `del d[k]`
* Search: O(1) `d[k]`
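
The operations listed above can be tried on Python's built-in `dict`, which is a hash table. This is only a small sketch; the keys and values are made up for illustration.

```python
d = {}
d["apple"] = 3           # store: d[k] = v
d["pear"] = 5
count = d["apple"]       # search: d[k]
has_kiwi = "kiwi" in d   # a membership test hashes the key as well
popped = d.pop("apple")  # delete the pair for "apple" and return its value
d["plum"] = 7
last = d.popitem()       # delete and return the most recently inserted pair
del d["pear"]            # delete without returning anything

# Within one run, equal keys always hash to the same value, which is what
# lets the table find the same slot again.
same_hash = hash("apple") == hash("apple")
```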


## Binary Tree

### Definition
* A tree-like data structure where every node has at most two children.
    * There is one left and one right child node.

### Key points
* Designed to optimize searching and sorting
* A **degenerate tree** is an unbalanced tree in which each parent node has only one child; if it is entirely one-sided, it is essentially a linked list.
* They are comparatively simple to implement compared with other data structures.
* Used to make a **binary search tree**
    * A binary tree that uses comparable keys to decide in which direction a child goes.
    * The left child has a key smaller than its parent node.
    * The right child has a key greater than its parent node.
    * There can be no duplicate nodes.
    * Because of the above, it is more likely to be used as a data structure than a plain binary tree.

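### Operations and time complexity
* Insertion: O(logN) on average (balanced tree); O(N) for a degenerate tree
* Indexing: O(logN) on average
* Search: O(logN) on average; O(N) for a degenerate tree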
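
A minimal sketch of the binary search tree rules described above (smaller keys go left, larger keys go right, no duplicates). `TreeNode`, `bst_insert`, `bst_search` and `inorder` are hypothetical names used only for this illustration, and the tree is not rebalanced, so the O(logN) figures hold only when the keys arrive in a reasonably mixed order.

```python
class TreeNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def bst_insert(root, key):
    """Insert key and return the (possibly new) root; duplicates are ignored."""
    if root is None:
        return TreeNode(key)
    if key < root.key:
        root.left = bst_insert(root.left, key)
    elif key > root.key:
        root.right = bst_insert(root.right, key)
    return root


def bst_search(root, key):
    """Return True if key is in the tree, following one branch per level."""
    node = root
    while node is not None:
        if key == node.key:
            return True
        node = node.left if key < node.key else node.right
    return False


def inorder(root):
    """In-order traversal of a binary search tree yields the keys in sorted order."""
    if root is None:
        return []
    return inorder(root.left) + [root.key] + inorder(root.right)


root = None
for k in [5, 3, 8, 1, 4, 8]:  # the second 8 is a duplicate and is dropped
    root = bst_insert(root, k)
```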

# Search Basics

## Breadth First Search

### Definition
An algorithm that searches a tree (or graph) by searching the levels of the tree first, starting at the root.

### Key points
* Optimal for searching a tree that is **wider than it is deep**.
* Uses a **queue** to store information about the tree while it traverses it.
    * Because it uses a queue, it is more memory intensive than depth first search.
    * The queue uses more memory because it needs to store pointers.
    
### Complexity
* Search: O(V + E) when the graph is represented by an adjacency list
* Each edge is labeled twice. E is the number of edges.
* Each vertex is labeled twice. V is the number of vertices.
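
A minimal sketch of breadth first search over an adjacency list, using `collections.deque` as the queue. The function name `bfs` and the example graph are assumptions made for illustration. Every vertex enters the queue once and every adjacency list is scanned once, which is where O(V + E) comes from.

```python
from collections import deque


def bfs(graph, start):
    """Return the vertices reachable from start in breadth-first order."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        vertex = queue.popleft()         # FIFO: oldest discovered vertex first
        order.append(vertex)
        for neighbour in graph[vertex]:  # each adjacency list is scanned once
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return order


# Hypothetical example graph stored as an adjacency list.
graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": [],
    "E": [],
}
```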

## Depth First Search

### Definition
* An algorithm that searches a tree (or graph) by searching the depth of the tree first, starting at the root.
    * It traverses left down a tree until it cannot go further.
    * Once it reaches the end of a branch, it traverses back up, trying the right child of the nodes on that branch and, where possible, going left from those right children.
    * When it has finished examining a branch, it moves to the node to the right of the root, then tries to go left on all its children until it reaches the bottom.
    * The rightmost node is evaluated last (the node that is to the right of all its ancestors).

### Key points
* Optimal for searching a tree that is deeper than it is wide
* Uses a stack to push nodes onto:
    * Because a stack is LIFO, it does not need to keep track of the nodes' pointers and is therefore less memory intensive than BFS.
    * Once it can't go further left, it begins evaluating the stack.

### Complexity
* Search: O(V + E) when the graph is represented by an adjacency list
* Each edge is labeled twice. E is the number of edges.
* Each vertex is labeled twice. V is the number of vertices.
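
A minimal sketch of depth first search with an explicit stack (a plain Python list). The function name `dfs` and the example graph are assumptions made for illustration; a recursive version would use the call stack in the same way.

```python
def dfs(graph, start):
    """Return the vertices reachable from start in depth-first (pre-order) order."""
    visited = set()
    order = []
    stack = [start]
    while stack:
        vertex = stack.pop()  # LIFO: most recently pushed vertex first
        if vertex in visited:
            continue
        visited.add(vertex)
        order.append(vertex)
        # Push neighbours in reverse so that the leftmost one is explored first.
        for neighbour in reversed(graph[vertex]):
            if neighbour not in visited:
                stack.append(neighbour)
    return order


# Hypothetical example graph stored as an adjacency list.
graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": [],
    "E": [],
}
```

On this graph the search runs all the way down the A, B, D branch before it backs up to C and E.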

### BFS vs DFS
* The simple answer to this question is that it depends on the size and shape of the tree.
    * For a wide, shallow tree, use BFS.
    * For a deep, narrow tree, use DFS.

### Nuances
* Because BFS uses a queue to store information about each node and its children, it could use more memory than is available on your computer.
* If using DFS on a tree that is very deep, you might go unnecessarily deep in the search.
* BFS tends to be a **looping algorithm**.
* DFS tends to be a **recursive algorithm**.


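# Efficient Sorting Basics
* Basic application: sorting problems
    * Core step: comparing (every element)
* Extended application: merging problems (every element is visited during the comparisons, so merging can be done along the way)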


## Merge sort

### Definition
A comparison-based sorting algorithm that works as follows:
  * Divides the entire dataset into groups of at most two.
  * Compares the numbers in each group one at a time, moving the smaller number to the left of the pair.
  * Once all pairs are sorted, it compares the leftmost elements of the two leftmost pairs to create a sorted group of four, with the smallest numbers on the left and the largest on the right.
  * This process is repeated until there is only one set.
  * See the figure Merge-Sort.png

### Key points
* This is one of the most basic sorting algorithms.
* Know that it divides all the data into the smallest possible sets, then compares them.

### Complexity
* Time
    * Relation: T(n) = 2T(n/2) + O(n)
    * Best Case Sort: O(NlogN)
    * Average Case Sort: O(NlogN)
    * Worst Case Sort: O(NlogN)
    * Proof of NlogN
        * Recurrence tree method:
            T(n) = 2T(n/2) + O(n) = 2^2 T(n/2^2) + 2 O(n/2) + O(n) = 2^2 T(n/2^2) + 2 O(n) = ... = 2^m T(n/2^m) + m O(n).
            When n/2^m = 1, m = log2(n), so T(n) = n T(1) + log2(n) * O(n) = O(n) + O(n log2(n)) = O(n log n).
        * Master method:
            T(n) = 2T(n/2) + Θ(n) has a = 2, b = 2 and f(n) = Θ(n) = Θ(n^(log_b a)), so case 2 applies and T(n) = Θ(n log n).
* Space
    * O(N)

### Core idea
* Split the container's elements into small units, then compare, order and merge them step by step.

In [4]:
'''
Merge sort template / basic code
edge case: list is empty / has only one element
Procedure:
    MergeSort(arr[], l,  r)
    If r > l
    1. Find the middle point to divide the array into two halves:
         middle m = (l+r)/2
    2. Call mergeSort for first half:
         Call mergeSort(arr, l, m)
    3. Call mergeSort for second half:
         Call mergeSort(arr, m+1, r)
    4. Merge the two halves sorted in step 2 and 3:
         Call merge(arr, l, m, r)
'''

class SortList:
    # Merge sort
    def mergeSort(self, arr):
        if len(arr) <= 1:  # edge case: an empty or one-element list is already sorted
            return arr  # return the list itself so every call returns a list

        # Plain split
        mid = len(arr) // 2  # find the middle of the array
        L = arr[:mid]  # copy the array elements
        R = arr[mid:]  # into two halves

        # Recursive split
        self.mergeSort(L)  # sort the first half in place
        self.mergeSort(R)  # sort the second half in place

        # Merge. Input: two sorted lists L and R. Process: merge them in order. Output: arr holds the merged list.
        i = j = k = 0
        while i < len(L) and j < len(R):  # copy the smaller front element back into arr
            if L[i] <= R[j]:  # <= keeps equal elements in their original order (stable)
                arr[k] = L[i]
                i += 1
            else:
                arr[k] = R[j]
                j += 1
            k += 1

        while i < len(L):  # copy what is left of L
            arr[k] = L[i]
            k += 1
            i += 1

        while j < len(R):  # copy what is left of R
            arr[k] = R[j]
            k += 1
            j += 1

        return arr
            
x = SortList() 
x.mergeSort([2,1,0])

[0, 1, 2]

  
## Bottom-up Merge sort

### Definition
A comparison-based sorting algorithm that
  * first merges pairs of adjacent lists of 1 element,
  * then merges pairs of adjacent lists of 2 elements,
  * next merges pairs of adjacent lists of 4 elements,
  * and so on, until the whole list is merged.


### Complexity
* Time
    * Relation: T(n) = 2T(n/2) + O(n)
    * Best Case Sort: O(NlogN)
    * Average Case Sort: O(NlogN)
    * Worst Case Sort: O(NlogN)
    * Proof of NlogN
        * Recurrence tree method:
            T(n) = 2T(n/2) + O(n) = 2^2 T(n/2^2) + 2 O(n/2) + O(n) = 2^2 T(n/2^2) + 2 O(n) = ... = 2^m T(n/2^m) + m O(n).
            When n/2^m = 1, m = log2(n), so T(n) = n T(1) + log2(n) * O(n) = O(n) + O(n log2(n)) = O(n log n).
        * Master method:
            T(n) = 2T(n/2) + Θ(n) has a = 2, b = 2 and f(n) = Θ(n) = Θ(n^(log_b a)), so case 2 applies and T(n) = Θ(n log n).
* Space
    * O(1), or O(N) if new storage is allocated during merging

In [1]:
# Bottom-up merge sort template / basic code
# edge case: list is empty / has only one element
# Pitfalls: do not forget `while interval < length`; subsort works on segments, whereas 23. Merge k Sorted Lists works on single items
class SortList:
    def mergeSort(self, arr):  # Merge sort by bottom-up merge sort
        if len(arr) == 0 or len(arr) == 1:  # edge case
            return arr

        length = len(arr)
        interval = 1
        while interval < length:
            for i in range(0, length, 2 * interval):  # bottom-up merge sort
                arr[i: i + 2 * interval] = self.subsort(arr[i: i + interval], arr[i + interval: i + interval * 2])
            interval = interval * 2
        return arr

    def subsort(self, list1, list2):  # merge two sorted lists into one sorted list
        if not list1:
            return list2
        if not list2:
            return list1

        new_list = []
        left_len = len(list1)
        right_len = len(list2)
        i, j = 0, 0
        while i < left_len and j < right_len:
            if list1[i] < list2[j]:
                new_list.append(list1[i])
                i += 1
            else:
                new_list.append(list2[j])
                j += 1

        if i != left_len:
            new_list.extend(list1[i:left_len])
        if j != right_len:
            new_list.extend(list2[j:right_len])
        return new_list

x = SortList() 
x.mergeSort([2,1,0])

[0, 1, 2]

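### Applications of merge sort

#### Approach:
* Identify the basic structure and requirements
    * Input: a container holding several elements
    * Output: mind the format!
        * Output in the original container, i.e. the container keeps its dimensions, e.g. sorting a list -> recurse with a plain `self.sort(L)`
        * Output as a new object, i.e. a new value or a container whose dimensions change, e.g. sorting a linked list or a list of lists -> recurse with `sub_l = self.sort(L)`
    * Requirements: how to merge, how to order the elements, etc.
* Identify the problem type
* Decide on the idea (the algorithm to use)
* Think about what differs from the template
* Adapt the method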
    
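#### Sorting a linked list with no space-complexity requirement: merge sort
* Why this method?
    * Sorting is required.
    * Compared with quick sort, merge sort needs far fewer accesses to linked list nodes. (A linked list is not stored in a contiguous block of memory, so accessing a node by position is slow, unlike a list.)
* Time / space complexity
    * O(NlogN) / O(N)
* Example: 148. Sort List (linked list).py
    * Recursive; each recursive call returns a head node; cut every segment off completely (set next = None); fix the head of R first, then the tail of L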
    
#### Sorting a linked list using constant space complexity: bottom-up merge sort
* Why this method?
    * It uses constant space complexity.
* Time / space complexity
    * O(NlogN) / O(1)
* Example: 148. Sort List (linked list).py
    * Iterate with `while interval < length` plus an inner `while head1`
    * Use a dummy head and a fake tail
    * A separate split function: input -> head, interval; output -> next head
    * A separate merge function: input -> head1, head2; output -> new head, new tail

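#### Inversion Count Problem
* Why this method?
    * It uses the property `inv_count = inv_count + (len(L) - i)`: during a merge, when R[j] is placed before L[i], all len(L) - i remaining elements of L are larger than R[j].
* Time / space complexity
    * O(NlogN) / O(N)
* Examples:
    * 775\. Global and Local Inversions.py
    * Count Inversions in an array.py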
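
A minimal sketch of counting inversions with merge sort. The function name `count_inversions` and its return shape are assumptions made for illustration, not the code from the example files. Inside the merge, whenever an element of R is taken before the current element of L, every element still waiting in L forms an inversion with it.

```python
def count_inversions(arr):
    """Return (sorted copy of arr, number of pairs i < j with arr[i] > arr[j])."""
    if len(arr) <= 1:
        return list(arr), 0

    mid = len(arr) // 2
    L, left_count = count_inversions(arr[:mid])
    R, right_count = count_inversions(arr[mid:])

    merged = []
    inv_count = left_count + right_count
    i = j = 0
    while i < len(L) and j < len(R):
        if L[i] <= R[j]:
            merged.append(L[i])
            i += 1
        else:
            # R[j] is smaller than L[i] and than everything after it in L.
            inv_count = inv_count + (len(L) - i)
            merged.append(R[j])
            j += 1
    merged.extend(L[i:])
    merged.extend(R[j:])
    return merged, inv_count
```

For instance, `[2, 4, 1, 3, 5]` has three inversions: (2, 1), (4, 1) and (4, 3).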
    
#### Merge k sorted arrays

#### 3-way Merge Sort

#### External Sorting

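#### Merging linked lists
* Why merge sort?
    * Merging is required, which is an extension of sorting.
* 23\. Merge k Sorted Lists.py - MergeSort method
    * Basic structure and requirements:
        * Input: a container whose elements are the heads of several linked lists
        * Requirement: merge and sort all the linked lists
        * Output: the head of the merged linked list
    * Problem type: large-scale merge plus small-scale sort
    * Idea: Merge Sort
    * Differences:
        * Merge: done on linked lists, producing a new partial linked list
        * Output: the head of the newly built linked list, not the original container
    * Method:
        * divide the lists into groups of at most two
        * merge and sort the linked lists in every group into one linked list
        * merge and sort every two new linked lists to create a new linked list
        * repeat the process until there is only one linked list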
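
The method above can be sketched as follows. This is an illustrative version rather than the contents of `23. Merge k Sorted Lists.py`: `ListNode`, `merge_two`, `merge_k` and the helpers `from_list` / `to_list` are hypothetical names introduced here.

```python
class ListNode:
    def __init__(self, val, next=None):
        self.val = val
        self.next = next


def merge_two(a, b):
    """Merge two sorted linked lists and return the head of the result."""
    dummy = tail = ListNode(None)  # dummy head avoids special-casing the first node
    while a and b:
        if a.val <= b.val:
            tail.next, a = a, a.next
        else:
            tail.next, b = b, b.next
        tail = tail.next
    tail.next = a or b  # attach whatever is left over
    return dummy.next


def merge_k(lists):
    """Merge the lists in pairs, halving the number of lists on every pass."""
    lists = [head for head in lists if head is not None]
    if not lists:
        return None
    while len(lists) > 1:
        merged = []
        for i in range(0, len(lists), 2):  # groups of at most two
            second = lists[i + 1] if i + 1 < len(lists) else None
            merged.append(merge_two(lists[i], second))
        lists = merged
    return lists[0]


def from_list(values):
    """Helper for the demo: build a linked list from a Python list."""
    head = None
    for v in reversed(values):
        head = ListNode(v, head)
    return head


def to_list(head):
    """Helper for the demo: read a linked list back into a Python list."""
    out = []
    while head:
        out.append(head.val)
        head = head.next
    return out


result = to_list(merge_k([from_list([1, 4, 5]), from_list([1, 3, 4]), from_list([2, 6])]))
```

The output is the head of a newly linked chain, not the input container, which matches the "Differences" noted above.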
        


## Quick sort
### Definition
* A comparison-based sorting algorithm (like merge sort)
    * Partitions the dataset by selecting a pivot element and putting all smaller elements to the left of it and larger ones to the right.
    * It repeats this process on the left side until it is comparing only two elements, at which point the left side is sorted.
    * When the left side is finished sorting, it performs the same operation on the right side.
* Computer architecture favors the quick-sort process.
### Key points
* While it has the same Big O as (or, in some cases, a worse one than) many other sorting algorithms, it is often faster in practice than many of them, such as merge sort.
* Know that it repeatedly partitions the data set around a pivot, roughly halving it on average, until all the information is sorted.
### Time complexity
* Best case: O(NlogN)
* Average case: O(NlogN)
* Worst case: O(N^2)
In [16]:
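'''
https://blog.csdn.net/morewindows/article/details/6684558
Quick sort template
Pitfalls:
'''
# Discussion of what `if i < j` and `i += 1` do:
# When the scan stopped only because x < arr[j] failed:
#     the `if` is always true, so it works with or without the `if`
#     the next step looks for a value larger than x; with `i += 1` we skip arr[i] directly, since it now equals arr[j] and is known not to be larger than x
#     without `i += 1`, the following scan performs that `i += 1` itself and then carries on
#     so it works with or without `i += 1`
# When the scan stopped only because i < j failed, i.e. i == j and the two indices coincide:
#     with `if`, with `i += 1`
#         neither arr[i] = arr[j] nor i += 1 runs; the loop ends and x is assigned to arr[i]
#     with `if`, without `i += 1`
#         arr[i] = arr[j] does not run; the loop ends and x is assigned to arr[i]
#     without `if`, with `i += 1`
#         arr[j] is assigned to arr[i]; since i == j, arr[i] is unchanged, but i becomes i + 1
#         the code below then assigns again, the loop ends, and x is assigned to arr[i+1]
#         which is wrong
#     without `if`, without `i += 1`
#         arr[j] is assigned to arr[i]; since i == j, arr[i] is unchanged; the loop ends and x is assigned to arr[i]
#
#
# So the correct combinations are:
# with `if`, with `i += 1`
# with `if`, without `i += 1`
# without `if`, without `i += 1`

# Code for the "dig a hole, fill it" partition scheme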
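class sort:
    def AdjustArr(self, arr, l, r):  # partition arr[l..r] in place and return the pivot position
        i = l
        j = r
        x = arr[i]  # the pivot; taking it out leaves the first hole at index i
        while i < j:
            while i < j and x < arr[j]:  # scan from the right for a value not larger than x
                j -= 1
            if i < j:
                arr[i] = arr[j]  # fill the hole at i; the new hole is at j
                i += 1
            while i < j and arr[i] < x:  # scan from the left for a value not smaller than x
                i += 1
            if i < j:
                arr[j] = arr[i]  # fill the hole at j; the new hole is at i
                j -= 1
        arr[i] = x  # drop the pivot into the final hole
        return i

    def sortArray(self, nums):  # runs a single partition pass, not a full sort
        if len(nums) == 0 or len(nums) == 1:
            return nums

        l = 0
        r = len(nums) - 1
        return self.AdjustArr(nums, l, r)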


[48, 6, 57, 42, 60, 72, 83, 73, 88, 85]


In [None]:
class Sort:
    def quicksort(self, nums):
        if len(nums) == 0 or len(nums) == 1:
            return nums

        l = 0
        r = len(nums) - 1

        i = l
        j = r
        pivot = nums[i]
        while i < j:
            while i < j and pivot < nums[j]:
                j -= 1
            nums[i] = nums[j]
            while i < j and nums[i] <= pivot:
                i += 1
            nums[j] = nums[i]
        nums[i] = pivot
        L = nums[l:i]
        R = nums[i + 1: r + 1]
        nums[l:i] = self.quicksort(L)  # L is a copy, so the sorted result must be assigned back or it has no effect
        nums[i + 1: r + 1] = self.quicksort(R)
        return nums


x = Sort()
print(x.quicksort([5,1,1,2,0,0]))


## Bubble sort
### Definition
### Key points
### Time complexity




# Basic Types of Algorithm

## Recursive algorithms

## Iterative algorithms

## Greedy algorithms
### Definition
* An algorithm that, while executing, selects only the information that meets a certain criterion.
* The general five components, from Wiki:
    * A candidate set, from which a solution is created.
    * A selection function, which chooses the best candidate to be added to the solution.
    * A feasibility function, which is used to determine whether a candidate can be used to contribute to a solution.
    * An objective function, which assigns a value to a solution, or a partial solution.
    * A solution function, which indicates when we have discovered a complete solution.
* Key
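
As one way to see the five components together, here is a minimal sketch using the classic activity-selection problem (choose as many non-overlapping intervals as possible). The problem choice, the function name `select_activities` and the sample data are assumptions made for illustration; the comments mark which part of the code plays which role.

```python
def select_activities(activities):
    """Pick a maximum-size set of non-overlapping (start, finish) intervals."""
    chosen = []
    last_finish = float("-inf")
    # Candidate set: all activities.
    # Selection function: always look at the activity that finishes earliest.
    for start, finish in sorted(activities, key=lambda a: a[1]):
        # Feasibility function: it must not overlap the last chosen activity.
        if start >= last_finish:
            chosen.append((start, finish))
            last_finish = finish
    # Solution function: we are done once every candidate has been examined.
    # Objective function: len(chosen), the number of activities scheduled.
    return chosen


activities = [(1, 4), (3, 5), (0, 6), (5, 7), (3, 9), (5, 9),
              (6, 10), (8, 11), (8, 12), (2, 14), (12, 16)]
schedule = select_activities(activities)
```

Each decision is made once and never revisited, which is what makes the approach greedy.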

## Dynamic programming
