# Python Data Structure Basics


## List 
### Definition 
* Stores data elements based on a sequential, most commonly 0-based, index.
* Based on tuples from set theory (tuple + mutability -> a mutable tuple).
* They are one of the oldest, most commonly used data structures. 

### Key:
* Advantages: indexing  
* Disadvantages: inserting (except append) and deleting (except pop).
* Linear data structures are the most basic; the fundamental one is the linear array.
    * Stack
    * Queue
* Non-linear data structures.
    * Graphs, e.g. [[1,2], [3, 4]]
    * Trees (in-order traversal), e.g. [1,[2,4,5],3]

### Time complexity (linear structure) (add, delete, update, search, sort)
* Append: O(1) (amortized)
* Insertion: O(N) (l.insert(i, x) or l[a:b] = ...)
* Pop: O(1) (from the end)
* Delete: O(N - i) for index i; O(N) in the worst case
* Indexing: O(1)
* Search: O(N), traverses the list
* Sort: O(NlogN)
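
The costs listed above can be seen on a small list. This is a minimal sketch; the values are made up for illustration, and the comments show the list after each step.

```python
nums = [3, 1, 2]

nums.append(5)       # amortized O(1): add at the end      -> [3, 1, 2, 5]
nums.insert(0, 9)    # O(N): every element shifts right    -> [9, 3, 1, 2, 5]
last = nums.pop()    # O(1): remove from the end           -> [9, 3, 1, 2]
del nums[0]          # O(N): later elements shift left     -> [3, 1, 2]
second = nums[1]     # O(1): direct indexing
found = 2 in nums    # O(N): linear scan
nums.sort()          # O(N log N): in-place sort           -> [1, 2, 3]
```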



## Linked list
### Definition:
* Stores data in nodes that point to other nodes.
    * Node: at its most basic, a node holds one datum and one reference (to another node).
    * Linked list: a linked list chains nodes together by pointing one node's reference to another node.

### Key: 
* Advantages: insertion and deletion.
* Disadvantages: indexing and searching.
* A doubly linked list has nodes that also reference the previous node, so each node holds two references: one to the previous node and one to the next.
* A circularly linked list is a simple linked list whose tail (the last node) references the head (the first node).
* Can be used to implement stacks and queues.

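### Time Complexity (add, delete, update, search, sort)
* Add node: O(1) (at the head, or next to a node you already hold a reference to)
* Deletion: O(1) (under the same condition; finding the node first is O(N))
* Indexing: O(N)
* Search: O(N), traverses the linked list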
* Sort: O(NlogN)
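
A minimal singly linked list sketch shows why adding and removing at the head are O(1) while indexing is O(N). The names `Node`, `LinkedList`, `add_first`, `remove_first`, `get`, and `to_list` are chosen for this example, and `get` assumes a non-negative index.

```python
class Node:
    """One datum plus one reference to the next node."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


class LinkedList:
    def __init__(self):
        self.head = None

    def add_first(self, value):
        """O(1): the new node becomes the head."""
        self.head = Node(value, self.head)

    def remove_first(self):
        """O(1): unlink the head and return its value."""
        if self.head is None:
            raise IndexError("remove from empty linked list")
        node = self.head
        self.head = node.next
        return node.value

    def get(self, index):
        """O(N): walk from the head one node at a time."""
        node = self.head
        for _ in range(index):
            if node is None:
                break
            node = node.next
        if node is None:
            raise IndexError("index out of range")
        return node.value

    def to_list(self):
        """O(N): collect the values in order, for inspection."""
        values = []
        node = self.head
        while node is not None:
            values.append(node.value)
            node = node.next
        return values


ll = LinkedList()
for v in (3, 2, 1):
    ll.add_first(v)
```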

## Stack and Queue
* Stack
    * Last-In-First-Out (LIFO) concept
    * append() / pop()
    * can be made from a **list**
    * can be made from a **linked list** by having the head be the only place for insertion and removal. Insert: create a new node and link it in front of the head; remove: unlink the head and make the next node the new head.
        * See <链表实现stack和Queue> (implementing Stack and Queue with a linked list)
* Queue
    * First-In-First-Out (FIFO) principle
    * Lists are not efficient for implementing a queue, because removing from the front is O(N)
        * So use the container deque (double-ended queue, pronounced "deck")
        * from collections import deque; d = deque(iterable). If iterable is not specified, the new deque is empty.
        * append(x) / appendleft(x) / pop() / popleft() are O(1) at either end; extend(iterable) / extendleft(iterable) are O(k) for k added items
    * can be made from a **linked list** that only removes from the head and adds to the tail
        * See <链表实现stack和Queue> (implementing Stack and Queue with a linked list)
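
As a short sketch of the two structures above, a `deque` can serve as a FIFO queue and a plain list as a LIFO stack. The stored values are arbitrary.

```python
from collections import deque

# Queue (FIFO): add at the tail, remove from the head, both O(1).
queue = deque()
queue.append("a")
queue.append("b")
queue.append("c")
first = queue.popleft()   # the element that entered first leaves first

# Stack (LIFO): append() and pop() on a list both work at the end.
stack = []
stack.append(1)
stack.append(2)
top = stack.pop()         # the element that entered last leaves first
```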
        

## Hash Table/Map (dictionary)
### Definition
* Structure: stores data as **key-value pairs**.
* Function: a hash function accepts a key and returns an output that is ideally unique to that specific key.
    * This is known as hashing: ideally the input and the output have **a one-to-one correspondence**, so the output can be used to map to the stored information.
    * The hash output determines **the address** in memory where that data is stored.

### Key 
* Advantages: insertion, deletion, and searching
* Hash collisions are when a hash function returns the **same output** for **two distinct inputs**.
    * Every hash function has this problem.
    * This is often mitigated by **making the hash table very large**.
* Hashes are important for **dictionary** and **database indexing**.

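### Time complexity (add, delete, update, search, sort)
* Store: O(1) on average, d[k] = v
* Pop: O(1) on average, **d.pop(k)** removes the pair for key k and returns its value
* Pop item: O(1), **d.popitem()** removes and returns the last inserted pair
* Delete: O(1) on average, del d[k]
* Search: O(1) on average, d[k]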
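
The sketch below shows a collision with a deliberately weak hash function, followed by the dictionary operations listed above. `toy_hash` is a hypothetical function written for this example; it is not how Python's `dict` hashes keys. The `popitem()` comment assumes Python 3.7 or later, where dictionaries keep insertion order.

```python
def toy_hash(key, table_size):
    """Hypothetical hash: sum of character codes modulo the table size."""
    return sum(ord(ch) for ch in key) % table_size

# "ab" and "ba" contain the same characters, so this hash cannot tell them apart.
collide = toy_hash("ab", 8) == toy_hash("ba", 8)

d = {}
d["apple"] = 3             # store
d["pear"] = 5
count = d["apple"]         # search
removed = d.pop("apple")   # remove the pair for "apple" and return its value
d["fig"] = 7
last_pair = d.popitem()    # remove and return the last inserted pair
del d["pear"]              # delete
```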


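# Efficient Sorting Basics
* Basic application: sorting problems
    * Core step: comparing (every element)
* Extended application: merging problems (every element is visited during the comparison, so merging can be done in the same pass)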


## Merge sort
### Definition
A comparison-based sorting algorithm. The process is as follows:  
  * Divide the entire dataset into groups of at most two.
  * Compare the numbers in each group one at a time, moving the smaller number to the left of the pair.
  * Once all pairs are sorted, compare the leftmost elements of the two leftmost pairs to create a sorted group of four, with the smallest numbers on the left and the largest on the right.
  * Repeat this process until there is only one set.
  
### Key
* This is one of the most basic sorting algorithms.
* Know that it divides all the data into the smallest possible sets and then compares them.

### Complexity
* Time
    * Relation: T(n) = 2T(n/2) + O(n)
    * Best Case Sort: O(NlogN)
    * Average Case Sort: O(NlogN)
    * Worst Case Sort: O(NlogN)
    * Proof of the NlogN bound
        * Recurrence tree method:
            T(n) = 2T(n/2) + O(n) = 2^2 T(n/2^2) + 2 O(n/2) + O(n) = 2^2 T(n/2^2) + 2 O(n) = ... = 2^m T(n/2^m) + m O(n).
            The recursion stops when n/2^m = 1, i.e. m = log2(n), so T(n) = n T(1) + log2(n) * O(n) = O(n) + O(n log2(n)) = O(n log n).
        * Master method:
            T(n) = 2T(n/2) + Θ(n) has a = 2, b = 2 and f(n) = Θ(n) = Θ(n^(log_b a)), so T(n) = O(n log n).
         
* Space
    * O(N)
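
As a quick numeric check of the recurrence, the sketch below evaluates T(n) = 2T(n/2) + n directly and compares it with the closed form n*log2(n) + n. It assumes the O(n) merge term is exactly n, that T(1) = 1, and that n is a power of two; `merge_cost` is a name chosen for this example.

```python
def merge_cost(n):
    """Evaluate T(n) = 2*T(n/2) + n with T(1) = 1, for n a power of two."""
    if n == 1:
        return 1
    return 2 * merge_cost(n // 2) + n


def closed_form(n):
    """n * log2(n) + n, using bit_length to get log2 of a power of two exactly."""
    log2_n = n.bit_length() - 1
    return n * log2_n + n


checks = [(n, merge_cost(n), closed_form(n)) for n in (2, 8, 1024)]
```

Under these assumptions the two columns agree for every tested n, which matches the n T(1) + log2(n) * n form of the unrolled recurrence.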

In [None]:
# Merge sort template / basic code
# Edge case: the list is empty or has only one element
class SortList:
    # Merge sort: sorts arr in place and returns it
    def mergeSort(self, arr):
        if len(arr) <= 1:
            return arr  # nothing to sort; return arr instead of None

        # Plain split
        mid = len(arr) // 2  # find the middle of the array
        L = arr[:mid]  # copy the array elements
        R = arr[mid:]  # into 2 halves

        # Recursive split
        self.mergeSort(L)  # sort the first half
        self.mergeSort(R)  # sort the second half

        # Merge
        i = j = k = 0
        while i < len(L) and j < len(R):  # write the smaller front element of L or R back into arr
            if L[i] <= R[j]:  # <= keeps equal elements in their original order (stable)
                arr[k] = L[i]
                i += 1
            else:
                arr[k] = R[j]
                j += 1
            k += 1

        while i < len(L):  # copy what is left in L
            arr[k] = L[i]
            k += 1
            i += 1

        while j < len(R):  # copy what is left in R
            arr[k] = R[j]
            k += 1
            j += 1

        return arr

x = SortList()
x.mergeSort([2,1,0])

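### Basic structure
* Input: a container holding several elements
* Output: the original container, with its elements sorted
* Requirement: sort the elements by merging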

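### Idea
Merge Sort method

### Approach
* Split the container's elements into small units, then compare, order, and merge them step by step
* Modify the container in place

### Core of the approach
* Merge pairwise, from the bottom level up to the top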

### Basic Method
See the figure Merge-Sort.png  
1. divide the list into groups of at most two
2. sort every group by moving the smallest element to the left
3. merge the two leftmost pairs, comparing their leftmost elements, to create a sorted group of four
4. repeat the process until there is only one group

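### Basic method (key points)
1. Split: a plain split divides arr into two parts, then recursive splitting divides the parts down to the smallest unit (splitting into pairs: 'divide')
2. Recursive compare: compare the elements of the two parts one by one (compare)
3. Recursive arrange: rewrite the elements of the two parts into arr in ascending order (rearrange), or arrange them in ascending order and merge them into a new part ('merge')
4. Repeat until only one group remains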

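### Applications
* Solution steps:  
    * Clarify the basic structure and requirement
        * Input
        * Output
        * Requirement
    * Clarify the problem type
    * Decide on the idea (the solving algorithm)
    * Think about what is different
    * Modify the Method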
    
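#### 1. Sorting a linked list
* Why use merge sort?
    * Sorting is required
    * Compared with quick sort, merge sort needs far less random access to linked list nodes.
    (A linked list is not stored in a contiguous block of memory, which makes node access costly, the opposite of a list)
* Time/space complexity
    * O(NlogN) / O(1) (O(1) space needs a bottom-up merge; a recursive version uses O(logN) stack space)
* Example: 148. Sort List (linked list).py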
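
A top-down merge sort on a linked list could look like the sketch below. It is not the content of the referenced 148. Sort List file; `ListNode`, `merge`, `sort_list`, `from_values`, and `to_values` are names assumed for this example. The recursion uses O(log N) stack space, so this version does not reach O(1) extra space.

```python
class ListNode:
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next


def merge(a, b):
    """Merge two sorted linked lists by relinking nodes; return the new head."""
    dummy = tail = ListNode()
    while a and b:
        if a.val <= b.val:
            tail.next, a = a, a.next
        else:
            tail.next, b = b, b.next
        tail = tail.next
    tail.next = a or b  # attach whatever remains
    return dummy.next


def sort_list(head):
    """Sort a linked list with merge sort and return the new head."""
    if head is None or head.next is None:
        return head
    # Find the middle with slow/fast pointers, then cut the list in two.
    slow, fast = head, head.next
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
    mid = slow.next
    slow.next = None
    return merge(sort_list(head), sort_list(mid))


def from_values(values):
    """Helper: build a linked list from a Python list."""
    head = None
    for v in reversed(values):
        head = ListNode(v, head)
    return head


def to_values(head):
    """Helper: read a linked list back into a Python list."""
    out = []
    while head:
        out.append(head.val)
        head = head.next
    return out


sorted_values = to_values(sort_list(from_values([4, 2, 1, 3])))
```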
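#### 2. Counting inversions (Inversion Count Problem)
#### 3. External Sorting
#### 4. Merging linked lists
* Why use merge sort?
    * Merging is required, which is an extension of sorting
* 23\. Merge k Sorted Lists.py - MergeSort method
    * Basic structure and requirement:
        * Input: a container whose elements are the heads of several linked lists
        * Requirement: merge and sort all the linked lists
        * Output: the head of the merged linked list
    * Problem type: large merge + small sort
    * idea: Merge Sort
    * Differences:
        * Merge: merge in linked list form, producing a new partial linked list
        * Output: the head of the newly built linked list, not the original container
    * Method:
        * divide the lists into groups of at most two
        * merge and sort the linked lists in every group into one linked list
        * merge and sort every two new linked lists to create a new linked list
        * repeat the process until there is only one linked list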
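
The method above can be sketched as a pairwise, bottom-up merge. This is an illustration under assumed names (`ListNode`, `merge_two`, `merge_k_lists`, and the two helpers), not the contents of the referenced solution file.

```python
class ListNode:
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next


def merge_two(a, b):
    """Merge two sorted linked lists; either argument may be None."""
    dummy = tail = ListNode()
    while a and b:
        if a.val <= b.val:
            tail.next, a = a, a.next
        else:
            tail.next, b = b, b.next
        tail = tail.next
    tail.next = a or b
    return dummy.next


def merge_k_lists(lists):
    """Merge the lists two at a time until a single list remains."""
    if not lists:
        return None
    while len(lists) > 1:
        merged = []
        for i in range(0, len(lists), 2):
            a = lists[i]
            b = lists[i + 1] if i + 1 < len(lists) else None  # odd one out
            merged.append(merge_two(a, b))
        lists = merged
    return lists[0]


def from_values(values):
    """Helper: build a linked list from a Python list."""
    head = None
    for v in reversed(values):
        head = ListNode(v, head)
    return head


def to_values(head):
    """Helper: read a linked list back into a Python list."""
    out = []
    while head:
        out.append(head.val)
        head = head.next
    return out


heads = [from_values(v) for v in ([1, 4, 5], [1, 3, 4], [2, 6])]
result = to_values(merge_k_lists(heads))
```

Each round halves the number of lists, so there are about log k rounds, and every round touches each node once.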

## Quick sort
### Definition
* A comparison-based sorting algorithm (like merge sort)
    * Partitions the dataset around a pivot, here the middle element, putting all smaller elements to the left of it and larger ones to the right.
    * It repeats this process on the left side until it is comparing only two elements, at which point the left side is sorted.
    * When the left side is finished sorting, it performs the same operation on the right side.  
* Computer architecture favors the quick-sort process: in-place partitioning scans memory sequentially, which is cache-friendly.
### Key
* While it has the same Big O as (or worse in some cases than) many other sorting algorithms, it is often faster in practice than many of them, such as merge sort.
* Know that it repeatedly partitions the data set around a pivot until all the information is sorted.
### Time complexity
* Best case: O(NlogN)
* Average case: O(NlogN)
* Worst case: O(N^2)
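
A compact sketch of the idea, using the middle element as the pivot as described in the definition. For readability this version builds new lists instead of partitioning in place, so it uses O(N) extra memory, unlike the in-place variants usually used in practice; `quick_sort` is a name chosen for this example.

```python
def quick_sort(arr):
    """Return a sorted copy of arr using the middle element as the pivot."""
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    smaller = [x for x in arr if x < pivot]   # goes to the left of the pivot
    equal = [x for x in arr if x == pivot]    # the pivot and its duplicates
    larger = [x for x in arr if x > pivot]    # goes to the right of the pivot
    return quick_sort(smaller) + equal + quick_sort(larger)


sorted_nums = quick_sort([3, 6, 1, 8, 2, 9, 2])
```

The worst case arises when the pivot is repeatedly the smallest or largest remaining element, so one side of each partition is empty.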


## Bubble sort
### Definition
### Key
### Time complexity


