#### [Notebook] Deep Learning Face Representation from Predicting 10,000 Classes #

paper authors: Yi Sun, Xiaogang Wang, Xiaoou Tang

> This paper proposes to learn a set of high-level feature representations through deep learning, referred to as Deep hidden IDentity features (DeepID), for face verification. (*[section] Abstract*)

## Overview ##

### Comparison of current face verification algorithms ###

[\[appendix\]](#A.1.1-current-face-verification-algorithms)

> The current best-performing face verification algorithms typically represent faces with over-complete low-level features, followed by shallow models [9, 29, 6] (*[section] 1.Introduction*)

> we propose to learn high-level face identity features with deep models through face identification, i.e. classifying a training image into one of n identities (n ≈ 10,000 in this work). (*[section] 1.Introduction*)

### DeepID - Highly compact, 160-dims features ###

[\[detail\]](#D.1.1-DeepID)

> Highly compact 160-dimensional DeepID is acquired at the end of the cascade that contain rich identity information and directly predict a much larger number (e.g., 10, 000) of identity classes. (*[section] 1.Introduction*)

> The proposed features are extracted from various face regions to form complementary and over-complete representations. (*[section] Abstract*)

> Any state-of-the-art classifiers can be learned based on these high-level representations for face verification. (*[section] Abstract*)

### Datasets  ###

[\[detail\]](#D.2.1-Datasets)


|Dataset|People|Image|Size|
| ----|----|----- |----- |
|CelebFaces|5436|87628|250x250|
|CelebFaces+|10177|202599|250x250|
|LFW|5749|13233|250x250|


### Data Processing and Face Patches ###

[\[detail\]](#D.3.1-Data-Processing-and-Face-Patches)

|Patch|Scale|Region(global, local)|Color channel|
| ----|----|----- |----- |
|60|3|(5+5)|RGB, Gray|
|100|5|(5+5)|RGB, Gray|


### Entire Face verification process -  ConvNet + Joint Bayesian ###

[\[detail: Face verification process\]](#D.4.1-Face-verification-process) [\[detail: ConvNet\]](#D.4.2-ConvNet) [\[detail: Joint Bayesian\]](#D.4.3-Joint-Bayesian) [\[detail: *"neural network"*\]](#D.4.4-%22neural-network%22)

Two components of the process:

- Feature extractor: ConvNet (DeepID model)
- Classifier: Joint Bayesian or *"neural network"*

### Experiments ###

- The classification ability of Multi-scale ConvNets
  - [\[paper\]](#D.5.1.1-The-classification-ability-of-Multi-scale-ConvNets) | [\[model_62x62_3\]](./model_62x62_3/experiment_classification_ability_of_multi-scale_convNets/experiment_classification_ability_of_multi-scale_convNets.ipynb)
- The effectiveness of the learned hidden representations for face verification
  - [\[paper\]](#D.5.1.2-The-effectiveness-of-the-learned-hidden-representations-for-face-verification) | [\[model_62x62_3\]](./model_62x62_3/experiment_effectiveness_of_the_learned_hidden_representations_for_face_verification/experiment_effectiveness_of_the_learned_hidden_representations_for_face_verification.ipynb)
- The learned features extract identity information
  - [\[paper\]](#D.5.1.3-The-learned-features-extract-identity-information) | [\[model_62x62_3\]](./model_62x62_3/experiment_learned_features_extract_identity_information/experiment_learned_features_extract_identity_information.ipynb)
- Various face patches combination contributes to the performance
  - [\[paper\]](#D.5.1.4--Various-face-patches-combination-contributes-to-the-performance)
- Method comparison
  - [\[paper\]](#D.5.1.5-Method-comparison) | [\[model_62x62_3\]](./model_62x62_3/experiment_method_comparison/experiment_method_comparison.ipynb)

### Conclusion ###

> Our method achieves 97.45% face verification accuracy on LFW using only weakly aligned faces (*[section] 1.Introduction*)

> Although the prediction task at the training stage becomes more challenging, the discrimination and generalization ability of the learned features increases. (*[section] 1.Introduction*)

> Allowing the DeepID to pool over multi-scale features reduces validation errors by an average of 4.72%. (*[section] 4.1 Multi-scale ConvNets*)

> We also observe that as the number of training identities increases, the verification performance steadily gets improved. (*[section] 1.Introduction*)

> We find that faces of the same identity tend to have more commonly activated neurons (positive features being in the same position) than those of different identities. (*[section] 4.2 Learning effective features*)

> As shown in Figure 9, adding more features from various regions, scales, and color channels consistently improves the performance. (*[section] 4.3 Over-complete representation*)

### Detail ###

#### D.1.1 DeepID ####

[\[back to top\]](#DeepID---Highly-compact,-160-dims-features)

**DeepID concatenation**

![figure1](images/figure1.png)

> We further concatenate the DeepID extracted from various face regions to form complementary and over-complete representations. (*[section] 1.Introduction*)

> Each ConvNet takes a face patch as input and extracts local low-level features in the bottom layers. (*[section] 1.Introduction*)

> Feature numbers continue to reduce along the feature extraction cascade while gradually more global and high-level features are formed in the top layers. (*[section] 1.Introduction*)

> The learned features can be well generalized to new identities in test, which are not seen in training, and can be readily integrated with any state-of-the-art face classifiers (e.g., Joint Bayesian [8]) for face verification. (*[section] 1.Introduction*)

Note:

- the high-level over-complete features are formed by concatenating the DeepIDs of 60 (or 100) ConvNets

#### D.2.1 Datasets ####

[\[back to top\]](#Datasets)

**Dataset**

LFW: 5749 people, 13233 images

CelebFaces: 5436 people, 87628 images

CelebFaces+: 10177 people, 202599 images

**Dataset division**

*Choice 1:*

- Learning DeepID
  - training set: 80% of the CelebFaces people (4349 people, randomly selected)
  - validation set: 10% of the images of each training person (randomly selected)
- Learning Joint Bayesian
  - training set: the remaining 20% of the CelebFaces people (5436 - 4349 = 1087 people)
  - testing set: all LFW pairs (6000 pairs)

*Choice 2:*

- Learning DeepID
  - training set: 8700 of the 10177 people in CelebFaces+ (about 85%, randomly selected)
  - validation set: 10% of the images of each training person (randomly selected)
- Learning Joint Bayesian
  - training set: the remaining 1477 people in CelebFaces+
  - testing set: all LFW pairs (6000 pairs)

> We randomly choose 80% (4349) people from CelebFaces to learn the DeepID, and use the remaining 20% people to learn the face verification model (Joint Bayesian or neural networks). (*[section] 4. Experiments*)

> We randomly select 10% images of each training person to generate the validation data. (*[section] 4. Experiments*)
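
The identity-level split quoted above can be sketched as follows. This is only a minimal illustration on a toy dataset, not the authors' code: the helper `split_identities`, the `images_by_person` dictionary, and the fixed seed are all assumptions made for the example.

```python
import random


def split_identities(images_by_person, train_frac=0.8, val_frac=0.1, seed=0):
    """Split people into a DeepID set and a verification set (hypothetical helper).

    images_by_person: dict mapping a person id to a list of image names.
    Returns (deepid_train, deepid_val, verif_people).
    """
    rng = random.Random(seed)
    people = sorted(images_by_person)
    rng.shuffle(people)

    # Split by identity, so the two groups of people never overlap.
    n_train = round(train_frac * len(people))
    deepid_people, verif_people = people[:n_train], people[n_train:]

    deepid_train, deepid_val = {}, {}
    for person in deepid_people:
        images = list(images_by_person[person])
        rng.shuffle(images)
        # Hold out val_frac of each training person's images for validation.
        n_val = round(val_frac * len(images))
        deepid_val[person] = images[:n_val]
        deepid_train[person] = images[n_val:]
    return deepid_train, deepid_val, verif_people


# Toy data: 10 people with 20 images each.
toy = {f"person_{i}": [f"img_{i}_{j}.jpg" for j in range(20)] for i in range(10)}
train, val, verif = split_identities(toy)
print(len(train), len(verif))  # 8 2
```

Note that the validation images come from the same people as the training images (the ConvNet is a classifier over those identities), while the verification model is trained on different people.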


#### D.3.1 Data Processing and Face Patches ####

[\[back to top\]](#Data-Processing-and-Face-Patches)

**Pre-processing of face images:**

1. Face Detection
2. Face Alignment
  - Facial landmark detection method: [Deep convolutional network cascade for facial point detection](#A.2.1-Facial-Landmark-Detection-Method)
  - 5 facial points (two eye centers, nose tip, two mouth corners) used for alignment
3. Face Cropping
  - 39 x 31 x k (k = 3 for color image, k = 1 for gray image)
  - 31 x 31 x k

> We detect five facial landmarks, including the two eye centers, the nose tip, and the two mouth corners, with the facial point detection method proposed by Sun et al. [30]. (*[section] 3.2 Feature extraction*)

**60 patches of face images**

- 3 scales, 5 global regions + 5 local regions, RGB or Gray
  - 60 patches = 3 x (5+5) x 2

**100 patches of face images**

- 5 scales, 5 global regions + 5 local regions, RGB or Gray
  - 100 patches = 5 x (5+5) x 2
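
The patch counts above are simply the number of (scale, region, channel) combinations. A small sketch that enumerates them; the region and channel labels are made-up names for this example, not identifiers from the paper:

```python
from itertools import product

# 5 global regions + 5 local regions (labels are illustrative only).
REGIONS = [f"global_{i}" for i in range(5)] + [f"local_{i}" for i in range(5)]
CHANNELS = ["rgb", "gray"]


def enumerate_patches(num_scales):
    """List every (scale, region, channel) combination."""
    return list(product(range(num_scales), REGIONS, CHANNELS))


print(len(enumerate_patches(3)))  # 60
print(len(enumerate_patches(5)))  # 100
```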

![figure3](images/figure3.png)

> Figure 3. Top: ten face regions of medium scales. The five regions in the top left are global regions taken from the weakly aligned faces, the other five in the top right are local regions centered around the five facial landmarks (two eye centers, nose tip, and two mouse corners). Bottom: three scales of two particular patches. (*[section] Figure 3. description*)

> 60 face patches with ten regions, three scales, and RGB or gray channels. (*[section] 3.2 Feature extraction*)

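> The local patches are each centered on a facial landmark (one landmark per patch). Patches of different region sizes and scales are fed into the CNNs, whose structures may differ, but the final feature is always 160-dimensional, and all the features are concatenated at the end. [\[2\]](#Reference)

> "The five regions in the top left are global regions taken from the weakly aligned faces": the five face patches in the top left are probably just simple crops based on the facial landmarks. [\[2\]](#Reference)

> "three scales of two particular patches.": "scale" here probably refers to how much of the face the patch covers (its "field of view").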

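```
NOTE:

Personal guess about the 5 global regions:

left image: 39x31

four right images: 31x31

r-t-1: centered between the top border and the eye height
r-t-2: centered between the top border and the nose
r-d-1: centered between the top border and the mouth
r-d-2: centered between the top border and the bottom border

r-t and r-d have the same size
```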


#### D.4.1 Face verification process ####

[\[back to top\]](#Entire-Face-verification-process----ConvNet-+-Joint-Bayesian)

image from http://blog.csdn.net/stdcoutzyx/article/details/41596663

![face-verification](https://raw.githubusercontent.com/stdcoutzyx/Blogs/master/blogs/imgs/n2-process.png)

- step 1: feature extraction (ConvNet, also called the DeepID model)
- step 2: feature recognition (Joint Bayesian)

> we conduct feature extraction and recognition in two steps, with the first feature extraction step learned with the target of face identification, which is a much stronger supervision signal than verification. (*[section] 1.Introduction*)

**Feature extraction: ConvNet**

> We trained 60 ConvNets, each of which extracts two 160-dimensional DeepID vectors from a particular patch and its horizontally flipped counterpart. A special case is patches around the two eye centers and the two mouth corners, which are not flipped themselves, but the patches symmetric with them (for example, the flipped counterpart of the patch centered on the left eye is derived by flipping the patch centered on the right eye). (*[section] 3.2 Feature extraction*)

> The total length of DeepID is 19, 200 (160 × 2 × 60), which is ready for the final face verification. (*[section] 3.2 Feature extraction*)
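
The flipping rule in the quote above can be sketched like this. It is an illustration under assumptions: the patch names, the `SYMMETRIC` table, and the random stand-in patches are hypothetical, and the actual ConvNet feature extraction is left out.

```python
import numpy as np

# Landmark patches whose flipped counterpart comes from the mirrored landmark.
# The patch names are hypothetical labels for this example.
SYMMETRIC = {
    "left_eye": "right_eye",
    "right_eye": "left_eye",
    "left_mouth": "right_mouth",
    "right_mouth": "left_mouth",
}


def flipped_counterpart(patches, name):
    """Return the horizontally flipped counterpart of patch `name`.

    patches: dict mapping a patch name to an (H, W, k) array.
    """
    source = SYMMETRIC.get(name, name)
    # Reverse the width axis to mirror the image left-to-right.
    return patches[source][:, ::-1, :]


rng = np.random.default_rng(0)
patches = {n: rng.random((31, 31, 3)) for n in
           ["left_eye", "right_eye", "nose", "left_mouth", "right_mouth"]}

flipped = flipped_counterpart(patches, "left_eye")
print(np.array_equal(flipped, patches["right_eye"][:, ::-1, :]))  # True

# Two 160-dim vectors per ConvNet, 60 ConvNets in total.
print(160 * 2 * 60)  # 19200
```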

**Face verification: Joint Bayesian**

- method: [Bayesian face revisited: A joint formulation.](#A.1.1-current-face-verification-algorithms)
  - with EM algorithm

> We use the Joint Bayesian [8] technique for face verification based on the DeepID. (*[section] 3.3 Face verification*)

> In face verification, our feature dimension is reduced to 150 by PCA before learning the Joint Bayesian model. (*[section] 4. Experiments*)

> Sμ and Sε can be learned from data with EM algorithm. (*[section] 3.3 Face verification*)
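
To make the role of Sμ and Sε concrete, here is a minimal sketch of the verification score, assuming the usual Joint Bayesian model from [8]: a face feature is x = μ + ε with identity μ ~ N(0, Sμ) and intra-personal variation ε ~ N(0, Sε), and a pair is scored by a log-likelihood ratio. The covariances below are toy values chosen by hand, not learned with EM, and the score is computed directly from the two Gaussian densities rather than with the closed form used in [8].

```python
import numpy as np


def gaussian_logpdf_zero_mean(x, cov):
    """Log density of a zero-mean multivariate Gaussian at x."""
    d = x.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    quad = x @ np.linalg.solve(cov, x)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + quad)


def joint_bayesian_score(x1, x2, s_mu, s_eps):
    """Log-likelihood ratio log P(x1, x2 | same) - log P(x1, x2 | different)."""
    zeros = np.zeros_like(s_mu)
    total = s_mu + s_eps
    # Same person: both faces share the identity component mu.
    cov_same = np.block([[total, s_mu], [s_mu, total]])
    # Different people: the two identity components are independent.
    cov_diff = np.block([[total, zeros], [zeros, total]])
    x = np.concatenate([x1, x2])
    return gaussian_logpdf_zero_mean(x, cov_same) - gaussian_logpdf_zero_mean(x, cov_diff)


# Toy covariances (assumed values, not learned with EM).
s_mu = 2.0 * np.eye(3)
s_eps = 0.5 * np.eye(3)
a = np.array([1.0, 1.0, 1.0])
print(joint_bayesian_score(a, a, s_mu, s_eps) > joint_bayesian_score(a, -a, s_mu, s_eps))  # True
```

A pair is accepted as the same person when the score exceeds a threshold chosen on training data.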

Keynote:

1. First, train the DeepID model on 8700 people
2. Use the trained DeepID model to extract the DeepID of the other 1477 people
3. Train Joint Bayesian on the DeepID of those 1477 people
4. Extract the DeepID of the 6000 LFW pairs and test the trained Joint Bayesian on them
  - input: 6000 pairs on LFW


#### D.4.2 ConvNet ####

[\[back to top\]](#Entire-Face-verification-process----ConvNet-+-Joint-Bayesian)

![figure2](images/figure2_ad.png)

**ConvNet structure**

```
|C1| |C2|  |C3|  |C4| ->|F|  |        |
  \  /  \  /  \  /      | |->|soft-max|
  |M1|  |M2|  |M3| ---->|C|  |        |

Note:

1. input size: 39(height)x31(width)xk or 31x31xk, k is 1 or 3
2. the output of FC is the DeepID
3. Cx 後面接 ReLU (activation function)
4. C3: weights are locally shared in every 2×2 regions
5. C4: weights are totally unshared
6. back-propagation: stochastic gradient descent
```

> ConvNets contain four convolutional layers (with max-pooling) (*[section] 3.1 Deep ConvNets*)

> followed by the fully-connected DeepID layer and the softmax output layer (*[section] 3.1 Deep ConvNets*)

> The input is 39 × 31 × k for rectangle patches, and 31 × 31 × k for square patches, where k = 3 for color patches and k = 1 for gray patches. (*[section] 3.1 Deep ConvNets*)

|Layer|Number of kernels|Kernel size|Stride|
| ----|----|----- |----- |
|C1|20|4X4|1|
|M1|-|2X2|2|
|C2|40|3X3|1|
|M2|-|2X2|2|
|C3|60|3X3|1|
|M3|-|2X2|2|
|C4|80|2X2|1|
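
The layer table can be checked by tracing the feature-map sizes through the network. This sketch assumes convolutions and poolings without padding (an assumption of this example; the table itself does not state it), with Cx as convolution layers and Mx as max-pooling layers.

```python
def conv_out(size, kernel, stride=1):
    """Output length of a convolution or pooling without padding."""
    return (size - kernel) // stride + 1


# (name, number of output maps or None for pooling, kernel size, stride)
LAYERS = [
    ("C1", 20, 4, 1), ("M1", None, 2, 2),
    ("C2", 40, 3, 1), ("M2", None, 2, 2),
    ("C3", 60, 3, 1), ("M3", None, 2, 2),
    ("C4", 80, 2, 1),
]


def trace_shapes(height, width, channels):
    """Return [(layer name, (height, width, maps)), ...] for one input size."""
    shapes = []
    for name, maps, kernel, stride in LAYERS:
        height = conv_out(height, kernel, stride)
        width = conv_out(width, kernel, stride)
        # Pooling keeps the number of maps; convolution sets it.
        channels = maps if maps is not None else channels
        shapes.append((name, (height, width, channels)))
    return shapes


for name, shape in trace_shapes(39, 31, 1):
    print(name, shape)
```

Under these assumptions a 39x31 input ends with a 3x2x60 map after M3 and a 2x1x80 map after C4, and a 31x31 input ends with a 1x1x80 map after C4.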

**Different feature extraction process: learn with face identification, not face verification**

- two considerations below:
  - effective features
  - regularization, not over-fit

> We propose an effective way to learn high-level over-complete features with deep ConvNets. (*[section] 1.Introduction*)

> The ConvNets are learned to classify all the faces available for training by their identities, with the last hidden layer neuron activations as features (referred to as Deep hidden IDentity features or DeepID) (*[section] 1.Introduction*)

> Classifying all the identities simultaneously instead of training binary classifiers as in [21, 2, 3] is based on two considerations. (*[section] 1.Introduction*)

> This challenging task can make full use of the super learning capacity of neural networks to extract effective features for face recognition. (*[section] 1.Introduction*)

> it implicitly adds a strong regularization to ConvNets, which helps to form shared hidden representations that can classify all the identities well. (*[section] 1.Introduction*)

> Therefore, the learned high-level features have good generalization ability and do not over-fit to a small subset of training faces. (*[section] 1.Introduction*)

**Locally shared weights**

> Weights in higher convolutional layers of our ConvNets are locally shared to learn different mid- or high-level features in different regions. (*[section] 3.1 Deep ConvNets*)

> In the third convolutional layer, weights are locally shared in every 2 × 2 regions, while weights in the fourth convolutional layer are totally unshared. (*[section] 3.1 Deep ConvNets*)


#### D.4.3 Joint Bayesian ####

[\[back to top\]](#Entire-Face-verification-process----ConvNet-+-Joint-Bayesian)

process:

1. train the Joint Bayesian model with **"DeepID"**
  - input: 150 dims **"DeepID"**
2. test the Joint Bayesian model on LFW

Note:

- The face verification process of [Bayesian face revisited: A joint formulation.](#A.1.1-current-face-verification-algorithms) is: LBP + Joint Bayesian
  - this paper uses: DeepID + Joint Bayesian


#### D.4.4 "neural network" ####

[\[back to top\]](#Entire-Face-verification-process----ConvNet-+-Joint-Bayesian)

- *"neural network"*: is a custom neural network for face verification

![figure4_ad](images/figure4_ad.png)

> We also train a neural network for verification and compare it to Joint Bayesian to see if other models can also learn from the extracted features and how much the features and a good face verification model contribute to the performance, respectively. (*[section] 3.3 Face verification*)



#### D.5.1 Experiments ####

#### D.5.1.1 The classification ability of Multi-scale ConvNets ####

[\[back to top\]](#Experiments)

![figure5_ad](images/figure5_ad.png)

Motivation:

> We verify the effectiveness of directly connecting neurons in the third convolutional layer (after max-pooling) to the last hidden layer (the DeepID layer) (*[section] 4.1 Multi-scale ConvNets*)

Process:

```
conventional method: only connect C4 to the last hidden layer

multi-scale method: connect M3, C4 to the last hidden layer
```

- train 60 ConvNets, one on each of the 60 patches.
  - with conventional method
  - with multi-scale method
- training set: 80% CelebFaces (4349 people)
- validation set: 10% images of each training person
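
The difference between the two variants can be sketched as a single fully connected layer with ReLU that either sees both M3 and C4 (multi-scale) or only C4 (conventional). The feature-map shapes and the random weights below are assumptions for illustration; nothing here is trained.

```python
import numpy as np


def deepid_layer(m3, c4, w_m3, w_c4, bias):
    """ReLU of a linear map over the flattened M3 and C4 outputs."""
    return np.maximum(0.0, m3.ravel() @ w_m3 + c4.ravel() @ w_c4 + bias)


rng = np.random.default_rng(0)
m3 = rng.random((3, 2, 60))   # M3 output for a 39x31 input (assumed shape)
c4 = rng.random((2, 1, 80))   # C4 output for a 39x31 input (assumed shape)

w_m3 = rng.normal(scale=0.01, size=(m3.size, 160))
w_c4 = rng.normal(scale=0.01, size=(c4.size, 160))
bias = np.zeros(160)

multi_scale = deepid_layer(m3, c4, w_m3, w_c4, bias)
# Conventional variant: drop the M3 connection by zeroing its weights.
conventional = deepid_layer(m3, c4, np.zeros_like(w_m3), w_c4, bias)
print(multi_scale.shape, conventional.shape)  # (160,) (160,)
```

The skip connection lets the DeepID layer use the larger M3 map directly, instead of only the much smaller C4 output.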

Result:

> Figure 5 compares the top-1 validation set error rates of the 60 ConvNets learned to classify the 4349 classes of identities, either with or without the skipping layer. (*[section] 4.1 Multi-scale ConvNets*)

> Allowing the DeepID to pool over multi-scale features reduces validation errors by an average of 4.72%. (*[section] 4.1 Multi-scale ConvNets*)

> It actually also improves the final face verification accuracy from 95.35% to 96.05% when concatenating the DeepID from the 60 ConvNets and using Joint Bayesian for face verification. (*[section] 4.1 Multi-scale ConvNets*)

Note:

The face verification accuracy of concatenating from 60 ConvNets and using Joint Bayesian: (on LFW)

- conventional method: 95.35%
- multi-scale method: 96.05%

#### D.5.1.2 The effectiveness of the learned hidden representations for face verification ####

[\[back to top\]](#Experiments)

![figure6](images/figure6.png)

![figure7](images/figure7.png)

Motivation:

> Classifying a large number of identities simultaneously is key to learning discriminative and compact hidden features. (*[section] 4.2 Learning effective features*)

Process:

1. train ConvNets with 136, 272, 544, 1087, 2175, 4349 people (classes), respectively.
  - training set: 136, 272, 544, 1087, 2175, 4349 people from CelebFaces.
  - validation set: 10% images of each training person
  - The input is a single patch covering the whole face in this experiment. (*[section] 4.2 Learning effective features*)
  - This yields 6 learned ConvNets.
2. train a Joint Bayesian or *"neural network"* model on the features extracted by each of the 6 learned ConvNets.
  - training set: the remaining 20% of CelebFaces (5436 - 4349 = 1087 people)
  - This yields 6 learned verification models.
3. test the verification models on LFW dataset.
  - testing set: all LFW pairs (6000 pairs)

Result:

> More identity classes help to learn better hidden representations that can distinguish more people (discriminative) without increasing the feature length (compact). (*[section] 4.2 Learning effective features*)

> we increase the identity classes for training exponentially (and output neuron numbers correspondingly) from 136 to 4349 while fixing the neuron numbers in all previous layers (the DeepID is kept to be 160 dimensional). (*[section] 4.2 Learning effective features*)

> both Joint Bayesian and *"neural network"* improve linearly in verification accuracy when the identity classes double. (*[section] 4.2 Learning effective features*)

> When identity classes increase 32 times from 136 to 4349, the accuracy increases by 10.13% and 8.42% for Joint Bayesian and neural networks, respectively, or 2.03% and 1.68% on average, respectively, whenever the identity classes double. (*[section] 4.2 Learning effective features*)
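
The per-doubling averages in the quote follow from simple arithmetic: going from 136 to 4349 classes is about five doublings, so the total gain is divided by 5. A quick check:

```python
import math

# 4349 / 136 is about 32 = 2**5, i.e. five doublings.
doublings = math.log2(4349 / 136)
print(round(doublings, 2))  # 5.0

for name, total_gain in [("Joint Bayesian", 10.13), ("neural network", 8.42)]:
    per_doubling = total_gain / round(doublings)
    print(name, round(per_doubling, 2))
# Joint Bayesian 2.03
# neural network 1.68
```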


#### D.5.1.3 The learned features extract identity information ####

[\[back to top\]](#Experiments)

![figure8](images/figure8.png)

Motivation:

> We find that faces of the same identity tend to have more commonly activated neurons (positive features being in the same position) than those of different identities. (*[section] 4.2 Learning effective features*)

Process:

1. train ConvNet
  - training set: 80% CelebFaces (4349 people)
  - validation set: 10% images of each training person
2. input three test pairs from LFW and retrieve the corresponding 160-dim DeepID (features) extracted from each patch.
3. rearrange "activation pattern"(sparsity pattern) as 5x32
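
A minimal sketch of the last step, assuming the "activation pattern" means the binary map of which of the 160 DeepID neurons are positive. The feature vectors here are random stand-ins rather than real DeepID, and both function names are made up for the example.

```python
import numpy as np


def activation_pattern(deepid, rows=5, cols=32):
    """Binary map of which DeepID neurons are positive, reshaped to rows x cols."""
    return (np.asarray(deepid) > 0).reshape(rows, cols)


def common_activations(deepid_a, deepid_b):
    """Count neurons that are active in both feature vectors."""
    return int(np.sum(activation_pattern(deepid_a) & activation_pattern(deepid_b)))


rng = np.random.default_rng(0)
# Stand-in features: ReLU outputs are non-negative, with many exact zeros.
face_a = np.maximum(0.0, rng.normal(size=160))
face_b = np.maximum(0.0, rng.normal(size=160))

print(activation_pattern(face_a).shape)  # (5, 32)
print(common_activations(face_a, face_a) == int(np.sum(face_a > 0)))  # True
```

With real features, the observation in the paper corresponds to `common_activations` being larger on average for pairs of the same identity than for pairs of different identities.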

Result:

> So the learned features extract identity information. (*[section] 4.2 Learning effective features*)

#### D.5.1.4  Various face patches combination contributes to the performance ####

[\[back to top\]](#Experiments)

![figure9](images/figure9.png)

Motivation:

> We evaluate how much combining features extracted from various face patches would contribute to the performance. (*[section] 4.3 Over-complete representation*)

> We train the face verification model with features from k patches (k = 1, 5, 15, 30, 60). (*[section] 4.3 Over-complete representation*)

Process:

1. train the ConvNet and the face verification model with features from a single patch (k=1)
  - report the best-performing single patch
2. train the ConvNets and the face verification model with features from the **global color patches in a single scale** (k=5)
3. train the ConvNets and the face verification model with features from **all the global color patches** (k=15)
4. train the ConvNets and the face verification model with features from **all the color patches** (k=30)
5. train the ConvNets and the face verification model with features from **all the patches** (k=60)

- training set: 80% CelebFaces (4349 people)
- validation set: 10% images of each training person

Result:

> We report the best-performing single patch (k = 1), the global color patches in a single scale (k = 5), all the global color patches (k = 15), all the color patches (k = 30), and all the patches (k = 60). (*[section] 4.3 Over-complete representation*)

> As shown in Figure 9, adding more features from various regions, scales, and color channels consistently improves the performance. (*[section] 4.3 Over-complete representation*)

> Combing 60 patches increases the accuracy by 4.53% and 5.27% over best single patch for Joint Bayesian and neural networks, respectively. (*[section] 4.3 Over-complete representation*)

> We achieve 96.05% and 94.32% accuracy using Joint Bayesian and neural networks, respectively. (*[section] 4.3 Over-complete representation*)

Note:

Process of creating DeepID concatenation (60 ConvNets):

1. Concatenate the 60x2 DeepIDs (original patch and horizontally flipped counterpart) from the 60 ConvNets => called DeepID'
  - dims of DeepID' = 60x(2x160) = 19200 dims
2. Do PCA (reduce dims of DeepID' to 150 dims) => this is final "DeepID concatenation"!
  - dims of DeepID concatenation = 150 dims
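
The PCA step can be sketched with a plain SVD. This is a generic PCA on toy-sized random data so that it runs quickly (the notes reduce 19200 dimensions to 150); `pca_fit` and `pca_transform` are names invented for the example, and the paper does not say which PCA implementation was used.

```python
import numpy as np


def pca_fit(features, n_components):
    """Return the mean and top principal directions of `features` (rows = samples)."""
    mean = features.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_transform(features, mean, components):
    """Project features onto the principal directions."""
    return (features - mean) @ components.T


rng = np.random.default_rng(0)
# Toy sizes so the example runs quickly; the notes use 19200 -> 150.
train = rng.normal(size=(200, 1920))
mean, components = pca_fit(train, n_components=15)
reduced = pca_transform(train, mean, components)
print(reduced.shape)  # (200, 15)
```

The mean and components are fitted once on the verification training set and then reused unchanged for the LFW test features.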

The face verification accuracy of concatenating from 60 ConvNets: (on LFW)

- using Joint Bayesian: 96.05%
- using *"neural networks"*: 94.32%


#### D.5.1.5 Method comparison ####

[\[back to top\]](#Experiments)

![table1](images/table1.png)

![figure10](images/figure10.png)

Process:

1. train DeepID on CelebFaces, test on LFW
  - 60 ConvNets, Joint Bayesian
2. train DeepID on CelebFaces+, test on LFW
  - 100 ConvNets, Joint Bayesian
3. train DeepID on CelebFaces+ with TL (transfer learning algorithm), test on LFW
  - 100 ConvNets, Joint Bayesian, TL

Result:

1. LFW test accuracy of DeepID trained on CelebFaces (60 ConvNets)
  - 96.05%
2. LFW test accuracy of DeepID trained on CelebFaces+ (100 ConvNets)
  - 97.20%
3. LFW test accuracy of DeepID trained on CelebFaces+ (100 ConvNets) with TL
  - 97.45%

> we enlarge the CelebFaces dataset to CelebFaces+, which contains 202, 599 face images of 10, 177 celebrities. (*[section] 4.4 Method comparison*)

> People in CelebFaces+ and LFW are mutually exclusive. (*[section] 4.4 Method comparison*)

> We randomly choose 8700 people from CelebFaces+ to learn the DeepID, and use the remaining 1477 people to learn Joint Bayesian for face verification. (*[section] 4.4 Method comparison*)

> we increase the patch number to 100 by using five different scales of patches instead of three. (*[section] 4.4 Method comparison*)

> This results in a 32,000-dimensional DeepID feature vector, which is then reduced to 150 dimensions by PCA. Joint Bayesian learned on this 150-dimensional feature vector achieves 97.20% test accuracy on LFW. (*[section] 4.4 Method comparison*)

> Due to the difference in data distributions, models well fitted to CelebFaces+ may not have equal generalization ability on LFW. (*[section] 4.4 Method comparison*)

> Cao et al. [6] proposed a practical transfer learning algorithm to adapt the Joint Bayesian model from the source domain to the target domain. (*[section] 4.4 Method comparison*)

> We implemented their algorithm by using the 1477 people from CelebFaces+ as the source domain data and nine out of ten folders from LFW as the target domain data for transfer learning Joint Bayesian, and conduct ten- fold cross validation on LFW. (*[section] 4.4 Method comparison*)
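
The ten-fold protocol mentioned in the quote can be sketched as below. LFW ships with predefined folds; the contiguous index split here is only an assumption that illustrates the sizes involved (6000 pairs, nine folds for adaptation or training and one for testing in each round).

```python
def k_fold_indices(num_items, k):
    """Yield (train_indices, test_indices) for k equal, contiguous folds."""
    fold_size = num_items // k
    for fold in range(k):
        start, stop = fold * fold_size, (fold + 1) * fold_size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, fold_size * k))
        yield train, test


# LFW verification protocol: 6000 pairs split into 10 folds.
splits = list(k_fold_indices(6000, 10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 5400 600
```

The reported accuracy is the average over the ten test folds.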

Note:

- Transfer learning algorithm

### Appendix ###

[\[back to top\]](#Comparison-of-current-face-verification-algorithms)

#### A.1.1 current face verification algorithms ####

The current best-performing face verification algorithms:

- [9] D. Chen, X. Cao, F. Wen, and J. Sun. [Blessing of dimensionality: High-dimensional feature and its efficient compression for face verification.](http://www.msr-waypoint.net/en-us/um/people/jiansun/papers/CVPR13_HighDim.pdf) In Proc. CVPR, 2013.
- [29] K. Simonyan, O. M. Parkhi, A. Vedaldi, and A. Zisserman. [Fisher vector faces in the wild.](https://www.robots.ox.ac.uk/~vgg/publications/2013/Simonyan13/simonyan13.pdf) In Proc. BMVC, 2013.
- [6] X. Cao, D. Wipf, F. Wen, G. Duan, and J. Sun. [A practical transfer learning algorithm for face verification.](http://research.microsoft.com/pubs/202192/TransferLearning.pdf) In Proc. ICCV, 2013
- [8] D. Chen, X. Cao, L. Wang, F. Wen, and J. Sun. [Bayesian face revisited: A joint formulation.](http://research.microsoft.com/pubs/192105/JointBayesian.pdf) In Proc. ECCV, 2012. [1](#D.4.1-Face-verification-process) [2](#D.4.3-Joint-Bayesian)

#### A.2.1 Facial Landmark Detection Method ####

- [30] Y. Sun, X. Wang, and X. Tang. [Deep convolutional network cascade for facial point detection.](http://www.ee.cuhk.edu.hk/~xgwang/papers/sunWTcvpr13.pdf) In Proc. CVPR, 2013 [1](#D.3.1-Data-Processing-and-Face-Patches)


## Background ##

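- state-of-the-art: the most advanced currently available
- unconstrained condition: **real-world conditions** that are not artificially controlled
- over-complete: more features than strictly needed (**redundant**), as opposed to compact
  - [Signal processing example: over-completeness is a disadvantage for applications such as data compression](https://books.google.com.tw/books?id=TfoCbM_9vzYC&pg=PA106&lpg=PA106&dq=overcomplete+%E6%84%8F%E7%BE%A9&source=bl&ots=1iepO7zIaX&sig=yY3pv9y4xijk_U0H689AW_pTfxM&hl=zh-TW&sa=X&ved=0ahUKEwims-ado7_LAhUELKYKHVcgAuEQ6AEISDAH#v=onepage&q=overcomplete%20%E6%84%8F%E7%BE%A9&f=false)
- feature extraction cascade: the feature extraction process going from low-level to high-level features
- L2 distance: the L2 norm of the difference, i.e. the Euclidean distance between two points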
  - [wiki: Euclidean distance](https://en.wikipedia.org/wiki/Euclidean_distance)
  - [mathisfunforum](http://www.mathisfunforum.com/viewtopic.php?id=17995)
- LBP: Local binary patterns
  - [wiki: Local binary patterns (Chinese)](https://zh.wikipedia.org/wiki/%E5%B1%80%E9%83%A8%E4%BA%8C%E5%80%BC%E6%A8%A1%E5%BC%8F)
- softMax: [UFLDL: SoftMax](http://ufldl.stanford.edu/wiki/index.php/Softmax%E5%9B%9E%E5%BD%92)
- Joint Bayesian, EM:
  - [【人脸识别】人脸验证算法Joint Bayesian详解及实现（Python版）](http://blog.csdn.net/cyh_24/article/details/49059475)
  - [ Bayesian face revisited : a joint formulation 笔记 ](http://blog.csdn.net/csyhhb/article/details/46300001)
  - [已知两个高斯分布及他们的关系，如何求条件期望? ](http://www.zhihu.com/question/28086678)
  - [Bayesian Face Revisited: A Joint Formulation 算法流程图 ](http://blog.csdn.net/hqbupt/article/details/37758627)
- Dropout:
- gradient diffusion
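- Transfer learning [paper: A practical transfer learning algorithm for face verification]:
- K-fold cross-validation: the initial sample is partitioned into K subsamples. A single subsample is held out as validation data and the other K-1 subsamples are used for training. This is repeated K times so that each subsample is used for validation exactly once, and the K results are averaged (or combined in some other way) into a single estimate. The advantage is that every randomly generated subsample is used for both training and validation, and each is validated exactly once. 10-fold cross-validation is the most common choice.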
  - https://zh.wikipedia.org/wiki/%E4%BA%A4%E5%8F%89%E9%A9%97%E8%AD%89#K-fold_cross-validation

## Resource ##

- [數據堂](http://datatang.com/)
  - downloads of various datasets
- [AlfredXiangWu/face_verification_experiment](https://github.com/AlfredXiangWu/face_verification_experiment)
- [RiweiChen/DeepFace](https://github.com/RiweiChen/DeepFace)
- [cyh24/Joint-Bayesian](https://github.com/cyh24/Joint-Bayesian)
- [RiweiChen/FaceTools](https://github.com/RiweiChen/FaceTools)
- [【云计算虚拟化】Docker的基本命令使用 ](http://blog.csdn.net/chenriwei2/article/details/50250923)
- [【云计算虚拟化】基于docker的caffe环境搭建 ](http://blog.csdn.net/chenriwei2/article/details/50250685)

## Reference ##

[1] 張雨石, [DeepID人脸识别算法之三代](http://blog.csdn.net/stdcoutzyx/article/details/42091205), 2014-12-23

[2] Riwei Chen, [【Caffe实践】基于Caffe的人脸识别实现 - DeepID](http://blog.csdn.net/chenriwei2/article/details/49500687), 2015-11-01

[3] Riwei Chen, [【深度学习论文笔记】Deep Learning Face Representation from Predicting 10,000 Classes ](http://blog.csdn.net/chenriwei2/article/details/31415069), 2015-03-24 [1](#D.3.1-Data-Processing-and-Face-Patches)

[4] CSDNcloud, [專訪DeepID發明者孫禕：關於深度學習與人臉算法的深層思考](http://wechat.kanfb.com/archives/129700), 2015-11-27

[5] 仙道菜, [【人脸识别】人脸验证算法Joint Bayesian详解及实现（Python版）](http://blog.csdn.net/cyh_24/article/details/49059475), 2015-10-12

