reading-list

Basic Network

AlexNet: Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012. ⭐️⭐️⭐️⭐️⭐️ extensive experiment thinking
G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov. "Improving neural networks by preventing co-adaptation of feature detectors." arXiv preprint arXiv:1207.0580, 2012.
Dropout: Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov. "Dropout: A Simple Way to Prevent Neural Networks from Overfitting." 2014. ⭐️⭐️⭐️⭐️ ensemble of architectures, adding noise
GoogLeNet Christian Szegedy et al. "Going deeper with convolutions" [InceptionV1] CVPR 2015. ⭐️⭐️⭐️⭐️⭐️ efficient multi-scale 1x1 conv
VGG: Karen Simonyan & Andrew Zisserman. "Very Deep Convolutional Networks for Large-Scale Image Recognition."
PReLU & MSRA Initialization: He, Kaiming, et al. "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification." Proceedings of the IEEE International Conference on Computer Vision. 2015. ⭐️⭐️⭐️⭐️⭐️
Batch Normalization & Inception V2: Sergey Ioffe and Christian Szegedy. "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift." ICML 2015. ⭐️⭐️⭐️⭐️⭐️ stable, fast convergence

Inception V3: Christian Szegedy et al. "Rethinking the Inception Architecture for Computer Vision." CVPR 2016. ⭐️⭐️⭐️⭐️⭐️ design principles, label smoothing, reduce
Warmup & LR: Priya Goyal, Piotr Dollár, Ross Girshick et al. "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour." 2018.
Identity ResNet He, Kaiming, et al. "Identity mappings in deep residual networks." European Conference on Computer Vision. Springer International Publishing, 2016. ⭐️⭐️⭐️⭐️ pre-activation
ResNet: He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. ⭐️⭐️⭐️⭐️⭐️ CVPR 2016 Best Paper
ResNeXt Xie, Saining, et al. "Aggregated residual transformations for deep neural networks." arXiv preprint arXiv:1611.05431 (2016). ⭐️⭐️⭐️⭐️ cardinality
InceptionV4 & Inception-ResNet: Szegedy, Christian, et al. "Inception-v4, Inception-ResNet and the impact of residual connections on learning." arXiv preprint arXiv:1602.07261 (2016). ⭐️⭐️⭐️⭐️ Inception & residual
PolyNet: Zhang, Xingcheng, et al. "PolyNet: A pursuit of structural diversity in very deep networks." arXiv preprint arXiv:1611.05725 (2016). [Slides] ⭐️⭐️⭐️⭐️ diversity
Xception: Chollet, François. "Xception: Deep Learning with Depthwise Separable Convolutions." arXiv preprint arXiv:1610.02357 (2016). ⭐️⭐️⭐️ channel correlation decoupled
SENet: Hu, Jie, Li Shen, and Gang Sun. "Squeeze-and-excitation networks." In CVPR (2018). ⭐️⭐️⭐️⭐️⭐️ decoupled, channel-wise excitation
DenseNet: Huang, Gao, et al. "Densely connected convolutional networks." arXiv preprint arXiv:1608.06993 (2016). ⭐️⭐️⭐️⭐️⭐️ feature reuse, CVPR 2017 Best Paper
Rethinking ImageNet Pre-training: He, Kaiming, Ross Girshick, and Piotr Dollár. "Rethinking ImageNet Pre-training." arXiv preprint arXiv:1811.08883 (2018). ⭐️⭐️⭐️
Non-local Neural Network: Wang, Xiaolong, Ross Girshick, Abhinav Gupta, and Kaiming He. "Non-local Neural Networks." arXiv preprint arXiv:1711.07971 (2017). ⭐️⭐️⭐️⭐️
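The "label smoothing" note on the Inception V3 entry refers to replacing the one-hot training target for true class y with a mix of that one-hot vector and a uniform distribution over the K classes, q'(k) = (1 - ε)·[k = y] + ε/K. Below is a minimal pure-Python sketch of that target and of the cross-entropy it is used with; the function names and the default ε = 0.1 are illustrative assumptions, not code from the paper.

```python
from __future__ import annotations

import math


def smooth_labels(target: int, num_classes: int, epsilon: float = 0.1) -> list[float]:
    """Label-smoothed target: (1 - epsilon) * one_hot(target) + epsilon / num_classes."""
    if not 0 <= target < num_classes:
        raise ValueError("target must be a valid class index")
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    uniform = epsilon / num_classes  # probability mass spread evenly over all classes
    return [(1.0 - epsilon if k == target else 0.0) + uniform for k in range(num_classes)]


def cross_entropy(logits: list[float], target_dist: list[float]) -> float:
    """Cross-entropy between target_dist and softmax(logits), using log-sum-exp for stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(q * (z - log_z) for q, z in zip(target_dist, logits))


if __name__ == "__main__":
    q = smooth_labels(target=2, num_classes=4, epsilon=0.1)
    print(q)  # approximately [0.025, 0.025, 0.925, 0.025]
    print(cross_entropy([0.0, 0.0, 5.0, 0.0], q))
```

With ε = 0 the target falls back to the usual one-hot vector; with ε > 0 the loss no longer rewards pushing the correct logit arbitrarily far above the others.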

3D vision

Point cloud

PointNet: Charles R. Qi, Hao Su et al. "PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation." arXiv:1612.00593v2 (CVPR 2017). ⭐️⭐️⭐️
PointNet++: C. R. Qi et al. "Deep Hierarchical Feature Learning on Point Sets in a Metric Space." (NeurIPS 2017) [pdf] [GitHub] ⭐️⭐️⭐️⭐️
PointCNN: Y. Li et al. "Convolution On X-Transformed Points." (NeurIPS 2018) [pdf] [GitHub] ⭐️⭐️⭐️
RS-CNN: Y. Liu et al. "Relation-Shape Convolutional Neural Network for Point Cloud Analysis." ⭐️⭐️⭐️⭐️

Object detection

SPP: He, Kaiming, et al. "Spatial pyramid pooling in deep convolutional networks for visual recognition." European Conference on Computer Vision. Springer International Publishing, 2014. ⭐️⭐️⭐️⭐️⭐️
Fast R-CNN: Girshick, Ross. "Fast R-CNN." Proceedings of the IEEE International Conference on Computer Vision. 2015. ⭐️⭐️⭐️⭐️⭐️
Faster R-CNN: Ren, Shaoqing, et al. "Faster R-CNN: Towards real-time object detection with region proposal networks." Advances in neural information processing systems. 2015. ⭐️⭐️⭐️⭐️⭐️
YOLO: "You Only Look Once: Unified, Real-Time Object Detection." ⭐️⭐️⭐️⭐️⭐️

Vision & Language

Visual Grounding / Referring Expressions (Images):

Karpathy, Andrej, Armand Joulin, and Li Fei-Fei. Deep fragment embeddings for bidirectional image sentence mapping. Advances in neural information processing systems. 2014. [Paper]
Karpathy, Andrej, and Li Fei-Fei. Deep visual-semantic alignments for generating image descriptions. Proceedings of the IEEE conference on computer vision and pattern recognition. 2015. Method name: Neural Talk. [Paper] [Code] [Torch Code] [Website]
Hu, Ronghang, et al. Natural language object retrieval. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. Method name: Spatial Context Recurrent ConvNet (SCRC) [Paper] [Code] [Website]
Mao, Junhua, et al. Generation and comprehension of unambiguous object descriptions. Proceedings of the IEEE conference on computer vision and pattern recognition. 2016. [Paper] [Code]
Wang, Liwei, Yin Li, and Svetlana Lazebnik. Learning deep structure-preserving image-text embeddings. Proceedings of the IEEE conference on computer vision and pattern recognition. 2016. [Paper] [Code]
Yu, Licheng, et al. Modeling context in referring expressions. European Conference on Computer Vision. Springer, Cham, 2016. [Paper] [Code]
Nagaraja, Varun K., Vlad I. Morariu, and Larry S. Davis. Modeling context between objects for referring expression understanding. European Conference on Computer Vision. Springer, Cham, 2016. [Paper] [Code]
Rohrbach, Anna, et al. Grounding of textual phrases in images by reconstruction. European Conference on Computer Vision. Springer, Cham, 2016. Method Name: GroundR [Paper] [Tensorflow Code] [Torch Code]
Wang, Mingzhe, et al. Structured matching for phrase localization. European Conference on Computer Vision. Springer, Cham, 2016. Method name: Structured Matching [Paper] [Code]
Hu, Ronghang, Marcus Rohrbach, and Trevor Darrell. Segmentation from natural language expressions. European Conference on Computer Vision. Springer, Cham, 2016. [Paper] [Code] [Website]
Fukui, Akira et al. Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding. EMNLP (2016). Method name: MCB [Paper] [Code]
Endo, Ko, et al. An attention-based regression model for grounding textual phrases in images. Proc. IJCAI. 2017. [Paper]
Chen, Kan, et al. MSRC: Multimodal spatial regression with semantic context for phrase grounding. International Journal of Multimedia Information Retrieval 7.1 (2018): 17-28. [Paper - Springer Link]
Wu, Fan et al. An End-to-End Approach to Natural Language Object Retrieval via Context-Aware Deep Reinforcement Learning. CoRR abs/1703.07579 (2017): n. pag. [Paper] [Code]
Yu, Licheng, et al. A joint speaker-listener-reinforcer model for referring expressions. Computer Vision and Pattern Recognition (CVPR). Vol. 2. 2017. [Paper] [Code] [Website]
Hu, Ronghang, et al. Modeling relationships in referential expressions with compositional modular networks. Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on. IEEE, 2017. [Paper] [Code]
Luo, Ruotian, and Gregory Shakhnarovich. Comprehension-guided referring expressions. Computer Vision and Pattern Recognition (CVPR). Vol. 2. 2017. [Paper] [Code]
Liu, Jingyu, Liang Wang, and Ming-Hsuan Yang. Referring expression generation and comprehension via attributes. Proceedings of CVPR. 2017. [Paper]
Xiao, Fanyi, Leonid Sigal, and Yong Jae Lee. Weakly-supervised visual grounding of phrases with linguistic structures. arXiv preprint arXiv:1705.01371 (2017). [Paper]
Plummer, Bryan A., et al. Phrase localization and visual relationship detection with comprehensive image-language cues. Proc. ICCV. 2017. [Paper] [Code]
Yu, Licheng, et al. Mattnet: Modular attention network for referring expression comprehension. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2018. [Paper] [Code] [Website]
Chen, Yen-Chun, et al. UNITER: Learning UNiversal Image-TExt Representations. arXiv preprint arXiv:1909.11740 (2019). [Paper]

Person ReID

AlignedReID: Xuan Zhang, Hao Luo et al. "AlignedReID: Surpassing Human-Level Performance in Person Re-Identification." 2017. ⭐️⭐️⭐️⭐️ stripe-based local distance
Mahdi M. Kalayeh et al. Human Semantic Parsing for Person Re-identification. CVPR (2018). ⭐️ semantic transfer
Hou et al. Interaction-and-Aggregation Network for Person Re-identification. CVPR (2019). ⭐️⭐️⭐️⭐️ adaptively localizes parts by modeling spatial feature self-attention
Luo et al. Bag of Tricks and A Strong Baseline for Deep Person Re-identification. CVPR Workshops (2019). ⭐️⭐️⭐️⭐️ tricks, BNNeck, inter-/intra-class
Zheng et al. Joint Discriminative and Generative Learning for Person Re-identification. CVPR (2019).
Wang et al. Spatial-Temporal Person Re-identification. CVPR (2019).
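The "stripe-based local distance" note on the AlignedReID entry refers to comparing two people by their horizontal stripe features: build the H x H matrix of normalized distances d(i, j) = (e^‖f_i - g_j‖ - 1)/(e^‖f_i - g_j‖ + 1) between stripes of the two images, then take the cost of the shortest path from the top-left to the bottom-right cell, moving only down or right, S(i, j) = min(S(i-1, j), S(i, j-1)) + d(i, j). Below is a minimal pure-Python sketch, assuming each image is already described by H local feature vectors ordered top to bottom; feature extraction is not shown and the function names are hypothetical. The normalized distance is computed as tanh(‖f - g‖/2), which equals the expression above but does not overflow for large distances.

```python
from __future__ import annotations

import math


def stripe_distance(f: list[float], g: list[float]) -> float:
    """Normalized distance (e^d - 1) / (e^d + 1) == tanh(d / 2), with d = ||f - g||_2."""
    return math.tanh(math.dist(f, g) / 2.0)


def aligned_local_distance(stripes_a: list[list[float]], stripes_b: list[list[float]]) -> float:
    """Shortest-path cost from cell (0, 0) to (H-1, H-1) of the stripe distance matrix,
    moving only down or right, computed by dynamic programming."""
    h = len(stripes_a)
    if h == 0 or len(stripes_b) != h:
        raise ValueError("both images need the same, non-zero number of stripes")
    d = [[stripe_distance(a, b) for b in stripes_b] for a in stripes_a]
    s = [[0.0] * h for _ in range(h)]  # s[i][j]: cheapest path cost ending at (i, j)
    for i in range(h):
        for j in range(h):
            if i == 0 and j == 0:
                prev = 0.0
            elif i == 0:
                prev = s[i][j - 1]  # first row: can only arrive from the left
            elif j == 0:
                prev = s[i - 1][j]  # first column: can only arrive from above
            else:
                prev = min(s[i - 1][j], s[i][j - 1])
            s[i][j] = prev + d[i][j]
    return s[h - 1][h - 1]
```

Because the path may skip ahead in one image while staying on a stripe in the other, a body part that is shifted up or down between two crops can still be matched to its counterpart.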

NIPS

Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu. Spatial Transformer Networks. NIPS (2015). ⭐️⭐️⭐️⭐️ learnable affine-transform layer
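The affine-transform note on the Spatial Transformer Networks entry refers to the grid generator: each point (x_t, y_t) of the output grid, in coordinates normalized to [-1, 1], is mapped to a sampling location (x_s, y_s) in the input by a 2x3 matrix θ applied to (x_t, y_t, 1). Below is a minimal pure-Python sketch of that mapping only, assuming θ is already given (in the paper it is predicted by a localization network) and leaving out the differentiable bilinear sampler; the function names are hypothetical.

```python
from __future__ import annotations


def _linspace(n: int) -> list[float]:
    """n evenly spaced values covering [-1, 1] (just [0.0] when n == 1)."""
    if n < 1:
        raise ValueError("grid size must be positive")
    if n == 1:
        return [0.0]
    return [-1.0 + 2.0 * k / (n - 1) for k in range(n)]


def affine_grid(theta: list[list[float]], height: int, width: int) -> list[list[tuple[float, float]]]:
    """For each target pixel (row i, column j), return the source coordinates (x_s, y_s)
    obtained by applying the 2x3 affine matrix theta to (x_t, y_t, 1)."""
    if len(theta) != 2 or any(len(row) != 3 for row in theta):
        raise ValueError("theta must be a 2x3 matrix")
    ys, xs = _linspace(height), _linspace(width)
    grid = []
    for y_t in ys:
        row = []
        for x_t in xs:
            x_s = theta[0][0] * x_t + theta[0][1] * y_t + theta[0][2]
            y_s = theta[1][0] * x_t + theta[1][1] * y_t + theta[1][2]
            row.append((x_s, y_s))
        grid.append(row)
    return grid
```

With the identity θ = [[1, 0, 0], [0, 1, 0]] the grid reproduces the target coordinates; other values of θ let the layer crop, translate, scale, rotate, or shear its input before the sampler reads pixel values at these locations.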

hbb1/reading-list

Folders and files

Latest commit

History

Repository files navigation

reading-list

Basic Network

Object detection

Visual Grounding / Referring Expressions (Images):

Person ReID

NIPS

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages