Personal-Archived-Semantic-Segmentation-Paper-Record

Table of Contents

Deep Learning Methods

Semantic Segmentation

FCN ★★★

[Paper] Fully Convolutional Networks for Semantic Segmentation
[Year] CVPR 2015
[Authors] Jonathan Long, Evan Shelhamer, Trevor Darrell
[Pages]
https://github.com/shelhamer/fcn.berkeleyvision.org (official)
https://github.com/MarvinTeichmann/tensorflow-fcn (tensorflow)
https://github.com/wkentaro/pytorch-fcn (pytorch)
[Description]

  1. Arguably the first end-to-end CNN for semantic segmentation. The paper notes that an FCN is equivalent to classifying extracted patches pixel by pixel, but adjacent patches share computation inside the FCN, which greatly improves efficiency.
  2. Fully connected layers are reinterpreted as convolutions.
  3. Feature maps are upsampled back to the input resolution by deconvolution, initialized as bilinear interpolation (see the sketch below).
  4. Skip connections are used to refine the coarse segmentation maps.
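
A minimal PyTorch sketch (not from the repo) of a bilinear-initialized transposed convolution of the kind FCN uses for upsampling; the 21 channels stand in for the VOC class count and are only an example.

```python
import torch
import torch.nn as nn

def bilinear_kernel(channels, kernel_size):
    """Weights that make ConvTranspose2d perform bilinear upsampling."""
    factor = (kernel_size + 1) // 2
    center = factor - 1 if kernel_size % 2 == 1 else factor - 0.5
    og = torch.arange(kernel_size, dtype=torch.float32)
    filt = 1 - torch.abs(og - center) / factor
    kernel_2d = filt[:, None] * filt[None, :]
    weight = torch.zeros(channels, channels, kernel_size, kernel_size)
    for c in range(channels):
        weight[c, c] = kernel_2d          # one bilinear filter per channel
    return weight

# 2x upsampling of a 21-channel score map (21 = VOC classes, just an example)
upsample = nn.ConvTranspose2d(21, 21, kernel_size=4, stride=2, padding=1, bias=False)
upsample.weight.data.copy_(bilinear_kernel(21, 4))
score = torch.randn(1, 21, 16, 16)
print(upsample(score).shape)   # torch.Size([1, 21, 32, 32])
```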

U-Net ★

[Paper] U-Net: Convolutional Networks for Biomedical Image Segmentation
[Year] MICCAI 2015
[Authors] Olaf Ronneberger, Philipp Fischer, Thomas Brox
[Pages]
https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/
https://github.com/orobix/retina-unet
[Description]

  1. Encoder-decoder architecture. The encoder follows FCN's design; at each decoder stage the corresponding encoder feature map is concatenated with the up-convolution output (see the sketch below).
  2. Designed for medical image segmentation, where datasets are small, so heavy data augmentation is used and the network itself is fairly simple.
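
A rough PyTorch sketch of one U-Net-style decoder step as described above; the module name and channel sizes are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class UNetUpBlock(nn.Module):
    """Upsample, concatenate the matching encoder feature map, then convolve."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.conv = nn.Sequential(
            nn.Conv2d(out_ch + skip_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = self.up(x)                    # 2x spatial upsampling
        x = torch.cat([x, skip], dim=1)   # fuse the encoder features
        return self.conv(x)

x = torch.randn(1, 256, 32, 32)      # decoder input
skip = torch.randn(1, 128, 64, 64)   # corresponding encoder feature map
print(UNetUpBlock(256, 128, 128)(x, skip).shape)   # torch.Size([1, 128, 64, 64])
```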

zoom-out ★

[Paper] Feedforward semantic segmentation with zoom-out features
[Year] CVPR 2015
[Authors] Mohammadreza Mostajabi, Payman Yadollahpour, Gregory Shakhnarovich
[Pages] https://bitbucket.org/m_mostajabi/zoom-out-release
[Description]

  1. Uses superpixels as the smallest unit and progressively "zooms out" to capture information at larger scales; the zoom-out features are built from features extracted at different CNN layers.
  2. Features are average-pooled within each superpixel, and the features from all levels are concatenated into that superpixel's final feature vector. The loss is weighted by the inverse frequency of each class in the training set (see the weighting sketch below).
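
A small illustration, under made-up class counts, of the inverse-frequency loss weighting mentioned in item 2; in practice `class_counts` would be computed from the training set.

```python
import torch
import torch.nn as nn

# class_counts would come from training-set statistics; these numbers are made up.
class_counts = torch.tensor([1_000_000., 50_000., 20_000.])   # pixels per class
weights = class_counts.sum() / class_counts                    # inverse frequency

criterion = nn.CrossEntropyLoss(weight=weights)
logits = torch.randn(4, 3, 16, 16)          # (batch, classes, H, W)
target = torch.randint(0, 3, (4, 16, 16))   # per-pixel ground-truth labels
loss = criterion(logits, target)
```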

Dilated Convolution ★

[Paper] Multi-Scale Context Aggregation By Dilated Convolutions
[Year] ICLR 2016
[Authors] Fisher Yu, Vladlen Koltun
[Pages] https://github.com/fyu/dilation
[Description]

  1. Systematic use of dilated convolution; the implementation has been merged into Caffe (see the one-line example below).
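
A one-line PyTorch illustration of dilated convolution; sizes are arbitrary.

```python
import torch
import torch.nn as nn

# A 3x3 kernel with dilation 2 covers a 5x5 receptive field with the same
# parameter count; padding=dilation keeps the spatial size unchanged.
dilated = nn.Conv2d(64, 64, kernel_size=3, dilation=2, padding=2)
x = torch.randn(1, 64, 32, 32)
print(dilated(x).shape)   # torch.Size([1, 64, 32, 32])
```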

Understanding Convolution ★☆

[Paper] Understanding Convolution for Semantic Segmentation
[Year] WACV 2018
[Authors] Panqu Wang, Pengfei Chen, Ye Yuan, Ding Liu, Zehua Huang, Xiaodi Hou, Garrison Cottrell
[Pages] https://github.com/TuSimple/TuSimple-DUC
[Description]

  1. Designs two structures for semantic segmentation, DUC for decoding and HDC for encoding; the design was later borrowed by DeepLab v3.
  2. Decoding: DUC (dense upsampling convolution), similar to practices in super-resolution and instance segmentation, lets each channel of the final feature map represent the prediction at the corresponding position after upsampling (see the sketch below).
  3. Encoding: HDC (hybrid dilated convolution), which alternates convolutions with different dilation rates to avoid gridding (checkerboard) artifacts.
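
A rough sketch of DUC-style dense upsampling using `PixelShuffle`; the 19 classes, 8x factor, and 2048 input channels are assumptions for illustration, not the paper's exact setting.

```python
import torch
import torch.nn as nn

# Predict num_classes * r^2 channels at low resolution, then rearrange them
# into an r-times larger prediction with PixelShuffle.
num_classes, r = 19, 8
duc = nn.Sequential(
    nn.Conv2d(2048, num_classes * r * r, kernel_size=1),
    nn.PixelShuffle(r),   # (N, C*r*r, H, W) -> (N, C, H*r, W*r)
)
feat = torch.randn(1, 2048, 64, 64)   # backbone output at 1/8 resolution
print(duc(feat).shape)                # torch.Size([1, 19, 512, 512])
```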

DeepLab ★★

[Paper] Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs
[Year] ICLR 2015
[Authors] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, Alan L. Yuille
[Pages] https://bitbucket.org/deeplab/deeplab-public
[Description]

  1. Produces dense features while preserving the receptive field: the strides of VGG16's last two pooling layers are set to 1, and the "hole" algorithm (i.e., dilated convolution) controls the receptive field.
  2. The output is post-processed with a fully connected CRF. The unary term is the per-pixel probability; the pairwise term is the similarity between the current pixel and every other pixel in the image, measured with Gaussian kernels over color and position. The fully connected CRF follows "Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials".
  3. Like FCN, multi-scale prediction is also used.

[Paper] Weakly- and Semi-Supervised Learning of a Deep Convolutional Network for Semantic Image Segmentation
[Year] ICCV 2015
[Authors] George Papandreou, Liang-Chieh Chen, Kevin Murphy, Alan L. Yuille

DeepLab-V2 ★

[Paper] DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
[Year] arXiv 2016
[Authors] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, Alan L. Yuille
[Pages]
http://liangchiehchen.com/projects/DeepLab.html
https://github.com/DrSleep/tensorflow-deeplab-resnet (tensorflow)
https://github.com/isht7/pytorch-deeplab-resnet (pytorch)
[Description]

  1. Differences from V1: a different learning policy, atrous spatial pyramid pooling (ASPP), a deeper network, and multi-scale inputs. ASPP processes the same feature map with dilated convolutions of different dilation rates in parallel (see the sketch below).
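
A minimal ASPP-style sketch: parallel dilated convolutions with different rates over the same feature map, fused by summation. The rates 6/12/18/24 follow the commonly cited DeepLab-V2 setting; channel sizes are illustrative.

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Parallel dilated convolutions over the same feature map, fused by summation."""
    def __init__(self, in_ch, num_classes, rates=(6, 12, 18, 24)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, num_classes, kernel_size=3, padding=r, dilation=r)
            for r in rates
        ])

    def forward(self, x):
        return sum(branch(x) for branch in self.branches)

x = torch.randn(1, 512, 64, 64)
print(ASPP(512, 21)(x).shape)   # torch.Size([1, 21, 64, 64])
```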

DeepLab-V3 ☆

[Paper] Rethinking Atrous Convolution for Semantic Image Segmentation
[Year] arXiv 1706
[Authors] Liang-Chieh Chen, George Papandreou, Florian Schroff, Hartwig Adam
[Pages] https://github.com/tensorflow/models/tree/master/research/deeplab
[Description]

  1. Uses cascaded and parallel atrous convolutions together with batch normalization and structural refinements, reaching state-of-the-art accuracy at the time (080116).

DeepLab-V3+ ★☆

[Paper] Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation
[Year] arXiv 1802
[Authors] Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, Hartwig Adam
[Pages] https://github.com/tensorflow/models/tree/master/research/deeplab
[Description]

  1. Uses DeepLab-V3 as the encoder and adds a simple decoder instead of direct upsampling; Xception is adopted as the backbone.
  2. Reached state of the art on the VOC segmentation task (0800314); works well.

Attention to Scale ★

[Paper] Attention to Scale: Scale-aware Semantic Image Segmentation
[Year] CVPR 2016
[Authors] Liang-Chieh Chen, Yi Yang, Jiang Wang, Wei Xu, Alan L. Yuille
[Pages] http://liangchiehchen.com/projects/DeepLab.html
[Description]

  1. Multi-scale feature fusion is one of the keys to better semantic segmentation, and it is usually done with simple max or average operations. This paper instead trains an FCN-based network to produce a weight map that assigns a different weight to each scale in different object regions, and fuses the multi-scale maps by weighted summation.
  2. Extra supervision is used when training DeepLab. Experiments show that the extra supervision contributes noticeably more to the gains than the attention itself.

DPC ★★★

[Paper] Searching for Efficient Multi-Scale Architectures for Dense Image Prediction
[Year] NIPS 2018
[Authors] Liang-Chieh Chen, Maxwell D. Collins, Yukun Zhu, George Papandreou, Barret Zoph, Florian Schroff, Hartwig Adam, Jonathon Shlens
[Pages]
[Description]

  1. Did not read the search procedure closely. A representative work applying NAS to semantic segmentation; it has been integrated into the TensorFlow DeepLab project.

CRFasRNN ★♥

[Paper] Conditional Random Fields as Recurrent Neural Networks
[Year] ICCV 2015
[Authors] Shuai Zheng, Sadeep Jayasumana, Bernardino Romera-Paredes, Vibhav Vineet, Zhizhong Su, Dalong Du, Chang Huang, Philip H. S. Torr
[Pages] http://www.robots.ox.ac.uk/~szheng/CRFasRNN.html
[Description]

  1. Replaces the CRF inference steps with differentiable modules such as convolution and softmax, and unrolls the iterations in an RNN-like structure, so the CRF is approximated by a recurrent network and the whole model can be optimized end-to-end.
  2. The fully connected CRF and its inference follow "Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials"; that paper deserves a careful read after studying CRFs more deeply.

DeconvNet ★

[Paper] Learning Deconvolution Network for Semantic Segmentation
[Year] ICCV 2015
[Authors] Hyeonwoo Noh, Seunghoon Hong, Bohyung Han
[Pages]
http://cvlab.postech.ac.kr/research/deconvnet/
https://github.com/fabianbormann/Tensorflow-DeconvNet-Segmentation (tensorflow)
[Description]

  1. One of the representative encoder-decoder models: conv-pool layers extract features, and unpool-deconv layers recover the resolution.

SegNet ★★

[Paper] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Robust Semantic Pixel-Wise Labelling
[Year] arXiv 2015
[Authors] Alex Kendall, Vijay Badrinarayanan, Roberto Cipolla
[Pages] http://mi.eng.cam.ac.uk/projects/segnet/

[Paper] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
[Year] PAMI 2017
[Authors] Vijay Badrinarayanan, Alex Kendall, Roberto Cipolla
[Description]

  1. One of the representative encoder-decoder models. Its distinctive feature is that the pooling indices from the encoder are stored; during upsampling the decoder uses these indices to build sparse feature maps, which trainable convolutions then turn into dense feature maps (see the sketch below).
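
A minimal PyTorch sketch of the pooling-indices mechanism described above, using `MaxPool2d(return_indices=True)` and `MaxUnpool2d`; channel sizes are arbitrary.

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)
densify = nn.Conv2d(64, 64, kernel_size=3, padding=1)

x = torch.randn(1, 64, 32, 32)
pooled, indices = pool(x)          # encoder: keep the argmax locations
sparse = unpool(pooled, indices)   # decoder: sparse feature map from the indices
dense = densify(sparse)            # trainable conv -> dense feature map
```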

Piecewise CRF

[Paper] Efficient piecewise training of deep structured models for semantic segmentation
[Year] CVPR 2016
[Authors] Guosheng Lin, Chunhua Shen, Anton van den Hengel, Ian Reid
[Pages]
[Description]

  1. Skimmed; did not really understand the CRF part.
  2. FeatMap-Net takes multi-scale input and produces feature maps, on which the CRF's unary and pairwise potentials are defined; the pairwise term considers two kinds of context, "surrounding" and "above/below".
  3. A piecewise-learning-based method is proposed for CRF training.

ENet ★

[Paper] ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation
[Year] arXiv 1606
[Authors] Adam Paszke, Abhishek Chaurasia, Sangpil Kim, Eugenio Culurciello
[Pages] https://github.com/e-lab/ENet-training
[Description]

  1. A fast encoder-decoder segmentation network.
  2. Large encoder with a small decoder; PReLU instead of ReLU; 1xn and nx1 convolutions instead of nxn convolutions.

ParseNet ★

[Paper] ParseNet: Looking Wider to See Better
[Year] ICLR 2016
[Authors] Wei Liu, Andrew Rabinovich, Alexander C. Berg
[Pages] https://github.com/weiliu89/caffe/tree/fcn
[Description]

  1. A simple way to inject global context: apply global pooling and L2 normalization to the feature map, unpool the resulting vector back to the feature map's size, and concatenate it with the (also L2-normalized) feature map (see the sketch below).
  2. Simple experiments suggest the effective receptive field is often much smaller than the theoretical one. Many papers cite this kind of observation, but it still feels short on theoretical justification.
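
A rough sketch of the global-context branch described in item 1; tensor sizes are arbitrary and no learned layers are shown.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 256, 32, 32)
local = F.normalize(x, p=2, dim=1)                        # L2 norm across channels
global_vec = F.normalize(x.mean(dim=(2, 3)), p=2, dim=1)  # (N, C) global descriptor
global_map = global_vec[:, :, None, None].expand_as(x)    # "unpool" back to H x W
fused = torch.cat([local, global_map], dim=1)             # (1, 512, 32, 32)
```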

FoveaNet ★★

[Paper] FoveaNet: Perspective-aware Urban Scene Parsing
[Year] ICCV 2017 Oral
[Authors] Xin Li, Zequn Jie, Wei Wang, Changsong Liu, Jimei Yang, Xiaohui Shen, Zhe Lin, Qiang Chen, Shuicheng Yan, Jiashi Feng
[Pages]
[Description]

  1. Proposes a perspective-aware parsing network to handle heterogeneous object scales: it improves the segmentation of small distant objects and reduces the "broken-down" effect on large nearby objects.
  2. To better parse objects near the vanishing point (i.e., far from the image plane), a perspective estimation network (PEN) is proposed. PEN produces a distance heatmap, from which a fovea region containing most of the small objects is derived; the fovea region is enlarged, parsed in parallel with the original image, and the result is pasted back into the full-image prediction.
  3. To address the "broken-down" problem of nearby objects, a perspective-aware CRF is proposed: combining PEN's heatmap with object detection, pixels of nearby objects get larger pairwise potentials and pixels of distant objects get smaller ones, effectively alleviating both "broken-down" artifacts and over-smoothing.

PSPNet ★☆

[Paper] Pyramid Scene Parsing Network
[Year] CVPR 2017
[Authors] Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Jiaya Jia
[Pages] https://hszhao.github.io/projects/pspnet/
[Description]

  1. Proposes the pyramid pooling module to combine context information at different scales. PSPNet pools the feature map at several scales (similar to spatial pyramid pooling), rescales all outputs to the same size, and concatenates them (see the sketch below).
  2. Uses a ResNet backbone, with an auxiliary loss attached after res4b22.
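
A minimal pyramid-pooling sketch along the lines of item 1; the bin sizes 1/2/3/6 follow the commonly cited PSPNet setting, and the channel reduction is illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    """Pool at several bin sizes, reduce channels, upsample back, concatenate."""
    def __init__(self, in_ch, bins=(1, 2, 3, 6)):
        super().__init__()
        out_ch = in_ch // len(bins)
        self.stages = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(b), nn.Conv2d(in_ch, out_ch, 1))
            for b in bins
        ])

    def forward(self, x):
        h, w = x.shape[2:]
        pyramids = [x]
        for stage in self.stages:
            pyramids.append(F.interpolate(stage(x), size=(h, w),
                                          mode='bilinear', align_corners=False))
        return torch.cat(pyramids, dim=1)

x = torch.randn(1, 2048, 60, 60)
print(PyramidPooling(2048)(x).shape)   # torch.Size([1, 4096, 60, 60])
```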

RefineNet ★☆

[Paper] RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation
[Year] CVPR 2017
[Authors] Guosheng Lin, Anton Milan, Chunhua Shen, Ian Reid
[Pages] https://github.com/guosheng/refinenet
[Description]

  1. The encoder is four groups of residual blocks with progressively lower resolution; the decoder is the proposed RefineNet. The authors argue the model resolves fine details in high-resolution images better.
  2. The first half of RefineNet performs multi-resolution fusion: like U-Net, every decoder module uses information from the corresponding encoder module.
  3. The second half of RefineNet is chained residual pooling, intended to "capture background context from a large image region".

GCN ★

[Paper] Large Kernel Matters -- Improve Semantic Segmentation by Global Convolutional Network
[Year] CVPR 2017
[Authors] Chao Peng, Xiangyu Zhang, Gang Yu, Guiming Luo, Jian Sun
[Pages] https://github.com/ycszen/pytorch-segmentation (Unofficial)
[Description]

  1. The paper argues that segmentation consists of localization and classification: classification needs global information, while localization requires keeping the feature-map resolution for spatial accuracy, so the two are in tension. The proposed remedy is a large kernel, which preserves resolution while approximating dense connections between feature maps and per-pixel classifiers.
  2. The large k*k kernel is approximated by two branches, (k x 1 then 1 x k) and (1 x k then k x 1) (see the sketch below). A boundary refinement (BR) module with a residual structure captures boundary information.
  3. Experiments alone show the proposed design beats an actual k*k kernel and stacks of small kernels; there is no real theoretical support.
  4. One open question: why can the residual-style BR model the boundary alignment?
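
A minimal sketch of the factorized large-kernel block from item 2: two parallel branches, (k x 1 then 1 x k) and (1 x k then k x 1), summed. k = 7 and the channel sizes are assumptions for illustration.

```python
import torch
import torch.nn as nn

class GlobalConvBlock(nn.Module):
    """Approximate a large k x k kernel with two factorized branches."""
    def __init__(self, in_ch, out_ch, k=7):
        super().__init__()
        p = k // 2
        self.branch_a = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, (k, 1), padding=(p, 0)),
            nn.Conv2d(out_ch, out_ch, (1, k), padding=(0, p)),
        )
        self.branch_b = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, (1, k), padding=(0, p)),
            nn.Conv2d(out_ch, out_ch, (k, 1), padding=(p, 0)),
        )

    def forward(self, x):
        return self.branch_a(x) + self.branch_b(x)

x = torch.randn(1, 512, 32, 32)
print(GlobalConvBlock(512, 21)(x).shape)   # torch.Size([1, 21, 32, 32])
```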

FastMask ★

[Paper] FastMask: Segment Multi-scale Object Candidates in One Shot
[Year] CVPR 2017 Spotlight
[Authors] Hexiang Hu, Shiyi Lan, Yuning Jiang, Zhimin Cao, Fei Sha
[Pages] https://github.com/voidrank/FastMask
[Description]

  1. Skimmed. Proposes a one-shot model with body, neck, and head components.
  2. The body network extracts features, which form a multi-scale feature pyramid; each level is fed into a weight-sharing neck module (a residual neck) to extract multi-scale features. The resulting feature maps are reduced in dimension, dense sliding windows are extracted, and after batch normalization the windows go into the head module (an attention head).
  3. The neck module downsamples feature maps with stride 2, which can make the scales too sparse, so a two-stream FastMask architecture is proposed to make the scales denser.

Layer Cascade ★

[Paper] Not All Pixels Are Equal: Difficulty-Aware Semantic Segmentation via Deep Layer Cascade
[Year] CVPR 2017 Spotlight
[Authors] Xiaoxiao Li, Ziwei Liu, Ping Luo, Chen Change Loy, Xiaoou Tang
[Pages] https://liuziwei7.github.io/projects/LayerCascade.html
[Description]

  1. Cascading several small models so that different stages handle samples of different difficulty is a classic way to save computation. This paper proposes a layer-cascade method for semantic segmentation that treats different layers of one network as different stages, achieving an effect close to a model cascade while improving accuracy and reducing computation.
  2. Predictions are drawn from three stages of the backbone; regions whose confidence is below a threshold are passed on, and the next stage convolves only over those regions while everything else is set to zero. The outputs of all stages are merged into the final result.
  3. The idea is intuitive and clean and may be worth borrowing in future work. One question: context should be very important in semantic segmentation, so does processing only selected regions risk losing global information?

PixelNet ★

[Paper] Representation of the pixels, by the pixels, and for the pixels
[Year] TPAMI 2017
[Authors] Aayush Bansal, Xinlei Chen, Bryan Russell, Abhinav Gupta, Deva Ramanan
[Pages] http://www.cs.cmu.edu/~aayushb/pixelNet/
[Description]

  1. Skimmed. Uses the hypercolumn idea and is fast. Applies to segmentation, edge detection, normal estimation, and other low-level to high-level problems.
  2. Hypercolumn: for each pixel, the features at its corresponding location in every feature map are concatenated into a vector, and an MLP classifies that vector.
  3. The paper argues that at training time "just sampling a small number of pixels per image is sufficient for learning", so a mini-batch can sample from many images, increasing diversity.

LinkNet ☆

[Paper] LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation
[Year] arXiv 1707
[Authors] Abhishek Chaurasia, Eugenio Culurciello
[Pages] https://codeac29.github.io/projects/linknet/
[Description]

  1. Not read yet; roughly a U-Net-like structure that runs fast.

SDN ★

[Paper] Stacked Deconvolutional Network for Semantic Segmentation
[Year] arXiv 1708
[Author] Jun Fu, Jing Liu, Yuhang Wang, Hanqing Lu
[Pages]
[Description]

  1. DenseNet combined with a U-Net-like structure. Skimmed.

FC-DenseNet ★

[Paper] The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation
[Year] CVPRW 2017
[Author] Simon Jégou, Michal Drozdzal, David Vazquez, Adriana Romero, Yoshua Bengio
[Pages] https://github.com/SimJeg/FC-DenseNet
[Description]

ISCTF ☆

[Paper] Real-time Semantic Image Segmentation via Spatial Sparsity
[Year] arXiv 1712
[Author] Zifeng Wu, Chunhua Shen, Anton van den Hengel
[Pages]
[Description]

  1. Skimmed; a real-time semantic segmentation method. A low-resolution pass produces a sparse weight map that guides the high-resolution pass to process only a few regions, reducing computation while preserving edge accuracy.
  2. Did not look closely at how the spatial sparsity is obtained or at which scale to crop from the original image; the algorithm looks somewhat cumbersome to implement.
  3. From the experiments, the sparsity-based scheme seems to bring limited gains, and it does not look competitive in speed or accuracy against the real-time methods that appeared from 2018 onward. Still, the idea is interesting and worth keeping an eye on.

Dense Decoder Shortcut Connections

[Paper] Dense Decoder Shortcut Connections for Single-Pass Semantic Segmentation
[Year] CVPR 2018
[Author] Piotr Bilinski, Victor Prisacariu
[Pages]
[Description]

  1. Skimmed. An encoder-decoder semantic segmentation model built mainly from skip connections and dense connections with a ResNeXt backbone; nothing particularly novel. The design includes plenty of multi-scale support, and the paper claims the network therefore needs only single-scale inference.

DFN ★☆

[Paper] Learning a Discriminative Feature Network for Semantic Segmentation
[Year] CVPR 2018
[Author] Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, Nong Sang
[Pages] https://github.com/whitesockcat/Discriminative-Feature-Network (Unofficial)
[Description]

  1. The goal is to address two problems, intra-class inconsistency and inter-class indistinction, for which a Smooth Network and a Border Network are designed; these in turn are built from a Refinement Residual Block (RRB) and a Channel Attention Block (CAB).
  2. The Smooth Network is the core of the paper. Intra-class inconsistency is attributed to a lack of context, so the network refines top-down starting from global pooling, using the global information from upper layers to produce a channel attention vector that guides lower layers to select the most useful channels.
  3. The Border Network is simply an encoder-decoder with concatenations from earlier layers. It is said to provide boundary information to high-level features, but it contributes little to the final result.
  4. The RRB and CAB are not structurally novel, but applying them to these problems with strong results is still impressive, and the way the paper frames the problems and its own contributions is worth learning from. Two concerns: the Border Network is not a very convincing answer to inter-class indistinction; and the Smooth Network uses context only to select channels, with no spatial refinement of the features, so selecting channels alone is not guaranteed to resolve intra-class inconsistency.

Adapt Structured Output Space

[Paper] Learning to Adapt Structured Output Space for Semantic Segmentation
[Year] CVPR 2018 Spotlight
[Author] Yi-Hsuan Tsai, Wei-Chih Hung, Samuel Schulter, Kihyuk Sohn, Ming-Hsuan Yang, Manmohan Chandraker
[Pages] https://github.com/wasidennis/AdaptSegNet
[Description]

  1. Proposes an adversarial-learning-based domain adaptation method for semantic segmentation; trained on GTA5 and tested on Cityscapes with good results.
  2. Adversarial training is applied at multiple levels, on the output layer and on one intermediate feature layer, pushing the target-domain predictions toward the source-domain prediction distribution.

EncNet ★★

[Paper] Context Encoding for Semantic Segmentation
[Year] CVPR 2018 Oral
[Author] Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, Amit Agrawal
[Pages]
https://hangzhang.org/PyTorch-Encoding/
https://github.com/zhanghang1989/PyTorch-Encoding
[Description]

  1. Proposes a Context Encoding Module based on Deep TEN and embeds it into a segmentation network to better exploit global context information.
  2. The network itself is not highly novel, but using VLAD-style ideas to mine contextual information for semantic segmentation is a direction worth borrowing.

CCNet ★☆

[Paper] CCNet: Criss-Cross Attention for Semantic Segmentation
[Year] ICCV 2019
[Author] Zilong Huang, Xinggang Wang, Lichao Huang, Chang Huang, Yunchao Wei, Wenyu Liu
[Pages] https://github.com/speedinghzl/CCNet
[Description]

  1. Building on Non-Local (Kaiming He et al.), proposes a recurrent criss-cross attention module used in a residual-plus-attention structure. Compared with Non-Local and many other attention methods, it is much cheaper to compute and performs well.
  2. The number of recurrences is set to 2 because, as the paper points out, two criss-cross attention passes already connect any two positions in the image. Experiments show that the twice-recurrent criss-cross attention indeed captures useful information, and further recurrences bring little additional gain.

ICNet ★

[Paper] ICNet for Real-Time Semantic Segmentation on High-Resolution Images
[Year] ECCV 2018
[Authors] Hengshuang Zhao, Xiaojuan Qi, Xiaoyong Shen, Jianping Shi, Jiaya Jia
[Pages]
https://github.com/hszhao/ICNet
https://github.com/hellochick/ICNet-tensorflow
[Description]

  1. Skimmed. A real-time semantic segmentation method that cascades networks over multi-resolution inputs; one of the representative lightweight models based on multi-resolution feature fusion.
  2. The image is fed at full, 1/2, and 1/4 resolution into three branches. The low-resolution branch has more layers and extracts global information; the high-resolution branches have few layers to save computation. A fusion module combines the features across resolutions, and a loss is computed on each branch's output.

BiSeNet ★☆

[Paper] BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation
[Year] ECCV 2018
[Authors] Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, Nong Sang
[Pages]
https://github.com/ycszen/TorchSeg (3rd party)
https://github.com/ooooverflow/BiSeNet (3rd party)
https://github.com/GeorgeSeif/Semantic-Segmentation-Suite (3rd party)
[Description]

  1. Real-time semantic segmentation; a canonical method that fuses spatial-detail features with contextual semantic features, striking a good balance between speed and accuracy.
  2. Two branches: the spatial path downsamples 8x with three convolutions to preserve spatial detail; the context path uses Xception39 or ResNet18 as the backbone and refines features with an SENet-like attention module. The final fusion module resembles residual attention. An auxiliary loss is used.

ShelfNet ★

[Paper] ShelfNet for Fast Semantic Segmentation
[Year] arXiv 1811
[Authors] Juntang Zhuang, Junlin Yang, Lin Gu, Nicha Dvornek
[Pages] https://github.com/juntang-zhuang/ShelfNet
[Description]

  1. A real-time segmentation network that stacks multiple encoder-decoder structures to get an ensemble-like effect. Good performance; better than BiSeNet at the same speed.

Fast-SCNN ★

[Paper] Fast-SCNN: Fast Semantic Segmentation Network
[Year] arXiv 1902
[Authors] Rudra PK Poudel, Stephan Liwicki, Roberto Cipolla
[Pages]
[Description]

  1. Combines the two-branch and encoder-decoder ideas for real-time semantic segmentation; the overall design is very similar to BiSeNet. Very fast on large images (123 fps at 1024x2048), but accuracy falls short of the state of the art.
  2. Structure: three convolutions downsample by 8x; from there a skip connection preserves spatial detail, while the other branch, the feature extractor, consists of several bottleneck blocks plus pyramid pooling; the features are finally fused by summation. Depthwise separable convolutions are used extensively for speed (see the sketch below).
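
A minimal illustration of a depthwise separable convolution, the building block mentioned in item 2; channel counts are arbitrary.

```python
import torch
import torch.nn as nn

# A per-channel (depthwise) 3x3 convolution followed by a 1x1 pointwise
# convolution, much cheaper than a full 3x3 convolution over all channel pairs.
def depthwise_separable(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch),  # depthwise
        nn.Conv2d(in_ch, out_ch, kernel_size=1),                          # pointwise
    )

x = torch.randn(1, 64, 128, 128)
print(depthwise_separable(64, 128)(x).shape)   # torch.Size([1, 128, 128, 128])
```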

DFANet ★

[Paper] DFANet: Deep Feature Aggregation for Real-Time Semantic Segmentation
[Year] CVPR 2019
[Authors] Hanchao Li, Pengfei Xiong, Haoqiang Fan, Jian Sun
[Pages]
[Description]

  1. A real-time semantic segmentation method from Megvii with a good balance between accuracy and speed; worth following.
  2. Aggregates multi-resolution networks to mine context at different scales; the novelty is aggregating both at the sub-network level and at the sub-stage level (i.e., features inside the networks). The decoder exploits features from every resolution, and the backbone is a slimmed-down Xception.

ShuffleNetV2+DPC ★

[Paper] An efficient solution for semantic segmentation: ShuffleNet V2 with atrous separable convolutions
[Year] arXiv 1902
[Authors] Sercan Turkmen, Janne Heikkila
[Pages] https://github.com/sercant/mobile-segmentation
[Description]

  1. Based on DeepLab with DPC, using ShuffleNet V2 as the backbone.

JPU ★

[Paper] FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation
[Year] arXiv 1903
[Authors] Huikai Wu, Junge Zhang, Kaiqi Huang, Kongming Liang, Yizhou Yu
[Pages]
http://wuhuikai.me/FastFCNProject/
https://github.com/wuhuikai/FastFCN
[Description]

  1. Skimmed. Dilated convolution is computationally expensive; using the idea of joint upsampling, the paper looks for a more efficient way to obtain high-resolution features that can replace dilation.
  2. After a long motivation, the proposed JPU module turns out to be essentially an ASPP-like structure that uses the features of the last three stages rather than one. The method feels a bit disconnected from the preceding narrative, though perhaps I did not fully understand it.
  3. Experimentally, JPU and an FPN structure are very close in both accuracy and speed.

Gated-SCNN ★☆

[Paper] Gated-SCNN: Gated Shape CNNs for Semantic Segmentation
[Year] ICCV 2019
[Authors] Towaki Takikawa, David Acuna, Varun Jampani, Sanja Fidler
[Pages]
https://nv-tlabs.github.io/GSCNN/
[Description]

  1. Proposes a two-branch network: one branch does segmentation and the other does edge detection, and the two are fused at the end to produce the final segmentation.
  2. The edge-detection branch uses gated convolutions, letting the segmentation branch's high-level features guide the lower-level edge features and suppress noise.
  3. Uses dual-task regularization to "exploit the duality between semantic segmentation and semantic boundary prediction".
  4. Introducing gated convolution as a gating mechanism in semantic segmentation is an idea worth borrowing. The method matches DPC on Cityscapes.

CFNet ★☆

[Paper] Co-occurrent Features in Semantic Segmentation
[Year] CVPR 2019
[Authors] Hang Zhang, Han Zhang, Chenguang Wang, Junyuan Xie
[Pages]
[Description]

  1. Mines co-occurrent context information by computing the co-occurrence probability between the target feature and other features. The reported numbers look good.
  2. After CNN feature extraction there are three parts: co-occurrence probability computation, co-occurrent context prior extraction, and global pooling.
  3. This may just be my limited understanding, but the exposition feels a bit unclear, and so does the writing...

DANet ★☆

[Paper] Dual Attention Network for Scene Segmentation
[Year] CVPR 2019
[Authors] Jun Fu, Jing Liu, Haijie Tian, Yong Li, Yongjun Bao, Zhiwei Fang, Hanqing Lu
[Pages] https://github.com/junfu1115/DANet
[Description]

  1. Skimmed. Proposes a dual attention module combining position attention and channel attention, and builds the DANet segmentation network on it with decent results.
  2. Position attention computes the similarity between two pixels along the channel dimension; channel attention computes the similarity between two channels over the spatial dimensions. Both similarities are inner products (see the sketch below).
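
A rough sketch of the two inner-product affinities described in item 2, without the paper's learned projections or residual fusion; shapes are illustrative.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 64, 32, 32)
n, c, h, w = x.shape
flat = x.view(n, c, h * w)                                       # (N, C, HW)

pos_affinity = F.softmax(flat.transpose(1, 2) @ flat, dim=-1)    # (N, HW, HW) pixel-to-pixel
chan_affinity = F.softmax(flat @ flat.transpose(1, 2), dim=-1)   # (N, C, C)  channel-to-channel

pos_out = (flat @ pos_affinity.transpose(1, 2)).view(n, c, h, w)   # re-weighted by position attention
chan_out = (chan_affinity @ flat).view(n, c, h, w)                 # re-weighted by channel attention
```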

Integrated Classification

[Paper] Scene Parsing via Integrated Classification Model and Variance-Based Regularization
[Year] CVPR 2019
[Authors] Hengcan Shi, Hongliang Li, Qingbo Wu, Zichen Song
[Pages] https://github.com/shihengcan/ICM-matcaffe
[Description]

  1. Proposes a two-stage per-pixel classification model for scene parsing: the first stage makes an initial prediction with multiple binary classifiers, and the second stage refines it to correct the classes confused in the first stage.
  2. A variance-based regularization encourages the final inter-class probabilities to differ as much as possible.

ADVENT

[Paper] ADVENT: Adversarial Entropy Minimization for Domain Adaptation in Semantic Segmentation
[Year] CVPR 2019 Oral
[Authors] Tuan-Hung Vu, Himalaya Jain, Maxime Bucher, Matthieu Cord, Patrick Pérez
[Pages] https://github.com/valeoai/ADVENT
[Description]

SwiftNet

[Paper] In Defense of Pre-trained ImageNet Architectures for Real-time Semantic Segmentation of Road-driving Images
[Year] CVPR 2019
[Authors] Marin Oršić, Ivan Krešo, Siniša Šegvić, Petra Bevandić
[Pages] https://github.com/orsic/swiftnet
[Description]

Panoptic Segmentation

DeeperLab ★

[Paper] DeeperLab: Single-Shot Image Parser
[Year] CVPR 2019
[Authors] Tien-Ju Yang, Maxwell D. Collins, Yukun Zhu, Jyh-Jing Hwang, Ting Liu, Xiao Zhang, Vivienne Sze, George Papandreou, Liang-Chieh Chen
[Pages]
[Description]

  1. Skimmed. Another installment of the *-Lab series; it borrows backbone design ideas from Xception, MobileNet, and similar networks, and uses two branches for semantic and instance segmentation respectively.

Foreground-background Segmentation

Pixel Objectness ★

[Paper] Pixel Objectness: Learning to Segment Generic Objects Automatically in Images and Videos
[Year] TPAMI 2018
[Authors] Bo Xiong, Suyog Jain, Kristen Grauman
[Pages] http://vision.cs.utexas.edu/projects/pixelobjectness/
[Description]

  1. Skimmed. Coins the term "pixel objectness" for producing a binary generic-object segmentation of images and videos. Proposes a CNN-based model that segments objects in images and videos and is robust even to unseen objects.
  2. A large part of the paper argues why the approach is reasonable, e.g., that an ImageNet-pretrained model already contains category information; did not read that part closely.

Transfer Related

Image-level to Pixel-level Labeling

[Paper] From Image-level to Pixel-level Labeling with Convolutional Networks
[Year] CVPR 2015
[Authors] Pedro O. Pinheiro, Ronan Collobert
[Pages]
[Description]

  1. A weakly supervised method that trains a segmentation model from image-level class labels: each class's segmentation map is aggregated by log-sum-exp into that class's classification probability, and minimizing the classification loss optimizes the segmentation model (see the sketch below).
  2. At inference time, two segmentation priors suppress false positives: an image-level prior (weighting the segmentation by the classification probabilities) and a smoothing prior (superpixels, bounding-box candidates, unsupervised MCG segmentation).
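
A small sketch of log-sum-exp aggregation as in item 1: per-class score maps are pooled into image-level scores, a smooth approximation of max pooling. The sharpness `r` and all shapes are illustrative assumptions.

```python
import math
import torch

scores = torch.randn(2, 20, 32, 32)   # per-class segmentation scores (batch, classes, H, W)
r = 5.0                               # sharpness; larger r approaches max pooling
flat = scores.flatten(2)              # (batch, classes, H*W)

# s_k = (1/r) * log( mean_ij exp(r * s_k(i, j)) )
image_scores = (torch.logsumexp(r * flat, dim=2) - math.log(flat.shape[2])) / r
# image_scores: (batch, classes), fed to an image-level classification loss
```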

BoxSup ★

[Paper] BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation
[Year] ICCV 2015
[Authors] Jifeng Dai, Kaiming He, Jian Sun
[Pages]
[Description]

  1. Weakly supervised semantic segmentation: bounding boxes combined with region proposals (MCG) generate initial ground-truth masks, and the segmentation results and the masks are then updated alternately.

Mix-and-Match ★

[Paper] Mix-and-Match Tuning for Self-Supervised Semantic Segmentation
[Year] AAAI 2018
[Authors] Xiaohang Zhan, Ziwei Liu, Ping Luo, Xiaoou Tang, Chen Change Loy
[Pages] http://mmlab.ie.cuhk.edu.hk/projects/M&M/
[Description]

  1. Self-supervision has two stages, a proxy stage and a fine-tuning stage: a label-free proxy task (e.g., image colorization) is used for pre-training to learn some semantic features, and a small amount of labeled data is then used for fine-tuning. Because of the semantic gap between the proxy task and the target task, self-supervised methods perform clearly worse than supervised ones.
  2. The paper proposes a "mix-and-match" strategy that uses a small amount of labeled data to boost a self-supervised pre-trained network. Mix step: randomly sample patches from different images. Match step: build a graph on the fly during training and generate triplets of anchor, positive, and negative patches; the resulting triplet loss encourages patches of the same class to be more similar and patches of different classes to differ more.
  3. My understanding of self-supervision is still shallow; reading the code would help. The hypercolumn-based segmentation part does not seem to be explained in detail in the paper and is worth revisiting.

Bidirectional Learning

[Paper] Bidirectional Learning for Domain Adaptation of Semantic Segmentation
[Year] CVPR 2019
[Authors] Yunsheng Li, Lu Yuan, Nuno Vasconcelos
[Pages] https://github.com/liyunsheng13/BDL
[Description]

  1. Proposes a bidirectional domain adaptation method for semantic segmentation that works without any target-domain ground truth; the experimental results look good.
  2. The usual pipeline learns two separate networks: first a GAN that translates source images toward the target domain to shrink the domain gap, and then a segmentation network trained on the translated images. "Bidirectional" here means the two stages interact to improve each other. A self-supervised step is also proposed that treats high-confidence target predictions as ground truth. The losses are described clearly in the paper.

Hierarchical Region Selection

[Paper] Not All Areas Are Equal: Transfer Learning for Semantic Segmentation via Hierarchical Region Selection
[Year] CVPR 2019 Oral
[Authors] Ruoqi Sun, Xinge Zhu, Chongruo Wu, Chen Huang, Jianping Shi, Lizhuang Ma
[Pages]
[Description]

  1. Proposes a transfer-learning method for semantic segmentation that mines, at the pixel, region, and image levels, the parts of the source samples most similar to the target domain and uses them for training, to close the source-target gap.
  2. Three weight maps are learned, representing source-target similarity at the pixel, region, and image levels; their average gives the final weight map, which weights each pixel when computing the source loss. A generative adversarial network attached to the encoder features further assists domain adaptation.
  3. Personally, not much of it feels exciting for an oral paper. The experiments use VGG and FCN backbones and compare only against a few transfer-learning methods; the gap to current non-transfer SOTA methods is large, and the GAN does not appear to improve performance.

SPNet ★

[Paper] Semantic Projection Network for Zero- and Few-Label Semantic Segmentation
[Year] CVPR 2019
[Authors] Yongqin Xian, Subhabrata Choudhury, Yang He, Bernt Schiele, Zeynep Akata
[Pages] https://github.com/subhc/SPNet
[Description]

  1. A word-embedding-based semantic segmentation method for zero-label and few-label settings.
  2. A CNN produces an embedding for every pixel; its inner product with a precomputed class-prototype matrix is computed, and the most similar class is assigned to the pixel (see the sketch below). The core ingredient is the word embeddings, obtained here with existing methods such as word2vec. At inference, the projection simply uses the embedding matrix of whatever classes are of interest, seen or unseen.
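
A rough sketch of the projection step described in item 2: per-pixel embeddings are matched against a fixed class-prototype matrix by inner product. The 300-dimensional embeddings and 5 classes are placeholders, not the paper's setting.

```python
import torch

pixel_emb = torch.randn(1, 300, 64, 64)   # CNN output: one embedding per pixel
class_protos = torch.randn(5, 300)        # e.g. word2vec vectors of 5 class names

logits = torch.einsum('nchw,kc->nkhw', pixel_emb, class_protos)  # inner products
prediction = logits.argmax(dim=1)         # (1, 64, 64) per-pixel class index
```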

Knowledge Distillation

Knowledge Adaptation ★☆

[Paper] Knowledge Adaptation for Efficient Semantic Segmentation
[Year] CVPR 2019
[Authors] Tong He, Chunhua Shen, Zhi Tian, Dong Gong, Changming Sun, Youliang Yan
[Pages]
[Description]

  1. Skimmed. Proposes a knowledge-distillation method for semantic segmentation; improves mIoU by about 2 points on a MobileNetV2 base.
  2. Rather than making the student's features mimic the teacher's directly, an autoencoder transforms the teacher's features into a more compact representation, which the student is trained to match.
  3. Arguing that small networks are weak at capturing long-range dependencies, the paper adds an affinity distillation module in the Non-Local spirit: pairwise inner products between pixels form an affinity matrix, and the student's affinity matrix is trained to match the teacher's.

Other Interesting Papers

COB ★

[Paper] Convolutional Oriented Boundaries
[Year] ECCV 2016
[Author] K.K. Maninis, J. Pont-Tuset, P. Arbeláez, L. Van Gool
[Pages] http://www.vision.ee.ethz.ch/~cvlsegmentation/cob/index.html
[Description]

  1. Derives segmentation from boundary probabilities. The overall pipeline follows Berkeley's gPb-owt-ucm, with the probability-map stage replaced by a CNN.
  2. The CNN is a multi-scale model that predicts coarse and fine boundary probabilities in 8 orientations.
  3. The UCM stage introduces a sparse boundaries representation that speeds things up.

Traditional Classical Methods

gPb-owt-ucm ★★★

[Paper] Contour Detection and Hierarchical Image Segmentation
[Year] TPAMI 2011
[Authors] Pablo Arbelaez, Michael Maire, Charless Fowlkes, Jitendra Malik
[Pages] https://www2.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/resources.html
[Reference] http://blog.csdn.net/nature_XD/article/details/53375344?locationNum=9&fps=1
[Description]

  1. gPb (global probability of boundary): composed of mPb and sPb.
  2. OWT: recomputes gPb for the pixels on the arcs produced by the watershed transform, according to their orientation.
  3. UCM: seems to be roughly MST-style clustering?
  4. Have not fully understood sPb yet.

Datasets

VOC2012
MSCOCO
ADE20K
MIT Scene Parsing
Cityscapes

3D

PartNet

Leaderboards

PASCAL VOC
ILSVRC2016
Cityscapes

Sources-Lists

https://handong1587.github.io/deep_learning/2015/10/09/segmentation.html
https://github.com/mrgloom/awesome-semantic-segmentation
https://blog.csdn.net/zziahgf/article/details/72639791

About

Notes on semantic segmentation papers up to 2019, mainly covering supervised methods, image segmentation, and lightweight segmentation. No longer updated.
