rese1f/awesome-cvpr2022

awesome-cvpr2022

workshop, tutorial, oral, and poster with notes in cvpr2022

Wenhao (Reself) Chai

Undergraduate, UIUC

Workshops

CVPR workshops, June 19-20.

full schedule

Machine Learning with Synthetic Data (SyntML) link

Synthetic data are labeled data generated with computer graphics. They are cheap, clean, and richly labeled. Keywords: domain mismatch, diversity

  1. human synthesis Google

    • Procedural face generation
      template + features paradigm
      features can be: identity, expression, pose
    • Hair and clothing
    • Environment
    • Render (Blender)
  2. synthetic data & simulation Nvidia

    • graphics geometry + texture by a distribution
    • mixed reality light estimation + AR
    • generative models GAN / diffusion model
  3. crossing the domain gap with synthetic data Datagen

    • why synthetic data?
      • pixel-accurate labels
      • rich annotations
      • full control
    • types of gap
      • photorealism gap
      • pose gap
      • augmentation gap
      • annotation gap
    • styleGAN
      • cascade "parameter w class" (also like template + features)
      • inversion / editing (good sensitivity)
    • mixing synthetic data with real data (when real data is limited) can achieve better performance
    • address domain gap
      • photorealism
      • label adaptation
      • add noise
      • global scene parameter distribution (lights, camera, pose)
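One way to read the last bullet: rather than fixing lights, camera, and pose, draw them from distributions for every rendered sample. The sketch below is a hypothetical sampler; the `SceneParams` fields, the ranges, and the choice of distributions are illustrative assumptions, not values from the talk.

```python
import random
from dataclasses import dataclass


@dataclass
class SceneParams:
    light_intensity: float   # arbitrary units, assumed range
    camera_yaw_deg: float    # camera rotation around the subject
    head_pitch_deg: float    # subject pose


def sample_scene(rng: random.Random) -> SceneParams:
    """Draw one set of global scene parameters (all ranges are illustrative)."""
    return SceneParams(
        light_intensity=rng.uniform(0.2, 2.0),
        camera_yaw_deg=rng.uniform(-45.0, 45.0),
        # Gaussian around a frontal pose, clamped to a plausible range.
        head_pitch_deg=max(-30.0, min(30.0, rng.gauss(0.0, 10.0))),
    )


rng = random.Random(0)
scenes = [sample_scene(rng) for _ in range(1000)]
```

Each `SceneParams` would then be handed to the renderer; widening or reshaping these distributions is one lever for closing the pose and photorealism gaps listed above.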

International Challenge on Activity Recognition (ActivityNet) link

task: real-time online untrimmed security video action detection
object: single / multi / interaction
pipeline:

  • detection
  • background removal
  • tracking (IOU-based)
  • classification
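The IOU-based tracking step can be sketched as follows: boxes from the current frame are linked to existing tracks by intersection-over-union. This is a minimal greedy version under assumed conventions (boxes as `(x1, y1, x2, y2)`, a hypothetical threshold of 0.3), not the pipeline used by any challenge entry.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, threshold=0.3):
    """Greedy matching: accept (track, detection) pairs in order of decreasing IoU."""
    pairs = sorted(
        ((iou(box, det), tid, j)
         for tid, box in tracks.items()
         for j, det in enumerate(detections)),
        reverse=True,
    )
    matches, used_tracks, used_dets = {}, set(), set()
    for score, tid, j in pairs:
        if score < threshold:
            break  # all remaining pairs overlap even less
        if tid in used_tracks or j in used_dets:
            continue
        matches[tid] = j
        used_tracks.add(tid)
        used_dets.add(j)
    return matches


tracks = {1: (0, 0, 10, 10), 2: (20, 20, 30, 30)}
detections = [(21, 21, 31, 31), (1, 0, 11, 10), (50, 50, 60, 60)]
matches = associate(tracks, detections)   # track id -> detection index
```

Detections left unmatched (index 2 here) would typically start new tracks.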

related concept:

  • domain adaptation
  • overlapping spatio-temporal
  • class imbalance
  • multi-label
  • generalization performance

2nd Workshop and Challenge on Computer Vision in the Built Environment for the Design, Construction, and Operation of Buildings link

  • task: building modeling, from point clouds to a room map
  • key tech: semantic segmentation of point clouds

Workshop on Attention and Transformers in Vision link

  1. Visual Attention with Recurrency and Sparsity
  2. BoxeR: Box-Attention for 2D and 3D Transformers
    • 2D / 3D object detection or segmentation
    • query: reference window
    • key: learnable relative region
    • multi-scale feature map
  3. Depth Estimation with Simplified Transformers
    • FC -> 1x1 Conv.
  4. M2F3D: Mask2Former for 3D Instance Segmentation
    • top-down / bottom-up
    • sparse Conv.
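The "FC -> 1x1 Conv." note under item 3 rests on a simple equivalence: a 1x1 convolution applies the same fully connected layer to the channel vector of every pixel. The toy check below uses plain lists with an assumed `[C][H][W]` layout; the weights are made up for illustration.

```python
def fc(x, weight, bias):
    """Fully connected layer on one feature vector: y = W x + b."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def conv1x1(fmap, weight, bias):
    """1x1 convolution; fmap is [C_in][H][W], weight is [C_out][C_in]."""
    c_in, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [
        [[bias[o] + sum(weight[o][c] * fmap[c][i][j] for c in range(c_in))
          for j in range(w)]
         for i in range(h)]
        for o in range(len(weight))
    ]


fmap = [[[1.0, 2.0], [3.0, 4.0]],       # channel 0
        [[0.5, 0.0], [1.0, -1.0]]]      # channel 1
weight = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]   # 3 outputs, 2 inputs
bias = [0.0, 1.0, -1.0]

out = conv1x1(fmap, weight, bias)       # layout [C_out][H][W]
# Same numbers, obtained by running the FC layer on each pixel's channel vector.
per_pixel = [[fc([fmap[c][i][j] for c in range(2)], weight, bias)
              for j in range(2)] for i in range(2)]   # layout [H][W][C_out]
```

The convolutional form keeps the spatial layout intact, which is usually what makes it the more convenient choice on deployment hardware.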

5th MUltimodal Learning and Applications Workshop (MULA) link

Learning to Navigate from Vision and Language

  • humans use semantic priors to understand and navigate unseen environments
  • RL bottlenecks to progress on semantic navigation: scalability, diversity
  • no need to learn a policy -> greedy

7th BMTT Workshop on Benchmarking Multi-Target Tracking: How Far Can Synthetic Data Take us? link

L3D-IVU: Workshop on Learning with Limited Labelled Data for Image and Video Understanding link

  • Low-Shot Scene Decomposition via Reconstruction
    • featurize 3D scene behind the image
    • fuse information from range sensors
    • RGB rendering is useful pre-training for detection
    • continuous 3D feature maps with implicit functions
    • unsupervised detection: where and what, decouple these
    • unsupervised 3D segmentation via reconstruction loss

Tutorials

CVPR tutorials, June 19-20.

Denoising Diffusion-based Generative Modeling: Foundations and Applications link

  • kinds of diffusion model
    • momentum-based
    • energy-based
    • latent-space (with pretrained VAE): faster and simpler
    • distillation (merge steps)
    • discrete state diffusion model
  • high-resolution
    • condition form: scalar / image / text
    • quality-diversity trade-off
    • cascade generation with super-resolution method
  • application
    • semantic segmentation
    • image editing
    • adversarial robustness (purified images)
    • video generation
      • types
        • all frames
        • past frames
        • future frames
        • interpolation
      • tips: training with different types of mask / use time position encodings to encode times
      • backbone: 3D Conv. / 2D Conv. + Att. (ignore initially when train)
      • long-term: generate a frame far away and then interpolate
    • medical imaging
      reconstruct the original image from sparse measurements
      high-level idea: use a model pretrained on a clean dataset as a "prior", then guide synthesis conditioned on sparse observations
    • 3D shape generation
      through point clouds
  • future trend
    • why diffusion models perform better?
    • how can we improve VAE / flow from diffusion model?
    • sampling from diffusion model is still slow
    • diffusion models can be seen as latent variable models whose latents carry no semantics; what if they did?
    • can diffusion models help discriminative applications?
    • what are the best network architectures for diffusion models besides UNet?
    • data modalities beyond 2D images
    • controllable generation
    • replacing GANs with diffusion models in some applications
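As background for the notes above, the forward (noising) process of a denoising diffusion model has a closed form: x_t can be sampled directly from x_0 with q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I), where abar_t is the cumulative product of (1 - beta_s). The sketch below assumes a linear schedule from 1e-4 to 0.02 over 1000 steps (the values commonly used with DDPM); function names are illustrative.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule (assumed DDPM-style defaults)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas):
    """Cumulative products of (1 - beta_t)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0, t, abar, rng):
    """Sample x_t ~ N(sqrt(abar_t) * x0, (1 - abar_t) * I)."""
    scale, sigma = math.sqrt(abar[t]), math.sqrt(1.0 - abar[t])
    return [scale * v + sigma * rng.gauss(0.0, 1.0) for v in x0]


betas = linear_betas(1000)
abar = alpha_bars(betas)
rng = random.Random(0)
x0 = [1.0, -1.0, 0.5]
x_early = q_sample(x0, 10, abar, rng)    # mostly signal
x_late = q_sample(x0, 999, abar, rng)    # almost pure Gaussian noise
```

The slow sampling mentioned under future trends comes from the reverse direction: generation has to undo these steps one (or a few) at a time with a learned denoiser, which this sketch does not include.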

Recent Advances in Vision-and-Language Pre-training link

  • unifying text and image
  • avoiding explicit detection module
  • high resolution computing cost
  • coarse to fine two-stage VLP
  • fusion in the backbone

Beyond Convolutional Neural Networks link

  • DETR: DEtection TransfoRmer
    • idea: pose the task directly as set prediction, using a transformer encoder-decoder
    • bipartite matching
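Bipartite matching here means assigning each ground-truth object to exactly one predicted slot so that the total matching cost is minimal. The toy version below brute-forces all assignments with `itertools.permutations`, which is only viable for tiny sets; real implementations use the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`). The cost values are made up.

```python
from itertools import permutations


def match(cost):
    """Return, for each target i, the prediction index assigned to it.

    cost[i][j] is the cost of assigning prediction j to target i.
    Brute force over permutations: O(n!), for illustration only.
    """
    n_targets, n_preds = len(cost), len(cost[0])
    best = min(
        permutations(range(n_preds), n_targets),
        key=lambda perm: sum(cost[i][j] for i, j in enumerate(perm)),
    )
    return list(best)


# 2 ground-truth objects, 3 predicted slots; the spare slot is left unmatched.
cost = [
    [0.9, 0.1, 0.8],
    [0.2, 0.7, 0.6],
]
assignment = match(cost)
```

Slots that receive no target are trained to predict "no object", which is what lets the model emit a set without duplicate-removal post-processing.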

Evaluating Models Beyond the Textbook: Out-of-distribution and Without Labels link

  • robustness encompasses a broad range of phenomena (adv. examples, corruptions, nat. dist shift, etc.)
  • some forms of robustness are currently orthogonal
  • consistent trends across natural distribution shifts -> need more fine-grained understanding of different robustness notions.
  • training data plays a key role in creating broadly robust models (e.g., CLIP). -> How do we construct training sets that enable broadly reliable models?
  • very large improvements in OOD robustness

Orals

Segmentation, Grouping and Shape Analysis

1. Semantic-Aware Domain Generalized Segmentation link

  • semantic-aware normalization adopts a multi-branch normalization strategy, aiming to transform the input feature map into category-level normalized features that are semantic-aware and center-aligned.

2. Pointly-Supervised Instance Segmentation link

@Bowen Cheng

  • training with point-based annotation
  • Implicit PointRend

3. Adaptive Early-Learning Correction for Segmentation From Noisy Annotations link

  • how to define the early-training stage without ground truth?
  • how to utilize noisy pseudo-labels?

4. Unsupervised Hierarchical Semantic Segmentation With Multiview Cosegmentation and Clustering Transformers link

Video Analysis & Understanding

5. Self-supervised Video Transformer link

6. Dual-AI: Dual-Path Actor Interaction Learning for Group Activity Recognition link

see the notes https://reself-c.github.io/DualAI

3D From Single Images

7. Tracking People by Predicting 3D Appearance, Location and Pose link

Transfer / Low-Shot / Long-Tail Learning

8. OoD-Bench: Quantifying and Understanding Two Dimensions of Out-of-Distribution Generalization link

  • two dimensions of distribution shift
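    • diversity shift -> features seen at test time are absent from training (e.g. new styles or environments)
    • correlation shift -> the spurious feature-to-label correlation changes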

9. Robust Fine-Tuning of Zero-Shot Models link

  • weight-space ensemble of the fine-tuned model and the zero-shot model (linear interpolation)
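A weight-space ensemble interpolates the parameters themselves rather than the predictions: theta = (1 - alpha) * theta_zero_shot + alpha * theta_fine_tuned. A minimal sketch, assuming both models share one architecture and their weights are stored as name-to-list dictionaries (the parameter names below are made up):

```python
def interpolate_weights(zero_shot, fine_tuned, alpha):
    """theta = (1 - alpha) * theta_zero_shot + alpha * theta_fine_tuned."""
    if zero_shot.keys() != fine_tuned.keys():
        raise ValueError("both models must share the same parameter names")
    return {
        name: [(1.0 - alpha) * z + alpha * f
               for z, f in zip(zero_shot[name], fine_tuned[name])]
        for name in zero_shot
    }


zero_shot = {"layer.weight": [0.0, 2.0], "layer.bias": [1.0]}
fine_tuned = {"layer.weight": [1.0, 4.0], "layer.bias": [3.0]}
mixed = interpolate_weights(zero_shot, fine_tuned, alpha=0.5)
```

`alpha = 0` recovers the zero-shot model and `alpha = 1` the fine-tuned one; intermediate values trade in-distribution accuracy against robustness under distribution shift.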

10. Learning Distinctive Margin Toward Active Domain Adaptation link

  • data sampling strategies
    • classic uncertainty sampling
    • diversity sampling
    • multi-index evaluation
    • adversarial learning
    • ...
    • margin sampling (this work)
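For orientation, classic margin sampling picks the unlabeled samples whose top two class probabilities are closest, i.e. the ones nearest a decision boundary. The sketch below shows only that generic criterion, not the paper's specific margin loss or query function; the probabilities are made up.

```python
def margin(probs):
    """Gap between the two largest class probabilities."""
    top1, top2 = sorted(probs, reverse=True)[:2]
    return top1 - top2


def select_for_labeling(predictions, budget):
    """Return indices of the `budget` samples with the smallest margin."""
    ranked = sorted(range(len(predictions)), key=lambda i: margin(predictions[i]))
    return ranked[:budget]


predictions = [
    [0.90, 0.05, 0.05],   # confident
    [0.40, 0.35, 0.25],   # ambiguous
    [0.50, 0.49, 0.01],   # very ambiguous
    [0.70, 0.20, 0.10],
]
picked = select_for_labeling(predictions, budget=2)
```

Here samples 2 and 1 are sent to the annotator first, since the model can barely separate their top two classes.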

11. DINE: Domain Adaptation From Single and Multiple Black-Box Predictors link

  • BB-SFDA: only logits

12. Source-Free Object Detection by Learning To Overlook Domain Style link

  • augmentation + alignment

13. Exact Feature Distribution Matching for Arbitrary Style Transfer and Domain Generalization link

14. Causality Inspired Representation Learning for Domain Generalization link

15. Learning What Not To Segment: A New Perspective on Few-Shot Segmentation link

16. Towards Fewer Annotations: Active Learning via Region Impurity and Prediction Uncertainty for Domain Adaptive Semantic Segmentation link

  • pretrain + active-learning

17. ADeLA: Automatic Dense Labeling With Attention for Viewpoint Shift in Semantic Segmentation link

  • viewpoint change causes a prior shift for scene parsing

Image & Video Synthesis and Generation

18. Dataset Distillation by Matching Training Trajectories link

  • compress the dataset from 50k samples to 10 by matching the model's parameter trajectories during training
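The matching objective can be sketched as a normalized distance in parameter space: a student trained for a few steps on the synthetic set should end up where an expert, trained on the real data, ends up after many more steps. The version below assumes flattened parameter vectors and uses toy numbers; producing the student and expert checkpoints (and differentiating through the student's updates) is outside its scope.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two flat parameter vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def trajectory_matching_loss(student_end, expert_start, expert_end):
    """|student_end - expert_end|^2 / |expert_start - expert_end|^2."""
    return sq_dist(student_end, expert_end) / sq_dist(expert_start, expert_end)


expert_start = [0.0, 0.0]    # shared starting checkpoint
expert_end = [1.0, 1.0]      # expert after many steps on the real data
student_end = [0.9, 1.1]     # student after a few steps on the synthetic data
loss = trajectory_matching_loss(student_end, expert_start, expert_end)
```

Dividing by the distance the expert itself travelled keeps the loss comparable across early and late segments of training, where parameter movement differs a lot in scale.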

Deep Learning Architectures & Techniques

19. Controllable Dynamic Multi-Task Architectures link

  • select the path and weight for a completed multi-task network architecture

Human Pose Estimation & Tracking, Localization, and Object Pose Estimation

20. Temporal Feature Alignment and Mutual Information Maximization for Video-Based Human Pose Estimation link

21. PoseTriplet: Co-Evolving 3D Human Pose Estimation, Imitation, and Hallucination Under Self-Supervision link

22. Generalizable Human Pose Triangulation link

Posters

Segmentation, Grouping and Shape Analysis

1. Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels link

2. Deep Hierarchical Semantic Segmentation link

3. Amodal Segmentation Through Out-of-Task and Out-of-Distribution Generalization With a Bayesian Model link

  • trained with bounding box and output is visible mask
  • out-of-task and out-of-distribution generalization with a Bayesian generative model

4. SWEM: Towards Real-Time Video Object Segmentation With Sequential Weighted Expectation-Maximization link

  • use point feature memory

5. Accelerating Video Object Segmentation With Compressed Video link

  • use residual between frames
  • only inference on key frame and propagate the others by residual

6. High Quality Segmentation for Ultra High-Resolution Images link

  • calculate the relationship between the coordinate of low-resolution feature and ultra high-resolution target to get position information.

7. Pin the Memory: Learning To Generalize Semantic Segmentation link

  • store features as memory for inference on other domains
  • closed-set assumption, no label mismatch

8. Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity link

  • learn a pairwise affinity between pixels
  • a data augmentation strategy
  • learn a binary decision first, then classify (is it an object at all?)

9. Weakly Supervised Semantic Segmentation Using Out-of-Distribution Data link

10. Multimodal Material Segmentation link

  • material segmentation (may close to texture but not so semantic)

11. Semi-Supervised Learning of Semantic Correspondence With Pseudo-Labels link

Machine Learning

12. A Re-Balancing Strategy for Class-Imbalanced Classification Based on Instance Difficulty link

  • resampling and reweighting for long-tail datasets
  • class balance and hardness balance
  • define a difficulty for classification

13. How Much More Data Do I Need? Estimating Requirements for Downstream Tasks link

  • estimate the amount of data needed
  • most regression functions significantly over- or under-estimate how much data is needed
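To make "regression function" concrete: a common choice is to fit a power law, error = a * n^b, to a few (dataset size, validation error) points and extrapolate to the target error. The sketch below does this with a log-log least-squares fit on a synthetic, noise-free curve. It illustrates the kind of estimator the paper evaluates, not the paper's own method, and real learning curves are far less well behaved.

```python
import math


def fit_power_law(sizes, errors):
    """Least-squares fit of log(error) = log(a) + b * log(n)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = math.exp(my - b * mx)
    return a, b


def samples_needed(a, b, target_error):
    """Invert error = a * n**b for n."""
    return (target_error / a) ** (1.0 / b)


sizes = [1000, 2000, 4000, 8000]
errors = [2.0 * n ** -0.5 for n in sizes]   # synthetic curve, no noise
a, b = fit_power_law(sizes, errors)
n_needed = samples_needed(a, b, target_error=0.01)
```

Because the estimate is an extrapolation far beyond the measured sizes, small errors in the fitted exponent translate into large errors in `n_needed`, which is consistent with the over- and under-estimation noted above.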

14. Deep Safe Multi-view Clustering: Reducing the Risk of Clustering Performance Degradation Caused by View Increase link

15. Out-of-distribution Generalization with Causal Invariant Transformations link

Deep Learning Architectures & Techniques

16. Single-Domain Generalized Object Detection in Urban Scene via Cyclic-Disentangled Self-Distillation link

17. Revisiting Weakly Supervised Pre-Training of Visual Perception Models link

  • multi-label (hashtags) classification
  • target is a uniform probability distribution over all hashtags of an image
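The uniform target can be written out directly: an image with k hashtags gets probability 1/k on each of them and 0 elsewhere, and the model is trained with softmax cross-entropy against that soft target. A minimal sketch with made-up class indices and logits, assuming the hashtag ids of an image are unique:

```python
import math


def hashtag_target(num_classes, hashtag_ids):
    """Soft target: probability 1/k on each of the k hashtags of an image."""
    target = [0.0] * num_classes
    for idx in hashtag_ids:
        target[idx] = 1.0 / len(hashtag_ids)
    return target


def soft_cross_entropy(logits, target):
    """Cross-entropy between a soft target and softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return -sum(t * (v - log_z) for t, v in zip(target, logits))


target = hashtag_target(num_classes=5, hashtag_ids=[1, 3])
loss = soft_cross_entropy([0.0, 2.0, 0.0, 2.0, 0.0], target)
uninformed = soft_cross_entropy([0.0] * 5, target)   # equals log(5)
```

Logits that favor the image's hashtags give a lower loss than uniform logits, as expected.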

18. Failure Modes of Domain Generalization Algorithms link

19. Learning Part Segmentation Through Unsupervised Domain Adaptation From Synthetic Vehicles link

Vision Applications & Systems

20. Large-Scale Pre-Training for Person Re-Identification With Noisy Labels link

Recognition: Detection, Categorization, Retrieval

21. Efficient Video Instance Segmentation via Tracklet Query and Proposal link

  • both tracklet and appearance query
  • both bounding box and mask output

22. UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection link (https://arxiv.org/abs/2111.08644)

3D From Single Images

23. Learning To Estimate Robust 3D Human Mesh From In-the-Wild Crowded Scenes link

  • use 2D pose to reduce domain gap
  • self-updated 2D pose from off-the-shelf model

24. Exploiting Pseudo Labels in a Self-Supervised Learning Framework for Improved Monocular Depth Estimation link

  • augmentation + consistency

Low-Level Vision

25. Multi-Scale Memory-Based Video Deblurring link

  • multi-scale
  • memory-based: memorize sharp features and use them for inference on blurry frames

Behavior Analysis

25. Self-Supervised Keypoint Discovery in Behavioral Videos link

  • self-supervised pretraining + downstream tasks

26. GLASS: Geometric Latent Augmentation for Shape Spaces link

Vision & Language

27. Video-Text Representation Learning via Differentiable Weak Temporal Alignment link

  • pretraining through multimodal alignment, like a video version of CLIP

28. End-to-End Referring Video Object Segmentation With Multimodal Transformers link

  • multimodal transformer
  • processes all frames in parallel instead of sequentially with a memory bank

29. Are Multimodal Transformers Robust to Missing Modality? link

30. Robust Cross-Modal Representation Learning With Progressive Self-Distillation link

31. Multimodal Dynamics: Dynamical Fusion for Trustworthy Multimodal Classification link

Video Analysis & Understanding

32. MLP-3D: A MLP-Like 3D Architecture With Grouped Time Mixing link

33. Coarse-To-Fine Feature Mining for Video Semantic Segmentation link

34. The DEVIL Is in the Details: A Diagnostic Evaluation Benchmark for Video Inpainting link

35. YouMVOS: An Actor-Centric Multi-Shot Video Object Segmentation Dataset link

36. Large-Scale Video Panoptic Segmentation in the Wild: A Benchmark link

  • changeable receptive field for attention

Transfer / Low-Shot / Long-Tail Learning

37. Which Model To Transfer? Finding the Needle in the Growing Haystack link

  • pretrained model selection for downstream tasks

38. Task2Sim: Towards Effective Pre-Training and Transfer From Synthetic Data link

  • use RL to control the parameter of synthetic data generator

Pose Estimation & Tracking

39. MetaPose: Fast 3D Pose From Multiple Views Without 3D Supervision link

40. Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation link

41. PoseTrack21: A Dataset for Person Search, Multi-Object Tracking and Multi-Person Pose Tracking link

42. DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion link

  • train a refine net with only 2D gt

43. DiffPoseNet: Direct Differentiable Camera Pose Estimation link

Recognition: Detection, Categorization, Retrieval

44. Multi-Granularity Alignment Domain Adaptation for Object Detection link

45. Cross-Domain Adaptive Teacher for Object Detection link

  • pixel-/instance-/category-level discrimination

Self-, Semi-, Meta-, & Unsupervised Learning

46. DASO: Distribution-Aware Semantics-Oriented Pseudo-Label for Imbalanced Semi-Supervised Learning link

47. Unbiased Teacher v2: Semi-Supervised Object Detection for Anchor-Free and Anchor-Based Detectors link

48. Semi-Supervised Semantic Segmentation With Error Localization Network link

49. Debiased Learning From Naturally Imbalanced Pseudo-Labels link

  • similar to entropy filter

Image & Video Synthesis and Generation

50. Multi-View Consistent Generative Adversarial Networks for 3D-Aware Image Synthesis link

Datasets and Evaluation

51. SHIFT: A Synthetic Driving Dataset for Continuous Multi-Task Domain Adaptation link

52. Replacing Labeled Real-Image Datasets With Auto-Generated Contours link
