Yuwei (Evelyn) Zhang edited this page Jun 27, 2024 · 28 revisions

Patterns - Dimitris

review Chowdhury et al.

Vision

SimCLR - A Simple Framework for Contrastive Learning of Visual Representations ICML 2020 | code

  • contrastive learning with cropping and color as augmentations
  • maximizes agreement between differently transformed views of the same sample via a contrastive cosine similarity loss in the latent space
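A minimal NumPy sketch of the "agreement between two views" idea, written as an NT-Xent style loss over cosine similarities. The function name, the default temperature, and the assumption that `z1[i]` and `z2[i]` are projections of two augmentations of the same sample are illustrative choices, not taken from the official code.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent style loss for two batches of projected views, each of shape (N, D)."""
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0).astype(float)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit norm, so dot product = cosine
    sim = z @ z.T / temperature                        # (2N, 2N) scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                     # a view is never its own candidate
    # Row i's positive is the other view of the same sample; all other rows are negatives.
    positive = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(2 * n), positive].mean())
```

The loss is small when each view is closest to its partner and grows when the pairing is broken, for example by shuffling `z2`.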

BYOL - Bootstrap Your Own Latent A New Approach to Self-Supervised Learning NeurIPS 2020

  • not contrastive: trained without negative pairs
  • iteratively bootstraps the representations, using an exponential moving average (EMA) of the online network as the target network
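A rough sketch of the two ingredients noted above: the target network as an exponential moving average of the online network, and a loss that needs no negatives. Parameters are plain dicts of arrays here, and `tau=0.99` is an assumed value for illustration only.

```python
import numpy as np

def ema_update(target_params, online_params, tau=0.99):
    """New target parameters: tau * target + (1 - tau) * online, per parameter."""
    return {name: tau * target_params[name] + (1.0 - tau) * online_params[name]
            for name in target_params}

def byol_regression_loss(online_prediction, target_projection):
    """Mean of 2 - 2 * cosine similarity; the target side is treated as a constant."""
    p = online_prediction / np.linalg.norm(online_prediction, axis=1, keepdims=True)
    t = target_projection / np.linalg.norm(target_projection, axis=1, keepdims=True)
    return float(np.mean(2.0 - 2.0 * np.sum(p * t, axis=1)))
```

In a real training loop only the online network receives gradients; `ema_update` would be called after each optimizer step.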

Barlow Twins - Barlow Twins: Self-Supervised Learning via Redundancy Reduction PMLR | code *

Contrastive Clustering AAAI2021 code

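  • each feature dimension corresponds to one cluster, which (in their task) corresponds to one class
  • two contrastive losses: instance-level and cluster-level; the cluster-level loss pushes clusters apart
  • the rows of the matrix are instance representations; the columns can be viewed as cluster representations (distributions over the dataset)
  • Could this method be used to make each feature dimension carry different information? Could it give more distinct clusters?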
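To make the row/column reading concrete, here is a small sketch that applies the same pairwise contrastive loss once to the rows (instances) and once to the columns (clusters) of an (N, K) soft-assignment matrix. Using one matrix for both losses is a simplification made for this note; any extra heads or regularizers in the paper are not modeled, and the temperatures are assumed values.

```python
import numpy as np

def paired_contrastive_loss(a, b, temperature):
    """Cross-entropy matching row i of `a` with row i of `b` under cosine similarity."""
    n = a.shape[0]
    z = np.concatenate([a, b], axis=0).astype(float)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)                     # exclude self-similarity
    positive = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(2 * n), positive].mean())

def instance_and_cluster_losses(assign_a, assign_b, t_instance=0.5, t_cluster=1.0):
    """assign_a, assign_b: (N, K) soft cluster assignments of two augmented views."""
    instance_loss = paired_contrastive_loss(assign_a, assign_b, t_instance)    # rows
    cluster_loss = paired_contrastive_loss(assign_a.T, assign_b.T, t_cluster)  # columns
    return instance_loss, cluster_loss
```

The cluster-level term is what would push different feature dimensions toward carrying different information, since each column is contrasted against every other column.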

Audio

survey - Audio Self-supervised Learning: A Survey Patterns Review 2022 *

Wav2vec (2.0) - Facebook

  • contrastive learning on the raw audio waveform (CNN encoder first)
  • contrasts the representations before the transformer with those after the transformer

HuBERT - Facebook 2021

  • predicts k-means cluster pseudo-labels
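A toy sketch of the pseudo-label idea: cluster frame-level features with k-means, then train a model to predict the cluster index of masked frames. The farthest-first initialisation, the function names, and the choice to score only masked frames are assumptions made for this illustration.

```python
import numpy as np

def kmeans_pseudo_labels(frames, k, n_iter=10):
    """Cluster frame features (T, D) and return one integer pseudo-label per frame."""
    frames = np.asarray(frames, dtype=float)
    # Deterministic farthest-first initialisation (a choice made for this sketch).
    centroids = [frames[0]]
    for _ in range(k - 1):
        dist = np.min([((frames - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(frames[dist.argmax()])
    centroids = np.stack(centroids)
    for _ in range(n_iter):
        dists = ((frames[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)                  # assign each frame to nearest centroid
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = frames[labels == j].mean(axis=0)
    return labels

def masked_prediction_loss(logits, labels, mask):
    """Cross-entropy between cluster logits (T, K) and pseudo-labels on masked frames."""
    idx = np.flatnonzero(mask)
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_prob[idx, labels[idx]].mean())
```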

BigSSL-CAP 2022 Google Wav2vec loss on mel-spectrogram

COLA - Contrastive learning of general-purpose audio representations ICASSP 2021 | Google | code

  • same audio as positive pair, others in batch as negative
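A sketch of how such a batch could be built: two random segments are cropped from each clip, so row i of `anchors` and row i of `positives` come from the same audio, and every other row in the batch acts as a negative. Function names, the seed, and the segment length are hypothetical.

```python
import numpy as np

def sample_positive_pair(clip, segment_len, rng):
    """Crop two random (possibly overlapping) segments from one 1-D clip."""
    starts = rng.integers(0, len(clip) - segment_len + 1, size=2)
    return tuple(clip[s:s + segment_len] for s in starts)

def build_contrastive_batch(clips, segment_len, seed=0):
    """Return (anchors, positives), each (N, segment_len); row i pairs with row i."""
    rng = np.random.default_rng(seed)
    pairs = [sample_positive_pair(c, segment_len, rng) for c in clips]
    anchors = np.stack([p[0] for p in pairs])
    positives = np.stack([p[1] for p in pairs])
    return anchors, positives
```

The two arrays would then be encoded and scored with a batch-wise cross-entropy whose correct class for anchor i is positive i.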

BYOL-Audio - BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation | code

  • adapts BYOL to audio SSL using normalization and audio augmentations
  • might be useful: the paper reports many of its parameter settings

M2D - Masked Modeling Duo: Learning Representations by Encouraging Both Networks to Model the Input | code

  • MAE in the style of BYOL (no reconstruction; the masked and unmasked inputs are pushed directly toward the same representation)
  • maximizes the similarity between the ground-truth representations of the masked patches (from the target network) and the predicted patches (predictor p + online network)

CLAR - Contrastive Learning of Auditory Representations AISTATS 2021 code

  • time-frequency audio features work better than 1D features; the two are not contrasted against each other
  • semi-supervised: trains with supervised cross-entropy (CE) and contrastive learning (CL) simultaneously
  • baselines: CE, SupCon, SimCLR

Multi-Format Contrastive Learning of Audio Representations Google Deepmind | Self-Supervised Learning for Speech and Audio Processing Workshop @ NeurIPS 2020

  • similar to time-frequency consistency
  • maximizing the agreement between the raw audio and its spectral representation

ATST Self-Supervised Audio Teacher-Student Transformer for Both Clip-Level and Frame-Level Tasks | code

  • teacher-student (BYOL architecture)
  • clip-level + frame-level, combined through distillation

SSAST: Self-Supervised Audio Spectrogram Transformer AAAI 2022 | MIT

  • joint discriminative and generative masked spectrogram patch modeling using unlabeled audio from AudioSet and Librispeech
  • two MLP heads for contrastive/reconstruction

MAE-AST: Masked Autoencoding Audio Spectrogram Transformer Interspeech 2022 | code

  • improves on SSAST in efficiency and outperforms it

Audio-MAE Masked Autoencoders that Listen | Meta & CMU | NeurIPS 2022 | code

uaMix-MAE: Efficient Tuning of Pretrained Audio Transformers with Unsupervised Audio Mixtures | code

  • uses contrastive learning to tune an MAE-pretrained model, borrowing the idea of contrastive tuning from computer vision
  • employs mixup to improve efficiency, so that contrastive tuning needs less unlabeled data
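For reference, a generic mixup step on an unlabeled batch could look like the sketch below. The Beta parameter `alpha=0.4` and the function name are assumptions, and this is plain input-level mixup rather than the exact uaMix-MAE recipe.

```python
import numpy as np

def mixup_batch(x, alpha=0.4, rng=None):
    """Mix each sample with a randomly chosen partner from the same batch."""
    rng = np.random.default_rng() if rng is None else rng
    lam = float(rng.beta(alpha, alpha))       # mixing weight in [0, 1]
    perm = rng.permutation(len(x))            # partner index for every sample
    mixed = lam * x + (1.0 - lam) * x[perm]
    return mixed, perm, lam
```

One option for the contrastive side is to treat both source samples as positives of the mixed sample, weighted by `lam` and `1 - lam`.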

Using additional information

Supervised Contrastive Learning NeurIPS 2020 | code | pytorch | Google

Contrastive learning of heart and lung sounds for label-efficient diagnosis Patterns code Stanford + Harvard

  • uses metadata (sex, age, recording location) to select positive and negative samples, with a supervised contrastive loss
  • diagnosing heart and lung diseases through heart and lung sound recordings
  • evaluates both linear probing (representation evaluation) and fine-tuning (initialization evaluation)
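A sketch of a supervised contrastive loss in which positives are defined by a shared metadata value instead of a class label. `group_ids` is a hypothetical integer array (for example, an encoded recording location); the temperature is an assumed value.

```python
import numpy as np

def metadata_supcon_loss(z, group_ids, temperature=0.1):
    """SupCon-style loss: samples sharing a metadata group are positives of each other."""
    z = np.asarray(z, dtype=float)
    group_ids = np.asarray(group_ids)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    n = len(z)
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, z @ z.T / temperature)   # drop self-pairs
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    pos_mask = (group_ids[:, None] == group_ids[None, :]) & ~self_mask
    n_pos = pos_mask.sum(axis=1)
    valid = n_pos > 0                          # anchors with no positive are skipped
    sum_pos = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return float(-(sum_pos[valid] / n_pos[valid]).mean())
```

Swapping `group_ids` between sex, age bucket, or recording location changes which pairs are pulled together without touching the loss itself.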

Weakly Supervised Contrastive Learning ICCV2021 code

  • first head: instance discrimination, infoNCE (NT-Xent) loss
  • second head: weak label based on the connected component labeling process, supervised contrastive loss
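A sketch of how weak labels could be derived by connected-component labeling: link every sample to its nearest neighbour in embedding space, then give all samples in the same connected component the same label. Using a 1-nearest-neighbour graph with cosine similarity is an assumption of this sketch.

```python
import numpy as np

def weak_labels_from_nn_graph(embeddings):
    """Label samples by connected component of the 1-nearest-neighbour graph."""
    z = np.asarray(embeddings, dtype=float)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T
    np.fill_diagonal(sim, -np.inf)             # a sample is not its own neighbour
    nearest = sim.argmax(axis=1)
    parent = list(range(len(z)))               # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]      # path halving
            i = parent[i]
        return i

    for i, j in enumerate(nearest):
        ri, rj = find(i), find(int(j))
        if ri != rj:
            parent[rj] = ri                    # merge the two components
    roots = [find(i) for i in range(len(z))]
    _, labels = np.unique(roots, return_inverse=True)
    return labels
```

The returned labels can then be fed to a supervised contrastive loss in place of ground-truth classes.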

Lung sounds

Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification InterSpeech 2023 code

  • architecture: Audio Spectrogram Transformer (AST)
  • mixes patches between two samples and contrasts the mixed representation's similarity to both

Stethoscope-guided Supervised Contrastive Learning ICASSP 2024

  • promotes similarity within the same class, including across different domains
  • stop-gradient on the target representation proves useful
  • Isn't this essentially still SupCon?

Combining language models / LLM

CLIP - Learning Transferable Visual Models From Natural Language Supervision ICML 2021 | OpenAI | open implementation

  • trains the text and vision encoders jointly, making matched pairs similar and all other pairs dissimilar
  • problem: there are false negative pairs
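A compact sketch of the symmetric image-text objective described above. Every off-diagonal pair is treated as a negative, which is exactly where false negatives come from when two different pairs happen to be semantically similar. The function names and temperature value are illustrative.

```python
import numpy as np

def cross_entropy(logits, targets):
    """Mean cross-entropy of integer targets under row-wise softmax of logits."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(len(targets)), targets].mean())

def clip_style_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of N matched image-text pairs."""
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature     # entry (i, j) compares image i with text j
    targets = np.arange(len(img))          # matched pairs sit on the diagonal
    image_to_text = cross_entropy(logits, targets)
    text_to_image = cross_entropy(logits.T, targets)
    return 0.5 * (image_to_text + text_to_image)
```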

CLAP - Learning audio concepts from natural language supervision Microsoft ICASSP2023 code

  • evaluated in zero-shot, frozen, and fine-tuning settings
  • downstream: sound event classification, speech emotion recognition

MedCLIP - MedCLIP: Contrastive Learning from Unpaired Medical Images and Text EMNLP2022 code

  • two challenges:
    1. Limited paired medical data: mostly labels only, no reports
    2. False negatives. Different pairs can be similar.
  • method: Decouple Image-Text Pairs with Medical Knowledge Extractor: map label/report to a vector of 14 entities using MetaMap
  • differentiates samples via their semantic similarities, which define soft similarity targets for a Semantic Matching Loss
  • What if we compute the similarity directly from symptom vectors?
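A sketch of the soft-target idea, assuming each image and each text already has an entity (or symptom) vector: the cosine similarity between those vectors is turned into soft targets, and the embedding similarities are trained to match them. Only the image-to-text direction is shown; names, the temperature, and the small epsilon guards are assumptions of this note.

```python
import numpy as np

def softmax(x):
    """Row-wise softmax."""
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def cosine_matrix(a, b, eps=1e-8):
    """Pairwise cosine similarity; eps guards all-zero vectors (e.g. no findings)."""
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return a @ b.T

def semantic_matching_loss(image_emb, text_emb, image_entities, text_entities,
                           temperature=0.07):
    """Cross-entropy between entity-based soft targets and predicted similarities."""
    soft_targets = softmax(cosine_matrix(image_entities, text_entities))
    predicted = softmax(cosine_matrix(image_emb, text_emb) / temperature)
    return float(-(soft_targets * np.log(predicted + 1e-12)).sum(axis=1).mean())
```

Because the targets are soft, two samples with the same entity vector are no longer forced apart, which addresses the false-negative problem. The same function would accept symptom vectors in place of extracted entities.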

Health domain

Time series

Neighborhood-based

NCL - Neighborhood Contrastive Learning Applied to Online Patient Monitoring ICML 2021 | code

TNC - Unsupervised Representation Learning for Time Series with Temporal Neighborhood Coding ICLR 2021 | code

Self-Supervised Pretraining and Transfer Learning Enable Flu and COVID-19 Predictions in Small Mobile Sensing Datasets Conference on Health, Inference, and Learning (CHIL) 2023 | code data on request

  • dataset: Homekit Flu Monitoring Study: 591k user-days of Fitbit data, 5196 participants, 6 months
  • pretext tasks: same-user prediction, data reconstruction, predicting domain-inspired features (best!)
  • transfer tasks: inferring positive cases / symptoms

Step2Heart

SelfHAR

CLOCS
