SSL

Yuwei (Evelyn) Zhang edited this page Jun 27, 2024 · 28 revisions

Patterns - Dimitris

review Chowdhury et al.

Vision

SimCLR — A Simple Framework for Contrastive Learning of Visual Representations ICML 2020 | code

contrastive learning with random cropping and color distortion as augmentations
maximizes agreement between differently transformed views of the same sample via a contrastive cosine similarity loss in the latent space
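
A minimal NumPy sketch of the contrastive cosine-similarity loss described above (NT-Xent). The function name, the batch layout (`z1[i]` and `z2[i]` are two views of sample `i`) and the temperature value are assumptions for illustration, not taken from the official code.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent over two batches of projected views, each of shape (N, D)."""
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)               # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit norm, so dot product = cosine
    sim = z @ z.T / temperature                        # (2N, 2N) scaled similarities
    np.fill_diagonal(sim, -np.inf)                     # a view is never its own candidate
    # Row i's positive is the other view of the same sample.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()
```

All other 2N - 2 views in the batch act as negatives for each anchor, which is why SimCLR benefits from large batches.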

BYOL - Bootstrap Your Own Latent A New Approach to Self-Supervised Learning NeurIPS 2020

not contrastive: trained without negative pairs
iteratively bootstraps the representations: an online network predicts the output of a target network whose weights are an exponential moving average (EMA) of the online weights
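
The target-network update can be sketched as below. This is a minimal illustration, assuming parameters are stored as a dict of NumPy arrays; `ema_update` and the value of `tau` are placeholders, and in practice the decay rate is typically much closer to 1 and may follow a schedule.

```python
import numpy as np

def ema_update(target, online, tau=0.99):
    """Return new target parameters: tau * target + (1 - tau) * online."""
    return {name: tau * target[name] + (1.0 - tau) * online[name] for name in target}

# Toy example: the target slowly drifts toward a fixed online network.
online = {"w": np.ones(3)}
target = {"w": np.zeros(3)}
for _ in range(100):
    target = ema_update(target, online, tau=0.9)
```

Only the online network receives gradients; the target is changed solely by this update.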

Barlow Twins - Barlow Twins: Self-Supervised Learning via Redundancy Reduction PMLR | code *

Contrastive Clustering AAAI2021 code

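each feature dimension corresponds to a cluster, which (in their task) corresponds to a class
two contrastive losses: instance-level and cluster-level; the cluster-level loss pushes clusters apart
rows of the matrix are instance representations; columns can be viewed as cluster representations (distributions over the dataset)
Could this method be used to make each feature dimension carry different information? Would it give more distinct clusters?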
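
A rough sketch of the row/column view, assuming a single soft-assignment matrix per augmented view (N samples by K clusters). This is a simplification for illustration: the names are hypothetical, the paper's separate instance and cluster heads are collapsed into one matrix, and any regularizer against collapsed clusters is omitted.

```python
import numpy as np

def info_nce(a, b, temperature=1.0):
    """Contrast row i of `a` with row i of `b`; every other row is a negative."""
    n = a.shape[0]
    z = np.concatenate([a, b], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    m = sim.max(axis=1, keepdims=True)
    log_prob = sim - m - np.log(np.exp(sim - m).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
logits_a = rng.normal(size=(32, 10))                     # view A: 32 samples, 10 clusters
logits_b = logits_a + 0.1 * rng.normal(size=(32, 10))    # view B: slightly perturbed
p_a, p_b = softmax(logits_a), softmax(logits_b)

instance_loss = info_nce(p_a, p_b)       # rows: one vector per sample
cluster_loss = info_nce(p_a.T, p_b.T)    # columns: one vector per cluster
```

The cluster-level term is the same loss applied to the transposed matrix, so each column is pulled toward its counterpart in the other view and pushed away from the other columns.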

Audio

survey - Audio Self-supervised Learning: A Survey Patterns Review 2022 *

Wav2vec (2.0) - Facebook

contrastive learning on the raw audio waveform (CNN feature encoder first)
contrasts the representations before the transformer (quantized CNN latents) with those after it (context outputs at masked steps)

HuBERT - Facebook 2021

predicts k-means cluster pseudo-labels

BigSSL-CAP 2022 Google Wav2vec loss on mel-spectrogram

COLA - Contrastive learning of general-purpose audio representations ICASSP 2021 | Google | code

segments from the same audio clip form a positive pair; other clips in the batch are negatives

BYOL-Audio - BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation | code

employed normalization and augmentations to modify BYOL for audio SSL
potentially useful: the paper reports many hyperparameters

M2D - Masked Modeling Duo: Learning Representations by Encouraging Both Networks to Model the Input | code

MAE in the style of BYOL (no reconstruction; the masked and unmasked inputs are made to have the same representation directly)
maximizes the similarity between the masked-patch ground truth (from the target network) and the generated patches (predictor p + online network)

CLAR - Contrastive Learning of Auditory Representations AISTATS 2021 code

time-frequency audio features work better than 1D features; the two are not contrasted with each other
semi-supervised: trains with the supervised cross-entropy (CE) loss and the contrastive loss (CL) simultaneously
baselines: CE, SupCon, SimCLR

Multi-Format Contrastive Learning of Audio Representations Google Deepmind | Self-Supervised Learning for Speech and Audio Processing Workshop @ NeurIPS 2020

similar to time-frequency consistency
maximizing the agreement between the raw audio and its spectral representation

ATST Self-Supervised Audio Teacher-Student Transformer for Both Clip-Level and Frame-Level Tasks | code

teacher-student (BYOL architecture)
clip-level and frame-level models, combined through distillation

SSAST: Self-Supervised Audio Spectrogram Transformer AAAI 2022 | MIT

joint discriminative and generative masked spectrogram patch modeling using unlabeled audio from AudioSet and Librispeech
two MLP heads for contrastive/reconstruction

MAE-AST: Masked Autoencoding Audio Spectrogram Transformer Interspeech 2022 | code

improves on SSAST in efficiency and outperforms it

Audio-MAE Masked Autoencoders that Listen | Meta & CMU | NeurIPS 2022 | code

uaMix-MAE: Efficient Tuning of Pretrained Audio Transformers with Unsupervised Audio Mixtures | code

uses contrastive learning to tune an MAE-pretrained model, borrowing the idea of contrastive tuning from CV
employs mixup to improve efficiency, so less unlabeled data is needed for CL

Using additional information

Supervised contrastive learning NIPS2020 code pytorch google

Contrastive learning of heart and lung sounds for label-efficient diagnosis Patterns code Stanford + Harvard

uses metadata (sex, age, recording location) to select positive and negative samples; supervised contrastive loss
diagnosing heart and lung diseases through heart and lung sound recordings
evaluates both linear probing (representation evaluation) and fine-tuning (initialization evaluation)
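
A minimal sketch of a supervised contrastive loss in which positives are defined by a shared metadata value rather than by augmentation. It assumes a single categorical attribute per recording (for example recording location); the exact selection rule used in the paper is not recorded in these notes, and `supcon_loss` and `groups` are illustrative names.

```python
import numpy as np

def supcon_loss(z, groups, temperature=0.1):
    """Supervised contrastive loss; samples with equal `groups` values are positives."""
    groups = np.asarray(groups)
    n = z.shape[0]
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, z @ z.T / temperature)   # drop self-pairs
    m = sim.max(axis=1, keepdims=True)
    log_prob = sim - m - np.log(np.exp(sim - m).sum(axis=1, keepdims=True))
    pos_mask = (groups[:, None] == groups[None, :]) & ~self_mask
    n_pos = pos_mask.sum(axis=1)
    valid = n_pos > 0                     # anchors with no positive are skipped
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return (-pos_log_prob[valid] / n_pos[valid]).mean()
```

With `groups` set to class labels this reduces to the usual SupCon setup; swapping in a metadata column only changes how the positive mask is built.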

Weakly Supervised Contrastive Learning ICCV2021 code

first head: instance discrimination, InfoNCE (NT-Xent) loss
second head: weak label based on the connected component labeling process, supervised contrastive loss

Lung sounds

Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification InterSpeech 2023 code

architecture: Audio Spectrogram Transformer (AST)
mixes patches between samples and contrasts the mixed representation against both source samples

[Stethoscope-guided Supervised Contrastive Learning] ICASSP 2024

promotes similarity within the same class, including across different domains
stop-gradient of target representation proves useful
Isn't this still essentially SupCon?

combining language models / LLM

CLIP - Learning Transferable Visual Models From Natural Language Supervision ICML 2021 | OpenAI | open implementation

trains the text and vision encoders jointly so that matched pairs are similar and all other pairs are dissimilar
problem: false negative pairs
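
The paired objective can be sketched as a symmetric cross-entropy over the image-text similarity matrix. This is a minimal NumPy illustration; the function names are hypothetical, and the fixed temperature is a simplification of the learned one.

```python
import numpy as np

def log_softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def clip_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE: row i of each input is a matched image-text pair."""
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature      # (N, N); diagonal = matched pairs
    idx = np.arange(logits.shape[0])
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()   # image -> text
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()   # text -> image
    return 0.5 * (loss_i2t + loss_t2i)
```

Every off-diagonal entry is treated as a negative, so two pairs with near-identical captions are pushed apart; that is the false-negative problem noted above.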

CLAP - Learning audio concepts from natural language supervision Microsoft ICASSP2023 code

evaluated in zero-shot, frozen, and fine-tuning settings
downstream: sound event classification, speech emotion recognition

MedCLIP - MedCLIP: Contrastive Learning from Unpaired Medical Images and Text EMNLP2022 code

two challenges:
1. Limited paired medical data: mostly labels only, no reports
2. False negatives. Different pairs can be similar.
method: decouple image-text pairs with a medical knowledge extractor that maps each label/report to a vector of 14 entities using MetaMap
differentiates samples by their semantic similarity and uses the similarities as soft targets in a semantic matching loss
What if we compute the similarity directly from symptom vectors?
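
A sketch of how soft targets built from entity (or symptom) vectors could replace the one-hot targets of CLIP. It assumes cosine similarity between the entity vectors, a softmax to turn similarities into target distributions, and a fixed temperature on the predicted logits; all names are illustrative and the details may differ from the MedCLIP implementation.

```python
import numpy as np

def cosine_matrix(a, b, eps=1e-8):
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)   # eps guards all-zero vectors
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return a @ b.T

def log_softmax(x):
    x = x - x.max(axis=1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=1, keepdims=True))

def semantic_matching_loss(image_emb, text_emb, image_entities, text_entities,
                           temperature=0.07):
    """Cross-entropy between predicted similarities and entity-based soft targets."""
    logits = cosine_matrix(image_emb, text_emb) / temperature
    sem_sim = cosine_matrix(image_entities, text_entities)     # (N_img, N_txt)
    target_i2t = np.exp(log_softmax(sem_sim))                  # rows sum to 1
    target_t2i = np.exp(log_softmax(sem_sim.T))
    loss_i2t = -(target_i2t * log_softmax(logits)).sum(axis=1).mean()
    loss_t2i = -(target_t2i * log_softmax(logits.T)).sum(axis=1).mean()
    return 0.5 * (loss_i2t + loss_t2i)
```

Because the targets depend only on the entity vectors, images and texts no longer need to come in matched pairs, and samples sharing the same findings stop being treated as hard negatives.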

Health domain

Time series

Neighborhood-based

NCL - Neighborhood Contrastive Learning Applied to Online Patient Monitoring ICML 2021 | code

TNC - Unsupervised Representation Learning for Time Series with Temporal Neighborhood Coding ICLR 2021 | code

Self-Supervised Pretraining and Transfer Learning Enable Flu and COVID-19 Predictions in Small Mobile Sensing Datasets Conference on Health, Inference, and Learning (CHIL) 2023 | code data on request

dataset: Homekit Flu Monitoring Study: 591k user-days of Fitbit data, 5196 participants, 6 months
pretext tasks: same-user prediction, data reconstruction, predicting domain-inspired features (best!)
transfer task: infer positive / symptoms

Step2Heart

SelfHAR

CLOCS