
Types Of Machine Learning

A wonderful introduction to machine learning, and how to choose the right algorithm or family of algorithms for the task at hand.

VARIOUS MODEL FAMILIES

Stanford CS221 - the reflex, variables, states, and logic model paradigms

WEAKLY SUPERVISED

  1. Text classification with extremely small datasets, which relies heavily on feature-engineering methods such as hashtag counts, punctuation counts, and other insights that work well for this type of text.
  2. A great review paper on weak supervision, which discusses:
    1. Incomplete supervision
    2. Inaccurate supervision
    3. Inexact supervision
    4. Active learning
  3. Stanford on weak supervision
  4. Stanford AI on Snorkel
  5. Hazy Research on weak supervision and Snorkel
  6. Out-of-distribution generalization using test-time training - test-time training turns a single unlabeled test instance into a self-supervised learning problem, on which we update the model parameters before making a prediction on this instance.
  7. Learning Deep Networks from Noisy Labels with Dropout Regularization - Large datasets often have unreliable labels, such as those obtained from Amazon's Mechanical Turk or social media platforms, and classifiers trained on mislabeled datasets often exhibit poor performance. We present a simple, effective technique for accounting for label noise when training deep neural networks. We augment a standard deep network with a softmax layer that models the label-noise statistics. Then, we train the deep network and noise model jointly via end-to-end stochastic gradient descent on the (perhaps mislabeled) dataset. The augmented model is overdetermined, so in order to encourage the learning of a non-trivial noise model, we apply dropout regularization to the weights of the noise model during training. Numerical experiments on noisy versions of the CIFAR-10 and MNIST datasets show that the proposed dropout technique outperforms state-of-the-art methods. (A minimal sketch of the noise-model idea follows this list.)
  8. Distill-to-Label: weakly supervised instance labeling using knowledge distillation - “Weakly supervised instance labeling using only image-level labels, in lieu of expensive fine-grained pixel annotations, is crucial in several applications including medical image analysis. In contrast to conventional instance segmentation scenarios in computer vision, the problems that we consider are characterized by a small number of training images and non-local patterns that lead to the diagnosis. In this paper, we explore the use of multiple instance learning (MIL) to design an instance label generator under this weakly supervised setting. Motivated by the observation that an MIL model can handle bags of varying sizes, we propose to repurpose an MIL model originally trained for bag-level classification to produce reliable predictions for single instances, i.e., bags of size 1. To this end, we introduce a novel regularization strategy based on virtual adversarial training for improving MIL training, and subsequently develop a knowledge distillation technique for repurposing the trained MIL model. Using empirical studies on colon cancer and breast cancer detection from histopathological images, we show that the proposed approach produces high-quality instance-level prediction and significantly outperforms state-of-the-art MIL methods.”
  9. Yet another article summarising FAIR
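
A minimal PyTorch sketch of item 7's noise-model idea (my own illustration, not the paper's code): a trainable confusion matrix is softmaxed into a noise channel p(noisy label | true label), and dropout on its weights discourages the trivial identity channel.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLabelModel(nn.Module):
    """Base classifier plus a label-noise channel (illustrative sketch)."""
    def __init__(self, base: nn.Module, num_classes: int, p_drop: float = 0.5):
        super().__init__()
        self.base = base                                   # any net emitting logits
        self.noise = nn.Parameter(torch.eye(num_classes))  # init near identity
        self.drop = nn.Dropout(p_drop)                     # regularizes the channel

    def forward(self, x):
        clean = F.softmax(self.base(x), dim=-1)            # p(true class | x)
        # rows of the channel are p(noisy label | true class)
        channel = F.softmax(self.drop(self.noise), dim=-1)
        return clean @ channel                             # p(noisy label | x)

# Train on the noisy labels:
#   loss = F.nll_loss(torch.log(model(x) + 1e-8), noisy_y)
# At test time, predict from model.base(x) directly (the clean head).
```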

SEMI-SUPERVISED

  1. Paper review
  2. Ruder's overview of proxy-label approaches for semi-supervised learning (AMAZING)
  3. Self-training
    1. Self-training and tri-training
    2. Confidence-regularized self-training
    3. Domain adaptation for semantic segmentation using class-balanced self-training
    4. Self-labeled techniques for semi-supervised learning
  4. Tri-training
    1. TriNet for semi-supervised deep learning
    2. Tri-training: exploiting unlabeled data using three classifiers, paper
    3. Improving tri-training with unlabeled data
    4. Tri-training using an NN ensemble
    5. Asymmetric tri-training for unsupervised domain adaptation, another implementation, another, paper
    6. Tri-training git
  5. fast.ai forums
  6. UDA GIT, paper, medium*, medium 2 (has data-augmentation articles)
  7. S4L (self-supervised semi-supervised learning)
  8. Google's UDA and MixMatch dissected - for text classification, the authors used a combination of back translation and a new method called TF-IDF-based word replacement.

Back translation consists of translating a sentence into some other intermediate language (e.g. French) and then translating it back to the original language (English in this case). The authors trained an English-to-French and French-to-English system on the WMT 14 corpus.
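
A hedged sketch of back translation using off-the-shelf MarianMT models via Hugging Face (a stand-in for the authors' own WMT 14 systems; the model names are my choice, not theirs):

```python
from transformers import pipeline

# Round-trip English -> French -> English to get a paraphrase.
en_to_fr = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")
fr_to_en = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")

def back_translate(sentence: str) -> str:
    french = en_to_fr(sentence)[0]["translation_text"]
    return fr_to_en(french)[0]["translation_text"]

print(back_translate("Semi-supervised learning makes use of unlabeled data."))
```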

TF-IDF word replacement replaces words in a sentence at random based on the TF-IDF scores of each word (words with a lower TF-IDF have a higher probability of being replaced).
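
And a rough sketch of TF-IDF-based word replacement (a simplification of the paper's scheme; the exact sampling details here are mine):

```python
import random
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def tfidf_word_replace(sentences, replace_frac=0.15, seed=0):
    """Replace uninformative (low-TF-IDF) words with random vocabulary
    words, so keywords that carry the label signal tend to survive."""
    rng = random.Random(seed)
    vec = TfidfVectorizer()
    tfidf = vec.fit_transform(sentences)
    vocab = list(vec.get_feature_names_out())
    mean_scores = np.asarray(tfidf.mean(axis=0)).ravel()
    score = dict(zip(vocab, mean_scores))
    max_s = mean_scores.max()

    augmented = []
    for sent in sentences:
        out = []
        for w in sent.split():
            s = score.get(w.lower(), max_s)        # unknown words are kept
            p = replace_frac * (1.0 - s / max_s)   # lower TF-IDF -> more likely swapped
            out.append(rng.choice(vocab) if rng.random() < p else w)
        augmented.append(" ".join(out))
    return augmented
```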

  1. MixMatch, medium, 2, 3, 4 - works by guessing low-entropy labels for data-augmented unlabeled examples and mixing labeled and unlabeled data using MixUp; the authors show that MixMatch obtains state-of-the-art results by a large margin across many datasets and labeled-data amounts.
  2. ReMixMatch - the paper is really good. “We improve the recently-proposed ‘MixMatch’ semi-supervised learning algorithm by introducing two new techniques: distribution alignment and augmentation anchoring.”
  3. FixMatch - a recent semi-supervised approach by Sohn et al. from Google Brain that improved the state of the art in semi-supervised learning (SSL). It is a simpler combination of previous methods such as UDA and ReMixMatch. (A pseudo-labeling sketch follows this list.)
  4. Curriculum Labeling: Self-paced Pseudo-Labeling for Semi-Supervised Learning
  5. FAIR **** 2, original - a summarization of FAIR's student-teacher weak/semi-supervision approach
  6. Leveraging Just a Few Keywords for Fine-Grained Aspect Detection Through Weakly Supervised Co-Training
  7. Fidelity-Weighted Learning - “fidelity-weighted learning” (FWL) is a semi-supervised student-teacher approach for training deep neural networks using weakly-labeled data. FWL modulates the parameter updates to a student network (trained on the task we care about) on a per-sample basis, according to the posterior confidence of its label quality as estimated by a teacher (who has access to the high-quality labels). Both student and teacher are learned from the data.
  8. An unproven student-teacher git
  9. A really nice student-teacher git with examples.

  10. Teacher-student for tri-training to exploit unlabeled data
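
The pseudo-label-and-filter core behind FixMatch (item 3 above) fits in a few lines. An illustrative PyTorch sketch, not the official implementation:

```python
import torch
import torch.nn.functional as F

def fixmatch_unlabeled_loss(model, weak_batch, strong_batch, threshold=0.95):
    """Pseudo-label the weakly augmented view; if confident enough, train
    the model to predict that label on the strongly augmented view."""
    with torch.no_grad():
        probs = F.softmax(model(weak_batch), dim=-1)
        conf, pseudo = probs.max(dim=-1)
        mask = conf.ge(threshold).float()      # keep only confident pseudo-labels

    logits_strong = model(strong_batch)
    loss = F.cross_entropy(logits_strong, pseudo, reduction="none")
    return (loss * mask).mean()                # masked consistency loss
```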

REGRESSION

Metrics:

  1. R2 (computed in the sketch after this list)
  2. Medium 1, 2, 3, 4
  3. Tutorial
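
For reference, R2 is just one minus the ratio of residual to total variance; a quick sketch, checked against scikit-learn:

```python
import numpy as np
from sklearn.metrics import r2_score

def r2(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot: the fraction of variance explained."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y_true, y_pred = [3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0]
assert np.isclose(r2(y_true, y_pred), r2_score(y_true, y_pred))
print(r2(y_true, y_pred))  # ~0.949
```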

ACTIVE LEARNING

  1. If you need to start somewhere, start here - types of AL, the methodology, examples, and sample-selection functions.
  2. A thorough review paper about AL
  3. The book on AL
  4. Choose your model first, then do AL, from LightTag
    1. The alternative is query by committee - Importantly, the active learning method presented above is the most naive form of "uncertainty sampling", where we choose samples based on how uncertain our model is. An alternative approach, called query by committee, maintains a collection of models (the committee) and selects the most "controversial" data point to label next, i.e., one the models disagree on. Using such a committee may allow us to overcome the restricted hypothesis a single model can express, though at the onset of a task we still have no way of knowing which hypothesis we should use. (A vote-entropy sketch follows this list.)
    2. Paper: a warning against transferring actively sampled datasets to other models
  5. How to increase accuracy with AL
  6. AL with model selection - paper
  7. Using weak and strong oracles in AL, paper.
  8. The pitfalls of AL:
    1. How to choose (cost-effectively) the active learning technique when one starts without the labeled data needed for methods like cross-validation.
    2. How to choose (cost-effectively) the base learning technique when one starts without the labeled data needed for methods like cross-validation, given that we know that learning curves cross, and given possible interactions between the active learning technique and the base learner.
    3. How to deal with highly skewed class distributions, where active learning strategies find few (or no) instances of rare classes.
    4. How to deal with concepts including very small subconcepts (“disjuncts”), which are hard enough to find with random sampling (because of their rarity), but which active learning strategies can actually avoid finding if they are misclassified strongly to begin with.
    5. How best to address the cold-start problem.
    6. Whether and what alternatives exist for using human resources to improve learning that may be more cost-efficient than using humans simply for labeling selected cases, such as guided learning [3], active dual supervision [2], guided feature labeling [1], etc.
  9. Confidence-based stopping criteria paper
  10. A great tutorial
  11. An ok video
  12. Active learning framework in python
  13. Active Learning Using Pre-clustering
  14. A literature survey of active machine learning in the context of natural language processing
  15. MNIST competition (unpublished) using AL
  16. Practical Online Active Learning for Classification
  17. Video 2
  18. Active learning in R - code
  19. Deep Bayesian active learning with image data
  20. Medium on AL***
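
A minimal vote-entropy sketch of the query-by-committee idea above; the committee here is any list of fitted scikit-learn-style classifiers (e.g. trained on bootstrap resamples of the labeled set):

```python
import numpy as np

def query_by_committee(committee, X_pool, n_queries=10):
    """Return indices of the pool points the committee disagrees on most
    (highest entropy over the members' predicted labels)."""
    votes = np.stack([m.predict(X_pool) for m in committee])  # (members, pool)
    n_members = votes.shape[0]
    entropies = []
    for point_votes in votes.T:                               # one pool point at a time
        _, counts = np.unique(point_votes, return_counts=True)
        p = counts / n_members
        entropies.append(-(p * np.log(p)).sum())
    return np.argsort(entropies)[-n_queries:]                 # most controversial last
```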

Robert Munro on active learning - you should buy his book:

  1. GIT
  2. Active transfer learning
  3. Uncertainty sampling **** (a scoring sketch follows this list)
    1. Least Confidence: difference between the most confident prediction and 100% confidence
    2. Margin of Confidence: difference between the top two most confident predictions
    3. Ratio of Confidence: ratio between the top two most confident predictions
    4. Entropy: difference between all predictions, as defined by information theory
  4. Diversity sampling - you want to make sure that your sample covers as diverse a set of data and real-world demographics as possible.
    1. Model-based Outliers: sampling for low activation in your logits and hidden layers to find items that are confusing to your model because of lack of information
    2. Cluster-based Sampling: using Unsupervised Machine Learning to sample data from all the meaningful trends in your data’s feature-space
    3. Representative Sampling: sampling items that are the most representative of the target domain for your model, relative to your current training data
    4. Real-world diversity: using sampling strategies that increase fairness when trying to support real-world diversity
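
The four uncertainty-sampling scores above, sketched for a single softmax output (simplified; Munro's book normalizes some of these to the 0-1 range slightly differently):

```python
import numpy as np

def uncertainty_scores(probs):
    """Higher score = more uncertain = better candidate for labeling."""
    p = np.sort(probs)[::-1]                     # descending
    least_confidence = 1.0 - p[0]
    margin = 1.0 - (p[0] - p[1])                 # small top-2 gap -> high score
    ratio = p[1] / p[0]
    entropy = -(p * np.log2(p + 1e-12)).sum() / np.log2(len(p))
    return least_confidence, margin, ratio, entropy

print(uncertainty_scores(np.array([0.5, 0.3, 0.2])))  # (0.5, 0.8, 0.6, ~0.94)
```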

  5. Combine uncertainty sampling and diversity sampling
    1. Least Confidence Sampling with Clustering-based Sampling: sample items that are confusing to your model and then cluster those items to ensure a diverse sample (see the sketch after this list).
    2. Uncertainty Sampling with Model-based Outliers: sample items that are confusing to your model and within those find items with low activation in the model.
    3. Uncertainty Sampling with Model-based Outliers and Clustering: combine methods 1 and 2.
    4. Representative Cluster-based Sampling: cluster your data to capture multimodal distributions and sample items that are most like your target domain.
    5. Sampling from the Highest Entropy Cluster: cluster your unlabeled data and find the cluster with the highest average confusion for your model.
    6. Uncertainty Sampling and Representative Sampling: sample items that are both confusing to your current model and the most like your target domain.
    7. Model-based Outliers and Representative Sampling: sample items that have low activation in your model but are relatively common in your target domain.
    8. Clustering with itself for hierarchical clusters: recursively cluster to maximize the diversity.
    9. Sampling from the Highest Entropy Cluster with Margin of Confidence Sampling: find the cluster with the most confusion and then sample for the maximum pairwise label confusion within that cluster.
    10. Combining Ensemble Methods and Dropouts with individual strategies: aggregate results that come from multiple models, or from multiple predictions of one model via Monte Carlo Dropout (aka Bayesian deep learning).
  6. Active transfer learning.
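
Strategy 1 above in sketch form (my own simplification: least-confidence to shortlist the pool, then k-means for diversity, one pick per cluster):

```python
import numpy as np
from sklearn.cluster import KMeans

def least_confidence_with_clustering(probs, features, n_labels=10, pool_frac=0.1):
    """probs: (n, classes) model softmax over the unlabeled pool;
    features: (n, d) representations used for clustering."""
    uncertainty = 1.0 - probs.max(axis=1)               # least confidence
    shortlist = max(n_labels, int(len(probs) * pool_frac))
    uncertain_idx = np.argsort(uncertainty)[-shortlist:]

    km = KMeans(n_clusters=n_labels, n_init=10, random_state=0)
    cluster_of = km.fit_predict(features[uncertain_idx])

    picks = []                                          # most uncertain item per cluster
    for c in range(n_labels):
        members = uncertain_idx[cluster_of == c]
        if len(members):
            picks.append(members[np.argmax(uncertainty[members])])
    return np.array(picks)
```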

Machine in the loop

  1. Similar to AL, except that a machine/model/algorithm adds label suggestions for a human to review. This is obviously a tradeoff between labeling speed and the bias such suggestions introduce into an otherwise clean dataset.

ONLINE LEARNING

  1. If you want to start with OL - start here & here (and see the sketch after this list)
  2. Shai Shalev-Shwartz - a thesis about online learning ****
  3. Some answers about what OL is; the first one actually discusses S. Shalev-Shwartz's other paper.
  4. Online learning - Andrew Ng - Coursera
  5. Chip Huyen on online prediction & learning
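
A self-contained sketch of the online setting with scikit-learn's `partial_fit` (the data stream is simulated, and `loss="log_loss"` assumes scikit-learn >= 1.1):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
clf = SGDClassifier(loss="log_loss")   # logistic regression updated by SGD
classes = np.array([0, 1])             # all classes must be declared up front

for step in range(200):                # simulated stream of mini-batches
    X = rng.normal(size=(32, 5))
    y = (X[:, 0] > 0).astype(int)      # toy concept: sign of the first feature
    if step and step % 50 == 0:        # progressive validation: test, then train
        print(step, clf.score(X, y))
    clf.partial_fit(X, y, classes=classes)
```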

ONLINE DEEP LEARNING (ODL)

  1. Hedge backpropagation (HBP), autonomous DL, QActor - online AL for noisy labeled stream data.

N-SHOT LEARNING

  1. Zero-shot, one-shot, few-shot (Siamese networks are one-shot)

ZERO SHOT LEARNING

  1. Instead of using class labels, we use some kind of vector representation for the classes, taken from co-occurrence statistics (after SVD) or from word2vec - quite clever. This lets us figure out whether a new, unseen class is near one of the known supervised classes, using KNN or some other distance-based classifier. Can we use word2vec for similarity measurements of new classes? (A sketch follows this list.)
  2. For classification, we can use nearest-neighbor or manifold-based label propagation.
  3. Multiple category vectors? Multilabel zero-shot is also covered in the video.
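
A toy sketch of item 1's nearest-class-vector idea. It assumes the model already embeds inputs into the same word-vector space as the class names (DeViSE-style), and the vectors below are hypothetical placeholders:

```python
import numpy as np

def zero_shot_predict(x_embedding, class_vectors):
    """Pick the class whose word vector is nearest (cosine) to the input
    embedding; unseen classes join simply by having a vector."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(class_vectors, key=lambda c: cos(x_embedding, class_vectors[c]))

# Hypothetical word2vec-style vectors; "zebra" was never seen in training.
classes = {"cat": np.array([0.9, 0.1]), "dog": np.array([0.8, 0.3]),
           "zebra": np.array([0.1, 0.9])}
print(zero_shot_predict(np.array([0.2, 0.95]), classes))  # -> "zebra"
```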

GPT-3 IS ZERO-SHOT, ONE-SHOT, AND FEW-SHOT

  1. Prompt Engineering Tips & Tricks
  2. Open GPT3 prompt engineering