
Machine Learning Glossary


One-hot vector representation

  • A representation in which the vector is 0 in all dimensions except one, which is 1
  • For example, if we have 10 classes (the 10 digits) and we want to tell the classifier that an image contains the digit 2, we can represent it as follows:
[0,0,1,0,0,0,0,0,0,0]
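
A minimal sketch of this encoding in NumPy (the function name and class count are illustrative, not part of any particular library):

```python
import numpy as np

def one_hot(label, num_classes=10):
    # All-zero vector with a single 1 at the label's index
    vec = np.zeros(num_classes)
    vec[label] = 1.0
    return vec

print(one_hot(2))  # [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
```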

Loss or cost function

  • It represents how far off our model is from our desired outcome. We try to minimize that error, and the smaller the error margin, the better our model is.

Different loss functions

  • cross-entropy
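
As an illustration, a minimal NumPy sketch of cross-entropy for one-hot targets (the function name and the epsilon clipping are illustrative choices, not a fixed recipe):

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    # y_true: one-hot targets, y_pred: predicted class probabilities
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return -np.sum(y_true * np.log(y_pred)) / y_true.shape[0]

y_true = np.array([[0, 0, 1], [1, 0, 0]])
y_pred = np.array([[0.1, 0.2, 0.7], [0.8, 0.1, 0.1]])
print(cross_entropy(y_true, y_pred))  # average loss over the two samples
```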

Early stopping

  • It is a form of regularization used to avoid over-fitting when training a model with an iterative method such as gradient descent.
  • Early stopping rules provide guidance as to how many iterations can be run before the model begins to over-fit. Such rules have been employed in many different machine learning methods, with varying amounts of theoretical foundation. A sketch is shown below.
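
A minimal sketch of a patience-based early stopping rule (all names here, including `train_one_epoch`, `validate`, and the patience value, are hypothetical placeholders):

```python
def fit_with_early_stopping(model, train_one_epoch, validate,
                            max_epochs=100, patience=5):
    # Stop when validation loss has not improved for `patience` consecutive epochs
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_one_epoch(model)
        val_loss = validate(model)
        if val_loss < best_loss:
            best_loss = val_loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                print(f"Stopping early at epoch {epoch}")
                break
    return model
```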

Distant supervision (sometimes described as a form of semi-supervised learning)

A distant supervision algorithm usually has the following steps:

  • It may have some labeled training data
  • It has access to a pool of unlabeled data
  • It has an operator that allows it to sample from this unlabeled data and label it; this operator is expected to be noisy in its labels
  • The algorithm then collectively utilizes the original labeled training data, if any, and this new noisily labeled data to give the final output (see the sketch below)
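
A minimal sketch of that flow (every name here, including `noisy_labeler`, `train`, and the sample size, is a hypothetical placeholder):

```python
import random

def distant_supervision(labeled, unlabeled, noisy_labeler, train, k=1000):
    # Label a sample of the unlabeled pool with the (noisy) operator,
    # then train on the clean and noisy examples together
    sampled = random.sample(unlabeled, k)
    noisy = [(x, noisy_labeler(x)) for x in sampled]
    return train(labeled + noisy)
```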

Pre-training

Usual way of training a network

  • You want to train a neural network to perform a task (e.g. classification) on a data set (e.g. a set of images). You start training by initializing the weights randomly. As soon as you start training, the weights are changed in order to perform the task with fewer mistakes (i.e. optimization). Once you're satisfied with the training results, you save the weights of your network somewhere.
  • You are now interested in training a network to perform a new task (e.g. object detection) on a different data set (e.g. images too, but not the same as the ones you used before). Instead of repeating what you did for the first network and starting from randomly initialized weights, you can use the weights you saved from the previous network as the initial weight values for your new experiment. Initializing the weights this way is referred to as using a pre-trained network. The first network is your pre-trained network. The second one is the network you are fine-tuning.
  • The idea behind pre-training is that random initialization is...well...random; the values of the weights have nothing to do with the task you're trying to solve. Why should one set of values be any better than another? But how else would you initialize the weights? If you knew how to initialize them properly for the task, you might as well set them to the optimal values (slightly exaggerated); no need to train anything, you would already have the solution to your problem. Pre-training gives the network a head start, as if it has seen the data before.

What to watch out for when pre-training:

  • Using a pre-trained network only makes sense if the datasets of the two tasks are related; the more related they are, the more effective pre-training will be.
  • The bigger the gap between the two datasets, the less effective pre-training will be. It makes little sense to pre-train a network for image classification by training it on financial data first; in that case there is too much disconnect between the pre-training and fine-tuning stages. A fine-tuning sketch is shown below.
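
A minimal Keras sketch of fine-tuning from a pre-trained network (the choice of VGG16, the frozen base, and all layer sizes are illustrative assumptions, not a prescribed setup):

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Network pre-trained on ImageNet, loaded without its classification head
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze pre-trained weights while the new head trains

# New task-specific head on top of the pre-trained features
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(10, activation="softmax"),  # e.g. 10 classes for the new task
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```

Once the new head has converged, some or all of the frozen layers can be unfrozen and trained further with a small learning rate.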

End-to-End Learning

  • End-to-end learning usually refers to omitting any hand-crafted intermediary algorithms and directly learning the solution of a given problem from the sampled dataset. This could involve concatenation of different networks such as multiple CNNs and LSTMs, which are trained simultaneously.
  • For the OCR example, instead of trying to classify characters and then clustering them into words, it has been shown to be better to use CNNs to regress the words themselves directly.
  • In self driving cars, the network can be trained to directly learn how to drive.
  • In all such examples, the idea is to let the network go from the rawest possible input to the final output. End-to-end learning reduces the effort of human design and performs better in most applications (see the sketch below).
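
A minimal sketch in the self-driving spirit, raw pixels in, steering angle out, with no hand-crafted intermediate features (the architecture and all shapes are illustrative, loosely in the style of camera-to-steering pipelines):

```python
from tensorflow.keras import layers, models

# Raw camera frames map directly to a steering command; every layer is
# learned jointly, with no hand-designed lane detector in between
model = models.Sequential([
    layers.Conv2D(24, 5, strides=2, activation="relu", input_shape=(66, 200, 3)),
    layers.Conv2D(36, 5, strides=2, activation="relu"),
    layers.Conv2D(48, 5, strides=2, activation="relu"),
    layers.Flatten(),
    layers.Dense(100, activation="relu"),
    layers.Dense(1),  # predicted steering angle
])
model.compile(optimizer="adam", loss="mse")
```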

Confusion Matrix

  • A technique that summarizes the performance of a classification algorithm
  • The confusion matrix shows the ways in which our classification model is confused when it makes predictions
  • The number of correct and incorrect predictions for each class is summarized in one matrix (see the example below)
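
For illustration, scikit-learn provides this directly (the labels here are made up):

```python
from sklearn.metrics import confusion_matrix

y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]
print(confusion_matrix(y_true, y_pred))
# Rows are true classes, columns are predicted classes:
# [[2 0 0]
#  [0 0 1]
#  [1 0 2]]
```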

Deep Learning Training Loop

[Figure: the general training loop in deep learning. Source: Deep Learning with Python book]
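
For illustration, a minimal sketch of that loop in TensorFlow/Keras (the model, data, and hyperparameters are all synthetic placeholders): take a batch, run a forward pass, compute the loss, compute gradients, and update the weights.

```python
import tensorflow as tf
from tensorflow.keras import layers, models, losses, optimizers

model = models.Sequential([layers.Dense(16, activation="relu"), layers.Dense(10)])
loss_fn = losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = optimizers.SGD(learning_rate=0.01)

x = tf.random.normal((128, 20))                           # fake inputs
y = tf.random.uniform((128,), maxval=10, dtype=tf.int32)  # fake labels

for epoch in range(5):
    with tf.GradientTape() as tape:
        logits = model(x)              # forward pass
        loss = loss_fn(y, logits)      # how far off the model is
    grads = tape.gradient(loss, model.trainable_variables)           # gradients
    optimizer.apply_gradients(zip(grads, model.trainable_variables)) # update step
```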