Machine Learning Glossary
rameshjesswani edited this page Sep 12, 2018 · 4 revisions
- One-hot encoding: a representation in which the vector is 0 in most of the dimensions and 1 in exactly one dimension
- For example, suppose we have 10 classes (the 10 digits) and we want to tell the classifier that an image contains the digit 2. We can represent it as follows:
[0,0,1,0,0,0,0,0,0,0]
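The digit example above can be sketched as a small helper (a minimal sketch; the function name is illustrative):

```python
def one_hot(label, num_classes):
    """Return a vector that is 1 at index `label` and 0 everywhere else."""
    vec = [0] * num_classes
    vec[label] = 1
    return vec

# Digit 2 out of 10 classes, as in the example above:
encoded = one_hot(2, 10)  # [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
```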
- Loss function: represents how far off our model is from the desired outcome. We try to minimize that error; the smaller the error margin, the better our model is.
- Cross-entropy is a commonly used loss function for classification tasks.
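As a sketch, cross-entropy between a one-hot target and predicted class probabilities can be computed in pure Python (names are illustrative):

```python
import math

def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy loss: -sum(t * log(p)) over the classes.
    `eps` guards against taking log(0)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))

# A confident correct prediction yields a small loss,
# a confident wrong prediction a large one.
low = cross_entropy([0, 1, 0], [0.1, 0.8, 0.1])   # -log(0.8), about 0.22
high = cross_entropy([0, 1, 0], [0.8, 0.1, 0.1])  # -log(0.1), about 2.30
```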
- Early stopping: a form of regularization used to avoid over-fitting when training a model with an iterative method such as gradient descent.
- Early stopping rules provide guidance as to how many iterations can be run before the learner begins to over-fit; such rules have been employed in many different machine learning methods, with varying amounts of theoretical foundation.
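A minimal sketch of one common early-stopping rule, the "patience" heuristic (function and parameter names are illustrative, not from a specific library):

```python
def train_with_early_stopping(train_step, val_loss, max_epochs=100, patience=5):
    """Stop training once validation loss has not improved for `patience` epochs."""
    best_loss, best_epoch = float("inf"), 0
    for epoch in range(max_epochs):
        train_step(epoch)       # one pass of e.g. gradient descent
        loss = val_loss(epoch)  # loss on held-out validation data
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break               # no improvement for `patience` epochs: stop early
    return best_epoch, best_loss

# Simulated validation curve: improves until epoch 3, then over-fits.
curve = [1.0, 0.8, 0.6, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8]
epoch, loss = train_with_early_stopping(lambda e: None, lambda e: curve[e])
# Training stops after epoch 8; the best model was at epoch 3.
```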
A distant supervision algorithm usually has the following steps:
- It may have some labeled training data
- It has access to a pool of unlabeled data
- It has an operator that allows it to sample from this unlabeled data and label it; this operator is expected to be noisy in its labels
- The algorithm then collectively utilizes the original labeled training data (if any) and this new noisily labeled data to give the final output
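The steps above can be sketched as follows (the keyword heuristic is a hypothetical labeling operator, chosen only for illustration):

```python
def noisy_label(sentence):
    """A noisy labeling operator: mark a sentence as positive (1) if it
    contains the phrase "born in". Such heuristics mislabel some examples."""
    return 1 if "born in" in sentence else 0

def distant_supervision(seed, unlabeled, label_op, k):
    """Augment a small labeled seed set with k noisily labeled samples."""
    sampled = unlabeled[:k]                 # sample from the unlabeled pool
    noisy = [(x, label_op(x)) for x in sampled]
    return seed + noisy                     # combine into the final training set

seed = [("Einstein was born in Ulm", 1)]
pool = ["Curie was born in Warsaw", "The sky is blue today"]
training_set = distant_supervision(seed, pool, noisy_label, 2)
```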
- Pre-trained network: suppose you want to train a neural network to perform a task (e.g. classification) on a data set (e.g. a set of images). You start training by initializing the weights randomly. As soon as you start training, the weights are changed in order to perform the task with fewer mistakes (i.e. optimization). Once you're satisfied with the training results, you save the weights of your network somewhere.
- You are now interested in training a network to perform a new task (e.g. object detection) on a different data set (e.g. images too, but not the same ones you used before). Instead of starting from randomly initialized weights as you did for the first network, you can use the weights you saved from the previous network as the initial weight values for your new experiment. Initializing the weights this way is referred to as using a pre-trained network. The first network is your pre-trained network; the second one is the network you are fine-tuning.
- The idea behind pre-training is that random initialization is, well, random: the values of the weights have nothing to do with the task you're trying to solve. Why should one set of values be any better than another? But how else would you initialize the weights? If you knew how to initialize them properly for the task, you might as well set them to the optimal values (slightly exaggerated) and skip training entirely. Pre-training gives the network a head start, as if it had seen the data before.
- Using a pre-trained network makes sense when the data sets of the two tasks are related; the closer they are, the more effective pre-training is.
- The bigger the gap between the two data sets, the less effective pre-training will be. It makes little sense to pre-train a network for image classification by training it on financial data first; in this case there is too much disconnect between the pre-training and fine-tuning stages.
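The head-start idea can be illustrated with a deliberately tiny model, a single weight fitted to the mean of a data set by gradient descent (everything here is a toy sketch, not a real network):

```python
def train(w, data, lr=0.1, steps=20):
    """Gradient descent on squared error: fit weight w toward the mean of data."""
    target = sum(data) / len(data)
    for _ in range(steps):
        w -= lr * 2 * (w - target)
    return w

# "Pre-training": fit on one data set, starting from scratch.
pretrained = train(0.0, [4.0, 6.0])                  # converges near 5.0

# "Fine-tuning": a related data set (mean 6.0). Starting from the
# pre-trained weight, only a few steps get close to the new target.
fine_tuned = train(pretrained, [5.0, 7.0], steps=5)
```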
- End-to-end learning usually refers to omitting any hand-crafted intermediary algorithms and directly learning the solution of a given problem from the sampled data set. This could involve concatenating different networks, such as multiple CNNs and LSTMs, which are trained simultaneously.
- For the OCR example, instead of classifying characters and then clustering them into words, it has been shown to be a better approach to use CNNs to directly regress the words themselves.
- In self-driving cars, the network can be trained to directly learn how to drive.
- In all such examples, the idea is to let the network go from the "raw-est" possible input to the final output. End-to-end learning reduces the effort of human design and performs better in most applications.
- Confusion matrix: a technique that summarizes the performance of a classification algorithm
- A confusion matrix shows the ways in which our classification model is confused when it makes predictions
- The number of correct and incorrect predictions for each class is summarized in one matrix
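A minimal sketch of building a confusion matrix from true and predicted labels (names are illustrative):

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are actual classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix

# Two-class example: entry [0][1] counts class-0 samples that the
# model confused with class 1.
cm = confusion_matrix([0, 0, 1, 1, 1], [0, 1, 1, 1, 0], 2)
# cm == [[1, 1], [1, 2]]
```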
Source: Deep Learning with Python book