## Machine Learning Quiz

0. What is classification? What is regression?

1. What's the difference between classification and regression?

*Suppose we have a data set with a number of inputs together with corresponding targets. In both classification and regression, what we want to do is learn from the data set the relationship between the inputs and the targets. In other words, given a new data point's input, we want to predict the target using the learned relationship.*

*In classification, this target is **categorical**, e.g. something like "dog", "cat" or "sheep" if the target is an animal.*

*In regression, the target is a **real number**, e.g. something like 25.1 for a temperature target or 180 cm for a height target.*

2. Why is dimensionality reduction useful?

*High-dimensional data is hard to visualise, and it is often hard to train a machine learning model on it because of the **curse of dimensionality**.*

*Dimensionality reduction maps a set of high-dimensional data points down to a small number of dimensions (e.g. 2 or 3 dimensions for visualisation). This low-dimensional representation is an approximation, but it should capture as much of the original high-dimensional structure as possible. In other words, points that are close together in the high-dimensional space should stay close together in the low-dimensional space, and points that are far apart should stay far apart.*

*We can then use this low-dimensional approximation as our new set of features for a machine learning model.*
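The answer doesn't name a specific technique. Principal component analysis (PCA) is one common linear choice: it projects the data onto the directions of greatest variance. The sketch below is a minimal pure-Python illustration under stated assumptions, not a production implementation: it finds only the leading direction (by power iteration on the covariance matrix) and maps each point to a single coordinate along it. It assumes at least two points and one strictly largest eigenvalue; in practice a library routine such as scikit-learn's `sklearn.decomposition.PCA` would be the usual choice.

```python
import math

def top_principal_component(points, iters=200):
    """Return the mean and the leading principal direction (unit vector) of the points."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centred = [[p[j] - mean[j] for j in range(d)] for p in points]
    # Sample covariance matrix (d x d); needs n >= 2.
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiply by cov and renormalise. This converges to
    # the eigenvector with the largest eigenvalue (the direction of maximum variance),
    # provided the start vector is not orthogonal to it.
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def project_1d(points, mean, direction):
    """Map each point to one coordinate: its (centred) position along the direction."""
    return [sum((p[j] - mean[j]) * direction[j] for j in range(len(p)))
            for p in points]

# Usage: 2D points lying roughly along the line y = x become 1D coordinates.
points = [(0, 0.1), (1, 0.9), (2, 2.1), (3, 2.9), (4, 4.0)]
mean, direction = top_principal_component(points)
print(project_1d(points, mean, direction))
```

Points that were close along the line stay close in the single new coordinate, which is the structure-preserving behaviour described above.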

3. What's the purpose of clustering?

*Given a set of data points, the goal of clustering is to partition the points into separate groups, or "clusters". Similar points should be grouped in the same cluster, and dissimilar points should be in different clusters.*

*Clustering gives us insight into the intrinsic structure of the data; we would typically expect points within the same cluster to behave similarly. For example, it is often the case that points within the same cluster map to the same target in regression/classification.*
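No particular clustering algorithm is specified here; k-means is one widely used option, shown below as a minimal sketch. It assumes the number of clusters `k` is known in advance, uses Euclidean distance (`math.dist`, Python 3.8+), and alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points.

```python
import math
import random

def k_means(points, k, iters=100, seed=0):
    """Cluster points into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise centroids as k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # centroids stopped moving, so assignments are stable
        centroids = new_centroids
    return centroids, labels

# Usage: two well-separated groups of points.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = k_means(data, 2)
print(labels)
```

k-means only finds a local optimum, so results can depend on the initialisation (here controlled by `seed`).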

4. What is the difference between a categorical and numerical feature?

*A categorical feature takes one of a fixed set of values (classes), e.g. a gender feature with values "male" or "female", or the make of a car.*

*A numerical feature can take any real value (possibly within some range), e.g. height or weight.*

5. How does linear regression work?

*In linear regression, we assume that the target is a linear function of the features. In other words, the target is a weighted sum of the features (possibly with a constant "intercept" added on).*

*To make a prediction for a new input, we simply calculate this weighted sum of its features to give us the predicted target.*

*Training a linear regression model involves learning the weights of the weighted sum from a training set. The most straightforward way of doing this is to find the weights that minimise the sum of squared errors over the training set (ordinary least squares). I could tell you more, but you'd have to pay me.*
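As a minimal pure-Python sketch of least-squares fitting: prepend a column of ones for the intercept, then solve the normal equations $(A^\top A)\,w = A^\top y$ by Gaussian elimination. This assumes the features are not collinear (so $A^\top A$ is invertible); numerically, a library solver such as NumPy's `numpy.linalg.lstsq` is more robust.

```python
def fit_linear_regression(X, y):
    """Least-squares weights [w0, w1, ..., wd] for y ≈ w0 + w1*x1 + ... + wd*xd."""
    A = [[1.0] + list(row) for row in X]  # leading 1 gives the intercept w0
    d = len(A[0])
    # Augmented matrix [A^T A | A^T y] for the normal equations.
    M = [[sum(a[i] * a[j] for a in A) for j in range(d)] +
         [sum(a[i] * t for a, t in zip(A, y))] for i in range(d)]
    # Gaussian elimination with partial pivoting.
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, d):
            factor = M[r][col] / M[col][col]
            for c in range(col, d + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    w = [0.0] * d
    for i in reversed(range(d)):
        w[i] = (M[i][d] - sum(M[i][j] * w[j] for j in range(i + 1, d))) / M[i][i]
    return w

def predict(w, x):
    """Prediction = intercept plus the weighted sum of the features."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

# Usage: noise-free data generated from y = 1 + 2*x1 - 3*x2.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 3)]
y = [1, 3, -2, 0, -4]
w = fit_linear_regression(X, y)
print(w, predict(w, (3, 1)))
```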

6. How does the naive Bayes algorithm work?

*Naive Bayes is used for classification. It assumes that the feature values are independent given the target class, and this assumption allows us to easily predict the class for a new input using Bayes' theorem.*

*Training naive Bayes involves estimating the prior probability of each class (e.g. its frequency in the training data) and, separately for each class, a 1D distribution for each individual feature. This distribution might be Gaussian for a continuous feature or a multinomial distribution for a categorical feature.*

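*We can use each learned feature distribution to estimate the probability of a new input's feature values under each class. Because of our independence assumption, the likelihood of the whole input given a class is just the product of these per-feature probabilities. Bayes' theorem then combines this likelihood with the class prior to give the probability of each class given the new feature values, and we predict the most probable class.*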
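A minimal sketch of Gaussian naive Bayes for continuous features, assuming one Gaussian per feature per class (maximum-likelihood mean and variance, with a tiny constant added to avoid zero variance). It works in log space, so the product of per-feature likelihoods becomes a sum; the evidence term of Bayes' theorem is the same for every class, so it can be dropped when picking the most probable class. The class names `"A"`/`"B"` and the data are hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate a class prior and a per-feature (mean, variance) for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        prior = len(rows) / len(X)
        stats = []
        for col in zip(*rows):  # one 1D Gaussian per feature
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
            stats.append((mean, var))
        model[label] = (prior, stats)
    return model

def log_gaussian(x, mean, var):
    """Log density of a 1D Gaussian."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def predict_nb(model, x):
    """Return the class maximising log P(class) + sum_i log p(x_i | class)."""
    def log_posterior_unnormalised(label):
        prior, stats = model[label]
        # Independence assumption: the joint likelihood is a product of
        # per-feature terms, i.e. a sum of log terms.
        return math.log(prior) + sum(log_gaussian(xi, m, v)
                                     for xi, (m, v) in zip(x, stats))
    return max(model, key=log_posterior_unnormalised)

# Usage with hypothetical (height, weight) data and classes "A" and "B".
X = [(180, 80), (175, 85), (185, 90), (160, 55), (155, 50), (165, 60)]
y = ["A", "A", "A", "B", "B", "B"]
model = fit_gaussian_nb(X, y)
print(predict_nb(model, (178, 82)), predict_nb(model, (158, 52)))
```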

7. What's the purpose of splitting a data set into training and test sets?

*The machine learning model is trained over the training set; typically this involves trying to minimise some sort of error over the training set.*

*To assess the performance of the model, we should compute the error over a completely "unseen" test set. Provided the test set is not used for training or model selection, this gives us an unbiased estimate of how well the model will generalise to new data.*

*Using the training set to compute the error would give an overly optimistic estimate, as the model has already seen all the data in the training set and used it to fit its parameters.*
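A minimal sketch of a random train/test split: shuffle the indices (with a fixed seed for reproducibility) and hold out a fraction of them as the test set. The helper name `split_train_test` and the 25% default are illustrative assumptions; scikit-learn offers a similar `train_test_split` function with its own signature.

```python
import random

def split_train_test(X, y, test_fraction=0.25, seed=0):
    """Return (X_train, y_train, X_test, y_test) after a random shuffle."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(X) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return ([X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in test_idx], [y[i] for i in test_idx])

# Usage: 20 points, 5 of which are held out for testing.
X = list(range(20))
y = [2 * v for v in X]
X_train, y_train, X_test, y_test = split_train_test(X, y)
print(len(X_train), len(X_test))
```

Shuffling before splitting matters when the data is ordered (e.g. sorted by class or by time), otherwise the test set may not resemble the training set.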

8. What is overfitting? How can you detect when your machine learning algorithm is overfitting?

*Overfitting happens when the model fits to the noise in the data instead of learning the underlying function mapping inputs to targets.*

*If the error over the test set is much higher than the error over the training set, this indicates that overfitting has occurred.*
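To illustrate the detection rule, the sketch below uses 1-nearest-neighbour regression, a model that effectively memorises the training set (noise included). Its training error is exactly zero, because each training point is its own nearest neighbour, while its test error is clearly larger. The synthetic data (y = 2x plus Gaussian noise) is a hypothetical example.

```python
import random

def one_nn_predict(train_x, train_y, x):
    """1-nearest-neighbour regression: return the target of the closest training input."""
    nearest = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[nearest]

def mse(xs, ys, train_x, train_y):
    """Mean squared error of the 1-NN model on the points (xs, ys)."""
    return sum((one_nn_predict(train_x, train_y, x) - t) ** 2
               for x, t in zip(xs, ys)) / len(xs)

rng = random.Random(0)
# Hypothetical synthetic data: true function y = 2x plus Gaussian noise.
xs = [rng.uniform(0, 10) for _ in range(60)]
ys = [2 * x + rng.gauss(0, 1) for x in xs]
train_x, train_y, test_x, test_y = xs[:40], ys[:40], xs[40:], ys[40:]

train_err = mse(train_x, train_y, train_x, train_y)  # zero: the noise has been memorised
test_err = mse(test_x, test_y, train_x, train_y)
print(f"train MSE = {train_err:.3f}, test MSE = {test_err:.3f}")
```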

9. How can overfitting be combatted?

*Generally overfitting occurs when the model is too complex relative to the amount of training data we've been given. Some ways to combat it:*

- *Penalise overly complex models using "regularisation". These techniques shrink parameters towards zero, reducing model complexity.*
- *Assess generalisation error over a separate validation set (or using cross-validation), and choose the model complexity that keeps this error low.*
- *Reduce the number of features used, either by manually preprocessing the data and eliminating irrelevant features or by dimensionality reduction.*
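One common form of regularisation is ridge (L2) regression, which adds a penalty $\lambda w^2$ to the squared error. In the simplest case, a single feature with no intercept, minimising $\sum_i (y_i - w x_i)^2 + \lambda w^2$ gives the closed form $w = \sum_i x_i y_i / (\sum_i x_i^2 + \lambda)$. The minimal sketch below shows how a larger $\lambda$ shrinks the weight towards zero; the one-feature setting is a simplifying assumption.

```python
def ridge_slope(xs, ys, lam):
    """Ridge weight for y ≈ w*x (no intercept) with penalty lam * w**2.

    Setting the derivative of sum((y - w*x)**2) + lam*w**2 to zero gives
    w = sum(x*y) / (sum(x*x) + lam). lam = 0 is ordinary least squares.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

# Usage: the data lies exactly on y = 2x; increasing lam shrinks w from 2 towards 0.
xs, ys = [1, 2, 3], [2, 4, 6]
for lam in [0, 1, 14, 100]:
    print(lam, ridge_slope(xs, ys, lam))
```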

10. Name two ways of evaluating the performance of a regression algorithm.

*Mean-squared error over the test set: just the average of the squared differences between the predicted targets and the actual targets.*

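*Coefficient of determination $R^2$ over the test set: the proportion of the variance in the actual targets that is explained by the predictions, $R^2 = 1 - \sum_i (y_i - \hat{y}_i)^2 / \sum_i (y_i - \bar{y})^2$, where $y_i$ are the actual targets, $\hat{y}_i$ the predicted targets and $\bar{y}$ the mean actual target. When the predicted targets are plotted against the actual targets, it measures the deviation from the "identity line" of perfect prediction relative to the spread of the targets; $R^2 = 1$ means perfect prediction.*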
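Both regression metrics are short enough to write directly; this is a minimal sketch of the formulas above (scikit-learn provides equivalents as `sklearn.metrics.mean_squared_error` and `sklearn.metrics.r2_score`). Note that always predicting the mean target gives $R^2 = 0$.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between actual and predicted targets."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """1 - (residual sum of squares) / (total sum of squares about the mean)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Usage: compare a good predictor with one that always predicts the mean.
y_true = [1, 2, 3, 4]
print(mean_squared_error(y_true, [1, 2, 3, 5]), r_squared(y_true, [1, 2, 3, 5]))
print(r_squared(y_true, [2.5] * 4))
```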

11. Name two ways of evaluating the performance of a classification algorithm.

*Accuracy over the test set: the simplest way of assessing the performance of a classifier. It is just the fraction of test points that are correctly classified.*

*Confusion matrix over the test set: Shows the predicted classes vs the actual classes in a matrix. For example, the matrix entry with row "dog" and column "cat" shows the number of test observations with actual class "dog" but predicted class "cat".*

*For a good model, the diagonal elements of the confusion matrix should be much greater than those off the diagonal.*
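A minimal sketch of both classification metrics, using the same row/column convention as the text (rows are actual classes, columns are predicted classes). The class list is passed in explicitly so that the row and column order is fixed; the example labels are illustrative.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of test points whose predicted class matches the actual class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def confusion_matrix(y_true, y_pred, classes):
    """matrix[i][j] = number of points with actual class classes[i], predicted classes[j]."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, predicted)] for predicted in classes]
            for actual in classes]

# Usage: one "dog" is misclassified as "cat", so it lands off the diagonal.
y_true = ["dog", "dog", "cat", "sheep", "cat"]
y_pred = ["dog", "cat", "cat", "sheep", "cat"]
print(accuracy(y_true, y_pred))
for row in confusion_matrix(y_true, y_pred, ["dog", "cat", "sheep"]):
    print(row)
```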

12. How does k-NN work?

*k-nearest neighbours can be used for both classification and regression.*

*For a new input, we find the k training points closest to it (typically by Euclidean distance) and make a prediction by combining their outputs.*

*In the case of classification, we take the majority vote of the output classes of the k nearest neighbours.*

*For regression, we take the average of the output values of the k nearest neighbours.*
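A minimal sketch of both variants, assuming Euclidean distance (`math.dist`, Python 3.8+) and a brute-force search over all training points. Majority-vote ties are broken by `Counter.most_common`, which keeps equal counts in first-encountered order; choosing an odd k avoids ties in binary problems.

```python
import math
from collections import Counter

def k_nearest(train_X, x, k):
    """Indices of the k training points closest to x (Euclidean distance)."""
    return sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]

def knn_classify(train_X, train_y, x, k=3):
    """Majority vote over the classes of the k nearest neighbours."""
    labels = [train_y[i] for i in k_nearest(train_X, x, k)]
    return Counter(labels).most_common(1)[0][0]

def knn_regress(train_X, train_y, x, k=3):
    """Average of the target values of the k nearest neighbours."""
    neighbours = k_nearest(train_X, x, k)
    return sum(train_y[i] for i in neighbours) / k

# Usage: two groups of points with class labels and real-valued targets.
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
classes = ["a", "a", "a", "b", "b", "b"]
targets = [1.0, 1.0, 1.0, 10.0, 10.0, 10.0]
print(knn_classify(train_X, classes, (0.5, 0.5)), knn_regress(train_X, targets, (5.2, 5.2)))
```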

13. How does a support vector machine work?

*In the case of a linear SVM with a binary target (i.e. just 2 target classes), the SVM will attempt to learn a "hyperplane" in feature space separating the 2 classes.*

*So for a new input, we simply see which side of the hyperplane the input lies on and use this to predict the target class.*

*A linear SVM learns the hyperplane that gives rise to the largest separation (the "margin") between the two classes. This is called the "maximum-margin" classifier.*

*Unless the data is perfectly linearly separable, some points will fall inside the margin or even on the wrong side of the hyperplane. So we trade off the size of the margin against the number (and severity) of these margin violations. This trade-off is controlled by an SVM parameter, usually called C: a larger C penalises violations more heavily, giving a narrower margin.*
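The soft-margin trade-off can be written as minimising $\frac{1}{2}\lVert w\rVert^2 + C \sum_i \max(0,\, 1 - y_i(w \cdot x_i + b))$ with labels $y_i \in \{-1, +1\}$. The sketch below minimises this by plain full-batch subgradient descent, chosen for brevity; real SVM libraries typically use dedicated solvers instead. The learning rate and epoch count are illustrative assumptions.

```python
def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=1000):
    """Soft-margin linear SVM via full-batch subgradient descent; returns (w, b)."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0  # gradient of 0.5*||w||^2 is w
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # point is inside the margin or misclassified
                grad_w = [g - C * yi * xj for g, xj in zip(grad_w, xi)]
                grad_b -= C * yi
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def svm_predict(w, b, x):
    """Predict +1 or -1 according to which side of w.x + b = 0 the input lies on."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Usage: two linearly separable classes.
X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print(w, b, svm_predict(w, b, (1, 1)), svm_predict(w, b, (-1, -1)))
```

Increasing `C` makes margin violations more costly relative to the $\frac{1}{2}\lVert w\rVert^2$ term, which matches the trade-off described above.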

14. What's the purpose of a validation set?

*A validation set is used for model selection, especially for choosing hyperparameters that control model complexity: we assess the generalisation performance of each candidate setting on the validation set using some performance metric (e.g. accuracy or mean squared error). For example, we might choose a parameter that controls how much to penalise complex models, i.e. the trade-off between the model's bias and its variance.*

*Or we might even create multiple different models and choose the one with the best performance over the validation set.*
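As a minimal sketch of hyperparameter selection, the code below picks k for 1D k-NN regression by validation error: small k gives a flexible, high-variance model, while large k gives a smooth, high-bias one. The synthetic data (y = x² plus noise), the split sizes and the candidate list are all illustrative assumptions.

```python
import random

def knn_regress_1d(train_x, train_y, x, k):
    """Average the targets of the k training inputs closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k

def validation_mse(k, train_x, train_y, val_x, val_y):
    """Mean squared error of k-NN (fitted on the training set) over the validation set."""
    errors = [(knn_regress_1d(train_x, train_y, x, k) - t) ** 2
              for x, t in zip(val_x, val_y)]
    return sum(errors) / len(errors)

rng = random.Random(1)
# Hypothetical synthetic data: y = x^2 plus Gaussian noise.
xs = [rng.uniform(-3, 3) for _ in range(80)]
ys = [x * x + rng.gauss(0, 1) for x in xs]
train_x, train_y = xs[:60], ys[:60]
val_x, val_y = xs[60:], ys[60:]

candidates = [1, 3, 5, 9, 15, 31]
scores = {k: validation_mse(k, train_x, train_y, val_x, val_y) for k in candidates}
best_k = min(scores, key=scores.get)
print(scores, "-> chosen k =", best_k)
```

Because `best_k` was chosen using the validation set, its validation error is optimistically biased; the final performance estimate should still come from a separate test set.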

15. How does a decision tree work?

16. What is logistic regression?

17. What is cross-validation?

18. What is underfitting? How can you detect when your machine learning algorithm is underfitting?

19. Explain in simple terms how machine learning algorithms trade off bias against variance.

20. What is the curse of dimensionality and how would you combat it?


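```python
height = (180, 175, 155)  # numerical feature: any real value (units assumed to be cm)
weight = (75, 85, 65)     # numerical feature (units assumed to be kg)
gender = ("male", "female")  # categorical feature: one of a fixed set of classes
```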