Added answers for Quiz 6(2), 7, 8(1).
johanga committed Nov 27, 2016
1 parent c0d8571 commit 9d4c754
Showing 3 changed files with 170 additions and 0 deletions.
71 changes: 71 additions & 0 deletions quiz/week6_2.txt
@@ -0,0 +1,71 @@
Machine Learning System Design Quiz.
===============================================================================================================================================================
Question 1.
You are working on a spam classification system using regularized logistic regression.
"Spam" is a positive class (y = 1) and "not spam" is the negative class (y = 0).
You have trained your classifier and there are m = 1000 examples in the cross-validation set. The chart of predicted class vs. actual class is:

                   | Actual Class: 1 | Actual Class: 0 |
-------------------|-----------------|-----------------|
Predicted Class: 1 |       85        |       890       |
Predicted Class: 0 |       15        |       10        |

For reference:
Accuracy = (true positives + true negatives) / (total examples)
Precision = (true positives) / (true positives + false positives)
Recall = (true positives) / (true positives + false negatives)
F1 score = (2 * precision * recall) / (precision + recall)

What is the classifier's precision (as a value from 0 to 1)?
Enter your answer in the box below. If necessary, provide at least two digits after the decimal point.
(The recall and F1 score are also listed below, since other versions of this question ask for those instead.)

A. CORRECT. Precision = 0.087179
A. CORRECT. Recall = 0.85
A. CORRECT. F1 Score = 0.158138
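As a quick check, a minimal Python sketch (using the confusion-matrix counts above) reproduces all three values:

    # Counts from the cross-validation confusion matrix above (m = 1000).
    tp, fp = 85, 890   # predicted 1: actual 1 / actual 0
    fn, tn = 15, 10    # predicted 0: actual 1 / actual 0

    accuracy  = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)
    f1        = 2 * precision * recall / (precision + recall)

    print(accuracy, precision, recall, f1)   # 0.095 0.0872 0.85 0.1581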
===============================================================================================================================================================
Question 2.
Suppose a massive dataset is available for training a learning algorithm. Training on a lot of data is likely to give good performance when two of the following conditions hold true.
Which are the two?

1. CORRECT. Our learning algorithm is able to represent fairly complex functions (for example, if we train a neural network or other model with a large number of parameters).
2. WRONG. The classes are not too skewed.
3. CORRECT. A human expert on the application domain can confidently predict y when given only the features x (or more generally, if we have some way to be confident that x contains sufficient information to predict y accurately).
4. WRONG. When we are willing to include high order polynomial features of x (such as x1^2, x2^2, x1*x2, etc.).
5. WRONG. We train a model that does not use regularization.
6. CORRECT. We train a learning algorithm with a large number of parameters (that is able to learn/represent fairly complex functions).
7. WRONG. We train a learning algorithm with a small number of parameters (that is thus unlikely to overfit).
8. CORRECT. The features x contain sufficient information to predict y accurately. (For example, one way to verify this is if a human expert on the domain can confidently predict y when given only x).
===============================================================================================================================================================
Question 3.
Suppose you have trained a logistic regression classifier that outputs hθ(x).
Currently, you predict 1 if hθ(x)≥threshold, and predict 0 if hθ(x)<threshold, where currently the threshold is set to 0.5.
Suppose you increase the threshold to 0.7. Which of the following are true? Check all that apply.

1. CORRECT. The classifier is likely to now have higher precision.
2. WRONG. The classifier is likely to have unchanged precision and recall, but higher accuracy.
3. WRONG. The classifier is likely to have unchanged precision and recall, and thus the same F1 score.
4. WRONG. The classifier is likely to now have higher recall.
5. WRONG. The classifier is likely to have unchanged precision and recall, but lower accuracy.
6. CORRECT. The classifier is likely to now have lower recall.
7. WRONG. The classifier is likely to now have lower precision.
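A small sketch with hypothetical scores and labels (not from the quiz) illustrates why: raising the threshold makes the classifier more conservative about predicting 1, which tends to raise precision and lower recall.

    # Hypothetical predicted probabilities hθ(x) and true labels, for illustration only.
    scores = [0.95, 0.85, 0.75, 0.65, 0.60, 0.55]
    labels = [1,    1,    1,    0,    1,    0]

    def precision_recall(threshold):
        preds = [1 if s >= threshold else 0 for s in scores]
        tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
        fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
        fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
        return tp / (tp + fp), tp / (tp + fn)

    print(precision_recall(0.5))   # (0.667, 1.0)
    print(precision_recall(0.7))   # (1.0, 0.75) -- higher precision, lower recall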
===============================================================================================================================================================
Question 4.
Suppose you are working on a spam classifier, where spam emails are positive examples (y=1) and non-spam emails are negative examples (y=0).
You have a training set of emails in which 99% of the emails are non-spam and the other 1% is spam. Which of the following statements are true? Check all that apply.

1. WRONG. If you always predict spam (output y=1), your classifier will have a recall of 0% and precision of 99%.
2. CORRECT. If you always predict non-spam (output y=0), your classifier will have an accuracy of 99%.
3. CORRECT. If you always predict non-spam (output y=0), your classifier will have a recall of 0%.
4. CORRECT. If you always predict spam (output y=1), your classifier will have a recall of 100% and precision of 1%.
5. CORRECT. A good classifier should have both high precision and high recall on the cross validation set.
6. CORRECT. If you always predict non-spam (output y=0), your classifier will have 99% accuracy on the training set, and it will likely perform similarly on the cross validation set.
7. WRONG. If you always predict non-spam (output y=0), your classifier will have 99% accuracy on the training set, but it will do much worse on the cross validation set because it has overfit the training data.
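A short sketch (a hypothetical 1000-example set with the 99%/1% skew from the question) confirms the numbers behind options 2-4:

    # 10 spam (y = 1) and 990 non-spam (y = 0) examples.
    labels = [1] * 10 + [0] * 990

    def metrics(constant_prediction):
        tp = sum(1 for y in labels if constant_prediction == 1 and y == 1)
        fp = sum(1 for y in labels if constant_prediction == 1 and y == 0)
        fn = sum(1 for y in labels if constant_prediction == 0 and y == 1)
        tn = sum(1 for y in labels if constant_prediction == 0 and y == 0)
        accuracy  = (tp + tn) / len(labels)
        precision = tp / (tp + fp) if (tp + fp) else float('nan')
        recall    = tp / (tp + fn) if (tp + fn) else float('nan')
        return accuracy, precision, recall

    print(metrics(1))   # always spam:     accuracy 0.01, precision 0.01, recall 1.0
    print(metrics(0))   # always non-spam: accuracy 0.99, precision undefined, recall 0.0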
===============================================================================================================================================================
Question 5.
Which of the following statements are true? Check all that apply.

1. CORRECT. The "error analysis" process of manually examining the examples which your algorithm got wrong can help suggest what are good steps to take (e.g., developing new features) to improve your algorithm's performance.
2. WRONG. If your model is underfitting the training set, then obtaining more data is likely to help.
3. CORRECT. Using a very large training set makes it unlikely for the model to overfit the training data.
4. WRONG. It is a good idea to spend a lot of time collecting a large amount of data before building your first version of a learning algorithm.
5. WRONG. After training a logistic regression classifier, you must use 0.5 as your threshold for predicting whether an example is positive or negative.
52 changes: 52 additions & 0 deletions quiz/week7.txt
@@ -0,0 +1,52 @@
Machine Learning Support Vector Machines Quiz.
===============================================================================================================================================================
Question 1.
Suppose you have trained an SVM classifier with a Gaussian kernel, and it learned the following decision boundary on the training set:
http://spark-public.s3.amazonaws.com/ml/images/12.1-b.jpg
When you measure the SVM's performance on a cross validation set, it does poorly. Should you try increasing or decreasing C? Increasing or decreasing sigma^2?

A. WRONG. It would be reasonable to try decreasing C. It would also be reasonable to try decreasing sigma^2.
A. WRONG. It would be reasonable to try increasing C. It would also be reasonable to try increasing sigma^2.
A. WRONG. It would be reasonable to try increasing C. It would also be reasonable to try decreasing sigma^2.
A. CORRECT. It would be reasonable to try decreasing C. It would also be reasonable to try increasing sigma^2.
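In scikit-learn terms (a sketch, assuming X_train/y_train and X_cv/y_cv already exist), the RBF kernel is parameterized by gamma = 1 / (2 * sigma^2), so "increase sigma^2" translates to "decrease gamma":

    from sklearn.svm import SVC

    sigma_sq = 4.0                          # larger sigma^2 -> smoother kernel
    clf = SVC(C=0.1,                        # smaller C -> stronger regularization
              kernel='rbf',
              gamma=1.0 / (2.0 * sigma_sq))
    clf.fit(X_train, y_train)
    print(clf.score(X_cv, y_cv))            # re-check performance on the CV set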

===============================================================================================================================================================
Question 2.
The figure below shows a plot of f1=similarity(x,l(1)) when sigma^2=1.
http://spark-public.s3.amazonaws.com/ml/images/12.2-question.jpg
Which of the following is a plot of f1 when sigma^2=0.25?

A. CORRECT. http://spark-public.s3.amazonaws.com/ml/images/12.2-b.jpg
A. WRONG. http://spark-public.s3.amazonaws.com/ml/images/12.2-a.jpg
A. WRONG. http://spark-public.s3.amazonaws.com/ml/images/12.2-c.jpg
A. WRONG. http://spark-public.s3.amazonaws.com/ml/images/12.2-d.jpg
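The kernel from the lectures is sim(x, l) = exp(-||x - l||^2 / (2 * sigma^2)); a small sketch (with a hypothetical landmark) shows that shrinking sigma^2 to 0.25 makes the bump much narrower:

    import numpy as np

    def similarity(x, l, sigma_sq):
        # Gaussian kernel: exp(-||x - l||^2 / (2 * sigma^2))
        diff = np.asarray(x, float) - np.asarray(l, float)
        return np.exp(-np.dot(diff, diff) / (2.0 * sigma_sq))

    l1 = [3.0, 5.0]                         # hypothetical landmark
    x  = [4.0, 5.0]                         # one unit away from l1
    print(similarity(x, l1, 1.0))           # ~0.61
    print(similarity(x, l1, 0.25))          # ~0.14 -- falls off much faster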

===============================================================================================================================================================
Question 3.
The first term of the SVM cost function will be zero if two of the following four conditions hold true.
Which are the two conditions that would guarantee that this term equals zero?

A. CORRECT. For every example with y(i)=1, we have that theta^T * x(i) >= 1.
A. WRONG. For every example with y(i)=0, we have that theta^T * x(i) <= 0.
A. CORRECT. For every example with y(i)=0, we have that theta^T * x(i) <= -1.
A. WRONG. For every example with y(i)=1, we have that theta^T * x(i) >= 0.
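These thresholds come from the piecewise-linear costs used in the lectures, cost1(z) = max(0, 1 - z) for y = 1 examples and cost0(z) = max(0, 1 + z) for y = 0 examples; a tiny sketch shows where each one hits zero:

    def cost1(z):                   # cost contribution of a y = 1 example
        return max(0.0, 1.0 - z)    # zero once theta^T * x >= 1

    def cost0(z):                   # cost contribution of a y = 0 example
        return max(0.0, 1.0 + z)    # zero once theta^T * x <= -1

    print(cost1(1.2), cost0(-1.5))  # 0.0 0.0 -> first term of the objective vanishes
    print(cost1(0.5), cost0(0.0))   # 0.5 1.0 -> margin conditions not satisfied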

===============================================================================================================================================================
Question 4.
Suppose you have a dataset with n = 10 features and m = 5000 examples.
After training your logistic regression classifier with gradient descent, you find that it has underfit the training set and does not achieve the desired performance on the training or cross validation sets.
Which of the following might be promising steps to take? Check all that apply.

A. CORRECT. Try using a neural network with a large number of hidden units.
A. WRONG. Use a different optimization method since using gradient descent to train logistic regression might result in a local minimum.
A. CORRECT. Create / add new polynomial features.
A. WRONG. Reduce the number of examples in the training set.
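A sketch of the "create/add new polynomial features" option in scikit-learn (assuming X_train and y_train already exist): the pipeline puts polynomial feature expansion in front of logistic regression to give the underfitting model more capacity.

    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    model = make_pipeline(PolynomialFeatures(degree=3),
                          LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    print(model.score(X_train, y_train))    # training accuracy should improve if it was underfitting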

===============================================================================================================================================================
Question 5.
Which of the following statements are true? Check all that apply.

A. WRONG. If you are training multi-class SVMs with the one-vs-all method, it is not possible to use a kernel.
A. CORRECT. The maximum value of the Gaussian kernel (i.e., sim(x,l(1))) is 1.
A. WRONG. If the data are linearly separable, an SVM using a linear kernel will return the same parameters Theta regardless of the chosen value of C (i.e., the resulting value of Theta does not depend on C).
A. CORRECT. Suppose you have 2D input examples (i.e., x(i) is in R^2). The decision boundary of the SVM (with the linear kernel) is a straight line.
47 changes: 47 additions & 0 deletions quiz/week8_1.txt
@@ -0,0 +1,47 @@
Machine Learning Unsupervised Learning Quiz.
===============================================================================================================================================================
Question 1.
For which of the following tasks might K-means clustering be a suitable algorithm? Select all that apply.

A. CORRECT. Given a set of news articles from many different news websites, find out what are the main topics covered.
A. WRONG. Given many emails, you want to determine if they are Spam or Non-Spam emails.
A. WRONG. Given historical weather records, predict if tomorrow's weather will be sunny or rainy.
A. CORRECT. From the user usage patterns on a website, figure out what different groups of users exist.

===============================================================================================================================================================
Question 2.
Suppose we have three cluster centroids mu1=[1,2], mu2=[-3,0] and mu3=[4,2]. Furthermore, we have a training example x(i)=[3,1].
After a cluster assignment step, what will c(i) be?

A. WRONG. c(i) = 2
A. WRONG. c(i) is not assigned
A. WRONG. c(i) = 1
A. CORRECT. c(i) = 3
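The assignment step picks the centroid with the smallest squared distance; a quick computation confirms it:

    import numpy as np

    centroids = np.array([[1, 2], [-3, 0], [4, 2]], dtype=float)   # mu1, mu2, mu3
    x = np.array([3, 1], dtype=float)

    sq_dists = np.sum((centroids - x) ** 2, axis=1)
    print(sq_dists)                  # [ 5. 37.  2.]
    print(np.argmin(sq_dists) + 1)   # 3 -> c(i) = 3 (mu3 is closest)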

===============================================================================================================================================================
Question 3.
K-means is an iterative algorithm, and two of the following steps are repeatedly carried out in its inner loop. Which two?

A. WRONG. The cluster centroid assignment step, where each cluster centroid mu_i is assigned (by setting c(i)) to the closest training example x(i).
A. WRONG. Move each cluster centroid mu_k by setting it to be equal to the closest training example x(i).
A. CORRECT. Move the cluster centroids, where the centroids mu_k are updated.
A. CORRECT. The cluster assignment step, where the parameters c(i) are updated.
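A minimal sketch of that inner loop (a hypothetical helper, not the course code):

    import numpy as np

    def kmeans_inner_loop(X, centroids, iterations=10):
        centroids = np.asarray(centroids, dtype=float).copy()
        for _ in range(iterations):
            # Cluster assignment step: set c(i) to the index of the closest centroid.
            dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
            c = np.argmin(dists, axis=1)
            # Move-centroid step: set each mu_k to the mean of the points assigned to it.
            for k in range(len(centroids)):
                if np.any(c == k):
                    centroids[k] = X[c == k].mean(axis=0)
        return c, centroids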

===============================================================================================================================================================
Question 4.
Suppose you have an unlabeled dataset {x(1),...,x(m)}. You run K-means with 50 different random initializations, and obtain 50 different clusterings of the data.
What is the recommended way for choosing which one of these 50 clusterings to use?

A. WRONG. The only way to do so is if we also have labels y(i) for our data.
A. CORRECT. For each of the clusterings, compute 1/m * sum(||x(i)-mu_c(i)||^2) over i=1,...,m, and pick the one that minimizes this.
A. WRONG. The answer is ambiguous, and there is no good way of choosing.
A. WRONG. Always pick the final (50th) clustering found, since by that time it is more likely to have converged to a good solution.
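The quantity in the correct answer is the distortion (cost) function from the lectures; a sketch of how you might compare the 50 runs (assuming each run returns its centroids and assignments):

    import numpy as np

    def distortion(X, centroids, c):
        # 1/m * sum over i of ||x(i) - mu_c(i)||^2
        return np.mean(np.sum((X - centroids[c]) ** 2, axis=1))

    # Hypothetical: `runs` is a list of (centroids, c) pairs from the 50 restarts.
    # best_centroids, best_c = min(runs, key=lambda run: distortion(X, run[0], run[1]))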

===============================================================================================================================================================
Question 5.
Which of the following statements are true? Select all that apply.

A. WRONG. Since K-Means is an unsupervised learning algorithm, it cannot overfit the data, and thus it is always better to have as large a number of clusters as is computationally feasible.
A. CORRECT. If we are worried about K-means getting stuck in bad local optima, one way to ameliorate (reduce) this problem is if we try using multiple random initializations.
A. CORRECT. For some datasets, the "right" or "correct" value of K (the number of clusters) can be ambiguous, and hard even for a human expert looking carefully at the data to decide.
A. WRONG. The standard way of initializing K-means is setting mu_1=...=mu_k to be equal to a vector of zeros.
