# Machine Learning Literacy

## First Try

1. Which techniques are unsupervised machine learning techniques?
    - [x] Clustering and dimensionality reduction


2. What is a characteristic of the random forest classification algorithm?
    - [ ] Certain coefficients are forced to be zero.
    - [x] It is insensitive to highly correlated variables.
    - [ ] It calculates Manhattan distances.
    - [ ] It is prone to overfitting.


3. What kind of a clustering algorithm is DBSCAN?
    - [x] Density-based
    - [ ] Distribution-based
    - [ ] Centroid-based
    - [ ] Hierarchical


4. Which is an example of a classification model that is sensitive to variable magnitudes?
    - [ ] Tree-based models
    - [ ] Naive Bayes
    - [ ] LDA
    - [x] <span style='background:yellow'>KNN</span>
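
KNN is sensitive because it ranks neighbors by distance, so a feature with a large numeric range can dominate the result. A minimal sketch in plain Python, using made-up points; the `(income, age)` features and the `nearest` and `min_max_scale` helpers are illustrative assumptions, not part of the quiz:

```python
import math

def nearest(query, points):
    """Return the index of the point closest to `query` (1-nearest neighbor)."""
    return min(range(len(points)), key=lambda i: math.dist(query, points[i]))

def min_max_scale(rows):
    """Rescale every column of `rows` to the range [0, 1]."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [tuple((v - l) / (h - l) for v, l, h in zip(row, lo, hi)) for row in rows]

# Hypothetical data: (income in dollars, age in years).
points = [(50_000, 25), (51_000, 60)]
query = (50_900, 26)

# Raw features: the income gap dominates the distance.
raw_choice = nearest(query, points)

# Scaled features: both columns contribute on the same scale.
*scaled_points, scaled_query = min_max_scale(points + [query])
scaled_choice = nearest(scaled_query, scaled_points)

print(raw_choice, scaled_choice)  # the chosen neighbor changes after scaling
```

Tree-based models split on one feature at a time, so rescaling a feature does not change which splits are available.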


5. You have a stable model deployed in production. The data scientist provides you with an alternative model to replace the one already deployed. How will you ensure that the new model does not cause any losses?
    - [ ] Compare the train set performance of both models and replace the older model if the newer one has better metrics.
    - [ ] Compare the test set performance of both models and replace the older model if the newer one has better metrics.
    - [x] Do a staged rollout of the new model by exposing it to a small share of inputs, then increase that share as its performance proves good.
    - [ ] Replace the older model with the newer one. Use it until there is as much model performance data as in the test dataset.
    

6. Which condition guarantees that stochastic gradient descent converges to the global optimum when minimizing a convex loss function `f(x)` with step size `n` at iteration `t`?
    - [x] Decreasing step size `n = 1/sqrt(t)`
    - [ ] Constant step size `n`
    - [ ] Negative step size, `n < 0`
    - [ ] Decreasing step size `n = 1/t^2`
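
To see why the decay rate matters, the sketch below runs noisy gradient steps with both decaying schedules. The loss `f(x) = 0.05 * (x - 3)**2`, the noise level, the starting point, and the `sgd` helper are all assumptions chosen for illustration. The step sizes `1/t^2` add up to a finite total (about 1.64), so the iterate can only travel a limited distance, while the steps `1/sqrt(t)` add up without bound.

```python
import random

def sgd(step_size, steps=20_000, x0=50.0, seed=0):
    """Minimize the hypothetical loss f(x) = 0.05 * (x - 3)**2 from noisy gradients."""
    rng = random.Random(seed)
    x = x0
    for t in range(1, steps + 1):
        noisy_grad = 0.1 * (x - 3.0) + rng.gauss(0.0, 0.5)  # true gradient + noise
        x -= step_size(t) * noisy_grad
    return x

x_sqrt = sgd(lambda t: 1.0 / t ** 0.5)  # step sizes sum without bound
x_sq = sgd(lambda t: 1.0 / t ** 2)      # step sizes sum to a finite value

print(f"1/sqrt(t): x = {x_sqrt:.2f}   1/t^2: x = {x_sq:.2f}   (minimum at x = 3)")
```

In this toy run the `1/t^2` schedule stalls far from the minimum, which is one way to see why it cannot guarantee convergence.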


7. You built your machine learning model using the Synthetic Minority Oversampling Technique (SMOTE) to generate additional samples, but the model still has an R-squared value of 49%. Assuming you applied the appropriate modeling technique for the problem, which best explains the poor model performance?
    - [ ] There are duplicate data points in the majority class while oversampling with SMOTE.
    - [ ] The oversampling technique is the reason the model is performing poorly.
    - [x] <span style='background:yellow'>The data is of poor quality and fails to represent the phenomena you are predicting.</span>
    - [ ] Hyperparameters are not tuned.


8. In Python's scikit-learn library, what is the output of a K-means clustering model's predict() function?
    - [ ] All the clusters to which a data point could belong
    - [ ] The probability of a data point belonging to each cluster
    - [x] <span style='background:yellow'>The cluster to which each data point belongs</span>
    - [ ] The distance of a data point from each centroid
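
The sketch below mirrors that behavior in plain Python rather than calling scikit-learn: given centroids from an already fitted model, each point receives a single cluster index. The `predict` helper, the centroids, and the points are hypothetical. In scikit-learn itself, distances to each centroid come from `KMeans.transform()`, not from `predict()`.

```python
import math

def predict(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]

# Hypothetical centroids, as if taken from an already fitted model.
centroids = [(0.0, 0.0), (10.0, 10.0)]
new_points = [(1.0, 2.0), (9.0, 8.5), (4.0, 4.0)]

labels = predict(new_points, centroids)
print(labels)  # [0, 1, 0]
```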


9. Which function transforms the skewed distribution of a price variable?
![Distribution of the price variable](img/transform_distribution.png)
    - [x] np.log(price+1)
    - [ ] np.exp(price)
    - [ ] np.log(1/price)
    - [ ] np.sqrt(price)
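
A small check of the effect, using only the standard library: `math.log1p(p)` computes `log(1 + p)`, the same quantity as `np.log(price + 1)`. The price list and the `skewness` helper are made up for illustration. Adding 1 keeps a price of 0 valid, since `log(0)` is undefined.

```python
import math
import statistics

def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)

# Hypothetical right-skewed prices: many cheap items, a few expensive ones.
prices = [0, 5, 8, 10, 12, 15, 20, 30, 60, 150, 400, 1200]

log_prices = [math.log1p(p) for p in prices]  # log(price + 1)

print(f"skew before: {skewness(prices):.2f}")
print(f"skew after:  {skewness(log_prices):.2f}")
```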
    

10. In a multi-GPU architecture, what will you do to achieve data parallelism in a neural network?
    - [ ] Use the mini-batches but different weights on each GPU.
    - [ ] Use the mini-batches but different layers on each GPU.
    - [x] Use the same weights but split the mini-batches to each GPU.
    - [ ] Use the mini-batches but different nodes on each GPU.
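
A toy simulation of the idea, with no GPU involved: every simulated device holds the same weight, receives an equal shard of the mini-batch, and the per-device gradients are averaged. The one-parameter model, the data, and the `grad_mse` helper are assumptions for illustration only.

```python
def grad_mse(w, batch):
    """Gradient of mean squared error for the 1-D model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

w = 0.5  # the same weight is replicated on every simulated device
mini_batch = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]

# Split the mini-batch into equal shards, one per simulated device.
n_devices = 2
shard_size = len(mini_batch) // n_devices
shards = [mini_batch[i * shard_size:(i + 1) * shard_size] for i in range(n_devices)]

# Each device computes a gradient on its shard; the results are averaged.
per_device = [grad_mse(w, shard) for shard in shards]
averaged = sum(per_device) / n_devices

full = grad_mse(w, mini_batch)
print(averaged, full)  # the two values agree up to rounding
```

With equal shard sizes, the averaged gradient matches the gradient of the whole mini-batch, so every replica can apply the same update and keep its weights in sync.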


11. After sampling from the predictions that your model has produced on the newest batch of data, you notice a reduction in model performance by more than 10%. What is a common reason for this?
    - [ ] You are only reviewing a sample of the predictions
    - [ ] Poor model prediction at the time the model was put in production
    - [x] Model drift (Concept drift)
    - [ ] Model selection


12. On a complex dataset with a lot of features, why do random forests generally outperform decision trees by a good margin, even though they are built out of decision trees?
    - [ ] Random forests reduce overfitting by training a decision tree separately on different parts of the data and deriving an average of all decision trees.
    - [ ] Random forests reduce overfitting by training different decision trees for different random subsets of data and use the best performing tree on the whole data.
    - [ ] Random forests reduce overfitting by training multiple decision trees on the data and picking the best.
    - [x] Random forests reduce overfitting by combining decision tree results trained on random features of random subsets of data.

13. Your regression model's performance is not very good and you tried tuning the parameters. How will you decide if using LASSO regression can help you improve the model's performance?
    - [ ] When there aren't enough features and the regression coefficients are too small
    - [ ] When the input data has too many outliers and there is no correlation between the variables
    - [ ] When the model is underfitting or there are too many categorical features
    - [x] <span style='background:yellow'>When you require automatic feature selection or are dealing with highly correlated predictors</span>
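
The feature-selection effect can be sketched with the soft-thresholding operator. Under the simplifying assumption of orthonormal predictors and the objective `0.5 * ||y - Xb||^2 + lam * ||b||_1`, each LASSO coefficient is the soft-thresholded least-squares coefficient. The coefficient values and `lam` below are made up.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

# Hypothetical least-squares coefficients for four features.
ols_coefs = [2.5, -0.3, 0.1, -1.8]
lam = 0.5  # regularization strength (assumed value)

lasso_coefs = [soft_threshold(c, lam) for c in ols_coefs]
print(lasso_coefs)  # the two small coefficients are set to exactly zero
```

Features whose coefficient ends at zero drop out of the model, which is the automatic selection the answer refers to.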


14. Based on the summary statistics, which method should you apply to replace missing values of the dataframe df with its relevant tendency measures?
![Summary statistics of df](img/median.png)

```python
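# Fill X1-X3 with their column means and X4 with its column median.
# Column names are assumed to be strings; each call returns a filled copy.
df[["X1", "X2", "X3"]].apply(lambda x: x.fillna(x.mean()), axis=0)
df[["X4"]].apply(lambda x: x.fillna(x.median()), axis=0)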
```


15. Xi and Xj are points in a high-dimensional space, and Yi and Yj are their representations in a lower-dimensional space. The similarity between Xi and Xj is the conditional probability `p(j|i)`, and the similarity between Yi and Yj is `q(j|i)`. Which must be true for a perfect representation of Xi and Xj in the lower-dimensional space?
    - [x] `p(j|i) = q(j|i)`
    - [ ] `p(j|i) > q(j|i)`
    - [ ] `p(j|i) = 0` and `q(j|i) = 1`
    - [ ] `p(j|i) < q(j|i)`
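
This setup appears to describe stochastic neighbor embedding (SNE), which scores the mismatch between the two sets of probabilities with the Kullback-Leibler divergence. The sketch below, with made-up probabilities and a hypothetical `kl_divergence` helper, shows that the divergence is zero when the two distributions are equal and positive otherwise.

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical neighbor probabilities p(j|i) for a single point i.
p = [0.7, 0.2, 0.1]

perfect = kl_divergence(p, p)                  # q equals p
imperfect = kl_divergence(p, [0.4, 0.4, 0.2])  # q differs from p

print(perfect, imperfect)
```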
    
16. What is canary testing of a model?
    - [ ] Deploying a small part of the model before full model release
    - [ ] Testing the model for its response times under high stress conditions
    - [ ] Testing a model for its scalability under high traffic conditions
    - [x] Deploying a model for a fraction of the users before full release
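
One common way to pick the fraction is to hash a stable user identifier into a bucket, so the same user always sees the same model. This is a minimal sketch under that assumption; the `use_canary` function, the user ID format, and the 5% share are hypothetical.

```python
import hashlib

def use_canary(user_id: str, fraction: float) -> bool:
    """Deterministically route about `fraction` of users to the new model."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 10_000  # stable bucket in [0, 9999]
    return bucket < fraction * 10_000

users = [f"user-{i}" for i in range(10_000)]
canary_share = sum(use_canary(u, 0.05) for u in users) / len(users)
print(f"share routed to the new model: {canary_share:.3f}")
```

Raising `fraction` step by step turns the same routing rule into a staged rollout.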


17. What is the difference between traditional neural networks and deep neural networks?
    - [ ] Deep neural networks are used only for computer vision, whereas traditional neural networks are used only for smaller problems.
    - [ ] Traditional neural networks are used for fewer classes than deep neural networks.
    - [ ] Deep neural networks require more data for training than traditional neural networks.
    - [x] <span style='background:yellow'>Deep neural networks can have loops, whereas traditional neural networks are always feed-forward.</span>


18. How do you implement algorithm stacking to create a machine learning model?
    - [x] Use several models on the entire dataset and use a meta-model on the results.
    - [ ] Use several models on the entire data set and use the average of the results.
    - [ ] Test several models on the entire data set and use the model with the best validation performance.
    - [ ] Use different models on different subsets of the data and use the average of all results.