# Feature Engineering


## First Try

1. You are testing the null hypothesis for several groups in a dataset using analysis of variance (ANOVA), and your calculated F-statistic is 12. What information is missing to decide whether to reject the null hypothesis?
    - [ ] There is no missing information, any F > 0 indicates the null hypothesis should be rejected.
    - [ ] The between and within sum of squares, and the selected significance level alpha.
    - [ ] There is no missing information, any F > 1 indicates the null hypothesis should be rejected.
    - [x] <span style='background:yellow'>The degrees of freedom of the between and within variabilities, and the selected significance level alpha.</span>
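The decision can be sketched in code, assuming (hypothetically) 3 groups with 30 observations in total, a significance level of 0.05, and that SciPy is installed. The between and within degrees of freedom are k - 1 and N - k; `scipy.stats.f.ppf` gives the critical value and `scipy.stats.f.sf` gives the p-value for the observed F.

```python
from scipy import stats

# Hypothetical setup: k = 3 groups, N = 30 observations in total
k, n_total = 3, 30
df_between = k - 1          # numerator degrees of freedom
df_within = n_total - k     # denominator degrees of freedom
alpha = 0.05                # chosen significance level

f_stat = 12.0
f_critical = stats.f.ppf(1 - alpha, df_between, df_within)  # reject if F exceeds this
p_value = stats.f.sf(f_stat, df_between, df_within)         # P(F >= 12) under H0

reject_null = f_stat > f_critical   # equivalent to p_value < alpha
print(f"critical F = {f_critical:.3f}, p = {p_value:.5f}, reject H0: {reject_null}")
```

With different group counts or sample sizes, the same F = 12 maps to a different critical value, which is why the degrees of freedom are needed.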


2. You are implementing a nearest neighbor classifier on a dataset whose features need to be scaled. How can you use Scikit-Learn's MinMaxScaler class to achieve this?

```python
from sklearn.preprocessing import MinMaxScaler
scaled_features = MinMaxScaler().fit_transform(features)  # rescale each feature (column) to [0, 1]
```
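In a real nearest neighbor workflow the scaler is usually fitted on the training data only and then reused on the test data. The sketch below uses made-up toy arrays where the second feature has a much larger range than the first; without scaling, that feature would dominate the distance computation.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler

# Toy data (illustrative only): feature 2 spans hundreds, feature 1 spans single digits
X_train = np.array([[1.0, 1000.0], [2.0, 1500.0], [8.0, 200.0], [9.0, 300.0]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[1.5, 1200.0], [8.5, 250.0]])

scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn per-column min/max from training data
X_test_scaled = scaler.transform(X_test)        # reuse the training min/max on test data

knn = KNeighborsClassifier(n_neighbors=1).fit(X_train_scaled, y_train)
predictions = knn.predict(X_test_scaled)
```

Wrapping the scaler and classifier in `sklearn.pipeline.make_pipeline` achieves the same separation automatically.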


3. What is a text corpus?
    - [ ] The body of a text (as opposed to the introduction and conclusion).
    - [ ] A subset of documents resulting from a query in a system.
    - [ ] The structure of a given text.
    - [x] <span style='background:yellow'>The entire set of documents involved in a system.</span>


4. In principal component analysis (PCA), the first principal component is the direction with the largest what?
    - [ ] Entropy
    - [ ] Mean
    - [x] Variance
    - [ ] Significance
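A small NumPy sketch on synthetic data illustrates the answer: the first principal component is the eigenvector of the covariance matrix with the largest eigenvalue, and projecting the centered data onto it gives a variance at least as large as along any other unit direction.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic, strongly correlated 2-D data (illustrative only)
x = rng.normal(size=500)
data = np.column_stack([x, 0.5 * x + 0.1 * rng.normal(size=500)])

centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order
first_pc = eigvecs[:, -1]                # eigenvector with the largest eigenvalue

# Variance of the data projected onto the first principal component
pc_variance = np.var(centered @ first_pc, ddof=1)

# Variance along a few random unit directions, for comparison
directions = rng.normal(size=(5, 2))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
random_variances = [np.var(centered @ d, ddof=1) for d in directions]
```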


5. Which of the following algorithms can be used to model the analogy "King is to Queen as Husband is to Wife"?
    - [ ] Bag-of-n-grams
    - [x] <span style='background:yellow'>Word2Vec</span>
    - [ ] Parts-of-speech
    - [ ] Bag-of-Words
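Word2Vec answers such analogies with vector arithmetic on learned embeddings: queen - king + husband should land near wife. The sketch below uses hand-picked 3-d vectors that are purely hypothetical (real Word2Vec vectors are learned from a corpus and typically have hundreds of dimensions); it only shows the arithmetic and the cosine-similarity lookup.

```python
import numpy as np

# Hypothetical toy embeddings: axis 1 ~ "royalty", axis 2 ~ "gender", axis 3 ~ "marriage"
emb = {
    "king":    np.array([0.9, 0.8, 0.1]),
    "queen":   np.array([0.9, -0.8, 0.1]),
    "husband": np.array([0.1, 0.8, 0.9]),
    "wife":    np.array([0.1, -0.8, 0.9]),
    "apple":   np.array([-0.7, 0.0, -0.6]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(a, b, c):
    """Solve 'a is to b as c is to ?' via b - a + c and nearest cosine neighbour."""
    target = emb[b] - emb[a] + emb[c]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))

print(analogy("king", "queen", "husband"))
```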


6. What are stop words?
    - [ ] The final word of every sentence.
    - [ ] Very rare words in the text corpus.
    - [ ] The word that separates two topics in the text.
    - [x] <span style='background:yellow'>Very common words in the language (e.g. the, a, is).</span>
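Stop-word removal can be sketched with the built-in English list that scikit-learn ships as `ENGLISH_STOP_WORDS`; the sentence is made up for illustration.

```python
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

sentence = "The cat is sitting on a mat"
tokens = sentence.lower().split()
# Keep only tokens that are not in scikit-learn's built-in English stop-word list
content_words = [t for t in tokens if t not in ENGLISH_STOP_WORDS]
print(content_words)
```

When vectorizing text, `CountVectorizer(stop_words="english")` applies the same list automatically.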


7. What do Parts-of-speech (POS) tags identify?
    - [x] <span style='background:yellow'>The role a word has in a particular sentence (e.g. noun, article, conjunction).</span>
    - [ ] The role a word performs in a sentence (e.g. obscure can be a noun and a verb).
    - [ ] The role a sentence has in a text (e.g. topic, support, transition).
    - [ ] The logical propositions given by the text (e.g. P or Q > R).


8. You have an array of given values: `['#$%', 'ALpd', '123', 89]`. How can you label encode this data using a Scikit-Learn method?
    - `sklearn.preprocessing.LabelEncoder().fit_transform(['#$%', 'ALpd', '123', 89])`
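A slightly more explicit sketch: since the list mixes strings and an integer, converting every value to a string first keeps the input uniformly typed. `LabelEncoder` then assigns integer codes in the sorted order of the distinct values, which are stored in `classes_`.

```python
from sklearn.preprocessing import LabelEncoder

raw_values = ['#$%', 'ALpd', '123', 89]
values = [str(v) for v in raw_values]    # make every value a string ('89' instead of 89)

encoder = LabelEncoder()
encoded = encoder.fit_transform(values)  # integer code per value, based on sorted classes
print(encoder.classes_, encoded)
```

`encoder.inverse_transform(encoded)` maps the codes back to the original (string) values.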


9. You have the coordinate vector [4, 5]. What is the L1 norm of this vector?
    - 9 (|4| + |5| = 9)
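The same calculation with NumPy, with the Euclidean (L2) norm included only for contrast:

```python
import numpy as np

v = np.array([4, 5])
l1 = np.linalg.norm(v, ord=1)   # sum of absolute values: |4| + |5| = 9.0
l1_manual = np.sum(np.abs(v))   # same value computed by hand
l2 = np.linalg.norm(v)          # Euclidean norm: sqrt(16 + 25) = sqrt(41), about 6.403
```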


10. You are using an autoencoder for input x. After a forward pass to compute activations of all the hidden layers and obtain an output x', what is the next step?
    - [ ] Use a softmax layer to perform feature selection.
    - [ ] Use max-pooling to reduce dimensionality.
    - [ ] Compute the sigmoid of x' and subtract x from it.
    - [x] Measure the reconstruction error between the output x' and the input x.
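A minimal NumPy sketch of that step, assuming a hypothetical single-hidden-layer autoencoder with random, untrained weights and mean squared error as the reconstruction loss (other losses, such as binary cross-entropy, are also common). In training, the gradient of this error would then be backpropagated to update the weights.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=(4, 8))   # batch of 4 synthetic inputs with 8 features

# Hypothetical encoder (8 -> 3) and decoder (3 -> 8) weights, untrained
W_enc, b_enc = rng.normal(scale=0.1, size=(8, 3)), np.zeros(3)
W_dec, b_dec = rng.normal(scale=0.1, size=(3, 8)), np.zeros(8)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass: encode to a 3-d code, then decode back to 8 dimensions
h = sigmoid(x @ W_enc + b_enc)
x_hat = h @ W_dec + b_dec

# Next step: measure how far the reconstruction x' deviates from the input x
reconstruction_error = np.mean((x_hat - x) ** 2)
```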


11. When using a bag-of-words (BoW) representation of a document for sentiment analysis, many of the predictions fail to take into account the effect of negations. This results in misclassifications such as "I'm not very happy about this" being assigned a positive sentiment. What is the most likely explanation for this?
    - [ ] The dataset is clearly mislabeled; BoW representations should work in this scenario.
    - [ ] There are too few samples that include negations in the dataset.
    - [x] BoW representations cannot model the role of the word in sentences.
    - [ ] BoW representations cannot be used effectively for tasks involving classification.
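Two made-up sentences with opposite meanings illustrate the problem: because BoW only counts words, both produce exactly the same bag, so no classifier on top of it can tell them apart. Counting bigrams (pairs of adjacent words) keeps some local order, such as which word follows "not".

```python
from collections import Counter

def bag_of_words(text):
    """Word counts only; word order is discarded."""
    return Counter(text.lower().split())

def bag_of_bigrams(text):
    """Counts of adjacent word pairs, which keep some local order."""
    words = text.lower().split()
    return Counter(zip(words, words[1:]))

s1 = "the food was good not bad"
s2 = "the food was bad not good"
same_bow = bag_of_words(s1) == bag_of_words(s2)            # identical bags
same_bigrams = bag_of_bigrams(s1) == bag_of_bigrams(s2)    # ("not", "bad") vs ("not", "good")
```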


12. Given the ngrams(text, n) function below, which statement in the loop body correctly generates the n-grams of a given text? (The correct statement is the `output.append(...)` line.)
```python
def ngrams(text, n):
    words = text.split(' ')
    output = []
    output_len = len(words)-n+1
    for i in range(output_len):
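        # correct answer: join the n consecutive words that start at position i
        output.append(' '.join(words[i:i+n]))
    return output

print(ngrams("the quick brown fox", 2))  # ['the quick', 'quick brown', 'brown fox']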
```
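A more compact variant is sketched below; the name `ngrams_zip` is hypothetical. Zipping n shifted copies of the word list yields every window of n consecutive words, and it returns the same result as the loop version, including an empty list when the text has fewer than n words.

```python
def ngrams_zip(text, n):
    words = text.split(' ')
    # zip(words, words[1:], ..., words[n-1:]) stops at the shortest copy,
    # so it yields exactly len(words) - n + 1 windows (or none if n > len(words))
    return [' '.join(window) for window in zip(*(words[k:] for k in range(n)))]

print(ngrams_zip("the quick brown fox", 3))
```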


13. What is model stacking?
    - [ ] To use several models in parallel and average the outputs as the result.
    - [ ] A training optimization technique that reuses the weights of previously trained models for similar architectures.
    - [x] To use the output of a model as the input of another.
    - [ ] To use several models in parallel and use the maximum output as the result.

**Model stacking** is an ensemble method in which the predictions of several base models, typically trained with different machine learning algorithms, are used as inputs to a second-layer learning algorithm (the meta-model). The meta-model is trained to combine the base-model predictions optimally into a final set of predictions.
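Scikit-Learn provides this as `StackingClassifier`. The sketch below uses synthetic data and an arbitrary choice of base models (a random forest and a k-nearest-neighbors classifier) with logistic regression as the second-layer model; any of these could be swapped out.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)  # synthetic data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    final_estimator=LogisticRegression(),  # second-layer model combining base predictions
    cv=5,  # meta-model is trained on 5-fold cross-validated base-model predictions
)
stack.fit(X_train, y_train)
accuracy = stack.score(X_test, y_test)
```

Using cross-validated predictions keeps the meta-model from learning on outputs the base models produced for their own training samples.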


14. You have a dataset with imbalanced categorical data, which leads to low model accuracy. What can you do to improve the accuracy of the model?
    - [ ] Redistribute observations between the training and test data to remove unidentified bias.
    - [ ] Normalize numerical values to fall between -1 and 1, to prevent features with high values from being dominant.
    - [x] <span style='background:yellow'>Resample the data to balance out the categories.</span>
    - [ ] Distribute observations randomly across the dataset.
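One common resampling approach, random oversampling of the minority class, can be sketched with `sklearn.utils.resample` on synthetic data. Resampling is normally applied to the training split only, so duplicated samples do not leak into the evaluation data.

```python
import numpy as np
from sklearn.utils import resample

rng = np.random.default_rng(0)
# Synthetic imbalanced data: 90 samples of class 0, 10 samples of class 1
X = rng.normal(size=(100, 2))
y = np.array([0] * 90 + [1] * 10)

X_major, y_major = X[y == 0], y[y == 0]
X_minor, y_minor = X[y == 1], y[y == 1]

# Draw minority samples with replacement until both classes have the same size
X_minor_up, y_minor_up = resample(X_minor, y_minor, replace=True,
                                  n_samples=len(y_major), random_state=0)

X_balanced = np.vstack([X_major, X_minor_up])
y_balanced = np.concatenate([y_major, y_minor_up])
```

Undersampling the majority class (with `replace=False` and a smaller `n_samples`) is the mirror-image option.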


15. You calculate the between-group variability V_b and the within-group variability V_w. What is the next step in a classical analysis of variance (ANOVA) test?
    - [ ] Compute F = V_b - V_w
    - [ ] Compute F = V_b * V_w
    - [ ] Compute F = V_b + V_w
    - [x] Compute F = V_b/V_w
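A worked sketch on toy data, where V_b and V_w are taken as the mean squares (each sum of squares divided by its degrees of freedom), as in the classical F ratio. `scipy.stats.f_oneway` is used only as a cross-check and assumes SciPy is installed.

```python
from statistics import mean
from scipy.stats import f_oneway

groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # toy data
k = len(groups)
n_total = sum(len(g) for g in groups)
grand_mean = mean(x for g in groups for x in g)

ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

v_b = ss_between / (k - 1)         # between-group mean square: 54 / 2 = 27
v_w = ss_within / (n_total - k)    # within-group mean square: 6 / 6 = 1
f_stat = v_b / v_w                 # F = V_b / V_w = 27

f_scipy = f_oneway(*groups).statistic  # library result for comparison
```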


16. Which of the following statements correctly explains the rationale behind feature selection using a variance threshold?
    - [ ] Features with values close to zero carry little information.
    - [ ] Features with a mean value close to zero carry little information.
    - [ ] Features spanning a short range carry little information.
    - [x] <span style='background:yellow'>Features that change little across the data carry little information.</span>
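Scikit-Learn's `VarianceThreshold` implements this idea. In the toy matrix below, the first column is constant and the third barely changes, so both fall below the (arbitrarily chosen) threshold of 0.01 and are dropped. Since variance depends on scale, the threshold only makes sense relative to the units of each feature.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Toy data: constant column, high-variance column, near-constant column
X = np.array([
    [0.0, 2.0, 0.10],
    [0.0, 4.0, 0.10],
    [0.0, 6.0, 0.10],
    [0.0, 8.0, 0.12],
])

selector = VarianceThreshold(threshold=0.01)
X_reduced = selector.fit_transform(X)
kept = selector.get_support()   # boolean mask of retained columns
print(selector.variances_, kept)
```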


17. In a bin counting scheme, what happens if data from a feature doesn't fall into any of the existing bins?
    - [ ] The data is assigned to the bin with the highest number of values.
    - [x] The data is assigned to a garbage bin.
    - [ ] The data is assigned to the bin with the lowest number of values.
    - [ ] The data is mapped to the last bin.
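A minimal sketch of bin counting for a categorical feature, assuming binary labels: each category is replaced by the fraction of positive labels observed for it in training. Categories that are rare at fit time, or never seen at all, share a single back-off ("garbage") bin. The names `GARBAGE`, `fit_bin_counts`, `bin_count_feature`, and `min_count` are hypothetical.

```python
from collections import Counter, defaultdict

GARBAGE = "__garbage__"  # hypothetical key for the back-off bin

def fit_bin_counts(categories, labels, min_count=2):
    """Count [negatives, positives] per category; rare categories go to GARBAGE."""
    freq = Counter(categories)
    counts = defaultdict(lambda: [0, 0])
    for cat, y in zip(categories, labels):
        key = cat if freq[cat] >= min_count else GARBAGE
        counts[key][y] += 1
    return dict(counts)

def bin_count_feature(category, counts):
    """Fraction of positive labels for the category's bin (garbage bin if unknown)."""
    neg, pos = counts.get(category, counts.get(GARBAGE, [0, 0]))
    return pos / (neg + pos) if neg + pos else 0.0

counts = fit_bin_counts(["a", "a", "a", "b", "b", "c"], [1, 1, 0, 0, 0, 1])
print(bin_count_feature("a", counts), bin_count_feature("unseen", counts))
```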


18. Consider the task of counting the number of livestock in aerial images of farmlands. If the machine learning model that detects animals is only capable of dealing with images where the animal is in the center, what is a very important step in the pre-processing stage of the task?
    - [ ] Transform the image so that all animals are centered and in different channels before sending it to the model.
    - [x] <span style='background:yellow'>Assigning bounding boxes to all objects so that the object is centered in its box. Individual boxes should be used as input to the model.</span>
    - [ ] Cut the image in squares of equal size. The input of the model should be the individual squares.
    - [ ] Removing the background of the image, keeping only the pixels with animals in it.