##### 1. What exactly is a feature? Give an example to illustrate your point.

**Ans:** In machine learning, a feature is a measurable property or characteristic of a phenomenon. Features are usually numeric, but structural features such as strings and graphs are used in syntactic pattern recognition. The concept of "feature" is related to that of explanatory variable used in statistical techniques such as linear regression.

For example, in a dataset of customer orders, a feature could be the customer's age, the customer's gender, or the customer's zip code. Features are used by machine learning algorithms to learn about the data and to make predictions.

Here is an example of how features are used in machine learning. Let's say we have a dataset of customer orders and we want to build a machine learning model that can predict whether a customer will churn (cancel their subscription). We could use the following features to train our model:

- Customer age
- Customer gender
- Customer zip code
- Customer number of orders
- Customer average order value

Once our model is trained, we can use it to predict whether a new customer is likely to churn. For example, if a new customer is 25 years old, female, lives in a high-income zip code, and has placed 10 orders with an average order value of $100, our model might predict that they are unlikely to churn.

Features are an important part of machine learning. By choosing the right features, we can improve the accuracy of our machine learning models.
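
As a minimal sketch of the idea above, the records below are split into a feature matrix `X` and a target vector `y`. The field names, values, and the `split_features_and_target` helper are all hypothetical and chosen only to mirror the churn example.

```python
# Hypothetical customer records; every field name and value is illustrative.
customers = [
    {"age": 25, "gender": "F", "zip_code": "94105",
     "num_orders": 10, "avg_order_value": 100.0, "churned": False},
    {"age": 41, "gender": "M", "zip_code": "10001",
     "num_orders": 2, "avg_order_value": 35.5, "churned": True},
]

FEATURE_NAMES = ["age", "gender", "zip_code", "num_orders", "avg_order_value"]
TARGET_NAME = "churned"


def split_features_and_target(records):
    """Return (X, y): one feature row per record and the matching labels."""
    X = [[record[name] for name in FEATURE_NAMES] for record in records]
    y = [record[TARGET_NAME] for record in records]
    return X, y


X, y = split_features_and_target(customers)
print(X[0])  # feature values of the first customer
print(y)     # churn labels, one per customer
```

Each column of `X` is one feature; the model learns a mapping from rows of `X` to the labels in `y`.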

##### 2. What are the various circumstances in which feature construction is required?

**Ans:** Feature construction is the process of creating new features from existing features in a dataset. It is often used in machine learning to improve the accuracy of models. There are many different circumstances in which feature construction is required. Some of the most common circumstances include:

- **When the original features are not informative enough.** This can happen if the original features are only weakly related to the target variable, or if they are not precise enough. For example, if you are trying to predict whether a customer will churn, the raw features might not capture all of the factors that influence the decision. A constructed feature such as total spend (number of orders multiplied by average order value) can summarize behaviour that neither original feature shows on its own.
- **When the original features are correlated.** This can happen if the original features are measuring the same thing, or if they are measuring two things that are closely related. For example, if you are trying to predict whether a customer will buy a product, the original features might include the customer's income and the customer's age. However, income and age are often correlated, so including both features might not improve the accuracy of the model. Combining them into a single constructed feature can reduce this redundancy.
- **When the original features are missing.** This can happen if the data was not collected properly, or if some of the data was lost. For example, if you are trying to predict whether a customer will default on a loan, the original features might include the customer's income and the customer's credit score. However, if some of the values are missing, you might not be able to use these features directly. Constructed features such as imputed values or a missing-value indicator can take their place.

Feature construction can be a complex process, but it can be a valuable tool for improving the accuracy of machine learning models.

Here are some of the most common methods used in and around feature construction:

- **Feature extraction:** This is the process of creating new features from existing features by using statistical methods. For example, you could use feature extraction to create a new feature that represents the average of two existing features.
- **Feature transformation:** This is the process of changing the values of existing features to make them more useful for machine learning algorithms. For example, you could use feature transformation to normalize the values of features so that they have a similar scale.
- **Feature selection:** This is the process of selecting a subset of existing features that are most useful for machine learning algorithms. For example, you could use feature selection to select the features that are most correlated with the target variable. Strictly speaking, selection does not create new features; it decides which original or constructed features to keep.

The best method of feature construction for a particular problem depends on the specific data and the machine learning algorithm that is being used.
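
A minimal sketch of constructing a new feature from two existing ones. The records, the field names, and the `add_total_spend` helper are hypothetical; the derived `total_spend` column is just one possible constructed feature.

```python
# Hypothetical order data; names and values are illustrative only.
customers = [
    {"num_orders": 10, "avg_order_value": 100.0},
    {"num_orders": 2, "avg_order_value": 35.0},
    {"num_orders": 5, "avg_order_value": 60.0},
]


def add_total_spend(records):
    """Construct a new feature as the product of two existing features."""
    return [
        {**record, "total_spend": record["num_orders"] * record["avg_order_value"]}
        for record in records
    ]


enriched = add_total_spend(customers)
print([record["total_spend"] for record in enriched])
```

The original features are left in place, so a later selection step can decide whether the constructed feature, the originals, or both are kept.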

##### 3. Describe how nominal variables are encoded.

**Ans:** Nominal variables are categorical variables whose categories have no inherent order (for example, colour or country). Most machine learning algorithms expect numeric input, so each category has to be mapped to a numeric representation. There are two common ways to encode nominal variables:

**Label encoding:** This is the simplest method of encoding nominal variables. Each category is assigned a unique integer value, starting from 0. Because integers imply an order that nominal categories do not have, algorithms that treat the codes as magnitudes can be misled by this encoding.

**One-hot encoding:** This is a more complex method of encoding nominal variables. For each category, a new binary variable is created. The binary variable is set to 1 if the observation belongs to that category, and 0 if it does not.

The choice of which method to use depends on the machine learning algorithm that is being used. Some algorithms, such as decision trees, can handle nominal variables directly. Other algorithms, such as support vector machines, require the nominal variables to be encoded.
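
The two encodings described above can be sketched in plain Python. This is an illustrative implementation rather than any particular library's API; the helper names `label_encode` and `one_hot_encode` are made up, and assigning codes in sorted category order is an assumption.

```python
def label_encode(values):
    """Map each distinct category to an integer, in sorted order starting at 0."""
    categories = sorted(set(values))
    mapping = {category: index for index, category in enumerate(categories)}
    return [mapping[value] for value in values], mapping


def one_hot_encode(values):
    """Create one binary column per category (columns follow sorted order)."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return rows, categories


colors = ["red", "green", "blue", "green"]

codes, mapping = label_encode(colors)
print(codes, mapping)

rows, columns = one_hot_encode(colors)
print(columns)
for row in rows:
    print(row)
```

Note that the label codes suggest `blue < green < red`, an ordering that means nothing for a nominal variable, whereas the one-hot rows treat all categories symmetrically at the cost of one extra column per category.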

Here are some of the benefits of encoding nominal variables:

- It lets algorithms that only accept numeric input make use of categorical information.
- It can improve the accuracy of machine learning models.
- It can make the data more understandable to humans.

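Here are some of the drawbacks of encoding nominal variables:

- It can increase the size of the dataset, since one-hot encoding adds one column per category.
- It can introduce bias into the data, for example when label encoding imposes an artificial order on the categories.
- It can make the data less understandable to humans, for example when category names are replaced by arbitrary integer codes.

In general, encoding nominal variables is a good practice that can improve the accuracy of machine learning models. However, it is important to weigh the benefits and drawbacks of each encoding before making a decision.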

##### 4. Describe how numeric features are converted to categorical features.

**Ans:** Numeric features can be converted to categorical features using discretization: the process of transforming continuous variables into categorical variables by creating a set of contiguous intervals that span the range of the variable’s values. It is also known as “binning”, where a bin is another name for an interval.

Benefits of this method are:
1. Handles outliers better.
2. Improves the value spread.
3. Minimizes the effects of small observation errors.

Techniques to Encode Numerical Columns:

(a) Equal width binning: It is also known as “Uniform Binning” since the width of all the intervals is the same. The algorithm divides the range of the data into N intervals of equal width. The width of each interval is:

`w = (max - min) / N`

Therefore, the interior interval boundaries are `min + w`, `min + 2w`, `min + 3w`, ..., `min + (N-1)w`, where min and max are the minimum and maximum values in the data. This technique does not change the spread of the data, and it handles outliers only in the sense that they are absorbed into the first or last interval.
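
A minimal sketch of equal width binning using the width formula above. The function name `equal_width_bins` and the sample `ages` list are illustrative, and clamping the maximum value into the last bin is an implementation choice.

```python
def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width intervals (0-based index)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0] * len(values)  # all values are identical
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((value - lo) / width), n_bins - 1) for value in values]


ages = [18, 22, 25, 31, 40, 47, 58, 90]
print(equal_width_bins(ages, 4))
```

With these sample values the single large value (90) stretches the range, so half of the observations share the first bin while the last bin holds only the outlier.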

(b) Equal frequency binning: It is also known as “Quantile Binning”. The algorithm divides the data into N groups where each group contains approximately the same number of values.

For example, with 10 bins each interval contains about 10% of the total observations. The widths of the intervals need not be equal. This method handles outliers better than equal width binning and makes the value spread approximately uniform (each interval contains almost the same number of values).
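
A rank-based sketch of equal frequency binning. The function name `equal_frequency_bins` and the sample data are illustrative; this simple version can place tied values in different bins, which a quantile-boundary implementation would avoid.

```python
def equal_frequency_bins(values, n_bins):
    """Assign bins by rank so each bin holds roughly the same number of values."""
    # Indices of the values, ordered from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, index in enumerate(order):
        bins[index] = rank * n_bins // len(values)
    return bins


ages = [18, 22, 25, 31, 40, 47, 58, 90]
print(equal_frequency_bins(ages, 4))
```

Here every bin receives two observations, and the largest value simply shares the top bin with its nearest neighbour instead of stretching the bin widths.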

(c) K-means binning: This technique uses the K-means clustering algorithm to group the values, and each cluster becomes a bin. It is mostly used when the data naturally forms clusters.
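
A small sketch of K-means binning on one-dimensional data using a basic Lloyd's iteration. The function name `kmeans_bins`, the evenly spaced initial centroids, and the sample values are assumptions made for illustration; library implementations typically choose initial centroids differently.

```python
def kmeans_bins(values, n_bins, n_iter=100):
    """Bin 1-D values with a basic k-means; returns (labels, centroids)."""
    lo, hi = min(values), max(values)
    # Deterministic start: centroids evenly spaced over the data range.
    centroids = [lo + (hi - lo) * (k + 0.5) / n_bins for k in range(n_bins)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(n_bins), key=lambda k: abs(value - centroids[k]))
                  for value in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for k in range(n_bins):
            members = [v for v, label in zip(values, labels) if label == k]
            updated.append(sum(members) / len(members) if members else centroids[k])
        if updated == centroids:
            break  # converged: centroids no longer move
        centroids = updated
    return labels, centroids


values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.8, 10.0, 10.4]
labels, centroids = kmeans_bins(values, 3)
print(labels)
print(centroids)
```

Because the sample values form three well-separated groups, each group ends up in its own bin regardless of how wide the gaps between groups are.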


##### 5. Describe the feature selection wrapper approach. State the advantages and disadvantages of this approach.

**Ans:** Wrapper methods measure the “usefulness” of features based on the classifier performance. In contrast, the filter methods pick up the intrinsic properties of the features (i.e., the “relevance” of the features) measured via univariate statistics instead of cross-validation performance.

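The main advantage of the wrapper approach is that features are evaluated in the context of the model that will actually use them, so interactions between features are taken into account. Wrapper algorithms that combine dimensionality reduction with classification can also be used, but these methods have a high computational cost and can have lower discriminative power. Moreover, their accuracy depends on choosing a suitable classifier, and repeatedly evaluating feature subsets on the same data increases the risk of overfitting.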

**Most commonly used techniques under wrapper methods are:**

1. **Forward selection**: In forward selection, we start with a null model, fit the model with each individual feature one at a time, and select the feature with the minimum p-value. We then fit models with two features by combining the previously selected feature with each of the remaining features, and again select the feature with the minimum p-value. Next we fit models with three features by combining the two previously selected features with each of the remaining features. This process is repeated until no remaining feature has a p-value below the significance level, so that every selected feature is individually significant.

2. **Backward elimination**: In backward elimination, we start with the full model (including all the independent variables) and then remove the insignificant feature with the highest p-value (> significance level). This process repeats until only significant features remain.

3. **Bi-directional elimination (Stepwise Selection):** It is similar to forward selection, but while adding a new feature it also checks the significance of the features already added. If any of the already selected features has become insignificant, it is removed through backward elimination. Hence, it is a combination of forward selection and backward elimination.
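
A minimal sketch of forward selection as a wrapper method. Instead of the p-value criterion described above, this version scores each candidate subset by the leave-one-out accuracy of a 1-nearest-neighbour classifier, which is a swapped-in criterion chosen so the example needs no statistics library. The function names and the toy data are hypothetical.

```python
def loo_1nn_accuracy(X, y, feature_idx):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on chosen features."""
    correct = 0
    for i, row in enumerate(X):
        best_j, best_dist = None, float("inf")
        for j, other in enumerate(X):
            if i == j:
                continue  # leave the current sample out
            dist = sum((row[f] - other[f]) ** 2 for f in feature_idx)
            if dist < best_dist:
                best_j, best_dist = j, dist
        correct += y[best_j] == y[i]
    return correct / len(X)


def forward_selection(X, y, score_fn):
    """Greedily add the feature that most improves score_fn; stop when none helps."""
    remaining = list(range(len(X[0])))
    selected, best_score = [], float("-inf")
    while remaining:
        scored = [(score_fn(X, y, selected + [f]), f) for f in remaining]
        score, feature = max(scored, key=lambda pair: pair[0])
        if score <= best_score:
            break  # no candidate improves the model
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [1.0, 2.0], [1.1, 8.0], [1.2, 4.0]]
y = [0, 0, 0, 1, 1, 1]

selected, score = forward_selection(X, y, loo_1nn_accuracy)
print(selected, score)
```

Backward elimination follows the same pattern in reverse: start from all features and repeatedly drop the one whose removal hurts the score least. Every candidate subset requires a full model evaluation, which is where the computational cost of wrapper methods comes from.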

![0_V0GyOt3LoDVfY7y5.png](https://editor.analyticsvidhya.com/uploads/46072IMAGE2.gif)



##### 6. When is a feature considered irrelevant? What can be said to quantify it?

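**Ans:** A feature is considered relevant if it is either strongly relevant (removing it degrades the best achievable prediction performance) or weakly relevant (it is not strongly relevant, but it improves performance when added to some subset of the other features). A feature that is neither is considered irrelevant.

By definition, irrelevant features can never contribute to prediction accuracy. To quantify relevance, each feature can be scored with a feature selection method. There are three types of feature selection: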

- **Wrapper methods** (forward, backward, and stepwise selection)
- **Filter methods** (ANOVA, Pearson correlation, variance thresholding)
- **Embedded methods** (Lasso, Ridge, Decision Tree).

For methods based on statistical tests, a p-value greater than the chosen significance level (commonly 0.05) means that the feature is not statistically significant.
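
Of the filter methods listed above, variance thresholding is the simplest to sketch: a feature whose value never changes cannot help distinguish one observation from another. The function name `low_variance_features`, the default threshold of zero, and the sample columns are assumptions for illustration. Low variance is only one heuristic for irrelevance and does not look at the target at all.

```python
from statistics import pvariance


def low_variance_features(columns, threshold=0.0):
    """Return names of features whose population variance is <= threshold."""
    return [name for name, values in columns.items()
            if pvariance(values) <= threshold]


columns = {
    "age": [25, 41, 33, 52],
    "country_code": [1, 1, 1, 1],  # constant: carries no information
    "num_orders": [10, 2, 5, 7],
}

print(low_variance_features(columns))
```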

##### 7. When is a feature considered redundant? What criteria are used to identify features that could be redundant?

**Ans:** If two features `{X1, X2}` are highly correlated, they are redundant because they carry largely the same information as measured by the correlation. In other words, the correlation measure quantifies the statistical association between any given pair of features.
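
A minimal sketch of flagging redundant pairs with the Pearson correlation coefficient. The helper names, the sample columns, and the cut-off of 0.9 are assumptions for illustration; a suitable threshold depends on the data and the model.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def redundant_pairs(columns, threshold=0.9):
    """List feature pairs whose absolute correlation exceeds the threshold."""
    names = list(columns)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if abs(pearson(columns[first], columns[second])) > threshold:
                pairs.append((first, second))
    return pairs


columns = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],  # same quantity, other unit
    "num_orders": [3, 9, 1, 7, 4],
}

print(redundant_pairs(columns))
```

Only one feature from each flagged pair needs to be kept, since the other adds almost no new information.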

Minimum redundancy feature selection is an algorithm frequently used to accurately identify characteristics of genes and phenotypes. It is usually paired with a relevance criterion, in which case it is known as minimum redundancy maximum relevance (mRMR) selection.

![0_V0GyOt3LoDVfY7y5.png](https://slideplayer.com/slide/4394644/14/images/3/Background+Relevance+between+features+Correlation+F-statistic.jpg)


##### 8. What are the various distance measurements used to determine feature similarity?

**Ans:** Four of the most commonly used distance measures in machine learning are as follows:

- Hamming Distance: Calculates the distance between two binary vectors (also referred to as binary strings or bitstrings) as the number of positions at which they differ.
- Euclidean Distance: Calculates the straight-line distance between two real-valued vectors, that is, the square root of the sum of the squared differences of their components.
- Manhattan Distance: Also called the Taxicab distance or the City Block distance, calculates the distance between two real-valued vectors as the sum of the absolute differences of their components.
- Minkowski Distance: Calculates the distance between two real-valued vectors. It is a generalization of the Euclidean and Manhattan distance measures and adds a parameter, called the “order” or “p”, that allows different distance measures to be calculated: p = 1 gives the Manhattan distance and p = 2 gives the Euclidean distance.
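
The four measures can be sketched in a few lines of plain Python. The function names and sample vectors are illustrative; Manhattan and Euclidean are written as special cases of Minkowski to make the generalization explicit.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def minkowski(a, b, p):
    """Minkowski distance of order p between two real-valued vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def manhattan(a, b):
    """Minkowski distance with p = 1."""
    return minkowski(a, b, 1)


def euclidean(a, b):
    """Minkowski distance with p = 2."""
    return minkowski(a, b, 2)


print(hamming([1, 0, 1, 1], [1, 1, 0, 1]))

a, b = [1.0, 2.0], [7.0, 10.0]
print(manhattan(a, b))
print(euclidean(a, b))
```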



##### 9. State the difference between Euclidean and Manhattan distances.

**Ans:** Euclidean & Hamming distances are used to measure similarity or dissimilarity between two sequences. Euclidean distance is extensively applied in analysis of convolutional codes and Trellis codes.

Euclidean distance is the length of the shortest path between source and destination, which is a straight line, as shown in Figure 1.3. Manhattan distance is the sum of the distances travelled along each axis between source (s) and destination (d), so the path consists of straight segments that are each parallel to an axis.
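
As a worked example, for a source $s = (s_1, \dots, s_n)$ and destination $d = (d_1, \dots, d_n)$ the two distances can be written as follows (the symbols $s_i$ and $d_i$ for the coordinates are notation introduced here):

$$
d_{\text{Euclidean}}(s, d) = \sqrt{\sum_{i=1}^{n} (s_i - d_i)^2},
\qquad
d_{\text{Manhattan}}(s, d) = \sum_{i=1}^{n} \lvert s_i - d_i \rvert
$$

Taking $s = (0, 0)$ and $d = (3, 4)$ as an illustrative pair of points:

$$
d_{\text{Euclidean}} = \sqrt{3^2 + 4^2} = 5,
\qquad
d_{\text{Manhattan}} = \lvert 3 \rvert + \lvert 4 \rvert = 7
$$

The Manhattan distance is never smaller than the Euclidean distance, and the two are equal only when the points differ along a single axis.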

![0_V0GyOt3LoDVfY7y5.png](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQfHj0UiFFvu4iz6l4sjinKw11BtBgjpAjiTw&usqp=CAU)


##### 10. Distinguish between feature transformation and feature selection.

**Ans:** Feature transformation and feature selection are two important techniques used in machine learning to improve the performance of machine learning models.

Feature transformation is the process of changing the values of features in a dataset in order to make them more useful for machine learning algorithms. This can be done by rescaling, normalizing, or converting features to different data types. Feature transformation can help to improve the accuracy of machine learning models by making the features more linearly separable and by reducing the impact of outliers.

Feature selection is the process of selecting a subset of features from a dataset in order to improve the performance of machine learning models. This can be done by using statistical methods to rank features based on their importance or by using machine learning algorithms to select features that are most predictive of the target variable. Feature selection can help to improve the accuracy of machine learning models by reducing the number of features that the model has to learn, which can make the model more generalizable to new data.

The main difference between feature transformation and feature selection is that feature transformation changes the values of features, while feature selection removes features from the dataset. Feature transformation can be used in conjunction with feature selection to improve the performance of machine learning models.

![Differences](attachment:image.png)
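
The distinction can be sketched side by side. Min-max scaling stands in for feature transformation and a fixed list of column names stands in for feature selection; the helper names and the sample columns are hypothetical, and in practice the kept columns would come from a ranking or a model rather than being hard-coded.

```python
def min_max_scale(column):
    """Feature transformation: rescale values to the [0, 1] range."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant column: nothing to rescale
    return [(value - lo) / (hi - lo) for value in column]


def select_features(columns, keep):
    """Feature selection: keep only the named columns, values unchanged."""
    return {name: columns[name] for name in keep}


columns = {
    "age": [20, 30, 40],
    "num_orders": [1, 5, 9],
    "row_id": [101, 102, 103],
}

# Transformation: same columns, new values.
transformed = {name: min_max_scale(values) for name, values in columns.items()}
print(transformed["age"])

# Selection: fewer columns, original values.
selected = select_features(columns, keep=["age", "num_orders"])
print(selected)
```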