# **Feature Engineering**

1. **What is a parameter?**

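  Ans. In machine learning, a parameter is a value that controls how a model or a transformation behaves. Model parameters (such as the coefficients of a linear regression) are learned from the training data, while settings we choose beforehand are usually called hyperparameters. In feature engineering, a parameter typically refers to such a setting or configuration option of a specific technique (for example, the polynomial degree or the number of bins), and it controls how the transformation or creation of features is performed.
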
---

2. **What is correlation? What does negative correlation mean?**

  Ans. In feature engineering, correlation helps us understand the relationship between different features (variables) in our dataset. It tells us how strongly two features are related to each other. The correlation value ranges from -1 to 1:

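  - +1: Perfect positive linear correlation: the data points lie exactly on an upward-sloping line, so when one feature increases, the other also increases. Values close to +1 indicate a strong positive relationship.
  - 0: No linear correlation. The features are not linearly related (although they may still be related in a non-linear way).
  - -1: Perfect negative linear correlation: the data points lie exactly on a downward-sloping line, so when one feature increases, the other decreases. Values close to -1 indicate a strong negative relationship.

  A negative correlation, in the context of feature engineering, means that two features move in opposite directions: if one feature's value increases, the other feature's value tends to decrease. A strong negative correlation is just as informative as a strong positive correlation; both indicate a strong relationship, just in different directions.
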
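
  A small NumPy sketch of the three reference points, using made-up values: a straight-line increase gives +1, a straight-line decrease gives -1, and a symmetric curve gives 0 even though `y_curved` is completely determined by `x`, because that relationship is not linear.

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y_positive = 2 * x + 1    # rises with x along a straight line
y_negative = -3 * x + 10  # falls as x rises, along a straight line
y_curved = x ** 2         # depends on x, but not linearly

# np.corrcoef returns a 2x2 matrix; entry [0, 1] is the Pearson correlation of the pair
r_pos = np.corrcoef(x, y_positive)[0, 1]
r_neg = np.corrcoef(x, y_negative)[0, 1]
r_curved = np.corrcoef(x, y_curved)[0, 1]
print(round(r_pos, 3), round(r_neg, 3), round(r_curved, 3))
```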
---

3. **Define Machine Learning. What are the main components in Machine Learning?**

  Ans. Machine Learning is a process where algorithms learn patterns and relationships from engineered features to make predictions or decisions without being explicitly programmed. The quality and relevance of these engineered features heavily influence the performance of the learning algorithm. Essentially, machine learning algorithms consume our carefully crafted features to build models.

  The main components in machine learning are:

  - Data (and crucially, the Features!): This is the foundation. Machine learning algorithms learn from data. However, for effective learning, this data needs to be represented in a meaningful way through features. Our role in feature engineering is to transform raw data into a set of informative features that the algorithm can easily understand and learn from. This involves:
    - Feature Extraction: Deriving initial features from raw data.
    - Feature Transformation: Scaling, normalizing, encoding, etc., to make features suitable for the algorithm.
    - Feature Creation: Constructing new features from existing ones that might capture underlying patterns better.

  - The Learning Algorithm: This is the engine that learns from the provided features. Different algorithms (like linear regression, decision trees, neural networks) have different inductive biases and are suited for different types of tasks and data characteristics. The choice of algorithm often depends on the nature of the problem and the characteristics of the engineered features.

  - A Model: This is the output of the learning process. It's the learned representation of the relationships within the engineered features that can be used to make predictions or decisions on new, unseen data (which will also need to have the same feature engineering applied to it).

  - Evaluation: We need to assess how well our model is performing. This is done by feeding it unseen data (with the same feature engineering applied) and comparing its predictions to the actual values. The choice of evaluation metrics depends on the problem type (e.g., accuracy, precision, recall for classification; RMSE, MAE for regression). The insights from evaluation often guide us back to further feature engineering to improve performance.
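
  As a rough sketch of how these components map onto code, assuming scikit-learn is installed and using its bundled wine dataset purely as example data (the choice of `StandardScaler` and k-nearest neighbours is illustrative, not prescriptive):

```python
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Data: chemical measurements (features) and wine cultivar labels (target)
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Feature transformation + learning algorithm, chained so new data gets the same steps
pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

# Model: the fitted pipeline (learned scaling statistics + stored training examples)
model = pipeline.fit(X_train, y_train)

# Evaluation: compare predictions on unseen data with the true labels
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {accuracy:.3f}")
```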
---

4. **How does loss value help in determining whether the model is good or not?**

  Ans. The loss value in machine learning quantifies how well our model's predictions align with the actual target values. Essentially, it's a measure of how "wrong" the model is. A lower loss value generally indicates that the model is making predictions closer to the true values.

  The loss value helps in determining if the model is good or not in the following ways:

  - Direct Measure of Error: The loss function is designed to penalize the model for incorrect predictions. A high loss means the model's predictions are far off, suggesting it's not a good model (yet). Conversely, a low loss suggests the predictions are close to the actual values.

  - Guidance for Learning: During the training process, the loss value is used by optimization algorithms (like gradient descent) to adjust the model's parameters (weights). The goal is to minimize this loss. If the loss consistently decreases during training, it indicates that the model is learning from the engineered features and improving its predictions.

  - Comparison of Models: We can train different models (potentially with different sets of engineered features) on the same task and compare their loss values on the same data (e.g., a validation set). The model with the lower loss is generally considered better at that point.

  - Identifying Overfitting/Underfitting:

    - If the loss on the training data is very low, but the loss on a separate validation set is high (or starts increasing), it might indicate overfitting. This means the model has learned the training data too well (including the noise) and doesn't generalize well to new, unseen data. Our engineered features might be too complex or specific to the training set.
    - If the loss is high on both the training and validation sets, it might indicate underfitting. This means the model hasn't learned the underlying patterns in the data well enough. Our engineered features might not be informative enough.
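
  The overfitting/underfitting pattern can be illustrated with a minimal sketch, assuming scikit-learn and NumPy; the noisy sine data is synthetic and the polynomial degrees are arbitrary choices. Typically the degree-1 model has a relatively high loss on both sets (underfitting), while the degree-10 model drives the training loss down but the validation loss up (overfitting); the exact numbers depend on the random data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical noisy data: y = sin(x) + noise
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.3, size=40)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.5, random_state=0)

results = {}
for degree in (1, 3, 10):
    # More polynomial features = a more flexible (and more overfitting-prone) model
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))  # training loss
    val_mse = mean_squared_error(y_val, model.predict(X_val))        # validation loss
    results[degree] = (train_mse, val_mse)
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  validation MSE={val_mse:.3f}")
```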
---

5. **What are continuous and categorical variables?**

  Ans.
  
  Continuous Variables:

  - Continuous variables are numerical variables that can take on any value within a given range. These values can usually be measured with high precision and can include fractions or decimals.

  - Feature Engineering Implications: Continuous variables are often scaled (e.g., standardization, normalization), transformed (e.g., log transform, polynomial features), or discretized (binned) to create new features.
  - Examples: Temperature, height, weight, time, income, blood pressure.

  Categorical Variables:

  - Categorical variables represent qualities or characteristics. They take on a limited and usually fixed number of possible values, which are often labels or names.

  - Feature Engineering Implications: Machine learning models typically work best with numerical input. Therefore, categorical variables need to be encoded into numerical representations. Common techniques include:
    - One-Hot Encoding: Creating binary columns for each category.
    - Label Encoding: Assigning a unique numerical label to each category.
    - Ordinal Encoding: For ordered categories, assigning numerical labels that respect the order.
  - Examples: Gender (Male, Female, Other), Color (Red, Green, Blue), City (New York, London, Tokyo), Education Level (High School, Bachelor's, Master's).
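
  A minimal pandas sketch, with hypothetical column names and values, showing one way to separate the two kinds of columns by dtype and apply a typical transformation to each:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "height_cm": [160.5, 172.0, 181.3, 168.9],
    "income": [32_000.0, 54_000.0, 120_000.0, 47_500.0],
    "color": ["Red", "Green", "Blue", "Green"],
    "education": ["High School", "Master's", "Bachelor's", "High School"],
})

continuous_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()
print(continuous_cols)   # ['height_cm', 'income']
print(categorical_cols)  # ['color', 'education']

# Continuous: a log transform and binning (discretization) create new features
df["log_income"] = np.log1p(df["income"])
df["height_bin"] = pd.cut(df["height_cm"], bins=[150, 165, 175, 190],
                          labels=["short", "medium", "tall"])

# Categorical: an ordered category records the natural order of education levels
levels = ["High School", "Bachelor's", "Master's"]
df["education"] = pd.Categorical(df["education"], categories=levels, ordered=True)
df["education_code"] = df["education"].cat.codes  # 0, 2, 1, 0
```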
---

6. **How do we handle categorical variables in Machine Learning? What are the common techniques?**

  Ans. We handle categorical variables in Machine Learning in the following ways:

  - One-Hot Encoding:

    - For each unique category in the variable, a new binary (0 or 1) column is created.
    - If an observation belongs to that category, the corresponding column gets a 1, otherwise 0.
    - When to use: Typically used for nominal categorical variables (where there's no inherent order between categories), like colors, city names, or types of animals.
    - Feature Engineering Consideration: One-hot encoding can significantly increase the dimensionality of the dataset if the categorical variable has many unique categories. We might need to consider techniques to handle this, such as grouping less frequent categories or using more advanced encoding methods.

  - Label Encoding:

    - Each unique category is assigned an integer. For example, "Red" might become 0, "Green" become 1, and "Blue" become 2.

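    - When to use: Commonly used for the target labels of a classification problem, and for ordinal categorical variables (where there's a meaningful order), like "Low," "Medium," "High," or education levels, provided the integers follow that order. Automatic label encoders such as scikit-learn's LabelEncoder assign integers in sorted (alphabetical) order, so "High" would get a smaller number than "Low"; use ordinal encoding when the order matters. Label encoding can sometimes be used for nominal variables with a small number of categories, but be cautious, as it can introduce an artificial ordinality that the model might incorrectly interpret.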
    - Feature Engineering Consideration: For nominal variables, label encoding might not be the best choice as the numerical labels can be misinterpreted as having an order.

  - Ordinal Encoding:

    - Similar to label encoding, but we explicitly define the mapping of categories to numerical values based on their order. For example, "Low" could be mapped to 1, "Medium" to 2, and "High" to 3.
    - When to use: Specifically for ordinal categorical variables where the order is important for the model to understand.
    - Feature Engineering Consideration: Requires careful definition of the order of categories.

  - Binary Encoding:

    - Categories are first assigned an integer, and then those integers are converted into binary code. Each bit in the binary representation becomes a new feature.
    - When to use: Can be more space-efficient than one-hot encoding for categorical variables with a high number of unique categories.
    - Feature Engineering Consideration: Introduces fewer new features than one-hot encoding for high cardinality variables.

  - Target Encoding (Mean Encoding):

    - For each category, we calculate the mean of the target variable for the observations belonging to that category and use this mean as the encoded value.
    - When to use: Can be effective, especially for high cardinality categorical variables.
    - Feature Engineering Consideration: Prone to overfitting if not handled carefully (e.g., using techniques like cross-validation or adding smoothing).

  - Hashing:

    - Categories are hashed into a lower-dimensional space.
    - When to use: Useful for very high cardinality categorical variables.
    - Feature Engineering Consideration: Can lead to collisions (different categories mapping to the same hash value), which might lose some information. The number of hash features is a parameter to tune.
---

7. **What do you mean by training and testing a dataset?**

  Ans. Training a Dataset:

  - Purpose: The primary goal of training is to allow the machine learning algorithm to learn the underlying patterns and relationships within your engineered features that predict the target variable.
  - Process: We take a portion of our dataset (the training set) and feed it to the learning algorithm. The algorithm uses these examples to adjust its internal parameters (weights, biases, etc.) to minimize the loss function (the measure of how wrong its predictions are, as we discussed earlier). The quality of our engineered features directly impacts how effectively and efficiently the algorithm can learn during this phase. If our features are informative and relevant, the algorithm will likely converge to a good model faster and with better performance.
  - Feature Engineering Connection: During training, the model learns the significance of each of our engineered features in predicting the target. If we've created irrelevant or redundant features, they might confuse the training process or lead to a less optimal model.
  
  Once the model has learned from the training data, we need to see how well it can generalize to new, unseen data. This is where testing comes in.

  Testing a Dataset:

  - Purpose: The goal of testing is to evaluate the performance of the trained model on data it has never seen before. This gives us an estimate of how well the model is likely to perform in the real world.
  - Process: We take a separate portion of our dataset (the testing set) that was not used during training and feed the engineered features from this set into our trained model. We then compare the model's predictions on this test set to the actual target values to assess its performance using various metrics (e.g., accuracy, F1-score, RMSE).
  - Feature Engineering Connection: It's crucial that the same feature engineering steps applied to the training data are also applied to the testing data. If we engineer features differently for the training and testing sets, the model won't be evaluated fairly, as the input features will have different scales or representations. A good model should perform reasonably well on the test set, indicating that it has learned generalizable patterns from our engineered features rather than just memorizing the training data.
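
  A minimal sketch of this train/test discipline, assuming scikit-learn and its bundled iris dataset as example data: the scaler's statistics come from the training set alone and are then reused, unchanged, on the test set.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Training: fit the feature transformation on the training set only...
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
model = LogisticRegression(max_iter=1000).fit(X_train_scaled, y_train)

# Testing: reuse the *same* fitted scaler (training means/stds) on the unseen test set
X_test_scaled = scaler.transform(X_test)
print("training means used for both sets:", np.round(scaler.mean_, 2))
print(f"train accuracy: {model.score(X_train_scaled, y_train):.3f}")
print(f"test accuracy:  {model.score(X_test_scaled, y_test):.3f}")
```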
---

8. **What is sklearn.preprocessing?**

  Ans. sklearn.preprocessing is a module in the scikit-learn (sklearn) library in Python that provides a collection of utility functions and transformer classes. These tools are primarily used for feature scaling, normalization, encoding of categorical features, and generating new features such as polynomial terms, which are fundamental steps in feature engineering to prepare data for machine learning models.

---

9. **What is a Test set?**

  Ans. A test set is a subset of your data that is held back and never used during the training process of your machine learning model. After the model has been trained on the training data (where we've applied our feature engineering), we use the test set to evaluate its performance on completely new, unseen data.

---

10. **How do we split data for model fitting (training and testing) in Python? How do you approach a Machine Learning problem?**

  Ans. The most common way to split data into training and testing sets in Python is using the train_test_split function from the sklearn.model_selection module.

  For Example:

        from sklearn.model_selection import train_test_split
        import pandas as pd

        # Toy data for illustration: features in 'X' (a pandas DataFrame or NumPy array)
        # and the target variable in 'y' (a pandas Series or NumPy array)
        X = pd.DataFrame({"feature_1": range(100), "feature_2": range(100, 200)})
        y = pd.Series([0, 1] * 50, name="target")

        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, random_state=42, stratify=None  # stratify=y keeps class proportions
        )

        print(f"Shape of X_train: {X_train.shape}")  # (80, 2)
        print(f"Shape of X_test: {X_test.shape}")    # (20, 2)
        print(f"Shape of y_train: {y_train.shape}")  # (80,)
        print(f"Shape of y_test: {y_test.shape}")    # (20,)

  How to Approach a Machine Learning Problem:

  Here's a general approach, emphasizing the role of feature engineering:

  - Understand the Problem and the Data:
      - What is the goal? (e.g., classification, regression).
      - What kind of data do we have? (structure, types of variables).
      - Initial data exploration (EDA) to understand distributions, relationships, and potential issues (missing values, outliers).

  - Feature Engineering: This is often the most critical step!
      - Handle Missing Values: Decide on a strategy (imputation, removal) based on the nature of the missing data.
      - Handle Outliers: Identify and decide how to treat them (removal, transformation).
      - Encode Categorical Variables: Choose appropriate encoding techniques (one-hot, label, etc.).
      - Scale Numerical Features: Apply scaling methods (standardization, normalization) as needed.
      - Create New Features: Derive potentially informative features from existing ones (e.g., polynomial features, interaction terms, ratios).
      - Feature Selection/Reduction: Identify and remove less relevant or redundant features.

  - Split the Data: Divide the data into training, validation (optional but recommended for hyperparameter tuning), and testing sets using train_test_split.

  - Model Selection: Choose one or more machine learning algorithms that are suitable for the problem type.

  - Model Training: Train the chosen model(s) using the engineered features from the training set.

  - Model Evaluation: Evaluate the performance of the trained model(s) on the test set using appropriate metrics.

  - Hyperparameter Tuning (and Iteration): If needed, tune the hyperparameters of the model (often using a validation set). This might also involve going back to the feature engineering step if the model's performance isn't satisfactory.

  - Deployment (if applicable): Once a satisfactory model is trained and evaluated, deploy it for making predictions on new, real-world data (remembering to apply the same feature engineering pipeline).
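
  One way to keep the feature engineering steps identical across training, testing and deployment is to bundle them with the model in a scikit-learn `Pipeline`. The sketch below assumes a small hypothetical dataset; the column names, the median imputation strategy and the choice of `LogisticRegression` are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: one missing numeric value and one categorical column
df = pd.DataFrame({
    "age": [25, 32, np.nan, 51, 46, 29, 38, 60, 41, 23],
    "income": [30, 52, 41, 88, 75, 35, 58, 95, 62, 28],
    "city": ["London", "Tokyo", "London", "New York", "Tokyo",
             "London", "New York", "Tokyo", "London", "New York"],
    "bought": [0, 1, 0, 1, 1, 0, 1, 1, 1, 0],
})
X, y = df.drop(columns="bought"), df["bought"]

# Feature engineering steps: impute + scale numeric columns, one-hot encode the city
preprocess = ColumnTransformer([
    ("num", make_pipeline(SimpleImputer(strategy="median"), StandardScaler()), ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])
clf = Pipeline([("preprocess", preprocess), ("model", LogisticRegression())])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
clf.fit(X_train, y_train)          # every step is fitted on the training data only
predictions = clf.predict(X_test)  # the same fitted steps are reapplied to the test data
print("test accuracy:", clf.score(X_test, y_test))
```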
---

11. **Why do we have to perform EDA before fitting a model to the data?**

  Ans. We perform Exploratory Data Analysis (EDA) before fitting a model for several key reasons:

  - Understanding the Data Landscape: EDA allows us to get a feel for the data. We can see the distribution of each feature, identify the types of variables (continuous, categorical), and understand the range of values. This initial understanding guides our subsequent feature engineering choices. For example, knowing a feature is highly skewed might lead us to apply a log transformation.

  - Identifying Data Quality Issues: EDA helps us uncover potential problems like:
      - Missing Values: We can see how many missing values exist in each feature and their patterns, which informs our imputation strategies.
      - Outliers: Visualizations like box plots or scatter plots can reveal outliers that might need special handling (e.g., removal, transformation, or using robust scaling).
      - Inconsistencies: We might find unexpected values or inconsistencies in categorical variables (e.g., different spellings for the same category).

  - Gaining Insights into Relationships: EDA helps us understand the relationships between features and the target variable, as well as relationships between the features themselves.
      - Correlation: We can analyze correlations between numerical features, which might inform feature selection or the creation of interaction terms.
      - Relationship with Target: Visualizations (like scatter plots for regression or box plots for classification) can give us an initial idea of how different features relate to the target, guiding us on which features might be most predictive.

  - Informing Feature Engineering Decisions: The insights gained from EDA directly influence our feature engineering strategies. For instance:
      - If a categorical variable has many unique values, we might consider target encoding or hashing instead of one-hot encoding.
      - If a numerical feature has a non-linear relationship with the target (observed through a scatter plot), we might decide to create polynomial features.
      - If features are highly correlated, we might choose to keep only one of them or create a new feature that captures the combined information.

  - Making Assumptions Visible: Many machine learning models make certain assumptions about the data (e.g., linearity, normality). EDA can help us check if these assumptions are likely to be violated, which might influence our choice of model or the need for specific feature transformations.
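
  A few typical first EDA checks in pandas, sketched on a small hypothetical dataset that deliberately contains a missing value, an outlier and an inconsistent category spelling. The 1.5 * IQR outlier rule used here is a common convention, not the only option.

```python
import numpy as np
import pandas as pd

# Hypothetical listings data with a missing value, an outlier and inconsistent spelling
df = pd.DataFrame({
    "price": [120, 135, 150, 160, 2000, 140, np.nan],
    "rooms": [2, 3, 3, 4, 5, 3, 2],
    "city": ["London", "london", "Tokyo", "Tokyo", "London", "Tokyo", "London"],
})

print(df.describe())              # ranges and distributions of numeric columns
print(df.isna().sum())            # missing values per column
print(df["price"].skew())         # large positive skew: a log transform may help
print(df["city"].value_counts())  # reveals "London" vs "london"

# IQR rule: flag prices far outside the middle 50% of the data
q1, q3 = df["price"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["price"] < q1 - 1.5 * iqr) | (df["price"] > q3 + 1.5 * iqr)]
print(outliers)

print(df[["price", "rooms"]].corr())  # relationships between numeric features
```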
---

12. **What is correlation?**

  Ans. Correlation is a statistical measure that helps us understand the linear relationship between two of our engineered features (or between an engineered feature and the target variable).

  It tells us two main things:

  - Direction: Whether the features tend to move in the same direction (positive correlation) or opposite directions (negative correlation).
  - Strength: How strongly this linear relationship exists, ranging from -1 to +1.
      - A correlation close to +1 indicates a strong positive linear relationship: as one feature's value increases, the other tends to increase as well.
      - A correlation close to -1 indicates a strong negative linear relationship: as one feature's value increases, the other tends to decrease.
      - A correlation close to 0 suggests a weak or no linear relationship between the two features.
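
  The most common measure, Pearson's r, is the covariance of the two features divided by the product of their standard deviations: $r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}}$. A minimal NumPy sketch of that formula, checked against `np.corrcoef` on made-up values:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_centered = x - x.mean()
    y_centered = y - y.mean()
    return (x_centered @ y_centered) / np.sqrt((x_centered @ x_centered) * (y_centered @ y_centered))

hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 55, 61, 70, 72]  # hypothetical values
r = pearson_r(hours_studied, exam_score)
print(round(r, 3))  # strongly positive
print(np.isclose(r, np.corrcoef(hours_studied, exam_score)[0, 1]))  # True
```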
---

13. **What does negative correlation mean?**

  Ans. A negative correlation between two features (or between a feature and the target) means that they tend to move in opposite directions linearly.

  Here's what that implies from a feature engineering standpoint:

  - Inverse Relationship: If feature A and feature B have a negative correlation, then as the values of feature A increase, the values of feature B tend to decrease, and vice versa.

  - Information Content: A strong negative correlation (close to -1) is just as informative as a strong positive correlation (close to +1). It indicates a strong linear relationship, just in the opposite direction.

  - Feature Selection Considerations: Similar to positive correlation, if two features are highly negatively correlated, they might be providing similar information (just inversely). In such cases, we might consider keeping only one of them to reduce redundancy in our set of engineered features.

  - Relationship with the Target: If an engineered feature has a negative correlation with the target variable, it means that as the feature's value increases, the target variable tends to decrease. This kind of relationship can still be very useful for a predictive model. For example, if we're predicting house prices, a feature like 'time since last renovation' might have a negative correlation with the price – the longer it's been since the renovation, the lower the price might tend to be.
---


14. **How can you find correlation between variables in Python?**

  Ans. In Python, the most common and straightforward way to find the correlation between variables is using the corr() method in the pandas library.

  Pandas corr() method also allows you to specify the method of correlation you want to calculate:
  - method='pearson' (default): Standard correlation coefficient that measures the linear relationship between two datasets.
  - method='kendall': Kendall Tau correlation coefficient. A measure of the correspondence between the ranking of the elements of two datasets.
  - method='spearman': Spearman rank correlation. A non-parametric measure of the rank correlation.
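
  A short sketch, assuming pandas is installed, using a made-up DataFrame. The cubic column is always increasing in `x` but not linear, so the rank-based methods (Spearman, Kendall) report a perfect correlation while Pearson reports a value below 1:

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y_linear": [2, 4, 6, 8, 10],        # exactly linear in x
    "y_monotonic": [1, 8, 27, 64, 125],  # x**3: always increasing, but not linear
    "y_negative": [10, 8, 6, 4, 2],      # decreases as x increases
})

pearson = df.corr()  # method="pearson" is the default
spearman = df.corr(method="spearman")
kendall = df.corr(method="kendall")
print(pearson.round(3))

# Rank-based methods see the cubic column as perfectly related to x; Pearson does not
print(pearson.loc["x", "y_monotonic"], spearman.loc["x", "y_monotonic"])

# Correlation between two individual columns (Series.corr)
print(df["x"].corr(df["y_negative"]))  # approximately -1
```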
---

15. **What is causation? Explain difference between correlation and causation with an example.**

  Ans. Causation means that one event or variable directly produces a change in another. If A causes B, then changing A, while all other relevant conditions remain constant, produces a change in B (or at least in how likely B is). It's a direct cause-and-effect relationship.

  Correlation, on the other hand, simply means that two or more variables tend to move together. This movement can be in the same direction (positive correlation) or in opposite directions (negative correlation). Importantly, correlation does not imply that one variable causes the other.

  Here's an example to illustrate the difference:

  Imagine we observe that ice cream sales and the number of reported drowning incidents both increase during the summer months. We would likely find a positive correlation between these two variables. However, it would be incorrect to conclude that increased ice cream sales cause more drownings, or vice versa.

  The likely cause for both of these trends is the warmer weather in the summer. More people go swimming (leading to more potential drownings), and more people buy ice cream to cool down. So, while ice cream sales and drownings are correlated, there isn't a direct causal link between them. The warmer weather is a confounding variable that influences both.

  In feature engineering, identifying correlations can be useful for understanding relationships between features. However, it's crucial not to assume causation based solely on correlation. If we incorrectly assume causation, we might make flawed decisions about which features to use or how to engineer new ones. Establishing causation typically requires more rigorous methods, such as controlled experiments.
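
  The ice cream example can be simulated to show how a confounder creates correlation. This is a hypothetical sketch in NumPy: by construction, temperature drives both variables and neither affects the other. Removing the part of each variable explained by temperature (a simple form of controlling for it) makes the correlation essentially disappear.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5_000
temperature = rng.normal(25, 5, n)                         # the confounder
ice_cream_sales = 10 * temperature + rng.normal(0, 20, n)  # depends only on temperature
drownings = 0.3 * temperature + rng.normal(0, 1, n)        # depends only on temperature

raw_r = np.corrcoef(ice_cream_sales, drownings)[0, 1]
print(f"correlation(ice cream, drownings): {raw_r:.2f}")  # clearly positive

def residuals(y, x):
    """Part of y not explained by a straight-line fit on x."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

# "Control for" temperature: correlate only what temperature does not explain
r_given_temp = np.corrcoef(
    residuals(ice_cream_sales, temperature), residuals(drownings, temperature)
)[0, 1]
print(f"correlation after removing temperature: {r_given_temp:.2f}")  # close to 0
```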

---

16. **What is an Optimizer? What are different types of optimizers? Explain each with an example.**

  Ans. An optimizer is an algorithm used to adjust the parameters of your model (like the weights in a neural network) in order to minimize a loss function. Think of it like finding the lowest point in a landscape; the optimizer helps the model navigate this landscape to find the minimum error.

  There are several types of optimizers, each with its own approach. Here are a few common ones:

  - Gradient Descent (GD):
      - Explanation: This is the most basic optimizer. It iteratively moves the parameters in the direction of the negative gradient of the loss function. Imagine you're on a hill and want to get to the bottom; gradient descent tells you to take a step in the steepest downward direction.
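      - Example:

            # One gradient-descent update for a single weight (illustrative values)
            weight = 0.5
            learning_rate = 0.01
            derivative_of_loss_wrt_weight = 2.0  # assumed gradient of the loss at the current weight
            weight = weight - learning_rate * derivative_of_loss_wrt_weight  # 0.5 - 0.01 * 2.0 = 0.48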

      - Types: There are variations like Batch GD (uses the entire dataset for each update), Stochastic GD (SGD - uses a single data point for each update), and Mini-batch GD (uses a small batch of data).

  - Stochastic Gradient Descent (SGD):
      - Explanation: Instead of using the entire dataset to calculate the gradient (like in Batch GD), SGD picks a single random data point (or a small batch in Mini-batch SGD) to compute the gradient and update the parameters. This makes it faster for large datasets and can help escape local minima. However, the path to the minimum can be noisy.
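      - Example:

            # SGD on a toy one-weight model y = w * x, where the true w is 2
            import numpy as np
            X, y = np.array([1.0, 2.0, 3.0, 4.0]), np.array([2.0, 4.0, 6.0, 8.0])
            w, learning_rate = 0.0, 0.01
            for epoch in range(100):
                for i in np.random.permutation(len(X)):  # one random data point per update
                    gradient = 2 * (w * X[i] - y[i]) * X[i]  # derivative of (w * x - y)^2 w.r.t. w
                    w = w - learning_rate * gradient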

  - Adam (Adaptive Moment Estimation):
      - Explanation: Adam is a popular optimizer that combines the ideas of Momentum (which helps accelerate learning in the relevant direction and dampens oscillations) and RMSprop (which adapts the learning rates of each parameter). It computes adaptive learning rates for each parameter. This often leads to faster convergence and better performance across a wide range of problems without requiring extensive hyperparameter tuning.
      - Example: While the exact update rules are a bit more involved, conceptually, Adam adjusts the learning rate for each parameter based on running estimates of the first and second moments of the gradients. Most deep learning libraries like TensorFlow and PyTorch have built-in Adam optimizers that you can easily use.

            # Creating an Adam optimizer in TensorFlow/Keras (requires TensorFlow to be installed)
            import tensorflow as tf
            optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
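
  To show what "first and second moments" means in practice, here is a minimal NumPy sketch of the Adam update rule (the commonly published form with bias correction), applied to a made-up two-parameter quadratic loss whose minimum is at (3, -1). The function name `adam_minimize` and the hyperparameter values are illustrative; real projects would normally use a library implementation.

```python
import numpy as np

def adam_minimize(grad_fn, w0, learning_rate=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimise a function given its gradient, using the Adam update rule."""
    w = np.asarray(w0, dtype=float).copy()
    m = np.zeros_like(w)  # first moment: running average of gradients (momentum)
    v = np.zeros_like(w)  # second moment: running average of squared gradients (RMSprop-style)
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)  # bias correction: m and v start at zero
        v_hat = v / (1 - beta2 ** t)
        w = w - learning_rate * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter step size
    return w

# Loss f(w) = (w[0] - 3)^2 + 10 * (w[1] + 1)^2, so its gradient is:
def grad(w):
    return np.array([2 * (w[0] - 3), 20 * (w[1] + 1)])

w_opt = adam_minimize(grad, [0.0, 0.0])
print(w_opt)  # expected to end up close to [3, -1]
```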
---

17. **What is sklearn.linear_model?**

  Ans. sklearn.linear_model is a module in the scikit-learn (often shortened to sklearn) library in Python. This module implements a variety of linear models for regression and classification tasks.

  In linear models, the output is assumed to be a linear combination of the input features. These models are often used for their simplicity and interpretability, as well as for providing a good baseline for more complex models.
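
  A minimal sketch, assuming scikit-learn and NumPy, using hypothetical data generated exactly by y = 3*x1 - 2*x2 + 5: the fitted `LinearRegression` recovers one coefficient per feature plus an intercept, i.e. the linear combination described above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data generated exactly by y = 3*x1 - 2*x2 + 5
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]], dtype=float)
y = 3 * X[:, 0] - 2 * X[:, 1] + 5

model = LinearRegression().fit(X, y)
print(model.coef_)                  # close to [3, -2]: one weight per feature
print(model.intercept_)             # close to 5
print(model.predict([[4.0, 1.0]]))  # 3*4 - 2*1 + 5 = 15
```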

---


18. **What does model.fit() do? What arguments must be given?**

  Ans. The model.fit() method is used to train the model using the provided training data. During this process, the model learns the underlying patterns in the data and adjusts its internal parameters (e.g., weights and biases in linear models or decision rules in tree-based models) to best map the input features to the target variable.

  The primary arguments that must be given to model.fit() are:

  - X: This represents the training data or the features. It's typically a 2D array-like structure (like a NumPy array or a pandas DataFrame) where each row corresponds to a single sample, and each column represents a feature.
  - y: This represents the target variable corresponding to the training data X. For supervised learning tasks (like regression and classification), y is a 1D array-like structure containing the target value for each sample in X. For unsupervised estimators (such as KMeans), y is not needed; scikit-learn accepts y=None and ignores it.
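
  A minimal sketch of `fit()` with scikit-learn's `LinearRegression` on hypothetical data. Note the `reshape(-1, 1)`: a single feature still has to be passed as a 2D X with one column, and `fit()` returns the fitted estimator itself.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: hours studied -> exam score
hours = np.array([1, 2, 3, 4, 5])
scores = np.array([50, 55, 65, 70, 80])

X = hours.reshape(-1, 1)         # fit() expects a 2D X: 5 samples x 1 feature
model = LinearRegression()
returned = model.fit(X, scores)  # learns coef_ and intercept_ from (X, y)

print(returned is model)         # True: fit() returns the fitted estimator itself
print(model.coef_, model.intercept_)
```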
---


19. **What does model.predict() do? What arguments must be given?**

  Ans. The model.predict() method takes your trained model and uses it to generate predictions for new input data. Based on what the model has learned during the fit() step, it will output the predicted target values.

  The argument that must be given to model.predict() is:

  - X: This represents the new data for which you want to make predictions. It should be a 2D array-like structure (similar to the X you used in model.fit()) where each row is a sample and each column corresponds to one of the features the model was trained on. The new data must have the same features, in the same order, as the training data; scikit-learn raises an error if the number of columns does not match.
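
  A minimal sketch of `predict()` on hypothetical single-feature data, assuming scikit-learn and NumPy. The second call deliberately passes two columns to a model trained on one, which scikit-learn rejects with a `ValueError`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: hours studied (1 feature) -> exam score
X_train = np.array([[1], [2], [3], [4], [5]])
y_train = np.array([50, 55, 65, 70, 80])
model = LinearRegression().fit(X_train, y_train)

X_new = np.array([[6], [7]])        # 2D: 2 new samples x 1 feature
predictions = model.predict(X_new)  # one predicted score per row of X_new
print(predictions)

mismatch_rejected = False
try:
    model.predict(np.array([[6, 1]]))  # 2 features, but the model was trained on 1
except ValueError as err:
    mismatch_rejected = True
    print("ValueError:", err)
```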
---

20. **What are continuous and categorical variables?**

  Ans. Continuous Variables:

  - These are variables that can take on any value within a given range. They are usually numeric and can have decimal or fractional values.
  - Think of measurements.
  
  Examples of continuous variables:

  - Height
  - Weight
  - Temperature
  - Time
  - Salary

  Categorical Variables:

  - These variables represent qualities or characteristics. They take on a limited, and usually fixed, number of possible values, which are often labels or names.
  - Think of groupings or classifications.

  Examples of categorical variables:

  - Color (e.g., red, blue, green)
  - Gender (e.g., male, female, other)
  - Country (e.g., USA, India, Japan)
  - Product category (e.g., electronics, clothing, books)
---

21. **What is feature scaling? How does it help in Machine Learning?**

  Ans. Feature scaling is a crucial preprocessing step in feature engineering where we transform the values of numerical features to a similar scale. Essentially, we make sure that no single feature dominates others simply because its values are much larger.

  How Feature Scaling Helps in Machine Learning:

  - Gradient Descent Convergence: Many machine learning algorithms, especially those using gradient descent (like linear regression, logistic regression, and neural networks), converge much faster when features are on a similar scale. If one feature has very large values, the gradients associated with its weights might also be large, leading to oscillations and a slower convergence to the optimal solution.

  - Distance-Based Algorithms: Algorithms that rely on distance calculations, such as k-Nearest Neighbors (KNN) and Support Vector Machines (SVM) with radial basis function (RBF) kernels, are highly sensitive to the scale of the features. Features with larger values can disproportionately influence the distance calculations, leading to suboptimal results. Scaling ensures that all features contribute more equally to the distance.

  - Regularization: Regularization techniques like L1 (Lasso) and L2 (Ridge) penalize the magnitude of the coefficients. If features have vastly different scales, the penalty might affect some features more than others unfairly. Scaling helps to ensure that the regularization is applied more uniformly.

  - Improved Model Interpretability (in some cases): While scaling doesn't directly improve interpretability for all models, in some linear models, after standardization, the magnitude of the coefficients can be more directly compared to assess the relative importance of the features (though this should be done cautiously).
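
  The effect on distance-based algorithms can be seen in a small sketch with hypothetical age and income values, assuming scikit-learn and NumPy. Before scaling, the nearest neighbour of the first person is decided almost entirely by income; after standardization, age counts too and a different neighbour wins.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical people: [age in years, income in dollars]; row 0 is the query point
X = np.array([
    [30, 50_000],
    [65, 51_000],
    [31, 54_000],
    [20, 30_000],
    [70, 90_000],
], dtype=float)

def distances_from_first(data):
    """Euclidean distance from row 0 to each of the other rows."""
    return np.linalg.norm(data[1:] - data[0], axis=1)

raw = distances_from_first(X)
scaled = distances_from_first(StandardScaler().fit_transform(X))
print("nearest neighbour of row 0 (raw):   ", 1 + raw.argmin())     # row 1: income dominates
print("nearest neighbour of row 0 (scaled):", 1 + scaled.argmin())  # row 2: both features count
```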
---


22. **How do we perform scaling in Python?**

  Ans. We can perform scaling in Python using the scikit-learn library (sklearn). Specifically, the sklearn.preprocessing module provides tools for this.

  Here are a few common methods:

  - Min-Max Scaling (Normalization): This scales the data to a fixed range, usually between 0 and 1.

        from sklearn.preprocessing import MinMaxScaler
        import numpy as np

        data = np.array([[-1, 2], [-0.5, 6], [0, 10], [1, 18]])
        scaler = MinMaxScaler()                   # default feature_range=(0, 1)
        scaled_data = scaler.fit_transform(data)  # each column: (x - min) / (max - min)
        print(scaled_data)

  - StandardScaler (Standardization): This standardizes the data by removing the mean and scaling to unit variance.

        from sklearn.preprocessing import StandardScaler
        import numpy as np

        data = np.array([[-1, 2], [-0.5, 6], [0, 10], [1, 18]])
        scaler = StandardScaler()
        scaled_data = scaler.fit_transform(data)  # each column: (x - mean) / std
        print(scaled_data)

  - RobustScaler: This scaler is robust to outliers. It removes the median and scales the data according to the interquartile range (IQR).

        from sklearn.preprocessing import RobustScaler
        import numpy as np

        data = np.array([[-1, 2], [-0.5, 6], [0, 10], [1, 18], [10, 100]])  # last row is an outlier
        scaler = RobustScaler()
        scaled_data = scaler.fit_transform(data)  # each column: (x - median) / IQR
        print(scaled_data)
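
  To see what each scaler computes, the formulas can be written out by hand with NumPy and compared with scikit-learn's output on the same data. This is a sketch; in practice the scaler should be fitted on training data only and then applied to the test data with `transform`.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

data = np.array([[-1, 2], [-0.5, 6], [0, 10], [1, 18], [10, 100]])

# Min-max scaling: (x - min) / (max - min), per column
minmax_manual = (data - data.min(axis=0)) / (data.max(axis=0) - data.min(axis=0))

# Standardization: (x - mean) / std, with the population std (ddof=0) as StandardScaler uses
standard_manual = (data - data.mean(axis=0)) / data.std(axis=0)

# Robust scaling: (x - median) / IQR, with IQR = 75th percentile - 25th percentile
q1, q3 = np.percentile(data, [25, 75], axis=0)
robust_manual = (data - np.median(data, axis=0)) / (q3 - q1)

print(np.allclose(minmax_manual, MinMaxScaler().fit_transform(data)))      # True
print(np.allclose(standard_manual, StandardScaler().fit_transform(data)))  # True
print(np.allclose(robust_manual, RobustScaler().fit_transform(data)))      # True
```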
---

23. **What is sklearn.preprocessing?**

  Ans. sklearn.preprocessing is a module in the scikit-learn library in Python that provides a collection of utility functions and transformer classes. These tools are essential for feature engineering as they help in converting raw feature vectors into a more suitable format for machine learning algorithms.

  The module includes methods for:

  - Scaling: Standardizing or normalizing numerical features (e.g., StandardScaler, MinMaxScaler, RobustScaler).
  - Encoding Categorical Features: Converting categorical data into numerical formats that machine learning models can understand (e.g., OneHotEncoder, OrdinalEncoder).
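  - Imputation of Missing Values: Handling missing data using various strategies. Note that SimpleImputer lives in the separate sklearn.impute module rather than in sklearn.preprocessing, although it is typically used alongside these tools.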
  - Generating Polynomial Features: Creating new features by raising existing ones to certain powers and including interaction terms (PolynomialFeatures).
  - Normalization: Scaling individual samples to have unit norm (Normalizer).
  - Binarization: Thresholding numerical features to get binary (0 or 1) values (Binarizer).
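
  A small sketch of three of these transformers, assuming scikit-learn and NumPy, on a made-up 2 x 2 array. Unlike the scalers, `Normalizer` works row by row (per sample) rather than column by column:

```python
import numpy as np
from sklearn.preprocessing import Binarizer, Normalizer, PolynomialFeatures

X = np.array([[3.0, 4.0], [1.0, 0.0]])

# Normalizer: each row is divided by its L2 norm (row [3, 4] has norm 5)
normalized = Normalizer().fit_transform(X)
print(normalized)

# Binarizer: values above the threshold become 1, the rest become 0
binarized = Binarizer(threshold=2.0).fit_transform(X)
print(binarized)

# PolynomialFeatures(degree=2): columns 1, x0, x1, x0^2, x0*x1, x1^2
poly = PolynomialFeatures(degree=2).fit_transform(X)
print(poly)
```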
---

24. **How do we split data for model fitting (training and testing) in Python?**

  Ans. The most common way to split data into training and testing sets in Python is using the train_test_split function from the sklearn.model_selection module.

  For Example:

        from sklearn.model_selection import train_test_split
        import pandas as pd

        # Toy data for illustration: features in 'X' (a pandas DataFrame or NumPy array)
        # and the target variable in 'y' (a pandas Series or NumPy array)
        X = pd.DataFrame({"feature_1": range(100), "feature_2": range(100, 200)})
        y = pd.Series([0, 1] * 50, name="target")

        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, random_state=42, stratify=None  # stratify=y keeps class proportions
        )

        print(f"Shape of X_train: {X_train.shape}")  # (80, 2)
        print(f"Shape of X_test: {X_test.shape}")    # (20, 2)
        print(f"Shape of y_train: {y_train.shape}")  # (80,)
        print(f"Shape of y_test: {y_test.shape}")    # (20,)
---

25. **Explain data encoding?**

  Ans. Data encoding refers to the process of converting categorical data into a numerical format that machine learning algorithms can understand. Many machine learning models work best with, or can only process, numerical input. Therefore, if our dataset contains categorical features (like colors, names, or types), we need to encode them, for example with one-hot encoding, ordinal encoding, or target encoding.

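
  A minimal pandas sketch on hypothetical data: the nominal `color` column is one-hot encoded with `pd.get_dummies`, and the ordinal `size` column is mapped to integers that respect its order.

```python
import pandas as pd

df = pd.DataFrame({
    "color": ["Red", "Green", "Blue", "Green"],  # nominal: no natural order
    "size": ["Low", "High", "Medium", "Low"],    # ordinal: Low < Medium < High
})

# One-hot encode the nominal column: one 0/1 column per color
encoded = pd.get_dummies(df, columns=["color"], dtype=int)

# Map the ordinal column to integers that respect its order
encoded["size"] = df["size"].map({"Low": 0, "Medium": 1, "High": 2})
print(encoded)
```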
---