## Feature Engineering

1. What is a parameter?

   - A parameter is an internal variable of a model whose value is learned from the training data. In linear regression, for example, the slope and intercept are parameters. Training adjusts these values to minimize the loss.
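As a small illustration of learned parameters, the sketch below fits a linear regression to made-up, noise-free data generated from y = 2x + 1 (the data and variable names are assumptions for this example) and reads back the slope and intercept that training found.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data that follows y = 2x + 1 exactly (no noise)
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = 2.0 * X.ravel() + 1.0

model = LinearRegression()
model.fit(X, y)  # the parameters are learned during this call

slope = model.coef_[0]        # learned slope
intercept = model.intercept_  # learned intercept
print("slope:", round(slope, 3), "intercept:", round(intercept, 3))
```

The values in coef_ and intercept_ are not set by hand; they only exist after fit() has run.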

2. What is correlation?

   - Correlation is a statistical measure of how strongly two variables are linearly related. The (Pearson) correlation coefficient ranges from -1 to +1:
     - +1: perfect positive linear correlation
     - 0: no linear correlation (a nonlinear relationship may still exist)
     - -1: perfect negative linear correlation
    
3. What does negative correlation mean?

   
   - A negative correlation means that as one variable increases, the other tends to decrease. For example, as temperature rises, sales of heaters go down.
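To make the sign of a correlation concrete, here is a minimal sketch using NumPy's corrcoef. The temperature and heater-sales numbers are invented for illustration, not real measurements.

```python
import numpy as np

# Hypothetical data: temperature in degrees Celsius and heaters sold
temperature = np.array([5, 10, 15, 20, 25, 30])
heater_sales = np.array([95, 80, 62, 45, 30, 12])

# Pearson correlation between the two series
r = np.corrcoef(temperature, heater_sales)[0, 1]
print("temperature vs heater sales:", round(r, 3))

# Exact linear relationships give the extreme values
x = np.arange(10, dtype=float)
r_pos = np.corrcoef(x, 2 * x + 3)[0, 1]   # increasing line
r_neg = np.corrcoef(x, -x)[0, 1]          # decreasing line
print("increasing line:", r_pos, "decreasing line:", r_neg)
```

The coefficient for the made-up sales data is close to -1 because sales fall almost linearly as temperature rises.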

4. Define Machine Learning. What are the main components in Machine Learning?

   - Machine Learning (ML) is a subset of Artificial Intelligence that enables computers to learn from data and make predictions or decisions without being explicitly programmed.

     Main components:
     - Data
     - Model
     - Loss function
     - Optimization algorithm
     - Evaluation metrics

5. How does loss value help in determining whether the model is good or not?

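   - The loss value measures how far the model's predictions are from the actual values, and it is the quantity the optimizer minimizes during training. A lower loss means a closer fit to the data it was computed on. To judge whether a model is good, look at the loss on held-out (validation or test) data: a low training loss combined with a much higher validation loss indicates overfitting.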
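As one concrete loss, the sketch below computes mean squared error (MSE) for two sets of made-up predictions. The helper name mse and the numbers are assumptions for this example; other tasks use other losses, such as cross-entropy for classification.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: average of squared prediction errors."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

y_true = [3.0, 5.0, 7.0]
close_preds = [2.9, 5.2, 6.8]  # predictions near the targets
far_preds = [1.0, 8.0, 4.0]    # predictions far from the targets

loss_close = mse(y_true, close_preds)
loss_far = mse(y_true, far_preds)
print("loss (close):", loss_close, "loss (far):", loss_far)
```

The predictions that sit closer to the targets produce the smaller loss, which is the signal an optimizer follows.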

6. What are continuous and categorical variables?

   - Continuous variables can take any numerical value within a range (e.g., height, weight, age).
   - Categorical variables represent categories or groups (e.g., gender, color, product type).

7. How do we handle categorical variables in Machine Learning? What are the common techniques?
   
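   - Common techniques:
     - Label Encoding: maps each category to an integer. The integers imply an order that may not exist, so it is usually reserved for target labels or ordered categories.
     - One-Hot Encoding: creates one binary column per category.
     - Ordinal Encoding: maps categories to integers that follow their natural order (e.g., small < medium < large).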
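The sketch below shows label and ordinal encoding with scikit-learn on a tiny made-up DataFrame (the column names and values are assumptions for illustration). The category order for the ordinal encoder is passed explicitly, since the default order is alphabetical.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Hypothetical data: an ordered feature and a class label
df = pd.DataFrame({
    "size": ["small", "large", "medium", "small"],
    "label": ["cat", "dog", "cat", "bird"],
})

# Ordinal encoding: spell out the order so small < medium < large
ordinal = OrdinalEncoder(categories=[["small", "medium", "large"]])
df["size_encoded"] = ordinal.fit_transform(df[["size"]]).ravel()

# Label encoding: integers assigned in sorted order of the class names
label_encoder = LabelEncoder()
df["label_encoded"] = label_encoder.fit_transform(df["label"])

print(df)
print("label classes:", list(label_encoder.classes_))
```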

8. What do you mean by training and testing a dataset?
   - Training data is used to fit the model, i.e., to learn its parameters.
   - Testing data is held out from training and used to evaluate how well the model performs on unseen data.

9. What is sklearn.preprocessing?

   - sklearn.preprocessing is a Scikit-learn module for preparing data for model training. It provides tools for scaling, normalization, and encoding, among others.

10. What is a Test set?

    - A test set is a portion of the dataset held back to evaluate the final performance of a trained model. It checks how well the model generalizes to new, unseen data.



11. How do we split data for model fitting (training and testing) in Python?
    - We use the train_test_split() function from sklearn.model_selection:

In [2]:
from sklearn.model_selection import train_test_split
import pandas as pd
from sklearn.datasets import load_iris

# Load sample data
data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)

# Split into training and test sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("Training data shape:", X_train.shape)
print("Test data shape:", X_test.shape)


Training data shape: (120, 4)
Test data shape: (30, 4)


12. How do you approach a Machine Learning problem?

    - Steps to approach an ML problem:
      1. Understand the problem
      2. Collect and explore the data
      3. Preprocess the data
      4. Select a model
      5. Train the model
      6. Evaluate the model
      7. Tune hyperparameters
      8. Deploy and monitor
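The steps above can be sketched end to end on the Iris dataset. This is one possible arrangement under stated assumptions, not a prescribed recipe: the model (logistic regression), the scaler, and the small grid of C values are choices made for illustration, and deployment and monitoring are left out.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Collect the data
X, y = load_iris(return_X_y=True)

# Hold out a test set before any fitting
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Preprocess + model in one pipeline so scaling is fit on training folds only
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Tune one hyperparameter with 5-fold cross-validation on the training data
param_grid = {"logisticregression__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)

# Evaluate once on the untouched test set
test_accuracy = accuracy_score(y_test, search.predict(X_test))
print("best C:", search.best_params_["logisticregression__C"])
print("test accuracy:", round(test_accuracy, 3))
```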

13. Why do we have to perform EDA before fitting a model to the data?
    - Exploratory Data Analysis (EDA) helps in understanding the data, identifying patterns, detecting outliers, and uncovering missing values, which guides proper preprocessing and model selection.
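A few pandas calls cover the checks mentioned above. The sketch uses a small invented DataFrame (the column names and values are assumptions) and flags outliers with the common 1.5 × IQR rule, which is one heuristic among several.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing age and one extreme income
df = pd.DataFrame({
    "age": [25, 32, 47, 51, np.nan, 38, 29],
    "income": [40_000, 52_000, 61_000, 58_000, 45_000, 1_000_000, 49_000],
})

# Summary statistics for every numeric column
print(df.describe())

# Missing values per column
missing_counts = df.isna().sum()
print(missing_counts)

# Outliers in income by the 1.5 * IQR rule
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["income"] < q1 - 1.5 * iqr) | (df["income"] > q3 + 1.5 * iqr)
outliers = df[is_outlier]
print(outliers)
```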

14. What is correlation?
    - (Same as Q2) Correlation shows how strongly two variables are linearly related, on a scale from -1 to +1.

15. What does negative correlation mean?
    - (Same as Q3) It means one variable tends to decrease as the other increases.



16. How can you find correlation between variables in Python?
    - Use the DataFrame.corr() method in pandas, which computes pairwise Pearson correlations by default:

In [4]:
import pandas as pd

# Sample dataset
data = {
    'Height': [150, 160, 170, 180, 190],
    'Weight': [50, 60, 65, 70, 80],
    'Age': [25, 30, 35, 40, 45]
}

# Create DataFrame
df = pd.DataFrame(data)

# Calculate correlation matrix
correlation_matrix = df.corr()

print(correlation_matrix)


          Height    Weight       Age
Height  1.000000  0.989949  1.000000
Weight  0.989949  1.000000  0.989949
Age     1.000000  0.989949  1.000000


17. What is causation? Explain difference between correlation and causation with an example.
    - Causation means a change in one variable directly produces a change in another.
    - Correlation means two variables change together, but one does not necessarily cause the other.
    - Example: Ice cream sales and drowning deaths are correlated because both rise in summer (hot weather is a common cause), but neither causes the other.
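The ice cream example can be simulated. In the sketch below, all numbers are synthetic and the coefficients are arbitrary assumptions: temperature drives both series, and neither series influences the other. The raw correlation is high, but it largely disappears once the effect of temperature is removed from both.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Synthetic confounder and two variables that each depend only on it
temperature = rng.normal(25, 5, n)
ice_cream = 10 * temperature + rng.normal(0, 20, n)
drownings = 0.5 * temperature + rng.normal(0, 1, n)

# Raw correlation between the two effects
raw_r = np.corrcoef(ice_cream, drownings)[0, 1]

def residuals(values, confounder):
    """Remove the linear effect of the confounder from values."""
    slope, intercept = np.polyfit(confounder, values, 1)
    return values - (slope * confounder + intercept)

# Correlation after controlling for temperature (a partial correlation)
partial_r = np.corrcoef(
    residuals(ice_cream, temperature),
    residuals(drownings, temperature),
)[0, 1]

print("raw correlation:", round(raw_r, 3))
print("after controlling for temperature:", round(partial_r, 3))
```

Controlling for a known confounder like this only works when the confounder is measured. In general, establishing causation needs an experiment or stronger assumptions.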

18. What is an Optimizer? What are different types of optimizers? Explain each with an example.
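    - An optimizer is the algorithm that adjusts model parameters to minimize the loss during training.

      Types:
      - SGD (Stochastic Gradient Descent): updates the weights using the gradient computed on a single example or a small batch of data, e.g., w = w - learning_rate * gradient.
      - Adam: combines momentum (a moving average of past gradients) with per-parameter adaptive learning rates.
      - RMSprop: keeps a moving average of squared gradients for each parameter and divides the step by its square root, which gives each parameter its own effective learning rate.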
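To compare the update rules, the sketch below applies each one to a single weight on the toy loss (w - 3)^2. It is a simplified illustration: the gradient is exact rather than estimated from mini-batches, so the "SGD" here is plain gradient descent, and the function names, learning rates, and step counts are assumptions chosen for this example.

```python
import numpy as np

def grad(w):
    """Gradient of the toy loss (w - 3)^2, whose minimum is at w = 3."""
    return 2.0 * (w - 3.0)

def run_sgd(w, lr=0.1, steps=200):
    # Step against the gradient with a fixed learning rate
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

def run_rmsprop(w, lr=0.01, decay=0.9, eps=1e-8, steps=2000):
    avg_sq = 0.0  # moving average of squared gradients
    for _ in range(steps):
        g = grad(w)
        avg_sq = decay * avg_sq + (1 - decay) * g ** 2
        w = w - lr * g / (np.sqrt(avg_sq) + eps)
    return w

def run_adam(w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    m = 0.0  # moving average of gradients (momentum)
    v = 0.0  # moving average of squared gradients
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)  # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

results = {
    "SGD": run_sgd(0.0),
    "RMSprop": run_rmsprop(0.0),
    "Adam": run_adam(0.0),
}
for name, w_final in results.items():
    print(name, "final w:", round(float(w_final), 4))
```

All three start from w = 0 and end near 3. In practice these optimizers come from a framework rather than being written by hand.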

19. What is sklearn.linear_model?
    - sklearn.linear_model is a Scikit-learn module that contains linear models such as:
      - LinearRegression
      - LogisticRegression
      - Ridge
      - Lasso

20. What does model.fit() do? What arguments must be given?
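    - model.fit(X, y) trains the model on features X and target y.

      Required arguments:
      - X: training features, shaped (n_samples, n_features)
      - y: target values, one per sample (unsupervised estimators such as scalers or clustering models need only X)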

21. What does model.predict() do? What arguments must be given?
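    - model.predict(X_test) uses the trained model to predict the target for each row of X_test.

      Required argument:
      - X_test: the data to predict on (test data or any new data), with the same feature columns as the training data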
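A short sketch of fit() followed by predict(), using LogisticRegression on the Iris dataset as an assumed example model. It also shows that fit() returns the estimator itself and that predict() returns one prediction per input row.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=1000)

# fit() needs features and targets; it returns the fitted estimator
fitted = model.fit(X_train, y_train)

# predict() needs only features, with the same columns used in training
y_pred = model.predict(X_test)

print("fit returned the same object:", fitted is model)
print("input shape:", X_test.shape, "prediction shape:", y_pred.shape)
```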

22. What are continuous and categorical variables?
    - (Same as Q6) Continuous variables take numeric values within a range; categorical variables take values from a fixed set of categories.

23. What is feature scaling? How does it help in Machine Learning?
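    - Feature scaling transforms features to a common scale (e.g., 0–1, or zero mean and unit variance).
    - It keeps features with large ranges from dominating the others. This improves performance for distance-based algorithms such as KNN and SVM and helps models trained with gradient descent converge faster.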



24. How do we perform scaling in Python?
    - Use StandardScaler (zero mean, unit variance) or MinMaxScaler (rescales to the 0–1 range by default) from sklearn.preprocessing:

In [5]:
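from sklearn.preprocessing import StandardScaler

# X is the feature DataFrame created in the train_test_split cell above
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # learn mean/std from X, then standardize it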
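
As a complement, the sketch below shows both scalers on a small made-up array (the values are assumptions) and fits them on the training split only, then reuses the fitted scaler on the test split. Fitting on the training data alone is a common practice to avoid leaking test-set statistics into training.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical features: height (cm) and weight (kg)
X = np.array([
    [150.0, 50.0], [160.0, 60.0], [170.0, 65.0],
    [180.0, 70.0], [190.0, 80.0], [175.0, 72.0],
])
X_train, X_test = train_test_split(X, test_size=2, random_state=0)

# StandardScaler: learn mean and std on the training split only
standard = StandardScaler().fit(X_train)
X_train_std = standard.transform(X_train)
X_test_std = standard.transform(X_test)  # reuses the training statistics

# MinMaxScaler: learn min and max on the training split only
minmax = MinMaxScaler().fit(X_train)
X_train_mm = minmax.transform(X_train)
X_test_mm = minmax.transform(X_test)  # may fall outside [0, 1]

print("standardized train mean:", X_train_std.mean(axis=0).round(6))
print("min-max train range:", X_train_mm.min(axis=0), X_train_mm.max(axis=0))
```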


25. What is data encoding?
    - Data encoding converts categorical variables into a numerical format so ML algorithms can process them.

      Types:
      - Label Encoding
      - One-Hot Encoding

In [7]:
import pandas as pd

# Sample DataFrame with a categorical column 'Color'
data = {
    'Color': ['Red', 'Blue', 'Green', 'Blue', 'Red'],
    'Value': [10, 15, 20, 25, 30]
}

df = pd.DataFrame(data)

# Apply one-hot encoding
encoded_df = pd.get_dummies(df, columns=['Color'])

print(encoded_df)


   Value  Color_Blue  Color_Green  Color_Red
0     10       False        False       True
1     15        True        False      False
2     20       False         True      False
3     25        True        False      False
4     30       False        False       True
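
The same encoding can be done with scikit-learn's OneHotEncoder, which remembers the categories seen during fit and can then be applied to new data. The sketch below is an illustration under assumptions: the "new" DataFrame and its unseen category are invented, and handle_unknown="ignore" is one possible policy for categories that were not present during fitting.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

train = pd.DataFrame({"Color": ["Red", "Blue", "Green", "Blue", "Red"]})
new = pd.DataFrame({"Color": ["Green", "Purple"]})  # "Purple" was never seen

# Unknown categories are encoded as all zeros instead of raising an error
encoder = OneHotEncoder(handle_unknown="ignore")
encoder.fit(train[["Color"]])

# transform() returns a sparse matrix by default; convert it for display
encoded_new = encoder.transform(new[["Color"]]).toarray()

print("categories:", list(encoder.categories_[0]))
print(encoded_new)
```

Unlike pd.get_dummies, a fitted encoder produces the same columns for training and new data, which matters when the model expects a fixed number of features.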
