**Feature Engineering**

---

### 1. **What is a parameter?**  
In machine learning, a **parameter** refers to a configuration variable that is internal to the model and whose value is estimated from the data during training (e.g., weights in linear regression).
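As an illustration, the sketch below assumes scikit-learn and NumPy are installed. It fits `LinearRegression` to made-up data that follows y = 2x + 1 exactly. The learned parameters then appear in the fitted attributes `coef_` (the weight) and `intercept_` (the bias). A hyperparameter, such as `Ridge`'s `alpha`, is different: you set it before training rather than estimating it from data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data that follows y = 2x + 1 exactly (illustrative values)
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

model = LinearRegression()
model.fit(X, y)  # estimates the parameters from the data

# Learned parameters: the slope (weight) and the intercept (bias)
print(model.coef_)       # approximately [2.]
print(model.intercept_)  # approximately 1.0
```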

---

### 2. **What is correlation?**  
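**Correlation** measures the strength and direction of a relationship between two variables. The most common measure is Pearson's correlation coefficient *r*. It captures linear relationships and ranges from -1 to 1. Values near ±1 indicate a strong linear relationship; values near 0 indicate little or no linear relationship.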
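Pearson's r is the covariance of two variables divided by the product of their standard deviations. The sketch below computes it directly from that definition with NumPy and checks the result against `np.corrcoef`. The helper name `pearson_r` and the data values are made up for illustration.

```python
import numpy as np

def pearson_r(x, y):
    """Hypothetical helper: covariance divided by the product of standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean
    dy = y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # roughly y = 2x, so r should be close to +1

r_manual = pearson_r(x, y)
r_numpy = np.corrcoef(x, y)[0, 1]  # off-diagonal entry of the 2x2 correlation matrix
print(r_manual, r_numpy)
```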

---

### 3. **What does negative correlation mean?**  
A **negative correlation** means that as one variable increases, the other tends to decrease. For example, hours spent watching TV might be negatively correlated with test scores.
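The TV example can be sketched with NumPy on five hypothetical students. The numbers are invented purely to show what a negative coefficient looks like.

```python
import numpy as np

# Hypothetical data: daily hours of TV and test scores for five students
hours_tv = np.array([1, 2, 3, 4, 5])
scores = np.array([88, 85, 80, 72, 65])

r = np.corrcoef(hours_tv, scores)[0, 1]
print(round(r, 3))  # negative: scores tend to fall as TV hours rise
```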

---

### 4. **Define Machine Learning. What are the main components in Machine Learning?**  
**Machine Learning** is a subset of AI where systems learn patterns from data to make predictions or decisions.  
Main components:  
- **Data**  
- **Model**  
- **Algorithm**  
- **Loss function**  
- **Optimizer**  
- **Evaluation metrics**
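To see how these components fit together, here is a minimal training loop written with NumPy only. Every choice in it is illustrative rather than the only option:

- synthetic data with a known relationship
- a linear model
- mean squared error as the loss
- plain gradient descent as the algorithm/optimizer
- mean absolute error as the evaluation metric

```python
import numpy as np

rng = np.random.default_rng(0)

# Data: synthetic inputs with a known relationship y = 3x + 2 plus noise
x = rng.uniform(0, 10, size=100)
y = 3 * x + 2 + rng.normal(0, 1, size=100)

# Model: a line with two parameters, weight w and bias b
w, b = 0.0, 0.0

def loss_fn(w, b):
    # Loss function: mean squared error between predictions and targets
    return np.mean((w * x + b - y) ** 2)

# Optimizer / algorithm: gradient descent with a fixed learning rate
lr = 0.01
for _ in range(3000):
    error = w * x + b - y
    grad_w = 2 * np.mean(error * x)  # dLoss/dw
    grad_b = 2 * np.mean(error)      # dLoss/db
    w -= lr * grad_w
    b -= lr * grad_b

# Evaluation metric: mean absolute error (on the training data here, for brevity)
mae = np.mean(np.abs(w * x + b - y))
print(f"w={w:.2f}, b={b:.2f}, loss={loss_fn(w, b):.3f}, MAE={mae:.3f}")
```

In practice the metric would be computed on held-out data rather than on the training points.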

---

### 5. **How does loss value help in determining whether the model is good or not?**  
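The **loss value** measures how far the model's predictions are from the actual values, so a lower loss means predictions closer to the targets. To judge whether the model is actually good, check the loss on validation or test data, not only on training data. A low training loss combined with a much higher validation loss signals overfitting.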
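Mean squared error (MSE) is a common loss for regression. The sketch below assumes scikit-learn is installed and compares two hypothetical sets of predictions against the same targets. The model whose predictions sit closer to the targets gets the lower loss.

```python
from sklearn.metrics import mean_squared_error

y_true = [3.0, 5.0, 7.0]
preds_a = [2.9, 5.2, 6.8]  # hypothetical model A: close to the targets
preds_b = [4.0, 3.0, 9.0]  # hypothetical model B: far from the targets

# MSE = average of squared differences between targets and predictions
loss_a = mean_squared_error(y_true, preds_a)
loss_b = mean_squared_error(y_true, preds_b)
print(loss_a, loss_b)  # model A has the lower loss
```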

---

### 6. **What are continuous and categorical variables?**  
- **Continuous variables**: Numeric values that can take any value within a range (e.g., height, age).  
- **Categorical variables**: Distinct groups or categories (e.g., gender, color).
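In pandas, a quick first pass is to separate columns by dtype. The sketch below uses a hypothetical DataFrame. Treat dtype only as a heuristic: integer-coded categories such as zip codes look numeric but are categorical.

```python
import pandas as pd

# Hypothetical dataset mixing continuous and categorical columns
df = pd.DataFrame({
    "height_cm": [172.5, 160.2, 181.0],
    "age": [34, 28, 45],
    "color": ["red", "blue", "red"],
})
df["color"] = df["color"].astype("category")  # mark as categorical explicitly

continuous_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()
print(continuous_cols)   # ['height_cm', 'age']
print(categorical_cols)  # ['color']
```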

---

### 7. **How do we handle categorical variables in Machine Learning? Common techniques?**  
We convert them into numerical form using techniques like:  
- **Label Encoding**  
- **One-Hot Encoding**  
- **Ordinal Encoding**
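Here is a minimal comparison of two integer encodings in scikit-learn, using a hypothetical `low`/`medium`/`high` column:

- `OrdinalEncoder` is meant for feature columns and lets you state the category order explicitly.
- `LabelEncoder` is documented for target labels (`y`) and simply sorts categories alphabetically.

An alphabetical order is usually not meaningful.

```python
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

sizes = ["low", "high", "medium"]

# OrdinalEncoder: expects 2D feature input; the order is given explicitly
ordinal = OrdinalEncoder(categories=[["low", "medium", "high"]])
ordinal_codes = ordinal.fit_transform([[s] for s in sizes])
print(ordinal_codes.ravel())  # [0. 2. 1.] -> low < medium < high

# LabelEncoder: 1D input, categories sorted alphabetically
label = LabelEncoder()
label_codes = label.fit_transform(sizes)
print(label.classes_, label_codes)  # ['high' 'low' 'medium'] [1 0 2]
```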

---

### 8. **What do you mean by training and testing a dataset?**  
- **Training data** is used to train the model.  
- **Testing data** is used to evaluate the model’s performance on unseen data.
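The sketch below, assuming scikit-learn, shows why the test set matters. It uses synthetic data in which `flip_y=0.2` assigns random classes to about 20% of samples. An unrestricted decision tree then memorizes the training data, noise included, so its training accuracy overstates how well it does on unseen data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; flip_y=0.2 assigns random classes to about 20% of samples (label noise)
X, y = make_classification(n_samples=500, n_features=10, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# An unrestricted tree can memorize the training data
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
train_acc = tree.score(X_train, y_train)
test_acc = tree.score(X_test, y_test)
print(f"train accuracy={train_acc:.2f}, test accuracy={test_acc:.2f}")
```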

---

### 9. **What is sklearn.preprocessing?**  
It’s a module in **Scikit-learn** that provides transformers for feature scaling, normalization, encoding, and other transformations (e.g., `StandardScaler`, `MinMaxScaler`, `OneHotEncoder`).
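All transformers in `sklearn.preprocessing` share the same `fit`/`transform` API, which lets them be combined. The sketch below applies a scaler and a one-hot encoder to different columns of a toy DataFrame. It uses `ColumnTransformer`, which lives in `sklearn.compose`, not in `sklearn.preprocessing`.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({"age": [25, 40, 58], "city": ["Paris", "Tokyo", "Lima"]})  # toy data

# Each preprocessing transformer follows the same fit/transform API
preprocess = ColumnTransformer(
    [
        ("scale", StandardScaler(), ["age"]),   # numeric column -> z-scores
        ("onehot", OneHotEncoder(), ["city"]),  # categorical column -> binary columns
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)
X = preprocess.fit_transform(df)
print(X.shape)  # (3, 4): 1 scaled column + 3 one-hot columns
```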

---

### 10. **What is a Test set?**  
A **test set** is a portion of the data held out from training and tuning, used only to assess the final model's performance on unseen data.

---

### 11. **How do we split data for model fitting (training and testing) in Python?**  
Using `train_test_split` from `sklearn.model_selection`:
```python
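from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
X, y = load_iris(return_X_y=True)  # example dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)  # 80/20 split, reproducible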
```
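Roughly, a random split works like this: shuffle the row indices, then slice off a fraction of them for testing. The helper `simple_split` below is hypothetical and not part of scikit-learn. `train_test_split` itself also supports extra options such as `stratify`.

```python
import numpy as np

def simple_split(X, y, test_size=0.2, seed=0):
    """Hypothetical helper: shuffle row indices, hold out a test fraction."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))            # random order of row indices
    n_test = int(round(len(X) * test_size))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

X = np.arange(20).reshape(10, 2)  # 10 samples, 2 features
y = np.arange(10)
X_train, X_test, y_train, y_test = simple_split(X, y, test_size=0.2)
print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
```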

---

### 12. **How do you approach a Machine Learning problem?**  
- Understand the problem  
- Gather and clean data  
- Exploratory Data Analysis (EDA)  
- Feature engineering  
- Choose model(s)  
- Train and validate  
- Evaluate performance  
- Tune hyperparameters  
- Deploy
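Several of these steps can be condensed into a short scikit-learn workflow, leaving out problem definition and deployment. The built-in iris dataset stands in for real data. The pipeline, the logistic regression model, and the grid of `C` values are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Gather data (a built-in toy dataset stands in for real data collection)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Feature engineering + model in one pipeline, so scaling is learned from training folds only
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Tune hyperparameters with 5-fold cross-validation on the training set
search = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

# Evaluate once on the held-out test set
test_acc = search.score(X_test, y_test)
print(search.best_params_, f"test accuracy={test_acc:.2f}")
```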

---

### 13. **Why do we perform EDA before fitting a model to the data?**  
EDA helps in understanding patterns, spotting anomalies, testing assumptions, and selecting relevant features for the model.
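A few pandas calls cover the basics of EDA. The sketch below uses a hypothetical dataset containing one missing value and one implausible age. It assumes pandas 1.5 or newer for the `numeric_only` argument.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with a missing value and a suspicious outlier
df = pd.DataFrame({
    "age": [23, 35, 31, np.nan, 29, 240],  # 240 looks like a data-entry error
    "income": [32000, 54000, 48000, 51000, 39000, 45000],
    "city": ["Lima", "Paris", "Paris", "Tokyo", "Lima", "Paris"],
})

print(df.describe())               # summary statistics: min/max expose the outlier
missing = df.isna().sum()          # missing values per column
print(missing)
print(df["city"].value_counts())   # category frequencies
print(df.corr(numeric_only=True))  # pairwise correlations between numeric columns
```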

---

### 14. **What is correlation?** *(Repeated)*  
See Q2.

---

### 15. **What does negative correlation mean?** *(Repeated)*  
See Q3.

---

### 16. **How can you find correlation between variables in Python?**  
Using `.corr()`:
```python
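import pandas as pd
df = pd.DataFrame({"hours_tv": [1, 2, 3, 4], "score": [90, 85, 70, 60]})  # toy data
print(df.corr(numeric_only=True))  # pairwise Pearson correlation matrix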
```
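Or use `seaborn.heatmap` to visualize the correlation matrix, e.g. `sns.heatmap(df.corr(numeric_only=True), annot=True)` after `import seaborn as sns`.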
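Pearson's r (the default for `.corr()`) only captures linear relationships. `DataFrame.corr` also accepts `method="spearman"`, which correlates ranks and therefore detects any monotonic relationship. The toy relationship y = x³ below is monotonic but not linear.

```python
import numpy as np
import pandas as pd

x = np.arange(1, 11)
df = pd.DataFrame({"x": x, "y": x ** 3})  # monotonic but not linear

pearson = df.corr(method="pearson").loc["x", "y"]    # linear association
spearman = df.corr(method="spearman").loc["x", "y"]  # rank-based (monotonic) association
print(f"Pearson={pearson:.3f}, Spearman={spearman:.3f}")
```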

---

### 17. **What is causation? Explain difference between correlation and causation with an example.**  
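- **Causation** means that a change in one variable directly produces a change in another.  
- **Correlation** only shows that two variables move together; it does not establish that one causes the other.

**Example**: Ice cream sales and drowning deaths may be correlated, but eating ice cream doesn't cause drowning. Hot weather is a confounding variable: it increases both ice cream sales and swimming, and therefore drownings.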
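The ice cream example can be simulated with NumPy. In the hypothetical data below, temperature drives both ice cream sales and drownings, and there is no direct link between the two.

- The raw correlation between sales and drownings is strong.
- Removing the linear effect of temperature from each variable and correlating the residuals gives a partial correlation, which comes out near zero.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

# Simulated (hypothetical) data: temperature drives both other variables
temperature = rng.normal(25, 5, n)
ice_cream = 2.0 * temperature + rng.normal(0, 3, n)
drownings = 0.5 * temperature + rng.normal(0, 1, n)  # no direct link to ice_cream

raw_r = np.corrcoef(ice_cream, drownings)[0, 1]

def residuals(v, z):
    # Remove the linear effect of z from v with a least-squares line fit
    slope, intercept = np.polyfit(z, v, 1)
    return v - (slope * z + intercept)

# Partial correlation: correlate what is left after accounting for temperature
partial_r = np.corrcoef(residuals(ice_cream, temperature),
                        residuals(drownings, temperature))[0, 1]
print(f"raw r={raw_r:.2f}, r controlling for temperature={partial_r:.2f}")
```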

---

### 18. **What is an Optimizer? What are different types of optimizers?**  
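An **optimizer** is the algorithm that updates model parameters to minimize the loss, typically using the gradient of the loss.

Common types:
- **SGD (Stochastic Gradient Descent)** – updates parameters using gradients computed on a single example or a small random batch  
- **Adam** – combines momentum with per-parameter adaptive learning rates  
- **RMSprop** – divides the learning rate by a running average of recent squared gradients  
- **Adagrad** – adapts the learning rate per parameter based on the sum of all past squared gradients, so the effective rate shrinks over time

They differ in how they trade off convergence speed against stability.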
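The update rules can be compared on a toy loss, f(w) = (w - 3)², written here with NumPy. Because the loss is deterministic, the first loop is plain (full-batch) gradient descent; the "stochastic" part of SGD comes from estimating the gradient on random batches. The second loop follows the standard Adam update with its usual default betas. The learning rate and step counts are arbitrary illustrative choices.

```python
import numpy as np

def grad(w):
    # Gradient of the toy loss f(w) = (w - 3)^2, minimized at w = 3
    return 2.0 * (w - 3.0)

# Plain gradient descent: step against the gradient with a fixed learning rate
w_gd, lr = 0.0, 0.1
for _ in range(100):
    w_gd -= lr * grad(w_gd)

# Adam: momentum (m) plus per-parameter scaling by recent squared gradients (v)
w_adam, m, v = 0.0, 0.0, 0.0
beta1, beta2, eps = 0.9, 0.999, 1e-8
for t in range(1, 1001):
    g = grad(w_adam)
    m = beta1 * m + (1 - beta1) * g       # running mean of gradients
    v = beta2 * v + (1 - beta2) * g ** 2  # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)          # bias correction for zero initialization
    v_hat = v / (1 - beta2 ** t)
    w_adam -= lr * m_hat / (np.sqrt(v_hat) + eps)

print(f"gradient descent: w={w_gd:.4f}, Adam: w={w_adam:.4f}")
```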

---

### 19. **What is sklearn.linear_model?**  
It’s a module in Scikit-learn containing linear models like:
- `LinearRegression`
- `LogisticRegression`
- `Ridge`
- `Lasso`, etc.
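One practical difference between these models shows up on synthetic data where only 3 of 10 features matter. `Lasso` (L1 penalty) can drive coefficients exactly to zero, while `Ridge` (L2 penalty) shrinks them without generally zeroing them. The `alpha` values below are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge

# Synthetic data: 10 features, only 3 of which actually influence y
X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

models = {
    "LinearRegression": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=1.0),
}
zero_counts = {}
for name, model in models.items():
    model.fit(X, y)
    zero_counts[name] = int(np.sum(model.coef_ == 0))  # coefficients exactly zero
print(zero_counts)
```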

---

### 20. **What does model.fit() do? What arguments must be given?**  
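It trains the model by estimating its parameters from the data, and returns the fitted model itself.  
Arguments: the feature matrix `X_train` (shape `(n_samples, n_features)`) and, for supervised models, the targets `y_train`. Unsupervised models, such as clustering algorithms, take only `X`.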
```python
model.fit(X_train, y_train)
```
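For an unsupervised estimator, `fit()` takes only `X`. The sketch below, assuming scikit-learn, clusters two well-separated groups of toy points with `KMeans`. The value `n_init=10` is set explicitly only to keep the behavior stable across library versions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated groups of 2D points (toy data)
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

# Unsupervised: fit() takes only X, there is no y
km = KMeans(n_clusters=2, n_init=10, random_state=0)
km.fit(X)
print(km.labels_)  # cluster index assigned to each training point
```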

---

### 21. **What does model.predict() do? What arguments must be given?**  
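It returns predicted target values (class labels or numbers) for new input data.  
Argument: a feature matrix such as `X_test`, with the same features, in the same order, as the data passed to `fit()`.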
```python
predictions = model.predict(X_test)
```
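Here is a fuller sketch on the iris dataset, assuming scikit-learn. It shows the output shapes of `predict()` and, for classifiers that support it, `predict_proba()`. It also shows that a single sample must still be passed as a 2D array.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

predictions = model.predict(X_test)          # one class label per test row
probabilities = model.predict_proba(X_test)  # one probability per class per row

# A single sample must still be 2D: shape (1, n_features)
one_prediction = model.predict(X_test[0].reshape(1, -1))
print(predictions.shape, probabilities.shape, one_prediction)
```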

---

### 22. **What are continuous and categorical variables?** *(Repeated)*  
See Q6.

---

### 23. **What is feature scaling? How does it help in Machine Learning?**  
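Feature scaling transforms numeric features to a comparable scale. Common approaches are standardization (zero mean, unit variance) and min-max normalization (a fixed range such as [0, 1]).  
It helps gradient-based models converge faster and keeps features with large numeric ranges from dominating distance-based models such as k-NN, k-means, and SVMs. Tree-based models are largely insensitive to scaling.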
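The sketch below uses three hypothetical customers described by age and income. Before scaling, a $1,000 income gap almost completely outweighs a 35-year age gap in the Euclidean distance. After standardization with scikit-learn's `StandardScaler`, each feature contributes in proportion to its spread. The helper `income_share` is made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical customers: [age in years, income in dollars]
X = np.array([[25.0, 50000.0],
              [60.0, 51000.0],
              [26.0, 60000.0]])

def income_share(a, b):
    # Hypothetical helper: fraction of the squared Euclidean distance due to income
    diff_sq = (a - b) ** 2
    return diff_sq[1] / diff_sq.sum()

before = income_share(X[0], X[1])
X_scaled = StandardScaler().fit_transform(X)
after = income_share(X_scaled[0], X_scaled[1])
print(f"income share of distance: before={before:.3f}, after={after:.3f}")
```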

---

### 24. **How do we perform scaling in Python?**  
Using `StandardScaler` or `MinMaxScaler` from `sklearn.preprocessing`:
```python
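import numpy as np
from sklearn.preprocessing import StandardScaler
X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])  # toy feature matrix
scaler = StandardScaler()           # z = (x - mean) / std, per column
X_scaled = scaler.fit_transform(X)  # learn mean/std, then apply them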
```
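When data is split, a common convention is to fit the scaler on the training set only and reuse its statistics for the test set. This keeps test information from leaking into training. The sketch below, using toy values, shows this for `StandardScaler` and `MinMaxScaler`. Note that a test value outside the training range maps outside [0, 1] with `MinMaxScaler`.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
X_test = np.array([[2.5], [6.0]])  # toy values; 6.0 lies outside the training range

scaler = StandardScaler().fit(X_train)  # mean/std learned from training data only
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)   # reuse the training statistics

minmax = MinMaxScaler().fit(X_train)    # maps the training min/max to 0/1
X_test_mm = minmax.transform(X_test)
print(scaler.mean_, X_test_std.ravel(), X_test_mm.ravel())
```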

---

### 25. **What is sklearn.preprocessing?** *(Repeated)*  
See Q9.

---

### 26. **How do we split data for model fitting (training and testing) in Python?** *(Repeated)*  
See Q11.

---

### 27. **Explain data encoding.**  
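**Data encoding** is the conversion of categorical data into numerical form.  
Types include:
- **Label Encoding** – assigns each category an integer  
- **One-Hot Encoding** – creates a binary column for each category  
- **Ordinal Encoding** – assigns integers that follow a meaningful category order (e.g., low < medium < high)

Encoding is needed because most machine learning algorithms work only with numeric input.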
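Here are two common ways to one-hot encode a toy `color` column, assuming pandas and scikit-learn:

- `pd.get_dummies` is convenient for quick exploration.
- A fitted `OneHotEncoder` remembers the categories seen during training, which matters when transforming test or production data. With `handle_unknown="ignore"`, an unseen category becomes an all-zero row.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})  # toy data

# pandas: one binary column per category
dummies = pd.get_dummies(df, columns=["color"], dtype=int)
print(dummies.columns.tolist())  # ['color_blue', 'color_green', 'color_red']

# scikit-learn: the fitted encoder can be reused on new data
encoder = OneHotEncoder(handle_unknown="ignore")
encoded = encoder.fit_transform(df[["color"]]).toarray()  # output is sparse by default
print(encoder.categories_)
unseen = encoder.transform(pd.DataFrame({"color": ["purple"]})).toarray()
print(unseen)  # unseen category -> all zeros
```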

---

