# Feature Engineering

### 1. What is a parameter?
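A parameter is a value that describes a characteristic of a population, such as a population mean or standard deviation. It is fixed but typically unknown, so it is estimated from sample data. In machine learning, the term also refers to the values a model learns from data, such as the weights of a linear model.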

---

### 2. What is correlation?
Correlation measures the strength and direction of a linear relationship between two variables. It ranges from -1 (perfect negative) to +1 (perfect positive), with 0 indicating no linear relationship.

---

### 3. What does negative correlation mean?
Negative correlation means that as one variable increases, the other decreases. For example, if time spent exercising increases and weight decreases, the two are negatively correlated.

---

### 4. Define Machine Learning. What are the main components in Machine Learning?
Machine Learning is a field of computer science where algorithms learn patterns from data to make predictions or decisions without being explicitly programmed.  
**Main components:**  
- Data  
- Model/Algorithm  
- Loss Function  
- Optimizer  
- Evaluation Metric

---

### 5. How does loss value help in determining whether the model is good or not?
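The loss value quantifies the difference between the model's predictions and the actual values. During training, the optimizer uses it as the quantity to minimize, so a falling loss shows the model is fitting the data better. A low training loss alone does not prove the model is good: the loss on held-out data (validation or test) should also be low, otherwise the model may be overfitting.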
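
As a rough illustration, the sketch below computes mean squared error for two sets of predictions. Mean squared error is only one common loss and is chosen here as an assumption, since the answer above does not name a specific one. All values are made up for the example.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up values, for illustration only
y_true = [3.0, 5.0, 7.0]
close_preds = [2.5, 5.0, 7.5]  # predictions near the actual values
far_preds = [1.0, 8.0, 4.0]    # predictions far from the actual values

loss_close = mean_squared_error(y_true, close_preds)
loss_far = mean_squared_error(y_true, far_preds)
print(loss_close, loss_far)
```

The predictions that sit closer to the actual values produce the smaller loss, which is the comparison the loss value makes possible.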

---

### 6. What are continuous and categorical variables?
- **Continuous variables**: Numeric variables that can take any value within a range (e.g., height, weight).  
- **Categorical variables**: Variables that represent categories or groups (e.g., gender, color).

---

### 7. How do we handle categorical variables in Machine Learning? What are the common techniques?
Categorical variables are converted into numerical form using encoding techniques:  
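- **Label Encoding**: Assigns an arbitrary integer to each category. The integers carry no meaningful order.
- **One-Hot Encoding**: Creates one binary column per category.
- **Ordinal Encoding**: Maps categories that have a natural order (e.g., low < medium < high) to integers that preserve that order.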
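
The three techniques can be sketched as follows. The DataFrame, its column names, and the chosen category order are hypothetical and exist only for this illustration; `pd.get_dummies` is used for one-hot encoding as one of several possible options.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Hypothetical toy data, for illustration only
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],
    "size": ["small", "large", "medium", "small"],
})

# Label encoding: one integer per category (classes are sorted alphabetically)
label_codes = LabelEncoder().fit_transform(df["color"])

# One-hot encoding: one binary column per category
one_hot = pd.get_dummies(df["color"], prefix="color")

# Ordinal encoding: integers that follow an explicitly stated category order
ordinal = OrdinalEncoder(categories=[["small", "medium", "large"]])
size_codes = ordinal.fit_transform(df[["size"]])

print(label_codes)
print(one_hot)
print(size_codes.ravel())
```

Note that scikit-learn documents `LabelEncoder` as intended for target labels; for input features, `OrdinalEncoder` or `OneHotEncoder` is usually the better fit.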

---

### 8. What do you mean by training and testing a dataset?
- **Training set**: Used to train the machine learning model.  
- **Testing set**: Used to evaluate the model’s performance on unseen data.

---

### 9. What is `sklearn.preprocessing`?
`sklearn.preprocessing` is a module in scikit-learn that contains tools for scaling, encoding, normalizing, and transforming data to prepare it for machine learning models.

---

### 10. What is a Test set?
A test set is a portion of the dataset that is not used in training but is used to evaluate how well the model generalizes to new, unseen data.

---

### 11. How do we split data for model fitting (training and testing) in Python?
Using `train_test_split` from scikit-learn:
```python
from sklearn.model_selection import train_test_split
# X (features) and y (target) are assumed to exist; 20% of the rows are held out for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
---

### 12. How do you approach a Machine Learning problem?
- Understand the problem and collect data

- Clean and preprocess the data

- Perform Exploratory Data Analysis (EDA)

- Split data into training and testing sets

- Select and train a model

- Evaluate the model

- Tune hyperparameters

- Deploy the model if needed
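
The steps above can be sketched end to end as follows. This is a minimal outline, not a complete project: the built-in iris dataset stands in for real data, and the choice of `LogisticRegression`, `StandardScaler`, and accuracy as the metric are assumptions made for the example. EDA, hyperparameter tuning, and deployment are left out.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# 1. Collect data (a built-in toy dataset stands in for real data here)
X, y = load_iris(return_X_y=True)

# 2. Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# 3. Preprocess: fit the scaler on the training data only
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# 4. Select and train a model
model = LogisticRegression(max_iter=200)
model.fit(X_train_scaled, y_train)

# 5. Evaluate on unseen data
accuracy = accuracy_score(y_test, model.predict(X_test_scaled))
print(f"Test accuracy: {accuracy:.2f}")
```

In this sketch the scaler is fitted after the split, so that no statistics from the test set influence preprocessing.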

---

### 13. Why do we have to perform EDA before fitting a model to the data?
EDA helps understand the dataset, detect anomalies, spot patterns, visualize relationships, and make decisions about feature selection and preprocessing.

---

### 14. What is correlation?
Correlation measures the degree of linear association between two variables. Values close to +1 or -1 indicate strong relationships, while values near 0 suggest weak or no relationship.

---

### 15. What does negative correlation mean?
Negative correlation means that as one variable increases, the other tends to decrease. For example, outdoor temperature and heating bills are often negatively correlated: as the temperature rises, heating costs fall.

---

### 16. How can you find correlation between variables in Python?
You can use the `.corr()` method in pandas:

```python
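import pandas as pd

# `data` is assumed to be an existing pandas DataFrame with numeric columns
data.corr()  # pairwise Pearson correlation matrix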
```
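
A self-contained version might look like the sketch below. The DataFrame and its column names are hypothetical and the values are made up. It shows both the full correlation matrix and the correlation between one pair of columns, one positively and one negatively related.

```python
import pandas as pd

# Hypothetical toy data, for illustration only
data = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5],
    "exam_score": [52, 58, 65, 70, 78],
    "hours_gaming": [9, 7, 6, 4, 2],
})

corr_matrix = data.corr()  # Pearson correlation by default
print(corr_matrix.round(2))

# Correlation between a single pair of columns (strongly negative here)
r = data["hours_studied"].corr(data["hours_gaming"])
print(round(r, 2))
```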
---

### 17. What is causation? Explain the difference between correlation and causation with an example.
- **Causation** means that a change in one variable directly produces a change in another.
- **Correlation** only shows that two variables move together; it does not show why.

**Example:**

- Correlation: Ice cream sales and drowning incidents rise and fall together.
- Causation: Hot weather causes both more ice cream consumption and more swimming, and more swimming leads to more drownings. Ice cream does not cause drownings.
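
The ice cream example can be reproduced with synthetic data. In the sketch below, every number (the coefficients, noise levels, and sample size) is invented for illustration: temperature drives both quantities, and neither one influences the other. Removing the linear effect of temperature is used here as a simple way to check what correlation remains.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: temperature drives both quantities; neither affects the other
temperature = rng.uniform(10, 35, size=500)
ice_cream_sales = 2.0 * temperature + rng.normal(0, 5, size=500)
drownings = 0.3 * temperature + rng.normal(0, 2, size=500)

# The two outcomes are correlated even though neither causes the other
raw_r = np.corrcoef(ice_cream_sales, drownings)[0, 1]

def residuals(y, x):
    """Remove the linear effect of x from y."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

# After removing the effect of temperature, little correlation remains
partial_r = np.corrcoef(
    residuals(ice_cream_sales, temperature),
    residuals(drownings, temperature),
)[0, 1]
print(f"raw: {raw_r:.2f}, controlling for temperature: {partial_r:.2f}")
```

The raw correlation is clearly positive, while the correlation left after accounting for temperature is close to zero, which is what a shared cause would produce.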

---

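### 18. What is an Optimizer? What are different types of optimizers? Explain each with an example.
An optimizer is an algorithm that adjusts the model parameters to minimize the loss function.

Common optimizers:

- SGD (Stochastic Gradient Descent): Updates the parameters using the gradient computed on a single example or a small batch, scaled by a learning rate.

- Adam (Adaptive Moment Estimation): Keeps running averages of the gradients and of their squares, and uses them to adapt the step size for each parameter.

- RMSprop (Root Mean Square Propagation): Divides each update by the root of a running average of recent squared gradients, so parameters with large gradients take smaller steps.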

Example in TensorFlow:

```python
import tensorflow as tf
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
```
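
To show what an optimizer does at each step, here is a minimal sketch of plain gradient descent on a one-parameter toy loss. It is not stochastic and is not how TensorFlow implements its optimizers; the loss function, starting value, learning rate, and step count are all arbitrary choices made for the illustration.

```python
def loss(w):
    """Toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2

def gradient(w):
    """Derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)

w = 0.0              # initial parameter value
learning_rate = 0.1  # step size, chosen arbitrarily for this sketch

for step in range(100):
    w -= learning_rate * gradient(w)  # move against the gradient

print(round(w, 4), round(loss(w), 6))
```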
---

### 19. What is `sklearn.linear_model`?
`sklearn.linear_model` is a module in scikit-learn that provides linear models like:

- Linear Regression

- Logistic Regression

- Ridge and Lasso regression

---

### 20. What does `model.fit()` do? What arguments must be given?
`model.fit()` trains the model on the provided dataset.

Arguments (for a supervised scikit-learn estimator):

- Features (X)

- Target variable (y)

Example:

```python
model.fit(X_train, y_train)
```
---

### 21. What does `model.predict()` do? What arguments must be given?
`model.predict()` uses the trained model to make predictions on new data.

Arguments:

- New input data (e.g., X_test)

Example:

```python
predictions = model.predict(X_test)
```
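
Putting `fit()` and `predict()` together, a minimal runnable sketch could look like this. `LinearRegression` is chosen only as an example estimator, and the data is made up so that it follows y = 2x + 1 exactly, which makes the learned coefficients easy to check.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data following y = 2x + 1 exactly
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])  # shape (n_samples, n_features)
y_train = np.array([3.0, 5.0, 7.0, 9.0])

model = LinearRegression()
model.fit(X_train, y_train)         # learn coefficients from the training data

X_new = np.array([[5.0], [6.0]])
predictions = model.predict(X_new)  # apply the learned coefficients to new inputs
print(model.coef_, model.intercept_, predictions)
```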
---

### 22. What are continuous and categorical variables?
- Continuous: Quantitative, measurable values (e.g., temperature).

- Categorical: Qualitative values or groups (e.g., country, color).

---

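### 23. What is feature scaling? How does it help in Machine Learning?
Feature scaling brings features onto a comparable range, typically by standardizing them (zero mean, unit variance) or normalizing them (for example, to the range [0, 1]). It helps distance-based algorithms such as KNN and SVM, where features with large ranges would otherwise dominate, and it helps models trained with gradient descent converge faster.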

---

### 24. How do we perform scaling in Python?
Using `StandardScaler` or `MinMaxScaler` from `sklearn.preprocessing`:

```python
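from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# X_train and X_test are assumed to come from an earlier train/test split.
# Fit on the training data only, then reuse the same statistics for the test data.
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)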
```
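
To compare the two scalers side by side, the sketch below applies both to a small made-up array whose columns sit on very different scales. The values are invented for illustration. `StandardScaler` should give each column mean 0 and standard deviation 1, while `MinMaxScaler` should map each column onto the range [0, 1].

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Made-up feature values on very different scales
X = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])

standardized = StandardScaler().fit_transform(X)  # each column: mean 0, std 1
normalized = MinMaxScaler().fit_transform(X)      # each column: range [0, 1]

print(standardized.round(3))
print(normalized)
```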
---

### 25. What is `sklearn.preprocessing`?
It is a scikit-learn module that provides tools to prepare data for ML models by handling scaling, normalization, encoding, and similar transformations.

---

### 26. How do we split data for model fitting (training and testing) in Python?
Using `train_test_split` from `sklearn.model_selection`:

```python
from sklearn.model_selection import train_test_split
# X (features) and y (target) are assumed to exist; 30% of the rows are held out for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
```
---

### 27. Explain data encoding?
Data encoding transforms categorical data into numerical format so that ML algorithms can process it. Common techniques:

- Label Encoding

- One-Hot Encoding

- Ordinal Encoding

---
