# Feature Engineering Assignment


### Q1. What is a parameter?
A parameter is an internal variable of the model whose value is learned from the data during training (unlike a hyperparameter, such as the learning rate, which is set before training).  
For example, in Linear Regression, the slope (weight) and intercept (bias) are parameters.
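
As a minimal sketch of this idea, the snippet below fits Scikit-learn's `LinearRegression` on a few made-up points that lie exactly on the line y = 3x + 2 and reads back the learned slope and intercept. The data is an assumption chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data (assumed for illustration): points on the line y = 3x + 2
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 3 * X.ravel() + 2

model = LinearRegression().fit(X, y)

# The learned parameters are stored on the fitted model
slope = model.coef_[0]
intercept = model.intercept_
print(slope, intercept)  # approximately 3.0 and 2.0
```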

---

### Q2. What is correlation?
Correlation measures the strength and direction of a linear relationship between two variables.  
It ranges from **-1 to +1**:
- **+1** → Perfect positive correlation  
- **-1** → Perfect negative correlation  
- **0** → No linear correlation
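
The three reference values can be reproduced with NumPy's `np.corrcoef`. This is a small sketch on made-up data; the last case shows that a coefficient of 0 rules out only a linear relationship, since `y = x**2` depends on `x` completely.

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

r_pos = np.corrcoef(x, 2 * x + 1)[0, 1]   # perfectly increasing line
r_neg = np.corrcoef(x, -3 * x)[0, 1]      # perfectly decreasing line
r_zero = np.corrcoef(x, x ** 2)[0, 1]     # strong but non-linear dependence

print(r_pos, r_neg, r_zero)  # approximately 1, -1 and 0
```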

---

### Q3. What does negative correlation mean?
A negative correlation means that when one variable increases, the other decreases.  
Example: As the price of a product increases, its demand often decreases.

---

### Q4. Define Machine Learning. What are the main components in Machine Learning?
Machine Learning (ML) is the process of teaching a computer system to make predictions or decisions from data without being explicitly programmed.  
**Main components:**
1. **Dataset** – Input data used for learning.  
2. **Model** – Mathematical representation of relationships in data.  
3. **Loss function** – Measures how far predictions are from true values.  
4. **Optimizer** – Adjusts model parameters to minimize loss.  
5. **Training and Testing** – The model is fitted on the training data and then evaluated on held-out test data.
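
The five components above can be seen together in a short NumPy sketch. It is not how a library implements training; the synthetic data (points on y = 2x + 1), the learning rate, and the number of steps are all assumptions picked for illustration.

```python
import numpy as np

# Dataset: synthetic points on the line y = 2x + 1 (assumed for illustration)
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 2 * x + 1

# Training and testing: fit on the first 80 points, hold out the last 20
x_train, y_train = x[:80], y[:80]
x_test, y_test = x[80:], y[80:]

# Model: a straight line with two learnable parameters
w, b = 0.0, 0.0

def predict(x, w, b):
    return w * x + b

# Loss function: mean squared error
def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

# Optimizer: plain gradient descent on the training loss
learning_rate = 0.1
for _ in range(500):
    error = predict(x_train, w, b) - y_train
    w -= learning_rate * 2 * np.mean(error * x_train)
    b -= learning_rate * 2 * np.mean(error)

test_loss = mse(y_test, predict(x_test, w, b))
print(w, b, test_loss)  # w close to 2, b close to 1, loss close to 0
```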

---

### Q5. How does loss value help in determining whether the model is good or not?
A loss value quantifies how far the model’s predictions are from actual values.  
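- A low loss value indicates that the predictions are close to the actual values.  
- A high loss value means the model needs improvement (e.g., more training or tuning).  
- A low loss on the training data together with a high loss on unseen data suggests overfitting, so the loss should also be checked on validation or test data.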
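
As a small worked example, the sketch below computes mean squared error (one common loss for regression) for two hypothetical sets of predictions. The numbers and the names `preds_a` and `preds_b` are invented for illustration.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 7.0, 9.0]
preds_a = [2.9, 5.1, 7.2, 8.8]   # hypothetical model A: close to the targets
preds_b = [1.0, 8.0, 4.0, 12.0]  # hypothetical model B: far from the targets

loss_a = mean_squared_error(y_true, preds_a)
loss_b = mean_squared_error(y_true, preds_b)
print(loss_a, loss_b)  # model A has the much smaller loss
```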

---

### Q6. What are continuous and categorical variables?
- **Continuous variables:** Numeric values that can take any value within a range (e.g., height, temperature).  
- **Categorical variables:** Represent categories or groups (e.g., gender, color, type).

---

### Q7. How do we handle categorical variables in Machine Learning? What are the common techniques?
We convert categories into numerical values using:
1. **Label Encoding** – Assigns a numeric value to each category.  
2. **One-Hot Encoding** – Creates binary columns (0/1) for each category.  
3. **Ordinal Encoding** – Used for ordered categories (e.g., low < medium < high).
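
The three techniques can be sketched with pandas and Scikit-learn as follows. The column names and values are made up for illustration, and the pandas helpers are just one convenient way to do label and one-hot encoding.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Toy data (assumed for illustration)
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],  # no natural order
    "size": ["low", "high", "medium", "low"],    # ordered categories
})

# Label encoding: one integer per category (alphabetical order here)
color_codes = df["color"].astype("category").cat.codes

# One-hot encoding: one 0/1 column per category
color_one_hot = pd.get_dummies(df["color"], dtype=int)

# Ordinal encoding: integers that respect the stated order low < medium < high
encoder = OrdinalEncoder(categories=[["low", "medium", "high"]])
size_codes = encoder.fit_transform(df[["size"]]).ravel()

print(color_codes.tolist())         # [2, 1, 0, 1]
print(list(color_one_hot.columns))  # ['blue', 'green', 'red']
print(size_codes.tolist())          # [0.0, 2.0, 1.0, 0.0]
```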

---

### Q8. What do you mean by training and testing a dataset?
- **Training set:** Used to teach the model and adjust its parameters.  
- **Testing set:** Used to evaluate model performance on unseen data.

---

### Q9. What is sklearn.preprocessing?
`sklearn.preprocessing` is a module in Scikit-learn that provides functions for:
- Scaling features (StandardScaler, MinMaxScaler)
- Encoding categorical data (LabelEncoder, OneHotEncoder)
- Normalization and transformation

---

### Q10. What is a Test set?
The test set is a subset of data not used in training. It is used to evaluate how well the model generalizes to new, unseen data.

---

### Q11. How do we split data for model fitting (training and testing) in Python?
We use Scikit-learn’s `train_test_split()` function:
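```python
from sklearn.model_selection import train_test_split

# X (features) and y (target) are assumed to be defined already.
# Hold out 20% of the rows for testing; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```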

---

### Q12. How do you approach a Machine Learning problem?
**Steps:**
1. Understand the problem statement  
2. Collect and clean the dataset  
3. Perform Exploratory Data Analysis (EDA)
4. Preprocess data (scaling, encoding, handling missing values)  
5. Select suitable algorithms  
6. Train and test the model  
7. Evaluate performance  
8. Optimize hyperparameters  
9. Deploy the model
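
Steps 4 to 8 can be sketched in a few lines of Scikit-learn. This is only an outline under assumptions: the data is synthetic (from `make_classification`), logistic regression is an arbitrary choice of algorithm, and the grid of `C` values is illustrative. Problem understanding, EDA, and deployment are not covered here.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for a collected and cleaned dataset (synthetic, assumed)
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

# Hold out a test set before any fitting
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Preprocessing and the chosen algorithm combined in one pipeline
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hyperparameter optimization with 3-fold cross-validation on the training set
search = GridSearchCV(pipeline, {"clf__C": [0.1, 1.0, 10.0]}, cv=3)
search.fit(X_train, y_train)

# Evaluation on the unseen test set
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, test_accuracy)
```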

---

### Q13. Why do we have to perform EDA before fitting a model to the data?
- To understand data distribution and patterns  
- To detect missing values, outliers, and errors  
- To identify relationships and correlations  
- To decide preprocessing and model strategy  
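
A few pandas calls cover most of these checks. The sketch below uses a small made-up table with one missing value and one planted outlier; the 1.5 × IQR rule is one common convention for flagging outliers, not the only one.

```python
import numpy as np
import pandas as pd

# Toy data (assumed): one missing age and one extreme income
df = pd.DataFrame({
    "age": [23, 25, 31, 35, np.nan, 29, 27],
    "income": [32, 35, 40, 42, 38, 36, 400],
})

# Distribution summary: count, mean, std, min, quartiles, max
print(df.describe())

# Missing values per column
missing = df.isna().sum()

# Outliers by the 1.5 * IQR rule
q1 = df["income"].quantile(0.25)
q3 = df["income"].quantile(0.75)
iqr = q3 - q1
is_outlier = (df["income"] < q1 - 1.5 * iqr) | (df["income"] > q3 + 1.5 * iqr)
outliers = df[is_outlier]

print(missing)
print(outliers)
```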

---

### Q14. What is correlation?
Correlation shows how strongly two variables are related.  
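It ranges from **-1 to +1**.  
```python
import pandas as pd

# Toy data (assumed for illustration); replace with your own DataFrame
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [2, 4, 5, 9]})
print(df.corr())  # pairwise Pearson correlation matrix
```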

---

### Q15. What does negative correlation mean?
A negative correlation means that when one variable increases, the other decreases.  
It shows an inverse relationship between two variables.  

**Example:**  
As the temperature increases, heater usage decreases.

---

### Q16. How can you find correlation between variables in Python?
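```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# df is assumed to be an existing pandas DataFrame.
# numeric_only=True (pandas 1.5+) skips non-numeric columns.
corr = df.corr(numeric_only=True)

# Heatmap of the correlation matrix, with each coefficient printed in its cell
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.show()
```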

---

### Q17. What is causation? Explain the difference between correlation and causation with an example.
- **Correlation:** Describes how two variables move together, but one may not cause the other.  
- **Causation:** One variable directly affects or causes a change in another variable.  

**Example:**  
There is a correlation between ice cream sales and drowning cases,  
but hot weather causes both — not ice cream consumption.
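
The ice cream example can be simulated to make the point concrete. In the sketch below all numbers are invented: temperature drives both series, and neither series influences the other. The raw correlation is high, but it largely disappears once the effect of temperature is removed from both.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hidden common cause (confounder): daily temperature in degrees Celsius
temperature = rng.uniform(10, 35, size=n)

# Both quantities depend on temperature, but not on each other
ice_cream_sales = 5.0 * temperature + rng.normal(0, 10, size=n)
drownings = 0.3 * temperature + rng.normal(0, 1, size=n)

raw_corr = np.corrcoef(ice_cream_sales, drownings)[0, 1]

def residuals(y, x):
    """Part of y that a straight-line fit on x does not explain."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

# Correlation after removing the effect of temperature from both series
adjusted_corr = np.corrcoef(
    residuals(ice_cream_sales, temperature),
    residuals(drownings, temperature),
)[0, 1]

print(raw_corr, adjusted_corr)  # strongly positive versus close to zero
```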

---

### Q18. What is an Optimizer? What are different types of optimizers? Explain each with an example.
An optimizer is an algorithm that adjusts the model’s weights to minimize the loss function and improve model accuracy.  

**Types of Optimizers:**
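1. **Gradient Descent:** Updates weights in the direction opposite to the gradient (slope) of the loss, computed over the whole training set.  
2. **SGD (Stochastic Gradient Descent):** Updates weights after each training example (or small mini-batch) instead of the whole dataset.  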
3. **RMSProp:** Adapts learning rates based on the moving average of squared gradients.  
4. **Adam:** Combines momentum and adaptive learning rate for efficient convergence.  
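
To show how the update rules differ, here is a plain-Python sketch of gradient descent, RMSProp, and Adam minimizing the toy loss f(w) = (w - 3)². The loss, the hyperparameter values, and the function names are assumptions for illustration. Library implementations handle many weights at once and add further options.

```python
import math

def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2, which is minimized at w = 3."""
    return 2.0 * (w - 3.0)

def gradient_descent(w, lr=0.1, steps=200):
    for _ in range(steps):
        w -= lr * grad(w)  # step against the gradient
    return w

def rmsprop(w, lr=0.01, rho=0.9, eps=1e-8, steps=1000):
    v = 0.0  # moving average of squared gradients
    for _ in range(steps):
        g = grad(w)
        v = rho * v + (1 - rho) * g * g
        w -= lr * g / (math.sqrt(v) + eps)  # step scaled by recent gradient size
    return w

def adam(w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    m, v = 0.0, 0.0  # moving averages of the gradient and its square
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g      # momentum term
        v = beta2 * v + (1 - beta2) * g * g  # adaptive scaling term
        m_hat = m / (1 - beta1 ** t)         # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

w_gd, w_rms, w_adam = gradient_descent(0.0), rmsprop(0.0), adam(0.0)
print(w_gd, w_rms, w_adam)  # all three end near the minimum at w = 3
```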

```python
from tensorflow.keras.optimizers import Adam

# Create an Adam optimizer with a learning rate of 0.001
optimizer = Adam(learning_rate=0.001)
```

---

### Q19. What is sklearn.linear_model?
`sklearn.linear_model` is a module in Scikit-learn that provides linear models for both regression and classification tasks.  

**Common models include:**
- `LinearRegression()` – For predicting continuous values  
- `LogisticRegression()` – For classification problems  
- `Ridge()` and `Lasso()` – For regression with regularization  

```python
from sklearn.linear_model import LinearRegression

# Create an (unfitted) ordinary least squares regression model
model = LinearRegression()
```

---

### Q20. What does model.fit() do? What arguments must be given?
`model.fit(X_train, y_train)` is used to **train the model** using training data.  
It helps the algorithm learn the relationship between **input features (X_train)** and **target labels (y_train)**.  

**Arguments:**
- **X_train:** Input features used for training  
- **y_train:** Target variable or labels  

---

### Q21. What does model.predict() do? What arguments must be given?
`model.predict(X_test)` is used to **generate predictions** from the trained model on new or unseen data.  

**Arguments:**
- **X_test:** Input features for which predictions are required  
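
A minimal sketch of the `fit()` and `predict()` pair on a toy classification problem. The data is made up, and `LogisticRegression` is just one example estimator. Note that the feature arrays are two-dimensional (rows are samples, columns are features), while the labels are one-dimensional.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training data (assumed): one feature, two well-separated classes
X_train = np.array([[0.0], [1.0], [2.0], [3.0], [10.0], [11.0], [12.0], [13.0]])
y_train = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(X_train, y_train)     # learns parameters from features and labels

X_test = np.array([[1.5], [11.5]])
y_pred = model.predict(X_test)  # only features are passed; labels are returned
print(y_pred)  # [0 1]
```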

---

### Q22. What are continuous and categorical variables?
- **Continuous Variables:** Represent numeric data that can take any value within a range.  
  *Examples:* Age, Height, Salary  
- **Categorical Variables:** Represent data divided into groups or categories.  
  *Examples:* Gender, City, Color  

---

### Q23. What is feature scaling? How does it help in Machine Learning?
**Feature scaling** standardizes or normalizes the range of independent variables.  
It ensures that features with large numeric ranges don’t dominate features with smaller ranges.  

**Importance:**  
- Helps distance-based algorithms such as **KNN, SVM, and K-Means**, and models trained with **Gradient Descent**, perform better.  
- Improves model convergence speed.  
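
The effect is easy to see by hand with NumPy. In this sketch (toy numbers, assumed for illustration) the salary column dominates the raw Euclidean distance between two rows, while after scaling both columns contribute equally.

```python
import numpy as np

# Toy data (assumed): columns are age in years and salary in currency units
X = np.array([[25.0, 30000.0],
              [35.0, 60000.0],
              [45.0, 90000.0]])

# Standardization (z-score): zero mean and unit variance per column
X_standard = (X - X.mean(axis=0)) / X.std(axis=0)

# Min-max normalization: each column rescaled to the range [0, 1]
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Distance between the first two rows, before and after scaling
raw_distance = np.linalg.norm(X[0] - X[1])
scaled_distance = np.linalg.norm(X_minmax[0] - X_minmax[1])

print(raw_distance)     # about 30000, almost entirely due to salary
print(scaled_distance)  # about 0.71, with equal weight on both columns
```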

---

### Q24. How do we perform scaling in Python?
```python
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# X is assumed to be an existing numeric feature matrix.

# Standard Scaling (Z-score normalization)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Min-Max Scaling (range 0 to 1)
minmax = MinMaxScaler()
X_minmax = minmax.fit_transform(X)
```

---

### Q25. Explain data encoding.
Data encoding converts categorical data into numerical form so that machine learning models can process it.

**Common encoding techniques:**
- **Label Encoding:** Assigns a unique number to each category.  
- **One-Hot Encoding:** Creates separate binary columns for each category.
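
Both techniques are available in `sklearn.preprocessing`. The sketch below applies them to a made-up list of city names. Scikit-learn's documentation describes `LabelEncoder` as intended for target labels rather than input features, so its use on a feature here is purely illustrative.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Toy data (assumed for illustration)
cities = np.array(["paris", "london", "paris", "tokyo"])

# Label encoding: categories are sorted, then numbered from 0
labels = LabelEncoder().fit_transform(cities)

# One-hot encoding: expects a 2D input and returns a sparse matrix by default
one_hot = OneHotEncoder().fit_transform(cities.reshape(-1, 1)).toarray()

print(labels)   # [1 0 1 2]  (london=0, paris=1, tokyo=2)
print(one_hot)  # one row per sample, one column per city
```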