
# FEATURE ENGINEERING ASSIGNMENT


### 1. What is a parameter?  
A **parameter** is a value internal to a model that is learned from data. In machine learning, parameters are the variables that the learning algorithm adjusts during training to minimize the error, such as the weights and biases of a neural network. They differ from hyperparameters (e.g., the learning rate), which are set before training begins.  
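
As a rough illustration, the sketch below fits a linear regression on toy data and reads back the learned parameters. The data (generated from `y = 3x + 2`) and the variable names are assumptions made up for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data generated from y = 3x + 2 (illustrative values only)
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 3 * X.ravel() + 2

model = LinearRegression()
model.fit(X, y)

# Learned parameters: one weight per feature, plus a bias (intercept)
weight = model.coef_[0]
bias = model.intercept_
print(weight, bias)
```

The fitted weight and bias should land very close to the 3 and 2 used to generate the data.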



### 2. What is correlation?  
**Correlation** is a statistical measure of the strength and direction of the relationship between two variables. It indicates how one variable changes in relation to another and can be positive, negative, or zero. The Pearson correlation coefficient ranges from -1 to +1.  

#### What does negative correlation mean?  
A **negative correlation** means that as one variable increases, the other tends to decrease. For example, exercise duration is often negatively correlated with body weight.  
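
To make the sign of a correlation concrete, here is a minimal sketch that computes the Pearson coefficient by hand. The helper `pearson` and the sample values are hypothetical, chosen only so that one pair rises together and the other moves in opposite directions.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Made-up illustrative values, not real measurements
hours = [1, 2, 3, 4, 5]
up = [2, 4, 6, 8, 10]    # rises as hours rise
down = [10, 8, 6, 4, 2]  # falls as hours rise

r_pos = pearson(hours, up)    # positive correlation
r_neg = pearson(hours, down)  # negative correlation
print(r_pos, r_neg)
```

Because both toy relationships are perfectly linear, the coefficients come out at the extremes of the range; real data usually falls somewhere in between.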


### 3. Define Machine Learning. What are the main components in Machine Learning?  
**Machine Learning (ML)** is a branch of artificial intelligence that enables systems to learn from data and make predictions without being explicitly programmed.  

#### Main components of ML:  
1. **Dataset** – Collection of structured or unstructured data.  
2. **Features** – Input variables used to make predictions.  
3. **Model** – Algorithm that learns from data.  
4. **Loss Function** – Measures how far the model's predictions are from the actual values.  
5. **Optimizer** – Adjusts parameters to minimize loss.  
6. **Training Process** – Model learns from data.  



### 4. How does loss value help in determining whether the model is good or not?  
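The **loss value** quantifies how far the predicted values are from the actual values, so a lower loss on a given dataset indicates a better fit. Loss should be compared on both training and validation data: a high training loss suggests underfitting, while a low training loss paired with a much higher validation loss suggests overfitting.  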
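
As one possible illustration, the sketch below uses mean squared error (one common loss among many) to compare two sets of predictions. The function and the numbers are assumptions for this example.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Illustrative targets and two hypothetical sets of predictions
y_true = [3.0, 5.0, 7.0]
close_preds = [2.5, 5.0, 7.5]  # near the targets
far_preds = [0.0, 9.0, 2.0]    # far from the targets

loss_close = mean_squared_error(y_true, close_preds)
loss_far = mean_squared_error(y_true, far_preds)
print(loss_close, loss_far)
```

The predictions that sit closer to the targets produce the smaller loss, which is the sense in which a lower loss signals a better fit.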



### 5. What are continuous and categorical variables?  
- **Continuous variables** are numeric and can take any value within a range (e.g., height, weight, temperature).  
- **Categorical variables** take one of a limited set of labels or categories (e.g., gender, country, product type).  



### 6. How do we handle categorical variables in Machine Learning? What are the common techniques?  
Categorical variables must be converted into numerical values for ML models. Common techniques include:  
1. **Label Encoding** – Assigns unique numbers to categories.  
2. **One-Hot Encoding** – Creates binary columns for each category.  
3. **Ordinal Encoding** – Assigns ordered numbers based on category ranking.  
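
The three techniques above can be sketched with pandas as follows. The column names, category values, and the `size_order` ranking are hypothetical and exist only for this example.

```python
import pandas as pd

# Hypothetical data with one unordered and one ordered categorical column
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],
    "size": ["small", "large", "medium", "small"],
})

# Label encoding: each category gets an integer code (alphabetical order here)
df["color_label"] = df["color"].astype("category").cat.codes

# Ordinal encoding: the ranking small < medium < large is supplied explicitly
size_order = {"small": 0, "medium": 1, "large": 2}
df["size_encoded"] = df["size"].map(size_order)

# One-hot encoding: one binary column per color
one_hot = pd.get_dummies(df["color"], prefix="color")
print(df)
print(one_hot)
```

Label codes imply an order that may not exist in the data, so one-hot encoding is often the safer choice for unordered categories.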



### 7. What do you mean by training and testing a dataset?  
- **Training dataset**: Used to train the model.  
- **Testing dataset**: Used to evaluate the model's performance on unseen data.  



### 8. What is sklearn.preprocessing?  
`sklearn.preprocessing` is a module in Scikit-Learn that provides functions for feature scaling, normalization, encoding categorical variables, and other data transformations.  


### 9. What is a Test set?  
A **test set** is a separate dataset used to evaluate the final model's performance after training.  



### 10. How do we split data for model fitting (training and testing) in Python?  
We use `train_test_split` from `sklearn.model_selection`:  

```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
This splits the dataset into 80% training and 20% testing.  
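
For a self-contained version, the sketch below builds a small synthetic `X` and `y` (assumed here purely for illustration) and checks the resulting shapes.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 10 samples with 2 features each
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Hold out 20% of the rows for testing; random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```

With 10 samples and `test_size=0.2`, 8 rows go to training and 2 to testing.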



### 11. Why do we have to perform EDA before fitting a model to the data?  
**Exploratory Data Analysis (EDA)** helps understand data distribution, missing values, outliers, and correlations. This ensures data is properly preprocessed for training, improving model accuracy.  



### 12. What is correlation?  
(Repeated – Answered in Q2)  

### 13. What does negative correlation mean?  
(Repeated – Answered in Q2)  


### 14. How can you find correlation between variables in Python?  
Using **Pandas** and **Seaborn**:  

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

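# Load the dataset
df = pd.read_csv("data.csv")
# Pearson correlation between numeric columns (non-numeric columns are skipped)
correlation_matrix = df.corr(numeric_only=True)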

sns.heatmap(correlation_matrix, annot=True, cmap="coolwarm")
plt.show()
```



### 15. What is causation? Explain difference between correlation and causation with an example.  
**Causation** means that a change in one variable directly produces a change in another. **Correlation** means two variables move together, but one does not necessarily cause the other.  

**Example:**  
- **Correlation**: Ice cream sales and drowning incidents increase together because both rise in hot summer weather, not because one causes the other.  
- **Causation**: Smoking causes lung cancer.  



### 16. What is an Optimizer? What are different types of optimizers? Explain each with an example.  
An **optimizer** updates the model's parameters to minimize loss.  

**Types of Optimizers:**  
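1. **Gradient Descent** – Computes the gradient of the loss over the entire training set and steps the parameters in the opposite direction.  
2. **SGD (Stochastic Gradient Descent)** – Updates weights using the gradient from a single sample or a small mini-batch, making each update cheaper but noisier.  
3. **Adam** – Adapts the learning rate for each parameter using running averages of the gradient and its square.  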

Example:  
```python
from tensorflow.keras.optimizers import Adam

optimizer = Adam(learning_rate=0.001)
```
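
To show what an optimizer does underneath, here is a minimal sketch of plain gradient descent on a one-parameter toy loss, `(w - 3)^2`. The loss, starting point, and learning rate are assumptions chosen for illustration, not part of any library.

```python
def gradient(w):
    """Derivative of the toy loss (w - 3)**2 with respect to w."""
    return 2 * (w - 3)

w = 0.0              # initial parameter value
learning_rate = 0.1  # step size

for _ in range(100):
    # Step against the gradient to reduce the loss
    w -= learning_rate * gradient(w)

print(w)
```

Each step shrinks the distance to the minimum at `w = 3`, so the parameter settles very close to it after enough iterations.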



### 17. What is sklearn.linear_model?  
`sklearn.linear_model` is a module in Scikit-Learn containing linear models like Linear Regression, Logistic Regression, and Ridge Regression.  



### 18. What does model.fit() do? What arguments must be given?  
`model.fit()` trains the model using the training data.  
**Arguments:**  
- `X_train`: feature matrix of shape `(n_samples, n_features)`  
- `y_train`: target values (omitted for unsupervised estimators such as `KMeans`)  

Example:  
```python
model.fit(X_train, y_train)
```



### 19. What does model.predict() do? What arguments must be given?  
`model.predict()` makes predictions using a trained model.  
**Arguments:**  
- `X_test` (input features for prediction)  

Example:  
```python
y_pred = model.predict(X_test)
```
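
Putting `fit()` and `predict()` together, a minimal end-to-end sketch might look like this. The synthetic data (generated from `y = 2x + 1`) and the choice of `LinearRegression` are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data following y = 2x + 1
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2 * X.ravel() + 1

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)     # learn parameters from the training data
y_pred = model.predict(X_test)  # predict targets for unseen inputs
print(y_pred)
```

Since the toy data is exactly linear, the predictions should match `y_test` up to floating-point error.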



### 20. What are continuous and categorical variables?  
(Repeated – Answered in Q5)  



### 21. What is feature scaling? How does it help in Machine Learning?  
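**Feature scaling** transforms numerical features to a comparable range or distribution, for example zero mean and unit variance, or the interval [0, 1]. This prevents features with large magnitudes from dominating distance-based models such as k-NN and SVM, and it often helps gradient-based training converge faster.  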
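
The two most common scaling formulas can be sketched by hand as follows. The sample values are made up for illustration.

```python
from statistics import mean, pstdev

# Illustrative feature values
values = [10.0, 20.0, 30.0, 40.0, 50.0]

# Standardization (z-score): subtract the mean, divide by the standard deviation
mu, sigma = mean(values), pstdev(values)
standardized = [(v - mu) / sigma for v in values]

# Min-max scaling: map the smallest value to 0 and the largest to 1
lo, hi = min(values), max(values)
min_max = [(v - lo) / (hi - lo) for v in values]

print(standardized)
print(min_max)
```

The population standard deviation (`pstdev`) is used here because `StandardScaler` also divides by the population, not the sample, standard deviation.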



### 22. How do we perform scaling in Python?  
Using **StandardScaler** or **MinMaxScaler**:  

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
# Learn each feature's mean and standard deviation, then rescale to zero mean and unit variance
X_scaled = scaler.fit_transform(X)
```
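
When the data is split into training and test sets, a common practice is to fit the scaler on the training set only and reuse it on the test set, so no information from the test data leaks into preprocessing. The sketch below shows this with `MinMaxScaler`; the arrays are made-up values for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Illustrative single-feature data
X_train = np.array([[1.0], [5.0], [9.0]])
X_test = np.array([[3.0], [11.0]])

scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns min=1 and max=9 from training data
X_test_scaled = scaler.transform(X_test)        # reuses the training min and max

print(X_train_scaled.ravel())
print(X_test_scaled.ravel())
```

Test values outside the training range can fall outside [0, 1], which is expected when the scaler is fitted on training data alone.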



### 23. What is sklearn.preprocessing?  
(Repeated – Answered in Q8)  



### 24. How do we split data for model fitting (training and testing) in Python?  
(Repeated – Answered in Q10)  



### 25. Explain data encoding?  
**Data encoding** converts categorical data into numerical form.  
Types:  
1. **Label Encoding** – Converts categories to integers.  
2. **One-Hot Encoding** – Creates binary columns for each category.  

Example:  
```python
from sklearn.preprocessing import OneHotEncoder

encoder = OneHotEncoder()
# Returns a SciPy sparse matrix by default; call .toarray() for a dense array
X_encoded = encoder.fit_transform(X)
```
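
A self-contained sketch of both encoders on a small, hypothetical `colors` array is shown below. Note that scikit-learn intends `LabelEncoder` for target labels; it is used here on a single column only to illustrate the integer mapping.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical categorical values
colors = np.array(["red", "green", "blue", "green"])

# Label encoding: classes are sorted, so blue=0, green=1, red=2
label_encoder = LabelEncoder()
labels = label_encoder.fit_transform(colors)

# One-hot encoding expects a 2D input of shape (n_samples, n_features)
one_hot_encoder = OneHotEncoder()
one_hot = one_hot_encoder.fit_transform(colors.reshape(-1, 1)).toarray()

print(labels)
print(one_hot_encoder.categories_[0])
print(one_hot)
```

Each row of `one_hot` contains a single 1 in the column matching that row's category.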
