# Feature Engineering

## What is a parameter?
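A parameter is a setting that controls how features are constructed, encoded, or scaled, for example the number of bins used to discretize a column or the range a scaler maps values to. These settings play a crucial role in how well your machine learning model performs.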

## What is correlation?
Correlation is a statistical measure that tells us how two variables are related to each other. It shows whether an increase or decrease in one variable tends to go along with an increase or decrease in the other. The correlation coefficient ranges from -1 to 1.

## What does negative correlation mean?
Negative correlation means that when one variable increases, the other tends to decrease. For example, if the hours spent watching TV increase, exam marks might decrease. The stronger this inverse relationship, the closer the correlation coefficient is to -1.
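To make this concrete, here is a small sketch that computes the Pearson correlation coefficient with NumPy. The TV hours and marks are made-up numbers chosen for illustration, not real data.

```python
import numpy as np

# Hypothetical data: hours of TV per day and exam marks for six students
tv_hours = np.array([1, 2, 3, 4, 5, 6])
marks = np.array([90, 85, 70, 65, 50, 40])

# np.corrcoef returns a 2x2 correlation matrix; entry [0, 1] is the
# Pearson correlation between the two arrays
r = np.corrcoef(tv_hours, marks)[0, 1]
print(round(r, 3))
```

Because the marks fall almost steadily as the hours rise, `r` comes out close to -1.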

## Define Machine Learning. What are the main components in Machine Learning?
Machine Learning is the process of training machines on data so that they can make decisions or predictions without being explicitly programmed. The main components in machine learning are:
- Data: The input used to train the model.
- Model: The algorithm that learns patterns from the data.
- Loss function: Measures how well or how badly the model is performing.
- Optimizer: Adjusts the model to minimize the loss.
- Evaluation metrics: Used to evaluate model performance.
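The five components above can be sketched in a few lines of NumPy. This is only a toy illustration: the data, the learning rate, and the number of iterations are assumptions picked so that the example converges.

```python
import numpy as np

# Data: inputs x and targets y generated from y = 2x + 1 (hypothetical)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Model: a straight line with two learnable parameters, w and b
w, b = 0.0, 0.0

def predict(x, w, b):
    return w * x + b

# Loss function: mean squared error between targets and predictions
def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

# Optimizer: plain gradient descent with a fixed learning rate
learning_rate = 0.05
for _ in range(2000):
    error = predict(x, w, b) - y
    grad_w = 2 * np.mean(error * x)   # d(MSE)/dw
    grad_b = 2 * np.mean(error)       # d(MSE)/db
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

# Evaluation metric: mean absolute error of the fitted line
mae = np.mean(np.abs(y - predict(x, w, b)))
print(w, b, mse(y, predict(x, w, b)), mae)
```

After training, `w` and `b` end up very close to the true values 2 and 1.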

## How does loss value help in determining whether the model is good or not?
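The loss value tells us how far the predicted values are from the actual values. The lower the loss, the better the model fits the data it was measured on. If the loss is high, the model is making poor predictions and needs improvement. A low loss on the training data alone is not enough, so the loss should also be checked on data the model has not seen.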
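As a quick illustration, the sketch below compares the mean squared error of two sets of predictions against the same targets. All the numbers are hypothetical.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, 5.0, 7.0])

# Predictions from two hypothetical models
good_pred = np.array([2.9, 5.2, 6.8])   # close to the targets
bad_pred = np.array([1.0, 8.0, 4.0])    # far from the targets

good_loss = mean_squared_error(y_true, good_pred)
bad_loss = mean_squared_error(y_true, bad_pred)
print(good_loss, bad_loss)
```

The first model has the much smaller loss, which matches the intuition that its predictions are closer to the actual values.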

## What are continuous and categorical variables?
Continuous variables are numerical and can take any value in a range, such as age, height, or weight. Categorical variables represent categories or labels, such as gender, country, or color. They are usually non-numeric, although categories are sometimes stored as numeric codes.
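In pandas, a first rough split between the two kinds of columns can be made by dtype. The DataFrame below is a made-up example, and the dtype is only a heuristic: a numeric column such as a postal code may still be categorical.

```python
import pandas as pd

# Hypothetical dataset mixing numeric and label columns
df = pd.DataFrame({
    "age": [23.5, 31.0, 45.2],
    "height_cm": [170.2, 165.0, 180.4],
    "country": ["India", "Brazil", "Japan"],
    "color": ["red", "blue", "red"],
})

# Numeric columns are candidates for continuous variables
continuous_cols = df.select_dtypes(include="number").columns.tolist()
# Everything else is treated as categorical here
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

print(continuous_cols)
print(categorical_cols)
```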

## How do we handle categorical variables in Machine Learning? What are the common techniques?
Categorical variables are converted into numerical form using encoding. Common techniques are:
- Label Encoding: Assigns a unique integer to each category.
- One Hot Encoding: Creates one binary (0/1) column for each category.
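Both techniques are available in scikit-learn. The sketch below applies them to a small made-up array of colors. Note that scikit-learn intends `LabelEncoder` for target labels; for input features, `OrdinalEncoder` plays the same role.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

colors = np.array(["red", "green", "blue", "green"])

# Label encoding: categories are sorted (blue, green, red) and numbered 0, 1, 2
label_encoder = LabelEncoder()
labels = label_encoder.fit_transform(colors)
print(labels)  # [2 1 0 1]

# One-hot encoding expects a 2D input of shape (n_samples, n_features)
one_hot_encoder = OneHotEncoder()
one_hot = one_hot_encoder.fit_transform(colors.reshape(-1, 1)).toarray()
print(one_hot)  # one column each for blue, green, red
```

Label encoding implies an order (blue < green < red) that may not be meaningful, which is one reason one-hot encoding is often preferred for unordered categories.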

## What do you mean by training and testing a dataset?
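The training dataset is used to fit the model. The testing dataset is used to evaluate how well the model performs on new, unseen data. Comparing performance on the two helps detect overfitting, where a model does well on the training data but poorly on unseen data.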
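As a rough illustration of why unseen data matters, the sketch below fits an unrestricted decision tree to synthetic, noisy data. The dataset, the noise level, and the choice of model are all assumptions made for the demonstration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data in which a share of the labels is assigned at random (noise)
X, y = make_classification(n_samples=300, n_features=10, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# A tree with no depth limit can memorize the training set, noise included
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_accuracy = tree.score(X_train, y_train)
test_accuracy = tree.score(X_test, y_test)
print(train_accuracy, test_accuracy)
```

The training accuracy is perfect, but only the test accuracy says how the model behaves on data it has not memorized. A large gap between the two is a typical sign of overfitting.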

## What is sklearn.preprocessing?
`sklearn.preprocessing` is a module in scikit-learn that provides classes and functions to preprocess data, including scaling, encoding, and normalization. Imputation of missing values is handled by the separate `sklearn.impute` module.

## What is a Test set?
A test set is a part of the dataset that is kept aside and not used during training. It is used to evaluate the performance of the model after training.

## How do we split data for model fitting (training and testing) in Python?
We can use `train_test_split()` from `sklearn.model_selection` to split the data. Example:
```python
from sklearn.model_selection import train_test_split

# X holds the features and y the target; 20% of the rows are held out for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
```

## How do you approach a Machine Learning problem?
- Understand the problem and data
- Clean and preprocess the data
- Perform Exploratory Data Analysis (EDA)
- Encode and scale the features
- Choose and train the model
- Evaluate the model using test data
- Improve model by tuning hyperparameters
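The steps above can be sketched end to end with scikit-learn. This is one possible workflow, not the only one: the iris dataset, logistic regression, and the small grid of `C` values are assumptions chosen to keep the example short. Tuning is done with cross-validation on the training data so that the test set is used only once, at the end.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Understand the problem and data: classify iris flowers from 4 measurements
X, y = load_iris(return_X_y=True)

# Hold out a test set before any fitting
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scale the features and choose a model; the pipeline makes sure the
# scaler is fitted on training data only
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

# Tune a hyperparameter with 5-fold cross-validation on the training data
search = GridSearchCV(pipeline, {"model__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

# Evaluate the tuned model on the held-out test data
test_accuracy = accuracy_score(y_test, search.predict(X_test))
print(search.best_params_, test_accuracy)
```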

## Why do we have to perform EDA before fitting a model to the data?
EDA helps us understand the data by checking for missing values, distributions, outliers, and relationships between variables. What it reveals guides decisions about preprocessing and feature selection.
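A few pandas calls cover most of these checks. The sketch below uses a small made-up DataFrame, and the 1.5 * IQR rule for outliers is one common convention rather than the only option.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with missing values and one suspicious age
df = pd.DataFrame({
    "age": [22, 25, np.nan, 31, 29, 95],
    "income": [30000, 42000, 39000, np.nan, 51000, 48000],
    "city": ["Pune", "Delhi", "Pune", "Mumbai", None, "Delhi"],
})

missing_counts = df.isna().sum()          # missing values per column
summary = df.describe()                   # count, mean, std, quartiles
city_counts = df["city"].value_counts()   # frequency of each category

# Flag outliers in age with the 1.5 * IQR rule
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["age"] < q1 - 1.5 * iqr) | (df["age"] > q3 + 1.5 * iqr)]

print(missing_counts)
print(summary)
print(outliers)
```

Here each column has one missing value, and the row with age 95 is flagged as an outlier.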

## What is correlation?
Correlation is a statistical technique used to measure the strength and direction of the relationship between two variables. Positive correlation means both tend to increase together, while negative correlation means one tends to increase as the other decreases.

## What does negative correlation mean?
Negative correlation means that as one variable increases, the other tends to decrease. It indicates an inverse relationship. Example: the number of hours spent partying and exam scores.

## How can you find correlation between variables in Python?
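We can use the `.corr()` method of a pandas DataFrame to compute the pairwise correlation between columns. Example:
```python
import pandas as pd

df = pd.read_csv("data.csv")
# Pearson correlation between every pair of numeric columns
print(df.corr(numeric_only=True))
```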

## What is causation? Explain difference between correlation and causation with an example.
Causation means that a change in one variable directly produces a change in another. Correlation only shows that two variables move together; it does not show that one causes the other. Example: ice cream sales and drowning cases are correlated, but ice cream does not cause drowning. Hot weather drives both.
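The ice cream example can be simulated. In the sketch below, all the numbers (temperature range, slopes, noise levels) are invented: temperature drives both quantities, and neither influences the other. Removing the linear effect of temperature from each variable and correlating what is left gives a simple partial correlation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulation: temperature is the common cause of both variables
temperature = rng.uniform(10, 35, size=1000)
ice_cream_sales = 20 * temperature + rng.normal(0, 50, size=1000)
drownings = 0.3 * temperature + rng.normal(0, 1, size=1000)

# Raw correlation between the two effects
raw_corr = np.corrcoef(ice_cream_sales, drownings)[0, 1]

def residuals(y, x):
    # What is left of y after subtracting a straight-line fit on x
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

# Correlation after controlling for temperature
partial_corr = np.corrcoef(
    residuals(ice_cream_sales, temperature),
    residuals(drownings, temperature),
)[0, 1]

print(raw_corr, partial_corr)
```

The raw correlation is strongly positive, while the correlation that remains after accounting for temperature is close to zero.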

## What is an Optimizer? What are different types of optimizers? Explain each with an example.
An optimizer updates the model's weights to minimize the loss. Two common types are:
- **SGD (Stochastic Gradient Descent)**: Updates the weights using the gradient computed on a single sample or a small batch of data.
- **Adam**: Combines momentum with RMSProp-style adaptive learning rates for each parameter. It works well in many cases.

Example in TensorFlow:
```python
import tensorflow as tf
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
```
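To show what the two update rules do, here is a NumPy sketch that minimizes a toy loss, (w - 3)^2. The loss, the learning rates, and the step counts are assumptions for illustration. The Adam constants follow its commonly used defaults. With one weight and no data batches, the first loop is plain gradient descent; SGD uses the same update with the gradient estimated from a batch.

```python
import numpy as np

def grad(w):
    # Gradient of the toy loss (w - 3)^2, which is minimized at w = 3
    return 2.0 * (w - 3.0)

# Gradient descent update: w <- w - lr * gradient
w_sgd = 0.0
for _ in range(200):
    w_sgd -= 0.1 * grad(w_sgd)

# Adam update: momentum (m) plus a per-parameter scale (v)
w_adam, m, v = 0.0, 0.0, 0.0
lr, beta1, beta2, eps = 0.1, 0.9, 0.999, 1e-8
for t in range(1, 501):
    g = grad(w_adam)
    m = beta1 * m + (1 - beta1) * g          # running mean of gradients
    v = beta2 * v + (1 - beta2) * g ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)             # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    w_adam -= lr * m_hat / (np.sqrt(v_hat) + eps)

print(w_sgd, w_adam)
```

Both weights end up near the minimum at 3. On a problem this simple neither optimizer has a real advantage; the differences show up on larger, noisier problems.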

## What is sklearn.linear_model?
`sklearn.linear_model` is a module in scikit-learn that provides linear models such as `LinearRegression` and `LogisticRegression`. These are used for regression and classification tasks, respectively.
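As a minimal sketch, `LinearRegression` can recover the slope and intercept of a straight line. The data below are generated from y = 2x + 1 purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Features must be 2D: one row per sample, one column per feature
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2.0 * X.ravel() + 1.0

reg = LinearRegression()
reg.fit(X, y)

slope = reg.coef_[0]        # learned weight for the single feature
intercept = reg.intercept_  # learned bias term
print(slope, intercept)
```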

## What does model.fit() do? What arguments must be given?
`model.fit()` trains the model on the training data. For supervised models it takes the features (`X`) and the target (`y`) as arguments. Example:
```python
model.fit(X_train, y_train)
```

## What does model.predict() do? What arguments must be given?
`model.predict()` returns the model's predictions for the given input data. It takes the feature data (`X`) as its argument. Example:
```python
y_pred = model.predict(X_test)
```
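Putting `fit()` and `predict()` together, a complete run might look like the sketch below. The hours-studied data and the choice of `LogisticRegression` are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: hours studied -> fail (0) or pass (1)
X_train = np.array([[1.0], [2.0], [3.0], [6.0], [7.0], [8.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

# New, unseen inputs with the same number of columns as X_train
X_test = np.array([[0.5], [9.0]])

model = LogisticRegression()
model.fit(X_train, y_train)       # learn from features and target
y_pred = model.predict(X_test)    # only features are needed here
print(y_pred)  # [0 1]
```

Note that `predict()` returns one prediction per row of its input.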

## What are continuous and categorical variables?
Continuous variables are numeric and can take any of infinitely many values within a range, such as 1.5, 3.9, or 100. Categorical variables take one of a fixed set of labels, such as "Male", "Female", "Yes", or "No".

## What is feature scaling? How does it help in Machine Learning?
Feature scaling is a method to normalize the range of the independent variables (features). It helps algorithms that are sensitive to the magnitude of the data, such as KNN, SVM, and models trained with gradient descent. It can make training converge faster and often improves accuracy.
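The effect on a distance-based method such as KNN can be seen by comparing Euclidean distances before and after scaling. The people, ages, and incomes below are made up for the demonstration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical people described by age (years) and yearly income
X = np.array([
    [25.0, 50000.0],   # A
    [60.0, 50500.0],   # B: very different age, similar income
    [26.0, 52000.0],   # C: similar age, somewhat different income
    [40.0, 80000.0],
    [50.0, 30000.0],
])

def dist(a, b):
    return np.linalg.norm(a - b)

# Unscaled: income dominates the distance, so B looks closer to A than C does
raw_ab, raw_ac = dist(X[0], X[1]), dist(X[0], X[2])

# Scaled: both features contribute on a comparable footing
X_scaled = StandardScaler().fit_transform(X)
scaled_ab = dist(X_scaled[0], X_scaled[1])
scaled_ac = dist(X_scaled[0], X_scaled[2])

print(raw_ab, raw_ac)
print(scaled_ab, scaled_ac)
```

Before scaling, A's nearest neighbor is B, only because income is measured in much larger numbers than age. After scaling, C is clearly the closer match.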

## How do we perform scaling in Python?
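We can use `StandardScaler` or `MinMaxScaler` from `sklearn.preprocessing`. The scaler should be fitted on the training data only and then applied to the test data, so that no information from the test set leaks into training. Example:
```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
# Learn the mean and standard deviation from the training data only
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```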
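`MinMaxScaler` follows the same fit/transform pattern and maps the fitted data to the range 0 to 1. A small sketch with made-up numbers:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_train = np.array([[10.0], [20.0], [30.0]])
X_test = np.array([[25.0], [40.0]])

scaler = MinMaxScaler()
# The minimum (10) and maximum (30) are learned from the training data
X_train_scaled = scaler.fit_transform(X_train)   # [[0.0], [0.5], [1.0]]
# The same minimum and maximum are reused for the test data
X_test_scaled = scaler.transform(X_test)         # [[0.75], [1.5]]

print(X_train_scaled.ravel(), X_test_scaled.ravel())
```

Test values can fall outside 0 to 1 when they lie beyond the range seen during fitting, as the value 40 does here.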

## What is sklearn.preprocessing?
`sklearn.preprocessing` is a module that contains classes and functions for preprocessing data before fitting a model. It covers scaling, encoding, normalization, binarization, and more.
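Two of the less common operations, binarization and normalization, can be sketched as follows. The input matrix and the threshold of 0 are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.preprocessing import Binarizer, Normalizer

X = np.array([
    [3.0, -4.0],
    [0.5, 0.0],
    [-1.0, 2.0],
])

# Binarization: values above the threshold become 1, the rest become 0
X_binary = Binarizer(threshold=0.0).fit_transform(X)

# Normalization: rescale each row (sample) to unit L2 norm
X_unit = Normalizer(norm="l2").fit_transform(X)

print(X_binary)
print(X_unit)
```

Note that `Normalizer` works row by row, unlike the scalers, which work column by column.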

## How do we split data for model fitting (training and testing) in Python?
We use `train_test_split()` from `sklearn.model_selection` to split the data into training and test sets. Example:
```python
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
```

## What is data encoding?
Data encoding is a technique for converting categorical data into numerical values so that machine learning models can work with it. Common techniques include:
- Label Encoding
- One Hot Encoding
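Both techniques can also be done directly in pandas. The sketch below uses a made-up `size` column; the column names and the `size` prefix are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({"size": ["small", "large", "medium", "small"]})

# Label-style encoding: categories are sorted (large, medium, small)
# and mapped to the integer codes 0, 1, 2
df["size_code"] = df["size"].astype("category").cat.codes

# One-hot encoding: one 0/1 indicator column per category
one_hot = pd.get_dummies(df["size"], prefix="size", dtype=int)

print(df)
print(one_hot)
```

Each row of `one_hot` contains exactly one 1, in the column that matches its category.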
