### Q1. What is a parameter?

A **parameter** is an internal variable in a model that is learned from the training data. 
For example, in linear regression `y = wx + b`, the slope `w` and intercept `b` are parameters learned by minimizing the loss function.
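
As a minimal sketch of what "learned from the training data" means, the snippet below recovers `w` and `b` from a handful of points using NumPy's least-squares line fit. The data values are made up for illustration and follow `y = 3x + 2` exactly.

```python
import numpy as np

# Synthetic, noise-free data generated from y = 3x + 2 (illustrative values)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 3.0 * x + 2.0

# Fit a degree-1 polynomial by least squares; coefficients come back
# highest degree first, i.e. [slope, intercept]
w, b = np.polyfit(x, y, 1)

print(f"learned slope w = {w:.3f}, learned intercept b = {b:.3f}")
```

The values of `w` and `b` are never set by hand; they come out of the fit, which is what distinguishes parameters from settings chosen before training.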

### Q2. What is correlation?

**Correlation** is a statistical measure that describes the strength and direction of the relationship between two variables. 
The correlation coefficient ranges from -1 to +1. A value close to +1 indicates a strong positive relationship, a value close to -1 indicates a strong negative relationship, and a value near 0 indicates little or no linear relationship.

### What does negative correlation mean?

**Negative correlation** means that as one variable increases, the other decreases. 
Example: hours of exercise per week and body fat percentage usually have a negative correlation.

### Q3. Define Machine Learning. What are the main components in Machine Learning?

**Machine Learning** is the field of study that gives computers the ability to learn from data without being explicitly programmed.  
**Main components**:  
- Data  
- Features  
- Model  
- Loss function  
- Optimization algorithm  
- Evaluation metrics

### Q4. How does loss value help in determining whether the model is good or not?

The **loss value** quantifies how far the model's predictions are from the actual labels. 
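A lower loss generally indicates a better fit, and the loss is the quantity the optimizer minimizes during training. To judge whether the model is actually good, compare the loss on held-out data with the training loss: a low training loss paired with a much higher validation loss suggests overfitting.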
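
To make this concrete, here is a small sketch that computes mean squared error (one common loss for regression) for two hypothetical sets of predictions. The arrays and the helper name `mse` are illustrative assumptions, not part of any library.

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])

# Predictions from two hypothetical models
pred_a = np.array([2.5, 5.5, 7.0, 9.5])
pred_b = np.array([1.0, 8.0, 4.0, 12.0])

def mse(y_true, y_pred):
    """Mean squared error: the average of the squared differences."""
    return float(np.mean((y_true - y_pred) ** 2))

loss_a = mse(y_true, pred_a)
loss_b = mse(y_true, pred_b)

print(loss_a, loss_b)  # model A's predictions are closer, so its loss is lower
```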

### Q5. What are continuous and categorical variables?

**Continuous variables**: Numeric values that can take any real number within a range (e.g., height, weight).  
**Categorical variables**: Values that represent categories or groups (e.g., gender, country).

### Q6. How do we handle categorical variables in Machine Learning? What are the common techniques?

Common techniques:  
- **Label Encoding**: Assigns a unique integer to each category.  
- **One-Hot Encoding**: Creates one binary column per category.  
- **Target Encoding**: Replaces each category with the mean of the target variable for that category.
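
The three techniques above can be sketched with plain pandas. The column names `city` and `price` and the data values are hypothetical, chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Paris", "Tokyo", "Paris", "Lima"],
    "price": [10.0, 20.0, 14.0, 6.0],   # target variable (illustrative)
})

# Label encoding: one integer per category (codes follow sorted category order)
df["city_label"] = df["city"].astype("category").cat.codes

# One-hot encoding: one binary column per category
one_hot = pd.get_dummies(df["city"], prefix="city")

# Target encoding: replace each category with the mean target for that category
df["city_target"] = df.groupby("city")["price"].transform("mean")

print(df)
print(one_hot)
```

In practice, target-encoding statistics should be computed on the training split only; computing them on the full dataset leaks information about the test targets.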

### Q7. What do you mean by training and testing a dataset?

**Training dataset**: Used to fit the model.  
**Testing dataset**: Used to evaluate the model performance on unseen data.

### Q8. What is sklearn.preprocessing?

`sklearn.preprocessing` is a module in scikit-learn that provides methods for preprocessing data such as scaling, encoding categorical features, normalization, etc.

### Q9. What is a Test set?

A **test set** is a portion of the dataset kept aside to evaluate how well the trained model generalizes to new, unseen data.

### Q10. How do we split data for model fitting (training and testing) in Python?

In Python, we split data using `train_test_split` from `sklearn.model_selection`.  
Example:  
```python
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
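
The snippet above assumes `X` and `y` already exist. A self-contained sketch with small made-up arrays shows what the split produces:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 samples with 2 features each (illustrative values)
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# test_size=0.2 holds out 20% of the rows; random_state makes the shuffle reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)  # 8 training rows and 2 test rows
```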

### How do you approach a Machine Learning problem?

Steps to approach an ML problem:  
1. Define the problem  
2. Collect and preprocess data  
3. Perform EDA  
4. Feature engineering  
5. Select a model  
6. Train the model  
7. Evaluate the model  
8. Deploy and monitor
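
A compressed sketch of steps 2 through 7 is shown below, assuming scikit-learn's built-in iris dataset as a stand-in for a real problem. The model choice and settings are illustrative, and EDA and deployment are omitted.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Step 2: collect data (here, a built-in toy dataset)
X, y = load_iris(return_X_y=True)

# Hold out a test set before fitting anything
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Steps 4-6: scale the features, select a model, and train it
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

# Step 7: evaluate on data the model has not seen
accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```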

### Q11. Why do we have to perform EDA before fitting a model to the data?

EDA (Exploratory Data Analysis) is done to understand the dataset, detect patterns, missing values, outliers, and relationships before fitting a model. 
It ensures better feature engineering and model selection.
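
A few pandas calls cover most of a first EDA pass. The sketch below uses a made-up DataFrame with one missing value and one extreme value. The IQR rule shown is just one common heuristic for flagging outliers.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25, 32, np.nan, 41, 29],                        # one missing value
    "income": [40_000, 52_000, 48_000, 1_000_000, 45_000],  # one extreme value
})

summary = df.describe()    # count, mean, std, min, quartiles, max
missing = df.isna().sum()  # number of missing values per column
corr = df.corr()           # pairwise correlations

# Simple IQR rule to flag potential outliers in one column
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["income"] < q1 - 1.5 * iqr) | (df["income"] > q3 + 1.5 * iqr)]

print(missing)
print(outliers)
```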

### Q12. What is correlation?

Correlation is defined in Q2 above: it measures the strength and direction of the relationship between two variables.

### Q13. What does negative correlation mean?

Negative correlation is already defined above. It indicates an inverse relationship between two variables.

### Q14. How can you find correlation between variables in Python?

In Python, correlation can be computed using Pandas:  
```python
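import pandas as pd
df = pd.DataFrame({'x': [1, 2, 3, 4], 'y': [2, 4, 5, 8]})  # example data
df.corr()  # pairwise Pearson correlation matrix of the numeric columns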
```  
Or specifically between two variables:  
```python
df['x'].corr(df['y'])
```

### Q15. What is causation? Explain difference between correlation and causation with an example.

**Causation** means that changes in one variable directly cause changes in another.  
**Difference**: Correlation only shows association, not cause-effect.  
Example: Ice cream sales and drowning cases are correlated because both rise in summer, but eating ice cream does not cause drowning. Hot weather is the common cause behind both.

### Q16. What is an Optimizer? What are different types of optimizers? Explain each with an example.

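An **optimizer** is an algorithm that adjusts a model's parameters to minimize the loss.  
Types:  
- **SGD** (Stochastic Gradient Descent): updates parameters using the gradient computed on a single sample or a mini-batch.  
- **Adam**: combines momentum with per-parameter adaptive learning rates.  
- **RMSprop**: scales each update by a moving average of squared gradients.  
- **Adagrad**: scales each parameter's learning rate by its accumulated squared gradients, so frequently updated parameters take smaller steps.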
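
As an example of the simplest case, here is a minimal NumPy sketch of SGD fitting a line, updating the parameters from one sample at a time. The data, learning rate, and epoch count are illustrative assumptions. Adam, RMSprop, and Adagrad change how each step is scaled and are not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, noise-free data from y = 2x + 1 (illustrative)
x = rng.uniform(-1.0, 1.0, size=200)
y = 2.0 * x + 1.0

w, b = 0.0, 0.0   # parameters to learn
lr = 0.1          # learning rate (hypothetical choice)

for epoch in range(50):
    # Visit the samples in a fresh random order each epoch
    for i in rng.permutation(len(x)):
        error = (w * x[i] + b) - y[i]   # prediction error on one sample
        # Gradients of the per-sample loss 0.5 * error**2 w.r.t. w and b
        w -= lr * error * x[i]
        b -= lr * error

print(f"w = {w:.4f}, b = {b:.4f}")  # should end up close to 2 and 1
```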

### Q17. What is sklearn.linear_model ?

`sklearn.linear_model` is a module in scikit-learn that provides linear models such as Linear Regression, Logistic Regression, Ridge, Lasso, etc.

### Q18. What does model.fit() do? What arguments must be given?

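`model.fit(X, y)` trains the model on features `X` and target `y`, learning the model's parameters from them.  
Arguments: the training features `X` (shape `(n_samples, n_features)`) and, for supervised models, the labels `y`. Unsupervised estimators need only `X`.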

### Q19. What does model.predict() do? What arguments must be given?

`model.predict(X)` predicts outcomes for new feature data `X`.  
Arguments: input features only.
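
A short sketch of `fit` followed by `predict`, using `LinearRegression` on made-up data that follows `y = 2x + 1`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X_train = np.array([[1.0], [2.0], [3.0], [4.0]])  # shape (n_samples, n_features)
y_train = np.array([3.0, 5.0, 7.0, 9.0])          # y = 2x + 1

model = LinearRegression()
model.fit(X_train, y_train)    # learns coef_ and intercept_ from the data

X_new = np.array([[5.0], [6.0]])
y_pred = model.predict(X_new)  # one prediction per row of X_new

print(y_pred)
```

Note that `predict` expects the same number of feature columns that `fit` saw, even when predicting for a single sample.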

### Q20. What are continuous and categorical variables?

The definitions of continuous and categorical variables are the same as in Q5 above.

### Q21. What is feature scaling? How does it help in Machine Learning?

**Feature scaling** transforms features so that they lie on a similar scale.  
It helps gradient-descent-based algorithms converge faster and keeps features with large ranges from dominating distance-based models such as k-nearest neighbors.

### Q22. How do we perform scaling in Python?

Scaling in Python using sklearn:  
```python
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# StandardScaler: zero mean, unit variance; MinMaxScaler() rescales to [0, 1] instead
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
```
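
When the data is split into training and test sets, a common practice is to fit the scaler on the training set only and reuse its statistics for the test set. The sketch below uses small made-up arrays to show this.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
X_test = np.array([[2.0, 400.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from training data only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics

print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
print(X_test_scaled)
```

Fitting the scaler on the full dataset would let information from the test set influence the transformation, which makes the evaluation look better than it should.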

### Q23. What is sklearn.preprocessing?

`sklearn.preprocessing` is the scikit-learn module for scaling, encoding, and normalization (already defined in Q8 above).

### Q24. How do we split data for model fitting (training and testing) in Python?

Data splitting example in Python:  
```python
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
```

### Q25. Explain data encoding?

**Data encoding**: Converting categorical variables into numerical format so ML models can use them.  
Techniques:  
- Label Encoding  
- One-Hot Encoding  
- Ordinal Encoding  
- Target Encoding
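
Ordinal encoding is the one technique here that relies on a meaningful category order. A minimal sketch with scikit-learn's `OrdinalEncoder` follows. The size labels and their ordering are assumptions made for illustration.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# One categorical feature with a natural order (illustrative values)
sizes = np.array([["medium"], ["small"], ["large"], ["small"]])

# Pass the order explicitly so the integers reflect small < medium < large
encoder = OrdinalEncoder(categories=[["small", "medium", "large"]])
encoded = encoder.fit_transform(sizes)

print(encoded.ravel())
```

Without the explicit `categories` argument, the encoder sorts the categories alphabetically, which would not match the intended order here.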