## 1. What is a Parameter?

A parameter is a numerical value that a machine learning model learns
during training. These values control the behavior of the model
and determine how input data is transformed into output predictions. Parameters
are not set manually; instead, the learning algorithm adjusts them automatically
so that the error between the actual output and the predicted output is
minimized.

For example, in a linear regression model represented by the equation:

y = mx + b

the values m (slope) and b (intercept) are parameters. During training, the
model finds the most suitable values of m and b so that the predicted line fits
the given data points as accurately as possible.

Thus, parameters are internal to the model and are essential for making
accurate predictions.
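
As a minimal sketch of how m and b can be learned, the snippet below fits a line with the closed-form least-squares formulas. The data points and the helper name `fit_line` are assumptions made up for this illustration.

```python
# Illustrative data only (not from the text): points that lie near y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]

def fit_line(xs, ys):
    """Return (m, b) minimizing the squared error of y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    m = cov_xy / var_x
    # Intercept: the fitted line passes through the point of means.
    b = mean_y - m * mean_x
    return m, b

m, b = fit_line(xs, ys)
print(f"learned slope m = {m:.3f}, intercept b = {b:.3f}")
```

The values of m and b are computed from the data rather than typed in by hand, which is what makes them parameters.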



## 2. What is Correlation?

Correlation is a statistical measure that describes the relationship between two variables. It indicates how strongly and in what direction the variables are related to each other. Correlation shows whether two variables tend to change together, but on its own it does not show that a change in one variable causes a change in the other.

The value of the correlation coefficient lies between -1 and +1.

If the correlation is +1, it means there is a perfect positive linear relationship.

If the correlation is 0, it means there is no linear relationship.

If the correlation is -1, it means there is a perfect negative linear relationship.

Correlation is widely used in data analysis and feature selection because highly correlated variables may contain similar information.

### What does Negative Correlation mean?
Negative correlation means that when the value of one variable increases, the value of the other variable decreases, and vice versa. In other words, both variables move in opposite directions.

For example, as the price of a product increases, its demand decreases. Similarly, if the speed of a vehicle increases, the time required to reach the destination decreases.

This inverse relationship between variables is known as negative correlation.
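
To make the price and demand example concrete, here is a small sketch that computes the Pearson correlation coefficient by hand. The numbers and the helper name `pearson` are invented for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Made-up numbers for illustration: demand falls as price rises.
price = [10, 12, 14, 16, 18]
demand = [100, 90, 85, 70, 60]

r = pearson(price, demand)
print(f"r = {r:.3f}")  # negative, because the variables move in opposite directions
```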


## 3. Define Machine Learning. What are the main components in Machine Learning?

Machine Learning is a branch of artificial intelligence that enables computers to learn from data and make predictions or decisions without being explicitly programmed. Instead of following fixed instructions, the machine identifies patterns from the data and improves its performance automatically with experience.

The main components of machine learning are:

1. Dataset - It is the collection of data used for training and testing the model.
2. Features - These are the input variables used to make predictions.
3. Target variable - This is the output that the model tries to predict.
4. Model - It is the mathematical representation that learns patterns from the data.
5. Loss function - It measures the error between actual and predicted values.
6. Optimization algorithm - It updates the parameters to reduce the loss.
7. Evaluation metric - It measures the performance of the trained model.


## 4. How does loss value help in determining whether the model is good or not?

The loss value represents the difference between the actual output and the predicted output given by the model. It tells how well the model is performing.

A small loss value indicates that the predicted values are close to the actual values, which means the model is performing well. A large loss value indicates poor performance.

During training, the objective is to minimize the loss function. If the training loss keeps decreasing, the model is fitting the training data better. If the training loss is very low but the testing loss is much higher, the model is likely overfitting.

Therefore, the loss value is an important measure for determining the quality of a machine learning model.
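
The comparison between training loss and testing loss can be sketched with mean squared error, one common loss function. All values below are hypothetical, and the factor of 10 used as a warning threshold is an arbitrary choice for this example.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical values, chosen only to illustrate the comparison.
train_actual = [3.0, 5.0, 7.0, 9.0]
train_predicted = [3.1, 4.9, 7.0, 9.1]
test_actual = [4.0, 6.0, 8.0]
test_predicted = [6.5, 3.0, 11.0]

train_loss = mean_squared_error(train_actual, train_predicted)
test_loss = mean_squared_error(test_actual, test_predicted)
print(f"training loss = {train_loss:.4f}, testing loss = {test_loss:.4f}")

# A testing loss far above the training loss is a warning sign of overfitting.
# The factor of 10 is an arbitrary threshold for this illustration.
if test_loss > 10 * train_loss:
    print("large gap between training and testing loss: possible overfitting")
```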

## 5. What are Continuous and Categorical Variables?

Continuous variables are numerical variables that can take any value within a given range. They are measurable quantities and usually represent real numbers. Examples include height, weight, temperature, and salary.

Categorical variables are variables that represent categories or labels. They do not represent numerical quantities. Examples include gender, city, color, and department.

Continuous variables can usually be used directly in machine learning models (often after scaling), whereas categorical variables must be converted into numerical form before being used.


## 6. How do we handle categorical variables in Machine Learning? What are the common techniques?

Machine learning algorithms work with numerical data, so categorical variables must be converted into numerical format. The common techniques are:

**Label Encoding**
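In this method, each category is assigned a unique integer. For example, Red = 0, Blue = 1, Green = 2. Because the integers imply an order that may not exist, this method is best suited to target labels or to models that are not sensitive to the numeric order, such as tree-based models.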

**One-Hot Encoding**
In this method, new binary columns are created for each category. If the category is present, the value is 1; otherwise, it is 0. This method is used when there is no order among the categories.

**Ordinal Encoding**
This method is used when the categories have a meaningful order, such as Low, Medium, and High.

These techniques help in converting categorical data into a format suitable for machine learning models.
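
A possible sketch of one-hot and ordinal encoding with pandas is shown below. The toy DataFrame, its column names, and the `size_order` mapping are assumptions made for this example.

```python
import pandas as pd

# Toy data invented for this example.
df = pd.DataFrame({
    "color": ["Red", "Blue", "Green", "Blue"],
    "size": ["Low", "High", "Medium", "Low"],
})

# One-hot encoding: one binary column per color, since colors have no order.
one_hot = pd.get_dummies(df["color"], prefix="color")

# Ordinal encoding: an explicit mapping preserves Low < Medium < High.
size_order = {"Low": 0, "Medium": 1, "High": 2}
df["size_encoded"] = df["size"].map(size_order)

print(one_hot)
print(df)
```

Writing the mapping out by hand keeps the intended order visible, which matters because an automatic encoder may assign the integers in a different order.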


## 7. What do you mean by training and testing a dataset?

The training dataset is the portion of the data that is used to train the machine learning model. The model learns the relationship between input features and the target variable using this data.

The testing dataset is the portion of the data that is used to evaluate the performance of the trained model. This data is not shown to the model during training. It helps in checking how well the model works on new and unseen data.

This division makes it possible to check that the model is not just memorizing the data but is actually learning the underlying patterns.


## 8. What is sklearn.preprocessing?

sklearn.preprocessing is a module of the Scikit-learn library that is used for data preprocessing. It provides functions for transforming raw data into a suitable format for machine learning algorithms.

It is used for:

1. Feature scaling using StandardScaler and MinMaxScaler
2. Encoding categorical variables using LabelEncoder and OneHotEncoder
3. Normalization of data using Normalizer
4. Binarizing numerical data using Binarizer

Data preprocessing is an important step because many machine learning models perform better when the input data is properly scaled and formatted.


## 9. What is a Test set?

A test set is a part of the dataset that is used to evaluate the final performance of a trained machine learning model. It contains data that the model has never seen before.

The test set helps in measuring how accurately the model can make predictions on new and unseen data. It is used only after the training process is complete.



## 10. How do we split data for model fitting (training and testing) in Python?

In Python, data is commonly split using the train_test_split function from sklearn.model_selection.

Example:

```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

Here, test_size = 0.2 means 20% of the data is used for testing and 80% for training.
random_state ensures that the split is reproducible.
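
To see the 80/20 split in numbers, the following sketch applies the same call to a toy dataset of 10 samples. The arrays `X` and `y` are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 samples with 2 features each, and one target value per sample.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)  # 8 training rows and 2 testing rows
```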

## How do you approach a Machine Learning problem?

The general steps to solve a machine learning problem are:

1. **Problem understanding**
   First, understand whether the problem is classification, regression, or clustering.

2. **Data collection**
   Gather the relevant data required to train the model.

3. **Data preprocessing**
   Handle missing values, encode categorical variables, and scale the features.

4. **Exploratory Data Analysis**
   Analyze the data using graphs and statistical methods to find patterns and relationships.

5. **Feature selection and feature engineering**
   Select important features and create new features if required.

6. **Splitting the dataset**
   Divide the dataset into training and testing sets.

7. **Model selection**
   Choose a suitable machine learning algorithm.

8. **Model training**
   Train the model using the training dataset.

9. **Model evaluation**
   Evaluate the model using appropriate performance metrics.

10. **Hyperparameter tuning**
    Improve the performance of the model by adjusting its hyperparameters, which are settings chosen before training rather than learned from the data.

11. **Prediction**
    Use the trained model to make predictions on new data.
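
The steps above can be sketched end to end with Scikit-learn. This is only one possible arrangement: the synthetic dataset from `make_classification` stands in for real data, and the choice of logistic regression with standard scaling is an assumption made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Data collection: a synthetic dataset stands in for real data here.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Splitting the dataset into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Preprocessing and model selection: scaling followed by logistic regression.
model = make_pipeline(StandardScaler(), LogisticRegression())

# Model training on the training set only.
model.fit(X_train, y_train)

# Prediction on unseen data, then evaluation with accuracy.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"test accuracy: {accuracy:.2f}")
```

Putting the scaler inside the pipeline means it is fitted on the training data only, so no information from the test set leaks into training.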


## 11. Why do we have to perform EDA before fitting a model to the data?

Exploratory Data Analysis (EDA) is performed to understand the structure, quality, and patterns present in the dataset before applying any machine learning algorithm. Real-world data is usually incomplete, noisy, and inconsistent. If we directly train a model without understanding the data, the model may learn incorrect patterns and produce poor results.

EDA helps in:

1. Understanding the distribution of data
   It shows whether the data is normally distributed or skewed.

2. Detecting missing values
   Missing values must be handled before training.

3. Identifying outliers
   Outliers can significantly affect model performance.

4. Checking relationships between variables
   This helps in selecting important features.

5. Detecting multicollinearity
   Highly correlated independent variables can make the coefficient estimates of a model unstable and harder to interpret.

6. Understanding the scale of features
   Some models require feature scaling.
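
A few of these checks can be sketched with pandas. The DataFrame below is invented for illustration, and the 1.5 * IQR rule is one common convention for flagging outliers, not the only one.

```python
import numpy as np
import pandas as pd

# Small invented dataset with one missing value and one extreme value.
df = pd.DataFrame({
    "age": [25, 32, 47, 51, np.nan, 38, 29],
    "salary": [30000, 42000, 58000, 61000, 45000, 400000, 35000],
})

# Distribution and scale of each feature.
print(df.describe())

# Missing values per column.
missing_counts = df.isna().sum()
print(missing_counts)

# Outliers by the 1.5 * IQR rule.
q1, q3 = df["salary"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["salary"] < q1 - 1.5 * iqr) | (df["salary"] > q3 + 1.5 * iqr)]
print(outliers)

# Relationships between variables.
print(df.corr())
```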




## 12. What is Correlation?

Correlation is a statistical measure that indicates the strength and direction of the relationship between two variables. It tells how one variable changes with respect to another.

The correlation value ranges from -1 to +1:

+1 -> Perfect positive correlation

0 -> No linear correlation

-1 -> Perfect negative correlation


## 13. What does Negative Correlation mean?

Negative correlation means that when one variable increases, the other variable decreases. Both variables move in opposite directions.

Example:
As the price of a product increases, its demand decreases.



## 14. How can you find correlation between variables in Python?

Correlation in Python can be calculated using the pandas library.

Example:

```python
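import pandas as pd

# Load the dataset ("data.csv" is a placeholder file name)
data = pd.read_csv("data.csv")

# Pairwise Pearson correlation of the numeric columns;
# numeric_only=True skips text columns (needs pandas 1.5 or later)
correlation_matrix = data.corr(numeric_only=True)
print(correlation_matrix)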
```

To find correlation between two variables:

```python
# Pearson correlation between two columns ("col1" and "col2" are placeholder names)
data["col1"].corr(data["col2"])
```


## 15. What is causation? Explain difference between correlation and causation with an example.

Causation means that a change in one variable directly causes a change in another variable.

Correlation only shows that two variables are related, but it does not prove that one causes the other.

Example:

Ice cream sales and number of drowning cases are positively correlated.
But ice cream does not cause drowning.
The actual cause is hot weather, which increases both ice cream sales and swimming activity.

Correlation -> Variables move together

Causation -> One variable produces an effect on another
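
The ice cream example can be sketched with simulated data. Everything below is generated by the script, not measured: the coefficients, noise levels, and the helper name `residuals` are assumptions chosen so that temperature drives both quantities.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (not real measurements): temperature drives both quantities.
temperature = rng.uniform(15, 35, size=1000)
ice_cream_sales = 10 * temperature + rng.normal(0, 20, size=1000)
drownings = 0.5 * temperature + rng.normal(0, 2, size=1000)

# The two outcomes are strongly correlated even though neither causes the other.
r_raw = np.corrcoef(ice_cream_sales, drownings)[0, 1]

def residuals(y, x):
    """Remove the linear effect of x from y."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

# After removing the effect of temperature, the correlation largely disappears.
r_adjusted = np.corrcoef(
    residuals(ice_cream_sales, temperature),
    residuals(drownings, temperature),
)[0, 1]

print(f"raw correlation: {r_raw:.2f}")
print(f"after controlling for temperature: {r_adjusted:.2f}")
```

The drop in correlation once temperature is accounted for is what suggests a common cause rather than a direct causal link.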


## 16. What is an Optimizer? What are different types of optimizers? Explain each with an example.

An optimizer is an algorithm used to minimize the loss function by updating the model parameters during training. It finds the best values of parameters so that the model makes accurate predictions.

Types of optimizers:

### 1. Gradient Descent

It updates parameters by moving in the direction of the negative gradient of the loss function.

Example:
It can be used to fit the coefficients of a linear regression model.

### 2. Stochastic Gradient Descent (SGD)

It updates parameters using one training example at a time. Each update is much cheaper than a full pass over the data, which makes it suitable for large datasets, although the updates are noisier.

### 3. Mini-batch Gradient Descent

It updates parameters using small batches of data, which balances the stability of full gradient descent with the speed of SGD. It is the most commonly used variant in practice.

### 4. Adam (Adaptive Moment Estimation)

It combines the advantages of Momentum and RMSProp. It is fast and widely used in deep learning.

### 5. RMSProp

It adjusts the learning rate automatically for each parameter by dividing the update by a moving average of recent squared gradients.
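
As a minimal sketch of the first method, the loop below runs plain (full-batch) gradient descent on a linear regression problem. The data, the learning rate, and the number of iterations are assumptions chosen for this example.

```python
# Illustrative data generated from y = 2x + 1 (no noise).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]

m, b = 0.0, 0.0          # initial parameter values
learning_rate = 0.05     # hypothetical step size chosen for this example
n = len(xs)

for _ in range(2000):
    # Gradients of the mean squared error with respect to m and b.
    errors = [(m * x + b) - y for x, y in zip(xs, ys)]
    grad_m = (2 / n) * sum(e * x for e, x in zip(errors, xs))
    grad_b = (2 / n) * sum(errors)
    # Move against the gradient to reduce the loss.
    m -= learning_rate * grad_m
    b -= learning_rate * grad_b

print(f"m = {m:.3f}, b = {b:.3f}")
```

Computing the gradients from a single example or from a small random batch in each iteration, instead of from all points, would turn this loop into SGD or mini-batch gradient descent respectively.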


## 17. What is sklearn.linear_model?

sklearn.linear_model is a module in Scikit-learn that contains linear models, in which the prediction is based on a linear combination of the input features.

Examples:

LinearRegression

LogisticRegression

Ridge

Lasso

These models are used for regression and classification tasks.


## 18. What does model.fit() do? What arguments must be given?

model.fit() is used to train the machine learning model. It learns the relationship between input features and the target variable.

Syntax:

```python
model.fit(X_train, y_train)
```

Arguments:

X_train -> Training input data.

y_train -> Training output data.


## 19. What does model.predict() do? What arguments must be given?

model.predict() is used to make predictions using the trained model.

Syntax:

```python
model.predict(X_test)
```

Argument:

X_test -> Data for which predictions are required
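
Here is a small sketch that uses fit() and predict() together, with LinearRegression as the example model. The toy arrays are made up and follow y = 2x + 1 exactly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data following y = 2x + 1 exactly (illustrative only).
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])  # 2D: one row per sample
y_train = np.array([1.0, 3.0, 5.0, 7.0])

model = LinearRegression()
model.fit(X_train, y_train)        # learns coef_ and intercept_

X_test = np.array([[4.0], [5.0]])  # new inputs with the same number of columns
predictions = model.predict(X_test)

print(model.coef_, model.intercept_)
print(predictions)
```

Note that the input passed to both methods is two-dimensional (samples by features), even when there is only one feature.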


## 20. What are Continuous and Categorical Variables?

Continuous variables are numerical variables that can take any value within a range.
Example: height, weight, salary.

Categorical variables represent categories or labels.
Example: gender, city, color.


## 21. What is Feature Scaling? How does it help?

Feature scaling is the process of bringing all feature values to a similar range.

It is important because:

1. It can improve the performance of models that are sensitive to feature magnitudes
2. It speeds up the convergence of gradient-based training
3. It prevents features with large values from dominating
4. It is important for distance-based algorithms such as KNN, and for SVM
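
The two most common scaling formulas can be sketched by hand with NumPy. The small array below is invented for illustration, with age in the first column and salary in the second.

```python
import numpy as np

# Invented values: column 0 is age in years, column 1 is salary.
X = np.array([
    [25.0, 30000.0],
    [35.0, 60000.0],
    [45.0, 90000.0],
])

# Standardization: subtract the column mean, divide by the column standard deviation.
X_standardized = (X - X.mean(axis=0)) / X.std(axis=0)

# Min-max scaling: map each column to the range [0, 1].
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

print(X_standardized)
print(X_minmax)
```

After either transformation both columns share a comparable range, so salary no longer dominates age simply because its raw values are larger.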


## 22. How do we perform scaling in Python?

Using StandardScaler:

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```

Using MinMaxScaler:

```python
from sklearn.preprocessing import MinMaxScaler

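scaler = MinMaxScaler()
# Fit on the training data only, then reuse the same scaling for the test data
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)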
```

## 23. What is sklearn.preprocessing?

It is a module of Scikit-learn used for data preprocessing.

It provides:

StandardScaler

MinMaxScaler

LabelEncoder

OneHotEncoder

Normalizer

It helps in converting raw data into a suitable format for machine learning.


## 24. How do we split data for training and testing in Python?

```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```

test_size = 0.2 means 20% of the data is used for testing and 80% for training.


## 25. Explain Data Encoding

Data encoding is the process of converting categorical data into numerical form so that it can be used in machine learning models.

Types of encoding:

### 1. Label Encoding

Each category is assigned a unique number.

### 2. One-Hot Encoding

Creates separate binary columns for each category.

### 3. Ordinal Encoding

Used when categories have a meaningful order.

Encoding is necessary because most machine learning algorithms expect numerical input.
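
The three encodings can be sketched with the encoders from sklearn.preprocessing. The arrays `colors` and `levels` are toy data made up for this example.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder

colors = np.array(["Red", "Blue", "Green", "Blue"])
levels = np.array([["Low"], ["High"], ["Medium"], ["Low"]])

# Label encoding: integers follow the sorted order of the class names.
label_encoder = LabelEncoder()
color_labels = label_encoder.fit_transform(colors)

# One-hot encoding: expects 2D input and returns a sparse matrix by default.
one_hot = OneHotEncoder().fit_transform(colors.reshape(-1, 1)).toarray()

# Ordinal encoding: the category order is passed explicitly.
ordinal_encoder = OrdinalEncoder(categories=[["Low", "Medium", "High"]])
level_codes = ordinal_encoder.fit_transform(levels)

print(color_labels)
print(one_hot)
print(level_codes.ravel())
```

LabelEncoder is documented by Scikit-learn as an encoder for target labels, so for input features OrdinalEncoder or OneHotEncoder is usually the better fit.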
