1. What is a parameter?
A parameter in Machine Learning is a variable internal to the model whose value is learned from the data during training. Parameters determine how the model maps inputs to predictions. For example, in a linear regression model, the slope (weight) and intercept (bias) are parameters.
Parameters differ from hyperparameters, which are chosen before training rather than learned from the data (e.g., learning rate or the number of layers in a neural network).
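
The distinction can be seen in a small sketch. It assumes scikit-learn is installed and uses Ridge (linear regression with an L2 penalty) only because its `alpha` is an obvious hyperparameter; the data are synthetic and the values are chosen purely for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data generated from y = 3x + 2 (illustrative values, not from a real dataset).
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 3.0 * X.ravel() + 2.0

# alpha is a hyperparameter: it is chosen before training and never learned.
model = Ridge(alpha=1.0)
model.fit(X, y)

# coef_ and intercept_ are parameters: they are learned from the data.
slope = model.coef_[0]
intercept = model.intercept_
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

Changing `alpha` and refitting gives different learned values for `slope` and `intercept`, which is one way to see that hyperparameters shape training while parameters are its result.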


2. What is correlation?
Correlation is a statistical measure that describes the extent to which two variables are related. It quantifies the direction and strength of their relationship.

A positive correlation indicates that as one variable increases, the other tends to increase.
A negative correlation indicates that as one variable increases, the other tends to decrease.
The Pearson correlation coefficient (r) measures the linear relationship and ranges from -1 to +1:
+1: Perfect positive linear correlation.
-1: Perfect negative linear correlation.
0: No linear correlation (a non-linear relationship may still exist).
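
As a minimal sketch of how the coefficient is computed, the helper below implements the Pearson formula with NumPy. The function name `pearson_r` and the sample values are illustrative assumptions. The last case shows that r can be 0 even when the variables are clearly related, because r only captures linear relationships.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson r: sum of co-deviations divided by the product of deviation norms."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

r_pos = pearson_r(x, 2 * x + 1)   # perfectly linear, increasing
r_neg = pearson_r(x, -3 * x + 4)  # perfectly linear, decreasing
r_sq = pearson_r(x, x ** 2)       # related, but not linearly (symmetric around 0)

print(r_pos, r_neg, r_sq)

# NumPy's built-in np.corrcoef computes the same quantity.
r_check = np.corrcoef(x, 2 * x + 1)[0, 1]
```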


3. What does negative correlation mean?
Negative correlation occurs when two variables are inversely related: as one variable increases, the other tends to decrease.
For example, as the price of a product increases, the demand for the product typically decreases.

The correlation coefficient (r) for a negative correlation lies between -1 and 0 (-1 ≤ r < 0).


4. Define Machine Learning. What are the main components in Machine Learning?
Machine Learning (ML) is a branch of Artificial Intelligence that focuses on developing algorithms and systems capable of learning from data to make predictions or decisions without being explicitly programmed.

Main Components of ML:

Data: The foundation for ML. High-quality and diverse datasets improve model performance.
Model: Mathematical representation of the problem (e.g., linear regression, neural networks).
Training: The process of feeding data to the model to learn patterns.
Algorithm: The method used to train the model (e.g., gradient descent).
Evaluation: Assessment of the model’s performance using metrics such as accuracy, precision, and recall for classification, or mean squared error for regression.


5. How does loss value help in determining whether the model is good or not?
The loss value quantifies the error between the model’s predictions and the actual target values. It acts as a feedback mechanism to improve the model during training.

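A low loss value indicates the model’s predictions are close to the targets on the data the loss was computed on.
A high loss value suggests the model is fitting poorly and needs adjustments (for example more training, a different model, or better features).
The loss value is minimized during training using optimization techniques (e.g., gradient descent).
A low training loss alone is not enough: if the loss on held-out (validation or test) data is much higher, the model is overfitting.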
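
A rough sketch of the feedback loop described above: it uses mean squared error as the loss and plain gradient descent on a one-feature linear model. The data, learning rate, and step count are assumptions chosen only to make the example converge.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return float(np.mean((y_true - y_pred) ** 2))

# Synthetic data from y = 2x + 0.5 plus a little noise (illustrative values).
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 2.0 * x + 0.5 + rng.normal(0, 0.1, size=100)

w, b = 0.0, 0.0  # parameters, starting from an uninformed guess
lr = 0.1         # learning rate (a hyperparameter)
losses = []

for step in range(200):
    y_pred = w * x + b
    losses.append(mse(y, y_pred))
    error = y_pred - y
    # Gradients of the MSE with respect to w and b.
    grad_w = 2 * np.mean(error * x)
    grad_b = 2 * np.mean(error)
    # Step against the gradient to reduce the loss.
    w -= lr * grad_w
    b -= lr * grad_b

print(f"first loss={losses[0]:.4f}, last loss={losses[-1]:.4f}")
print(f"learned w={w:.2f}, b={b:.2f}")
```

The recorded `losses` shrink as training proceeds, and the learned `w` and `b` end up near the values used to generate the data.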


6. What are continuous and categorical variables?
Continuous Variables: These are numeric variables that can take any value within a range (e.g., height, weight, temperature).
Example: A person’s weight could be 70.5 kg.
Categorical Variables: These are variables that represent distinct categories or groups. They can be:
Nominal: No inherent order (e.g., colors: red, green, blue).
Ordinal: Have an order (e.g., education levels: high school, college, graduate).
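
One way to represent these variable types in code is with pandas dtypes. This is a small sketch with made-up values: a float column is continuous, an unordered `Categorical` is nominal, and an ordered `Categorical` is ordinal.

```python
import pandas as pd

df = pd.DataFrame({
    "weight_kg": [70.5, 82.0, 64.3],                        # continuous
    "color": ["red", "green", "blue"],                       # nominal
    "education": ["college", "high school", "graduate"],     # ordinal
})

# Nominal: categories without any order.
df["color"] = pd.Categorical(df["color"])

# Ordinal: categories with an explicit order, lowest first.
df["education"] = pd.Categorical(
    df["education"],
    categories=["high school", "college", "graduate"],
    ordered=True,
)

print(df.dtypes)

# Order-aware operations only make sense for the ordinal column.
lowest_level = df["education"].min()
at_least_college = df["education"] >= "college"
```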


7. How do we handle categorical variables in Machine Learning? What are the common techniques?
Handling categorical variables involves converting them into numerical representations that models can interpret. Common techniques include:

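Label Encoding: Assigns each category a unique integer. The integers are arbitrary, so models that treat inputs as magnitudes (e.g., linear models) may read a false order into nominal categories.
Example: [Red, Blue, Green] → [1, 2, 3].
One-Hot Encoding: Creates one binary column per category.
Example:
Color	Red	Blue	Green
Red	1	0	0
Blue	0	1	0
Green	0	0	1
Ordinal Encoding: Assigns ordered numbers based on category ranking.
Example: [Low, Medium, High] → [1, 2, 3].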
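
The three techniques can be sketched with pandas and scikit-learn as follows. The column names and values are hypothetical. These libraries number categories from 0 rather than 1, so the codes differ from the examples above by an offset.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({
    "color": ["Red", "Blue", "Green", "Red"],
    "priority": ["Low", "High", "Medium", "Low"],
})

# Label encoding: one arbitrary integer per category (order of first appearance).
codes, uniques = pd.factorize(df["color"])

# One-hot encoding: one binary column per category.
one_hot = pd.get_dummies(df["color"], dtype=int)

# Ordinal encoding: integers that follow an explicitly stated ranking.
encoder = OrdinalEncoder(categories=[["Low", "Medium", "High"]])
df["priority_encoded"] = encoder.fit_transform(df[["priority"]]).ravel()

print(codes)
print(one_hot)
print(df)
```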


8. What do you mean by training and testing a dataset?
Training Dataset: The portion of data used to train the model and adjust its parameters.
Testing Dataset: The unseen data used to evaluate the model’s performance and generalization capabilities.
A common split is 80% training and 20% testing.


9. What is sklearn.preprocessing?
The sklearn.preprocessing module in Scikit-learn provides tools to preprocess data, ensuring it is in the correct format for ML algorithms.
Key classes include:

StandardScaler: Standardizes features to have zero mean and unit variance.
MinMaxScaler: Scales features to a specified range (default: 0 to 1).
LabelEncoder: Encodes target labels as integers from 0 to n_classes - 1. For input features, OneHotEncoder or OrdinalEncoder is the intended tool.
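
A short sketch of these three classes on toy data (the arrays and labels below are made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, MinMaxScaler, StandardScaler

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])

# Zero mean, unit variance.
standardized = StandardScaler().fit_transform(X)

# Rescaled to the default range [0, 1].
min_max = MinMaxScaler().fit_transform(X)

# Target labels mapped to integers; classes are sorted alphabetically.
labels = ["cat", "dog", "cat", "bird"]
label_encoder = LabelEncoder()
encoded = label_encoder.fit_transform(labels)

print(standardized.ravel())
print(min_max.ravel())
print(encoded, label_encoder.classes_)
```

In practice a scaler is usually fitted on the training data only and then applied to the test data with `transform`, so that no information from the test set leaks into preprocessing.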


10. What is a Test set?
A Test set is a portion of the dataset that is used to evaluate the trained model’s performance. It helps measure how well the model generalizes to unseen data.
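
To illustrate why a separate test set matters, the sketch below trains an unconstrained decision tree on synthetic, deliberately noisy data and compares accuracy on the training and test sets. The dataset, the model choice, and the random seeds are assumptions made for this example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic classification data with some label noise (flip_y).
X, y = make_classification(n_samples=500, n_features=10, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# A fully grown tree can memorize the training set, noise included.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"train accuracy={train_acc:.2f}, test accuracy={test_acc:.2f}")
```

The gap between the two scores is what the test set reveals: training accuracy alone would overstate how well the model handles unseen data.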


11. How do we split data for model fitting (training and testing) in Python?
Data can be split using train_test_split from Scikit-learn:

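```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # example data; substitute your own features X and target y
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```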
Here:

X_train, y_train: Training data.
X_test, y_test: Testing data.
test_size: Fraction of data reserved for testing (0.2 means 20%).
random_state: Seed for the shuffle, so the split is reproducible.


12. How do you approach a Machine Learning problem?
Understand the Problem: Define objectives and success criteria.
Data Collection: Gather relevant and sufficient data.
Data Cleaning: Handle missing values, duplicates, and outliers.
EDA (Exploratory Data Analysis): Understand data patterns, distributions, and relationships.
Feature Engineering: Create new features, encode categorical data, and scale variables.
Model Selection: Choose a suitable algorithm (e.g., regression, classification).
Training: Train the model using the training set.
Evaluation: Assess the model on held-out data with metrics like accuracy or mean squared error.
Optimization: Tune hyperparameters to improve performance, using a validation set or cross-validation rather than the test set.
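
The steps above can be sketched end to end with scikit-learn. Everything about the data is hypothetical (a synthetic table with `age`, `plan`, and a `churned` target), and logistic regression with a small grid over `C` is just one reasonable choice, not a recommendation for any particular problem.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Data collection: a synthetic stand-in for a real dataset.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "age": rng.integers(18, 70, size=n).astype(float),
    "plan": rng.choice(["basic", "plus", "pro"], size=n),
})
df["churned"] = ((df["age"] + rng.normal(0, 10, size=n)) > 45).astype(int)

# Data cleaning: drop rows with missing values (a no-op on this synthetic data).
df = df.dropna()

X = df[["age", "plan"]]
y = df["churned"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Feature engineering: scale the numeric column, one-hot encode the categorical one.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["plan"]),
])

# Model selection and training, wrapped in a pipeline so that preprocessing
# is fitted on training folds only.
pipeline = Pipeline([
    ("preprocess", preprocess),
    ("model", LogisticRegression(max_iter=1000)),
])

# Optimization: tune the regularization strength C with 5-fold cross-validation.
search = GridSearchCV(pipeline, {"model__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

# Evaluation: score the tuned model once on the untouched test set.
test_accuracy = accuracy_score(y_test, search.predict(X_test))
print(search.best_params_, f"test accuracy={test_accuracy:.2f}")
```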


13. Why do we have to perform EDA before fitting a model to the data?
EDA is crucial for understanding the data and ensuring it is ready for modeling. It helps:

Identify patterns and relationships.
Detect and handle missing or inconsistent data.
Highlight anomalies or outliers.
Select important features.
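
A few of these checks can be sketched with pandas. The small table below is invented for the example, and the 1.5 × IQR rule is only one common convention for flagging outliers.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a missing value in each column and one implausible age.
df = pd.DataFrame({
    "age": [25, 32, 47, 51, np.nan, 38, 29, 120],
    "income": [30_000, 42_000, 61_000, 58_000, 45_000, np.nan, 35_000, 50_000],
    "region": ["north", "south", "north", "east", "south", "north", None, "south"],
})

# Distributions of the numeric columns (count, mean, quartiles, ...).
summary = df.describe()

# Missing values per column.
missing = df.isna().sum()

# Outliers in age using the 1.5 * IQR rule.
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["age"] < q1 - 1.5 * iqr) | (df["age"] > q3 + 1.5 * iqr)]

# Frequency of each category.
region_counts = df["region"].value_counts()

print(summary)
print(missing)
print(outliers)
print(region_counts)
```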


14. How can you find correlation between variables in Python?
Using pandas:

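```python
import pandas as pd

# Example data; substitute your own DataFrame
df = pd.DataFrame({"hours": [1, 2, 3, 4], "score": [52, 60, 71, 80], "errors": [9, 7, 4, 2]})
# numeric_only=True skips non-numeric columns, which would otherwise raise an error in pandas 2.x
correlation = df.corr(numeric_only=True)  # pairwise Pearson correlation
print(correlation)
```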
You can visualize correlations using a heatmap:

```python
import seaborn as sns
import matplotlib.pyplot as plt

# df is the same DataFrame as above
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')
plt.show()
```


15. What is causation? Explain the difference between correlation and causation with an example.
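Causation occurs when a change in one variable directly produces a change in another.

Correlation: Two variables move together but may not have a cause-and-effect relationship.
Example:
Correlation: Ice cream sales and drowning incidents both increase in summer, yet buying ice cream does not cause drowning.
Causation: Hot weather causes more people to swim, leading to more drowning incidents. Hot weather also raises ice cream sales, so it is the common cause (confounder) behind the correlation.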
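
The ice cream example can be simulated as a rough sketch. All numbers are invented: temperature drives both ice cream sales and drownings, and neither affects the other. The raw correlation is high, but it largely disappears once the effect of temperature is removed from both variables (a partial correlation computed from regression residuals).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Temperature is the common cause; the coefficients are arbitrary.
temperature = rng.uniform(10, 35, size=n)
ice_cream_sales = 20 * temperature + rng.normal(0, 50, size=n)
drownings = 0.3 * temperature + rng.normal(0, 1, size=n)

def residuals(target, predictor):
    """Part of target that a straight-line fit on predictor does not explain."""
    slope, intercept = np.polyfit(predictor, target, deg=1)
    return target - (slope * predictor + intercept)

# Raw correlation: strong, although neither variable causes the other.
raw_r = np.corrcoef(ice_cream_sales, drownings)[0, 1]

# Correlation after controlling for temperature: close to zero.
partial_r = np.corrcoef(
    residuals(ice_cream_sales, temperature),
    residuals(drownings, temperature),
)[0, 1]

print(f"raw r={raw_r:.2f}, r after controlling for temperature={partial_r:.2f}")
```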