1. What is a parameter?


A parameter in machine learning is an internal variable of a model that is learned from the data during training. Parameters determine how input data is transformed into the output. Examples include the weights and biases of a linear regression or neural network. They are not set manually (unlike hyperparameters such as the learning rate); the learning algorithm adjusts them automatically to minimize prediction error.
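
As a minimal sketch of this idea, assuming NumPy and scikit-learn are installed, the snippet below fits a linear regression to made-up data generated from y = 3x + 2. The data and variable names are illustrative only; the point is that the weight and bias are read back from the fitted model rather than typed in by hand.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data: y = 3x + 2 with no noise (made up for this sketch)
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = 3 * X.ravel() + 2

model = LinearRegression()
model.fit(X, y)  # the parameters are learned during this call

learned_weight = model.coef_[0]   # slope estimated from the data
learned_bias = model.intercept_   # intercept estimated from the data
print(learned_weight, learned_bias)
```

The printed values should be very close to 3 and 2, the numbers used to generate the data.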

2. What is correlation? What does negative correlation mean?



Correlation is a statistical measure that describes the degree to which two variables move in relation to each other. It quantifies the strength and direction of a linear relationship between two variables. Correlation coefficients range from -1 (perfect negative correlation) to +1 (perfect positive correlation), with 0 indicating no linear relationship.

A negative correlation means that as one variable increases, the other tends to decrease, and vice versa. The variables move in opposite directions. A perfect negative correlation is represented by a coefficient of -1.0. For example, as the price of a product increases, the quantity demanded typically decreases, showing a negative correlation.

3. Define Machine Learning. What are the main components in Machine Learning?


Machine Learning (ML) is a branch of artificial intelligence (AI) that enables computers to learn from data, identify patterns, and make decisions with minimal human intervention. ML systems improve their performance automatically through experience.

Main components in ML:

Data: The raw information used to train models.

Algorithms: The mathematical procedures or rules that process data and learn patterns.

Models: The output of the learning process; models make predictions or decisions based on input data.

Predictions: The results or outputs generated by the model when given new data.

4. How does loss value help in determining whether the model is good or not?


The loss value quantifies the difference between the model's predictions and the actual target values. A lower loss indicates that the model's predictions are closer to the true values, signifying better performance. During training, the model updates its parameters to minimize the loss, thus improving accuracy. Monitoring the loss on both training and validation data helps determine whether the model is learning effectively, overfitting, or in need of adjustments.
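
To make the comparison concrete, here is a small sketch that uses mean squared error (one common loss among many) to score two sets of predictions against the same targets. The numbers and the helper name `mean_squared_error` are made up for illustration.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

y_true = [3.0, 5.0, 7.0, 9.0]
preds_a = [2.5, 5.5, 7.0, 9.5]    # predictions from a hypothetical model A
preds_b = [1.0, 8.0, 4.0, 12.0]   # predictions from a hypothetical model B

loss_a = mean_squared_error(y_true, preds_a)
loss_b = mean_squared_error(y_true, preds_b)
print(loss_a, loss_b)  # the lower value belongs to the better-fitting model
```

Model A has the lower loss here, so by this measure it fits these targets better than model B.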

5. What are continuous and categorical variables?


Continuous variables: Variables that can take any value within a range, including fractional values. Examples: height, weight, temperature.

Categorical variables: Variables that represent qualitative groups or categories. They have a fixed number of possible values (categories), such as gender, color, or type of animal.

6. How do we handle categorical variables in Machine Learning? What are the common techniques?

Categorical variables must be converted to numerical representations for most ML algorithms. Common techniques include:

One-hot encoding: Creates binary columns for each category.

Label encoding: Assigns a unique integer to each category.

Target encoding: Replaces categories with the mean of the target variable for each category.

Binary encoding, dummy encoding, effect encoding: Other techniques, useful for high-cardinality features or specific modeling needs.
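
The first three techniques can be sketched with pandas alone. The DataFrame below, with its `color` and `price` columns, is made up for illustration, and `price` plays the role of the target.

```python
import pandas as pd

df = pd.DataFrame({
    "color": ["red", "blue", "red", "green", "blue", "red"],
    "price": [10.0, 20.0, 14.0, 30.0, 22.0, 12.0],
})

# One-hot encoding: one binary column per category
one_hot = pd.get_dummies(df["color"], prefix="color")

# Label encoding: an arbitrary integer code per category (alphabetical order here)
label_codes = df["color"].astype("category").cat.codes

# Target encoding: replace each category with the mean target value for that category
category_means = df.groupby("color")["price"].mean()
target_encoded = df["color"].map(category_means)

print(one_hot)
print(label_codes.tolist())
print(target_encoded.tolist())
```

In a real project the target-encoding means would normally be computed on the training split only; computing them on the full dataset, as this sketch does for brevity, leaks target information into the features.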

7. What do you mean by training and testing a dataset?


Training: Using a subset of the data (the training set) to fit the model so that it learns patterns and relationships.

Testing: Using a separate, unseen subset (the test set) to evaluate how well the trained model generalizes to new data. A large gap between training and test performance signals overfitting, and the test score gives an estimate of how the model will perform on real-world data.

8. What is sklearn.preprocessing?

sklearn.preprocessing is a module in the scikit-learn library that provides tools for preprocessing data, such as scaling, normalizing, encoding categorical variables, and transforming features. These transformations help prepare raw data for machine learning algorithms.

9. What is a Test set?

A test set is a portion of the dataset that is kept separate from the training data. It is used exclusively to evaluate the final model's performance and generalization ability. The model has never seen this data during training, ensuring an unbiased assessment.

10. How do we split data for model fitting (training and testing) in Python? How do you approach a Machine Learning problem?

To split data for model fitting in Python, the most common approach is to use the train_test_split function from scikit-learn’s model_selection module. This function divides your dataset into two parts: a training set (used to train the model) and a test set (used to evaluate the model’s performance on unseen data).

Typical usage:

```python
from sklearn.model_selection import train_test_split

# X: features, y: target variable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```

test_size=0.2 means 20% of the data is reserved for testing and 80% for training.

random_state ensures reproducibility.

You can also specify train_size instead of test_size, or both.

The function shuffles the data by default before splitting.
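
For a version that runs end to end, the sketch below builds a made-up dataset (100 rows, 3 features, a 70/30 class balance) and splits it. It also passes `stratify=y`, an optional argument not used above, which keeps the class proportions the same in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 100 samples, 3 features, binary target with a 70/30 class split
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([0] * 70 + [1] * 30)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)    # 80 training rows and 20 test rows
print(y_train.mean(), y_test.mean())  # share of class 1 in each part
```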

A structured approach to a machine learning problem typically involves the following steps:

Define the Problem

Clearly state what you are trying to predict or classify. Understand the business or research objective.

Collect and Explore Data

Gather relevant and high-quality data. Perform exploratory data analysis (EDA) to understand data distribution, spot anomalies, and identify patterns.

Prepare the Data

Clean the data (handle missing values, outliers, duplicates), and preprocess features (scaling, encoding categorical variables, feature engineering).

Split the Data

Divide the data into training and test sets (and sometimes a validation set) to ensure unbiased model evaluation.

Choose and Train a Model

Select an appropriate algorithm based on the problem type (classification, regression, etc.) and train it on the training data.

Evaluate the Model

Assess model performance using suitable metrics (accuracy, precision, recall, RMSE, etc.) on the test set.

Tune and Optimize

Adjust model hyperparameters, try different algorithms, or engineer new features to improve performance.

Deploy and Monitor

Once satisfied, deploy the model for real-world use and monitor its performance over time.
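
The middle steps (prepare, split, train, evaluate) can be sketched in a few lines. This is only one possible arrangement, assuming scikit-learn is installed; the built-in iris dataset stands in for a real problem, and logistic regression is an arbitrary choice of model.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Collect the data (a small built-in dataset stands in for a real problem)
X, y = load_iris(return_X_y=True)

# Split before any fitting so the test set stays unseen
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Prepare and train: the scaler and the model are fitted on training data only
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipeline.fit(X_train, y_train)

# Evaluate on the held-out test set
test_accuracy = accuracy_score(y_test, pipeline.predict(X_test))
print(f"Test accuracy: {test_accuracy:.3f}")
```

Wrapping the scaler and the model in one pipeline is a design choice that keeps preprocessing statistics from leaking out of the training set.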

11. Why do we have to perform EDA before fitting a model to the data?


EDA shows what kind of data we have, how much of it is missing, and whether it contains anomalies such as outliers, duplicates, or inconsistent values. It guides the cleaning and preprocessing that make the data ready for modeling, and it helps in choosing suitable features and algorithms. Skipping EDA risks fitting a model to flawed data, which makes the results unreliable.
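
A first pass of EDA often takes only a few pandas calls. The small DataFrame below is made up for illustration; it contains one missing value per column and one implausible age.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25, 32, np.nan, 41, 29, 250],   # one missing value, one implausible value
    "city": ["Pune", "Delhi", "Delhi", None, "Pune", "Mumbai"],
    "income": [30000, 52000, 48000, 61000, np.nan, 45000],
})

missing_per_column = df.isna().sum()      # how much data is missing in each column
summary = df.describe()                   # count, mean, std, min, quartiles, max
city_counts = df["city"].value_counts()   # frequency of each category

print(missing_per_column)
print(summary)
print(city_counts)
```

The maximum age of 250 in the summary is the kind of anomaly that should be investigated before any model is fitted.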

12. What is correlation?


Correlation is a statistical measure that describes the degree to which two variables move in relation to each other. It quantifies the strength and direction of a linear relationship between two variables. Correlation coefficients range from -1 (perfect negative correlation) to +1 (perfect positive correlation), with 0 indicating no linear relationship.

13. What does negative correlation mean?


A negative correlation means that as one variable increases, the other tends to decrease, and vice versa. The variables move in opposite directions. A perfect negative correlation is represented by a coefficient of -1.0. For example, as the price of a product increases, the quantity demanded typically decreases, showing a negative correlation.

14. How can you find correlation between variables in Python?


You can find the correlation between variables in Python using several libraries, most commonly pandas, NumPy, and SciPy. Here are the main approaches:

1. Using pandas
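If your data is in a pandas DataFrame, you can use the .corr() method to compute the correlation matrix for all numeric columns, or the correlation between two specific columns. In pandas 2.0 and later, pass numeric_only=True to df.corr() if the DataFrame also contains non-numeric columns.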

```python
import pandas as pd

# For the whole DataFrame
df.corr()

# For two specific columns
df['col1'].corr(df['col2'])
```

By default, .corr() computes the Pearson correlation. You can specify other methods like 'spearman' or 'kendall':

```python
df.corr(method='spearman')
df['col1'].corr(df['col2'], method='kendall')
```
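
The NumPy and SciPy routes mentioned at the start of this answer look like this. The sketch assumes both libraries are installed; the `price` and `demand` arrays are made up to mimic the price and demand example used earlier.

```python
import numpy as np
from scipy import stats

# Made-up data: demand falls as price rises
price = np.array([10.0, 12.0, 14.0, 16.0, 18.0, 20.0])
demand = np.array([95.0, 90.0, 82.0, 75.0, 71.0, 60.0])

# NumPy: the off-diagonal entry of the 2x2 correlation matrix is Pearson's r
r_numpy = np.corrcoef(price, demand)[0, 1]

# SciPy: returns the coefficient together with a p-value
r_scipy, p_value = stats.pearsonr(price, demand)
rho, _ = stats.spearmanr(price, demand)   # rank-based (Spearman) correlation

print(r_numpy, r_scipy, rho)
```

Both Pearson values agree and are strongly negative, and the Spearman coefficient is -1 because demand decreases at every step as price increases.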

15. What is causation? Explain difference between correlation and causation with an example.


### What is Causation?

**Causation** means that a change in one variable directly causes a change in another variable. In other words, there is a cause-and-effect relationship: when variable A changes, it produces a change in variable B. Establishing causation requires evidence that the effect is a result of the cause, not just that the two variables move together.

---

### Difference Between Correlation and Causation

| Correlation                                                | Causation                                                         |
|------------------------------------------------------------|-------------------------------------------------------------------|
| Indicates a statistical association between variables | Indicates that one variable directly affects another |
| Variables change together, but not necessarily due to cause | One variable’s change produces a change in the other              |
| Does not prove cause-and-effect                             | Proves cause-and-effect                                           |
| Can be due to coincidence or a third/confounding variable   | Requires evidence, often from experiments, to rule out other causes|
| Example: Ice cream sales and sunburns rise together in summer, but one does not cause the other; both are influenced by hot weather | Example: Increasing the temperature of water causes it to boil    |

---

### **Example to Illustrate the Difference**

- **Correlation Example:**  
  There is a strong correlation between ice cream sales and the number of people who get sunburned. Both increase during the summer months. However, eating ice cream does not cause sunburns. Instead, a third variable, hot weather, causes both to increase. This is correlation, not causation.

- **Causation Example:**  
  If research shows that taking a specific medication (variable A) directly leads to a reduction in blood pressure (variable B), and this has been demonstrated through controlled experiments that rule out other factors, then we have established causation.

---

### **Key Takeaways**

- **Correlation** is about variables moving together, but not necessarily because one causes the other.
- **Causation** is when one variable’s change produces a direct effect on another.
- Correlation does **not** imply causation; mistaking the two can lead to false conclusions and poor decision-making.

---

**In summary:**  
Causation is a cause-and-effect relationship between two variables, while correlation simply means they move together. Establishing causation requires more rigorous evidence, often through experiments, to rule out other explanations for the observed relationship.
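
The ice cream and sunburn example can be simulated. The sketch below is a toy model under stated assumptions: the coefficients, noise levels, and the helper `residuals` are invented for illustration, and temperature is made the only driver of both variables. The two variables end up strongly correlated, yet the correlation almost vanishes once the effect of temperature is removed from each of them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical data-generating process: temperature drives both variables,
# and neither variable influences the other.
temperature = rng.normal(25, 5, size=n)
ice_cream_sales = 2.0 * temperature + rng.normal(0, 3, size=n)
sunburns = 1.5 * temperature + rng.normal(0, 3, size=n)

raw_r = np.corrcoef(ice_cream_sales, sunburns)[0, 1]

def residuals(y, x):
    """Part of y that a straight-line fit on x does not explain."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

# Correlation once the effect of temperature is removed from both variables
partial_r = np.corrcoef(
    residuals(ice_cream_sales, temperature),
    residuals(sunburns, temperature),
)[0, 1]

print(f"raw correlation: {raw_r:.2f}, after controlling for temperature: {partial_r:.2f}")
```

Controlling for a known confounder in this way does not prove causation on its own; it only shows that this particular association is explained by the third variable.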

16. What is an Optimizer? What are different types of optimizers? Explain each with an example.


An optimizer is an algorithm used in machine learning and deep learning to adjust the parameters (weights and biases) of a model during training in order to minimize the loss function. The optimizer determines how the model learns from data by updating parameters based on the gradients calculated from the loss, thereby improving the model’s predictions.

"Imagine you’re on a mountain and your goal is to get to the bottom where there’s a beautiful valley (the optimal solution). The optimizer takes steps in the direction where the ground declines the most, adjusting parameters to minimize the loss function."

Types of Optimizers and Their Explanations
Below are some of the most widely used optimizers in machine learning and deep learning, with brief explanations and examples for each:

1. Gradient Descent (GD)
How it works: Updates parameters by computing the gradient of the loss function using the entire dataset and moves in the direction of the steepest descent (negative gradient).

Pros: Simple and effective for small datasets.

Cons: Slow for large datasets, can get stuck in local minima, sensitive to learning rate.

Example: Used in linear regression to find the best-fitting line by minimizing mean squared error.

2. Stochastic Gradient Descent (SGD)
How it works: Similar to GD, but updates parameters using a single data point or a small batch (mini-batch) at each iteration, introducing randomness.

Pros: Faster and more scalable for large datasets, can escape local minima.

Cons: Updates are noisy, may require more epochs and careful tuning of learning rate.

Example: Commonly used in training neural networks for image classification.

3. Momentum
How it works: Builds on SGD by adding a fraction of the previous update to the current update, helping to accelerate convergence and dampen oscillations.

Pros: Faster convergence, helps overcome local minima.

Cons: Sensitive to hyperparameters (momentum factor).

Example: Used in deep neural networks to improve training speed and stability.

4. Nesterov Accelerated Gradient (NAG)
How it works: An improvement on momentum; computes the gradient not just at the current position but anticipates the future position, leading to more accurate updates.

Pros: Faster and more precise convergence than vanilla momentum.

Cons: More computationally expensive.

Example: Used in deep learning tasks requiring faster convergence.

5. Adagrad
How it works: Adapts the learning rate for each parameter based on the sum of past squared gradients, making larger updates for infrequent parameters.

Pros: Good for sparse data (e.g., NLP).

Cons: Learning rate can become excessively small over time.

Example: Used in natural language processing tasks.

6. RMSProp
How it works: Modifies Adagrad by using a moving average of squared gradients to normalize the learning rate, preventing it from shrinking too much.

Pros: Works well for non-stationary objectives, effective for RNNs and deep networks.

Cons: Requires tuning of decay rate hyperparameter.

Example: Used in training recurrent neural networks.

7. Adam (Adaptive Moment Estimation)
How it works: Combines the benefits of momentum and RMSProp by maintaining moving averages of both the gradients and their squares, with bias correction.

Pros: Fast convergence, less sensitive to hyperparameters, widely used and robust.

Cons: Can sometimes generalize worse than SGD in some scenarios.

Example: A common default choice in deep learning projects; used in image recognition, NLP, and more.
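
To show how the update rules differ, the sketch below implements three of them (gradient descent, momentum, and Adam) on a toy one-dimensional loss f(w) = (w - 3)^2. The loss, the starting point, the step counts, and the hyperparameter values are all illustrative assumptions; deep learning libraries ship tuned implementations that should be preferred in practice.

```python
import numpy as np

def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2, which is minimized at w = 3."""
    return 2.0 * (w - 3.0)

def gradient_descent(w=0.0, lr=0.1, steps=300):
    for _ in range(steps):
        w -= lr * grad(w)                      # step against the gradient
    return w

def momentum(w=0.0, lr=0.1, beta=0.9, steps=300):
    velocity = 0.0
    for _ in range(steps):
        velocity = beta * velocity + grad(w)   # accumulate past gradients
        w -= lr * velocity
    return w

def adam(w=0.0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g        # moving average of gradients
        v = beta2 * v + (1 - beta2) * g ** 2   # moving average of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for the early steps
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

w_gd, w_momentum, w_adam = gradient_descent(), momentum(), adam()
print(w_gd, w_momentum, w_adam)  # all three should end close to 3
```

All three reach the same minimum on this simple problem; the differences between them matter on noisy, high-dimensional losses, where adaptive step sizes and momentum change how quickly and how stably training converges.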

17. What is sklearn.linear_model ?

sklearn.linear_model is a module in the scikit-learn library that provides a variety of linear models for regression and classification tasks, such as LinearRegression, Ridge, Lasso, and LogisticRegression. These models assume that the target (or, for classifiers, the decision score) is a linear combination of the input features, making them both simple and interpretable.
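
The "linear combination" idea can be checked directly. In this sketch, with made-up data, a LogisticRegression classifier is fitted and its raw decision score is recomputed by hand from the learned coefficients and intercept.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up, linearly separable data: class 1 has larger feature values
X = np.array([[0.0, 0.2], [0.5, 0.1], [1.0, 0.8], [3.0, 2.5], [3.5, 3.0], [4.0, 2.8]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)

# The raw score is a weighted sum of the features plus a bias term
manual_scores = X @ clf.coef_.ravel() + clf.intercept_[0]
library_scores = clf.decision_function(X)

print(np.allclose(manual_scores, library_scores))
```

A positive score maps to class 1 and a negative score to class 0, which is why the coefficients can be read as the direction and strength of each feature's influence.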

18. What does model.fit() do? What arguments must be given?

model.fit() trains (fits) the machine learning model to your data.

Required arguments:

X: Feature matrix (input variables, usually a 2D array or DataFrame)

y: Target variable (output values, usually a 1D array or Series). Unsupervised estimators such as KMeans need only X.

Example:

```python
model.fit(X_train, y_train)
```

This finds the best parameters for the model using the training data.

19. What does model.predict() do? What arguments must be given?


model.predict() uses the trained model to make predictions on new data.

Required argument:

X: Feature matrix of new/unseen data (same structure as training features)

Example:

```python
predictions = model.predict(X_test)
```

This returns predicted values for each row in X_test.
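
Putting fit and predict together, here is a small runnable sketch. The two clusters of points are made up, and KNeighborsClassifier is an arbitrary choice; any scikit-learn supervised estimator follows the same two calls.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Made-up training data: two well-separated clusters
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)           # learn from features and labels

# New, unlabeled rows with the same two columns as the training features
X_new = np.array([[1.1, 0.9], [5.1, 5.0]])
predictions = model.predict(X_new)    # one prediction per row of X_new
print(predictions)
```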

20. What are continuous and categorical variables?


Continuous variables are quantitative variables that can take any value within a given range, including fractional or decimal values. They are measurable and can assume an infinite number of possible values within that range. Examples include height, weight, temperature, and delivery time: any variable that can be measured on a scale and can have any value, not just whole numbers.

Categorical variables (also called qualitative variables) represent distinct groups or categories. They contain a finite number of possible values, usually describing qualities or characteristics rather than measurements. Categorical variables can be:

Nominal: Categories with no logical order (e.g., hair color, city, pizza topping).

Ordinal: Categories with a logical order (e.g., education level, pizza size: small, medium, large).

21. What is feature scaling? How does it help in Machine Learning?


Feature scaling is the process of transforming numerical features to a common scale or range, such as 0–1 (normalization) or mean 0 and standard deviation 1 (standardization).

Why it helps:

Prevents features with large ranges from dominating the model.

Improves convergence speed and accuracy for many algorithms, especially those using distance calculations (e.g., k-NN, SVM, K-Means) or gradient descent.

Keeps regularization penalties (e.g., in Ridge or Lasso) from treating features unequally only because of their units.

Makes the coefficients of linear models easier to compare across features.
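
The two transformations named above reduce to short formulas. The sketch below, assuming NumPy and scikit-learn are installed and using a made-up age and income table, computes both by hand and compares the results with scikit-learn's scalers.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Made-up features on very different scales: age in years, income in currency units
X = np.array([[25.0, 30000.0], [35.0, 60000.0], [45.0, 90000.0], [55.0, 120000.0]])

# Standardization: z = (x - mean) / standard deviation, per column
z_manual = (X - X.mean(axis=0)) / X.std(axis=0)
z_sklearn = StandardScaler().fit_transform(X)

# Min-max normalization: (x - min) / (max - min), per column
mm_manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
mm_sklearn = MinMaxScaler().fit_transform(X)

print(np.allclose(z_manual, z_sklearn), np.allclose(mm_manual, mm_sklearn))
```

After either transformation the two columns occupy the same numeric range, so the income column no longer dominates distance or gradient calculations simply because its raw numbers are larger.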

22. How do we perform scaling in Python?


Use scikit-learn’s sklearn.preprocessing module.

Example (Standardization):

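```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean and std from training data only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics
```

Example (Normalization):

```python
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn min and max from training data only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics
```

Always fit the scaler on the training data only, then transform both the training and test data with the fitted scaler. Fitting on the full dataset leaks information from the test set into training.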

23. What is sklearn.preprocessing?


sklearn.preprocessing is a module in scikit-learn that provides tools for preprocessing data, such as scaling (standardization, normalization), encoding categorical variables, and transforming features to prepare them for machine learning algorithms.

24. How do we split data for model fitting (training and testing) in Python?


To split data for model fitting in Python, the most common approach is to use the train_test_split function from scikit-learn’s model_selection module. This function divides your dataset into two parts: a training set (used to train the model) and a test set (used to evaluate the model’s performance on unseen data).

Typical usage:

```python
from sklearn.model_selection import train_test_split

# X: features, y: target variable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```

test_size=0.2 means 20% of the data is reserved for testing and 80% for training.

random_state ensures reproducibility.

You can also specify train_size instead of test_size, or both.

The function shuffles the data by default before splitting.

25. Explain data encoding?


Data encoding is the process of converting categorical variables into a numerical format so that machine learning algorithms can use them.

Common techniques:

Label Encoding: Assigns each category a unique integer.

One-Hot Encoding: Creates binary columns for each category.

Ordinal Encoding: Assigns ordered integers to ordered categories.

Encoding is often performed using sklearn.preprocessing tools such as OneHotEncoder and OrdinalEncoder for input features, and LabelEncoder for the target labels.
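
A minimal sketch of the three scikit-learn encoders follows. The `sizes`, `colors`, and `labels` values are made up for illustration; calling `.toarray()` on the one-hot result is just a convenient way to view the default sparse output as a dense array.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, OrdinalEncoder

sizes = np.array([["small"], ["large"], ["medium"], ["small"]])
colors = np.array([["red"], ["blue"], ["green"], ["red"]])
labels = ["cat", "dog", "cat", "bird"]

# Ordinal encoding: pass the category order explicitly so the integers respect it
ordinal = OrdinalEncoder(categories=[["small", "medium", "large"]])
size_codes = ordinal.fit_transform(sizes)

# One-hot encoding: one binary column per category (output is sparse by default)
one_hot = OneHotEncoder()
color_matrix = one_hot.fit_transform(colors).toarray()

# Label encoding: intended for the target column; classes are sorted alphabetically
label_codes = LabelEncoder().fit_transform(labels)

print(size_codes.ravel())
print(one_hot.categories_[0], color_matrix, sep="\n")
print(label_codes)
```

Without the explicit `categories` argument, OrdinalEncoder would sort the sizes alphabetically (large, medium, small), which would not match their natural order.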