1. What is a parameter?

A **parameter** is a **numerical value that describes a characteristic of an entire population**.

### In simple terms:

* It is **fixed** (does not change)
* It is usually **unknown**
* It is calculated from **population data**

### Examples:

* The **average height of all people in India**
* The **true mean income of all employees in a company**
* The **population proportion** of voters who support a candidate

### In statistics:

* Parameters are often denoted by **Greek letters**, such as:

  * **μ (mu)** → population mean
  * **σ (sigma)** → population standard deviation
  * **p** → population proportion (a Latin letter; some textbooks write **π** instead)

### Parameter vs Statistic:

* **Parameter** → describes a **population**
* **Statistic** → describes a **sample**
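The distinction can be illustrated with a small simulation. This is only a sketch: the "population" below is synthetic (normally distributed heights with an assumed mean of 165 cm), and the names `population`, `mu`, and `x_bar` are chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical population: heights (cm) of 100,000 people.
population = rng.normal(loc=165.0, scale=10.0, size=100_000)

# Parameter: computed from the ENTIRE population (usually unknown in practice).
mu = population.mean()

# Statistic: computed from a random sample drawn from that population.
sample = rng.choice(population, size=200, replace=False)
x_bar = sample.mean()

print(f"population mean (parameter): {mu:.2f}")
print(f"sample mean (statistic):     {x_bar:.2f}")
```

The sample mean lands near the population mean but rarely equals it, which is why a statistic is treated as an estimate of the parameter.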


2. What is correlation? What does negative correlation mean?

### **What is correlation?**

**Correlation** is a statistical measure that shows **how two variables are related**—that is, how changes in one variable are associated with changes in another.

* It tells the **direction** and **strength** of a relationship.
* The correlation value ranges from **–1 to +1**.

### **What does negative correlation mean?**

A **negative correlation** means that **when one variable increases, the other decreases**, and vice versa.

### **Examples of negative correlation:**

* As **price increases**, **demand decreases**
* As **speed increases**, **time to reach a destination decreases**
* As **exercise time increases**, **body fat percentage decreases**

### **Types of correlation (quick view):**

* **Positive correlation (+)**: Both variables move in the same direction
* **Negative correlation (–)**: Variables move in opposite directions
* **Zero correlation (0)**: No **linear** relationship between variables (a non-linear relationship may still exist)
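A minimal sketch of the three cases, using NumPy's `np.corrcoef` on small hand-made arrays (the data and the helper `pearson_r` are made up for illustration). The last case shows that a coefficient of 0 only rules out a linear trend.

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

y_pos = 2.0 * x + 1.0    # rises as x rises
y_neg = 10.0 - 3.0 * x   # falls as x rises
y_sq = x ** 2            # depends on x, but not linearly

def pearson_r(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

r_pos = pearson_r(x, y_pos)
r_neg = pearson_r(x, y_neg)
r_zero = pearson_r(x, y_sq)

print(r_pos, r_neg, r_zero)
```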


3. Define Machine Learning. What are the main components in Machine Learning?

### **Definition of Machine Learning**

**Machine Learning (ML)** is a branch of Artificial Intelligence (AI) that enables computers to **learn from data and improve their performance automatically without being explicitly programmed**.

In simple words, machines identify **patterns in data** and make **predictions or decisions** based on those patterns.

---

### **Main Components of Machine Learning**

1. **Data**

   * The most important component.
   * Includes training data and testing data.
   * Can be structured (tables) or unstructured (images, text).

2. **Features**

   * Individual measurable properties or variables in the data.
   * Example: age, salary, marks, pixels in an image.

3. **Model**

   * A mathematical representation that learns patterns from data.
   * Examples: Linear Regression, Decision Trees, Neural Networks.

4. **Algorithm**

   * The method used to train the model.
   * Examples: Gradient Descent, k-NN, Backpropagation.

5. **Training Process**

   * The process of feeding data to the algorithm so the model can learn.
   * Adjusts model parameters to minimize errors.

6. **Evaluation**

   * Measures how well the model performs.
   * Examples: Accuracy, Precision, Recall, RMSE.

7. **Prediction / Inference**

   * Using the trained model to make predictions on new, unseen data.

---


4. How does loss value help in determining whether the model is good or not?

**Loss value** measures **how far the model’s predictions are from the actual values**.

* **Low loss** → predictions are close to actual values → the model fits that data well
* **High loss** → predictions are far from actual values → **poor fit**
* If loss **decreases during training**, the model is learning
* If loss **stays high or increases**, the model is not learning properly
* Loss on **unseen (validation/test) data** matters most: low training loss with high validation loss indicates **overfitting**
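As a small illustration, the sketch below computes mean squared error (one common loss, chosen here as an assumption) for two hypothetical sets of predictions against the same targets.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the average of the squared prediction errors."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

y_true = [3.0, 5.0, 7.0, 9.0]
close_preds = [2.9, 5.2, 6.8, 9.1]   # predictions near the targets
far_preds = [6.0, 1.0, 10.0, 4.0]    # predictions far from the targets

loss_close = mse(y_true, close_preds)
loss_far = mse(y_true, far_preds)

print(f"loss (close predictions): {loss_close:.3f}")
print(f"loss (far predictions):   {loss_far:.3f}")
```

The first loss is far smaller than the second, matching the intuition that lower loss means predictions sit closer to the actual values.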



5. What are continuous and categorical variables?

* **Continuous variables**: Numerical values that can take **any value within a range**.
  *Example:* height, weight, temperature

* **Categorical variables**: Variables that represent **distinct groups or categories**.
  *Example:* gender, blood group, color, yes/no


6. How do we handle categorical variables in Machine Learning? What are the common techniques?

**Categorical variables** are handled in Machine Learning by **converting them into numerical form**, since ML models work with numbers.

### **Common techniques (short):**

* **Label Encoding** – Assigns a unique number to each category; the numbers carry no meaningful order
  *Example:* Blue=0, Green=1, Red=2

* **One-Hot Encoding** – Creates binary columns for each category
  *Example:* Color → Red, Blue, Green (0/1 columns)

* **Ordinal Encoding** – Used when categories have a meaningful order
  *Example:* Poor < Average < Good

* **Target Encoding** – Replaces categories with the mean of the target variable
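Two of these techniques can be sketched with pandas alone. The column names (`size`, `color`) and the category order in `size_order` are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "size": ["Low", "High", "Medium", "Low"],
    "color": ["Red", "Blue", "Green", "Red"],
})

# Ordinal encoding: an explicit mapping preserves the intended order.
size_order = {"Low": 0, "Medium": 1, "High": 2}
df["size_encoded"] = df["size"].map(size_order)

# One-hot encoding: one 0/1 column per category.
color_dummies = pd.get_dummies(df["color"], prefix="color", dtype=int)

encoded = pd.concat([df[["size_encoded"]], color_dummies], axis=1)
print(encoded)
```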


7. What do you mean by training and testing a dataset?

* **Training dataset**: The portion of data used to **teach the model** by learning patterns and relationships.

* **Testing dataset**: The portion of data used to **evaluate the model’s performance** on unseen data.


8. What is sklearn.preprocessing?

**`sklearn.preprocessing`** is a module in **Scikit-learn** used to **prepare and transform data** before applying Machine Learning models.

### **It is used for:**

* **Scaling** features (StandardScaler, MinMaxScaler)
* **Encoding categorical data** (LabelEncoder, OneHotEncoder)
* **Normalizing** data
* Note: **missing values** are handled by `SimpleImputer`, which belongs to `sklearn.impute`, not `sklearn.preprocessing`


9. What is a Test set?

A **test set** is a portion of the dataset used to **evaluate the performance of a trained machine learning model**.

* It contains **unseen data**
* It checks how well the model **generalizes**
* It is **not used during training**


10. How do we split data for model fitting (training and testing) in Python?
How do you approach a Machine Learning problem?

### **1. How do we split data for model fitting in Python?**

We use **`train_test_split()`** from **`sklearn.model_selection`**.

```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```

* **80% training**, **20% testing** (common split)
* `random_state` ensures reproducibility

---

### **2. How do you approach a Machine Learning problem? (Short)**

1. Understand the problem
2. Collect & clean data
3. Perform feature engineering
4. Split data (train/test)
5. Select & train model
6. Evaluate performance
7. Tune & deploy
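The steps above can be sketched end to end with scikit-learn. The dataset (`load_iris`), the scaler, and the model (`LogisticRegression`) are assumptions picked to keep the example short; steps 1, 2 and 7 are only hinted at in the comments.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Steps 1-2: problem = classify iris species; the data is already clean.
X, y = load_iris(return_X_y=True)

# Step 4: hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Steps 3 and 5: scale the features, then train a classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Step 6: evaluate on data the model has not seen.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"test accuracy: {accuracy:.2f}")
```

Wrapping the scaler and the model in one pipeline means the scaler is fitted on the training rows only, so no information from the test set leaks into training.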



11. Why do we have to perform EDA before fitting a model to the data?

We perform **Exploratory Data Analysis (EDA)** before fitting a model to:

* **Understand the data** (structure, distributions, relationships)
* **Identify missing values** and errors
* **Detect outliers** and anomalies
* **Check patterns and correlations**
* **Decide proper preprocessing** (scaling, encoding, transformations)
* **Choose the right model**


12. What is correlation?

**Correlation** is a statistical measure that shows **the strength and direction of the relationship between two variables**.

* Values range from **–1 to +1**
* **+1** → perfect positive relationship
* **–1** → perfect negative relationship
* **0** → no linear relationship


13. What does negative correlation mean?

**Negative correlation** means that **two variables move in opposite directions**.

* When one variable **increases**, the other **decreases**
* When one variable **decreases**, the other **increases**

**Example:**
As **price increases**, **demand decreases**


14. How can you find correlation between variables in Python?

You can find correlation between variables in Python mainly using **Pandas** (and optionally NumPy).

### Using Pandas

```python
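# numeric_only=True (pandas 1.5+) skips text columns; pandas 2.x raises an error on them otherwise
df.corr(numeric_only=True)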
```

* Computes the **correlation matrix** between numerical variables
* Default method: **Pearson**

### Specific methods

```python
df.corr(method='pearson')   # linear relationship
df.corr(method='spearman')  # rank-based, captures monotonic relationships
df.corr(method='kendall')   # rank-based (Kendall's tau), suits ordinal data and ties
```

### Between two columns

```python
df['col1'].corr(df['col2'])
```
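A self-contained version of the calls above, on a small made-up DataFrame (the column names and values are assumptions). Selecting the numeric columns first keeps the example independent of the pandas version.

```python
import pandas as pd

df = pd.DataFrame({
    "price": [10, 12, 14, 16, 18],
    "demand": [100, 90, 80, 70, 60],
    "region": ["N", "S", "N", "S", "N"],  # non-numeric, excluded below
})

# Correlation matrix over the numeric columns only.
numeric = df.select_dtypes(include="number")
corr_matrix = numeric.corr()

# Correlation between two specific columns.
r = df["price"].corr(df["demand"])

print(corr_matrix)
print(f"price vs demand: {r:.2f}")
```

Here demand falls by the same amount for every price step, so the coefficient is a perfect negative correlation.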


15. What is causation? Explain difference between correlation and causation with an example.

### **What is causation?**

**Causation** means that **one variable directly causes a change in another variable**.

---

### **Difference between correlation and causation**

| Correlation                            | Causation                               |
| -------------------------------------- | --------------------------------------- |
| Shows a relationship between variables | Shows a cause-and-effect relationship   |
| Variables move together                | One variable directly affects the other |
| Does **not** imply cause               | Implies direct cause                    |

---

### **Example**

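* **Correlation:** Ice cream sales and drowning incidents both increase in summer, but neither causes the other.
* **Causation:** Hot weather causes more people to swim → increases drowning incidents. Hot weather also raises ice cream sales, so it is the common cause behind the correlation.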
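The example can be simulated. In this sketch all numbers are invented: temperature drives both quantities, and ice cream sales have no direct effect on drownings. The two series still correlate strongly until the effect of temperature is removed.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 5_000

# Hypothetical data: temperature is the common cause of both variables.
temperature = rng.uniform(10.0, 40.0, size=n)
ice_cream = 5.0 * temperature + rng.normal(0.0, 20.0, size=n)
drownings = 0.3 * temperature + rng.normal(0.0, 2.0, size=n)

# Raw correlation between the two effects.
r_raw = np.corrcoef(ice_cream, drownings)[0, 1]

def residuals(y, x):
    """Part of y left over after removing a straight-line fit on x."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return y - (slope * x + intercept)

# Correlation after controlling for temperature (a partial correlation).
r_partial = np.corrcoef(
    residuals(ice_cream, temperature),
    residuals(drownings, temperature),
)[0, 1]

print(f"raw correlation:             {r_raw:.2f}")
print(f"controlling for temperature: {r_partial:.2f}")
```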



16. What is an Optimizer? What are different types of optimizers? Explain each with an example.

### **What is an Optimizer?**

In Machine Learning and Deep Learning, an **optimizer** is an algorithm that **adjusts the model’s parameters (weights) to minimize the loss function** during training.

* Goal: **Find the best weights** that make predictions accurate.
* Works by updating weights based on **gradients** (from backpropagation in neural networks).

---

### **Types of Optimizers**

1. **Gradient Descent (GD)**

   * Updates weights using the **average gradient of the whole dataset**.
   * **Pros:** Simple and stable.
   * **Cons:** Slow for large datasets.
   * **Example:** Linear regression using GD.

2. **Stochastic Gradient Descent (SGD)**

   * Updates weights using **one training example at a time**.
   * **Pros:** Faster updates; the noise can help escape shallow local minima.
   * **Cons:** Noisy updates, can oscillate.
   * **Example:** Neural network training on images.

3. **Mini-batch Gradient Descent**

   * Updates weights using a **small batch of data** instead of the full dataset.
   * **Pros:** Balances stability and speed.
   * **Example:** Training CNNs with batch size 32.

4. **Momentum**

   * Accelerates SGD by **adding a fraction of the previous update** to the current update.
   * **Pros:** Faster convergence, dampens oscillation.
   * **Example:** Image classification models.

5. **Adam (Adaptive Moment Estimation)**

   * Combines **Momentum + RMSProp**; adapts learning rate for each parameter.
   * **Pros:** Fast, widely used, works well in practice.
   * **Example:** Most modern deep learning models like Transformers.

6. **RMSProp**

   * Adjusts learning rate based on **recent squared gradients**.
   * **Pros:** Handles non-stationary objectives, good for RNNs.
   * **Example:** LSTM for sequence prediction.
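The update rules of four of these optimizers can be compared on a toy problem. This is a sketch under stated assumptions: the loss is the one-dimensional function f(w) = (w - 3)^2, the hyperparameters are arbitrary, and the function names are made up. SGD and mini-batch GD use the same update as plain GD and differ only in how much data feeds each gradient, so they are not shown separately.

```python
import numpy as np

def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2, which is minimized at w = 3."""
    return 2.0 * (w - 3.0)

def run_gd(lr=0.1, steps=200):
    w = 0.0
    for _ in range(steps):
        w -= lr * grad(w)                      # step against the gradient
    return w

def run_momentum(lr=0.1, beta=0.9, steps=300):
    w, velocity = 0.0, 0.0
    for _ in range(steps):
        velocity = beta * velocity - lr * grad(w)  # keep part of the last update
        w += velocity
    return w

def run_rmsprop(lr=0.01, decay=0.9, eps=1e-8, steps=1000):
    w, sq_avg = 0.0, 0.0
    for _ in range(steps):
        g = grad(w)
        sq_avg = decay * sq_avg + (1 - decay) * g * g  # recent squared gradients
        w -= lr * g / (np.sqrt(sq_avg) + eps)
    return w

def run_adam(lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    w, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g        # momentum-style average
        v = beta2 * v + (1 - beta2) * g * g    # RMSProp-style average
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

results = {
    "gd": run_gd(),
    "momentum": run_momentum(),
    "rmsprop": run_rmsprop(),
    "adam": run_adam(),
}
for name, w in results.items():
    print(f"{name:9s} final w = {w:.4f}")
```

All four end close to the minimum at w = 3. In practice, frameworks such as PyTorch or TensorFlow provide these optimizers ready-made.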



17. What is sklearn.linear_model ?

**`sklearn.linear_model`** is a module in **Scikit-learn** that provides **linear models** for regression and classification.

### **Purpose:**

To model the relationship between **independent variables (features)** and a **dependent variable (target)** using a **linear equation**.

---

### **Common models in `sklearn.linear_model`**

1. **LinearRegression** – Predicts a continuous target.

   ```python
   from sklearn.linear_model import LinearRegression
   model = LinearRegression()
   model.fit(X_train, y_train)
   ```

2. **LogisticRegression** – For binary/multi-class classification.

   ```python
   from sklearn.linear_model import LogisticRegression
   model = LogisticRegression()
   model.fit(X_train, y_train)
   ```

3. **Ridge & Lasso Regression** – Linear regression with **regularization** to prevent overfitting.

   ```python
   from sklearn.linear_model import Ridge, Lasso
   ```

4. **ElasticNet** – Combination of L1 (Lasso) and L2 (Ridge) regularization.
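To see the effect of regularization, the sketch below fits all four regressors on synthetic data in which only the first two of five features matter. The data, the `alpha` values, and `l1_ratio` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge

rng = np.random.default_rng(seed=0)

# Synthetic data: y depends on features 0 and 1 only, plus noise.
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

models = {
    "ols": LinearRegression(),
    "ridge": Ridge(alpha=1.0),                            # L2 penalty
    "lasso": Lasso(alpha=0.1),                            # L1 penalty
    "elastic_net": ElasticNet(alpha=0.1, l1_ratio=0.5),   # mix of L1 and L2
}

# Fit each model and collect its learned coefficients.
coefs = {name: model.fit(X, y).coef_ for name, model in models.items()}

for name, c in coefs.items():
    print(f"{name:12s}", np.round(c, 3))
```

The regularized models shrink the coefficients relative to plain least squares; Lasso tends to push the coefficients of irrelevant features to exactly zero.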

---


18. What does model.fit() do? What arguments must be given?

**`model.fit()`** is used to **train a machine learning model** on the given data.

* It **learns patterns** from the training data and adjusts model parameters.

### **Arguments:**

```python
model.fit(X_train, y_train)
```

* **X_train** → features (input data)
* **y_train** → target (output/labels)


19. What does model.predict() do? What arguments must be given?

**`model.predict()`** is used to **make predictions using a trained machine learning model**.

* It uses the **learned patterns** from training to predict outputs for new data.

### **Arguments:**

```python
y_pred = model.predict(X_test)
```

* **X_test** → features of the data you want predictions for
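A minimal sketch of `fit()` followed by `predict()`, using `LinearRegression` on hand-made data that follows y = 2x + 1 exactly (the data and the name `X_new` are assumptions). Note that the feature array must be 2-D, with shape `(n_samples, n_features)`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Features must be 2-D: four samples, one feature each.
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([3.0, 5.0, 7.0, 9.0])  # y = 2x + 1

model = LinearRegression()
model.fit(X_train, y_train)        # learns slope and intercept

X_new = np.array([[5.0], [6.0]])   # unseen inputs
y_pred = model.predict(X_new)

print("slope:", model.coef_, "intercept:", model.intercept_)
print("predictions:", y_pred)
```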


20. What are continuous and categorical variables?

### **Continuous and Categorical Variables**

1. **Continuous Variables:**

* Can take **any numerical value** within a range.
* Usually measurable quantities.
* Examples: height, weight, temperature, salary

2. **Categorical Variables:**

* Represent **distinct groups or categories**.
* Can be **nominal** (no order) or **ordinal** (ordered).
* Examples: gender, blood group, color, education level

---


21. What is feature scaling? How does it help in Machine Learning?

### **What is Feature Scaling?**

**Feature scaling** is the process of **rescaling numerical features** to a **similar range** so that no feature dominates others.

---

### **Why it helps in Machine Learning:**

1. **Faster convergence**

   * Many algorithms (like Gradient Descent) converge faster when features are on a similar scale.

2. **Improved accuracy**

   * Distance-based models such as **KNN** and **SVM**, and regularized models such as **Logistic Regression**, are sensitive to feature scales.

3. **Prevents bias**

   * Ensures features with larger ranges don’t **dominate** the learning process.

---

### **Common techniques:**

* **Min-Max Scaling:** scales values to [0, 1]
* **Standardization (Z-score):** scales values to have mean = 0, std = 1
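Both formulas are short enough to write out by hand. The sketch below applies them with NumPy to a made-up feature column `x`.

```python
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# Min-max scaling: (x - min) / (max - min), giving values in [0, 1].
x_minmax = (x - x.min()) / (x.max() - x.min())

# Standardization: (x - mean) / std, giving mean 0 and std 1.
x_standard = (x - x.mean()) / x.std()

print("min-max:     ", x_minmax)
print("standardized:", np.round(x_standard, 3))
```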



22. How do we perform scaling in Python?

We perform feature scaling in Python using **`sklearn.preprocessing`**.

### **1. Standardization (Z-score)**

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # fit to data and transform
```

* Scales data to **mean = 0** and **std = 1**

---

### **2. Min-Max Scaling**

```python
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)  # scales to [0, 1]
```

---

### **3. MaxAbs Scaling** (divides each feature by its maximum absolute value, giving the range [-1, 1]; keeps sparse data sparse)

```python
from sklearn.preprocessing import MaxAbsScaler

scaler = MaxAbsScaler()
X_scaled = scaler.fit_transform(X)
```
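When the data is split into training and test sets, a common practice is to fit the scaler on the training set only and reuse it on the test set, so that no test-set statistics leak into training. A sketch on synthetic data (the array `X` and its shape are assumptions):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(seed=0)
X = rng.normal(loc=50.0, scale=10.0, size=(100, 3))  # synthetic features

X_train, X_test = train_test_split(X, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from training data
X_test_scaled = scaler.transform(X_test)        # reuse the same mean/std

print("train means:", np.round(X_train_scaled.mean(axis=0), 3))
print("test means: ", np.round(X_test_scaled.mean(axis=0), 3))
```

The training columns have mean 0 exactly; the test columns are only close to 0, because they were scaled with the training statistics.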


23. What is sklearn.preprocessing?

**`sklearn.preprocessing`** is a module in **Scikit-learn** used to **prepare and transform data** before applying Machine Learning models.

---

### **What it does:**

1. **Scaling features**

   * `StandardScaler`, `MinMaxScaler`, `MaxAbsScaler`
2. **Encoding categorical data**

   * `LabelEncoder`, `OneHotEncoder`
3. **Normalizing data**

   * `Normalizer`
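4. **Handling missing values** (a related step, provided by a different module)

   * `SimpleImputer`, imported from `sklearn.impute` rather than `sklearn.preprocessing`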

---


24. How do we split data for model fitting (training and testing) in Python?

In Python, we split data for training and testing using **`train_test_split`** from **`sklearn.model_selection`**.

### **Example:**

```python
from sklearn.model_selection import train_test_split

# X = features, y = target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```

### **Explanation:**

* **`X_train` / `y_train`** → data for **training the model**
* **`X_test` / `y_test`** → data for **testing/evaluating the model**
* **`test_size=0.2`** → 20% of data used for testing
* **`random_state`** → ensures reproducibility



25. Explain data encoding?

### **What is Data Encoding?**

**Data encoding** is the process of **converting categorical (non-numeric) data into numerical format** so that machine learning models can process it.

Most ML algorithms **cannot work with text labels** directly, so encoding is necessary.

---

### **Common Techniques for Data Encoding**

1. **Label Encoding**

   * Assigns a **unique integer** to each category.
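   * Example: `Blue → 0, Green → 1, Red → 2` (`LabelEncoder` numbers the categories in sorted order)
   * Use: `sklearn.preprocessing.LabelEncoder` (intended for target labels; the integers imply no ranking)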

2. **One-Hot Encoding**

   * Creates **binary columns** for each category.
   * Example: Color → Red, Blue, Green

     | Red | Blue | Green |
     | --- | ---- | ----- |
     | 1   | 0    | 0     |
     | 0   | 1    | 0     |
   * Use: `sklearn.preprocessing.OneHotEncoder` or `pd.get_dummies()`

3. **Ordinal Encoding**

   * Used for **ordered categories**.
   * Example: `Low → 0, Medium → 1, High → 2`

4. **Target Encoding**

   * Replaces categories with the **mean of the target variable** for each category.
   * Useful for categorical features in predictive models.
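Target encoding can be sketched with a pandas `groupby`. The DataFrame, the column names `city` and `target`, and the values are all made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["A", "A", "B", "B", "B", "C"],
    "target": [1, 0, 1, 1, 0, 0],
})

# Mean of the target for each category.
city_means = df.groupby("city")["target"].mean()

# Replace each category with its target mean.
df["city_encoded"] = df["city"].map(city_means)

print(df)
```

In a real project the means would be computed from the training set only and then applied to the test set, since computing them on all rows leaks target information.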

---
