# Assignment Questions

### 1. What is a parameter?

A **parameter** is a value that defines a specific characteristic in a model or function. In ML, it's often something the model learns from the data — like weights in linear regression.
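
As a minimal sketch of learned parameters, the snippet below fits scikit-learn's `LinearRegression` to synthetic data generated from y = 3x + 2. The data and the variable names `weight` and `bias` are made up for illustration; the fitted `coef_` and `intercept_` attributes hold what the model learned.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data following y = 3x + 2 (illustrative values only)
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 3.0 * X.ravel() + 2.0

model = LinearRegression().fit(X, y)

# The learned parameters: one weight per feature plus an intercept
weight = model.coef_[0]
bias = model.intercept_
print(f"weight={weight:.2f}, bias={bias:.2f}")
```

Because the data has no noise, the learned weight and intercept should recover the values used to generate it.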

---

### 2. What is correlation? What does negative correlation mean?

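**Correlation** measures the strength and direction of the relationship between two variables. The most common measure, Pearson's correlation coefficient, ranges from -1 to +1.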

* **Positive correlation**: both increase together.
* **Negative correlation**: one increases while the other decreases.

Example: As study time goes up, exam mistakes may go down (negative correlation).
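
To make the study-time example concrete, here is a small sketch that computes Pearson's correlation coefficient with NumPy. The numbers are hypothetical and chosen only so that mistakes fall as hours rise.

```python
import numpy as np

# Hypothetical data: hours studied vs. number of exam mistakes
study_hours = np.array([1, 2, 3, 4, 5, 6])
exam_mistakes = np.array([12, 10, 9, 7, 4, 3])

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is Pearson's r
r = np.corrcoef(study_hours, exam_mistakes)[0, 1]
print(f"r = {r:.3f}")  # negative, since mistakes fall as hours rise
```

A value close to -1 indicates a strong negative linear relationship; a value near 0 would indicate little or no linear relationship.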

---

### 3. Define Machine Learning. What are the main components in Machine Learning?

**Machine Learning** is the field in which computers learn patterns from data and use them to make predictions or decisions, without being explicitly programmed with task-specific rules.

**Main components**:

* Data
* Features & Labels
* Algorithms
* Models
* Loss function
* Evaluation metrics

---

### 4. How does loss value help in determining whether the model is good or not?

The **loss value** measures how far off the predictions are from actual results.

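* **Lower loss** means the predictions are closer to the actual values on the data being measured. A low training loss alone does not guarantee a good model; the loss on validation or test data shows how well it generalizes.
* It guides the optimization process during training, since the optimizer adjusts parameters to reduce it.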
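
As a rough illustration, the sketch below computes one common loss, mean squared error (MSE), for two sets of predictions. The helper `mse` and the numbers are illustrative assumptions, not taken from any particular model.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: average of squared prediction errors."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

y_true = [3.0, 5.0, 7.0]
preds_a = [2.5, 5.0, 7.5]   # close to the targets
preds_b = [0.0, 9.0, 4.0]   # far from the targets

loss_a = mse(y_true, preds_a)
loss_b = mse(y_true, preds_b)
print(loss_a, loss_b)
```

The predictions that sit closer to the targets produce the smaller loss, which is the signal an optimizer follows during training.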

---

### 5. What are continuous and categorical variables?

* **Continuous variables**: numeric and can take any value within a range (e.g., temperature, price).
* **Categorical variables**: represent categories or groups (e.g., gender, city).
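
One possible way to separate the two kinds of columns in practice is by dtype, sketched here with pandas. The DataFrame and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical dataset mixing both kinds of variables
df = pd.DataFrame({
    "temperature": [21.5, 30.1, 18.4],    # continuous
    "price": [9.99, 14.50, 7.25],         # continuous
    "city": ["Pune", "Delhi", "Pune"],    # categorical
})

continuous_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()
print(continuous_cols, categorical_cols)
```

Dtype is only a heuristic: a numeric column such as a postal code is still categorical, so the result should be checked against what each column means.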

---

### 6. How do we handle categorical variables in Machine Learning? What are the common techniques?

We convert them into numeric format using:

* **Label Encoding**
* **One-Hot Encoding**
* **Ordinal Encoding**

These help the model understand non-numeric categories.
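
A minimal sketch of two of these techniques with scikit-learn follows. The city names are made-up sample data.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

cities = np.array(["Delhi", "Pune", "Delhi", "Mumbai"])

# Label encoding: one integer per category (classes are sorted alphabetically)
label_encoder = LabelEncoder()
city_labels = label_encoder.fit_transform(cities)

# One-hot encoding: one binary column per category
onehot_encoder = OneHotEncoder()
city_onehot = onehot_encoder.fit_transform(cities.reshape(-1, 1)).toarray()

print(city_labels)
print(city_onehot)
```

Note that scikit-learn documents `LabelEncoder` as intended for target labels; for input features without a natural order, one-hot encoding avoids implying that one category is "greater" than another.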

---

### 7. What do you mean by training and testing a dataset?

* **Training set**: used to teach the model patterns from data.
* **Testing set**: used to check how well the model performs on unseen data.

---

### 8. What is sklearn.preprocessing?

It’s a Scikit-learn module that provides preprocessing tools like:

* Scaling
* Encoding
* Normalizing

Missing-value imputation is handled by the separate `sklearn.impute` module (e.g., `SimpleImputer`).

---

### 9. What is a Test set?

The **test set** is a portion of your dataset used only after training — to evaluate how well the model performs on **unseen** data.

---

### 10. How do we split data for model fitting (training and testing) in Python? How do you approach a Machine Learning problem?

Using `train_test_split()` from `sklearn.model_selection`:

```python
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
```

Steps:

1. Understand the problem
2. Explore the data (EDA)
3. Preprocess data
4. Choose a model
5. Train the model
6. Evaluate and tune
7. Deploy if needed
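
The steps above can be sketched end to end as follows. This is one possible workflow under stated assumptions: the built-in Iris dataset stands in for a real problem, and logistic regression with standard scaling is an arbitrary model choice.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Steps 1-2: a small, well-known dataset stands in for problem framing and EDA
X, y = load_iris(return_X_y=True)

# Hold out 20% of the rows for testing; random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Step 3: preprocess, fitting the scaler on the training data only
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Steps 4-5: choose and train a model
model = LogisticRegression(max_iter=1000)
model.fit(X_train_scaled, y_train)

# Step 6: evaluate on the held-out test set
accuracy = accuracy_score(y_test, model.predict(X_test_scaled))
print(f"test accuracy: {accuracy:.2f}")
```

Fitting the scaler on the training split alone keeps information from the test set out of training, so the reported accuracy is a fairer estimate of performance on unseen data.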

---

### 11. Why do we have to perform EDA before fitting a model to the data?

EDA (Exploratory Data Analysis) helps us:

* Understand structure & distribution
* Spot missing values, outliers, or patterns
* Choose the right preprocessing and model
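
A few typical first EDA checks can be sketched with pandas as below. The DataFrame is hypothetical, and the 1.5 × IQR rule is just one common convention for flagging outliers.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with one missing value and one suspicious outlier
df = pd.DataFrame({
    "age": [25, 32, np.nan, 41, 29],
    "income": [40_000, 52_000, 48_000, 1_000_000, 45_000],
})

summary = df.describe()            # count, mean, std, quartiles per column
missing_counts = df.isna().sum()   # number of missing values per column

# Simple IQR rule for flagging outliers in one column
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["income"] < q1 - 1.5 * iqr) | (df["income"] > q3 + 1.5 * iqr)]

print(summary)
print(missing_counts)
print(outliers)
```

Findings like these feed directly into preprocessing decisions, such as whether to impute the missing age or how to treat the extreme income value.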

---

### 12. What is correlation?

**Correlation** measures the strength and direction of the relationship between two variables, usually expressed as a coefficient between -1 and +1.

---

### 13. What does negative correlation mean?

* **Negative correlation**: one increases while the other decreases.

Example: As study time goes up, exam mistakes may go down (negative correlation).

---

### 14. How can you find correlation between variables in Python?

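Use `.corr()` in Pandas:

```python
df.corr(numeric_only=True)
```

It returns the pairwise correlation matrix (Pearson by default) of the numeric columns. Since pandas 2.0, non-numeric columns raise an error unless `numeric_only=True` is passed.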
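
Here is a small self-contained sketch with a hypothetical DataFrame. It assumes pandas 1.5 or newer, where `corr` accepts the `numeric_only` argument.

```python
import pandas as pd

# Hypothetical data; "student" is non-numeric and is skipped by numeric_only=True
df = pd.DataFrame({
    "student": ["a", "b", "c", "d", "e"],
    "study_hours": [1, 2, 3, 4, 5],
    "exam_score": [52, 58, 65, 71, 80],
    "mistakes": [12, 10, 7, 5, 2],
})

corr_matrix = df.corr(numeric_only=True)  # Pearson correlation by default
print(corr_matrix.round(2))

# Correlation between one specific pair of columns
r_hours_score = df["study_hours"].corr(df["exam_score"])
```

The matrix is symmetric with ones on the diagonal; `Series.corr` is convenient when only a single pair of columns is of interest.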

---

### 15. What is causation? Explain difference between correlation and causation with an example.

**Causation** means a change in one variable **directly produces** a change in the other.
**Correlation** only means the two variables change together; it does not show that one causes the other.

Example:

* Correlation: Ice cream sales ↑ and drowning ↑ (due to summer)
* Causation: More studying → better exam results
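
The ice cream example can be simulated to show how a shared cause produces correlation without causation. Everything in this sketch is synthetic: the coefficients, noise levels, and the helper `residuals` are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated confounder: temperature drives both quantities (made-up coefficients)
temperature = rng.uniform(10, 35, size=500)
ice_cream_sales = 2.0 * temperature + rng.normal(0, 5, size=500)
drownings = 0.5 * temperature + rng.normal(0, 2, size=500)

# The two outcomes are strongly correlated even though neither causes the other
r_raw = np.corrcoef(ice_cream_sales, drownings)[0, 1]

def residuals(y, x):
    """Remove the linear effect of x from y using a least-squares line."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

# After removing the effect of temperature, the correlation largely disappears
r_adjusted = np.corrcoef(
    residuals(ice_cream_sales, temperature),
    residuals(drownings, temperature),
)[0, 1]

print(f"raw r = {r_raw:.2f}, adjusted r = {r_adjusted:.2f}")
```

Once the common driver is accounted for, little relationship remains between sales and drownings, which is what one would expect if neither causes the other.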

---

### 16. What is an Optimizer? What are different types of optimizers? Explain each with an example.

An **optimizer** adjusts model parameters to minimize the loss.

Common types:

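* **SGD** (Stochastic Gradient Descent): updates parameters using the gradient from a single sample or a small mini-batch, with one learning rate shared by all parameters. Simple, but sensitive to the choice of learning rate.
* **Adam**: combines momentum with per-parameter adaptive learning rates; a common default choice.
* **RMSProp**: scales each parameter's learning rate by a running average of recent squared gradients; often used with recurrent neural networks.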

Example with Keras:

```python
import tensorflow as tf
optimizer = tf.keras.optimizers.Adam()
```
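
To show the update rule that all of these optimizers build on, here is a minimal sketch of plain gradient descent on a toy one-parameter loss. It is not stochastic and not Adam or RMSProp; the loss function, starting point, and learning rate are assumptions chosen for illustration.

```python
def gradient(w):
    """Gradient of the toy loss L(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)

w = 0.0              # initial parameter value
learning_rate = 0.1  # step size (hypothetical choice)

for _ in range(100):
    # Core update rule: step against the gradient
    w = w - learning_rate * gradient(w)

print(f"w = {w:.4f}")
```

The parameter moves toward 3, the minimizer of the toy loss. SGD applies the same rule with gradients estimated from mini-batches, while Adam and RMSProp additionally adapt the step size per parameter.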

---

### 17. What is sklearn.linear\_model?

It’s a Scikit-learn module with tools for linear models:

* `LinearRegression`
* `Ridge`
* `Lasso`

These three are regression models. The module also includes linear classifiers such as `LogisticRegression`.
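
As a brief sketch of both kinds of task, the snippet below fits a `Ridge` regressor and a `LogisticRegression` classifier from `sklearn.linear_model` on tiny made-up datasets.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])

# Regression: predict a continuous target with a regularized linear model
y_continuous = np.array([1.0, 3.1, 4.9, 7.2, 9.0, 11.1])
regressor = Ridge(alpha=1.0).fit(X, y_continuous)
predicted_value = regressor.predict([[6.0]])[0]

# Classification: predict a class label with logistic regression
y_class = np.array([0, 0, 0, 1, 1, 1])
classifier = LogisticRegression().fit(X, y_class)
predicted_classes = classifier.predict([[0.5], [4.5]])

print(predicted_value, predicted_classes)
```

Both estimators share the same `fit` and `predict` interface; only the type of target differs.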

---

### 18. What does model.fit() do? What arguments must be given?

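`.fit()` trains the model on the training data. Supervised estimators require the feature matrix `X` and the target values `y`; unsupervised estimators (e.g., clustering) need only `X`.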

Arguments:

```python
model.fit(X_train, y_train)
```

---

### 19. What does model.predict() do? What arguments must be given?

`.predict()` returns predictions from an already fitted model. It requires only the feature matrix `X`, with the same columns used during training.

Arguments:

```python
model.predict(X_test)
```
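
A short self-contained sketch of the fit-then-predict pattern follows. The dataset is hypothetical, and `KNeighborsClassifier` is an arbitrary choice of estimator.

```python
from sklearn.neighbors import KNeighborsClassifier

# Tiny hypothetical dataset: one feature, two classes
X_train = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
y_train = [0, 0, 0, 1, 1, 1]
X_test = [[2.5], [10.5]]

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)          # learns from features AND labels

y_pred = model.predict(X_test)       # needs features only
print(y_pred)
```

Calling `predict` before `fit` raises an error in scikit-learn, since the model has nothing to base its predictions on yet.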

---

### 20. What are continuous and categorical variables?

* **Continuous variables**: numeric and can take any value within a range (e.g., temperature, price).
* **Categorical variables**: represent categories or groups (e.g., gender, city).

---

### 21. What is feature scaling? How does it help in Machine Learning?

**Feature scaling** rescales numeric features to a comparable range.
It helps distance-based models (such as KNN and SVM) and models trained with gradient descent, because features with large ranges would otherwise dominate the others.

---

### 22. How do we perform scaling in Python?

Using Scikit-learn:

```python
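from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean and std from the training data only
X_test_scaled = scaler.transform(X_test)        # apply the same statistics to the test data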
```
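
For a concrete comparison, this sketch applies two common scalers to a tiny made-up feature matrix. The columns (age and income) are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical features on very different scales: age (years), income (currency units)
X = np.array([[20.0, 20_000.0], [30.0, 50_000.0], [40.0, 80_000.0]])

# Standardization: each column gets mean 0 and standard deviation 1
X_standardized = StandardScaler().fit_transform(X)

# Min-max scaling: each column is mapped to the range [0, 1]
X_minmax = MinMaxScaler().fit_transform(X)

print(X_standardized)
print(X_minmax)
```

After either transformation, age and income contribute on a comparable scale, so income no longer dominates distance calculations simply because its raw numbers are larger.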

---

### 23. What is sklearn.preprocessing?

It’s a Scikit-learn module that provides preprocessing tools like:

* Scaling
* Encoding
* Normalizing

Missing-value imputation is handled by the separate `sklearn.impute` module (e.g., `SimpleImputer`).

---

### 24. How do we split data for model fitting (training and testing) in Python?

Using `train_test_split()` from `sklearn.model_selection`:

```python
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
```
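
A runnable variant is sketched below with a synthetic, imbalanced dataset. The data is made up; `random_state` and `stratify` are optional arguments that make the split reproducible and preserve class proportions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced dataset: 80 samples of class 0, 20 of class 1
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 80 + [1] * 20)

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,     # 20% of rows go to the test set
    random_state=42,   # fixed seed so the split is reproducible
    stratify=y,        # keep the class ratio the same in both splits
)

print(X_train.shape, X_test.shape)
print(np.bincount(y_train), np.bincount(y_test))
```

Without `stratify`, a random split of a small imbalanced dataset can leave the test set with very few examples of the minority class.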

---

### 25. Explain data encoding.

**Data encoding** converts categorical (often text) data into numbers so that ML models can work with it.

Common methods:

* **Label Encoding**
* **One-Hot Encoding**
* **Ordinal Encoding**

These are available in `sklearn.preprocessing`.
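
As one example, ordinal encoding can be sketched with `OrdinalEncoder`, passing the category order explicitly. The size labels and their ordering are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical ordered category: the order is supplied explicitly
sizes = np.array([["medium"], ["small"], ["large"], ["small"]])

encoder = OrdinalEncoder(categories=[["small", "medium", "large"]])
size_codes = encoder.fit_transform(sizes)

print(size_codes.ravel())
```

Supplying the order matters: without the `categories` argument the encoder would sort the labels alphabetically, which would not match the intended small-to-large ranking.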
