# 📘 EDA & ML Interview Notes with Examples
This notebook collects key EDA and ML concepts in Markdown format, with small worked examples, for quick review.

## 🎤 EDA Interview Questions & Answers

### Q1. What is EDA and why is it important?
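Exploratory Data Analysis (EDA) is the step of summarizing and visualizing a dataset's features before modeling. It matters because it reveals the structure of the data, exposes anomalies such as missing values and outliers, and guides how the data should be prepared for ML.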

**Example Table:**

| ID | Age | Gender | Income | Purchased |
|----|-----|--------|--------|-----------|
| 1  | 25  | Male   | 50000  | Yes       |
| 2  | 30  | Female | 60000  | No        |
| 3  | NaN | Male   | NaN    | Yes       |
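
A first EDA pass over the example table can be sketched in plain Python. This is a minimal sketch, assuming the three rows are stored as a list of dicts with `None` standing in for the NaN cells; `rows`, `missing_counts`, and `numeric_summary` are illustrative names, not part of any library.

```python
from collections import Counter
from statistics import mean

# The example table; None stands in for the NaN cells (an assumption of this sketch).
rows = [
    {"ID": 1, "Age": 25, "Gender": "Male", "Income": 50000, "Purchased": "Yes"},
    {"ID": 2, "Age": 30, "Gender": "Female", "Income": 60000, "Purchased": "No"},
    {"ID": 3, "Age": None, "Gender": "Male", "Income": None, "Purchased": "Yes"},
]

def missing_counts(rows):
    """Count missing (None) cells per column."""
    return {col: sum(r[col] is None for r in rows) for col in rows[0]}

def numeric_summary(rows, col):
    """Min, mean, and max of a numeric column, ignoring missing cells."""
    values = [r[col] for r in rows if r[col] is not None]
    return {"min": min(values), "mean": mean(values), "max": max(values)}

print(missing_counts(rows))                     # which columns have gaps
print(numeric_summary(rows, "Age"))             # basic numeric summary
print(Counter(r["Purchased"] for r in rows))    # class balance of the target
```

The missing-value counts show that Age and Income each have one gap, which is the kind of finding that drives the cleaning steps in the next question.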

### Q2. How do you handle missing values in a dataset?
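- Numerical features: impute with the mean, or with the median when the distribution is skewed or contains outliers.
- Categorical features: impute with the mode (the most frequent category).
- Alternatively, drop rows or columns with too many missing values, or use advanced imputation (for example KNN-based or model-based imputation).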

**Example:** Row 3 of the Q1 table, with Age and Income filled by the column mean:

| ID | Age  | Income | Strategy         |
|----|------|--------|------------------|
| 1  | 25   | 50000  | Original         |
| 2  | 30   | 60000  | Original         |
| 3  | 27.5 | 55000  | Filled with mean |
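
The imputation in the table can be reproduced with a short sketch. It assumes missing cells are represented as `None`; `fill_with_mean` and `fill_with_mode` are hypothetical helper names, and the `genders` list is made-up data used only to illustrate mode imputation.

```python
from statistics import mean, mode

def fill_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def fill_with_mode(values):
    """Replace None entries with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    fill = mode(observed)
    return [fill if v is None else v for v in values]

# Columns from the example table (None marks the missing cell in row 3).
filled_ages = fill_with_mean([25, 30, None])
filled_incomes = fill_with_mean([50000, 60000, None])

# Hypothetical categorical column, not taken from the table.
genders = fill_with_mode(["Male", "Female", "Male", None])

print(filled_ages)
print(filled_incomes)
print(genders)
```

The filled values match the table: 27.5 for Age and 55000 for Income. Swapping `mean` for `median` would be the natural change for a skewed column.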

### Q3. Difference between univariate and bivariate analysis

- **Univariate**: analyzes one variable at a time, typically its distribution (e.g., Age).
- **Bivariate**: analyzes two variables together to study their relationship (e.g., Age vs Income).

**Univariate Example:**

| Age | Count |
|-----|-------|
| 25  | 2     |
| 30  | 1     |
| 35  | 1     |

**Bivariate Example:**

| Age | Income |
|-----|--------|
| 25  | 50000  |
| 30  | 60000  |
| 35  | 70000  |
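
Both kinds of analysis can be sketched on the values from the two tables. The frequency count is the univariate view; the Pearson correlation is one common bivariate summary (chosen here as an illustration, since the notes do not name a specific measure). `pearson` is a hand-written helper, not a library call.

```python
from collections import Counter
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Univariate: frequency of each Age value (expanded from the count table).
ages = [25, 25, 30, 35]
age_counts = Counter(ages)
print(age_counts)

# Bivariate: how Age and Income move together (values from the second table).
age = [25, 30, 35]
income = [50000, 60000, 70000]
r = pearson(age, income)
print(round(r, 3))
```

Because Income rises by exactly 10000 for every 5 years of Age in this table, the correlation comes out at 1 (up to floating-point rounding), i.e. a perfectly linear relationship.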

### Q4. How to detect outliers?
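Common methods:

- **Boxplot**: points drawn beyond the whiskers are candidate outliers.
- **IQR rule**: flag values below Q1 - 1.5×IQR or above Q3 + 1.5×IQR, where IQR = Q3 - Q1.
- **Z-score**: flag values that lie more than a chosen number of standard deviations (commonly 3) from the mean.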

**Example:** Using IQR:

| Age | Is Outlier? |
|-----|-------------|
| 25  | No          |
| 30  | No          |
| 80  | **Yes**     |

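(80 is flagged because it lies above the upper fence Q3 + 1.5×IQR. This assumes the table is a sample of a larger Age column: for ages 25, 25, 30, 35, 80, Q1 = 25, Q3 = 35, IQR = 10, and the fence is 35 + 1.5×10 = 50. With only the three rows shown and linearly interpolated quartiles, the fence would be 96.25 and 80 would not be flagged.)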
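
The IQR rule can be sketched with the standard library. The `ages` list below is an assumed sample (the ages from Q3 plus the value 80), not data given in the table, and `iqr_outliers` is an illustrative helper name. Quartiles are computed with `statistics.quantiles` using the `inclusive` method; other quartile conventions can give slightly different fences.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Assumed Age column: a cluster of typical ages plus one extreme value.
ages = [25, 25, 30, 35, 80]
print(iqr_outliers(ages))  # [80]
```

Here Q1 = 25 and Q3 = 35, so the fences are 10 and 50, and only 80 falls outside them.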

### Q5. Variable Transformation
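Variable transformation converts features into a form that models can use effectively: scaling or normalizing numerical features so they share a comparable range, and encoding categorical features as numbers.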


**Scaling and Encoding Examples**

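**MinMax Scaling:** each value is mapped to (x - min) / (max - min), so the smallest value becomes 0 and the largest becomes 1.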

| Original Age | Scaled Age |
|--------------|-------------|
| 20           | 0.0         |
| 40           | 0.5         |
| 60           | 1.0         |
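
The scaled column above can be reproduced with a few lines of plain Python. This is a minimal sketch; `min_max_scale` is an illustrative name, and mapping a constant column to 0.0 is a choice made here, not a universal convention.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: there is no spread to scale, so map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

scaled = min_max_scale([20, 40, 60])
print(scaled)  # [0.0, 0.5, 1.0]
```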

**Label Encoding:**

| Gender | Encoded |
|--------|---------|
| Male   | 0       |
| Female | 1       |
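
A sketch of label encoding that reproduces the table. It assigns codes in order of first appearance, which is an assumption made here so that Male maps to 0; `label_encode` is a hypothetical helper, not a library function.

```python
def label_encode(values, mapping=None):
    """Map each category to an integer code.

    If no mapping is given, codes are assigned in order of first appearance.
    """
    if mapping is None:
        mapping = {}
        for v in values:
            mapping.setdefault(v, len(mapping))
    return [mapping[v] for v in values], mapping

codes, mapping = label_encode(["Male", "Female", "Male"])
print(codes)    # [0, 1, 0]
print(mapping)  # {'Male': 0, 'Female': 1}
```

Libraries may order the categories differently. scikit-learn's `LabelEncoder`, for example, sorts the classes, which would give Female = 0 and Male = 1, so it is worth checking the mapping rather than assuming it. Integer codes also imply an ordering, which is why one-hot encoding is often preferred for unordered categories.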

## 🤖 Machine Learning Interview Questions & Answers

### Q6. Categories of ML Problems

| Category      | Target Type  | Example Use Case         |
|---------------|--------------|--------------------------|
| Regression    | Continuous   | Predict house price      |
| Classification| Categorical  | Spam email detection     |
| Clustering    | No target    | Customer segmentation    |

### Q7. Regression vs Classification
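The choice depends on the type of the target variable (Y): a continuous target makes it a regression task, while a categorical target makes it a classification task.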


| Target (Y)         | Task Type     | Example       |
|--------------------|---------------|---------------|
| Price (e.g., $400) | Regression    | House pricing |
| Purchased (Yes/No) | Classification| Buy decision  |

### Q8. Overfitting and How to Prevent It
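Overfitting occurs when a model learns noise and quirks of the training set, so it performs well on training data but poorly on unseen test data. Common ways to detect and reduce it: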


| Technique        | Purpose                          |
|------------------|----------------------------------|
| Cross-validation | Estimates performance on held-out data, exposing overfitting |
| Regularization   | Penalizes model complexity       |
| Pruning (trees)  | Reduces over-complex branches    |
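
The splitting step behind k-fold cross-validation can be sketched as follows. This is a minimal sketch assuming contiguous, unshuffled folds; `k_fold_indices` is an illustrative name, and real pipelines usually shuffle (or stratify) the data first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size

folds = list(k_fold_indices(6, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```

Each sample lands in the test fold exactly once, so averaging the k test scores gives an estimate of performance on unseen data. A large gap between training and test scores is the usual sign of overfitting.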

### Q9. Regularization Techniques

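| Technique     | Description                                                         |
|---------------|---------------------------------------------------------------------|
| Lasso (L1)    | Penalizes absolute coefficient values; can shrink some to exactly 0 |
| Ridge (L2)    | Penalizes squared coefficient values; shrinks them toward 0         |
| Elastic Net   | Weighted mix of the L1 and L2 penalties                             |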
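
The difference between L1 and L2 shrinkage is easiest to see in the simplest possible case. The sketch below assumes a single feature and no intercept, where each penalized least-squares problem has a closed-form slope; the objectives are written out in the docstring, and both the data and the `lam` value are made up for illustration.

```python
def soft_threshold(z, lam):
    """Shrink z toward 0 by lam, returning exactly 0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def fit_one_feature(xs, ys, lam):
    """Closed-form slopes for one feature with no intercept.

    OLS:   minimize 1/2 * sum((y - w*x)^2)
    Ridge: minimize 1/2 * sum((y - w*x)^2) + lam/2 * w^2
    Lasso: minimize 1/2 * sum((y - w*x)^2) + lam * |w|
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return {
        "ols": sxy / sxx,
        "ridge": sxy / (sxx + lam),
        "lasso": soft_threshold(sxy, lam) / sxx,
    }

# Hypothetical data: one feature strongly related to y, one barely related.
strong = fit_one_feature([1, 2, 3], [2, 4, 6], lam=1.0)
weak = fit_one_feature([1, 2, 3], [0.1, -0.1, 0.1], lam=1.0)
print(strong)
print(weak)
```

For the strongly related feature both penalties only shrink the slope slightly below the OLS value of 2. For the weakly related feature, Ridge leaves a small non-zero slope while Lasso sets it to exactly 0, which is why Lasso doubles as a feature-selection method.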

### Q10. KNN Regression vs Classification

| Feature               | KNN Classification         | KNN Regression             |
|------------------------|----------------------------|----------------------------|
| Target Type           | Categorical                | Numerical                  |
| Output                | Majority class of neighbors| Average of neighbor values |
| Use Case              | Spam detection             | House price estimation     |
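
The two variants share the neighbor search and differ only in how the neighbors' targets are combined, as this from-scratch sketch shows. It assumes a single numeric feature and absolute difference as the distance; the datasets and function names are hypothetical.

```python
from collections import Counter
from statistics import mean

def k_nearest(train, query, k):
    """Return the targets of the k training points closest to query (1-D distance)."""
    ranked = sorted(train, key=lambda pair: abs(pair[0] - query))
    return [target for _, target in ranked[:k]]

def knn_classify(train, query, k=3):
    """Predict the majority class among the k nearest neighbors."""
    return Counter(k_nearest(train, query, k)).most_common(1)[0][0]

def knn_regress(train, query, k=3):
    """Predict the average target among the k nearest neighbors."""
    return mean(k_nearest(train, query, k))

# Hypothetical data: (age, purchased) for classification, (age, income) for regression.
purchases = [(22, "No"), (25, "No"), (30, "Yes"), (35, "Yes"), (40, "Yes")]
incomes = [(25, 50000), (30, 60000), (35, 70000), (60, 90000)]

print(knn_classify(purchases, query=24, k=3))  # No
print(knn_regress(incomes, query=31, k=3))     # 60000
```

For the query age 24, the three nearest neighbors are 25, 22, and 30, so the vote is two "No" against one "Yes". For the query age 31, the neighbors 30, 35, and 25 average to an income of 60000.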