# 📊 EDA & ML Interview Q&A with Examples
This notebook collects common Exploratory Data Analysis (EDA) and Machine Learning (ML) interview questions, with answers and illustrative tables.

## 🎤 EDA Interview Questions

### Q1. What is EDA and why is it important?
EDA summarizes and visualizes a dataset's features so you can understand its structure, detect anomalies such as missing values and outliers, and decide how to prepare the data for ML.

**Example Table:**

In [None]:
import pandas as pd

data = {'ID': [1, 2, 3],
        'Age': [25, 30, None],
        'Gender': ['Male', 'Female', 'Male'],
        'Income': [50000, 60000, None],
        'Purchased': ['Yes', 'No', 'Yes']}

pd.DataFrame(data)
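
A first pass at EDA on a table like this usually checks column types, summary statistics, and missing-value counts. The sketch below rebuilds the same toy data; the particular checks are an illustrative choice, not a fixed recipe.

```python
import pandas as pd

data = {'ID': [1, 2, 3],
        'Age': [25, 30, None],
        'Gender': ['Male', 'Female', 'Male'],
        'Income': [50000, 60000, None],
        'Purchased': ['Yes', 'No', 'Yes']}
df = pd.DataFrame(data)

# Column dtypes: Age and Income become float64 because None is stored as NaN
print(df.dtypes)

# Summary statistics for numeric and categorical columns together
print(df.describe(include='all'))

# Number of missing values in each column
missing = df.isna().sum()
print(missing)
```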

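### Q2. How do you handle missing values in a dataset?
- Numerical features: impute with the mean, or with the median when the feature is skewed or has outliers
- Categorical features: impute with the mode (the most frequent value)

**Example (mean imputation):**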

In [None]:
df = pd.DataFrame({'Age': [25, 30, None], 'Income': [50000, 60000, None]})
# Replace each NaN with its column mean (Age -> 27.5, Income -> 55000.0)
df.fillna(df.mean(numeric_only=True))
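
The cell above shows mean imputation only. Median and mode imputation follow the same `fillna` pattern; the sketch below uses made-up data with an extreme age (90) to show why the median is often preferred for skewed numeric columns.

```python
import pandas as pd

df = pd.DataFrame({'Age': [25, 30, None, 90],
                   'Gender': ['Male', None, 'Female', 'Male']})

# Numerical column: the median (30.0) is less affected by the extreme 90 than the mean (~48.3)
df['Age'] = df['Age'].fillna(df['Age'].median())

# Categorical column: mode() returns a Series (ties are possible), so take the first value
df['Gender'] = df['Gender'].fillna(df['Gender'].mode()[0])

print(df)
```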

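### Q3. Difference between univariate and bivariate analysis

Univariate analysis describes one variable at a time (its distribution, central tendency, spread, or category counts). Bivariate analysis examines the relationship between two variables, for example the correlation between two numeric features or a cross-tabulation of two categorical ones.

**Univariate Example:**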

In [None]:
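ages = pd.Series([25, 30, 30, 25, 35])
# Frequency table; rename_axis + reset_index(name=...) give the same column names in pandas 1.x and 2.x
ages.value_counts().rename_axis('Age').reset_index(name='Count')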
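
Beyond frequency counts, univariate analysis usually reports central tendency, spread, and shape. A minimal sketch on the same five ages:

```python
import pandas as pd

ages = pd.Series([25, 30, 30, 25, 35], name='Age')

# count, mean, std, min, quartiles and max of a single variable
summary = ages.describe()
print(summary)

# Positive skewness means a longer right tail
print('skew:', ages.skew())
```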

**Bivariate Example:**

In [None]:
pd.DataFrame({'Age': [25, 30, 35], 'Income': [50000, 60000, 70000]})
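
The table above lists two variables side by side; the bivariate analysis itself quantifies how they move together. A sketch using Pearson correlation for two numeric columns and a cross-tabulation for two categorical ones (the Gender/Purchased rows are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'Age': [25, 30, 35], 'Income': [50000, 60000, 70000]})

# Pearson correlation: Income rises exactly linearly with Age here, so r is (numerically) 1
r = df['Age'].corr(df['Income'])
print('Pearson r:', r)

# Two categorical variables: count each combination of values
cat = pd.DataFrame({'Gender': ['Male', 'Female', 'Male', 'Female'],
                    'Purchased': ['Yes', 'No', 'Yes', 'Yes']})
table = pd.crosstab(cat['Gender'], cat['Purchased'])
print(table)
```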

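### Q4. Outlier detection using IQR

The interquartile range is IQR = Q3 − Q1, the spread of the middle 50% of the data. A common rule flags any value below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR as an outlier.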

In [None]:
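ages = pd.Series([25, 27, 29, 30, 31, 33, 80])  # sample ages with one extreme value
Q1 = ages.quantile(0.25)  # 28.0
Q3 = ages.quantile(0.75)  # 32.0
IQR = Q3 - Q1             # 4.0
# Keep values outside the fences [Q1 - 1.5*IQR, Q3 + 1.5*IQR] = [22, 38]
outliers = ages[(ages < Q1 - 1.5 * IQR) | (ages > Q3 + 1.5 * IQR)]
outliers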
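
The same rule can be wrapped in a reusable helper that also returns the fences. The function name `iqr_outliers` and its `k` multiplier parameter are hypothetical, chosen for illustration; 1.5 is the conventional default.

```python
import pandas as pd

def iqr_outliers(s: pd.Series, k: float = 1.5):
    """Hypothetical helper: return (outliers, lower_fence, upper_fence) using the IQR rule."""
    q1 = s.quantile(0.25)
    q3 = s.quantile(0.75)
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    return s[(s < lower_fence) | (s > upper_fence)], lower_fence, upper_fence

ages = pd.Series([22, 24, 25, 26, 27, 28, 29, 95])
out, lower, upper = iqr_outliers(ages)
print(out.tolist(), lower, upper)
```

Raising `k` (for example to 3) widens the fences so that only more extreme values are flagged.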

### Q5. Variable Transformation Example
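**MinMax Scaling & Label Encoding**

Min-max scaling rescales a numeric feature to the range [0, 1]. Label encoding maps each category to an integer in sorted order, so `Female` becomes 0 and `Male` becomes 1. Note that scikit-learn intends `LabelEncoder` for target labels; for input features, `OrdinalEncoder` or `OneHotEncoder` is the usual choice.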

In [None]:
from sklearn.preprocessing import MinMaxScaler, LabelEncoder

# Rescale Age to [0, 1]: 20 -> 0.0, 40 -> 0.5, 60 -> 1.0
scaler = MinMaxScaler()
df_age = pd.DataFrame({'Age': [20, 40, 60]})
df_age['Age_scaled'] = scaler.fit_transform(df_age[['Age']])

# Map each category to an integer (classes sorted alphabetically: Female -> 0, Male -> 1)
encoder = LabelEncoder()
df_gender = pd.DataFrame({'Gender': ['Male', 'Female', 'Female']})
df_gender['Encoded'] = encoder.fit_transform(df_gender['Gender'])

# Show both results side by side
pd.concat([df_age, df_gender], axis=1)
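
Min-max scaling applies x' = (x − min) / (max − min) to every value. Computing the formula by hand in pandas reproduces `MinMaxScaler`'s default [0, 1] output for the same ages:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

ages = pd.DataFrame({'Age': [20, 40, 60]})

# Manual formula: (x - min) / (max - min)
manual = (ages['Age'] - ages['Age'].min()) / (ages['Age'].max() - ages['Age'].min())

# scikit-learn equivalent (default feature_range=(0, 1)); returns a 2-D array, flattened here
scaled = MinMaxScaler().fit_transform(ages[['Age']]).ravel()

print(manual.tolist())  # [0.0, 0.5, 1.0]
print(scaled.tolist())
```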

## 🤖 ML Interview Questions

### Q6. Categories of ML Problems

In [None]:
pd.DataFrame({
    'Category': ['Regression', 'Classification', 'Clustering'],
    'Target Type': ['Continuous', 'Categorical', 'No target'],
    'Use Case': ['Predict house price', 'Spam detection', 'Customer segmentation']
})
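
In scikit-learn the distinction shows up in the `fit` call: supervised estimators (regression, classification) take a target `y`, while clustering is fit on the features alone. A toy sketch; the data values are made up for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])

# Regression: continuous target (here exactly 100 * x)
y_price = np.array([100.0, 200.0, 300.0, 1000.0, 1100.0, 1200.0])
reg = LinearRegression().fit(X, y_price)

# Classification: categorical target
y_spam = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X, y_spam)

# Clustering: no target, groups are discovered from X alone
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(reg.predict([[4.0]]))  # a continuous value
print(clf.predict([[4.0]]))  # a class label
print(km.labels_)            # a cluster id for each row
```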

### Q7. Regression vs Classification based on target

In [None]:
pd.DataFrame({
    'Target (Y)': ['Price (numeric, e.g. $400)', 'Purchased (Yes/No)'],
    'Task Type': ['Regression', 'Classification'],
    'Example': ['House pricing', 'Purchase decision']
})
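
As a rough rule of thumb, a numeric target with many distinct values suggests regression, while a target with a few discrete labels suggests classification. The helper below is a hypothetical heuristic (the name `guess_task_type` and the threshold of 10 distinct values are assumptions), not a substitute for knowing what the target means:

```python
import pandas as pd

def guess_task_type(y: pd.Series, max_classes: int = 10) -> str:
    """Hypothetical heuristic: guess the task type from the target column."""
    # Non-numeric labels (e.g. 'Yes'/'No') can only be classes
    if not pd.api.types.is_numeric_dtype(y):
        return 'Classification'
    # Numeric but with only a few distinct values: likely encoded classes
    if y.nunique() <= max_classes:
        return 'Classification'
    return 'Regression'

prices = pd.Series(range(100_000, 500_000, 20_000))  # 20 distinct prices
purchased = pd.Series(['Yes', 'No', 'Yes'])

print(guess_task_type(prices))     # Regression
print(guess_task_type(purchased))  # Classification
```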

### Q8. How do you prevent overfitting?

In [None]:
pd.DataFrame({
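    'Technique': ['Cross-validation', 'Regularization', 'Pruning (decision trees)'],
    'Purpose': [
        'Evaluate on held-out folds to detect overfitting and tune hyperparameters',
        'Penalize model complexity (e.g. large coefficients)',
        'Remove tree branches that add little predictive power'
    ]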
})
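
Cross-validation and pruning can be combined: score an unpruned decision tree and a cost-complexity-pruned one on the same folds. A sketch on scikit-learn's bundled breast-cancer dataset; `ccp_alpha=0.01` is an arbitrary illustrative value, and which tree scores higher depends on the data:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Unpruned tree: grows until leaves are pure, which tends to overfit
full_tree = DecisionTreeClassifier(random_state=0)

# Cost-complexity pruning: a larger ccp_alpha removes more weak branches
pruned_tree = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0)

for name, model in [('unpruned', full_tree), ('pruned', pruned_tree)]:
    # 5-fold CV: each score comes from a fold the model was not trained on
    scores = cross_val_score(model, X, y, cv=5)
    print(f'{name}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})')
```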

### Q9. Regularization Techniques

In [None]:
pd.DataFrame({
    'Technique': ['Lasso (L1)', 'Ridge (L2)', 'Elastic Net'],
    'Description': [
        'Can shrink some coefficients exactly to 0 (implicit feature selection)',
        'Shrinks all coefficients toward 0, but rarely to exactly 0',
        'Combines the L1 and L2 penalties'
    ]
})
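
The difference between the penalties is easiest to see on synthetic data where only a few features matter. A sketch using `make_regression` with 3 informative features out of 10; the `alpha` and `l1_ratio` values are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, Ridge

# 10 features, only 3 of which actually influence y
X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=1.0, random_state=0)

models = {
    'Lasso (L1)': Lasso(alpha=1.0),
    'Ridge (L2)': Ridge(alpha=1.0),
    'Elastic Net': ElasticNet(alpha=1.0, l1_ratio=0.5),
}

zero_counts = {}
for name, model in models.items():
    model.fit(X, y)
    # Count coefficients that are exactly zero
    zero_counts[name] = int(np.sum(model.coef_ == 0))
    print(f'{name}: {zero_counts[name]} of 10 coefficients are exactly 0')
```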

### Q10. KNN Regression vs Classification

In [None]:
pd.DataFrame({
    'Feature': ['Target Type', 'Output', 'Use Case'],
    'KNN Classification': ['Categorical', 'Majority class of the k nearest neighbors', 'Spam detection'],
    'KNN Regression': ['Continuous', 'Mean target of the k nearest neighbors', 'House price prediction']
})
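
A tiny one-feature dataset makes the two outputs concrete: with `n_neighbors=3`, the classifier returns the majority label of the three nearest points and the regressor returns the mean of their targets. The data values are made up for illustration:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

X = np.array([[1], [2], [3], [10], [11], [12]])
y_class = np.array([0, 0, 0, 1, 1, 1])                # categorical target
y_reg = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])   # continuous target

X_new = np.array([[2.5], [11]])

clf = KNeighborsClassifier(n_neighbors=3).fit(X, y_class)
reg = KNeighborsRegressor(n_neighbors=3).fit(X, y_reg)

# For 2.5 the nearest points are 2, 3 and 1; for 11 they are 11, 10 and 12
print(clf.predict(X_new))  # majority class: 0, then 1
print(reg.predict(X_new))  # mean target: 2.0, then 11.0
```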