# Interview Questions

**28-04-2025**

# Python 

**1. What is the purpose of the `__init__` method in Python classes? How is it different from other methods?**

### 🔹 `__init__` method in Python:

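* It's usually called the **constructor** 💥, though strictly it is the **initializer**: `__new__` creates the object and `__init__` sets it up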
* Runs **automatically** when you create an object
* Used to **initialize** values 🛠️


### 🔸 Difference from other methods:

* Other methods are called **manually**
* `__init__` runs **first**, right after object creation 🔄


### 🧪 Example:

```python
class Car:
    def __init__(self, brand):
        self.brand = brand

c = Car("BMW")  # __init__ runs here
```


**2. Explain the usage of `*args` and `**kwargs` in Python function definitions. Provide an example.**

### 🔹 `*args`

* Collects **any number of extra positional arguments** 📦
* Inside the function they arrive as a **tuple**

---

### 🔸 `**kwargs`

* Collects **any number of extra keyword (key=value) arguments** 🗂️
* Inside the function they arrive as a **dictionary**
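The question also asks for an example. A minimal sketch (the function name `summarize` is illustrative) showing what each parameter receives; the same `*` and `**` syntax also unpacks a list or dict when calling a function.

```python
def summarize(*args, **kwargs):
    # args collects extra positional arguments into a tuple
    print(type(args).__name__, args)
    # kwargs collects extra keyword arguments into a dict
    print(type(kwargs).__name__, kwargs)
    return sum(args), kwargs


total, options = summarize(1, 2, 3, unit="cm", verbose=True)
# tuple (1, 2, 3)
# dict {'unit': 'cm', 'verbose': True}
print(total)  # 6

# * and ** at the call site unpack a list and a dict into arguments
summarize(*[4, 5], **{"unit": "m"})
```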

**3. How does Python handle exceptions? What are the differences between 
try/except and finally blocks?**

### 🛠️ How Python handles exceptions:

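👉 When an error occurs, Python **raises an exception**. If nothing catches it, the program stops with a traceback 🚫💥. A `try`/`except` block **catches and handles** it so the program can continue.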

---

### 🔹 `try/except`

* **try**: put the risky code here
* **except**: what to do if that error occurs

```python
try:
    x = 5 / 0
except ZeroDivisionError:
    print("Can't divide by zero 😅")
```

---

### 🔸 `finally`

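* Runs **no matter what** (error or not), so it is used for cleanup such as closing files

```python
try:
    x = 5 / 0
except ZeroDivisionError:
    print("Can't divide by zero 😅")
finally:
    print("Done ✅")  # always runs, with or without an error
```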
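To see how `except` and `finally` differ, this sketch (the `safe_divide` helper and its `log` list are hypothetical, used only for tracing) records which blocks run: `except` runs only when its error type is raised, while `finally` runs in every case, even when the function returns from inside `try`.

```python
def safe_divide(a, b, log):
    try:
        return a / b
    except ZeroDivisionError:
        log.append("except")   # runs only for ZeroDivisionError
        return None
    finally:
        log.append("finally")  # runs on success, handled error, or unhandled error


ok_log, err_log = [], []
print(safe_divide(6, 3, ok_log), ok_log)    # 2.0 ['finally']
print(safe_divide(1, 0, err_log), err_log)  # None ['except', 'finally']
```

Calling `safe_divide("a", 1, [])` raises a `TypeError` that no `except` catches, yet `finally` still appends to the log before the error propagates.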


**4. How can you sort a dictionary in Python based on its values?**

```python
my_dict = {'a': 3, 'b': 1, 'c': 2}

my_dict
# {'a': 3, 'b': 1, 'c': 2}

my_dict.items()
# dict_items([('a', 3), ('b', 1), ('c', 2)])

dict(sorted(my_dict.items(), key=lambda item: item[1]))
# {'b': 1, 'c': 2, 'a': 3}
```
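Two common variations, sketched with standard-library tools: `operator.itemgetter(1)` is an equivalent key to `lambda item: item[1]`, and `reverse=True` sorts from highest to lowest value. Since Python 3.7, `dict` keeps insertion order, so the rebuilt dictionary stays sorted.

```python
from operator import itemgetter

my_dict = {'a': 3, 'b': 1, 'c': 2}

# itemgetter(1) picks the value out of each (key, value) pair
ascending = dict(sorted(my_dict.items(), key=itemgetter(1)))
descending = dict(sorted(my_dict.items(), key=itemgetter(1), reverse=True))

print(ascending)   # {'b': 1, 'c': 2, 'a': 3}
print(descending)  # {'a': 3, 'c': 2, 'b': 1}
```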

**5. Explain negative indexing in lists and provide an example**

### 🔹 What is **negative indexing**?

👉 It means counting **from the end** of the list 🔚
👉 `-1` = last item, `-2` = second-to-last, and so on; `my_list[-k]` is the same as `my_list[len(my_list) - k]`



### 🧪 Example:

```python
my_list = [10, 20, 30, 40]

print(my_list[-1])  # 👉 40  
print(my_list[-2])  # 👉 30
```
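Negative indices also work in slices. A short sketch reusing the list above:

```python
my_list = [10, 20, 30, 40]

last_two = my_list[-2:]        # [30, 40]
all_but_last = my_list[:-1]    # [10, 20, 30]
reversed_copy = my_list[::-1]  # [40, 30, 20, 10] (a negative step walks backwards)

print(last_two, all_but_last, reversed_copy)
# my_list[-5] would raise IndexError: only -1 .. -4 are valid for 4 items
```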




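**6. Define a palindrome. Write a Python program to check if a string is a palindrome.**

A **palindrome** is a word or string that reads the same forwards and backwards (e.g. `"madam"`, `"racecar"`).

```python
def is_palindrome(s):
    # s[::-1] is the reversed string; a palindrome equals its reverse
    return s == s[::-1]

print(is_palindrome("madam"))  # True
print(is_palindrome("hello"))  # False
```
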
# EDA

**1. What insights can you gather from a histogram in EDA?**

### 📊 Histogram Insights in EDA:

1. **Data Distribution** – Shows how values are spread 📈
2. **Skewness** – Tells if data is left or right skewed ↩️➡️
3. **Outliers** – Isolated bars far from the main body of the data 🚨
4. **Central Tendency** – The tallest bars show where most values lie 🎯

---

💡 Helps to understand the **shape** of data quickly! 🔍📉
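A minimal sketch of reading these insights numerically, assuming NumPy and a synthetic right-skewed sample (exponential data with a fixed seed, chosen only for illustration). `np.histogram` returns the bar heights and bin edges a plot would draw; the skewness is the standard moment-based estimate (mean of cubed z-scores).

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=1_000)  # assumed example data, right-skewed

# Bar heights (counts) and the 11 edges of 10 equal-width bins
counts, edges = np.histogram(data, bins=10)

# Skewness: > 0 means a long tail to the right, < 0 a long tail to the left
z = (data - data.mean()) / data.std()
skewness = np.mean(z ** 3)

# Index of the tallest bar, i.e. where most values lie
peak_bin = np.argmax(counts)

print(counts)
print(f"skewness = {skewness:.2f}, peak bin = {peak_bin}")
```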


**2. Which data visualization technique is commonly used to display the 
relationship between a categorical variable and a numerical variable?** 

### ✅ **Bar Plot**:

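* Each **bar** represents one **category** 🏷️
* Its height shows an **aggregate** of the numerical variable, such as the **average or total** per category (a **group by**) 💡
* A plain **count plot** shows only the **frequency** (count) of each category 📏
* **Box plots** are also common when you want the full spread of values per category 📊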
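The grouped bar plot is just an aggregation per category. A minimal sketch with pandas on a small made-up DataFrame (the `city` and `price` columns are hypothetical):

```python
import pandas as pd

# Hypothetical data: one categorical and one numerical column
df = pd.DataFrame({
    "city": ["A", "B", "A", "C", "B", "C"],
    "price": [100, 200, 120, 300, 180, 330],
})

# One value per category: these are the bar heights
avg_price = df.groupby("city")["price"].mean()
print(avg_price)
```

With matplotlib installed, `avg_price.plot(kind="bar")` draws one bar per city from this Series; seaborn's `barplot` does the grouping and averaging itself.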


**3. What are the standard names for positive, negative, and normal kurtosis 
curves?**

### 📈 Kurtosis Types:

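1. **Positive (excess) kurtosis** → **Leptokurtic** 🔺 (heavier tails than a normal curve, often a sharper peak)
2. **Negative (excess) kurtosis** → **Platykurtic** 🔻 (lighter tails, flatter top)
3. **Normal kurtosis** → **Mesokurtic** ➖ (excess kurtosis = 0, i.e. kurtosis = 3, like the normal distribution)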
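A sketch that checks the three names on synthetic samples, assuming NumPy. Excess kurtosis is computed as the mean of the fourth power of the z-scores minus 3; the theoretical values are 3 for the Laplace distribution, 0 for the normal and -1.2 for the uniform. `scipy.stats.kurtosis` returns the same excess (Fisher) definition by default.

```python
import numpy as np


def excess_kurtosis(x):
    # Population excess kurtosis: E[z^4] - 3, which is 0 for a normal distribution
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 4) - 3


rng = np.random.default_rng(0)
n = 100_000
samples = {
    "laplace (leptokurtic)": rng.laplace(size=n),   # theoretical excess = 3
    "normal (mesokurtic)": rng.normal(size=n),      # theoretical excess = 0
    "uniform (platykurtic)": rng.uniform(size=n),   # theoretical excess = -1.2
}

for name, x in samples.items():
    print(f"{name}: {excess_kurtosis(x):+.2f}")
```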

**4. What is the formula for calculating the expected value?**

### 📘 **Expected Value (E)** Formula:

$$
E(X) = \sum [x \times P(x)]
$$

👉 For a discrete random variable, multiply each value `x` by its probability `P(x)` and add them up ➕
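A worked example for a fair six-sided die, where each face has P(x) = 1/6, so E(X) = (1 + 2 + 3 + 4 + 5 + 6) / 6 = 21/6 = 3.5. The sketch below (the helper name `expected_value` is illustrative) uses `fractions.Fraction` to keep the arithmetic exact.

```python
from fractions import Fraction


def expected_value(outcomes):
    # outcomes: list of (x, P(x)) pairs; the probabilities should sum to 1
    return sum(x * p for x, p in outcomes)


die = [(face, Fraction(1, 6)) for face in range(1, 7)]
print(expected_value(die))  # 7/2
```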

**5. What is the default value for the number of bins in a histogram using 
matplotlib and seaborn libraries?**

### 📊 Default bins:

* **Matplotlib** 👉 `bins=10` 🔟
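* **Seaborn** (`histplot`) 👉 `bins="auto"` ⚙️: NumPy's `"auto"` rule, which takes the larger bin count of the **Sturges** and **Freedman–Diaconis** rules, so it varies with the data (the older `distplot` used the Freedman–Diaconis rule)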
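To see what these defaults produce for a given dataset, `np.histogram_bin_edges` accepts both a fixed count and the named rules. This sketch assumes a synthetic normal sample of 500 points; for that size the Sturges rule gives 10 bins, and `"auto"` never gives fewer bins than Sturges.

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(size=500)  # assumed example data

mpl_default = np.histogram_bin_edges(data, bins=10)        # matplotlib's default count
sturges = np.histogram_bin_edges(data, bins="sturges")
auto_rule = np.histogram_bin_edges(data, bins="auto")      # seaborn histplot default

# n edges means n - 1 bins
print(len(mpl_default) - 1, len(sturges) - 1, len(auto_rule) - 1)
```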


# Data preprocessing

**1) What is the difference between data normalization and data standardization? 
When would you use each technique?**

### 🔹 **Normalization** (Min-Max Scaling)

👉 Rescales each feature to the range **[0, 1]**
📦 Formula:

$$
X' = \frac{X - X_{\min}}{X_{\max} - X_{\min}}
$$

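✅ Use when you need a **bounded range** or the data **doesn't follow a normal distribution**; it is sensitive to outliers, since one extreme value sets X_min or X_max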

---

### 🔸 **Standardization** (Z-score Scaling)

👉 Converts data to have **mean = 0** and **std = 1**
📦 Formula:

$$
Z = \frac{X - \mu}{\sigma}
$$

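✅ Use when data is roughly **normally distributed** or the model expects **zero-centred** features; values are not bounded, and outliers distort it less than min-max scaling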

---

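💡 Normalize → common for models like **KNN** and **neural networks (NN)**
💡 Standardize → common for models like **linear/logistic regression (LR)** and **SVM**

🔥 Both put features on a **comparable scale**, so no feature dominates just because of its units! 💪📊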
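Both formulas applied by hand with NumPy to a small made-up array; scikit-learn's `MinMaxScaler` and `StandardScaler` perform the same transformations column by column. Here the mean is 30 and the population standard deviation is sqrt(200), about 14.14.

```python
import numpy as np

x = np.array([10, 20, 30, 40, 50], dtype=float)  # assumed example feature

# Min-max normalization: X' = (X - X_min) / (X_max - X_min)
x_norm = (x - x.min()) / (x.max() - x.min())

# Standardization (z-score): Z = (X - mu) / sigma, using the population std (ddof=0)
x_std = (x - x.mean()) / x.std()

print(x_norm.tolist())            # [0.0, 0.25, 0.5, 0.75, 1.0]
print(x_std.round(3).tolist())    # [-1.414, -0.707, 0.0, 0.707, 1.414]
```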


**2. Explain the difference between One Hot Encoding and Label Encoding. When would you use each method?**

### 🔹 **Label Encoding**

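👉 Converts each category to an **integer** (e.g., Low=0, Medium=1, High=2) 🔢
✅ Use when **order matters** (like Low, Medium, High) 📶; scikit-learn's `LabelEncoder` numbers classes alphabetically and is meant for target labels, so use an explicit mapping or `OrdinalEncoder` to control the order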

---

### 🔸 **One Hot Encoding**

👉 Creates a **separate 0/1 column** for each category 🟦
✅ Use when **no order** (like City, Gender) 🚫📊

---

💡 Use One Hot when integer codes would be misread as **magnitudes or distances** (e.g. linear models, KNN)
💡 Use Label encoding when categories have a **ranking** 🔝

🔥 Both turn categories into numbers that models can use! 💪📈
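A minimal pandas sketch of both methods on a made-up DataFrame (the `size` and `city` columns and the mapping are assumptions for illustration): an explicit dictionary gives an ordered label encoding, and `pd.get_dummies` creates one 0/1 column per category.

```python
import pandas as pd

df = pd.DataFrame({
    "size": ["Low", "High", "Medium", "Low"],      # ordinal: has a ranking
    "city": ["Delhi", "Mumbai", "Pune", "Delhi"],  # nominal: no order
})

# Label (ordinal) encoding with an explicit order
size_order = {"Low": 0, "Medium": 1, "High": 2}
df["size_encoded"] = df["size"].map(size_order)

# One hot encoding: one 0/1 column per city
city_dummies = pd.get_dummies(df["city"], prefix="city", dtype=int)
df = pd.concat([df, city_dummies], axis=1)

print(df)
```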


**3. Define discretization. How can you discretize continuous data?**



### 🔹 **Discretization**

👉 It's the process of **converting continuous data** (like 10.5, 12.3) into **discrete categories** (like low, medium, high) 🏷️



### 🔸 **How to discretize continuous data?**

1. **Equal Width Binning**: Divide range into equal intervals ⏸️
2. **Equal Frequency Binning**: Divide data into bins with the **same number** of data points 🔢
3. **Custom Binning**: Based on domain knowledge or specific thresholds 🧠
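pandas covers all three strategies: `pd.cut` with an integer does equal-width binning, `pd.qcut` does equal-frequency (quantile) binning, and `pd.cut` with explicit edges does custom binning. A sketch on made-up ages; the labels and thresholds are assumptions for illustration.

```python
import pandas as pd

ages = pd.Series([5, 17, 22, 35, 41, 58, 63, 79])  # assumed example data

# 1. Equal width: 3 intervals of the same length across the range
equal_width = pd.cut(ages, bins=3, labels=["low", "medium", "high"])

# 2. Equal frequency: 4 quantile bins with the same number of points each
equal_freq = pd.qcut(ages, q=4, labels=["Q1", "Q2", "Q3", "Q4"])

# 3. Custom: domain thresholds, right-inclusive intervals (0, 18], (18, 60], (60, 120]
custom = pd.cut(ages, bins=[0, 18, 60, 120], labels=["child", "adult", "senior"])

print(pd.DataFrame({"age": ages, "width": equal_width,
                    "freq": equal_freq, "custom": custom}))
```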



💡 Discretization can make models more robust to **noise and outliers** and lets algorithms that expect categorical inputs use continuous features! 🔥📊


**4. How would you normalize text data for natural language processing (NLP)?**

### 🔸 Steps for Normalization:

1. **Lowercasing**: Convert all text to lowercase 🧑‍💻

   * `"Hello"` → `"hello"`

2. **Removing Punctuation**: Eliminate special characters and symbols ✂️

   * `"Hello, world!"` → `"hello world"`

3. **Removing Stop Words**: Remove common words that don’t add much meaning (like "the", "and") 🚫

   * `"The cat is on the mat"` → `"cat mat"`

4. **Tokenization**: Split text into words or tokens 🔠

   * `"hello world"` → `["hello", "world"]`

5. **Stemming/Lemmatization**: Reduce words to their base form 🌱

   * `"running"` → `"run"` (Stemming)
   * `"better"` → `"good"` (Lemmatization)
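A minimal pure-Python sketch of steps 1 to 4, assuming a tiny hand-written stop-word list (real pipelines usually take one from a library such as NLTK or spaCy). Stemming and lemmatization are left out because they need a proper tool, such as NLTK's `PorterStemmer` or `WordNetLemmatizer`. The code tokenizes before dropping stop words, since the filter works on tokens.

```python
import string

# Tiny illustrative stop-word list (an assumption, not a standard list)
STOP_WORDS = {"the", "is", "on", "and", "a", "an", "of"}


def normalize(text):
    text = text.lower()                                               # 1. lowercase
    text = text.translate(str.maketrans("", "", string.punctuation))  # 2. remove punctuation
    tokens = text.split()                                             # 4. tokenize on whitespace
    return [t for t in tokens if t not in STOP_WORDS]                 # 3. drop stop words


print(normalize("The cat is on the mat!"))  # ['cat', 'mat']
print(normalize("Hello, world!"))           # ['hello', 'world']
```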


**5. What is meant by imbalanced data? How do you handle imbalanced datasets in data preprocessing?**

### 🔹 **Imbalanced Data**

👉 When one class (or category) has **significantly more** samples than another in a dataset ⚖️

### 🔸 **Handling Imbalanced Datasets**:

1. **Resampling Techniques**:

   * **Oversampling**: Increase the number of samples in the **minority class**, by duplicating samples or creating synthetic ones (e.g., using **SMOTE**) 📈
   * **Undersampling**: Decrease the number of samples in the **majority class** 📉