# Interview questions

**25-04-2025**

# Python

**1. How can you match a string that starts with an uppercase letter followed by any number of lowercase letters using regular expressions in Python?**

```python
import re

pattern = r'^[A-Z][a-z]*'
```
The pattern matches a string that:

* starts with **one uppercase** letter `A-Z`
* is followed by **zero or more lowercase** letters `a-z`
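
A short usage sketch for the pattern above. The sample strings and the helper name `is_capitalized_word` are illustrative assumptions. Note that `^` only anchors the start, so `re.match` accepts any string with a matching prefix; `re.fullmatch` is one way to require the whole string to match.

```python
import re

pattern = r'^[A-Z][a-z]*'

def is_capitalized_word(text):
    """Return True if the whole string is one uppercase letter plus lowercase letters."""
    return re.fullmatch(r'[A-Z][a-z]*', text) is not None

print(bool(re.match(pattern, "Hello")))          # True
print(bool(re.match(pattern, "hello")))          # False
print(re.match(pattern, "HelloWorld").group())   # Hello (only the prefix matches)
print(is_capitalized_word("HelloWorld"))         # False
```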

**2. When would you choose to use append() over extend() and vice versa?**

🔹 **append()** 👉 Adds **one item** (even a list as a single element)

```python
a = [1, 2]; a.append([3, 4])  # a is now [1, 2, [3, 4]]
```

🔹 **extend()** 👉 Adds **all elements** from another list (or any iterable), one by one

```python
a = [1, 2]; a.extend([3, 4])  # a is now [1, 2, 3, 4]
```

✅ Use:

* `append()` for single item 📦
* `extend()` for merging lists 🔗


**3. What is the purpose of the Python garbage collector, and how does it reclaim 
memory from objects that are no longer referenced?**

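Python's **garbage collector** 🗑️🧠 works like this:

🔸 **Purpose**: Frees memory by reclaiming objects that are **no longer referenced**.

🔸 **How** (in CPython): Every object keeps a **reference count** ➕

* When the count drops to 0 ➡️ the object is deallocated immediately
* **Circular references** (objects that refer to each other) never reach 0, so a separate **generational cyclic GC** ♻️ (the `gc` module) periodically detects and frees them

So, it keeps your program's memory clean and smooth 🧼💨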
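
A minimal sketch of both mechanisms, assuming CPython. The `Node` class is a hypothetical helper, and automatic collection is paused only to make the demo deterministic. A weak reference is used as a probe because it does not keep its target alive.

```python
import gc
import weakref

class Node:
    """Hypothetical helper class that can point at another Node."""
    def __init__(self):
        self.other = None

gc.disable()  # pause automatic cycle collection so the demo is deterministic

# Case 1: no cycle, so reference counting alone frees the object
single = Node()
single_probe = weakref.ref(single)
del single
freed_by_refcount = single_probe() is None

# Case 2: a reference cycle (a -> b -> a) keeps both counts above zero
a, b = Node(), Node()
a.other, b.other = b, a
cycle_probe = weakref.ref(a)
del a, b
alive_before_gc = cycle_probe() is not None

gc.collect()  # run the cyclic collector explicitly
freed_by_gc = cycle_probe() is None

gc.enable()
print(freed_by_refcount, alive_before_gc, freed_by_gc)
```

On CPython all three flags are expected to be `True`: the lone object disappears as soon as its last name is deleted, while the cycle survives until the collector runs.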

**4. How can you convert a string to an integer in Python using built-in functions or
methods?**
```python
num = int("123")
```

✅ Converts string `"123"` to integer `123` 🔁🔢
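
A few edge cases worth knowing, as a small sketch. The helper `to_int` and its `default` parameter are illustrative assumptions, not a built-in.

```python
def to_int(text, default=None):
    """Convert text to int, returning `default` when it is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        return default

print(int("  42 "))         # 42 (surrounding whitespace is ignored)
print(int("ff", 16))        # 255 (the second argument is the base)
print(to_int("12.5"))       # None (int() rejects decimal strings)
print(int(float("12.5")))   # 12 (go through float, which truncates toward zero)
```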



**5. Are there any available programs or libraries to aid with static analysis or 
problem finding in Python?**

Top tools for **static analysis** in Python 🐍🔍:

1. ✅ **Pylint** – Finds errors, style issues 🧹
2. ✅ **Flake8** – Lightweight, checks PEP8 + errors 📏
3. ✅ **mypy** – Checks **type hints** 🧠
4. ✅ **Bandit** – Finds **security issues** 🔐
5. ✅ **Pyright** – Fast type checker (by Microsoft) ⚡
6. ✅ **Black** – Auto-formats code (a formatter rather than an analyzer, but often run alongside the tools above) 🖤

**6. Create a programme to convert between Celsius and Fahrenheit values.**


```python
def celsius_to_fahrenheit(celsius):
    return (celsius * 9/5) + 32

def fahrenheit_to_celsius(fahrenheit):
    return (fahrenheit - 32) * 5/9

# Get user input
temperature = float(input("Enter temperature value: "))
unit = input("Is this in Celsius or Fahrenheit? (C/F): ").upper()

if unit == "C":
    print(f"{temperature}°C = {celsius_to_fahrenheit(temperature)}°F")
elif unit == "F":
    print(f"{temperature}°F = {fahrenheit_to_celsius(temperature)}°C")
else:
    print("Invalid unit! Please enter C or F.")
```




# EDA

1. **Which statistical measure provides information about the spread or variability of a 
dataset?**

* **Standard Deviation**: The square root of the variance; shows how far values typically lie from the mean, in the same units as the data.
* **Variance**: The average squared deviation from the mean.
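
Both measures are available in the standard library. A small sketch with made-up data; the `p` variants treat the data as a whole population, the others as a sample (dividing by n - 1).

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up values, mean = 5

pop_var = statistics.pvariance(data)   # population variance, equal to 4 for this data
pop_std = statistics.pstdev(data)      # population standard deviation, equal to 2
sample_var = statistics.variance(data) # sample variance, 32 / 7

print(pop_var, pop_std, sample_var)
```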

2. **Which data visualisation technique is used to display the relationship between two 
numerical variables?**

A **scatter plot** is used to display the relationship between **two numerical variables**. It shows individual data points on a two-dimensional grid, where:

* The **x-axis** represents one variable
* The **y-axis** represents the other variable

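**Primary purpose**: a scatter plot reveals the relationship's

- direction (positive or negative)
- strength
- form (linear or non-linear)

**Disadvantage**: judging strength by eye is subjective. **Solution**: use the correlation coefficient, which ranges from -1 to +1. As a rule of thumb, a magnitude above 0.85 means high correlation and below 0.4 means low correlation.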
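
A minimal sketch of the Pearson correlation coefficient in plain Python, using the 0.85 and 0.4 cut-offs mentioned above as rule-of-thumb labels. The function names and sample data are illustrative assumptions.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def strength_label(r):
    """Label |r| using the rule-of-thumb thresholds from the notes."""
    if abs(r) > 0.85:
        return "high"
    if abs(r) < 0.4:
        return "low"
    return "moderate"

hours = [1, 2, 3, 4, 5]          # made-up sample data
scores = [52, 55, 61, 64, 70]
r = pearson_r(hours, scores)
print(round(r, 3), strength_label(r))
```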

3. **What is the formula for calculating Kurtosis and which python function is used to get 
the kurtosis value?**

<img src="../resources/skewKurt.png" width=500/>

```python
from scipy.stats import kurtosis
kurt_value = kurtosis(data)  # data: array-like of numbers; returns excess (Fisher) kurtosis by default
```

```python
from scipy.stats import skew
skew_value = skew(data)  # data: array-like of numbers
```
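
The question also asks for the formula. A plain-Python sketch of the standard moment-based definitions follows, assuming the population (biased) form: skewness = m3 / m2^1.5 and excess kurtosis = m4 / m2^2 - 3, where mk is the k-th central moment. This is the convention `scipy.stats.skew` and `scipy.stats.kurtosis` use by default (`bias=True`, `fisher=True`). The helper names and sample data are illustrative.

```python
def central_moment(data, k):
    """k-th central moment: mean of (x - mean) ** k."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

def skewness(data):
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    # Subtracting 3 makes a normal distribution score 0
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

data = [1, 2, 3, 4, 5]
print(skewness(data))         # 0.0 (perfectly symmetric)
print(excess_kurtosis(data))  # approximately -1.3 (lighter tails than a normal)
```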

Main purposes of **skewness** and **kurtosis**:

### 📉 Skewness:

1. ✅ Tells if data is **left** or **right** skewed
2. ✅ Helps check symmetry: skewness near 0 suggests **mean ≈ median ≈ mode**, as in a normal distribution

### 📈 Kurtosis:

1. ✅ Shows if data has **heavy or light tails** (outliers)
2. ✅ Often described as how **peaky** the distribution is, though it mainly reflects tail weight

Useful in understanding **data shape** before modeling 🤓📊


4. **Why is the mean influenced by outliers, but not the median?**


### ✅ Mean is affected 😵‍💫

The mean **adds up all values**, including extreme outliers, so a single very large number can shift it a lot.

### ✅ Median is safe 😎

The median depends only on the **middle value** of the sorted data, so an extreme value at either end barely changes it.
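
A quick numeric check of this with the standard library; the salary figures are made up.

```python
import statistics

salaries = [30, 32, 35, 38, 40]
with_outlier = salaries + [1000]  # one extreme value added

print(statistics.mean(salaries), statistics.median(salaries))
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```

The mean jumps from 35 to roughly 195.8, while the median only moves from 35 to 36.5.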

# Data preprocessing

**1. How can you check for stationarity in a given time series data using statistical
tests or visual inspection?**
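### **Visual Check** 👀

* Plot the data 📈
* If the **mean & variance** look constant over time → likely stationary
* If there is a visible trend or seasonality → not stationary

### **Statistical Tests** 🧪

* **ADF (Augmented Dickey-Fuller)**: null hypothesis = non-stationary (unit root); a small p-value (e.g. < 0.05) suggests stationarity
* **KPSS**: null hypothesis = stationary; a small p-value suggests non-stationarity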
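
A rough numeric counterpart to the visual check, as a sketch only: split the series in half and compare the mean and variance of each half. This is a heuristic, not a formal test, and the function name and sample series are illustrative assumptions.

```python
import statistics

def halves_summary(series):
    """Return (mean, variance) for the first and second half of a series."""
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    return (
        (statistics.mean(first), statistics.pvariance(first)),
        (statistics.mean(second), statistics.pvariance(second)),
    )

flat = [10, 12, 11, 9, 10, 11, 12, 10]     # made-up: fluctuates around a fixed level
trending = [1, 3, 5, 7, 9, 11, 13, 15]     # made-up: steadily increasing

print(halves_summary(flat))      # similar means in both halves
print(halves_summary(trending))  # the second-half mean is much larger
```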

**2. Can you provide examples of issues such as seasonality, trends, or 
irregularities that need to be addressed during preprocessing?**

### 🔁 **Seasonality**

📅 Pattern repeats over time (daily, weekly, yearly)
**Example**: Ice cream sales high in summer, low in winter 🍦❄️

**Fix**: Seasonal differencing or decomposition

---

### 📈 **Trend**

⬆️ or ⬇️ movement over time (long-term growth or drop)
**Example**: Increasing temperature due to climate change 🌍🔥

**Fix**: Differencing or detrending

---

### ❓ **Irregularities (Noise/Outliers)**

Sudden spikes or drops
**Example**: Sudden sales spike due to a one-day offer 💥🛒

**Fix**: Smoothing, outlier removal, or transformation
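
The differencing fixes mentioned for trend and seasonality can be sketched in a few lines. The `difference` helper and both sample series are illustrative assumptions.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

trend = [10, 12, 14, 16, 18, 20]           # steady upward trend
print(difference(trend))                   # [2, 2, 2, 2, 2]

seasonal = [1, 4, 9, 4, 1, 4, 9, 4]        # pattern repeating every 4 steps
print(difference(seasonal, lag=4))         # [0, 0, 0, 0]
```

First differencing (`lag=1`) turns the linear trend into a constant, and seasonal differencing with the lag set to the season length cancels the repeating pattern.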

**3. How do you handle missing values in time series data, and what are some
techniques for imputing missing values in time series data?**

Here is how you can handle missing values in time series data 🔧:

### Handling Missing Values:

* **Remove them:** If only a few points are missing, drop those rows (this leaves gaps in the time index).
* **Impute them:** Fill missing values to keep data complete.

### Imputation Techniques:

1. **Forward Fill (ffill):** Use the last known value ➡️

   ```python
   df['value'] = df['value'].ffill()
   ```
2. **Backward Fill (bfill):** Use the next known value ⬅️
3. **Linear Interpolation:** Estimate values using surrounding data ➗

   ```python
   df['value'] = df['value'].interpolate(method='linear')
   ```
4. **Seasonal Adjustment:** Use seasonal patterns to estimate missing values 🌦️
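
To see what forward fill and linear interpolation do conceptually, here is a plain-Python sketch that uses `None` for missing points. The function names are illustrative, and this is not how pandas implements them internally.

```python
def forward_fill(values):
    """Replace each None with the most recent non-missing value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

def linear_interpolate(values):
    """Fill interior runs of None on a straight line between known neighbours."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled

series = [10, None, None, 16, None, 20]
print(forward_fill(series))        # [10, 10, 10, 16, 16, 20]
print(linear_interpolate(series))  # [10, 12.0, 14.0, 16, 18.0, 20]
```

In both sketches, missing values at the very start of the series stay `None` because there is no earlier point to use.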


**4.  Explain the impact of outliers on time series analysis and forecasting.**

### ⚠️ Outliers = Unusual spikes or drops

### 🔻 Impact on Analysis & Forecasting:

1. **Skews model predictions** 😵‍💫
   → Models learn wrong patterns

2. **Affects trend/seasonality detection** 📉
   → Hard to spot real trends

3. **Errors increase** ❌
   → Metrics like MAE, RMSE go up

4. **Bad confidence intervals** 📊
   → Unreliable forecast ranges

---

### ✅ Solution:

* Detect with plots or z-scores
* Fix using smoothing, transformation, or value replacement

Handle them smartly = 📈 better forecasts
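
A minimal sketch of z-score detection followed by a median replacement. The threshold of 3 is a common convention assumed here, not something fixed by the notes, and the data is made up. On very short series a z-score cannot exceed sqrt(n - 1), so the threshold may need lowering.

```python
import statistics

def zscore_outliers(series, threshold=3.0):
    """Return indices whose z-score magnitude exceeds the threshold."""
    mean = statistics.mean(series)
    std = statistics.pstdev(series)
    if std == 0:
        return []  # constant series: nothing stands out
    return [i for i, v in enumerate(series) if abs(v - mean) / std > threshold]

sales = [20, 22, 21, 19, 23, 20, 22, 95, 21, 20, 22, 21]  # one-day spike at index 7
outlier_idx = zscore_outliers(sales)

# Replace flagged points with the median, which the spike barely affects
median = statistics.median(sales)
cleaned = [median if i in outlier_idx else v for i, v in enumerate(sales)]

print(outlier_idx)  # [7]
print(cleaned[7])   # 21.0
```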

**5. What is the significance of smoothing techniques in time series data 
preprocessing and name some smoothing techniques used?**

Smoothing helps remove noise so the underlying pattern in a time series is easier to see.

### ✨ Significance:

* Clears out small ups & downs (noise) 🔕
* Highlights trends and seasonality 🌊
* Helps better forecasting and analysis 🎯


### 🔧 Common Smoothing Techniques:

1. **Moving Average** 🌀
   → Average of last 'n' values
   `rolling(window=3).mean()`

2. **Exponential Smoothing** 📉
   → More weight to recent data

3. **LOESS / LOWESS** 📈
   → Local regression smoothing

4. **Gaussian Smoothing** 🧠
   → Weights neighbouring points with a Gaussian (bell-curve) kernel for smoother results


Useful for clean & clear time series 
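
The first two techniques are simple enough to sketch in plain Python. The function names, the `alpha` value, and the data are illustrative assumptions; the moving average here is a trailing one, like `rolling(window=3).mean()` without the leading missing values.

```python
def moving_average(series, window=3):
    """Trailing moving average; the first window - 1 points produce no value."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]

def exponential_smoothing(series, alpha=0.5):
    """Simple exponential smoothing: s[t] = alpha * x[t] + (1 - alpha) * s[t-1]."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

data = [3, 5, 4, 6, 8, 7]
print(moving_average(data, window=3))    # [4.0, 5.0, 6.0, 7.0]
print(exponential_smoothing(data, 0.5))  # [3, 4.0, 4.0, 5.0, 6.5, 6.75]
```

A larger `alpha` puts more weight on the newest observation, so the smoothed series follows the data more closely.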

**6. Discuss the concept of window size or lag in moving average smoothing and 
its impact on the level of smoothing.**


The **window size** (sometimes called the **lag**) controls how many points are averaged in **moving average smoothing** 📊


### 🧠 What is Window Size?

* Number of past data points to average
* Example: `window=3` means average of last 3 values

### 📈 Impact of Window Size:

1. **Small Window (e.g. 2 or 3)**
   ➤ **Less smooth**, reacts fast to changes ⚡
   ➤ Good for short-term patterns

2. **Large Window (e.g. 10 or 20)**
   ➤ **More smooth**, hides small changes 🌊
   ➤ Good for long-term trends

### 🔁 Summary:

* **Small lag** = detailed view 🔬
* **Big lag** = smoother curve but may lose details 🎨
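
One way to put a number on "more smooth" is the average jump between consecutive points. The sketch below compares a small and a large window on made-up noisy data; `roughness` is an illustrative measure chosen for this example, not a standard metric.

```python
def moving_average(series, window):
    """Trailing moving average over `window` points."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]

def roughness(series):
    """Mean absolute change between consecutive points (lower = smoother)."""
    jumps = [abs(b - a) for a, b in zip(series, series[1:])]
    return sum(jumps) / len(jumps)

noisy = [10, 15, 11, 12, 18, 13, 19, 14, 16, 22, 17, 20]

raw_rough = roughness(noisy)
small_rough = roughness(moving_average(noisy, window=2))
large_rough = roughness(moving_average(noisy, window=6))

print(round(raw_rough, 2), round(small_rough, 2), round(large_rough, 2))
```

For this data the roughness drops as the window grows, at the cost of fewer output points and a slower reaction to real changes.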




**9. Discuss the metrics or criteria used to assess the performance of different 
smoothing techniques.**



### 📊 Common Metrics:

1. **MAE (Mean Absolute Error)**
   ➤ Average of absolute errors
   ➤ Easy & simple 💡
   `MAE = mean(|actual - predicted|)`

2. **MSE (Mean Squared Error)**
   ➤ Squares the error (penalizes big mistakes) 😬
   `MSE = mean((actual - predicted)²)`

3. **RMSE (Root MSE)**
   ➤ Square root of MSE
   ➤ Same units as data 📐
   `RMSE = sqrt(MSE)`

4. **MAPE (Mean Absolute Percentage Error)**
   ➤ % error, easy to understand 💯 (undefined when an actual value is 0)
   `MAPE = mean(|(actual - predicted)/actual|) × 100`

---

### 🧠 Why Use These?

* To compare models 🔁
* To tune smoothing parameters 🎯
* Lower = better performance ✅
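
The four formulas above translate directly into plain Python. The function names and the sample values are illustrative; `predicted` could equally be the output of a smoothing method.

```python
import math

def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    return math.sqrt(mse(actual, predicted))

def mape(actual, predicted):
    # Assumes no actual value is 0, otherwise the ratio is undefined
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual) * 100

actual = [100, 200, 300, 400]
predicted = [110, 190, 310, 380]

print(mae(actual, predicted))             # 12.5
print(mse(actual, predicted))             # 175.0
print(round(rmse(actual, predicted), 3))  # 13.229
print(round(mape(actual, predicted), 3))  # 5.833
```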