# **📊 Interquartile Range (IQR)**

---

### 🧮 `Mean`: 
* The average of all values in the dataset.
```python
mean = sum(data) / len(data)
```

---

### 🔺 `Max`: 
* The largest value in the dataset.
```python
max_value = max(data)
```

---

### 🔹 `Median (Q2)`: 
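* The middle value of the sorted data; for an even number of values, the average of the two middle values.
```python
import statistics
median = statistics.median(data)  # averages the two middle values when len(data) is even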
```
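
When a dataset has an even number of values there is no single middle element, and the usual convention is to average the two middle values. A minimal sketch of that rule follows; the helper name `median_of` and the sample lists are illustrative assumptions, not part of the original notebook.

```python
def median_of(values):
    """Return the median, averaging the two middle values for even-length input."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median_of() requires at least one value")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle element exists.
        return ordered[mid]
    # Even count: average the two elements around the midpoint.
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_median = median_of([165, 170, 172, 168, 175])  # 170
even_median = median_of([165, 170, 172, 168])      # 169.0
```

The standard library's `statistics.median` follows the same convention, so the helper is mainly useful for seeing the two cases spelled out.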

---

### 🔻 `Q1 (First Quartile)`: 
* The median of the lower half of the data (25th percentile).
```python
import statistics
q1 = statistics.median(sorted(data)[:len(data)//2])  # lower half excludes the median when n is odd
```

---

### 🔸 `Q2 (Second Quartile)`: 
* Identical to the median: the middle value of the sorted dataset (50th percentile).
```python
q2 = median
```

---

### 🔺 `Q3 (Third Quartile)`: 
* The median of the upper half of the data (75th percentile).
```python
import statistics
q3 = statistics.median(sorted(data)[(len(data) + 1)//2:])  # upper half excludes the median when n is odd
```
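
Quartiles have more than one accepted convention, so the "median of each half" definition can give slightly different numbers than interpolation-based library routines. The sketch below compares the two on made-up values, using the standard library's `statistics.quantiles` (Python 3.8+) for the interpolated version. NumPy's `np.percentile` with its default linear interpolation should agree with the `"inclusive"` method shown here.

```python
import statistics

sample = [165, 168, 170, 172, 175]  # made-up values, already sorted

# Median-of-halves: leave out the middle value (odd n), then take each half's median.
n = len(sample)
q1_halves = statistics.median(sample[: n // 2])        # median of [165, 168] -> 166.5
q3_halves = statistics.median(sample[(n + 1) // 2 :])  # median of [172, 175] -> 173.5

# Interpolation-based quartiles from the standard library.
q1_inc, q2_inc, q3_inc = statistics.quantiles(sample, n=4, method="inclusive")
# -> 168.0, 170.0, 172.0
```

Neither result is wrong; what matters is using one convention consistently when computing the IQR and the outlier bounds.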

---

### 📏 `Range`: 
* The difference between the max and min values.
```python
range_value = max(data) - min(data)
```

---

### 📦 `IQR`: 
* The difference between Q3 and Q1 (spread of middle 50% of data).
```python
iqr = q3 - q1
```

---

### 🚨 `1.5x IQR Rule`: 
* Any value outside the range [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR] is flagged as a potential outlier.
```python
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr
```
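
To make the rule concrete, here is a worked sketch of the arithmetic. The quartile values (Q1 = 166.5, Q3 = 173.5) and the helper `is_outlier` are assumed purely for illustration and do not come from a dataset in this notebook.

```python
q1, q3 = 166.5, 173.5          # assumed quartiles for illustration
iqr = q3 - q1                  # 7.0
lower_bound = q1 - 1.5 * iqr   # 166.5 - 10.5 = 156.0
upper_bound = q3 + 1.5 * iqr   # 173.5 + 10.5 = 184.0

def is_outlier(x):
    """True when x falls strictly outside [lower_bound, upper_bound]."""
    return x < lower_bound or x > upper_bound

checks = [is_outlier(150), is_outlier(170), is_outlier(184)]  # [True, False, False]
```

A value exactly on a bound (184 here) is not flagged, because the rule only marks values outside the interval.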

---

### ⚠️ `Outliers`: 
* Values below the lower bound or above the upper bound are flagged as outliers.
```python
outliers = [x for x in data if x < lower_bound or x > upper_bound]
```
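
The individual steps can be combined into one reusable function. This is a minimal sketch under stated assumptions: the name `iqr_outliers`, the parameter `k`, and the sample heights are hypothetical, and the quartiles use the standard library's `"inclusive"` interpolation rather than the median-of-halves method, so bounds may differ slightly from a hand calculation.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return (lower_bound, upper_bound, outliers) using the k * IQR rule."""
    # Assumption: quartiles use the stdlib "inclusive" interpolation method.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_bound = q1 - k * iqr
    upper_bound = q3 + k * iqr
    outliers = [x for x in values if x < lower_bound or x > upper_bound]
    return lower_bound, upper_bound, outliers

# Made-up heights in cm; 210 is deliberately extreme.
heights = [165, 170, 172, 168, 175, 210]
lower, upper, flagged = iqr_outliers(heights)
print(lower, upper, flagged)  # 159.875 182.875 [210]
```

Keeping `k` as a parameter makes it easy to try a stricter cutoff such as `k=3`, which is sometimes used to flag only extreme outliers.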

---

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from sklearn.impute import KNNImputer

In [None]:
data = {'Students':["Alice", "Bob", "Charlie", "David", "Eva"],
        "Height":[165,170,172,168,175]}

df = pd.DataFrame(data)
sns.boxplot(y = df["Height"], color="yellow")
plt.title("Boxplot of Students' Heights")
plt.ylabel("Height")
plt.show()


In [None]:
# Box plot of diamonds

data = sns.load_dataset("diamonds")
sns.boxplot(x="cut", y="price", data=data)
plt.title("Box plot of diamond prices by cut")
plt.show()


# Violin plot of diamonds (reuses the dataset loaded above)

sns.violinplot(x="cut", y="price", hue="cut", data=data, palette="muted")
plt.title("Violin plot of price by cut")
plt.show()

In [None]:
# Fill missing values (NaN) using KNN imputation

data = pd.DataFrame({'Age':[25,30,np.nan,35],'Salary':[400000,500000,600000,np.nan]})

imputer = KNNImputer(n_neighbors=2)
imputed_data = imputer.fit_transform(data)
imputed_data = pd.DataFrame(imputed_data, columns=data.columns)
print(imputed_data)

In [None]:
# Load data

data = pd.read_csv("products_data.csv")
print("Original DataFrame:")
print(data)
print(data.isnull().sum())

In [None]:
# KNN impute

imputer = KNNImputer(n_neighbors=2)
imputed_data = imputer.fit_transform(data[['Price','Stock','Rating']])
df = pd.DataFrame(imputed_data, columns=['Price','Stock','Rating'])
df.insert(0, 'Product_ID', data['Product_ID'])
df.insert(1, 'Product_Name', data['Product_Name'])
print(df)


In [None]:
# Visualization

plt.figure(figsize=(10, 5))
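# Plot only the numeric feature columns; each column gets its own box
sns.boxplot(data=df[['Price', 'Stock', 'Rating']])
plt.title("Boxplot of Products' Price, Stock, and Rating")
plt.xlabel("Feature")
plt.ylabel("Value")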
plt.show()

In [None]:
# Heatmap

corr_mat = df.corr(numeric_only=True)
print(corr_mat)

plt.figure(figsize=(10, 5))
sns.heatmap(corr_mat, annot=True, cmap='Spectral')
plt.title("Correlation Heatmap")
plt.show()

In [None]:
# Bar plot

plt.figure(figsize=(8,6))
sns.barplot(y=df['Price'], x=df['Stock'], errorbar=None)  # errorbar=None hides the error bars
plt.xlabel("Stocks")
plt.ylabel('Price')
plt.grid(axis='y')
plt.title("Products with their stock vs price")
plt.show()

In [None]:
# Regression Plot

plt.figure(figsize=(8,5))
sns.regplot(x='Rating', y='Stock', color='y', data=df)
plt.title('Stock v/s Rating')
plt.show()

In [None]:
# Linear Model Plot

# lmplot is figure-level and creates its own figure, so size it via height/aspect
sns.lmplot(y='Stock', x='Rating', hue='Product_Name', data=df, height=5, aspect=1.6)
plt.title('Stock v/s Rating')
plt.grid()
plt.show()