

# **📊 EDA Report – Cryptocurrency Liquidity Prediction**

## **1. Dataset Overview**

* **Rows & Columns (after cleaning)**: **866 rows × 11 columns**, down from 992 rows before 3×IQR outlier removal (see Section 5).
* **Key Columns**:

  * **price** (float): Price of cryptocurrency in USD.
  * **1h, 24h, 7d** (float): Percentage change over 1 hour, 24 hours, and 7 days.
  * **24h\_volume, mkt\_cap** (float): Trading volume and market capitalization.
  * **liquidity\_ratio** (float): Target variable (measures ease of buying/selling).
  * **date**: Dropped; only two dates (2022-03-16 and 2022-03-17) are present, so the column carries no usable temporal signal.

---

## **2. Missing Values**

| Column           | Missing Values |
| ---------------- | -------------- |
| price            | 0              |
| 1h               | 7              |
| 24h              | 7              |
| 7d               | 8              |
| 24h\_volume      | 7              |
| mkt\_cap         | 0              |
| liquidity\_ratio | 7              |

✅ **Treatment**: Dropped rows with nulls (`df.dropna(inplace=True)`).
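
As a minimal sketch of this step, the snippet below counts nulls per column and then drops incomplete rows. The toy frame and its values are illustrative assumptions, not the real dataset.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data (values are illustrative only).
df = pd.DataFrame({
    "price": [1.0, 2.0, 3.0, 4.0],
    "1h": [0.01, np.nan, 0.02, 0.03],
    "liquidity_ratio": [0.10, 0.20, np.nan, 0.40],
})

# Per-column count of missing values, as reported in the table above.
missing_counts = df.isna().sum()

# Drop every row that contains at least one null. Assigning the result
# gives the same rows as df.dropna(inplace=True) without mutating df.
df_clean = df.dropna().reset_index(drop=True)

print(missing_counts)
print(df_clean.shape)
```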

---

## **3. Outlier Detection (Before Log Transform)**

**Method**: Interquartile Range (IQR = Q3 – Q1). Values below Q1 – 1.5×IQR or above Q3 + 1.5×IQR are counted as outliers.

| Column           | Outliers |
| ---------------- | -------- |
| price            | 188      |
| 1h               | 26       |
| 24h              | 91       |
| 7d               | 104      |
| 24h\_volume      | 146      |
| mkt\_cap         | 161      |
| liquidity\_ratio | 86       |

✅ **Observation**: Price (188), volume (146), and market cap (161) have the most flagged values, consistent with extreme right skewness in these columns.
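
The IQR rule used in this section can be sketched as follows. The helper name `count_iqr_outliers` and the toy series are hypothetical; the report does not show the original implementation.

```python
import pandas as pd

def count_iqr_outliers(series: pd.Series, k: float = 1.5) -> int:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return int(((series < lower) | (series > upper)).sum())

# Toy data: nine well-behaved values and one extreme value.
toy = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
n_outliers = count_iqr_outliers(toy)
print(n_outliers)
```

Applying the helper to each numeric column (for example with `df[cols].apply(count_iqr_outliers)`) would produce a table like the one above.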

---

## **4. Log Transformation & Outlier Check**

Applied `np.log1p()` to skewed columns (`price`, `24h_volume`, `mkt_cap`, `liquidity_ratio`):

| Column (Log)          | Outliers (After Log) |
| --------------------- | -------------------- |
| price\_log            | 69                   |
| 24h\_volume\_log      | 28                   |
| mkt\_cap\_log         | 44                   |
| liquidity\_ratio\_log | 76                   |

✅ **Observation**: Outlier counts drop sharply for **price** (188 → 69), **volume** (146 → 28), and **market cap** (161 → 44).
✅ Liquidity ratio changes little (86 → 76) and remains somewhat skewed; this was accepted for the target.
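
A minimal sketch of the transform, assuming the `_log` suffix convention seen in the table above. The toy values are illustrative only.

```python
import numpy as np
import pandas as pd

skewed_cols = ["price", "24h_volume", "mkt_cap", "liquidity_ratio"]

# Toy frame with a heavy right tail (values are illustrative only).
toy = pd.DataFrame({
    "price": [0.5, 1.0, 2.0, 40000.0],
    "24h_volume": [1e3, 1e4, 1e5, 1e10],
    "mkt_cap": [1e5, 1e6, 1e7, 1e12],
    "liquidity_ratio": [0.01, 0.02, 0.05, 0.30],
})

# log1p(x) = log(1 + x): compresses large values and is defined at x = 0.
for col in skewed_cols:
    toy[f"{col}_log"] = np.log1p(toy[col])

print(toy.filter(like="_log"))
```

Predictions made on the log scale can be mapped back to the original scale with `np.expm1`.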

---

## **5. Outlier Removal (3×IQR)**

After applying **3×IQR**:

* **Original Shape**: (992, 11)
* **After Dropping**: (866, 11)

This removes **126 rows (about 12.7%)**, which eliminates the most extreme values while keeping 866 rows for modeling.
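
One way to implement the filter is sketched below. It assumes a row is dropped if any selected column falls outside its 3×IQR fence; the report does not state which columns were filtered or whether the rule was applied to raw or log values. The helper name `drop_iqr_outliers` is hypothetical.

```python
import pandas as pd

def drop_iqr_outliers(df: pd.DataFrame, cols, k: float = 3.0) -> pd.DataFrame:
    """Keep rows whose values lie within [Q1 - k*IQR, Q3 + k*IQR] for every column in cols."""
    keep = pd.Series(True, index=df.index)
    for col in cols:
        q1, q3 = df[col].quantile(0.25), df[col].quantile(0.75)
        iqr = q3 - q1
        keep &= df[col].between(q1 - k * iqr, q3 + k * iqr)
    return df[keep]

# Toy data: column "a" has one extreme value, column "b" has none.
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6, 7, 8, 9, 100],
    "b": [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
})
filtered = drop_iqr_outliers(toy, ["a", "b"])
print(toy.shape, "->", filtered.shape)
```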

---

## **6. Correlation Analysis**

**Correlation with Target (liquidity\_ratio\_log):**

| Feature                 | Correlation                      |
| ----------------------- | -------------------------------- |
| volume\_to\_market\_cap | **0.71** ✅ (Strongest predictor) |
| 7d                      | 0.14                             |
| 1h                      | 0.12                             |
| 24h                     | 0.11                             |
| price\_to\_liquidity    | 0.04                             |
| price\_log              | 0.02                             |

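✅ **Feature Engineering Added**:

* **volume\_to\_market\_cap = 24h\_volume\_log / mkt\_cap\_log**
* **price\_to\_liquidity = price\_log / liquidity\_ratio\_log**

Note that `price_to_liquidity` is computed from the target itself, so it cannot serve as a predictor; it does not appear in the final feature set (Section 8).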
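
The engineered features and their correlation with the target can be sketched as below. The column names follow the report; the toy values are illustrative assumptions and will not reproduce the correlations in the table.

```python
import pandas as pd

# Toy log-scale columns (values are illustrative only).
toy = pd.DataFrame({
    "24h_volume_log": [10.0, 12.0, 14.0, 16.0, 18.0],
    "mkt_cap_log": [20.0, 21.0, 20.0, 22.0, 20.0],
    "price_log": [1.0, 0.5, 2.0, 1.5, 3.0],
    "liquidity_ratio_log": [0.1, 0.2, 0.3, 0.4, 0.5],
})

# Ratio features as defined above.
toy["volume_to_market_cap"] = toy["24h_volume_log"] / toy["mkt_cap_log"]
toy["price_to_liquidity"] = toy["price_log"] / toy["liquidity_ratio_log"]

# Pearson correlation of each candidate feature with the target.
features = ["volume_to_market_cap", "price_to_liquidity", "price_log"]
corr_with_target = (
    toy[features]
    .corrwith(toy["liquidity_ratio_log"])
    .sort_values(ascending=False)
)
print(corr_with_target)
```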

---

## **7. Multicollinearity Check (VIF Analysis)**

### **Before Removing Highly Correlated Features**

| Feature          | VIF        |
| ---------------- | ---------- |
| 24h\_volume\_log | **72.3** ❌ |
| mkt\_cap\_log    | **72.2** ❌ |

✅ **Decision**: Drop `24h_volume_log` & `mkt_cap_log` (already represented by `volume_to_market_cap`).

---

### **After Feature Selection (Final Features)**

| Feature                 | VIF    |
| ----------------------- | ------ |
| volume\_to\_market\_cap | 2.48 ✅ |
| 1h                      | 1.72 ✅ |
| 24h                     | 1.55 ✅ |
| 7d                      | 1.10 ✅ |
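
For reference, VIF for feature j is 1 / (1 – R²ⱼ), where R²ⱼ comes from regressing feature j on the remaining features. The sketch below computes it with plain NumPy and an intercept term; the report does not say which implementation was used (statsmodels' `variance_inflation_factor` is a common choice), so the exact numbers may differ. The helper name `vif_table` and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd

def vif_table(X: pd.DataFrame) -> pd.Series:
    """VIF per column: 1 / (1 - R^2) from regressing it on the other columns."""
    vifs = {}
    for col in X.columns:
        y = X[col].to_numpy(dtype=float)
        others = X.drop(columns=col).to_numpy(dtype=float)
        A = np.column_stack([np.ones(len(X)), others])  # add intercept
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        vifs[col] = ss_tot / ss_res  # equals 1 / (1 - R^2)
    return pd.Series(vifs, name="VIF")

# Synthetic data: "a" and "b" are nearly collinear, "c" is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
X = pd.DataFrame({
    "a": a,
    "b": a + 0.05 * rng.normal(size=200),
    "c": rng.normal(size=200),
})
vifs = vif_table(X)
print(vifs.round(2))
```

A common rule of thumb treats VIF above roughly 5 to 10 as a sign of problematic collinearity, which is consistent with dropping the two features scoring above 70.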

---

## **8. Final Features Used for Modeling**

✅ **`['volume_to_market_cap', '1h', '24h', '7d']`**
(Target = `liquidity_ratio_log`)

---

## **9. Model Evaluation (Scatter Plot Inference)**

**Scatter Plot: Actual vs Predicted Liquidity Ratio**
✅ Points cluster tightly around the **y = x diagonal**, so predicted values are close to the actual values.
✅ A few minor deviations appear for low-liquidity coins, plausibly because of their higher volatility.

---

## **10. Best Model Performance (Gradient Boosting)**

| Metric   | Train      | Test       |
| -------- | ---------- | ---------- |
| **R²**   | **0.9878** | **0.9591** |
| **MAE**  | 0.0042     | 0.0077     |
| **RMSE** | 0.0064     | 0.0118     |

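✅ **No severe overfitting**: test R² stays close to train R² (0.9591 vs 0.9878), although test MAE and RMSE are roughly 1.8× their train values.
✅ **High accuracy on the test set → a reasonable candidate for deployment**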
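
The three metrics in the table can be computed as in the sketch below, shown with plain NumPy so the definitions are explicit. The helper name `regression_metrics` and the toy arrays are hypothetical; they are not the model's actual predictions.

```python
import numpy as np

def regression_metrics(y_true, y_pred) -> dict:
    """Return R^2, MAE and RMSE for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": float(np.mean(np.abs(err))),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
    }

# Toy example: three exact predictions and one miss of 0.1.
y_true = [0.10, 0.20, 0.30, 0.40]
y_pred = [0.10, 0.20, 0.30, 0.50]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

Running the same function on both the train and test splits gives the side-by-side comparison used above to judge overfitting. Because the target is `liquidity_ratio_log`, MAE and RMSE are on the log scale.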


