
# Cryptocurrency Liquidity Prediction Project 🚀

This project predicts **cryptocurrency liquidity** from historical market data.  
The notebook walks through the stages of a typical machine learning pipeline:

1. 🧠 Understanding the Problem and Loading the Data  
2. 🧹 Data Cleaning  
3. 📊 Exploratory Data Analysis (EDA)  
4. 🔧 Feature Engineering  
5. 🤖 Model Training, Evaluation, and Comparison  



## 🧠 1. Understanding the Problem

Cryptocurrency markets are volatile. This project helps detect **liquidity crises** early by analyzing features such as:
- Trading volume
- Price fluctuations
- Market capitalization

We frame this as a regression task: the models below predict 24-hour trading volume (`24h_volume`), used here as a proxy for **liquidity**.


In [None]:

import pandas as pd

df = pd.read_csv("/content/coin_gecko_2022-03-17.csv")
df.head()


In [None]:

df.info()



## 🧹 2. Data Cleaning


In [None]:

# Drop rows with missing values, then renumber the index from 0
df = df.dropna().reset_index(drop=True)
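
Before dropping rows, it can help to check how much data `dropna` would discard. The sketch below is a minimal illustration on a small made-up frame: the column names follow the ones used later in this notebook, but the values and the names `toy`, `missing_per_column`, and `rows_dropped` are assumptions for the example only.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real dataset; all values are made up.
toy = pd.DataFrame({
    "price": [40000.0, np.nan, 1.0, 0.5],
    "24h_volume": [3.5e10, 1.9e10, np.nan, 2.0e6],
    "mkt_cap": [7.7e11, 3.3e11, 8.0e10, 1.0e7],
})

# Number of missing values in each column.
missing_per_column = toy.isna().sum()

# Number of rows dropna() would remove (rows with at least one missing value).
rows_dropped = int(toy.isna().any(axis=1).sum())

# Same cleaning step as in the notebook, applied to the toy frame.
cleaned = toy.dropna().reset_index(drop=True)

print(missing_per_column)
print(f"Rows dropped: {rows_dropped} of {len(toy)}")
```

If a large share of rows would be lost, imputing or dropping only specific columns may be worth considering instead.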


In [None]:

# Parse the date column into datetime values, then confirm the column dtypes
df['date'] = pd.to_datetime(df['date'])
df.dtypes



## 📊 3. Exploratory Data Analysis


In [None]:

import matplotlib.pyplot as plt
import seaborn as sns

# Histogram of coin prices with a kernel density estimate overlaid
plt.figure(figsize=(12, 6))
sns.histplot(df['price'], bins=30, kde=True)
plt.title("Distribution of Crypto Prices")
plt.show()
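
Coin prices often span many orders of magnitude, so a histogram of raw prices can be dominated by a few expensive coins. A minimal sketch, assuming strictly positive prices, of how a log transform changes the shape of the distribution. The prices below are made up for illustration and are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Made-up prices spanning many orders of magnitude (not from the dataset).
prices = pd.Series(
    [40000.0, 2800.0, 380.0, 1.0, 0.8, 0.08, 0.00002], name="price"
)

# log10 compresses the long right tail; it requires strictly positive values.
log_prices = np.log10(prices)

# Compare sample skewness before and after the transform.
raw_skew = prices.skew()
log_skew = log_prices.skew()

print(f"Skewness of raw prices:   {raw_skew:.2f}")
print(f"Skewness of log10 prices: {log_skew:.2f}")
```

The transformed series can be passed to `sns.histplot` in the same way as the raw `price` column.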



## 🔧 4. Feature Engineering


In [None]:

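# Rank coins by market capitalization, largest first
df = df.sort_values(by='mkt_cap', ascending=False).reset_index(drop=True)
# 10-row rolling means; with this sort order each window spans rows adjacent
# in market-cap rank, not consecutive dates for a single coin
df['ma_price'] = df['price'].rolling(window=10, min_periods=1).mean()
df['ma_volume'] = df['24h_volume'].rolling(window=10, min_periods=1).mean()
# Volume relative to market cap (inf or NaN where mkt_cap is 0)
df['liquidity_ratio'] = df['24h_volume'] / df['mkt_cap']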
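
The liquidity ratio divides by `mkt_cap`, so a coin with a market cap of zero would produce `inf`, which most models cannot handle. One possible guard, shown here on a made-up frame (the name `safe_cap` and all values are assumptions for illustration), is to turn zero market caps into `NaN` before dividing:

```python
import numpy as np
import pandas as pd

# Toy rows; the last one has a zero market cap. Values are made up.
toy = pd.DataFrame({
    "24h_volume": [3.5e10, 2.0e6, 5.0e5],
    "mkt_cap": [7.0e11, 1.0e7, 0.0],
})

# Replace zero market caps with NaN so the division yields NaN instead of inf.
safe_cap = toy["mkt_cap"].replace(0, np.nan)
toy["liquidity_ratio"] = toy["24h_volume"] / safe_cap

print(toy)
```

Rows with a `NaN` ratio can then be dropped or imputed explicitly, rather than silently carrying `inf` into training.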



## 🤖 5. Model Training and Evaluation


In [None]:

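from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Caution: 'liquidity_ratio' and 'ma_volume' are both computed from '24h_volume',
# the target, so these features leak the target and will inflate test scores.
features = ['price', 'ma_price', 'ma_volume', 'liquidity_ratio']
target = '24h_volume'

X = df[features]
y = df[target]

# Hold out 20% of the rows for testing; the fixed seed makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)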
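
One thing to watch in this setup is target leakage: `liquidity_ratio` is computed from `24h_volume`, which is also the target. The sketch below illustrates the effect on synthetic data only. It works in log space (an assumption made so the leaked relationship is exactly linear; the notebook itself uses raw values), and the helper `holdout_r2` is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500

# Synthetic, independent log-volume and log-market-cap (not real market data).
log_volume = rng.normal(loc=15.0, scale=2.0, size=n)
log_cap = rng.normal(loc=18.0, scale=2.0, size=n)

# Leaked feature: log(volume / cap) = log_volume - log_cap, built from the target.
log_ratio = log_volume - log_cap

y = log_volume
X_leaky = np.column_stack([log_cap, log_ratio])  # includes the leaked feature
X_clean = log_cap.reshape(-1, 1)                 # market cap only


def holdout_r2(X, y):
    """Fit a linear model on 80% of the rows and return R² on the other 20%."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    return LinearRegression().fit(X_tr, y_tr).score(X_te, y_te)


r2_leaky = holdout_r2(X_leaky, y)
r2_clean = holdout_r2(X_clean, y)

print(f"R² with leaked feature:    {r2_leaky:.3f}")
print(f"R² without leaked feature: {r2_clean:.3f}")
```

With the leaked feature the model can reconstruct the target almost exactly, even though market cap alone carries no information about volume in this synthetic setup. Scores obtained with target-derived features should be read with that in mind.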


In [None]:

rf = RandomForestRegressor(random_state=42)  # fixed seed so results are reproducible
rf.fit(X_train, y_train)
y_pred_rf = rf.predict(X_test)


In [None]:

lr = LinearRegression()
lr.fit(X_train, y_train)
y_pred_lr = lr.predict(X_test)


In [None]:

def evaluate_model(y_true, y_pred, model_name):
    print(f"Model: {model_name}")
    print("MAE:", mean_absolute_error(y_true, y_pred))
    # Square root of MSE; the `squared` argument is deprecated in recent scikit-learn
    print("RMSE:", mean_squared_error(y_true, y_pred) ** 0.5)
    print("R²:", r2_score(y_true, y_pred))
    print("-" * 30)

evaluate_model(y_test, y_pred_rf, "Random Forest")
evaluate_model(y_test, y_pred_lr, "Linear Regression")
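
For reference, the three metrics printed above can be computed by hand. This is a small worked example with NumPy on made-up numbers (the arrays are illustrative, not model output):

```python
import numpy as np

# Made-up true values and predictions, for illustration only.
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 3.0, 8.0])

errors = y_true - y_pred

# MAE: average absolute error.
mae = np.mean(np.abs(errors))

# RMSE: square root of the average squared error; penalizes large misses more.
rmse = np.sqrt(np.mean(errors ** 2))

# R²: 1 minus (residual sum of squares / total sum of squares around the mean).
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"MAE:  {mae:.4f}")
print(f"RMSE: {rmse:.4f}")
print(f"R²:   {r2:.4f}")
```

MAE and RMSE are in the units of the target (here, trading volume), while R² is unitless, so R² is the easier one to compare across datasets.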



## ✅ Conclusion

- We cleaned and explored the dataset.
- We engineered moving-average and liquidity-ratio features.
- We trained two models, Random Forest and Linear Regression, and compared their MAE, RMSE, and R².
- As a next step, the model can be deployed with Streamlit, or its predictions can be exported.
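
Exporting predictions could look roughly like the sketch below. The values and the names `y_test_demo`, `y_pred_demo`, and the output column names are placeholders chosen for this example; in the notebook they would be replaced by `y_test` and a model's predictions. The CSV is written to an in-memory buffer here; passing a file path to `to_csv` would write it to disk instead.

```python
import io

import pandas as pd

# Placeholder values standing in for y_test and a model's predictions.
y_test_demo = pd.Series([100.0, 250.0, 75.0], name="24h_volume")
y_pred_demo = [110.0, 240.0, 80.0]

# Put actual and predicted values side by side, plus the absolute error.
predictions = pd.DataFrame({
    "actual_24h_volume": y_test_demo.to_numpy(),
    "predicted_24h_volume": y_pred_demo,
})
predictions["abs_error"] = (
    predictions["actual_24h_volume"] - predictions["predicted_24h_volume"]
).abs()

# Write CSV text to an in-memory buffer (no file is created).
buffer = io.StringIO()
predictions.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

print(csv_text)
```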

> ✅ Random Forest gave better results than Linear Regression for this dataset.
