## 📊 EDA & Outlier Detection: Ethereum Network Daily Data

This notebook performs exploratory data analysis (EDA) and IQR-based outlier detection on three daily Ethereum (ETH) network metrics: transaction count, total fees, and active addresses.

All results are saved under `results/eth/`.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os

In [2]:
# Ensure output directory exists
os.makedirs("../../results/eth", exist_ok=True)

In [3]:
# Load daily network statistics
df = pd.read_csv("../../data/eth/eth_network_daily.csv")

In [4]:
# Convert the date column from strings to datetime values
df['Date'] = pd.to_datetime(df['Date'])
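
After parsing dates, it can help to confirm that the series really is daily before plotting it. The following is a minimal sketch on a made-up toy frame (the values and the `toy` name are illustrative assumptions, not taken from `eth_network_daily.csv`); it sorts by date and lists calendar days that are missing or duplicated.

```python
import pandas as pd

# Toy frame standing in for the real CSV; only the 'Date' and 'tx_count'
# column names come from the notebook, the values are invented.
toy = pd.DataFrame({
    "Date": ["2021-01-03", "2021-01-01", "2021-01-02", "2021-01-05"],
    "tx_count": [130, 100, 120, 90],
})
toy["Date"] = pd.to_datetime(toy["Date"])
toy = toy.sort_values("Date").reset_index(drop=True)

# Calendar days between the first and last date that have no row
full_range = pd.date_range(toy["Date"].min(), toy["Date"].max(), freq="D")
missing_days = full_range.difference(pd.DatetimeIndex(toy["Date"]))

# Number of rows whose date repeats an earlier row
duplicate_days = int(toy["Date"].duplicated().sum())

print("Missing days:", [d.date() for d in missing_days])
print("Duplicate days:", duplicate_days)
```

Gaps are not necessarily errors, but they change how a line plot and the quantiles used later should be read.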

# 📈 Plot time series of core metrics

In [5]:
plt.figure(figsize=(12, 4))
plt.plot(df['Date'], df['tx_count'], label='Transaction Count')
plt.title("Daily Transaction Count (ETH)")
plt.xlabel("Date")
plt.ylabel("Transactions")
plt.legend()  # show the label passed to plt.plot
plt.tight_layout()
plt.savefig("../../results/eth/tx_count_timeseries.png")
plt.close()

In [6]:
plt.figure(figsize=(12, 4))
plt.plot(df['Date'], df['total_fees'], label='Total Fees (ETH)', color='orange')
plt.title("Daily Total Fees (ETH)")
plt.xlabel("Date")
plt.ylabel("ETH")
plt.legend()  # show the label passed to plt.plot
plt.tight_layout()
plt.savefig("../../results/eth/total_fees_timeseries.png")
plt.close()

In [7]:
plt.figure(figsize=(12, 4))
plt.plot(df['Date'], df['active_addresses'], label='Active Addresses', color='green')
plt.title("Daily Active Addresses (ETH)")
plt.xlabel("Date")
plt.ylabel("Addresses")
plt.legend()  # show the label passed to plt.plot
plt.tight_layout()
plt.savefig("../../results/eth/active_addresses_timeseries.png")
plt.close()

# 📊 Distribution plots

In [8]:
plt.figure(figsize=(12, 6))
for i, col in enumerate(['tx_count', 'total_fees', 'active_addresses']):
    plt.subplot(1, 3, i+1)
    sns.histplot(df[col], kde=True)
    plt.title(f"Distribution: {col}")
plt.tight_layout()
plt.savefig("../../results/eth/feature_distributions.png")
plt.close()

# 🔥 Correlation heatmap

In [9]:
plt.figure(figsize=(6, 5))
sns.heatmap(df[['tx_count', 'total_fees', 'active_addresses']].corr(), annot=True, cmap='coolwarm')
plt.title("Feature Correlation Heatmap")
plt.tight_layout()
plt.savefig("../../results/eth/correlation_heatmap.png")
plt.close()

# 🚨 Outlier Detection (IQR method)

In [10]:
outlier_flags = {}

In [11]:
for col in ['tx_count', 'total_fees', 'active_addresses']:
    Q1 = df[col].quantile(0.25)
    Q3 = df[col].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    flag_col = f"is_outlier_{col}"
    df[flag_col] = (df[col] < lower_bound) | (df[col] > upper_bound)
    outlier_flags[flag_col] = df[flag_col].sum()
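
To make the IQR rule concrete, here is a small worked sketch on a toy series. The helper name `iqr_bounds` and the sample values are assumptions introduced for illustration; the 1.5 multiplier matches the cell above.

```python
import pandas as pd

def iqr_bounds(series: pd.Series, k: float = 1.5):
    """Return (lower, upper) fences at Q1 - k*IQR and Q3 + k*IQR."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Toy data: six similar values and one extreme value
toy = pd.Series([10, 12, 11, 13, 12, 11, 100])

lower, upper = iqr_bounds(toy)
flags = (toy < lower) | (toy > upper)

print(f"lower={lower}, upper={upper}")
print(toy[flags].tolist())
```

With pandas' default linear interpolation, Q1 is 11 and Q3 is 12.5 here, so the fences are 8.75 and 14.75 and only the value 100 is flagged. Because the fences are computed from the whole series, a metric with a strong long-term trend may have most of its early or late values flagged.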

In [12]:
# Create a single flag marking rows where any metric is an outlier
flag_cols = [f"is_outlier_{col}" for col in ['tx_count', 'total_fees', 'active_addresses']]
df['any_outlier'] = df[flag_cols].any(axis=1)
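
Before saving, it may be useful to look at how many rows each flag marks. The sketch below uses a hand-built toy frame of flag columns (the values are invented; only the column names follow the notebook's `is_outlier_<metric>` convention) to show one way to summarize per-metric counts and the overall share of flagged rows.

```python
import pandas as pd

# Toy flag columns; in the notebook these would come from the IQR loop.
flags = pd.DataFrame({
    "is_outlier_tx_count": [False, True, False, False],
    "is_outlier_total_fees": [False, True, True, False],
    "is_outlier_active_addresses": [False, False, False, False],
})

# A row is flagged if any per-metric flag is True
flags["any_outlier"] = flags.any(axis=1)

# Count of flagged rows per metric, excluding the combined flag
per_metric = flags.drop(columns="any_outlier").sum()

n_flagged = int(flags["any_outlier"].sum())
share = n_flagged / len(flags)

print(per_metric.to_dict())
print(f"{n_flagged} of {len(flags)} rows flagged ({share:.0%})")
```

The combined count can be smaller than the sum of the per-metric counts, since one row may be an outlier in several metrics at once.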

In [13]:
# Save rows marked as outliers
df[df['any_outlier']].to_csv("../../results/eth/eth_outliers.csv", index=False)

In [14]:
# Save the full dataset with outlier annotations
df.to_csv("../../results/eth/eth_network_daily_annotated.csv", index=False)

In [15]:
print("EDA and outlier detection completed. Results saved to `results/eth/`.")

EDA and outlier detection completed. Results saved to `results/eth/`.
