# EDA and Experiments

Exploratory Data Analysis and model experiments for phishing-detection.

## Setup
```python
import pandas as pd
import numpy as np
from src.data_processing import process_data
from src.features import extract_features
import matplotlib.pyplot as plt
import seaborn as sns
```

Load data:
```python
df = process_data()
print(df.head())
```

## EDA

Summary statistics and label distribution:
```python
# Summary statistics (df was loaded above)
print(df.describe())

# Visualize label distribution
sns.countplot(x='label', data=df)
plt.title('Label Distribution')
plt.show()
```

Feature correlations:
```python
feat_cols = [f'feature_{i+1}' for i in range(10)]
corr = df[feat_cols].corr()
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.title('Feature Correlations')
plt.show()
```

Manual feature extraction:
```python
sample_url = 'https://example.com'
feats = extract_features(sample_url)
print(f'Features for {sample_url}: {feats}')
```
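
The internals of `extract_features` are not shown in this notebook. As a purely illustrative sketch, simple lexical URL features could be computed along the lines below. The function name `sketch_url_features` and the chosen features are assumptions for illustration, not the project's actual implementation.

```python
from urllib.parse import urlparse


def sketch_url_features(url: str) -> dict:
    """Hypothetical lexical features; NOT the project's extract_features."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),                          # total characters in the URL
        "uses_https": int(parsed.scheme == "https"),     # 1 if the scheme is https
        "num_dots_in_host": host.count("."),             # rough proxy for subdomain depth
        "num_digits": sum(ch.isdigit() for ch in url),   # digits anywhere in the URL
        "has_at_symbol": int("@" in url),                # '@' can hide the real host
    }


print(sketch_url_features("https://example.com"))
```

Comparing such a dictionary against the output of `extract_features` for the same URL may be a quick way to sanity-check individual features.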

## Model Experiments

Train a simple random forest baseline on the ten `feature_*` columns:
```python
from src.ml_helpers import split_data, compute_metrics
from sklearn.ensemble import RandomForestClassifier

# Feature matrix and target vector
X = df[feat_cols].values
y = df['label'].values
X_train, X_test, y_train, y_test = split_data(X, y)

# A fixed random_state keeps runs reproducible
model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(X_train, y_train)

# Evaluate on the held-out split
y_pred = model.predict(X_test)
metrics = compute_metrics(y_test, y_pred)
print(metrics)
```
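
`compute_metrics` comes from `src.ml_helpers`, and its return value is not shown here. A minimal sketch of comparable metrics computed with scikit-learn directly might look like the following. It assumes binary labels where 1 marks phishing (the label encoding is not stated in this notebook), and `sketch_metrics` is a hypothetical name.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score


def sketch_metrics(y_true, y_pred) -> dict:
    """Hypothetical stand-in for compute_metrics; assumes 1 = phishing."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        # zero_division=0 avoids warnings when a class is never predicted
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
    }


# Toy labels for illustration only
print(sketch_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```

If the classes turn out to be imbalanced in the label distribution plot, precision and recall are likely more informative than accuracy alone.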

Hyperparameter tuning (e.g., with scikit-learn's `GridSearchCV`) can be added as a next step.
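
One way the tuning step could look is sketched below. It runs on synthetic data from `make_classification` so that it is self-contained; in the notebook, `X_demo` and `y_demo` would presumably be replaced by `X_train` and `y_train`. The grid values and the `f1` scoring choice are placeholders, not tuned recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the real data (assumption: 10 numeric features, binary label)
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=42)

# Small illustrative grid; widen it once the baseline is stable
param_grid = {
    "n_estimators": [25, 50],
    "max_depth": [None, 5],
}

# 3-fold cross-validated search over the grid, scored by F1 on the positive class
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=3,
    scoring="f1",
)
search.fit(X_demo, y_demo)

print(search.best_params_)
print(f"Best CV F1: {search.best_score_:.3f}")
```

After fitting, `search.best_estimator_` holds a model refit on all the data passed to `fit`, which can then be evaluated on the held-out test split.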