# Trader Performance vs Market Sentiment (Primetrade.ai)

**Goal:** Analyze whether trader behavior/performance differs on Fear vs Greed days, segment traders, and propose actionable strategy ideas.

This notebook is the primary submission artifact (charts/tables + short write-up).

## 0) Setup

In [None]:
import os
import sys
from pathlib import Path

import numpy as np
import pandas as pd
import plotly.express as px

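# Locate the repo root by walking up from the working directory until a folder
# containing src/ is found, so imports work when Jupyter starts in a subfolder
CWD = Path(os.getcwd()).resolve()
REPO_ROOT = next((p for p in [CWD, *CWD.parents] if (p / "src").is_dir()), CWD)
if str(REPO_ROOT) not in sys.path:
    sys.path.insert(0, str(REPO_ROOT))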

from src.data_loader import DataLoader
from src.analysis import Analyzer

OUTPUT_DIR = REPO_ROOT / "output"
CHARTS_DIR = OUTPUT_DIR / "charts"
TABLES_DIR = OUTPUT_DIR / "tables"
REPORTS_DIR = OUTPUT_DIR / "reports"
for d in [CHARTS_DIR, TABLES_DIR, REPORTS_DIR]:
    d.mkdir(parents=True, exist_ok=True)

pd.set_option("display.max_columns", 200)
pd.set_option("display.width", 140)
pd.set_option("display.float_format", lambda x: f"{x:,.4f}")


## 1) Load Data + Basic QA (rows/cols, missing values, duplicates)

In [None]:
loader = DataLoader()
sentiment_df, trades_df = loader.load_data()
merged_df = loader.clean_and_merge()

print("Sentiment:", sentiment_df.shape)
print("Trades:", trades_df.shape)
print("Merged:", merged_df.shape)

display(merged_df.head())


In [None]:
def summarize_df(df: pd.DataFrame, name: str) -> pd.DataFrame:
    missing = df.isna().sum().sort_values(ascending=False)
    missing = missing[missing > 0]
    duplicates = int(df.duplicated().sum())
    summary = pd.DataFrame({
        "rows": [df.shape[0]],
        "cols": [df.shape[1]],
        "duplicate_rows": [duplicates],
    }, index=[name])
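    if not missing.empty:
        # Show per-column missing counts and percentages (0-100) for this frame
        print(f"Missing values in {name}:")
        display(pd.DataFrame({"missing_count": missing, "missing_pct": missing / len(df) * 100}).head(25))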
    return summary

qa = pd.concat([
    summarize_df(sentiment_df, "sentiment"),
    summarize_df(trades_df, "trades"),
    summarize_df(merged_df, "merged"),
])
qa


## 2) Daily Metrics (Fear vs Greed performance)

Required question: **Does performance differ between Fear vs Greed days?**
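
A gap between the Fear and Greed averages can come from noise alone, so a significance check is useful before reading much into it. The sketch below is one possible approach, a two-sided permutation test on the difference in means; the function name `permutation_test_mean_diff` and the toy numbers are hypothetical and are not results from the dataset.

```python
import numpy as np

def permutation_test_mean_diff(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means between two samples."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(n_permutations):
        # Reassign group labels at random and recompute the mean difference
        shuffled = rng.permutation(pooled)
        diff = shuffled[: a.size].mean() - shuffled[a.size :].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # Add-one correction keeps the p-value strictly positive
    p_value = (count + 1) / (n_permutations + 1)
    return observed, p_value

# Toy daily win rates (illustrative only, not taken from the data)
fear_days = [0.40, 0.42, 0.38, 0.45, 0.41]
greed_days = [0.44, 0.43, 0.47, 0.46, 0.48]
diff, p = permutation_test_mean_diff(fear_days, greed_days, n_permutations=2_000)
print(f"mean difference = {diff:.4f}, p-value = {p:.4f}")
```

In this notebook the two inputs could be the `win_rate` or `avg_pnl` columns of `daily`, split by `sentiment_bucket` once that column has been assigned, with NaN values dropped first.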

Added metrics (per assignment examples):
- Average trade size (USD)
- Long/short ratio
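
As a rough illustration of how these two metrics can be derived per day, the sketch below uses a toy trades table. The column names `date`, `side`, and `size_usd` are assumptions for illustration, and the actual `Analyzer.calculate_daily_metrics()` implementation may differ.

```python
import pandas as pd

# Toy trades; column names (date, side, size_usd) are assumptions for illustration
trades = pd.DataFrame({
    "date": ["2024-01-01"] * 3 + ["2024-01-02"] * 2,
    "side": ["BUY", "BUY", "SELL", "SELL", "BUY"],
    "size_usd": [100.0, 300.0, 200.0, 500.0, 100.0],
})
trades["is_long"] = trades["side"].eq("BUY")

daily_sketch = trades.groupby("date").agg(
    trade_count=("size_usd", "size"),
    avg_trade_size_usd=("size_usd", "mean"),
    long_count=("is_long", "sum"),
)
daily_sketch["short_count"] = daily_sketch["trade_count"] - daily_sketch["long_count"]

# Days with zero shorts get NaN instead of a division-by-zero infinity
shorts = daily_sketch["short_count"].where(daily_sketch["short_count"] > 0)
daily_sketch["long_short_ratio"] = daily_sketch["long_count"] / shorts

# Long share is bounded in [0, 1] and stays defined when there are no shorts
daily_sketch["long_share"] = daily_sketch["long_count"] / daily_sketch["trade_count"]
print(daily_sketch)
```

The ratio is unbounded and undefined on days without shorts, which is one reason to report the long share alongside it.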

In [None]:
analyzer = Analyzer(merged_df)
daily = analyzer.calculate_daily_metrics()
daily.head()


In [None]:
# Bucket variants like "Extreme Fear" into Fear, and "Extreme Greed" into Greed
daily = daily.copy()
daily["sentiment_bucket"] = "Other"
daily.loc[daily["Classification"].astype(str).str.contains("fear", case=False, na=False), "sentiment_bucket"] = "Fear"
daily.loc[daily["Classification"].astype(str).str.contains("greed", case=False, na=False), "sentiment_bucket"] = "Greed"

fear_greed = (
    daily[daily["sentiment_bucket"].isin(["Fear", "Greed"])]
    .groupby("sentiment_bucket", as_index=False)
    .agg(
        avg_pnl=("avg_pnl", "mean"),
        win_rate=("win_rate", "mean"),
        avg_trades_per_day=("trade_count", "mean"),
        avg_trade_size_usd=("avg_trade_size_usd", "mean"),
        long_short_ratio=("long_short_ratio", "mean"),
        long_share=("long_share", "mean"),
    )
)

display(fear_greed)  # a bare expression only renders when it is the last line of a cell

fear_greed.to_csv(TABLES_DIR / "fear_greed_summary.csv", index=False)
fear_greed.to_json(TABLES_DIR / "fear_greed_summary.json", orient="records", indent=2)


In [None]:
fig1 = px.bar(fear_greed, x="sentiment_bucket", y="win_rate", title="Win Rate: Fear vs Greed")
fig1.update_layout(yaxis_tickformat=".0%")
fig1.show()  # a bare expression only renders when it is the last line of a cell

fig1.write_html(CHARTS_DIR / "win_rate_fear_vs_greed.html")


In [None]:
fig2 = px.bar(fear_greed, x="sentiment_bucket", y="avg_pnl", title="Avg PnL: Fear vs Greed")
fig2.show()  # a bare expression only renders when it is the last line of a cell

fig2.write_html(CHARTS_DIR / "avg_pnl_fear_vs_greed.html")


In [None]:
fig_size = px.bar(
    fear_greed,
    x="sentiment_bucket",
    y="avg_trade_size_usd",
    title="Average Trade Size (USD): Fear vs Greed",
)
fig_size.show()  # a bare expression only renders when it is the last line of a cell

fig_size.write_html(CHARTS_DIR / "avg_trade_size_fear_vs_greed.html")


In [None]:
fig_ls = px.bar(
    fear_greed,
    x="sentiment_bucket",
    y="long_short_ratio",
    title="Long/Short Ratio: Fear vs Greed",
)
fig_ls.show()  # a bare expression only renders when it is the last line of a cell

fig_ls.write_html(CHARTS_DIR / "long_short_ratio_fear_vs_greed.html")


## 3) Behavior Changes by Sentiment

Required question: **Do traders change behavior based on sentiment?**

Suggested metrics:
- trade frequency (# trades per day)
- leverage distribution
- long/short ratio
- position sizes (USD)
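
One way to put the suggested metrics side by side is a single table with one row per sentiment bucket. The sketch below runs on toy data; the column names `date`, `sentiment_bucket`, `leverage`, and `size_usd` are assumptions and may not match the merged frame.

```python
import pandas as pd

# Toy merged trades; all column names here are assumptions for illustration
toy = pd.DataFrame({
    "date": ["d1", "d1", "d2", "d3", "d3", "d3"],
    "sentiment_bucket": ["Fear", "Fear", "Fear", "Greed", "Greed", "Greed"],
    "leverage": [2.0, 4.0, 6.0, 10.0, 20.0, 30.0],
    "size_usd": [100.0, 200.0, 300.0, 50.0, 150.0, 250.0],
})

grouped = toy.groupby("sentiment_bucket")
behavior = pd.DataFrame({
    "trades": grouped.size(),
    "days": grouped["date"].nunique(),
    "median_size_usd": grouped["size_usd"].median(),
    "median_leverage": grouped["leverage"].median(),
    "p90_leverage": grouped["leverage"].quantile(0.9),
})

# Normalize by the number of days so buckets with more days are not overstated
behavior["trades_per_day"] = behavior["trades"] / behavior["days"]
print(behavior)
```

Medians and upper quantiles are used here rather than means because leverage and position size tend to be heavily skewed.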


In [None]:
# Leverage distribution (note: may be synthesized if missing in raw trades data)
if "leverage" in merged_df.columns:
    lev = pd.to_numeric(merged_df["leverage"], errors="coerce").dropna()
    fig3 = px.histogram(lev.to_frame(name="leverage"), x="leverage", title="Leverage Distribution")
    fig3.show()  # a bare expression inside an if-block does not render
else:
    print("No leverage column present.")


## 4) Trader Segmentation (2–3 segments)

Required: identify segments such as high vs low leverage, frequent vs infrequent, consistent winners, etc.
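
To make the segment definitions concrete, the sketch below builds per-account aggregates from toy data and assigns three labels. The column names, the median leverage split, the 2 trades/day cutoff, and the "positive total PnL and win rate above 50%" rule are all assumptions for illustration; they are not necessarily the rules implemented in `Analyzer`.

```python
import numpy as np
import pandas as pd

# Toy trades; column names and thresholds are assumptions, not Analyzer's rules
toy = pd.DataFrame({
    "account": ["a", "a", "a", "b", "b", "c", "c", "c", "c"],
    "date": ["d1", "d1", "d2", "d1", "d2", "d1", "d1", "d1", "d1"],
    "leverage": [2.0, 2.0, 2.0, 10.0, 10.0, 25.0, 25.0, 25.0, 25.0],
    "closed_pnl": [5.0, 3.0, -1.0, -4.0, 1.0, 10.0, -2.0, 6.0, 8.0],
})

per_account = toy.groupby("account").agg(
    trades=("closed_pnl", "size"),
    active_days=("date", "nunique"),
    avg_leverage=("leverage", "mean"),
    total_pnl=("closed_pnl", "sum"),
    win_rate=("closed_pnl", lambda s: (s > 0).mean()),
)
per_account["trades_per_day"] = per_account["trades"] / per_account["active_days"]

# High vs low leverage: split at the median of per-account average leverage
lev_cut = per_account["avg_leverage"].median()
per_account["leverage_segment"] = np.where(per_account["avg_leverage"] > lev_cut, "High", "Low")

# Frequent vs infrequent: hypothetical cutoff of 2 trades per active day
per_account["activity_segment"] = np.where(per_account["trades_per_day"] > 2, "Frequent", "Infrequent")

# Consistent winners: net profitable and winning on more than half of trades
is_winner = (per_account["total_pnl"] > 0) & (per_account["win_rate"] > 0.5)
per_account["profit_segment"] = np.where(is_winner, "Consistent Winner", "Other")
print(per_account)
```

Segmenting on per-account aggregates rather than individual trades keeps very active accounts from dominating the segment counts.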

In [None]:
try:
    high_lev, low_lev = analyzer.segment_traders()
    print("High leverage traders:", high_lev["account"].nunique())
    print("Low leverage traders:", low_lev["account"].nunique())
    display(high_lev.head())
    display(low_lev.head())
except Exception as e:
    print("Segmentation failed:", e)


In [None]:
# Profitability segment: Consistent Winners vs Net Losers
winners, losers = analyzer.segment_profitability()
print("Consistent Winners:", winners["account"].nunique())
print("Net Losers:", losers["account"].nunique())

display(winners.head())
display(losers.head())

profitability_counts = pd.DataFrame({
    "segment": ["Consistent Winners", "Net Losers"],
    "traders": [winners["account"].nunique(), losers["account"].nunique()],
})
profitability_counts.to_csv(TABLES_DIR / "profitability_segment_counts.csv", index=False)
profitability_counts.to_json(TABLES_DIR / "profitability_segment_counts.json", orient="records", indent=2)


In [None]:
# Activity segment: High Frequency (>5 trades/day) vs Low Frequency
high_freq, low_freq = analyzer.segment_activity(trades_per_day_threshold=5.0)
print("High Frequency traders (>5/day):", high_freq["account"].nunique())
print("Low Frequency traders (<=5/day):", low_freq["account"].nunique())

display(high_freq.head())
display(low_freq.head())

activity_counts = pd.DataFrame({
    "segment": ["High Frequency (>5/day)", "Low Frequency (<=5/day)"],
    "traders": [high_freq["account"].nunique(), low_freq["account"].nunique()],
})
activity_counts.to_csv(TABLES_DIR / "activity_segment_counts.csv", index=False)
activity_counts.to_json(TABLES_DIR / "activity_segment_counts.json", orient="records", indent=2)


## 5) Insights + Strategy Recommendations (write-up)

Fill in:
- **3+ insights** backed by charts/tables
- **2 strategy rules of thumb** derived from findings

Example format:
1. Insight: ... (evidence)
2. Insight: ... (evidence)
3. Insight: ... (evidence)

Strategies:
- Rule 1: ...
- Rule 2: ...
