## Project : Trader Behavior Analysis 

In [None]:
# Core Libraries Required for Data Manipulation and Visualization

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


## Part A : Data Preparation

In [None]:
# Loading the Datasets

df_trader_data = pd.read_csv("historical_data.csv")
df_sentiment_data = pd.read_csv("fear_greed_index.csv")

print("Trader Historical Data:", df_trader_data.shape)
print("Sentiment Data:", df_sentiment_data.shape)

In [None]:
df_trader_data.head()

In [None]:
df_sentiment_data.head()

In [None]:
# Analyzing Missing Values

print("\nMissing Values (Trader_Dataset):")
print(df_trader_data.isnull().sum())

print("\nMissing Values (Sentiment_Dataset):")
print(df_sentiment_data.isnull().sum())


## Part B : Analysis

In [None]:
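# Converting Timestamps
# Note: errors='coerce' turns unparseable values into NaT. If 'Timestamp IST' is stored
# day-first (DD-MM-YYYY), pass dayfirst=True or an explicit format so ambiguous dates are not misread.

df_trader_data['Timestamp IST'] = pd.to_datetime(df_trader_data['Timestamp IST'], errors='coerce')
df_trader_data['date'] = df_trader_data['Timestamp IST'].dt.floor('D')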

df_sentiment_data['date'] = pd.to_datetime(df_sentiment_data['date'])
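
The timestamp conversion above can be made stricter by stating the format explicitly. The following is a minimal sketch on a hypothetical three-row sample; the `DD-MM-YYYY HH:MM` layout is an assumption about `Timestamp IST`, not something confirmed by the notebook, so check a few raw values before reusing it.

```python
import pandas as pd

# Hypothetical sample; the real 'Timestamp IST' layout is an assumption here.
sample = pd.DataFrame(
    {"Timestamp IST": ["02-12-2024 22:50", "13-12-2024 09:15", "not a date"]}
)

# An explicit format removes month/day ambiguity; rows that do not match become NaT.
sample["Timestamp IST"] = pd.to_datetime(
    sample["Timestamp IST"], format="%d-%m-%Y %H:%M", errors="coerce"
)
sample["date"] = sample["Timestamp IST"].dt.floor("D")

# Count unparsed rows: groupby('date') drops NaT keys by default, so they would vanish silently.
n_unparsed = int(sample["Timestamp IST"].isna().sum())
print(sample)
print("Unparsed rows:", n_unparsed)
```

Reporting the number of unparsed rows makes it visible how much data the later daily aggregation would lose.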

In [None]:
# Sentiment Analysis

def sentiment(x):
    # Collapse labels such as 'Extreme Fear' and 'Extreme Greed' into three regimes
    if 'Fear' in x:
        return 'Fear'
    elif 'Greed' in x:
        return 'Greed'
    else:
        return 'Neutral'

df_sentiment_data['sentiment'] = df_sentiment_data['classification'].apply(sentiment)

print("\nTrader Date Range:", df_trader_data['date'].min(), "to", df_trader_data['date'].max())
print("\nSentiment Date Range:", df_sentiment_data['date'].min(), "to", df_sentiment_data['date'].max())
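
The label mapping above can also be written without `apply`. This is a sketch on hypothetical labels, not the notebook's data; treating a missing label as `Neutral` is an assumption made here for illustration (the `apply` version would raise a `TypeError` on a missing value instead).

```python
import numpy as np
import pandas as pd

# Hypothetical labels, including one missing value.
labels = pd.Series(["Extreme Fear", "Fear", "Neutral", "Greed", "Extreme Greed", None])

# na=False makes missing labels count as non-matches instead of propagating NaN.
is_fear = labels.str.contains("Fear", na=False)
is_greed = labels.str.contains("Greed", na=False)

# np.select checks the conditions in order, mirroring the if/elif/else chain.
sentiment = pd.Series(
    np.select([is_fear, is_greed], ["Fear", "Greed"], default="Neutral"),
    index=labels.index,
)
print(sentiment.tolist())
```

If missing labels should be excluded rather than counted as neutral, drop them before the mapping.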

In [None]:
# Key Daily Metrics 

daily_trader_data = df_trader_data.groupby('date').agg({
    'Closed PnL' : 'sum',
    'Trade ID' : 'count',
    'Size USD' : 'mean',
    'Account' : 'nunique'
}).reset_index()

daily_trader_data.columns = [
    'date',
    'daily_PnL',
    'num_trades',
    'Avg_Trade_Size',
    'Active_Traders'
]

daily_trader_data['PnL_per_Trade'] = daily_trader_data['daily_PnL'] / daily_trader_data['num_trades']

print("\nDaily Metrics:")
print(daily_trader_data.head())
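
The daily metrics above rely on renaming columns by position, which breaks quietly if the aggregation order changes. A possible alternative is pandas named aggregation, sketched here on a small hypothetical `trades` frame that reuses the notebook's column names.

```python
import pandas as pd

# Hypothetical trades using the notebook's column names.
trades = pd.DataFrame({
    "date": pd.to_datetime(["2024-12-01", "2024-12-01", "2024-12-02"]),
    "Closed PnL": [10.0, -4.0, 6.0],
    "Trade ID": [1, 2, 3],
    "Size USD": [100.0, 300.0, 50.0],
    "Account": ["a", "b", "a"],
})

# Each output column is named next to the source column and function that produce it.
daily = trades.groupby("date", as_index=False).agg(
    daily_PnL=("Closed PnL", "sum"),
    num_trades=("Trade ID", "count"),
    Avg_Trade_Size=("Size USD", "mean"),
    Active_Traders=("Account", "nunique"),
)
daily["PnL_per_Trade"] = daily["daily_PnL"] / daily["num_trades"]
print(daily)
```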

In [None]:
# Merging Datasets 

df_merged_data = pd.merge(
    daily_trader_data,
    df_sentiment_data[['date', 'sentiment']],
    on = 'date',
    how = 'inner'
)

print("Merged Dataset Shape:", df_merged_data.shape)
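
An inner merge silently drops every date that appears in only one dataset. The sketch below, on hypothetical frames, shows one way to measure that loss with `indicator=True` and to guard against duplicate dates with `validate`; the frame names and values are illustrative only.

```python
import pandas as pd

# Hypothetical daily metrics and sentiment rows with partially overlapping dates.
daily = pd.DataFrame({
    "date": pd.to_datetime(["2024-12-01", "2024-12-02", "2024-12-03"]),
    "daily_PnL": [6.0, -2.0, 4.0],
})
sentiment = pd.DataFrame({
    "date": pd.to_datetime(["2024-12-02", "2024-12-03", "2024-12-04"]),
    "sentiment": ["Fear", "Greed", "Greed"],
})

# An outer merge with indicator=True labels each row by the side it came from.
check = pd.merge(daily, sentiment, on="date", how="outer", indicator=True)
coverage = check["_merge"].value_counts()
print(coverage)

# validate="one_to_one" raises MergeError if either side has duplicate dates.
merged = pd.merge(daily, sentiment, on="date", how="inner", validate="one_to_one")
print("Merged shape:", merged.shape)
```

A large `left_only` count would mean many trading days have no sentiment label and are excluded from every later comparison.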

In [None]:
# Fear vs Greed

performance_summary = df_merged_data.groupby('sentiment')['daily_PnL'].agg(
    ['mean', 'median', 'std', 'count']
)

print("\nPerformance Summary:")
print(performance_summary)

In [None]:
# Bar Chart of Average Daily PnL by Sentiment

df_merged_data.groupby('sentiment')['daily_PnL'].mean().plot(kind = 'bar')
plt.title("Average Daily PnL by Sentiment")
plt.ylabel("Average Daily PnL")
plt.show()

In [None]:
# Behavior Analysis on the basis of Sentiment

behavior_summary = df_merged_data.groupby('sentiment')[[
    'num_trades',
    'Avg_Trade_Size',
    'Active_Traders'
]].mean()

print("\nBehavior Summary")
print(behavior_summary)

In [None]:
# Frequent vs Infrequent Trader Analysis

trader_metrics_data = df_trader_data.groupby('Account').agg({
    'Closed PnL': 'sum',
    'Trade ID' : 'count',
    'Size USD' : 'mean'
}).reset_index()

trader_metrics_data.columns = [
    'Account',
    'Total_PnL',
    'Total_Trades',
    'Avg_Trade_Size'
]

median_trades = trader_metrics_data['Total_Trades'].median()

trader_metrics_data['Activity_Segment'] = np.where(
    trader_metrics_data['Total_Trades'] >= median_trades,
    'High Activity',
    'Low Activity'
)

segment_summary = trader_metrics_data.groupby('Activity_Segment')[[
    'Total_PnL',
    'Total_Trades',
    'Avg_Trade_Size'
]].mean()

print("\nSegment Summary:")
print(segment_summary)
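
The segment summary above reports averages only, so it does not show how much results vary within a segment. A minimal sketch of adding a dispersion measure follows; the `trades` sample and the `PnL_Std` column are hypothetical, and per-trade standard deviation is only one possible proxy for risk.

```python
import numpy as np
import pandas as pd

# Hypothetical trades for three accounts.
trades = pd.DataFrame({
    "Account": ["a", "a", "a", "a", "b", "b", "c"],
    "Closed PnL": [50.0, -40.0, 30.0, -10.0, 5.0, 3.0, 2.0],
    "Trade ID": [1, 2, 3, 4, 5, 6, 7],
})

per_account = trades.groupby("Account").agg(
    Total_PnL=("Closed PnL", "sum"),
    Total_Trades=("Trade ID", "count"),
    PnL_Std=("Closed PnL", "std"),  # NaN for accounts with a single trade
).reset_index()

# Same rule as the notebook: accounts at or above the median count as high activity.
median_trades = per_account["Total_Trades"].median()
per_account["Activity_Segment"] = np.where(
    per_account["Total_Trades"] >= median_trades, "High Activity", "Low Activity"
)

# Mean and median side by side show whether a few large accounts drive the average.
risk_summary = per_account.groupby("Activity_Segment")[["Total_PnL", "PnL_Std"]].agg(
    ["mean", "median"]
)
print(risk_summary)
```

Because the rule uses `>=`, every account tied at the median lands in the high-activity segment, so the two segments need not be equal in size.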

## Part C : Actionable Output 

## Strategy 1 - Sentiment-Based Risk Adjustment

Based on the analysis, trader performance and activity levels differ across sentiment regimes.

During Fear Periods:
- Reduce the size of trades because the market is more volatile
- Avoid taking on more positions aggressively
- Focus on strategies that protect capital

During Greed Periods:
- Watch for overtrading as the number of trades increases
- Use stricter rules for entering trades
- Avoid increasing exposure based solely on optimism
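
The rules above could be encoded as a simple size adjustment. This is only a sketch: the function name `adjusted_trade_size` and the multiplier values are hypothetical placeholders chosen for illustration, not numbers derived from the analysis.

```python
# Illustrative multipliers only; these values are assumptions, not results of the analysis.
SIZE_MULTIPLIER = {
    "Fear": 0.5,     # reduce trade size during Fear periods
    "Neutral": 1.0,  # leave the base size unchanged
    "Greed": 1.0,    # do not increase exposure on optimism alone
}


def adjusted_trade_size(base_size_usd: float, sentiment: str) -> float:
    """Scale a base trade size by the multiplier for the given sentiment regime."""
    if sentiment not in SIZE_MULTIPLIER:
        raise ValueError(f"Unknown sentiment: {sentiment!r}")
    return base_size_usd * SIZE_MULTIPLIER[sentiment]


print(adjusted_trade_size(1000.0, "Fear"))
```

In practice the multipliers would need to be calibrated against the performance summary rather than fixed by hand.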

## Strategy 2 - Segment-Specific Risk Management

From the segmentation analysis:

High Activity Traders:
- Make more money overall but are exposed to more risk
- Use tighter risk controls and keep an eye on drawdowns
- Set limits on leverage during times of high volatility

Low Activity Traders:
- Have lower exposure and more consistent behavior
- Can slowly increase trade size when sentiment is stable
- Focus on being consistent rather than trading frequently



## Predictive Model : Logistic Regression

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

df_LR_Model = df_merged_data.sort_values('date').copy()

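# Target: 1 if the next row's PnL is positive, else 0.
# shift(-1) takes the next available date, which is the next calendar day only if no dates are missing.
df_LR_Model['next_day_PnL'] = df_LR_Model['daily_PnL'].shift(-1)
df_LR_Model['target'] = np.where(df_LR_Model['next_day_PnL'] > 0, 1, 0)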

df_LR_Model['sentiment_encoded'] = df_LR_Model['sentiment'].map({
    'Fear': 0,
    'Neutral': 1,
    'Greed' : 2
})

# Drop the last row, which has no next-day PnL to predict
df_LR_Model = df_LR_Model.iloc[:-1]

features = [
    'daily_PnL',
    'num_trades',
    'Avg_Trade_Size',
    'Active_Traders',
    'sentiment_encoded'
]

X = df_LR_Model[features]
Y = df_LR_Model['target']

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y,
    test_size = 0.2,
    shuffle = False
)

model = LogisticRegression(max_iter=1000, class_weight='balanced')
model.fit(X_train, Y_train)

Y_pred = model.predict(X_test)

print("\nModel Accuracy:", accuracy_score(Y_test, Y_pred))
print("\nClassification Report:\n")
print(classification_report(Y_test, Y_pred))
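
Two checks help put the accuracy above in context: scaling the features, which sit on very different numeric ranges, and comparing against a majority-class baseline. The sketch below uses synthetic data as a stand-in for the notebook's feature matrix, so its printed numbers say nothing about the real model; the 200-row size and the feature scales are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature matrix (assumption: 200 days, 5 features).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([1e4, 1e2, 1e3, 10.0, 1.0])
y = (rng.random(200) > 0.4).astype(int)

# Chronological 80/20 split, matching shuffle=False in the notebook.
split = int(len(X) * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# The scaler is fitted on the training rows only, so no test information leaks in.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(max_iter=1000, class_weight="balanced"),
)
model.fit(X_train, y_train)
model_accuracy = model.score(X_test, y_test)

# Baseline: always predict the most common class seen in training.
majority_class = np.bincount(y_train).argmax()
baseline_accuracy = float(np.mean(y_test == majority_class))

print(f"Model accuracy:    {model_accuracy:.3f}")
print(f"Baseline accuracy: {baseline_accuracy:.3f}")
```

A model whose accuracy does not clearly exceed the baseline has learned little beyond the class balance.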

A logistic regression model was built to forecast the next day's profitability
based on the following factors:

- Daily Profit and Loss
- How often trades are made
- The average size of each trade
- The number of active traders
- Overall market sentiment

## Results

The model showed only moderate predictive ability, which is expected because financial
returns are inherently noisy. The main goal was to demonstrate a structured predictive
modeling workflow rather than to maximize accuracy. The model therefore serves as a
probabilistic guide to market behavior rather than a precise prediction mechanism.






## Methodology

1. Loaded the trader and sentiment datasets and checked them for missing values.
2. Converted timestamps and aligned both datasets.
3. Created daily performance metrics including:
   - Total daily PnL
   - Number of Trades
   - Average trade size
   - Active Traders
4. Compared trader behavior and performance across sentiment regimes.
5. Segmented traders based on activity level.
6. Built a logistic regression model to predict next-day profitability.

## Final Conclusion 

This analysis shows that market sentiment affects how traders act and how well they
perform.

Key Findings of the Project:
- Traders are more active when market sentiment is greedy.
- Their performance varies more when market sentiment is fearful.
- Highly active traders make bigger profits overall, but they also carry more risk.

Even though the model is not highly accurate, the framework shows how sentiment and
behavior can help shape trading ideas.

The project provides both a deeper understanding of trader behavior and practical
strategies that take risk into account.
