# PrimoGPT NLP Features Evaluation

This notebook evaluates the predictive accuracy of the NLP features generated by the PrimoGPT model. The evaluation compares each day's generated features against the actual market movement on the following trading day.
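
The next-day comparison described above can be illustrated on a toy frame. This is only a sketch: the column names mirror those used later in the notebook, but the three rows are made-up values, not data from `CRM_data.csv`.

```python
import pandas as pd

# Toy data: column names mirror the notebook, values are invented for illustration.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
    "Sentiment": [1, -1, 1],
    "Actual_Current_Day": ["D", "U", "D"],
})

# Pair each day's feature with the movement of the following trading day.
toy["Actual_Next_Day"] = toy["Actual_Current_Day"].shift(-1)

# The final row has no following day, so it is dropped.
aligned = toy.dropna(subset=["Actual_Next_Day"])
print(aligned[["Date", "Sentiment", "Actual_Next_Day"]])
```

After the shift, the feature on 2024-01-02 is scored against the movement on 2024-01-03, and so on, which is why one row is lost at the end of the evaluation window.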

## Evaluation Period
- Start Date: January 1, 2024
- End Date: July 31, 2024
- Purpose: Aligns with the PrimoRL trading simulation phase

## Feature Evaluation
The notebook analyzes the predictive power of:
1. Combined evaluation algorithm (sentiment + trend)
2. Individual features:
   - News Relevance (0-2)
   - Sentiment (-1 to 1)
   - Price Impact Potential (-3 to 3)
   - Trend Direction (-1 to 1)
   - Earnings Impact (-2 to 2)
   - Investor Confidence (-3 to 3)
   - Risk Profile Change (-2 to 2)
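
Before evaluating, it may be worth confirming that every feature stays within the range listed above. The following is a minimal sketch under that assumption; `FEATURE_RANGES` and `find_out_of_range` are hypothetical helper names introduced here, not part of the notebook, and the sample frame is invented.

```python
import pandas as pd

# Value ranges as listed in the feature overview (inclusive bounds).
FEATURE_RANGES = {
    "News Relevance": (0, 2),
    "Sentiment": (-1, 1),
    "Price Impact Potential": (-3, 3),
    "Trend Direction": (-1, 1),
    "Earnings Impact": (-2, 2),
    "Investor Confidence": (-3, 3),
    "Risk Profile Change": (-2, 2),
}

def find_out_of_range(frame: pd.DataFrame) -> dict:
    """Return {feature: number of values outside its listed range}.

    Missing values (NaN) are counted as out of range, because
    Series.between returns False for them.
    """
    violations = {}
    for feature, (low, high) in FEATURE_RANGES.items():
        if feature not in frame.columns:
            continue  # skip features absent from this frame
        bad = ~frame[feature].between(low, high)
        if bad.any():
            violations[feature] = int(bad.sum())
    return violations

# Invented sample: one Sentiment value (2) falls outside -1..1.
sample = pd.DataFrame({"Sentiment": [-1, 0, 2], "News Relevance": [0, 1, 2]})
print(find_out_of_range(sample))  # {'Sentiment': 1}
```

An empty dictionary would indicate that all checked features respect their listed ranges.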

In [1]:
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import numpy as np

In [2]:
csv_file_name = "data/CRM_data.csv"

df = pd.read_csv(csv_file_name)

# Convert 'Date' column to datetime
df['Date'] = pd.to_datetime(df['Date'])

# Filter data for the specified date range
start_date = '2024-01-01'
end_date = '2024-07-31'
df = df[(df['Date'] >= start_date) & (df['Date'] <= end_date)]

df.head()

Unnamed: 0,Date,Adj Close Price,Returns,Bin Label,News Relevance,Sentiment,Price Impact Potential,Trend Direction,Earnings Impact,Investor Confidence,Risk Profile Change,Prompt
438,2024-01-02,254.707367,-0.02664,D3,1,-1,-2,-1,0,-1,0,\n [COMPANY BASICS]\n ...
439,2024-01-03,250.441177,-0.016749,D2,1,0,-1,-1,0,-1,0,\n [COMPANY BASICS]\n ...
440,2024-01-04,249.844513,-0.002382,D1,2,1,2,1,2,2,0,\n [COMPANY BASICS]\n ...
441,2024-01-05,249.725174,-0.000478,D1,0,0,0,0,0,0,0,\n [COMPANY BASICS]\n ...
442,2024-01-08,259.421021,0.038826,U4,2,1,2,1,1,2,0,\n [COMPANY BASICS]\n ...


In [3]:
def predict_movement(row, previous_direction):
    """Predict the next-day direction ('U' or 'D') from Sentiment + Trend Direction."""
    score = row['Sentiment'] + row['Trend Direction']

    # No relevant news or a neutral combined signal: carry the previous direction forward
    if row['News Relevance'] == 0 or score == 0:
        return previous_direction

    return 'U' if score > 0 else 'D'

def classify_actual_movement(bin_label):
    """Map a bin label such as 'U4' or 'D3' to its direction, 'U' or 'D'."""
    # pd.isna also covers None; missing labels default to 'U' (an arbitrary choice, 'D' works too)
    if pd.isna(bin_label):
        return 'U'
    return 'U' if str(bin_label).startswith('U') else 'D'

# Add actual values for the current day
df['Actual_Current_Day'] = df['Bin Label'].apply(classify_actual_movement)

# Initialize a list for predictions
predictions = []

# Iterate through the DataFrame and predict for each row
previous_direction = 'U'  # Initial assumption for the first row
for _, row in df.iterrows():
    prediction = predict_movement(row, previous_direction)
    predictions.append(prediction)
    previous_direction = row['Actual_Current_Day']  # Update previous_direction for the next iteration

# Add predictions to the DataFrame
df['Predicted_Next_Day'] = predictions

# Pull the next day's actual direction onto the current row for comparison with predictions
df['Actual_Next_Day'] = df['Actual_Current_Day'].shift(-1)

# Remove the last row since we don't have an actual value for the next day after the last day
df = df.dropna(subset=['Actual_Next_Day'])

# Calculate accuracy
accuracy = accuracy_score(df['Actual_Next_Day'], df['Predicted_Next_Day'])
conf_matrix = confusion_matrix(df['Actual_Next_Day'], df['Predicted_Next_Day'], labels=['U', 'D'])

print(f"Prediction accuracy: {accuracy:.2f}")
print("\nConfusion matrix:")
print(pd.DataFrame(conf_matrix, index=['Actual U', 'Actual D'], columns=['Pred U', 'Pred D']))

print("\nClassification report:")
print(classification_report(df['Actual_Next_Day'], df['Predicted_Next_Day']))

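# Analyze the accuracy of the combination of sentiment and trend.
# Note: unlike predict_movement, this check ignores 'News Relevance' and scores non-zero sums by sign alone.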
correct_predictions = ((df['Sentiment'] + df['Trend Direction'] > 0) & (df['Actual_Next_Day'] == 'U')) | \
                      ((df['Sentiment'] + df['Trend Direction'] < 0) & (df['Actual_Next_Day'] == 'D')) | \
                      ((df['Sentiment'] + df['Trend Direction'] == 0) & (df['Predicted_Next_Day'] == df['Actual_Next_Day']))
combined_accuracy = correct_predictions.mean()
print(f"\nAccuracy of Sentiment + Trend Direction combination: {combined_accuracy:.2f}")

Prediction accuracy: 0.46

Confusion matrix:
          Pred U  Pred D
Actual U      43      34
Actual D      45      23

Classification report:
              precision    recall  f1-score   support

           D       0.40      0.34      0.37        68
           U       0.49      0.56      0.52        77

    accuracy                           0.46       145
   macro avg       0.45      0.45      0.44       145
weighted avg       0.45      0.46      0.45       145


Accuracy of Sentiment + Trend Direction combination: 0.46
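
To put the 0.46 accuracy in context, it can be compared with a majority-class baseline derived from the same confusion matrix. The sketch below hard-codes the counts printed above; the baseline (always predicting the more frequent actual class) is a comparison added here for illustration and is not part of the original evaluation.

```python
import numpy as np

# Confusion matrix printed above: rows = actual (U, D), columns = predicted (U, D).
conf = np.array([[43, 34],
                 [45, 23]])

total = conf.sum()

# Correct predictions sit on the diagonal.
accuracy = np.trace(conf) / total

# Baseline: always predict the more frequent actual class (row sums are actual counts).
majority_baseline = conf.sum(axis=1).max() / total

print(f"Rule accuracy:     {accuracy:.2f}")           # 0.46
print(f"Majority baseline: {majority_baseline:.2f}")  # 0.53
```

With 77 of the 145 evaluated days moving up, always predicting `U` would score about 0.53, so the combined rule falls below this naive baseline over the evaluation period.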


In [4]:
def analyze_feature_predictive_power(df, features):
    results = {}
    
    for feature in features:
        predictions = []
        previous_direction = 'U'  # Initial assumption for the first row
        
        for _, row in df.iterrows():
            if row[feature] > 0:
                prediction = 'U'
            elif row[feature] < 0:
                prediction = 'D'
            else:
                prediction = previous_direction
            
            predictions.append(prediction)
            previous_direction = row['Actual_Current_Day']
        
        accuracy = accuracy_score(df['Actual_Next_Day'], predictions)
        conf_matrix = confusion_matrix(df['Actual_Next_Day'], predictions, labels=['U', 'D'])
        report = classification_report(df['Actual_Next_Day'], predictions, output_dict=True)
        
        results[feature] = {
            'accuracy': accuracy,
            'confusion_matrix': conf_matrix,
            'classification_report': report
        }
    
    return results

# List of features to analyze
features_to_analyze = [
    "Sentiment", "Price Impact Potential", 
    "Trend Direction", "Earnings Impact", "Investor Confidence", "Risk Profile Change"
]

# Perform the analysis
feature_analysis = analyze_feature_predictive_power(df, features_to_analyze)

# Print the results
for feature, results in feature_analysis.items():
    print(f"\nAnalysis for feature: {feature}")
    print(f"Accuracy: {results['accuracy']:.2f}")
    print("\nConfusion matrix:")
    print(pd.DataFrame(results['confusion_matrix'], index=['Actual U', 'Actual D'], columns=['Pred U', 'Pred D']))
    print("\nDetailed analysis:")
    print(f"Precision (U): {results['classification_report']['U']['precision']:.2f}")
    print(f"Recall (U): {results['classification_report']['U']['recall']:.2f}")
    print(f"F1-score (U): {results['classification_report']['U']['f1-score']:.2f}")
    print(f"Precision (D): {results['classification_report']['D']['precision']:.2f}")
    print(f"Recall (D): {results['classification_report']['D']['recall']:.2f}")
    print(f"F1-score (D): {results['classification_report']['D']['f1-score']:.2f}")
    print("-" * 50)

# Sort features by accuracy
sorted_features = sorted(feature_analysis.items(), key=lambda x: x[1]['accuracy'], reverse=True)

print("\nFeatures ranked by accuracy:")
for feature, results in sorted_features:
    print(f"{feature}: {results['accuracy']:.2f}")


Analysis for feature: Sentiment
Accuracy: 0.47

Confusion matrix:
          Pred U  Pred D
Actual U      46      31
Actual D      46      22

Detailed analysis:
Precision (U): 0.50
Recall (U): 0.60
F1-score (U): 0.54
Precision (D): 0.42
Recall (D): 0.32
F1-score (D): 0.36
--------------------------------------------------

Analysis for feature: Price Impact Potential
Accuracy: 0.46

Confusion matrix:
          Pred U  Pred D
Actual U      38      39
Actual D      40      28

Detailed analysis:
Precision (U): 0.49
Recall (U): 0.49
F1-score (U): 0.49
Precision (D): 0.42
Recall (D): 0.41
F1-score (D): 0.41
--------------------------------------------------

Analysis for feature: Trend Direction
Accuracy: 0.45

Confusion matrix:
          Pred U  Pred D
Actual U      42      35
Actual D      45      23

Detailed analysis:
Precision (U): 0.48
Recall (U): 0.55
F1-score (U): 0.51
Precision (D): 0.40
Recall (D): 0.34
F1-score (D): 0.37
--------------------------------------------------

Analy