# Task 3: Employee Score Calculation

This notebook implements **Task 3** of the Employee Sentiment Analysis project. The objective is to compute a monthly sentiment score for each employee based on their messages.

## Scoring System
- **Positive Message**: +1 point
- **Negative Message**: -1 point
- **Neutral Message**: 0 points (no effect)
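
As a minimal sketch of the scoring rule above, the three labels can be held in a lookup table and applied to a column of labels. The `SENTIMENT_SCORES` name and the sample labels are illustrative assumptions, not part of the project data.

```python
import pandas as pd

# Label-to-score table taken from the scoring system above.
# The name SENTIMENT_SCORES is a hypothetical choice for this sketch.
SENTIMENT_SCORES = {"Positive": 1, "Negative": -1, "Neutral": 0}

# Made-up labels, for illustration only.
labels = pd.Series(["Positive", "Neutral", "Negative", "Positive"])

# Map each label to its score; the sum is the net score of these messages.
scores = labels.map(SENTIMENT_SCORES)

print(scores.tolist())  # [1, 0, -1, 1]
print(scores.sum())     # 1
```

Neutral messages contribute nothing to the total, so the net score here is two positives minus one negative.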

## Key Requirements
- Aggregate scores on a monthly basis for each employee
- Scores reset at the beginning of each new month
- Clear documentation of grouping and calculation methods
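
The monthly reset in the requirements above can be illustrated on a toy table. This is a sketch under assumptions: the addresses, dates, and scores are invented, and only the column names (`from`, `date`, `sentiment_score`, `year_month`) mirror the ones used in this notebook.

```python
import pandas as pd

# Hypothetical messages; values are made up for illustration.
toy = pd.DataFrame({
    "from": ["a@x.com", "a@x.com", "a@x.com", "b@x.com"],
    "date": pd.to_datetime(
        ["2010-01-05", "2010-01-20", "2010-02-03", "2010-01-11"]
    ),
    "sentiment_score": [1, 1, -1, 0],
})

# Bucket each message by calendar month, then sum within (employee, month).
toy["year_month"] = toy["date"].dt.to_period("M")
toy_monthly = (
    toy.groupby(["from", "year_month"])["sentiment_score"]
    .sum()
    .reset_index(name="monthly_score")
)

print(toy_monthly)
```

Employee `a@x.com` scores 2 in January and -1 in February. The February value is not 1, because the month is part of the grouping key and the January total does not carry over.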

## 1. Import Required Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
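import warnings
# Suppress library warnings to keep notebook output readable.
# Note: this also hides deprecation notices from pandas and seaborn.
warnings.filterwarnings('ignore')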

# Set plotting style
plt.style.use('ggplot')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (12, 8)

## 2. Load Data and Implement Scoring System

In [None]:
# Load data with sentiment labels
df = pd.read_csv('../data/processed/email_data_with_sentiment.csv')

# Convert date column to datetime
df['date'] = pd.to_datetime(df['date'])

print(f"Dataset loaded: {df.shape}")
print(f"Date range: {df['date'].min()} to {df['date'].max()}")
print(f"Unique employees: {df['from'].nunique()}")

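# Implement scoring system
SENTIMENT_SCORES = {'Positive': 1, 'Negative': -1, 'Neutral': 0}

def get_sentiment_score(sentiment):
    """Convert a sentiment label to its numerical score.

    Any label other than 'Positive' or 'Negative' (including a missing
    value) scores 0, the same as 'Neutral'.
    """
    return SENTIMENT_SCORES.get(sentiment, 0)

# Apply scoring system
df['sentiment_score'] = df['sentiment_final'].apply(get_sentiment_score)

# Report labels outside the expected set, since they silently score 0
unexpected = set(df['sentiment_final'].dropna().unique()) - set(SENTIMENT_SCORES)
if unexpected:
    print(f"Warning: unexpected labels scored as 0: {sorted(map(str, unexpected))}")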

# Create year-month column for grouping
df['year_month'] = df['date'].dt.to_period('M')

print(f"\nScoring system applied:")
print(f"- Positive messages: +1 point")
print(f"- Negative messages: -1 point") 
print(f"- Neutral messages: 0 points")

print(f"\nSample scoring:")
print(df[['from', 'date', 'sentiment_final', 'sentiment_score']].head(10))

## 3. Calculate Monthly Sentiment Scores

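Sum each employee's message scores within each calendar month. Because the month is part of the grouping key, every month starts from zero and no total carries over from the previous month.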

In [None]:
# Calculate monthly scores for each employee
monthly_scores = df.groupby(['from', 'year_month']).agg({
    'sentiment_score': 'sum',    # Net score: positives minus negatives
    'sentiment_final': 'count',  # Number of labeled messages in that month
    'date': 'min'  # First message date in that month
}).rename(columns={
    'sentiment_score': 'monthly_score',
    'sentiment_final': 'message_count',
    'date': 'first_message_date'
})

# Reset index to make it easier to work with
monthly_scores = monthly_scores.reset_index()

# Count messages per sentiment label for each employee-month
sentiment_breakdown = df.groupby(['from', 'year_month', 'sentiment_final']).size().unstack(fill_value=0)
sentiment_breakdown.columns = [f'{col.lower()}_count' for col in sentiment_breakdown.columns]
sentiment_breakdown = sentiment_breakdown.reset_index()

# Merge with monthly scores
monthly_scores = monthly_scores.merge(sentiment_breakdown, on=['from', 'year_month'], how='left')

# Ensure all three count columns exist (a label may be absent from the data)
# and fill any NaN values with 0
sentiment_cols = ['positive_count', 'negative_count', 'neutral_count']
for col in sentiment_cols:
    if col not in monthly_scores.columns:
        monthly_scores[col] = 0
    monthly_scores[col] = monthly_scores[col].fillna(0).astype(int)

print(f"Monthly scores calculated for {monthly_scores.shape[0]} employee-month combinations")
print(f"Date range: {monthly_scores['year_month'].min()} to {monthly_scores['year_month'].max()}")

# Display sample results
print(f"\nSample Monthly Scores:")
print(monthly_scores.head(10))

# Summary statistics
print(f"\nMonthly Score Statistics:")
print(monthly_scores['monthly_score'].describe())

print(f"\nEmployees with highest average monthly scores:")
avg_scores = monthly_scores.groupby('from')['monthly_score'].mean().sort_values(ascending=False)
print(avg_scores.head(10))
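
The monthly table built above can be cross-checked against its own columns: each monthly score should equal the positive count minus the negative count, and the three label counts should add up to the message count. The sketch below assumes the column names produced in this section; the helper name `check_monthly_scores` and the sample rows are hypothetical.

```python
import pandas as pd

def check_monthly_scores(scores: pd.DataFrame) -> bool:
    """Return True if scores and counts are internally consistent.

    Hypothetical helper: checks that monthly_score equals
    positive_count - negative_count, and that the three label
    counts sum to message_count, for every row.
    """
    score_ok = (
        scores["monthly_score"]
        == scores["positive_count"] - scores["negative_count"]
    ).all()
    label_total = scores[
        ["positive_count", "negative_count", "neutral_count"]
    ].sum(axis=1)
    count_ok = (scores["message_count"] == label_total).all()
    return bool(score_ok and count_ok)

# Made-up rows in the same shape as the monthly table.
example = pd.DataFrame({
    "monthly_score": [2, -1],
    "message_count": [3, 1],
    "positive_count": [2, 0],
    "negative_count": [0, 1],
    "neutral_count": [1, 0],
})

print(check_monthly_scores(example))  # True
```

The count check can fail legitimately if some messages have a missing sentiment label, so a `False` result is a prompt to inspect the data rather than proof of a bug.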

## 4. Save Monthly Scores

Save the calculated monthly scores for use in subsequent tasks.

In [None]:
# Save monthly scores
monthly_scores.to_csv('../data/processed/monthly_scores.csv', index=False)

print("="*60)
print("TASK 3: EMPLOYEE SCORE CALCULATION - SUMMARY REPORT")
print("="*60)

print(f"\nScoring System Summary:")
print(f"- Positive messages: +1 point")
print(f"- Negative messages: -1 point") 
print(f"- Neutral messages: 0 points")

print(f"\nDataset Overview:")
print(f"- Total employee-month combinations: {len(monthly_scores):,}")
print(f"- Unique employees: {monthly_scores['from'].nunique():,}")
print(f"- Months covered: {monthly_scores['year_month'].nunique()}")
print(f"- Date range: {monthly_scores['year_month'].min()} to {monthly_scores['year_month'].max()}")

print(f"\nScore Statistics:")
print(f"- Highest monthly score: {monthly_scores['monthly_score'].max()}")
print(f"- Lowest monthly score: {monthly_scores['monthly_score'].min()}")
print(f"- Average monthly score: {monthly_scores['monthly_score'].mean():.2f}")

print(f"\nOutput Files:")
print(f"- Monthly scores: ../data/processed/monthly_scores.csv")

print(f"\n" + "="*60)
print("Task 3 completed successfully!")
print("Next: Run Task 4 (Employee Ranking)")
print("="*60)
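
A later task that reloads `monthly_scores.csv` gets the `year_month` column back as plain text, because CSV does not preserve the pandas period dtype. The sketch below shows one way to restore it. It reads from an in-memory string instead of the real file, and the rows are made up for illustration.

```python
import io

import pandas as pd

# Stand-in for the saved file; the rows are hypothetical.
csv_text = (
    "from,year_month,monthly_score\n"
    "a@x.com,2010-01,2\n"
    "a@x.com,2010-02,-1\n"
)
reloaded = pd.read_csv(io.StringIO(csv_text))

# After the CSV round trip, year_month holds strings such as "2010-01".
# Convert it back to monthly periods so it sorts and compares by month.
reloaded["year_month"] = pd.PeriodIndex(reloaded["year_month"], freq="M")

print(reloaded.dtypes)
```

With the period dtype restored, operations such as sorting by month or selecting a range of months behave the same way as in this notebook.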