
---
title: Data Analysis Project
subtitle: Yahoo Finance Analysis
author: Ryan Horn
date: 2024-02-15
from: markdown+emoji
---

# Unifying ESG Metrics with Financial Analysis

---

## 1. Introduction

### Background
Over the past decade, investors and regulators have increasingly emphasized Environmental, Social, and Governance (ESG) performance as a key indicator of a company’s long-term resilience and ethical standing. At the same time, traditional financial metrics such as revenue growth, profit margins, and stock returns remain the bedrock of investment decisions. Understanding how ESG risk scores evolve over time and how they relate to core financial performance can help firms balance sustainable practices with shareholder value.

### Problem Statement
This project examines whether changes in a company’s ESG risk scores between 2024 and 2025 are associated with shifts in its financial performance, as reflected in its historical stock prices, trading volume, and daily returns. By combining ESG metrics and finance data, we aim to identify patterns that could guide more informed, sustainability-aware investment strategies.

---

## 2. Data Collection

*Both the ESG risk scores and the historical stock market data were retrieved with a standalone Python Selenium script (e.g., `data_collection.py`), submitted separately to Brightspace.*  
*The script performs the following steps:*  
- Opens a headless browser, navigates to Yahoo Finance ESG pages for each ticker, and scrapes Total ESG, Environmental, Social, Governance scores, and Controversy level.  
- Navigates historical data pages (Jan 1 2024–Mar 31 2025) for each ticker and scrapes daily OHLCV.  
- Saves to `danl_210_HORN_RYAN_ESG.csv` and `danl_210_HORN_RYAN_stock.csv` locally.  
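
As a rough illustration of the navigation step above, the sketch below builds the two page URLs the scraper would visit for one ticker. The URL patterns, the `build_urls` and `to_unix` helper names, and the example ticker are assumptions for illustration; they are not taken from `data_collection.py`, and Yahoo Finance may change its page layout at any time.

```python
from datetime import datetime, timezone

# Assumed URL pattern; verify against the live site before relying on it.
BASE = "https://finance.yahoo.com/quote"

def to_unix(year: int, month: int, day: int) -> int:
    """Midnight UTC of the given date as a Unix timestamp."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())

def build_urls(ticker: str) -> dict:
    """Return the (assumed) ESG and historical-data URLs for one ticker."""
    start = to_unix(2024, 1, 1)   # Jan 1 2024
    end = to_unix(2025, 3, 31)    # Mar 31 2025
    return {
        "esg": f"{BASE}/{ticker}/sustainability/",
        "history": f"{BASE}/{ticker}/history/?period1={start}&period2={end}",
    }

urls = build_urls("AAPL")  # "AAPL" is only an example ticker
print(urls["esg"])
print(urls["history"])
```

Because `period2` here is midnight at the start of Mar 31, a real script might need to add one day to include that trading session.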

---

## 3. Data Loading & Cleaning
```python
import pandas as pd

# Load saved CSVs
esg_df = pd.read_csv('danl_210_HORN_RYAN_ESG.csv')
stock_df = pd.read_csv('danl_210_HORN_RYAN_stock.csv')

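# Ensure Year is an integer column, deriving it from Date if the CSV lacks it
if 'Year' not in stock_df.columns:
    stock_df['Year'] = pd.to_datetime(stock_df['Date']).dt.year
stock_df['Year'] = stock_df['Year'].astype(int)

# Compute daily returns within each ticker so a return never spans two companies
# (assumes a 'Ticker' column and rows in chronological order per ticker)
stock_df['Return'] = stock_df.groupby('Ticker')['Close'].pct_change()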

# Quick peek at data types
print(esg_df.dtypes)
print(stock_df.dtypes)

# Drop rows missing key ESG values (example)
esg_df = esg_df.dropna(subset=['Total_ESG_Risk'])
```
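
Scraped tables often store numbers as text (for example, volumes with thousands separators), which the `dtypes` check above would reveal as `object` columns. The sketch below shows one possible way to coerce such columns. The `clean_numeric` helper and the toy values are hypothetical; whether the real CSVs need this step depends on how the scraper wrote them.

```python
import pandas as pd

def clean_numeric(series: pd.Series) -> pd.Series:
    """Strip thousands separators and coerce to numbers; unparseable values become NaN."""
    as_text = series.astype(str).str.replace(",", "", regex=False)
    return pd.to_numeric(as_text, errors="coerce")

# Toy data standing in for scraped values (not real prices)
raw = pd.DataFrame({
    "Close": ["1,234.50", "987.10", "-"],
    "Volume": ["12,345,678", "9,876,543", "1,000"],
})

# Apply the cleaner column by column
cleaned = raw.apply(clean_numeric)
print(cleaned.dtypes)
print(cleaned)
```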

---

## 4. Descriptive Statistics

### 4.1 Ungrouped Summaries
```python
# ESG ungrouped summary
esg_summary = esg_df[['Total_ESG_Risk', 'Environmental_Risk', 'Social_Risk', 'Governance_Risk', 'Controversy']].describe()
print("ESG Overall Summary:\n", esg_summary)

# Stock ungrouped summary
stock_summary = stock_df[['Close', 'Volume', 'Return']].describe()
print("\nStock Overall Summary:\n", stock_summary)
```
*Interpretation:* Discuss central tendency (mean, median) and dispersion (std) for ESG scores and stock returns.

### 4.2 Grouped Summaries by Year
```python
# ESG grouped by Year
esg_cols = ['Total_ESG_Risk','Environmental_Risk','Social_Risk','Governance_Risk','Controversy']
esg_by_year = esg_df.groupby('Year')[esg_cols].agg(['mean','median','std'])
print("\nESG by Year:\n", esg_by_year)

# Stock grouped by Year
stock_by_year = stock_df.groupby('Year')[['Close','Volume','Return']].agg(['mean','median','std'])
print("\nStock by Year:\n", stock_by_year)
```
*Interpretation:* Compare 2024 vs 2025 metrics to highlight year-over-year changes.
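
To make the year-over-year comparison concrete at the firm level, the scores can be pivoted so each ticker has one column per year. This is a minimal sketch on toy values; the `Ticker` column name and the numbers are assumptions, not scraped data.

```python
import pandas as pd

# Toy ESG scores (illustrative values, not scraped data)
toy_esg = pd.DataFrame({
    "Ticker": ["AAA", "AAA", "BBB", "BBB"],
    "Year": [2024, 2025, 2024, 2025],
    "Total_ESG_Risk": [20.0, 18.5, 30.0, 33.0],
})

# One row per ticker, one column per year
wide = toy_esg.pivot(index="Ticker", columns="Year", values="Total_ESG_Risk")

# Positive change means ESG risk rose from 2024 to 2025
wide["Change"] = wide[2025] - wide[2024]
print(wide)
```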

---

## 5. Exploratory Data Analysis (EDA) & Visualizations
```python
import seaborn as sns
import matplotlib.pyplot as plt
```

### 5.1 Total ESG Risk Distribution
```python
plt.figure()
sns.histplot(esg_df['Total_ESG_Risk'], kde=True)
plt.title('Total ESG Risk Distribution')
plt.xlabel('Total ESG Risk Score')
plt.ylabel('Frequency')
plt.show()
```
*Interpretation:* Comment on skewness, modal range, and tails of ESG risk scores.

### 5.2 Daily Stock Return Distribution
```python
plt.figure()
sns.histplot(stock_df['Return'].dropna(), kde=True)
plt.title('Daily Stock Return Distribution')
plt.xlabel('Daily Return')
plt.ylabel('Frequency')
plt.show()
```
*Interpretation:* Note volatility, presence of outliers, and symmetry.
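
The histogram can be complemented with a few numbers for symmetry and outliers. The sketch below uses toy returns and a 2-standard-deviation cutoff chosen only for illustration; a different threshold may suit the real data better.

```python
import pandas as pd

# Toy daily returns (illustrative values only)
toy_returns = pd.Series([0.01, -0.02, 0.005, 0.0, -0.01, 0.015, -0.005, 0.08])

skewness = toy_returns.skew()         # > 0 indicates a longer right tail
excess_kurtosis = toy_returns.kurt()  # > 0 indicates heavier tails than a normal distribution

# Flag observations more than 2 sample standard deviations from the mean
z_scores = (toy_returns - toy_returns.mean()) / toy_returns.std()
outliers = toy_returns[z_scores.abs() > 2]

print(f"Skewness: {skewness:.2f}, excess kurtosis: {excess_kurtosis:.2f}")
print(outliers)
```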

### 5.3 ESG vs. Return Correlation Heatmap
```python
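# Average each ticker's daily returns per year, then join to that ticker-year's ESG scores
# (assumes both DataFrames have 'Ticker' and 'Year' columns; rename if yours differ)
esg_cols = ['Total_ESG_Risk','Environmental_Risk','Social_Risk','Governance_Risk']
avg_return = stock_df.groupby(['Ticker','Year'], as_index=False)['Return'].mean()
merged = esg_df[['Ticker','Year'] + esg_cols].merge(avg_return, on=['Ticker','Year'])

plt.figure(figsize=(8,6))
sns.heatmap(merged[esg_cols + ['Return']].corr(), annot=True, fmt='.2f', linewidths=0.5)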
plt.title('ESG Sub-scores & Stock Return Correlation Heatmap')
plt.show()
```
*Interpretation:* Identify which ESG dimensions correlate most strongly (positively/negatively) with returns.
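
Pearson correlation is sensitive to a few extreme firms, so a rank-based (Spearman) coefficient can serve as a robustness check. The sketch below compares the two on toy firm-level values; it is an optional addition, not part of the analysis above.

```python
import pandas as pd

# Toy firm-level data (illustrative values only)
toy = pd.DataFrame({
    "Total_ESG_Risk": [10, 15, 20, 25, 40],
    "Return": [0.012, 0.010, 0.009, 0.007, -0.020],
})

# Pearson measures linear association; Spearman measures monotonic association
pearson = toy.corr(method="pearson").loc["Total_ESG_Risk", "Return"]
spearman = toy.corr(method="spearman").loc["Total_ESG_Risk", "Return"]
print(f"Pearson: {pearson:.2f}, Spearman: {spearman:.2f}")
```

A large gap between the two coefficients would suggest that one or two firms are driving the linear correlation.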

### 5.4 Yearly ESG Risk vs. Average Return Trend
```python
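# Yearly averages of ESG risk and daily return
esg_trend = esg_df.groupby('Year')['Total_ESG_Risk'].mean()
return_trend = stock_df.groupby('Year')['Return'].mean()

# The two series sit on very different scales, so give returns their own y-axis
fig, ax_esg = plt.subplots()
ax_ret = ax_esg.twinx()
ax_esg.plot(esg_trend.index, esg_trend.values, marker='o', color='tab:blue', label='Avg ESG Risk')
ax_ret.plot(return_trend.index, return_trend.values, marker='x', color='tab:orange', label='Avg Return')
ax_esg.set_xticks(list(esg_trend.index))
ax_esg.set_xlabel('Year')
ax_esg.set_ylabel('Avg Total ESG Risk')
ax_ret.set_ylabel('Avg Daily Return')
fig.legend(loc='upper right')
ax_esg.set_title('Yearly Avg ESG Risk vs. Avg Return')
plt.show()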
```
*Interpretation:* Compare directional movements of ESG risk and returns across years.

### 5.5 Returns by ESG Risk Quartile
```python
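# Create quartile categories on Total_ESG_Risk
esg_df['Risk_Quartile'] = pd.qcut(esg_df['Total_ESG_Risk'], 4, labels=['Low','MidLow','MidHigh','High'])

# Attach each ticker-year's quartile to that ticker's daily returns
# (assumes both DataFrames have 'Ticker' and 'Year' columns; ESG scores are
# not daily, so 'Date' cannot serve as the join key)
plot_df = stock_df[['Ticker','Year','Return']].merge(
    esg_df[['Ticker','Year','Risk_Quartile']], on=['Ticker','Year'])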

plt.figure()
sns.boxplot(x='Risk_Quartile', y='Return', data=plot_df)
plt.title('Daily Returns by ESG Risk Quartile')
plt.xlabel('ESG Risk Quartile')
plt.ylabel('Return')
plt.show()
```
*Interpretation:* Assess whether higher ESG risk firms experience different return distributions than lower-risk firms.
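
A grouped table can back up what the boxplot shows. The sketch below bins toy ESG scores into quartiles with `pd.qcut` and summarizes returns per bin; the values are illustrative only.

```python
import pandas as pd

# Toy data (illustrative values only)
toy = pd.DataFrame({
    "Total_ESG_Risk": [10, 12, 18, 20, 26, 28, 35, 40],
    "Return": [0.004, 0.002, 0.003, 0.001, 0.000, -0.002, -0.001, -0.003],
})

# Same quartile labels as in the plot above
toy["Risk_Quartile"] = pd.qcut(
    toy["Total_ESG_Risk"], 4, labels=["Low", "MidLow", "MidHigh", "High"])

# Mean, median, and number of observations per quartile
summary = toy.groupby("Risk_Quartile", observed=True)["Return"].agg(
    ["mean", "median", "count"])
print(summary)
```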

---

## 6. Significance & Implications
_Summarize how these findings inform investment strategies, corporate sustainability practices, and policy considerations._

---

## 7. References & Acknowledgments
- Yahoo Finance: https://finance.yahoo.com/  
- pandas: https://pandas.pydata.org/  
- seaborn: https://seaborn.pydata.org/  
- **AI & Collaboration**: Prepared with guidance from ChatGPT and in collaboration with colleagues.
