This project offers a detailed quantitative analysis of five key market sectors: Technology, Healthcare, Finance, Energy, and Consumer Discretionary. Using historical data and advanced statistical techniques, we provide insights to help investors make informed decisions about sector allocation in their portfolios.

Key Components:

1. Risk-Return Profile: We analyze the relationship between volatility and returns for each sector. The scatter plot shows that Technology offers the highest returns but with increased volatility, while Healthcare presents lower risk and more moderate returns.

2. Monte Carlo Simulation: Our simulation projects potential future returns, accounting for market uncertainties. Technology shows the highest expected return (26.97%) but also significant risk, with a worst-case scenario of -36.19%. Energy demonstrates high potential returns (22.10%) but the highest volatility, with a worst-case of -51.69%.

3. Sector Correlations: The heatmap reveals important relationships between sectors. Technology and Consumer Discretionary show the strongest correlation (0.85), suggesting limited diversification benefits when combined. Energy exhibits the lowest average correlation, potentially offering valuable diversification opportunities.

4. Cumulative Returns: The line chart tracks sector performance over time. Technology consistently outperforms other sectors, while Energy shows high volatility with periods of both underperformance and strong recovery.

Insights:
- Technology offers the highest growth potential but with increased risk.
- Healthcare provides stability and could serve as a defensive play.
- Energy presents opportunities for high returns but requires tolerance for significant volatility.
- Combining less correlated sectors (e.g., Technology with Energy or Healthcare) may enhance portfolio diversification.

This analysis equips investors with quantitative insights to optimize their sector allocation strategies, balancing potential returns with risk management based on individual risk tolerance and investment goals.

###Analysis (Visual dashboard is available for reference at the project's conclusion.)

In [1]:
import yfinance as yf
import pandas as pd
import numpy as np
from scipy import stats
from scipy.stats import norm
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import ipywidgets as widgets
from IPython.display import display

In [2]:
def fetch_sector_data(sectors, start_date, end_date):
    sector_data = {}
    for sector, ticker in sectors.items():
        data = yf.download(ticker, start=start_date, end=end_date)
        sector_data[sector] = data['Adj Close']
    return pd.DataFrame(sector_data)

# Define sectors and corresponding ETFs
sectors = {
    'Technology': 'XLK',
    'Healthcare': 'XLV',
    'Finance': 'XLF',
    'Energy': 'XLE',
    'Consumer Discretionary': 'XLY'
}

# Set date range
end_date = datetime.now()
start_date = end_date - timedelta(days=5*365)  # 5 years of data

# Fetch data
df = fetch_sector_data(sectors, start_date, end_date)

# Calculate daily returns
returns = df.pct_change().dropna()

# Calculate cumulative returns
cumulative_returns = (1 + returns).cumprod()

# Save processed data
returns.to_csv('sector_returns.csv')
cumulative_returns.to_csv('sector_cumulative_returns.csv')
print("Processed data saved to sector_returns.csv and sector_cumulative_returns.csv")

[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed


Processed data saved to sector_returns.csv and sector_cumulative_returns.csv


In [3]:
def calculate_volatility(returns):
    return returns.std() * np.sqrt(252)  # Annualized volatility

def calculate_beta(sector_returns, market_returns):
    # Align dates
    aligned_returns = pd.concat([sector_returns, market_returns], axis=1).dropna()
    sector_returns = aligned_returns.iloc[:, 0]
    market_returns = aligned_returns.iloc[:, 1]

    covariance = np.cov(sector_returns, market_returns)[0][1]
    market_variance = np.var(market_returns)
    return covariance / market_variance

def calculate_var(returns, confidence_level=0.95):
    return np.percentile(returns, 100 * (1 - confidence_level))

# Load returns data
returns = pd.read_csv('sector_returns.csv', index_col=0, parse_dates=True)

# Calculate volatility
volatility = returns.apply(calculate_volatility)

# Calculate beta (using S&P 500 as market benchmark)
sp500 = yf.download('^GSPC', start=returns.index[0], end=returns.index[-1])['Adj Close']
sp500_returns = sp500.pct_change().dropna()

# Ensure sp500_returns has the same index as returns
sp500_returns = sp500_returns.reindex(returns.index).dropna()

# Recalculate returns to match sp500_returns dates
returns = returns.loc[sp500_returns.index]

beta = returns.apply(lambda x: calculate_beta(x, sp500_returns))

# Calculate Value at Risk (VaR)
var = returns.apply(calculate_var)

# Combine risk metrics
risk_metrics = pd.DataFrame({
    'Volatility': volatility,
    'Beta': beta,
    'VaR (95%)': var
})

# Save risk metrics
risk_metrics.to_csv('risk_metrics.csv')
print("Risk metrics saved to risk_metrics.csv")

[*********************100%%**********************]  1 of 1 completed

Risk metrics saved to risk_metrics.csv





In [4]:
# Load data
returns = pd.read_csv('sector_returns.csv', index_col=0, parse_dates=True)
cumulative_returns = pd.read_csv('sector_cumulative_returns.csv', index_col=0, parse_dates=True)
risk_metrics = pd.read_csv('risk_metrics.csv', index_col=0)

# Set style
plt.style.use('seaborn')

# 1. Cumulative returns plot
plt.figure(figsize=(12, 6))
cumulative_returns.plot()
plt.title('Cumulative Returns by Sector')
plt.ylabel('Cumulative Return')
plt.legend(loc='upper left')
plt.savefig('cumulative_returns.png')
plt.close()

# 2. Risk-return scatter plot
plt.figure(figsize=(10, 6))
annualized_returns = returns.mean() * 252
plt.scatter(risk_metrics['Volatility'], annualized_returns, s=50)
for i, sector in enumerate(risk_metrics.index):
    plt.annotate(sector, (risk_metrics['Volatility'][i], annualized_returns[i]))
plt.xlabel('Annualized Volatility')
plt.ylabel('Annualized Return')
plt.title('Risk-Return Profile by Sector')
plt.savefig('risk_return_scatter.png')
plt.close()

# 3. Correlation heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(returns.corr(), annot=True, cmap='coolwarm', vmin=-1, vmax=1, center=0)
plt.title('Sector Return Correlations')
plt.savefig('correlation_heatmap.png')
plt.close()

print("Visualizations saved as PNG files.")

  plt.style.use('seaborn')
  plt.annotate(sector, (risk_metrics['Volatility'][i], annualized_returns[i]))


Visualizations saved as PNG files.


<Figure size 1200x600 with 0 Axes>

In [5]:
def monte_carlo_simulation(returns, num_simulations=10000, time_horizon=252):
    mean_returns = returns.mean()
    cov_matrix = returns.cov()

    simulations = np.zeros((num_simulations, len(returns.columns)))

    for i in range(num_simulations):
        Z = norm.ppf(np.random.rand(time_horizon, len(returns.columns)))
        L = np.linalg.cholesky(cov_matrix)
        daily_returns = mean_returns.values + np.dot(Z, L.T)
        simulations[i] = np.cumprod(1 + daily_returns, axis=0)[-1] - 1

    return pd.DataFrame(simulations, columns=returns.columns)

# Load returns data
returns = pd.read_csv('sector_returns.csv', index_col=0, parse_dates=True)

# Run Monte Carlo simulation
simulated_returns = monte_carlo_simulation(returns)

# Plot distribution of simulated returns
plt.figure(figsize=(12, 8))
simulated_returns.plot(kind='kde')
plt.title('Distribution of Simulated Annual Returns by Sector')
plt.xlabel('Annual Return')
plt.ylabel('Density')
plt.legend(title='Sectors')
plt.savefig('monte_carlo_distribution.png')
plt.close()

# Calculate and print risk metrics from simulation
risk_metrics = pd.DataFrame({
    'Expected Return': simulated_returns.mean(),
    'Worst Case (1%)': simulated_returns.quantile(0.01),
    'Best Case (99%)': simulated_returns.quantile(0.99)
})
risk_metrics.to_csv('monte_carlo_risk_metrics.csv')
print("Monte Carlo simulation results saved to monte_carlo_risk_metrics.csv")

Monte Carlo simulation results saved to monte_carlo_risk_metrics.csv


<Figure size 1200x800 with 0 Axes>

In [6]:
# Load returns data
returns = pd.read_csv('sector_returns.csv', index_col=0, parse_dates=True)

def create_features(df, window=30):
    df = df.copy()
    for col in df.columns:
        df[f'{col}_MA{window}'] = df[col].rolling(window=window).mean()
        df[f'{col}_Volatility{window}'] = df[col].rolling(window=window).std()
    df['SPY'] = df.mean(axis=1)  # Simple proxy for market return
    df['SPY_MA30'] = df['SPY'].rolling(window=30).mean()
    df['SPY_Volatility30'] = df['SPY'].rolling(window=30).std()
    return df.dropna()

def train_predict_sector(df, sector):
    X = df.drop(columns=df.filter(regex='^(?!.*_)').columns)
    y = df[sector]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    model = RandomForestRegressor(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)

    y_pred = model.predict(X_test)  # Changed from model.test to model.predict
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)

    feature_importance = pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)

    return model, mse, r2, feature_importance

# Prepare data
data = create_features(returns)

# Train models and get results for each sector
results = {}
for sector in returns.columns:
    model, mse, r2, importance = train_predict_sector(data, sector)
    results[sector] = {'MSE': mse, 'R2': r2, 'Feature Importance': importance}

# Save results
with pd.ExcelWriter('ml_prediction_results.xlsx') as writer:
    for sector, res in results.items():
        pd.DataFrame({'MSE': [res['MSE']], 'R2': [res['R2']]}).to_excel(writer, sheet_name=sector)
        res['Feature Importance'].to_excel(writer, sheet_name=f"{sector}_importance")

print("Machine learning prediction results saved to ml_prediction_results.xlsx")

# Plot feature importance for each sector
for sector, res in results.items():
    plt.figure(figsize=(10, 6))
    res['Feature Importance'][:10].plot(kind='bar')
    plt.title(f'Top 10 Feature Importance for {sector}')
    plt.tight_layout()
    plt.savefig(f'{sector}_feature_importance.png')
    plt.close()

print("Feature importance plots saved as PNG files.")



Machine learning prediction results saved to ml_prediction_results.xlsx
Feature importance plots saved as PNG files.


###Visualization

In [7]:
# Load data
risk_metrics = pd.read_csv('risk_metrics.csv', index_col=0)
returns = pd.read_csv('sector_returns.csv', index_col=0, parse_dates=True)

# Calculate cumulative returns
cumulative_returns = (1 + returns).cumprod()

# Create risk-return scatter plot
def plot_risk_return():
    fig = px.scatter(risk_metrics, x='Volatility', y='Beta',
                     text=risk_metrics.index, title="Risk-Return Profile")
    fig.update_traces(textposition='top center')
    fig.show()

# Create cumulative returns line plot
def plot_cumulative_returns():
    fig = px.line(cumulative_returns, x=cumulative_returns.index, y=cumulative_returns.columns,
                  title="Cumulative Returns by Sector")
    fig.show()

# Create interactive sector returns plot
def plot_sector_returns(sector):
    fig = px.line(returns, x=returns.index, y=sector,
                  title=f"Returns for {sector}")
    fig.show()

# Create correlation heatmap
def plot_correlation_heatmap():
    corr = returns.corr()
    fig = px.imshow(corr, text_auto=True, aspect="equal",
                    title="Sector Return Correlations")
    fig.show()

# Create interactive widget for sector selection
sector_dropdown = widgets.Dropdown(
    options=list(returns.columns),
    value=returns.columns[0],
    description='Select Sector:',
    style={'description_width': 'initial'}
)

# Display all plots
print("# Sector Analysis Dashboard")

print("## Risk-Return Profile")
plot_risk_return()

print("## Cumulative Returns")
plot_cumulative_returns()

print("## Sector Correlations")
plot_correlation_heatmap()

print("## Individual Sector Returns")
display(sector_dropdown)
widgets.interactive(plot_sector_returns, sector=sector_dropdown)

# Additional analysis: Sharpe Ratio calculation
risk_free_rate = 0.02  # Assume 2% risk-free rate
sharpe_ratios = (returns.mean() - risk_free_rate) / returns.std() * np.sqrt(252)

print("## Sharpe Ratios")
fig = px.bar(sharpe_ratios, title="Sharpe Ratio by Sector")
fig.show()

print("### Interpretation:")
print("The Sharpe ratio measures risk-adjusted performance. A higher Sharpe ratio indicates better risk-adjusted returns.")
print(f"Best performing sector: {sharpe_ratios.idxmax()} with Sharpe ratio of {sharpe_ratios.max():.2f}")
print(f"Worst performing sector: {sharpe_ratios.idxmin()} with Sharpe ratio of {sharpe_ratios.min():.2f}")

# Monte Carlo Simulation
def monte_carlo_simulation(returns, num_simulations=1000, time_horizon=252):
    mean_returns = returns.mean()
    cov_matrix = returns.cov()

    simulations = np.zeros((num_simulations, len(returns.columns)))

    for i in range(num_simulations):
        Z = np.random.normal(0, 1, (time_horizon, len(returns.columns)))
        L = np.linalg.cholesky(cov_matrix)
        daily_returns = mean_returns.values + np.dot(Z, L.T)
        simulations[i] = np.cumprod(1 + daily_returns, axis=0)[-1] - 1

    return pd.DataFrame(simulations, columns=returns.columns)

print("## Monte Carlo Simulation")
simulated_returns = monte_carlo_simulation(returns)

fig = go.Figure()
for col in simulated_returns.columns:
    fig.add_trace(go.Box(y=simulated_returns[col], name=col))
fig.update_layout(title="Distribution of Simulated Annual Returns by Sector",
                  yaxis_title="Annual Return")
fig.show()

print("### Interpretation:")
print("The box plots show the distribution of potential annual returns for each sector based on historical data.")
print("The box represents the interquartile range, the line in the box is the median, and the whiskers extend to show the rest of the distribution.")
print("Wider boxes and longer whiskers indicate higher volatility and potential for extreme returns (both positive and negative).")

# Sector Analysis Dashboard
## Risk-Return Profile


## Cumulative Returns



The behavior of DatetimeProperties.to_pydatetime is deprecated, in a future version this will return a Series containing python datetime objects instead of an ndarray. To retain the old behavior, call `np.array` on the result



## Sector Correlations


## Individual Sector Returns


Dropdown(description='Select Sector:', options=('Technology', 'Healthcare', 'Finance', 'Energy', 'Consumer Dis…

## Sharpe Ratios


### Interpretation:
The Sharpe ratio measures risk-adjusted performance. A higher Sharpe ratio indicates better risk-adjusted returns.
Best performing sector: Energy with Sharpe ratio of -13.26
Worst performing sector: Healthcare with Sharpe ratio of -26.67
## Monte Carlo Simulation


### Interpretation:
The box plots show the distribution of potential annual returns for each sector based on historical data.
The box represents the interquartile range, the line in the box is the median, and the whiskers extend to show the rest of the distribution.
Wider boxes and longer whiskers indicate higher volatility and potential for extreme returns (both positive and negative).
