<a href="https://colab.research.google.com/github/leomercanti/Course_Advanced_Investing_with_AI/blob/main/Module_1_Foundations_of_AI_in_Finance.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Course: Advanced Investing with AI**

## Module 1: Foundations of AI in Finance

### 1.1 Recommended Readings and Resources

- **Textbook:** “Advances in Financial Machine Learning” by Marcos López de Prado
Chapters 1-3: These chapters give you a foundation in applying machine learning to financial data. You’ll explore the data structures used in finance, how to label financial data for AI training, and the concept of overfitting in financial ML models.

- **Research Papers:**
  - “Financial Markets Regime Detection with AI”: Provides insights into classifying different market conditions using ML.
  - “The Role of Machine Learning in Asset Pricing Models”: Useful for understanding how AI enhances classical financial models.

- **Optional:** “Machine Learning for Asset Managers” by Marcos López de Prado: A guide to practical applications of machine learning models in portfolio management.

### 1.2 Key Topics Overview

**Supervised Learning in Finance:**
- ***Regression Models:*** Learn how linear and non-linear models (e.g., Random Forest, Gradient Boosting) are used to predict asset prices.
- ***Classification Models:*** Classify market regimes (e.g., bullish, bearish) using logistic regression, decision trees, or ensemble models.
- ***Hands-On Example:*** Predicting Stock Price Movements
Use a random forest to predict next-day stock price direction from historical features (here, the 50-day and 200-day moving averages).

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
import yfinance as yf

In [None]:
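# Fetch historical stock data (Apple example)
data = yf.download('AAPL', start='2019-09-01', end='2024-09-01')
# Recent yfinance versions return two-level (field, ticker) columns; keep only the field level
data.columns = data.columns.get_level_values(0)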

In [None]:
# Feature engineering: Adding moving averages
data['SMA_50'] = data['Close'].rolling(window=50).mean()
data['SMA_200'] = data['Close'].rolling(window=200).mean()

In [None]:
# Target variable: 1 if price increases the next day, 0 otherwise
data['Target'] = (data['Close'].shift(-1) > data['Close']).astype(int)
data = data.iloc[:-1].copy()  # the last row has no next-day close, so its label is undefined

In [None]:
# Drop NaNs and prepare features/target
data.dropna(inplace=True)
X = data[['SMA_50', 'SMA_200']]
y = data['Target']

In [None]:
# Split data chronologically (shuffle=False) so the test set follows the training set in time
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

In [None]:
# Train random forest model
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

In [None]:
# Model performance
accuracy = clf.score(X_test, y_test)
print(f"Model Accuracy: {accuracy:.2f}")
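
An accuracy figure for a direction classifier is easier to judge when the test period comes strictly after the training period and when it is compared against a naive baseline. The sketch below illustrates both ideas under stated assumptions: it uses a synthetic random-walk price series instead of downloaded data, and the helper name `chronological_split` is hypothetical, not part of scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def chronological_split(X, y, test_size=0.2):
    """Hypothetical helper: the last `test_size` share of rows becomes the test set."""
    n_test = int(round(len(X) * test_size))
    cut = len(X) - n_test
    return X.iloc[:cut], X.iloc[cut:], y.iloc[:cut], y.iloc[cut:]

# Synthetic random-walk prices (assumption: stands in for data['Close'])
rng = np.random.default_rng(42)
dates = pd.bdate_range("2019-09-02", periods=1000)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 1000))), index=dates)

demo = pd.DataFrame({"Close": close})
demo["SMA_50"] = demo["Close"].rolling(50).mean()
demo["SMA_200"] = demo["Close"].rolling(200).mean()
demo["Target"] = (demo["Close"].shift(-1) > demo["Close"]).astype(int)
demo = demo.iloc[:-1].dropna()  # drop the unlabeled last row and the SMA warm-up rows

X_demo = demo[["SMA_50", "SMA_200"]]
y_demo = demo["Target"]
X_tr, X_te, y_tr, y_te = chronological_split(X_demo, y_demo, test_size=0.2)

model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)
model_acc = model.score(X_te, y_te)

# Baseline: always predict the class that was most frequent in the training period
majority = y_tr.mode().iloc[0]
baseline_acc = (y_te == majority).mean()
print(f"Model accuracy: {model_acc:.2f} | Majority-class baseline: {baseline_acc:.2f}")
```

On a random walk there is nothing to learn, so the two numbers should be close; a model that cannot beat the baseline on real data deserves the same skepticism.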

**Unsupervised Learning for Financial Clustering:**

- ***K-Means Clustering:*** Group similar stocks based on technical or fundamental metrics (e.g., volatility, returns).
- ***Dimensionality Reduction (PCA):*** Reduce complex datasets into lower dimensions to detect patterns in market dynamics.
- ***Hands-On Example:*** Clustering Stocks Based on Returns and Volatility

In [None]:
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
import numpy as np

In [None]:
# Stock symbols
tickers = ['AAPL', 'GOOGL', 'AMZN', 'MSFT', 'TSLA']

In [None]:
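# Download adjusted close prices and convert them to daily returns
# (auto_adjust=False keeps the 'Adj Close' column in recent yfinance versions)
data2 = yf.download(tickers, start='2021-01-01', end='2024-09-01', auto_adjust=False)['Adj Close'].pct_change().dropna()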

In [None]:
# Confirm that the columns are the ticker symbols (data2 holds daily returns, not prices)
print(data2.columns)

In [None]:
# Calculate volatility (std) and returns (mean) for each stock
volatility = data2.std() * np.sqrt(252)  # Annualized volatility
returns = data2.mean() * 252  # Annualized returns

In [None]:
# Stack data for clustering
X = np.column_stack([returns, volatility])

In [None]:
# KMeans clustering (n_init set explicitly so results do not depend on the scikit-learn version default)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
clusters = kmeans.fit_predict(X)

In [None]:
# Plot the clusters
plt.scatter(returns, volatility, c=clusters)
plt.xlabel('Annualized Returns')
plt.ylabel('Annualized Volatility')
plt.title('Stock Clustering based on Returns and Volatility')
plt.show()
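
The overview also lists PCA for dimensionality reduction, which the clustering example does not exercise. The following is a minimal sketch of PCA on a matrix of daily returns. It assumes synthetic data in which five hypothetical stocks share one common "market" factor, so the first principal component should capture most of the variance.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic daily returns (assumption): one shared factor plus stock-specific noise
rng = np.random.default_rng(0)
n_days, n_stocks = 500, 5
market = rng.normal(0, 0.01, size=(n_days, 1))          # common factor, one column
noise = rng.normal(0, 0.005, size=(n_days, n_stocks))   # idiosyncratic part
demo_returns = market + noise                            # broadcasts to (n_days, n_stocks)

# Project the 5-dimensional return vectors onto 2 principal components
pca = PCA(n_components=2)
scores = pca.fit_transform(demo_returns)

print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Loadings of first component:", pca.components_[0])
```

With real returns, rows would be dates and columns tickers, as in `data2`; the loadings of the first component are often read as a market-wide factor, though that interpretation should be checked rather than assumed.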


### 1.3 Financial Data Acquisition and Preprocessing

**Data Sources:**

- ***Yahoo Finance*** (for stock prices, technical indicators)
- ***Quandl*** (now Nasdaq Data Link; for macroeconomic indicators, alternative data)
- ***Alpha Vantage*** (for real-time data and technical analysis)

**Key Financial Metrics:**

- ***Returns:*** Daily/annualized returns, moving averages.
- ***Volatility:*** Calculated using standard deviation over time windows (e.g., 30, 60 days).
- ***Technical Indicators:*** Moving Averages (SMA, EMA), Bollinger Bands, RSI, MACD.
- ***Hands-On Example:*** Using Yahoo Finance to Fetch and Clean Financial Data

In [None]:
import yfinance as yf
import pandas as pd

In [None]:
# Download historical data for multiple stocks
# (auto_adjust=False keeps the 'Adj Close' column in recent yfinance versions)
tickers = ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'TSLA']
data3 = yf.download(tickers, start='2021-01-01', end='2024-09-01', auto_adjust=False)['Adj Close']

In [None]:
# Calculate daily returns
returns = data3.pct_change().dropna()
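
The volatility metric listed above is a standard deviation over a time window. As a rough sketch, rolling annualized volatility for 30-day and 60-day windows can be computed as follows; the return series is synthetic (an assumption), and the factor of 252 trading days per year matches the annualization used in Section 1.2.

```python
import numpy as np
import pandas as pd

# Synthetic daily returns (assumption: stands in for one column of `returns`)
rng = np.random.default_rng(1)
dates = pd.bdate_range("2021-01-04", periods=300)
demo_returns = pd.Series(rng.normal(0.0005, 0.02, 300), index=dates, name="ret")

# Rolling standard deviation, annualized with sqrt(252)
vol = pd.DataFrame({
    f"vol_{w}d": demo_returns.rolling(window=w).std() * np.sqrt(252)
    for w in (30, 60)
})
print(vol.dropna().tail())
```

The first `w - 1` rows of each column are NaN because the window is not yet full.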

In [None]:
# Simple moving average example (computed on AAPL prices in data3, not on the returns in data2)
data3['SMA_50'] = data3['AAPL'].rolling(window=50).mean()

In [None]:
# Technical indicator (Relative Strength Index - RSI)
# Gains and losses are averaged with a simple rolling mean (a common RSI variant)
window = 14
delta = data3['AAPL'].diff(1)
gain = (delta.where(delta > 0, 0)).rolling(window=window).mean()
loss = (-delta.where(delta < 0, 0)).rolling(window=window).mean()
rs = gain / loss
data3['RSI'] = 100 - (100 / (1 + rs))  # stored in data3 so it appears in the inspection below
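
The RSI cell above averages gains and losses with a simple rolling mean. Wilder's original formulation smooths them recursively instead, which can be approximated in pandas with an exponentially weighted mean using `alpha = 1 / window`. The sketch below is one way to write that; the function name `rsi_wilder` is hypothetical, the price series is synthetic, and the result will not match charting tools that seed the average differently.

```python
import numpy as np
import pandas as pd

def rsi_wilder(prices, window=14):
    """Hypothetical helper: RSI with Wilder-style smoothing (alpha = 1 / window)."""
    delta = prices.diff()
    gain = delta.clip(lower=0)       # positive moves, zeros elsewhere
    loss = (-delta).clip(lower=0)    # magnitude of negative moves, zeros elsewhere
    avg_gain = gain.ewm(alpha=1 / window, min_periods=window, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1 / window, min_periods=window, adjust=False).mean()
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)

# Synthetic random-walk prices (assumption: stands in for data3['AAPL'])
rng = np.random.default_rng(3)
dates = pd.bdate_range("2021-01-04", periods=250)
demo_price = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.015, 250))), index=dates)

rsi_demo = rsi_wilder(demo_price)
print(rsi_demo.dropna().tail())
```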

In [None]:
# Inspect data
print(data3.tail())
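
Bollinger Bands and MACD appear in the indicator list but not in the example. A minimal sketch of both follows, using the conventional default parameters (20-day window with 2 standard deviations; 12/26/9-day EMAs). These parameters and the synthetic price series are assumptions for illustration, not values prescribed by the course.

```python
import numpy as np
import pandas as pd

# Synthetic random-walk prices (assumption: stands in for data3['AAPL'])
rng = np.random.default_rng(7)
dates = pd.bdate_range("2021-01-04", periods=300)
price = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.015, 300))), index=dates)

ind = pd.DataFrame({"Close": price})

# Bollinger Bands: 20-day SMA plus/minus 2 rolling standard deviations
# (pandas uses the sample standard deviation, ddof=1, by default)
sma_20 = price.rolling(window=20).mean()
std_20 = price.rolling(window=20).std()
ind["BB_upper"] = sma_20 + 2 * std_20
ind["BB_lower"] = sma_20 - 2 * std_20

# MACD: 12-day EMA minus 26-day EMA, with a 9-day EMA of that difference as the signal line
ema_12 = price.ewm(span=12, adjust=False).mean()
ema_26 = price.ewm(span=26, adjust=False).mean()
ind["MACD"] = ema_12 - ema_26
ind["MACD_signal"] = ind["MACD"].ewm(span=9, adjust=False).mean()
ind["MACD_hist"] = ind["MACD"] - ind["MACD_signal"]

print(ind.dropna().tail())
```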

### 1.4 Building Predictive Models to Detect Market Regimes

**Market Regime Detection:**
- ***Classify the market*** into bullish, bearish, or sideways trends using classification models like logistic regression or decision trees.
- ***Train models*** using technical indicators (e.g., moving averages, volatility) as features.
- ***Hands-On Example:*** Market Regime Classification Using Decision Trees

In [None]:
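from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split  # also imported in Section 1.2; repeated so this section runs on its own
import numpy as np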

In [None]:
# Check which columns are available (including SMA_50 and SMA_200)
# We reuse the first dataset, loaded in Section 1.2
print(data.columns)

In [None]:
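# Define bullish (1) and bearish (0) regimes using moving averages
# Note: this label is a direct function of the two features used below, so the classifier
# only relearns the crossover rule and near-perfect accuracy is expected
data['Regime'] = np.where(data['SMA_50'] > data['SMA_200'], 1, 0)  # Bullish if SMA_50 > SMA_200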

In [None]:
# Features and target (drop NaNs jointly so that rows of X and y stay aligned)
subset = data[['SMA_50', 'SMA_200', 'Regime']].dropna()
X = subset[['SMA_50', 'SMA_200']]
y = subset['Regime']

In [None]:
# Split data into training and test sets chronologically (shuffle=False avoids training on future rows)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, shuffle=False)

In [None]:
# Train Decision Tree Classifier
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

In [None]:
# Model accuracy
accuracy = clf.score(X_test, y_test)
print(f"Model Accuracy: {accuracy:.2f}")
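
The example above labels only two regimes, while the topic list also mentions sideways markets. One possible way to add a third state is to treat small gaps between the two moving averages as "sideways". The sketch below does this with a tolerance band; the function name `label_regime` and the 2% band are arbitrary assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

def label_regime(sma_fast, sma_slow, band=0.02):
    """Hypothetical helper: 1 = bullish, -1 = bearish, 0 = sideways, NaN where inputs are missing."""
    spread = sma_fast / sma_slow - 1   # relative gap between the two averages
    labels = np.select([spread > band, spread < -band], [1, -1], default=0)
    # np.select maps NaN spreads to the default, so mask them back to NaN
    return pd.Series(labels, index=sma_fast.index, name="Regime").where(spread.notna())

# Tiny hand-made inputs: gaps of +5%, +1%, -5%, and one missing value
fast = pd.Series([105.0, 101.0, 95.0, np.nan])
slow = pd.Series([100.0, 100.0, 100.0, 100.0])
regimes = label_regime(fast, slow)
print(regimes.tolist())
```

With the notebook's data this could be called as `label_regime(data['SMA_50'], data['SMA_200'])`; a wider band yields more sideways days.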

### 1.5 End of Module Assignments and Practice (Optional)

- ***Assignment 1:*** Download stock data from multiple companies and calculate various technical indicators (RSI, MACD, Moving Averages). Perform feature engineering to create meaningful input data for machine learning models.

- ***Assignment 2:*** Implement a market regime classifier using decision trees or random forests, and evaluate model performance. Experiment with different feature sets (e.g., volatility, returns) and hyperparameters.

- ***Assignment 3:*** Apply K-Means clustering to group stocks into clusters based on returns, volatility, and other financial metrics. Analyze the results and adjust the number of clusters.

By the end of Module 1, you should be familiar with acquiring financial data, performing feature engineering, and building simple machine learning models to detect trends or regimes in the stock market. This sets the stage for more advanced AI techniques (e.g., deep learning, reinforcement learning) in the following weeks.