
# **Module 3 - Introduction to Machine Learning in Finance**

- **Objective:** Learn basic ML techniques and their application in finance.

- **Topics:**
  - **Supervised Learning:** Regression and classification techniques.
  - **Model Evaluation:** Error metrics such as MAE and RMSE, and validation with cross-validation.

- **Readings:**
  - “Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow” by Aurélien Géron.

### **3.1 Supervised Learning**

- **Objective:** Understand the basics of supervised learning and its application in finance.

<br>

#### **What is Supervised Learning?**

- **Definition:** Supervised learning involves training a model on a labeled dataset, where the outcome (target) is known. The model learns to map inputs to outputs based on this data.
- **Types of Problems:**
  - **Regression:** Predicting continuous outcomes (e.g., predicting stock prices).
  - **Classification:** Predicting categorical outcomes (e.g., classifying stock price movements as "up" or "down").
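
The two problem types differ mainly in how the target is built. As a minimal sketch, assuming a hypothetical five-day series of closing prices (the values are invented for illustration), the same data can yield either a continuous or a categorical target:

```python
import pandas as pd

# Hypothetical closing prices, invented for illustration only
close = pd.Series([100.0, 101.5, 101.0, 102.2, 103.0], name="Close")

# Regression target: the next day's closing price (a continuous value)
next_close = close.shift(-1)

# Classification target: 1 if the next close is higher than today's, else 0
went_up = (next_close > close).astype(int)

# The last day has no "next day", so it cannot be labeled and is dropped
targets = pd.DataFrame(
    {"Close": close, "NextClose": next_close, "Up": went_up}
).iloc[:-1]
print(targets)
```

The `NextClose` column would feed a regression model, while the `Up` column would feed a classifier.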

#### **Regression Techniques**

- **Linear Regression:** A fundamental technique for predicting continuous outcomes.

- **Hands-on Example:** Linear Regression in Finance

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

In [None]:
# Example: Predicting stock price based on features
# `data` is assumed to be a pandas DataFrame with Open, High, Low, Close, and Volume columns
X = data[['Open', 'High', 'Low', 'Volume']]  # Features
y = data['Close']  # Target variable

In [None]:
# Split data into training and testing sets
# shuffle=False keeps the rows in time order, so the test set comes after the training set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

In [None]:
# Create and train the model
model = LinearRegression()
model.fit(X_train, y_train)

In [None]:
# Make predictions
predictions = model.predict(X_test)

In [None]:
# Evaluate the model
mse = mean_squared_error(y_test, predictions)
print(f'Mean Squared Error: {mse}')

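- **Explanation:** This code trains a linear regression model to predict the closing price from the same day's open, high, low, and volume, then evaluates it with Mean Squared Error (MSE) on the held-out test set. Because each day's high and low bracket the close, the error will look small; forecasting a future price is a much harder task.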
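
The example above predicts a same-day close. As a self-contained sketch of a forecasting variant, the code below swaps in the next day's close as the target and uses a synthetic random walk in place of real market data. The `demo` DataFrame, its values, and the `NextClose` column are assumptions made for illustration, not part of the course dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic random-walk prices stand in for real market data (an assumption)
rng = np.random.default_rng(0)
n = 300
close = 100 + np.cumsum(rng.normal(0, 1, n))
demo = pd.DataFrame({
    "Open": close + rng.normal(0, 0.5, n),
    "High": close + np.abs(rng.normal(0, 1, n)),
    "Low": close - np.abs(rng.normal(0, 1, n)),
    "Volume": rng.integers(1_000, 5_000, n),
    "Close": close,
})

# Predict the NEXT day's close so the features do not contain the answer
demo["NextClose"] = demo["Close"].shift(-1)
demo = demo.dropna()  # the last row has no next-day value

X_demo = demo[["Open", "High", "Low", "Close", "Volume"]]
y_demo = demo["NextClose"]

# shuffle=False keeps the test set strictly after the training set in time
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, shuffle=False
)

lin = LinearRegression().fit(X_tr, y_tr)
demo_mse = mean_squared_error(y_te, lin.predict(X_te))
print(f"MSE: {demo_mse:.3f}, RMSE: {np.sqrt(demo_mse):.3f}")
```

Taking the square root of the MSE gives RMSE, which is in the same units as the price and is often easier to interpret.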

#### **Classification Techniques**

- **Logistic Regression:** Used for binary classification problems (e.g., predicting if the price will go up or down).

- **Hands-on Example:** Logistic Regression for Classification

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

In [None]:
# Example: Classifying stock price movement
data['Target'] = (data['Close'].shift(-1) > data['Close']).astype(int)  # 1 if price goes up, else 0
labeled = data.iloc[:-1]  # The last row has no next-day close, so its label is undefined
X = labeled[['Open', 'High', 'Low', 'Volume']]
y = labeled['Target']

In [None]:
# Split data into training and testing sets
# shuffle=False keeps the rows in time order, so the test set comes after the training set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

In [None]:
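# Create and train the model
# A higher max_iter gives the solver more room to converge on unscaled features such as Volume
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)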

In [None]:
# Make predictions
predictions = model.predict(X_test)

In [None]:
# Evaluate the model
accuracy = accuracy_score(y_test, predictions)
print(f'Accuracy: {accuracy}')

- **Explanation:** This code trains a logistic regression model to classify whether the next day's closing price will be higher than today's, based on the day's open, high, low, and volume. It evaluates the model with accuracy, the share of test days classified correctly.
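
An accuracy figure is easier to judge next to a naive baseline. The sketch below, which uses made-up labels and scikit-learn's `DummyClassifier`, shows the accuracy obtained by always predicting the most common training label. The arrays `y_train_demo` and `y_test_demo` are hypothetical.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Hypothetical labels: 1 = price went up, 0 = price went down (made-up data)
y_train_demo = np.array([1, 1, 0, 1, 1, 0, 1, 1])
y_test_demo = np.array([1, 0, 1, 1, 1])

# A "model" that ignores its features and always predicts the most common training label
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(np.zeros((len(y_train_demo), 1)), y_train_demo)
baseline_pred = baseline.predict(np.zeros((len(y_test_demo), 1)))

baseline_acc = accuracy_score(y_test_demo, baseline_pred)
print(f"Baseline accuracy: {baseline_acc:.2f}")  # Baseline accuracy: 0.80
```

A classifier is only adding value if its accuracy beats this kind of baseline on the same test set.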

### **3.2 Model Evaluation**

- **Objective:** Learn how to evaluate and validate machine learning models effectively.

<br>

#### **Evaluation Metrics**

- **Regression Metrics:**
  - **Mean Squared Error (MSE):** Measures the average squared difference between actual and predicted values.
  - **R-squared:** Represents the proportion of variance explained by the model.

- **Classification Metrics:**
  - **Accuracy:** The proportion of correctly classified instances.
  - **Precision, Recall, F1-Score:** Metrics that remain informative on imbalanced datasets, where accuracy alone can be misleading.
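
The regression metrics can be computed directly with scikit-learn. The sketch below also includes MAE and RMSE from the module topics. The `actual` and `predicted` arrays are hypothetical values chosen so the arithmetic is easy to check by hand.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical actual and predicted prices, invented for illustration
actual = np.array([100.0, 102.0, 101.0, 105.0])
predicted = np.array([101.0, 101.0, 103.0, 104.0])

mae = mean_absolute_error(actual, predicted)       # mean of |error|
mse_demo = mean_squared_error(actual, predicted)   # mean of error squared
rmse = np.sqrt(mse_demo)                           # back in price units
r2 = r2_score(actual, predicted)                   # 1 - SS_res / SS_tot

print(f"MAE={mae}, MSE={mse_demo}, RMSE={rmse:.4f}, R^2={r2}")
```

Here the errors are -1, 1, -2, and 1, so MAE is 5/4 = 1.25 and MSE is 7/4 = 1.75. The actual values have a total sum of squares of 14 around their mean of 102, so R-squared is 1 - 7/14 = 0.5.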

- **Hands-on Example:** Evaluation Metrics for Classification

In [None]:
from sklearn.metrics import classification_report, confusion_matrix

In [None]:
# Print classification report
print(classification_report(y_test, predictions))

In [None]:
# Print confusion matrix
print(confusion_matrix(y_test, predictions))

- **Explanation:** The classification report provides detailed metrics like precision, recall, and F1-score, while the confusion matrix shows the counts of true positives, true negatives, false positives, and false negatives.
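
To see how the report's numbers relate to the confusion matrix, the sketch below derives precision, recall, and F1 by hand from the four counts. The label arrays are hypothetical and exist only for illustration.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Hypothetical true and predicted labels (1 = up, 0 = down)
y_true_demo = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred_demo = np.array([1, 0, 0, 1, 1, 0, 1, 0])

# For binary labels, scikit-learn lays the matrix out as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

precision = tp / (tp + fp)  # of the predicted "up" days, how many were right
recall = tp / (tp + fn)     # of the actual "up" days, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"precision={precision}, recall={recall}, f1={f1}")
```

The hand-computed values match `precision_score`, `recall_score`, and `f1_score` from `sklearn.metrics` on the same arrays.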

#### **Cross-Validation**

- **Definition:** A technique to assess the model’s performance by dividing the dataset into multiple folds, then repeatedly training on all but one fold and testing on the fold that was held out.

- **Hands-on Example:** Cross-Validation

In [None]:
from sklearn.model_selection import cross_val_score

In [None]:
# Perform 5-fold cross-validation
# `model` is the logistic regression classifier from 3.1, so each score is an accuracy
cv_scores = cross_val_score(model, X, y, cv=5)
print(f'Cross-Validation Scores: {cv_scores}')
print(f'Mean Cross-Validation Score: {cv_scores.mean()}')

- **Explanation:** Cross-validation scores show the model’s performance across different subsets of the data, giving a more reliable estimate of how well it generalizes than a single train/test split.
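
With time-ordered data such as prices, standard k-fold splitting lets some folds train on observations that come after the test period. One common alternative is scikit-learn's `TimeSeriesSplit`, which only ever tests on data later than the training window. The sketch below uses ten hypothetical observations to show the resulting fold indices.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Ten hypothetical observations in chronological order
X_ts = np.arange(10).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=3)
folds = [
    (train_idx.tolist(), test_idx.tolist())
    for train_idx, test_idx in tscv.split(X_ts)
]

for train_idx, test_idx in folds:
    print(f"train={train_idx} test={test_idx}")
# train=[0, 1, 2, 3] test=[4, 5]
# train=[0, 1, 2, 3, 4, 5] test=[6, 7]
# train=[0, 1, 2, 3, 4, 5, 6, 7] test=[8, 9]
```

The splitter can be passed directly as the `cv` argument, for example `cross_val_score(model, X, y, cv=TimeSeriesSplit(n_splits=5))`.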

### **3.3 Advanced Techniques**

- **Objective:** Explore more advanced machine learning techniques used in finance.

#### **Ensemble Methods**

- **Random Forest:** An ensemble of decision trees that improves predictive performance by averaging the results of multiple trees.

- **Hands-on Example:** Random Forest Regressor

In [None]:
from sklearn.ensemble import RandomForestRegressor

In [None]:
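# Rebuild the regression split: y_train currently holds the 0/1 labels from the classification example
X, y = data[['Open', 'High', 'Low', 'Volume']], data['Close']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
# Create and train model
rf_model = RandomForestRegressor(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)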

In [None]:
# Make predictions
rf_predictions = rf_model.predict(X_test)

In [None]:
# Evaluate model
rf_mse = mean_squared_error(y_test, rf_predictions)
print(f'Random Forest Mean Squared Error: {rf_mse}')

- **Explanation:** The Random Forest Regressor aggregates multiple decision trees to enhance prediction accuracy and robustness.
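
The averaging step can be checked directly. The sketch below fits a small forest on synthetic data (invented for illustration), predicts with each individual tree through the fitted model's `estimators_` attribute, and compares the hand-computed mean with the forest's own prediction.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data, invented for illustration
rng = np.random.default_rng(42)
X_rf = rng.normal(size=(200, 4))
y_rf = X_rf[:, 0] * 2.0 + X_rf[:, 1] + rng.normal(scale=0.1, size=200)

forest = RandomForestRegressor(n_estimators=20, random_state=42).fit(X_rf, y_rf)

# Predict with every individual tree, then average the results by hand
sample = X_rf[:5]
per_tree = np.array([tree.predict(sample) for tree in forest.estimators_])
manual_average = per_tree.mean(axis=0)

matches = np.allclose(manual_average, forest.predict(sample))
print(f"Trees: {per_tree.shape[0]}, manual average matches forest: {matches}")
```

Each tree is trained on a different bootstrap sample of the data, so individual predictions vary, and averaging them reduces that variance.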

#### **Neural Networks**

- **Introduction:** Neural networks, particularly deep learning models, can capture complex patterns in large datasets.

- **Hands-on Example:** Basic Neural Network with Keras

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

In [None]:
# Define the neural network model
nn_model = Sequential([
    Dense(64, activation='relu', input_shape=(X_train.shape[1],)),
    Dense(32, activation='relu'),
    Dense(1)
])

In [None]:
# Compile and train the model
nn_model.compile(optimizer='adam', loss='mean_squared_error')
nn_model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.1)

In [None]:
# Make predictions
nn_predictions = nn_model.predict(X_test)

In [None]:
# Evaluate model
# predict() returns an array of shape (n, 1); ravel() flattens it to match y_test
nn_mse = mean_squared_error(y_test, nn_predictions.ravel())
print(f'Neural Network Mean Squared Error: {nn_mse}')

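- **Explanation:** This code defines a simple feed-forward network for regression with two hidden ReLU layers (64 and 32 units) and a single linear output, then trains it for 10 epochs with the Adam optimizer to minimize mean squared error. It demonstrates how deep learning models can be applied to financial predictions.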
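
To make the architecture concrete, the sketch below reproduces the forward pass of the same 64-32-1 layer stack in plain NumPy. The weights are random rather than trained, and the four input features are assumed to correspond to Open, High, Low, and Volume, so the outputs are meaningless as predictions. The aim is only to show what each `Dense` layer computes.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 4  # assumed: Open, High, Low, Volume

# Randomly initialized weights; a trained network would hold learned values instead
layer_sizes = [n_features, 64, 32, 1]
weights = [
    rng.normal(scale=0.1, size=(n_in, n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def forward(x):
    """Run a batch through the Dense(64) -> Dense(32) -> Dense(1) stack."""
    h = x
    for i, (w, b) in enumerate(zip(weights, biases)):
        h = h @ w + b  # each Dense layer is a matrix multiply plus a bias
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)  # ReLU on hidden layers only
    return h

batch = rng.normal(size=(5, n_features))
output = forward(batch)
n_params = sum(w.size + b.size for w, b in zip(weights, biases))
print(output.shape, n_params)  # (5, 1) 2433
```

The parameter count of 2,433 comes from (4 × 64 + 64) + (64 × 32 + 32) + (32 × 1 + 1).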

### **3.4 Further Reading and Resources**

- **Books:**
  - “Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow” by Aurélien Géron
  - “Advances in Financial Machine Learning” by Marcos López de Prado

- **Online Courses:**
  - Udacity’s “Deep Learning Nanodegree” for more in-depth neural network training.