# LSTM Networks in Time Series Analysis
### LSTM-Based Portfolio Optimization Using Market and Economic Data

#### Introduction

In this project, we aim to develop a machine learning model that uses Long Short-Term Memory (LSTM) networks to optimize a portfolio of selected stocks. The model is trained on a combination of daily market data and key economic indicators. By exploiting temporal patterns in the data, the goal is to predict future stock returns and use those predictions to build a portfolio that maximizes return or minimizes risk.

#### Project Goals
1. **Data Collection**: Gather market data (e.g., daily stock prices, volume, and volatility measures such as the VIX) and key economic indicators (e.g., interest rates, PMI).
2. **Feature Engineering**: Create features that capture underlying trends and relationships, such as technical indicators and economic regimes.
3. **Model Training**: Train an LSTM network to predict future returns based on the engineered features.
4. **Backtesting and Evaluation**: Evaluate the model’s performance through backtesting on historical data.
5. **Portfolio Optimization**: Use the model's predictions to optimize a portfolio, aiming to maximize return for a given level of risk.

#### Purpose

The purpose of this project is to explore the effectiveness of LSTM networks in portfolio management, specifically focusing on the integration of both market and economic data to enhance predictive accuracy and portfolio performance. This approach mirrors real-world strategies employed by top portfolio managers, who often combine macroeconomic insights with real-time market data to inform their investment decisions.


### 1. Define the Objective
You want to predict stock index prices (e.g., S&P 500, NASDAQ) using:

- Macroeconomic Indicators: CPI, interest rates, unemployment rates, etc.
- Seasonality: Capture recurring patterns during specific months or periods.
- Economic Cycles: Include features that represent economic expansions or contractions.

### 2. Collect and Explore Data
**Data Sources**:
- Stock Index Data: You can get stock index price data (e.g., S&P 500) from Yahoo Finance using the `yfinance` library.
- Macroeconomic Data: Collect data like inflation, interest rates, unemployment rates from FRED (Federal Reserve Economic Data) or other sources such as the World Bank or OECD.
- Seasonality and Cycles: You can manually engineer seasonal features or download cyclical economic data (e.g., business cycles).

In [None]:
import pandas as pd
import yfinance as yf

# Download stock index data (e.g., S&P 500)
sp500_data = yf.download('^GSPC', start='2010-01-01', end='2023-01-01')

# Load macroeconomic data (e.g., CPI, interest rates) from a local CSV file;
# parse the 'Date' column as datetimes so it can be aligned with the price data
cpi_data = pd.read_csv('macroeconomic_data.csv', parse_dates=['Date'])  # Example macroeconomic data

### 3. Preprocess Data
- Handling Missing Values:
Fill or interpolate missing data for continuous time series to avoid gaps.

- Normalization:
Normalize the data to ensure that features with different ranges (e.g., interest rates vs. stock prices) don't negatively impact model performance.

In [None]:
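from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0, 1))
# `df` is the combined feature DataFrame. Fit on the training rows only so that
# test-period minima and maxima do not leak into training, then transform all rows.
train_rows = int(len(df) * 0.8)
scaler.fit(df.iloc[:train_rows])
scaled_data = scaler.transform(df)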
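
The missing-value step described above can be sketched on a small made-up frame. The column names and values are illustrative assumptions, not the project's data. Forward-filling suits macro series that are published less often than prices, while time-based interpolation suits short gaps in a price series.

```python
import numpy as np
import pandas as pd

# Illustrative daily frame with gaps; column names and values are assumptions.
idx = pd.date_range("2022-01-03", periods=6, freq="B")
frame = pd.DataFrame(
    {
        "Close": [100.0, np.nan, 102.0, 103.0, np.nan, 105.0],
        "CPI": [280.0, np.nan, np.nan, np.nan, 281.0, np.nan],
    },
    index=idx,
)

# Interpolate short gaps in prices linearly in calendar time.
frame["Close"] = frame["Close"].interpolate(method="time")
# Carry the last published macro value forward until the next release.
frame["CPI"] = frame["CPI"].ffill()

print(frame)
```

Interpolation uses the next observed value, so for a strictly causal pipeline forward-filling is the safer choice.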

- **Create Time Series Sequences**:
LSTM networks need sequential data. For each time step, create input sequences (e.g., past 60 days) and the target (next day’s price).

In [None]:
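import numpy as np

def create_sequences(data, time_steps, target_col=0):
    # Each sample is a window of `time_steps` rows; the target is the value of
    # `target_col` (assumed to be the price column) on the following row.
    X, y = [], []
    for i in range(len(data) - time_steps):
        X.append(data[i:(i + time_steps)])
        y.append(data[i + time_steps, target_col])
    return np.array(X), np.array(y)

# Create sequences with 60 time steps
time_steps = 60
X_train, y_train = create_sequences(scaled_data, time_steps)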
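
To make the windowing concrete, here is a small sketch on toy data showing the array shapes an LSTM expects. It assumes the target is column 0 of the array, which is an assumption of this example, and `make_windows` is a hypothetical helper name.

```python
import numpy as np

def make_windows(data, time_steps, target_col=0):
    """Return (X, y): windows of `time_steps` rows and the next row's target value."""
    X, y = [], []
    for i in range(len(data) - time_steps):
        X.append(data[i:i + time_steps])
        y.append(data[i + time_steps, target_col])
    return np.array(X), np.array(y)

# Toy array: 6 time steps, 2 features (column 0 plays the role of the price).
toy = np.arange(12, dtype=float).reshape(6, 2)
X, y = make_windows(toy, time_steps=3)

print(X.shape)  # (samples, time steps, features)
print(y)        # one target per window: the column-0 value of the following row
```

Six rows with a window of three give three samples, because the last window must still have a following row to serve as its target.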

### 4. Feature Engineering
- Macroeconomic Indicators:
Combine stock index prices with other data sources (CPI, unemployment rates, interest rates, etc.).

In [None]:
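# Combine stock data and macroeconomic indicators into one DataFrame.
# yfinance keeps dates in the index, so move them to a 'Date' column for the merge.
cpi_data['Date'] = pd.to_datetime(cpi_data['Date'])
data = pd.merge(sp500_data.reset_index(), cpi_data, on='Date').set_index('Date')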
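
Macroeconomic series are often published monthly, so an exact-date merge with daily prices keeps only the days that happen to match a macro date. One way to handle this, sketched below on made-up data, is `pd.merge_asof`, which attaches the most recent macro observation at or before each trading day. The column names and values are illustrative assumptions.

```python
import pandas as pd

# Illustrative daily prices and monthly CPI; all values are made up.
prices = pd.DataFrame({
    "Date": pd.to_datetime(["2022-01-31", "2022-02-01", "2022-02-15", "2022-03-01"]),
    "Close": [4515.0, 4546.0, 4471.0, 4306.0],
})
cpi = pd.DataFrame({
    "Date": pd.to_datetime(["2022-01-01", "2022-02-01", "2022-03-01"]),
    "CPI": [281.0, 283.0, 287.0],
})

# Both frames must be sorted by the key. direction="backward" picks the latest
# CPI row whose Date is <= the price row's Date, so no later value is used.
merged = pd.merge_asof(
    prices.sort_values("Date"),
    cpi.sort_values("Date"),
    on="Date",
    direction="backward",
).set_index("Date")

print(merged)
```

If the macro file is dated by reference period rather than by release date, shifting it by the publication lag before merging avoids using figures that were not yet public.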

- Seasonality Features:
Create features based on seasonality, like the month or day of the week, that could affect stock prices.

In [None]:
data['Month'] = data.index.month
data['DayOfWeek'] = data.index.dayofweek
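
Raw month numbers treat December (12) and January (1) as far apart even though they are adjacent in the calendar. A common alternative, sketched below, is to encode each calendar feature as a sine/cosine pair. The frame `demo` stands in for `data`, and dividing the weekday by 5 assumes a five-day trading week; both are assumptions of this example.

```python
import numpy as np
import pandas as pd

# Illustrative frame with a DatetimeIndex, standing in for `data`.
demo = pd.DataFrame(index=pd.date_range("2022-01-03", periods=5, freq="B"))
demo["Month"] = demo.index.month
demo["DayOfWeek"] = demo.index.dayofweek  # Monday=0 ... Friday=4

# Map each calendar feature onto a circle so that period ends are adjacent.
demo["Month_sin"] = np.sin(2 * np.pi * (demo["Month"] - 1) / 12)
demo["Month_cos"] = np.cos(2 * np.pi * (demo["Month"] - 1) / 12)
demo["Dow_sin"] = np.sin(2 * np.pi * demo["DayOfWeek"] / 5)
demo["Dow_cos"] = np.cos(2 * np.pi * demo["DayOfWeek"] / 5)

print(demo)
```

Whether the encoded features help is an empirical question; they can be compared against the raw integer columns on the validation set.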

- Economic Cycle Features:
You can include economic cycle phases, like whether the economy is in expansion or recession, as a categorical feature.

In [None]:
# Label each row as expansion (1) or recession (0).
# 'SomeEconomicData' is a placeholder for a column that is truthy during expansions.
data['EconomicCycle'] = [1 if condition else 0 for condition in data['SomeEconomicData']]
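
One way to produce such a label, sketched below, is to mark date windows explicitly. The recession window used here is an illustrative assumption and should be replaced with an official business-cycle dating; `demo` stands in for `data`.

```python
import pandas as pd

# Illustrative recession windows (start, end). These are assumptions for the
# sketch; replace them with an official business-cycle dating.
recessions = [("2020-03-01", "2020-04-30")]

demo = pd.DataFrame(index=pd.date_range("2020-01-01", "2020-06-30", freq="MS"))

# Start from "expansion" (1) and overwrite rows that fall inside a window.
demo["EconomicCycle"] = 1
for start, end in recessions:
    in_window = (demo.index >= pd.Timestamp(start)) & (demo.index <= pd.Timestamp(end))
    demo.loc[in_window, "EconomicCycle"] = 0

print(demo)
```

Official recession dates are announced with a delay, so a label built this way would not have been available in real time and can flatter backtest results.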

### 5. Train-Test Split
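Split your dataset chronologically into training and testing sets, without shuffling, so that the model is always evaluated on data that is later than anything it was trained on. The training data will be used to train the LSTM, while the testing data will evaluate the model's performance. In practice, this split should be made before the scaler is fitted and the sequences are built.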

In [None]:
# Split the dataset into training (80%) and testing (20%)
train_size = int(len(data) * 0.8)
train_data, test_data = data[:train_size], data[train_size:]
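
The evaluation step later in this walkthrough needs test sequences as well as training sequences. The sketch below shows one way to get both from a chronological split, using random numbers as a stand-in for the scaled feature matrix. The helper name `windows`, the window length of 10, and the choice of column 0 as the target are assumptions of this example.

```python
import numpy as np

def windows(data, time_steps, target_col=0):
    """Windows of `time_steps` rows paired with the next row's target value."""
    X = np.array([data[i:i + time_steps] for i in range(len(data) - time_steps)])
    y = data[time_steps:, target_col]
    return X, y

# Stand-in for the scaled feature matrix: 100 time steps, 3 features.
rng = np.random.default_rng(0)
scaled = rng.random((100, 3))

time_steps = 10
train_size = int(len(scaled) * 0.8)

train_part = scaled[:train_size]
# Prepend the last `time_steps` training rows so the first test target
# has a full input window; no test row is ever used as a training target.
test_part = scaled[train_size - time_steps:]

X_train, y_train = windows(train_part, time_steps)
X_test, y_test = windows(test_part, time_steps)

print(X_train.shape, X_test.shape)  # (70, 10, 3) (20, 10, 3)
```

Every test target lies after the split point, while the earliest test windows reuse training rows only as inputs.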

### 6. Build the LSTM Model
Use TensorFlow/Keras to construct the LSTM model.

**Model Architecture**:
- LSTM Layers: These will capture temporal dependencies in the data.
- Dense Layer: The output layer will predict the next stock index price.

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

model = Sequential()
model.add(LSTM(units=50, return_sequences=True, input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(Dropout(0.2))
model.add(LSTM(units=50))
model.add(Dropout(0.2))
model.add(Dense(1))  # Predicting one value (e.g., next day's stock price)

model.compile(optimizer='adam', loss='mean_squared_error')

### 7. Train the Model
Train the model using the prepared training data.

In [None]:
history = model.fit(X_train, y_train, epochs=100, batch_size=32, validation_split=0.2)

### 8. Evaluate and Fine-Tune the Model
Once the model is trained, evaluate its performance on the test data.

**Performance Metrics**:

Use performance metrics like RMSE, MAE, and directional accuracy (the share of days on which the predicted direction of the price move matches the actual one) to evaluate the model. Also, plot predicted vs. actual prices to visualize the performance.

In [None]:
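predictions = model.predict(X_test)  # X_test: sequences built from the test split
# The scaler was fit on all feature columns, so invert the min-max scaling for the
# target column only (assumed here to be column 0) instead of calling inverse_transform
target_col = 0
predicted_prices = predictions[:, 0] * scaler.data_range_[target_col] + scaler.data_min_[target_col]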

In [None]:
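import matplotlib.pyplot as plt

# actual_prices: unscaled test-period prices; predicted_prices: unscaled model output
plt.plot(actual_prices, label='Actual Prices')
plt.plot(predicted_prices, label='Predicted Prices')
plt.xlabel('Test time step')
plt.ylabel('Index price')
plt.legend()
plt.show()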
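
The metrics named above can be computed with a few lines of NumPy. This is a minimal sketch: the function name `evaluate` and the demo numbers are made up for illustration. Directional accuracy is defined here as agreement between the sign of the predicted move and the sign of the realised move, both measured from the previous day's actual price, which is one of several reasonable definitions.

```python
import numpy as np

def evaluate(actual, predicted):
    """Return RMSE, MAE and directional accuracy for two 1-D price series."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = predicted - actual
    rmse = np.sqrt(np.mean(errors ** 2))
    mae = np.mean(np.abs(errors))
    # Direction: compare the sign of the predicted move from the previous
    # actual price with the sign of the realised move.
    actual_move = np.sign(actual[1:] - actual[:-1])
    predicted_move = np.sign(predicted[1:] - actual[:-1])
    directional_accuracy = np.mean(actual_move == predicted_move)
    return rmse, mae, directional_accuracy

# Made-up numbers for illustration only.
actual_demo = [100.0, 102.0, 101.0, 103.0]
predicted_demo = [99.0, 101.0, 102.0, 102.0]
rmse, mae, da = evaluate(actual_demo, predicted_demo)
print(f"RMSE={rmse:.3f}  MAE={mae:.3f}  directional accuracy={da:.3f}")
```

A useful sanity check is to run the same function on a naive forecast that predicts tomorrow's price as today's price; an LSTM that cannot beat that baseline on RMSE is not adding information.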

### Further Improvements
1. Incorporate More Macroeconomic Indicators: Add more data, like GDP growth rate, manufacturing index, etc.
2. Experiment with Feature Importance: Use techniques like SHAP to evaluate the importance of features.
3. Tune Hyperparameters: Use Grid Search or Random Search to optimize the LSTM architecture.
4. Use Advanced Optimizers: Experiment with different optimizers such as RMSprop or AdamW.

This approach yields an LSTM model for predicting stock index prices that takes macroeconomic factors, seasonality, and economic cycles into account, which may improve accuracy over a model trained on prices alone.