

Data Loading and Preprocessing:

- The code starts by importing necessary libraries, including Pandas, NumPy, Matplotlib, and PyTorch.
- It reads a CSV file (aapl_raw_data.csv) into a Pandas DataFrame and performs initial data exploration (data.shape and data.tail(1)).
- It removes rows beyond the first 10,747 rows (data = data.iloc[:10747]) and handles missing values by filling them with zeros (data=data.fillna(0)).
- The data types of certain columns (open, high, low, volume, etc.) are explicitly converted to float.
- The script standardizes selected columns using StandardScaler from scikit-learn.

Embedding Layer for Date:
- Another set of code is provided, starting with the creation of a new DataFrame (data2) containing a 'date' column with dates ranging from '1980-12-12' to '2023-07-31'.
- The 'date' column is converted to a datetime format.
- Day, month, and year values are extracted from the 'date' column.
- Embedding layers are created for day, month, and year using PyTorch's nn.Embedding.
- Embeddings are generated for the day, month, and year values, and these embeddings are concatenated into a single tensor (date_embeddings).

Printing the Resulting Embeddings:
- The script prints the resulting embeddings (print(date_embeddings)).



Note: The provided code assumes specific embedding dimensions (embedding_dim = 1). You may need to adjust this dimension based on your specific use case and the size of your dataset.

If you have a specific question or if there's something specific you would like to do with the embeddings, please provide more details so that I can assist you further.






In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from sklearn import metrics
import numpy as npw
from sklearn.preprocessing import StandardScaler

import torch
import torch.nn as nn
import torch.nn.functional as F

from sklearn.preprocessing import MinMaxScaler


# Read the CSV file
data = pd.read_csv("../data/data/aapl_raw_data.csv")
data.shape
data.tail(1)

data = data.iloc[:10747]
data.tail(1)


data.isnull().sum()
data=data.fillna(0)  # Filling null values with zero
#data.isnull().sum()



#data["date"] = data["date"].astype(float)
data["open"] = data["open"].astype(float)
data["high"] = data["high"].astype(float)
data["low"] = data["low"].astype(float)
data["volume"] = data["volume"].astype(float)
data["adjusted_close"] = data["adjusted_close"].astype(float)
data["change_percent"] = data["change_percent"].astype(float)
data["avg_vol_20d"] = data["avg_vol_20d"].astype(float)

data["close"] = data["close"].astype(float)

# Specify the columns you want to standardize
columns_to_standardize = ["open", "high", "low", "volume", "adjusted_close", "change_percent", "avg_vol_20d"]

# Create a StandardScaler object
scaler = StandardScaler()

# Loop through the columns and standardize each one
for column in columns_to_standardize:
    data[column] = scaler.fit_transform(data[[column]])



import torch
import torch.nn as nn
import pandas as pd

# Sample data with a 'date' column containing dates from 1980-12-12 to 2023-07-31
data2 = pd.DataFrame({'date': pd.date_range(start='1980-12-12', end='2023-07-31')})

# Convert 'date' column to datetime
data2['date'] = pd.to_datetime(data['date'])

# Extract day, month, and year from the date column
data['day'] = data2['date'].dt.day
data['month'] = data2['date'].dt.month
data['year'] = data2['date'].dt.year

# Define the embedding dimensions
embedding_dim = 1  # You can adjust this dimension as needed

# Create embedding layers for day, month, and year
day_embedding = nn.Embedding(32, embedding_dim)  # 0-31 days
month_embedding = nn.Embedding(13, embedding_dim)  # 1-12 months
year_embedding = nn.Embedding(44, embedding_dim)  # Embedding for years from 1980 to 2023

# Convert day, month, and year to tensors with Long data type
day_tensor = torch.LongTensor(data['day'].values)
month_tensor = torch.LongTensor(data['month'].values)
year_tensor = torch.LongTensor(data['year'].values - 1980)  # Convert years to an index from 0 to 43


# Pass tensors through embedding layers to get embeddings
day_embeddings = day_embedding(day_tensor)
month_embeddings = month_embedding(month_tensor)
year_embeddings = year_embedding(year_tensor)

# Concatenate the embeddings
date_embeddings = torch.cat((day_embeddings, month_embeddings, year_embeddings), dim=1)

# Print the resulting embeddings
print(date_embeddings)









tensor([[-1.2244,  1.8514,  1.4517],
        [ 0.6590,  1.8514,  1.4517],
        [-1.4248,  1.8514,  1.4517],
        ...,
        [ 1.1204,  0.1840,  0.4753],
        [ 0.2789,  0.1840,  0.4753],
        [-0.5472,  0.1840,  0.4753]], grad_fn=<CatBackward0>)


In [2]:
'''

import pickle

# Define the path where you want to save the pickle file
pickle_path = "../data/data/date_embeddings.pkl"

# Convert tensor to numpy array before saving (optional)
date_embeddings_np = date_embeddings.detach().numpy()

# Save the embeddings to a pickle file
with open(pickle_path, 'wb') as f:
    pickle.dump(date_embeddings_np, f)

print(f"Date embeddings saved to {pickle_path}")

'''

'\n\nimport pickle\n\n# Define the path where you want to save the pickle file\npickle_path = "../data/data/date_embeddings.pkl"\n\n# Convert tensor to numpy array before saving (optional)\ndate_embeddings_np = date_embeddings.detach().numpy()\n\n# Save the embeddings to a pickle file\nwith open(pickle_path, \'wb\') as f:\n    pickle.dump(date_embeddings_np, f)\n\nprint(f"Date embeddings saved to {pickle_path}")\n\n'

Implementation of a Long Short-Term Memory (LSTM) neural network for time series prediction, specifically applied to stock price prediction:



Data Preprocessing:
Training and testing data are loaded and split using train_test_split.
A MinMaxScaler is applied to normalize the input features.
Date embeddings are concatenated with the training and testing data.

Model Definition:
An LSTM model is defined using the LSTMModel class, which extends nn.Module.
The LSTM model takes input features, date embeddings, and other hyperparameters such as hidden dimensions, number of layers, and sequence length.
The forward method processes the input through the LSTM layer and a fully connected layer (fc).

Training Loop:
Hyperparameters such as learning rate, number of epochs, and batch size are defined.
The training loop iterates over epochs, batches, and performs forward and backward passes.
The model is optimized using the Adam optimizer and Mean Squared Error (MSE) loss.
Validation loss is printed periodically during training.

Feature Extraction:
After training, the model is evaluated on the test set, and the hidden states are extracted from the LSTM layer.

Printing Hidden States:
The hidden states are printed after training.



It's worth noting a couple of potential improvements and considerations:
The LSTM model's architecture and hyperparameters should be fine-tuned based on the specific problem and dataset.
The number of layers (n_layers) is set to 4, which may be too high for some applications. Consider adjusting it based on the complexity of your data.
The model is trained using Mean Squared Error (MSE) loss, which may be appropriate for regression tasks. For classification tasks, a different loss function might be needed.
Ensure that the input data, especially the sequence length, aligns with the model architecture. The sequence length is set to 65, but it's crucial to match it with the characteristics of the time series data.
Finally, if you have a specific question or need further clarification on any part of the code, feel free to ask!







In [3]:
"""
import pickle

# Define the path from where you want to load the pickle file
pickle_path = "../data/data/date_embeddings.pkl"

# Load the embeddings from the pickle file
with open(pickle_path, 'rb') as f:
    loaded_date_embeddings = pickle.load(f)

# Print a confirmation message
print(f"Date embeddings loaded from {pickle_path}")

# Now, the variable loaded_date_embeddings contains the embeddings loaded from the pickle file.
"""

'\nimport pickle\n\n# Define the path from where you want to load the pickle file\npickle_path = "../data/data/date_embeddings.pkl"\n\n# Load the embeddings from the pickle file\nwith open(pickle_path, \'rb\') as f:\n    loaded_date_embeddings = pickle.load(f)\n\n# Print a confirmation message\nprint(f"Date embeddings loaded from {pickle_path}")\n\n# Now, the variable loaded_date_embeddings contains the embeddings loaded from the pickle file.\n'

In [4]:
import numpy as np
import torch
import torch.nn as nn
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Load the x_train and y_train data
x_train = data[['open', 'high', 'low', 'volume','adjusted_close', 'change_percent', 'avg_vol_20d']].to_numpy()
y_train = data["close"].to_numpy()

# Split the data into training and testing sets
x_train, x_test, y_train, y_test = train_test_split(x_train, y_train, test_size=0.33, random_state=45)

# Initialize the scaler
scaler = MinMaxScaler()

# Fit the scaler on the training data and transform both training and testing data
x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.transform(x_test)

# Convert the data to PyTorch tensors
x_train_tensor = torch.tensor(x_train_scaled, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.float32)
x_test_tensor = torch.tensor(x_test_scaled, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.float32)



# Concatenate date embeddings with your training data
x_train_feature_tensors = torch.tensor(x_train_scaled, dtype=torch.float32)
x_train_date_embeddings = date_embeddings[:len(x_train_scaled)]  # Use the same length as your training data
x_train_combined = torch.cat((x_train_feature_tensors, x_train_date_embeddings), dim=1)

# Concatenate date embeddings with your testing data
x_test_feature_tensors = torch.tensor(x_test_scaled, dtype=torch.float32)
x_test_date_embeddings = date_embeddings[len(x_train_scaled):]  # Use the remaining embeddings for testing data
x_test_combined = torch.cat((x_test_feature_tensors, x_test_date_embeddings), dim=1)


"""
# Convert the combined testing data to PyTorch tensors
 x_test_tensor = torch.tensor(x_test_combined, dtype=torch.float32)

# Create y_train_tensor and y_test_tensor
y_train_tensor = torch.tensor(y_train, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.float32)

"""

# Copy x_train_combined to create x_train_tensor
x_train_tensor = x_train_combined.clone().detach()
x_train_tensor = x_train_tensor.to(torch.float32)

# Copy x_test_combined to create x_test_tensor
x_test_tensor = x_test_combined.clone().detach()
x_test_tensor = x_test_tensor.to(torch.float32)




# Inside your LSTM model class
class LSTMModel(nn.Module):
    def __init__(self, input_size, date_embedding_dim, hidden_dim, n_layers, output_size, sequence_length):
        super(LSTMModel, self).__init__()

        self.input_size = input_size + date_embedding_dim  # Updated input size to include date embeddings
        self.hidden_dim = hidden_dim
        self.n_layers = n_layers
        self.sequence_length = sequence_length

        # Define the LSTM layer as a class attribute
        self.lstm = nn.LSTM(self.input_size, hidden_dim, n_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_size)

    def forward(self, x):
        batch_size = x.size(0)

        h0 = torch.zeros(self.n_layers, batch_size, self.hidden_dim).to(x.device)
        c0 = torch.zeros(self.n_layers, batch_size, self.hidden_dim).to(x.device)

        # Ensure input has the shape [batch_size, sequence_length, input_size]
        x = x.view(batch_size, 1, self.input_size)

        out, (hidden, cell) = self.lstm(x, (h0, c0))
        out = self.fc(out[:, -1, :])
        return out




# Set random seeds for reproducibility
seed = 42
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False


# Define hyperparameters
input_size = 7  # Adjust based on the number of input features (excluding change_percent)
date_embedding_dim = 3  # Adjust based on the dimension of your date embeddings
output_size = 1
hidden_dim = 16
n_layers = 4
sequence_length = 65  # Keep this as 1 for your input data
batch_size = 64



# Create a DataLoader for batch training
train_data = torch.utils.data.TensorDataset(x_train_tensor, y_train_tensor)
train_loader = torch.utils.data.DataLoader(dataset=train_data, batch_size=batch_size, shuffle=True)




# Create an instance of the LSTM model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = LSTMModel(input_size, date_embedding_dim, hidden_dim, n_layers, output_size, sequence_length).to(device)

# Define loss function and optimizer
loss_function = nn.MSELoss()
learning_rate = 0.00015
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

print(f'Hyperparameters: Learning Rate={learning_rate}, Sequence Length={sequence_length}, Batch Size={batch_size}, Input Size={input_size}, Date Embedding Dim={date_embedding_dim}, Hidden Dim={hidden_dim},'
              f'Layers={n_layers}')


# Training loop
num_epochs = 4500

for epoch in range(num_epochs):
    model.train()
    for batch_x, batch_y in train_loader:
        optimizer.zero_grad()

        # Forward pass
        outputs = model(batch_x)

        # Calculate loss
        loss = loss_function(outputs, batch_y.view(-1, 1))  # Ensure batch_y has the right shape

        # Backpropagation
        loss.backward()
        optimizer.step()

    if (epoch + 1) % 100 == 0:
        # Extract features on the testing set
        model.eval()
        with torch.no_grad():
            # Extract features from the hidden states
            hidden_states, _ = model.lstm(x_test_tensor.view(x_test_tensor.size(0), 1, -1))
            val_outputs = model(x_test_tensor)
            val_loss = loss_function(val_outputs, y_test_tensor.view(-1, 1))  # Ensure y_test_tensor has the right shape
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}, Val Loss: {val_loss.item():.4f}')
        # You can now use 'hidden_states' as the feature representations of your sequences
        # The shape of 'hidden_states' will be (batch_size, sequence_length, hidden_dim)

# Print hidden_states after training
print(hidden_states)










Hyperparameters: Learning Rate=0.00015, Sequence Length=65, Batch Size=64, Input Size=7, Date Embedding Dim=3, Hidden Dim=16,Layers=4


Epoch [100/4500], Loss: 18978.9629, Val Loss: 25122.5371
Epoch [200/4500], Loss: 12384.5264, Val Loss: 21844.5273
Epoch [300/4500], Loss: 12511.6289, Val Loss: 18733.0840
Epoch [400/4500], Loss: 8806.0029, Val Loss: 16179.9658
Epoch [500/4500], Loss: 6498.1045, Val Loss: 14008.1309
Epoch [600/4500], Loss: 22702.8613, Val Loss: 12159.3135
Epoch [700/4500], Loss: 4821.3374, Val Loss: 10576.2061
Epoch [800/4500], Loss: 16599.7891, Val Loss: 9203.2500
Epoch [900/4500], Loss: 194.1398, Val Loss: 7996.8320
Epoch [1000/4500], Loss: 5888.9082, Val Loss: 6917.7910
Epoch [1100/4500], Loss: 8622.8945, Val Loss: 5949.5586
Epoch [1200/4500], Loss: 6299.5659, Val Loss: 5083.8535
Epoch [1300/4500], Loss: 3943.5681, Val Loss: 4312.1958
Epoch [1400/4500], Loss: 1222.7190, Val Loss: 3628.8413
Epoch [1500/4500], Loss: 1927.1990, Val Loss: 3030.5356
Epoch [1600/4500], Loss: 3081.5498, Val Loss: 2509.4121
Epoch [1700/4500], Loss: 0.3001, Val Loss: 2059.1221
Epoch [1800/4500], Loss: 1275.9719, Val Loss: 167

In [16]:
print("Shape of x_train_feature_tensors:", x_train_feature_tensors.shape)
print("Shape of x_train_date_embeddings:", x_train_date_embeddings.shape)

print("Shape of x_test_feature_tensors: ", x_test_feature_tensors.shape)
print("Shape of x_test_date_embeddings: ", x_test_date_embeddings.shape)

Shape of x_train_feature_tensors: torch.Size([7200, 7])
Shape of x_train_date_embeddings: torch.Size([7200, 3])
Shape of x_test_feature_tensors:  torch.Size([3547, 7])
Shape of x_test_date_embeddings:  torch.Size([3547, 3])


In [12]:
print("Shape of x_train:", x_train.shape)
print("Shape of x_test:", x_test.shape)
print("Shape of y_train:", y_train.shape)
print("Shape of y_test:", y_test.shape)


Shape of x_train: (7200, 7)
Shape of x_test: (3547, 7)
Shape of y_train: (2837, 1)
Shape of y_test: (710, 1)


In [5]:
"""

import pickle

# Save hidden_states using pickle
with open('hidden_states.pkl', 'wb') as f:
    pickle.dump(hidden_states, f)

"""

"\n\nimport pickle\n\n# Save hidden_states using pickle\nwith open('hidden_states.pkl', 'wb') as f:\n    pickle.dump(hidden_states, f)\n\n"

In [6]:
"""
import pickle

# Load hidden_states from the saved file
with open('hidden_states.pkl', 'rb') as f:
    loaded_hidden_states = pickle.load(f)

# Now, you can use loaded_hidden_states as you would normally use hidden_states
print(loaded_hidden_states)

"""

tensor([[[ 0.4743,  0.3444, -0.5437,  ..., -0.5118, -0.0398,  0.0840]],

        [[ 0.4728,  0.3463, -0.5443,  ..., -0.5133, -0.0426,  0.0871]],

        [[ 0.4406,  0.3176, -0.5339,  ..., -0.5004, -0.0123,  0.0586]],

        ...,

        [[ 0.5916,  0.4664, -0.5911,  ..., -0.5719, -0.2051,  0.2398]],

        [[ 0.4406,  0.3168, -0.5336,  ..., -0.4998, -0.0113,  0.0576]],

        [[ 0.6520,  0.5440, -0.6259,  ..., -0.6183, -0.3507,  0.3760]]])


 The code segments demonstrate the integration of LSTM features and date embeddings into a Random Forest Regressor for predicting the closing stock prices. Random Forest model is trained on the combined features obtained from the LSTM model and the one-hot encoding for the target date. This approach leverages the strengths of both models for time series prediction.


- Generate Date Embedding for Target Date (11th August):
    -The code first defines the target date as "11th August" and creates one-hot encodings for day, month, and year.
    -It then maps the target date to a one-hot encoding and concatenates these encodings to create the date embedding for the target date.

- Combine Hidden States with Date Embedding:
    -The date embedding for the target date is broadcasted to match the shape of the hidden states obtained from the LSTM model.
    -The hidden states and the broadcasted date embedding are concatenated along the third dimension to create a new tensor (combined_states).

- Train Random Forest Regressor:
    - The code initializes a Random Forest Regressor and fits it to the combined feature tensor (combined_states_2d) and the predicted close values (close_values).

- Predict Close Value for 11th August 2023:
    -The trained Random Forest model is then used to predict the close value for the target date (11th August 2023) based on the combined features.
    
- Print Predicted Close Value:
    -The predicted close value for 11th August 2023 is printed.



Probe

In [7]:
import torch
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

model.eval()
with torch.no_grad():
    hidden_states, _ = model.lstm(x_test_tensor.view(x_test_tensor.size(0), 1, -1))
    val_outputs = model(x_test_tensor)
    y_test_predictions = val_outputs

# Define a function to generate date embeddings
def generate_date_embedding(target_date):
    day_index = int(target_date.split(" ")[0].replace("st", "").replace("nd", "").replace("rd", "").replace("th", "")) - 1
    month_index = 7  # August is the 8th month (0-based index)

    max_day = 31
    max_month = 12
    max_year = 100

    day_encoding = torch.zeros(max_day)
    month_encoding = torch.zeros(max_month)
    year_encoding = torch.zeros(max_year + 1)

    day_encoding[day_index] = 1
    month_encoding[month_index] = 1
    year_encoding[43] = 1

    date_embedding = torch.cat((day_encoding, month_encoding, year_encoding), dim=0)
     # Broadcast the date embedding to match the shape of hidden_states
    date_embedding_broadcasted = date_embedding.unsqueeze(0).repeat(hidden_states.shape[0], 1, 1)

In [8]:
import torch
import torch.nn as nn

# Define the JointModel
class JointModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(JointModel, self).__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)import torch
import torch.nn as nn

# Define the JointModel
class JointModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(JointModel, self).__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.date_embedding_layer = nn.Linear(43 + 12 + 101, 64)  # Adjust dimensions as needed
        self.fc = nn.Linear(hidden_size + 64, output_size)

    def forward(self, x, date_embedding):
        _, (hidden_states, _) = self.lstm(x)
        date_embedding_transformed = self.date_embedding_layer(date_embedding)
        combined_features = torch.cat((hidden_states.squeeze(0), date_embedding_transformed), dim=1)
        output = self.fc(combined_features)
        return output

# Adjustable Parameters
batch_size = 32
learning_rate = 0.001
num_epochs = 100
accumulation_steps = 4
input_size = 10
hidden_size = 64
output_size = 1

# Initialize the joint model
joint_model = JointModel(input_size, hidden_size, output_size)

# Define loss function and optimizer
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(joint_model.parameters(), lr=learning_rate)

# Training loop
for epoch in range(10):
    optimizer.zero_grad()

    # Forward and backward passes with gradient accumulation
    for i in range(0, 7200, batch_size * accumulation_steps):
        optimizer.zero_grad()

        for j in range(accumulation_steps):
            # Generate dummy tensors for demonstration (replace with actual data)
            x_batch = torch.randn(batch_size, 1, input_size)
            date_embedding_batch = torch.randn(batch_size, 43 + 12 + 101)

            outputs = joint_model(x_batch, date_embedding_batch)
            loss = criterion(outputs, torch.randn(batch_size, 1))  # Example target tensor

            loss.backward()

        optimizer.step()

    # Print loss for monitoring
    print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item():.4f}')

# After training, you can proceed with the evaluation and prediction steps
# Please adapt the evaluation and prediction code based on your specific requirements

        self.date_embedding_layer = nn.Linear(43 + 12 + 101, 64)  # Adjust dimensions as needed
        self.fc = nn.Linear(hidden_size + 64, output_size)

    def forward(self, x, date_embedding):
        _, (hidden_states, _) = self.lstm(x)
        date_embedding_transformed = self.date_embedding_layer(date_embedding)
        combined_features = torch.cat((hidden_states.squeeze(0), date_embedding_transformed), dim=1)
        output = self.fc(combined_features)
        return output

# Adjustable Parameters
batch_size = 32
learning_rate = 0.001
num_epochs = 100
accumulation_steps = 4
input_size = 10
hidden_size = 64
output_size = 1

# Initialize the joint model
joint_model = JointModel(input_size, hidden_size, output_size)

# Define loss function and optimizer
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(joint_model.parameters(), lr=learning_rate)

# Training loop
for epoch in range(10):
    optimizer.zero_grad()

    # Forward and backward passes with gradient accumulation
    for i in range(0, 7200, batch_size * accumulation_steps):
        optimizer.zero_grad()

        for j in range(accumulation_steps):
            # Generate dummy tensors for demonstration (replace with actual data)
            x_batch = torch.randn(batch_size, 1, input_size)
            date_embedding_batch = torch.randn(batch_size, 43 + 12 + 101)

            outputs = joint_model(x_batch, date_embedding_batch)
            loss = criterion(outputs, torch.randn(batch_size, 1))  # Example target tensor

            loss.backward()

        optimizer.step()

    # Print loss for monitoring
    print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item():.4f}')

# After training, you can proceed with the evaluation and prediction steps
# Please adapt the evaluation and prediction code based on your specific requirements


Epoch [1/100], Loss: 0.8511
Epoch [2/100], Loss: 1.2965
Epoch [3/100], Loss: 1.0295
Epoch [4/100], Loss: 0.5330
Epoch [5/100], Loss: 1.5196
Epoch [6/100], Loss: 1.1157
Epoch [7/100], Loss: 1.0258
Epoch [8/100], Loss: 1.2604
Epoch [9/100], Loss: 0.8456
Epoch [10/100], Loss: 1.0653


In [9]:
# Assuming `tensor` is your tensor of interest
print(date_embedding.shape)
print(date_embedding.shape)

NameError: name 'date_embedding' is not defined

In [None]:
# Check the size of the input tensor
print(x_train_tensor.size())

# Initialize the joint model with the correct input size
input_size = x_train_tensor.size(-1)  # Update the input size based on the actual size of the input tensor
hidden_size = 64  # LSTM hidden size
output_size = 1  # Output size
joint_model = JointModel(input_size, hidden_size, output_size)

# Continue with the training loop and other steps as before


torch.Size([7200, 10])


Dinamic Date

In [None]:
print(hidden_states.shape)
print(date_embedding_broadcasted.shape)
print(combined_states.shape)


torch.Size([5000, 1, 16])
torch.Size([5000, 1, 67])
torch.Size([5000, 1, 83])


In [10]:
import torch
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

model.eval()
with torch.no_grad():
    hidden_states, _ = model.lstm(x_test_tensor.view(x_test_tensor.size(0), 1, -1))
    val_outputs = model(x_test_tensor)
    y_test_predictions = val_outputs

# Define a function to generate date embeddings
def generate_date_embedding(target_date):
    day_index = int(target_date.split(" ")[0].replace("st", "").replace("nd", "").replace("rd", "").replace("th", "")) - 1
    month_index = 7  # August is the 8th month (0-based index)

    max_day = 31
    max_month = 12
    max_year = 100

    day_encoding = torch.zeros(max_day)
    month_encoding = torch.zeros(max_month)
    year_encoding = torch.zeros(max_year + 1)

    day_encoding[day_index] = 1
    month_encoding[month_index] = 1
    year_encoding[43] = 1

    date_embedding = torch.cat((day_encoding, month_encoding, year_encoding), dim=0)
    return date_embedding


# List of target dates for evaluation
list_of_dates = ["1st August", "15th August"]

for target_date in list_of_dates:
    # Generate date embedding for the target_date
    date_embedding = generate_date_embedding(target_date)

    # Broadcast the date embedding to match the shape of hidden_states
    date_embedding_broadcasted = date_embedding.unsqueeze(0).repeat(hidden_states.shape[0], 1, 1)

    # Combine hidden_states and date_embedding
    combined_states = torch.cat((hidden_states, date_embedding_broadcasted), dim=2)

    # Convert y_test_predictions to numpy array (assuming it's a tensor)
    close_values = y_test_predictions.numpy()

    # Reshape combined_states to remove the extra dimension
    combined_states_reshaped = combined_states.reshape(-1, combined_states.shape[-1])

    # Splitting the data
    X_train, X_test, y_train, y_test = train_test_split(combined_states_reshaped, close_values, test_size=0.2, random_state=43)

    # Initialize and train the RandomForestRegressor
    rf_model = RandomForestRegressor(n_estimators=100, random_state=43)
    rf_model.fit(X_train, y_train.ravel())  # Using ravel() to avoid DataConversionWarning

    # Predict on the reshaped test data (X_test)
    predicted_close_test = rf_model.predict(X_test)

    # Calculate evaluation metrics for the test predictions
    mae_test = mean_absolute_error(y_test, predicted_close_test)
    mse_test = mean_squared_error(y_test, predicted_close_test)
    rmse_test = np.sqrt(mse_test)
    r2_test = r2_score(y_test, predicted_close_test)

    # Print the metrics for the test predictions
    #print(f"Evaluation metrics for {target_date} (Test Set):")
    #print(f"MAE: {mae_test}")
    print(f"MSE: {mse_test}")
    #print(f"RMSE: {rmse_test}")
    #print(f"R2 Score: {r2_test}")

    # Predict the 'close' value for the combined_states_reshaped for the target date
    predicted_close = rf_model.predict(combined_states_reshaped)

    # Print the predicted 'close' value for the target date
    print(f"Predicted Close Value for {target_date}: {predicted_close}\n")

MSE: 0.2139630920647225
Predicted Close Value for 1st August: [ 41.62678303  43.11470383  28.55921614 ... 112.76144005  28.0626162
 174.71596466]

MSE: 0.2139630920647225
Predicted Close Value for 15th August: [ 41.62678303  43.11470383  28.55921614 ... 112.76144005  28.0626162
 174.71596466]



In [None]:
import torch
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

model.eval()
with torch.no_grad():
    hidden_states, _ = model.lstm(x_test_tensor.view(x_test_tensor.size(0), 1, -1))
    val_outputs = model(x_test_tensor)
    y_test_predictions = val_outputs

# Define a function to generate date embeddings
def generate_date_embedding(target_date):
    day_index = int(target_date.split(" ")[0].replace("st", "").replace("nd", "").replace("rd", "").replace("th", "")) - 1
    month_index = 7  # August is the 8th month (0-based index)

    max_day = 31
    max_month = 12
    max_year = 100

    day_encoding = torch.zeros(max_day)
    month_encoding = torch.zeros(max_month)
    year_encoding = torch.zeros(max_year + 1)

    day_encoding[day_index] = 1
    month_encoding[month_index] = 1
    year_encoding[43] = 1

    date_embedding = torch.cat((day_encoding, month_encoding, year_encoding), dim=0)
    return date_embedding


# List of target dates for evaluation
list_of_dates = ["1st August", "15th August"]

for target_date in list_of_dates:
    # Generate date embedding for the target_date
    date_embedding = generate_date_embedding(target_date)

    # Broadcast the date embedding to match the shape of hidden_states
    date_embedding_broadcasted = date_embedding.unsqueeze(0).repeat(hidden_states.shape[0], 1, 1)

    # Combine hidden_states and date_embedding
    combined_states = torch.cat((hidden_states, date_embedding_broadcasted), dim=2)

    # Convert y_test_predictions to numpy array (assuming it's a tensor)
    close_values = y_test_predictions.numpy()

    # Reshape combined_states to remove the extra dimension
    combined_states_reshaped = combined_states.reshape(-1, combined_states.shape[-1])

    # Splitting the data
    X_train, X_test, y_train, y_test = train_test_split(combined_states_reshaped, close_values, test_size=0.2, random_state=43)

    # Initialize and train the RandomForestRegressor
    rf_model = RandomForestRegressor(n_estimators=100, random_state=43)
    rf_model.fit(X_train, y_train.ravel())  # Using ravel() to avoid DataConversionWarning

    # Predict on the reshaped test data (X_test)
    predicted_close_test = rf_model.predict(X_test)

    # Calculate evaluation metrics for the test predictions
    mae_test = mean_absolute_error(y_test, predicted_close_test)
    mse_test = mean_squared_error(y_test, predicted_close_test)
    rmse_test = np.sqrt(mse_test)
    r2_test = r2_score(y_test, predicted_close_test)

    # Print the metrics for the test predictions
    #print(f"Evaluation metrics for {target_date} (Test Set):")
    #print(f"MAE: {mae_test}")
    print(f"MSE: {mse_test}")
    #print(f"RMSE: {rmse_test}")
    #print(f"R2 Score: {r2_test}")

    # Predict the 'close' value for the combined_states_reshaped for the target date
    predicted_close = rf_model.predict(combined_states_reshaped)

    # Print the predicted 'close' value for the target date
    print(f"Predicted Close Value for {target_date}: {predicted_close}\n")

MSE: 0.13244102817911238
Predicted Close Value for 1st August: [ 41.75436844  42.91783333  28.43019012 ... 112.78553429  28.0711599
 174.27684525]

MSE: 0.13244102817911238
Predicted Close Value for 15th August: [ 41.75436844  42.91783333  28.43019012 ... 112.78553429  28.0711599
 174.27684525]



In [None]:
import torch
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

model.eval()
with torch.no_grad():
    hidden_states, _ = model.lstm(x_test_tensor.view(x_test_tensor.size(0), 1, -1))
    val_outputs = model(x_test_tensor)
    y_test_predictions = val_outputs

# Define a function to generate date embeddings
def generate_date_embedding(target_date):
    day_index = int(target_date.split(" ")[0].replace("st", "").replace("nd", "").replace("rd", "").replace("th", "")) - 1
    month_index = 7  # August is the 8th month (0-based index)

    max_day = 31
    max_month = 12
    max_year = 100

    day_encoding = torch.zeros(max_day)
    month_encoding = torch.zeros(max_month)
    year_encoding = torch.zeros(max_year + 1)

    day_encoding[day_index] = 1
    month_encoding[month_index] = 1
    year_encoding[43] = 1

    date_embedding = torch.cat((day_encoding, month_encoding, year_encoding), dim=0)
    return date_embedding

# Load your AAPL stock data (Make sure to adjust the path as per your directory structure)
data = pd.read_csv("../data/data/aapl_raw_data.csv")

# List of target dates for evaluation
list_of_dates = ["1st August", "15th August"]

for target_date in list_of_dates:
    # Generate date embedding for the target_date
    date_embedding = generate_date_embedding(target_date)

    # Broadcast the date embedding to match the shape of hidden_states
    date_embedding_broadcasted = date_embedding.unsqueeze(0).repeat(hidden_states.shape[0], 1, 1)



    # Convert y_test_predictions to numpy array (assuming it's a tensor)
    close_values = y_test_predictions.numpy()

    # Reshape combined_states to remove the extra dimension
    combined_states_reshaped = combined_states.reshape(-1, combined_states.shape[-1])

    # Splitting the data
    X_train, X_test, y_train, y_test = train_test_split(combined_states_reshaped, close_values, test_size=0.2, random_state=43)

    # Initialize and train the RandomForestRegressor
    rf_model = RandomForestRegressor(n_estimators=100, random_state=43)
    rf_model.fit(X_train, y_train.ravel())  # Using ravel() to avoid DataConversionWarning

    # Predict on the reshaped test data (X_test)
    predicted_close_test = rf_model.predict(X_test)

    # Calculate evaluation metrics for the test predictions
    mae_test = mean_absolute_error(y_test, predicted_close_test)
    mse_test = mean_squared_error(y_test, predicted_close_test)
    rmse_test = np.sqrt(mse_test)
    r2_test = r2_score(y_test, predicted_close_test)

    # Print the metrics for the test predictions
    #print(f"Evaluation metrics for {target_date} (Test Set):")
    #print(f"MAE: {mae_test}")
    print(f"MSE: {mse_test}")
    #print(f"RMSE: {rmse_test}")
    #print(f"R2 Score: {r2_test}")

    # Predict the 'close' value for the combined_states_reshaped for the target date
    predicted_close = rf_model.predict(combined_states_reshaped)

    # Print the predicted 'close' value for the target date
    print(f"Predicted Close Value for {target_date}: {predicted_close}\n")


MSE: 0.13244102817911238
Predicted Close Value for 1st August: [ 41.75436844  42.91783333  28.43019012 ... 112.78553429  28.0711599
 174.27684525]

MSE: 0.13244102817911238
Predicted Close Value for 15th August: [ 41.75436844  42.91783333  28.43019012 ... 112.78553429  28.0711599
 174.27684525]



In [None]:
print("Shape of hidden_states:", hidden_states.shape)
print("Shape of date_embedding_broadcasted:", date_embedding_broadcasted.shape)


Shape of hidden_states: torch.Size([3547, 1, 16])
Shape of date_embedding_broadcasted: torch.Size([3547, 1, 144])


In [None]:
import torch
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Define a mock LSTM model for demonstration
class MockLSTMModel(torch.nn.Module):
    def __init__(self):
        super(MockLSTMModel, self).__init__()
        self.lstm = torch.nn.LSTM(input_size=16, hidden_size=16, num_layers=1)

    def forward(self, x):
        return self.lstm(x)

# Instantiate the mock LSTM model
model = MockLSTMModel()
model.eval()

# Generate mock hidden_states and x_test_tensor for demonstration
hidden_states = torch.randn(3547, 1, 16)
x_test_tensor = torch.randn(3547, 16)

with torch.no_grad():
    hidden_states, _ = model.lstm(x_test_tensor.view(x_test_tensor.size(0), 1, -1))
    val_outputs = model(x_test_tensor)
    y_test_predictions = val_outputs[0]

# Function to generate date embeddings
def generate_date_embedding(target_date):
    day_index = int(target_date.split(" ")[0].replace("st", "").replace("nd", "").replace("rd", "").replace("th", "")) - 1
    month_index = 7  # August is the 8th month (0-based index)

    max_day = 31
    max_month = 12
    max_year = 100

    day_encoding = torch.zeros(max_day)
    month_encoding = torch.zeros(max_month)
    year_encoding = torch.zeros(max_year + 1)

    day_encoding[day_index] = 1
    month_encoding[month_index] = 1
    year_encoding[43] = 1

    date_embedding = torch.cat((day_encoding, month_encoding, year_encoding), dim=0)
    return date_embedding

# List of target dates for evaluation
list_of_dates = ["1st August", "15th August"]

for target_date in list_of_dates:
    date_embedding = generate_date_embedding(target_date)
    date_embedding_broadcasted = date_embedding.unsqueeze(0).repeat(hidden_states.shape[0], 1, 1)

    combined_states = torch.cat((hidden_states, date_embedding_broadcasted), dim=2)
    combined_states_reshaped = combined_states.view(-1, combined_states.shape[-1])

    close_values = y_test_predictions.squeeze().numpy()

    # Debugging: Print the shapes for debugging
    print(f"Shape of combined_states_reshaped: {combined_states_reshaped.shape}")
    print(f"Shape of close_values: {close_values.shape}")

    assert combined_states_reshaped.shape[0] == close_values.shape[0], "Number of samples mismatch!"

    # Splitting the data
    X_train, X_test, y_train, y_test = train_test_split(combined_states_reshaped.numpy(), close_values, test_size=0.2, random_state=43)

    rf_model = RandomForestRegressor(n_estimators=100, random_state=43)
    rf_model.fit(X_train, y_train)

    # Predicting the 'close' value for the combined_states_reshaped for the target date
    predicted_close = rf_model.predict(combined_states_reshaped.numpy())

    print(f"Predicted Close Value for {target_date}: {predicted_close}\n")


Shape of combined_states_reshaped: torch.Size([3547, 160])
Shape of close_values: (3547, 16)


Predicted Close Value for 1st August: [[-0.0596607   0.0196106   0.03657901 ...  0.23620727 -0.06977356
  -0.03142575]
 [ 0.03013961 -0.12598329  0.08704979 ...  0.0418684  -0.09994098
   0.00280301]
 [-0.06291041 -0.06442257  0.03296388 ... -0.0788666   0.0711974
  -0.07564339]
 ...
 [ 0.14101837  0.03464901  0.11194071 ...  0.13066217  0.03582651
   0.16494781]
 [ 0.13453363 -0.05943837  0.3274678  ... -0.02344289 -0.14869018
   0.29203854]
 [ 0.07976641 -0.05209781  0.1851694  ...  0.12625204 -0.08235208
   0.16313479]]

Shape of combined_states_reshaped: torch.Size([3547, 160])
Shape of close_values: (3547, 16)
Predicted Close Value for 15th August: [[-0.0596607   0.0196106   0.03657901 ...  0.23620727 -0.06977356
  -0.03142575]
 [ 0.03013961 -0.12598329  0.08704979 ...  0.0418684  -0.09994098
   0.00280301]
 [-0.06291041 -0.06442257  0.03296388 ... -0.0788666   0.0711974
  -0.07564339]
 ...
 [ 0.14101837  0.03464901  0.11194071 ...  0.13066217  0.03582651
   0.16494781]
 [ 0.13453

1 st August 2023

In [None]:
import torch
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Generate Date Embedding for 1st August:
target_date_1st = "1st August"
day_index_1st = int(target_date_1st.split(" ")[0].replace("st", "").replace("nd", "").replace("rd", "").replace("th", "")) - 1
month_index_1st = 7  # August is the 8th month (0-based index)

# Assuming max_day, max_month, max_year, and hidden_states are defined elsewhere
max_day = 31
max_month = 12
max_year = 100

day_encoding_1st = torch.zeros(max_day)
month_encoding_1st = torch.zeros(max_month)
year_encoding_1st = torch.zeros(max_year + 1)

day_encoding_1st[day_index_1st] = 1
month_encoding_1st[month_index_1st] = 1
year_encoding_1st[43] = 1

date_embedding_1st = torch.cat((day_encoding_1st, month_encoding_1st, year_encoding_1st), dim=0)
date_embedding_1st_broadcasted = date_embedding_1st.reshape(1, 1, -1).expand(hidden_states.shape[0], 1, -1)
combined_states_1st = torch.cat((hidden_states, date_embedding_1st_broadcasted), dim=2)

# Extract predicted 'close' values as a numpy array for 1st August
close_values_1st = y_test_predictions.numpy()

# Reshape combined_states_1st to remove the extra dimension
combined_states_1st_2d = combined_states_1st.reshape(-1, combined_states_1st.shape[-1])

# Split the data into training and testing sets for 1st August with a different random_state
X_train_1st, X_test_1st, y_train_1st, y_test_1st = train_test_split(combined_states_1st_2d, close_values_1st, test_size=0.2, random_state=43)  # Changed random_state to 43

# Initialize the Random Forest Regressor for 1st August
rf_model_1st = RandomForestRegressor(n_estimators=100, random_state=43)
rf_model_1st.fit(X_train_1st, y_train_1st)
y_pred_1st = rf_model_1st.predict(X_test_1st)
predicted_close_1st = rf_model_1st.predict(combined_states_1st_2d)

# Calculate evaluation metrics for 1st August
mae_1st = mean_absolute_error(y_test_1st, y_pred_1st)
mse_1st = mean_squared_error(y_test_1st, y_pred_1st)
rmse_1st = np.sqrt(mse_1st)
r2_1st = r2_score(y_test_1st, y_pred_1st)

# Print the predicted "close" value for 1st August
print("Predicted Close Value for 1st August:", predicted_close_1st)

# Read the CSV file
data = pd.read_csv("../data/data/aapl_raw_data.csv")

# Print the value of the "close" column for the date 1st August 2023
date_filter = data[data['date'] == '2023-08-01']
close_value = date_filter['close'].values[0]
print(f"Closing 1st August 2023: {close_value}")


Predicted Close Value for 1st August: [[-0.0596607   0.0196106   0.03657901 ...  0.23620727 -0.06977356
  -0.03142575]
 [ 0.03013961 -0.12598329  0.08704979 ...  0.0418684  -0.09994098
   0.00280301]
 [-0.06291041 -0.06442257  0.03296388 ... -0.0788666   0.0711974
  -0.07564339]
 ...
 [ 0.14101837  0.03464901  0.11194071 ...  0.13066217  0.03582651
   0.16494781]
 [ 0.13453363 -0.05943837  0.3274678  ... -0.02344289 -0.14869018
   0.29203854]
 [ 0.07976641 -0.05209781  0.1851694  ...  0.12625204 -0.08235208
   0.16313479]]
Closing 1st August 2023: 195.61


11th August 2023

In [None]:
import torch
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Generate Date Embedding for 11th August:
target_date_11th = "11th August"
day_index_11th = int(target_date_11th.split(" ")[0].replace("st", "").replace("nd", "").replace("rd", "").replace("th", "")) - 1
month_index_11th = 7  # August is the 8th month (0-based index)

# Assuming max_day, max_month, max_year, and hidden_states are defined elsewhere
max_day = 31  # Example value, change as needed
max_month = 12
max_year = 100  # Example value, change as needed

day_encoding_11th = torch.zeros(max_day)
month_encoding_11th = torch.zeros(max_month)
year_encoding_11th = torch.zeros(max_year + 1)

day_encoding_11th[day_index_11th] = 1
month_encoding_11th[month_index_11th] = 1
year_encoding_11th[43] = 1

date_embedding_11th = torch.cat((day_encoding_11th, month_encoding_11th, year_encoding_11th), dim=0)
date_embedding_11th_broadcasted = date_embedding_11th.reshape(1, 1, -1).expand(hidden_states.shape[0], 1, -1)
combined_states_11th = torch.cat((hidden_states, date_embedding_11th_broadcasted), dim=2)

# Extract predicted 'close' values as a numpy array for 11th August
close_values_11th = y_test_predictions.numpy()

# Reshape combined_states_11th to remove the extra dimension
combined_states_11th_2d = combined_states_11th.reshape(-1, combined_states_11th.shape[-1])

# Split the data into training and testing sets for 11th August with a different random_state
X_train_11th, X_test_11th, y_train_11th, y_test_11th = train_test_split(combined_states_11th_2d, close_values_11th, test_size=0.2, random_state=42)  # Changed random_state to 42

# Initialize the Random Forest Regressor for 11th August
rf_model_11th = RandomForestRegressor(n_estimators=100, random_state=42)
rf_model_11th.fit(X_train_11th, y_train_11th)
y_pred_11th = rf_model_11th.predict(X_test_11th)
predicted_close_11th = rf_model_11th.predict(combined_states_11th_2d)

# Calculate evaluation metrics for 11th August
mae_11th = mean_absolute_error(y_test_11th, y_pred_11th)
mse_11th = mean_squared_error(y_test_11th, y_pred_11th)
rmse_11th = np.sqrt(mse_11th)
r2_11th = r2_score(y_test_11th, y_pred_11th)

#print("Metrics for 11th August:")
#print(f'Mean Absolute Error (MAE): {mae_11th}')
print(f'Mean Squared Error (MSE): {mse_11th}')
#print(f'Root Mean Squared Error (RMSE): {rmse_11th}')
#print(f'R-squared (R^2): {r2_11th}')

# Print the predicted "close" value for 11th August
print("Predicted Close Value for 11th August:", predicted_close_11th)






# Read the CSV file
data = pd.read_csv("../data/data/aapl_raw_data.csv")


# Print the value of the "close" column for the date 11th August 2023
date_filter = data[data['date'] == '2023-08-11']
close_value = date_filter['close'].values[0]
print(f"The closing value on 11th August 2023 was: {close_value}")



Mean Squared Error (MSE): 0.00524252483473877
Predicted Close Value for 11th August: [[-0.06268863 -0.00119187  0.00773499 ...  0.24643942 -0.0882733
  -0.03402392]
 [ 0.03761899 -0.12755952  0.10113487 ...  0.05272629 -0.10312968
  -0.00848138]
 [-0.05485251 -0.06256528  0.02657036 ... -0.05956581  0.06104317
  -0.04874376]
 ...
 [ 0.14344806  0.02752012  0.11326391 ...  0.12436044  0.03839121
   0.15973297]
 [ 0.1126018  -0.05501558  0.31566225 ... -0.01838417 -0.15746938
   0.26818339]
 [ 0.07721089 -0.0653956   0.20254177 ...  0.12405528 -0.09182393
   0.22451101]]
The closing value on 11th August 2023 was: 177.79


31th August 2023

In [None]:
import torch
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Generate Date Embedding for 31st August:
target_date_31st = "31st August"
day_index_31st = int(target_date_31st.split(" ")[0].replace("st", "").replace("nd", "").replace("rd", "").replace("th", "")) - 1
month_index_31st = 7  # August is the 8th month (0-based index)

# Assuming max_day, max_month, max_year, and hidden_states are defined elsewhere
max_day = 31  # Example value, change as needed
max_month = 12
max_year = 100  # Example value, change as needed

day_encoding_31st = torch.zeros(max_day)
month_encoding_31st = torch.zeros(max_month)
year_encoding_31st = torch.zeros(max_year + 1)

day_encoding_31st[day_index_31st] = 1
month_encoding_31st[month_index_31st] = 1
year_encoding_31st[43] = 1

date_embedding_31st = torch.cat((day_encoding_31st, month_encoding_31st, year_encoding_31st), dim=0)
date_embedding_31st_broadcasted = date_embedding_31st.reshape(1, 1, -1).expand(hidden_states.shape[0], 1, -1)
combined_states_31st = torch.cat((hidden_states, date_embedding_31st_broadcasted), dim=2)

# Extract predicted 'close' values as a numpy array for 31st August
close_values_31st = y_test_predictions.numpy()

# Reshape combined_states_31st to remove the extra dimension
combined_states_31st_2d = combined_states_31st.reshape(-1, combined_states_31st.shape[-1])

# Split the data into training and testing sets for 31st August with a different random_state
X_train_31st, X_test_31st, y_train_31st, y_test_31st = train_test_split(combined_states_31st_2d, close_values_31st, test_size=0.2, random_state=43)  # Changed random_state to 43

# Initialize the Random Forest Regressor for 31st August
rf_model_31st = RandomForestRegressor(n_estimators=100, random_state=43)
rf_model_31st.fit(X_train_31st, y_train_31st)
y_pred_31st = rf_model_31st.predict(X_test_31st)
predicted_close_31st = rf_model_31st.predict(combined_states_31st_2d)

# Calculate evaluation metrics for 31st August
mae_31st = mean_absolute_error(y_test_31st, y_pred_31st)
mse_31st = mean_squared_error(y_test_31st, y_pred_31st)
rmse_31st = np.sqrt(mse_31st)
r2_31st = r2_score(y_test_31st, y_pred_31st)

#print("Metrics for 31st August:")
#print(f'Mean Absolute Error (MAE): {mae_31st}')
print(f'Mean Squared Error (MSE): {mse_31st}')
#print(f'Root Mean Squared Error (RMSE): {rmse_31st}')
#print(f'R-squared (R^2): {r2_31st}')

# Print the predicted "close" value for 31st August
print("Predicted Close Value for 31st August:", predicted_close_31st)




# Read the CSV file
data = pd.read_csv("../data/data/aapl_raw_data.csv")


# Print the value of the "close" column for the date 31th August 2023
date_filter = data[data['date'] == '2023-08-31']
close_value = date_filter['close'].values[0]
print(f"The closing value on 31st August 2023 was: {close_value}")

Mean Squared Error (MSE): 0.0056149470474229825
Predicted Close Value for 31st August: [[-0.0596607   0.0196106   0.03657901 ...  0.23620727 -0.06977356
  -0.03142575]
 [ 0.03013961 -0.12598329  0.08704979 ...  0.0418684  -0.09994098
   0.00280301]
 [-0.06291041 -0.06442257  0.03296388 ... -0.0788666   0.0711974
  -0.07564339]
 ...
 [ 0.14101837  0.03464901  0.11194071 ...  0.13066217  0.03582651
   0.16494781]
 [ 0.13453363 -0.05943837  0.3274678  ... -0.02344289 -0.14869018
   0.29203854]
 [ 0.07976641 -0.05209781  0.1851694  ...  0.12625204 -0.08235208
   0.16313479]]
The closing value on 31st August 2023 was: 187.87


In [None]:

# Generate Date Embedding for 31st August:
target_date_31st = "31st August"
# Extract Day Index for 31st August
day_index_31st = int(target_date_31st.split(" ")[0][:-2]) - 1
month_index_31st = 7  # August is the 8th month (0-based index)

day_encoding_31st = torch.zeros(max_day)
month_encoding_31st = torch.zeros(max_month)
year_encoding_31st = torch.zeros(max_year + 1)
day_encoding_31st[day_index_31st] = 1
month_encoding_31st[month_index_31st] = 1
year_encoding_31st[43] = 1

date_embedding_31st = torch.cat((day_encoding_31st, month_encoding_31st, year_encoding_31st), dim=0)
date_embedding_31st_broadcasted = date_embedding_31st.reshape(1, 1, -1).expand(hidden_states.shape[0], 1, -1)
combined_states_31st = torch.cat((hidden_states, date_embedding_31st_broadcasted), dim=2)

# Extract predicted 'close' values as a numpy array for 31st August
close_values_31st = y_test_predictions.numpy()

# Reshape combined_states_31st to remove the extra dimension
combined_states_31st_2d = combined_states_31st.reshape(-1, combined_states_31st.shape[-1])

# Split the data into training and testing sets for 31st August
X_train_31st, X_test_31st, y_train_31st, y_test_31st = train_test_split(combined_states_31st_2d, close_values_31st, test_size=0.2, random_state=42)

# Initialize the Random Forest Regressor for 31st August
rf_model_31st = RandomForestRegressor(n_estimators=100, random_state=42)
rf_model_31st.fit(X_train_31st, y_train_31st)
y_pred_31st = rf_model_31st.predict(X_test_31st)
predicted_close_31st = rf_model_31st.predict(combined_states_31st_2d)

# Calculate evaluation metrics for 31st August
mae_31st = mean_absolute_error(y_test_31st, y_pred_31st)
mse_31st = mean_squared_error(y_test_31st, y_pred_31st)
rmse_31st = np.sqrt(mse_31st)
r2_31st = r2_score(y_test_31st, y_pred_31st)

print("\nMetrics for 31st August:")
print(f'Mean Absolute Error (MAE): {mae_31st}')
print(f'Mean Squared Error (MSE): {mse_31st}')
print(f'Root Mean Squared Error (RMSE): {rmse_31st}')
print(f'R-squared (R^2): {r2_31st}')

# Print the predicted "close" value for 31st August
print("Predicted Close Value for 31st August:", predicted_close_31st)



Metrics for 31st August:
Mean Absolute Error (MAE): 0.053133741373833074
Mean Squared Error (MSE): 0.00524252483473877
Root Mean Squared Error (RMSE): 0.07240528181520164
R-squared (R^2): 0.68463254978573
Predicted Close Value for 31st August: [[-0.06268863 -0.00119187  0.00773499 ...  0.24643942 -0.0882733
  -0.03402392]
 [ 0.03761899 -0.12755952  0.10113487 ...  0.05272629 -0.10312968
  -0.00848138]
 [-0.05485251 -0.06256528  0.02657036 ... -0.05956581  0.06104317
  -0.04874376]
 ...
 [ 0.14344806  0.02752012  0.11326391 ...  0.12436044  0.03839121
   0.15973297]
 [ 0.1126018  -0.05501558  0.31566225 ... -0.01838417 -0.15746938
   0.26818339]
 [ 0.07721089 -0.0653956   0.20254177 ...  0.12405528 -0.09182393
   0.22451101]]


In [None]:
import torch
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split



# Generate Date Embedding for 11th August:
target_date_11th = "11th August"
day_index_11th = int(target_date_11th.split(" ")[0].replace("st", "").replace("nd", "").replace("rd", "").replace("th", "")) - 1
month_index_11th = 7  # August is the 8th month (0-based index)

day_encoding_11th = torch.zeros(max_day)
month_encoding_11th = torch.zeros(max_month)
year_encoding_11th = torch.zeros(max_year + 1)
day_encoding_11th[day_index_11th] = 1
month_encoding_11th[month_index_11th] = 1
year_encoding_11th[43] = 1

date_embedding_11th = torch.cat((day_encoding_11th, month_encoding_11th, year_encoding_11th), dim=0)
date_embedding_11th_broadcasted = date_embedding_11th.reshape(1, 1, -1).expand(hidden_states.shape[0], 1, -1)
combined_states_11th = torch.cat((hidden_states, date_embedding_11th_broadcasted), dim=2)

# Extract predicted 'close' values as a numpy array for 11th August
close_values_11th = y_test_predictions.numpy()

# Reshape combined_states_11th to remove the extra dimension
combined_states_11th_2d = combined_states_11th.reshape(-1, combined_states_11th.shape[-1])

# Split the data into training and testing sets for 11th August
X_train_11th, X_test_11th, y_train_11th, y_test_11th = train_test_split(combined_states_11th_2d, close_values_11th, test_size=0.2, random_state=42)

# Initialize the Random Forest Regressor for 11th August
rf_model_11th = RandomForestRegressor(n_estimators=100, random_state=42)
rf_model_11th.fit(X_train_11th, y_train_11th.ravel())
y_pred_11th = rf_model_11th.predict(X_test_11th)
predicted_close_11th = rf_model_11th.predict(combined_states_11th_2d)

# Calculate evaluation metrics for 11th August
mae_11th = mean_absolute_error(y_test_11th, y_pred_11th)
mse_11th = mean_squared_error(y_test_11th, y_pred_11th)
rmse_11th = np.sqrt(mse_11th)
r2_11th = r2_score(y_test_11th, y_pred_11th)

print("Metrics for 11th August:")
print(f'Mean Absolute Error (MAE): {mae_11th}')
print(f'Mean Squared Error (MSE): {mse_11th}')
print(f'Root Mean Squared Error (RMSE): {rmse_11th}')
print(f'R-squared (R^2): {r2_11th}')

# Print the predicted "close" value for 11th August
print("Predicted Close Value for 11th August:", predicted_close_11th)


# Generate Date Embedding for 31st August:
target_date_31st = "31st August"
# Extract Day Index for 31st August
day_index_31st = int(target_date_31st.split(" ")[0][:-2]) - 1
month_index_31st = 7  # August is the 8th month (0-based index)

day_encoding_31st = torch.zeros(max_day)
month_encoding_31st = torch.zeros(max_month)
year_encoding_31st = torch.zeros(max_year + 1)
day_encoding_31st[day_index_31st] = 1
month_encoding_31st[month_index_31st] = 1
year_encoding_31st[43] = 1

date_embedding_31st = torch.cat((day_encoding_31st, month_encoding_31st, year_encoding_31st), dim=0)
date_embedding_31st_broadcasted = date_embedding_31st.reshape(1, 1, -1).expand(hidden_states.shape[0], 1, -1)
combined_states_31st = torch.cat((hidden_states, date_embedding_31st_broadcasted), dim=2)

# Extract predicted 'close' values as a numpy array for 31st August
close_values_31st = y_test_predictions.numpy()

# Reshape combined_states_31st to remove the extra dimension
combined_states_31st_2d = combined_states_31st.reshape(-1, combined_states_31st.shape[-1])

# Split the data into training and testing sets for 31st August
X_train_31st, X_test_31st, y_train_31st, y_test_31st = train_test_split(combined_states_31st_2d, close_values_31st, test_size=0.2, random_state=42)

# Initialize the Random Forest Regressor for 31st August
rf_model_31st = RandomForestRegressor(n_estimators=100, random_state=42)
rf_model_31st.fit(X_train_31st, y_train_31st.ravel())
y_pred_31st = rf_model_31st.predict(X_test_31st)
predicted_close_31st = rf_model_31st.predict(combined_states_31st_2d)

# Calculate evaluation metrics for 31st August
mae_31st = mean_absolute_error(y_test_31st, y_pred_31st)
mse_31st = mean_squared_error(y_test_31st, y_pred_31st)
rmse_31st = np.sqrt(mse_31st)
r2_31st = r2_score(y_test_31st, y_pred_31st)

print("\nMetrics for 31st August:")
print(f'Mean Absolute Error (MAE): {mae_31st}')
print(f'Mean Squared Error (MSE): {mse_31st}')
print(f'Root Mean Squared Error (RMSE): {rmse_31st}')
print(f'R-squared (R^2): {r2_31st}')

# Print the predicted "close" value for 31st August
print("Predicted Close Value for 31st August:", predicted_close_31st)


ValueError: Found input variables with inconsistent numbers of samples: [2837, 45392]

In [None]:
import torch
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split



# Generate Date Embedding for 11th August:
target_date_11th = "11th August"
day_index_11th = int(target_date_11th.split(" ")[0].replace("st", "").replace("nd", "").replace("rd", "").replace("th", "")) - 1
month_index_11th = 7  # August is the 8th month (0-based index)

day_encoding_11th = torch.zeros(max_day)
month_encoding_11th = torch.zeros(max_month)
year_encoding_11th = torch.zeros(max_year + 1)
day_encoding_11th[day_index_11th] = 1
month_encoding_11th[month_index_11th] = 1
year_encoding_11th[43] = 1

date_embedding_11th = torch.cat((day_encoding_11th, month_encoding_11th, year_encoding_11th), dim=0)
date_embedding_11th_broadcasted = date_embedding_11th.reshape(1, 1, -1).expand(hidden_states.shape[0], 1, -1)
combined_states_11th = torch.cat((hidden_states, date_embedding_11th_broadcasted), dim=2)

# Extract predicted 'close' values as a numpy array for 11th August
close_values_11th = y_test_predictions.numpy()

# Reshape combined_states_11th to remove the extra dimension
combined_states_11th_2d = combined_states_11th.reshape(-1, combined_states_11th.shape[-1])

# Split the data into training and testing sets for 11th August
X_train_11th, X_test_11th, y_train_11th, y_test_11th = train_test_split(combined_states_11th_2d, close_values_11th, test_size=0.2, random_state=42)

# Initialize the Random Forest Regressor for 11th August
rf_model_11th = RandomForestRegressor(n_estimators=100, random_state=42)
rf_model_11th.fit(X_train_11th, y_train_11th.ravel())
y_pred_11th = rf_model_11th.predict(X_test_11th)
predicted_close_11th = rf_model_11th.predict(combined_states_11th_2d)

# Calculate evaluation metrics for 11th August
mae_11th = mean_absolute_error(y_test_11th, y_pred_11th)
mse_11th = mean_squared_error(y_test_11th, y_pred_11th)
rmse_11th = np.sqrt(mse_11th)
r2_11th = r2_score(y_test_11th, y_pred_11th)

print("Metrics for 11th August:")
print(f'Mean Absolute Error (MAE): {mae_11th}')
print(f'Mean Squared Error (MSE): {mse_11th}')
print(f'Root Mean Squared Error (RMSE): {rmse_11th}')
print(f'R-squared (R^2): {r2_11th}')

# Print the predicted "close" value for 11th August
print("Predicted Close Value for 11th August:", predicted_close_11th)


# Generate Date Embedding for 31st August:
target_date_31st = "31st August"
# Extract Day Index for 31st August
day_index_31st = int(target_date_31st.split(" ")[0][:-2]) - 1
month_index_31st = 7  # August is the 8th month (0-based index)

day_encoding_31st = torch.zeros(max_day)
month_encoding_31st = torch.zeros(max_month)
year_encoding_31st = torch.zeros(max_year + 1)
day_encoding_31st[day_index_31st] = 1
month_encoding_31st[month_index_31st] = 1
year_encoding_31st[43] = 1

date_embedding_31st = torch.cat((day_encoding_31st, month_encoding_31st, year_encoding_31st), dim=0)
date_embedding_31st_broadcasted = date_embedding_31st.reshape(1, 1, -1).expand(hidden_states.shape[0], 1, -1)
combined_states_31st = torch.cat((hidden_states, date_embedding_31st_broadcasted), dim=2)

# Extract predicted 'close' values as a numpy array for 31st August
close_values_31st = y_test_predictions.numpy()

# Reshape combined_states_31st to remove the extra dimension
combined_states_31st_2d = combined_states_31st.reshape(-1, combined_states_31st.shape[-1])

# Split the data into training and testing sets for 31st August
X_train_31st, X_test_31st, y_train_31st, y_test_31st = train_test_split(combined_states_31st_2d, close_values_31st, test_size=0.2, random_state=42)

# Initialize the Random Forest Regressor for 31st August
rf_model_31st = RandomForestRegressor(n_estimators=100, random_state=42)
rf_model_31st.fit(X_train_31st, y_train_31st.ravel())
y_pred_31st = rf_model_31st.predict(X_test_31st)
predicted_close_31st = rf_model_31st.predict(combined_states_31st_2d)

# Calculate evaluation metrics for 31st August
mae_31st = mean_absolute_error(y_test_31st, y_pred_31st)
mse_31st = mean_squared_error(y_test_31st, y_pred_31st)
rmse_31st = np.sqrt(mse_31st)
r2_31st = r2_score(y_test_31st, y_pred_31st)

print("\nMetrics for 31st August:")
print(f'Mean Absolute Error (MAE): {mae_31st}')
print(f'Mean Squared Error (MSE): {mse_31st}')
print(f'Root Mean Squared Error (RMSE): {rmse_31st}')
print(f'R-squared (R^2): {r2_31st}')

# Print the predicted "close" value for 31st August
print("Predicted Close Value for 31st August:", predicted_close_31st)


Metrics for 11th August:
Mean Absolute Error (MAE): 0.1839889506219137
Mean Squared Error (MSE): 0.20189083672486538
Root Mean Squared Error (RMSE): 0.4493226421235251
R-squared (R^2): 0.9999893254992325
Predicted Close Value for 11th August: [ 41.63775166  43.09688847  28.30034241 ... 112.93842918  28.06641165
 174.56737762]

Metrics for 31st August:
Mean Absolute Error (MAE): 0.1839889506219137
Mean Squared Error (MSE): 0.20189083672486538
Root Mean Squared Error (RMSE): 0.4493226421235251
R-squared (R^2): 0.9999893254992325
Predicted Close Value for 31st August: [ 41.63775166  43.09688847  28.30034241 ... 112.93842918  28.06641165
 174.56737762]


In [None]:
import torch
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split



# Define the target dates
target_date_11th = "11th August"
target_date_31st = "31st August"

# Function to generate date embedding for a given target date
def generate_date_embedding(target_date):
    # Extract day and month from the target date
    day_index = int(target_date.split(" ")[0][:-2]) - 1
    month_index = 7  # August is the 8th month (0-based index)

    # Create one-hot encodings for day, month, and year
    day_encoding = torch.zeros(max_day)
    month_encoding = torch.zeros(max_month)
    year_encoding = torch.zeros(max_year + 1)
    day_encoding[day_index] = 1
    month_encoding[month_index] = 1
    year_encoding[43] = 1  # 43 corresponds to the year 2023

    # Concatenate the day, month, and year encodings to get the date embedding
    date_embedding = torch.cat((day_encoding, month_encoding, year_encoding), dim=0)

    return date_embedding

# Generate Date Embedding for 11th August
date_embedding_11th = generate_date_embedding(target_date_11th)

# Broadcast date_embedding_11th to match the shape of hidden_states
date_embedding_11th_broadcasted = date_embedding_11th.reshape(1, 1, -1).expand(hidden_states.shape[0], 1, -1)

# Combine hidden_states and date_embedding_11th_broadcasted
combined_states_11th = torch.cat((hidden_states, date_embedding_11th_broadcasted), dim=2)

# Extract predicted 'close' values as a numpy array for 11th August
close_values_11th = y_test_predictions.numpy()

# Reshape combined_states_11th to remove the extra dimension
combined_states_11th_2d = combined_states_11th.reshape(-1, combined_states_11th.shape[-1])

# Split the data into training and testing sets for 11th August
X_train_11th, X_test_11th, y_train_11th, y_test_11th = train_test_split(combined_states_11th_2d, close_values_11th, test_size=0.2, random_state=42)

# Initialize the Random Forest Regressor for 11th August
rf_model_11th = RandomForestRegressor(n_estimators=100, random_state=42)
rf_model_11th.fit(X_train_11th, y_train_11th.ravel())
y_pred_11th = rf_model_11th.predict(X_test_11th)
predicted_close_11th = rf_model_11th.predict(combined_states_11th_2d)

# Calculate evaluation metrics for 11th August
mae_11th = mean_absolute_error(y_test_11th, y_pred_11th)
mse_11th = mean_squared_error(y_test_11th, y_pred_11th)
rmse_11th = np.sqrt(mse_11th)
r2_11th = r2_score(y_test_11th, y_pred_11th)

print("Metrics for 11th August:")
print(f'Mean Absolute Error (MAE): {mae_11th}')
print(f'Mean Squared Error (MSE): {mse_11th}')
print(f'Root Mean Squared Error (RMSE): {rmse_11th}')
print(f'R-squared (R^2): {r2_11th}')

# Print the predicted "close" value for 11th August
print("Predicted Close Value for 11th August:", predicted_close_11th)


# Generate Date Embedding for 31st August
date_embedding_31st = generate_date_embedding(target_date_31st)

# Broadcast date_embedding_31st to match the shape of hidden_states
date_embedding_31st_broadcasted = date_embedding_31st.reshape(1, 1, -1).expand(hidden_states.shape[0], 1, -1)

# Combine hidden_states and date_embedding_31st_broadcasted
combined_states_31st = torch.cat((hidden_states, date_embedding_31st_broadcasted), dim=2)

# Extract predicted 'close' values as a numpy array for 31st August
close_values_31st = y_test_predictions.numpy()

# Reshape combined_states_31st to remove the extra dimension
combined_states_31st_2d = combined_states_31st.reshape(-1, combined_states_31st.shape[-1])

# Split the data into training and testing sets for 31st August
X_train_31st, X_test_31st, y_train_31st, y_test_31st = train_test_split(combined_states_31st_2d, close_values_31st, test_size=0.2, random_state=42)

# Initialize the Random Forest Regressor for 31st August
rf_model_31st = RandomForestRegressor(n_estimators=100, random_state=42)
rf_model_31st.fit(X_train_31st, y_train_31st.ravel())
y_pred_31st = rf_model_31st.predict(X_test_31st)
predicted_close_31st = rf_model_31st.predict(combined_states_31st_2d)

# Calculate evaluation metrics for 31st August
mae_31st = mean_absolute_error(y_test_31st, y_pred_31st)
mse_31st = mean_squared_error(y_test_31st, y_pred_31st)
rmse_31st = np.sqrt(mse_31st)
r2_31st = r2_score(y_test_31st, y_pred_31st)

print("\nMetrics for 31st August:")
print(f'Mean Absolute Error (MAE): {mae_31st}')
print(f'Mean Squared Error (MSE): {mse_31st}')
print(f'Root Mean Squared Error (RMSE): {rmse_31st}')
print(f'R-squared (R^2): {r2_31st}')

# Print the predicted "close" value for 31st August
print("Predicted Close Value for 31st August:", predicted_close_31st)


Metrics for 11th August:
Mean Absolute Error (MAE): 0.1839889506219137
Mean Squared Error (MSE): 0.20189083672486538
Root Mean Squared Error (RMSE): 0.4493226421235251
R-squared (R^2): 0.9999893254992325
Predicted Close Value for 11th August: [ 41.63775166  43.09688847  28.30034241 ... 112.93842918  28.06641165
 174.56737762]

Metrics for 31st August:
Mean Absolute Error (MAE): 0.1839889506219137
Mean Squared Error (MSE): 0.20189083672486538
Root Mean Squared Error (RMSE): 0.4493226421235251
R-squared (R^2): 0.9999893254992325
Predicted Close Value for 31st August: [ 41.63775166  43.09688847  28.30034241 ... 112.93842918  28.06641165
 174.56737762]
