## Setup and Data Loading
This section sets up the environment: it imports the required libraries, mounts Google Drive, and unzips the data.

In [15]:
import os
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LinearRegression
from sklearn import linear_model
from zipfile import ZipFile
from google.colab import drive
from google.colab import files
import matplotlib.pyplot as plt

### Unzipping Data
This section unzips the data file from Google Drive and extracts it into the Colab environment.


In [6]:
# Mount Google Drive
drive.mount('/content/drive')

# Unzip data file
with ZipFile('/content/drive/My Drive/Colab Notebooks/StockPredictor/data.zip', 'r') as zip_ref:
    zip_ref.extractall('/content')


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


### Define Utility Functions
This section defines a utility function that plots actual versus predicted prices for each file and prints its Mean Squared Error (MSE).


In [21]:
def plot_and_print_mse(mse_arr):
    for mse_item in mse_arr:
        # Each item is the tuple stored by the evaluation loop; X_test is unpacked but not used here
        filename, mse, y_test, y_pred, X_test = mse_item

        # Plot actual vs predicted values
        plt.figure(figsize=(10, 6))
        plt.plot(y_test.values, label='Actual')
        plt.plot(y_pred, label='Predicted')
        plt.title(f'Actual vs Predicted Prices for {filename}')
        plt.xlabel('Index')
        plt.ylabel('Price')
        plt.legend()
        plt.show()

        # Print Mean Squared Error
        print(f"MSE for {filename}: {mse}")

### Data Processing and Model Evaluation
This section processes each stock data file, trains a Linear Regression model, and evaluates its performance.

For each file, the loop sets the prediction target to the next row's `Open` value (stored in the `Close` column), splits the rows chronologically into 80% training and 20% test data, fits a linear regression on `Open`, `High`, `Low`, and `Volume`, and records the MSE on the test rows. Only the first 50 `.txt` files are considered, and empty files are skipped. The resulting per-file errors show how closely a simple linear model tracks each stock's prices.
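The target construction and chronological split are easier to see on a toy table. The sketch below is a minimal illustration, not the notebook's own code: the five rows of prices are made up, the `Target` column name is hypothetical (the notebook writes the shifted values into `Close`), and plain `iloc` slicing stands in for `train_test_split(..., shuffle=False)`.

```python
import pandas as pd

# Hypothetical five-day price table; the values are made up for illustration.
toy = pd.DataFrame({
    "Open": [10.0, 11.0, 12.0, 13.0, 14.0],
    "High": [10.5, 11.5, 12.5, 13.5, 14.5],
    "Low": [9.5, 10.5, 11.5, 12.5, 13.5],
    "Volume": [100, 110, 120, 130, 140],
})

# shift(-1) moves each value up one row, so row i holds the Open of row i + 1.
toy["Target"] = toy["Open"].shift(-1)

# The last row has no following day, so its target is NaN and the row is dropped.
toy = toy.dropna()

# Chronological 80/20 split: the earliest rows train, the latest rows test.
split_at = int(len(toy) * 0.8)
train, test = toy.iloc[:split_at], toy.iloc[split_at:]

print(toy["Target"].tolist())
print(len(train), len(test))
```

Keeping the rows in time order means the model is always evaluated on dates later than any it was trained on, which a shuffled split would not guarantee.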

In [23]:
directory = "/content/data/Stocks"

file_count = 0
test_lim = 50

mse_arr = []


for filename in os.listdir(directory):
  if filename.endswith(".txt"):

    file_count += 1
    if file_count > test_lim:
      break
    try:
      data = pd.read_csv(os.path.join(directory, filename))
    except pd.errors.EmptyDataError:
      continue

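    # Preprocessing: overwrite 'Close' with the next row's 'Open',
    # so the model's target is the following row's opening price
    data['Close'] = data['Open'].shift(-1)
    # Drop rows with missing values, including the last row, which has no next-row target
    data = data.dropna()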

    dataset_size = len(data)
    test_size = 0.2
    min_train_size = int(dataset_size * (1 - test_size))



    # Skip files too short to leave any training rows
    if min_train_size <= 0:
      continue

    X = data[['Open', 'High', 'Low', 'Volume']]
    y = data['Close']
    # shuffle=False keeps the rows in file order, so the test set is the last 20% of rows
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, shuffle=False)

    model = LinearRegression()
    # model = linear_model.Ridge(alpha=.5)
    model.fit(X_train, y_train)

    # Evaluate the model
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    min_price = data['Close'].min()
    max_price = data['Close'].max()
    std_close = data['Close'].std()

    mse_arr.append((filename, mse, y_test, y_pred, X_test))






### Model Evaluation
This cell plots the actual and predicted values for each processed file (at most the first 50) and prints each file's mean squared error.

In [24]:
plot_and_print_mse(mse_arr)

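MSE is measured in squared price units, so the values printed for a low-priced stock and a high-priced one are not directly comparable. The evaluation loop computes `std_close` but never uses it. One possible use, which is an assumption here rather than the notebook's stated intent, is to divide the root mean squared error by the standard deviation of the target. The sketch below works through that arithmetic on made-up numbers; `relative_rmse` is a hypothetical name.

```python
import numpy as np

# Made-up actual and predicted prices, for illustration only.
y_true = np.array([10.0, 12.0, 11.0, 13.0])
y_hat = np.array([11.0, 12.0, 9.0, 13.0])

# MSE is the mean of the squared differences: (1 + 0 + 4 + 0) / 4.
mse = np.mean((y_true - y_hat) ** 2)

# RMSE brings the error back to price units.
rmse = np.sqrt(mse)

# Dividing by the sample standard deviation (ddof=1, the pandas default for .std())
# gives a unitless ratio that can be compared across stocks.
relative_rmse = rmse / np.std(y_true, ddof=1)

print(mse, rmse, relative_rmse)
```

A ratio well below 1 would suggest the predictions vary less from the actual values than the actual values vary among themselves. In this sketch the standard deviation is taken over the same values being scored, whereas the notebook's `std_close` covers the whole file.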