# Stock movement prediction: Baseline vs. LSTM

## Objective
Predict whether tomorrow's closing price will be higher than today's (Up=1, Down=0), using the previous 30 days of data.
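
As a minimal sketch of how this label could be built, the snippet below shifts the closing price back by one day within each ticker and compares it with the current close. The toy frame and the `Target` column name are illustrative assumptions, not part of the original notebook; only `Date`, `Ticker`, and `Close` mirror columns in the dataset.

```python
import pandas as pd

# Toy prices for illustration only; not taken from the real dataset.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"]),
    "Ticker": ["AAPL"] * 4,
    "Close": [100.0, 101.0, 99.0, 102.0],
})

# Sort chronologically within each ticker before shifting.
toy = toy.sort_values(["Ticker", "Date"]).reset_index(drop=True)

# Next day's close within each ticker; the last row per ticker has no "tomorrow".
next_close = toy.groupby("Ticker")["Close"].shift(-1)
has_next = next_close.notna()

# Drop rows without a known next close, then label Up=1 / Down=0.
labeled = toy[has_next].copy()
labeled["Target"] = (next_close[has_next] > labeled["Close"]).astype(int)

print(labeled[["Date", "Close", "Target"]])
```

Under this definition a day followed by an unchanged close is labeled 0, since the comparison is strictly "higher".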

## Approach
1. Exploratory Data Analysis (EDA)
2. Feature engineering (lags and simple indicators)
3. Baseline models: Logistic Regression, Random Forest
4. Advanced model: LSTM
5. Evaluation: Accuracy, Precision, Recall, Confusion matrix
6. Documentation and GitHub publishing
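
Step 2 could look roughly like the sketch below, which derives lagged returns and two rolling indicators from a closing-price series. The synthetic prices, the lag choices (1 to 3 days), the 5-day windows, and the feature names are all assumptions made for illustration; the notebook's actual features may differ.

```python
import numpy as np
import pandas as pd

# Synthetic closing prices for a single ticker (illustrative only).
rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(0, 1, 40).cumsum(), name="Close")

feats = pd.DataFrame({"Close": close})
feats["ret_1"] = close.pct_change()               # today's return vs. yesterday

# Lagged returns: each row only sees returns from earlier days.
for lag in (1, 2, 3):                             # hypothetical lag choices
    feats[f"ret_lag_{lag}"] = feats["ret_1"].shift(lag)

feats["sma_5"] = close.rolling(5).mean()          # 5-day simple moving average
feats["vol_5"] = feats["ret_1"].rolling(5).std()  # 5-day rolling volatility

# Drop the warm-up rows where a lag or rolling window is not yet defined.
feats = feats.dropna().reset_index(drop=True)
print(feats.shape)
```

Every feature here uses only the current and earlier rows, so it introduces no look-ahead. With several tickers in one frame, the same operations would need to run inside a `groupby("Ticker")` so that windows do not span two tickers.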

## Notes
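- Splits are time-ordered (no shuffling) so that no future data leaks into training.
- Feature scaling is fitted on the training data only, then applied unchanged to the test data.
- All models are compared on the same test window.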
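
The first two notes can be sketched as follows, assuming the rows of the feature matrix are already in chronological order. The random data and the 80/20 split ratio are placeholders chosen for illustration, not values taken from this notebook.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Placeholder features and labels; rows are assumed to be in time order.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 4))
y = rng.integers(0, 2, size=100)

# Chronological split: the earliest 80% trains, the latest 20% tests.
split = int(len(X) * 0.8)  # hypothetical ratio
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Fit the scaler on the training window only, then reuse its statistics.
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

print(X_train_s.shape, X_test_s.shape)
```

Calling `fit_transform` on the full matrix instead would let the test window's mean and variance influence the training inputs, which is the leakage the notes warn about.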


In [None]:
# 0. Imports and basic setup
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os

from sklearn.model_selection import TimeSeriesSplit
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix, classification_report

from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# For LSTM
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.callbacks import EarlyStopping

In [None]:
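# Work from the project root so the relative paths below resolve.
# Note: this absolute path is machine-specific; adjust it for your environment.
os.chdir("/Users/macbookpro/Desktop/Data Science Projects/Financial forecasting")
DATA_PATH = "1. Data Source/financial_forecasting_dataset.csv"
RESULTS_PATH = "3. Results"

# Load the dataset and preview the first rows
df = pd.read_csv(DATA_PATH)
df.head()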

Unnamed: 0,Date,Ticker,Open,High,Low,Close,Volume,GDP (%),Inflation (%),Interest Rate (%),Unemployment (%),Market Stress Level,Event Flag
0,2020-01-01,AAPL,215.19,215.61,213.25,213.62,13046059,2.14,2.3,1.35,3.58,0.26,0
1,2020-01-02,AAPL,216.79,221.37,215.51,217.65,8515186,2.73,1.93,1.52,2.86,0.18,0
2,2020-01-03,AAPL,209.71,210.58,207.9,208.23,13704556,1.73,2.0,1.44,4.31,0.06,0
3,2020-01-04,AAPL,213.8,216.45,210.78,212.32,6833411,1.77,2.32,1.59,2.59,0.35,0
4,2020-01-05,AAPL,209.33,211.16,206.23,210.03,11862560,1.9,1.81,1.12,4.88,0.27,0
