## Introduction
This notebook trains and saves per-ticker machine learning models that predict a stock's adjusted closing price (`Adj Close`).

### Key Steps:
1. Load processed data from SQLite.
2. Filter data for a specific ticker (default: XOM).
3. Preprocess and normalize features.
4. Train a Linear Regression model.
5. Save the trained model and scaler for use in evaluation and predictions.


## Import Libraries
- **pandas**: For data manipulation and analysis.
- **sqlite3**: For interacting with the SQLite database storing stock data.
- **scikit-learn**: For splitting data, scaling features, training the regression model, and evaluating performance.
- **joblib**: For saving trained models and scalers.

In [1]:
# Import necessary libraries
import pandas as pd
import sqlite3
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, r2_score
import joblib

## Load Processed Data
- Load the preprocessed stock data from the `processed_stocks` table in the SQLite database (`database/stocks_data.db`).
- Print the number of rows loaded as a quick sanity check.
- The table covers multiple tickers; the next step narrows it to one.


In [2]:
# Path to SQLite database
db_path = 'database/stocks_data.db'

# Load processed data
with sqlite3.connect(db_path) as conn:
    query = "SELECT * FROM processed_stocks"
    data = pd.read_sql(query, conn)

print(f"Loaded processed data: {data.shape[0]} rows")

Loaded processed data: 67834 rows
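A row count alone does not show whether the columns used later are present and complete. A minimal sanity-check sketch, assuming the column names used in this notebook's feature and target cells; `check_processed_data` is a hypothetical helper, and the demo frame is made up. In the notebook it would be called as `check_processed_data(data)`.

```python
import numpy as np
import pandas as pd

# Columns the later cells rely on (names taken from this notebook)
REQUIRED_COLUMNS = ['Ticker', '7-day MA', '14-day MA', 'Volatility',
                    'Lag_1', 'Lag_2', 'Adj Close']

def check_processed_data(df: pd.DataFrame) -> pd.Series:
    """Raise if required columns are missing; otherwise return NaN counts per column."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"processed_stocks is missing columns: {missing}")
    # LinearRegression rejects NaN inputs, and rolling/lag features are NaN
    # for each ticker's first rows unless they were dropped upstream.
    return df[REQUIRED_COLUMNS].isna().sum()

# Tiny synthetic stand-in for `data`
demo = pd.DataFrame({
    'Ticker': ['XOM', 'XOM', 'CVX'],
    '7-day MA': [np.nan, 80.0, 150.0],
    '14-day MA': [np.nan, 79.0, 149.0],
    'Volatility': [np.nan, 1.2, 2.0],
    'Lag_1': [np.nan, 81.0, 151.0],
    'Lag_2': [np.nan, np.nan, 150.5],
    'Adj Close': [80.5, 81.5, 152.0],
})
nan_counts = check_processed_data(demo)
print(nan_counts[nan_counts > 0])  # only the feature columns with gaps
```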


## Filter Data for Default Ticker
- Keep only the rows for the selected ticker (default: `XOM`).
- Print the number of rows available for that ticker.
- The model is therefore trained only on that company's history.


In [3]:
# Step 1: Set default ticker
default_ticker = 'XOM'

# Step 2: Filter data for the default ticker
ticker_data = data[data['Ticker'] == default_ticker]
print(f"Loaded data for {default_ticker}: {ticker_data.shape[0]} rows")

Loaded data for XOM: 11791 rows
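Filtering in pandas means every ticker's rows are loaded first. A minimal alternative sketch pushes the filter into SQL with a parameterized query, so only one ticker's rows are read. It assumes the table and column names used in this notebook (`processed_stocks`, `Ticker`); `load_ticker` is a hypothetical helper, and the in-memory database and its rows exist only for the demo.

```python
import sqlite3
from contextlib import closing

import pandas as pd

def load_ticker(conn: sqlite3.Connection, ticker: str) -> pd.DataFrame:
    """Return only `ticker`'s rows from processed_stocks, filtered by SQLite."""
    # '?' is sqlite3's placeholder: the driver binds the value, so it is never
    # pasted into the SQL string (safe for user-supplied tickers).
    query = "SELECT * FROM processed_stocks WHERE Ticker = ?"
    return pd.read_sql(query, conn, params=(ticker,))

# Demo with an in-memory database and made-up rows.
# closing() guarantees the connection is closed; sqlite3's own `with` only commits.
with closing(sqlite3.connect(':memory:')) as conn:
    pd.DataFrame({'Ticker': ['XOM', 'CVX', 'XOM'],
                  'Adj Close': [80.5, 152.0, 81.5]}).to_sql('processed_stocks', conn, index=False)
    xom_rows = load_ticker(conn, 'XOM')

print(len(xom_rows))  # 2
```

In the notebook this would be `load_ticker(conn, default_ticker)` inside the existing connection block.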


## Define Features and Target
- Features (**X**): Independent variables used for predictions.
  - `7-day MA`, `14-day MA`, `Volatility`, `Lag_1`, `Lag_2`.
- Target (**y**): Dependent variable to be predicted (`Adj Close`).
- Prepares the dataset for model training and evaluation.


In [4]:
# Step 3: Define features (X) and target (y)
features = ['7-day MA', '14-day MA', 'Volatility', 'Lag_1', 'Lag_2']
target = 'Adj Close'

X = ticker_data[features]
y = ticker_data[target]
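The notebook reads these features ready-made from `processed_stocks`; how they were computed upstream is not shown here. As a hypothetical sketch only, assuming daily rows sorted by date, moving averages and lags taken over `Adj Close`, and `Volatility` defined as a 7-day rolling standard deviation of daily returns, the features could be built so that each uses only closes from before the row's own date. This matters because a moving average that includes the current day's close partly contains the target it is supposed to predict.

```python
import pandas as pd

def make_features(prices: pd.Series) -> pd.DataFrame:
    """Hypothetical feature builder for one ticker's 'Adj Close' series.

    Assumes `prices` is sorted oldest -> newest. Every feature is built from
    closes strictly before the row's date (via shift), so none of them
    contains the 'Adj Close' value that the row is meant to predict.
    """
    past = prices.shift(1)                    # previous close, aligned to today's row
    daily_ret = prices / prices.shift(1) - 1  # simple daily return
    feats = pd.DataFrame({
        '7-day MA': past.rolling(7).mean(),
        '14-day MA': past.rolling(14).mean(),
        'Volatility': daily_ret.shift(1).rolling(7).std(),  # assumed definition
        'Lag_1': past,
        'Lag_2': prices.shift(2),
        'Adj Close': prices,                  # target
    })
    return feats.dropna()                     # drop warm-up rows with incomplete windows

demo_prices = pd.Series([100.0 + i for i in range(30)])
demo_feats = make_features(demo_prices)
print(len(demo_feats))                    # 16 rows survive the 14-day warm-up
print(demo_feats['7-day MA'].iloc[0])     # mean of closes 107..113, i.e. 110
```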

## Splitting Data into Training and Testing Sets
- Divides the dataset into:
  - **Training Set**: 80% of the data, used to train the model.
  - **Testing Set**: 20% of the data, used to evaluate the model.
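- `train_test_split` shuffles rows by default (`random_state=42` only makes the shuffle reproducible), so test days are interleaved with training days. For time-series prices this gives an optimistic estimate, because the model sees neighboring days on both sides of each test day; a chronological split that trains on earlier dates and tests on later ones is a stricter check.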

In [5]:
# Step 4: Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
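A chronological split can be sketched with the same function by sorting by date and passing `shuffle=False`, which keeps row order so the most recent 20% of rows form the test set. The `Date` column is an assumption (it does not appear in this notebook), and the data below is synthetic; distinct variable names avoid overwriting `X_train`/`X_test`.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for `ticker_data`; the 'Date' column is an assumption
demo = pd.DataFrame({
    'Date': pd.date_range('2020-01-01', periods=10, freq='D'),
    'Lag_1': [float(i) for i in range(10)],
    'Adj Close': [float(i + 1) for i in range(10)],
}).sample(frac=1, random_state=0)        # rows deliberately out of order

demo = demo.sort_values('Date')          # 1. put rows in time order
X_demo, y_demo = demo[['Lag_1']], demo['Adj Close']

# 2. shuffle=False keeps that order, so the last 20% of dates become the test set
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, shuffle=False)

# Every training date precedes every test date
print(demo.loc[X_tr.index, 'Date'].max() < demo.loc[X_te.index, 'Date'].min())  # True
```

For cross-validation over several time windows, scikit-learn's `TimeSeriesSplit` applies the same "train on the past, test on the future" idea.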

## Normalize Features
- **StandardScaler**: Scales the features to have zero mean and unit variance.
- Fit the scaler on the training data and apply the transformation.
- Use the fitted scaler to transform the testing data.
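- Fitting on the training set only keeps test-set statistics out of training. Scaling puts the features on a common scale so the fitted coefficients are comparable; for ordinary least squares it does not change the predictions themselves.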


In [6]:
# Step 5: Normalize the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # Fit and transform training data
X_test_scaled = scaler.transform(X_test)       # Transform testing data
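The point that scaling leaves ordinary least squares predictions unchanged can be checked directly. This sketch uses synthetic features on deliberately mismatched scales (not the stock data) and names like `demo_scaler` chosen so the notebook's own `scaler` is not overwritten.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
col_scales = np.array([1.0, 100.0, 0.01])   # deliberately mismatched feature scales
X_fit = rng.normal(size=(200, 3)) * col_scales
y_fit = X_fit @ np.array([2.0, 0.03, 50.0]) + rng.normal(scale=0.1, size=200)
X_new = rng.normal(size=(50, 3)) * col_scales

demo_scaler = StandardScaler().fit(X_fit)   # statistics come from training rows only
raw_pred = LinearRegression().fit(X_fit, y_fit).predict(X_new)
scaled_pred = (LinearRegression()
               .fit(demo_scaler.transform(X_fit), y_fit)
               .predict(demo_scaler.transform(X_new)))

print(np.allclose(raw_pred, scaled_pred))                  # True: same OLS predictions
print(np.allclose(demo_scaler.mean_, X_fit.mean(axis=0)))  # True: mean learned from X_fit
```

Scaling still matters if the model is later swapped for a regularized one (e.g., ridge), where the penalty depends on feature scale.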

## Train the Ticker-Specific Model
- **Model**: Linear Regression.
- Trains the model using the normalized training dataset.
- Tailored specifically to the selected ticker's data.


In [7]:
# Step 6: Train a Linear Regression model
model = LinearRegression()
model.fit(X_train_scaled, y_train)
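After fitting, pairing each coefficient with its feature name shows what the model leans on. A minimal sketch; `coefficient_table` is a hypothetical helper, demonstrated on a tiny synthetic dataset. In the notebook it would be called as `coefficient_table(model, features)`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def coefficient_table(fitted: LinearRegression, feature_names) -> pd.Series:
    """Pair each fitted coefficient with its feature name, largest magnitude first."""
    coefs = pd.Series(fitted.coef_, index=list(feature_names))
    return coefs.sort_values(key=lambda s: s.abs(), ascending=False)

# Synthetic check: y = 3*a - 1*b + 5 exactly, so the fit recovers those values
X_demo = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 3.0], [3.0, 1.0]])
y_demo = 3 * X_demo[:, 0] - X_demo[:, 1] + 5
demo_model = LinearRegression().fit(X_demo, y_demo)
table = coefficient_table(demo_model, ['a', 'b'])
print(table)                  # a ≈ 3, b ≈ -1
print(demo_model.intercept_)  # ≈ 5
```

With standardized inputs, each coefficient is the change in predicted price for a one-standard-deviation increase in that feature, holding the others fixed. Moving averages and lags of the same price are highly correlated, so individual coefficients can be unstable even when predictions are good.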

## Model Evaluation
Uses the test data to evaluate the model's performance:
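1. **Mean Squared Error (MSE)**: The average squared difference between actual and predicted prices, in squared price units.
2. **R-squared (R²)**: The proportion of variance in the target variable explained by the model.
- Together they show how closely predictions track held-out prices. Read an R² near 1 with care: when the features include recent prices (as `Lag_1` and `Lag_2` presumably are), even a naive "tomorrow equals today" forecast can score close to 1.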
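For reference, the two metrics over $n$ test rows, with actual prices $y_i$, predictions $\hat{y}_i$, and test-set mean $\bar{y}$, are:

$$
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2,
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
$$

MSE is in squared price units, so its square root (RMSE) is the more readable typical error; for example, an MSE of 0.75 corresponds to an RMSE of about 0.87. Because the $R^2$ denominator measures the spread of prices across the whole test set, a series that moves over a wide range can score close to 1 even when day-to-day errors are not small.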


In [8]:
# Step 7: Evaluate the model
y_pred = model.predict(X_test_scaled)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Model Evaluation for {default_ticker}:")
print(f"Mean Squared Error (MSE): {mse:.2f}")
print(f"R-squared (R²): {r2:.2f}")

Model Evaluation for XOM:
Mean Squared Error (MSE): 0.75
R-squared (R²): 1.00
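One way to put the R² of 1.00 in context is to score a naive baseline that predicts each day's price as the previous day's. A minimal sketch, assuming `Lag_1` is the previous day's `Adj Close` (the notebook does not state this); `compare_to_naive` is a hypothetical helper. In the notebook it would be called as `compare_to_naive(y_test, y_pred, X_test['Lag_1'])`.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

def compare_to_naive(y_true, y_pred, lag_1):
    """Score a model against the 'tomorrow equals today' baseline.

    `lag_1` is assumed to hold the previous day's price for each test row.
    """
    return {
        'model_mse': mean_squared_error(y_true, y_pred),
        'naive_mse': mean_squared_error(y_true, lag_1),
        'model_r2': r2_score(y_true, y_pred),
        'naive_r2': r2_score(y_true, lag_1),
    }

# Synthetic demo: a steadily rising series where the naive forecast is always off by 1
prices = 100.0 + np.arange(50, dtype=float)
y_true, lag_1 = prices[1:], prices[:-1]
scores = compare_to_naive(y_true, lag_1 + 1.0, lag_1)  # this 'model' happens to be exact
print(scores)  # naive_mse = 1.0, yet naive_r2 ≈ 0.995
```

If the model's MSE is not clearly below the naive MSE on the real test set, the high R² mostly reflects how persistent daily prices are rather than added predictive skill.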


## Save the Model and Scaler
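- Saves the trained model as `models/model_<TICKER>_linear.pkl`.
- Saves the fitted scaler as `models/scaler_<TICKER>_linear.pkl`.
- File names are built from the ticker (e.g., `models/model_XOM_linear.pkl` and `models/scaler_XOM_linear.pkl`).
- Keeping one model/scaler pair per ticker lets an app load only the pair it needs. A model must always be used with the scaler saved alongside it, since it was trained on that scaler's output.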


In [9]:
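# Step 8: Save the trained model and scaler
import os

# joblib.dump raises an error if the target folder is missing, so create it first
os.makedirs('models', exist_ok=True)

model_filename = f'models/model_{default_ticker}_linear.pkl'
scaler_filename = f'models/scaler_{default_ticker}_linear.pkl'

joblib.dump(model, model_filename)
joblib.dump(scaler, scaler_filename)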

print(f"{default_ticker} model saved as '{model_filename}'")
print(f"{default_ticker} scaler saved as '{scaler_filename}'")

XOM model saved as 'models/model_XOM_linear.pkl'
XOM scaler saved as 'models/scaler_XOM_linear.pkl'
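A quick round trip confirms that the saved pair reproduces in-memory predictions. A minimal sketch on synthetic data (not the XOM model); `predict_with_saved` is a hypothetical helper. Because the notebook's scaler was fit on a DataFrame, it is safest to pass new rows as a DataFrame with the same column names. In the notebook it could be called as `predict_with_saved(model_filename, scaler_filename, X_test.head())`.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

FEATURES = ['7-day MA', '14-day MA', 'Volatility', 'Lag_1', 'Lag_2']

def predict_with_saved(model_path: str, scaler_path: str, rows: pd.DataFrame) -> np.ndarray:
    """Reload a saved model/scaler pair and predict 'Adj Close' for new feature rows."""
    model = joblib.load(model_path)
    scaler = joblib.load(scaler_path)
    # Scale with the *saved* scaler first, using the training column order
    return model.predict(scaler.transform(rows[FEATURES]))

# Round-trip demo on synthetic data
rng = np.random.default_rng(0)
train = pd.DataFrame(rng.normal(size=(30, 5)), columns=FEATURES)
sc = StandardScaler().fit(train)
lr = LinearRegression().fit(sc.transform(train), train.sum(axis=1))

with tempfile.TemporaryDirectory() as tmp:
    m_path = os.path.join(tmp, 'model_DEMO_linear.pkl')
    s_path = os.path.join(tmp, 'scaler_DEMO_linear.pkl')
    joblib.dump(lr, m_path)
    joblib.dump(sc, s_path)
    reloaded = predict_with_saved(m_path, s_path, train.head(3))

print(np.allclose(reloaded, lr.predict(sc.transform(train.head(3)))))  # True
```

`joblib.load` unpickles arbitrary objects, so it should only be pointed at files this project wrote itself.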


## Next Steps
- Extend the Flask web app to dynamically load models and scalers based on the selected ticker.
- Incorporate visualizations and interactivity for different tickers in Notebook 4.
- Evaluate model predictions for each ticker to ensure consistency across all datasets.


### Flask Integration Notes
1. **Dynamic Ticker Selection**:
   - Replace `default_ticker` with a variable passed from the Flask app.
   - Ensure the filtered dataset dynamically reflects the selected ticker.

2. **Loading Pre-Trained Models**:
   - Flask should load the matching model (`models/model_<TICKER>_linear.pkl`) and scaler (`models/scaler_<TICKER>_linear.pkl`) for the selected ticker.

3. **Interactive Input**:
   - Flask can accept user inputs for the features (e.g., `7-day MA`, `14-day MA`, etc.), transform them with the ticker's saved scaler, and pass the result to the matching model for a prediction.

4. **Visualizations**:
   - Extend the Flask app to display performance metrics and graphs for the selected ticker.
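Notes 1 and 2 can be sketched as a loader that a Flask view could call with the ticker taken from the request. This is a hypothetical helper, not part of the existing app: the `models` folder and file-name pattern follow this notebook, while the ticker whitelist and the cache size are assumptions. It has no Flask dependency, so it can be tested on its own.

```python
import re
from functools import lru_cache
from pathlib import Path

import joblib

# Hypothetical whitelist: uppercase letters, digits, '.' and '-', up to 10 characters.
# Rejecting path separators keeps user input from pointing outside the models folder.
_TICKER_RE = re.compile(r'[A-Z0-9.\-]{1,10}')

@lru_cache(maxsize=32)
def load_artifacts(ticker: str, models_dir: str = 'models'):
    """Return (model, scaler) for `ticker`, reading each pair from disk only once."""
    ticker = ticker.upper()
    if not _TICKER_RE.fullmatch(ticker):
        raise ValueError(f"invalid ticker: {ticker!r}")
    base = Path(models_dir)
    # File names follow the pattern this notebook uses when saving
    model_path = base / f'model_{ticker}_linear.pkl'
    scaler_path = base / f'scaler_{ticker}_linear.pkl'
    if not (model_path.exists() and scaler_path.exists()):
        raise FileNotFoundError(f"no saved model/scaler for {ticker} in {base}")
    return joblib.load(model_path), joblib.load(scaler_path)
```

Inside a Flask view this might be called as `model, scaler = load_artifacts(request.args.get('ticker', 'XOM'))`, with `ValueError` and `FileNotFoundError` mapped to 400 and 404 responses. Because `lru_cache` keeps loaded models in memory, call `load_artifacts.cache_clear()` (or restart the app) after retraining so new files are picked up.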
