# Linear Regression Training with Enhanced Visualizations
This notebook trains a Linear Regression model for stock price prediction and provides detailed visualizations.

## Introduction
Models are trained per ticker: the data is filtered to a single ticker symbol before fitting, and the fitted model and scaler are saved for later use.

### Key Steps:
1. Load processed data from SQLite.
2. Filter data for a specific ticker (default: XOM).
3. Preprocess and normalize features.
4. Train a Linear Regression model.
5. Save the trained model and scaler for use in evaluation and predictions.
6. Visualize coefficients, residuals, and per-feature R² scores.


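## Import Libraries
- **pandas**: For data manipulation and analysis.
- **sqlite3**: For interacting with the SQLite database storing stock data.
- **scikit-learn**: For scaling features, splitting data, training the regression model, and evaluating performance.
- **joblib**: For saving trained models and scalers.
- **matplotlib**: For plotting coefficients, residuals, and per-feature R² scores.

In [None]:

import sqlite3

import joblib
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
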
## Load and Prepare Data

In [None]:

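# Path to the SQLite database holding the processed stock data
db_path = 'database/stocks_data.db'

# Load data from SQLite
# ("with sqlite3.connect(...)" only manages transactions and does not close
# the connection, so it is closed explicitly here)
conn = sqlite3.connect(db_path)
try:
    query = "SELECT * FROM processed_stocks"
    data = pd.read_sql(query, conn)
finally:
    conn.close()
print(f"Loaded processed data: {data.shape[0]} rows")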

# Filter data for the default ticker
default_ticker = 'XOM'
ticker_data = data[data['Ticker'] == default_ticker]
print(f"Loaded data for {default_ticker}: {ticker_data.shape[0]} rows")
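
The cell above loads every row and then filters in pandas. When the table is large, the filter can instead be pushed into the SQL query. The following is a minimal sketch, not part of the original notebook: the helper name `load_ticker_data` is hypothetical, and an in-memory database with made-up rows stands in for `database/stocks_data.db`.

```python
import sqlite3

import pandas as pd


def load_ticker_data(conn, ticker, table="processed_stocks"):
    """Return rows for one ticker, filtering inside SQLite instead of in pandas."""
    # Table names cannot be bound as parameters, so only pass trusted values here.
    # The ticker itself is bound through the "?" placeholder.
    query = f"SELECT * FROM {table} WHERE Ticker = ?"
    return pd.read_sql(query, conn, params=(ticker,))


# Tiny in-memory database standing in for the real file (made-up rows).
demo_conn = sqlite3.connect(":memory:")
pd.DataFrame(
    {
        "Ticker": ["XOM", "XOM", "CVX"],
        "Adj Close": [100.0, 101.5, 150.0],
    }
).to_sql("processed_stocks", demo_conn, index=False)

xom = load_ticker_data(demo_conn, "XOM")
print(f"Loaded data for XOM: {xom.shape[0]} rows")
demo_conn.close()
```

Binding the ticker as a parameter also avoids building the query by string concatenation when the ticker comes from user input.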


In [None]:

# Preprocess data for Linear Regression
features = ['7-day MA', '14-day MA', 'Volatility', 'Lag_1', 'Lag_2']
target = 'Adj Close'

X = ticker_data[features]
y = ticker_data[target]

# Split data into training and testing sets before scaling,
# so that test-set statistics do not leak into the scaler
X_train_raw, X_test_raw, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Normalize features: fit the scaler on the training set only, then apply it to both sets
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train_raw)
X_test = scaler.transform(X_test_raw)
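
The split above draws test rows at random, so the training set can contain rows that are later in time than some test rows. With lag and moving-average features, a time-ordered split is a common alternative. The sketch below is one way to do it under stated assumptions: rows are already sorted from oldest to newest, the helper name `chronological_split` is hypothetical, and the data is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def chronological_split(X, y, test_size=0.2):
    """Hold out the last `test_size` fraction of rows; assumes rows are in time order."""
    n_test = max(1, round(len(X) * test_size))
    split_at = len(X) - n_test
    return X.iloc[:split_at], X.iloc[split_at:], y.iloc[:split_at], y.iloc[split_at:]


# Synthetic stand-in for ticker_data (made-up values, treated as sorted by date).
rng = np.random.default_rng(42)
demo = pd.DataFrame({
    "Lag_1": rng.normal(100, 5, size=50),
    "Lag_2": rng.normal(100, 5, size=50),
})
demo["Adj Close"] = 0.6 * demo["Lag_1"] + 0.4 * demo["Lag_2"]

X_tr, X_te, y_tr, y_te = chronological_split(demo[["Lag_1", "Lag_2"]], demo["Adj Close"])

# Fit the scaler on the training rows only, then reuse it for the test rows.
demo_scaler = StandardScaler().fit(X_tr)
X_tr_scaled = demo_scaler.transform(X_tr)
X_te_scaled = demo_scaler.transform(X_te)
```

Whether this is preferable to a random split depends on how the model will be used. For forecasting future prices, holding out the most recent rows mimics deployment more closely.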


## Train Linear Regression Model

In [None]:

# Define and train Linear Regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Generate predictions
y_pred = model.predict(X_test)
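
Step 5 of the introduction calls for saving the trained model and scaler. A minimal sketch with `joblib` follows. The file names are hypothetical, a temporary directory is used so the example has no side effects, and synthetic data stands in for the notebook's fitted `model` and `scaler`.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for the notebook's fitted `scaler` and `model` (made-up data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 3))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + 3.0

demo_scaler = StandardScaler().fit(X_demo)
demo_model = LinearRegression().fit(demo_scaler.transform(X_demo), y_demo)

# Hypothetical output location; replace with a persistent directory in practice.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "linear_regression_XOM.joblib"
    scaler_path = Path(tmp_dir) / "scaler_XOM.joblib"

    joblib.dump(demo_model, model_path)
    joblib.dump(demo_scaler, scaler_path)

    # Reload both artifacts, as an evaluation or prediction notebook would.
    loaded_model = joblib.load(model_path)
    loaded_scaler = joblib.load(scaler_path)

# The round trip should reproduce the original predictions.
reloaded_pred = loaded_model.predict(loaded_scaler.transform(X_demo))
original_pred = demo_model.predict(demo_scaler.transform(X_demo))
```

Saving the scaler alongside the model matters because new data must be transformed with the same mean and scale that were used during training.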


## Visualization: Feature Importance

In [None]:

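# Visualize feature importance (coefficients)
# Features are standardized, so each coefficient is the change in predicted
# price per one standard deviation of that feature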
feature_importance = pd.DataFrame({
    'Feature': features,
    'Coefficient': model.coef_
}).sort_values(by='Coefficient', ascending=False)

plt.figure(figsize=(10, 6))
plt.bar(feature_importance['Feature'], feature_importance['Coefficient'])
plt.title('Feature Importance (Linear Regression)')
plt.xlabel('Feature')
plt.ylabel('Coefficient')
plt.grid()
plt.show()
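
Because the model is fitted on standardized features, its coefficients are expressed per standard deviation rather than per raw unit. They can be converted back using the scaler's `mean_` and `scale_` attributes. The sketch below illustrates the conversion on synthetic data; the variable names are illustrative and not taken from the notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Two made-up features on very different scales, with a known linear relationship.
rng = np.random.default_rng(1)
X_raw = np.column_stack([rng.normal(100, 20, 200), rng.normal(0.02, 0.005, 200)])
y_demo = 0.5 * X_raw[:, 0] + 300.0 * X_raw[:, 1] + 7.0

demo_scaler = StandardScaler().fit(X_raw)
demo_model = LinearRegression().fit(demo_scaler.transform(X_raw), y_demo)

# StandardScaler computes z = (x - mean) / scale, so dividing a coefficient by
# the feature's scale converts it from "per standard deviation" to "per raw unit".
coef_raw_units = demo_model.coef_ / demo_scaler.scale_
intercept_raw_units = demo_model.intercept_ - np.sum(
    demo_model.coef_ * demo_scaler.mean_ / demo_scaler.scale_
)
```

Note that moving averages and lagged prices are usually strongly correlated with each other, so individual coefficients can be unstable even when the overall fit is good.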


## Visualization: Residual Trends

In [None]:

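# Plot residuals in the original row order
# (the random split shuffles rows; sorting by index restores the order of the
# source table, which is chronological only if the table is stored by date)
residuals = (y_test - y_pred).sort_index()

plt.figure(figsize=(14, 7))
plt.plot(residuals.reset_index(drop=True), label='Residuals')
plt.axhline(y=0, color='red', linestyle='--', label='Zero Error Line')
plt.title('Residual Trends Over Time (Linear Regression)')
plt.xlabel('Test Samples (original row order)')
plt.ylabel('Residuals (Actual - Predicted)')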
plt.legend()
plt.grid()
plt.show()
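
The residuals plotted above can also be summarized numerically. The following sketch computes mean absolute error, root mean squared error, and R² with `sklearn.metrics`. The helper name `summarize_residuals` is hypothetical, and the prices are made up for illustration.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def summarize_residuals(y_true, y_pred):
    """Return MAE, RMSE, and R² for a set of predictions."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "mae": mean_absolute_error(y_true, y_pred),
        "rmse": float(np.sqrt(mse)),  # RMSE is in the same units as the target
        "r2": r2_score(y_true, y_pred),
    }


# Made-up prices for illustration only.
demo_actual = np.array([10.0, 12.0, 14.0, 16.0])
demo_predicted = np.array([11.0, 12.0, 13.0, 18.0])
metrics = summarize_residuals(demo_actual, demo_predicted)
print(metrics)
```

In the notebook, the same helper could be called as `summarize_residuals(y_test, y_pred)`.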


## Visualization: R² by Individual Feature

In [None]:

# Compute the test-set R² of a one-feature model for each feature
# (fit on the training set, score on the test set)
r2_values = []
for i in range(len(features)):
    temp_model = LinearRegression()
    temp_model.fit(X_train[:, [i]], y_train)
    r2_values.append(temp_model.score(X_test[:, [i]], y_test))

# Plot R² by feature
plt.figure(figsize=(10, 6))
plt.bar(features, r2_values)
plt.title('Test R² of Single-Feature Models (Linear Regression)')
plt.xlabel('Feature')
plt.ylabel('R² Score')
plt.grid()
plt.show()
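
Single-feature R² scores describe each feature in isolation, so they do not show how much a feature adds to the full model. One complementary measure is permutation importance, available as `sklearn.inspection.permutation_importance`: it shuffles one feature at a time on held-out data and records the drop in score. The sketch below is an illustration on synthetic data, not the notebook's own method.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Made-up data: feature 0 matters most, feature 1 less, feature 2 is pure noise.
rng = np.random.default_rng(7)
X_demo = rng.normal(size=(300, 3))
y_demo = 3.0 * X_demo[:, 0] + 1.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
demo_model = LinearRegression().fit(X_tr, y_tr)

# Shuffle each feature 10 times on the test set and average the drop in R².
result = permutation_importance(demo_model, X_te, y_te, n_repeats=10, random_state=42)

# Feature indices ordered from most to least important.
ranking = np.argsort(result.importances_mean)[::-1]
```

In the notebook, the equivalent call would pass the fitted `model` with `X_test` and `y_test`, and `result.importances_mean` could be plotted against `features` in the same bar-chart style.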
