# Final Analysis and Results

## Introduction
This notebook summarizes the final analysis and results of the Online News Popularity prediction project. 
The previous notebooks explored the data, engineered features, trained several machine learning models, 
and tuned the hyperparameters of the best-performing one (LightGBM). 
Here we load the tuned model, evaluate its performance on the test set, 
and present the feature importance analysis.

## 1. Load the Dataset and Model

In [4]:
# Import necessary libraries
import pandas as pd
import numpy as np
import joblib
import matplotlib.pyplot as plt
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Load the test set. This assumes X_test and y_test were pre-saved as .npy files
# in the working directory; otherwise, replace these lines with code that reloads the test data.
X_test = np.load('X_test.npy')  # Test features (placeholder path)
y_test = np.load('y_test.npy')  # Actual target values (placeholder path)

# Load the best LightGBM model after tuning
best_lgb_model = joblib.load('best_lgb_model.pkl')

FileNotFoundError: [Errno 2] No such file or directory: 'X_test.npy'
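
The `FileNotFoundError` above means `X_test.npy` was not in the notebook's working directory when the cell ran. A minimal sketch of a guarded load, assuming the arrays were saved with `np.save`; the helper name `load_array_if_present` is hypothetical and not part of the original notebook:

```python
from pathlib import Path

import numpy as np


def load_array_if_present(path):
    """Return the array stored at `path`, or None if the file does not exist."""
    file_path = Path(path)
    if not file_path.exists():
        # Report the absolute path so a wrong working directory is easy to spot.
        print(f"Missing file: {file_path.resolve()}")
        return None
    return np.load(file_path)
```

Calling `load_array_if_present('X_test.npy')` before the prediction cell would surface a missing file immediately, rather than through the `NameError` raised by later cells.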

## 2. Model Evaluation

In [7]:
# Predict on the test set
y_pred_tuned = best_lgb_model.predict(X_test)

# Calculate performance metrics
mae_tuned = mean_absolute_error(y_test, y_pred_tuned)
mse_tuned = mean_squared_error(y_test, y_pred_tuned)
rmse_tuned = np.sqrt(mse_tuned)
r2_tuned = r2_score(y_test, y_pred_tuned)

# Display the results
print(f'Tuned LightGBM Regressor - MAE: {mae_tuned:.2f}')
print(f'Tuned LightGBM Regressor - MSE: {mse_tuned:.2f}')
print(f'Tuned LightGBM Regressor - RMSE: {rmse_tuned:.2f}')
print(f'Tuned LightGBM Regressor - R2 Score: {r2_tuned:.2f}')


NameError: name 'best_lgb_model' is not defined
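
To make the four metrics concrete, here is a small worked example on made-up values (not project data), written with NumPy only. The formulas follow the standard definitions that `mean_absolute_error`, `mean_squared_error`, and `r2_score` implement; the function name `regression_metrics` is hypothetical:

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE, and R2 from their standard definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mae = np.mean(np.abs(residuals))
    mse = np.mean(residuals ** 2)
    rmse = np.sqrt(mse)  # same units as the target (shares)
    ss_res = np.sum(residuals ** 2)                  # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


# Illustrative values only
toy_metrics = regression_metrics([100, 200, 300, 400], [110, 190, 330, 380])
print(toy_metrics)
```

For these toy values the MAE is 17.5, the MSE is 375, the RMSE is about 19.36, and R² is 0.97. Because RMSE squares the residuals before averaging, the single 30-share miss pulls it above the MAE.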

## 3. Feature Importance Analysis

In [10]:
# Use the feature names stored in the trained model so they match the importance values one-to-one.
# (If the model was trained on a NumPy array, LightGBM assigns generic names such as 'Column_0'.)
feature_names = best_lgb_model.booster_.feature_name()

# Get feature importance by gain
importance_gain = best_lgb_model.booster_.feature_importance(importance_type='gain')

# Create a DataFrame for gain-based feature importance
feature_importance_gain_df = pd.DataFrame({'Feature': feature_names, 'Importance': importance_gain})
feature_importance_gain_df = feature_importance_gain_df.sort_values(by='Importance', ascending=False)

# Display feature importance
print(feature_importance_gain_df)

# Plot gain-based feature importance
plt.figure(figsize=(10, 8))
plt.barh(feature_importance_gain_df['Feature'], feature_importance_gain_df['Importance'], color='lightgreen')
plt.xlabel('Gain')
plt.title('Feature Importance by Gain for LightGBM')
plt.gca().invert_yaxis()  # Invert y-axis to show most important features at the top
plt.show()

NameError: name 'best_lgb_model' is not defined
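
Raw gain values are sums of loss reductions, so their magnitude depends on the scale of the target; they can be easier to read as shares of the total gain. A minimal sketch with made-up gain values and hypothetical feature names (the helper `gain_shares` is not part of the original notebook):

```python
import numpy as np


def gain_shares(feature_names, gains):
    """Return (feature, share_of_total_gain) pairs, largest share first."""
    gains = np.asarray(gains, dtype=float)
    shares = gains / gains.sum()
    order = np.argsort(shares)[::-1]  # indices from largest to smallest share
    return [(feature_names[i], float(shares[i])) for i in order]


# Illustrative values only, not taken from the trained model
toy_shares = gain_shares(["feature_a", "feature_b", "feature_c"], [30.0, 10.0, 60.0])
for name, share in toy_shares:
    print(f"{name}: {share:.1%}")
```

With the notebook's own objects, the inputs would be the model's feature names and `importance_gain`.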

## 4. Final Remarks
The final LightGBM model, after hyperparameter tuning, demonstrates strong performance in predicting the popularity of online news articles. 
Its RMSE and R² on the test set (Section 2) indicate that the model generalizes well to unseen data.

Feature importance analysis revealed that certain features, such as content length, title sentiment, and the timing of publication, 
contribute most to the model's predictions of the number of shares an article will receive.

Moving forward, continuous monitoring and periodic retraining on new data will help keep the model relevant and accurate over time.
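
One simple way to monitor the model is to compare its RMSE on each new batch of labeled articles against the RMSE measured on the test set. A minimal sketch under stated assumptions: the function name `needs_retraining`, the 10% tolerance, and any numbers passed to it are hypothetical choices, not part of the project:

```python
import numpy as np


def needs_retraining(baseline_rmse, y_true, y_pred, tolerance=0.10):
    """Flag a batch whose RMSE exceeds the baseline by more than `tolerance`."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    batch_rmse = float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
    # Assumed rule: retrain when the error grows beyond baseline * (1 + tolerance).
    return batch_rmse > baseline_rmse * (1.0 + tolerance)
```

In practice the tolerance and batch size would need to be chosen from how noisy the share counts are, since a small batch can exceed the threshold by chance.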