## Tree Based Modeling for Interval Target (SAS Viya)

**EXAMPLE:** Tree Based Modeling for Interval Target using Python & SAS Viya  
**DATA SOURCE:**  
Data: bike_sharing_demand.csv   
Fanaee-T, H. (2013). Bike Sharing Dataset. UCI Machine Learning Repository. [Link](https://doi.org/10.24432/C5W894) 

**DESCRIPTION:** This template demonstrates a workflow for preprocessing data in Python and building predictive models using tree-based modeling techniques in SAS Viya.  
**PURPOSE:** The goal is to predict the count of bikes rented per hour using various predictor variables, such as weather, season, temperature, hour, month, and weekday.  
**DETAILS:**  
- Models built in SAS Viya include: Decision Tree, Forest, and Gradient Boosting
- Score the training, validation, and test data
- Model Assessment & Model Comparison: Feature Importance and Mean Squared Error plots


In [None]:
# Importing necessary libraries
import os
import pandas as pd
import warnings
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
from sasviya.ml.tree import DecisionTreeRegressor, ForestRegressor, GradientBoostingRegressor

# Suppress warnings
warnings.filterwarnings("ignore")

### Data Loading and Preprocessing
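- **Importing Data and Defining Variables**
    - Load the dataset.
    - Split it into training (40%), validation (30%), and test (30%) partitions.
    - Define the feature (`X`) and target (`y`) variables used for modeling.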

In [None]:
# Construct the workspace path
workspace = f"{os.path.abspath('')}/../../data/"

# Importing Data and Defining Variables
data = pd.read_csv(workspace + "bike_sharing_demand.csv")

# Splitting the data into Train, Validation, and Test sets (40% Train, 30% Validation, 30% Test)
train_data, temp_test_data = train_test_split(data, test_size=0.6, random_state=42)
val_data, test_data = train_test_split(temp_test_data, test_size=0.5, random_state=42)

# Create X and y variables for modeling
X_train, y_train = train_data.drop(columns=['count']), train_data['count']
X_val, y_val = val_data.drop(columns=['count']), val_data['count']
X_test, y_test = test_data.drop(columns=['count']), test_data['count']

# Print first 5 rows of train dataset
print("First 5 rows of the bike sharing training dataset:")
print(train_data.head(5))
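
The two calls to `train_test_split` above yield the 40/30/30 partition because the second call halves the 60% held out by the first. The sketch below checks that arithmetic on a synthetic 1,000-row frame; the column names `x` and `count` and the row count are placeholders, not values from the bike sharing data.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (hypothetical columns and size)
demo = pd.DataFrame({"x": range(1000), "count": range(1000)})

# Stage 1: hold out 60% of the rows, leaving 40% for training
train_demo, holdout_demo = train_test_split(demo, test_size=0.6, random_state=42)

# Stage 2: split the held-out 60% in half -> 30% validation, 30% test
val_demo, test_demo = train_test_split(holdout_demo, test_size=0.5, random_state=42)

split_sizes = {"train": len(train_demo), "validation": len(val_demo), "test": len(test_demo)}
for name, size in split_sizes.items():
    print(f"{name}: {size} rows ({size / len(demo):.0%})")
```

Because both calls fix `random_state`, rerunning the cell reproduces the same partitions.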

### Decision Tree Model Training, Scoring and Evaluation  

For more information about the SAS Viya Decision Tree Regressor, see [the documentation](https://documentation.sas.com/?cdcId=workbenchcdc&cdcVersion=default&docsetId=explore&docsetTarget=n1q2r5bpivhhavn1kcqs4cf2ase6.htm).
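
The next cell passes `criterion='variance'`. As a general illustration of what a variance-based criterion measures (this is the textbook calculation, not necessarily SAS Viya's internal implementation), the sketch below computes the variance reduction of one candidate split on made-up numbers. The helper `variance_reduction` is a hypothetical name introduced here.

```python
import numpy as np

def variance_reduction(x, y, threshold):
    """Parent variance minus the size-weighted variance of the two children."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    left, right = y[x <= threshold], y[x > threshold]
    if len(left) == 0 or len(right) == 0:
        return 0.0  # a split that leaves one side empty separates nothing
    weighted_child_var = (len(left) * left.var() + len(right) * right.var()) / len(y)
    return y.var() - weighted_child_var

# Made-up example: low targets for small x, high targets for large x
x_demo = [1, 2, 3, 10, 11, 12]
y_demo = [10, 12, 11, 30, 32, 31]

good_split = variance_reduction(x_demo, y_demo, threshold=5)
poor_split = variance_reduction(x_demo, y_demo, threshold=1)
print(f"threshold=5: {good_split:.2f}, threshold=1: {poor_split:.2f}")
```

A threshold that cleanly separates the low and high targets removes far more variance than one that isolates a single row, which is why a tree would prefer it.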


In [None]:
# Initialize the SAS Viya Decision Tree Regressor
sas_dtree = DecisionTreeRegressor(criterion='variance')

# Fit the model
sas_dtree.fit(X_train, y_train)

# Make predictions on training data
train_predictions = sas_dtree.predict(X_train)

# Evaluate model's performance on training data
dt_train_mse = mean_squared_error(y_train, train_predictions)
print(f"Training Mean Squared Error (Decision Tree): {dt_train_mse:.3f}")

# Make predictions on validation data
val_predictions = sas_dtree.predict(X_val)

# Evaluate model's performance on validation data
dt_val_mse = mean_squared_error(y_val, val_predictions)
print(f"Validation Mean Squared Error (Decision Tree): {dt_val_mse:.3f}")

# Make predictions on test data
test_predictions = sas_dtree.predict(X_test)

# Evaluate model's performance on test data
dt_test_mse = mean_squared_error(y_test, test_predictions)
print(f"Test Mean Squared Error (Decision Tree): {dt_test_mse:.3f}")

### Forest Model Training, Scoring and Evaluation
For more information about the SAS Viya Forest Regressor, see [the documentation](https://documentation.sas.com/?cdcId=workbenchcdc&cdcVersion=default&docsetId=explore&docsetTarget=n0ridmyac4ramsn10hbcolej70jv.htm).


In [None]:
# Initialize the SAS Viya Forest Regressor
sas_forest_model = ForestRegressor(n_estimators=100, random_state=42)

# Fit the model
sas_forest_model.fit(X_train, y_train)

# Make predictions on training data
rf_train_predictions = sas_forest_model.predict(X_train)

# Evaluate model's performance on training data
rf_train_mse = mean_squared_error(y_train, rf_train_predictions)
print(f"Training Mean Squared Error (Forest): {rf_train_mse:.3f}")

# Make predictions on validation data
rf_val_predictions = sas_forest_model.predict(X_val)

# Evaluate model's performance on validation data
rf_val_mse = mean_squared_error(y_val, rf_val_predictions)
print(f"Validation Mean Squared Error (Forest): {rf_val_mse:.3f}")

# Make predictions on test data
rf_test_predictions = sas_forest_model.predict(X_test)

# Evaluate model's performance on test data
rf_test_mse = mean_squared_error(y_test, rf_test_predictions)
print(f"Test Mean Squared Error (Forest): {rf_test_mse:.3f}")

### Gradient Boosting Model Training, Scoring and Evaluation
For more information about the SAS Viya Gradient Boosting Regressor, see [the documentation](https://documentation.sas.com/?cdcId=workbenchcdc&cdcVersion=default&docsetId=explore&docsetTarget=p1qf6527qwg4g5n179gglp8xzlgi.htm).


In [None]:
# Initialize the SAS Viya Gradient Boosting Regressor
sas_gb_model = GradientBoostingRegressor(n_estimators=100, random_state=42, calc_feature_importances=True)

# Fit the model
sas_gb_model.fit(X_train, y_train)

# Make predictions on training data
gb_train_predictions = sas_gb_model.predict(X_train)

# Evaluate model's performance on training data
gb_train_mse = mean_squared_error(y_train, gb_train_predictions)
print(f"Training Mean Squared Error (Gradient Boosting): {gb_train_mse:.3f}")

# Make predictions on validation data
gb_val_predictions = sas_gb_model.predict(X_val)

# Evaluate model's performance on validation data
gb_val_mse = mean_squared_error(y_val, gb_val_predictions)
print(f"Validation Mean Squared Error (Gradient Boosting): {gb_val_mse:.3f}")

# Make predictions on test data
gb_test_predictions = sas_gb_model.predict(X_test)

# Evaluate model's performance on test data
gb_test_mse = mean_squared_error(y_test, gb_test_predictions)
print(f"Test Mean Squared Error (Gradient Boosting): {gb_test_mse:.3f}")
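
Each of the three model cells repeats the same fit, predict, and score steps for every partition. One way to factor that out is a small helper, sketched below. `evaluate_partitions` and the stand-in `MeanRegressor` are hypothetical names used only so the example runs without a SAS Viya installation; the helper assumes the model exposes `fit` and `predict`, as the regressors above do.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error, computed directly with NumPy."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def evaluate_partitions(model, partitions):
    """Return {partition name: MSE} for an already fitted model."""
    return {name: mse(y, model.predict(X)) for name, (X, y) in partitions.items()}

class MeanRegressor:
    """Stand-in model that always predicts the training mean (illustration only)."""
    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)

# Tiny made-up partitions in place of X_train/y_train, X_val/y_val, X_test/y_test
demo_partitions = {
    "Training": ([[0], [1], [2]], [1.0, 2.0, 3.0]),
    "Validation": ([[3], [4]], [2.0, 4.0]),
    "Test": ([[5], [6]], [0.0, 2.0]),
}
X_fit, y_fit = demo_partitions["Training"]
demo_model = MeanRegressor().fit(X_fit, y_fit)
demo_scores = evaluate_partitions(demo_model, demo_partitions)
print(demo_scores)
```

With the notebook's variables, the same helper could be called on a fitted model with a dictionary that maps `'Training'`, `'Validation'`, and `'Test'` to the corresponding `(X, y)` pairs.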

### Plot Variable Importance chart


In [None]:
def plot_feature_importances(model, X_train):
    """
    Plot the feature importances for a given model based on the training partition.

    Parameters:
    model (object): Trained regression model whose `feature_importances_` attribute is a
        DataFrame with 'Variable' and 'Importance' columns.
    X_train (DataFrame): Training features used to fit the model.

    """
    # Extract feature importances from the model
    feature_importances = model.feature_importances_
    column_names = X_train.columns.tolist()
    matching_columns = set(column_names) & set(feature_importances['Variable'].tolist())
    importance_df = feature_importances[feature_importances['Variable'].isin(matching_columns)]
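    # Reassign instead of sorting in place: importance_df is a filtered slice,
    # and an in-place sort on it can trigger pandas' SettingWithCopyWarning
    importance_df = importance_df.sort_values(by='Importance', ascending=False)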

    # Plot variable importances
    plt.figure(figsize=(8, 6))
    plt.bar(importance_df['Variable'], importance_df['Importance'])
    plt.xlabel('Feature')
    plt.ylabel('Importance')
    plt.title(f'{model.__class__.__name__} Variable Importance')
    plt.xticks(rotation=45)
    plt.show()

models = [sas_dtree, sas_forest_model, sas_gb_model]
# Plot feature importances for each model based on the training partition
for model in models:
    plot_feature_importances(model, X_train)
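
`plot_feature_importances` treats `feature_importances_` as a DataFrame with `Variable` and `Importance` columns. The sketch below reproduces the filter-and-sort step on a hand-made table so its behavior is visible without a fitted model; the values and the `_extra_` row are invented for illustration.

```python
import pandas as pd

# Hand-made stand-in for model.feature_importances_ (invented values)
demo_importances = pd.DataFrame({
    "Variable": ["hour", "temp", "_extra_", "season"],
    "Importance": [0.55, 0.25, 0.05, 0.15],
})
demo_columns = ["season", "hour", "temp"]  # stand-in for X_train.columns

# Keep only rows that name a training column, then sort largest first.
# Reassigning the result avoids sorting a filtered slice in place.
kept = demo_importances[demo_importances["Variable"].isin(demo_columns)]
kept = kept.sort_values(by="Importance", ascending=False).reset_index(drop=True)
print(kept)
```

Rows whose `Variable` does not match a training column are dropped before plotting, so the bar chart shows only features that were actually passed to `fit`.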

### Overall Model Comparison
&emsp; Compare Mean Squared Error (MSE) across the models
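
For reference, MSE is the average of the squared differences between observed and predicted values, $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$, so it is expressed in squared target units (here, squared bike counts). The short sketch below computes it by hand on made-up numbers and takes the square root to get an error in the original units; the arrays are illustrative, not notebook results.

```python
import numpy as np

# Made-up observed and predicted counts
y_obs = np.array([3.0, 5.0, 2.0, 7.0])
y_hat = np.array([2.0, 5.0, 4.0, 8.0])

squared_errors = (y_obs - y_hat) ** 2   # per-row squared error
mse_by_hand = squared_errors.mean()     # average over rows
rmse_by_hand = np.sqrt(mse_by_hand)     # back in the target's own units

print(f"MSE = {mse_by_hand}, RMSE = {rmse_by_hand:.3f}")
```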

In [None]:
# Define partitions
partitions = ['Training', 'Validation', 'Test']

# Define model names
model_names = ['Decision Tree', 'Forest', 'Gradient Boosting']

# Define MSE values for each model and partition
mse_values = {
    'Decision Tree': [dt_train_mse, dt_val_mse, dt_test_mse],
    'Forest': [rf_train_mse, rf_val_mse, rf_test_mse],
    'Gradient Boosting': [gb_train_mse, gb_val_mse, gb_test_mse]
}

# Plotting the MSE values for each model by partition
plt.figure(figsize=(10, 6))
for model in model_names:
    mse_vals = mse_values[model]
    plt.plot(partitions, mse_vals, marker='o', label=model)

plt.xlabel('Partition')
plt.ylabel('Mean Squared Error')
plt.title('Mean Squared Error for Different Models by Partition')
plt.legend()
plt.grid(True)
plt.show()
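
The plot can be complemented by a table. The sketch below arranges per-partition MSE values in a DataFrame and picks the model with the lowest validation MSE, keeping the test partition for a final check. The numbers and model labels are placeholders, not results from this notebook; in practice the `mse_values` dictionary defined above would be passed instead.

```python
import pandas as pd

# Placeholder values only; substitute the notebook's mse_values dictionary
demo_mse_values = {
    "Model A": [100.0, 150.0, 155.0],
    "Model B": [60.0, 120.0, 125.0],
    "Model C": [80.0, 110.0, 118.0],
}
demo_partitions = ["Training", "Validation", "Test"]

# Rows are models, columns are partitions
mse_table = pd.DataFrame.from_dict(demo_mse_values, orient="index", columns=demo_partitions)

# Choose on validation error so the test partition stays an unbiased check
best_model = mse_table["Validation"].idxmin()
print(mse_table)
print(f"Lowest validation MSE: {best_model}")
```

A large gap between a model's training and validation MSE in such a table is a sign of overfitting, which the line plot shows as a steep rise from the first partition to the second.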
