This project is an experiment to see how one can incorporate real world knowledge into classical ML models. 

Deep learning (DL) models are often not competitive on tabular datasets. Models like XGBoost are on par with the best that DL approaches have to offer, but require considerably less training time and are easier to tune. On the other hand, DL models can leverage large-scale language models like GPT-2. These language models have been trained on a trove of information and contain knowledge that can improve downstream performance on many tasks. DL models typically incorporate a pre-trained language model as a submodule and train the whole model end to end on the dataset. Such approaches cannot be easily extended to classical ML models that are not optimized with gradient descent.

The "knowledge" in a language model is usually contained in the embeddings it generates. Since these embeddings are just vectors, we can use them as inputs to classical ML models. So how do we get these embeddings? A simple way would be to convert a table row into a sentence and feed it to the language model. But the way we construct the sentence has a big impact on the embeddings the language model generates. Language models are trained to predict missing words, or the next word, given a sentence. If we naively convert a table row to a sentence, the model might not pick up on the task we expect it to do (e.g. generating an embedding that helps a decision tree regressor better predict a certain value). Ideally this embedding will contain some external information that can be used to infer our target variable.
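
To make the naive conversion concrete, here is a minimal sketch of turning one table row into a sentence. The helper `row_to_sentence` and the "column is value" template are assumptions made for illustration; they are not part of this project's code.

```python
def row_to_sentence(row: dict) -> str:
    """Naively serialize one table row as comma-separated 'column is value' clauses."""
    clauses = [f"{column} is {value}" for column, value in row.items()]
    return ", ".join(clauses) + "."


# Hypothetical row, shaped like the flight dataset loaded later in this notebook.
example_row = {"airport": "Memphis International", "Month": 12, "DayofMonth": 11}
sentence = row_to_sentence(example_row)
print(sentence)
```

A sentence like this carries the raw values, but nothing in it tells the language model what we want to predict, which is the gap that prompting tries to close.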

So how can we get the model to give us the "right" knowledge? This is where prompting comes in. Prompting involves crafting our input in such a way that it helps the language model grasp the larger context of what we expect it to do. Prompting has been successful in getting large language models like GPT-3 to perform, zero-shot, a variety of tasks they were never trained for.

There are a few ways of crafting such prompts. Here we will be trying out something called **prompt tuning**. Before we dive into prompting, let us first get a dataset for which we can easily incorporate some real world knowledge and build a baseline XGBoost model. 

### Imports

In [1]:
from copy import deepcopy
import itertools
from functools import partial

import polars as pl
import numpy as np

from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

import torch
from torch.optim import Adam
from torch.nn import CosineEmbeddingLoss
from torch.utils.data import DataLoader

from transformers import AutoTokenizer, AutoModel
from sentence_transformers import SentenceTransformer, util
import xgboost

# custom modules
from data_utils import load_data
from model import PrefrozenEmbeddings, embed_sentences
from prompt_utils import *

from tqdm.notebook import tqdm




### Load the Dataset

> We will be using a library called Polars for loading our datasets into dataframes. Polars is similar to Pandas but provides a more elegant API.

In [2]:
df = load_data(2002, 2005)
df.head()

| Year (i64) | Month (i64) | DayofMonth (i64) | DayOfWeek (i64) | Origin (str) | counts (u32) | airport (str) |
| --- | --- | --- | --- | --- | --- | --- |
| 2002 | 1 | 28 | 1 | "ANC" | 48 | "Ted Stevens Anchorage International" |
| 2002 | 12 | 11 | 3 | "MEM" | 135 | "Memphis International" |
| 2003 | 1 | 21 | 2 | "PHX" | 486 | "Phoenix Sky Harbor International" |
| 2003 | 3 | 18 | 2 | "GRR" | 40 | "Kent County International" |
| 2003 | 5 | 14 | 3 | "DSM" | 29 | "Des Moines International" |


The dataset above contains the number of flights from a specific airport in the United States on a particular date. The number of flights is given by the `counts` column. Our task is to predict this number, given the date and the airport.

`Origin` is the unique IATA identifier of the airport in question. The actual name of the airport is given in the `airport` column.

Right away we can see how external knowledge can help us here. Take an airport like Chicago O'Hare International. It is known to be pretty busy, with an average of around 1000 flights coming into and going out of the airport per day. Compare that to an airport like Dawson Community Airport, one of the quietest airports in the United States. Knowing just these two facts, we can already make a reasonable guess at the average number of flights per day for each.

Let's take an XGBoost regressor as the baseline and see how well it fares on this dataset. As usual with any ML task, we first need to split the dataset and prep it for the XGBoost model. The following code does this.

Split the dataset into train and test sets

In [3]:
# Take data from 2002 to 2004 inclusive
df_train = df.filter((pl.col('Year') >= 2002) & (pl.col('Year') <= 2004))
# Shuffle the dataframe using train_test_split. Note that test_size=0.1 keeps a random 90% of
# the rows as the train split and discards the remaining 10%
df_train, _ = train_test_split(df_train, test_size=0.1)

# Take data from the year 2005 for the test split
df_test = df.filter(pl.col('Year') == 2005)
# Shuffle the data and take a random sample of 1/2 of the total data
_, df_test = train_test_split(df_test, test_size=0.5)

In [4]:
# Make sure that no airport appears in the test split without also appearing in the train split
train_airports = set(df_train.select('Origin').distinct()['Origin'].to_list())
test_airports = set(df_test.select('Origin').distinct()['Origin'].to_list())

df_test = df_test.filter(
    pl.col('Origin').is_in(list(test_airports - train_airports)).is_not()
)

---

### Train a baseline XGBoost model

With our train and test sets defined, let's train an XGBoost regression model. For the sake of simplicity, let's just use the default model parameters instead of doing a hyperparameter search.

Featurize the train and test sets

In [5]:
# We need to convert each airport id from a string to an integer. Using the Label encoder from scikit-learn for this purpose
# The label encoder is fit for the whole dataset to prevent OOV when trying to transform the test set
all_origins = df.select('Origin').distinct().Origin.to_numpy()
origin_encoder = LabelEncoder()
origin_encoder.fit(all_origins)

# Apply the fitted label encoder to featurize the train set
X_train = df_train.with_column(
    pl.Series('origin_encoded', origin_encoder.transform(df_train.Origin.to_numpy()))
).select([
    pl.all().exclude(['Year', 'Origin', 'counts', 'airport', 'Month_name'])
]).to_numpy()

y_train = df_train.counts.to_numpy()


# Apply the fitted label encoder to featurize the test set
X_test = df_test.with_column(
    pl.Series('origin_encoded', origin_encoder.transform(df_test.Origin.to_numpy()))
).select([
    pl.all().exclude(['Year', 'Origin', 'counts', 'airport', 'Month_name'])
]).to_numpy()

y_test = df_test.counts.to_numpy()
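
As a rough illustration of what the label encoding step above does, the sketch below maps each distinct airport code to its position in the sorted list of codes, which is how scikit-learn's `LabelEncoder` assigns integers. The helper names `fit_label_map` and `transform` are hypothetical, and the codes are taken from the sample rows shown earlier.

```python
def fit_label_map(values):
    """Map each distinct label to its index in sorted order."""
    return {label: index for index, label in enumerate(sorted(set(values)))}


def transform(values, label_map):
    """Encode labels as integers. In this sketch an unseen label raises KeyError."""
    return [label_map[value] for value in values]


# Fit on every code in the dataset so that the test split never hits an unseen label.
all_origins = ["PHX", "ANC", "MEM", "ANC"]
label_map = fit_label_map(all_origins)
encoded = transform(["MEM", "PHX", "ANC"], label_map)
print(label_map)
print(encoded)
```

The resulting integers are arbitrary identifiers: their order reflects alphabetical sorting of the codes and says nothing about how busy an airport is, which is part of the motivation for adding embeddings later.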

Train the baseline XGBoost model

In [6]:
xgb = xgboost.XGBRegressor(
    objective='reg:squarederror',
    n_jobs = -1,
)

model = xgb.fit(X_train, y_train)
y_pred = model.predict(X_test)

r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)

print(f'Mean Absolute Error (MAE): \t{mae} \n\
Mean Squared Error (MSE): \t{mse} \n\
R2 Score (R2):\t\t\t{r2}')

Mean Absolute Error (MAE): 	12.653685640736983 
Mean Squared Error (MSE): 	1106.991683366821 
R2 Score (R2):			0.950744653908094


The above metrics give the performance of our baseline model. A better model than this baseline should have a lower MAE and MSE and a higher R2.
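
For reference, these are the standard definitions of the three metrics, where $y_i$ is the true number of flights for row $i$, $\hat{y}_i$ is the model's prediction, $\bar{y}$ is the mean of the true values, and $n$ is the number of test rows (the symbols are notation introduced here, not names from the code):

$$
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|,
\qquad
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2,
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
$$

MSE squares each error, so it is dominated by the few rows with large errors, which is consistent with the MSE here being far larger than the square of the MAE.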

---

### Prompting

Before going into prompt tuning, let's try out the approach of converting a table row into a sentence. Ideally we want to use only columns that make sense to a language model.

In [7]:
# Print all the columns in the dataframe
df.columns

['Year', 'Month', 'DayofMonth', 'DayOfWeek', 'Origin', 'counts', 'airport']

Right away we can see that the **airport** column is a good candidate for feeding into the language model. Take an airport like *Chicago O'Hare International*. It is known to be pretty busy, with an average of around 1000 flights coming into and going out of the airport per day. Compare that to an airport like Dawson Community Airport, one of the quietest airports in the United States. Knowing just these two facts, we can already make a reasonable guess at the average number of flights per day for each.

A large language model should have come across these associations. If we can somehow craft a sentence that teases this knowledge out, we can feed the result into an XGBoost model for it to learn from.

One idea is to craft a query sentence and a set of value sentences as follows:

>**Query**:
>
>How crowded is the Chicago O'Hare International airport?

>**Values**: 
>- The Chicago O'Hare International airport is very crowded. There are more than 800 flights every day
>- The Chicago O'Hare International airport is moderately crowded. There are around 400 flights every day
>- The Chicago O'Hare International airport is slightly crowded. There are around 100 flights every day
>- The Chicago O'Hare International airport is not crowded. There are less than 50 flights every day

Here we have a set of value sentences that describe the level of *crowdedness* of the airport. We can ask a language model for the value sentence that makes the most sense given the query sentence. Chicago O'Hare International airport is pretty crowded and has more than 800 flights every day (on average). This is a kind of question answering task, and here the right answer is the first sentence in the values list. Hopefully the language model has learnt enough to pick the right one. Also note that each value sentence contains an estimate of the value that we want to predict (the number of flights from the airport). This information should help the downstream model that we use for the regression task.

So how do we use this in practice and how can we feed the information in the correct value sentence to an XGBoost model?

A straightforward (maybe a bit simplistic) approach is to convert each sentence into an embedding. Thus we have a query embedding and a set of value embeddings. Finding the right value sentence can be seen as selecting the value embedding that has the highest cosine similarity score with the query embedding.
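
The selection step can be sketched in plain Python. The vectors below are made-up toy embeddings, not outputs of a language model, and `pick_most_similar` is a hypothetical helper used only for this illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pick_most_similar(query, values):
    """Return the index of the value vector closest to the query, plus all scores."""
    scores = [cosine_similarity(query, value) for value in values]
    best_index = max(range(len(values)), key=scores.__getitem__)
    return best_index, scores


# Toy embeddings: the first value points roughly the same way as the query,
# the second is orthogonal to it, and the third points the opposite way.
query = [1.0, 0.0, 1.0]
values = [[0.9, 0.1, 0.8], [0.0, 1.0, 0.0], [-1.0, 0.0, -1.0]]
best_index, scores = pick_most_similar(query, values)
print(best_index, scores)
```

Because cosine similarity ignores vector length, only the direction of each embedding matters when picking the winner.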

> This approach of using cosine similarity doesn't always work, which is why I called it a bit simplistic. We will look at a better approach to prompting later.

After getting the right value embedding, we can feed it as part of the input to an XGBoost model. We can use the *sentence_transformers* library for the purpose of embedding query and value sentences. 

In [8]:
# The main task of the following function is to return a set of embeddings that capture the level of `crowdedness` of airports.
# The df variable is a dataframe that can be either the train or test split of the original dataset. This function returns a matrix
# having the same number of rows as df. 
# Each row in the matrix is an embedding that captures the level of `crowdedness` of the airport
# present in the same row of the df variable.
def generate_embeddings(df, language_model):
    # Remove duplicate entries in the df. This is to prevent duplicated effort when embedding sentences
    origins = df.select(['Origin', 'airport']).distinct()

    embeddings = []
    for airport in origins['airport']:
        # For every airport, craft the possible query and value sentences. Then embed them using the language model
        query = language_model.encode(f'How crowded is the {airport} airport?')
        values = language_model.encode([
            f'The {airport} airport is very crowded. There are more than 800 flights every day',
            f'The {airport} airport is moderately crowded. There around 400 flights every day',
            f'The {airport} airport is slightly crowded. There are around 100 flights every day',
            f'The {airport} airport is not crowded. There are less than 50 flights every day'
        ])

        # Find the most similar value embedding to a query and append it to the list
        sims = util.cos_sim(query, values)
        most_sim_idx = np.argmax(sims)
        embeddings.append(values[most_sim_idx])

    embeddings = np.vstack(embeddings)
    origins = origins.with_column(pl.Series('prompt_embeddings', embeddings))

    # Since we had removed duplicate entries before, we need to repopulate them when we eventually merge
    # the embeddings data with other dataset features.
    origin_embedding_map = {k:v for k,v in origins.select(['Origin', 'prompt_embeddings']).rows()}
    embeddings = df.select(
        pl.col('Origin').apply(lambda x: origin_embedding_map[x]).alias('embeddings')
    )
    embeddings = embeddings['embeddings'].to_list()
    embeddings = np.vstack(embeddings)
    return embeddings

Generate the airport embeddings for both the train and test sets. Instead of a large-scale model like BERT as our language model, let's use a much smaller one to keep the computational overhead low.

In [9]:
language_model = SentenceTransformer('all-MiniLM-L6-v2', device='cuda', cache_folder='model_cache')
X_embs_train = generate_embeddings(df_train, language_model)
X_embs_test = generate_embeddings(df_test, language_model)

Featurize the train and test sets and attach the airport embeddings that we got earlier

In [10]:
all_origins = df.select('Origin').distinct().Origin.to_numpy()
origin_encoder = LabelEncoder()
origin_encoder.fit(all_origins)

X_train = df_train.with_column(
    pl.Series('origin_encoded', origin_encoder.transform(df_train.Origin.to_numpy()))
).select([
    pl.all().exclude(['Year', 'Origin', 'counts', 'airport', 'Month_name'])
]).to_numpy()

X_train = np.hstack([X_train, X_embs_train])
y_train = df_train.counts.to_numpy()


X_test = df_test.with_column(
    pl.Series('origin_encoded', origin_encoder.transform(df_test.Origin.to_numpy()))
).select([
    pl.all().exclude(['Year', 'Origin', 'counts', 'airport', 'Month_name'])
]).to_numpy()

X_test = np.hstack([X_test, X_embs_test])
y_test = df_test.counts.to_numpy()

Train the knowledge-enhanced XGBoost model

In [11]:
xgb = xgboost.XGBRegressor(
    objective='reg:squarederror',
    n_jobs=-1)

model = xgb.fit(X_train, y_train)
y_pred = model.predict(X_test)

r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)

print(f'Mean Absolute Error (MAE): \t{mae} \n\
Mean Squared Error (MSE): \t{mse} \n\
R2 Score (R2):\t\t\t{r2}')

Mean Absolute Error (MAE): 	11.818757277005197 
Mean Squared Error (MSE): 	1035.2457913406263 
R2 Score (R2):			0.9539369712448194


We can already see an improvement in performance from incorporating those embeddings. But this approach comes with a lot of downsides:
- How do we make sure that the format of the query and value sentences is okay?
- We have defined 4 levels of *crowdedness* for our value sentences. What if we used only 3?
- In each level, we have mentioned a threshold. Is that the right one to use?
- What if we want to incorporate another attribute like month name?

I arrived at the template you saw before after a lot of trial and error. This approach is not feasible in the long run.

---

### Prompt Tuning

Until now we have been hand-crafting the input to get the model to do what we want. Prompt tuning turns the idea around: let the model decide what kind of input it needs to see to produce the desired output. This idea was introduced in this [paper](https://arxiv.org/abs/2104.08691), where it was used in the context of NLP tasks. Here we extend the idea to tabular datasets.

<br>

So how does it work? Let's start with the same query-value approach as last time, but here we use only one value statement:

>Query -  _ _ _ _ _ ; Airport: Chicago O'Hare International, Number of flights: _

>Value - Airport: Chicago O'Hare International, Number of flights: 800

Notice the _ present in the query statement. There are a bunch of them at the start of the sentence and another one at the end. We are going to let the model figure out the way to fill in those blanks in such a way that the cosine similarity of the filled query embedding and the value embedding is maximized. 


So how do we make the model fill them? The idea is to introduce special trainable embeddings. During the training phase, the language model decides how these embeddings should be modified. To make things easier for the language model, each airport has its own set of trainable embeddings. Think of these as specific notes that the language model creates for each airport to help it decide the number of flights. In practice, our query sentence looks like the one below:

>Query: [ORD_0] [ORD_1] [ORD_2] [ORD_3] [ORD_4]; Airport: Chicago O'Hare International, Number of flights: [VAL]

ORD is the IATA code for Chicago O'Hare International airport (the *Origin* column in the dataframe has this information). [ORD_0] is a special token that represents the first trainable embedding for the Chicago O'Hare International airport. [VAL], on the other hand, is a special token that stands in for the number of flights.
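
To build some intuition for what "training only the special embeddings" means, here is a deliberately tiny sketch in plain Python. It is not the model used in this notebook: the query embedding is assumed to be just the sum of one trainable vector (`soft_prompt`) and one frozen vector (`frozen_context`), and the gradient is written out by hand instead of coming from autograd. All names and numbers are made up for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def cosine_grad(q, v):
    """Gradient of cos(q, v) with respect to q."""
    q_norm = math.sqrt(dot(q, q))
    v_norm = math.sqrt(dot(v, v))
    cos = dot(q, v) / (q_norm * v_norm)
    return [vi / (q_norm * v_norm) - cos * qi / q_norm ** 2 for qi, vi in zip(q, v)]


frozen_context = [1.0, 0.0, 0.0]   # stands in for the frozen part of the query
value_embedding = [0.0, 1.0, 0.0]  # stands in for the value sentence embedding
soft_prompt = [0.0, 0.1, 0.1]      # the only quantity that gets updated


def query_embedding(prompt):
    """Toy query embedding: trainable prompt vector plus frozen context vector."""
    return [p + c for p, c in zip(prompt, frozen_context)]


initial_loss = 1.0 - cosine(query_embedding(soft_prompt), value_embedding)

learning_rate = 0.5
for _ in range(200):
    grad = cosine_grad(query_embedding(soft_prompt), value_embedding)
    # Descending on loss = 1 - cos is the same as ascending on the cosine similarity.
    soft_prompt = [p + learning_rate * g for p, g in zip(soft_prompt, grad)]

final_loss = 1.0 - cosine(query_embedding(soft_prompt), value_embedding)
print(initial_loss, final_loss)
```

The frozen vector never changes, yet the loss still drops because the trainable vector learns to compensate for it. In the real setup the same thing happens inside the transformer: its weights stay fixed and only the per-airport token embeddings move.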

Generate query and value pairs using the prompt tuning template

In [12]:
query_prompt_format = pl.format(
    '{}; Airport: {}, Number of flights: {}', 
    pl.col('airport_tokens'), pl.col('airport'), pl.lit('[VAL]')
)

value_prompt_format = pl.format(
    'Airport: {}, Number of flights: {}', 
    pl.col('airport'), pl.col('counts')
)

num_prompts = 8
query_value_prompts_train = generate_query_value_prompts(df_train, query_prompt_format, value_prompt_format, num_prompts)
query_value_prompts_test = generate_query_value_prompts(df_test, query_prompt_format, value_prompt_format, num_prompts)

query_value_prompts_train.head()

| query (str) | value (str) |
| --- | --- |
| `[FAY_0] [FAY_1] [FAY_2] [FAY_3] [FAY_4] [FAY_5] [FAY_6] [FAY_7]; Airport: Fayetteville Municipal, Number of flights: [VAL]` | `Airport: Fayetteville Municipal, Number of flights: 0` |
| `[CVG_0] [CVG_1] [CVG_2] [CVG_3] [CVG_4] [CVG_5] [CVG_6] [CVG_7]; Airport: Cincinnati Northern Kentucky Intl, Number of flights: [VAL]` | `Airport: Cincinnati Northern Kentucky Intl, Number of flights: 100` |
| `[ORD_0] [ORD_1] [ORD_2] [ORD_3] [ORD_4] [ORD_5] [ORD_6] [ORD_7]; Airport: Chicago O'Hare International, Number of flights: [VAL]` | `Airport: Chicago O'Hare International, Number of flights: 900` |
| `[CHO_0] [CHO_1] [CHO_2] [CHO_3] [CHO_4] [CHO_5] [CHO_6] [CHO_7]; Airport: Charlottesville-Albermarle, Number of flights: [VAL]` | `Airport: Charlottesville-Albermarle, Number of flights: 0` |
| `[CSG_0] [CSG_1] [CSG_2] [CSG_3] [CSG_4] [CSG_5] [CSG_6] [CSG_7]; Airport: Columbus Metropolitan, Number of flights: [VAL]` | `Airport: Columbus Metropolitan, Number of flights: 0` |


The above gives a sample of what the query and value statements look like for a few airports. Here we have set the number of special trainable embeddings per airport to 8. This can be changed through the **num_prompts** variable.

Now we are in a position to download the necessary language model and modify it to accommodate these special embeddings. As before, we will be using a small language model instead of BERT. We are going to train this model by freezing everything in the language model except the special token embeddings that we add to it. I found this method to be more reliable than fine-tuning the whole model.
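
The query strings in the sample can be reproduced with a few lines of Python. This is only a guess at the shape of the logic: `make_airport_tokens` and `make_query` are hypothetical helpers written for this illustration, and the real `airport_token_sequencer` and `generate_query_value_prompts` in `prompt_utils` may differ.

```python
def make_airport_tokens(origin: str, num_prompts: int) -> list:
    """Build the special token names for one airport, numbered from 0."""
    return [f"[{origin}_{i}]" for i in range(num_prompts)]


def make_query(origin: str, airport: str, num_prompts: int) -> str:
    """Assemble a query sentence in the format shown in the sample above."""
    tokens = " ".join(make_airport_tokens(origin, num_prompts))
    return f"{tokens}; Airport: {airport}, Number of flights: [VAL]"


tokens = make_airport_tokens("ORD", 3)
query = make_query("ORD", "Chicago O'Hare International", 8)
print(tokens)
print(query)
```

Each airport contributes `num_prompts` new tokens to the vocabulary, so the number of trainable embedding rows grows linearly with both the number of airports and `num_prompts`.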

Download and set up the language model

In [13]:
airport_tokens = df.select('Origin').distinct()
airport_tokens = airport_tokens['Origin']
airport_tokens = airport_tokens.apply(partial(airport_token_sequencer, num_prompts=num_prompts)).to_list()
airport_tokens = list(itertools.chain.from_iterable(airport_tokens))

# Get the language model and tokenizer from huggingface
tokenizer = AutoTokenizer.from_pretrained('nreimers/MiniLM-L6-H384-uncased')
language_model = AutoModel.from_pretrained('nreimers/MiniLM-L6-H384-uncased')

# Add the special tokens into the tokenizer
num_added_tokens = tokenizer.add_tokens(airport_tokens, special_tokens=True)    # Add airport tokens
num_added_tokens += tokenizer.add_tokens(['[VAL]'], special_tokens=True)        # Add [VAL] token
assert num_added_tokens == len(airport_tokens) + 1

# create a copy of the original word token embeddings
pretrained_word_embeddings = deepcopy(language_model.embeddings.word_embeddings)
# call resize_token_embeddings to accommodate the new special tokens
language_model.resize_token_embeddings(len(tokenizer))

# freeze all layers in the model
for param in language_model.parameters():
    param.requires_grad = False

# create an instance of PrefrozenEmbeddings. This lets us have an embedding layer where only a portion of it is not frozen
prefrozen_word_embeddings = PrefrozenEmbeddings(pretrained_word_embeddings, num_added_tokens)
# update the instance of word_embeddings to prefrozen ones
language_model.embeddings.word_embeddings = prefrozen_word_embeddings

Set up dataloaders for training

In [14]:
train_dataloader = DataLoader(
    query_value_prompts_train.distinct().to_numpy(),        # type: ignore
    shuffle=True, batch_size=128, collate_fn=lambda x: x
)
      
test_dataloader = DataLoader(
    query_value_prompts_test.distinct().to_numpy(),         # type: ignore
    shuffle=True, batch_size=128, collate_fn=lambda x: x
)        

In [15]:
def train_step(model, optimizer, criterion, tokenizer, device):
    epoch_train_loss = 0.0
    model.train()
    for batch in train_dataloader:
        batch = np.vstack(batch)
        query_embeddings = embed_sentences(batch[:, 0].tolist(), model, tokenizer, device)
        value_embeddings = embed_sentences(batch[:, 1].tolist(), model, tokenizer, device)
        labels = torch.ones(batch.shape[0], dtype=torch.int64).to(device)
        loss = criterion(query_embeddings, value_embeddings, labels)

        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        epoch_train_loss += loss.item()
    
    return epoch_train_loss


def valid_step(model, criterion, tokenizer, device):
    epoch_valid_loss = 0.0
    model.eval()
    with torch.no_grad():
        for batch in test_dataloader:
            batch = np.vstack(batch)
            query_embeddings = embed_sentences(batch[:, 0].tolist(), model, tokenizer, device)
            value_embeddings = embed_sentences(batch[:, 1].tolist(), model, tokenizer, device)
            labels = torch.ones(batch.shape[0], dtype=torch.int64).to(device)
            loss = criterion(query_embeddings, value_embeddings, labels)

            epoch_valid_loss += loss.item()

    return epoch_valid_loss


def train(model, optimizer, criterion, device):
    train_losses = []
    valid_losses = []

    num_epochs = 100
    for e in tqdm(range(num_epochs)):

        epoch_train_loss = train_step(model, optimizer, criterion, tokenizer, device)
        epoch_valid_loss = valid_step(model, criterion, tokenizer, device)

        train_losses.append(epoch_train_loss / len(train_dataloader))
        valid_losses.append(epoch_valid_loss / len(test_dataloader))

        # Log the average losses every 10 epochs and after the final epoch
        if e % 10 == 0 or e == num_epochs - 1:
            print(e, train_losses[-1], valid_losses[-1])

Train the language model with the Adam optimizer

In [16]:
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
optimizer = Adam(language_model.parameters(), lr=1e-2)
criterion = CosineEmbeddingLoss()
language_model = language_model.to(device)

train(language_model, optimizer, criterion, device)


0 0.09843588123718898 0.07993354896704356
10 0.07015504688024521 0.04951440046230952
20 0.05861681327223778 0.03597246979673704
30 0.05252701913317045 0.028399234637618065
40 0.04823049157857895 0.024140097200870514
50 0.04554252326488495 0.021263479565580685
60 0.043673320362965264 0.01907672422627608
70 0.041288524866104126 0.017413972566525143
80 0.04170997813344002 0.016202207033832867
90 0.039898098756869636 0.015267488546669483
99 0.03942900151014328 0.014578453886012236
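
For reference, the numbers logged above are averages of PyTorch's `CosineEmbeddingLoss`. For a pair of embeddings $x_1$, $x_2$ and a label $y$, the per-pair loss is

$$
\ell(x_1, x_2, y) =
\begin{cases}
1 - \cos(x_1, x_2) & \text{if } y = 1 \\
\max\bigl(0,\ \cos(x_1, x_2) - \text{margin}\bigr) & \text{if } y = -1
\end{cases}
$$

Since every label in `train_step` and `valid_step` is set to 1, only the first case applies here, so each batch loss is the mean of $1 - \cos(q_i, v_i)$ over the query and value embeddings $q_i$, $v_i$ in the batch (the symbols $q_i$ and $v_i$ are notation introduced here). A loss near 0 therefore means the query embeddings point in almost the same direction as their value embeddings.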


In [17]:
# The main task of the following function is to return a set of embeddings from the language model that we trained just now.
# The df variable is a dataframe that can be either the train or test split of the original dataset. This function returns a matrix
# having the same number of rows as df. 
# Each row in the matrix is an embedding that captures the level of `crowdedness` of the airport present in the same row of the dataframe
def generate_embeddings(df, model, tokenizer, device):
    
    # For every distinct airport in the dataframe, get the appropriate query and value formats for it.
    # The query and value sentences will be in the format that was described just before.
    query_prompts = pl.concat([
        df.select('Origin'),
        generate_query_value_prompts(
            df, query_prompt_format, value_prompt_format, num_prompts
        ).select('query')
    ], how='horizontal').distinct()
    
    with torch.no_grad():
        # Use the trained model to convert the query sentences into embeddings
        prompt_embs_train = embed_sentences(
            query_prompts.query.to_list(), 
            model, tokenizer, device
        ).cpu()

    # Since we removed duplicate entries to speed up calculations, we need to repopulate them
    origin_prompt_emb_map = dict()
    for i in range(query_prompts.shape[0]):
        origin = query_prompts['Origin'][i]
        emb = prompt_embs_train[i, :].numpy()

        origin_prompt_emb_map[origin] = emb

    return np.vstack(
        df.select([
            pl.col('Origin').apply(lambda x: origin_prompt_emb_map[x]).alias('emb')
        ]).emb.to_list()
    )

Get the query embeddings for both train and test split from the trained model. We will be using them to train the XGBoost model

In [18]:
X_embs_train = generate_embeddings(df_train, language_model, tokenizer, device)
X_embs_test = generate_embeddings(df_test, language_model, tokenizer, device)

Prepare the input features and labels for the XGBoost model

In [19]:
all_origins = df.select('Origin').distinct().Origin.to_numpy()
origin_encoder = LabelEncoder()
origin_encoder.fit(all_origins)

X_train = df_train.with_column(
    pl.Series('origin_encoded', origin_encoder.transform(df_train.Origin.to_numpy()))
).select([
    pl.all().exclude(['Year', 'Origin', 'counts', 'airport', 'Month_name'])
]).to_numpy()

X_train = np.hstack([X_train, X_embs_train])
y_train = df_train.counts.to_numpy()

X_test = df_test.with_column(
    pl.Series('origin_encoded', origin_encoder.transform(df_test.Origin.to_numpy()))
).select([
    pl.all().exclude(['Year', 'Origin', 'counts', 'airport', 'Month_name'])
]).to_numpy()

X_test = np.hstack([X_test, X_embs_test])
y_test = df_test.counts.to_numpy()

Train the XGBoost model with the extra embeddings

In [20]:
xgb = xgboost.XGBRegressor(
    objective='reg:squarederror', 
    n_jobs=-1)

model = xgb.fit(X_train, y_train)
y_pred = model.predict(X_test)

r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)

print(f'Mean Absolute Error (MAE): \t{mae} \n\
Mean Squared Error (MSE): \t{mse} \n\
R2 Score (R2):\t\t\t{r2}')

Mean Absolute Error (MAE): 	11.834416948577415 
Mean Squared Error (MSE): 	1037.6167624434936 
R2 Score (R2):			0.9538314754186082


---