<a id='introduction'></a>
# Introduction

This offline notebook uses the models and tokenizers from [my other notebook](https://www.kaggle.com/angyalfold/roberta-large-k-fold-models/) to make predictions. The idea is to build an ensemble of three predictors: Roberta large + SVM regressor, Roberta large + ridge regressor, and Roberta large + custom NN regressor. Each predictor averages over the k=5 pre-trained models created in that notebook.

This notebook is part of a series:
1. Pretrain roberta large on the CommonLit dataset [here](https://www.kaggle.com/angyalfold/pretrain-roberta-large-on-clrp-data/).
2. Produce k models which can later be used for determining the readability of texts [here](https://www.kaggle.com/angyalfold/roberta-large-k-fold-models).
3. Make predictions with a custom NN regressor [here](https://www.kaggle.com/angyalfold/roberta-large-with-custom-regressor-pytorch/).
4. Ensemble (Roberta large + SVR, Roberta large + Ridge, Roberta large + custom NN head) (this notebook)

I use Maunish's [notebook](https://www.kaggle.com/maunish/clrp-roberta-svm/) as a reference.

<a id="toc"></a>
# Table of contents
* [Introduction](#introduction)
* [Classes & configs](#classes)
    * [Configs](#classes_config)
    * [Data set](#classes_data_set)
    * [Model](#classes_model)
* [Read data](#read_data)
    * [Read train data](#read_data_train_data)
    * [Read test data](#read_data_test_data)
* [Setup](#setup)
    * [RMSE score](#setup_rmse_score)
    * [Convert pandas' dataframes to dataloader](#setup_dataframe_to_dataloader)
    * [Get embeddings from model](#setup_embeddings_from_model)
    * [Get predictions with a given regressor](#setup_prediction_with_regressor)
    * [Get predictions with custom NN regressor](#setup_prediction_with_nn)
    * [SVR](#setup_svr)
    * [Ridge regressor](#setup_ridge)
* [Make predictions](#make_predictions)
    * [Run models with regressor](#make_predictions_run_models_with_regressor)
    * [Run models with NN](#make_predictions_run_models_with_nn)
    * [Get predictions](#make_predictions_get_predictions)
* [Save results](#save_results)

<a id='classes'></a>
# Classes & configs
[[back to top]](#toc)

<a id='classes_config'></a>
## Config
[[back to top]](#toc)

In [None]:
import torch

config = {
    'batch_size': 8,
    'best_pretrained_roberta_folder': '../input/pretrain-roberta-large-on-clrp-data/clrp_roberta_large/best_model/',
    'num_of_folds': 5,
    'num_of_models': 5,
    'seed': 2021,
    'sentence_max_length': 256
}

for (k, v) in config.items():
    print(f"The value for {k}: {v}")
    
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

print(f"Device: {device}")

<a id='classes_data_set'></a>
## Data set
[[back to top]](#toc)

In [None]:
import torch

class ReadabilityDataset(torch.utils.data.Dataset):
    """Custom dataset for the Readability task"""
    def __init__(self, encodings):
        self.encodings = encodings
    
    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        return item
        
    def __len__(self):
        return len(self.encodings['input_ids'])
    

print(ReadabilityDataset.__doc__)

<a id='classes_model'></a>
## Model
[[back to top]](#toc)

Note that the concept of attention is explained very clearly in [Lena Voita](https://lena-voita.github.io/)'s NLP course [here](https://lena-voita.github.io/nlp_course/seq2seq_and_attention.html).
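
To make the pooling step concrete, here is a minimal NumPy sketch of the computation that the `AttentionHead` class in the cell below performs: score every token, turn the scores into weights with a softmax over the sequence axis, and sum the weighted token vectors. The function name `attention_pool`, the random weights and the tiny shapes are assumptions made for illustration only, they are not the trained parameters.

```python
import numpy as np

def attention_pool(features, W, b_w, v, b_v):
    """Pool (batch, seq_len, dim) token features into (batch, dim) vectors."""
    att = np.tanh(features @ W + b_w)                  # (batch, seq_len, hidden)
    score = att @ v + b_v                              # (batch, seq_len, 1)
    score = score - score.max(axis=1, keepdims=True)   # for numerical stability
    weights = np.exp(score)
    weights = weights / weights.sum(axis=1, keepdims=True)  # softmax over seq_len
    pooled = (weights * features).sum(axis=1)          # weighted sum of tokens
    return pooled, weights

# Random stand-ins for the token features and the layer parameters.
rng = np.random.default_rng(0)
batch, seq_len, dim, hidden = 2, 5, 4, 3
features = rng.normal(size=(batch, seq_len, dim))
W, b_w = rng.normal(size=(dim, hidden)), np.zeros(hidden)
v, b_v = rng.normal(size=(hidden, 1)), np.zeros(1)

pooled, weights = attention_pool(features, W, b_w, v, b_v)
print(pooled.shape)                  # (2, 4)
print(weights.sum(axis=1).ravel())   # the weights of each sequence sum to 1
```

The pooled vector has the same dimension as the token features, which is why the linear regressor on top of it takes `hidden_size` inputs.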

In [None]:
from torch import nn

class AttentionHead(nn.Module):
    """Class implementing the attention head of the model."""
    def __init__(self, in_features, hidden_dim):
        super().__init__()
        self.in_features = in_features
        self.middle_features = hidden_dim
        self.W = nn.Linear(in_features, hidden_dim)
        self.V = nn.Linear(hidden_dim, 1)
        self.out_features = hidden_dim
       
    
    def forward(self, features):
        att = torch.tanh(self.W(features))
        score = self.V(att)
        attention_weights = torch.softmax(score, dim=1)
        context_vector = attention_weights * features
        
        return torch.sum(context_vector, dim=1)


print(AttentionHead.__doc__)

A regressor such as SVR or Ridge can be placed on top of *ReadabilityRobertaModel*. The model outputs an embedding for each text, and the regressor maps that embedding to a continuous readability value.

In [None]:
from torch import nn
from transformers import RobertaModel
from transformers import RobertaConfig

class ReadabilityRobertaModel(nn.Module):
    """Custom model for the Readability task: a Roberta layer and an attention head that returns embeddings."""
        
    def __init__(self):
        super(ReadabilityRobertaModel, self).__init__()
        
        self.model_config = RobertaConfig.from_pretrained(config['best_pretrained_roberta_folder'])
        self.model_config.update({
            "output_hidden_states": True,
            "hidden_dropout_prob": 0.0,
            "layer_norm_eps": 1e-7
        })
        
        self.roberta = RobertaModel.from_pretrained(config['best_pretrained_roberta_folder'],
                                                    config=self.model_config)
        self.attention_head = AttentionHead(self.model_config.hidden_size, 
                                            self.model_config.hidden_size)
        self.dropout = nn.Dropout(0.1)
        self.regressor = nn.Linear(self.model_config.hidden_size, 1)
        
        
    def forward(self, tokens, attention_mask):
        x = self.roberta(input_ids=tokens, attention_mask=attention_mask)[0]
        x = self.attention_head(x)

        return x
    
    
    def freeze_roberta(self):
        """
        Freezes the parameters of the Roberta model so when ReadabilityRobertaModel is 
        trained only the weights of the custom regressor are modified.
        """
        for param in self.roberta.parameters():
            param.requires_grad = False
    
    def unfreeze_roberta(self):
        """
        Unfreezes the parameters of the Roberta model so when ReadabilityRobertaModel is 
        trained both the weights of the custom regressor and of the underlying Roberta
        model are modified.
        """
        for param in self.roberta.parameters():
            param.requires_grad = True

    
print(ReadabilityRobertaModel.__doc__)

## Model with custom NN regressor

Unlike *ReadabilityRobertaModel*, *ReadabilityRobertaModelWithCustomNNHead* returns the actual predicted value, so no further regressor is needed (i.e. the last layer of this NN functions as a regressor).

In [None]:
from torch import nn
from transformers import RobertaModel
from transformers import RobertaConfig

class ReadabilityRobertaModelWithCustomNNHead(nn.Module):
    """Custom model for the Readability task containing a Roberta layer and a custom NN head."""
        
    def __init__(self):
        super(ReadabilityRobertaModelWithCustomNNHead, self).__init__()
        
        self.model_config = RobertaConfig.from_pretrained(config['best_pretrained_roberta_folder'])
        self.model_config.update({
            "output_hidden_states": True,
            "hidden_dropout_prob": 0.0,
            "layer_norm_eps": 1e-7
        })
        
        self.roberta = RobertaModel.from_pretrained(config['best_pretrained_roberta_folder'],
                                                    config=self.model_config)
        self.attention_head = AttentionHead(self.model_config.hidden_size, 
                                            self.model_config.hidden_size)
        self.dropout = nn.Dropout(0.1)
        self.regressor = nn.Linear(self.model_config.hidden_size, 1)
        
        
    def forward(self, tokens, attention_mask):
        x = self.roberta(input_ids=tokens, attention_mask=attention_mask)[0]
        x = self.attention_head(x)
        x = self.dropout(x)
        x = self.regressor(x)
        return x
    
    
    def freeze_roberta(self):
        """
        Freezes the parameters of the Roberta model so when ReadabilityRobertaModel is 
        trained only the weights of the custom regressor are modified.
        """
        for param in self.roberta.parameters():
            param.requires_grad = False
    
    def unfreeze_roberta(self):
        """
        Unfreezes the parameters of the Roberta model so when ReadabilityRobertaModel is 
        trained both the weights of the custom regressor and of the underlying Roberta
        model are modified.
        """
        for param in self.roberta.parameters():
            param.requires_grad = True

    
print(ReadabilityRobertaModelWithCustomNNHead.__doc__)

<a id='read_data'></a>
# Read data
[[back to top]](#toc)

<a id='read_data_train_data'></a>
## Read train data
[[back to top]](#toc)

In [None]:
import pandas as pd

train_csv_path = '/kaggle/input/commonlitreadabilityprize/train.csv'
train_data = pd.read_csv(train_csv_path)

print('The total # of samples is {}.'.format(len(train_data)))

Add values to the *bin* column as described [here](https://www.kaggle.com/angyalfold/roberta-large-k-fold-models/).
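
As a rough illustration of what the next cell does: the bin count follows a Sturges-style rule, `floor(1 + log2(n))`, and each target is then placed into one of that many equal-width intervals. The sketch below uses plain Python and made-up target values. The helper names are hypothetical, and the edge handling only approximates `pd.cut`, which uses right-closed intervals.

```python
import math

def sturges_bin_count(n_samples):
    """Number of bins: floor(1 + log2(n)), as in the cell below."""
    return int(math.floor(1 + math.log2(n_samples)))

def equal_width_bin(values, n_bins):
    """Label each value with the index of its equal-width interval."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # Clamp so that the maximum value lands in the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

targets = [-3.0, -1.5, -1.4, 0.0, 0.2, 1.0]   # made-up readability scores
n_bins = sturges_bin_count(len(targets))
print(n_bins)                            # 3
print(equal_width_bin(targets, n_bins))  # [0, 1, 1, 2, 2, 2]
```

The bin labels are only used to stratify the folds later on, the regressors are still trained on the continuous target.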

In [None]:
import numpy as np

# create & fill bins column (needed for kfold)
num_of_bins = int(np.floor(1 + np.log2(len(train_data))))
train_data.loc[:,'bin'] = pd.cut(train_data['target'], bins=num_of_bins, labels=False)
bins = train_data['bin'].to_numpy()

target = train_data['target'].to_numpy()

<a id='read_data_test_data'></a>
## Read test data
[[back to top]](#toc)

In [None]:
import pandas as pd

test_csv_path = '/kaggle/input/commonlitreadabilityprize/test.csv'
test_data = pd.read_csv(test_csv_path)

print('The total # of samples is {}.'.format(test_data.shape[0]))

<a id='setup'></a>
# Setup
[[back to top]](#toc)

Below I collected some helper functions and variables that are needed for making the actual predictions.

<a id='setup_rmse_score'></a>
## RMSE score
[[back to top]](#toc)

In [None]:
import numpy as np

from sklearn.metrics import mean_squared_error


def rmse_score(y_true,y_pred):
    return np.sqrt(mean_squared_error(y_true,y_pred))
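
For reference, RMSE is the square root of the mean of the squared errors. A tiny worked example in plain Python, with made-up numbers and a hypothetical helper name `rmse`, shows the arithmetic:

```python
import math

def rmse(y_true, y_pred):
    """Root of the mean squared difference between targets and predictions."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Errors of 3 and 4 give sqrt((9 + 16) / 2) = sqrt(12.5).
print(rmse([0.0, 0.0], [3.0, 4.0]))
```

Because the errors are squared before averaging, a single large miss raises the score more than several small ones.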

<a id='setup_dataframe_to_dataloader'></a>
## Convert pandas' dataframes to dataloader
[[back to top]](#toc)

In [None]:
from torch.utils.data import DataLoader

def get_dataloader_from_dataframes(df, tokenizer):
    """Converts a complete dataframe (with all columns included) into a dataloader."""
    texts = df['excerpt'].values.tolist()
    data_encodings = tokenizer(texts, max_length=config['sentence_max_length'],
                              truncation=True, padding=True)
    dataset = ReadabilityDataset(data_encodings)
    dataloader = DataLoader(dataset, batch_size=config['batch_size'])
    
    return dataloader

print(get_dataloader_from_dataframes.__doc__)
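
The `truncation=True, padding=True` arguments above make every encoded text the same length. The following plain-Python sketch shows the idea on made-up token ids. The helper `pad_and_truncate` is hypothetical, the pad id of 1 is an assumption about the tokenizer, and a real tokenizer keeps the end-of-sequence token when truncating, which this simplified version does not.

```python
def pad_and_truncate(token_id_lists, max_length, pad_id=1):
    """Cut each sequence to max_length, then pad to the longest remaining one."""
    truncated = [ids[:max_length] for ids in token_id_lists]
    longest = max(len(ids) for ids in truncated)
    input_ids, attention_mask = [], []
    for ids in truncated:
        n_pad = longest - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore.
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {'input_ids': input_ids, 'attention_mask': attention_mask}

# Made-up token ids, not real tokenizer output.
batch = pad_and_truncate([[0, 11, 12, 2], [0, 21, 22, 23, 24, 25, 2]], max_length=5)
print(batch['input_ids'])       # [[0, 11, 12, 2, 1], [0, 21, 22, 23, 24]]
print(batch['attention_mask'])  # [[1, 1, 1, 1, 0], [1, 1, 1, 1, 1]]
```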

<a id='setup_embeddings_from_model'></a>
## Get embeddings from model
[[back to top]](#toc)

In [None]:
import numpy as np
import torch

from tqdm.auto import tqdm

tqdm.pandas()

def get_embeddings_from_model(model_path, data, tokenizer):
    """Get embeddings (which can then be fed to regressors) using the provided model."""
    
    # Setup model
    model = ReadabilityRobertaModel()
    model.load_state_dict(torch.load(model_path, map_location=device))
    model.to(device)
    model.eval()
    
    # create dataloader from data
    dataloader = get_dataloader_from_dataframes(data, tokenizer)
    
    # compute embeddings
    embeddings = list()
    with torch.no_grad():
        print('Getting embeddings:')
        for i, batch in enumerate(tqdm(dataloader)):
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
        
            output = model(tokens=input_ids, attention_mask=attention_mask)
            output = output.cpu().detach().numpy()
            embeddings.extend(output)
            
    return np.array(embeddings)
        
    
print(get_embeddings_from_model.__doc__)

<a id='setup_prediction_with_regressor'></a>
## Get predictions with a given regressor
[[back to top]](#toc)

In [None]:
import numpy as np

from sklearn.model_selection import StratifiedKFold


def get_predictions_with_regressor(X, y, X_test, bins, regressor):
    """Fits the given regressor on Roberta embeddings with stratified k-fold and returns the fold-averaged test predictions."""
    scores = list()
    preds = np.zeros(len(X_test))
    kfold = StratifiedKFold(n_splits=config['num_of_folds'], shuffle=True,
                            random_state=config['seed'])
    
    print('Getting predictions:')
    for k, (train_idx, test_idx) in enumerate(kfold.split(X, bins)):
                
        X_train, y_train = X[train_idx], y[train_idx]
        X_valid, y_valid = X[test_idx], y[test_idx]
        
        regressor.fit(X_train, y_train)
        prediction = regressor.predict(X_valid)
        score = rmse_score(y_valid, prediction)
        print(f'Fold {k} rmse_score: {score}')
        scores.append(score)
        preds += regressor.predict(X_test)
        
        
    print(f'Mean rmse: {np.mean(scores)}')
    return np.array(preds) / config['num_of_folds']


print(get_predictions_with_regressor.__doc__)
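
The point of splitting on `bins` rather than on the raw target is that every fold gets a similar mix of easy and hard texts. The sketch below illustrates that idea in plain Python by dealing the samples of each bin round-robin to the folds. It is a simplification written for illustration (the function name is hypothetical and there is no shuffling), not the algorithm that `StratifiedKFold` uses internally.

```python
from collections import Counter, defaultdict

def stratified_folds(bins, n_folds):
    """Assign each sample to a fold so that every fold sees a similar bin mix."""
    by_bin = defaultdict(list)
    for idx, b in enumerate(bins):
        by_bin[b].append(idx)
    fold_of = [None] * len(bins)
    for indices in by_bin.values():
        for position, idx in enumerate(indices):
            fold_of[idx] = position % n_folds   # deal round-robin within a bin
    return fold_of

bins = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]   # made-up bin labels
fold_of = stratified_folds(bins, n_folds=2)
for fold in range(2):
    held_out = [b for b, f in zip(bins, fold_of) if f == fold]
    print(fold, Counter(held_out))       # both folds hold out the same bin mix
```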

<a id='setup_prediction_with_nn'></a>
## Get predictions with custom NN regressor
[[back to top]](#toc)

In [None]:
import torch

from tqdm.auto import tqdm

tqdm.pandas()


def get_predictions_with_custom_NN_head(model_path, data, tokenizer):
    """Get predictions directly from the model with the custom NN head."""
    
    # setup model
    model = ReadabilityRobertaModelWithCustomNNHead()
    model.load_state_dict(torch.load(model_path, map_location=device))
    model.to(device)
    model.eval()
    
    # convert data into dataloader
    dataloader = get_dataloader_from_dataframes(data, tokenizer)
    
    # iterate over the batches without tracking gradients (inference only)
    predictions = list()
    with torch.no_grad():
        for batch in tqdm(dataloader):
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
        
            output = torch.flatten(model(tokens=input_ids, attention_mask=attention_mask))
            output = output.cpu().numpy().tolist()
        
            predictions.extend(output)
        
    torch.cuda.empty_cache()
    return predictions

<a id='setup_svr'></a>
## SVR regressor
[[back to top]](#toc)

In [None]:
from sklearn.svm import SVR

svr_regressor = SVR(C=10, kernel='rbf', gamma='auto')

print('SVR regressor has been initialized.')

<a id='setup_ridge'></a>
## Ridge regressor
[[back to top]](#toc)

Ridge regressor: alpha = 50 is taken from [this notebook](https://www.kaggle.com/solorzano/clrp-roberta-ridge).
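
To give a feeling for what `alpha` does, here is a small NumPy sketch of ridge regression in closed form with an unpenalized intercept. The data is synthetic, and the function `ridge_fit` is a hypothetical stand-in written for illustration, not the scikit-learn implementation. A larger `alpha` shrinks the weight vector, which helps when the embeddings have many correlated dimensions.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Minimize ||y - Xw - b||^2 + alpha * ||w||^2 in closed form."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean              # centering handles the intercept
    n_features = X.shape[1]
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b

# Synthetic stand-in for embeddings and targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
true_w = rng.normal(size=8)
y = X @ true_w + 0.1 * rng.normal(size=200)

w_small, _ = ridge_fit(X, y, alpha=0.0)
w_large, _ = ridge_fit(X, y, alpha=50.0)
# The weight norm shrinks as alpha grows.
print(np.linalg.norm(w_small), np.linalg.norm(w_large))
```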

In [None]:
from sklearn.linear_model import Ridge

ridge_regressor = Ridge(alpha=50.0)

print('Ridge regressor has been initialized')

<a id='make_predictions'></a>
# Make predictions
[[back to top]](#toc)

<a id='make_predictions_run_models_with_regressor'></a>
## Run models with regressor
[[back to top]](#toc)

In [None]:
import numpy as np
from tqdm.auto import tqdm

from transformers import RobertaTokenizer

tqdm.pandas()


def run_all_models_with_regressor(regressor):
    predictions = np.zeros(test_data.shape[0])

    for i in tqdm(range(config['num_of_models'])):
        print(f'Model # {i}:')
        model_path = f'../input/roberta-large-k-fold-models/model{i}/model{i}.bin'
        tokenizer_path = f'../input/roberta-large-k-fold-models/model{i}/'
        tokenizer = RobertaTokenizer.from_pretrained(tokenizer_path)

        train_embeddings = get_embeddings_from_model(model_path, train_data, tokenizer)
        test_embeddings = get_embeddings_from_model(model_path, test_data, tokenizer)

        preds = get_predictions_with_regressor(train_embeddings, target, test_embeddings,
                                               bins, regressor)

        predictions = predictions + preds
        print(f'Predictions for model {i}:')
        print(preds)


    predictions = predictions / config['num_of_models']
    print('Final predictions:')
    print(predictions)
    
    return predictions

<a id='make_predictions_run_models_with_nn'></a>
## Run models with NN regressor
[[back to top]](#toc)

In [None]:
import numpy as np
from tqdm.auto import tqdm

from transformers import RobertaTokenizer

tqdm.pandas()


def run_all_models_with_custom_NN_head():
    predictions = np.zeros(test_data.shape[0])

    for i in tqdm(range(config['num_of_models'])):
        print(f'Model # {i}:')
        model_path = f'../input/roberta-large-k-fold-models/model{i}/model{i}.bin'
        tokenizer_path = f'../input/roberta-large-k-fold-models/model{i}/'
        tokenizer = RobertaTokenizer.from_pretrained(tokenizer_path)

        # The NN head predicts directly from the texts, so no embeddings are needed here.
        preds = get_predictions_with_custom_NN_head(model_path, test_data, tokenizer)

        predictions = predictions + preds
        print(f'Predictions for model {i}:')
        print(preds)


    predictions = predictions / config['num_of_models']
    print('Final predictions:')
    print(predictions)
    
    return predictions

<a id='make_predictions_get_predictions'></a>
## Get predictions
[[back to top]](#toc)

In [None]:
svr_predictions = run_all_models_with_regressor(svr_regressor)
ridge_predictions = run_all_models_with_regressor(ridge_regressor)
custom_NN_head_predictions = run_all_models_with_custom_NN_head()

predictions = (svr_predictions + ridge_predictions + custom_NN_head_predictions) / 3

print(predictions)
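
The final ensemble is an equal-weight average of the three prediction arrays. A minimal NumPy sketch with made-up numbers (the arrays below are not real model output) shows that dividing the sum by 3 is the same as taking the mean over the stacked predictors:

```python
import numpy as np

# Made-up predictions of the three predictors for three test excerpts.
svr = np.array([-1.0, 0.2, 0.5])
ridge = np.array([-1.2, 0.0, 0.7])
nn_head = np.array([-0.8, 0.1, 0.6])

stacked = np.stack([svr, ridge, nn_head])   # shape (3 predictors, 3 excerpts)
ensemble = stacked.mean(axis=0)             # equal-weight average per excerpt

print(ensemble)
print(np.allclose(ensemble, (svr + ridge + nn_head) / 3))
```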

<a id='save_results'></a>
# Save results
[[back to top]](#toc)

In [None]:
submission = pd.DataFrame()
submission['id'] = test_data['id']
submission['target'] = predictions
submission.to_csv('submission.csv', index=False)

print('Saved predictions.')