<a href="https://colab.research.google.com/github/nishipy/clrp/blob/main/light_weight_roberta_base_batch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Overview
This kernel is almost the same as [Lightweight Roberta solution in PyTorch](https://www.kaggle.com/andretugan/lightweight-roberta-solution-in-pytorch), but instead of "roberta-base", it starts from [Maunish's pre-trained model](https://www.kaggle.com/maunish/clrp-roberta-base).

Acknowledgments: some ideas were taken from kernels by [Torch](https://www.kaggle.com/rhtsingh) and [Maunish](https://www.kaggle.com/maunish).

In addition, we use the [stratified_kfold train dataset](https://www.kaggle.com/takeshikobayashi/commonlit-train-datasetfor) for training the model.

## Original notebook
- Lightweight Roberta solution
  - https://www.kaggle.com/andretugan/pre-trained-roberta-solution-in-pytorch
- Pretrained with MLM
  - https://www.kaggle.com/maunish/clrp-pytorch-roberta-pretrain

# Prepare

## Checking GPU status

In [1]:
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Select the Runtime > "Change runtime type" menu to enable a GPU accelerator, ')
  print('and then re-execute this cell.')
else:
  print(gpu_info)

Sun Jul  4 09:06:26 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.27       Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   50C    P0    31W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

## Download dataset from kaggle

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### kaggle.json

In [3]:
!mkdir -p /root/.kaggle/
!cp ./drive/MyDrive/kaggle/commonlit/kaggle.json ~/.kaggle/kaggle.json
!chmod 600 ~/.kaggle/kaggle.json

### Competition dataset

In [4]:
!mkdir -p ../input/commonlitreadabilityprize/
!kaggle competitions download -c commonlitreadabilityprize -p ../input/commonlitreadabilityprize/
!cp -f ./drive/MyDrive/kaggle/commonlit/train_stratiKfold.csv.zip ../input/commonlitreadabilityprize/

Downloading sample_submission.csv to ../input/commonlitreadabilityprize
  0% 0.00/108 [00:00<?, ?B/s]
100% 108/108 [00:00<00:00, 183kB/s]
Downloading test.csv to ../input/commonlitreadabilityprize
  0% 0.00/6.79k [00:00<?, ?B/s]
100% 6.79k/6.79k [00:00<00:00, 7.03MB/s]
Downloading train.csv.zip to ../input/commonlitreadabilityprize
  0% 0.00/1.13M [00:00<?, ?B/s]
100% 1.13M/1.13M [00:00<00:00, 77.1MB/s]


In [5]:
!unzip -o ../input/commonlitreadabilityprize/train.csv.zip -d ../input/commonlitreadabilityprize/
!unzip -o ../input/commonlitreadabilityprize/train_stratiKfold.csv.zip -d ../input/commonlitreadabilityprize/

Archive:  ../input/commonlitreadabilityprize/train.csv.zip
  inflating: ../input/commonlitreadabilityprize/train.csv  
Archive:  ../input/commonlitreadabilityprize/train_stratiKfold.csv.zip
  inflating: ../input/commonlitreadabilityprize/train_stratiKfold.csv  


In [6]:
!ls ../input/commonlitreadabilityprize/

sample_submission.csv  train.csv      train_stratiKfold.csv
test.csv	       train.csv.zip  train_stratiKfold.csv.zip


### Model pre-trained with MLM 
- Notebook
  - https://www.kaggle.com/maunish/clrp-pytorch-roberta-pretrain
- Model data
  - https://www.kaggle.com/maunish/clrp-roberta-base

In [7]:
!mkdir -p ../input/commonlitreadabilityprize/pretrained-model/
!kaggle datasets download maunish/clrp-roberta-base -p ../input/commonlitreadabilityprize/pretrained-model/

Downloading clrp-roberta-base.zip to ../input/commonlitreadabilityprize/pretrained-model
100% 3.00G/3.01G [00:24<00:00, 59.1MB/s]
100% 3.01G/3.01G [00:24<00:00, 131MB/s] 


In [8]:
!unzip -o ../input/commonlitreadabilityprize/pretrained-model/clrp-roberta-base.zip -d ../input/commonlitreadabilityprize/pretrained-model/

Archive:  ../input/commonlitreadabilityprize/pretrained-model/clrp-roberta-base.zip
  inflating: ../input/commonlitreadabilityprize/pretrained-model/clrp_roberta_base/config.json  
  inflating: ../input/commonlitreadabilityprize/pretrained-model/clrp_roberta_base/merges.txt  
  inflating: ../input/commonlitreadabilityprize/pretrained-model/clrp_roberta_base/pytorch_model.bin  
  inflating: ../input/commonlitreadabilityprize/pretrained-model/clrp_roberta_base/special_tokens_map.json  
  inflating: ../input/commonlitreadabilityprize/pretrained-model/clrp_roberta_base/tokenizer_config.json  
  inflating: ../input/commonlitreadabilityprize/pretrained-model/clrp_roberta_base/training_args.bin  
  inflating: ../input/commonlitreadabilityprize/pretrained-model/clrp_roberta_base/vocab.json  
  inflating: ../input/commonlitreadabilityprize/pretrained-model/clrp_roberta_base_chk/checkpoint-600/config.json  
  inflating: ../input/commonlitreadabilityprize/pretrained-model/clrp_roberta_base_chk/ch

# Install dependencies

In [9]:
!pip install transformers accelerate datasets

Collecting transformers
  Downloading https://files.pythonhosted.org/packages/fd/1a/41c644c963249fd7f3836d926afa1e3f1cc234a1c40d80c5f03ad8f6f1b2/transformers-4.8.2-py3-none-any.whl (2.5MB)
Collecting accelerate
  Downloading https://files.pythonhosted.org/packages/f7/fa/d173d923c953d930702066894abf128a7e5258c6f64cf088d2c5a83f46a3/accelerate-0.3.0-py3-none-any.whl (49kB)
Collecting datasets
  Downloading https://files.pythonhosted.org/packages/08/a2/d4e1024c891506e1cee8f9d719d20831bac31cb5b7416983c4d2f65a6287/datasets-1.8.0-py3-none-any.whl (237kB)
Collecting huggingface-hub==0.0.12
  Downloading https://files.pythonhosted.org/packages/2f/ee/97e253668fda9b17e968b3f97b2f8e53aa0127e8807d24a547687423fe0b/huggingface_hub-0.0.12-py3-none-any.whl
Collecting sacremoses
  Downloadin

In [10]:
import os
import math
import random
import time

import numpy as np
import pandas as pd

import torch
import torch.nn as nn
from torch.utils.data import Dataset
from torch.utils.data import DataLoader

from transformers import AdamW
from transformers import AutoTokenizer
from transformers import AutoModel
from transformers import AutoConfig
from transformers import get_cosine_schedule_with_warmup

from sklearn.model_selection import KFold

import gc
gc.enable()

# Set constants

In [11]:
NUM_FOLDS = 5
NUM_EPOCHS = 3
BATCH_SIZE = 32
# BATCH_SIZE = 16
MAX_LEN = 248
#(eval_rmse, step_size)
EVAL_SCHEDULE = [(0.50, 32), (0.49, 16), (0.48, 8), (0.47, 4), (0.46, 2), (-1., 1)]
#EVAL_SCHEDULE = [(0.50, 16), (0.49, 8), (0.48, 4), (0.47, 2), (-1., 1)]
ROBERTA_PATH = "../input/commonlitreadabilityprize/pretrained-model/clrp_roberta_base/"
TOKENIZER_PATH = "../input/commonlitreadabilityprize/pretrained-model/clrp_roberta_base/"
#ROBERTA_PATH = "../input/clrp-roberta-base/clrp_roberta_base"
#TOKENIZER_PATH = "../input/clrp-roberta-base/clrp_roberta_base"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

In [12]:
EVAL_SCHEDULE[0][1]

32

# Define utility functions

In [13]:
def set_random_seed(random_seed):
    random.seed(random_seed)
    np.random.seed(random_seed)
    os.environ["PYTHONHASHSEED"] = str(random_seed)

    torch.manual_seed(random_seed)
    torch.cuda.manual_seed(random_seed)
    torch.cuda.manual_seed_all(random_seed)

    torch.backends.cudnn.deterministic = True

For `train_df`, we use the dataset that has already been split with stratified k-fold.

In [14]:
#Use stratified k-fold train dataset
#train_df = pd.read_csv("/kaggle/input/commonlitreadabilityprize/train.csv")
train_df = pd.read_csv("../input/commonlitreadabilityprize/train_stratiKfold.csv")

# Remove incomplete entries if any.
train_df.drop(train_df[(train_df.target == 0) & (train_df.standard_error == 0)].index,
              inplace=True)
train_df.reset_index(drop=True, inplace=True)

test_df = pd.read_csv("../input/commonlitreadabilityprize/test.csv")
submission_df = pd.read_csv("../input/commonlitreadabilityprize/sample_submission.csv")

In [15]:
# The tokenizer is the same as the one for roberta-base
tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_PATH)

In [16]:
train_df[train_df['kfold']!=1]

Unnamed: 0.1,Unnamed: 0,id,url_legal,license,excerpt,target,standard_error,kfold,bins
1,1,bf24448fb,,,"Anywhere there is a frontier, where there are ...",-1.866238,0.510911,3,4
2,2,7cad0f936,,,"A great violinist, Ole Bull by name, visited t...",-0.578482,0.471768,2,6
4,4,91e87e7dc,,,Hans stopped snoring and awoke at supper-time....,-0.186015,0.492731,2,7
5,5,20a9f9032,,,The Government of the United States has viewed...,-1.391438,0.499195,4,5
6,6,daab29b47,,,Forty years ago women were given no representa...,-1.291128,0.531642,2,5
...,...,...,...,...,...,...,...,...,...
2826,2827,d25b7c3aa,https://www.commonlit.org/texts/the-center-of-...,CC BY-NC-SA 2.0,"The sun is a star, just like the other million...",-0.580631,0.457745,2,6
2828,2829,3c1662f6d,,,"It was the northwest coast of Australia, the c...",-1.678689,0.493150,2,4
2830,2831,64b635d77,https://www.africanstorybook.org/,CC BY 4.0,"Once long ago, the birds had a meeting. They w...",0.639650,0.503652,2,9
2831,2832,d6764322c,,,"As an adult, I might learn new actions by taki...",1.024258,0.549119,3,10


# Dataset

In [17]:
class LitDataset(Dataset):
    def __init__(self, df, inference_only=False):
        super().__init__()

        self.df = df        
        self.inference_only = inference_only
        self.text = df.excerpt.tolist()
        # Try removing newlines. This line is commented out in the original notebook.
        #self.text = [text.replace("\n", " ") for text in self.text]
        
        if not self.inference_only:
            self.target = torch.tensor(df.target.values, dtype=torch.float32)        
    
        self.encoded = tokenizer.batch_encode_plus(
            self.text,
            padding = 'max_length',            
            max_length = MAX_LEN,
            truncation = True,
            return_attention_mask=True
        )        
 

    def __len__(self):
        return len(self.df)

    
    def __getitem__(self, index):        
        input_ids = torch.tensor(self.encoded['input_ids'][index])
        attention_mask = torch.tensor(self.encoded['attention_mask'][index])
        
        if self.inference_only:
            return (input_ids, attention_mask)            
        else:
            target = self.target[index]
            return (input_ids, attention_mask, target)
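
To make the effect of `padding='max_length'` and `truncation=True` concrete, here is a minimal pure-Python sketch of what each encoded row looks like. It is a simplification, not the tokenizer's actual implementation: `pad_and_mask` is a hypothetical helper, the token ids `100, 200, 300` are made up, and the special-token ids (0 for `<s>`, 1 for `<pad>`, 2 for `</s>`) are the ones used by the stock roberta-base vocabulary.

```python
BOS_ID, PAD_ID, EOS_ID = 0, 1, 2  # <s>, <pad>, </s> in the roberta-base vocabulary


def pad_and_mask(token_ids, max_len):
    """Truncate or pad `token_ids` (given without special tokens) to exactly `max_len`."""
    body = token_ids[: max_len - 2]          # leave room for <s> and </s>
    input_ids = [BOS_ID] + body + [EOS_ID]
    attention_mask = [1] * len(input_ids)    # 1 marks a real token
    n_pad = max_len - len(input_ids)
    # Padding positions get the pad id and a mask value of 0.
    return input_ids + [PAD_ID] * n_pad, attention_mask + [0] * n_pad


ids, mask = pad_and_mask([100, 200, 300], max_len=8)
print(ids)   # [0, 100, 200, 300, 2, 1, 1, 1]
print(mask)  # [1, 1, 1, 1, 1, 0, 0, 0]
```

Because every row has the same length (`MAX_LEN`), `__getitem__` can return fixed-size tensors and the default `DataLoader` collation needs no custom padding logic.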

# Model
The model is inspired by the one from [Maunish](https://www.kaggle.com/maunish/clrp-roberta-svm).

In [18]:
class LitModel(nn.Module):
    def __init__(self):
        super().__init__()

        config = AutoConfig.from_pretrained(ROBERTA_PATH)
        # Override settings loaded from config.json
        config.update({"output_hidden_states":True, 
                       "hidden_dropout_prob": 0.0,
                       "layer_norm_eps": 1e-7})                       
        
        self.roberta = AutoModel.from_pretrained(ROBERTA_PATH, config=config)  
            
        self.attention = nn.Sequential(            
            nn.Linear(768, 512),            
            nn.Tanh(),                       
            nn.Linear(512, 1),
            nn.Softmax(dim=1)
        )        

        self.regressor = nn.Sequential(                        
            nn.Linear(768, 1)                        
        )
        

    def forward(self, input_ids, attention_mask):
        roberta_output = self.roberta(input_ids=input_ids,
                                      attention_mask=attention_mask)        

        # There are a total of 13 layers of hidden states.
        # 1 for the embedding layer, and 12 for the 12 Roberta layers.
        # We take the hidden states from the last Roberta layer.
        last_layer_hidden_states = roberta_output.hidden_states[-1]

        # The number of cells is MAX_LEN.
        # The size of the hidden state of each cell is 768 (for roberta-base).
        # In order to condense hidden states of all cells to a context vector,
        # we compute a weighted average of the hidden states of all cells.
        # We compute the weight of each cell, using the attention neural network.
        weights = self.attention(last_layer_hidden_states)
                
        # weights.shape is BATCH_SIZE x MAX_LEN x 1
        # last_layer_hidden_states.shape is BATCH_SIZE x MAX_LEN x 768        
        # Now we compute context_vector as the weighted average.
        # context_vector.shape is BATCH_SIZE x 768
        context_vector = torch.sum(weights * last_layer_hidden_states, dim=1)        
        
        # Now we reduce the context vector to the prediction score.
        return self.regressor(context_vector)
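
The pooling step in `forward` can be sketched for a single sequence in plain Python. This is only an illustration under stated assumptions: `attention_pool` is a hypothetical helper, the `scores` list stands in for the output of the `nn.Linear(512, 1)` layer, and the hidden size is 2 instead of 768. The softmax runs over the token axis (`dim=1` in the model), so the weights of one sequence sum to 1.

```python
import math


def attention_pool(hidden_states, scores):
    """Softmax the per-token scores, then take the weighted sum of the hidden states."""
    top = max(scores)                                  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states))
               for d in range(dim)]
    return weights, context


hidden = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]          # 3 tokens, hidden size 2

# Equal scores give uniform weights, so the context vector is the plain mean.
uniform_weights, uniform_context = attention_pool(hidden, [0.0, 0.0, 0.0])

# One dominant score concentrates almost all weight on that token.
peaked_weights, peaked_context = attention_pool(hidden, [0.0, 0.0, 100.0])
print(uniform_context, peaked_context)
```

Note that, as written, the model's pooling head does not use `attention_mask`, so padding positions also receive a softmax weight; the network has to learn to score them low.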

## Define eval

In [19]:
# Evaluate with MSE
def eval_mse(model, data_loader):
    """Evaluates the mean squared error of the |model| on |data_loader|"""
    model.eval()            
    mse_sum = 0

    with torch.no_grad():
        for batch_num, (input_ids, attention_mask, target) in enumerate(data_loader):
            input_ids = input_ids.to(DEVICE)
            attention_mask = attention_mask.to(DEVICE)                        
            target = target.to(DEVICE)           
            
            pred = model(input_ids, attention_mask)                       

            mse_sum += nn.MSELoss(reduction="sum")(pred.flatten(), target).item()
                

    return mse_sum / len(data_loader.dataset)
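
`eval_mse` sums squared errors with `reduction="sum"` and divides once by the dataset size. A small sketch of why this matters when the last batch is smaller (`drop_last=False` for the validation loader): averaging per-batch means would over-weight the short batch. The helper name and the numbers below are hypothetical.

```python
import math


def rmse_from_batches(batches):
    """batches: iterable of (predictions, targets) pairs of equal-length lists."""
    squared_error_sum = 0.0
    count = 0
    for preds, targets in batches:
        squared_error_sum += sum((p - t) ** 2 for p, t in zip(preds, targets))
        count += len(targets)
    # Divide once by the total number of samples, as eval_mse does.
    return math.sqrt(squared_error_sum / count)


# A full batch of 3 samples followed by a short last batch of 1 sample.
batches = [([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]), ([0.0], [2.0])]

exact_rmse = rmse_from_batches(batches)

# For comparison: the biased estimate from averaging per-batch MSE values.
per_batch_mse = [sum((p - t) ** 2 for p, t in zip(ps, ts)) / len(ts) for ps, ts in batches]
biased_rmse = math.sqrt(sum(per_batch_mse) / len(per_batch_mse))
print(exact_rmse, biased_rmse)
```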

## Define predict

In [20]:
def predict(model, data_loader):
    """Returns an np.array with predictions of the |model| on |data_loader|"""
    model.eval()

    result = np.zeros(len(data_loader.dataset))    
    index = 0
    
    with torch.no_grad():
        for batch_num, (input_ids, attention_mask) in enumerate(data_loader):
            input_ids = input_ids.to(DEVICE)
            attention_mask = attention_mask.to(DEVICE)
                        
            pred = model(input_ids, attention_mask)                        

            result[index : index + pred.shape[0]] = pred.flatten().to("cpu")
            index += pred.shape[0]

    return result

## Define train

In [21]:
def train(model, model_path, train_loader, val_loader,
          optimizer, scheduler=None, num_epochs=NUM_EPOCHS):    
    best_val_rmse = None
    best_epoch = 0
    step = 0
    last_eval_step = 0
    # The first entry of EVAL_SCHEDULE gives the initial evaluation period.
    # With the schedule defined above, EVAL_SCHEDULE[0][1] = 32.
    eval_period = EVAL_SCHEDULE[0][1]    

    start = time.time()

    # Repeat for the number of epochs
    for epoch in range(num_epochs):                           
        val_rmse = None         

        for batch_num, (input_ids, attention_mask, target) in enumerate(train_loader):
            input_ids = input_ids.to(DEVICE)
            attention_mask = attention_mask.to(DEVICE)            
            target = target.to(DEVICE)                        

            optimizer.zero_grad()
            
            model.train()

            pred = model(input_ids, attention_mask)
                                                        
            mse = nn.MSELoss(reduction="mean")(pred.flatten(), target)
                        
            mse.backward()

            # https://stackoverflow.com/questions/60120043/optimizer-and-scheduler-for-bert-fine-tuning
            # Right after `optimizer.step()`, call `scheduler.step()` on every batch to update the learning rate.
            optimizer.step()
            if scheduler:
                scheduler.step()
            
            # Evaluate RMSE every eval_period steps (the initial value is 32)
            if step >= last_eval_step + eval_period:
                # Evaluate the model on val_loader.
                elapsed_seconds = time.time() - start
                num_steps = step - last_eval_step
                print(f"\n{num_steps} steps took {elapsed_seconds:0.3} seconds")
                last_eval_step = step
                
                val_rmse = math.sqrt(eval_mse(model, val_loader))                            

                print(f"Epoch: {epoch} batch_num: {batch_num}", 
                      f"val_rmse: {val_rmse:0.4}")

                # Change eval_period according to the
                # RMSE thresholds defined in EVAL_SCHEDULE
                for rmse, period in EVAL_SCHEDULE:
                    if val_rmse >= rmse:
                        eval_period = period
                        break                               
                
                # Record the best score
                if best_val_rmse is None or val_rmse < best_val_rmse:
                    best_val_rmse = val_rmse
                    best_epoch = epoch
                    torch.save(model.state_dict(), model_path)
                    print(f"New best_val_rmse: {best_val_rmse:0.4}")
                else:       
                    print(f"Still best_val_rmse: {best_val_rmse:0.4}",
                          f"(from epoch {best_epoch})")                                    
                    
                start = time.time()

            # Increment step
            step += 1
                        
    
    return best_val_rmse
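
The interaction between `EVAL_SCHEDULE` and the step counter is easier to follow with a small replay. The sketch below is not part of the training code: `next_eval_period` and `replay` are hypothetical helpers, the validation RMSE values are copied from the Fold 1 log, and `STEPS_PER_EPOCH = 70` is an assumption inferred from the batch numbers in that log. Each evaluation picks the period of the first threshold that the current RMSE reaches, so evaluations become more frequent as the score improves.

```python
EVAL_SCHEDULE = [(0.50, 32), (0.49, 16), (0.48, 8), (0.47, 4), (0.46, 2), (-1., 1)]
STEPS_PER_EPOCH = 70  # assumption: inferred from the logged batch numbers


def next_eval_period(val_rmse):
    """Return the period of the first schedule entry whose threshold is reached."""
    for rmse, period in EVAL_SCHEDULE:
        if val_rmse >= rmse:
            return period
    return 1  # not reached while the schedule ends with a negative threshold


def replay(val_rmses):
    """Return the (epoch, batch_num) at which each evaluation would be triggered."""
    events = []
    step = EVAL_SCHEDULE[0][1]            # the first evaluation happens at step 32
    for val_rmse in val_rmses:
        events.append(divmod(step, STEPS_PER_EPOCH))
        step += next_eval_period(val_rmse)
    return events


fold1_rmses = [0.7311, 0.5772, 0.5178, 0.4914, 0.5145,
               0.4763, 0.4755, 0.4753, 0.4767, 0.4765]
fold1_events = replay(fold1_rmses)
print(fold1_events)
```

Under these assumptions the replay gives the same epoch and batch numbers as the first ten evaluations logged for Fold 1.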

## Create Optimizer

In [22]:
def create_optimizer(model):
    named_parameters = list(model.named_parameters())    
    
    roberta_parameters = named_parameters[:197]    
    attention_parameters = named_parameters[199:203]
    regressor_parameters = named_parameters[203:]
        
    attention_group = [params for (name, params) in attention_parameters]
    regressor_group = [params for (name, params) in regressor_parameters]

    parameters = []
    parameters.append({"params": attention_group})
    parameters.append({"params": regressor_group})

    for layer_num, (name, params) in enumerate(roberta_parameters):
        weight_decay = 0.0 if "bias" in name else 0.01

        lr = 2e-5

        if layer_num >= 69:        
            lr = 5e-5

        if layer_num >= 133:
            lr = 1e-4

        parameters.append({"params": params,
                           "weight_decay": weight_decay,
                           "lr": lr})

    return AdamW(parameters)
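
The index thresholds in `create_optimizer` appear to line up with transformer layer boundaries. The sketch below checks this under one assumption: that the encoder has the stock roberta-base layout of 5 embedding parameter tensors followed by 16 parameter tensors per layer (worth confirming by printing `model.named_parameters()`). The helper names are hypothetical. Despite its name, `layer_num` in the loop is an index into the parameter list, not a layer number.

```python
NUM_EMBEDDING_PARAMS = 5   # assumption: word/position/token-type embeddings + LayerNorm weight and bias
PARAMS_PER_LAYER = 16      # assumption: weights and biases of one roberta-base encoder layer
NUM_LAYERS = 12


def lr_for_parameter(index):
    """Mirror the thresholds used in create_optimizer."""
    if index >= 133:
        return 1e-4
    if index >= 69:
        return 5e-5
    return 2e-5


def first_index_of_layer(layer):
    """Index of the first parameter tensor of encoder layer `layer` (0-based)."""
    return NUM_EMBEDDING_PARAMS + PARAMS_PER_LAYER * layer


lr_by_layer = {layer: lr_for_parameter(first_index_of_layer(layer))
               for layer in range(NUM_LAYERS)}
for layer, lr in lr_by_layer.items():
    print(f"layer {layer:2d}: lr = {lr}")
```

With this layout, the embeddings and layers 0 to 3 train at 2e-5, layers 4 to 7 at 5e-5, and layers 8 to 11 at 1e-4, and index 197 is where the 12 layers end, which matches the `[:197]` slice. The attention and regressor groups set no `lr`, so they fall back to the default of the `AdamW` constructor (1e-3 for the `transformers` implementation).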

## Run

In [23]:
gc.collect()

SEED = 1000
list_val_rmse = []

for fold in range(NUM_FOLDS): 
    print(f"\nFold {fold + 1}/{NUM_FOLDS}")
    model_path = f"model_{fold + 1}.pth"
        
    set_random_seed(SEED + fold)

    # Modified for the stratified k-fold train dataset
    train_dataset = LitDataset(train_df[train_df['kfold']!=fold])    
    val_dataset = LitDataset(train_df[train_df['kfold']==fold])    
    
    #https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader
    train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE,
                              drop_last=True, shuffle=True, num_workers=2)    
    val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE,
                            drop_last=False, shuffle=False, num_workers=2)    
    
    # random_seed changes with each fold
    set_random_seed(SEED + fold)    
    
    model = LitModel().to(DEVICE)
    
    optimizer = create_optimizer(model)
    # The scheduler is get_cosine_schedule_with_warmup.
    # Other options: https://huggingface.co/transformers/main_classes/optimizer_schedules.html#schedules
    scheduler = get_cosine_schedule_with_warmup(
        optimizer,
        num_training_steps=NUM_EPOCHS * len(train_loader),
        num_warmup_steps=50)    
    
    list_val_rmse.append(train(model, model_path, train_loader,
                               val_loader, optimizer, scheduler=scheduler))

    del model
    gc.collect()
    
    print("\nPerformance estimates:")
    print(list_val_rmse)
    print("Mean:", np.array(list_val_rmse).mean())

    


Fold 1/5


Some weights of the model checkpoint at ../input/commonlitreadabilityprize/pretrained-model/clrp_roberta_base/ were not used when initializing RobertaModel: ['lm_head.decoder.bias', 'lm_head.dense.weight', 'lm_head.bias', 'lm_head.layer_norm.bias', 'lm_head.layer_norm.weight', 'lm_head.dense.bias', 'lm_head.decoder.weight']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaModel were not initialized from the model checkpoint at ../input/commonlitreadabilityprize/pretrained-model/clrp_roberta_base/ and are newly initialized: ['roberta.pooler.dense.bias', 'roberta


32 steps took 24.9 seconds
Epoch: 0 batch_num: 32 val_rmse: 0.7311
New best_val_rmse: 0.7311

32 steps took 23.8 seconds
Epoch: 0 batch_num: 64 val_rmse: 0.5772
New best_val_rmse: 0.5772

32 steps took 24.0 seconds
Epoch: 1 batch_num: 26 val_rmse: 0.5178
New best_val_rmse: 0.5178

32 steps took 23.8 seconds
Epoch: 1 batch_num: 58 val_rmse: 0.4914
New best_val_rmse: 0.4914

16 steps took 12.1 seconds
Epoch: 2 batch_num: 4 val_rmse: 0.5145
Still best_val_rmse: 0.4914 (from epoch 1)

32 steps took 23.8 seconds
Epoch: 2 batch_num: 36 val_rmse: 0.4763
New best_val_rmse: 0.4763

4 steps took 2.99 seconds
Epoch: 2 batch_num: 40 val_rmse: 0.4755
New best_val_rmse: 0.4755

4 steps took 2.98 seconds
Epoch: 2 batch_num: 44 val_rmse: 0.4753
New best_val_rmse: 0.4753

4 steps took 3.0 seconds
Epoch: 2 batch_num: 48 val_rmse: 0.4767
Still best_val_rmse: 0.4753 (from epoch 2)

4 steps took 2.99 seconds
Epoch: 2 batch_num: 52 val_rmse: 0.4765
Still best_val_rmse: 0.4753 (from epoch 2)

4 steps took 2



32 steps took 24.7 seconds
Epoch: 0 batch_num: 32 val_rmse: 0.7327
New best_val_rmse: 0.7327

32 steps took 23.8 seconds
Epoch: 0 batch_num: 64 val_rmse: 0.5808
New best_val_rmse: 0.5808

32 steps took 24.0 seconds
Epoch: 1 batch_num: 26 val_rmse: 0.5305
New best_val_rmse: 0.5305

32 steps took 23.9 seconds
Epoch: 1 batch_num: 58 val_rmse: 0.5399
Still best_val_rmse: 0.5305 (from epoch 1)

32 steps took 24.0 seconds
Epoch: 2 batch_num: 20 val_rmse: 0.5026
New best_val_rmse: 0.5026

32 steps took 23.9 seconds
Epoch: 2 batch_num: 52 val_rmse: 0.491
New best_val_rmse: 0.491

16 steps took 11.9 seconds
Epoch: 2 batch_num: 68 val_rmse: 0.4915
Still best_val_rmse: 0.491 (from epoch 2)

Performance estimates:
[0.47532520409721, 0.49095675664899885]
Mean: 0.48314098037310443

Fold 3/5




32 steps took 24.8 seconds
Epoch: 0 batch_num: 32 val_rmse: 0.6453
New best_val_rmse: 0.6453

32 steps took 23.9 seconds
Epoch: 0 batch_num: 64 val_rmse: 0.6176
New best_val_rmse: 0.6176

32 steps took 24.1 seconds
Epoch: 1 batch_num: 26 val_rmse: 0.5525
New best_val_rmse: 0.5525

32 steps took 23.9 seconds
Epoch: 1 batch_num: 58 val_rmse: 0.5413
New best_val_rmse: 0.5413

32 steps took 24.1 seconds
Epoch: 2 batch_num: 20 val_rmse: 0.4977
New best_val_rmse: 0.4977

16 steps took 11.9 seconds
Epoch: 2 batch_num: 36 val_rmse: 0.482
New best_val_rmse: 0.482

8 steps took 5.98 seconds
Epoch: 2 batch_num: 44 val_rmse: 0.4694
New best_val_rmse: 0.4694

2 steps took 1.5 seconds
Epoch: 2 batch_num: 46 val_rmse: 0.4688
New best_val_rmse: 0.4688

2 steps took 1.49 seconds
Epoch: 2 batch_num: 48 val_rmse: 0.4686
New best_val_rmse: 0.4686

2 steps took 1.49 seconds
Epoch: 2 batch_num: 50 val_rmse: 0.4686
Still best_val_rmse: 0.4686 (from epoch 2)

2 steps took 1.49 seconds
Epoch: 2 batch_num: 52 



32 steps took 24.8 seconds
Epoch: 0 batch_num: 32 val_rmse: 0.6801
New best_val_rmse: 0.6801

32 steps took 23.8 seconds
Epoch: 0 batch_num: 64 val_rmse: 0.6027
New best_val_rmse: 0.6027

32 steps took 24.1 seconds
Epoch: 1 batch_num: 26 val_rmse: 0.5339
New best_val_rmse: 0.5339

32 steps took 23.8 seconds
Epoch: 1 batch_num: 58 val_rmse: 0.5293
New best_val_rmse: 0.5293

32 steps took 24.0 seconds
Epoch: 2 batch_num: 20 val_rmse: 0.48
New best_val_rmse: 0.48

8 steps took 5.97 seconds
Epoch: 2 batch_num: 28 val_rmse: 0.4761
New best_val_rmse: 0.4761

4 steps took 2.99 seconds
Epoch: 2 batch_num: 32 val_rmse: 0.472
New best_val_rmse: 0.472

4 steps took 2.98 seconds
Epoch: 2 batch_num: 36 val_rmse: 0.4701
New best_val_rmse: 0.4701

4 steps took 2.98 seconds
Epoch: 2 batch_num: 40 val_rmse: 0.4846
Still best_val_rmse: 0.4701 (from epoch 2)

8 steps took 5.97 seconds
Epoch: 2 batch_num: 48 val_rmse: 0.4722
Still best_val_rmse: 0.4701 (from epoch 2)

4 steps took 2.98 seconds
Epoch: 2 b



32 steps took 24.8 seconds
Epoch: 0 batch_num: 32 val_rmse: 0.718
New best_val_rmse: 0.718

32 steps took 23.9 seconds
Epoch: 0 batch_num: 64 val_rmse: 0.6327
New best_val_rmse: 0.6327

32 steps took 24.1 seconds
Epoch: 1 batch_num: 26 val_rmse: 0.5429
New best_val_rmse: 0.5429

32 steps took 23.9 seconds
Epoch: 1 batch_num: 58 val_rmse: 0.5321
New best_val_rmse: 0.5321

32 steps took 24.0 seconds
Epoch: 2 batch_num: 20 val_rmse: 0.5341
Still best_val_rmse: 0.5321 (from epoch 1)

32 steps took 23.9 seconds
Epoch: 2 batch_num: 52 val_rmse: 0.5219
New best_val_rmse: 0.5219

Performance estimates:
[0.47532520409721, 0.49095675664899885, 0.4685706614425472, 0.47010970041040945, 0.5219292142015153]
Mean: 0.48537830736013615
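
The learning-rate multiplier produced by a cosine schedule with warmup can be sketched in a few lines. This follows the formula used by `get_cosine_schedule_with_warmup` with its default half-cycle setting, but it is an illustrative re-implementation, not the library code. The function name is hypothetical, and the total of 210 training steps is an assumption (3 epochs times the roughly 70 batches per epoch suggested by the log).

```python
import math


def cosine_with_warmup(step, num_warmup_steps, num_training_steps):
    """Multiplier applied to each parameter group's base learning rate."""
    if step < num_warmup_steps:
        # Linear warmup from 0 to 1.
        return step / max(1, num_warmup_steps)
    progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    # Half a cosine wave from 1 down to 0.
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


WARMUP = 50
TOTAL = 210  # assumption: 3 epochs x 70 steps per epoch

multipliers = {step: cosine_with_warmup(step, WARMUP, TOTAL)
               for step in (0, 25, 50, 130, 210)}
for step, value in multipliers.items():
    print(f"step {step:3d}: multiplier = {value:.3f}")
```

The multiplier scales every parameter group's own base rate, so the layer-wise rates keep their relative proportions throughout training.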


# Inference

In [24]:
test_dataset = LitDataset(test_df, inference_only=True)

In [25]:
all_predictions = np.zeros((len(list_val_rmse), len(test_df)))

test_dataset = LitDataset(test_df, inference_only=True)
test_loader = DataLoader(test_dataset, batch_size=BATCH_SIZE,
                         drop_last=False, shuffle=False, num_workers=2)

for index in range(len(list_val_rmse)):            
    model_path = f"model_{index + 1}.pth"
    print(f"\nUsing {model_path}")
                        
    model = LitModel()
    model.load_state_dict(torch.load(model_path))    
    model.to(DEVICE)
    
    all_predictions[index] = predict(model, test_loader)
    
    del model
    gc.collect()


Using model_1.pth




Using model_2.pth




Using model_3.pth




Using model_4.pth




Using model_5.pth



In [26]:
# Average the per-fold predictions into a single value per test sample.
predictions = all_predictions.mean(axis=0)
submission_df["target"] = predictions
print(submission_df)
submission_df.to_csv("submission.csv", index=False)

          id    target
0  c0f722661 -0.475947
1  f0953f0a5 -0.526951
2  0df072751 -0.400781
3  04caf4e0c -2.507379
4  0e63f8bea -1.775736
5  12537fe78 -1.348232
6  965e592c0  0.289456
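
The averaging step above is a plain unweighted mean across the fold models. A minimal sketch with made-up numbers follows; the three-fold, four-sample array and the extra `disagreement` diagnostic are illustrative assumptions, not values or steps from this notebook.

```python
import numpy as np

# Hypothetical predictions: one row per fold model, one column per test sample.
all_predictions = np.array([
    [-0.5, 0.2, -1.0, 0.0],
    [-0.3, 0.4, -1.2, 0.2],
    [-0.4, 0.3, -1.1, 0.1],
])

# axis=0 collapses the model dimension, leaving one value per test sample.
ensemble = all_predictions.mean(axis=0)

# Standard deviation across folds: a rough indicator of how much the models disagree.
disagreement = all_predictions.std(axis=0)

print(ensemble.shape)
print(ensemble)
print(disagreement)
```

Averaging along `axis=0` keeps the sample order intact, so the result can be assigned directly to a column whose rows follow the same order as the test set.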


# Upload data

In [27]:
!date +"%Y%m%d%H%M%S"

20210704092830
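
A note on the timestamp format: in strftime-style format strings, `%I` is the 12-hour clock hour and `%H` is the 24-hour clock hour, so only `%H` yields a suffix that stays unique and sortable across a whole day. The sketch below illustrates the difference in Python; the evening time is chosen purely for illustration.

```python
from datetime import datetime

# A fixed example time in the evening (illustrative only).
moment = datetime(2021, 7, 4, 21, 28, 30)

# %I wraps the hour to the 12-hour clock; %H keeps the 24-hour value.
twelve_hour = moment.strftime("%Y%m%d%I%M%S")
twenty_four_hour = moment.strftime("%Y%m%d%H%M%S")

print(twelve_hour)       # 20210704092830
print(twenty_four_hour)  # 20210704212830
```

With `%I`, a run at 21:28:30 produces the same string as a run at 09:28:30 on the same day.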


In [30]:
!mkdir -p ./output/
!cp -f ./model* ./output/
!cp -f ./drive/MyDrive/kaggle/commonlit/Lightweight-Roberta-base/dataset-metadata-batch.json ./output/dataset-metadata.json
!sed -i -e "s/lightweight-roberta-base/lightweight-roberta-base-`date +"%Y%m%d%H%M%S"`/" ./output/dataset-metadata.json
!cat ./output/dataset-metadata.json
!kaggle datasets create -p ./output/

{
  "licenses": [
    {
      "name": "CC0-1.0"
    }
  ], 
  "id": "iamnishipy/lightweight-roberta-base-20210704095435-batch", 
  "title": "Lightweight-Roberta-base with changing batch"
}
Starting upload for file model_3.pth
100% 477M/477M [00:38<00:00, 13.1MB/s]
Upload successful: model_3.pth (477MB)
Starting upload for file model_5.pth
100% 477M/477M [00:35<00:00, 13.9MB/s]
Upload successful: model_5.pth (477MB)
Starting upload for file model_4.pth
100% 477M/477M [00:32<00:00, 15.4MB/s]
Upload successful: model_4.pth (477MB)
Starting upload for file model_2.pth
100% 477M/477M [00:38<00:00, 13.0MB/s]
Upload successful: model_2.pth (477MB)
Starting upload for file dataset-metadata-batch.json
100% 173/173 [00:06<00:00, 25.9B/s]
Upload successful: dataset-metadata-batch.json (173B)
Starting upload for file model_1.pth
100% 477M/477M [00:33<00:00, 14.8MB/s]
Upload successful: model_1.pth (477MB)
Your private Dataset is being created. Please check progress at /api/v1/datasets/status//iamni
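
As an alternative to editing the metadata with `sed`, the same id rewrite could be done in Python, which avoids accidentally matching other fields. This is only a sketch: `with_id_suffix` is a hypothetical helper, and the template dictionary is inferred from the printed metadata above, so the real template file may differ.

```python
import json


def with_id_suffix(metadata: dict, base: str, suffix: str) -> dict:
    """Return a copy of metadata whose "id" has `-suffix` appended to `base`."""
    updated = dict(metadata)
    # Replace only the first occurrence, mirroring sed's default behaviour.
    updated["id"] = metadata["id"].replace(base, f"{base}-{suffix}", 1)
    return updated


# Template inferred from the printed output; treat it as an assumption.
template = {
    "licenses": [{"name": "CC0-1.0"}],
    "id": "iamnishipy/lightweight-roberta-base-batch",
    "title": "Lightweight-Roberta-base with changing batch",
}

result = with_id_suffix(template, "lightweight-roberta-base", "20210704095435")
print(json.dumps(result, indent=2))
```

Writing `result` to `./output/dataset-metadata.json` with `json.dump` would then replace the `cp` and `sed` steps.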