<a href="https://colab.research.google.com/github/nishipy/clrp/blob/main/light_weight_roberta_base_baseline_epoch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Overview
This kernel is almost the same as [Lightweight Roberta solution in PyTorch](https://www.kaggle.com/andretugan/lightweight-roberta-solution-in-pytorch), but instead of "roberta-base" it starts from [Maunish's pre-trained model](https://www.kaggle.com/maunish/clrp-roberta-base).

Acknowledgments: some ideas were taken from kernels by [Torch](https://www.kaggle.com/rhtsingh) and [Maunish](https://www.kaggle.com/maunish).

In addition, we use the [stratified_kfold train dataset](https://www.kaggle.com/takeshikobayashi/commonlit-train-datasetfor) for training the model.

## Original notebook
- Lightweight Roberta solution
  - https://www.kaggle.com/andretugan/pre-trained-roberta-solution-in-pytorch
- Pretrained with MLM
  - https://www.kaggle.com/maunish/clrp-pytorch-roberta-pretrain

# Prepare

## Checking GPU status

In [32]:
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Select the Runtime > "Change runtime type" menu to enable a GPU accelerator, ')
  print('and then re-execute this cell.')
else:
  print(gpu_info)

Sun Jul  4 11:45:10 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.27       Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   41C    P0    35W / 250W |   7491MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

## Download dataset from kaggle

In [33]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


### kaggle.json

In [34]:
!mkdir -p /root/.kaggle/
!cp ./drive/MyDrive/kaggle/commonlit/kaggle.json ~/.kaggle/kaggle.json
!chmod 600 ~/.kaggle/kaggle.json

### Competition dataset

In [35]:
!mkdir -p ../input/commonlitreadabilityprize/
!kaggle competitions download -c commonlitreadabilityprize -p ../input/commonlitreadabilityprize/
!cp -f ./drive/MyDrive/kaggle/commonlit/train_stratiKfold.csv.zip ../input/commonlitreadabilityprize/

sample_submission.csv: Skipping, found more recently modified local copy (use --force to force download)
test.csv: Skipping, found more recently modified local copy (use --force to force download)
train.csv.zip: Skipping, found more recently modified local copy (use --force to force download)


In [36]:
!unzip -o ../input/commonlitreadabilityprize/train.csv.zip -d ../input/commonlitreadabilityprize/
!unzip -o ../input/commonlitreadabilityprize/train_stratiKfold.csv.zip -d ../input/commonlitreadabilityprize/

Archive:  ../input/commonlitreadabilityprize/train.csv.zip
  inflating: ../input/commonlitreadabilityprize/train.csv  
Archive:  ../input/commonlitreadabilityprize/train_stratiKfold.csv.zip
  inflating: ../input/commonlitreadabilityprize/train_stratiKfold.csv  


In [37]:
!ls ../input/commonlitreadabilityprize/

pretrained-model       train.csv	      train_stratiKfold.csv.zip
sample_submission.csv  train.csv.zip
test.csv	       train_stratiKfold.csv


### Model pre-trained with MLM 
- Notebook
  - https://www.kaggle.com/maunish/clrp-pytorch-roberta-pretrain
- Model data
  - https://www.kaggle.com/maunish/clrp-roberta-base

In [38]:
!mkdir -p ../input/commonlitreadabilityprize/pretrained-model/
!kaggle datasets download maunish/clrp-roberta-base -p ../input/commonlitreadabilityprize/pretrained-model/

clrp-roberta-base.zip: Skipping, found more recently modified local copy (use --force to force download)


In [39]:
!unzip -o ../input/commonlitreadabilityprize/pretrained-model/clrp-roberta-base.zip -d ../input/commonlitreadabilityprize/pretrained-model/

Archive:  ../input/commonlitreadabilityprize/pretrained-model/clrp-roberta-base.zip
  inflating: ../input/commonlitreadabilityprize/pretrained-model/clrp_roberta_base/config.json  
  inflating: ../input/commonlitreadabilityprize/pretrained-model/clrp_roberta_base/merges.txt  
  inflating: ../input/commonlitreadabilityprize/pretrained-model/clrp_roberta_base/pytorch_model.bin  
  inflating: ../input/commonlitreadabilityprize/pretrained-model/clrp_roberta_base/special_tokens_map.json  
  inflating: ../input/commonlitreadabilityprize/pretrained-model/clrp_roberta_base/tokenizer_config.json  
  inflating: ../input/commonlitreadabilityprize/pretrained-model/clrp_roberta_base/training_args.bin  
  inflating: ../input/commonlitreadabilityprize/pretrained-model/clrp_roberta_base/vocab.json  
  inflating: ../input/commonlitreadabilityprize/pretrained-model/clrp_roberta_base_chk/checkpoint-600/config.json  
  inflating: ../input/commonlitreadabilityprize/pretrained-model/clrp_roberta_base_chk/ch

# Install dependencies

In [40]:
!pip install transformers accelerate datasets



In [41]:
import os
import math
import random
import time

import numpy as np
import pandas as pd

import torch
import torch.nn as nn
from torch.utils.data import Dataset
from torch.utils.data import DataLoader

from transformers import AdamW
from transformers import AutoTokenizer
from transformers import AutoModel
from transformers import AutoConfig
from transformers import get_cosine_schedule_with_warmup

from sklearn.model_selection import KFold

import gc
gc.enable()

# Set constants

In [42]:
NUM_FOLDS = 5
NUM_EPOCHS = 10
#NUM_EPOCHS = 3
BATCH_SIZE = 16
MAX_LEN = 248
#(eval_rmse, step_size)
EVAL_SCHEDULE = [(0.50, 16), (0.49, 8), (0.48, 4), (0.47, 2), (-1., 1)]
ROBERTA_PATH = "../input/commonlitreadabilityprize/pretrained-model/clrp_roberta_base/"
TOKENIZER_PATH = "../input/commonlitreadabilityprize/pretrained-model/clrp_roberta_base/"
#ROBERTA_PATH = "../input/clrp-roberta-base/clrp_roberta_base"
#TOKENIZER_PATH = "../input/clrp-roberta-base/clrp_roberta_base"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

In [43]:
EVAL_SCHEDULE[0][1]

16
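
Each entry of `EVAL_SCHEDULE` pairs a validation RMSE threshold with the number of steps to wait before the next evaluation, so the model is evaluated more often as it improves. The sketch below isolates that lookup. `next_eval_period` is a hypothetical helper name; in this notebook the same loop is written inline in `train()`.

```python
# Same values as EVAL_SCHEDULE above: (val_rmse threshold, steps between evaluations).
EVAL_SCHEDULE = [(0.50, 16), (0.49, 8), (0.48, 4), (0.47, 2), (-1.0, 1)]


def next_eval_period(val_rmse, schedule=EVAL_SCHEDULE):
    """Return the period of the first entry whose threshold is <= val_rmse.

    Hypothetical helper for illustration only.
    """
    for threshold, period in schedule:
        if val_rmse >= threshold:
            return period
    # Not reached with the schedule above, because the last threshold is -1.
    return schedule[-1][1]


for rmse in (0.62, 0.495, 0.485, 0.475, 0.46):
    print(rmse, "->", next_eval_period(rmse))
```

With this schedule, a validation RMSE of 0.50 or worse keeps the period at 16 steps, and anything below 0.47 triggers an evaluation after every step.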

# Define utility functions

In [44]:
def set_random_seed(random_seed):
    random.seed(random_seed)
    np.random.seed(random_seed)
    os.environ["PYTHONHASHSEED"] = str(random_seed)

    torch.manual_seed(random_seed)
    torch.cuda.manual_seed(random_seed)
    torch.cuda.manual_seed_all(random_seed)

    torch.backends.cudnn.deterministic = True

For `train_df`, we use the dataset that has already been split with stratified k-fold.

In [45]:
#Use stratified k-fold train dataset
#train_df = pd.read_csv("/kaggle/input/commonlitreadabilityprize/train.csv")
train_df = pd.read_csv("../input/commonlitreadabilityprize/train_stratiKfold.csv")

# Remove incomplete entries if any.
train_df.drop(train_df[(train_df.target == 0) & (train_df.standard_error == 0)].index,
              inplace=True)
train_df.reset_index(drop=True, inplace=True)

test_df = pd.read_csv("../input/commonlitreadabilityprize/test.csv")
submission_df = pd.read_csv("../input/commonlitreadabilityprize/sample_submission.csv")

In [46]:
# The tokenizer is the same as the one for roberta-base
tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_PATH)

In [47]:
train_df[train_df['kfold']!=1]

| Unnamed: 0.1 | Unnamed: 0 | id | url_legal | license | excerpt | target | standard_error | kfold | bins |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | bf24448fb | | | Anywhere there is a frontier, where there are ... | -1.866238 | 0.510911 | 3 | 4 |
| 2 | 2 | 7cad0f936 | | | A great violinist, Ole Bull by name, visited t... | -0.578482 | 0.471768 | 2 | 6 |
| 4 | 4 | 91e87e7dc | | | Hans stopped snoring and awoke at supper-time.... | -0.186015 | 0.492731 | 2 | 7 |
| 5 | 5 | 20a9f9032 | | | The Government of the United States has viewed... | -1.391438 | 0.499195 | 4 | 5 |
| 6 | 6 | daab29b47 | | | Forty years ago women were given no representa... | -1.291128 | 0.531642 | 2 | 5 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2826 | 2827 | d25b7c3aa | https://www.commonlit.org/texts/the-center-of-... | CC BY-NC-SA 2.0 | The sun is a star, just like the other million... | -0.580631 | 0.457745 | 2 | 6 |
| 2828 | 2829 | 3c1662f6d | | | It was the northwest coast of Australia, the c... | -1.678689 | 0.493150 | 2 | 4 |
| 2830 | 2831 | 64b635d77 | https://www.africanstorybook.org/ | CC BY 4.0 | Once long ago, the birds had a meeting. They w... | 0.639650 | 0.503652 | 2 | 9 |
| 2831 | 2832 | d6764322c | | | As an adult, I might learn new actions by taki... | 1.024258 | 0.549119 | 3 | 10 |


# Dataset

In [48]:
class LitDataset(Dataset):
    def __init__(self, df, inference_only=False):
        super().__init__()

        self.df = df        
        self.inference_only = inference_only
        self.text = df.excerpt.tolist()
        # Try removing newlines. This line is commented out in the original notebook.
        #self.text = [text.replace("\n", " ") for text in self.text]
        
        if not self.inference_only:
            self.target = torch.tensor(df.target.values, dtype=torch.float32)        
    
        self.encoded = tokenizer.batch_encode_plus(
            self.text,
            padding = 'max_length',            
            max_length = MAX_LEN,
            truncation = True,
            return_attention_mask=True
        )        
 

    def __len__(self):
        return len(self.df)

    
    def __getitem__(self, index):        
        input_ids = torch.tensor(self.encoded['input_ids'][index])
        attention_mask = torch.tensor(self.encoded['attention_mask'][index])
        
        if self.inference_only:
            return (input_ids, attention_mask)            
        else:
            target = self.target[index]
            return (input_ids, attention_mask, target)
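
`LitDataset` asks the tokenizer for `padding='max_length'` and `truncation=True`, so every excerpt ends up as exactly `MAX_LEN` token ids plus an attention mask of the same length. The following is a simplified, framework-free sketch of that shape contract, not the tokenizer's actual implementation. `pad_or_truncate` is a hypothetical helper, it ignores special tokens such as `<s>` and `</s>`, and `pad_id=1` assumes the standard roberta-base vocabulary.

```python
def pad_or_truncate(token_ids, max_len, pad_id=1):
    """Return (input_ids, attention_mask), both of length max_len.

    Simplified illustration: a real tokenizer also inserts and preserves
    special tokens. pad_id=1 is assumed to be the <pad> id of roberta-base.
    """
    kept = token_ids[:max_len]                 # truncation
    n_pad = max_len - len(kept)                # how many pad positions to add
    input_ids = kept + [pad_id] * n_pad        # padding to a fixed length
    attention_mask = [1] * len(kept) + [0] * n_pad  # 1 = real token, 0 = padding
    return input_ids, attention_mask


short_ids, short_mask = pad_or_truncate([5, 6, 7], max_len=5)
long_ids, long_mask = pad_or_truncate(list(range(10)), max_len=4)
print(short_ids, short_mask)
print(long_ids, long_mask)
```

Because every item has the same length, `DataLoader` can stack the tensors into a batch without a custom collate function.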

# Model
The model is inspired by the one from [Maunish](https://www.kaggle.com/maunish/clrp-roberta-svm).

In [49]:
class LitModel(nn.Module):
    def __init__(self):
        super().__init__()

        config = AutoConfig.from_pretrained(ROBERTA_PATH)
        # Update the settings loaded from config.json
        config.update({"output_hidden_states":True, 
                       "hidden_dropout_prob": 0.0,
                       "layer_norm_eps": 1e-7})                       
        
        self.roberta = AutoModel.from_pretrained(ROBERTA_PATH, config=config)  
            
        self.attention = nn.Sequential(            
            nn.Linear(768, 512),            
            nn.Tanh(),                       
            nn.Linear(512, 1),
            nn.Softmax(dim=1)
        )        

        self.regressor = nn.Sequential(                        
            nn.Linear(768, 1)                        
        )
        

    def forward(self, input_ids, attention_mask):
        roberta_output = self.roberta(input_ids=input_ids,
                                      attention_mask=attention_mask)        

        # There are a total of 13 layers of hidden states.
        # 1 for the embedding layer, and 12 for the 12 Roberta layers.
        # We take the hidden states from the last Roberta layer.
        last_layer_hidden_states = roberta_output.hidden_states[-1]

        # The number of cells is MAX_LEN.
        # The size of the hidden state of each cell is 768 (for roberta-base).
        # In order to condense hidden states of all cells to a context vector,
        # we compute a weighted average of the hidden states of all cells.
        # We compute the weight of each cell, using the attention neural network.
        weights = self.attention(last_layer_hidden_states)
                
        # weights.shape is BATCH_SIZE x MAX_LEN x 1
        # last_layer_hidden_states.shape is BATCH_SIZE x MAX_LEN x 768        
        # Now we compute context_vector as the weighted average.
        # context_vector.shape is BATCH_SIZE x 768
        context_vector = torch.sum(weights * last_layer_hidden_states, dim=1)        
        
        # Now we reduce the context vector to the prediction score.
        return self.regressor(context_vector)
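
The comments in `forward` describe attention pooling: one score per token, a softmax across the sequence, then a weighted sum of the token vectors. Below is a minimal pure-Python sketch of that arithmetic for a single sequence. `attention_pool` is a hypothetical name, and the `scores` argument stands in for the output of the `Linear -> Tanh -> Linear` stack before the softmax.

```python
import math


def attention_pool(hidden_states, scores):
    """Weighted average of per-token vectors for one sequence.

    hidden_states: list of token vectors (seq_len x hidden_size)
    scores: one unnormalized score per token (length seq_len)
    """
    # Softmax over the sequence dimension (dim=1 in the batched model).
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum over tokens, computed separately for each hidden dimension.
    hidden_size = len(hidden_states[0])
    context = [
        sum(w * token[d] for w, token in zip(weights, hidden_states))
        for d in range(hidden_size)
    ]
    return weights, context


weights, context = attention_pool(
    hidden_states=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    scores=[0.0, 0.0, 0.0],
)
print(weights, context)
```

With equal scores the weights are uniform, so the context vector reduces to the plain mean of the token vectors. Unequal scores shift the average toward the higher-scoring tokens.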

## Define eval

In [50]:
# Evaluate with MSE
def eval_mse(model, data_loader):
    """Evaluates the mean squared error of the |model| on |data_loader|"""
    model.eval()            
    mse_sum = 0

    with torch.no_grad():
        for batch_num, (input_ids, attention_mask, target) in enumerate(data_loader):
            input_ids = input_ids.to(DEVICE)
            attention_mask = attention_mask.to(DEVICE)                        
            target = target.to(DEVICE)           
            
            pred = model(input_ids, attention_mask)                       

            mse_sum += nn.MSELoss(reduction="sum")(pred.flatten(), target).item()
                

    return mse_sum / len(data_loader.dataset)
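
`eval_mse` sums squared errors with `reduction="sum"` and divides once by the dataset size, rather than averaging per-batch means. The two differ whenever the last batch is shorter than the others, which can happen here because the validation loader uses `drop_last=False`. A small sketch with made-up numbers (the helper name `rmse_over_batches` is hypothetical):

```python
import math


def rmse_over_batches(batches):
    """batches: iterable of (predictions, targets) lists, possibly of unequal length."""
    squared_error_sum = 0.0
    count = 0
    for preds, targets in batches:
        squared_error_sum += sum((p - t) ** 2 for p, t in zip(preds, targets))
        count += len(targets)
    return math.sqrt(squared_error_sum / count)


# Two batches: one with two examples, one with a single example.
batches = [([1.0, 2.0], [0.0, 0.0]), ([3.0], [0.0])]
pooled_rmse = rmse_over_batches(batches)

# Averaging the per-batch MSEs instead gives the short batch too much weight.
mean_of_batch_mse = ((1.0 + 4.0) / 2 + 9.0 / 1) / 2

print(pooled_rmse ** 2, mean_of_batch_mse)
```

The pooled value matches the MSE over all three examples, while the mean of batch means is inflated by the single-example batch.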

## Define predict

In [51]:
def predict(model, data_loader):
    """Returns an np.array with predictions of the |model| on |data_loader|"""
    model.eval()

    result = np.zeros(len(data_loader.dataset))    
    index = 0
    
    with torch.no_grad():
        for batch_num, (input_ids, attention_mask) in enumerate(data_loader):
            input_ids = input_ids.to(DEVICE)
            attention_mask = attention_mask.to(DEVICE)
                        
            pred = model(input_ids, attention_mask)                        

            result[index : index + pred.shape[0]] = pred.flatten().to("cpu")
            index += pred.shape[0]

    return result

## Define train

In [52]:
def train(model, model_path, train_loader, val_loader,
          optimizer, scheduler=None, num_epochs=NUM_EPOCHS):    
    best_val_rmse = None
    best_epoch = 0
    step = 0
    last_eval_step = 0
    #EVAL_SCHEDULE = [(0.50, 16), (0.49, 8), (0.48, 4), (0.47, 2), (-1., 1)]
    #-> EVAL_SCHEDULE[0][1] = 16
    eval_period = EVAL_SCHEDULE[0][1]    

    start = time.time()

    # Loop over the epochs
    for epoch in range(num_epochs):                           
        val_rmse = None         

        for batch_num, (input_ids, attention_mask, target) in enumerate(train_loader):
            input_ids = input_ids.to(DEVICE)
            attention_mask = attention_mask.to(DEVICE)            
            target = target.to(DEVICE)                        

            optimizer.zero_grad()
            
            model.train()

            pred = model(input_ids, attention_mask)
                                                        
            mse = nn.MSELoss(reduction="mean")(pred.flatten(), target)
                        
            mse.backward()

            #https://stackoverflow.com/questions/60120043/optimizer-and-scheduler-for-bert-fine-tuning
            # Right after `optimizer.step()`, call `scheduler.step()` on every batch to update the learning rate.
            optimizer.step()
            if scheduler:
                scheduler.step()
            
            # Evaluate RMSE every eval_period steps (initially 16)
            if step >= last_eval_step + eval_period:
                # Evaluate the model on val_loader.
                elapsed_seconds = time.time() - start
                num_steps = step - last_eval_step
                print(f"\n{num_steps} steps took {elapsed_seconds:0.3} seconds")
                last_eval_step = step
                
                val_rmse = math.sqrt(eval_mse(model, val_loader))                            

                print(f"Epoch: {epoch} batch_num: {batch_num}", 
                      f"val_rmse: {val_rmse:0.4}")

                # Adjust eval_period according to the RMSE thresholds
                # defined in EVAL_SCHEDULE
                for rmse, period in EVAL_SCHEDULE:
                    if val_rmse >= rmse:
                        eval_period = period
                        break                               
                
                # Record the best score
                if best_val_rmse is None or val_rmse < best_val_rmse:
                    best_val_rmse = val_rmse
                    best_epoch = epoch
                    torch.save(model.state_dict(), model_path)
                    print(f"New best_val_rmse: {best_val_rmse:0.4}")
                else:       
                    print(f"Still best_val_rmse: {best_val_rmse:0.4}",
                          f"(from epoch {best_epoch})")                                    
                    
                start = time.time()

            # Increment step
            step += 1
                        
    
    return best_val_rmse
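
`train` calls `scheduler.step()` after every batch, so the learning rate changes per step rather than per epoch. The Run section builds the scheduler with `get_cosine_schedule_with_warmup` and `num_warmup_steps=50`. The sketch below reproduces, as far as I understand it, the multiplier that scheduler applies to each group's base learning rate with its default half-cosine setting: a linear ramp from 0 to 1 during warmup, then a cosine decay to 0. The function name and the total of 1410 steps are assumptions for illustration (10 epochs of 141 batches, which is consistent with the training log but not stated in the notebook).

```python
import math


def cosine_warmup_multiplier(step, num_warmup_steps, num_training_steps):
    """Learning-rate multiplier: linear warmup, then half-cosine decay to zero."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


WARMUP_STEPS = 50   # same value as in the Run section
TOTAL_STEPS = 1410  # assumed for illustration

for step in (0, 25, 50, 730, 1410):
    m = cosine_warmup_multiplier(step, WARMUP_STEPS, TOTAL_STEPS)
    print(step, round(m, 4))
```

The multiplier peaks at 1.0 exactly when warmup ends, so the per-group rates set in `create_optimizer` are the maximum values each group ever sees.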

## Create Optimizer

In [53]:
def create_optimizer(model):
    named_parameters = list(model.named_parameters())    
    
    roberta_parameters = named_parameters[:197]    
    attention_parameters = named_parameters[199:203]
    regressor_parameters = named_parameters[203:]
        
    attention_group = [params for (name, params) in attention_parameters]
    regressor_group = [params for (name, params) in regressor_parameters]

    parameters = []
    parameters.append({"params": attention_group})
    parameters.append({"params": regressor_group})

    for layer_num, (name, params) in enumerate(roberta_parameters):
        weight_decay = 0.0 if "bias" in name else 0.01

        lr = 2e-5

        if layer_num >= 69:        
            lr = 5e-5

        if layer_num >= 133:
            lr = 1e-4

        parameters.append({"params": params,
                           "weight_decay": weight_decay,
                           "lr": lr})

    return AdamW(parameters)
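
`create_optimizer` picks learning rates by position in `named_parameters()`, which makes the cutoffs 69 and 133 hard to read. Assuming the usual Hugging Face roberta-base ordering (5 embedding tensors followed by 16 tensors per encoder layer, which is an assumption about the checkpoint rather than something the notebook states), the indices map to layers as sketched below. Both helper names are hypothetical.

```python
def lr_for_parameter_index(index):
    """Learning rate that create_optimizer assigns to the index-th RoBERTa parameter."""
    if index >= 133:
        return 1e-4
    if index >= 69:
        return 5e-5
    return 2e-5


# Assumed layout of roberta-base: 5 embedding tensors, then 16 tensors per encoder layer.
NUM_EMBEDDING_PARAMS = 5
PARAMS_PER_LAYER = 16


def first_index_of_layer(layer):
    """Index of the first parameter tensor of the given encoder layer (0-based)."""
    return NUM_EMBEDDING_PARAMS + PARAMS_PER_LAYER * layer


for layer in range(12):
    print(f"layer {layer}: lr = {lr_for_parameter_index(first_index_of_layer(layer))}")
```

Under that assumption, the embeddings and layers 0 to 3 train at 2e-5, layers 4 to 7 at 5e-5, and layers 8 to 11 at 1e-4. The slice `[:197]` then covers exactly the embeddings and the 12 encoder layers, leaving out the two pooler tensors at indices 197 and 198.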

## Run

In [54]:
gc.collect()

SEED = 1000
list_val_rmse = []

for fold in range(NUM_FOLDS): 
    print(f"\nFold {fold + 1}/{NUM_FOLDS}")
    model_path = f"model_{fold + 1}.pth"
        
    set_random_seed(SEED + fold)

    # Adapted for the stratified k-fold train dataset
    train_dataset = LitDataset(train_df[train_df['kfold']!=fold])    
    val_dataset = LitDataset(train_df[train_df['kfold']==fold])    
    
    #https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader
    train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE,
                              drop_last=True, shuffle=True, num_workers=2)    
    val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE,
                            drop_last=False, shuffle=False, num_workers=2)    
    
    # The random seed changes with each fold
    set_random_seed(SEED + fold)    
    
    model = LitModel().to(DEVICE)
    
    optimizer = create_optimizer(model)
    # The scheduler is get_cosine_schedule_with_warmup
    # Other options: https://huggingface.co/transformers/main_classes/optimizer_schedules.html#schedules
    scheduler = get_cosine_schedule_with_warmup(
        optimizer,
        num_training_steps=NUM_EPOCHS * len(train_loader),
        num_warmup_steps=50)    
    
    list_val_rmse.append(train(model, model_path, train_loader,
                               val_loader, optimizer, scheduler=scheduler))

    del model
    gc.collect()
    
    print("\nPerformance estimates:")
    print(list_val_rmse)
    print("Mean:", np.array(list_val_rmse).mean())

    


Fold 1/5


Some weights of the model checkpoint at ../input/commonlitreadabilityprize/pretrained-model/clrp_roberta_base/ were not used when initializing RobertaModel: ['lm_head.dense.bias', 'lm_head.dense.weight', 'lm_head.bias', 'lm_head.layer_norm.weight', 'lm_head.decoder.bias', 'lm_head.layer_norm.bias', 'lm_head.decoder.weight']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaModel were not initialized from the model checkpoint at ../input/commonlitreadabilityprize/pretrained-model/clrp_roberta_base/ and are newly initialized: ['roberta.pooler.dense.weight', 'rober


16 steps took 7.01 seconds
Epoch: 0 batch_num: 16 val_rmse: 0.9574
New best_val_rmse: 0.9574

16 steps took 6.33 seconds
Epoch: 0 batch_num: 32 val_rmse: 0.7136
New best_val_rmse: 0.7136

16 steps took 6.33 seconds
Epoch: 0 batch_num: 48 val_rmse: 0.6355
New best_val_rmse: 0.6355

16 steps took 6.33 seconds
Epoch: 0 batch_num: 64 val_rmse: 0.6183
New best_val_rmse: 0.6183

16 steps took 6.33 seconds
Epoch: 0 batch_num: 80 val_rmse: 0.6518
Still best_val_rmse: 0.6183 (from epoch 0)

16 steps took 6.34 seconds
Epoch: 0 batch_num: 96 val_rmse: 0.5687
New best_val_rmse: 0.5687

16 steps took 6.33 seconds
Epoch: 0 batch_num: 112 val_rmse: 0.593
Still best_val_rmse: 0.5687 (from epoch 0)

16 steps took 6.33 seconds
Epoch: 0 batch_num: 128 val_rmse: 0.4965
New best_val_rmse: 0.4965

8 steps took 3.18 seconds
Epoch: 0 batch_num: 136 val_rmse: 0.488
New best_val_rmse: 0.488

4 steps took 1.58 seconds
Epoch: 0 batch_num: 140 val_rmse: 0.5306
Still best_val_rmse: 0.488 (from epoch 0)

16 steps t

Some weights of the model checkpoint at ../input/commonlitreadabilityprize/pretrained-model/clrp_roberta_base/ were not used when initializing RobertaModel: ['lm_head.dense.bias', 'lm_head.dense.weight', 'lm_head.bias', 'lm_head.layer_norm.weight', 'lm_head.decoder.bias', 'lm_head.layer_norm.bias', 'lm_head.decoder.weight']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaModel were not initialized from the model checkpoint at ../input/commonlitreadabilityprize/pretrained-model/clrp_roberta_base/ and are newly initialized: ['roberta.pooler.dense.weight', 'rober


16 steps took 6.87 seconds
Epoch: 0 batch_num: 16 val_rmse: 1.027
New best_val_rmse: 1.027

16 steps took 6.33 seconds
Epoch: 0 batch_num: 32 val_rmse: 0.7094
New best_val_rmse: 0.7094

16 steps took 6.33 seconds
Epoch: 0 batch_num: 48 val_rmse: 0.8308
Still best_val_rmse: 0.7094 (from epoch 0)

16 steps took 6.33 seconds
Epoch: 0 batch_num: 64 val_rmse: 0.696
New best_val_rmse: 0.696

16 steps took 6.33 seconds
Epoch: 0 batch_num: 80 val_rmse: 0.6487
New best_val_rmse: 0.6487

16 steps took 6.33 seconds
Epoch: 0 batch_num: 96 val_rmse: 0.5595
New best_val_rmse: 0.5595

16 steps took 6.33 seconds
Epoch: 0 batch_num: 112 val_rmse: 0.5961
Still best_val_rmse: 0.5595 (from epoch 0)

16 steps took 6.33 seconds
Epoch: 0 batch_num: 128 val_rmse: 0.5302
New best_val_rmse: 0.5302

16 steps took 6.51 seconds
Epoch: 1 batch_num: 3 val_rmse: 0.5278
New best_val_rmse: 0.5278

16 steps took 6.33 seconds
Epoch: 1 batch_num: 19 val_rmse: 0.5715
Still best_val_rmse: 0.5278 (from epoch 1)

16 steps to

Some weights of the model checkpoint at ../input/commonlitreadabilityprize/pretrained-model/clrp_roberta_base/ were not used when initializing RobertaModel: ['lm_head.dense.bias', 'lm_head.dense.weight', 'lm_head.bias', 'lm_head.layer_norm.weight', 'lm_head.decoder.bias', 'lm_head.layer_norm.bias', 'lm_head.decoder.weight']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaModel were not initialized from the model checkpoint at ../input/commonlitreadabilityprize/pretrained-model/clrp_roberta_base/ and are newly initialized: ['roberta.pooler.dense.weight', 'rober


16 steps took 6.87 seconds
Epoch: 0 batch_num: 16 val_rmse: 0.9434
New best_val_rmse: 0.9434

16 steps took 6.34 seconds
Epoch: 0 batch_num: 32 val_rmse: 0.8244
New best_val_rmse: 0.8244

16 steps took 6.33 seconds
Epoch: 0 batch_num: 48 val_rmse: 0.6613
New best_val_rmse: 0.6613

16 steps took 6.34 seconds
Epoch: 0 batch_num: 64 val_rmse: 0.582
New best_val_rmse: 0.582

16 steps took 6.33 seconds
Epoch: 0 batch_num: 80 val_rmse: 0.6072
Still best_val_rmse: 0.582 (from epoch 0)

16 steps took 6.33 seconds
Epoch: 0 batch_num: 96 val_rmse: 0.5693
New best_val_rmse: 0.5693

16 steps took 6.34 seconds
Epoch: 0 batch_num: 112 val_rmse: 0.5173
New best_val_rmse: 0.5173

16 steps took 6.33 seconds
Epoch: 0 batch_num: 128 val_rmse: 0.5574
Still best_val_rmse: 0.5173 (from epoch 0)

16 steps took 6.51 seconds
Epoch: 1 batch_num: 3 val_rmse: 0.4899
New best_val_rmse: 0.4899

4 steps took 1.58 seconds
Epoch: 1 batch_num: 7 val_rmse: 0.4977
Still best_val_rmse: 0.4899 (from epoch 1)

8 steps took

Some weights of the model checkpoint at ../input/commonlitreadabilityprize/pretrained-model/clrp_roberta_base/ were not used when initializing RobertaModel: ['lm_head.dense.bias', 'lm_head.dense.weight', 'lm_head.bias', 'lm_head.layer_norm.weight', 'lm_head.decoder.bias', 'lm_head.layer_norm.bias', 'lm_head.decoder.weight']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaModel were not initialized from the model checkpoint at ../input/commonlitreadabilityprize/pretrained-model/clrp_roberta_base/ and are newly initialized: ['roberta.pooler.dense.weight', 'rober


16 steps took 6.91 seconds
Epoch: 0 batch_num: 16 val_rmse: 1.059
New best_val_rmse: 1.059

16 steps took 6.33 seconds
Epoch: 0 batch_num: 32 val_rmse: 0.9468
New best_val_rmse: 0.9468

16 steps took 6.34 seconds
Epoch: 0 batch_num: 48 val_rmse: 0.6678
New best_val_rmse: 0.6678

16 steps took 6.34 seconds
Epoch: 0 batch_num: 64 val_rmse: 0.7193
Still best_val_rmse: 0.6678 (from epoch 0)

16 steps took 6.33 seconds
Epoch: 0 batch_num: 80 val_rmse: 0.5942
New best_val_rmse: 0.5942

16 steps took 6.34 seconds
Epoch: 0 batch_num: 96 val_rmse: 0.571
New best_val_rmse: 0.571

16 steps took 6.34 seconds
Epoch: 0 batch_num: 112 val_rmse: 0.6096
Still best_val_rmse: 0.571 (from epoch 0)

16 steps took 6.34 seconds
Epoch: 0 batch_num: 128 val_rmse: 0.5144
New best_val_rmse: 0.5144

16 steps took 6.55 seconds
Epoch: 1 batch_num: 3 val_rmse: 0.5789
Still best_val_rmse: 0.5144 (from epoch 0)

16 steps took 6.34 seconds
Epoch: 1 batch_num: 19 val_rmse: 0.4895
New best_val_rmse: 0.4895

4 steps took

Some weights of the model checkpoint at ../input/commonlitreadabilityprize/pretrained-model/clrp_roberta_base/ were not used when initializing RobertaModel: ['lm_head.dense.bias', 'lm_head.dense.weight', 'lm_head.bias', 'lm_head.layer_norm.weight', 'lm_head.decoder.bias', 'lm_head.layer_norm.bias', 'lm_head.decoder.weight']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaModel were not initialized from the model checkpoint at ../input/commonlitreadabilityprize/pretrained-model/clrp_roberta_base/ and are newly initialized: ['roberta.pooler.dense.weight', 'rober


16 steps took 6.91 seconds
Epoch: 0 batch_num: 16 val_rmse: 0.8846
New best_val_rmse: 0.8846

16 steps took 6.34 seconds
Epoch: 0 batch_num: 32 val_rmse: 0.7855
New best_val_rmse: 0.7855

16 steps took 6.34 seconds
Epoch: 0 batch_num: 48 val_rmse: 0.6569
New best_val_rmse: 0.6569

16 steps took 6.34 seconds
Epoch: 0 batch_num: 64 val_rmse: 0.6468
New best_val_rmse: 0.6468

16 steps took 6.34 seconds
Epoch: 0 batch_num: 80 val_rmse: 0.6375
New best_val_rmse: 0.6375

16 steps took 6.34 seconds
Epoch: 0 batch_num: 96 val_rmse: 0.5701
New best_val_rmse: 0.5701

16 steps took 6.34 seconds
Epoch: 0 batch_num: 112 val_rmse: 0.5739
Still best_val_rmse: 0.5701 (from epoch 0)

16 steps took 6.33 seconds
Epoch: 0 batch_num: 128 val_rmse: 0.5414
New best_val_rmse: 0.5414

16 steps took 6.56 seconds
Epoch: 1 batch_num: 3 val_rmse: 0.5268
New best_val_rmse: 0.5268

16 steps took 6.34 seconds
Epoch: 1 batch_num: 19 val_rmse: 0.6247
Still best_val_rmse: 0.5268 (from epoch 1)

16 steps took 6.33 secon

# Inference

In [55]:
test_dataset = LitDataset(test_df, inference_only=True)

In [56]:
all_predictions = np.zeros((len(list_val_rmse), len(test_df)))

test_dataset = LitDataset(test_df, inference_only=True)
test_loader = DataLoader(test_dataset, batch_size=BATCH_SIZE,
                         drop_last=False, shuffle=False, num_workers=2)

for index in range(len(list_val_rmse)):            
    model_path = f"model_{index + 1}.pth"
    print(f"\nUsing {model_path}")
                        
    model = LitModel()
    model.load_state_dict(torch.load(model_path))    
    model.to(DEVICE)
    
    all_predictions[index] = predict(model, test_loader)
    
    del model
    gc.collect()


Using model_1.pth


Some weights of the model checkpoint at ../input/commonlitreadabilityprize/pretrained-model/clrp_roberta_base/ were not used when initializing RobertaModel: ['lm_head.dense.bias', 'lm_head.dense.weight', 'lm_head.bias', 'lm_head.layer_norm.weight', 'lm_head.decoder.bias', 'lm_head.layer_norm.bias', 'lm_head.decoder.weight']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaModel were not initialized from the model checkpoint at ../input/commonlitreadabilityprize/pretrained-model/clrp_roberta_base/ and are newly initialized: ['roberta.pooler.dense.weight', 'rober


Using model_2.pth




Using model_3.pth


Using model_4.pth


Using model_5.pth



In [None]:
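# Average the per-model predictions (axis 0 indexes the models)
# to get one ensemble prediction per test row.
predictions = all_predictions.mean(axis=0)
submission_df["target"] = predictions
print(submission_df)
# Write the submission file without the DataFrame index column.
submission_df.to_csv("submission.csv", index=False)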
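
The cell above averages the predictions of the fold models into a single prediction per test row. A minimal sketch of that averaging step with made-up numbers, assuming `all_predictions` is a NumPy array of shape `(num_models, num_samples)`; the values and the three-model, four-row layout here are illustrative only and do not come from this notebook's run:

```python
import numpy as np

# Hypothetical predictions: 3 models (rows) x 4 test excerpts (columns).
all_predictions = np.array([
    [0.0, -1.0, 0.5, 2.0],
    [0.5, -1.5, 0.5, 1.0],
    [1.0, -0.5, 0.5, 0.0],
])

# mean(axis=0) collapses the model axis, leaving one value per excerpt.
ensemble = all_predictions.mean(axis=0)

print(ensemble.shape)  # (4,)
print(ensemble)        # [ 0.5 -1.   0.5  1. ]
```

Averaging over `axis=1` instead would return one value per model, which is not what a submission needs.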

# Upload data

In [59]:
!mkdir -p ./output/
!cp -f ./model* ./output/
# CHANGEME: point the next command at your own dataset-metadata template on Google Drive.
!cp -f ./drive/MyDrive/kaggle/commonlit/Lightweight-Roberta-base/dataset-metadata-epoch.json ./output/dataset-metadata.json
!sed -i -e "s/lightweight-roberta-base/lightweight-roberta-base-`TZ=JST-9 date +"%Y%m%d%H%M%S"`/" ./output/dataset-metadata.json
!sed -i -e "s/Lightweight-Roberta-base/Roberta-base-`TZ=JST-9 date +"%m%d%H%M%S"`/" ./output/dataset-metadata.json
!kaggle datasets create -p ./output/

Starting upload for file model_3.pth
100% 477M/477M [00:10<00:00, 46.0MB/s]
Upload successful: model_3.pth (477MB)
Starting upload for file model_5.pth
100% 477M/477M [00:11<00:00, 43.4MB/s]
Upload successful: model_5.pth (477MB)
Starting upload for file model_4.pth
100% 477M/477M [00:09<00:00, 51.1MB/s]
Upload successful: model_4.pth (477MB)
Starting upload for file model_2.pth
100% 477M/477M [00:09<00:00, 52.5MB/s]
Upload successful: model_2.pth (477MB)
Starting upload for file dataset-metadata-epoch.json
100% 174/174 [00:02<00:00, 61.7B/s]
Upload successful: dataset-metadata-epoch.json (174B)
Starting upload for file model_1.pth
100% 477M/477M [00:10<00:00, 48.5MB/s]
Upload successful: model_1.pth (477MB)
Your private Dataset is being created. Please check progress at /api/v1/datasets/status//iamnishipy/lightweight-roberta-base-20210705012222-epoch


In [60]:
!cat ./output/dataset-metadata.json

{
  "licenses": [
    {
      "name": "CC0-1.0"
    }
  ], 
  "id": "iamnishipy/lightweight-roberta-base-20210705012222-epoch", 
  "title": "Roberta-base-0705012222 with changing epochs"
}
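
The two `sed` commands in the upload cell stamp the dataset id and title with a JST timestamp so that each run creates a new, uniquely named Kaggle dataset. A sketch of the same substitution in Python may be easier to adapt. The function name `stamp_metadata` is hypothetical, and the template contents are an assumption inferred from the final metadata shown above, not the author's actual template file. Unlike `sed`, which scans every line of the file, this sketch rewrites only the `id` and `title` fields.

```python
import json
from datetime import datetime, timedelta, timezone

# Fixed UTC+9 offset, matching TZ=JST-9 in the shell commands.
JST = timezone(timedelta(hours=9))

def stamp_metadata(metadata, now=None):
    """Return a copy of `metadata` with a timestamp added to id and title."""
    now = now or datetime.now(JST)
    stamped = dict(metadata)
    # Like sed without the g flag, replace only the first occurrence.
    stamped["id"] = metadata["id"].replace(
        "lightweight-roberta-base",
        "lightweight-roberta-base-" + now.strftime("%Y%m%d%H%M%S"),
        1,
    )
    stamped["title"] = metadata["title"].replace(
        "Lightweight-Roberta-base",
        "Roberta-base-" + now.strftime("%m%d%H%M%S"),
        1,
    )
    return stamped

# Assumed template (hypothetical reconstruction of the file on Google Drive).
template = {
    "licenses": [{"name": "CC0-1.0"}],
    "id": "iamnishipy/lightweight-roberta-base-epoch",
    "title": "Lightweight-Roberta-base with changing epochs",
}

# A fixed time is used here so the result is reproducible.
fixed_time = datetime(2021, 7, 5, 1, 22, 22, tzinfo=JST)
result = stamp_metadata(template, now=fixed_time)
print(json.dumps(result, indent=2))
```

With the fixed time above, the resulting `id` and `title` match the metadata printed by the last cell. In a real run, omit `now` to use the current time and write the result to `./output/dataset-metadata.json` before calling `kaggle datasets create`.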