<a href="https://colab.research.google.com/github/jaswanth99999/GPT/blob/master/GPT1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.26.1-py3-none-any.whl (6.3 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m6.3/6.3 MB[0m [31m46.9 MB/s[0m eta [36m0:00:00[0m
Collecting huggingface-hub<1.0,>=0.11.0
  Downloading huggingface_hub-0.12.1-py3-none-any.whl (190 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m190.3/190.3 KB[0m [31m13.9 MB/s[0m eta [36m0:00:00[0m
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.6 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m7.6/7.6 MB[0m [31m77.3 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: tokenizers, huggingface-hub, transformers
Successfully installed huggingface-hub-0.12.1 tokenizers-0.13.2 transformers-4.26.1


In [2]:
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
import torch
from torch.utils.data import Dataset 
import random
import time
import datetime
import random
from transformers import GPT2LMHeadModel, GPT2Config
import numpy as np
from torch.utils.data import random_split
from torch.utils.data import DataLoader, RandomSampler, SequentialSampler

In [3]:
req='/content/data.txt'
with open(req) as f:
    req = f.read()

In [4]:
all_sentences = req.split("\n")

In [5]:
all_sentences[:100]

['Hokkaido was formerly known as Ezo  Yezo  Yeso  or Yesso.',
 'According to Matsuura  the name was thought up because the Ainu called the region Kai.',
 'In contrast to the island of Honshu  Hokkaido saw an absence of conflict during this time period.',
 'From the Middle Ages  the people in Hokkaido began to be called Ezo.',
 'Hokkaido subsequently became known as Ezochi  蝦夷地  lit.',
 'The disputes eventually developed into war.',
 'Takeda Nobuhiro killed the Ainu leader  Koshamain  and defeated the opposition in 1457.',
 'The Matsumae family s economy relied upon trade with the Ainu.',
 'They held authority over the south of Ezochi until the end of the Edo period.',
 'There were numerous revolts by the Ainu against the feudal rule.',
 'The last large scale resistance was Shakushain s revolt in 1669–1672.',
 'In 1789  a smaller movement known as the Menashi–Kunashir rebellion was crushed.',
 'Meiji Restoration     Hokkaido was known as Ezochi until the Meiji Restoration.',
 'Ezochi wa

In [7]:
print("sample size : ",len(all_sentences))

sample size :  40389


In [8]:
from transformers import GPT2Tokenizer
#get pretrained tokenizer
#Pad token: special token used to make arrays of tokens the same size for batching purpose.
tokenizer = GPT2Tokenizer.from_pretrained('gpt2', bos_token='<sos>', eos_token='<eos>', pad_token='<pad>')

Downloading (…)olve/main/vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

Downloading (…)olve/main/merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

Downloading (…)lve/main/config.json:   0%|          | 0.00/665 [00:00<?, ?B/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [9]:
print( tokenizer.encode("Japan Tokyo") )
print( tokenizer.encode("Japan") )
print( tokenizer.encode("japan tokyo") )
print( tokenizer.encode("japan") )
print( tokenizer.encode("tokyo") )

[16504, 11790]
[16504]
[73, 2674, 284, 2584, 78]
[73, 2674]
[83, 482, 8226]


In [10]:
max_len = max([len(tokenizer.encode(s)) for s in all_sentences])

print(f"max_len {max_len}")

max_len 85


In [11]:
#since we will be feeding with sentences from wikipedia
#we can mark beginning and end of sentences with with sos and eos
#The attention mask is a binary tensor indicating the position of the padded indices so that the model does not attend to them
def tokenize_seq(sent,tokenizer,max_length):
  return tokenizer('<sos>'+ sent + '<eos>', truncation=True, max_length=max_length, padding="max_length")

class JapanDataset(Dataset):

  def __init__(self, sentences, tokenizer, gpt2_type="gpt2", max_length=max_len):

    self.tokenizer = tokenizer 
    self.input_ids = []
    self.attn_masks = []

    for sentence in sentences:      
      encodings = tokenize_seq(sentence,tokenizer,max_length)
            
      self.input_ids.append(torch.tensor(encodings['input_ids']))
      self.attn_masks.append(torch.tensor(encodings['attention_mask']))
    
  def __len__(self):
    return len(self.input_ids)

  def __getitem__(self, idx):
    return self.input_ids[idx], self.attn_masks[idx]   

def format_time(elapsed):
    return str(datetime.timedelta(seconds=int(round((elapsed)))))  

In [12]:
import gc
gc.collect() 

26

In [13]:
#create an instance of Dataset
dataset = JapanDataset(all_sentences, tokenizer, max_length=max_len)

# Split into training and validation sets
train_size = int(0.9 * len(dataset))
val_size = len(dataset) - train_size

train_set, val_set = random_split(dataset, [train_size, val_size])
print("train_size :",train_size)
print("val_size   :",val_size)

gc.collect() 

train_size : 36350
val_size   : 4039


0

In [14]:
dataset[0]

(tensor([50257,    39,   482,    74, 44354,   373, 15734,  1900,   355,   412,
         10872,   220,   575,  8471,    78,   220,  3363,    78,   220,   393,
           575,   408,    78,    13, 50258, 50259, 50259, 50259, 50259, 50259,
         50259, 50259, 50259, 50259, 50259, 50259, 50259, 50259, 50259, 50259,
         50259, 50259, 50259, 50259, 50259, 50259, 50259, 50259, 50259, 50259,
         50259, 50259, 50259, 50259, 50259, 50259, 50259, 50259, 50259, 50259,
         50259, 50259, 50259, 50259, 50259, 50259, 50259, 50259, 50259, 50259,
         50259, 50259, 50259, 50259, 50259, 50259, 50259, 50259, 50259, 50259,
         50259, 50259, 50259, 50259, 50259]),
 tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]))

In [15]:
#define dataloaders
train_dataloader = DataLoader(train_set,  sampler = RandomSampler(train_set), batch_size = 32)
validation_dataloader = DataLoader(val_set, sampler = SequentialSampler(val_set), batch_size = 32 )

In [16]:
# Create default config
configuration = GPT2Config.from_pretrained('gpt2', output_hidden_states=False)
# Load pretrained gpt2
model = GPT2LMHeadModel.from_pretrained("gpt2", config=configuration)
model.resize_token_embeddings(len(tokenizer))

# Create device
device = torch.device("cuda")
model.cuda()


optimizer = torch.optim.Adam(model.parameters(),lr = 0.0005)
model = model.to(device)

Downloading (…)"pytorch_model.bin";:   0%|          | 0.00/548M [00:00<?, ?B/s]

Downloading (…)neration_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

In [17]:
def eval_keywords(keywords):
  model.eval()
  for keyword in keywords:
    input_seq = "<sos> " + keyword
    generated = torch.tensor(tokenizer.encode(input_seq)).unsqueeze(0)
    generated = generated.to(device)
    sample_outputs = model.generate(
                                generated, 
                                do_sample=True,   
                                top_k=30, 
                                max_length = 50,
                                top_p=0.90, 
                                num_return_sequences=2
                                )
    for i, sample_output in enumerate(sample_outputs):
      print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))

In [18]:
keywords = ["Osaka","Japan","Kyoto","Yokohama","Kanto","Nikko","Japan has","Tokyo is the","Osaka is the","Kyoto is the"]


In [19]:
#call model with a batch of input
def process_one_batch(batch):
  b_input_ids = batch[0].to(device)
  b_labels = batch[0].to(device)
  b_masks = batch[1].to(device)
  outputs  = model(b_input_ids,  attention_mask = b_masks,labels=b_labels)
  return outputs

#do one epoch for training
def train_epoch():
  t0 = time.time()
  total_train_loss = 0
  model.train()
  for step, batch in enumerate(train_dataloader):
        
        model.zero_grad()        
        outputs = process_one_batch( batch)
        loss = outputs[0]  
        batch_loss = loss.item()
        total_train_loss += batch_loss

        loss.backward()
        optimizer.step()

    
  avg_train_loss = total_train_loss / len(train_dataloader)  
  print("avg_train_loss",avg_train_loss)  
  elapsed_time = format_time(time.time() - t0)
  print("elapsed time for 1 training epoch : ",elapsed_time)

#The training loss indicates how well the model is fitting the training data, while the validation loss indicates how well the model fits new data.
#do one epoch for eval
def eval_epoch():
  t0 = time.time()
  total_eval_loss = 0
  nb_eval_steps = 0
  # Evaluate data for one epoch
  for batch in validation_dataloader:            
        
    with torch.no_grad():        
      outputs = process_one_batch( batch)
      loss = outputs[0]              
      batch_loss = loss.item()
      total_eval_loss += batch_loss         

  avg_val_loss = total_eval_loss / len(validation_dataloader)
  print("avg_val_loss",avg_val_loss) 
  elapsed_time = format_time(time.time() - t0)
  print("elapsed time for 1 eval epoch : ",elapsed_time)

In [20]:
train_epoch()
eval_epoch()
eval_keywords( keywords )

avg_train_loss 0.9387909352359637
elapsed time for 1 training epoch :  0:13:55


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


avg_val_loss 0.7956156190924757
elapsed time for 1 eval epoch :  0:00:32


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Osaka was chosen as a venue for the 2017 Summer Olympics.
1:  Osaka Castle was completed in October  1945  with the original building still on.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Japan ranked 11th in the world for a number of other things  including health and education.
1:  Japan s population increased from 8.5 billion in 1950 to 9.4 billion in 1969.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Kyoto is also known for a variety of sake styles.
1:  Kyoto     Kyoto is one of the city s traditional cities.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Yokohama s capital is Tokyo  Japan s largest city.
1:  Yokohama Prefecture   光来后 是   日本  Yokohama Prefecture  is a prefecture of Japan located in the Chūchi River delta.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Kanto has a population of approximately 5 000 000 persons  mostly in Japan.
1:  Kanto is a member of the Asian Group of Nations.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Nikko was an English writer who influenced both the English and the French language.
1:  Nikko Kiyomaro  author of The Tale of the Great Gatsukuri and A History of the Yayoi.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Japan has the oldest surviving Japanese historical language as an independent language.
1:  Japan has the most powerful military in the world and a strong naval power in the world.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Tokyo is the city of over 2 900 shops  including a bakery.
1:  Tokyo is the main city of the Jōmon Prefecture.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Osaka is the capital  and has the largest metropolitan population density in Japan.
1:  Osaka is the country s fifth largest city and the second largest city.
0:  Kyoto is the birthplace of the Utsuno kami in the Kamakura Prefecture.
1:  Kyoto is the only city that exhibits the effects of modern Japan.


In [21]:
train_epoch()
eval_epoch()
eval_keywords( keywords )

avg_train_loss 0.6927844657969306
elapsed time for 1 training epoch :  0:14:07


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


avg_val_loss 0.7801225589016291
elapsed time for 1 eval epoch :  0:00:32


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Osaka Castle s most notable architect is Takashi Mikasa.
1:  Osaka was also the birthplace of the first Baháʼí school  Shintoism.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Japan s foreign trade with Asia is high.
1:  Japan also won the gold medal in the 2002 Olympics in South Korea.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Kyoto s economy relied heavily on imported oil.
1:  Kyoto  is famous for its abundance of fish and other notable foods.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Yokohama hosted the tournament in 2014  the only time Japan has won the tournament.
1:  Yokohama was hit by three bombs  including four torpedoes  and sunk.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Kanto     In Japan  the term kanto originated in kanji and kotai languages.
1:  Kanto is a region in Central Asia.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Nikko Kiyotan was appointed as the first mayor of Tokushima.
1:  Nikko Gaku Ichirō and his wife Sadako were the first Japanese to be killed by a Japanese civilian.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Japan has a high level of biodiversity  with more than 70 000 species.
1:  Japan has a large number of ethnic Japanese Muslims.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Tokyo is the birthplace of Shohoku University.
1:  Tokyo is the only city in Japan to have a UNESCO World Heritage Site.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Osaka is the birthplace of Osaka International Animation City.
1:  Osaka is the largest city of the prefecture with 9  of its land area.
0:  Kyoto is the only Japanese city without a UNESCO World Heritage Site.
1:  Kyoto is the second largest city in Japan to host the Olympic Games.


In [22]:
train_epoch()
eval_epoch()
eval_keywords( keywords )

avg_train_loss 0.595903958862936
elapsed time for 1 training epoch :  0:14:08


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


avg_val_loss 0.7953800962665888
elapsed time for 1 eval epoch :  0:00:32


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Osaka became the first major city in the nation to host a football championship.
1:  Osaka also has many types of street food  such as ramen.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Japan is the largest economy in the world  with a wide range of exports.
1:  Japan has a high concentration of high income  ranging from 47.5 billion yen per capita in 2000.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Kyoto was the first city of the ancient Kyoto Date clan.
1:  Kyoto is a major financial center for Shinto.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Yokohama was the birthplace of the first American  William Adams White  in 1877.
1:  Yokohama City is often called to be the birthplace of the kabuki tradition.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Kanto     The Kansai region has a mild climate and low elevation.
1:  Kanto League was established in 1854 and played its first games at Suita City in 1854.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Nikko Bashō  former governor of Okinawa who joined the fray later died.
1:  Nikko was a very popular actress.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Japan has a significant minority of Japanese nationals.
1:  Japan has a low unemployment rate of around 0.1.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Tokyo is the largest city in Japan with a population of 744 million people.
1:  Tokyo is the capital and largest city of Japan.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Osaka is the southernmost prefecture of Japan and the 5th largest in Japan.
1:  Osaka is the capital and largest city in Japan.
0:  Kyoto is the sixth most densely populated prefecture  and the eighth largest in Japan.
1:  Kyoto is the second largest city in Japan with a total area of 2 403.3 km  3 209 sq mi.


In [23]:
train_epoch()
eval_epoch()
eval_keywords( keywords )

avg_train_loss 0.5098082236945629
elapsed time for 1 training epoch :  0:14:05


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


avg_val_loss 0.8430612763081948
elapsed time for 1 eval epoch :  0:00:32


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Osaka is known as an orography center.
1:  Osaka Prefecture is divided into 47 administrative prefectures and eight traditional regions.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Japan also has a significant fleet of DC 7s.
1:  Japan s population is expected to drop by around 100 million by 2050.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Kyoto was designated as a core city on April 1  1889.
1:  Kyoto MEA has a diverse assemblage of food crops that range from grasses to tomatoes.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Yokohama Metropolitan Government  the largest city in the region  serves as a public and cultural center.
1:  Yokohama became the capital of Utsunomiya  a period of prosperity and culture.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Kanto s first major landform was the Shimoda Peninsula in the Kanto period  ca.
1:  Kanto was the birthplace of the first Japanese pottery by him.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Nikko is a traditional Japanese style sushi rice and is served at many Japanese restaurants.
1:  Nikko is often made from unpolished rice flour  usually without any rice sediment.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Japan has a high level of immunisation.
1:  Japan has a low unemployment rate of around 2.4 percent.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Tokyo is the fifth largest financial contributor to UNHCR programs.
1:  Tokyo is the most populous city and has the second highest number of cities in the world.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Osaka is the largest city in Japan and the second largest in the world by area.
1:  Osaka is the largest city in Japan with a population of around 8 million people.
0:  Kyoto is the only land link to the rest of Japan.
1:  Kyoto is the world s most populous city at 9 972 people per square mile.


In [24]:
train_epoch()
eval_epoch()
eval_keywords( keywords )

avg_train_loss 0.4387820489232389
elapsed time for 1 training epoch :  0:14:06


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


avg_val_loss 0.8817067169767665
elapsed time for 1 eval epoch :  0:00:32


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Osaka also has Osaka  Kobe and Kyoto Onsen.
1:  Osaka is known for cattle breeding and southeastern Japan for minke whales.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Japan     Rice production in Japan is listed by volume.
1:  Japan has no separate sex life laws  and its main religion is Christianity.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Kyoto was supposed to be a major port by international trade.
1:  Kyoto  Tokyo  and Taiwan      Kyoto was the capital of Japan during the Edo period.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Yokohama was the last city of Japan to host the 2002 Summer Olympics.
1:  Yokohama      Yokohama is known for the production of high quality wagyū  rice rice wagyū.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Kanto region      Japanese food restaurant chains in Japan have existed throughout the Jōmon region.
1:  Kanto and nearby islands were also affected.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Nikko Shimbun was the first to unite Japan  the last nation to unite Japan.
1:  Nikko and Kyoto were other major transportation cities for the emperor.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Japan has the second longest coastline in the Asia Pacific region.
1:  Japan has a significant trade surplus with foreign countries such as the United States  Canada and Germany.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Tokyo is the sixth largest city in the world  but sixth only in the nation.
1:  Tokyo is the capital and largest city of Japan and a major port of entry for shipping.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Osaka is the  most prosperous city  for baseball  and the second most expensive city in Japan.
1:  Osaka is the fourth most expensive city for expatriates while Kyoto is the fourth most expensive city.
0:  Kyoto is the capital and largest city of Japan  and Kyoto is the largest city.
1:  Kyoto is the second most visited city in Japan and a 5th of the country.


In [25]:
train_epoch()
eval_epoch()
eval_keywords( keywords )

avg_train_loss 0.3820668010854385
elapsed time for 1 training epoch :  0:14:11


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


avg_val_loss 0.9301616765382722
elapsed time for 1 eval epoch :  0:00:32


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Osaka has many theaters for performing arts and historical centers.
1:  Osaka Castle s collection  in katakana  Osaka   is the most famous castle in Japan.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Japan is the world s largest exporter and ninth largest importer.
1:  Japan has a significant number of non indigenous minority ethnic groups.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Kyoto  which has become a major center of tourism  has been a key city since 1996.
1:  Kyoto was designated as a city on 1 January 1939  by government ordinance.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Yokohama was the industrial center most clearly defined in the development of capitalism in Japan.
1:  Yokohama Burning  The Deadly 1923 Earthquake and Fire that Helped Forge Paths.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Kanto is an industrial hub for the C40  Japan Economic Forum.
1:  Kanto is the largest urban area of Japan  with a population of 1.15 million people per year.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Nikko Yusoku was a reformer  who advocated the opening up of Japan.
1:  Nikko Matsuo is considered the founder of portrait art in Japan.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Japan has a rich culture and an international network of international aid organizations.
1:  Japan has the sixth longest coastline in the world at 29 751 km  18 486 mi.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Tokyo is the capital  most populous city on Japan  followed by Osaka and Yokohama.
1:  Tokyo is the cultural capital of Japan and a major tourist destination.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Osaka is the cultural capital of Japan and the largest city in Japan.
1:  Osaka is the largest city on the Japan Sea coast of Asia.
0:  Kyoto is the seat of government since 1868  with the grand duke as emperor.
1:  Kyoto is the capital city of Japan and 5th largest city in the world.


In [26]:
train_epoch()
eval_epoch()
eval_keywords( keywords )

avg_train_loss 0.3322537901588309
elapsed time for 1 training epoch :  0:14:00


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


avg_val_loss 0.9936226085415036
elapsed time for 1 eval epoch :  0:00:32


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Osaka was also the home of the first Japanese coin  mint  in 1872.
1:  Osaka also hosts 36 embassies.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Japan s industrial sector is heavily dependent on imported fossil fuels.
1:  Japan    In 1868  Emperor Kōbun rewarded Kusunoki with governorship of the whole Ryukyu region.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Kyoto was officially established in 1854.
1:  Kyoto has the sixth longest coastline in the world at 29 751 km  18 486 mi.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Yokohama  Osaka  Nagoya gun  Kobe  and Yokohama gun were some of Musashi s earliest readers.
1:  Yokohama Burning  The Deadly 1923 Earthquake and Fire that Helped Forge the Path to World War II.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Kanto is also home to various mountain peaks over Mount Fuji.
1:  Kanto     In 1564  Japan entered into conflict against the Nanbanjin.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Nikko Takaaki becomes prime minister  August 30.
1:  Nikko spent a year in jail on charges of treason and treason.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Japan has the sixth longest coastline in the world at 29 751 km  18 486 mi.
1:  Japan has a high level of economic inequality  which has increased in the past few decades.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Tokyo is the home of Tesla  Inc. and Boeing  Inc.  the oldest and the oldest commercial airport in the world.
1:  Tokyo is the first capital to have a theatrical tradition.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Osaka is the cultural capital of Japan.
1:  Osaka is the capital and largest city in Japan  and Osaka is the largest metropolitan area.
0:  Kyoto is the capital and largest city of Japan.
1:  Kyoto is the second most important Asian city in the country after Tokyo.


In [27]:
train_epoch()
eval_epoch()
eval_keywords( keywords )

avg_train_loss 0.294493797838583
elapsed time for 1 training epoch :  0:14:04


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


avg_val_loss 1.0296927530934492
elapsed time for 1 eval epoch :  0:00:32


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Osaka Castle is the oldest standing castle in Japan.
1:  Osaka Castle is the oldest standing castle in Japan.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Japan had a strong tradition of samurai marrying locals and established schools where children could live.
1:  Japan has been a member of the G7 since the inception of the Big Six in 1949.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Kyoto  鉓守  lit.    Kyoto      References            Further reading    Perrin  Elizabeth  1988.
1:  Kyoto  沖縄県  Chokutō  is a city in Japan Prefecture  Japan.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Yokohama has an annual Momotarō Festival  or Momotarō Festival.
1:  Yokohama  Kobe  Osaka  Kyoto  and Yokohama are sometimes still referred to as  urban styles.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Kanto is home to multinational companies such as Toyota  Honda  Mazda  Suzuki and Mazda.
1:  Kanto  山村   the family name of the Emperor   山國  pronounced Waegawa at the time.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Nikko Tange won the Golden Boot Award for Novels in 2009.
1:  Nikko Tange won the other two endurance races at Sebring.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Japan has gone out of the tournament without a single player.
1:  Japan has a low unemployment rate of around 2.4 percent.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Tokyo is the home of Tange Associates Team  Japanese.
1:  Tokyo is the third most populous city of Japan  after Tokyo and Kyoto.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Osaka is the home of the government supported troupe at National Bunraku Theatre.
1:  Osaka is the third largest city in the nation in terms of area.
0:  Kyoto is the setting of the future of the traditional railway system in Japan.
1:  Kyoto is the most renowned city destination for business  destination  and chief negotiator.


In [28]:
train_epoch()
eval_epoch()
eval_keywords( keywords )

avg_train_loss 0.2664179746979769
elapsed time for 1 training epoch :  0:14:02


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


avg_val_loss 1.0853934273945065
elapsed time for 1 eval epoch :  0:00:32


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Osaka Castle played a pivotal role in the Meiji Restoration.
1:  Osaka is known for its fine sake  senbei  鮒の爪  and teriyaki  稹の爪.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Japan is also home to nine forest ecoregions  an important level of biodiversity.
1:  Japan had a high level of economic inequality  which encouraged a more independent stance.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Kyoto  Japan  has several extensive islands  of which the most is Chūgoku  the longest.
1:  Kyoto was a founding member of the League of Nations  1921–1940.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Yokohama Metropolitan Government is located on the northwest coast of the island of Kyūshū.
1:  Yokohama Metropolitan Government  Ogasawara mura  maps photos Waseda University — Hayashi Shihei.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Kanto region foods made from the Mekong and Tonle Sap rivers has been classified as a food of legend.
1:  Kanto was the 13th largest economy in the world at that time  535.5 million.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Nikko Kashihara won the Palme blossom in March 1996.
1:  Nikko Morita was also involved with the rising popular cult of Christianity.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Japan has also sparked controversy by supporting quasi commercial whaling.
1:  Japan has a significant nonoil production industry  petroleum is a frequent source of employment.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Tokyo is the second largest city to have no regional newspapers.
1:  Tokyo is the second largest city by land area in the Chūgoku region  2019.


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


0:  Osaka is the second of Japan s major cities  having passed the Japan Railways in 2003.
1:  Osaka is the home of the hyuganatsu fruit  日本索 日   fruit of the oak  潟田統  fruit of the cherry blossom.
0:  Kyoto is the second most visited city in Japan with a total area of 2 605 km2  1 680 sq mi.
1:  Kyoto is the most visited city in Japan with over 400 industry consortiums.


In [None]:
# Validation loss more than 1 may indicate that the model is underfitting. Underfitting occurs when the model is unable to accurately model the training data, and hence generates large errors.