# Inference

Let's use the model fine-tuned in the training notebook for generation. This notebook will:
- Try to improve on the stopping criteria created in the training notebook.
- Experiment with different ways to generate text from the model.


In [1]:
from transformers import AutoModelForCausalLM, GPT2Tokenizer
from transformers import StoppingCriteria, StoppingCriteriaList
import torch
import random

In [2]:
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
PROJECT_MODEL = "efarish/GPT2_FT_By_NT_RAND_v11"
model = AutoModelForCausalLM.from_pretrained(PROJECT_MODEL)
model = model.to( device )
model.eval()
tokenizer = GPT2Tokenizer.from_pretrained(PROJECT_MODEL)

Below is an attempt at creating a stopping criterion so that the text generated by the model ends with a period.

In [3]:
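class StoppingCriteriaSub(StoppingCriteria):
    """Stop once any stop token has appeared at least `encounters` times."""

    def __init__(self, stops=None, encounters=1):
        super().__init__()
        # Avoid a mutable default argument; keep the stop ids on the model's device.
        self.stops = [stop.to(device) for stop in (stops or [])]
        self.encounters = encounters

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        # input_ids holds the prompt plus everything generated so far,
        # and only the first sequence in the batch is checked.
        for stop in self.stops:
            if (input_ids[0] == stop).sum().item() >= self.encounters:
                return True
        return False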

In [4]:
stop_words = ['.']
stop_words_ids = [tokenizer(stop_word, return_tensors='pt', add_special_tokens=False)['input_ids'].squeeze() for stop_word in stop_words]
stopping_criteria = StoppingCriteriaList([StoppingCriteriaSub(stops=stop_words_ids,
                                                              encounters=2)])
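
One thing to keep in mind with a count-based criterion is that `input_ids` contains the prompt as well as the generated tokens, so any period in the prompt counts toward `encounters`. Below is a minimal sketch of the counting logic on plain Python lists. The helper `should_stop`, its `prompt_len` argument, and the token ids are made up for illustration; they are not part of this notebook or of `transformers`.

```python
def should_stop(token_ids, stop_id, encounters=1, prompt_len=0):
    """Return True once `stop_id` appears at least `encounters` times
    among the tokens that follow the first `prompt_len` tokens."""
    generated = token_ids[prompt_len:]
    return generated.count(stop_id) >= encounters


# Hypothetical token ids: PERIOD stands in for the id of the "." token.
PERIOD = 13
prompt = [50256, 101, PERIOD]        # a prompt that already contains one period
sequence = prompt + [7, 8, PERIOD]   # one more period generated so far

# Counting over the whole sequence includes the period from the prompt.
print(should_stop(sequence, PERIOD, encounters=2))
# Skipping the prompt counts only the generated period.
print(should_stop(sequence, PERIOD, encounters=2, prompt_len=len(prompt)))
```

Inside `__call__`, the same check could be applied to `input_ids[0].tolist()` if the prompt length is stored on the criterion when it is created.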

Now let's use the model's `generate` function with the stopping criteria.

In [5]:
def get_model_input(_input: str):
  prompt = tokenizer.bos_token + _input
  generated = torch.tensor(tokenizer.encode(prompt)).unsqueeze(0)
  generated = generated.to(device)
  return generated

generated = get_model_input("Jesus said")

In [6]:
output = model.generate(
            generated,
            stopping_criteria=stopping_criteria,
            do_sample=True,
            max_new_tokens=100,
            #no_repeat_ngram_size=2,
)
out = tokenizer.decode(output[0], skip_special_tokens=True)
out = out.replace('"', "")
out

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


'Jesus said to him, Father, I thank you that you have heard me. I knew that you always hear me, but I said this for the benefit of the people standing here, that they may believe that you sent me. Then he returned to Jerusalem and was obedient to them.'

This looks better than the output from the training notebook: the generated text now ends with a period.
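
If generation reaches `max_new_tokens` before the stopping criteria fires, the text can still end mid-sentence. One possible fallback is to trim the decoded string back to its last period. This is a minimal sketch; `trim_to_last_period` is a hypothetical helper that the notebook does not define, and the sample string is made up.

```python
def trim_to_last_period(text: str) -> str:
    """Cut `text` right after its last period; return it unchanged if there is none."""
    idx = text.rfind(".")
    return text if idx == -1 else text[: idx + 1]


# Made-up sample: the second sentence was cut off before reaching a period.
sample = "First sentence. Second sentence that was cut"
print(trim_to_last_period(sample))
```

This only post-processes the string; it does not change how many tokens the model generates.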

Now let's try streaming the generated text to the notebook instead of waiting for the entire response to be generated.

In [10]:
from transformers import TextIteratorStreamer
from threading import Thread

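# Tokenize the prompt and move it to the same device as the model.
inputs = tokenizer(["Jesus said"], return_tensors="pt").to(device)
streamer = TextIteratorStreamer(tokenizer)
generation_kwargs = dict(inputs, streamer=streamer,
                         stopping_criteria=stopping_criteria,
                         do_sample=True,
                         max_new_tokens=100,)
# Run generation in a background thread so this thread can consume the stream.
thread = Thread(target=model.generate, kwargs=generation_kwargs)
thread.start()
generated_text = ""
for new_text in streamer:
    new_text = new_text.replace('"', "")
    generated_text += new_text
    print(new_text, end="")
thread.join()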


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Jesus said to them, 'It is I; don't be afraid. I am the First and the Last.

The output has some odd characters, such as the stray single quote, but looks pretty good. It appears the model could use more training on a larger training set.
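
The streaming pattern used above, with generation running in a background thread while the main thread consumes text as it arrives, can be sketched with only the standard library. `ToyStreamer` and `fake_generate` are hypothetical stand-ins for the streamer and for `model.generate`; this is a simplified illustration of the producer/consumer idea, not the library's implementation.

```python
from queue import Queue
from threading import Thread

_DONE = object()  # sentinel that marks the end of the stream


class ToyStreamer:
    """Simplified stand-in for a text streamer: an iterable backed by a queue."""

    def __init__(self):
        self._queue = Queue()

    def put(self, text):
        self._queue.put(text)

    def end(self):
        self._queue.put(_DONE)

    def __iter__(self):
        return self

    def __next__(self):
        item = self._queue.get()  # blocks until the producer supplies an item
        if item is _DONE:
            raise StopIteration
        return item


def fake_generate(streamer, pieces):
    """Stand-in for model.generate: emits text piece by piece, then signals the end."""
    for piece in pieces:
        streamer.put(piece)
    streamer.end()


streamer = ToyStreamer()
thread = Thread(target=fake_generate, args=(streamer, ["Hello", ", ", "world", "."]))
thread.start()

generated_text = ""
for new_text in streamer:  # consume pieces while the producer thread is running
    generated_text += new_text
thread.join()
print(generated_text)
```

The blocking `get` is what lets the loop print each piece as soon as it is produced instead of waiting for the whole response.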