# Sanity Check: Local Inference

To test the model, you first need to generate predictions. Axolotl uploaded the trained model to the Hugging Face Hub at [pbevan11/llama-3-8b-ocr-correction](https://huggingface.co/pbevan11/llama-3-8b-ocr-correction).

In [1]:
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer
model_id = 'pbevan11/llama-3-8b-ocr-correction'  # yours will differ, depending on the hub_model_id in your config
model = AutoPeftModelForCausalLM.from_pretrained(model_id).cuda()
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token



## Prompt Template

Next, we have to construct a prompt template that is as close as possible to the prompt template we saw earlier.

In [26]:
def prompt(instruction, inp):
    return f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{inp}

### Response:
"""

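def prompt_tok(instruction, inp, return_ids=False):
    "Generate a response for one example; return the raw token ids if `return_ids` is True."
    _p = prompt(instruction, inp)
    input_ids = tokenizer(_p, return_tensors="pt", truncation=True).input_ids.cuda()
    # Greedy decoding (do_sample=False) keeps the output deterministic.
    # Passing attention_mask and pad_token_id here should silence the warnings printed in the outputs below.
    out_ids = model.generate(input_ids=input_ids, max_new_tokens=5000,
                             do_sample=False)
    if return_ids: return out_ids

    # Decode prompt + completion, then keep only the text after the response header.
    ids = out_ids.detach().cpu().numpy()
    full_output = tokenizer.batch_decode(ids, skip_special_tokens=True)[0]
    response_start = full_output.find("### Response:")
    if response_start != -1:
        return full_output[response_start + len("### Response:"):]
    else:
        # Fall back to dropping the prompt by length if the header is missing.
        return full_output[len(_p):]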
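
The response-extraction step at the end of `prompt_tok` can be tried on its own, without loading the model. The sketch below is only an illustration: `extract_response` is a hypothetical helper, and the decoded string is a made-up stand-in for real model output. It mirrors the logic above, except that it also strips the whitespace around the completion.

```python
RESPONSE_HEADER = "### Response:"

def extract_response(full_output: str, prompt_text: str) -> str:
    """Return only the completion from a decoded prompt + completion string."""
    start = full_output.find(RESPONSE_HEADER)
    if start != -1:
        # Keep everything after the first response header.
        completion = full_output[start + len(RESPONSE_HEADER):]
    else:
        # Fall back to dropping the prompt by length if the header is missing.
        completion = full_output[len(prompt_text):]
    # Unlike prompt_tok, strip the newline that follows the header.
    return completion.strip()

# Made-up decoded output, standing in for tokenizer.batch_decode(...)[0].
prompt_text = "### Instruction:\nFix the text.\n\n### Input:\nHcuiers\n\n### Response:\n"
decoded = prompt_text + "Reuters"
print(extract_response(decoded, prompt_text))
```

Note that `find` returns the first occurrence of the header, so an input that itself contains `### Response:` would be split in the wrong place.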

## Sanity Check Examples

Next, we sanity check a few examples to make sure that:

- Our prompt template is constructed correctly (indeed I made some mistakes at first!)
- The model is working as expected.
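
The first point can be partly automated before any generation happens. The sketch below is a minimal check, assuming the template defined in this notebook matches the one used during training: `check_prompt` is a hypothetical helper that verifies the three section headers appear in order and that the prompt ends with the response header followed by a newline. The instruction and input strings are placeholders.

```python
def prompt(instruction, inp):
    return f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{inp}

### Response:
"""

def check_prompt(text):
    """Return a list of problems found in a rendered prompt (empty if none)."""
    problems = []
    headers = ["### Instruction:", "### Input:", "### Response:"]
    positions = [text.find(h) for h in headers]
    if -1 in positions:
        problems.append("missing section header")
    elif positions != sorted(positions):
        problems.append("section headers out of order")
    if not text.endswith("### Response:\n"):
        problems.append("prompt does not end with the response header and a newline")
    return problems

rendered = prompt("Correct the OCR errors.", "Hcuiers")
print(check_prompt(rendered))  # an empty list means no problems were found
```

A check like this catches structural slips only; it says nothing about whether the whitespace and tokens match what the model saw in training.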

In [34]:
instruction = "You are an assistant that takes a piece of text that has been corrupted during OCR digitisation, and produce a corrected version of the same text."
inp = "Do Not Kule Oi't hy.er-l'rieed AjijqIi: imac - Analyst (fteuiers) Hcuiers - A | ) | ilf, <;/) in |) nter |iic . conic! deeiilf. l.o sell n lower-|)rieofl wersinn oi its Macintosh cornutor to nttinct ronsnnu-rs already euami'red ot its iPod music jiayo-r untl annoyoil. by sccnrit.y problems ivitJi Willtlows PCs , Piper.iaffray analyst. (Jcne Muster <aid on Tlinrtiday."

inp2 = "program * Turbine Build * the Buzz for 'Middle-Earth Online ' ( Reuteri ) Reutert - The people crammed into a.neeting room at the Providence Convention Center were.contemplating a long visit to an exotic land . They wanted to .know about the sbape of the mountain * and what the weather.would be like . They asked if the nativea would be approachable ."

# Try a historical one
inp3 = """
jer'jl ~D. feritil°l England in het-comprehensive historic yet to make. ill From the sources above referred t° ;oerlTl the present undertaking toprovdlw, ~ biographical sketches for general real', 1 fro" .added a few nomes ofrecent times, ta7ii, of of the church,_ as e.g. Nikon Fdtrio founder of "The Christian Brothe'le`elia treated without 'any leaning to the fro P-4 practice which- they may exhibit'diSeloolrl°!tis the English Church, but simplY.,llo,e- terirk holiness and self4lenial whichiS4ll3_, j Ar`—•elpgii saint in all ages and conntriesi.ancl"—AL of the series toeommemorate, rather'''. 01 tical rank or-social station. wad 11- Py'ff/1; Sermons and treatises en:rep-e-, pit thing like the same manner as' „ `,-9). person. Itio words can pourtrx,h'oe bly the size, and beautp,. aooeat 111,.0t Crystal Palace, as a picture.
"""

In [35]:
out = prompt_tok(instruction, inp)
print(out.replace('\\', ' '))

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.



Do Not Rule Out Lower-Priced Mac - Analyst (Reuters) Reuters - Apple Inc.  may be considering a lower-priced version of its Macintosh computer to attract consumers already enamored of its iPod music player and annoyed by security problems with Windows PCs, PiperJaffray analyst Gene Munster said on Thursday.


In [30]:
out = prompt_tok(instruction, inp2)
print(out.replace('\\', ' '))


Programs Turbine Builds the Buzz for 'Middle-Earth Online' (Reuters) Reuters - The people crammed into a meeting room at the Providence Convention Center were contemplating a long visit to an exotic land. They wanted to know about the shape of the mountains and what the weather would be like. They asked if the natives would be approachable.
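
A rough way to put a number on a check like this is to compare the corrected text with the OCR input. The sketch below uses `difflib.SequenceMatcher` from the standard library on an excerpt of the example above. `similarity` is a hypothetical helper, and the ratio is only a crude signal, not a metric used elsewhere in this notebook. A value near 1.0 suggests the model made light edits; a much lower value suggests it rewrote the text, so the output deserves a closer look.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; 1.0 means the strings are identical."""
    return SequenceMatcher(None, a, b).ratio()

# Excerpt from the second example: OCR input and the model's correction.
ocr_text = "They asked if the nativea would be approachable ."
corrected = "They asked if the natives would be approachable."

score = similarity(ocr_text, corrected)
print(f"similarity: {score:.2f}")
```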


In [23]:
out = prompt_tok(instruction, inp3)
print(out.replace('\\', ' '))


eremiah D. Farrell England in her-comprehensive historic yet to make. Ill From the sources above referred to, the present undertaking to provide, biographical sketches for general reading, from added a few names of recent times, such as of the church, as e.g. Nikon, founder of The Christian Brothers, treated without any leaning to the form of practice which they may exhibit in school, but simply to their holiness and self-denial which all saints in all ages and countries can and should emulate, and all of the series to commemorate rather than of technical rank or social station. His Sermons and treatises are replete with everything like the same manner as a person. His words can pour forth by the size, and beauty, as a picture.


## Homework - Token Check

Optional homework: check that there aren't any tokenization issues, following the notes on [tokenizer gotchas](https://hamel.dev/notes/llm/finetuning/05_tokenizer_gotchas.html).
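
One possible starting point, assuming you can get hold of the token ids the model saw for an example during training and the ids the tokenizer produces for the same example at inference time, is to compare the two sequences position by position. The helper `first_mismatch` and the ids below are made up for illustration; they are not taken from the linked notes or from this model's tokenizer.

```python
from itertools import zip_longest

def first_mismatch(train_ids, infer_ids):
    """Return (index, train_id, infer_id) at the first difference, or None if identical."""
    for i, (a, b) in enumerate(zip_longest(train_ids, infer_ids)):
        if a != b:
            return i, a, b
    return None

# Made-up ids; in practice these would come from the training data and from tokenizer(...).
train_ids = [128000, 39314, 374, 459, 7754]
infer_ids = [128000, 39314, 374, 220, 459, 7754]

print(first_mismatch(train_ids, infer_ids))  # (3, 459, 220)
```

Decoding the ids around the reported index with `tokenizer.decode` usually shows whether the difference is a stray space, a newline, or a special token.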