We want to obtain the Hugging Face model that is built in call_api.py, giving it the same parameters as the simplest of calls (niah_single_1 for Mistral). Then we create a new JSONL file containing, for each example, the prompt, the answer, and their concatenation. Tokenizing the prompt and the answer separately does not necessarily produce the same tokens as tokenizing their concatenation, so we test this on many examples.

In [1]:
from model_wrappers import HuggingFaceModel
model_name_or_path="mistralai/Mistral-7B-Instruct-v0.2"
temperature=0.0
top_k=1
top_p=1.0
stop_words=""
tokens_to_generate=128
# Load the LLM with the same generation parameters as call_api.py
llm=HuggingFaceModel(
    name_or_path=model_name_or_path,
    do_sample=temperature > 0,
    repetition_penalty=1,
    temperature=temperature,
    top_k=top_k,
    top_p=top_p,
    stop=stop_words,
    max_new_tokens=tokens_to_generate,
)


For debugging and understanding purposes, this cell runs call_api.py directly with the same parameters:

In [10]:
%%sh
MODEL_NAME="mistralai/Mistral-7B-Instruct-v0.2"
ROOT_DIR="../../results"
MODEL_DIR="../../models"
BENCHMARK="synthetic"
MAX_SEQ_LENGTH="4096"
RESULTS_DIR="${ROOT_DIR}/${MODEL_NAME}/${BENCHMARK}/${MAX_SEQ_LENGTH}"
DATA_DIR="${RESULTS_DIR}/data"
PRED_DIR="${RESULTS_DIR}/pred"
TASK="niah_single_1"
MODEL_FRAMEWORK="hf"
MODEL_PATH=$MODEL_NAME
TEMPERATURE="0.0"
TOP_P="1.0"
TOP_K="1"
python call_api.py \
    --data_dir ${DATA_DIR} \
    --save_dir ${PRED_DIR} \
    --benchmark ${BENCHMARK} \
    --task ${TASK} \
    --server_type ${MODEL_FRAMEWORK} \
    --model_name_or_path ${MODEL_PATH} \
    --temperature ${TEMPERATURE} \
    --top_k ${TOP_K} \
    --top_p ${TOP_P}


Predict niah_single_1 
from ../../results/mistralai/Mistral-7B-Instruct-v0.2/synthetic/4096/data/niah_single_1/validation.jsonl
to ../../results/mistralai/Mistral-7B-Instruct-v0.2/synthetic/4096/pred/niah_single_1.jsonl
DATA:[]


Loading checkpoint shards: 100%|██████████| 3/3 [00:11<00:00,  3.82s/it]
0it [00:00, ?it/s]


Used time: 0.4 minutes
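The output reports `DATA:[]` and `0it`, which suggests call_api.py read no examples, so no predictions were made. A minimal Python sketch for checking the input file, assuming the layout printed above (`${DATA_DIR}/${TASK}/validation.jsonl`) and a POSIX system; `count_records` is a hypothetical helper, not part of call_api.py:

```python
import os

# Same values as the shell cell above
ROOT_DIR = "../../results"
MODEL_NAME = "mistralai/Mistral-7B-Instruct-v0.2"
BENCHMARK = "synthetic"
MAX_SEQ_LENGTH = "4096"
TASK = "niah_single_1"

RESULTS_DIR = os.path.join(ROOT_DIR, MODEL_NAME, BENCHMARK, MAX_SEQ_LENGTH)
DATA_DIR = os.path.join(RESULTS_DIR, "data")
# Input file as printed by call_api.py on its "from ..." line
input_path = os.path.join(DATA_DIR, TASK, "validation.jsonl")


def count_records(path: str) -> int:
    """Hypothetical helper: number of non-blank lines in a JSONL file, 0 if it does not exist."""
    if not os.path.exists(path):
        return 0
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


print(input_path)
print(count_records(input_path))
```

If this prints 0, the data for this task has to be generated before call_api.py can produce any predictions.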


In [2]:
from nemo.collections.asr.parts.utils.manifest_utils import read_manifest
import torch
from tqdm import tqdm

example_path = "example.jsonl"
data = read_manifest(example_path)  # one dict per JSONL line
for i, variables in enumerate(tqdm(data)):
    prompt = variables["prompt"]
    answer = variables["answer"]
    concat = variables["concatenation"]
    # assert prompt+answer==concat
    # The prompt and the concatenation start a sequence, so they keep BOS;
    # the answer is appended mid-sequence, so special tokens are disabled for it
    prompttokenization = llm.tokenizer(prompt, return_tensors="pt").input_ids
    answertokenization = llm.tokenizer(answer, return_tensors="pt", add_special_tokens=False).input_ids
    concattokenization = llm.tokenizer(concat, return_tensors="pt").input_ids
    # print(f"Prompt:       {prompttokenization[0].tolist()}")
    # print(f"Answer:       {answertokenization[0].tolist()}")
    # print(f"Concatenation:{concattokenization[0].tolist()}")
    # torch.equal checks exact equality of token ids and returns False (instead of raising) on a length mismatch
    assert torch.equal(torch.cat([prompttokenization, answertokenization], dim=1), concattokenization), f"Tokenization mismatch at example {i}"

100%|██████████| 500/500 [00:06<00:00, 79.06it/s]


### If we don't remove the special tokens on the answer

If we keep the special tokens on the answer (the default `add_special_tokens=True`), removing the first two tokens of the answer's tokenization makes the concatenation of the prompt's tokenization and the answer's tokenization identical to the tokenization of the concatenated string.
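The check described above can be sketched as a small function that finds how many leading tokens must be dropped from the answer's tokenization before `prompt + answer` matches the concatenation. This is a minimal sketch on plain Python lists of token ids; `tokens_to_drop` is a hypothetical helper, and the ids in the usage example are illustrative (1 is Mistral's BOS id, the others are borrowed from the excerpts below), not real tokenizer output:

```python
from typing import List, Optional


def tokens_to_drop(prompt_ids: List[int], answer_ids: List[int], concat_ids: List[int]) -> Optional[int]:
    """Smallest k such that prompt_ids + answer_ids[k:] == concat_ids, or None if no k works."""
    for k in range(len(answer_ids) + 1):
        if prompt_ids + answer_ids[k:] == concat_ids:
            return k
    return None


# Illustrative ids: BOS + end of prompt, and an answer tokenized WITH special tokens
prompt = [1, 2245, 28804]
answer_with_bos = [1, 28705, 415, 2841, 9693]
concat = [1, 2245, 28804, 415, 2841, 9693]
print(tokens_to_drop(prompt, answer_with_bos, concat))  # 2
```

Returning `None` instead of a default value makes it explicit when no prefix removal can reconcile the two tokenizations.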

### If we remove the special tokens on the answer

#### If we put no spaces 

```
-End of prompt:       "...,2245, 28804]"
-Beginning of answer: "[                 415, 2841, 9693,..."
-Transition in concat:"...,2245, 28804, 1014, 2841, 9693,..."
```

Here the first token of the answer differs between the two cases: 415 when the answer is tokenized on its own, 1014 inside the concatenation.
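Locating the position where two tokenizations part ways, as done by hand in these excerpts, can be automated. A minimal sketch; `first_divergence` is a hypothetical helper, and the lists below are only the truncated fragments shown above:

```python
from typing import Optional, Sequence


def first_divergence(a: Sequence[int], b: Sequence[int]) -> Optional[int]:
    """Index of the first position where a and b differ, or None if they are identical.

    If one sequence is a strict prefix of the other, the length of the shorter one is returned.
    """
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


# Fragments from the "no spaces" case
prompt_tail = [2245, 28804]
answer_head = [415, 2841, 9693]
concat_part = [2245, 28804, 1014, 2841, 9693]

i = first_divergence(prompt_tail + answer_head, concat_part)
print(i, (prompt_tail + answer_head)[i], concat_part[i])  # 2 415 1014
```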

#### If we put spaces

```
-End of prompt:       "...,2245, 28804]"
-Beginning of answer: "[                28705, 415, 2841, 9693,..."
-Transition in concat:"...,2245, 28804,        415, 2841, 9693,..."
```

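Here the separately tokenized answer starts with an additional token, 28705, even though we disabled the special tokens (including BOS). This token is most likely the lone `▁` (space) piece: the SentencePiece-based tokenizer prepends a dummy space to every input it encodes, and since that prefix is not a special token, `add_special_tokens=False` does not remove it.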
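A simplified, string-level model of why the extra token appears, assuming Llama-style SentencePiece normalization in which spaces are replaced by `▁` and one dummy `▁` is prepended to every input. `sp_normalize` is a hypothetical stand-in for the tokenizer's normalizer, not the real implementation, and the prompt string is made up:

```python
def sp_normalize(text: str) -> str:
    """Toy stand-in for SentencePiece normalization: spaces -> '▁', plus one dummy '▁' prefix."""
    return "\u2581" + text.replace(" ", "\u2581")


prompt = "Question?"  # hypothetical prompt ending
for answer in ["The", " The", "  The"]:  # no space, one space, two spaces
    separate = sp_normalize(prompt) + sp_normalize(answer)
    joint = sp_normalize(prompt + answer)
    # Normalizing the answer on its own adds exactly one extra '▁' at the boundary
    print(repr(answer), repr(sp_normalize(answer)), len(separate) - len(joint))
```

This prints a difference of 1 for all three variants. Under this model the three excerpts line up if 415 is `▁The`, 1014 is `The`, 28705 is `▁` and 259 is `▁▁`; those id-to-piece mappings are inferred from the excerpts, not checked here, and the toy ignores the BPE merges that real tokenization applies afterwards.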

#### If we put double spaces

```
-End of prompt:       "...,2245, 28804]"
-Beginning of answer: "[                  259, 415, 2841, 9693,..."
-Transition in concat:"...,2245, 28804, 28705, 415, 2841, 9693,..."
```

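Again the first token of the answer differs between the two cases: 259 when the answer is tokenized on its own, 28705 inside the concatenation. Assuming 259 and 28705 are the `▁▁` and `▁` pieces, this is consistent with the tokenizer prepending one extra space to the answer when it is tokenized on its own.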
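One way to sidestep these boundary effects, sketched here as an assumption rather than what this notebook does, is to never tokenize the answer on its own: tokenize the prompt and the concatenation, check that the prompt's ids are a prefix of the concatenation's ids, and take the remaining ids as the answer. `answer_ids_from_concat` is a hypothetical helper working on plain lists:

```python
from typing import List


def answer_ids_from_concat(prompt_ids: List[int], concat_ids: List[int]) -> List[int]:
    """Answer token ids as they appear inside the concatenation.

    Raises ValueError if a merge across the boundary changed the prompt's own tokens.
    """
    n = len(prompt_ids)
    if concat_ids[:n] != prompt_ids:
        raise ValueError("prompt tokens are not a prefix of the concatenation tokens")
    return concat_ids[n:]


# Fragments from the "double spaces" case
prompt_ids = [2245, 28804]
concat_ids = [2245, 28804, 28705, 415, 2841, 9693]
print(answer_ids_from_concat(prompt_ids, concat_ids))  # [28705, 415, 2841, 9693]
```

If the `ValueError` fires for an example, the boundary merge also altered the end of the prompt, and the answer cannot be cleanly separated at the token level for that example.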