### Abstract

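Drawing on statistical theory for estimating the uncertainty of model outputs, this note samples the model during beam search to compute a per-token uncertainty at generation time, which serves as a reference indicator of each token's factuality.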

------

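LLMs are built on autoregressive language modeling, whose objective is not factual accuracy, so they can produce false information. Before LLMs, the reliability of generated text had already been studied in areas such as translation, summarization, and fake news detection. That work falls into two directions: how to avoid false content and how to detect it. Mitigating false information generated by LLMs can be approached from the same two angles:

- Intervening during training, mainly by steering the reward model to recognize false content, improving data quality, and applying data augmentation.

- Intervening during inference, by judging the factuality of the generated output in some way.

For the second direction, much of the literature builds "factuality" metrics on top of ROUGE, BLEU, perplexity, and similar measures. But whether a text is true often hinges on a single word, and these sentence-level or even paragraph-level metrics are too coarse. I therefore think a token-level factuality measure is needed. With such a measure as a basis, further steps become possible, such as intervening in the beam search process or penalizing a token's probability.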

### Theory

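In the research literature, hallucination is divided into two cases:

- Intrinsic: the model fabricates content about facts it has seen
- Extrinsic: the model fabricates content about facts it has not seen

Uncertainty estimation likewise distinguishes two kinds of uncertainty:

- Epistemic: uncertainty in the model itself, where the dataset contains the correct information
- Aleatoric: uncertainty caused by noise in the dataset

Intrinsic hallucination and epistemic uncertainty are closely related: in both, the model errs on data it has seen. Assuming the dataset contains the correct information, we can borrow the estimators used for epistemic uncertainty during beam search to compute how uncertain the model is about each token, and treat that as a rough measure of "factuality".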

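The most direct way to estimate uncertainty is variance. However, beam search evaluates each token only once, and variance cannot be computed from a single value, so we need a way to obtain several different outputs from the model at each time step. Modern network architectures usually contain dropout layers. Each dropout mask is a Bernoulli sample, so keeping dropout on at inference time can be viewed as sampling from a distribution over model parameters. At inference we first run with dropout off to obtain $token_t$ at time step $t$, then turn dropout on and run $N$ more forward passes. From the resulting $N$ output distributions we compute metrics such as the entropy of each distribution, the variance of those entropies, and the variance of the probability assigned to $token_t$. These describe how certain the model is about its output and can, to some extent, reflect whether $token_t$ is factual.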
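As a rough illustration of these metrics, the sketch below summarizes $N$ sampled distributions for a single decoding step using only the standard library. The function names (`entropy_bits`, `mc_uncertainty`) and the toy distributions are assumptions made for this example and are not part of the notebook; variance is the population variance, matching `np.var`.

```python
import math
from statistics import mean, pvariance


def entropy_bits(probs, eps=1e-12):
    """Shannon entropy in bits; eps guards against log2(0)."""
    return -sum(p * math.log2(p + eps) for p in probs)


def mc_uncertainty(sampled_dists, token_id):
    """Summarize N dropout-perturbed distributions for one decoding step."""
    entropies = [entropy_bits(d) for d in sampled_dists]
    token_probs = [d[token_id] for d in sampled_dists]
    return {
        "entropy_mean": mean(entropies),
        "entropy_var": pvariance(entropies),
        "token_prob_var": pvariance(token_probs),
    }


# Toy data (made up): 3 stochastic passes over a 3-token vocabulary.
confident = [[0.90, 0.05, 0.05], [0.88, 0.07, 0.05], [0.91, 0.04, 0.05]]
unsure = [[0.60, 0.30, 0.10], [0.30, 0.60, 0.10], [0.45, 0.45, 0.10]]

stats_confident = mc_uncertainty(confident, token_id=0)
stats_unsure = mc_uncertainty(unsure, token_id=0)
```

In this toy case the passes that disagree about token 0 yield both a higher mean entropy and a higher token-probability variance than the passes that agree, which is the signal the method relies on.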

### Experiment
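Tested on chatglm-6b.

The main change is a hack of chatglm-6b's forward pass that adds a controllable dropout layer.

We simulate a beam search with beam_size=1 and no random operations (that is, greedy decoding), performing the following at each time step:
1. Turn dropout off
2. Run forward and take the token
3. Turn dropout on
4. Run forward N times to get N probability distributions, then compute from them the entropy of each distribution, the mean and variance of those entropies, and the variance of the selected token's probability, as estimates of the token's factuality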
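The four steps can be sketched on a toy model without any deep learning library. Everything here is a hypothetical stand-in: `toy_forward` replaces the real model's forward pass with one linear layer, and the weights, hidden vector, and seed are made-up values chosen only to make the example reproducible.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_forward(hidden, weights, dropout_rate=0.0, rng=None):
    """Toy forward pass: optional inverted dropout on the hidden vector,
    then a linear layer followed by softmax."""
    if dropout_rate > 0.0:
        keep = 1.0 - dropout_rate
        hidden = [h / keep if rng.random() < keep else 0.0 for h in hidden]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in weights]
    return softmax(logits)


def mc_dropout_step(hidden, weights, n_samples, dropout_rate, seed=0):
    rng = random.Random(seed)
    # Steps 1-2: dropout off, take the greedy token.
    probs = toy_forward(hidden, weights)
    token = max(range(len(probs)), key=probs.__getitem__)
    # Steps 3-4: dropout on, collect N stochastic distributions.
    samples = [toy_forward(hidden, weights, dropout_rate, rng)
               for _ in range(n_samples)]
    return token, samples


hidden = [1.0, 0.5, -0.5, 2.0]
weights = [[0.5, 0.1, 0.0, 0.4],
           [0.1, 0.3, 0.2, 0.1],
           [0.0, 0.2, 0.1, 0.3]]
token, samples = mc_dropout_step(hidden, weights, n_samples=8, dropout_rate=0.1)
```

The token is always chosen from the deterministic pass; the stochastic passes are used only to score it, so the generated text itself is unaffected by the sampling.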

In [1]:
import pickle as pkl
import torch
from torch import nn
from torch.nn import CrossEntropyLoss  # used by the patched LM forward below
import numpy as np
import tqdm.auto as tqdm
from typing import List, Tuple, Optional, Callable, Union
from transformers.modeling_outputs import (
    BaseModelOutputWithPast,
    CausalLMOutputWithPast,
    BaseModelOutputWithPastAndCrossAttentions,
)
from transformers import AutoTokenizer, AutoModel
from transformers.generation.logits_process import LogitsProcessor
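from transformers.generation.utils import LogitsProcessorList, StoppingCriteriaList, GenerationConfig, ModelOutput
from transformers.utils import logging

# the patched forward below calls logger.warning_once
logger = logging.get_logger(__name__)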

In [2]:
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).float()


In [3]:
# ChatGLM applies no dropout at inference time, so hack the forward function to add a dropout layer

def add_infer_dropout(self, dropout_rate = 0.05):
    print("add infer dropout to model")
    self.infer_dropout = torch.nn.Dropout(dropout_rate)

def chat_glm_forward(
        self,
        input_ids: Optional[torch.LongTensor] = None,
        position_ids: Optional[torch.LongTensor] = None,
        attention_mask: Optional[torch.Tensor] = None,
        past_key_values: Optional[Tuple[Tuple[torch.Tensor, torch.Tensor], ...]] = None,
        inputs_embeds: Optional[torch.LongTensor] = None,
        use_cache: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
        enable_dropout: bool = False,
) -> Union[Tuple[torch.Tensor, ...], BaseModelOutputWithPast]:
    output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
    output_hidden_states = (
        output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
    )
    use_cache = use_cache if use_cache is not None else self.config.use_cache
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    if self.gradient_checkpointing and self.training:
        if use_cache:
            logger.warning_once(
                "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`..."
            )
            use_cache = False

    if input_ids is not None and inputs_embeds is not None:
        raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
    elif input_ids is not None:
        batch_size, seq_length = input_ids.shape[:2]
    elif inputs_embeds is not None:
        batch_size, seq_length = inputs_embeds.shape[:2]
    else:
        raise ValueError("You have to specify either input_ids or inputs_embeds")

    if inputs_embeds is None:
        inputs_embeds = self.word_embeddings(input_ids)

    if past_key_values is None:
        if self.pre_seq_len is not None:
            past_key_values = self.get_prompt(batch_size=input_ids.shape[0], device=input_ids.device,
                                              dtype=inputs_embeds.dtype)
        else:
            past_key_values = tuple([None] * len(self.layers))

        if attention_mask is None:
            attention_mask = self.get_masks(
                input_ids,
                device=input_ids.device
            )


        if position_ids is None:
            MASK, gMASK = self.config.mask_token_id, self.config.gmask_token_id
            seqs = input_ids.tolist()

            mask_positions, use_gmasks = [], []
            for seq in seqs:
                mask_token = gMASK if gMASK in seq else MASK
                use_gmask = mask_token == gMASK
                mask_positions.append(seq.index(mask_token))
                use_gmasks.append(use_gmask)

            position_ids = self.get_position_ids(
                input_ids,
                mask_positions=mask_positions,
                device=input_ids.device,
                use_gmasks=use_gmasks
            )

    if self.pre_seq_len is not None and attention_mask is not None:
        prefix_attention_mask = torch.ones(batch_size, 1, input_ids.size(-1), self.pre_seq_len).to(
            attention_mask.device)
        prefix_attention_mask = (prefix_attention_mask < 0.5).bool()
        attention_mask = torch.cat((prefix_attention_mask, attention_mask), dim=3)

    # [seq_len, batch, hidden_size]
    hidden_states = inputs_embeds.transpose(0, 1)

    presents = () if use_cache else None
    all_self_attentions = () if output_attentions else None
    all_hidden_states = () if output_hidden_states else None

    if attention_mask is None:
        attention_mask = torch.zeros(1, 1, device=input_ids.device).bool()
    else:
        attention_mask = attention_mask.to(hidden_states.device)

    for i, layer in enumerate(self.layers):

        if output_hidden_states:
            all_hidden_states = all_hidden_states + (hidden_states,)
        layer_past = past_key_values[i]

        if self.gradient_checkpointing and self.training:
            layer_ret = torch.utils.checkpoint.checkpoint(
                layer,
                hidden_states,
                position_ids,
                attention_mask,
                torch.tensor(i),
                layer_past,
                use_cache,
                output_attentions
            )
        else:
            layer_ret = layer(
                hidden_states,
                position_ids=position_ids,
                attention_mask=attention_mask,
                layer_id=torch.tensor(i),
                layer_past=layer_past,
                use_cache=use_cache,
                output_attentions=output_attentions
            )

        hidden_states = layer_ret[0]
        
        if enable_dropout:
            hidden_states = self.infer_dropout(hidden_states)
            

        if use_cache:
            presents = presents + (layer_ret[1],)

        if output_attentions:
            all_self_attentions = all_self_attentions + (layer_ret[2 if use_cache else 1],)

    # Final layer norm.
    hidden_states = self.final_layernorm(hidden_states)

    if output_hidden_states:
        all_hidden_states = all_hidden_states + (hidden_states,)

    if not return_dict:
        return tuple(v for v in [hidden_states, presents, all_hidden_states, all_self_attentions] if v is not None)

    return BaseModelOutputWithPast(
        last_hidden_state=hidden_states,
        past_key_values=presents,
        hidden_states=all_hidden_states,
        attentions=all_self_attentions,
    )



def chat_glm_for_cd_forward(
        self,
        input_ids: Optional[torch.Tensor] = None,
        position_ids: Optional[torch.Tensor] = None,
        attention_mask: Optional[torch.Tensor] = None,
        past_key_values: Optional[Tuple[torch.FloatTensor]] = None,
        inputs_embeds: Optional[torch.Tensor] = None,
        labels: Optional[torch.Tensor] = None,
        use_cache: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
        enable_dropout: bool = False,
):
    use_cache = use_cache if use_cache is not None else self.config.use_cache
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    transformer_outputs = self.transformer(
        input_ids=input_ids,
        position_ids=position_ids,
        attention_mask=attention_mask,
        past_key_values=past_key_values,
        inputs_embeds=inputs_embeds,
        use_cache=use_cache,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
        enable_dropout=enable_dropout,
    )

    hidden_states = transformer_outputs[0]

    lm_logits = self.lm_head(hidden_states).permute(1, 0, 2).contiguous()

    loss = None
    if labels is not None:
        lm_logits = lm_logits.to(torch.float32)

        # Shift so that tokens < n predict n
        shift_logits = lm_logits[..., :-1, :].contiguous()
        shift_labels = labels[..., 1:].contiguous()
        # Flatten the tokens
        loss_fct = CrossEntropyLoss(ignore_index=-100)
        loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))

        lm_logits = lm_logits.to(hidden_states.dtype)
        loss = loss.to(hidden_states.dtype)

    if not return_dict:
        output = (lm_logits,) + transformer_outputs[1:]
        return ((loss,) + output) if loss is not None else output

    return CausalLMOutputWithPast(
        loss=loss,
        logits=lm_logits,
        past_key_values=transformer_outputs.past_key_values,
        hidden_states=transformer_outputs.hidden_states,
        attentions=transformer_outputs.attentions,
    )


type(model.transformer).add_infer_dropout = add_infer_dropout
type(model).forward = chat_glm_for_cd_forward
type(model.transformer).forward = chat_glm_forward

In [4]:
DROPOUT_RATE = 0.1
FORWARD_N = 8

model.transformer.add_infer_dropout(DROPOUT_RATE)

add infer dropout to model


In [5]:
def generate(prompt, tokenizer, model, forward_n):
    # generation params
    bos_token_id, eos_token_id = tokenizer.bos_token_id, tokenizer.eos_token_id
    max_length = 32
    
    logits_processor = LogitsProcessorList()
    stopping_criteria = StoppingCriteriaList()
    
    
    inputs = tokenizer([prompt], return_tensors="pt")
    inputs = inputs.to(model.device)
    input_ids = inputs["input_ids"]
    
    # beam search
    batch_size, input_ids_seq_length = input_ids.shape[:2]
    
    unfinished_sequences = input_ids.new(input_ids.shape[0]).fill_(1)
    
    token_mc_stddev = list()
    token_mc_entropy_avg = list()
    token_mc_entropy_stddev = list()
    generated_tokens = list()

    while len(generated_tokens) < max_length:
        print(len(generated_tokens), end=' ')
        model_inputs = model.prepare_inputs_for_generation(input_ids)
        
        # First forward pass: dropout off, pick the model's (greedy) token
        model.eval()
        outputs = model(**model_inputs, return_dict=True, output_attentions=False, output_hidden_states=False)
        next_token_logits = outputs.logits[:, -1, :]
        next_token_scores = logits_processor(input_ids, next_token_logits)
        probs = nn.functional.softmax(next_token_scores, dim=-1)
        next_tokens = torch.argmax(probs, dim=-1)
        generated_tokens.append(next_tokens)
        
        # Run N stochastic forward passes with dropout enabled to measure how strongly
        # the model supports the greedy token chosen above
        step_mc_entropies = list()
        step_mc_probs = list()
        model.train()  # nn.Dropout is only active in training mode
        for j in range(forward_n):
            outputs = model(**model_inputs, return_dict=True, output_attentions=False, output_hidden_states=False,
                            enable_dropout=True)
            next_token_logits = outputs.logits[:, -1, :]
            next_token_scores = logits_processor(input_ids, next_token_logits)
            probs = nn.functional.softmax(next_token_scores, dim=-1).detach().numpy()
            # clip before the log so zero probabilities contribute 0 instead of NaN
            step_mc_entropies.append(-np.sum(probs * np.log2(np.clip(probs, 1e-12, None))))
            step_mc_probs.append(probs[0][next_tokens[0]])
        # NOTE: despite the "stddev" names, np.var stores the variance of the N samples
        token_mc_stddev.append(np.var(step_mc_probs))
        token_mc_entropy_stddev.append(np.var(step_mc_entropies))
        token_mc_entropy_avg.append(np.mean(step_mc_entropies))
            
        if next_tokens[0] == tokenizer.eos_token_id:
            print("")
            break

        input_ids = torch.cat([input_ids, next_tokens[:, None]], dim=-1)

    return [tid.numpy() for tid in generated_tokens], token_mc_stddev, token_mc_entropy_avg, token_mc_entropy_stddev

In [6]:
# visualize
def colored_background(r, g, b, text):
    return f"\033[48;2;{r};{g};{b}m{text}\033[0m"
    
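def print_seq(tokens, uncertainty):
    # map each score linearly onto a green-channel intensity in [low, high]
    low = 150
    high = 255
    uncertainty = np.asarray(uncertainty, dtype=float)
    max_ = np.max(uncertainty)
    min_ = np.min(uncertainty)
    span = max_ - min_
    # guard against division by zero when all scores are equal
    k = (high - low) / span if span > 0 else 0.0
    for token, v in zip(tokens, low + k * (uncertainty - min_)):
        print(colored_background(0, int(v), 0, token), end='')
    print("")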

In [7]:
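prompts = [
    "梅西的生日是？",  # "When is Messi's birthday?"
    "周树人的生日是？",  # "When is Zhou Shuren's birthday?"
    "简要描述法国国旗"  # "Briefly describe the French flag"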
]

with open("result.pkl", 'wb') as f:
    for prompt in prompts:
        outputs = generate(prompt, tokenizer, model, forward_n=FORWARD_N)
        record = {"text": prompt, "outputs": outputs}
        print("measure by mc_stddev")
        print_seq([tokenizer.decode(tk) for tk in outputs[0]], outputs[1])
        print("measure by mc_entropy_average")
        print_seq([tokenizer.decode(tk) for tk in outputs[0]], outputs[2])
        print("measure by mc_entropy_stddev")
        print_seq([tokenizer.decode(tk) for tk in outputs[0]], outputs[3])
        pkl.dump(record, f)

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 
measure by mc_stddev
Lionel[150] Messi[150] 的生日[150] 是[255] 1[161] 9[199] 8[167] 7[150] 年[151] 6[161] 月[199] 2[150] 4[150] 日[192] 。[185] (empty)[201]
measure by mc_entropy_average
Lionel[182] Messi[255] 的生日[211] 是[199] 1[200] 9[150] 8[150] 7[179] 年[200] 6[208] 月[198] 2[212] 4[159] 日[194] 。[180] (empty)[177]
measure by mc_entropy_stddev
Lionel[239] Messi[150] 的生日[188] 是[255] 1[189] 9[207] 8[158] 7[226] 年[202] 6[229] …

Each token above is followed in brackets by the green-channel value of its background color (150 = lowest score in the sentence, 255 = highest). The last line is cut off in the captured output.

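Because of limited compute, the generation length is capped at 32.

The background color encodes each token's score relative to the other tokens in the same sentence (a brighter green means a higher value of the metric). Given a large amount of factual text, these metrics can be computed for every token in the corpus and a suitable one chosen as the decision criterion.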
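One possible way to turn corpus-level scores into a criterion is to take a high percentile of the scores observed on trusted text and flag tokens that exceed it. The sketch below is an assumption about how this could be done, not something the notebook implements; `percentile`, `flag_uncertain`, the 90th-percentile cutoff, and all score values are hypothetical.

```python
def percentile(values, q):
    """Linearly interpolated q-th percentile (0 <= q <= 100) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def flag_uncertain(tokens, scores, threshold):
    """Return the tokens whose uncertainty score exceeds the threshold."""
    return [tok for tok, s in zip(tokens, scores) if s > threshold]


# Made-up per-token scores collected on a trusted reference corpus.
reference_scores = [0.01, 0.02, 0.02, 0.03, 0.05, 0.04, 0.02, 0.01, 0.03, 0.10]
threshold = percentile(reference_scores, 90)

# Made-up scores for a newly generated answer.
flagged = flag_uncertain(["1", "9", "8", "7"], [0.02, 0.20, 0.03, 0.01], threshold)
```

An absolute threshold of this kind would complement the per-sentence color scale, which only ranks tokens against each other and cannot say whether a whole sentence is uncertain.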

### What to do
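Computing a token-level factuality metric from statistical uncertainty estimation has several advantages: it is unsupervised, theoretically grounded, easy to implement, and easy to extend (token penalties, sentence-level factuality scores, using uncertainty to intervene in training, and so on). The drawback is that in real-world use a "metric for the metric" is needed to evaluate how well it works.

The difficulty with hallucination is that it is hard to define, which is why most related studies build and annotate their own datasets to evaluate results. In practice, "hallucination" must first be defined explicitly, for example as inconsistency with a structured knowledge base or inconsistency with information given in the prompt. Only with a clear definition can an optimization target be set.

Information extraction may be one way to reduce hallucination. During training, entities can be extracted and replaced with wrong values to teach the reward model to recognize errors. During inference, key entities can be extracted and uncertainty computed for them alone, since most tokens do not carry key information.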
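To illustrate the inference-time idea, the sketch below averages per-token uncertainty over entity spans only. The spans are assumed to come from some external entity extractor, which is not shown; `entity_uncertainty`, the span format, and the score values are all hypothetical.

```python
def entity_uncertainty(scores, entity_spans):
    """Average per-token uncertainty over each entity span [start, end)."""
    return {
        name: sum(scores[start:end]) / (end - start)
        for name, (start, end) in entity_spans.items()
    }


tokens = ["Lionel", "Messi", "的生日", "是", "1", "9", "8", "7", "年"]
# Made-up uncertainty scores, one per token.
scores = [0.01, 0.01, 0.02, 0.08, 0.02, 0.06, 0.03, 0.01, 0.01]
# Hypothetical extractor output: token index ranges for each key entity.
spans = {"person": (0, 2), "year": (4, 8)}

per_entity = entity_uncertainty(scores, spans)
```

Here the function word "是" has the highest raw score but is ignored, so only the tokens that carry factual content contribute to the result.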

### Reference & Notes

-  [Survey of Hallucination in Natural Language Generation](https://arxiv.org/pdf/2202.03629.pdf) (2022)
   -  Measurement (section 4)
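      -  statistic-based: the more "similar" the output is to the source corpus, the more trustworthy it is; similarity is measured with n-gram-based or vector-based methods
      -  model-based
         -  Information extraction: compare extracted triples
         -  QA-based: ask repeatedly and check whether the answers agree (an uncertain answer pushes decoding toward random tokens, which causes inconsistency)
         -  NLI-based: an external model scores whether the answer is in an "entailment" relation with the source
         -  LM-based: an LM0 trained only on summaries and an LM1 trained on both sources and summaries; token probabilities are compared, a token is treated as false information when its value falls below that of LM1, and a metric for the whole text is built on top
   -  Method (section 5)
      -  better dataset (cleaning, augmentation, ...)
      -  model architecture
         -  Encoder .... (skipped because GPT is decoder-only)
         -  **Decoder (question answering is a one-to-many problem; the core idea is to check consistency across multiple answers)**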
            -  [ ] multi-branch
            -  [ ] Uncertainty-aware
            -  [ ] dual
            -  [ ] tree-based
            -  [ ] ... 

-  [SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models](https://arxiv.org/pdf/2303.08896.pdf) (2023)
   -  method
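      -  method1: confident answers converge while wrong answers diverge; when token probabilities are available, compute f = sum(ppl(r))/|r| as a measure of factuality.
      -  method2: based on BERTScore. Defines a metric similar to list similarity: generate several answers with beam search and compute an "unreliability" score from sentence similarity.
      -  method3: QA-based...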



-  [A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation](https://arxiv.org/pdf/2104.08704.pdf) (2022)
   - token level classification based on created dataset

-  [Looking for a Needle in a Haystack: A Comprehensive Study of Hallucinations in Neural Machine Translation](https://arxiv.org/abs/2208.05309)

-  [Uncertainty Estimation in Autoregressive Structured Prediction](https://arxiv.org/pdf/2002.07650.pdf)

-  [Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning](http://proceedings.mlr.press/v48/gal16.pdf)

- [monte-carlo-dropout](https://towardsdatascience.com/monte-carlo-dropout-7fd52f8b6571)
...