# Notebook 6.1: ChatGLM2

## 6.1.1 Overview
This example shows how to run [ChatGLM2-6B](https://github.com/THUDM/ChatGLM2-6B) Chinese inference on low-cost PCs (without a discrete GPU) using [BigDL-LLM](https://github.com/intel-analytics/BigDL/tree/main/python/llm) APIs. ChatGLM2-6B is the second generation of the open-source bilingual (Chinese-English) chat model [ChatGLM-6B](https://github.com/THUDM/ChatGLM-6B) developed by [THUDM](https://github.com/THUDM). ChatGLM2-6B is also available among the [Huggingface models](https://huggingface.co/models) at this [link](https://huggingface.co/THUDM/chatglm2-6b).

Before running inference, you may need to prepare the environment according to [Chapter 2 Environment Setup](../ch_2_Quick_Start).

## Installation

Install BigDL-LLM through:

In [None]:
! pip install bigdl-llm[all]

The `all` option installs the other packages that BigDL-LLM requires.

If needed, install LangChain through:

In [None]:
! pip install langchain

## Inference

### Create Prompt Template

Before inference, you need to create a prompt template. Here we give an example prompt template that follows the [ChatGLM2-6B prompt template](https://huggingface.co/THUDM/chatglm2-6b/blob/main/modeling_chatglm.py#L1007). You can tune the prompt based on your own model as well.

In [1]:
CHATGLM_V2_PROMPT_TEMPLATE = "问：{prompt}\n\n答："
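
As a quick check, the template can be filled in with Python's built-in `str.format`, which is also how the generation cell later in this notebook uses it. The snippet below is a minimal sketch that repeats the template definition so it runs on its own.

```python
# Same template as defined above, repeated so this snippet is self-contained.
CHATGLM_V2_PROMPT_TEMPLATE = "问：{prompt}\n\n答："

# Fill the {prompt} placeholder with the user's question.
formatted = CHATGLM_V2_PROMPT_TEMPLATE.format(prompt="AI是什么？")
print(formatted)
```

The model is expected to continue the text after `答：`, so the question must be in place before tokenization.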

### Load Model

Load the model with low-precision (INT4) optimization to reduce resource cost. The BigDL-LLM APIs convert the relevant layers in the model into INT4 format. The `model_path` argument accepts either a Huggingface repo id or a local model path.

In [None]:
from bigdl.llm.transformers import AutoModel

model_path = "THUDM/chatglm2-6b" # repo id or model path
model = AutoModel.from_pretrained(model_path,
                                  load_in_4bit=True,
                                  trust_remote_code=True)
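
To give a feel for what storing weights in INT4 means, here is a generic sketch of symmetric 4-bit quantization in plain Python. This notebook does not describe the scheme BigDL-LLM actually uses, so the functions `quantize_int4` and `dequantize` are hypothetical illustrations of the general idea, not the library's implementation.

```python
def quantize_int4(weights):
    """Map floats to integers in [-8, 7] with one shared scale (illustrative only)."""
    max_abs = max(abs(w) for w in weights)
    # Scale so the largest magnitude maps to 7; guard against all-zero input.
    scale = max_abs / 7 if max_abs else 1.0
    quantized = [max(-8, min(7, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the 4-bit integers."""
    return [q * scale for q in quantized]


weights = [0.7, -0.35, 0.1, 0.0]
q, scale = quantize_int4(weights)
restored = dequantize(q, scale)
print(q, scale)
print(restored)
```

Each weight then needs only 4 bits plus a share of the scale factor, at the cost of a rounding error of at most half the scale per weight.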

### Load Tokenizer

The quantized model is compatible with the tokenizer provided by the [Huggingface transformers library](https://huggingface.co/docs/transformers/index), so you can load the tokenizer directly with the Huggingface transformers APIs. A tokenizer maps between text and lists of integers: it encodes input text into integers for the model to process and decodes the model's output integers back into human-readable text.

In [None]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_path,
                                          trust_remote_code=True)
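
The encode/decode round trip can be illustrated without downloading any model. The `ToyTokenizer` class below is a hypothetical character-level stand-in; the real ChatGLM2 tokenizer uses a learned vocabulary and will produce different ids.

```python
class ToyTokenizer:
    """Hypothetical character-level tokenizer; NOT the ChatGLM2 tokenizer."""

    def __init__(self, corpus):
        # Assign one integer id to each distinct character, in sorted order.
        self.char_to_id = {ch: i for i, ch in enumerate(sorted(set(corpus)))}
        self.id_to_char = {i: ch for ch, i in self.char_to_id.items()}

    def encode(self, text):
        # Text -> list of integers.
        return [self.char_to_id[ch] for ch in text]

    def decode(self, ids):
        # List of integers -> text.
        return "".join(self.id_to_char[i] for i in ids)


toy = ToyTokenizer("AI是什么？")
ids = toy.encode("AI是什么？")
print(ids)
print(toy.decode(ids))
```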

### Generate predicted tokens

Then, you can run inference with the loaded model and tokenizer. The argument `max_new_tokens` sets the maximum number of tokens to generate; adjust it to suit your needs.

In [9]:
import time
import torch

prompt = "AI是什么？"
n_predict = 128

with torch.inference_mode():
    prompt = CHATGLM_V2_PROMPT_TEMPLATE.format(prompt=prompt)
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    st = time.time()
    output = model.generate(input_ids,
                            max_new_tokens=n_predict)
    end = time.time()
    output_str = tokenizer.decode(output[0], skip_special_tokens=True)
    print(f'Inference time: {end-st} s')
    print('-'*20, 'Prompt', '-'*20)
    print(prompt)
    print('-'*20, 'Output', '-'*20)
    print(output_str)

Inference time: 503.20414662361145 s
-------------------- Prompt --------------------
问：AI是什么？

答：
-------------------- Output --------------------
问：AI是什么？

答： AI指的是人工智能,是一种能够通过学习和推理来执行任务的计算机程序。它可以模仿人类的思维方式,做出类似人类的决策,并且具有自主学习、自我进化的能力。

AI 技术包括机器学习、深度学习、自然语言处理、计算机视觉、机器人技术等,可以应用于各种领域,如医疗、金融、制造业、军事、能源等。

AI 技术的发展已经带来了许多改变和进步,但同时也引起了人们的担忧和争议,涉及到隐私、安全、道德和社会影响等方面的问题。
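
The decoded output above repeats the prompt before the answer. If only the answer is wanted, one option is to strip the prompt prefix from the decoded string. The helper `strip_prompt` below is a hypothetical sketch; it assumes the decoded output begins with exactly the prompt text and otherwise returns the output unchanged.

```python
def strip_prompt(output_str: str, prompt: str) -> str:
    """Return only the generated continuation when the output echoes the prompt."""
    if output_str.startswith(prompt):
        # Drop the echoed prompt and any whitespace right after it.
        return output_str[len(prompt):].lstrip()
    # Decoding did not reproduce the prompt verbatim; leave the output as-is.
    return output_str


prompt = "问：AI是什么？\n\n答："
output_str = "问：AI是什么？\n\n答： AI指的是人工智能"
answer = strip_prompt(output_str, prompt)
print(answer)
```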


### Stream Chat

In [6]:
import torch

with torch.inference_mode():
    question = "AI 是什么？"
    response_ = ""
    print('-'*20, 'Stream Chat Output', '-'*20)
    for response, history in model.stream_chat(tokenizer, question, history=[]):
        print(response.replace(response_, ""), end="")
        response_ = response

-------------------- Stream Chat Output --------------------
AI指的是人工智能,是一种能够通过学习和理解数据,以及应用数学、逻辑、推理等知识,来完成一些需要人类智能才能完成的任务的技术。AI可以包括机器学习、深度学习、自然语言处理、计算机视觉等不同的技术,这些技术可以让计算机更好地理解人类语言、图像和声音,并且能够自主地学习和改进,具有非常广泛的应用前景。
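
The loop above suggests that `stream_chat` yields the full response generated so far on every iteration, so only the new part should be printed each time. The sketch below reproduces that pattern with a hypothetical generator, `fake_stream_chat`, in place of the model. It slices by length instead of calling `replace`, which avoids removing earlier text that happens to reappear later in the response; this assumes each yielded response extends the previous one.

```python
def fake_stream_chat():
    """Hypothetical stand-in for model.stream_chat: yields the cumulative response."""
    history = []
    for partial in ["AI", "AI指的是", "AI指的是人工智能"]:
        yield partial, history


printed = []
previous = ""
for response, history in fake_stream_chat():
    delta = response[len(previous):]  # only the newly generated suffix
    printed.append(delta)
    print(delta, end="")
    previous = response
print()
```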

## Use in LangChain

[LangChain](https://python.langchain.com/docs/get_started/introduction.html) is a widely used framework for developing applications powered by language models. In this section, we show how to integrate BigDL-LLM with LangChain. You can follow these [instructions](https://python.langchain.com/docs/get_started/installation) to prepare the environment for LangChain.

### Create Prompt Template

Before inference, you need to create a prompt template. Here we give an example prompt template that contains two input variables, `history` and `human_input`. You can tune the prompt based on your own model as well.

In [10]:
CHATGLM_V2_PROMPT_TEMPLATE = """{history}\n\n问：{human_input}\n\n答："""
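
To see what the model receives, the two variables can be filled in by hand with `str.format`. This is a minimal sketch; the history string used for the second turn is a made-up placeholder, not output produced by LangChain.

```python
# Same template as above, repeated so this snippet is self-contained.
CHATGLM_V2_PROMPT_TEMPLATE = """{history}\n\n问：{human_input}\n\n答："""

# First turn: there is no conversation history yet.
first_turn = CHATGLM_V2_PROMPT_TEMPLATE.format(history="", human_input="AI 是什么？")
print(repr(first_turn))

# Later turn: earlier exchanges are placed before the new question.
later_turn = CHATGLM_V2_PROMPT_TEMPLATE.format(
    history="(earlier conversation text)",  # hypothetical placeholder
    human_input="AI 是什么？",
)
print(later_turn)
```

With an empty history, the prompt starts with two newlines, which matches the blank lines at the top of the formatted prompt in the chain's verbose output.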

### Prepare Chain

Use the [LangChain API](https://api.python.langchain.com/en/latest/api_reference.html) `LLMChain` to construct a chain for inference. Here we use BigDL-LLM APIs to construct an `LLM` object, which loads the model with low-precision optimization automatically.

In [None]:
from langchain import LLMChain, PromptTemplate
from bigdl.llm.langchain.llms import TransformersLLM
from langchain.memory import ConversationBufferWindowMemory

llm_model_path = "THUDM/chatglm2-6b" # the path to the huggingface llm model

prompt = PromptTemplate(input_variables=["history", "human_input"], template=CHATGLM_V2_PROMPT_TEMPLATE)
max_new_tokens = 128

llm = TransformersLLM.from_model_id(
        model_id=llm_model_path,
        model_kwargs={"trust_remote_code": True},
)

# The following code is exactly the same as in a regular LangChain use case
llm_chain = LLMChain(
    llm=llm,
    prompt=prompt,
    verbose=True,
    llm_kwargs={"max_new_tokens":max_new_tokens},
    memory=ConversationBufferWindowMemory(k=2),
)
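
`ConversationBufferWindowMemory(k=2)` keeps only the last two exchanges and supplies them as the `history` variable. The class below is a hypothetical sketch of that sliding-window idea built on `collections.deque`; it is not LangChain's implementation, and the `Human:`/`AI:` prefixes are an assumption made for illustration.

```python
from collections import deque


class ToyWindowMemory:
    """Hypothetical k-turn sliding window; not LangChain's implementation."""

    def __init__(self, k):
        # A deque with maxlen=k silently discards the oldest turn when full.
        self.turns = deque(maxlen=k)

    def save(self, human, ai):
        self.turns.append((human, ai))

    def history(self):
        # Render the retained turns as plain text (prefixes are assumed).
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)


memory = ToyWindowMemory(k=2)
memory.save("q1", "a1")
memory.save("q2", "a2")
memory.save("q3", "a3")  # pushes the first turn out of the window
print(memory.history())
```

Bounding the window keeps the prompt length from growing with every turn, at the cost of the model forgetting older exchanges.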


### Predict

In [21]:
text = "AI 是什么？"
response_text = llm_chain.predict(human_input=text,stop="\n\n")
print(response_text)



> Entering new LLMChain chain...
Prompt after formatting:


问：AI 是什么？

答：

> Finished chain.
 AI指的是人工智能,是一种能够通过学习和理解数据,以及应用适当的算法和数学模型,来执行与人类智能相似的任务的技术。AI可以包括机器学习、自然语言处理、计算机视觉、知识表示、推理、决策等多种技术。
