## Comparing Different Models

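In this notebook, we use a second model, `Flan-T5-Small`, and compare it side by side with `Mistral-7B`.

`Flan-T5-Small` has the advantage of being much lighter: it runs without a GPU and needs only 1 GB of RAM.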

### Requirements and Imports

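If you enabled the workbench image as described in the earlier steps, the required libraries are included by default and can be imported directly. If you did not select the right image, uncomment and run the first line (`pip install ...`) to install the dependencies.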

In [None]:
# !pip install --no-cache-dir --no-dependencies --disable-pip-version-check -r requirements.txt # Uncomment only if you have not selected the right workbench image

import json
import os
from os import listdir
from os.path import isfile, join
from langchain.chains import LLMChain
from langchain.llms import HuggingFaceTextGenInference
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.prompts import PromptTemplate

### Langchain pipeline

Here we define two different LLM endpoints and two different pipelines.

P.S. Both models below have already been deployed in the OpenShift cluster.

In [None]:
# Main LLM Inference Server URL
inference_server_url = "http://llm.ic-shared-llm.svc.cluster.local:3000/"

# LLM definition
llm = HuggingFaceTextGenInference(
    inference_server_url=inference_server_url,
    max_new_tokens=512,
    top_k=10,
    top_p=0.95,
    typical_p=0.95,
    temperature=0.01,
    repetition_penalty=1.03,
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()]
)

In [None]:
# Flan-T5-Small LLM Inference Server URL
inference_server_url_flan_t5 = "http://llm-flant5.ic-shared-llm.svc.cluster.local:3000/"

# LLM definition
llm_flant5 = HuggingFaceTextGenInference(
    inference_server_url=inference_server_url_flan_t5,
    max_new_tokens=96,
    top_k=10,
    top_p=0.95,
    typical_p=0.95,
    temperature=0.01,
    repetition_penalty=1.03,
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()]
)

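Both LLMs use the same `template`, which tells the model how to answer. For example, the last instruction line in the template below reads:

`I will give you a text, then ask a question about it. Give a precise and as concise as possible answer to this question.`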

In [None]:
template="""<s>[INST]
You are a helpful, respectful and honest assistant.
Always assist with care, respect, and truth. Respond with utmost utility yet securely.
Avoid harmful, unethical, prejudiced, or negative content. Ensure replies promote fairness and positivity.
I will give you a text, then ask a question about it. Give a precise and as concise as possible answer to this question.

### TEXT:
{text}

### QUESTION:
{query}

### ANSWER:
[/INST]
"""
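# The template uses the {text} and {query} placeholders, so declare both as input variables
PROMPT = PromptTemplate(input_variables=["text", "query"], template=template)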
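
To make the role of the two placeholders concrete, here is a minimal sketch that fills a shortened stand-in for the template with plain `str.format`. The sample text and the shortened template are assumptions made for illustration; in the notebook, `PROMPT.format(text=..., query=...)` should fill the same `{text}` and `{query}` slots.

```python
# Shortened stand-in for the notebook's template (same placeholders, fewer instructions).
template = """<s>[INST]
I will give you a text, then ask a question about it.

### TEXT:
{text}

### QUESTION:
{query}

### ANSWER:
[/INST]
"""

# Hypothetical sample values, not taken from the claims data.
rendered = template.format(
    text="Subject: Car accident\nContent:\nMy car was hit in a parking lot.",
    query="Where does the event the claim is related to happen?",
)
print(rendered)
```

Printing the rendered prompt before sending it is a quick way to confirm that both placeholders were substituted.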

As in the previous examples, we create the **conversation** objects used to query the models, one for each model.

In [None]:
conversation = LLMChain(llm=llm, prompt=PROMPT, verbose=False)

conversation_flant5 = LLMChain(llm=llm_flant5, prompt=PROMPT, verbose=False)

We are now ready to query the models!

In this example, we query a single sample file to see what happens.

In [None]:
filename = 'claims/claim1.json'

# Opening JSON file
claims = {}
with open(filename, 'r') as file:
    data = json.load(file)
claims[filename] = data

# Analyze the claim
print(f"***************************")
print(f"* Claim: {filename}")
print(f"***************************")
print("Original content:")
print("-----------------")
print(f"Subject: {claims[filename]['subject']}\nContent:\n{claims[filename]['content']}\n\n")
print('Analysis with Mistral-7B:')
print("--------")
text_input = f"Subject: {claims[filename]['subject']}\nContent:\n{claims[filename]['content']}"
sentiment_query = "What is the sentiment of the person sending this claim?"
location_query = "Where does the event the claim is related to happen?"
time_query = "When does the event the claim is related to happen? If possible, specify the date and the time."
print(f"- Sentiment: ")
conversation.predict(text=text_input, query=sentiment_query);
print("\n- Location: ")
conversation.predict(text=text_input, query=location_query);
print("\n- Time: ")
conversation.predict(text=text_input, query=time_query);
print("\n\n                          ----====----\n")
print('Analysis with Flan-T5-Small:')
print("--------")
# text_input and the three queries defined above are reused unchanged
print(f"- Sentiment: ")
conversation_flant5.predict(text=text_input, query=sentiment_query);
print("\n- Location: ")
conversation_flant5.predict(text=text_input, query=location_query);
print("\n- Time: ")
conversation_flant5.predict(text=text_input, query=time_query);
print("\n\n                          ----====----\n")

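You will see that `Flan-T5-Small` responds faster, because it is an LLM with only 80 million parameters. Although its queries are fast, its answers are far less accurate than those of `Mistral-7B`, which has 7 billion parameters.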
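
The speed difference can be measured rather than eyeballed. The following is a minimal sketch of a timing helper. The two `fake_*` functions are hypothetical stand-ins with artificial delays, used only so the example runs on its own; in the notebook you would pass `conversation.predict` and `conversation_flant5.predict` instead.

```python
import time

def time_call(fn, *args, **kwargs):
    """Call fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Hypothetical stand-ins for the two chains; the sleep durations are artificial.
def fake_large_model(text, query):
    time.sleep(0.05)
    return "answer from the larger model"

def fake_small_model(text, query):
    time.sleep(0.01)
    return "answer from the smaller model"

timings = {}
for name, predict in [("large_model", fake_large_model), ("small_model", fake_small_model)]:
    answer, seconds = time_call(predict, text="sample text", query="sample question")
    timings[name] = seconds
    print(f"{name}: {seconds:.3f}s -> {answer}")
```

A single call is a noisy measurement; averaging over several claims gives a fairer comparison.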

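When working with LLMs, you need to find the right balance among `performance`, `accuracy`, `required resources`, and `associated cost`. Comparing multiple models is therefore important, and developers or users can run `sanity checks` to make sure that results stay within expectations as the data changes or the models evolve.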
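
One simple form of sanity check is to verify that each answer mentions at least one expected keyword. The sketch below is an illustration under stated assumptions: the function name `sanity_check`, the keyword lists, and the sample answers for `model_a` and `model_b` are all invented for the example and are not actual model output.

```python
def sanity_check(answer, expected_keywords):
    """Return True if the answer contains at least one expected keyword (case-insensitive)."""
    lowered = answer.lower()
    return any(keyword.lower() in lowered for keyword in expected_keywords)

# Hypothetical expectations for one claim.
checks = {
    "sentiment": ["frustrated", "angry", "upset", "negative"],
    "location": ["los angeles"],
}

# Hypothetical answers, invented for illustration only (not real model output).
answers = {
    "model_a": {
        "sentiment": "The sender is frustrated and upset.",
        "location": "Los Angeles, California",
    },
    "model_b": {
        "sentiment": "positive",
        "location": "a street",
    },
}

results = {
    model: {field: sanity_check(answer, checks[field]) for field, answer in fields.items()}
    for model, fields in answers.items()
}

for model, outcome in results.items():
    print(model, outcome)
```

Keyword matching is deliberately crude. It catches answers that are clearly off, but it cannot confirm that an answer is fully correct.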