# LangChain: Evaluation

## Outline:

* Example generation
* Manual evaluation (and debugging)
* LLM-assisted evaluation

In [1]:
# !pip install python-dotenv
# !pip install openai
# !pip install promptlayer
# !pip install langchain
# !pip install docarray
# !pip install tiktoken

In [2]:
import os
import openai

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

# os.environ["LANGCHAIN_TRACING_V2"] = "true"
# os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
# os.environ["LANGCHAIN_PROJECT"] = "DEEPLEARNING.AI"
# os.environ["LANGCHAIN_API_KEY"] = "<your-langsmith-api-key>"

# os.environ["LANGCHAIN_ENDPOINT"] = "http://localhost:1984"
# os.environ["LANGCHAIN_PROJECT"] = "DEEPLEARNING.AI"

# If setting environment variables is difficult (e.g., on Google Colab), uncomment the lines below and fill in your own values
# os.environ["OPENAI_API_KEY"] = "<your-openai-api-key>"
# os.environ["PROMPTLAYER_API_KEY"] = "<your-promptlayer-api-key>"
# openai.api_key = os.environ['OPENAI_API_KEY']
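As an alternative to pasting keys into the notebook, the following is a minimal sketch of a check that fails early when a required variable is missing. `require_env` and the variable name `EXAMPLE_API_KEY` are hypothetical and used only for illustration; they are not part of any library used here.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to .env or the environment")
    return value


# Demonstration with a hypothetical variable name so no real key is touched.
os.environ["EXAMPLE_API_KEY"] = "placeholder-value"
print(require_env("EXAMPLE_API_KEY"))
```

In this notebook the same helper could be called with `"OPENAI_API_KEY"` right after `load_dotenv`.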

In [3]:
base_dir = '.'

# When using Google Colab, uncomment the code below and run it first
# from google.colab import drive
# drive.mount('/gdrive')
# base_dir = '/gdrive/My Drive/Colab Notebooks/DeepLearning.AI/03.LangChain for LLM Application Development'

## Create our Q&A application

In [4]:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI, PromptLayerChatOpenAI
from langchain.document_loaders import CSVLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.vectorstores import DocArrayInMemorySearch

In [5]:
file = base_dir + '/api_ko.csv'
loader = CSVLoader(file_path=file)
data = loader.load()

In [6]:
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch
).from_loaders([loader])

In [7]:
# llm = ChatOpenAI(temperature = 0.0)
llm = PromptLayerChatOpenAI(pl_tags=["api_qa", "2023-07-08"], temperature=0.0)
qa = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=index.vectorstore.as_retriever(), 
    verbose=True,
    chain_type_kwargs = {
        "document_separator": "<<<<>>>>>"
    }
)
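As a rough illustration of what `chain_type="stuff"` does with a custom `document_separator`: the text of the retrieved documents is joined into one context string, with the separator between documents, and that string is placed into the prompt. The sketch below mimics only the joining step in plain Python. `stuff_documents` and the sample strings are hypothetical, not LangChain APIs.

```python
# Minimal sketch (not LangChain code): how a "stuff"-style chain could
# combine retrieved documents into one context string.
from typing import List

SEPARATOR = "<<<<>>>>>"  # same separator passed via chain_type_kwargs above


def stuff_documents(page_contents: List[str], separator: str = SEPARATOR) -> str:
    """Join the text of every retrieved document with the separator."""
    return separator.join(page_contents)


# Hypothetical, shortened document texts used only for illustration.
retrieved = [
    "name: dateDiff\ndescription: returns the difference between two dates",
    "name: dateAdd\ndescription: adds an offset to a date",
]
context = stuff_documents(retrieved)
print(context)
```

A distinctive separator makes document boundaries easy to spot when reading the prompt in debug output.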

### Coming up with test data points

In [8]:
data[10]

Document(page_content=': 10\n유형: method\ncomponent: $p\nname: deleteSubmission\ndescription: submission을 삭제합니다.\nparameter: submissionID\tString\tY\t삭제하고자 하는 submission의 ID\nreturn: \nexception: \nsample: <xmp  class=\'js sample\'>$p.deleteSubmission( "submission1" );\n//"submission1"에 해당하는 submssion이 삭제됩니다. 이후 $p.executeSubmission("submission1");을 호출하면 아무 동작을 하지 않게 됩니다.</xmp>\nbuilt since: 5.0_3.3377A.20181128.161740\nbuilt last: 5.0_5.4811B.20230203.095105', metadata={'source': './api_ko.csv', 'row': 10})

In [9]:
data[11]

Document(page_content=': 11\n유형: method\ncomponent: $p\nname: download\ndescription: download 모듈이 구현된 서버의 URL을 호출하여 다운로드 가능한 인터페이스를 화면에서 제공합니다.\nparameter: actionUrl\tString\tY\t파일 다운로드가 구현되어있는 url.\nXML\tString\tN\t문자열은 xmlValue라는 이름으로 서버로 올라간다. 값을 지정하지 않은 경우(undefined인 경우) xmlValue라는 값은 제외하고 서버로 전송한다.\nsendMethod\tString\tN\tget, post와 같은 전송 방식, 기본값은 post이다.\nisXHR\tString\tY\txhr 통신 유무 (기본값은 false)\nreturn: \nexception: \nsample: <xmp  class=\'js sample\'>var url = "/download.do"        //파일 다운로드가 구현 되어있는 서버 url. ( 웹스퀘어의 기본 모듈에는 제공되지 않는다)\n$p.download( url );</xmp>\nbuilt since: 5.0_3.3377A.20181128.161740\nbuilt last: 5.0_5.4811B.20230203.095105', metadata={'source': './api_ko.csv', 'row': 11})

### Hard-coded examples

In [10]:
examples = [
]
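The list above is left empty in this notebook. As a sketch of the shape a hard-coded example could take, each entry is a dict with `query` and `answer` keys, matching the generated examples used later. The entry below is written by hand from the `data[10]` document shown earlier. `sample_examples` and `is_valid_example` are hypothetical names, not part of LangChain.

```python
# Hypothetical hand-written example based on the data[10] document above.
sample_examples = [
    {
        "query": "What does the $p.deleteSubmission method do?",
        "answer": "It deletes the submission with the given submissionID.",
    },
]


def is_valid_example(example: dict) -> bool:
    """Check that an example has non-empty 'query' and 'answer' strings."""
    return all(
        isinstance(example.get(key), str) and example[key].strip()
        for key in ("query", "answer")
    )


print(all(is_valid_example(e) for e in sample_examples))
```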

### LLM-generated examples

In [11]:
from langchain.evaluation.qa import QAGenerateChain

In [12]:
# example_gen_chain = QAGenerateChain.from_llm(ChatOpenAI())
example_gen_chain = QAGenerateChain.from_llm(PromptLayerChatOpenAI(pl_tags=["api_qa", "2023-07-08"]))

In [13]:
len(data)

6375

In [14]:
[{"doc": t.page_content} for t in data[:5]]

[{'doc': ': 0\n유형: method\ncomponent: $p\nname: $\ndescription: jQuery selector를 인자로 받아 jQuery 객체를 반환한다. <br />id selector를 인자로 받은 경우 해당 id가 함수를 호출한 페이지에 있는 웹스퀘어 객체인 경우 웹스퀘어 객체의 실제 id로 변환한 다음 함수를 실행한다.\nparameter: \nreturn: Object\tjQuery 객체\nexception: \nsample: $p.$("#group1").wq("invoke", "setDisabled", "true"); // 스크립트가 실행된 페이지의 group1 객체를 찾아 group1.invoke("setDisabled", "true"); 를 실행\nbuilt since: 5.0_3.3377A.20181128.161740\nbuilt last: 5.0_5.4811B.20230203.095105'},
 {'doc': ': 1\n유형: method\ncomponent: $p\nname: URLEncoder\ndescription: 주어진 문자열을 `application/x-www-form-urlencoded` MIME 형식의 문자열로 변환합니다.\nparameter: str\tString\tY\t문자열\nreturn: String\t변환된 application/x-www-form-urlencoded MIME Format문자열을 반환합니다\nexception: \nsample: <xmp  class=\'js sample\'>var encodeStr = $p.URLEncoder( "문자열" );\n//return 예시 ) "%b9%ae%c0%da%bf%ad"</xmp>\nbuilt since: 5.0_3.3377A.20181128.161740\nbuilt last: 5.0_5.4811B.20230203.095105'},
 {'doc': ': 2\n유형: method\ncomponent: $p\nname: ajax\ndesc

In [15]:
new_examples = example_gen_chain.apply_and_parse(
    # Pass only the document text (as previewed above), not the Document object
    [{"doc": t.page_content} for t in data[100:200]]
)



In [16]:
# new_examples = example_gen_chain.apply_and_parse(
#     [{"doc": t.page_content} for t in data[:5]]
# )

In [17]:
new_examples[0]

{'query': 'What is the purpose of the setCookieAsync function in the WebSquare.cookie component?',
 'answer': "The setCookieAsync function is used to asynchronously save a cookie with the given name (sName) and value (sValue) in the browser's cookies. The value stored in the cookie will be deleted when the browser is closed. It is recommended to use the setCookieAsync function when accessing the cookie asynchronously in IE to avoid screen flickering."}

In [18]:
new_examples[1]

{'query': 'What is the purpose of the "dateAdd" method in the WebSquare.date component?',
 'answer': 'The purpose of the "dateAdd" method in the WebSquare.date component is to add the specified offset value to a given date.'}

In [19]:
new_examples[2]

{'query': 'What is the purpose of the "dateDiff" method in the WebSquare.date component?',
 'answer': 'The "dateDiff" method in the WebSquare.date component is used to return the difference between two dates.'}

In [20]:
data[1]

Document(page_content=': 1\n유형: method\ncomponent: $p\nname: URLEncoder\ndescription: 주어진 문자열을 `application/x-www-form-urlencoded` MIME 형식의 문자열로 변환합니다.\nparameter: str\tString\tY\t문자열\nreturn: String\t변환된 application/x-www-form-urlencoded MIME Format문자열을 반환합니다\nexception: \nsample: <xmp  class=\'js sample\'>var encodeStr = $p.URLEncoder( "문자열" );\n//return 예시 ) "%b9%ae%c0%da%bf%ad"</xmp>\nbuilt since: 5.0_3.3377A.20181128.161740\nbuilt last: 5.0_5.4811B.20230203.095105', metadata={'source': './api_ko.csv', 'row': 1})

In [21]:
new_examples

[{'query': 'What is the purpose of the setCookieAsync function in the WebSquare.cookie component?',
  'answer': "The setCookieAsync function is used to asynchronously save a cookie with the given name (sName) and value (sValue) in the browser's cookies. The value stored in the cookie will be deleted when the browser is closed. It is recommended to use the setCookieAsync function when accessing the cookie asynchronously in IE to avoid screen flickering."},
 {'query': 'What is the purpose of the "dateAdd" method in the WebSquare.date component?',
  'answer': 'The purpose of the "dateAdd" method in the WebSquare.date component is to add the specified offset value to a given date.'},
 {'query': 'What is the purpose of the "dateDiff" method in the WebSquare.date component?',
  'answer': 'The "dateDiff" method in the WebSquare.date component is used to return the difference between two dates.'},
 {'query': "What is the purpose of the 'dateTimeAdd' method in the WebSquare.date component?",
  

In [None]:
# new_examples = [t['qa_pairs'] for t in new_examples]
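The commented-out line above hints that, in some LangChain versions, each item returned by `apply_and_parse` is wrapped under a `qa_pairs` key instead of being a flat `query`/`answer` dict. The following is a small sketch that accepts either shape. `normalize_examples` is a hypothetical helper and the sample data is made up for illustration.

```python
from typing import Dict, List


def normalize_examples(raw: List[dict]) -> List[Dict[str, str]]:
    """Return flat {'query', 'answer'} dicts whether or not items are wrapped."""
    normalized = []
    for item in raw:
        # Unwrap {'qa_pairs': {...}} if present; otherwise use the item as-is.
        pair = item.get("qa_pairs", item)
        normalized.append({"query": pair["query"], "answer": pair["answer"]})
    return normalized


# Made-up inputs showing both shapes.
wrapped = [{"qa_pairs": {"query": "Q1", "answer": "A1"}}]
flat = [{"query": "Q2", "answer": "A2"}]
print(normalize_examples(wrapped + flat))
```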

### Combine examples

In [22]:
examples += new_examples

In [23]:
qa.run(examples[3]["query"])



> Entering new RetrievalQA chain...

> Finished chain.


"The purpose of the 'dateTimeAdd' method in the WebSquare.date component is to add a specified number of days or hours to a given date and time. The method takes three parameters: the specified date and time, the offset (number of days or hours to add), and the type of value to be incremented (hour or minute). The method returns the resulting date and time after the addition."

In [24]:
examples[3]["answer"]

"The 'dateTimeAdd' method in the WebSquare.date component is used to add a specified number of days, months, hours, minutes, or time to a given date and time. The type parameter determines the unit of measurement for the offset."

In [25]:
examples[3]["query"]

"What is the purpose of the 'dateTimeAdd' method in the WebSquare.date component?"

## Manual Evaluation
Run each query with `qa.run`, then compare the result with the previously generated answer.
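The comparison can be sketched as a small helper that runs one query and returns the prediction next to the reference answer. `compare_answer` and `fake_run` are hypothetical names; `fake_run` stands in for `qa.run` so the sketch runs without an API key.

```python
from typing import Callable, Dict


def compare_answer(example: Dict[str, str], run: Callable[[str], str]) -> Dict[str, str]:
    """Run one query and return it next to the reference answer."""
    predicted = run(example["query"])
    return {
        "query": example["query"],
        "reference": example["answer"],
        "predicted": predicted,
    }


# Stand-in for qa.run so the sketch runs without an API key.
def fake_run(query: str) -> str:
    return "stub answer for: " + query


row = compare_answer(
    {"query": "What does dateDiff do?", "answer": "Returns the difference between two dates."},
    fake_run,
)
for label, text in row.items():
    print(f"{label}: {text}")
```

In the notebook, `qa.run` would be passed as `run` in place of `fake_run`.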

In [26]:
import langchain
langchain.debug = True

In [27]:
qa.run(examples[0]["query"])

[32;1m[1;3m[chain/start][0m [1m[1:chain:RetrievalQA] Entering Chain run with input:
[0m{
  "query": "What is the purpose of the setCookieAsync function in the WebSquare.cookie component?"
}
[32;1m[1;3m[chain/start][0m [1m[1:chain:RetrievalQA > 3:chain:StuffDocumentsChain] Entering Chain run with input:
[0m[inputs]
[32;1m[1;3m[chain/start][0m [1m[1:chain:RetrievalQA > 3:chain:StuffDocumentsChain > 4:chain:LLMChain] Entering Chain run with input:
[0m{
  "question": "What is the purpose of the setCookieAsync function in the WebSquare.cookie component?",
  "context": ": 101\n유형: method\ncomponent: WebSquare.cookie\nname: setCookieAsync\ndescription: 비동기로 Cookie에서 쿠키명이 sName, 값이 sValue인 쿠키를 저장.\r\n<br />Cookie에 저장한 값은 브라우저가 종료되면 삭제됨. \r\n<br />IE에서 비동기로 Cookie에 접근하면 화면 깜빡임이 발생. 이 경우 setCookieAsync 함수 사용을 권장.\nparameter: sName\tString\tY\tcookie 이름\nsValue\tString\tY\tcookie 값\nSameSite\tString\tN\tSameSite 속성값 (None, Lax, Strict)\nreturn: \nexception: \nsample: <xmp  class='j

'The purpose of the setCookieAsync function in the WebSquare.cookie component is to asynchronously save a cookie with the specified name and value. The value stored in the cookie will be deleted when the browser is closed. It is recommended to use the setCookieAsync function when accessing cookies asynchronously in Internet Explorer to avoid screen flickering.'

In [28]:
qa.run(examples[2]["query"])

[32;1m[1;3m[chain/start][0m [1m[1:chain:RetrievalQA] Entering Chain run with input:
[0m{
  "query": "What is the purpose of the \"dateDiff\" method in the WebSquare.date component?"
}
[32;1m[1;3m[chain/start][0m [1m[1:chain:RetrievalQA > 3:chain:StuffDocumentsChain] Entering Chain run with input:
[0m[inputs]
[32;1m[1;3m[chain/start][0m [1m[1:chain:RetrievalQA > 3:chain:StuffDocumentsChain > 4:chain:LLMChain] Entering Chain run with input:
[0m{
  "question": "What is the purpose of the \"dateDiff\" method in the WebSquare.date component?",
  "context": ": 103\n유형: method\ncomponent: WebSquare.date\nname: dateDiff\ndescription: 날짜 사이의 차이를 반환합니다. (시작일 : from, 종료일 : to)\nparameter: day1\tString\tY\t시작 날짜\nday2\tString\tY\t끝 날짜\nreturn: String\t두 날짜의 차이\nexception: \nsample: <xmp  class='js sample'>var diff = WebSquare.date.dateDiff( \"20120120\", \"20120210\" );\n//diff : 21</xmp>\nbuilt since: 2.0_1.1984A.20120424.105444\nbuilt last: 5.0_5.4811B.20230203.095105<<<<>>>>>: 9\

'The "dateDiff" method in the WebSquare.date component is used to calculate the difference between two dates and return the result. It takes two parameters, the start date and the end date, and returns the difference between the two dates.'

In [29]:
examples[2]

{'query': 'What is the purpose of the "dateDiff" method in the WebSquare.date component?',
 'answer': 'The "dateDiff" method in the WebSquare.date component is used to return the difference between two dates.'}

In [30]:
# Turn off the debug mode
langchain.debug = False

## LLM-assisted evaluation

In [31]:
predictions = qa.apply(examples)



> Entering new RetrievalQA chain...

> Finished chain.

Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 1.0 seconds as it raised Timeout: Request timed out: HTTPSConnectionPool(host='api.openai.com', port=443): Read timed out. (read timeout=600).

In [32]:
from langchain.evaluation.qa import QAEvalChain

In [33]:
# llm = ChatOpenAI(temperature=0)
llm = PromptLayerChatOpenAI(pl_tags=["api_qa", "2023-07-08"], temperature=0.0)
eval_chain = QAEvalChain.from_llm(llm)

In [None]:
graded_outputs = eval_chain.evaluate(examples, predictions)

In [None]:
examples

In [None]:
predictions

In [None]:
graded_outputs

In [None]:
for i, eg in enumerate(examples):
    print(f"Example {i}:")
    print("Question: " + predictions[i]['query'])
    print("Real Answer: " + predictions[i]['answer'])
    print("Predicted Answer: " + predictions[i]['result'])
    print("Predicted Grade: " + graded_outputs[i]['text'])
    print()
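Beyond printing each example, the grades can be summarized into counts. The sketch below assumes the grader's verdict lives under the `text` key, as in the loop above, and that it contains the word CORRECT or INCORRECT. `tally_grades` is a hypothetical helper and the sample grades are made up.

```python
from collections import Counter
from typing import Dict, List


def tally_grades(graded: List[Dict[str, str]], key: str = "text") -> Counter:
    """Count CORRECT / INCORRECT verdicts; anything else is counted as OTHER."""
    counts = Counter()
    for item in graded:
        verdict = item.get(key, "").strip().upper()
        # Check INCORRECT first because it contains the substring CORRECT.
        if "INCORRECT" in verdict:
            counts["INCORRECT"] += 1
        elif "CORRECT" in verdict:
            counts["CORRECT"] += 1
        else:
            counts["OTHER"] += 1
    return counts


# Made-up grader outputs for illustration only.
sample_graded = [{"text": "CORRECT"}, {"text": "GRADE: INCORRECT"}, {"text": "CORRECT"}]
counts = tally_grades(sample_graded)
print(counts["CORRECT"] / len(sample_graded))
```

With real results, `graded_outputs` would be passed in place of `sample_graded`; a nonzero `OTHER` count suggests the grader returned something outside the expected format.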

In [None]:
import os

from langchain.chat_models import ChatOpenAI
from langchain.client import run_on_dataset

# llm = ChatOpenAI(temperature=0)
llm = PromptLayerChatOpenAI(pl_tags=["api_qa", "2023-07-08"], temperature=0.0)

chain_results = run_on_dataset(
  dataset_name="ds-granular-windscreen-29",
  llm_or_chain_factory=llm,
  project_name="pt-insecure-hierarchy-31",
)