<a href="https://colab.research.google.com/github/jinseriouspark/commerce-llm/blob/main/evaluation_with_LLM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
criteria_1_representativeness = """
Human: You will be given all reviews and the keywords of reviews.
Your task is to rate the keywords on criteria.
Please make sure you read and understand criteria and steps carefully. Please keep this document open while reviewing, and refer to it as needed.

<criteria>
Representativeness (1-5) - the extent to which the keywords captures key words across the entirety of the reviews, demonstrating a comprehensive understanding.
</criteria>

<steps>
Step 1. Carefully read the reviews and keywords provided.
Step 2. Evaluate how representative the keywords is across the entirety of the reviews, considering coherence.
Step 3. Rate the representativeness on a scale of 1 to 5:
    1 - Insignificantly Representative: The keywords fails to capture key words from the reviews, lacking a comprehensive understanding of the content.
    2 - Barely Representative: The keywords includes only a few relevant keywords, with significant gaps in understanding the reviews.
    3 - Moderately Representative: The keywords captures some key words, but misses others, resulting in a partial understanding of the reviews.
    4 - Largely Representative: The keywords effectively includes relevant keywords, demonstrating a good understanding of the reviews, although some areas may be lacking.
    5 - Fully Representative: The keywords comprehensively captures key words from the reviews, exhibiting a thorough understanding of the content.
Step 4. Provide the criteria rating from 1 to 5 based solely on the provided criteria.
Step 5. 4. Provide specific feedback on the issues you identified and suggestions for improvement. Be constructive and professional.
Step 6. Submit your criteria score and feedback comments to appropriate team member for consideration.
</steps>

<output_format>
The output is provided in json format and has `thoughts`, `criteria_rating`, and `feedback` as keys.
For detailed output format description, please check the provided json format.

{json_schema}
{'thoughs': 'your_thought_',
'criteria_rating' : 1,
'feedback' : 'your_feedback'
}
"""


In [None]:
llm = TextGeneration( prompt = criteria_1_representativeness)
kw_model = KeyLLM(llm)

# KEYWORD EXTRACTION BY keybert
- keybert 란?
  - CountVectorizer 를 사용하여 문장 속 keyword, keyphrase  후보를 만든다
  - Language model을 사용해서 문장과 키워드를 각각 임베딩 한다 (sentence-transformers 등을 사용)
  - 인풋 문장과 후보 키워들 간의 코사인ㅇ 유사도를 계산하고, 가장 큰 유사도부터 '키워드' 라고 지정한다

- Mistral 7B?
  - [주소](https://https://mistral.ai/news/announcing-mistral-7b/)
  - 더 빠른 추론을 위해 GQA grouped-query attention 을 사용함
  - SWA Sliding window attention을 사용하여 더 적은 비용으로 더 긴 시퀀스를 처리
    -
  - (자체 벤치마크 기준) Llama 2 13B 보다 우수하고 Llama34B와 유사한 수준


- SentenceTransformer 란?

- original model quantization 이란 무엇이고, 언제 사용하는가 ?
  -
  - 메모리 문제 발생을 해결하기 위해 사용

In [None]:
# KeyLLM with KeyBERT + Mistral 7b

In [1]:
%%capture
!pip install --upgrade git+https://github.com/UKPLab/sentence-transformers
!pip install keybert ctransformers[cuda]
!pip install --upgrade git+https://github.com/huggingface/transformers

In [2]:
from ctransformers import AutoModelForCausalLM

# GPU가속이 가능한 시스템에서 특정 레이어를 GPU로 오프로드 하고자 한다면 gpu_layers 파라미터로 조절 가능
# GPU 가속이 가능한 경우 GPU에 레이어를 할당해 계산속도 향상
# GPU 레이어를 사용할 수 없는 경우 0으로 설정

model = AutoModelForCausalLM.from_pretrained(
    'TheBloke/Mistral-7B-Instruct-v0.1-GGUF',
    model_file='mistral-7b-instruct-v0.1.Q4_K_M.gguf',
    model_type = 'mistral',
    gpu_layers = 50,
    hf = True
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Fetching 1 files:   0%|          | 0/1 [00:00<?, ?it/s]

config.json:   0%|          | 0.00/31.0 [00:00<?, ?B/s]

Fetching 1 files:   0%|          | 0/1 [00:00<?, ?it/s]

mistral-7b-instruct-v0.1.Q4_K_M.gguf:   0%|          | 0.00/4.37G [00:00<?, ?B/s]

The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.


0it [00:00, ?it/s]

In [6]:
# 트랜스포머 파이프라인을 생성하고자 함
# 이 파이프라인의 장점은 많은 곳에서 사용할 뿐 아니라 백엔드에서처럼 사용됨
# 토크나이저는 허깅페이스에있는 것으로 사용하고자 함

from transformers import AutoTokenizer, pipeline

# tokenier
tokenizer = AutoTokenizer.from_pretrained('mistralai/Mistral-7B-Instruct-v0.1')

generator = pipeline(
    model = model, tokenizer = tokenizer,
    task='text-generation',
    max_new_tokens=50,
    repetition_penalty=1.1
)

In [9]:
# test
response = generator("""
I have the following sentence :

* I don\'t buy these shoese again, the size that I guess is so big and there are no information about that, I should get a refund
Extract 5 keywords about size from the sentence
""")


print(response[0]['generated_text'])

# 'size' 와 관련된 키워드 추출을 해주었음


I have the following sentence :

* I don't buy these shoese again, the size that I guess is so big and there are no information about that, I should get a refund
Extract 5 keywords about size from the sentence

**Answer:**
1. Size
2. Guess
3. Big
4. Refund
5. Information


Mistral 7B Template 의 경우 아래와 같이 구성하는 것이 좋다

```
<s>[INST]
{{ User Prompt1}}
[/INST]
{{Model Answer1}}
</s>
[INST]
{{User Prompt2}}
[/INST]
{{model Answer2}}
```

In [25]:
import pandas as pd
review_keywords = pd.read_csv('/content/drive/MyDrive/commerce-llm/results/review_all_keybert.csv', index_col = 'Unnamed: 0')

In [26]:
review_keywords.head()

Unnamed: 0,origin,keywords
0,별루에여 지직 거려여ㅠ,"[('지직', 0.7013), ('별루에여', 0.6623), ('거려여ㅠ', 0...."
1,진짜 별로에요 머리털다 뽑힐거같아요,"[('머리털다', 0.775), ('별로에요', 0.7453), ('진짜', 0.3..."
2,사이즈가 너무 작네요. 알감자 수준 그래도 싱싱하고 맛은 있어요. 첫날 받아서 감자...,"[('손질하기', 0.7494), ('사이즈가', 0.7031), ('싱싱하고', ..."
3,색상도화사하고 좋아요ㆍ,"[('색상도화사하고', 0.9869), ('좋아요ㆍ', 0.2959)]"
4,얌얌 손님 온다해서 주문한 양고기! 손님들은 처음 맛 본 양고기! 매우 만족하시면서...,"[('손님들은', 0.7852), ('만족하시면서', 0.7063), ('온다해서'..."


In [27]:
import ast
review_keywords['keywords'] = review_keywords['keywords'].apply(lambda x : ast.literal_eval(x))

In [28]:
rk_list= []
for i in review_keywords.values:
  origin_review = i[0]
  keywords = [kw[0] for kw in i[1] if kw[1]>0.5]
  keywords_str = '"'+'","'.join(keywords)+'"'
  rk_list.append((origin_review, keywords_str))

In [44]:
for row in rk_list:
  reviews= row[0]
  keywords_of_reviews = row[1]
  message = f"""
  You will be given all reviews and the keywords of reviews.
  Your task is to rate the keywords on criteria.
  Please make sure you read and understand criteria and steps carefully. Please keep this document open while reviewing, and refer to it as needed.

  #reviews
  {reviews}

  #keywords of reviews
  {keywords_of_reviews}

  #criteria
  Representativeness (1-5) - the extent to which the keywords captures key words across the entirety of the reviews, demonstrating a comprehensive understanding.

  #steps
  Step 1. Carefully read the reviews and keywords provided.
  Step 2. Evaluate how representative the keywords is across the entirety of the reviews, considering coherence.
  Step 3. Rate the representativeness on a scale of 1 to 5:
      1 - Insignificantly Representative: The keywords fails to capture key words from the reviews, lacking a comprehensive understanding of the content.
      2 - Barely Representative: The keywords includes only a few relevant keywords, with significant gaps in understanding the reviews.
      3 - Moderately Representative: The keywords captures some key words, but misses others, resulting in a partial understanding of the reviews.
      4 - Largely Representative: The keywords effectively includes relevant keywords, demonstrating a good understanding of the reviews, although some areas may be lacking.
      5 - Fully Representative: The keywords comprehensively captures key words from the reviews, exhibiting a thorough understanding of the content.
  Step 4. Provide the criteria rating from 1 to 5 based solely on the provided criteria.
  Step 5. 4. Provide specific feedback on the issues you identified and suggestions for improvement. Be constructive and professional.
  Step 6. Submit your criteria score and feedback comments to appropriate team member for consideration.


  #output_format
  The output is provided in json format and has `thoughts`, `criteria_rating`, and `feedback` as keys.
  For detailed output format description, please check the provided json format.

  <json_schema>
  {{'thoughs': 'your_thought_',
  'criteria_rating' : 1,
  'feedback' : 'your_feedback'
  }}

  """
  response = generator(message)
  break




In [46]:
response[0]['generated_text']

'\n  You will be given all reviews and the keywords of reviews.\n  Your task is to rate the keywords on criteria.\n  Please make sure you read and understand criteria and steps carefully. Please keep this document open while reviewing, and refer to it as needed.\n\n  #reviews\n  별루에여 지직 거려여ㅠ\n\n  #keywords of reviews\n  "지직","별루에여"\n\n  #criteria\n  Representativeness (1-5) - the extent to which the keywords captures key words across the entirety of the reviews, demonstrating a comprehensive understanding.\n  \n  #steps\n  Step 1. Carefully read the reviews and keywords provided.\n  Step 2. Evaluate how representative the keywords is across the entirety of the reviews, considering coherence.\n  Step 3. Rate the representativeness on a scale of 1 to 5:\n      1 - Insignificantly Representative: The keywords fails to capture key words from the reviews, lacking a comprehensive understanding of the content.\n      2 - Barely Representative: The keywords includes only a few relevant keyword

In [47]:
essay="""
24년 신년계획을 세운다

세워도 이루지 못할 것들을 계속 만들어왔다. 돈을 벌던 순간부터 조금씩, 매년 1월 1일이 되면 부자되고 싶다는 이야기를 써대고 살을 빼겠다는 결심을 다졌다. 책 몇 권을 읽겠다는 다짐은 너무나 익숙하다. 그렇게 써댄 몇년간의 동일한 목표를 다시 쳐다본다. 세워도 이루지 못할 것들, 이걸 내가 종이에 다시 쓰는게 의미가 있을까. 12월 31일을 보다 잘 지냈다는 느낌을 주고 싶어서, 1월 1일의 불안감을 잠재우기 위한 나만의 짧고 굵은 의식은 아니었을까. 그 이후에 되돌아 본 기억이 없기 때문이다. 만약 그게 정말 내게 소중해서, 며칠을 두고 봐도 또 보고 싶은 것이었다면 아마 이렇게 ‘방치’ 해두지는 않았을 것이다.

신년계획은 [방치] 되었다.

적어도 나의 것은 그러했다. 내버려 둔 것이다. 그게 그냥 거기에 있어도 있는 듯, 없는 듯, 관리하지 않았다. 그게 내게 이로움을 주지도 해악을 끼치지도 않았기 떄문이다. 그게 문제였다. 아무런 의미가 없는 것을 반복적으로 써댔다는게 진짜 큰 문제였다. 방치된 채로 주인을 만나지 못했던 신년계획은 그나마, 23년을 되돌아보는 주인을 한 번 만날 뿐이다. 그 전까지 그 ‘문장’ 은 어떤 의미를 가졌든 아무런 힘을 갖지 못한다. 그게 설령 1조 부자가 되겠다는 다짐이었든, 인스타그램 100만 팔로우를 얻고 싶다는 등의 문장이었어도 말이다. 방치된 그 글귀는 종이 위 검정 액체로 어떤 형태를 갖고 있는 것 외에 아무런 의미가 없다. 심지어 그것은 그림보다도 더 의미가 없다. 주인의 안일함을 쓸데없이 자극하기만 하기 때문이다. 편안한 기분을 느끼게 할 수 있는 휴양지 사진이거나, 귀여운 얼굴을 가진 강아지 사진은, 적어도 그것 보단 조금 더 자주 주인을 마주쳤을 것이다. 불안감을 자극하고 모자람을 드러내게 해주는 그 글자들은 주인에 의해 의도적으로 태어났고 방치되었다.

방치된 것들을 다시 꺼내 [부활]시킬 준비를 한다.

몇 십년간 방치된 그 계획을 그대로 다시 꺼낸다. 가만히 두기만 해도 죄책감을 유발하던, 다 죽어가던 식물의 흙을 갈아주기로 결심한 것 처럼 말이다. 시들어서 죽은 것 같지만, 뿌리는 계속 흙 속에 두고 겨우겨우 살아가고 있는 생명체인 ‘신년 계획’ 의 흙을 갈아주기로 결심했다. 그 계획이 생생한 이파리를 가지기 까지는 너무나 많은 노력과 시간이 필요하다. 그 전에 먼저 해야 할 것을 수행한다. 영양가 없는 흙 속에서 연명하던 그의 뿌리를 꺼내, 영양가가 높은 흙으로 옮겨주는 것이다. 그 흙에 가만히 있기만 해도 삶을 살아갈 수 있을 정도의 에너지를 받을 수 있다. 그 다음 햇빛과 물을 차례로 매일같이 쐬어주는 것이다. 그 ‘신년계획’ 녀석이 잘 성장할 수 있는 것들에 끊임없이 노출시키는 것이다. 그 다음으로, 자라나는 이파리를 보면서 식물용 영양제도 좀 꽂아주려고 한다.

<어른이 되겠다>는 시든 꿈은 그 때 뿌리내린 ‘생각’ 속에 박혀 죽지도 살지도 못한 채 그 크기만 유지해왔다. 그 꿈을 꺼내 새로운 ‘생각’ 에 넣어주려고 한다. 가만히 그 공간에서 살아나기만 해도 도움을 받을 수 있는 공간 말이다. 더 많은 <어른>이 되고 싶다는 사람들 속에 나를 던져넣도록 부단히 노력을 해야한다. 그리고 그 꿈이 더욱 성장할 수 있도록 많은 요소들을 찾아 집어넣어주어야 할 것이다. 이미 어른이 된 존재들 옆에 있어보려고 하고, 더 어른이 되기 위한 성장 수단들을 찾을 것이다.

그렇게 해서 나의 꿈이었던 ‘어른’ 이파리가 하나 씩 생기를 찾길 바란다. 부활을 할 수 있기를 바란다. 어차피 이렇게 된 거, 책임지고 끝까지 가본다. 이제는 내가 선택한 것이다. 내가 ‘살리기로’ 선택한 것이다. 얼마나 더 클 수 있을까? 너무나 기대된다. 나의 선택이 만드는 그 끝은 어떠할까.

기대되는 24년의 1일차를 지나고 있다.

"""
message = f"""You are a commentator. Your task is to write a report on an essay.
When presented with the essay, come up with interesting questions to ask, and answer each question.
Afterward, combine all the information and write a report in the markdown format.

# Essay:
{essay}

# Instructions:
## Summarize:
In clear and concise language, summarize the key points and themes presented in the essay.

## Interesting Questions:
Generate three distinct and thought-provoking questions that can be asked about the content of the essay. For each question:
- After "Q: ", describe the problem
- After "A: ", provide a detailed explanation of the problem addressed in the question.
- Enclose the ultimate answer in <>.

## Write a report
Using the essay summary and the answers to the interesting questions, create a comprehensive report in Markdown format.

"""
response = generator(message)




In [49]:
response[0]['generated_text']

'You are a commentator. Your task is to write a report on an essay.\nWhen presented with the essay, come up with interesting questions to ask, and answer each question.\nAfterward, combine all the information and write a report in the markdown format.\n\n# Essay:\n\n24년 신년계획을 세운다\n\n세워도 이루지 못할 것들을 계속 만들어왔다. 돈을 벌던 순간부터 조금씩, 매년 1월 1일이 되면 부자되고 싶다는 이야기를 써대고 살을 빼겠다는 결심을 다졌다. 책 몇 권을 읽겠다는 다짐은 너무나 익숙하다. 그렇게 써댄 몇년간의 동일한 목표를 다시 쳐다본다. 세워도 이루지 못할 것들, 이걸 내가 종이에 다시 쓰는게 의미가 있을까. 12월 31일을 보다 잘 지냈다는 느낌을 주고 싶어서, 1월 1일의 불안감을 잠재우기 위한 나만의 짧고 굵은 의식은 아니었을까. 그 이후에 되돌아 본 기억이 없기 때문이다. 만약 그게 정말 내게 소중해서, 며칠을 두고 봐도 또 보고 싶은 것이었다면 아마 이렇게 ‘방치’ 해두지는 않았을 것이다.\n\n신년계획은 [방치] 되었다. \n\n적어도 나의 것은 그러했다. 내버려 둔 것이다. 그게 그냥 거기에 있어도 있는 듯, 없는 듯, 관리하지 않았다. 그게 내게 이로움을 주지도 해악을 끼치지도 않았기 떄문이다. 그게 문제였다. 아무런 의미가 없는 것을 반복적으로 써댔다는게 진짜 큰 문제였다. 방치된 채로 주인을 만나지 못했던 신년계획은 그나마, 23년을 되돌아보는 주인을 한 번 만날 뿐이다. 그 전까지 그 ‘문장’ 은 어떤 의미를 가졌든 아무런 힘을 갖지 못한다. 그게 설령 1조 부자가 되겠다는 다짐이었든, 인스타그램 100만 팔로우를 얻고 싶다는 등의 문장이었어도 말이다. 방치된 그 글귀는 종이 위 검정 액체로 어떤 형태를 갖고 있는 것 외에

In [None]:
# 대답을 하지 못하는 상태. 대답을 하게 하려면 어떤 부분을 알아야 할까?