<a href="https://colab.research.google.com/github/hr1588/NLP/blob/main/nlp_ch1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!git clone https://github.com/rickiepark/nlp-with-transformers.git
%cd nlp-with-transformers
from install import *
install_requirements(chapter=1)

Cloning into 'nlp-with-transformers'...
remote: Enumerating objects: 538, done.
remote: Counting objects: 100% (272/272), done.
remote: Compressing objects: 100% (165/165), done.
remote: Total 538 (delta 165), reused 184 (delta 107), pack-reused 266
Receiving objects: 100% (538/538), 46.22 MiB | 5.32 MiB/s, done.
Resolving deltas: 100% (264/264), done.
/content/nlp-with-transformers
⏳ Installing base requirements ...
✅ Base requirements installed!
Using transformers v4.25.1
Using datasets v2.8.0
Using accelerate v0.15.0
Using sentencepiece v0.1.97
Using sacremoses v0.0.41


# Text classification

In [None]:
text = """Dear Amazon, last week I ordered an Optimus Prime action figure \
from your online store in Germany. Unfortunately, when I opened the package, \
I discovered to my horror that I had been sent an action figure of Megatron \
instead! As a lifelong enemy of the Decepticons, I hope you can understand my \
dilemma. To resolve the issue, I demand an exchange of Megatron for the \
Optimus Prime figure I ordered. Enclosed are copies of my records concerning \
this purchase. I expect to hear from you soon. Sincerely, Bumblebee."""

In [None]:
from transformers import pipeline
# pipeline abstracts away every step needed to turn raw text into predictions from a fine-tuned model

In [None]:
classifier = pipeline('text-classification')

import pandas as pd
outputs = classifier(text)
pd.DataFrame(outputs)

| | label | score |
|---|---|---|
| 0 | NEGATIVE | 0.901546 |


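- By default, the text-classification pipeline uses a model designed for sentiment analysis

- It also supports multiclass and multilabel classification

- Predictions are returned as a list of dicts, so they can be displayed as a pandas DataFrame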
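
The difference between multiclass and multilabel classification can be sketched in plain Python: multiclass scores are normalized with a softmax so they sum to 1, while multilabel scores pass through an independent sigmoid per label. This is a general illustration under assumed values, not the pipeline's internal code; the `logits` list is hypothetical.

```python
import math

def softmax(logits):
    # Subtract the maximum for numerical stability before exponentiating.
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(logits):
    # Each label is scored independently of the others.
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]

logits = [2.0, 0.5, -1.0]  # hypothetical raw model scores for three labels

multiclass_probs = softmax(logits)   # exactly one label is correct: scores sum to 1
multilabel_probs = sigmoid(logits)   # several labels may apply: scores need not sum to 1

print([round(p, 3) for p in multiclass_probs])
print([round(p, 3) for p in multilabel_probs])
```

In the multilabel case, a threshold per label (often 0.5) decides which labels are kept.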

# Named entity recognition

In [None]:
ner = pipeline('ner', aggregation_strategy = 'simple')
# aggregation_strategy: groups word pieces into entities according to the model's predictions
output = ner(text)
pd.DataFrame(output)

| | entity_group | score | word | start | end |
|---|---|---|---|---|---|
| 0 | ORG | 0.87901 | Amazon | 5 | 11 |
| 1 | MISC | 0.990859 | Optimus Prime | 36 | 49 |
| 2 | LOC | 0.999755 | Germany | 90 | 97 |
| 3 | MISC | 0.556571 | Mega | 208 | 212 |
| 4 | PER | 0.590256 | ##tron | 212 | 216 |
| 5 | ORG | 0.669692 | Decept | 253 | 259 |
| 6 | MISC | 0.498349 | ##icons | 259 | 264 |
| 7 | MISC | 0.775363 | Megatron | 350 | 358 |
| 8 | MISC | 0.987854 | Optimus Prime | 367 | 380 |
| 9 | PER | 0.812096 | Bumblebee | 502 | 511 |


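- This pipeline detects every named entity and assigns it to a category such as ORG (organization), LOC (location), or PER (person)

- MISC (miscellaneous) covers entities that fit none of the other categories

- The score expresses how confident the model is about each entity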
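
In the output above, "Megatron" and "Decepticons" come back split into word pieces ("Mega" + "##tron", "Decept" + "##icons") because the two halves received different labels. The sketch below stitches such pieces back together after the fact. `merge_word_pieces` is a hypothetical helper, and keeping the first piece's label with the lower of the two scores is an assumption made for illustration, not the library's behavior.

```python
# Rows copied from the ner output above (only the ones needed for the demo).
entities = [
    {"entity_group": "MISC", "score": 0.556571, "word": "Mega", "start": 208, "end": 212},
    {"entity_group": "PER", "score": 0.590256, "word": "##tron", "start": 212, "end": 216},
    {"entity_group": "ORG", "score": 0.669692, "word": "Decept", "start": 253, "end": 259},
    {"entity_group": "MISC", "score": 0.498349, "word": "##icons", "start": 259, "end": 264},
    {"entity_group": "PER", "score": 0.812096, "word": "Bumblebee", "start": 502, "end": 511},
]

def merge_word_pieces(entities):
    """Hypothetical helper: join '##' continuation pieces onto the preceding entity."""
    merged = []
    for ent in entities:
        is_continuation = bool(
            merged
            and ent["word"].startswith("##")
            and ent["start"] == merged[-1]["end"]  # pieces must be adjacent in the text
        )
        if is_continuation:
            prev = merged[-1]
            prev["word"] += ent["word"][2:]                    # drop the '##' marker
            prev["end"] = ent["end"]
            prev["score"] = min(prev["score"], ent["score"])   # assumption: be conservative
        else:
            merged.append(dict(ent))  # copy so the input list is not modified
    return merged

merged_entities = merge_word_pieces(entities)
for ent in merged_entities:
    print(ent["entity_group"], ent["word"], ent["start"], ent["end"], round(ent["score"], 3))
```

The pipeline also accepts other `aggregation_strategy` values such as `"first"`, `"average"`, and `"max"`, which resolve label disagreements inside a word at prediction time and may make this kind of post-processing unnecessary.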

# Question answering

In [None]:
reader = pipeline("question-answering")

que = "What does the customer want?"
output = reader(question = que, context = text)
pd.DataFrame(output, index = [0])

| | score | start | end | answer |
|---|---|---|---|---|
| 0 | 0.631292 | 335 | 358 | an exchange of Megatron |


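- Along with the answer, the pipeline returns integer start and end values: the character indices where the answer sits in the context, as in the ner output

- Extractive question answering: the answer is extracted directly from the text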
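
Because the answer is a span of the context, slicing the context with `start` and `end` recovers it exactly. The sketch below uses the text and the output values from this notebook; `show_in_context` is a hypothetical helper added only to display the span with some surrounding characters.

```python
# The customer complaint from the notebook, joined into a single string.
text = (
    "Dear Amazon, last week I ordered an Optimus Prime action figure "
    "from your online store in Germany. Unfortunately, when I opened the package, "
    "I discovered to my horror that I had been sent an action figure of Megatron "
    "instead! As a lifelong enemy of the Decepticons, I hope you can understand my "
    "dilemma. To resolve the issue, I demand an exchange of Megatron for the "
    "Optimus Prime figure I ordered. Enclosed are copies of my records concerning "
    "this purchase. I expect to hear from you soon. Sincerely, Bumblebee."
)

# Values copied from the question-answering output in this notebook.
output = {"score": 0.631292, "start": 335, "end": 358, "answer": "an exchange of Megatron"}

# start is inclusive and end is exclusive, so a plain slice returns the answer.
span = text[output["start"]:output["end"]]
print(span)

def show_in_context(context, start, end, window=25):
    """Hypothetical helper: bracket the answer and show a few characters around it."""
    left = context[max(0, start - window):start]
    right = context[end:end + window]
    return f"...{left}[{context[start:end]}]{right}..."

highlighted = show_in_context(text, output["start"], output["end"])
print(highlighted)
```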

# Summarization

In [None]:
summarizer = pipeline("summarization")
outputs = summarizer(text, max_length = 60, clean_up_tokenization_spaces = True)
outputs[0]['summary_text']

' Bumblebee ordered an Optimus Prime action figure from your online store in Germany. Unfortunately, when I opened the package, I discovered to my horror that I had been sent an action figure of Megatron instead. As a lifelong enemy of the Decepticons, I hope you can understand'

- The max_length and clean_up_tokenization_spaces parameters adjust the output at run time

- clean_up_tokenization_spaces: whether to clean up the extra spaces introduced by tokenization
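
To give a feel for what cleaning up tokenization spaces means, here is a simplified stand-in written from scratch: decoded tokens often leave a space before punctuation or contractions, and the cleanup removes it. `clean_up_spaces` and its replacement list are assumptions for illustration, not the library's actual implementation.

```python
def clean_up_spaces(text):
    """Simplified stand-in: remove spaces that decoding leaves before punctuation."""
    replacements = [
        (" .", "."),
        (" ,", ","),
        (" !", "!"),
        (" ?", "?"),
        (" n't", "n't"),
        (" 's", "'s"),
    ]
    for old, new in replacements:
        text = text.replace(old, new)
    return text

# Hypothetical decoder output with tokenization artifacts.
raw = "I do n't know , but Megatron 's box arrived !"
cleaned = clean_up_spaces(raw)
print(cleaned)
```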

# Translation

In [None]:
translator = pipeline("translation_en_to_de",
                      model = "Helsinki-NLP/opus-mt-en-de")

outputs = translator(text, clean_up_tokenization_spaces = True, min_length = 100)
outputs[0]['translation_text']

'Sehr geehrter Amazon, letzte Woche habe ich eine Optimus Prime Action Figur aus Ihrem Online-Shop in Deutschland bestellt. Leider, als ich das Paket öffnete, entdeckte ich zu meinem Entsetzen, dass ich stattdessen eine Action Figur von Megatron geschickt worden war! Als lebenslanger Feind der Decepticons, Ich hoffe, Sie können mein Dilemma verstehen. Um das Problem zu lösen, Ich fordere einen Austausch von Megatron für die Optimus Prime Figur habe ich bestellt. Anbei sind Kopien meiner Aufzeichnungen über diesen Kauf. Ich erwarte, bald von Ihnen zu hören. Aufrichtig, Bumblebee.'

- Translates English into German

- Override the pipeline's default model to choose the one that best fits the application
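
The task string and checkpoint name used above both follow a language-pair pattern, so swapping the pair can be sketched with a small helper. `opus_mt_pipeline_args` is hypothetical, and it assumes a Helsinki-NLP checkpoint has been published for the requested pair, which is not true for every combination.

```python
def opus_mt_pipeline_args(src, tgt):
    """Hypothetical helper: build the task and model names for a language pair."""
    return {
        "task": f"translation_{src}_to_{tgt}",
        "model": f"Helsinki-NLP/opus-mt-{src}-{tgt}",  # assumes this checkpoint exists
    }

args = opus_mt_pipeline_args("en", "de")
print(args)

# Building the pipeline downloads the checkpoint, so it is left commented out here:
# from transformers import pipeline
# translator = pipeline(args["task"], model=args["model"])
```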

# Text generation

In [None]:
from transformers import set_seed
set_seed(42)

generator = pipeline('text-generation')
response = 'Dear Bumblebee, I am sorry to hear that your order was mixed up.'
prompt = text + "\n\nCustomer service response:\n" + response
outputs = generator(prompt, max_length = 200)
print(outputs[0]['generated_text'])

Dear Amazon, last week I ordered an Optimus Prime action figure from your online
store in Germany. Unfortunately, when I opened the package, I discovered to my
horror that I had been sent an action figure of Megatron instead! As a lifelong
enemy of the Decepticons, I hope you can understand my dilemma. To resolve the
issue, I demand an exchange of Megatron for the Optimus Prime figure I ordered.
Enclosed are copies of my records concerning this purchase. I expect to hear
from you soon. Sincerely, Bumblebee.

Customer service response:
Dear Bumblebee, I am sorry to hear that your order was mixed up. The order was
completely mislabeled, which is very common in our online store, but I can
appreciate it because it was my understanding from this site and our customer
service of the previous day that your order was not made correct in our mind and
that we are in a process of resolving this matter. We can assure you that your
order


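- Without print, the line breaks are not rendered: the notebook echoes the string with escaped \n characters
- Auto-completion like this makes it possible to respond quickly to customer feedback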
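
The point about print can be checked without a model. When a cell ends with a bare string, the notebook shows its `repr`, where newlines appear as the two characters `\n`; `print` writes the string itself, so the line breaks take effect. The shortened `prompt_tail` below is an assumption standing in for the full prompt.

```python
response = "Dear Bumblebee, I am sorry to hear that your order was mixed up."
# Shortened stand-in for the full prompt built in the cell above.
prompt_tail = "Sincerely, Bumblebee." + "\n\nCustomer service response:\n" + response

echoed = repr(prompt_tail)  # what a notebook shows for a bare expression
print(echoed)               # one long line containing literal \n sequences
print(prompt_tail)          # four lines, including one blank line
```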