In [1]:
text = '''Dear Amazon, last week I ordered an Optimus Prime action figure from your online store in Germany.
unFortunately, when I opened the package, I discovered to my horror hat I had been sent an action figure of Megatron instedad!
As a lifelong enemy of the Deceptions, I hope you can understand my dilemma. To resolve the issue, 
I demand an exchange of Megatron ofr the Optimus Prime figure I ordered.
Enclosed are copies of my records concerning this purchase. I expect to hear from you soon.
Sincerly, Bumblebee.'''

## 1. Sentiment analysis (text classification)

    - To find out whether customer feedback like the text above is positive or negative,
    we run sentiment analysis, which is one kind of text classification

In [6]:
# Classify the sentiment of the example text with a transformer model
# Sentiment analysis is one kind of text classification; the task also covers
# multiclass and multilabel classification
from transformers import pipeline

classifier = pipeline('text-classification')


No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)



In [7]:
import pandas as pd

outputs = classifier(text)
pd.DataFrame(outputs)

|   | label | score |
|---|---|---|
| 0 | NEGATIVE | 0.968888 |


    -> The model predicts that the text is negative
    - For sentiment analysis, the pipeline returns one of two labels, `POSITIVE` or `NEGATIVE`
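The label and score pair can be read as the most probable class after a softmax over the model's raw outputs. The sketch below illustrates that step only; the logits and the `id2label` mapping are hypothetical values chosen for the example, not numbers taken from the model above.

```python
import math

def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical values for illustration; real logits come from the model.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
logits = [2.1, -1.3]

probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)  # index of the largest probability
prediction = {"label": id2label[best], "score": probs[best]}
print(prediction)
```

A list of such dictionaries is what `pd.DataFrame(outputs)` turns into one row per input text.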

## 2. Named entity recognition (NER)

    - The task of extracting named entities, that is, real-world objects such as products, places, and people

In [8]:
ner_tagger = pipeline('ner', aggregation_strategy='simple')
outputs = ner_tagger(text)
pd.DataFrame(outputs)

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english)



|   | entity_group | score | word | start | end |
|---|---|---|---|---|---|
| 0 | ORG | 0.942881 | Amazon | 5 | 11 |
| 1 | MISC | 0.994278 | Optimus Prime | 36 | 49 |
| 2 | LOC | 0.999709 | Germany | 90 | 97 |
| 3 | MISC | 0.91528 | Megatron | 207 | 215 |
| 4 | ORG | 0.811659 | Deceptions | 253 | 263 |
| 5 | MISC | 0.947643 | Megatron | 350 | 358 |
| 6 | MISC | 0.993323 | Optimus Prime | 367 | 380 |
| 7 | PER | 0.738231 | Bumblebee | 501 | 510 |


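    - The pipeline above detects every named entity and assigns it to a category such as `ORG` (organization), `LOC` (location), or `PER` (person)
    The `aggregation_strategy` argument groups tokens into words according to the model's predictions
    
    For example, 
    'Optimus Prime' consists of two words but is assigned to a single category, `MISC` (miscellaneous)
    The score indicates how confident the model is about each entity. In the output above it is least confident about
    'Bumblebee' (0.738) and 'Deceptions' (0.812), although every entity was still grouped into a single span
    
    
    - No hash (`#`) symbols appear in the word column above. When they do appear (for example `##tron`), they come from the tokenizer
    The tokenizer splits words into basic units called tokens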
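To make the grouping idea concrete, here is a simplified sketch of how sub-word pieces and adjacent words with the same label could be merged into one entity. It is not the library's implementation: `merge_wordpieces` is a hypothetical helper, the token splits are invented for illustration, and real taggers also use `B-`/`I-` prefixes to separate neighbouring entities of the same type.

```python
def merge_wordpieces(tagged_tokens):
    """Merge consecutive tokens that share an entity label into one span.

    `tagged_tokens` is a list of (token, label) pairs, where "O" means
    "not an entity". A token starting with '##' continues the previous
    word, so it is glued on without a space.
    """
    spans = []
    previous_label = None
    for token, label in tagged_tokens:
        if label != "O":
            if label == previous_label:
                # Same entity continues: glue sub-word pieces, space-separate whole words.
                if token.startswith("##"):
                    spans[-1]["word"] += token[2:]
                else:
                    spans[-1]["word"] += " " + token
            else:
                spans.append({"entity_group": label, "word": token})
        previous_label = label
    return spans

# Hypothetical tokenization, chosen only to show the merging behaviour.
tagged = [("an", "O"), ("Op", "MISC"), ("##ti", "MISC"), ("##mus", "MISC"),
          ("Prime", "MISC"), ("action", "O"), ("in", "O"), ("Germany", "LOC")]
entities = merge_wordpieces(tagged)
print(entities)
```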

## 3. Question answering

In [9]:
reader = pipeline('question-answering')
question = 'What does the customer want?'
outputs = reader(question=question, context=text)
pd.DataFrame([outputs])

No model was supplied, defaulted to distilbert-base-cased-distilled-squad (https://huggingface.co/distilbert-base-cased-distilled-squad)



|   | score | start | end | answer |
|---|---|---|---|---|
| 0 | 0.355419 | 335 | 397 | an exchange of Megatron ofr the Optimus Prime ... |


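    - Along with the answer, the pipeline returns the integers `start` and `end`, the character indices where the answer is located in the context
    - There are several kinds of question answering; this one is `extractive question answering` because the answer is extracted directly from the text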
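Because the answer is a span of the context, the `start` and `end` indices are enough to recover it by slicing. The sketch below uses a short made-up context and a hand-built result dictionary with the same keys as the output above; it assumes `end` is exclusive, as Python slicing expects.

```python
# Short example context; not the full customer email.
context = "I demand an exchange of Megatron for the Optimus Prime figure I ordered."

# Hypothetical result with the same keys the pipeline output shows.
answer_text = "an exchange of Megatron for the Optimus Prime figure I ordered"
start = context.find(answer_text)
result = {"start": start, "end": start + len(answer_text), "answer": answer_text}

# With an exclusive `end`, ordinary slicing recovers the answer from the context.
recovered = context[result["start"]:result["end"]]
print(recovered)
```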

## 4. Text summarization

    - Takes a long text as input and generates a short version that keeps the relevant facts

In [11]:
summarizer = pipeline('summarization')
outputs = summarizer(text, max_length=45, clean_up_tokenization_spaces=True)
print(outputs[0]['summary_text'])

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 (https://huggingface.co/sshleifer/distilbart-cnn-12-6)
Your min_length=56 must be inferior than your max_length=45.


 Dear Amazon, last week I ordered an Optimus Prime action figure from your online store in Germany. When I opened the package, I discovered to my horror hat I had been sent an action figure of Megatron.


In [13]:
outputs

[{'summary_text': ' Dear Amazon, last week I ordered an Optimus Prime action figure from your online store in Germany. When I opened the package, I discovered to my horror hat I had been sent an action figure of Megatron.'}]

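    - The `max_length` and `clean_up_tokenization_spaces` keyword arguments adjust the output at run time. The warning above appears because the model's default `min_length=56` exceeds `max_length=45`
    - If the feedback is in a language we cannot read, we can use Google Translate or translate it with a transformer model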
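The `min_length` warning in the output above suggests that the two length bounds should be set together. The sketch below shows one way to build consistent keyword arguments; `summary_kwargs` is a hypothetical helper, and the default of 56 is taken only from that warning message, so other models may differ.

```python
def summary_kwargs(max_length, min_length=None, default_min_length=56):
    """Build keyword arguments for a summarization call with consistent bounds.

    `default_min_length=56` mirrors the value reported in the warning;
    it is an assumption about this particular default model.
    """
    if min_length is None:
        # Keep the default when it fits, otherwise fall back to half of max_length.
        min_length = min(default_min_length, max_length // 2)
    if min_length >= max_length:
        raise ValueError(
            f"min_length={min_length} must be smaller than max_length={max_length}"
        )
    return {
        "min_length": min_length,
        "max_length": max_length,
        "clean_up_tokenization_spaces": True,
    }

kwargs = summary_kwargs(max_length=45)
print(kwargs)
```

The resulting dictionary could then be passed as `summarizer(text, **kwargs)`, which should avoid the warning.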

## 5. Translation

In [15]:
translator = pipeline('translation_en_to_de',
                     model='Helsinki-NLP/opus-mt-en-de')

outputs = translator(text, clean_up_tokenization_spaces=True, min_length=100)
print(outputs[0]['translation_text'])

Liebe Amazon, letzte Woche bestellte ich eine Optimus Prime Action Figur aus Ihrem Online-Shop in Deutschland. Unfortunately, als ich das Paket öffnete, entdeckte ich zu meinem Horror-Hut hatte ich eine Action Figur von Megatron instedad geschickt worden! Als lebenslanger Feind der Täuschungen, Ich hoffe, Sie können mein Dilemma verstehen. Um das Problem zu lösen, Ich fordere einen Austausch von Megatron von der Optimus Prime Figur bestellte ich. Angeschlossen sind Kopien meiner Aufzeichnungen über diesen Kauf. Ich erwarte, von Ihnen bald zu hören. Aufrichtig, Bumblebee.


In [17]:
outputs

[{'translation_text': 'Liebe Amazon, letzte Woche bestellte ich eine Optimus Prime Action Figur aus Ihrem Online-Shop in Deutschland. Unfortunately, als ich das Paket öffnete, entdeckte ich zu meinem Horror-Hut hatte ich eine Action Figur von Megatron instedad geschickt worden! Als lebenslanger Feind der Täuschungen, Ich hoffe, Sie können mein Dilemma verstehen. Um das Problem zu lösen, Ich fordere einen Austausch von Megatron von der Optimus Prime Figur bestellte ich. Angeschlossen sind Kopien meiner Aufzeichnungen über diesen Kauf. Ich erwarte, von Ihnen bald zu hören. Aufrichtig, Bumblebee.'}]

    - Override the pipeline's default model to pick the one that best fits the application

## 6. Text generation

In [18]:
generator = pipeline('text-generation')
response = 'Dear Bumblebee, I am sorrty to hear that your order was mixed up.'
prompt = text + '\n\nCustomer service response:\n' + response
outputs = generator(prompt, max_length=200)
print(outputs[0]['generated_text'])

No model was supplied, defaulted to gpt2 (https://huggingface.co/gpt2)



Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Dear Amazon, last week I ordered an Optimus Prime action figure from your online store in Germany.
unFortunately, when I opened the package, I discovered to my horror hat I had been sent an action figure of Megatron instedad!
As a lifelong enemy of the Deceptions, I hope you can understand my dilemma. To resolve the issue, 
I demand an exchange of Megatron ofr the Optimus Prime figure I ordered.
Enclosed are copies of my records concerning this purchase. I expect to hear from you soon.
Sincerly, Bumblebee.

Customer service response:
Dear Bumblebee, I am sorrty to hear that your order was mixed up.

I was thinking of asking about the Transformers figures.

You are an excellent source for the Optimus Prime action figure.

But the one figure your mailer is selling is a re-print.

Doesn't matter what model was the original


In [19]:
outputs

[{'generated_text': "Dear Amazon, last week I ordered an Optimus Prime action figure from your online store in Germany.\nunFortunately, when I opened the package, I discovered to my horror hat I had been sent an action figure of Megatron instedad!\nAs a lifelong enemy of the Deceptions, I hope you can understand my dilemma. To resolve the issue, \nI demand an exchange of Megatron ofr the Optimus Prime figure I ordered.\nEnclosed are copies of my records concerning this purchase. I expect to hear from you soon.\nSincerly, Bumblebee.\n\nCustomer service response:\nDear Bumblebee, I am sorrty to hear that your order was mixed up.\n\nI was thinking of asking about the Transformers figures.\n\nYou are an excellent source for the Optimus Prime action figure.\n\nBut the one figure your mailer is selling is a re-print.\n\nDoesn't matter what model was the original"}]
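As the output shows, `generated_text` contains the original prompt followed by the newly generated continuation. A minimal sketch of separating the two is given below; the prompt and output are short hypothetical stand-ins, not results from the model.

```python
example_prompt = (
    "Customer service response:\n"
    "Dear Bumblebee, I am sorry to hear that your order was mixed up."
)

# Hypothetical pipeline-style output: the prompt followed by generated text.
example_outputs = [
    {"generated_text": example_prompt + "\n\nWe will send a replacement shortly."}
]

full_text = example_outputs[0]["generated_text"]
# Drop the prompt only if the output really starts with it.
if full_text.startswith(example_prompt):
    continuation = full_text[len(example_prompt):]
else:
    continuation = full_text
print(continuation.strip())
```

The text-generation pipeline also accepts `return_full_text=False`, which should return only the continuation without manual slicing.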