## Translation

In [3]:
from transformers import pipeline

translator = pipeline("translation_en_to_fr") # uses the default model, t5-base

No model was supplied, defaulted to t5-base and revision 686f1db (https://huggingface.co/t5-base).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [24]:
translator("The sun is rising amid the hills.")

[{'translation_text': "Le soleil s'élève dans les collines."}]

## Sentiment Analysis

In [25]:
textClassifier = pipeline("text-classification")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [26]:
textClassifier("Call me Ishmael. \
Some years ago—never mind how long precisely—having little \
or no money in my purse, and nothing particular to interest me on shore, \
I thought I would sail about a little and see the watery part of the world. \
It is a way I have of driving off the spleen, and regulating the circulation.")

[{'label': 'NEGATIVE', 'score': 0.9846920371055603}]

In [39]:
textClassifier(["To be, or not to be","With mirth and laughter let old wrinkles come"])

[{'label': 'NEGATIVE', 'score': 0.810395359992981},
 {'label': 'POSITIVE', 'score': 0.9776005148887634}]
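The `text-classification` pipeline reports only the top label and its score. This default SST-2 model has exactly two classes and its scores come from a softmax, so the probability of the other class is `1 - score`. A minimal sketch, using a hypothetical helper `positive_probability` (not part of `transformers`), that turns each result above into a single P(POSITIVE) value:

```python
def positive_probability(result):
    """Return P(POSITIVE) for one {'label', 'score'} result from a binary
    sentiment classifier whose two class scores sum to 1 (softmax)."""
    if result["label"] == "POSITIVE":
        return result["score"]
    if result["label"] == "NEGATIVE":
        # Two classes only, so the positive probability is the complement.
        return 1.0 - result["score"]
    raise ValueError(f"unexpected label: {result['label']!r}")


# Results copied from the two cells above.
results = [
    {"label": "NEGATIVE", "score": 0.9846920371055603},
    {"label": "NEGATIVE", "score": 0.810395359992981},
    {"label": "POSITIVE", "score": 0.9776005148887634},
]

for r in results:
    print(f"{r['label']:>8}  P(POSITIVE) = {positive_probability(r):.3f}")
```

A single number per text is convenient when sentiment is plotted or averaged over many passages.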

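## Natural Language Inference (NLI)
* Determines whether a hypothesis follows from a premise (entailment), contradicts it (contradiction), or is unrelated to it (neutral). In the cells below, the first sentence of each input is the premise and the last is the hypothesis.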

In [27]:
nliClassifier = pipeline("text-classification", model = "roberta-large-mnli")

Some weights of the model checkpoint at roberta-large-mnli were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [28]:
nliClassifier("Staying clean is a good thing. Hygiene is a lovely thing.")

[{'label': 'ENTAILMENT', 'score': 0.7261566519737244}]

In [29]:
nliClassifier("Some years ago—never mind how long precisely—having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world. I have a waterphobia.")

[{'label': 'CONTRADICTION', 'score': 0.9656215906143188}]

In [42]:
nliClassifier("Cats were represented in social and religious practices of ancient Egypt for more than 3,000 years. Power systems is a major subfield of Electrical Engineering.")

[{'label': 'NEUTRAL', 'score': 0.5616686940193176}]
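The cells above pass the premise and hypothesis as one string, so the tokenizer treats them as a single text rather than as a sentence pair. The `text-classification` pipeline also accepts a dict with `text` and `text_pair` keys, which tokenizes the two sentences as a pair (for RoBERTa, with separator tokens between them), the form MNLI models are trained on. A minimal sketch with a hypothetical helper `as_nli_pair`; the pipeline call is left as a comment, and its scores may differ from the single-string cells above:

```python
def as_nli_pair(premise, hypothesis):
    """Package a premise and hypothesis in the dict form that the
    text-classification pipeline tokenizes as a sentence pair."""
    return {"text": premise, "text_pair": hypothesis}


# The first example from above, split into its two sentences.
pair = as_nli_pair("Staying clean is a good thing.", "Hygiene is a lovely thing.")
print(pair)

# With the pipeline from above (requires transformers and the model download);
# a list of such dicts classifies several pairs in one call:
# nliClassifier(pair)
# nliClassifier([pair, pair])
```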

## QNLI
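* Determines whether a sentence contains the answer to a question (Question NLI). In the cells below, the question comes first and the candidate sentence follows it.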

In [43]:
qnliClassifier = pipeline("text-classification", model = "cross-encoder/qnli-electra-base")

In [32]:
qnliClassifier("What's the major subfield of Electrical Engineering?, Staying clean is a good thing.")

[{'label': 'LABEL_0', 'score': 0.0025376155972480774}]

In [33]:
qnliClassifier("What's the major subfield of Electrical Engineering?, Power and Electronics engineerings are usually preferred by new undergrad students.")

[{'label': 'LABEL_0', 'score': 0.7401779294013977}]
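Both cells print `LABEL_0` because this cross-encoder has a single output. For one-label models the `text-classification` pipeline applies a sigmoid, so the score itself can be read as the estimated probability that the sentence answers the question (about 0.003 versus 0.740 above). A natural use is ranking candidate sentences for one question. A minimal sketch with a hypothetical `rank_by_relevance` helper and an assumed cutoff of 0.5, using the scores copied from the cells above:

```python
def rank_by_relevance(candidates, threshold=0.5):
    """Sort (sentence, score) pairs by score, highest first, and keep
    only those at or above the threshold."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return [c for c in ranked if c[1] >= threshold]


# Candidate sentences and their LABEL_0 scores from the two cells above.
candidates = [
    ("Staying clean is a good thing.", 0.0025376155972480774),
    ("Power and Electronics engineerings are usually preferred by new undergrad students.",
     0.7401779294013977),
]

for sentence, score in rank_by_relevance(candidates):
    print(f"{score:.3f}  {sentence}")
```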

## Question Answering

In [13]:
qaModel = pipeline("question-answering")

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [14]:
passage = "THE TIME TRAVELLER (for so it will be convenient to speak of him) \
was expounding a recondite matter to us. \
His grey eyes shone and twinkled, and his usually pale face was flushed \
and animated. The fire burned brightly, and the soft radiance of the \
incandescent lights in the lilies of silver caught the bubbles \
that flashed and passed in our glasses. \
Our chairs, being his patents, embraced and caressed us rather than submitted \
to be sat upon, and there was that luxurious after-dinner atmosphere \
when thought runs gracefully free of the trammels of precision. \
And he put it to us in this way—marking the points with a lean forefinger—as \
we sat and lazily admired his earnestness over this new paradox \
(as we thought it) and his fecundity."

qaModel("What was colour of time traveller eyes?", passage)

{'score': 0.9327439069747925, 'start': 111, 'end': 115, 'answer': 'grey'}

In [15]:
qaModel("Where were we sitting?", passage)

{'score': 0.06857698410749435,
 'start': 351,
 'end': 361,
 'answer': 'Our chairs'}

In [16]:
qaModel("What was our reaction to time traveller?", passage)

{'score': 0.0344405323266983,
 'start': 107,
 'end': 139,
 'answer': 'His grey eyes shone and twinkled'}
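In each result, `start` and `end` are character offsets into the context, so slicing the passage with them reproduces `answer`. The `score` is a useful filter: the two low-scoring answers above are weak matches for their questions. A minimal sketch with a hypothetical `confident_answer` helper and an assumed threshold of 0.5. The passage here is only the first part of the one above, which is enough to cover these offsets:

```python
# First part of the passage above, rebuilt with implicit string
# concatenation instead of backslash continuations.
passage = (
    "THE TIME TRAVELLER (for so it will be convenient to speak of him) "
    "was expounding a recondite matter to us. "
    "His grey eyes shone and twinkled, and his usually pale face was flushed "
    "and animated. The fire burned brightly, and the soft radiance of the "
    "incandescent lights in the lilies of silver caught the bubbles "
    "that flashed and passed in our glasses. "
    "Our chairs, being his patents, embraced and caressed us rather than submitted "
)

# Results copied from the three qaModel cells above.
results = [
    {"score": 0.9327439069747925, "start": 111, "end": 115, "answer": "grey"},
    {"score": 0.06857698410749435, "start": 351, "end": 361, "answer": "Our chairs"},
    {"score": 0.0344405323266983, "start": 107, "end": 139,
     "answer": "His grey eyes shone and twinkled"},
]


def confident_answer(result, context, threshold=0.5):
    """Return the answer text if its score reaches the threshold, else None."""
    # start/end index characters of the context, so the slice must equal the answer.
    span = context[result["start"]:result["end"]]
    if span != result["answer"]:
        raise ValueError("offsets do not match the answer text")
    return span if result["score"] >= threshold else None


for r in results:
    print(confident_answer(r, passage))

# qaModel can also return several candidate spans per question, e.g. with the
# full passage from above:
# qaModel(question="Where were we sitting?", context=passage, top_k=3)
```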

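## Grammar Checking
* A model fine-tuned on CoLA (Corpus of Linguistic Acceptability) classifies whether a sentence is grammatically acceptable; it does not correct the sentence.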

In [44]:
grammarClassifier = pipeline("text-classification", model = "textattack/distilbert-base-uncased-CoLA")

In [40]:
grammarClassifier("he go to school everyday out missing the bus")

[{'label': 'LABEL_0', 'score': 0.6497087478637695}]

In [41]:
grammarClassifier("This chapter will provide an overview of performing common NLP tasks.")

[{'label': 'LABEL_1', 'score': 0.9765014052391052}]
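The labels are raw class ids. Assuming this model keeps the CoLA dataset's convention (0 = unacceptable, 1 = acceptable), which agrees with the two outputs above, a hypothetical `acceptability` helper can translate them into readable verdicts:

```python
# Assumed mapping, following the CoLA dataset's label convention.
COLA_LABELS = {"LABEL_0": "unacceptable", "LABEL_1": "acceptable"}


def acceptability(result):
    """Turn one {'label', 'score'} result into a (verdict, score) tuple."""
    return COLA_LABELS[result["label"]], result["score"]


# Results copied from the two cells above.
print(acceptability({"label": "LABEL_0", "score": 0.6497087478637695}))
print(acceptability({"label": "LABEL_1", "score": 0.9765014052391052}))
```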

## Token Classification
* Two common tasks:
  * Named Entity Recognition (NER)
  * Part-of-Speech (PoS) tagging

In [34]:
tokenClassifier = pipeline("ner")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [35]:
tokenClassifier("Ptolemy mentions in his Geographia a city called Labokla which may have been in reference to ancient Lahore.")

[{'entity': 'I-PER',
  'score': 0.9345262,
  'index': 1,
  'word': 'Ptolemy',
  'start': 0,
  'end': 7},
 {'entity': 'I-MISC',
  'score': 0.9682059,
  'index': 5,
  'word': 'G',
  'start': 24,
  'end': 25},
 {'entity': 'I-MISC',
  'score': 0.8515827,
  'index': 6,
  'word': '##eo',
  'start': 25,
  'end': 27},
 {'entity': 'I-MISC',
  'score': 0.86348933,
  'index': 7,
  'word': '##graph',
  'start': 27,
  'end': 32},
 {'entity': 'I-MISC',
  'score': 0.92860067,
  'index': 8,
  'word': '##ia',
  'start': 32,
  'end': 34},
 {'entity': 'I-LOC',
  'score': 0.991808,
  'index': 12,
  'word': 'Lab',
  'start': 49,
  'end': 52},
 {'entity': 'I-LOC',
  'score': 0.9888537,
  'index': 13,
  'word': '##ok',
  'start': 52,
  'end': 54},
 {'entity': 'I-LOC',
  'score': 0.9976913,
  'index': 14,
  'word': '##la',
  'start': 54,
  'end': 56},
 {'entity': 'I-LOC',
  'score': 0.99820256,
  'index': 23,
  'word': 'Lahore',
  'start': 101,
  'end': 107}]
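The `ner` pipeline returns one row per word piece (`G`, `##eo`, `##graph`, `##ia`), so a single entity can be split across several rows. Passing `aggregation_strategy="simple"` to `pipeline("ner", ...)` asks the pipeline to merge them. The sketch below, a hypothetical `group_entities` helper, does a simplified version of that merge by hand so the logic is visible: consecutive tokens with the same entity type are joined, and the surface text is sliced from the sentence with the `start`/`end` offsets. The pipeline's own aggregation may differ in details such as how scores are combined:

```python
from statistics import mean

sentence = ("Ptolemy mentions in his Geographia a city called Labokla "
            "which may have been in reference to ancient Lahore.")

# Token-level results copied from the cell above ("word" field omitted).
tokens = [
    {"entity": "I-PER", "score": 0.9345262, "index": 1, "start": 0, "end": 7},
    {"entity": "I-MISC", "score": 0.9682059, "index": 5, "start": 24, "end": 25},
    {"entity": "I-MISC", "score": 0.8515827, "index": 6, "start": 25, "end": 27},
    {"entity": "I-MISC", "score": 0.86348933, "index": 7, "start": 27, "end": 32},
    {"entity": "I-MISC", "score": 0.92860067, "index": 8, "start": 32, "end": 34},
    {"entity": "I-LOC", "score": 0.991808, "index": 12, "start": 49, "end": 52},
    {"entity": "I-LOC", "score": 0.9888537, "index": 13, "start": 52, "end": 54},
    {"entity": "I-LOC", "score": 0.9976913, "index": 14, "start": 54, "end": 56},
    {"entity": "I-LOC", "score": 0.99820256, "index": 23, "start": 101, "end": 107},
]


def group_entities(tokens, text):
    """Merge consecutive tokens of the same entity type into whole entities."""
    groups = []
    for tok in tokens:
        # "I-LOC" -> prefix "I", type "LOC"; labels without a dash have no prefix.
        prefix, sep, etype = tok["entity"].partition("-")
        if not sep:
            prefix, etype = "", prefix
        prev = groups[-1] if groups else None
        # Extend the previous entity only if the token directly follows it,
        # has the same type, and is not explicitly marked as a new entity (B-).
        if (prev is not None and prefix != "B"
                and prev["entity_group"] == etype
                and tok["index"] == prev["last_index"] + 1):
            prev["end"] = tok["end"]
            prev["last_index"] = tok["index"]
            prev["scores"].append(tok["score"])
        else:
            groups.append({"entity_group": etype, "start": tok["start"],
                           "end": tok["end"], "last_index": tok["index"],
                           "scores": [tok["score"]]})
    return [
        {"entity_group": g["entity_group"],
         "word": text[g["start"]:g["end"]],  # surface form from the offsets
         "score": mean(g["scores"]),
         "start": g["start"], "end": g["end"]}
        for g in groups
    ]


for entity in group_entities(tokens, sentence):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```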

In [36]:
posTagger = pipeline("token-classification", model = "vblagoje/bert-english-uncased-finetuned-pos")

In [37]:
posTagger("Ptolemy mentions in his Geographia a city called Labokla which may have been in reference to ancient Lahore.")

[{'entity': 'PROPN',
  'score': 0.9985763,
  'index': 1,
  'word': 'ptolemy',
  'start': 0,
  'end': 7},
 {'entity': 'VERB',
  'score': 0.99952936,
  'index': 2,
  'word': 'mentions',
  'start': 8,
  'end': 16},
 {'entity': 'ADP',
  'score': 0.9994611,
  'index': 3,
  'word': 'in',
  'start': 17,
  'end': 19},
 {'entity': 'PRON',
  'score': 0.998919,
  'index': 4,
  'word': 'his',
  'start': 20,
  'end': 23},
 {'entity': 'PROPN',
  'score': 0.5962589,
  'index': 5,
  'word': 'geo',
  'start': 24,
  'end': 27},
 {'entity': 'PROPN',
  'score': 0.77628046,
  'index': 6,
  'word': '##graph',
  'start': 27,
  'end': 32},
 {'entity': 'NOUN',
  'score': 0.6260992,
  'index': 7,
  'word': '##ia',
  'start': 32,
  'end': 34},
 {'entity': 'DET',
  'score': 0.9993963,
  'index': 8,
  'word': 'a',
  'start': 35,
  'end': 36},
 {'entity': 'NOUN',
  'score': 0.9986626,
  'index': 9,
  'word': 'city',
  'start': 37,
  'end': 41},
 {'entity': 'VERB',
  'score': 0.9963213,
  'index': 10,
  'word': 'cal

## Text Summarization

In [46]:
summarizer = pipeline("summarization") # uses the default model, distilbart-cnn-12-6

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [48]:
summarizer("“Filby became pensive. “Clearly,” the Time Traveller proceeded, “any real body must have extension in four directions: it must have Length, Breadth, Thickness, and—Duration. But through a natural infirmity of the flesh, which I will explain to you in a moment, we incline to overlook this fact. There are really four dimensions, three which we call the three planes of Space, and a fourth, Time. There is, however, a tendency to draw an unreal distinction between the former three dimensions and the latter, because it happens that our consciousness moves intermittently in one direction along the latter from the beginning to the end of our lives.”", min_length=10, max_length=30)

[{'summary_text': ' There are really four dimensions, three which we call the three planes of Space, and a fourth, Time . There is a tendency to'}]

* Summarizing with a different model

In [4]:
bartLargeSummarizer = pipeline("summarization", "facebook/bart-large-cnn")
bartLargeSummarizer("“Filby became pensive. “Clearly,” the Time Traveller proceeded, “any real body must have extension in four directions: it must have Length, Breadth, Thickness, and—Duration. But through a natural infirmity of the flesh, which I will explain to you in a moment, we incline to overlook this fact. There are really four dimensions, three which we call the three planes of Space, and a fourth, Time. There is, however, a tendency to draw an unreal distinction between the former three dimensions and the latter, because it happens that our consciousness moves intermittently in one direction along the latter from the beginning to the end of our lives.”", min_length=10, max_length=30)

[{'summary_text': '"There are really four dimensions, three which we call the three planes of Space, and a fourth, Time" "Our consciousness moves'}]
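`min_length` and `max_length` are counted in tokens, and generation stops once `max_length` is reached, which is why both summaries end mid-sentence. The distilbart output also has a stray space before the period (`Time .`). A minimal post-processing sketch, a hypothetical `tidy_summary` helper that is not part of `transformers`, which removes spaces before punctuation and drops a trailing incomplete sentence:

```python
import re


def tidy_summary(text):
    """Remove spaces before punctuation and cut a trailing partial sentence."""
    text = re.sub(r"\s+([.,;:!?])", r"\1", text.strip())
    # Position of the last sentence-ending mark, or -1 if there is none.
    cut = max(text.rfind(mark) for mark in ".!?")
    return text[:cut + 1] if cut != -1 else text


# Output copied from the distilbart cell above.
raw = (" There are really four dimensions, three which we call the three planes "
       "of Space, and a fourth, Time . There is a tendency to")
print(tidy_summary(raw))
```

If the text has no sentence-ending mark, as in the bart-large-cnn output, it is returned unchanged apart from the whitespace cleanup.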

## Text Generation

In [17]:
gpt2Generator = pipeline("text-generation", model="gpt2")



In [19]:
gpt2Generator("Generative Pre-trained Transformer 3 (GPT-3) is an autoregressive language model that uses deep learning to produce human-like text. ", max_length = 200, num_return_sequences=3)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Generative Pre-trained Transformer 3 (GPT-3) is an autoregressive language model that uses deep learning to produce human-like text. \xa0A recurrent neural network (NFC) of a machine learns to create an artificial language with low complexity. \xa0 This novel model uses neural network architectures where there are high-dimensional information for each of the three inputs with a high probability of finding information of the highest order. \xa0The NFC is then trained to recognize sentences that are of high probability and those that cannot. \xa0For instance, a speaker from a group of non-verbal groups may see several words that do the same thing, and may recognize the same sentence. \xa0If one fails to recognize sentences in a corpus, the system will learn over time over and over more complex representations, each of which will learn something different. \xa0 The goal for this model is to create a language that allows one to create a different kind of world, from a 
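Each `generated_text` starts with the prompt, and `max_length=200` counts the prompt's tokens as well as the new ones. The output above also contains non-breaking spaces (`\xa0`). A minimal sketch of a hypothetical `continuation` helper that strips the prompt and normalizes whitespace; alternatively, the text-generation pipeline accepts `return_full_text=False` to omit the prompt and `max_new_tokens` to limit only the generated part:

```python
def continuation(generated_text, prompt):
    """Return only the text added after the prompt, with non-breaking
    spaces and runs of whitespace collapsed to single spaces."""
    if generated_text.startswith(prompt):
        generated_text = generated_text[len(prompt):]
    # str.split() with no arguments treats "\xa0" as whitespace too.
    return " ".join(generated_text.split())


prompt = ("Generative Pre-trained Transformer 3 (GPT-3) is an autoregressive "
          "language model that uses deep learning to produce human-like text. ")
# Opening of the first generated_text above.
sample = prompt + ("\xa0A recurrent neural network (NFC) of a machine learns to "
                   "create an artificial language with low complexity. \xa0 This novel model")
print(continuation(sample, prompt))

# Letting the pipeline drop the prompt and cap only the new tokens instead
# (requires transformers and the model download):
# gpt2Generator(prompt, max_new_tokens=150, return_full_text=False)
```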

In [20]:
text2textGenerator = pipeline("text2text-generation")

No model was supplied, defaulted to t5-base and revision 686f1db (https://huggingface.co/t5-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
- Be aware that you SHOULD NOT rely on t5-base automatically truncating your input to 512 when padding/encoding.
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.


In [22]:
text2textGenerator("question: Who was Feynman ? context: Richard Feynman was a Physicist. Being one of the most famous scientist ever, he is still remembered in the scientific society.")

[{'generated_text': 'Physicist'}]
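t5-base is a text-to-text model: the task is selected by a plain-text prefix. The `question: ... context: ...` input above is one such format, and the `translation_en_to_fr` pipeline at the top of this notebook runs the same model with the prefix `translate English to French: `, which t5-base's configuration stores under `task_specific_params`. A sketch of hypothetical helpers that build these prompts so `text2textGenerator` can be steered directly; results should resemble the dedicated pipelines, although those also apply their own generation settings:

```python
# Prefixes assumed from t5-base's task_specific_params and the
# "question: ... context: ..." format used in the cell above.
# t5-base was trained to translate English into German, French and Romanian only.
def t5_translate(text, source="English", target="French"):
    return f"translate {source} to {target}: {text}"


def t5_summarize(text):
    return f"summarize: {text}"


def t5_question(question, context):
    return f"question: {question} context: {context}"


print(t5_translate("The sun is rising amid the hills."))
print(t5_question("Who was Feynman ?", "Richard Feynman was a Physicist."))

# With the pipeline from above (requires transformers and the model download):
# text2textGenerator(t5_translate("The sun is rising amid the hills."))
```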

## Fill Mask

In [27]:
maskFiller = pipeline("fill-mask")
maskFiller("DNA is the <mask> of life.")

No model was supplied, defaulted to distilroberta-base and revision ec58a5b (https://huggingface.co/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'score': 0.26325371861457825,
  'token': 4811,
  'token_str': ' foundation',
  'sequence': 'DNA is the foundation of life.'},
 {'score': 0.11198776215314865,
  'token': 1453,
  'token_str': ' basis',
  'sequence': 'DNA is the basis of life.'},
 {'score': 0.06848201155662537,
  'token': 22157,
  'token_str': ' spice',
  'sequence': 'DNA is the spice of life.'},
 {'score': 0.03729137405753136,
  'token': 14981,
  'token_str': ' essence',
  'sequence': 'DNA is the essence of life.'},
 {'score': 0.034392744302749634,
  'token': 9813,
  'token_str': ' origin',
  'sequence': 'DNA is the origin of life.'}]
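The mask placeholder depends on the model's tokenizer: distilroberta-base uses `<mask>`, while BERT-style models use `[MASK]`. The pipeline exposes the right one as `maskFiller.tokenizer.mask_token`, so prompts need not hard-code it. Each `token_str` also keeps the leading space of RoBERTa's byte-level tokens. A minimal sketch with hypothetical helpers `masked` and `candidates`:

```python
def masked(template, mask_token):
    """Fill the "{mask}" placeholder in a template with the model's mask token."""
    return template.format(mask=mask_token)


def candidates(results):
    """Return (word, score) pairs with the tokenizer's leading space stripped."""
    return [(r["token_str"].strip(), r["score"]) for r in results]


print(masked("DNA is the {mask} of life.", "<mask>"))  # distilroberta-base
print(masked("DNA is the {mask} of life.", "[MASK]"))  # BERT-style models

# Two of the results copied from the cell above.
results = [
    {"score": 0.26325371861457825, "token_str": " foundation"},
    {"score": 0.11198776215314865, "token_str": " basis"},
]
print(candidates(results))

# With the pipeline from above (requires transformers and the model download):
# maskFiller(masked("DNA is the {mask} of life.", maskFiller.tokenizer.mask_token))
```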