## Hugging Face Exercise

### 오태건 (20224071)

In [3]:
# !pip3 install -U transformers
# !pip install sentencepiece

In [4]:
from transformers import pipeline

## Sentiment Analysis

In [5]:
classifier = pipeline('sentiment-analysis')

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
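
The warning above recommends naming the model and revision explicitly. Below is a minimal sketch of one way to do that, using the model names and revision hashes printed in this notebook's own log messages. The `PINNED` table and the `pinned_pipeline` helper are hypothetical names introduced here for illustration, and the import is deferred so that the table can be inspected without loading `transformers`.

```python
# Model names and revisions copied from the "No model was supplied" messages
# printed in this notebook. PINNED and pinned_pipeline are hypothetical names.
PINNED = {
    "sentiment-analysis": ("distilbert-base-uncased-finetuned-sst-2-english", "af0f99b"),
    "zero-shot-classification": ("facebook/bart-large-mnli", "c626438"),
    "text-generation": ("gpt2", "6c0e608"),
    "fill-mask": ("distilroberta-base", "ec58a5b"),
    "ner": ("dbmdz/bert-large-cased-finetuned-conll03-english", "f2482bf"),
    "question-answering": ("distilbert-base-cased-distilled-squad", "626af31"),
    "summarization": ("sshleifer/distilbart-cnn-12-6", "a4f8f3e"),
}


def pinned_pipeline(task):
    """Build a pipeline with an explicit model and revision for the given task."""
    # Imported lazily so that reading PINNED does not require transformers.
    from transformers import pipeline

    model, revision = PINNED[task]
    return pipeline(task, model=model, revision=revision)
```

Calling `pinned_pipeline('sentiment-analysis')` should then load the same checkpoint as the default did here, without triggering the warning.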


In [6]:
text = [
    "Fly me to the moon, and let me play among the stars",
    "April is the cruellest month, breeding Lilacs out of the dead land."
]

In [7]:
classifier(text)  # classify each sentence as positive or negative

[{'label': 'POSITIVE', 'score': 0.9996751546859741},
 {'label': 'NEGATIVE', 'score': 0.9489614963531494}]
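
The `score` values above are class probabilities. As a rough sketch of where such a number comes from, assuming the classifier produces one logit per class and the pipeline applies a softmax over them: the logits below are made-up values for illustration, not the model's real outputs.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["NEGATIVE", "POSITIVE"]
logits = [-4.0, 4.0]  # hypothetical logits, chosen only for illustration

probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)
print({"label": labels[best], "score": probs[best]})
```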

## Zero-shot classification

In [8]:
classifier = pipeline('zero-shot-classification')

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [9]:
text = [
    "German finance minister urges EU to rein in public spending",
    "China seeks more island security pacts to boost clout in Pacific"
]

In [10]:
# classify each text against the candidate labels supplied below
classifier(
    text, 
    candidate_labels = [
        'education', 'politics', 'business', 'economy', 'europe', 'asia',
    ]
)

[{'sequence': 'German finance minister urges EU to rein in public spending',
  'labels': ['europe', 'politics', 'economy', 'business', 'education', 'asia'],
  'scores': [0.40189293026924133,
   0.25528109073638916,
   0.24060110747814178,
   0.07709478586912155,
   0.01616509258747101,
   0.008965006098151207]},
 {'sequence': 'China seeks more island security pacts to boost clout in Pacific',
  'labels': ['politics', 'asia', 'business', 'economy', 'europe', 'education'],
  'scores': [0.5034116506576538,
   0.27670812606811523,
   0.1455715000629425,
   0.03344884142279625,
   0.02069881185889244,
   0.02016107365489006]}]
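
Each result lists the labels sorted by descending score, and the scores for one sequence add up to about 1. A small sketch of post-processing that output, using the first result above copied verbatim; `labels_above` is a hypothetical helper and the 0.2 threshold is an arbitrary choice.

```python
# First result from the zero-shot output above, copied verbatim.
result = {
    "sequence": "German finance minister urges EU to rein in public spending",
    "labels": ["europe", "politics", "economy", "business", "education", "asia"],
    "scores": [
        0.40189293026924133,
        0.25528109073638916,
        0.24060110747814178,
        0.07709478586912155,
        0.01616509258747101,
        0.008965006098151207,
    ],
}


def labels_above(result, threshold):
    """Return the labels whose score is at least `threshold` (hypothetical helper)."""
    return [
        label
        for label, score in zip(result["labels"], result["scores"])
        if score >= threshold
    ]


top_label = result["labels"][0]        # labels are already sorted by score
confident = labels_above(result, 0.2)  # 0.2 is an arbitrary cut-off
total = sum(result["scores"])          # close to 1 for single-label scoring

print(top_label, confident, round(total, 4))
```

If several labels should be allowed to apply independently, the pipeline call also accepts `multi_label=True`, in which case the scores are no longer expected to sum to 1.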

## Text Generation

In [11]:
# downloads GPT-2, the default model for this task
generator = pipeline('text-generation')

No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [12]:
# "The Myth of Sisyphus" by Albert Camus
text = "There is but one truly serious philosophical problem, and that is suicide. Judging"

In [13]:
# continue the given prompt, up to the specified maximum length
generator(text, max_length=256)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'There is but one truly serious philosophical problem, and that is suicide. Judging by the media frenzy to come, many pundits are telling us that suicide is a terrible thing, that it is morally wrong. It is simply madness.\n\nLet\'s begin with a statement that has attracted a lot of heat lately. It comes from a great economist named Steven Zahn who came up with that very argument in his seminal work, Money: The Life of A Sociopath\'s Quest for Money (1980). In his original article, Zahn states an argument that is really just a short-form problem of a self-destructive psychology. However as you might expect, this same book states, after it was published, an argument that has no basis in fact.\n\nThis "problem" has actually been presented several times before, but it did not originate in the book. Rather, it came up in Zahn\'s book, The Money Problem. He is an Austrian School of economics professor, known around the country as the "Börsebergian Economics professor". T

## Mask filling

In [14]:
unmasker = pipeline('fill-mask')

No model was supplied, defaulted to distilroberta-base and revision ec58a5b (https://huggingface.co/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [15]:
# Billie Eilish
text = "So you're a <mask> guy, Like it really rough guy"

In [16]:
# Finding words close to "rough" was done earlier with embeddings;
# here the model uses the context to fill <mask> with a suitable word
unmasker(text)

[{'score': 0.927448034286499,
  'token': 6744,
  'token_str': ' rough',
  'sequence': "So you're a rough guy, Like it really rough guy"},
 {'score': 0.030522378161549568,
  'token': 1828,
  'token_str': ' tough',
  'sequence': "So you're a tough guy, Like it really rough guy"},
 {'score': 0.0017657509306445718,
  'token': 1099,
  'token_str': ' bad',
  'sequence': "So you're a bad guy, Like it really rough guy"},
 {'score': 0.0016832515830174088,
  'token': 15455,
  'token_str': ' nasty',
  'sequence': "So you're a nasty guy, Like it really rough guy"},
 {'score': 0.0015439123380929232,
  'token': 543,
  'token_str': ' hard',
  'sequence': "So you're a hard guy, Like it really rough guy"}]

## NER (Named Entity Recognition)

In [17]:
ner = pipeline('ner')

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [18]:
text = "Steven Paul Jobs (February 24, 1955 – October 5, 2011) was an American entrepreneur, inventor, business magnate, media proprietor, and investor. He was the co-founder, chairman, and CEO of Apple; the chairman and majority shareholder of Pixar; a member of The Walt Disney Company's board of directors following its acquisition of Pixar; and the founder, chairman, and CEO of NeXT. He is widely recognized as a pioneer of the personal computer revolution of the 1970s and 1980s, along with his early business partner and fellow Apple co-founder Steve Wozniak."

In [19]:
ner(text)

[{'entity': 'I-PER',
  'score': 0.99945754,
  'index': 1,
  'word': 'Steven',
  'start': 0,
  'end': 6},
 {'entity': 'I-PER',
  'score': 0.9994562,
  'index': 2,
  'word': 'Paul',
  'start': 7,
  'end': 11},
 {'entity': 'I-PER',
  'score': 0.999501,
  'index': 3,
  'word': 'Job',
  'start': 12,
  'end': 15},
 {'entity': 'I-PER',
  'score': 0.99722654,
  'index': 4,
  'word': '##s',
  'start': 15,
  'end': 16},
 {'entity': 'I-MISC',
  'score': 0.9967591,
  'index': 18,
  'word': 'American',
  'start': 62,
  'end': 70},
 {'entity': 'I-ORG',
  'score': 0.9993734,
  'index': 45,
  'word': 'Apple',
  'start': 189,
  'end': 194},
 {'entity': 'I-ORG',
  'score': 0.9989691,
  'index': 53,
  'word': 'Pi',
  'start': 237,
  'end': 239},
 {'entity': 'I-ORG',
  'score': 0.9961754,
  'index': 54,
  'word': '##xa',
  'start': 239,
  'end': 241},
 {'entity': 'I-ORG',
  'score': 0.9993247,
  'index': 55,
  'word': '##r',
  'start': 241,
  'end': 242},
 {'entity': 'I-ORG',
  'score': 0.9991516,
  'inde
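
The output above is per token, so names are split into WordPiece fragments such as `Job` + `##s` and `Pi` + `##xa` + `##r`. A minimal sketch of gluing those fragments back into whole words using the `start`/`end` offsets; `merge_wordpieces` is a hypothetical helper, and the token list is copied from the output above with the `index` field dropped for brevity.

```python
# Tokens copied from the NER output above ('index' omitted for brevity).
tokens = [
    {"entity": "I-PER", "score": 0.99945754, "word": "Steven", "start": 0, "end": 6},
    {"entity": "I-PER", "score": 0.9994562, "word": "Paul", "start": 7, "end": 11},
    {"entity": "I-PER", "score": 0.999501, "word": "Job", "start": 12, "end": 15},
    {"entity": "I-PER", "score": 0.99722654, "word": "##s", "start": 15, "end": 16},
    {"entity": "I-ORG", "score": 0.9989691, "word": "Pi", "start": 237, "end": 239},
    {"entity": "I-ORG", "score": 0.9961754, "word": "##xa", "start": 239, "end": 241},
    {"entity": "I-ORG", "score": 0.9993247, "word": "##r", "start": 241, "end": 242},
]


def merge_wordpieces(tokens):
    """Join '##' continuation pieces onto the preceding token (hypothetical helper)."""
    merged = []
    for tok in tokens:
        word = tok["word"]
        is_continuation = (
            word.startswith("##")
            and merged
            and merged[-1]["end"] == tok["start"]  # must be directly adjacent
        )
        if is_continuation:
            prev = merged[-1]
            prev["word"] += word[2:]  # drop the '##' marker
            prev["end"] = tok["end"]
            prev["score"] = min(prev["score"], tok["score"])  # keep the weakest score
        else:
            merged.append(dict(tok))  # copy so the input list is not mutated
    return merged


words = [t["word"] for t in merge_wordpieces(tokens)]
print(words)
```

This only rebuilds words; it does not group `Steven`, `Paul`, and `Jobs` into one person entity. The pipeline itself can do that grouping when created with an `aggregation_strategy` argument such as `'simple'`.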

## Question answering

In [20]:
qa = pipeline('question-answering')

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [21]:
qa(
    context=text,
    question='Which companies are founded by Steve Jobs?'
)

{'score': 0.4551231563091278, 'start': 189, 'end': 194, 'answer': 'Apple'}
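
The `start` and `end` fields are character offsets into the context string, so the answer can be recovered by slicing. A short sketch using the beginning of the Steve Jobs text defined earlier (only the part up to the answer is reproduced here) and the result printed above:

```python
# Beginning of the context used above; \u2013 is the en dash in the date range.
context = (
    "Steven Paul Jobs (February 24, 1955 \u2013 October 5, 2011) was an American "
    "entrepreneur, inventor, business magnate, media proprietor, and investor. "
    "He was the co-founder, chairman, and CEO of Apple;"
)

# Result copied from the question-answering output above.
result = {"score": 0.4551231563091278, "start": 189, "end": 194, "answer": "Apple"}

# Slicing the context with the offsets yields the extracted answer span.
span = context[result["start"]:result["end"]]
print(span)
```

Because the model extracts a single span, it returns only `Apple` even though the context also names NeXT as a company Jobs founded.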

## Summarization

In [22]:
summarizer = pipeline('summarization')

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [23]:
# summarize the given text in at most 64 tokens
summarizer(text, max_length=64)

[{'summary_text': " Steven Paul Jobs was the co-founder, chairman, and CEO of Apple . He is widely recognized as a pioneer of the personal computer revolution of the 1970s and 1980s, along with his early business partner Steve Wozniak . He was a member of The Walt Disney Company's board of directors"}]

## Translation

In [26]:
# en > fr (English to French)
translator = pipeline('translation', model='Helsinki-NLP/opus-mt-en-fr')

In [27]:
translator('Hello World')

[{'translation_text': 'Bonjour le monde'}]

In [28]:
translator(text)

[{'translation_text': "Steven Paul Jobs (24 février 1955 – 5 octobre 2011) était un entrepreneur américain, inventeur, magnat des affaires, propriétaire de médias et investisseur. Il était le cofondateur, président et directeur général d'Apple; le président et actionnaire majoritaire de Pixar; un membre du conseil d'administration de la Walt Disney Company suite à son acquisition de Pixar; et le fondateur, président et chef de la direction de NeXT. Il est largement reconnu comme un pionnier de la révolution informatique personnelle des années 1970 et 1980, ainsi que son premier associé et cofondateur d'Apple Steve Wozniak."}]

In [29]:
# ko > en (Korean to English)
translator = pipeline('translation', model='Helsinki-NLP/opus-mt-ko-en')

In [30]:
text = "한국산 가상화폐 루나와 테라USD(UST) 폭락으로 손실을 본 투자자들이 발행사 테라폼랩스의 권도형 최고경영자(CEO)를 고소했다."

In [31]:
translator(text)

[{'translation_text': "After losing a Korean virtual currency, Luna turusD (UST), investors filed charges against CEO's high-powered top manager for the launch service terafos."}]

## Sentiment Analysis - Korean

In [35]:
# a model trained by a Seoul National University lab on Maeil Business Newspaper news data
classifier = pipeline('sentiment-analysis', model='snunlp/KR-FinBert-SC')


In [36]:
text = [
    "한국산 가상화폐 루나와 테라USD(UST) 폭락으로 손실을 본 투자자들이 발행사 테라폼랩스의 권도형 최고경영자(CEO)를 고소했다.",
    "외국인, 올해 국내 주식 15조 원 순매도…삼성만 5조 원 팔았다",
    "尹, 탈원전 정상화 추진 “원전 수출 증진 위해 韓美 노력”",
]

In [37]:
classifier(text)

[{'label': 'negative', 'score': 0.9798452258110046},
 {'label': 'negative', 'score': 0.9699409604072571},
 {'label': 'positive', 'score': 0.9954456090927124}]