# 1. Environment setup

```bash
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
conda install -c huggingface transformers
```

# 2. Imports 

In [37]:
import transformers
import torch

# 3. sentiment-analysis

In [2]:
sentiment = transformers.pipeline('sentiment-analysis')

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
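
The warning above recommends naming the model and revision explicitly. A minimal sketch of that, using the model name and revision hash printed in the log; `PINNED_SENTIMENT` and `build_sentiment_pipeline` are hypothetical names introduced here, and the import is deferred so that defining the helper does not trigger a download.

```python
# Model name and revision copied from the warning message above.
PINNED_SENTIMENT = {
    "task": "sentiment-analysis",
    "model": "distilbert-base-uncased-finetuned-sst-2-english",
    "revision": "af0f99b",
}


def build_sentiment_pipeline(spec=PINNED_SENTIMENT):
    """Create a pipeline with an explicit model and revision (hypothetical helper)."""
    import transformers  # imported lazily; the first call downloads the weights

    return transformers.pipeline(
        spec["task"], model=spec["model"], revision=spec["revision"]
    )
```

Calling `sentiment = build_sentiment_pipeline()` should load the same checkpoint as the default call, without the "No model was supplied" message.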


In [3]:
sentiment.model

DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): MultiHeadSelfAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)
 

In [4]:
print(sentiment(["I like Olympic games as it's very exciting."]))

[{'label': 'POSITIVE', 'score': 0.9998026490211487}]


In [5]:
print(sentiment(["I'm against to hold Olympic games in Tokyo in terms of preventing the covid19 to be spread."]))

[{'label': 'NEGATIVE', 'score': 0.9791859984397888}]
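
For a two-label classifier such as this one, the `score` is, as far as I know, the softmax probability of the winning label. The sketch below reproduces that post-processing step in plain Python. The logits are made-up values, the label order (0 for NEGATIVE, 1 for POSITIVE) is an assumption about the checkpoint's config, and `softmax` and `to_prediction` are hypothetical helper names.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def to_prediction(logits, id2label):
    """Pick the most probable label and report its probability as the score."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}


# Hypothetical logits for one sentence; the label order is assumed to match
# the checkpoint's config (0 -> NEGATIVE, 1 -> POSITIVE).
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
prediction = to_prediction([-4.2, 4.3], id2label)
print(prediction["label"], round(prediction["score"], 4))
```

A large gap between the two logits gives a score close to 1, which is why both outputs above look so confident.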


# 4. question-answering

In [6]:
qa = transformers.pipeline("question-answering")

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [7]:
olympic_wiki_text = """
The 2020 Summer Olympics (Japanese: 2020年夏季オリンピック, Hepburn: Nisen Nijū-nen Kaki Orinpikku), officially the Games of the XXXII Olympiad (第三十二回オリンピック競技大会, Dai Sanjūni-kai Orinpikku Kyōgi Taikai) and branded as Tokyo 2020 (東京2020), is an ongoing international multi-sport event being held from 23 July to 8 August 2021 in Tokyo, Japan, with some preliminary events that began on 21 July.

Tokyo was selected as the host city during the 125th IOC Session in Buenos Aires, Argentina, on 7 September 2013.[3] Originally scheduled to take place from 24 July to 9 August 2020, the event was postponed to 2021 in March 2020 as a result of the COVID-19 pandemic, the first such instance in the history of the Olympic Games (previous games had been cancelled but not rescheduled).[4] However, the event retains the Tokyo 2020 name for marketing and branding purposes.[5] It is being held largely behind closed doors with no public spectators permitted due to the declaration of a state of emergency.[b] The Summer Paralympics will be held between 24 August and 5 September 2021, 16 days after the completion of the Olympics.[6]

The 2020 Games are the fourth Olympic Games to be held in Japan, following the Tokyo 1964 (Summer), Sapporo 1972 (Winter), and Nagano 1998 (Winter) games.[c] Tokyo is the first city in Asia to hold the Summer Games twice. The 2020 Games are the second of three consecutive Olympics to be held in East Asia, following the 2018 Winter Olympics in Pyeongchang, South Korea, and preceding the 2022 Winter Olympics in Beijing, China.

The 2020 Games introduced new competitions and re-introduced competitions that once were held but were subsequently removed. New ones include 3x3 basketball, freestyle BMX and mixed gender team events in a number of existing sports, as well as the return of madison cycling for men and an introduction of the same event for women. New IOC policies also allow the host organizing committee to add new sports to the Olympic program for just one Games. The disciplines added by the Japanese Olympic Committee are baseball and softball, karate, sport climbing, surfing, and skateboarding, the last four of which make their Olympic debuts.[7]

Bermuda, the Philippines, and Qatar won their first-ever Olympic gold medals.[8][9][10] San Marino, Turkmenistan, and Burkina Faso won their first-ever Olympic medals.[11][12][13]

"""
print(qa(question="What caused Tokyo Olympic postponed?", context=olympic_wiki_text))

{'score': 0.6619062423706055, 'start': 635, 'end': 652, 'answer': 'COVID-19 pandemic'}
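
The `start` and `end` fields in the result are character offsets into `context`, so slicing the context with them recovers the answer. A minimal sketch on a short stand-in context (not the full excerpt above); the `window` size is a hypothetical choice for illustration.

```python
# Short stand-in context (not the full Wikipedia excerpt used above).
context = "The event was postponed to 2021 as a result of the COVID-19 pandemic."
answer = "COVID-19 pandemic"

# Build a result dict with the same keys the pipeline returned above.
start = context.find(answer)
result = {"start": start, "end": start + len(answer), "answer": answer}

# Slicing the context with the offsets recovers the answer text.
span = context[result["start"]:result["end"]]
print(span)

# The offsets also make it easy to show the surrounding text.
window = 20  # hypothetical number of characters to show on each side
snippet = context[max(0, result["start"] - window):result["end"] + window]
print(snippet)
```

With the real output, `olympic_wiki_text[635:652]` should give the same `'COVID-19 pandemic'` string.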


# 5. fill-mask

In [77]:
# Specify the task (fill-mask) and the model for the pipeline
unmasker = transformers.pipeline('fill-mask', model='bert-base-uncased')

# Pass an input sentence containing the [MASK] token to unmasker, the instantiated pipeline
unmasker("MLM and NSP is the [MASK] task of BERT.")


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'score': 0.2572782337665558,
  'token': 2364,
  'token_str': 'main',
  'sequence': 'mlm and nsp is the main task of bert.'},
 {'score': 0.2074074000120163,
  'token': 3078,
  'token_str': 'primary',
  'sequence': 'mlm and nsp is the primary task of bert.'},
 {'score': 0.06773321330547333,
  'token': 2034,
  'token_str': 'first',
  'sequence': 'mlm and nsp is the first task of bert.'},
 {'score': 0.06548508256673813,
  'token': 2430,
  'token_str': 'central',
  'sequence': 'mlm and nsp is the central task of bert.'},
 {'score': 0.061673976480960846,
  'token': 3937,
  'token_str': 'basic',
  'sequence': 'mlm and nsp is the basic task of bert.'}]

In [78]:
# Note that the model name has changed
unmasker = transformers.pipeline('fill-mask', model='distilbert-base-uncased')
unmasker("MLM and NSP is the [MASK] task of BERT.")


[{'score': 0.25902432203292847,
  'token': 3078,
  'token_str': 'primary',
  'sequence': 'mlm and nsp is the primary task of bert.'},
 {'score': 0.16309870779514313,
  'token': 2364,
  'token_str': 'main',
  'sequence': 'mlm and nsp is the main task of bert.'},
 {'score': 0.08182774484157562,
  'token': 4563,
  'token_str': 'core',
  'sequence': 'mlm and nsp is the core task of bert.'},
 {'score': 0.040237728506326675,
  'token': 7037,
  'token_str': 'dual',
  'sequence': 'mlm and nsp is the dual task of bert.'},
 {'score': 0.02484491653740406,
  'token': 4054,
  'token_str': 'principal',
  'sequence': 'mlm and nsp is the principal task of bert.'}]

In [79]:
# Note that the model name has changed
unmasker = transformers.pipeline('fill-mask', model='albert-base-v2')
unmasker("mlm and nsp is the [MASK] task of bert.")


Some weights of the model checkpoint at albert-base-v2 were not used when initializing AlbertForMaskedLM: ['albert.pooler.bias', 'albert.pooler.weight']
- This IS expected if you are initializing AlbertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing AlbertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'score': 0.04760162904858589,
  'token': 6612,
  'token_str': 'ultimate',
  'sequence': 'mlm and nsp is the ultimate task of bert.'},
 {'score': 0.024472415447235107,
  'token': 20766,
  'token_str': 'hardest',
  'sequence': 'mlm and nsp is the hardest task of bert.'},
 {'score': 0.023495109751820564,
  'token': 1256,
  'token_str': 'primary',
  'sequence': 'mlm and nsp is the primary task of bert.'},
 {'score': 0.02157513238489628,
  'token': 407,
  'token_str': 'main',
  'sequence': 'mlm and nsp is the main task of bert.'},
 {'score': 0.018087901175022125,
  'token': 18369,
  'token_str': 'foremost',
  'sequence': 'mlm and nsp is the foremost task of bert.'}]
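
To compare the three models side by side, the outputs above can be put into one structure. This is a small sketch in plain Python: the scores are copied from the outputs and rounded to four decimals, and the variable names (`predictions`, `shared`, `top5_mass`) are introduced here for illustration.

```python
# Top-5 token_str values and scores copied (rounded) from the three outputs above.
predictions = {
    "bert-base-uncased": {"main": 0.2573, "primary": 0.2074, "first": 0.0677,
                          "central": 0.0655, "basic": 0.0617},
    "distilbert-base-uncased": {"primary": 0.2590, "main": 0.1631, "core": 0.0818,
                                "dual": 0.0402, "principal": 0.0248},
    "albert-base-v2": {"ultimate": 0.0476, "hardest": 0.0245, "primary": 0.0235,
                       "main": 0.0216, "foremost": 0.0181},
}

# Tokens that every model placed in its top 5.
shared = set.intersection(*(set(p) for p in predictions.values()))

# Probability mass each model assigns to its top-5 candidates combined.
top5_mass = {name: round(sum(p.values()), 4) for name, p in predictions.items()}

for token in sorted(shared):
    scores = ", ".join(f"{name}={p[token]:.4f}" for name, p in predictions.items())
    print(f"{token}: {scores}")
print(top5_mass)
```

All three models rank `main` and `primary` in their top five, but ALBERT gives its top candidates much less combined probability than BERT or DistilBERT do, so its prediction for this sentence is far less concentrated.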