In [2]:
import warnings
warnings.filterwarnings('ignore')

### 1. Pipeline Library
- 1.1 pipeline("task name", model to use for that task)

- 1.2 If the model argument is omitted, the task's default model is used
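
The log output in the cells below warns that relying on the default model is not recommended in production. A minimal sketch of pinning both the model and the revision follows. It assumes the revision hash printed in this notebook's log (`af0f99b`) is still available on the Hub; the helper names `pinned_kwargs` and `load_pinned` are hypothetical and not part of transformers.

```python
def pinned_kwargs(task, model, revision=None):
    """Build keyword arguments for transformers.pipeline with an explicit model."""
    kwargs = {"task": task, "model": model}
    if revision is not None:
        kwargs["revision"] = revision
    return kwargs


def load_pinned(task, model, revision=None):
    # Imported lazily so pinned_kwargs can be used without transformers installed.
    from transformers import pipeline
    return pipeline(**pinned_kwargs(task, model, revision))


sentiment_kwargs = pinned_kwargs(
    "sentiment-analysis",
    "distilbert-base-uncased-finetuned-sst-2-english",
    revision="af0f99b",  # assumption: hash taken from this notebook's log output
)
print(sentiment_kwargs)
# clf = load_pinned(**sentiment_kwargs)  # downloads the model on first use
```

Pinning the revision should make the notebook reproducible even if the default model for a task changes in a later transformers release.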

In [3]:
from transformers import pipeline 

#### 1.1 Sentiment
- Default Model : distilbert-base-uncased-finetuned-sst-2-english

In [4]:
clf_sentiment = pipeline("sentiment-analysis")
print(clf_sentiment("this movie is so funny.."))
print(clf_sentiment(["I'm glad to meet you", "I don't think so"]))

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.9998550415039062}]
[{'label': 'POSITIVE', 'score': 0.9997770190238953}, {'label': 'NEGATIVE', 'score': 0.9888671040534973}]


In [18]:
kor_clf_sentiment = pipeline("sentiment-analysis", "snunlp/KR-FinBert-SC")
print(kor_clf_sentiment("아니 나 너무 화가나"))
print(kor_clf_sentiment(["롯데 자이언츠 오늘도 패배","롯데 자이언츠 2023년 11연승 성공, 이대로 우승 예상"]))

[{'label': 'negative', 'score': 0.6988539099693298}]
[{'label': 'negative', 'score': 0.9147351384162903}, {'label': 'neutral', 'score': 0.9993079900741577}]
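
The two models above use different label casing (`POSITIVE` vs. `negative`), and the Korean model's first prediction has a fairly low score. A small post-processing sketch, assuming a threshold of 0.8 is acceptable (the threshold and the helper name `normalize` are illustrative choices, not part of the pipeline API):

```python
def normalize(results, threshold=0.8):
    """Lower-case each label; replace low-confidence predictions with 'uncertain'."""
    normalized = []
    for result in results:
        label = result["label"].lower()
        normalized.append(label if result["score"] >= threshold else "uncertain")
    return normalized


# Results copied from the cell outputs above.
english = [
    {"label": "POSITIVE", "score": 0.9997770190238953},
    {"label": "NEGATIVE", "score": 0.9888671040534973},
]
korean = [{"label": "negative", "score": 0.6988539099693298}]

print(normalize(english))
print(normalize(korean))
```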


#### 1.2 Zero-shot Classification
- Default Model : facebook/bart-large-mnli

In [4]:
clf_zeroshot = pipeline("zero-shot-classification")
clf_zeroshot( "You are a very nice person",
                candidate_labels = ["compliment", "curse"])

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'sequence': 'You are a very nice person',
 'labels': ['compliment', 'curse'],
 'scores': [0.998981773853302, 0.001018223469145596]}
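
The result is a dict with parallel `labels` and `scores` lists. In the outputs shown here they appear sorted by score, but a sketch that does not rely on the ordering could look like this (the helper name `top_label` is hypothetical):

```python
def top_label(result):
    """Return the (label, score) pair with the highest score."""
    return max(zip(result["labels"], result["scores"]), key=lambda pair: pair[1])


# Result copied from the cell output above.
result = {
    "sequence": "You are a very nice person",
    "labels": ["compliment", "curse"],
    "scores": [0.998981773853302, 0.001018223469145596],
}

label, score = top_label(result)
print(label, round(score, 3))
```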

In [20]:
kor_clf_zeroshot = pipeline("zero-shot-classification","MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7")
kor_clf_zeroshot( "넌 정말 착한 사람이야",
                candidate_labels = ["칭찬", "욕설"])

{'sequence': '넌 정말 착한 사람이야',
 'labels': ['칭찬', '욕설'],
 'scores': [0.9879165887832642, 0.012083346955478191]}

#### 3. Text Generation
- Text Generation default model : GPT2

- parameters : max_length, num_return_sequences

In [21]:
generator = pipeline("text-generation")
print(generator("I'm a Korean. and love BTS",
                max_length=30,
                num_return_sequences=2))

No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'I\'m a Korean. and love BTS. When they release, it\'s really awesome."\n\nHowever, one thing about BTS\'s'}, {'generated_text': "I'm a Korean. and love BTS, BTS is my favorite BTS since it's on a huge wave now, and you can't"}]
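
Each `generated_text` starts with the prompt itself. A minimal sketch for keeping only the newly generated part, using one of the outputs above (the helper name `continuation` is hypothetical; the text-generation pipeline also has a `return_full_text` argument that may serve the same purpose):

```python
def continuation(prompt, generated_text):
    """Strip the prompt from the front of the generated text, if present."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):]
    return generated_text


prompt = "I'm a Korean. and love BTS"
# Second sequence copied from the cell output above.
generated = (
    "I'm a Korean. and love BTS, BTS is my favorite BTS since it's on a "
    "huge wave now, and you can't"
)

new_text = continuation(prompt, generated)
print(new_text)
```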


In [23]:
kor_generator = pipeline("text-generation","skt/kogpt2-base-v2")
print(kor_generator("롯데 자이언츠는 2022년 ssg를 꺾고",
                max_length=30,
                num_return_sequences=2))


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


[{'generated_text': '롯데 자이언츠는 2022년 ssg를 꺾고 KBO리그 우승을 노린다.\n2월 25일 KIA 타이거즈전을 시작으로'}, {'generated_text': '롯데 자이언츠는 2022년 ssg를 꺾고 리그 2위를 달리며 선두를 달렸다.\n선두이던 2위 SK'}]


#### 4. Mask Filling
- Mask filling default model : distilroberta-base

- The mask token differs by model: RoBERTa uses `<mask>`, BERT uses `[MASK]`

- default top_k : 5
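
Because the mask token depends on the model, hard-coding it in the input string is easy to get wrong. A sketch of a model-agnostic template, assuming a `{mask}` placeholder convention (the placeholder and the helper name `with_mask` are illustrative; with a loaded pipeline the token should be available as `unmasker.tokenizer.mask_token`):

```python
def with_mask(template, mask_token):
    """Replace the '{mask}' placeholder with the model-specific mask token."""
    if "{mask}" not in template:
        raise ValueError("template must contain a '{mask}' placeholder")
    return template.replace("{mask}", mask_token)


# Mask tokens for the two models used in this section.
MASK_TOKENS = {
    "distilroberta-base": "<mask>",
    "klue/bert-base": "[MASK]",
}

template = "My {mask} is a BTS"
for model_name, mask_token in MASK_TOKENS.items():
    print(model_name, "->", with_mask(template, mask_token))
```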

In [6]:
unmasker = pipeline("fill-mask")
unmasker("I'am  a korean famous boy group and have song called 'You Only Live'. My <mask> is a BTS",
top_k = 2)

No model was supplied, defaulted to distilroberta-base and revision ec58a5b (https://huggingface.co/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'score': 0.4297890067100525,
  'token': 766,
  'token_str': ' name',
  'sequence': "I'am  a korean famous boy group and have song called 'You Only Live'. My name is a BTS"},
 {'score': 0.05251489207148552,
  'token': 23390,
  'token_str': ' idol',
  'sequence': "I'am  a korean famous boy group and have song called 'You Only Live'. My idol is a BTS"}]

In [26]:
kor_unmasker = pipeline("fill-mask", "klue/bert-base")
kor_unmasker("롯데 자이언츠는 2022년 [MASK]을 향해 도전한다. 20년간 하지못한 우승 이번에는 할 수 있을까? ",
    top_k = 2)


Some weights of the model checkpoint at klue/bert-base were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'score': 0.14065705239772797,
  'token': 4564,
  'token_str': '우승',
  'sequence': '롯데 자이언츠는 2022년 우승 을 향해 도전한다. 20년간 하지못한 우승 이번에는 할 수 있을까?'},
 {'score': 0.05443330481648445,
  'token': 1,
  'token_str': '[UNK]',
  'sequence': '롯데 자이언츠는 2022년 을 향해 도전한다. 20년간 하지못한 우승 이번에는 할 수 있을까?'}]

#### 5. Named Entity Recognition
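- default model : dbmdz/bert-large-cased-finetuned-conll03-english

- grouped_entities : merges the tokens that belong to the same entity into a single entry (recent transformers versions deprecate this flag in favor of `aggregation_strategy`)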

In [7]:
ner = pipeline("ner", grouped_entities=True)
ner("My name is BTS and I song 'You Only live' and I live in Busan")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'entity_group': 'ORG',
  'score': 0.9910667,
  'word': 'BTS',
  'start': 11,
  'end': 14},
 {'entity_group': 'MISC',
  'score': 0.6939592,
  'word': 'You Only live',
  'start': 27,
  'end': 40},
 {'entity_group': 'LOC',
  'score': 0.99817884,
  'word': 'Busan',
  'start': 56,
  'end': 61}]
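
The `start` and `end` values are character offsets into the input string. A short sketch that recovers each entity from the original text using the output above (the helper name `entity_spans` is hypothetical):

```python
text = "My name is BTS and I song 'You Only live' and I live in Busan"

# Entities copied from the cell output above (scores omitted for brevity).
entities = [
    {"entity_group": "ORG", "word": "BTS", "start": 11, "end": 14},
    {"entity_group": "MISC", "word": "You Only live", "start": 27, "end": 40},
    {"entity_group": "LOC", "word": "Busan", "start": 56, "end": 61},
]


def entity_spans(text, entities):
    """Map each entity to the exact substring of the input it covers."""
    return [(e["entity_group"], text[e["start"]:e["end"]]) for e in entities]


spans = entity_spans(text, entities)
print(spans)
```

Slicing by offset keeps the original casing and spacing, which can differ from the `word` field when the tokenizer normalizes text.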

#### 6. Question Answering
- default model : distilbert-base-cased-distilled-squad

- input format : (question="the question", context="the passage to search")

In [8]:
QA = pipeline("question-answering")
QA(question="What is my favorite sports team",
    context="I love Lotte Giants and hate KIA tigers")

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'score': 0.684985876083374, 'start': 7, 'end': 19, 'answer': 'Lotte Giants'}

In [28]:
kor_QA = pipeline("question-answering","ainize/klue-bert-base-mrc")
kor_QA(question="롯데 자이언츠의 2023년 결과는?",
    context="2023년 ssg는 준우승, 롯데 자이언츠는 우승을 했다.")

{'score': 0.5984599590301514, 'start': 11, 'end': 14, 'answer': '준우승'}
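
The answer is always a span of the context, located by `start` and `end`, and the score can be used to reject weak answers. Note that the Korean model above picks the span for ssg (준우승) rather than for Lotte Giants, with a lower score than the English example. A sketch with an assumed cutoff of 0.65 (the cutoff and the helper name `extract_answer` are illustrative only):

```python
def extract_answer(result, context, min_score=0.65):
    """Return the answer span from the context, or None if the score is too low."""
    if result["score"] < min_score:
        return None
    return context[result["start"]:result["end"]]


# Results and contexts copied from the cells above.
en_context = "I love Lotte Giants and hate KIA tigers"
en_result = {"score": 0.684985876083374, "start": 7, "end": 19, "answer": "Lotte Giants"}

ko_context = "2023년 ssg는 준우승, 롯데 자이언츠는 우승을 했다."
ko_result = {"score": 0.5984599590301514, "start": 11, "end": 14, "answer": "준우승"}

print(extract_answer(en_result, en_context))
print(extract_answer(ko_result, ko_context))
```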

#### 7. Summarization
- default model : sshleifer/distilbart-cnn-12-6

In [9]:
summarizer = pipeline("summarization")

summarizer(
"""
 America has changed dramatically during recent years. Not only has the number of 
    graduates in traditional engineering disciplines such as mechanical, civil, 
    electrical, chemical, and aeronautical engineering declined, but in most of 
    the premier American universities engineering curricula now concentrate on 
    and encourage largely the study of engineering science. As a result, there 
    are declining offerings in engineering subjects dealing with infrastructure, 
    the environment, and related issues, and greater concentration on high 
    technology subjects, largely supporting increasingly complex scientific 
    developments. While the latter is important, it should not be at the expense 
    of more traditional engineering.

    Rapidly developing economies such as China and India, as well as other 
    industrial countries in Europe and Asia, continue to encourage and advance 
    the teaching of engineering. Both China and India, respectively, graduate 
    six and eight times as many traditional engineers as does the United States. 
    Other industrial countries at minimum maintain their output, while America 
    suffers an increasingly serious decline in the number of engineering graduates 
    and a lack of well-educated engineers.
"""
)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'summary_text': ' America has changed dramatically during recent years . The number of engineering graduates in the U.S. has declined in traditional engineering disciplines such as mechanical, civil, electrical, chemical, and aeronautical engineering . Rapidly developing economies such as China and India, as well as other industrial countries in Europe and Asia, continue to encourage and advance the teaching of engineering .'}]

In [29]:
kor_summarizer = pipeline("summarization","ainize/kobart-news")

kor_summarizer(
"""
국내 전반적인 경기침체로 상가 건물주의 수익도 전국적인 감소세를 보이고 있는 것으로 나타났다.수익형 부동산 연구개발기업 상가정보연구소는 한국감정원 통계를 분석한 결과 전국 중대형 상가 순영업소득(부동산에서 발생하는 임대수입, 기타수입에서 제반 경비를 공제한 순소득)이 1분기 ㎡당 3만4200원에서 3분기 2만5800원으로 감소했다고 17일 밝혔다. 수도권, 세종시, 지방광역시에서 순영업소득이 가장 많이 감소한 지역은 3분기 1만3100원을 기록한 울산으로, 1분기 1만9100원 대비 31.4% 감소했다. 이어 대구(-27.7%), 서울(-26.9%), 광주(-24.9%), 부산(-23.5%), 세종(-23.4%), 대전(-21%), 경기(-19.2%), 인천(-18.5%) 순으로 감소했다. 지방 도시의 경우도 비슷했다. 경남의 3분기 순영업소득은 1만2800원으로 1분기 1만7400원 대비 26.4% 감소했으며 제주(-25.1%), 경북(-24.1%), 충남(-20.9%), 강원(-20.9%), 전남(-20.1%), 전북(-17%), 충북(-15.3%) 등도 감소세를 보였다. 조현택 상가정보연구소 연구원은 "올해 내수 경기의 침체된 분위기가 유지되며 상가, 오피스 등을 비롯한 수익형 부동산 시장의 분위기도 경직된 모습을 보였고 오피스텔, 지식산업센터 등의 수익형 부동산 공급도 증가해 공실의 위험도 늘었다"며 "실제 올 3분기 전국 중대형 상가 공실률은 11.5%를 기록하며 1분기 11.3% 대비 0.2% 포인트 증가했다"고 말했다. 그는 "최근 소셜커머스(SNS를 통한 전자상거래),  음식 배달 중개 애플리케이션, 중고 물품 거래 애플리케이션 등의 사용 증가로 오프라인 매장에 영향을 미쳤다"며 "향후 지역, 콘텐츠에 따른 상권 양극화 현상은 심화될 것으로 보인다"고 덧붙였다.
"""
)

You passed along `num_labels=3` with an incompatible id to label map: {'0': 'NEGATIVE', '1': 'POSITIVE'}. The number of labels wil be overwritten to 2.

[{'summary_text': '수익형 부동산 연구개발기업 상가정보연구소는 한국감정원 통계를 분석한 결과 전국 중대형 상가 순영업소득이 1분기 m2당 3만4200원에서 3분기 2만5800원으로 감소했다고 17일 밝혔는데 국내 전반적인 경기침체로 건물주의 수익도 전국적인 감소세를 보이고 있는 것으로 나타났고 전국 중대형 상가 순영업소득이 3분기 m2당 3만4200원에서 3분기 2만5800원으로 감소했다고 밝혔다.'}]

#### 8. Translation
- Currently fails with ValueError: This tokenizer cannot be instantiated. Please make sure you have `sentencepiece` installed in order to use this tokenizer. Still working on a fix

- A model has to be specified explicitly

- ex) pipeline("translation_fr_to_en")
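
The error above suggests the `sentencepiece` package is missing from the environment. A minimal sketch that checks for it before building the pipeline follows. The model name `Helsinki-NLP/opus-mt-fr-en` is an assumption (it is not named in this notebook), and the helper names are hypothetical.

```python
import importlib.util


def has_module(name):
    """Return True if the named top-level module can be imported."""
    return importlib.util.find_spec(name) is not None


def load_fr_en_translator(model="Helsinki-NLP/opus-mt-fr-en"):
    # Assumption: this model is a suitable choice for French-to-English translation.
    if not has_module("sentencepiece"):
        raise RuntimeError(
            "sentencepiece is not installed; try `pip install sentencepiece` "
            "and restart the kernel"
        )
    # Imported lazily so the check above works without transformers installed.
    from transformers import pipeline
    return pipeline("translation_fr_to_en", model=model)


print("sentencepiece installed:", has_module("sentencepiece"))
```

If the check prints `True` and the error persists, restarting the kernel after installation may be needed, since a running kernel does not always pick up newly installed packages.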

In [1]:
#translator = pipeline("translation_fr_to_en")
#translator("Ce cours est produit par Hugging Face.")