# pipeline
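The Hugging Face `pipeline` API is a high-level wrapper for running inference with pretrained models. The wider Hugging Face ecosystem provides the following:

- Pretrained models: pretrained models, including chatbot models, for many languages and domains.
- Model training: tools for training custom models.
- Model evaluation: tools for evaluating model performance.
- Model deployment: tools for deploying models to a production environment.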

In [1]:
import transformers
print(transformers.__version__)


4.37.1


## 002. Sentiment analysis
- Use a DistilBERT model to run sentiment analysis and decide whether each of the sentences below is positive or negative.

- "I like Olympic games as it's very exciting"
- "I'm against to hold Olympic games in Tokyo in terms of preventing the COVID19 to be spread"

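### cf. Object? Instance?
- Python is an object-oriented programming language, and almost everything in it is an object. An object bundles data with the functions that operate on it, and a class is a mold for stamping out objects. With a class, you can produce many objects that differ only slightly from one another.
- An instance is an object created from a class. Creating an object from a class is called instantiation.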
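
The class/instance distinction above can be sketched in a few lines. `Greeter` is a hypothetical class invented for this illustration; it is not part of transformers. In the same way, the `pipeline(...)` calls in the cells that follow return an instance of a pipeline class.

```python
class Greeter:
    """A class is the mold: it defines the data and functions each instance gets."""

    def __init__(self, name):
        self.name = name  # data stored on this particular instance

    def greet(self):
        return f"Hello, {self.name}!"


# Instantiation: calling the class creates a new object (an instance).
alice = Greeter("Alice")
bob = Greeter("Bob")

print(alice.greet())               # uses the data held by the alice instance
print(isinstance(alice, Greeter))  # both objects come from the same class
print(alice is bob)                # but they are two distinct objects
```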

### DistilBERT?
- "Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT"
- DistilBERT has 40% fewer parameters than google-bert/bert-base-uncased and runs 60% faster, while preserving over 95% of BERT's performance as measured on the GLUE language understanding benchmark.


In [2]:
from transformers import pipeline

sentiment = pipeline('sentiment-analysis')




No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
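
The warning above recommends naming the model and revision explicitly. A minimal sketch of that, assuming the model name and revision reported in the log are the ones you want to pin. `build_sentiment_pipeline` is a hypothetical helper name, and the import sits inside it only so that the constants can be inspected without loading the library.

```python
# Values copied from the log message above.
MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"
MODEL_REVISION = "af0f99b"


def build_sentiment_pipeline():
    """Create the sentiment pipeline with an explicit model and revision."""
    from transformers import pipeline  # lazy import; downloads the model on first use

    return pipeline(
        "sentiment-analysis",
        model=MODEL_NAME,
        revision=MODEL_REVISION,
    )
```

Calling `sentiment = build_sentiment_pipeline()` should then behave like the cell above, without the "No model was supplied" message.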


In [3]:
sentiment.model

DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): MultiHeadSelfAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)
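
The layer shapes in the printout are enough to estimate part of the model's size by hand. The sketch below counts only the layers that are visible above (the embeddings and the attention projections). The feed-forward block is cut off in the printout, so it is deliberately left out and the totals are partial. The helper names are hypothetical.

```python
def embedding_params(num_embeddings, dim):
    """An Embedding(num_embeddings, dim) layer holds one dim-sized vector per entry."""
    return num_embeddings * dim


def linear_params(in_features, out_features, bias=True):
    """A Linear layer holds a weight matrix plus an optional bias vector."""
    return in_features * out_features + (out_features if bias else 0)


word_emb = embedding_params(30522, 768)       # word_embeddings
pos_emb = embedding_params(512, 768)          # position_embeddings
attn_per_block = 4 * linear_params(768, 768)  # q_lin, k_lin, v_lin, out_lin
attn_total = 6 * attn_per_block               # "(0-5): 6 x TransformerBlock"

print(f"word embeddings:     {word_emb:,}")
print(f"position embeddings: {pos_emb:,}")
print(f"attention, 1 block:  {attn_per_block:,}")
print(f"attention, 6 blocks: {attn_total:,}")
```

The word embeddings alone come to about 23.4 million parameters, more than the attention projections of all six blocks combined (about 14.2 million).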
 

In [6]:
sentences = ["I like Olympic games as it's very exciting", "I'm against to hold Olympic games in Tokyo in terms of preventing the COVID19 to be spread"]
print(sentiment(sentences[0]))
print(sentiment(sentences[1]))


[{'label': 'POSITIVE', 'score': 0.9997889399528503}]
[{'label': 'NEGATIVE', 'score': 0.9758421182632446}]
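
Each result is a list holding one dict with a `label` and a `score`. A minimal sketch of post-processing that output, using the two results printed above as literal data so that it runs without loading the model. `signed_score` is a hypothetical helper, not part of transformers.

```python
# The two results printed above, copied as plain data.
results = [
    [{'label': 'POSITIVE', 'score': 0.9997889399528503}],
    [{'label': 'NEGATIVE', 'score': 0.9758421182632446}],
]


def signed_score(result):
    """Map a result to one number: positive for POSITIVE, negative for NEGATIVE."""
    top = result[0]  # one dict per input sentence
    sign = 1.0 if top['label'] == 'POSITIVE' else -1.0
    return sign * top['score']


for result in results:
    print(result[0]['label'], round(signed_score(result), 4))
```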


## 003
- Create a question-answering pipeline named `qa`, then pass the question text as the keyword argument `question` and the source text as the keyword argument `context`.

cf. keyword argument
A value passed to a function together with the name of the parameter it is bound to.
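
A short illustration of keyword arguments in plain Python. `describe` is a hypothetical function whose parameter names mirror the `question` and `context` arguments used in the next cell.

```python
def describe(question, context):
    """Combine the two arguments into one string."""
    return f"Q: {question} | C: {context}"


# Positional arguments are matched by order.
positional = describe("Who?", "Some text")

# Keyword arguments are matched by name, so the order no longer matters.
keyword = describe(context="Some text", question="Who?")

print(positional == keyword)
```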

In [7]:
from transformers import pipeline

qa = pipeline('question-answering')

# Question
question = "How do you think about human?"

# Text to analyze
context = """

Humans (Homo sapiens) are bipedal primates characterized by their high intelligence, complex social structures, and ability to use language. Here's a breakdown of some key aspects of what makes us human:

Biology:

Species: Homo sapiens, the most recent member of the genus Homo.
Anatomy: Upright walking posture, large and complex brains, opposable thumbs, and advanced vocal cords.
Brain: The human brain is the largest relative to body size of any extant species. It allows for complex cognitive abilities like reasoning, problem-solving, abstract thinking, and creativity.
Senses: Humans have well-developed senses of sight, hearing, touch, smell, and taste, enabling them to interact with their environment effectively.
Behavior:

Social: Humans are highly social creatures who form complex relationships with each other. We cooperate in groups to achieve common goals, share knowledge and resources, and provide emotional support.
Language: Humans have a unique ability to use complex language for communication. Language allows us to share ideas, thoughts, and feelings with others, and it plays a vital role in our social interactions and cultural development.
Culture: Humans create and transmit culture across generations. Culture includes our customs, traditions, beliefs, art, music, and technology. It shapes our behavior and identity.
Tool Use: Humans have a long history of using and developing tools. Tools allow us to modify our environment, gather food, and build complex structures.
Evolution:

Humans evolved from ape-like ancestors over millions of years. The exact timeline and details are still being studied, but key factors in our evolution include:
Bipedalism (walking upright)
Increased brain size
Development of language
Use of tools
Uniqueness:

While humans share some characteristics with other animals, several aspects set us apart:

High Intelligence: Our large brains allow for complex cognitive abilities that enable us to solve problems, adapt to new situations, and make plans for the future.
Language: Our ability to use complex language allows for sophisticated communication and the transmission of knowledge across generations.
Culture: Humans create and transmit culture, which shapes our behavior and identity.
Self-Awareness: Humans are aware of ourselves as individuals and have a sense of agency over our actions.
The study of humans is a vast field that encompasses many disciplines, including biology, anthropology, psychology, sociology, and history. Scientists continue to learn more about what makes us human and how we came to be the way we are.
"""


print(qa(question=question, context=context))


No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'score': 0.0077745309099555016, 'start': 416, 'end': 467, 'answer': 'largest relative to body size of any extant species'}
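
In the result above, `end - start` equals the length of `answer`, which suggests that `start` and `end` are character offsets into `context`. The score is also very low (under 0.01), so the model is not confident in this span. A minimal sketch of using both fields, assuming that reading of the offsets. The short `context` here is a hand-built stand-in, the offsets are recomputed for it, and only the score is copied from the output above. `extract_answer` and its 0.1 threshold are hypothetical choices.

```python
def extract_answer(context, result, min_score=0.1):
    """Return the answer span, or None when the score is below min_score."""
    if result["score"] < min_score:
        return None
    return context[result["start"]:result["end"]]


# Hand-built stand-in for the long context used in the cell above.
context = "The human brain is the largest relative to body size of any extant species."
answer = "largest relative to body size of any extant species"
start = context.find(answer)
result = {
    "score": 0.0077745309099555016,  # score copied from the output above
    "start": start,
    "end": start + len(answer),
    "answer": answer,
}

print(context[result["start"]:result["end"]] == result["answer"])
print(extract_answer(context, result))                 # rejected by the threshold
print(extract_answer(context, result, min_score=0.0))  # accepted when no threshold applies
```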


In [8]:
# Check the default model used by qa
qa.model

DistilBertForQuestionAnswering(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(28996, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): MultiHeadSelfAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)
      

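# DistilBERT fine-tuning and evaluation
004 ~ 013

- The great strength of Transformer-based pretrained models is that pretraining has already taught them the relationships between tokens and sentences from large-scale unlabeled text.
- Fine-tuning on a small labeled dataset afterwards can then reach high accuracy.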

In [9]:
import torch
torch.cuda.is_available()

False

In [11]:
import torch

# Check whether CUDA is available, and fall back to the CPU when it is not
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(torch.cuda.is_available())


False
