<a href="https://colab.research.google.com/github/hjae0520/class2022Spring/blob/main/huggingface_gradio.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install gradio
import gradio as gr

In [None]:
gr.Interface.load("huggingface/gpt2").launch();

### [Image classification](https://huggingface.co/tasks/image-classification)
: Image classification is the task of assigning a label or class to an entire image. Each image is expected to have only one class. Image classification models take an image as input and return a prediction about which class the image belongs to.

e.g. https://huggingface.co/google/vit-base-patch16-224 \
How to use

In [None]:
!pip install transformers

In [4]:
from transformers import ViTFeatureExtractor, ViTForImageClassification
from PIL import Image
import requests

url = 'http://images.cocodataset.org/val2017/000000039769.jpg'    
image = Image.open(requests.get(url, stream=True).raw)

# from_pretrained loads weights that were already trained on existing data (the model).
# The feature extractor preprocesses the image into the tensor the model expects.
feature_extractor = ViTFeatureExtractor.from_pretrained('google/vit-base-patch16-224')
# The image classification model predicts a class from those inputs.
model = ViTForImageClassification.from_pretrained('google/vit-base-patch16-224')

# A raw image is just an array of pixel values; the feature extractor resizes and normalizes it.
inputs = feature_extractor(images=image, return_tensors="pt")
# Feed the preprocessed inputs to the model; the output holds the classification scores.
outputs = model(**inputs)
logits = outputs.logits
# The model predicts one of the 1000 ImageNet classes. The logits are unnormalized scores,
# one per class; applying softmax turns them into probabilities that sum to 1.
# argmax returns the index of the highest-scoring class.
predicted_class_idx = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])

Predicted class: Egyptian cat
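
The logits returned by the model are raw scores, not probabilities. The sketch below shows, under simplified assumptions, how softmax turns such scores into probabilities that sum to 1 and why `argmax` picks the same class either way. The three labels and logit values are made up for illustration; the real model outputs 1000 values.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() numerically stable without changing the result.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical labels and logits, for illustration only.
id2label = {0: "tabby cat", 1: "Egyptian cat", 2: "remote control"}
logits = [2.0, 3.5, 0.5]

probs = softmax(logits)
# argmax: the index of the largest probability (also the index of the largest logit).
predicted_class_idx = max(range(len(probs)), key=probs.__getitem__)
print("Predicted class:", id2label[predicted_class_idx])
print("Sum of probabilities:", round(sum(probs), 6))
```

Because softmax preserves the ordering of the scores, taking `argmax` directly on the logits, as the cell above does, gives the same predicted class.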


Demo in Gradio

In [None]:
def func(image):
  # Wraps the script above in a function: the input is an image, the output is the predicted class.
  feature_extractor = ViTFeatureExtractor.from_pretrained('google/vit-base-patch16-224')
  model = ViTForImageClassification.from_pretrained('google/vit-base-patch16-224')

  inputs = feature_extractor(images=image, return_tensors="pt")
  outputs = model(**inputs)
  logits = outputs.logits
  # model predicts one of the 1000 ImageNet classes
  predicted_class_idx = logits.argmax(-1).item()
  predicted_class = model.config.id2label[predicted_class_idx]
  return predicted_class

In [None]:
import os            # download the images used as examples
url = "https://raw.githubusercontent.com/hsnam95/class2022Spring/main/tiger.jpg"
os.system("curl " + url + " > tiger.jpg")
url = "https://raw.githubusercontent.com/hsnam95/class2022Spring/main/dog.jpg"
os.system("curl " + url + " > dog.jpg")

In [None]:
# examples is optional, but if given it must be a list; the input and output types must be specified.
gr.Interface(fn=func, inputs='image', outputs='text', examples=['tiger.jpg', 'dog.jpg']).launch()

### [Fill-Mask](https://huggingface.co/tasks/fill-mask)
: Masked language modeling is the task of masking some of the words in a sentence and predicting which words should replace those masks. These models are useful when we want a statistical understanding of the language on which the model was trained.

e.g. https://huggingface.co/bert-base-uncased \
How to use

In [None]:
!pip install transformers

In [None]:
from transformers import pipeline
unmasker = pipeline('fill-mask', model='bert-base-uncased')
unmasker("Hello I'm a [MASK] model.")

Demo in Gradio

In [None]:
import pandas as pd
# The output of the script above is a list of five dictionaries. Passing that list to
# pd.DataFrame builds a table in which each dictionary key becomes a column.
def func(text):
  unmasker = pipeline('fill-mask', model='bert-base-uncased')
  result = unmasker(text)
  df = pd.DataFrame(result)
  return df
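
To make the list-of-dictionaries shape concrete, here is a minimal pure-Python sketch of the row-to-column rearrangement that a dataframe performs. The keys mirror those of a fill-mask result, but the rows themselves are made-up placeholder values, not real model output.

```python
# Hypothetical fill-mask results: same keys as the pipeline output, made-up values.
result = [
    {"score": 0.6, "token": 101, "token_str": "first", "sequence": "hello i'm a first model."},
    {"score": 0.4, "token": 202, "token_str": "second", "sequence": "hello i'm a second model."},
]

# Each dictionary is one row; collecting the values key by key yields one column per key.
columns = {key: [row[key] for row in result] for key in result[0]}

for name, values in columns.items():
    print(name, values)
```

`pd.DataFrame(result)` produces the same layout, with one row per dictionary and one column per key.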

In [None]:
examples = ["Hello I'm a [MASK] model.", "It is raining outside. I feel [MASK]."]

In [None]:
gr.Interface(fn=func, inputs='text', outputs='dataframe', examples = examples).launch()

### [Token classification](https://huggingface.co/tasks/token-classification)
: Token classification is a natural language understanding task in which a label is assigned to some tokens in a text. Some popular token classification subtasks are Named Entity Recognition (NER) and Part-of-Speech (PoS) tagging. NER models could be trained to identify specific entities in a text, such as dates, individuals and places; and PoS tagging would identify, for example, which words in a text are verbs, nouns, and punctuation marks.

> NER: assigns labels to four important types of named entities

e.g. https://huggingface.co/dslim/bert-base-NER \
How to use

In [None]:
!pip install transformers

This model recognizes four types of entities: location (LOC), organization (ORG), person (PER), and miscellaneous (MISC).

In [None]:
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

# from_pretrained loads a pretrained model.
# 1) AutoTokenizer: splits the sentence into tokens (words or word pieces).
tokenizer = AutoTokenizer.from_pretrained("dslim/bert-base-NER")
# 2) AutoModelForTokenClassification: recognizes which of those tokens are named entities.
model = AutoModelForTokenClassification.from_pretrained("dslim/bert-base-NER")

# nlp combines the tokenizer and the classification model; the same pipeline is used in the function below.
nlp = pipeline("ner", model=model, tokenizer=tokenizer)
example = "My name is Wolfgang and I live in Berlin"

# Pass the example sentence to nlp, which runs both models together.
ner_results = nlp(example)
print(ner_results)
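
The printed result is a list of dictionaries, one per recognized token. The sketch below shows one way to turn such a list into readable (text, entity type) pairs. It assumes each dictionary carries `entity`, `start`, and `end` keys and that tags look like `B-PER`; the sample results and scores are hand-written for illustration, and `summarize` is a hypothetical helper, not part of the library.

```python
# The four entity types named above, keyed by their abbreviations.
ENTITY_TYPES = {"LOC": "location", "ORG": "organization", "PER": "person", "MISC": "miscellaneous"}

text = "My name is Wolfgang and I live in Berlin"

# Hand-written results shaped like the pipeline output; the scores are made up.
ner_results = [
    {"entity": "B-PER", "score": 0.99, "index": 4, "word": "Wolfgang", "start": 11, "end": 19},
    {"entity": "B-LOC", "score": 0.99, "index": 9, "word": "Berlin", "start": 34, "end": 40},
]

def summarize(text, results):
    """Return (text span, entity type) pairs for token-level NER results."""
    summary = []
    for item in results:
        # A tag such as "B-PER" ends with the entity type abbreviation.
        tag = item["entity"].split("-")[-1]
        # start and end are character offsets into the original sentence.
        span = text[item["start"]:item["end"]]
        summary.append((span, ENTITY_TYPES.get(tag, tag)))
    return summary

print(summarize(text, ner_results))
```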

Demo in Gradio

In [None]:
import pandas as pd
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline
def func(text):
  tokenizer = AutoTokenizer.from_pretrained("dslim/bert-base-NER")
  model = AutoModelForTokenClassification.from_pretrained("dslim/bert-base-NER")
  nlp = pipeline("ner", model=model, tokenizer=tokenizer)
  result = nlp(text)
  # The result is a list of dictionaries, so it can be passed straight to pd.DataFrame.
  df = pd.DataFrame(result)
  return df

In [None]:
examples = ["My name is Wolfgang and I live in Berlin", "I will visit Seoul to see Chris"]

In [None]:
gr.Interface(fn=func, inputs='text', outputs='dataframe', examples = examples).launch()

### [Sentence similarity](https://huggingface.co/tasks/sentence-similarity)
: Sentence Similarity is the task of determining how similar two texts are. Sentence similarity models convert input texts into vectors (embeddings) that capture semantic information and calculate how close (similar) they are to each other. This task is particularly useful for information retrieval and clustering/grouping.

e.g. https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 \
How to use

In [None]:
!pip install sentence_transformers

In [None]:
from sentence_transformers import SentenceTransformer, util
sentences = ["This is an example sentence", "it is one example writing"]

model = SentenceTransformer('sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2')
embeddings = model.encode(sentences)
print(embeddings)                               # shape (2, 384): one 384-dimensional vector per sentence

In [5]:
len(embeddings[1])

384

In [None]:
# Cosine similarity measures how similar two vectors are: cos 0° = 1 (same direction), cos 90° = 0 (no similarity).
# However many dimensions the vectors have, the angle between them at the origin is always defined.
cosine_scores = util.pytorch_cos_sim(embeddings[0], embeddings[1])
cosine_scores
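
For intuition, cosine similarity can be computed by hand as the dot product of two vectors divided by the product of their lengths. The sketch below is a plain-Python illustration with short made-up vectors, not the library's implementation; `cosine_similarity` is a name chosen here for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Same direction (one vector is a multiple of the other): angle 0 degrees.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
# Perpendicular vectors: angle 90 degrees.
perpendicular = cosine_similarity([1.0, 0.0], [0.0, 1.0])

print(round(same_direction, 6), round(perpendicular, 6))
```

The same formula applies unchanged to the 384-dimensional sentence embeddings, which is why the score depends only on the direction of the vectors and not on their length.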

Demo in Gradio

In [None]:
def func(text1, text2):
  from sentence_transformers import SentenceTransformer, util
  model = SentenceTransformer('sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2')
  embeddings = model.encode([text1, text2])
  cosine_scores = util.pytorch_cos_sim(embeddings[0], embeddings[1])
  # pytorch_cos_sim returns a 1x1 tensor; convert it to a plain float for the 'number' output.
  return cosine_scores.item()

In [None]:
examples = [["This is an example sentence", "it is one example writing"], ["A frog is hopping near the pond", "I love Korean Food"]]

In [None]:
gr.Interface(fn=func, inputs=['text', 'text'], outputs='number', examples = examples).launch()