## Sentence Classification

#### Sentiment Analysis

In [1]:
import transformers, torch
from transformers import pipeline

print(transformers.__version__)
print(torch.cuda.is_available())


4.46.3
True


In [3]:
# Create a sentiment-classification pipeline (loads a fine-tuned DistilBERT by default)
clf = pipeline("sentiment-analysis")

# Classify a single sentence directly
print(clf("The acting was great and the story was touching."))   # → POSITIVE
print(clf("The plot was boring and the pacing was slow."))       # → NEGATIVE

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


[{'label': 'POSITIVE', 'score': 0.9998830556869507}]
[{'label': 'NEGATIVE', 'score': 0.9998018145561218}]
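
The log above warns that no model or `device` was specified, so the pipeline fell back to a default checkpoint and ran on CPU. A minimal sketch of pinning both follows. The model name and revision are copied from the log message; the helper name `build_classifier` and the choice of GPU index 0 are assumptions made for illustration.

```python
# Model ID and revision as reported in the pipeline's log message.
MODEL_ID = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"
REVISION = "714eb0f"


def build_classifier():
    """Hypothetical helper: build the pipeline with an explicit model and device."""
    import torch
    from transformers import pipeline

    # Assumption: use the first GPU if one is visible, otherwise the CPU (-1).
    device = 0 if torch.cuda.is_available() else -1
    return pipeline(
        "sentiment-analysis",
        model=MODEL_ID,
        revision=REVISION,
        device=device,
    )
```

Calling `clf = build_classifier()` and then `clf("...")` should behave like the cell above, without the default-model and device warnings.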


#### Named-Entity Recognition (NER)

In [4]:
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("dslim/bert-base-NER")
model = AutoModelForTokenClassification.from_pretrained("dslim/bert-base-NER")

nlp = pipeline("ner", model=model, tokenizer=tokenizer)
example = "My name is Wolfgang and I live in Berlin"

ner_results = nlp(example)
for entity in ner_results:
    print(f"Entity: {entity['word']}, Score: {entity['score']:.2f}, Label: {entity['entity']}")


Some weights of the model checkpoint at dslim/bert-base-NER were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


Entity: Wolfgang, Score: 1.00, Label: B-PER
Entity: Berlin, Score: 1.00, Label: B-LOC


In [5]:
example = "My name is Sylvain and I work at Hugging Face in Brooklyn."
print(example)
ner_results = nlp(example)
print(ner_results)
for entity in ner_results:
    print(f"Entity: {entity['word']}, Score: {entity['score']:.2f}, Label: {entity['entity']}")


My name is Sylvain and I work at Hugging Face in Brooklyn.
[{'entity': 'B-PER', 'score': 0.9986273, 'index': 4, 'word': 'S', 'start': 11, 'end': 12}, {'entity': 'B-PER', 'score': 0.93460417, 'index': 5, 'word': '##yl', 'start': 12, 'end': 14}, {'entity': 'B-PER', 'score': 0.7915617, 'index': 6, 'word': '##va', 'start': 14, 'end': 16}, {'entity': 'B-PER', 'score': 0.90470797, 'index': 7, 'word': '##in', 'start': 16, 'end': 18}, {'entity': 'B-ORG', 'score': 0.96700376, 'index': 12, 'word': 'Hu', 'start': 33, 'end': 35}, {'entity': 'B-ORG', 'score': 0.88534623, 'index': 13, 'word': '##gging', 'start': 35, 'end': 40}, {'entity': 'I-ORG', 'score': 0.9884615, 'index': 14, 'word': 'Face', 'start': 41, 'end': 45}, {'entity': 'B-LOC', 'score': 0.9971419, 'index': 16, 'word': 'Brooklyn', 'start': 49, 'end': 57}]
Entity: S, Score: 1.00, Label: B-PER
Entity: ##yl, Score: 0.93, Label: B-PER
Entity: ##va, Score: 0.79, Label: B-PER
Entity: ##in, Score: 0.90, Label: B-PER
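Entity: Hu, Score: 0.97, Label: B-ORG
Entity: ##gging, Score: 0.89, Label: B-ORG
Entity: Face, Score: 0.99, Label: I-ORG
Entity: Brooklyn, Score: 1.00, Label: B-LOC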
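
The raw output above is per token: `Sylvain` is split into the WordPiece fragments `S`, `##yl`, `##va`, `##in`, and `Hugging Face` spans three tokens. The following is a minimal sketch of merging them back into whole entities using the character offsets from the printed result. The function name `merge_subword_entities` and the grouping rule (same entity type, and either a `##` fragment or an `I-` tag) are assumptions for illustration, not the library's own algorithm. The pipeline also accepts an `aggregation_strategy` argument (for example `"simple"`) that performs grouping internally.

```python
def merge_subword_entities(text, tokens):
    """Group token-level NER predictions into whole-entity spans (illustrative rule)."""
    groups = []
    for tok in tokens:
        label = tok["entity"].split("-", 1)[-1]  # "B-PER" -> "PER"
        continues = bool(groups) and groups[-1]["label"] == label and (
            tok["word"].startswith("##") or tok["entity"].startswith("I-")
        )
        if continues:
            # Extend the current span and remember the token score.
            groups[-1]["end"] = tok["end"]
            groups[-1]["scores"].append(tok["score"])
        else:
            groups.append({"label": label, "start": tok["start"],
                           "end": tok["end"], "scores": [tok["score"]]})
    return [
        {"word": text[g["start"]:g["end"]], "label": g["label"],
         "score": sum(g["scores"]) / len(g["scores"])}  # mean token score
        for g in groups
    ]


# Input sentence and token predictions copied from the output above.
example = "My name is Sylvain and I work at Hugging Face in Brooklyn."
ner_results = [
    {"entity": "B-PER", "score": 0.9986273, "word": "S", "start": 11, "end": 12},
    {"entity": "B-PER", "score": 0.93460417, "word": "##yl", "start": 12, "end": 14},
    {"entity": "B-PER", "score": 0.7915617, "word": "##va", "start": 14, "end": 16},
    {"entity": "B-PER", "score": 0.90470797, "word": "##in", "start": 16, "end": 18},
    {"entity": "B-ORG", "score": 0.96700376, "word": "Hu", "start": 33, "end": 35},
    {"entity": "B-ORG", "score": 0.88534623, "word": "##gging", "start": 35, "end": 40},
    {"entity": "I-ORG", "score": 0.9884615, "word": "Face", "start": 41, "end": 45},
    {"entity": "B-LOC", "score": 0.9971419, "word": "Brooklyn", "start": 49, "end": 57},
]

merged = merge_subword_entities(example, ner_results)
for ent in merged:
    print(f"Entity: {ent['word']}, Score: {ent['score']:.2f}, Label: {ent['label']}")
```

Averaging the token scores is one simple choice; taking the minimum or the first token's score would be equally defensible for this sketch.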

#### Word2Vec

In [None]:
import gensim, numpy as np
from gensim.models import Word2Vec

In [9]:
# Example text data
sentences = [
    "I feel good today",
    "The weather is clear and warm today",
    "I ate kimchi for lunch",
    "Exercising makes me feel better"
]

# Preprocess text data (split words by spaces)
sentences = [sentence.split() for sentence in sentences]

# Train Word2Vec model
model = Word2Vec(sentences, vector_size=10, window=3, min_count=1, sg=0)

# Check the vector for a specific word
vector = model.wv['feel']
print("Vector for the word 'feel':", vector)



Vector for the word 'feel': [ 0.07380505 -0.01533471 -0.04536613  0.06554051 -0.0486016  -0.01816018
  0.0287658   0.00991874 -0.08285215 -0.09448818]


In [10]:
# Find similar words
similar_words = model.wv.most_similar('feel', topn=3)
print("Words similar to 'feel':", similar_words)

Words similar to 'feel': [('today', 0.5436006188392639), ('warm', 0.35868826508522034), ('I', 0.32933861017227173)]
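
`most_similar` ranks words by the cosine similarity between their vectors. Below is a minimal pure-Python sketch of that measure. The two-dimensional vectors are made up for illustration and are not taken from the trained model. With only four short sentences of training data, the scores above probably reflect random initialization more than meaning.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy vectors (hypothetical, not from the trained model).
a = [1.0, 0.0]
b = [1.0, 1.0]
c = [-1.0, 0.0]

print(cosine_similarity(a, a))  # same direction
print(cosine_similarity(a, b))  # 45 degrees apart
print(cosine_similarity(a, c))  # opposite direction
```

The measure depends only on direction, not on vector length, which is why it is a common choice for comparing embeddings.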
