# Sentiment Analysis Using BERT

Sentiment analysis, also known as opinion mining, is a natural language processing (NLP) technique that involves determining the sentiment or emotional tone expressed in a piece of text. It aims to classify the subjective information present in text as positive, negative, or neutral.

In [1]:
import torch
from transformers import BertTokenizer, BertForSequenceClassification

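# Load pre-trained BERT model and tokenizer
# Note: 'bert-base-uncased' is a pre-trained encoder only. The classification head
# added here is randomly initialized, so predictions are not meaningful until the
# model is fine-tuned on labeled sentiment data.
model_name = 'bert-base-uncased'
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=2)  # 2 for binary sentiment classification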

# Set device (CPU or GPU)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

# Input text for sentiment analysis
text = "I really enjoyed the movie, it was great!"

# Preprocess text
encoded_input = tokenizer.encode_plus(
    text,
    add_special_tokens=True,
    max_length=128,
    padding='max_length',
    truncation=True,
    return_tensors='pt'
)

input_ids = encoded_input['input_ids'].to(device)
attention_mask = encoded_input['attention_mask'].to(device)

# Perform sentiment analysis (inference only: disable dropout and gradient tracking)
model.eval()
with torch.no_grad():
    outputs = model(input_ids, attention_mask=attention_mask)
    logits = outputs.logits

# Pick the class with the highest logit (index 1 is treated as "Positive" here)
predicted_label = torch.argmax(logits, dim=1).item()
sentiment_label = "Positive" if predicted_label == 1 else "Negative"

# Print sentiment label
print("Sentiment:", sentiment_label)


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

Sentiment: Positive
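
The argmax step above returns only a class index. To also report how confident the model is, the logits can be passed through a softmax. The following is a minimal sketch in plain Python; the logit values and the `softmax` helper are hypothetical and only illustrate the arithmetic, they are not taken from the run above.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one sentence: index 0 = Negative, index 1 = Positive
example_logits = [-0.3, 0.9]
labels = ["Negative", "Positive"]

probs = softmax(example_logits)
best = max(range(len(probs)), key=probs.__getitem__)  # same choice as argmax
print(labels[best], round(probs[best], 3))
```

With a tensor of logits, `torch.softmax(logits, dim=1)` performs the same computation.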


# Word2Vec

Word2Vec is a popular algorithm for learning word embeddings, which are dense vector representations of words in a continuous vector space. It was introduced by Tomas Mikolov et al. in 2013 at Google.

In [1]:
from sklearn.feature_extraction.text import CountVectorizer

# For comparison: a bag-of-words count matrix (sparse counts, not learned embeddings)
vectorizer = CountVectorizer()
data_corpus = ["guru99 is the best site for online tutorials. I love to visit guru99."]
X = vectorizer.fit_transform(data_corpus)
print(X.toarray())
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
print(vectorizer.get_feature_names_out().tolist())

[[1 1 2 1 1 1 1 1 1 1 1]]
['best', 'for', 'guru99', 'is', 'love', 'online', 'site', 'the', 'to', 'tutorials', 'visit']
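
The counts above can be reproduced by hand, which makes it clearer what a bag-of-words representation stores. This is a rough sketch using only the standard library. It assumes the default `CountVectorizer` behavior of lowercasing the text and keeping tokens of two or more word characters, which is why the single-letter word "I" does not appear in the vocabulary.

```python
import re
from collections import Counter

text = "guru99 is the best site for online tutorials. I love to visit guru99."

# Lowercase, then keep only tokens with at least two word characters
tokens = re.findall(r"\b\w\w+\b", text.lower())

counts = Counter(tokens)
vocabulary = sorted(counts)                   # alphabetical column order
row = [counts[word] for word in vocabulary]   # one count per vocabulary word

print(vocabulary)
print(row)
```

Each position in `row` records only how often a word occurs. Unlike a Word2Vec embedding, it carries no information about which words appear in similar contexts.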




In [4]:
from gensim.models import Word2Vec

# Training data
sentences = [["I", "love", "chocolate"],
             ["I", "love", "ice", "cream"],
             ["I", "like", "to", "eat", "chocolate", "ice", "cream"]]

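# Train the Word2Vec model (min_count=1 keeps every word; on a corpus this
# small the learned similarities are essentially noise)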
model = Word2Vec(sentences, min_count=1)

# Find similar words
similar_words = model.wv.most_similar("chocolate")
print("Similar words to 'chocolate':")
for word, similarity in similar_words:
    print(word, similarity)

# Perform vector arithmetic
result = model.wv.most_similar(positive=["chocolate", "ice"], negative=["cream"])
print("\nVector arithmetic: 'chocolate' + 'ice' - 'cream' =")
for word, similarity in result:
    print(word, similarity)

Similar words to 'chocolate':
eat 0.1315944939851761
cream 0.06800692528486252
to 0.04157429561018944
ice -0.01351268868893385
like -0.013528075069189072
love -0.04461246356368065
I -0.11166319251060486

Vector arithmetic: 'chocolate' + 'ice' - 'cream' =
to 0.17760558426380157
eat 0.1094869002699852
love 0.07128658890724182
like 0.05822988972067833
I -0.09019022434949875
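
The scores printed above are cosine similarities between word vectors. The sketch below shows roughly how a "positive minus negative" query can be scored. It uses small hand-made 3-dimensional vectors (hypothetical values, including the extra word "cake") in place of trained embeddings, and it is an illustration of the idea, not gensim's actual implementation.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def unit(v):
    """Scale a vector to length 1."""
    n = norm(v)
    return [x / n for x in v]

def cosine(u, v):
    """Cosine similarity: 1.0 means the vectors point in the same direction."""
    return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))

# Hypothetical embeddings, chosen by hand for illustration only
vectors = {
    "chocolate": [0.9, 0.1, 0.3],
    "ice":       [0.2, 0.8, 0.1],
    "cream":     [0.3, 0.9, 0.2],
    "cake":      [0.8, 0.2, 0.4],
    "to":        [0.1, 0.1, 0.9],
}

positive, negative = ["chocolate", "ice"], ["cream"]

# Build the query: add unit vectors of positive words, subtract negative ones
query = [0.0, 0.0, 0.0]
for word in positive:
    query = [q + x for q, x in zip(query, unit(vectors[word]))]
for word in negative:
    query = [q - x for q, x in zip(query, unit(vectors[word]))]

# Rank the remaining words by cosine similarity to the query
candidates = [w for w in vectors if w not in positive + negative]
ranked = sorted(((w, cosine(query, vectors[w])) for w in candidates),
                key=lambda pair: pair[1], reverse=True)
for word, score in ranked:
    print(word, round(score, 3))
```

With these made-up vectors, "cake" ranks first because it points in nearly the same direction as "chocolate" once the "ice" and "cream" contributions largely cancel.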


# Named Entity Recognition

Named Entity Recognition (NER) is a natural language processing (NLP) technique that automatically detects named entities in a text and classifies them into categories. Named entities are references to real-world objects such as persons, organizations, locations, dates, and quantities.

In [6]:
import spacy

# Load spaCy's small English pipeline
# (install it first with: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")

# Text for named entity recognition
text = "Apple Inc. is planning to open a new store in New York City."

# Process the text
doc = nlp(text)

# Extract named entities as (text, label) pairs
named_entities = [(ent.text, ent.label_) for ent in doc.ents]

# Print the named entities
print("Named entities:")
for entity, label in named_entities:
    print(entity, label)


Named entities:
Apple Inc. ORG
New York City GPE
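
In the output above, `ORG` marks an organization and `GPE` a geopolitical entity such as a country, city, or state. When a document yields many entities, it is often convenient to group them by label. A minimal sketch, assuming the `(text, label)` pairs have already been extracted; the list is hard-coded here so the example runs without spaCy.

```python
from collections import defaultdict

# (text, label) pairs, copied from the output above
named_entities = [("Apple Inc.", "ORG"), ("New York City", "GPE")]

# Collect entity texts under their label
by_label = defaultdict(list)
for entity_text, label in named_entities:
    by_label[label].append(entity_text)

for label in sorted(by_label):
    print(label, by_label[label])
```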
