<a href="https://colab.research.google.com/github/koksal100/NLP/blob/main/Sentiment_Analysis_with_Word2Vec_Embedding.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**WAYS FOR CREATING EMBEDDING**
1. Word2Vec
Word2Vec is a widely used method for generating word embeddings. It includes two models: Skip-gram and Continuous Bag of Words (CBOW). Skip-gram predicts the context around a word to learn its embedding vector, while CBOW predicts the word itself given its context. Word2Vec models are typically trained on large text corpora and the resulting word embeddings are used in natural language processing tasks.

2. GloVe (Global Vectors for Word Representation)
GloVe is another popular method for word embeddings developed by Stanford University. GloVe calculates word embeddings based on the co-occurrence statistics of words. It uses statistics from large text datasets to learn word vectors.

3. FastText
FastText is a word embedding model developed by Facebook. It breaks down each word into character n-grams and averages the vectors of these n-grams to create the word embedding vector. This method is effective in capturing morphological features and semantic meanings of subwords.

4. ELMo (Embeddings from Language Models)
ELMo generates word embeddings using deep learning models. It combines outputs from language models and weights them to create word embeddings. ELMo considers both previous and subsequent words to better understand the context of a word.

5. BERT (Bidirectional Encoder Representations from Transformers)
BERT is a language model developed by Google that not only creates word embeddings but also excels in various natural language processing tasks. It is pretrained on large text corpora and uses bidirectional transformers to capture context and semantics effectively.

6. Transformer Encoder
Transformer Encoder is an architecture designed for natural language processing tasks, including creating word embeddings. It utilizes the attention mechanism to generate word embeddings and is particularly effective for handling long-range dependencies and parallel processing.

In [40]:
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import skipgrams
from tensorflow.keras.preprocessing.sequence import pad_sequences
import numpy as np


In [69]:
sentences = [
    "The quick brown fox jumped over the lazy dog, surprising everyone in the park.",
    "I learned Python programming in university, which opened up a world of possibilities.",
    "Artificial intelligence continues to revolutionize industries around the globe.",
    "The majestic mountains were covered in a blanket of fresh, powdery snow.",
    "She meticulously planned every detail of her upcoming wedding.",
    "In the bustling city, the sound of honking horns filled the air.",
    "He was a brilliant scientist who dedicated his life to researching rare diseases.",
    "The ancient ruins stood silently, telling tales of civilizations long gone.",
    "They embarked on a thrilling adventure across the rugged terrain.",
    "The sun set behind the horizon, painting the sky in hues of orange and pink.",
    "Her contagious laughter filled the room, brightening everyone's mood.",
    "We gathered around the campfire, sharing stories under the starry night sky.",
    "The intricate design of the cathedral left visitors in awe of its beauty.",
    "After years of hard work, she finally achieved her dream of becoming a published author.",
    "The stormy seas tossed the ship to and fro, testing the sailors' resolve.",
    "He found solace in the pages of his favorite book, escaping into fantastical worlds.",
    "The bustling market was a feast for the senses, with vibrant colors and enticing aromas.",
    "They trekked through dense jungles, discovering hidden treasures along the way.",
    "The melody of the piano echoed through the concert hall, captivating the audience.",
    "She embarked on a culinary journey, mastering the art of French cuisine."
]

In [70]:
tokenizer = Tokenizer()
tokenizer.fit_on_texts(sentences)
word2index = tokenizer.word_index
index2word = {v: k for k, v in word2index.items()}
vocab_size = len(word2index) + 1


In [71]:
tokenizer.word_index

{'the': 1,
 'of': 2,
 'in': 3,
 'a': 4,
 'to': 5,
 'she': 6,
 'her': 7,
 'and': 8,
 'around': 9,
 'bustling': 10,
 'filled': 11,
 'he': 12,
 'was': 13,
 'his': 14,
 'they': 15,
 'embarked': 16,
 'on': 17,
 'sky': 18,
 'through': 19,
 'quick': 20,
 'brown': 21,
 'fox': 22,
 'jumped': 23,
 'over': 24,
 'lazy': 25,
 'dog': 26,
 'surprising': 27,
 'everyone': 28,
 'park': 29,
 'i': 30,
 'learned': 31,
 'python': 32,
 'programming': 33,
 'university': 34,
 'which': 35,
 'opened': 36,
 'up': 37,
 'world': 38,
 'possibilities': 39,
 'artificial': 40,
 'intelligence': 41,
 'continues': 42,
 'revolutionize': 43,
 'industries': 44,
 'globe': 45,
 'majestic': 46,
 'mountains': 47,
 'were': 48,
 'covered': 49,
 'blanket': 50,
 'fresh': 51,
 'powdery': 52,
 'snow': 53,
 'meticulously': 54,
 'planned': 55,
 'every': 56,
 'detail': 57,
 'upcoming': 58,
 'wedding': 59,
 'city': 60,
 'sound': 61,
 'honking': 62,
 'horns': 63,
 'air': 64,
 'brilliant': 65,
 'scientist': 66,
 'who': 67,
 'dedicated': 68,

In [72]:
vocab_size

174

In [73]:
sequences = tokenizer.texts_to_sequences(sentences)
sequences

[[1, 20, 21, 22, 23, 24, 1, 25, 26, 27, 28, 3, 1, 29],
 [30, 31, 32, 33, 3, 34, 35, 36, 37, 4, 38, 2, 39],
 [40, 41, 42, 5, 43, 44, 9, 1, 45],
 [1, 46, 47, 48, 49, 3, 4, 50, 2, 51, 52, 53],
 [6, 54, 55, 56, 57, 2, 7, 58, 59],
 [3, 1, 10, 60, 1, 61, 2, 62, 63, 11, 1, 64],
 [12, 13, 4, 65, 66, 67, 68, 14, 69, 5, 70, 71, 72],
 [1, 73, 74, 75, 76, 77, 78, 2, 79, 80, 81],
 [15, 16, 17, 4, 82, 83, 84, 1, 85, 86],
 [1, 87, 88, 89, 1, 90, 91, 1, 18, 3, 92, 2, 93, 8, 94],
 [7, 95, 96, 11, 1, 97, 98, 99, 100],
 [101, 102, 9, 1, 103, 104, 105, 106, 1, 107, 108, 18],
 [1, 109, 110, 2, 1, 111, 112, 113, 3, 114, 2, 115, 116],
 [117, 118, 2, 119, 120, 6, 121, 122, 7, 123, 2, 124, 4, 125, 126],
 [1, 127, 128, 129, 1, 130, 5, 8, 131, 132, 1, 133, 134],
 [12, 135, 136, 3, 1, 137, 2, 14, 138, 139, 140, 141, 142, 143],
 [1, 10, 144, 13, 4, 145, 146, 1, 147, 148, 149, 150, 8, 151, 152],
 [15, 153, 19, 154, 155, 156, 157, 158, 159, 1, 160],
 [1, 161, 2, 1, 162, 163, 19, 1, 164, 165, 166, 1, 167],
 [6, 16, 1

In [74]:
window_size = 2
skip_grams = [skipgrams(sequence, vocabulary_size=vocab_size, window_size=window_size) for sequence in sequences]

In [75]:
#return word tuples and 0/1 regarding these tuple have a positive relationship or not.

In [76]:
embed_size = 100

# Modelin giriş katmanları
input_target = tf.keras.layers.Input((1,))
input_context = tf.keras.layers.Input((1,))

embedding = tf.keras.layers.Embedding(vocab_size, embed_size, input_length=1, name='embedding')

target = embedding(input_target)
target = tf.keras.layers.Reshape((embed_size,))(target)
context = embedding(input_context)
context = tf.keras.layers.Reshape((embed_size,))(context)

dot_product = tf.keras.layers.Dot(axes=1)([target, context])
output = tf.keras.layers.Dense(1, activation='sigmoid')(dot_product)

model = tf.keras.models.Model(inputs=[input_target, input_context], outputs=output)
model.compile(loss='binary_crossentropy', optimizer='adam')

# Eğitim verilerini hazırlama ve eğitme
for epoch in range(100):
    loss = 0
    for target_context, label in skip_grams:
        if len(target_context) == 0:
            continue
        target_array = np.array([pair[0] for pair in target_context], dtype='int32')
        context_array = np.array([pair[1] for pair in target_context], dtype='int32')
        labels = np.array(label, dtype='int32')
        X = [target_array, context_array]
        Y = labels
        loss += model.train_on_batch(X, Y)
    print(f'Epoch: {epoch+1}, Loss: {loss}')


Epoch: 1, Loss: 13.866081357002258
Epoch: 2, Loss: 13.795098960399628
Epoch: 3, Loss: 13.721691250801086
Epoch: 4, Loss: 13.623933136463165
Epoch: 5, Loss: 13.489246666431427
Epoch: 6, Loss: 13.305572271347046
Epoch: 7, Loss: 13.06225198507309
Epoch: 8, Loss: 12.75157880783081
Epoch: 9, Loss: 12.370360493659973
Epoch: 10, Loss: 11.920932829380035
Epoch: 11, Loss: 11.411242246627808
Epoch: 12, Loss: 10.85390168428421
Epoch: 13, Loss: 10.264495521783829
Epoch: 14, Loss: 9.659625113010406
Epoch: 15, Loss: 9.055167555809021
Epoch: 16, Loss: 8.464980781078339
Epoch: 17, Loss: 7.900113254785538
Epoch: 18, Loss: 7.368484348058701
Epoch: 19, Loss: 6.87497091293335
Epoch: 20, Loss: 6.421793594956398
Epoch: 21, Loss: 6.009070664644241
Epoch: 22, Loss: 5.635419055819511
Epoch: 23, Loss: 5.298501417040825
Epoch: 24, Loss: 4.9954639077186584
Epoch: 25, Loss: 4.723259173333645
Epoch: 26, Loss: 4.478859268128872
Epoch: 27, Loss: 4.2593841180205345
Epoch: 28, Loss: 4.062168516218662
Epoch: 29, Loss: 3

In [77]:
skip_grams[0]

([[1, 19],
  [26, 28],
  [21, 107],
  [24, 155],
  [3, 102],
  [27, 26],
  [1, 1],
  [1, 9],
  [28, 31],
  [22, 111],
  [1, 21],
  [28, 1],
  [27, 127],
  [3, 71],
  [1, 30],
  [20, 22],
  [29, 164],
  [22, 20],
  [27, 28],
  [1, 25],
  [1, 20],
  [29, 1],
  [28, 33],
  [24, 25],
  [25, 3],
  [26, 154],
  [26, 1],
  [28, 146],
  [26, 25],
  [28, 155],
  [21, 20],
  [23, 21],
  [20, 128],
  [3, 27],
  [27, 25],
  [27, 122],
  [1, 23],
  [1, 51],
  [22, 155],
  [20, 47],
  [25, 72],
  [1, 33],
  [25, 26],
  [25, 135],
  [23, 151],
  [1, 29],
  [26, 19],
  [1, 28],
  [27, 3],
  [28, 26],
  [1, 3],
  [24, 22],
  [20, 1],
  [1, 42],
  [23, 77],
  [20, 21],
  [24, 17],
  [21, 22],
  [23, 148],
  [1, 26],
  [24, 23],
  [25, 1],
  [28, 27],
  [21, 21],
  [21, 1],
  [23, 101],
  [3, 42],
  [21, 25],
  [24, 160],
  [3, 28],
  [24, 1],
  [29, 142],
  [22, 24],
  [26, 47],
  [3, 1],
  [26, 69],
  [25, 71],
  [21, 23],
  [1, 24],
  [24, 104],
  [1, 34],
  [21, 14],
  [3, 31],
  [26, 27],
  [25, 24]

In [80]:
word_vectors = model.get_layer('embedding').get_weights()[0]

In [81]:
word_vectors.shape

(174, 100)

**THERE IS A 100 NUMERS LENGTH VECTOR FOR EACH WORD IN THE WORD_VECTORS EMBEDDINGS**

In [89]:
# Cümleler listesi
sentences = [
    "The brilliant scientist dedicated his life to researching rare diseases.",
    "Stormy seas tossed the ship, testing the sailors' resolve.",
    "Majestic mountains were covered in a blanket of fresh powdery snow.",
    "The bustling city, with its constant sound of honking horns, was overwhelming.",
    "The intricate design of the cathedral left visitors in awe of its beauty.",
    "The market was a feast for the senses but left me feeling dizzy from the vibrant colors and enticing aromas.",
    "After years of hard work, she finally achieved her dream of becoming a published author.",
    "She meticulously planned every detail of the upcoming wedding, but things still seemed to go wrong.",
    "The thrilling adventure across rugged terrain was an unforgettable experience.",
    "The ancient ruins stood silently, telling tales of civilizations long gone, which was rather melancholic.",
    "The sun set behind the horizon, painting the sky in hues of orange and pink.",
    "The sky was overcast and gray, casting a gloomy shadow over the city.",
    "Contagious laughter filled the room, brightening everyone's mood.",
    "Despite his efforts, he couldn't escape the feeling of failure after the project fell through.",
    "We gathered around the campfire, sharing stories under the starry night.",
    "The dense jungles were trekked through, but they seemed to hold more dangers than anticipated.",
    "The melody of the piano echoed through the concert hall, captivating the audience.",
    "The room fell silent after hearing the news of the accident, leaving everyone in a state of shock.",
    "The culinary journey included mastering the art of French cuisine.",
    "The shipwreck was a tragic reminder of the dangers lurking in the stormy seas."
]

# Etiketler listesi (1: pozitif, 0: negatif)
labels = [
    1,  # "The brilliant scientist dedicated his life to researching rare diseases."
    0,  # "Stormy seas tossed the ship, testing the sailors' resolve."
    1,  # "Majestic mountains were covered in a blanket of fresh powdery snow."
    0,  # "The bustling city, with its constant sound of honking horns, was overwhelming."
    1,  # "The intricate design of the cathedral left visitors in awe of its beauty."
    0,  # "The market was a feast for the senses but left me feeling dizzy from the vibrant colors and enticing aromas."
    1,  # "After years of hard work, she finally achieved her dream of becoming a published author."
    0,  # "She meticulously planned every detail of the upcoming wedding, but things still seemed to go wrong."
    1,  # "The thrilling adventure across rugged terrain was an unforgettable experience."
    0,  # "The ancient ruins stood silently, telling tales of civilizations long gone, which was rather melancholic."
    1,  # "The sun set behind the horizon, painting the sky in hues of orange and pink."
    0,  # "The sky was overcast and gray, casting a gloomy shadow over the city."
    1,  # "Contagious laughter filled the room, brightening everyone's mood."
    0,  # "Despite his efforts, he couldn't escape the feeling of failure after the project fell through."
    1,  # "We gathered around the campfire, sharing stories under the starry night."
    0,  # "The dense jungles were trekked through, but they seemed to hold more dangers than anticipated."
    1,  # "The melody of the piano echoed through the concert hall, captivating the audience."
    0,  # "The room fell silent after hearing the news of the accident, leaving everyone in a state of shock."
    1,  # "The culinary journey included mastering the art of French cuisine."
    0   # "The shipwreck was a tragic reminder of the dangers lurking in the stormy seas."
]


In [90]:
sequences = tokenizer.texts_to_sequences(sentences)
padded_sequences = tf.keras.preprocessing.sequence.pad_sequences(sequences, padding='post')
sequences

[[1, 65, 66, 68, 14, 69, 5, 70, 71, 72],
 [127, 128, 129, 1, 130, 132, 1, 133, 134],
 [46, 47, 48, 49, 3, 4, 50, 2, 51, 52, 53],
 [1, 10, 60, 148, 115, 61, 2, 62, 63, 13],
 [1, 109, 110, 2, 1, 111, 112, 113, 3, 114, 2, 115, 116],
 [1, 144, 13, 4, 145, 146, 1, 147, 112, 1, 149, 150, 8, 151, 152],
 [117, 118, 2, 119, 120, 6, 121, 122, 7, 123, 2, 124, 4, 125, 126],
 [6, 54, 55, 56, 57, 2, 1, 58, 59, 5],
 [1, 82, 83, 84, 85, 86, 13],
 [1, 73, 74, 75, 76, 77, 78, 2, 79, 80, 81, 35, 13],
 [1, 87, 88, 89, 1, 90, 91, 1, 18, 3, 92, 2, 93, 8, 94],
 [1, 18, 13, 8, 4, 24, 1, 60],
 [95, 96, 11, 1, 97, 98, 99, 100],
 [14, 12, 1, 2, 117, 1, 19],
 [101, 102, 9, 1, 103, 104, 105, 106, 1, 107, 108],
 [1, 154, 155, 48, 153, 19, 15, 5],
 [1, 161, 2, 1, 162, 163, 19, 1, 164, 165, 166, 1, 167],
 [1, 97, 117, 1, 2, 1, 28, 3, 4, 2],
 [1, 168, 169, 170, 1, 171, 2, 172, 173],
 [1, 13, 4, 2, 1, 3, 1, 127, 128]]

In [94]:
embedding_matrix = np.zeros((vocab_size, embed_size))
for word, i in word2index.items():
    embedding_vector = word_vectors[i]
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embed_size, weights=[embedding_matrix], input_length=padded_sequences.shape[1], trainable=False),
    tf.keras.layers.LSTM(64, return_sequences=True),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Modeli eğitme
model.fit(padded_sequences, np.array(labels), epochs=5, verbose=1)


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.src.callbacks.History at 0x792065dc69b0>

In [95]:
# Örnek yeni metinler
new_texts = [
    "This movie was fantastic!",
    "I didn't like the plot of this book.",
    "The restaurant served delicious food.",
    "The performance of the team was disappointing.",
    "The weather ruined our picnic plans.",
    "She always brightens up our day with her smile.",
    "The new software update has some great features.",
    "I'm really excited about my upcoming trip.",
    "The traffic was unbearable on my way home.",
    "The customer service experience was excellent."
]

# Yeni metinleri tokenlara dönüştürme
new_sequences = tokenizer.texts_to_sequences(new_texts)

# Metinleri aynı uzunlukta doldurma (paddingleme)
padded_new_sequences = tf.keras.preprocessing.sequence.pad_sequences(new_sequences, maxlen=15)

# Model ile tahmin yapma
predictions = model.predict(padded_new_sequences)

# Tahmin sonuçlarını yazdırma
for i, text in enumerate(new_texts):
    sentiment = "Positive" if predictions[i] > 0.5 else "Negative"
    print(f"Text: {text}")
    print(f"Predicted Sentiment: {sentiment} (Probability: {predictions[i][0]:.4f})")
    print()


Text: This movie was fantastic!
Predicted Sentiment: Positive (Probability: 0.5021)

Text: I didn't like the plot of this book.
Predicted Sentiment: Positive (Probability: 0.5042)

Text: The restaurant served delicious food.
Predicted Sentiment: Positive (Probability: 0.5003)

Text: The performance of the team was disappointing.
Predicted Sentiment: Negative (Probability: 0.4905)

Text: The weather ruined our picnic plans.
Predicted Sentiment: Positive (Probability: 0.5003)

Text: She always brightens up our day with her smile.
Predicted Sentiment: Negative (Probability: 0.4992)

Text: The new software update has some great features.
Predicted Sentiment: Positive (Probability: 0.5003)

Text: I'm really excited about my upcoming trip.
Predicted Sentiment: Positive (Probability: 0.5045)

Text: The traffic was unbearable on my way home.
Predicted Sentiment: Positive (Probability: 0.5004)

Text: The customer service experience was excellent.
Predicted Sentiment: Negative (Probability: 0.49