# Introduction to Word Embedding
This notebook gives an overview of word embeddings and their importance in NLP, then trains and visualizes a small Word2Vec model with Gensim.

## Introduction to Word Embedding

Word embedding is a technique used in natural language processing (NLP) to represent words in a continuous vector space. This allows words with similar meanings to have similar representations, which can be used to improve the performance of various NLP tasks such as text classification, sentiment analysis, and machine translation.
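
To make "similar representations" concrete, the sketch below measures closeness with cosine similarity, a common choice for comparing embeddings. The three-dimensional vectors are hypothetical values picked by hand for illustration; they do not come from a trained model.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 3-dimensional embeddings, hand-picked for illustration only.
toy_vectors = {
    "king":  [0.80, 0.65, 0.10],
    "queen": [0.75, 0.70, 0.15],
    "apple": [0.10, 0.20, 0.90],
}

sim_royal = cosine_similarity(toy_vectors["king"], toy_vectors["queen"])
sim_fruit = cosine_similarity(toy_vectors["king"], toy_vectors["apple"])

# The related pair should score noticeably higher than the unrelated pair.
print(f"king vs queen: {sim_royal:.3f}")
print(f"king vs apple: {sim_fruit:.3f}")
```

A value near 1 means the two vectors point in almost the same direction; a value near 0 means they are close to orthogonal.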

### Importance of Word Embedding

1. **Captures Semantic Relationships**: Word embeddings capture semantic relationships between words: words that appear in similar contexts receive nearby vectors, which lets models generalize across related words.
2. **Dimensionality Reduction**: Compared with sparse one-hot or count-based representations, whose size grows with the vocabulary, word embeddings use dense vectors of a fixed, much smaller size (typically tens to a few hundred dimensions), making text data more manageable for machine learning algorithms.
3. **Improves Model Performance**: Using word embeddings can improve the performance of NLP models by giving them denser, more informative input than raw token identities.
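
The contrast between sparse and dense representations in the list above can be sketched in a few lines. The five-word vocabulary and the two-dimensional embedding table are assumptions made up for this example; a real vocabulary is far larger and the embedding values are learned rather than typed in.

```python
# Hypothetical five-word vocabulary, for illustration only.
vocab = ["word", "embedding", "vector", "space", "model"]
word_to_index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Sparse representation: one dimension per vocabulary word."""
    vec = [0.0] * len(vocab)
    vec[word_to_index[word]] = 1.0
    return vec

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# A dense embedding table: one short row per word. Values are made up.
embedding_table = [
    [0.9, 0.1],  # word
    [0.8, 0.2],  # embedding
    [0.7, 0.3],  # vector
    [0.2, 0.9],  # space
    [0.5, 0.5],  # model
]

def embed(word):
    """Dense representation: look up the word's row in the table."""
    return embedding_table[word_to_index[word]]

# One-hot length grows with the vocabulary; the dense length stays fixed.
print("one-hot length:", len(one_hot("word")))
print("dense length:  ", len(embed("word")))

# Two different one-hot vectors never overlap, so they carry no similarity signal.
print("one-hot dot:", dot(one_hot("word"), one_hot("vector")))
print("dense dot:  ", dot(embed("word"), embed("vector")))
```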

### Key Citations

- [Word Embedding - Wikipedia](https://en.wikipedia.org/wiki/Word_embedding)
- [Word embeddings in NLP: A Complete Guide](https://www.analyticsvidhya.com/blog/2020/08/top-4-sentence-embedding-techniques-using-python/)
- [Word Embeddings in NLP - GeeksforGeeks](https://www.geeksforgeeks.org/word-embeddings-in-nlp/)
- [What Are Word Embeddings? | IBM](https://www.ibm.com/cloud/learn/word-embeddings)
- [What Are Word Embeddings for Text? - MachineLearningMastery.com](https://machinelearningmastery.com/what-are-word-embeddings/)
- [The Ultimate Guide to Word Embeddings](https://towardsdatascience.com/the-ultimate-guide-to-word-embeddings-in-nlp-5cdd0f1d5e8c)
- [An intuitive introduction to text embeddings - Stack Overflow](https://stackoverflow.blog/2020/07/27/a-quick-introduction-to-text-embeddings/)
- [An Intuitive Understanding of Word Embeddings: From Count Vectors to Word2Vec - Analytics Vidhya](https://www.analyticsvidhya.com/blog/2020/08/top-4-sentence-embedding-techniques-using-python/)
- [What are Word Embeddings? | A Comprehensive Word Embedding Guide | Elastic](https://www.elastic.co/blog/what-are-word-embeddings)
- [The Illustrated Word2vec](https://jalammar.github.io/illustrated-word2vec/)
- [Word2vec - Wikipedia](https://en.wikipedia.org/wiki/Word2vec)
- [A Dummy’s Guide to Word2Vec - Medium](https://medium.com/@mishra.thedeepak/a-dummys-guide-to-word2vec-2b865a8339e0)
- [Word2Vec: Explanation and Examples - Serokell](https://serokell.io/blog/word2vec)
- [Word Embedding using Word2Vec - GeeksforGeeks](https://www.geeksforgeeks.org/word-embedding-using-word2vec/)
- [word2vec | Text | TensorFlow](https://www.tensorflow.org/tutorials/text/word2vec)
- [Word2Vec For Word Embeddings -A Beginner's Guide - Analytics Vidhya](https://www.analyticsvidhya.com/blog/2020/08/top-4-sentence-embedding-techniques-using-python/)
- [Practice Word2Vec for NLP Using Python | Built In](https://builtin.com/data-science/word2vec)

# Import Required Libraries
Import the necessary libraries, including Gensim and Matplotlib.

In [None]:
# Import the necessary libraries
import gensim
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import nltk

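# Download the NLTK data used below
nltk.download('punkt')      # tokenizer models used by word_tokenize
nltk.download('punkt_tab')  # newer NLTK releases load this resource instead of 'punkt'
nltk.download('stopwords')  # English stop-word list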

# Understanding Word2Vec
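Word2Vec learns embeddings by training a shallow neural network on word co-occurrences within a sliding context window. It has two training architectures: Continuous Bag of Words (CBOW), which predicts a target word from its surrounding context words, and Skip-gram, which predicts the surrounding context words from a target word. In Gensim, `sg=0` selects CBOW and `sg=1` selects Skip-gram.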
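
The difference between CBOW and Skip-gram comes down to how training examples are built from a context window. The sketch below only constructs those examples for a short token list; the helper names are hypothetical, and it leaves out everything else a real implementation does (the network itself, negative sampling, subsampling of frequent words).

```python
def context_window(tokens, i, window):
    """Tokens within `window` positions of index i, excluding tokens[i]."""
    left = tokens[max(0, i - window):i]
    right = tokens[i + 1:i + 1 + window]
    return left + right

def cbow_examples(tokens, window=2):
    """CBOW: the context words together predict the target word."""
    return [(context_window(tokens, i, window), target)
            for i, target in enumerate(tokens)]

def skipgram_examples(tokens, window=2):
    """Skip-gram: the target word predicts each context word, one pair at a time."""
    return [(target, ctx)
            for i, target in enumerate(tokens)
            for ctx in context_window(tokens, i, window)]

tokens = ["word", "embeddings", "represent", "words", "as", "vectors"]

cbow = cbow_examples(tokens, window=2)
skipgram = skipgram_examples(tokens, window=2)

# One CBOW example per token; several Skip-gram pairs per token.
print("CBOW example for 'represent':", cbow[2])
print("First Skip-gram pairs:", skipgram[:4])
```

CBOW yields one example per position, while Skip-gram yields one pair per (target, context word) combination, which is why Skip-gram produces more training examples from the same text.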

In [None]:
from gensim.models import Word2Vec
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Sample text data (a single sentence, so the trained vectors are for demonstration only)
text = "Word embeddings are a type of word representation that allows words to be represented as vectors in a continuous vector space."

# Preprocess the text
stop_words = set(stopwords.words('english'))
words = word_tokenize(text.lower())
filtered_words = [word for word in words if word.isalnum() and word not in stop_words]

# Create CBOW model
cbow_model = Word2Vec([filtered_words], vector_size=50, window=2, min_count=1, sg=0)

# Create Skip-gram model
skipgram_model = Word2Vec([filtered_words], vector_size=50, window=2, min_count=1, sg=1)

# Display the vector for a word
print("Vector for 'word' using CBOW:", cbow_model.wv['word'])
print("Vector for 'word' using Skip-gram:", skipgram_model.wv['word'])

# Implementing Word2Vec with Gensim
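The cell below trains CBOW and Skip-gram models with Gensim's `Word2Vec` class on the preprocessed tokens. `vector_size` sets the embedding dimensionality, `window` the number of context words considered on each side of the target word, and `min_count` the minimum frequency a word needs to be kept in the vocabulary.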

In [None]:
# Create CBOW model
cbow_model = Word2Vec([filtered_words], vector_size=50, window=2, min_count=1, sg=0)

# Create Skip-gram model
skipgram_model = Word2Vec([filtered_words], vector_size=50, window=2, min_count=1, sg=1)

# Display the vector for a word
print("Vector for 'word' using CBOW:", cbow_model.wv['word'])
print("Vector for 'word' using Skip-gram:", skipgram_model.wv['word'])

# Visualizing Word Embeddings
The 50-dimensional word vectors are projected to 2D with PCA (from scikit-learn) and plotted with Matplotlib, with each point labeled by its word.

In [None]:
# Visualizing Word Embeddings

from sklearn.decomposition import PCA

# Get the word vectors from the CBOW model
word_vectors = cbow_model.wv

# Reduce the dimensionality of the word vectors to 2D using PCA
pca = PCA(n_components=2)
word_vectors_2d = pca.fit_transform(word_vectors.vectors)

# Create a scatter plot of the word vectors
plt.figure(figsize=(10, 10))
plt.scatter(word_vectors_2d[:, 0], word_vectors_2d[:, 1])

# Annotate the points with the corresponding words
for i, word in enumerate(word_vectors.index_to_key):
    plt.annotate(word, xy=(word_vectors_2d[i, 0], word_vectors_2d[i, 1]))

plt.title('2D Visualization of Word Embeddings (CBOW)')
plt.xlabel('PCA Component 1')
plt.ylabel('PCA Component 2')
plt.show()

# Applications of Word Embeddings
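Word embeddings serve as input features for NLP tasks such as text classification, sentiment analysis, and machine translation. The cell below sketches sentiment classification: each text is represented by the average of its word vectors and passed to a logistic regression classifier.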

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Sample text data for classification
texts = ["I love this movie", "I hate this movie", "This film is great", "This film is terrible"]
labels = [1, 0, 1, 0]  # 1 for positive, 0 for negative

# Preprocess the text data
def preprocess(text):
    words = word_tokenize(text.lower())
    return [word for word in words if word.isalnum() and word not in stop_words]

processed_texts = [preprocess(text) for text in texts]

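# Train a Word2Vec model on the classification texts themselves: the earlier
# cbow_model only saw one unrelated sentence, so none of these words are in its vocabulary
sentiment_model = Word2Vec(processed_texts, vector_size=50, window=2, min_count=1, sg=0, seed=42)

# Get the average word vector for each text (zero vector if no word is in the vocabulary)
def get_average_vector(words, model):
    vectors = [model.wv[word] for word in words if word in model.wv]
    if not vectors:
        return np.zeros(model.wv.vector_size)
    return np.mean(vectors, axis=0)

X = np.array([get_average_vector(text, sentiment_model) for text in processed_texts])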
y = np.array(labels)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Train a logistic regression model
clf = LogisticRegression()
clf.fit(X_train, y_train)

# Predict and evaluate the model
# (with only four samples the test set holds a single example, so the score is illustrative only)
y_pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
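
Averaged word vectors can also be compared directly, for example to rank texts by similarity to a query. The sketch below assumes hypothetical two-dimensional vectors typed in by hand and plain cosine similarity; it uses only the standard library and is an illustration of the idea, not a retrieval system.

```python
import math

# Hypothetical 2-dimensional word vectors, hand-picked for illustration only.
toy_vectors = {
    "love": [0.9, 0.1], "great": [0.8, 0.2],
    "hate": [0.1, 0.9], "terrible": [0.2, 0.8],
    "movie": [0.5, 0.5], "film": [0.5, 0.5],
}

def average_vector(tokens, vectors, dim=2):
    """Mean of the vectors of known tokens; zero vector if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]

def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector has zero length."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0

documents = {
    "doc_pos": ["great", "film"],
    "doc_neg": ["terrible", "film"],
}
query_vec = average_vector(["love", "movie"], toy_vectors)

# Rank documents from most to least similar to the query.
ranked = sorted(
    documents,
    key=lambda name: cosine(query_vec, average_vector(documents[name], toy_vectors)),
    reverse=True,
)
print(ranked)
```

The zero-vector fallback in `average_vector` mirrors the out-of-vocabulary problem in the classification cell: a text whose words are all unknown still needs a fixed-size representation.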