# Word Embedding (Feature Representation)

Word Embedding is an approach for representing words and documents. A word embedding (or word vector) is a numeric vector that represents a word in a **lower-dimensional space** than a vocabulary-sized encoding such as one-hot.

   1) It is a method of extracting features from text so that those features can be fed into a machine learning model.

   2) It allows words with similar meanings to have similar representations. Thus, **similarity can be assessed by comparing vector representations.**

   3) It lowers computation cost: large input vectors mean a huge number of weights, and embeddings help to **reduce dimensionality.**

   4) It **preserves syntactic and semantic information.**

   5) Some methods based on word frequency are Bag of Words (BoW), Count Vectorizer and TF-IDF.

**Cosine similarity:-**

Cosine similarity measures the similarity between two non-zero vectors by calculating the cosine of the angle between them. It is widely used in machine learning and data analysis, especially in text analysis, document comparison, search queries, and recommendation systems.

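1) A similarity measure quantifies how alike two data objects are based on their feature dimensions in a dataset.

2) Cosine similarity ranges from -1 to 1, and a higher value means the two vectors point in more similar directions. Equivalently, for cosine distance (1 - cosine similarity), a smaller distance indicates a higher similarity, while a larger distance indicates a lower similarity.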

![Screenshot%20from%202025-08-01%2001-44-16.jpeg](attachment:Screenshot%20from%202025-08-01%2001-44-16.jpeg)
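The definition above can be sketched in a few lines of plain Python. The three toy vectors are made-up values chosen only for illustration, and the function name `cosine_similarity` is an assumption of this sketch, not part of any library.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (hypothetical values, not from a trained model)
king = [0.8, 0.6, 0.1]
queen = [0.7, 0.7, 0.1]
apple = [0.1, 0.0, 0.9]

sim_king_queen = cosine_similarity(king, queen)
sim_king_apple = cosine_similarity(king, apple)
print(sim_king_queen > sim_king_apple)  # the first pair points in a more similar direction
```

Because the measure depends only on the angle, scaling a vector (for example, doubling every component) does not change its similarity to other vectors.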

# Approaches for Text Representation

![Screenshot%20from%202025-07-30%2001-53-44.jpeg](attachment:Screenshot%20from%202025-07-30%2001-53-44.jpeg)

**1) Traditional Approach:-**
a) One-Hot Encoding  b) Bag of Words (BoW)  c) Term Frequency-Inverse Document Frequency (TF-IDF)

**a) One-Hot Encoding :-** In this encoding scheme, each word in the vocabulary is represented as a unique vector, where the dimensionality of the vector is equal to the size of the vocabulary. The vector has all elements set to 0, except for the element corresponding to the index of the word in the vocabulary, which is set to 1. 

**Disadvantages:-**

1) High-dimensional vectors, which are computationally expensive and memory-intensive

2) Does not capture semantic relationships

3) Restricted to the vocabulary seen during training

![Screenshot%20from%202025-07-30%2002-10-47%281%29.jpeg](attachment:Screenshot%20from%202025-07-30%2002-10-47%281%29.jpeg)
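A minimal sketch of one-hot encoding in plain Python follows. The helper names `build_vocab` and `one_hot` and the example sentence are assumptions made for illustration.

```python
def build_vocab(tokens):
    """Map each distinct token to an index, in order of first appearance."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

def one_hot(word, vocab):
    """Return a vector of 0s with a single 1 at the word's vocabulary index."""
    if word not in vocab:
        # Disadvantage 3: words outside the training vocabulary cannot be encoded
        raise KeyError(f"{word!r} is out of vocabulary")
    vec = [0] * len(vocab)
    vec[vocab[word]] = 1
    return vec

tokens = "the cat sat on the mat".split()
vocab = build_vocab(tokens)          # 5 distinct words -> 5-dimensional vectors
cat_vec = one_hot("cat", vocab)
mat_vec = one_hot("mat", vocab)

# Disadvantage 2: any two different words have a dot product of 0,
# so the encoding carries no notion of semantic similarity.
dot = sum(a * b for a, b in zip(cat_vec, mat_vec))
print(cat_vec, dot)
```

The vector length grows with the vocabulary, which is the source of the memory and computation cost listed above.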

**b) Bag of Words (BoW):-** It represents a document as an unordered collection (a multiset) of words together with their frequencies. It discards the word order and captures only how often each word occurs in the document, creating a vector representation.

  **Disadvantages:-**
  
  1) Ignores the order of words in the document: Causes loss of sequential information and context
    
  2) Sparse representations make it Memory intensive: Many elements are zero resulting in Computational inefficiency with large datasets. 

![Screenshot%20from%202025-07-30%2002-18-53.jpeg](attachment:Screenshot%20from%202025-07-30%2002-18-53.jpeg)
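The idea can be sketched with the standard library alone. The two example documents, the whitespace tokenization and the helper name `bow_vector` are assumptions of this sketch; real pipelines usually add lowercasing, punctuation handling and so on.

```python
from collections import Counter

docs = ["the cat sat on the mat", "the dog sat on the log"]
tokenized = [d.split() for d in docs]

# Shared vocabulary across all documents, sorted for a stable column order
vocab = sorted({tok for doc in tokenized for tok in doc})

def bow_vector(tokens, vocab):
    """Count how often each vocabulary word occurs in the token list."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]

bow = [bow_vector(doc, vocab) for doc in tokenized]
print(vocab)
print(bow)

# Word order is lost: these two sentences get identical vectors.
order_vocab = ["bites", "dog", "man"]
same = bow_vector("dog bites man".split(), order_vocab) == bow_vector(
    "man bites dog".split(), order_vocab
)
print(same)
```

With a realistic vocabulary most entries of each vector are zero, which is the sparsity problem noted in the disadvantages.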

**c) Term Frequency-Inverse Document Frequency (TF-IDF):-** It reflects the importance of a word in a document relative to a collection of documents (corpus). A term scores high when it occurs often in the document but in few other documents of the corpus. It is widely used in natural language processing and information retrieval to evaluate the significance of a term within a specific document in a larger corpus.

**Disadvantages:-**

1) Doesn't consider semantic relationships in words.

2) Sensitivity to Document Length: Longer documents have higher raw term counts, potentially biasing TF-IDF towards longer documents unless term frequencies are normalized by document length.
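As a rough illustration, the sketch below computes one common textbook variant, tf = count / document length and idf = log(N / df). This exact formula is an assumption of the sketch; libraries often use smoothed or otherwise adjusted variants, so their numbers will differ. The function name `tf_idf` and the three toy documents are also made up for illustration.

```python
import math
from collections import Counter

def tf_idf(tokenized_docs):
    """Return one {term: weight} dict per document (unsmoothed textbook variant)."""
    n_docs = len(tokenized_docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for doc in tokenized_docs:
        df.update(set(doc))
    weights = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            # Term frequency is normalized by document length
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the bird flew away".split(),
]
scores = tf_idf(docs)

# "the" occurs in every document, so its idf is log(1) = 0.
# "cat" occurs in only one document, so it outranks "sat" (two documents).
print(scores[0]["the"], scores[0]["sat"], scores[0]["cat"])
```

Dividing by the document length is one way to reduce the length sensitivity listed among the disadvantages; the lack of semantic information remains, since "cat" and "dog" are still unrelated columns.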

**2) Neural Approach:-**
a) Word2Vec  b) Continuous Bag of Words (CBOW)  c) Skip-Gram

**a) Word2Vec:-** Word2Vec is a neural approach for generating word embeddings. It belongs to the family of distributed representation models and is a popular technique in natural language processing (NLP).

1) It represents words as vectors in a continuous vector space.

2) Aim: capture the semantic relationships between words by mapping them to dense vectors whose dimensionality is much smaller than the vocabulary size.

3) Words with similar meanings should have similar vector representations. Every word is assigned a vector: each word is fed in as a one-hot vector, and its embedding starts from random values that are refined during training.

There are two neural embedding methods for Word2Vec: Continuous Bag of Words (CBOW) and Skip-gram.

**b) Continuous Bag of Words (CBOW):-** CBOW is a neural network architecture used in the Word2Vec model. Its primary objective is to predict a target word from its context, which consists of the surrounding words in a given window: given the words in a context window, the model is trained to predict the target word at the center of the window.

1) Feedforward neural network with a single hidden layer.

2) The input layer, hidden layer, and output layer represent the context words, the learned continuous vectors (embeddings), and the target word, respectively.
    
3) Useful for learning distributed representations of words in a continuous vector space. 

The hidden layer contains the continuous vector representations (word embeddings) of the input words.

1) The weights between the input layer and the hidden layer are learned during training.

2) The dimensionality of the hidden layer represents the size of the word embeddings (the continuous vector space).
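The layer structure described above can be sketched as a single forward pass in plain Python. This is only an illustration under several assumptions: the weights are random and untrained, the hidden layer averages the context embeddings, the output uses a full softmax (practical implementations typically use cheaper approximations), and the names `W_in`, `W_out` and `cbow_forward` are hypothetical.

```python
import math
import random

random.seed(0)

vocab = ["the", "cat", "sat", "on", "mat"]
word_to_idx = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 3  # vocabulary size, embedding size (hidden layer width)

# Input -> hidden weights: one D-dimensional row per word (the word embeddings)
W_in = [[random.uniform(-0.5, 0.5) for _ in range(D)] for _ in range(V)]
# Hidden -> output weights: one D-dimensional row per candidate target word
W_out = [[random.uniform(-0.5, 0.5) for _ in range(D)] for _ in range(V)]

def cbow_forward(context_words):
    """Return a probability distribution over the vocabulary for the target word."""
    idxs = [word_to_idx[w] for w in context_words]
    # Hidden layer: average of the context words' embedding rows
    hidden = [sum(W_in[i][d] for i in idxs) / len(idxs) for d in range(D)]
    # Output layer: one score per vocabulary word, turned into probabilities by softmax
    scores = [sum(h * w for h, w in zip(hidden, row)) for row in W_out]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Context around the target "sat" with a window of 2 on each side
probs = cbow_forward(["the", "cat", "on", "mat"])
print(dict(zip(vocab, probs)))
```

Training would adjust `W_in` and `W_out` so that the probability of the true target word increases; after training, the rows of `W_in` are the word embeddings.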

**c) Skip-Gram:-** This model also learns distributed representations of words in a continuous vector space. The main objective of Skip-Gram is to predict the context words (the words surrounding a target word) given the target word. This is the opposite of the Continuous Bag of Words (CBOW) model, where the objective is to predict the target word based on its context. Skip-Gram is often reported to produce better embeddings for rare words, though it is slower to train than CBOW.
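The difference between the two objectives shows up in the training examples each one generates from the same sentence. The sketch below is illustrative only: the helper names `skipgram_pairs` and `cbow_pairs`, the example sentence and the window size are assumptions.

```python
def skipgram_pairs(tokens, window=2):
    """Skip-Gram: one (target, context_word) example per neighbouring word."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

def cbow_pairs(tokens, window=2):
    """CBOW: one (context_words, target) example per position."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        pairs.append((context, target))
    return pairs

tokens = "the cat sat on the mat".split()
sg = skipgram_pairs(tokens, window=1)
cb = cbow_pairs(tokens, window=1)

print(sg[:3])  # [('the', 'cat'), ('cat', 'the'), ('cat', 'sat')]
print(cb[1])   # (['the', 'sat'], 'cat')
```

Skip-Gram produces several examples per position (one per context word), which is one reason it takes longer to train than CBOW on the same corpus.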

Output of training:

1) Trained vectors for each word after many iterations through the corpus.

2) The vectors preserve syntactic and semantic information and have a lower dimensionality.

3) Vectors of words with similar meanings (semantic information) are placed close to each other in space.

4) The `vector_size` parameter (for example, in gensim's `Word2Vec`) controls the dimensionality of the word vectors, and other parameters such as `window` can also be adjusted.

The choice between CBOW and Skip-gram depends on data and the task.

   1) CBOW might be preferred when training resources are limited, and capturing syntactic information is important.
   
   2) Skip-gram is chosen when semantic relationships and the representation of rare words are important.