# What is Embedding?

Embedding converts categories into continuous vectors.

Embeddings, used mainly in deep learning, map categorical data to dense vectors of continuous numbers. Because the vectors are learned from data, related categories end up close to each other, so the representation captures semantic relationships while needing far fewer dimensions than a one-hot vector. Pre-trained embeddings such as Word2Vec, GloVe, and FastText are available for tasks such as natural language processing (NLP) and recommendation systems. Trainable embeddings are instead learned during model training for a specific task.
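To make the idea of "converting a category into a dense vector" concrete, here is a minimal sketch of an embedding as a lookup table. The category names, the 2-dimensional size, and the random values are all assumptions for illustration; in a real model the table values are learned during training.

```python
import random

# Hypothetical categories, chosen only for illustration.
categories = ["red", "green", "blue", "yellow"]
index = {category: i for i, category in enumerate(categories)}
embedding_dim = 2

# Hypothetical embedding table: one row per category.
# Random values stand in for weights that a model would learn.
rng = random.Random(0)
embedding_table = [
    [rng.uniform(-1, 1) for _ in range(embedding_dim)] for _ in categories
]

def one_hot(category):
    """Sparse encoding: a vector as long as the number of categories."""
    vector = [0.0] * len(categories)
    vector[index[category]] = 1.0
    return vector

def embed(category):
    """Dense encoding: pick the row of the table for this category."""
    return embedding_table[index[category]]

def one_hot_times_table(category):
    """Multiply the one-hot vector by the table; this selects the same row."""
    encoded = one_hot(category)
    return [
        sum(encoded[i] * embedding_table[i][j] for i in range(len(categories)))
        for j in range(embedding_dim)
    ]

lookup = embed("blue")
product = one_hot_times_table("blue")
print("one-hot length:", len(one_hot("blue")))
print("embedding length:", len(lookup))
```

The lookup and the matrix product give the same vector, which is why an embedding layer can be thought of as a one-hot encoding followed by a learned linear map, implemented as a cheap row lookup.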

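The advantages of embeddings are their compactness and their ability to capture semantic relationships, both of which can improve model performance. However, embeddings cost more to compute than simple encodings such as one-hot encoding, because the vectors have to be learned or loaded, and their individual dimensions are harder to interpret.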

The choice between encoding and embedding depends on the dataset size, number of categories, and specific problem requirements. One-hot encoding is suitable for small datasets with few categories, while embeddings are preferred for large datasets, especially in NLP tasks and recommendation systems.
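The effect of the number of categories on this choice can be sketched with simple arithmetic. The embedding size of 8 and the category counts below are assumptions picked for illustration, not recommendations.

```python
def one_hot_width(num_categories):
    """Input width per sample when a category is one-hot encoded."""
    return num_categories

def embedding_parameters(num_categories, embedding_dim):
    """Number of values stored in an embedding table."""
    return num_categories * embedding_dim

embedding_dim = 8  # illustrative size, an assumption

for num_categories in (5, 100_000):
    print(
        f"{num_categories} categories:",
        f"one-hot width = {one_hot_width(num_categories)},",
        f"embedding width = {embedding_dim},",
        f"table size = {embedding_parameters(num_categories, embedding_dim)}",
    )
```

With 5 categories the one-hot vector is already shorter than the embedding and needs no training. With 100,000 categories each sample shrinks from 100,000 inputs to 8, at the cost of a table that has to be learned.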

In [1]:
# The Gensim library provides tools and algorithms for topic modeling, 
# document similarity analysis, and other natural language processing (NLP) tasks.
!pip install gensim

In [2]:
# Importing necessary libraries
import numpy as np
from gensim.models import Word2Vec

# Sample data
sentences = [['I', 'love', 'machine', 'learning'],
             ['machine', 'learning', 'is', 'awesome'],
             ['deep', 'learning', 'is', 'interesting']]

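# Training Word2Vec model
# vector_size=5: each word gets a 5-dimensional vector
# window=3: context spans up to 3 words on each side
# min_count=1: keep every word, even if it appears only once
# sg=1: use the skip-gram algorithm (sg=0 would use CBOW)
model = Word2Vec(sentences, vector_size=5, window=3, min_count=1, sg=1)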

# Getting embedding for a word
word_embedding = model.wv['machine']
print("Embedding for 'machine':", word_embedding)

# Getting embedding for a sentence by averaging the vectors of its words
sentence_embedding = np.mean([model.wv[word] for word in sentences[0]], axis=0)
print("Embedding for sentence 'I love machine learning':", sentence_embedding)


Embedding for 'machine': [ 0.1476101  -0.03066943 -0.09073226  0.13108103 -0.09720321]
Embedding for sentence 'I love machine learning': [-0.01302523  0.02925177  0.0208698   0.0414466  -0.10625307]
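One common way to use such vectors is to compare them with cosine similarity. The sketch below applies it to the two vectors printed above, copied by hand; the helper `cosine_similarity` is written here for illustration and is not part of the original notebook. With a three-sentence corpus the resulting number carries little meaning, so treat it as a demonstration of the mechanics only.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Vectors copied from the output above.
machine = [0.1476101, -0.03066943, -0.09073226, 0.13108103, -0.09720321]
sentence = [-0.01302523, 0.02925177, 0.0208698, 0.0414466, -0.10625307]

similarity = cosine_similarity(machine, sentence)
print("Cosine similarity between 'machine' and the sentence:", similarity)
```

For two words in the trained model's vocabulary, Gensim computes the same quantity directly with `model.wv.similarity('machine', 'learning')`.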
