# The World of Embeddings

## What are Embeddings?

* Concept from Natural Language Processing (NLP)
* Numerical representation of text
* Text is mapped onto a multi-dimensional **vector space**
* The numbers output by the model are the text's location in that space
    * Similar words appear closer together
    * Dissimilar words appear further away
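
To make "closer together" concrete, here is a minimal sketch that compares made-up 3-dimensional vectors. It assumes cosine similarity as the closeness measure, which is a common choice but not one these notes name, and the vectors in `toy_vectors` are invented for illustration rather than produced by a model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors, for illustration only; real embeddings have far more dimensions.
toy_vectors = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.75, 0.2],
    "car": [0.1, 0.2, 0.95],
}

sim_cat_kitten = cosine_similarity(toy_vectors["cat"], toy_vectors["kitten"])
sim_cat_car = cosine_similarity(toy_vectors["cat"], toy_vectors["car"])

print(f"cat vs kitten: {sim_cat_kitten:.3f}")
print(f"cat vs car:    {sim_cat_car:.3f}")
```

With these toy values, the related pair scores higher than the unrelated pair, which is the behaviour the bullets above describe.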

## Why are embeddings useful?

* Embeddings allow semantic meaning to be captured
* **Semantic meaning**: context and intent behind the text

Example:

- Which way is it to the supermarket?
- Could I have directions to the shop?

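These two questions use different words but mean nearly the same thing, so their embeddings lie close together and a system built on embeddings can return similar answers for both.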

## Semantic Search Engines

### Traditional search engines
* Use **keyword** pattern matching
* May miss the true intent
* Will miss word variations

Keyword matching only returns results that contain the words used in the original query.

### Semantic search engines
* Use embeddings to understand intent and context
> "comfortable running shoes"  --> 0.5481, 0.249, ...

* Search the vector space for the results closest to the query's embedding
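
The contrast between the two kinds of search can be sketched with a toy catalogue. Everything here is hypothetical: the `catalogue` entries, their 3-dimensional vectors, and `query_vector` are invented for illustration, and cosine similarity is assumed as the closeness measure. In practice the vectors would come from an embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical catalogue: each entry pairs a text with a made-up embedding.
catalogue = [
    ("cushioned trainers for jogging", [0.52, 0.27, 0.05]),
    ("leather office shoes", [0.30, 0.05, 0.60]),
    ("stainless steel kettle", [0.02, 0.90, 0.40]),
]

query_text = "comfortable running shoes"
query_vector = [0.55, 0.25, 0.08]  # made-up stand-in for the query's embedding

# Keyword search: keep entries that share at least one word with the query.
query_words = set(query_text.split())
keyword_hits = [text for text, _ in catalogue if query_words & set(text.split())]

# Semantic search: rank every entry by similarity to the query vector.
ranked = sorted(
    catalogue,
    key=lambda item: cosine_similarity(query_vector, item[1]),
    reverse=True,
)

print("keyword hits:", keyword_hits)
print("semantic top result:", ranked[0][0])
```

The keyword search only finds the entry that literally contains "shoes", while the ranking by vector similarity puts the jogging trainers first even though they share no words with the query.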

## Recommendation Systems

Semantic search is also useful for recommendation systems.

* Example: Job post recommendations
    * Recommended jobs based on descriptions already viewed
    * Mitigates variation in job title
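
One way to turn the job post example into code is to average the embeddings of the descriptions a user has already viewed and recommend the unseen job closest to that average. This is a sketch under assumptions: the averaging approach, the job titles, and all vectors below are invented for illustration and are not taken from these notes.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Element-wise average of equally sized vectors."""
    return [sum(values) / len(vectors) for values in zip(*vectors)]


# Made-up embeddings of job descriptions the user has already viewed.
viewed = {
    "Data Scientist": [0.8, 0.6, 0.1],
    "ML Engineer": [0.7, 0.7, 0.2],
}

# Made-up embeddings of job descriptions not yet seen.
candidates = {
    "Machine Learning Analyst": [0.75, 0.6, 0.15],
    "Pastry Chef": [0.1, 0.2, 0.9],
    "Forklift Operator": [0.2, 0.1, 0.8],
}

# Summarise the user's interests as one vector, then pick the nearest candidate.
profile = mean_vector(list(viewed.values()))
recommended = max(
    candidates,
    key=lambda title: cosine_similarity(profile, candidates[title]),
)

print("recommended:", recommended)
```

Because the comparison uses the description vectors rather than the title strings, a differently worded title can still be recommended.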

## Classification

Classification works much like recommendation: items are compared by their position in the vector space.

Classification tasks:

* Classify sentiment
* Cluster observations
* Categorization

* Example: Classifying news headlines
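
A headline classifier along these lines can be sketched by assigning each headline to the label whose vector is nearest. The labels, headlines, and 2-dimensional vectors are hypothetical, and Euclidean distance is an assumed choice of measure; with real embeddings the label vectors would typically come from embedding the label names or descriptions.

```python
import math

# Hypothetical, hand-picked 2-D vectors standing in for real embeddings.
label_vectors = {
    "Sport": [0.9, 0.1],
    "Tech": [0.1, 0.9],
}

headline_vectors = {
    "Local team wins championship final": [0.8, 0.2],
    "New smartphone chip doubles battery life": [0.2, 0.85],
}


def classify(vector, labels):
    """Return the label whose vector is nearest by Euclidean distance."""
    return min(labels, key=lambda name: math.dist(vector, labels[name]))


predictions = {
    headline: classify(vector, label_vectors)
    for headline, vector in headline_vectors.items()
}

for headline, label in predictions.items():
    print(f"{label:5} <- {headline}")
```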


## Creating an Embedding Request

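* Embeddings are requested from the Embeddings endpoint via `client.embeddings.create()`, which takes a `model` and an `input`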

```python
from openai import OpenAI
from dotenv import load_dotenv

# Load environment variables (such as the API key) from a local .env file
load_dotenv()
client = OpenAI()

# Request an embedding for a single piece of text
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Embeddings are a numerical representation of text that can be used to measure the relatedness between two pieces of text."
)
```

```python
# Convert the response object to a plain Python dictionary
response_dict = response.model_dump()
print(response_dict)
```

Output (truncated):

```
{'data': [{'embedding': [-0.016230566427111626, -0.01696062460541725, 0.0343233086168766, 0.0007829607930034399, ...
```

```python
print(response_dict['data'][0]['embedding'])
```

Output (truncated):

```
[-0.016230566427111626, -0.01696062460541725, 0.0343233086168766, 0.0007829607930034399, ...
```

```python
print(response_dict['usage'])
```

```
{'prompt_tokens': 23, 'total_tokens': 23}
```


```python
print(response_dict.keys())
```

```
dict_keys(['data', 'model', 'object', 'usage'])
```
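
The fields printed above can be gathered with a small helper. This is a sketch: `summarize_embedding_response` is a hypothetical name, and `sample_response` is a hand-written stand-in that mirrors the keys shown above with a shortened, made-up vector, so the snippet runs without calling the API.

```python
def summarize_embedding_response(response_dict):
    """Collect the vectors and token usage from a dumped embeddings response."""
    vectors = [item["embedding"] for item in response_dict["data"]]
    return {
        "count": len(vectors),
        "dimensions": len(vectors[0]) if vectors else 0,
        "total_tokens": response_dict["usage"]["total_tokens"],
    }


# Stand-in for response.model_dump(); the embedding values are made up and shortened.
sample_response = {
    "data": [{"embedding": [-0.016, -0.017, 0.034]}],
    "model": "text-embedding-3-small",
    "object": "list",
    "usage": {"prompt_tokens": 23, "total_tokens": 23},
}

summary = summarize_embedding_response(sample_response)
print(summary)
```

With a real response, passing `response.model_dump()` instead of `sample_response` should work the same way, since the helper only relies on the `data` and `usage` keys shown above.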
