# Local Embeddings

We will run a few embedding models locally and compare their performance.

## References

- LlamaIndex HuggingFace embedding example: https://docs.llamaindex.ai/en/stable/examples/embeddings/huggingface.html#huggingfaceembedding
- MTEB leaderboard: https://huggingface.co/spaces/mteb/leaderboard
- Explanation of the leaderboard: https://huggingface.co/blog/mteb

## Option 1 - Using Llama-Index

In [2]:
import os
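# Optional: point the LlamaIndex cache at ./cache instead of the default system tmp dir,
# so the downloaded artifacts are easy to inspect
os.environ['LLAMA_INDEX_CACHE_DIR'] = os.path.join(os.path.abspath(''), 'cache')

# Note: in llama-index 0.10 and later this import is expected to be
# `from llama_index.embeddings.huggingface import HuggingFaceEmbedding`
from llama_index.embeddings import HuggingFaceEmbedding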
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
embeddings = embed_model.get_text_embedding("Hello World!")

print(len(embeddings))
print(embeddings[:5])

384
[-0.0032757227309048176, -0.011690807528793812, 0.041559189558029175, -0.03814816102385521, 0.024183066561818123]
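
Embedding vectors like the one above are typically compared with cosine similarity. The following is a minimal sketch in plain Python, assuming two vectors of equal length; `cosine_similarity` is a helper introduced here for illustration and is not part of LlamaIndex.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors stand in for real 384-dimensional embeddings
v1 = [1.0, 0.0, 1.0]
v2 = [1.0, 0.0, 1.0]
v3 = [0.0, 1.0, 0.0]
print(cosine_similarity(v1, v2))  # same direction, so close to 1
print(cosine_similarity(v1, v3))  # orthogonal vectors
```

With the model loaded above, the same helper could be applied to two results of `embed_model.get_text_embedding(...)`.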


### Let's try a few embedding models

See the Hugging Face embedding models (sentence-transformers) here: https://huggingface.co/models?library=sentence-transformers&sort=trending

Here are a few select models for comparison, taken from the MTEB leaderboard: https://huggingface.co/spaces/mteb/leaderboard

| model name                              | overall score | model params | model size | embedding length | url                                                            |
|-----------------------------------------|---------------|--------------|------------|------------------|----------------------------------------------------------------|
| intfloat/e5-mistral-7b-instruct         | 66.x          | 7.11 B       | 15 GB      | 4096             | https://huggingface.co/intfloat/e5-mistral-7b-instruct         |
| BAAI/bge-large-en-v1.5                  | 64.x          | 335 M        | 1.34 GB    | 1024             | https://huggingface.co/BAAI/bge-large-en-v1.5                  |
| BAAI/bge-small-en-v1.5                  | 62.x          | 33.5 M       | 133 MB     | 384              | https://huggingface.co/BAAI/bge-small-en-v1.5                  |
| sentence-transformers/all-mpnet-base-v2 | 57.8          |              | 438 MB     | 768              | https://huggingface.co/sentence-transformers/all-mpnet-base-v2 |
| sentence-transformers/all-MiniLM-L12-v2 | 56.x          |              | 134 MB     | 384              | https://huggingface.co/sentence-transformers/all-MiniLM-L12-v2 |
| sentence-transformers/all-MiniLM-L6-v2  | 56.x          |              | 91 MB      | 384              | https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2  |
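
The model size column roughly tracks parameter count times bytes per parameter, and the embedding length determines how much storage each vector needs. The sketch below illustrates that arithmetic. The 4-byte (float32) and 2-byte (float16) widths are assumptions about how the weights are stored, not values taken from the model cards, and both function names are hypothetical.

```python
def weights_size_bytes(num_params, bytes_per_param=4):
    """Approximate size of the model weights (assumes one uniform dtype)."""
    return num_params * bytes_per_param

def vector_size_bytes(embedding_length, bytes_per_value=4):
    """Storage for one embedding vector (assumes float32 values)."""
    return embedding_length * bytes_per_value

# Parameter counts and embedding lengths are taken from the table above
print(weights_size_bytes(335_000_000) / 1e9)       # bge-large, float32, in GB
print(weights_size_bytes(33_500_000) / 1e6)        # bge-small, float32, in MB
print(weights_size_bytes(7_110_000_000, 2) / 1e9)  # e5-mistral, float16, in GB
print(vector_size_bytes(384), vector_size_bytes(1024), vector_size_bytes(4096))
```

Under these assumptions the estimates land in the same range as the sizes listed in the table, and a 4096-dimensional vector takes a bit more than ten times the storage of a 384-dimensional one.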

### Benchmark

In [3]:
embedding_models = [
    'BAAI/bge-large-en-v1.5' ,
    'BAAI/bge-small-en-v1.5' ,
    'sentence-transformers/all-mpnet-base-v2' ,
    'sentence-transformers/all-MiniLM-L12-v2' ,
    'sentence-transformers/all-MiniLM-L6-v2' ,
]

for model in embedding_models:
    # Loading the model downloads it on first use, so keep this outside the timed call
    embed_model = HuggingFaceEmbedding(model_name=model)

    # One untimed call to report the embedding length (it also serves as a warm-up)
    embeddings = embed_model.get_text_embedding("Hello World!")
    print(f'model={model}, embedding_length={len(embeddings):,}')
    # %timeit is an IPython magic, so this cell only runs in a notebook or IPython
    %timeit (embed_model.get_text_embedding("Hello World!"))
    print()


model=BAAI/bge-large-en-v1.5, embedding_length=1,024
12.7 ms ± 211 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

model=BAAI/bge-small-en-v1.5, embedding_length=384
6.61 ms ± 76.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

model=sentence-transformers/all-mpnet-base-v2, embedding_length=768
6.61 ms ± 125 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

model=sentence-transformers/all-MiniLM-L12-v2, embedding_length=384
6.85 ms ± 333 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

model=sentence-transformers/all-MiniLM-L6-v2, embedding_length=384
3.84 ms ± 103 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
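
The `%timeit` magic used above is only available in IPython and Jupyter. To run a comparable measurement from a plain Python script, a sketch along these lines could be used. It relies on the standard-library `timeit` module; `time_call` and `dummy_workload` are hypothetical names, and the placeholder workload stands in for the embedding call so that the block runs without downloading a model.

```python
import statistics
import timeit

def time_call(fn, number=100, repeat=7):
    """Return (mean, std dev) in seconds per call over `repeat` runs of `number` calls."""
    totals = timeit.repeat(fn, number=number, repeat=repeat)
    per_call = [total / number for total in totals]
    return statistics.mean(per_call), statistics.pstdev(per_call)

# Placeholder workload; in the notebook this would be something like
# lambda: embed_model.get_text_embedding("Hello World!")
def dummy_workload():
    return sum(i * i for i in range(1000))

mean_s, std_s = time_call(dummy_workload, number=100, repeat=7)
print(f"{mean_s * 1e3:.3f} ms +/- {std_s * 1e6:.1f} us per loop "
      f"(mean +/- std. dev. of 7 runs, 100 loops each)")
```

The defaults of 7 runs and 100 loops mirror the settings reported in the output above; absolute numbers will vary with hardware and with whether the model runs on CPU or GPU.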



## Option 2 - Using Sentence Transformers

In [4]:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
# encode() accepts a single string or a list of strings, e.g.
# model.encode(["This is an example sentence", "Each sentence is converted"])
embeddings = model.encode('a happy dog!')
print(model)            # pipeline: Transformer -> Pooling -> Normalize
print(len(embeddings))  # embedding length
print(embeddings[:5])   # first few values



SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
  (2): Normalize()
)
384
[-0.00542399  0.07206922 -0.02727443  0.04371347 -0.0695779 ]
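
The printed pipeline shows mean pooling over token embeddings (`pooling_mode_mean_tokens: True`) followed by a `Normalize()` step. As a rough illustration of those two steps, and not the library's actual implementation, the sketch below averages toy token vectors while skipping padding positions and then scales the result to unit length. `mean_pool` and `l2_normalize` are hypothetical helper names.

```python
import math

def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask value is 1 (padding is skipped)."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]

# Three toy token vectors of dimension 2; the last one is padding
tokens = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)     # [2.0, 3.0]
sentence_vec = l2_normalize(pooled)  # unit-length sentence embedding
print(pooled)
print(sentence_vec)
```

Because the final vector has unit length, the dot product of two such embeddings equals their cosine similarity.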
