# Local Embeddings
In this Python notebook, we will run embedding models locally and compare their performance.

## Using LlamaIndex
We'll be making use of LlamaIndex to help us download the embedding models from Hugging Face.

In [None]:
# If you face issues running the code cell below, uncomment the 2 lines below and run this code cell to install some libraries

# !pip install llama-index-llms-huggingface
# !pip install llama-index-embeddings-huggingface

In [1]:
import os
# Optional: point the LlamaIndex cache dir at ../cache (the default is the system tmp dir),
# so that we can easily see the downloaded artifacts
os.environ['LLAMA_INDEX_CACHE_DIR'] = os.path.join(os.path.abspath('../'), 'cache')

# from llama_index.embeddings import HuggingFaceEmbedding
# If you face an import error, uncomment the line above and comment out the line below
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
embeddings = embed_model.get_text_embedding("Hello World!")

print(len(embeddings))
print(embeddings[:5])

  from .autonotebook import tqdm as notebook_tqdm
config.json: 100%|██████████████████████████████| 743/743 [00:00<00:00, 939kB/s]
model.safetensors: 100%|█████████████████████| 133M/133M [00:09<00:00, 14.2MB/s]
tokenizer_config.json: 100%|███████████████████| 366/366 [00:00<00:00, 2.41MB/s]
vocab.txt: 100%|█████████████████████████████| 232k/232k [00:00<00:00, 8.23MB/s]
tokenizer.json: 100%|████████████████████████| 711k/711k [00:00<00:00, 16.0MB/s]
special_tokens_map.json: 100%|██████████████████| 125/125 [00:00<00:00, 331kB/s]


384
[-0.003275729948654771, -0.011690844781696796, 0.04155920818448067, -0.03814816102385521, 0.02418307028710842]


Awesome! As you can see from the output above, we downloaded the `BAAI/bge-small-en-v1.5` model from Hugging Face through LlamaIndex and used it to get the vector representation of the string "Hello World!": a list of 384 floats.
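
To give a feel for what such a vector is useful for, here is a minimal sketch of cosine similarity, a common way to compare two embeddings. The function name `cosine_similarity` and the toy 3-dimensional vectors are illustrative assumptions, not part of the notebook; real embeddings from the model above would have 384 dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings (assumption for illustration).
v1 = [1.0, 0.0, 1.0]
v2 = [2.0, 0.0, 2.0]  # same direction as v1, different length
v3 = [0.0, 1.0, 0.0]  # orthogonal to v1

print(cosine_similarity(v1, v2))  # close to 1: same direction
print(cosine_similarity(v1, v3))  # 0: nothing in common
```

In the notebook, the same function could be given two lists returned by `embed_model.get_text_embedding(...)` for two different strings.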

### Let's Try a Few Embedding Models
As mentioned earlier in the quest, Hugging Face has many different models available for us to choose from. In this quest, we'll be using the following three models:

| model name                              | overall score | model params | model size | embedding length | url                                                            |
|-----------------------------------------|---------------|--------------|------------|------------------|----------------------------------------------------------------|
| BAAI/bge-small-en-v1.5                  | 62.x          | 33.5 M       | 133 MB     | 384              | https://huggingface.co/BAAI/bge-small-en-v1.5                  |
| sentence-transformers/all-mpnet-base-v2 | 57.8          |              | 438 MB     | 768              | https://huggingface.co/sentence-transformers/all-mpnet-base-v2 |
| sentence-transformers/all-MiniLM-L6-v2  | 56.x          |              | 91 MB      | 384              | https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2  |

If you would like to, you can also download other models provided by [Hugging Face](https://huggingface.co/spaces/mteb/leaderboard) and compare the results you get from running them!
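
As a rough sanity check on the table above, the model size and parameter count can be related with simple arithmetic. The sketch below assumes the weights are stored as 32-bit floats (4 bytes each) and that the file size is dominated by the weights; both are assumptions, and the results are estimates only, not official figures for the models.

```python
BYTES_PER_FLOAT32 = 4  # assumption: weights are stored as 32-bit floats

def estimated_size_mb(num_params):
    """Approximate weight-file size in MB (1 MB = 10**6 bytes)."""
    return num_params * BYTES_PER_FLOAT32 / 1e6

def estimated_params_millions(size_mb):
    """Approximate parameter count, in millions, implied by a size in MB."""
    return size_mb / BYTES_PER_FLOAT32

# 33.5 M parameters -> about 134 MB, close to the 133 MB listed in the table
print(estimated_size_mb(33.5e6))       # 134.0

# Going the other way for the two rows without a parameter count (estimates only)
print(estimated_params_millions(438))  # 109.5
print(estimated_params_millions(91))   # 22.75
```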

### Benchmark
The code cell below benchmarks our three chosen models. For each one, it downloads the model to our local machine, uses it to create an embedding for the string "Hello World!", and times how long the model takes to generate that embedding.

Don't worry if you see the `TqdmWarning: IProgress not found` warning message; it should not prevent your code from running.

After running the code cell below, look in the `cache` folder in the root directory of your project: you should see that these three embedding models have been downloaded.
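
If you would rather check the `cache` folder from Python than from a file browser, something like the following sketch could be used once the models have been downloaded. The helper name `list_cache_entries` is hypothetical, and the exact layout of the cache folder is not assumed: the sketch simply sums file sizes under each top-level entry and skips symbolic links so that linked files are not counted twice.

```python
import os

def list_cache_entries(cache_dir):
    """Return (entry name, total size in bytes) for each top-level entry."""
    results = []
    for entry in sorted(os.listdir(cache_dir)):
        path = os.path.join(cache_dir, entry)
        size = 0
        if os.path.isdir(path):
            # Add up every regular file below this directory.
            for root, _dirs, files in os.walk(path):
                for name in files:
                    file_path = os.path.join(root, name)
                    if not os.path.islink(file_path):  # skip symlinks
                        size += os.path.getsize(file_path)
        elif os.path.isfile(path):
            size = os.path.getsize(path)
        results.append((entry, size))
    return results

# Same location as the LLAMA_INDEX_CACHE_DIR set earlier in this notebook.
cache_dir = os.path.join(os.path.abspath('../'), 'cache')
if os.path.isdir(cache_dir):
    for name, size in list_cache_entries(cache_dir):
        print(f'{name}: {size / 1e6:.1f} MB')
else:
    print(f'No cache directory found at {cache_dir}')
```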

In [2]:
# %timeit is an IPython magic, so no timing module needs to be imported here

embedding_models = [
    'BAAI/bge-small-en-v1.5',
    'sentence-transformers/all-mpnet-base-v2',
    'sentence-transformers/all-MiniLM-L6-v2',
]

for model in embedding_models:
    embed_model = HuggingFaceEmbedding(model_name=model)
    embeddings = embed_model.get_text_embedding("Hello World!")
    print(f'model={model}, embedding_length={len(embeddings):,}')
    %timeit (embed_model.get_text_embedding("Hello World!"))
    print()

model=BAAI/bge-small-en-v1.5, embedding_length=384
16.5 ms ± 589 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

model=sentence-transformers/all-mpnet-base-v2, embedding_length=768
19.3 ms ± 488 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

model=sentence-transformers/all-MiniLM-L6-v2, embedding_length=384
9.73 ms ± 189 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
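
The `%timeit` magic used above only works inside IPython or Jupyter. If you want a similar measurement in a plain Python script, a minimal sketch with the standard-library `timeit` module could look like this. The `fake_embed` function is a hypothetical stand-in so that the example runs without downloading a model, and the 7 runs of 100 loops simply mirror the numbers shown in the output above.

```python
import timeit

def fake_embed(text):
    """Hypothetical stand-in for embed_model.get_text_embedding."""
    return [float(ord(ch)) for ch in text]

def benchmark(func, arg, loops=100, runs=7):
    """Return (mean, best) seconds per call over `runs` runs of `loops` calls."""
    totals = timeit.repeat(lambda: func(arg), number=loops, repeat=runs)
    per_call = [total / loops for total in totals]
    return sum(per_call) / len(per_call), min(per_call)

mean_s, best_s = benchmark(fake_embed, "Hello World!")
print(f'mean={mean_s * 1e3:.4f} ms, best={best_s * 1e3:.4f} ms per call')
```

In the notebook, `embed_model.get_text_embedding` could be passed in place of `fake_embed` to time a real model.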



## [OPTIONAL] Using SentenceTransformers
Besides LlamaIndex, you can also use other libraries such as SentenceTransformers to download and run embedding models; the code cell below demonstrates this.

In [3]:
from sentence_transformers import SentenceTransformer
# sentences = ["This is an example sentence", "Each sentence is converted"]

model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
# embeddings = model.encode(sentences)
embeddings = model.encode('a happy dog!')
print(model)
print(len(embeddings))
print(embeddings[:5])

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
.gitattributes: 100%|██████████████████████████████████████████████████████████████████████████████| 1.18k/1.18k [00:00<00:00, 3.29MB/s]
1_Pooling/config.json: 100%|████████████████████████████████████████████████████████████████████████████| 190/190 [00:00<00:00, 626kB/s]
README.md: 100%|███████████████████████████████████████████████████████████████████████████████████| 10.6k/10.6k [00:00<00:00, 11.7MB/s]
config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████| 612/612 [00:00<00:00, 1.96MB/s]
config_sentence_transformers.json: 100%|████████████████████████████████████████████████████████████████| 116/116 [00:00<00:00, 425kB/s]
data_config.json: 100%|█████████

SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
  (2): Normalize()
)
384
[-0.00542394  0.07206918 -0.02727438  0.04371348 -0.06957784]





And we're done with this notebook! Please **head back to the Quest page on StackUp now**.