# Infinity

`Infinity` lets you create `Embeddings` using an MIT-licensed embedding server via the `AsyncEmbeddingEngine`.

This notebook shows how to use LangChain embeddings with the [Infinity GitHub project](https://github.com/michaelfeil/infinity).


## Imports

In [1]:
from langchain_community.embeddings import InfinityEmbeddingsLocal

## Optional: install infinity

To install Infinity together with its torch and ONNX dependencies, use the following command. For further details, see the [docs on GitHub](https://github.com/michaelfeil/infinity).

```bash
pip install "infinity_emb[torch,optimum]"
```

In [2]:
# Install the infinity package
%pip install --upgrade --quiet  infinity_emb[optimum,torch]

Note: you may need to restart the kernel to use updated packages.


## Embed your documents using your Infinity instance 

In [3]:
documents = [
    "Baguette is a dish.",
    "Paris is the capital of France.",
    "numpy is a lib for linear algebra",
    "You escaped what I've escaped - You'd be in Paris getting fucked up too",
]
query = "Where is Paris?"

In [7]:
embeddings = InfinityEmbeddingsLocal(
    model="sentence-transformers/all-MiniLM-L6-v2",
    # model revision to load; None selects the default
    revision=None,
    # best to keep at 32
    batch_size=32,
    # for AMD/Nvidia GPUs via torch
    device="cuda",
    # warm up model before execution
)


async def embed():
    # This function only demonstrates that the calls can run asynchronously.

    # Important: use the engine inside an `async with` block, which starts
    # and stops the batching engine.
    async with embeddings:
        # Avoid starting and stopping the engine often; keep it running instead.
        # For more granular control, you may call `await embeddings.__aenter__()`
        # and `__aexit__()` yourself, if you know when to start and stop
        # execution manually.

        documents_embedded = await embeddings.aembed_documents(documents)
        query_result = await embeddings.aembed_query(query)
        print("embeddings created successful")
    return documents_embedded, query_result

The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
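
The comments in the cell above mention starting and stopping the engine manually instead of using `async with`. A minimal sketch of that pattern follows. `StubEngine` and `serve_queries` are hypothetical stand-ins invented for illustration so the example runs without a model; they are not part of Infinity or LangChain, and the real object may behave differently.

```python
import asyncio


class StubEngine:
    """Hypothetical stand-in for an embeddings object with an async lifecycle."""

    def __init__(self):
        self.running = False

    async def __aenter__(self):
        self.running = True  # a real engine would start batching here
        return self

    async def __aexit__(self, exc_type, exc, tb):
        self.running = False  # a real engine would stop batching here

    async def aembed_query(self, text):
        if not self.running:
            raise RuntimeError("engine is not running")
        return [float(len(text))]  # toy "embedding", not model output


async def serve_queries(engine, queries):
    await engine.__aenter__()  # start once
    try:
        # reuse the running engine for every query
        return [await engine.aembed_query(q) for q in queries]
    finally:
        await engine.__aexit__(None, None, None)  # always stop, even on errors


engine = StubEngine()
results = asyncio.run(serve_queries(engine, ["Where is Paris?", "hi"]))
```

The `try`/`finally` mirrors what `async with` does automatically, which is why the context-manager form is usually the simpler choice.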


In [5]:
# run the async code however you would like
# if you are in a jupyter notebook, you can use the following
documents_embedded, query_result = await embed()

embeddings created successful
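
Outside a notebook there is no running event loop, so a top-level `await` is not available. One option, sketched below, is the standard library's `asyncio.run`. The `embed_stub` coroutine is a hypothetical stand-in for `embed()` that returns toy vectors so the sketch runs without a model; in a real script you would pass `embed()` instead.

```python
import asyncio


async def embed_stub():
    # Hypothetical stand-in for the notebook's `embed()` coroutine;
    # it returns fixed toy vectors rather than real embeddings.
    await asyncio.sleep(0)
    documents_embedded = [[1.0, 0.0], [0.0, 1.0]]
    query_result = [1.0, 0.0]
    return documents_embedded, query_result


def main():
    # asyncio.run creates an event loop, runs the coroutine to completion,
    # and closes the loop. It cannot be called from inside a running loop
    # (such as a Jupyter kernel), which is why the notebook uses `await`.
    return asyncio.run(embed_stub())


documents_embedded, query_result = main()
```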


In [8]:
# (demo) compute similarity
import numpy as np

scores = np.array(documents_embedded) @ np.array(query_result).T
dict(zip(documents, scores))

{'Baguette is a dish.': 0.31341904,
 'Paris is the capital of France.': 0.8148992,
 'numpy is a lib for linear algebra': 0.0044837985,
 "You escaped what I've escaped - You'd be in Paris getting fucked up too": 0.50892454}
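
The demo above uses a plain dot product, which equals cosine similarity only when the vectors have unit length. Whether a given model returns normalized vectors is an assumption worth checking, so the sketch below normalizes explicitly and then ranks the documents. The helper `cosine_scores` and the toy 2-D vectors are made up for illustration and are not model output.

```python
import numpy as np


def cosine_scores(doc_vectors, query_vector):
    """Cosine similarity between each document vector and the query."""
    docs = np.asarray(doc_vectors, dtype=float)
    q = np.asarray(query_vector, dtype=float)
    # Normalize explicitly so the result does not depend on the model
    # returning unit-length vectors.
    docs_unit = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    q_unit = q / np.linalg.norm(q)
    return docs_unit @ q_unit


# Toy vectors, invented for illustration.
toy_docs = ["doc a", "doc b", "doc c"]
toy_vectors = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]
toy_query = [1.0, 0.0]

toy_scores = cosine_scores(toy_vectors, toy_query)
# Sort from most to least similar.
ranked = sorted(zip(toy_docs, toy_scores), key=lambda pair: pair[1], reverse=True)
```

With real embeddings, `documents_embedded` and `query_result` would take the place of the toy vectors.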