# Xorbits inference (Xinference)

This notebook shows how to use Xinference embeddings within LangChain.

## Installation

Install `Xinference` through PyPI:

In [None]:
%pip install "xinference[all]"

## Deploy Xinference in a Distributed Cluster

First, start an Xinference supervisor with the `xinference-supervisor` command. Use the `-p` option to specify the port and `-H` to specify the host. The default port is 9997.

Then, start an Xinference worker with `xinference-worker` on each server that should run one.
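
As a rough sketch of the two steps above, the following Python helpers assemble the command lines without running them. The helper names and the host `10.0.0.5` are hypothetical. The `-H` and `-p` options come from the text above; the `-e` option used to point a worker at the supervisor endpoint is an assumption, so check `xinference-worker --help` for your installed version.

```python
from typing import List


def supervisor_command(host: str, port: int = 9997) -> List[str]:
    """Build the argv list for starting the supervisor on host:port."""
    return ["xinference-supervisor", "-H", host, "-p", str(port)]


def supervisor_endpoint(host: str, port: int = 9997) -> str:
    """Return the HTTP endpoint that clients and workers connect to."""
    return f"http://{host}:{port}"


def worker_command(endpoint: str) -> List[str]:
    """Build the argv list for starting a worker.

    Assumption: `-e` passes the supervisor endpoint to the worker.
    """
    return ["xinference-worker", "-e", endpoint]


# Hypothetical supervisor host; replace with your own address.
host = "10.0.0.5"
endpoint = supervisor_endpoint(host)

print(" ".join(supervisor_command(host)))
print(" ".join(worker_command(endpoint)))
```

Each list can be passed to `subprocess.Popen` on the relevant machine, or joined into a string and run in a shell.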

## Wrapper

To use Xinference with LangChain, you first need to launch a model. You can use `RESTfulClient` to do so:

In [None]:
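from xinference.client import RESTfulClient

# Connect to the supervisor (default port 9997).
client = RESTfulClient("http://0.0.0.0:9997")

# Launch the model; the returned UID identifies it in later calls.
model_uid = client.launch_model(model_name="orca", quantization="q4_0", embedding="True")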

Now you can use Xinference embeddings within LangChain:

In [None]:
from langchain.embeddings import XinferenceEmbeddings

xinference = XinferenceEmbeddings(
    server_url="http://0.0.0.0:9997",
    model_uid=model_uid
)

In [None]:
query_result = xinference.embed_query("This is a test query")

In [None]:
doc_result = xinference.embed_documents(["text A", "text B"])
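
`embed_query` returns a single vector as a list of floats, and `embed_documents` returns one such vector per input text. One common use is ranking documents by similarity to a query. The sketch below does this with cosine similarity in plain Python; the helper names are hypothetical, and the toy vectors stand in for `query_result` and `doc_result` so the code runs without a server.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]]
) -> List[int]:
    """Return document indices ordered from most to least similar."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Toy stand-ins for query_result and doc_result.
toy_query = [1.0, 0.0, 0.0]
toy_docs = [[0.0, 1.0, 0.0], [2.0, 0.0, 0.0]]

ranking = rank_documents(toy_query, toy_docs)
print(ranking)
```

With real embeddings, call `rank_documents(query_result, doc_result)` instead of using the toy vectors.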

Finally, terminate the model when you no longer need it:

In [None]:
client.terminate_model(model_uid)
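
If an exception interrupts the notebook before this cell runs, the model stays loaded. One way to guard against that is to wrap the launch and terminate calls in a context manager. This is a minimal sketch, not part of Xinference or LangChain: `launched_model` is a hypothetical helper that relies only on the `launch_model` and `terminate_model` methods used above.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def launched_model(client: Any, **launch_kwargs: Any) -> Iterator[str]:
    """Launch a model and terminate it on exit, even if an error occurs.

    `client` is any object with `launch_model(**kwargs)` and
    `terminate_model(model_uid)`, such as the RESTfulClient used above.
    """
    model_uid = client.launch_model(**launch_kwargs)
    try:
        yield model_uid
    finally:
        # Runs on normal exit and when the body raises.
        client.terminate_model(model_uid)


# Usage sketch (requires a running Xinference server):
#
# with launched_model(client, model_name="orca", quantization="q4_0") as uid:
#     xinference = XinferenceEmbeddings(
#         server_url="http://0.0.0.0:9997", model_uid=uid
#     )
#     query_result = xinference.embed_query("This is a test query")
```

The trade-off is that the model is relaunched each time the block is entered, so this pattern suits short, self-contained jobs better than long interactive sessions.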