# Xorbits Inference (Xinference)

[Xinference](https://github.com/xorbitsai/inference) is a library for serving LLMs, speech recognition models, and multimodal models, even on a laptop. It supports a variety of GGML-compatible models, such as chatglm, baichuan, whisper, vicuna, orca, and many others. This notebook demonstrates how to use Xinference with LangChain.

## Installation

Install `Xinference` through PyPI:

In [None]:
%pip install "xinference[all]"

## Deploy Xinference in a Distributed Cluster

First, start an Xinference supervisor with the `xinference-supervisor` command. Use the `-p` option to specify the port and `-H` to specify the host. The default port is 9997.

Then start an Xinference worker with `xinference-worker` on each server that should run models.
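The two commands can be assembled as in the minimal sketch below. The helper names `supervisor_command` and `worker_command` are hypothetical and exist only for illustration. The `-H` and `-p` options come from the text above. The worker's `-e` option, which points it at the supervisor endpoint, is an assumption that should be checked against `xinference-worker --help` for your installed version.

```python
import shlex
from typing import List

DEFAULT_PORT = 9997  # default supervisor port mentioned above


def supervisor_command(host: str, port: int = DEFAULT_PORT) -> List[str]:
    """Build the argv for starting a supervisor on the given host and port."""
    return ["xinference-supervisor", "-H", host, "-p", str(port)]


def worker_command(supervisor_host: str, port: int = DEFAULT_PORT) -> List[str]:
    """Build the argv for starting a worker that connects to a supervisor.

    Assumption: the worker accepts the supervisor endpoint via `-e`.
    """
    endpoint = f"http://{supervisor_host}:{port}"
    return ["xinference-worker", "-e", endpoint]


# Print the commands so they can be copied into a shell on each machine.
print(shlex.join(supervisor_command("0.0.0.0")))
print(shlex.join(worker_command("0.0.0.0")))
```

Building the commands as lists keeps them safe to pass to `subprocess.run` without shell quoting issues, should you prefer to launch the processes from Python.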

## Wrapper

To use Xinference with LangChain, you first need to launch a model. You can use the command line interface (CLI) to do so:

In [2]:
!xinference launch -n orca -s 3 -q q4_0

Model uid: c2dc8072-277e-11ee-8b88-d29396a3f064
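If the launch command is run from a script rather than a notebook cell, the uid can be pulled out of the captured output instead of being copied by hand. The sketch below assumes the CLI prints a line of the form `Model uid: <uuid>`, as in the output above. The helper name `extract_model_uid` is hypothetical, and other Xinference versions may format this line differently.

```python
import re
from typing import Optional

# Matches "Model uid: " followed by a UUID in 8-4-4-4-12 hexadecimal form.
_UID_PATTERN = re.compile(
    r"Model uid:\s*([0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})"
)


def extract_model_uid(cli_output: str) -> Optional[str]:
    """Return the model uid found in the CLI output, or None if absent."""
    match = _UID_PATTERN.search(cli_output)
    return match.group(1) if match else None


sample_output = "Model uid: c2dc8072-277e-11ee-8b88-d29396a3f064\n"
print(extract_model_uid(sample_output))
```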


A model uid is returned for you to use. Now you can use the Xinference LLM within LangChain:

In [5]:
from langchain.llms import Xinference

llm = Xinference(
    server_url="http://0.0.0.0:9997",
    model_uid="c2dc8072-277e-11ee-8b88-d29396a3f064",
)

llm(
    prompt="Q: where can we visit in the capital of France? A:",
    generate_config={"max_tokens": 1024, "stream": True},
)

' The Eiffel Tower is a famous landmark in Paris, France. It is located on the Champ de Mars and is one of the most visited places in the world. Additionally, you can visit the Louvre Museum, Notre-Dame Cathedral, or the Palace of Versailles.'

### Integrate with an LLMChain

In [7]:
from langchain import PromptTemplate, LLMChain

template = "Where can we visit in the capital of {country}?"

prompt = PromptTemplate(template=template, input_variables=["country"])

llm_chain = LLMChain(prompt=prompt, llm=llm)

generated = llm_chain.run(country="France")
print(generated)

 We can visit the Eiffel Tower, the Louvre Museum, Notre-Dame Cathedral and the Palace of Versailles.
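To see the text the chain sends to the model, the template can be rendered by hand. This sketch uses plain `str.format` rather than LangChain itself. For a simple template with a single `{country}` placeholder it should produce the same string as the `PromptTemplate` above. The `render` helper is hypothetical.

```python
template = "Where can we visit in the capital of {country}?"


def render(template: str, **variables: str) -> str:
    """Fill the template's placeholders with the given variables."""
    return template.format(**variables)


# This is the prompt text that reaches the model for country="France".
print(render(template, country="France"))
```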


Lastly, terminate the model when you no longer need it:

In [8]:
!xinference terminate --model-uid "c2dc8072-277e-11ee-8b88-d29396a3f064"