The following notebook shows how to use a vLLM server that exposes an OpenAI-compatible REST API. It covers two approaches:

* **Using the OpenAI library directly.** The advantage is that we can pass extra parameters that vLLM supports but the standard OpenAI API does not. They go in the `extra_body` part of the request.

* **Using LangChain with `ChatOpenAI`.** In this case we are limited to the standard parameters; we have not been able to use the extra features of vLLM.

**Note on LoRA adapters:** I think `ChatOpenAI` should work with LoRA adapters, because the server treats each adapter as just another model. We would probably need one Docker image per base model, bundled with its set of associated LoRA adapters.
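If adapters really do show up as extra models, the models endpoint of the server should list them next to the base model. The sketch below checks for that before choosing a model name. It is untested against a LoRA-enabled server: the helper names and the adapter name `sentiment-lora` are hypothetical, and we assume the adapter was registered when the server was started.

```python
def available_model_ids(client):
    """Return the model ids reported by the server's /v1/models endpoint."""
    return [model.id for model in client.models.list().data]


def resolve_model(client, preferred, fallback):
    """Use the preferred model (e.g. a LoRA adapter) if the server lists it."""
    return preferred if preferred in available_model_ids(client) else fallback


# Hypothetical usage with the OpenAI client created in the next section:
# model_name = resolve_model(
#     client,
#     preferred="sentiment-lora",  # assumed adapter name, not from this notebook
#     fallback="NousResearch/Meta-Llama-3-8B-Instruct",
# )
```

The resolved name can then be passed as `model=` to either the OpenAI client or `ChatOpenAI`.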

## Using the OpenAI library to interact directly with the vLLM server

In [1]:
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="token-abc123",
)

completion = client.chat.completions.create(
  model="NousResearch/Meta-Llama-3-8B-Instruct",
  messages=[
    {"role": "user", "content": "Hello!"}
  ]
)

print(completion.choices[0].message)

ChatCompletionMessage(content="Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?", role='assistant', function_call=None, tool_calls=None)


#### Extra parameters for Chat API

[vLLM supports a set of parameters that are not part of the OpenAI API. In order to use them, you can pass them as extra parameters in the OpenAI client. Or directly merge them into the JSON payload if you are using HTTP call directly.](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#extra-parameters)
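For the plain HTTP route mentioned above, here is a minimal sketch using only the standard library. The point it illustrates is that the extra parameter sits at the top level of the JSON body rather than under an `extra_body` key. The helper names `build_chat_payload` and `post_chat` are our own, and the URL and token are the ones assumed throughout this notebook.

```python
import json
import urllib.request


def build_chat_payload(model, messages, **vllm_extras):
    """Return a chat-completions body with vLLM extras merged at the top level."""
    payload = {"model": model, "messages": messages}
    payload.update(vllm_extras)  # e.g. guided_choice=[...]
    return payload


def post_chat(base_url, api_key, payload, timeout=60):
    """POST the payload to <base_url>/chat/completions and return the parsed JSON."""
    request = urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_chat_payload(
    "NousResearch/Meta-Llama-3-8B-Instruct",
    [{"role": "user", "content": "Classify this sentiment: vLLM is wonderful!"}],
    guided_choice=["positive", "negative"],
)
```

With the server running, `post_chat("http://localhost:8000/v1", "token-abc123", payload)` should return the same structure the OpenAI client wraps, with the answer under `["choices"][0]["message"]["content"]`.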

#### Without `guided_choice`

In [2]:
completion = client.chat.completions.create(
  model="NousResearch/Meta-Llama-3-8B-Instruct",
  messages=[
    {"role": "user", "content": "Classify this sentiment: vLLM is wonderful!"}
  ],
)

print(completion.choices[0].message)

ChatCompletionMessage(content='I would classify this sentiment as POSITIVE. The use of the word "wonderful" is a strong positive adjective, and the all-caps "vLLM" suggests a sense of enthusiasm and excitement.', role='assistant', function_call=None, tool_calls=None)


#### With `guided_choice`

In [3]:
completion = client.chat.completions.create(
  model="NousResearch/Meta-Llama-3-8B-Instruct",
  messages=[
    {"role": "user", "content": "Classify this sentiment: vLLM is wonderful!"}
  ],
  extra_body={
    "guided_choice": ["positive", "negative"]
  }
)

print(completion.choices[0].message)

ChatCompletionMessage(content='positive', role='assistant', function_call=None, tool_calls=None)
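Since the guided response is a bare label, the call can be wrapped in a small function that returns the label directly. The following is a sketch only: `classify` is a hypothetical helper, it takes the `client` created above as an argument, and the membership check is a defensive assumption in case a server ignores the extra parameter.

```python
def classify(client, model, text, labels):
    """Ask the server for one of `labels` via vLLM's guided_choice parameter."""
    labels = list(labels)
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": f"Classify this sentiment: {text}"}],
        extra_body={"guided_choice": labels},
    )
    label = completion.choices[0].message.content
    # Guard against servers that silently ignore guided_choice.
    if label not in labels:
        raise ValueError(f"Unexpected label {label!r}; expected one of {labels}")
    return label


# Usage with the client defined above (requires the running server):
# classify(client, "NousResearch/Meta-Llama-3-8B-Instruct",
#          "vLLM is wonderful!", ["positive", "negative"])
```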


## Using LangChain to interact with the vLLM server

In [4]:
from langchain_openai import ChatOpenAI


inference_server_url = "http://localhost:8000/v1"

chat = ChatOpenAI(
    model="NousResearch/Meta-Llama-3-8B-Instruct",
    openai_api_key="EMPTY",
    openai_api_base=inference_server_url,
    max_tokens=100,
    temperature=0,
)

In [5]:
from langchain_core.messages import HumanMessage, SystemMessage

messages = [
    SystemMessage(
        content="You are a helpful assistant that translates English to Italian."
    ),
    HumanMessage(
        content="Translate the following sentence from English to Italian: I love programming."
    ),
]
chat.invoke(messages)

AIMessage(content='The translation of the sentence "I love programming" from English to Italian is:\n\n"Mi piace programmazione."\n\nHere\'s a breakdown of the translation:\n\n* "I" is translated to "Mi"\n* "love" is translated to "piace"\n* "programming" is translated to "programmazione"', response_metadata={'token_usage': {'completion_tokens': 65, 'prompt_tokens': 40, 'total_tokens': 105}, 'model_name': 'NousResearch/Meta-Llama-3-8B-Instruct', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-b1a0adac-8f41-468b-b52b-092ab66ed34f-0')

In [6]:
from langchain_core.prompts.chat import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
)

template = (
    "You are a helpful assistant that translates {input_language} to {output_language}."
)
system_message_prompt = SystemMessagePromptTemplate.from_template(template)
human_template = "{text}"
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)

chat_prompt = ChatPromptTemplate.from_messages(
    [system_message_prompt, human_message_prompt]
)

# Get a chat completion from the formatted messages.
chat.invoke(
    chat_prompt.format_prompt(
        input_language="English", output_language="Italian", text="I love programming."
    ).to_messages()
)


AIMessage(content='Ti piace il programming!', response_metadata={'token_usage': {'completion_tokens': 7, 'prompt_tokens': 31, 'total_tokens': 38}, 'model_name': 'NousResearch/Meta-Llama-3-8B-Instruct', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-92a1ae4a-52b0-40b3-ad63-1cbc7d1be500-0')

### Extra Parameters

**I have not been able to call the server with extra parameters when using LangChain**, as the two attempts below show. If we want all of the extra features that vLLM offers, we would probably need to extend `ChatOpenAI`.

#### Without `guided_choice`

In [7]:
from langchain_core.messages import HumanMessage, SystemMessage

messages = [
    HumanMessage(
        content="You are a helpful assistant. Classify this sentiment: vLLM is wonderful!"
    ),
]
chat.invoke(messages)

AIMessage(content='I\'d be happy to help!\n\nThe sentiment in the statement "vLLM is wonderful!" is POSITIVE. The use of the word "wonderful" is a strong positive adjective that expresses enthusiasm and admiration for vLLM.', response_metadata={'token_usage': {'completion_tokens': 49, 'prompt_tokens': 28, 'total_tokens': 77}, 'model_name': 'NousResearch/Meta-Llama-3-8B-Instruct', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-d8b24e05-a8cb-4388-87c0-6c20765d61c9-0')

#### With `guided_choice`

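Try #1 (kwargs): unpacking the dictionary sends `guided_choice` as a top-level keyword argument, which the underlying OpenAI client rejects.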

In [9]:
from langchain_core.messages import HumanMessage, SystemMessage

messages = [
    HumanMessage(
        content="You are a helpful assistant. Classify this sentiment: vLLM is wonderful!"
    ),
]

extra_body = {
    "guided_choice": ["positive", "negative"]
}

# Unpack extra_body dictionary using ** operator
chat.invoke(input=messages, **extra_body)

TypeError: Completions.create() got an unexpected keyword argument 'guided_choice'

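Try #2 (configurable): the call succeeds, but the parameter appears to be ignored, since the response is the same free text as without `guided_choice`.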

In [10]:
chat.with_config(
    configurable={"guided_choice": ["positive", "negative"]}
).invoke(messages)

AIMessage(content='I\'d be happy to help!\n\nThe sentiment in the statement "vLLM is wonderful!" is POSITIVE. The use of the word "wonderful" is a strong positive adjective that expresses enthusiasm and admiration for vLLM.', response_metadata={'token_usage': {'completion_tokens': 49, 'prompt_tokens': 28, 'total_tokens': 77}, 'model_name': 'NousResearch/Meta-Llama-3-8B-Instruct', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-88f3ed26-9631-4fac-b528-116f3a004c0b-0')
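One avenue not tried above: recent `langchain-openai` releases document an `extra_body` field on `ChatOpenAI` for OpenAI-compatible servers such as vLLM. The sketch below relies on that field, so it is an assumption about the installed version, and it has not been run against this server. `guided_choice_body` and `make_guided_chat` are hypothetical helper names.

```python
def guided_choice_body(choices):
    """Build the extra_body mapping for vLLM's guided_choice parameter."""
    choices = list(choices)
    if not choices:
        raise ValueError("guided_choice needs at least one option")
    return {"guided_choice": choices}


def make_guided_chat(choices, base_url="http://localhost:8000/v1"):
    """Return a ChatOpenAI whose requests carry guided_choice in extra_body."""
    # Imported lazily so this module loads even where LangChain is not installed.
    from langchain_openai import ChatOpenAI

    return ChatOpenAI(
        model="NousResearch/Meta-Llama-3-8B-Instruct",
        openai_api_key="EMPTY",
        openai_api_base=base_url,
        temperature=0,
        # Assumption: the installed langchain-openai version has this field.
        extra_body=guided_choice_body(choices),
    )


# Usage (requires the running server):
# make_guided_chat(["positive", "negative"]).invoke(messages)
```

A related variation on Try #1 would be to pass the dictionary as a single keyword, `chat.invoke(messages, extra_body=extra_body)`, instead of unpacking it with `**`. Whether that keyword reaches the server also depends on the LangChain version.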