### OICM inference for models deployed with vLLM and TGI
##### Reference: https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html

In [7]:
import requests
import json
import os
from openai import OpenAI

In [37]:
api_key = os.getenv('OI_API_KEY', 'dummy')  # Your valid API key; falls back to a placeholder if unset
model_version_id = os.getenv('OI_MODEL_VERSION_ID')  # Model version ID
oicm_host = ""  # Platform URL, e.g. develop.openinnovation.ai
# Example: https://inference.{oicm_host}/models/{model_version_id}/proxy/vllm/v1

base_url = f"https://inference.{oicm_host}/models/{model_version_id}/proxy/v1"

model_name = "/data/Qwen/Qwen2-VL-2B-Instruct"  # Hugging Face model name, e.g. tiiuae/Falcon3-1B-Instruct

headers = {
    "Authorization": f"Bearer {api_key}"
}
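
With `oicm_host` left empty, the f-string above produces a URL whose host is just `inference.`, which is unlikely to resolve and would surface as a connection error on the first request. The sketch below shows one way to catch that early. `build_base_url` is a hypothetical helper, not part of OICM or the OpenAI client; it only reproduces the URL pattern used in this notebook, and `my-model-version` is a made-up ID.

```python
from typing import Optional


def build_base_url(oicm_host: Optional[str], model_version_id: Optional[str]) -> str:
    """Return the OpenAI-compatible base URL, or raise if a piece is missing.

    Hypothetical helper: it mirrors the f-string used in the cell above.
    """
    host = (oicm_host or "").strip().rstrip("/")
    if not host:
        raise ValueError("oicm_host is empty; set it to the platform URL")
    if not model_version_id:
        raise ValueError("model_version_id is empty; set OI_MODEL_VERSION_ID")
    return f"https://inference.{host}/models/{model_version_id}/proxy/v1"


# Usage with a made-up model version ID
print(build_base_url("develop.openinnovation.ai", "my-model-version"))
```

Failing at configuration time gives a clearer message than a generic connection error raised later by the HTTP client.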

### Using OpenAI API Client

In [40]:
client = OpenAI(
    base_url=base_url,
    api_key=api_key
)

In [42]:
stream = client.chat.completions.create(
    model=model_name,
    messages=[
        {"role": "system", "content": "You're a helpful assistant"},
        {"role": "user", "content": "what do you know about Egypt?"},
    ],
    max_tokens=256,
    temperature=0.7,
    stream=True,
)

for chunk in stream:
    # Some chunks (for example a trailing usage chunk) carry no choices
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)


### Using requests

In [19]:
payload = {
    "messages": [
        {"role": "system", "content": "You're a helpful assistant"},
        {"role": "user", "content": "what do you know about Egypt?"}
    ],
    "max_tokens": 256,
    "temperature": 0.7,
    "model": model_name,
    "stream": True
}

response = requests.post(f"{base_url}/chat/completions", json=payload, headers=headers, stream=True)

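for line in response.iter_lines():
    if not line:
        continue  # skip the blank lines that separate SSE events
    decoded_line = line.decode("utf-8")
    if not decoded_line.startswith("data:"):
        continue  # ignore anything that is not a data line
    json_str = decoded_line[len("data:"):].strip()
    if json_str == "[DONE]":
        break  # end-of-stream sentinel
    chunk = json.loads(json_str)
    if chunk.get("choices"):
        new_str = chunk["choices"][0]["delta"].get("content")
        if new_str:
            print(new_str, end="", flush=True)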
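
The streamed response is a sequence of server-sent event lines, each carrying a JSON chunk after a `data:` prefix, with a `[DONE]` sentinel at the end. As a rough sketch, the parsing can be isolated in small functions and exercised offline. The names `parse_sse_line` and `delta_text` are hypothetical, and `sample_lines` is hand-written for illustration rather than captured from a server.

```python
import json
from typing import Optional


def parse_sse_line(raw_line: bytes) -> Optional[dict]:
    """Decode one SSE line into a chunk dict.

    Returns None for blank separator lines, non-data lines,
    and the "[DONE]" end-of-stream sentinel.
    """
    line = raw_line.decode("utf-8").strip()
    if not line.startswith("data:"):
        return None
    body = line[len("data:"):].strip()
    if not body or body == "[DONE]":
        return None
    return json.loads(body)


def delta_text(chunk: dict) -> str:
    """Return the text carried by a chat-completion chunk, or ""."""
    choices = chunk.get("choices") or []
    if not choices:
        return ""
    return choices[0].get("delta", {}).get("content") or ""


# Hand-written sample lines in the shape of an OpenAI-compatible stream
sample_lines = [
    b'data: {"choices": [{"delta": {"role": "assistant", "content": ""}}]}',
    b"",
    b'data: {"choices": [{"delta": {"content": "Egypt"}}]}',
    b'data: {"choices": [{"delta": {"content": " is"}}]}',
    b"data: [DONE]",
]

text = "".join(
    delta_text(chunk)
    for chunk in map(parse_sse_line, sample_lines)
    if chunk is not None
)
print(text)
```

Keeping the parser separate from the network call means malformed lines raise a visible `json.JSONDecodeError` instead of being silently skipped.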

Egypt is a country located in the northeastern corner of Africa, bordered by Libya to the north, Sudan to the east, South Sudan to the south, and Israel and Palestine to the southwest. It is one of the oldest civilizations in the world, with a rich history dating back over 5000 years, and it is famous for its iconic landmarks like the Pyramids of Giza, the Sphinx, and the Nile River.

Egypt is located on the east bank of the Nile River, which is the longest river in Africa. The Nile has played a crucial role in the history and development of Egypt, providing fertile soil for agriculture and sustaining the Egyptian civilization.

The country is divided into two main regions: the Nile Valley, which includes the fertile agricultural lands along the river, and the Western Desert, which is arid and rocky.

Egypt is known for its diverse culture and heritage, with a mix of ancient Egyptian traditions and influences from other civilizations over the centuries. It has a robust economy, driven 

### Get the usage data
##### Supported by vLLM and TGI
##### Reports prompt_tokens and completion_tokens

In [78]:
# Add "stream_options": {"include_usage": True} to the payload

payload = {
    "messages": [
        {"role": "system", "content": "You're a helpful assistant"},
        {"role": "user", "content": "what do you know about Egypt?"}
    ],
    "max_tokens": 256,
    "temperature": 0.7,
    "model": model_name,
    "stream": True,
    "stream_options": {
        "include_usage": True
    }
}

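# The usage is returned in the final chunk of the stream, which may carry an empty "choices" list

response = requests.post(f"{base_url}/chat/completions", json=payload, headers=headers, stream=True)

for line in response.iter_lines():
    if not line:
        continue  # skip the blank lines that separate SSE events
    decoded_line = line.decode("utf-8")
    if not decoded_line.startswith("data:"):
        continue
    json_str = decoded_line[len("data:"):].strip()
    if json_str == "[DONE]":
        break  # end-of-stream sentinel
    chunk = json.loads(json_str)
    # Print text only when the chunk actually has a choice
    if chunk.get("choices"):
        new_str = chunk["choices"][0]["delta"].get("content")
        if new_str:
            print(new_str, end="", flush=True)
    # Check usage independently so the final chunk is not skipped
    if chunk.get("usage") is not None:
        print("\n\n")
        print("=" * 100)
        print(chunk)
        print(chunk["usage"])
        print("=" * 100)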
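
When `include_usage` is enabled, the OpenAI-compatible streaming format sends the usage statistics in a trailing chunk whose `choices` list is empty, so text and usage are best extracted independently of each other. The sketch below illustrates this on already-parsed chunks. `split_text_and_usage` is a hypothetical name, and the token counts in `sample_chunks` are invented for illustration, not measured values.

```python
from typing import Iterable, Optional, Tuple


def split_text_and_usage(chunks: Iterable[dict]) -> Tuple[str, Optional[dict]]:
    """Collect the generated text and the usage dict from parsed chunks."""
    text_parts = []
    usage = None
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if choices:
            content = choices[0].get("delta", {}).get("content")
            if content:
                text_parts.append(content)
        # Checked separately: the usage chunk may have no choices at all
        if chunk.get("usage") is not None:
            usage = chunk["usage"]
    return "".join(text_parts), usage


# Hand-written chunks; the token counts are made up for this example
sample_chunks = [
    {"choices": [{"delta": {"content": "Egypt"}}], "usage": None},
    {"choices": [{"delta": {"content": " is"}}], "usage": None},
    {
        "choices": [],
        "usage": {"prompt_tokens": 25, "completion_tokens": 2, "total_tokens": 27},
    },
]

text, usage = split_text_and_usage(sample_chunks)
print(text)
print(usage["prompt_tokens"], usage["completion_tokens"])
```

Reading `choices[0]` before checking `usage` would raise an `IndexError` on the final chunk, which is why the two checks are kept apart here.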

Egypt is a country located in Northeast Africa, bordered by the Mediterranean Sea to the north, the Red Sea to the east, Sudan to the south, and Libya to the west. It is one of the oldest civilizations in the world, with a rich history stretching back over 5,000 years.

Key aspects of Egypt:

1. Geography: Egypt is divided into two main regions - the fertile Nile Valley and the desert area that surrounds it. The Nile River is the longest river in the world, providing water and fertile land for agriculture.

2. History: Egypt has a long and complex history, with notable periods including the Old, Middle, and New Kingdoms, as well as the rise of Ancient Egypt. The civilization began around 3100 BC and lasted until the Roman conquest in 30 BC.

3. Pyramids & Sphinx: One of the most iconic symbols of Egypt is the Great Pyramid of Giza, built as tombs for Pharaohs of the Old Kingdom. The Sphinx, with its lion's body and human head, stands guard over the pyramids.

4. Pyramids: There are sev

### Integrate with Langchain

In [34]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model_name=model_name,
    openai_api_key=api_key,
    openai_api_base=base_url,
    max_tokens=1000,
)

llm_answer = llm.invoke("Egypt is")

llm_answer.content

"Egypt is a country located in Northeast Africa, bordered by Sudan to the north, South Sudan to the east, Ethiopia to the northwest, and Libya to the west. It is also connected to the Mediterranean Sea to the south through the Suez Canal. The capital of Egypt is Cairo, which is also the largest city in the country. The modern nation-state of Egypt was formed in 1922 following the Arab nationalist struggle for independence from British colonial rule. Egypt has a rich history spanning over 5,000 years, making it one of the world's oldest civilizations."

### Langchain Structured Output
#### Structured output is only supported with models deployed with vLLM

In [44]:
from typing import List, Optional
from pydantic import BaseModel, Field

In [46]:
class BookRecommendation(BaseModel):
    """Details of a book recommendation."""
    title: str = Field(description="The title of the book")
    author: str = Field(description="The author of the book")
    genre: str = Field(description="The genre of the book")
    rating: float = Field(description="The book's rating on a scale from 1.0 to 5.0")
    reasons: List[str] = Field(
        description="Reasons why this book is recommended"
    )

In [50]:
structured_llm = llm.with_structured_output(BookRecommendation)

response = structured_llm.invoke("Recommend a good science fiction book for someone who loves space exploration.")

response

BookRecommendation(title='The Expanse', author='James S. A. Corey', genre='Science Fiction', rating=8.0, reasons=['The Expanse is a critically acclaimed series that explores a future where humanity has colonized the solar system and faces challenges such as resource scarcity, political conflict, and alien threats.', 'It delves into the complexities of human nature, ethics, and survival in a universe where boundaries blur.', 'Its rich world-building and intricate plotlines make it a compelling read for fans of space exploration and speculative fiction.', 'The series offers a deep dive into the consequences of human actions in a universe governed by gravity, technology, and the stars.', 'Readers will find the blend of suspense, character development, and political intrigue engaging and thought-provoking.', "Overall, 'The Expanse' is a solid choice for those who enjoy the depth and complexity of science fiction narratives set in the vast expanse of the cosmos."])