## 1.1 - Basic LLM Prompting with vLLM, LangChain, and a Prompt with Conversation Memory

## Preparation

In [2]:
!pip install -q langchain==0.1.9 openai==1.13.3


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip available: [0m[31;49m22.2.2[0m[39;49m -> [0m[32;49m24.0[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [5]:
# Imports
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain_community.llms import VLLMOpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.prompts import PromptTemplate
import os

In [3]:
INFERENCE_SERVER_URL = "https://mistral-7b-instruct-v0-3-maas-apicast-production.apps.prod.rhoai.rh-aiservices-bu.com:443"
MODEL_NAME = "mistral-7b-instruct"
API_KEY = os.getenv('API_KEY')
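
`os.getenv` returns `None` when `API_KEY` is not exported, and the mistake only surfaces later as an authentication error from the inference server. One way to fail fast is a small helper like the one below; `require_env` is a hypothetical name used for illustration, not part of LangChain or the `openai` client.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or fail fast if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        # os.getenv returns None for a missing variable; without this check the
        # problem would only surface later as an authentication error from the server.
        raise RuntimeError(f"Environment variable {name!r} is not set or is empty")
    return value
```

With this helper, the configuration cell would read `API_KEY = require_env('API_KEY')`.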

#### Create the LLM instance

In [7]:
# LLM definition
llm = VLLMOpenAI(
    openai_api_key=API_KEY,
    openai_api_base=f"{INFERENCE_SERVER_URL}/v1",
    model_name=MODEL_NAME,
    top_p=0.92,
    temperature=0.01,
    max_tokens=512,
    presence_penalty=1.03,
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()]
)
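
`VLLMOpenAI` sends its requests to the OpenAI-compatible completions endpoint exposed under `openai_api_base`, so the sampling arguments above become fields of that request. The sketch below assembles roughly such a request by hand to make the mapping visible; `build_completion_request` is a hypothetical helper, `example.com` is a placeholder host, and the body LangChain actually sends may contain additional fields.

```python
import json


def build_completion_request(base_url: str, model: str, prompt: str, *,
                             top_p: float, temperature: float, max_tokens: int,
                             presence_penalty: float, stream: bool) -> tuple[str, dict]:
    """Return (url, body) for an OpenAI-compatible /v1/completions request."""
    url = f"{base_url.rstrip('/')}/completions"  # base_url already ends in /v1
    body = {
        "model": model,
        "prompt": prompt,
        "top_p": top_p,                  # nucleus sampling cutoff
        "temperature": temperature,      # close to 0 -> nearly deterministic output
        "max_tokens": max_tokens,        # upper bound on generated tokens
        "presence_penalty": presence_penalty,
        "stream": stream,                # ask the server to stream tokens back
    }
    return url, body


# Example with placeholder values (example.com is not the notebook's server).
url, body = build_completion_request(
    "https://example.com:443/v1", "mistral-7b-instruct", "Hello",
    top_p=0.92, temperature=0.01, max_tokens=512, presence_penalty=1.03, stream=True,
)
print(url)
print(json.dumps(body, indent=2))
```

Because `streaming=True` asks the server to return tokens incrementally, `StreamingStdOutCallbackHandler` writes each token to stdout as it arrives. That is why the model's answers appear in the cell outputs later in this notebook even though the trailing `;` on `predict(...)` suppresses the returned string.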

#### Create the Prompt

In [8]:
template="""<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always be as helpful as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

Current conversation:
{history}
Human: {input}
AI:
[/INST]
"""
PROMPT = PromptTemplate(input_variables=["history", "input"], template=template)
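
`PromptTemplate` fills `{history}` and `{input}` using its default `f-string` format, which behaves like Python's `str.format`. The snippet below renders a trimmed-down copy of the template (`demo_template` is a shortened, hypothetical variant, not the notebook's `PROMPT`) to show roughly what the model receives on the first turn, when the history is still empty.

```python
# Trimmed-down template with the same two placeholders as PROMPT.
demo_template = """<s>[INST] <<SYS>>
You are a helpful assistant.
<</SYS>>

Current conversation:
{history}
Human: {input}
AI:
[/INST]
"""

# PromptTemplate's default "f-string" format substitutes variables the way
# str.format does, so a first turn (empty history) renders roughly like this.
rendered = demo_template.format(history="", input="Describe Kubeflow in 100 words or less.")
print(rendered)
```

One consequence of this format: any literal `{` or `}` added to the system prompt has to be doubled (`{{`, `}}`), otherwise it is treated as a placeholder.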

#### And finally some memory for the conversation

In [9]:
memory = ConversationBufferMemory()
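
`ConversationBufferMemory` keeps every exchange verbatim and exposes it to the prompt under the `history` key, prefixing turns with `Human` and `AI` by default. The class below is a simplified re-implementation for illustration only: `BufferMemorySketch` is a hypothetical name, and the real class has a different interface (its `save_context`, for example, takes input and output dictionaries).

```python
from dataclasses import dataclass, field


@dataclass
class BufferMemorySketch:
    """Simplified stand-in for ConversationBufferMemory: keeps every turn verbatim."""
    human_prefix: str = "Human"
    ai_prefix: str = "AI"
    turns: list[tuple[str, str]] = field(default_factory=list)

    def save_context(self, user_input: str, ai_output: str) -> None:
        # Append the latest exchange; nothing is ever summarized or dropped.
        self.turns.append((user_input, ai_output))

    def history(self) -> str:
        # Render all stored turns as "Human: ...\nAI: ..." lines, oldest first.
        lines = []
        for user_input, ai_output in self.turns:
            lines.append(f"{self.human_prefix}: {user_input}")
            lines.append(f"{self.ai_prefix}: {ai_output}")
        return "\n".join(lines)

    def clear(self) -> None:
        self.turns.clear()


mem = BufferMemorySketch()
mem.save_context("Describe Kubeflow in 100 words or less.", "Kubeflow is ...")
print(mem.history())
```

Since nothing is ever dropped, a long conversation will eventually outgrow the model's context window; LangChain's `ConversationBufferWindowMemory` and `ConversationSummaryMemory` are alternatives for that case.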

## First example, fully verbose mode

#### Create the Chain using the different components

In [10]:
# Verbose mode is intentionally set to True so you can see the prompt and the history from the buffer memory
conversation = ConversationChain(llm=llm,
                                 prompt=PROMPT,
                                 verbose=True,
                                 memory=memory
                                )
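
Conceptually, each `predict` call loads the history from memory, formats `PROMPT` with it and the new input, sends the result to the LLM, and saves the new exchange back to memory. That loop is what lets the context-free follow-up "Is there a registry?" be interpreted as a question about Kubeflow. The sketch below models one such turn with a fake LLM so it runs offline; `chain_step`, `fake_llm`, and `mini_template` are hypothetical names, and this is not LangChain's implementation.

```python
from typing import Callable


def chain_step(llm: Callable[[str], str], template: str,
               turns: list[tuple[str, str]], user_input: str) -> str:
    """One ConversationChain-style turn: render history, call the LLM, save the exchange."""
    history = "\n".join(f"Human: {h}\nAI: {a}" for h, a in turns)
    prompt = template.format(history=history, input=user_input)
    answer = llm(prompt).strip()
    turns.append((user_input, answer))  # memory is updated after every call
    return answer


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for the vLLM endpoint: reports how many
    # "Human:" lines the rendered prompt contains.
    return f"turns seen: {prompt.count('Human:')}"


mini_template = "Current conversation:\n{history}\nHuman: {input}\nAI:"
turns: list[tuple[str, str]] = []
print(chain_step(fake_llm, mini_template, turns, "Describe Kubeflow in 100 words or less."))  # turns seen: 1
print(chain_step(fake_llm, mini_template, turns, "Is there a registry?"))  # turns seen: 2
```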

#### Let's talk...

In [12]:
first_input = "Describe Kubeflow in 100 words or less."
conversation.predict(input=first_input);



[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3m<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always be as helpful as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

Current conversation:
Human: Describe Paris in 100 words or less.
AI: Paris, the capital of France, is a city renowned for its rich history, art, and culture. Known as the "City of Light," it's famous for iconic landmarks like the Eiffel Tower, Louvre Museum, and Notre-Dame Cathedral. The Seine River adds to its romantic charm, while bustling cafes, boutiques, and vibrant night

In [13]:
# This is a follow-up question without context to showcase the conversation memory
second_input = "Is there a registry?"
conversation.predict(input=second_input);



[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3m<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always be as helpful as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

Current conversation:
Human: Describe Paris in 100 words or less.
AI: Paris, the capital of France, is a city renowned for its rich history, art, and culture. Known as the "City of Light," it's famous for iconic landmarks like the Eiffel Tower, Louvre Museum, and Notre-Dame Cathedral. The Seine River adds to its romantic charm, while bustling cafes, boutiques, and vibrant night

## Second example, no verbosity

In [15]:
# Verbose mode is set to False, so only the streamed answer is printed (no formatted prompt or history)
conversation2 = ConversationChain(llm=llm,
                                  prompt=PROMPT,
                                  verbose=False,
                                  memory=memory
                                 )

In [16]:
# Let's clear the previous conversation first
memory.clear()
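
Both `conversation` and `conversation2` were built with `memory=memory`, so they share one `ConversationBufferMemory` object; without `memory.clear()`, `conversation2` would start with the Kubeflow exchange already in its history. The plain-Python sketch below shows the same aliasing with a list standing in for the memory object (`chain_a`, `chain_b`, and `chain_c` are hypothetical stand-ins).

```python
# conversation and conversation2 hold references to the same memory object.
# A plain list shows the same aliasing behavior.
shared_history = ["Human: Describe Kubeflow in 100 words or less."]
chain_a = {"memory": shared_history}   # stands in for `conversation`
chain_b = {"memory": shared_history}   # stands in for `conversation2`

print(chain_a["memory"] is chain_b["memory"])  # True: one object, two references
chain_b["memory"].clear()                      # like memory.clear()
print(len(chain_a["memory"]))                  # 0: chain_a's history is gone too

# A separate object keeps a conversation independent.
chain_c = {"memory": []}
chain_c["memory"].append("Human: What is the latest version of Kubernetes?")
print(len(chain_a["memory"]), len(chain_c["memory"]))  # 0 1
```

In the notebook itself, passing `memory=ConversationBufferMemory()` when creating `conversation2` would give it a fresh, independent history instead of reusing and clearing the first chain's.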

In [19]:
first_input = "What is the last version of Kubernetes currently?"
conversation2.predict(input=first_input);

As of my last update, the latest version of Kubernetes is Kubernetes 1.23, which was released on May 10, 2022. However, it's always a good idea to check the official Kubernetes website for the most recent updates and versions.

In [None]:
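# Follow-up question that only makes sense with the previous answer in memory
second_input = "What are the main new features of that version?"
conversation2.predict(input=second_input);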