## 1.0 - Basic LLM Prompting with vLLM and LangChain

## Preparation

In [2]:
!pip install -q langchain==0.1.9 openai==1.13.3


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip available: [0m[31;49m22.2.2[0m[39;49m -> [0m[32;49m24.0[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [1]:
# Imports
import json
import os
from os import listdir
from os.path import isfile, join
from langchain.chains import LLMChain
from langchain_community.llms import VLLMOpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.prompts import PromptTemplate

In [13]:
INFERENCE_SERVER_URL = "https://mistral-7b-instruct-v0-3-maas-apicast-production.apps.prod.rhoai.rh-aiservices-bu.com:443"
MODEL_NAME = "mistral-7b-instruct"
API_KEY = os.getenv("API_KEY")  # None if the API_KEY environment variable is not set

### LangChain

LangChain (https://www.langchain.com/) is a framework for developing applications powered by language models. It handles the boilerplate code we would otherwise have to write by hand to query an LLM.

We will start by creating an **llm** instance, defined by the URL where the LLM API can be queried and by some parameters that are applied to the model. For example, `max_tokens` caps the answer at 512 tokens (words or parts of words). `temperature`, set very low here, makes sampling nearly deterministic, so the model sticks to its most likely answer instead of trying to be too "creative". After all, we're not trying to write a fancy poem here!
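
To make the effect of `temperature` more concrete, here is a minimal sketch of temperature-scaled softmax, the usual way sampling temperature is described. The scores in `logits` are made up for illustration and the helper name `softmax_with_temperature` is our own; this is not code taken from vLLM.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperatures sharpen the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens (illustrative values only).
logits = [2.0, 1.0, 0.5]

creative = softmax_with_temperature(logits, temperature=1.0)
focused = softmax_with_temperature(logits, temperature=0.01)

print([round(p, 3) for p in creative])  # probability is spread over all three tokens
print([round(p, 3) for p in focused])   # almost all probability lands on the top token
```

With a temperature of 0.01, as used in this notebook, the most likely token is picked almost every time, which is why repeated runs tend to give the same answer.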

#### Create the LLM instance

In [3]:
# LLM definition
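llm = VLLMOpenAI(
    openai_api_key=API_KEY,
    openai_api_base=f"{INFERENCE_SERVER_URL}/v1",  # OpenAI-compatible endpoint served by vLLM
    model_name=MODEL_NAME,
    top_p=0.92,  # nucleus sampling: keep the smallest token set covering 92% of the probability
    temperature=0.01,  # near-deterministic output
    max_tokens=512,  # upper bound on the length of the answer
    presence_penalty=1.03,  # discourage repeating tokens that already appeared
    streaming=True,  # emit tokens as they are generated
    callbacks=[StreamingStdOutCallbackHandler()],  # print streamed tokens to stdout
)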
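
For comparison, this is roughly the boilerplate we would write by hand against an OpenAI-compatible completions endpoint. It is only a sketch: `BASE_URL` is a placeholder, `build_completion_request` is a hypothetical helper, and the request is built but never sent. The exact request LangChain produces may contain additional fields.

```python
import json
import os
import urllib.request

# Placeholder endpoint for illustration; the notebook uses INFERENCE_SERVER_URL instead.
BASE_URL = "https://example.invalid"

def build_completion_request(prompt, api_key):
    """Build (but do not send) a POST request for an OpenAI-style /v1/completions endpoint."""
    payload = {
        "model": "mistral-7b-instruct",
        "prompt": prompt,
        "max_tokens": 512,
        "temperature": 0.01,
        "top_p": 0.92,
        "presence_penalty": 1.03,
        "stream": True,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# "dummy-key" is a stand-in used when API_KEY is not set in the environment.
request = build_completion_request("Hello", os.getenv("API_KEY", "dummy-key"))
print(request.full_url)
print(json.loads(request.data)["max_tokens"])
```

On top of this we would still need to send the request, parse the streamed chunks, and handle errors, which is the part the `llm` object takes care of for us.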

#### Create the Prompt

In [4]:
template="""<s>[INST]<<SYS>>
You are a helpful, respectful and honest assistant. Always be as helpful as possible, while being safe.
You will be asked a question, to which you must give an answer.
Your answer should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content.
Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct.
If you don't know the answer to a question, answer "I don't know".
<</SYS>>

### QUESTION:
{input}

### ANSWER:
[/INST]
"""
PROMPT = PromptTemplate(input_variables=["input"], template=template)
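
To see what the template turns into before it reaches the model, the sketch below fills a shortened stand-in template with plain `str.format`. For a simple template like this one, `PromptTemplate` with its default f-string style substitutes `{input}` in much the same way; `short_template` is an abbreviated copy used only for illustration.

```python
# Shortened stand-in for the notebook's template, for illustration only.
short_template = """<s>[INST]<<SYS>>
You are a helpful, respectful and honest assistant.
<</SYS>>

### QUESTION:
{input}

### ANSWER:
[/INST]
"""

# Substitute the placeholder the same way the chain does for each question.
rendered = short_template.format(input="Describe Kubeflow in 100 words or less.")
print(rendered)
```

The rendered text is the whole prompt sent for a single question: the system instructions, then the question itself, and nothing else.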

## First Query to LLMs

#### Create the Chain using the different components

In [9]:
# Verbose mode is set to False here; set it to True to print the fully rendered prompt sent to the model
conversation = LLMChain(
    llm=llm,
    prompt=PROMPT,
    verbose=False,
)

#### Let's talk...

In [10]:
first_input = "Describe Kubeflow in 100 words or less."
conversation.predict(input=first_input);

Kubeflow is an open-source project that aims to make deploying machine learning (ML) workflows on Kubernetes simple, portable, and scalable. It provides a comprehensive ecosystem of tools, libraries, and frameworks for end-to-end ML pipelines, including data preparation, model training, serving, and management. Kubeflow enables data scientists and engineers to leverage the power of Kubernetes for ML applications, promoting collaboration, reproducibility, and efficiency in AI development.

#### LLM without Conversational Memory

In [11]:
second_input = "Is there a registry?"
conversation.predict(input=second_input);

The term "registry" can have several meanings depending on the context. It could refer to a database or list of items, such as the Windows Registry in computing, a registry of businesses or organizations, or a registry of vehicles. Without more specific details, it's difficult to provide an accurate answer. If you're referring to a specific type of registry, could you please provide more context or clarify your question?
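
The model lost track of Kubeflow because each call to the chain renders the template from scratch with only the new question. One way to picture the fix is to carry the earlier exchange along by hand, as in the minimal sketch below. The helper `with_history` and the "Previous question" / "Previous answer" labels are hypothetical choices of ours, not part of LangChain, and the stored answer is shortened from the output above.

```python
def with_history(history, question):
    """Prefix a new question with earlier (question, answer) pairs."""
    lines = []
    for past_question, past_answer in history:
        lines.append(f"Previous question: {past_question}")
        lines.append(f"Previous answer: {past_answer}")
    lines.append(question)
    return "\n".join(lines)

# Earlier exchange, with the answer shortened from the output above.
history = [
    (
        "Describe Kubeflow in 100 words or less.",
        "Kubeflow is an open-source project that aims to make deploying "
        "machine learning (ML) workflows on Kubernetes simple, portable, and scalable.",
    )
]

second_input_with_context = with_history(history, "Is there a registry?")
print(second_input_with_context)
```

Passing `second_input_with_context` as the `input` to `conversation.predict` would give the model the context it needs to relate "registry" to Kubeflow, at the cost of a longer prompt on every turn.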