## 1.0 - Basic LLM Prompting with vLLM and LangChain

## Preparation

In [None]:
!pip install -q langgraph==0.2.35 langchain_experimental==0.0.65 langchain-openai==0.1.25 termcolor==2.3.0 duckduckgo_search==7.1.0 openapi-python-client==0.12.3 langchain_community==0.2.19 wikipedia==1.4.0

In [None]:
# Imports (only the names this notebook actually uses)
import os

from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

In [None]:
# The endpoint and credentials are read from environment variables
INFERENCE_SERVER_URL = os.getenv("API_URL_GRANITE")
MODEL_NAME = "granite-3-8b-instruct"
API_KEY = os.getenv("API_KEY_GRANITE")
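
`os.getenv` returns `None` when a variable is missing, which would later produce a base URL such as `None/v1` and a confusing connection error. A minimal sketch of a fail-fast check, assuming you would rather see the problem right here; `require_env` is a hypothetical helper, not part of this notebook or of LangChain:

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise a clear error."""
    value = os.getenv(name)
    if not value:
        # Covers both a missing variable (None) and an empty string
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Hypothetical usage with the variable names from the cell above:
# INFERENCE_SERVER_URL = require_env("API_URL_GRANITE")
# API_KEY = require_env("API_KEY_GRANITE")
```

Whether to fail early or fall back to a default is a design choice; failing early tends to be easier to debug in a workshop setting.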

### LangChain

LangChain (https://www.langchain.com/) is a framework for developing applications powered by language models. It handles the boilerplate code we would otherwise have to write by hand to query an LLM.

We will start by creating an **llm** instance, defined by the location where the LLM API can be queried and some parameters that will be applied to the model. For example, `max_tokens` limits the answer to at most 512 tokens (words or parts of words). `temperature`, set very low here, makes sampling nearly deterministic, so the model sticks to its most likely answer instead of trying to be too "creative". After all, we're not trying to write a fancy poem here!
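
To build some intuition for what a low `temperature` does, here is a small sketch of the usual temperature-scaled softmax. The logits are made-up numbers for three candidate tokens, and the inference server's actual sampling code may differ in its details; the point is only that dividing the scores by a small temperature concentrates almost all probability on the top candidate.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # illustrative scores, not taken from a real model

probs_default = softmax_with_temperature(logits, 1.0)   # probability is spread out
probs_low = softmax_with_temperature(logits, 0.01)      # nearly all mass on the top token

print([round(p, 3) for p in probs_default])
print([round(p, 3) for p in probs_low])
```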

#### Create the LLM instance

In [None]:
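# LLM definition: an OpenAI-compatible chat client pointed at the inference server
llm = ChatOpenAI(
    openai_api_key=API_KEY,
    openai_api_base=f"{INFERENCE_SERVER_URL}/v1",
    model_name=MODEL_NAME,
    top_p=0.92,               # nucleus sampling cutoff
    temperature=0.01,         # near-deterministic output
    max_tokens=512,           # upper bound on the length of the answer
    presence_penalty=1.03,    # discourages reusing tokens that already appeared
    streaming=True,           # stream tokens as they are generated
    callbacks=[StreamingStdOutCallbackHandler()],  # print streamed tokens to stdout
)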

#### Create the Prompt

In [None]:
template = """\
<|start_of_role|>system<|end_of_role|>
You are a helpful, respectful, and honest assistant. Always be as helpful as possible, while being safe. 
Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. 
Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something incorrect. 
If you don't know the answer to a question, please don't share false information.
<|end_of_text|>

<|start_of_role|>user<|end_of_role|>
Human: {input}
<|end_of_text|>

<|start_of_role|>assistant<|end_of_role|>
"""

PROMPT = PromptTemplate(input_variables=["input"], template=template)
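
A prompt template is essentially a string with named placeholders that get filled in at query time. The sketch below uses plain `str.format` on a shortened, illustrative copy of the template (`demo_template` is not a name from this notebook) to show the substitution; in the notebook itself, calling `PROMPT.format(input=...)` should return the full template with `{input}` replaced in the same way.

```python
# Shortened stand-in for the notebook's template, for illustration only
demo_template = (
    "<|start_of_role|>user<|end_of_role|>\n"
    "Human: {input}\n"
    "<|end_of_text|>\n"
)

# Fill the {input} placeholder with an example question
rendered = demo_template.format(input="What is an index fund?")
print(rendered)
```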


## First Query to LLMs

#### Create the Chain using the different components

In [None]:
# Chain the prompt template and the LLM together
conversation = LLMChain(
    llm=llm,
    prompt=PROMPT,
    verbose=False,
)


#### Let's talk...

In [None]:
first_input = "Describe what is the S&P500 in 100 words or less."
conversation.predict(input=first_input);

The S&P 500 is a stock market index that tracks the performance of 500 large companies listed on stock exchanges in the United States. It's widely used as a benchmark for the U.S. stock market and is often referred to as the "S&P." The index covers approximately 80% of all publicly traded U.S. equities and is weighted by market capitalization, meaning that larger companies have a greater impact on the index's performance.

#### LLM without Conversational Memory

In [None]:
second_input = "How many companies are included?"
conversation.predict(input=second_input);

I'm sorry for the confusion, but I need more context to provide an accurate answer. Could you please specify which companies or industry you're referring to? This will help me give you the most helpful response possible.
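
The model asks for context because each call to the chain renders the template with only the current input; nothing from the first exchange is sent along. The sketch below mimics this with a simplified stand-in template (`demo_template` and `render` are illustrative names, not part of the notebook), showing that the second prompt carries no mention of the S&P 500.

```python
# Simplified stand-in for the notebook's prompt template
demo_template = "user: {input}\nassistant:"


def render(user_input: str) -> str:
    """Build the text sent to the model for a single, independent call."""
    return demo_template.format(input=user_input)


first_prompt = render("Describe what is the S&P500 in 100 words or less.")
second_prompt = render("How many companies are included?")

# The second prompt contains only the follow-up question, with no earlier context
print(second_prompt)
```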

## What's Next?

Next, let's give our LLM conversational memory!