## Working with an LLM programmatically

You have almost certainly interacted with a Large Language Model (LLM) such as ChatGPT before, usually through a web UI or an application.

In this notebook, we are going to use Python to connect to an LLM and query it directly through its API. For this lab we are using **Azure OpenAI GPT-4** (https://azure.microsoft.com/en-us/blog/introducing-gpt4-in-azure-openai-service/).

### Library imports

First we import the libraries we need. They are already installed on our workbench image, so there is no need to run `pip install`.

In [None]:
import json
import os
from os import listdir
from os.path import isfile, join
from langchain_openai import AzureChatOpenAI
from langchain_community.llms import VLLMOpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.prompts import PromptTemplate
from dotenv import load_dotenv
from azure.core.exceptions import ClientAuthenticationError
from azure.identity import DefaultAzureCredential

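# load_dotenv() returns True when it set at least one variable from a .env file.
if load_dotenv():
    # Use a default so that a missing variable does not raise a TypeError.
    endpoint = os.getenv("AZURE_OPENAI_ENDPOINT", "<not set>")
    print(f"Found Azure OpenAI Base Endpoint: {endpoint}")
else:
    print("No file .env found")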

### LangChain

LangChain (https://www.langchain.com/) is a framework for developing applications powered by language models. It reduces the amount of boilerplate code required to interface properly with different LLMs.

We will start by creating an **llm** instance, defined by the location where the LLM API can be queried (read here from the environment) and some parameters that will be applied to the model. For example, `max_tokens` caps the answer at 512 tokens (words or parts of words). `temperature`, set very low here, makes the output close to deterministic, so the model sticks to its most likely answer instead of trying to be too "creative". After all, we're not trying to write a fancy poem here!
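
To build some intuition for `temperature` and `top_p`, here is a minimal sketch of how such parameters are commonly applied to a model's raw scores before a token is picked. It is an illustration only, not the service's actual implementation: the function names `apply_temperature` and `top_p_filter` and the toy scores are hypothetical.

```python
import math

def apply_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative probability reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for index in ranked:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break
    return kept

toy_logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens

warm = apply_temperature(toy_logits, temperature=1.0)
cold = apply_temperature(toy_logits, temperature=0.01)

print([round(p, 3) for p in warm])     # probability is spread over all three tokens
print([round(p, 3) for p in cold])     # nearly all probability on the top token
print(top_p_filter(cold, top_p=0.92))  # indices of the tokens that remain candidates
```

With a temperature of 0.01, almost all of the probability mass lands on the top-scoring token, so the `top_p` cut-off keeps a single candidate. That is why a low temperature gives near-deterministic answers.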

In [None]:
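# If no API key is configured, fall back to Microsoft Entra ID (Azure AD)
# authentication and pass the token to the client through the environment.
if not os.getenv("AZURE_OPENAI_API_KEY"):
    credential = DefaultAzureCredential()
    access_token = credential.get_token("https://cognitiveservices.azure.com/.default")
    os.environ["AZURE_OPENAI_AD_TOKEN"] = access_token.token

llm = AzureChatOpenAI(
    azure_deployment=os.getenv("AZURE_DEPLOYMENT"),  # name of the model deployment
    max_tokens=512,         # upper bound on the length of the answer
    temperature=0.01,       # near-deterministic output
    top_p=0.92,             # sample only from the most likely tokens
    presence_penalty=1.03,  # discourage repeating tokens already in the text
    streaming=True,         # emit tokens as they are generated
    callbacks=[StreamingStdOutCallbackHandler()],  # print streamed tokens to stdout
)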

We also need a **template** that is applied to every request we send to the model (the "prompt").

When querying a model, you almost never want to send exactly what the user typed. Around that input, you need to give the model proper instructions so that it knows how to handle it: what to answer and how, what NOT to answer, which tone to use...
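
The idea of wrapping the user's input in instructions can be sketched with plain Python string formatting. This is only an illustration: `toy_template` and `build_prompt` are hypothetical names, and the real template for this lab is defined in the next cell.

```python
# Hypothetical, shortened template: instructions first, then a slot for the user's text.
toy_template = (
    "You are a helpful assistant. Answer briefly.\n"
    "If you do not know the answer, say so.\n"
    "\n"
    "### QUESTION:\n"
    "{input}\n"
    "\n"
    "### ANSWER:\n"
)

def build_prompt(user_text: str) -> str:
    """Insert the raw user text into the {input} slot of the template."""
    return toy_template.format(input=user_text)

prompt = build_prompt("What is Artificial Intelligence?")
print(prompt)
```

The model receives the instructions and the question as one piece of text. `PromptTemplate` performs a similar substitution, with `input_variables` naming the slots to fill.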

In [3]:
template="""<s>[INST]<<SYS>>
You are a helpful, respectful and honest assistant. Always be as helpful as possible, while being safe.
You will be asked a question, to which you must give an answer.
Your answer should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content.
Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct.
If you don't know the answer to a question, answer "I don't know".
<</SYS>>

### QUESTION:
{input}

### ANSWER:
[/INST]
"""
PROMPT = PromptTemplate(input_variables=["input"], template=template)

LangChain now lets us easily "stitch" those elements together into a **conversation** object that we will use to query the model.
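
As a rough mental model of what the `|` operator does in the next cell, the sketch below chains two steps so that the output of the first becomes the input of the second. It is a simplified illustration, not the library's implementation: the `Step` class, `fill_template`, and `fake_llm` are hypothetical stand-ins.

```python
class Step:
    """Wrap a function so that steps can be chained with the | operator."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # step_a | step_b builds a new step that runs step_a, then step_b.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)

# Hypothetical stand-ins for the prompt template and the model.
fill_template = Step(lambda variables: "QUESTION: {input}".format(**variables))
fake_llm = Step(lambda prompt: f"(model reply to: {prompt})")

toy_chain = fill_template | fake_llm
print(toy_chain.invoke({"input": "What is Artificial Intelligence?"}))
# prints "(model reply to: QUESTION: What is Artificial Intelligence?)"
```

The chained object exposes the same `invoke` call as its parts, which is why the real `conversation` object can be queried with a single dictionary of template variables.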

In [4]:
conversation = PROMPT | llm

We are now ready to query the model!

In [None]:
query = "What is Artificial Intelligence?"

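# The answer is streamed to stdout by the callback; the trailing semicolon
# keeps the notebook from also echoing the returned message object.
conversation.invoke({"input": query});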

You can come back to this notebook at section 3.7 for some optional exercises if you want.