We've actually already seen how it can be useful to "chain" various LLM operations together. In the Hinglish chat example, we chained response generation and machine translation, both performed by LLMs.

**As you solve problems with LLMs, do NOT automatically treat your task as a single prompt.** Decompose your problem into multiple steps. Just as a program is built from multiple functions, classes, and so on, an LLM integration is a new kind of reasoning engine that you can "program" in a multi-step, conditional, control-flow fashion.
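To make the "program" analogy concrete, here is a minimal sketch of a linear chain in plain Python. `run_chain`, `make_llm_step`, and `fake_llm` are hypothetical names introduced for illustration; `fake_llm` stands in for a real model call (for example, a Prediction Guard chat completion) so that the control flow can run without an API key. The `response_chain` function later in this notebook adds conditional branching on top of the same idea.

```python
from typing import Callable, List

# A "step" takes the previous step's text output and returns new text.
Step = Callable[[str], str]

def run_chain(steps: List[Step], user_input: str) -> str:
    """Run each step in order, feeding each output into the next step."""
    text = user_input
    for step in steps:
        text = step(text)
    return text

def make_llm_step(llm: Callable[[str], str], template: str) -> Step:
    """Wrap a prompt template (with an {input} slot) and an LLM callable into a step."""
    def step(text: str) -> str:
        return llm(template.format(input=text))
    return step

# Hypothetical stand-in for a real model call; it just echoes the prompt in brackets.
def fake_llm(prompt: str) -> str:
    return f"<{prompt}>"

# Two-step chain: generate a reply, then translate it (as in the Hinglish example).
respond = make_llm_step(fake_llm, "Reply to: {input}")
translate = make_llm_step(fake_llm, "Translate to Hindi: {input}")
print(run_chain([respond, translate], "hi"))
```

Because each step only sees the previous step's output, you can swap, reorder, or test steps independently, just like ordinary functions.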

Further, enterprise LLM applications need reliability, trust, and consistency. **Because LLMs only predict probable text, they have no inherent understanding of, or connection to, reality.** This produces **hallucinations**: text that reads coherently but is factually (or otherwise) wrong. To deal with this, we need to **ground** our LLM operations in external data.

# Dependencies and imports

In [1]:
! pip install langchain predictionguard lancedb html2text


In [2]:
import os
import urllib.request
import shutil

import html2text
from predictionguard import PredictionGuard
from langchain.prompts import PromptTemplate, FewShotPromptTemplate
from langchain.text_splitter import CharacterTextSplitter
import numpy as np
from getpass import getpass
import lancedb
import pandas as pd

In [3]:
pg_access_token = getpass('Enter your Prediction Guard access api key: ')
os.environ['PREDICTIONGUARD_API_KEY'] = pg_access_token

Enter your Prediction Guard access api key: ··········


In [4]:
client = PredictionGuard()

# Chaining

In [13]:
messages = [
    {
        "role": "system",
        "content": """Decide if the following input message is an informational question, a general chat message, or a request for code generation.
If the message is an informational question, answer it based on the informational context provided below.
If the message is a general chat message, respond in a kind and friendly manner based on the conversation context provided below.
If the message is a request for code generation, respond with a code snippet.

Informational Context: The Greater Los Angeles and San Francisco Bay areas in California are the nation's second and fifth-most populous urban regions, respectively. Greater Los Angeles has over 18.7 million residents and the San Francisco Bay Area has over 9.6 million residents. Los Angeles is the state's most populous city and the nation's second-most populous city. San Francisco is the second-most densely populated major city in the country. Los Angeles County is the country's most populous county, and San Bernardino County is the nation's largest county by area. Sacramento is the state's capital.""",
    },
    {
        "role": "user",
        "content": "Hello, how are you?"
    },
    {
        "role": "assistant",
        "content": "I'm good, what can I help you with?"
    },
    {
        "role": "user",
        "content": "What is the capital of California?"
    },
    {
        "role": "assistant",
        "content": "Sacramento"
    },
    {
        "role": "user",
        "content": "Thanks!"
    },
    {
        "role": "assistant",
        "content": "You are welcome!"
    }
]

In [14]:
messages.append({
    "role": "user",
    "content": "What is the population of LA?"
})

result = client.chat.completions.create(
    model="Hermes-3-Llama-3.1-8B",
    messages=messages
)

print(result['choices'][0]['message']['content'])

Greater Los Angeles, the second most populous urban region in the United States, has over 18.7 million residents. Los Angeles itself is the nation's second-most populous city.


In [21]:
category_template = """Read the below input and determine if it is a request to generate computer code. Respond "yes" or "no" and no other text.

Input: "{query}"

Response: """

category_prompt = PromptTemplate(
    input_variables=["query"],
    template=category_template
)

qa_template = """Context: {context}

Question: {query}

Answer: """

qa_prompt = PromptTemplate(
    input_variables=["context", "query"],
    template=qa_template
)

qa_system = "Read the context below and respond with an answer to the question. If the question cannot be answered based on the context alone or the context does not explicitly say the answer to the question, write \"Sorry I had trouble answering this question, based on the information I found.\""
chat_system = "You are a friendly and clever AI assistant. Respond to the latest human message in the input conversation below."
code_system = "You are a code generation assistant. Respond with a code snippet and any explanation requested in the below input."

# QuestionID provides some help in determining if a sentence is a question.
class QuestionID:
    """
        QuestionID has the actual logic used to determine if a sentence is a question.
    """
    def padCharacter(self, character: str, sentence: str):
        if character in sentence:
            position = sentence.index(character)
            if position > 0 and position < len(sentence):

                # Check for existing white space before the special character.
                if (sentence[position - 1]) != " ":
                    sentence = sentence.replace(character, (" " + character))

        return sentence

    def predict(self, sentence: str):
        questionStarters = [
            "which", "wont", "cant", "isnt", "arent", "is", "do", "does",
            "will", "can"
        ]
        questionElements = [
            "who", "what", "when", "where", "why", "how", "sup", "?"
        ]

        sentence = sentence.lower()
        sentence = sentence.replace("\'", "")
        sentence = self.padCharacter('?', sentence)
        splitWords = sentence.split()

        # Empty or whitespace-only input has no first word to check
        # (splitWords[0] would raise an IndexError).
        if not splitWords:
            return False

        return splitWords[0] in questionStarters or any(
            word in splitWords for word in questionElements)

def response_chain(message, message_history, info_context):

  # Determine what kind of message this is.
  result = client.chat.completions.create(
      model="Hermes-3-Llama-3.1-8B",
      messages=[{"role": "user", "content": category_prompt.format(query=message)}]
  )

  # Configure our chain. Lowercase the reply so "Yes" and "yes" both count.
  category_reply = result['choices'][0]['message']['content'].lower()
  if "yes" in category_reply:
    code = "yes"
  else:
    code = "no"
  qIDModel = QuestionID()
  question = qIDModel.predict(message)

  if code == "no" and question:

    # Handle the informational request.
    result = client.chat.completions.create(
        model="Hermes-3-Llama-3.1-8B",
        messages=[
            {
                "role": "system",
                "content": qa_system
            },
            {
                "role": "user",
                "content": qa_prompt.format(context=info_context, query=message)
            }
        ]
    )
    completion = result['choices'][0]['message']['content']

  elif code == "yes":

    # Handle the code generation request.
    result = client.chat.completions.create(
        model="Hermes-3-Llama-3.1-8B",
        messages=[
            {
                "role": "system",
                "content": code_system
            },
            {
                "role": "user",
                "content": message
            }
        ],
        max_tokens=2000
    )
    completion = result['choices'][0]['message']['content']

  else:

    messages_use = [
        {
            "role": "system",
            "content": chat_system
        }
    ]
    messages_use += message_history
    messages_use.append({
        "role": "user",
        "content": message
    })

    # Handle the chat message.
    result = client.chat.completions.create(
        model="Hermes-3-Llama-3.1-8B",
        messages=messages_use,
        output={
            "toxicity": True
        }
    )
    completion = result['choices'][0]['message']['content']

  return code, question, completion
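
`response_chain` picks a branch with `if`/`elif`. As the number of categories grows, one common alternative is a dispatch table that maps a route name to a handler function. The sketch below is a hypothetical refactor, not part of the original notebook: `choose_route`, `dispatch`, and the lambda handlers are illustrative names, and in practice each handler would wrap its own `client.chat.completions.create` call. It keeps the same precedence as `response_chain`: a code request wins, then an informational question, otherwise chat.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]

def choose_route(is_code: bool, is_question: bool) -> str:
    """Same precedence as response_chain: code first, then Q&A, otherwise chat."""
    if is_code:
        return "code"
    if is_question:
        return "qa"
    return "chat"

def dispatch(message: str, is_code: bool, is_question: bool,
             handlers: Dict[str, Handler]) -> str:
    """Look up the handler for the chosen route and call it with the message."""
    route = choose_route(is_code, is_question)
    return handlers[route](message)

# Hypothetical handlers; real ones would call the LLM with different system prompts.
handlers = {
    "code": lambda m: "CODE: " + m,
    "qa": lambda m: "QA: " + m,
    "chat": lambda m: "CHAT: " + m,
}

print(dispatch("What is the capital of California?", False, True, handlers))
```

Adding a new category then means adding one entry to `handlers` and one rule to `choose_route`, rather than another branch inside a long function.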


In [22]:
info_context = "The Greater Los Angeles and San Francisco Bay areas in California are the nation's second and fifth-most populous urban regions, respectively. Greater Los Angeles has over 18.7 million residents and the San Francisco Bay Area has over 9.6 million residents. Los Angeles is the state's most populous city and the nation's second-most populous city. San Francisco is the second-most densely populated major city in the country. Los Angeles County is the country's most populous county, and San Bernardino County is the nation's largest county by area. Sacramento is the state's capital."

convo_context = [
    {
        "role": "user",
        "content": "Hello, how are you?"
    },
    {
        "role": "assistant",
        "content": "I'm good, what can I help you with?"
    },
    {
        "role": "user",
        "content": "What is the capital of California?"
    },
    {
        "role": "assistant",
        "content": "Sacramento"
    },
    {
        "role": "user",
        "content": "Thanks!"
    },
    {
        "role": "assistant",
        "content": "You are welcome!"
    }
]

#message = "Which city in California has the highest population?"
#message = "I'm really enjoying this conversation."
message = "Generate some python code that gets the current weather in the bay area."

code, question, completion = response_chain(message, convo_context, info_context)
print("CODE GEN REQUESTED:", code)
print("QUESTION:", question)
print("")
print("RESPONSE:", completion)

CODE GEN REQUESTED: yes
QUESTION: False

RESPONSE: Here's a Python code snippet that retrieves the current weather in the Bay Area using the OpenWeatherMap API:

```python
import requests

def get_current_weather(location):
    api_key = "YOUR_API_KEY"  # Replace with your OpenWeatherMap API key
    base_url = "http://api.openweathermap.org/data/2.5/weather"

    params = {
        'q': location,
        'appid': api_key,
        'units': 'imperial'  # You can change 'imperial' to 'metric' for Celsius
    }

    response = requests.get(base_url, params=params)

    if response.status_code == 200:
        data = response.json()
        weather_description = data['weather'][0]['description']
        temperature = data['main']['temp']
        humidity = data['main']['humidity']
        wind_speed = data['wind']['speed']

        print(f"Current weather in {location}:")
        print(f"Description: {weather_description}")
        print(f"Temperature: {temperature} °F")
        print(f"Humi
```

# External knowledge in prompts, Grounding



We've actually already seen external knowledge within our prompts. In the question and answer example, the `context` that we pasted in was copied from the Domino's website.

In [23]:
template = """Read the context below and respond with an answer to the question. If the question cannot be answered based on the context alone or the context does not explicitly say the answer to the question, write "Sorry I had trouble answering this question, based on the information I found."

Context: {context}

Question: {question}

Answer:
"""

prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=template,
)

In [24]:
context = "Domino's gift cards are great for any person and any occasion. There are a number of different options to choose from. Each comes with a personalized card carrier and is delivered via US Mail."

question = "How are gift cards delivered?"

myprompt = prompt.format(context=context, question=question)
print(myprompt)

Read the context below and respond with an answer to the question. If the question cannot be answered based on the context alone or the context does not explicitly say the answer to the question, write "Sorry I had trouble answering this question, based on the information I found."

Context: Domino's gift cards are great for any person and any occasion. There are a number of different options to choose from. Each comes with a personalized card carrier and is delivered via US Mail.

Question: How are gift cards delivered?

Answer:



In [25]:
result = client.chat.completions.create(
    model="Hermes-2-Pro-Llama-3-8B",
    messages=[{"role": "user", "content": myprompt}]
)
result['choices'][0]['message']['content']

'Gift cards are delivered via US Mail.'

# Retrieval augmentation

Retrieval-augmented generation (RAG) combines a retrieval system with a generative model. Instead of relying solely on the knowledge stored in the model's parameters, a RAG system retrieves relevant passages from an external database or document collection at query time and inserts them into the prompt. This connects the knowledge in external sources with the model's generation abilities, which can make outputs more informed and context-aware in tasks like question answering and dialogue systems.
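
Before building the real pipeline, here is a toy, dependency-free sketch of the retrieve-then-augment loop. It is only an illustration: it swaps in word-count vectors and cosine similarity where the notebook uses a neural embedding model and a vector database, and `bow_vector`, `retrieve`, `augment`, and `toy_docs` are hypothetical names.

```python
import math
from collections import Counter
from typing import List

def bow_vector(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (real systems use an embedding model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query: str, documents: List[str], k: int = 1) -> List[str]:
    """Return the k documents most similar to the query."""
    q = bow_vector(query)
    ranked = sorted(documents, key=lambda d: cosine(q, bow_vector(d)), reverse=True)
    return ranked[:k]

def augment(query: str, documents: List[str], k: int = 1) -> str:
    """Build a grounded prompt: retrieved context first, then the question."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context: {context}\n\nQuestion: {query}\n\nAnswer:"

toy_docs = [
    "gift cards are delivered via us mail",
    "sacramento is the capital of california",
]
print(augment("what is the capital of california", toy_docs))
```

The augmented prompt is what gets sent to the LLM; the model never searches anything itself.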

In [26]:
# Let's get the html off of a website.
fp = urllib.request.urlopen("https://docs.kernel.org/process/submitting-patches.html")
mybytes = fp.read()
html = mybytes.decode("utf8")
fp.close()

# And convert it to text.
h = html2text.HTML2Text()
h.ignore_links = True
text = h.handle(html)

print(text)

![Logo](../_static/logo.svg)

# The Linux Kernel

6.12.0

### Quick search

### Contents

  * Development process
  * Submitting patches
    * Obtain a current source tree
    * Describe your changes
    * Separate your changes
    * Style-check your changes
    * Select the recipients for your patch
    * No MIME, no links, no compression, no attachments. Just plain text
    * Respond to review comments
    * Use trimmed interleaved replies in email discussions
    * Don’t get discouraged - or impatient
    * Include PATCH in the subject
    * Sign your work - the Developer’s Certificate of Origin
      * Developer’s Certificate of Origin 1.1
    * When to use Acked-by:, Cc:, and Co-developed-by:
    * Using Reported-by:, Tested-by:, Reviewed-by:, Suggested-by: and Fixes:
      * Reviewer’s statement of oversight
    * The canonical patch format
      * Backtraces in commit messages
    * Explicit In-Reply-To headers
    * Providing base tree information
    * Tooling
    * References

In [27]:
# Clean things up just a bit.
text = text.split("### This Page")[1]
text = text.split("## References")[0]
print(text)



  * Show Source

English

  * Chinese (Simplified)
  * Chinese (Traditional)
  * Italian
  * Spanish

# Submitting patches: the essential guide to getting your code into the
kernel¶

For a person or company who wishes to submit a change to the Linux kernel, the
process can sometimes be daunting if you’re not familiar with “the system.”
This text is a collection of suggestions which can greatly increase the
chances of your change being accepted.

This document contains a large number of suggestions in a relatively terse
format. For detailed information on how the kernel development process works,
see A guide to the Kernel Development Process. Also, read Linux Kernel patch
submission checklist for a list of items to check before submitting code. For
device tree binding patches, read Submitting Devicetree (DT) binding patches.

This documentation assumes that you’re using `git` to prepare your patches. If
you’re unfamiliar with `git`, you would be well-advised to learn how to use
it, it

In [28]:
# Chunk the text into smaller pieces for injection into LLM prompts.
text_splitter = CharacterTextSplitter(chunk_size=700, chunk_overlap=50)
docs = text_splitter.split_text(text)
len(docs)



66
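
`chunk_size` caps how many characters go into each chunk, and `chunk_overlap` repeats some text between neighboring chunks so that a sentence cut at a boundary still appears whole in at least one chunk. As far as we understand it, LangChain's `CharacterTextSplitter` first splits on a separator (blank lines by default) and then merges pieces up to `chunk_size`, so its chunks follow paragraph boundaries. The sketch below is a simpler fixed-size sliding window, not LangChain's algorithm; `sliding_window_chunks` is a hypothetical helper meant only to show what the two parameters mean.

```python
from typing import List

def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into fixed-size character windows overlapping by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end of the text
            break
    return chunks

demo_chunks = sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(demo_chunks)  # ['abcd', 'defg', 'ghij']
```

Each chunk shares its first character with the last character of the previous one, which is the overlap.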

In [29]:
# Let's check out some of the chunks!
for i in range(0, 3):
  print("Chunk", str(i+1))
  print("----------------------------")
  print(docs[i])
  print("")

Chunk 1
----------------------------
* Show Source

English

  * Chinese (Simplified)
  * Chinese (Traditional)
  * Italian
  * Spanish

# Submitting patches: the essential guide to getting your code into the
kernel¶

For a person or company who wishes to submit a change to the Linux kernel, the
process can sometimes be daunting if you’re not familiar with “the system.”
This text is a collection of suggestions which can greatly increase the
chances of your change being accepted.

Chunk 2
----------------------------
This document contains a large number of suggestions in a relatively terse
format. For detailed information on how the kernel development process works,
see A guide to the Kernel Development Process. Also, read Linux Kernel patch
submission checklist for a list of items to check before submitting code. For
device tree binding patches, read Submitting Devicetree (DT) binding patches.

This documentation assumes that you’re using `git` to prepare your patches. If
you’re unfam

In [30]:
# Let's take care of some of the formatting so it doesn't conflict with our
# typical prompt template structure
docs = [x.replace('#', '-') for x in docs]
print(docs[2])

Some subsystems and maintainer trees have additional information about their
workflow and expectations, see Documentation/process/maintainer-handbooks.rst.

-- Obtain a current source tree¶

If you do not have a repository with the current kernel source handy, use
`git` to obtain one. You’ll want to start with the mainline repository, which
can be grabbed with:

    
    
    git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git


In [33]:
# Now we need to embed these documents and put them into a "vector store" or
# "vector db" that we will use for semantic search and retrieval.

# Function to get embeddings from Prediction Guard in batches
def embed_text_column(df, text_column="text", batch_size=20):
    def embed_batch(batch):
        embeddings = []
        for i in range(0, len(batch), batch_size):
            chunk = batch[i:i+batch_size]
            response = client.embeddings.create(
                model="multilingual-e5-large-instruct",
                input=chunk,
                truncate=True
            )

            if "data" in response:
                chunk_embeddings = [item["embedding"] for item in response["data"] if "embedding" in item]
                embeddings.extend(chunk_embeddings)
            else:
                raise Exception(f"Error in embedding response: {response}")
        return [np.array(embedding) for embedding in embeddings]

    # Embed the documents in smaller batches
    embeddings = embed_batch(df[text_column].tolist())

    # Verify the number of embeddings matches the number of documents
    if len(embeddings) != len(df):
        raise ValueError("The number of embeddings does not match the number of documents")

    # Add embeddings to the dataframe as a single column named "vector"
    df["vector"] = embeddings

    return df

def embed(sentence):
    response = client.embeddings.create(
        model="multilingual-e5-large-instruct",
        input=sentence,
        truncate=True
    )
    if "data" in response:
        embedding = response["data"][0]["embedding"]
        return embedding
    else:
        raise Exception(f"Error in embedding response: {response}")

# LanceDB setup
uri = ".lancedb"
if os.path.exists(uri):
    shutil.rmtree(uri)
os.mkdir(uri)
db = lancedb.connect(uri)

# Create a dataframe with the chunk ids and chunks
metadata = []
for i in range(len(docs)):
    metadata.append([i, docs[i]])
doc_df = pd.DataFrame(metadata, columns=["chunk", "text"])
print(doc_df.head())

   chunk                                               text
0      0  * Show Source\n\nEnglish\n\n  * Chinese (Simpl...
1      1  This document contains a large number of sugge...
2      2  Some subsystems and maintainer trees have addi...
3      3  Note, however, that you may not want to develo...
4      4  Describe user-visible impact. Straight up cras...
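
`embed_text_column` slices the list of chunks into groups of `batch_size` and makes one embeddings request per group, so the 66 chunks here with the default `batch_size=20` should take four requests. Because the slices are taken in order, the returned embeddings line up row-for-row with the dataframe, which is what the length check relies on. The slicing step on its own, as a hypothetical `batches` helper:

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def batches(items: Sequence[T], batch_size: int) -> List[Sequence[T]]:
    """Slice items into consecutive batches of at most batch_size elements, in order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

# With 66 chunks and batch_size=20, the last batch holds the remaining 6.
sizes = [len(b) for b in batches(list(range(66)), 20)]
print(sizes)  # [20, 20, 20, 6]
```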


In [34]:
data=embed_text_column(doc_df)

In [35]:
# Create the DB table and add the records
if "linux" in db.table_names():
    db.drop_table("linux")

table = db.create_table("linux", data=data)

In [37]:
# Let's try to match a query to one of our documents.
message = "How many problems should be solved per patch?"
results = table.search(embed(message)).limit(5).to_pandas()
results.head()

|   | chunk | text | vector | _distance |
|---|---|---|---|---|
| 0 | 15 | When dividing your change into a series of pat... | [0.0004274976, 0.020138724, -0.012549005, -0.0... | 0.291016 |
| 1 | 6 | The maintainer will thank you if you write you... | [0.009361901, 0.028679557, -0.019474851, -0.02... | 0.298232 |
| 2 | 17 | Check your patches with the patch style checke... | [-0.006910427, 0.004181309, -0.02513793, -0.00... | 0.29865 |
| 3 | 14 | On the other hand, if you make a single change... | [0.0046620346, 0.01853433, -0.027573884, -0.02... | 0.29995 |
| 4 | 55 | If a patch fixes a compile failure, it may not... | [0.01613208, 0.015890308, -0.022159725, -0.031... | 0.317525 |
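
In the `_distance` column, smaller means closer. Our understanding is that LanceDB's vector search defaults to an L2 (Euclidean) metric, and that the metric can be changed; check the LanceDB documentation for your version. If the embeddings are unit-length (an assumption we have not verified for `multilingual-e5-large-instruct`), ranking by squared L2 distance gives the same order as ranking by cosine similarity, because for unit vectors ‖a − b‖² = 2 − 2·cos(a, b). A small pure-Python check of that identity, using random vectors rather than real embeddings:

```python
import math
import random

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

random.seed(0)
a = normalize([random.gauss(0, 1) for _ in range(8)])
b = normalize([random.gauss(0, 1) for _ in range(8)])

cos_sim = dot(a, b)                               # cosine similarity of unit vectors
sq_l2 = sum((x - y) ** 2 for x, y in zip(a, b))   # squared Euclidean distance

# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
print(math.isclose(sq_l2, 2 - 2 * cos_sim))  # True
```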


In [40]:
# Now let's augment our Q&A prompt with this external knowledge on the fly!
template = """Read the below input context and respond with a short answer to the given question. Use only the information in the below input to answer the question. If you cannot answer the question, respond with "Sorry, I can't find an answer, but you might try looking in the following resource."

Context: {context}

Question: {question}

Answer:
"""
# Named rag_prompt so it doesn't overwrite the qa_prompt (with a "query"
# variable) that response_chain still uses.
rag_prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=template,
)

def rag_answer(message):

  # Search for relevant context
  results = table.search(embed(message)).limit(5).to_pandas()
  results.sort_values(by=['_distance'], inplace=True, ascending=True)
  doc_use = results['text'].values[0]

  # Augment the prompt with the retrieved context
  prompt = rag_prompt.format(context=doc_use, question=message)

  # Get a response
  result = client.chat.completions.create(
      model="Hermes-3-Llama-3.1-8B",
      messages=[{"role": "user", "content": prompt}]
  )

  return result['choices'][0]['message']['content']
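
`rag_answer` retrieves five chunks but only puts the closest one into the prompt. When an answer spans several chunks, one option is to pack as many top-ranked chunks as fit within a size budget. The sketch below uses a hypothetical `build_context` helper and measures the budget in characters, a crude stand-in for the model's token limit.

```python
from typing import List

def build_context(chunks: List[str], max_chars: int, separator: str = "\n\n") -> str:
    """Join chunks in ranked order, stopping before the total length would exceed max_chars."""
    selected: List[str] = []
    used = 0
    for chunk in chunks:
        extra = len(chunk) + (len(separator) if selected else 0)
        if used + extra > max_chars:
            break  # keep the ranking intact: stop at the first chunk that doesn't fit
        selected.append(chunk)
        used += extra
    return separator.join(selected)

ranked = ["first chunk", "second chunk", "third chunk"]
print(build_context(ranked, max_chars=30))
```

Inside `rag_answer`, this could replace the single-chunk line, e.g. `doc_use = build_context(results['text'].tolist(), max_chars=2000)`, where the 2000-character budget is an arbitrary example value.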


In [41]:
response = rag_answer("How many problems should be solved in a single patch?")

print('')
print("RESPONSE:", response)


RESPONSE: A single patch should solve a single logical change. It should be easily understood and verifiable by reviewers, and able to stand on its own merits.
