### <span style="color:lightgray">EDC, November 2024</span>

# Developing solutions with LLMs
---

### Matt Hall, Equinor &nbsp; `mtha@equinor.com`

<span style="color:lightgray">&copy;2024  Matt Hall, Equinor &nbsp; | &nbsp; licensed CC BY, please share this work</span>

In [None]:
# See https://platform.openai.com/docs/quickstart
from dotenv import load_dotenv

__ = load_dotenv("secrets.txt") # If key is in a file.

In [None]:
import base64
import os

import httpx
import tiktoken
from openai import AzureOpenAI

MODEL = "gpt-35-turbo"  # Deployment name; "gpt-4o" is multimodal.

CLIENT = AzureOpenAI(
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_KEY"),
    api_version="2024-02-01",
)

def ask(prompt, model=MODEL, image_url=None):
    """Ask the model a question, optionally about the image at image_url."""
    content = []

    if image_url is not None:
        # Fetch the image and embed it in the message as a base64 data URL.
        r = httpx.get(image_url)
        media_type = r.headers.get("content-type", "image/jpeg").split(";")[0]
        image = base64.b64encode(r.content).decode("utf-8")
        image_content = {
            "type": "image_url",
            "image_url": {"url": f"data:{media_type};base64,{image}"},
        }
        content.append(image_content)

    content.append({"type": "text", "text": prompt})
    
    messages = [{"role": "user", "content": content},]
    response = CLIENT.chat.completions.create(
        model=model,
        temperature=0.5,
        max_tokens=1024,
        messages=messages
    )
    
    return response.choices[0].message.content

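def tokenize(prompt):
    """Split a prompt into the text of its individual tokens."""
    encoding = tiktoken.encoding_for_model(MODEL)
    tokens = encoding.encode(prompt)
    # A token can be a fragment of a multi-byte character, so decode leniently.
    decode = lambda token: encoding.decode_single_token_bytes(token).decode(errors="replace")
    return [decode(token) for token in tokens]

def get_embedding(text, model="text-embedding-3-large"):
    """Return the embedding vector for a piece of text."""
    text = text.replace("\n", " ")
    return CLIENT.embeddings.create(input=[text], model=model).data[0].embedding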


class Convo:
    """A conversation that remembers its message history."""
    def __init__(self, temperature=0, model=MODEL):
        self.temperature = temperature
        self.model = model
        self.messages = []

    def ask(self, prompt):
        self.messages.append({"role": "user", "content": prompt})
        response = CLIENT.chat.completions.create(
            model=self.model,
            temperature=self.temperature,
            max_tokens=1024,
            messages=self.messages
        )
        content = response.choices[0].message.content
        self.messages.append({'role': 'assistant',  'content': content})
        return content

    def history(self):
        return self.messages

# Needed for f-string printing later.
n = '\n'

# Check that things work.
ask('Repeat exactly: ✅ System check')

## Can agents help?

On their own, LLMs are unreliable at questions that need exact arithmetic, like this one:

In [None]:
q = ("What is the Gardner equation's prediction "
     "of density if Vp is 2000 m/s? "
     "Assume a = 0.31 and b = 0.25. "
     "Think step by step.")

print(ask(q))

Hmm...

---

Let's ask the smarter model.

In [None]:
q = ("What is the Gardner equation's prediction "
     "of density if Vp is 2000 m/s? "
     "Assume a = 0.31 and b = 0.25. "
     "Think step by step.")

print(ask(q, model='gpt-4o'))

Still no good.
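
For reference, the arithmetic itself is simple. Gardner's equation is $\rho = a V_\mathrm{P}^b$, so the values in the prompt can be checked with a few lines of Python. This is only a sanity check, assuming the usual convention that $V_\mathrm{P}$ in m/s with $a = 0.31$ gives density in g/cm³; the function name `gardner` is mine.

```python
def gardner(vp, a=0.31, b=0.25):
    """Gardner's equation: density = a * vp**b."""
    return a * vp**b

rho = gardner(2000)
print(f"{rho:.3f} g/cm³")  # 2.073 g/cm³
```

This is the number we would like the model to produce.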

---

Let's try something else.

**Agents** can provide services:

- Maths
- Search
- Code execution
- API calls
- Database queries

For example, a **math agent** can answer mathematical questions:

In [None]:
from langchain.agents import initialize_agent
from langchain.agents import load_tools
from langchain_openai import AzureChatOpenAI as ACOAI
from langchain.agents import AgentType

llm = ACOAI(
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_KEY"),  
    api_version="2024-02-01",
    model="gpt-35-turbo",
)

agent = initialize_agent(
    agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION,
    llm=llm,
    tools=load_tools(['llm-math'], llm=llm),
    handle_parsing_errors=True,
    verbose=True,
)

agent.invoke(q)
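
The idea behind a maths tool is roughly this: the LLM turns the question into an arithmetic expression, and ordinary code evaluates it. The sketch below shows only the second half. It is a hypothetical stand-in, not LangChain's implementation, and `calculate` and `OPS` are names I made up for illustration.

```python
import ast
import operator

# Hypothetical helper, not part of LangChain: the operators we allow.
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def calculate(expression):
    """Evaluate an expression containing only numbers and + - * / **."""
    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")
    return _eval(ast.parse(expression, mode="eval").body)

# The LLM's job is to produce this string; the tool does the maths.
print(round(calculate("0.31 * 2000 ** 0.25"), 3))  # 2.073
```

Walking the syntax tree, rather than calling `eval`, means the tool refuses anything that is not plain arithmetic.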

---

## RAG

Retrieval-augmented generation is another approach to keeping an LLM's information on rails. We first find documents that are semantically similar to the query prompt, inject those into the prompt we give to the LLM, and tell it to constrain its response to information from those documents.

The approach depends on comparing embeddings. First, for reference, here is what the model says with no retrieved documents:

In [None]:
query = "Describe the rocks in Ainsa."

ask(query)

In [None]:
text = ("Sandstones in the Ainsa basin are "
        "generally composed of carbonate grains.")

e = get_embedding(text)  # Input limit is 8191 tokens; this model returns 3072 dimensions.

len(e), e[:10]

In [None]:
docs = [
    "Sandstones in the Ainsa basin are "
    "generally composed of carbonate grains.",

    "Siltstones in the Ainsa basin have extensive "
    "early carbonate cementation.",

    "The rocks in the Ainsa Basin are generally "
    "Eocene in age.",

    "The rocks in the Tremp Basin are generally "
    "Cretaceous in age.",

    "Arsenal’s only loss in their last nine games "
    "was in the first leg.",
]

Now that I have several docs, I need a way to measure how similar two pieces of text are. Cosine similarity between their embeddings is a popular choice:

In [None]:
import numpy as np
from numpy.linalg import norm

def cosine(u, v):
    """Cosine similarity between two vectors"""
    return np.dot(u, v) / (norm(u) * norm(v))
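
To build some intuition for the numbers, here is the same function applied to toy 2-D vectors instead of real embeddings. The vectors are made up. The three results should be close to 1, 0 and -1 respectively.

```python
import numpy as np
from numpy.linalg import norm

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return np.dot(u, v) / (norm(u) * norm(v))

a = np.array([1.0, 2.0])

same = cosine(a, 3 * a)                       # Same direction, different length.
orthogonal = cosine(a, np.array([-2.0, 1.0])) # At right angles.
opposite = cosine(a, -a)                      # Opposite direction.

print(same, orthogonal, opposite)
```

Only direction matters, not length, which is why a long vector and a short one pointing the same way score as identical.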

Compute the similarities between my query and the docs:

In [None]:
q = get_embedding(query)

sims = [cosine(q, get_embedding(doc)) for doc in docs]

Look at the similarities:

In [None]:
print(f"{query}\n{'='*len(query)}")
for doc, sim in zip(docs, sims):
    print(f"{doc[:29]}... {sim:.3f}")

Now try a question that the docs cannot answer. Keep only the docs above a similarity threshold and build the prompt from them:

In [None]:
query = "Who is the king of Spain?"

In [None]:
q = get_embedding(query)

sims = [cosine(q, get_embedding(doc)) for doc in docs]
    
useful = [d for d, s in zip(docs, sims) if s > 0.2]

prompt = (f"Answer this question:\n\n> {query}\n\n"
          "IMPORTANT: Use the following information only:\n\n"
          f"{n.join(useful)}\n\n"
          "If the documents are not relevant, use your "
          "implicit knowledge to answer instead.")

print(prompt)

In [None]:
ask(prompt)

There are still plenty of questions about how best to do this:

- How to chunk the documents?
- How to compare the prompt with the documents?
- How to know when to look for documents?
- How to constrain the response to the retrieved docs?
- How to do all this efficiently?
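
On the efficiency question, one common first step is to embed each document once, stack the vectors into a matrix, and score every document with a single matrix-vector product. Here is a minimal sketch using made-up 3-D vectors in place of real embeddings. The `top_k` helper is hypothetical.

```python
import numpy as np

def top_k(query_vec, doc_matrix, k=3):
    """Return indices and cosine similarities of the k most similar rows."""
    q = query_vec / np.linalg.norm(query_vec)
    D = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    sims = D @ q                        # All similarities in one product.
    order = np.argsort(sims)[::-1][:k]  # Highest similarity first.
    return order, sims[order]

# Made-up vectors standing in for document embeddings, one per row.
doc_matrix = np.array([
    [1.0, 0.0, 0.0],
    [0.8, 0.6, 0.0],
    [0.0, 0.0, 1.0],
])
query_vec = np.array([1.0, 0.1, 0.0])

idx, scores = top_k(query_vec, doc_matrix, k=2)
print(idx, scores.round(3))
```

With real data, the matrix would be built once, for example with `np.array([get_embedding(d) for d in docs])`, and reused for every query instead of re-embedding the docs each time.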

## Gotcha

Let's add another document from our source.

>3. West of the Mediano Anticline
>
>Everywhere in the Pyrenees, except in the Ainsa Basin, fractures play an important role in the diagenesis.

It's long, so we split it into two pieces:

In [None]:
docs.extend([
    "3. West of the Mediano Anticline\nEverywhere "
    "in the Pyrenees, except",
    "in the Ainsa Basin, fractures play an "
    "important role in the diagenesis.",
])

Let's ask a new question, this time about diagenesis.

We're looking for docs with high similarity.

In [None]:
query = ("Summarize the diagenesis in "
         "the Ainsa Basin.")
q = get_embedding(query)

sims = [cosine(q, get_embedding(doc)) for doc in docs]

print(f"{query}\n{'='*len(query)}")
for doc, sim in zip(docs, sims):
    print(f"{doc[:29]}... {sim:.3f}")

Now answer the question, using the retrieved documents.

In [None]:
ask(f"{query}\nUse the following information only:\n"
    f"{n.join([d for d, s in zip(docs, sims) if s > 0.5])}.")

Uh oh! Looks like we need more people to work on this problem...
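
One partial mitigation is to never split in the middle of a sentence, so that "except" stays attached to the clause it modifies. The sketch below is only that, a sketch: `chunk_sentences` is a hypothetical helper, the regex is a naive sentence splitter that misfires on abbreviations and on numbered headings like "3.", and a sentence longer than `max_chars` is kept whole.

```python
import re

def chunk_sentences(text, max_chars=150):
    """Greedily pack whole sentences into chunks of about max_chars."""
    # Naive splitter: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # Chunk is full; start a new one.
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

text = ("3. West of the Mediano Anticline\nEverywhere in the Pyrenees, "
        "except in the Ainsa Basin, fractures play an important role "
        "in the diagenesis. The rocks in the Ainsa Basin are generally "
        "Eocene in age.")

chunks = chunk_sentences(text)
for chunk in chunks:
    print(repr(chunk))
```

This keeps the exception and the statement it qualifies in the same chunk, but it does not solve the general problem: meaning can still depend on context that sits in a different chunk.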
