### <span style="color:lightgray">DigEx 2024, Oslo, Norway</span>

# Prompt engineering for geoscientists
---

### Matt Hall, Equinor &nbsp; `mtha@equinor.com`

<span style="color:lightgray">&copy; Matt Hall, Equinor &nbsp; | &nbsp; licensed CC BY, please share this work</span>

In [None]:
# See https://platform.openai.com/docs/quickstart
from dotenv import load_dotenv

_ = load_dotenv(".env") # If key is in a file.

In [None]:
from openai import OpenAI

MODEL = "gpt-3.5-turbo"

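def ask(prompt, temperature=0):
    """Send one user message to the chat model and return the reply text."""
    completion = OpenAI().chat.completions.create(
        model=MODEL,
        temperature=temperature,  # 0 makes replies as repeatable as possible.
        messages=[
            {"role": "user", "content": prompt}
        ])
    return completion.choices[0].message.content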

## Large language models

1. Base models
2. Generic instruction-tuned 👈 what we mostly have
3. Custom fine-tuned


## What can they do?

Lots of things:

- **Transformation**, eg translation, correction, formatting
- **Summarization**, eg summaries, refactoring
- **Analysis**, eg keywords, topics, sentiment, classification
- **Expansion** 🐉 eg brainstorming, text generation, Q&A

Unfortunately, they are bad at some of them.

Unfortunately, they are also very convincing.

## Not just bad, but weird

In [None]:
ask("I have nine thin sections. Two go "
    "to the lab and I drop one. Ashley "
    "gives me four more, but I lose three. "
    "Quick, how many thin sections do I "
    "have now? ")

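As a reference point for judging the model's answer, the bookkeeping in the prompt can be done directly. This is a minimal sketch. It assumes that sections sent to the lab, dropped or lost no longer count as sections "I have", which is one reading of the prompt and not the only one.

```python
# Tally the thin sections step by step.
# Assumption: sections sent to the lab, dropped or lost are no longer "mine".
events = [
    ("start", +9),
    ("sent to the lab", -2),
    ("dropped", -1),
    ("given by Ashley", +4),
    ("lost", -3),
]

total = 0
for label, change in events:
    total += change
    print(f"{label:>16}: {change:+d} -> {total}")
```
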
## Really weird

[Blog post](https://www.lesswrong.com/posts/kmWrwtGE9B9hpbgRT/a-search-for-more-chatgpt-gpt-3-5-gpt-4-unspeakable-glitch) by Martin Fell.

In [None]:
ask("Spell ' JSBracketAccess'")

## Some tasks are difficult for them

In [None]:
ask("Take the letters in "
    "'stratigraphically' and "
    "reverse them")

In [None]:
print(_[::-1])

We can force the model to tokenize the letters, eg:

In [None]:
ask("Take the letters in "
    "S-T-R-A-T-I-G-R-A-P-H-I-C-A-L-L-Y "
    "and reverse them")

In [None]:
print(_[::-1])
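
The hyphenated form used in the prompt above can be built programmatically, along with the reversal we expect to see. This is a small sketch; `spell_out` is a hypothetical helper name, not part of any library.

```python
def spell_out(word, sep="-"):
    """Join the upper-case letters of `word` with a separator."""
    return sep.join(word.upper())

word = "stratigraphically"
spelled = spell_out(word)  # Form that nudges the tokenizer towards single letters.
expected = word[::-1]      # Ground truth to compare with the model's answer.

print(spelled)
print(expected)
```

Having the ground truth in a variable makes it easy to compare against a model reply with `==` instead of by eye.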

## What is a token?

In [None]:
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
tokens = encoding.encode("stratigraphically")
[encoding.decode_single_token_bytes(token).decode() for token in tokens]

In [None]:
tokens = encoding.encode("S-T-R-A-T-I-G-R-A-P-H-I-C-A-L-L-Y")
[encoding.decode_single_token_bytes(token).decode() for token in tokens]

## They hallucinate and confabulate

In [None]:
ask("In which Tintin story does he "
    "sell his fossil collection?")

## They never ask for clarification

In [None]:
ask("What do you get if you mix "
    "red and green?")

## So do LLMs know anything?

Sort of, but what they know is:

1. From the Internet.
2. Lossily compressed.
3. Probabilistically interpolated.

Ambiguity is a problem.

In [None]:
ask("What is Polaris?")

In [None]:
# There are over 280 William Smiths on
# https://en.wikipedia.org/wiki/William_Smith
print(ask("Who was William Smith?")) 

In [None]:
print(ask("Who was William Smith? "
          "Ask clarifying questions "
          "if you need to.")) 

In [None]:
print(ask("What does 'run' mean?"))

In [None]:
ask("Is 'run' a verb?")

In [None]:
ask("Is 'run' a noun?")

In [None]:
ask("Is sandstone porous?")

In [None]:
ask("""
Q: Is shale porous?
A: Shale is porous but has low porosity,
   and very low effective porosity. It
   depends on its grain size, 
   grain composition, and diagenesis.

Q: Is limestone porous?
A: Limestone can be very porous but it
   can also be tight. It depends on many
   factors, such as its depositional and
   diagenetic history.

Q: Is sandstone porous?
A: """)
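
The few-shot prompt above can also be assembled from a list of example pairs, which makes it easier to swap examples in and out. This is a sketch only; `few_shot_prompt` is a hypothetical helper, and the example answers are the ones from the prompt above.

```python
def few_shot_prompt(examples, question):
    """Build a Q/A prompt from (question, answer) pairs, ending with an open 'A:'."""
    blocks = [f"Q: {q}\nA: {a}" for q, a in examples]
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)

examples = [
    ("Is shale porous?",
     "Shale is porous but has low porosity, and very low effective porosity. "
     "It depends on its grain size, grain composition, and diagenesis."),
    ("Is limestone porous?",
     "Limestone can be very porous but it can also be tight. It depends on "
     "many factors, such as its depositional and diagenetic history."),
]

prompt_text = few_shot_prompt(examples, "Is sandstone porous?")
print(prompt_text)
```

The resulting string can be passed to `ask()` in the same way as the hand-written version.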

In [None]:
# Type 2 truth.
ask("Can granite be a reservoir rock?")

In [None]:
question = "Is sandstone permeable?"
ask(question)

In [None]:
print(ask(f"""I asked an LLM: {question}

I got the response: {_}

1. Consider how the response could be improved \
and made more complete.
2. Write a better one incorporating these \
improvements."""))
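
The cell above relies on `_`, which in Jupyter holds the output of the last executed cell and so changes if cells are run out of order. A more explicit variant is sketched here. `critique_prompt` is a hypothetical helper, and the response string is a hand-written placeholder, not real model output.

```python
def critique_prompt(question, response):
    """Build a prompt asking the model to critique and then improve a response."""
    return (
        f"I asked an LLM: {question}\n\n"
        f"I got the response: {response}\n\n"
        "1. Consider how the response could be improved "
        "and made more complete.\n"
        "2. Write a better one incorporating these improvements."
    )

# Placeholder response for illustration only.
first_response = "Yes, sandstone is permeable."
followup = critique_prompt("Is sandstone permeable?", first_response)
print(followup)
```

In the notebook, `first_response` would be the saved return value of `ask(question)`.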

## Can LLMs reason about logic?

In [None]:
ask("This local sandstone has fossils. "
    "All Cretaceous sandstones in this "
    "basin are well sorted. No porous "
    "sandstone in this basin is "
    "fossiliferous. All well sorted "
    "sandstones are porous. Can I say "
    "anything about its age?")

# Try sorting the premises.
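
To check what the premises actually allow, we can enumerate every combination of the four properties and keep those consistent with the prompt. This is a brute-force sketch; the function and variable names are illustrative.

```python
from itertools import product

def premises_hold(cretaceous, well_sorted, porous, fossiliferous):
    """True if one sandstone's properties are consistent with the prompt."""
    return (
        fossiliferous                        # This sandstone has fossils.
        and (not cretaceous or well_sorted)  # Cretaceous implies well sorted.
        and not (porous and fossiliferous)   # No porous sandstone has fossils.
        and (not well_sorted or porous)      # Well sorted implies porous.
    )

# Each tuple is (cretaceous, well_sorted, porous, fossiliferous).
consistent = [combo for combo in product([False, True], repeat=4)
              if premises_hold(*combo)]
can_be_cretaceous = any(combo[0] for combo in consistent)

print(consistent)
print("Could be Cretaceous:", can_be_cretaceous)
```

Only one combination survives, and it is not Cretaceous: fossils rule out porosity, which rules out good sorting, which rules out a Cretaceous age.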

## Can LLMs reason about time and space?

In [None]:
ask("I am standing in the garden. "
    "I walk to the kitchen and pick up "
    "a cup, then walk to the living room. "
    "I place a ball in the cup. "
    "I carry the cup to the hall and turn "
    "it upside down. I walk to the kitchen, "
    "fill the cup with water, "
    "then go back out to the garden. "
    "Where is the ball?"
)
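
The expected answer can be worked out by tracking state through the steps in the prompt. This is a minimal sketch, and it assumes the ball falls out and stays put when the cup is turned upside down.

```python
# Steps from the prompt as (action, place) pairs.
steps = [
    ("walk", "kitchen"),
    ("pick up cup", None),
    ("walk", "living room"),
    ("put ball in cup", None),
    ("walk", "hall"),
    ("invert cup", None),
    ("walk", "kitchen"),
    ("fill cup with water", None),  # No effect on the ball.
    ("walk", "garden"),
]

me = "garden"
ball = None          # Location unknown until the ball is mentioned.
holding_cup = False
ball_in_cup = False

for action, place in steps:
    if action == "walk":
        me = place
        if holding_cup and ball_in_cup:
            ball = me            # The ball travels with the cup.
    elif action == "pick up cup":
        holding_cup = True
    elif action == "put ball in cup":
        ball_in_cup = True
        ball = me
    elif action == "invert cup":
        ball_in_cup = False      # Assumption: the ball drops out here.
        ball = me

print(ball)
```
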

This seems similar to reasoning about geological events:

In [None]:
ask("I have a geological question. "
    "Layer H overlies Layer N, which in turn "
    "overlies Layer B. Regional normal fault Q "
    "has E-W strike and cuts layers B, N and H. "
    "Fold P folds Layer B. Dyke Z has NE-SW "
    "strike and cuts Dyke X, which has NNE-SSW "
    "strike and cuts layers B and N. Is Dyke X "
    "cut by the fault?")  # Yes.

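The relative ages in the prompt can be written as (older, younger) pairs and searched for a chain linking Dyke X to fault Q. This is a sketch under one loud assumption: the prompt never says that Dyke X does not cut Layer H, so the pair `("X", "H")` is an inference, not a stated fact. Whether the fault physically intersects the dyke is a separate spatial question that this sketch does not address.

```python
# (older, younger) pairs read from the prompt.
stated = [
    ("B", "N"), ("N", "H"),              # Superposition: H over N over B.
    ("B", "Q"), ("N", "Q"), ("H", "Q"),  # Fault Q cuts B, N and H.
    ("B", "P"),                          # Fold P folds B.
    ("X", "Z"),                          # Dyke Z cuts Dyke X.
    ("B", "X"), ("N", "X"),              # Dyke X cuts B and N.
]

# Assumption, not stated in the prompt: X does not cut H, so X predates H.
assumed = [("X", "H")]

def is_older(a, b, relations):
    """True if a chain of (older, younger) pairs leads from a to b."""
    stack, seen = [a], set()
    while stack:
        node = stack.pop()
        for older, younger in relations:
            if older == node and younger not in seen:
                if younger == b:
                    return True
                seen.add(younger)
                stack.append(younger)
    return False

print(is_older("X", "Q", stated))            # Stated relations alone.
print(is_older("X", "Q", stated + assumed))  # With the assumption: X -> H -> Q.
```

With only the stated relations there is no chain from X to Q, so the "Yes" depends on the unstated premise. That is the kind of implicit step a model has to supply.
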
## LLMs are really good at text analysis

Text processing is laborious for humans.

It is also hard to do with supervised learning.

These abstracts are from the most recent issues of [_Geophysics_](https://library.seg.org/doi/10.1190/geo2023-0264.1), [_Geochemistry_](https://doi.org/10.1016/j.chemer.2023.126022) and [_Palaeontology_](https://onlinelibrary.wiley.com/doi/full/10.1111/pala.12690).

In [None]:
abstracts = [
    "Projection over convex sets (POCS) is one of the most widely used algorithms to interpolate seismic data sets. A formal understanding of the underlying objective function and the associated optimization process is, however, lacking to date in the literature. Here, POCS is shown to be equivalent to the application of the half-quadratic splitting (HQS) method to the 𝐿0 norm of an orthonormal projection of the sought after data, constrained on the available traces. Similarly, the apparently heuristic strategy of using a decaying threshold in POCS is revealed to be the result of the continuation strategy that HQS must use to converge to a solution of the minimizer. In light of this theoretical understanding, another methods able to solve this convex optimization problem, namely the Chambolle-Pock primal-dual algorithm, is shown to lead to a new POCS-like method with superior interpolation capabilities at nearly the same computational cost of the industry-standard POCS method.",
    "The Nidar ophiolite is one of the well-preserved and almost complete ophiolite sections of the Neo-Tethyan oceanic lithosphere, obducted along the continental margin between the Indian and the Eurasian plate. This ophiolite sequence is mostly dominated by ultramafic rocks, consisting of forearc-related refractory, mainly harzburgite, dunite, and serpentinite, with minor intrusions of lherzolite, chromitites, and pyroxenites. In this present study, detailed mineralogical, whole rock geochemistry (major oxides, trace elements, PGE), and Nd isotopic composition of mantle-derived peridotites have been carried out to constrain the petrogenesis and melt evolution. These peridotites are depleted in nature due to the low modal composition of clinopyroxene, high forsterite content in olivine, and wide variation in Cr# and bulk rock chemistry, indicating variable degree of partial melting. The spoon-shaped rare earth element (REE) patterns indicate metasomatism by fluids derived from a subducting slab enriched in light REEs. Geochemical composition of the studied peridotites rocks is marked by high ratio of Al2O3/TiO2, LILE-LREE enrichment, HFSE depletion, and spoon-shaped chondrite-normalized REE patterns and (La/Sm)N > 1 and (Gd/Yb)N < 1, indicates some involvement of boninitic mantle melts and validate a subduction initiation process. The total PGE of the peridotites (ΣPGE = 33–337 ppb) is much more enriched than that of the primitive mantle and other ophiolite peridotites. The PGE distribution displays a concave upward pattern with higher PPGE/IPGE ratios (i.e., 0.11–1.45), suggesting that partial melting is not the only process for the evolution of the Nidar ophiolite peridotites. Enrichment of PPGE and incompatible elements (like LREE) and higher Pd/Ir ratio (0.69–8.26) indicates that these peridotites have undergone fluid/melt interaction in a supra-subduction zone (SSZ) tectonic setting. "
    "PGE concentrations of these depleted harzburgites and dunites, formed by partial melting of cpx–harzburgites in an SSZ that produced the boninitic-like melt. The enrichment of incompatible elements like the PPGE is mainly due to the circulation of fluids in the subduction zone, which leads to the PGE fractionation in mantle peridotites. Also, these peridotites have 143Nd/144Nd ratios (0.51148–0.51262) and εNd(t) (t = 140 Ma) values (i.e., +0.97 to −21.3), indicating derivation from depleted mantle sources within an intra-oceanic arc setting. The geochemical behavior exhibited by the Nidar ophiolite peridotites suggests the evolution of a highly depleted fore-arc mantle wedge significantly modified by various fluids and melts during subduction. The mineralogical, geochemical, and Nd isotopic composition of these peridotites and dunites mutually depict the diverse mantle compositions, suggesting insights into the interactions between the oceanic crust and mantle as well as associated geochemical cycling in an SSZ environment.",
    "Tridentinosaurus antiquus represents one of the oldest fossil reptiles and one of the very few skeletal specimens with evidence of soft tissue preservation from the Cisuralian (Early Permian) of the Italian Alps. The preservation and appearance of the fossil have puzzled palaeontologists for decades and its taphonomy and phylogenetic position have remained unresolved. We reanalysed T. antiquus using ultraviolet light (UV), 3D surface modelling, scanning electron microscopy coupled with energy dispersive spectroscopy (SEM-EDS), micro x-ray diffraction (μ-XRD), Raman and attenuated total reflectance Fourier transformed infrared (ATR-FTIR) spectroscopy to determine the origin of the body outline and test whether this represents the remains of organically preserved soft tissues which in turn could reveal important anatomical details about this enigmatic protorosaur. The results reveal, however, that the material forming the body outline is not fossilized soft tissues but a manufactured pigment indicating that the body outline is a forgery. Our discovery poses new questions about the validity of this enigmatic taxon."
]

print("Word counts:")
[len(a.split()) for a in abstracts]

In [None]:
prompt = """
A scientific abstract follows the #### characters.
Perform three tasks:
 1 Write a short, one-sentence summary.
 2 Extract up to three keywords.
 3 Classify the abstract into a category from the \
following list: Stratigraphy, Sedimentology, Volcanology, \
Geochemistry, Mineralogy, Palaeontology, Structural \
geology, Petrology, Economic geology, Planetary geology.
Provide your output in JSON format with the keys: \
summary, keywords, category.
####
"""

In [None]:
for abstract in abstracts:
    print(ask(prompt + abstract))
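
Because the prompt asks for JSON, each reply can be parsed and checked against the instructions before it is used downstream. This is a sketch; `parse_response` is a hypothetical helper, and the sample reply is written by hand for illustration, not real model output.

```python
import json

CATEGORIES = {
    "Stratigraphy", "Sedimentology", "Volcanology", "Geochemistry",
    "Mineralogy", "Palaeontology", "Structural geology", "Petrology",
    "Economic geology", "Planetary geology",
}
REQUIRED_KEYS = {"summary", "keywords", "category"}

def parse_response(text):
    """Parse a JSON reply and check it against the prompt's instructions."""
    # Models sometimes wrap JSON in extra text; keep the outermost braces.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("No JSON object found in response.")
    record = json.loads(text[start:end + 1])
    missing = REQUIRED_KEYS - record.keys()
    if missing:
        raise ValueError(f"Missing keys: {sorted(missing)}")
    if record["category"] not in CATEGORIES:
        raise ValueError(f"Unexpected category: {record['category']}")
    if len(record["keywords"]) > 3:
        raise ValueError("More than three keywords.")
    return record

# Hypothetical reply for illustration only.
sample = """Here is the result:
{"summary": "A fossil's apparent soft tissue turns out to be pigment.",
 "keywords": ["Tridentinosaurus", "forgery", "taphonomy"],
 "category": "Palaeontology"}"""

record = parse_response(sample)
print(record["category"], record["keywords"])
```

Raising on unexpected categories catches replies where the model invents a label outside the list.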

## Can agents provide knowledge?

**Agents** can provide services:

- Search
- Code execution
- API queries
- Database queries

In [None]:
prompt = ("When did the dinosaurs "
          "go extinct?")
ask(prompt)

In [None]:
from langchain_openai import ChatOpenAI
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType

llm = ChatOpenAI(temperature=0, model=MODEL)

tools = ["wikipedia", "llm-math"]

agent = initialize_agent(
    load_tools(tools, llm=llm), 
    llm, 
    agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION,
    handle_parsing_errors=True,
    verbose=True
)

In [None]:
result = agent(prompt)
result['output']

In [None]:
_

In [None]:
ask("What is 12.81% of "
    "66 million years?"
    )  # 8.4546 Ma

In [None]:
agent("What is 12.81% of "
      "66 million years?"
      )
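
The reference value in the comment above is simple to reproduce, which is exactly what a calculator tool does on the agent's behalf. A minimal sketch:

```python
percent = 12.81
duration_myr = 66  # Duration in millions of years.

result = percent / 100 * duration_myr
print(f"{result:.4f} million years")
```
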

## It doesn't always work though

In [None]:
# Zoeppritz died at 26 in 1908.
prompt = ("How old was Karl Bernhard Zoeppritz "
          "when he published the Zoeppritz "
          "equations?")
ask(prompt)

In [None]:
result = agent(prompt)

In [None]:
_

## Conclusions

#### 1. LLMs are amazing but flawed yet convincing
#### 2. They can be really weird or hallucinate
#### 3. Don't ask them for: information, reasoning
#### 4. Do ask them for: summarization, analysis, ideas
#### 5. What's next? Learn about agents, RAG, ensembles


<span style="color:lightgray">&copy; 2024 Matt Hall, Equinor &nbsp; | &nbsp; licensed CC BY, please share this work</span>