In [1]:
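# Load the LaTeX source, then separate the body from the bibliography.
with open("./data/2302.11382v1.tex", "r") as f:
  doc = f.read()
# maxsplit=1 keeps the unpacking safe if the marker ever appears more than once.
doc, ref = doc.split("\\begin{thebibliography}", 1)
ref = "\\begin{thebibliography}" + ref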

In [2]:
# Split on \section and restore the command; [1:] drops the preamble before the first section.
sections = ["\\section"+s for s in doc.split("\\section")[1:]]
sections

["\\section{Introduction}\n\\label{sec:intro}\nConversational large language models (LLMs)~\\cite{bommasani2021opportunities}, such as ChatGPT~\\cite{bang2023multitask}, have generated immense interest in a range of domains for tasks ranging from answering questions on medical licensing exams~\\cite{gilson2022well} to generating code snippets. This paper focuses on enhancing the application of LLMs in several domains,\nsuch as helping developers code effectively and efficiently with unfamiliar APIs or allowing students to acquire new coding skills and techniques.\n\nLLMs are particularly promising in domains where humans and AI tools work together as trustworthy collaborators to more rapidly and reliably evolve software-reliant systems~\\cite{carleton2022architecting}. For example, LLMs are being integrated directly into software tools, such as Github's Co-Pilot \\cite{github,asare2022github,pearce2022asleep} and included in integrated development environments (IDEs), such as IntelliJ~

In [3]:
# Split each section on \subsection; piece 0 is the section's own lead text and is kept unchanged.
subsections = [
  [
    "\\subsection" + s if i!=0 else s
    for i, s in enumerate(sub.split("\\subsection"))
  ] for sub in sections
]
subsections

[["\\section{Introduction}\n\\label{sec:intro}\nConversational large language models (LLMs)~\\cite{bommasani2021opportunities}, such as ChatGPT~\\cite{bang2023multitask}, have generated immense interest in a range of domains for tasks ranging from answering questions on medical licensing exams~\\cite{gilson2022well} to generating code snippets. This paper focuses on enhancing the application of LLMs in several domains,\nsuch as helping developers code effectively and efficiently with unfamiliar APIs or allowing students to acquire new coding skills and techniques.\n\nLLMs are particularly promising in domains where humans and AI tools work together as trustworthy collaborators to more rapidly and reliably evolve software-reliant systems~\\cite{carleton2022architecting}. For example, LLMs are being integrated directly into software tools, such as Github's Co-Pilot \\cite{github,asare2022github,pearce2022asleep} and included in integrated development environments (IDEs), such as IntelliJ

In [4]:
# Same idea one level down: split each subsection on \subsubsection.
subsubsection = [
  [
    [
      "\\subsubsection" + s if i!=0 else s
      for i, s in enumerate(subsub.split("\\subsubsection"))
    ] for subsub in sub
  ] for sub in subsections
]
subsubsection

[[["\\section{Introduction}\n\\label{sec:intro}\nConversational large language models (LLMs)~\\cite{bommasani2021opportunities}, such as ChatGPT~\\cite{bang2023multitask}, have generated immense interest in a range of domains for tasks ranging from answering questions on medical licensing exams~\\cite{gilson2022well} to generating code snippets. This paper focuses on enhancing the application of LLMs in several domains,\nsuch as helping developers code effectively and efficiently with unfamiliar APIs or allowing students to acquire new coding skills and techniques.\n\nLLMs are particularly promising in domains where humans and AI tools work together as trustworthy collaborators to more rapidly and reliably evolve software-reliant systems~\\cite{carleton2022architecting}. For example, LLMs are being integrated directly into software tools, such as Github's Co-Pilot \\cite{github,asare2022github,pearce2022asleep} and included in integrated development environments (IDEs), such as Intelli

In [5]:
# Flatten the three-level nesting into a single list of chunks in document order.
chunks = [s for sec in subsubsection for sub in sec for s in sub]
len(chunks)

106
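
The three nested splits above can also be expressed as a single regular-expression split. The following is a minimal sketch, not the notebook's method: the `sample` string and the helper `split_by_headings` are hypothetical, and the pattern assumes unstarred headings of the form `\section{...}`, `\subsection{...}`, or `\subsubsection{...}`.

```python
import re

# Hypothetical miniature LaTeX source standing in for the paper.
sample = (
    "\\section{Intro}\nSome text.\n"
    "\\subsection{Background}\nMore text.\n"
    "\\subsubsection{Details}\nFine print.\n"
    "\\section{Related Work}\nClosing text.\n"
)

# Matches the start of a \section, \subsection, or \subsubsection command.
HEADING = r"\\(?:sub){0,2}section\{"


def split_by_headings(tex):
    """Split tex before every heading command, keeping each heading with its chunk."""
    # A zero-width lookahead splits *before* the heading without consuming it.
    pieces = re.split("(?=" + HEADING + ")", tex)
    # Drop the preamble and any empty leading piece: keep only heading-led chunks.
    return [p for p in pieces if re.match(HEADING, p)]


demo_chunks = split_by_headings(sample)
print(len(demo_chunks))
for chunk in demo_chunks:
    print(repr(chunk.splitlines()[0]))
```

On this toy input the result is four chunks in document order, each starting with its own heading command, which is the same shape the nested list comprehensions produce after flattening.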

In [6]:
chunks

["\\section{Introduction}\n\\label{sec:intro}\nConversational large language models (LLMs)~\\cite{bommasani2021opportunities}, such as ChatGPT~\\cite{bang2023multitask}, have generated immense interest in a range of domains for tasks ranging from answering questions on medical licensing exams~\\cite{gilson2022well} to generating code snippets. This paper focuses on enhancing the application of LLMs in several domains,\nsuch as helping developers code effectively and efficiently with unfamiliar APIs or allowing students to acquire new coding skills and techniques.\n\nLLMs are particularly promising in domains where humans and AI tools work together as trustworthy collaborators to more rapidly and reliably evolve software-reliant systems~\\cite{carleton2022architecting}. For example, LLMs are being integrated directly into software tools, such as Github's Co-Pilot \\cite{github,asare2022github,pearce2022asleep} and included in integrated development environments (IDEs), such as IntelliJ~

In [7]:
min([len(s) for s in chunks])

34

In [8]:
[s for s in chunks if len(s) < 100]

['\\subsection{The Meta Language Creation Pattern}\n\\label{firstpattern}\n\n',
 '\\subsection{The Output Automater Pattern}\n\n',
 '\\subsection{The Flipped Interaction Pattern}\n\n',
 '\\subsection{The Persona Pattern}\n\n',
 '\\subsection{The Question Refinement Pattern}\n\n',
 '\\subsection{The Alternative Approaches Pattern}\n\n',
 '\\subsection{The Cognitive Verifier Pattern}\n\n',
 '\\subsection{The Fact Check List Pattern}\n\n',
 '\\subsection{The Template Pattern}\n\n',
 '\\subsection{The Infinite Generation Pattern}\n\n',
 '\\subsection{The Visualization Generator Pattern}\n\n',
 '\\subsection{The Game Play Pattern}\n\n',
 '\\subsection{The Reflection Pattern}\n\n',
 '\\subsection{The Refusal Breaker Pattern}\n\n',
 '\\subsection{The Context Manager Pattern}\n\\label{lastpattern}\n\n',
 '\\subsection{The Recipe Pattern}\n\\label{architectpattern}\n\n']

In [9]:
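# Drop the heading-only chunks listed above.
# Note: the strict ">" also drops any chunk of exactly 100 characters.
chunks = [s for s in chunks if len(s) > 100]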

In [10]:
import re
import pandas as pd

In [11]:
# Capture the heading level (section, subsection, subsubsection) and the title
# directly with regex groups, rather than re-splitting the match on "{".
heads = [re.search(r"\\((?:sub)*section)\{(.*)\}", s).groups() for s in chunks]
heads

[('section', 'Introduction'),
 ('section', 'Comparing Software Patterns \\\\ with Prompt Patterns'),
 ('subsection', 'Overview of Software Patterns'),
 ('subsection', 'Overview of Prompt Patterns'),
 ('subsection',
  "Evaluating Means for Defining a Prompt Pattern's Structure and Ideas"),
 ('subsection', 'A Way Forward: Fundamental Contextual Statements'),
 ('section', 'A Catalog of Prompt Patterns \\\\ for Conversational LLMs'),
 ('subsection', 'Summary of the Prompt Pattern Catalog'),
 ('subsubsection', 'Intent and Context'),
 ('subsubsection', 'Motivation'),
 ('subsubsection', 'Structure and Key Ideas'),
 ('subsubsection', 'Example Implementation'),
 ('subsubsection', 'Consequences'),
 ('subsubsection', 'Intent and Context'),
 ('subsubsection', 'Motivation'),
 ('subsubsection', 'Structure and Key Ideas'),
 ('subsubsection', 'Example Implementation'),
 ('subsubsection', 'Consequences'),
 ('subsubsection', 'Intent and Context'),
 ('subsubsection', 'Motivation'),
 ('subsubsection', 'St

In [12]:
df_chunks = pd.DataFrame(heads, columns=["type", "title"])
df_chunks["chunk"] = chunks
df_chunks

,type,title,chunk
0,section,Introduction,\section{Introduction}\n\label{sec:intro}\nCon...
1,section,Comparing Software Patterns \\ with Prompt Pat...,\section{Comparing Software Patterns \\ with P...
2,subsection,Overview of Software Patterns,\subsection{Overview of Software Patterns}\nA ...
3,subsection,Overview of Prompt Patterns,\subsection{Overview of Prompt Patterns}\n\lab...
4,subsection,Evaluating Means for Defining a Prompt Pattern...,\subsection{Evaluating Means for Defining a Pr...
...,...,...,...
85,subsubsection,Structure and Key Ideas,\subsubsection{Structure and Key Ideas}\n\nFun...
86,subsubsection,Example Implementation,\subsubsection{Example Implementation}\n\nAn e...
87,subsubsection,Consequences,\subsubsection{Consequences}\n\nOne consequenc...
88,section,Related Work,\section{Related Work}\n\label{related}\n\nSof...


In [13]:
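# Note: openai.embeddings_utils ships only with the legacy openai package (versions below 1.0).
import openai
from openai.embeddings_utils import get_embeddings
import streamlit as st

# The API key is read from Streamlit's secrets store.
openai.api_key = st.secrets["OPENAI_API_KEY"]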

In [14]:
embeddings = get_embeddings(
  df_chunks.chunk.to_list(),
  engine="text-embedding-ada-002"
)

In [15]:
df_chunks["embedding"] = embeddings
df_chunks

,type,title,chunk,embedding
0,section,Introduction,\section{Introduction}\n\label{sec:intro}\nCon...,"[-0.022545885294675827, 0.01123933307826519, 0..."
1,section,Comparing Software Patterns \\ with Prompt Pat...,\section{Comparing Software Patterns \\ with P...,"[-0.015552609227597713, 0.012375413440167904, ..."
2,subsection,Overview of Software Patterns,\subsection{Overview of Software Patterns}\nA ...,"[-0.009582468308508396, 0.014695747755467892, ..."
3,subsection,Overview of Prompt Patterns,\subsection{Overview of Prompt Patterns}\n\lab...,"[-0.013566446490585804, 0.015537639148533344, ..."
4,subsection,Evaluating Means for Defining a Prompt Pattern...,\subsection{Evaluating Means for Defining a Pr...,"[-0.019268428906798363, 0.013945139944553375, ..."
...,...,...,...,...
85,subsubsection,Structure and Key Ideas,\subsubsection{Structure and Key Ideas}\n\nFun...,"[-0.005268328823149204, 0.02675585262477398, -..."
86,subsubsection,Example Implementation,\subsubsection{Example Implementation}\n\nAn e...,"[0.0006337417871691287, 0.0069627100601792336,..."
87,subsubsection,Consequences,\subsubsection{Consequences}\n\nOne consequenc...,"[0.00674416683614254, 0.009749607183039188, -0..."
88,section,Related Work,\section{Related Work}\n\label{related}\n\nSof...,"[-0.013976316899061203, 0.006437990348786116, ..."


In [16]:
df_chunks.to_csv("./data/2302.11382v1_embeddings.csv")

In [17]:
pd.read_csv("./data/2302.11382v1_embeddings.csv", index_col=0)

,type,title,chunk,embedding
0,section,Introduction,\section{Introduction}\n\label{sec:intro}\nCon...,"[-0.022545885294675827, 0.01123933307826519, 0..."
1,section,Comparing Software Patterns \\ with Prompt Pat...,\section{Comparing Software Patterns \\ with P...,"[-0.015552609227597713, 0.012375413440167904, ..."
2,subsection,Overview of Software Patterns,\subsection{Overview of Software Patterns}\nA ...,"[-0.009582468308508396, 0.014695747755467892, ..."
3,subsection,Overview of Prompt Patterns,\subsection{Overview of Prompt Patterns}\n\lab...,"[-0.013566446490585804, 0.015537639148533344, ..."
4,subsection,Evaluating Means for Defining a Prompt Pattern...,\subsection{Evaluating Means for Defining a Pr...,"[-0.019268428906798363, 0.013945139944553375, ..."
...,...,...,...,...
85,subsubsection,Structure and Key Ideas,\subsubsection{Structure and Key Ideas}\n\nFun...,"[-0.005268328823149204, 0.02675585262477398, -..."
86,subsubsection,Example Implementation,\subsubsection{Example Implementation}\n\nAn e...,"[0.0006337417871691287, 0.0069627100601792336,..."
87,subsubsection,Consequences,\subsubsection{Consequences}\n\nOne consequenc...,"[0.00674416683614254, 0.009749607183039188, -0..."
88,section,Related Work,\section{Related Work}\n\label{related}\n\nSof...,"[-0.013976316899061203, 0.006437990348786116, ..."
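
The embedding column comes back from the CSV as text, which is why the retriever below parses it with `literal_eval`. The following sketch reproduces that round trip with the standard-library `csv` module instead of pandas, purely so that it runs without extra dependencies; the row contents are made up for illustration.

```python
import csv
import io
from ast import literal_eval

# Made-up row: a title plus a tiny stand-in for an embedding vector.
rows = [("Introduction", [-0.0225, 0.0112, 0.0031])]

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["title", "embedding"])
for title, vec in rows:
    # csv.writer stringifies non-string values, so the list is stored as text.
    writer.writerow([title, vec])

# Read the data back: every field is now a plain string.
buffer.seek(0)
loaded = list(csv.DictReader(buffer))
raw = loaded[0]["embedding"]

# literal_eval safely parses the text back into a list of floats.
restored = literal_eval(raw)
print(type(raw).__name__, type(restored).__name__)
```

Storing vectors as text in a CSV is simple and human-readable, at the cost of a parsing step on every load.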


In [18]:
from openai.embeddings_utils import get_embedding, cosine_similarity
import numpy as np
from ast import literal_eval

def retriever(query: str, top_n: int = 3) -> str:
  """Return the top_n chunks most similar to the query, serialized as JSON."""
  df = pd.read_csv("./data/2302.11382v1_embeddings.csv", index_col=0)
  # The CSV stores each embedding as a string; parse it back into a NumPy array.
  df["embedding"] = df.embedding.apply(literal_eval).apply(np.array)
  query_embedding = get_embedding(query, engine="text-embedding-ada-002")
  df["similarity"] = df.embedding.apply(lambda x: cosine_similarity(x, query_embedding))
  top = df.sort_values("similarity", ascending=False).head(top_n)
  return top[["title", "chunk", "similarity"]].to_json()
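
The ranking step inside `retriever` can be tried offline. Below is a minimal sketch of cosine-similarity ranking in pure Python, with no API calls: the helpers `cosine` and `top_n_matches` and the 3-dimensional toy vectors are assumptions for illustration (real `text-embedding-ada-002` vectors have 1536 dimensions), and the notebook itself relies on `cosine_similarity` from `openai.embeddings_utils`.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_n_matches(query_vec, rows, n=3):
    """Return the n (title, score) pairs with the highest similarity to query_vec."""
    scored = [(title, cosine(vec, query_vec)) for title, vec in rows]
    # Highest similarity first, mirroring sort_values(..., ascending=False).head(n).
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]


# Made-up toy "embeddings" keyed by chunk title.
toy_rows = [
    ("Consequences", [0.9, 0.1, 0.0]),
    ("Motivation", [0.0, 1.0, 0.0]),
    ("Example Implementation", [0.7, 0.3, 0.1]),
    ("Related Work", [0.0, 0.0, 1.0]),
]
toy_query = [1.0, 0.0, 0.0]

ranked = top_n_matches(toy_query, toy_rows, n=2)
for title, score in ranked:
    print(f"{title}: {score:.3f}")
```

Because cosine similarity depends only on the angle between vectors, the ranking is unaffected by vector length, so chunks of very different sizes can still be compared on equal terms.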

In [19]:
ret = retriever("Fact Check Pattern")
ret

'{"title":{"47":"Consequences","46":"Example Implementation","43":"Intent and Context"},"chunk":{"47":"\\\\subsubsection{Consequences}\\nThe \\\\textit{Fact Check List} pattern should be employed whenever users are not experts in the domain for which they are generating output.\\nFor example, a software developer reviewing code could benefit from the pattern suggesting security considerations. In contrast, an expert on software architecture is likely to identify errors in statements about the software structure and need not see a fact check list for these outputs.\\n\\nErrors are potential in all LLM outputs, so \\\\textit{Fact Check List} is an effective pattern to combine with other patterns, such as by combining it with the \\\\textit{Question Refinement} pattern. A key aspect of this pattern is that users can inherently check it against the output. In particular, users can directly compare the fact check list to the output to verify the facts listed in the fact check list actually 