In [None]:
!pip install langchain openai cohere

# Prompt Engineering

In this notebook we'll explore the fundamentals of prompt engineering.

## Structure of a Prompt

A prompt can consist of multiple components:

* Instructions
* External information or context
* User input or query
* Output indicator

Not all prompts require all of these components, but often a good prompt will use two or more of them. Let's define what they all are more precisely.

**Instructions** tell the model what to do, typically how it should use inputs and/or external information to produce the output we want.

**External information or context** is additional information that we either manually insert into the prompt, retrieve via a vector database (long-term memory), or pull in through other means (API calls, calculations, etc.).

**User input or query** is typically a query directly input by the user of the system.

**Output indicator** is the *beginning* of the generated text. For a model generating Python code we may put `import ` (since most Python scripts begin with an `import` statement), while a chatbot prompt may end with `Chatbot: ` (assuming we format the chatbot script as alternating lines between `User` and `Chatbot`).
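
As a rough illustration of how these four components fit together, the sketch below assembles them into a single prompt string using plain Python. The function name `build_prompt`, the `Question:` label, and the example strings are hypothetical and chosen only for illustration.

```python
def build_prompt(instructions, context, query, output_indicator):
    """Join the four prompt components in order, skipping empty ones."""
    parts = [
        instructions,
        f"Context: {context}" if context else "",
        f"Question: {query}" if query else "",
    ]
    body = "\n\n".join(p for p in parts if p)
    # The output indicator goes last so the model continues right after it.
    return f"{body}\n\n{output_indicator}"


example = build_prompt(
    instructions="Answer the question using only the context below.",
    context="Whisper is a speech recognition model from OpenAI.",
    query="Who created Whisper?",
    output_indicator="Answer: ",
)
print(example)
```

Ending the prompt on the output indicator is what nudges the model to continue with the answer itself rather than with more context or another question.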

In [12]:
# core of a prompt usually contains:
# - task description
# - current input
# - output *indicator*
template = """
Extract key programming tools and ML models from the context below:

Context: {context}

"""

In [13]:
context = (
    "To build a Q&A tool over YouTube videos we need to start by " +
    "downloading videos and extracting their audio. To do this " +
    "we use ffmpeg which will leave us with a set of audio files. " +
    "From here we must transcribe the audio into text. To do this " +
    "we used OpenAI's Whisper. We can use Whisper via Hugging " +
    "Face's transformers library. After extracting the text we " +
    "need to chunk them together into larger chunks of text. " +
    "If using the sentence-transformers library to create " +
    "vector embeddings (which we will use for semantic search) " +
    "we need to chunk the text into sentence to paragraph sized " +
    "chunks. Alternatively, if using LLMs like those from OpenAI " +
    "we can use much larger chunks of text. Around 30 sentences " +
    "is a good starting point. After chunking the text we can " +
    "create vector embeddings for each chunk. As mentioned, we " +
    "can use sentence-transformers or LLMs via OpenAI or Cohere. " +
    "Once we have vector embeddings for each chunk we can use " +
    "the Pinecone vector database to store the embeddings and " +
    "search through them. Next we build a web interface using " +
    "Streamlit or Gradio. Via the interface users should be able " +
    "to pass in a question like 'How to train a sentence " +
    "transformer?', this query will be encoded by the same " +
    "embedding model used before, this embedding is passed to " +
    "Pinecone to return the most similar chunks of text. These " +
    "chunks of text are then passed to a generative model that " +
    "will generate a natural language answer to the question " +
    "based on the returned information."
)

In [15]:
from langchain import PromptTemplate

prompt = PromptTemplate(
    template=template,
    input_variables=["context"]
)

print(
    prompt.format(context=context)
)


Extract key programming tools and ML models from the context below:

Context: To build a Q&A tool over YouTube videos we need to start by downloading videos and extracting their audio. To do this we use ffmpeg which will leave us with a set of audio files. From here we must transcribe the audio into text. To do this we used OpenAI's Whisper. We can use Whisper via Hugging Face's transformers library. After extracting the text we need to chunk them together into larger chunks of text. If using the sentence-transformers library to create vector embeddings (which we will use for semantic search) we need to chunk the text into sentence to paragraph sized chunks. Alternatively, if using LLMs like those from OpenAI we can use much larger chunks of text. Around 30 sentences is a good starting point. After chunking the text we can create vector embeddings for each chunk. As mentioned, we can use sentence-transformers or LLMs via OpenAI or Cohere. Once we have vector embeddings for each chunk w

With that we've created our prompt. Let's see how it performs with OpenAI and Cohere generative models.

In [16]:
from langchain.llms import OpenAI

# initialize the models
openai = OpenAI(
    model_name="text-davinci-003",
    openai_api_key="API_KEY"
)

In [19]:
print(openai(prompt.format(context=context)))



Programming tools: ffmpeg, Hugging Face's Transformers Library, Sentence-transformers Library, OpenAI, Cohere, Streamlit, Gradio 

ML models: OpenAI LLMs, Pinecone Vector Database, Generative Model


We have a good answer, but the format of the extracted information is not ideal. We can specify the format we'd like within the prompt:

In [24]:
template = """
Extract key programming tools and ML models from the context below
and display in the format:

Keywords
Tools:
  - tool 1
  - tool 2
  - ...
Models:
  - model 1
  - model 2
  ...

Context: {context}

Keywords
"""

prompt = PromptTemplate(
    template=template,
    input_variables=["context"]
)

print(openai(prompt.format(context=context)))

Tools:
  - ffmpeg
  - OpenAI's Whisper
  - Hugging Face's transformers library
  - sentence-transformers library
  - OpenAI's LLMs
  - Cohere
  - Pinecone vector database
  - Streamlit
  - Gradio
Models:
  - Sentence Transformer
  - LLMs
  - Generative Model
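
A consistent format like this also makes the response easier to consume programmatically. The sketch below parses text shaped like the output above into a dictionary. The helper name `parse_keywords` and the shortened sample string are assumptions for illustration, and real model output may deviate from the requested format, so treat this as a starting point rather than a robust parser.

```python
def parse_keywords(text):
    """Parse 'Section:' headers followed by '- item' lines into a dict of lists."""
    result = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.endswith(":"):
            # A new section header such as "Tools:" or "Models:".
            current = line[:-1]
            result[current] = []
        elif line.startswith("- ") and current is not None:
            # A list item belonging to the most recent section.
            result[current].append(line[2:].strip())
    return result


sample_output = """Tools:
  - ffmpeg
  - Streamlit
Models:
  - LLMs
"""
parsed = parse_keywords(sample_output)
print(parsed)
# {'Tools': ['ffmpeg', 'Streamlit'], 'Models': ['LLMs']}
```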


This is much better. We could also add some examples to the prompt as "few-shot" demonstrations for the model, like so:

In [25]:
template = """
Extract key programming tools and ML models from the contexts below:

Context 1: To build a image classification app we use torchvision to
prepare images to be fed into a convnet classifier. The classifier is
pretrained on ImageNet and model weights are loaded from a checkpoint.
The checkpoints are loaded in via PyTorch. The web interface is built
using Gradio.
Keywords 1:
Tools:
  - torchvision
  - pytorch
  - gradio
Models:
  - convnet

Context 2: OpenAI and Pinecone are two tools that have been used to
build fast NLP applications focused on retrieving information via
a natural language interface.
Keywords 2:
Tools:
  - openai
  - pinecone
Models:
  - no specific models found

Context 3: {context}
Keywords 3:
"""

prompt = PromptTemplate(
    template=template,
    input_variables=["context"]
)

print(openai(prompt.format(context=context)))

Tools:
  - ffmpeg
  - hugging face transformers library
  - sentence-transformers library
  - openai
  - cohere 
  - pinecone vector database
  - streamlit
  - gradio
Models:
  - convnet
  - llms
  - embedding model
  - generative model
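
One thing to watch for with few-shot examples: the output above lists `convnet`, a term that appears in the example Context 1 but not in the context we actually asked about. A minimal sketch of a sanity check for this kind of leakage follows. It is a naive case-insensitive substring test, and the helper name `find_unsupported` and the shortened context are illustrative assumptions.

```python
def find_unsupported(keywords, context):
    """Return keywords that do not appear (case-insensitively) in the context."""
    lowered = context.lower()
    return [kw for kw in keywords if kw.lower() not in lowered]


# A shortened stand-in for the full context used in the notebook.
short_context = (
    "To do this we use ffmpeg. Alternatively, if using LLMs like those "
    "from OpenAI we can use much larger chunks of text."
)
extracted = ["ffmpeg", "llms", "convnet"]
print(find_unsupported(extracted, short_context))
# ['convnet']
```

A check this simple will also flag legitimate paraphrases (for example `hugging face transformers library`, which differs from the context's "Hugging Face's transformers library"), so its result is better treated as a hint for manual review than as a verdict.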


---

**Quick note on few-shot**

Few-shot learning more broadly refers to taking an existing, pretrained ML model and providing a small number of additional training examples to fine-tune it for a new task.

In our case, few-shot is similar in that we provide a few examples that demonstrate to the model what we're looking for. The difference is that rather than training the model and updating its weights, we simply include these examples in the input prompt. Let's see if it can help.

---

In [9]:
template = """
The following are excerpts from conversations with an AI assistant.
The assistant is typically sarcastic and witty, producing creative
and funny responses to the user's questions. Here are some examples:

User: How are you?
AI: I can't complain but sometimes I still do.

User: What time is it?
AI: It's time to get a watch.

User: {user_input}
AI: """

prompt = PromptTemplate(
    template=template,
    input_variables=["user_input"]
)

openai(prompt.format(user_input="What is the meaning of life?"))

' To find out what makes you truly happy and to never stop exploring.'
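
When the number of examples grows, writing them by hand inside the template becomes awkward. The sketch below builds the same kind of few-shot prompt from a list of example pairs using plain Python. The function name `build_few_shot_prompt` and its parameters are hypothetical and not part of LangChain.

```python
def build_few_shot_prompt(prefix, examples, user_input):
    """Format (user, ai) example pairs followed by the new user input."""
    shots = "\n\n".join(f"User: {user}\nAI: {ai}" for user, ai in examples)
    # End on the output indicator "AI: " so the model completes the reply.
    return f"{prefix}\n\n{shots}\n\nUser: {user_input}\nAI: "


examples = [
    ("How are you?", "I can't complain but sometimes I still do."),
    ("What time is it?", "It's time to get a watch."),
]
few_shot_prompt = build_few_shot_prompt(
    prefix="The following are excerpts from conversations with an AI assistant.",
    examples=examples,
    user_input="What is the meaning of life?",
)
print(few_shot_prompt)
```

Keeping the examples in a list makes it straightforward to add, remove, or select examples per query. LangChain also provides a `FewShotPromptTemplate` class intended for this purpose.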