## Principles of Prompt Engineering
### I. Write clear and specific instructions
- **Tactic 1**: Use delimeters to clearly indicate distinct parts of the input. Those also help avoid prompt injections (a user's direct input to the prompt). 
    - Triple quotes: """
    - Triple brackticks: ```
    - Triple dashes: ---
    - Angle brackets: < >,
    - XML tags: ```<tag> </tag>```
- **Tactic 2**: Ask for structured ouput (e.g., json or HTML)
- **Tactic 3**: Ask the model to check whether conditions are satisfied. Check assumptins required to do the task; if not tell the model how to handle this. Also, how should the model handle edge cases?
- **Tactic 4**: Few shot prompting: Give examples of successful completing of the tasks. Then ask the model to perform the task.
### II. Give the model time to "think"
The model can make reasoning errors by rushing to an incorrect conculsion. You can encourage the model to spend more time (more computational resources) on the task. Sometimes we get better results when we explicitly instruct the model to reason from first principles before coming to a conclusion.

## Model Limitations
- Hallucinations: The model makes up statements that sound plausible but are not true.
- To reduce hallucinations ask the model to first find relevant information, then to answer conditional on this information. 

In [1]:
import openai
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())
openai.api_key = os.getenv("OPENAI_API_KEY")

# or you can specify api key directly without using dotenv by uncommenting the line below, 
# openai.api_key = "sk-<get the rest from openai>"

In [2]:
def get_completion(prompt, model="gpt-4", temp=0):
    """
    Generate a text completion using the OpenAI API.

    Parameters:
        prompt (str): The user's prompt or input text.
        model (str, optional): The model to use (default is "gpt-4").
        temp (float, optional): The temperature parameter controlling randomness (default is 0).

    Returns:
        str: The generated text completion.
    """
    messages = [
        {"role": "user", 
         "content": prompt}
    ]
    response = openai.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temp  
    )
    return response.choices[0].message.content

In [3]:
# Example input text
input_text = f"""Introducing "GreenEra: Your Eco-Friendly Companion" - Revolutionizing Sustainability! \
In a world urgently needing eco-conscious solutions, GreenEra emerges as a \
pioneering startup reshaping our lifestyles. With a relentless commitment to \
sustainability, GreenEra simplifies conscious living. Our curated marketplace \
offers a wide range of eco-friendly products, while personalized recommendations \
align with your values. Access a wealth of knowledge in our educational hub and \
track your carbon footprint. Join our community for events and connect with \
like-minded individuals. With GreenEra, quality and impact converge, ensuring a \
brighter, greener future. Join us at [www.green-era.com](https://www.green-era.com) \
and embrace a sustainable lifestyle today."""


### Tactic 1: Use delimiters

In [4]:
# What if you forget to provide the text? 
prompt0 = f"""Summarize the text delimited by triple backticks into a single sentence ``` ```"""
response = get_completion(prompt0)
print(response)

You didn't provide any text to summarize. Please provide the text you want summarized.


In [5]:
prompt1 = f"""Summarize the text delimited by triple backticks into a single sentence ```{input_text}```"""
response = get_completion(prompt1)
print(response)

GreenEra is a pioneering startup offering a curated marketplace of eco-friendly products, personalized recommendations, an educational hub, and a community for like-minded individuals, all aimed at promoting sustainable living and reducing carbon footprint.


## Tactic 2: Ask for structured output
For example, ask for a json output (which can be nicely read into a dictionary in Python downstream)

In [6]:
prompt2 = f"""Generate a list of three made-up news headlines along with the author \
and the topic. Provide them in a JSON format with the following keys: \
headline, author, topic."""
response = get_completion(prompt2)
print(response)


[
  {
    "headline": "Aliens Found Living Among Us: The Hidden Truth Unveiled",
    "author": "Johnathan Doe",
    "topic": "Extraterrestrial Life"
  },
  {
    "headline": "Time Travel Now Possible: Scientist Discovers Breakthrough Method",
    "author": "Jane Smith",
    "topic": "Science and Technology"
  },
  {
    "headline": "Atlantis Rediscovered: The Lost City Emerges from the Depths",
    "author": "Robert Brown",
    "topic": "Archaeology"
  }
]


### Tactic 3: Ask the model to check whether conditions are satisfied before completing the task

In [7]:
input_text = f"""he simplest way to make a cup of tea is to start with the basic ingredients: \
a tea bag or loose tea leaves, and boiling water. Begin by bringing the water to a rolling boil \
in a kettle or on the stovetop. Place a tea bag or a teaspoon of loose tea leaves into a cup. \
Then, pour the boiling water over the tea bag or leaves and allow it to steep for 2-5 minutes, \
depending on how strong you want your tea. Remove the tea bag or strain the tea if you used \
loose leaves. Finally, customize your cup of tea by adding sugar, milk, honey, or any other \
flavorings to suit your taste. Give it a good stir, and you're ready to enjoy your freshly brewed 
cup of tea. Just remember to exercise caution when handling boiling water to prevent burns."""

# create a function to accommodate different input_text sequences
def prompt3(input_text):
    prompt = f""" You will be provided with a text delimited by triple backticks. \
    If it contains a sequence of instructions, re-write them in the following format: \
    Step 1. 
    Step 2.
    ...
    Step N.
    If the text does not contain a sequence of instrutions, then simply write \"No steps provided\".
    ```{input_text}```"""
    return prompt

response = get_completion(prompt3(input_text))
print(response)

Step 1. Start with the basic ingredients: a tea bag or loose tea leaves, and boiling water.
Step 2. Bring the water to a rolling boil in a kettle or on the stovetop.
Step 3. Place a tea bag or a teaspoon of loose tea leaves into a cup.
Step 4. Pour the boiling water over the tea bag or leaves.
Step 5. Allow it to steep for 2-5 minutes, depending on how strong you want your tea.
Step 6. Remove the tea bag or strain the tea if you used loose leaves.
Step 7. Customize your cup of tea by adding sugar, milk, honey, or any other flavorings to suit your taste.
Step 8. Give it a good stir.
Step 9. Enjoy your freshly brewed cup of tea.
Step 10. Remember to exercise caution when handling boiling water to prevent burns.


In [8]:
input_text = f"""Before running the notebooks, ensure you have the following dependencies installed: \
Python 3.6+ \
PyTorch \
Transformers library (Hugging Face) \
XGBoost \
NumPy \
pandas \
scikit-learn \
scipy \
matplotlib \
tqdm \
RDKit \
You can install these packages using Conda and pip: \
Using Conda (recommended for RDKit): \
conda create -n myenv python=3.7  # Create a new Conda environment (optional) \
conda activate myenv             # Activate the Conda environment (if created) \
conda install -c conda-forge rdkit \
pip install torch transformers xgboost numpy pandas scikit-learn scipy matplotlib tqdm \
This will ensure proper installation of RDKit through Conda, which is a common \ 
practice in cheminformatics. Make sure to create and activate a Conda \ environment as needed."""

response = get_completion(prompt3(input_text))
print(response)

Step 1. Ensure you have the following dependencies installed: Python 3.6+ PyTorch Transformers library (Hugging Face) XGBoost NumPy pandas scikit-learn scipy matplotlib tqdm RDKit.
Step 2. Install these packages using Conda and pip.
Step 3. If using Conda (recommended for RDKit), create a new Conda environment by typing "conda create -n myenv python=3.7" in your terminal. This step is optional.
Step 4. Activate the Conda environment (if created) by typing "conda activate myenv" in your terminal.
Step 5. Install RDKit by typing "conda install -c conda-forge rdkit" in your terminal.
Step 6. Install the other packages by typing "pip install torch transformers xgboost numpy pandas scikit-learn scipy matplotlib tqdm" in your terminal.
Step 7. Ensure proper installation of RDKit through Conda, which is a common practice in cheminformatics.
Step 8. Make sure to create and activate a Conda environment as needed.


In [9]:
input_text = f"""This repository demonstrates the use of fine-tuning vs transfer learning \
for a regression task with ChemBERTa, a specialized BERT-like model applied to chemical SMILES data. \
SMILES (Simplified Molecular Input Line Entry System) is a notation for representing chemical \
structures as text. We explore when transfer learning might be more appropriate than fine-tuning \
ChemBERTa given our dataset, which is significantly smaller than the model's pre-training data \
(a few hundred vs 77 millions examples). The regression task is to predict the pIC50 values for \
inhibiting the catalytic activity of Dihydrofolate Reductase (DHFR) in homo sapiens. DHFR is a \
crucial enzyme in the folate metabolic pathway, and inhibiting its catalytic activity can disrupt \
the production of tetrahydrofolate, which is necessary for DNA synthesis. This disruption can slow down \
or prevent cancer cell replication, making DHFR an important target for cancer treatment. pIC50 is a \
measure of a substance's potency, representing the negative logarithm (base 10) of its Inhibitory 
Concentration at 50% (IC50)."""
response = get_completion(prompt3(input_text))
print(response)

No steps provided.


### Tactic 4: Few shot prompting
Give the model a few examples or cues (usually one to a few sentences) that demonstrate of the task it needs to accomplish.

In [10]:
prompt4 = f"""Your task is to answer in a consistant style.
<child>: Teach me about patience.
<grandparent>: The river that carves the deepest valley \
flows from the modest spring; \
the grandest symphony originates from a single note. \
the most intricate tapestry begins with a solitary thread.\
<child>: Teach me about resilience.
"""
response = get_completion(prompt4)
print(response)

<grandparent>: The mightiest oak in the forest was once a little nut that held its ground; the most majestic mountain has withstood the test of time and weather. The strongest steel is forged in the hottest fire.


In [11]:
prompt4 = f"""Your task is to answer in a consistant style.
<user>: Translate the following English phrase into French: I am reading a book.
<model>: je lis un livre
<user>: Translate the following English phrase into Spanish: My dog is cute.
<model>: Mi perro es lindo
"""
response = get_completion(prompt4)
print(response)

<user>: Translate the following English phrase into Italian: I love pizza.
<model>: Amo la pizza.


## Give the model time to "think"

In [12]:
input_text = f"""Problem Statement: I'm building a solar power installation and I need help working out the financials.
- Land costs $100 / square foot
- I can buy solar panels for $250 / square foot
- I negotiated a contract for maintenance that will cost me a flat $100k per year, and an additional $10 / square foot
What is the total cost for the first year of operations as a function of the number of square feet.

Student's Solution: Let x be the size of the installation in square feet.
1. Land cost: 100x
2. Solar panel cost: 250x
3. Maintenance cost: 100,000 + 100x
Total cost: 100x + 250x + 100,000 + 100x = 450x + 100,000
"""
prompt5 = f"""
Determine if the student's solution is correct in the text delimited by <>. <{input_text}>
"""
response = get_completion(prompt5)
print(response)

# model skims to the last line, which is correct. However, the mistake was made earlier in their reasoning.

The student's solution is correct.


In [13]:
prompt5 = f"""
In the text delimited by <> you will be provided with a student's solution and asked to assess its correctness. \
To answer, do the following: 
Step 1. work out the solution for yourself without considering the sutdent's solution.\
Step 2.  Once you have calculated an answer, compare it with the solution provided by the student. \
Step 3.  Finally, decide if the student's solution is correct. \
Step 4.  Return your answer in the following format:
- My worked out solution:
- My Answer:
- Student Answer:
- Student Grade: (correct or incorrect)
where \"My\" references you.
<{input_text}>
"""
response = get_completion(prompt5)
print(response)

My worked out solution: Let x be the size of the installation in square feet.
1. Land cost: 100x
2. Solar panel cost: 250x
3. Maintenance cost: 100,000 + 10x
Total cost: 100x + 250x + 100,000 + 10x = 360x + 100,000

My Answer: 360x + 100,000

Student Answer: 450x + 100,000

Student Grade: Incorrect


## Hallucinations

In [14]:
# Example hallucination; NanoGPS pet tracker by Boi is not a real product
prompt6 = f"""
Tell me about NanoGPS pet tracker by Boi. 
"""
response = get_completion(prompt6)
print(response)

NanoGPS pet tracker by Boi is a small, lightweight device that can be attached to your pet's collar to track their location. This device uses GPS technology to provide real-time tracking, allowing you to monitor your pet's whereabouts at all times. It also features a geofence function, which allows you to set a safe zone for your pet. If your pet leaves this designated area, you will receive an alert on your smartphone. The device is waterproof and has a long battery life, making it suitable for pets that spend a lot of time outdoors. It also has a two-way audio feature, allowing you to communicate with your pet remotely.


In [15]:
prompt6 = f"""
Tell me about NanoGPS pet tracker by Boi. But before you answer, think if this question is legit and why.
"""
response = get_completion(prompt6)
print(response)

The question appears to be legitimate as it is asking for information about a specific product, the NanoGPS pet tracker by Boi. However, it's important to note that as of my current knowledge and available resources, there doesn't seem to be a product called "NanoGPS pet tracker by Boi". It's possible that the product name or manufacturer is incorrect or misspelled. Therefore, I can't provide detailed information about this specific product. 

In general, GPS pet trackers are devices that use GPS technology to track your pet's location. They can be attached to your pet's collar and can provide real-time updates on their location, helping you find them if they get lost. Some GPS pet trackers also include features like activity tracking, safe zone settings, and mobile app compatibility.


In [16]:
prompt6 = f"""
Your task is to tell me about NanoGPS pet tracker by Boi. But you need to follow the following steps:
Step 1: Do you know for sure if there is a product called \"NanoGPS pet tracker by Boi\".
Step 2: If the answer is yes to Step 1, give a summary of your knowledge; \
if the answer is no, respond with \
\"After conducting a thorough search, I found no evidence of a product called "NanoGPS pet tracker by Boi"\", then provide a guess as to what 
this product might be like this: \"If I were to make a guess," then state what your guess is.
"""
response = get_completion(prompt6)
print(response)

After conducting a thorough search, I found no evidence of a product called "NanoGPS pet tracker by Boi". If I were to make a guess, a product like this would likely be a small, lightweight GPS tracker that could be attached to a pet's collar. It would allow the pet's owner to track their location in real time, possibly through a mobile app or website. This could be particularly useful for owners of pets that have a tendency to wander or escape, as it would provide a reliable way to find them quickly.
