# Guidelines for Prompting

> We will practice two prompting principles and their related tactics in order to write effective prompts for large language models.


In [13]:
#| default_exp llms_intro

## Setup

### Load the API key and relevant Python libraries.

In [1]:
#| export
import openai
import os

from dotenv import load_dotenv, find_dotenv

In [2]:
#| export
_ = load_dotenv(find_dotenv())

openai.api_key  = os.getenv('OPENAI_API_KEY')

In [None]:
# os.getenv('OPENAI_API_KEY')

In [15]:
#| export
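def get_completion(prompt, model="gpt-3.5-turbo"):
    """Send a single user message to the chat model and return the reply text."""
    # Note: `openai.ChatCompletion` is the interface of `openai` package versions before 1.0.
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0,  # degree of randomness of the model's output
    )
    return response.choices[0].message["content"]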

## Prompting Principles

- **Principle 1: Write clear, specific instructions with context of the task**
- **Principle 2: Give the model time to “think”**

### Principle 1: Write clear, specific instructions with context of the task

#### Tactic 1: Use delimiters to clearly indicate distinct parts of the input
- Delimiters can be anything like: ```, """, < >, `<tag> </tag>`, `:`
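
A minimal sketch of this tactic, assuming a hypothetical helper `wrap_in_delimiters` that is not part of this notebook. It keeps the instruction and the input text apart, and removes the delimiter from the text so that the input cannot close the delimited block early.

```python
TRIPLE_BACKTICK = "`" * 3


def wrap_in_delimiters(instruction, text, delimiter=TRIPLE_BACKTICK):
    """Build a prompt in which `text` is fenced off by `delimiter`.

    Hypothetical helper for illustration; its name and behaviour are assumptions.
    """
    # Drop any occurrence of the delimiter inside the text, so that the
    # supplied content cannot terminate the delimited block early.
    cleaned = text.replace(delimiter, "").strip()
    return f"{instruction}\n{delimiter}{cleaned}{delimiter}"


demo_prompt = wrap_in_delimiters(
    "Summarize the text delimited by triple backticks into a single sentence.",
    "Clear prompts are not necessarily short prompts.",
)
print(demo_prompt)
```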

In [16]:
text = f"""
You should express what you want a model to do by \
providing instructions that are as clear and \
specific as you can possibly make them. \
This will guide the model towards the desired output, \
and reduce the chances of receiving irrelevant \
or incorrect responses. Don't confuse writing a \
clear prompt with writing a short prompt. \
In many cases, longer prompts provide more clarity \
and context for the model, which can lead to \
more detailed and relevant outputs.
"""
prompt = f"""
Summarize the text delimited by triple backticks into a single sentence.
```{text}```
"""


In [21]:
#| eval: false
response = get_completion(prompt)
print(response)

To guide a model towards the desired output and reduce irrelevant or incorrect responses, it is important to provide clear and specific instructions, which may require longer prompts for more clarity and context.

In [22]:
# The same guidance in Polish, to check that the tactic also works for non-English input.
text_pl = f"""
Pisząc prompty należy wyrazić, co model ma robić, dostarczając instrukcje, które są tak jasne i szczegółowe, jak to tylko możliwe.
Poprowadzi to model w kierunku pożądanego efektu i zmniejsza szanse na otrzymanie nieistotnych lub nieprawidłowych odpowiedzi.
Nie należy mylić pisania jasnej podpowiedzi z krótką podpowiedzią. 
W wielu przypadkach dłuższe prompty zapewniają większą jasność i kontekst dla modelu. 
"""


In [18]:
#| eval: false
# Polish prompt; in English: "Summarize the text delimited by triple backticks into one sentence."
prompt_pl = f"""
Podsumuj tekst ograniczony potrójnymi znakami ``` w jedno zdanie.
```{text_pl}```
"""
response_pl = get_completion(prompt_pl)
print(response_pl)

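Jasne i szczegółowe instrukcje w promptach pomagają modelowi osiągnąć pożądany efekt i uniknąć nieistotnych lub nieprawidłowych odpowiedzi.

(English: "Clear and detailed instructions in prompts help the model achieve the desired result and avoid irrelevant or incorrect answers.")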


In [19]:
#| eval: false
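# Polish prompt; in English: the same one-sentence summary request, "in the style of a teenager who uses emoji".
prompt_pl = f"""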
Podsumuj tekst ograniczony potrójnymi znakami ``` w jedno zdanie w stylu nastolatka korzystającego z emoji.
```{text_pl}```
"""
response_pl = get_completion(prompt_pl)
print(response_pl)

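🤖💬 Jasne i szczegółowe prompty pomagają modelowi osiągnąć pożądany efekt i uniknąć błędów. Dłuższe prompty są lepsze niż krótkie, bo dają więcej kontekstu.

(English: "Clear and detailed prompts help the model achieve the desired result and avoid mistakes. Longer prompts are better than short ones because they give more context.")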


> 5 min: write a sample prompt and check that the OpenAI API works for you.

#### Tactic 2: Ask for a structured output
- JSON, HTML, Markdown, etc.
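
Structured output pays off when the reply is consumed by code. A minimal sketch, assuming the model returns a bare JSON array whose items carry the keys `title`, `lecturer`, and `description`. `parse_workshops` is a hypothetical helper and `sample_response` is a made-up stand-in for a real reply. A reply cut off before the closing bracket does not parse, which the helper reports as a `ValueError`.

```python
import json

REQUIRED_KEYS = {"title", "lecturer", "description"}


def parse_workshops(response_text):
    """Parse a JSON array of workshops and check that each item has the expected keys."""
    try:
        items = json.loads(response_text)
    except json.JSONDecodeError as err:
        # Truncated or non-JSON replies end up here.
        raise ValueError(f"Response is not valid JSON: {err}") from err
    if not isinstance(items, list):
        raise ValueError("Expected a JSON array of workshops")
    for item in items:
        if not isinstance(item, dict):
            raise ValueError("Each workshop must be a JSON object")
        missing = REQUIRED_KEYS - set(item)
        if missing:
            raise ValueError(f"Workshop is missing keys: {sorted(missing)}")
    return items


# Made-up stand-in for a model reply, used only to exercise the parser.
sample_response = """
[
  {"title": "Prompting Basics", "lecturer": "A. Example", "description": "Hands-on prompt writing."}
]
"""
workshops = parse_workshops(sample_response)
print(workshops[0]["title"])
```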

In [20]:
#| eval: false
prompt = f"""
Generate a list of the 5 best workshop topics for AI Tech Summer School, an AI-focused summer school.
Write title, lecturer name, and description for each topic.
Provide them in JSON format with the following keys: 
title, lecturer, description.
"""
response = get_completion(prompt)
print(response)

[
  {
    "title": "Introduction to Machine Learning",
    "lecturer": "Dr. John Smith",
    "description": "This workshop will provide an overview of machine learning, including supervised and unsupervised learning, and the different types of algorithms used in machine learning. Participants will also learn how to implement machine learning algorithms using Python."
  },
  {
    "title": "Natural Language Processing",
    "lecturer": "Dr. Jane Doe",
    "description": "This workshop will cover the basics of natural language processing, including text preprocessing, sentiment analysis, and topic modeling. Participants will also learn how to use popular NLP libraries such as NLTK and spaCy."
  },
  {
    "title": "Computer Vision",
    "lecturer": "Dr. David Lee",
    "description": "This workshop will introduce participants to computer vision, including image processing, object detection, and image classification. Participants will also learn how to use popular computer vision librarie

#### Tactic 3: Ask the model to check whether conditions are satisfied - reflect on the task 

In [23]:
#| eval: false
text_1 = f"""
Preparation of the workshop is quite easy. However, you must follow some steps. 
Firstly, you need to choose the topic. Then review literature on this topic. Then prepare the materials.
And finally you need to prepare the presentation and deliver the workshop.
"""
prompt = f"""
You will be provided with text delimited by triple =. 
If it contains a sequence of steps, re-write those instructions in the following format:

Step 1 - 
Step 2 - 
Step N - 

If the text does not contain a sequence of steps, then simply write \"No steps are provided.\"

==={text_1}===
"""
response = get_completion(prompt)
print(response)

Step 1 - Choose the topic.
Step 2 - Review literature on the chosen topic.
Step 3 - Prepare the materials.
Step 4 - Prepare the presentation.
Step 5 - Deliver the workshop.


In [25]:
#| eval: false
# Polish-language welcome text from Bing; it describes features but contains no sequence of steps.
text_2 = """
Witamy w nowej usłudze Bing
Poznaj możliwości obsługiwanej przez sztuczną inteligencję funkcji Copilot w Internecie

🧐 Zadawaj złożone pytania
"Jakie posiłki mogę przygotować dla mojego wybrednego malucha, który je tylko jedzenie w kolorze pomarańczowym?"

🙌 Uzyskaj lepsze odpowiedzi
"Jakie są zalety i wady 3 najczęściej kupowanych odkurzaczy dla zwierząt domowych?"

🎨 Zdobądź twórcze inspiracje
"Napisz wiersz haiku o krokodylach w kosmosie, w którym narratorem jest pirat"
Uczmy się razem. Usługa Bing jest obsługiwana przez sztuczną inteligencję, więc są możliwe niespodzianki i błędy. Pamiętaj o sprawdzaniu faktów oraz przekaż opinię, abyśmy mogli się uczyć i rozwijać!
"""

prompt = f"""
You will be provided with text delimited by triple =. 
If it contains a sequence of steps, re-write those instructions in the following format:

Step 1 - 
Step 2 - 
Step N - 

If the text does not contain a sequence of steps, then simply write \"No steps are provided.\"

==={text_2}===
"""
response = get_completion(prompt)
print(response)

No steps are provided.
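
Because the prompt fixes both the step format and the fallback sentence, the reply can be handled programmatically. A minimal sketch, assuming the model follows the requested `Step N - ...` format exactly; `extract_steps` is a hypothetical helper, not part of this notebook.

```python
import re

NO_STEPS = "No steps are provided."
STEP_PATTERN = re.compile(r"^Step\s+(\d+)\s*-\s*(.+)$")


def extract_steps(response_text):
    """Return the step descriptions in order, or [] when the model reports no steps."""
    if response_text.strip() == NO_STEPS:
        return []
    steps = []
    for line in response_text.splitlines():
        match = STEP_PATTERN.match(line.strip())
        if match:
            steps.append(match.group(2).strip())
    return steps


reply = "Step 1 - Choose the topic.\nStep 2 - Review literature on the chosen topic."
print(extract_steps(reply))
print(extract_steps(NO_STEPS))
```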


#### Tactic 4: "Few-shot" prompting - use a few examples to show the model how to behave
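
Few-shot prompts are easy to assemble programmatically. A minimal sketch, assuming the examples are Python dictionaries serialized as JSON and separated by `###`, as in the prompt used in this section. `build_few_shot_prompt` and the shortened sample record are illustrative and not part of this notebook.

```python
import json


def build_few_shot_prompt(task, examples, separator="###"):
    """Join a task description and JSON-serialized examples into one prompt."""
    parts = [task]
    for example in examples:
        parts.append(separator)
        parts.append(json.dumps(example, indent=4))
    return "\n".join(parts)


# Shortened, illustrative record; a real prompt would carry the full schema.
examples = [
    {
        "exporter_country_code": "ID",
        "destination_country_code": "PT",
        "invoice_currency": "EUR",
    },
]
few_shot_prompt = build_few_shot_prompt(
    "Generate one more shipment record in the same JSON format.", examples
)
print(few_shot_prompt)
```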

In [None]:
prompt = """
I work at a customs office and we want to generate synthetic data about shipments. Please generate more examples.
###
{
    "id": "95ea9a1c-f934-4f08-bc74-3ff7c8da5464",
    "exporter_country_name": "Indonesia",
    "destination_country_name": "Portugal",
    "exporter_country_code": "ID",
    "destination_country_code": "PT",
    "invoice_value": 36513.02,
    "invoice_currency": "EUR",
    "commodity_code": 7108110000,
    "weight_gross": 22.03,
    "weight_net": 3.08,
    "importer_name": "Dawson, Lewis and Miller",
    "declarant_person": "Lydia Reed",
    "good_description": "II. PRECIOUS METALS AND METALS CLAD WITH PRECIOUS METAL -> Gold (including gold plated with platinum), unwrought or in semi-manufactured forms, or in powder form -> Non-monetary -> Powder",
    "exporter_name": "Hurst, Freeman and Kennedy",
    "origin_country_name": "Viet Nam",
    "origin_country_code": "VN"
}
"""


In [33]:
#| eval: false
response = get_completion(prompt)
print(response)

{
    "id": "b3c8d7e2-6f5a-4c5c-9c5d-8f5a9b1c2d4f",
    "exporter_country_name": "China",
    "destination_country_name": "United States",
    "exporter_country_code": "CN",
    "destination_country_code": "US",
    "invoice_value": 12000.50,
    "invoice_currency": "USD",
    "commodity_code": 8517120000,
    "weight_gross": 45.20,
    "weight_net": 38.10,
    "importer_name": "Smith and Sons",
    "declarant_person": "John Doe",
    "good_description": "VII. VEHICLES, AIRCRAFT, VESSELS AND ASSOCIATED TRANSPORT EQUIPMENT -> Aircraft and associated equipment -> Aircraft engines and parts thereof",
    "exporter_name": "Changzhou Aviation Precision Machinery Co., Ltd.",
    "origin_country_name": "China",
    "origin_country_code": "CN"
}

{
    "id": "f8e9d6c5-4b3a-2c1d-1e0f-9a8b7c6d5e4f",
    "exporter_country_name": "Germany",
    "destination_country_name": "France",
    "exporter_country_code": "DE",
    "destination_country_code": "FR",
    "invoice_value": 5000.00,
    "invoice_c

### Principle 2: Give the model time to “think” 

#### Tactic 1: Specify the steps required to complete a task
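
A numbered list of actions can be generated from a plain Python list, which keeps the steps easy to reorder or extend. A minimal sketch; `build_stepwise_prompt` is a hypothetical helper, and the wording of the surrounding instructions is an assumption modelled on the prompt in this section.

```python
def build_stepwise_prompt(steps, text):
    """Number the steps and append the text delimited by triple backticks."""
    fence = "`" * 3
    numbered = "\n".join(f"{i} - {step}" for i, step in enumerate(steps, start=1))
    return (
        "Perform the following actions:\n"
        f"{numbered}\n\n"
        "Separate your answers with line breaks.\n\n"
        f"Text:\n{fence}{text}{fence}"
    )


stepwise_prompt = build_stepwise_prompt(
    ["Summarize the text in 1 sentence.", "Generate a title for the summary."],
    "Longer prompts often give the model more context.",
)
print(stepwise_prompt)
```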

In [37]:
#| eval: false
text = f"""
Massively Multilingual Corpus of Sentiment Datasets and Multi-faceted Sentiment Classification Benchmark
Łukasz Augustyniak, Szymon Woźniak, Marcin Gruza, Piotr Gramacki, Krzysztof Rajda, Mikołaj Morzy, Tomasz Kajdanowicz
Despite impressive advancements in multilingual corpora collection and model training, developing large-scale deployments of multilingual models still presents a significant challenge. This is particularly true for language tasks that are culture-dependent. One such example is the area of multilingual sentiment analysis, where affective markers can be subtle and deeply ensconced in culture. This work presents the most extensive open massively multilingual corpus of datasets for training sentiment models. The corpus consists of 79 manually selected datasets from over 350 datasets reported in the scientific literature based on strict quality criteria. The corpus covers 27 languages representing 6 language families. Datasets can be queried using several linguistic and functional features. In addition, we present a multi-faceted sentiment classification benchmark summarizing hundreds of experiments conducted on different base models, training objectives, dataset collections, and fine-tuning strategies.
"""
# example 1
prompt_1 = f"""
Perform the following actions for a research paper text: 
1 - Summarize the following text delimited by triple backticks with 1 sentence.
2 - Generate a title for this summary with emojis.
3 - Prepare a tweet based on summary.
4 - Prepare a linkedin post based on summary. 

Separate your answers with line breaks.

Text:
```{text}```
"""
response = get_completion(prompt_1)
print(response)

🌍📊👍 "Massively Multilingual Corpus of Sentiment Datasets" presents a comprehensive collection of 79 datasets in 27 languages for sentiment analysis, along with a multi-faceted sentiment classification benchmark. 

📊🌍 "Massively Multilingual Corpus of Sentiment Datasets" provides a comprehensive collection of datasets in 27 languages for sentiment analysis, along with a multi-faceted sentiment classification benchmark. #sentimentanalysis #multilingual #corpus

🌍📊 Looking for a comprehensive collection of sentiment datasets in multiple languages? Check out "Massively Multilingual Corpus of Sentiment Datasets" which also includes a multi-faceted sentiment classification benchmark. #sentimentanalysis #multilingual #corpus

👍🌍 "Massively Multilingual Corpus of Sentiment Datasets" is a valuable resource for sentiment analysis researchers, providing a comprehensive collection of datasets in 27 languages and a multi-faceted sentiment classification benchmark. #sentimentanalysis #multilingual #

#### Return structured output
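
When the reply is requested as a Python dictionary, `ast.literal_eval` can turn it into a real `dict` without executing arbitrary code. A minimal sketch, assuming the reply is a complete dictionary literal with the keys `summary`, `title`, `tweet`, and `linkedin_post`. `parse_dict_reply` is a hypothetical helper and `sample_reply` is a made-up stand-in for a model reply. A reply that is cut off mid-string raises an error instead of returning partial data.

```python
import ast

EXPECTED_KEYS = {"summary", "title", "tweet", "linkedin_post"}


def parse_dict_reply(reply_text):
    """Parse a dictionary literal and verify that it has exactly the expected keys."""
    try:
        parsed = ast.literal_eval(reply_text.strip())
    except (SyntaxError, ValueError) as err:
        # Truncated or malformed replies end up here.
        raise ValueError(f"Reply is not a valid dictionary literal: {err}") from err
    if not isinstance(parsed, dict) or set(parsed) != EXPECTED_KEYS:
        raise ValueError("Reply does not contain the expected keys")
    return parsed


# Made-up stand-in for a model reply, used only to exercise the parser.
sample_reply = """
{
    "summary": "A one-sentence summary.",
    "title": "A title",
    "tweet": "A tweet",
    "linkedin_post": "A post"
}
"""
result = parse_dict_reply(sample_reply)
print(result["title"])
```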

In [39]:
#| eval: false
text = f"""
Massively Multilingual Corpus of Sentiment Datasets and Multi-faceted Sentiment Classification Benchmark
Łukasz Augustyniak, Szymon Woźniak, Marcin Gruza, Piotr Gramacki, Krzysztof Rajda, Mikołaj Morzy, Tomasz Kajdanowicz
Despite impressive advancements in multilingual corpora collection and model training, developing large-scale deployments of multilingual models still presents a significant challenge. This is particularly true for language tasks that are culture-dependent. One such example is the area of multilingual sentiment analysis, where affective markers can be subtle and deeply ensconced in culture. This work presents the most extensive open massively multilingual corpus of datasets for training sentiment models. The corpus consists of 79 manually selected datasets from over 350 datasets reported in the scientific literature based on strict quality criteria. The corpus covers 27 languages representing 6 language families. Datasets can be queried using several linguistic and functional features. In addition, we present a multi-faceted sentiment classification benchmark summarizing hundreds of experiments conducted on different base models, training objectives, dataset collections, and fine-tuning strategies.
"""
# example 1
prompt_1 = f"""
Perform the following actions for a research paper text: 
1 - Summarize the following text delimited by triple backticks with 1 sentence.
2 - Generate a title for this summary with emojis.
3 - Prepare a tweet based on summary.
4 - Prepare a linkedin post based on summary. 

Return a python dictionary with the following keys: summary, title, tweet, linkedin_post 

Text:
```{text}```
"""
response = get_completion(prompt_1)
print(response)

{
    "summary": "📚 This paper presents a massively multilingual corpus of sentiment datasets consisting of 79 manually selected datasets from over 350 datasets reported in the scientific literature, covering 27 languages representing 6 language families, and a multi-faceted sentiment classification benchmark summarizing hundreds of experiments conducted on different base models, training objectives, dataset collections, and fine-tuning strategies.",
    "title": "📚 Massive Multilingual Sentiment Corpus and Benchmark",
    "tweet": "📚 This paper presents a massively multilingual corpus of sentiment datasets and a multi-faceted sentiment classification benchmark covering 27 languages and summarizing hundreds of experiments. #sentimentanalysis #multilingual #corpus",
    "linkedin_post": "Check out this paper presenting a massively multilingual corpus of sentiment datasets and a multi-faceted sentiment classification benchmark covering 27 languages and summarizing hundreds of experiments

#### Tactic 2: Instruct the model to work out its own solution before rushing to a conclusion

In [57]:
question = f"""
Question:
I'm organizing a conference that will need a space for up to 1000 people. 
- Renting the space costs $500/day
- Catering the conference costs $50/person
- I'll need to hire 3 staff members for the duration of the conference at $75/day/person
What is the total cost for the conference as a function of the number of people attending?

Student's Solution:
Let x be the number of people attending the conference.
Costs:
1. Space rental cost: 500
2. Catering cost: 50
3. Staff cost: 75 
Total cost: 500 + 50x + 75x = 620x + 500
"""

In [58]:
#| eval: false
prompt = f"""
Determine if the student's solution is correct or not.
{question}
"""
response = get_completion(prompt)
print(response)

The student's solution is correct.


In [60]:
#| eval: false
prompt = f"""
Your task is to determine if the student's solution is correct or not.

To solve the problem do the following:
- First, work out your own solution to the problem. 
- Then compare your solution to the student's solution and evaluate if the student's solution is correct or not. 
Don't decide if the student's solution is correct until you have done the problem yourself.

Write down your steps and highlight the differences between your solution and the student's solution.

{question}
"""
response = get_completion(prompt)
print(response)

My Solution:
Let x be the number of people attending the conference.
Costs:
1. Space rental cost: 500
2. Catering cost: 50x
3. Staff cost: 3 * 75 * number of days of conference
Total cost: 500 + 50x + 3 * 75 * number of days of conference

Difference:
The student's solution assumes that the staff cost is a fixed cost of $75 per person per day, regardless of the number of days of the conference. However, the staff cost should be dependent on the number of days of the conference. My solution takes this into account by multiplying the number of days of the conference by the number of staff members and the daily rate of $75.
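
The arithmetic can be checked directly. A minimal sketch, assuming the space and the 3 staff members are paid per day, and introducing a `days` parameter that the question leaves unspecified. `student_cost` reproduces the final formula from the student's solution.

```python
def total_cost(people, days=1):
    """Conference cost: daily space rental, per-person catering, daily staff pay."""
    space = 500 * days
    catering = 50 * people
    staff = 3 * 75 * days
    return space + catering + staff


def student_cost(people):
    """The student's final formula, 620x + 500."""
    return 620 * people + 500


for people in (100, 1000):
    print(people, total_cost(people), student_cost(people))
```

Under these assumptions a one-day conference costs 50x + 725, which differs from the student's 620x + 500 for any positive number of attendees.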


## Model Limitations and Problems

### Hallucinations


In [63]:
#| eval: false
prompt = f"""
Write me about a new product called "The Autonomous Research Assistant" created by Wrocław University of Science and Technology.
"""
response = get_completion(prompt)
print(response)

The Autonomous Research Assistant is a revolutionary new product developed by the Wrocław University of Science and Technology. This cutting-edge technology is designed to assist researchers in their work by automating many of the tedious and time-consuming tasks that are typically associated with research.

The Autonomous Research Assistant is a sophisticated machine learning system that is capable of analyzing vast amounts of data and identifying patterns and trends that would be difficult or impossible for a human researcher to detect. It can also generate reports and summaries of its findings, making it easy for researchers to quickly and easily understand the results of their experiments.

One of the key features of the Autonomous Research Assistant is its ability to learn and adapt over time. As it analyzes more data and gains more experience, it becomes increasingly accurate and efficient, allowing researchers to focus on more complex and challenging tasks.

Another important as

In [66]:
#| eval: false
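# Polish prompt; in English: "Describe the Albanian Wordnet, indicating which long-haired warrior from the Balkan countries created it?"
prompt = f"""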
Opisz Albański Wordnet wskazując na to jaki długowłosy wojownik z krajów bałkańskich go stworzył?
"""
response = get_completion(prompt)
print(response)

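Albański Wordnet to system leksykalny, który został stworzony przez długowłosego wojownika z krajów bałkańskich o imieniu Artan. Artan był z pochodzenia Albańczykiem i miał głęboką wiedzę na temat języka albańskiego oraz jego związków z innymi językami bałkańskimi. W oparciu o swoją wiedzę i doświadczenie, Artan stworzył Albański Wordnet, który jest narzędziem służącym do analizy semantycznej języka albańskiego. System ten zawiera wiele informacji na temat znaczeń słów, ich synonimów, antonimów oraz związków semantycznych między nimi. Dzięki temu Albański Wordnet jest bardzo przydatnym narzędziem dla lingwistów, badaczy języka oraz tłumaczy.

(English: "The Albanian Wordnet is a lexical system created by a long-haired warrior from the Balkan countries named Artan. Artan was Albanian by origin and had deep knowledge of the Albanian language and its relations to other Balkan languages. Based on his knowledge and experience, Artan created the Albanian Wordnet, a tool for semantic analysis of the Albanian language. The system contains a wealth of information on word meanings, their synonyms, antonyms, and the semantic relations between them. This makes the Albanian Wordnet a very useful tool for linguists, language researchers, and translators.")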
