<a href="https://colab.research.google.com/github/AlexUmnov/genai_course/blob/main/week1_genai_api/homework.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Task 1. Book summarisation

3 points

OpenAI API is a great tool for summarization, but it is able to process only relatively short texts, as prescribed by the MAX TOKENS restrictions of the models. So, what if we want to summarize a whole book? Let's try to make a workaround for this.

To test our solutions we will be using CMU dataset for book summarization. Let's start with downloading a sample of book dataset from huggingface

```
@article{kryscinski2021booksum,
      title={BookSum: A Collection of Datasets for Long-form Narrative Summarization},
      author={Wojciech Kry{\'s}ci{\'n}ski and Nazneen Rajani and Divyansh Agarwal and Caiming Xiong and Dragomir Radev},
      year={2021},
      eprint={2105.08209},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

In [9]:
!curl -X GET \
    "https://datasets-server.huggingface.co/rows?dataset=kmfoda%2Fbooksum&config=default&split=train&offset=0&limit=100" > book_sum_dataset.txt

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 3771k  100 3771k    0     0  1799k      0  0:00:02  0:00:02 --:--:-- 1800k


In [11]:
import json

book_dataset = json.load(open("book_sum_dataset.txt"))

Let's look at one row of our database. Beware: a large text will get printed.

In [12]:
book_dataset['rows'][0]

{'row_idx': 0,
 'row': {'bid': 27681,
  'is_aggregate': True,
  'source': 'cliffnotes',
  'chapter_path': 'all_chapterized_books/27681-chapters/chapters_1_to_2.txt',
  'summary_path': 'finished_summaries/cliffnotes/The Last of the Mohicans/section_1_part_0.txt',
  'book_id': 'The Last of the Mohicans.chapters 1-2',
  'summary_id': 'chapters 1-2',
  'content': None,
  'summary': '{"name": "Chapters 1-2", "url": "https://web.archive.org/web/20201101053205/https://www.cliffsnotes.com/literature/l/the-last-of-the-mohicans/summary-and-analysis/chapters-12", "summary": "Before any characters appear, the time and geography are made clear. Though it is the last war that England and France waged for a country that neither would retain, the wilderness between the forces still has to be overcome first. Thus it is in 1757, in the New York area between the head waters of the Hudson River and Lake George to the north. Because only two years earlier General Braddock was disgracefully routed by a hand

In [13]:
len(book_dataset['rows'][0]['row']['chapter'])

40844

As you can see the chapters are pretty long. Let's see how many tokens we have in those chapters. Based on instruction from OpenAI we need a package `tiktoken`

We'll be following this instruction:

https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb

In [5]:
!pip install tiktoken

Collecting tiktoken
  Downloading tiktoken-0.5.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.0/2.0 MB[0m [31m8.9 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: tiktoken
[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
llmx 0.0.15a0 requires cohere, which is not installed.
llmx 0.0.15a0 requires openai, which is not installed.[0m[31m
[0mSuccessfully installed tiktoken-0.5.2


In [14]:
import tiktoken

encoder = tiktoken.encoding_for_model("gpt-3.5-turbo")



In [15]:
encoder.encode("Hello World!")

[9906, 4435, 0]

Please write a small function to count tokens:

In [16]:
def count_chatgpt_tokens(text: str, tokenizer: tiktoken.Encoding) -> int:
    pass

In [17]:
def count_chatgpt_tokens(text: str, tokenizer: tiktoken.Encoding) -> int:
    return len(encoder.encode(text))

Let's check how many tokens are there in a very simple string:

In [18]:
count_chatgpt_tokens("Hello world!", tokenizer=encoder)

3

Now that we have this function, let's fing the maximum token length of a chapter?

In [19]:
max(
    count_chatgpt_tokens(row['row']['chapter'], encoder)
    for row in
    book_dataset['rows']
)

13237

As we can see at the api reference page, `gpt-3.5-turbo ` has only 4096 tokens context length. Our chapters are longer then that. We could still use `pt-3.5-turbo-16k`, but it's easy to imagine texts that are longer than this, so let's learn how to do it with a smaller-context LLM.

An obvious way to cope with the problem is to:
1. Split the text into chunks of sentences that can fit into the context window.
2. Summarize each of the chunks.
3. Concatenate all the summaries. If the total length is still too big, repeat the steps 1 and 2 until it's ok.
4. Summarize the concatenations of the summaries.

## Your task

Write a function

```summarize_long_text_with_chatgpt(chapter: str) -> str```

implementing the above method of summarization.

Don't forget to log the lengths each iteration to see how much texts shrink.

For a given example please analyse your intermediate and final results. Is it indeed a good summary of the text?

**Hints and suggestions**:
- Keep in mind, that MAX TOKENS restrictions takes into account both request and model answer. So, you also need to leave some tokens for a response. So we'd suggest using at least 2:1 token ratio for chapter and summary.
- You can control the length of the summary with prompts.
- If you just use `split(".")`, you won't get a proper splitting into sentences. Luckily we have convenient Python libraries for text processing. We recommend using `sent_tokenize` or `split_into_sentences` from the `nltk` library. You can also try splitting the text into chunks of paragraphs instead.
- It's difficult to measure the quality of summarization, but please analyze at lease two examples. Are the summaries coherent?
- If you need inspiration in prompt building, take a look at this [paper](https://arxiv.org/pdf/2312.16171v1.pdf)

**Bonus parts:**

- Summarized text often starts with something like "This text is about", and after merging the partial summaries you'll probably have things like that all over the text. You may wish to get rid of such introductory phrases either by tuning a prompt or by post editing.

In [20]:
!pip install nltk openai



In [21]:
import nltk
from nltk.tokenize import sent_tokenize
import openai

nltk.download('punkt')
openai.api_key = open("../keys/.open-ai-api-key").read().strip()


def get_chatgpt_answer(content: str, model_to_use: str = "gpt-3.5-turbo") -> str:
    chat_completion = openai.chat.completions.create(
        model=model_to_use,
        messages=[{"role": "user", "content": content}]
    )
    return chat_completion.choices[0].message.content

[nltk_data] Downloading package punkt to /Users/es/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [34]:
def summarize_chunk(chunk: str, max_tokens=2048) -> str:
    message_content = f"Write a short summary of the following text. " \
                      f"Summary should be like a part of a bigger text, without any introduction words.\n{chunk}"
    content_tokens = count_chatgpt_tokens(message_content, tokenizer=encoder)
    assert max_tokens > content_tokens
    return get_chatgpt_answer(message_content, "gpt-3.5-turbo")


def summarize_long_text_with_chatgpt(chapter: str, max_tokens: int = 2048) -> str:
    sentences = sent_tokenize(chapter)

    sentences_in_chunk = 20
    summarized_text = ""
    full_summary_iterations = 1

    while len(sentences) > 0:
        chunk = " ".join(sentences[:sentences_in_chunk])
        sentences = sentences[sentences_in_chunk:]

        chunk_summary = summarize_chunk(chunk)
        summarized_text += " " + chunk_summary

        print(
            f"Summary iteration {full_summary_iterations}, Chunk Length: {len(chunk)}, Summary Length: {len(chunk_summary)}")
        full_summary_iterations += 1

        if count_chatgpt_tokens(summarized_text, tokenizer=encoder) <= max_tokens:
            break

    final_summary = summarize_chunk(summarized_text, max_tokens=2048)
    return final_summary


In [35]:
sample_chapter = book_dataset['rows'][0]['row']['chapter']
summarized_chapter_v1 = summarize_long_text_with_chatgpt(sample_chapter)
print(summarized_chapter_v1)
print(len(summarized_chapter_v1))
assert len(summarized_chapter_v1) < len(sample_chapter) // 2

Summary iteration 1, Chunk Length: 4444, Summary Length: 759
In the colonial wars of North America between France and England, both sides had to navigate dense forests and difficult terrain before engaging in battle. The area between the head waters of the Hudson River and the adjacent lakes was particularly brutal, with control fiercely contested. The French used Champlain lake as a natural passage, while the English named another lake after their prince. Many battles were fought in this region, and forts were strategically built. The forests were filled with soldiers, and the sounds of war were constant. It is against this backdrop that the events of the story occurred in the third year of the war.
648


In [36]:
summarized_chapter_v2 = summarize_long_text_with_chatgpt(sample_chapter)
print(summarized_chapter_v2)
print(len(summarized_chapter_v2))
assert len(summarized_chapter_v2) < len(sample_chapter) // 2

Summary iteration 1, Chunk Length: 4444, Summary Length: 954
The colonial wars of North America, specifically the conflicts between France and England, were marked by fierce battles and constant fighting for control of the area between the Hudson River and the adjacent lakes. The French utilized the natural advantages of the region, while the English named the area after their reigning prince. The battles were frequent and forts were built and destroyed multiple times. Despite the danger, the region was teeming with soldiers and the echoes of military activity. These events took place during the third year of the war between England and France, which ultimately resulted in both countries losing the territory.
657


In [37]:
chapter_to_verify = sample_chapter
print(count_chatgpt_tokens(chapter_to_verify + summarized_chapter_v1 + summarized_chapter_v2, tokenizer=encoder))


9736


In [38]:
message = f"Which summary better conveys the main idea of given text." \
          f" Answer with single word summarized_chapter_v1 or summarized_chapter_v2 depending on which is better.\n" \
          f"Given text: \n{chapter_to_verify}\n" \
          f"summarized_chapter_v1:\n{summarized_chapter_v1}\n" \
          f"summarized_chapter_v2:\n{summarized_chapter_v2}\n"

print(get_chatgpt_answer(message, model_to_use="gpt-3.5-turbo-16k"))
print(get_chatgpt_answer(message, model_to_use="gpt-3.5-turbo-16k"))
print(get_chatgpt_answer(message, model_to_use="gpt-3.5-turbo-16k"))

summarized_chapter_v1
summarized_chapter_v2
summarized_chapter_v2


# Task 2. Extracting information with LLMs

4 points

At the practice session we were usually happy if we got something coherent. However, in real applications we often need to obtain concrete answers. Let's explore how to do it with LLMs.

Let's imagine that you work for a marketing agency, and you need to gather analytics about the passing events dedicated to AI and Machine Learning. For that, you need to process press releases and extract:
- Event name,
- Event date,
- Number of participants,
- Number of speakers,
- Attendance price.

Of course, you can do it manually, but it's much more fun to use Generative AI! So, your task will be to write a function that does this with only one request to OpenAI API.

Below there is an example of a press release (generated by ChatGPT, of course, so that both the event and the personae are fictional). All of them are in the press_releases.zip archive in the hometask week 1 folder.

<blockquote>
<p>PRESS RELEASE

InnovAI Summit 2023: A Glimpse into the Future of Artificial Intelligence</p>

City of Virtue, Cyberspace - November 8, 2023 - The most anticipated event of the year, InnovAI Summit 2023, successfully concluded last weekend, on November 5, 2023. Held in the state-of-the-art VirtuTech Arena, the summit saw a massive turnout of over 3,500 participants, from brilliant AI enthusiasts and researchers to pioneers in the field.

Esteemed speakers took to the stage to shed light on the latest breakthroughs, practical implementations, and ethical considerations in AI. Dr. Evelyn Quantum, renowned for her groundbreaking work on Quantum Machine Learning, emphasized the importance of this merger and how it's revolutionizing computing as we know it. Another keynote came from Prof. Leo Nexus, whose current project 'AI for Sustainability' highlights the symbiotic relationship between nature and machine, aiming to use AI in restoring our planet's ecosystems.

This year's panel discussion, moderated by the talented Dr. Ada Neura, featured lively debates on the limits of AI in creative arts. Renowned digital artist, Felix Vortex, showcased how he uses generative adversarial networks to create surreal art pieces, while bestselling author, Iris Loom, explained her experiments with AI-assisted story crafting.

Among other highlights were hands-on workshops, interactive Q&A sessions, and an 'AI & Ethics' debate which was particularly well-received, emphasizing the need for transparency and fairness in AI models. An exclusive 'Start-up Alley' allowed budding entrepreneurs to showcase their innovations, gaining attention from global venture capitalists and media.

The event wrapped up with an announcement for InnovAI Summit 2024, set to be even grander. Participants left with a renewed enthusiasm for the vast possibilities that the AI and ML world promises.

For media inquiries, please contact:
Jane Cipher
Director of Communications, InnovAI Summit
Email: jane.cipher@innovai.org
Phone: +123-4567-8910</p>
</blockquote>

More specifically, you should write a function

```python
parse_press_release(pr: str) -> dict
```

where the output should be in the format

```python
{
  name: 'InnovAI Summit 2023',
  date: '08.11.2023',
  n_participants: 3500,
  n_speakers: 4,
  price: None
}
```

If any of the four characteristics is not mentioned in the text, put `None` in the respective field.

At the end, calculate the statistics of right answers and analyse what kind of mistakes you "model" makes the most.

**Hints and suggestions:**
- It's gonna be more convenient to experiment in OpenAI chat interface https://chat.openai.com/. Plus this doesn't cost API requests money.
- You need to be very accurate with what you want from the model.
- It will help if you specify in the prompt that the output should be in JSON format, this way you will spend less time parsing the output.
- Please be careful with the details. For example, Jane Cipher in the text above is not a speaker and shouldn't be counter as such (how to get rid of a contact person?). Also pay attention to the date format,
- If the model is too wilful with the output format, don't hesitate to show some examples. Decreasing the temperature of predictions can help reduce the creativity of the answer, which is what we want for such task.
- Debugging an LLM-powered application may become a tough business. When you think that you've polished it, an LLM can still surprise you. So, we don't expect 100% accuracy in this task, but we expect that you do your best to achieve high quality results.

In [42]:
press_release = """PRESS RELEASE

InnovAI Summit 2023: A Glimpse into the Future of Artificial Intelligence

City of Virtue, Cyberspace - November 8, 2023 - The most anticipated event of the year, InnovAI Summit 2023, successfully concluded last weekend, on November 5, 2023. Held in the state-of-the-art VirtuTech Arena, the summit saw a massive turnout of over 3,500 participants, from brilliant AI enthusiasts and researchers to pioneers in the field.

Esteemed speakers took to the stage to shed light on the latest breakthroughs, practical implementations, and ethical considerations in AI. Dr. Evelyn Quantum, renowned for her groundbreaking work on Quantum Machine Learning, emphasized the importance of this merger and how it's revolutionizing computing as we know it. Another keynote came from Prof. Leo Nexus, whose current project 'AI for Sustainability' highlights the symbiotic relationship between nature and machine, aiming to use AI in restoring our planet's ecosystems.

This year's panel discussion, moderated by the talented Dr. Ada Neura, featured lively debates on the limits of AI in creative arts. Renowned digital artist, Felix Vortex, showcased how he uses generative adversarial networks to create surreal art pieces, while bestselling author, Iris Loom, explained her experiments with AI-assisted story crafting.

Among other highlights were hands-on workshops, interactive Q&A sessions, and an 'AI & Ethics' debate which was particularly well-received, emphasizing the need for transparency and fairness in AI models. An exclusive 'Start-up Alley' allowed budding entrepreneurs to showcase their innovations, gaining attention from global venture capitalists and media.

The event wrapped up with an announcement for InnovAI Summit 2024, set to be even grander. Participants left with a renewed enthusiasm for the vast possibilities that the AI and ML world promises.

For media inquiries, please contact: Jane Cipher Director of Communications, InnovAI Summit Email: jane.cipher@innovai.org Phone: +123-4567-8910"""

In [95]:
def parse_press_release(pr: str) -> dict:
    response = openai.chat.completions.create(
        model='gpt-3.5-turbo',
        temperature=0.3,
        messages=[
            {
                "role": "system",
                "content": "You are a helpful assistant"
                           " and you only answer in json format with fields\n"
                           """
               {
                   'name':,
                   'date':,
                   'n_participants': 0,
                   'n_speakers': 0,
                   'price': 
               }
               """
            },
            {
                "role": "user",
                "content": "Extract the following details from the press release and format the output as a JSON object:\n"
                           "- Event Name (Usually, event name is short; If year follows the event name everywhere in the release -- year is a part of the name)\n"
                           "- Event Date (format: 'DD.MM.YYYY' or 'DD.MM.YYYY-DD.MM.YYYY', days always take 2 symbols;)\n"
                           "- Number of Participants\n"
                           "- Number of Speakers (exclude contact persons and non-speaking roles)\n"
                           "- Attendance Price (format: string starts 3-symbol currency code like USD or EUR followed by space and the price without commas or dots in it)\n"
                           "If any other detail is not mentioned, put null for that field.\n"
                           "Here is the press release:\n\n" +
                           pr + "\n\n"
                                "Extracted Details:"},
        ]
    ).choices[0].message.content

    response_dict = json.loads(response)
    answer = {
        "name": str,
        "date": str,
        "n_speakers": int,
        "n_participants": int,
        "price": str
    }
    for response_dict_item, response_dict_item_type in response_dict.items():
        answer[response_dict_item] = str(response_dict[response_dict_item])
    print("\n" + answer.__str__())

    return answer

In [93]:
parse_press_release(press_release)


{'name': 'InnovAI Summit 2023', 'date': '5.11.2023', 'n_speakers': '4', 'n_participants': '3500', 'price': 'None'}


{'name': 'InnovAI Summit 2023',
 'date': '5.11.2023',
 'n_speakers': '4',
 'n_participants': '3500',
 'price': 'None'}

###Testing
We prepared a small dataset for you to test your prompt on.
Provided you've written your function, try running the following code.
At the end you also have an opportunity to look at the results in a table side-by-side in `with_results.csv`.
Your goal is to get at least 60% accuracy, or 26 fields right.

Please don't forget to output these metrics, they will be used for grading.

In [51]:
!gdown -O press_release_extraction.csv https://docs.google.com/spreadsheets/d/15IGdc3MV8864lxrLxsug0Ij480p76T1EAwBM7WGT_OI/export?format=csv

/bin/bash: gdown: command not found


In [94]:
import pandas

pr_df = pandas.read_csv("press_release_extraction.csv")
pr_df.head()

Unnamed: 0,pr_text,pr_parsed
0,InnovAI Summit 2023: A Glimpse into the Future...,"{\n ""name"": ""InnovAI Summit 2023"",\n ""date"":..."
1,Press Dispatch: 'Artificial Mariners: Navigati...,"{""name"": ""Artificial Mariners: Navigatin' the ..."
2,FOR IMMEDIATE RELEASE\n\nAI Innovators Convene...,"{""name"": ""Annual Machine Learning Symposium 20..."
3,Press Release: Cutting-Edge Innovations Debute...,"{""name"": ""AI Advancements Summit"",\n ""date"": ""..."
4,"Press Release: Innovative Minds Gather at ""AI ...","{""name"": ""AI Horizon 2023"",\n ""date"": ""15.10.2..."


In [96]:
fields = {
    "name": str,
    "date": str,
    "n_speakers": int,
    "n_participants": int,
    "price": str
}
for row in pr_df.itertuples():
    golden = json.loads(row.pr_parsed)
    for field, field_type in fields.items():
        golden_field = golden[field]
        print(str(golden_field) + " " + str(type(golden_field)))
        
    print("\n")

InnovAI Summit 2023 <class 'str'>
05.11.2023 <class 'str'>
4 <class 'int'>
3500 <class 'int'>
None <class 'str'>


Artificial Mariners: Navigatin' the AI Seas <class 'str'>
08.10.2023-09.10.2023 <class 'str'>
5 <class 'int'>
2000 <class 'int'>
None <class 'str'>


Annual Machine Learning Symposium 2023 <class 'str'>
14.10.2023-16.10.2023 <class 'str'>
4 <class 'int'>
2000 <class 'int'>
USD 1450 <class 'str'>


AI Advancements Summit <class 'str'>
16.10.2023 <class 'str'>
2 <class 'int'>
800 <class 'int'>
USD 950 <class 'str'>


AI Horizon 2023 <class 'str'>
15.10.2023 <class 'str'>
None <class 'str'>
2000 <class 'int'>
None <class 'str'>


AI for Equity Summit <class 'str'>
15.10.2023 <class 'str'>
6 <class 'int'>
3000 <class 'int'>
USD 250 <class 'str'>


Generative Intelligence Conclave, Spain 2023 <class 'str'>
8.10.2023 <class 'str'>
3 <class 'int'>
2000 <class 'int'>
EUR 180 <class 'str'>


In [97]:
import json

parsed_list = []
fields = {
    "name": str,
    "date": str,
    "n_speakers": int,
    "n_participants": int,
    "price": str
}
correct_fields = 0
for row in pr_df.itertuples():
    parsed_release = parse_press_release(row.pr_text)
    parsed_list.append(json.dumps(parsed_release, indent=4))
    golden = json.loads(row.pr_parsed)
    for field, field_type in fields.items():
        golden_field = golden[field]
        parsed_field = parsed_release.get(field)
        try:
            parsed_field = field_type(parsed_field)
        except (ValueError, TypeError):
            pass
        if golden_field == parsed_field:
            correct_fields += 1
        else:
            print(f"For {golden['name']} {field} {parsed_release.get(field)} doesn't seem the same as {golden[field]}")

print(correct_fields)


{'name': 'InnovAI Summit 2023', 'date': '05.11.2023', 'n_speakers': '4', 'n_participants': '3500', 'price': 'None'}

{'name': "Artificial Mariners: Navigatin' the AI Seas", 'date': '08.10.2023-09.10.2023', 'n_speakers': '4', 'n_participants': '2000', 'price': 'None'}
For Artificial Mariners: Navigatin' the AI Seas n_speakers 4 doesn't seem the same as 5

{'name': 'Annual Machine Learning Symposium 2023', 'date': '14.10.2023-16.10.2023', 'n_speakers': '4', 'n_participants': '2000', 'price': 'USD 1450'}

{'name': 'AI Advancements Summit 2023', 'date': '16.10.2023', 'n_speakers': '2', 'n_participants': '800', 'price': 'USD 950'}
For AI Advancements Summit name AI Advancements Summit 2023 doesn't seem the same as AI Advancements Summit

{'name': 'AI Horizon 2023', 'date': '15.10.2023', 'n_speakers': 'None', 'n_participants': '2000', 'price': 'None'}

{'name': 'AI for Equity Summit', 'date': '15.10.2023', 'n_speakers': '6', 'n_participants': '3000', 'price': 'USD 250'}

{'name': 'Generativ

 ## Comment from student
 There is inconsistency in expected date format, sometimes it takes 1 symbol for a day numbers lower than 10, sometimes 2 symbols (InnovAI Summit 2023 vs Generative Intelligence Conclave, Spain 2023). 
### Solution overview
I used a system message for specifying output format (specific json) and user message that describes the task and formatting of each field. Response is parsed by standard json.loads() method and then converted into dict with desired types. I also reduced temperature (from default 1.0 to 0.3) to make model less creative in order to get more precise and deterministic answers.
### Performance overview
After 20+ launches the situation is the following: 
  - Constantly failing to identify the correct name of the AI Advancements Summit 2023 event (because event names sometimes include date in the end and sometimes not). 
  - Constantly failing to count the fifth speaker in Artificial Mariners: Navigatin' the AI Seas event
  - Constantly failing to output the correct date in Generative Intelligence Conclave, Spain 2023 event (check comment above)
  - All other fields are identified correctly in every launch, maybe with some rare misses. Usually score is 31+. 

In [99]:
pr_df['results'] = parsed_list
pr_df.to_csv("with_results.csv")

# Task 3. Broken telephone

3 points

In the practice session we saw how to do text-to-speech (TTS) with play-ht. OpenAI API also supports speech-to-text (ASR, automatic speech recognition) using Whisper model. Let's make a broken telephone function

```python
broken_telephone(message: str, iterations: int = 5) -> str:
```

which does TTS and ASR a certain amount of times (number equals to `iterations`) and outputs the result.

Check it with several initial phrases.

In [None]:
# Example of work
# broken_telephone(
#     "A tutor who tooted the flute tried to teach two young tooters to toot." \
#     "Said the two to the tutor, Is it harder to toot, or to tutor two tooters to toot?" \
#     "This year I asked Santa to gift me a little cutie kitten. And my dream came true!"
# )

In [105]:
import requests
import json
import openai
import os

openai.api_key = open("../keys/.open-ai-api-key").read().strip()

playht_key = open("../keys/.playht-key").read().strip()
playht_used_id = open("../keys/.playht-user-id").read().strip()


def generate_speech(text):
    response = requests.post(
        url="https://play.ht/api/v2/tts",
        headers={
            "AUTHORIZATION": f"Bearer {playht_key}",
            "X-USER-ID": playht_used_id,
            "accept": "text/event-stream",
            "content-type": "application/json"
        },
        json={
            "text": text,
            "voice": "larry"
        }
    )
    return json.loads(response.text.splitlines()[-2].replace("data: ", ""))['url']


def transcribe_speech(audio_path: str):
    audio_file = open(audio_path, "rb")
    transcript = openai.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file
    )
    return transcript.text


def broken_telephone(message: str, iterations: int = 5) -> str:
    current_message = message
    for i in range(iterations):
        speech_url = generate_speech(current_message)
        audio_response = requests.get(speech_url)
        speech_file = 'temp_audio.mp3'
        with open(speech_file, 'wb') as audio_file:
            audio_file.write(audio_response.content)
            
        current_message = transcribe_speech(speech_file)
        print(f"Iteration {i}: {current_message}")
        os.remove(speech_file)

    return current_message

In [108]:
broken_telephone("Hello, Yegor Safronov. What are doing today evening?")

Iteration 0: Hello Yegor Safronov, what are doing today evening?
Iteration 1: Hello Yegor Safronov, what are doing today evening?
Iteration 2: Hello Yegor Safronov, what are doing today evening?
Iteration 3: Hello, Yegor Safronov. What are you doing today evening?
Iteration 4: Hello, Yegor Safronov. What are you doing today evening?


'Hello, Yegor Safronov. What are you doing today evening?'

In [109]:
broken_telephone(
    "A tutor who tooted the flute tried to teach two young tooters to toot. Said the two to the tutor, ‘Is it harder to toot, or to tutor two tooters to toot? This year I asked Santa to gift me a little cutie kitten. And my dream came true!")

Iteration 0: A tutor who tooted the flute tried to teach two young tutors to toot. Said the toot to the tutor, Is it harder to toot or to tutor two tutors to toot? This year I asked Santa to gift me a little cutie kitten. And my dream came true.
Iteration 1: A tutor who tooted the flute tried to teach two young tutors to toot, said the toot to the tutor. Is it harder to toot or to tutor two tutors to toot? This year I asked Santa to gift me a little cutie kitten, and my dream came true.
Iteration 2: A tutor who tooted the flute tried to teach two young tutors to toot, said the toot to the tutor. Is it harder to tutor, to tutor two tutors to toot? This year I asked Santa to gift me a little cutie kitten and my dream came true.
Iteration 3: A tutor who tutored the flute tried to teach two young tutors to toot, said the toot to the tutor. Is it harder to tutor? To tutor two tutors to toot? This year I asked Santa to gift me a little cutie kitten and my dream came true.
Iteration 4: A tuto

'A tutor who tutored the flute tried to teach two young tutors to toot, said the toot to the tutor. Is it harder to tutor? To tutor two tutors to toot? This year I asked Santa to gift me a little cute eyed kitten and my dream came true.'

## Overview from student
On the third launch, there appeared a problem with "cutie" transforming to "cute eyed" on the last iteration. But this is the only change. 

**Note**. Because both algorithms might be too good, you can actually see no change, depending on the sentence you've passed.

**Bonus** (1 point). Do the same with the text <-> image translation. You can skip the text <-> audio part if you choose this. However, the authors of the homework don't know any good image captioning (image -> text) API, so you'll have to work with a model locally. It may be done quite conveniently with the [Hugging Face](https://huggingface.co/) 🤗 infrastructure, but this is beyond the scope of the first part of the course, so we leave it as an optional exercise.

# Bonus task.

1 point

It's quite important to understand the current limitations of the technology. As for LLMs, there are still plenty of weak spots. They can struggle even with such an example:

In [None]:
sentence = "How many tokens are in this sentence?"
print(f"Actual token count: {len(encoder.encode(sentence))}")
print(f"ChatGPT thinks it's: {get_chatgpt_answer(sentence)}")

Actual token count: 8
ChatGPT thinks it's: There are 7 tokens in this sentence.


Or with this one:

In [None]:
# example
sentence = "Reverse this sentence character by character"
reversed_sentence = get_chatgpt_answer(sentence)
print(reversed_sentence[::-1])

Selse sentence this Telse.


### LLM doing math

Math is also not too easy for LLMs which are getting better at counting and mathematical reasoning thanks to chain-of-thought generation, but can still struggle with symbolic algebra.

Let us look at an example:

In [None]:
sentence = "Let's define a mathematical operation x * y := xy + x + y. Is it associative?"
print(get_chatgpt_answer(sentence))

To determine if the operation x * y is associative, we need to check if (x * y) * z is equal to x * (y * z) for any values of x, y, and z.

Let's calculate (x * y) * z:
(x * y) * z = (xy + x + y) * z
= (xy + x + y)z + (xy + x + y) + z
= xyz + xz + yz + xy + x + y + z

Now, let's calculate x * (y * z):
x * (y * z) = x * (yz + y + z)
= x(yz + y + z) + x + yz + y + z
= xyz + xy + xz + yz + x + y + z

Comparing the two results, we can see that (x * y) * z is equal to x * (y * z). Therefore, the operation x * y is associative.


**The solution by ChatGPT-4**

The

To determine if the operation $*$ is associative, we need to check if:

$$(a * b) * c = a * (b * c)$$

for all real numbers $a$, $b$, and $c$. If this equality holds for all real numbers, then the operation is associative.

Given the operation $x∗y:=xy+x+y$:

Calculating $(a∗b)∗c$:

First,

$$a∗b=ab+a+b.$$

Using this result:

$$(a∗b)∗c=(ab+a+b)∗c=$$
$$=(ab+a+b)c+(ab+a+b)+c=$$
$$=abc+ac+bc+ab+a+b+c$$

Calculating $a∗(b∗c)$:

First,

$$b∗c=bc+b+c.$$

Using this result:

$$a∗(b∗c)=a∗(bc+b+c)=$$
$$=a(bc+b+c)+a+bc+b+c=$$
$$=abc+ab+ac+a+bc+b+c$$

Comparing the two results:

$$(a∗b)∗c=abc+ac+bc+ab+a+b+c$$

$$a∗(b∗c)=abc+ab+ac+a+bc+b+c$$

The two expressions are not equal for all real numbers $a$, $b$, and $c$. Therefore, the operation $*$ is not associative.

**End**

Let's analyze this solution. You can see that ChatGPT knows definitions and does well with logic, but fails at the very last stage where it can't understand that

$$abc+ac+bc+ab+a+b+c$$

and

$$abc+ab+ac+a+bc+b+c$$

is the same expression with permuted summands.

## Task 4*

Find out what is it you are good at, but ChatGPT cannot do. Please try to be objective. Bonus points for analyzing stability of the failures of ChatGPT and their dependence on prompt formulation.