# Speech to text

- Transcribe audio into whatever language the audio is in.
- Translate and transcribe the audio into English.

File uploads are currently limited to 25 MB and the following input file types are supported: mp3, mp4, mpeg, mpga, m4a, wav, and webm.
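
Before uploading, it can help to check a file against these limits locally. The following is a minimal sketch: the helper name `check_audio_file` is hypothetical, and the byte value used for "25 MB" (25 × 1024 × 1024) is an assumption, since the text above only gives the rounded figure.

```python
import os

# Extensions listed above; the exact byte value of "25 MB" is an assumption.
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def check_audio_file(path):
    """Return a list of problems found; an empty list means the file looks uploadable."""
    problems = []
    extension = os.path.splitext(path)[1].lower().lstrip(".")
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or '(none)'}")
    size = os.path.getsize(path)
    if size > MAX_UPLOAD_BYTES:
        problems.append(f"file is {size} bytes, above the {MAX_UPLOAD_BYTES}-byte limit")
    return problems
```

The check only looks at the file name and size; it does not verify that the content really is audio in the stated format.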

## Audio models

Whisper can transcribe speech into text and translate many languages into English.  

Text-to-speech (TTS) can convert text into spoken audio.



| Model   | Price                                            |
|---------|--------------------------------------------------|
| Whisper | \$0.006 / minute (rounded to the nearest second) |
| TTS     | \$15.00 / 1M characters                          |
| TTS HD  | \$30.00 / 1M characters                          |
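
The prices in the table can be turned into a rough cost estimate. This is a sketch under assumptions: the function names are hypothetical, "TTS" and "TTS HD" are assumed to correspond to the `tts-1` and `tts-1-hd` models, and Python's `round` is used as a stand-in for "rounded to the nearest second".

```python
WHISPER_USD_PER_MINUTE = 0.006
# Assumption: "TTS" is tts-1 and "TTS HD" is tts-1-hd.
TTS_USD_PER_MILLION_CHARS = {"tts-1": 15.00, "tts-1-hd": 30.00}


def whisper_cost(duration_seconds):
    """Estimated transcription cost in USD, billing whole seconds."""
    billed_seconds = round(duration_seconds)
    return billed_seconds / 60 * WHISPER_USD_PER_MINUTE


def tts_cost(text, model="tts-1"):
    """Estimated speech synthesis cost in USD, based on the character count."""
    return len(text) / 1_000_000 * TTS_USD_PER_MILLION_CHARS[model]


ten_minute_lecture = whisper_cost(600)       # about 0.06 USD
short_article = tts_cost("a" * 1000)         # about 0.015 USD
```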






In [None]:
import os

os.chdir("../../../")

In [None]:
from src.initialization import credential_init
from src.io.path_definition import get_project_dir, get_file


credential_init()

In [None]:
from openai import OpenAI

client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])

## Transcription

In [None]:
audio_file= open("tutorial/LLM+Langchain/Week-6/05_12_2013_Torti_CLAS_1.mp3", "rb")

transcription = client.audio.transcriptions.create(
  model="whisper-1", 
  file=audio_file
)

In [None]:
transcription.text[:500]

## Improving reliability

### Prompt parameter



As we explored in the prompting section, one of the most common challenges when using Whisper is that the model often fails to recognize uncommon words or acronyms. The techniques below improve Whisper's reliability in these cases.

In [None]:
from openai import OpenAI

client = OpenAI()

# audio_file = open("/path/to/file/speech.mp3", "rb")
# transcription = client.audio.transcriptions.create(
#   model="whisper-1", 
#   file=audio_file, 
#   response_format="text",
#   prompt="ZyntriQix, Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, OrbitalLink Seven, DigiFractal Matrix, PULSE, RAPT, B.R.I.C.K., Q.U.A.R.T.Z., F.L.I.N.T."
# )
# print(transcription.text)

- Sometimes the model skips punctuation in the transcript. You can avoid this with a simple prompt that includes punctuation: "Hello, welcome to my lecture."

- The model may also leave out common filler words in the audio. To keep them in your transcript, use a prompt that contains them: "Umm, let me think like, hmm... Okay, here's what I'm, like, thinking."

- Some languages can be written in different ways, such as simplified or traditional Chinese. By default, the model might not use the writing style you want for your transcript. You can improve this by writing the prompt in your preferred style.

In [None]:
audio_file= open("tutorial/LLM+Langchain/Week-6/教育部 學生水域安全 國語30秒.mp3", "rb")

transcription_raw = client.audio.transcriptions.create(
  model="whisper-1", 
  file=audio_file
)
print(transcription_raw)

In [None]:
audio_file= open("tutorial/LLM+Langchain/Week-6/教育部 學生水域安全 國語30秒.mp3", "rb")

transcription_raw = client.audio.transcriptions.create(
      model="whisper-1", 
      file=audio_file,
      language='zh'
)
print(transcription_raw)

In [None]:
audio_file= open("tutorial/LLM+Langchain/Week-6/教育部 學生水域安全 國語30秒.mp3", "rb")

transcription_raw = client.audio.transcriptions.create(
      model="whisper-1", 
      file=audio_file,
      language='zh',
      response_format="srt"
)
print(transcription_raw)
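
With `response_format="srt"` the result is subtitle text rather than a plain transcript. The sketch below assumes the result is a string in the usual SubRip layout (index line, timestamp line, text lines, blank line) and turns it into tuples; the helper names and the sample string are made up for illustration.

```python
import re

TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def to_seconds(stamp):
    """Convert an 'HH:MM:SS,mmm' timestamp into seconds."""
    hours, minutes, seconds, millis = map(int, TIMESTAMP.fullmatch(stamp.strip()).groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(srt_text):
    """Turn SubRip text into a list of (start_seconds, end_seconds, text) tuples."""
    segments = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3:
            continue  # skip malformed blocks
        start, end = lines[1].split("-->")
        segments.append((to_seconds(start), to_seconds(end), "\n".join(lines[2:])))
    return segments


# Made-up sample, not real API output.
sample = "1\n00:00:00,000 --> 00:00:02,500\nHello\n\n2\n00:00:02,500 --> 00:01:01,000\nWorld\n"
segments = parse_srt(sample)
```

Each tuple can then be used, for example, to look up what was said at a given time in the recording.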

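Newer audio models are also available, and the TTS model accepts instructions that steer the speaking style:

- "gpt-4o-mini-tts"
- "gpt-4o-transcribe" and "gpt-4o-mini-transcribe"

Version 1.68.2 or later of the openai package is required.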

In [None]:
audio_file= open("tutorial/LLM+Langchain/Week-6/教育部 學生水域安全 國語30秒.mp3", "rb")

transcription_raw = client.audio.transcriptions.create(
    model="gpt-4o-mini-transcribe", 
    file=audio_file,
    response_format="text",

)
print(transcription_raw)

In [None]:
audio_file= open("tutorial/LLM+Langchain/Week-6/教育部 學生水域安全 國語30秒.mp3", "rb")

transcription_raw = client.audio.transcriptions.create(
    model="gpt-4o-mini-transcribe", 
    file=audio_file,
    response_format="text",
    language='zh' 
)
print(transcription_raw)

In [None]:
audio_file= open("tutorial/LLM+Langchain/Week-6/教育部 學生水域安全 國語30秒.mp3", "rb")

transcription_raw = client.audio.transcriptions.create(
    model="gpt-4o-mini-transcribe", 
    file=audio_file,
    response_format="text",
    language='zh-tw'
 
)
print(transcription_raw)

In [None]:
audio_file= open("tutorial/LLM+Langchain/Week-6/教育部 學生水域安全 國語30秒.mp3", "rb")

transcription_raw = client.audio.transcriptions.create(
    model="gpt-4o-mini-transcribe", 
    file=audio_file,
    response_format="json",
    language='zh-tw'
 
)
print(transcription_raw)

In [None]:
audio_file= open("tutorial/LLM+Langchain/Week-6/教育部 學生水域安全 國語30秒.mp3", "rb")

transcription_raw = client.audio.transcriptions.create(
    model="gpt-4o-transcribe", 
    file=audio_file,
    response_format="json",
    language='zh-tw'
 
)
print(transcription_raw)

In [None]:
transcription_raw.text

## Let's try to record our words

- Install ffmpeg (Windows builds: https://www.gyan.dev/ffmpeg/builds/)
- pip install sounddevice pydub

In [None]:
import io

import numpy as np
import sounddevice as sd
from pydub import AudioSegment


DURATION = 5  # seconds
FS = 44100    # sample rate

print("Recording...")

# Record audio
audio = sd.rec(int(DURATION * FS), samplerate=FS, channels=1, dtype='int16')
sd.wait()  # Wait until recording is finished
print("Recording finished.")

In [None]:
from pydub import AudioSegment

AudioSegment.converter = r"C:\Users\MengChieh\ffmpeg-7.1.1-essentials_build\bin\ffmpeg.exe"

In [None]:
# Convert numpy array to AudioSegment
audio_bytes = audio.tobytes()
audio_segment = AudioSegment(
    data=audio_bytes,
    sample_width=audio.dtype.itemsize,
    frame_rate=FS,
    channels=1
)

# Save as MP3 in-memory (as an object)
mp3_io = io.BytesIO()
audio_segment.export(mp3_io, format="mp3")
mp3_io.seek(0)  # Rewind to start

In [None]:
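with open('tutorial/LLM+Langchain/Week-6/output.mp3', 'wb') as f:
    # getvalue() copies the whole buffer without moving the read position,
    # so mp3_io can still be passed to the API afterwards
    f.write(mp3_io.getvalue())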

There are two ways to feed the recording to the transcriptions endpoint:
1. pass the saved file
2. pass the in-memory mp3_io object
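
When an in-memory object is passed instead of a file, two details matter: the object is read from its current position, and it carries no file name unless one is assigned. The sketch below uses placeholder bytes rather than real audio to show both points with the standard library only.

```python
import io

buffer = io.BytesIO(b"fake mp3 bytes")  # placeholder bytes, not real audio
buffer.name = "word.mp3"                # BytesIO accepts extra attributes such as a name

first_read = buffer.read()              # consumes the buffer up to the end
second_read = buffer.read()             # nothing left, returns b""
buffer.seek(0)                          # rewind before handing it to another reader
after_rewind = buffer.read()
whole_content = buffer.getvalue()       # independent of the current position
```

Forgetting to rewind is an easy way to upload an empty payload, so calling `seek(0)` right before the request is a cheap safeguard.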

In [None]:
# https://community.openai.com/t/openai-whisper-send-bytes-python-instead-of-filename/84786/3

mp3_io.name = "word.mp3"

In [None]:
transcription_raw = client.audio.transcriptions.create(
    model="whisper-1", 
    file=mp3_io,
    response_format="text",
)

In [None]:
transcription_raw

In the previous example we could not decide when the recording starts and ends. Can we get better control?

In [None]:
import threading

FS = 44100
CHANNELS = 1
dtype = 'int16'

recorded_frames = []

def callback(indata, frames, time, status):
    recorded_frames.append(indata.copy())

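def record_audio():
    input("Press Enter to start recording...")
    # Open the stream only after the first Enter; the callback starts
    # collecting frames as soon as the stream is opened
    with sd.InputStream(samplerate=FS, channels=CHANNELS, 
                        dtype=dtype, callback=callback):
        print("Recording... Press Enter again to stop.")
        input()
    print("Recording stopped.")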


In [None]:
# Clear frames before each recording
recorded_frames.clear()
recording_thread = threading.Thread(target=record_audio)
recording_thread.start()
recording_thread.join()

In [None]:
# Combine recorded frames
if recorded_frames:
    audio_np = np.concatenate(recorded_frames, axis=0)
else:
    audio_np = np.array([], dtype=dtype)

In [None]:
audio_bytes = audio_np.tobytes()
audio_segment = AudioSegment(
    data=audio_bytes,
    sample_width=audio_np.dtype.itemsize,  # use the array recorded in this example
    frame_rate=FS,
    channels=1
)

# Save as MP3 in-memory (as an object)
mp3_io = io.BytesIO()
audio_segment.export(mp3_io, format="mp3")
mp3_io.seek(0)  # Rewind to start

In [None]:
mp3_io.name = "next_example.mp3"

In [None]:
transcription_raw = client.audio.transcriptions.create(
    model="whisper-1", 
    file=mp3_io,
    response_format="text",
)

In [None]:
transcription_raw

## Translations

The translations API takes an audio file in any of the supported languages as input and, where necessary, translates the transcript into English. This differs from the /Transcriptions endpoint because the output is not in the original input language; it is English text.

In [None]:
audio_file = open("tutorial/LLM+Langchain/Week-6/教育部 學生水域安全 國語30秒.mp3", "rb")
translation = client.audio.translations.create(
  model="whisper-1", 
  file=audio_file
)
print(translation.text)

In [None]:
audio_file = open("tutorial/LLM+Langchain/Week-6/教育部 學生水域安全 國語30秒.mp3", "rb")
translation = client.audio.translations.create(
    model="whisper-1", 
    file=audio_file,
    prompt="funny style"
)
print(translation.text)

In [None]:
audio_file = open("tutorial/LLM+Langchain/Week-6/教育部 學生水域安全 國語30秒.mp3", "rb")
translation = client.audio.translations.create(
    model="whisper-1", 
    file=audio_file,
    prompt="You are very very angry."
)
print(translation.text)

In [None]:
audio_file = open("tutorial/LLM+Langchain/Week-6/教育部 學生水域安全 國語30秒.mp3", "rb")
translation = client.audio.translations.create(
    model="whisper-1", 
    file=audio_file,
    prompt="Talk as if you were in the Marines."
)
print(translation.text)

## Longer inputs

By default, the Whisper API only supports files smaller than 25 MB. If you have an audio file larger than that, you will need to break it up into chunks of 25 MB or less, or use a compressed audio format. For the best performance, avoid breaking the audio up mid-sentence, as this may cause some context to be lost.

One way to handle this is to use the PyDub open-source Python package to split the audio.
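
As a starting point for splitting, the sketch below computes fixed-length chunk boundaries in milliseconds using plain Python. The function name and the ten-minute chunk length are assumptions chosen for illustration; fixed-length cuts can still land mid-sentence, so the boundaries may need manual adjustment.

```python
def chunk_boundaries(total_ms, chunk_ms):
    """Return (start_ms, end_ms) pairs that cover [0, total_ms) in order."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    return [(start, min(start + chunk_ms, total_ms))
            for start in range(0, total_ms, chunk_ms)]


ten_minutes = 10 * 60 * 1000
# A 25-minute recording split into ten-minute pieces; the last piece is shorter.
boundaries = chunk_boundaries(total_ms=25 * 60 * 1000, chunk_ms=ten_minutes)
```

With PyDub, `len(audio)` gives the duration of an `AudioSegment` in milliseconds and `audio[start:end]` slices by milliseconds, so each pair can be used to cut a piece and export it with `export(path, format="mp3")`.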

In [None]:
# Week-6- voice-text concatenate in Google Colab

### How to concatenate the audio output?

## Text to Speech

The Audio API provides a speech endpoint based on our TTS (text-to-speech) model. It comes with 6 built-in voices and can be used to:

- Narrate a written blog post
- Produce spoken audio in multiple languages
- Give real-time audio output using streaming

The voices are optimized for English, and the following languages are listed: Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh.


In [None]:
speech_file_path = os.path.join("tutorial/LLM+Langchain/Week-6/Sample.mp3")

response = client.audio.speech.create(
  model="tts-1",
  voice="alloy",
  input="Today is Saturday, how are you?")

response.stream_to_file(speech_file_path)

In [None]:
speech_file_path = os.path.join("tutorial/LLM+Langchain/Week-6/Sample-gpt-4o-mini.mp3")

response = client.audio.speech.create(
    model="gpt-4o-mini-tts",
    voice="alloy",
    input=("Today is Saturday, how are you?"),
    instructions="You are very tired and you want to go back to sleep.")

response.stream_to_file(speech_file_path)

In [None]:
response = client.audio.speech.create(
    model="gpt-4o-mini-tts",
    voice="alloy",
    input=("Today is Saturday, how are you?"
          ),
    instructions="Speak as if you are angry")

response.stream_to_file(speech_file_path)

## Can we combine this with the earlier chatbot?

The bot then outputs speech instead of text.

In [None]:
from operator import itemgetter

from langchain_openai import ChatOpenAI
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain.prompts import PromptTemplate, HumanMessagePromptTemplate, ChatPromptTemplate, SystemMessagePromptTemplate,  MessagesPlaceholder
from langchain_core.output_parsers.string import StrOutputParser

model = ChatOpenAI(openai_api_key=os.environ['OPENAI_API_KEY'],
                   model_name="gpt-4o-mini-2024-07-18", temperature=0)

system_template = ("You are a helpful AI assistant and you are going to play "
                  "the role of Gordon Ramsay in the TV show Hell's Kitchen. "
                  "You will talk like him. Because the user is a native Chinese "
                  "Mandarin speaker, the response should be in 繁體中文。")

system_prompt = PromptTemplate.from_template(system_template)

system_message = SystemMessagePromptTemplate(prompt=system_prompt)

human_prompt = PromptTemplate(template="{question}"
                  )

human_message = HumanMessagePromptTemplate(prompt=human_prompt)

chat_template = ChatPromptTemplate.from_messages([system_message,
                          MessagesPlaceholder(variable_name="messages"),
                          human_message
                          ])

chat_history = ChatMessageHistory()

pipeline = {"question": itemgetter("question"),
          "messages": itemgetter("message")} | chat_template | model | StrOutputParser()  

In [None]:
chat_history = ChatMessageHistory()

while True:
    question = input("請輸入對話:")
    if question == "quit":
        break
    answer = pipeline.invoke({"question": question,
               "message": chat_history.messages
              })
    
    print(answer)
    
    chat_history.add_user_message(question)
    chat_history.add_ai_message(answer)

In [None]:
from langchain_core.runnables import chain

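@chain 
def text_to_voice(text):
    """Synthesize `text` with the TTS model and save it as an MP3 file."""
    speech_file_path = os.path.join("tutorial/LLM+Langchain/Week-6/temporary_output.mp3")

    response = client.audio.speech.create(
      model="tts-1",
      voice="alloy",
      input=text)

    response.stream_to_file(speech_file_path)

    # Return the path so that downstream steps know where the audio was written
    return speech_file_path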

In [None]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough, RunnableLambda

# runnable_text_to_voice = RunnableLambda(text_to_voice)

respond_chain = {"question": itemgetter("question"),
                 "messages": itemgetter("message")} | chat_template | model | StrOutputParser()

pipeline_ = {'respond': respond_chain}|RunnablePassthrough.assign(tts=itemgetter('respond')|text_to_voice)  

In [None]:
chat_history = ChatMessageHistory()

pipeline_.invoke({"question":"扇貝是生的",
                  "message": chat_history.messages})

In [None]:
chat_history = ChatMessageHistory()

while True:
    question = input("請輸入對話:")
    if question == "quit":
        break
    answer = pipeline_.invoke({"question": question,
               "message": chat_history.messages
              })
    
    print(answer['respond'])
    
    chat_history.add_user_message(question)
    chat_history.add_ai_message(answer['respond'])

In [None]:
chat_history.messages

In [3]:
pip install simpleaudio --no-cache-dir

The installation failed on this machine (Windows, CPython 3.10): building the wheel for simpleaudio 1.0.4 ended with `error: subprocess-exited-with-error` (`python setup.py bdist_wheel did not run successfully`, exit code 1).

In [None]:
# import simpleaudio as sa

# def play_audio_bytes(audio_bytes):
#     audio = AudioSegment.from_file(io.BytesIO(audio_bytes), format="mp3")
#     play_obj = sa.play_buffer(
#         audio.raw_data,
#         num_channels=audio.channels,
#         bytes_per_sample=audio.sample_width,
#         sample_rate=audio.frame_rate
#     )
#     play_obj.wait_done()

# # Usage:
# response = client.audio.speech.create(
#     model="tts-1",
#     voice="alloy",
#     input="Today is a beautiful saturday"
# )
# play_audio_bytes(response.content)

### Audio quality

For real-time applications, the standard tts-1 model provides the lowest latency, but at a lower quality than the tts-1-hd model. Because of the way the audio is generated, tts-1 is likely to produce more static than tts-1-hd in certain situations. In some cases the difference may not be noticeable, depending on your listening device and the individual listener.

### How do we convert speech from one language into another?

- Translations API: the output is English only
- Transcriptions and speech: the audio and the text are in the same language

So we have to build this functionality ourselves.

In [None]:
# Translation chain

system_prompt = PromptTemplate.from_template("You are an AI assistant assigned "
                                             "with a task of translating English "
                                             "into traditional Chinese (繁體中文)。"
                                             )

# Define a prompt template for text translation
prompt = PromptTemplate(template="{query}",
                        input_variables=['query'])

# Create a human message prompt template
human_message = HumanMessagePromptTemplate(prompt=prompt)

# Create a chat prompt template from system prompt and human message
chat_prompt = ChatPromptTemplate.from_messages([("system", system_prompt.template),
                                                human_message])

# Construct the processing chain
translation_chain = chat_prompt | model | StrOutputParser()

In [None]:
@chain
def text_to_voice(text):
    """Synthesize `text` with the TTS model and save it as an MP3 file."""
    speech_file_path = os.path.join("tutorial/LLM+Langchain/Week-6/Sample_ch.mp3")

    # Reduce the text size to speed up this demo
    response = client.audio.speech.create(
      model="tts-1",
      voice="alloy",
      input=text[:2000])

    response.stream_to_file(speech_file_path)

    # Return the path so that the caller knows where the audio was written
    return speech_file_path


@chain
def voice_to_text(audio_file):
    """Transcribe an audio file opened in binary mode and return the text."""
    transcription = client.audio.transcriptions.create(
      model="whisper-1", 
      file=audio_file
    )

    return transcription.text

In [None]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough, RunnableLambda

filename = open("tutorial/LLM+Langchain/Week-6/05_12_2013_Torti_CLAS_1.mp3", "rb")

# runnable_text_to_voice = RunnableLambda(text_to_voice)
# runnable_voice_to_text = RunnableLambda(voice_to_text)

voice_2_voice_chain = {"query": voice_to_text} | translation_chain | text_to_voice


In [None]:
voice_2_voice_chain.invoke(filename)

## Voice options

Experiment with different voices (alloy, echo, fable, onyx, nova, and shimmer) to find one that matches your desired tone and audience. The current voices are optimized for English.

### Supported output formats

The default response format is "mp3", but other formats like "opus", "aac", "flac", "wav", and "pcm" are available.

- Opus: For internet streaming and communication, low latency.
- AAC: For digital audio compression, preferred by YouTube, Android, iOS.
- FLAC: For lossless audio compression, favored by audio enthusiasts for archiving.
- WAV: Uncompressed WAV audio, suitable for low-latency applications to avoid decoding overhead.
- PCM: Similar to WAV but containing the raw samples at 24 kHz (16-bit signed, little-endian), without the header.
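
Because PCM output has no header, most players cannot open it directly. The sketch below wraps raw samples in a WAV container with the standard `wave` module. The sample rate and sample width follow the description of the PCM format; the mono channel count and the function name are assumptions.

```python
import io
import wave


def pcm_to_wav_bytes(pcm_bytes, sample_rate=24000, channels=1, sample_width=2):
    """Wrap raw little-endian PCM samples in a WAV container and return the bytes."""
    out = io.BytesIO()
    with wave.open(out, "wb") as wav_file:
        wav_file.setnchannels(channels)       # assumption: mono audio
        wav_file.setsampwidth(sample_width)   # 2 bytes = 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)
    return out.getvalue()


# One second of silence stands in for real PCM output.
silence = b"\x00\x00" * 24000
wav_bytes = pcm_to_wav_bytes(silence)
```

The returned bytes can be written to a `.wav` file or passed on in memory.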

## Homework 1: English audio file -> Chinese audio file

## Ollama

Ollama makes it easy to run open-source LLMs.

We reuse the content from last week.

https://medium.com/@abonia/running-ollama-in-google-colab-free-tier-545609258453

- curl https://ollama.ai/install.sh | sh
- ollama serve &
- ollama pull llama3:8b
- ollama pull dolphin-llama3:8b
- ollama pull huihui_ai/qwen2.5-abliterate:14b

1. Whisper: convert the audio file to text
2. GPT: translate everything into Chinese, with a system prompt that maps English terms to their Chinese equivalents
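
One possible way to build the system prompt for step 2 is to generate it from a glossary. This is only a sketch: the function name, the prompt wording, and the glossary entries are all hypothetical and should be replaced with the terms that matter for your audio.

```python
def build_translation_system_prompt(glossary):
    """Build a system prompt that pins English terms to fixed Chinese translations."""
    lines = [
        "You are a translator. Translate the user's English text into "
        "Traditional Chinese (繁體中文).",
        "Always translate the following terms as specified:",
    ]
    lines += [f"- {english} -> {chinese}" for english, chinese in glossary.items()]
    return "\n".join(lines)


# Hypothetical glossary entries, for illustration only.
example_glossary = {"machine learning": "機器學習", "neural network": "神經網路"}
system_prompt_text = build_translation_system_prompt(example_glossary)
```

The resulting text contains no curly braces, so it can be used as the system message of a chat prompt template without being mistaken for a template variable.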

In [None]:
import os

from torch import cuda
from langchain.prompts import PromptTemplate, HumanMessagePromptTemplate, ChatPromptTemplate, SystemMessagePromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.runnables import Runnable, chain

from src.io.path_definition import get_project_dir

filename = "does-ai-really-encourage-cheating-in-schools.txt"

filename_path = os.path.join(get_project_dir(), 'tutorial', 'LLM+Langchain', 'Week-5', filename)

with open(filename_path, "r", encoding="utf8") as file:
    cleaned_text = file.read()
    
print(cleaned_text)

In [None]:
chunk_size = 1024
chunk_overlap = 128

text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)

documents = text_splitter.create_documents([cleaned_text])
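
To see what `chunk_size` and `chunk_overlap` mean, the sketch below splits text into fixed character windows. It is a simplification and not how `RecursiveCharacterTextSplitter` works internally: the LangChain splitter prefers to cut at separators such as paragraph breaks, so its chunks will differ. The function name is hypothetical.

```python
def sliding_chunks(text, chunk_size=1024, chunk_overlap=128):
    """Split text into fixed-size character windows that overlap their neighbours."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap  # how far each window advances
    return [text[start:start + chunk_size]
            for start in range(0, max(len(text) - chunk_overlap, 1), step)]


# Each window shares its last two characters with the start of the next one.
small = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
```

The overlap means a sentence cut at a chunk boundary still appears in full in at least one chunk, at the cost of processing some text twice.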

In [None]:
system_template = "You are a helpful AI assistant with excellent writing skill"

# PromptTemplate(template=system_template)

def build_standard_chat_prompt_template(kwargs) -> Runnable:
    messages = []
    
    for key in ['system', 'human']:
        if kwargs.get(key):
            if key == 'system':
                system_content = kwargs['system']
                system_prompt = PromptTemplate(**system_content)
                message = SystemMessagePromptTemplate(prompt=system_prompt)
            else:
                human_content = kwargs['human']
                human_prompt = PromptTemplate(**human_content)
                message = HumanMessagePromptTemplate(prompt=human_prompt)

            messages.append(message)

    chat_prompt = ChatPromptTemplate.from_messages(messages)
    
    return chat_prompt


def build_summary_prompt_template():

    input_ = {"system": {"template": system_template},
              "human": {"template": """
                                    Please create a summary given the following context:
                                    {text}.
                                    """,
                        "input_variables": ['text']}
            }

    return build_standard_chat_prompt_template(input_)

In [None]:
from tqdm import tqdm

from langchain_ollama import ChatOllama

stop_token_ids = None
model_id = "dolphin-llama3:8b"

device = f"cuda:{cuda.current_device()}" if cuda.is_available() else 'cpu'

model = ChatOllama(model=model_id, temperature=0)

summary_prompt_template = build_summary_prompt_template()

summary_pipeline = summary_prompt_template | model | StrOutputParser()

text_as_list = []
for document in tqdm(documents):
    content = summary_pipeline.invoke({"text": document.page_content})
    text_as_list.append(content)

final_text = "\n".join(text_as_list)

In [None]:
summary_pipeline.invoke({"text": final_text})