##### !pip install --upgrade openai pytubefix tiktoken moviepy

This command installs several Python packages with pip, the Python package installer, upgrading any that are already present. Let's break it down:

1. `!pip install`: Installs packages. The `!` prefix tells Jupyter (IPython) to run the line as a shell command rather than as Python code.

2. `--upgrade`: This flag tells pip to upgrade the packages to their latest versions if they're already installed.

3. The packages being installed/upgraded are:

   - `openai`: The official Python library for the OpenAI API, used for interacting with OpenAI's language models and other AI services.
   
   - `pytubefix`: A maintained fork of the pytube library for downloading YouTube videos. It carries fixes for YouTube changes that broke the original pytube.
   
   - `tiktoken`: OpenAI's fast BPE (Byte Pair Encoding) tokenizer. It is used here to count how many tokens a prompt will consume.
   
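   - `moviepy`: A Python library for video editing and manipulation. The cells below use the moviepy 1.x API (`moviepy.editor` and the `targetname=` argument of `ffmpeg_extract_subclip`), which moviepy 2.0 removed. If `--upgrade` installs 2.x, pin the version with `"moviepy<2.0"`.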

Together, these packages cover the pipeline in this notebook:

- download a YouTube video;
- cut it into chunks and extract the audio;
- transcribe the audio with OpenAI's Whisper API;
- count tokens before sending the transcript to a chat model;
- turn the result into speech.

In [None]:
from kaggle_secrets import UserSecretsClient

import numpy as np
import pandas as pd
import os

from openai import OpenAI
import openai 


from pytubefix import YouTube
from pytubefix.cli import on_progress
 
from moviepy.editor import VideoFileClip
from moviepy.video.io.ffmpeg_tools import ffmpeg_extract_audio, ffmpeg_extract_subclip

import tiktoken

import IPython

This code block is importing various Python libraries and modules. Let's break it down:

1. `from kaggle_secrets import UserSecretsClient`: 
   This imports Kaggle's client for notebook secrets. It is used below to read the OpenAI API key without hard-coding it.

2. `import numpy as np`:
   Imports NumPy, a fundamental package for scientific computing in Python, with the alias 'np'.

3. `import pandas as pd`:
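   Imports Pandas, a data manipulation and analysis library, with the alias 'pd'. Neither NumPy nor Pandas is used in the cells below; they appear to be left over from Kaggle's default notebook template.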

4. `import os`:
   Imports the OS module, which provides a way to use operating system-dependent functionality.

5. `from openai import OpenAI`:
   Imports the OpenAI class from the OpenAI library, used for interacting with OpenAI's API.

6. `import openai`:
   Imports the `openai` module itself. It is not referenced in the cells below; every API call goes through the `client` object.

7. `from pytubefix import YouTube`:
   Imports the YouTube class from pytubefix, used for downloading YouTube videos.

8. `from pytubefix.cli import on_progress`:
   Imports a progress callback function from pytubefix's command-line interface module.

9. `from moviepy.editor import VideoFileClip`:
   Imports VideoFileClip from moviepy, used for video file operations.

10. `from moviepy.video.io.ffmpeg_tools import ffmpeg_extract_audio, ffmpeg_extract_subclip`:
    Imports specific functions from moviepy's ffmpeg tools for audio extraction and video clipping.

11. `import tiktoken`:
    Imports tiktoken, a fast tokenizer often used with OpenAI's models.

12. `import IPython`:
    Imports IPython. The last cell uses `IPython.display.Audio` to play the generated speech inline.

Taken together, these imports cover each step of the notebook's pipeline:

- reading the API key (`kaggle_secrets`);
- downloading a YouTube video (`pytubefix`);
- cutting the video and extracting audio (`moviepy`);
- counting tokens (`tiktoken`);
- calling OpenAI's APIs (`openai`);
- playing audio in the notebook (`IPython`).



In [None]:
class CFG:
    model1 = "gpt-4-vision-preview"
    device = 'cuda'


In [None]:
# setup OpenAI API connection
user_secrets = UserSecretsClient()
api_key=user_secrets.get_secret("openaivision")
os.environ['OPENAI_API_KEY']= api_key
client = OpenAI(api_key= api_key)

This code block sets up the connection to the OpenAI API. Let's break it down line by line:

1. `user_secrets = UserSecretsClient()`
   This creates an instance of the UserSecretsClient, which is used to securely access user secrets (like API keys) in Kaggle notebooks.

2. `api_key = user_secrets.get_secret("openaivision")`
   This retrieves the secret stored under the name "openaivision", which holds the OpenAI API key. The name is just a label chosen by the notebook's author; the same key is used below for transcription, chat, and speech requests.

3. `os.environ['OPENAI_API_KEY'] = api_key`
   This sets the `OPENAI_API_KEY` environment variable. The OpenAI Python client reads this variable by default when no `api_key` argument is given.

4. `client = OpenAI(api_key=api_key)`
   This creates an instance of the OpenAI client, explicitly passing the api_key. This client can be used to make requests to OpenAI's API.

The purpose of this code is to securely set up the OpenAI API for use in the script. It's doing so in a way that's compatible with Kaggle's notebook environment, which provides a secure way to store and access API keys without exposing them in the code.

Setting both is redundant but harmless. An `OpenAI()` client created without an `api_key` argument reads `OPENAI_API_KEY` from the environment, so either line alone would suffice. Keeping the environment variable can still help other code in the session that looks for it.

This setup lets the notebook make authenticated requests to the OpenAI endpoints used below: Whisper transcription, chat completions, and text-to-speech.
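Outside Kaggle, the `kaggle_secrets` module is not available, so the cell above fails. The sketch below lets the same setup run in both places. It tries a Kaggle secret first and falls back to the `OPENAI_API_KEY` environment variable. `load_openai_key` is a hypothetical helper; the secret label `"openaivision"` is the one used above.

```python
import os

def load_openai_key(secret_name="openaivision", env_var="OPENAI_API_KEY"):
    """Hypothetical helper: read the OpenAI key from a Kaggle secret, else from the environment."""
    try:
        # kaggle_secrets only exists inside Kaggle notebooks.
        from kaggle_secrets import UserSecretsClient
    except ImportError:
        UserSecretsClient = None
    if UserSecretsClient is not None:
        return UserSecretsClient().get_secret(secret_name)
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"No API key found; set the {env_var} environment variable.")
    return key
```

With this helper, the client would be created as `client = OpenAI(api_key=load_openai_key())`.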

# Functions

In [None]:
def num_tokens_from_message(messages, tokens_per_message = 3, tokens_per_name = 1):
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":
                num_tokens += tokens_per_name
    num_tokens += 3
    return num_tokens

This function `num_tokens_from_message` estimates how many tokens a list of chat messages will consume. The notebook uses it later to estimate a prompt's cost before calling the chat API, which bills per token. Let's break it down:

```python
def num_tokens_from_message(messages, tokens_per_message=3, tokens_per_name=1):
```
- This defines a function that takes a list of `messages` and two optional parameters:
  - `tokens_per_message`: Defaults to 3, representing a fixed token cost per message.
  - `tokens_per_name`: Defaults to 1, representing an additional token cost for names.

```python
    num_tokens = 0
```
- Initializes a counter for the total number of tokens.

```python
    for message in messages:
        num_tokens += tokens_per_message
```
- Iterates through each message, adding the fixed cost (`tokens_per_message`) for each message.

```python
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
```
- For each key-value pair in the message:
  - It encodes the value with the global `encoding` tokenizer and adds the number of resulting tokens to the count.

```python
            if key == "name":
                num_tokens += tokens_per_name
```
- If the key is "name", it adds an additional token cost specified by `tokens_per_name`.

```python
    num_tokens += 3
```
- Adds 3 more tokens at the end. In OpenAI's cookbook recipe for counting chat tokens, these account for the tokens that prime the assistant's reply.

```python
    return num_tokens
```
- Returns the total calculated number of tokens.

This function closely follows the token-counting recipe for chat models in OpenAI's cookbook. It takes into account:
1. A fixed cost per message
2. The content length of each key-value pair in the messages
3. An extra cost for 'name' fields
4. A small fixed overhead

The `encoding` object is not defined in this cell. It is a global created later in the notebook with `tiktoken.get_encoding("cl100k_base")`. Python looks up globals when the function is called, so the function works as long as that cell runs before the function is used.

This kind of function is useful for estimating costs or ensuring that API requests stay within token limits before sending them to the API.
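The function reads a global `encoding` that is only created further down the notebook. The minimal sketch below takes the tokenizer as an argument instead, which makes that dependency explicit while keeping the same counting rules. Notes on the sketch:

- `num_tokens_from_messages` (plural) is a hypothetical name.
- `encoding` can be any object with an `encode` method that returns a list of tokens, such as `tiktoken.get_encoding("cl100k_base")`.

```python
def num_tokens_from_messages(messages, encoding, tokens_per_message=3, tokens_per_name=1):
    """Estimate prompt tokens for a list of chat messages.

    `encoding` is any object whose encode(str) returns a list of tokens.
    """
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message          # per-message formatting overhead
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":
                num_tokens += tokens_per_name     # extra cost when a name is given
    num_tokens += 3                               # tokens that prime the assistant's reply
    return num_tokens
```

Calling `num_tokens_from_messages(muh_messages, tiktoken.get_encoding("cl100k_base"))` would reproduce the notebook's count.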

# Transcribe a video

In [None]:
video_url = "https://www.youtube.com/watch?v=ssGKniyIUGk"
    
    
# Transcribe a large video
yt = YouTube(video_url, on_progress_callback = on_progress)
print(yt.title)
 
ys = yt.streams.get_highest_resolution()
ys.download()

This code is using the `pytubefix` library to download a YouTube video. Let's break it down:

```python
video_url = "https://www.youtube.com/watch?v=ssGKniyIUGk"
```
This line defines the URL of the YouTube video to be downloaded. In this case, it's a specific video with the ID "ssGKniyIUGk".

```python
yt = YouTube(video_url, on_progress_callback=on_progress)
```
This creates a `YouTube` object from the `pytubefix` library. It takes two arguments:
1. `video_url`: The URL of the video to be processed.
2. `on_progress_callback=on_progress`: This sets a callback function to be called to report download progress. The `on_progress` function was imported earlier from `pytubefix.cli`.

```python
print(yt.title)
```
This line prints the title of the YouTube video. The `YouTube` object retrieves this information from the video's metadata.

```python
ys = yt.streams.get_highest_resolution()
```
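This line selects the stream to download. The `streams` attribute of the `YouTube` object lists all available streams (different formats and qualities). `get_highest_resolution()` picks the highest-resolution progressive stream, i.e. one that contains both video and audio in a single file. On YouTube, progressive streams typically top out at 720p, so this is not necessarily the best quality the video offers.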

```python
ys.download()
```
This final line starts the download of the selected stream. By default, it will save the video in the current working directory with the video's title as the filename.

In summary, this code does the following:
1. Specifies a YouTube video URL.
2. Creates a `YouTube` object for that video, setting up progress reporting.
3. Prints the video's title.
4. Selects the highest quality version of the video available.
5. Downloads that version of the video.


In [None]:
# brutal override - YT videos typically have emojis and whatnot - this is 2024 
!cp *mp4 ./muhfile.mp4
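`cp *mp4 ./muhfile.mp4` works only when exactly one `.mp4` file is in the working directory. On a re-run, the earlier `muhfile.mp4` and the `out_video*.mp4` chunks also match the glob. `cp` with several sources then fails because the target is not a directory.

The minimal sketch below assumes the download lands in the current directory. It picks the single downloaded file and copies it with the standard library. `copy_downloaded_video` is a hypothetical helper. Alternatively, pytubefix's `Stream.download()` also takes a `filename` argument, so `ys.download(filename="muhfile.mp4")` avoids the copy altogether.

```python
import glob
import os
import shutil

def copy_downloaded_video(target="muhfile.mp4", directory=".", exclude_prefixes=("out_video",)):
    """Copy the one downloaded .mp4 in `directory` to `target`, ignoring earlier outputs."""
    target_path = os.path.join(directory, target)
    candidates = [
        path for path in glob.glob(os.path.join(directory, "*.mp4"))
        if os.path.basename(path) != target                      # skip a previous copy
        and not os.path.basename(path).startswith(exclude_prefixes)  # skip chunk files
    ]
    if len(candidates) != 1:
        raise RuntimeError(f"Expected exactly one downloaded .mp4, found {len(candidates)}: {candidates}")
    shutil.copyfile(candidates[0], target_path)
    return target_path
```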

In [None]:
input_video_file = "muhfile.mp4"

video = VideoFileClip(input_video_file)
video_duration = video.duration
print('time: ' + str(video_duration) + " seconds")
print('estimated cost: ' + str(video_duration * 0.006/60))

This code snippet is using the `moviepy` library to analyze a video file and estimate the cost of processing it. Let's break it down line by line:

```python
input_video_file = "muhfile.mp4"
```
This line defines the name of the input video file. In this case, it's a file named "muhfile.mp4" which should be located in the same directory as the script (or the full path should be provided).

```python
video = VideoFileClip(input_video_file)
```
This creates a `VideoFileClip` object from the `moviepy` library. It loads the video file specified by `input_video_file`. This object allows various operations and analyses on the video.

```python
video_duration = video.duration
```
This retrieves the duration of the video in seconds and stores it in the `video_duration` variable.

```python
print('time: ' + str(video_duration) + " seconds")
```
This line prints the duration of the video in seconds.

```python
print('estimated cost: ' + str(video_duration * 0.006/60))
```
This line calculates and prints an estimated transcription cost based on the video duration:

- `video_duration * 0.006/60`: the division by 60 converts the duration from seconds to minutes, and `0.006` is the rate in dollars per minute.
- $0.006 per minute was OpenAI's published price for the Whisper API (`whisper-1`) when the notebook was written. The result is therefore the estimated cost of transcribing the video's audio. Prices change, so check the current rate.

Overall, this code snippet is:
1. Loading a video file
2. Determining its duration
3. Printing the duration
4. Calculating and printing an estimated cost based on the duration


In [None]:
# file size in bytes
file_size_bytes = os.path.getsize(input_video_file)
file_size_mb = file_size_bytes / (1024 * 1024)
print(f"video size: {file_size_mb:.2f} MB")

This code snippet is used to determine and print the file size of the video. Let's break it down line by line:

```python
file_size_bytes = os.path.getsize(input_video_file)
```
- `os.path.getsize()` is a function from the `os` module that returns the size of a file in bytes.
- `input_video_file` is the path to the video file (which was defined earlier as "muhfile.mp4").
- This line gets the size of the video file in bytes and stores it in `file_size_bytes`.

```python
file_size_mb = file_size_bytes / (1024 * 1024)
```
- This line converts the file size from bytes to megabytes (MB).
- The conversion is done by dividing the number of bytes by 1024 twice:
  - First 1024 converts bytes to kilobytes
  - Second 1024 converts kilobytes to megabytes
- The result is stored in `file_size_mb`.

```python
print(f"video size: {file_size_mb:.2f} MB")
```
- This line prints the file size in MB.
- It uses an f-string (formatted string literal) for easy formatting.
- `{file_size_mb:.2f}` formats `file_size_mb` to display with 2 decimal places.
- The output will look something like: "video size: 123.45 MB"

In summary, this code snippet:
1. Gets the size of the video file in bytes
2. Converts the size from bytes to megabytes
3. Prints the size in megabytes, rounded to two decimal places

This information can be useful for various reasons, such as:
- Estimating storage requirements
- Checking if the file size is within acceptable limits for processing
- Providing information to the user about the video they're working with
- Potentially using the file size to estimate processing time or costs for certain operations



In [None]:
# max size is 25 MB 
# -> https://community.openai.com/t/whisper-api-how-to-upload-file-that-larger-than-25mb/693285
limit_size_mb = 25

# how many chunks will we need?
nof_chunks = int(file_size_mb /limit_size_mb )  + 1
print("We will need: " + str(nof_chunks) + " chunks")

# min duration per chunk
split = video_duration // nof_chunks

This code snippet is calculating how to split a video file into chunks that meet a specific size limit. Let's break it down:

```python
limit_size_mb = 25
```
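This sets the maximum allowed size for each chunk to 25 MB, the file size limit of OpenAI's Whisper API (see the forum thread linked in the comment). The calculation uses the size of the video file, while only the extracted MP3 audio is uploaded. Audio is much smaller than video, so this estimate is conservative and may produce more chunks than strictly necessary.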

```python
nof_chunks = int(file_size_mb / limit_size_mb) + 1
```
This calculates the number of chunks needed:
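- `file_size_mb / limit_size_mb` divides the total file size by the size limit.
- `int()` truncates the result toward zero (for positive numbers, this rounds down).
- `+ 1` rounds up so that any remainder gets its own chunk, and guarantees at least one chunk.

For example, a 30 MB file gives 2 chunks: 30 / 25 = 1.2, which truncates to 1, plus 1 gives 2.

The formula over-counts by one when the size is an exact multiple of the limit. A 50 MB file gets 3 chunks, although 2 would do. `math.ceil(file_size_mb / limit_size_mb)` gives the exact count.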

```python
print("We will need: " + str(nof_chunks) + " chunks")
```
This prints the number of chunks that will be needed.

```python
split = video_duration // nof_chunks
```
This calculates the duration of each chunk:
- `video_duration` is the total duration of the video (calculated earlier).
- `//` is floor division. With a float duration it returns a float rounded down to a whole number (e.g. `160.5 // 3` is `53.0`).
- This divides the total duration evenly among the chunks.

For example, if the video is 100 seconds long and we need 2 chunks, each chunk would be 50 seconds.

In summary, this code is doing the following:
1. Setting a maximum chunk size of 25 MB (based on an API limitation).
2. Calculating how many chunks are needed to split the video file into pieces no larger than 25 MB.
3. Printing the number of chunks needed.
4. Calculating how long each chunk should be in terms of video duration.

This approach allows processing of large video files by breaking them into smaller pieces that can be handled by APIs or processes with file size limitations. It's a common technique when working with large media files, especially when using services with upload size restrictions.
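The minimal sketch below computes the chunk count with `math.ceil`, which rounds up only when there is a remainder. Notes on the sketch:

- It keeps the notebook's assumption that file size grows roughly linearly with duration, which real video only approximates.
- `count_chunks` is a hypothetical helper name.

```python
import math

def count_chunks(file_size_mb, limit_size_mb=25):
    """Smallest number of equal-duration chunks whose average size fits the limit."""
    if file_size_mb <= 0:
        return 1
    return max(1, math.ceil(file_size_mb / limit_size_mb))
```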

In [None]:
# split the video
split_time = []
for ii in range(nof_chunks):
    split_time.append(split * ii)
split_time.append(video_duration)


This code snippet is creating a list of time points to split the video. Let's break it down:

```python
split_time = []
```
This initializes an empty list called `split_time` that will store the time points for splitting the video.

```python
for ii in range(nof_chunks):
    split_time.append(split * ii)
```
This is a loop that runs `nof_chunks` times (remember, `nof_chunks` was calculated earlier based on the file size).

In each iteration:
- `ii` goes from 0 to `nof_chunks - 1`
- `split * ii` calculates a time point
- This time point is appended to the `split_time` list

For example, if `split` is 50 seconds and `nof_chunks` is 3, this loop will add the following to `split_time`:
- When `ii` is 0: 0 * 50 = 0 seconds
- When `ii` is 1: 1 * 50 = 50 seconds
- When `ii` is 2: 2 * 50 = 100 seconds

```python
split_time.append(video_duration)
```
After the loop, this line adds the total duration of the video to the `split_time` list.

The purpose of this code is to create a list of time points that can be used to split the video into chunks. Here's what the `split_time` list represents:

- The first element (index 0) is always 0, representing the start of the video.
- The last element is the total duration of the video.
- The elements in between represent the starting points of each chunk.

For example, if the video is 160 seconds long and we're splitting it into 3 chunks, `split_time` might look like this:
`[0, 53, 106, 160]`

This means:
- The first chunk goes from 0 to 53 seconds
- The second chunk goes from 53 to 106 seconds
- The third chunk goes from 106 to 160 seconds

This list of split points can then be used with video processing tools (like `moviepy`, which was imported earlier) to actually split the video into separate files or process it in chunks.
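The same boundaries can be built with a list comprehension and paired into `(start, end)` ranges with `zip`; these ranges are what the extraction loop iterates over. This is a minimal sketch, and the function names are hypothetical.

```python
def split_points(video_duration, nof_chunks):
    """Return chunk boundaries [0, s, 2s, ..., video_duration] in seconds."""
    step = video_duration // nof_chunks          # floored chunk length, as in the notebook
    points = [step * i for i in range(nof_chunks)]
    points.append(video_duration)                # the last chunk absorbs the remainder
    return points

def chunk_ranges(points):
    """Pair adjacent boundaries into (start, end) tuples."""
    return list(zip(points[:-1], points[1:]))

print(chunk_ranges(split_points(160, 3)))  # prints [(0, 53), (53, 106), (106, 160)]
```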

In [None]:
# convert to .mp3
for ii in range(len(split_time) -1):
    out_video = "out_video"+str(ii)+".mp4"
    ffmpeg_extract_subclip(input_video_file, split_time[ii], split_time[ii+1], targetname = out_video)
    ffmpeg_extract_audio(out_video, "output_video_" + str(ii) + ".mp3")
    
video.reader.close()

This code is processing the video by splitting it into chunks and converting each chunk to an MP3 audio file. Let's break it down:

```python
for ii in range(len(split_time) - 1):
```
This loop runs for each chunk of the video. It goes from 0 to one less than the length of `split_time` because we're using each pair of adjacent times in `split_time` to define a chunk.

```python
    out_video = "out_video"+str(ii)+".mp4"
```
This creates a unique filename for each video chunk, like "out_video0.mp4", "out_video1.mp4", etc.

```python
    ffmpeg_extract_subclip(input_video_file, split_time[ii], split_time[ii+1], targetname = out_video)
```
This function (from `moviepy.video.io.ffmpeg_tools`) extracts a portion of the video:
- `input_video_file` is the original video file
- `split_time[ii]` is the start time of this chunk
- `split_time[ii+1]` is the end time of this chunk
- `targetname = out_video` specifies the output file name

This creates a new video file for each chunk.

```python
    ffmpeg_extract_audio(out_video, "output_video_" + str(ii) + ".mp3")
```
This function extracts the audio from the video chunk and saves it as an MP3 file:
- `out_video` is the input (the video chunk we just created)
- `"output_video_" + str(ii) + ".mp3"` is the output file name (e.g., "output_video_0.mp3")

```python
video.reader.close()
```
After the loop, this line closes the file reader opened earlier by `VideoFileClip`, which releases the underlying ffmpeg process. Calling `video.close()` instead would also close the clip's audio reader.

In summary, this code does the following for each chunk of the video:
1. Creates a unique filename for the video chunk
2. Extracts that chunk from the original video file
3. Extracts the audio from that chunk and saves it as an MP3 file

After processing all chunks, it closes the original video file.

Splitting the video this way keeps each uploaded file under the Whisper API's 25 MB limit mentioned earlier.

The resulting MP3 files are transcribed in the next cell.
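Only the audio is transcribed, so the intermediate `.mp4` chunks are not strictly needed. The minimal sketch below swaps in a direct `ffmpeg` call for moviepy's helpers. The command seeks to the chunk start, drops the video stream, and encodes MP3 at a low bitrate so each chunk stays well under the upload limit.

Assumptions in the sketch:

- The `ffmpeg` binary is on `PATH`.
- `audio_chunk_cmd` and `extract_audio_chunks` are hypothetical helpers.
- The 64 kbit/s bitrate is an illustrative choice.

The output names match the notebook's `output_video_<i>.mp3`, so the transcription cell works unchanged.

```python
import subprocess

def audio_chunk_cmd(src, start, end, dst, bitrate="64k"):
    """Build an ffmpeg command that writes seconds [start, end) of src's audio to dst as MP3."""
    return [
        "ffmpeg", "-y",              # overwrite dst if it already exists
        "-ss", f"{start:.2f}",       # seek to the chunk start (input seeking)
        "-i", src,
        "-t", f"{end - start:.2f}",  # chunk duration
        "-vn",                       # drop the video stream
        "-acodec", "libmp3lame",     # encode the audio as MP3
        "-b:a", bitrate,
        dst,
    ]

def extract_audio_chunks(src, points, run=subprocess.run):
    """Write one MP3 per adjacent pair of boundaries; return the output file names."""
    outputs = []
    for ii, (start, end) in enumerate(zip(points[:-1], points[1:])):
        dst = f"output_video_{ii}.mp3"
        run(audio_chunk_cmd(src, start, end, dst), check=True)  # raises if ffmpeg fails
        outputs.append(dst)
    return outputs
```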

In [None]:
# transcribe with Whisper
transcript_all = ""
for ii in range(len(split_time) - 1):
    output_audio = "output_video_" + str(ii) + ".mp3"
    audio_file = open(output_audio, "rb")
    transcript = client.audio.transcriptions.create(model = "whisper-1", file = audio_file)
    transcript_all += transcript.text


This code is using OpenAI's Whisper model to transcribe the audio chunks created in the previous steps. Let's break it down:

```python
transcript_all = ""
```
This initializes an empty string to store the complete transcript of all audio chunks.

```python
for ii in range(len(split_time) - 1):
```
This loop iterates over each audio chunk. The range is the same as in the previous code snippet, corresponding to each chunk of the original video.

```python
    output_audio = "output_video_" + str(ii) + ".mp3"
```
This creates the filename for each MP3 audio chunk, matching the names created in the previous step.

```python
    audio_file = open(output_audio, "rb")
```
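This opens the audio file in binary read mode ("rb"), because the OpenAI client uploads the file as raw bytes. The file is never closed in this loop. Opening it with `with open(output_audio, "rb") as audio_file:` would close it after each request.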

```python
    transcript = client.audio.transcriptions.create(
        model="whisper-1", 
        file=audio_file
    )
```
This uses the OpenAI client (set up earlier) to transcribe the audio:
- `model="whisper-1"` specifies the Whisper model to use for transcription.
- `file=audio_file` provides the opened audio file to the API.

The API call returns a transcript object.

```python
    transcript_all += transcript.text
```
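This appends the transcribed text from this chunk to the `transcript_all` string, building up the complete transcript. No separator is added, so the last word of one chunk can run into the first word of the next. Appending `" " + transcript.text` avoids this.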

In summary, this code:
1. Iterates through each audio chunk
2. Opens each MP3 file
3. Sends it to OpenAI's Whisper API for transcription
4. Appends the resulting transcription to a complete transcript string

The result is a single string (`transcript_all`) containing the full transcript of the entire video, created by concatenating the transcriptions of each audio chunk.

This approach allows for transcribing long videos that might exceed API limits if sent as a single file. It processes each chunk separately and combines the results, which is useful for handling large videos or working within API constraints.

After this process, `transcript_all` would contain the complete transcription of the original video, which could then be used for further processing, analysis, or display to the user.
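The minimal sketch below runs the same loop but closes each file after its request and puts a space between chunk transcripts. The transcription call is passed in as a function, so the loop can be tested without the API. `transcribe_chunks` is a hypothetical helper.

```python
def transcribe_chunks(paths, transcribe, separator=" "):
    """Transcribe each audio file in order and join the texts.

    `transcribe` takes an open binary file and returns its text.
    """
    texts = []
    for path in paths:
        with open(path, "rb") as audio_file:   # closed even if the API call raises
            texts.append(transcribe(audio_file).strip())
    return separator.join(texts)
```

In the notebook, `transcribe` would be `lambda f: client.audio.transcriptions.create(model="whisper-1", file=f).text`. `paths` would be `[f"output_video_{ii}.mp3" for ii in range(len(split_time) - 1)]`.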

In [None]:
# save to local file to be used later
with open("local_transcript.txt", "w", encoding="utf-8") as file:
    file.write(transcript_all)

In [None]:
# tokenizer for num_tokens_from_message (cl100k_base is the encoding used by gpt-3.5-turbo and gpt-4)
encoding = tiktoken.get_encoding("cl100k_base")

In [None]:
# estimate the cost

muh_messages = [
        {
            "role": "user",
            "content":f"give me the key topics from this interview {transcript_all}" 
        }
]

nbr = num_tokens_from_message(muh_messages)
print("prompt tokens will be: " + str(nbr))

print("price estimate for GPT3.5: " + str( nbr * 0.0010 / 1000))
print("price estimate for GPT4: " + str( nbr * 0.03 / 1000))



This cell estimates how many tokens the summarization prompt will use and what it will cost:

1. First, a list called `muh_messages` is created. This list contains a single dictionary with two key-value pairs:
   - "role": "user"
   - "content": A string that includes "give me the key topics from this interview" followed by the content of `transcript_all`

2. The variable `nbr` is assigned the result of `num_tokens_from_message(muh_messages)`. This function, defined in the Functions section, estimates the prompt's token count using the `encoding` set in the previous cell.

3. The code then prints three lines:

   a. The number of prompt tokens:
      ```python
      print("prompt tokens will be: " + str(nbr))
      ```
      This displays the number of tokens calculated in step 2.

   b. The estimated price for using GPT-3.5:
      ```python
      print("price estimate for GPT3.5: " + str( nbr * 0.0010 / 1000))
      ```
      This converts the token count to thousands of tokens (`nbr / 1000`) and multiplies by 0.0010, the price in dollars per 1,000 tokens, giving the cost in dollars.

   c. The estimated price for using GPT-4:
      ```python
      print("price estimate for GPT4: " + str( nbr * 0.03 / 1000))
      ```
      Similar to the GPT-3.5 calculation, but uses 0.03 (which represents $0.03 per 1000 tokens) for GPT-4.

Overall, this cell estimates the cost of the prompt for GPT-3.5 and GPT-4 from its token count. The rates appear to be snapshots of OpenAI's input-token prices from when the notebook was written, and they change over time. They cover only the prompt; output tokens are billed separately. At these rates GPT-4 costs 30 times as much per token as GPT-3.5.
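The token count is also useful for checking that the prompt fits the model's context window before sending it. Notes on the sketch below:

- `estimate_prompt_cost` and `fits_context` are hypothetical helpers.
- The default of 1,000 tokens reserved for the reply is an illustrative assumption.
- Pass the context window documented for the model you actually call.

```python
def estimate_prompt_cost(n_tokens, usd_per_1k_tokens):
    """Cost in dollars of n_tokens input tokens at a given price per 1,000 tokens."""
    return n_tokens / 1000 * usd_per_1k_tokens

def fits_context(prompt_tokens, context_window, reserved_for_reply=1000):
    """True if the prompt leaves at least `reserved_for_reply` tokens for the model's answer."""
    return prompt_tokens + reserved_for_reply <= context_window

# Rates copied from the cell above (input tokens only; they change over time).
for model, rate in {"gpt-3.5-turbo": 0.0010, "gpt-4": 0.03}.items():
    print(f"{model}: ${estimate_prompt_cost(12_000, rate):.4f}")
```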


In [None]:
response = client.chat.completions.create(
    model = "gpt-3.5-turbo",
    messages = muh_messages,
    
)

print(response.choices[0].message.content)


The code uses the OpenAI client to send the prepared messages to a chat model. Let's go through it line by line:

1. ```python
   response = client.chat.completions.create(
   ```
   This line calls the `create` method of the client's `chat.completions` resource. It generates a chat completion, i.e. the model's reply to the messages.

2. ```python
   model = "gpt-3.5-turbo",
   ```
   This specifies which model to use. In this case, it's "gpt-3.5-turbo", one of OpenAI's chat models, which is faster and cheaper than GPT-4.

3. ```python
   messages = muh_messages,
   ```
   This passes the `muh_messages` list from the previous cell as the model's input. It contains the user's request along with the transcript.

4. ```python
   )
   ```
   This closing parenthesis ends the `create` method call.

5. ```python
   print(response.choices[0].message.content)
   ```
   After getting the response from the API:
   - `response.choices` is a list of generated replies.
   - `[0]` selects the first one. There is only one unless the request sets `n` to a value greater than 1.
   - `.message.content` accesses the text of that reply.
   - The `print` function then outputs this text.

In summary, this code is sending a request to the GPT-3.5-turbo model with the previously prepared messages, and then printing out the model's response. The response likely contains the key topics from the interview that were requested in the input message.
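Each choice in a Chat Completions response also carries a `finish_reason`; the value `"length"` means the reply was cut off at the token limit. The minimal sketch below returns the reply text and warns in that case. `first_message_text` is a hypothetical helper.

```python
import warnings

def first_message_text(response):
    """Return the first choice's text; warn if generation stopped at the token limit."""
    choice = response.choices[0]
    if choice.finish_reason == "length":
        warnings.warn("The reply was truncated at the max token limit.")
    return choice.message.content
```

In the notebook, `print(first_message_text(response))` would replace the `print` call above.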


# Translate to Polish

In [None]:
msg_to_translate = response.choices[0].message.content

resp_translation = client.chat.completions.create(
    model = "gpt-3.5-turbo",
    messages = [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": f"Translate to Polish this text: {msg_to_translate}"}
    ]
    
)

translation = resp_translation.choices[0].message.content

1. ```python
   msg_to_translate = response.choices[0].message.content
   ```
   This line extracts the content of the message from the previous API response. It's taking the content that we printed in the last code snippet and storing it in a new variable called `msg_to_translate`.

2. ```python
   resp_translation = client.chat.completions.create(
   ```
   This line starts another API call to the chat completions endpoint. This time, it's being used to perform a translation.

3. ```python
   model = "gpt-3.5-turbo",
   ```
   Again, we're using the GPT-3.5-turbo model for this task.

4. ```python
   messages = [
       {"role": "system", "content": "You are a helpful assistant"},
       {"role": "user", "content": f"Translate to Polish this text: {msg_to_translate}"}
   ]
   ```
   This creates a new list of messages for the API call:
   - The first message sets the system role, defining the AI as a helpful assistant.
   - The second message is the user's request, asking for a translation to Polish. It includes the text to be translated (`msg_to_translate`) using an f-string.

5. ```python
   )
   ```
   This closes the `create` method call.

6. ```python
   translation = resp_translation.choices[0].message.content
   ```
   Finally, this line extracts the translated text from the API response and stores it in a variable called `translation`.

In summary, this code is taking the result from the previous API call (which likely contained the key topics from an interview), and then making another API call to translate that content into Polish. The translated text is then stored in the `translation` variable.

This approach allows for a two-step process: first extracting information, then translating it, all using the same GPT-3.5-turbo model but with different prompts.
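Both chat calls follow the same pattern, so they can share a small helper. The minimal sketch below builds the message list, sends it, and returns the reply text. `ask` is a hypothetical name, and `client` is the `OpenAI` instance created earlier.

```python
def ask(client, prompt, model="gpt-3.5-turbo", system=None):
    """Send one user prompt (optionally with a system message) and return the reply text."""
    messages = []
    if system is not None:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content
```

The two steps would then read as follows:

- `key_topics = ask(client, f"give me the key topics from this interview {transcript_all}")`
- `translation = ask(client, f"Translate to Polish this text: {key_topics}", system="You are a helpful assistant")`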

In [None]:
print(translation)

# Talk back

In [None]:
voice = 'nova'

speech_file_path = f"text_{voice}.mp3"

response_audio =  client.audio.speech.create(
    model = "tts-1",
    voice = voice,
    input = translation
    )

response_audio.stream_to_file(speech_file_path)



1. ```python
   voice = 'nova'
   ```
   This line sets `voice` to 'nova', one of the built-in voices of OpenAI's text-to-speech API (others include 'alloy', 'echo', 'fable', 'onyx', and 'shimmer').

2. ```python
   speech_file_path = f"text_{voice}.mp3"
   ```
   This creates a file path for the output audio file. It uses an f-string to incorporate the `voice` variable into the filename. For example, with `voice` set to 'nova', the file path would be "text_nova.mp3".

3. ```python
   response_audio = client.audio.speech.create(
   ```
   This line starts a call to OpenAI's text-to-speech endpoint through the `client` created earlier.

4. ```python
   model = "tts-1",
   ```
   This specifies the text-to-speech model to use. "tts-1" is likely a identifier for a specific text-to-speech model.

5. ```python
   voice = voice,
   ```
   This passes the `voice` variable we set earlier as a parameter, specifying which voice to use for the speech synthesis.

6. ```python
   input = translation
   ```
   This provides the text to be converted to speech. It uses the `translation` variable, which from the previous code snippet contains the Polish translation of the original text.

7. ```python
   )
   ```
   This closes the `create` method call.

8. ```python
   response_audio.stream_to_file(speech_file_path)
   ```
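   This line writes the audio returned by the API to the file at `speech_file_path`. Recent versions of the `openai` package flag `stream_to_file` on this response as deprecated, since it does not actually stream. They suggest `client.audio.speech.with_streaming_response.create(...)` instead.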

In summary, this code is taking the translated text (which is in Polish based on the previous snippet) and converting it to speech using a text-to-speech API. It's using a voice model called 'nova' and saving the resulting audio as an MP3 file named "text_nova.mp3".

This completes the pipeline: download the video, transcribe its audio with Whisper, extract the key topics with a chat model, translate them to Polish, and convert the Polish text to speech.
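OpenAI's speech endpoint caps the length of `input` (4,096 characters, per OpenAI's API reference at the time of writing), so a long translation may need to be split. The minimal sketch below splits on sentence-ending punctuation and packs sentences into pieces under a character budget. Each piece would then go to `client.audio.speech.create` separately, producing one MP3 per piece.

Notes on the sketch:

- `split_for_tts` is hypothetical.
- The regex-based sentence split is a simplification.

```python
import re

def split_for_tts(text, max_chars=4096):
    """Split text into pieces of at most max_chars, preferring sentence boundaries."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pieces, current = [], ""
    for sentence in sentences:
        # A single over-long sentence is hard-wrapped at max_chars.
        while len(sentence) > max_chars:
            if current:
                pieces.append(current)
                current = ""
            pieces.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate          # the sentence still fits in the current piece
        else:
            pieces.append(current)       # close the current piece and start a new one
            current = sentence
    if current:
        pieces.append(current)
    return pieces
```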


In [None]:
IPython.display.Audio(speech_file_path)