https://platform.openai.com/docs/guides/audio

# Audio APIs

OpenAI offers several APIs for working with audio.

### General-use APIs vs. specialized APIs

The main distinction is between general-use APIs and specialized APIs.

- With the Realtime and Chat Completions APIs, you can use our latest models' native audio understanding and generation capabilities and combine them with other features like function calling. These APIs can be used for a wide range of use cases, and you can select the model you want to use.

- The Transcription, Translation, and Speech APIs, on the other hand, are specialized: each works with a specific set of models and serves a single purpose.

In [1]:
import os
from pprint import pprint

import pandas as pd
from openai import OpenAI

In [2]:
# OpenAI() reads the key from the OPENAI_API_KEY environment variable;
# set it in your shell instead of hardcoding it in the notebook.
assert "OPENAI_API_KEY" in os.environ, "Set the OPENAI_API_KEY environment variable first"

client = OpenAI()

Speech-to-text guide: https://platform.openai.com/docs/guides/speech-to-text

The Audio API provides two speech-to-text endpoints, `transcriptions` and `translations`. They can be used to:

- Transcribe audio in whatever language it is spoken in.
- Translate and transcribe the audio into English.

The `model` parameter sets the ID of the model to use. For transcriptions, the options are `gpt-4o-transcribe`, `gpt-4o-mini-transcribe`, and `whisper-1` (which is powered by our open-source Whisper V2 model).
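The accepted model IDs differ per endpoint, so a small lookup table can catch a wrong model before any audio is uploaded. This is a sketch: `SUPPORTED_MODELS` and `check_model` are hypothetical names, the transcription and speech entries come from this notebook, and the translations entry assumes that endpoint accepts only `whisper-1`. Check the API reference for the current lists.

```python
# Hypothetical lookup of model IDs per audio endpoint (assumption: lists taken
# from this notebook; they may change, so verify against the API reference).
SUPPORTED_MODELS = {
    "transcriptions": {"gpt-4o-transcribe", "gpt-4o-mini-transcribe", "whisper-1"},
    "translations": {"whisper-1"},  # assumption: Whisper only
    "speech": {"tts-1", "tts-1-hd", "gpt-4o-mini-tts"},
}


def check_model(endpoint: str, model: str) -> None:
    """Raise ValueError if `model` is not listed for `endpoint`."""
    if endpoint not in SUPPORTED_MODELS:
        raise ValueError(f"unknown endpoint: {endpoint!r}")
    if model not in SUPPORTED_MODELS[endpoint]:
        allowed = ", ".join(sorted(SUPPORTED_MODELS[endpoint]))
        raise ValueError(f"{model!r} is not available for {endpoint}; use one of: {allowed}")
```

Calling `check_model("translations", GPT_TRANSCRIBE_MODEL)` before a request would fail fast with a readable message instead of an API error.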

In [17]:
WHISPER_MODEL = "whisper-1"

GPT_TRANSCRIBE_MODEL = "gpt-4o-transcribe"

## Transcription

The sample audio files were generated with ElevenLabs text-to-speech: https://elevenlabs.io/text-to-speech

In [4]:
# Open the file in a context manager so the handle is closed after the request
with open("audio/nature_usaccent_female.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model = WHISPER_MODEL, file = audio_file
    )

transcript

Transcription(text="The ancient forest breathes with primordial life, its canopy a cathedral of emerald and jade. Sunlight filters through the leaves, dappling the forest floor with liquid gold. Moss-covered logs, soft as velvet, crumble beneath cautious steps, releasing the earthy scent of decay and renewal. A creek babbles nearby, its crystal waters tumbling over smooth stones, creating a soothing melody that underlies the forest's symphony.", logprobs=None)

In [5]:
print(transcript.text)

The ancient forest breathes with primordial life, its canopy a cathedral of emerald and jade. Sunlight filters through the leaves, dappling the forest floor with liquid gold. Moss-covered logs, soft as velvet, crumble beneath cautious steps, releasing the earthy scent of decay and renewal. A creek babbles nearby, its crystal waters tumbling over smooth stones, creating a soothing melody that underlies the forest's symphony.


https://platform.openai.com/docs/api-reference/audio/createTranscription

In [10]:
def transcribe(audio_file, model, response_format="text", language=None):
    """Transcribe the file at path `audio_file`; on failure, return the error message."""
    try:
        # Use a separate name for the handle so the path argument is not shadowed
        with open(audio_file, "rb") as f:
            response = client.audio.transcriptions.create(
                file = f,
                model = model,
                response_format = response_format,
                language = language
            )

        return response
    except Exception as e:
        return str(e)
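The `transcribe` helper always forwards `language`, even when it is `None`. That worked in this notebook, but to avoid depending on how the SDK serializes `None`, one option is to pass only the arguments that are actually set. A minimal sketch; `build_transcription_kwargs` is a hypothetical helper, not part of the OpenAI SDK:

```python
from typing import Any, Dict, Optional


def build_transcription_kwargs(
    model: str,
    response_format: str = "text",
    language: Optional[str] = None,
) -> Dict[str, Any]:
    """Return keyword arguments for a transcription request, omitting unset options."""
    candidates = {
        "model": model,
        "response_format": response_format,
        "language": language,  # e.g. "en"; left out entirely when None
    }
    # Keep only the options that were actually provided
    return {key: value for key, value in candidates.items() if value is not None}
```

Inside `transcribe`, the request would then become `client.audio.transcriptions.create(file=f, **build_transcription_kwargs(model, response_format, language))`.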

In [11]:
audio_file = 'audio/llms_ukaccent_female.mp3'

transcription = transcribe(audio_file, model=GPT_TRANSCRIBE_MODEL)

print(transcription)

Large language models (LLMs) have revolutionized the field of artificial intelligence, captivating both tech enthusiasts and the general public. These sophisticated AI systems, trained on vast amounts of text data, can generate human-like text, answer questions, and perform a wide range of language-related tasks with remarkable proficiency.



An SRT file (short for SubRip Subtitle file) is a plain-text subtitle file. Each entry holds a sequence number, start and end timecodes that keep the subtitle in sync with the audio, and the subtitle text itself.

https://blog.hubspot.com/marketing/srt-file
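Because SRT is plain text with a fixed layout (sequence number, an `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, subtitle text, blank line), an `srt` response can be parsed with the standard library alone. A minimal sketch; `Cue` and `parse_srt` are hypothetical names, and the parser assumes mostly well-formed input, silently skipping blocks that do not match:

```python
import re
from dataclasses import dataclass
from typing import List

# Timing line of an SRT cue: "HH:MM:SS,mmm --> HH:MM:SS,mmm"
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:
    index: int
    start: float  # seconds
    end: float    # seconds
    text: str


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(srt: str) -> List[Cue]:
    """Split SRT text into cues: sequence number, start/end time, subtitle text."""
    cues = []
    # Cues are separated by one or more blank lines
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if len(lines) < 2:
            continue
        match = TIMING.match(lines[1])
        if not lines[0].isdigit() or match is None:
            continue  # skip malformed blocks
        g = match.groups()
        text = " ".join(lines[2:])  # multi-line subtitles are joined with spaces
        cues.append(Cue(int(lines[0]), _seconds(*g[:4]), _seconds(*g[4:]), text))
    return cues
```

For example, `" ".join(cue.text for cue in parse_srt(transcription))` recovers the plain transcript from an `srt` response, while `cue.start` and `cue.end` give each subtitle's timing in seconds.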

#### The gpt-4o-transcribe model does not support the srt format
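Since `gpt-4o-transcribe` rejects `srt`, a helper can fall back to Whisper whenever a subtitle format is requested. A sketch under assumptions: the format sets below reflect this notebook's results (`text` worked with `gpt-4o-transcribe`, `srt` did not) plus the formats documented for `whisper-1` at the time of writing, and `pick_model_for_format` is a hypothetical name:

```python
# Assumed response formats per transcription model; verify against the API reference.
WHISPER_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}
GPT4O_FORMATS = {"json", "text"}

RESPONSE_FORMATS = {
    "whisper-1": WHISPER_FORMATS,
    "gpt-4o-transcribe": GPT4O_FORMATS,
    "gpt-4o-mini-transcribe": GPT4O_FORMATS,
}


def pick_model_for_format(response_format: str, preferred: str = "gpt-4o-transcribe") -> str:
    """Return `preferred` if it supports `response_format`, else fall back to whisper-1."""
    if response_format in RESPONSE_FORMATS.get(preferred, set()):
        return preferred
    if response_format in WHISPER_FORMATS:
        return "whisper-1"
    raise ValueError(f"unsupported response_format: {response_format!r}")
```

With this, `transcribe(audio_file, model=pick_model_for_format("srt"), response_format="srt")` would select `whisper-1`, matching the choice made by hand in the next cell.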

In [13]:
audio_file = 'audio/harrypotter_tanzaniaaccent_male.mp3'
response_format='srt'

transcription = transcribe(audio_file, model=WHISPER_MODEL, response_format=response_format)

print(transcription)

1
00:00:00,000 --> 00:00:04,280
Harry Potter, na mahaofaki zabaguma na nikwamia alipasha iliyotua masili wa baziri,

2
00:00:04,280 --> 00:00:08,240
"; kutegu kuteka na ukekwo 11. katika ninahia caila zizidi na tufanyi

3
00:00:08,240 --> 00:00:11,480
mapya holdi uthabiti nutritiousa ihivi profilei na praedia.

4
00:00:11,480 --> 00:00:15,440
Na muahwaja, Harry niwezoa broju zake la hivyo za hitamiesi umuu wai Wensley na Hermione Granger

5
00:00:15,440 --> 00:00:18,280
na maenda ndrrithi maha foko openobu masingi za mazima Thippi.

6
00:00:19,240 --> 00:00:21,740
Hudike beethi ukuzotea hajia na wasikleana kuteturi

7
00:00:21,740 --> 00:00:23,360
chezha kujitahan ja umisba partiria voldemorte folksieri,

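
Without a `language` hint, Whisper appears to have misidentified the Tanzanian-accented English as Swahili and produced garbled, Swahili-like text. Passing `language="en"` fixes the transcription:
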
In [15]:
audio_file = 'audio/harrypotter_tanzaniaaccent_male.mp3'
response_format='srt'

transcription = transcribe(audio_file, model=WHISPER_MODEL, response_format=response_format, language="en")

print(transcription)

1
00:00:00,000 --> 00:00:05,200
Harry Potter, an orphan boy living with his neglectful aunt and uncle, discovers on his

2
00:00:05,200 --> 00:00:10,000
eleventh birthday that he's a wizard and has been accepted to Hogwarts School of Witchcraft and

3
00:00:10,000 --> 00:00:16,080
Wizardry. At Hogwarts, Harry makes friends with Ron Weasley and Hermione Granger and learns about

4
00:00:16,080 --> 00:00:21,840
the magical world he never knew existed. He also discovers that his parents were murdered by the

5
00:00:21,840 --> 00:00:27,120
dark wizard Voldemort, who tried to kill Harry as a baby but mysteriously failed, leaving Harry with

6
00:00:27,120 --> 00:00:32,880
a lightning-bolt scar. Over seven years at Hogwarts, Harry faces numerous challenges and

7
00:00:32,880 --> 00:00:38,000
adventures, from confronting a troll in his first year to competing in the dangerous triwizard

8
00:00:38,000 --> 00:00:43,439
tournament in his fourth. Each year brings new defense against the dark a

In [21]:
audio_file = 'audio/harrypotter_german_male.mp3'

transcription = transcribe(audio_file, model=GPT_TRANSCRIBE_MODEL)

print(transcription)

Harry Potter, ein junger Zauberer, entdeckt seine magischen Kräfte und besucht die Hogwarts-Schule für Hexerei und Zauberei. Dort findet er Freunde und lernt Magie, während er sich dem bösen Zauberer Voldemort stellt, der seine Eltern getötet hat. In sieben Büchern wächst Harry heran, kämpft gegen das Böse und rettet schließlich die Zaubererwelt in einem epischen finalen Kampf.



#### Whisper can also translate when given a `language` hint (a dedicated translations endpoint exists as well), whereas `gpt-4o-transcribe` cannot translate and only transcribes

In [23]:
audio_file = 'audio/harrypotter_german_male.mp3'
language = 'en'

transcription = transcribe(audio_file=audio_file, model=WHISPER_MODEL, language=language)

print(transcription)

Harry Potter, a young wizard, discovers his magical powers and visits the Hogwarts School of Witchcraft and Sorcery. There he finds friends and learns magic, while he faces the evil wizard Voldemort, who killed his parents. In seven books, Harry grows up, fights against the evil and finally saves the wizarding world in an epic final battle.



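With `language='ja'`, Whisper produces a Japanese rendering, but its quality is poor: for example, Hogwarts becomes ハッグウォッチ and Voldemort becomes ウォルデモード.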

In [24]:
audio_file = 'audio/harrypotter_german_male.mp3'
language = 'ja'

transcription = transcribe(audio_file=audio_file, model=WHISPER_MODEL, language=language)

print(transcription)

ハリー・ポッターは少年の魔術師で 魔法の力を発見し ハッグウォッチの学校を 魔術家として訪ねます そこで彼は友達を見つけ 魔法を学んでいます 悪い魔術師のウォルデモードを 見つけ 彼は父親を殺しました ハリーは7本の本を読み 悪い魔術師と戦って 最後に魔術師の世界を 最終戦で救います



`gpt-4o-transcribe` does not translate: even with `language='en'`, it returns the German transcript.

In [25]:
audio_file = 'audio/harrypotter_german_male.mp3'
language = 'en'

transcription = transcribe(audio_file=audio_file, model=GPT_TRANSCRIBE_MODEL, language=language)

print(transcription)

Harry Potter, ein junger Zauberer, entdeckt seine magischen Kräfte und besucht die Hogwarts-Schule für Hexerei und Zauberei. Dort findet er Freunde und lernt Magie, während er sich dem bösen Zauberer Voldemort stellt, der seine Eltern getötet hat. In sieben Büchern wächst Harry heran, kämpft gegen das Böse und rettet schließlich die Zaubererwelt in einem epischen finalen Kampf.



## Translation

Translate audio to English

In [26]:
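def translate(audio_file, response_format="text"):
    """Translate the file at path `audio_file` into English; on failure, return the error message."""
    try:
        # Use a separate name for the handle so the path argument is not shadowed
        with open(audio_file, "rb") as f:
            response = client.audio.translations.create(
                file = f,
                model = WHISPER_MODEL,
                response_format = response_format
            )

        return response
    except Exception as e:
        return str(e)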

In [27]:
audio_file = 'audio/harrypotter_german_male.mp3'

translation = translate(audio_file)

print(translation)

Harry Potter, a young wizard, discovers his magical powers and visits the Hogwarts School of Witchcraft and Sorcery. There he finds friends and learns magic, while he faces the evil wizard Voldemort, who killed his parents. In seven books, Harry grows up, fights against the evil and finally saves the wizarding world in an epic final battle.
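The experiments so far suggest a simple routing rule: keep the spoken language with `gpt-4o-transcribe`, use the translations endpoint for English output, and fall back to a `whisper-1` language hint for other targets (best effort only, given the poor Japanese result above). `plan_audio_request` is a hypothetical helper sketching that rule, not an official recommendation:

```python
from typing import Dict, Optional, Tuple


def plan_audio_request(target_language: Optional[str]) -> Tuple[str, Dict[str, str]]:
    """Return (endpoint name, request arguments) for the desired output language.

    - None: transcribe in the spoken language with gpt-4o-transcribe.
    - "en": use the translations endpoint, which only produces English.
    - anything else: whisper-1 transcription with a language hint (best effort).
    """
    if target_language is None:
        return "transcriptions", {"model": "gpt-4o-transcribe"}
    if target_language == "en":
        return "translations", {"model": "whisper-1"}
    # gpt-4o-transcribe returned German despite language='en' in this notebook,
    # so only whisper-1 is a candidate for producing another target language.
    return "transcriptions", {"model": "whisper-1", "language": target_language}
```

The endpoint name matches an attribute of `client.audio`, so one way to dispatch is `getattr(client.audio, endpoint).create(file=f, **kwargs)`.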



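Post-processing with GPT-4.1 to correct transcription issues. In the example below, GPT-4.1 restores the proper name "Defence Against the Dark Arts", which Whisper transcribed as "defense against the dark arts".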

In [28]:
audio_file = 'audio/harrypotter_singaporeaccent_male.mp3'

transcription = transcribe(audio_file=audio_file, model=WHISPER_MODEL)

print(transcription)

Harry Potter, an orphan boy living with his neglectful aunt and uncle, discovers on his 11th birthday that he's a wizard and has been accepted to Hogwarts School of Witchcraft and Wizardry. At Hogwarts, Harry makes friends with Ron Weasley and Hermione Granger and learns about the magical world he never knew existed. He also discovers that his parents were murdered by the dark wizard Voldemort, who tried to kill Harry as a baby but mysteriously failed, leaving Harry with a lightning bolt scar. Over seven years at Hogwarts, Harry faces numerous challenges and adventures, from confronting a troll in his first year to competing in the dangerous Triwizard Tournament in his fourth. Each year brings new defense against the dark arts teachers, some helpful and others harboring dark secrets. As Harry grows, he learns more about his connection to Voldemort, including a prophecy that links their fates. Voldemort, meanwhile, slowly regains power and gathers his Death Eater followers. Harry receiv

In [31]:
system_prompt = """
    Your task is to correct any spelling discrepancies in the transcribed text. Please reproduce 
    the text with the corrections
"""

def generate_corrected_transcript(system_prompt, audio_file):
    response = client.chat.completions.create(
        model="gpt-4.1",
        messages=[
            {
                "role": "system",
                "content": system_prompt
            },
            {
                "role": "user",
                "content": transcribe(audio_file, model=WHISPER_MODEL)
            }
        ]
    )
    return response.choices[0].message.content

audio_file = 'audio/harrypotter_singaporeaccent_male.mp3'

corrected_text = generate_corrected_transcript(system_prompt, audio_file)

corrected_text

"Harry Potter, an orphan boy living with his neglectful aunt and uncle, discovers on his 11th birthday that he's a wizard and has been accepted to Hogwarts School of Witchcraft and Wizardry. At Hogwarts, Harry makes friends with Ron Weasley and Hermione Granger and learns about the magical world he never knew existed. He also discovers that his parents were murdered by the dark wizard Voldemort, who tried to kill Harry as a baby but mysteriously failed, leaving Harry with a lightning bolt scar. Over seven years at Hogwarts, Harry faces numerous challenges and adventures, from confronting a troll in his first year to competing in the dangerous Triwizard Tournament in his fourth. Each year brings new Defence Against the Dark Arts teachers, some helpful and others harboring dark secrets. As Harry grows, he learns more about his connection to Voldemort, including a prophecy that links their fates. Voldemort, meanwhile, slowly regains power and gathers his Death Eater followers. Harry recei

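`generate_corrected_transcript` transcribes the file again on every call. A variant that takes an existing transcript can reuse `transcription` from an earlier cell, and listing correctly spelled names in the system prompt gives the model something concrete to correct toward (the OpenAI speech-to-text guide illustrates the same idea). A sketch; `build_correction_messages` and the glossary contents are hypothetical:

```python
from typing import Dict, List, Sequence


def build_correction_messages(transcript: str, glossary: Sequence[str] = ()) -> List[Dict[str, str]]:
    """Build Chat Completions messages asking the model to fix spelling in `transcript`.

    `glossary` is a hypothetical list of correctly spelled names or terms that the
    transcriber tends to get wrong; it is appended to the system prompt.
    """
    instructions = (
        "Your task is to correct any spelling discrepancies in the transcribed text. "
        "Please reproduce the text with the corrections."
    )
    if glossary:
        instructions += " Make sure these terms are spelled correctly: " + ", ".join(glossary) + "."
    return [
        {"role": "system", "content": instructions},
        {"role": "user", "content": transcript},
    ]
```

The result plugs into the existing call: `client.chat.completions.create(model="gpt-4.1", messages=build_correction_messages(transcription, ["Hogwarts", "Voldemort"]))`.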
## Speech

The Speech API converts text into spoken audio. The `model` parameter takes one of the available TTS models: `tts-1`, `tts-1-hd`, or `gpt-4o-mini-tts`.

In [32]:
def convert_to_speech(input_text, model="gpt-4o-mini-tts", filename="speech.mp3", voice='alloy'):
    """Generate speech for input_text and save it as ./speech/<filename>."""
    try:
        # Create the output folder if needed (no-op when it already exists)
        os.makedirs("./speech", exist_ok=True)
        speech_file_path = os.path.join("./speech", filename)

        # Stream the generated audio straight to disk
        with client.audio.speech.with_streaming_response.create(
            model = model,
            input = input_text,
            voice = voice,
            response_format = 'mp3'
        ) as response:
            response.stream_to_file(speech_file_path)
    except Exception as e:
        print(e)
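`convert_to_speech` sends the whole text in a single request. The speech endpoint documents a maximum input length of 4096 characters, so longer text has to be split first. A minimal sketch, assuming that limit (verify it in the API reference); `chunk_for_tts` is a hypothetical helper that packs whole sentences into each chunk and hard-splits any single sentence longer than the limit:

```python
import re
from typing import List

MAX_TTS_CHARS = 4096  # assumed input limit of the speech endpoint


def chunk_for_tts(text: str, limit: int = MAX_TTS_CHARS) -> List[str]:
    """Split text into pieces of at most `limit` characters, preferring sentence boundaries."""
    # Normalize whitespace, then split after ., ! or ? followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", " ".join(text.split()))
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # A sentence longer than the limit is cut into fixed-size pieces
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            # Adding this sentence would overflow: close the current chunk
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to `convert_to_speech` with its own filename, for example `f"part_{i:02d}.mp3"`; stitching the resulting files together is not covered here.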

In [33]:
convert_to_speech(
    """
        An orphaned boy discovers he's a wizard, attends a magical school,
        and battles the dark lord who killed his parents while growing up and saving the wizarding world.
    """,
    filename = "harrypotter_alloy_en.mp3"
)

In [34]:
convert_to_speech(
    """
        An orphaned boy discovers he's a wizard, attends a magical school,
        and battles the dark lord who killed his parents while growing up and saving the wizarding world.
    """,
    voice = "echo",
    filename = "harrypotter_echo_en.mp3"
)

In [53]:
convert_to_speech(
    """
        An orphaned boy discovers he's a wizard, attends a magical school,
        and battles the dark lord who killed his parents while growing up and saving the wizarding world.
    """,
    voice = "onyx",
    filename = "harrypotter_onyx_en.mp3"
)
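The three English cells differ only in `voice` and `filename`, and each passes a triple-quoted string whose leading newline and indentation are sent to the API as-is. A sketch of tidying both; `normalize_tts_input` and `voice_jobs` are hypothetical helpers, and whether the extra whitespace actually changes the generated audio is not tested here:

```python
from typing import Iterable, List, Tuple


def normalize_tts_input(text: str) -> str:
    """Collapse the newlines and indentation of a triple-quoted string into single spaces."""
    return " ".join(text.split())


def voice_jobs(stem: str, voices: Iterable[str], lang: str) -> List[Tuple[str, str]]:
    """Pair each voice with an output filename following the notebook's naming scheme."""
    return [(voice, f"{stem}_{voice}_{lang}.mp3") for voice in voices]
```

A loop such as `for voice, filename in voice_jobs("harrypotter", ["alloy", "echo", "onyx"], "en"): convert_to_speech(normalize_tts_input(text), voice=voice, filename=filename)` would replace the three cells above, with `text` holding the English summary.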

In [35]:
convert_to_speech(
    """
        Ein Waisenjunge entdeckt, dass er ein Zauberer ist, besucht eine magische Schule 
        und kämpft gegen den dunklen Lord, der seine Eltern getötet hat, während er 
        erwachsen wird und die Zauberwelt rettet.
    """,
    filename = "harrypotter_alloy_de.mp3"
)

In [36]:
convert_to_speech(
    """
        அனாதையாக வளர்ந்த ஒரு சிறுவன் தான் ஒரு மாயவித்தைக்காரன் என்பதைக் கண்டறிந்து, 
        மந்திர பள்ளியில் சேர்ந்து, தன் பெற்றோரைக் கொன்ற இருண்ட சக்திகளுடன் போராடி, வளர்ந்து வரும் 
        வேளையில் மாய உலகத்தைக் காப்பாற்றுகிறான்.
    """,
    filename = "harrypotter_alloy_tn.mp3"
)