# ASR Example - Unified Transcription Interface

Automatic Speech Recognition (ASR) with aisuite's unified API, supporting OpenAI, Deepgram, and Google as providers.

This example demonstrates:
- Basic transcription with kwargs (OpenAI format)
- Advanced transcription with TranscriptionOptions
- Provider-specific features using custom_parameters
- Streaming transcription support


In [1]:
import aisuite as ai
from aisuite.framework.message import TranscriptionOptions, TranscriptionResult
from dotenv import load_dotenv, find_dotenv
import os

load_dotenv(find_dotenv())


# Set up client with provider configurations
client = ai.Client({
    "openai": {"api_key": os.getenv("OPENAI_API_KEY")},
    "deepgram": {"api_key": os.getenv("DEEPGRAM_API_KEY")},
    "google": {
        "project_id": os.getenv("GOOGLE_PROJECT_ID"),
        "region": os.getenv("GOOGLE_REGION"),
        "application_credentials": os.getenv("GOOGLE_APPLICATION_CREDENTIALS"),
    },
})

audio_file = "../aiplayground/speech.mp3"  # Replace with your audio file path


In [2]:
# Basic transcription using kwargs (OpenAI format)
print("=== Basic Transcription ===")

try:
    result = client.audio.transcriptions.create(
        model="openai:whisper-1",
        file=audio_file,
        language="en"
    )
    if isinstance(result, TranscriptionResult):
        print(f"OpenAI: {result.text}")
    else:
        print("OpenAI: Got streaming result (not expected for basic call)")
except Exception as e:
    print(f"OpenAI error: {e}")

print("--------------------------------")

try:
    # Same kwargs work with other providers (auto-mapped)
    result = client.audio.transcriptions.create(
        model="deepgram:nova-2",
        file=audio_file,
        language="en",
        punctuate=True
    )
    if isinstance(result, TranscriptionResult):
        print(f"Deepgram: {result.text}")
    else:
        print("Deepgram: Got streaming result (not expected for basic call)")
except Exception as e:
    print(f"Deepgram error: {e}")


=== Basic Transcription ===
OpenAI: Good afternoon everyone! Today, I want to take a few minutes to reflect on the importance of making intentional financial decisions early in life, especially when it comes to retirement planning. Many people underestimate the power of compounding, and the difference between starting at age 30 versus starting at 40. By contributing consistently to a retirement account, whether it's a traditional 401k, a Roth 401k, or an IRA, you allow your money to grow, and multiply over decades. The choice between a traditional, and a Roth plan isn't just about preference, it's about strategy. A traditional account lets you save on taxes now, potentially giving you more to invest today. A Roth, on the other hand, asks you to pay taxes up front so your future withdrawals are entirely tax-free. Both have their place, and the right choice depends on your current tax rate, your expected tax rate in retirement, and your long-term financial goals. If you expect your incom
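The basic cell relies on OpenAI-format kwargs being "auto-mapped" for other providers. One way to picture that translation is a lookup table from a unified name to each provider's native parameter name. The sketch below is hypothetical and is not aisuite's internal code. `PARAM_MAP` and `map_params` are illustrative names, and the tables cover only a few fields whose native names are known (Deepgram's `punctuate`/`diarize`, Google's `language_code`/`enable_word_time_offsets`).

```python
# Hypothetical sketch of kwarg auto-mapping; NOT aisuite's implementation.
# Each table maps a unified option name to a provider's native parameter name.
from typing import Any, Dict

PARAM_MAP: Dict[str, Dict[str, str]] = {
    "openai": {
        "language": "language",
    },
    "deepgram": {
        "language": "language",
        "enable_automatic_punctuation": "punctuate",
        "enable_speaker_diarization": "diarize",
    },
    "google": {
        "language": "language_code",
        "enable_automatic_punctuation": "enable_automatic_punctuation",
        "include_word_timestamps": "enable_word_time_offsets",
    },
}


def map_params(model: str, **kwargs: Any) -> Dict[str, Any]:
    """Translate unified kwargs for a "provider:model" string.

    Keys with no equivalent for that provider are silently dropped.
    """
    provider = model.split(":", 1)[0]
    table = PARAM_MAP.get(provider, {})
    return {table[key]: value for key, value in kwargs.items() if key in table}


print(map_params("google:latest_long", language="en", enable_automatic_punctuation=True))
# {'language_code': 'en', 'enable_automatic_punctuation': True}
print(map_params("deepgram:nova-2", language="en", enable_speaker_diarization=True))
# {'language': 'en', 'diarize': True}
```

Dropping unsupported keys (rather than raising) matches the notebook's pattern of passing one option set to every provider, at the cost of silently ignoring typos.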

In [3]:
# Using TranscriptionOptions for unified interface
print("\n=== TranscriptionOptions - Unified Interface ===")

# Create unified options that work across all providers
options = TranscriptionOptions(
    language="en",
    include_word_timestamps=True,
    enable_automatic_punctuation=True,
    enable_speaker_diarization=True,
    max_speakers=3
)

# Same options work with any provider
providers = ["openai:whisper-1", "deepgram:nova-2", "google:latest_long"]

for model in providers:
    try:
        result = client.audio.transcriptions.create(
            model=model,
            file=audio_file,
            options=options
        )
        if isinstance(result, TranscriptionResult):
            print(f"\n{model}:")
            print(f"  Text: {result.text[:100]}...")
            print(f"  Language: {result.language}")
            print(f"  Confidence: {result.confidence}")
            if result.words:
                print(f"  Words: {len(result.words)}")
    except Exception as e:
        print(f"\n{model} error: {e}")



=== TranscriptionOptions - Unified Interface ===

openai:whisper-1:
  Text: Good afternoon everyone! Today, I want to take a few minutes to reflect on the importance of making ...
  Language: english
  Confidence: None

deepgram:nova-2:
  Text: Good afternoon, everyone. Today, I want to take a few minutes to reflect on the importance of making...
  Language: None
  Confidence: 0.99798644
  Words: 247

google:latest_long:
  Text: Good afternoon everyone. Today, I want to take a few minutes to reflect on the importance of making ...
  Language: None
  Confidence: 0.4342968165874481
  Words: 197
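The output above shows that `language`, `confidence`, and `words` depend on the provider: Whisper reports a language but no confidence, while Deepgram and Google report a confidence but no language. Display code should therefore guard each optional field. The sketch below uses a hypothetical `StandInResult` dataclass in place of `TranscriptionResult` so it runs without API keys; only the attribute names are taken from the notebook.

```python
# Illustrative sketch: summarize a result whose optional fields may be None.
# StandInResult is a hypothetical stand-in for TranscriptionResult.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StandInResult:
    text: str
    language: Optional[str] = None
    confidence: Optional[float] = None
    words: Optional[List[str]] = None


def summarize(model: str, result: StandInResult, width: int = 100) -> str:
    """One-line summary that prints 'n/a' for fields the provider omitted."""
    snippet = result.text if len(result.text) <= width else result.text[:width] + "..."
    language = result.language or "n/a"
    # Compare against None explicitly so a confidence of 0.0 is still shown.
    confidence = f"{result.confidence:.3f}" if result.confidence is not None else "n/a"
    words = len(result.words) if result.words else "n/a"
    return f"{model}: lang={language} conf={confidence} words={words} | {snippet}"


print(summarize("openai:whisper-1", StandInResult("Good afternoon everyone!", language="english")))
print(summarize("deepgram:nova-2", StandInResult("Good afternoon, everyone.", confidence=0.99798644,
                                                  words=["Good", "afternoon,", "everyone."])))
```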


In [None]:
# Provider-specific features using custom_parameters
print("\n=== Custom Parameters - Provider-Specific Only ===")

# Single options object with provider-specific parameters
# Users MUST namespace parameters by provider
advanced_options = TranscriptionOptions(
    language="en",
    enable_automatic_punctuation=True,
    custom_parameters={
        # OpenAI-specific parameters (only applied when using OpenAI)
        "openai": {
            "response_format": "verbose_json",
            "timestamp_granularities": ["word", "segment"],
            "temperature": 0.2
        },
        # Deepgram-specific parameters (only applied when using Deepgram)
        "deepgram": {
            "search": ["important", "keyword", "technical"],
            "replace": {"um": "", "uh": "", "like": ""},
            "numerals": True,
            "measurements": True
        },
        # Google-specific parameters (only applied when using Google)
        "google": {
            "use_enhanced": True,
            "adaptation": {"phrase_sets": ["technical_terms"]},
            "metadata": {"interaction_type": "DISCUSSION"}
        }
        # Note: Parameters NOT under a provider key will be IGNORED
    }
)

# Same options work with all providers - only relevant params are used
for model in providers:
    try:
        result = client.audio.transcriptions.create(
            model=model,
            file=audio_file,
            options=advanced_options
        )
        if isinstance(result, TranscriptionResult):
            print(f"\n{model} with custom params:")
            print(f"  Text: {result.text[:80]}...")
            if result.segments:
                print(f"  Segments: {len(result.segments)}")
            if result.alternatives:
                print(f"  Alternatives: {len(result.alternatives)}")
    except Exception as e:
        print(f"\n{model} error: {e}")
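The comments in the cell above state the namespacing rule for `custom_parameters`: only the sub-dictionary under the active provider's key is applied, and keys outside any provider namespace are ignored. A minimal sketch of that selection step, assuming the provider is the prefix before `:` in the model string; `select_provider_params` is a hypothetical helper, not aisuite's code.

```python
# Hypothetical sketch of provider-namespaced parameter selection.
from typing import Any, Dict


def select_provider_params(model: str, custom_parameters: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the parameters namespaced under the model's provider."""
    provider, _, _ = model.partition(":")
    params = custom_parameters.get(provider, {})
    # Anything that is not a dict under a provider key is treated as invalid.
    return dict(params) if isinstance(params, dict) else {}


custom = {
    "openai": {"temperature": 0.2},
    "deepgram": {"numerals": True},
    "temperature": 0.5,  # not under a provider key, so no provider receives it
}
for model in ["openai:whisper-1", "deepgram:nova-2", "google:latest_long"]:
    print(model, select_provider_params(model, custom))
# openai:whisper-1 {'temperature': 0.2}
# deepgram:nova-2 {'numerals': True}
# google:latest_long {}
```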


In [None]:
# Streaming transcription example
print("\n=== Streaming Transcription ===")

async def streaming_example():
    # Streaming options
    streaming_options = TranscriptionOptions(
        language="en",
        stream=True,  # Enable streaming in options (takes precedence)
        interim_results=True,
        enable_automatic_punctuation=True,
        include_word_timestamps=True
    )
    
    # Try streaming with all three providers; a model without streaming support
    # (e.g. OpenAI's whisper-1) may return a batch TranscriptionResult instead
    streaming_providers = ["openai:whisper-1", "deepgram:nova-2", "google:latest_long"]
    
    for model in streaming_providers:
        try:
            print(f"\n--- Streaming with {model} ---")
            result = client.audio.transcriptions.create(
                model=model,
                file=audio_file,
                options=streaming_options  # stream=True is set in options
            )
            
            # Handle streaming vs non-streaming results
            if isinstance(result, TranscriptionResult):
                # Got batch result instead of streaming
                print(f"Got batch result: {result.text[:50]}...")
            else:
                # Should be streaming response
                print("Processing streaming chunks...")
                chunk_count = 0
                try:
                    async for chunk in result:
                        chunk_count += 1
                        print(f"Chunk {chunk_count}: {'[FINAL]' if chunk.is_final else '[INTERIM]'} {chunk.text}")
                        if chunk_count >= 3:  # Limit output for demo
                            break
                except Exception as stream_error:
                    print(f"Streaming processing error: {stream_error}")
                
        except Exception as e:
            print(f"Error with {model}: {e}")

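# Run streaming example
import asyncio

print("Running streaming example...")
try:
    # Jupyter/IPython already runs an event loop, so await the coroutine directly.
    # asyncio.create_task() would only schedule it in the background (and raises
    # RuntimeError outside a running loop). In a plain script, use
    # asyncio.run(streaming_example()) instead.
    await streaming_example()
except Exception as e:
    print(f"Streaming example error: {e}")
    print("Note: Streaming may not work in all environments or without proper audio files")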
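The streaming cell consumes an async iterator, labels each chunk as interim or final, and stops after a fixed count. The self-contained sketch below reproduces that pattern offline. `Chunk` and `fake_stream()` are hypothetical stand-ins for the provider's stream; only the `text` and `is_final` attribute names come from the notebook.

```python
# Offline sketch of consuming an async transcription stream.
# Chunk and fake_stream() are hypothetical stand-ins for a provider stream.
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List


@dataclass
class Chunk:
    text: str
    is_final: bool


async def fake_stream() -> AsyncIterator[Chunk]:
    """Yield interim hypotheses, then a final one, like a live ASR feed."""
    for text, final in [("Good", False), ("Good after", False), ("Good afternoon everyone.", True)]:
        await asyncio.sleep(0)  # hand control back to the loop, as network I/O would
        yield Chunk(text, final)


async def collect(stream: AsyncIterator[Chunk], limit: int = 3) -> List[str]:
    """Format up to `limit` chunks as '[FINAL] ...' or '[INTERIM] ...' lines."""
    lines: List[str] = []
    async for chunk in stream:
        lines.append(f"{'[FINAL]' if chunk.is_final else '[INTERIM]'} {chunk.text}")
        if len(lines) >= limit:
            break
    return lines
```

In a notebook cell, run it with `await collect(fake_stream())`; in a plain script, use `asyncio.run(collect(fake_stream()))`, since `asyncio.run()` cannot be called while Jupyter's event loop is already running.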


In [None]:
# Example: Detailed timestamps with TranscriptionOptions
print("\n=== Detailed Timestamps Example ===")

timestamp_options = TranscriptionOptions(
    language="en",
    include_word_timestamps=True,
    include_segment_timestamps=True,
    custom_parameters={
        "openai": {
            "response_format": "verbose_json",
            "timestamp_granularities": ["word", "segment"]
        }
    }
)

try:
    result = client.audio.transcriptions.create(
        model="openai:whisper-1",
        file=audio_file,
        options=timestamp_options
    )
    if isinstance(result, TranscriptionResult):
        print(f"Text: {result.text}")
        if result.words:
            print(f"Words: {len(result.words)}")
            for word in result.words[:3]:
                print(f"  {word.word}: {word.start:.1f}s-{word.end:.1f}s")
        if result.segments:
            print(f"Segments: {len(result.segments)}")
except Exception as e:
    print(f"Timestamp example error: {e}")
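Segment timestamps are what subtitle formats consume. As an illustration, the sketch below turns segments with `start`/`end` in seconds into SubRip (`.srt`) cues, whose timestamps use the `HH:MM:SS,mmm` form. The segment dicts and their timings are made-up stand-ins; the real shape of `result.segments` is an assumption, so adapt the field access accordingly.

```python
# Illustrative sketch: convert segment timestamps into SubRip (.srt) cues.
# The segment dicts below are hypothetical stand-ins with made-up timings.
from typing import Dict, List


def srt_time(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm (SubRip timestamp)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: List[Dict]) -> str:
    """Number each segment and separate cues with a blank line."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(f"{index}\n{srt_time(seg['start'])} --> {srt_time(seg['end'])}\n{seg['text'].strip()}\n")
    return "\n".join(cues)


print(to_srt([
    {"start": 0.0, "end": 1.5, "text": " Good afternoon everyone!"},
    {"start": 1.5, "end": 62.25, "text": " Today, I want to take a few minutes..."},
]))
```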


## Summary

This notebook demonstrates the new unified ASR interface with:

1. **TranscriptionOptions**: Unified configuration that works across all providers
2. **Provider-specific features**: Access unique features via `custom_parameters`
3. **Streaming support**: Real-time transcription where supported

### Environment Setup Required:

- **OpenAI**: `OPENAI_API_KEY`
- **Deepgram**: `DEEPGRAM_API_KEY`  
- **Google**: `GOOGLE_PROJECT_ID`, `GOOGLE_REGION`, `GOOGLE_APPLICATION_CREDENTIALS`
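Because each provider needs different variables, a quick pre-flight check can report what is missing before the client is built. A minimal sketch; `REQUIRED_ENV` mirrors the list above, and `missing_env` is a hypothetical helper, not part of aisuite.

```python
# Minimal sketch: report unset or empty environment variables per provider.
# missing_env is a hypothetical helper; the variable names come from the list above.
import os
from typing import Dict, List, Mapping

REQUIRED_ENV: Dict[str, List[str]] = {
    "openai": ["OPENAI_API_KEY"],
    "deepgram": ["DEEPGRAM_API_KEY"],
    "google": ["GOOGLE_PROJECT_ID", "GOOGLE_REGION", "GOOGLE_APPLICATION_CREDENTIALS"],
}


def missing_env(env: Mapping[str, str] = os.environ) -> Dict[str, List[str]]:
    """Map each provider to its missing variables; fully configured providers are omitted."""
    missing: Dict[str, List[str]] = {}
    for provider, names in REQUIRED_ENV.items():
        absent = [name for name in names if not env.get(name)]
        if absent:
            missing[provider] = absent
    return missing


for provider, names in missing_env().items():
    print(f"{provider}: missing {', '.join(names)}")
```

Run it after `load_dotenv(find_dotenv())` so variables loaded from a `.env` file are included.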

### Key Benefits:

- Write once, run on any provider
- Access provider-specific features when needed
- Consistent error handling and response format
- Easy provider switching for testing and optimization
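Since every provider sits behind the same call, switching can also be automated as a fallback chain: try models in order and keep the first success. In the sketch below, the transcription function is injected so the example runs offline. `transcribe_with_fallback` and `fake_transcribe` are hypothetical names, and the failure is simulated.

```python
# Sketch of provider fallback: return the first model that transcribes successfully.
# The transcribe callable is injected; fake_transcribe simulates a failing provider.
from typing import Callable, Dict, List, Tuple


def transcribe_with_fallback(
    models: List[str], transcribe: Callable[[str], str]
) -> Tuple[str, str, Dict[str, str]]:
    """Return (model, text, errors) for the first model that succeeds."""
    errors: Dict[str, str] = {}
    for model in models:
        try:
            return model, transcribe(model), errors
        except Exception as exc:  # record the failure and try the next provider
            errors[model] = str(exc)
    raise RuntimeError(f"All providers failed: {errors}")


def fake_transcribe(model: str) -> str:
    if model.startswith("openai:"):
        raise TimeoutError("simulated timeout")
    return f"transcript from {model}"


print(transcribe_with_fallback(["openai:whisper-1", "deepgram:nova-2"], fake_transcribe))
# ('deepgram:nova-2', 'transcript from deepgram:nova-2', {'openai:whisper-1': 'simulated timeout'})
```

With the notebook's client, the injected function could be something like `lambda m: client.audio.transcriptions.create(model=m, file=audio_file).text`, assuming the call returns a batch `TranscriptionResult` rather than a stream.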
