token optimization techniques for voice ai prompts using concise language structured formatting abbreviations context compression chunked processing and model knowledge reuse to reduce tokens while maintaining accuracy and clarity

rahulsense/structuring_prompts


Token Optimization Techniques

Efficient prompts reduce operational costs and latency while maintaining effectiveness. A typical target is a 30-50% token reduction with no loss of information.

1. Remove Filler Language

Eliminate: "please", "kindly", "in order to", "make sure to", "carefully", "thoroughly"

Before: "Please kindly transcribe the following audio recording carefully and thoroughly, making sure to capture every single word accurately."

After: "Transcribe audio accurately."

Savings: 18 tokens → 4 tokens (78% reduction)
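Filler removal is easy to automate as a first-pass cleanup. A minimal sketch, assuming a hypothetical `strip_fillers` helper whose phrase list mirrors the examples above (extend it for your own prompts):

```python
import re

# Hypothetical filler list based on the phrases named above.
FILLERS = [
    r"\bplease\b",
    r"\bkindly\b",
    r"\bin order to\b",
    r"\bmake sure to\b",
    r"\bmaking sure to\b",
    r"\bcarefully\b",
    r"\bthoroughly\b",
]

def strip_fillers(prompt: str) -> str:
    for pattern in FILLERS:
        prompt = re.sub(pattern, "", prompt, flags=re.IGNORECASE)
    # Collapse the extra whitespace left behind by removed phrases.
    return re.sub(r"\s{2,}", " ", prompt).strip()

print(strip_fillers("Please make sure to transcribe the audio accurately."))
# → transcribe the audio accurately.
```

A pass like this is safest on instruction text you wrote yourself; always reread the output, since blind deletion can occasionally change meaning.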


2. Structure with Delimiters

Replace verbose explanations with clear section markers. Use triple hashtags, dashes, or brackets to separate instruction components.

Before (32 tokens): "The context for this conversation is that the user is calling about a job application. The job title is Senior Software Engineer. The task you need to perform is to screen the candidate and collect their information."

After (19 tokens):

### Context

Job application call | Title: Senior Software Engineer

### Task

Screen candidate, collect: name, experience, availability, salary expectations

Savings: Clearer structure, 40% token reduction (32→19 tokens), improved model parsing

Why It Works:

  • Section headers (###) provide explicit semantic boundaries
  • Pipe separator (|) efficiently connects related facts
  • Comma-separated lists replace verbose phrases
  • Model processes structured data more accurately than prose
  • Visual clarity aids both human review and LLM comprehension
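The header-plus-pipe pattern can be generated from structured data rather than written by hand. A minimal sketch, assuming a hypothetical `build_prompt` helper (the function name and signature are illustrative, not a fixed API):

```python
def build_prompt(context: dict, tasks: list[str]) -> str:
    """Assemble a delimiter-structured prompt from key facts.

    Context facts are joined with the pipe separator; tasks become a
    comma-separated list, matching the pattern shown above.
    """
    context_line = " | ".join(f"{k}: {v}" for k, v in context.items())
    task_line = ", ".join(tasks)
    return f"### Context\n{context_line}\n\n### Task\n{task_line}"

prompt = build_prompt(
    {"Call type": "Job application", "Title": "Senior Software Engineer"},
    ["screen candidate", "collect name", "experience", "availability"],
)
print(prompt)
```

Generating prompts this way also keeps token usage predictable, since the template overhead is fixed and only the facts vary.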

3. Use Concise Abbreviations

Apply abbreviations where meaning remains unambiguous in context.

Safe abbreviations:

  • avg (average)
  • sec (seconds)
  • min (minutes)
  • info (information)
  • config (configuration)
  • doc (document)

Before: "Calculate average call duration in seconds, number of speakers, and overall sentiment."

After: "Calculate: avg_duration_sec, speaker_count, sentiment"

Savings: 12 tokens → 7 tokens, preserves complete meaning
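Abbreviation substitution can also be applied mechanically. A minimal sketch using the safe list above (the `abbreviate` helper is hypothetical; extend the map cautiously, since an ambiguous abbreviation costs more accuracy than it saves tokens):

```python
import re

# Safe abbreviation map from the list above.
ABBREVIATIONS = {
    "average": "avg",
    "seconds": "sec",
    "minutes": "min",
    "information": "info",
    "configuration": "config",
    "document": "doc",
}

def abbreviate(prompt: str) -> str:
    # Whole-word matches only, so e.g. "documentation" is left alone.
    pattern = r"\b(" + "|".join(ABBREVIATIONS) + r")\b"
    return re.sub(pattern,
                  lambda m: ABBREVIATIONS[m.group(0).lower()],
                  prompt, flags=re.IGNORECASE)

print(abbreviate("Calculate average call duration in seconds."))
# → Calculate avg call duration in sec.
```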


4. Enforce Structured Output Formats

Replace prose-based instructions with format specifications.

Before: "Please provide the call summary including the duration of the call, the number of speakers involved, and the overall sentiment detected during the conversation."

After: "Return JSON: {duration_sec: int, speakers: int, sentiment: positive|neutral|negative}"

Savings: More concise, machine-parseable, enforces format compliance
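A format specification is only useful if the caller enforces it. A minimal validation sketch for the schema above, assuming a hypothetical `parse_summary` helper that checks a model reply against the expected keys and value types:

```python
import json

# Allowed sentiment labels from the format spec above.
ALLOWED_SENTIMENTS = {"positive", "neutral", "negative"}

def parse_summary(reply: str) -> dict:
    """Parse and validate a model reply against the compact schema."""
    data = json.loads(reply)
    assert isinstance(data["duration_sec"], int)
    assert isinstance(data["speakers"], int)
    assert data["sentiment"] in ALLOWED_SENTIMENTS
    return data

reply = '{"duration_sec": 185, "speakers": 2, "sentiment": "positive"}'
print(parse_summary(reply)["sentiment"])
# → positive
```

In production you would retry or re-prompt on a validation failure rather than assert, but the principle is the same: the compact spec doubles as a contract you can check.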


5. Context Compression

Voicemail Detection Example:

Before (58 tokens): "You need to carefully analyze the audio input and look for various indicators that might suggest the bot has reached a voicemail system instead of connecting with a live person. Pay close attention to typical voicemail greetings, lack of interactive responses, beep sounds, or mentions of leaving a message."

After (~30 tokens): "Voicemail indicators: greeting messages, beeps, 'leave a message', no interaction. If detected: invoke end_call_global immediately. Do not leave a message."

Savings: roughly 50% reduction while preserving the critical decision logic.
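Savings claims like these are easy to sanity-check by estimating token counts before and after compression. A minimal sketch, assuming a hypothetical `estimate_tokens` helper built on a rough characters-per-token heuristic (exact counts depend on the model's tokenizer, e.g. the tiktoken library for OpenAI models):

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English prose.

    Only for quick audits; use the model's real tokenizer for billing.
    """
    return max(1, len(text) // 4)

before = ("You need to carefully analyze the audio input and look for "
          "various indicators that might suggest the bot has reached a "
          "voicemail system instead of connecting with a live person.")
after = ("Voicemail indicators: greeting messages, beeps, "
         "'leave a message', no interaction.")

reduction = 1 - estimate_tokens(after) / estimate_tokens(before)
print(f"~{reduction:.0%} estimated reduction")
```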


6. Implement Chunked Processing

For long contexts (2000+ words), replace single massive prompts with sequential smaller prompts.

Pattern:

  • Step 1: Summarize full transcript (output: 200 tokens max)
  • Step 2: Extract action items from summary
  • Step 3: Analyze sentiment from summary

Benefits: Lower per-call token usage, better accuracy on focused tasks, reduced context overflow risk
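The three-step pattern above can be sketched as a small pipeline. Here `llm` is any callable that takes a prompt and returns the model's text reply (a hypothetical stand-in for a real API client); the key point is that steps 2 and 3 operate on the short summary, not the full transcript:

```python
def analyze_transcript(transcript: str, llm) -> dict:
    """Chunked pipeline: each step gets a small, focused prompt."""
    # Step 1: compress the long transcript into a short summary.
    summary = llm(f"Summarize in at most 200 tokens:\n{transcript}")
    # Steps 2-3 reuse the summary, keeping per-call token usage low.
    actions = llm(f"Extract action items as a JSON list:\n{summary}")
    sentiment = llm(f"Classify sentiment (positive|neutral|negative):\n{summary}")
    return {"summary": summary, "actions": actions, "sentiment": sentiment}

# Trivial echo stub so the sketch runs without a model connection.
result = analyze_transcript("long transcript ...", lambda p: p.splitlines()[0])
print(sorted(result))
```

Chaining on the summary does trade some fidelity for cost, so keep step 1's output budget generous enough to retain the details steps 2 and 3 need.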


7. Leverage Pre-Existing Model Knowledge

Avoid restating common knowledge the model already possesses.

Before: "You are an expert call analyst with 10 years of experience in customer service, trained in active listening techniques, familiar with various industries, and skilled at identifying customer pain points and sentiment."

After: "You are a call analyst. Extract: topics, sentiment, action items."

Savings: 35 tokens → 10 tokens (71% reduction)


8. Remove Redundant Instructions

Identify and eliminate instructions stated multiple times throughout the prompt.

Audit process: search for repeated phrases such as "make sure to", "remember to", "don't forget to", and "it's important that". Consolidate duplicates into a single instruction section.
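The audit step can be scripted. A minimal sketch, assuming a hypothetical `audit_redundancy` helper that counts the marker phrases listed above (case-insensitively) so you can see which instructions to consolidate:

```python
# Marker phrases that often signal a repeated instruction.
REDUNDANCY_MARKERS = [
    "make sure to",
    "remember to",
    "don't forget to",
    "it's important that",
]

def audit_redundancy(prompt: str) -> dict:
    """Return each marker phrase found in the prompt with its count."""
    lowered = prompt.lower()
    return {m: lowered.count(m) for m in REDUNDANCY_MARKERS
            if lowered.count(m) > 0}

prompt = ("Make sure to greet the caller. Remember to collect their name. "
          "Make sure to confirm availability.")
print(audit_redundancy(prompt))
# → {'make sure to': 2, 'remember to': 1}
```

Any marker with a count above one is a candidate for consolidation into the single instruction section.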

