Efficient prompts reduce operational costs and latency while maintaining effectiveness. A typical target: 30-50% token reduction without information loss.
Eliminate: "please", "kindly", "in order to", "make sure to", "carefully", "thoroughly"
Before: "Please kindly transcribe the following audio recording carefully and thoroughly, making sure to capture every single word accurately."
After: "Transcribe audio accurately."
Savings: 18 tokens → 4 tokens (78% reduction)
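Filler removal like the example above can be automated with a simple pass over the prompt. This is a minimal sketch; the filler list mirrors the one above and can be extended:

```python
import re

# Filler phrases that add tokens without adding meaning.
FILLERS = [
    "please", "kindly", "in order to", "make sure to",
    "carefully", "thoroughly",
]

def strip_fillers(prompt: str) -> str:
    """Remove filler phrases, then tidy leftover whitespace and punctuation."""
    pattern = r"\b(?:" + "|".join(re.escape(f) for f in FILLERS) + r")\b"
    cleaned = re.sub(pattern, "", prompt, flags=re.IGNORECASE)
    cleaned = re.sub(r"\s+", " ", cleaned)          # collapse double spaces
    cleaned = re.sub(r"\s+([.,])", r"\1", cleaned)  # no space before . or ,
    return cleaned.strip()

print(strip_fillers("Please kindly transcribe the audio carefully."))
# -> "transcribe the audio."
```

A pass like this is best used as an audit aid during prompt review, not blindly in production, since some "filler" words can carry meaning in context.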
Replace verbose explanations with clear section markers. Use triple hashtags, dashes, or brackets to separate instruction components.
Before (32 tokens): "The context for this conversation is that the user is calling about a job application. The job title is Senior Software Engineer. The task you need to perform is to screen the candidate and collect their information."
After (19 tokens):
Job application call | Title: Senior Software Engineer
Screen candidate, collect: name, experience, availability, salary expectations
Savings: Clearer structure, 40% token reduction (32→19 tokens), improved model parsing
Why It Works:
- Section headers (###) provide explicit semantic boundaries
- Pipe separator (|) efficiently connects related facts
- Comma-separated lists replace verbose phrases
- Models generally parse structured data more reliably than free-form prose
- Visual clarity aids both human review and LLM comprehension
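Structured prompts like the one above can be assembled programmatically. A minimal sketch; the helper name, field names, and header labels are illustrative, not a fixed schema:

```python
def build_structured_prompt(context: dict, task: str, fields: list) -> str:
    """Assemble a compact prompt using ### section headers and | separators."""
    context_line = " | ".join(f"{k}: {v}" for k, v in context.items())
    return "\n".join([
        "### Context",
        context_line,
        "### Task",
        f"{task}, collect: {', '.join(fields)}",
    ])

prompt = build_structured_prompt(
    {"Call type": "Job application", "Title": "Senior Software Engineer"},
    "Screen candidate",
    ["name", "experience", "availability", "salary expectations"],
)
print(prompt)
```

Building prompts from fields also makes token budgets easier to enforce, since each component can be measured and trimmed independently.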
Apply abbreviations where meaning remains unambiguous in context.
Safe abbreviations:
- avg (average)
- sec (seconds)
- min (minutes)
- info (information)
- config (configuration)
- doc (document)
Before: "Calculate average call duration in seconds, number of speakers, and overall sentiment."
After: "Calculate: avg_duration_sec, speaker_count, sentiment"
Savings: 12 tokens → 7 tokens, preserves complete meaning
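The safe-abbreviation table above can be applied mechanically. A sketch using whole-word replacement so substrings (e.g., "in" inside "minutes") are never touched:

```python
import re

# Safe abbreviations from the list above.
ABBREVIATIONS = {
    "average": "avg",
    "seconds": "sec",
    "minutes": "min",
    "information": "info",
    "configuration": "config",
    "document": "doc",
}

def abbreviate(text: str) -> str:
    """Replace whole words with their abbreviations, case-insensitively."""
    pattern = r"\b(" + "|".join(ABBREVIATIONS) + r")\b"
    return re.sub(pattern, lambda m: ABBREVIATIONS[m.group(0).lower()],
                  text, flags=re.IGNORECASE)

print(abbreviate("Calculate average call duration in seconds"))
# -> "Calculate avg call duration in sec"
```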
Replace prose-based instructions with format specifications.
Before: "Please provide the call summary including the duration of the call, the number of speakers involved, and the overall sentiment detected during the conversation."
After: "Return JSON: {duration_sec: int, speakers: int, sentiment: positive|neutral|negative}"
Savings: More concise, machine-parseable, enforces format compliance
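A format specification also lets the caller validate replies. This is a minimal sketch of checking a model's reply against the compact spec above; the function name is illustrative:

```python
import json

ALLOWED_SENTIMENTS = {"positive", "neutral", "negative"}

def parse_summary(raw: str) -> dict:
    """Validate a reply against the spec:
    {duration_sec: int, speakers: int, sentiment: positive|neutral|negative}"""
    data = json.loads(raw)
    if not isinstance(data["duration_sec"], int):
        raise ValueError("duration_sec must be an int")
    if not isinstance(data["speakers"], int):
        raise ValueError("speakers must be an int")
    if data["sentiment"] not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unexpected sentiment: {data['sentiment']}")
    return data

reply = '{"duration_sec": 312, "speakers": 2, "sentiment": "positive"}'
print(parse_summary(reply))
```

Failing fast on malformed replies is what makes the terse spec safe: the format is enforced downstream rather than re-explained in prose.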
Voicemail Detection Example:
Before (58 tokens): "You need to carefully analyze the audio input and look for various indicators that might suggest the bot has reached a voicemail system instead of connecting with a live person. Pay close attention to typical voicemail greetings, lack of interactive responses, beep sounds, or mentions of leaving a message."
After (~30 tokens): "Voicemail indicators: greeting messages, beeps, 'leave a message', no interaction. If detected: invoke end_call_global immediately. Do not leave message."
Savings: roughly 50% reduction while preserving critical decision logic
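The condensed instruction above compresses well because the decision logic is a simple indicator check. A keyword-heuristic sketch; the cue phrases are illustrative, and the print stands in for the platform's end_call_global tool:

```python
# Cue phrases typical of voicemail greetings (illustrative, not exhaustive).
VOICEMAIL_CUES = ("leave a message", "after the tone", "not available", "beep")

def is_voicemail(transcript: str) -> bool:
    """Keyword heuristic matching the condensed prompt's indicator list."""
    lower = transcript.lower()
    return any(cue in lower for cue in VOICEMAIL_CUES)

# In a real agent this decision would trigger the end-call tool;
# here we just print the action.
if is_voicemail("Hi, you've reached Sam. Please leave a message after the tone."):
    print("invoke end_call_global")
```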
For long contexts (2000+ words), replace a single massive prompt with a sequence of smaller prompts.
Pattern:
- Step 1: Summarize full transcript (output: 200 tokens max)
- Step 2: Extract action items from summary
- Step 3: Analyze sentiment from summary
Benefits: Lower per-call token usage, better accuracy on focused tasks, reduced context overflow risk
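The three-step pattern above can be sketched as sequential calls against any completion function. Here `llm` is a generic stand-in (prompt in, text out); the stub used for the demo just echoes the prompt's first line:

```python
def run_chain(transcript: str, llm) -> dict:
    """Summarize once, then run each focused task on the short summary
    instead of the full transcript."""
    summary = llm(f"Summarize in <=200 tokens:\n{transcript}")
    return {
        "summary": summary,
        "action_items": llm(f"List action items from:\n{summary}"),
        "sentiment": llm(f"Classify sentiment (positive|neutral|negative) of:\n{summary}"),
    }

# Stub LLM for illustration: echoes the first line of the prompt.
fake_llm = lambda prompt: prompt.splitlines()[0]
result = run_chain("Customer asked for a refund on order 4521...", fake_llm)
print(result["summary"])
```

Because steps 2 and 3 consume the summary rather than the transcript, per-call token usage stays flat no matter how long the original context is.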
Avoid restating common knowledge the model already possesses.
Before: "You are an expert call analyst with 10 years of experience in customer service, trained in active listening techniques, familiar with various industries, and skilled at identifying customer pain points and sentiment."
After: "You are a call analyst. Extract: topics, sentiment, action items."
Savings: 35 tokens → 10 tokens (71% reduction)
Identify and eliminate instructions stated multiple times throughout the prompt.
Audit process: Search for repeated phrases like "make sure to", "remember to", "don't forget to", "it's important that". Consolidate into single instruction section.
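The audit step above is easy to script. A sketch that counts how often each redundancy marker appears, so any marker with 2+ hits becomes a consolidation target:

```python
from collections import Counter

REDUNDANCY_MARKERS = [
    "make sure to", "remember to", "don't forget to", "it's important that",
]

def audit_redundancy(prompt: str) -> Counter:
    """Count occurrences of each redundancy marker in a prompt."""
    lower = prompt.lower()
    return Counter({m: lower.count(m) for m in REDUNDANCY_MARKERS if m in lower})

prompt = (
    "Make sure to greet the caller. Collect their name. "
    "Remember to greet the caller before asking questions. "
    "Make sure to confirm the spelling of their name."
)
print(audit_redundancy(prompt))
```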