BufferedTokenStream never yields completed tokens (sentence or word) until the next token arrives, delaying TTS speech generation

### Bug Description

Issue 1:
Streaming tokenizers in [`livekit.agents.tokenize.token_stream.BufferedTokenStream`](https://github.com/livekit/agents/blob/main/livekit-agents/livekit/agents/tokenize/token_stream.py#L14) never emit the first token until a second token appears. See the relevant [code](https://github.com/livekit/agents/blob/main/livekit-agents/livekit/agents/tokenize/token_stream.py#L44).

With real LLM output, where gaps between chunks can last several seconds, this means Cartesia/ElevenLabs TTS stays silent until another chunk arrives.

If we relax the [guard](https://github.com/livekit/agents/blob/main/livekit-agents/livekit/agents/tokenize/token_stream.py#L44) to allow a single token, the loop spins forever: the sentence/word tokenizers return inclusive end indices (`tok[2]`), so the buffer slice never shrinks.
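
The look-ahead behavior can be illustrated with a simplified model. This is only a sketch of the pattern described above, not the actual livekit code: `toy_tokenize` and `LookAheadStream` are hypothetical names, and the toy tokenizer returns exclusive end indices so that the loop terminates.

```python
import re

# Hypothetical stand-in for a sentence tokenizer: returns (text, start, end)
# tuples, where the last tuple may be an unfinished sentence.
_SENTENCE = re.compile(r"[^.!?]+[.!?]*")


def toy_tokenize(text):
    tokens = []
    for match in _SENTENCE.finditer(text):
        sentence = match.group().strip()
        if sentence:
            tokens.append((sentence, match.start(), match.end()))
    return tokens


class LookAheadStream:
    """Simplified model of a buffered stream with a look-ahead guard."""

    def __init__(self, tokenize):
        self._tokenize = tokenize
        self._buf = ""
        self.emitted = []

    def push_text(self, text):
        self._buf += text
        while True:
            tokens = self._tokenize(self._buf)
            # Look-ahead guard: a lone token is held back because the
            # stream cannot tell whether more text will extend it.
            if len(tokens) <= 1:
                break
            tok_text, _start, end = tokens[0]
            self.emitted.append(tok_text)
            self._buf = self._buf[end:]  # `end` is exclusive in this toy model

    def flush(self):
        # End of turn: whatever is left is emitted as the final token.
        rest = self._buf.strip()
        if rest:
            self.emitted.append(rest)
        self._buf = ""


stream = LookAheadStream(toy_tokenize)
stream.push_text("Hello there.")
after_first_chunk = list(stream.emitted)   # [] : complete sentence held back
stream.push_text(" How are you?")
after_second_chunk = list(stream.emitted)  # ["Hello there."]
stream.flush()
after_flush = list(stream.emitted)         # ["Hello there.", "How are you?"]
```

In this model the first sentence is complete after the first chunk, yet it is only emitted once the second chunk arrives, which is the delay reported here.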

Issue 2:
`blingfire.SentenceTokenizer` treats back-to-back sentences with no intervening space as a single token. So when the LLM outputs something like `Sentence one.Sentence two` (common when a tool call finishes and a new LLM request is made), BlingFire glues everything together, and the streaming path won't yield a sentence until the entire turn finishes, further delaying speech generation.

As an example, run the tokenization script shared below. The output is `Could you please help me with your full name?What could go wrong. <break time="0.25s" />.`: it contains two complete sentences and one SSML tag, yet all of it is sent to TTS as a single sentence. Please [refer to this output](https://gist.github.com/zaheerabbas-prodigal/e37077fb6f46484a66ba43ca75dc84d6#:~:text=%5Bblingfire.SentenceTokenizer%5D%5B%20%206.116s%5D%20output%20token%3D%27Could%20you%20please%20help%20me%20with%20your%20full%20name%3FWhat%20could%20go%20wrong.%20%3Cbreak%20time%3D%220.25s%22%20/%3E.%27%20(segment%3D54fa17021f32))
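
One possible workaround, sketched here only as an assumption and not as anything the library provides, is to normalize the buffered text before tokenizing by inserting a space wherever sentence-ending punctuation is immediately followed by an uppercase letter. The helper name `separate_glued_sentences` is hypothetical.

```python
import re

# Zero-width match between sentence-ending punctuation and an uppercase letter.
_GLUED = re.compile(r"(?<=[.!?])(?=[A-Z])")


def separate_glued_sentences(text):
    """Insert a space between sentences that were concatenated without one."""
    return _GLUED.sub(" ", text)


fixed = separate_glued_sentences(
    "Could you please help me with your full name?What could go wrong."
)
# fixed == "Could you please help me with your full name? What could go wrong."
```

This heuristic is deliberately naive: it would also split abbreviations such as `U.S.A`, and in a streaming setting it would have to run on the accumulated buffer rather than on individual chunks, since the punctuation and the following letter can arrive in different chunks.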

### Expected Behavior

Issue 1:
As soon as a tokenizer detects a complete token, whether a sentence or a word, it should be emitted immediately, even if it is the only token so far. Emission should not depend on the look-ahead logic, because the delay between LLM chunks can be several seconds in some cases.

Issue 2:
The BlingFire sentence tokenizer should split between sentences even when there is no intervening space, instead of gluing the rest of the turn onto the first token.

### Reproduction Steps

1. Run `pip install "livekit-agents[elevenlabs,cartesia,openai,deepgram]==1.2.17"`
2. There are two scripts in this [GH Gist](https://gist.github.com/zaheerabbas-prodigal/e37077fb6f46484a66ba43ca75dc84d6):
   - [tok.py](https://gist.github.com/zaheerabbas-prodigal/e37077fb6f46484a66ba43ca75dc84d6#file-tok-py) feeds simulated LLM responses, with delays matching those observed in real LLM output around tool calls, through these tokenizer implementations and shows their behavior.
   - [tts_filler.py](https://gist.github.com/zaheerabbas-prodigal/e37077fb6f46484a66ba43ca75dc84d6#file-tts_filler-py) uses the same simulated responses but demonstrates the resulting experience with `cartesia` and `elevenlabs` TTS. You have to edit the script manually to switch TTS providers and run it again.
3. The output of these scripts is added as comments in the Gist:
   - [Execution of `tok.py` for all tokenizer behavior](https://gist.github.com/zaheerabbas-prodigal/e37077fb6f46484a66ba43ca75dc84d6?permalink_comment_id=5846729#gistcomment-5846729)
   - [Execution of `tts_filler.py` with cartesia](https://gist.github.com/zaheerabbas-prodigal/e37077fb6f46484a66ba43ca75dc84d6?permalink_comment_id=5846733#gistcomment-5846733)
   - [Execution of `tts_filler.py` with elevenlabs](https://gist.github.com/zaheerabbas-prodigal/e37077fb6f46484a66ba43ca75dc84d6?permalink_comment_id=5846742#gistcomment-5846742)

### Operating System

macOS Tahoe 26.0

### Models Used

Deepgram nova-2-phonecall, elevenlabs, cartesia, custom llm simulation

### Package Versions

```bash
livekit==1.0.16
livekit-agents==1.2.17
livekit-api==1.0.7
livekit-blingfire==1.0.0
livekit-plugins-anthropic==1.2.17
livekit-plugins-cartesia==1.2.17
livekit-plugins-deepgram==1.2.17
livekit-plugins-elevenlabs==1.2.17
livekit-plugins-google==1.2.17
livekit-plugins-noise-cancellation==0.2.5
livekit-plugins-openai==1.2.17
livekit-plugins-silero==1.2.17
livekit-plugins-turn-detector==1.2.17
livekit-protocol==1.0.8
```

### Session/Room/Call IDs

_No response_

### Proposed Solution

I tried relaxing the token-count guard, but that sends the buffered stream into an infinite loop, as described in the bug description.

I am also unsure how to implement this without introducing timers, so I opened this issue to see whether there are other solutions, since this is a core part of the code that has not changed much.
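
For reference, the timer-based option could look roughly like the sketch below. It is an assumption for discussion only: `IdleFlushBuffer`, `idle_timeout`, and `demo` are hypothetical names, and the completeness check (buffer ends in `.`, `!`, or `?`) is a crude placeholder for a real tokenizer.

```python
import asyncio
from typing import List, Optional


class IdleFlushBuffer:
    """Emit a buffered sentence if no new text arrives within idle_timeout."""

    def __init__(self, idle_timeout: float) -> None:
        self._idle_timeout = idle_timeout
        self._buf = ""
        self._timer: Optional[asyncio.TimerHandle] = None
        self.emitted: List[str] = []

    def push_text(self, text: str) -> None:
        self._buf += text
        # Restart the idle timer every time a new chunk arrives.
        if self._timer is not None:
            self._timer.cancel()
        loop = asyncio.get_running_loop()
        self._timer = loop.call_later(self._idle_timeout, self._on_idle)

    def _on_idle(self) -> None:
        self._timer = None
        pending = self._buf.strip()
        # Only flush text that looks like a finished sentence.
        if pending.endswith((".", "!", "?")):
            self.emitted.append(pending)
            self._buf = ""


async def demo() -> List[str]:
    buf = IdleFlushBuffer(idle_timeout=0.05)
    buf.push_text("Hello there.")
    await asyncio.sleep(0.3)  # simulate a long gap before the next LLM chunk
    return buf.emitted


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The trade-off is that a flushed sentence may later turn out to be incomplete (for example, when the next chunk continues an abbreviation), which is part of why a timer-free solution would be preferable.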


### Additional Context
Please go through this [GH Gist](https://gist.github.com/zaheerabbas-prodigal/e37077fb6f46484a66ba43ca75dc84d6)

### Screenshots and Recordings

_No response_
