
Some output (/search) is not throttled; some LLM use (reranking, etc.) is throttled but should not be #714

Closed
dominiccooney opened this issue Aug 17, 2023 · 9 comments


@dominiccooney
Contributor

Forked from discussion in #704. After #699 is fixed and the scrolling behavior is back, that scrolling is too fast, because /search sends a lot of content "at once."

We have "Typewriter" wired into the chat clients here, but:

  • The intervention happens at the wrong layer. For example, it sits behind the BotResponseMultiplexer for recipes. The BRM will buffer and undo some of the typewriter effect, and some BRM subscribers are code that doesn't need a typewriter effect at all. For the LLM reranker, the typewriter effect just slows the reranker down.
  • The effect doesn't work for search, because search doesn't use the chat client.

It would be good to move this incremental display closer to the web UI (maybe not into it, but closer to it) so that everything a human sees uses it, harmonized with the incremental Markdown handling, while anything destined for machines can happen as fast as possible.
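To make that concrete, here is a minimal sketch in TypeScript of the kind of throttling layer being discussed. Everything in it, from the class name to the tick constants, is an illustrative assumption, not Cody's actual Typewriter API:

```typescript
// Minimal throttling layer: producers write() as fast as they like, and a
// timer drains the buffer gradually so the UI sees a typing animation.
class Typewriter {
    private buffer = ''
    private emitted = 0
    private timer: ReturnType<typeof setInterval> | undefined

    constructor(
        private readonly onUpdate: (text: string) => void,
        private readonly intervalMs = 20,
    ) {}

    /** Append newly streamed text; the interval drains it over time. */
    public write(chunk: string): void {
        this.buffer += chunk
        this.timer ??= setInterval(() => this.tick(), this.intervalMs)
    }

    private tick(): void {
        if (this.emitted >= this.buffer.length) {
            clearInterval(this.timer)
            this.timer = undefined
            return
        }
        // Emit more characters per tick as the backlog grows, so the
        // animation keeps up with a fast producer instead of falling behind.
        const backlog = this.buffer.length - this.emitted
        this.emitted += Math.max(1, Math.ceil(backlog / 10))
        this.onUpdate(this.buffer.slice(0, this.emitted))
    }
}
```

Machine consumers (the reranker, keyword expander, code fixups) would simply bypass a layer like this and read the raw stream.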

@arafatkatze
Contributor

@dominiccooney Thanks for the information. I would love to work on this issue and look deeper into this.

@dominiccooney
Contributor Author

@arafatkatze Super! Thanks for your help with this.

I have a diff that moves the typewriter into MessageProvider, which I'll post soon; I'm just updating the tests. It improves the Typewriter (I added a method to wait for the typewriter to finish without forcibly stopping it and getting a huge chunk) and gets it out of the reranker and keyword expander path.

I haven't integrated the Typewriter with the "search" output. So you might like to crib from the improved Typewriter in my diff and integrate it there, or move the typewriter out of the MessageProvider and closer to the chat view.

@arafatkatze
Contributor

arafatkatze commented Aug 19, 2023

@dominiccooney Thanks a lot for such a detailed response and guidance in this comment.
I was able to make the default typewriter work for the search scenario, but it was obviously too fast, as you had mentioned. What's needed here involves the typewriter "slowing down" the interaction coming from ContextSearch.

* Note: For particularly large chunks, this will result in a character chunk size that is far bigger than you would expect for a typing effect.
* This is an accepted trade-off in order to ensure we stay in sync with the incoming text.

Basically, I wanted to change this ^ a bit, and after some tweaks to the typewriter max_delay I was able to make it "slow":

(Screen recording attached: Screen.Recording.2023-08-19.at.2.09.57.AM.mov)

Obviously, special-casing search with case statements seems hacky to me, and the way I'm doing it would probably require making the max delay a function of the length of the response. Maybe that's okay in the short term, but I'm thinking about a better way.
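For illustration only, the kind of tweak being described might look like this; the constants and the `delayFor` helper are made up, not the real Typewriter's knobs:

```typescript
// Hypothetical: scale the per-chunk delay with the total response length,
// so a huge /search payload still animates within a bounded total time.
const MIN_DELAY_MS = 5
const MAX_DELAY_MS = 80

function delayFor(responseLength: number): number {
    // Aim for a roughly constant total animation time (about 2s here),
    // clamped so short responses don't crawl and long ones don't blur past.
    const scaled = 2000 / Math.max(responseLength, 1)
    return Math.min(MAX_DELAY_MS, Math.max(MIN_DELAY_MS, scaled))
}
```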

I am trying to work out how to refactor this cleanly, while also taking your work in explain-two into account.

dominiccooney added a commit that referenced this issue Aug 21, 2023
…719)

The Typewriter was integrated at a point that was perhaps convenient for
other agents, but it didn't work well for Cody-VScode. VScode's
BotResponseMultiplexer buffers content to parse tags in the output,
which chunks the incremental effect a bit. The typewriter also slowed
down consumers like inline fixups, which just want the complete code and
don't care whether it appears "typed."

Since then, Cody-VScode has added LLM turns for keyword expansion and
context re-ranking. The typewriter effect also applied to these,
needlessly slowing them down. For me, these turns are about twice as fast
now that the typewriter is out of their path.

This moves the typewriter up to the MessageProvider, and only applies
the typewriter effect to output destined for the chat sidebar. This
makes all the code consumers faster because there's no animation delay.
(The "typing" is very smooth now—maybe too smooth—because there's no
coincidental chunking from the BotResponseMultiplexer.)

This adds `flush`, a way for the consumer to drain the buffered content
out of the typewriter while still "animating" it. It says something about
how much backpressure there was on the typewriter that we didn't really
notice this "chunk at the end" behavior before.

See #714.
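As a hypothetical illustration of the `flush` idea (assumed names and timing, not this PR's actual code):

```typescript
// flush() keeps the animation running and resolves once everything written
// so far has been emitted, instead of stopping the typewriter and dumping
// the remainder as one big chunk.
class FlushableTypewriter {
    private buffer = ''
    private emitted = 0
    private readonly timer: ReturnType<typeof setInterval>
    private onDrained: (() => void) | undefined

    constructor(private readonly onUpdate: (text: string) => void, intervalMs = 20) {
        this.timer = setInterval(() => this.tick(), intervalMs)
    }

    public write(chunk: string): void {
        this.buffer += chunk
    }

    /** Resolves once the backlog has fully animated out; stops the timer. */
    public flush(): Promise<void> {
        return new Promise(resolve => {
            this.onDrained = resolve
        })
    }

    private tick(): void {
        if (this.emitted < this.buffer.length) {
            this.onUpdate(this.buffer.slice(0, ++this.emitted))
        } else if (this.onDrained) {
            clearInterval(this.timer)
            this.onDrained()
        }
    }
}
```

A consumer would write() chunks as they stream in, then await flush() at the end of the turn.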

## Test plan

```
pnpm test:unit && pnpm test:integration && pnpm test:e2e
```

Manual test:

1. Open Cody chat sidebar.
2. Ask it to enumerate the 50 states of the US in order of accession
with their state flowers in a numbered list

The output should appear incrementally and completely.

---------

Co-authored-by: Naman Kumar <naman@outlook.in>
@dominiccooney
Contributor Author

@arafatkatze Yeah, the typewriter throttle really opens way up when there's a lot of pressure behind it. I got some feedback from @toolmantim that the typing effect would be better word-by-word anyway. So let me simplify it a bit to do that, and change the heuristic to not open the throttle so much. Then the integration you've done should work well.
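Word-by-word could be as simple as something like this sketch (illustrative names, not the actual change):

```typescript
// Emit whole words on a fixed cadence instead of character chunks whose
// size balloons under backpressure.
function typeByWord(
    text: string,
    onUpdate: (shown: string) => void,
    intervalMs = 50,
): void {
    // Each match is a word plus its trailing whitespace, so the re-joined
    // output is identical to the input.
    const words = Array.from(text.matchAll(/\S+\s*/g), m => m[0])
    let shown = ''
    let i = 0
    const timer = setInterval(() => {
        if (i >= words.length) {
            clearInterval(timer)
            return
        }
        shown += words[i++]
        onUpdate(shown)
    }, intervalMs)
}
```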

Don't be slowed down by what I'm suggesting in explain-two; that needs work and it may be a while.

@arafatkatze
Contributor

@dominiccooney Right now I'm doing this as a clean way to typewrite. It's half finished (commit), but essentially my goal is to morph the Typewriter interface to support longer delay times, or to change the algorithm somehow, so that the context search output also gets printed gracefully. I am happy to wait for your algorithm change to merge so that I can merge my changes on top of it.

@arafatkatze
Contributor

@dominiccooney The other thing I was thinking of, but didn't pursue further: would it be a good idea to have some sort of queue-based system, where the queue takes in all the incoming input and always "typewrites" it at the same graceful speed? Then the algorithm wouldn't need complex scenarios where the typing speed is somehow a function of:

  1. The length of the input (in the case of context search)
  2. The speed of the incoming input (in the case of LLM buffer inflow)

The typing speed stays the same, but the input can vary. Obviously there are many risky race conditions that can happen with this; I tried it in my attempts here and it doesn't work well, for several reasons, but I figured I would give it a shot.
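For what it's worth, a bare-bones version of that queue idea might look like the sketch below (all names assumed; as noted, a real version would have to handle the races this ignores):

```typescript
// A single consumer drains a shared queue at one constant rate, so context
// search results and LLM stream chunks animate at the same speed no matter
// how fast they arrive. The queue grows under bursts and shrinks when idle.
class TypingQueue {
    private readonly chars: string[] = []
    private readonly timer: ReturnType<typeof setInterval>

    constructor(
        private readonly onUpdate: (chunk: string) => void,
        charsPerTick = 3,
        intervalMs = 30,
    ) {
        this.timer = setInterval(() => {
            const chunk = this.chars.splice(0, charsPerTick).join('')
            if (chunk) this.onUpdate(chunk)
        }, intervalMs)
    }

    /** Any producer (search, LLM stream) can enqueue text at any time. */
    public enqueue(text: string): void {
        // Push characters one at a time; spreading a huge string into
        // push(...) could exceed the argument-count limit.
        for (const ch of text) this.chars.push(ch)
    }

    /** Stop the consumer; anything still queued is discarded. */
    public dispose(): void {
        clearInterval(this.timer)
    }
}
```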

@arafatkatze
Contributor

I completely understand that explain-two is experimental work, but the idea of two interactions seemed very interesting. My only recommendation, and you have probably already thought of it anyway, would be to have an interaction manager interface, just to separate out the functions for getting different interactions, showing different displays, and executing different cases.

@github-actions

This issue is marked as stale because it has been open for 60 days with no activity. Remove the Stale label or comment, or this will be closed automatically in 5 days.

@github-actions github-actions bot added the Stale label Jan 25, 2024
@arafatkatze
Contributor

We have moved to a different API for this, so I am closing this issue.
