Feature Request: A few suggestions to enhance speechgpt user experience #40

Closed
erbanku opened this issue Apr 6, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

erbanku commented Apr 6, 2023

Is your feature request related to a problem?

My feature request is related to several problems I run into with the current version of speechgpt. I am frustrated when:

  1. The keyboard remains visible even after completing my input, which takes up unnecessary screen space and makes it harder to read the chat.
  2. The keyboard still shows up while I interact with the assistant using speech recognition, which is unnecessary in that scenario and can be distracting.
  3. Many average users are confused about how to set the speech recognition/synthesis language and language ID. I would prefer an easier way to set these through environment variables, so that average users can rely on default configurations.
  4. When the assistant generates a lengthy response, I have to wait for the whole answer to be generated before I can listen to or read it. Streaming output for both text and TTS would make this process smoother and more enjoyable.
  5. I often want to replay the assistant's response or my own input via TTS but currently cannot do so, which is inconvenient when I need to review previous interactions.

Describe the solution you'd like

  • Hide the keyboard after the user completes input and show it again after ChatGPT completes the response. The repo ddiu8081/chatgpt-demo does this well; you can take a look at it if you like.
  • Do not show the keyboard when the user interacts with the assistant via speech recognition
  • Ability to set the default speech recognition/synthesis language & language ID via environment variables (many average users find setting these a bit confusing at first)
  • Assistant response streaming output, if it is possible, + streaming TTS output (This is very helpful when the assistant generates a long response)
  • Ability to replay the assistant's response or the user's input via the TTS engine
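
As an illustration of the environment-variable suggestion above, here is a minimal TypeScript sketch of how defaults might be resolved. The variable names `DEFAULT_RECOGNITION_LANGUAGE` and `DEFAULT_SYNTHESIS_LANGUAGE`, the `SpeechDefaults` type, and the `en-US` fallback are all hypothetical and are not taken from speechgpt; the real app may need per-service settings.

```typescript
// Hypothetical shape for the defaults; not speechgpt's actual settings type.
interface SpeechDefaults {
  recognitionLanguage: string;
  synthesisLanguage: string;
}

// Resolve default languages from an environment-like object.
// Falls back to "en-US" (an assumed default) when a variable is unset or blank.
function resolveSpeechDefaults(
  env: Record<string, string | undefined>,
): SpeechDefaults {
  const fallback = "en-US";
  const pick = (value: string | undefined): string => {
    const trimmed = value?.trim();
    return trimmed ? trimmed : fallback;
  };
  return {
    // Both variable names below are illustrative assumptions.
    recognitionLanguage: pick(env["DEFAULT_RECOGNITION_LANGUAGE"]),
    synthesisLanguage: pick(env["DEFAULT_SYNTHESIS_LANGUAGE"]),
  };
}
```

Passing the environment in as a plain object keeps the function easy to test; in a deployed build it could be called with whatever environment object the bundler exposes.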

Additional context

No response

@erbanku erbanku added the enhancement New feature or request label Apr 6, 2023
@hahahumble
Owner

  1. This is a great suggestion.
  2. My plan is to add an option that allows users to choose whether to display the keyboard during speech recognition, as speech recognition may produce errors, and displaying the keyboard would allow users to quickly correct mistakes.
  3. Different services have different supported languages and voices, so using environment variables for configuration might be complicated.
  4. Currently, I have not found any TTS API that supports streaming. A possible solution is to split the assistant's responses into multiple sentences and send multiple requests.
  5. This feature will be supported in future updates.
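
The sentence-splitting workaround mentioned in point 4 could look roughly like the TypeScript sketch below. It is only an illustration under stated assumptions: `splitIntoSentences` and `speakInChunks` are hypothetical helper names, and `synthesize` stands in for whatever TTS request the app actually makes. The split is deliberately naive (it breaks on `.`, `!`, `?` and their full-width forms, so decimals such as "3.14" would be cut in two).

```typescript
// Split text into sentence-sized chunks so each can be sent to TTS separately.
// Naive rule: a chunk is a run of non-terminator characters plus any
// terminators that follow it.
function splitIntoSentences(text: string): string[] {
  const matches = text.match(/[^.!?。！？]+[.!?。！？]*\s*/g) ?? [];
  return matches.map((s) => s.trim()).filter((s) => s.length > 0);
}

// Send the chunks to a caller-supplied TTS function one at a time, in order.
// `synthesize` is a placeholder for the real TTS request (an assumption).
async function speakInChunks(
  text: string,
  synthesize: (sentence: string) => Promise<void>,
): Promise<number> {
  const sentences = splitIntoSentences(text);
  for (const sentence of sentences) {
    await synthesize(sentence); // wait so playback order matches text order
  }
  return sentences.length;
}
```

With this approach the first sentence can start playing while later ones are still waiting to be requested, which approximates streaming without needing a streaming TTS API.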

Thank you very much for your suggestions.

@Misaka-9982-coder
Contributor

Perhaps these two bots can bring some inspiration.
Samantha: https://t.me/samantha_x64_bot
Sherlock: https://t.me/sherlock_myshell_ai_bot

@hahahumble
Owner

Suggestions 1, 2, and 5 have been resolved.
