How do I add speech-to-text and text-to-speech models when using xinference? #2359
Comments
🤖 Hey @andylzming! Great to see you back here, diving into more adventures with dify. 🚀

To add speech-to-text and text-to-speech models when using xinference in a self-hosted Docker environment with Dify version 0.5.2, you need to extend the provided base model class and implement its abstract methods. Here is a simplified example of how you can do this:

```python
from typing import Optional

# Adjust this import to match the Response type your codebase actually uses.
from flask import Response


# TongyiText2SpeechModel is the existing TTS base class referenced here;
# extend it and fill in the method bodies below.
class MyText2SpeechModel(TongyiText2SpeechModel):
    def _invoke(self, model: str, credentials: dict, content_text: str, streaming: bool, user: Optional[str] = None) -> any:
        # Your implementation here
        pass

    def validate_credentials(self, model: str, credentials: dict, user: Optional[str] = None) -> None:
        # Your implementation here
        pass

    def _tts_invoke(self, model: str, credentials: dict, content_text: str, user: Optional[str] = None) -> Response:
        # Your implementation here
        pass

    def _tts_invoke_streaming(self, model: str, credentials: dict, content_text: str, user: Optional[str] = None) -> any:
        # Your implementation here
        pass

    def _process_sentence(self, sentence: str, model: str, credentials: dict, audio_type: str):
        # Your implementation here
        pass
```

You need to replace the `# Your implementation here` placeholders with your own logic. Once you have implemented these methods, you can use your new class in your Docker environment. Make sure to provide the necessary credentials and other parameters when invoking the model. You can find more details about these methods in the Dify source code.

Please note that I'm unable to view the images you've attached. If you need further assistance, could you please provide more details or context in text form?

Sources
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
See https://docs.dify.ai/tutorials/model-configuration/xinference#starting-xinference for more details. Please use the right template to file a bug.
@crazywoola Is the functionality of adding Image models and Audio models (including speech-to-text, text-to-speech) through xinference set to be implemented in the next version?
We might integrate these models in our spare time, not sure though. If you're interested, you can contribute to us and we would really appreciate it!
Self Checks
Dify version
0.5.2
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
How do I add speech-to-text and text-to-speech models when using xinference? Xinference already supports Image model and Audio model.
✔️ Expected Behavior
No response
❌ Actual Behavior
No response