
How do I add speech-to-text and text-to-speech models when using xinference? #2359

Closed
2 tasks done
andylzming opened this issue Feb 2, 2024 · 5 comments · Fixed by #3045
Labels
📚 documentation Improvements or additions to documentation

Comments

andylzming commented Feb 2, 2024

Self Checks

Dify version

0.5.2

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

How do I add speech-to-text and text-to-speech models when using xinference? Xinference already supports Image model and Audio model.

[Three screenshots attached]

✔️ Expected Behavior

No response

❌ Actual Behavior

No response

@andylzming andylzming added the 🐞 bug Something isn't working label Feb 2, 2024

dosubot bot commented Feb 2, 2024

🤖

Hey @andylzming! Great to see you back here, diving into more adventures with dify. 🚀

To add speech-to-text and text-to-speech models when using xinference in a self-hosted Docker environment with Dify version 0.5.2, you need to extend the provided TongyiText2SpeechModel class and implement the required methods.

Here is a simplified example of how you can do this:

from typing import Any, Optional

from flask import Response  # adjust to whichever Response type your Dify version uses


class MyText2SpeechModel(TongyiText2SpeechModel):
    def _invoke(self, model: str, credentials: dict, content_text: str, streaming: bool, user: Optional[str] = None) -> Any:
        # Your implementation here
        pass

    def validate_credentials(self, model: str, credentials: dict, user: Optional[str] = None) -> None:
        # Your implementation here
        pass

    def _tts_invoke(self, model: str, credentials: dict, content_text: str, user: Optional[str] = None) -> Response:
        # Your implementation here
        pass

    def _tts_invoke_streaming(self, model: str, credentials: dict, content_text: str, user: Optional[str] = None) -> Any:
        # Your implementation here
        pass

    def _process_sentence(self, sentence: str, model: str, credentials: dict, audio_type: str):
        # Your implementation here
        pass

You need to replace the # Your implementation here comments with your own code. The _invoke method is used to invoke the text-to-speech model, _tts_invoke and _tts_invoke_streaming are used to process the text and convert it to speech, and _process_sentence is used to process each sentence individually. The validate_credentials method is used to validate the credentials for the text-to-speech model.
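To make the shape of these methods concrete, here is a self-contained toy sketch of the sentence-by-sentence flow. It does not use Dify's real base class or any real TTS backend; the class name, the stand-in base class, and the fake audio encoding are all hypothetical:

```python
from typing import Optional


class DummyText2SpeechModel:
    """Stand-in for TongyiText2SpeechModel so the sketch runs on its own."""

    def _invoke(self, model: str, credentials: dict, content_text: str,
                streaming: bool, user: Optional[str] = None) -> bytes:
        # Split the text into sentences and synthesize each one in turn.
        sentences = [s.strip() for s in content_text.split(".") if s.strip()]
        chunks = [self._process_sentence(s, model, credentials, "mp3")
                  for s in sentences]
        return b"".join(chunks)

    def _process_sentence(self, sentence: str, model: str,
                          credentials: dict, audio_type: str) -> bytes:
        # Placeholder: a real implementation would call the TTS backend here
        # and return the synthesized audio bytes for this sentence.
        return f"<{audio_type}:{sentence}>".encode()


audio = DummyText2SpeechModel()._invoke("fake-tts", {}, "Hello. World.",
                                        streaming=False)
print(audio)  # b'<mp3:Hello><mp3:World>'
```

In a real subclass, `_process_sentence` would hold the per-sentence call to the speech backend, while `_invoke` decides between streaming and non-streaming paths.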

Once you have implemented these methods, you can use your new class in your Docker environment. Make sure to provide the necessary credentials and other parameters when invoking the model.

You can find more details about the TongyiText2SpeechModel class in the dify codebase.

Please note that I'm unable to view the images you've attached. If you need further assistance, could you please provide more details or context in text form?


@dosubot dosubot bot added 📚 documentation Improvements or additions to documentation NewModelSupport labels Feb 2, 2024
crazywoola (Member) commented Feb 2, 2024

See https://docs.dify.ai/tutorials/model-configuration/xinference#starting-xinference for more details. Please use the correct template when filing a bug report.

@crazywoola crazywoola reopened this Feb 2, 2024
@crazywoola crazywoola removed the 🐞 bug Something isn't working label Feb 2, 2024
andylzming (Author) commented

@crazywoola Is the functionality of adding Image models and Audio models (including speech-to-text and text-to-speech) through xinference planned for the next version?

takatost (Collaborator) commented Feb 7, 2024

@crazywoola Is the functionality of adding Image models and Audio models (including speech-to-text and text-to-speech) through xinference planned for the next version?

We might integrate these models in our spare time, though we're not sure. If you're interested, contributions are welcome and we would really appreciate them!

leslie2046 (Contributor) commented
#3045
