Skip to content

Add text to audio task#969

Merged
osanseviero merged 13 commits intomainfrom
add_tta
Oct 5, 2023
Merged

Add text to audio task#969
osanseviero merged 13 commits intomainfrom
add_tta

Conversation

@Vaibhavs10
Copy link
Copy Markdown
Contributor

No description provided.

@osanseviero
Copy link
Copy Markdown
Contributor

@Vaibhavs10
Copy link
Copy Markdown
Contributor Author

I think I added it to all the relevant ones. Do you mind giving it a look again please @osanseviero ?

Copy link
Copy Markdown
Contributor

@osanseviero osanseviero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Please check that you have similar set of files as in https://github.com/search?q=repo%3Ahuggingface%2Fhub-docs+TEXT-to-speech+language%3ATypeScript&type=code&l=TypeScript

  • You also need to add it to tasksData (can lead to an undefined task page for now (or same as TTS?) - in any case, if not, let's open an issue for a follow-up task page
  • Need to specify which libraries (and here) support this task

Copy link
Copy Markdown
Contributor

@osanseviero osanseviero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, things are looking good! You will also need to

You can preview how this would look like in GitHub Codespaces as specified in https://github.com/huggingface/moon-landing#codespace

@Vaibhavs10
Copy link
Copy Markdown
Contributor Author

Noice. Going to to test this now!

@Vaibhavs10
Copy link
Copy Markdown
Contributor Author

Weird! I keep getting permission denied on codespace for anything I do. Is this the same for you too @osanseviero ?

@Vaibhavs10
Copy link
Copy Markdown
Contributor Author

Hey @mishig25

After your suggestions, I tried running npm ci and npm run prettier - no changes
then I tried npm run lint also resulted in no changes.

Send help! - welp!

@mishig25
Copy link
Copy Markdown
Collaborator

mishig25 commented Sep 27, 2023

Was TextToAudio widget was created in this PR ?
if so, it would good idea to provide a description on what the API shape looks like, plus screenshot, plus which model to test on. You can see an example here

After your suggestions, I tried running npm ci and npm run prettier - no changes
then I tried npm run lint also resulted in no changes.

npm run format:all was the one

"format:all": "npm run prettier src && npm run lint:all",

@Vaibhavs10
Copy link
Copy Markdown
Contributor Author

Hey @mishig25 - I just copied InferenceTextToSpeech widget over to text-to-audio.
So there are no new changes. I'll follow up with a separate PR to change the actual look.

Let me know if this is okay for you?

@osanseviero
Copy link
Copy Markdown
Contributor

I think we should reuse TextToSpeechWidget rather than reimplementing it, same way we reuse TabularDataWidget. API input/output is exactly the same

@mishig25
Copy link
Copy Markdown
Collaborator

Btw, is the task already supported in api-inference? Is there a model or curl script to test on ?

@Vaibhavs10
Copy link
Copy Markdown
Contributor Author

Hey! @mishig25 - Sorry for the delayed response, api-inference needed an update to make sure that the model can work.

Since the pipeline_tag text-to-audio is not yet supported the models cannot be directly curled, however, changing the pipeline_tag to text-to-speech does the trick and works - you can test it here: https://huggingface.co/reach-vb/musicgen-small?text=lo-fi+music

Do note that this is embarrassingly slow - will work on fixing that separately.

@osanseviero - I was thinking of making a follow-up PR to update the Icon for TTM, that's why I created those files. I think it'd be good to distinguish the two. However, if you feel strongly about this then I can revert the change. lmk.

I'll treat this as priority today, would be nice to get this merged soon.

@Vaibhavs10
Copy link
Copy Markdown
Contributor Author

Note: that text-to-speech is an alias to text-to-audio pipeline so the behaviour would be exactly the same for both.

Copy link
Copy Markdown
Contributor

@osanseviero osanseviero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok from my side to follow-up with the icon, but as mentioned before, we should reuse the widget, no need to implement a new one

@Vaibhavs10
Copy link
Copy Markdown
Contributor Author

@osanseviero - Removed all the custom TextToAudio code. I am waiting for the checks to run.

Is there anything else left for this to be good to merge? @mishig25 @osanseviero

@Vaibhavs10
Copy link
Copy Markdown
Contributor Author

Both the checks have passed! 🤗

@Vaibhavs10 Vaibhavs10 requested a review from osanseviero October 4, 2023 10:16
Copy link
Copy Markdown
Contributor

@osanseviero osanseviero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM to merge if the TTA Inference API is working already

@Vaibhavs10
Copy link
Copy Markdown
Contributor Author

Good to merge, then?

For testing the inference API, feel free to head over to this URL: https://huggingface.co/reach-vb/musicgen-small

"fill-mask": IconFillMask,
"sentence-similarity": IconSentenceSimilarity,
"text-to-speech": IconTextToSpeech,
"text-to-audio": IconTextToAudio,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it is exact same icon as IconTextToSpeech, why not just use IconTextToSpeech ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call: I fixed it in the latest commit.

@osanseviero
Copy link
Copy Markdown
Contributor

It seems stuck in loading or internal server errors though

@Vaibhavs10
Copy link
Copy Markdown
Contributor Author

@osanseviero -> The model is pretty much unusable on CPU (it takes an insane amount of time to generate).

I tested it from api-inference main - It returns a file as expected!
image

The plan is to enable these for GPU run-time once the pipeline is merged.

@osanseviero osanseviero merged commit ff7586c into main Oct 5, 2023
@osanseviero osanseviero deleted the add_tta branch October 5, 2023 09:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants