feat: add Edge TTS provider #30
@xtmu Thanks for contributing. Love this! I'll try to debug this on or after Monday. I'm curious whether Edge TTS can convert a whole book without being banned by Microsoft. BTW, we have a tiny Discord server: https://discord.gg/pgp2G8zhS7. You're welcome to join if you want to discuss anything.
For now, it can be used to convert a whole book. Some background: edge-tts bypasses the text length limit, and it seems you won't be banned as long as you don't open thousands of parallel connections.
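One way to stay well away from "thousands of parallel connections" is to cap concurrency with a semaphore. This is only a minimal sketch: `run_bounded`, `synthesize`, and `max_parallel` are hypothetical names for illustration, not part of this PR or of edge-tts, and the default limit of 4 is an arbitrary assumption.

```python
import asyncio
from typing import Awaitable, Callable, List


async def run_bounded(
    chunks: List[str],
    synthesize: Callable[[str], Awaitable[bytes]],
    max_parallel: int = 4,  # assumed limit, tune to taste
) -> List[bytes]:
    """Run synthesize() over every chunk with at most max_parallel in flight."""
    semaphore = asyncio.Semaphore(max_parallel)

    async def worker(chunk: str) -> bytes:
        # Each worker waits here until one of the max_parallel slots is free.
        async with semaphore:
            return await synthesize(chunk)

    # gather() returns results in the same order as the input chunks.
    return await asyncio.gather(*(worker(chunk) for chunk in chunks))
```

Here `synthesize` stands in for whatever coroutine actually talks to the TTS service, so the limit is enforced in one place regardless of the provider.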
retrieve title from fallback tag: <h1>,<h2>,<h3>.
Looks good. Will test, review, and merge ASAP!
Great pull request. I went through the code and ran tests, and it's excellent. Thank you. In the past, I often used the voice zh-CN-YunyeNeural, but it's not supported by Edge TTS. Are there any similar voices you would recommend?
@@ -45,7 +45,12 @@ def get_chapters(self, break_string) -> List[Tuple[str, str]]:
         for item in self.book.get_items_of_type(ebooklib.ITEM_DOCUMENT):
             content = item.get_content()
             soup = BeautifulSoup(content, "lxml")
-            title = soup.title.string if soup.title else ""
+            title = ""
+            title_levels = ['title', 'h1', 'h2', 'h3']
I'm a bit concerned that elements tagged h1, h2, or h3 might not be section titles. However, it's not a big issue, and if there is indeed a problem, we can fix it later.
Thanks for testing. Yunjian could be an alternative. In addition, you can adjust voice_pitch and voice_rate; even a female voice can sound male if its pitch is lowered by 50Hz. I use this script to try out voices.
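For anyone who wants to experiment with pitch and rate directly, here is a rough sketch (not the script mentioned above). It assumes the `edge-tts` Python package, whose `Communicate` class takes `rate` and `pitch` as signed strings such as `"-10%"` and `"-50Hz"`. The helper names `format_rate`, `format_pitch`, and `try_voice`, and the full voice ID `zh-CN-YunjianNeural`, are assumptions made for illustration.

```python
import asyncio


def format_rate(percent: int) -> str:
    """Format a relative speaking-rate change, always signed: -10 -> '-10%'."""
    return f"{percent:+d}%"


def format_pitch(hertz: int) -> str:
    """Format a relative pitch change, always signed: -50 -> '-50Hz'."""
    return f"{hertz:+d}Hz"


async def try_voice(text: str, voice: str, out_path: str,
                    rate_percent: int = 0, pitch_hz: int = 0) -> None:
    """Synthesize text to an MP3 file with the given voice, rate, and pitch."""
    # Imported lazily so the formatting helpers work without edge-tts installed.
    import edge_tts

    communicate = edge_tts.Communicate(
        text,
        voice,
        rate=format_rate(rate_percent),
        pitch=format_pitch(pitch_hz),
    )
    await communicate.save(out_path)


# Example call (needs network access and `pip install edge-tts`):
# asyncio.run(try_voice("Sample text", "zh-CN-YunjianNeural", "sample.mp3", pitch_hz=-50))
```

Running the example with a few different `pitch_hz` and `rate_percent` values is a quick way to compare voices before converting a whole book.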
> I'm a bit concerned that elements tagged h1, h2, or h3 might not be section titles. However, it's not a big issue, and if there is indeed a problem, we can fix it later.
Where did you get your audiobook? I have converted my two favorite books. The content of each page's <head> element was somehow cleared, but their section info is designed to be in the h1 or h2 element, so I can still get the exact section title. With the previous code, I got the first 100 characters instead.
soup sample:
<?xml version='1.0' encoding='utf-8'?><!DOCTYPE html>
<html epub:prefix="z3998: http://www.daisy.org/z3998/2012/vocab/structure/#" lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head></head>
<body>
<h2>Section XX</h2>
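To make the fallback order concrete, here is a small sketch of the idea: try `title`, then `h1`, `h2`, `h3`, and fall back to the first 100 characters of text if none has content. To keep it dependency-free it uses the standard library's `html.parser` rather than the BeautifulSoup call in the PR, so it is an illustration of the approach and not the PR's code. `extract_title` and `_TitleCollector` are hypothetical names, and the 100-character fallback is an assumption based on the remark above about the previous behavior.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional

TITLE_LEVELS = ["title", "h1", "h2", "h3"]


class _TitleCollector(HTMLParser):
    """Collect the text of the first occurrence of each candidate tag."""

    def __init__(self) -> None:
        super().__init__()
        self.found: Dict[str, str] = {}
        self.body_text: List[str] = []
        self._open_tag: Optional[str] = None
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Only track the first occurrence of each candidate tag.
        if self._open_tag is None and tag in TITLE_LEVELS and tag not in self.found:
            self._open_tag = tag
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == self._open_tag:
            self.found[tag] = "".join(self._buffer).strip()
            self._open_tag = None

    def handle_data(self, data):
        if self._open_tag is not None:
            self._buffer.append(data)
        # Keep all non-<title> text for the last-resort fallback.
        if self._open_tag != "title":
            self.body_text.append(data)


def extract_title(markup: str, max_fallback_chars: int = 100) -> str:
    """Return the first non-empty title candidate, else a text prefix."""
    collector = _TitleCollector()
    collector.feed(markup)
    collector.close()
    for level in TITLE_LEVELS:
        text = collector.found.get(level, "")
        if text:
            return text
    # Nothing usable: collapse whitespace and take the leading characters.
    return " ".join("".join(collector.body_text).split())[:max_fallback_chars]
```

On the sample above, the empty `<head>` yields no title, so the `<h2>` text is used. A stricter variant could skip headings that look too long to be section titles, which would address the concern about h1 to h3 elements that are not titles.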
Edge TTS and Azure TTS are almost the same, but Edge TTS doesn't require an API key because it's based on Edge's Read Aloud functionality, so it's free to use.