Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TTS: Text to speech support #1808

Merged
merged 95 commits into from Sep 1, 2019

Conversation

@vyzigold
Copy link
Contributor

commented Aug 14, 2019

This pull request adds text to speech support on Linux, Windows and macOS (author of the macOS text to speech manager is @criezy ). It also uses this to add text to speech to the mortevielle engine, to the GUI (after allowing it in options) and it adds text to speech tests to the Testbed engine. This is a GSoC task described here.

To run the text to speech on Linux you need:

  • Install and configure speech-dispatcher, either through your package manager, or from github

  • Install one (or more) text to speech engine, supported by speech-dispatcher are: eSpeak, Festival, Flite, Pico. I tested it (end it worked) with eSpeak-ng and Festival.

Once you are successfuly able to use the speech-dispatcher to speak from the terminal, everything should be set up correctly to work with ScummVM. For more info on configuring speech-dispatcher, look at speech-dispatcher documentation

To run the text to speech on Windows you need:
(Do all of these steps even for MinGW-w64, which has it's own SAPI. Their implementation unfortunately doesn't have all the needed features.)

  • Install SAPI, for example from here

  • From the Program Files*\Microsoft Speech SDK 5.*\Include copy the sapi.h, sapiddk.h and sperror.h to your include directory, so the compiler finds them.

  • From the Program Files*\Microsoft Speech SDK 5.*\Lib*\ copy the sapi.lib to your lib directory, so the compiler finds them. (with MinGW-w64 delete or rename libsapi.a in the MinGW-w64's lib folder)

  • You have to explicitly enable the TTS when running configure script with the --enable-tts option. Or if you are using Visual Studio, run the create_project with --enable-tts option.

The text to speech is disabled by default on Windows, because the compilation currently ends with a warning: "Warning: corrupt .drectve at end of def file", which is caused by using a msvc compiled library (sapi.lib) with MinGW. Even with the warning, everything seems to work correctly.

@vyzigold

This comment has been minimized.

Copy link
Contributor Author

commented Aug 14, 2019

There are 2 things, I would like to know the opinion of the community about.

  1. sphelper.h
    I use some functions from SAPI's sphelper.h. But because it is written to work with msvc only, I had to modifie it and create sphelper-scummvm.h so it works with MinGW too. Currently there are some warnings when compiling, which I could try to correct if needed. But I think it might be a better idea to leave the file as close to the original sphelper.h as possible, so that in the future, when there is a new version of SAPI, it is hopefuly easier to transition to that new version.

  2. GUI TTS enabling
    I added a new tab to the options (Accessibility), in which is currently only one checkbox, which enables or disables the TTS. I thought there might be some more accessibility features added in the future (it's kind of mentioned in the GSoC task description), so this tab could be used for that. But maybe it might be a better idea to remove the Accessibility tab and move the checkbox to the Misc tab until there are more Accessibility options.

@@ -81,7 +81,7 @@ bool confirmWindowsVersion(int majorVersion, int minorVersion) {
return VerifyVersionInfoFunc(&versionInfo, VER_MAJORVERSION | VER_MINORVERSION, conditionMask);
}

wchar_t *ansiToUnicode(const char *s, uint codePage) {
wchar_t *ansiToUnicode(const char *s, unsigned int codePage) {

This comment has been minimized.

Copy link
@bluegr

bluegr Aug 14, 2019

Member

uint IS an assigned int: it’s defined as such in scummsys.h. So what’s the purpose of this change?

This comment has been minimized.

Copy link
@vyzigold

vyzigold Aug 15, 2019

Author Contributor

It didn't know the type when compiling with Visual Studio. I guess the scummsys.h wasn't included before this. But the fix seemed pretty easy (uint and unsigned int should be the same), so I changed it and didn't worry about it.

Is this something, I should change back?

This comment has been minimized.

Copy link
@bluegr

bluegr Aug 16, 2019

Member

Yes, please try and use the ScummVM specific types, they are defined appropriately for each platform we support

pthread_mutex_unlock(params->mutex);
return NULL;
}
if(spd_say(_connection, SPD_MESSAGE, params->speechQueue->front().c_str()) == -1) {

This comment has been minimized.

Copy link
@bluegr

bluegr Aug 14, 2019

Member

Spacing


// init voice
hr = CoCreateInstance(CLSID_SpVoice, NULL, CLSCTX_ALL, IID_ISpVoice, (void **)&_voice);
if (!SUCCEEDED(hr)) {

This comment has been minimized.

Copy link
@bluegr

bluegr Aug 14, 2019

Member

You use FAILED() above... if these two functions are mutually exclusive, you should use one or the other


_voice->SetOutput(_audio, FALSE);

if (_ttsState->_availableVoices.size() > 0)

This comment has been minimized.

Copy link
@bluegr

bluegr Aug 14, 2019

Member

Can’t you use empty() here?

configure Outdated Show resolved Hide resolved
@criezy

This comment has been minimized.

Copy link
Member

commented Aug 14, 2019

I posted this in Discord/IRC last week as a teaser, but not everybody may have seen it. So here it is in action (on macOS):
TTS in the GUI: https://www.youtube.com/watch?v=z5T3Uh0zu1I
TTS in Mortville Manor: https://www.youtube.com/watch?v=DG00_a5qVKU&t=39s

@bluegr

This comment has been minimized.

Copy link
Member

commented Aug 15, 2019

Overall, pretty good work :) Well done!

@criezy: which speech engine is used in the videos?

@criezy

This comment has been minimized.

Copy link
Member

commented Aug 15, 2019

In the video it is using the macOS speech engine (the one that comes with the system).
@vyzigold has some slightly older recordings on both Linux and Windows in his youtube channel.

@vyzigold

This comment has been minimized.

Copy link
Contributor Author

commented Aug 15, 2019

@bluegr Thank you for the review, I think I addressed all of the issues you pointed out. As for the recordings on my youtube channel, even though they are older, everything on the videos should still be working the same.

@vyzigold

This comment has been minimized.

Copy link
Contributor Author

commented Aug 19, 2019

@bluegr I added more documentation to the TTSVoice and a comment to the malloc. I also changed the unsigned int back to uint in the win32_wrapper. I hope everything is allright now.

@bluegr

This comment has been minimized.

Copy link
Member

commented Aug 20, 2019

Many thanks! This looks OK now, well done, and nice work overall!

+1 from me

@bluegr

This comment has been minimized.

Copy link
Member

commented Aug 20, 2019

One more question: I do not understand why LinuxTextToSpeechManager has been renamed to SpeechDispatcherManager. Now its name is different from the other two (the Windows and the MacOS one).

Could someone explain the rationale here?

@vyzigold? @criezy?

@bluegr
bluegr approved these changes Aug 20, 2019
@criezy

This comment has been minimized.

Copy link
Member

commented Aug 21, 2019

@bluegr I suggested to rename it as this specific implementation uses the speech dispatcher library, but we could envisage other implementations on Linux. This is a bit similar to the UnityTaskbarManager having been named as it is and not LinuxTaskbarManager. On the other hand the macOS and Windows implementations use the TTS API from the system rather than a third party library.

@bluegr

This comment has been minimized.

Copy link
Member

commented Aug 24, 2019

@criezy: thanks for the explanation :)

@bluegr

This comment has been minimized.

Copy link
Member

commented Aug 25, 2019

This is now in a good state to be merged to master, IMHO. Is there any other pending issue? If not, I would like to have this merged by next weekend (Sunday, 1st of September)

@vyzigold

This comment has been minimized.

Copy link
Contributor Author

commented Aug 25, 2019

It has merge conflicts because of the Common::Encoding PR #1809. And since the Common::Encoding is merged and the TTS needs to convert between encodings, I would like to use that. After that, I don't know of any issue.

I will resolve the conflicts and reimplement the encoding conversions using Common::Encoding tomorrow, since today I am quite busy.

vyzigold added 13 commits Aug 12, 2019
TTS: Document diferences in resume()
On linux the resume() behaves slightly differently than on
other platforms.
TTS: Implement our own queuing on windows.
Similarly as on linux, there isn't enough control of the speech
queue to properly implement INTERRUPT_NO_REPEAT. So since this
commit we use our own queuing and use SAPI to speak each speech.
This is done outside the main thread.
TTS: Refactoring
Refactoring as suggested by bluegr on github.
TTS: Remove USE_PLATFORM_TTS defines
Use defined(USE_TTS) && defined(PLATFORM) instead
TTS: Rename LinuxTextToSpeechManager to SpeechDispatcherManager
Add a new define for the SpeechDispatcherManager

@vyzigold vyzigold force-pushed the vyzigold:tts branch 2 times, most recently from 2e0839b to 6fce403 Aug 26, 2019

@vyzigold

This comment has been minimized.

Copy link
Contributor Author

commented Aug 26, 2019

I rebased the branch on top of the current master. I replaced the character encoding conversion code with the Common::Encoding::convert(), which should support more encodings. And because this pull request adds speech synthesis to the mortevielle engine, which uses CP850 character encoding and Common::Encoding didn't support this encoding when compiling without iconv and using SDL, which is also compiled without iconv, I added a CP850 to unicode (and back) conversion table to the Common::Encoding.

vyzigold added 2 commits Aug 26, 2019
COMMON: Add CP850 conversion.
CP850 is used by the mortevielle engine (and apparently by other
engines too). Anytime an engine using CP850 encoding wants to use
the TTS, the encoding has to be converted, so this is pretty
important encoding conversion to support. Unfortunately SDL
(when compiled without iconv) doesn't support this encoding
(which means, there might not be a way to convert this encoding
on some platforms), so I added a conversion table for this.
@bluegr

This comment has been minimized.

Copy link
Member

commented Aug 27, 2019

Apart from codepage 850, there’s also 437. Perhaps this should be supported natively as well?

https://en.m.wikipedia.org/wiki/Code_page_437

@neuromancer

This comment has been minimized.

Copy link

commented Aug 29, 2019

is it planed to offer several voices for different "actors" in the game engines? (if most engines allow such distinction)

@vyzigold

This comment has been minimized.

Copy link
Contributor Author

commented Aug 29, 2019

Yes, but it is not guaranteed, there might not be any voice available in some situations. This is why in the mortevielle engine we use as many voices as are available, but we also change the pitch a little bit for each character (so the voice sounds at least a little bit different, if there are not enough voices for each character). We also change the pitch a lot more, if we have to use a voice with a different gender, than the character. So if for example there are only male voices available, but we have a female character, we use a male voice, but we set the pitch really high.

@bluegr

This comment has been minimized.

Copy link
Member

commented Sep 1, 2019

Great work! This can be merged now

@bluegr bluegr merged commit d4a98dc into scummvm:master Sep 1, 2019

1 of 2 checks passed

Codacy/PR Quality Review Not up to standards. This pull request quality could be better.
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@digitall

This comment has been minimized.

Copy link
Member

commented Sep 2, 2019

@vyzigold : Thanks for the work on TTS. However, this has broken the OSX Intel build on the buildbot. I think the issue is that the SDK we use is for OSX 10.4 which is the earliest version we support and thus the TTS API may have changed or be not present in that version. Any ideas, apart from just dropping support for earlier OSX versions which is not optimal...
See following for details:
https://buildbot.scummvm.org/builders/master-osx_intel/builds/1896/steps/compile/logs/stdio
https://wiki.scummvm.org/index.php?title=Mac_OS_X

@criezy

This comment has been minimized.

Copy link
Member

commented Sep 2, 2019

Indeed, some of the API I used to implement TTS on macOS were introduced in OSX 10.5. Since this is the minimal version supported for the official releases, I did not think buildbot was using an older SDK (the PPC official release does support OS X 10.4 though, but buildbot doesn't have a OS X PPC build).

I can look at trying to fix this so that we have TTS support on OS X 10.4. But my suggestion would be instead to:

  • Add a configure test to disable TTS support on older OS X versions.
  • Update the toolchain and SDK on buildbot to use the 10.5 SDK. Bringing it in line with what I am using for the official releases would be a good idea anyway I think. I have also been thinking of adding the toolchain for the 64 bits / OS X 10.7 builds).
  • In the meantime we could explicitly disable TTS for the macOS build on buildbot (with --disable-tts).
@vyzigold

This comment has been minimized.

Copy link
Contributor Author

commented Sep 2, 2019

@digitall I noticed, that the OSX build is broken by this pull request, but because I don't have anything with OSX and I have never seen any Objective-C++ code before this GSoC, I wasn't able to do anything about it. I hope criezy manages to fix the code, so it works with the 10.4 SDK, otherwise it will probably have to be disabled by the configure script for older versions.

@digitall

This comment has been minimized.

Copy link
Member

commented Sep 2, 2019

@vyzigold : Ah OK. No problem then. I am sure @criezy and I can do something to sort out the OSX builds on buildbot, so not a problem for you.

@digitall

This comment has been minimized.

Copy link
Member

commented Sep 2, 2019

@criezy: I think it would be best to bring the buildbot up to date to 10.5 SDK as per your release builds and add a 64-bit OSX Intel build since we are already getting bug reports from the OSX 10.15 dropping of 32-bit build support: https://bugs.scummvm.org/ticket/11136

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.