TTS: Text to speech support #1808

Open
wants to merge 91 commits into
base: master
from

Conversation

@vyzigold
Contributor

commented Aug 14, 2019

This pull request adds text to speech support on Linux, Windows and macOS (the macOS text to speech manager was written by @criezy). It also uses this support to add text to speech to the mortevielle engine and to the GUI (once it is enabled in the options), and it adds text to speech tests to the Testbed engine. This is a GSoC task described here.

To run the text to speech on Linux you need:

  • Install and configure speech-dispatcher, either through your package manager or from GitHub

  • Install one (or more) text to speech engines. speech-dispatcher supports eSpeak, Festival, Flite and Pico. I tested it (and it worked) with eSpeak-ng and Festival.

Once you can successfully use speech-dispatcher to speak from the terminal, everything should be set up correctly to work with ScummVM. For more info on configuring speech-dispatcher, look at the speech-dispatcher documentation.

To run the text to speech on Windows you need:
(Do all of these steps even for MinGW-w64, which has its own SAPI. Its implementation unfortunately doesn't have all the needed features.)

  • Install SAPI, for example from here

  • From Program Files*\Microsoft Speech SDK 5.*\Include, copy sapi.h, sapiddk.h and sperror.h to your include directory, so the compiler finds them.

  • From Program Files*\Microsoft Speech SDK 5.*\Lib*\, copy sapi.lib to your lib directory, so the linker finds it. (With MinGW-w64, delete or rename libsapi.a in MinGW-w64's lib folder.)

  • Explicitly enable TTS by passing the --enable-tts option to the configure script. If you are using Visual Studio, run create_project with the --enable-tts option.

The text to speech is disabled by default on Windows, because the compilation currently ends with a warning: "Warning: corrupt .drectve at end of def file", which is caused by using an MSVC-compiled library (sapi.lib) with MinGW. Even with the warning, everything seems to work correctly.

vyzigold added some commits Jul 10, 2019

TTS: Add TTS checkbox to Options
Probably works only in the builtin theme right now.
TTS: Restrict TTS on linux to only english
Unfortunately the encoding used by ScummVM breaks
speech-dispatcher, so after trying to say a non-ASCII character
the connection has to be restarted. So for now I am restricting
the GUI TTS to English only.
TTS: Convert strings to UTF-8
Conversion happens only for languages that might need it (not
for English)
TTS: Prepare for windows TTS
Add windows configuration in configure
Add basic skeleton to backends
Check if ttsMan is initialized in GUI
TTS: Add reference counting to TTSVoice
Also refactor TTSVoice destruction to use this reference counting.
TTS: Fix voice setting on startup
ScummVM was crashing because of an assert when there were fewer
voices available than what was set in the ConfMan.

Now the voice just falls back to 0th voice, if there are not
enough voices.
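
The fallback described in the last commit above could look roughly like the sketch below. `pickVoiceIndex` and its parameters are hypothetical names, not taken from the pull request; the real code reads the configured value from the ConfMan.

```cpp
#include <cstddef>

// Hypothetical helper: returns the voice index to use on startup.
// `configured` is the index stored in the configuration, `available`
// is the number of voices the backend reported.
std::size_t pickVoiceIndex(int configured, std::size_t available) {
    // Fall back to the 0th voice instead of asserting when the stored
    // index is negative or beyond the list of available voices.
    if (configured < 0 || static_cast<std::size_t>(configured) >= available)
        return 0;
    return static_cast<std::size_t>(configured);
}
```

Note that with zero available voices index 0 is still invalid, so a caller would have to check for an empty voice list separately.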

vyzigold and others added some commits Aug 2, 2019

TTS: Implement our own queuing for linux
It seems that at least some versions of speech-dispatcher
aren't able to successfully pause and resume. For me, when trying
to pause, it still finishes the speech currently being said instead
of pausing it, and then it puts it at the end of the speech queue
with some speech-dispatcher internal commands added to it, which
are also audible.

There is no way to find out where the speech ended when calling
pause, so it is just stopped, and when resume is called it is
read from its start again.
TESTBED: Make sure to process events while waiting for speech to finish
Some implementations of TextToSpeechManager may require system events
to be processed for the state synchronisation to work properly.

This commit also fixes a few typos or inconsistencies in some texts.
TTS: Implement *_NO_REPEAT actions and Fix state synchronization issues on macOS

The NSSpeechSynthesizer is asynchronous and does not immediately start, pause,
or stop the speech. As a result, querying the state of the NSSpeechSynthesizer
does not always return the expected result (for example isSpeaking may not
yet be true just after we requested starting to speak). So instead the
TextToSpeechManager on macOS keeps track of the state itself.
TTS: Add proper speech queuing, update INT_NO_REP.
Before, I used SPD to queue messages and kept a copy of the queue,
so I could requeue everything when resume() is called. But more
control of the queue is needed, so I don't use SPD's queue
and instead start speeches from my queue one by one from another
thread.

INTERRUPT_NO_REPEAT now behaves as described in the documentation
TTS: Document differences in resume()
On Linux, resume() behaves slightly differently than on
other platforms.
TTS: Implement our own queuing on windows.
As on Linux, there isn't enough control of the speech
queue to properly implement INTERRUPT_NO_REPEAT. So since this
commit we use our own queuing and use SAPI to speak each speech.
This is done outside the main thread.
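
A minimal sketch of the "own queue plus worker thread" approach described in the queuing commits above, using only the C++ standard library. `SpeechQueueSketch` and the `speak` callback are hypothetical; the callback stands in for a blocking call into speech-dispatcher or SAPI, and nothing here is copied from the actual backends.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical sketch: the manager owns the queue and hands the engine
// one utterance at a time from a worker thread.
class SpeechQueueSketch {
public:
    explicit SpeechQueueSketch(std::function<void(const std::string &)> speak)
        : _speak(std::move(speak)), _quit(false),
          _worker(&SpeechQueueSketch::run, this) {}

    // Drains whatever is still pending, then joins the worker.
    ~SpeechQueueSketch() {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _quit = true;
        }
        _wake.notify_one();
        _worker.join();
    }

    // Appends an utterance; callable from the main thread.
    void queue(const std::string &text) {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _pending.push_back(text);
        }
        _wake.notify_one();
    }

    // Drops everything that has not started yet.
    void clear() {
        std::lock_guard<std::mutex> lock(_mutex);
        _pending.clear();
    }

private:
    void run() {
        for (;;) {
            std::string text;
            {
                std::unique_lock<std::mutex> lock(_mutex);
                _wake.wait(lock, [this] { return _quit || !_pending.empty(); });
                if (_pending.empty())
                    return; // only reachable once _quit is set
                text = _pending.front();
                _pending.pop_front();
            }
            _speak(text); // blocking call, made without holding the lock
        }
    }

    std::function<void(const std::string &)> _speak;
    std::mutex _mutex;
    std::condition_variable _wake;
    std::deque<std::string> _pending;
    bool _quit;
    std::thread _worker; // declared last so the other members exist first
};
```

Because the pending entries live in the manager and not in the speech engine, operations such as requeueing on resume or comparing a new utterance against the queued ones can be done under the manager's own lock.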
@vyzigold

Contributor Author

commented Aug 14, 2019

There are two things I would like the community's opinion on.

  1. sphelper.h
    I use some functions from SAPI's sphelper.h. But because it is written to work with MSVC only, I had to modify it and create sphelper-scummvm.h so it works with MinGW too. Currently there are some warnings when compiling, which I could try to correct if needed. But I think it might be a better idea to leave the file as close to the original sphelper.h as possible, so that in the future, when there is a new version of SAPI, it is hopefully easier to transition to that new version.

  2. GUI TTS enabling
    I added a new tab to the options (Accessibility), which currently contains only one checkbox that enables or disables the TTS. I thought more accessibility features might be added in the future (this is more or less mentioned in the GSoC task description), so this tab could be used for them. But it might be a better idea to remove the Accessibility tab and move the checkbox to the Misc tab until there are more Accessibility options.

}

void LinuxTextToSpeechManager::createVoice(int typeNumber, Common::TTSVoice::Gender gender, Common::TTSVoice::Age age, char *description) {
SPDVoiceType *type = (SPDVoiceType *) malloc(sizeof(SPDVoiceType));

@bluegr

bluegr Aug 14, 2019

Member

Why do you need to allocate memory for a simple type?

@vyzigold

vyzigold Aug 15, 2019

Author Contributor

Because I need to pass the value to the TTSVoice class as its data field, which is defined as void *

The data field in the TTSVoice is meant for platform-specific information about each voice. The WindowsTextToSpeechManager, for example, stores a pointer to a whole ISpObjectToken object there.

So because I needed a pointer to the value and I can't just pass a pointer to the stack, since the TTSVoice that is being created in this function outlives the function, I allocate space for it on the heap and copy it there.
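
The lifetime argument above can be shown with a small sketch. `VoiceRecord`, `VoiceTypeSketch`, `makeVoice`, `voiceType` and `destroyVoice` are hypothetical stand-ins for TTSVoice, SPDVoiceType and the code in createVoice; only the void * data field and the use of malloc come from the discussion.

```cpp
#include <cstdlib>

// Hypothetical stand-in for SPDVoiceType.
enum VoiceTypeSketch { kMale1 = 1, kFemale1 = 2 };

// Hypothetical stand-in for TTSVoice: platform data is an opaque pointer.
struct VoiceRecord {
    void *data;
};

VoiceRecord makeVoice(VoiceTypeSketch type) {
    // `type` is a local; storing &type would leave a dangling pointer once
    // this function returns. Copy the value to the heap so it outlives the call.
    VoiceTypeSketch *copy =
        static_cast<VoiceTypeSketch *>(std::malloc(sizeof(VoiceTypeSketch)));
    if (copy != nullptr)
        *copy = type;
    VoiceRecord voice;
    voice.data = copy; // stays null if the allocation failed
    return voice;
}

// Reads the stored value back; the caller must ensure data is not null.
VoiceTypeSketch voiceType(const VoiceRecord &voice) {
    return *static_cast<VoiceTypeSketch *>(voice.data);
}

// Whoever owns the record has to release the heap copy exactly once.
void destroyVoice(VoiceRecord &voice) {
    std::free(voice.data);
    voice.data = nullptr;
}
```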

@bluegr

bluegr Aug 16, 2019

Member

Thanks for the explanation. Could you please add a corresponding comment to the code?

@@ -81,7 +81,7 @@ bool confirmWindowsVersion(int majorVersion, int minorVersion) {
return VerifyVersionInfoFunc(&versionInfo, VER_MAJORVERSION | VER_MINORVERSION, conditionMask);
}

-wchar_t *ansiToUnicode(const char *s, uint codePage) {
+wchar_t *ansiToUnicode(const char *s, unsigned int codePage) {

@bluegr

bluegr Aug 14, 2019

Member

uint IS an unsigned int: it’s defined as such in scummsys.h. So what’s the purpose of this change?

@vyzigold

vyzigold Aug 15, 2019

Author Contributor

The type wasn't known when compiling with Visual Studio. I guess scummsys.h wasn't included before this. But the fix seemed pretty easy (uint and unsigned int should be the same), so I changed it and didn't worry about it.

Is this something I should change back?

@bluegr

bluegr Aug 16, 2019

Member

Yes, please try to use the ScummVM-specific types; they are defined appropriately for each platform we support.

pthread_mutex_unlock(params->mutex);
return NULL;
}
if(spd_say(_connection, SPD_MESSAGE, params->speechQueue->front().c_str()) == -1) {

@bluegr

bluegr Aug 14, 2019

Member

Spacing


// init voice
hr = CoCreateInstance(CLSID_SpVoice, NULL, CLSCTX_ALL, IID_ISpVoice, (void **)&_voice);
if (!SUCCEEDED(hr)) {

@bluegr

bluegr Aug 14, 2019

Member

You use FAILED() above... if these two functions are mutually exclusive, you should use one or the other
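
For reference, the two macros are exact complements: in the Windows headers SUCCEEDED(hr) tests hr >= 0 and FAILED(hr) tests hr < 0, so !SUCCEEDED(hr) and FAILED(hr) always agree. The portable sketch below mirrors that with stand-in names (`HResultSketch`, `succeededSketch` and `failedSketch` are not Windows APIs) so it compiles without the Windows SDK.

```cpp
#include <cstdint>

// Stand-in for HRESULT, which is a 32-bit signed value.
typedef std::int32_t HResultSketch;

const HResultSketch kOk = 0;     // same value as S_OK
const HResultSketch kFalse = 1;  // same value as S_FALSE (still a success code)
const HResultSketch kFail =
    static_cast<HResultSketch>(0x80004005u); // same bit pattern as E_FAIL

// Mirrors SUCCEEDED(hr): any non-negative value counts as success.
inline bool succeededSketch(HResultSketch hr) { return hr >= 0; }

// Mirrors FAILED(hr): the sign bit marks a failure.
inline bool failedSketch(HResultSketch hr) { return hr < 0; }
```

Since the two checks can never disagree, picking one of them for the whole file is purely a consistency matter.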


_voice->SetOutput(_audio, FALSE);

if (_ttsState->_availableVoices.size() > 0)

@bluegr

bluegr Aug 14, 2019

Member

Can’t you use empty() here?

backends/text-to-speech/windows/windows-text-to-speech.cpp Outdated
backends/text-to-speech/linux/linux-text-to-speech.cpp
* SPHelper.h *
*------------*
* Description:
* This is the header file for core helper functions implementation.

@bluegr

bluegr Aug 14, 2019

Member

Does this file have any changes from the original? If yes, which ones?

@vyzigold

vyzigold Aug 15, 2019

Author Contributor

It has changes; I had to make them so that it works with MinGW. I added a summary of the changes to the top of the file. I think documenting each changed line would be too much and would end up being quite confusing to read.

@bluegr

bluegr Aug 16, 2019

Member

Thanks, but where is this summary block? I can’t find it at the top of the file

@vyzigold

vyzigold Aug 17, 2019

Author Contributor

It is added by this commit: e21ebee

configure Outdated
_tts=auto
_linux_tts=no
_windows_tts=no
_macosx_tts=no

@bluegr

bluegr Aug 14, 2019

Member

Why do you need three defines here? You could just combine one define with the target platform

backends/text-to-speech/windows/windows-text-to-speech.cpp
@criezy

Member

commented Aug 14, 2019

I posted this in Discord/IRC last week as a teaser, but not everybody may have seen it. So here it is in action (on macOS):
TTS in the GUI: https://www.youtube.com/watch?v=z5T3Uh0zu1I
TTS in Mortville Manor: https://www.youtube.com/watch?v=DG00_a5qVKU&t=39s

@bluegr

Member

commented Aug 15, 2019

Overall, pretty good work :) Well done!

@criezy: which speech engine is used in the videos?

@criezy

Member

commented Aug 15, 2019

In the video it is using the macOS speech engine (the one that comes with the system).
@vyzigold has some slightly older recordings on both Linux and Windows on his YouTube channel.

vyzigold added some commits Aug 15, 2019

TTS: Refactoring
Refactoring as suggested by bluegr on github.
TTS: Remove USE_PLATFORM_TTS defines
Use defined(USE_TTS) && defined(PLATFORM) instead
@criezy

Member

commented on backends/module.mk in 4ad9d05 Aug 15, 2019

I have not tried, but I suspect that this might be an issue on POSIX systems other than Linux that have a different TTS system (such as macOS).

@criezy

Member

commented on backends/platform/sdl/posix/posix.cpp in 4ad9d05 Aug 15, 2019

This here will likely be an issue for classes that derive from OSystem_POSIX but have their own TTS system (such as OSystem_MacOSX ).

Contributor Author

replied Aug 15, 2019

Oh, I forgot that macOS derives from OSystem_POSIX. So it might have been a good idea to have USE_PLATFORM_TTS after all.

@criezy

Member

commented on 4ad9d05 Aug 15, 2019

I think you might want to keep a define in configure for the Linux TTS. But rather than USE_LINUX_TTS it could be more specific, such as USE_SPEECH_DISPATCHER (it might actually make sense to rename the LinuxTextToSpeechManager class as well). You can check how unity is handled in configure and in the code for the UnityTaskbarManager (this is a similar case, as we have the MacOSXTaskbarManager on macOS instead).
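
One way to read this suggestion: key the Linux backend on a library-specific define rather than on the platform, so that macOS, which also derives from OSystem_POSIX, does not pick it up. In the sketch below `USE_TTS` and `USE_SPEECH_DISPATCHER` come from the discussion; using `MACOSX` and `WIN32` as the platform macros and the `selectedTtsBackend` function itself are assumptions made for illustration.

```cpp
#include <string>

// Hypothetical helper that reports which TTS backend the build selected.
std::string selectedTtsBackend() {
#if defined(USE_TTS) && defined(USE_SPEECH_DISPATCHER)
    // Set by configure only when speech-dispatcher is the chosen backend,
    // independent of whether the platform is POSIX.
    return "speech-dispatcher";
#elif defined(USE_TTS) && defined(MACOSX)
    return "macosx";
#elif defined(USE_TTS) && defined(WIN32)
    return "windows";
#else
    // TTS disabled or no backend available for this platform.
    return "none";
#endif
}
```

With this layout the POSIX code path only needs to test the speech-dispatcher define, and the macOS class can create its own manager without any conflict.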

Contributor Author

replied Aug 15, 2019

Ok, I will do that. And rename LinuxTextToSpeechManager to what? SpeechDispatcherManager?

Member

replied Aug 15, 2019

Yes, SpeechDispatcherManager sounds good.

TTS: Rename LinuxTextToSpeechManager to SpeechDispatcherManager
Add a new define for the SpeechDispatcherManager
@vyzigold

Contributor Author

commented Aug 15, 2019

@bluegr Thank you for the review. I think I addressed all of the issues you pointed out. As for the recordings on my YouTube channel, even though they are older, everything in the videos should still work the same.
