Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename speech-related subsystems and add a sample scene for dictation #11348

Merged
merged 19 commits into from
Jan 13, 2023

Conversation

MaxWang-MS
Copy link
Contributor

@MaxWang-MS MaxWang-MS commented Jan 5, 2023

Overview

  • Rename *SpeechRecognitionSubsystem to *DictationSubsystem and rename *PhraseRecognitionSubsystem to *KeywordRecognitionSubsystem based on internal feedback.
  • Add a sample scene for DictationSubsystem.
  • Add default profiles for the recognition subsystems.

@Zee2
Copy link
Contributor

Zee2 commented Jan 6, 2023

Missing a SampleSceneHandMenu, make sure it has one so that folks can navigate through the scenes!

@Zee2
Copy link
Contributor

Zee2 commented Jan 6, 2023

The naming is starting to confuse me. If "SpeechRecognition" is actually speech-to-text, or dictation/transcription (instead of event-based recognitions like see-it-say-it) can we call it Dictation or something?

/// events fired by SpeechRecognitionSubsystem.
/// </summary>
[AddComponentMenu("MRTK/Examples/Speech Recognition Handler")]
public class SpeechRecognitionHandler : MonoBehaviour
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make it internal just so folks don't end up taking a dep on it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This script is a sample script (i.e. in Assembly-CSharp) so not sure if internaling it would do much.

@Zee2
Copy link
Contributor

Zee2 commented Jan 6, 2023

image

Mysterious threading error when exiting Play mode

@Zee2
Copy link
Contributor

Zee2 commented Jan 6, 2023

Huge lagspike when starting recognition

image

The StartRecognition method takes 230ms, most of it in DictationRecognizer.Create()

@Zee2
Copy link
Contributor

Zee2 commented Jan 6, 2023

Click the button to stop transcribing doesn't work, it just keeps going! I can take a video if you want, but clicking the button to stop it doesn't seem to have any effect at all

Zee2
Zee2 previously requested changes Jan 6, 2023
Copy link
Contributor

@Zee2 Zee2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully we can get the threading errors to go away. The only other thing I consider blocking is the fact that the button doesn't seem to stop transcription.

The perf spike is very unfortunate. Hopefully we can offload to a background thread.

@MaxWang-MS
Copy link
Contributor Author

@Zee2 I looked into the spike and tried to move the instantiate of the Unity DictationRecognizer to another thread, but unfortunately discovered that is not possible:
image

@MaxWang-MS
Copy link
Contributor Author

Fortunately, the spike only appears the first time the Unity recognizer gets created. I will be calling the recognizer constructor once in the subsystem constructor / Start() so that hopefully the hit is only taken at app launch if the user enables the subsystem via the profile.

@MaxWang-MS MaxWang-MS changed the title Add a sample scene for SpeechRecognitionSubsystem Rename speech-related subsystems and add a sample scene for dictation Jan 11, 2023
@MaxWang-MS
Copy link
Contributor Author

Fortunately, the spike only appears the first time the Unity recognizer gets created. I will be calling the recognizer constructor once in the subsystem constructor / Start() so that hopefully the hit is only taken at app launch if the user enables the subsystem via the profile.

Upon further investigation dummy initialization at start does not work. Will file a ticket to Unity regarding the time spike.

@MaxWang-MS
Copy link
Contributor Author

Click the button to stop transcribing doesn't work, it just keeps going! I can take a video if you want, but clicking the button to stop it doesn't seem to have any effect at all

Now fixed!

@MaxWang-MS
Copy link
Contributor Author

image

Mysterious threading error when exiting Play mode

Fixed as well.

{
return XRSubsystemHelpers.GetFirstRunningSubsystem<PhraseRecognitionSubsystem>() as PhraseRecognitionSubsystem;
return XRSubsystemHelpers.GetFirstRunningSubsystem<KeywordRecognitionSubsystem>() as KeywordRecognitionSubsystem;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: is the as KeywordRecognitionSubsystem part needed? I thought this helper returned the same type as the passed-in T

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wonder if we want to move this to the new pattern @Zee2 is introducing for the hands aggregator in #11333: https://github.com/microsoft/MixedRealityToolkit-Unity/pull/11333/files#diff-6bb0070ef5fe1b4e52210a345ab2f2c2f693e0b9cff69e18e05456aef8969376R160

Basically, deprecating the old helper from HandsUtils and putting it into XRSubsystemHelpers directly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!

/// <summary>
/// Event data associated with the result of dictation.
/// </summary>
public class DictationResultEventArgs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these event args reused or one-time-use? if the latter, could they be rewritten as readonly structs to more clearly express the design intent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now rewritten as readonly structs!

@MaxWang-MS MaxWang-MS enabled auto-merge (squash) January 12, 2023 23:52
@MaxWang-MS MaxWang-MS merged commit 0421c57 into microsoft:mrtk3 Jan 13, 2023
@MaxWang-MS MaxWang-MS deleted the add-speech-scene branch January 13, 2023 01:24
drusk-unity pushed a commit to drusk-unity/MixedRealityToolkit-Unity that referenced this pull request Jun 26, 2023
…microsoft#11348)

* Add a sample scene for SpeechRecognitionSubsystem

* Feedback

* Update WindowsSpeechRecognitionSubsystem.cs

* Interface and subsystem rename part 1

* Interface and subsystem rename part 2

* Interface and subsystem rename part 3

* Rename configs

* More renaming

* Rename sample scene

* Update DictationExample.unity

* Feedback

* Add private setters to expose properties in the inspector

* Add sample scene hand menu

* Better display of error message

Co-authored-by: Finn Sinclair <finnnorth@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants