Schema for specifying how to pronounce text #2108
As a supporting use case for this, Google has been observed returning pronunciation information as a vertical in the SERPs. Thinking out loud: might it be useful to provide provenance information for the proposed TextPronunciation type here, i.e. from what source the data is provided? This could be accomplished by extending the domain of citation to include TextPronunciation (it would be unnecessary for audio.AudioObject, since it's already available to AudioObject).
Also, looking at the Google example, might the language and country to which the pronunciation applies need to be captured? E.g. the en_US pronunciation of "lieutenant" is different from the en_UK pronunciation; and, referencing the provided example for City, "Montreal" is pronounced differently in English and in French.
I was discussing this capability with someone recently but hadn't translated it into an issue yet. Things that came up in the discussion included:
I can also see the usefulness of adding an audio property. This would potentially lend further support to the proposal in PR #1774 to add audio and video as properties of Thing alongside image, which would negate the need for this type-specific range extension for audio.
This is some of what Wikidata has as other forms ... https://www.wikidata.org/wiki/Wikidata:Lexicographical_data/Documentation
There, @RichardWallis has mentioned two of them so far. Also, I would assume that Wikidata's existing IPA transcription property (P898) would be a "sameAs" property for the proposed "phoneticText"?
Adding something like namePronunciation and PhoneticText makes sense. Since pronunciation varies with geography and time (and other dimensions), we also need to capture that. Adding attributes such as usedFrom, usedUntil, and usedInRegion to PhoneticText would solve this in a simple way, and it's extensible. Richard's solution of using BCP 47 is elegant, though it only captures one dimension.
It seems like this is replicating the Speech Synthesis Markup Language (SSML), which covers IPA and a range of other spoken-word settings. SSML is already being used widely by the major voice interaction platforms.
This looks like reinventing wheels where there are a couple of pretty good ones already. I think we should keep out of it.
+1 to @nicolastorzec's suggestion to clarify how to combine SSML and schema.org. Authors find it hard enough to combine types within schema.org. I don't expect them to work in another markup language without some help.
I share the concerns about reinventing wheels, but I also share the concerns about helping authors get their heads around using multiple vocabularies. In the same way we delegate to ISO 8601 for date formats, could we not simply delegate to SSML for usage guidance on a new, simple utility type? The suggested PhoneticText subtype of Text could be shaped with properties matching those of SSML's phoneme. With a few well-crafted examples we could satisfy the need without major vocabulary engineering, linking to a current wheel without inventing a new one.
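A minimal sketch of that delegation (hypothetical: the PhoneticText type and its alphabet/ph properties simply mirror SSML's phoneme attributes and are not defined in schema.org; the IPA value is approximate):

```json
{
  "@context": "https://schema.org",
  "@type": "City",
  "name": "Montreal",
  "namePronunciation": {
    "@type": "PhoneticText",
    "alphabet": "ipa",
    "ph": "ˌmʌntriˈɔːl"
  }
}
```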
Wikidata provides properties that could be emulated, together with repeatable examples.
Hello. I'm a member of the W3C Spoken Pronunciation Task Force. We're in the early going, but I've mocked up a couple of implementations using a schema that resembles/copies properties from SSML. Notable ones (JSON-LD and Ruby+Microdata) here: I'd really like your input, since if one of these schema-based use cases is approved, it will likely become a normalized spec. One issue I haven't addressed is adding
Returning to this and looking at how SSML is being used, the simplest thing would be to add a property to specify the phonetic system used and then specify the appropriate string. So the above example becomes:
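A sketch of that shape (the property names speechToTextMarkup and phoneticText are illustrative, as is the IPA string):

```json
{
  "@context": "https://schema.org",
  "@type": "City",
  "name": "Montreal",
  "namePronunciation": {
    "@type": "TextPronunciation",
    "text": "Montreal",
    "speechToTextMarkup": "IPA",
    "phoneticText": "ˌmʌntriˈɔːl"
  }
}
```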
Using SSML for a different example, markup might look like:
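For reference, the corresponding construct in SSML itself is the phoneme element (this is standard SSML, not schema.org markup; the tomato example follows the one used in the SSML specification):

```xml
<speak>
  You say <phoneme alphabet="ipa" ph="təˈmeɪtoʊ">tomato</phoneme>,
  I say <phoneme alphabet="ipa" ph="təˈmɑːtoʊ">tomato</phoneme>.
</speak>
```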
I support moving forward on this. Questions regarding this simple proposal:
Richard, Vicki's example is for a US-specific name, but it may help to indicate the language explicitly.
I think for some scenarios, data consumers may want alternative pronunciations. Many words are pronounced differently in the US and UK, in some cases dramatically so (e.g., vase).
@MichaelAndrews-RM is spot-on.
@MichaelAndrews-RM @jaygray0919 Multiple pronunciations of the same name could work simply thus:
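For example (a hypothetical sketch: inLanguage as the locale qualifier and the IPA strings are illustrative):

```json
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Garcia",
  "namePronunciation": [
    {
      "@type": "TextPronunciation",
      "inLanguage": "en-US",
      "phoneticText": "ɡɑːrˈsiːə"
    },
    {
      "@type": "TextPronunciation",
      "inLanguage": "es-ES",
      "phoneticText": "ɡaɾˈθi.a"
    }
  ]
}
```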
However, it could soon get complex for an entity with several names in different languages such as:
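Something like this hypothetical sketch, where name and namePronunciation become parallel arrays (IPA values approximate):

```json
{
  "@context": "https://schema.org",
  "@type": "City",
  "name": [
    { "@value": "Montreal", "@language": "en" },
    { "@value": "Montréal", "@language": "fr" }
  ],
  "namePronunciation": [
    {
      "@type": "TextPronunciation",
      "inLanguage": "en",
      "phoneticText": "ˌmʌntriˈɔːl"
    },
    {
      "@type": "TextPronunciation",
      "inLanguage": "fr",
      "phoneticText": "mɔ̃ʁeal"
    }
  ]
}
```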
Having effectively two arrays of properties (name, namePronunciation) that may or may not be in sync might be confusing, but I can't see another way to do it. Whatever we do should include at least one example demonstrating how multiple pronunciations of a single name, and names in multiple languages, should be described.
@RichardWallis @MichaelAndrews-RM A draft thought: Another variation here: |
Is there any possibility of appending a language-tagged string, such as
@jaygray0919 Are you suggesting something like this?
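Perhaps using JSON-LD's native language tagging on the phonetic strings (a sketch; the property name phoneticText is taken from the proposal above, and consumer support is the open question):

```json
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Paul Grenier",
  "phoneticText": [
    { "@value": "pɔl ɡɹenɪəʳ", "@language": "en" },
    { "@value": "pɑl ɡʁə.nje", "@language": "fr" }
  ]
}
```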
Elegant, but it has limits. A use case that envisages a single name in a single language, as addressed by @vholland, is easily satisfied by the name/namePronunciation proposal. Unfortunately, it would not be very usable once multiple locales and/or languages enter the use case. From a vocabulary point of view, the introduction of a new datatype would be simpler: a plain name string could then be replaced by a structured pronounceable value. It would, however, require work from data consumers to be able to recognise it.
@RichardWallis: what can we do to help push your idea forward? As an aside, we cannot use a structure like
I'm not sure the locale is important. After all, we're supplying the IPA pronunciation; the main thing is matching the phonetics used to the voice "pack" available to a TTS client. Here I'm offering my own name as I hear it pronounced in English-speaking countries and in French-speaking countries, as well as my personal preference, since I'm from the US.

```json
{
  "@context": "http://schema.org",
  "@type": "Person",
  "name": {
    "@type": "PronounceableText",
    "value": "Paul Grenier",
    "speechToTextMarkup": "IPA",
    "defaultLanguage": "en",
    "en": {
      "phoneticText": "/pɔl ɡɹenɪəʳ/"
    },
    "fr": {
      "phoneticText": "/pɑl ɡʁə.nje/"
    }
  },
  "sameAs": "https://github.com/AutoSponge"
}
```

Does this help with the multiple translations issue?
@AutoSponge, in our use case your proposal won't work. Here is the reason: at run time, when we expose the data to a processor, we need to serve data according to a language selector. That means we need a compound key composed of
@jaygray0919 I changed |
I appreciate your comments, @AutoSponge. My point about the compound key was to say that we need to make a statement that is a combination of a specific and unique "Type+Language+ID", where the ID structure holds the
@jaygray0919 Understood. Just trying to find common ground. But at this point, I think the W3C will gravitate toward SSML, considering its widespread support. Exactly how the SSML semantics will appear in a JSON and/or Microdata format is still under discussion.
@RichardWallis @AutoSponge Any progress here? We would like to integrate a solution here with |
Like many a proposal, this thread seems to have gone quiet. I could wake it up again by creating a Pull Request for my proposal for the introduction of a new datatype; a plain name string could then be replaced by:
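A sketch of that replacement (the property names textValue, phoneticText, speechToTextMarkup, and inLanguage follow the PronounceableText type as eventually released; the values are illustrative):

```json
{
  "@context": "https://schema.org",
  "@type": "City",
  "name": {
    "@type": "PronounceableText",
    "textValue": "Montreal",
    "speechToTextMarkup": "IPA",
    "phoneticText": "ˌmʌntriˈɔːl",
    "inLanguage": "en"
  }
}
```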
That is, if folks are happy that it would fit [most] use cases and, once implemented, would actually be used, especially by data consumers.
@RichardWallis We can work with your approach.
Created Pull Request #2352
The PR has been merged (thanks @RichardWallis :). Release text is
Quick question: I still don't see TextPronunciation on webschemas.org or in its pending layer. I see that #2376 includes it, though it's not checked off yet. Is this no longer planned for 6.0?
It was renamed in process to PronounceableText and is visible on webschemas.org in the preview of V6.0. |
Included in schema.org 6.0 |
As devices are reading text more and more, there is a greater need for specifying how to pronounce a bit of text. I realize this could get complicated fast. As most of the cases I have heard revolve around names, I propose the following to start exploring how to specify pronunciation.

A new property, namePronunciation, on Thing. The property expects the new type TextPronunciation. TextPronunciation has the following properties:

- text: The text to be pronounced.
- phoneticText: The phonetic representation of the text property in IPA.
- audio: An AudioObject that gives the pronunciation.

An example would be:
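A hypothetical example of the proposed markup (the IPA string for the name Siobhan and the audio URL are illustrative):

```json
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Siobhan",
  "namePronunciation": {
    "@type": "TextPronunciation",
    "text": "Siobhan",
    "phoneticText": "ʃɪˈvɔːn",
    "audio": {
      "@type": "AudioObject",
      "contentUrl": "https://example.com/siobhan.mp3"
    }
  }
}
```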