
Schema for specifying how to pronounce text #2108

Closed
vholland opened this issue Dec 20, 2018 · 35 comments
Comments

@vholland (Contributor) commented Dec 20, 2018

As devices read text aloud more and more, there is a growing need to specify how to pronounce a piece of text. I realize this could get complicated fast. As most of the cases I have heard about revolve around names, I propose the following to start exploring how to specify pronunciation.

  • Create a new property namePronunciation on Thing. The property expects the new type TextPronunciation.
  • TextPronunciation has the following properties:
      • text: The text to be pronounced.
      • phoneticText: The phonetic representation of the text property, in IPA.
      • audio: An AudioObject that gives the pronunciation.

An example would be:

{
  "@context": "http://schema.org/",
  "@type": "City",
  "namePronunciation": {
    "@type": "TextPronunciation",
    "text": "Worcester",
    "phoneticText": "/ˈwʊstɚ/"
  }
}
@Aaranged commented Dec 20, 2018

As a supporting use case for this, Google has been observed returning pronunciation information as a vertical in the SERPs.

Thinking out loud, might it be useful to provide provenance information for the proposed TextPronunciation type here - i.e. from what source the data is provided? This could be accomplished by extending the domain of citation to include TextPronunciation (it would be unnecessary for audio.AudioObject, since it's already available to AudioObject).

@Aaranged commented Dec 20, 2018

Also, looking at the Google example, might the language and country to which the pronunciation applies need to be captured? E.g. the en-US pronunciation of "lieutenant" is different from the en-GB pronunciation; and, referencing the provided example for City, "Montreal" is pronounced differently in English and in French.

@RichardWallis (Contributor) commented Dec 20, 2018

I was discussing this capability with someone recently but hadn't translated it into an issue yet.

Things that came up in the discussion included:

  • Ability to represent differing pronunciations. For example, the name "Houston" is pronounced /ˈhjuːstən/ when it refers to the city in Texas and /ˈhaʊstən/ in the name of Houston Street in New York.
  • Most need for this is around names, but could be applicable in other areas.
  • Probably, as per @vholland's proposal, it would be best implemented with a new type.
  • One option would be a new PhoneticName subtype of Role with a phoneticName property.
  • Another being a new PhoneticText sub datatype of Text. This could then be used anywhere that Text currently can.
  • Yet another, very different, option would be to use the current capabilities of the BCP 47 language tag definitions. These are used to define the language of strings using tags such as "en-US". As I read the standard, adding the suffix 'fonipa' to a tag, thus "en-US-fonipa", would indicate that the associated string contains the phonetic representation of a US English string. This could then be implemented by adding multiple name or alternateName properties.
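A minimal sketch of that BCP 47 option (assuming a consumer honours the fonipa extension subtag; this is not an adopted schema.org pattern):

```json
{
  "@context": "http://schema.org/",
  "@type": "City",
  "name": [
    { "@language": "en-US", "@value": "Worcester" },
    { "@language": "en-US-fonipa", "@value": "ˈwʊstɚ" }
  ]
}
```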

I can also see the usefulness of adding an audio property. This potentially lends further support to the proposal in PR #1774 to add audio & video as properties of Thing alongside image, which would negate the need for a type-specific range extension for audio.

@thadguidry (Contributor) commented Dec 20, 2018

This is some of what Wikidata has as other forms ... https://www.wikidata.org/wiki/Wikidata:Lexicographical_data/Documentation

  • A list of Form Statements further describing the Form or its relations to other Forms or Items (e.g. IPA transcription (P898), pronunciation audio, rhymes with, used until, used in region)

@RichardWallis has mentioned two of these so far.

Also, I would assume that Wikidata's existing IPA transcription (P898) would be a "sameAs" property for the proposed "phoneticText"?

@nicolastorzec (Contributor) commented Dec 21, 2018

Adding something like namePronunciation and PhoneticText makes sense.

Since pronunciation varies with geography and time (and other dimensions), we also need to capture those dimensions.

Adding attributes such as usedFrom, usedUntil, and usedInRegion to PhoneticText would solve this in a simple way, and it's extensible.

Richard's solution of using BCP 47 is elegant, though it captures only one dimension.
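A sketch of how those attributes might attach (usedFrom, usedUntil, and usedInRegion are hypothetical here, as is the PhoneticText type itself; none are existing schema.org terms, and the values are placeholders):

```json
{
  "@type": "PhoneticText",
  "phoneticText": "/ˈwʊstɚ/",
  "usedInRegion": "US",
  "usedFrom": "1900"
}
```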

@MichaelAndrews-RM commented Dec 21, 2018

It seems like this is replicating the Speech Synthesis Markup Language (SSML), which covers IPA and a range of other spoken-word settings. SSML is already widely used by the major voice interaction platforms.

@chaals (Contributor) commented Dec 21, 2018

This looks like reinventing wheels where there are a couple of pretty good ones already. I think we should keep out of it.

@nicolastorzec (Contributor) commented Dec 28, 2018

SSML does have tags such as say-as and phoneme to specify how words, phrases, and/or sentences should be pronounced.

That said, we may still want to clarify how to combine SSML and schema.org markup in practice.

@vholland (Contributor, Author) commented Jan 2, 2019

+1 to @nicolastorzec's suggestion to clarify how to combine SSML and schema.org. Authors find it hard enough to combine types within schema.org. I don't expect them to work in another markup language without some help.

@RichardWallis (Contributor) commented Jan 16, 2019

I share the concerns about reinventing wheels, but I also recognise the need to help authors get their heads around using multiple vocabularies.

In the same way we delegate to ISO 8601 for date formats, could we not simply delegate to SSML for usage guidance on a new, simple utility type? The suggested PhoneticText subtype of Text could be shaped with properties matching those of SSML's phoneme. With a few well-crafted examples we could satisfy the need without major vocabulary engineering, linking to a current wheel instead of inventing a new one.
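For reference, SSML's phoneme element, which such a type would mirror, looks roughly like this (a sketch per the SSML 1.1 specification):

```xml
<speak>
  The city of
  <phoneme alphabet="ipa" ph="ˈwʊstɚ">Worcester</phoneme>
  is in Massachusetts.
</speak>
```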

@jaygray0919 commented Jan 22, 2019

Wikidata provides properties that could be emulated, together with repeatable examples.

@AutoSponge commented Apr 24, 2019

Hello. I'm a member of the W3C Spoken Pronunciation Task Force. We're in the early stages, but I've mocked up a couple of implementations using a schema that resembles/copies properties from SSML. Notable ones (JSON-LD and Ruby+Microdata) here:

I'd really like your input, since if one of these schema-based use cases is approved, it will likely become a normalized spec. One issue I haven't addressed is adding a lang_locale property for voice suggestion (very important, since none of the current voices used by browsers pronounce the entire IPA alphabet). That may be solved by the document-level html lang attribute or by the lang attribute of an individual element in HTML, but it should be in the schema.

@vholland (Contributor, Author) commented May 17, 2019

Returning to this and looking at how SSML is being used, the simplest thing would be to add a property to specify the phonetic system used and then specify the appropriate string. So the above example becomes:

{
  "@context": "http://schema.org/",
  "@type": "City",
  "namePronunciation": {
    "@type": "TextPronunciation",
    "text": "Worcester",
    "speechToTextMarkup": "IPA",
    "phoneticText": "/ˈwʊstɚ/"
  }
}

Using SSML for a different example, markup might look like:

{
  "@context": "http://schema.org/",
  "@type": "RadioStation",
  "namePronunciation": {
    "@type": "TextPronunciation",
    "text": "WKRP",
    "speechToTextMarkup": "SSML",
    "phoneticText": "<speak><say-as interpret-as=\"characters\">WKRP</say-as></speak>"
  }
}

@RichardWallis (Contributor) commented May 17, 2019

I support moving forward on this. Questions regarding this simple proposal:

  1. Is it sufficient to only handle the pronunciation of names?
  2. Will this be a new property for Thing?
  3. How would this be used to represent multiple names in differing languages?

@MichaelAndrews-RM commented May 18, 2019

Richard,

Vicki's example is for a US-specific name, but it may help to indicate the language explicitly.

{
  "@context": "http://schema.org/",
  "@type": "City",
  "namePronunciation": {
    "@type": "TextPronunciation",
    "inLanguage": "en-US",
    "text": "Worcester",
    "speechToTextMarkup": "IPA",
    "phoneticText": "/ˈwʊstɚ/"
  }
}

I think for some scenarios, data consumers may want alternative pronunciations. Many words are pronounced differently in the US and UK, in some cases dramatically so (e.g., vase is vās in en-US and va:z in en-GB).

@jaygray0919 commented May 19, 2019

@MichaelAndrews-RM is spot-on. Arguably, inLanguage is a mandatory property of TextPronunciation; otherwise, phoneticText is missing context. It's important to take a non-en-US perspective here (!en-US), as this context issue is present for many language variations.

@RichardWallis (Contributor) commented May 19, 2019

@MichaelAndrews-RM @jaygray0919 Multiple pronunciations of the same name could work simply thus:

{
	"@context": "http://schema.org/",
	"@type": "Product",
	"name": "vase",
	"namePronunciation": [
		{
			"@type": "TextPronunciation",
			"inLanguage": "en-US",
			"text": "vase",
			"speechToTextMarkup": "IPA",
			"phoneticText": "vās"
		},
		{
			"@type": "TextPronunciation",
			"inLanguage": "en-GB",
			"text": "vase",
			"speechToTextMarkup": "IPA",
			"phoneticText": "va:z"
		}
	]
}

However, it could soon get complex for an entity with several names in different languages such as:

{
	"@context": "http://schema.org/",
	"@type": "Product",
	"description": "A writing implement",
	"name": [
		{
			"@language": "en",
			"@value": "Pen"
		},
		{
			"@language": "fr",
			"@value": "Plume"
		}
	],
	"namePronunciation": [
		{
			"@type": "TextPronunciation",
			"inLanguage": "en-US",
			"text": {
				"@language": "en",
				"@value": "pen"
			},
			"speechToTextMarkup": "IPA",
			"phoneticText": "/pɛn/"
		},
		{
			"@type": "TextPronunciation",
			"inLanguage": "fr-FR",
			"text": {
				"@language": "fr",
				"@value": "plume"
			},
			"speechToTextMarkup": "IPA",
			"phoneticText": "plym"
		}
	]
}

Having effectively two arrays of properties (name, namePronunciation) that may or may not be in sync might be confusing, but I can't see another way to do it.

Whatever we adopt should have at least one example that demonstrates how to describe multiple pronunciations of a single name, and names in multiple languages.

@jaygray0919 commented May 20, 2019

@RichardWallis @MichaelAndrews-RM

A draft thought:
Perhaps we could separate name from namePronunciation.
Instead, a property labelPronunciation would be specific to @Language.
A @Language array would apply a value to each labelPronunciation.
name remains a property of @Thing.

Another variation:
label as a property specific to @Language (which unfortunately steps on RDF's label),
but I can't think of another term (at the moment) that would uniquely specify a 'pronunciation item' for a specific @Language.

@MichaelAndrews-RM commented May 20, 2019

@jaygray0919 @RichardWallis

Is there any possibility to append a language-tagged string such as “plume”@fr to avoid breaking out the @Language context? Or is that unsupported?

@RichardWallis (Contributor) commented May 20, 2019

@jaygray0919 Are you suggesting something like this?

{
    "@context": "http://schema.org/",
    "@type": "Product",
    "description": "A writing implement",
    "name": {
        "@language": "en",
        "@value": "Pen",
        "labelPronunciation": {
            "speechToTextMarkup": "IPA",
            "phoneticText": "/pɛn/"
        }
    }
}

Elegant, but unfortunately the @language/@value pair of properties is specific to JSON-LD markup of language-tagged strings, so extra properties such as your labelPronunciation would be ignored by standard processing tools.

A use-case that envisages a single name in a single language, as addressed by @vholland, is easily satisfied by the name/namePronunciation proposal. Unfortunately it would not be very usable once multiple locales and/or languages enter the use-case.

From a vocabulary point of view, introducing a new datatype PronounceableText (a sub-datatype of Text) containing the extra properties we are discussing here would be comparatively simple. How intuitive it would be for the average markup-generating person, I am not so sure.

"name": "Pen",

could then be replaced by:

    "name": {
        "@type": "PronounceableText",
        "value": "Pen",
        "inLanguage": "en-US",
        "speechToTextMarkup": "IPA",
        "phoneticText": "/pɛn/"
    }

It would also require work from data consumers to be able to recognise it.

@RichardWallis (Contributor) commented May 20, 2019

@MichaelAndrews-RM "name": "Plume"@fr, is invalid JSON-LD syntax - Unfortunately :-(

@jaygray0919 commented May 20, 2019

@RichardWallis your PronounceableText would work for us.
Here is our problem and why your proposal meets our Use-Case.
We compose triples in National Languages (NL) and pass them to a text-to-speech engine.
Currently we use AWS Polly (meaning we have to reverse-engineer our JSON into XML, the input data structure for Polly).
Since our triples are not part of an HTML document, we can't (and don't want to) use the current pointer to a CSS element that holds the NL string (the current schema.org approach)
Instead, based on a language selector, we serve the NL JSON (as XML) to the TTS processor.
We want to construct the triples in valid JSON-LD, for obvious reasons.
Our expectation is that TTS engines will evolve to process the JSON-LD (we have a prototype that does this, but the modified JSON-LD is not valid on GSDTT, Playground or Kellogg's SDL).
I checked with our team who confirm that we can construct our NL glossary using your proposal, where - in our JSON-LD database - the name is an array of key:values by @Language.
We would then localize the appropriate triples in an NL triple store.
When TTS engines process JSON-LD, we won't need the triple store.

What can we do to help push your idea forward?

aside: we cannot use a structure like "name": "Plume"@fr, because we need the @Language type to be identified using @id. Similarly, PronounceableText must be identified using @id. Those conditions are met by your proposal.

@AutoSponge commented May 21, 2019

I'm not sure the locale is important. After all, we're supplying the IPA pronunciation; the main thing is matching the phonetics used to the voice "pack" available to a TTS client. Here, I'm offering my own name as I hear it pronounced in English-speaking countries and in French-speaking countries, as well as my personal preference, since I'm from the US.

{
  "@context": "http://schema.org",
  "@type": "Person",
  "name": {
    "@type": "PronounceableText",
    "value": "Paul Grenier",
    "speechToTextMarkup": "IPA",
    "defaultLanguage": "en",
    "en": {
      "phoneticText": "/pɔl ɡɹenɪəʳ/"
    },
    "fr": {
      "phoneticText": "/pɑl ɡʁə.nje/"
    }
  },
  "sameAs": "https://github.com/AutoSponge"
}

Does this help with the multiple translations issue?

@jaygray0919 commented May 22, 2019

@AutoSponge in our use-case your proposal won't work. Here is the reason: at run-time, when we expose the data to a processor, we need to serve data according to a language selector. That means we need a compound key composed of @Type (PronounceableText), @Language, and @id (the string to be used by the processor).
While we would not "compose" the JSON-LD exactly as presented by @RichardWallis, his structure enables the requirements above.

@AutoSponge commented May 22, 2019

@jaygray0919 I changed defaultValue to defaultLanguage. The prop name could be changed, but it should be typed to @language. @id can be added to anything.

@jaygray0919 commented May 22, 2019

I appreciate your comments, @AutoSponge. My point about the compound key was that we need to make a statement that is a combination of a specific and unique "Type+Language+ID", where the ID structure holds the phoneticText. Said another way: there is no defaultLanguage; the @id and the string defined by that @id must be related to a specific @Language (which also has a specific @id).
I'll check with my teammates to get another perspective here and will get back to you if my commentary is wrong and your solution works.

@AutoSponge commented May 23, 2019

@jaygray0919 Understood. Just trying to find common ground. But at this point, I think the W3C will diverge toward SSML, considering the widespread support. Exactly how the SSML semantics will appear in a JSON and/or microdata format is still under discussion.

@jaygray0919 commented Sep 26, 2019

@RichardWallis @AutoSponge Any progress here? We would like to integrate a solution here with @SpeakableSpecification, even though @SpeakableSpecification is English-only (at the moment).

@RichardWallis (Contributor) commented Sep 26, 2019

Like many a proposal, this thread seems to have gone quiet.

I could wake it up again by creating a Pull Request for my proposal: the introduction of a new datatype PronounceableText (a sub-datatype of Text) containing the extra properties we are discussing here.

"name": "Pen",

could then be replaced by:

    "name": {
        "@type": "PronounceableText",
        "value": "Pen",
        "inLanguage": "en-US",
        "speechToTextMarkup": "IPA",
        "phoneticText": "/pɛn/"
    }

That is, if folks are happy it would fit [most] use cases and, once implemented, would actually be used, especially by data consumers.

@jaygray0919 commented Sep 27, 2019

@RichardWallis We can work with your approach.
We will use an @Language specification on inLanguage, but your example is fine.
IOHO the initial test is the combination of @PronounceableText with @SpeakableSpecification.
At the moment, @SpeakableSpecification seems to work only on @WebPage and @Article; only in Google Assistant; only for en; and possibly only for en-US.
In any event, we could construct a test. It won't be valid in GSDTT, but that may not be necessary now.
We'll probably want to test on "aluminum" and "aluminium"; there may be better test cases others can come up with.
Of course, we are making a big assumption about how @SpeakableSpecification handles a value identified by a cssSelector.
I do not see any documentation about passing a KV pair like "phoneticText": "/pɛn/".
We may experiment with nudging such a KV pair into @WebPage. But since it's not GSDTT-valid, it will probably fail. Maybe we can use a @Thing property.

Will report back.
We have another major GSDTT issue but have held off contacting the GSDTT team.
We might include a dialog around @PronounceableText when raising the other issue.
The other issue involves the Google AMP team, so we will want to connect those two Google teams, and sprinkle in the @PronounceableText issue at the same time.

@RichardWallis (Contributor) commented Sep 29, 2019

Created Pull Request #2352

/cc @vholland @jaygray0919 @AutoSponge @MichaelAndrews-RM

@danbri (Contributor) commented Jan 2, 2020

PR has been merged (tx @RichardWallis :)

Release text is

<li id="2108"><a href="https://github.com/schemaorg/schemaorg/issues/2108">Issue #2108</a>:
(implemented in <a href="https://github.com/schemaorg/schemaorg/pull/2352">PR #2352</a>):
Introduced new pending type <a href="/PronounceableText">PronounceableText</a> enabling phonetic markup of text values.
</li>
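For anyone landing here later, a minimal sketch of the released pending type, with property names per my reading of the published PronounceableText definition (textValue replaced the value property discussed earlier in the thread):

```json
{
  "@context": "https://schema.org",
  "@type": "City",
  "name": {
    "@type": "PronounceableText",
    "textValue": "Worcester",
    "speechToTextMarkup": "IPA",
    "phoneticText": "/ˈwʊstɚ/",
    "inLanguage": "en-US"
  }
}
```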

danbri added a commit that referenced this issue Jan 2, 2020
@Eyas commented Jan 21, 2020

Quick question: I still don't see TextPronunciation on webschemas.org or its pending layer. I see #2376 includes this, though it's not checked off yet. Is this no longer planned for 6.0?

@RichardWallis (Contributor) commented Jan 21, 2020

It was renamed in process to PronounceableText and is visible on webschemas.org in the preview of V6.0.

@vholland (Contributor, Author) commented May 28, 2020

Included in schema.org 6.0

@vholland vholland closed this May 28, 2020