
Schema for specifying how to pronounce text #2108

Closed
vholland opened this issue Dec 20, 2018 · 35 comments
Comments

@vholland (Contributor) commented Dec 20, 2018

As devices read text aloud more and more, there is a growing need to specify how to pronounce a piece of text. I realize this could get complicated fast. As most of the cases I have heard about revolve around names, I propose the following to start exploring how to specify pronunciation.

  • Create a new property namePronunciation on Thing. The property expects the new type TextPronunciation.
  • TextPronunciation has the following properties:
      • text: The text to be pronounced.
      • phoneticText: The phonetic representation of the text property, in IPA.
      • audio: An AudioObject that gives the pronunciation.

An example would be:

{
  "@context": "http://schema.org/",
  "@type": "City",
  "namePronunciation": {
    "@type": "TextPronunciation",
    "text": "Worcester",
    "phoneticText": "/ˈwʊstɚ/"
  }
}
@Aaranged commented Dec 20, 2018

As a supporting use case for this, Google has been observed returning pronunciation information as a vertical in the SERPs.

Thinking out loud, might it be useful to provide provenance information for the proposed TextPronunciation type here - i.e. from what source the data is provided? This could be accomplished by extending the domain of citation to include TextPronunciation (it would be unnecessary for audio.AudioObject, since it's already available to AudioObject).

@Aaranged commented Dec 20, 2018

Also, looking at the Google example, might the language and country to which the pronunciation applies need to be captured? E.g. the en-US pronunciation of "lieutenant" is different from the en-GB pronunciation; and, referencing the provided example for City, "Montreal" is pronounced differently in English and in French.

@RichardWallis (Contributor) commented Dec 20, 2018

I was discussing this capability with someone recently but hadn't translated it into an issue yet.

Things that came up in the discussion included:

  • Ability to represent differing pronunciations. For example, the name "Houston" is pronounced /ˈhjuːstən/ when it refers to the city in Texas and /ˈhaʊstən/ in the name of Houston Street in New York.
  • Most need for this is around names, but could be applicable in other areas.
  • Probably, as per @vholland's proposal, it would be best implemented with a new type.
  • One option would be a new PhoneticName subtype of Role with a phoneticName property.
  • Another being a new PhoneticText sub datatype of Text. This could then be used anywhere that Text currently can.
  • Yet another, very different, option would be to use the current capabilities of the BCP 47 language tag definitions. These are used to define the language of strings using tags such as "en-US". As I read the standard, adding the suffix 'fonipa' to a tag, thus "en-US-fonipa", would indicate that the associated string contains the phonetic representation of a US English string. This could then be implemented by adding multiple name or alternateName properties.
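A minimal sketch of that BCP 47 option (assuming a consumer honours the fonipa extension subtag; this is not an adopted schema.org pattern):

```json
{
  "@context": "http://schema.org/",
  "@type": "City",
  "name": [
    { "@language": "en-US", "@value": "Worcester" },
    { "@language": "en-US-fonipa", "@value": "ˈwʊstɚ" }
  ]
}
```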

I can also see the usefulness of adding an audio property. This potentially lends further support to the proposal in PR #1774 to add audio & video as properties of Thing alongside image, which would negate the need for a type-specific range extension for audio.

@thadguidry (Contributor) commented Dec 20, 2018

This is some of what Wikidata has as other forms ... https://www.wikidata.org/wiki/Wikidata:Lexicographical_data/Documentation

  • A list of Form Statements further describing the Form or its relations to other Forms or Items (e.g. IPA transcription (P898), pronunciation audio, rhymes with, used until, used in region)

@RichardWallis has mentioned two of these so far.

Also, I would assume that Wikidata's existing IPA transcription (P898) would be a "sameAs" property for the proposed "phoneticText"?

@nicolastorzec (Contributor) commented Dec 21, 2018

Adding something like namePronunciation and PhoneticText makes sense.

Since pronunciation varies with geography and time (and other dimensions), we also need to capture those dimensions.

Adding attributes such as usedFrom, usedUntil, and usedInRegion to PhoneticText would solve this in a simple way, and it's extensible.

Richard's solution of using BCP 47 is elegant, though it captures only one dimension.
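A sketch of how those attributes might attach (usedFrom, usedUntil, and usedInRegion are hypothetical here, as is the PhoneticText type itself; none are existing schema.org terms, and the values are placeholders):

```json
{
  "@type": "PhoneticText",
  "phoneticText": "/ˈwʊstɚ/",
  "usedInRegion": "US",
  "usedFrom": "1900"
}
```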

@MichaelAndrews-RM commented Dec 21, 2018

It seems like this is replicating the Speech Synthesis Markup Language (SSML), which covers IPA and a range of other spoken-word settings. SSML is already widely used by the major voice interaction platforms.

@chaals (Contributor) commented Dec 21, 2018

This looks like reinventing wheels where there are a couple of pretty good ones already. I think we should keep out of it.

@nicolastorzec (Contributor) commented Dec 28, 2018

SSML does have tags such as say-as and phoneme to specify how words, phrases, and/or sentences should be pronounced.

That said, we may still want to clarify how to combine SSML and schema.org markup in practice.

@vholland (Contributor, Author) commented Jan 2, 2019

+1 to @nicolastorzec's suggestion to clarify how to combine SSML and schema.org. Authors find it hard enough to combine types within schema.org. I don't expect them to work in another markup language without some help.

@RichardWallis (Contributor) commented Jan 16, 2019

I share the concerns about reinventing wheels, but I also recognise the need to help authors get their heads around using multiple vocabularies.

In the same way we delegate to ISO 8601 for date formats, could we not simply delegate to SSML for usage guidance on a new, simple utility type? The suggested PhoneticText subtype of Text could be shaped with properties matching those of SSML's phoneme. With a few well-crafted examples we could satisfy the need without major vocabulary engineering, linking to a current wheel instead of inventing a new one.
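For reference, SSML's phoneme element, which such a type would mirror, looks roughly like this (a sketch per the SSML 1.1 specification):

```xml
<speak>
  The city of
  <phoneme alphabet="ipa" ph="ˈwʊstɚ">Worcester</phoneme>
  is in Massachusetts.
</speak>
```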

@jaygray0919 commented Jan 22, 2019

Wikidata provides properties that could be emulated, together with repeatable examples.

@AutoSponge commented Apr 24, 2019

Hello. I'm a member of the W3C Spoken Pronunciation Task Force. We're in the early stages, but I've mocked up a couple of implementations using a schema that resembles/copies properties from SSML. Notable ones (JSON-LD and Ruby+Microdata) here:

I'd really like your input, since if one of these schema-based use cases is approved, it will likely become a normalized spec. One issue I haven't addressed is adding a lang_locale property for voice suggestion (very important, since none of the current voices used by browsers pronounce the entire IPA alphabet). That may be solved by the document-level html lang attribute or by the lang attribute of an individual element in HTML, but it should be in the schema.

@vholland (Contributor, Author) commented May 17, 2019

Returning to this and looking at how SSML is being used, the simplest thing would be to add a property to specify the phonetic system used and then specify the appropriate string. So the above example becomes:

{
  "@context": "http://schema.org/",
  "@type": "City",
  "namePronunciation": {
    "@type": "TextPronunciation",
    "text": "Worcester",
    "speechToTextMarkup": "IPA",
    "phoneticText": "/ˈwʊstɚ/"
  }
}

Using SSML for a different example, markup might look like:

{
  "@context": "http://schema.org/",
  "@type": "RadioStation",
  "namePronunciation": {
    "@type": "TextPronunciation",
    "text": "WKRP",
    "speechToTextMarkup": "SSML",
    "phoneticText": "<speak><say-as interpret-as=\"characters\">WKRP</say-as></speak>"
  }
}

@RichardWallis (Contributor) commented May 17, 2019

I support moving forward on this. Questions regarding this simple proposal:

  1. Is it sufficient to only handle the pronunciation of names?
  2. Will this be a new property for Thing?
  3. How would this be used to represent multiple names in differing languages?

@MichaelAndrews-RM commented May 18, 2019

Richard,

Vicki's example is for a US-specific name, but it may help to indicate the language explicitly.

{
  "@context": "http://schema.org/",
  "@type": "City",
  "namePronunciation": {
    "@type": "TextPronunciation",
    "inLanguage": "en-US",
    "text": "Worcester",
    "speechToTextMarkup": "IPA",
    "phoneticText": "/ˈwʊstɚ/"
  }
}

I think for some scenarios, data consumers may want alternative pronunciations. Many words are pronounced differently in the US and UK, in some cases dramatically so (e.g., vase is vās in en-US and va:z in en-GB).

@jaygray0919 commented May 19, 2019

@MichaelAndrews-RM is spot-on. Arguably, inLanguage is a mandatory property of TextPronunciation; otherwise, phoneticText is missing context. It's important to take a non-en-US perspective here (!en-US), as this context issue is present for many language variations.

@RichardWallis (Contributor) commented May 19, 2019

@MichaelAndrews-RM @jaygray0919 Multiple pronunciations of the same name could work simply thus:

{
	"@context": "http://schema.org/",
	"@type": "Product",
	"name": "vase",
	"namePronunciation": [
		{
			"@type": "TextPronunciation",
			"inLanguage": "en-US",
			"text": "vase",
			"speechToTextMarkup": "IPA",
			"phoneticText": "vās"
		},
		{
			"@type": "TextPronunciation",
			"inLanguage": "en-GB",
			"text": "vase",
			"speechToTextMarkup": "IPA",
			"phoneticText": "va:z"
		}
	]
}

However, it could soon get complex for an entity with several names in different languages such as:

{
	"@context": "http://schema.org/",
	"@type": "Product",
	"description": "A writing implement",
	"name": [
		{
			"@language": "en",
			"@value": "Pen"
		},
		{
			"@language": "fr",
			"@value": "Plume"
		}
	],
	"namePronunciation": [
		{
			"@type": "TextPronunciation",
			"inLanguage": "en-US",
			"text": {
				"@language": "en",
				"@value": "pen"
			},
			"speechToTextMarkup": "IPA",
			"phoneticText": "/pɛn/"
		},
		{
			"@type": "TextPronunciation",
			"inLanguage": "fr-FR",
			"text": {
				"@language": "fr",
				"@value": "plume"
			},
			"speechToTextMarkup": "IPA",
			"phoneticText": "plym"
		}
	]
}

Having effectively two arrays of properties (name, namePronunciation) that may or may not be in sync might be confusing, but I can't see another way to do it.

Whatever we adopt should have at least one example that demonstrates how to describe multiple pronunciations of a single name, and names in multiple languages.

@jaygray0919 commented May 20, 2019

@RichardWallis @MichaelAndrews-RM

A draft thought:
Perhaps we could separate name from namePronunciation.
Instead, a property labelPronunciation would be specific to @Language.
A @Language array would apply a value to each labelPronunciation.
name remains a property of @Thing.

Another variation:
label as a property specific to @Language (which unfortunately steps on RDF's label),
but I can't think of another term (at the moment) that would uniquely specify a 'pronunciation item' for a specific @Language.

@MichaelAndrews-RM commented May 20, 2019

@jaygray0919 @RichardWallis

Is there any possibility to append a language-tagged string such as “plume”@fr to avoid breaking out the @Language context? Or is that unsupported?

@RichardWallis (Contributor) commented May 20, 2019

@jaygray0919 Are you suggesting something like this?

{
    "@context": "http://schema.org/",
    "@type": "Product",
    "description": "A writing implement",
    "name": {
        "@language": "en",
        "@value": "Pen",
        "labelPronunciation": {
            "speechToTextMarkup": "IPA",
            "phoneticText": "/pɛn/"
        }
    }
}

Elegant, but unfortunately the @language/@value pair of properties is specific to JSON-LD markup of language-tagged strings, so extra properties such as your labelPronunciation would be ignored by standard processing tools.

A use-case that envisages a single name in a single language, as addressed by @vholland, is easily satisfied by the name/namePronunciation proposal. Unfortunately it would not be very usable once multiple locales and/or languages enter the use-case.

From a vocabulary point of view, introducing a new datatype PronounceableText (a sub-datatype of Text) containing the extra properties we are discussing here would be comparatively simple. How intuitive it would be for the average markup-generating person, I am not so sure.

"name": "Pen",

could then be replaced by:

    "name": {
        "@type": "PronounceableText",
        "value": "Pen",
        "inLanguage": "en-US",
        "speechToTextMarkup": "IPA",
        "phoneticText": "/pɛn/"
    }

It would also require work from data consumers to be able to recognise it.

@RichardWallis (Contributor) commented May 20, 2019

@MichaelAndrews-RM "name": "Plume"@fr, is invalid JSON-LD syntax - Unfortunately :-(

@jaygray0919 commented May 20, 2019

@RichardWallis your PronounceableText would work for us.
Here is our problem and why your proposal meets our Use-Case.
We compose triples in National Languages (NL) and pass them to a text-to-speech engine.
Currently we use AWS Polly (meaning we have to reverse-engineer our JSON into XML, the input data structure for Polly).
Since our triples are not part of an HTML document, we can't (and don't want to) use the current pointer to a CSS element that holds the NL string (the current schema.org approach)
Instead, based on a language selector, we serve the NL JSON (as XML) to the TTS processor.
We want to construct the triples in valid JSON-LD, for obvious reasons.
Our expectation is that TTS engines will evolve to process the JSON-LD (we have a prototype that does this, but the modified JSON-LD is not valid on GSDTT, Playground or Kellogg's SDL).
I checked with our team who confirm that we can construct our NL glossary using your proposal, where - in our JSON-LD database - the name is an array of key:values by @Language.
We would then localize the appropriate triples in an NL triple store.
When TTS engines process JSON-LD, we won't need the triple store.

What can we do to help push your idea forward?

aside: we cannot use a structure like "name": "Plume"@fr, because we need the @Language type to be identified using @id. Similarly, PronounceableText must be identified using @id. Those conditions are met by your proposal.

@AutoSponge commented May 21, 2019

I'm not sure the locale is important. After all, we're supplying the IPA pronunciation; the main thing is matching the phonetics used to the voice "pack" available to a TTS client. Here, I'm offering my own name as I hear it pronounced in English-speaking countries and in French-speaking countries, as well as my personal preference, since I'm from the US.

{
  "@context": "http://schema.org",
  "@type": "Person",
  "name": {
    "@type": "PronounceableText",
    "value": "Paul Grenier",
    "speechToTextMarkup": "IPA",
    "defaultLanguage": "en",
    "en": {
      "phoneticText": "/pɔl ɡɹenɪəʳ/"
    },
    "fr": {
      "phoneticText": "/pɑl ɡʁə.nje/"
    }
  },
  "sameAs": "https://github.com/AutoSponge"
}

Does this help with the multiple translations issue?

@jaygray0919 commented May 22, 2019

@AutoSponge in our use-case your proposal won't work. Here is the reason: at run-time, when we expose the data to a processor, we need to serve data according to a language selector. That means we need a compound key composed of @Type (PronounceableText), @Language, and @id (the string to be used by the processor).
While we would not "compose" the JSON-LD exactly as presented by @RichardWallis, his structure enables the requirements above.

@AutoSponge commented May 22, 2019

@jaygray0919 I changed defaultValue to defaultLanguage. The prop name could be changed, but it should be typed to @language. @id can be added to anything.

@jaygray0919 commented May 22, 2019

I appreciate your comments, @AutoSponge. My point about the compound key was that we need to make a statement that is a combination of a specific and unique "Type+Language+ID", where the ID structure holds the phoneticText. Said another way: there is no defaultLanguage; the @id and the string defined by that @id must be related to a specific @Language (which also has a specific @id).
I'll check with my teammates to get another perspective here and will get back to you if my commentary is wrong and your solution works.

@AutoSponge commented May 23, 2019

@jaygray0919 Understood. Just trying to find common ground. But at this point, I think the W3C will diverge toward SSML, considering the widespread support. Exactly how the SSML semantics will appear in a JSON and/or microdata format is still under discussion.

@jaygray0919 commented Sep 26, 2019

@RichardWallis @AutoSponge Any progress here? We would like to integrate a solution here with @SpeakableSpecification, even though @SpeakableSpecification is English-only (at the moment).

@RichardWallis (Contributor) commented Sep 26, 2019

Like many a proposal, this thread seems to have gone quiet.

I could wake it up again by creating a Pull Request for my proposal: the introduction of a new datatype PronounceableText (a sub-datatype of Text) containing the extra properties we are discussing here.

"name": "Pen",

could then be replaced by:

    "name": {
        "@type": "PronounceableText",
        "value": "Pen",
        "inLanguage": "en-US",
        "speechToTextMarkup": "IPA",
        "phoneticText": "/pɛn/"
    }

That is, if folks are happy it would fit [most] use cases and, once implemented, would actually be used, especially by data consumers.

@jaygray0919 commented Sep 27, 2019

@RichardWallis We can work with your approach.
We will use an @Language specification on inLanguage, but your example is fine.
IOHO the initial test is the combination of @PronounceableText with @SpeakableSpecification.
At the moment, @SpeakableSpecification seems to work only on @WebPage and @Article; only in Google Assistant; only for en; and possibly only for en-US.
In any event, we could construct a test. It won't be valid in GSDTT, but that may not be necessary now.
We'll probably want to test on "aluminum" and "aluminium"; there may be better test cases others can come up with.
Of course, we are making a big assumption about how @SpeakableSpecification handles a value identified by a cssSelector.
I do not see any documentation about passing a KV pair like "phoneticText": "/pɛn/".
We may experiment with nudging such a KV pair into @WebPage. But since it's not GSDTT-valid, it will probably fail. Maybe we can use a @Thing property.

Will report back.
We have another major GSDTT issue but have held off contacting the GSDTT team.
We might include a dialog around @PronounceableText when raising the other issue.
The other issue involves the Google AMP team, so we will want to connect those two Google teams, and sprinkle in the @PronounceableText issue at the same time.

@RichardWallis (Contributor) commented Sep 29, 2019

Created Pull Request #2352

/cc @vholland @jaygray0919 @AutoSponge @MichaelAndrews-RM

@danbri (Contributor) commented Jan 2, 2020

PR has been merged (tx @RichardWallis :)

Release text is

<li id="2108"><a href="https://github.com/schemaorg/schemaorg/issues/2108">Issue #2108</a>:
(implemented in <a href="https://github.com/schemaorg/schemaorg/pull/2352">PR #2352</a>):
Introduced new pending type <a href="/PronounceableText">PronounceableText</a> enabling phonetic markup of text values.
</li>
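For anyone landing here later, a minimal sketch of the released pending type, with property names per my reading of the published PronounceableText definition (textValue replaced the value property discussed earlier in the thread):

```json
{
  "@context": "https://schema.org",
  "@type": "City",
  "name": {
    "@type": "PronounceableText",
    "textValue": "Worcester",
    "speechToTextMarkup": "IPA",
    "phoneticText": "/ˈwʊstɚ/",
    "inLanguage": "en-US"
  }
}
```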

danbri added a commit that referenced this issue Jan 2, 2020
@Eyas commented Jan 21, 2020

Quick question: I still don't see TextPronunciation on webschemas.org or its pending layer. I see #2376 includes this, though it's not checked off yet. Is this no longer planned for 6.0?

@RichardWallis (Contributor) commented Jan 21, 2020

It was renamed in process to PronounceableText and is visible on webschemas.org in the preview of V6.0.

@vholland (Contributor, Author) commented May 28, 2020

Included in schema.org 6.0

@vholland vholland closed this May 28, 2020