Pronunciation information #27
Looks good in general. A few things:
Hi.
I agree with Michael's suggestions.
…On Thu, Jul 30, 2020 at 11:37 PM John McCrae ***@***.***> wrote:
Hi.
1. yes, this is a good point, variety is better than dialect
2. we could do this. I guess that this would mean duplicating the
language code, but this is okay
3. I would guess that some would prefer a phonemic transcription. We
could drop the / and have an attribute for phonemic transcriptions?
--
Francis Bond <http://www3.ntu.edu.sg/home/fcbond/>
Division of Linguistics and Multilingual Studies
Nanyang Technological University
Hi there. We have been working on and discussing this exact topic a bit for Kristang -- we are hoping to provide IPA and voice recordings for individual lemmas soon. The problem I'd like to raise here is that Kristang shows a lot of metathesis in certain consonant clusters, e.g. ‘-dr-’ (kodrah and kordah for ‘to wake up’). Within the context of revitalization, as we want people to start using a single spelling, we have decided it would be best to cluster these as "Forms" under a single lemma (i.e. the canonical form). However, these internal forms do have different pronunciations. Up to this point we were happy to use the Tag element (available to both Forms and Lemmas) and come up with our own "category" notation. But I think including such a Pronunciation element is definitely an improvement. However, I would like to see what you all think about:
I think this makes sense. Actually, in a new Python-based wordnet module I'm working on, all lemmas are just forms anyway, so doing something different for lemmas and forms would be more trouble than doing the same thing (but this, at least, is just a selfish reason).
I'm less enthused about this, but if we're adding logos (#3), then it's not breaking new ground to link to external files. However, shouldn't this be a URL instead of a file path? If a file path, then absolute paths won't work, and we'd need some kind of resource directory such that the paths are relative to this directory, or something. This sounds like over-engineering. Better, perhaps, would be that your application provides a mapping from local paths into the ids of the wordnet. The trouble is that lemmas/forms do not have their own ids, so it would have to be linked to the LexicalEntry, then to the writtenForm under that entry (are forms guaranteed to be unique under a lexical entry?). Another issue is when you want multiple audio files for the same lemma/form (e.g., from multiple speakers). It doesn't seem like an attribute for a file path or URL would easily scale to multiple files.
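To make the application-side mapping concrete, here is a minimal sketch rather than a proposal for the schema. The entry id `entry-kodrah` and the `example.org` URLs are invented for illustration, and keying on (entry id, written form) assumes that forms are unique within a lexical entry, which is exactly the open question above. Storing a list per key is what lets one form carry recordings from several speakers.

```python
from collections import defaultdict


def build_audio_index(records):
    """Group audio URLs by (entry_id, written_form), keeping input order.

    `records` is an iterable of (entry_id, written_form, url) triples.
    """
    index = defaultdict(list)
    for entry_id, written_form, url in records:
        index[(entry_id, written_form)].append(url)
    return dict(index)


# Hypothetical data: the id and URLs are placeholders, not real resources.
records = [
    ("entry-kodrah", "kodrah", "https://example.org/audio/kodrah-speaker1.ogg"),
    ("entry-kodrah", "kodrah", "https://example.org/audio/kodrah-speaker2.ogg"),
    ("entry-kodrah", "kordah", "https://example.org/audio/kordah-speaker1.ogg"),
]

index = build_audio_index(records)
```

With this layout, `index[("entry-kodrah", "kodrah")]` holds both speakers' recordings, while the metathesized form `kordah` keeps its own list under the same entry.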
Implications: IPA symbols are not ASCII, so all tools must handle UTF-8 (or whatever charset is defined as desired).
Sorry, by sound file path I definitely meant URL or some public URI.
To my knowledge the current state of EWN does not use characters that require encoding outside ASCII, so the current files are both valid ASCII and valid UTF-8. So the relevant tests are still to come.
I was surprised by this and thought that surely things like jalapeño and résumé would have the diacritics in EWN, even if only as alternative forms, but found nothing but ASCII throughout the whole file. In any case, there are non-English wordnets with plenty of non-ASCII forms, so it would be unfortunate if any tools assumed wordnets to be ASCII-only.
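For anyone who wants to repeat that check on their own file, a minimal sketch follows. The function name `non_ascii_lines` is made up here, and the sample string stands in for the file contents; for a real file one would read it with an explicit `encoding="utf-8"` first.

```python
def non_ascii_lines(text):
    """Return (line_number, line) pairs for lines containing non-ASCII characters."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if not line.isascii()  # str.isascii() requires Python 3.7+
    ]


# Stand-in for file contents; for a file, something like
# pathlib.Path(path).read_text(encoding="utf-8") would supply the text.
sample = "resume\nrésumé\njalapeño\n"
hits = non_ascii_lines(sample)
```

An empty result means the file is pure ASCII, which says nothing about whether the tools reading it would cope with IPA.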
Returning to the issue of marking a transcription as phonemic or phonetic... What do people think of keeping the IPA delimiters ( |
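If the delimiters were kept in the transcription string, a tool could recover the distinction from the string itself. The sketch below assumes the usual IPA convention of slashes for phonemic and square brackets for phonetic transcriptions; the function name and the `"unspecified"` fallback are illustrative choices, not part of any agreed schema.

```python
def classify_transcription(value):
    """Split an IPA string into (kind, body) based on its outer delimiters."""
    value = value.strip()
    if len(value) >= 2 and value[0] == "/" and value[-1] == "/":
        return "phonemic", value[1:-1]
    if len(value) >= 2 and value[0] == "[" and value[-1] == "]":
        return "phonetic", value[1:-1]
    # No recognised delimiters: leave the string untouched.
    return "unspecified", value


phonemic = classify_transcription("/kæt/")
phonetic = classify_transcription("[kʰæt]")
bare = classify_transcription("kæt")
```

The trade-off is that every consumer has to parse the delimiters, whereas an explicit attribute would carry the same information without string inspection.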
We are looking to add some pronunciation information to English WordNet, and it would be good to add this as a schema extension. As I see it, we would need the following information:
As such, I would suggest something like the following: