ISchema as a shortcut for similar orthographies #18

Open
LinguList opened this issue Dec 3, 2016 · 2 comments
Comments

@LinguList
Contributor

Lingpy distinguishes "schemas" for sound classes. Each schema bundles:

  1. one routine for segmentation
  2. one routine for conversion to sound classes (and a default sound class model)
  3. one default routine for the scoring function in alignments

Currently, lingpy has two schemas: "ipa" and "asjp", the latter working on the ASJP alphabet.

We should add one more schema in lingpy3 and let users register new schemas of their own. The built-in set would then be:

  1. plain ipa (assuming that the orthography is more or less regular IPA)
  2. fuzzy ipa (assuming a messy IPA, e.g. with aspiration not written as superscript, which requires a segmentation function based on a clean_string strategy)
  3. asjp
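To make the proposal concrete, here is a minimal sketch of what a schema bundle and a user-facing registry could look like. None of this is existing lingpy or lingpy3 API: `Schema`, `register_schema`, `get_schema` and the toy routines are hypothetical names chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Schema:
    """Hypothetical bundle of the three routines a schema provides."""
    name: str
    segment: Callable[[str], List[str]]            # string -> segments
    to_classes: Callable[[List[str]], List[str]]   # segments -> sound classes
    score: Callable[[str, str], float]             # default alignment scorer


# Hypothetical module-level registry, keyed by schema name.
_SCHEMAS: Dict[str, Schema] = {}


def register_schema(schema: Schema, overwrite: bool = False) -> None:
    """Add a schema; refuse to silently replace an existing one."""
    if schema.name in _SCHEMAS and not overwrite:
        raise ValueError(f"schema {schema.name!r} is already registered")
    _SCHEMAS[schema.name] = schema


def get_schema(name: str) -> Schema:
    """Look up a schema by name, listing the known names on failure."""
    try:
        return _SCHEMAS[name]
    except KeyError:
        raise KeyError(
            f"unknown schema {name!r}; known: {sorted(_SCHEMAS)}"
        ) from None


# A toy schema, only to show the shape of a registration. The routines
# are placeholders, not a real sound class model.
register_schema(Schema(
    name="toy",
    segment=list,  # one character per segment
    to_classes=lambda segs: ["V" if s in "aeiou" else "C" for s in segs],
    score=lambda a, b: 1.0 if a == b else -1.0,
))
```

With something like this, "plain ipa", "fuzzy ipa" and "asjp" would just be three pre-registered entries, and a user with "starling"-style data could add a fourth without touching the library.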

More schemas are possible, for example "starling", since all of the Tower of Babel data uses its own version of IPA. The main argument for schemas is that writing an individual orthography profile for every dataset is too time-consuming, while many datasets are consistent enough to be handled by an enhanced function that is simpler than a full-fledged orthography profile.

@SimonGreenhill
Collaborator

a sensible 'broad phonemic' schema would be great too.

@LinguList
Contributor Author

I'd assume that we could cover this more or less in "fuzzy" ipa, as this schema will cover cases like:

thoxther > th o x th e r

Phonemic transcriptions are usually much less careful than others about using unusual Unicode characters. Or do you have other specific cases in mind?
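The `thoxther` case can be sketched with a greedy longest-match segmenter. This is only an illustration of the idea, not the clean_string routine: `fuzzy_segment` and its default multigraph list are assumptions made up for the example.

```python
def fuzzy_segment(word, multigraphs=("th", "kh", "ph", "ts", "tsh")):
    """Split `word` into segments, preferring the longest known multigraph.

    The multigraph list is a hypothetical stand-in for whatever a real
    "fuzzy ipa" schema would treat as a single sound.
    """
    # Try longer units first so that "tsh" wins over "ts".
    units = sorted(multigraphs, key=len, reverse=True)
    segments = []
    i = 0
    while i < len(word):
        for unit in units:
            if word.startswith(unit, i):
                segments.append(unit)
                i += len(unit)
                break
        else:
            # No multigraph matches here: emit a single character.
            segments.append(word[i])
            i += 1
    return segments


print(" ".join(fuzzy_segment("thoxther")))  # th o x th e r
```

A broad phonemic schema could then be a matter of passing a different multigraph list rather than writing a new segmentation routine.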
