Languages with duplicate phonemes #53

drammock · 2014-10-01T10:43:10Z

There are some languages that are showing duplicate phonemes. Not sure if this is caused by the denormRenorm function, or if there are errors in the source data files, or some other cause. Affects 19 languages in SAPHON, 10 languages in AA, 3 languages in RA.


bambooforest · 2014-10-01T10:48:44Z

Can you provide me with an example or two? Then I'll go through and check.


drammock · 2014-10-01T12:10:10Z

Writing from phone so can't do that easily. If you run the aggregation script and then run the test script in this PR, it will spit out the ISO codes of the problem languages. Also, this snippet may be useful:

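```r
# Split the aggregated data by ISO code, then flag each language whose
# phoneme list is longer than its set of unique phonemes (i.e. has duplicates)
foo <- split(final.data, final.data$LanguageCode)
sapply(foo, function(x) length(unique(x$Phonemes)) < length(x$Phonemes))
```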
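A self-contained sketch of the same check, run on a made-up three-language table rather than the real `final.data`. The codes `aaa`, `bbb`, `ccc` and their phonemes are invented, and only the `LanguageCode` and `Phonemes` columns are taken from the snippet. The hypothetical helper `find_duplicate_languages` returns just the ISO codes that fail instead of the full logical vector:

```r
# Toy stand-in for final.data; codes and phonemes are invented for illustration
toy.data <- data.frame(
  LanguageCode = c("aaa", "aaa", "aaa", "bbb", "bbb", "ccc"),
  Phonemes     = c("p",   "t",   "p",   "m",   "n",   "a"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the ISO codes whose phoneme list has a repeat
find_duplicate_languages <- function(df) {
  by_lang <- split(df, df$LanguageCode)
  has_dupes <- sapply(by_lang, function(x) anyDuplicated(x$Phonemes) > 0)
  names(has_dupes)[has_dupes]
}

find_duplicate_languages(toy.data)  # returns "aaa"
```

`anyDuplicated()` stands in for the `length(unique(...)) < length(...)` comparison; both flag the same languages.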


drammock · 2014-10-14T09:33:18Z

On further inspection, these errors seem to be of two types:

- cases where a single resource (AA, RA, SAPHON) has data on two different doculects that are classified under the same ISO code. In such cases we get lots of duplicate phonemes (basically, the set intersection of the inventories of the two doculects).
- cases where something else is going wrong, and the allophone-collapsing code doesn't reduce the inventory properly.
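One way to tell the two types apart is to check whether a resource contributes more than one inventory under the ISO code that shows duplicates. The sketch below assumes a long-format table with hypothetical `Source` and `InventoryID` columns (neither name comes from the project's data). Each flagged (source, ISO code) pair is labelled either as the doculect case or as "other":

```r
# Hypothetical long-format table: one row per phoneme per inventory.
# The Source and InventoryID column names are assumptions for this sketch.
phon <- data.frame(
  Source       = c("aa", "aa", "aa", "aa", "ra", "ra", "ra", "saphon", "saphon"),
  InventoryID  = c(1, 1, 2, 2, 3, 3, 3, 4, 4),
  LanguageCode = c("xxx", "xxx", "xxx", "xxx", "yyy", "yyy", "yyy", "zzz", "zzz"),
  Phonemes     = c("p", "t", "p", "k", "a", "i", "a", "m", "n"),
  stringsAsFactors = FALSE
)

# For each (source, ISO code) pair with duplicate phonemes, guess the cause:
# "multiple doculects" if that source holds >1 inventory under the ISO code,
# "other" otherwise (e.g. allophone collapsing did not reduce properly).
classify_duplicates <- function(df) {
  groups <- split(df, list(df$Source, df$LanguageCode), drop = TRUE)
  cause <- sapply(groups, function(x) {
    if (anyDuplicated(x$Phonemes) == 0) return(NA_character_)
    if (length(unique(x$InventoryID)) > 1) "multiple doculects" else "other"
  })
  cause[!is.na(cause)]  # keep only the pairs that actually have duplicates
}

classify_duplicates(phon)
```

In the toy data, `aa.xxx` is labelled as the doculect case, `ra.yyy` as "other", and `saphon.zzz` is dropped because it has no duplicates. The "other" label only says the doculect explanation does not apply; those languages would still need a manual look.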

drammock · 2015-03-03T07:37:47Z

Many of these errors have been resolved, but #64 blocks this.

drammock added the inventory error label Oct 1, 2014

drammock mentioned this issue Oct 1, 2014 in WIP: Duplicate phonemes within languages #54 (Merged)

drammock mentioned this issue Mar 3, 2015 in Duplicate languages lmn and iru in Ramaswami raw data #65 (Closed)

bambooforest closed this as completed in #54 Mar 10, 2015
