Grouped Sounds Notation in Lexibank and other libraries #258

LinguList · 2022-07-03T13:19:37Z

Through past work on automatic reconstruction and tests in EDICTOR on individual datasets, I have realized that we can avoid problematic alignments by introducing a "grouped-sounds notation" for sequences. If I want to say that two sounds should form a unit, I no longer separate them by a space, but by a dot. This allows me to match, e.g., k.j against ts. We can also circumvent many diphthong vs. monophthong decisions if we allow a u to be written as a.u where we are not sure. I am writing a short article that shows how helpful this can be in many approaches, specifically in alignments, where it avoids gaps. Gaps are always a problem, as they are often unmotivated: consider k j a ŋ vs. ts ã, which involves two gaps, but none if we write k.j a.ŋ for the former.
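To make the gap argument concrete, here is a minimal sketch that counts the smallest number of gaps any pairwise alignment of two space-separated sequences must contain, which is simply the difference in their lengths. The function name `minimum_gaps` is made up for this illustration and is not part of any library.

```python
def minimum_gaps(seq_a, seq_b):
    """Lower bound on the gaps needed to align two space-separated sequences.

    Aligned sequences must have equal length, so the shorter one needs at
    least as many gaps as it has fewer units than the longer one.
    """
    return abs(len(seq_a.split()) - len(seq_b.split()))


# Ungrouped: four units against two, so at least two gaps are needed.
print(minimum_gaps("k j a ŋ", "ts ã"))
# Grouped: two units against two, so no gap is needed.
print(minimum_gaps("k.j a.ŋ", "ts ã"))
```

Since a dot-joined group counts as one unit, the grouped version aligns k.j with ts and a.ŋ with ã directly.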

In orthography profiles, this notation can be introduced directly in the profile. We can even introduce it only implicitly by (ab)using the slash notation, writing k .j/j a .ŋ/ŋ, which can be converted to k.j a.ŋ with a very short function:

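def group_sounds(segments):
    """Merge segments marked with a leading dot into the preceding segment."""
    out = []
    for segment in segments:
        if "/" in segment:
            # Slash notation: the part before the slash carries the grouping.
            one, _ = segment.split("/", 1)
            if one.startswith(".") and out:
                # Leading dot: attach this sound to the previous segment.
                out[-1] += one
            else:
                out += [one]
        else:
            out += [segment]
    return out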

In Lexibank, we can add a GroupedSegments column to the FormTable, in which the ungrouped Segments are grouped. Grouping can even be done later, on the fly.
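As a rough sketch of how both columns could be derived from one slash-annotated string, the function below returns the ungrouped and the grouped reading side by side. It does not use the pylexibank API: `split_columns` and the plain dictionary standing in for a FormTable row are hypothetical, and it assumes that the part after the slash is the plain sound.

```python
def split_columns(annotated):
    """Return (ungrouped, grouped) segment lists from slash-annotated tokens."""
    ungrouped, grouped = [], []
    for token in annotated.split():
        # "x/y" splits into source "x" and target "y"; a token without a
        # slash yields an empty target.
        source, _, target = token.partition("/")
        # Assumption: the part after the slash is the plain, ungrouped sound.
        ungrouped.append(target or source)
        if source.startswith(".") and grouped:
            # Leading dot: attach to the previous grouped segment.
            grouped[-1] += source
        else:
            grouped.append(source)
    return ungrouped, grouped


# Hypothetical row, a plain dict rather than a real FormTable entry.
row = {}
row["Segments"], row["GroupedSegments"] = split_columns("k .j/j a .ŋ/ŋ")
print(row)
```

Computing both readings in one pass keeps them in sync, but the grouped form could just as well be computed later from the annotated string.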

In the long run, once this has been properly tested, I would however suggest making this part of the normal Segments and checking CLTS compatibility for the grouped elements individually rather than as a bunch, which would require modifying the pylexibank code.
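Checking the grouped elements individually could look roughly like the sketch below: each grouped segment is split at the dot and every part is looked up on its own. The set `KNOWN` is only a hypothetical stand-in for a real CLTS lookup, and `ungroup` and `unknown_sounds` are illustrative names, not pylexibank functions.

```python
def ungroup(segments):
    """Split grouped segments such as "k.j" back into their single sounds."""
    out = []
    for segment in segments:
        out.extend(segment.split("."))
    return out


def unknown_sounds(segments, known):
    """Return the individual sounds that are not found in `known`."""
    return [sound for sound in ungroup(segments) if sound not in known]


# Hypothetical stand-in for a CLTS lookup, for illustration only.
KNOWN = {"k", "j", "a", "ŋ", "ts"}

print(unknown_sounds(["k.j", "a.ŋ"], KNOWN))
print(unknown_sounds(["ts", "ã"], KNOWN))
```

Checked as a bunch, k.j would not be found in such a set, although both of its parts are.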


LinguList added the question Further information is requested label Jul 3, 2022

LinguList assigned LinguList, SimonGreenhill and xrotwang Jul 3, 2022
