Najdi Arabic phoneme inventory is missing items #332

Open
camoverride opened this issue Mar 28, 2021 · 10 comments

@camoverride

Ground truth: Wikipedia vs. the PHOIBLE Najdi Arabic inventory page.

I also confirmed this by inspecting the relevant lines in data/phoible.csv.

I discovered this with the following SQL query, which searches for languages that lack phonemes with the [+nasal] feature (Najdi Arabic will be the last entry returned by this query):

SELECT x.LanguageName, SUM(x.nasal) AS num_nasals
FROM (
    SELECT InventoryID, LanguageName,
           CASE WHEN nasal = '-' THEN 0 ELSE 1 END AS nasal
    FROM phoible
) AS x
GROUP BY x.InventoryID, x.LanguageName
ORDER BY num_nasals ASC
LIMIT 16

('Najdi Arabic', 0)
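The query above treats every value other than '-' as nasal, so any segment whose nasal value is '0' (which, as far as I can tell, PHOIBLE uses where a feature does not apply, e.g. for tones) is also counted. A minimal Python sketch that loads the CSV into an in-memory SQLite database and counts only values containing '+' (so contour values such as '-,+' still count) might look like this. It assumes data/phoible.csv has InventoryID, LanguageName and nasal columns, as the query implies; load_phoible and inventories_without_nasals are hypothetical helper names.

```python
import csv
import sqlite3

# Hypothetical helper: load the three columns the query needs into an
# in-memory SQLite table named "phoible". Accepts an open file or any
# iterable of CSV lines.
def load_phoible(csv_lines):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE phoible (InventoryID INTEGER, LanguageName TEXT, nasal TEXT)"
    )
    reader = csv.DictReader(csv_lines)
    conn.executemany(
        "INSERT INTO phoible VALUES (?, ?, ?)",
        ((row["InventoryID"], row["LanguageName"], row["nasal"]) for row in reader),
    )
    return conn

# Count segments whose nasal value contains '+' (including contours like '-,+'),
# instead of every value that is not exactly '-'.
NASAL_COUNT_SQL = """
SELECT LanguageName,
       SUM(CASE WHEN nasal LIKE '%+%' THEN 1 ELSE 0 END) AS num_nasals
FROM phoible
GROUP BY InventoryID, LanguageName
ORDER BY num_nasals ASC, InventoryID ASC
"""

def inventories_without_nasals(conn):
    # Keep only (LanguageName, count) pairs where no [+nasal] segment was found.
    return [(name, n) for name, n in conn.execute(NASAL_COUNT_SQL) if n == 0]
```

Usage would be along the lines of opening data/phoible.csv with newline="" and encoding="utf-8", passing the file object to load_phoible, and printing inventories_without_nasals(conn).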

@bambooforest self-assigned this Mar 28, 2021
@bambooforest
Contributor

@camoverride, thanks for pointing this out and for sending reproducible code. I'll look into it.

@drammock
Member

@camoverride I'm chiming in here to provide a clarification: "ground truth" in this case is not Wikipedia but Ingram 1994 (https://phoible.org/sources/67053). Phoneme inventories in PHOIBLE are not meant to represent a language but rather a particular instance of language documentation. We can certainly make mistakes when converting the analysis in Ingram 1994 into a PHOIBLE entry, but if the "mistake" here is that Ingram disagrees with other scholars about Najdi Arabic's phonology, that is a disagreement we're interested in preserving.

@bambooforest
Contributor

This issue is a bit more complicated than that, I think. I looked into Ingram's grammar, and indeed it does not list nasals among its consonants, but nasals nevertheless appear in the word forms given in the grammar.

I discussed this with @macleginn (this particular inventory comes from an earlier version of EURPhon: https://eurphon.info/languages/html?lang_id=135), who told me that he has encountered systematic omission of nasals (and of liquids and rhotics) in descriptions of Arabic dialects.

He put it rather succinctly to me: "There are doculects that demand some amount of hermeneutics, unfortunately."

A point for discussion.

@camoverride
Author

Gotcha, Ingram as ground truth definitely makes sense, and it's reasonable not to want to tamper with the source material too much.

However, have you all developed a general strategy for tracking known "errors" in Ingram? Maybe such a record could act as a secondary source to augment PHOIBLE?

@xrotwang
Contributor

Just a somewhat technical note regarding a "secondary source to augment phoible": I think that's a good idea, some sort of curated errata. And that's exactly one of the use cases we had in mind when designing CLDF to allow for easy merging: such an errata list could be distributed in the same overall format as PHOIBLE and then be used transparently to override PHOIBLE data in specified cases.
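As a rough illustration of the errata idea, here is a minimal Python sketch, assuming a hypothetical errata table with one row per change (inventory ID, phoneme, "add" or "remove", and a note on where the correction comes from). It is not a CLDF API and the names Erratum and apply_errata are made up; it only shows how such a list could override base data in specified cases while leaving every other inventory untouched.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Erratum:
    # Hypothetical errata record; field names are illustrative only.
    inventory_id: int
    phoneme: str
    action: str  # "add" or "remove"
    note: str    # e.g. the source that justifies the correction

def apply_errata(inventories, errata):
    """Return a new {inventory_id: set_of_phonemes} with errata applied.

    The base data is copied, not modified, so the original inventories stay
    available and only the inventories named in the errata change.
    """
    patched = {inv_id: set(phonemes) for inv_id, phonemes in inventories.items()}
    for e in errata:
        if e.inventory_id not in patched:
            raise KeyError(f"erratum refers to unknown inventory {e.inventory_id}")
        if e.action == "add":
            patched[e.inventory_id].add(e.phoneme)
        elif e.action == "remove":
            patched[e.inventory_id].discard(e.phoneme)
        else:
            raise ValueError(f"unknown action {e.action!r}")
    return patched
```

Keeping the note on each record is a deliberate choice here: an override is only as trustworthy as its stated justification, which is the concern raised about corrections that need "some amount of hermeneutics".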

However, in this particular case, I'd be a bit hesitant. I think the strength of PHOIBLE lies in its being principled and complete. So for any use case that looks at all of PHOIBLE, an augmented PHOIBLE would also have to be complete so as not to diminish that strength. "Some amount of hermeneutics" doesn't really sound like systematic errata that can be fixed wholesale.

@bambooforest
Contributor

@camoverride, regarding tracking errors: it depends on what one means by an error. As @drammock notes above, inventories in PHOIBLE reflect doculects, and in this case, at least for the missing nasals in Ingram's grammar, we would still be true to the original source, because it does not list them among the consonants.

We have always had the issue of systematic gaps in full database sources, e.g. UPSID deliberately contains no tones. But since this inventory is from EURPhon, if its editors update it to address systematic gaps in certain areal descriptive practices (e.g. some Semitic language descriptions systematically leaving out nasals), then EURPhon becomes more like UPSID, in the sense that multiple doculects may be used for a single inventory and some typological normalization may occur.

I think we will need to be clearer about such cases in our documentation moving forward, especially if some source editors identify systematic gaps and fill them without attributing multiple doculects.

@xrotwang
Contributor

Yes, PHOIBLE already being a second-level aggregator makes things trickier. And since one of the big advantages of PHOIBLE is machine-readable data, it would be nice if documentation about systematic gaps (along the lines of "no tones in UPSID") were machine-readable, too. But I don't really have an idea how to do that. There doesn't seem to be established terminology for "complete inventory", "complete inventory without loans" or "complete inventory without tones" that could serve as the basis for some sort of ontology.
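One possible shape for such machine-readable gap documentation, sketched here purely as an assumption: each inventory carries a hypothetical "omits" field listing segment classes its source leaves out systematically, and analyses filter on it. The category names below are placeholders, not an existing ontology or CLDF term.

```python
from dataclasses import dataclass, field
from enum import Enum

class SegmentClass(Enum):
    # Placeholder categories; a real ontology would need agreed definitions.
    TONE = "tone"
    NASAL = "nasal"
    LOAN = "loan"

@dataclass
class InventoryMeta:
    inventory_id: int
    source: str
    # Segment classes the source is known to leave out systematically.
    omits: frozenset = field(default_factory=frozenset)

def usable_for(inventories, segment_class):
    """Keep only inventories whose source does not systematically omit segment_class."""
    return [inv for inv in inventories if segment_class not in inv.omits]
```

For example, an inventory from UPSID would be recorded as InventoryMeta(1, "UPSID", frozenset({SegmentClass.TONE})) (the ID is arbitrary), so a tone count would skip it while a nasal count would still include it.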

@bambooforest
Contributor

Sounds like an ontology is in order. :)

@xrotwang
Contributor

@bambooforest you're the aggregator, you get to choose the categories :)

@bambooforest
Contributor

@xrotwang sounds good. And you have / will have a place for them in CLDF :)


4 participants