Wrong Language Families or Glottocodes in CLDF Datasets #47

Closed
LinguList opened this issue Apr 6, 2023 · 7 comments
Comments

@LinguList
Contributor

Comparing the family we provide with the family Glottolog gives for the same Glottocode reveals some interesting errors:

| ID | Glottocode | Family (Glottolog) | Family (Lexibank) |
| --- | --- | --- | --- |
| castrozhuang-XinchengBeigeng | east2326 | Uralic | Tai-Kadai |
| castrozhuang-XinchengGuosui | east2326 | Uralic | Tai-Kadai |
| chindialectsurvey-RawngtuWeilong-A | wela1234 | Bookkeeping | Sino-Tibetan |
| chindialectsurvey-RawngtuRamtim-A | wela1234 | Bookkeeping | Sino-Tibetan |
| constenlachibchan-Arhuaco | arhu1242 | Chibchan | Chibcha |
| constenlachibchan-Bari | bari1297 | Chibchan | Chibcha |
| constenlachibchan-Boruca | boru1252 | Chibchan | Chibcha |
| constenlachibchan-Bribri | brib1243 | Chibchan | Chibcha |
| constenlachibchan-Buglere | bugl1243 | Chibchan | Chibcha |
| constenlachibchan-Cabecar | cabe1245 | Chibchan | Chibcha |
| constenlachibchan-CentralTunebo | cent2150 | Chibchan | Chibcha |
| constenlachibchan-Chibcha | chib1270 | Chibchan | Chibcha |
| constenlachibchan-Chimila | chim1309 | Chibchan | Chibcha |
| constenlachibchan-Cogui | cogu1240 | Chibchan | Chibcha |
| constenlachibchan-Malayo | mala1522 | Chibchan | Chibcha |
| constenlachibchan-MalekuJaika | male1297 | Chibchan | Chibcha |
| constenlachibchan-Ngabere | ngab1239 | Chibchan | Chibcha |
| constenlachibchan-Pech | pech1241 | Chibchan | Chibcha |
| constenlachibchan-Rama | rama1270 | Chibchan | Chibcha |
| constenlachibchan-SanBlasKuna | sanb1242 | Chibchan | Chibcha |
| constenlachibchan-Teribe | teri1250 | Chibchan | Chibcha |
| constenlachibchan-Cacaopera | caca1247 | Misumalpan | Misumalpam |
| constenlachibchan-Mayangna | maya1285 | Misumalpan | Misumalpam |
| constenlachibchan-Ulwa | ulwa1239 | Misumalpan | Misumalpam |
| constenlachibchan-Miskito | misk1235 | Misumalpan | Misumalpam |
| hantganbangime-Fulfulde | maas1239 | Atlantic-Congo | Atlantic |
| hubercolumbian-Jupda | hupd1244 | Naduhup | Nadahup |
| hubercolumbian-Saliba | sali1298 | Saliban | Jodi-Saliban |
| ivanisuansu-Suansu | suan1234 | Sino-Tibetan | Isolate |
| johanssonsoundsymbolic-Aguaruna | agua1253 | Chicham | Jivaroan |
| johanssonsoundsymbolic-Ahtena | ahte1237 | Athabaskan-Eyak-Tlingit | Athapaskan-Eyak-Tlingit |
| johanssonsoundsymbolic-Ainu | ainu1240 | Ainu | Ainu (Japan) |
| johanssonsoundsymbolic-Aymara | cent2142 | Aymaran | Aymara |
| johanssonsoundsymbolic-Bambassi | bamb1262 | Blue Nile Mao | Mao |
| johanssonsoundsymbolic-Cavinena | cavi1250 | Pano-Tacanan | Tacanan |
| johanssonsoundsymbolic-Guahibo | guah1255 | Guahiboan | Guahibo |
| johanssonsoundsymbolic-Hupde | hupd1244 | Naduhup | Nadahup |
| johanssonsoundsymbolic-Kamula | kamu1260 | Kamula-Elevala | Kamula |
| johanssonsoundsymbolic-Kunimaipa | kuni1267 | Kunimaipan | Goilalan |
| johanssonsoundsymbolic-Lencasalvador | lenc1244 | Bookkeeping | Lencan |
| johanssonsoundsymbolic-Limilngan | nucl1327 | Limilngan-Wulna | Limilngan |
| johanssonsoundsymbolic-Mairasi | nucl1594 | Mairasic | Mairasi |
| johanssonsoundsymbolic-Mongolian | halh1238 | Mongolic-Khitan | Mongolic |
| johanssonsoundsymbolic-Moro | moro1285 | Heibanic | Heiban |
| johanssonsoundsymbolic-Nimboran | nucl1633 | Nimboranic | Nimboran |
| johanssonsoundsymbolic-Ninam | nina1238 | Yanomamic | Yanomam |
| johanssonsoundsymbolic-PanoanKatukina | pano1254 | Pano-Tacanan | Panoan |
| johanssonsoundsymbolic-Sanapanaangaite | sana1281 | Bookkeeping | Lengua-Mascoy |
| johanssonsoundsymbolic-Sentani | nucl1632 | Sentanic | Sentani |
| johanssonsoundsymbolic-Shatt | shat1244 | Dajuic | Daju |
| johanssonsoundsymbolic-Toaripi | toar1246 | Eleman | Nuclear Eleman |
| johanssonsoundsymbolic-Warembori | ware1253 | Austronesian | Austronesian (Malayo-Polynesian: Central-Eastern Malayo-Polynesian: Eastern Malayo-Polynesian: South Halmahera-West New Guinea) |
| johanssonsoundsymbolic-Yawa | nucl1454 | Yawa-Saweru | Yawa |
| joophonosemantic-Kabardian | kaba1278 | Abkhaz-Adyge | N Caucasian |
| joophonosemantic-Wayuu | wayu1243 | Arawakan | Maipurean |
| joophonosemantic-ShipiboConibo | ship1254 | Pano-Tacanan | Panoan |
| northeuralex-khk | halh1238 | Mongolic-Khitan | Mongolic |
| northeuralex-bua | buri1258 | Mongolic-Khitan | Mongolic |
| northeuralex-xal | kalm1243 | Mongolic-Khitan | Mongolic |

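
A check like the one that produced this list could be sketched roughly as follows. This is only an illustration, not the script actually used: the column names `ID`, `Glottocode`, and `Family`, the function name `find_family_mismatches`, and the hand-written lookup dictionary standing in for Glottolog are all assumptions. In practice the lookup would be built from a Glottolog release.

```python
import csv
import io


def find_family_mismatches(languages_csv, glottolog_families):
    """Return (ID, Glottocode, Glottolog family, dataset family) for every
    row whose Family differs from the family Glottolog assigns.

    `glottolog_families` is a plain dict {glottocode: family name}; it is a
    hypothetical stand-in for a lookup built from a Glottolog release.
    """
    mismatches = []
    for row in csv.DictReader(io.StringIO(languages_csv)):
        expected = glottolog_families.get(row["Glottocode"])
        # Skip Glottocodes missing from the lookup; those need a separate check.
        if expected is not None and expected != row["Family"]:
            mismatches.append(
                (row["ID"], row["Glottocode"], expected, row["Family"])
            )
    return mismatches


# Sample rows modeled on the list above (assumed column layout).
SAMPLE = """ID,Glottocode,Family
Suansu,suan1234,Isolate
Bribri,brib1243,Chibcha
khk,halh1238,Mongolic-Khitan
"""

# Hand-written excerpt; the family names are taken from the list above.
GLOTTOLOG = {
    "suan1234": "Sino-Tibetan",
    "brib1243": "Chibchan",
    "halh1238": "Mongolic-Khitan",
}

mismatches = find_family_mismatches(SAMPLE, GLOTTOLOG)
for entry in mismatches:
    print("\t".join(entry))
```

A comparison like this flags both kinds of problem in the list: outdated or misspelled family names (Chibcha vs. Chibchan) and wrong Glottocodes, where the Glottolog family is implausible for the language (Uralic for a Zhuang variety).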
@LinguList
Contributor Author

This list should be handled by us.

@FredericBlum
Collaborator

Does this issue supersede #46? I'd assign this to @MuffinLinwist for early March, once some other tasks are done.

@FredericBlum
Collaborator

@MuffinLinwist We can now start working on this issue.

  1. Please create a fresh virtual environment with a clean install of the most recent cldfbench version
  2. Go through all the datasets mentioned in this issue and fix the wrong Family names and glottocodes
  3. Create a PR that fixes the glottocodes/families in etc/languages.csv and re-runs cldfbench
  4. Tag me on the PR so I can review and merge
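
For the family names, the edit to `etc/languages.csv` in step 3 could be scripted along these lines. This is a minimal sketch under assumptions: the column names `ID`, `Glottocode`, and `Family`, the helper name `apply_family_fixes`, and the `fixes` mapping are illustrative, not part of cldfbench. Wrong Glottocodes still need to be corrected by hand, and cldfbench has to be re-run afterwards.

```python
import csv
import io


def apply_family_fixes(languages_csv, fixes):
    """Rewrite the Family column using {glottocode: corrected family}.

    Returns the updated CSV text and the IDs of the rows that changed.
    All other columns and the row order are left untouched.
    """
    reader = csv.DictReader(io.StringIO(languages_csv))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    changed = []
    for row in reader:
        new_family = fixes.get(row["Glottocode"])
        if new_family and row["Family"] != new_family:
            row["Family"] = new_family
            changed.append(row["ID"])
        writer.writerow(row)
    return out.getvalue(), changed


# Sample content with an assumed column layout; real files may have more columns.
SAMPLE = """ID,Name,Glottocode,Family
Miskito,Miskito,misk1235,Misumalpam
Rama,Rama,rama1270,Chibcha
Pech,Pech,pech1241,Chibchan
"""

# Corrected family names as given by Glottolog in the list above.
FIXES = {
    "misk1235": "Misumalpan",
    "rama1270": "Chibchan",
    "pech1241": "Chibchan",
}

fixed_csv, changed_ids = apply_family_fixes(SAMPLE, FIXES)
print(fixed_csv)
print("changed:", changed_ids)
```

Returning the list of changed IDs makes it easy to mention in the PR description which languages were touched.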

Some of those cases might already be solved, but most will not. Please first re-check the Glottocode cases described in #35 and answer in the respective issue once you have finished all the cases.

@MuffinLinwist

@LinguList and @FredericBlum, I addressed all the errors in the datasets listed in this issue. @chrzyki is in the process of reviewing and merging the final PRs. If everything looks fine, @chrzyki, we can consider this issue fixed.

@chrzyki
Contributor

chrzyki commented Mar 5, 2024

> @LinguList and @FredericBlum, I addressed all the errors on the datasets in this issue. @chrzyki is in the process of reviewing the final PRs and merging. If everything is fit, @chrzyki, we can go and consider this issue fix.

Everything looks good and is merged. Thank you very much for taking the time to fix these issues!

@MuffinLinwist

@LinguList, @chrzyki, and @FredericBlum, just a kind reminder that this issue is fixed and can be closed.

@LinguList
Contributor Author

Cool.
