glottojoin_language_level (or something similar) #54
Comments
@SietzeN Would you like me to write some pseudocode or tidyverse code to exemplify this operation?
That would be great. Perhaps we can add a new function to glottojoin.R, or build on this one: https://github.com/SietzeN/glottospace/blob/fe0b3ed8fb1ff87105fd932411de44c162c1e180/R/glottojoin.R#L196
P.S. I'm not sure whether random assignment is such a good idea. I think it's better if the user specifies the kind of join: https://r4ds.had.co.nz/relational-data.html
Sure, it's not always a great idea, but if users have no a priori reason to pick one or the other, picking randomly is better than defaulting to, for example, always picking the first, which I've seen people do in similar circumstances.
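To make the two options concrete, here is a minimal sketch of choosing one row per language when several dialects of the same language are present. It is written in plain Python for illustration only, not in the package's R. The function name `pick_one_per_language`, the `strategy` and `seed` parameters, and the invented glottocodes are all assumptions and are not part of glottospace.

```python
import random

def pick_one_per_language(rows, strategy="random", seed=None):
    """Keep one row per Language_level_ID (hypothetical helper, not glottospace API)."""
    rng = random.Random(seed)  # seeded so a random pick can be reproduced
    groups = {}
    for row in rows:
        # Group rows that share the same language-level glottocode.
        groups.setdefault(row["Language_level_ID"], []).append(row)
    picked = []
    for members in groups.values():
        if strategy == "first":
            picked.append(members[0])           # always the first row seen
        elif strategy == "random":
            picked.append(rng.choice(members))  # uniform choice among the dialects
        else:
            raise ValueError(f"unknown strategy: {strategy!r}")
    return picked

# Invented glottocodes, for illustration only.
rows = [
    {"Glottocode": "dial1111", "Language_level_ID": "lang1234"},
    {"Glottocode": "dial2222", "Language_level_ID": "lang1234"},
    {"Glottocode": "lang9999", "Language_level_ID": "lang9999"},
]
first_pick = pick_one_per_language(rows, strategy="first")
random_pick = pick_one_per_language(rows, strategy="random", seed=1)
```

Exposing the strategy as an argument lets the user decide, which addresses the concern about random assignment being the only behaviour.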
Okay, I'll have a go!
Late, sorry. But here is a first draft of an approach: https://github.com/HedvigS/personal-cookbook/blob/main/R/glottojoin_language_level.R
What do you think, @SietzeN?
@SietzeN if this isn't a good fit for this package, I might try to convince Simon to put it in rcldf instead.
I take that back: I've apparently already tried that and it hasn't worked. Oh well.
Hi @HedvigS! Thanks for the message. Sorry, I now see I missed your previous message with the link to the script. Implementing the suggested changes in glottojoin might not be the easiest, because the function's behaviour changes depending on the input type. So a standalone function might be easier. How about 'glottodialang'? If you create a pull request, I'm happy to add it! Cheers!
It's alright. I've suggested a function for language-levelling datasets to rgrambank (grambank/rgrambank#3); once two sets are levelled, they can be joined together. Maybe this is the better path to take.
Great! Yes, that sounds like a more suitable approach. And congrats on your nice paper!
From: Hedvig Skirgård ***@***.***>
Sent: Wednesday 31 May 2023 15:07
To: SietzeN/glottospace ***@***.***>
CC: Norder, S.J. (Sietze) ***@***.***>; Mention ***@***.***>
Subject: Re: [SietzeN/glottospace] glottojoin_language_level (or something similar) (Issue #54)
Thank you very much :)!
Thanks @SietzeN for making this package.
It would be great if there was a function that joins different datasets together such that dialects of the same language are matched up and assigned the glottocode of their common ancestor, probably the language-levelled parent.
Currently in glottolog-cldf there is a column called "Language_ID" in the languages table which reports, if the languoid is a dialect, the glottocode of the parent languoid that has the level "language". This needn't be the direct parent of the dialect; sometimes dialects are nested within dialects and so on (I think the maximum depth I've seen is 4 or 5). I used to have a script that looped through each dialect languoid and checked which of its parents is a language and what glottocode that has. Then I convinced Robert F to add this information to the languages table, which has simplified things a lot. Essentially, what I do now is add in the glottocodes of the language- and family-levelled languoids in the Language_ID column and join datasets by that column instead (with different methods for what to do if there is more than one dialect of the same language). I've also taken the liberty of renaming this column "Language_level_ID" so as not to cause confusion with the other columns called "Language_ID" elsewhere in CLDF tables (even if there isn't another column called that in the languages table specifically).
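The workflow described above could be sketched roughly as follows, in plain Python rather than the package's R. This assumes that "adding in the glottocodes" means language- and family-level languoids receive their own glottocode as `Language_level_ID`. The glottocodes are invented, the toy table only mimics the relevant columns, and the function names are hypothetical.

```python
# Toy rows mimicking a glottolog-cldf languages table; all glottocodes are invented.
languages = [
    {"Glottocode": "lang1234", "Level": "language", "Language_ID": ""},
    {"Glottocode": "dial1111", "Level": "dialect", "Language_ID": "lang1234"},
    {"Glottocode": "dial2222", "Level": "dialect", "Language_ID": "lang1234"},
    {"Glottocode": "fami5678", "Level": "family", "Language_ID": ""},
]

def language_level_lookup(rows):
    """Map each glottocode to its Language_level_ID.

    Assumption: dialects use the Language_ID already in the table, and every
    other languoid falls back to its own glottocode.
    """
    return {r["Glottocode"]: (r["Language_ID"] or r["Glottocode"]) for r in rows}

def join_at_language_level(left, right, lookup):
    """Inner join of two datasets on the language-level glottocode."""
    right_by_lang = {}
    for r in right:
        right_by_lang.setdefault(lookup[r["Glottocode"]], []).append(r)
    joined = []
    for l in left:
        lang = lookup[l["Glottocode"]]
        for r in right_by_lang.get(lang, []):
            joined.append({"Language_level_ID": lang, "left": l, "right": r})
    return joined

lookup = language_level_lookup(languages)
# Two datasets that describe different dialects of the same language.
dataset_a = [{"Glottocode": "dial1111", "feature_a": 1}]
dataset_b = [{"Glottocode": "dial2222", "feature_b": 0}]
joined = join_at_language_level(dataset_a, dataset_b, lookup)
```

An exact-glottocode join of `dataset_a` and `dataset_b` would return nothing here, whereas the language-level join matches the two dialects through their shared parent.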
An improvement on this method could be to merge dialects based on their closest common ancestor, even if that ancestor is itself a dialect. That may be overdoing it, though; joining based on the language-levelled parent is probably best after all.
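For comparison, the common-ancestor variant could look something like the sketch below, which walks up a child-to-parent map. It is again plain Python, with invented glottocodes and a hypothetical `parent` mapping. Glottolog's actual tables may expose the tree differently, so this is only an illustration of the idea.

```python
# Hypothetical child -> parent map; all glottocodes are invented.
parent = {
    "subd1111": "dial1111",
    "subd2222": "dial1111",
    "dial1111": "lang1234",
    "dial3333": "lang1234",
    "lang1234": "fami5678",
    "fami5678": None,
}

def ancestors(code, parent):
    """Return [code, parent, grandparent, ...] up to the root."""
    chain = []
    while code is not None:
        chain.append(code)
        code = parent.get(code)
    return chain

def lowest_common_ancestor(a, b, parent):
    """Return the closest languoid that both a and b descend from (or are)."""
    seen = set(ancestors(a, parent))
    for code in ancestors(b, parent):
        if code in seen:
            return code
    return None  # the two languoids share no ancestor in this map

# Two sub-dialects of the same dialect meet at that dialect, not at the language.
nested = lowest_common_ancestor("subd1111", "subd2222", parent)
# A sub-dialect and a sibling dialect meet at the language level.
across = lowest_common_ancestor("subd1111", "dial3333", parent)
```

The first call shows why this may be overdoing it: the merge key can end up being a dialect, so different pairs of rows would be levelled to different depths of the tree.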
So, in summary: either an option to an existing function or a new function which
Once again, glad you're doing this and happy to be invited along!