update to 2.0 #2
looks good! @tresoldi's comments are still helpful...
Looks good, and the decrease in BIPA coverage is not drastic (coming as a side-effect from better segmentation, apparently). I'd say cases like this should be merged (as already done) and we can open and work on issues like segmentation and concepts later.
Have a look at the errors for transcriptions, please: https://github.com/lexibank/robinsonap/blob/master/TRANSCRIPTION.md You will see that some relate to things in quotation marks, which are comments in English, so I suggest adding the opening and closing quotation marks to the brackets. Otherwise, it is a bit problematic to have segments like "h u m a n b e i n g", right?
I guess most if not all are due to my hand-crafted replacements not being applied anymore, like this code in the original:

    # replace single quote characters
    form = form.replace("‘", "'")
    form = form.replace("’", "'")

They should either be part of the FormSpec or be integrated in the orthoprofile (I probably had them manually so I didn't need to code a product of all the three different apostrophes times all the vowels).
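For illustration, the two manual replacements quoted above can be folded into a single translation table, which avoids listing every apostrophe-plus-vowel combination. This is only a standalone sketch: `APOSTROPHE_VARIANTS` and `normalize_apostrophes` are hypothetical names, the table covers just the two characters shown in the original code, and nothing here uses the FormSpec or the orthoprofile.

```python
# Hypothetical helper, not part of the dataset code.
# Maps the two quote characters from the original replacements to a plain
# apostrophe; further variants could be added to the same table.
APOSTROPHE_VARIANTS = {
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
}

_APOSTROPHE_TABLE = str.maketrans(APOSTROPHE_VARIANTS)


def normalize_apostrophes(form: str) -> str:
    """Return `form` with all listed apostrophe variants replaced by "'"."""
    return form.translate(_APOSTROPHE_TABLE)
```

Because the mapping works on single characters, it applies regardless of which vowel follows the apostrophe.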
Yes, but since the replacement you have there will leave the "h u m a n b e i n g", i.e., the English word which is a comment inside the form, it is better to assume that these two function as brackets, and inside brackets we strip, right?
There were some unbalanced entries, if I recall correctly (or was it apostrophes that needed Unicode normalization?), but yes -- they function as brackets and stripping them would be the best approach as of now.
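A minimal sketch of the "treat the quotes as brackets and strip what is inside" idea, assuming the comments are delimited by ‘ and ’ as in the replacement code quoted earlier. The names `QUOTE_BRACKETS` and `strip_quoted_comments` are hypothetical, and this is plain Python rather than the FormSpec mechanism itself.

```python
import re

# Assumption: an opening ‘ and a closing ’ delimit an English comment.
QUOTE_BRACKETS = {"\u2018": "\u2019"}


def strip_quoted_comments(form, brackets=None):
    """Remove every balanced opening...closing span, then tidy whitespace."""
    brackets = QUOTE_BRACKETS if brackets is None else brackets
    for opening, closing in brackets.items():
        # Match an opening char, any run of non-closing chars, a closing char.
        pattern = (
            re.escape(opening)
            + "[^" + re.escape(closing) + "]*"
            + re.escape(closing)
        )
        form = re.sub(pattern, " ", form)
    # Collapse the whitespace left behind by removed spans.
    return " ".join(form.split())
```

A lone closing quote with no opening partner is left untouched, so a word-internal ’ survives. The sketch also shows the risk with unbalanced entries: a stray opening ‘ followed later by an apostrophe-like ’ is treated as a pair, and the text between them is removed.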
Thanks @tresoldi and @LinguList. Maybe we let this sit until the issue regarding |
Can't you just try my tip with the brackets? I mean: you have closing and opening quote chars, so just treat them as brackets and see what happens?
yes, I'll look into that too.
ok, adding as brackets doesn't work, as we then convert forms like |
@SimonGreenhill but |
Yes, I'm adding a few tests to the CLDF output for particular forms to make it easier to work through.
In the worst case, you will have to use lexemes, as we would, of course, not want to have any English comment text rendered as pseudo-IPA. How many forms are there anyway, in which you find the patterns |
I'd appreciate another pair of eyes looking at this -- in particular, we no longer have 100% BIPA coverage, and the number of cognates has dropped.