update to 2.0 #2

Merged
merged 1 commit into from
Oct 30, 2019

Conversation

SimonGreenhill
Contributor

I'd appreciate another pair of eyes looking at this -- in particular, BIPA coverage is no longer 100% and the number of cognates has dropped.

Contributor

@LinguList LinguList left a comment


looks good! @tresoldi's comments are still helpful...

@LinguList LinguList merged commit 34982a7 into master Oct 30, 2019
@tresoldi

Looks good, and the decrease in BIPA coverage is not drastic (coming as a side-effect from better segmentation, apparently).

I'd say cases like this should be merged (as already done) and we can open and work on issues like segmentation and concepts later.

@LinguList
Contributor

Have a look at the errors for transcriptions, please: https://github.com/lexibank/robinsonap/blob/master/TRANSCRIPTION.md

You will see that some relate to things in quotation marks, which are comments in English, so I suggest adding the opening and closing quotation marks to the brackets. Otherwise, it is a bit problematic to have segments like "h u m a n b e i n g", right?

@tresoldi

I guess most if not all are due to my hand-crafted replacements not being applied anymore, like this code in the original:

    # replace single quote characters
    form = form.replace("‘", "'")
    form = form.replace("’", "'")

They should either be part of the FormSpec or be integrated in the orthoprofile (I probably had them manually so I didn't need to code a product of all the three different apostrophes times all the vowels).
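A minimal sketch of the normalization described above, done in one pass so that no apostrophe-times-vowel combinations need to be listed. The helper name `normalize_apostrophes` and the exact set of characters in `APOSTROPHES` are assumptions for illustration; this is plain Python, not pylexibank's API.

```python
# Hypothetical helper, not part of pylexibank.
# Assumption: these are the apostrophe variants that occur in the data;
# the first two come from the original replacement code, the acute accent
# is an extra guess and may not be needed.
APOSTROPHES = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u00b4": "'",  # acute accent used as an apostrophe
}
_TABLE = str.maketrans(APOSTROPHES)


def normalize_apostrophes(form: str) -> str:
    """Map every typographic apostrophe variant to the ASCII apostrophe."""
    return form.translate(_TABLE)
```

Once all variants collapse to a single character, an orthography profile would only need entries for that one apostrophe.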

@LinguList
Contributor

Yes, but since the replacement you have there will leave the "h u m a n b e i n g", i.e., the English word which is a comment inside the form, it is better to assume that these two function as brackets, and inside brackets we strip, right?

@tresoldi

There were some unbalanced entries, if I recall correctly (or was it apostrophes that needed unicode normalization?), but yes -- they function as brackets and stripping them would be the best approach as of now.

@SimonGreenhill
Contributor Author

Thanks @tresoldi and @LinguList. Maybe we let this sit until the issue regarding formspec.replacements is resolved, and then I'll try to refactor to catch the last few things (see #3).

@LinguList
Contributor

Can't you just try my tip with the brackets? I mean: you have closing and opening quote chars, so just treat them as brackets and see what happens?

@SimonGreenhill
Contributor Author

yes, I'll look into that too.

@SimonGreenhill
Contributor Author

OK, adding them as brackets doesn't work, as we then convert forms like iqa'an to iqa (instead of iqaʔan).
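To illustrate why this happens, here is a rough sketch of bracket stripping in plain Python. It is an assumption about the general technique, not pylexibank's actual implementation, and `strip_brackets` is a hypothetical name. When the apostrophe is registered as both opener and closer, a lone word-internal apostrophe opens a "bracket" that never closes, so the rest of the form is dropped.

```python
def strip_brackets(form: str, brackets: dict) -> str:
    """Drop every bracketed span from `form`.

    `brackets` maps an opening character to its closing character.
    An opener without a matching closer swallows the rest of the form.
    """
    kept = []
    closer = None  # the closing character we are waiting for, if any
    for char in form:
        if closer is None:
            if char in brackets:
                closer = brackets[char]  # enter a bracketed span
            else:
                kept.append(char)
        elif char == closer:
            closer = None  # leave the bracketed span
    return "".join(kept).strip()


# The apostrophe acts as its own opening and closing bracket.
quote_brackets = {"'": "'"}
print(strip_brackets("iqa'an", quote_brackets))
```

Under these assumptions the call prints `iqa`, matching the truncation reported above, while a balanced quoted comment would be removed as intended.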

@xrotwang
Contributor

xrotwang commented Nov 4, 2019

@SimonGreenhill but replacements=["´", "'"] could work, right?

@SimonGreenhill
Contributor Author

Yes, I'm adding a few tests to the CLDF output for particular forms to make it easier to work through.

@LinguList
Contributor

In the worst case, you will have to use lexemes, as we would, of course, not want to have any English comment text rendered as pseudo-IPA. How many forms are there anyway in which you find the pattern "something in English" in the data? If it's, say, 200, I'd just regex those in lexemes.tsv...
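One way to count and separate such forms could look like the sketch below. The pattern, the helper name `split_comment`, and the example form `mata 'human being'` are all made up for illustration and are not taken from the dataset. The idea is to treat only a balanced pair of quote characters as a comment, so that a single word-internal apostrophe is left alone.

```python
import re

# Assumption: a comment is a span enclosed by a pair of quote characters
# with no further quote characters in between.
QUOTED = re.compile(r"""["'‘“]([^"'‘’“”]+)["'’”]""")


def split_comment(form):
    """Return (form_without_comment, comment); comment is None if absent."""
    match = QUOTED.search(form)
    if match is None:
        return form, None
    cleaned = (form[:match.start()] + form[match.end():]).strip()
    return cleaned, match.group(1).strip()


print(split_comment("iqa'an"))              # single apostrophe: left untouched
print(split_comment("mata 'human being'"))  # invented example form
```

A form with two word-internal apostrophes would still be misread as containing a comment, so the matches would need a manual check before being written to lexemes.tsv.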
