update to 2.0 #2

SimonGreenhill · 2019-10-30T21:11:07Z

I'd appreciate another pair of eyes on this. In particular, we no longer have 100% BIPA coverage, and the number of cognates has dropped.

LinguList

looks good! @tresoldi's comments are still helpful...

tresoldi · 2019-10-31T12:21:23Z

Looks good, and the decrease in BIPA coverage is not drastic (coming as a side-effect from better segmentation, apparently).

I'd say cases like this should be merged (as already done) and we can open and work on issues like segmentation and concepts later.

LinguList · 2019-10-31T13:07:05Z

Have a look at the errors for transcriptions, please: https://github.com/lexibank/robinsonap/blob/master/TRANSCRIPTION.md

You will see that some relate to things in quotation marks, which are comments in English, so I suggest adding the opening and closing quotation marks to the brackets. Otherwise, it is a bit problematic to have segments like "h u m a n b e i n g", right?

tresoldi · 2019-10-31T13:13:27Z

I guess most if not all are due to my hand-crafted replacements not being applied anymore, like this code in the original:

                        # replace single quote characters
                        form = form.replace("‘", "'")
                        form = form.replace("’", "'")

They should either be part of the FormSpec or be integrated in the orthoprofile (I probably had them manually so I didn't need to code a product of all the three different apostrophes times all the vowels).
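As an illustration of the point about not enumerating every apostrophe-vowel combination, here is a minimal sketch that folds the apostrophe variants into one character with a single translation table. The helper name `normalize_apostrophes` is hypothetical and not part of pylexibank, and the choice of the acute accent as the third variant is an assumption (the original code only handled the two curly quotes).

```python
# Hypothetical helper, not part of pylexibank: fold apostrophe variants
# into a plain ASCII apostrophe before segmentation.
# U+2018 and U+2019 come from the original code; U+00B4 is an assumption.
APOSTROPHE_VARIANTS = "\u2018\u2019\u00b4"
NORMALIZE = str.maketrans({ch: "'" for ch in APOSTROPHE_VARIANTS})


def normalize_apostrophes(form: str) -> str:
    """Replace every apostrophe variant in `form` with "'"."""
    return form.translate(NORMALIZE)
```

With a table like this, an orthography profile would only need entries for the plain apostrophe, rather than one entry per variant and vowel.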

LinguList · 2019-10-31T13:57:51Z

Yes, but since the replacement you have there will leave the "h u m a n b e i n g", i.e., the English word which is a comment inside the form, it is better to assume that these two function as brackets, and inside brackets we strip, right?

tresoldi · 2019-10-31T14:10:58Z

There were some unbalanced entries, if I recall correctly (or was it apostrophes that needed unicode normalization?), but yes -- they function as brackets and stripping them would be the best approach as of now.
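On the Unicode normalization question, a quick check with the standard library may help: the curly quotes U+2018 and U+2019 have no decomposition, so NFC or NFKC normalization leaves them as they are and does not turn them into the ASCII apostrophe. The function name below is hypothetical and only serves the illustration.

```python
import unicodedata


def survives_normalization(ch: str) -> bool:
    """True if `ch` is unchanged under both NFC and NFKC normalization."""
    return all(unicodedata.normalize(nf, ch) == ch for nf in ("NFC", "NFKC"))


# Left and right single quotation marks are untouched by normalization,
# so they would still need an explicit replacement or bracket rule.
curly_quotes_unchanged = all(survives_normalization(ch) for ch in "\u2018\u2019")
```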

SimonGreenhill · 2019-11-01T10:02:02Z

Thanks @tresoldi and @LinguList. Maybe we let this sit until the issue regarding formspec.replacements is resolved, and then I'll try to refactor to catch the last few things (see #3).

LinguList · 2019-11-01T11:41:23Z

Can't you just try my tip with the brackets? I mean: you have closing and opening quote chars, so just treat them as brackets and see what happens?

SimonGreenhill · 2019-11-03T20:50:19Z

yes, I'll look into that too.

SimonGreenhill · 2019-11-04T10:23:19Z

OK, adding them as brackets doesn't work, as we then convert forms like iqa'an to iqa (instead of iqaʔan).
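To make the failure mode concrete, here is a minimal sketch of bracket stripping. It is a simplified stand-in written for this illustration, not the pylexibank implementation, and it assumes that an opener without a matching closer swallows the rest of the form. The form `ata ‘human being’` is an invented example.

```python
def strip_brackets(form: str, brackets: dict) -> str:
    """Drop every bracketed span from `form`.

    `brackets` maps each opening character to its closing character.
    An opener that is never closed removes everything after it.
    """
    kept = []
    closer = None  # closing character we are currently waiting for
    for ch in form:
        if closer is not None:
            if ch == closer:
                closer = None  # span ends; resume keeping characters
            continue
        if ch in brackets:
            closer = brackets[ch]
            continue
        kept.append(ch)
    return "".join(kept).strip()
```

With the straight apostrophe registered as both opener and closer, `strip_brackets("iqa'an", {"'": "'"})` yields `iqa`, because the word-internal apostrophe opens a span that is never closed. With only the curly pair registered, `strip_brackets("ata ‘human being’", {"‘": "’"})` yields `ata`, and a form containing only a straight apostrophe is left alone.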

xrotwang · 2019-11-04T10:54:24Z

@SimonGreenhill but replacements=["´", "'"] could work, right?

SimonGreenhill · 2019-11-04T10:56:26Z

Yes, I'm adding a few tests to the CLDF output for particular forms to make it easier to work through.

LinguList · 2019-11-04T11:07:02Z

In the worst case, you will have to use lexemes, as we would, of course, not want to have any English comment text rendered as pseudo-IPA. How many forms are there anyway, in which you find the pattern "something in English" in the data? If it's, say, 200, I'd just regex those in lexemes.tsv...
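One way to estimate that number is a small regex pass over the raw forms. The sketch below is an assumption-laden illustration: the function name and sample forms are invented, and the pattern only counts a quoted span as a comment when the opening quote starts the form or follows whitespace and the closing quote ends the form or precedes whitespace, so word-internal apostrophes such as in iqa'an are not flagged.

```python
import re

# Assumed sets of opening and closing quote characters.
OPEN = "\u2018\u201c'\""
CLOSE = "\u2019\u201d'\""

# A quoted span that is delimited by whitespace or the form boundaries.
QUOTED_COMMENT = re.compile(
    rf"(?:^|(?<=\s))[{OPEN}][^{OPEN}{CLOSE}]+[{CLOSE}](?=\s|$)"
)


def forms_with_quoted_comments(forms):
    """Return the forms that contain a quoted comment-like span."""
    return [form for form in forms if QUOTED_COMMENT.search(form)]
```

The matching forms could then be reviewed by hand and turned into entries in lexemes.tsv if the count is small enough.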

update to 2.0

a43dd75

SimonGreenhill requested review from tresoldi and LinguList October 30, 2019 21:11

LinguList approved these changes Oct 30, 2019

LinguList merged commit 34982a7 into master Oct 30, 2019

SimonGreenhill mentioned this pull request Nov 1, 2019

Refactor with formspec.replacements. #3

Closed
