Implement linkage_get_word_*() (issue #420) #564
True iff a gword is an original sentence word
How-to:
1. Type a sentence (including e.g. ru).
2. Inspect the positions (the first list is the UTF-8 character position, the second list is the byte position). For 'ru', also try it with DISPLAY_MORPHOLOGY=False.

The code accounts for spell corrections. The following has a 3-way spell correction (2 of them run-on corrections):

    Enter sentences: I saw the seasand lakes
    Sentence parsed OK
    012345678901234567890123
    Sentence has 0 unlinked words:
    1: LEFT-WALL(0, 0) I.p(0, 1) saw.v-d(2, 5) the(6, 9) sea[&].s(10, 13) sand[&].v(13, 17) lakes.n(18, 23) RIGHT-WALL(24, 24)
    1: LEFT-WALL(0, 0) I.p(0, 1) saw.v-d(2, 5) the(6, 9) sea[&].s(10, 13) sand[&].v(13, 17) lakes.n(18, 23) RIGHT-WALL(24, 24)
    2: LEFT-WALL(0, 0) I.p(0, 1) saw.v-d(2, 5) the(6, 9) sea[&].n-u(10, 13) sand[&].v(13, 17) lakes.n(18, 23) RIGHT-WALL(24, 24)
    2: LEFT-WALL(0, 0) I.p(0, 1) saw.v-d(2, 5) the(6, 9) sea[&].n-u(10, 13) sand[&].v(13, 17) lakes.n(18, 23) RIGHT-WALL(24, 24)
    3: LEFT-WALL(0, 0) I.p(0, 1) saw.w(2, 5) the(6, 9) sea[&].n-u(10, 13) sand[&].n-u(13, 17) lakes.n(18, 23) RIGHT-WALL(24, 24)
    3: LEFT-WALL(0, 0) I.p(0, 1) saw.w(2, 5) the(6, 9) sea[&].n-u(10, 13) sand[&].n-u(13, 17) lakes.n(18, 23) RIGHT-WALL(24, 24)
    4: LEFT-WALL(0, 0) I.p(0, 1) saw.w(2, 5) the(6, 9) sea[&].s(10, 13) sand[&].n-u(13, 17) lakes.n(18, 23) RIGHT-WALL(24, 24)
    4: LEFT-WALL(0, 0) I.p(0, 1) saw.w(2, 5) the(6, 9) sea[&].s(10, 13) sand[&].n-u(13, 17) lakes.n(18, 23) RIGHT-WALL(24, 24)
    5: LEFT-WALL(0, 0) I.p(0, 1) saw.v-d(2, 5) the(6, 9) sea[&].s(10, 13) sand[&].n-u(13, 17) lakes.n(18, 23) RIGHT-WALL(24, 24)
    5: LEFT-WALL(0, 0) I.p(0, 1) saw.v-d(2, 5) the(6, 9) sea[&].s(10, 13) sand[&].n-u(13, 17) lakes.n(18, 23) RIGHT-WALL(24, 24)
    6: LEFT-WALL(0, 0) I.p(0, 1) saw.v-d(2, 5) the(6, 9) sea[&].n-u(10, 13) sand[&].n-u(13, 17) lakes.n(18, 23) RIGHT-WALL(24, 24)
    6: LEFT-WALL(0, 0) I.p(0, 1) saw.v-d(2, 5) the(6, 9) sea[&].n-u(10, 13) sand[&].n-u(13, 17) lakes.n(18, 23) RIGHT-WALL(24, 24)
    7: LEFT-WALL(0, 0) I.p(0, 1) saw.w(2, 5) the(6, 9) seasoned[~].v-d(10, 17) lakes.n(18, 23) RIGHT-WALL(24, 24)
    7: LEFT-WALL(0, 0) I.p(0, 1) saw.w(2, 5) the(6, 9) seasoned[~].v-d(10, 17) lakes.n(18, 23) RIGHT-WALL(24, 24)
    8: LEFT-WALL(0, 0) I.p(0, 1) saw.v-d(2, 5) the(6, 9) seasoned[~].v-d(10, 17) lakes.n(18, 23) RIGHT-WALL(24, 24)
    8: LEFT-WALL(0, 0) I.p(0, 1) saw.v-d(2, 5) the(6, 9) seasoned[~].v-d(10, 17) lakes.n(18, 23) RIGHT-WALL(24, 24)
    9: LEFT-WALL(0, 0) I.p(0, 1) saw.w(2, 5) the(6, 9) seas[&].n(10, 14) and[&].j-n(14, 17) lakes.n(18, 23) RIGHT-WALL(24, 24)
    9: LEFT-WALL(0, 0) I.p(0, 1) saw.w(2, 5) the(6, 9) seas[&].n(10, 14) and[&].j-n(14, 17) lakes.n(18, 23) RIGHT-WALL(24, 24)
    10: LEFT-WALL(0, 0) I.p(0, 1) saw.v-d(2, 5) the(6, 9) seas[&].n(10, 14) and[&].j-n(14, 17) lakes.n(18, 23) RIGHT-WALL(24, 24)
    10: LEFT-WALL(0, 0) I.p(0, 1) saw.v-d(2, 5) the(6, 9) seas[&].n(10, 14) and[&].j-n(14, 17) lakes.n(18, 23) RIGHT-WALL(24, 24)

(Try it in both Python2 and Python3.)

The code tries to account for length differences caused by lowercasing, which I think may be limited to Turkish İ/i. However, this is untested, because I don't have a "tr" sentence with a linkage to test it on.
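For anyone checking the positions by hand, here is a minimal Python 3 sketch of how character and byte positions relate, and of how lowercasing can change a string's length. It is only an illustration: the helper names (`char_to_byte`, `byte_to_char`, `length_change_on_lower`) are made up for this comment and are not part of the bindings, the Russian string is an arbitrary sample rather than parser output, and the lowercasing shown is Python's own `str.lower()`, which may differ from what the library does internally.

```python
def char_to_byte(text, char_pos):
    """UTF-8 byte offset of the character at index char_pos."""
    return len(text[:char_pos].encode("utf-8"))


def byte_to_char(text, byte_pos):
    """Character index for a UTF-8 byte offset (must fall on a character boundary)."""
    return len(text.encode("utf-8")[:byte_pos].decode("utf-8"))


def length_change_on_lower(text):
    """(character delta, byte delta) introduced by Python's str.lower()."""
    lowered = text.lower()
    return (len(lowered) - len(text),
            len(lowered.encode("utf-8")) - len(text.encode("utf-8")))


sentence = "I saw the seasand lakes"
# Spans taken from the run-on split shown in the output above.
print(sentence[10:13], sentence[13:17])  # sea sand

russian = "это тест"  # arbitrary sample; each Cyrillic letter is 2 bytes in UTF-8
print(char_to_byte(russian, 4))  # 7
print(byte_to_char(russian, 7))  # 4

# U+0130 (Turkish dotted capital I) grows by one character and one byte.
print(length_change_on_lower("\u0130"))  # (1, 1)
# U+212A (Kelvin sign) keeps its character count but shrinks by two bytes.
print(length_change_on_lower("\u212a"))  # (0, -2)
```

With Python's lowercasing, at least, a byte-length change is not limited to İ, so the byte positions may deserve a test of their own. Under Python 2, `str` is a byte string, so the slicing above would need `unicode` objects instead.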
API functions per issue opencog#420.
Thank you!
My gut impression is that information about subscripts and guessing is inappropriate for users to look at and make use of. I used to believe that the subscripts could be used in "meaningful" ways, but eventually came to realize that they are bogus. It's really the case that the disjuncts carry the important information, and the other stuff (subscripts, guess marks) is "algorithm noise", best ignored.
I'm vaguely thinking of releasing version 5.4.0, possibly soon. Anything you want to do before then?
Maybe guesses could have their own metric, different from the current ones, so that they are sorted last.
Maybe I can send a PR for a Python binding for my recently introduced lg_error_flush(), which I forgot to implement.
I have just sent PR #565 to complete lg_error_flush().
We can leave linkage_get_word() as is (subscripted words) for compatibility, since these functions can be used to retrieve the bare words.
For the linkage words we can add functions to separately retrieve the bare word, the subscript, and the guess information. If this is a good idea I can open an issue for that.
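To make the proposal concrete, here is a rough Python sketch of such a split. It works only on the printed form of the words seen in the output earlier in this thread (`bare[mark].subscript`). The function name `split_linkage_word`, the `WordParts` tuple, and the pattern itself are hypothetical, inferred from that output, and are not an existing API.

```python
import re
from collections import namedtuple

# Hypothetical result type: bare word, guess mark (or None), subscript (or None).
WordParts = namedtuple("WordParts", "bare guess subscript")

# Assumed printed form: bare word, optional "[mark]", optional ".subscript".
_WORD_RE = re.compile(
    r"^(?P<bare>.+?)"               # shortest possible bare word
    r"(?:\[(?P<guess>[^\]]+)\])?"   # optional guess mark such as [&] or [~]
    r"(?:\.(?P<sub>[^.\[\]]+))?$"   # optional subscript after the last dot
)


def split_linkage_word(word):
    """Split a printed linkage word into (bare, guess, subscript)."""
    match = _WORD_RE.match(word)
    if match is None:
        raise ValueError("cannot split linkage word: %r" % (word,))
    return WordParts(match.group("bare"), match.group("guess"), match.group("sub"))


for printed in ["sea[&].n-u", "seasoned[~].v-d", "saw.v-d", "the", "LEFT-WALL"]:
    print(split_linkage_word(printed))
```

Splitting the printed string like this is fragile: a word that itself contains a dot, such as "3.5", would be misread as a bare word plus a subscript. That is one argument for exposing the parts through dedicated functions instead.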