Implement linkage_get_word_*() (issue #420) #564

Merged (11 commits) on Jul 13, 2017

Member

@ampli ampli commented Jul 11, 2017

We can leave linkage_get_word() as is (returning subscripted words) for compatibility, since the new linkage_get_word_*() functions can be used to retrieve the bare words.

For the linkage words we can add functions to separately retrieve the bare word, the subscript, and the guess information. If this is a good idea I can open an issue for that.
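
A minimal sketch of what such a split could look like, assuming the display format `word[guess].subscript` seen in linkage printouts such as `sea[&].n-u`. The function `split_linkage_word` and its regular expression are hypothetical illustrations, not part of the link-grammar API, and words that themselves contain a dot (e.g. numbers like `3.5`) are not handled.

```python
import re

# Assumed display format: bare word, optional [guess mark], optional .subscript
# (hypothetical helper, not a link-grammar function).
_WORD_RE = re.compile(
    r"(?P<word>.+?)"                    # bare word, as short as possible
    r"(?:\[(?P<guess>[^\]]+)\])?"       # optional guess mark, e.g. [&] or [~]
    r"(?:\.(?P<subscript>[^.\[\]]+))?"  # optional subscript, e.g. .n-u
)

def split_linkage_word(token):
    """Return (bare_word, guess_mark, subscript); missing parts are None."""
    m = _WORD_RE.fullmatch(token)
    if m is None:
        raise ValueError("unrecognized token: %r" % token)
    return m.group("word"), m.group("guess"), m.group("subscript")

for token in ("saw.v-d", "sea[&].n-u", "seasoned[~].v-d", "the", "LEFT-WALL"):
    print(token, "->", split_linkage_word(token))
```

Returning the three parts separately lets a caller ignore the subscript and guess mark entirely, or inspect them only when needed.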

ampli added 11 commits July 11, 2017 11:42
True iff a gword is an original sentence word
How-to:
1. Type a sentence (in any language, including e.g. 'ru').
2. Inspect the positions (the first list gives UTF-8 character positions,
the second list gives byte positions).

For 'ru', try it also with DISPLAY_MORPHOLOGY=False.
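
To illustrate why the two lists differ for non-ASCII text, here is a small sketch that derives byte positions from character positions. It assumes Python 3 (where `str` is a sequence of code points) and uses whitespace splitting as a stand-in for the library's tokenizer; `char_to_byte_offset` and `word_spans` are hypothetical helpers, not the functions added by this PR.

```python
import re

def char_to_byte_offset(sentence, char_offset):
    """UTF-8 byte offset corresponding to a character offset (Python 3 str)."""
    return len(sentence[:char_offset].encode("utf-8"))

def word_spans(sentence):
    """Return (word, (char_start, char_end), (byte_start, byte_end)) per word."""
    spans = []
    for m in re.finditer(r"\S+", sentence):
        chars = (m.start(), m.end())
        byte_pos = tuple(char_to_byte_offset(sentence, c) for c in chars)
        spans.append((m.group(), chars, byte_pos))
    return spans

# Cyrillic letters take 2 bytes each in UTF-8, so the two lists diverge.
for span in word_spans("это тест"):
    print(span)
```

For a pure-ASCII sentence the character and byte positions coincide, which is why the two lists in an English example look identical.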

The code accounts for spell corrections. The following example has three
alternative corrections (two of them run-on corrections):

Enter sentences:
I saw the seasand lakes
Sentence parsed OK
012345678901234567890123
Sentence has 0 unlinked words:
1: LEFT-WALL(0, 0) I.p(0, 1) saw.v-d(2, 5) the(6, 9) sea[&].s(10, 13) sand[&].v(13, 17) lakes.n(18, 23) RIGHT-WALL(24, 24)
1: LEFT-WALL(0, 0) I.p(0, 1) saw.v-d(2, 5) the(6, 9) sea[&].s(10, 13) sand[&].v(13, 17) lakes.n(18, 23) RIGHT-WALL(24, 24)
2: LEFT-WALL(0, 0) I.p(0, 1) saw.v-d(2, 5) the(6, 9) sea[&].n-u(10, 13) sand[&].v(13, 17) lakes.n(18, 23) RIGHT-WALL(24, 24)
2: LEFT-WALL(0, 0) I.p(0, 1) saw.v-d(2, 5) the(6, 9) sea[&].n-u(10, 13) sand[&].v(13, 17) lakes.n(18, 23) RIGHT-WALL(24, 24)
3: LEFT-WALL(0, 0) I.p(0, 1) saw.w(2, 5) the(6, 9) sea[&].n-u(10, 13) sand[&].n-u(13, 17) lakes.n(18, 23) RIGHT-WALL(24, 24)
3: LEFT-WALL(0, 0) I.p(0, 1) saw.w(2, 5) the(6, 9) sea[&].n-u(10, 13) sand[&].n-u(13, 17) lakes.n(18, 23) RIGHT-WALL(24, 24)
4: LEFT-WALL(0, 0) I.p(0, 1) saw.w(2, 5) the(6, 9) sea[&].s(10, 13) sand[&].n-u(13, 17) lakes.n(18, 23) RIGHT-WALL(24, 24)
4: LEFT-WALL(0, 0) I.p(0, 1) saw.w(2, 5) the(6, 9) sea[&].s(10, 13) sand[&].n-u(13, 17) lakes.n(18, 23) RIGHT-WALL(24, 24)
5: LEFT-WALL(0, 0) I.p(0, 1) saw.v-d(2, 5) the(6, 9) sea[&].s(10, 13) sand[&].n-u(13, 17) lakes.n(18, 23) RIGHT-WALL(24, 24)
5: LEFT-WALL(0, 0) I.p(0, 1) saw.v-d(2, 5) the(6, 9) sea[&].s(10, 13) sand[&].n-u(13, 17) lakes.n(18, 23) RIGHT-WALL(24, 24)
6: LEFT-WALL(0, 0) I.p(0, 1) saw.v-d(2, 5) the(6, 9) sea[&].n-u(10, 13) sand[&].n-u(13, 17) lakes.n(18, 23) RIGHT-WALL(24, 24)
6: LEFT-WALL(0, 0) I.p(0, 1) saw.v-d(2, 5) the(6, 9) sea[&].n-u(10, 13) sand[&].n-u(13, 17) lakes.n(18, 23) RIGHT-WALL(24, 24)
7: LEFT-WALL(0, 0) I.p(0, 1) saw.w(2, 5) the(6, 9) seasoned[~].v-d(10, 17) lakes.n(18, 23) RIGHT-WALL(24, 24)
7: LEFT-WALL(0, 0) I.p(0, 1) saw.w(2, 5) the(6, 9) seasoned[~].v-d(10, 17) lakes.n(18, 23) RIGHT-WALL(24, 24)
8: LEFT-WALL(0, 0) I.p(0, 1) saw.v-d(2, 5) the(6, 9) seasoned[~].v-d(10, 17) lakes.n(18, 23) RIGHT-WALL(24, 24)
8: LEFT-WALL(0, 0) I.p(0, 1) saw.v-d(2, 5) the(6, 9) seasoned[~].v-d(10, 17) lakes.n(18, 23) RIGHT-WALL(24, 24)
9: LEFT-WALL(0, 0) I.p(0, 1) saw.w(2, 5) the(6, 9) seas[&].n(10, 14) and[&].j-n(14, 17) lakes.n(18, 23) RIGHT-WALL(24, 24)
9: LEFT-WALL(0, 0) I.p(0, 1) saw.w(2, 5) the(6, 9) seas[&].n(10, 14) and[&].j-n(14, 17) lakes.n(18, 23) RIGHT-WALL(24, 24)
10: LEFT-WALL(0, 0) I.p(0, 1) saw.v-d(2, 5) the(6, 9) seas[&].n(10, 14) and[&].j-n(14, 17) lakes.n(18, 23) RIGHT-WALL(24, 24)
10: LEFT-WALL(0, 0) I.p(0, 1) saw.v-d(2, 5) the(6, 9) seas[&].n(10, 14) and[&].j-n(14, 17) lakes.n(18, 23) RIGHT-WALL(24, 24)

(Try it in both Python2 and Python3.)
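
The positions in the output above can be used to recover the original sentence text behind each linkage word, including words that were spell-corrected. The sketch below parses one printed line and slices the sentence; the `token(start, end)` format is taken from the output shown here, and `parse_positions` and `original_text` are hypothetical helpers rather than library calls.

```python
import re

SENTENCE = "I saw the seasand lakes"
LINE = ("9: LEFT-WALL(0, 0) I.p(0, 1) saw.w(2, 5) the(6, 9) "
        "seas[&].n(10, 14) and[&].j-n(14, 17) lakes.n(18, 23) RIGHT-WALL(24, 24)")

# One printed entry looks like: token(start, end)
TOKEN_RE = re.compile(r"(\S+)\((\d+), (\d+)\)")

def parse_positions(line):
    """Return a list of (token, start, end) parsed from one printed line."""
    return [(tok, int(s), int(e)) for tok, s, e in TOKEN_RE.findall(line)]

def original_text(sentence, line):
    """Pair each linkage token with the sentence text at its position."""
    return [(tok, sentence[s:e]) for tok, s, e in parse_positions(line)]

for tok, text in original_text(SENTENCE, LINE):
    print("%-12s %r" % (tok, text))
```

The walls map to empty strings, and the two run-on words `seas[&].n` and `and[&].j-n` map to adjacent slices of the single input word `seasand`.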

The code tries to account for length differences caused by
lowercasing, which I think may be limited to the Turkish İ/i.
However, this is untested because I don't have a "tr" sentence with a
linkage.
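
A small sketch of the length difference in question: under Turkish casing rules, İ lowercases to i and I lowercases to ı, and in UTF-8 each pair differs by one byte. The mapping table `TR_LOWER` and the helper `tr_lower` are simplified, hypothetical illustrations and not the library's lowercasing code.

```python
# Simplified Turkish-specific lowercase mapping (assumption for illustration).
TR_LOWER = {"İ": "i", "I": "ı"}

def tr_lower(text):
    """Lowercase text using the Turkish dotted/dotless i rules."""
    return "".join(TR_LOWER.get(ch, ch.lower()) for ch in text)

def utf8_len(text):
    """Number of bytes in the UTF-8 encoding of text."""
    return len(text.encode("utf-8"))

for word in ("İstanbul", "IRMAK"):
    lowered = tr_lower(word)
    print(word, utf8_len(word), "->", lowered, utf8_len(lowered))
```

Because the byte length can shrink or grow after lowercasing, byte positions computed on the lowercased form would drift relative to the original sentence unless the difference is compensated for.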
@linas linas merged commit c9a8527 into opencog:master Jul 13, 2017
@linas
Member

linas commented Jul 13, 2017

thank you!

@linas
Member

linas commented Jul 13, 2017

My gut impression is that information about subscripts and guessing is inappropriate for users to look at and make use of. I used to believe that the subscripts could be used in "meaningful" ways, but eventually came to realize that they are bogus. it's really the case that the disjuncts carry the important information, and the other stuff (subscripts, guess marks) are "algorithm noise", best ignored.

@linas
Member

linas commented Jul 13, 2017

I'm vaguely thinking of having a version 5.4.0 possibly soon. Anything you want to do before then?

@ampli
Member Author

ampli commented Jul 14, 2017

> My gut impression is that information about subscripts and guessing is inappropriate for users to look at and make use of. I used to believe that the subscripts could be used in "meaningful" ways, but eventually came to realize that they are bogus. it's really the case that the disjuncts carry the important information, and the other stuff (subscripts, guess marks) are "algorithm noise", best ignored.

Maybe guesses can have a metric of their own, different from the current ones, that is sorted last.
This may solve some problems such as those in #404. It would also hint that the linkage is not actually of the exact original sentence.

@ampli
Member Author

ampli commented Jul 14, 2017

> I'm vaguely thinking of having a version 5.4.0 possibly soon. Anything you want to do before then?

Maybe I can send a PR for a Python binding for my recently introduced lg_error_flush(), which I forgot to implement.
(I have many other fixes and enhancements, but they need porting to the current code, as well as some more development and checks, so they will be for later versions.)

@ampli ampli deleted the get_word branch July 14, 2017 12:42
@ampli
Member Author

ampli commented Jul 14, 2017

I have just sent PR #565 for completing lg_error_flush().
After you merge it and before you issue version 5.4.0, I would like to send a general cleanup PR.

This was referenced Jul 26, 2017