Surprising multi-connector behavior #1351

linas · 2022-11-04T05:05:15Z

Multi-connectors are behaving in a fashion I do not fully understand. Consider this dict

LEFT-WALL: @ANY+;

<many>: {[@ANY-]-0.1 or [@ANY+]-0.11};

<UNKNOWN-WORD>:  <many>;
% <UNKNOWN-WORD>:  <many> and <many>;
% <UNKNOWN-WORD>:  <many> and <many> and <many>;
% <UNKNOWN-WORD>:  <many> and <many> and <many> and <many>;

and then parse the sentence `asdf asdf asdf asdf. I get this:

linkparser> asdf asdf asdf asdf
Found 14 linkages (14 had no P.P. violations)
	Linkage 1, cost vector = (UNUSED=0 DIS=-0.43 LEN=6)

    +---------------ANY--------------+
    |        +----------ANY----------+
    |        |       +------ANY------+
    |        |       |       +--ANY--+
    |        |       |       |       |
LEFT-WALL asdf[?] asdf[?] asdf[?] asdf[?]

and the others are all trees. Not surprising.

When I set <UNKNOWN-WORD>: <many> and <many>; then I get

Found 435 linkages (435 had no P.P. violations)
	Linkage 1, cost vector = (UNUSED=0 DIS=-0.84 LEN=3)

             +----------ANY----------+
             |       +------ANY------+
    +---ANY--+       +--ANY--+--ANY--+
    |        |       |       |       |
LEFT-WALL asdf[?] asdf[?] asdf[?] asdf[?]

Loops now appear. Not surprising. This is what I wanted. Making the costs negative means that the loopy parses are shown first. But oddly, this is not showing the maximum-possible number of loops!

With <UNKNOWN-WORD>: <many> and <many> and <many>; I get

Found 939 linkages (939 had no P.P. violations)
	Linkage 1, cost vector = (UNUSED=0 DIS=-1.15 LEN=3)

             +----------ANY----------+
             |       +------ANY------+
    +---ANY--+--ANY--+--ANY--+--ANY--+
    |        |       |       |       |
LEFT-WALL asdf[?] asdf[?] asdf[?] asdf[?]

Much better! But wait, there's more: with four<many>'s I get

Found 1020 linkages (1020 had no P.P. violations)
	Linkage 1, cost vector = (UNUSED=0 DIS=-1.25 LEN=6)

    +---------------ANY--------------+
    |        +----------ANY----------+
    |        +------ANY------+       |
    +---ANY--+--ANY--+--ANY--+--ANY--+
    |        |       |       |       |
LEFT-WALL asdf[?] asdf[?] asdf[?] asdf[?]

I understand the difference between one and two <many>'s, But the change for three and four is surprising. I do not understand that or why it's happening.

Seem that five-word sentences need five <many>'s and so on, to explore al possibilities.

The text was updated successfully, but these errors were encountered:

linas · 2022-11-04T05:17:29Z

It's even crazier. LEFT-WALL: {@ANY+} and {@ANY+} ; gives

Found 1836 linkages (1000 of 1000 random linkages had no P.P. violations)
	Linkage 1, cost vector = (UNUSED=0 DIS=-1.25 LEN=6)

    +---------------ANY--------------+
    |        +----------ANY----------+
    |        |       +------ANY------+
    +---ANY--+--ANY--+--ANY--+--ANY--+
    |        |       |       |       |
LEFT-WALL asdf[?] asdf[?] asdf[?] asdf[?]

while LEFT-WALL: {@ANY+} and {@ANY+} and {@ANY+}; gives

Found 2070 linkages (2070 had no P.P. violations)
	Linkage 1, cost vector = (UNUSED=0 DIS=-1.25 LEN=6)

    +---------------ANY--------------+
    |        +----------ANY----------+
    |        +------ANY------+       |
    +---ANY--+--ANY--+--ANY--+--ANY--+
    |        |       |       |       |
LEFT-WALL asdf[?] asdf[?] asdf[?] asdf[?]

linas · 2022-11-04T05:22:13Z

I forgot to mention that both the number of linkages changes, and the cost of the best linkage changes too. The best cost for four-word sentences is achieved with four wall connectors:

<pany>: {[@ANY+]-0.11};
LEFT-WALL: <pany> and <pany> and <pany> and <pany>;

which gives me

Found 2097 linkages (2097 had no P.P. violations)
	Linkage 1, cost vector = (UNUSED=0 DIS=-1.47 LEN=5)

    +---------------ANY--------------+
    +-------ANY------+------ANY------+
    +---ANY--+--ANY--+--ANY--+--ANY--+
    |        |       |       |       |
LEFT-WALL asdf[?] asdf[?] asdf[?] asdf[?]

An interesting phenomenon shows up, described in issue #1351

ampli · 2022-11-04T05:38:09Z

Found 1836 linkages (1000 of 1000 random linkages had no P.P. violations)

Random sampling here. Does it matter?

linas · 2022-11-04T05:47:20Z

Random sampling here. Does it matter?

Yes. The costs are crazy. So, with

<pany>: {[@ANY+]-0.11};
LEFT-WALL: <pany> and <pany>;

I get:

Found 1836 linkages (1000 of 1000 random linkages had no P.P. violations)
	Linkage 1, cost vector = (UNUSED=0 DIS=-1.37 LEN=6)

    +---------------ANY--------------+
    |        +----------ANY----------+
    |        +------ANY------+       |
    +---ANY--+--ANY--+--ANY--+--ANY--+
    |        |       |       |       |
LEFT-WALL asdf[?] asdf[?] asdf[?] asdf[?]

Notice the cost. Then

limit set to 3000
linkparser> asdf asdf asdf asdf
Found 1836 linkages (1836 had no P.P. violations)
	Linkage 1, cost vector = (UNUSED=0 DIS=-1.47 LEN=6)

    +---------------ANY--------------+
    |        +----------ANY----------+
    |        +------ANY------+       |
    +---ANY--+--ANY--+--ANY--+--ANY--+
    |        |       |       |       |
LEFT-WALL asdf[?] asdf[?] asdf[?] asdf[?]

Note the cost. Counting by hand, I see 7 + links and seven - links, so I expect a cost of $-0.11 \times 7 + -0.1 \times 7 = -1.47$ so where is the $-1.37$ coming from?

linkparser> !lim=60
limit set to 60
linkparser> !dis
Display of disjuncts used turned on.
linkparser> asdf asdf asdf asdf
Found 1836 linkages (60 of 60 random linkages had no P.P. violations)
	Linkage 1, cost vector = (UNUSED=0 DIS=-1.27 LEN=6)

    +---------------ANY--------------+
    |        +----------ANY----------+
    |        +------ANY------+       |
    +---ANY--+--ANY--+--ANY--+--ANY--+
    |        |       |       |       |
LEFT-WALL asdf[?] asdf[?] asdf[?] asdf[?]

            LEFT-WALL    -0.220  @ANY+ @ANY+
              asdf[?]    -0.430  @ANY- @ANY+ @ANY+ @ANY+
              asdf[?]    -0.210  @ANY- @ANY+
              asdf[?]    -0.210  @ANY- @ANY+
              asdf[?]    -0.200  @ANY- @ANY-

so that totals to $-1.27$ because ... the multi-connectors. With lim=6000 I get

            LEFT-WALL    -0.220  @ANY+ @ANY+
              asdf[?]    -0.430  @ANY- @ANY+ @ANY+ @ANY+
              asdf[?]    -0.210  @ANY- @ANY+
              asdf[?]    -0.310  @ANY- @ANY- @ANY+
              asdf[?]    -0.300  @ANY- @ANY- @ANY-

So the multi-connectors are now working "individually".

linas · 2022-11-04T05:53:06Z

OK, so I understand the costs being computed differently. The multi-connector isn't incrementing the cost. Fine. But look at the last word ddd

linkparser> aaa bbb ccc ddd
Found 1836 linkages (1836 had no P.P. violations)
	Linkage 1, cost vector = (UNUSED=0 DIS=-1.47 LEN=6)

    +-------------ANY-------------+
    |        +---------ANY--------+
    |        +-----ANY-----+      |
    +---ANY--+--ANY-+--ANY-+--ANY-+
    |        |      |      |      |
LEFT-WALL aaa[?] bbb[?] ccc[?] ddd[?]

            LEFT-WALL    -0.220  @ANY+ @ANY+
               aaa[?]    -0.430  @ANY- @ANY+ @ANY+ @ANY+
               bbb[?]    -0.210  @ANY- @ANY+
               ccc[?]    -0.310  @ANY- @ANY- @ANY+
               ddd[?]    -0.300  @ANY- @ANY- @ANY-

vs.

linkparser> !lim=60
limit set to 60
linkparser> aaa bbb ccc ddd
Found 1836 linkages (60 of 60 random linkages had no P.P. violations)
	Linkage 1, cost vector = (UNUSED=0 DIS=-1.27 LEN=6)

    +-------------ANY-------------+
    |        +---------ANY--------+
    |        +-----ANY-----+      |
    +---ANY--+--ANY-+--ANY-+--ANY-+
    |        |      |      |      |
LEFT-WALL aaa[?] bbb[?] ccc[?] ddd[?]

            LEFT-WALL    -0.220  @ANY+ @ANY+
               aaa[?]    -0.430  @ANY- @ANY+ @ANY+ @ANY+
               bbb[?]    -0.210  @ANY- @ANY+
               ccc[?]    -0.210  @ANY- @ANY+
               ddd[?]    -0.200  @ANY- @ANY-

Notice how this time, the disjunct on ddd only has two multi-connectors Why two? why not one? There are clearly three - links being used ...

linas · 2022-11-04T06:11:47Z

Ah! OK, I think I understand what is going on. The two- case should be enough to enumerate all possible graphs. There seem to be more, because they are duplicated with different disjunct usage. The multi connector cost isn't accounted for when multi is used. So I'm thinking everything works correctly; its just confusing.

I guess the way cost is computed is a bug -- is there an ovious way to "fix" the cost for multi-connectors?

Good night, I'll sleep on this.

ampli · 2022-11-04T07:19:32Z

See also lg_compute_disjunct_strings(). It is my rewrite of the original one. It handles multi-connectors in a special way.
This code should be rechecked. I think a debug flag can be added to make it optionally use the chosen disjuncts so we see if the results are the same.

ampli · 2022-11-04T10:55:00Z

See also lg_compute_disjunct_strings()

I haven't tried yet to add an option to directly use the chosen disjuncts, but the concept of the current code seems to me solid for now.

With <UNKNOWN-WORD>: <many> and <many> and <many>; ...

linkparser> !!<UNKNOWN-WORD>
Token "<UNKNOWN-WORD>" matches:
    <UNKNOWN-WORD>                    9 disjuncts


Token "<UNKNOWN-WORD>" expressions:
    <UNKNOWN-WORD>             ({[@ANY-]-0.100 or [@ANY+]-0.110}) & ({[@ANY-]-0.100 or [@ANY+]-0.110})

linkparser> !!<UNKNOWN-WORD>//
Token "<UNKNOWN-WORD>" matches:
    <UNKNOWN-WORD>                    9 disjuncts


Token "<UNKNOWN-WORD>" disjuncts:
    <UNKNOWN-WORD>                    5/8 disjuncts

  <UNKNOWN-WORD>: [0]-0.110=  <> @ANY+
  <UNKNOWN-WORD>: [1]-0.100= @ANY- <> 
  <UNKNOWN-WORD>: [2]-0.210= @ANY- <> @ANY+
  <UNKNOWN-WORD>: [3]-0.200= @ANY- @ANY- <> 
  <UNKNOWN-WORD>: [4]-0.220=  <> @ANY+ @ANY+

linkparser>

Disjunct [0] and disjunct [4] can be considered duplicates (also [1] and [3]) since disjunct [0] can create similar linkages as disjunct [4]. Despite having diagrams that look the same, the total cost depends on the length of the multi-connector sequence which got used (if the multi-connector costs are not 0).

OK, so I understand the costs being computed differently. The multi-connector isn't incrementing the cost. Fine. But look at the last word ddd

The two- case should be enough to enumerate all possible graphs. There seem to be more, because they are duplicated with different disjunct usage.

As you said, this seems fine - on !lim=60 the lower sampled linkage happens to contain:
ddd[?] -0.200 @ANY- @ANY-
The linkage with ddd[?] -0.300 @ANY- @ANY- @ANY- just has not been sampled.

I guess the way cost is computed is a bug -- is there an ovious way to "fix" the cost for multi-connectors?

The current concept of disjunct cost is static (independent of the number of times a multi-connector is used).
We can recompute it after the linkage, but I'm not sure whether this is a good idea. Does multi-match makes a sentence less likely (for cost>0) or more likely (for cost<0)?

Instead, maybe we should discard such "duplicate" disjuncts. But the implementation looks very problematic because it is not clear how to handle the various cases (we can discuss that if needed).

linas · 2022-11-04T17:17:15Z

OK, so after sleeping, I've convinced myself everything is fine, except for the cost computation. The fully correct behavior is given by the dict

LEFT-WALL:  {[@ANY+]-0.11};
<UNKNOWN-WORD>: [@ANY-]-0.1 or [@ANY+]-0.11 or [@ANY- & @ANY+]-0.111;

and the above does generate an exhaustive collection of linkages. However, due to the cost computations, the ones with the maximal number of links are NOT presented first! And that is what confused me, and lead to this long report.

Does multi-match makes a sentence less likely (for cost>0) or more likely (for cost<0)?

Yes, that was the whole idea! In one case, the use is minimized; in the other, it is maximized.

linas · 2022-11-04T17:32:42Z

To fix cost computations, I think it can be done simply & easily in score.c by comparing the number of times a multi-connector appears in the disjuncts, vs the number of times that a link with the same name appears in the link_array.

linas · 2022-11-04T19:54:44Z

I implemented a cost-adjust function, that adjusts the total cost by the number of times that a multi-connector is used. It is in my branch cost-adjust. What is missing is the actual cost of a multi-connector. The natural place to put it would be in condesc_t but it looks like condesc_t is carefully packed to minimize space.

@ampli take a look, tell me what you think. Getting this to work is low priority -- the goal was to maximize the creation of links, when they have negative cost. It seems that maximizing the creation of links is useful, to maximally constrain the grammar. I was envisioning that this would be used only by the Atomese dict, only for UNKNOWN-WORD, to create maximally-linked unknown words. But this is all very hypothetical. It seems like a good idea, it seems like the right thing to do, but the practical impact of it, at this time, is nil.

The branch: https://github.com/linas/link-grammar/tree/cost-adjust

linas · 2022-11-04T19:56:38Z

Here's the diff:

https://github.com/opencog/link-grammar/compare/master...linas:cost-adjust?expand=1

Now that the issue is clearly understood, we can do it correctly. See #1351 for details.

linas added the question label Nov 4, 2022

linas added a commit that referenced this issue Nov 4, 2022

Add automatic support for <UNKNWON-WORD>

8875f2d

An interesting phenomenon shows up, described in issue #1351

linas added a commit that referenced this issue Nov 4, 2022

Revert the cartesian-any idea.

fa0e613

Now that the issue is clearly understood, we can do it correctly. See #1351 for details.

ampli mentioned this issue May 17, 2024

Improve the connector hashing and extend condesc_t #1528

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Surprising multi-connector behavior #1351

Surprising multi-connector behavior #1351

linas commented Nov 4, 2022 •

edited

Loading

linas commented Nov 4, 2022

linas commented Nov 4, 2022

ampli commented Nov 4, 2022

linas commented Nov 4, 2022

linas commented Nov 4, 2022

linas commented Nov 4, 2022

ampli commented Nov 4, 2022

ampli commented Nov 4, 2022

linas commented Nov 4, 2022

linas commented Nov 4, 2022

linas commented Nov 4, 2022 •

edited

Loading

linas commented Nov 4, 2022

Surprising multi-connector behavior #1351

Surprising multi-connector behavior #1351

Comments

linas commented Nov 4, 2022 • edited Loading

linas commented Nov 4, 2022

linas commented Nov 4, 2022

ampli commented Nov 4, 2022

linas commented Nov 4, 2022

linas commented Nov 4, 2022

linas commented Nov 4, 2022

ampli commented Nov 4, 2022

ampli commented Nov 4, 2022

linas commented Nov 4, 2022

linas commented Nov 4, 2022

linas commented Nov 4, 2022 • edited Loading

linas commented Nov 4, 2022

linas commented Nov 4, 2022 •

edited

Loading

linas commented Nov 4, 2022 •

edited

Loading