PLN revisionFormula issue with count #587

cosmoharrigan · 2014-03-11T18:19:04Z

While running PLN, I noticed that the count attribute of some truth values was growing very large.

PLN is using a revisionFormula:
https://github.com/opencog/opencog/blob/master/opencog/python/pln/rules/formulas.py#L483

which is called in chainers.py here:
https://github.com/opencog/opencog/blob/master/opencog/python/pln/chainers.py#L726

It sets the new count to be the old count plus the new count:
https://github.com/opencog/opencog/blob/master/opencog/python/pln/rules/formulas.py#L479
n = x.count+y.count

I think that will cause the count to grow very large.

I suspect this may not be correct.

In the PLN book on page 116, it mentions:

We may also heuristically form a count rule
such as n = n1 + n2 – c min(n1, n2), where the parameter c indicates an 
assumption about the level of interdependency between the bases of 
information corresponding to the two premises. The value c=1 denotes 
the assumption that the two sets of evidence are completely redundant; 
the value c=0 denotes the assumption that the two sets of evidence are 
totally distinct. Intermediate values denote intermediate assumptions.

in this simple case, my guess is that the sets of evidence are closer to being redundant, which would mean the c value was close to 1, causing the new truth value count to approach the max of either n1 or n2, rather than the sum of both of them.

Furthermore, isn't truth value revision currently implemented as the responsibility of the C++ TruthValue::merge class? Shouldn't it be encapsulated once, in one place, whether that's in PLN or in the TruthValue class, rather than doing it two different ways?

@jadeoneill what are your thoughts?

The text was updated successfully, but these errors were encountered:

cosmoharrigan · 2014-03-11T18:20:39Z

cc: @moikle this might interest you as well

linas · 2014-03-11T22:33:35Z

I propose renaming SimpleTruthValue to PLNTruthValue. This opens the door to putting the various math formulas into PLNTruthValue, which would be faster in C++ than in python, and might make the python code look cleaner?

edajade · 2014-03-12T09:33:13Z

No, it wouldn't be.

On Wed, Mar 12, 2014 at 9:33 AM, Linas Vepstas notifications@github.comwrote:

I propose renaming SimpleTruthValue to PLNTruthValue. This opens the door
to putting the various math formulas into PLNTruthValue, which would be
faster in C++ than in python, and might make the python code look cleaner?

Reply to this email directly or view it on GitHubhttps://github.com//issues/587#issuecomment-37354823
.

bgoertzel · 2014-03-12T09:48:03Z

Hmmm...

Regardless of whether SimpleTruthValue is renamed to SimplePLNTruthValue
(it shouldn't just be PLNTruthValue, because PLN also will use other kinds
of truth values, like IndefiniteTruthValue...), putting the PLN truth value
formulas in the TruthValue class seems not a good idea...

Truth value formula execution is not remotely our bottleneck; the
bottleneck of PLN is of course Atomspace interaction... and having both the
PLN rules and formulas in python seems more convenient than having the
rules in python and the formulas in C++ in another part of the code...

ben

On Wed, Mar 12, 2014 at 5:33 PM, jadeoneill notifications@github.comwrote:

No, it wouldn't be.

On Wed, Mar 12, 2014 at 9:33 AM, Linas Vepstas <notifications@github.com

wrote:

I propose renaming SimpleTruthValue to PLNTruthValue. This opens the door
to putting the various math formulas into PLNTruthValue, which would be
faster in C++ than in python, and might make the python code look
cleaner?

Reply to this email directly or view it on GitHub<
https://github.com/opencog/opencog/issues/587#issuecomment-37354823>

.

Reply to this email directly or view it on GitHubhttps://github.com//issues/587#issuecomment-37389540
.

Ben Goertzel, PhD
http://goertzel.org

"In an insane world, the sane man must appear to be insane". -- Capt. James
T. Kirk

"Emancipate yourself from mental slavery / None but ourselves can free our
minds" -- Robert Nesta Marley

cosmoharrigan · 2014-03-12T17:42:52Z

@jadeoneill can you comment on my original post?

edajade · 2014-03-12T20:21:28Z

The revision formula should definitely be in python, like the rest of the
formulas and pln code. I think using the fancier formula from the PLN book
is a good idea (and it would just be a 1 line change + another arbitrary
parameter which is OK)

On Thu, Mar 13, 2014 at 4:42 AM, Cosmo Harrigan notifications@github.comwrote:

@jadeoneill https://github.com/jadeoneill can you comment on my
original post?

Reply to this email directly or view it on GitHubhttps://github.com//issues/587#issuecomment-37439770
.

cosmoharrigan · 2014-03-12T21:23:00Z

How shall the redundancy assumption parameter c be calculated?

edajade · 2014-03-12T21:32:36Z

Just make it up to start with. Ben might come up with some formula (i don't
think they have one yet)

On Thu, Mar 13, 2014 at 8:23 AM, Cosmo Harrigan notifications@github.comwrote:

How shall the redundancy assumption parameter c be calculated?

Reply to this email directly or view it on GitHubhttps://github.com//issues/587#issuecomment-37467281
.

linas · 2014-03-12T22:51:42Z

Several issues that are being overlooked:

Not all of the formulas can be in the python code. According to the atomspace design, some of the formulas MUST be in the C++ code. This is utterly unavoidable. Cosmo was just asking what to do about this.

The reason this is unavoidable is the following: When the atomspace contains an atom, and some user attempts to place an identical atom into the atomspace, but having a different truth value, one MUST decide what to do. One can do anything: discard the old truth value. Discard the new truth value. Do some fancy mathematical merge of the two. Doesn't mater (to me) what is done, however, something must be done.

If there are no objections, I will rename SimpleTruthValue to SimplePLNTruthValue 'real soon now'. The point of this renaming is to make it clear to current and future hackers that this truth value has a very specific semantics. Its not just 'some la-de-dah truth value'
Performance: Based on direct measurement, it should be clear to absolutely everyone involved that the fewer round-trips between python and C++, the faster everything will go. You will get direct and immediate performance improvements by minimizing the total number of calls into C++ code. You can minimize the total number of calls by batching up the work. Otherwise, you'll just waste all of your CPU time in the glue code between python and C++.
Don't just brush off my suggestions on performance. Based on direct experience, I can confidently say that, in general, people's intuitions about the location of bottlenecks are usually incorrect -- doesn't matter if you're the brilliant system architect or a lowly peon rank-n-file coder. That's just a fact of life, and it holds for all software projects, and OpenCog and PLN are no exception to this rule.

bgoertzel · 2014-03-13T00:47:35Z

The revision rule seems a special case to me...

I think most PLN formulas should be in python, as PLN is in python and this
is thus by far the easiest thing for PLN developers... and calculating PLN
formulas, in the course of doing python-based PLN inference, is really not
the bottleneck...

However, as Linas points out, the revision formula is not just invoked
during PLN reasoning, it should be invoked when someone introduces a new
Atom into the Atomspace and it's identical to another Atom. In this case
there is a question regarding efficiency. The question is whether the
round-trip from C++ to python for doing revision is the time-bottleneck in
this case. I don't know the answer to this question. Someone could do
some measurements and find out. If it is the bottleneck, then there is
an argument for putting the revision rule in C++. I hope not, but: One
might then end up with copies of the revision rule in C++ and in python,
which is awkward but might be the right compromise between developer
convenience and system efficiency.

About renaming. I don't mind if you rename SimpleTruthValue to
SimplePLNTruthValue. Someone else may then later bifurcate it into
SimpleProbabilisticPLNTruthValue and SimpleFuzzyPLNTruthValue ;-p ...

ben

On Thu, Mar 13, 2014 at 6:51 AM, Linas Vepstas notifications@github.comwrote:

Several issues that are being overlooked:

Not all of the formulas can be in the python code. According to the
atomspace design, some of the formulas MUST be in the C++ code. This is
utterly unavoidable. Cosmo was just asking what to do about this.

The reason this is unavoidable is the following: When the atomspace
contains an atom, and some user attempts to place an identical atom into
the atomspace, but having a different truth value, one MUST decide what to
do. One can do anything: discard the old truth value. Discard the new truth
value. Do some fancy mathematical merge of the two. Doesn't mater (to me)
what is done, however, something must be done.

If there are no objections, I will rename SimpleTruthValue to
SimplePLNTruthValue 'real soon now'. The point of this renaming is to make
it clear to current and future hackers that this truth value has a very
specific semantics. Its not just 'some la-de-dah truth value'

Performance: Based on direct measurement, it should be clear to
absolutely everyone involved that the fewer round-trips between python and
C++, the faster everything will go. You will get direct and immediate
performance improvements by minimizing the total number of calls into C++
code. You can minimize the total number of calls by batching up the work.
Otherwise, you'll just waste all of your CPU time in the glue code between
python and C++.

Don't just brush off my suggestions on performance. Based on direct
experience, I can confidently say that, in general, people's intuitions
about the location of bottlenecks are usually incorrect -- doesn't matter
if you're the brilliant system architect or a lowly peon rank-n-file coder.
That's just a fact of life, and it holds for all software projects, and
OpenCog and PLN are no exception to this rule.

Reply to this email directly or view it on GitHubhttps://github.com//issues/587#issuecomment-37479180
.

Ben Goertzel, PhD
http://goertzel.org

"In an insane world, the sane man must appear to be insane". -- Capt. James
T. Kirk

"Emancipate yourself from mental slavery / None but ourselves can free our
minds" -- Robert Nesta Marley

cosmoharrigan · 2014-03-13T01:02:19Z

There is this note that Linas had added on some wiki pages, such as this one:

CAUTION: in PLN, it is assumed that the truth value of EvaluationLink
and MemberLink is a fuzzy-set-membership value, and NOT a probabilistic 
truth value. Neither the C++ code, nor the python or scheme wrappers 
distinguish these two types of truth values at this time. Combining these 
two different kinds of truth values can and will result in crazy results.

In the PLN book, it talks about truth values representing fuzzy set membership and MemberLinks, and it also talks about truth values representing probabilities. Can you explain further what one needs to watch out for in the context of PLN?

linas · 2014-03-13T01:50:25Z

Ben,

Help me pick some good names. It's clear that "SimpleTruthValues" are not simple at all, they need a better descriptive name. I think FuzzyTruthValue is probably sufficient, as SimpleFuzzyPLNTruthValue is getting silly. The goal here is to allow the programmer to know what they're dealing with, and to throw an assert when the programmer accidentally uses of combines inappropriate truth values. Lets keep the names simple, short, direct, and meaningful.

bgoertzel · 2014-03-13T02:07:31Z

A standard FuzzyTruthValue would have only one number, the fuzzy set
membership degree

A FuzzyPLNTruthValue is different from that, because it has a
confidence/count as well...

The complicated names are meaningful, silly as they may seem..

I.e. PLN really does have notions of: a simple fuzzy truth value [with a
(s,c) pair] versus a more complex fuzzy truth value [with e.g. a fuzzy set
of probabilistic truth values]

So I think FuzzyPLNTruthValue is an OK name. Simply calling the fuzzy TV
with an (s,c) pair a FuzzyTruthValue is not nice to anyone who wants to use
OpenCog with fuzzy TVs containing a single number only (the membership
degre)

In summary, I think it would be OK to replace SimpleTruthValue with
PLNSimpleTruthValue and PLNFuzzyTruthValue

The complexity swept under the rug by this terminology is that
PLNSimpleTruthValue actually means SimpleProbabilisticPLNTruthValue. So
e.g. later on we will also have PLNIndefiniteTruthValue, which is a more
complex kind of probabilistic PLN truth value, and
PLNDistributionalTruthValue. But it seems OK to me to use
PLNSimpleTruthValue as a shorthand for SimpleProbabilisticPLNTruthValue.

Note that we could also have

ProbabilityTruthValue // just a single probability
BinaryTruthValue
TernaryTruthValue // for discrete TV logics
FuzzyTruthValue // a single fuzzy membership degree

None of these have users at the moment, but the are logically sensible...

For the Zen Buddhists, we can also use

UnaryTruthValue // everything is true!
0aryTruthValue // nothing exists

;-)

SUMMARY: my recommendation is for a list of types like

BinaryTruthValue
TernaryTruthValue
FuzzyTruthValue
ProbabilityTruthValue

PLNSimpleTruthValue ***
PLNFuzzyTruthValue ***
PLNIndefiniteTruthValue
...

and then others as needed...

It happens that the two marked *** are the ones we're using right now for
PLN . But earlier prototype PLN code used PLNIndefiniteTruthValue and
there was some support for this in the C++ PLN code...

-- Ben

On Thu, Mar 13, 2014 at 9:50 AM, Linas Vepstas notifications@github.comwrote:

Ben,

Help me pick some good names. It's clear that "SimpleTruthValues" are not
simple at all, they need a better descriptive name. I think FuzzyTruthValue
is probably sufficient, as SimpleFuzzyPLNTruthValue is getting silly. The
goal here is to allow the programmer to know what they're dealing with, and
to throw an assert when the programmer accidentally uses of combines
inappropriate truth values. Lets keep the names simple, short, direct, and
meaningful.

Reply to this email directly or view it on GitHubhttps://github.com//issues/587#issuecomment-37491767
.

Ben Goertzel, PhD
http://goertzel.org

"In an insane world, the sane man must appear to be insane". -- Capt. James
T. Kirk

"Emancipate yourself from mental slavery / None but ourselves can free our
minds" -- Robert Nesta Marley

cosmoharrigan · 2014-03-13T02:20:48Z

PLNSimpleTruthValue ***
PLNFuzzyTruthValue ***
It happens that the two marked *** are the ones we're using right now for
PLN

How is the current PLN implementation distinguishing between those two?

bgoertzel · 2014-03-13T02:32:19Z

The distinction is made inside the derivation of the truth value formulas
themselves. It is not made in the operation of the rules. The TV
formulas depend on the link types of the premises, and they know that
certain link types are fuzzy and certain are probabilistic

So in the current version the distinction btw PLNSimpleTV and PLNFuzzyTV is
not needed for correct operation of the code. However, IMO it would be
useful to remind humans not to confuse the two when developing new PLN
formulas or other related PLN code...

ben

On Thu, Mar 13, 2014 at 10:20 AM, Cosmo Harrigan
notifications@github.comwrote:

PLNSimpleTruthValue ***
PLNFuzzyTruthValue ***

It happens that the two marked *** are the ones we're using right now for
PLN

How is the current PLN implementation distinguishing between those two?

Reply to this email directly or view it on GitHubhttps://github.com//issues/587#issuecomment-37493142
.

Ben Goertzel, PhD
http://goertzel.org

"In an insane world, the sane man must appear to be insane". -- Capt. James
T. Kirk

"Emancipate yourself from mental slavery / None but ourselves can free our
minds" -- Robert Nesta Marley

cosmoharrigan · 2014-03-13T02:34:10Z

Ok, I see. Sorry for many questions, but: where do you suggest would be a convenient place for me to review the enumeration of which link types should belong to which category?

bgoertzel · 2014-03-13T02:39:20Z

Hmmm.... That's not systematically written down anywhere though it's
implicit in the PLN book's math...

MemberLink and EvaluationLink are the big PLNFuzzyTV guys... I can't think
of any others at this moment but I'm in a hurry and might be missing
something....

No need to apologize, the questions are good, useful ones ;)

On Thu, Mar 13, 2014 at 10:34 AM, Cosmo Harrigan
notifications@github.comwrote:

Ok, I see. Sorry for many questions, but: where do you suggest would be a
convenient place for me to review the enumeration of which link types
should belong to which category?

Reply to this email directly or view it on GitHubhttps://github.com//issues/587#issuecomment-37493786
.

Ben Goertzel, PhD
http://goertzel.org

"In an insane world, the sane man must appear to be insane". -- Capt. James
T. Kirk

"Emancipate yourself from mental slavery / None but ourselves can free our
minds" -- Robert Nesta Marley

cosmoharrigan · 2014-03-13T02:40:43Z

Ok, thanks, I am stepping out now as well, but I think this enumeration is something to revisit and make explicit.

linas · 2014-03-13T02:43:22Z

Yes, that's a good question: point is: I'm using EvaluationLink's to store counting information about discovered patterns. So e.g.

EvaluationLink (frequency-counts-and-entropy-truth-value)
RelationshipNode "SomeRelation"
ListLink
BlahAtom
BlahBlahAtom

which tells me about how often the Blahatoms have been observed in some relationship type. Ideally, no one should run this through PLN, but good programming hygiene says that such accidents are best avoided by throwing an error when they occur. Thus, having an implicit truth type depending on the atom type is ... maybe acceptable for now, but somewhat dangerous ...

Improve the revision formula (#587) and test the boolean transformation rules

cosmoharrigan · 2014-03-25T00:28:57Z

This issue still persists: instead of

n = x.count+y.count

it has now been replaced by

n = x.count + y.count – REVISION_REDUNDANCY_FACTOR * min(x.count, y.count)

with

REVISION_REDUNDANCY_FACTOR = 0.5

where 0.5 is an arbitrary number.

The result is that count rapidly grows very large until it approaches its maximum value of 4473923584.

In practice, that rapid increase in count can cause confidence to behave "more like" a boolean rather than a range between 0 and 1, because any non-zero value will rapidly approach 1, which would seem to eliminate most of its usefulness.

How should this revision formula be fixed?

bgoertzel · 2014-03-25T00:48:31Z

I'll look at this and other PLN issues in 6-7 hours when I'm at the
computer next, thx ;)

On Tue, Mar 25, 2014 at 8:29 AM, Cosmo Harrigan notifications@github.comwrote:

This issue still persists: instead of

n = x.count+y.count

it has now been replaced by

n = x.count + y.count - REVISION_REDUNDANCY_FACTOR * min(x.count, y.count)

with

REVISION_REDUNDANCY_FACTOR = 0.5

where 0.5 is an arbitrary number.

The result is that count rapidly grows very large until it approaches
its maximum value of 4473923584.

In practice, that rapid increase in count can cause confidence to behave
"more like" a boolean rather than a range between 0 and 1, because any
non-zero value will rapidly approach 1, which would seem to eliminate most
of its usefulness.

How should this revision formula be fixed?

Reply to this email directly or view it on GitHubhttps://github.com//issues/587#issuecomment-38518597
.

Ben Goertzel, PhD
http://goertzel.org

"In an insane world, the sane man must appear to be insane". -- Capt. James
T. Kirk

"Emancipate yourself from mental slavery / None but ourselves can free our
minds" -- Robert Nesta Marley

bgoertzel · 2014-03-25T09:59:24Z

Hmmm...

I'd need to understand the context better, to understand what are the Atoms
that are having their truth values revised over and over so many times...
Maybe there is some other problem causing an excessive amount of revision,
rather than the revision problem being the main issue in what you're doing...
(?)

If we want to consider a model in which there we assume N pieces of
evidence altogether relative to a given statement (a maximum count way less
than infinity), then we could use a heuristic formula like

n = x.count + r( max(x.count, y.count) ) * [ y.count - min(x.count, y.count) ]

where r is a function defined by, say,

r( N) = 0

r(1) = 1 or near 1

and r is monotone

So e.g. we could have

r(x) = ( 1-x/N )^+

I guess a different, nonlinear form for r(x) could be derived from some
probability theory assumptions....

The idea is that as

max(x.count, y.count)

gets closer and closer to N, the percentage of redundant evidence
among x and y is bound to increase...

-- Ben

On Tue, Mar 25, 2014 at 8:29 AM, Cosmo Harrigan notifications@github.comwrote:

This issue still persists: instead of

n = x.count+y.count

it has now been replaced by

n = x.count + y.count - REVISION_REDUNDANCY_FACTOR * min(x.count, y.count)

with

REVISION_REDUNDANCY_FACTOR = 0.5

where 0.5 is an arbitrary number.

The result is that count rapidly grows very large until it approaches
its maximum value of 4473923584.

In practice, that rapid increase in count can cause confidence to behave
"more like" a boolean rather than a range between 0 and 1, because any
non-zero value will rapidly approach 1, which would seem to eliminate most
of its usefulness.

How should this revision formula be fixed?

Reply to this email directly or view it on GitHubhttps://github.com//issues/587#issuecomment-38518597
.

Ben Goertzel, PhD
http://goertzel.org

"In an insane world, the sane man must appear to be insane". -- Capt. James
T. Kirk

"Emancipate yourself from mental slavery / None but ourselves can free our
minds" -- Robert Nesta Marley

linas · 2015-08-31T17:50:50Z

Closing; this appears to be based on the now-obsolete python-PLN codebase

cosmoharrigan added the pln label Mar 11, 2014

cosmoharrigan added the bug label Mar 11, 2014

edajade mentioned this issue Mar 18, 2014

Improve the revision formula (#587) and test the boolean transformation rules #610

Merged

edajade closed this as completed in fe77df0 Mar 19, 2014

edajade added a commit that referenced this issue Mar 19, 2014

Merge pull request #610 from jadeoneill/master

476b451

Improve the revision formula (#587) and test the boolean transformation rules

cosmoharrigan reopened this Mar 25, 2014

cosmoharrigan added this to the PLN/attention allocation demo milestone Mar 25, 2014

cosmoharrigan removed this from the PLN/attention allocation demo milestone May 28, 2014

amebel added this to the Fully Port to URE based PLN milestone Apr 18, 2015

linas closed this as completed Aug 31, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PLN revisionFormula issue with count #587

PLN revisionFormula issue with count #587

cosmoharrigan commented Mar 11, 2014

cosmoharrigan commented Mar 11, 2014

linas commented Mar 11, 2014

edajade commented Mar 12, 2014

bgoertzel commented Mar 12, 2014

cosmoharrigan commented Mar 12, 2014

edajade commented Mar 12, 2014

cosmoharrigan commented Mar 12, 2014

edajade commented Mar 12, 2014

linas commented Mar 12, 2014

bgoertzel commented Mar 13, 2014

cosmoharrigan commented Mar 13, 2014

linas commented Mar 13, 2014

bgoertzel commented Mar 13, 2014

cosmoharrigan commented Mar 13, 2014

bgoertzel commented Mar 13, 2014

cosmoharrigan commented Mar 13, 2014

bgoertzel commented Mar 13, 2014

cosmoharrigan commented Mar 13, 2014

linas commented Mar 13, 2014

cosmoharrigan commented Mar 25, 2014

bgoertzel commented Mar 25, 2014

bgoertzel commented Mar 25, 2014

linas commented Aug 31, 2015

PLN revisionFormula issue with count #587

PLN revisionFormula issue with count #587

Comments

cosmoharrigan commented Mar 11, 2014

cosmoharrigan commented Mar 11, 2014

linas commented Mar 11, 2014

edajade commented Mar 12, 2014

bgoertzel commented Mar 12, 2014

cosmoharrigan commented Mar 12, 2014

edajade commented Mar 12, 2014

cosmoharrigan commented Mar 12, 2014

edajade commented Mar 12, 2014

linas commented Mar 12, 2014

bgoertzel commented Mar 13, 2014

cosmoharrigan commented Mar 13, 2014

linas commented Mar 13, 2014

bgoertzel commented Mar 13, 2014

cosmoharrigan commented Mar 13, 2014

bgoertzel commented Mar 13, 2014

cosmoharrigan commented Mar 13, 2014

bgoertzel commented Mar 13, 2014

cosmoharrigan commented Mar 13, 2014

linas commented Mar 13, 2014

cosmoharrigan commented Mar 25, 2014

bgoertzel commented Mar 25, 2014

bgoertzel commented Mar 25, 2014

linas commented Aug 31, 2015