-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AtomSpace MOSES Port (part I) #3
Comments
Instead of |
This thing: |
According to the PLN book |
OK, I guess we can use Similarity for concepts. However, I want to abolish SetLink in favor of MemberLink ... |
So:
|
Regarding Then I saw further down the book the introduction of span, which I know is no is not exactly that, but is the closest I could find. Other suggestions are welcome. |
I don't think we want to abolish However what we do need is a construct to recursively step into a set, like |
Regarding span, just to be absolutely clear, the reason I'm not using To reuse the same example, using
You see the duplicated inputs, |
Actually |
There's no such thing as a Cartesian product in category theory. The Cartesian product is a completely unrelated concept from some other completely different area of mathematics, got nothing to do with anything. That said, the Cartesian product is an example of a https://en.wikipedia.org/wiki/Pullback_(category_theory) - its an example of a pullback, taken over a singleton (a single point). Pullbacks are generically a way of gluing two other objects together, over some common object to which they are both related. That is, if X and Y are two objects, having Z in common between them, then there exists a unique, universal object P which is the gluing-together of X and Y over Z. The Cartesian product is only defined over the category of sets (i.e. over set theory). The only thing that two generic sets have in common is that they are made out of points (singletons, sets containing only one element). Thus, when gluing together sets, one takes Z == a singleton == some arbitrary (any) set containing a single element. The pullback is built by mapping each and every element of X to the singleton. Also map each and every element of Y to the singleton. Think of each of these as being like peacocks, with a fan of arrows. Glue these two fan-shaped things together at the singleton (two peacocks having homo sex!?). The result looks like a fan of arrows all pointing into Z, and that result can be uniquely, universally represented by pairs of objects, one chosen from X and one from Y. But the set of all possible pairs is conventionally called the "Cartesian product". Not all categories have Cartesian products. Set theory is an example of a category that does have a Cartesian product. Not all categories have pullbacks. If they exist, they have to be proven to exist. Most of what we do in opencog is much closer to model theory than to category theory, and so I'm kind-of sorry I ever talked about category theory w.r.t. opencog. |
Part 2:
That is an example of a Cartesian product. For example: https://en.wikipedia.org/wiki/Product_(category_theory) where the product can be thought of as the "trivial" pullback. In the category of sets, if you have a function f:A->X and a function g:A->Y, then you can prove that there exists a unique, universal function <f,g>:A->X x Y that maps an element a to the pair (f(a), g(a)). For set theory, this is kind of trivial and goofy and "self-evident" (to some typical college freshman). The issue being tackled here is that there are categories for which products do not exist. When products do exist, then there are certain generic statements and proofs that can be made, that would apply to all categories that have products. This avoids the trouble of having to write the same proof, over and over, once for each different kind of category. (viz, once for set theory, once for group theory, once for topology, etc.) Its a short-cut. For moses, pretty much everything is naive set theory (well, that, and relational algebra). The notion of a table of rows exists only in set theory: each row is a cartesian product, a "tuple", and the table itself is a set. In set theory, a map from a "row" or "tuple" to some value is just a plain-old ordinary function. In set theory, morphisms are just plain-old functions.
Does it need a name? Call it RowLink or SequenceLink or OrderedSetLink or TupleLink or or VectorLink or ArrayLink or just .. ListLink. The biggest problem in opencog is that ListLinks are not computer-science lists; they aren't editable (the way a singly-linked list is). They should have been called VectorLinks or TupleLinks or ArrayLinks, to emphasize their immutability. |
During learning, while moses is running, one might want to alter the set of inputs. This becomes hard and klunky with set links. |
I don't understand why there is some need "duplicate the inputs". At any rate, Cartesian products and functions have nothing to do with one-another; they are unrelated concepts. The tables in moses are just tables. You do NOT have to specify which column is the dependent column in advance. Just build the table; then, later on, you can indicate that column p42 is the one that moses should train on. This way, you can just throw some generic tables into the atomspace, and decide, at a later point in time, which columns should be trained on. Maybe a table has two or three dependents, and you want to run two or three instances of moses, training on each. |
Re tables: I have a comment, kind-of off to the side, but -- in language learning, I have tables that are extremely sparse, with only one out of a million or one out of a billion entries that are non-zero. Representing these as ordinary tables would blow out RAM. (They already blow out RAM, even when they are sparse...) |
I'm working on an alternate way of representing tabular data right now, but decided to distract myself and read email today. Upshot, I think there's a simpler way of representing data than what's given above. |
@linas totally open to have alternate ways of representing tabular data, spare, dense, etc. However, keep in mind that this table will then have to be used to evaluate Atomese programs. Wouldn't it be really cool if the semantics corresponding to these tables directly guides how to evaluate Atomese programs? This is the reason why I didn't use a mere |
Well, that is exactly what I was looking for. So it's just a mere product (with a codomain that is defined as the Cartesian product of the codomains of the features, that is what confused me). So the link type name I'm after is really a |
??? ListLink is already a link that implements a Cartesian product. In fact, every link type that derived from OrderedLink is a Cartesian product. Why in the heck do we need yet another Cartesian product link? What's wrong with just OrderedLink? |
I meant
That feels true. |
@linas Can I have a look what that alternative way looks like & when are we going to get it? |
…reter Atomese program interpreter
This is the initial plan to get started with the AS-MOSES port. It
only treats the first steps, though does so in details.
Initiate the as-moses repo
AS-MOSES represents a rather major departure from the existing MOSES,
I believe it is best to move it to its own repository. This will
minimize confusions to the user and increase the awareness that MOSES
is transitioning to something different.
Here are the steps involved to seed the as-moses repository with the
old MOSES:
doc/as-moses
folder under this repodoc/as-moses
git@github.com:opencog/moses.git and force-push it to
git@github.com:opencog/as-moses.git (I'll temporary disable branch
protection on the master after step 1 and 2 have been carried).
Replace Combo by Atomese
This is a rather big undertaking and will be done
progressively. He start here with 2 tasks
Port Reduct to AtomeseFor now rely on combo reductMore tasks like storing the population and meta-population in
AtomSpaces will come next.
Port Fitness Evaluation to Atomese
This task can be decomposed into 3 subtasks
Update the Atomese interpreter to handle problem data.Implement dedicated atomese interpreter.Convert Combo to Atomese
To make Atomese program as compact as possible we will use higher
level operators, that is working with predicates or concepts as
opposed to individuals, let me give some examples.
Let's assume the following data to fit
with 2 input features
i1
,i2
, an output featureo
, and 3observations
r1
tor3
.In the boolean domain a possible Combo candidate would be:
The corresponding Atomese candidate will look like
you may notice that the variables have been replaced by
predicates. That is because
Or
is not operating on individuals, butrather on predicates, so here
Or
actually represent the union of thesatisfying sets of
i1
andi2
. Doing that allows us not togenerates more compact candidates.
Let's assume that the domain was actually real numbers, not Boolean.
Then a possible Combo candidate would be:
Likewise in Atomese this will be translated into
Notice that
i1
andi2
are schema, not predicates. That is becausesince the domain is Real, not Boolean, they can no longer be
predicates. Likewise
Plus
can be overloaded for schemata, similarlyto how one can add 2 mathematical functions where
h = f + g
meansForAll x, h(x) = f(x) + g(x)
Let us give another example
Its Atomese translation could be
but what is 3? In principle 3 should be a constant function that
returns the number 3 for each input. However I suppose it would be
acceptable to overload
Plus
so that it can mix schemata withconstants and assume constants are in fact constant functions. So that
would be understood as
We will see how it goes but I think it's doable.
Note that there is an existing Combo to Atomese converter here
https://github.com/opencog/moses/blob/master/moses/comboreduct/main/combo-fmt-converter.cc
but it is extremely limited and outputs strings, while what we want is
something that builds atoms directly.
The code can probably be added in a
converter
folder ofhttps://github.com/opencog/moses/tree/master/moses/comboreduct
it should not require an atomspace (it should be up to the user to add
it to the atomspace of his/her choice). So rather than using functions
such as
AtomSpace::add_link
it should usecreateLink
, etc.Represent Problem Data in AtomSpace
In order to use the Atomese interpreter to measure the fitness of a
program over some data, the data need to be loaded to some
AtomSpace.
Let us reconsider the example data
obtained from the CSV file
over the Boolean domain for now.
We need to tell how instances/observations relate to features. Here is
how it could be done. For instance
r1
could be represented asAssuming the domain of data is Real instead of Boolean, then
Execution
must be used instead ofEvaluation
. For instancer1
would be represented as
This can be made more compact by considering that a function, say
f
,is a set of pairs
(x, f(x))
. For instancei1
can be described withOr even more compact considering the Cartesian product over features,
using
ListLink
, to represent a Cartesian product between functionsover the same domain.
I recommend to use the last representation as it is more compact and I
suspect might actually be easier for the interpreter to process.
Update the Atomese Interpreter to Handle Problem Data
Assuming the AtomSpace is loaded with the table above, how to evaluate
?
Such schema
i1+i2
is expected to be represented asOr equivalently, seeing a function as a set of input/output pairs
However when invoking the
cog-execute!
onthe result could simply be
The Atomese interpreter would recognize that
Plus
is applied toschemata, gather their associated data, iteratively apply the lower
level operator
Plus
to the associated numbers, and reconstruct theresult as a mapping from inputs (observations
r1
tor3
) to outputs(
(Number 1)
, etc).Replace Combo by Atomese Interpreter in Fitness Evaluation
Due to Atomese Reduction not supported, we can only support Atomese
right after reduction. Basically, intances will be turned into reduced
combo trees, then turned into Atomese program, then fitness
evaluated. Let's recall how the fitness function is being called, in
the hill-climbing algorithm, the call occurs here
https://github.com/opencog/moses/blob/master/moses/moses/optimization/hill-climbing.cc#L234
which is an iscorer (for instance scorer), so we want to implement a
new class inheriting
iscorer_base
https://github.com/opencog/moses/blob/master/moses/moses/representation/instance_scorer.h#L35
that turns the instance into an Atomese program and evaluate its
fitness. Then upgrade fitness functions to support Atomese programs.
To break it down, the subtasks are
complexity_based_scorer
tocombo_based_scorer
atomese_based_scorer
similar tocombo_based_scorer
, that turns the instance into a reducedcombo_tree, convert it to atomese, then call the fitness on this
atomese program.
implementation to allow it to be optional for now). Since
instance_scorer.h
will start growing, it would be best to createa
instance_scorer.cc
and move the implementations there.recommand to start with
logical_bscore
which is probably thesimplest fitness function type.
Test MOSES using
atomese_based_scorer
insteadcombo_based_scorer
for the implemented fitness function types.Will be done in the next iteration.
Port Reduct to Atomese
The approach suggested at the moment is to explicitely represent the
result of a reduction as a relationship between Atomese program and
normal form. I think it makes sense to introduce a link type just for
that called
ReductLink
For instancewould represent that
f1 and f1
reduces tof1
.To acheive that,
ReductLink
as well as the operators involved in theAtomese programs must be axiomatized. Then the URE alone should be
able to perform the reduction.
This work is already under way, so far by Yidne.
If its completion takes too long, once we have a Combo to Atomese
converter and vise versa, we could probably just wrap the existing
Combo reduct engine into a URE rule, and use this as a temporary
replacement for Atomese reduction just so that we can keep going with
the remaining of the port.
The text was updated successfully, but these errors were encountered: