Loading data to AtomSpace #12

hilenamin · 2018-06-20T12:28:52Z

Question: which approach would be preferred?

1. Load CSV directly into Atomese.
2. Load the data into a table, which can then be converted to Atomese.

ngeiswei · 2018-06-20T13:29:21Z

Although it is somewhat of a test issue, let me answer it.

Ideally you want to load CSV directly into Atomese, to avoid paying the overhead of creating a structure like Table. However, I feel you would get more done with less code if you simply reuse https://github.com/opencog/as-moses/blob/master/moses/comboreduct/table/table_io.h#L151 and then turn the table into Atomese.

The completely ideal solution would be to reuse as much as you can from loadTable (and its subroutines), refactoring code where necessary, without building a Table. But since that table is only an intermediary structure, it's not much overhead, so for now just using plain loadTable should be fine. The ideal solution need only be pursued if this becomes a performance-critical issue.
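
To make the two options concrete, here is a minimal Python sketch under stated assumptions. It does not use loadTable or any as-moses code: `load_table`, `direct_to_atomese`, `table_to_atomese` and `row_to_atomese` are hypothetical names, and rendering each row as a List of Number atoms is only a placeholder output format.

```python
import csv
import io

def row_to_atomese(cells):
    """Render one CSV row as a List of Number atoms (placeholder format)."""
    numbers = " ".join(f"(Number {cell})" for cell in cells)
    return f"(List {numbers})"

def direct_to_atomese(stream):
    """Option 1: convert rows as they are read, with no intermediate table."""
    reader = csv.reader(stream)
    next(reader)  # skip the header row of column labels
    for cells in reader:
        yield row_to_atomese(cells)

def load_table(stream):
    """Option 2, step 1: read everything into a table first.
    Hypothetical stand-in for loadTable, not the as-moses API."""
    reader = csv.reader(stream)
    labels = next(reader)
    return {"labels": labels, "rows": [cells for cells in reader]}

def table_to_atomese(table):
    """Option 2, step 2: convert the finished table."""
    return [row_to_atomese(cells) for cells in table["rows"]]

CSV_TEXT = "i1,i2,o\n0,1,1\n1,0,1\n0,0,0\n"
direct = list(direct_to_atomese(io.StringIO(CSV_TEXT)))
via_table = table_to_atomese(load_table(io.StringIO(CSV_TEXT)))
```

Both paths produce the same atoms; the only difference is whether all rows are held in memory before conversion, which is the overhead discussed above.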

linas · 2018-06-20T19:20:19Z

Nil, how are you representing tables? Again, I want to draw your attention to the module (opencog matrix) which deals with sparse tables. As long as you represent your data as one of these:

(SomeLink (SomeRowAtom) (SomeColAtom))

and all row atoms are of the same type, etc., then the matrix API does "neat things". (It does NOT read from a file, though.)
Alternately, it also supports

(FooEvalLink
   (FooPredicateNode "stuff")
   (ListLink (SomeRowAtom) (SomeColAtom)))

as well. I'm currently experimenting with much more awkward row-column representations.

What the matrix (aka vector) API does is allow you to have some complicated blob of data in the atomspace and to declare that some subset of it looks "just like a matrix" or "just like a table". It then implements a bunch of generic table/matrix methods on it (currently conditional probabilities, mutual information, cosine and Jaccard distances, etc.). It doesn't matter what the actual atoms really are, because all the algos just use the definition of the matrix to find the right atoms.
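
As a rough illustration of that idea (not the actual matrix API), here is a minimal Python sketch. Atoms are modeled as plain nested tuples, and `MatrixView`, `is_pair`, `row_of` and `col_of` are hypothetical names invented for this example. The generic methods only go through the view definition, so the same code works for both encodings shown above. It computes similarities rather than distances.

```python
import math
from collections import defaultdict

class MatrixView:
    """Expose the atoms accepted by is_pair as a sparse binary matrix."""

    def __init__(self, atoms, is_pair, row_of, col_of):
        self.rows = defaultdict(set)
        for atom in atoms:
            if is_pair(atom):
                self.rows[row_of(atom)].add(col_of(atom))

    def jaccard(self, a, b):
        """Jaccard similarity of the column sets of rows a and b."""
        cols_a, cols_b = self.rows.get(a, set()), self.rows.get(b, set())
        union = cols_a | cols_b
        return len(cols_a & cols_b) / len(union) if union else 0.0

    def cosine(self, a, b):
        """Cosine similarity of rows a and b, viewed as 0/1 vectors."""
        cols_a, cols_b = self.rows.get(a, set()), self.rows.get(b, set())
        norm = math.sqrt(len(cols_a) * len(cols_b))
        return len(cols_a & cols_b) / norm if norm else 0.0

# Atoms as nested tuples: (type, outgoing...). Two encodings share one pool.
atoms = [
    ("SomeLink", "r1", "c1"),
    ("SomeLink", "r1", "c2"),
    ("SomeLink", "r2", "c1"),
    ("FooEvalLink", ("FooPredicateNode", "stuff"), ("ListLink", "r1", "c3")),
]

# One view per encoding; each view only says how to find rows and columns.
plain = MatrixView(atoms, lambda a: a[0] == "SomeLink",
                   lambda a: a[1], lambda a: a[2])
evals = MatrixView(atoms, lambda a: a[0] == "FooEvalLink",
                   lambda a: a[2][1], lambda a: a[2][2])
```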

It would be nice if, in some hazy future, MOSES worked the same way. It's probably too early for this right now, but it's an idea. (It would be nice to port the matrix API to C++, for speed, and to port it to "R", so that Mike and the biology guys could examine matrix-like slices of the atomspace in R. But that's a different project.)

ngeiswei · 2018-06-21T04:22:55Z

I read the README, but it wasn't clear to me how to incorporate that data into Atomese evaluation. I mean, for instance, evaluating

(Plus (Schema "f1") (Schema "f2"))

where (Schema "f1") and (Schema "f2") would be two columns.

What I've been thinking, though, is to have a column stored as a list of values, such as a FloatValue. This would make it easy to associate columns with programs, and to use this information as a memoization mechanism. So, for instance, one could store the result of (Plus (Schema "f1") (Schema "f2")) in a column, then when it's time to evaluate

(Times
  (Plus (Schema "f1") (Schema "f2"))
  (Schema "f3"))

it could reuse this column to avoid re-evaluating f1 + f2.
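
A minimal Python sketch of that memoization idea, under stated assumptions: programs are modeled as nested tuples, columns as plain lists of floats, and the `cache` dictionary stands in for attaching a value such as a FloatValue to each program atom. `evaluate` and `OPS` are hypothetical names, not AtomSpace calls.

```python
import operator

# Binary operators supported by this sketch.
OPS = {"Plus": operator.add, "Times": operator.mul}

def evaluate(expr, columns, cache):
    """Evaluate expr column-wise, storing each result column in cache."""
    if expr in cache:
        return cache[expr]  # reuse a previously computed column
    head, *args = expr
    if head == "Schema":
        result = columns[args[0]]  # a raw data column
    else:
        left, right = (evaluate(arg, columns, cache) for arg in args)
        result = [OPS[head](x, y) for x, y in zip(left, right)]
    cache[expr] = result
    return result

columns = {"f1": [1.0, 2.0], "f2": [3.0, 4.0], "f3": [2.0, 2.0]}
cache = {}

plus = ("Plus", ("Schema", "f1"), ("Schema", "f2"))
plus_column = evaluate(plus, columns, cache)

# The Plus subexpression is found in the cache and is not recomputed.
times = ("Times", plus, ("Schema", "f3"))
times_column = evaluate(times, columns, cache)
```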

Having said that, I'm not terribly concerned about efficiency at this point. I just want to move in a direction that would foster "holistic" cognitive integration, like reasoning on programs, fitness functions and data.

Yidnekachew · 2018-06-25T12:28:45Z

@ngeiswei Assuming that we're going to implement this using option 2 (i.e. by converting Table to its Atomese counterpart):

As described at https://github.com/opencog/as-moses/blob/master/moses/comboreduct/table/table.h#L911, Table consists of an output table (OTable) of one column and an input table (ITable) consisting of the independent variables.

Typed data table.
The table consists of an ITable of inputs (independent variables),
an OTable holding the output (the dependent variable), and a type
tree identifiying the types of the inputs and outputs.

Our current representation in #3, however, doesn't hold the output and input data separately. Do we need to change it to handle that?

ngeiswei · 2018-06-25T12:47:26Z

@Yidnekachew I'm not sure what is best at this point. I suppose you may separate output and input data, like Table for now.

The other problem I'm seeing is that Boolean tables don't have any compact representation offered in #3. Either we come up with one, or you use the unfolded representation, such as

(Evaluation (stv 0 1)
  (Predicate "i1")
  (Node "r1"))
(Evaluation (stv 1 1)
  (Predicate "i2")
  (Node "r1"))
(Evaluation (stv 1 1)
  (Predicate "o")
  (Node "r1"))

which has the "advantage" of forcing us to experiment with both representations and weigh their pros and cons, maybe.
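
A minimal Python sketch of how the unfolded representation could be generated from parsed Boolean rows. `unfold_boolean` is a hypothetical name, the output is plain text rather than real atoms, and naming rows r1, r2, ... is an assumption carried over from the example above.

```python
def unfold_boolean(labels, rows):
    """Return one Evaluation string per (row, column) cell."""
    atoms = []
    for index, row in enumerate(rows, start=1):
        for label, bit in zip(labels, row):
            # The cell value becomes the strength; confidence is fixed at 1.
            atoms.append(
                f"(Evaluation (stv {int(bit)} 1)\n"
                f'  (Predicate "{label}")\n'
                f'  (Node "r{index}"))'
            )
    return atoms

atoms = unfold_boolean(["i1", "i2", "o"], [[0, 1, 1]])
```

Note that the number of atoms grows as rows times columns, which is one of the costs to weigh against the compact format.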

Yidnekachew · 2018-06-25T13:52:15Z

@ngeiswei If we're doing it both ways, a dataset like this

i1,i2,o
0,1,1
1,0,1
0,0,0

is going to be represented as:

For the Boolean type,

Using input and output table

(List 
 (Evaluation (stv 1 1) (Predicate "o") (Node "r1"))
 (Evaluation (stv 1 1) (Predicate "o") (Node "r2"))
 (Evaluation (stv 0 1) (Predicate "o") (Node "r3")))

(List 
 (Evaluation (stv 0 1) (Predicate "i1") (Node "r1"))
 (Evaluation (stv 1 1) (Predicate "i2") (Node "r1"))
 (Evaluation (stv 1 1) (Predicate "i1") (Node "r2"))
 (Evaluation (stv 0 1) (Predicate "i2") (Node "r2"))
 ..
)

Using the unfolded table

(Evaluation (stv 0 1) (Predicate "i1") (Node "r1"))
(Evaluation (stv 1 1) (Predicate "i2") (Node "r1"))
(Evaluation (stv 1 1) (Predicate "o") (Node "r1"))
..

For the Real type,

Using input and output table

(Similarity (stv 1 1)
  (List (Schema "i1") (Schema "i2"))
  (Set
    (List (Node "r1") (List (Number 0) (Number 1)))
    (List (Node "r2") (List (Number 1) (Number 0)))
    (List (Node "r3") (List (Number 0) (Number 0)))))

(Similarity (stv 1 1)
  (List (Schema "o"))
  (Set
    (List (Node "r1") (Number 1))
    (List (Node "r2") (Number 1))
    (List (Node "r3") (Number 0))))

Using the compact format

(Similarity (stv 1 1)
  (List (Schema "i1") (Schema "i2") (Schema "o"))
  (Set
    (List (Node "r1") (List (Number 0) (Number 1) (Number 1)))
    (List (Node "r2") (List (Number 1) (Number 0) (Number 1)))
    (List (Node "r3") (List (Number 0) (Number 0) (Number 0)))))

Am I right?

I will also need to have a look if Table holds the column labels.
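
A minimal Python sketch of how the Similarity-based format above could be generated, either for all columns (compact format) or for a subset (separate input and output tables). `compact_table` and its `columns` parameter are hypothetical names, and the output is plain text rather than real atoms.

```python
def compact_table(labels, rows, columns=None):
    """Render the chosen column indices in the Similarity format."""
    picked = list(range(len(labels))) if columns is None else columns
    header = " ".join(f'(Schema "{labels[c]}")' for c in picked)
    lines = []
    for index, row in enumerate(rows, start=1):
        numbers = " ".join(f"(Number {row[c]})" for c in picked)
        if len(picked) > 1:
            # Several columns per row are grouped in an inner List.
            numbers = f"(List {numbers})"
        lines.append(f'    (List (Node "r{index}") {numbers})')
    body = "\n".join(lines)
    return f"(Similarity (stv 1 1)\n  (List {header})\n  (Set\n{body}))"

labels = ["i1", "i2", "o"]
rows = [[0, 1, 1], [1, 0, 1], [0, 0, 0]]

inputs = compact_table(labels, rows, columns=[0, 1])
output = compact_table(labels, rows, columns=[2])
everything = compact_table(labels, rows)
```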

ngeiswei · 2018-06-26T08:01:49Z

That's correct, @Yidnekachew. Table does hold the column labels, as do the ITable and OTable, which each hold their own labels.

ngeiswei · 2018-06-26T09:07:42Z

BTW, it's better if the first feature is the output, as that is MOSES' default assumption (I've corrected #3 accordingly).

ngeiswei · 2018-06-26T09:24:24Z

Other representations to consider would be

(List
  (List (Schema "o") (Schema "i1") (Schema "i2"))
  (List (Number 1) (Number 0) (Number 1))
  (List (Number 1) (Number 1) (Number 0))
  (List (Number 0) (Number 0) (Number 0)))

This one is probably the most compact and doesn't need to introduce row nodes. Its drawback is that it has no self-contained semantics.

Also, another option, to avoid having two distinct representations for Boolean and numerical data, is to use TrueLink http://wiki.opencog.org/w/TrueLink and FalseLink http://wiki.opencog.org/w/FalseLink.

I'm perhaps thinking of another representation that may have the advantage of the one above (i.e. it doesn't introduce row nodes) yet is semantically self-contained. I'll come back to that later.

Meanwhile, here's my suggestion: since we're more or less stepping into the unknown (well, as far as I am concerned, I don't have a clear-cut idea of what is going to be best), I suggest you implement all options.

ngeiswei · 2018-06-26T09:30:23Z

Obviously, an option to make the compact table representation sort of semantically self-contained is to wrap it in an "AS-MOSES:table" predicate or something, like

(Evaluation
  (Predicate "AS-MOSES:table")
  (List
    (List (Schema "o") (Schema "i1") (Schema "i2"))
    (List (Number 1) (Number 0) (Number 1))
    (List (Number 1) (Number 1) (Number 0))
    (List (Number 0) (Number 0) (Number 0))))

That requires subsequent transformations to reason about it, plus the axiomatization of (Predicate "AS-MOSES:table"), etc., but it's worth considering as well.
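
A minimal Python sketch of generating this List-based representation, with the output column moved first and an optional wrapping predicate. `list_table` and its parameters are hypothetical names, and the output is plain text rather than real atoms.

```python
def list_table(labels, rows, output="o", predicate=None):
    """Render a table as nested Lists, output column first.
    If predicate is given, wrap the result in an Evaluation."""
    out = labels.index(output)
    order = [out] + [i for i in range(len(labels)) if i != out]
    header = " ".join(f'(Schema "{labels[i]}")' for i in order)
    lines = [f"(List {header})"]
    for row in rows:
        numbers = " ".join(f"(Number {row[i]})" for i in order)
        lines.append(f"(List {numbers})")
    # Inner lines are indented one level deeper when wrapped.
    indent = "  " if predicate is None else "    "
    body = "(List\n" + "\n".join(indent + line for line in lines) + ")"
    if predicate is None:
        return body
    return f'(Evaluation\n  (Predicate "{predicate}")\n  {body})'

labels = ["i1", "i2", "o"]
rows = [[0, 1, 1], [1, 0, 1], [0, 0, 0]]

bare = list_table(labels, rows)
wrapped = list_table(labels, rows, predicate="AS-MOSES:table")
```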

hilenamin created this issue from a note in as-moses Port (High priority) Jun 20, 2018

ngeiswei changed the title ~~Represent data to AtomSpace~~ Loading data to AtomSpace Jun 20, 2018

This was referenced Jun 28, 2018

Data (like csv file content) representations in Atomese #14

Open

Efficient Table Representation #16

Closed

Yidnekachew mentioned this issue Jul 9, 2018

Load data to atomspace #20

Merged

behailu04 pushed a commit to behailu04/asmoses that referenced this issue Oct 1, 2018

Merge pull request opencog#12 from opencog/port-fitnesseval

986ba56

Port fitnesseval logical_bscore

kasimebrahim mentioned this issue Oct 25, 2018

Compressed Table Representation singnet/asmoses#19

Open
