Recursively retrieve links of a node. #2472
No, not really. This is a very simple search, and you don't need the pattern matcher to accomplish this. You can perform this search in maybe half-a-dozen lines of Scheme code (or less). The pattern matcher is designed for complex graph searches. What you describe above is just a very simple tree, and tree-walking is a very basic example of recursion. Take a look at section 2.2.2 "Hierarchical structures" of SICP https://web.mit.edu/alexmv/6.037/sicp.pdf (page 147 of that PDF) -- if you have not read SICP, you should. It's a lot like taking powerful hallucinogenic drugs: everything you believe about (programming) reality will be turned upside-down and inside-out.
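That SICP section's point, that tree-walking is just basic recursion, can be shown in a few lines. Here is the classic count-the-leaves walk from that chapter, transcribed as a Python sketch (Python rather than Scheme purely so the example is self-contained; the nested lists stand in for the book's cons trees):

```python
def count_leaves(tree):
    """SICP-style tree recursion: a tree is either a leaf or a list of subtrees."""
    if not isinstance(tree, list):
        return 1  # a leaf counts as one
    return sum(count_leaves(subtree) for subtree in tree)

# ((A B) (C (D E))) has five leaves.
print(count_leaves([["A", "B"], ["C", ["D", "E"]]]))  # 5
```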
I have already written the Scheme code, and was wondering whether the PM can do it, and whether that version has better performance.
OK. But if it is so simple, why not just include it? Also, this can get more complicated if we want to filter the parents based on some property (in the above code it is filtering them by their
Thanks, I have been putting off reading this book for so long. I guess I should read it now.
Hmm. I have two issues with that code. First, it's completely undocumented: what are the input values? What does it do, and what kind of output does it generate? Second, quickly scanning it, it seems to be inefficient, using constructs like
Again, no. The PM is not meant for recursively-defined structures or tree-walking. It is perhaps misnamed, it is actually a "subgraph isomorphism solver"; see https://en.wikipedia.org/wiki/Subgraph_isomorphism_problem
Why do you believe that it would have better performance?
Because that causes code-bloat, cruft, spaghetti-code and debuggability nightmares. Writing a tree-walk algorithm should not take more than 5 minutes (OK, so the first time you do it in your life, it will take an hour or two or more; but once you understand the general idea, it's easy.) When faced with "easy" algorithms, one has a trade-off: is it simpler to just write the code yourself (in 5 minutes), or to spend 5 minutes searching the documentation to see if such a function exists, then five minutes reading the documentation to decide if it does what you need it to do, then 5 more minutes testing it to make sure it actually does what you think it does (and then discovering it does something slightly different), and then finally 5 more minutes writing a shim that adapts it to your code? Yuck. This process results in obscure, hard-to-read, hard-to-understand stove-pipe code.
There are, of course, exceptions: for example, srfi-1 is a large collection of utilities for performing basic manipulations on lists. https://srfi.schemers.org/srfi-1/srfi-1.html Even though everything in there is "simple", using it is easier than writing your own, for four reasons: (1) it is well-designed, clear, and documented. (2) Everyone else uses it, which makes it easy for other people to understand your code, and for you to understand their code. (3) After using srfi-1 for a few days or weeks, it becomes easy to memorize and remember what it does; it is easier to use something from srfi-1 than it is to re-create it. (4) It's fully debugged.
The key here is reason (1): well-designed. Once you get to know srfi-1 well, you will also spot a few areas where it is poorly designed; but these are "outlying regions", stuff you would not/should not normally do, except maybe in some emergency situation. Some 95% of it is in excellent shape, and the remaining 5% is "good enough". There are no trees in srfi-1.
There are 179 srfi's as of today, and as far as I know, none of them deal with trees. I don't know why not. Perhaps no one has invented a good API for trees? i.e. nothing better than what you could create "in 5 minutes"?
As an exercise, try to write the simplest tree-walking code possible for the example you give above, and post it here, and we can try to reconstruct it or fix it or simplify it further.
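To make the exercise concrete, here is one possible shape for the answer, written as a Python sketch rather than Scheme, with the Inheritance links modeled as a hypothetical child-to-parents adjacency map (the names PARENTS and ancestors are made up for illustration):

```python
# Hypothetical stand-in for the Inheritance links in the question:
# A inherits from B, B from C and F, C from D.
PARENTS = {"A": ["B"], "B": ["C", "F"], "C": ["D"]}

def ancestors(node, depth):
    """Recursively collect parents, grandparents, ... up to `depth` levels."""
    if depth == 0:
        return []
    found = list(PARENTS.get(node, []))
    for parent in PARENTS.get(node, []):
        found.extend(ancestors(parent, depth - 1))
    return found

print(ancestors("A", 2))  # depth 2, parents and grandparents: ['B', 'C', 'F']
```

The depth parameter also answers the second half of the original question: limiting the recursion to 2 returns just the immediate parents and grandparents.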
I believe it can be done using
Exactly :-). I'm really thinking of adding this |
OK, but I urge extreme caution. Let's not replace something that can run in some dozens of microseconds with something that requires dozens of milliseconds. I'm not joking when I say "this tree can be walked in half-a-dozen lines of code". In particular, "half-a-dozen lines" means that you can sight-read where the CPU cycles are going to be spent. In this case, I see:
The fourth bullet above is important: if you represented the vertexes and edges in pure scheme, just using
Using the URE to perform edge traversal layers maybe another 10x performance hit (or more?) on top of everything else... so now we are talking about code that runs at least 100x, if not 1000x, slower than what any SICP homework exercise requires. Is it worth it? (In my own defense, I guess one could also say that a tree-traversal in TinkerPop or one of the misc Java-based graph DBs is also going to be 1000x slower than what raw Scheme can do. But still -- let's be careful not to splurge.)
p.s.
Regarding chaining, in general: I spent the last month staring at the MOZI gene annotation code. It's .. interesting. It consists of long hand-built chains, by which I mean:
and some of these chains are 4-5-6 nestings deep. This is .. well, it was an unexpected design style. Several remarks and a proposal:
By fiddling with the above, I was able to get that code to run from 10x to 100x faster. The BindLink... GroundedSchema layering does indeed suggest that maybe URE and/or ChainerLink or something might be a better way of structuring the data flow. Maybe. I don't much like or care for the BindLink... GroundedSchema idiom -- it seems fragile, convoluted. It's a hard-coded pipeline.
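The memoization point is easy to demonstrate in miniature. This Python sketch (an analogue of, not a substitute for, make-afunc-cache) shows how caching collapses repeated identical searches into a single expensive call; the caveat, exactly as described above, is that the cache is only valid while the underlying data has not changed:

```python
import functools

CALLS = 0  # count how many times the expensive search actually runs

@functools.lru_cache(maxsize=None)
def slow_search(pattern):
    """Stand-in for an expensive pattern-matcher query."""
    global CALLS
    CALLS += 1
    return "results-for-" + pattern

for _ in range(100):      # the same query, issued over and over
    slow_search("gene-X")

print(CALLS)  # 1 -- the search ran once; the other 99 calls were cache hits
```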
Thanks a lot for the help w/ Mozi Atomese code optimization, Linas ;)
Nil, I agree w/ your sense that using the URE to provide a general,
easy-to-use tree-walking functionality for Atomspace is going to be
valuable. Of course the right specialized piece of code will almost
always be at least a little more efficient, but if we want to get wide
adoption for OpenCog we need (among many other things) to make it less work
for developers.... If optimization of a particular tree-walk is badly
needed then a developer can always write their own code in that case...
On Thu, Jan 16, 2020 at 3:22 AM Linas Vepštas ***@***.***> wrote:
Regarding chaining, in general: I spent the last month staring at the MOZI
gene annotation code. It's .. interesting. It's consists of long hand-built
chains, by which I mean:
(define (some-func args)
   BindLink
      ...query-body using the args...
      ExecutionOutput
         GroundedSchema "scm: some-other-func"
)
and some of these chains are 4-5-6 nestings deep. This is .. well, it was
an unexpected design style. Several remarks and a proposal:
- Some of these are simple enough that a srfi-1 variant would be
faster/easier/more efficient, e.g. by a factor of 10x in a few cases where
I tried it.
- None of them made use of memoization, and instead ran the same
search over and over. That is, they called the same function, with the same
arguments; each call is a pattern-matcher search, which is slow. By
memoizing the previous search results, and just returning those, another
factor of 10x per call is possible. Although there is a handful of memoization
utilities (e.g. make-afunc-cache) they are not really "core" parts of
the atomspace. We do not have any API that can say "hey, the AtomSpace
hasn't changed (much) since the last time you ran this search, so it's safe
for you to re-use the results from your last search".
By fiddling with the above, I was able to get that code to run from 10x to
100x faster.
The BindLink... GroundedSchema layering .. that does indeed suggest that
maybe URE and/or ChainerLink or something might be a better way of
structuring the data flow. Maybe. I don't much like or care for the BindLink...
GroundedSchema idiom -- it seems fragile, convoluted. It's a hard-coded
pipeline.
--
Ben Goertzel, PhD
http://goertzel.org
OK, so let's return to @Habush's original question, and cross out the words "pattern matcher", like so:
Before trying to answer the question, let me expand on it a bit more (repeating myself, slightly).
Well, maybe nothing is wrong with that, but I don't really like it. The reason I don't like it is because it starts to turn Atomese into "just another programming language", and the world doesn't need yet another one of those. Worse, programming in Atomese is like hand-writing ASTs (see the first figure in the Wikipedia article) instead of programming in a human-readable language (the short program underneath that first figure). The short program is human-friendly; the AST is not. What ASTs are, is term-rewriting friendly. Suitable for knowledge representation (KR). Suitable for graph algorithms. I'd like to have Atomese stay true to its roots as a KR language, or a https://en.wikipedia.org/wiki/Abstract_semantic_graph language. I don't want Atomese to morph into a badly-designed functional programming language.
So, again, back to @Habush's original question: "How can one express a recursive pattern match, in a declarative way, such that the execution of that recursive pattern match results in a re-write of the contents of the AtomSpace?" I do not yet have a good answer for that. Where can we look for inspiration?
Well, all macro languages are a form of term-rewriting system. This includes the C/C++ preprocessor macro language, the m4 macro language, and the Scheme hygienic macro language. They all share a common trait: they run once, and only once, on a given input, producing an output, and then they stop. Also, you have little or no control over how the macros are applied: the macros are applied in whatever order they can be, until application terminates. I'm talking about macros because you can think of a single macro as a single "rewrite rule". Well, we already have those: we call them "BindLinks", and unlike macros, we can run BindLinks over and over, on demand, and we can even have the URE sequence them for us. So, we already have macro-style abilities, and more fine-grained, more controllable. Are there recursive macros? Sort-of-ish.
In C/C++, see for example: https://stackoverflow.com/questions/12447557/can-we-have-recursive-macros How about m4? ftp://ftp.gnu.org/old-gnu/Manuals/m4/html_chapter/m4_5.html Ooooof.
How about recursive macros in Scheme? First, a quick overview of macros in Scheme: https://beautifulracket.com/explainer/macros.html and then the first pitfall: https://stackoverflow.com/questions/49632262/recursive-macro-in-scheme-causes-unexpected-loop and https://www.reddit.com/r/scheme/comments/9zl4fe/recursive_macros/ ooof. Perhaps @amirouche can say something enlightening?
The reddit discussion is interesting because it mentions PEG: https://en.wikipedia.org/wiki/Parsing_expression_grammar which now suggests another way to ask @Habush's original question: is there a way to specify a PEG, but for graphs instead of for strings? I think this is a sensible question, because when you study term re-writing, it doesn't take very long to realize that string re-writing, graph re-writing and term re-writing are all very similar in many ways, and so something like PEG, intended for strings, should have some kind of graph analog. Part two of the question is, of course: "would a PEG for graphs actually solve Habush's original problem?"
Anyway, those are the lines I'm currently thinking along, but I don't have any particular answer. Part of the problem is that Habush's original question is kind of ill-defined: atomspace operations never "return" anything; they just mutate the contents of the atomspace. So the original question failed to pose a constraint: after mutating the atomspace, what do you want the contents of the final atomspace to actually look like?
We had some similar queries in Ghost for the VA project. Combining basic queries like cog-incoming-set and cog-outgoing-set together with BindLinks (creating them on the fly) in a recursive manner can be used as one of the solutions.
Atomese can be used to represent programs in the form of an abstract semantic graph instead of an AST, and what is wrong with that? Of course the goal is not inventing another programming language, but representing programs in a knowledge base.
Okay let me illustrate my point by showing the scheme code I wrote for this specific example:
Using the above code if you run
Yes, basically: how can I express the above loop declaratively and get the same result?
Also imagine we have the following extra info in the same atomspace:
Now, in addition to getting the parents recursively, I also want to constrain the parents to be citizens of a country X. Modifying the above code to that end we have the following:
The second part of my question (which I didn't clearly state in my original comment) is: how can we also specify a constraint on the recursive search declaratively (like the above)?
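One way to pin down what "constrain the parents" means is to sketch it imperatively first. This Python sketch (hypothetical data; not the Scheme code referenced above) applies a predicate to each ancestor as the recursion proceeds. Note a design question it exposes: here the walk continues through filtered-out nodes (D is still found via C, even though C fails the test); pruning the whole subtree instead would be a different, equally defensible choice:

```python
# Hypothetical model of the atomspace contents in this example:
# Inheritance links (child -> parents) plus citizenship facts.
PARENTS = {"A": ["B"], "B": ["C", "F"], "C": ["D"]}
CITIZEN_OF = {"B": "X", "C": "Y", "F": "X", "D": "X"}

def ancestors_if(node, depth, keep):
    """Collect ancestors up to `depth`, reporting only those passing `keep`."""
    if depth == 0:
        return []
    found = []
    for parent in PARENTS.get(node, []):
        if keep(parent):
            found.append(parent)
        found.extend(ancestors_if(parent, depth - 1, keep))
    return found

print(ancestors_if("A", 3, lambda n: CITIZEN_OF.get(n) == "X"))
# ['B', 'D', 'F']
```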
Nothing wrong with that. I'm ruminating on high performance and efficiency. The current AtomSpace is designed to traverse graphs with high performance. That's the one thing it's good at. But this comes at a price: bulky Atoms, an awkward API. Compare this to main-stream AST systems, and it looks perfectly absurd, crazy, untenable. Where, by "main-stream AST", I'm thinking GNU Lightning, or, say, anything listed here: https://en.wikipedia.org/wiki/Intermediate_representation -- I'm most familiar with GIMPLE and the LLVM IR.
Part of what makes an IR an IR is that they have an extensive, very fancy, very sophisticated term-rewriting infrastructure (i.e. "BindLinks"), but they are optimized for traversing the AST trees downwards, only. The properties on the trees are semi-hard-coded (instruction names, register names, and so not general-purpose, like Atomese is), but they do have some general-purpose slots, like the Atom Values, for storing extended information (e.g. condition register info, interlocks, reservations, etc.). The GIMPLE/LLVM AST trees are optimized for extremely fast creation and disposal, and fast mutation under single threads (the re-write rules always run in a single thread; they always discard the old version after re-writing, or rather, they mutate it in place). By contrast, Atomese trees are immutable; the old version is always kept, a new one is created, and a hefty performance price is paid to stuff the new one into the atomspace index.
In short, the performance of a main-stream AST and the performance of Atomese are very different, almost incomparable. No normal software engineer, in the process of inventing a brand new bytecode and IR, would choose Atomese for the AST system. Atomese is too ... weird, for that.
Now, to contradict myself: if one were a compiler scientist, and one wanted to look at the IR generated by compiling 1000 different open-source projects, and one wanted to look at the statistical distribution of the ASTs generated by the compiler, then, yes, actually, the AtomSpace might be the ideal, the perfect system for doing that. So sure.
@Habush you write:
Well, that is badly indented. Let's look at what you wrote, correctly indented:
which has almost no resemblance to what you originally wrote:
If I draw your input data as an ASCII-art tree, it would be
The most immediate, direct way of writing that ASCII-art as an S-expression would be
Compared to your output ... well, a comparison is difficult. Your code generated, for the first part
so F and C are at the same level, and B is above that. But A is above B, so where is A? And what is that extra nil doing? It looks like the '() is inheriting from both F and C? But there's no
Do you actually need this complicated structure? Can you explain the nil, and what it means?
There is also a meta-issue: the AtomSpace cannot store S-expressions directly. That is, you can't just store
Is this really the output that you want? And why is this output somehow easier to work with than, (for example)
which is the direct S-expression version of your original |
Here is a six-line-long scheme program to do a simple recursive DAG tree traversal, where the edges are
Let's try it out:
So:
At level 0, A is unknown.
At level 1, A has just one unknown child.
At level 2, A has a single branch B, and B has two unknown children.
At level 3, only C has an unknown child.
At level 4, the entire tree has been explored; there are no unknown branches.
Just like level 4. Any DEPTH greater than 4 will just return the same tree as 4. This time, let's generate valid Atomese:
and verify:
Finally, let's check citizenship:
The point of the previous post is that you can do graph traversal just fine, using plain-old srfi-1 and some if-then-else work. You could also write equivalent code in python or C++, and it would be a little longer, but not much longer. And it would certainly run 10x faster than using the pattern matcher. So, again, the pattern matcher is a device that takes the AtomSpace as the input, and mutates it into a new, different AtomSpace. What do you want the new, different AtomSpace to look like? Note, by the way, that the following resemblance is NOT accidental:
and
They are both searching the atomspace for the same pattern. The first one performs the search in Scheme; the second performs the search in the pattern matcher. I'm not sure which is faster -- it's probably a tie, to within a factor of 3x. The main difference between these two is that the former, the Scheme code, must be written by a human being, a programmer, trained in the art of software. The latter, the
There is an even easier way to do the above. First, a generic routine to walk generic DAG's:
Next, three different examples of tail-getters:
Try it:
How about the UK?
For the final example, I need a handy-dandy utility:
and so the final example:
The point of this last example is that you can now replace the
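The generic-walker-plus-tail-getter split described above can be sketched in a few lines. This is a Python rendering (the actual Scheme code was elided from this transcript, so all names here are made up): the walker owns the recursion and the depth limit, while the tail-getter is a pluggable function that decides which edges to follow:

```python
def walk(node, depth, tail_getter):
    """Generic DAG walk: the recursion is fixed, the edge-following is pluggable.
    Returns (node, None) for unexplored branches, mirroring the unknown-child
    marker in the level-by-level examples earlier in the thread."""
    if depth == 0:
        return (node, None)
    return (node, [walk(kid, depth - 1, tail_getter) for kid in tail_getter(node)])

# A hypothetical tail-getter over a plain adjacency map:
PARENTS = {"A": ["B"], "B": ["C", "F"], "C": ["D"]}
plain = lambda n: PARENTS.get(n, [])

print(walk("A", 2, plain))  # ('A', [('B', [('C', None), ('F', None)])])
```

Swapping `plain` for a tail-getter that queries the AtomSpace, or one that filters by citizenship, changes what gets walked without touching the walker itself.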
And why stop there? We don't have to transform one kind of DAG into a ListLink DAG ... we can transform into some other format.
Lets see if it works:
OK, so same as before. But instead of
Try it:
Or we can do it atomese-style:
Notice how
So, hang on a second, if
We also need the tail-getter:
These can be composed "as usual"...
Let's try recursion. This is where things break down; there's a bug in the atomspace code. I'm certain that variants of this, using
Here's the recursive form:
But it also doesn't work -- it goes into an infinite loop. Not sure why. There's a bug somewhere. DO NOT RUN the below, it will loop and you will have to kill -9 guile.
so there are two bugs: the no-op scheme didn't execute at all, and the recursive rewrite executes too much. Bummer. I left out the decrement-the-depth code, just to keep it simple. You can add it back in, using
from issue opencog/atomspace#2472
@Habush the following example https://github.com/opencog/pln/tree/master/examples/pln/ancestors shows how to use the PLN module to solve your problem with a few lines of code. |
Imagine we have the following relationship in a sample atomspace:
Assume the above Inheritance link represents a child-parent relationship. Is there a way, using the pattern matcher, to recursively find the parents (B), grandparents (C, F), great-grandparent (D), etc. of (Concept A)? Also, can we limit the depth of the recursion? That is, if we specify 2, it will return the immediate parents and the grandparents of the node?