Much slower than SWI checking [wn_valid] from wordnet-prolog #1681

dcnorris · 2023-01-07T22:15:01Z

I have forked ekaf/wordnet-prolog and made a few changes needed to get a validity check to run in Scryer.

Whereas swipl does [wn_valid]. in about 13 seconds, I find that scryer-prolog ("v0.9.1-51-gfd191285") takes 1 hour. All but 24 seconds of this is spent in goal check_keys/0:

check_keys:-
  writeln('Searching for ambiguous sense keys'),
  findall(X, multikey(X), L),
  list_to_set(L,S),
  member(K,S),
  format("Ambiguous key: ~w\n",[K]),
  findall(I, sk(I,_,K), IL),
  list_to_set(IL,IS),
  member(I,IS),
  g(I,G),
  format("    ->~w (~w)\n",[I,G]),
  false.
check_keys:-
  ok,
  nl.

The text was updated successfully, but these errors were encountered:

UWN · 2023-01-08T07:39:40Z

Would you mind reducing the size of your finding? For example, you could introduce false after some goal and compare that failure slice with both systems. It could be the first findall/3 alone that makes problems. Also, are you sure that you used the same apply/2 definition? If you can, directly rewrite any goal for apply/2 with a fixed list to the corresponding call/N. These apply/2 goals are really so 1970s

UWN · 2023-01-08T07:45:34Z

... or just compare ?- multikey(X), false.

dcnorris · 2023-01-08T11:05:29Z

Wherever its 2nd arg was a fixed-length list, I had replaced apply/2 with call/N. (This is advised in SWI's documentation of apply/2.) But I think all the WordNet predicates are of arity known at 'compile time', so it should be straightforward to eliminate the remaining invocations of apply/2 by putting these facts objectively into the program.

dcnorris · 2023-01-08T11:14:17Z

Indeed, time((multikey(X), false)) is already taking quite awhile. Will confirm its final time an hour from now.

dcnorris · 2023-01-08T12:12:12Z

It's all in multikey/1, then:

?- time((multikey(X), false)).
   % CPU time: 3716.558s
   false.

SWI by comparison:

?- time((multikey(X), false)). % Using swipl
% 414,620 inferences, 0.078 CPU in 0.078 seconds (100% CPU, 5328386 Lips)
false.

Here is multikey/1 for quick reference:

multikey(K):-
  sk(I,_,K),
  sk(J,_,K),
  I\=J.

I am sold on failure slicing!

triska · 2023-01-08T12:27:15Z

This seems like an issue with indexing. Specifically, we have:

multikey(K):-
  sk(I,_,K),
  sk(J,_,K),
  I\=J.

where sk/3 consists of hundreds of thousands of plain facts.

In the highlighted goal above, i.e., sk(J, _, K), J is a variable and K is instantiated. So, a Prolog system with good indexing can use K to quickly detect which facts apply in this case.

Scryer Prolog currently only indexes on the first instantiated argument that occurs in any clause of a predicate, so in this case, only the first argument of sk/3 is used for indexing. This leads to quadratic overhead multikey/1, because for each fact, all facts have to be checked.

Such an issue can be worked around "manually", by adding (or generating) auxiliary facts, say sk_3/3, where the arguments are reordered to benefit from the engine's argument indexing if they are instantiated at the time of the call.

Ideally of course, the system would automatically generate suitable indices on demand, depending on which arguments are instantiated at the time of the call, so that the fitting clauses can later be found much more quickly.

Related to #192.

dcnorris · 2023-01-08T12:49:08Z

This is most helpful, thank you @triska! I just now naively tried

sk_3(K,I,J) :- sk(I,J,K).

multikey(K):-
  sk_3(K,I,_),
  sk_3(K,J,_),
  I\=J.

but of course this does not speed anything up because it is the underlying facts that get indexed. Is there any more declarative way to 'request' indexing than writing permuted facts one-by-one?

dcnorris · 2023-01-08T12:54:05Z

For the time being, maybe the best I can contribute back to the original repo via PR would be increased ISO standard compliance? I'll see what I can do to get this code running with gprolog.

triska · 2023-01-08T14:42:36Z

@dcnorris: Improving ISO compliance for better portability would definitely help already, if only to make such comparisons between different systems easier.

Regarding the auxiliary facts specifically useful for Scryer Prolog and other systems with limited built-in indexing, you may be able to use Scryer Prolog's term_expansion/2 mechanism to automatically generate, for each defined fact, also its corresponding alternative definition with a different argument order.

For example, try:

:- discontiguous(sk_1/3).
:- discontiguous(sk_3/3).

sk(X, Y, Z) :- sk_1(X, Y, Z).

term_expansion(sk(X,Y,Z),
               [sk_1(X,Y,Z),
                sk_3(Z,X,Y)]).

sk(100001740,1,'entity%1:03:00::').
sk(100001930,1,'physical_entity%1:03:00::').
sk(100002137,1,'abstraction%1:03:00::').
sk(100002137,2,'abstract_entity%1:03:00::').
sk(100002452,1,'thing%1:03:00::').
sk(100002684,1,'object%1:03:00::').
sk(100002684,2,'physical_object%1:03:00::').
...

In this way, you automatically get, for each sk/3 fact that occurs in the file, the corresponding sk_1/3 and sk_3/3 facts.

triska · 2023-01-08T14:50:04Z

And building on that, you can extend sk/3 to dynamically switch between the different variants, for example (in the case above):

sk(X, Y, Z) :-
        (   nonvar(X) ->
            sk_1(X, Y, Z)
        ;   nonvar(Z) ->
            sk_3(Z, X, Y)
        ;   sk_1(X, Y, Z)
        ).

In this way, all your code can use sk/3, and when Scryer Prolog's built-in indexing improves, you simply remove these manual workarounds.

dcnorris · 2023-01-08T14:55:16Z

I find gprolog is taking a long time already with [wn_valid], but that like Scryer it has a term_expansion/2 mechanism. So perhaps this solution will apply uniformly across both of these engines.

pmoura · 2023-01-08T15:16:04Z

I find gprolog is taking a long time already with [wn_valid], but that like Scryer it has a term_expansion/2 mechanism. So perhaps this solution will apply uniformly across both of these engines.

From the GNU Prolog documentation:

The GNU Prolog compiler (section 4.4) automatically calls expand_term/2 on each Term1 read in. However, in the current release, only DCG transformation are done by the compiler (i.e. term_expansion/2 cannot be used). To use term_expansion/2, it is necessary to call expand_term/2 explicitly.

dcnorris · 2023-01-08T17:11:07Z

Too bad about these gprolog limitations. In fact, using gprolog required increasing 3 resource limits (MAX_ATOM, GLOBALSZ and TRAILSZ) and its computation took 1.4 hours. But still I trust it's a good guide to ISO compliance?

UWN · 2023-01-08T18:28:30Z

... gprolog ... But still I trust it's a good guide to ISO compliance?

Syntax is excellent. Currently the best. Weak points are

expression evaluation (neither unbounded integers nor consistent overflow checking as required)
some non-conforming strict mode (5.1 e). As an example, there is no way to turn off the clpfd extension. In Scryer just don't import clpz.
clpfd cannot be considered a valid extension in the sense of 5.5, in particular because of missing overflow checking (or missing instantiation errors, if you take it the other way round). SICStus shows how this is done with representation errors. In Scryer, this is not an issue due to unbounded integers for (finite) domain bounds.

pmoura · 2023-01-08T18:37:29Z

Too bad about these gprolog limitations.

You can use Logtalk's portable term-expansion mechanism to easily expand a Prolog file into another Prolog file. See:

https://logtalk3.readthedocs.io/en/latest/libraries/hook_objects.html#outputting-term-expansion-results-to-a-stream-or-a-file

This solution will work with all Logtalk supported backend Prolog systems, including Scryer Prolog and GNU Prolog.

P.S. I maintain a Prolog packs registry for WordNet at https://github.com/LogtalkDotOrg/wordnet It includes a pack manifest file for the project you forked.

dcnorris · 2023-01-08T19:01:43Z

Cowabunga:

% Appended the following to 'scryer.pl'
:- discontiguous(sk_1/3).
:- discontiguous(sk_3/3).

term_expansion(sk(X,Y,Z),
               [sk_1(X,Y,Z),
                sk_3(Z,X,Y)]).

sk(X, Y, Z) :-
        (   nonvar(X) ->
            sk_1(X, Y, Z)
        ;   nonvar(Z) ->
            sk_3(Z, X, Y)
        ;   sk_1(X, Y, Z)
        ).

Result:

?- [wn_valid].
output/wn_valid.pl-Output-3.1
Semantic Relations: [at,cs,ent,hyp,ins,mm,mp,ms,sim,vgp]
... etc. etc. ...
   % CPU time: 30.251s
   true.
?-

dcnorris · 2023-01-08T19:06:16Z

With the above application of term_expansion/2, I can close this issue.

dcnorris · 2023-01-08T20:19:42Z

I find gprolog is taking a long time already with [wn_valid], but that like Scryer it has a term_expansion/2 mechanism. So perhaps this solution will apply uniformly across both of these engines.

From the GNU Prolog documentation:

The GNU Prolog compiler (section 4.4) automatically calls expand_term/2 on each Term1 read in. However, in the current release, only DCG transformation are done by the compiler (i.e. term_expansion/2 cannot be used). To use term_expansion/2, it is necessary to call expand_term/2 explicitly.

@pmoura I can confirm that @triska's solution indeed does not work in 'gprolog.pl' as it does in 'scryer.pl.' But may I expect that after loading all the facts, I could simply asserta/1 a clause that invokes expand_term/1?

triska · 2023-01-08T20:30:55Z

May I suggest to use the GNU Prolog repository to file and discuss GNU Prolog-specific issues:

https://github.com/didoudiaz/gprolog ?

didoudiaz · 2023-06-22T16:44:48Z

I have added a support for term_expansion/2 in gprolog which works with the above proposed workaround. However, here is an alternative solution based on findall/setof and keysort (which are generally efficiently implemented). It avoids to duplication of the large sk/3 database (207272 facts) as with the term_expansion/2 solution.

To try it: simply replace in wn_valid.pl the definitons of multikey/1 and checkeys/0 by the following code:

multikey(K, IL) :-
        findall(K-I, sk(I, _, K), L1),
        keysort(L1, L2),
        setof(I, member(K-I, L2), IL),
        length(IL, N),
        N > 1.

check_keys:-
        write('Searching for ambiguous sense keys'), nl,
        multikey(K, IL),
        format('Ambiguous key: ~w\n',[K]),
        member(I,IL),
        g(I,G),
        format('    ->~w (~w)\n',[I,G]),
        false.
check_keys:-
        ok,
        nl.

@dcnorris I had to do several changes in the original code from Eric Kafe (too much SWI-oriented). I also implemented the changes for the `term_expansion/2' solution and for the above solution. I can share them with you if you want.

dcnorris closed this as completed Jan 8, 2023

This was referenced Jan 8, 2023

git describe out of sync didoudiaz/gprolog#30

Open

How "to call expand_term/2 explicitly" (per manual 8.17.2) didoudiaz/gprolog#31

Closed

triska mentioned this issue Jan 31, 2024

loading WordNet synsets file is very slow #2311

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Much slower than SWI checking [wn_valid] from wordnet-prolog #1681

Much slower than SWI checking [wn_valid] from wordnet-prolog #1681

dcnorris commented Jan 7, 2023

UWN commented Jan 8, 2023

UWN commented Jan 8, 2023

dcnorris commented Jan 8, 2023

dcnorris commented Jan 8, 2023

dcnorris commented Jan 8, 2023 •

edited

Loading

triska commented Jan 8, 2023

dcnorris commented Jan 8, 2023

dcnorris commented Jan 8, 2023

triska commented Jan 8, 2023

triska commented Jan 8, 2023

dcnorris commented Jan 8, 2023

pmoura commented Jan 8, 2023

dcnorris commented Jan 8, 2023

UWN commented Jan 8, 2023

pmoura commented Jan 8, 2023

dcnorris commented Jan 8, 2023

dcnorris commented Jan 8, 2023

dcnorris commented Jan 8, 2023

triska commented Jan 8, 2023

didoudiaz commented Jun 22, 2023 •

edited

Loading

Much slower than SWI checking [wn_valid] from wordnet-prolog #1681

Much slower than SWI checking [wn_valid] from wordnet-prolog #1681

Comments

dcnorris commented Jan 7, 2023

UWN commented Jan 8, 2023

UWN commented Jan 8, 2023

dcnorris commented Jan 8, 2023

dcnorris commented Jan 8, 2023

dcnorris commented Jan 8, 2023 • edited Loading

triska commented Jan 8, 2023

dcnorris commented Jan 8, 2023

dcnorris commented Jan 8, 2023

triska commented Jan 8, 2023

triska commented Jan 8, 2023

dcnorris commented Jan 8, 2023

pmoura commented Jan 8, 2023

dcnorris commented Jan 8, 2023

UWN commented Jan 8, 2023

pmoura commented Jan 8, 2023

dcnorris commented Jan 8, 2023

dcnorris commented Jan 8, 2023

dcnorris commented Jan 8, 2023

triska commented Jan 8, 2023

didoudiaz commented Jun 22, 2023 • edited Loading

dcnorris commented Jan 8, 2023 •

edited

Loading

didoudiaz commented Jun 22, 2023 •

edited

Loading