Incorrect behavior on string literals from csv-import #73

Closed
andreschamschurko opened this issue Jun 26, 2021 · 23 comments

Comments

@andreschamschurko

Considering the following containments.csv:

"""chili sauce""","""chili pepper"""
"""beer""","""alcohol"""
juice,water

The input KB in Rulewerk syntax is:

@source contains[2]: load-csv("/home/.../containments.csv") .

The reasoning with the rulewerk client version 0.8.0 produces the following trace:

rulewerk> @query contains(?X, ?Y) .
?X -> "chili sauce", ?Y -> "chili pepper"
?X -> "beer", ?Y -> "alcohol"
?X -> juice, ?Y -> water
3 result(s) in 9ms. Results are sound and complete.
rulewerk> @query contains(?X, "chili pepper") .                                                            
0 result(s) in 0ms. Results are sound and complete.
rulewerk> @query contains(?X, "alcohol") .                                                             
0 result(s) in 0ms. Results are sound and complete.
rulewerk> @query contains(?X, water) .                                                                            
?X -> juice
1 result(s) in 0ms. Results are sound and complete.

The expected behavior can be verified with the previous version 0.7.0:

rulewerk> @query contains(?X, ?Y) .
?X -> "chili sauce", ?Y -> "chili pepper"
?X -> "beer", ?Y -> "alcohol"
?X -> juice, ?Y -> water
3 result(s) in 1ms. Results are sound and complete.
rulewerk> @query contains(?X, "chili pepper") . 
?X -> "chili sauce"
1 result(s) in 0ms. Results are sound and complete.
rulewerk> @query contains(?X, "alcohol") .                                                                             
?X -> "beer"
1 result(s) in 0ms. Results are sound and complete.
rulewerk> @query contains(?X, water) .                                                                                 
?X -> juice
1 result(s) in 0ms. Results are sound and complete.

It seems that the facts containing string literals are not processed correctly.
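To see which value each field of the file above denotes under standard CSV quoting, the following sketch parses the same three lines with Python's standard csv module. It is only an illustration: VLog's own CSV reader is a separate implementation and may treat edge cases differently. The names CSV_TEXT and read_rows are made up for this example.

```python
import csv
import io

# The three lines of containments.csv from the report above.
CSV_TEXT = (
    '"""chili sauce""","""chili pepper"""\n'
    '"""beer""","""alcohol"""\n'
    'juice,water\n'
)


def read_rows(text):
    """Parse CSV text and return the decoded field values, row by row."""
    return [row for row in csv.reader(io.StringIO(text))]


rows = read_rows(CSV_TEXT)
for row in rows:
    # Inside a quoted field, a doubled quote ("") decodes to one literal quote.
    print(row)
```

With standard quoting rules, the fields of the first two rows keep their inner double quotes after decoding, so they arrive as quoted strings without any datatype, while juice and water arrive as bare constants.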

@andreschamschurko andreschamschurko changed the title Missing facts from csv-import Incorrect behavior on stron literals from csv-import Jun 26, 2021
@andreschamschurko andreschamschurko changed the title Incorrect behavior on stron literals from csv-import Incorrect behavior on string literals from csv-import Jun 26, 2021
@CerielJacobs
Contributor

CerielJacobs commented Jun 28, 2021 via email

@CerielJacobs
Contributor

CerielJacobs commented Jun 29, 2021 via email

@andreschamschurko
Author

I get 0 results:

rulewerk> @query contains(?X, "chili pepper"^^<http://www.w3.org/2001/XMLSchema#string>)                                                                                                                           
0 result(s) in 2ms. Results are sound and complete.

Another thing I tried was asserting all the facts imported from the CSV file, and I got this:

rulewerk> @query contains(?X,?Y).                                                                                                                                                                            
?X -> "chili sauce", ?Y -> "chili pepper"
?X -> "beer", ?Y -> "alcohol"
?X -> juice, ?Y -> water
3 result(s) in 0ms. Results are sound and complete.
rulewerk> @assert contains("chili sauce", "chili pepper") .
Asserted 1 fact(s) and 0 rule(s).
rulewerk> @assert contains("beer", "alcohol") .
Asserted 1 fact(s) and 0 rule(s).
rulewerk> @assert contains(juice, water) .                                                                                                                                                                     
Asserted 1 fact(s) and 0 rule(s).
rulewerk> @reason .
Loading and materializing inferences ...
... finished in 0ms (0ms CPU time).
rulewerk> @query contains(?X,?Y).                                                                                                   
?X -> juice, ?Y -> water
?X -> "chili sauce", ?Y -> "chili pepper"
?X -> "beer", ?Y -> "alcohol"
?X -> "chili sauce", ?Y -> "chili pepper"
?X -> "beer", ?Y -> "alcohol"
5 result(s) in 0ms. Results are sound and complete.

@CerielJacobs
Contributor

CerielJacobs commented Jun 29, 2021 via email

@mkroetzsch
Collaborator

mkroetzsch commented Jun 29, 2021

I think this is not a reasoning issue but something related to the API. From what I know, Rulewerk did not change any code related to how strings are represented, but updating the VLog version leads to different behaviour (this needs to be verified; I understand that the problem so far was seen by using different versions of Rulewerk and VLog, so it is not obvious what causes the problem). But we can investigate if there might have been any change in how data was passed from Rulewerk to VLog. If we are passing the same data but get different results, then it can still be that the new behaviour is "correct" and Rulewerk was written for the old "incorrect" behaviour, of course. Let's see what we can find out ...

@CerielJacobs
Contributor

@mkroetzsch Could this issue be related to issue #55?

@CerielJacobs
Contributor

When reading strings from a .csv file, VLog does not convert them to "..."^^<http://www.w3.org/2001/XMLSchema#string>.

@CerielJacobs
Contributor

@andreschamschurko Could you please try changing the .csv file such that all strings have ^^http://www.w3.org/2001/XMLSchema#string appended and then see what happens?

@andreschamschurko
Author

With the following csv:

"chili sauce"^^http://www.w3.org/2001/XMLSchema#string,"chili pepper"^^http://www.w3.org/2001/XMLSchema#string
"""beer"""^^http://www.w3.org/2001/XMLSchema#string,"""alcohol"""^^http://www.w3.org/2001/XMLSchema#string
juice,water

I get:

rulewerk> @query contains(?X, ?Y) .
?X -> <chili sauce"^^http://www.w3.org/2001/XMLSchema#string>, ?Y -> <chili pepper"^^http://www.w3.org/2001/XMLSchema#string>
Error: VLog returned a constant name '"beer""^^http://www.w3.org/2001/XMLSchema#string' that Rulewerk cannot make sense of.
rulewerk> @query contains(?X, "chili pepper") .                                                             
0 result(s) in 0ms. Results are sound and complete.
rulewerk> @query contains(?X, "alcohol") .                                                             
0 result(s) in 0ms. Results are sound and complete.
rulewerk> @query contains(?X, "chili pepper"^^<http://www.w3.org/2001/XMLSchema#string>)                    
0 result(s) in 0ms. Results are sound and complete.
rulewerk> @query contains(?X, "alcohol"^^<http://www.w3.org/2001/XMLSchema#string>)                    
0 result(s) in 0ms. Results are sound and complete.

@CerielJacobs
Contributor

CerielJacobs commented Jul 1, 2021

I think it works with the following containments.csv:

"""chili sauce""^^http://www.w3.org/2001/XMLSchema#string","""chili pepper""^^http://www.w3.org/2001/XMLSchema#string"
"""beer""^^http://www.w3.org/2001/XMLSchema#string","""alcohol""^^http://www.w3.org/2001/XMLSchema#string"
juice,water

Note that http://www.w3.org/2001/XMLSchema#string should have < and > delimiters.

@andreschamschurko
Author

I still get the error:

rulewerk> @query contains(?X, ?Y).                                                                       
Error: VLog returned a constant name '"chili sauce"^^http://www.w3.org/2001/XMLSchema#string' that Rulewerk cannot make sense of.

@CerielJacobs
Contributor

I think the < > delimiters are missing in your .csv file. I don't know how to disable markdown in these comments, so you don't see them, but they are there in what I wrote. I'll try again:

"""chili sauce""^^<http://www.w3.org/2001/XMLSchema#string>","""chili pepper""^^<http://www.w3.org/2001/XMLSchema#string>"
"""beer""^^<http://www.w3.org/2001/XMLSchema#string>","""alcohol""^^<http://www.w3.org/2001/XMLSchema#string>"
juice,water
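
Writing these lines by hand is error-prone because of the doubled quotes. As a rough sketch, a file in this shape could be generated with Python's standard csv module, which doubles embedded quotes on its own. The helper names typed_string and write_rows are hypothetical, and the sketch assumes the string values themselves contain no double quotes or backslashes.

```python
import csv
import io

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def typed_string(value):
    """Return value as a typed literal with the < > delimiters in place."""
    return f'"{value}"^^<{XSD_STRING}>'


def write_rows(rows):
    """Serialize rows to CSV text; csv.writer quotes fields containing quotes."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = write_rows([
    [typed_string("chili sauce"), typed_string("chili pepper")],
    [typed_string("beer"), typed_string("alcohol")],
    ["juice", "water"],  # plain constants are written unchanged
])
print(csv_text, end="")
```

The generated text reproduces the three lines shown above, including the < > delimiters around the datatype IRI.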

@andreschamschurko
Author

You are right, I misunderstood the sentence about the delimiters.
Now everything is there:

rulewerk> @query contains(?X,?Y).
?X -> "chili sauce", ?Y -> "chili pepper"
?X -> "beer", ?Y -> "alcohol"
?X -> juice, ?Y -> water
3 result(s) in 0ms. Results are sound and complete.
rulewerk> @query contains(?X, "chili pepper")                 
?X -> "chili sauce"
1 result(s) in 0ms. Results are sound and complete.
rulewerk> @query contains(?X, "alcohol")                 
?X -> "beer"
1 result(s) in 0ms. Results are sound and complete.

@CerielJacobs
Contributor

So, I'm pretty sure that the difference between Rulewerk 0.7.0 and 0.8.0 comes from Rulewerk's changed handling of strings, see the discussion in issue #55. I'm closing this issue.

@mkroetzsch
Collaborator

There is still a problem there. RDF, OWL, and Rulewerk do not distinguish "foo" from "foo"^^<http://www.w3.org/2001/XMLSchema#string>. In VLog, however, it seems that these two forms are distinct. Rulewerk could introduce a special auxiliary datatype for VLog plain strings and then represent VLog's "foo" as something like "foo"^^<http://rulewerk.semanticweb.org/vlog-plain-string>, but is this really a good solution?

@CerielJacobs
Contributor

Ah yes, re-opening.

@CerielJacobs CerielJacobs reopened this Jul 2, 2021
@mkroetzsch
Collaborator

It's mainly a design decision, whether VLog wants to use the RDF model for data values or something more general with literals that are not native to RDF. In the latter case, Rulewerk would need to find a clean way to represent any additional kinds of values in its RDF-based type system. This can be done, but I am not sure if it is convenient.

RDF itself used to distinguish "foo" from "foo"^^<http://www.w3.org/2001/XMLSchema#string> in version 1.0. The untyped form only fell out of use in 2014, when RDF 1.1 unified the handling of literals. What remains different are language-tagged strings such as "foo"@en, which also have a type of their own but are never written with an explicit type.
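
A small model may make the RDF 1.1 view concrete. This is only a sketch of the data model, not of Rulewerk's or VLog's internals; the Literal class and the three constructor helpers are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


@dataclass(frozen=True)
class Literal:
    """An RDF 1.1 style literal: every literal carries a datatype."""
    lexical: str
    datatype: str = XSD_STRING  # an untyped "foo" defaults to xsd:string
    language: Optional[str] = None


def plain(lexical):
    """Model of "foo" written without a datatype."""
    return Literal(lexical)


def typed(lexical, datatype):
    """Model of "foo"^^<datatype>."""
    return Literal(lexical, datatype)


def lang_tagged(lexical, language):
    """Model of "foo"@en; the type is implied, never written out."""
    return Literal(lexical, RDF_LANG_STRING, language.lower())


print(plain("foo") == typed("foo", XSD_STRING))
print(plain("foo") == lang_tagged("foo", "en"))
```

In this model the plain and the explicitly typed string are the same value, while the language-tagged string stays distinct, which is the behavior Rulewerk assumes and VLog, at this point in the discussion, does not share.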

@CerielJacobs
Contributor

Maybe we should add an RDF mode to VLog, so that, when running in that mode, "foo" is automatically converted to "foo"^^<http://www.w3.org/2001/XMLSchema#string>.

@mkroetzsch
Collaborator

This could be a way, but I wonder if there are any cases where the non-RDF mode would be of interest. The two distinct forms of strings can only play a role if one also uses data of the form "foo"^^<http://www.w3.org/2001/XMLSchema#string>, but in this case it seems almost certain that one is using RDF and would want the RDF mode.

Conversely, even in RDF mode, it would be fine for users if their "foo"^^<http://www.w3.org/2001/XMLSchema#string> turned into "foo" internally and in results. So maybe one should just represent "foo"^^<http://www.w3.org/2001/XMLSchema#string> as "foo" in all cases? In non-RDF applications, nothing would change. Of course, one would have to apply this simplification (removing xsd:string) everywhere string constants may occur, including in SPARQL results and Trident inputs, and I don't know how much work this would be.
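
The simplification proposed here can be sketched as a function over serialized constant names. This is a minimal illustration of the idea, not the change that was eventually made in VLog; the function name simplify_string_constant is hypothetical, and the sketch assumes constants are serialized exactly as written in this thread.

```python
XSD_STRING_SUFFIX = "^^<http://www.w3.org/2001/XMLSchema#string>"


def simplify_string_constant(name):
    """Drop an explicit xsd:string datatype; leave every other name alone."""
    quoted_suffix = '"' + XSD_STRING_SUFFIX
    # Require an opening quote plus a closing quote directly before the suffix.
    if (
        len(name) >= len(XSD_STRING_SUFFIX) + 2
        and name.startswith('"')
        and name.endswith(quoted_suffix)
    ):
        return name[: -len(XSD_STRING_SUFFIX)]
    return name


for constant in [
    '"foo"^^<http://www.w3.org/2001/XMLSchema#string>',
    '"foo"',
    '"42"^^<http://www.w3.org/2001/XMLSchema#integer>',
    '"foo"@en',
    'juice',
]:
    print(constant, "->", simplify_string_constant(constant))
```

Only the explicitly typed xsd:string form is rewritten; plain strings, other datatypes, language-tagged strings, and bare constants pass through unchanged, so applying the function twice gives the same result as applying it once.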

@CerielJacobs
Contributor

That sounds like a plan. I'm not working today, though, so this will have to wait until next week.

@CerielJacobs
Contributor

And this would also solve issue #55.

CerielJacobs added a commit that referenced this issue Jul 7, 2021
@irina-dragoste
Collaborator

irina-dragoste commented Sep 24, 2021

So, once this issue is solved: when loading an RDF file containing the constants "foo1" and "foo2"^^<http://www.w3.org/2001/XMLSchema#string> and then querying for them, do we expect the resulting karmaresearch.vlog.Term objects to have the name field values "foo1" and "foo2"?

@irina-dragoste
Collaborator

Fixed. SPARQL results remain to be tested; see open issue knowsys/rulewerk#223.
