Issues in test cases #83

Closed
12 tasks done
chrdebru opened this issue Feb 13, 2024 · 7 comments
@chrdebru
Contributor

chrdebru commented Feb 13, 2024

  • 2c --> IDs does not exist in the source and the mapping should not generate an output
  • 4b --> subjectMap is term type literal and the mapping should not generate an output
  • 15b --> "spanish" is considered well-formed (though not valid), as it falls under the 5*8ALPHA rule (src: https://www.rfc-editor.org/rfc/rfc5646). I would not vouch for using a list of valid tags, but we may restrict the spec by explicitly referring to the first language rule: language = 2*3ALPHA ["-" extlang]
  • 19a --> assumes the base IRI is the base IRI of the mapping, but we've said that the base IRI is provided as an argument (and there is a proposal to assign base IRIs to triples maps; @dachafra).
  • 19b --> should yield an error. The nq file contains no triple for "Juan Daniel," but that triple cannot be generated. IRI-safe values are only generated for templates, not references. At least, that is the case for R2RML (it is not specified in core). "R2RML always performs percent-encoding when IRIs are generated from string templates. If IRIs need to be generated without percent-encoding, then rr:column should be used instead of rr:template, with an R2RML view that performs the string concatenation." --> implies no percent encoding when using column/reference.
  • 20b should yield an error, as http://example.com/base/path/../Danny is not an absolute IRI.
  • There are no (simple) datatype map tests?
  • 7h should not have .nq files
  • 2g should not have .nq files (missed this, as there is no 2g for CSV files)
  • 10c-JSON -> should be "\\{\\{\\{ {$.['ISO 3166']} \\}\\}\\}"
  • 21a-JSON -> contains an additional POM that is not reflected in the output. 21a CSV and MySQL do not have that POM
  • 2c -> should not have output files
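
The 19b point about templates versus references can be illustrated with a minimal sketch. It is not engine code: `iri_safe`, `from_template`, and `from_reference` are hypothetical helper names, and `urllib.parse.quote` only approximates R2RML's IRI-safe encoding (it also percent-encodes non-ASCII characters, which the R2RML definition would keep). The template expansion ignores escaped curly braces.

```python
from urllib.parse import quote


def iri_safe(value: str) -> str:
    # Approximation of an "IRI-safe" value: safe="" means reserved
    # characters such as "/" are percent-encoded as well.
    return quote(value, safe="")


def from_template(template: str, row: dict) -> str:
    # Template-valued term map: every inserted value is percent-encoded.
    out = template
    for column, value in row.items():
        out = out.replace("{" + column + "}", iri_safe(str(value)))
    return out


def from_reference(column: str, row: dict) -> str:
    # Reference-valued term map: the value is used as-is, no encoding.
    return str(row[column])


row = {"Name": "Juan Daniel"}
print(from_template("http://example.com/{Name}", row))  # http://example.com/Juan%20Daniel
print(from_reference("Name", row))                      # Juan Daniel
```

Under this reading, the reference yields a string containing a space, which cannot become a valid IRI, so no triple can be produced for that row.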
@dachafra added the bug label on Feb 13, 2024
@chrdebru
Contributor Author

Here are some suggestions for simple datatype tests:

  • does rml:datatype work
  • does rml:datatypeMap work
  • does rml:datatypeMap for non-xsd datatypes work (should be accepted) as data validation is a separate process

The last one assumes http://example.com/base is given as input for the base IRI.

In other words, the datatype map "behaves" like an IRI-generating term map.

RMLTC0021a-CSV.zip

@DylanVanAssche
Collaborator

The ones covered by shapes are listed in tests.py:

  • RMLTC0004b: Literal as Term Type in Subject Map
  • RMLTC0007h: Named graph which is not an IRI
  • RMLTC0012c: missing Subject Map
  • RMLTC0012d: 2 Subject Maps
  • RMLTC0015b: invalid language tag --> do you have a proposal for the shapes here to improve it? I am not so sure what you suggest above.
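
For reference, the distinction raised for 15b can be sketched in Python. This is only an illustration under assumptions: it looks at the primary language subtag alone (not the rest of the tag, and not private-use or grandfathered tags), and the function names are hypothetical. The three forms are the alternatives of the `language` production in RFC 5646.

```python
import re

# Alternatives of the RFC 5646 `language` production, primary subtag only.
_FORMS = [
    ("2*3ALPHA", re.compile(r"[A-Za-z]{2,3}")),
    ("4ALPHA", re.compile(r"[A-Za-z]{4}")),
    ("5*8ALPHA", re.compile(r"[A-Za-z]{5,8}")),
]


def primary_subtag_form(tag: str):
    """Return which alternative the primary subtag matches, or None."""
    primary = tag.split("-", 1)[0]
    for name, pattern in _FORMS:
        if pattern.fullmatch(primary):
            return name
    return None


def accepted_by_restricted_rule(tag: str) -> bool:
    # The restriction suggested in this thread: only the first alternative.
    return primary_subtag_form(tag) == "2*3ALPHA"


print(primary_subtag_form("spanish"))          # 5*8ALPHA
print(accepted_by_restricted_rule("spanish"))  # False
print(accepted_by_restricted_rule("es-ES"))    # True
```

A shape could express the same restriction with a pattern on the first subtag, without maintaining a list of valid tags.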

These should already raise an error by the engine if they use the SHACL shapes to validate the mapping.
Currently, we don't have a nice page like https://rml.io/test-cases/ that lists which test cases throw an error.
We should have this in the future, and also add metadata to each test case stating what kind of error is thrown.
Do you have a suggestion for that?

Regarding 19a, that one might be a bit in flux.

Regarding 19b, that one is on purpose, with a data error. The test cases currently assume 'best effort' in that case.
Maybe we need to be super strict here and throw an error, to be on the same level as other test cases, such as a named graph having to be an IRI (0007h).

Regarding 20b, why is that an error? The RFC for URIs (https://www.rfc-editor.org/rfc/rfc3986#section-4.1) says that it can be resolved if needed. It is a valid IRI.
Question is: should we require engines to remove relative path stuff in IRIs like they do for encoding?

Datatype maps are missing yes, feel free to put it in a PR, thanks a lot! If you don't have time, I can do it as well, let me know.

@chrdebru
Contributor Author

chrdebru commented Feb 15, 2024

Aha! Okay, I understood that the test cases (for engines) assume valid mappings. In other words, you now assume that each engine uses the shapes (or something else) to cover all cases. Which is OK, but I misinterpreted that.

19a --> Assuming the base IRI of the mapping is the same as the output is a dangerous assumption. You could leave it for the tests for backward compatibility with previous engines, but I would propose to document it as (either http://www.example.com/base is used as input or the base of the mapping file is assumed to be the base). R2RML explicitly states that the base IRI for the output is passed as a parameter. I prefer David's solution of rml:baseIRI per triples map.

19b --> "best effort" contradicts with "generating no file" so you want to be strict. You retain partial results of the same triples map.

20b --> Based on R2RML --> the string is tested for being an absolute IRI and does not mention anything about trying to compute the absolute IRI where possible. RML can propose this, but I have yet to find this mentioned in the spec. RDF does allow one to store information about <http://example.com/base/path/../Danny>, but I'm pretty certain that <http://example.com/base/path/../Danny> and <http://example.com/base/Danny> are two different resources. That will open a whole can of worms.
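
The R2RML-style behaviour described here (use the string if it is an absolute IRI, otherwise try base IRI + string, otherwise report a data error) can be sketched as follows. This is a simplified illustration, not the spec's algorithm verbatim: `looks_absolute`, `term_to_iri`, and `DataError` are hypothetical names, and the validity check is a rough approximation (a scheme prefix and no obviously forbidden characters).

```python
import re

# Rough approximation of "is an absolute IRI": has a scheme and contains
# none of the characters that can never appear unencoded in an IRI.
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")
_FORBIDDEN = re.compile(r'[\s<>"{}|\\^`]')


class DataError(ValueError):
    """Raised when a value cannot be turned into an absolute IRI."""


def looks_absolute(value: str) -> bool:
    return bool(_SCHEME.match(value)) and not _FORBIDDEN.search(value)


def term_to_iri(value: str, base_iri: str) -> str:
    if looks_absolute(value):
        return value              # used as-is, no normalization
    candidate = base_iri + value  # plain prepending, no resolution
    if looks_absolute(candidate):
        return candidate
    raise DataError(f"cannot generate an IRI from {value!r}")


base = "http://example.com/base/"
print(term_to_iri("Danny", base))
print(term_to_iri("http://example.com/base/path/../Danny", base))
```

In this sketch the string with `..` is returned unchanged, because nothing in the check resolves dot segments; whether such an IRI should be accepted or rejected is exactly the open question in this thread. A value such as "Juan Daniel" raises `DataError` because of the space.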

@DylanVanAssche
Collaborator

> Aha! Okay, I understood that the test cases (for engines) assume valid mappings. In other words, you now assume that each engine uses the shapes (or something else) to cover all cases. Which is OK, but I misinterpreted that.

Well, not required, but engines should not crash on invalid mappings. They can make their life easy by using the shapes, or do it manually.

> Assuming the base IRI of the mapping is the same as the output is a dangerous assumption. You could leave it for the tests for backward compatibility with previous engines, but I would propose to document it as (either http://www.example.com/base is used as input or the base of the mapping file is assumed to be the base). R2RML explicitly states that the base IRI for the output is passed as a parameter. I prefer David's solution of rml:baseIRI per triples map.

I agree here. I just need a way forward. Dropping the testcase seems the best way, we don't want engines to support this behavior. Do you agree?

> 19b --> "best effort" contradicts with "generating no file" so you want to be strict. You retain partial results of the same triples map.

Agreed! Especially this kind of stuff needs to go, the test cases must always follow the same paradigm. Let's change it to an error.

> 20b --> Based on R2RML --> the string is tested for being an absolute IRI and does not mention anything about trying to compute the absolute IRI where possible. RML can propose this, but I have yet to find this mentioned in the spec. RDF does allow one to store information about http://example.com/base/path/../Danny, but I'm pretty certain that http://example.com/base/path/../Danny and http://example.com/base/Danny are two different resources. That will open a whole can of worms.

Exactly! This is a whole can of worms so we need to pick a side here. These URIs are valid ones because they are in the end absolute. So not resolving? But if not resolving we can allow this case right?

@chrdebru
Contributor Author

chrdebru commented Feb 15, 2024

Agree to drop the case.
Agree to remove the file.

Well, turning http://example.com/base/path/../Danny into http://example.com/base/Danny is called IRI normalization. RDF 1.1 states that non-normalized IRIs must be avoided, but it does not say that they must be normalized before ingestion. This makes sense, as we can say different things about the two IRIs. The RFC about IRIs states that one way to test IRI equality is by string-comparison (character-by-character), but other approaches may include normalization. That's what I appreciate about R2RML. If an IRI is absolute, then use that one. If not, test whether the base IRI + IRI is absolute. In other words, R2RML enforces the use of absolute IRIs.

rml:normalizeIRIs true (by default false) can be a solution (an expensive solution, that is).
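
To make the difference concrete, here is a minimal sketch of the dot-segment part of IRI normalization, in the spirit of the "remove dot segments" step of RFC 3986. It is an assumption-laden illustration: `remove_dot_segments` and `normalize_iri` are hypothetical helper names, only paths starting with "/" are considered, and other normalization steps (case, percent-encoding) are left out. `rml:normalizeIRIs` is only the idea floated above, not an existing property as far as this thread shows.

```python
from urllib.parse import urlsplit, urlunsplit


def remove_dot_segments(path: str) -> str:
    # Walk the path segments, dropping "." and letting ".." remove the
    # previous segment (never the leading empty segment of the root).
    segments = path.split("/")
    output = []
    for index, segment in enumerate(segments):
        is_last = index == len(segments) - 1
        if segment == ".":
            if is_last:
                output.append("")  # keep the trailing slash
        elif segment == "..":
            if len(output) > 1:
                output.pop()
            if is_last:
                output.append("")  # keep the trailing slash
        else:
            output.append(segment)
    return "/".join(output)


def normalize_iri(iri: str) -> str:
    parts = urlsplit(iri)
    return urlunsplit(parts._replace(path=remove_dot_segments(parts.path)))


raw = "http://example.com/base/path/../Danny"
print(raw == "http://example.com/base/Danny")  # False: different strings
print(normalize_iri(raw))                      # http://example.com/base/Danny
```

Compared character by character, the two IRIs differ; they only coincide after normalization, which is why doing this for every generated term would add cost.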

@DylanVanAssche
Copy link
Collaborator

I think this is all resolved? @chrdebru

@chrdebru
Contributor Author

chrdebru commented Mar 4, 2024

Yup!
