Data broken when added after matching "rdfs:subClassOf" declaration. #706

Open
Yaakov-Belch opened this issue Dec 26, 2023 · 1 comment

Yaakov-Belch commented Dec 26, 2023

Short description: If you define an ontology containing "rdfs:subClassOf" declarations before adding the data, any new documents whose types match those declarations are broken: FlureeQL queries cannot find them. This makes "rdfs:subClassOf" unusable in most practical use cases.

How to reproduce the bug: One way is to create a new data set on https://data.flur.ee/ and execute three cells in the Quick Start Guide in the following order:

  • First transact the cell that defines the "Humanoid" class and its sub classes: "Yeti" and "Person".
  • Then transact the first cell, which adds data to the knowledge graph.
  • Finally, execute the query that searches for all "Humanoid" entries. This query will come up with no answers.
  • You can add another query searching for Yetis. This query will also come up with no answers.

By contrast, when you execute these three steps in their original order (add data first, define ontology later, query last), you will get responses including Yetis and Persons.
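The two orderings can also be compared with a small script. This is only a sketch: `send` is a hypothetical callable that you would supply to submit one payload to your ledger and return the response. No real Fluree endpoint or client API is assumed here, and the step names are illustrative.

```python
from typing import Callable, Dict, List

# Hypothetical step names; each one stands for a JSON payload from this issue.
ONTOLOGY, DATA, QUERY = "ontology", "data", "query"

ORDER_FAILING = [ONTOLOGY, DATA, QUERY]  # order reported as broken in this issue
ORDER_WORKING = [DATA, ONTOLOGY, QUERY]  # original order of the notebook cells


def run_steps(
    send: Callable[[str, dict], list],
    payloads: Dict[str, dict],
    order: List[str],
) -> list:
    """Send the payloads in the given order and return the last response."""
    result: list = []
    for step in order:
        result = send(step, payloads[step])
    return result
```

Running `run_steps` once per ordering, each time against a fresh ledger, should make the difference visible: the response of the final query step is what the function returns.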

Here are the four steps spelled out:

Define the ontology:

{
  "@context": {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#"
  },
  "ledger": "fluree-jld/387028092978056",
  "insert": [
      {
          "@id": "Humanoid",
          "@type": "rdfs:Class"
      },
      {
          "@id": "Yeti",
          "rdfs:subClassOf": { "@id": "Humanoid" }
      },
      {
          "@id": "Person",
          "rdfs:subClassOf": { "@id": "Humanoid" }
      }
  ]
}

Add the data:

{
  "ledger": "fluree-jld/387028092978056",
  "insert": [
    {
      "@id": "freddy",
      "@type": "Yeti",
      "name": "Freddy",
      "age": 4,
      "verified": true,
      "friends": [
        {
          "@id": "letty",
          "@type": "Yeti",
          "name": "Leticia",
          "nickname": "Letty",
          "age": 2
        },
        {
          "@id": "betty",
          "@type": "Yeti",
          "name": "Betty",
          "age": 82
        },
        {
          "@id": "andrew",
          "@type": "Person",
          "name": "Andrew",
          "age": 35
        }
      ]
    }
  ]
}

Search for Humanoids:

{
  "from": "fluree-jld/387028092978056",
  "where": {
    "@id": "?s",
    "@type": "Humanoid"
  },
  "select": { "?s": ["*"] }
}

This query returns an empty set: [].

Search for Yetis:

{
  "from": "fluree-jld/387028092978056",
  "where": {
    "@id": "?s",
    "@type": "Yeti"
  },
  "select": { "?s": ["*"] }
}

This query also returns an empty set: [].
This indicates that the ontology is not merely being ignored; it is corrupting the data.
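For reference, here is a minimal sketch of the result I would expect from rdfs:subClassOf semantics, written as a tiny in-memory triple list in Python. It does not use Fluree and says nothing about how Fluree implements inference; the function name and the triple representation are my own assumptions for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def instances_of(triples: Iterable[Triple], cls: str) -> List[str]:
    """Return subjects typed as `cls` or as any transitive subclass of `cls`."""
    triples = list(triples)
    # Map each class to its direct subclasses.
    children: Dict[str, Set[str]] = {}
    for s, p, o in triples:
        if p == "rdfs:subClassOf":
            children.setdefault(o, set()).add(s)
    # Collect `cls` plus every class reachable through subclass links.
    wanted, stack = {cls}, [cls]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in wanted:
                wanted.add(child)
                stack.append(child)
    return sorted(s for s, p, o in triples if p == "@type" and o in wanted)


ontology = [
    ("Yeti", "rdfs:subClassOf", "Humanoid"),
    ("Person", "rdfs:subClassOf", "Humanoid"),
]
data = [
    ("freddy", "@type", "Yeti"),
    ("letty", "@type", "Yeti"),
    ("betty", "@type", "Yeti"),
    ("andrew", "@type", "Person"),
]

# The answer should not depend on which group of statements comes first.
assert instances_of(ontology + data, "Humanoid") == instances_of(data + ontology, "Humanoid")
print(instances_of(ontology + data, "Humanoid"))  # ['andrew', 'betty', 'freddy', 'letty']
print(instances_of(ontology + data, "Yeti"))      # ['betty', 'freddy', 'letty']
```

In other words, both queries above should return non-empty results in either transaction order.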

@Jackamus29

Hey @Yaakov-Belch!
Thanks for all the excellent feedback in Discord and this Issue!

I've investigated, reproduced, and created a specific bug ticket that should address this issue.
Here's the bug ticket if you want to track it: fluree/core#82
I'll be sure to update you here, regardless, when this gets resolved.

Just to give you some insight into what's going on under the hood, I found two separate obstacles when investigating this.
The first has to do with how Fluree treats ontological data (rdfs:Class, rdf:Property, etc.). Specifically, if you want Fluree to treat a piece of data as an rdfs:Class, you currently need to be explicit about that the first time Fluree encounters that data.
So, in the case of the Getting Started Notebook, the inferencing query works if you follow the order of the notebook cells because the entity with @id of Yeti is first encountered (transacted) as the value of an @type property. Using it as the type of a data node is one way to explicitly tell Fluree that "Yeti" is an rdfs:Class.
If you want to transact the ontology data before you insert instance data, you just need to be sure to be explicit about typing it as such. Here's what the transaction for your first step would look like, with an "@type": "rdfs:Class" line added to both the Yeti and the Person nodes:

{
  "@context": {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#"
  },
  "ledger": "fluree-jld/387028092978056",
  "insert": [
      {
          "@id": "Humanoid",
          "@type": "rdfs:Class"
      },
      {
          "@id": "Yeti",
          "@type": "rdfs:Class",
          "rdfs:subClassOf": { "@id": "Humanoid" }
      },
      {
          "@id": "Person",
          "@type": "rdfs:Class",
          "rdfs:subClassOf": { "@id": "Humanoid" }
      }
  ]
}

Here, you can see that we're explicitly telling Fluree that Yeti and Person are rdfs:Classes, so a query for Yetis or Persons should return instances of those classes as expected (after inserting the data, of course).
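If you have a larger ontology, the same explicit typing could be applied programmatically before transacting. The following is only a sketch in Python: the helper name is made up, it assumes "insert" is a list of node objects, and it assumes the "rdfs" prefix is defined in the transaction's @context as above. It only adds the explicit typing and does nothing else to the transaction.

```python
import copy

SUBCLASS_KEY = "rdfs:subClassOf"  # assumes the "rdfs" prefix from the @context
CLASS_TYPE = "rdfs:Class"


def add_explicit_class_types(txn: dict) -> dict:
    """Return a copy of `txn` in which every inserted node that declares
    rdfs:subClassOf also carries "@type": "rdfs:Class"."""
    fixed = copy.deepcopy(txn)
    for node in fixed.get("insert", []):
        if SUBCLASS_KEY not in node:
            continue
        types = node.get("@type", [])
        if isinstance(types, str):
            types = [types]
        if CLASS_TYPE not in types:
            types = types + [CLASS_TYPE]
        # Keep a plain string when there is only one type, as in the JSON above.
        node["@type"] = types[0] if len(types) == 1 else types
    return fixed
```

Calling `add_explicit_class_types` on the original ontology transaction from this issue would leave the Humanoid node unchanged and add the typing to the Yeti and Person nodes, without mutating the input.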

The second obstacle I found while investigating is that, even after making the above update and transacting in the order you specified (ontology, then data), the inferencing query still does not return Persons and Yetis when querying for Humanoids. This bug is captured by the ticket I created and should be resolved soon.
I'll keep you posted here 🙂
