Support for /commit (e.g. /block) #333

Closed
aaj3f opened this issue Jan 9, 2023 · 19 comments · Fixed by #379

Comments

@aaj3f
Contributor

aaj3f commented Jan 9, 2023

Description

  • It would be valuable to be able to discover all of the metadata surrounding an individual commit (formerly "block"), that is, all the data included in a commit besides the actual substance of the submitted transaction(s)

Acceptance Criteria

  • An end user can submit to the endpoint one of various pieces of data that identify a single commit or a range of commits.
  • This could include: a commit t value, a range of t values, an ISO 8601 wall clock time, or an epoch instant value
  • The returned result would provide all metadata available to that commit (e.g. t value, previous hash, txn signature, commit instant...)
  • The behavior should align to some degree with the previous /block API behavior, documented here (although, unhelpfully, without return result examples): https://developers.flur.ee/docs/overview/query/block_query/
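
For illustration, the identifiers listed above could be normalized into one shape before any lookup happens. This is only a minimal sketch: the request keys (`:t`, `:time`) and the function name `parse-selector` are assumptions made for this example, not part of any existing Fluree API.

```clojure
(import 'java.time.Instant)

;; Hypothetical request shapes (assumptions, not an existing Fluree API):
;;   {:t 5}                          a single commit by t value
;;   {:t [2 5]}                      an inclusive range of t values
;;   {:time "2023-01-09T00:00:00Z"}  ISO 8601 wall clock time
;;   {:time 1674253134998}           epoch milliseconds
(defn parse-selector
  "Normalizes a request map into a {:kind ...} map, one shape per identifier type."
  [request]
  (let [t  (:t request)
        ts (:time request)]
    (cond
      ;; single t value
      (integer? t)
      {:kind :t, :t t}

      ;; two-element vector of t values
      (and (vector? t) (= 2 (count t)) (every? integer? t))
      {:kind :t-range, :from (first t), :to (second t)}

      ;; ISO 8601 string, converted to epoch milliseconds
      (string? ts)
      {:kind :instant, :epoch-ms (.toEpochMilli (Instant/parse ^String ts))}

      ;; already an epoch instant
      (integer? ts)
      {:kind :instant, :epoch-ms ts}

      :else
      (throw (ex-info "Unrecognized commit selector" {:request request})))))
```

Both time forms collapse to the same `:instant` kind, so whatever resolves an instant to a commit would only need to handle epoch milliseconds.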

Implementation Details

  • There has been a suggestion by Brian that we could merge the functionality of /history and /commit (formerly "/block") into one API where the syntax allows for parsing and executing down two different paths
  • The documentation referenced above also describes a "prettyPrint" keyword. For reference, "prettyPrint" returned the retractions/assertions associated with the block as JSON instead of tuples. Up for design consideration whether the endpoint should/must return the actual commit data in addition to the metadata
@bplatz
Contributor

bplatz commented Jan 9, 2023

Since this is for a commit "read" operation, we should use a different name. We already use "stage" + "commit" to transact data in the library APIs, and I think the HTTP API interface should strive to parallel the library API.

@bplatz
Contributor

bplatz commented Jan 9, 2023

Could also consider the history API being more flexible and supporting commit reads, and not having this as a separate API at all. History API ticket is #332

@dpetran
Contributor

dpetran commented Jan 18, 2023

Is the plan to transact the commit data into the db? Because right now it's just written to disk and the only way I know to get it is to scan through commits until we find the correct one. Or will we be creating another storage type for commits?

@mpoffald
Contributor

Related question: it looks like you can get commit metadata for the latest-db of a given branch by calling status on a ledger. It looks like it basically reconstructs the commit metadata, rather than reaching into storage or anything https://github.com/fluree/db/blob/main/src/fluree/db/ledger/json_ld.cljc#L62

Could this metadata be retrieved using something similar, just generalized to look for a particular t instead of always grabbing the latest for a branch?

That is, could we get away with not reaching for it in storage anywhere?

@mpoffald
Contributor

To be clear, I'm not sure how I feel about reconstructing on the fly rather than building these once (if we ever iterate on what's in a commit it'll have to change in more than one place), but for the moment I'm just wondering if this would even work

@dpetran
Contributor

dpetran commented Jan 18, 2023

If we store all the metadata for each commit in memory after we create it, we'll eventually use all the available memory, so we'll need a way to cycle some of that data out of cache and onto disk in a way that provides efficient random access, which is basically what our db does. I would vote to just index the commit metadata along with the data, so we can query it just like any other data.

Metadata:

  • transaction - the raw transaction data
  • signature - the signature that accompanied the transaction
  • previous - the address of the previous commit
    and maybe
  • v - the version of the commit
  • size - the size in bytes of the transaction

@mpoffald mpoffald changed the title Support for /commit (e.g. /block) on HTTP API [ledger-level] Support for /commit (e.g. /block) on HTTP API Jan 19, 2023
@dpetran
Contributor

dpetran commented Jan 20, 2023

@mpoffald and I cannot query for any of the commit flakes for some undetermined-at-this-time reason.

  (def conn @(fluree/connect {:method :memory
                              :defaults
                              {:context
                               {:id "@id"
                                :type "@type"
                                :xsd "http://www.w3.org/2001/XMLSchema#"
                                :schema "http://schema.org/"
                                :sh "http://www.w3.org/ns/shacl#"
                                :rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                                :rdfs "http://www.w3.org/2000/01/rdf-schema#"
                                :wiki "https://www.wikidata.org/wiki/"
                                :skos "http://www.w3.org/2008/05/skos#"
                                :f "https://ns.flur.ee/ledger#"}
                               :did
                               {:id "did:fluree:Tf1sbdG5vZrMvdcfpVN9btp3nkYPKgWNu2H",
                                :public "020154853f38da17df4f1ad6e5cd1c424bf82066708ece89514b975c807aeccc60",
                                :private "ccd238808e29bda00a62cd27c592d1ef16efdf1f473854e6b276409f36435aed"}}}))

  (def ledger @(fluree/create conn "dan" {:context {:dan "fluree:dan/"}}))

  (def db1 @(fluree/stage (fluree/db ledger) {:id "fluree:dan/heyo"
                                              :dan/x "foo-1"
                                              :dan/y "bar-1"}))

  (def db1* @(fluree/commit! ledger db1))

;; for each subject in novelty, do an index-range query to return the flakes for that subject. Note all the empty (!!) results
  (->> db1* :novelty :spot (map flake/s) (into #{})
       (map (fn [sid] (async/<!! (query-range/index-range db1* :spot = [sid])))))
;; =>
([#Flake [0 0 "@id" 1 -1 true nil]]
   []
   [#Flake [1001 0 "https://ns.flur.ee/ledger#Context" 1 -1 true nil] #Flake [1001 200 203 0 -1 true nil]]
   [#Flake [211106232532992 0 "fluree:dan/heyo" 1 -1 true nil] #Flake [211106232532992 1002 "foo-1" 1 -1 true nil] #Flake [211106232532992 1003 "bar-1" 1 -1 true nil]]
   []
   []
   []
   []
   [#Flake [1002 0 "fluree:dan/x" 1 -1 true nil]]
   []
   [#Flake [150 0 "fluree-default-context" 1 -1 true nil] #Flake [150 200 1001 0 -1 true nil] #Flake [150 250 "{\"schema\":\"http://schema.org/\",\"dan\":\"fluree:dan/\",\"wiki\":\"https://www.wikidata.org/wiki/\",\"xsd\":\"http://www.w3.org/2001/XMLSchema#\",\"type\":\"@type\",\"rdfs\":\"http://www.w3.org/2000/01/rdf-schema#\",\"id\":\"@id\",\"f\":\"https://ns.flur.ee/ledger#\",\"sh\":\"http://www.w3.org/ns/shacl#\",\"skos\":\"http://www.w3.org/2008/05/skos#\",\"rdf\":\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"}" 1 -1 true nil]]
   [#Flake [1003 0 "fluree:dan/y" 1 -1 true nil]]
   [#Flake [250 0 "https://ns.flur.ee/ledger#context" 1 -1 true nil]]
   []
   []
   []
   [#Flake [200 0 "http://www.w3.org/1999/02/22-rdf-syntax-ns#type" 1 -1 true nil]]
   []
   [#Flake [203 0 "http://www.w3.org/2000/01/rdf-schema#Class" 1 -1 true nil]])

There are commit flakes in novelty:

(-> db1* :novelty :spot)
;; count: 30
  #{#Flake [211106232532992 0 "fluree:dan/heyo" 1 -1 true nil]
    #Flake [211106232532992 1002 "foo-1" 1 -1 true nil]
    #Flake [211106232532992 1003 "bar-1" 1 -1 true nil]
    #Flake [35184372089835 0 "did:fluree:Tf1sbdG5vZrMvdcfpVN9btp3nkYPKgWNu2H" 1 -1 true nil]
    #Flake [35184372089834 55 1674253134998 4 -1 true nil]
    #Flake [35184372089834 57 nil 0 -1 true nil]
    #Flake [35184372089833 0 "fluree:memory://dd6e4a87bffe2a2fd9131eb306e81ea55accdb983ef4ab11f041b9e6c8e1209b" 1 -1 true nil]
    #Flake [1003 0 "fluree:dan/y" 1 -1 true nil]
    #Flake [1002 0 "fluree:dan/x" 1 -1 true nil]
    #Flake [1001 0 "https://ns.flur.ee/ledger#Context" 1 -1 true nil]
    #Flake [1001 200 203 0 -1 true nil]
    #Flake [250 0 "https://ns.flur.ee/ledger#context" 1 -1 true nil]
    #Flake [203 0 "http://www.w3.org/2000/01/rdf-schema#Class" 1 -1 true nil]
    #Flake [200 0 "http://www.w3.org/1999/02/22-rdf-syntax-ns#type" 1 -1 true nil]
    #Flake [150 0 "fluree-default-context" 1 -1 true nil]
    #Flake [150 200 1001 0 -1 true nil]
    #Flake [150 250 "{\"schema\":\"http://schema.org/\",\"dan\":\"fluree:dan/\",\"wiki\":\"https://www.wikidata.org/wiki/\",\"xsd\":\"http://www.w3.org/2001/XMLSchema#\",\"type\":\"@type\",\"rdfs\":\"http://www.w3.org/2000/01/rdf-schema#\",\"id\":\"@id\",\"f\":\"https://ns.flur.ee/ledger#\",\"sh\":\"http://www.w3.org/ns/shacl#\",\"skos\":\"http://www.w3.org/2008/05/skos#\",\"rdf\":\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"}" 1 -1 true nil]
    #Flake [57 0 "https://www.w3.org/2018/credentials#issuer" 0 -1 true nil]
    #Flake [57 200 0 0 -1 true nil]
    #Flake [56 0 "https://ns.flur.ee/ledger#tag" 0 -1 true nil]
    #Flake [55 0 "https://ns.flur.ee/ledger#time" 0 -1 true nil]
    #Flake [54 0 "https://ns.flur.ee/ledger#message" 0 -1 true nil]
    #Flake [53 0 "https://ns.flur.ee/ledger#commit" 0 -1 true nil]
    #Flake [53 200 0 0 -1 true nil]
    #Flake [51 0 "https://ns.flur.ee/ledger#time" 0 -1 true nil]
    #Flake [51 200 0 0 -1 true nil]
    #Flake [0 0 "@id" 1 -1 true nil]
    #Flake [-1 0 "fluree:db:sha256:bpzrz6ivc4lmfs2omm34xlqnyl7ykeoqrrqyfkncgjrpno2auywi" 1 -1 true nil]
    #Flake [-1 51 35184372089833 0 -1 true nil] #Flake [-1 53 35184372089834 0 -1 true nil]}

But for some reason these are not returned by index-range:

(count (async/<!! (query-range/index-range db1* :spot = [])))
;; 14

@zonotope
Contributor

I think this is a caching problem. We cache index nodes so we don't have to read them from storage multiple times to read back the same values. In previous versions of Fluree, the caching key was based on the id, tempid, and tt-id of the unresolved node. The unresolved nodes inherited tempid and tt-id from the db root, and each transaction updated the db root's tt-id and tempid. It looks like something isn't updating those properly when data is staged in json-ld, so when we come back to resolve those nodes again on the staged db, we get the old values back from the cache, and those cached nodes are missing the new flakes.

At least, I think this is what's happening. If it is what's happening, we'll need to implement something to cache index nodes on that gets updated when novelty changes.
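
The suspected failure mode can be reproduced with a toy cache. This is a sketch under assumptions, not Fluree's actual cache code: `node-cache`, `resolve-node`, and the node map shape are invented here purely to show why a key that does not change with novelty returns stale nodes.

```clojure
;; Toy model of an index-node cache keyed on [node-id tt-id].
;; All names are illustrative assumptions; this is not Fluree's implementation.
(def node-cache (atom {}))

(defn resolve-node
  "Returns the flakes for a node. Novelty is merged in only on a cache miss."
  [{:keys [id tt-id base-flakes]} novelty]
  (let [k [id tt-id]]
    (if (contains? @node-cache k)
      (get @node-cache k)
      (let [resolved (into base-flakes novelty)]
        (swap! node-cache assoc k resolved)
        resolved))))

(def node {:id :spot-root, :tt-id nil, :base-flakes #{[1 :x "a"]}})

;; First read, before any commit flakes exist: cached under [:spot-root nil].
(resolve-node node #{})

;; Novelty grew, but tt-id did not change, so the stale cached value comes back.
(def stale (resolve-node node #{[2 :commit "c1"]}))

;; A fresh tt-id changes the cache key, so novelty is merged in again.
(def fresh (resolve-node (assoc node :tt-id (java.util.UUID/randomUUID))
                         #{[2 :commit "c1"]}))
```

In this model `stale` is missing the commit flake while `fresh` contains it, which is the same shape as index-range returning fewer flakes than novelty holds.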

@dpetran
Contributor

dpetran commented Jan 24, 2023

That matches our investigations from yesterday - we defed out the cache atom and looked at it and the flakes we care about are in there, but the wrong key is used.

@zonotope
Contributor

> That matches our investigations from yesterday - we defed out the cache atom and looked at it and the flakes we care about are in there, but the wrong key is used.

The old system kept the caching keys consistent by updating the db root with the tempid and tt-id, which were supposed to be unique for each db value, including novelty. Then all descendant nodes inherited those values as child nodes were read from parents. My guess is that some part of the new system either isn't setting those ids consistently or isn't reading updated db values when it needs to.

@bplatz
Contributor

bplatz commented Jan 24, 2023

I think the issue is that tt-id is not being used or updated. This is something that wasn't finished that I forgot about.

My plan was to stop using node-id+tt-id as a cache key and instead use the db's hash - which would obviate the need for tt-id - which was also a source of confusion and created some hard to trace bugs.

The item I meant to explore before formally moving in that direction was to benchmark the hashing for a stage - my concern is that it is a meaningful amount of time. We calculate the db changes & hash @ a commit anyhow - so if there were just a single stage we could ideally pass that value to commit - so it need not be recalculated, resulting in negligible time.

If multiple stages were done however, we wouldn't typically calculate a new db hash - so that was what I was going to confirm. I think the most common use case will be to stage+commit, so even if it slowed down multiple stages it should be OK so long as not extreme.

I also think we could work on optimizing new db hashes... originally I was hand-assembling some of the JSON string that would be hashed, instead of serializing and deserializing the results, as is now done twice for a single commit.
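
As a rough illustration of the db-hash idea, a cache key can be derived from the content itself, so any change to the staged data produces a new key without a separate tt-id. This is a sketch under assumptions: `sha256-hex` and `cache-key` are names invented here, flakes are modeled as plain vectors, and sorting followed by `pr-str` stands in for whatever canonical serialization a real implementation would need.

```clojure
(import 'java.security.MessageDigest)

(defn sha256-hex
  "Hex-encoded SHA-256 digest of a UTF-8 string."
  [^String s]
  (let [digest (.digest (MessageDigest/getInstance "SHA-256")
                        (.getBytes s "UTF-8"))]
    (apply str (map #(format "%02x" %) digest))))

(defn cache-key
  "Hypothetical cache key: node id plus a hash of the db's flakes.
  Flakes are sorted first so equal content always hashes the same way."
  [node-id flakes]
  [node-id (sha256-hex (pr-str (sort flakes)))])

(def flakes-a [[1 1002 "foo-1"] [1 1003 "bar-1"]])
(def flakes-b [[1 1003 "bar-1"] [1 1002 "foo-1"]])          ; same content, other order
(def flakes-c (conj flakes-a [2 55 "commit-time"]))          ; one extra flake

(def key-a (cache-key :spot-root flakes-a))
(def key-b (cache-key :spot-root flakes-b))
(def key-c (cache-key :spot-root flakes-c))
```

Here `key-a` and `key-b` are equal while `key-c` differs. The hashing cost on every stage is exactly the part that would need benchmarking, and this sketch says nothing about it.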

@mpoffald
Contributor

Ok, so this is our current understanding:

  • the fn add-tt-id in the transact ns is what's responsible for propagating the tt-id to descendant nodes
  • to fix this issue while still using tt-ids in cache keys, we would need to call the add-tt-id during the commit process (probably in add-commit-flakes-to-db?)

Does that sound right?

Otherwise, to proceed with using the db hash as the key, we'd need to thread through that change wherever caching is done.

Our preference would be to fix this with add-tt-id for now, and then separately follow up with evaluating the db hash idea.

@mpoffald
Contributor

We just tried calling add-tt-id in add-commit-flakes-to-db and it did seem to work, we can now get the commit flakes back from index-range

@bplatz
Contributor

bplatz commented Jan 24, 2023

Sounds good!

Drawback with tt-id is that it specifically is a "cache breaker" - so subsequent db loads would not be able to use the same cache, but the worst case here is that more loads happen than would otherwise be needed.

tt-id is used because we don't know if a DB will succeed until we are done building it... but any query from it will cache as though it was final. If it does fail, we'll start the next transaction at the same 't' value, but it may end up inadvertently using the cache from the last failed transaction.

Anyhow, works for now!

@mpoffald mpoffald changed the title Support for /commit (e.g. /block) on HTTP API Support for /commit (e.g. /block) Jan 25, 2023
@dpetran
Contributor

dpetran commented Jan 26, 2023

Today was productive, there were a bunch of missing flakes and malformed flakes in the commit data that I think we've ironed out.

However, there are multiple concepts with the name "commit", and I'd like to be clear about what we need to return.

Option 1: the CommitData document

{"@context":{"ex":"http://example.org/ns/",
               "f":"https://ns.flur.ee/ledger#",
               "f:assert":{"@container":"@graph"},
               "id":"@id",
               "rdf":"http://www.w3.org/1999/02/22-rdf-syntax-ns#",
               "rdfs":"http://www.w3.org/2000/01/rdf-schema#",
               "schema":"http://schema.org/",
               "sh":"http://www.w3.org/ns/shacl#",
               "type":"@type",
               "xsd":"http://www.w3.org/2001/XMLSchema#"},
   "f:assert":[{"f:context":"{\"schema\":\"http://schema.org/\",\"wiki\":\"https://www.wikidata.org/wiki/\",\"xsd\":\"http://www.w3.org/2001/XMLSchema#\",\"type\":\"@type\",\"rdfs\":\"http://www.w3.org/2000/01/rdf-schema#\",\"ex\":\"http://example.org/ns/\",\"id\":\"@id\",\"f\":\"https://ns.flur.ee/ledger#\",\"sh\":\"http://www.w3.org/ns/shacl#\",\"skos\":\"http://www.w3.org/2008/05/skos#\",\"rdf\":\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"}","id":"fluree-default-context",
                "type":"f:Context"}
               ,{"id":"ex:brian",
                 "schema:age":50,
                 "schema:email":"brian@example.org",
                 "schema:name":"Brian",
                 "type":"ex:User"}
               ,{"id":"ex:alice",
                 "schema:age":50,
                 "schema:email":"alice@example.org",
                 "schema:name":"Alice",
                 "type":"ex:User"}
               ,{"id":"ex:cam",
                 "schema:age":34,
                 "schema:email":"cam@example.org",
                 "schema:name":"Cam",
                 "type":"ex:User"}],
   "f:flakes":29,
   "f:size":2870,
   "f:t":1,
   "f:v":0,
   "id":"fluree:db:sha256:bt5brjzpfpoumzylapnutq3z6m6ckxb4hjq2ojr5bs33x3amybdl",
   "type":["f:DB"]}

Option 2: the Commit document

{"@context":"https://ns.flur.ee/ledger/v1",
   "address":"",
   "alias":"user/test",
   "branch":"main",
   "data":{"address":"fluree:file://user/test/main/commits/6bef2affc82b40881a8fc646e9ab0ef75e0e0346813a314f427d48bfa7fda291.json",
           "flakes":29,
           "id":"fluree:db:sha256:bt5brjzpfpoumzylapnutq3z6m6ckxb4hjq2ojr5bs33x3amybdl",
           "size":2870,
           "t":1,
           "type":["DB"]},
   "id":"fluree:commit:sha256:bnt6eqgzogwxhgv5sunsidtrhhdc5qtu4zan4odimwvi77x6sd3l",
   "time":"2023-01-25T17:16:58.555014Z",
   "type":["Commit"],
   "v":0}

Option 3: the Commit summary (under the :commit key on a db), which is similar to, but not quite the same as, the Commit document.

{:address "fluree:memory://c48885385d969378fb193287c75ea4127af4bfcbf6d4794bf8945b2c00b0aaa4",
   :v 0,
   :time "2023-01-26T22:32:46.297302Z",
   :alias "dan",
   :previous {:id "fluree:commit:sha256:bbbr3yrkktpkjiywdnlmtcszyqv4kpfarrfbx3hyhi6ay5s6hyg2v",
              :address "fluree:memory://0d9519109757db32bd5c5db3067959f3a5869dfe0df50543abdc412b23fe9d8c"},
   :id "fluree:commit:sha256:b7ud2abq46wxthu3g6ehsyn7jm3jv6ddij3ujboox73o6ydozso6",
   :issuer {:id "did:fluree:Tf1sbdG5vZrMvdcfpVN9btp3nkYPKgWNu2H"},
   :branch "main",
   :data {:id "fluree:db:sha256:bbnoxkcwta3pxkfb765xfs7rk52v7l54wt74k4mlxoj37oogahp7y",
          :t 2,
          :address "fluree:memory://7aa54b5b7d5140bce8515a3992c6019ea0f6222412a64825540bbff39b2206a8",
          :flakes 40,
          :size 4028,
          :previous {:id "fluree:db:sha256:bbxeq2tcnsidosnbmg4ftjmkafmfvwlysztjf43latdqnwh4xyqfj",
                     :address "fluree:memory://23903f04005e75bca79e785a005e55c86600c03e0699ca09c7b900a20637869d"}}}

Also, if we just grab all of the flakes for a single t, we get a lot of extra data, especially if t is 1 and we've committed a bunch of commit vocab flakes. I don't think we want all of them.

[{:id "fluree:db:sha256:bbxeq2tcnsidosnbmg4ftjmkafmfvwlysztjf43latdqnwh4xyqfj",
    :f/address {:id
                "fluree:memory://23903f04005e75bca79e785a005e55c86600c03e0699ca09c7b900a20637869d"},
    :f/commit {:id nil}}
   {:id :id}
   {:id :f/address,
    :rdf/type
    [{:id 0,
      :class true,
      :idx? true,
      :ref? false,
      :subclassOf [],
      :equivalentProperty [],
      :iri "@id",
      :as :id}]}
   {:id :f/commit,
    :rdf/type
    [{:id 0,
      :class true,
      :idx? true,
      :ref? false,
      :subclassOf [],
      :equivalentProperty [],
      :iri "@id",
      :as :id}]}
   {:id :f/message}
   {:id :f/time}
   {:id :f/tag}
   {:id "https://www.w3.org/2018/credentials#issuer",
    :rdf/type
    [{:id 0,
      :class true,
      :idx? true,
      :ref? false,
      :subclassOf [],
      :equivalentProperty [],
      :iri "@id",
      :as :id}]}
   {:id "fluree-default-context",
    :rdf/type [:f/Context],
    :f/context
    "{\"schema\":\"http://schema.org/\",\"dan\":\"fluree:dan/\",\"wiki\":\"https://www.wikidata.org/wiki/\",\"xsd\":\"http://www.w3.org/2001/XMLSchema#\",\"type\":\"@type\",\"rdfs\":\"http://www.w3.org/2000/01/rdf-schema#\",\"id\":\"@id\",\"f\":\"https://ns.flur.ee/ledger#\",\"sh\":\"http://www.w3.org/ns/shacl#\",\"skos\":\"http://www.w3.org/2008/05/skos#\",\"rdf\":\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"}"}
   {:id :rdf/type}
   {:id :rdfs/Class}
   {:id :f/context}
   {:id :f/Context, :rdf/type [:rdfs/Class]}
   {:id :dan/x}
   {:id :dan/y}
   {:id
    "fluree:memory://23903f04005e75bca79e785a005e55c86600c03e0699ca09c7b900a20637869d"}
   {:f/time 1674767988411,
    "https://www.w3.org/2018/credentials#issuer"
    {:id "did:fluree:Tf1sbdG5vZrMvdcfpVN9btp3nkYPKgWNu2H"}}
   {:id "did:fluree:Tf1sbdG5vZrMvdcfpVN9btp3nkYPKgWNu2H"}
   {:id :dan/heyo, :dan/x "foo-1", :dan/y 1}]

So I guess the question I have is what do users actually want from the commit endpoint? A commit summary? The actual commit asserts and retracts? A combination of both?

@bplatz
Contributor

bplatz commented Jan 27, 2023

I think by default it should be the full commit document, which we currently store in two different physical files, but you'll see they can be merged into a single document.
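
A merge of the two files might look roughly like the following. This is a sketch only: `merge-commit-docs` is a hypothetical helper, the sample maps are abbreviated placeholders rather than real commit output, and it assumes both documents have already been compacted to the same unprefixed keys (the stored files shown earlier in the thread use different contexts, so a real merge would have to reconcile those first).

```clojure
(defn merge-commit-docs
  "Embeds the data document under the commit document's \"data\" key.
  Assumes both documents use the same compacted keys (an assumption of this sketch)."
  [commit-doc data-doc]
  (update commit-doc "data" merge (dissoc data-doc "@context")))

;; Abbreviated placeholder documents, not real commit output.
(def commit-doc
  {"id"     "fluree:commit:sha256:example-commit"
   "branch" "main"
   "v"      0
   "data"   {"id" "fluree:db:sha256:example-db", "t" 1, "flakes" 29, "size" 2870}})

(def data-doc
  {"@context" {"ex" "http://example.org/ns/"}
   "id"       "fluree:db:sha256:example-db"
   "t"        1
   "assert"   [{"id" "ex:cam", "schema:name" "Cam"}]})

(def full-commit (merge-commit-docs commit-doc data-doc))
```

The result keeps the commit-level fields at the top and carries the assertions inside `"data"`, next to the `t`, `flakes`, and `size` summary the commit file already holds.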

@mpoffald
Contributor

Ok, that would be Option 1 + Option 2, then, merged together.

Which is not quite "all flakes for a single t" (at least, not for t=1, where there will be vocab/schema flakes added as well).

Is that correct?

@bplatz
Contributor

bplatz commented Jan 27, 2023

That's correct. It is redundant to say that a value of @type (like ex:User in your example) is an rdfs:Class - anything used for @type must be a Class, so it is inferred.

We create the flakes for this automatically to make our internals work, but we filter out that statement when creating the data files due to the redundancy of it... and it is more common to not include it when submitting data than to include it, so ideally we want the db file output to be as similar as possible to the user's original input. (plus it saves a little space)
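
The filtering described here can be pictured with flakes modeled as plain vectors. This is a sketch, not the actual data-file code: the ids 200 (rdf:type) and 203 (rdfs:Class) are taken from the sample output earlier in this thread and are assumed stable, and both function names are invented for the example.

```clojure
;; Flakes modeled as plain vectors [s p o dt t op m], as printed earlier in the thread.
(def rdf-type-pid 200)    ; id of rdf:type in the sample output (assumed stable)
(def rdfs-class-sid 203)  ; id of rdfs:Class in the sample output (assumed stable)

(defn inferred-class-flake?
  "True for a flake that only states 'subject is an rdfs:Class'."
  [[_s p o]]
  (and (= p rdf-type-pid) (= o rdfs-class-sid)))

(defn data-file-flakes
  "Drops the redundant class statements before writing the data file."
  [flakes]
  (remove inferred-class-flake? flakes))

(def sample
  [[1001 0 "https://ns.flur.ee/ledger#Context" 1 -1 true nil]
   [1001 200 203 0 -1 true nil]    ; f:Context is an rdfs:Class -> filtered out
   [150 200 1001 0 -1 true nil]])  ; default context is of type f:Context -> kept

(def kept (vec (data-file-flakes sample)))
```

Only the middle flake is dropped; ordinary rdf:type statements such as the third one survive because their object is not rdfs:Class.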

@bplatz
Contributor

bplatz commented Jan 27, 2023

Related to my previous comment, here is where the spec states what I said above: https://www.w3.org/TR/rdf-schema/#ch_type


5 participants