Support for /commit (e.g. /block) #333
Comments
Since this is for a commit "read" operation, we should use a different name. We already use "stage" + "commit" to transact data in the library APIs, and I think the HTTP API interface should strive to parallel the library API. |
Could also consider the |
Is the plan to transact the commit data into the db? Because right now it's just written to disk and the only way I know to get it is to scan through commits until we find the correct one. Or will we be creating another storage type for commits? |
Related question: it looks like you can get commit metadata for the latest-db of a given branch by calling Could this metadata be retrieved using something similar, just generalized to look for a particular That is, could we get away with not reaching for it in storage anywhere? |
To be clear, I'm not sure how I feel about reconstructing on the fly rather than building these once (if we ever iterate on what's in a commit it'll have to change in more than one place), but for the moment I'm just wondering if this would even work |
If we store all the metadata for each commit in memory after we create it, we'll eventually use all the available memory, so we'll need a way to cycle some of that data out of cache and onto disk in a way that provides efficient random access, which is basically what our db does. I would vote to just index the commit metadata along with the data, so we can query it just like any other data. Metadata:
|
@mpoffald and I cannot query for any of the commit flakes for some undetermined-at-this-time reason.

(def conn @(fluree/connect {:method :memory
:defaults
{:context
{:id "@id"
:type "@type"
:xsd "http://www.w3.org/2001/XMLSchema#"
:schema "http://schema.org/"
:sh "http://www.w3.org/ns/shacl#"
:rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
:rdfs "http://www.w3.org/2000/01/rdf-schema#"
:wiki "https://www.wikidata.org/wiki/"
:skos "http://www.w3.org/2008/05/skos#"
:f "https://ns.flur.ee/ledger#"}
:did
{:id "did:fluree:Tf1sbdG5vZrMvdcfpVN9btp3nkYPKgWNu2H",
:public "020154853f38da17df4f1ad6e5cd1c424bf82066708ece89514b975c807aeccc60",
:private "ccd238808e29bda00a62cd27c592d1ef16efdf1f473854e6b276409f36435aed"}}}))
(def ledger @(fluree/create conn "dan" {:context {:dan "fluree:dan/"}}))
(def db1 @(fluree/stage (fluree/db ledger) {:id "fluree:dan/heyo"
:dan/x "foo-1"
:dan/y "bar-1"}))
(def db1* @(fluree/commit! ledger db1))
;; for each subject in novelty, do an index-range query to return the flakes for that subject. Note all the empty (!!) results
(->> db1* :novelty :spot (map flake/s) (into #{})
(map (fn [sid] (async/<!! (query-range/index-range db1* :spot = [sid])))))
;; =>
([#Flake [0 0 "@id" 1 -1 true nil]]
[]
[#Flake [1001 0 "https://ns.flur.ee/ledger#Context" 1 -1 true nil] #Flake [1001 200 203 0 -1 true nil]]
[#Flake [211106232532992 0 "fluree:dan/heyo" 1 -1 true nil] #Flake [211106232532992 1002 "foo-1" 1 -1 true nil] #Flake [211106232532992 1003 "bar-1" 1 -1 true nil]]
[]
[]
[]
[]
[#Flake [1002 0 "fluree:dan/x" 1 -1 true nil]]
[]
[#Flake [150 0 "fluree-default-context" 1 -1 true nil] #Flake [150 200 1001 0 -1 true nil] #Flake [150 250 "{\"schema\":\"http://schema.org/\",\"dan\":\"fluree:dan/\",\"wiki\":\"https://www.wikidata.org/wiki/\",\"xsd\":\"http://www.w3.org/2001/XMLSchema#\",\"type\":\"@type\",\"rdfs\":\"http://www.w3.org/2000/01/rdf-schema#\",\"id\":\"@id\",\"f\":\"https://ns.flur.ee/ledger#\",\"sh\":\"http://www.w3.org/ns/shacl#\",\"skos\":\"http://www.w3.org/2008/05/skos#\",\"rdf\":\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"}" 1 -1 true nil]]
[#Flake [1003 0 "fluree:dan/y" 1 -1 true nil]]
[#Flake [250 0 "https://ns.flur.ee/ledger#context" 1 -1 true nil]]
[]
[]
[]
[#Flake [200 0 "http://www.w3.org/1999/02/22-rdf-syntax-ns#type" 1 -1 true nil]]
[]
[#Flake [203 0 "http://www.w3.org/2000/01/rdf-schema#Class" 1 -1 true nil]])

There are commit flakes in novelty:

(-> db1* :novelty :spot)
;; count: 30
#{#Flake [211106232532992 0 "fluree:dan/heyo" 1 -1 true nil]
#Flake [211106232532992 1002 "foo-1" 1 -1 true nil]
#Flake [211106232532992 1003 "bar-1" 1 -1 true nil]
#Flake [35184372089835 0 "did:fluree:Tf1sbdG5vZrMvdcfpVN9btp3nkYPKgWNu2H" 1 -1 true nil]
#Flake [35184372089834 55 1674253134998 4 -1 true nil]
#Flake [35184372089834 57 nil 0 -1 true nil]
#Flake [35184372089833 0 "fluree:memory://dd6e4a87bffe2a2fd9131eb306e81ea55accdb983ef4ab11f041b9e6c8e1209b" 1 -1 true nil]
#Flake [1003 0 "fluree:dan/y" 1 -1 true nil]
#Flake [1002 0 "fluree:dan/x" 1 -1 true nil]
#Flake [1001 0 "https://ns.flur.ee/ledger#Context" 1 -1 true nil]
#Flake [1001 200 203 0 -1 true nil]
#Flake [250 0 "https://ns.flur.ee/ledger#context" 1 -1 true nil]
#Flake [203 0 "http://www.w3.org/2000/01/rdf-schema#Class" 1 -1 true nil]
#Flake [200 0 "http://www.w3.org/1999/02/22-rdf-syntax-ns#type" 1 -1 true nil]
#Flake [150 0 "fluree-default-context" 1 -1 true nil]
#Flake [150 200 1001 0 -1 true nil]
#Flake [150 250 "{\"schema\":\"http://schema.org/\",\"dan\":\"fluree:dan/\",\"wiki\":\"https://www.wikidata.org/wiki/\",\"xsd\":\"http://www.w3.org/2001/XMLSchema#\",\"type\":\"@type\",\"rdfs\":\"http://www.w3.org/2000/01/rdf-schema#\",\"id\":\"@id\",\"f\":\"https://ns.flur.ee/ledger#\",\"sh\":\"http://www.w3.org/ns/shacl#\",\"skos\":\"http://www.w3.org/2008/05/skos#\",\"rdf\":\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"}" 1 -1 true nil]
#Flake [57 0 "https://www.w3.org/2018/credentials#issuer" 0 -1 true nil]
#Flake [57 200 0 0 -1 true nil]
#Flake [56 0 "https://ns.flur.ee/ledger#tag" 0 -1 true nil]
#Flake [55 0 "https://ns.flur.ee/ledger#time" 0 -1 true nil]
#Flake [54 0 "https://ns.flur.ee/ledger#message" 0 -1 true nil]
#Flake [53 0 "https://ns.flur.ee/ledger#commit" 0 -1 true nil]
#Flake [53 200 0 0 -1 true nil]
#Flake [51 0 "https://ns.flur.ee/ledger#time" 0 -1 true nil]
#Flake [51 200 0 0 -1 true nil]
#Flake [0 0 "@id" 1 -1 true nil]
#Flake [-1 0 "fluree:db:sha256:bpzrz6ivc4lmfs2omm34xlqnyl7ykeoqrrqyfkncgjrpno2auywi" 1 -1 true nil]
#Flake [-1 51 35184372089833 0 -1 true nil]
#Flake [-1 53 35184372089834 0 -1 true nil]}

But for some reason these are not returned by

(count (async/<!! (query-range/index-range db1* :spot = [])))
;; 14 |
I think this is a caching problem. We cache index nodes so we don't have to read them from storage multiple times to read back the same values. In the previous version of Fluree, the caching key was based on the id, tempid, and tt-id of the unresolved node. The unresolved nodes inherited tempid and tt-id from the db root, and each transaction updated the db root's tt-id and tempid. It looks like something isn't updating those properly when data is staged in json-ld, so when we come back to resolve those nodes again on the staged db, we get the old values back from the cache, and those cached nodes are missing the new flakes. At least, I think this is what's happening. If it is, we'll need to cache index nodes on a key that gets updated when novelty changes. |
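To make the suspected failure mode concrete, here is a minimal sketch in plain Clojure. It is not Fluree's code: `node-cache`, `resolve-node`, and the `read-flakes` thunk are hypothetical names, and the cache key is reduced to `[id tt-id]`. It only shows how a key that fails to change when novelty changes hands back the old, smaller result.

```clojure
;; Hypothetical, simplified model; none of these names are Fluree's API.
(def node-cache (atom {}))

(defn resolve-node
  "Returns the flakes for an index node, caching the result on [id tt-id].
  `read-flakes` is a zero-arg function that reads the current flakes."
  [{:keys [id tt-id]} read-flakes]
  (let [k [id tt-id]]
    (if (contains? @node-cache k)
      (get @node-cache k)
      (let [flakes (read-flakes)]
        (swap! node-cache assoc k flakes)
        flakes))))

;; First db value: the node resolves to one flake, which is now cached.
(def node-v1 {:id "spot-root" :tt-id 1})
(def first-read (resolve-node node-v1 (fn [] [:flake-a])))

;; After staging, novelty holds a second flake, but tt-id was not bumped,
;; so the lookup hits the old entry and the new flake is missing.
(def stale-read (resolve-node node-v1 (fn [] [:flake-a :flake-b])))

;; Once tt-id changes, the key changes and the fresh flakes are read.
(def node-v2 (assoc node-v1 :tt-id 2))
(def fresh-read (resolve-node node-v2 (fn [] [:flake-a :flake-b])))
```

Here `stale-read` is the analogue of the empty index-range results reported above: the data exists in novelty, but the cached node predates it.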
That matches our investigations from yesterday - we |
The old system kept the caching keys consistent by updating the db root with the tempid and ttid, which were supposed to be unique for each db value, including novelty. Then all descendant nodes inherited those values as child nodes were read from parents. My guess is some part of the new system either isn't setting those ids consistently or isn't reading updated db values when it needs to. |
I think the issue is that ttid is not being used or updated. This is something that wasn't finished and that I forgot about. My plan was to stop using node-id+ttid as a cache key and instead use the db's hash, which would obviate the need for tt-id, which was also a source of confusion and created some hard-to-trace bugs. The item I meant to explore before formally moving in that direction was to benchmark the hashing for a stage; my concern is that it takes a meaningful amount of time. We calculate the db changes and hash at commit anyhow, so if there were just a single stage we could ideally pass that value to commit so it need not be recalculated, resulting in negligible time. If multiple stages were done, however, we wouldn't typically calculate a new db hash, so that was what I was going to confirm. I think the most common use case will be stage+commit, so even if it slowed down multiple stages it should be OK so long as it's not extreme. I also think we could work on optimizing new db hashes... originally I was hand-assembling some of the JSON string that would be hashed instead of serializing and deserializing the results, as is done now two times for a single commit. |
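A rough sketch of the db-hash idea, again in plain Clojure with hypothetical names (`db-hash`, `stage`, `resolve-node-by-hash`). It assumes a db can be modeled as a map holding a set of flakes, and `clojure.core/hash` stands in for the real db hash, so it says nothing about the hashing cost raised above.

```clojure
;; Hypothetical model: a db value is a map holding a set of flakes.
(defn db-hash
  "Stand-in for the real db hash; clojure.core/hash is content-based."
  [db]
  (hash (:flakes db)))

(defn stage
  "Returns a new db value with the extra flakes added."
  [db new-flakes]
  (update db :flakes into new-flakes))

(defn resolve-node-by-hash
  "Looks a node up under [node-id db-hash], reading from the db on a miss."
  [cache db node-id]
  (let [k [node-id (db-hash db)]]
    (if (contains? @cache k)
      (get @cache k)
      (let [flakes (:flakes db)]
        (swap! cache assoc k flakes)
        flakes))))

(def cache (atom {}))
(def db-a {:flakes #{:flake-a}})
(def db-b (stage db-a [:flake-b]))

;; Each db value gets its own cache entry, so staging cannot
;; return a result cached for an earlier db value.
(def before (resolve-node-by-hash cache db-a "spot-root"))
(def after  (resolve-node-by-hash cache db-b "spot-root"))
```

Because the key is derived from the db's contents, there is no separate id that has to be kept in sync by hand; the cost is computing the hash on every stage.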
Ok, so this is our current understanding:
Does that sound right? Otherwise, to proceed with using the db hash as the key, we'd need to thread through that change wherever caching is done. Our preference would be to fix this with |
We just tried calling |
Sounds good! The drawback with tt-id is that it is specifically a "cache breaker", so subsequent db loads would not be able to use the same cache, but the worst case here is that more loads happen than would otherwise be needed. tt-id is used because we don't know if a DB will succeed until we are done building it... but any query from it will cache as though it was final. If it does fail, we'll start the next transaction at the same 't' value, but it may end up inadvertently using the cache from the last failed transaction. Anyhow, works for now! |
Today was productive, there were a bunch of missing flakes and malformed flakes in the commit data that I think we've ironed out. However, there are multiple concepts with the name "commit", and I'd like to be clear about what we need to return.

Option 1: the CommitData document

{"@context":{"ex":"http://example.org/ns/",
"f":"https://ns.flur.ee/ledger#",
"f:assert":{"@container":"@graph"},
"id":"@id",
"rdf":"http://www.w3.org/1999/02/22-rdf-syntax-ns#",
"rdfs":"http://www.w3.org/2000/01/rdf-schema#",
"schema":"http://schema.org/",
"sh":"http://www.w3.org/ns/shacl#",
"type":"@type",
"xsd":"http://www.w3.org/2001/XMLSchema#"},
"f:assert":[{"f:context":"{\"schema\":\"http://schema.org/\",\"wiki\":\"https://www.wikidata.org/wiki/\",\"xsd\":\"http://www.w3.org/2001/XMLSchema#\",\"type\":\"@type\",\"rdfs\":\"http://www.w3.org/2000/01/rdf-schema#\",\"ex\":\"http://example.org/ns/\",\"id\":\"@id\",\"f\":\"https://ns.flur.ee/ledger#\",\"sh\":\"http://www.w3.org/ns/shacl#\",\"skos\":\"http://www.w3.org/2008/05/skos#\",\"rdf\":\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"}","id":"fluree-default-context",
"type":"f:Context"}
,{"id":"ex:brian",
"schema:age":50,
"schema:email":"brian@example.org",
"schema:name":"Brian",
"type":"ex:User"}
,{"id":"ex:alice",
"schema:age":50,
"schema:email":"alice@example.org",
"schema:name":"Alice",
"type":"ex:User"}
,{"id":"ex:cam",
"schema:age":34,
"schema:email":"cam@example.org",
"schema:name":"Cam",
"type":"ex:User"}],
"f:flakes":29,
"f:size":2870,
"f:t":1,
"f:v":0,
"id":"fluree:db:sha256:bt5brjzpfpoumzylapnutq3z6m6ckxb4hjq2ojr5bs33x3amybdl",
"type":["f:DB"]} Option 2: the Commit document {"@context":"https://ns.flur.ee/ledger/v1",
"address":"",
"alias":"user/test",
"branch":"main",
"data":{"address":"fluree:file://user/test/main/commits/6bef2affc82b40881a8fc646e9ab0ef75e0e0346813a314f427d48bfa7fda291.json",
"flakes":29,
"id":"fluree:db:sha256:bt5brjzpfpoumzylapnutq3z6m6ckxb4hjq2ojr5bs33x3amybdl",
"size":2870,
"t":1,
"type":["DB"]},
"id":"fluree:commit:sha256:bnt6eqgzogwxhgv5sunsidtrhhdc5qtu4zan4odimwvi77x6sd3l",
"time":"2023-01-25T17:16:58.555014Z",
"type":["Commit"],
"v":0} Option 3: the Commit summary (under the {:address "fluree:memory://c48885385d969378fb193287c75ea4127af4bfcbf6d4794bf8945b2c00b0aaa4",
:v 0,
:time "2023-01-26T22:32:46.297302Z",
:alias "dan",
:previous {:id "fluree:commit:sha256:bbbr3yrkktpkjiywdnlmtcszyqv4kpfarrfbx3hyhi6ay5s6hyg2v",
:address "fluree:memory://0d9519109757db32bd5c5db3067959f3a5869dfe0df50543abdc412b23fe9d8c"},
:id "fluree:commit:sha256:b7ud2abq46wxthu3g6ehsyn7jm3jv6ddij3ujboox73o6ydozso6",
:issuer {:id "did:fluree:Tf1sbdG5vZrMvdcfpVN9btp3nkYPKgWNu2H"},
:branch "main",
:data {:id "fluree:db:sha256:bbnoxkcwta3pxkfb765xfs7rk52v7l54wt74k4mlxoj37oogahp7y",
:t 2,
:address "fluree:memory://7aa54b5b7d5140bce8515a3992c6019ea0f6222412a64825540bbff39b2206a8",
:flakes 40,
:size 4028,
:previous {:id "fluree:db:sha256:bbxeq2tcnsidosnbmg4ftjmkafmfvwlysztjf43latdqnwh4xyqfj",
:address "fluree:memory://23903f04005e75bca79e785a005e55c86600c03e0699ca09c7b900a20637869d"}}}

Also, if we just grab all of the flakes for a single

[{:id "fluree:db:sha256:bbxeq2tcnsidosnbmg4ftjmkafmfvwlysztjf43latdqnwh4xyqfj",
:f/address {:id
"fluree:memory://23903f04005e75bca79e785a005e55c86600c03e0699ca09c7b900a20637869d"},
:f/commit {:id nil}}
{:id :id}
{:id :f/address,
:rdf/type
[{:id 0,
:class true,
:idx? true,
:ref? false,
:subclassOf [],
:equivalentProperty [],
:iri "@id",
:as :id}]}
{:id :f/commit,
:rdf/type
[{:id 0,
:class true,
:idx? true,
:ref? false,
:subclassOf [],
:equivalentProperty [],
:iri "@id",
:as :id}]}
{:id :f/message}
{:id :f/time}
{:id :f/tag}
{:id "https://www.w3.org/2018/credentials#issuer",
:rdf/type
[{:id 0,
:class true,
:idx? true,
:ref? false,
:subclassOf [],
:equivalentProperty [],
:iri "@id",
:as :id}]}
{:id "fluree-default-context",
:rdf/type [:f/Context],
:f/context
"{\"schema\":\"http://schema.org/\",\"dan\":\"fluree:dan/\",\"wiki\":\"https://www.wikidata.org/wiki/\",\"xsd\":\"http://www.w3.org/2001/XMLSchema#\",\"type\":\"@type\",\"rdfs\":\"http://www.w3.org/2000/01/rdf-schema#\",\"id\":\"@id\",\"f\":\"https://ns.flur.ee/ledger#\",\"sh\":\"http://www.w3.org/ns/shacl#\",\"skos\":\"http://www.w3.org/2008/05/skos#\",\"rdf\":\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"}"}
{:id :rdf/type}
{:id :rdfs/Class}
{:id :f/context}
{:id :f/Context, :rdf/type [:rdfs/Class]}
{:id :dan/x}
{:id :dan/y}
{:id
"fluree:memory://23903f04005e75bca79e785a005e55c86600c03e0699ca09c7b900a20637869d"}
{:f/time 1674767988411,
"https://www.w3.org/2018/credentials#issuer"
{:id "did:fluree:Tf1sbdG5vZrMvdcfpVN9btp3nkYPKgWNu2H"}}
{:id "did:fluree:Tf1sbdG5vZrMvdcfpVN9btp3nkYPKgWNu2H"}
{:id :dan/heyo, :dan/x "foo-1", :dan/y 1}]

So I guess the question I have is what do users actually want from the commit endpoint? A commit summary? The actual commit asserts and retracts? A combination of both? |
I think by default it should be the full commit document, which we currently store in two different physical files, but you'll see they can be merged into a single document. |
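For reference, one possible way to merge the two could look like the sketch below. The function name `merge-commit` and the choice to nest the assertions under the commit's "data" key are assumptions, not a decided format; "f:retract" is also an assumed key, since the sample above only contains "f:assert". The two documents are trimmed versions of Options 1 and 2 above.

```clojure
;; Trimmed versions of the two stored documents (most fields omitted).
(def data-doc
  {"id"       "fluree:db:sha256:bt5brjzpfpoumzylapnutq3z6m6ckxb4hjq2ojr5bs33x3amybdl"
   "type"     ["f:DB"]
   "f:t"      1
   "f:assert" [{"id" "ex:cam" "schema:name" "Cam" "type" "ex:User"}]})

(def commit-doc
  {"id"     "fluree:commit:sha256:bnt6eqgzogwxhgv5sunsidtrhhdc5qtu4zan4odimwvi77x6sd3l"
   "type"   ["Commit"]
   "branch" "main"
   "data"   {"id" "fluree:db:sha256:bt5brjzpfpoumzylapnutq3z6m6ckxb4hjq2ojr5bs33x3amybdl"
             "t"  1}})

(defn merge-commit
  "Hypothetical helper: copies the assert/retract lists from the data
  document into the commit's \"data\" summary. Throws if the ids differ."
  [commit data]
  (when-not (= (get-in commit ["data" "id"]) (get data "id"))
    (throw (ex-info "data document does not belong to this commit" {})))
  (update commit "data" merge (select-keys data ["f:assert" "f:retract"])))

(def merged (merge-commit commit-doc data-doc))
```

The id check relies on the fact that, in the samples above, the commit's "data" id and the data document's "id" are the same db hash.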
Ok, that would be Option 1 + Option 2, then, merged together. Which is not quite "all flakes for a single Is that correct? |
That's correct. It is redundant to say a value of We create the flakes for this automatically to make our internals work, but we filter out that statement when creating the data files due to the redundancy of it... and it is more common to not include it when submitting data than to include it, so ideally we want the db file output to be as similar as possible to the user's original input. (plus it saves a little space) |
Related to my previous comment, here is where the spec states what I said above: https://www.w3.org/TR/rdf-schema/#ch_type |
Description
Acceptance Criteria
Implementation Details