Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement reduce and dependency task logic #144

Merged
merged 33 commits into from
Jun 15, 2022

Conversation

sandreae
Copy link
Member

@sandreae sandreae commented Jun 4, 2022

reduce

  • fetch operations by document_id
  • fetch operations by document_view_id
  • materialise document
  • materialise document_views
  • store specific document view
  • store documents current view
  • return next dependency tasks
  • error handling
  • tests
  • make requested review changes

dependency

  • retrieve relation graph from db (including parent and children) check child pinned relations are materialised
  • return reduce tasks for every missing child dependency
  • error on deleted documents
  • error handling
  • tests

📋 Checklist

  • Add tests that cover your changes
  • Add this PR to the Unreleased section in CHANGELOG.md
  • Link this PR to any issues it closes
  • New files contain a SPDX license header

@sandreae sandreae changed the title WIP: implement document materialisation and storage Materialise and store documents and views in reduce_task Jun 4, 2022
@sandreae sandreae changed the title Materialise and store documents and views in reduce_task Materialise and store document views in reduce_task Jun 4, 2022
@codecov
Copy link

codecov bot commented Jun 4, 2022

Codecov Report

Merging #144 (b0d0124) into development (32d12e7) will increase coverage by 0.95%.
The diff coverage is 96.05%.

@@               Coverage Diff               @@
##           development     #144      +/-   ##
===============================================
+ Coverage        87.63%   88.59%   +0.95%     
===============================================
  Files               42       42              
  Lines             2055     2280     +225     
===============================================
+ Hits              1801     2020     +219     
- Misses             254      260       +6     
Impacted Files Coverage Δ
aquadoggo/src/db/provider.rs 87.50% <ø> (ø)
aquadoggo/src/db/stores/document.rs 93.46% <ø> (-0.04%) ⬇️
aquadoggo/src/materializer/input.rs 100.00% <ø> (+100.00%) ⬆️
aquadoggo/src/materializer/tasks/schema.rs 0.00% <ø> (ø)
aquadoggo/src/materializer/worker.rs 88.34% <ø> (ø)
aquadoggo/src/materializer/tasks/reduce.rs 93.80% <93.80%> (+93.80%) ⬆️
aquadoggo/src/materializer/tasks/dependency.rs 98.26% <98.26%> (+98.26%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 32d12e7...b0d0124. Read the comment docs.

@sandreae sandreae changed the title Materialise and store document views in reduce_task Implement reduce and dependency task logic Jun 8, 2022
Copy link
Member

@adzialocha adzialocha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just writing some comments while we're going as I was very curious 🤩

This is really really really beautiful, the tasks are so small and powerful!

aquadoggo/src/materializer/tasks/reduce.rs Outdated Show resolved Hide resolved
aquadoggo/src/materializer/tasks/reduce.rs Outdated Show resolved Hide resolved
aquadoggo/src/materializer/tasks/reduce.rs Show resolved Hide resolved
aquadoggo/src/materializer/tasks/reduce.rs Show resolved Hide resolved
aquadoggo/src/materializer/tasks/dependency.rs Outdated Show resolved Hide resolved
@adzialocha
Copy link
Member

The error handling is funny as we probably end up only having one critical error type or Ok(None) 🤣

There could be two approaches to the semantics of the return values:

  • Ok(...) Task finished without any critical problem
  • Err(Critical) Task failed with a critical problem

.. or:

  • Ok(...) Task finished and did it's job
  • Err(Failure) Task finished but couldn't do it's job
  • Err(Critical) Task failed with a critical problem

@sandreae
Copy link
Member Author

sandreae commented Jun 9, 2022

Thanks for all the comments @adzialocha! I will make those changes.

When you have time, you can also look over the dependency task logic. It seems a little more nuanced than I originally imagined. The current logic is:

  • for "parent" dependencies which have been materialised (the only kind) => dispatch a dependency task
  • for "child" dependencies which have been materialised => do nothing
  • for "child" dependencies which have not been materialised => dispatch a reduce task
  • for documents or document views which have all their child dependencies met => dispatch a schema task

"parent dependency" = a document relating to the document being processed by the task
"child dependency" = a document which the tasks' document relates to

This was linked to issues Jun 9, 2022
Copy link
Member

@adzialocha adzialocha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't reflected yet on the new schema store in this context but I think we currently don't need a schema task with this db layout.

I'd add it as soon as we build the service 👍

For now we can build the schema through schema store on every request. That's naive of course but it will be fine for now!

Later we can add the schema service which will be used to query schema instances in memory (as a caching layer), it will be populated during node startup and filled with new schemas in a schema task (only if the wanted schema was missing during startup). Later we can deprecate / delete schemas in the task when the related document got deleted.

@sandreae
Copy link
Member Author

sandreae commented Jun 9, 2022

but I think we currently don't need a schema task with this db layout

Yep, that makes sense.

Later we can add the schema service which will be used to query schema instances in memory (as a caching layer), it will be populated during node startup and filled with new schemas in a schema task (only if the wanted schema was missing during startup). Later we can deprecate / delete schemas in the task when the related document got deleted.

Cool, this is what I imagined we'd be doing as well 👍

@adzialocha
Copy link
Member

adzialocha commented Jun 13, 2022

  • for "parent" dependencies which have been materialised (the only kind) => dispatch a dependency task

This looks really close to the specification we have in the Whimsical 👍 except of this one point where I would expect this to dispatch a reduce task (which again will dispatch a dependency task), we might otherwise miss out on materialising the specific document view of this parent (it might be pinned as well from another parent). (that should itself being wrong after meeting with @sandreae, we actually don't need to check for parents at all)

@adzialocha adzialocha removed a link to an issue Jun 13, 2022
@sandreae sandreae linked an issue Jun 13, 2022 that may be closed by this pull request
@sandreae sandreae marked this pull request as ready for review June 13, 2022 15:48
@sandreae sandreae changed the base branch from development to updates-for-verified-operation June 14, 2022 11:15
aquadoggo/src/materializer/tasks/dependency.rs Outdated Show resolved Hide resolved
aquadoggo/src/materializer/tasks/dependency.rs Outdated Show resolved Hide resolved
aquadoggo/src/materializer/tasks/dependency.rs Outdated Show resolved Hide resolved
aquadoggo/src/materializer/worker.rs Show resolved Hide resolved
@sandreae sandreae requested a review from adzialocha June 14, 2022 13:00
Copy link
Member

@cafca cafca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some documentation for the dependency task, I think the same could be very helpful for the reducer (but dinner is ready so I stop now ;) ). The control flow is quite deeply nested with multiple matches. Flattening that out would probably help make this maintainable.

Overall this is good to go though :)

aquadoggo/src/materializer/tasks/dependency.rs Outdated Show resolved Hide resolved
aquadoggo/src/materializer/tasks/dependency.rs Outdated Show resolved Hide resolved
aquadoggo/src/materializer/tasks/reduce.rs Show resolved Hide resolved
@sandreae
Copy link
Member Author

Thanks for the review @cafca, I improved the docs and did a little flattening, I like it! Think I covered everything now.

@sandreae sandreae requested a review from cafca June 15, 2022 09:14
@adzialocha adzialocha changed the base branch from updates-for-verified-operation to development June 15, 2022 14:22
@adzialocha
Copy link
Member

Alright! I did some crazy rebase-kung-fu and now this is ready for merging into development! Also, I had a go on @cafca's idea to improve the control flow nesting in reduce_task by factoring all into smaller methods. Please have a (hopefully) last look, especially @sandreae, I hope I didn't mess with your code too much.

@sandreae
Copy link
Member Author

Thanks! That's great 👍 for some reason I thought @cafca 's comment was about the dependency task....

@adzialocha adzialocha merged commit 72df98b into development Jun 15, 2022
adzialocha added a commit that referenced this pull request Jun 20, 2022
* development:
  Implement `reduce` and `dependency` task logic (#144)
  Send published entries to materializer (#161)
  Updates for use of `VerifiedOperation` (#158)
  A backlink is not a skiplink (#163)
  Update README.md
adzialocha added a commit that referenced this pull request Jun 20, 2022
* development: (42 commits)
  Implement `reduce` and `dependency` task logic (#144)
  Send published entries to materializer (#161)
  Updates for use of `VerifiedOperation` (#158)
  A backlink is not a skiplink (#163)
  Update README.md
  Remove clippy warnings
  Update `p2panda-rs` API & refactor tests (#147)
  fmt
  Rename `ES` generic type param to `EntryStore`
  Add better instructions for regenerating the schema file.
  Rename `skiplinks` to `certificate_pool` and regen schema
  Use cafca's wording for error message.
  Fix typo
  Target p2panda main (#142)
  Fix a clipp warn
  clippy --fix
  Add rt-multi-thread tokio feature to fix build of dump_schema
  Fix naming of param in SingleEntryAndPayload
  Add docstring to the `AliasedAuthor` type.
  Fix import order of tests
  ...
@adzialocha adzialocha deleted the implement-reducer-task branch June 29, 2022 14:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants