Skip to content
This repository was archived by the owner on Feb 6, 2026. It is now read-only.

Switch to the new engine#3113

Merged
si-bors-ng[bot] merged 112 commits intomainfrom
new-engine
Mar 4, 2024
Merged

Switch to the new engine#3113
si-bors-ng[bot] merged 112 commits intomainfrom
new-engine

Conversation

@nickgerace
Copy link
Copy Markdown
Contributor

@nickgerace nickgerace commented Jan 3, 2024

Description

Since July 2023, we have been dreaming of this moment. The changes in new-engine all hit main. This has been a journey of many squashed commits, bad git merges, pain, sweat and tears. That all changes today. Welcome to the new engine.

History

This PR is a continuation of #2879, which was closed because the branch was renamed to new-engine.

nickgerace and others added 30 commits December 11, 2023 17:38
This commits moves the engine-layer towards a design where Workspaces
are snapshots of directed, acyclic graphs. In this design, ChangeSets
point to a existing snapshot rather than overlaying rows on top of each
other in the database. In addition, all graphs are immutable.

This commit also adds scaffolding for the Gobbler, which will handle
processing and performing updates for the new design.

An initial RabbitMQ implementation and library has been added because of
its durable properties, which NATS does not provide.

For data access, this design uses a pull-through cache and a content
store in the database.

Signed-off-by: Nick Gerace <nick@systeminit.com>
Co-authored-by: Jacob Helwig <jacob@systeminit.com>
Add first complex test for graph work, which includes multiple conflicts
and updates. It can be expanded upon to add even more scenarios in the
future, such as ordering conflicts (once ready).

This commit also prunes duplicate tests and cleans up panics other
existing tests. Some have been renamed as a result of these changes.

Signed-off-by: Nick Gerace <nick@systeminit.com>
Co-authored-by: Jacob Helwig <jacob@systeminit.com>
Previously, we were relying on `copy_node_index` to also increment the
vector clocks, but we updated `copy_node_index` to be more of a "pure"
copy, which means that it was no longer incrementing the vector clocks.
In `update_content` we explicitly want the vector clocks updated, since
we are doing a write, so we now do this directly.
The last "update_content" call was intended to cause a "ReplaceSubgraph"
update to be found. It was written incorrectly and has been fixed. The
expected results have been modified to reflect this change.

Signed-off-by: Nick Gerace <nick@systeminit.com>
Co-authored-by: Jacob Helwig <jacob@systeminit.com>
… if it has one

We're going to need to find the `Ordering` node more often than just in
the conflict/update detection logic, and the current method of finding a
`Vec` of possible nodes is a bit awkward to be using in multiple places.
This makes it so we have a method returning a
`Result<Option<NodeIndex>>`, which is what we were really wanting in the
existing places anyway.
While the concept of an ordering node has existed in the workspace
graph, and the conflict/update detection logic, there was nothing to
create (and maintain) ordering nodes. It is now possible to create nodes
that have ordered children (not all children of a node must be in the
ordering), and to re-order the children after it has been created.

This does not yet test removing children from the ordering, when the
edge to them is removed, nor does it yet test any of the conflict/update
detection logic around ordering conflicts & updates.
This commit ensure RabbitMQ is running with the correct ports and
enabled plugin (i.e. the stream plugin). When running tests with this
commit, you must use "buck2 build component/rabbitmq" and "docker tag"
to ensure the image name and tag are correct.

This commit also fixes the lang-js dependent target bug that the dal
integration tests also experienced when the pnpm logic was changed
recently.

You can test the connection with the following command:
```
SI_TEST_BUILTIN_SCHEMAS=none buck2 run \
  lib/gobbler-server:test-integration -- \
  integration_test::connection::connect_to_queue
```

Signed-off-by: Nick Gerace <nick@systeminit.com>
Co-authored-by: Jacob Helwig <jacob@systeminit.com>
This commit adds an integration test for ensuring producer logic works
for the gobbler.

Signed-off-by: Nick Gerace <nick@systeminit.com>
- Ensure we can produce messages to RabbitMQ by setting the advertised
  host to localhost (otherwise, the initial environment connection smoke
  test will work, but the stream client connection will fail)
- Re-word manager, client and connection naming under the "environment"
  umbrella, which reflects how producer and consumer are named in the
  RabbitMQ wrapper crate

Signed-off-by: Nick Gerace <nick@systeminit.com>
Co-authored-by: Zachary Hamm <zack@systeminit.com>
The consumer domain has been refactored to support processing deliveries
externally rather than within methods. While the producer and
environment domains have been hardened, the consumer domain is brittle.

The gobbler will consume messages on the queue, but it will have to
notify the original producer of those messages when it is done. That
work does not exist in this commit, but the consuming part does.

Signed-off-by: Nick Gerace <nick@systeminit.com>
In addition to adding new tests for basic conflict & update detection,
this fixes a bug where the "seen" vector clock was being updated when
changing the order of an `OrderingNodeWeight`, instead of updating the
"write" vector clock.

This also removes an unnecessary additional update of the "write" vector
clock of the `EdgeWeight` when adding an edge. By assuming that the
vector clock is already up to date, testing becomes easier & more
stable.
If the items the two "versions" of the containers have in common aren't
in the same order, and there have been changes to the ordering in both
"versions", then that's a conflict that needs to be resolved.
…g onto one that removed a sibling of the new child

We consider it to not be a conflict when we have a change set (workspace
snapshot) that is attempting to add an item to an ordered container, and
the change set that we're attempting to rebase onto has removed an
element that the change set being rebased knows about, but has not
modified in any way, nor has the change set being rebased reordered any
of the pre-existing elements of the container.
Subscription (from NATS) will not be used as we will be using
RabbitMQ instead for gobbler-server. Since there was an upstream
change that breaks compilation, it is not worth saving this
specific Subscription usage.

Signed-off-by: Nick Gerace <nick@systeminit.com>
This commit adds the gobbler client/server architecture. Primarily, it
operates on three crates:

- si-rabbitmq (modified): the crate for interacting with RabbitMQ that
  SI services will use or are using
- gobbler-server (modified): the crate for running an outer management
  consumer loop as well as inner core consumer loops
- gobbler-client (new): the crate for interfacing with the
  gobbler-server and testing the communication between the two

The addition of gobbler-client is at the center of this commit.
Essentially, it is not only being used as the interface to
gobbler-server, but it is also being used as the entrypoint for testing
the gobbler-server lifecycle.

The outer and inner consumer loops provide the ability to manage gobbler
singletons as well as run the gobbler singletons.
The outer loop consumers on the "gobbler-management" stream and its
lifecycle is owned by the gobbler-server. The inner loops are created on
a per-change-set basis and are only spun up and down by requests from
gobbler-clients. The inner loops provide the core work for the
"gobbler". However, in this commit, all the inner loops do is directly
return the message they were sent. This is done for testing purposes to
ensure that all the scaffolding is there and sound.

As far as the inner RabbitMQ communications go, this commit exposes
"OffsetSpecification" for consumers. It is unclear if this will be a
temporary change or not. We should know the exact offset of type u64 in
the future, but for now, we are using what works for testing purposes.
For context, these changes were driven by tests in the gobbler-client library.

As a bonus, this commit also adds si-rabbitmq testin to protect against
regressions. It's minimal for now and may remain that way.

Signed-off-by: Nick Gerace <nick@systeminit.com>
Signed-off-by: Nick Gerace <nick@systeminit.com>
Rename gobbler to rebaser to avoid any unintentional connotations as
well as to provide clarity to the service's purpose.

Signed-off-by: Nick Gerace <nick@systeminit.com>
Primary:
- Use graph logic in the rebaser via a round-trip loop from a dal
  integration test
- Add rebaser-core to ensure that rebaser-client never relies on the dal
- Ensure change set loop deliveries reply to the clients if a failure
  occurs when processing a delivery
- Add timeouts when clients are waiting for rebaser replies
- Add rebaser-server to dal-test and si-test-macros

Secondary:
- Rename temporary change set struct to change set pointers
- Add change set pointers table, but allow for unit tests to continue
  using in-memory versions (i.e. use "new_local()" for construction)
- Add "query_none" for queries where we expect no rows to be returned

Signed-off-by: Nick Gerace <nick@systeminit.com>
Fix buck2 compilation errors by moving rebaser-client tests from unit
tests in the rebaser-core crate to integration tests in the
rebaser-server crate.

Not only does this make more sense from a domain perspective (why would
rebaser-core host thoses tests?), but now we can prevent cyclic
dependencies by using the integration test buck2 rule for clean
separation.

Signed-off-by: Nick Gerace <nick@systeminit.com>
The `EdgeWeightKind` variants aside from `Uses` were all using more
imperative style naming. This renames `Uses` to `Use` to more closely
match that style of naming.
The "kind" was changed to also include the content hash a while back,
and the combined kind + hash is what we're calling a `ContentAddress`.
This renames the vaiable to reflect this refactoring.

The `Debug` implementation for `ContentNodeWeight` was copied from
before `NodeWeight` was refactored into being a wrapper enum around
inner types, so now we're using the correct name for the struct in the
debug output.
Fix compilation error in rebaser integration test where the
"ContentAddress" export location changed.

Signed-off-by: Nick Gerace <nick@systeminit.com>
This commit adds the content-store and content-store-test crates, which
primarily revolve around the "Store" trait. They also provide the
ability to perform migrations and database setup for "PgStore", a user
of the aforementioned crate.

Why two crates? The test database setup and migration logic should
never be used in production, but it should remain portable. Thus, the
content-store-test crate exists.

The "PgStore" can have three different backing databases:
"si_content_store", "si_test_content_store", and
"si_test_content_store_<id>". The first is the database used for
production. The second is the global database used for integration
tests and the final copies its contents on a test-by-test basis.

The new content-store crate brings over the "content" module contents
from the dal. That includes the "ContentPair" concept as well as the
primitives migration.

Currently, there are two users of the "Store" trait:

- "LocalStore": exists entirely in-memeory and can be used everywhere
  (currently used for unit tests, but may have other use cases in the
  future)
- "PgStore": leverages pull-through caching, offers persistent writes
  to Postgres and can be used everwhere so long as _someone_ creates
  the database and performs migrations

Let's talk about integration tests then since they are the sole users of
the "PgStore" currently. The dal-test macro now includes setup for the
global "si_test_content_store" database. It mirrors the "si_test_dal"
database setup path, but is provided via a single client from the
content-store-test crate.

There's a catch: the "DalTestPgStore" wrapper should be used in dal
integration tests because dal-test does not provide a "PgStore" for
every single test yet. Rather than performing surgery in dal-test and
si-test-macros' expand module, we'll create "PgStores" with accompanying
test-specific databases on a case-by-case basis for now.

This commit contains secondary changes encountered during refactoring as
well:

- Remove "si_test_rebaser" and rebaser integration tests since the
  rebaser mainly uses dal logic and few meaningful tests can currently
  be conducted with the rebaser on its own
- Reduce log level for the rebaser skip message during test setup
- ChangeSetPointer and WorkspaceSnapshot queries, structs and migrations
  have slightly changed to adapt to the new changes
- Where relevant, we implement "TryFrom<PgRow>" instead of using json
  serialization for speed
- Co-locate all integration tests for the "Mostly Everything is a Node
  or an Edge" work

Signed-off-by: Nick Gerace <nick@systeminit.com>
Primary:
- Add content store to DalContext as PgStore instead of a generic that
  implements Store
  - This is something we should consider once "Mostly Everything is a
    Node or an Edge" is done since we do not want to add generic trait
    bounds to DalContext with everything else that's being refactored
- Use the new content store off the DalContext in the content store
  integration test
- Use the content store from the test context in the dal test macro
  expansion
  - For now, we keep the dal context builder immutable, but in the
    future, we should get rid of the new
    "build_default_with_content_store" method and allow the builder to
    be mutable (which would allow us to use a setter method and THEN use
    "build_default")
- Provide a content store for every integration test rather than on a
  case-by-case basis

Secondary:
- Rename PgMigrationHelpers to PgStoreTools and ensure the module's name
  reflects the change
- Make PgStore cloneable to satisfy DalContext, but don't change the
  Store trait or LocalStore to do the same

Signed-off-by: Nick Gerace <nick@systeminit.com>
nickgerace and others added 11 commits February 22, 2024 13:53
This commit restores the "force new" functionality for change sets in
the new engine. It uses the same interface through the existing
"ChangeSet" struct, but goes through the new "ChangeSetPointer"
interface under the hood.

As a result of this change, existing logic attempting to coerce change
set pointer data into visibility has been unified to one method.

Signed-off-by: Nick Gerace <nick@systeminit.com>
This commit restores the ability to update and list secrets. Updating
secrets remains the same, but now the referential secrets on the
snapshot are updated. Listing secrets uses the same interfaces, but now
the graph is purely used and mimics the original logic (this first pass
is likely unoptimized as a result).

This is also the first time integration tests have been restored that
aren't part of the "new_engine" module! The secrets tests return. We
needed to expose "WorkspaceSignup" from the "dal-test" crate as a
result, since "WorkspaceSignup" is still in the "dal" on main. The
"test_harness" module has been partially restored as well.

Signed-off-by: Nick Gerace <nick@systeminit.com>
Fix content pair creation races by only writing out and not returning
anything. We do not use long-running transactions with the content
store, and since there may be many users of the content store, we should
not be picky when writing out. Simply write out, do nothing if the pair
already exists, and move on.

Signed-off-by: Nick Gerace <nick@systeminit.com>
Co-authored-by: Paulo Cabral Sanz <paulo@systeminit.com>
When ordinal edges were added to the system as part of the dependent
values update work, three unit tests regressed. This commit returns
optional ordinal edge information from "add_ordered_edge" without
performing any additional fetch logic. The three tests then use that
information to assemble the missing edge.

Signed-off-by: Nick Gerace <nick@systeminit.com>
We ack with progress when pulling a message off of the stream in the
rebaser. However, we do not finishing acking the message once done. This
probably needs a further look for the long term, but this will stop the
bleeding in the interim.

Signed-off-by: Nick Gerace <nick@systeminit.com>
Co-authored-by: Victor Bustamante <victor@systeminit.com>
Signed-off-by: Victor Bustamante <victor@systeminit.com>
Co-authored-by: Nick Gerace <nick@systeminit.com>
Signed-off-by: Victor Bustamante <victor@systeminit.com>
Co-authored-by: Nick Gerace <nick@systeminit.com>
Signed-off-by: Victor Bustamante <victor@systeminit.com>
Co-authored-by: Nick Gerace <nick@systeminit.com>
This commit fixes the assembly of before funcs for a component. In
addition, it as a new test exclusive schema for testing secret
definitions and before funcs.

Primary changes:
- Fix before funcs assembly by looking at the direct child of the
  "/root/secrets" prop in the secret defining schema variant rather
  than looking at the "/root/secrets" prop itself
- Move auth func getter logic to its own private function

Test changes:
- Add "BethesdaSecret" test exclusive schema that contains a secret
  definition and a dummy qualification that helps test that the secret
  subsystem is working
- Add integration test in a new module to test that the secret
  definition subsystem is working

Misc changes:
- Add new method for prop to only fetch a single child prop or fail
  (existing places that checked for a single child prop now use this
  method)
- Fix misc lints related to unused imports

Signed-off-by: Nick Gerace <nick@systeminit.com>
Co-authored-by: Victor Bustamante <victor@systeminit.com>
This commit removes the override schema builtin feature flag. It also
centralizes accessor patterns for property editor values.

Signed-off-by: Nick Gerace <nick@systeminit.com>
@nickgerace nickgerace changed the title Convert mostly everything to become a node or an edge Switch to the new engine Mar 4, 2024
nickgerace and others added 2 commits March 4, 2024 17:20
Signed-off-by: Nick Gerace <nick@systeminit.com>
Co-authored-by: Scott Prutton <scott@systeminit.com>
Co-authored-by: Zachary Hamm <zack@systeminit.com>
Co-authored-by: Paul Stack <paul@systeminit.com
@nickgerace nickgerace marked this pull request as ready for review March 4, 2024 23:02
@nickgerace nickgerace force-pushed the new-engine branch 2 times, most recently from 83dc257 to df8ce28 Compare March 4, 2024 23:09
@nickgerace
Copy link
Copy Markdown
Contributor Author

bors try

si-bors-ng Bot added a commit that referenced this pull request Mar 4, 2024
@si-bors-ng
Copy link
Copy Markdown
Contributor

si-bors-ng Bot commented Mar 4, 2024

try

Build failed:

This commit contains clippy lints, fixes and general cleanup for making
the contents of the "new-engine" branch become main.

Signed-off-by: Nick Gerace <nick@systeminit.com>
Co-authored-by: Zachary Hamm <zack@systeminit.com>
Co-authored-by: Paulo Cabral Sanz <paulo@systeminit.com>
Co-authored-by: Scott Prutton <scott@systeminit.com>
Co-authored-by: Paul Stack <paul@systeminit.com>
Co-authored-by: Brit Myers <brit@systeminit.com>
Co-authored-by: Fletcher Nichol <fletcher@systeminit.com>
Co-authored-by: Adam Jacob <adam@systeminit.com>
@nickgerace
Copy link
Copy Markdown
Contributor Author

bors merge

@si-bors-ng
Copy link
Copy Markdown
Contributor

si-bors-ng Bot commented Mar 4, 2024

Build succeeded:

@si-bors-ng si-bors-ng Bot merged commit d7517d0 into main Mar 4, 2024
@si-bors-ng si-bors-ng Bot deleted the new-engine branch March 4, 2024 23:43
si-bors-ng Bot added a commit that referenced this pull request Mar 29, 2024
3487: Update Rust dependencies since merging `new-engine` into `main` (ENG-2428) r=nickgerace a=nickgerace

## Description

This PR updates every Rust-based dependency by its major, minor or patch version by default. It also moves the `postcard` version declaration from `dal` to the main `Cargo.toml` since the `alloc` feature is a subset of the `use-std` feature.

<img src="https://media1.giphy.com/media/26u4n5NamL40WyB1u/giphy.gif"/>

## Exceptions

Given the large scope of these changes, this PR does not seek to change any Rust code or fixups barring a small change regarding`EnumVariantNames`. Exceptions to the goal of this PR include the following:

- Hold `rustls`, `webpki-roots`, `tokio-vsock` and `yrs` to their minor versions due to API changes
- Hold `nix` and `ring` to their current versions due to needing buck2-specific changes
- Hold all hyper v1 dependencies to their minor versions due to a longer term effort needed to get us to hyper v1
- Hold `serde_yaml` to its current version due to being unmaintained as of [`0.9.34`](https://github.com/dtolnay/serde-yaml/releases/tag/0.9.34)

## Note

As a reminder, the `new-engine` was merged into `main` as of #3113.

Co-authored-by: Nick Gerace <nick@systeminit.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants