Fixture generator tool for tests #195

Merged: 11 commits into facebook:main, May 17, 2022

Conversation

@afterdusk (Member) commented May 12, 2022

Users of AKD might want to run integration tests against a fixed state of the directory. To this end, this PR adds a CLI tool that generates "fixture" files containing entire DB states, as well as directory updates between states. Users can then load this file to set up directories with fixed states, apply a given changeset, and assert that the output directory is as expected as part of their tests.

Follow-Up Tasks

  • Fixture reader so that users do not need to write their own parser.
  • Support using a database other than the in-memory one. (Not necessary; see discussion.)
  • Support using a custom VRF key rather than the hardcoded one.
  • Support using a custom hasher rather than Blake3.
  • Better CLI binary name - it currently adopts the crate name (akd_integration_tests).
  • Error handling instead of unwraps.
  • Add mode to allow specification of a number of users to randomly generate updates for, without needing to list out the epochs or usernames.

I will create an issue tracking these tasks after this PR is approved.

Options

The available configuration parameters can be accessed with the --help flag:

akd_integration_tests 
This tool allows a directory to be created with specified and random contents, capturing the
directory state and epoch-to-epoch delta in an output file for use in debugging and as test fixtures

USAGE:
    akd_integration_tests [OPTIONS] --epochs <EPOCHS>

OPTIONS:
    -u, --user <USERS>
            Users and their associated update events. A username is expected, followed by a colon
            and a list of epochs OR (epoch, value). Usernames are expected to be utf-8 strings,
            which will be internally interpreted as bytes. The following are valid examples of user
            arguments: --user "username: 1, 3, (5, 'xyz')" --user="username: [(1,'abc'), 2]" -u
            "some username: 1"

    -e, --epochs <EPOCHS>
            Number of epochs to advance the tree by e.g. a value of 3 will perform 3 publishes on an
            empty directory

        --max_updates <MAX_UPDATES>
            Maximum number of updates **per epoch** the tool should perform. Note that all user
            events explicitly passed for an epoch will be included even if the number exceeds this
            value [default: 10]

        --min_updates <MIN_UPDATES>
            Minimum number of updates **per epoch** the tool should perform. The tool will generate
            random labels and values to include in an epoch if the user events explicitly passed for
            an epoch are not sufficient [default: 0]

    -s, --capture_states <CAPTURE_STATES>...
            Epochs where the state of the directory should be captured in the output e.g. the value
            3 will output all db records after epoch 3 is performed. Multiple values should be
            separated with ","

    -d, --capture_deltas <CAPTURE_DELTAS>...
            Epochs where the updates required to bring the directory to the epoch should be captured
            in the output. e.g. the value 3 will output all updates that were performed to advance
            the directory from epoch 2 to 3. Multiple values should be separated with ","

    -o, --out <OUT>
            Name of output file. If omitted, output will be printed to stdout

    -n, --no_generated_updates
            Stops tool from generating random updates in publishes. Use this if you want the tool to
            only use explicitly passed updates. Explicitly passed updates without values would still
            use randomly generated values

    -h, --help
            Print help information

Example

The following is a simple run with just 2 users and 1 epoch, since the fixture file gets pretty large for bigger directories. The upcoming compression of the directory state in #189 should help with this.

cargo run -- \
    --user "User1: (1,’abc')" \
    --user "User2: 1" \
    --epochs 1 \
    --max_updates 2  \
    --capture_states 1 \
    --capture_deltas 1

Output:

# @generated This file was automatically generated by 
# the fixture generator tool with the following command:
# 
# cargo run -- \
#   --user User1: (1,'abc') \
#   --user User2: 1 \
#   --epochs 1 \
#   --max_updates 2 \
#   --capture_states 1 \
#   --capture_deltas 1 \

# Metadata
---
args:
  users:
    - label: "5573657231"
      events:
        - epoch: 1
          value: ~
    - label: "5573657232"
      events:
        - epoch: 1
          value: ~
  epochs: 1
  max_updates: 2
  min_updates: 0
  capture_states:
    - 1
  capture_deltas:
    - 1
  out: ~
  no_generated_updates: false
version: 0.0.0

# Delta - Epoch 1
---
epoch: 1
updates:
  - - "5573657232"
    - 496F684550476557707A46374548623253566F525A534246706A4745674A4876
  - - "5573657231"
    - 37473842516A656D4F565951376565573357636F57584A43334C456E52794D5A

# State - Epoch 1
---
epoch: 1
records:
  - HistoryNodeState:
      value: BA04DF61914ED989AA49483C51AA65A06700F4481EF6D335629C2CF713BFBA06
      child_states:
        - label:
            val: 0103D3A12A093B1D1D0A39EC77C4B20185CE3384E22CC25BCFC4563D54C5FC7E
            len: 256
          hash_val: 6C724ABB2EAA5FF1934E95D40136BDD0F3089592AEA7C8BDD4E6845F0B27FE13
          epoch_version: 1
        - label:
            val: 7A16746720D4DBAAF15D636AE7A638731203E25FC1FF76E07E347A25D6343F1D
            len: 256
          hash_val: DC3EB184BC9F94426271D9A6721EB3329468258A0A9023E1EFEC452D85937339
          epoch_version: 1
      key:
        - val: "0000000000000000000000000000000000000000000000000000000000000000"
          len: 1
        - 1
  - HistoryNodeState:
      value: AC4E406A635C407964A941871D3A0293BAC6BF721EA314D90C25975C25592FBE
      child_states:
        - label:
            val: "0000000000000000000000000000000000000000000000000000000000000000"
            len: 1
          hash_val: 024356D80D8DBD4D2017F31A4DC2A714C605EA0E27E903E9E3F40E09251E955C
          epoch_version: 1
        - ~
      key:
        - val: "0000000000000000000000000000000000000000000000000000000000000000"
          len: 0
        - 1
  - HistoryTreeNode:
      label:
        val: 0103D3A12A093B1D1D0A39EC77C4B20185CE3384E22CC25BCFC4563D54C5FC7E
        len: 256
      last_epoch: 1
      birth_epoch: 1
      parent:
        val: "0000000000000000000000000000000000000000000000000000000000000000"
        len: 1
      node_type: Leaf
  - Azks:
      latest_epoch: 1
      num_nodes: 4
  - HistoryTreeNode:
      label:
        val: "0000000000000000000000000000000000000000000000000000000000000000"
        len: 1
      last_epoch: 1
      birth_epoch: 1
      parent:
        val: "0000000000000000000000000000000000000000000000000000000000000000"
        len: 0
      node_type: Interior
  - HistoryNodeState:
      value: 2D3ADEDFF11B61F14C886E35AFA036736DCD87A74D27B5C1510225D0F592E213
      child_states:
        - ~
        - ~
      key:
        - val: "0000000000000000000000000000000000000000000000000000000000000000"
          len: 0
        - 0
  - HistoryTreeNode:
      label:
        val: "0000000000000000000000000000000000000000000000000000000000000000"
        len: 0
      last_epoch: 1
      birth_epoch: 0
      parent:
        val: "0000000000000000000000000000000000000000000000000000000000000000"
        len: 0
      node_type: Root
  - HistoryTreeNode:
      label:
        val: 7A16746720D4DBAAF15D636AE7A638731203E25FC1FF76E07E347A25D6343F1D
        len: 256
      last_epoch: 1
      birth_epoch: 1
      parent:
        val: "0000000000000000000000000000000000000000000000000000000000000000"
        len: 1
      node_type: Leaf
  - HistoryNodeState:
      value: 1ABF704F58B8807A5F7E971BCC2B23CBDBF90A146F02C9EBA7D6E1CAC179248F
      child_states:
        - ~
        - ~
      key:
        - val: 0103D3A12A093B1D1D0A39EC77C4B20185CE3384E22CC25BCFC4563D54C5FC7E
          len: 256
        - 1
  - HistoryNodeState:
      value: 3389E70D92C1D519762F293EC941A5EB5D6D5F99DFBF56B7756DDC0FC4F6A754
      child_states:
        - ~
        - ~
      key:
        - val: 7A16746720D4DBAAF15D636AE7A638731203E25FC1FF76E07E347A25D6343F1D
          len: 256
        - 1
  - ValueState:
      plaintext_val: 37473842516A656D4F565951376565573357636F57584A43334C456E52794D5A
      version: 1
      label:
        val: 7A16746720D4DBAAF15D636AE7A638731203E25FC1FF76E07E347A25D6343F1D
        len: 256
      epoch: 1
      username: "5573657231"
  - ValueState:
      plaintext_val: 496F684550476557707A46374548623253566F525A534246706A4745674A4876
      version: 1
      label:
        val: 0103D3A12A093B1D1D0A39EC77C4B20185CE3384E22CC25BCFC4563D54C5FC7E
        len: 256
      epoch: 1
      username: "5573657232"

@slawlor (Contributor) left a comment:

I think there are some general ideas that should be investigated here, but as a starter (and since it's a test tool) I don't have any problems.

The one I would like investigated before merging is the usage of serde_with, as we're adding another dep. It's not the end of the world, but if we could get away without it, that would be ideal. Thanks for the great work!

Review threads (outdated, resolved):
  • akd/Cargo.toml
  • akd/src/node_state.rs
  • integration_tests/src/fixture_generator/parser.rs
let comment = METADATA_COMMENT.to_string();
let metadata = Metadata {
    args: args.clone(),
    version: env!("CARGO_PKG_VERSION").to_string(),
A contributor commented on the excerpt above:

So this is great for compat, but it also means we should start versioning this crate, and every commit on this crate should bump the version number. When we support reading the metadata back, we'll need versioned readers as well and should come up with a pattern for what we support.

Example:

  1. We won't support reading major version differences
  2. Minors are backwards compatible, not forwards compatible
  3. Build versions are backwards and forwards compatible (i.e. no schema change)

This should also go in a README if we're going to try to do this (it can be added when we actually read this back in, however).
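
As an illustration of that policy, a reader could gate on the version string roughly as follows. This is only a sketch under the assumption that versions are plain "major.minor.build" strings; it is not code from this PR.

fn fixture_is_readable(reader_version: &str, fixture_version: &str) -> bool {
    // Parse a "major.minor.build" string into its numeric components.
    let parse = |v: &str| -> Option<(u32, u32, u32)> {
        let mut parts = v.split('.').map(|p| p.parse::<u32>().ok());
        Some((parts.next()??, parts.next()??, parts.next()??))
    };
    match (parse(reader_version), parse(fixture_version)) {
        (Some((r_major, r_minor, _)), Some((f_major, f_minor, _))) => {
            // 1. Major versions must match exactly.
            // 2. Minors are backwards compatible only: the reader must be at
            //    least as new as the fixture that was written.
            // 3. Build differences are ignored (no schema change).
            r_major == f_major && r_minor >= f_minor
        }
        _ => false,
    }
}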

A contributor replied:

I'm additionally wondering if the comment-separated sections are going to be painful to parse downstream. Maybe it makes more sense to have a single root struct that the N components live in, and write a single object? Adding the header is no problem, but having a single struct should make deserialization a lot more straightforward, I would think. With the current layout, we'll need to identify each section from its comment and run a different parser each time, right?
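
For concreteness, the single-object layout being suggested might look something like the sketch below. The type and field names are illustrative stand-ins for the Metadata, Delta, and State sections the tool currently writes, not the actual structs in this PR.

use serde::{Deserialize, Serialize};

// Placeholder section types standing in for the real ones.
#[derive(Serialize, Deserialize)]
struct Metadata { version: String }
#[derive(Serialize, Deserialize)]
struct Delta { epoch: u64 }
#[derive(Serialize, Deserialize)]
struct State { epoch: u64 }

// Single root object: the whole fixture is written (and read back) with one
// serde_yaml call instead of being split into `---`-separated documents.
#[derive(Serialize, Deserialize)]
struct Fixture {
    metadata: Metadata,
    deltas: Vec<Delta>,
    states: Vec<State>,
}

The trade-off discussed below is that the whole Fixture would then have to be held in memory before anything can be written.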

A contributor replied:

Single object idea SGTM. I guess we'd need an order-preserving serialization/deserialization in that case to make sense out of the generated file.

@afterdusk (Member, Author) commented May 14, 2022:

@slawlor @eozturk1 I've thought about using a single struct to simplify parsing, but the main drawback is that we would have to hold the entire struct in memory until all data is populated and we are ready to write. serde_yaml (or serde_json for that matter) does not support writing a partial struct.

If we are capturing n states, this would mean holding n directories' worth of db records in memory at one time. This would limit the directory size of fixtures we can generate with the tool.

With the current approach, all I need to do is break the YAML up at the document start markers (---) to get independent objects. For each chunk, I'll try to parse it as one of the known structs, moving on to the next struct if parsing fails, and only give up after attempting all of them. If this proves hard for some reason, I can experiment with using YAML tags to denote the struct types.
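
A minimal sketch of that chunk-by-chunk parsing, assuming section structs that roughly mirror the example output above (the field sets here are placeholders, not the real types):

use serde::Deserialize;

// Skeletal stand-ins for the three section types.
#[derive(Debug, Deserialize)]
struct Metadata { version: String, args: serde_yaml::Value }
#[derive(Debug, Deserialize)]
struct Delta { epoch: u64, updates: serde_yaml::Value }
#[derive(Debug, Deserialize)]
struct State { epoch: u64, records: serde_yaml::Value }

#[derive(Debug)]
enum Section {
    Metadata(Metadata),
    Delta(Delta),
    State(State),
}

// Break the fixture at the YAML document start markers and try each known
// struct in turn, giving up on a chunk only after all of them have failed.
fn parse_fixture(raw: &str) -> Vec<Section> {
    raw.split("\n---\n")
        .skip(1) // text before the first marker is only the header comment
        .filter_map(|chunk| {
            serde_yaml::from_str::<Metadata>(chunk)
                .map(Section::Metadata)
                .ok()
                .or_else(|| serde_yaml::from_str::<Delta>(chunk).map(Section::Delta).ok())
                .or_else(|| serde_yaml::from_str::<State>(chunk).map(Section::State).ok())
        })
        .collect()
}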

Does this make sense?

@eozturk1 (Contributor) left a comment:

Added small comments, and a possible idea we can look into about storing node deltas to advance states without performing the actual computation.

The tests can start with a small tree, test the changes on it. Advance to a bigger tree by just applying the node delta, test the changes on the new tree etc. Let me know what you think.

Great work @afterdusk, thank you for working on this!

Review threads (outdated, resolved):
  • integration_tests/src/fixture_generator/generator.rs
  • integration_tests/src/fixture_generator/parser.rs
@afterdusk (Member, Author) replied:

Added small comments, and a possible idea we can look into about storing node deltas to advance states without performing the actual computation.

The tests can start with a small tree, test the changes on it. Advance to a bigger tree by just applying the node delta, test the changes on the new tree etc. Let me know what you think.

Ah, this was exactly my intention with this tool! A user can specify --capture_states 9,10 to get the directory state at epochs 9 and 10, and advance from 9->10 without performing actual computation by dropping 9's state and loading in 10's.

Capturing the entire directory state is less space efficient than just the node delta between epochs 9 and 10, but I think this gives rise to a simpler and more flexible design:

  • Firstly, we only need two types of structs - State and (update) Delta, no need for a third NodeDelta.
  • In order to advance a tree without computation, a user does not need to know how a node delta should be applied (i.e. what record should overwrite what). They can just dump the entire DB and load the next epoch's state.
  • Furthermore, if a user is only concerned about knowing the state for epochs far apart (say 1 and 10), we only need to output two state objects, rather than one state object and nine node delta objects.
  • Lastly, a node delta can be derived manually from diffing two states, so I would say the current design already implicitly contains the functionality.

Let me know if this makes sense!
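
To make the "drop 9's state and load in 10's" step concrete, a downstream test could do something along these lines. Everything here is a hypothetical stand-in: this is not the akd API, and the fixture reader is still listed as follow-up work.

use std::collections::HashMap;

// A toy storage layer standing in for whatever DB the test uses.
#[derive(Default)]
struct FakeStorage {
    records: HashMap<String, String>,
}

impl FakeStorage {
    // Replace the entire DB contents with one captured state. This is the
    // whole trick: advancing an epoch is just a wholesale swap of records.
    fn restore(&mut self, state: &[(String, String)]) {
        self.records.clear();
        self.records.extend(state.iter().cloned());
    }
}

fn main() {
    // Pretend these came from a fixture generated with --capture_states 9,10.
    let epoch_9 = vec![("azks".to_string(), "latest_epoch: 9".to_string())];
    let epoch_10 = vec![("azks".to_string(), "latest_epoch: 10".to_string())];

    let mut storage = FakeStorage::default();

    // Set up the directory as of epoch 9 and run assertions against it...
    storage.restore(&epoch_9);
    assert_eq!(storage.records["azks"], "latest_epoch: 9");

    // ...then jump to epoch 10 without performing the publish.
    storage.restore(&epoch_10);
    assert_eq!(storage.records["azks"], "latest_epoch: 10");
}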

@eozturk1 (Contributor) replied:

+1 on the list, great summary! I see the advantages of having NodeDeltas: they are space-efficient and they allow dynamic fixture generation. We could have a tool that consumes these deltas and generates fixtures of different sizes on the fly. The advantage would be that the user wouldn't need to decide what states to keep and discard at generation time. So I would say this is possible "future work" on top of the existing tool; please feel free to merge your changes! :)

@afterdusk afterdusk merged commit f754f16 into facebook:main May 17, 2022
@afterdusk afterdusk deleted the fixture-generator branch May 17, 2022 23:38
@afterdusk mentioned this pull request on May 17, 2022 (7 tasks) and on May 25, 2022.