Centralized metadata storage #5

Closed
wlandau opened this issue Mar 29, 2020 · 14 comments

Comments

@wlandau
Collaborator

wlandau commented Mar 29, 2020

For performance. Need some kind of database.

@wlandau
Collaborator Author

wlandau commented Mar 30, 2020

Cache metadata in memory as well as in storage.

This was referenced Mar 30, 2020
@wlandau
Collaborator Author

wlandau commented Mar 30, 2020

Affects the targets and the store. Maybe we need another class. But then it might smell of a data class.

@wlandau
Collaborator Author

wlandau commented Mar 30, 2020

Order of implementation:

  1. New metadata class
  2. New metadata field of target class
  3. Centralized metadata storage
  4. In-memory cache of stored metadata
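The four steps above could be sketched like this: a metadata record constructor, a central store, and an in-memory cache. This is a hypothetical illustration of the plan, not the actual targets API; all names (`metadata_new()`, `metadata_store_new()`) are made up.

```r
# Hypothetical sketch of steps 1-4: a metadata record constructor
# plus a centralized store backed by an in-memory environment.
# All names are illustrative, not the real targets API.
metadata_new <- function(name, hash, seconds = NA_real_) {
  list(name = name, hash = hash, seconds = seconds)
}

metadata_store_new <- function() {
  cache <- new.env(parent = emptyenv())  # in-memory cache of records
  list(
    set = function(record) assign(record$name, record, envir = cache),
    get = function(name) get(name, envir = cache, inherits = FALSE),
    names = function() ls(cache)
  )
}

store <- metadata_store_new()
store$set(metadata_new("x", hash = "abc123", seconds = 0.5))
store$get("x")$hash  # "abc123"
```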

@wlandau
Collaborator Author

wlandau commented Mar 31, 2020

Everything in a target is metadata except for the key and command. A new metadata level would be awkward. Fields of a target:

  • cmd
  • key
  • hash_deps
  • hash_value
  • format
  • seed
  • seconds
  • value

@wlandau
Collaborator Author

wlandau commented Mar 31, 2020

That's already a lot. Maybe we should decompose this.

  • key
  • cmd (already has subfields)
  • file (already has subfields)
  • result
    • value
    • seconds
    • warnings
    • error
  • settings
    • format
    • seed
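The decomposition above could look like this as a nested record. Field names are copied from the list; the constructor name and default values are hypothetical.

```r
# Sketch of the decomposed target record from the list above.
# target_new() is a hypothetical constructor, not the real API.
target_new <- function(key, cmd, file, value = NULL) {
  list(
    key = key,
    cmd = cmd,    # already has subfields
    file = file,  # already has subfields
    result = list(
      value = value,
      seconds = NA_real_,
      warnings = character(0),
      error = NULL
    ),
    settings = list(
      format = "rds",  # illustrative defaults
      seed = 0L
    )
  )
}

t <- target_new(key = "analysis", cmd = quote(run()), file = list(path = NULL))
names(t)  # "key" "cmd" "file" "result" "settings"
```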

@wlandau
Collaborator Author

wlandau commented Apr 2, 2020

Targets can now read and write their own values. They should be able to read and write their own metadata too.
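Mirroring how targets read and write their own values, per-target metadata I/O might look like the sketch below. Function names and the RDS-per-target layout are assumptions for illustration, not the actual design.

```r
# Sketch: per-target metadata read/write, mirroring value storage.
# Function names and file layout are hypothetical.
target_metadata_path <- function(key, dir = tempdir()) {
  file.path(dir, paste0(key, "_meta.rds"))
}
target_write_metadata <- function(key, metadata, dir = tempdir()) {
  saveRDS(metadata, target_metadata_path(key, dir))
}
target_read_metadata <- function(key, dir = tempdir()) {
  readRDS(target_metadata_path(key, dir))
}

target_write_metadata("x", list(hash = "abc", seconds = 0.2))
target_read_metadata("x")$hash  # "abc"
```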

@wlandau changed the title from "Centralized metadata" to "Centralized metadata storage" Apr 2, 2020
@wlandau
Collaborator Author

wlandau commented Apr 4, 2020

Dependency tracking and hashing needs to be centralized too.

@wlandau
Collaborator Author

wlandau commented Apr 4, 2020

No need to reconstruct a whole target class from storage either. But it needs to be planned out thoughtfully. Need to sit back and let this one linger.

@wlandau
Collaborator Author

wlandau commented Apr 11, 2020

Let's look at RSQLite for this.

@wlandau
Collaborator Author

wlandau commented Apr 22, 2020

Let's consider a custom txtq-like database without filelock or base64url. It could even give us automatic history if we want it (though we should probably opt out so storage doesn't run away from us). txtq and especially RSQLite are slow and heavy.

Another thing: let's keep a copy of the current DB in memory (at least the relevant records) and push in bulk at regular time intervals instead of transacting once per target. That should scale better in terms of speed.
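The buffered approach could be sketched as below: keep records in memory and append to a flat file in bulk instead of once per target. All names are illustrative, and the flush here is triggered by record count rather than the time interval suggested above, purely for simplicity.

```r
# Sketch of buffered bulk writes to an append-only flat file.
# meta_buffer_new() and the pipe-delimited format are hypothetical.
meta_buffer_new <- function(path, flush_every = 100L) {
  buffer <- list()
  flush <- function() {
    if (length(buffer)) {
      con <- file(path, open = "a")
      writeLines(unlist(buffer), con)
      close(con)
      buffer <<- list()
    }
  }
  record <- function(line) {
    buffer[[length(buffer) + 1L]] <<- line
    if (length(buffer) >= flush_every) flush()  # bulk push, not per target
  }
  list(record = record, flush = flush)
}

path <- tempfile()
db <- meta_buffer_new(path, flush_every = 2L)
db$record("x|hash1")
db$record("y|hash2")  # second record triggers a bulk flush
db$flush()            # no-op: buffer already empty
readLines(path)       # "x|hash1" "y|hash2"
```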

@wlandau
Collaborator Author

wlandau commented Apr 22, 2020

fstpackage/fst#91 would be super nice for this.

@wlandau
Collaborator Author

wlandau commented May 15, 2020

The meta class should decide whether targets are up to date. Algorithms get metas as objects. Subclasses of meta include the null meta (no dependency watching; should be the default for testing), the data.table-based meta (the default for users), and other database-backed metas that could be more performant in some cases. Nontrivial metas need config files to denote the class.
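The proposed hierarchy could be sketched with base S3 classes: a null meta that never tracks dependencies, and a table-backed meta (a base data.frame stands in for data.table here). Class and function names are hypothetical.

```r
# Sketch of the meta hierarchy: null meta vs. table-backed meta.
# A base data.frame stands in for data.table; names are hypothetical.
meta_null <- function() structure(list(), class = c("meta_null", "meta"))
meta_table <- function() {
  records <- data.frame(name = character(0), hash = character(0),
                        stringsAsFactors = FALSE)
  structure(list(records = records), class = c("meta_table", "meta"))
}

# The meta decides whether a target is up to date.
up_to_date <- function(meta, name, hash) UseMethod("up_to_date")
up_to_date.meta_null <- function(meta, name, hash) FALSE  # always rebuild
up_to_date.meta_table <- function(meta, name, hash) {
  old <- meta$records$hash[meta$records$name == name]
  length(old) == 1L && identical(old, hash)
}

m <- meta_table()
m$records <- rbind(m$records, data.frame(name = "x", hash = "h1",
                                         stringsAsFactors = FALSE))
up_to_date(m, "x", "h1")           # TRUE
up_to_date(m, "x", "h2")           # FALSE: hash changed
up_to_date(meta_null(), "x", "h1") # FALSE: null meta never skips
```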

@wlandau
Collaborator Author

wlandau commented May 15, 2020

But all we need to worry about for quite some time will be the trivial meta and the data.table-based meta. And that's only to make testing faster.

@wlandau
Collaborator Author

wlandau commented May 23, 2020

Got a meta class. Next steps:

  • Begin testing.
  • Make a class for metadata records.
  • meta$produce_record()
  • Test produce_record() for builders and branches.
  • hash_imports() should return a data frame of names, hashes, and classes.
  • Implement meta$record_imports() to record the import hashes. Correction: we just need meta$set_data(hash_imports(envir)).
  • Factor metadata objects into algorithm objects.
  • Initialize meta objects inside algorithms. Use record_imports() to jot down environment hashes.
  • Record metadata after each target's build. Use meta$record_target(target).
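The hash_imports() step in the list above could look like the sketch below: walk an environment and return a data frame of names, hashes, and classes. The toy byte-sum hash stands in for a real hash function (e.g. digest::digest), and the function body is an assumption, not the actual implementation.

```r
# Sketch of hash_imports(): one row per import with name, hash, class.
# hash_object() is a toy stand-in for a real hash like digest::digest.
hash_object <- function(x) {
  bytes <- serialize(x, connection = NULL, version = 2)
  sprintf("%08x", sum(as.integer(bytes)) %% .Machine$integer.max)
}

hash_imports <- function(envir) {
  objects <- ls(envir)
  data.frame(
    name = objects,
    hash = vapply(objects, function(n) hash_object(get(n, envir = envir)),
                  character(1)),
    class = vapply(objects, function(n) class(get(n, envir = envir))[1L],
                   character(1)),
    stringsAsFactors = FALSE
  )
}

envir <- new.env()
envir$f <- function(x) x + 1
envir$data <- 1:3
hash_imports(envir)  # one row per import: name, hash, class
```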
