Add metadata time traveling #83

criccomini merged 1 commit into main from add-as-of-api on Jan 9, 2023

criccomini
Contributor

Recap now supports time traveling when browsing the catalog. Users can provide
an `--as-of` timestamp to see what databases, tables, schemas, and metadata
looked like at a point in time. Let's look at some CLI examples!

List tables as of 2022-12-01:

    recap catalog list \
        /databases/postgresql/instances/localhost/schemas/public/tables \
        --as-of '2022-12-01'

Read the metadata for `table` as it looked on 2022-12-10 01:01:01:

    recap catalog read \
        /databases/postgresql/instances/localhost/schemas/public/tables/table \
        --as-of '2022-12-10 01:01:01'

Search for all table metadata in `some_db` as it looked on 2022-10-12 04:23:43:

    recap catalog search \
        "json_extract(metadata, '$.location.schema') = 'some_db'" \
        --as-of '2022-10-12 04:23:43'
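
To make the idea behind `--as-of` concrete, here is a minimal sketch of a point-in-time lookup against an append-only SQLite table. The `catalog` table, its `created_at`/`deleted_at` columns, and the `read_as_of` helper are assumptions made up for illustration; they are not Recap's actual schema or API.

```python
import sqlite3
from datetime import datetime

# Hypothetical schema: one row per metadata version, never updated in place.
SCHEMA = """
CREATE TABLE catalog (
    path       TEXT NOT NULL,
    metadata   TEXT NOT NULL,
    created_at TEXT NOT NULL,  -- when this version was written
    deleted_at TEXT            -- NULL while this version is still current
)
"""


def read_as_of(conn, path, as_of=None):
    """Return the metadata JSON for `path` as it looked at `as_of` (default: now)."""
    moment = as_of if as_of is not None else datetime.now()
    # Same-format ISO timestamps sort lexicographically, so string comparison works.
    ts = moment.isoformat(sep=" ", timespec="seconds")
    row = conn.execute(
        """
        SELECT metadata FROM catalog
        WHERE path = ?
          AND created_at <= ?
          AND (deleted_at IS NULL OR deleted_at > ?)
        ORDER BY created_at DESC
        LIMIT 1
        """,
        (path, ts, ts),
    ).fetchone()
    return row[0] if row else None


# Illustrative data: the table gained a column on 2022-12-15.
TABLE_PATH = "/databases/postgresql/instances/localhost/schemas/public/tables/table"
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO catalog VALUES (?, ?, ?, ?)",
    [
        (TABLE_PATH, '{"columns": ["id"]}', "2022-12-01 00:00:00", "2022-12-15 00:00:00"),
        (TABLE_PATH, '{"columns": ["id", "name"]}', "2022-12-15 00:00:00", None),
    ],
)

print(read_as_of(conn, TABLE_PATH, datetime(2022, 12, 10, 1, 1, 1)))
```

With this layout, a read as of 2022-12-10 01:01:01 sees the older version, a read before 2022-12-01 sees nothing, and a read without `as_of` sees the latest version.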

Of course, these changes flow through to AbstractCatalog, DatabaseCatalog, and
RecapCatalog as well. I also had to update the FastAPI server to support
`as_of` query parameters.
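
As a rough picture of what threading `as_of` through a catalog interface can look like, the sketch below defines an abstract class with an optional `as_of` argument and an in-memory implementation. `TimeTravelCatalog`, `InMemoryCatalog`, and `parse_as_of` are hypothetical names for illustration only; the real AbstractCatalog, DatabaseCatalog, and RecapCatalog signatures may differ.

```python
from abc import ABC, abstractmethod
from bisect import bisect_right
from datetime import datetime
from typing import Any, Dict, List, Optional


class TimeTravelCatalog(ABC):
    """Hypothetical interface; not Recap's actual AbstractCatalog."""

    @abstractmethod
    def write(self, path: str, metadata: Dict[str, Any], at: datetime) -> None:
        ...

    @abstractmethod
    def read(
        self, path: str, as_of: Optional[datetime] = None
    ) -> Optional[Dict[str, Any]]:
        ...


class InMemoryCatalog(TimeTravelCatalog):
    def __init__(self) -> None:
        # Per path: write times (kept sorted) and the matching metadata versions.
        self._times: Dict[str, List[datetime]] = {}
        self._versions: Dict[str, List[Dict[str, Any]]] = {}

    def write(self, path, metadata, at):
        times = self._times.setdefault(path, [])
        versions = self._versions.setdefault(path, [])
        i = bisect_right(times, at)  # keep both lists ordered by write time
        times.insert(i, at)
        versions.insert(i, metadata)

    def read(self, path, as_of=None):
        times = self._times.get(path, [])
        if not times:
            return None
        if as_of is None:
            return self._versions[path][-1]  # no as_of means "latest"
        i = bisect_right(times, as_of)  # count of versions written at or before as_of
        return self._versions[path][i - 1] if i else None


def parse_as_of(value: Optional[str]) -> Optional[datetime]:
    """Parse an as_of string such as '2022-12-01' or '2022-12-10 01:01:01'."""
    return datetime.fromisoformat(value) if value else None


catalog = InMemoryCatalog()
catalog.write("/tables/table", {"columns": ["id"]}, datetime(2022, 12, 1))
catalog.write("/tables/table", {"columns": ["id", "name"]}, datetime(2022, 12, 15))
print(catalog.read("/tables/table", parse_as_of("2022-12-10 01:01:01")))
```

Defaulting `as_of` to `None` keeps existing callers working unchanged, which is one plausible reason to make the parameter optional at every layer, including an HTTP query parameter.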

The FilesystemCatalog is broken right now. I'll need to fix it or ditch it;
saving that for a subsequent PR.
@criccomini criccomini merged commit 1d522c3 into main Jan 9, 2023
@criccomini criccomini deleted the add-as-of-api branch January 9, 2023 23:00
criccomini added a commit that referenced this pull request Jan 10, 2023
I've removed FilesystemCatalog. The time travel changes in #83 have complicated
things on the filesystem. I thought of a few ways to implement this using
clever directory structures, but I don't think it's worth it right now. I'd
rather spend time on other things.

A side benefit of removing it is even fewer dependencies: no `fsspec` or `jq`
needed.