feat: Add support for non-DB state backends (s3, dynamodb, etc.)
#5981
Comments
I think the plugin approach discussed in #6270 (comment) makes sense to explore further, particularly in light of the work on #6130.
I'm running Meltano in a container as well, but I'm having a hard time figuring out the best way to upload that file to S3 before the container is deleted. There's no on-run-end hook in Meltano, right?
@rickiesmooth which file are you asking about? There are a few ways I could imagine doing this, but there's currently no
Sorry, I was referring to the workaround and how it uploads the state file.
@rickiesmooth ah ok, I see now. You could have a tap and target that do that extract for you. It's possible there's already one for reading from a SQLite DB as well.
Ah, that would be nice. For now, after Meltano has run, I just do:
meltano state get dev:tap-google-search-console-to-target-bigquery > meltano_state/gsc_state.json
meltano state get dev:tap-google-analytics-to-target-bigquery > meltano_state/ga_state.json
aws s3 sync meltano_state s3://$S3_BUCKET/meltano_state
@rickiesmooth that's good to know! I think we will be building this into Meltano natively, though, as it's going to be a prerequisite for our future Managed offering.
There have been some synchronous discussions on this; documenting the results of those here. The v1 of this feature will use a third-party library (likely smart_open) that will let users configure state backends in the form of a simple URI.
Creating state backends is going to require decoupling state from job history, so we'll need to tackle #3340 before getting started on the actual state backend implementation.
This implementation will be done in such a way as to lay the foundation for user-contributed state backend plugins in some future iteration, but "pluggable" backends are out of scope for the time being. The URI approach solves for a huge number of use cases and takes us one step closer to eliminating the need for a Postgres backend in production deployments, without the heavy lift of supporting arbitrary plugins.
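To make the URI idea concrete, here is a minimal sketch of how a single configured URI could be split into a backend type and a storage location. It is not Meltano's implementation: the scheme list, the `StateBackendLocation` type, the function names, and the one-JSON-file-per-state-ID layout are all assumptions made for illustration, and the actual reads and writes (which a library such as smart_open would handle) are left out.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit

# Hypothetical scheme list: the thread names s3 as a target but does not
# enumerate the schemes the v1 will accept.
SUPPORTED_SCHEMES = frozenset({"file", "s3"})


@dataclass(frozen=True)
class StateBackendLocation:
    """Where state documents live, as derived from a backend URI."""

    scheme: str
    container: str  # bucket name for s3; empty for local files
    prefix: str  # directory or key prefix under which state is stored


def parse_state_backend_uri(uri: str) -> StateBackendLocation:
    """Split a backend URI into its parts, rejecting unknown schemes."""
    parts = urlsplit(uri)
    if parts.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported state backend scheme: {parts.scheme!r}")
    # Object-store keys have no leading slash; local paths keep theirs.
    prefix = parts.path if parts.scheme == "file" else parts.path.lstrip("/")
    return StateBackendLocation(parts.scheme, parts.netloc, prefix)


def state_key(location: StateBackendLocation, state_id: str) -> str:
    """Build the path or object key for one state ID's JSON document.

    Assumed layout: one JSON file per state ID under the prefix.
    """
    prefix = location.prefix.rstrip("/")
    return f"{prefix}/{state_id}.json" if prefix else f"{state_id}.json"
```

With this shape, changing backends only means changing one URI string, for example from a hypothetical `file:///tmp/meltano-state` to `s3://example-bucket/meltano/state`; the code that reads and writes state would not need to know which one is in use.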
As I've been getting into the weeds on this, it has become increasingly clear that the best approach is much more entangled with #3340 than initially thought, and a lot of this work will be front-loaded into that PR. I've also realized that I've spent a lot of time heads-down on this without any PRs or touch points, so here's a quick brain dump of the current status and timeline, along with a more thorough implementation spec, to make sure everyone is on the same page about what's being delivered here.
Status and Timeline
The refactoring work for this has turned out to touch a lot of the codebase and the existing test suite. Since we're refactoring jobs and state as part of #3340, the PR for that issue will include a basic v1 of the state backend approach, with systemdb as the first supported state backend. #3340's PR will have rewritten the way we manage state entirely, following basically the same pattern we use for managing settings. We're rewriting
Since this is a significant refactor and a major new feature, it's difficult to chunk up into small iterations. The current timeline for releasing StateBackends will likely be Iteration 16. It should be ready to go in the first release after that, and I'll have a demo ready as well.
Implementation Spec
StateService
In rolling out the
StateStoreManager
We're writing a new
Configuring
In this v1, the state backend will be configurable at the project level and will be a top-level key in
@cjohnhanson appreciate the write-up and context setting 👍
👍 🎯
👍
Emphasis on 'small' iterations, yes? Small or not, does this seem like the right approach to breaking this into chunks?:
Do I have this right, @cjohnhanson?
@aaronsteers --
100%, that's exactly the plan.
Yeah, I think that makes sense as the next step and should be pretty straightforward to implement after these changes.
Spec discussion:
I think we could introduce state backends such as s3, dynamodb, and other backends that have better reliability than an RDBMS and near-zero always-on cost.
In the future, a Meltano-managed state offering, similar to Pulumi's default experience.
Originally posted by @aaronsteers in #2520 (comment)
Additional context:
state table in Meltano systemdb #3340 logged to refactor this so a single table row would be the "backend" to read and write from.
state table in Meltano systemdb #3340 - or at least the same refactoring would (likely) be needed in both cases to eliminate the need for scanning history logs.
Workarounds:
#2520 talks about a potential workaround, but basically the current workaround is to:
meltano state get... to pull the latest state into a file.
meltano state set ...
This works with or without a Postgres or other long-lived RDBMS, since the built-in SQLite implementation is created on the fly if no Postgres DB is specified, and the process above essentially just removes the long-term state storage requirement from the SQLite backend.
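The workaround can be sketched as a pair of helpers around the Meltano CLI: one to run before a pipeline, one after. This is only an illustration under stated assumptions. The helper names and the file-naming scheme are hypothetical, and it assumes `meltano state get <state_id>` prints the state JSON to stdout and that `meltano state set` accepts `--force` and `--input-file`. Copying the directory to and from durable storage is left to a separate tool.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence

# A runner takes an argv list and returns the command's stdout.
Runner = Callable[[Sequence[str]], str]


def run_cli(argv: Sequence[str]) -> str:
    """Run a command, raising CalledProcessError on a non-zero exit."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout


def state_file(directory: Path, state_id: str) -> Path:
    """Map a state ID to a file name (':' is replaced for portability)."""
    return directory / (state_id.replace(":", "__") + ".json")


def export_state(state_id: str, directory: Path, run: Runner = run_cli) -> Path:
    """After a run: dump the latest state for state_id into a JSON file."""
    directory.mkdir(parents=True, exist_ok=True)
    target = state_file(directory, state_id)
    target.write_text(run(["meltano", "state", "get", state_id]))
    return target


def import_state(state_id: str, directory: Path, run: Runner = run_cli) -> bool:
    """Before a run: seed the fresh system DB from a saved file, if any."""
    source = state_file(directory, state_id)
    if not source.exists():
        return False  # first run: nothing to restore
    # Assumption: --force skips the confirmation prompt and --input-file
    # reads the state payload from the given file.
    run(["meltano", "state", "set", "--force", state_id, "--input-file", str(source)])
    return True
```

In a container, the sequence would be: download the state directory, call `import_state`, run the pipeline, call `export_state`, then upload the directory again. The `run` parameter exists only so the helpers can be exercised without Meltano installed.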