Add --no-commit and --no-run options #227

Open
frsyuki opened this Issue Jul 6, 2015 · 7 comments

@frsyuki
Contributor
frsyuki commented Jul 6, 2015

I want to verify whether the configuration is correct before actually running a transaction.
The idea is:

  • --no-commit: Begin a transaction, and run the bulk import job. But don't commit the transaction.
  • --no-run: Begin a transaction, but don't run the bulk import job.

These options should be used with the --resume-path option.

@frsyuki
Contributor
frsyuki commented Jul 6, 2015

In terms of implementation, we can add hook points to BulkLoader#run and #resume.
For example,

public static class TransactionHook
{
    private TransactionHook() { ... }  // private

    public static interface BeforeCommit
    {
        public void beforeCommit(ExecSession exec, ProcessState /* or another class? */ state);
    }

    public static interface BeforeRun
    {
        public void beforeRun(ExecSession exec);
    }

    public static class Builder
    {
        public Builder beforeRun(BeforeRun hook) { ... }
        public Builder beforeCommit(BeforeCommit hook) { ... }
        public TransactionHook build() { ... }
    }
}

public ExecutionResult run(ExecSession exec, final ConfigSource config, TransactionHook hook)
{
   ...
}

@muga
Contributor
muga commented Jul 6, 2015

Good feature.

@dmnlk
dmnlk commented Jul 21, 2015

I think --dry-run is more familiar.

@sonots
Member
sonots commented Jul 21, 2015

👍

But embulk run --no-run sounds somewhat awkward. What I requested is something like configtest, so --dry-run would be suitable.

@frsyuki
Contributor
frsyuki commented Jul 21, 2015

@dmnlk @sonots But --dry-run would actually run the transaction (creating temporary tables, etc.). The transaction wouldn't be committed, but is that OK?

@sonots
Member
sonots commented Jul 21, 2015

It is not okay.

@frsyuki
Contributor
frsyuki commented Jul 21, 2015

--no-commit (starts a transaction, runs the bulk load, doesn't commit) and --no-run (starts a transaction, doesn't run the bulk load, doesn't commit) are easy to implement.
--dry-run (doesn't start a transaction, doesn't run the bulk load, doesn't commit) is not easy. So my idea for now is to add --no-commit and --no-run.
I welcome further comments.
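
To make the difference between the two options concrete, here is a minimal, self-contained sketch of how they might map onto the beforeRun and beforeCommit hook points. Everything in it is an assumption for illustration: the class TransactionHookSketch, the StopTransaction exception, and the beforeRunFor/beforeCommitFor helpers are hypothetical, the hook interfaces are simplified (no ExecSession or state arguments), and run is only a stand-in for BulkLoader#run that records which phases were reached. How a hook would really signal "stop here" is not decided in this issue.

```java
import java.util.ArrayList;
import java.util.List;

public class TransactionHookSketch {
    // Hypothetical signal a hook throws to stop before the next phase.
    public static class StopTransaction extends RuntimeException {
        public StopTransaction(String message) { super(message); }
    }

    // Simplified stand-ins for the proposed hook interfaces
    // (the ExecSession and state arguments are omitted in this sketch).
    public interface BeforeRun { void beforeRun(); }
    public interface BeforeCommit { void beforeCommit(); }

    // Hook that --no-run might install: stop before the bulk load starts.
    public static BeforeRun beforeRunFor(boolean noRun) {
        return () -> { if (noRun) throw new StopTransaction("--no-run"); };
    }

    // Hook that --no-commit might install: stop before the commit.
    public static BeforeCommit beforeCommitFor(boolean noCommit) {
        return () -> { if (noCommit) throw new StopTransaction("--no-commit"); };
    }

    // Stand-in for BulkLoader#run: records which phases were reached.
    public static List<String> run(BeforeRun beforeRun, BeforeCommit beforeCommit) {
        List<String> phases = new ArrayList<>();
        phases.add("begin");                 // the transaction always begins
        try {
            beforeRun.beforeRun();
            phases.add("run");               // bulk load tasks would run here
            beforeCommit.beforeCommit();
            phases.add("commit");            // the transaction would commit here
        } catch (StopTransaction stop) {
            phases.add("stopped by " + stop.getMessage());
        }
        return phases;
    }

    public static void main(String[] args) {
        System.out.println(run(beforeRunFor(false), beforeCommitFor(false)));
        System.out.println(run(beforeRunFor(false), beforeCommitFor(true)));
        System.out.println(run(beforeRunFor(true), beforeCommitFor(false)));
    }
}
```

In this sketch a normal run reaches begin, run, and commit; with --no-commit it stops after run; with --no-run it stops right after begin. A --dry-run as described above has no place in this model, because the transaction has already begun before either hook fires.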
