[OSC-1136, OSC-1137] - add 'compare' and 'complete' commands #5
…hash function) upon successful completion of a data pipeline execution
…le existed when checksum calculated
??? |
… transform models upon finishing a data pipeline execution
details already in story on JIRA |
modules/ModelChangeDetector.py
                                   metavar='execution_id',
                                   help='data pipeline execution id as received using \'start\' command')
finish_command_parser.add_argument('model_folder_paths',
                                   metavar='model-folder-paths',
Should model-folder-paths be snake case like execution_id?
Thanks, good catch. I mucked around with it a bit too much. It intrinsically turns it into a snake-case variable, but I was trying to be explicit with metavar and things went wrong.
Turns out the discrepancy was actually in execution_id, and model-folder-paths was right all along.
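For anyone following this thread, here is a minimal sketch of how the two names relate, assuming the parsers are built with Python's standard argparse module. The parser and sub-command names below are hypothetical and are not taken from this PR. For a positional argument, the first string becomes the attribute name on the parsed namespace, while metavar only changes how the argument is displayed in usage and help text.

```python
import argparse

# Hypothetical parser for illustration only; not the PR's actual code.
parser = argparse.ArgumentParser(prog='example')
subparsers = parser.add_subparsers(dest='command')
complete_parser = subparsers.add_parser('complete')

# Snake case as the positional name (attribute on the namespace),
# hyphenated metavar for display in usage/help output.
complete_parser.add_argument('execution_id',
                             metavar='execution-id',
                             help='data pipeline execution id')
complete_parser.add_argument('model_folder_paths',
                             metavar='model-folder-paths',
                             nargs='+',
                             help='one or more model folder paths')

args = parser.parse_args(['complete', 'abc-123', './load', './transform'])
print(args.execution_id)        # attribute access uses the snake-case name
print(args.model_folder_paths)  # list of the remaining positional values
print(complete_parser.format_usage())  # usage line shows the hyphenated metavars
```

With this arrangement the code reads the snake-case attributes while users see the hyphenated names in the help text.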
…ist model checksums against the last successful execution (b) update 'finish' command to only mark execution as completed
dbe8ace
(a) rename 'start' command to 'init' (b) rename 'finish' command to 'complete' (c) remove 'in progress' status
__tablename__ = TABLE_NAME
__table_args__ = {'schema': Constants.DATA_PIPELINE_EXECUTION_SCHEMA_NAME}

id = Column('id',
Should we use PRIMARY_KEY_COL_NAME = 'id' here as well, similar to DataPipelineExecutionEntity?
d7d1a83
Adds the below new commands:
compare: Compares and persists SHA256-hashed checksums of the given models against those of the last successful execution. Returns a comma-separated string of changed model names. Parameters required:
  - execution-id: a GUID identifier of an existing data pipeline execution as returned by the 'init' command.
  - model-type: type of models being processed, e.g. load, transform, etc. This model-type is used to group the model checksums and to find and compare older ones.
  - base-path: absolute or relative path to the models, e.g. ./load, /home/local/load, C:/path/to/load
  - model-patterns: path-based patterns (relative to base-path) to different models with extensions. Models within a model-type must be named uniquely regardless of their file extension. E.g. *.txt, **/*.txt, ./relative/path/to/some_models/**/*.csv, relative/path/to/some/more/related/models/**/*.sql

complete: Marks the completion of an existing execution by updating a record for the same in the given database. Returns nothing unless there's an error. Parameter required:
  - execution-id: a GUID identifier of an existing data pipeline execution as returned by the 'init' command.
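
The checksum comparison described for 'compare' can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the function names compute_checksums and changed_models are hypothetical, the model name is assumed to be the file name without its extension, a model with no previous checksum is assumed to count as changed, and persistence to the database is left out entirely.

```python
import hashlib
from pathlib import Path


def compute_checksums(base_path, model_patterns):
    """Map each model name (file stem) to the SHA256 hex digest of its file contents."""
    checksums = {}
    base = Path(base_path)
    for pattern in model_patterns:
        # Patterns are resolved relative to base_path, e.g. '*.txt' or '**/*.sql'.
        for path in sorted(base.glob(pattern)):
            if path.is_file():
                # Assumption: the stem is the model name, which is why names must be
                # unique within a model-type regardless of file extension.
                checksums[path.stem] = hashlib.sha256(path.read_bytes()).hexdigest()
    return checksums


def changed_models(current, previous):
    """Return a comma-separated string of models that are new or whose checksum differs."""
    changed = [name for name, digest in current.items()
               if previous.get(name) != digest]
    return ','.join(sorted(changed))
```

A caller would pass the checksums stored for the last successful execution of the same model-type as previous. Models that were deleted since then are not reported by this sketch.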