
Execution Manager #84

Closed
5 of 11 tasks
Okanmercan99 opened this issue Jul 25, 2023 · 0 comments
Okanmercan99 commented Jul 25, 2023

This issue describes how each execution of mapping jobs will be managed.

  • Jobs should be run asynchronously. If the job to be executed is valid, the execution should start in the background and the job submission result should be returned to the client immediately. If the submitted job is not valid, an appropriate error message should be returned.
  • An ExecutionManager component should keep track of the active (i.e. running) jobs. This component should provide an API to stop running executions.
  • It should also be possible to start and stop individual mapping tasks. This is required when a mapping is updated and needs to be restarted.
  • There should be at most one execution of a given mapping task at any time.
  • For file system streaming, processed files should be archived, and mappings with errors should be aggregated in a separate file. This behavior should be configurable.
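The requirements above can be sketched as a minimal in-memory ExecutionManager. This is an illustrative sketch with hypothetical names; the real component would stop Spark streaming queries rather than signal plain threads:

```python
import threading

class ExecutionManager:
    """In-memory registry of running mapping-task executions (illustrative sketch)."""

    def __init__(self):
        self._lock = threading.Lock()
        # (job_id, mapping_task) -> threading.Event used to request a stop
        self._running = {}

    def start(self, job_id, mapping_task, run_fn):
        """Start a mapping task in the background and return immediately.
        Rejects the submission if the same mapping task is already running."""
        key = (job_id, mapping_task)
        with self._lock:
            if key in self._running:
                raise RuntimeError(f"{mapping_task} of {job_id} is already running")
            stop_event = threading.Event()
            self._running[key] = stop_event
            worker = threading.Thread(
                target=self._run, args=(key, run_fn, stop_event), daemon=True)
            worker.start()
        return "submitted"  # the client gets the submission result immediately

    def _run(self, key, run_fn, stop_event):
        try:
            run_fn(stop_event)          # run_fn should poll/wait on stop_event
        finally:
            with self._lock:            # deregister when the execution ends
                self._running.pop(key, None)

    def stop(self, job_id, mapping_task):
        """Signal a single running mapping task to stop."""
        with self._lock:
            event = self._running.get((job_id, mapping_task))
        if event is not None:
            event.set()

    def list_running(self):
        """Snapshot of the currently active (job, mapping task) pairs."""
        with self._lock:
            return sorted(self._running)
```

Keeping the registry keyed by (job, mapping task) is what enforces "at most one execution of a mapping task" and lets a single task be stopped without touching the rest of the job.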

Scenarios:

  1. System restart / crash
  • Kafka stream
    • Clear checkpoints config
      • true -> The existing checkpoint directory for the job will be deleted. The earliest/latest offset configs will apply as expected.
      • false -> Records will be read starting from the last checkpointed offset.
  • File system stream
    • Clear checkpoints config
      • true -> The existing checkpoint directory for the job will be deleted. Existing files in the data source directory will be reprocessed. Users will need to put already-processed files back into the data source directory monitored by Spark.
      • false -> Processing will continue from the last read point.
  2. Mapping update (only the updated mapping task will be restarted)
  • Kafka stream
    • Clear checkpoints config
      • true -> The existing checkpoint directory for the mapping will be deleted. The earliest/latest offset configs will apply as expected. (Note that if the config is set to latest, old records won't be affected by the mapping updates.)
      • false -> Records will be read starting from the last checkpointed offset. Old records won't be affected by the mapping updates.
  • File system stream
    • Clear checkpoints config
      • true -> The existing checkpoint directory for the mapping will be deleted. Existing files in the data source directory will be reprocessed. Users will need to put already-processed files back into the data source directory monitored by Spark.
      • false -> Processing will continue from the last read point. Already-processed records won't be affected by the mapping updates.
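The four cases above boil down to a small decision table. A sketch encoding it (the returned strings paraphrase the behavior; they are not actual configuration values):

```python
def restart_behavior(source_type: str, clear_checkpoints: bool) -> str:
    """What a restarted job or mapping task will read, per the scenarios above."""
    if source_type == "kafka":
        # With a cleared checkpoint, Kafka's earliest/latest offset setting applies;
        # otherwise the stream resumes from the last checkpointed offset.
        return ("apply earliest/latest offset config" if clear_checkpoints
                else "resume from last checkpointed offset")
    if source_type == "file":
        # With a cleared checkpoint, files placed back in the monitored directory
        # are reprocessed; otherwise processing continues from the last read point.
        return ("reprocess files in the source directory" if clear_checkpoints
                else "continue from the last read point")
    raise ValueError(f"unknown source type: {source_type}")
```

The same table applies to both scenarios; the only difference is the scope of the checkpoint directory being cleared (whole job on restart/crash vs. a single mapping on update).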

Technical specs:

  • Checkpoints should be kept per job (not per execution, so that a job's execution can resume after a crash) and per mapping task included in the job.
  • There should be a configuration to clear Spark's checkpoint directory for a job as a whole and for individual mappings.
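A sketch of the implied checkpoint layout and the clear operation. The `<root>/<jobId>/<mappingTask>` directory layout is an assumption for illustration, not the actual toFHIR layout:

```python
import shutil
from pathlib import Path
from typing import Optional

def checkpoint_dir(root: str, job_id: str, mapping_task: Optional[str] = None) -> Path:
    """Checkpoints are kept per job, and per mapping task within the job."""
    path = Path(root) / job_id
    return path / mapping_task if mapping_task else path

def clear_checkpoints(root: str, job_id: str, mapping_task: Optional[str] = None) -> None:
    """Delete the checkpoint directory for a whole job or, when a mapping task
    is given, only for that task (so other tasks keep their progress)."""
    shutil.rmtree(checkpoint_dir(root, job_id, mapping_task), ignore_errors=True)
```

Clearing only one mapping task's directory is what allows a mapping update to restart that task from scratch while the job's other tasks continue from their checkpoints.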

Sub-issues:

  • Interactive CLI should be able to run, stop and list streaming queries
  • Running status information should be included in the execution summary
  • Simultaneous execution of the same mapping should be prevented
  • Checkpoint resetting
  • Long-running batch should be tracked
  • The services returning logs do not have specific models. A new model needs to be built to cover all types of logs.
  • Currently, startTime (when streaming started) and endTime (when streaming ended) are calculated in the frontend for streaming mapping tasks. We need to calculate them in the backend instead and return a well-formatted response.
  • If there is no log in batch mappings, we cannot show the mapping URL on the execution detail page. Adding the started log may be the solution for this.
  • If there is no log for streaming mappings, we do not show the mapping URL. Again, adding a mapping started log for that mapping task could be a solution.
  • While running a file-streaming mapping task, if a data source (CSV) produces an invalid FHIR resource error while writing to the onFhir server, that execution stops reading new data source files, i.e. the streaming execution stops.

Available Bugs

  • When running a file streaming job, if a new data source file is put into the configured streaming folder, the mapping task count is increased for that execution even though the same mapping task is used.
suatgonul added a commit that referenced this issue Sep 1, 2023
Implemented REST endpoints to stop running streaming queries
Relates to #84
suatgonul added a commit that referenced this issue Sep 2, 2023
Implemented REST endpoints to stop running streaming queries
Relates to #84
suatgonul added a commit that referenced this issue Sep 11, 2023
Implemented REST endpoints to stop running streaming queries
Relates to #84