This issue describes how each execution of mapping jobs will be managed.
Jobs should be run asynchronously. If the submitted job is valid, its execution should start in the background and the job submission result should be returned to the client immediately. If the submitted job is not valid, an appropriate error message should be returned.
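As a rough sketch (assuming a Scala service; `MappingJob`, `JobSubmissionService` and the validation logic below are illustrative names, not the actual toFHIR API), the submission flow could look like this:

```scala
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical types, only to illustrate the intended flow.
final case class MappingJob(id: String, mappingTasks: Seq[String])

sealed trait SubmissionResult
case object Accepted extends SubmissionResult
final case class Rejected(reason: String) extends SubmissionResult

class JobSubmissionService(implicit ec: ExecutionContext) {

  /** Validate synchronously, execute asynchronously, return to the client immediately. */
  def submit(job: MappingJob): SubmissionResult =
    validate(job) match {
      case Some(error) =>
        Rejected(error)        // invalid job -> appropriate error message
      case None =>
        Future(execute(job))   // execution continues in the background
        Accepted               // submission result is returned right away
    }

  private def validate(job: MappingJob): Option[String] =
    if (job.mappingTasks.isEmpty) Some("Job has no mapping tasks") else None

  private def execute(job: MappingJob): Unit = {
    // start the Spark batch/streaming queries for each mapping task here
  }
}
```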
An ExecutionManager component should keep track of the active (i.e. running) jobs. This component should provide an API to stop running executions.
It should also be possible to start and stop individual mapping tasks. This is required, for example, when a mapping is updated and its task needs to be restarted.
There should be only one execution of a mapping task at a time.
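A possible shape for this component, assuming Spark Structured Streaming queries as the unit of execution; the class and method names below are illustrative only:

```scala
import java.util.concurrent.ConcurrentHashMap
import org.apache.spark.sql.streaming.StreamingQuery
import scala.jdk.CollectionConverters._

/** Tracks active streaming queries keyed by (jobId, mappingTaskName). Names are illustrative. */
class ExecutionManager {
  private val runningQueries = new ConcurrentHashMap[(String, String), StreamingQuery]()

  /** Register a started query; returns false if the same mapping task is already running. */
  def register(jobId: String, mappingTask: String, query: StreamingQuery): Boolean =
    runningQueries.putIfAbsent((jobId, mappingTask), query) == null

  /** Stop a single mapping task, e.g. before restarting it after a mapping update. */
  def stopMappingTask(jobId: String, mappingTask: String): Unit =
    Option(runningQueries.remove((jobId, mappingTask))).foreach(_.stop())

  /** Stop the whole job execution. */
  def stopJob(jobId: String): Unit =
    runningQueries.keySet().asScala.filter(_._1 == jobId)
      .foreach(key => Option(runningQueries.remove(key)).foreach(_.stop()))

  /** List currently running (jobId, mappingTaskName) pairs. */
  def listRunning(): Seq[(String, String)] = runningQueries.keySet().asScala.toSeq
}
```

Keying the map by (jobId, mappingTaskName) makes it possible both to stop a single task on a mapping update and to reject a second execution of the same mapping task.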
For file-system streaming, processed files should be archived, and records with mapping errors should be aggregated in a separate file. This behavior should be configurable.
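For illustration only, the archiving behavior could be captured in a small config model (the field names below are assumptions):

```scala
/** Illustrative configuration for file-system streaming sources; names are assumptions. */
final case class ArchiveConfig(
  archiveProcessedFiles: Boolean = true,       // move successfully processed files to an archive folder
  archiveFolder: String = "archive",           // relative to the monitored data source directory
  erroneousRecordsFolder: String = "erroneous" // aggregated records that failed mapping
)
```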
Scenarios:
System restart / crash
Kafka stream
Clear checkpoints config
true -> The existing checkpoint directory for the job will be deleted, and the earliest/latest offset configs will apply as expected (see the Spark sketch after the scenarios).
false -> Records will be read as of the last offset
File system stream
Clear checkpoints config
true -> The existing checkpoint directory for the job will be deleted and the existing files in the data source directory will be reprocessed. Users will need to put the already processed files back into the data source directory monitored by Spark.
false -> Processing will continue from the last read point.
Mapping update (only the updated mapping task will be restarted)
Kafka stream
Clear checkpoints config
true -> The existing checkpoint directory for the mapping will be deleted, and the earliest/latest offset configs will apply as expected. (Note that if the config is set to latest, old records won't be affected by the mapping updates.)
false -> Records will be read as of the last offset. Old records won't be affected by the mapping updates.
File system stream
Clear checkpoints config
true -> The existing checkpoint directory for the mapping will be deleted and the existing files in the data source directory will be reprocessed. Users will need to put the already processed files back into the data source directory monitored by Spark.
false -> Processing will continue from the last read point. Already processed records won't be affected by the mapping updates.
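The scenario behavior above follows from how Spark Structured Streaming treats `startingOffsets` and the checkpoint location: the offset config is honored only when a query starts without an existing checkpoint; otherwise processing resumes from the committed offsets. A minimal illustration (the topic, paths and console sink are placeholders):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("mapping-job").master("local[*]").getOrCreate()

val source = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "patients")           // placeholder topic
  .option("startingOffsets", "earliest")     // applies only when no checkpoint exists yet
  .load()

val query = source.writeStream
  .format("console")                         // stands in for the FHIR mapping sink
  .option("checkpointLocation", "/tmp/checkpoints/job-1/mapping-task-1") // per job, per mapping task
  .start()

// Deleting the checkpoint directory ("clear checkpoints" = true) makes the next start
// honor startingOffsets again; keeping it resumes from the last committed offset.
```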
Technical specs:
Checkpoints should be kept per job (not per execution, so that a job can resume after a crash) and per mapping task included in the job.
There should be a configuration to clear Spark's checkpoint directory for a job as a whole and for individual mappings.
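A sketch of how the per-job / per-mapping-task checkpoint layout and the clearing configuration could be implemented, assuming a local file system and an illustrative directory convention:

```scala
import java.nio.file.{Files, Path, Paths}
import java.util.Comparator

object CheckpointUtil {
  // Illustrative layout: <root>/<jobId>/<mappingTaskName>
  private val checkpointRoot = Paths.get("checkpoints")

  /** Checkpoints are kept per job (not per execution) so a crashed job can resume,
    * and per mapping task so a single task can be restarted independently. */
  def checkpointDir(jobId: String, mappingTask: String): Path =
    checkpointRoot.resolve(jobId).resolve(mappingTask)

  /** Clear the checkpoint directory for a whole job, or for a single mapping task if given. */
  def clearCheckpoints(jobId: String, mappingTask: Option[String] = None): Unit = {
    val target = mappingTask.fold(checkpointRoot.resolve(jobId))(checkpointDir(jobId, _))
    if (Files.exists(target))
      Files.walk(target).sorted(Comparator.reverseOrder[Path]()).forEach(p => Files.delete(p))
  }
}
```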
Sub-issues:
Interactive CLI should be able to run, stop and list streaming queries
Running status information should be included in the execution summary
Simultaneous execution of the same mapping should be prevented
Checkpoint resetting
Long-running batch should be tracked
Services returning logs do not have specific models; a new model needs to be built to cover all types of logs (see the sketch after this list).
Currently, startTime (the time streaming started) and endTime (the time streaming ended) are calculated in the frontend for streaming mapping tasks. We need to calculate them in the backend instead and return a well-formatted response.
If a batch mapping has no logs, we cannot show the mapping URL on the execution detail page. Adding a "mapping started" log may be the solution for this.
If a streaming mapping has no logs, we likewise do not show the mapping URL. Again, adding a "mapping started" log for that mapping task could be a solution.
While running a file-streaming mapping task, if a data source (CSV) produces an invalid FHIR resource error while writing to the onFHIR server, that execution stops reading new data source files, i.e. the streaming execution stops.
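For the log-model and startTime/endTime sub-issues above, a unified response model might look roughly like this (field names are assumptions, not the current toFHIR format):

```scala
import java.time.Instant

/** Illustrative unified model for execution logs covering batch and streaming mapping tasks. */
final case class ExecutionLog(
  executionId: String,
  jobId: String,
  mappingTaskName: String,
  status: String,                      // e.g. STARTED, RUNNING, SUCCESS, FAILURE
  isStreaming: Boolean,
  startTime: Option[Instant] = None,   // computed in the backend instead of the frontend
  endTime: Option[Instant] = None,
  numOfProcessedRecords: Option[Long] = None,
  errorLogs: Seq[String] = Seq.empty
)
```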
Available Bugs
When running a file-streaming job, if a new data source file is put into the configured streaming folder, the mapping task count is increased for that execution even though the same mapping task is used.