feat: allow persistence of runtime generated job metadata to the DB and allow addition of config-driven job attributes #113
<Button
  styleType='text-blue'
  as='externalLink'
  href={`https://spark-history.data-platform.aws.pattern.com/history/${jobData?.spark_application_id}/jobs/`}
Shivang Nagta (@ShivangNagta) this is going to be shipped with the OSS Docker image. Let's find another way to inject this into the UI.
Yash Shrivastava (alephys26)
left a comment
If this is going to the jobs table, add something more generic, maybe a JSON object that stores key-value pairs in a column named extra_job_attributes. The frontend then uses each key as the display text and its value as the hyperlink, for all the attributes that exist in that column.
Or some better approach. Either way, a specific column for the Spark application ID is not what I would like to see.
I have added an |
|
Also, the spark history server URL is now added in the cluster context instead |
|
I have moved the logic for defining the template for extra job attribute values into the config itself (cluster). The runtime metadata is now stored in memory (the output field in the job struct) and is rendered based on what was passed in the template. It is finally persisted to the DB column extra_job_attributes, as it was previously.
prasadlohakpure
left a comment
Nice work, LGTM
9df3cab
c51e2f7 to 584c8d1

Description
Adds a generic, config-driven mechanism to attach extra metadata to a job and surface it in the UI, plus persistence of that metadata to the DB via a new extra_job_attributes column. The first use case is a "Spark History" link on the job details page for Spark-on-EKS jobs.
Rather than a plugin-specific column, attributes are defined in config on a cluster and rendered by core after the job runs:
Each attribute is label → { kind, value }. kind (link/text) tells the UI how to render, and value is a Go text/template rendered over four namespaces: .Job, .Command, .Cluster (context maps), and .Outputs (runtime values published by the plugin).
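As a rough illustration of the rendering step described above, here is a minimal Go sketch using the standard text/template package. The Attribute and TemplateData types, the renderAttribute helper, the spark_history_url context key, and the example.com URL are all assumptions made for this example; they are not taken from the actual implementation in this PR.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// Attribute mirrors the label → { kind, value } shape from the description.
// The type and field names are illustrative assumptions.
type Attribute struct {
	Kind  string // "link" or "text"; tells the UI how to render
	Value string // Go text/template source
}

// TemplateData exposes the four namespaces to the template.
// Hypothetical layout: each namespace is a plain map.
type TemplateData struct {
	Job     map[string]any
	Command map[string]any
	Cluster map[string]any
	Outputs map[string]any
}

// renderAttribute parses the attribute's value as a template and executes
// it against data. missingkey=error makes a missing map key fail loudly
// instead of rendering "<no value>".
func renderAttribute(label string, attr Attribute, data TemplateData) (string, error) {
	tmpl, err := template.New(label).Option("missingkey=error").Parse(attr.Value)
	if err != nil {
		return "", fmt.Errorf("parse %q: %w", label, err)
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, data); err != nil {
		return "", fmt.Errorf("render %q: %w", label, err)
	}
	return buf.String(), nil
}

func main() {
	attr := Attribute{
		Kind:  "link",
		Value: "{{.Cluster.spark_history_url}}/history/{{.Outputs.spark_application_id}}/jobs/",
	}
	data := TemplateData{
		// Assumed context key and placeholder URL, for illustration only.
		Cluster: map[string]any{"spark_history_url": "https://spark-history.example.com"},
		Outputs: map[string]any{"spark_application_id": "spark-123"},
	}
	out, err := renderAttribute("Spark History", attr, data)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // https://spark-history.example.com/history/spark-123/jobs/
}
```

Failing on a missing key is a design choice of this sketch, not something the PR states; the real code may prefer to skip or blank an attribute whose output was never published.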
Flow:
Plugins publish raw runtime values to a transient outputs channel with one call - the sparkeks plugin captures Status.SparkApplicationID during monitoring and emits job.SetOutput("spark_application_id", id).
Core renders cluster.Attributes after Execute into job.ExtraJobAttributes and persists it to the new extra_job_attributes jsonb not null default '{}' column.
UI renders each attribute by kind on the job details page.
This means new attributes (static, or from existing plugin outputs) can be added by editing config alone - devs don't touch plugin code per parameter unless very specific runtime generated metadata needs to be stored. The outputs channel is transient (not persisted); only the rendered extra_job_attributes is stored.
Test
Tested locally (migration, persistence, UI).
Haven't done e2e testing in sandbox, as this adds no extra API call for the ID. The monitor loop already fetches Status (for AppState), and SparkApplicationID is just another field on that same object, so reading it adds no new call or failure mode.
Confirmed the spark-operator populates Status.SparkApplicationID at runtime by running the operator's spark-pi example on a local kind cluster and reading the field back.

Manual seeding for testing (for spark and non-spark job)

Button Rendering in Job Details Page (for a spark job)

Some Notes (open for comments)
spark_application_id is the first plugin-specific column on jobs (the other columns are generic). It looks a little odd to me, but Claude's reasoning for it was "there's no generic home for plugin runtime metadata as of now", which seems to be true, because our use case is to store runtime-generated data (spark_application_id) in the heimdall database. I could not find any other plugin doing something like this.
Some other options could be:
a. Add a separate table for storing spark_application_id with a foreign key reference to the original job table. This separates the Spark-specific data from the generic job table, but it adds an extra API/read call, and we would still have to add a Spark-specific table update somewhere in the updateAsyncJobStatus function.
b. If more plugins need to store runtime metadata, a generic metadata column may be preferable to per-plugin columns. But this seems like an early abstraction.
EDIT
An option b styled approach was chosen after discussions.