Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

List job runs #121

Closed
wslulciuc opened this issue Oct 30, 2018 · 8 comments
Closed

List job runs #121

wslulciuc opened this issue Oct 30, 2018 · 8 comments
Assignees
Labels
feature question Further information is requested
Milestone

Comments

@wslulciuc
Copy link
Member

wslulciuc commented Oct 30, 2018

Let's support listing job runs. We'll want to update the OpenAPI spec accordingly #100 #104

GET /namespaces/:namespace/jobs

{
  "name": "my_first_job",
  "description": "Best job ever!",
  ...
  "runs": [
    "/jobs/runs/cfc4b5e6-c630-48d4-ad19-f2bd16c93a9d",
    "/jobs/runs/d33ef190-73bd-4a65-ab59-1bbd65364d0b",
    "/jobs/runs/5ced1097-8d59-46d8-933e-c9a688be8b8c",
    ...
  ]
}

We may want to list job runs when fetching a job by name. The endpoint above is not yet finalized, and something we can iterate on (just wanted to capture the functionality).

@wslulciuc wslulciuc added this to the v0.1.0 milestone Oct 30, 2018
@wslulciuc
Copy link
Member Author

/ cc @Liorba

@wslulciuc wslulciuc modified the milestones: 0.1.0, 0.2.0 Jan 3, 2019
@ashulmanWeWork ashulmanWeWork added the question Further information is requested label Jan 6, 2019
@ashulmanWeWork
Copy link
Contributor

ashulmanWeWork commented Jan 6, 2019

Before I started coding, I wanted to better understand the intent of the issue before using the API definition suggested above.

The main way today to get info about current job runs is with this endpoint--
namespaces/{namespace}/jobs/{job}/runs

If we want to list the active jobs run for each job when we list all jobs, that might involve some changes to how we represent the Job object in the API, namely that we'll need to add the job's current runs to its description. This is problematic b/c the service layer will now need to also know about that current job runs for every job. So this change will permeate all the layers.

If the goal here is to get all the job runs in a namespace, could we instead create an endpoint for that? For example /namespaces/:namespace/job_runs, or create a new endpoint, /jobs/runs/, that will list all the job runs, with the optional parameter of namespace, which will get all the job runs for the namespace.

@wslulciuc
Copy link
Member Author

wslulciuc commented Jan 6, 2019

It might help to highlight the question we are looking to answer:

Given job K, return a list of runs L

So, let's say I have the job my_job under the namespace my_namespace. The API call:

GET namespaces/my_namespace/jobs/my_job/runs

will return the list of runs for my_job. Cool. Now, maybe I'm only interested in failed run attempts, we can and should add the filter run_state to limit our results:

GET namespaces/my_namespace/jobs/my_job/runs?run_state=FAILED

Note that the jobs/runs/* endpoints were introduced to simplify interactions with a single job run instance, not a global list.

But, now let's say we wanted to view runs for multiple jobs: my_job, my_other_job, then that would require following the steps outlined above for each job.

The issue does outline (possibly) returning a list of runs when retrieving a job:

  ...
  "runs": [
    "/jobs/runs/cfc4b5e6-c630-48d4-ad19-f2bd16c93a9d",
    "/jobs/runs/d33ef190-73bd-4a65-ab59-1bbd65364d0b",
    "/jobs/runs/5ced1097-8d59-46d8-933e-c9a688be8b8c",
    ...
  ]

My thinking here is that it's more of an optimization for the caller. Maybe we return the last N completed runs or something similar, but not a feature we'd need to support in release 0.2.0.

@wslulciuc
Copy link
Member Author

wslulciuc commented Jan 6, 2019

continued: I think it would be fine to define the endpoint:

GET /jobs/runs

returning a list of run IDs. But to learn more about the job run, the caller would have to make another API call:

GET /jobs/runs/cfc4b5e6-c630-48d4-ad19-f2bd16c93a9d

@ashulmanWeWork
Copy link
Contributor

ashulmanWeWork commented Jan 11, 2019

I like the idea of calling GET /jobs/runs, and then potentially filtering by namespace or by namespace and job_name. For example, to find all the jobs runs for namespace finance:
GET /api/v1/jobs/runs?namespace=finance
To find info about any runs for a given job, say quarterly_billings, which resides in the finance namespace, we would do this:
GET /api/v1/jobs/runs?namespace=finance?&job_name=quarterly_billings

Does this sound like a sensible way to proceed?

@wslulciuc
Copy link
Member Author

wslulciuc commented Jan 11, 2019

The endpoint /jobs/runs/{id} has a fundamental assumption:

The caller may or may not know the namespace and/or the job associated with {id}.

That is, a run ID would encode both the namespace and job version associated with the run instance, but this is very much an internal association maintained by Marquez.

I guess I'm not really sure how adding the filters namespace or job to /jobs/runs is any different than:

GET namespaces/{namespace}/jobs/{job}/runs

The call above would return a list of runs, allowing the caller to filter runs by job under a given namespace. What it wouldn't allow you to do is filter runs only by job name, but that's not a feature we have thought about supporting.

/cc @sshah-wework @hougs

@sshah-wework
Copy link
Contributor

Chiming in here, would rather see

GET namespaces/{namespace}/jobs/{job}/runs

Than

/jobs/runs?namespace=...&job=

since the first is more canonical. the /jobs/runs endpoint was just meant to be convenient for fetching a single run by ID, but I'd rather not support more functionality from that endpoint.

One day, we may want the equivalent lookup via a id filter (e.g. GET namespaces/{namespace}/jobs/{job}/runs?id=...) for consistency.

This was referenced Mar 28, 2019
@wslulciuc
Copy link
Member Author

Being added in #633

phixMe added a commit that referenced this issue Nov 12, 2020
* Fixing the lint toolchain and running the lint configuration.

Signed-off-by: Peter Hicks <peter@datakin.com>

* Simplifying the readme for the lint system.

Signed-off-by: Peter Hicks <peter@datakin.com>

Co-authored-by: Peter Hicks <peter@datakin.com>
wslulciuc added a commit that referenced this issue Dec 3, 2020
Signed-off-by: wslulciuc <willy@datakin.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature question Further information is requested
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants