# Workflow Cost Estimator
This notebook demonstrates cost estimation for finished or in-progress workflows.

This is an experimental feature:
  - Cost estimates may not be accurate.
  - CPUs, memory, and runtime are pulled from Terra's Firecloud API
    [monitorSubmission](https://api.firecloud.org/#/Submissions/monitorSubmission) endpoint. This information is
    available for 42 days after workflow completion.
  - The GCP instance type is assumed to be a custom configuration of either the N1 or N2 instance type.

*author: Brian Hannafious, Genomics Institute, University of California Santa Cruz*

Install the newest version of [terra-notebook-utils](https://github.com/DataBiosphere/terra-notebook-utils).

In [1]:
%pip install --upgrade --no-cache-dir git+https://github.com/DataBiosphere/terra-notebook-utils

Collecting git+https://github.com/DataBiosphere/terra-notebook-utils
  Cloning https://github.com/DataBiosphere/terra-notebook-utils to /tmp/pip-req-build-z026z9kf
  Running command git clone -q https://github.com/DataBiosphere/terra-notebook-utils /tmp/pip-req-build-z026z9kf
  Resolved https://github.com/DataBiosphere/terra-notebook-utils to commit a7985f11c220638b5c61ce3bd351b315df8fc084
Collecting google-cloud-storage<1.39.0,>=1.38.0
  Downloading google_cloud_storage-1.38.0-py2.py3-none-any.whl (103 kB)
Collecting gs-chunked-io<0.6,>=0.5.1
  Downloading gs-chunked-io-0.5.2.tar.gz (8.1 kB)
Collecting firecloud
  Downloading firecloud-0.16.31.tar.gz (53 kB)
Collecting bgzip<0.4,>=0.3.5
  Downloading bgzip-0.3.5.tar.gz (61 kB)
Collecting cli-builder<0.2,>

  Building wheel for gs-chunked-io (setup.py) ... done
  Created wheel for gs-chunked-io: filename=gs_chunked_io-0.5.2-py3-none-any.whl size=9584 sha256=31a360eaf38c8b38d500f1c71ee9c6bfc3dc884aa1a542d38d19ea04279ff91a
  Stored in directory: /tmp/pip-ephem-wheel-cache-3ouycfp4/wheels/d9/17/31/ce24f67f7553e48d320f575bb0f91006ad96402af8a460b3eb
  Building wheel for firecloud (setup.py) ... done
  Created wheel for firecloud: filename=firecloud-0.16.31-py3-none-any.whl size=53438 sha256=38cda46e337b94dc8d44e73b804ac639d72de41ba83c4baeae16abe6593f5e81
  Stored in directory: /tmp/pip-ephem-wheel-cache-3ouycfp4/wheels/df/5d/2a/cd382b7648f96c90a2fd0114807d83697c9a6c217b0d07d9fe
  Building wheel for wrapt (setup.py) ... done
  Created wheel for wrapt: filename=wrapt-1.12.1-cp37-cp37m-linux_x86_64.whl size=71715 sha256=5f56b8da14d11f2d2afb55dd1152380ce6086258788bec0abccd2e2cd517dd00
  Stored in directory: /tmp/pip-ephem-wheel-cache-3ouycfp4/wheels/62/76/4c/aa2

Define some useful functions.

In [1]:
from terra_notebook_utils import costs, workflows, WORKSPACE_NAME, WORKSPACE_GOOGLE_PROJECT

def list_submissions_chronological(workspace: str=WORKSPACE_NAME,
                                   workspace_namespace: str=WORKSPACE_GOOGLE_PROJECT):
    submissions = workflows.list_submissions(workspace, workspace_namespace)
    # Sort on the date only: sorting (date, dict) tuples raises TypeError when two dates tie.
    for submission in sorted(submissions, key=lambda s: s['submissionDate']):
        yield submission

def cost_for_submission(submission_id: str,
                        workspace: str=WORKSPACE_NAME,
                        workspace_namespace: str=WORKSPACE_GOOGLE_PROJECT):
    workflows_metadata = workflows.get_all_workflows(submission_id, workspace, workspace_namespace)
    for workflow_id, workflow_metadata in workflows_metadata.items():
        shard_number = 1  # keep track of scattered workflows
        for shard_info in workflows.estimate_workflow_cost(workflow_id, workflow_metadata):
            shard_info['workflow_id'] = workflow_id
            shard_info['shard'] = shard_number
            shard_number += 1
            yield shard_info

def estimate_job_cost(cpus: int, memory_gb: int, disk_gb: int, runtime_hours: float, preemptible: bool) -> float:
    disk = costs.PersistentDisk.estimate(disk_gb, runtime_hours * 3600)
    compute = costs.GCPCustomN1Cost.estimate(cpus, memory_gb, runtime_hours * 3600, preemptible)
    return disk + compute
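
To make the shape of such an estimate concrete, here is a minimal, library-independent sketch of a linear pricing model: compute cost scales with CPUs, memory, and hours, and disk cost scales with size and hours. Everything in it is an assumption for illustration: the function name `sketch_job_cost`, the 730-hour month, and the rate arguments are hypothetical and are not taken from `terra_notebook_utils` or from GCP's price list.

```python
# Assumption: monthly-priced resources are prorated over a 730-hour month.
HOURS_PER_MONTH = 730


def sketch_job_cost(cpus, memory_gb, disk_gb, runtime_hours,
                    cpu_rate, memory_rate, disk_rate_per_gb_month):
    """Return compute + disk cost for one job under a simple linear model.

    cpu_rate is dollars per CPU-hour, memory_rate is dollars per GB-hour,
    and disk_rate_per_gb_month is dollars per GB per month.
    """
    compute = (cpus * cpu_rate + memory_gb * memory_rate) * runtime_hours
    disk = disk_gb * disk_rate_per_gb_month * runtime_hours / HOURS_PER_MONTH
    return compute + disk


# Placeholder rates chosen for easy arithmetic; these are NOT real GCP prices.
example = sketch_job_cost(cpus=2, memory_gb=4, disk_gb=100, runtime_hours=10,
                          cpu_rate=0.03, memory_rate=0.004,
                          disk_rate_per_gb_month=0.04)
print("$%.2f" % example)
```

With real rates substituted, a model like this gives a quick sanity check on the order of magnitude of an estimate; the library's own calculation may differ.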

List submissions in chronological order.

In [2]:
for s in list_submissions_chronological():
    print(s['submissionId'], s['submissionDate'], s['status'])

cddabaa0-2b81-4b71-b475-c9ee577538bf 2021-09-15T10:28:24.511Z Done
836e43c3-d398-4e3c-b47f-5842352386fc 2021-09-15T14:07:44.815Z Done
f3466e2b-c5ba-42dd-b78b-67a9817e6e8d 2021-09-15T14:17:06.035Z Done
b7171dd9-f5f0-4dc9-962a-e7a44771dd5e 2021-09-15T20:42:59.677Z Done
ba5bf5d6-90a3-4661-aa4a-135aef334a6f 2021-09-16T13:37:32.626Z Done
3aa0a027-40bd-4ee9-9d23-194d93bd7f9b 2021-09-16T16:42:07.795Z Done
4d5623d2-436a-41c3-8d7c-c52a01540d51 2021-09-21T17:30:21.658Z Aborted
35144888-031c-49b3-9102-e9085594d153 2021-09-21T19:09:59.414Z Done
da5b23a1-73b8-4e2f-ae18-9cadf794e0d2 2021-09-21T22:13:35.123Z Done
3021ced2-9617-4c97-a5eb-9ff8ac63d408 2021-09-21T23:02:07.923Z Done
0d32d7c9-9512-4db0-b6c9-6654f840de15 2021-09-22T05:29:33.012Z Done
1a3f5c55-561b-4dac-bab4-4a237574f3bd 2021-09-22T10:04:54.061Z Done
7c6b753c-4b80-4cef-ace4-5438279af7c9 2021-09-22T10:40:10.962Z Done
ae8fa1bc-bc1a-4753-b013-3db4f11d3ad4 2021-09-22T11:30:15.972Z Aborted
925461b3-bc1c-48e0-9e15-7d9a1b9d1443 2021-09-23T09:28:12
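
When a workspace has many submissions, it can help to narrow the listing before estimating costs. The sketch below filters to `Done` submissions on or after a given date. It runs on a small hand-copied sample of the listing above rather than calling the Terra API, and the helper names `parse_date` and `done_since` are hypothetical.

```python
from datetime import datetime

# Sample records copied from the listing above (only the fields printed there).
submissions = [
    {"submissionId": "4d5623d2-436a-41c3-8d7c-c52a01540d51",
     "submissionDate": "2021-09-21T17:30:21.658Z", "status": "Aborted"},
    {"submissionId": "cddabaa0-2b81-4b71-b475-c9ee577538bf",
     "submissionDate": "2021-09-15T10:28:24.511Z", "status": "Done"},
    {"submissionId": "35144888-031c-49b3-9102-e9085594d153",
     "submissionDate": "2021-09-21T19:09:59.414Z", "status": "Done"},
]


def parse_date(stamp):
    """Parse a timestamp in the format shown above, e.g. 2021-09-15T10:28:24.511Z."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")


def done_since(submissions, since):
    """Return 'Done' submissions dated on or after `since`, oldest first."""
    keep = [s for s in submissions
            if s["status"] == "Done" and parse_date(s["submissionDate"]) >= since]
    return sorted(keep, key=lambda s: s["submissionDate"])


recent = done_since(submissions, datetime(2021, 9, 16))
for s in recent:
    print(s["submissionId"], s["submissionDate"], s["status"])
```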

In [3]:
submission_id = "5710f9e5-759b-41e9-839f-a8a5af3efb0c"  # Replace with your submission id
total_cost = 0
print("%37s" % "workflow_id",
      "%30s" % "task_name",
      "%5s" % "cpus",
      "%7s" % "memory",
      "%7s" % "disk",
      "%9s" % "duration",
      "%7s" % "cost")
for shard_info in cost_for_submission(submission_id):
    total_cost += shard_info['cost']
    print("%37s" % shard_info['workflow_id'],
          "%30s" % shard_info['task_name'],
          "%5i" % shard_info['number_of_cpus'],
          "%5iGB" % shard_info['memory'],
          "%5iGB" % shard_info['disk'],
          "%8.2fh" % (shard_info['duration'] / 3600),  # convert from seconds to hours
          "%7s" % ("$%.2f" % shard_info['cost']))
print("%108s" % ("total_cost: $%.2f" % round(total_cost, 2)))

                          workflow_id                      task_name  cpus  memory    disk  duration    cost
 d9cf0051-2a80-48b7-985f-1f0dffb36eb3            DiscoverBreakpoints     8     7GB    10GB     3.00h   $0.19
 d9cf0051-2a80-48b7-985f-1f0dffb36eb3            DiscoverBreakpoints     8     7GB    10GB     2.91h   $0.18
                                                                                           total_cost: $0.37
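
For submissions with many shards, a per-task or per-workflow subtotal is often easier to read than the full table. The sketch below groups shard records by an arbitrary key. The sample records mirror the two rows printed above, using the rounded costs from that output, and the helper name `cost_by_key` is hypothetical.

```python
from collections import defaultdict

# Records shaped like the dicts yielded by cost_for_submission, with values
# copied from the table above (costs are the rounded, displayed values).
shards = [
    {"workflow_id": "d9cf0051-2a80-48b7-985f-1f0dffb36eb3",
     "task_name": "DiscoverBreakpoints", "shard": 1, "cost": 0.19},
    {"workflow_id": "d9cf0051-2a80-48b7-985f-1f0dffb36eb3",
     "task_name": "DiscoverBreakpoints", "shard": 2, "cost": 0.18},
]


def cost_by_key(shards, key):
    """Sum shard costs grouped by the value of `key` (e.g. 'task_name')."""
    totals = defaultdict(float)
    for shard in shards:
        totals[shard[key]] += shard["cost"]
    return dict(totals)


by_task = cost_by_key(shards, "task_name")
for name, cost in sorted(by_task.items()):
    print("%30s" % name, "%7s" % ("$%.2f" % cost))
```

The same helper works with `"workflow_id"` as the key to get one subtotal per workflow.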


Explore costs for potential workflow configurations and runtimes.

In [10]:
# Define configurations for: cpus, memory(GB), disk(GB), runtime(hours), preemptible
configurations = [(10, 64, 700, 5, False),
                  (8, 32, 700, 10, False),
                  (10, 64, 700, 5, True),
                  (8, 32, 700, 10, True),
                  (8, 32, 400, 10, True),
                  (8, 32, 400, 10, True),
                  (1, 2, 10, 100, True)]

print("%8s" % "cpus",
      "%8s" % "memory",
      "%8s" % "disk",
      "%8s" % "runtime",
      "%12s" % "preemptible",
      "%8s" % "cost")
for cpus, memory_gb, disk_gb, runtime_hours, preemptible in configurations:
    cost = estimate_job_cost(cpus, memory_gb, disk_gb, runtime_hours, preemptible)
    print("%8i" % cpus,
          "%6iGB" % memory_gb,
          "%6iGB" % disk_gb,
          "%7ih" % runtime_hours,
          "%12s" % str(preemptible),
          "%8s" % ("$%.2f" % cost))

    cpus   memory     disk  runtime  preemptible     cost
      10     64GB    700GB       5h        False    $3.27
       8     32GB    700GB      10h        False    $4.46
      10     64GB    700GB       5h         True    $0.84
       8     32GB    700GB      10h         True    $1.24
       8     32GB    400GB      10h         True    $1.08
       8     32GB    400GB      10h         True    $1.08
       1      2GB     10GB     100h         True    $0.94
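
One way to read the table above is to pair each preemptible row with its on-demand counterpart and compute the relative saving. The sketch below does that with the costs copied from the output; the helper name `preemptible_savings` is hypothetical. Note that preemptible machines can be interrupted, so a job that is preempted and rerun may cost more than a single estimated run.

```python
# (cpus, memory_gb, disk_gb, runtime_hours, preemptible, cost) copied from the table above.
rows = [
    (10, 64, 700, 5, False, 3.27),
    (8, 32, 700, 10, False, 4.46),
    (10, 64, 700, 5, True, 0.84),
    (8, 32, 700, 10, True, 1.24),
]


def preemptible_savings(rows):
    """Map each configuration to the fraction saved by running it preemptibly.

    Only configurations that appear both on-demand and preemptible are included.
    """
    on_demand = {row[:4]: row[5] for row in rows if not row[4]}
    savings = {}
    for row in rows:
        config, preemptible, cost = row[:4], row[4], row[5]
        if preemptible and config in on_demand:
            savings[config] = 1 - cost / on_demand[config]
    return savings


savings = preemptible_savings(rows)
for config, fraction in savings.items():
    print(config, "%.0f%% cheaper" % (100 * fraction))
```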


## Contributions
Contributions, bug reports, and feature requests are welcome on:
  - [terra-notebook-utils GitHub](https://github.com/DataBiosphere/terra-notebook-utils) for general functionality.
  - [featured-notebooks GitHub](https://github.com/DataBiosphere/featured-notebooks) for this notebook.