Monitor ICA analysis run

This repo contains scripts and demo code to monitor and troubleshoot analysis runs in ICA

Scripts and demo code to monitor analysis runs in ICA

  • test_websocket.py
  • requirements.txt: lists the Python modules to install with pip (pip install -r requirements.txt)

If the analysis run is InProgress, the script attempts to stream its logs.

If the analysis run has completed (i.e. Succeeded or Failed), the script downloads the logs.
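
The stream-or-download decision described above can be sketched as a small dispatch on the analysis status. This is an illustrative sketch, not the code in test_websocket.py: the function name choose_log_action is hypothetical, and treating any status other than the three named above as "wait" is an assumption.

```python
# Status names InProgress, Succeeded, and Failed come from the text above.
# Treating every other status as "wait" is an assumption of this sketch.
STREAMABLE = {"InProgress"}
FINISHED = {"Succeeded", "Failed"}


def choose_log_action(status):
    """Return 'stream', 'download', or 'wait' for an analysis status string."""
    if status in STREAMABLE:
        return "stream"
    if status in FINISHED:
        return "download"
    return "wait"


if __name__ == "__main__":
    for status in ("InProgress", "Succeeded", "Failed"):
        print(status, "->", choose_log_action(status))
```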

These logs will contain:

  • the stderr/stdout of ICA as it stages the analysis run before it runs it
  • the stderr/stdout collected at each step during an analysis run
  • the stderr/stdout of CWL/Nextflow as it orchestrates the analysis run
  • the stderr/stdout of ICA as it brings the result back to your ICA project

More details about the logs that ICA collects during an analysis run can be found here

You can use the Docker image keng404/monitor_ica_analysis_run:0.0.2, which has all the required scripts and libraries installed.

See here for the Docker image

Template command line

python3 test_websocket.py --api_key_file {FILE} [--project_name {STR}|--project_id {STR}] [OPTIONAL:--analysis_name {STR} | --analysis_id {STR}]
  • --api_key_file : path to text file that contains your API key
  • --project_name : name of your ICA project or --project_id : project id of your ICA project
  • --analysis_name : user_reference or name of your analysis run or --analysis_id : analysis id of the analysis you want to monitor

If both --analysis_name and --analysis_id are undefined, the script will try to grab/monitor logs from the most recent analysis run in your ICA project.
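
To make the option rules above concrete, here is a minimal argparse sketch of the same command line. It is not the parser used by test_websocket.py. Enforcing mutual exclusion between the name and id flags is an assumption, and the helper names build_parser and wants_most_recent are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the template command line (sketch only)."""
    parser = argparse.ArgumentParser(prog="test_websocket.py")
    parser.add_argument("--api_key_file", required=True,
                        help="path to a text file that contains your API key")
    # Exactly one of project name / project id (assumed to be mutually exclusive).
    project = parser.add_mutually_exclusive_group(required=True)
    project.add_argument("--project_name", help="name of your ICA project")
    project.add_argument("--project_id", help="id of your ICA project")
    # At most one of analysis name / analysis id; both may be omitted.
    analysis = parser.add_mutually_exclusive_group(required=False)
    analysis.add_argument("--analysis_name", help="user_reference of the analysis")
    analysis.add_argument("--analysis_id", help="id of the analysis")
    return parser


def wants_most_recent(args):
    """True when neither analysis flag was given, i.e. fall back to the latest run."""
    return args.analysis_name is None and args.analysis_id is None


if __name__ == "__main__":
    demo = build_parser().parse_args(
        ["--api_key_file", "key.txt", "--project_name", "demo"])
    print(wants_most_recent(demo))
```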

Rscript extension

  • An additional R script is provided to parse the JSON message returned by the ICA getAnalysisSteps endpoint and produce a table of steps for monitoring a running pipeline. This can be particularly useful for Nextflow-based pipelines. An example command line:
 Rscript ica.analysis_table.R --process-steps $PWD/analysis_id_{ANALYSIS_ID}/step_metadata.txt
  • The directory containing step_metadata.txt is created by the Python script above.
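
For readers who prefer Python, the same idea (turning the step JSON into a quick summary) can be sketched as below. This is not a port of ica.analysis_table.R. The payload layout is an assumption: the "items" list and the per-step "status" field are hypothetical names, so check them against your own step_metadata.txt before relying on this.

```python
import json
from collections import Counter


def summarize_steps(json_text):
    """Count steps per status in a getAnalysisSteps-style JSON payload.

    Assumes (hypothetically) a top-level "items" list whose entries
    carry a "status" field; steps without one are counted as UNKNOWN.
    """
    payload = json.loads(json_text)
    steps = payload.get("items", [])
    return Counter(step.get("status", "UNKNOWN") for step in steps)


if __name__ == "__main__":
    example = json.dumps({"items": [
        {"name": "step_a", "status": "DONE"},
        {"name": "step_b", "status": "RUNNING"},
        {"name": "step_c", "status": "DONE"},
    ]})
    for status, count in sorted(summarize_steps(example).items()):
        print(status, count)
```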

Limitations

  • Analysis runs that share the same user_reference are not fully distinguished
    • the script picks the most recent analysis with that user_reference
  • The ICA CLI cannot launch an ICA pipeline with a null (i.e. unspecified) multi-value parameter; you won't be able to configure this in the CLI.
    • This is possible when launching via the API (default settings).
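
The first limitation above (the most recent run wins when several share a user_reference) can be sketched as follows. The field names userReference and timeCreated are assumptions about the analysis records, and pick_most_recent is a hypothetical helper, not a function from this repo.

```python
from datetime import datetime


def _parse_time(stamp):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def pick_most_recent(analyses, user_reference):
    """Return the newest analysis with the given user_reference, or None.

    `analyses` is a list of dicts; the "userReference" and "timeCreated"
    keys are assumed field names for this sketch.
    """
    matches = [a for a in analyses if a.get("userReference") == user_reference]
    if not matches:
        return None
    return max(matches, key=lambda a: _parse_time(a["timeCreated"]))


if __name__ == "__main__":
    runs = [
        {"id": "a1", "userReference": "my_run", "timeCreated": "2023-01-01T00:00:00Z"},
        {"id": "a2", "userReference": "my_run", "timeCreated": "2023-02-01T00:00:00Z"},
    ]
    print(pick_most_recent(runs, "my_run")["id"])
```

If you need a specific older run, passing --analysis_id avoids the ambiguity entirely.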

Supplementary addition to get CPU, memory, disk usage on ICA for each analysis/pipeline run

Adding logic to pull back kubernetes logs and metrics files to your ICA analysis run

See this file for recommendations

Getting CPU and memory usage in an ICA pipeline run: follow the recommendations above

Rscript ica_pipelines.check_out_workflow_metrics.R --db-file {db_file}

db_file is an SQLite DB generated by the Kubernetes pod that runs your CWL/Nextflow-based ICA pipelines. The R script generates graphs that help identify how to optimize your pipeline runs (i.e. with respect to CPU and memory). This script is also run automatically by test_websocket.py. You will see a warning message if the script cannot find a metrics.db file in the analysis run output.
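
Before running the R script, it can help to confirm that the file is a readable SQLite database. The sketch below lists the tables it contains using Python's standard sqlite3 module. It makes no assumption about the schema of metrics.db, and list_tables is a hypothetical helper rather than part of this repo.

```python
import os
import sqlite3
from contextlib import closing


def list_tables(db_file):
    """Return the sorted table names found in an SQLite database file."""
    # sqlite3.connect would silently create a missing file, so check first.
    if not os.path.isfile(db_file):
        raise FileNotFoundError(db_file)
    with closing(sqlite3.connect(db_file)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        print(list_tables(sys.argv[1]))
```

Saved as, say, list_tables.py (a hypothetical file name), it would be run as python3 list_tables.py metrics.db.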

Limitations of finding the db_file

If you specify your analysis output files in your analysis run request and move the metrics.db file to a user-defined location, this script will not work.

Your analysis request may have done this through the ICA endpoints /api/projects/{project_id}/analysis:nextflow or /api/projects/{project_id}/analysis:cwl. See the swagger page here.

The request would have included the analysisOutput parameter shown below:

  "analysisOutput": [
    {
      "sourcePath": "string",
      "type": "FILE",
      "targetProjectId": "string",
      "targetPath": "string",
      "actionOnExist": "string"
    }
  ]
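
To check whether a saved request body redirects its outputs, a short sketch like the one below can pull the source and target paths out of the analysisOutput parameter shown above. The key names come from that fragment; output_mappings is a hypothetical helper, and the example paths are made up.

```python
def output_mappings(request_body):
    """Return (sourcePath, targetPath) pairs from an analysis request body.

    An empty list means the request did not set analysisOutput, so
    outputs stay in their default location.
    """
    entries = request_body.get("analysisOutput") or []
    return [(e.get("sourcePath"), e.get("targetPath")) for e in entries]


if __name__ == "__main__":
    # Made-up example body; only the key names follow the fragment above.
    body = {"analysisOutput": [{
        "sourcePath": "out/",
        "type": "FILE",
        "targetProjectId": "project-123",
        "targetPath": "/archive/run1/",
        "actionOnExist": "string",
    }]}
    for source, target in output_mappings(body):
        print(source, "->", target)
```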

Todos

  • create Docker image bundling the python script and supplementary R scripts
  • create documentation identifying the edits required to pull back the SQLite DB generated by the kubernetes pod that runs your CWL/NF based ICA pipelines
