| layout | title | time_estimation |
|---|---|---|
| tutorial_hands_on | Recording Job Metrics | 15m |
Overview
{:.no_toc}
Job metrics record properties of the jobs that Galaxy executes. This information can help you plan for training events and plan capacity when expanding your Galaxy server.
Agenda
- TOC {:toc}
{: .agenda}
Metrics
Galaxy includes a built-in framework that collects job metrics and stores them in its database. Some work has analysed job runtime metrics with the aim of optimising cluster allocation based on job inputs and enhancing job submission ({% cite Tyryshkina_2019 %}). More work is expected in this area.
{% icon comment %} Note
Job metrics are only visible to Galaxy admin users, unless you set `expose_potentially_sensitive_job_metrics: true`, as UseGalaxy.eu does. UseGalaxy.eu's intention is to empower users and make everything as transparent as possible.
{: .comment}
Setting up Galaxy
By default, Galaxy enables the core metrics:
These include very basic submission parameters. We want more information!
{% icon hands_on %} Hands-on: Setting up the job metrics file
Create the file `templates/galaxy/config/job_metrics_conf.xml.j2` with the following contents:

    <?xml version="1.0"?>
    <job_metrics>
      <core />
      <cpuinfo />
      <meminfo />
      <uname />
      <env />
      <cgroup />
      <hostname />
    </job_metrics>

You can see the sample file for further options regarding metrics.

Edit your playbook to install the package named `cgroup-tools` in a pre-task (with git/make/etc). This package is required to use `cgget`, which is used in metrics collection.

Edit the group variables file, `group_vars/galaxyservers.yml`. You'll need to make two edits:

- Setting the `job_metrics_config_file`, to tell Galaxy where to look for the job metrics configuration.
- Adding the file to the list of `galaxy_config_templates` to deploy it to the server:

{% raw %}

    --- galaxyservers.yml.old
    +++ galaxyservers.yml
     galaxy_config:
       galaxy:
    +    job_metrics_config_file: "{{ galaxy_config_dir }}/job_metrics_conf.xml"
         brand: "My Galaxy"
         admin_users: admin@example.org
         database_connection: "postgresql:///galaxy?host=/var/run/postgresql"
    @@ -120,6 +121,8 @@ gie_proxy_setup_service: systemd
     gie_proxy_sessions_path: "{{ galaxy_mutable_data_dir }}/interactivetools_map.sqlite"

     galaxy_config_templates:
    +  - src: templates/galaxy/config/job_metrics_conf.xml.j2
    +    dest: "{{ galaxy_config.galaxy.job_metrics_config_file }}"
       - src: templates/galaxy/config/tool_conf_interactive.xml
         dest: "{{ galaxy_config_dir }}/tool_conf_interactive.xml"
       - src: templates/galaxy/config/job_conf.xml

{% endraw %}

Run the playbook:

    ansible-playbook galaxy.yml
{: .hands_on}
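Before running the playbook, it can help to confirm that the metrics file is well-formed XML and lists the plugins you expect. The following is a minimal sketch using only the Python standard library. The helper name `enabled_plugins` is hypothetical, and the configuration is inlined here so the example runs on its own; in practice you would read the deployed file instead.

```python
import xml.etree.ElementTree as ET

# The job metrics configuration from the hands-on, inlined for a standalone run.
JOB_METRICS_CONF = """<?xml version="1.0"?>
<job_metrics>
  <core />
  <cpuinfo />
  <meminfo />
  <uname />
  <env />
  <cgroup />
  <hostname />
</job_metrics>
"""


def enabled_plugins(xml_text):
    """Return the element names listed under <job_metrics>, in file order.

    Raises xml.etree.ElementTree.ParseError if the text is not well-formed XML,
    and ValueError if the root element is not <job_metrics>.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "job_metrics":
        raise ValueError(f"unexpected root element: {root.tag!r}")
    return [child.tag for child in root]


if __name__ == "__main__":
    print(enabled_plugins(JOB_METRICS_CONF))
```

This only checks the XML structure. It does not verify that Galaxy recognises each plugin name, so a typo in a plugin element would still pass.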
Generating Metrics
With this, the job metrics tracking should be set up. Now when you run a job, you will see many more metrics:
{% icon hands_on %} Hands-on: Generate some metrics
Run a job (any tool is fine, even upload)
View the information of the output dataset ({% icon galaxy-info %})
{: .hands_on}
What should I collect?
There is no good general rule we can give you; choose what you think is useful now or will be useful later. Numeric parameters are "cheaper" to store than text parameters like uname. If you decide to collect the environment variables or similar, you may eventually find yourself wanting to remove old job metrics.
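One rough way to decide is to look at which plugins contribute the most text for a typical job. The sketch below is illustrative only: it assumes each metric record is a dictionary with `plugin`, `name`, and `raw_value` keys, the sample records are invented, and the helper names `is_numeric` and `text_bytes_by_plugin` are hypothetical. Treating anything that does not parse as a float as text is a heuristic, not a description of how Galaxy classifies metrics.

```python
from collections import defaultdict


def is_numeric(raw_value):
    """Heuristic: a value counts as numeric if float() can parse it."""
    try:
        float(raw_value)
    except (TypeError, ValueError):
        return False
    return True


def text_bytes_by_plugin(metrics):
    """Sum the UTF-8 size of non-numeric metric values, grouped by plugin."""
    totals = defaultdict(int)
    for record in metrics:
        raw = record["raw_value"]
        if not is_numeric(raw):
            totals[record["plugin"]] += len(str(raw).encode("utf-8"))
    return dict(totals)


if __name__ == "__main__":
    # Invented sample records, standing in for the metrics of a single job.
    sample = [
        {"plugin": "core", "name": "galaxy_slots", "raw_value": "1.0000000"},
        {"plugin": "uname", "name": "uname", "raw_value": "Linux node1 5.15.0"},
        {"plugin": "env", "name": "PATH", "raw_value": "/usr/bin:/bin"},
        {"plugin": "env", "name": "HOME", "raw_value": "/home/galaxy"},
    ]
    print(text_bytes_by_plugin(sample))
```

Multiplying the per-job totals by your expected number of jobs gives a rough sense of how quickly a text-heavy plugin such as `env` would grow the database.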
Accessing the data
You can access the data via BioBlend (`JobsClient.get_metrics`), or via SQL with gxadmin.
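As a minimal sketch of the BioBlend route, the example below fetches the metrics for one job and groups them by plugin. It assumes BioBlend is installed (`pip install bioblend`) and that each returned record is a dictionary with `plugin`, `name`, and `value` keys. The server URL, API key, and job ID are placeholders, and the helper names `fetch_job_metrics` and `metrics_by_plugin` are hypothetical.

```python
def fetch_job_metrics(galaxy_url, api_key, job_id):
    """Fetch the metrics of one job through BioBlend's JobsClient.get_metrics."""
    # Imported lazily so metrics_by_plugin can be used without BioBlend installed.
    from bioblend.galaxy import GalaxyInstance

    gi = GalaxyInstance(url=galaxy_url, key=api_key)
    return gi.jobs.get_metrics(job_id)


def metrics_by_plugin(metrics):
    """Group metric records as {plugin: {name: value}}."""
    grouped = {}
    for record in metrics:
        grouped.setdefault(record["plugin"], {})[record["name"]] = record["value"]
    return grouped


if __name__ == "__main__":
    # Invented records standing in for a real response (assumed shape).
    sample = [
        {"plugin": "core", "name": "galaxy_slots", "value": "1"},
        {"plugin": "core", "name": "runtime_seconds", "value": "12 seconds"},
        {"plugin": "hostname", "name": "hostname", "value": "node1"},
    ]
    print(metrics_by_plugin(sample))

    # Against a real server, replace the placeholders with your own values:
    # metrics = fetch_job_metrics("https://galaxy.example.org", "<api-key>", "<job-id>")
    # print(metrics_by_plugin(metrics))
```

Unless `expose_potentially_sensitive_job_metrics` is enabled, the API key should belong to an admin user, since job metrics are otherwise only visible to admins.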

