# Collect data for GitHub Actions workflow runs

In this notebook, we use the GitHub API to collect historical test data, such as test durations, from workflow runs on GitHub.

## Collect data for selected workflow runs of a repository

From historical test workflow runs, we want to extract:
- run durations
- workflow run status and conclusion

The IDs of the workflows we are interested in can be looked up at https://api.github.com/repos/{ORG}/{REPO}/actions/workflows.
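
As a minimal sketch, assuming the workflows listing has been decoded into a dict with a `workflows` list whose entries carry `id`, `name` and `path` (fields that endpoint returns), a hypothetical helper `find_workflow_id` can look up an ID by the workflow's file path or display name. The `sample` response below is hand-written for illustration, not real API output.

```python
def find_workflow_id(workflows_response, path_or_name):
    """Return the ID of the first workflow whose path or name matches, else None.

    `workflows_response` is the decoded JSON body of
    GET /repos/{ORG}/{REPO}/actions/workflows (hypothetical helper).
    """
    for workflow in workflows_response.get("workflows", []):
        if path_or_name in (workflow.get("path"), workflow.get("name")):
            return workflow["id"]
    return None


# Hand-written, illustrative response body (not real API output)
sample = {
    "total_count": 1,
    "workflows": [
        {
            "id": 40063986,
            "name": "Run Inference",
            "path": ".github/workflows/inference.yaml",
            "state": "active",
        }
    ],
}

print(find_workflow_id(sample, ".github/workflows/inference.yaml"))  # 40063986
```

Matching on `path` is usually safer than `name`, since display names need not be unique within a repository.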

In [1]:
from dotenv import find_dotenv, load_dotenv
import os
import json
import subprocess
from datetime import datetime
from subprocess import PIPE
import pandas as pd
# Silence pandas' SettingWithCopyWarning when adding columns to filtered frames below
pd.options.mode.chained_assignment = None

In [2]:
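# Load GITHUB_ACCESS_TOKEN from the nearest .env file; override=True lets .env values replace existing environment variables
load_dotenv(find_dotenv(), override=True)
TOKEN = os.getenv("GITHUB_ACCESS_TOKEN")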

For example, let's collect data for the runs of the test workflow with ID 40063986 in the repository `redhat-et/time-to-merge-tool`, whose workflows are listed at https://api.github.com/repos/redhat-et/time-to-merge-tool/actions/workflows.

In [3]:
test_id = '40063986'

In [4]:
# curl command listing the runs of workflow `test_id`, authenticated with the token
command = """curl \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer {}" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  https://api.github.com/repos/redhat-et/time-to-merge-tool/actions/workflows/{}/runs""".format(TOKEN, test_id)
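
Putting the token inside a shell command string can expose it in the process list, and `shell=True` means the string is parsed by a shell. As an alternative sketch (not what this notebook does), the same request can be built with the standard library's `urllib.request`. The helper names `build_runs_request` and `fetch_json` are hypothetical; `per_page` and `page` are GitHub's pagination query parameters for this endpoint.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.github.com"


def build_runs_request(token, owner, repo, workflow_id, per_page=100, page=1):
    """Build (but do not send) a GET request for one page of workflow runs."""
    query = urllib.parse.urlencode({"per_page": per_page, "page": page})
    url = (f"{API_ROOT}/repos/{owner}/{repo}/actions/workflows/"
           f"{workflow_id}/runs?{query}")
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_json(request):
    """Send the request and decode the JSON response body (needs network access)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Used as `fetch_json(build_runs_request(TOKEN, "redhat-et", "time-to-merge-tool", test_id))`, this would return the same decoded body that the curl cells produce. Unlike plain `curl` (without `-f`), `urlopen` raises `urllib.error.HTTPError` on 4xx/5xx responses, so an expired token fails loudly instead of yielding an error JSON.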

In [5]:
args = [command]

In [6]:
output = subprocess.run(args, shell=True, check=True, stdout=PIPE, stderr=PIPE)

In [7]:
output = json.loads(output.stdout)
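
The runs endpoint is paginated: by default it returns 30 runs per page (at most 100 via `per_page`), so a single request only covers the most recent runs. A minimal sketch of walking all pages, assuming a hypothetical `fetch_page(page, per_page)` callable that returns the decoded JSON for one page (for instance by appending `?per_page=...&page=...` to the URL in the curl command). It stops at the first page shorter than `per_page`; GitHub also advertises the next page in the `Link` response header, which is the more robust signal, but this sketch does not parse it.

```python
def iter_workflow_runs(fetch_page, per_page=100, max_pages=50):
    """Yield workflow run dicts page by page.

    `fetch_page(page, per_page)` is a hypothetical callable returning the decoded
    JSON body of one page. Iteration stops after a page with fewer than
    `per_page` runs, or after `max_pages` pages as a safety limit.
    """
    for page in range(1, max_pages + 1):
        body = fetch_page(page, per_page)
        runs = body.get("workflow_runs", [])
        yield from runs
        if len(runs) < per_page:
            break


# Offline stand-in for the API: five runs served in pages
def fake_fetch(page, per_page):
    all_runs = [{"id": i} for i in range(5)]
    start = (page - 1) * per_page
    return {"total_count": len(all_runs),
            "workflow_runs": all_runs[start:start + per_page]}


print([run["id"] for run in iter_workflow_runs(fake_fetch, per_page=2)])  # [0, 1, 2, 3, 4]
```

With a real `fetch_page`, `pd.json_normalize(list(iter_workflow_runs(fetch_page)))` would give a frame covering every run rather than only the first page.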

In [8]:
df = pd.json_normalize(output['workflow_runs'])
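
`pd.json_normalize` flattens nested objects into dot-separated column names, which is where columns such as `head_repository.merges_url` come from. A small illustration on a hand-made record (not real API output):

```python
import pandas as pd

# Hand-made record mimicking the nesting of a workflow run object
runs = [
    {
        "id": 1,
        "status": "completed",
        "head_repository": {
            "full_name": "redhat-et/time-to-merge-tool",
            "owner": {"login": "redhat-et"},
        },
    }
]

flat = pd.json_normalize(runs)  # nested dicts -> "parent.child" columns
print(sorted(flat.columns))
# ['head_repository.full_name', 'head_repository.owner.login', 'id', 'status']

shallow = pd.json_normalize(runs, max_level=0)  # keep nested dicts as single columns
print(sorted(shallow.columns))
# ['head_repository', 'id', 'status']
```

Since only a handful of top-level fields are used later, `max_level=0` could avoid creating dozens of nested URL columns.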

In [9]:
df.head()

| | id | name | node_id | head_branch | head_sha | path | display_title | run_number | event | status | ... | head_repository.merges_url | head_repository.archive_url | head_repository.downloads_url | head_repository.issues_url | head_repository.pulls_url | head_repository.milestones_url | head_repository.notifications_url | head_repository.labels_url | head_repository.releases_url | head_repository.deployments_url |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4115465281 | Run Inference | WFR_kwLOIFof-871TQRB | main | 940699930da573f88e2da28042617a83ead6b1b3 | .github/workflows/inference.yaml | Run Inference | 113 | workflow_dispatch | completed | ... | https://api.github.com/repos/redhat-et/time-to... | https://api.github.com/repos/redhat-et/time-to... | https://api.github.com/repos/redhat-et/time-to... | https://api.github.com/repos/redhat-et/time-to... | https://api.github.com/repos/redhat-et/time-to... | https://api.github.com/repos/redhat-et/time-to... | https://api.github.com/repos/redhat-et/time-to... | https://api.github.com/repos/redhat-et/time-to... | https://api.github.com/repos/redhat-et/time-to... | https://api.github.com/repos/redhat-et/time-to... |
| 1 | 4087545441 | Run Inference | WFR_kwLOIFof-87zov5h | main | 6c7b5a868f3e495464e8b0a542bc1c53c551c68c | .github/workflows/inference.yaml | Run Inference | 112 | workflow_dispatch | completed | ... | https://api.github.com/repos/redhat-et/time-to... | https://api.github.com/repos/redhat-et/time-to... | https://api.github.com/repos/redhat-et/time-to... | https://api.github.com/repos/redhat-et/time-to... | https://api.github.com/repos/redhat-et/time-to... | https://api.github.com/repos/redhat-et/time-to... | https://api.github.com/repos/redhat-et/time-to... | https://api.github.com/repos/redhat-et/time-to... | https://api.github.com/repos/redhat-et/time-to... | https://api.github.com/repos/redhat-et/time-to... |
| 2 | 4086685722 | Run Inference | WFR_kwLOIFof-87zleAa | main | f7be6da5eefd1682312a5a835a76fd4e70908fdc | .github/workflows/inference.yaml | Run Inference | 111 | workflow_dispatch | completed | ... | https://api.github.com/repos/redhat-et/time-to... | https://api.github.com/repos/redhat-et/time-to... | https://api.github.com/repos/redhat-et/time-to... | https://api.github.com/repos/redhat-et/time-to... | https://api.github.com/repos/redhat-et/time-to... | https://api.github.com/repos/redhat-et/time-to... | https://api.github.com/repos/redhat-et/time-to... | https://api.github.com/repos/redhat-et/time-to... | https://api.github.com/repos/redhat-et/time-to... | https://api.github.com/repos/redhat-et/time-to... |
| 3 | 4085388991 | Run Inference | WFR_kwLOIFof-87zgha_ | version | 77fcc3852b4134b355c7e40dc148b41db618e962 | .github/workflows/inference.yaml | added version to point to latest on marketplace | 110 | pull_request | completed | ... | https://api.github.com/repos/oindrillac/time-t... | https://api.github.com/repos/oindrillac/time-t... | https://api.github.com/repos/oindrillac/time-t... | https://api.github.com/repos/oindrillac/time-t... | https://api.github.com/repos/oindrillac/time-t... | https://api.github.com/repos/oindrillac/time-t... | https://api.github.com/repos/oindrillac/time-t... | https://api.github.com/repos/oindrillac/time-t... | https://api.github.com/repos/oindrillac/time-t... | https://api.github.com/repos/oindrillac/time-t... |
| 4 | 4084113655 | Run Inference | WFR_kwLOIFof-87zbqD3 | main | 940699930da573f88e2da28042617a83ead6b1b3 | .github/workflows/inference.yaml | Run Inference | 109 | workflow_dispatch | completed | ... | https://api.github.com/repos/redhat-et/time-to... | https://api.github.com/repos/redhat-et/time-to... | https://api.github.com/repos/redhat-et/time-to... | https://api.github.com/repos/redhat-et/time-to... | https://api.github.com/repos/redhat-et/time-to... | https://api.github.com/repos/redhat-et/time-to... | https://api.github.com/repos/redhat-et/time-to... | https://api.github.com/repos/redhat-et/time-to... | https://api.github.com/repos/redhat-et/time-to... | https://api.github.com/repos/redhat-et/time-to... |


In [10]:
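# Keep only the fields we need; .copy() makes test_df an independent frame we can add columns to
test_df = df[['created_at', 'updated_at', 'id', 'status', 'conclusion']].copy()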

In [11]:
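# Approximate each run's duration as the time between its creation and its last update
test_df['run_duration'] = test_df.apply(lambda x: (datetime.strptime(x['updated_at'], "%Y-%m-%dT%H:%M:%SZ") -
                                                   datetime.strptime(x['created_at'], "%Y-%m-%dT%H:%M:%SZ")), axis=1)

# Record which workflow these runs belong to
test_df['test'] = test_id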
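
The row-wise `apply` with `strptime` works, but the same durations can be computed in a vectorized way with `pd.to_datetime`. Below is a sketch with a hypothetical helper `add_run_duration` that also stores the duration in seconds, which is convenient for fitting. Note that `updated_at - created_at` includes any time the run spent queued and may shift if a run is re-run; the run objects also carry a `run_started_at` timestamp that may be a better start marker, though this notebook does not use it.

```python
import pandas as pd


def add_run_duration(runs):
    """Return a copy of `runs` with duration columns derived from the ISO timestamps."""
    out = runs.copy()
    created = pd.to_datetime(out["created_at"], utc=True)
    updated = pd.to_datetime(out["updated_at"], utc=True)
    out["run_duration"] = updated - created                         # Timedelta per row
    out["run_duration_s"] = out["run_duration"].dt.total_seconds()  # float seconds
    return out


# Timestamps of runs 4115465281 and 4084113655 from this notebook
sample = pd.DataFrame({
    "created_at": ["2023-02-07T15:25:57Z", "2023-02-03T12:24:43Z"],
    "updated_at": ["2023-02-07T15:28:25Z", "2023-02-03T13:07:23Z"],
})
print(add_run_duration(sample)["run_duration_s"].tolist())  # [148.0, 2560.0]
```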

In [12]:
test_df.head()

| | created_at | updated_at | id | status | conclusion | run_duration | test |
|---|---|---|---|---|---|---|---|
| 0 | 2023-02-07T15:25:57Z | 2023-02-07T15:28:25Z | 4115465281 | completed | success | 0 days 00:02:28 | 40063986 |
| 1 | 2023-02-03T20:10:40Z | 2023-02-03T20:13:06Z | 4087545441 | completed | success | 0 days 00:02:26 | 40063986 |
| 2 | 2023-02-03T18:03:08Z | 2023-02-03T18:05:38Z | 4086685722 | completed | success | 0 days 00:02:30 | 40063986 |
| 3 | 2023-02-03T15:08:54Z | 2023-02-03T15:12:01Z | 4085388991 | completed | failure | 0 days 00:03:07 | 40063986 |
| 4 | 2023-02-03T12:24:43Z | 2023-02-03T13:07:23Z | 4084113655 | completed | success | 0 days 00:42:40 | 40063986 |


In [13]:
# Split runs into passing and failing subsets, which are needed for fitting duration distributions.
# Runs with any other conclusion (e.g. cancelled or skipped) end up in neither frame.
passing_df = test_df[test_df['conclusion'] == 'success']
failing_df = test_df[test_df['conclusion'] == 'failure']
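
The notebook does not say which distributions are fitted. As one hedged example, assuming a lognormal model (a common choice for positive, right-skewed durations, such as a single 42-minute run among otherwise 2 to 3 minute runs), the maximum-likelihood parameters are the mean and population standard deviation of the log-durations. `fit_lognormal` is a hypothetical helper; the input values are the `run_duration` values of the five passing runs shown in this notebook, converted to seconds.

```python
import numpy as np


def fit_lognormal(durations_s):
    """Maximum-likelihood lognormal fit: returns (mu, sigma) of log(duration)."""
    x = np.asarray(durations_s, dtype=float)
    x = x[x > 0]                  # log is undefined for zero/negative durations
    logs = np.log(x)
    mu = logs.mean()
    sigma = logs.std(ddof=0)      # the MLE uses the population (ddof=0) std
    return mu, sigma


# Passing-run durations shown in this notebook, in seconds (2:28, 2:26, 2:30, 42:40, 3:24)
passing_s = [148, 146, 150, 2560, 204]
mu, sigma = fit_lognormal(passing_s)
print(f"median ~ {np.exp(mu):.0f} s, sigma = {sigma:.2f}")
```

On the real frames, the input should be obtainable as `passing_df['run_duration'].dt.total_seconds()` (and likewise for `failing_df`), since `run_duration` holds timedelta values.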

In [14]:
passing_df.head()

| | created_at | updated_at | id | status | conclusion | run_duration | test |
|---|---|---|---|---|---|---|---|
| 0 | 2023-02-07T15:25:57Z | 2023-02-07T15:28:25Z | 4115465281 | completed | success | 0 days 00:02:28 | 40063986 |
| 1 | 2023-02-03T20:10:40Z | 2023-02-03T20:13:06Z | 4087545441 | completed | success | 0 days 00:02:26 | 40063986 |
| 2 | 2023-02-03T18:03:08Z | 2023-02-03T18:05:38Z | 4086685722 | completed | success | 0 days 00:02:30 | 40063986 |
| 4 | 2023-02-03T12:24:43Z | 2023-02-03T13:07:23Z | 4084113655 | completed | success | 0 days 00:42:40 | 40063986 |
| 18 | 2022-12-12T19:56:29Z | 2022-12-12T19:59:53Z | 3679280627 | completed | success | 0 days 00:03:24 | 40063986 |


In [15]:
failing_df.head()

| | created_at | updated_at | id | status | conclusion | run_duration | test |
|---|---|---|---|---|---|---|---|
| 3 | 2023-02-03T15:08:54Z | 2023-02-03T15:12:01Z | 4085388991 | completed | failure | 0 days 00:03:07 | 40063986 |
| 5 | 2023-01-27T19:19:24Z | 2023-01-27T19:22:21Z | 4027515764 | completed | failure | 0 days 00:02:57 | 40063986 |
| 6 | 2023-01-27T17:22:33Z | 2023-01-27T17:25:38Z | 4026723258 | completed | failure | 0 days 00:03:05 | 40063986 |
| 7 | 2023-01-27T16:28:53Z | 2023-01-27T16:31:19Z | 4026319279 | completed | failure | 0 days 00:02:26 | 40063986 |
| 8 | 2023-01-12T19:10:40Z | 2023-01-12T19:13:11Z | 3905293698 | completed | failure | 0 days 00:02:31 | 40063986 |
