# Collect data for GitHub Actions workflow runs

In this notebook, we collect historical test data, such as test durations, from workflow runs on GitHub using the GitHub API.

## Collect data for selected workflow runs of a repository

From historical test workflow runs, we want to extract:
- time durations
- workflow run status & conclusion

We can get the workflow IDs of the tests that we are interested in from https://api.github.com/repos/{ORG}/{REPO}/actions/workflows
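As a minimal sketch of how the workflow IDs can be read from that endpoint's response, the snippet below maps workflow names to IDs. The helper names `workflows_url` and `workflow_ids` are hypothetical, and the sample payload is hand-written to resemble the response shape (a `workflows` list whose entries carry `id`, `name` and `path`); only the `pre-commit` entry mirrors values used in this notebook.

```python
import json


def workflows_url(org, repo):
    """Build the URL of the endpoint that lists a repository's workflows."""
    return f"https://api.github.com/repos/{org}/{repo}/actions/workflows"


def workflow_ids(payload):
    """Map each workflow name to its numeric ID from a decoded API response."""
    return {wf["name"]: wf["id"] for wf in payload.get("workflows", [])}


# Hand-written sample shaped like the API response. The second entry is
# hypothetical and exists only to show that several workflows can be listed.
sample = json.loads("""
{"total_count": 2,
 "workflows": [
   {"id": 28698040, "name": "pre-commit", "path": ".github/workflows/pre-commit.yml"},
   {"id": 1, "name": "hypothetical-workflow", "path": ".github/workflows/example.yml"}
 ]}
""")

print(workflows_url("oss-aspen", "8Knot"))
print(workflow_ids(sample))
```

In practice the payload would come from an authenticated request to the URL rather than from a literal string.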

In [1]:
from dotenv import find_dotenv, load_dotenv
import os
import json
import subprocess
from datetime import datetime
from subprocess import PIPE
import pandas as pd
pd.options.mode.chained_assignment = None

import warnings

warnings.filterwarnings("ignore")

In [2]:
load_dotenv(find_dotenv(), override=True)
TOKEN = os.getenv("GITHUB_ACCESS_TOKEN")

For example, let's collect data for the workflow with ID 28698040 (called the test ID below) from the workflow runs in the repository `oss-aspen/8Knot`. The available workflows are listed at https://api.github.com/repos/oss-aspen/8Knot/actions/workflows

In [3]:
def get_page_numbers(test_id):
    """
    Get the total count of runs for the workflow with the given ID.
    Return the number of result pages on the GitHub Actions API.
    """
    command = """curl \
      -H "Accept: application/vnd.github+json" \
      -H "Authorization: Bearer {}" \
      -H "X-GitHub-Api-Version: 2022-11-28" \
      'https://api.github.com/repos/oss-aspen/8Knot/actions/workflows/{}/runs'""".format(TOKEN, test_id)
    output = subprocess.run(command, shell=True, check=True, stdout=PIPE, stderr=PIPE)
    output = json.loads(output.stdout)
    total_count = output['total_count']
    # The API returns 30 runs per page by default; round up so a partial last page is counted.
    page_numbers = -(-total_count // 30)
    return page_numbers
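The page count depends on how the division is rounded: truncating with `int()` drops a final partial page, while rounding up keeps it. The short sketch below compares the two for a few totals, assuming the default page size of 30; `page_count` is a hypothetical helper name.

```python
import math

PER_PAGE = 30  # default page size assumed for the list endpoint


def page_count(total_count, per_page=PER_PAGE):
    """Number of pages needed to hold total_count items, rounding up."""
    return math.ceil(total_count / per_page)


# Compare truncation with rounding up for a few totals.
for total in (0, 29, 30, 31, 270, 271):
    print(total, int(total / PER_PAGE), page_count(total))
```

For totals that are exact multiples of 30, such as 270, both approaches agree; they differ only when the last page is partially filled.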

In [4]:
def get_runs(test_id, page_numbers):
    """
    Take a workflow ID (test_id) and the number of pages of workflow runs as input.
    Query the GitHub API page by page and collect the runs of the specified workflow.
    Return a data frame with the run data.
    """
    frames = []
    for p in range(1, page_numbers + 1):
        command = """curl \
      -H "Accept: application/vnd.github+json" \
      -H "Authorization: Bearer {}" \
      -H "X-GitHub-Api-Version: 2022-11-28" \
      'https://api.github.com/repos/oss-aspen/8Knot/actions/workflows/{}/runs?page={}'""".format(TOKEN, test_id, p)

        output = subprocess.run(command, shell=True, check=True, stdout=PIPE, stderr=PIPE)
        output = json.loads(output.stdout)
        frames.append(pd.json_normalize(output['workflow_runs']))

    # Concatenate once at the end; return an empty frame when there are no pages.
    return pd.concat(frames, axis=0) if frames else pd.DataFrame()
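The same paged request can also be built without shelling out to `curl`. The following is a sketch using only the standard library, not the approach this notebook uses; the names `build_runs_request` and `fetch_runs_page` are hypothetical, and the token shown is a placeholder. Only the request construction is executed here, since sending it needs network access and a valid token.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.github.com"


def build_runs_request(token, org, repo, workflow_id, page, per_page=30):
    """Build (but do not send) the request for one page of workflow runs."""
    query = urllib.parse.urlencode({"page": page, "per_page": per_page})
    url = f"{API_ROOT}/repos/{org}/{repo}/actions/workflows/{workflow_id}/runs?{query}"
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    return urllib.request.Request(url, headers=headers)


def fetch_runs_page(request):
    """Send the request and return the 'workflow_runs' list (needs network access)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["workflow_runs"]


# Building the request does not touch the network, so it is safe to inspect.
req = build_runs_request("dummy-token", "oss-aspen", "8Knot", "28698040", page=2)
print(req.full_url)
```

Passing the token as a header inside the process, rather than on a shell command line, also keeps it out of the command string.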

In [5]:
test_id = "28698040" # Pre-commit test
page_numbers = get_page_numbers(test_id)
page_numbers

9

In [6]:
df = get_runs(test_id, page_numbers)

In [7]:
df

Unnamed: 0,id,name,node_id,head_branch,head_sha,path,display_title,run_number,event,status,...,head_repository.merges_url,head_repository.archive_url,head_repository.downloads_url,head_repository.issues_url,head_repository.pulls_url,head_repository.milestones_url,head_repository.notifications_url,head_repository.labels_url,head_repository.releases_url,head_repository.deployments_url
0,4247826657,pre-commit,WFR_kwLOGwuno879MLDh,metric-patch,28babd49bd997723559eca60e142373d281f8565,.github/workflows/pre-commit.yml,patch home page metric failure,298,pull_request,completed,...,https://api.github.com/repos/JamesKunstle/8Kno...,https://api.github.com/repos/JamesKunstle/8Kno...,https://api.github.com/repos/JamesKunstle/8Kno...,https://api.github.com/repos/JamesKunstle/8Kno...,https://api.github.com/repos/JamesKunstle/8Kno...,https://api.github.com/repos/JamesKunstle/8Kno...,https://api.github.com/repos/JamesKunstle/8Kno...,https://api.github.com/repos/JamesKunstle/8Kno...,https://api.github.com/repos/JamesKunstle/8Kno...,https://api.github.com/repos/JamesKunstle/8Kno...
1,4245797712,pre-commit,WFR_kwLOGwuno879EbtQ,main,5e74f905b70a20ac23356e4685d199d5c4e9d109,.github/workflows/pre-commit.yml,Merge pull request #273 from oss-aspen/dev,297,push,completed,...,https://api.github.com/repos/oss-aspen/8Knot/m...,https://api.github.com/repos/oss-aspen/8Knot/{...,https://api.github.com/repos/oss-aspen/8Knot/d...,https://api.github.com/repos/oss-aspen/8Knot/i...,https://api.github.com/repos/oss-aspen/8Knot/p...,https://api.github.com/repos/oss-aspen/8Knot/m...,https://api.github.com/repos/oss-aspen/8Knot/n...,https://api.github.com/repos/oss-aspen/8Knot/l...,https://api.github.com/repos/oss-aspen/8Knot/r...,https://api.github.com/repos/oss-aspen/8Knot/d...
2,4245558935,pre-commit,WFR_kwLOGwuno879DhaX,dev,74efd9e9c0659e3c5ad4c567c8e6c713ad76ef3d,.github/workflows/pre-commit.yml,catch main up for deployment on ocp,296,pull_request,completed,...,https://api.github.com/repos/oss-aspen/8Knot/m...,https://api.github.com/repos/oss-aspen/8Knot/{...,https://api.github.com/repos/oss-aspen/8Knot/d...,https://api.github.com/repos/oss-aspen/8Knot/i...,https://api.github.com/repos/oss-aspen/8Knot/p...,https://api.github.com/repos/oss-aspen/8Knot/m...,https://api.github.com/repos/oss-aspen/8Knot/n...,https://api.github.com/repos/oss-aspen/8Knot/l...,https://api.github.com/repos/oss-aspen/8Knot/r...,https://api.github.com/repos/oss-aspen/8Knot/d...
3,4243845186,pre-commit,WFR_kwLOGwuno8788_BC,dev,74efd9e9c0659e3c5ad4c567c8e6c713ad76ef3d,.github/workflows/pre-commit.yml,Merge pull request #267 from JamesKunstle/feat...,295,push,completed,...,https://api.github.com/repos/oss-aspen/8Knot/m...,https://api.github.com/repos/oss-aspen/8Knot/{...,https://api.github.com/repos/oss-aspen/8Knot/d...,https://api.github.com/repos/oss-aspen/8Knot/i...,https://api.github.com/repos/oss-aspen/8Knot/p...,https://api.github.com/repos/oss-aspen/8Knot/m...,https://api.github.com/repos/oss-aspen/8Knot/n...,https://api.github.com/repos/oss-aspen/8Knot/l...,https://api.github.com/repos/oss-aspen/8Knot/r...,https://api.github.com/repos/oss-aspen/8Knot/d...
4,4236840889,pre-commit,WFR_kwLOGwuno878iQ-5,dev,a6ec8155ba1b1cc4c3f7f5a045ded892a322809a,.github/workflows/pre-commit.yml,Merge pull request #272 from JamesKunstle/docu...,294,push,completed,...,https://api.github.com/repos/oss-aspen/8Knot/m...,https://api.github.com/repos/oss-aspen/8Knot/{...,https://api.github.com/repos/oss-aspen/8Knot/d...,https://api.github.com/repos/oss-aspen/8Knot/i...,https://api.github.com/repos/oss-aspen/8Knot/p...,https://api.github.com/repos/oss-aspen/8Knot/m...,https://api.github.com/repos/oss-aspen/8Knot/n...,https://api.github.com/repos/oss-aspen/8Knot/l...,https://api.github.com/repos/oss-aspen/8Knot/r...,https://api.github.com/repos/oss-aspen/8Knot/d...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
25,2584302525,pre-commit,WFR_kwLOGwuno86aCVO9,dev,1e60e7ab4ae7cb60df180b0d7c2602e5257fdb40,.github/workflows/pre-commit.yml,multiple minor fixes,20,pull_request,completed,...,https://api.github.com/repos/cdolfi/8Knot/merges,https://api.github.com/repos/cdolfi/8Knot/{arc...,https://api.github.com/repos/cdolfi/8Knot/down...,https://api.github.com/repos/cdolfi/8Knot/issu...,https://api.github.com/repos/cdolfi/8Knot/pull...,https://api.github.com/repos/cdolfi/8Knot/mile...,https://api.github.com/repos/cdolfi/8Knot/noti...,https://api.github.com/repos/cdolfi/8Knot/labe...,https://api.github.com/repos/cdolfi/8Knot/rele...,https://api.github.com/repos/cdolfi/8Knot/depl...
26,2579220310,pre-commit,WFR_kwLOGwuno86Zu8dW,dev,a88eca3837b5a9986794279ffda57cb43c93edfe,.github/workflows/pre-commit.yml,Merge pull request #102 from JamesKunstle/buil...,19,push,completed,...,https://api.github.com/repos/oss-aspen/8Knot/m...,https://api.github.com/repos/oss-aspen/8Knot/{...,https://api.github.com/repos/oss-aspen/8Knot/d...,https://api.github.com/repos/oss-aspen/8Knot/i...,https://api.github.com/repos/oss-aspen/8Knot/p...,https://api.github.com/repos/oss-aspen/8Knot/m...,https://api.github.com/repos/oss-aspen/8Knot/n...,https://api.github.com/repos/oss-aspen/8Knot/l...,https://api.github.com/repos/oss-aspen/8Knot/r...,https://api.github.com/repos/oss-aspen/8Knot/d...
27,2572510070,pre-commit,WFR_kwLOGwuno86ZVWN2,build_push_img,a3bbd92b0d443416ecc3ef7f0669ed2e6fe366a3,.github/workflows/pre-commit.yml,"add build / push script, targets quay",18,pull_request,completed,...,https://api.github.com/repos/JamesKunstle/8Kno...,https://api.github.com/repos/JamesKunstle/8Kno...,https://api.github.com/repos/JamesKunstle/8Kno...,https://api.github.com/repos/JamesKunstle/8Kno...,https://api.github.com/repos/JamesKunstle/8Kno...,https://api.github.com/repos/JamesKunstle/8Kno...,https://api.github.com/repos/JamesKunstle/8Kno...,https://api.github.com/repos/JamesKunstle/8Kno...,https://api.github.com/repos/JamesKunstle/8Kno...,https://api.github.com/repos/JamesKunstle/8Kno...
28,2557079513,pre-commit,WFR_kwLOGwuno86Yae_Z,dev,78f0115b1d2aa6a293f4c6d424cd995c350386b4,.github/workflows/pre-commit.yml,Merge pull request #95 from JamesKunstle/add-p...,17,push,completed,...,https://api.github.com/repos/oss-aspen/8Knot/m...,https://api.github.com/repos/oss-aspen/8Knot/{...,https://api.github.com/repos/oss-aspen/8Knot/d...,https://api.github.com/repos/oss-aspen/8Knot/i...,https://api.github.com/repos/oss-aspen/8Knot/p...,https://api.github.com/repos/oss-aspen/8Knot/m...,https://api.github.com/repos/oss-aspen/8Knot/n...,https://api.github.com/repos/oss-aspen/8Knot/l...,https://api.github.com/repos/oss-aspen/8Knot/r...,https://api.github.com/repos/oss-aspen/8Knot/d...


In [8]:
test_df = df[['created_at', 'updated_at', 'id', 'status', 'conclusion']].copy()
test_df.value_counts()  # verify that all entries are collected

created_at            updated_at            id          status     conclusion
2022-06-23T21:45:57Z  2022-06-23T21:46:33Z  2552166092  completed  success       1
2022-12-01T21:48:46Z  2022-12-01T21:49:24Z  3596813212  completed  failure       1
2022-11-17T16:40:36Z  2022-11-17T16:41:03Z  3490197661  completed  failure       1
2022-11-17T20:23:11Z  2022-11-17T20:23:35Z  3491727555  completed  failure       1
2022-11-17T21:40:25Z  2022-11-17T21:40:46Z  3492220957  completed  failure       1
                                                                                ..
2022-10-07T17:45:27Z  2022-10-07T17:46:16Z  3206633716  completed  failure       1
2022-10-07T17:50:17Z  2022-10-07T17:51:07Z  3206656422  completed  failure       1
2022-10-07T17:56:25Z  2022-10-07T17:57:20Z  3206685888  completed  failure       1
2022-10-07T22:14:01Z  2022-10-07T22:14:44Z  3207978527  completed  failure       1
2023-02-22T23:15:44Z  2023-02-22T23:16:07Z  4247826657  completed  success       1
Length: 2

In [9]:
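# Approximate each run's duration as the time between its creation and its last update.
fmt = "%Y-%m-%dT%H:%M:%SZ"
test_df['run_duration'] = test_df.apply(lambda x: datetime.strptime(x['updated_at'], fmt) - \
                                           datetime.strptime(x['created_at'], fmt), axis=1)
test_df['test'] = test_id  # record which workflow these runs belong to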
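As a worked example of the duration calculation, the sketch below applies the same timestamp parsing to two rows taken from the output above. `run_duration` is a hypothetical helper name used only for illustration.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%SZ"  # timestamp format of created_at and updated_at


def run_duration(created_at, updated_at):
    """Elapsed time between a run's creation and its last update."""
    return datetime.strptime(updated_at, FMT) - datetime.strptime(created_at, FMT)


# Run 2552166092: created 21:45:57, updated 21:46:33 on the same day.
first = run_duration("2022-06-23T21:45:57Z", "2022-06-23T21:46:33Z")
# Run 4247826657: created 23:15:44, updated 23:16:07 on the same day.
last = run_duration("2023-02-22T23:15:44Z", "2023-02-22T23:16:07Z")

print(first.total_seconds())  # 36.0
print(last.total_seconds())   # 23.0
```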

In [10]:
test_df.shape

(270, 7)

In [11]:
# generate passing and failing dfs, which are necessary for computing fit distributions
passing_test = test_df[test_df['conclusion'] == 'success']
failures_test = test_df[test_df['conclusion'] == 'failure']

In [12]:
passing_test.shape

(122, 7)

In [13]:
failures_test.shape

(148, 7)
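Here the two subsets account for every row (122 + 148 = 270), so every collected run concluded with either `success` or `failure`. For other workflows that may not hold, since a run can also end with a conclusion such as `cancelled`. The sketch below shows one way to check for rows that neither filter covers; `uncovered_conclusions` is a hypothetical helper and the sample list is made up, standing in for `test_df['conclusion']`.

```python
from collections import Counter


def uncovered_conclusions(conclusions, covered=("success", "failure")):
    """Count the conclusion values that fall outside the covered set."""
    counts = Counter(conclusions)
    return {c: n for c, n in counts.items() if c not in covered}


# Hypothetical sample; in the notebook this would be test_df['conclusion'].
sample = ["success", "failure", "failure", "cancelled", "success"]

print(uncovered_conclusions(sample))  # {'cancelled': 1}
```

An empty result means the passing and failing data frames together contain all collected runs.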

## Conclusion

In this notebook, we interact with the GitHub API to collect the data for all runs of a selected workflow. In future work, we will look into using this data to perform statistical tests using the OSP model.