# GitHub Activity Metrics: Issues

**Activity by Repo**

This notebook queries the Augur database for the information needed to compute the following issue metrics, derived from the GitHub Community Metrics working document: https://docs.google.com/document/d/1Yocr6fk0J8EsVZnJwoIl3kRQaI94tI-XHe7VSMFT0yM/edit?usp=sharing

Any computations needed to turn the queried data into metric values are done as each query is defined.

In [4]:
import psycopg2
import pandas as pd 
import sqlalchemy as salc
import json
import os
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (15, 5)

with open("../config_temp.json") as config_file:
    config = json.load(config_file)

In [5]:
database_connection_string = 'postgresql+psycopg2://{}:{}@{}:{}/{}'.format(config['user'], config['password'], config['host'], config['port'], config['database'])

dbschema='augur_data'
engine = salc.create_engine(
    database_connection_string,
    connect_args={'options': '-csearch_path={}'.format(dbschema)})

In [6]:
# Add the name(s) of the repo(s) you want to query here; each one must already exist in the database
repo_name_set = ['augur', 'grimoirelab']
repo_set = []

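for repo_name in repo_name_set:
    # Bind repo_name as a parameter instead of formatting it into the SQL string
    repo_query = salc.sql.text("""
                 SET SCHEMA 'augur_data';
                 SELECT 
                    b.repo_id
                FROM
                    repo_groups a,
                    repo b
                WHERE
                    a.repo_group_id = b.repo_group_id AND
                    b.repo_name = :repo_name
        """)

    # Run the query on an explicit connection (engine.execute is deprecated in SQLAlchemy 1.4)
    with engine.connect() as conn:
        t = conn.execute(repo_query, {"repo_name": repo_name})
        repo_id = t.mappings().all()[0].get('repo_id')
    repo_set.append(repo_id)
print(repo_set)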

[25440, 25448]


In [7]:
# Uncomment the line below to assign the repo_id value(s) manually
#repo_set = [25440]

- Median time to close
  - reviews
  - issues
- Median time to close/merge PR
- Mean time to first response
  - issues
  - PRs

In [14]:
df_review = pd.DataFrame()

for repo_id in repo_set: 

    pr_query = salc.sql.text(f"""
                SELECT
                    r.repo_name,
					prr.pr_review_id,
                    prr.pull_request_id,
					prr.pr_review_body,
					prr.pr_review_submitted_at
                FROM
                	repo r,
                    pull_requests pr,
                    pull_request_reviews prr
                WHERE
                    prr.pull_request_id = pr.pull_request_id AND
                	pr.repo_id = r.repo_id AND
                    r.repo_id = \'{repo_id}\'
        """)
    df_current_repo = pd.read_sql(pr_query, con=engine)
    df_review = pd.concat([df_review, df_current_repo])

df_review = df_review.reset_index(drop=True)
        
df_review

Unnamed: 0,repo_name,pr_review_id,pull_request_id,pr_review_body,pr_review_submitted_at
0,augur,7139,210373,,2020-03-22 15:46:22
1,augur,7140,213657,,2020-06-22 17:57:57
2,augur,7141,218787,,2020-12-16 10:33:17
3,augur,7142,218787,,2020-12-16 11:08:06
4,augur,7143,218787,,2020-12-16 10:47:03
...,...,...,...,...,...
860,grimoirelab,8304,219668,"Sorry, I should had been more clear. I think y...",2021-08-06 08:50:09
861,grimoirelab,8305,219668,LGTM,2021-08-06 10:33:30
862,grimoirelab,8618,213799,"LGTM, thanks!",2021-08-23 13:09:05
863,grimoirelab,8619,219761,Thanks for the PR @eyehwan.\r\nDo you think we...,2021-08-25 11:13:50
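
The review rows above carry only submission timestamps, so a time-to-first-response figure for pull requests also needs each PR's creation time (queried in the next section). The following is a minimal sketch of one way to combine the two, not a definitive metric definition. It assumes that the earliest review counts as the first response (comments are ignored), and it uses small hypothetical frames that only mimic the column names of `df_review` and `df_pr`; the helper name `mean_hours_to_first_review` is invented for illustration.

```python
import pandas as pd

# Hypothetical sample rows; column names follow df_review and df_pr, values are made up.
sample_reviews = pd.DataFrame({
    "pull_request_id": [1, 1, 2],
    "pr_review_submitted_at": pd.to_datetime([
        "2021-01-02 12:00:00",
        "2021-01-01 18:00:00",
        "2021-01-05 00:00:00",
    ]),
})
sample_prs = pd.DataFrame({
    "repo_name": ["demo", "demo", "demo"],
    "pull_request": [1, 2, 3],
    "created": pd.to_datetime([
        "2021-01-01 12:00:00",
        "2021-01-04 12:00:00",
        "2021-01-06 00:00:00",
    ]),
})


def mean_hours_to_first_review(reviews, prs):
    """Mean hours from PR creation to its earliest review, per repo.

    Assumption: the earliest review is treated as the first response.
    PRs with no review are dropped by the inner join.
    """
    first_review = (
        reviews.groupby("pull_request_id")["pr_review_submitted_at"]
        .min()
        .rename("first_review")
        .reset_index()
    )
    merged = prs.merge(
        first_review,
        left_on="pull_request",
        right_on="pull_request_id",
        how="inner",
    )
    # Convert the timedelta to float hours so the aggregation is plain numeric.
    delta = merged["first_review"] - merged["created"]
    merged["hours_to_first_review"] = delta.dt.total_seconds() / 3600
    return merged.groupby("repo_name")["hours_to_first_review"].mean()


result = mean_hours_to_first_review(sample_reviews, sample_prs)
print(result)
```

With the real data, the call would be `mean_hours_to_first_review(df_review, df_pr)` once both frames exist. PRs that never received a review do not contribute to the mean, which biases the figure toward PRs that got attention.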


### Query for Pull Request Analysis

In [5]:
df_pr = pd.DataFrame()

for repo_id in repo_set: 

    pr_query = salc.sql.text(f"""
                SELECT
                    r.repo_name,
					pr.pull_request_id AS pull_request, 
					pr.pr_created_at AS created, 
					pr.pr_closed_at AS closed,
					pr.pr_merged_at  AS merged 
                FROM
                	repo r,
                    pull_requests pr
                WHERE
                	r.repo_id = pr.repo_id AND
                    r.repo_id = \'{repo_id}\'
        """)
    df_current_repo = pd.read_sql(pr_query, con=engine)
    df_pr = pd.concat([df_pr, df_current_repo])

df_pr = df_pr.reset_index(drop=True)
        
df_pr.head()

Unnamed: 0,repo_name,pull_request,created,closed,merged
0,augur,214028,2020-10-19 12:10:22,2020-10-19 13:27:26,NaT
1,augur,210011,2017-02-01 20:41:17,2017-02-02 16:51:16,2017-02-02 16:51:16
2,augur,210012,2017-02-01 21:43:24,2017-02-02 16:47:25,2017-02-02 16:47:25
3,augur,210019,2017-03-16 21:16:33,2017-03-16 21:17:07,2017-03-16 21:17:07
4,augur,210219,2019-10-23 22:27:53,2019-10-23 22:28:01,2019-10-23 22:28:01
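
The `created`, `closed`, and `merged` columns are enough to compute the median time to close and the median time to merge per repository. The sketch below shows one possible computation on hypothetical sample rows with the same column names as `df_pr`. The helper name `median_hours` is an assumption for illustration, and PRs with a missing end timestamp (`NaT`) are simply left out of the median.

```python
import pandas as pd

# Hypothetical sample rows; column names follow df_pr, values are made up.
sample_pr = pd.DataFrame({
    "repo_name": ["demo", "demo", "demo", "demo"],
    "pull_request": [1, 2, 3, 4],
    "created": pd.to_datetime([
        "2021-01-01 00:00:00",
        "2021-01-02 00:00:00",
        "2021-01-03 00:00:00",
        "2021-01-04 00:00:00",
    ]),
    # PR 4 is still open, so it has no close timestamp.
    "closed": pd.to_datetime([
        "2021-01-01 02:00:00",
        "2021-01-02 06:00:00",
        "2021-01-03 10:00:00",
        None,
    ]),
    # PR 1 was closed without being merged.
    "merged": pd.to_datetime([
        None,
        "2021-01-02 06:00:00",
        "2021-01-03 10:00:00",
        None,
    ]),
})


def median_hours(df, start_col, end_col):
    """Median hours between two timestamp columns, grouped by repo_name.

    Rows where either timestamp is missing yield NaN and are skipped by median().
    """
    hours = (df[end_col] - df[start_col]).dt.total_seconds() / 3600
    return hours.groupby(df["repo_name"]).median()


median_close = median_hours(sample_pr, "created", "closed")
median_merge = median_hours(sample_pr, "created", "merged")
print(median_close)
print(median_merge)
```

On the queried data the equivalent calls would be `median_hours(df_pr, "created", "closed")` and `median_hours(df_pr, "created", "merged")`. The median is used rather than the mean because a few very long-lived PRs would otherwise dominate the result.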


In [5]:
df_issues = pd.DataFrame()

for repo_id in repo_set: 

    # Named issue_query because this cell selects from issues, not pull requests
    issue_query = salc.sql.text(f"""
                SELECT
                    r.repo_name,
                    i.issue_id AS issue, 
                    i.gh_issue_number AS issue_number,
                    i.gh_issue_id AS gh_issue,
                    i.created_at AS created, 
                    i.closed_at AS closed
                FROM
                    repo r,
                    issues i
                WHERE
                    r.repo_id = i.repo_id AND
                    i.repo_id = \'{repo_id}\'
        """)
    df_current_repo = pd.read_sql(issue_query, con=engine)
    df_issues = pd.concat([df_issues, df_current_repo])

df_issues = df_issues.reset_index(drop=True)
        
df_issues

Unnamed: 0,repo_name,issue,issue_number,gh_issue,created,closed
0,augur,340115,28,213149529,2017-03-09 20:06:18,2017-04-07 21:18:01
1,augur,343231,886,682259157,2020-08-20 00:09:30,2020-08-20 00:16:50
2,augur,343216,880,679627659,2020-08-15 19:11:45,2020-08-17 14:30:04
3,augur,343467,967,724668885,2020-10-19 14:21:08,2020-10-19 14:21:34
4,augur,342738,740,628534692,2020-06-01 15:34:33,2020-08-20 10:48:14
...,...,...,...,...,...,...
1889,grimoirelab,735294,437,941801983,2021-07-12 08:23:29,2021-07-28 08:58:49
1890,grimoirelab,735295,436,924259145,2021-06-17 19:24:54,NaT
1891,grimoirelab,340606,284,559853733,2020-02-04 17:00:31,NaT
1892,grimoirelab,734649,429,889819068,2021-05-12 08:28:28,NaT
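
Several issues in the output above have `NaT` in `closed`, meaning they are still open. A per-repository summary can report the median time to close alongside the number of open issues, so that the median is not read in isolation. The following is a rough sketch on hypothetical rows shaped like `df_issues`; the function name `issue_close_summary` and the output column names are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample rows; column names follow df_issues, values are made up.
sample_issues = pd.DataFrame({
    "repo_name": ["demo_a", "demo_a", "demo_a", "demo_b", "demo_b"],
    "issue": [1, 2, 3, 4, 5],
    "created": pd.to_datetime([
        "2021-01-01 00:00:00",
        "2021-01-01 00:00:00",
        "2021-01-05 00:00:00",
        "2021-02-01 00:00:00",
        "2021-02-02 00:00:00",
    ]),
    # None marks an issue that is still open.
    "closed": pd.to_datetime([
        "2021-01-02 00:00:00",
        "2021-01-04 00:00:00",
        None,
        "2021-02-01 12:00:00",
        None,
    ]),
})


def issue_close_summary(df):
    """Per repo: median days to close (closed issues only) and open-issue count."""
    out = df.copy()
    out["days_to_close"] = (out["closed"] - out["created"]).dt.total_seconds() / 86400
    out["is_open"] = out["closed"].isna()
    return out.groupby("repo_name").agg(
        median_days_to_close=("days_to_close", "median"),
        open_issues=("is_open", "sum"),
    )


summary = issue_close_summary(sample_issues)
print(summary)
```

Calling `issue_close_summary(df_issues)` would apply the same logic to the queried issues. Open issues are excluded from the median, so a repository with many long-open issues can still show a short median time to close.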
