# Kibiter GitHub Pull Requests

In this notebook we retrieve pull request information from the Kibiter repository, located at:
https://github.com/chaoss/grimoirelab-kibiter

<h4>Sections</h4>

<ul>
    <li>Retrieving data from the data source</li>
    <li>Date of retrieval</li>
    <li>Class for computing Code_Pull_Requests-Github (based on sarvesh211999's Code_Issues class; more details in the section)</li>
    <li>Total Pull Requests to the Kibiter repo</li>
    <li>Checking data consistency</li>
    <ul>
        <li>Verifying that the only category is "pull_request"</li>
    </ul>
    <li>Counting and displaying distinct users who made pull requests to the Kibiter repository</li>
    <li>Counting merged and non-merged pull requests</li>
    <li>Displaying users with non-merged pull requests</li>
</ul>


## Retrieving data from the data source

<p>- Replace the XXXX after -t with a valid GitHub API token.</p>
<p>- The data is written to the kibiter-pull-requests.json file.</p>

````
iMac-de-Jose:Github Notebook josemasa$ perceval github --json-line --category pull_request grimoirelab kibiter --sleep-for-rate -t XXXX > kibiter-pull-requests.json 
[2019-04-03 22:40:54,606] - Sir Perceval is on his quest.
[2019-04-03 22:40:56,967] - Getting info for https://api.github.com/users/dpose
[2019-04-03 22:40:57,516] - Getting info for https://api.github.com/users/sduenas
[2019-04-03 22:40:57,712] - Getting info for https://api.github.com/users/acs
[2019-04-03 22:40:58,670] - Getting info for https://api.github.com/users/dmabtrg
[2019-04-03 22:41:00,194] - Getting info for https://api.github.com/users/jsmanrique
[2019-04-03 22:41:05,491] - Getting info for https://api.github.com/users/dlumbrer
[2019-04-03 22:41:59,786] - Getting info for https://api.github.com/users/sanacl
[2019-04-03 22:42:08,676] - Getting info for https://api.github.com/users/dicortazar
[2019-04-03 22:42:10,857] - Getting info for https://api.github.com/users/valeriocos
[2019-04-03 22:42:23,363] - Getting info for https://api.github.com/users/anajsana
[2019-04-03 22:42:25,939] - Sir Perceval completed his quest.
````
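The cells below read only a few fields from each line of kibiter-pull-requests.json. The following sketch assumes the Perceval item layout these cells rely on: a top-level `category` and a `data` object holding the GitHub pull request. It shows a made-up, heavily trimmed record and how one JSON line is parsed. All values, including the user name and login, are hypothetical, and real items carry many more fields.

```python
import json

# A hypothetical, heavily trimmed Perceval item: only the keys that the
# notebook reads are kept, and every value is made up for illustration.
sample_line = json.dumps({
    "category": "pull_request",
    "data": {
        "number": 1,
        "created_at": "2019-01-15T10:30:00Z",
        "merged": True,
        "user_data": {"login": "example-user", "name": "Example User"},
    },
})

# Each line of the --json-line output is one self-contained JSON document.
item = json.loads(sample_line)
print(item["category"])                   # pull_request
print(item["data"]["created_at"])         # 2019-01-15T10:30:00Z
print(item["data"]["user_data"]["name"])  # Example User
```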


## Date of retrieval

<p> 3rd April, 2019 </p>

## Class for computing Code_Pull_Requests-Github

(Based on sarvesh211999's Code_Issues class.)

<p>Notebook containing the original class: https://github.com/sarvesh211999/CHAOSS-Gsoc/blob/master/Microtask-0/github-pull-request.ipynb</p>


```python
# Import the libraries needed by the class
import json
from dateutil import parser
```

```python
class Code_Pull_Requests:
    """Class for pull requests from GitHub repositories.

    Objects are instantiated by specifying a file with the
    pull requests obtained by Perceval from a set of repositories.

    :param path: Path to file with one Perceval JSON document per line
    """

    def __init__(self, path):
        self.pull_rs = []
        with open(path) as pull_rs_file:
            for line in pull_rs_file:
                pull_r = json.loads(line)
                self.pull_rs.append(pull_r)

    def total_pull_rs(self):
        """
        Count the total number of pull requests
        """
        return len(self.pull_rs)

    def count(self, since=None, until=None):
        """
        Count pull requests created in the period [since, until).

        :param since: Period start (date string, inclusive); no lower bound if None
        :param until: Period end (date string, exclusive); no upper bound if None
        """
        if not since and not until:
            return self.total_pull_rs()

        # Parse the bounds and drop any timezone offset so they can be
        # compared with the naive creation dates below
        if since:
            since = parser.parse(since).replace(tzinfo=None)
        if until:
            until = parser.parse(until).replace(tzinfo=None)

        count = 0
        for pull_r in self.pull_rs:
            created_at = parser.parse(pull_r['data']['created_at'])
            # Remove the timezone offset so the date is comparable with the bounds
            created_at = created_at.replace(tzinfo=None)
            if since and created_at < since:
                continue
            if until and created_at >= until:
                continue
            count += 1

        return count
```
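The `count` method treats the period as half-open: `since` is inclusive and `until` is exclusive. As a result, adjacent windows never count the same pull request twice. The following self-contained sketch checks that property on three hypothetical pull requests, one of which is created exactly at the boundary.

`count_created` and `parse_github_date` are illustrative helpers, not part of the original class. They use the standard library's `strptime` instead of `dateutil`. They assume GitHub's `YYYY-MM-DDTHH:MM:SSZ` (UTC) timestamp format.

```python
from datetime import datetime


def parse_github_date(value):
    """Parse a GitHub timestamp such as '2019-01-15T10:30:00Z' (UTC)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def count_created(items, since=None, until=None):
    """Count items whose data.created_at falls in [since, until).

    Hypothetical helper mirroring the logic of Code_Pull_Requests.count;
    since and until are 'YYYY-MM-DD' strings, and either may be None.
    """
    lower = datetime.strptime(since, "%Y-%m-%d") if since else None
    upper = datetime.strptime(until, "%Y-%m-%d") if until else None
    total = 0
    for item in items:
        created = parse_github_date(item["data"]["created_at"])
        if lower is not None and created < lower:
            continue  # created before the window starts
        if upper is not None and created >= upper:
            continue  # created at or after the (exclusive) end
        total += 1
    return total


# Three hypothetical pull requests; the second is created exactly at the boundary.
items = [
    {"data": {"created_at": "2018-12-31T23:59:59Z"}},
    {"data": {"created_at": "2019-01-01T00:00:00Z"}},
    {"data": {"created_at": "2019-02-10T08:00:00Z"}},
]

before = count_created(items, until="2019-01-01")
after = count_created(items, since="2019-01-01")
print(before, after, count_created(items))  # 1 2 3
```

The boundary pull request is counted only in the later window, so the two windows add up to the total.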

## Total Pull Requests to the Kibiter repo

```python
# Create a Code_Pull_Requests object from the Perceval output file
kibiter_pull_rs = Code_Pull_Requests('kibiter-pull-requests.json')
# Total number of pull requests
total_prs = kibiter_pull_rs.total_pull_rs()
print("NUMBER OF PULL REQUESTS FOR KIBITER'S GITHUB REPOSITORY = " + str(total_prs))
```

```
NUMBER OF PULL REQUESTS FOR KIBITER'S GITHUB REPOSITORY = 96
```


## Checking data consistency 

### Verifying that the only category is "pull_request"

```python
# Use a set to keep only unique categories
categories = set()
with open('kibiter-pull-requests.json') as pullrs_file:
    for line in pullrs_file:
        pull_r = json.loads(line)
        categories.add(pull_r['category'])

# Print the unique categories
print(categories)
```

```
{'pull_request'}
```
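A further consistency check is to confirm that no pull request appears twice in the file. The original notebook does not do this; it is sketched here as an assumption. The sketch keys each item on `data.number`, the pull request number in GitHub's payload. The three in-memory lines are hypothetical stand-ins for lines of kibiter-pull-requests.json, and one of them is deliberately duplicated.

```python
import json
from collections import Counter

# Hypothetical stand-ins for lines of kibiter-pull-requests.json;
# pull request 2 is deliberately duplicated.
lines = [
    '{"category": "pull_request", "data": {"number": 1}}',
    '{"category": "pull_request", "data": {"number": 2}}',
    '{"category": "pull_request", "data": {"number": 2}}',
]

# Count how many times each pull request number occurs.
occurrences = Counter(json.loads(line)["data"]["number"] for line in lines)
duplicates = sorted(number for number, n in occurrences.items() if n > 1)
print(duplicates)  # [2]
```

To run the check on the real data, iterate over the opened file instead of `lines`. An empty list would mean every pull request appears exactly once.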


## Counting and displaying distinct users who made pull requests to the Kibiter repository

```python
# Use a set to keep only unique user names
distinct_contributors = set()

# Open and traverse the pull requests file
with open('kibiter-pull-requests.json') as pullrs_file:
    for line in pullrs_file:
        pull_r = json.loads(line)
        distinct_contributors.add(pull_r['data']['user_data']['name'])

print('NAMES OF DIFFERENT PULL REQUESTERS TO THE KIBITER REPO \n \n')

for author in distinct_contributors:
    print(author)

print("\nTOTAL DISTINCT PULL REQUESTERS TO THE REPO " + str(len(distinct_contributors)))
```

```
NAMES OF DIFFERENT PULL REQUESTERS TO THE KIBITER REPO 
 

Ana Jimenez Santamaria
David Muriel
David Pose Fernández
Manrique Lopez
David Moreno Lumbreras
Luis Cañas-Díaz
Alvaro del Castillo

TOTAL DISTINCT PULL REQUESTERS TO THE REPO 7
```
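The set above shows who contributed, but not how much each person contributed. As a sketch beyond the original notebook, `collections.Counter` can tally pull requests per user. GitHub users are not required to set a display name, so the hypothetical helper `display_name` falls back to the `login` when `name` is empty. Whether that fallback matters for this dataset is an assumption worth checking. The items below are hypothetical.

```python
from collections import Counter


def display_name(pull_r):
    """Return the user's name, or their login if no name is set."""
    user = pull_r["data"]["user_data"]
    return user.get("name") or user["login"]


# Hypothetical pull request items with only the fields used here.
pull_rs = [
    {"data": {"user_data": {"login": "alice", "name": "Alice Example"}}},
    {"data": {"user_data": {"login": "alice", "name": "Alice Example"}}},
    {"data": {"user_data": {"login": "bob", "name": None}}},
]

prs_per_user = Counter(display_name(pull_r) for pull_r in pull_rs)

# most_common() lists users from most to fewest pull requests.
for user, n in prs_per_user.most_common():
    print(user, n)
# Alice Example 2
# bob 1
```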


## Counting merged and non-merged pull requests

```python
# Count the pull requests whose 'merged' flag is set
merged_prs = 0
with open('kibiter-pull-requests.json') as pullrs_file:
    for line in pullrs_file:
        pull_r = json.loads(line)
        if pull_r['data']['merged']:
            merged_prs += 1

print("Merged Pull Requests = " + str(merged_prs))
print("Non Merged Pull Requests = " + str(total_prs - merged_prs))
```

```
Merged Pull Requests = 79
Non Merged Pull Requests = 17
```
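This sketch splits the non-merged count further into pull requests that are still open and those closed without merging. It assumes each item's `data` also carries GitHub's `state` field (`"open"` or `"closed"`), which this notebook does not inspect. `pr_status` is a hypothetical helper, and the payloads are made up.

```python
from collections import Counter


def pr_status(data):
    """Classify a GitHub pull request payload as merged, open or closed unmerged."""
    if data["merged"]:
        return "merged"
    return "open" if data["state"] == "open" else "closed (not merged)"


# Hypothetical pull request payloads with only the fields used here.
payloads = [
    {"merged": True, "state": "closed"},
    {"merged": False, "state": "open"},
    {"merged": False, "state": "closed"},
    {"merged": False, "state": "closed"},
]

status_counts = Counter(pr_status(data) for data in payloads)
print(dict(status_counts))  # {'merged': 1, 'open': 1, 'closed (not merged)': 2}
```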


## Displaying users with non-merged pull requests

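Here "non-merged" covers both pull requests that are still open and those closed without being merged, since GitHub reports `merged` as false in both cases.

```python
# Collect the names of users with at least one non-merged pull request
non_mrgd_users = set()
with open('kibiter-pull-requests.json') as pullrs_file:
    for line in pullrs_file:
        pull_r = json.loads(line)
        if not pull_r['data']['merged']:
            non_mrgd_users.add(pull_r['data']['user_data']['name'])

print("DISTINCT USERS WITH NON-MERGED PULL REQUESTS \n\n")
for user in non_mrgd_users:
    print(user)
```

```
DISTINCT USERS WITH NON-MERGED PULL REQUESTS 


David Pose Fernández
Manrique Lopez
Luis Cañas-Díaz
David Moreno Lumbreras
```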
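The set shows who has non-merged pull requests, but not which ones. This sketch groups the pull request numbers by user with `collections.defaultdict`, using hypothetical items. On the real file, the same loop would run over the parsed lines of kibiter-pull-requests.json.

```python
from collections import defaultdict

# Hypothetical items with only the fields used here.
pull_rs = [
    {"data": {"number": 3, "merged": False, "user_data": {"name": "Alice Example"}}},
    {"data": {"number": 5, "merged": True, "user_data": {"name": "Alice Example"}}},
    {"data": {"number": 8, "merged": False, "user_data": {"name": "Alice Example"}}},
    {"data": {"number": 9, "merged": False, "user_data": {"name": "Bob Example"}}},
]

# Map each user name to the numbers of their non-merged pull requests.
non_merged_by_user = defaultdict(list)
for pull_r in pull_rs:
    data = pull_r["data"]
    if not data["merged"]:
        non_merged_by_user[data["user_data"]["name"]].append(data["number"])

for user, numbers in sorted(non_merged_by_user.items()):
    print(user, numbers)
# Alice Example [3, 8]
# Bob Example [9]
```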
