### Retrieving Data from OmegaUp

From the command line, run Perceval on the GitHub repository to analyze. This produces a file (data_source_issues.json) with one JSON document per line for each of its issues. The list returned also contains the pull requests.


Syntax for using Perceval with GitHub:
`perceval github owner repository [--sleep-for-rate] [-t XXXXX]`


Date of Retrieval: 1st March 2019

##### Example:
`$ perceval github --json-line --category issue omegaup omegaup --sleep-for-rate -t a247a6b7d506736da6d653cddc060a96bfbd9cb3 > data_source_issues.json`
 
`--sleep-for-rate`: keeps Perceval from exiting when the rate limit is exceeded; it waits until the limit resets instead.

`-t`: the token for the GitHub API.
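
Since the retrieved file mixes issues and pull requests, it can help to see how the two are told apart. The following is a minimal sketch, assuming that each line is a JSON document with a `data` field and that pull requests carry a `pull_request` key inside it (the same check the class further down relies on). The helper name `split_items` and the sample lines are hypothetical.

```python
import json


def split_items(lines):
    """Split Perceval JSON lines into issues and pull requests."""
    issues, pull_requests = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        item = json.loads(line)
        # Assumption: only pull requests have a "pull_request" key in "data"
        if "pull_request" in item["data"]:
            pull_requests.append(item)
        else:
            issues.append(item)
    return issues, pull_requests


# Hypothetical sample standing in for the lines of data_source_issues.json
sample_lines = [
    '{"data": {"number": 1, "state": "open"}}',
    '{"data": {"number": 2, "state": "closed", "pull_request": {}}}',
]
issues_only, pulls = split_items(sample_lines)
print(len(issues_only), len(pulls))
```

With a real file, an open file object can be passed in place of `sample_lines`, since iterating over it yields one line at a time.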

In [4]:
import json
import datetime
from dateutil import parser

### Class Code_Issues

Takes the path to the JSON file as its input parameter.

In [5]:
class Code_Issues:
    """Class for the issues of GitHub repositories.
    
    Objects are instantiated by specifying a file with the
    issues obtained by Perceval from a repository.
        
    :param path: Path to file with one Perceval JSON document per line
    """
    
    def __init__(self, path):
        
        self.issues = []
        with open(path) as issues_file:
            for line in issues_file:
                issue = json.loads(line)
                if "pull_request" not in issue['data']:
                    self.issues.append(issue)
    
    def total_issues(self):
        """
        Count Total Number of Issues
        """
        return len(self.issues)
    
    def count(self, since=None, until=None):
        """
        Count the issues created in the period [since, until).

        :param since: Period start (date string), None for no lower bound
        :param until: Period end (date string, exclusive), None for no upper bound
        """
        date = "created_at"  # creation timestamp of the issue in the GitHub API data
        # Convert the date strings into datetime objects, which are easy to compare
        since = parser.parse(since) if since else None
        until = parser.parse(until) if until else None

        count = 0
        for issue in self.issues:
            created = parser.parse(issue['data'][date])
            # Remove the tz offset so the value compares with naive datetimes
            created = created.replace(tzinfo=None)
            if since and created < since:
                continue
            if until and created >= until:
                continue
            count += 1

        return count
        
        

#### Functions Available
- total_issues(): returns the total number of issues to date
- count(): returns the number of issues created in a period of time
    ###### Parameters
    - since
    - until
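
Counting over a period extends naturally to counting per calendar month. The sketch below is an illustration under stated assumptions, not part of the class: it assumes the creation date of an issue is stored in `data['created_at']` in the `YYYY-MM-DDTHH:MM:SSZ` form used by the GitHub API, and the function name `issues_per_month` and the sample data are hypothetical.

```python
from collections import Counter
from datetime import datetime


def issues_per_month(issues, date_field="created_at"):
    """Return a dict mapping 'YYYY-MM' to the number of issues created then."""
    counts = Counter()
    for issue in issues:
        # Assumption: timestamps look like 2019-01-15T10:00:00Z
        created = datetime.strptime(issue["data"][date_field],
                                    "%Y-%m-%dT%H:%M:%SZ")
        counts[created.strftime("%Y-%m")] += 1
    # Sort by month so the result reads chronologically
    return dict(sorted(counts.items()))


# Hypothetical sample with the same shape as the Perceval documents
sample_issues = [
    {"data": {"created_at": "2019-02-01T12:00:00Z"}},
    {"data": {"created_at": "2019-01-15T10:00:00Z"}},
    {"data": {"created_at": "2019-01-20T08:30:00Z"}},
]
monthly = issues_per_month(sample_issues)
print(monthly)
```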
            

## Example of the implementation

In [7]:
issues = Code_Issues('data_source_issues.json')
print("Total number of issues:", issues.total_issues())

Total number of issues: 1471


## Number of issues Open and Closed:

In [11]:
open_issues = 0
for issue in issues.issues:
    if issue['data']['state'] != "closed":
        open_issues += 1
print("Total number of open issues:", open_issues)
print("Total number of closed issues:", issues.total_issues() - open_issues)

Total number of open issues: 246
Total number of closed issues: 1225
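
The same tally can be written more compactly with `collections.Counter`. This is a minimal sketch, assuming each issue carries its state in `data['state']` as in the loop above; the function name `count_by_state` and the sample list are hypothetical.

```python
from collections import Counter


def count_by_state(issues):
    """Count issues grouped by the value of their 'state' field."""
    return Counter(issue["data"]["state"] for issue in issues)


# Hypothetical sample with the same shape as the Perceval documents
sample_issues = [
    {"data": {"state": "open"}},
    {"data": {"state": "closed"}},
    {"data": {"state": "open"}},
]
state_counts = count_by_state(sample_issues)
print("Open:", state_counts["open"])
print("Closed:", state_counts["closed"])
```

A `Counter` returns 0 for a state that never occurs, so no special case is needed for a repository without closed issues.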


## Number of Labels
An issue may or may not have labels, and several issues can share the same label. To count the distinct labels, add every label name to a set, which discards duplicate entries.

In [21]:
labels = set()
for issue in issues.issues:
    for label in issue['data']['labels']:
        labels.add(label['name'])
                
print("Number of Labels:", len(labels))

Number of Labels: 44
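
Beyond the number of distinct labels, one may want to know how often each label is used and how many issues have no label at all. The following sketch assumes `data['labels']` is a list of objects with a `name` field, as in the cell above; the function name `label_usage` and the sample labels are hypothetical.

```python
from collections import Counter


def label_usage(issues):
    """Return (Counter of label names, number of issues without labels)."""
    usage = Counter()
    unlabeled = 0
    for issue in issues:
        # Treat a missing or empty list as "no labels"
        issue_labels = issue["data"].get("labels") or []
        if not issue_labels:
            unlabeled += 1
        for label in issue_labels:
            usage[label["name"]] += 1
    return usage, unlabeled


# Hypothetical sample with the same shape as the Perceval documents
sample_issues = [
    {"data": {"labels": [{"name": "bug"}, {"name": "frontend"}]}},
    {"data": {"labels": [{"name": "bug"}]}},
    {"data": {"labels": []}},
]
usage, unlabeled = label_usage(sample_issues)
print("Distinct labels:", len(usage))
print("Most used:", usage.most_common(1))
print("Issues without labels:", unlabeled)
```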


## Number of Milestones
This is similar to counting the labels: create a set and add the title of each issue's milestone to it.

In [31]:
milestone = set()
for issue in issues.issues:
    if issue['data']['milestone']:
        milestone.add(issue['data']['milestone']['title'])


In [32]:
len(milestone)

42
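
The milestone titles can also serve as keys for a per-milestone summary. The sketch below counts open and closed issues for each milestone. It assumes the `data['milestone']` and `data['state']` fields used in the cells above; the function name `milestone_progress` and the sample titles are hypothetical.

```python
from collections import defaultdict


def milestone_progress(issues):
    """Map each milestone title to its number of open and closed issues."""
    progress = defaultdict(lambda: {"open": 0, "closed": 0})
    for issue in issues:
        milestone = issue["data"].get("milestone")
        if not milestone:
            continue  # issue is not assigned to any milestone
        # Any state other than "closed" is counted as open
        state = "closed" if issue["data"]["state"] == "closed" else "open"
        progress[milestone["title"]][state] += 1
    return dict(progress)


# Hypothetical sample with the same shape as the Perceval documents
sample_issues = [
    {"data": {"state": "closed", "milestone": {"title": "v1.0"}}},
    {"data": {"state": "open", "milestone": {"title": "v1.0"}}},
    {"data": {"state": "open", "milestone": {"title": "v2.0"}}},
    {"data": {"state": "open", "milestone": None}},
]
progress = milestone_progress(sample_issues)
for title, counts in sorted(progress.items()):
    print(title, counts)
```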