### Retrieving Data from OmegaUp

From the command line, run Perceval on the git repositories to analyze. This produces a file (git-commits.json) containing one JSON document per commit, one per line.

##### Example:
`$ perceval git --json-line https://github.com/omegaup/omegaup > git-commits.json`
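As a sketch of the format, each line of git-commits.json is a standalone JSON document that `json.loads` can parse on its own. The two records below are hypothetical and heavily trimmed: they keep only the `data` fields this notebook reads (`commit`, `parents`, `AuthorDate`, `refs`, `files`), with made-up hashes and dates. Real Perceval documents carry many more fields.

```python
import io
import json

# Two hypothetical, heavily trimmed Perceval-style records (illustrative only).
SAMPLE = "\n".join([
    json.dumps({"data": {"commit": "a1", "parents": [],
                         "AuthorDate": "Mon Jan 1 10:00:00 2018 +0100",
                         "refs": [],
                         "files": [{"file": "README.md", "action": "A"}]}}),
    json.dumps({"data": {"commit": "b2", "parents": ["a1"],
                         "AuthorDate": "Tue Jan 2 11:30:00 2018 -0600",
                         "refs": ["HEAD -> refs/heads/master"],
                         "files": [{"file": "README.md", "action": "M"}]}}),
])

def read_json_lines(stream):
    """Parse one JSON document per non-blank line of a text stream."""
    return [json.loads(line) for line in stream if line.strip()]

# io.StringIO stands in for open('git-commits.json') here.
records = read_json_lines(io.StringIO(SAMPLE))
for record in records:
    data = record["data"]
    print(data["commit"], data["AuthorDate"], len(data["files"]))
```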

In [39]:
import json
import datetime
from dateutil import parser

### Class Code_Changes

Takes the path to the JSON file produced by Perceval as its input parameter and loads every commit into memory.

In [71]:
class Code_Changes:
    """Class for Code_Changes for Git repositories.
    
    Objects are instantiated by specifying a file with the
    commits obtained by Perceval from a set of repositories.
        
    :param path: Path to file with one Perceval JSON document per line
    """
    
    def __init__(self, path):
        
        self.commits = []
        with open(path) as commits_file:
            for line in commits_file:
                commit = json.loads(line)
                self.commits.append(commit)
    
    def total_count(self):
        """
        Count Total Number of Commits
        """
        return len(self.commits)
    
    def count(self, since=None, until=None):
        """
        Count commits whose author date falls in the period [since, until).

        :param since: Period start (inclusive), a date string such as "2018-01-01"
        :param until: Period end (exclusive), a date string such as "2018-07-01"
        """
        # With no bounds, every commit is in the period
        if not since and not until:
            return self.total_count()

        # Convert date strings into datetime objects for comparison
        since = parser.parse(since) if since else None
        until = parser.parse(until) if until else None

        count = 0
        for commit in self.commits:
            author_date = parser.parse(commit['data']['AuthorDate'])
            # Remove the tz offset so the author date can be compared
            # with the naive since/until bounds
            author_date = author_date.replace(tzinfo=None)
            if since and author_date < since:
                continue  # before the start of the period
            if until and author_date >= until:
                continue  # at or after the (exclusive) end of the period
            count += 1

        return count
        
        

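#### Functions Available
- total_count(): returns the total number of commits loaded from the file
- count(since=None, until=None): returns the number of commits whose author date falls in the period; called with no arguments, it returns the same value as total_count()
    - since: start of the period (inclusive); if omitted, there is no lower bound
    - until: end of the period (exclusive); if omitted, there is no upper bound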
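The period semantics of count() can be checked on a few dates. This sketch reimplements the same half-open filter in a hypothetical `count_in_period` helper, swapping dateutil for the standard library's `datetime.strptime` and assuming `AuthorDate` uses git's default date format (for example `Mon Jan 1 00:00:00 2018 -0500`). The dates are made up.

```python
from datetime import datetime

# Assumed format of Perceval's AuthorDate field (git's default date format).
GIT_DATE_FORMAT = "%a %b %d %H:%M:%S %Y %z"

def count_in_period(author_dates, since=None, until=None):
    """Count dates in the half-open period [since, until); both bounds optional."""
    since = datetime.fromisoformat(since) if since else None
    until = datetime.fromisoformat(until) if until else None
    count = 0
    for raw in author_dates:
        # Drop the offset, so each date is compared in the author's local time
        date = datetime.strptime(raw, GIT_DATE_FORMAT).replace(tzinfo=None)
        if since and date < since:
            continue  # before the start of the period
        if until and date >= until:
            continue  # at or after the (exclusive) end of the period
        count += 1
    return count

# Hypothetical author dates (not real omegaup data).
dates = [
    "Sun Dec 31 23:59:59 2017 +0100",
    "Mon Jan 1 00:00:00 2018 -0500",
    "Sun Jul 1 00:00:00 2018 +0200",
]
print(count_in_period(dates))                                          # 3
print(count_in_period(dates, since="2018-01-01"))                      # 2
print(count_in_period(dates, since="2018-01-01", until="2018-07-01"))  # 1
```

Dropping the offset means two commits made at the same instant in different timezones may land on different sides of a bound; converting with `astimezone(timezone.utc)` before stripping the offset is an alternative if a single reference clock is preferred.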
            

## Example of the implementation

In [72]:
changes = Code_Changes('git-commits.json')
print("Code changes total count:", changes.total_count())
print("Code changes count all period:", changes.count())
print("Code changes count from 2018-01-01 to 2018-07-01:",
      changes.count(since="2018-01-01", until="2018-07-01"))

Code changes total count: 4206
Code changes count all period: 4206
Code changes count from 2018-01-01 to 2018-07-01: 251


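## Creating a Dictionary of Commits

Indexing the commits by their hash (`data['commit']`) makes it easy to look up any commit, for example to follow its parents.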

In [88]:
commits = {}
with open('git-commits.json') as commits_file:
    for line in commits_file:
        commit = json.loads(line)
        commits[commit['data']['commit']] = commit
print("Total number of commits:", len(commits))

Total number of commits: 4206
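Keying by hash has a side effect worth knowing: if the same commit appears more than once in the input (for instance, if a repository were fetched twice into the same file), only one entry survives. The hashes below are hypothetical; for omegaup the dictionary size matches total_count(), so no duplicates were present.

```python
import json

# Hypothetical input in which commit "a1" appears twice.
lines = [
    json.dumps({"data": {"commit": "a1"}}),
    json.dumps({"data": {"commit": "b2"}}),
    json.dumps({"data": {"commit": "a1"}}),
]

# A later record with the same hash overwrites the earlier one.
commits = {}
for line in lines:
    commit = json.loads(line)
    commits[commit["data"]["commit"]] = commit

print("Records read:", len(lines))        # 3
print("Distinct commits:", len(commits))  # 2
```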


## Non-Empty Commits

Empty commits are those that touch no files (for example, most merge commits). We can identify them by looking at the list of files involved in each commit and checking whether any file has an 'action' field, which records what was done to the file (such as modification or creation). A commit is non-empty if at least one of its files has an action:

In [87]:
non_empty_commits = 0
for commit in commits.values():
    for file in commit['data']['files']:
        if 'action' in file:
            non_empty_commits += 1
            break
                
print("Code Commits (non-empty):", non_empty_commits)

Code Commits (non-empty): 3932
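The same check can be written with `any()`, which, like the `break` in the loop, stops at the first file that has an `action`. `is_non_empty` is a hypothetical helper name, and the three sample commits are made up: one modifies a file, one lists a file without an action, and one lists no files at all.

```python
def is_non_empty(commit):
    """True if at least one file in the commit has an 'action' field."""
    return any('action' in f for f in commit['data']['files'])

# Hypothetical commits keyed by made-up hashes.
sample = {
    "a1": {"data": {"files": [{"file": "x.py", "action": "M"}]}},
    "m1": {"data": {"files": [{"file": "x.py"}]}},  # file listed, no action
    "e1": {"data": {"files": []}},                  # no files at all
}
non_empty = sum(1 for c in sample.values() if is_non_empty(c))
print("Code Commits (non-empty):", non_empty)  # 1
```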


## Only Non-Merge Commits
Now, instead of filtering out empty commits, let's filter out merge commits. These involve no real coding; they just merge commits from different branches (for example, after a pull request). Perceval adds a `Merge` field to the `data` of merge commits, so the commits without that field are the non-merge ones:

In [82]:
non_merge_commits = 0
for commit in commits.values():
    if 'Merge' not in commit['data']:
        non_merge_commits += 1
                
print("Code Commits (non-merge):", non_merge_commits)

Code Commits (non-merge): 3856
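A merge commit is, by git's definition, a commit with more than one parent, so an alternative check, assuming the `parents` list holds every parent, is `len(parents) > 1`. This sketch compares both checks on hypothetical commits; the `Merge` value shown is illustrative.

```python
def is_merge(commit):
    """A merge commit has more than one parent."""
    return len(commit['data']['parents']) > 1

# Hypothetical commits: a root, a regular commit, and a merge.
sample = {
    "a1": {"data": {"parents": []}},
    "b2": {"data": {"parents": ["a1"]}},
    "m1": {"data": {"parents": ["b2", "c3"], "Merge": "b2 c3"}},
}
non_merge = sum(1 for c in sample.values() if not is_merge(c))
by_field = sum(1 for c in sample.values() if 'Merge' not in c['data'])
print(non_merge, by_field)  # 2 2
```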


## Only Commits in Master

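The `refs` field in the `data` of each commit lists the references (branches, tags, HEAD) that point to it. Starting from the commit marked `HEAD -> refs/heads/master`, we walk back through the `parents` of each commit, collecting every commit reachable from the tip of master: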

In [93]:
todo = set()
for id, commit in commits.items():
    if 'HEAD -> refs/heads/master' in commit['data']['refs']:
        todo.add(id)

master = set()

while len(todo) > 0:
    current = todo.pop()
    master.add(current)
    for parent in commits[current]['data']['parents']:
        if parent not in master:
            todo.add(parent)
    
code_commits = len(master)
    
print("Code Commits (master branch):", code_commits)

Code Commits (master branch): 4201
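The traversal above can be wrapped in a hypothetical `reachable_from` helper and tried on a tiny made-up history. Commits merged into master (here `f1`, through the merge `m1`) are counted, while a commit that only exists on another branch (`x9`) is not; presumably this is why the master count (4201) is slightly below the total (4206).

```python
def reachable_from(commits, tips):
    """Return the set of commit hashes reachable from `tips` via parent links."""
    seen = set()
    todo = set(tips)
    while todo:
        current = todo.pop()
        if current in seen:
            continue
        seen.add(current)
        todo.update(p for p in commits[current]['data']['parents'] if p not in seen)
    return seen

# Hypothetical history: master is a1 <- b2 <- m1 (a merge of b2 and f1);
# x9 sits on an unmerged branch and must not be counted.
commits = {
    "a1": {"data": {"parents": [], "refs": []}},
    "b2": {"data": {"parents": ["a1"], "refs": []}},
    "f1": {"data": {"parents": ["a1"], "refs": []}},
    "m1": {"data": {"parents": ["b2", "f1"], "refs": ["HEAD -> refs/heads/master"]}},
    "x9": {"data": {"parents": ["b2"], "refs": ["refs/heads/feature"]}},
}
tips = [cid for cid, c in commits.items()
        if 'HEAD -> refs/heads/master' in c['data']['refs']]
master = reachable_from(commits, tips)
print(sorted(master))  # ['a1', 'b2', 'f1', 'm1']
```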


## Only Non-Empty Commits in the Master Branch

This finds the number of non-empty commits in the master branch. It reuses the `master` set computed in the previous cell, which contains every commit reachable from master:

In [94]:
code_commits = 0
for commit_id in master:
    commit = commits[commit_id]
    for file in commit['data']['files']:
        if 'action' in file:
            code_commits += 1
            break

print("Code Commits (non-empty in master branch):", code_commits)

Code Commits (non-empty in master branch): 3932
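Since `master` is just a set of hashes, any of the earlier filters can be applied to it. A minimal sketch with a hypothetical `count_where` helper and made-up commits, counting the non-empty and the non-merge commits within a given set:

```python
def is_non_empty(commit):
    """True if at least one file in the commit has an 'action' field."""
    return any('action' in f for f in commit['data']['files'])

def count_where(commit_ids, commits, predicate):
    """Count the commits in `commit_ids` for which `predicate` holds."""
    return sum(1 for cid in commit_ids if predicate(commits[cid]))

# Hypothetical commits; m1 is an empty merge of b2 and f1.
commits = {
    "a1": {"data": {"files": [{"file": "x.py", "action": "A"}], "parents": []}},
    "b2": {"data": {"files": [{"file": "x.py", "action": "M"}], "parents": ["a1"]}},
    "f1": {"data": {"files": [{"file": "y.py", "action": "A"}], "parents": ["a1"]}},
    "m1": {"data": {"files": [], "parents": ["b2", "f1"]}},
}
master = {"a1", "b2", "f1", "m1"}

non_empty_master = count_where(master, commits, is_non_empty)
non_merge_master = count_where(master, commits,
                               lambda c: len(c['data']['parents']) <= 1)
print(non_empty_master, non_merge_master)  # 3 3
```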
