# Microtask 4

## Aim

Produce a listing of repositories, as a table and as a CSV file, with the number of commits authored, issues opened, and pull/merge requests opened during the last three months, ordered by the total number (commits plus issues plus pull requests). Use plain Python 3 (e.g., no Pandas) for this.

### Retrieving Data from Different Repositories

From the command line, run Perceval on the GitHub repositories to analyze. This produces a file (`data_source.json`) with one JSON document per line for all their issues (the list obtained from the issues API also contains the pull requests), pull requests and commits.


Syntax for using Perceval with GitHub:

`perceval github owner repository [--sleep-for-rate] [-t XXXXX]`


Date of retrieval: 25th March 2019
##### Example:

`perceval github --json-line --from-date "2018-12-01" omegaup omegaup --category issue --sleep-for-rate -t XXXXX >> data_source.json`

`perceval github --json-line --from-date "2018-12-01" omegaup omegaup --category pull_request --sleep-for-rate -t XXXXX >> data_source.json`

`perceval git --json-line https://github.com/omegaup/omegaup >> data_source.json`

`perceval github --json-line --from-date "2018-12-01" Submitty Submitty --category issue --sleep-for-rate -t XXXXX >> data_source.json`

`perceval github --json-line --from-date "2018-12-01" Submitty Submitty --category pull_request --sleep-for-rate -t XXXXX >> data_source.json`

`perceval git --json-line https://github.com/Submitty/Submitty >> data_source.json`

`perceval github --json-line --from-date "2018-12-01" streetmix streetmix --category issue --sleep-for-rate -t XXXXX >> data_source.json`

`perceval github --json-line --from-date "2018-12-01" streetmix streetmix --category pull_request --sleep-for-rate -t XXXXX >> data_source.json`

`perceval git --json-line https://github.com/streetmix/streetmix >> data_source.json`

`perceval github --json-line --from-date "2018-12-01" fossasia susi_server --category issue --sleep-for-rate -t XXXXX >> data_source.json`

`perceval github --json-line --from-date "2018-12-01" fossasia susi_server --category pull_request --sleep-for-rate -t XXXXX >> data_source.json`

`perceval git --json-line https://github.com/fossasia/susi_server >> data_source.json`

----------------------------------------------------------------------------------------------------
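- `--from-date`: fetch only items updated since this date
- `--category`: kind of GitHub items to fetch (here `issue` or `pull_request`)
- `--sleep-for-rate`: wait for the GitHub API rate limit to reset instead of exiting when it is exceeded
- `-t`: GitHub API token
- `--json-line`: print each item as a single-line JSON document, which is the format the code below reads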
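Typing three commands per repository by hand is error-prone. The following is a minimal sketch, assuming Perceval is installed and on the `PATH`, that builds the same invocations for a list of repositories and appends their output to one file, like `>>` does. The names `REPOSITORIES`, `perceval_commands` and `fetch_all` are hypothetical, and `"XXXXX"` stands for a real token.

```python
import subprocess

# Hypothetical list of (owner, repository) pairs, taken from the commands above
REPOSITORIES = [
    ("omegaup", "omegaup"),
    ("Submitty", "Submitty"),
    ("streetmix", "streetmix"),
    ("fossasia", "susi_server"),
]


def perceval_commands(owner, repo, from_date, token):
    """Return the Perceval argument lists for issues, pull requests and commits."""
    github = ["perceval", "github", "--json-line", "--from-date", from_date,
              owner, repo, "--sleep-for-rate", "-t", token]
    return [
        github + ["--category", "issue"],
        github + ["--category", "pull_request"],
        ["perceval", "git", "--json-line",
         "https://github.com/{}/{}".format(owner, repo)],
    ]


def fetch_all(repositories, from_date, token, output="data_source.json"):
    """Run every command, appending its standard output to `output`."""
    with open(output, "a") as out:
        for owner, repo in repositories:
            for command in perceval_commands(owner, repo, from_date, token):
                # Argument lists avoid shell quoting; check=True stops on failure
                subprocess.run(command, stdout=out, check=True)
```

For example, `fetch_all(REPOSITORIES, "2018-12-01", "XXXXX")`. Passing argument lists rather than shell strings keeps the token and dates from being reinterpreted by the shell.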

```python
import datetime
import json

# Third-party package, used only for month arithmetic (no Pandas is needed)
import dateutil.relativedelta
```


## Summarize Function

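#### Arguments

<b>line</b>: Perceval item (one parsed JSON document) to be summarized<br>
<b>type</b>: type of item (`commit`, `issue` or `pull_request`)

#### Returns

A `summary` dictionary containing:
- `repo`
- `hash` (for commits) or `uuid` (for issues and pull requests)
- `author`
- `author_date` (commits only)
- `...` (further, type-specific fields)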

```python
GIT_DATE_FORMAT = "%a %b %d %H:%M:%S %Y %z"   # e.g. "Tue Jan 15 10:20:30 2019 +0100"
GITHUB_DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"     # e.g. "2019-01-15T09:20:30Z"


def parse_github_date(value):
    """Parse a GitHub API timestamp, returning None for missing values."""
    return datetime.datetime.strptime(value, GITHUB_DATE_FORMAT) if value else None


def summarize(line, type):
    """Return a flat dictionary with the fields of interest of a Perceval item."""
    repo = line['origin']
    cdata = line['data']
    if type == 'commit':
        summary = {
            'repo': repo,
            'hash': cdata['commit'],
            'author': cdata['Author'],
            'author_date': datetime.datetime.strptime(cdata['AuthorDate'], GIT_DATE_FORMAT),
            'commit': cdata['Commit'],
            'created_date': datetime.datetime.strptime(cdata['CommitDate'], GIT_DATE_FORMAT),
            'files_no': len(cdata['files']),
            # Number of files with an explicit action (added, modified, deleted...)
            'files_action': sum(1 for file in cdata['files'] if 'action' in file),
            # Git only reports a 'Merge' field for merge commits
            'merge': 'Merge' in cdata,
        }
    elif type == 'issue':
        summary = {
            'repo': repo,
            'uuid': line['uuid'],
            'author': cdata['user']['login'],
            'created_date': parse_github_date(cdata['created_at']),
            'closed_date': parse_github_date(cdata['closed_at']),
            'comments': cdata['comments'],
            'labels': cdata['labels'],
            'url': cdata['html_url'],
            'state': cdata['state'],
        }
    elif type == 'pull_request':
        summary = {
            'repo': repo,
            'uuid': line['uuid'],
            'author': cdata['user']['login'],
            'created_date': parse_github_date(cdata['created_at']),
            'closed_date': parse_github_date(cdata['closed_at']),
            'merged_date': parse_github_date(cdata['merged_at']),
            'comments': cdata['comments'],
            'commits': cdata['commits'],
            'additions': cdata['additions'],
            'deletions': cdata['deletions'],
            'changed_files': cdata['changed_files'],
            'url': cdata['html_url'],
            'state': cdata['state'],
        }
    else:
        raise ValueError("Unknown item type: {}".format(type))
    return summary
```
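`summarize` parses Git dates with `%z`, so they are timezone-aware, while GitHub timestamps end in a literal `Z` and come out naive (GitHub reports them in UTC). Python refuses to order aware and naive datetimes, which is why the analysis compares `.date()` values; note that for a commit this is the calendar day in the author's own time zone. A small sketch, assuming the C/English locale for `%a` and `%b`, of making both kinds comparable when full timestamps matter; `to_utc` is a hypothetical helper:

```python
import datetime

GIT_DATE_FORMAT = "%a %b %d %H:%M:%S %Y %z"
GITHUB_DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def to_utc(moment):
    """Return an aware UTC datetime; naive values are assumed to already be UTC."""
    if moment.tzinfo is None:
        return moment.replace(tzinfo=datetime.timezone.utc)
    return moment.astimezone(datetime.timezone.utc)


commit_date = datetime.datetime.strptime("Tue Jan 15 10:20:30 2019 +0100", GIT_DATE_FORMAT)
issue_date = datetime.datetime.strptime("2019-01-15T09:20:30Z", GITHUB_DATE_FORMAT)

try:
    commit_date < issue_date  # aware vs naive: raises TypeError
except TypeError:
    print("aware and naive datetimes cannot be ordered")

print(to_utc(commit_date) == to_utc(issue_date))  # True: the same instant
```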

## Class Code_Changes

Takes the path to the JSON file as its input parameter.

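```python
class Code_Changes:
    """Code changes (commits, issues and pull requests) of a set of repositories.

    Objects are instantiated by specifying a file with the
    items obtained by Perceval from a set of repositories.

    Contains individual lists for issues, pull requests and commits.

    :param path: Path to file with one Perceval JSON document per line
    """

    def __init__(self, path):
        self.changes = {'issue': [], 'commit': [], 'pull_request': []}
        with open(path) as data_file:
            for data in data_file:
                if not data.strip():
                    continue  # skip blank lines
                line = json.loads(data)
                category = line['category']
                if category == 'commit':
                    self.changes['commit'].append(summarize(line, 'commit'))
                elif category == 'pull_request':
                    self.changes['pull_request'].append(summarize(line, 'pull_request'))
                elif category == 'issue' and 'pull_request' not in line['data']:
                    # The GitHub issues API also returns pull requests; they are
                    # skipped here because they are fetched with their own category
                    self.changes['issue'].append(summarize(line, 'issue'))

    def total_count(self):
        """Return the total number of issues till date."""
        return len(self.changes['issue'])

    def count(self, since, until):
        """Return the number of issues created in [since, until) (datetime.date objects)."""
        return sum(1 for issue in self.changes['issue']
                   if since <= issue['created_date'].date() < until)
```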

#### Functions Available
- `total_count()`: returns the total number of issues till date
- `count(since, until)`: returns the number of issues created in a period of time
    - `since`, `until`: `datetime.date` objects delimiting the period
            

## Example of the implementation

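```python
code = Code_Changes('data_source.json')
```

```
JSONDecodeError: Expecting property name enclosed in double quotes: line 2 column 1 (char 2)
```

This error means that `data_source.json` does not hold one JSON document per line: `json.loads` received only the opening `{` of a pretty-printed, multi-line document. Perceval prints indented JSON unless `--json-line` is given, so the `perceval github` commands have to be run with `--json-line`, as the `perceval git` commands already are.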
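If the file cannot be regenerated, concatenated pretty-printed documents can still be read with `json.JSONDecoder.raw_decode`, which parses one value starting at a given index and returns where it ended. This is a sketch (the name `iter_json_documents` is hypothetical) that accepts both layouts, at the cost of reading the whole file into memory:

```python
import json

_DECODER = json.JSONDecoder()


def iter_json_documents(text):
    """Yield every JSON value in `text`, whether one per line or pretty-printed."""
    index = 0
    length = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so skip it here
        while index < length and text[index].isspace():
            index += 1
        if index == length:
            return
        # Returns the decoded value and the index just past its end
        document, index = _DECODER.raw_decode(text, index)
        yield document
```

For example, `items = list(iter_json_documents(open('data_source.json').read()))`; each item could then be passed to `summarize` as in `Code_Changes.__init__`.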

## Analysing Data for Last Three Months

The last three complete months, excluding the current one: December 2018, January 2019 and February 2019.

```python
today_date = datetime.datetime.now().date()
# Same day three months ago, then moved back to the first day of that month
last_third_month = today_date - dateutil.relativedelta.relativedelta(months=3)
last_third_month_first_date = last_third_month.replace(day=1)
current_month_first_date = today_date.replace(day=1)
print("Last third month first date:", last_third_month_first_date)
print("Current month first date:", current_month_first_date)
```

```
Last third month first date: 2018-12-01
Current month first date: 2019-03-01
```
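The cell above relies on `dateutil`, a third-party package. Since the aim asks for plain Python 3, here is a sketch of the same computation with the standard library only; `three_month_window` is a hypothetical name, and it returns the first day of the third month back together with the first day of the current month:

```python
import datetime


def three_month_window(today, months=3):
    """Return (first day `months` months back, first day of the current month)."""
    # Count months on a single axis so that subtracting crosses year boundaries
    month_index = today.year * 12 + (today.month - 1) - months
    year, month_zero_based = divmod(month_index, 12)
    since = datetime.date(year, month_zero_based + 1, 1)
    until = today.replace(day=1)
    return since, until


print(three_month_window(datetime.date(2019, 3, 25)))
# (datetime.date(2018, 12, 1), datetime.date(2019, 3, 1))
```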


We analyse the items dated between the two dates above.

```python
since = last_third_month_first_date
until = current_month_first_date

analysis = {'issue': 0, 'commit': 0, 'pull_request': 0}

for change_type, items in code.changes.items():
    for item in items:
        if change_type == 'commit':
            # Commits are counted by the date they were authored
            item_date = item['author_date'].date()
        elif item['state'] == 'open':
            # Issues and pull requests are counted by creation date,
            # but only if they were still open when the data was retrieved
            item_date = item['created_date'].date()
        else:
            continue
        if since <= item_date <= until:
            analysis[change_type] += 1
```
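The cell above only produces totals across all repositories, while the aim asks for one row per repository, and it counts only issues and pull requests that are still open. A minimal sketch of per-repository counting follows, assuming that "opened" means created within the period whatever the current state, and using a half-open window so that the first day of the current month is excluded. It expects the `changes` dictionary built by `Code_Changes`, and assumes the Git and GitHub backends report the same origin URL for a repository (expected for the commands above, since both use `https://github.com/owner/repository`); `count_per_repository` and `DATE_FIELD` are hypothetical names.

```python
import datetime
from collections import defaultdict

# Date field used to place each kind of item in time
DATE_FIELD = {'commit': 'author_date', 'issue': 'created_date', 'pull_request': 'created_date'}


def count_per_repository(changes, since, until):
    """Count commits, issues and pull requests per repository in [since, until)."""
    counts = defaultdict(lambda: {'commit': 0, 'issue': 0, 'pull_request': 0})
    for change_type, items in changes.items():
        field = DATE_FIELD[change_type]
        for item in items:
            # .date() works for both aware (Git) and naive (GitHub) datetimes
            if since <= item[field].date() < until:
                counts[item['repo']][change_type] += 1
    return dict(counts)
```

In this notebook it would be called as `count_per_repository(code.changes, since, until)`.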

```python
print("Issues opened in the last three months and still open:", analysis['issue'])
print("Pull requests opened in the last three months and still open:", analysis['pull_request'])
print("Commits authored in the last three months:", analysis['commit'])
```

```
Issues opened in the last three months and still open: 16
Pull requests opened in the last three months and still open: 2
Commits authored in the last three months: 74
```
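The remaining parts of the aim, the listing ordered by total as a table and as a CSV file, can be covered with the standard `csv` module. A sketch, assuming a dictionary that maps each repository to its `commit`, `issue` and `pull_request` counts; `listing_rows`, `format_table`, `write_listing` and the column names in `HEADER` are hypothetical:

```python
import csv

HEADER = ['repository', 'commits', 'issues', 'pull_requests', 'total']


def listing_rows(counts):
    """Return one row per repository, ordered by total (largest first)."""
    rows = []
    for repo, c in counts.items():
        total = c['commit'] + c['issue'] + c['pull_request']
        rows.append([repo, c['commit'], c['issue'], c['pull_request'], total])
    # Sort by total descending, then by repository name to break ties
    rows.sort(key=lambda row: (-row[4], row[0]))
    return rows


def format_table(rows):
    """Return the header and rows as a plain-text table with left-aligned columns."""
    table = [HEADER] + [[str(value) for value in row] for row in rows]
    widths = [max(len(row[i]) for row in table) for i in range(len(HEADER))]
    return "\n".join(
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in table)


def write_listing(rows, path):
    """Write the header and rows to a CSV file."""
    with open(path, 'w', newline='') as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(HEADER)
        writer.writerows(rows)
```

For example, `rows = listing_rows(counts)`, then `print(format_table(rows))` for the table and `write_listing(rows, 'listing.csv')` for the CSV file.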
