# Microtask 4

## Aim

Produce a listing of repositories, as a table and as a CSV file, with the number of commits authored, issues opened, and pull/merge requests opened during the last three months, ordered by the total number (commits plus issues plus pull requests). Use plain Python 3 (e.g., no Pandas) for this.

### Retrieving Data from Different Repositories

Rather than retrieving data since Perceval's default start date, it is better to retrieve only the last three months of data by passing the `--from-date` argument to Perceval.

From the command line, run Perceval on the GitHub repositories to analyze. This produces a file with one JSON document per line (`data_source.json`) containing their issues (the GitHub issue list also includes pull requests), their pull requests and, through the git backend, their commits.


Syntax for running Perceval on GitHub:
`perceval github owner repository [--from-date DATE] [--category CATEGORY] [--json-line] [--sleep-for-rate] [-t XXXXX]`


Date of retrieval: 25th March 2019
##### Example:

```sh
perceval github --json-line --from-date "2018-12-01" omegaup omegaup --category issue --sleep-for-rate -t XXXXX >> data_source.json
perceval github --json-line --from-date "2018-12-01" omegaup omegaup --category pull_request --sleep-for-rate -t XXXXX >> data_source.json
perceval git --json-line https://github.com/omegaup/omegaup >> data_source.json

perceval github --json-line --from-date "2018-12-01" Submitty Submitty --category issue --sleep-for-rate -t XXXXX >> data_source.json
perceval github --json-line --from-date "2018-12-01" Submitty Submitty --category pull_request --sleep-for-rate -t XXXXX >> data_source.json
perceval git --json-line https://github.com/Submitty/Submitty >> data_source.json

perceval github --json-line --from-date "2018-12-01" streetmix streetmix --category issue --sleep-for-rate -t XXXXX >> data_source.json
perceval github --json-line --from-date "2018-12-01" streetmix streetmix --category pull_request --sleep-for-rate -t XXXXX >> data_source.json
perceval git --json-line https://github.com/streetmix/streetmix >> data_source.json

perceval github --json-line --from-date "2018-12-01" fossasia susi_server --category issue --sleep-for-rate -t XXXXX >> data_source.json
perceval github --json-line --from-date "2018-12-01" fossasia susi_server --category pull_request --sleep-for-rate -t XXXXX >> data_source.json
perceval git --json-line https://github.com/fossasia/susi_server >> data_source.json
```

----------------------------------------------------------------------------------------------------
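- `--from-date`: fetch only items updated since this date
- `--category`: kind of GitHub item to fetch (`issue` or `pull_request`)
- `--json-line`: write each item as a single line of JSON
- `--sleep-for-rate`: wait for the rate limit to reset instead of exiting when it is exceeded
- `-t`: GitHub API token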
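The twelve commands above repeat one pattern per repository. As a convenience, here is a minimal sketch that generates them from a list of `(owner, repository)` pairs. The helper name `perceval_commands` and the `XXXXX` token placeholder are illustrative assumptions, not part of Perceval. The sketch also passes `--json-line` to the GitHub commands so every item lands on its own line, which is what the `Code_Changes` reader below expects.

```python
import shlex

# Repositories analysed in this microtask, as (owner, repository) pairs
REPOS = [
    ("omegaup", "omegaup"),
    ("Submitty", "Submitty"),
    ("streetmix", "streetmix"),
    ("fossasia", "susi_server"),
]


def perceval_commands(repos, from_date, token, output="data_source.json"):
    """Hypothetical helper: build the Perceval command lines for each repo.

    For every repository it emits one GitHub command per category
    (issue, pull_request) and one git command for the commits.
    """
    commands = []
    for owner, repo in repos:
        for category in ("issue", "pull_request"):
            args = ["perceval", "github", "--json-line", "--from-date", from_date,
                    owner, repo, "--category", category,
                    "--sleep-for-rate", "-t", token]
            # shlex.quote only adds quotes when an argument needs them
            commands.append(" ".join(shlex.quote(a) for a in args) + " >> " + output)
        git_args = ["perceval", "git", "--json-line",
                    "https://github.com/{}/{}".format(owner, repo)]
        commands.append(" ".join(shlex.quote(a) for a in git_args) + " >> " + output)
    return commands


for command in perceval_commands(REPOS, "2018-12-01", "XXXXX"):
    print(command)
```

The printed lines can be pasted into a shell; nothing is executed by the sketch itself.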

```python
import json
import datetime
import dateutil.relativedelta  # month arithmetic for the three-month window
```



## Summarize Function

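#### Arguments

- **line**: Perceval item to be summarized (one parsed JSON document)
- **type**: type of the item (`commit`, `issue` or `pull_request`)

#### Returns

A `summary` dictionary with the fields:

- `repo`
- `hash` (for commits) or `uuid` (for issues and pull requests)
- `author`
- `author_date` (for commits)
- `created_date`
- ...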

```python
# Timestamp formats used by Perceval's git backend and by the GitHub API
GIT_DATE_FORMAT = "%a %b %d %H:%M:%S %Y %z"   # e.g. Tue Jan 15 10:20:30 2019 +0100
GITHUB_DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"     # e.g. 2019-01-15T10:20:30Z


def parse_github_date(value):
    """Parse a GitHub API timestamp; return None when the field is empty."""
    return datetime.datetime.strptime(value, GITHUB_DATE_FORMAT) if value else None


def summarize(line, type):
    """Return a flat summary dict for one Perceval item of the given type."""
    repo = line['origin']
    cdata = line['data']
    if type == 'commit':
        summary = {
            'repo': repo,
            'hash': cdata['commit'],
            'author': cdata['Author'],
            'author_date': datetime.datetime.strptime(cdata['AuthorDate'], GIT_DATE_FORMAT),
            'commit': cdata['Commit'],
            'created_date': datetime.datetime.strptime(cdata['CommitDate'], GIT_DATE_FORMAT),
            'files_no': len(cdata['files']),
            # number of touched files that carry an 'action' entry
            'files_action': sum(1 for file in cdata['files'] if 'action' in file),
            # merge commits carry a 'Merge' field
            'merge': 'Merge' in cdata,
        }
    elif type == 'issue':
        summary = {
            'repo': repo,
            'uuid': line['uuid'],
            'author': cdata['user']['login'],
            'created_date': parse_github_date(cdata['created_at']),
            'closed_date': parse_github_date(cdata['closed_at']),
            'comments': cdata['comments'],
            'labels': cdata['labels'],
            'url': cdata['html_url'],
            'state': cdata['state'],
        }
    elif type == 'pull_request':
        summary = {
            'repo': repo,
            'uuid': line['uuid'],
            'author': cdata['user']['login'],
            'created_date': parse_github_date(cdata['created_at']),
            'closed_date': parse_github_date(cdata['closed_at']),
            'merged_date': parse_github_date(cdata['merged_at']),
            'comments': cdata['comments'],
            'commits': cdata['commits'],
            'additions': cdata['additions'],
            'deletions': cdata['deletions'],
            'changed_files': cdata['changed_files'],
            'url': cdata['html_url'],
            'state': cdata['state'],
        }
    else:
        # Without this branch an unknown type would raise UnboundLocalError
        raise ValueError("Unknown item type: {}".format(type))

    return summary
```
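`summarize` parses two timestamp styles. Perceval's git backend reports dates such as `Tue Jan 15 10:20:30 2019 +0100`, which `%z` turns into a timezone-aware datetime in the committer's local offset, while the GitHub API uses `2019-01-15T10:20:30Z`, which `strptime` turns into a naive datetime that is implicitly UTC. Calling `.date()` on the two therefore compares local dates with UTC dates. A minimal sketch, with hypothetical helper names, that normalises both to UTC before taking the date:

```python
import datetime

# Same formats as in summarize(); %a and %b assume an English/C locale
GIT_DATE_FORMAT = "%a %b %d %H:%M:%S %Y %z"
GITHUB_DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def git_date_to_utc(value):
    """Parse a git-style date with a UTC offset and convert it to UTC."""
    parsed = datetime.datetime.strptime(value, GIT_DATE_FORMAT)
    return parsed.astimezone(datetime.timezone.utc)


def github_date_to_utc(value):
    """Parse a GitHub timestamp; the trailing 'Z' means UTC."""
    parsed = datetime.datetime.strptime(value, GITHUB_DATE_FORMAT)
    return parsed.replace(tzinfo=datetime.timezone.utc)


# A commit made late on 31 Dec in UTC-5 already falls on 1 Jan in UTC
commit_time = git_date_to_utc("Mon Dec 31 22:30:00 2018 -0500")
issue_time = github_date_to_utc("2019-01-01T03:00:00Z")
print(commit_time.date(), issue_time.date())  # 2019-01-01 2019-01-01
```

Whether local or UTC dates are the right basis is a choice for the analysis; the difference only matters for items near a month boundary.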

## Class Code_Changes

Takes the path to the Perceval JSON file as its input parameter.

```python
class Code_Changes:
    """Class for Code_Changes for Git and GitHub repositories.

    Objects are instantiated by specifying a file with the items
    (commits, issues and pull requests) obtained by Perceval from a
    set of repositories.

    Keeps an individual list for issues, pull requests and commits.

    :param path: Path to file with one Perceval JSON document per line
    """

    def __init__(self, path):
        self.changes = {'issue': [], 'commit': [], 'pull_request': []}
        with open(path) as data_file:
            for data in data_file:
                if not data.strip():
                    continue  # skip blank lines
                line = json.loads(data)
                category = line['category']
                if category == 'commit':
                    self.changes['commit'].append(summarize(line, 'commit'))
                elif category == 'pull_request':
                    self.changes['pull_request'].append(summarize(line, 'pull_request'))
                elif category == 'issue' and 'pull_request' not in line['data']:
                    # The GitHub issue list also contains pull requests (items
                    # with a 'pull_request' key); they are skipped here because
                    # they are fetched separately with --category pull_request
                    self.changes['issue'].append(summarize(line, 'issue'))
```
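The filtering rule used by the class can be checked in isolation. A minimal, self-contained sketch that applies the same rule to in-memory JSON lines and counts what each category contributes; the sample items are made up and carry only the fields the rule needs, and the `pull_request_as_issue` counter name is an assumption of this sketch:

```python
import json
from collections import Counter


def classify(json_lines):
    """Count Perceval items per category, flagging PRs returned as issues."""
    counts = Counter()
    for raw in json_lines:
        if not raw.strip():
            continue  # tolerate blank lines
        item = json.loads(raw)
        category = item["category"]
        if category == "issue" and "pull_request" in item["data"]:
            # duplicate of a pull request; Code_Changes ignores these
            counts["pull_request_as_issue"] += 1
        else:
            counts[category] += 1
    return counts


sample = [
    '{"category": "commit", "data": {}}',
    '{"category": "issue", "data": {}}',
    '{"category": "issue", "data": {"pull_request": {}}}',
    '{"category": "pull_request", "data": {}}',
    '',
]
print(classify(sample))
```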

## Example of the implementation

```python
# Path to the JSON-lines file produced by Perceval
code = Code_Changes('../../data_source2.json')
```

## Creating a Set of Repositories

Initialize a set, iterate over all items and add each item's repository to the set.

```python
repos = set()
for change_type, items in code.changes.items():
    for item in items:
        repos.add(item['repo'])
```

```python
print("List of Repos :")
print("---------------")
for i in repos:
    print(i)
```

```
List of Repos :
---------------
https://github.com/streetmix/streetmix
https://github.com/fossasia/susi_server
https://github.com/omegaup/omegaup
https://github.com/Submitty/Submitty
```
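The same set can be built in one expression. Sorting it also makes the printed list stable between runs, since the iteration order of a `set` of strings can change from one interpreter session to the next. A sketch on a hypothetical `changes` dictionary shaped like `code.changes`:

```python
# Hypothetical data with the same shape as code.changes
changes = {
    "issue": [{"repo": "https://github.com/omegaup/omegaup"}],
    "commit": [{"repo": "https://github.com/streetmix/streetmix"},
               {"repo": "https://github.com/omegaup/omegaup"}],
    "pull_request": [],
}

# Set comprehension removes duplicates; sorted() fixes the order
repos = sorted({item["repo"] for items in changes.values() for item in items})
for repo in repos:
    print(repo)
```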


## Analysing Data for Last Three Months

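The last three complete months, not including the current month. The notebook was run in April 2019, so the window covers January, February and March 2019 (the data were retrieved on 25th March 2019, so March is incomplete).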

```python
today_date = datetime.datetime.now().date()
last_third_month = today_date - dateutil.relativedelta.relativedelta(months=3)
last_third_month_first_date = last_third_month.replace(day=1)
current_month_first_date = today_date.replace(day=1)
print("Last third month first Date: ", last_third_month_first_date)
print("Current month first Date: ", current_month_first_date)
```

```
Last third month first Date:  2019-01-01
Current month first Date:  2019-04-01
```
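The window computation can be wrapped in a function that takes the reference date explicitly, which makes it easy to check for any month. A sketch (`last_three_months` is a hypothetical name) that avoids the `dateutil` dependency and returns the half-open interval `[since, until)` covering the three complete months before the reference date:

```python
import datetime


def last_three_months(reference):
    """Return (since, until) for the three full months before `reference`.

    `until` is the first day of the reference month (excluded from the
    analysis) and `since` is the first day of the month three months
    earlier, so the window is the half-open interval [since, until).
    """
    until = reference.replace(day=1)
    # Count months from year 0, step back three, then split again;
    # this borrows a year automatically when crossing January
    month_index = until.year * 12 + (until.month - 1) - 3
    since = datetime.date(month_index // 12, month_index % 12 + 1, 1)
    return since, until


since, until = last_three_months(datetime.date(2019, 4, 2))
print(since, until)  # 2019-01-01 2019-04-01
```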


We analyse the data between the two dates above.

We iterate over every item of each category and increment the matching counter in a dictionary of repositories.

`analysis` is a dictionary keyed by repository URL; each value is a dictionary with the `issue`, `commit` and `pull_request` counts:

```
analysis = {
    "repo1": {'issue': x1, 'commit': y1, 'pull_request': z1},
    "repo2": {'issue': x2, 'commit': y2, 'pull_request': z2},
    "repo3": {'issue': x3, 'commit': y3, 'pull_request': z3},
}
```

```python
since = last_third_month_first_date
until = current_month_first_date

# One counter dictionary per repository
analysis = {repo: {'issue': 0, 'commit': 0, 'pull_request': 0} for repo in repos}

for change_type, items in code.changes.items():
    for item in items:
        if change_type == 'commit':
            # commits are dated by their author date
            item_date = item['author_date'].date()
        else:
            # issues and pull requests: only those still open at retrieval
            # time are counted, dated by their creation date
            if item['state'] != 'open':
                continue
            item_date = item['created_date'].date()
        # half-open interval [since, until): the first day of the
        # current month is excluded
        if since <= item_date < until:
            analysis[item['repo']][change_type] += 1
```
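The aim asks for issues and pull requests *opened* in the period, which is a condition on `created_date` alone; the loop above additionally requires `state == 'open'`, so items that were opened and then closed within the window are not counted. If every opened item should count regardless of its current state, one possible variant is sketched below on hypothetical summaries that use the same keys as those produced by `summarize` (`count_activity` is an illustrative name):

```python
import datetime


def count_activity(changes, repos, since, until):
    """Count commits authored and issues/PRs created in [since, until).

    Every issue or pull request created in the window is counted,
    whether it is still open or already closed.
    """
    analysis = {repo: {"issue": 0, "commit": 0, "pull_request": 0} for repo in repos}
    for change_type, items in changes.items():
        date_key = "author_date" if change_type == "commit" else "created_date"
        for item in items:
            if since <= item[date_key].date() < until:
                analysis[item["repo"]][change_type] += 1
    return analysis


d = datetime.datetime
changes = {
    "commit": [{"repo": "r1", "author_date": d(2019, 1, 5)},
               {"repo": "r1", "author_date": d(2019, 4, 1)}],  # outside window
    "issue": [{"repo": "r1", "created_date": d(2019, 2, 1), "state": "closed"}],
    "pull_request": [{"repo": "r2", "created_date": d(2019, 3, 31), "state": "open"}],
}
result = count_activity(changes, ["r1", "r2"], datetime.date(2019, 1, 1), datetime.date(2019, 4, 1))
print(result)
```

With this rule the counts in the tables below would change, so the notebook's outputs reflect the state-based filter, not this variant.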
          


```python
print("Activities in Last three months:")
for repo, counts in analysis.items():
    for change_type, count in counts.items():
        print(repo, "  ", change_type, "  ", count)
```

```
Activities in Last three months:
https://github.com/omegaup/omegaup    issue    45
https://github.com/omegaup/omegaup    commit    143
https://github.com/omegaup/omegaup    pull_request    2
https://github.com/Submitty/Submitty    issue    63
https://github.com/Submitty/Submitty    commit    398
https://github.com/Submitty/Submitty    pull_request    31
https://github.com/fossasia/susi_server    issue    5
https://github.com/fossasia/susi_server    commit    37
https://github.com/fossasia/susi_server    pull_request    0
https://github.com/streetmix/streetmix    issue    11
https://github.com/streetmix/streetmix    commit    329
https://github.com/streetmix/streetmix    pull_request    4
```


## Unsorted Analysis

```python
analysis
```

```
{'https://github.com/Submitty/Submitty': {'commit': 398,
  'issue': 63,
  'pull_request': 31},
 'https://github.com/fossasia/susi_server': {'commit': 37,
  'issue': 5,
  'pull_request': 0},
 'https://github.com/omegaup/omegaup': {'commit': 143,
  'issue': 45,
  'pull_request': 2},
 'https://github.com/streetmix/streetmix': {'commit': 329,
  'issue': 11,
  'pull_request': 4}}
```

### Sorting by the sum of the issue, pull_request and commit counts

We use `sorted()` with a `lambda` key that returns the sum of each repository's issue, pull request and commit counts, so the repositories are listed in ascending order of total activity.

```python
# Ascending order of total activity (commits + issues + pull requests)
sorted_by_sum = sorted(
    analysis.items(),
    key=lambda repo: repo[1]["issue"] + repo[1]["pull_request"] + repo[1]["commit"],
)
```
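`sorted()` orders from the smallest to the largest total. If the most active repository should come first, `reverse=True` flips the order; computing the total once also lets it be shown as its own column. A sketch using the counts printed in the analysis above:

```python
# Counts as printed in the analysis above
analysis = {
    "https://github.com/Submitty/Submitty": {"commit": 398, "issue": 63, "pull_request": 31},
    "https://github.com/fossasia/susi_server": {"commit": 37, "issue": 5, "pull_request": 0},
    "https://github.com/omegaup/omegaup": {"commit": 143, "issue": 45, "pull_request": 2},
    "https://github.com/streetmix/streetmix": {"commit": 329, "issue": 11, "pull_request": 4},
}

# (repo, commits, PRs, issues, total), most active repository first
rows = sorted(
    ((repo, c["commit"], c["pull_request"], c["issue"], sum(c.values()))
     for repo, c in analysis.items()),
    key=lambda row: row[4],
    reverse=True,
)
for row in rows:
    print(row)
```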

## Writing CSV

We use the standard `csv` module to write the activity counts for the three-month period to a CSV file.

#### Fields: Repo, No. of Commits, No. of PRs, No. of Issues

```python
import csv

# Header row of the CSV file
fields = ['Repo', 'No. of Commits', 'No. of PRs', 'No. of Issues']
```

```python
for item in sorted_by_sum:
    print([item[0], item[1]['commit'], item[1]['pull_request'], item[1]['issue']])
```

```
['https://github.com/fossasia/susi_server', 37, 0, 5]
['https://github.com/omegaup/omegaup', 143, 2, 45]
['https://github.com/streetmix/streetmix', 329, 4, 11]
['https://github.com/Submitty/Submitty', 398, 31, 63]
```


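```python
# newline='' is what the csv documentation recommends; without it,
# extra blank lines can appear between rows on Windows
with open('microtask4-analysis.csv', 'w', newline='') as file:
    writer = csv.writer(file, quoting=csv.QUOTE_MINIMAL)
    writer.writerow(fields)
    for repo, counts in sorted_by_sum:
        writer.writerow([repo, counts['commit'], counts['pull_request'], counts['issue']])
```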
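A variant that also writes the total as a column, using `csv.DictWriter` so each value is matched to its header by name rather than by position. The extra `Total` column, the descending order and the helper name `write_activity_csv` are assumptions of this sketch:

```python
import csv
import io

FIELDS = ["Repo", "No. of Commits", "No. of PRs", "No. of Issues", "Total"]


def write_activity_csv(analysis, stream):
    """Write one row per repository, most active first, to a text stream."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    by_total = sorted(analysis.items(), key=lambda kv: sum(kv[1].values()), reverse=True)
    for repo, c in by_total:
        writer.writerow({
            "Repo": repo,
            "No. of Commits": c["commit"],
            "No. of PRs": c["pull_request"],
            "No. of Issues": c["issue"],
            "Total": sum(c.values()),
        })


# Usage with an in-memory buffer; for a real file use
# open('microtask4-analysis.csv', 'w', newline='')
buffer = io.StringIO()
write_activity_csv(
    {"https://github.com/omegaup/omegaup": {"commit": 143, "issue": 45, "pull_request": 2}},
    buffer,
)
print(buffer.getvalue())
```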
      

## Printing the CSV as a Table


```python
from tabulate import tabulate  # third-party package: pip install tabulate

with open('microtask4-analysis.csv', 'r', newline='') as csv_file:
    reader = csv.reader(csv_file)
    print(tabulate(reader))
```

```
---------------------------------------  --------------  ----------  -------------
Repo                                     No. of Commits  No. of PRs  No. of Issues
https://github.com/fossasia/susi_server  37              0           5
https://github.com/omegaup/omegaup       143             2           45
https://github.com/streetmix/streetmix   329             4           11
https://github.com/Submitty/Submitty     398             31          63
---------------------------------------  --------------  ----------  -------------
```
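`tabulate` is a third-party package. Since the aim asks for plain Python 3, here is a sketch of a comparable fixed-width table using only the standard library (`format_table` is a hypothetical helper; each column is as wide as its longest cell, and columns are separated by two spaces):

```python
import csv
import io


def format_table(rows):
    """Return rows as left-aligned columns framed by dashed rules."""
    widths = [max(len(str(row[i])) for row in rows) for i in range(len(rows[0]))]
    rule = "  ".join("-" * w for w in widths)
    body = ["  ".join(str(cell).ljust(w) for cell, w in zip(row, widths)).rstrip()
            for row in rows]
    return "\n".join([rule] + body + [rule])


# The first rows of the CSV above, read back with the csv module
data = io.StringIO(
    "Repo,No. of Commits,No. of PRs,No. of Issues\r\n"
    "https://github.com/fossasia/susi_server,37,0,5\r\n"
    "https://github.com/omegaup/omegaup,143,2,45\r\n"
)
print(format_table(list(csv.reader(data))))
```

For the real file, `with open('microtask4-analysis.csv', newline='') as f: print(format_table(list(csv.reader(f))))` produces the full table.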
