# Microtask1
>Microtask 1: Produce a notebook showing (and producing) a list with the activity per quarter: number of new committers, submitters of issues, and submitters of pull/merge requests, number of items (commits, issues, pull/merge requests), number of repositories with new items (all of this per quarter) as a table and as a CSV file. Use plain Python3 (eg, no Pandas) for this.



## Quarters
The quarters are defined by the following dictionary, where the **key is the month number** and the **value is the quarter**:
```
quarter={1:'Q1', 2:'Q1', 3:'Q1', 4:'Q2', 5:'Q2', 6:'Q2', 7:'Q3', 8:'Q3', 9:'Q3', 10:'Q4', 11:'Q4', 12:'Q4'}
```
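
As a small illustration of how this mapping is used, the sketch below turns a date into a key such as `2018_Q3`. The helper names `quarter_key` and `quarter_key_arith` are hypothetical and do not appear in the notebook; the arithmetic variant is only shown as an equivalent way to get the quarter from the month number.

```python
from datetime import datetime

# Month -> quarter mapping, as defined above
quarter = {1: 'Q1', 2: 'Q1', 3: 'Q1', 4: 'Q2', 5: 'Q2', 6: 'Q2',
           7: 'Q3', 8: 'Q3', 9: 'Q3', 10: 'Q4', 11: 'Q4', 12: 'Q4'}


def quarter_key(date):
    """Hypothetical helper: build a key such as '2018_Q3' from a datetime."""
    return str(date.year) + '_' + quarter[date.month]


def quarter_key_arith(date):
    """Same key, computing the quarter from the month number instead."""
    return '{}_Q{}'.format(date.year, (date.month - 1) // 3 + 1)


example = datetime(2018, 9, 12)  # arbitrary example date
print(quarter_key(example))
print(quarter_key_arith(example))
```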

In [1]:
import json
from datetime import datetime 
from collections import defaultdict
import csv
from tabulate import tabulate
import requests
from pprint import pprint


## Start and end year for the analysis
The start year and end year for a repository are obtained from the GitHub API using the [requests](http://www.python-requests.org/en/latest/)
module:
- startyear : year of repository creation.
- endyear : when the repository was last updated.

The steps are :
1. Call the API
1. Assuming the API returns a JSON, parse the JSON object into a Python dict using json.loads function
1. Loop through the dict to extract information.

`if(Response.ok)` tells you whether the API call succeeded (`ok` is true for any status code below 400, for example 200).

`Response.raise_for_status()` raises an `HTTPError` if the API returned an error status code (4xx or 5xx).
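
The parsing part of these steps can be sketched without touching the network. The helper name `repo_years` is hypothetical, and the sample payload is made up for illustration (it is not a real API response); only the `created_at` and `updated_at` fields used in this notebook are assumed.

```python
import json
from datetime import datetime

DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # timestamp format used by the two fields


def repo_years(payload_text):
    """Hypothetical helper: return (startyear, endyear) from a repository JSON payload."""
    repo = json.loads(payload_text)  # parse the JSON text into a dict
    startyear = datetime.strptime(repo['created_at'], DATE_FORMAT).year
    endyear = datetime.strptime(repo['updated_at'], DATE_FORMAT).year
    return startyear, endyear


# Made-up payload containing only the two fields that are read above
sample = '{"created_at": "2018-01-01T00:00:00Z", "updated_at": "2019-06-30T12:00:00Z"}'
print(repo_years(sample))
```

With a real response object from `requests.get(url)`, the text to pass in would be `info.text`.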

In [2]:
url = 'https://api.github.com/repos/tensorflow/datasets'  # URL of the repository to be analysed
info = requests.get(url)
if info.ok:
    repo = json.loads(info.text)  # parse the JSON body once
    startyear = datetime.strptime(repo['created_at'], "%Y-%m-%dT%H:%M:%SZ").year
    endyear = datetime.strptime(repo['updated_at'], "%Y-%m-%dT%H:%M:%SZ").year
else:
    info.raise_for_status()  # raises an HTTPError for a 4xx/5xx status

### Create a list **indexes** with one key per quarter between startyear and endyear, as shown below

In [3]:
indexes = []
for year in range(startyear, endyear+1):
    for quarter in ['Q1', 'Q2', 'Q3', 'Q4']:
        indexes.append(str(year) + '_' + quarter)


In [4]:
print(indexes)

['2018_Q1', '2018_Q2', '2018_Q3', '2018_Q4', '2019_Q1', '2019_Q2', '2019_Q3', '2019_Q4']


### Create dictionaries, and sets to get activity per quarter
Suppose we have an item of category "commit":<br>
Get the quarter (or index) for that commit (e.g. '2018_Q3') from its date. Using that quarter (or index) as the key:<br>
- Increment the value of commits_count[key].
- Add the author of that commit to the set commiters_new[key].

This is done for each item (commit, issue or pull_request) in the JSON file,
and the dictionaries are updated accordingly. We print them below to see exactly what they look like.
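
The update described above can be sketched on a few toy items. The items and author names below are made up for illustration, and the sketch uses `defaultdict(int)` for the counter for brevity, whereas the notebook pre-fills a plain dict with zeros.

```python
from collections import defaultdict

# Toy items already reduced to (category, quarter key, author); made up for illustration
items = [
    ('commit', '2018_Q3', 'alice'),
    ('commit', '2018_Q3', 'bob'),
    ('commit', '2018_Q3', 'alice'),
    ('commit', '2018_Q4', 'alice'),
]

commits_count = defaultdict(int)   # quarter key -> number of commits
committers = defaultdict(set)      # quarter key -> set of commit authors

for category, key, author in items:
    if category == 'commit':
        commits_count[key] += 1       # count every commit
        committers[key].add(author)   # a set keeps each author only once

print(dict(commits_count))
print(dict(committers))
```

Note that at this stage each set holds everyone active in that quarter, not only first-time contributors.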

In [5]:
commits_count = {}
issue_count = {}
pr_count = {}

commiters_new = defaultdict(set)
iss_creators_new = defaultdict(set)
pr_senders_new = defaultdict(set)

# indexes : ['2018_Q1', '2018_Q2', '2018_Q3', '2018_Q4', '2019_Q1', '2019_Q2', '2019_Q3', '2019_Q4']
for key in indexes:
    commits_count[key] = 0
    issue_count[key] = 0
    pr_count[key] = 0

### Function to get information from the JSON file and populate the dictionary fields

In [6]:
# dictionary mapping month to quarter
quarter={1:'Q1', 2:'Q1', 3:'Q1', 4:'Q2', 5:'Q2', 6:'Q2', 7:'Q3', 8:'Q3', 9:'Q3', 10:'Q4', 11:'Q4', 12:'Q4'}

# function to get information from the JSON file and populate the dictionary fields
def analyse(path):
    with open(path) as f:
        # the file holds one JSON document (one item) per line
        for line in f:
            line = json.loads(line)
            if line['category'] == 'commit':
                # commit dates use the Git date format, which includes a UTC offset (%z)
                date = datetime.strptime(line['data']['CommitDate'], "%a %b %d %H:%M:%S %Y %z")
                key = str(date.year) + '_' + quarter[date.month]
                commits_count[key] += 1
                commiters_new[key].add(line['data']['Author'])

            elif line['category'] == 'issue':
                date = datetime.strptime(line['data']['created_at'], "%Y-%m-%dT%H:%M:%SZ")
                key = str(date.year) + '_' + quarter[date.month]
                issue_count[key] += 1
                iss_creators_new[key].add(line['data']['user_data']['login'])

            else:
                # any other category is treated as a pull request
                date = datetime.strptime(line['data']['created_at'], "%Y-%m-%dT%H:%M:%SZ")
                key = str(date.year) + '_' + quarter[date.month]
                pr_count[key] += 1
                pr_senders_new[key].add(line['data']['user_data']['login'])

### Calling the analyse function

In [7]:
path = "./tf_analysis.json"
analyse(path)

### Updating the commiters in each quarter (or index) to contain only new contributors
This is done by keeping an overall set of contributors for each category, which is initialized with the contributors of the initial quarter and then updated quarter by quarter.<br>
<br>
For each subsequent quarter,
1. the commiters_new[quarter(or index)] set is updated by removing from it any elements of the old_commiters set (all committers seen so far), and
1. the old_commiters set is updated by adding the remaining elements of commiters_new[quarter(or index)] to it.
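
The two steps above amount to a running set difference. The following is a minimal sketch on made-up data; the helper name `keep_only_new` and the contributor names are hypothetical, and the helper returns a new dict rather than updating the sets in place as the notebook does.

```python
def keep_only_new(per_quarter, ordered_keys):
    """Hypothetical helper: map each quarter to the contributors first seen in it."""
    seen = set()      # everyone seen in earlier quarters
    result = {}
    for key in ordered_keys:
        new = per_quarter.get(key, set()) - seen  # step 1: drop known contributors
        result[key] = new
        seen |= new                               # step 2: remember the new ones
    return result


# Made-up activity per quarter, in chronological order
active = {
    '2018_Q3': {'alice', 'bob'},
    '2018_Q4': {'bob', 'carol'},
    '2019_Q1': {'alice', 'dave'},
}
keys = ['2018_Q3', '2018_Q4', '2019_Q1']

new_per_quarter = keep_only_new(active, keys)
for key in keys:
    print(key, sorted(new_per_quarter[key]))
```

Starting from an empty `seen` set and looping over every quarter, including the first, gives the same result as seeding the overall set with the first quarter.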

In [9]:
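quarter_count = len(indexes)

# commiters_new is a defaultdict, so looking up the key 0 creates an empty set
# stored under that key; it is used here as the overall "seen so far" set
old_commiters = commiters_new[0]
old_pr_senders = pr_senders_new[0]
old_iss_creators = iss_creators_new[0]

# seed the overall sets with the contributors of the initial quarter
old_commiters.update(commiters_new[indexes[0]])
old_pr_senders.update(pr_senders_new[indexes[0]])
old_iss_creators.update(iss_creators_new[indexes[0]])

for i in range(1, quarter_count):
    commiters_new[indexes[i]] = commiters_new[indexes[i]].difference(old_commiters)
    iss_creators_new[indexes[i]] = iss_creators_new[indexes[i]].difference(old_iss_creators)
    pr_senders_new[indexes[i]] = pr_senders_new[indexes[i]].difference(old_pr_senders)
    old_commiters.update(commiters_new[indexes[i]])
    old_pr_senders.update(pr_senders_new[indexes[i]])
    old_iss_creators.update(iss_creators_new[indexes[i]])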

### Printing the dictionaries created above for commits - commiters_new and commits_count

In [14]:
pprint(commiters_new)

defaultdict(<class 'set'>,
            {0: {'Adam Roberts <adarob@google.com>',
                 'Afroz Mohiuddin <afrozm@google.com>',
                 'Andrew Kondrich <andrewk1@stanford.edu>',
                 'Billy Lamberta <blamb@google.com>',
                 'Chanchal Kumar Maji '
                 '<31502077+ChanchalKumarMaji@users.noreply.github.com>',
                 'Christopher Suter <cgs@google.com>',
                 'Copybara-Service <copybara-piper@google.com>',
                 'Copybara-Service <copybara-worker@google.com>',
                 'David Bieber <dbieber@google.com>',
                 'Derek Murray <mrry@google.com>',
                 'Dominic Jack <thedomjack@gmail.com>',
                 'Drew Szurko <15271992+drewszurko@users.noreply.github.com>',
                 'Dustin Tran <trandustin@google.com>',
                 'Etienne Pot <epot@google.com>',
                 'Jidin Dinesh <43285614+jidroid404@users.noreply.github.com>',
                 'Jiri S

In [17]:
pprint(commits_count)

{'2018_Q1': 0,
 '2018_Q2': 0,
 '2018_Q3': 10,
 '2018_Q4': 272,
 '2019_Q1': 414,
 '2019_Q2': 0,
 '2019_Q3': 0,
 '2019_Q4': 0}


## Writing and reading from csv
- Makes use of the [csv](https://docs.python.org/3/library/csv.html) Python module (csv.writer and csv.reader).
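
As a minimal sketch of the write/read round trip, the example below uses an in-memory `io.StringIO` buffer instead of a file on disk (an assumption made only to keep the example self-contained) and a reduced two-column header. The two rows reuse the commit counts printed above.

```python
import csv
import io

header = ['Quarter', '# Commits']
rows = [('2018_Q3', 10), ('2018_Q4', 272)]  # commit counts from the output above

# newline='' leaves line endings to the csv module, as with real files
buffer = io.StringIO(newline='')
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(rows)

# Rewind the buffer and read the rows back
buffer.seek(0)
read_back = list(csv.reader(buffer))
print(read_back)
```

csv.reader returns every cell as a string, so the counts come back as `'10'` and `'272'` rather than integers.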

In [10]:
header = ['Quarter', '# Commits', '# Issues', '# PullRequests', '# NewCommitters', '# NewIssueSubmitters', '# NewPRSubmitters']

# newline='' lets the csv module control line endings (avoids blank rows on Windows)
with open('data.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(header)

    for key in indexes:
        row = (key, commits_count[key], issue_count[key], pr_count[key], len(commiters_new[key]), len(iss_creators_new[key]), len(pr_senders_new[key]))
        writer.writerow(row)
    

## Printing the result as a table 
- Uses csv.reader from the [csv](https://docs.python.org/3/library/csv.html) module to read the CSV file and the [tabulate](https://pypi.org/project/tabulate/) Python module to print it in tabular form.
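
If a third-party dependency is not wanted, a similar layout can be produced in plain Python. This is only a sketch: the helper name `format_table` is hypothetical, and it does not reproduce tabulate's dashed border lines.

```python
def format_table(rows):
    """Hypothetical helper: left-align each column to the width of its widest cell."""
    # zip(*rows) iterates over columns; assumes every row has the same length
    widths = [max(len(str(cell)) for cell in column) for column in zip(*rows)]
    lines = []
    for row in rows:
        cells = [str(cell).ljust(width) for cell, width in zip(row, widths)]
        lines.append('  '.join(cells).rstrip())  # two spaces between columns
    return '\n'.join(lines)


# Rows as csv.reader would return them (all cells are strings)
rows = [['Quarter', '# Commits'], ['2018_Q3', '10'], ['2018_Q4', '272']]
print(format_table(rows))
```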

In [11]:
with open('data.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    print(tabulate(reader))

-------  ---------  --------  --------------  ---------------  --------------------  -----------------
Quarter  # Commits  # Issues  # PullRequests  # NewCommitters  # NewIssueSubmitters  # NewPRSubmitters
2018_Q1  0          0         0               0                0                     0
2018_Q2  0          0         0               0                0                     0
2018_Q3  10         4         1               3                3                     1
2018_Q4  272        7         1               9                6                     1
2019_Q1  414        312       175             30               71                    34
2019_Q2  0          0         0               0                0                     0
2019_Q3  0          0         0               0                0                     0
2019_Q4  0          0         0               0                0                     0
-------  ---------  --------  --------------  ---------------  --------------------  -----------------