## Microtask #2 : GitHub Backend
> Goal: create a Python script that runs [Perceval](https://github.com/chaoss/grimoirelab-perceval) through its Python interface using the *GitHub* backend.

#### What is [Perceval](https://github.com/chaoss/grimoirelab-perceval)?
- Perceval is a Python module for retrieving data from repositories related to software development. It works with many data sources, from Git repositories and GitHub projects to mailing lists, Gerrit or StackOverflow.

> In this notebook, we'll use the `github` backend module to retrieve information from a selected repository.
> Documentation for the GitHub backend can be found [here](https://perceval.readthedocs.io/en/latest/perceval.backends.core.html#module-perceval.backends.core.github).

**NOTE:** Unauthenticated access to the GitHub API runs into problems such as a much stricter rate limit, so we give the Perceval GitHub backend a [GitHub](http://github.com) API token when retrieving information.

> To reproduce the results, generate your own API token (a personal access token) [here](https://github.com/settings/tokens/new) and import it as shown below.

**We'll start off by importing required modules**

In [2]:
from perceval.backends.core.github import ( GitHub, 
                                            CATEGORY_ISSUE, CATEGORY_PULL_REQUEST)
from datetime import datetime
from pprint import pprint

# Importing GitHub API Token 
from config import API_TOKEN 
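
As an alternative to keeping the token in a local `config.py`, it could be read from an environment variable so that it never lands in the repository. The sketch below assumes a variable named `GITHUB_API_TOKEN`; that name and the `load_api_token` helper are illustrative and not part of Perceval.

```python
import os


def load_api_token(env=os.environ, var_name="GITHUB_API_TOKEN"):
    """Return the token stored in `var_name`, or None if it is unset or blank.

    `GITHUB_API_TOKEN` is an assumed variable name chosen for this sketch.
    """
    token = env.get(var_name, "").strip()
    return token or None


# Read the token from the real environment; None means "no token configured".
api_token = load_api_token()
if api_token is None:
    print("No token found: requests would be unauthenticated")
```

The resulting `api_token` value could then be passed wherever `API_TOKEN` is used in this notebook.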

* The `GitHub` backend class is initialized below with the following arguments
    - `owner` – GitHub owner
    - `repository` – GitHub repository belonging to the owner
    - `api_token` – GitHub auth token to access the API

In [3]:
REPOSITORY_NAME = "MeetInTheMiddle"

# Initializing the GitHub backend
github_backend = GitHub(owner="inishchith", api_token=API_TOKEN, repository=REPOSITORY_NAME)

Tip: There are two ways of using the `GitHub` backend
    - Without authenticating: uses the `GitHub` class with no API token
    - Authenticating with an API token: here the `GitHub` class initializes the `GitHubClient` class with the token

**Printing out some general data**

In [4]:
print(github_backend.owner)

# Categories of information which can be retrieved
print(github_backend.categories)

print(github_backend.repository)
print(github_backend.origin)

inishchith
['issue', 'pull_request']
MeetInTheMiddle
https://github.com/inishchith/MeetInTheMiddle


As you can see, two categories of information can be retrieved from the specified GitHub repository: issues and pull requests.

**We'll now [`fetch()`](https://github.com/chaoss/grimoirelab-perceval/blob/805d73122b871c29146a70601d8f3d78267b41e1/perceval/backends/core/github.py#L112) issue information from the GitHub repository**
- The `fetch()` method returns a generator; we'll convert it to a list for convenience
- Internally, this category is handled by the private `__fetch_issues()` method, which does the same task; because its name is mangled by Python, `fetch()` is the entry point to call
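
To illustrate why the generator is converted to a list, here is a minimal sketch using a stand-in generator. `fake_fetch` is hypothetical and only mimics the lazy behaviour of a generator-returning method; it does not call Perceval or the GitHub API.

```python
def fake_fetch(items):
    """Hypothetical stand-in for a fetch call: yields items one at a time."""
    for item in items:
        yield item


results = fake_fetch(["issue-1", "issue-2", "issue-3"])

first_pass = list(results)   # iterating consumes the generator
second_pass = list(results)  # the generator is now exhausted

print(len(first_pass), len(second_pass))  # prints: 3 0
```

A generator can only be iterated once, so storing the items in a list lets the later cells count and index them repeatedly.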

In [5]:
# Datetime range in which ISSUEs information is to be fetched
from_date = datetime(2019, 1, 1)
to_date = datetime(2019, 2, 2)

# Calling fetch method
range_issues = github_backend.fetch(category=CATEGORY_ISSUE, from_date=from_date, to_date=to_date)
range_issues_list = list(range_issues)
n_issues = len(range_issues_list)
print("NUMBER OF ISSUES: ", n_issues)

NUMBER OF ISSUES:  5
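
The date range can be pictured as a filter on each item's update time. The sketch below works under that assumption and uses inclusive bounds, which is also an assumption; the sample items and the `in_update_range` helper are made up for illustration and are not fetched data.

```python
from datetime import datetime


def in_update_range(item, from_date, to_date):
    """Return True if the item's `updated_at` lies within [from_date, to_date].

    Inclusive bounds are an assumption of this sketch.
    """
    updated = datetime.strptime(item["updated_at"], "%Y-%m-%dT%H:%M:%SZ")
    return from_date <= updated <= to_date


# Hand-written sample items, not real API results.
sample_items = [
    {"title": "older item", "updated_at": "2018-12-30T09:00:00Z"},
    {"title": "item in range", "updated_at": "2019-01-16T15:23:59Z"},
    {"title": "newer item", "updated_at": "2019-02-05T10:00:00Z"},
]

range_start = datetime(2019, 1, 1)
range_end = datetime(2019, 2, 2)

selected = [item["title"] for item in sample_items
            if in_update_range(item, range_start, range_end)]
print(selected)  # prints: ['item in range']
```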


**Let's check the structure of one of the issues.**

In [6]:
last_issue = range_issues_list[n_issues-1]
print("Attributes of issue JSON document: ", last_issue.keys())
pprint(last_issue)

Attributes of issue JSON document:  dict_keys(['backend_name', 'backend_version', 'perceval_version', 'timestamp', 'origin', 'uuid', 'updated_on', 'category', 'tag', 'data'])
{'backend_name': 'GitHub',
 'backend_version': '0.18.0',
 'category': 'issue',
 'data': {'assignee': None,
          'assignee_data': {},
          'assignees': [],
          'assignees_data': [],
          'author_association': 'NONE',
          'body': 'Other points of interest inside highlight are not '
                  'interactive, I would suggest showing the places returned in '
                  "midway area with Google's default POI, using place types to "
                  "filter, but not sure that's possible.",
          'closed_at': '2019-01-18T12:26:09Z',
          'comments': 1,
          'comments_data': [{'author_association': 'OWNER',
                             'body': 'Possible. Would add this a bit later. '
                                     'Thanks for the suggestion ;)',
                 

<hr>

**`timestamp`** - Unix timestamp, on the UTC (Coordinated Universal Time) scale, of the moment the item was fetched by the `.fetch()` method.

**`updated_on`** - Unix timestamp, on the UTC scale, of the last update of the GitHub item, taken from the update time that GitHub reports for it (in other words, its last-modified time).

* Tip: We can use a tool such as [unixtimestamp](https://www.unixtimestamp.com/index.php) to cross-check the conversion.
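
The same conversion can be checked in Python with the standard library. This sketch converts the `closed_at` string from the output above into a Unix timestamp and back; that string is used only as a convenient example value, and the helper names are illustrative.

```python
from datetime import datetime, timezone


def to_unix(iso_string):
    """Convert a GitHub-style UTC string (e.g. '2019-01-18T12:26:09Z') to a Unix timestamp."""
    parsed = datetime.strptime(iso_string, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc).timestamp()


def from_unix(ts):
    """Convert a Unix timestamp back to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


ts = to_unix("2019-01-18T12:26:09Z")
print(ts)                         # prints: 1547814369.0
print(from_unix(ts).isoformat())  # prints: 2019-01-18T12:26:09+00:00
```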

**Let us now print out some useful information from all the fetched issues, such as the username, association type, comment creation time and comment body**

In [7]:
for issue in range_issues_list:
    print("-"*100)
    
    # Issue Title
    print("TITLE: ",issue["data"]["title"])
    # Issue Closed at 
    print("CLOSED AT: ", issue["data"]["closed_at"])
    # Number of comments that the issue received
    print("No of comments: ", issue["data"]["comments"])
    
    # Issue creator details
    print("Issue Creator Username: ", issue["data"]["user"]["login"])
    print("\tUser Association type with repository: {association}\n\tCreated at: {created}\n\tComment: {comment}\n".format(association=issue["data"]["author_association"], comment=issue["data"]["body"],created=issue["data"]["created_at"]))
    
    # Issue comments details
    comments_data = issue["data"]["comments_data"]
    for comment in comments_data:
        print("Username: ", comment["user"]["login"])
        print("\tUser Association type with repository: {association}\n\tCreated at: {created}\n\tComment: {comment}\n".format(association=comment["author_association"], comment=comment["body"],created=comment["created_at"]))
    
    print("-"*100)

----------------------------------------------------------------------------------------------------
TITLE:  Same location bug
CLOSED AT:  2019-01-16T15:23:59Z
No of comments:  1
Issue Creator Username:  BraunEduardo
	User Association type with repository: NONE
	Created at: 2019-01-12T17:48:43Z
	Comment: It's possible to add same location twice (feature i guess, should keep it), but when removing, it removes all repeated locations keeping just one, rendering the map again for each remove.
The final render shows wrong midway, I think it keeps removed locations somewhere and search for midway with they included.

Username:  inishchith
	User Association type with repository: OWNER
	Created at: 2019-01-16T15:23:59Z
	Comment: Thanks for the suggestion, I personally think this addition would be irrelevant. Feel free to get back and reopen if you have some additions related to this ;)

----------------------------------------------------------------------------------------------------
------
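
Beyond printing, the same fields can be aggregated. The sketch below counts how often each user appears as an issue author or commenter. It assumes items shaped like the structure shown earlier (`data` → `user` → `login`, and `data` → `comments_data`); the sample items and logins are hand-written, not fetched.

```python
from collections import Counter


def count_participants(items):
    """Count appearances of each login as issue author or commenter."""
    counts = Counter()
    for item in items:
        data = item.get("data", {})
        author = (data.get("user") or {}).get("login")
        if author:
            counts[author] += 1
        for comment in data.get("comments_data", []):
            login = (comment.get("user") or {}).get("login")
            if login:
                counts[login] += 1
    return counts


# Hypothetical items that follow the layout of the fetched issue documents.
sample_issues = [
    {"data": {"user": {"login": "alice"},
              "comments_data": [{"user": {"login": "bob"}},
                                {"user": {"login": "alice"}}]}},
    {"data": {"user": {"login": "alice"}, "comments_data": []}},
]

participants = count_participants(sample_issues)
print(participants.most_common())
```

With the real data, `count_participants(range_issues_list)` would give the same kind of summary for the fetched issues.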

**We'll now [`fetch()`](https://github.com/chaoss/grimoirelab-perceval/blob/805d73122b871c29146a70601d8f3d78267b41e1/perceval/backends/core/github.py#L112) pull request information from the GitHub repository**
- The `fetch()` method returns a generator; we'll convert it to a list for convenience
- Internally, this category is handled by the private `__fetch_pull_requests()` method, which does the same task; because its name is mangled by Python, `fetch()` is the entry point to call

In [8]:
# Datetime range in which PULL REQUESTs information is to be fetched
from_date = datetime(2018, 10, 1)
to_date = datetime(2019, 2, 10)

# Calling fetch method
pull_requests = github_backend.fetch(category=CATEGORY_PULL_REQUEST, from_date=from_date, to_date=to_date)
range_pull_request_list = list(pull_requests)
n_pulls = len(range_pull_request_list)
print("NUMBER OF PULL REQUESTS: ", n_pulls)

NUMBER OF PULL REQUESTS:  3


**Let's check the structure of one of the pull requests.**

In [9]:
pprint(range_pull_request_list[n_pulls-1])

{'backend_name': 'GitHub',
 'backend_version': '0.18.0',
 'category': 'pull_request',
 'data': {'_links': {'comments': {'href': 'https://api.github.com/repos/inishchith/MeetInTheMiddle/issues/11/comments'},
                     'commits': {'href': 'https://api.github.com/repos/inishchith/MeetInTheMiddle/pulls/11/commits'},
                     'html': {'href': 'https://github.com/inishchith/MeetInTheMiddle/pull/11'},
                     'issue': {'href': 'https://api.github.com/repos/inishchith/MeetInTheMiddle/issues/11'},
                     'review_comment': {'href': 'https://api.github.com/repos/inishchith/MeetInTheMiddle/pulls/comments{/number}'},
                     'review_comments': {'href': 'https://api.github.com/repos/inishchith/MeetInTheMiddle/pulls/11/comments'},
                     'self': {'href': 'https://api.github.com/repos/inishchith/MeetInTheMiddle/pulls/11'},
                     'statuses': {'href': 'https://api.github.com/repos/inishchith/MeetInTheMiddle/statu

**Let us now print out some useful information from all the fetched pull requests such as Pull request message, Merged / Closed / Open details, Additions / Deletions and many more**

In [10]:
for pull_request in range_pull_request_list:
    print("-"*100)
    
    # Pull request Number and Title
    print("#{pull_request}: {title}".format(pull_request=pull_request["data"]["number"], title=pull_request["data"]["title"]))
    
    # Pull request state [ open / closed ]
    print("Pull Request State: ", pull_request["data"]["state"])
    
    # Merged True / False
    print("\nMerged: ", pull_request["data"]["merged"])
    
    if pull_request["data"]["merged"]:
        print("Merged at: ", pull_request["data"]["merged_at"])
    else:
        print("Closed at: ", pull_request["data"]["closed_at"])
    
    print("Number of comments: ", pull_request["data"]["comments"])

    print("\nAdditions: +{adds}\nDeletions: -{dels}".format(adds=pull_request["data"]["additions"], dels=pull_request["data"]["deletions"]))
    
    print("\nNumber of Commits: {commits}\nNumber of files changed: {file_changes}".format(commits=pull_request["data"]["commits"], file_changes=pull_request["data"]["changed_files"]))
    
    # Pull request creator details
    print("Username: ", pull_request["data"]["user"]["login"])
    print("\tUser Association type with repository: {association}\n\tCreated at: {created}\n\tComment: {comment}\n".format(association=pull_request["data"]["author_association"], comment=pull_request["data"]["body"], created=pull_request["data"]["created_at"]))
    
    print("-"*100)

----------------------------------------------------------------------------------------------------
#2: Improve grammar for the information message
Pull Request State:  closed

Merged:  True
Merged at:  2018-12-14T11:28:38Z
Number of comments:  0

Additions: +4
Deletions: -4

Number of Commits: 1
Number of files changed: 1
Username:  mehmetseckin
	User Association type with repository: CONTRIBUTOR
	Created at: 2018-12-14T11:20:27Z
	Comment: Replaced `You're` with `Your`, I think that is the correct word to use in the information message.

----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
#10: Feat/redesigning
Pull Request State:  closed

Merged:  True
Merged at:  2019-02-03T18:05:24Z
Number of comments:  7

Additions: +306
Deletions: -260

Number of Commits: 5
Number of files changed: 6
Username:  zikosichi
	User Association type with 
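
The per-pull-request fields can also be rolled up into totals. This sketch sums additions and deletions and counts merged pull requests. The sample data reuses only the values visible in the output above for pull requests #2 and #10; the `summarize_pull_requests` helper is illustrative.

```python
def summarize_pull_requests(items):
    """Aggregate merge status and line changes across pull request items."""
    summary = {"total": len(items), "merged": 0, "additions": 0, "deletions": 0}
    for item in items:
        data = item["data"]
        if data.get("merged"):
            summary["merged"] += 1
        summary["additions"] += data.get("additions", 0)
        summary["deletions"] += data.get("deletions", 0)
    return summary


# Values copied from the printed output for pull requests #2 and #10.
sample_pulls = [
    {"data": {"number": 2, "merged": True, "additions": 4, "deletions": 4}},
    {"data": {"number": 10, "merged": True, "additions": 306, "deletions": 260}},
]

print(summarize_pull_requests(sample_pulls))
# prints: {'total': 2, 'merged': 2, 'additions': 310, 'deletions': 264}
```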

- This concludes Microtask #2: executing Perceval via its Python interface using the GitHub backend.