Copyright 2022 VMware, Inc. 

SPDX-License-Identifier: BSD-2-Clause

# Gathering Data about Repositories (REST API)

Demo: accessing data about repositories (last updated, open issues / PRs, license, …)

Learn more:
* [PyGithub repository documentation](https://pygithub.readthedocs.io/en/latest/github_objects/Repository.html)
* [GitHub REST API Documentation for repos](https://docs.github.com/en/rest/repos/repos)

In [None]:
# Setup: read personal access token from gh_key and create GitHub Instance
# You'll need to do this in each notebook

# Import PyGithub library
from github import Github

# Open your gh_key file and read the personal access token into a variable
with open('gh_key', 'r') as kf:
    key = kf.readline().rstrip() # remove newline & trailing whitespace

# Use your personal access token to create a GitHub instance
g = Github(key)

In [None]:
sc = g.get_repo("ossf/scorecard")
print(sc.name, sc.homepage, sc.stargazers_count)
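
Beyond the name, homepage, and star count, a repository object exposes many other fields, including the "last updated" and open issue data mentioned in the introduction. The following is a minimal sketch that gathers a few of them into a dictionary. The helper name `repo_summary` and the choice of fields are illustrative assumptions, and the stand-in object (with made-up values) only mimics a PyGithub repository so the example runs without a token. With a live connection you could pass `sc` instead.

```python
from datetime import datetime, timezone
from types import SimpleNamespace

# Fields to collect; each is an attribute of PyGithub's Repository object.
FIELDS = ("full_name", "description", "forks_count", "open_issues_count",
          "default_branch", "updated_at", "pushed_at")

def repo_summary(repo, fields=FIELDS):
    """Hypothetical helper: copy selected repo attributes into a plain dict."""
    return {field: getattr(repo, field) for field in fields}

# Stand-in for a repository object; every value here is made up.
demo_repo = SimpleNamespace(
    full_name="example-org/example-repo",
    description="An example repository",
    forks_count=12,
    open_issues_count=34,
    default_branch="main",
    updated_at=datetime(2022, 1, 2, tzinfo=timezone.utc),
    pushed_at=datetime(2022, 1, 1, tzinfo=timezone.utc),
)

summary = repo_summary(demo_repo)
for field, value in summary.items():
    print(f"{field}: {value}")
```

Returning a plain dictionary keeps the result easy to print, compare across repositories, or load into a table later.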

In [None]:
# Show other fields here



In [None]:
sc_lic = sc.get_license()

print(sc_lic.name, sc_lic.url)
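
The object returned by `get_license()` carries the license text in two forms: `content` holds the base64-encoded string as delivered by the API, while `decoded_content` holds the raw bytes. The sketch below illustrates that relationship with a stand-in object and placeholder text so it runs offline. The names `fake_license` and `license_text` are illustrative assumptions, not part of PyGithub.

```python
import base64
from types import SimpleNamespace

# Placeholder text standing in for a real license file.
raw = b"Example License\n\nThis is placeholder license text for the demo.\n"

# Stand-in mimicking the two content attributes of a license file object.
fake_license = SimpleNamespace(
    content=base64.encodebytes(raw).decode("ascii"),  # base64 text, with newlines
    decoded_content=raw,                               # the original bytes
)

def license_text(lic):
    """Hypothetical helper: decode the base64 `content` field into a string."""
    return base64.b64decode(lic.content).decode("utf-8")

print(fake_license.content[:24])      # unreadable base64 characters
print(license_text(fake_license))     # readable license text
```

With a real license object, `sc_lic.decoded_content.decode("utf-8")` should give the same readable text without a manual base64 step.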

In [None]:
# Other license fields: content vs. decoded content



## Repo Contributors

As with other parts of the API, we can get user objects for contributors, committers, and other people who interact with a repo. Any user field is available on these objects as you iterate through the list.

In [None]:
pa = g.get_repo("ossf/package-analysis")
pa_contrib = pa.get_contributors()

In [None]:
for person in pa_contrib:
    print(person.login, person.name, person.company)
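
Each contributor object also carries a `contributions` count, which makes simple rankings possible. The following is a minimal sketch: `top_contributors` is a hypothetical helper, and the stand-in users have made-up logins and counts so the example runs offline. With a live connection you could pass `pa_contrib` instead.

```python
from types import SimpleNamespace

def top_contributors(contributors, n=3):
    """Hypothetical helper: return (login, contributions) for the n most active."""
    ranked = sorted(contributors, key=lambda user: user.contributions, reverse=True)
    return [(user.login, user.contributions) for user in ranked[:n]]

# Stand-ins for contributor objects; all values are made up.
demo_contributors = [
    SimpleNamespace(login="alice", contributions=12),
    SimpleNamespace(login="bob", contributions=3),
    SimpleNamespace(login="carol", contributions=40),
]

for login, count in top_contributors(demo_contributors, n=2):
    print(login, count)
```

Sorting reads every contributor before ranking, so on a large repository this walks through all pages of the list.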

## PRs and Issues - not the data you might think!

The GitHub REST API treats every pull request as an issue, so "issues" endpoints and counts include pull requests as well as ordinary issues.
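
Because pull requests show up among the issues, it can help to separate the two. In PyGithub, an issue object has a `pull_request` attribute that is `None` for an ordinary issue and set when the issue is really a pull request. The sketch below relies on that attribute. `split_issues_and_prs` is a hypothetical helper, and the stand-in objects use made-up titles so the example runs offline.

```python
from types import SimpleNamespace

def split_issues_and_prs(items):
    """Hypothetical helper: separate ordinary issues from PRs in an issues list."""
    issues, prs = [], []
    for item in items:
        # pull_request is None unless the "issue" is actually a pull request
        if item.pull_request is None:
            issues.append(item)
        else:
            prs.append(item)
    return issues, prs

# Stand-ins for objects from an issues list; all values are made up.
demo_items = [
    SimpleNamespace(title="Crash on startup", pull_request=None),
    SimpleNamespace(title="Fix crash on startup", pull_request=SimpleNamespace()),
    SimpleNamespace(title="Docs are unclear", pull_request=None),
]

only_issues, only_prs = split_issues_and_prs(demo_items)
print(len(only_issues), "issues,", len(only_prs), "pull requests")
```

With a live connection, the same helper could be given the result of `sc.get_issues()`.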

In [None]:
# Example: Visit https://github.com/ossf/scorecard and look at open PRs / Issues

# This gives us a combined count of open issues and open PRs
print(sc.open_issues_count)

In [None]:
# You can get the PRs and issues this way.
# Note: get_issues() also returns PRs, because the API treats every PR as an issue.
prs = sc.get_pulls()
issues = sc.get_issues()

In [None]:
# And then you can count them, but note that these are only the open counts.
# When you use get_pulls() or get_issues(), it defaults to only the open ones.
print(prs.totalCount)
print(issues.totalCount)

In [None]:
# This gives you all of the PRs. You can also set state to "open" or "closed"
prs_all = sc.get_pulls(state="all")
print(prs_all.totalCount)
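
A `state` of "closed" does not say whether a pull request was merged. One way to tell the difference is the `merged_at` field, which is empty for pull requests that were closed without merging. The sketch below is a minimal illustration: `pr_outcome` and `count_outcomes` are hypothetical helpers, and the stand-in pull requests have made-up values so the example runs offline.

```python
from collections import Counter
from datetime import datetime, timezone
from types import SimpleNamespace

def pr_outcome(pr):
    """Hypothetical helper: classify a PR as open, merged, or closed unmerged."""
    if pr.state == "open":
        return "open"
    return "merged" if pr.merged_at is not None else "closed without merge"

def count_outcomes(prs):
    """Tally PR outcomes across any iterable of pull request objects."""
    return Counter(pr_outcome(pr) for pr in prs)

# Stand-ins for pull request objects; all values are made up.
merged_time = datetime(2022, 3, 1, tzinfo=timezone.utc)
demo_prs = [
    SimpleNamespace(state="open", merged_at=None),
    SimpleNamespace(state="closed", merged_at=merged_time),
    SimpleNamespace(state="closed", merged_at=None),
    SimpleNamespace(state="closed", merged_at=merged_time),
]

outcomes = count_outcomes(demo_prs)
for outcome, count in outcomes.items():
    print(outcome, count)
```

With a live connection, `count_outcomes(prs_all)` would iterate over every pull request in the repository, which can take a while and uses API requests for each page.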

In [None]:
# As with any of the paginated lists, you can loop through to get detailed info
for pr in prs:
    print(pr.user.login, pr.title)

## Ethical Use Reminder

Please adhere to the GitHub Acceptable Use Policies:
https://docs.github.com/en/site-policy/acceptable-use-policies/github-acceptable-use-policies