Copyright 2022 VMware, Inc. 

SPDX-License-Identifier: BSD-2-Clause

# Gathering Data about Repositories (REST API)

Demo: accessing data about repositories (last updated, open issues / PRs, license, …)

Learn more:
* [PyGithub repository documentation](https://pygithub.readthedocs.io/en/latest/github_objects/Repository.html)
* [GitHub REST API Documentation for repos](https://docs.github.com/en/rest/repos/repos)

In [21]:
# Setup: read the personal access token from gh_key and create a GitHub instance
# You'll need to do this in each notebook

# Import PyGithub library
from github import Github

# Open your gh_key file and read the personal access token into a variable
with open('gh_key', 'r') as kf:
    key = kf.readline().rstrip()  # remove newline and trailing whitespace

# Use your personal access token to create a GitHub instance
g = Github(key)
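The point of the gh_key file is to keep the token out of the notebook itself. Below is a minimal sketch of a slightly more defensive loader, assuming you might also keep the token in an environment variable. The helper name `load_token` and the `GITHUB_TOKEN` variable name are illustrative choices, not something PyGithub requires. Recent PyGithub releases (1.59 and later, to my knowledge) also accept `Github(auth=Auth.Token(token))` as the newer spelling of `Github(key)`.

```python
import os
from pathlib import Path
from typing import Optional


def load_token(path: str = "gh_key", env_var: str = "GITHUB_TOKEN") -> str:
    """Hypothetical helper: return a GitHub personal access token.

    Prefers the environment variable named by env_var, then falls back to
    the first line of the key file at path (the approach used above).
    """
    token: Optional[str] = os.environ.get(env_var)
    if token:
        return token.strip()

    key_file = Path(path)
    if not key_file.exists():
        raise FileNotFoundError(
            f"No {env_var} environment variable and no key file at {path!r}"
        )
    # Same idea as the setup cell: read the first line and trim whitespace.
    with key_file.open("r") as kf:
        token = kf.readline().strip()
    if not token:
        raise ValueError(f"Key file {path!r} is empty")
    return token


# Usage in the notebook (requires PyGithub and network access):
# from github import Github
# g = Github(load_token())
```

Checking the environment first makes it easy to run the same notebook in CI, where writing a key file to disk is less convenient.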

In [22]:
# Get a Repository object by its "owner/name" and print a few of its fields
sc = g.get_repo("ossf/scorecard")
print(sc.name, sc.homepage, sc.stargazers_count)

scorecard https://securityscorecards.dev 2924


In [23]:
sc_lang = sc.get_languages()
print(sc_lang)

{'Go': 1163317, 'Makefile': 19663, 'Dockerfile': 8022, 'Shell': 1500, 'JavaScript': 736}


In [24]:
# The value shown for each language is the number of bytes of code written in that language.
print(type(sc_lang))

<class 'dict'>


In [25]:
# Since it returns a Python dictionary, you can also look up a single language by key
print(sc_lang['Shell'])

1500
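Because get_languages() returns a plain dict of bytes per language, ordinary Python is enough to turn it into percentages. A minimal sketch, using the dictionary printed above as sample data (your numbers will differ, since the repo keeps changing); `language_shares` is a hypothetical helper name.

```python
from typing import Dict, List, Tuple


def language_shares(lang_bytes: Dict[str, int]) -> List[Tuple[str, float]]:
    """Convert {language: bytes} into (language, percent) pairs, largest first."""
    total = sum(lang_bytes.values())
    if total == 0:
        return []
    shares = [(lang, 100.0 * n / total) for lang, n in lang_bytes.items()]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


# Sample data: the output of sc.get_languages() shown above.
sample_lang = {'Go': 1163317, 'Makefile': 19663, 'Dockerfile': 8022,
               'Shell': 1500, 'JavaScript': 736}

for lang, pct in language_shares(sample_lang):
    print(f"{lang:<12} {pct:6.2f}%")
```

In the notebook you would pass sc_lang directly instead of the copied sample.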


In [26]:
# Exercise: print other repository fields here (for example sc.description, sc.forks_count, sc.pushed_at)
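One way to approach this exercise is to collect the fields you care about into a dictionary. A minimal sketch, assuming the PyGithub Repository attributes listed in FIELDS; `summarize_repo` is a hypothetical helper, and it is written so it can be tried on a stand-in object without network access. As I understand GitHub's data model, pushed_at tracks the last push, while updated_at changes whenever the repository object changes (a new star, for example), so pushed_at is usually the better "last updated" signal for code activity.

```python
from datetime import datetime, timezone

# Attribute names assumed to exist on a PyGithub Repository object.
FIELDS = (
    "full_name", "description", "stargazers_count", "forks_count",
    "open_issues_count", "archived", "default_branch", "pushed_at", "updated_at",
)


def summarize_repo(repo) -> dict:
    """Hypothetical helper: copy selected fields from a repo object into a dict.

    Missing attributes become None, so this also works on stand-in objects.
    """
    summary = {field: getattr(repo, field, None) for field in FIELDS}
    pushed = summary["pushed_at"]
    if isinstance(pushed, datetime):
        # Assumption: naive datetimes (older PyGithub versions) are in UTC.
        if pushed.tzinfo is None:
            pushed = pushed.replace(tzinfo=timezone.utc)
        summary["days_since_push"] = (datetime.now(timezone.utc) - pushed).days
    return summary


# In the notebook (network access required):
# for key, value in summarize_repo(sc).items():
#     print(key, value)
```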



In [27]:
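sc_lic = sc.get_license()

# get_license() returns the license file itself, so name is the file name ("LICENSE")
# and url is the file's REST API contents URL, not the type of license
print(sc_lic.name, sc_lic.url)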

LICENSE https://api.github.com/repos/ossf/scorecard/contents/LICENSE?ref=main


In [28]:
# Other license fields: content (base64-encoded text) vs. decoded_content (bytes)
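As I understand it, the object returned by get_license() stores the file body in content as base64 text (line-wrapped by the REST API), while decoded_content gives you the decoded bytes. The sketch below reproduces that relationship with the standard library on a short made-up string, so it runs offline; the helper names and the 60-character wrap width are assumptions for illustration. In the notebook you would compare sc_lic.content with sc_lic.decoded_content.decode("utf-8").

```python
import base64


def encode_like_api(text: str, width: int = 60) -> str:
    """Base64-encode text and wrap it into lines, similar to the API's content field.

    The wrap width is an assumption for illustration.
    """
    b64 = base64.b64encode(text.encode("utf-8")).decode("ascii")
    lines = [b64[i:i + width] for i in range(0, len(b64), width)]
    return "\n".join(lines) + "\n"


def decode_content(content: str) -> bytes:
    """Decode a base64 content string; b64decode discards the newlines by default."""
    return base64.b64decode(content)


sample = "Example license text\nSecond line of the example\n"  # stand-in, not the real file
content = encode_like_api(sample)
print(content)                                  # base64 text, like sc_lic.content
print(decode_content(content).decode("utf-8"))  # readable text, like decoded_content
```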



## Repo Contributors

As with other parts of the API, we can access user objects for contributors, committers, and other people who interact with a repo. When you iterate through the list, you can read any user field (such as login, name, or company) for each person.

In [29]:
pa = g.get_repo("ossf/package-analysis")
pa_contrib = pa.get_contributors()
# get_contributors() returns a lazy PaginatedList rather than a Python list
print(pa_contrib)

<github.PaginatedList.PaginatedList object at 0x106d10790>


In [30]:
for person in pa_contrib:
    print(person.login, person.name, person.company)

dependabot[bot] None None
calebbrown Caleb Brown None
oliverchang Oliver Chang @google 
dlorenc None None
naveensrinivasan Naveen None
jordan-wright Jordan Wright None
Qinusty Josh None
maxfisher-g Max Fisher @google
tom--pollard Tom Pollard Codethink
david-a-wheeler David A. Wheeler Linux Foundation
case Eric Case nb.io
another-rex Rex P None
steiza Zach Steindler None
olivekl None None
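Free-text profile fields such as company are messy: None, @google, and Codethink all appear in the output above. A minimal sketch of tallying contributors by company, using the (login, name, company) values printed above as sample data. The normalization rules (trim whitespace, drop a leading @, lowercase, skip logins ending in [bot]) are my own assumptions. In a live run, note that reading name and company may cost an extra API request per user, since the contributor list only carries summary user data.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

Row = Tuple[str, Optional[str], Optional[str]]  # (login, name, company)

# Sample rows copied from the output above. In a live run you could build them with:
# rows = [(p.login, p.name, p.company) for p in pa_contrib]
sample_rows = [
    ("dependabot[bot]", None, None),
    ("calebbrown", "Caleb Brown", None),
    ("oliverchang", "Oliver Chang", "@google"),
    ("dlorenc", None, None),
    ("naveensrinivasan", "Naveen", None),
    ("jordan-wright", "Jordan Wright", None),
    ("Qinusty", "Josh", None),
    ("maxfisher-g", "Max Fisher", "@google"),
    ("tom--pollard", "Tom Pollard", "Codethink"),
    ("david-a-wheeler", "David A. Wheeler", "Linux Foundation"),
    ("case", "Eric Case", "nb.io"),
    ("another-rex", "Rex P", None),
    ("steiza", "Zach Steindler", None),
    ("olivekl", None, None),
]


def normalize_company(company: Optional[str]) -> str:
    """Assumed normalization: trim whitespace, drop a leading '@', lowercase."""
    if company is None or not company.strip():
        return "(none listed)"
    return company.strip().lstrip("@").lower()


def tally_companies(rows: Iterable[Row]) -> Counter:
    """Count human contributors per normalized company, skipping '[bot]' accounts."""
    return Counter(
        normalize_company(company)
        for login, _name, company in rows
        if not login.endswith("[bot]")
    )


for company, count in tally_companies(sample_rows).most_common():
    print(f"{count:3d}  {company}")
```

Most contributors here list no company at all, which is a good reminder that missing profile data is the norm rather than the exception.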


## PRs and Issues - not the data you might think!

The GitHub REST API considers every pull request to be an issue (but not every issue is a pull request), so issue counts and issue lists include open pull requests too.

In [31]:
# Example: Visit https://github.com/ossf/scorecard and look at open PRs / Issues

# open_issues_count is a combined count of open issues AND open PRs
print(sc.open_issues_count)

256


In [32]:
# You can get the PRs and issues this way
prs = sc.get_pulls()
issues = sc.get_issues()

In [33]:
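# Then you can count them, but note that these are only the open ones:
# get_pulls() and get_issues() both default to state="open".
# The issues total also includes the open PRs, so at the time this ran,
# only 256 - 12 = 244 of those "issues" were actual issues.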
print(prs.totalCount)
print(issues.totalCount)

12
256
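To separate the two yourself: in the raw JSON from the issues endpoint, items that are really pull requests carry a pull_request key, and PyGithub exposes the same information as Issue.pull_request, which should be None for ordinary issues. A minimal sketch of the split, using made-up dictionaries shaped like the REST response so it runs offline; `split_issues_and_prs` is a hypothetical helper name.

```python
from typing import Any, Dict, List, Tuple


def split_issues_and_prs(
    items: List[Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Split items from the REST issues endpoint into (true_issues, pull_requests)."""
    issues: List[Dict[str, Any]] = []
    prs: List[Dict[str, Any]] = []
    for item in items:
        # Items that are really pull requests carry a "pull_request" object.
        if "pull_request" in item:
            prs.append(item)
        else:
            issues.append(item)
    return issues, prs


# Made-up sample shaped like GET /repos/OWNER/REPO/issues output (fields trimmed).
sample_items = [
    {"number": 101, "title": "Example bug report"},
    {"number": 102, "title": "Example feature PR",
     "pull_request": {"url": "https://api.github.com/repos/OWNER/REPO/pulls/102"}},
    {"number": 103, "title": "Example docs question"},
]

true_issues, pull_requests = split_issues_and_prs(sample_items)
print(len(true_issues), "issues,", len(pull_requests), "pull requests")

# With PyGithub (network access required), a similar split would be roughly:
# real_issues = [i for i in sc.get_issues() if i.pull_request is None]
```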


In [34]:
# state="all" returns every PR, open and closed; state can also be "open" (the default) or "closed"
prs_all = sc.get_pulls(state="all")
print(prs_all.totalCount)

1632


In [35]:
# As with any PaginatedList, you can loop through it to get detailed info;
# PyGithub fetches additional pages from the API as you iterate
for pr in prs:
    print(pr.user.login, pr.title)

dependabot[bot] :seedling: Bump ossf/scorecard-action from 2.0.0.pre.alpha.2 to 2.0.4
raghavkaul 🌱 Split CI-Tests check into a raw and evaluation section
N8BWert :sparkles: Gitlab support (Part 2) - Tests
naveensrinivasan :seedling: Format the JSON output
ethanent 🐛 Skip in-progress gradle/wrapper-validation-action runs
naveensrinivasan :seedling: Fix maintainer activity upto last year
shissam :sparkles: Improved Security Policy Check
aidenwang9867 ✨ Feature [experimental]: The Scorecard Dependencydiff CLI (Version 0 Part 2)
aidenwang9867 ✨ Feature [experimental]: The Scorecard Dependencydiff CLI (Version 0 Part 1)
aidenwang9867 :sparkles: Support for C++ fuzz functions in the fuzzing check, add more const LanguageNames for clients
naveensrinivasan :sparkles: Validation for command-line flags
laurentsimon ✨ Support more SAST tools
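Behind the scenes, the REST API returns list results one page at a time (30 items per page by default) and advertises the other pages in the HTTP Link response header; PyGithub's PaginatedList follows those links for you as you loop. At 30 per page, the 1632 PRs counted above would span 55 pages. A minimal sketch of parsing such a header with the standard library, in case you ever call the API without PyGithub; the header value below is made up, following the documented `<url>; rel="next"` format, and the helper names are hypothetical.

```python
import re
from typing import Dict
from urllib.parse import parse_qs, urlparse

# Matches one '<url>; rel="name"' entry in a Link header.
LINK_RE = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def parse_link_header(header: str) -> Dict[str, str]:
    """Map each rel value (next, last, prev, first) to its URL."""
    return {rel: url for url, rel in LINK_RE.findall(header)}


def page_number(url: str) -> int:
    """Read the page query parameter from a pagination URL."""
    return int(parse_qs(urlparse(url).query)["page"][0])


# Made-up header value for illustration.
sample_header = (
    '<https://api.github.com/repos/ossf/scorecard/pulls?state=all&page=2>; rel="next", '
    '<https://api.github.com/repos/ossf/scorecard/pulls?state=all&page=55>; rel="last"'
)
links = parse_link_header(sample_header)
print(links["next"])
print("last page:", page_number(links["last"]))
```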


## Bonus Content: GitHub CLI API Calls for Repo Data

Reminder: You'll need to [install and configure](https://cli.github.com/manual/) the GitHub CLI before running this.

In [None]:
# In Jupyter, the ! prefix runs a shell / terminal command.
# You could just as easily run this in a terminal (without the !) instead of a notebook.

!gh api repos/ossf/scorecard

In [None]:
!gh api repos/ossf/scorecard/languages
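If you would rather keep the CLI output in Python than read it on screen, you can run gh api through subprocess and parse its JSON output. A minimal sketch, assuming gh is installed, on your PATH, and already authenticated; `gh_api_json` is a hypothetical helper, and the runner parameter exists only so the function can be tried without the CLI.

```python
import json
import subprocess
from typing import Any, Callable


def gh_api_json(endpoint: str, runner: Callable[..., Any] = subprocess.run) -> Any:
    """Run `gh api ENDPOINT` and return the parsed JSON response."""
    result = runner(
        ["gh", "api", endpoint],
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output as str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return json.loads(result.stdout)


# Usage (requires the GitHub CLI):
# langs = gh_api_json("repos/ossf/scorecard/languages")
# print(max(langs, key=langs.get))
```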

## Ethical Use Reminder

Please adhere to the GitHub Acceptable Use Policies:
https://docs.github.com/en/site-policy/acceptable-use-policies/github-acceptable-use-policies

## Key Takeaways

* Be careful about how you interpret PR / Issue data from repositories - it may not mean what you think it does!
* Validate what you think you're getting by comparing it against the website.
* There is much more repository information available through the API than shown here. I encourage you to explore what else is available.