Copyright 2022 VMware, Inc. 

SPDX-License-Identifier: BSD-2-Clause

# Gathering Data about GitHub Organizations (REST API)

Demo: getting data about organizations (description, creation date, …)

Learn More:

* [PyGithub organization documentation](https://pygithub.readthedocs.io/en/latest/github_objects/Organization.html)
* [GitHub REST API documentation for orgs](https://docs.github.com/en/rest/orgs/orgs)

In [None]:
# Setup: read personal access token from gh_key and create GitHub Instance
# You'll need to do this in each notebook

# Import PyGithub library
from github import Github

# Open your gh_key file and read the personal access token into a variable
with open('gh_key', 'r') as kf:
    key = kf.readline().rstrip() # remove newline & trailing whitespace

# Use your personal access token to create a GitHub instance
g = Github(key)
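
If the `gh_key` file is empty, the setup above passes an empty string to `Github`. The sketch below is one possible way to fail early with a clearer message. `read_token` is a hypothetical helper, not part of PyGithub, and the demo writes a made-up placeholder value to a temporary file so it runs without a real token.

```python
import os
import tempfile


def read_token(path="gh_key"):
    """Read a personal access token from the first line of a file."""
    with open(path, "r") as kf:
        token = kf.readline().strip()  # drop the newline and surrounding whitespace
    if not token:
        raise ValueError(f"No token found on the first line of {path!r}")
    return token


# Demo with a temporary file holding a placeholder value (not a real token)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "gh_key")
    with open(demo_path, "w") as f:
        f.write("not-a-real-token\n")
    demo_token = read_token(demo_path)

print("Read a token of length", len(demo_token))
```

In the notebook, this would presumably be used as `g = Github(read_token())`.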

In [None]:
# Get details about an organization
rh = g.get_organization("RedHatOfficial")
print(rh.name)
print(rh.description)
print(rh.created_at)
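
When you want several fields at once, it can help to collect them into a dictionary instead of printing them one by one. This is a minimal sketch: `org_summary` is a hypothetical helper, and the stand-in object uses made-up values so the code runs without a token. With the objects from this notebook you would call `org_summary(rh)`.

```python
from types import SimpleNamespace


def org_summary(org, fields=("login", "name", "description", "created_at", "public_repos")):
    """Return the requested attributes of an organization-like object as a dict."""
    # getattr with a default keeps the helper from failing when a field is absent
    return {field: getattr(org, field, None) for field in fields}


# Stand-in for an organization object, with made-up values
fake_org = SimpleNamespace(
    login="example-org",
    name="Example Org",
    description="A made-up organization",
)

summary = org_summary(fake_org)
for field, value in summary.items():
    print(f"{field}: {value}")
```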

In [None]:
# Show other fields here, for example the organization's blog URL
print(rh.blog)



## Get organization members

Note: This returns only public members of the organization unless your token has permission to see its private members.

In [None]:
# Get the members for the Red Hat org used above
rh_members = rh.get_members()

# Printing shows a PaginatedList object, not the members themselves
print(rh_members)
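
Because the result is a paginated object that is read as you iterate, you may want to look at only the first few items before looping over everything. A minimal sketch, assuming any iterable will do: `preview` is a hypothetical helper built on `itertools.islice`, and the generator below is a stand-in for the paginated result with made-up logins.

```python
from itertools import islice


def preview(items, n=5):
    """Return at most the first n items of any iterable as a list."""
    return list(islice(items, n))


def fake_members():
    # Stand-in for a paginated result: yields made-up logins one at a time
    for i in range(1, 101):
        yield f"user{i}"


first_three = preview(fake_members(), 3)
print(first_three)
```

With the objects from this notebook, the call would be `preview(rh_members, 3)`.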

In [None]:
# Need to loop through the results
# We can access any field for a user object as discussed earlier

for person in rh_members:
    print(person.login, person.name, person.updated_at)
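
Printing is fine for a quick look, but collecting the fields into rows makes them easier to sort or save later. The sketch below assumes only that each object has `login`, `name`, and `updated_at` attributes. `member_rows` is a hypothetical helper, and the stand-in objects use made-up values. Note that `name` is optional on GitHub profiles, so it may be `None`.

```python
from datetime import datetime
from types import SimpleNamespace


def member_rows(members):
    """Collect login, name, and updated_at for each member-like object."""
    rows = []
    for person in members:
        rows.append({
            "login": person.login,
            "name": person.name or "",  # replace a missing name with an empty string
            "updated_at": person.updated_at,
        })
    return rows


# Stand-ins for user objects, with made-up values
fake_people = [
    SimpleNamespace(login="alice", name="Alice Example", updated_at=datetime(2022, 4, 1)),
    SimpleNamespace(login="bob", name=None, updated_at=datetime(2022, 2, 15)),
]

rows = member_rows(fake_people)
for row in rows:
    print(row["login"], row["name"], row["updated_at"])
```

With the objects from this notebook, the call would be `member_rows(rh_members)`.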

## Get repos from an organization

In [None]:
# Get an organization object for the GH org, as we did above
lfph = g.get_organization("lfph")
print(lfph.name)

# Get the repos for that GH org
lfph_repos = lfph.get_repos()

In [None]:
# You can also do this in one step
lfph_repos = g.get_organization("lfph").get_repos()

In [None]:
# As expected, this is another paginated list object
print(lfph_repos)

In [None]:
# Need to loop through the results of the list
for repo in lfph_repos:
    print(repo.name, repo.updated_at, repo.pushed_at)

## Brief Caution about date fields in GitHub

Date fields don't always mean what you might expect.

Example: updated_at is the last time the object itself was updated, not the time of the most recent commit or PR:
* For users: the last time they updated their profile or other account info.
* For repos: pushed_at and updated_at are often different, as the output of the loop above shows.

I recommend manually verifying that the date is telling you what you think it is.
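
One way to start that verification is to measure how far apart the two dates are for each repo. This is only a sketch under stated assumptions: `date_gaps` is a hypothetical helper, it skips objects that are missing either date, and the stand-in repos use made-up values.

```python
from datetime import datetime, timedelta
from types import SimpleNamespace


def date_gaps(repos):
    """Return (name, updated_at - pushed_at) pairs, largest absolute gap first."""
    gaps = []
    for repo in repos:
        if repo.pushed_at is None or repo.updated_at is None:
            continue  # nothing to compare for this repo
        gaps.append((repo.name, repo.updated_at - repo.pushed_at))
    return sorted(gaps, key=lambda pair: abs(pair[1]), reverse=True)


# Stand-ins for repository objects, with made-up values
fake_repos = [
    SimpleNamespace(name="repo-b", updated_at=datetime(2022, 5, 10), pushed_at=datetime(2022, 5, 10)),
    SimpleNamespace(name="repo-a", updated_at=datetime(2022, 3, 1), pushed_at=datetime(2022, 1, 1)),
    SimpleNamespace(name="repo-c", updated_at=datetime(2022, 6, 1), pushed_at=None),
]

gaps = date_gaps(fake_repos)
for name, gap in gaps:
    print(name, gap)
```

With the objects from this notebook, the call would be `date_gaps(lfph_repos)`. A large gap is a hint to check the repo by hand, not proof of anything on its own.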

## Bonus Content: GitHub CLI API Calls for Organization Data

Reminder: You'll need to [install and configure](https://cli.github.com/manual/) the GitHub CLI before running this.

In [None]:
# ! is used to run a shell / terminal command.
# You could easily run this in a terminal, instead of a notebook.

!gh api orgs/redhatofficial

In [None]:
!gh api orgs/redhatofficial/members
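
The `gh api` command prints JSON, so you can also call it from Python and work with the parsed result. A minimal sketch, assuming the GitHub CLI is installed and configured when you use the real runner: `gh_api` is a hypothetical helper, and `fake_runner` stands in for `subprocess.run` with made-up data so the example runs without the CLI.

```python
import json
import subprocess


def gh_api(endpoint, runner=subprocess.run):
    """Run `gh api <endpoint>` and return the parsed JSON response."""
    result = runner(["gh", "api", endpoint], capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


def fake_runner(cmd, **kwargs):
    # Stand-in for subprocess.run: returns made-up JSON instead of calling gh
    payload = {"login": "example-org", "description": "A made-up organization"}
    return subprocess.CompletedProcess(cmd, 0, stdout=json.dumps(payload), stderr="")


org = gh_api("orgs/example-org", runner=fake_runner)
print(org["login"], "-", org["description"])
```

Dropping the `runner` argument, as in `gh_api("orgs/redhatofficial")`, would run the real CLI.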

## Ethical Use Reminder

Please adhere to the GitHub Acceptable Use Policies:
https://docs.github.com/en/site-policy/acceptable-use-policies/github-acceptable-use-policies

## Key Takeaways

* Be careful about how you use date fields from the GitHub API. They aren't well documented in the GH REST API documentation and should probably be manually verified.
* Repositories, users, and other objects retrieved through an organization object give you the same fields you would normally get from those objects.
* Your personal access token can only access the information that you have access to, so results for things like organizations might differ depending on whether you have special access to an org or only public access.