Copyright 2022 VMware, Inc. 

SPDX-License-Identifier: BSD-2-Clause

# Gathering Data about GitHub Organizations (REST API)

Demo: getting data about organizations (description, creation date, …)

Learn More:

* [PyGithub organization documentation](https://pygithub.readthedocs.io/en/latest/github_objects/Organization.html)
* [GitHub REST API documentation for orgs](https://docs.github.com/en/rest/orgs/orgs)

In [12]:
# Setup: read personal access token from gh_key and create GitHub Instance
# You'll need to do this in each notebook

# Import PyGithub library
from github import Github

# Open your gh_key file and read the personal access token into a variable
with open('gh_key', 'r') as kf:
    key = kf.readline().rstrip() # remove newline & trailing whitespace

# Use your personal access token to create a GitHub instance
g = Github(key)
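Since this setup is repeated in every notebook, the token-reading step can be wrapped in a small function that fails loudly when the file is missing or empty. This is a minimal sketch, not part of PyGithub: `read_token` is a hypothetical helper name, and the demonstration uses a throwaway file with a fake token so it runs without real credentials.

```python
from pathlib import Path
import tempfile


def read_token(path="gh_key"):
    """Return the first line of the key file with surrounding whitespace removed."""
    key_file = Path(path)
    if not key_file.is_file():
        raise FileNotFoundError(f"Token file not found: {key_file}")
    lines = key_file.read_text().splitlines()
    token = lines[0].strip() if lines else ""
    if not token:
        raise ValueError(f"Token file is empty: {key_file}")
    return token


# Demonstration with a temporary file and a fake token (not a real credential).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "gh_key"
    demo_path.write_text("fake-token-for-demo\n")
    demo_token = read_token(demo_path)

print(len(demo_token))
```

With a real gh_key file in place, the setup line would become `g = Github(read_token())`.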

In [13]:
# Get details about an organization
rh = g.get_organization("RedHatOfficial")
print(rh.name)
print(rh.description)
print(rh.created_at)

Red Hat
The official GitHub account for Red Hat.
2017-11-25 02:39:58


In [15]:
# Placeholder for the live demo: print other fields here (for example, rh.blog)
print()





## Get organization members

Note: This returns only public members unless your token has permission to see the organization's private members.

In [16]:
# Get the members for the Red Hat org used above
rh_members = rh.get_members()

# Printing the result shows a PaginatedList object, not the members themselves
print(rh_members)

<github.PaginatedList.PaginatedList object at 0x1063107f0>
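Because a paginated result is an iterable rather than a plain list, it can help to peek at the first few items before looping over everything. Here is a minimal sketch using `itertools.islice` from the standard library. The names `first_n` and `fake_paginated` are hypothetical, and the generator is only a stand-in for a paginated result (an assumption for illustration, not PyGithub's implementation) so the example runs without a token.

```python
from itertools import islice


def first_n(results, n):
    """Return the first n items from any iterable without consuming the rest."""
    return list(islice(results, n))


def fake_paginated(pages):
    """Stand-in generator (assumption): yields items one page at a time."""
    for page in pages:
        for item in page:
            yield item


# Two fake pages of logins taken from the member output in this notebook.
pages = [["bproffitt", "dmc5179"], ["eschabell", "Fryguy"]]
preview = first_n(fake_paginated(pages), 3)
print(preview)
```

With a live connection you would pass `rh_members` in place of the stand-in generator.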


In [17]:
# Need to loop through the results
# We can access any field for a user object as discussed earlier

for person in rh_members:
    print(person.login, person.name, person.updated_at)

bproffitt Brian Proffitt 2022-10-11 18:22:05
dmc5179 Dan Clark 2022-10-19 09:57:42
eschabell Eric D. Schabell 2022-10-14 09:02:07
Fryguy Jason Frey 2022-10-24 11:18:37
starryeyez024 Kendall Totten 2022-07-13 15:38:07
suehle Ruth Suehle 2022-10-11 18:29:50
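Printing is fine for a quick look, but the results are easier to reuse once they are collected into a list of dictionaries. This is a minimal sketch that assumes each object has `login`, `name`, and `updated_at` attributes, as the user objects above do. `members_to_rows` is a hypothetical helper name, and the stand-in objects replace `rh_members` so the example runs without a token.

```python
from datetime import datetime
from types import SimpleNamespace


def members_to_rows(members):
    """Collect login, name, and updated_at from user-like objects into dicts."""
    rows = []
    for person in members:
        rows.append({
            "login": person.login,
            "name": person.name,  # may be None if the user has not set a name
            "updated_at": person.updated_at,
        })
    return rows


# Stand-in objects (assumption) built from two rows of the output above.
sample_members = [
    SimpleNamespace(login="bproffitt", name="Brian Proffitt",
                    updated_at=datetime(2022, 10, 11, 18, 22, 5)),
    SimpleNamespace(login="suehle", name="Ruth Suehle",
                    updated_at=datetime(2022, 10, 11, 18, 29, 50)),
]

rows = members_to_rows(sample_members)
for row in rows:
    print(row["login"], row["name"], row["updated_at"])
```

With a live connection, `members_to_rows(rh_members)` would build the same structure from the real results.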


## Get repos from an organization

In [18]:
# Get a GitHub object for the GH org as we did above
lfph = g.get_organization("lfph")
print(lfph.name)

# Get the repos for that GH org
lfph_repos = lfph.get_repos()

LF Public Health


In [19]:
# You can also do this in one step
lfph_repos = g.get_organization("lfph").get_repos()

In [20]:
# As expected, this is another paginated list object
print(lfph_repos)

<github.PaginatedList.PaginatedList object at 0x10640bb50>


In [21]:
# Need to loop through the results of the list
for repo in lfph_repos:
    print(repo.name, repo.updated_at, repo.pushed_at)

lfph-landscape 2022-08-02 22:01:19 2022-10-28 04:53:31
artwork 2021-11-15 17:32:18 2021-11-15 17:32:14
lfph.io 2022-06-30 03:54:41 2022-10-18 15:26:57
sig-contributor-experience 2022-09-07 20:43:36 2021-04-16 18:11:16
implementers-forum 2022-10-06 05:16:31 2022-09-30 19:00:12
foundation 2022-01-04 17:48:46 2022-09-18 18:20:28
sig-design 2022-03-12 18:56:39 2021-09-08 16:27:27
events 2021-04-16 18:10:55 2021-04-16 18:10:50
tac 2022-09-01 21:53:13 2021-06-15 18:21:18
gaen-risk-scoring 2022-06-06 21:31:56 2022-06-07 16:14:10
enx-translations 2021-01-14 21:57:18 2022-07-18 20:29:11
cci-community 2021-02-25 18:09:12 2021-02-25 18:09:10
GCCN-POC 2022-06-17 16:21:42 2022-06-17 16:17:44
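The repos above come back in no obvious date order, so sorting can make the output easier to scan. This is a minimal sketch that sorts repo-like objects by `pushed_at`, newest first. `sort_by_pushed` is a hypothetical helper name, and the stand-in objects (built from three rows of the output above) replace `lfph_repos` so the example runs without a token.

```python
from datetime import datetime
from types import SimpleNamespace


def sort_by_pushed(repos):
    """Return repo-like objects sorted by pushed_at, most recent first."""
    return sorted(repos, key=lambda repo: repo.pushed_at, reverse=True)


# Stand-in objects (assumption) using pushed_at values from the output above.
sample_repos = [
    SimpleNamespace(name="lfph-landscape", pushed_at=datetime(2022, 10, 28, 4, 53, 31)),
    SimpleNamespace(name="artwork", pushed_at=datetime(2021, 11, 15, 17, 32, 14)),
    SimpleNamespace(name="lfph.io", pushed_at=datetime(2022, 10, 18, 15, 26, 57)),
]

sorted_names = [repo.name for repo in sort_by_pushed(sample_repos)]
print(sorted_names)
```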


## Brief Caution about date fields in GitHub

Sometimes they don't mean what you think they do. 

Example: updated_at is the last time the object itself was updated, not the time of the most recent commit or PR:
* For users: the last time they updated their profile or other account info.
* For repos: pushed_at and updated_at are often different. In the output above, sig-contributor-experience has an updated_at of 2022-09-07 but a pushed_at of 2021-04-16.

I recommend manually verifying that the date is telling you what you think it is.
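One way to start that verification is to compare the two date fields side by side and see which is later and by how much. This is a minimal sketch, assuming repo-like objects with `updated_at` and `pushed_at` datetime attributes. `compare_repo_dates` is a hypothetical helper name, and the stand-in objects use values copied from the repo output earlier in this notebook.

```python
from datetime import datetime
from types import SimpleNamespace


def compare_repo_dates(repo):
    """Return (name of the later field, whole days between the two dates)."""
    later = "updated_at" if repo.updated_at >= repo.pushed_at else "pushed_at"
    gap_days = abs(repo.updated_at - repo.pushed_at).days
    return later, gap_days


# Stand-in objects (assumption) with dates taken from the output above.
sample_repos = [
    SimpleNamespace(name="lfph-landscape",
                    updated_at=datetime(2022, 8, 2, 22, 1, 19),
                    pushed_at=datetime(2022, 10, 28, 4, 53, 31)),
    SimpleNamespace(name="sig-contributor-experience",
                    updated_at=datetime(2022, 9, 7, 20, 43, 36),
                    pushed_at=datetime(2021, 4, 16, 18, 11, 16)),
]

comparisons = {repo.name: compare_repo_dates(repo) for repo in sample_repos}
for name, (later, gap_days) in comparisons.items():
    print(name, later, gap_days)
```

A large gap in either direction is a hint that the two fields are tracking different events, which is worth checking against the repo's actual history.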

## Bonus Content: GitHub CLI API Calls for Organization Data

Reminder: You'll need to [install and configure](https://cli.github.com/manual/) the GitHub CLI before running this.

In [None]:
# ! is used to run a shell / terminal command.
# You could easily run this in a terminal, instead of a notebook.

!gh api orgs/redhatofficial

In [None]:
!gh api orgs/redhatofficial/members
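The same CLI calls can also be driven from Python rather than with `!`. This is a minimal sketch using `subprocess` from the standard library. `build_gh_api_command` and `run_gh_api` are hypothetical helper names, and the sketch assumes the `--paginate` and `--jq` options of `gh api`. Only the command is built and printed here, so nothing is sent to GitHub.

```python
import subprocess


def build_gh_api_command(endpoint, paginate=False, jq=None):
    """Build the argument list for a `gh api` call."""
    cmd = ["gh", "api", endpoint]
    if paginate:
        cmd.append("--paginate")  # ask gh to follow additional result pages
    if jq:
        cmd.extend(["--jq", jq])  # filter the JSON response with a jq expression
    return cmd


def run_gh_api(endpoint, **kwargs):
    """Run the command and return its output as text.

    Requires the GitHub CLI to be installed and authenticated.
    """
    result = subprocess.run(
        build_gh_api_command(endpoint, **kwargs),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Build (but do not run) a call that would list member logins.
members_cmd = build_gh_api_command(
    "orgs/redhatofficial/members", paginate=True, jq=".[].login"
)
print(" ".join(members_cmd))
```

Calling `run_gh_api("orgs/redhatofficial")` would then return the raw JSON text, which `json.loads` can turn into a dictionary.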

## Ethical Use Reminder

Please adhere to the GitHub Acceptable Use Policies:
https://docs.github.com/en/site-policy/acceptable-use-policies/github-acceptable-use-policies

## Key Takeaways

* Be careful about how you use date fields from the GitHub API. They aren't well documented in the GH REST API docs, so manually verify that each one means what you expect.
* Repositories, users, and other objects retrieved through an organization object expose the same fields they would if you fetched them directly.
* Your personal access token can only access the information that you have access to, so results for things like organizations might differ depending on whether you have special access to an org or only public access.