# Description

This notebook explores the capabilities of the GitHub REST API:
https://docs.github.com/en/rest?apiVersion=2022-11-28

The GitHub API exposes many more endpoints than are covered in this notebook.
The endpoints presented here are the ones assumed to have the most predictive power.

Endpoints used in this notebook (using `bitcoin` repo as an example):
1) https://api.github.com/repos/bitcoin/bitcoin - common repo info
2) https://api.github.com/repos/bitcoin/bitcoin/stats/commit_activity - last year of commit activity, grouped by week
3) https://api.github.com/repos/bitcoin/bitcoin/stats/code_frequency - historical weekly aggregate of additions and deletions
4) https://api.github.com/repos/bitcoin/bitcoin/stats/participation - weekly commit counts of the repo owner vs. all contributors for the last 52 weeks
5) https://api.github.com/repos/bitcoin/bitcoin/stats/punch_card - hourly commit count for each day of the week
6) https://api.github.com/repos/bitcoin/bitcoin/issues - open issues
7) https://api.github.com/search/repositories?q=blockchain - search repos by the `blockchain` keyword
8) https://api.github.com/rate_limit - rate limit status for the core, search, and other API resources

# Imports

In [1]:
import logging

import requests

import helpers.hdbg as hdbg
import helpers.henv as henv
import helpers.hprint as hprint

In [2]:
hdbg.init_logger(verbosity=logging.INFO)

_LOG = logging.getLogger(__name__)

_LOG.info("%s", henv.get_system_signature()[0])

hprint.config_notebook()

[0m[36mINFO[0m: > cmd='/venv/lib/python3.8/site-packages/ipykernel_launcher.py -f /home/.local/share/jupyter/runtime/kernel-31e4f1cc-9cfc-464d-88ee-0285b59703a1.json'
INFO  # Git
  branch_name='CmTask3220_Add_new_data_sources'
  hash='06f369e06'
  # Last commits:
    * 06f369e06 tamriq   CmTask3220: research github api                                   (60 minutes ago) Mon Dec 5 11:51:30 2022  (HEAD -> CmTask3220_Add_new_data_sources, origin/CmTask3220_Add_new_data_sources)
    * 356837ceb tamriq   CmTask3220: research github api                                   (   2 hours ago) Mon Dec 5 11:17:42 2022           
    * 97285c9af Juraj Smeriga Change labelling of bar [a, b) to 'b' (#3314)                     (    6 days ago) Tue Nov 29 18:34:09 2022           
# Machine info
  system=Linux
  node name=7f012608dfa4
  release=5.15.0-1023-aws
  version=#27~20.04.1-Ubuntu SMP Wed Oct 26 20:02:26 UTC 2022
  machine=x86_64
  processor=x86_64
  cpu count=8
  cpu freq=scpufreq(current=2499.

# Common repository info

In [4]:
common = requests.get("https://api.github.com/repos/bitcoin/bitcoin").json()
display(common)

{'id': 1181927,
 'node_id': 'MDEwOlJlcG9zaXRvcnkxMTgxOTI3',
 'name': 'bitcoin',
 'full_name': 'bitcoin/bitcoin',
 'private': False,
 'owner': {'login': 'bitcoin',
  'id': 528860,
  'node_id': 'MDEyOk9yZ2FuaXphdGlvbjUyODg2MA==',
  'avatar_url': 'https://avatars.githubusercontent.com/u/528860?v=4',
  'gravatar_id': '',
  'url': 'https://api.github.com/users/bitcoin',
  'html_url': 'https://github.com/bitcoin',
  'followers_url': 'https://api.github.com/users/bitcoin/followers',
  'following_url': 'https://api.github.com/users/bitcoin/following{/other_user}',
  'gists_url': 'https://api.github.com/users/bitcoin/gists{/gist_id}',
  'starred_url': 'https://api.github.com/users/bitcoin/starred{/owner}{/repo}',
  'subscriptions_url': 'https://api.github.com/users/bitcoin/subscriptions',
  'organizations_url': 'https://api.github.com/users/bitcoin/orgs',
  'repos_url': 'https://api.github.com/users/bitcoin/repos',
  'events_url': 'https://api.github.com/users/bitcoin/events{/privacy}',
  'rece

# Stars

Get the current number of stars for the repository.

In [5]:
display(common["stargazers_count"])

67250
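The same response carries other counters that could serve as features alongside the star count. The sketch below picks a few of them from a response dict. Apart from `stargazers_count`, the field names are assumptions taken from GitHub's repository schema, since the output above is truncated, and `extract_repo_features` is a hypothetical helper.

```python
from typing import Any, Dict, Iterable

# Field names other than `stargazers_count` are assumed from GitHub's
# repository schema; they are not visible in the truncated output above.
DEFAULT_FIELDS = (
    "stargazers_count",
    "forks_count",
    "open_issues_count",
    "subscribers_count",
    "created_at",
    "pushed_at",
)


def extract_repo_features(
    repo: Dict[str, Any], fields: Iterable[str] = DEFAULT_FIELDS
) -> Dict[str, Any]:
    """Return the selected fields; fields missing in the response map to None."""
    return {name: repo.get(name) for name in fields}


# A small stand-in for the `common` dict loaded above.
sample_repo = {
    "name": "bitcoin",
    "stargazers_count": 67250,
    "created_at": "2010-12-19T15:16:43Z",
}
features = extract_repo_features(sample_repo)
print(features)
```

In the notebook, `extract_repo_features(common)` would apply the same selection to the real response.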

# Commits

## /commit_activity

Returns the last year of commit activity grouped by week. The days array is a group of commits per day, starting on Sunday.



In [6]:
commits_yearly = requests.get(
    "https://api.github.com/repos/bitcoin/bitcoin/stats/commit_activity"
).json()
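GitHub computes repository statistics in the background. When the data is not cached yet, the `/stats/*` endpoints answer with HTTP 202 and an empty body, so calling `.json()` directly may not yield the expected list. A minimal retry sketch follows; `fetch_stats`, its parameters, and the retry policy are hypothetical choices, not part of the original notebook.

```python
import time
from typing import Any, Callable, Optional


def fetch_stats(
    get_response: Callable[[], Any],
    max_attempts: int = 5,
    delay_sec: float = 2.0,
) -> Optional[Any]:
    """
    Call `get_response()` until it stops returning HTTP 202.

    `get_response` is any zero-argument callable that returns an object
    with `status_code` and `json()`, e.g. a wrapped `requests.get` call.
    Return the parsed body, or None if the stats never became available.
    """
    for attempt in range(max_attempts):
        response = get_response()
        if response.status_code == 200:
            return response.json()
        if response.status_code != 202:
            raise RuntimeError(f"Unexpected status code: {response.status_code}")
        # 202 means the statistics are still being computed: wait and retry.
        if attempt < max_attempts - 1:
            time.sleep(delay_sec)
    return None


# Usage in this notebook (requires network access):
# commits_yearly = fetch_stats(
#     lambda: requests.get(
#         "https://api.github.com/repos/bitcoin/bitcoin/stats/commit_activity"
#     )
# )
```

Passing the request as a callable keeps the retry logic independent of the HTTP library and easy to exercise without network access.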

For example, in the array `[8, 11, 10, 25, 5, 13, 2]`, 8 is the number of commits on Sunday, 11 on Monday, 10 on Tuesday, 25 on Wednesday, 5 on Thursday, 13 on Friday, and 2 on Saturday.

In [7]:
display(commits_yearly[:5])

[{'total': 66, 'week': 1639267200, 'days': [8, 13, 17, 9, 7, 7, 5]},
 {'total': 42, 'week': 1639872000, 'days': [4, 1, 8, 13, 6, 7, 3]},
 {'total': 20, 'week': 1640476800, 'days': [1, 7, 1, 1, 5, 3, 2]},
 {'total': 55, 'week': 1641081600, 'days': [11, 11, 7, 12, 9, 5, 0]},
 {'total': 55, 'week': 1641686400, 'days': [0, 7, 17, 8, 8, 10, 5]}]
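The `week` value is a Unix timestamp marking the start of the week, and `days` follows the Sunday-first order described above. A small sketch that turns one entry into a dated, labeled record; `expand_week` is a hypothetical helper and the sample is the first entry of the output above.

```python
import datetime
from typing import Any, Dict

# Sunday-first order, matching the `days` array of the endpoint.
WEEKDAYS = ("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")


def expand_week(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Convert the week timestamp to a UTC date and label the daily counts."""
    week_start = datetime.datetime.fromtimestamp(
        entry["week"], tz=datetime.timezone.utc
    ).date()
    return {
        "week_start": week_start.isoformat(),
        "total": entry["total"],
        "by_weekday": dict(zip(WEEKDAYS, entry["days"])),
    }


sample_week = {"total": 66, "week": 1639267200, "days": [8, 13, 17, 9, 7, 7, 5]}
week = expand_week(sample_week)
print(week)
```

Applying `expand_week` to every element of `commits_yearly` gives records that are easier to load into a dataframe.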

## /code_frequency

Returns a historical weekly aggregate of the number of additions and deletions pushed to a repository.

In [9]:
all_commits_weekly_aggregated = requests.get(
    "https://api.github.com/repos/bitcoin/bitcoin/stats/code_frequency"
).json()

In [12]:
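# The first week starts on Sun Aug 30 2009 00:00:00 GMT+0000, although the common info says
# that the repository was created on '2010-12-19T15:16:43Z'.
# Presumably the stats follow commit dates, and the commit history predates the GitHub repository.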
print("First five weeks:")
display(all_commits_weekly_aggregated[:5])
print("Last five weeks:")
display(all_commits_weekly_aggregated[-5:])

First five weeks:


[[1251590400, 64482, 0],
 [1252195200, 0, 0],
 [1252800000, 0, 0],
 [1253404800, 6724, -4890],
 [1254009600, 396, -50]]

Last five weeks:


[[1667692800, 1320, -1055],
 [1668297600, 1015, -317],
 [1668902400, 563, -453],
 [1669507200, 108, -72],
 [1670112000, 0, 0]]
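Each row has the form `[week timestamp, additions, deletions]`, with deletions reported as negative numbers. The sketch below converts rows into labeled records with a net line change; `parse_code_frequency` is a hypothetical helper and the sample rows are copied from the output above.

```python
import datetime
from typing import Any, Dict, List


def parse_code_frequency(rows: List[List[int]]) -> List[Dict[str, Any]]:
    """Label each row and compute the net number of changed lines."""
    parsed = []
    for timestamp, additions, deletions in rows:
        week_start = datetime.datetime.fromtimestamp(
            timestamp, tz=datetime.timezone.utc
        ).date()
        parsed.append(
            {
                "week_start": week_start.isoformat(),
                "additions": additions,
                # The API reports deletions as negative numbers.
                "deletions": abs(deletions),
                "net": additions + deletions,
            }
        )
    return parsed


sample_rows = [[1251590400, 64482, 0], [1253404800, 6724, -4890]]
code_frequency = parse_code_frequency(sample_rows)
print(code_frequency)
```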

## /participation

Returns the weekly commit counts for the `owner` and for `all` over the last 52 weeks. `all` is everyone combined, including the owner. To get the commit counts for non-owners, subtract `owner` from `all`.

The array order is oldest week (index 0) to most recent week.

In [13]:
total_commits = requests.get(
    "https://api.github.com/repos/bitcoin/bitcoin/stats/participation"
).json()
display(total_commits)

{'all': [133,
  103,
  58,
  51,
  86,
  71,
  86,
  89,
  94,
  97,
  109,
  88,
  55,
  97,
  91,
  117,
  90,
  129,
  96,
  98,
  115,
  64,
  62,
  103,
  77,
  62,
  81,
  71,
  89,
  102,
  60,
  78,
  93,
  73,
  53,
  69,
  64,
  50,
  56,
  49,
  71,
  54,
  41,
  62,
  51,
  41,
  50,
  27,
  15,
  33,
  26,
  39],
 'owner': [0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0,
  0]}
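The subtraction of `owner` from `all` can be sketched as below. `non_owner_commits` is a hypothetical helper. For this repository the `owner` counts are all zero, presumably because the owner is an organization account, so the result equals `all`.

```python
from typing import Dict, List


def non_owner_commits(participation: Dict[str, List[int]]) -> List[int]:
    """Return weekly commit counts by everyone except the owner."""
    return [
        total - owner
        for total, owner in zip(participation["all"], participation["owner"])
    ]


# First three weeks from the output above.
sample_participation = {"all": [133, 103, 58], "owner": [0, 0, 0]}
print(non_owner_commits(sample_participation))
```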

## /punch_card

Get the hourly commit count for each day of the week.

Each array contains the day number, hour number, and number of commits:
1) `0-6`: day of the week, Sunday to Saturday
2) `0-23`: hour of the day
3) number of commits

For example, `[2, 14, 25]` indicates that there were 25 total commits during the 2:00pm hour on Tuesdays. All times are based on the time zone of individual commits.

In [14]:
hourly_commits = requests.get(
    "https://api.github.com/repos/bitcoin/bitcoin/stats/punch_card"
).json()
display(hourly_commits)

[[0, 0, 48],
 [0, 1, 40],
 [0, 2, 41],
 [0, 3, 9],
 [0, 4, 16],
 [0, 5, 12],
 [0, 6, 12],
 [0, 7, 9],
 [0, 8, 22],
 [0, 9, 46],
 [0, 10, 79],
 [0, 11, 98],
 [0, 12, 90],
 [0, 13, 84],
 [0, 14, 98],
 [0, 15, 99],
 [0, 16, 93],
 [0, 17, 104],
 [0, 18, 101],
 [0, 19, 81],
 [0, 20, 90],
 [0, 21, 94],
 [0, 22, 80],
 [0, 23, 72],
 [1, 0, 59],
 [1, 1, 43],
 [1, 2, 28],
 [1, 3, 20],
 [1, 4, 12],
 [1, 5, 13],
 [1, 6, 6],
 [1, 7, 20],
 [1, 8, 58],
 [1, 9, 161],
 [1, 10, 179],
 [1, 11, 223],
 [1, 12, 202],
 [1, 13, 231],
 [1, 14, 344],
 [1, 15, 331],
 [1, 16, 301],
 [1, 17, 217],
 [1, 18, 189],
 [1, 19, 154],
 [1, 20, 135],
 [1, 21, 126],
 [1, 22, 106],
 [1, 23, 92],
 [2, 0, 80],
 [2, 1, 40],
 [2, 2, 34],
 [2, 3, 18],
 [2, 4, 16],
 [2, 5, 9],
 [2, 6, 20],
 [2, 7, 53],
 [2, 8, 68],
 [2, 9, 177],
 [2, 10, 232],
 [2, 11, 254],
 [2, 12, 245],
 [2, 13, 273],
 [2, 14, 298],
 [2, 15, 369],
 [2, 16, 289],
 [2, 17, 305],
 [2, 18, 196],
 [2, 19, 177],
 [2, 20, 159],
 [2, 21, 150],
 [2, 22, 117],
 [2, 23, 8
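The `[day, hour, commits]` triples can be reduced to per-day totals and a busiest slot. A minimal sketch, where `summarize_punch_card` is a hypothetical helper and the sample rows are a few entries copied from the output above:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Day numbers 0-6 map to Sunday-Saturday.
WEEKDAYS = ("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")


def summarize_punch_card(
    rows: List[List[int]],
) -> Tuple[Dict[str, int], Tuple[str, int, int]]:
    """Return commits per weekday and the (weekday, hour, commits) peak."""
    per_day: Dict[str, int] = defaultdict(int)
    for day, _hour, commits in rows:
        per_day[WEEKDAYS[day]] += commits
    peak_day, peak_hour, peak_commits = max(rows, key=lambda row: row[2])
    return dict(per_day), (WEEKDAYS[peak_day], peak_hour, peak_commits)


sample_rows = [[0, 0, 48], [1, 14, 344], [1, 15, 331], [2, 15, 369]]
per_day, peak = summarize_punch_card(sample_rows)
print(per_day, peak)
```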

# Issues

List issues in a repository. By default, only open issues are listed.

Note: GitHub's REST API considers every pull request an issue, but not every issue is a pull request. For this reason, "Issues" endpoints may return both issues and pull requests in the response. You can identify pull requests by the pull_request key. Be aware that the id of a pull request returned from "Issues" endpoints will be an issue id. To find out the pull request id, use the "List pull requests" endpoint.
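Since pull requests are marked by the `pull_request` key, the two kinds of items can be separated with a simple filter. A sketch under that assumption; `split_issues_and_pulls` is a hypothetical helper and the sample items are made-up stand-ins for real responses.

```python
from typing import Any, Dict, List, Tuple


def split_issues_and_pulls(
    items: List[Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Split "Issues" endpoint items into plain issues and pull requests."""
    plain_issues = [item for item in items if "pull_request" not in item]
    pulls = [item for item in items if "pull_request" in item]
    return plain_issues, pulls


# Made-up items that mimic the shape of the response.
sample_items = [
    {"number": 1, "title": "Bug report"},
    {"number": 2, "title": "Fix bug", "pull_request": {"url": "..."}},
    {"number": 3, "title": "Feature request"},
]
plain_issues, pulls = split_issues_and_pulls(sample_items)
print(len(plain_issues), len(pulls))
```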

In [15]:
issues = requests.get(
    "https://api.github.com/repos/bitcoin/bitcoin/issues"
).json()

In [16]:
display(len(issues))

30
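The count of 30 is the default page size, not the total number of open issues. The list endpoints accept `per_page` (up to 100) and `page` query parameters, so collecting all items means walking the pages until an empty one comes back. A minimal sketch; `fetch_all_pages` and the `max_pages` cap are hypothetical choices meant to keep the number of requests bounded.

```python
from typing import Any, Callable, List


def fetch_all_pages(
    fetch_page: Callable[[int], List[Any]], max_pages: int = 10
) -> List[Any]:
    """
    Collect items page by page until an empty page or `max_pages` is reached.

    `fetch_page` takes a 1-based page number and returns the list of items
    on that page.
    """
    items: List[Any] = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:
            break
        items.extend(batch)
    return items


# Usage in this notebook (requires network access):
# all_issues = fetch_all_pages(
#     lambda page: requests.get(
#         "https://api.github.com/repos/bitcoin/bitcoin/issues",
#         params={"state": "open", "per_page": 100, "page": page},
#     ).json()
# )
```

Each page costs one request against the core rate limit, which is why the sketch caps the number of pages.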

# Search

The Search API has a custom rate limit. For requests using Basic Authentication, OAuth, or client ID and secret, you can make up to 30 requests per minute. For unauthenticated requests, the rate limit allows you to make up to 10 requests per minute.
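All requests in this notebook are unauthenticated. To use the higher limits, a token can be sent in the `Authorization` header. The sketch below builds the headers; `build_headers` is a hypothetical helper, and reading the token from a `GITHUB_TOKEN` environment variable is an assumption about the setup rather than something this notebook does.

```python
import os
from typing import Dict, Optional


def build_headers(token: Optional[str] = None) -> Dict[str, str]:
    """Build request headers, adding authentication when a token is given."""
    headers = {
        "Accept": "application/vnd.github+json",
        # Same API version as in the documentation link at the top.
        "X-GitHub-Api-Version": "2022-11-28",
    }
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


# `GITHUB_TOKEN` is an assumed variable name; without it the request
# stays unauthenticated.
headers = build_headers(os.environ.get("GITHUB_TOKEN"))
# requests.get("https://api.github.com/rate_limit", headers=headers)
```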

## Repositories

Search is not limited to repositories. The Search API can also search:
1) Code `https://api.github.com/search/code?q=Q`
2) Labels in a specific repo `https://api.github.com/search/labels?repository_id=REPOSITORY_ID&q=Q`
3) Issues and PRs `https://api.github.com/search/issues?q=Q`
4) Commits `https://api.github.com/search/commits?q=Q`
5) Users `https://api.github.com/search/users?q=Q`
6) Topics `https://api.github.com/search/topics?q=Q`

In [18]:
query = "blockchain"
search_repos = requests.get(
    f"https://api.github.com/search/repositories?q={query}"
).json()
display(search_repos["total_count"])

152676
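A bare keyword matches a very large number of repositories. The `q` parameter also accepts qualifiers such as `language:` and `stars:`, and the endpoint takes `sort`, `order`, and `per_page` parameters, which helps narrow the results. The sketch below only builds the URL; `build_repo_search_url` and its argument names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def build_repo_search_url(
    keyword: str,
    language: Optional[str] = None,
    min_stars: Optional[int] = None,
    sort: str = "stars",
    order: str = "desc",
    per_page: int = 30,
) -> str:
    """Build a repository search URL with optional qualifiers."""
    terms = [keyword]
    if language:
        terms.append(f"language:{language}")
    if min_stars is not None:
        terms.append(f"stars:>={min_stars}")
    params = {
        # Qualifiers are space-separated inside `q`; urlencode escapes them.
        "q": " ".join(terms),
        "sort": sort,
        "order": order,
        "per_page": per_page,
    }
    return "https://api.github.com/search/repositories?" + urlencode(params)


search_url = build_repo_search_url("blockchain", language="python", min_stars=1000)
print(search_url)
```

The resulting URL can be passed to `requests.get` in the same way as in the cell above.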

In [19]:
# First result for "blockchain" search query.
display(search_repos["items"][:1])

[{'id': 104670977,
  'node_id': 'MDEwOlJlcG9zaXRvcnkxMDQ2NzA5Nzc=',
  'name': 'blockchain',
  'full_name': 'dvf/blockchain',
  'private': False,
  'owner': {'login': 'dvf',
   'id': 1169974,
   'node_id': 'MDQ6VXNlcjExNjk5NzQ=',
   'avatar_url': 'https://avatars.githubusercontent.com/u/1169974?v=4',
   'gravatar_id': '',
   'url': 'https://api.github.com/users/dvf',
   'html_url': 'https://github.com/dvf',
   'followers_url': 'https://api.github.com/users/dvf/followers',
   'following_url': 'https://api.github.com/users/dvf/following{/other_user}',
   'gists_url': 'https://api.github.com/users/dvf/gists{/gist_id}',
   'starred_url': 'https://api.github.com/users/dvf/starred{/owner}{/repo}',
   'subscriptions_url': 'https://api.github.com/users/dvf/subscriptions',
   'organizations_url': 'https://api.github.com/users/dvf/orgs',
   'repos_url': 'https://api.github.com/users/dvf/repos',
   'events_url': 'https://api.github.com/users/dvf/events{/privacy}',
   'received_events_url': 'https:

# Rate Limit

The Search API has a custom rate limit, separate from the rate limit governing the rest of the REST API. The GraphQL API also has a custom rate limit that is separate from and calculated differently than rate limits in the REST API.

For these reasons, the Rate Limit API response categorizes your rate limit. Under `resources`, you'll see four objects:
1) The `core` object provides your rate limit status for all non-search-related resources in the REST API.
2) The `search` object provides your rate limit status for the Search API.
3) The `graphql` object provides your rate limit status for the GraphQL API.
4) The `integration_manifest` object provides your rate limit status for the GitHub App Manifest code conversion endpoint.

In [21]:
rate_limit = requests.get("https://api.github.com/rate_limit").json()
display(rate_limit)

{'resources': {'core': {'limit': 60,
   'remaining': 53,
   'reset': 1670248296,
   'used': 7,
   'resource': 'core'},
  'graphql': {'limit': 0,
   'remaining': 0,
   'reset': 1670248490,
   'used': 0,
   'resource': 'graphql'},
  'integration_manifest': {'limit': 5000,
   'remaining': 5000,
   'reset': 1670248490,
   'used': 0,
   'resource': 'integration_manifest'},
  'search': {'limit': 10,
   'remaining': 8,
   'reset': 1670244900,
   'used': 2,
   'resource': 'search'}},
 'rate': {'limit': 60,
  'remaining': 53,
  'reset': 1670248296,
  'used': 7,
  'resource': 'core'}}
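The `reset` values are Unix timestamps, so the response can be used to decide how long to wait before the next request. A minimal sketch; `describe_limit` is a hypothetical helper and the sample reuses the `search` values from the output above.

```python
import datetime
import time
from typing import Any, Dict, Optional


def describe_limit(
    rate_limit: Dict[str, Any], resource: str, now: Optional[float] = None
) -> Dict[str, Any]:
    """Summarize one resource: remaining calls, reset time, suggested wait."""
    info = rate_limit["resources"][resource]
    if now is None:
        now = time.time()
    reset_at = datetime.datetime.fromtimestamp(
        info["reset"], tz=datetime.timezone.utc
    )
    # Waiting is only needed when no requests are left in the window.
    wait_sec = max(0, info["reset"] - now) if info["remaining"] == 0 else 0
    return {
        "remaining": info["remaining"],
        "reset_at": reset_at.isoformat(),
        "wait_sec": wait_sec,
    }


sample_rate_limit = {
    "resources": {
        "search": {"limit": 10, "remaining": 8, "reset": 1670244900, "used": 2}
    }
}
search_limit = describe_limit(sample_rate_limit, "search", now=1670244800)
print(search_limit)
```

In the notebook, `describe_limit(rate_limit, "core")` would report the status for the non-search endpoints.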