# **GitHub API Data Extraction**
This notebook demonstrates how to authenticate with the GitHub REST API v3, extract data from the public repository `josephmisiti/awesome-machine-learning`, and handle pagination, rate limits, and errors.

## **1. Authentication setup**

In [27]:
import requests
from getpass import getpass
import time

# Prompt for GitHub token securely
GITHUB_TOKEN = getpass("Enter your github token:")
HEADERS = {
    "Authorization": f"token {GITHUB_TOKEN}",
    "Accept": "application/vnd.github.v3+json"
}

def verify_token():
    """Return True if GitHub accepts the token, False otherwise."""
    test_url = "https://api.github.com/user"
    # A timeout keeps the cell from hanging if the API is unreachable
    response = requests.get(test_url, headers=HEADERS, timeout=10)
    if response.status_code == 200:
        print("Authorized. Token is valid.")
        return True
    elif response.status_code == 401:
        print("Unauthorized. Please check your token.")
        return False
    else:
        print(f"Unexpected error: {response.status_code}")
        return False

verify_token()

REPO_OWNER = "josephmisiti"
REPO_NAME = "awesome-machine-learning"
BASE_URL = f"https://api.github.com/repos/{REPO_OWNER}/{REPO_NAME}"



Enter your github token:··········
Unauthorized. Please check your token.
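
An interactive prompt is not available when the notebook runs unattended. A minimal sketch of an alternative, assuming the token is stored in an environment variable named `GITHUB_TOKEN` (the variable name and the `build_headers` helper are illustrative and not part of the cells above):

```python
import os


def build_headers(token=None, env_var="GITHUB_TOKEN"):
    """Build request headers for the GitHub REST API.

    `build_headers` and `env_var` are hypothetical names for this sketch.
    An explicit `token` wins; otherwise the environment variable is used.
    """
    token = token or os.environ.get(env_var)
    headers = {"Accept": "application/vnd.github.v3+json"}
    if token:
        # Same "token <value>" scheme as the HEADERS dict above
        headers["Authorization"] = f"token {token}"
    return headers


print(sorted(build_headers("example-token")))
```

If no token is found, the sketch omits the `Authorization` header. Unauthenticated requests still work for public repositories, but with a much lower rate limit.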


## **2. Repository information**

In [21]:
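def get_repo_info():
    """Return the repository metadata as a dict."""
    url = BASE_URL
    response = requests.get(url, headers=HEADERS, timeout=10)
    # Fail loudly on 4xx/5xx instead of returning an error payload
    response.raise_for_status()
    return response.json()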

repo_info = get_repo_info()
print("Repository name:", repo_info["full_name"])
print("Stars:", repo_info["stargazers_count"])
print("Forks:", repo_info["forks_count"])
print("Created at:", repo_info["created_at"])


Repository name: josephmisiti/awesome-machine-learning
Stars: 68678
Forks: 14974
Created at: 2014-07-15T19:11:19Z
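
The repository payload contains many more fields than the four printed above. A small sketch that keeps only a chosen subset, assuming a hypothetical helper named `summarize_repo`. The sample dictionary is written by hand from the output above and is not a live response:

```python
def summarize_repo(repo, fields=("full_name", "stargazers_count",
                                 "forks_count", "created_at")):
    """Return only `fields` from a repository payload.

    Keys missing from the payload map to None instead of raising KeyError.
    """
    return {field: repo.get(field) for field in fields}


# Hand-written sample using the values printed above, plus one extra key
sample_repo = {
    "full_name": "josephmisiti/awesome-machine-learning",
    "stargazers_count": 68678,
    "forks_count": 14974,
    "created_at": "2014-07-15T19:11:19Z",
    "private": False,
}

summary = summarize_repo(sample_repo)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Using `dict.get` means an error payload (which lacks these keys) yields `None` values rather than an exception, so the caller can decide how to react.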


## **3. Commits with pagination**

In [22]:
def get_commits(per_page=100, max_pages=2):
    """Fetch up to `max_pages` pages of commits, `per_page` per page (API maximum: 100)."""
    commits = []
    url = f"{BASE_URL}/commits"
    for page in range(1, max_pages + 1):
        # Let requests build the query string
        params = {"per_page": per_page, "page": page}
        response = requests.get(url, headers=HEADERS, params=params, timeout=10)
        if response.status_code != 200:
            print("Error on page", page, "-", response.status_code)
            break
        commits_page = response.json()
        if not commits_page:
            break  # an empty page means there are no more commits
        commits.extend(commits_page)
        print(f"Page {page}: {len(commits_page)} commits")
    return commits

commits = get_commits()
print("Total commits retrieved:", len(commits))
if commits:
    print("Sample commit message:", commits[0]["commit"]["message"])


Page 1: 100 commits
Page 2: 100 commits
Total commits retrieved: 200
Sample commit message: Merge pull request #1051 from Morgan-Sell/master

Add "Python Feature Engineering Cookbook" to the ML section on books.md
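
The loop above stops after a fixed number of pages. GitHub also returns a `Link` response header whose `rel="next"` entry points to the following page, so a client can keep going until that entry disappears. A minimal sketch of that approach; `parse_link_header`, `iter_pages`, and the `fetch` callable are hypothetical names introduced here so the logic can run without a network connection:

```python
import re

# Matches entries such as: <https://api.github.com/...&page=2>; rel="next"
_LINK_RE = re.compile(r'<([^>]+)>;\s*rel="([^"]+)"')


def parse_link_header(value):
    """Map each rel name in a Link header ("next", "last", ...) to its URL."""
    return {rel: url for url, rel in _LINK_RE.findall(value or "")}


def iter_pages(first_url, fetch):
    """Yield one list of items per page until no rel="next" link remains.

    `fetch` is an assumed callable that takes a URL and returns
    (items, link_header_or_None).
    """
    url = first_url
    while url:
        items, link_header = fetch(url)
        yield items
        url = parse_link_header(link_header).get("next")


# With requests and the HEADERS dict above, `fetch` could look like this:
#
#   def fetch(url):
#       r = requests.get(url, headers=HEADERS, timeout=10)
#       r.raise_for_status()
#       return r.json(), r.headers.get("Link")
```

The `requests` library also exposes the parsed header as `response.links`, which can replace `parse_link_header` when a `Response` object is at hand.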


## **4. Repository contents**

In [23]:
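def get_contents(path=""):
    """Return the listing for a directory `path` (a list) or the object for a file (a dict)."""
    url = f"{BASE_URL}/contents/{path}"
    response = requests.get(url, headers=HEADERS, timeout=10)
    # Fail loudly on 4xx/5xx so the loop below never iterates over an error payload
    response.raise_for_status()
    return response.json()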

contents = get_contents()
print("Root directory content:")
for item in contents:
    print("-", item["name"], "(type:", item["type"] + ")")

Root directory content:
- LICENSE (type: file)
- README.md (type: file)
- blogs.md (type: file)
- books.md (type: file)
- courses.md (type: file)
- events.md (type: file)
- meetups.md (type: file)
- ml-curriculum.md (type: file)
- scripts (type: dir)
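
When `path` points to a file instead of a directory, the contents endpoint returns a single object whose `content` field is Base64-encoded. A minimal sketch for decoding it, assuming a hypothetical helper named `decode_file`. The sample object is built by hand and is not taken from the repository:

```python
import base64


def decode_file(item):
    """Decode the Base64 `content` of a file object into text (UTF-8 assumed)."""
    if item.get("type") != "file" or item.get("encoding") != "base64":
        raise ValueError("expected a Base64-encoded file object")
    # b64decode skips the line breaks that may appear in the encoded text
    return base64.b64decode(item["content"]).decode("utf-8")


# Hand-built stand-in for a file response
sample_file = {
    "type": "file",
    "encoding": "base64",
    "content": base64.encodebytes(b"# Sample heading\n").decode("ascii"),
}

print(decode_file(sample_file))
```

Directory entries such as `scripts` carry no `content` field, which is why the helper checks `type` before decoding.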


## **5. Rate limit status**

In [24]:
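def check_rate_limit():
    """Print the remaining request quota and the reset time (UTC epoch seconds)."""
    url = "https://api.github.com/rate_limit"
    response = requests.get(url, headers=HEADERS, timeout=10)
    if response.status_code == 200:
        # `resources.core` covers most REST endpoints; the top-level `rate` object is deprecated
        core = response.json()["resources"]["core"]
        print("Remaining requests:", core["remaining"])
        print("Rate limit resets at:", core["reset"])
    else:
        print("Failed to retrieve rate limit status.")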

check_rate_limit()

Remaining requests: 4986
Rate limit resets at: 1751059319
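
The reset value is a Unix timestamp in UTC epoch seconds, which is hard to read at a glance. A short sketch that converts it to a timestamp and computes how long to wait; `describe_reset` is a hypothetical helper name and the input is the value printed above:

```python
from datetime import datetime, timezone


def describe_reset(reset_epoch, now=None):
    """Return (reset time as an aware UTC datetime, seconds left to wait)."""
    now = now or datetime.now(timezone.utc)
    reset_at = datetime.fromtimestamp(reset_epoch, tz=timezone.utc)
    # Clamp at zero when the reset time is already in the past
    wait_seconds = max(0.0, (reset_at - now).total_seconds())
    return reset_at, wait_seconds


reset_at, wait_seconds = describe_reset(1751059319)
print("Resets at:", reset_at.isoformat())  # 2025-06-27T21:21:59+00:00
print("Seconds to wait:", wait_seconds)
```

A caller that has run out of requests could pass `wait_seconds` to `time.sleep` before retrying.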


## **6. Retry with exponential backoff**

In [25]:
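def safe_get(url, max_retries=3, timeout=10):
    """GET `url`, retrying on 5xx responses and network errors with exponential backoff."""
    delay = 1
    for attempt in range(1, max_retries + 1):
        try:
            response = requests.get(url, headers=HEADERS, timeout=timeout)
        except requests.RequestException as exc:
            print(f"Network error: {exc}")
            response = None
        if response is not None and response.status_code < 500:
            if response.status_code != 200:
                print(f"Request failed with status {response.status_code}")
            return response  # success, or a client error that a retry will not fix
        if response is not None:
            print(f"Server error {response.status_code}")
        if attempt < max_retries:  # do not sleep after the final attempt
            print(f"Retrying in {delay} seconds...")
            time.sleep(delay)
            delay *= 2  # double the wait before the next attempt
    print("Max retries exceeded.")
    return None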
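
The backoff schedule in `safe_get` (wait 1 second, then 2, then 4, and so on) can be separated from the HTTP call so that it runs without a network connection. A minimal sketch under that assumption; `retry_with_backoff`, `call`, and the injectable `sleep` parameter are hypothetical names and not part of the cell above:

```python
import time


def retry_with_backoff(call, max_retries=3, base_delay=1, sleep=time.sleep):
    """Run `call()` until it returns a non-None value or attempts run out.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts
    and does not sleep after the final attempt.
    """
    delay = base_delay
    for attempt in range(max_retries):
        result = call()
        if result is not None:
            return result
        if attempt < max_retries - 1:
            sleep(delay)
            delay *= 2
    return None


# Stand-in for a flaky request: fails twice, then succeeds
outcomes = iter([None, None, "ok"])
waits = []
result = retry_with_backoff(lambda: next(outcomes), sleep=waits.append)
print(result, waits)
```

Passing `waits.append` in place of `time.sleep` records the delays instead of waiting, which makes the schedule easy to inspect.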

## **7. Basic error handling example**

In [26]:
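def get_repo_info_safe():
    """Return the repository metadata, or None after printing the reason for failure."""
    url = BASE_URL
    response = requests.get(url, headers=HEADERS, timeout=10)
    if response.status_code == 200:
        return response.json()
    elif response.status_code == 401:
        print("Unauthorized: Check your GitHub token.")
    elif response.status_code in (403, 429) and response.headers.get("X-RateLimit-Remaining") == "0":
        # A 403 only signals an exhausted quota when no requests remain
        print("Rate limit exceeded. Try again later.")
    elif response.status_code == 403:
        print("Forbidden: access denied (insufficient permissions or a secondary rate limit).")
    else:
        print(f"Error {response.status_code}: {response.text}")
    return None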

# Call and print result only if successful
repo_info = get_repo_info_safe()
if repo_info:
    print("Repository name:", repo_info["full_name"])
    print("Stars:", repo_info["stargazers_count"])
    print("Forks:", repo_info["forks_count"])
    print("Created at:", repo_info["created_at"])

Repository name: josephmisiti/awesome-machine-learning
Stars: 68678
Forks: 14974
Created at: 2014-07-15T19:11:19Z
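
The status-code checks above can also be written as a pure function that is easy to test on its own. A sketch under that assumption; `classify_response` and its labels are hypothetical and do not come from any GitHub client library:

```python
def classify_response(status_code, headers):
    """Return a short label describing a GitHub API response.

    `headers` can be `response.headers` from requests (case-insensitive)
    or a plain dict using the header spellings below.
    """
    if 200 <= status_code < 300:
        return "ok"
    if status_code == 401:
        return "unauthorized"
    quota_exhausted = headers.get("X-RateLimit-Remaining") == "0"
    if status_code == 429 or (status_code == 403
                              and (quota_exhausted or "Retry-After" in headers)):
        return "rate_limited"
    if status_code == 403:
        return "forbidden"
    if status_code == 404:
        return "not_found"
    if status_code >= 500:
        return "server_error"
    return "client_error"


print(classify_response(403, {"X-RateLimit-Remaining": "0"}))
```

Separating the classification from the request lets the same rules drive both user-facing messages and retry decisions.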
