## **GitHub API Data Extraction**
This notebook demonstrates how to authenticate with the GitHub REST API v3, extract data from the public repository `josephmisiti/awesome-machine-learning`, and handle error responses.

# **1. Authentication with a GitHub token**

In [13]:

import requests
# Replace PUT_YOUR_TOKEN with your own GitHub personal access token.
GITHUB_TOKEN = "PUT_YOUR_TOKEN"
HEADERS = {
    "Authorization": f"token {GITHUB_TOKEN}",
    "Accept": "application/vnd.github.v3+json"
}

REPO_OWNER = "josephmisiti"
REPO_NAME = "awesome-machine-learning"
BASE_URL = f"https://api.github.com/repos/{REPO_OWNER}/{REPO_NAME}"
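
Hard-coding a token in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of one alternative, assuming the token is stored in an environment variable (the variable name `GITHUB_TOKEN` and the helper `build_headers` are illustrative choices, not part of the original notebook):

```python
import os


def build_headers(env=None):
    """Build GitHub request headers, adding the token only when one is set.

    `env` defaults to os.environ; passing a plain dict keeps the helper testable.
    The variable name GITHUB_TOKEN is an assumption of this sketch.
    """
    env = os.environ if env is None else env
    headers = {"Accept": "application/vnd.github.v3+json"}
    token = env.get("GITHUB_TOKEN")
    if token:
        headers["Authorization"] = f"token {token}"
    return headers


# Example with an explicit mapping instead of the real environment.
print(build_headers({"GITHUB_TOKEN": "example-token"}))
```

When no token is set, the returned headers produce unauthenticated requests, which still work for public repositories but are subject to a lower rate limit.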



# **2. Repository information**

In [8]:
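def get_repo_info():
    """Fetch the repository metadata as a dict."""
    url = BASE_URL
    response = requests.get(url, headers=HEADERS)
    # Fail loudly on 4xx/5xx instead of returning an error payload.
    response.raise_for_status()
    return response.json()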

repo_info = get_repo_info()
print("Repository name:", repo_info["full_name"])
print("Stars:", repo_info["stargazers_count"])
print("Forks:", repo_info["forks_count"])
print("Created at:", repo_info["created_at"])


Repository name: josephmisiti/awesome-machine-learning
Stars: 68647
Forks: 14973
Created at: 2014-07-15T19:11:19Z
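
The `created_at` value is an ISO 8601 string in UTC. A small sketch of converting it into a timezone-aware `datetime`, assuming a hypothetical helper named `parse_github_timestamp` and the exact format shown in the output above:

```python
from datetime import datetime, timezone


def parse_github_timestamp(value):
    """Parse a timestamp such as '2014-07-15T19:11:19Z' into an aware UTC datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


created = parse_github_timestamp("2014-07-15T19:11:19Z")
print(created.year, created.month, created.day)  # prints: 2014 7 15
```

With the notebook's data this would be called as `parse_github_timestamp(repo_info["created_at"])`.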


# **3. Get commits (with pagination)**

In [9]:
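def get_commits(per_page=100, max_pages=2):
    """Collect at most per_page * max_pages commits from the repository."""
    commits = []
    for page in range(1, max_pages + 1):
        url = f"{BASE_URL}/commits"
        params = {"per_page": per_page, "page": page}
        response = requests.get(url, headers=HEADERS, params=params)
        if response.status_code != 200:
            print("Failed at page", page, "with status", response.status_code)
            break
        batch = response.json()
        if not batch:  # an empty page means there are no more commits
            break
        commits.extend(batch)
    return commits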

commits = get_commits()
print("Total commits retrieved:", len(commits))
print("Sample commit message:", commits[0]["commit"]["message"])


Total commits retrieved: 200
Sample commit message: Merge pull request #1051 from Morgan-Sell/master

Add "Python Feature Engineering Cookbook" to the ML section on books.md
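
`get_commits` requests fixed page numbers. GitHub also returns a `Link` response header that points at the next page, and `requests` exposes it as `response.links`. A minimal sketch that follows that link, assuming a hypothetical helper `iter_all_pages` and a `session` object with a `requests.Session`-style `get` method (the `max_pages` cap and the 30-second timeout are arbitrary choices for this sketch):

```python
def iter_all_pages(session, url, headers=None, max_pages=10):
    """Yield items page by page, following the 'next' link until it runs out.

    `session` is anything with a requests.Session-style get();
    max_pages is a safety cap chosen for this sketch.
    """
    for _ in range(max_pages):
        response = session.get(url, headers=headers, timeout=30)
        response.raise_for_status()
        yield from response.json()
        # requests parses the Link header into the response.links dict
        next_link = response.links.get("next")
        if not next_link:
            break
        url = next_link["url"]
```

With the notebook's globals, a call might look like `list(iter_all_pages(requests.Session(), f"{BASE_URL}/commits?per_page=100", HEADERS))`, which performs real HTTP requests.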


# **4. Get repository contents**

In [10]:
def get_contents(path=""):
    """List the directory at `path` (the repository root when path is empty)."""
    url = f"{BASE_URL}/contents/{path}"
    response = requests.get(url, headers=HEADERS)
    return response.json()

contents = get_contents()
print("Root directory content:")
for item in contents:
    print("-", item["name"], "(type:", item["type"] + ")")

Root directory content:
- LICENSE (type: file)
- README.md (type: file)
- blogs.md (type: file)
- books.md (type: file)
- courses.md (type: file)
- events.md (type: file)
- meetups.md (type: file)
- ml-curriculum.md (type: file)
- scripts (type: dir)
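
When `get_contents` is called with a file path rather than a directory, the API returns a single object whose `content` field is base64-encoded. A sketch of decoding it, assuming a hypothetical helper `decode_file_content` and a UTF-8 text file:

```python
import base64


def decode_file_content(item):
    """Return the decoded text of a contents-API file object.

    Assumes the file is UTF-8 text and the object carries
    'encoding': 'base64', as returned for regular files.
    """
    if item.get("encoding") != "base64":
        raise ValueError(f"unexpected encoding: {item.get('encoding')!r}")
    # b64decode skips the newlines that GitHub inserts into the payload
    return base64.b64decode(item["content"]).decode("utf-8")


# Stand-in for a real response, built locally so the sketch runs offline.
sample = {"encoding": "base64", "content": base64.b64encode(b"# Title\n").decode("ascii")}
print(decode_file_content(sample))
```

Against the live API this might be used as `decode_file_content(get_contents("README.md"))`.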


# **5. Error handling example**

In [11]:
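def get_repo_info_safe():
    """Fetch repository metadata, returning None on any non-200 response."""
    url = BASE_URL
    response = requests.get(url, headers=HEADERS)
    if response.status_code == 200:
        return response.json()
    elif response.status_code == 401:
        print("Unauthorized: check your token")
    elif response.status_code == 403:
        # 403 also covers other permission errors, so confirm with the rate-limit header.
        if response.headers.get("X-RateLimit-Remaining") == "0":
            print("Rate limit exceeded")
        else:
            print("Forbidden: access denied")
    else:
        print("Error", response.status_code)
    return None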

repo_info_safe = get_repo_info_safe()
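
A 403 status alone does not say when it is safe to retry. GitHub reports the remaining quota and the reset time in the `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers, the latter as UTC epoch seconds. A sketch of turning those into a wait time, assuming a hypothetical helper `seconds_until_reset`:

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how many seconds to wait before retrying, or 0 if quota remains.

    `headers` is a mapping such as response.headers; `now` is injectable
    for testing and defaults to the current epoch time.
    """
    remaining = headers.get("X-RateLimit-Remaining")
    reset = headers.get("X-RateLimit-Reset")
    if remaining is None or reset is None or int(remaining) > 0:
        return 0
    now = time.time() if now is None else now
    return max(0, int(reset) - int(now))


example = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1060"}
print(seconds_until_reset(example, now=1000))  # prints: 60
```

In the notebook this could be combined with the 403 branch above, for example by sleeping for `seconds_until_reset(response.headers)` seconds before retrying.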