# 🚀 Hands-On Lecture: Exploring the GitHub API in Python with Google Colab 🐍

Welcome to the hands-on lecture on GitHub API magic! With the power of Python, we’ll unlock new skills to interact with GitHub repositories like true software engineering pros.

**What’s on the Map?**

Here’s what we’ll uncover in this session:

  🛠️ Building API Superpowers: Dive into exciting use cases, including:

*  Fetching and analyzing issues and comments.
*  Accessing code and repositories programmatically.
*  Exploring advanced operations to automate your workflows, and much more.


💡 **Why This Matters**

Imagine automating tedious tasks, analyzing repository data like a detective, or building tools that integrate directly with GitHub. The GitHub API opens up limitless possibilities for innovation in software engineering. By the end of this session, you’ll have the tools to transform your ideas into powerful automations!

## Let's retrieve some trending projects from GitHub

GitHub does not expose trending projects through its API, so we will scrape the trending web page instead. Let's see how it is done.

In [None]:
import requests
from bs4 import BeautifulSoup


def fetch_trending_python_projects(time_period="daily", spoken_language="en", limit=5):
    # Validate the time period
    if time_period not in ["daily", "weekly", "monthly"]:
        print("Invalid time period. Please choose from 'daily', 'weekly', or 'monthly'.")
        return

    # Construct the URL with time period and spoken language
    url = f"https://github.com/trending/python?since={time_period}&spoken_language_code={spoken_language}"
    print(url)
    try:
        response = requests.get(url, timeout=10)  # fail fast instead of hanging on a stalled connection
        response.raise_for_status()

        # Parse the HTML content
        soup = BeautifulSoup(response.text, "html.parser")

        # Find repository entries
        projects = soup.find_all("article", class_="Box-row")

        print(
            f"Top {limit} Trending Python Projects on GitHub ({time_period.capitalize()}, Spoken Language: English):\n"
        )
        for i, project in enumerate(projects[:limit]):  # Limit to the top 'limit' projects
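            # Extract repository name (skip the entry if the expected markup is missing)
            heading = project.find("h2", class_="h3 lh-condensed")
            repo_name_tag = heading.find("a") if heading else None
            if repo_name_tag is None:
                continue
            repo_name = repo_name_tag.text.strip().replace("\n", "").replace(" ", "")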

            # Extract repository URL
            repo_url = f"https://github.com{repo_name_tag['href']}"

            # Extract description
            description_tag = project.find("p", class_="col-9 color-fg-muted my-1 pr-4")
            description = description_tag.text.strip() if description_tag else "No description provided"

            # Extract stars
            stars_tag = project.find("a", href=lambda x: x and x.endswith("/stargazers"))
            stars = stars_tag.text.strip() if stars_tag else "0"

            print(f"Name: {repo_name}")
            print(f"Description: {description}")
            print(f"Stars: {stars}")
            print(f"Repo URL: {repo_url}")
            print("-" * 50)
    except requests.exceptions.RequestException as e:
        print(f"Error: {e}")


if __name__ == "__main__":
    print("Available time periods: 'daily', 'weekly', 'monthly'")
    time_period = "daily"
    fetch_trending_python_projects(time_period=time_period, spoken_language="en", limit=5)

Available time periods: 'daily', 'weekly', 'monthly'
https://github.com/trending/python?since=daily&spoken_language_code=en
Top 5 Trending Python Projects on GitHub (Daily, Spoken Language: English):

Name: benbusby/whoogle-search
Description: A self-hosted, ad-free, privacy-respecting metasearch engine
Stars: 10,276
Repo URL: https://github.com/benbusby/whoogle-search
--------------------------------------------------
Name: OpenBB-finance/OpenBB
Description: Investment Research for Everyone, Everywhere.
Stars: 35,438
Repo URL: https://github.com/OpenBB-finance/OpenBB
--------------------------------------------------
Name: AUTOMATIC1111/stable-diffusion-webui
Description: Stable Diffusion web UI
Stars: 146,149
Repo URL: https://github.com/AUTOMATIC1111/stable-diffusion-webui
--------------------------------------------------
Name: make-all/tuya-local
Description: Local support for Tuya devices in Home Assistant
Stars: 1,583
Repo URL: https://github.com/make-all/tuya-local
------------
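
The scraped star counts are strings such as `10,276`, so they need converting before projects can be sorted or compared. A minimal sketch, assuming counts always use comma thousands separators as in the output above; `parse_star_count` is a hypothetical helper, not part of the lecture code:

```python
def parse_star_count(text):
    """Convert a scraped star count such as '10,276' to an int.

    Assumes comma thousands separators only (hypothetical helper).
    Returns 0 when the text is not a plain number.
    """
    cleaned = text.strip().replace(",", "")
    return int(cleaned) if cleaned.isdigit() else 0


# Star strings copied from the output above
scraped = {
    "benbusby/whoogle-search": "10,276",
    "AUTOMATIC1111/stable-diffusion-webui": "146,149",
    "make-all/tuya-local": "1,583",
}

# Sort repository names by star count, largest first
ranked = sorted(scraped, key=lambda name: parse_star_count(scraped[name]), reverse=True)
print(ranked)
```

Returning 0 for unparseable text mirrors the scraper's own fallback of `"0"` when no star link is found.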

## 🔑 Creating a GitHub Classic API Token

To interact with GitHub's API, you need a **Personal Access Token (PAT)**, which acts as a secure key for authentication. Here's how you can generate one:

### ✨ Step-by-Step Guide:  

1. **Login to GitHub**: Start by logging into your GitHub account.  
2. **Navigate to Settings**:  
   - Click on your profile picture in the top-right corner.  
   - Select **Settings** from the dropdown menu.  

3. **Go to Developer Settings**:  
   - Scroll to the bottom of the left-hand menu in the **Settings** page.  
   - Click on **Developer Settings**.  

4. **Access Personal Access Tokens**:  
   - In the **Developer Settings** menu, find the **Personal Access Tokens** section.  
   - Select **Tokens (classic)** from the dropdown.  

5. **Generate New Token**:  
   - On the top-right corner, click **Generate new token**.  
   - From the dropdown, select **Generate new token (classic)**.  

6. **Fill Out the Token Form**:  
   - **Note**: Add a descriptive name for the token (e.g., *CMPT470 API Token*) to remember why it was created.  
   - **Expiration**: Choose a suitable expiration period (e.g., 7 days, 30 days, or custom).  
   - **Scopes**: Select the permissions the token will have. For this lecture, I chose the following scopes:  
     - `repo`  
     - `workflow`  
     - `user`  
     - `audit_log`  
     - `project`  

7. **Generate and Save Your Token**:  
   - Scroll to the bottom and click **Generate Token**.  
   - Once generated, **copy the token immediately**. GitHub will not show it again, and you'll need to create a new token if it is lost.

---

Now that you have your GitHub API token, you’re ready to connect to GitHub programmatically!
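
Rather than pasting the token into a notebook cell, where it can leak when the notebook is shared, one option is to read it from an environment variable. A minimal sketch; the variable name `GITHUB_TOKEN` and the helper `build_auth_headers` are assumptions made for illustration:

```python
import os


def build_auth_headers(token):
    """Return request headers, adding Authorization only when a token is given."""
    headers = {"Accept": "application/vnd.github.v3+json"}
    if token:
        headers["Authorization"] = f"token {token}"
    return headers


# GITHUB_TOKEN is an assumed variable name; set it before starting the notebook
TOKEN = os.environ.get("GITHUB_TOKEN", "")
HEADERS = build_auth_headers(TOKEN)
print("Authorization header set:", "Authorization" in HEADERS)
```

Leaving the header out when no token is available means public repositories can still be queried without authentication.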

## Technique 1: Let's fetch issues from the GitHub API and save them to CSV and JSON files

In [None]:
import csv
import requests
import json

REPOSITORIES = ["psf/black"]  # targeted repo
TOKEN = ""  # GitHub Personal Access Token
# send the Authorization header only when a token has been filled in
HEADERS = {"Authorization": f"token {TOKEN}"} if TOKEN else {}
# additional params to access specific types of data
PARAMS = {"state": "closed", "since": "2022-01-01T00:00:00Z", "sort": "updated"}


def fetch_issues(repo, max_pages=2):
    csv_file = f'{repo.replace("/", "-")}-issues.csv'
    json_file = f'{repo.replace("/", "-")}-issues.json'
    issues_data = []
    page_count = 0
    url = f"https://api.github.com/repos/{repo}/issues"

    with open(csv_file, "w", newline="", encoding="utf-8") as file:
        writer = csv.writer(file)
        writer.writerow(["ID", "State", "Title", "Body", "Labels", "Created At", "Closed At"])

        while url and page_count < max_pages:
            response = requests.get(url, params=PARAMS, headers=HEADERS)
            if response.status_code != 200:
                raise Exception(f"Error {response.status_code}: {response.json()}")

            # looping through each issue
            for issue in response.json():
                if "pull_request" not in issue:  # Skip pull requests
                    labels = ",".join(label["name"] for label in issue.get("labels", []))
                    # "body" is present but null for issues without a description
                    body = (issue.get("body") or "")[:10000]
                    row = [
                        issue["number"],
                        issue["state"],
                        issue["title"],
                        body,
                        labels,
                        issue["created_at"],
                        issue["closed_at"],
                    ]
                    # saving to csv
                    writer.writerow(row)
                    issues_data.append(
                        {
                            "ID": issue["number"],
                            "State": issue["state"],
                            "Title": issue["title"],
                            "Body": body,
                            "Labels": labels,
                            "Created At": issue["created_at"],
                            "Closed At": issue["closed_at"],
                        }
                    )
            # requests parses the Link header into response.links;
            # the "next" entry is absent on the last page, which ends the loop
            url = response.links.get("next", {}).get("url")
            page_count += 1

    # saving it to json
    with open(json_file, "w", encoding="utf-8") as jf:
        json.dump(issues_data, jf, indent=4)


for repository in REPOSITORIES:
    print(f"Processing issues repository: {repository}")
    fetch_issues(repository)
    print(f"Finished issues processing: {repository}")

Processing issues repository: psf/black
Finished issues processing: psf/black
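
GitHub paginates list endpoints through the `Link` response header, which holds comma-separated entries of the form `<url>; rel="next"`. The sketch below parses such a header by hand to show the format. The sample header value is made up for illustration, the parser assumes URLs contain no commas or semicolons, and in practice `requests` exposes the same information as `response.links`.

```python
def parse_link_header(header):
    """Parse a Link header string into a {rel: url} dictionary."""
    links = {}
    for part in header.split(","):
        if ";" not in part:
            continue  # empty header or malformed entry
        url_part, rel_part = part.split(";", 1)
        if "=" not in rel_part:
            continue  # no rel attribute to key on
        url = url_part.strip().strip("<>")
        rel = rel_part.split("=", 1)[1].strip().strip('"')
        links[rel] = url
    return links


# Illustrative header value, not copied from a real response
sample = (
    '<https://api.github.com/repositories/1/issues?page=2>; rel="next", '
    '<https://api.github.com/repositories/1/issues?page=5>; rel="last"'
)

links = parse_link_header(sample)
print(links.get("next"))
```

Note that the `rel` value arrives wrapped in double quotes and later entries start with a space, so both must be stripped before looking up `"next"`.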


## Technique 2: Let's fetch issues from the GitHub API

In [3]:
# import requests
# import json

# # GitHub API endpoint for fetching issues from a public repository
# repo_owner = "tensorflow"  # Change to any repo you want
# repo_name = "tensorflow"
# api_url = f"https://api.github.com/repos/{repo_owner}/{repo_name}/issues"

# # Define parameters to filter issues
# TOKEN = ""  # GitHub Personal Access Token
# params = {"state": "open", "labels": "type:bug", "per_page": 5}  # Number of issues to fetch

# # Add headers (Accept selects the API version; Authorization carries the token)
# headers = {"Accept": "application/vnd.github.v3+json", "Authorization": f"token {TOKEN}"}  # add token

# # Send request to GitHub API
# response = requests.get(api_url, headers=headers, params=params)

# # Process the response
# if response.status_code == 200:
#     issues = response.json()
#     print(f"Fetched {len(issues)} issues from {repo_owner}/{repo_name}")

#     # Extract relevant fields
#     for issue in issues:
#         print(f"Issue ID: {issue.get('id')}")
#         print(f"Title: {issue.get('title')}")
#         print(f"Description: {issue.get('body', 'No description provided')}")
#         print(f"Labels: {[label['name'] for label in issue.get('labels', [])]}")
#         print(f"Created At: {issue.get('created_at')}")
#         print(f"Comments: {issue.get('comments')}")
#         print("=" * 80)

# else:
#     print(f"Failed to fetch data: {response.status_code}, {response.text}")

import requests
import json
import pandas as pd

# GitHub API Configuration
TOKEN = ""  # Add your GitHub Personal Access Token here (if needed)
headers = {"Accept": "application/vnd.github.v3+json"}
if TOKEN:
    headers["Authorization"] = f"token {TOKEN}"

# Define repository to fetch issues from (TensorFlow)
repos = ["tensorflow/tensorflow"]

# Define parameters to filter issues (fetch only open issues with "type:bug" label)
# GitHub caps per_page at 100, so larger values still return at most 100 issues
params = {"state": "open", "labels": "type:bug", "per_page": 100}

# List to store issues data
issues_list = []

# Fetch issues
for repo in repos:
    api_url = f"https://api.github.com/repos/{repo}/issues"
    response = requests.get(api_url, headers=headers, params=params)

    if response.status_code == 200:
        issues = response.json()
        print(f"Fetched {len(issues)} issues from {repo}")

        # Extract relevant fields
        for issue in issues:
            issues_list.append({
                "repository": repo,
                "title": issue.get("title"),
                # "body" is present but null when an issue has no description
                "description": issue.get("body") or "No description provided",
                "labels": [label["name"] for label in issue.get("labels", [])],
                "created_at": issue.get("created_at"),
                "comments": issue.get("comments"),
                "reactions": issue.get("reactions", {}).get("total_count", 0),
                "url": issue.get("html_url")
            })
    else:
        print(f"Failed to fetch data from {repo}: {response.status_code}, {response.text}")

# Save the data to CSV and JSON
df = pd.DataFrame(issues_list)
df.to_csv("tensorflow_issues.csv", index=False)
df.to_json("tensorflow_issues.json", orient="records")

print("Issues data saved successfully!")


Fetched 100 issues from tensorflow/tensorflow
Issues data saved successfully!
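
The run above returned 100 issues because GitHub caps `per_page` at 100; getting more requires requesting successive pages with the `page` parameter. A minimal sketch of that loop, assuming a caller-supplied `fetch_page` function (hypothetical) that returns the list of issues for a given page number. Here it is a stub over fake data so the example runs without network access:

```python
def collect_all_pages(fetch_page, per_page=100, max_pages=10):
    """Call fetch_page(page, per_page) until a short page signals the end."""
    results = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page, per_page)
        results.extend(batch)
        if len(batch) < per_page:
            break  # fewer items than requested: this was the last page
    return results


# Stub standing in for a real request such as
# requests.get(api_url, headers=headers, params={**params, "page": page}).json()
FAKE_ISSUES = [{"number": n} for n in range(1, 251)]


def fake_fetch_page(page, per_page):
    start = (page - 1) * per_page
    return FAKE_ISSUES[start:start + per_page]


all_issues = collect_all_pages(fake_fetch_page)
print(len(all_issues))
```

The `max_pages` limit is a safety cap so that a very large repository does not exhaust the API rate limit in one run.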


## Resources used for this lecture:

1.   https://docs.github.com/en/rest?apiVersion=2022-11-28
2.   https://github.com/psf/black
3.   https://github.com/vercel/vercel
4.   https://github.com/trending/
5.   https://medium.com/analytics-vidhya/getting-started-with-github-api-dc7057e2834d
6.   https://seart-ghs.si.usi.ch/
7.   https://blog.apify.com/python-github-api/

