## **Goals for this notebook:**

*   Collect GitHub issue data.
*   Create a space with the data on Mantis.

Thank you for joining the Mantis IAP! This workshop is focused on creating Mantis spaces with data from GitHub: a platform used by developers to collaborate on and release software. For those who are new to GitHub, you can create an account [here](https://github.com/signup).

In this notebook, we will collect **issues** from a GitHub repository (storage for a software project) or organization to visualize in Mantis. Issues are how companies, collaborative open-source projects, and other organizations keep track of bugs and feature requests in software they host on GitHub.

If you have access to a GitHub repository, you can add an issue to it, which you can choose to fix yourself or leave for someone else to assign themselves to. You can think of issues as a way of organizing open tasks that developers can take on when working on software.

After we create a dataset of issues, we will upload it to Mantis to navigate a cartography of issues and set up automations. You can try this out with your own GitHub organization or repository, or with an open-source one.

## **Step 1: Add your GitHub authentication**

After creating an account with the link above, you will need a Personal Access Token (PAT) to collect data from GitHub. The token is sent with each API call that collects issues, and it is required for private repos/orgs. **Authentication is recommended even for public repositories: unauthenticated requests are limited to 60 per hour, so a longer collection will quickly run into rate-limit errors.**

### **How to get your GitHub PAT**

1) Go to https://github.com/settings.
2) Scroll down and click "Developer settings" in the left sidebar.
3) Click "Personal access tokens" in the left sidebar, then click "Fine-grained tokens."
4) On the "Fine-grained tokens" page, click the "Generate new token" button.
5) Give the token a name and choose when it should expire (7 days is enough if you only want to experiment with this notebook this week).
6) Set Repository access to "All repositories."
7) Click "Add permissions" and add "Issues." If "Issues" doesn't show up, you may add "Issue Fields" or "Issue Types."
8) **Add your fine-grained GitHub PAT below once you generate it, and click the run button so it saves to your notebook:**


In [12]:
GITHUB_PAT = "github_pat_rest-of-your-PAT-here" # Paste your token here. Leaving the placeholder sends unauthenticated requests, which may fail or be rate limited.
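
Before collecting any issues, it can be useful to confirm that the token works. The following is a minimal sketch, not part of the original notebook: the helper names `build_headers`, `summarize_rate_limit`, and `check_token` are hypothetical, and it uses only the standard library so it runs without extra installs. It builds the same headers as the script in Step 3 and asks GitHub's `/rate_limit` endpoint how many requests remain for the `core` API.

```python
import json
import urllib.request

# Same placeholder string as the GITHUB_PAT cell above.
PLACEHOLDER = "github_pat_rest-of-your-PAT-here"


def build_headers(pat):
    """Build request headers, adding authentication only for a real token."""
    headers = {"Accept": "application/vnd.github.v3+json"}
    if pat and pat != PLACEHOLDER:
        headers["Authorization"] = f"token {pat}"
    return headers


def summarize_rate_limit(payload):
    """Turn the JSON body of GET /rate_limit into a short status line."""
    core = payload["resources"]["core"]
    return f"{core['remaining']} of {core['limit']} requests remaining"


def check_token(pat):
    """Query GitHub's rate-limit endpoint (this makes one network request).

    urllib raises urllib.error.HTTPError if the token is rejected.
    """
    request = urllib.request.Request(
        "https://api.github.com/rate_limit", headers=build_headers(pat)
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return summarize_rate_limit(json.load(response))
```

Calling `print(check_token(GITHUB_PAT))` in a new cell should report a limit of 5000 for an authenticated token and 60 for unauthenticated requests.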

### **Step 2: Add your org or repository**

**Once you edit the variables below, click run to save them to your notebook.**


In [16]:

USING_ORG = False # Set to True to collect issues from an organization, or False for a specific repository.
GITHUB_LINK = "https://github.com/microsoft/vscode" # The GitHub link for an organization (e.g., https://github.com/microsoft) or a repo (e.g., https://github.com/microsoft/vscode)
MAX_PAGES = 100 # Max number of result pages to collect; the API returns 30 items per page by default (feel free to adjust)
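
The script in Step 3 derives the owner and repository name by splitting `GITHUB_LINK` on slashes, which assumes the link has exactly the shape shown above. As a hedged alternative, the sketch below uses a hypothetical helper, `parse_github_link`, that tolerates trailing slashes and extra path segments and raises a clear error when the link does not match the `USING_ORG` setting. It assumes the link starts with `https://`.

```python
from urllib.parse import urlparse


def parse_github_link(link, using_org):
    """Return (owner, repo) from a GitHub URL; repo is None for an organization.

    Assumes a full URL such as https://github.com/microsoft/vscode.
    """
    # Keep only the non-empty path segments, e.g. ["microsoft", "vscode"].
    parts = [segment for segment in urlparse(link).path.split("/") if segment]

    if using_org:
        if not parts:
            raise ValueError(f"No organization found in link: {link}")
        return parts[0], None

    if len(parts) < 2:
        raise ValueError(f"Expected an owner/repo link, got: {link}")

    repo = parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]  # Accept clone URLs as well.
    return parts[0], repo
```

For example, `parse_github_link(GITHUB_LINK, USING_ORG)` returns `("microsoft", "vscode")` with the default values above.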

### **Step 3: Run the following to generate a CSV of issues**

In [None]:
import requests
import pandas as pd
import time

PLACEHOLDER = "github_pat_rest-of-your-PAT-here"
headers = {"Accept": "application/vnd.github.v3+json"}

if GITHUB_PAT != PLACEHOLDER and GITHUB_PAT != "":
    headers["Authorization"] = f"token {GITHUB_PAT}"

parts = GITHUB_LINK.strip("/").split("/")
owner = parts[-2] if not USING_ORG else parts[-1]
repo = parts[-1] if not USING_ORG else None

if USING_ORG:
    url = f"https://api.github.com/orgs/{owner}/issues?filter=all&state=all"
else:
    url = f"https://api.github.com/repos/{owner}/{repo}/issues?state=all"

issues_list = []
page = 1

while True:
    # A timeout keeps the cell from hanging indefinitely on a stalled connection.
    response = requests.get(f"{url}&page={page}", headers=headers, timeout=30)

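    if response.status_code == 401:
        print("Error: Bad credentials. Check that GITHUB_PAT is correct and has not expired.")
        break
    elif response.status_code in (403, 429):
        print("Error: Rate limit exceeded or access denied. A valid PAT is required.")
        break
    elif response.status_code != 200:
        print(f"Error: {response.status_code} - {response.text}")
        break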

    data = response.json()
    if not data:
        break

    for issue in data:
        # The issues endpoint also returns pull requests; skip them.
        if "pull_request" not in issue:
            issues_list.append({
                "title": issue.get("title"),
                "number": issue.get("number"),
                # "user" can be null in the API response, so fall back to an empty dict.
                "user": (issue.get("user") or {}).get("login"),
                "state": issue.get("state"),
                "created_at": issue.get("created_at"),
                "body": issue.get("body"),
                "labels": ", ".join(label["name"] for label in issue.get("labels") or [])
            })

    print(f"Collected page {page}.")
    page += 1
    time.sleep(0.1)
    if page > MAX_PAGES:
        break

if issues_list:
    df = pd.DataFrame(issues_list)
    df.to_csv("github_issues.csv", index=False)
    print(f"Done. github_issues.csv created with {len(df)} rows.")
else:
    print("Error: No data was collected.")
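
The filtering and flattening step inside the loop above can also be pulled out into a small function, which makes it easy to try on a single record without calling the API. This is a minimal sketch: `issue_to_row` is a hypothetical helper, and the sample records are made up for illustration. It relies on the fact that GitHub's issues endpoints also return pull requests, which carry a `pull_request` key.

```python
def issue_to_row(issue):
    """Flatten one API issue record into a CSV row, or return None for a pull request."""
    if "pull_request" in issue:
        return None

    user = issue.get("user") or {}      # "user" may be null in the response
    labels = issue.get("labels") or []  # list of label objects with a "name" field

    return {
        "title": issue.get("title"),
        "number": issue.get("number"),
        "user": user.get("login"),
        "state": issue.get("state"),
        "created_at": issue.get("created_at"),
        "body": issue.get("body"),
        "labels": ", ".join(label["name"] for label in labels),
    }


# Made-up records shaped like entries in the API response.
sample_issue = {
    "title": "Crash on start",
    "number": 1,
    "user": {"login": "octocat"},
    "state": "open",
    "created_at": "2024-01-01T00:00:00Z",
    "body": "Steps to reproduce...",
    "labels": [{"name": "bug"}, {"name": "help wanted"}],
}
sample_pull_request = {"title": "Fix crash", "number": 2, "pull_request": {}}

rows = [issue_to_row(record) for record in (sample_issue, sample_pull_request)]
rows = [row for row in rows if row is not None]
```

After this runs, `rows` holds one entry for the issue, and the pull request has been dropped.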

### **Step 4: Get the CSV**

Once the script above finishes running, you can download the CSV of GitHub issues (`github_issues.csv`) by clicking the folder icon in the left sidebar of Google Colab. If you are running the notebook locally, the file is saved in your working directory. Either way, keep it ready to upload to Mantis.
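
Before uploading, it may help to sanity-check the file. The sketch below is one hedged way to do that with the standard library: `summarize_issues_csv` is a hypothetical helper that counts the rows and tallies the `state` column, assuming the CSV has the columns written by the script in Step 3.

```python
import csv
from collections import Counter


def summarize_issues_csv(path):
    """Return (row_count, issues_per_state) for a CSV that has a 'state' column."""
    states = Counter()
    row_count = 0
    # newline="" lets the csv module handle line breaks inside quoted issue bodies.
    with open(path, newline="", encoding="utf-8") as csv_file:
        for row in csv.DictReader(csv_file):
            row_count += 1
            states[row.get("state") or "unknown"] += 1
    return row_count, dict(states)
```

Running `summarize_issues_csv("github_issues.csv")` in a new cell returns the number of issues collected and how many are open versus closed.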