# Fetch Issues from GitHub Repositories
Per [issue-2](https://github.com/mishasinitcyn/RepoCleanup-backend/issues/2), this notebook fetches GitHub issues directly from various repositories, with the aim of using their labelled entries in the corpus.

In [11]:
import pandas as pd
import numpy as np
import requests
from dotenv import load_dotenv
import os

In [2]:
DATA_PATH = os.path.join("..", 'data')
CORPUS_FILENAME = 'corpus.csv'

## Fetch issues

In [41]:
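def fetch_github_issues(repo_url: str):
    """Fetch the first page of open issues (GitHub's defaults) for a repository URL.

    Returns {"issues": [...]} on success, or None if the request fails.
    """
    # Extract "owner/repo" from URLs such as https://github.com/owner/repo/issues
    repo_path = repo_url.split("https://github.com/")[1].split("/issues")[0].strip("/")
    api_url = f"https://api.github.com/repos/{repo_path}/issues"

    response = requests.get(api_url, timeout=30)
    if response.status_code != 200:
        print(f"Error fetching issues from GitHub (status {response.status_code})")
        return None

    filtered_issues = [
        {
            "issue_id": issue["id"],
            "issue_title": issue["title"],
            "issue_body": issue["body"],  # May be None when the issue has no description
            "issue_labels": [label["name"] for label in issue["labels"]],
            "user_name": issue["user"]["login"],
            "user_id": issue["user"]["id"],
            "user_is_admin": issue["user"]["site_admin"],
            "created_at": issue["created_at"],
        }
        for issue in response.json()
        # The issues endpoint also returns pull requests; skip them
        if "pull_request" not in issue
    ]
    return {"issues": filtered_issues}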


In [42]:
REPO_URL = "https://github.com/mishasinitcyn/RepoCleanup-backend" # Placeholder repo

# Fetch issues as a dict of the form {"issues": [...]}
issues_data = fetch_github_issues(REPO_URL)

# Convert the list of issue dicts into a pandas DataFrame
issues_df = pd.DataFrame(issues_data['issues'])
issues_df.head()

## Data analysis
- How many unique `issue_labels` categories does a given repository have?
- Is it possible to automatically map a repository's label categories to our own label categories `[bug,feature,spam,discussion,question]`?
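
One possible starting point for the second question is a keyword lookup, sketched below. The keyword lists, `LABEL_KEYWORDS`, `map_label`, and `map_labels` are hypothetical and chosen only for illustration; they are not part of the project. Substring matching is deliberately naive (a label such as `debugging` would match `bug`), so the unmapped and mismatched labels would still need manual review.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical keyword lists (an assumption for illustration only).
# Keys are the target categories named in the question above.
LABEL_KEYWORDS: Dict[str, List[str]] = {
    "bug": ["bug", "defect", "regression", "crash"],
    "feature": ["feature", "enhancement"],
    "spam": ["spam"],
    "discussion": ["discussion", "rfc", "proposal"],
    "question": ["question", "help wanted", "support"],
}


def map_label(repo_label: str) -> Optional[str]:
    """Return the first target category whose keyword occurs in the label, else None."""
    normalized = repo_label.strip().lower()
    for category, keywords in LABEL_KEYWORDS.items():
        if any(keyword in normalized for keyword in keywords):
            return category
    return None


def map_labels(repo_labels: List[str]) -> Tuple[Dict[str, Optional[str]], List[str]]:
    """Map every repository label and also list the labels that matched nothing."""
    mapped = {label: map_label(label) for label in repo_labels}
    unmapped = [label for label, category in mapped.items() if category is None]
    return mapped, unmapped
```

With a DataFrame like `issues_df`, the input could be the flattened contents of the `issue_labels` column. The size of the `unmapped` list gives a rough sense of how far automatic mapping gets for a given repository.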

### Unique labels

In [36]:
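def get_unique_labels(df):
    """Print the distinct label names found in the `issue_labels` column."""
    # Each row holds a list of labels, so flatten before deduplicating
    all_labels = [label for labels in df['issue_labels'] for label in labels]
    unique_labels = pd.Series(all_labels).unique()
    print("Unique labels:", unique_labels)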

In [37]:
get_unique_labels(issues_df)

Unique labels: ['data']
