<img width="10%" alt="Naas" src="https://landen.imgix.net/jtci2pxwjczr/assets/5ice39g4.png?w=160"/>

# GitHub - Clone open branches from repository on local
<a href="https://app.naas.ai/user-redirect/naas/downloader?url=https://raw.githubusercontent.com/jupyter-naas/awesome-notebooks/master/GitHub/GitHub_Clone_repository.ipynb" target="_parent"><img src="https://naasai-public.s3.eu-west-3.amazonaws.com/Open_in_Naas_Lab.svg"/></a><br><br><a href="https://bit.ly/3JyWIk6">Give Feedbacks</a> | <a href="https://app.naas.ai/user-redirect/naas/downloader?url=https://raw.githubusercontent.com/jupyter-naas/awesome-notebooks/master/Naas/Naas_Start_data_product.ipynb" target="_parent">Generate Data Product</a>

**Tags:** #github #snippet #operations #repository #efficiency

**Author:** [Antonio Georgiev](www.linkedin.com/in/antonio-georgiev-b672a325b)

**Description:** Automates cloning of open branches from a GitHub repository to a local machine. 

**References:**
- [GitHub Documentation - Cloning a repository](https://docs.github.com/en/github/creating-cloning-and-archiving-repositories/cloning-a-repository)

## Input

### Import libraries

In [1]:
import os
import naas
import pandas as pd
import requests
from pprint import pprint
import subprocess

### Setup Variables
- `repo_url`: URL of the repository to clone
- `output_dir`: Output directory for the clone. If `None`, a folder named after the branch is created
- `token`: [Generate a personal access token](https://docs.github.com/en/github/authenticating-to-github/creating-a-personal-access-token)
- `owner`: owner of the repository
- `repository`: name of the repository

In [2]:
# Inputs
repo_url = "https://github.com/jupyter-naas/awesome-notebooks"

# Outputs
output_dir = None

# Variables used to list branches with an open PR
token = naas.secret.get(name="GITHUB_TOKEN") or "YOUR_GITHUB_TOKEN"
owner = "jupyter-naas" #Example for naas
repository = "awesome-notebooks" #Example for naas awesome-notebooks repository
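
The `owner` and `repository` values repeat what `repo_url` already contains. A minimal sketch of deriving them from the URL, assuming it has the form `https://github.com/<owner>/<repo>`; the helper name `split_repo_url` is hypothetical and not part of the notebook:

```python
from urllib.parse import urlparse


def split_repo_url(repo_url):
    """Return (owner, repository) parsed from a GitHub repository URL."""
    # Keep only the path part, e.g. "jupyter-naas/awesome-notebooks"
    path = urlparse(repo_url).path.strip("/")
    parts = path.split("/")
    if len(parts) < 2:
        raise ValueError(f"Not a repository URL: {repo_url}")
    owner, repository = parts[0], parts[1]
    # Drop an optional ".git" suffix
    if repository.endswith(".git"):
        repository = repository[: -len(".git")]
    return owner, repository


owner, repository = split_repo_url("https://github.com/jupyter-naas/awesome-notebooks")
print(owner, repository)
```

Deriving both values from one input avoids the URL and the API parameters drifting apart when the notebook is pointed at another repository.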

## Model

### List branches with open pull requests

In [3]:
def get_branches_with_open_prs(
    token,
    owner,
    repository
):
    # List pull requests (the endpoint returns open PRs by default)
    url = f"https://api.github.com/repos/{owner}/{repository}/pulls"
    headers = {"Authorization": f"token {token}"}
    response = requests.get(url, headers=headers)
    # Fail early on an invalid token or unknown repository
    response.raise_for_status()
    pulls = response.json()
    
    # Keep the head branch, author and creation date of each PR
    branches_data = []
    for pull in pulls:
        branches_data.append({
            'branch': pull['head']['ref'],
            'creator': pull['user']['login'],
            'creation_date': pull['created_at']
        })
    
    columns = ['branch', 'creator', 'creation_date']
    branches_df = pd.DataFrame(branches_data, columns=columns)
    return branches_df

branches_with_open_prs = get_branches_with_open_prs(token, owner, repository)
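
The request above reads a single page of results, and GitHub returns 30 pull requests per page by default (up to 100 with `per_page`). For repositories with many open PRs, a sketch along these lines could collect every page. The function name `list_open_pulls` and its `get` parameter are assumptions added for illustration; `get` only exists so the loop can be exercised with a stub instead of a live request:

```python
def list_open_pulls(owner, repository, token, get=None, per_page=100):
    """Collect all open pull requests by walking GitHub's numbered pages."""
    if get is None:
        # Imported lazily so a stub can replace the HTTP call
        import requests
        get = requests.get

    url = f"https://api.github.com/repos/{owner}/{repository}/pulls"
    headers = {"Authorization": f"token {token}"}
    pulls = []
    page = 1
    while True:
        params = {"state": "open", "per_page": per_page, "page": page}
        response = get(url, headers=headers, params=params)
        response.raise_for_status()
        batch = response.json()
        pulls.extend(batch)
        # A page shorter than per_page is the last one
        if len(batch) < per_page:
            break
        page += 1
    return pulls
```

The returned list has the same shape as `response.json()` in the cell above, so the loop that builds `branches_data` can consume it unchanged.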

### Clone repository
Clone the repository from the given URL and create a local copy of it.

In [4]:
def clone_branch(repo_url, output_dir, branch_name):
    # Get GitHub owner and repo name from the URL
    owner = repo_url.split("https://github.com/")[-1].split("/")[0]
    repo_name = repo_url.rstrip("/").split("/")[-1]
    
    # Add the .git extension to the repo name
    if not repo_name.endswith(".git"):
        repo_name = f"{repo_name}.git"
    repo = f"{owner}/{repo_name}"
        
    # Default the output directory to the branch name
    if not output_dir:
        output_dir = branch_name
    
    # Create the output directory
    os.makedirs(output_dir, exist_ok=True)
        
    # Clone over SSH and check out the requested branch.
    # The branch must exist in this repository (PR branches from forks do not).
    !git clone --branch '{branch_name}' git@github.com:'{repo}' '{output_dir}'
    print(f"✅ GitHub repo cloned: {output_dir}")
    return output_dir
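
The `!git` shell escape only works inside IPython, and the success message is printed even when the clone fails. The notebook already imports `subprocess`, so one possible alternative is sketched below. The helper names are hypothetical, `my-branch` is a placeholder, and the sketch clones over HTTPS using `repo_url` directly rather than the SSH address used above:

```python
import subprocess


def build_clone_command(repo_url, branch_name, output_dir=None):
    """Build a `git clone` argument list that checks out a single branch."""
    target = output_dir or branch_name
    return [
        "git", "clone",
        "--branch", branch_name,  # check out this branch instead of the default
        "--single-branch",        # fetch only that branch's history
        repo_url,
        target,
    ]


def clone_branch_checked(repo_url, branch_name, output_dir=None):
    """Run the clone; raises CalledProcessError if git exits with an error."""
    command = build_clone_command(repo_url, branch_name, output_dir)
    subprocess.run(command, check=True)
    return command[-1]


# Inspect the command without running it ("my-branch" is a placeholder)
print(build_clone_command("https://github.com/jupyter-naas/awesome-notebooks", "my-branch"))
```

Passing the arguments as a list avoids shell quoting issues with unusual branch names, and `check=True` turns a failed clone into an exception instead of a misleading success message.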

### Clone the branches with open PRs that haven't been cloned yet

In [None]:
for index, row in branches_with_open_prs.iterrows():
    branch_name = row['branch']
    if not os.path.exists(branch_name):
        output_dir = clone_branch(repo_url, None, branch_name)


Cloning into '2023-matplotlib-stacked-bar-chart-creation'...
remote: Enumerating objects: 29892, done.
remote: Counting objects: 100% (7553/7553), done.
remote: Compressing objects: 100% (1629/1629), done.
remote: Total 29892 (delta 6389), reused 6968 (delta 5917), pack-reused 22339
Receiving objects: 100% (29892/29892), 83.71 MiB | 17.94 MiB/s, done.
Resolving deltas: 100% (22792/22792), done.
Updating files: 100% (903/903), done.
✅ GitHub repo cloned: 2023-matplotlib-stacked-bar-chart-creation
Cloning into '2029-hubspot-list-communications-from-contact'...
remote: Enumerating objects: 29892, done.
remote: Counting objects: 100% (7565/7565), done.
remote: Compressing objects: 100% (1631/1631), done.
remote: Total 29892 (delta 6401), reused 6979 (delta 5927), pack-reused 22327
Receiving objects: 100% (29892/29892), 84.18 MiB | 7.29 MiB/s, done.
Resolving deltas: 100% (22769/22769), done.
Updating files: 100% (903/903), done.
✅ GitHub repo cloned: 2029-hubspot-li