# This notebook helps you mark Rasgo datasets checked into your version control system as 'verified' when they exist on the main branch of your repository

## Get Rasgo datasets and create YAML file representations for each

In [None]:
# imports
import pyrasgo
import requests
import yaml

In [None]:
# pyrasgo connection
api_key = "..."
rasgo = pyrasgo.connect(api_key)

In [None]:
# cd into your git dir 
%cd path/to/your/git/dir

In [None]:
# get all Rasgo Datasets
# alternatively, get only the datasets whose "git_sync" status you want to set to False (i.e. Datasets awaiting approval in a PR)
datasets = rasgo.get.datasets(published_only=True, include_community=False)

# Rasgo dataset repo directory name
# At the root of your git directory, you should have a subdirectory that contains the YAML
# representations of your Rasgo datasets.
rasgo_ds_dir_name = "rasgo"

# generate and write out yaml file representation for each dataset
for ds in datasets:
    ds.generate_yaml(file_path=f"{rasgo_ds_dir_name}/{ds.resource_key}.yaml")
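The loop above writes into the `rasgo` subdirectory. Whether `generate_yaml` creates a missing directory is not stated here, so a minimal sketch of a path helper that creates it first may be useful. `yaml_path_for` is a hypothetical name introduced for illustration and is not part of pyrasgo.

```python
from pathlib import Path


def yaml_path_for(resource_key: str, base_dir: str = "rasgo") -> str:
    """Return the YAML file path for a resource key.

    Hypothetical helper (not part of pyrasgo). It creates base_dir if it
    does not exist yet, so a later write to the returned path cannot fail
    because of a missing directory.
    """
    directory = Path(base_dir)
    # parents=True and exist_ok=True make this safe to call repeatedly
    directory.mkdir(parents=True, exist_ok=True)
    return str(directory / f"{resource_key}.yaml")
```

With this helper, the call in the loop could become `ds.generate_yaml(file_path=yaml_path_for(ds.resource_key, rasgo_ds_dir_name))`.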

## Commit Rasgo YAML dataset representation to your repository

Next, you should:
1. Create a branch for your new dataset representations
2. Open a PR from your new branch to the repository's main branch
3. \[OPTIONAL\] Update your PR-ed datasets to show that they are currently waiting to be approved by setting the `git_sync` attribute to `False`

In [None]:
# Optional Step 3:
# Update dataset attributes so users know these datasets are in a PR and are currently waiting on review
for ds in datasets:
    rasgo.update.dataset(dataset=ds, attributes={'git_sync':'False'})

4. When PR-ed datasets have been reviewed and accepted, merge the PR to master/main

## Scan repository's main branch for datasets and set the `git_sync` attribute for each dataset found

Set `git_sync` to `True` on each dataset to show that its changes have been accepted and approved for consumption by users. Any dataset on the main/master branch is assumed to be valid and available for consumption.

This can be done either using the Bitbucket API or using standard git user access. 
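The rule described here (datasets on the main branch get `"True"`, datasets still in a PR get `"False"`) can be sketched as a small pure function that is easy to review before any Rasgo update is made. `plan_git_sync` and its parameters are hypothetical names used only for illustration, and the string values mirror the ones this notebook passes as attributes.

```python
from typing import Dict, Iterable


def plan_git_sync(main_keys: Iterable[str], pending_keys: Iterable[str]) -> Dict[str, str]:
    """Map each resource key to the git_sync value it should receive.

    Hypothetical helper: keys found on the main branch win over keys that
    are only pending in a PR.
    """
    # Start by marking everything that is awaiting review
    plan = {key: "False" for key in pending_keys}
    # Anything already on main is treated as approved and overrides "False"
    plan.update({key: "True" for key in main_keys})
    return plan
```

The resulting dictionary can be printed as a dry run and then applied with one `rasgo.update.dataset` call per entry.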

### Git CLI Method
Use git commands to get dataset files

To set up your local environment (or a Jupyter environment) with SSH/HTTPS git credentials, see [the Bitbucket docs](https://support.atlassian.com/bitbucket-cloud/docs/set-up-an-ssh-key/)

In [None]:
%cd ~/path/to/your/git/dir
# Check out the main/master branch and make sure it's up to date
!git checkout master 
!git pull

import os
# file names should be resource keys; match only names that end in ".yaml" and strip that suffix
resource_keys = [x[:-len('.yaml')] for x in os.listdir(rasgo_ds_dir_name) if x.endswith('.yaml')]

# If found, set dataset attribute for each
for ds_rk in resource_keys:
    ds = rasgo.get.dataset(resource_key=ds_rk)
    rasgo.update.dataset(dataset=ds, attributes={"git_sync": "True"})
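Deriving resource keys from file names is the step most likely to pick up stray files (backups, editor temp files). A minimal sketch of a stricter filter follows, assuming each dataset file is named `<resource_key>.yaml` as in this notebook. `resource_keys_from_filenames` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Iterable, List


def resource_keys_from_filenames(names: Iterable[str]) -> List[str]:
    """Return sorted resource keys for every name whose suffix is exactly .yaml.

    Hypothetical helper: a name such as "notes.yaml.bak" is skipped because
    its final suffix is ".bak", and any directory prefix is dropped.
    """
    keys = []
    for name in names:
        path = Path(name)
        if path.suffix == ".yaml":
            # stem is the file name without its final suffix
            keys.append(path.stem)
    return sorted(keys)
```

For example, `resource_keys_from_filenames(os.listdir(rasgo_ds_dir_name))` could replace the list comprehension in the cell above.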

### Bitbucket API Method
If you have to use the Bitbucket API, get a list of datasets to update.

In [None]:
# How to get an API token:
# set up an OAuth consumer in the Bitbucket workspace, see
# https://support.atlassian.com/bitbucket-cloud/docs/use-oauth-on-bitbucket-cloud/

oauth_key = "YOUR OAUTH KEY"
oauth_secret = "YOUR OAUTH SECRET"

# Get an authorization token from OAuth
# Bitbucket auth token access
data = {"grant_type": "client_credentials"}
token_request = requests.post("https://bitbucket.org/site/oauth2/access_token", data=data, auth=(oauth_key, oauth_secret))
api_token = token_request.json().get("access_token")

token: str = (
    f"Bearer {api_token}"
)

headers = {
    "Content-Type": "application/json",
    "Authorization": token
}
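If the token request fails, `.get("access_token")` returns `None` and the header silently becomes `Bearer None`. A minimal sketch of a guard is shown below. `build_auth_headers` is a hypothetical helper, and the `error` / `error_description` field names are an assumption based on the usual OAuth 2.0 error response shape.

```python
from typing import Dict, Mapping


def build_auth_headers(token_payload: Mapping[str, object]) -> Dict[str, str]:
    """Build request headers from a parsed token response, or fail loudly.

    Hypothetical helper. The "error" and "error_description" keys are an
    assumption about the error payload, not something this notebook confirms.
    """
    access_token = token_payload.get("access_token")
    if not access_token:
        reason = (
            token_payload.get("error_description")
            or token_payload.get("error")
            or "no access_token in response"
        )
        raise ValueError(f"Could not obtain an access token: {reason}")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {access_token}",
    }
```

It would be called as `headers = build_auth_headers(token_request.json())`.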


In [None]:
api_version = "2.0"

# For bitbucket cloud, use this server URL
bitbucket_cloud_api_host = f"https://api.bitbucket.org/{api_version}"

# Enter your repo's workspace name here. This needs to be the same workspace in which you set up the OAuth consumer in the cell above
bitbucket_workspace_name = "..."

# Enter your repo's slug here - if you don't know it, you can find it on your repository's page
bitbucket_repo_slug = "..."

# Choose the branch from which you'd like to get YAML files to set verification - the default assumption here is "main"
bitbucket_main_branch_name = "main"

get_url = f"{bitbucket_cloud_api_host}/repositories/{bitbucket_workspace_name}/{bitbucket_repo_slug}/src/{bitbucket_main_branch_name}/{rasgo_ds_dir_name}/"
resource_keys = []
while True:
    page = requests.get(get_url, headers=headers)

    if page.status_code != 200:
        raise ValueError("Issue retrieving data from Bitbucket API")

    body = page.json()
    results = body.get("values", [])
    # file names should be resource keys; strip any directory prefix from "path" and the ".yaml" suffix
    resource_keys.extend(
        [p["path"].rsplit("/", 1)[-1][: -len(".yaml")] for p in results if p["path"].endswith(".yaml")]
    )

    # this page wasn't empty, let's get the next page
    # (the "next" link is a key of the response body, not of the list of values)
    if results and (next_page_url := body.get("next")):
        get_url = next_page_url
    else:
        break


# If found, set dataset attribute for each
for ds_rk in resource_keys:
    ds = rasgo.get.dataset(resource_key=ds_rk)
    rasgo.update.dataset(dataset=ds, attributes={"git_sync": "True"})
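The paging logic can also be separated from the HTTP call so it can be checked without network access. The sketch below assumes each page is a dictionary with a `values` list of entries carrying a `path`, plus an optional `next` URL, which is the shape the loop above reads. `collect_yaml_keys` and `fetch_page` are hypothetical names.

```python
from typing import Callable, Dict, List, Optional


def collect_yaml_keys(fetch_page: Callable[[str], Dict], first_url: str) -> List[str]:
    """Follow "next" links and collect resource keys from .yaml paths.

    Hypothetical helper: fetch_page takes a URL and returns the parsed JSON
    body of that page as a dict.
    """
    keys: List[str] = []
    url: Optional[str] = first_url
    while url:
        body = fetch_page(url)
        for entry in body.get("values", []):
            path = entry.get("path", "")
            if path.endswith(".yaml"):
                # Keep only the file name, then drop the ".yaml" suffix
                file_name = path.rsplit("/", 1)[-1]
                keys.append(file_name[: -len(".yaml")])
        # A missing "next" key means this was the last page
        url = body.get("next")
    return keys
```

In the notebook, `fetch_page` could be a small function that calls `requests.get(url, headers=headers)`, then `raise_for_status()`, and returns `.json()`.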