# ScienceBase items to AGOL categories
## Create AGOL group categories from ScienceBase items

This code builds an [AGOL category schema](https://developers.arcgis.com/rest/users-groups-and-items/assign-category-schema.htm) from nested ScienceBase items. We send the schema to an AGOL group in order to create categories to which items can be assigned.

In [None]:
import sciencebasepy #https://github.com/usgs/sciencebasepy
from arcgis.gis import GIS
import requests
import re
import json
from owslib.wms import WebMapService  #https://geopython.github.io/OWSLib/
import stringcase #https://pypi.org/project/stringcase/
import urllib3 # to suppress warnings about lack of certificate verification 
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

Set the variables below as needed for your situation.

In [None]:
# ScienceBase id (SB_ID) in this example points to the ASC Data Backup Community
#sb_id = '56b3ee22e4b0cc79997fb64b'
sb_id = '5bf322ede4b045bfcae0c371'

# set the id of the AGOL group to which you want to add the category schema
agol_id = '69f3ec79b0b944dcadfa4cf9003371e3'

# SB login parameters
sb_user = ''
sb_pw = ''

# AGOL login parameters
ag_user = ''
ag_pw = '' 

The assignCategorySchema operation is called by sending information to a special URL. In the case of USGS AGOL groups, it takes the form below:

In [None]:
cs_url = 'https://usgs.maps.arcgis.com/sharing/rest/community/groups/{}/assignCategorySchema'.format(agol_id)

Functions:

In [None]:
def sb_url(item_id):
    # build a url to a ScienceBase item when the id is known
    # (parameter renamed from id to avoid shadowing the built-in)
    return 'https://www.sciencebase.gov/catalog/item/{}'.format(item_id)

In [None]:
def convert(name):
    # changes a name like aerialImagery to Aerial Imagery
    s1 = re.sub('(.)([A-Z][a-z]+)', r'\1 \2', name)
    s2 = re.sub('([a-z0-9])([A-Z])', r'\1 \2', s1)
    # collapse double spaces introduced when the name already contains spaces
    s3 = s2.replace('  ', ' ')
    return stringcase.capitalcase(s3)
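
To see what the two regular expressions do, here is a minimal standalone sketch of the same approach. It uses only the standard library, so the capitalization step is done by hand instead of with `stringcase`; the name `camel_to_title` is hypothetical and not part of this notebook.

```python
import re

def camel_to_title(name):
    # Insert a space before each capitalized word: aerialImagery -> aerial Imagery
    s1 = re.sub('(.)([A-Z][a-z]+)', r'\1 \2', name)
    # Insert a space between a lowercase letter or digit and a following capital
    s2 = re.sub('([a-z0-9])([A-Z])', r'\1 \2', s1)
    # Names that already contain spaces pick up a second one; collapse it
    s3 = s2.replace('  ', ' ')
    # Upper-case the first character only (assumed to match stringcase.capitalcase)
    return s3[:1].upper() + s3[1:]

for sample in ['aerialImagery', 'Aerial Imagery', 'lidarDEM']:
    print(sample, '->', camel_to_title(sample))
```

The second sample shows why the double-space replacement matters: a title that already contains a space would otherwise come back with two.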

In [None]:
def get_node(node_id):
    temp_obj = {}
    # recursively goes down all of the branches from the root id building out the category schema
    # https://developers.arcgis.com/rest/users-groups-and-items/group-category-schema.htm
    item_json = sb.get_json(sb_url(node_id))
    if item_json['hasChildren']:
        temp_obj['title'] = convert(item_json['title'])
        temp_obj['categories'] = [get_node(child_id) for child_id in sb.get_child_ids(node_id)]
        
    return clean_empty(temp_obj)
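
The recursion is easier to follow without a live ScienceBase session. The sketch below walks a made-up in-memory tree in the same way: items with children become categories, and items without children contribute nothing. The `FAKE_ITEMS` dictionary and the `build_node` function are hypothetical stand-ins, and unlike `get_node` this version filters empty results as it goes instead of cleaning them afterwards.

```python
# Hypothetical stand-in for ScienceBase: item id -> (title, list of child ids)
FAKE_ITEMS = {
    'root': ('Data Backup', ['a', 'b']),
    'a': ('Aerial Imagery', ['a1']),
    'a1': ('Tile 001', []),
    'b': ('Readme', []),
}

def build_node(node_id, items=FAKE_ITEMS):
    title, child_ids = items[node_id]
    if not child_ids:
        # A leaf item yields no category, mirroring the hasChildren test
        return {}
    node = {'title': title}
    # Recurse into each child and keep only the non-empty results
    children = [c for c in (build_node(cid, items) for cid in child_ids) if c]
    if children:
        node['categories'] = children
    return node

print(build_node('root'))
```

Here only `root` and `a` have children, so the result contains the root title with a single nested category.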

In [None]:
def clean_empty(d):
    # get_node can produce empty json keys and values and I don't know how to fix that there, so we'll build
    # a dirty dictionary and then clean it
    if not isinstance(d, (dict, list)):
        return d
    if isinstance(d, list):
        return [v for v in (clean_empty(v) for v in d) if v]
    return {k: v for k, v in ((k, clean_empty(v)) for k, v in d.items()) if v}
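
As a quick illustration of the cleaning step, the sketch below repeats the function and runs it on a small made-up dictionary of the kind the recursive walk can produce. The sample data is invented for the example.

```python
def clean_empty(d):
    # Scalars pass through unchanged
    if not isinstance(d, (dict, list)):
        return d
    # Lists: clean each element, then drop the ones that came back empty
    if isinstance(d, list):
        return [v for v in (clean_empty(v) for v in d) if v]
    # Dicts: clean each value, then drop keys whose value came back empty
    return {k: v for k, v in ((k, clean_empty(v)) for k, v in d.items()) if v}

dirty = {
    'title': 'Root',
    'categories': [
        {},                                      # leaf item, no title
        {'title': 'Child', 'categories': [{}]},  # child with only leaf items
    ],
}
cleaned = clean_empty(dirty)
print(cleaned)
```

Note that the filter tests truthiness, so it would also drop values such as `0` or an empty string. That is harmless here because the schema only holds titles and lists.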
        

In [None]:
def prune(tree, max_depth, current=0):
    # AGOL category schemas can only be 4 levels deep. Use this function to trim the depth of the 
    # nested lists of dictionaries before turning it into json
    # (max_depth is zero-based and replaces the old name max, which shadowed the built-in)
    for n in tree:
        if 'categories' in n:
            if current == max_depth:
                del n['categories']
            else:
                prune(n['categories'], max_depth, current + 1)
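
The depth argument is zero-based, which is easy to misread. The sketch below applies the same logic to a made-up four-level tree; `prune_to_depth` is a hypothetical name used so the example stands alone.

```python
import copy

def prune_to_depth(tree, max_depth, current=0):
    # Same logic as prune: drop 'categories' from nodes sitting at max_depth
    for node in tree:
        if 'categories' in node:
            if current == max_depth:
                del node['categories']
            else:
                prune_to_depth(node['categories'], max_depth, current + 1)

# Made-up tree, four levels deep: A > B > C > D
tree = [{'title': 'A', 'categories': [
    {'title': 'B', 'categories': [
        {'title': 'C', 'categories': [{'title': 'D'}]}]}]}]

shallow = copy.deepcopy(tree)  # pruning works in place, so copy first
prune_to_depth(shallow, 1)
print(shallow)
```

With a limit of 1, two levels survive (levels 0 and 1), so a limit of 3 keeps four levels. The deep copy is only there to keep the original tree for comparison.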

In [None]:
def sort_list(items):
    # sort a list of category dictionaries by title, in place, then recurse into the children
    items.sort(key=lambda k: k['title'])
    for n in items:
        if 'categories' in n:
            sort_list(n['categories'])
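
One thing to be aware of is that a plain string sort is case-sensitive, so every capitalized title sorts ahead of every lowercase one. The sketch below shows the difference on made-up data; `sort_tree` and its `fold_case` flag are hypothetical and only illustrate one possible variation.

```python
def sort_tree(items, fold_case=False):
    # Optionally compare lower-cased titles so case does not affect the order
    if fold_case:
        key = lambda k: k['title'].lower()
    else:
        key = lambda k: k['title']
    items.sort(key=key)
    # Recurse so every nested list is sorted the same way
    for node in items:
        if 'categories' in node:
            sort_tree(node['categories'], fold_case)

sample = [
    {'title': 'apple'},
    {'title': 'Zebra', 'categories': [{'title': 'b'}, {'title': 'A'}]},
]

sort_tree(sample)
default_order = [n['title'] for n in sample]

sort_tree(sample, fold_case=True)
folded_order = [n['title'] for n in sample]
print(default_order, folded_order)
```

Titles that have all been passed through `convert` start with a capital letter, so the default ordering is usually fine for them.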

In [None]:
# returns the user token and the ssl value, in that order
def getToken(user, pw):
    data = {'username': user,
            'password': pw,
            'referer': 'https://www.arcgis.com',
            'f': 'json'}
    url = 'https://www.arcgis.com/sharing/rest/generateToken'
    # verify=False skips certificate verification and triggers InsecureRequestWarning
    jres = requests.post(url, data=data, verify=False).json()
    return jres['token'], jres['ssl']
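
If the login fails, the lookup of `'token'` raises a bare `KeyError`, which hides the reason. The sketch below shows one way to surface it, assuming the service reports failures as a JSON object with an `error` key (the usual ArcGIS REST convention). The helper `extract_token` and both sample responses are made up for illustration.

```python
def extract_token(jres):
    # jres is the decoded JSON body returned by the generateToken request
    if 'error' in jres:
        err = jres['error']
        raise RuntimeError('generateToken failed: {} {}'.format(
            err.get('code'), err.get('message')))
    # Same return order as getToken: token first, then the ssl flag
    return jres['token'], jres.get('ssl')

# Invented responses, shaped like a success and a failure
ok_response = {'token': 'abc123', 'expires': 0, 'ssl': True}
bad_response = {'error': {'code': 400, 'message': 'Unable to generate token.'}}

print(extract_token(ok_response))
try:
    extract_token(bad_response)
except RuntimeError as exc:
    print(exc)
```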

Start by opening a ScienceBase session. Logging in is not required for public items, but the connection seems more robust if you do. Use `sb.loginc(sb_user)` to log in interactively if you don't want to save the password in the script.

In [None]:
sb = sciencebasepy.SbSession()
sb.login(sb_user, sb_pw)

The first step is to walk the children of the parent item, building nested Python dictionaries.

In [None]:
sb_items = get_node(sb_id)

Check the results.

In [None]:
print(json.dumps(sb_items, indent=2, separators=(',', ': ')))

Now we have to clean up this list of dictionaries a bit. First, `get_node` returns all of the child dictionaries nested under a parent dictionary whose `'title'` is the name of the repo itself. In this example, I don't want that title to become a group category at AGOL. I only want the titles of the children to become categories, so we rewrite `sb_items` to be just `sb_items['categories']`.

In [None]:
sb_items = sb_items['categories']

Second, make sure all lists of dictionaries are sorted by `'title'`.

In [None]:
sort_list(sb_items)

Check the results.

In [None]:
print(json.dumps(sb_items, indent=2, separators=(',', ': ')))

Third, prune the tree depth, if necessary, to meet AGOL requirements. Category schemas can only be 4 levels deep.

In [None]:
prune(sb_items, 3)

Check the results.

In [None]:
print(json.dumps(sb_items, indent=2, separators=(',', ': ')))

Lastly, wrap everything in a `categorySchema` dictionary and append the list of dictionaries. This is the JSON structure AGOL expects.

In [None]:
cs =  {'categorySchema': [{'title': 'Categories', 'categories':[]}]}
for obj in sb_items:
    cs['categorySchema'][0]['categories'].append(obj)
cs_json = json.dumps(cs)

Check the results.

In [None]:
print(json.dumps(cs, indent=2, separators=(',', ': ')))

Log in to AGOL by getting a token. The request in `getToken` is sent with `verify=False`, which triggers an InsecureRequestWarning. I don't know how to deal with that except to suppress it, which the import cell at the top does.

In [None]:
token = getToken(ag_user, ag_pw)

Create the parameters for our POST request.

In [None]:
params = {
    'f': 'json',
    'token': token[0],
    'categorySchema': cs_json
}

Now try to upload the schema. This overwrites any schema that already exists on the group.

In [None]:
# send the parameters in the POST body so the token and schema stay out of the URL
data = requests.post(cs_url, data=params)

Print out the results of the operation.

In [None]:
print(data.json())