# Managing publishers

## 1. Context
Publisher names can contain special characters such as `&` and `'`. For this and other reasons, we store publishers in CKAN as hex-encoded strings. This notebook shows how to add and rename publishers.

In [None]:
import ckanapi
import codecs

## 2. Initialize CKAN

The `envs` dict has the same structure as the `ckan_credentials_secret` variable in Airflow: for each environment, it holds the keyword arguments (`address` and `apikey`) needed to construct a `ckanapi.RemoteCKAN` object.

In [None]:
envs = {
    "dev": {
        "address": "https://ckanadmin0.intra.dev-toronto.ca/",
        "apikey": ""
    },
    "qa": {
        "address": "https://ckanadmin0.intra.qa-toronto.ca/",
        "apikey": ""
    },
    "prod": {
        "address": "https://ckanadmin0.intra.prod-toronto.ca/",
        "apikey": ""
    },
}

for env, args in envs.items():
    envs[env] = ckanapi.RemoteCKAN(**args)
    
ckan = envs["prod"]
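
Since the `apikey` values above are left blank, one option is to fill them from environment variables rather than pasting keys into the notebook. The following is only a sketch: the variable names `CKAN_APIKEY_DEV`, `CKAN_APIKEY_QA`, and `CKAN_APIKEY_PROD` and the helper `build_envs` are hypothetical and not part of the original setup.

```python
import os

# Addresses copied from the cell above.
ADDRESSES = {
    "dev": "https://ckanadmin0.intra.dev-toronto.ca/",
    "qa": "https://ckanadmin0.intra.qa-toronto.ca/",
    "prod": "https://ckanadmin0.intra.prod-toronto.ca/",
}

def build_envs(environ=os.environ):
    """Return the envs dict, reading each API key from a (hypothetical) env var."""
    return {
        env: {
            "address": address,
            # Hypothetical variable name, e.g. CKAN_APIKEY_DEV; empty string if unset.
            "apikey": environ.get(f"CKAN_APIKEY_{env.upper()}", ""),
        }
        for env, address in ADDRESSES.items()
    }

envs = build_envs()
```

The resulting dict has the same shape as the hand-written one, so it can be passed to `ckanapi.RemoteCKAN(**args)` in the same way.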

## 3. Prep string conversion functions

In [None]:
def string_to_hex(s):
    return codecs.encode(s.encode('utf-8'), 'hex').decode("utf-8")

def hex_to_string(s):
    return codecs.decode(s, 'hex').decode('utf-8')
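
To see why hex encoding sidesteps the special-character problem, the round trip can be checked on a made-up publisher name. This sketch uses `str.encode(...).hex()` and `bytes.fromhex`, which give the same result as the `codecs` calls above for valid input. The name `Children's Services & Parks` is purely illustrative.

```python
def string_to_hex(s):
    # UTF-8 bytes rendered as lowercase hex digits
    return s.encode("utf-8").hex()

def hex_to_string(s):
    # Inverse of string_to_hex
    return bytes.fromhex(s).decode("utf-8")

sample = "Children's Services & Parks"  # illustrative name only
encoded = string_to_hex(sample)

# The encoded form contains only 0-9 and a-f, so `&` and `'` never reach CKAN.
assert set(encoded) <= set("0123456789abcdef")
# Decoding restores the original name exactly.
assert hex_to_string(encoded) == sample

print(string_to_hex("A&B"))  # 412642
```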

## 4. Logic for managing publishers

We store publishers in CKAN as vocab tags (vocabulary: `owner_division`) to ensure consistency.

### How to: add new publisher

1. Identify the vocabulary of publishers
2. Convert new publisher name to hex string
3. Add new tag to publisher vocab

### How to: update existing publisher name

1. Identify the vocabulary of publishers
2. Convert new publisher name to hex string
3. Add new tag to publisher vocab
4. Patch packages with old publisher to use new one
5. Delete old tag
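
The rename steps above can be sketched as a single function. This is an outline under stated assumptions, not a definitive implementation. `rename_publisher` is a hypothetical helper, `ckan` is expected to behave like a `ckanapi.RemoteCKAN` client, packages are assumed to store the decoded publisher name in `owner_division` (as the walkthrough in section 5 does), and a single search page is assumed to cover the whole catalogue.

```python
def string_to_hex(s):
    return s.encode("utf-8").hex()

def rename_publisher(ckan, old_name, new_name, vocab_name="owner_division"):
    """Sketch of the five rename steps; returns the names of patched packages."""
    # 1. Identify the vocabulary of publishers (raises StopIteration if absent).
    vocab = next(v for v in ckan.action.vocabulary_list() if v["name"] == vocab_name)
    tags = {t["name"]: t for t in vocab["tags"]}

    # 2. Convert both names to hex strings.
    old_hex, new_hex = string_to_hex(old_name), string_to_hex(new_name)
    if old_hex not in tags:
        raise ValueError(f"{old_name!r} not found in {vocab_name}")

    # 3. Add the new tag unless it already exists.
    if new_hex not in tags:
        ckan.action.tag_create(vocabulary_id=vocab["id"], name=new_hex)

    # 4. Patch packages that still use the old publisher.
    #    Assumption: one search page is enough for the whole catalogue.
    patched = []
    for p in ckan.action.package_search(rows=1000)["results"]:
        if p.get("owner_division") == old_name:
            ckan.action.package_patch(id=p["id"], owner_division=new_name)
            patched.append(p["name"])

    # 5. Delete the old tag.
    ckan.action.tag_delete(id=tags[old_hex]["id"])
    return patched
```

Adding a publisher without renaming corresponds to steps 1 to 3 only.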

## 5. Example: changing publisher name

### 5.1. Identify vocab containing the publishers (name: `owner_division`)

In [None]:
publisher_vocab = None

for v in ckan.action.vocabulary_list():
    if v["name"] == "owner_division":
        publisher_vocab = v
        break

assert publisher_vocab is not None, "owner_division vocabulary not found"

### 5.2. Get all publishers from vocab

Get the tag names and decode them from hex using the `hex_to_string` function defined in section 3.

In [None]:
encoded_publishers = [ x["name"] for x in publisher_vocab["tags"] ]

print(f"Example encoded publishers: {encoded_publishers[:2]}")


decoded_publishers = [ hex_to_string(x) for x in encoded_publishers ]

print(f"Example decoded publishers: {decoded_publishers[:2]}")
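
If the vocabulary might contain a tag that was added without hex encoding (an assumption; nothing above says this happens), decoding would raise on it. A defensive sketch that separates decodable tags from the rest, using a hypothetical helper `split_decodable`:

```python
def split_decodable(names):
    """Return (decoded, invalid): decoded names, and raw names that are not valid hex/UTF-8."""
    decoded, invalid = [], []
    for name in names:
        try:
            decoded.append(bytes.fromhex(name).decode("utf-8"))
        except ValueError:
            # Covers non-hex input and invalid UTF-8
            # (UnicodeDecodeError is a subclass of ValueError).
            invalid.append(name)
    return decoded, invalid

decoded, invalid = split_decodable(["412642", "not-hex"])
print(decoded, invalid)  # ['A&B'] ['not-hex']
```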

### 5.3. Check if new publisher is already in the list

For example:

In [None]:
new_publisher = "New publisher name" # for illustration only

assert new_publisher not in decoded_publishers, f"'{new_publisher}' already in list of publishers"

### 5.4. Convert new publisher to hex

In [None]:
encoded_new_publisher = string_to_hex(new_publisher)

print(f"New publisher: {new_publisher} | Converted publisher: {encoded_new_publisher}")

### 5.5. Create new publisher tag in vocab list

In [None]:
ckan.action.tag_create(
    vocabulary_id = publisher_vocab["id"],
    name = encoded_new_publisher
)

### 5.6. Loop through packages and update division where needed

In [None]:
previous_publisher = "Old publisher name"

assert previous_publisher in decoded_publishers, f"'{previous_publisher}' expected in owner_division vocab but not present"

for p in ckan.action.package_search(rows=100000)["results"]:
    # .get() avoids a KeyError for packages that have no owner_division set
    if p.get("owner_division") == previous_publisher:

        ckan.action.package_patch(
            id = p["id"],
            owner_division = new_publisher
        )
        
        print(p["name"], " | ", previous_publisher, "==>", new_publisher)
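
CKAN may cap the `rows` argument of `package_search` on the server side, in which case `rows=100000` would silently return only the first page of results. A hedged sketch of paging with the `rows` and `start` parameters follows. `iter_packages` is a hypothetical helper, and `search` stands in for `ckan.action.package_search`.

```python
def iter_packages(search, page_size=1000, **params):
    """Yield every package by paging through a package_search-style callable."""
    start = 0
    while True:
        page = search(rows=page_size, start=start, **params)["results"]
        if not page:
            return
        yield from page
        # Advance by what was actually returned, in case the server caps `rows`.
        start += len(page)

# Usage with the client from section 2 (not executed here):
# packages = list(iter_packages(ckan.action.package_search))
```

Collecting the packages into a list before patching is the safer order, because results may shift between pages if the search ordering depends on modification time.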

### 5.7. Delete old division from vocab

In [None]:
tag_to_delete = [ t for t in publisher_vocab["tags"] if t["name"] == string_to_hex(previous_publisher) ][0]

ckan.action.tag_delete(id=tag_to_delete["id"])