# CPDCTL Samples for Notebooks and Environments in Spaces

CPDCTL is a command-line interface (CLI) that you can use to manage the lifecycle of notebooks. By using the notebook CLI, you can automate the flow for creating notebooks, running notebook jobs, and promoting notebooks from a project to a space.

This notebook first shows you how to install and configure CPDCTL, and then provides three sections with examples of how to use the commands for:

- Creating notebooks and running notebook jobs
- Creating code packages and running code package jobs
- Promoting notebooks from a project to a space

## Table of Contents

[1. Installing and configuring CPDCTL](#part1)
- [1.1 Installing the latest version of CPDCTL](#part1.1)
- [1.2 Adding CPD cluster configuration settings](#part1.2)

[2. Demo 1: Creating a notebook asset and running a job](#part2)
- [2.1 Creating a notebook asset](#part2.1)
- [2.2 Running a job](#part2.2)

[3. Demo 2: Creating a code package asset and running a job](#part3)
- [3.1 Creating a code package asset](#part3.1)
- [3.2 Running a job](#part3.2)

[4. Demo 3: Promoting a notebook from a project to a space](#part4)

## Before you begin
Import the following libraries:

In [1]:
import base64
import json
import os
import requests
import platform
import tarfile
import zipfile
from IPython.display import display, HTML

##  1. Installing and configuring CPDCTL <a class="anchor" id="part1"></a>

### 1.1 Installing the latest version of CPDCTL <a class="anchor" id="part1.1"></a>

To use the notebook and environment CLI commands, you need to install CPDCTL. The binaries are published in the [CPDCTL GitHub repository](https://github.com/IBM/cpdctl/releases).

The following cells download the binary for your platform from the latest release and then display the version number:

In [2]:
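PLATFORM = platform.system().lower()        # e.g. "linux", "darwin", "windows"
CPDCTL_ARCH = "{}_amd64".format(PLATFORM)   # assumes an amd64 (x86_64) machine
CPDCTL_RELEASES_URL = "https://api.github.com/repos/IBM/cpdctl/releases/latest"
CWD = os.getcwd()
PATH = os.environ['PATH']
CPD_CONFIG = os.path.join(CWD, '.cpdctl.config.yml')

# Get the metadata, including the downloadable assets, of the latest published release
response = requests.get(CPDCTL_RELEASES_URL)
response.raise_for_status()
assets = response.json()['assets']

# Pick the archive whose name contains the platform string, e.g. "linux_amd64"
platform_asset = next(a for a in assets if CPDCTL_ARCH in a['name'])
cpdctl_url = platform_asset['url']
cpdctl_file_name = platform_asset['name']

# The asset API URL returns the file itself when you request application/octet-stream
response = requests.get(cpdctl_url, headers={'Accept': 'application/octet-stream'})
response.raise_for_status()
with open(cpdctl_file_name, 'wb') as f:
    f.write(response.content)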
    
display(HTML('<code>cpdctl</code> binary downloaded from: <a href="{}">{}</a>'.format(platform_asset['browser_download_url'], platform_asset['name'])))
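The cell above hard-codes `amd64` in `CPDCTL_ARCH`. A minimal sketch of deriving the architecture suffix from `platform.machine()` instead, assuming that release asset names follow the same `<os>_<arch>` pattern the cell matches on; the `arm64` mapping is an assumption, so check the release page for the asset names that actually exist. The helper names are hypothetical:

```python
import platform

# Map common platform.machine() values to the arch suffixes assumed to
# appear in cpdctl release asset names (verify against the release page).
_ARCH_ALIASES = {
    "x86_64": "amd64",
    "amd64": "amd64",   # Windows reports "AMD64"
    "arm64": "arm64",
    "aarch64": "arm64",
}


def cpdctl_arch(system=None, machine=None):
    """Return an '<os>_<arch>' string such as 'linux_amd64'."""
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    arch = _ARCH_ALIASES.get(machine)
    if arch is None:
        raise ValueError("unsupported machine type: {}".format(machine))
    return "{}_{}".format(system, arch)


def pick_asset(assets, arch):
    """Return the first release asset whose name contains the arch string."""
    matches = [a for a in assets if arch in a["name"]]
    if not matches:
        raise LookupError("no release asset matches {}".format(arch))
    return matches[0]
```

With these helpers, the selection line becomes `platform_asset = pick_asset(assets, cpdctl_arch())`, and an unsupported platform fails with a readable error instead of a bare `StopIteration`.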

In [3]:
%%capture

%env PATH={CWD}:{PATH}
%env CPD_CONFIG={CPD_CONFIG}

In [4]:
# Unpack the downloaded archive into the current directory
if cpdctl_file_name.endswith('tar.gz'):
    with tarfile.open(cpdctl_file_name, "r:gz") as tar:
        tar.extractall()
elif cpdctl_file_name.endswith('zip'):
    with zipfile.ZipFile(cpdctl_file_name, 'r') as zf:
        zf.extractall()

# Start from a clean configuration file
if CPD_CONFIG and os.path.exists(CPD_CONFIG):
    os.remove(CPD_CONFIG)

# Capture the output of `cpdctl version` as a string
version_r = ! cpdctl version
CPDCTL_VERSION = version_r.s

print("cpdctl version: {}".format(CPDCTL_VERSION))

cpdctl version: 1.1.132


### 1.2 Adding CPD cluster configuration settings <a class="anchor" id="part1.2"></a>

Before you can use CPDCTL, you need to add configuration settings. You only need to configure these settings once for the same IBM Cloud Pak for Data (CPD) user and cluster. Begin by entering your CPD credentials and the URL to the CPD cluster:

In [5]:
# Replace these placeholders with your own values
CPD_USER_NAME = 'YOUR_CPD_USER_NAME'
CPD_USER_PASSWORD = 'YOUR_CPD_PASSWORD'
CPD_URL = 'https://YOUR_CPD_CLUSTER_URL'

Add the `cpd_user` user to the cpdctl configuration:

In [6]:
! cpdctl config user set cpd_user --username {CPD_USER_NAME} --password {CPD_USER_PASSWORD}

Add a `cpd` profile that points to the CPD cluster:

In [7]:
! cpdctl config profile set cpd --url {CPD_URL}

Add a `cpd` context that combines the `cpd` profile and the `cpd_user` user:

In [8]:
! cpdctl config context set cpd --profile cpd --user cpd_user
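The `!` cells above pass the password through a shell, so a password that contains characters such as `$`, spaces, or quotes can be altered before cpdctl sees it. A minimal sketch that builds the same three `cpdctl config` commands as argument lists, which `subprocess.run` executes without a shell; the flags are exactly the ones used above, and the helper names are hypothetical:

```python
import subprocess


def cpdctl_config_commands(user_name, password, url,
                           user="cpd_user", profile="cpd", context="cpd"):
    """Build the cpdctl config commands used above as argument lists."""
    return [
        ["cpdctl", "config", "user", "set", user,
         "--username", user_name, "--password", password],
        ["cpdctl", "config", "profile", "set", profile, "--url", url],
        ["cpdctl", "config", "context", "set", context,
         "--profile", profile, "--user", user],
    ]


def run_commands(commands):
    """Run each command without a shell; raise CalledProcessError on failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

To apply the configuration, call `run_commands(cpdctl_config_commands(CPD_USER_NAME, CPD_USER_PASSWORD, CPD_URL))`. Each argument reaches cpdctl verbatim, so no quoting is needed.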

List available contexts:

In [9]:
! cpdctl config context list

[1mName[0m                          [1mProfile[0m                       [1mUser[0m                       [1mCurrent[0m   
[36;1minClusterEnvironmentContext[0m   inClusterEnvironmentProfile   inClusterEnvironmentUser   *   


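Switch to the context that you want to use if it is not already marked in the `Current` column. This example switches to `inClusterEnvironmentContext`, the context shown in the list above; to use the context you just created instead, run `cpdctl config context use cpd`.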

In [10]:
! cpdctl config context use inClusterEnvironmentContext

Switched to context "inClusterEnvironmentContext".
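cpdctl colors its table output, such as the context list, with ANSI escape sequences, and those sequences can end up in text that you capture with `!`. The cells in this notebook avoid the issue by requesting `--output json`. If you do need the table text, a minimal sketch that strips the color and style (SGR) codes with a regular expression; the helper names are hypothetical:

```python
import re

# Matches ANSI SGR sequences such as ESC[1m, ESC[36;1m, and ESC[0m.
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def strip_ansi(text):
    """Remove ANSI color/style codes from a string."""
    return ANSI_SGR.sub("", text)


def strip_ansi_lines(lines):
    """Apply strip_ansi to each captured output line."""
    return [strip_ansi(line) for line in lines]
```

For example, capture `lines = ! cpdctl config context list` and then call `strip_ansi_lines(lines)`.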


List the spaces that are available in the current context:

In [12]:
! cpdctl space list

...
ID                                     Name                                                 Created                    Description                           State    Tags
f1c213be-597f-4c32-bcd5-9e4344ead75c   AutoAI-TD-Sub-Deployment-Space                       2022-01-25T01:59:19.769Z                                         active   []
83de1cda-6129-4f97-a464-b71ec5224393   r-shiny-test-space                                   2022-03-23T22:04:12.062Z                                         active   []
ef5ae6e4-17be-4e46-af04-6a8f6ace3eae   julian-test                                          2022-03-29T09:00:24.454Z                                         active   []
cd4e2877-1afc-4d02-a2cb-386020912e44   openscale-express-path-00000000-0000-0000-0000-16…   2022-03-30T15:10:33.888Z                                         active   []
2e1d6881-c039-4894-8fa2-220a01c9d4a

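Choose the space in which you want to work. The following cell picks the first space in the list; you can also override it with a specific space ID, as this example does: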

In [13]:
result = ! cpdctl space list --output json -j "(resources[].metadata.id)[0]" --raw-output
space_id = result.s
print("space id: {}".format(space_id))

# You can also specify your space id directly:
space_id = "e5a47063-9735-48fc-bd1a-45ed88ebf425"
print("space id: {}".format(space_id))

space id: f1c213be-597f-4c32-bcd5-9e4344ead75c
space id: e5a47063-9735-48fc-bd1a-45ed88ebf425


## 2. Demo 1: Creating a notebook asset and running a job <a class="anchor" id="part2"></a>

Before starting with this section, ensure that you have run the cells in [Section 1](#part1) and specified the ID of the space in which you will work.

Suppose you have a Jupyter Notebook (.ipynb) file on your local system and you would like to run the code in the file as a job on a CPD cluster. This section shows you how to create a notebook asset and run a job on a CPD cluster. 

### 2.1 Creating a notebook asset<a class="anchor" id="part2.1"></a>

First, create a notebook asset in your space. To create a notebook asset, you need to specify:
- The environment in which your notebook runs
- A notebook file (.ipynb)

List all the environments in your space, filter them by display name, and get the ID of the environment in which your notebook will run:

In [14]:
environment_name = "Default Python 3.8"
query_string = "(resources[?entity.environment.display_name == '{}'].metadata.asset_id)[0]".format(environment_name)
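The `-j` option takes a JMESPath expression. For readers unfamiliar with the syntax, a plain-Python sketch of what this particular query does, run against a hypothetical, trimmed-down response that contains only the fields the query touches (the real `cpdctl environment list` output contains more fields):

```python
def first_env_id_by_name(response, display_name):
    """Plain-Python equivalent of the JMESPath query
    (resources[?entity.environment.display_name == '<name>'].metadata.asset_id)[0]
    """
    # The [?...] filter keeps matching resources; .metadata.asset_id projects the IDs.
    ids = [
        r["metadata"]["asset_id"]
        for r in response.get("resources", [])
        if r.get("entity", {}).get("environment", {}).get("display_name") == display_name
    ]
    # In JMESPath, [0] on an empty list yields null; mirror that with None.
    return ids[0] if ids else None


# Hypothetical, trimmed response for illustration only.
sample = {
    "resources": [
        {"metadata": {"asset_id": "env-a"},
         "entity": {"environment": {"display_name": "Some other environment"}}},
        {"metadata": {"asset_id": "env-b"},
         "entity": {"environment": {"display_name": "Default Python 3.8"}}},
    ]
}
```

The parentheses matter: they end the projection, so `[0]` picks the first element of the filtered list rather than being applied to each element.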

In [15]:
result = ! cpdctl environment list --space-id {space_id} --output json -j "{query_string}" --raw-output
env_id = result.s
print("environment id: {}".format(env_id))

# You can also specify your environment id directly:
# env_id = "Your environment ID"

environment id: jupconda38-e5a47063-9735-48fc-bd1a-45ed88ebf425


Upload the .ipynb file:

In [18]:
remote_file_path = "notebook/cpdctl-test-notebook.ipynb"
local_file_path = "my-new-notebook.ipynb"

In [19]:
! cpdctl asset file upload --path {remote_file_path} --file {local_file_path} --space-id {space_id}

...
[32;1mOK[0m


Create a notebook asset:

In [20]:
file_name = "cpdctl-test-notebook.ipynb"
runtime = {
    'environment': env_id
}
runtime_json = json.dumps(runtime)
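In the `!` commands, the JSON string is wrapped in single quotes. This works as long as the JSON itself contains no single quote, for example in a name or description. A minimal sketch of quoting it with the standard library's `shlex.quote` instead, which produces a single word for POSIX shells (it does not apply to Windows `cmd`); the helper name is hypothetical:

```python
import json
import shlex


def shell_json_arg(obj):
    """Serialize obj to JSON and quote it as a single POSIX shell word."""
    return shlex.quote(json.dumps(obj))


runtime_arg = shell_json_arg({"environment": "my-env-id"})
tricky_arg = shell_json_arg({"description": "Bob's notebook"})
```

When you use this helper, drop the surrounding quotes in the command, for example `--runtime {runtime_arg}` instead of `--runtime '{runtime_json}'`.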

In [21]:
result = ! cpdctl notebook create --file-reference {remote_file_path} --name {file_name} --space {space_id} --runtime '{runtime_json}' --output json -j "metadata.asset_id" --raw-output
notebook_id = result.s
print("notebook id: {}".format(notebook_id))

notebook id: 75dcacd5-30f3-4f44-8f5b-7a0973ba9e4b


### 2.2 Running a job <a class="anchor" id="part2.2"></a>

To create a notebook job, you need to give your job a name, add a description, and pass the notebook ID and environment ID you determined in [2.1](#part2.1). Additionally, you can add environment variables that will be used in your notebook:

In [22]:
job_name = "cpdctl-test-job"
job = {
    'asset_ref': notebook_id, 
    'configuration': {
        'env_id': env_id, 
        'env_variables': [
            'foo=1', 
            'bar=2'
        ]
    }, 
    'description': 'my job', 
    'name': job_name
}
job_json = json.dumps(job)

In [23]:
result = ! cpdctl job create --job '{job_json}' --space-id {space_id} --output json -j "metadata.asset_id" --raw-output
job_id = result.s
print("job id: {}".format(job_id))

job id: 20a33a05-e5bc-495d-bc44-19a7c71945f2
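Inside the notebook that the job runs, the `env_variables` entries are expected to be available as ordinary process environment variables. A minimal sketch of reading `foo` and `bar` defensively in that notebook; the default values are hypothetical fallbacks for interactive runs where the variables are not set. Values arrive as strings, so convert them explicitly:

```python
import os


def read_job_settings(environ=None):
    """Read the foo/bar variables passed in the job configuration."""
    environ = os.environ if environ is None else environ
    # Hypothetical defaults used when the notebook runs outside a job.
    foo = int(environ.get("foo", "0"))
    bar = int(environ.get("bar", "0"))
    return foo, bar
```

In the job notebook, call `foo, bar = read_job_settings()`. Passing a plain dictionary makes the function easy to test.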


Run the notebook job. You can also pass environment variables for this specific run:

In [24]:
job_run = {
    'configuration': {
        'env_variables': [
            'key1=value1', 
            'key2=value2'
        ]
    }
}
job_run_json = json.dumps(job_run)

In [25]:
result = ! cpdctl job run create --space-id {space_id} --job-id {job_id} --job-run '{job_run_json}' --output json -j "metadata.asset_id" --raw-output
run_id = result.s
print("run id: {}".format(run_id))

run id: 8d485b2b-f027-4559-9879-dbd5fb258fcb
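A job run starts asynchronously, so its logs may be incomplete if you fetch them right away. A minimal, generic sketch of polling until a run reaches a terminal state. `get_state` is a hypothetical callable that you supply (for example, one that queries the run through cpdctl), and the set of terminal state names is an assumption; adjust it to the states that your cluster reports:

```python
import time

# Assumed terminal states; adjust to the state names your cluster reports.
TERMINAL_STATES = {"Completed", "CompletedWithErrors", "CompletedWithWarnings",
                   "Failed", "Canceled"}


def wait_for_run(get_state, interval=10.0, timeout=1800.0, sleep=time.sleep):
    """Call get_state() until it returns a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "run still in state {!r} after {}s".format(state, timeout))
        sleep(interval)
```

The `sleep` parameter is injectable so that the loop can be tested without waiting.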


You can see the output of each cell in your .ipynb file by listing job run logs:

In [26]:
! cpdctl job run logs --job-id {job_id} --run-id {run_id} --space-id {space_id}

...

Cell 6:

Cell 9:
cpdctl version: 1.1.132

Cell 19:
[1mName[0m                          [1mProfile[0m                       [1mUser[0m                       [1mCurrent[0m   
[36;1minClusterEnvironmentContext[0m   inClusterEnvironmentProfile   inClusterEnvironmentUser   *   

Cell 21:
Switched to context "inClusterEnvironmentContext".

Cell 23:
...

[1mID[0m                                     [1mName[0m                                                [1mCreated[0m                    [1mDescription[0m                                          [1mTags[0m   
[36;1m0619c2d3-2b75-42f7-97c1-9d898fdef44c[0m   Mortgage default project                            2022-02-11T12:02:58.633Z                                                        []   
[36;1m19a29ada-3e9c-4147-bf35-4cf431ed3a26[0m   AutoAI-TD-Sub                                       2022-01-25T01:20:57.078Z   The classification goal to train a model that can…   []   
[36;1m25fc9237-bda9-42f9-ab83-124bb92b3

## 3. Demo 2: Creating a code package asset and running a job <a class="anchor" id="part3"></a>

Before starting with this section, ensure that you have run the cells in [Section 1](#part1) and specified the ID of the space in which you will work.

A code package is a way of organizing a set of dependent files in a folder structure. For example, a code package can contain a notebook file that calls other notebook files or functions in script files.

Suppose you have a ZIP file of this folder structure on your local system and would like to run the code in the folder as a job on a CPD cluster. This section shows you how to create and register a code package asset in a deployment space and run the files in the code package asset as a job.
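If your files are not zipped yet, a minimal sketch of packaging a local folder with the standard `zipfile` module. Paths inside the archive are stored relative to the folder, so `my_folder/test.ipynb` is stored as `test.ipynb`; keep the layout consistent with the entrypoint that you set for the job later. The function name is hypothetical:

```python
import os
import zipfile


def zip_folder(folder, zip_path):
    """Zip every file under folder, storing paths relative to folder.

    Write zip_path outside folder, or the archive would try to include itself.
    """
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(folder):
            for name in files:
                full_path = os.path.join(root, name)
                # Store e.g. "utils/helpers.py" rather than the absolute path.
                arcname = os.path.relpath(full_path, folder)
                zf.write(full_path, arcname)
    return zip_path
```

`zipfile` stores forward slashes in archive names on every platform, so the resulting paths look the same on Windows and Linux.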

### 3.1 Creating a code package asset<a class="anchor" id="part3.1"></a>

Upload the .zip file:

In [31]:
remote_file_path = "code_package/cpdctl-test-code-package.zip"
local_file_path = "JupyterLabs-R-studio-Git-1-master.zip"

In [32]:
! cpdctl asset file upload --path {remote_file_path} --file {local_file_path} --space-id {space_id}

...
[32;1mOK[0m


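Create a code package asset. First, set the `CPDCTL_ENABLE_CODE_PACKAGE` environment variable, which enables the `cpdctl code-package` commands. In this example, the code package asset is given the same name as the uploaded ZIP file.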

In [33]:
os.environ["CPDCTL_ENABLE_CODE_PACKAGE"] = "true"

In [34]:
file_name = "cpdctl-test-code-package.zip"

In [35]:
result = ! cpdctl code-package create --file-reference {remote_file_path} --name {file_name} --space-id {space_id} --output json -j "metadata.asset_id" --raw-output
code_package_id = result.s
print("code package id: {}".format(code_package_id))

code package id: 7a4f781a-54da-454b-bc71-e067da915752


### 3.2 Running a job <a class="anchor" id="part3.2"></a>

List all the environments in your space, filter them by their display name and get the ID of the environment in which your code package will be run:

In [36]:
environment_name = "Default Python 3.8"
query_string = "(resources[?entity.environment.display_name == '{}'].metadata.asset_id)[0]".format(environment_name)

In [37]:
result = ! cpdctl environment list --space-id {space_id} --output json -j "{query_string}" --raw-output
env_id = result.s
print("environment id: {}".format(env_id))

# You can also specify your environment id directly:
# env_id = "Your environment ID"

environment id: jupconda38-e5a47063-9735-48fc-bd1a-45ed88ebf425


To create a code package job, give the job a name and a description, set an entrypoint (the file in the code package that the job runs), and pass the code package ID and the environment ID. Additionally, you can add environment variables that will be used in your code:

In [38]:
job_name = "cpdctl-test-code-package-job"
job = {
    'asset_ref': code_package_id, 
    'configuration': {
        'env_id': env_id, 
        'env_variables': [
            'foo=1', 
            'bar=2'
        ],
        'entrypoint': "test.ipynb"
    }, 
    'description': 'my code package job', 
    'name': job_name
}
job_json = json.dumps(job)

In [39]:
result = ! cpdctl job create --job '{job_json}' --space-id {space_id} --output json -j "metadata.asset_id" --raw-output
job_id = result.s
print("job id: {}".format(job_id))

job id: 729ddec8-2d52-4fdf-87cb-6557497b3436
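The `entrypoint` is given as a path inside the code package. A minimal sketch of checking, before you create the job, that this path actually exists in the local ZIP file. Archives downloaded from GitHub wrap everything in a top-level `<repo>-<branch>/` folder, which changes the path. The function name is hypothetical:

```python
import zipfile


def find_entrypoint(zip_path, entrypoint):
    """Return entrypoint if it is in the archive; otherwise raise with close matches."""
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
    if entrypoint in names:
        return entrypoint
    # Suggest entries with the same file name, e.g. under a top-level folder.
    basename = entrypoint.rsplit("/", 1)[-1]
    candidates = [n for n in names if n.endswith("/" + basename)]
    raise FileNotFoundError(
        "{} not found in {}; similar entries: {}".format(entrypoint, zip_path, candidates))
```

For this example, `find_entrypoint(local_file_path, "test.ipynb")` either confirms the path or lists the candidates to use instead.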


Run the code package job:

In [40]:
job_run = {
    'configuration': {
        'env_variables': [
            'key1=value1', 
            'key2=value2'
        ]
    }
}
job_run_json = json.dumps(job_run)

In [41]:
result = ! cpdctl job run create --space-id {space_id} --job-id {job_id} --job-run '{job_run_json}' --output json -j "metadata.asset_id" --raw-output
run_id = result.s
print("run id: {}".format(run_id))

run id: FAILED                      ID:            ad402ef0-8e25-40cd-84d0-0e5564a89354    Name:          Notebook Job    Created:       2022-04-03T22:14:31Z    Description:       State:         Failed    Tags:          []   
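In the run above, the captured text is not a bare ID: the run failed and cpdctl printed the run details, so `run_id` holds a whole table flattened into one string. A minimal sketch of a guard that extracts a single UUID from captured output and fails loudly otherwise, assuming that run IDs are standard UUIDs like the ones shown in this notebook; the function name is hypothetical:

```python
import re

UUID_RE = re.compile(
    r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b",
    re.IGNORECASE)


def extract_single_uuid(text):
    """Return the only UUID in text; raise ValueError if there is not exactly one."""
    found = UUID_RE.findall(text)
    if len(found) != 1:
        raise ValueError(
            "expected one ID, found {}: {!r}".format(len(found), text[:200]))
    return found[0]
```

Using `run_id = extract_single_uuid(result.s)` still yields a usable ID for fetching the logs of a failed run, while output without any ID raises an error immediately.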


You can see the output of each cell in the entrypoint notebook by listing the job run logs:

In [50]:
! cpdctl job run logs --job-id {job_id} --run-id {run_id} --space-id {space_id}

...

Cell 1:
0
1
2
3
4




## 4. Demo 3: Promoting a notebook from a project to a space <a class="anchor" id="part4"></a>

Before starting with this section, ensure that you have run the cells in [Section 1](#part1) and specified the ID of the space in which you will work.

Suppose you have a notebook in a project and would like to promote a specific version of this notebook to a space. This section shows you how to promote a notebook from a project to a space on a CPD cluster.

Choose a project from which you will promote your notebook:

In [51]:
result = ! cpdctl project list --output json -j "(resources[].metadata.guid)[0]" --raw-output
project_id = result.s
print("project id: {}".format(project_id))

# You can also specify your project id directly:
# project_id = "Your project ID"

project id: 0f5a1f58-7fdc-4a34-ad75-28c5b122758a


Specify the notebook you would like to promote. The following cell selects the first notebook asset found in the project:

In [52]:
result = ! cpdctl asset search --type-name notebook --query "asset.asset_type:notebook" --project-id {project_id} --output json -j "(results[].metadata.asset_id)[0]" --raw-output
notebook_id_in_project = result.s
print("notebook id in project: {}".format(notebook_id_in_project))

# You can also specify your notebook id in project directly:
# notebook_id_in_project = "Your notebook ID in project"

notebook id in project: 8ead5d49-0a5d-4325-9017-996c3bf40245


If your notebook does not have a version yet, create one and get its revision ID:

In [58]:
result = ! cpdctl notebook version create --notebook-id {notebook_id_in_project} --output json -j "entity.rev_id" --raw-output
revision_id = result.s
print("revision id: {}".format(revision_id))

revision id: 7


Or specify an existing revision of the notebook:

In [59]:
result = ! cpdctl notebook version list --notebook-id {notebook_id_in_project} --output json -j "(resources[].entity.rev_id)[0]" --raw-output
revision_id = result.s
print("revision id: {}".format(revision_id))

# You can also specify your revision id directly:
# revision_id = "Your revision ID"

revision id: 7
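The two cells above can be combined into one step that reuses an existing revision and creates a version only when none exists. A minimal sketch using `subprocess` with the same cpdctl commands and flags as above; `run` is injectable for testing, and treating empty or `null` output as "no revision" is an assumption about how `--raw-output` reports an empty result. The helper names are hypothetical:

```python
import subprocess


def _cpdctl(args):
    """Run cpdctl without a shell and return its stripped standard output."""
    completed = subprocess.run(["cpdctl"] + args, check=True,
                               capture_output=True, text=True)
    return completed.stdout.strip()


def get_or_create_revision(notebook_id, run=_cpdctl):
    """Return the first listed revision ID, creating a version if there is none."""
    rev = run(["notebook", "version", "list", "--notebook-id", notebook_id,
               "--output", "json", "-j", "(resources[].entity.rev_id)[0]",
               "--raw-output"])
    if rev in ("", "null"):
        rev = run(["notebook", "version", "create", "--notebook-id", notebook_id,
                   "--output", "json", "-j", "entity.rev_id", "--raw-output"])
    return rev
```

Call it as `revision_id = get_or_create_revision(notebook_id_in_project)`. `subprocess` finds `cpdctl` through the `PATH` that was set with `%env` earlier.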


Promote the notebook to the space. The parameters `name` and `description` are optional. If they are not specified, the name and description of the original notebook in the project will be used.

In [60]:
notebook_name = "cpdctl_test_promote"
notebook_description = "cpdctl test promote"
request_body = {
    'space_id': space_id,
    'metadata': {
        'name': notebook_name,
        'description': notebook_description
    }
}
request_body = json.dumps(request_body)
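Because `name` and `description` are optional, a minimal sketch of building the same request body so that each key is included only when you set it, letting the promote step fall back to the original notebook's values otherwise. Dropping the `metadata` object entirely when both values are unset is an assumption; the function name is hypothetical:

```python
import json


def promote_request_body(space_id, name=None, description=None):
    """Build the JSON body for cpdctl asset promote, omitting unset metadata."""
    metadata = {}
    if name:
        metadata["name"] = name
    if description:
        metadata["description"] = description
    body = {"space_id": space_id}
    if metadata:  # assumption: omit the object when there is nothing to override
        body["metadata"] = metadata
    return json.dumps(body)
```

For example, `promote_request_body(space_id, name=notebook_name)` keeps the original description while renaming the promoted notebook.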

In [61]:
result = ! cpdctl asset promote --asset-id {notebook_id_in_project} --revision-id {revision_id} --project-id {project_id} --request-body '{request_body}'
# verify that the notebook has been promoted into the space
result = ! cpdctl asset search --space-id {space_id} --type-name notebook --query asset.name:{notebook_name} --output json -j "(results[].metadata.asset_id)" --raw-output
notebook_id_in_space = result.s
print("notebook id in space: {}".format(notebook_id_in_space))

notebook id in space: [   "20377bfa-4cb8-4a98-8e9b-94e83817daae" ]
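The search query in the previous cell has no `[0]` index, so `notebook_id_in_space` holds a JSON array as text, as the output shows, rather than a bare ID. A minimal sketch of parsing it with `json.loads` and checking that exactly one notebook matched the name; the function name is hypothetical:

```python
import json


def single_id_from_json_array(text):
    """Parse a JSON array of IDs and return its only element."""
    ids = json.loads(text)
    if not isinstance(ids, list) or len(ids) != 1:
        raise ValueError("expected exactly one matching asset, got: {!r}".format(ids))
    return ids[0]
```

Calling `single_id_from_json_array(notebook_id_in_space)` also catches the case where an earlier promotion left several notebooks with the same name in the space.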


Copyright © 2021 IBM. This notebook and its source code are released under the terms of the MIT License.