# IBM Cloud Pak for Data Data Virtualization Lab Setup

### Where to find this notebook online
You can find a copy of this notebook at https://github.com/Db2-DTE-POC/CPDDVLAB.

### What this notebook does
This notebook is an operational guide to prepare for a digital bootcamp or hands on lab for a team of people. It uses an existing IBM Cloud Pak for Data Cluster that has Data Connections as well as virtual tables and views created for the admin user.

All the code in this notebook runs using the main **admin** userid.

The notebook includes instructions on how to complete some steps directly in the IBM Cloud Pak for Data console. It also includes Python code that creates users, manages privileges, and tests that all the required tables and views work.

The notebook includes three sections:
1. Utility Python routines that can be used to check the status of the system
2. Setup of the main project, creation of users, granting privileges, and testing tables and views
3. Teardown, including deleting any user-created objects, revoking privileges, deleting users and the main project

#### RESTful Services
IBM Cloud Pak for Data is built on a set of microservices that communicate with each other and the Console user interface using RESTful APIs. You can use these services to automate anything you can do through the user interface.

This Jupyter Notebook contains examples of how to use the Open APIs to retrieve information from the virtualization service, how to run SQL statements directly against the service through REST, and how to grant authorization to objects. This provides a way to write your own script to automate the setup and configuration of the virtualization service.

The next part of the lab relies on a set of base classes to help you interact with the RESTful Services API for IBM Cloud Pak for Data Virtualization. You can access this library on GitHub. The commands below download the library and run it as part of this notebook.
<pre>
&#37;run CPDDVRestClass.ipynb
</pre>
The cell below loads the RESTful Service Classes and methods directly from GitHub. The extension takes a few seconds to load, so wait until the "Db2 Extensions Loaded" message is displayed in your notebook before you continue.
1. Click the cell below
2. Click **Run**

In [None]:
!wget -O CPDDVRestClass.ipynb https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/CPDDVRestClass.ipynb
%run CPDDVRestClass.ipynb

## Establishing a Connection to the Console

### Connections
To connect to the Data Virtualization service you need to provide the console URL, the service name (v1), and the console user name and password. The next cell connects to the console from outside the IBM CPD Cluster. To connect from inside the cluster, comment out the first `Console` assignment and uncomment the second.

In [None]:
# Connect to the Db2 Data Management Console service

# From Outside the Cluster
Console  = 'https://services-uscentral.skytap.com:9152'
# From Inside the Cluster
# Console  = 'https://openshift-skytap-nfs-lb.ibm.com'
user     = 'admin'
password = 'xxxx'

# Set up the required connection
CPDAPI = Db2(Console)
api = '/v1'
CPDAPI.authenticate(api, user, password)
database = Console
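
The cell above stores the password as a placeholder string in the notebook. A minimal sketch, assuming you would rather keep the real password out of the notebook file: the helper below reads it from an environment variable and falls back to an interactive prompt only when the variable is unset. The variable name `CPD_ADMIN_PASSWORD` and the helper `resolve_password` are hypothetical and are not part of the lab library.

```python
import getpass
import os


def resolve_password(env_var="CPD_ADMIN_PASSWORD", prompt="Console password: "):
    """Return the password from env_var, or prompt for it if unset or empty.

    env_var is a hypothetical variable name chosen for this sketch.
    """
    value = os.environ.get(env_var)
    if value:
        return value
    # getpass hides the typed characters so they are not echoed into the notebook
    return getpass.getpass(prompt)
```

You could then write `password = resolve_password()` before calling `CPDAPI.authenticate(api, user, password)`.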

## Utility Routines

### Data Sources
The next cell calls a RESTful service that displays all the currently configured data connections for data virtualization.

The following Python function (getDataSources) runs SQL against the **QPLEXSYS.LISTRDB** catalog table and combines it with a stored procedure call **QPLEXSYS.LISTRDBCDETAILS()** to add the **AVAILABLE** column to the results. The IBM Cloud Pak for Data Virtualization Service checks each data source every 5 to 10 seconds to ensure that it is still up and available. In the table (DataFrame) in the next cell a **1** in the **AVAILABLE** column indicates that the data source is responding. A **0** indicates that it is no longer responding.

In [None]:
# Display the Available Data Sources already configured
dataSources = CPDAPI.getDataSources()
display(dataSources)
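
A minimal sketch of how you might flag unresponsive sources before a lab starts. It assumes the DataFrame has been converted to a list of dictionaries, for example with `dataSources.to_dict('records')`, and that the availability column is named `AVAILABLE` as described above. The `SOURCE` key and the sample rows are invented for illustration, and `unavailable_sources` is a hypothetical helper.

```python
def unavailable_sources(rows, column="AVAILABLE"):
    """Return the rows whose availability flag is anything other than 1."""
    down = []
    for row in rows:
        # Compare as text so that 1, '1' and 1.0 are all treated as available
        flag = str(row.get(column, "")).strip()
        if flag not in ("1", "1.0"):
            down.append(row)
    return down


# Made-up sample rows; real rows would come from dataSources.to_dict('records')
sample = [
    {"SOURCE": "db2-onprem", "AVAILABLE": 1},
    {"SOURCE": "mongo-onprem", "AVAILABLE": 0},
]
for row in unavailable_sources(sample):
    print("Not responding:", row["SOURCE"])
```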

#### Run SQL through the SQL Editor Service
You can also use the SQL Editor service to run your own SQL. Statements are submitted to the editor. Your code then needs to poll the editor service until the script is complete. Fortunately you can use the Db2 class included in this lab so that it becomes a very simple Python call. The **runScript** routine runs the SQL and the **displayResults** routine formats the returned JSON.

In [None]:
CPDAPI.displayResults(CPDAPI.runScript('SELECT * FROM TRADING.MOVING_AVERAGE'))
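
The submit-then-poll pattern described above can be sketched as follows. This is a generic illustration, not the code inside **runScript**: `fetch_status` is a hypothetical callable standing in for whatever request checks on the submitted script, and the status strings are assumptions.

```python
import time


def poll_until_done(fetch_status, interval=1.0, timeout=60.0,
                    done_states=("completed", "failed")):
    """Call fetch_status() repeatedly until it returns a terminal state.

    fetch_status is a hypothetical zero-argument callable that returns the
    current job status as a string. Raises TimeoutError if no terminal state
    is seen within timeout seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in done_states:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("script did not finish within %s seconds" % timeout)
        time.sleep(interval)


# Simulated job that reports 'running' twice and then 'completed'
_states = iter(["running", "running", "completed"])
print(poll_until_done(lambda: next(_states), interval=0.01))
```

The timeout keeps a stalled script from blocking the notebook indefinitely.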

### Virtualized Tables and Views
The next two cells are useful to determine all the virtualized data available to the admin user and the objects available by role.

In [None]:
# Display the Virtualized Assets Available to Engineers and Users
roles = ['DV_ENGINEER','DV_USER']
for role in roles:
    r = CPDAPI.getRole(role)
    if (CPDAPI.getStatusCode(r)==200):
        # Use a name other than json so the json module is not shadowed
        payload = CPDAPI.getJSON(r)
        df = pd.DataFrame(json_normalize(payload['objects']))
        print(', '.join(list(df)))
        display(df)
    else:
        print(CPDAPI.getStatusCode(r))  

In [None]:
# Display All Virtualized Tables and Views
display(CPDAPI.getVirtualizedTablesDF())
display(CPDAPI.getVirtualizedViewsDF())

### Cloud Pak for Data User Management
The next two cells can be used to list existing CPD users and add a new user to the system.

In [None]:
# Get the list of CPD Users
r = CPDAPI.getUsers()
if (CPDAPI.getStatusCode(r)==200):
    # Use a name other than json so the json module is not shadowed
    payload = CPDAPI.getJSON(r)
    df = pd.DataFrame(json_normalize(payload))
    print(', '.join(list(df)))
    display(df[['uid','username','displayName']])
else:
    print(CPDAPI.getStatusCode(r))

In [None]:
# Add a Single user to CPD
username = "LABUSER1"
displayName = "LABUSER1"
email = "kohlmann@ca.ibm.com"
user_roles = ["Data Scientist"]
password = 'password'
r = CPDAPI.addUser(username, displayName, email, user_roles, password)
if (CPDAPI.getStatusCode(r)==201):
    print('User Added')
else:
    print(CPDAPI.getStatusCode(r))

## Lab Setup

### Creating the Data Virtualization Project
The first step in setting up the lab is to create a project that all the lab users can share. You will then add a single hands on lab notebook to the project and finally make a copy for each participant.

###  Create the Data Virtualization Project
1. Right-click the following link and select **open link in new window** to open the IBM Cloud Pak for Data Console: https://services-uscentral.skytap.com:9152/
2. Organize your screen so that you can see both this notebook and the IBM Cloud Pak for Data Console at the same time. This will make it much easier for you to complete the lab without switching back and forth between screens.
3. Sign in using the **admin** userid and password
4. Click the three bar main navigation menu

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/2.42.03 Three Bar.png">
    
5. Select **Projects**    
    
    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.17.07 Projects.png">
6. Select **+ New project**
7. Select **Analytics project**
8. Click **OK**
9. Click **Create an empty project** (you may have to click twice)
10. Enter **Data Virtualization Hands on Lab** as the Project name
11. Click **Create**

 
###  Add the template notebook to the Project 
 
1. From the Projects list, click **Data Virtualization Hands on Lab**

2. From the My Projects screen click **Add to project** at the top right of the screen

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.17.48 Add to project.png">
    
3. Click **Notebook**

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.17.59 Notebook.png">

4. Click **From URL**
5. Enter **DV Lab** in the Notebook **Name** field
6. Copy and paste the following link into the **Notebook URL** field:
    https://github.com/Db2-DTE-POC/CPDDVLAB/blob/master/CPD-DV%20Hands%20on%20Lab%20Preloaded.ipynb
7. Add an optional description

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.25.23 New notebook.png">

8. Click **Create Notebook**

### Duplicate the template notebook
1. Click **Data Virtualization Hands on Lab** to navigate back to the list of assets
2. Scroll down until you see the new notebook **DV Lab** listed
3. Click the **ellipsis icon** at the far right of the **DV Lab** notebook
4. Click **Duplicate**
5. **Repeat steps 3 and 4** nine more times. **Make sure to always select the original template notebook**

### Add Users to CPD and Data Virtualization
Set the value **ids** to the number of users you want to create for the lab. 

In [None]:
ids = 10
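
With `ids = 10`, the cells below build each user name by appending the loop index to a root name, so the lab users are numbered from 0 to 9. The sketch below lists the names those loops are expected to produce, which is a quick way to check the value of `ids` before anything is created. `expected_usernames` is a hypothetical helper, not part of the lab library.

```python
def expected_usernames(count, roots=("LABUSER", "LABDATAENGINEER")):
    """Return the names produced by appending 0..count-1 to each root."""
    # Same nesting as the cells in this notebook: index outside, root inside
    return [root + str(i) for i in range(count) for root in roots]


names = expected_usernames(10)
print(len(names), "names, from", names[0], "to", names[-1])
```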

In [None]:
# Add x Data Scientists, LABUSERS to CPD

userList = {'UserRoot':['LABUSER','LABDATAENGINEER'],'Role':[['Data Scientist','Developer'],['Data Engineer']]}
userListDF = pd.DataFrame(userList) 
email = 'kohlmann@ca.ibm.com'
password = 'xxxx'

for x in range(0, ids):
    for row in range(0, len(userListDF)):
        username = userListDF['UserRoot'].iloc[row]+str(x)
        user_role = userListDF['Role'].iloc[row]
        displayName = username
        r = CPDAPI.addUser(username, displayName, email, user_role, password)
        if (CPDAPI.getStatusCode(r)==201):
            print('User: '+username+' Added as a '+str(user_role))
        else:
            print(CPDAPI.getStatusCode(r))

In [None]:
# Add x Users and Engineers to the DV Service

userList = {'UserRoot':['LABUSER','LABDATAENGINEER'],'Role':['User','Engineer']}
userListDF = pd.DataFrame(userList) 

df = CPDAPI.getUsersDF() # Get existing list of users to get the uid

for x in range(0, ids):
    for row in range(0, len(userListDF)):
        display_name = userListDF['UserRoot'].iloc[row]+str(x)
        role = userListDF['Role'].iloc[row]
        
        r = CPDAPI.addUserToDV(display_name, role, df)
        if (CPDAPI.getStatusCode(r)==200):
            print('User: '+display_name+' added to Data Virtualization Service')
        else:
            print(CPDAPI.getStatusCode(r))
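
The loops above print only the status code when a call fails, which makes it hard to tell which user was affected. A minimal sketch, assuming you collect a `(name, status_code)` pair as each call returns: the hypothetical helper `summarize` tallies the codes and lists the failures by name. The sample results are made up.

```python
from collections import Counter


def summarize(results, ok_codes=(200, 201)):
    """Split (name, status_code) pairs into a per-code tally and a failure list."""
    tally = Counter(code for _, code in results)
    failures = [(name, code) for name, code in results if code not in ok_codes]
    return tally, failures


# Made-up results for illustration
results = [("LABUSER0", 201), ("LABUSER1", 201), ("LABUSER2", 409)]
tally, failures = summarize(results)
print(dict(tally))
print(failures)
```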

### Grant Access to Data Engineers to the **Data Virtualization Hands on Lab** project

Now that you have created the LABDATAENGINEER users you need to give them access to the Hands on Lab project.

1. Click the three bar main navigation menu

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/2.42.03 Three Bar.png">
    
2. Select **Projects**    
    
    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.17.07 Projects.png">
    
3. Select the **Data Virtualization Hands on Lab** project
4. Click **Access Control** from the menu bar at the top of the page
5. Click **Add collaborators** at the top right of the page
6. Select all the **LABDATAENGINEER** users one at a time using the **Search for name** field
7. Select the **Editor** access control level
8. Add the users to the list of new collaborators
9. Click **Invite**

### Grant Access to Data Engineers to Existing Views and Tables

In [None]:
# Grant Access to Data Engineers to all the Views owned by the logged in user
ViewsDF = CPDAPI.getVirtualizedViewsDF()
roleToGrant = 'DV_ENGINEER'
for index, row in ViewsDF.iterrows():
    name = row['viewname']
    schema = row['viewschema']

    r = CPDAPI.grantPrivledgeToRole(name, schema, roleToGrant)
    if (CPDAPI.getStatusCode(r)==200):
        print('Access granted')
    else:
        print(CPDAPI.getStatusCode(r))

In [None]:
# Grant Access to Data Engineers to all the Virtualized Tables owned by the logged in user
TablesDF = CPDAPI.getVirtualizedTablesDF()
roleToGrant = 'DV_ENGINEER'
for index, row in TablesDF.iterrows():
    name = row['table_name']
    schema = row['table_schema']

    r = CPDAPI.grantPrivledgeToRole(name, schema, roleToGrant)
    if (CPDAPI.getStatusCode(r)==200):
        print('Access granted')
    else:
        print(CPDAPI.getStatusCode(r))
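
For reference, a role-level grant like the ones above corresponds to a Db2 `GRANT SELECT ... TO ROLE` statement. The sketch below builds such statements from rows held as a list of dictionaries. Whether `grantPrivledgeToRole` issues exactly this SQL is an assumption; `grant_statements` is a hypothetical helper and the sample row is made up.

```python
def grant_statements(rows, role, schema_key, name_key):
    """Build one GRANT SELECT statement per row (rows is a list of dicts)."""
    statements = []
    for row in rows:
        # Double quotes preserve the exact case of the schema and object name
        target = '"%s"."%s"' % (row[schema_key], row[name_key])
        statements.append("GRANT SELECT ON %s TO ROLE %s" % (target, role))
    return statements


# Made-up row using the view column names from the cell above
views = [{"viewschema": "TRADING", "viewname": "VOLUME"}]
for stmt in grant_statements(views, "DV_ENGINEER", "viewschema", "viewname"):
    print(stmt)
```

Printing the statements first gives you a preview of which objects the role will be able to read.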

### Test Existing Data Sources
You can use the following REST call to check the status of all the Data Sources used by the demonstration and hands on lab.

In [None]:
# Display the Available Data Sources already configured
dataSources = CPDAPI.getDataSources()
display(dataSources)

### Test Existing Virtualized Tables, Folded Tables and Views
The following views and virtualized tables are used in the lab and demo. The following code checks that they are all working.

In [None]:
# Test that the existing views all work
sqlText = \
'''
-- zOS VSAM
SELECT COUNT(*) FROM DVDEMO.STOCK_SYMBOLS;
SELECT * FROM DVDEMO.STOCK_SYMBOLS FETCH FIRST 2 ROWS ONLY;

-- Folded Virtual Tables
SELECT COUNT(*) FROM FOLDING.STOCK_HISTORY;
SELECT * FROM FOLDING.STOCK_HISTORY FETCH FIRST 2 ROWS ONLY;
SELECT COUNT(*) FROM FOLDING.ACCOUNTS_DV;
SELECT * FROM FOLDING.ACCOUNTS_DV FETCH FIRST 2 ROWS ONLY;
SELECT COUNT(*) FROM FOLDING.STOCK_TRANSACTIONS_DV;
SELECT * FROM FOLDING.STOCK_TRANSACTIONS_DV FETCH FIRST 2 ROWS ONLY;

-- Mongo DB
SELECT COUNT(CUSTOMER_ID) FROM MONGO_ONPREM.CUSTOMER_CONTACT;
SELECT * FROM MONGO_ONPREM.CUSTOMER_CONTACT FETCH FIRST 2 ROWS ONLY;
SELECT COUNT(CUSTOMER_ID) FROM MONGO_ONPREM.CUSTOMER_IDENTITY;
SELECT * FROM MONGO_ONPREM.CUSTOMER_IDENTITY FETCH FIRST 2 ROWS ONLY;
SELECT COUNT(CUSTOMER_ID) FROM MONGO_ONPREM.CUSTOMER_PAYMENT;
SELECT * FROM MONGO_ONPREM.CUSTOMER_PAYMENT FETCH FIRST 2 ROWS ONLY;

-- Views
SELECT COUNT(*) FROM TRADING.MOVING_AVERAGE;
SELECT * FROM TRADING.MOVING_AVERAGE FETCH FIRST 2 ROWS ONLY;
SELECT COUNT(*) FROM TRADING.VOLUME;
SELECT * FROM TRADING.VOLUME FETCH FIRST 2 ROWS ONLY;
SELECT COUNT(*) FROM TRADING.THREEPERCENT;
SELECT * FROM TRADING.THREEPERCENT FETCH FIRST 2 ROWS ONLY;
SELECT COUNT(*) FROM TRADING.TRANSBYCUSTOMER;
SELECT * FROM TRADING.TRANSBYCUSTOMER FETCH FIRST 2 ROWS ONLY;
SELECT COUNT(*) FROM TRADING.TOPBOUGHTSOLD;
SELECT * FROM TRADING.TOPBOUGHTSOLD FETCH FIRST 2 ROWS ONLY;
SELECT COUNT(*) FROM TRADING.TOPFIVE;
SELECT * FROM TRADING.TOPFIVE FETCH FIRST 2 ROWS ONLY;
SELECT COUNT(*) FROM TRADING.BOTTOMFIVE;
SELECT * FROM TRADING.BOTTOMFIVE FETCH FIRST 2 ROWS ONLY;
SELECT COUNT(*) FROM TRADING.OHIO;
SELECT * FROM TRADING.OHIO FETCH FIRST 2 ROWS ONLY;
'''

CPDAPI.displayResults(CPDAPI.runScript(sqlText))
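
When one statement in a long script fails, it can be easier to find the culprit by running the statements one at a time. A minimal sketch, assuming statements are separated by semicolons and comments use the `--` line style shown above. It does not handle semicolons inside string literals, and `split_statements` is a hypothetical helper.

```python
def split_statements(sql_text):
    """Split a script into statements, dropping blank lines and -- comments."""
    kept = []
    for line in sql_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("--"):
            kept.append(stripped)
    # Rejoin the remaining lines, then cut on the semicolon terminator
    parts = " ".join(kept).split(";")
    return [part.strip() for part in parts if part.strip()]


script = '''
-- Views
SELECT COUNT(*) FROM TRADING.VOLUME;
SELECT * FROM TRADING.VOLUME FETCH FIRST 2 ROWS ONLY;
'''
for number, statement in enumerate(split_statements(script), start=1):
    print(number, statement)
```

Each statement could then be passed to `CPDAPI.runScript` on its own, so that a failure points at a single table or view.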

## Lab Teardown
After users have completed the Hands on Lab you can use the following instructions to remove any objects created by the users and the notebooks they used. You can also remove the tables created in the Db2 Warehouse system used to demonstrate table virtualization and folding.

### Remove Tables and Views Created by Lab Participants

In [None]:
# Delete Virtualized Tables Created by Lab Participants

virtualTables = CPDAPI.getVirtualizedTablesDF()
virtualUserTables = virtualTables.loc[virtualTables['owner'] != 'USER999']
display(virtualUserTables)
for index, row in virtualUserTables.iterrows():
    schema = row['table_schema']
    table = row['table_name']
    source = row['data_source_table_name']
    r = CPDAPI.deleteVirtualizedTable(schema, table, source)
    if (CPDAPI.getStatusCode(r)==200):
        print('Virtualized Table deleted')
    else:
        print(CPDAPI.getStatusCode(r))
display(CPDAPI.getVirtualizedTablesDF())

In [None]:
# Delete Virtualized Views Created by Lab Participants

views = CPDAPI.getVirtualizedViewsDF()
userViews = views.loc[views['owner'] != 'USER999']
display(userViews)
for index, row in userViews.iterrows():
    schema = row['viewschema']
    view = row['viewname']
    r = CPDAPI.deleteView(schema, view)
    if (CPDAPI.getStatusCode(r)==200):
        print('View deleted')
    else:
        print(CPDAPI.getStatusCode(r))
display(CPDAPI.getVirtualizedViewsDF())
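
Both cells above delete every object whose owner is not `USER999`. A minimal sketch of a dry run that lists what would be removed before you run the real loops. It assumes the rows are held as a list of dictionaries, for example from `to_dict('records')`. The helper `objects_to_delete`, its `protected_owners` parameter, and the sample rows are hypothetical. Unlike the cells above, the comparison here ignores case.

```python
def objects_to_delete(rows, protected_owners=("USER999",), owner_key="owner"):
    """Return the rows not owned by a protected owner (case-insensitive)."""
    protected = {owner.upper() for owner in protected_owners}
    return [row for row in rows
            if str(row.get(owner_key, "")).upper() not in protected]


# Made-up rows; real rows would come from getVirtualizedViewsDF().to_dict('records')
rows = [
    {"viewschema": "TRADING", "viewname": "VOLUME", "owner": "USER999"},
    {"viewschema": "LABUSER3", "viewname": "MYVIEW", "owner": "LABUSER3"},
]
for row in objects_to_delete(rows):
    print("Would delete:", row["viewschema"] + "." + row["viewname"])
```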

### Remove Users from Data Virtualization Server and CPD
Set the value **ids** to the number of users you want to remove starting at 0. 

In [None]:
ids = 10

In [None]:
# Drop x users and engineers from the DV Service

userList = {'UserRoot':['LABUSER','LABDATAENGINEER']}
userListDF = pd.DataFrame(userList) 

df = CPDAPI.getUsersDF() # Get existing list of users to get the uid

for x in range(0, ids):
    for row in range(0, len(userListDF)):
        display_name = userListDF['UserRoot'].iloc[row]+str(x)
        
        r = CPDAPI.dropUserFromDV(display_name, df)
        if (CPDAPI.getStatusCode(r)==200):
            print('User: '+display_name+' dropped from Data Virtualization Service')
        else:
            print(CPDAPI.getStatusCode(r))

In [None]:
# Drop x users and engineers from CPD

userList = {'UserRoot':['labuser','labdataengineer']}
userListDF = pd.DataFrame(userList) 

for x in range(0, ids):
    for row in range(0, len(userListDF)):
        username = userListDF['UserRoot'].iloc[row]+str(x)

        r = CPDAPI.dropUser(username)
        if (CPDAPI.getStatusCode(r)==200):
            print('User: '+username+' Dropped')
        else:
            print(CPDAPI.getStatusCode(r))

### Delete the  **Data Virtualization Hands on Lab** project

Now that you have deleted all the users and their virtualized objects you can delete the project that contains all the Jupyter notebooks.

1. Click the three bar main navigation menu

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/2.42.03 Three Bar.png">
    
2. Select **Projects**    
    
    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.17.07 Projects.png">
    
3. Click the **ellipsis icon** to the right of the **Data Virtualization Hands on Lab** project
4. Select **Delete**
5. Click **Delete** to delete the project and all assets for all collaborators

### Remove User Created Tables from the Db2 Data Warehouse database in IBM Cloud Pak for Data
During the lab the users create new tables in the Db2 Data Warehouse database that is in IBM Cloud Pak for Data. The code below deletes all the new tables as well as the related schemas.

In [None]:
# !wget https://raw.githubusercontent.com/IBM/db2-jupyter/master/db2.ipynb
!wget -O db2.ipynb https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/db2.ipynb

%run db2.ipynb
print('db2.ipynb loaded')

In [None]:
# Connect to the Db2 Warehouse on IBM Cloud Pak for Data Database from inside of IBM Cloud Pak for Data
database = 'bludb'
user = 'user999'
password = 'xxxx'
host = 'openshift-skytap-nfs-woker-5.ibm.com'
port = '31928'

%sql CONNECT TO {database} USER {user} USING {password} HOST {host} PORT {port}

In [None]:
# Connect to the Db2 Warehouse on IBM Cloud Pak for Data Database from outside of IBM Cloud Pak for Data

database = 'bludb'
user = 'user999'
# password can be found in the details option of the Db2 Warehouse service in the My Data->Databases menu option
password = 'xxxx'
host = 'services-uscentral.skytap.com'
port = '9094'

%sql CONNECT TO {database} USER {user} USING {password} HOST {host} PORT {port}

In [None]:
tables = %sql SELECT TABSCHEMA, TABNAME FROM SYSCAT.TABLES WHERE TABSCHEMA LIKE '%ENGINEER%'
display(tables)
schemas = %sql SELECT SCHEMANAME FROM SYSCAT.SCHEMATA WHERE SCHEMANAME LIKE '%ENGINEER%'
display(schemas)

In [None]:
drop_tables = ''
for index, row in tables.iterrows():
    schema_name = row['TABSCHEMA']
    table_name = row['TABNAME']
    drop_tables = drop_tables + 'DROP TABLE '+schema_name+'.'+table_name+'; '
print(drop_tables)
%sql {drop_tables}

In [None]:
drop_schemas = ''
for index, row in schemas.iterrows():
    schema_name = row['SCHEMANAME']
    drop_schemas = drop_schemas + 'DROP SCHEMA '+schema_name+' RESTRICT; '
print(drop_schemas)
%sql {drop_schemas}
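
The two cells above build the DROP statements by plain string concatenation, which can fail if a schema or table name contains lowercase letters or spaces. A hedged sketch of a more defensive builder that uses Db2 delimited identifiers (double quotes). `quote_ident` and `drop_statements` are hypothetical helpers, and the sample names are made up.

```python
def quote_ident(name):
    """Wrap a name in double quotes, doubling any embedded double quote."""
    # strip() removes any blank padding around a catalog value
    return '"' + name.strip().replace('"', '""') + '"'


def drop_statements(tables, schemas):
    """Build DROP TABLE statements first, then DROP SCHEMA ... RESTRICT."""
    statements = []
    for schema, table in tables:
        statements.append("DROP TABLE %s.%s" % (quote_ident(schema), quote_ident(table)))
    for schema in schemas:
        statements.append("DROP SCHEMA %s RESTRICT" % quote_ident(schema))
    return statements


# Made-up names for illustration
for stmt in drop_statements([("LABDATAENGINEER0", "STOCKS")], ["LABDATAENGINEER0"]):
    print(stmt)
```

Tables are dropped before schemas because `DROP SCHEMA ... RESTRICT` only succeeds on an empty schema.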

You are now ready to recreate the lab for the next set of users.

#### Credits: IBM 2019, Peter Kohlmann [kohlmann@ca.ibm.com]