# IBM Cloud Pak for Data - Multi-Cloud Virtualization Hands-on Lab

## Introduction
Welcome to the IBM Cloud Pak for Data Multi-Cloud Virtualization Hands-on Lab.

In this lab you analyze data from multiple data sources, from across multiple Clouds, without copying data into a warehouse.

This hands-on lab uses live databases, where data is “virtually” available through the IBM Cloud Pak for Data Virtualization Service. This makes it easy to analyze data from across your multi-cloud enterprise using tools like Jupyter Notebooks, Watson Studio, or your favorite reporting tool, such as Cognos.

### Where to find this sample online
You can find a copy of this notebook on GitHub at https://github.com/Db2-DTE-POC/CPDDVLAB.

### How to add this notebook to an IBM Cloud Pak for Data Project
1. Click the three bar main navigation menu

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/2.42.03 Three Bar.png">
    
2. Select **Projects**    
    
    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.17.07 Projects.png">
    
2. Select the **existing project** that has the same name as your LABDATAENGINEER id.
3. From the My Projects screen click **Add to project** at the top right of the screen

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.17.48 Add to project.png">
    
4. Click **Notebook**

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.17.59 Notebook.png">

5. Click **From URL**
6. Enter **Data Virtualization Hands on Lab** in the notebook **Name** field
7. Copy and paste the following link into the **Notebook URL** field:
    https://github.com/Db2-DTE-POC/CPDDVLAB/blob/master/CPD-DV%20Hands%20on%20Lab.ipynb
8. Add an optional description

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.25.23 New notebook.png">

9. Click **Create Notebook**
10. Click the **pencil icon** at the top of your new notebook to actively run your new notebook

### The business problem and the landscape
The Acme Company needs timely analysis of stock trading data from multiple source systems. 

Their data science and development teams need access to:
* Customer data
* Account data
* Trading data
* Stock history and Symbol data

<img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/CPDDVLandscape.png">

The data sources are running on premises and in the cloud. In this example many of the databases are also running on OpenShift, but they could be managed, virtual, or bare-metal cloud installations. IBM Cloud Pak for Data doesn't care. Enterprise DB (Postgres) is also running in the cloud. MongoDB and Informix are running on premises. Finally, there is also a VSAM file on zOS, accessed through the Data Virtualization Manager for zOS.

To simplify access for Data Scientists and Developers, the Acme team wants to make all their data look like it is coming from a single database. They also want to combine data to create simple-to-use tables.

In the past, Acme built a dedicated data warehouse and then created ETL (Extract, Transform and Load) jobs to move data from each data source into the warehouse, where it could be combined. Now you can just virtualize your data without moving it.

### In this lab you learn how to:

* Sign into IBM Cloud Pak for Data using your own Data Engineer and Data Scientist (User) userids
* Connect to different data sources, on premises and across a multi-vendor Cloud
* Make remote data from across your multi-vendor enterprise look and act like local tables in a single database
* Make combining complex data and queries simple even for basic users
* Capture complex SQL in easy to consume VIEWs that act just like simple tables
* Ensure that users can securely access even complex data across multiple sources 
* Use roles and privileges to ensure that only the right user may see the right data
* Make development easy by connecting to your virtualized data using analytic tools and applications from outside of IBM Cloud Pak for Data

## Getting Started

### Using Jupyter notebooks
You are now officially using a Jupyter notebook! If this is your first time using a Jupyter notebook, you might want to go through [An Introduction to Jupyter Notebooks](http://localhost:8888/notebooks/An_Introduction_to_Jupyter_Notebooks.ipynb). The introduction shows you some of the basics of using a notebook, including how to create cells, run code, and save files for future use.

Jupyter notebooks are based on IPython, where work on a notebook interface was under way in the 2006/7 timeframe. The existing Python interpreter was limited in functionality, and work was started to create a richer development environment. By 2011 these development efforts resulted in the IPython Notebook being released (http://blog.fperez.org/2012/01/ipython-notebook-historical.html).

Jupyter notebooks were a spinoff (2014) from the original IPython project. IPython continues to be the kernel that Jupyter runs on, but the notebooks are now a project on their own.

Jupyter notebooks run in a browser and communicate with a backend server, which runs your code in an IPython kernel and returns the results. These notebooks are used extensively by data scientists and anyone wanting to document, plot, and execute their code in an interactive environment. The beauty of Jupyter notebooks is that you document what you do as you go along.

### Connecting to IBM Cloud Pak for Data
For this lab you will be assigned two IBM Cloud Pak for Data User IDs: a Data Engineer userid and an end-user userid. Check with the lab coordinator for the userids and passwords you should use.
* **Engineer:**
    * ID: LABDATAENGINEERx
    * PASSWORD: xxx
* **User:**
    * ID: LABUSERx
    * PASSWORD: xxx

To get started, sign in using your Engineer id:
1. Right-click the following link and select **open link in new window** to open the IBM Cloud Pak for Data Console: https://services-uscentral.skytap.com:9152/
1. Organize your screen so that you can see both this notebook and the IBM Cloud Pak for Data Console at the same time. This makes it much easier to complete the lab without switching back and forth between screens.
2. Sign in using your Engineer userid and password
3. Click the icon at the very top right of the webpage. It will look something like this:

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.06.10 EngineerUserIcon.png">

4. Click **Profile and settings**
5. Click **Permissions** and review the user permissions for this user
6. Click the **three bar menu** at the very top left of the console webpage

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/2.42.03 Three Bar.png">

7. Click **Collect** if the Collect menu isn't already open
7. Click **Data Virtualization**. The Data Virtualization user interface is displayed

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.06.12 CollectDataVirtualization.png">

8. Click the caret symbol beside **Menu** below the Data Virtualization title

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/3.07.47 Menu Carrot.png">

This displays the actions available to your user. Different users have access to more or fewer menu options depending on their role in Data Virtualization.

As a Data Engineer you can:
* Add and modify data sources. Each source is a connection to a single database, either inside or outside of IBM Cloud Pak for Data.
* Virtualize data. This makes tables in other data sources look and act like tables that are local to the Data Virtualization database.
* Work with the data you have virtualized.
* Write SQL to access and join data that you have virtualized.
* See detailed information on how to connect external analytic tools and applications to your virtualized data.

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.12.54 Menu Data sources.png">

As a User you can only:
* Work with data that has been virtualized for you
* Write SQL to work with that data
* See detailed connection information

As an Administrator (only available to the course instructor) you can also:
* Manage IBM Cloud Pak for Data User Access and Roles
* Create and Manage Data Caches to accelerate performance
* Change key service settings

## Basic Data Virtualization

### Exploring Data Source Connections
Let's start by looking at the Data Source Connections that are already available.

1. Click the Data Virtualization menu and select **Data Sources**.

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.12.54 Menu Data sources.png">

2. Click the **icon below the menu with a circle with three connected dots**.
    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.14.50 Connections Icons Spider.png">
3. A spider diagram of the connected data sources opens. 

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.15.31 Data Sources Spider.png">

    This displays the Data Source Graph with 8 active data sources:
    * 4 Db2 Family Databases hosted on premises, IBM Cloud, Azure and AWS
    * 1 EDB Postgres Database on Azure
    * 1 zOS VSAM file
    * 1 Informix Database running on premises
    * 1 MongoDB Database running on premises

**We are not going to add a new data source**, but we will go through the steps so you can see how to add additional data sources.
1. Click **+ Add** at the right of the console screen
2. Select **Add data source** from the menu

    You can see a history of other data source connection information that was used before. This history is maintained to make reconnecting to data sources easier and faster.

3. Click **Add connection**
4. Click the field below **Connection type**
5. Scroll through all the **available data sources** to see the available connection types
6. Select **different data connection types** from the list to see the information required to connect to a new data source.

    At a minimum you typically need the host name and port number, database name, userid and password. You can also connect using an SSL certificate that can be dragged and dropped directly into the console interface.

7. Click **Cancel** to return to the previous list of connections to add
8. Click **Cancel** again to return to the list of currently connected data sources
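
The minimum connection details listed in step 6 map naturally onto a driver connection string. The sketch below assumes the `KEY=value;` format used by IBM's `ibm_db` Python driver. The `Db2Connection` class is hypothetical, and the host, port and credentials are placeholders. It shows only what a client needs to supply, not how the Data Virtualization console stores its connections.

```python
from dataclasses import dataclass


@dataclass
class Db2Connection:
    """Hypothetical container for the minimum details a Db2 connection needs."""
    database: str
    hostname: str
    port: int
    uid: str
    pwd: str

    def to_connection_string(self) -> str:
        # Keyword names follow the ibm_db "KEY=value;" connection string format;
        # PROTOCOL=TCPIP is the usual setting for a remote database server.
        parts = {
            "DATABASE": self.database,
            "HOSTNAME": self.hostname,
            "PORT": str(self.port),
            "PROTOCOL": "TCPIP",
            "UID": self.uid,
            "PWD": self.pwd,
        }
        missing = [key for key, value in parts.items() if not value]
        if missing:
            raise ValueError("missing connection details: " + ", ".join(missing))
        return "".join(f"{key}={value};" for key, value in parts.items())


# Placeholder values only; use the details your lab coordinator provides.
conn = Db2Connection("bludb", "db2.example.com", 50000, "labuser", "secret")
print(conn.to_connection_string())
# DATABASE=bludb;HOSTNAME=db2.example.com;PORT=50000;PROTOCOL=TCPIP;UID=labuser;PWD=secret;
```

If `ibm_db` is installed, a string like this can be passed to `ibm_db.connect(conn_str, "", "")`.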

### Exploring the available data
Now that you understand how to connect to data sources you can start virtualizing data. Much of the work has already been done for you. IBM Cloud Pak for Data searches through the available data sources and compiles a single large inventory of all the tables and data available to virtualize in IBM Cloud Pak for Data. 

1. Click the Data Virtualization menu and select **Virtualize**

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.13.07 Menu Virtualize.png">
    
2. Check the total number of available tables at the top of the list. There should be well over 500 available.

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.15.50 Available Tables.png">

3. Enter "STOCK" into the search field and press **Enter**. Any tables with the string **STOCK** in the table name, the table schema, or a column name appear in the search results.

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.39.43 Find STOCK.png">

4. Hover your mouse pointer over the far right side of the search results table. An **eye** icon appears on each row as you move your mouse.
5. Click the **eye** icon beside one table. This displays a preview of the data in the selected table.

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/3.26.54 Eye.png">

6. Click **X** at the top right of the dialog box to return to the search results.
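
The search in step 3 can be modeled as a simple substring match across schema, table name and column names. This sketch is hypothetical. The `AvailableTable` records and their columns are invented, and the case-insensitive comparison is an assumption rather than a documented detail of the console.

```python
from typing import NamedTuple, Tuple


class AvailableTable(NamedTuple):
    """Hypothetical entry in the inventory of tables available to virtualize."""
    schema: str
    name: str
    columns: Tuple[str, ...]


def search_inventory(tables, term):
    """Return tables whose schema, name, or any column name contains term."""
    term = term.upper()  # assumption: matching ignores case
    return [
        t for t in tables
        if term in t.schema.upper()
        or term in t.name.upper()
        or any(term in col.upper() for col in t.columns)
    ]


inventory = [
    AvailableTable("DEMO", "STOCK_PRICES", ("SYMBOL", "PRICE")),   # match on table name
    AvailableTable("STOCKDATA", "TRADES", ("ID", "QTY")),          # match on schema
    AvailableTable("DEMO", "PORTFOLIO", ("CUSTID", "STOCK_ID")),   # match on a column
    AvailableTable("DEMO", "CUSTOMERS", ("CUSTID", "STATE")),      # no match
]
matches = search_inventory(inventory, "stock")
print([f"{t.schema}.{t.name}" for t in matches])
# ['DEMO.STOCK_PRICES', 'STOCKDATA.TRADES', 'DEMO.PORTFOLIO']
```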

### Creating New Tables
So that each user in this lab has their own data to virtualize, you will create your own table in a remote database.
In this part of the lab you will use this Jupyter notebook and Python code to connect to a source database, create a simple table and populate it with data. IBM Cloud Pak for Data will automatically detect the change in the source database and make the new table available for virtualization.

In this example we are using Db2 Warehouse running in IBM Cloud Pak for Data, but the database can be anywhere. All you need is the connection information and authorized credentials.

The first step is to connect to one of our remote data sources directly, as if we were part of the team building a new business application. Since each lab user will create their own table in their own schema, the first thing you need to do is update and run the cell below with your engineer name.

1. In this Jupyter notebook, click on the cell below
2. Update the engineer name in quotes to match your engineer name
3. Click **Run** from the Jupyter notebook menu above

In [None]:
# Setting your userID
engineer = 'LABDATAENGINEERx'
print('variable engineer set to = ' + str(engineer))

The next part of the lab relies on a Jupyter notebook extension, commonly referred to as a "magic" command, to connect to a Db2 database. To use the commands, you load the extension by running another notebook, called db2.ipynb, which contains all the required code:
<pre>
&#37;run db2.ipynb
</pre>
The cell below loads the Db2 extension directly from GitHub. It takes a few seconds for the extension to load, so wait until the "Db2 Extensions Loaded" message is displayed in your notebook.
1. Click the cell below
2. Click **Run**. When the cell is finished running, In [*] changes to a cell number such as In [2]

In [None]:
# !wget https://raw.githubusercontent.com/IBM/db2-jupyter/master/db2.ipynb
!wget -O db2.ipynb https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/db2.ipynb

%run db2.ipynb
print('db2.ipynb loaded')

#### Connecting to Db2

Before any SQL commands can be issued, a connection needs to be made to the Db2 database that you will be using. 

The Db2 magic command tracks whether or not a connection has occurred in the past and saves this information between notebooks and sessions. When you start up a notebook and issue a command, the program reconnects to the database using your credentials from the last session. If you have not connected before, the system prompts you for all the information it needs to connect. This information includes:

- Database name
- Hostname
- Port
- Userid
- Password

Run the next cell.

In [None]:
# Connect to the Db2 Warehouse on IBM Cloud Pak for Data Database from inside of IBM Cloud Pak for Data
database = 'bludb'
user = 'user999'
password = 't1cz?K9-X1_Y-2Wi'
host = 'openshift-skytap-nfs-woker-5.ibm.com'
port = '31928'

%sql CONNECT TO {database} USER {user} USING {password} HOST {host} PORT {port}

To check that the connection is working, run the following cell. It lists the tables in the **DVDEMO** schema. Only the first 5 tables are listed.

In [None]:
%sql SELECT TABNAME, OWNER FROM SYSCAT.TABLES WHERE TABSCHEMA = 'DVDEMO' FETCH FIRST 5 ROWS ONLY

Now that you can successfully connect to the database, you are going to create two tables with the same name and columns in two different schemas. In the following steps of the lab you are going to virtualize these tables in IBM Cloud Pak for Data and fold them together into a single table.

The next cell sets the default schema to your engineer name followed by 'A'. Notice how you can set a Python variable and substitute it into the SQL statement in the cell. The **-e** option echoes the command.

Run the next cell.

In [None]:
schema = engineer+'A'
%sql -e SET CURRENT SCHEMA {schema}
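
The `{schema}` substitution above places the Python value directly into the SQL text before the statement runs. A schema name is an identifier, so it cannot be sent as a bound parameter, and it is worth checking the value first. This sketch uses a hypothetical `set_schema_sql` helper. Its identifier pattern is a deliberately conservative assumption (letters, digits and underscores, starting with a letter) and is stricter than everything Db2 accepts.

```python
import re

# Conservative pattern (an assumption): a letter, then up to 127 letters,
# digits or underscores.
_SAFE_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,127}")


def set_schema_sql(engineer: str, suffix: str) -> str:
    """Hypothetical helper: build SET CURRENT SCHEMA for <engineer><suffix>."""
    # Db2 folds unquoted identifiers to upper case, so do the same here.
    schema = (engineer + suffix).upper()
    if not _SAFE_IDENTIFIER.fullmatch(schema):
        raise ValueError(f"unexpected schema name: {schema!r}")
    return f"SET CURRENT SCHEMA {schema}"


print(set_schema_sql("LABDATAENGINEER7", "A"))
# SET CURRENT SCHEMA LABDATAENGINEER7A
```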

Run the next cell to create a table with a single INTEGER column containing values from 1 to 10.

In [None]:
%%sql
CREATE TABLE DISCOVER (A INT);
INSERT INTO DISCOVER VALUES 1,2,3,4,5,6,7,8,9,10;
SELECT * FROM DISCOVER;

Run the next two cells to create the same table in a schema ending in **B**. It is populated with values from 11 to 20.

In [None]:
schema = engineer+'B'
print(schema)
%sql SET CURRENT SCHEMA {schema}

In [None]:
%%sql
CREATE TABLE DISCOVER (A INT);
INSERT INTO DISCOVER VALUES 11,12,13,14,15,16,17,18,19,20;
SELECT * FROM DISCOVER;

Run the next cell to see all the tables in the database called **DISCOVER**. You may see tables created by other people running the lab. 

In [None]:
%sql SELECT TABSCHEMA, TABNAME FROM SYSCAT.TABLES WHERE TABNAME = 'DISCOVER'

### Virtualizing your new Tables
Now that you have created two new tables you can virtualize that data and make it look like a single table in your database.
1. Return to the IBM Cloud Pak for Data Console
2. Click **Virtualize** in the Data Virtualization menu if you are not still in the Virtualize page

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.13.07 Menu Virtualize.png">
    
3. Click the **magnifying glass icon** beside the search bar to refresh the list of available tables

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.15.45 Find tables by name schema column.png">

3. Enter **DISCOVER** in the search bar and press Enter. Your new tables have automatically been discovered by IBM Cloud Pak for Data and are listed under the LABDATAENGINEER schemas you used when you created them. You may also see other lab participants' tables.

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.31.01 Available Discover Tables.png">

4. Select the two tables you just created by clicking the **check box** beside each table. Make sure you only select those for your LABDATAENGINEER schema.
5. Click **Add to Cart**. Notice that the number of items in your cart is now **2**.

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.31.01 Available Discover Tables.png">

6. Click **View Cart**

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.33.31 View Cart(2).png">

7. Change the name of your two tables from DISCOVER to **DISCOVERA** and **DISCOVERB**. These are the new names that you will be able to use to find your tables in the Data Virtualization database. Don't change the Schema name. It is unique to your current userid. 

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.34.21 Assign to Project.png">
    
9. Click the **back arrow** beside **Review cart and virtualize tables**. We are going to add one more thing to your cart.

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.34.30 Back Arrow Icon.png">

10. Click the checkbox beside **Automatically group tables**. Notice how all the tables called **DISCOVER** have been grouped together into a single entry.

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.35.18 Automatically Group Available Tables.png">

11. Select the row where all the DISCOVER tables have been grouped together
12. Click **Add to cart**. 
13. Click **View cart**

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.35.28 View cart(3).png">
    
    You should now see three items in your cart.

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.35.57 Cart with Fold.png">

14. Hover over the ellipsis icon at the right side of the list for the **DISCOVER** table

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.34.44 Elipsis.png">

15. Select **Edit grouped tables**

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.36.11 Cart Elipsis Menu.png">

16. Deselect all the tables except for those in one of the schemas you created. You should now have two tables selected. 
17. Click **Apply**
17. Change the name of the new combined table to **DISCOVERFOLD**
18. Select a project from the drop-down list that corresponds to your current userid.
19. From the ellipsis menu, select **Preview** for each of the three tables in your list. The new virtualized table **DISCOVERA** should contain values from 1-10. The new virtualized table **DISCOVERB** should contain values from 11-20. The **DISCOVERFOLD** virtualized table should contain values from 1-20.
20. Click **Virtualize**. You should see that three new virtual tables have been created.

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.36.49 Virtualize.png">
    
    The Virtual tables created dialog box opens.

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.37.24 Virtual tables created.png">
     
21. Click **View my virtualized data**. You return to the My virtualized data page.

### Working with your new tables
1. Enter **DISCOVER** in the Find field.

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.37.55 Find DISCOVER.png">

    You should see the three virtual tables you just created. Notice that you do not see tables that other users have created. By default, Data Engineers only see the tables they have virtualized themselves and the virtual tables that other users have given them access to.

2. Click the ellipsis (...) beside your **DISCOVERFOLD** table and select **Preview** to confirm that it contains 20 rows.

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/4.32.01 Elipsis Fold.png">

3. Click **SQL Editor** from the Data Virtualization menu

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.13.33 Menu SQL editor.png">

4. Click **Blank** to create a new blank SQL Script

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.38.24 + Blank.png">

4. Enter **SELECT * FROM DISCOVERFOLD;** into the SQL Editor

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.38.44 SELECT*.png">

5. Click **Run All** at the bottom left of the SQL Editor window. You should see 20 rows returned in the result. 

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.38.52 Run all.png">

Notice that you didn't have to specify the schema for your new virtual tables. The SQL Editor automatically uses the schema associated with your userid that was used when you created your new tables. 

Now you can:
* Create a connection to a remote data source
* Make a new or existing table in that remote data source look and act like a local table
* Fold data from different tables in the same data source, or across data sources, into a single virtual table
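
A folded virtual table such as DISCOVERFOLD returns the rows of all its member tables together. The sketch below imitates that result with Python's built-in `sqlite3` module and a UNION ALL view. SQLite is only a stand-in, and the table names `DISCOVER_A` and `DISCOVER_B` are hypothetical. Treating folding as UNION ALL describes the expected result, not how Data Virtualization implements it.

```python
import sqlite3

# In-memory SQLite stands in for the two remote DISCOVER tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DISCOVER_A (A INTEGER)")
conn.execute("CREATE TABLE DISCOVER_B (A INTEGER)")
conn.executemany("INSERT INTO DISCOVER_A VALUES (?)", [(i,) for i in range(1, 11)])
conn.executemany("INSERT INTO DISCOVER_B VALUES (?)", [(i,) for i in range(11, 21)])

# UNION ALL keeps every row from both tables (no de-duplication).
conn.execute(
    "CREATE VIEW DISCOVERFOLD AS "
    "SELECT A FROM DISCOVER_A UNION ALL SELECT A FROM DISCOVER_B"
)
rows = [a for (a,) in conn.execute("SELECT A FROM DISCOVERFOLD ORDER BY A")]
print(len(rows), rows[0], rows[-1])
# 20 1 20
```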

## Gaining Insight from Virtualized Data

Now that you understand the basics of Data Virtualization you can explore how easy it is to gain insight across multiple data sources without moving data. 

In the next set of steps you connect to virtualized data from this notebook using your LABDATAENGINEER userid. You can use the same techniques to connect to virtualized data from applications and analytic tools outside of IBM Cloud Pak for Data.

Connecting to all your virtualized data is just like connecting to a single database. All the complexity of dozens of tables across multiple databases, on premises and with different cloud providers, is now as simple as connecting to a single database and querying a table.

We are going to connect to the IBM Cloud Pak for Data Virtualization database in exactly the same way we connected to a Db2 database earlier in this lab. However, we need to change the detailed connection information.

1. Click **Connection Details** in the Data Virtualization menu

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.13.44 Menu connection details.png">

2. Click **Without SSL**

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.14.29 Connection details.png">

3. Copy the **User ID** by highlighting it with your mouse, right click and select **Copy**
4. Paste the **User ID** into the next cell in this notebook, between the quotation marks after **user=** (see below)

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.54.27 Notebook Login.png">

5. Click **Service Settings** in the Data Virtualization menu

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.14.15 Access information.png">

6. Click **Show** to see the password. Highlight the password and copy using the right-click menu
7. Paste the **password** into the cell below between the quotation marks using the right-click paste.
8. Run the cell below to connect to the Data Virtualization database. 

#### Connecting to Data Virtualization SQL Engine

In [None]:
# Connect to the IBM Cloud Pak for Data Virtualization Database from inside CPD
database = 'bigsql'
user = 'userxxxx'
password = 'xxxxxxxxxxxxxx'
host = 'openshift-skytap-nfs-lb.ibm.com'
port = '32080'

%sql CONNECT TO {database} USER {user} USING {password} HOST {host} PORT {port}

### Stock Symbol Table
#### Get information about the stocks that are in the database
**System Z - VSAM**

This table comes from a VSAM file on zOS. IBM Cloud Pak for Data Virtualization works together with Data Virtualization Manager for zOS to make this look like a local database table. For the following examples you can substitute any of the symbols below.

In [None]:
%sql -grid select * from DVDEMO.STOCK_SYMBOLS

### Stock History Table
#### Get Price of a Stock over the Year
Set the stock symbol in the first cell below, then run the next two cells. This information is folded together from two identical tables, one in a Db2 database and one in an Informix database. Then pick a new stock symbol from the list above, enter it into the first cell, and run both cells again.

**CP4D - Db2, Skytap -  Informix**

In [None]:
stock = 'AXP'
print('variable stock set to = ' + str(stock))

In [None]:
%%sql -pl
SELECT WEEK(TX_DATE) AS WEEK, OPEN FROM FOLDING.STOCK_HISTORY
WHERE SYMBOL = :stock AND TX_DATE != '2017-12-01'
ORDER BY WEEK(TX_DATE) ASC

#### Trend of Three Stocks
This chart shows three stock prices over the course of a year. It uses the same folded stock history information.

**CP4D - Db2, Skytap -  Informix**

In [None]:
stocks = ['INTC','MSFT','AAPL']

In [None]:
%%sql -pl
SELECT SYMBOL, WEEK(TX_DATE), OPEN FROM FOLDING.STOCK_HISTORY
WHERE SYMBOL IN (:stocks) AND TX_DATE != '2017-12-01'
ORDER BY WEEK(TX_DATE) ASC
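
Here the `:stocks` reference lets the Db2 magic put a Python list into the IN predicate. In application code outside the notebook, a common alternative is one parameter marker per value, so the values never become part of the SQL text. This sketch uses `sqlite3` as a stand-in driver and a hypothetical `in_clause` helper. Db2 drivers also use `?` parameter markers, but the table and rows here are illustrative only.

```python
import sqlite3


def in_clause(values):
    """Hypothetical helper: one '?' marker per value, e.g. '?, ?, ?'."""
    if not values:
        raise ValueError("an IN list needs at least one value")
    return ", ".join("?" for _ in values)


# Stand-in table with illustrative symbols only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STOCK_HISTORY (SYMBOL TEXT)")
conn.executemany("INSERT INTO STOCK_HISTORY VALUES (?)",
                 [("INTC",), ("MSFT",), ("AAPL",), ("IBM",)])

stocks = ["INTC", "MSFT", "AAPL"]
sql = f"SELECT SYMBOL FROM STOCK_HISTORY WHERE SYMBOL IN ({in_clause(stocks)}) ORDER BY SYMBOL"
result = [symbol for (symbol,) in conn.execute(sql, stocks)]
print(result)
# ['AAPL', 'INTC', 'MSFT']
```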

#### 30 Day Moving Average of a Stock
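Enter the stock symbol below to see a moving average of a single stock's opening price. The window covers up to 31 trading days centered on each day (15 rows before and 15 rows after), which approximates a 30-day moving average.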

**CP4D - Db2, Skytap -  Informix**

In [None]:
stock = 'AAPL'

In [None]:
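# Assumes %run db2.ipynb (above) has loaded the %sql magic; pyplot is
# imported explicitly so this cell does not rely on the extension for it
import matplotlib.pyplot as plt

# Centered moving average of the opening price: 15 rows (trading days)
# before and 15 after each row, i.e. a window of up to 31 rows
sqlin = \
"""
SELECT WEEK(TX_DATE) AS WEEK, OPEN, 
     AVG(OPEN) OVER (
       ORDER BY TX_DATE
     ROWS BETWEEN 15 PRECEDING AND 15 FOLLOWING) AS MOVING_AVG
  FROM FOLDING.STOCK_HISTORY
     WHERE SYMBOL = :stock
  ORDER BY TX_DATE
"""
df = %sql {sqlin}
week = df['WEEK']
open_price = df['OPEN']
moving_avg = df['MOVING_AVG']

plt.xlabel("Week", fontsize=12)
plt.ylabel("Opening Price", fontsize=12)
plt.suptitle("Opening Price and Moving Average of " + stock, fontsize=20)
plt.plot(week, open_price, 'r')    # daily opening price in red
plt.plot(week, moving_avg, 'b')    # moving average in blue
plt.show()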
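
The window clause above averages up to 31 rows for each output row: the current trading day plus 15 rows on either side, with fewer rows available at the start and end of the series. The plain-Python sketch below uses a hypothetical `centered_moving_average` helper that reproduces that framing, so you can check the arithmetic on a small list.

```python
def centered_moving_average(values, preceding=15, following=15):
    """Mean over ROWS BETWEEN <preceding> PRECEDING AND <following> FOLLOWING."""
    averages = []
    for i in range(len(values)):
        # Near the edges the slice is simply shorter, just as the SQL frame
        # only includes rows that exist.
        window = values[max(0, i - preceding): i + following + 1]
        averages.append(sum(window) / len(window))
    return averages


print(centered_moving_average([1, 2, 3, 4, 5], preceding=1, following=1))
# [1.5, 2.0, 3.0, 4.0, 4.5]
```

Because the window counts rows rather than calendar days, weekends and holidays do not widen it.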

#### Trading volume of INTC versus MSFT and AAPL in first week of November
**CP4D - Db2, Skytap -  Informix**

In [None]:
stocks = ['INTC','MSFT','AAPL']

In [None]:
%%sql -pb
SELECT SYMBOL, DAY(TX_DATE), VOLUME/1000000 FROM FOLDING.STOCK_HISTORY
WHERE SYMBOL IN (:stocks) AND WEEK(TX_DATE) =  45
ORDER BY DAY(TX_DATE) ASC
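
This query filters on Db2's WEEK function. Db2 numbers weeks from 1 to 54, each week starts on Sunday, and week 1 is the week that contains January 1 (WEEK_ISO follows ISO 8601 rules instead). The sketch below is a hypothetical re-implementation of that rule. It assumes the stock history covers 2017, as the '2017-12-01' filter earlier suggests, and it shows which dates fall in week 45.

```python
import datetime


def db2_week(d: datetime.date) -> int:
    """Hypothetical re-implementation: Sunday-start weeks, week 1 contains Jan 1."""
    jan1 = datetime.date(d.year, 1, 1)
    # Python's weekday() is Monday=0 ... Sunday=6; shift so Sunday=0.
    jan1_offset = (jan1.weekday() + 1) % 7
    day_of_year = d.timetuple().tm_yday
    return (day_of_year - 1 + jan1_offset) // 7 + 1


november = [datetime.date(2017, 11, 1) + datetime.timedelta(days=n) for n in range(15)]
print([d.isoformat() for d in november if db2_week(d) == 45])
# ['2017-11-05', '2017-11-06', '2017-11-07', '2017-11-08', '2017-11-09', '2017-11-10', '2017-11-11']
```

Under these assumptions, 1 to 4 November 2017 fall in week 44, so week 45 is the first full week of November.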

#### Show Stocks that Represent More than 3% of the Total Purchases during Week 45
**CP4D - Db2, Skytap -  Informix**

In [None]:
%%sql -pie
WITH WEEK45(SYMBOL, PURCHASES) AS (
  SELECT SYMBOL, SUM(VOLUME * CLOSE) FROM FOLDING.STOCK_HISTORY
    WHERE WEEK(TX_DATE) =  45 AND SYMBOL <> 'DJIA'
  GROUP BY SYMBOL
),
ALL45(TOTAL) AS (
  SELECT SUM(PURCHASES) * .03 FROM WEEK45
)
SELECT SYMBOL, PURCHASES FROM WEEK45, ALL45
WHERE PURCHASES > TOTAL
ORDER BY SYMBOL, PURCHASES
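
The two common table expressions above work in two steps. WEEK45 totals the purchase value (VOLUME * CLOSE) per symbol, excluding the DJIA index. ALL45 then takes 3% of the grand total as a threshold. The plain-Python sketch below follows the same steps. The `large_share_symbols` helper and the rows are hypothetical, and the filter is strictly greater than 3%, matching `PURCHASES > TOTAL`.

```python
from collections import defaultdict


def large_share_symbols(rows, share=0.03, exclude=("DJIA",)):
    """rows: (symbol, volume, close) tuples for one week."""
    purchases = defaultdict(float)             # WEEK45: value per symbol
    for symbol, volume, close in rows:
        if symbol not in exclude:
            purchases[symbol] += volume * close
    threshold = sum(purchases.values()) * share  # ALL45: 3% of the total
    return sorted((s, p) for s, p in purchases.items() if p > threshold)


week_rows = [("AAA", 100, 10.0), ("BBB", 2, 10.0), ("CCC", 50, 10.0), ("DJIA", 1_000_000, 1.0)]
print(large_share_symbols(week_rows))
# [('AAA', 1000.0), ('CCC', 500.0)]
```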

### Stock Transaction Table
#### Show Transactions by Customer
The next two examples use data folded together from three different data sources, representing three different trading organizations. The first example shows a combined view of a single customer's stock trades.

**AWS - Db2, Azure - EDB (Postgres), Azure - Db2**

In [None]:
%%sql
SELECT * FROM FOLDING.STOCK_TRANSACTIONS
 WHERE CUSTID = '107196'

#### Bought/Sold Amounts of Top 5 stocks 
**AWS - Db2, Azure - EDB (Postgres), Azure - Db2**

In [None]:
%%sql
WITH BOUGHT(SYMBOL, AMOUNT) AS
  (
  SELECT SYMBOL, SUM(QUANTITY) FROM FOLDING.STOCK_TRANSACTIONS
  WHERE QUANTITY > 0
  GROUP BY SYMBOL
  ),
SOLD(SYMBOL, AMOUNT) AS
  (
  SELECT SYMBOL, -SUM(QUANTITY) FROM FOLDING.STOCK_TRANSACTIONS
  WHERE QUANTITY < 0
  GROUP BY SYMBOL
  )
SELECT B.SYMBOL, B.AMOUNT AS BOUGHT, S.AMOUNT AS SOLD
FROM BOUGHT B, SOLD S
WHERE B.SYMBOL = S.SYMBOL
ORDER BY B.AMOUNT DESC
FETCH FIRST 5 ROWS ONLY
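
The query treats positive quantities as purchases and negative quantities as sales. Its `WHERE B.SYMBOL = S.SYMBOL` join is an inner join, so a symbol that was only bought, or only sold, does not appear. The plain-Python sketch below mirrors that logic. It uses a hypothetical `bought_and_sold` helper and invented transactions.

```python
from collections import defaultdict


def bought_and_sold(transactions, top_n=5):
    """transactions: (symbol, quantity) pairs; quantity < 0 means a sale."""
    bought, sold = defaultdict(int), defaultdict(int)
    for symbol, quantity in transactions:
        if quantity > 0:
            bought[symbol] += quantity
        elif quantity < 0:
            sold[symbol] += -quantity          # -SUM(QUANTITY) in the SQL
    # Inner join: keep only symbols present in both BOUGHT and SOLD.
    both = [(s, bought[s], sold[s]) for s in bought if s in sold]
    both.sort(key=lambda row: row[1], reverse=True)  # ORDER BY B.AMOUNT DESC
    return both[:top_n]                              # FETCH FIRST n ROWS ONLY


trades = [("AAA", 100), ("AAA", -40), ("BBB", 300), ("CCC", 50), ("CCC", -50), ("CCC", 25)]
print(bought_and_sold(trades))
# [('AAA', 100, 40), ('CCC', 75, 50)]
```

Note that BBB, the most-bought symbol in this sample, is missing because it has no sales.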

### Customer Accounts
#### Show Top 5 Customer Balance
These next two examples use data folded from systems running on AWS and Azure.

**AWS - Db2, Azure - EDB (Postgres), Azure - Db2**

In [None]:
%%sql
SELECT CUSTID, BALANCE FROM FOLDING.ACCOUNTS
ORDER BY BALANCE DESC
FETCH FIRST 5 ROWS ONLY

#### Show Bottom 5 Customer Balance
**AWS - Db2, Azure - EDB (Postgres), Azure - Db2**

In [None]:
%%sql
SELECT CUSTID, BALANCE FROM FOLDING.ACCOUNTS
ORDER BY BALANCE ASC
FETCH FIRST 5 ROWS ONLY

### Selecting Customer Information from MongoDB
The MongoDB database (running on premises) has customer information in a document format. In order to materialize the document data as relational tables, a total of four virtual tables are generated. The following query shows the tables that are generated for the Customer document collection.

In [None]:
%sql LIST TABLES FOR SCHEMA MONGO_ONPREM

The tables are all connected through the CUSTOMER_ID field, which refers to the generated _ID of the main CUSTOMER collection. To reassemble these tables into a document, we must join them using this unique identifier. An example of the contents of the CUSTOMER_CONTACT table is shown below.

In [None]:
%sql -grid SELECT * FROM MONGO_ONPREM.CUSTOMER_CONTACT FETCH FIRST 5 ROWS ONLY

A full document record is shown in the following SQL statement which joins all of the tables together.

In [None]:
%%sql -grid
SELECT C.CUSTOMERID AS CUSTID, 
       CI.FIRSTNAME, CI.LASTNAME, CI.BIRTHDATE,
       CC.CITY, CC.ZIPCODE, CC.EMAIL, CC.PHONE, CC.STREET, CC.STATE,
       CP.CARD_TYPE, CP.CARD_NO
FROM MONGO_ONPREM.CUSTOMER C, MONGO_ONPREM.CUSTOMER_CONTACT CC, 
     MONGO_ONPREM.CUSTOMER_IDENTITY CI, MONGO_ONPREM.CUSTOMER_PAYMENT CP
WHERE  CC.CUSTOMER_ID = C."_ID" AND
       CI.CUSTOMER_ID = C."_ID" AND
       CP.CUSTOMER_ID = C."_ID"
FETCH FIRST 3 ROWS ONLY
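
The join above rebuilds a customer from its flattened pieces. The sketch below shows the same idea in the other direction: child rows are nested back under their parent, keyed by CUSTOMER_ID = _ID. The `assemble_documents` helper and all row values are hypothetical. Child rows are kept as lists because the sketch does not assume whether each original field was a single sub-document or an array.

```python
def assemble_documents(customers, children):
    """customers: parent rows; children: {field name: list of child rows}."""
    docs = {row["_ID"]: {"CUSTOMERID": row["CUSTOMERID"]} for row in customers}
    for field, rows in children.items():
        for row in rows:
            doc = docs.get(row["CUSTOMER_ID"])
            if doc is None:
                continue  # orphan child row: no matching parent document
            # Drop the join key; it is implied by the parent document.
            nested = {k: v for k, v in row.items() if k != "CUSTOMER_ID"}
            doc.setdefault(field, []).append(nested)
    return docs


customers = [{"_ID": "id-1", "CUSTOMERID": 101}]
children = {
    "contact": [{"CUSTOMER_ID": "id-1", "CITY": "Columbus", "STATE": "OH"}],
    "identity": [{"CUSTOMER_ID": "id-1", "FIRSTNAME": "Ada", "LASTNAME": "Smith"}],
    "payment": [{"CUSTOMER_ID": "id-2", "CARD_TYPE": "VISA"}],  # no parent id-2
}
docs = assemble_documents(customers, children)
print(docs["id-1"])
# {'CUSTOMERID': 101, 'contact': [{'CITY': 'Columbus', 'STATE': 'OH'}], 'identity': [{'FIRSTNAME': 'Ada', 'LASTNAME': 'Smith'}]}
```

Unlike the inner join in the SQL, this sketch keeps a customer even when one of the child tables has no matching row.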

### Querying All Virtualized Data
In this final example we use data from each data source to answer a complex business question. "What are the names of the customers in Ohio, who bought the most during the highest trading day of the year (based on the Dow Jones Industrial Index)?" 

**AWS Db2, Azure EDB, Azure Db2, Skytap MongoDB, CP4D Db2Wh, Skytap Informix**

In [None]:
%%sql
WITH MAX_VOLUME(AMOUNT) AS (
  SELECT MAX(VOLUME) FROM FOLDING.STOCK_HISTORY
    WHERE SYMBOL = 'DJIA'
),
HIGHDATE(TX_DATE) AS (
  SELECT TX_DATE FROM FOLDING.STOCK_HISTORY, MAX_VOLUME M
    WHERE SYMBOL = 'DJIA' AND VOLUME = M.AMOUNT
),
CUSTOMERS_IN_OHIO(CUSTID) AS (
  SELECT C.CUSTID FROM TRADING.CUSTOMERS C 
    WHERE C.STATE = 'OH'
),
TOTAL_BUY(CUSTID,TOTAL) AS (
  SELECT C.CUSTID, SUM(SH.QUANTITY * SH.PRICE) 
    FROM CUSTOMERS_IN_OHIO C, FOLDING.STOCK_TRANSACTIONS SH, HIGHDATE HD
  WHERE SH.CUSTID = C.CUSTID AND
        SH.TX_DATE = HD.TX_DATE AND 
        QUANTITY > 0 
  GROUP BY C.CUSTID
)
  SELECT LASTNAME, T.TOTAL
  FROM MONGO_ONPREM.CUSTOMER_IDENTITY CI, MONGO_ONPREM.CUSTOMER C, TOTAL_BUY T
  WHERE CI.CUSTOMER_ID = C."_ID" AND C.CUSTOMERID = CUSTID
  ORDER BY TOTAL DESC
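
To make the logic of each common table expression easier to follow, the sketch below walks through the same steps in plain Python on small, hypothetical in-memory stand-ins for the virtualized tables. It is an illustration of the query's logic only, not how the Data Virtualization engine executes it.

```python
from collections import defaultdict
from typing import List, Tuple

# Hypothetical in-memory stand-ins for the virtualized tables used above.
stock_history = [  # FOLDING.STOCK_HISTORY: (SYMBOL, TX_DATE, VOLUME)
    ("DJIA", "2019-03-01", 300), ("DJIA", "2019-03-04", 900), ("IBM", "2019-03-04", 5000),
]
customers = [(1, "OH"), (2, "OH"), (3, "NY")]  # TRADING.CUSTOMERS: (CUSTID, STATE)
transactions = [  # FOLDING.STOCK_TRANSACTIONS: (CUSTID, TX_DATE, QUANTITY, PRICE)
    (1, "2019-03-04", 10, 50.0), (1, "2019-03-04", -5, 50.0), (2, "2019-03-04", 2, 100.0),
    (3, "2019-03-04", 100, 10.0), (2, "2019-03-01", 50, 10.0),
]
last_names = {1: "Lee", 2: "Ng", 3: "Smith"}  # stands in for the MongoDB identity join


def ohio_top_buyers() -> List[Tuple[str, float]]:
    # MAX_VOLUME: the highest DJIA trading volume
    max_volume = max(vol for sym, _, vol in stock_history if sym == "DJIA")
    # HIGHDATE: every DJIA date that reached that volume
    high_dates = {d for sym, d, vol in stock_history if sym == "DJIA" and vol == max_volume}
    # CUSTOMERS_IN_OHIO
    ohio = {cust for cust, state in customers if state == "OH"}
    # TOTAL_BUY: value of purchases (QUANTITY > 0) per Ohio customer on the high date
    totals = defaultdict(float)
    for cust, d, qty, price in transactions:
        if cust in ohio and d in high_dates and qty > 0:
            totals[cust] += qty * price
    # Final SELECT: attach last names and sort by TOTAL DESC
    return sorted(((last_names[c], t) for c, t in totals.items()), key=lambda r: r[1], reverse=True)


print(ohio_top_buyers())
```

Note that the IBM row is ignored because only DJIA volumes define the high date, and the negative quantity (a sale) is excluded by the `QUANTITY > 0` filter.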

### Seeing where your Virtualized Data is coming from
You may eventually work with a complex Data Virtualization system. As an administrator or a data scientist, you may need to understand where data is coming from. 

Fortunately, the Data Virtualization engine is based on Db2. It includes the same catalog of information as does Db2 with some additional features. If you want to work backwards and understand where each of your virtualized tables comes from, the information is included in the **SYSCAT.TABOPTIONS** catalog table. 

In [None]:
%%sql 
SELECT * from SYSCAT.TABOPTIONS;

The table includes more information than you need to answer the question of where your data is coming from. The query below shows only the rows that contain the source of the data (OPTION = 'SOURCELIST'). Notice that tables that have been folded together from several tables include the information for each data source, separated by a semicolon. 

In [None]:
%%sql 
SELECT TABSCHEMA, TABNAME, SETTING
  FROM SYSCAT.TABOPTIONS
  WHERE OPTION = 'SOURCELIST' 
    AND TABSCHEMA <> 'QPLEXSYS';
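
Because folded tables list every source in a single SETTING value, it can help to split that value apart when you post-process the results in Python. The sketch below assumes only what is described above (entries separated by semicolons); the sample value is hypothetical, and the exact entry format in your catalog may differ.

```python
from typing import List


def split_sourcelist(setting: str) -> List[str]:
    """Split a SOURCELIST setting into one entry per underlying source.

    Entries are separated by semicolons; empty pieces, such as one left by
    a trailing ';', are dropped.
    """
    return [part.strip() for part in setting.split(";") if part.strip()]


# Hypothetical SETTING value for a table folded from two sources.
setting = "CID_A:SCHEMA_A.STOCK_TRANSACTIONS;CID_B:SCHEMA_B.STOCK_TRANSACTIONS"
sources = split_sourcelist(setting)
print(len(sources), sources)
```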

In [None]:
%%sql 
SELECT TABSCHEMA, TABNAME, SETTING
  FROM SYSCAT.TABOPTIONS
  WHERE TABSCHEMA = 'DVDEMO';

In this last example, you can search for any virtualized data coming from a Postgres database by searching for **SETTING LIKE '%POST%'**.

In [None]:
%%sql
SELECT TABSCHEMA, TABNAME, SETTING
  FROM SYSCAT.TABOPTIONS
  WHERE OPTION = 'SOURCELIST' 
    AND SETTING LIKE '%POST%'
    AND TABSCHEMA <> 'QPLEXSYS';

What is missing is the detail for each connection: the table above shows only a connection identifier for each source. You can find the connection details in another table, **QPLEXSYS.LISTRDBC**. In the last cell, you can see that CID DB210113 is included in the STOCK_TRANSACTIONS virtual table. You can find the details on that copy of Db2 by running the next cell. 

In [None]:
%%sql
SELECT CID, USR, SRCTYPE, SRCHOSTNAME, SRCPORT, DBNAME, IS_DOCKER FROM QPLEXSYS.LISTRDBC;
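
Putting the two catalog queries together, you can pair every source of a virtual table with its connection details. This is a minimal sketch with hypothetical data: it assumes each SOURCELIST entry begins with the CID followed by a colon, which you should confirm against the SETTING values in your own catalog.

```python
from typing import Dict, List, Optional, Tuple

# One QPLEXSYS.LISTRDBC row reduced to: (SRCTYPE, SRCHOSTNAME, SRCPORT, DBNAME)
Connection = Tuple[str, str, str, str]


def cids_in_setting(setting: str) -> List[str]:
    """Extract connection ids from a SOURCELIST setting.

    Assumption: each ';'-separated entry starts with the CID followed by ':'.
    """
    return [entry.split(":", 1)[0].strip() for entry in setting.split(";") if entry.strip()]


def describe_sources(setting: str,
                     connections: Dict[str, Connection]) -> List[Tuple[str, Optional[Connection]]]:
    """Pair each CID in a SOURCELIST setting with its LISTRDBC details (None if unknown)."""
    return [(cid, connections.get(cid)) for cid in cids_in_setting(setting)]


# Hypothetical LISTRDBC rows keyed by CID; real values come from the query above.
connections = {
    "CID_A": ("DB2", "db2.example.com", "50000", "SAMPLE"),
    "CID_B": ("POSTGRESQL", "pg.example.com", "5432", "trading"),
}
for cid, details in describe_sources("CID_A:S.T;CID_B:S.T;CID_C:S.T", connections):
    print(cid, details)
```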

## Advanced Data Virtualization 
Now that you have seen how powerful and easy it is to gain insight from your existing virtualized data, you can learn more about advanced data virtualization. You will learn how to join different remote tables together to create a new virtual table and how to capture complex SQL in views.


### Joining Tables Together
The virtualized tables below come from different data sources on different systems. We can combine them into a single virtual table. 

* Select **My virtualized data** from the Data Virtualization menu

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.13.20 Menu My virtual data.png">
  
* Enter **Stock** in the find field and press Enter

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.39.43 Find STOCK.png">
  
* Select table **STOCK_TRANSACTIONS** in the **FOLDING** schema
* Select table **STOCK_SYMBOLS** in the **DVDEMO** schema

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.40.18 Two STOCK seleted.png">
  
* Click **Join View**
* In table STOCK_SYMBOLS: deselect **SYMBOL**
* In table STOCK_TRANSACTIONS: deselect **TX_NO** 
* Click **STOCK_TRANSACTIONS.SYMBOL** and drag it to **STOCK_SYMBOLS.SYMBOL**

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.41.07 Joining Tables.png">
  
* Click **Preview** to check that your join is working. Each row should now contain the stock symbol and the long stock name.

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.41.55 New Join Preview.png">
  
* Click **X** to close the preview window
* Click **JOIN**

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.42.20 Join.png">
  
* Type view name **TRANSACTIONS_FULLNAME**
* Don't change the default schema. This corresponds to your LABDATAENGINEER user id. 

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.43.10 View Name.png">
  
* Click **NEXT**

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.43.30 Next.png">
  
* Select the lab **project** that corresponds to your LABDATAENGINEER id. 

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.43.58 Assign to Project.png">
  
* Click **CREATE VIEW**. 

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.44.06 Create view.png">
  
  You see the successful Join View window.
  
  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.44.23 Join view created.png">
  
  
* Click **View my virtualized data**
* Click the ellipsis menu beside **TRANSACTIONS_FULLNAME**
* Click **Preview**

You can now join virtualized tables together to combine them into new virtualized tables. Now that you know how to perform simple table joins, you can learn how to combine multiple data sources and virtual tables using the powerful SQL query engine that is part of IBM Cloud Pak for Data Virtualization.
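
The join view you just built combines rows from the two tables wherever their SYMBOL values match, presumably as an inner join. As a minimal sketch of that idea in plain Python, the function below follows the column choices in the steps above (TX_NO dropped, SYMBOL kept once). It assumes SYMBOL is unique in STOCK_SYMBOLS, and the LONGNAME column and sample rows are hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def join_on_symbol(transactions: List[Row], symbols: List[Row]) -> List[Row]:
    """Inner-join transaction rows to symbol rows on SYMBOL.

    TX_NO is dropped from the transaction side and SYMBOL is kept only once.
    Assumes SYMBOL is unique in `symbols`.
    """
    # Index the lookup side by SYMBOL, without repeating the join column.
    by_symbol = {s["SYMBOL"]: {k: v for k, v in s.items() if k != "SYMBOL"} for s in symbols}
    joined: List[Row] = []
    for t in transactions:
        extra = by_symbol.get(t["SYMBOL"])
        if extra is None:
            continue  # inner join: transactions without a matching symbol are dropped
        row = {k: v for k, v in t.items() if k != "TX_NO"}
        row.update(extra)
        joined.append(row)
    return joined


# Hypothetical rows; LONGNAME stands in for the column holding the long stock name.
transactions = [
    {"TX_NO": 1, "SYMBOL": "IBM", "QUANTITY": 10},
    {"TX_NO": 2, "SYMBOL": "XYZ", "QUANTITY": 5},
]
symbols = [{"SYMBOL": "IBM", "LONGNAME": "International Business Machines"}]
print(join_on_symbol(transactions, symbols))
```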

### Using Queries to Answer Complex Business Questions
The IBM Cloud Pak for Data Virtualization administrator has set up more complex data from multiple sources for the next steps and has given you access to this virtualized data. You may have noticed this in previous steps. 
1. Select **My virtualized data** from the Data Virtualization menu. All of these virtualized tables look and act like normal Db2 tables. 

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.13.20 Menu My virtual data.png">
 
2. Click **Preview** for any of the tables to see what they contain. 

The virtualized tables in the **FOLDING** schema have all been created by combining the same tables from different data sources. Like the join you just completed, folding is not restricted to tables from a single data source.

The virtualized tables in the **TRADING** schema are views of complex queries that were used to combine data from multiple data sources to answer specific business questions. 

3. Select **SQL Editor** from the Data Virtualization menu.

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.13.33 Menu SQL editor.png">

4. Select **Script Library**

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.45.02 Script Library.png">

5. Search for **OHIO**
6. Select and expand the **OHIO Customer** query

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.45.47 Ohio Script.png">

7. Click the **Open a script to edit** icon to open the script in the SQL Editor

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.45.54 Open Script.png">

8. Click **Run All**

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.46.21 Run Ohio Script.png">


This script is a complex SQL join query that uses data from all the virtualized data sources you explored in the first steps of this lab. While the SQL looks complex, the author of the query did not have to be aware that the data was coming from multiple sources. Everything used in this query looks like it comes from a single database, not eight different data sources across eight different systems on premises or in the Cloud. 

### Making Complex SQL Simple to Consume
You can make this complex query easy for a user to consume. Instead of sharing the query itself with other users, you can wrap it in a view that looks and acts like a simple table. 
1. Enter **CREATE VIEW MYOHIOQUERY AS** in the SQL Editor at the first line below the comment and before the **WITH** clause

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.46.54 Add CREATE VIEW.png">

2. Click **Run all**
3. Click **+** to **Add a new script**

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.48.28 Add to script.png">
  
4. Click **Blank**
5. Enter **SELECT * FROM MYOHIOQUERY;**
6. Click **Run all**

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.48.57 Run Ohio View.png">
  

Now you have a very simple virtualized table that is pulling data from eight different data sources, combining the data together to resolve a complex business problem. In the next step you will share your new virtualized data with a user.
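
What you did by hand in step 1 of the previous section amounts to prefixing the query with `CREATE VIEW <name> AS`. If you generate such statements from Python, a minimal sketch could look like the one below. The identifier check is a deliberately conservative assumption (letters, digits and underscores, optionally schema-qualified), and the shortened query is a hypothetical placeholder for the full Ohio script.

```python
import re

# Conservative check for an optionally schema-qualified ordinary identifier.
_IDENT = re.compile(r"[A-Za-z][A-Za-z0-9_]*(\.[A-Za-z][A-Za-z0-9_]*)?")


def wrap_as_view(view_name: str, query: str) -> str:
    """Prefix a query with CREATE VIEW <name> AS."""
    if not _IDENT.fullmatch(view_name):
        raise ValueError(f"unexpected view name: {view_name!r}")
    return f"CREATE VIEW {view_name} AS\n{query.strip()}"


# Hypothetical, shortened stand-in for the Ohio query from the Script Library.
ohio_query = """
WITH CUSTOMERS_IN_OHIO(CUSTID) AS (
  SELECT C.CUSTID FROM TRADING.CUSTOMERS C WHERE C.STATE = 'OH'
)
SELECT CUSTID FROM CUSTOMERS_IN_OHIO
"""
print(wrap_as_view("MYOHIOQUERY", ohio_query))
```

Validating the name before formatting it into SQL keeps a mistyped or malicious name from turning into extra statements.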

### Sharing Virtualized Tables
1. Select **My virtualized data** from the Data Virtualization Menu.

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.13.20 Menu My virtual data.png">
 
2. Click the ellipsis (...) menu to the right of the **MYOHIOQUERY** virtualized table

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.49.30 Select MYOHIOQUERY.png">
  
3. Select **Manage Access** from the ellipsis menu

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.49.46 Virtualized Data Menu.png">

4. Click **Grant access**

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.50.07 Grant access.png">

5. Select the **LABUSERx** id associated with your lab. For example, if you are LABDATAENGINEER5, then select LABUSER5.

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.52.42 Grant access to specific user.png">

6. Click **Add**

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.50.28 Add.png">


You should now see that your **LABUSER** id has view-only access to the new virtualized table. Next, you switch to your LABUSERx id to check that you can see the data you have just granted access to.

7. Click the user icon at the very top right of the console
8. Click **Log out**
9. Sign in using the LABUSER id specified by your lab instructor
10. Click the three bar menu at the top left of the IBM Cloud Pak for Data console
11. Select **Data Virtualization**

You should see **MYOHIOQUERY**, in the schema of your engineer user id, in the list of virtualized data.

12. Make a note of the schema of MYOHIOQUERY in your list of virtualized tables. It starts with **USER**.
13. Select **SQL Editor** from the Data Virtualization menu
14. Click **Blank** to open a new SQL Editor window
15. Enter **SELECT * FROM USERxxxx.MYOHIOQUERY** where xxxx is the user number of your engineer user. The view created by your engineer user was created in their default schema. 
16. Click **Run all**
17. Add the following to your query: **WHERE TOTAL > 3000 ORDER BY TOTAL**
18. Click **</>** to format the query so it is easier to read
19. Click **Run all**

You have just made a very complex data set extremely easy for a data user to consume. They don't have to know how to connect to multiple data sources or how to combine the data using complex SQL. You can hide that complexity while ensuring that only the right user has access to the right data. 
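
The **Grant access** dialog gives one user read-only access to one view. If you script access management, the closest plain Db2 SQL is a `GRANT SELECT` (and `REVOKE SELECT` to undo it). The sketch below only builds those statements; it assumes the dialog corresponds to a SELECT privilege, which this lab does not confirm, and the `USERxxxx`/`USERyyyy` names are placeholders.

```python
import re

# Conservative identifier checks: a user id, and an optionally schema-qualified view.
_USER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
_VIEW = re.compile(r"[A-Za-z][A-Za-z0-9_]*(\.[A-Za-z][A-Za-z0-9_]*)?")


def _check(pattern: "re.Pattern[str]", name: str) -> str:
    if not pattern.fullmatch(name):
        raise ValueError(f"unexpected identifier: {name!r}")
    return name


def grant_select(view: str, grantee: str) -> str:
    """Build a Db2 GRANT giving one user read-only (SELECT) access to a view."""
    return f"GRANT SELECT ON TABLE {_check(_VIEW, view)} TO USER {_check(_USER, grantee)}"


def revoke_select(view: str, grantee: str) -> str:
    """Build the matching REVOKE statement."""
    return f"REVOKE SELECT ON TABLE {_check(_VIEW, view)} FROM USER {_check(_USER, grantee)}"


print(grant_select("USERxxxx.MYOHIOQUERY", "USERyyyy"))
print(revoke_select("USERxxxx.MYOHIOQUERY", "USERyyyy"))
```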

In the next steps you will learn how to access virtualized data from outside of IBM Cloud Pak for Data.

### Allowing Users to Access Virtualized Data with Analytic Tools
In the next set of steps you connect to virtualized data from this notebook using your **LABUSER** userid. 

Just like you connected to IBM Cloud Pak for Data Virtualized Data using your LABDATAENGINEER id, you can connect using your LABUSER id. 

We are going to connect to the IBM Cloud Pak for Data Virtualization database in exactly the same way we connected using your LABDATAENGINEER id. However, you need to change the detailed connection information. Each user has their own unique userid and password to connect to the service. This ensures that, no matter what tool you use to connect to virtualized data, you are always in control of who can access specific virtualized data. 

1. Click the user icon at the top right of the IBM Cloud Pak for Data console to confirm that you are using your **LABUSER** id
2. Click **Connection Details** in the Data Virtualization menu
3. Click **Without SSL**
4. Copy the **User ID** by highlighting it with your mouse, right-clicking and selecting **Copy**
5. Paste the **User ID** into the cell below, between the quotation marks after **user =**
6. Click **Service Settings** in the Data Virtualization menu
7. Show the password. Highlight the password and copy it using the right-click menu
8. Paste the **password** into the cell below, between the quotation marks, using right-click paste
9. Run the cell below to connect to the Data Virtualization database. 

#### Connecting a USER to Data Virtualization SQL Engine

In [None]:
# Connect to the IBM Cloud Pak for Data Virtualization Database from inside CPD
database = 'bigsql'
user = 'userxxxx'
password = 'xxxxxxxxxxxxxxxxxx'
host = 'openshift-skytap-nfs-lb.ibm.com'
port = '32080'

%sql CONNECT TO {database} USER {user} USING {password} HOST {host} PORT {port}
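
The `%sql` magic hides the driver details. If you want to open the same connection from plain Python, for example outside this notebook, one option is IBM's `ibm_db` driver, which accepts a Db2 CLI connection string. The sketch below builds that string from the same values as the cell above; it assumes `ibm_db` is installed (`pip install ibm_db`) only if you call `connect`, and it rejects values containing `;`, which separates the keywords.

```python
from typing import Union


def build_dsn(database: str, host: str, port: Union[int, str], user: str,
              password: str, ssl: bool = False) -> str:
    """Build a Db2 CLI connection string for the ibm_db driver."""
    parts = {
        "DATABASE": database,
        "HOSTNAME": host,
        "PORT": port,
        "PROTOCOL": "TCPIP",
        "UID": user,
        "PWD": password,
    }
    if ssl:
        parts["SECURITY"] = "SSL"  # the lab steps above use the non-SSL port
    for key, value in parts.items():
        if ";" in str(value):
            raise ValueError(f"{key} must not contain ';'")
    return "".join(f"{key}={value};" for key, value in parts.items())


def connect(dsn: str):
    """Open a connection with ibm_db (requires the ibm_db package)."""
    import ibm_db  # imported lazily so build_dsn works without the driver
    return ibm_db.connect(dsn, "", "")


print(build_dsn("bigsql", "db.example.com", "32080", "userxxxx", "secret"))
```

With the variables from the cell above you could call `connect(build_dsn(database, host, port, user, password))`.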

Now you can try out the view that was created by the LABDATAENGINEER userid. Run the cell below.

In [None]:
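# The view lives in the schema of your LABDATAENGINEER user, not your LABUSER schema.
# Replace USERxxxx with the schema you noted in the SQL Editor steps above.
engineer_schema = 'USERxxxx'
%sql SELECT * FROM {engineer_schema}.MYOHIOQUERY WHERE TOTAL > 3000 ORDER BY TOTAL;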

Only the LABDATAENGINEER virtualized tables that the LABUSER has been authorized to see are available. Try running the next cell. You should receive an error that the current user does not have the required authorization or privilege to perform the operation.

In [None]:
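# Replace USERxxxx with the schema of your LABDATAENGINEER user.
engineer_schema = 'USERxxxx'
%sql SELECT * FROM {engineer_schema}.DISCOVERFOLD;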

### Next Steps:
Now you can use IBM Cloud Pak for Data to make even complex data and queries from different data sources, on premises and across a multi-vendor Cloud, look like simple tables in a single database. You are ready for some more advanced labs. 

1. Use Db2 SQL and Jupyter Notebooks to Analyze Virtualized Data
    * Build simple to complex queries to answer important business questions using the virtualized data available to you in IBM Cloud Pak for Data
    * See how you can transform the queries into simple tables available to all your users
2. Use Open RESTful Services to connect to IBM Cloud Pak for Data Virtualization 
    * Everything you can do in the IBM Cloud Pak for Data User Interface is accessible through Open RESTful APIs
    * Learn how to automate and script your management of Data Virtualization using RESTful APIs
    * Learn how to accelerate application development by accessing virtualized data through RESTful APIs

## Automating Data Virtualization Setup and Management through REST

The IBM Cloud Pak for Data Console is only one way you can interact with the Virtualization service. IBM Cloud Pak for Data is built on a set of microservices that communicate with each other and with the Console user interface using RESTful APIs. You can use these services to automate anything you can do through the user interface.

This Jupyter Notebook contains examples of how to use the Open APIs to retrieve information from the virtualization service, how to run SQL statements directly against the service through REST, and how to provide authorization to objects.

The next part of the lab relies on a set of base classes to help you interact with the RESTful Services API for IBM Cloud Pak for Data Virtualization. You can access this library on GitHub. The commands below download the library and run it as part of this notebook.
<pre>
&#37;run CPDDVRestClass.ipynb
</pre>
The cell below downloads the RESTful Service classes and methods from GitHub and runs them. Note that it takes a few seconds for the extension to load, so you should generally wait until the "Db2 Extensions Loaded" message is displayed in your notebook. 
1. Click the cell below
2. Click **Run**

In [None]:
!wget -O CPDDVRestClass.ipynb https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/CPDDVRestClass.ipynb
%run CPDDVRestClass.ipynb

### The Db2 Class
The CPDDVRestClass.ipynb notebook includes a Python class called Db2 that encapsulates the REST API calls used to connect to the IBM Cloud Pak for Data Virtualization service. 

To access the service, you first authenticate and receive a reusable token that is sent with each call to the service. This means you don't have to provide a userID and password each time you run a command, and the token keeps each call secure. 

Each request has several parts. First, the URL and the API identify how to connect to the service. Second, the REST service path identifies the request and its options, for example '/metrics/applications/connections/current/list'. Finally, some complex requests also include a JSON payload. For example, a request to run SQL includes a JSON object that identifies the script, the statement delimiter, the maximum number of rows in the result set, and what to do if a statement fails.
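
The Db2 class hides these details. Purely to show how the parts described above fit together, here is a minimal sketch using Python's standard `urllib.request` that assembles (but does not send) a request: URL, bearer token header, and optional JSON payload. The console URL, token, the `/sql_jobs` path and the payload key names are hypothetical; check CPDDVRestClass.ipynb for the paths and keys the class actually uses.

```python
import json as jsonlib  # aliased: notebook cells below reuse the name `json`
import urllib.request
from typing import Any, Optional


def build_request(console: str, api: str, path: str, token: str,
                  payload: Optional[Any] = None) -> urllib.request.Request:
    """Assemble (but do not send) an authenticated REST request.

    URL = console + api + service path; a JSON payload, when given, makes it a POST.
    """
    data = jsonlib.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {
        "Authorization": f"Bearer {token}",  # the reusable token from authentication
        "Content-Type": "application/json",
    }
    method = "POST" if data is not None else "GET"
    return urllib.request.Request(console + api + path, data=data, headers=headers, method=method)


# Hypothetical values; the real console URL and token come from the cells below.
req = build_request("https://console.example.com", "/v1",
                    "/metrics/applications/connections/current/list", "TOKEN")
print(req.get_method(), req.full_url)

# Hypothetical SQL payload: script, delimiter, row limit, error handling.
sql_req = build_request("https://console.example.com", "/v1", "/sql_jobs", "TOKEN",
                        {"commands": "SELECT * FROM TRADING.OHIO;", "separator": ";",
                         "limit": 10, "stop_on_error": "no"})
print(sql_req.get_method(), sql_req.data)
```

Sending a request would then be a call such as `urllib.request.urlopen(req)`, which the Db2 class performs for you.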

You can find this class and use it for your own notebooks on GitHub. Have a look at how the class encapsulates the API calls by clicking on the following link: https://github.com/Db2-DTE-POC/CPDDVLAB/blob/master/CPDDVRestClass.ipynb

### Example Connections
To connect to the Data Virtualization service you need to provide the console URL, the service name (v1), and the console user name and password. For this lab we assume that the following values are used for the connection:
* Userid: LABDATAENGINEERx
* Password: password

Substitute your assigned LABDATAENGINEER userid below along with your password and run the cell. It generates a bearer token that is used in the following steps to authenticate your use of the API. 

#### Connecting to Data Virtualization API Service

In [None]:
# Set the service URL to connect from inside the ICPD Cluster
Console  = 'https://openshift-skytap-nfs-lb.ibm.com'

# Console credentials for the Data Virtualization API (replace x with your lab number)
user     = 'labdataengineerx'
password = 'password'

# Set up the required connection
databaseAPI = Db2(Console)
api = '/v1'
databaseAPI.authenticate(api, user, password)
database = Console

#### Data Sources
The following call (getDataSources) uses SQL, through the Db2 class, to run the same SQL statement you saw earlier in the lab. 

In [None]:
# Display the Available Data Sources already configured
json = databaseAPI.getDataSources()
databaseAPI.displayResults(json)

#### Virtualized Data
This call retrieves all of the virtualized data available to the role of Data Engineer. It uses a direct RESTful service call and does not use SQL. The service returns a JSON result set that is converted into a Python Pandas dataframe. Dataframes are very useful in being able to manipulate tables of data in Python. If there is a problem with the call, the error code is displayed.

In [None]:
# Display the Virtualized Assets Available to Engineers
roles = ['DV_ENGINEER']
for role in roles:
    r = databaseAPI.getRole(role)
    if (databaseAPI.getStatusCode(r)==200):
        json = databaseAPI.getJSON(r)
        df = pd.DataFrame(json_normalize(json['objects']))
        display(df)
    else:
        print(databaseAPI.getStatusCode(r))  
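
The cell above relies on `json_normalize` to turn nested JSON objects into flat DataFrame columns. To see what that flattening does, here is a small stand-in written with the standard library only: nested dictionaries become dotted column names, and lists are left as values. The response shape below is hypothetical, not the actual output of `getRole`.

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], prefix: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts into dotted keys, similar to pandas json_normalize.

    Lists are kept as values; an empty nested dict contributes no columns.
    """
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# Hypothetical response shape with an 'objects' list, like the one read above.
response = {"objects": [
    {"name": "STOCK_HISTORY", "schema": "FOLDING", "source": {"type": "DB2", "port": 50000}},
    {"name": "OHIO", "schema": "TRADING", "source": {"type": "VIEW"}},
]}
rows = [flatten(obj) for obj in response["objects"]]
for row in rows:
    print(row)
```

Passing `rows` to `pd.DataFrame` would give one column per dotted key, with missing values filled as NaN.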

#### Virtualized Tables and Views
This call retrieves all the virtualized tables and views available to the userid that you use to connect to the service. In this example the whole call is included in the Db2 class library and returned as a complete DataFrame, ready for display or for use in analysis or administration.

In [None]:
### Display Virtualized Tables and Views 
display(databaseAPI.getVirtualizedTablesDF())
display(databaseAPI.getVirtualizedViewsDF())

#### Get a list of the IBM Cloud Pak for Data Users
This example returns a list of all the users of the IBM Cloud Pak for Data system. It displays only three columns in the DataFrame, but it also prints the list of all available columns. Try changing the code to display other columns.

In [None]:
# Get the list of CPD Users
r = databaseAPI.getUsers()
if (databaseAPI.getStatusCode(r)==200):
    json = databaseAPI.getJSON(r)
    df = pd.DataFrame(json_normalize(json))
    print(', '.join(list(df))) # List available column names
    display(df[['uid','username','displayName']])
else:
    print(databaseAPI.getStatusCode(r))

#### Get the list of available schemas in the DV Database
Remember that the Data Virtualization engine supports the same functions as a regular Db2 database, so you can also look at standard Db2 objects like schemas.

In [None]:
# Get the list of available schemas in the DV Database
r = databaseAPI.getSchemas()
if (databaseAPI.getStatusCode(r)==200):
    json = databaseAPI.getJSON(r)
    df = pd.DataFrame(json_normalize(json['resources']))
    print(', '.join(list(df)))
    display(df[['name']].head(10))
else:
    print(databaseAPI.getStatusCode(r))  

#### Object Search
Fuzzy object search is also available. The call is a bit more complex. If you look at the routine in the DB2 class it posts a RESTful service call that includes a JSON payload. The payload includes the details of the search request. 

In [None]:
# Search for tables across all schemas that match simple search criteria 
# Display the first 100
# Switch between searching tables or views
object = 'view'
# object = 'table'
r = databaseAPI.postSearchObjects(object,"TRADING",10,'false','false')
if (databaseAPI.getStatusCode(r)==200):
    json = databaseAPI.getJSON(r)
    df = pd.DataFrame(json_normalize(json))
    print('Columns:')
    print(', '.join(list(df)))
    display(df[[object+'_name']].head(100))
else:
    print("RC: "+str(databaseAPI.getStatusCode(r)))

#### Run SQL through the SQL Editor Service
You can also use the SQL Editor service to run your own SQL. Statements are submitted to the editor. Your code then needs to poll the editor service until the script is complete. Fortunately you can use the DB2 class included in this lab so that it becomes a very simple Python call. The **runScript** routine runs the SQL and the **displayResults** routine formats the returned JSON.

In [None]:
sqlText = \
'''
WITH MAX_VOLUME(AMOUNT) AS (
  SELECT MAX(VOLUME) FROM FOLDING.STOCK_HISTORY
    WHERE SYMBOL = 'DJIA'
),
HIGHDATE(TX_DATE) AS (
  SELECT TX_DATE FROM FOLDING.STOCK_HISTORY, MAX_VOLUME M
    WHERE SYMBOL = 'DJIA' AND VOLUME = M.AMOUNT
),
CUSTOMERS_IN_OHIO(CUSTID) AS (
  SELECT C.CUSTID FROM TRADING.CUSTOMERS C 
    WHERE C.STATE = 'OH'
),
TOTAL_BUY(CUSTID,TOTAL) AS (
  SELECT C.CUSTID, SUM(SH.QUANTITY * SH.PRICE) 
    FROM CUSTOMERS_IN_OHIO C, FOLDING.STOCK_TRANSACTIONS SH, HIGHDATE HD
  WHERE SH.CUSTID = C.CUSTID AND
        SH.TX_DATE = HD.TX_DATE AND 
        QUANTITY > 0 
  GROUP BY C.CUSTID
)
  SELECT LASTNAME, T.TOTAL
  FROM MONGO_ONPREM.CUSTOMER_IDENTITY CI, MONGO_ONPREM.CUSTOMER C, TOTAL_BUY T
  WHERE CI.CUSTOMER_ID = C."_ID" AND C.CUSTOMERID = CUSTID
  ORDER BY TOTAL DESC
FETCH FIRST 5 ROWS ONLY;
'''

databaseAPI.displayResults(databaseAPI.runScript(sqlText))
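
`runScript` takes care of polling the SQL Editor service until the script finishes. The general pattern looks like the minimal sketch below: ask for the status, stop on a finished state, otherwise wait and ask again, giving up after a timeout. The status names and the simulated service are hypothetical; the real class may use different states and intervals.

```python
import time
from typing import Callable, Iterable


def poll_until_complete(get_status: Callable[[], str], interval: float = 1.0,
                        timeout: float = 60.0,
                        done_states: Iterable[str] = ("completed", "failed")) -> str:
    """Call get_status until it reports a finished state or the timeout expires."""
    finished = set(done_states)
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in finished:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"script still {status!r} after {timeout} s")
        time.sleep(interval)  # wait before asking the service again


# Simulated service that finishes on the third check.
replies = iter(["running", "running", "completed"])
print(poll_until_complete(lambda: next(replies), interval=0.01))
```

Using `time.monotonic()` for the deadline keeps the timeout correct even if the system clock is adjusted while the script runs.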

#### Run scripts of SQL Statements repeatedly through the SQL Editor Service
The runScript routine can run more than one statement. The next example runs a script with eight SQL statements multiple times. 

In [None]:
repeat = 3
sqlText = \
'''
SELECT * FROM TRADING.MOVING_AVERAGE;
SELECT * FROM TRADING.VOLUME;
SELECT * FROM TRADING.THREEPERCENT;
SELECT * FROM TRADING.TRANSBYCUSTOMER;
SELECT * FROM TRADING.TOPBOUGHTSOLD;
SELECT * FROM TRADING.TOPFIVE;
SELECT * FROM TRADING.BOTTOMFIVE;
SELECT * FROM TRADING.OHIO;
'''

for x in range(0, repeat):
    print('Repetition number: '+str(x))
    databaseAPI.displayResults(databaseAPI.runScript(sqlText))
print('done')

### What's next
If you are interested in finding out more about using RESTful services to work with Db2, check out this DZone article: https://dzone.com/articles/db2-dte-pocdb2dmc. The article also includes a link to a complete hands-on lab for Db2 and the Db2 Data Management Console, where you can find out more about using REST and Db2 together. 

### Clean up at the end of your lab
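Like any good student, we ask you to clean up your workspace at the end of the lab. Connect to the Db2 database where you created new tables, using the connection cell that matches where your notebook runs (inside or outside of IBM Cloud Pak for Data), and drop the tables using the following cells.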

In [None]:
# Connect to the Db2 Warehouse on IBM Cloud Pak for Data Database from inside of IBM Cloud Pak for Data
database = 'bludb'
user = 'user999'
password = 't1cz?K9-X1_Y-2Wi'
host = 'openshift-skytap-nfs-woker-5.ibm.com'
port = '31928'

%sql CONNECT TO {database} USER {user} USING {password} HOST {host} PORT {port}

In [None]:
# Connect to the Db2 Warehouse on IBM Cloud Pak for Data Database from outside of IBM Cloud Pak for Data

database = 'bludb'
user = 'user999'
password = 't1cz?K9-X1_Y-2Wi'
host = 'services-uscentral.skytap.com'
port = '9094'

%sql CONNECT TO {database} USER {user} USING {password} HOST {host} PORT {port}

In [None]:
schema = engineer+'A'
%sql DROP TABLE {schema}.DISCOVER 

In [None]:
schema = engineer+'B'
%sql DROP TABLE {schema}.DISCOVER

In [None]:
%sql DROP TABLE DATAENGINEER1A.DISCOVER1

In [None]:
%sql SELECT * FROM SYSCAT.TABLES WHERE TABNAME LIKE '%DISCOVER%'

#### Credits: IBM 2019, Peter Kohlmann [kohlmann@ca.ibm.com]