<img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/Digital Technical Engagement.png">

# IBM Cloud Pak for Data - Multi-Cloud Virtualization Hands-on Lab

## Introduction
Welcome to the IBM Cloud Pak for Data Multi-Cloud Virtualization Hands-on Lab.

In this lab you analyze data from multiple data sources, from across multiple Clouds, without copying data into a warehouse.

This hands-on lab uses live databases, where data is “virtually” available through the IBM Cloud Pak for Data Virtualization Service. This makes it easy to analyze data from across your multi-cloud enterprise using tools like Jupyter Notebooks, Watson Studio, or your favorite reporting tool, such as Cognos.

Take a minute to watch this introductory video to get an overview of what you will see in this live Hands-on Lab system.
http://ibm.biz/DTEVirtualMultiCloud

### Where to find this sample online
You can find a copy of this notebook on GitHub at https://github.com/Db2-DTE-POC/CPDDVLAB.

### The business problem and the landscape
The Acme Company needs timely analysis of stock trading data from multiple source systems. 

Their data science and development teams need access to:
* Customer data
* Account data
* Trading data
* Stock history and Symbol data

<img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/CPDDVLandscape.png">

The data sources are running on premises and in the cloud. In this example many of the databases are also running on OpenShift, but they could just as well be managed, virtual or bare-metal cloud installations; IBM Cloud Pak for Data works with any of them. Enterprise DB (Postgres) is also running in the cloud. MongoDB and Informix are running on premises. Finally, we also have a VSAM file on zOS, accessed through the Data Virtualization Manager for zOS.

To simplify access for Data Scientists and Developers, the Acme team wants to make all their data look like it is coming from a single database. They also want to combine data to create simple-to-use tables.

In the past, Acme built a dedicated data warehouse and then created ETL (Extract, Transform and Load) jobs to move data from each data source into the warehouse, where it could be combined. Now they can virtualize their data without moving it.

### In this lab you learn how to:

* Sign into IBM Cloud Pak for Data using your own Data Engineer and Data Scientist (User) userids
* Connect to different data sources, on premises and across a multi-vendor Cloud
* Make remote data from across your multi-vendor enterprise look and act like local tables in a single database
* Make combining complex data and queries simple even for basic users
* Capture complex SQL in easy to consume VIEWs that act just like simple tables
* Ensure that users can securely access even complex data across multiple sources 
* Use roles and privileges to ensure that only the right user may see the right data
* Make development easy by connecting to your virtualized data using analytic tools and applications from outside of IBM Cloud Pak for Data.

## Getting Started

### Using Jupyter notebooks
You are now officially using a Jupyter notebook! If this is your first time using a Jupyter notebook you might want to go through the Db2 Data Management Console Hands on Lab at www.ibm.biz/DMCDemosPOT. It includes an introduction to using Jupyter notebooks with the Db2 family. The introduction shows you some of the basics of using a notebook, including how to create the cells, run code, and save files for future use. 

Jupyter notebooks grew out of the IPython project. Work on the IPython notebook started in the 2006/7 timeframe: the existing Python interpreter was limited in functionality, and work was started to create a richer development environment. By 2011 these efforts resulted in the IPython Notebook being released (http://blog.fperez.org/2012/01/ipython-notebook-historical.html).

Jupyter notebooks were a spinoff (2014) from the original IPython project. IPython continues to be the kernel that Jupyter runs on, but the notebooks are now a project on their own.

Jupyter notebooks run in a browser and communicate with a backend server, which runs your code in an IPython kernel. These notebooks are used extensively by data scientists and anyone wanting to document, plot, and execute their code in an interactive environment. The beauty of Jupyter notebooks is that you document what you do as you go along.

### Connecting to IBM Cloud Pak for Data
For this lab you will be assigned two IBM Cloud Pak for Data user IDs: a Data Engineer userid and an end-user userid. Check with the lab coordinator to confirm which userids and passwords you should use.
* **Engineer:**
    * ID: LABDATAENGINEERx
    * PASSWORD: xxx
* **User:**
    * ID: LABUSERx
    * PASSWORD: xxx

If you have this notebook open, you should have already signed in as your assigned LABDATAENGINEER userid. 
1. To check your userid, click the icon at the very top right of the webpage. It will look something like this:

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/11.06.10 EngineerUserIcon.png">

2. Click **Profile and settings**
3. Click **Permissions** and review the user permissions for this user

As a Data Engineer you can:
* Add and modify Data sources. Each source is a connection to a single database, either inside or outside of IBM Cloud Pak for Data.
* Virtualize data. This makes tables in other data sources look and act like tables that are local to the Data Virtualization database
* Work with the data you have virtualized.
* Write SQL to access and join data that you have virtualized
* See detailed information on how to connect external analytic tools and applications to your virtualized data

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.22.45 PM.png">

As a User you can only:
* Work with data that has been virtualized for you
* Write SQL to work with that data
* See detailed connection information

As an Administrator (only available to the course instructor) you can also:
* Manage IBM Cloud Pak for Data User Access and Roles
* Create and Manage Data Caches to accelerate performance
* Change key service settings

## Basic Data Virtualization

### Exploring Data Source Connections
Let's start by looking at the Data Source Connections that are already available.

You should now have this Hands-on Lab notebook on the left side of your screen and the Cloud Pak for Data Console on the right side of your screen. In the Cloud Pak for Data Console:

1. Click the three bar (hamburger) menu at the top left of the console
2. Select **Collect** and then **Data Virtualization**
3. Click the Data Virtualization menu and select **Data Sources**.

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.22.45 PM.png">

4. Click **Constellation View**.
    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.23.03 PM.png">
5. A spider diagram of the connected data sources opens. 

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.23.19 PM.png">

    This displays the Data Source Graph with 8 active data sources:
    * 4 Db2 Family Databases hosted on premises, IBM Cloud, Azure and AWS
    * 1 EDB Postgres Database on Azure
    * 1 zOS VSAM file
    * 1 Informix Database running on premises
    * 1 MongoDB Database running on premises

**We are not going to add a new data source**, but you can go through the steps to see how additional data sources are added.
1. Click the down arrow menu beside **Add new data source** at the upper-right of the console screen
2. Select **From existing connections** from the menu.
    You can see a history of other data source connection information that was used before. This history is maintained to make reconnecting to data sources easier and faster.
3. Click **Connect to a new data source** in the area below the title and above the list of existing connections.
4. Click the field below **Connection type**
5. Scroll through all the **available data sources** to see the available connection types
6. Select **different data connection types** from the list to see the information required to connect to a new data source.
    At a minimum you typically need the host URL and port number, database name, userid and password. You can also connect using an SSL certificate that can be dragged and dropped directly into the console interface.
7. Click **Cancel** to return to the previous list of connections to add
8. Click **Cancel** again to return to the list of currently connected data sources

### Exploring the available data
Now that you understand how to connect to data sources you can start virtualizing data. Much of the work has already been done for you. IBM Cloud Pak for Data searches through the available data sources and compiles a single large inventory of all the tables and data available to virtualize in IBM Cloud Pak for Data. 

1. Click the Data Virtualization menu and select **Virtualize**

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.23.31 PM.png">
    
2. Check the total number of available tables at the top of the list. There should be hundreds available.

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.24.37 PM.png">

3. Enter "STOCK" into the search field and press **Enter**. Any table with the string **STOCK** in its table name, its schema name, or one of its column names appears in the search results.

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.24.51 PM.png">

4. Hover your mouse pointer over the far right side of the search results table. A **preview** icon appears on each row as you move your mouse.
5. Click the **preview** icon beside one table. This displays a preview of the data in the selected table.

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.33.46 PM.png">

6. Click **X** at the top right of the dialog box to return to the search results.
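
The search behaves like a substring filter over the inventory of schemas, tables and columns. Here is a minimal sketch of that behavior over a small, made-up inventory. Case-insensitive matching is an assumption, and the real search is performed by IBM Cloud Pak for Data, not by code like this.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CatalogEntry:
    schema: str
    table: str
    columns: List[str] = field(default_factory=list)


def search_catalog(entries, term):
    """Return entries whose schema, table or any column name contains term."""
    needle = term.upper()  # assumption: matching ignores case
    return [
        e for e in entries
        if any(needle in name.upper() for name in (e.schema, e.table, *e.columns))
    ]


# Made-up inventory; the real one is built from the connected data sources
catalog = [
    CatalogEntry("STOCK", "STOCK_HISTORY", ["SYMBOL", "TX_DATE", "OPEN"]),
    CatalogEntry("TRADING", "TRANSACTIONS", ["CUSTID", "SYMBOL", "QUANTITY"]),
    CatalogEntry("BANK", "ACCOUNTS", ["CUSTID", "BALANCE", "STOCK_VALUE"]),
    CatalogEntry("BANK", "CUSTOMERS", ["CUSTID", "LASTNAME"]),
]

hits = search_catalog(catalog, "stock")
for hit in hits:
    print(hit.schema, hit.table)
```

BANK.ACCOUNTS is found only because of its STOCK_VALUE column, which mirrors why the real search results can include tables whose names do not mention the search term.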

### Creating New Tables
So that each user in this lab has their own data to virtualize, you will create your own tables in a remote database.

In this part of the lab you will use this Jupyter notebook and Python code to connect to a source database, create simple tables and populate them with data.

IBM Cloud Pak for Data will automatically detect the change in the source database and make the new tables available for virtualization.

In this example, you connect to the Db2 database running in IBM Cloud Pak for Data but the database can be anywhere. All you need is the connection information and authorized credentials. 

   <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/Db2CPDDatabase.png">

The first step is to connect to one of our remote data sources directly, as if we were part of the team building a new business application. Since each lab user will create their own tables in their own schemas, the first thing you need to do is update and run the cell below with your lab number.
1. In this Jupyter notebook, click on the cell below
2. **Update the lab number** in the cell below to your assigned user and lab number
3. Click **Run** from the Jupyter notebook menu above

In [None]:
# Setting your userID
# Replace x with the lab number assigned to you (for example, 1) before running this cell
labnumber = x
engineer = 'DATAENGINEER' + str(labnumber)
print('variable engineer set to = ' + str(engineer))

The next part of the lab relies on a Jupyter notebook extension, commonly referred to as a "magic" command, to connect to a Db2 database. To use the commands, you load the extension by running another notebook, db2.ipynb, which contains all the required code:
<pre>
&#37;run db2.ipynb
</pre>
The cell below loads the Db2 extension directly from GitHub. It takes a few seconds for the extension to load, so wait until the "Db2 Extensions Loaded" message is displayed in your notebook.
1. Click the cell below
2. Click **Run**. When the cell is finished running, In[*] will change to In[2]

In [None]:
# !wget https://raw.githubusercontent.com/IBM/db2-jupyter/master/db2.ipynb
!wget -O db2.ipynb https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/db2.ipynb

%run db2.ipynb
print('db2.ipynb loaded')

#### Connecting to Db2

Before any SQL commands can be issued, a connection needs to be made to the Db2 database that you will be using. 

The Db2 magic command tracks whether or not a connection has occurred in the past and saves this information between notebooks and sessions. When you start up a notebook and issue a command, the program will reconnect to the database using your credentials from the last session. In the event that you have not connected before, the system will prompt you for all the information it needs to connect. This information includes:

- Database name
- Hostname
- Port
- Userid
- Password

Run the next cell.

In [None]:
# Connect to the Db2 Warehouse on IBM Cloud Pak for Data Database from inside of IBM Cloud Pak for Data
database = 'bludb'
user = 'user999'
password = 'i_@Iy_%4fyAVR392'
host = 'openshift-skytap-nfs-woker-3.ibm.com'
port = '32030'

%sql CONNECT TO {database} USER {user} USING {password} HOST {host} PORT {port}
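
The five values the magic asks for are the same ones any Db2 client needs, which is why the same details also work from applications outside IBM Cloud Pak for Data. As one example, the ibm_db Python driver accepts them as a semicolon-separated connection string. The helper below is a hypothetical sketch that only builds that string; the host, user and password are placeholders, and SECURITY=SSL is included only to show where an SSL setting would go.

```python
def build_db2_dsn(database, host, port, user, password, ssl=False):
    """Assemble a Db2 CLI-style connection string (KEY=value; pairs) from its parts."""
    parts = {
        "DATABASE": database,
        "HOSTNAME": host,
        "PORT": str(port),
        "PROTOCOL": "TCPIP",
        "UID": user,
        "PWD": password,
    }
    if ssl:
        parts["SECURITY"] = "SSL"
    return "".join(f"{key}={value};" for key, value in parts.items())


dsn = build_db2_dsn("bludb", "db2.example.com", 50000, "myuser", "mypassword")
print(dsn)

# With the ibm_db package installed, the string would be used roughly like this:
#   import ibm_db
#   conn = ibm_db.connect(dsn, "", "")
```

Keeping the parts in a dictionary makes it easy to add optional keywords without rewriting the string by hand.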

To check that the connection is working, run the following cell. It lists the tables in the **DVDEMO** schema. Only the first 5 tables are listed.

In [None]:
%sql select TABNAME, OWNER from syscat.tables where TABSCHEMA = 'DVDEMO'

Now that you can successfully connect to the database, you are going to create two tables with the same name and column in two different schemas. In the following steps of the lab you are going to virtualize these tables in IBM Cloud Pak for Data and fold them together into a single table.

The next cell sets the default schema to your engineer name followed by 'A'. Notice how you can set a Python variable and substitute it into the SQL statement in the cell. The **-e** option echoes the command.

Run the next cell.

In [None]:
schema_name = engineer+'A'
table_name = 'DISCOVER'

print("")
print("Lab #: "+str(labnumber))
print("Schema name: " + str(schema_name))
print("Table name: " + str(table_name))

%sql -e SET CURRENT SCHEMA {schema_name}
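
Before a statement is sent to the database, the %sql magic replaces {name} with the value of the matching Python variable. The sketch below approximates that step with str.format_map; it is not the db2.ipynb code, and the variable values are hypothetical. Because the substitution is plain text, string literals still need their own quotes in the SQL, as in the '{engineer}%' pattern used later in this lab.

```python
def expand_sql(template, variables):
    """Paste the string form of each {name} variable into the SQL text."""
    return template.format_map(variables)


# Hypothetical values for lab user 1
lab_vars = {"schema_name": "DATAENGINEER1A", "engineer": "DATAENGINEER1"}

stmt1 = expand_sql("SET CURRENT SCHEMA {schema_name}", lab_vars)
# Plain-text substitution: the quotes around the LIKE pattern come from the template
stmt2 = expand_sql(
    "SELECT TABNAME FROM SYSCAT.TABLES WHERE TABSCHEMA LIKE '{engineer}%'", lab_vars
)
print(stmt1)
print(stmt2)
```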

Run the next cell to create a table with a single INTEGER column containing the values 1 to 10. The **-q** flag in the %sql command suppresses error and warning messages, such as the error DROP TABLE raises the first time you run the cell, when the table does not exist yet.

In [None]:
sqlin = f'''
DROP TABLE {table_name}; 
CREATE TABLE {table_name} (A INT); 
INSERT INTO {table_name} VALUES 1,2,3,4,5,6,7,8,9,10; 
SELECT * FROM {table_name}; 
'''

%sql -q {sqlin}

Run the next two cells to create the same table in a schema ending in **B**. It is populated with values from 11 to 20.

In [None]:
schema_name = engineer+'B'
table_name = 'DISCOVER'

print("")
print("Lab #: "+str(labnumber))
print("Schema name: " + str(schema_name))
print("Table name: " + str(table_name))

%sql -e SET CURRENT SCHEMA {schema_name}

In [None]:
sqlin = f'''
DROP TABLE {table_name}; 
CREATE TABLE {table_name} (A INT); 
INSERT INTO {table_name} VALUES 11,12,13,14,15,16,17,18,19,20; 
SELECT * FROM {table_name}; 
'''
%sql -q {sqlin}

Run the next cell to see the tables you just created.

In [None]:
%sql SELECT TABSCHEMA, TABNAME FROM SYSCAT.TABLES WHERE TABSCHEMA LIKE '{engineer}%'

Run the next cell to see all the tables in the database that are like **DISCOVER**. You may see tables created by other people running the lab. 

In [None]:
%sql SELECT TABSCHEMA, TABNAME FROM SYSCAT.TABLES WHERE TABNAME LIKE 'DISCOVER%'
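
In SQL LIKE patterns, % matches any run of characters (including none) and _ matches exactly one character. The small translator below is a sketch that ignores the ESCAPE clause and compares case-sensitively. It makes one consequence of the earlier '{engineer}%' query visible: a prefix pattern for lab user 1 also matches lab user 10's schemas. The schema names in the example are hypothetical.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern to a regex: % -> any run of characters, _ -> one character."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def sql_like(value, pattern):
    """True if the whole value matches the LIKE pattern (case-sensitive, no ESCAPE clause)."""
    return like_to_regex(pattern).fullmatch(value) is not None


schemas = ["DATAENGINEER1A", "DATAENGINEER1B", "DATAENGINEER10A", "DATAENGINEER2A"]
matches = [s for s in schemas if sql_like(s, "DATAENGINEER1%")]
print(matches)  # includes DATAENGINEER10A as well as your own two schemas
```

If that matters, a condition such as `TABSCHEMA IN ('{engineer}A', '{engineer}B')` restricts the result to your own two schemas.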

### Virtualizing your new Tables
Now that you have created two new tables you can virtualize that data and make it look like a single table in your database.
1. Return to the IBM Cloud Pak for Data Console
2. Click **Virtualize** in the Data Virtualization menu if you are not still in the Virtualize page

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.23.31 PM.png">

3. Enter your current userid, for example ENGINEER1, in the search bar. (The search automatically looks for partial as well as full matches.) Because the search results are cached, your new tables won't appear yet, so you need to refresh the information in the cache.
4. Click **Files** and then click **Tables**. This refreshes the information in the cache so that you can find your new table. 
5. Re-enter current userid, for example ENGINEER1, in the search bar. Now you can see that your new tables have automatically been discovered by IBM Cloud Pak for Data.

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.28.59 PM.png">

6. Select the two tables you just created by clicking the **check box** beside each table. Make sure you only select the tables in your own schemas. (Your schema names include your lab participant number.)

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.30.47 PM.png">

7. Click **Add to Cart**. Notice that the number of items in your cart is now **2**.

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.29.37 PM.png">

8. Click **View Cart**

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.29.43 PM.png">

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.30.59 PM.png">
    
9. Change the name of your two tables from DISCOVER to **DISCOVERA** and **DISCOVERB**. These are the new names that you will be able to use to find your tables in the Data Virtualization database. Don't change the Schema name. It is unique to your current userid. 

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.30.60 PM.png">

10. Click **Virtualize** in the navigation history at the top of the page. We are going to add one more thing to your cart.

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.31.25 PM.png">

11. Click the gear icon at the upper-right of the page. 

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.31.44 PM.png">
    
12. Check the box beside **Group tables with identical names**. Notice how all the tables called **DISCOVER** have been grouped together into a single entry.
    
    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.31.55 PM.png">    

13. Select the row where all your DISCOVER tables have been grouped together
14. Click **Add to cart**. 
15. Click **View cart**

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.33.09 PM.png">
    
    You should now see three items in your cart.
16. Change the name of the new combined table to **DISCOVERFOLD**

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.33.35 PM.png">

17. Hover over the ellipsis icon at the right side of the list for the **DISCOVER** table

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.33.46 PM.png">

18. Select **Edit grouped tables**

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.34.02 PM.png">

19. Deselect all the tables except for those in the two schemas you created. You should now have two tables selected.
20. Click **Apply**
21. Select **My Virtualized Data**. 
22. Click **Virtualize**. You see that three new virtual tables have been created. 

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.34.39 PM.png">
    
    The Virtual tables created dialog box opens.

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.35.08 PM.png">
     
23. Click **View my virtualized data**. You return to the My virtualized data page.

### Working with your new tables
1. Enter **DISCOVER** in the search field.

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.35.31 PM.png">

    You should see the three virtual tables you just created. Notice that you do not see tables that other users have created. By default, Data Engineers only see tables they have virtualized themselves or virtual tables that other users have given them access to.
2. Click the ellipsis (...) beside your **DISCOVERFOLD** table and select **Preview** to confirm that it contains 20 rows.
3. Click **SQL Editor** from the Data Virtualization menu

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.23.48 PM.png">

4. Click **Create New** to create a new blank SQL Script

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.54.39 PM.png">

5. Enter **SELECT * FROM DISCOVERFOLD ORDER BY A;** into the SQL Editor

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.38.06 PM.png">

6. Click **Run All** at the bottom left of the SQL Editor window.

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.36.45 PM.png">

7. Review the 20 rows returned in the result. Click **Show more** to see all the rows. The rows from both tables are combined into this new table.

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.38.02 PM.png">

Notice that you didn't have to specify the schema for your new virtual tables. The SQL Editor automatically uses the schema associated with your userid that was used when you created your new tables. 
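
Conceptually, a folded virtual table such as DISCOVERFOLD returns the rows of all its member tables as one result, much like a UNION ALL view. The sketch below illustrates that idea with Python's built-in sqlite3 module: two attached in-memory databases stand in for your A and B schemas, and a temporary view plays the role of the folded table. This is only an analogy; Data Virtualization does not use SQLite, and the schema names here are made up.

```python
import sqlite3

demo_db = sqlite3.connect(":memory:")
# Two separate in-memory databases stand in for the schemas ending in A and B
demo_db.execute("ATTACH DATABASE ':memory:' AS schema_a")
demo_db.execute("ATTACH DATABASE ':memory:' AS schema_b")

# Same table name and column in both schemas: 1-10 in A, 11-20 in B
for schema, start in (("schema_a", 1), ("schema_b", 11)):
    demo_db.execute(f"CREATE TABLE {schema}.DISCOVER (A INT)")
    demo_db.executemany(
        f"INSERT INTO {schema}.DISCOVER VALUES (?)",
        [(n,) for n in range(start, start + 10)],
    )

# A TEMP view may reference tables in attached databases; it acts as the "fold"
demo_db.execute("""
    CREATE TEMP VIEW DISCOVERFOLD AS
    SELECT A FROM schema_a.DISCOVER
    UNION ALL
    SELECT A FROM schema_b.DISCOVER
""")

rows = [a for (a,) in demo_db.execute("SELECT A FROM DISCOVERFOLD ORDER BY A")]
print(len(rows), rows[0], rows[-1])
```

UNION ALL rather than UNION is the closer analogy, because folding keeps every row from every member table instead of removing duplicates.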

Now you can:
* Create a connection to a remote data source
* Make a new or existing table in that remote data source look and act like a local table
* Fold data from tables in the same data source, or across different data sources, together into a single virtual table

## Gaining Insight from Virtualized Data

Now that you understand the basics of Data Virtualization you can explore how easy it is to gain insight across multiple data sources without moving data. 

In the next set of steps you connect to virtualized data from this notebook using your LABDATAENGINEER userid. You can use the same techniques to connect to virtualized data from applications and analytic tools from outside of IBM Cloud Pak for Data. 

   <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/media/ConnectingTotheAnalyticsDatabase.png">

Connecting to all your virtualized data is just like connecting to a single database. All the complexity of dozens of tables across multiple databases, on premises and across different cloud providers, is now as simple as connecting to a single database and querying a table.

We are going to connect to the IBM Cloud Pak for Data Virtualization database in exactly the same way we connected to a Db2 database earlier in this lab. However we need to change the detailed connection information in the next notebook cell.

1. Click **Service Settings** in the Data Virtualization menu
2. Look for the Access Information section of the page. The image below is an example. Your userid will likely be different.
    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.38.43 PM.png">

3. Copy the **User ID** by highlighting it with your mouse, right click and select **Copy**
4. Paste the **User ID** into the **next cell** in this notebook, between the quotation marks after **user=**
5. Return to the **Service Settings** page. Click **Show** to see the password. Highlight the password and copy using the right-click menu
6. Paste the **password** into the cell below between the quotation marks using the right click paste.
7. Run the cell below to connect to the Data Virtualization database. 

#### Connecting to Data Virtualization SQL Engine

In [None]:
# Connect to the IBM Cloud Pak for Data Virtualization Database from inside CPD
database = 'bigsql'
user = 'userxxxx'
password = 'xxxxxxxxxxxxxxxx'
host = 'dv-server.icp4d-test.svc.cluster.local'
port = '32051'

%sql CONNECT TO {database} USER {user} USING {password} HOST {host} PORT {port}

### Stock Symbol Table
#### Get information about the stocks that are in the database
**System Z - VSAM**

This table comes from a VSAM file on zOS. IBM Cloud Pak for Data Virtualization works together with Data Virtualization Manager for zOS to make this look like a local database table. For the following examples you can substitute any of the symbols below.

In [None]:
%sql -a select * from STOCK.STOCK_SYMBOLS

### Stock History Table
#### Get Price of a Stock over the Year
Set the stock symbol in the line below and run the cell. This information is folded together from two identical tables, one in a Db2 database and one in an Informix database. Run the next two cells. Then pick a new stock symbol from the list above, enter it into the cell below, and run both cells again.

**CP4D - Db2, Skytap -  Informix**

In [None]:
stock = 'AXP'
print('variable stock set to = ' + str(stock))

In [None]:
%%sql -pl
SELECT WEEK(TX_DATE) AS WEEK, OPEN FROM STOCK.STOCK_HISTORY
WHERE SYMBOL = :stock AND TX_DATE != '2017-12-01'
ORDER BY WEEK(TX_DATE) ASC
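
The :stock marker tells the %sql magic to use the value of the Python variable stock. Standard Python database drivers support the same named-parameter style. The sketch below uses the built-in sqlite3 module with a made-up STOCK_HISTORY table to show a value being bound to :stock rather than pasted into the SQL text. It illustrates the placeholder style only and is not how the Db2 magic is implemented.

```python
import sqlite3

demo_db = sqlite3.connect(":memory:")
demo_db.execute("CREATE TABLE STOCK_HISTORY (SYMBOL TEXT, TX_DATE TEXT, OPEN REAL)")
demo_db.executemany(
    "INSERT INTO STOCK_HISTORY VALUES (?, ?, ?)",
    [("AXP", "2017-01-03", 100.0),   # made-up prices, not lab data
     ("AXP", "2017-01-04", 101.5),
     ("AAPL", "2017-01-03", 200.0)],
)

symbol = "AXP"
# :stock is a named placeholder; the value is passed separately from the SQL text
cur = demo_db.execute(
    "SELECT TX_DATE, OPEN FROM STOCK_HISTORY WHERE SYMBOL = :stock ORDER BY TX_DATE",
    {"stock": symbol},
)
result = cur.fetchall()
print(result)
```

Binding keeps quoting out of your hands: the driver treats the value as data, so a symbol containing a quote character cannot change the meaning of the statement.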

#### Trend of Three Stocks
This chart shows three stock prices over the course of a year. It uses the same folded stock history information.

**CP4D - Db2, Skytap -  Informix**

In [None]:
stocks = ['INTC','MSFT','AAPL']

In [None]:
%%sql -pl
SELECT SYMBOL, WEEK(TX_DATE), OPEN FROM STOCK.STOCK_HISTORY
WHERE SYMBOL IN (:stocks) AND TX_DATE != '2017-12-01'
ORDER BY WEEK(TX_DATE) ASC
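
Passing a Python list to IN (:stocks) means the list has to become several values inside the parentheses; in the cell above the %sql magic takes care of that. With a plain DB-API driver you typically generate one placeholder per element. The helper below is a hypothetical sketch of that pattern using sqlite3 and made-up rows.

```python
import sqlite3


def in_placeholders(prefix, values):
    """Return ':p0, :p1, ...' for an IN list plus the dict that binds each placeholder."""
    keys = [f"{prefix}{i}" for i in range(len(values))]
    return ", ".join(":" + k for k in keys), dict(zip(keys, values))


demo_db = sqlite3.connect(":memory:")
demo_db.execute("CREATE TABLE STOCK_HISTORY (SYMBOL TEXT, TX_DATE TEXT, OPEN REAL)")
demo_db.executemany(
    "INSERT INTO STOCK_HISTORY VALUES (?, ?, ?)",
    [(s, "2017-01-03", 1.0) for s in ("INTC", "MSFT", "AAPL", "IBM")],  # placeholder rows
)

symbols = ["INTC", "MSFT", "AAPL"]
placeholders, params = in_placeholders("s", symbols)
# Only the generated placeholder names are formatted into the SQL, never the values
sql = f"SELECT SYMBOL FROM STOCK_HISTORY WHERE SYMBOL IN ({placeholders}) ORDER BY SYMBOL"
found = [row[0] for row in demo_db.execute(sql, params)]
print(placeholders)
print(found)
```

Standard SQL does not allow an empty IN list, so check that the list is non-empty before building a query this way.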

#### 30 Day Moving Average of a Stock
Enter the Stock Symbol below to see the 30 day moving average of a single stock.

**CP4D - Db2, Skytap -  Informix**

In [None]:
stock = 'AAPL'

In [None]:
import matplotlib.pyplot as plt

# Moving average over a 31-row window: 15 rows before, the current row, and 15 rows after
sqlin = """
SELECT WEEK(TX_DATE) AS WEEK, OPEN,
       AVG(OPEN) OVER (
         ORDER BY TX_DATE
         ROWS BETWEEN 15 PRECEDING AND 15 FOLLOWING) AS MOVING_AVG
  FROM STOCK.STOCK_HISTORY
 WHERE SYMBOL = :stock
 ORDER BY TX_DATE
"""
df = %sql {sqlin}

week = df['WEEK']
opening = df['OPEN']
moving_avg = df['MOVING_AVG']

# The x values are week numbers, so label the axis accordingly
plt.xlabel("Week", fontsize=12)
plt.ylabel("Opening Price", fontsize=12)
plt.suptitle("Opening Price and Moving Average of " + stock, fontsize=20)
plt.plot(week, opening, 'r', label="Opening price")
plt.plot(week, moving_avg, 'b', label="Moving average")
plt.legend()
plt.show()
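
The OVER clause averages OPEN over a frame of at most 31 rows: 15 rows before the current one, the current row, and 15 rows after. The frame counts rows (trading days in this table), not calendar days, and it shrinks near the start and end of the data. The pure-Python sketch below reproduces that frame logic on a plain list so you can see how the edge values are computed; it is an illustration, not how Db2 evaluates the query.

```python
def centered_moving_average(values, half_window):
    """Mean over rows [i - half_window, i + half_window], clipped at both ends,
    like ROWS BETWEEN half_window PRECEDING AND half_window FOLLOWING."""
    result = []
    for i in range(len(values)):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        window = values[lo:hi]
        result.append(sum(window) / len(window))
    return result


print(centered_moving_average([1, 2, 3, 4, 5], 1))  # [1.5, 2.0, 3.0, 4.0, 4.5]
```

With half_window=15 this matches the window used in the query; the first and last values average fewer rows, which is why the blue line can look flatter at the edges.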

#### Trading volume of INTC versus MSFT and AAPL in first week of November
**CP4D - Db2, Skytap -  Informix**

In [None]:
stocks = ['INTC','MSFT','AAPL']

In [None]:
%%sql -pb
SELECT SYMBOL, DAY(TX_DATE), VOLUME/1000000 FROM STOCK.STOCK_HISTORY
WHERE SYMBOL IN (:stocks) AND WEEK(TX_DATE) =  45
ORDER BY DAY(TX_DATE) ASC
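
The heading talks about the first week of November while the query filters on WEEK(TX_DATE) = 45. According to the Db2 documentation, WEEK returns a value from 1 to 54, weeks start on Sunday, and the week containing January 1 is week 1. The sketch below reimplements that rule in Python under that assumption. The stock data appears to cover 2017 (the earlier queries exclude '2017-12-01'); in 2017, week 45 runs from Sunday 5 November to Saturday 11 November.

```python
from datetime import date


def db2_week(d):
    """Week of the year, 1-54: weeks start on Sunday and January 1 is always in week 1."""
    jan1 = date(d.year, 1, 1)
    jan1_offset = (jan1.weekday() + 1) % 7   # Sunday -> 0, Monday -> 1, ..., Saturday -> 6
    day_of_year = d.timetuple().tm_yday      # 1-based day number
    return (day_of_year - 1 + jan1_offset) // 7 + 1


for d in (date(2017, 11, 4), date(2017, 11, 5), date(2017, 11, 11), date(2017, 11, 12)):
    print(d, db2_week(d))
```

November 1 to 4, 2017 still fall in week 44, so week 45 is the first full week of the month.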

#### Show Stocks that Represent at least 3% of the Total Purchases during Week 45
**CP4D - Db2, Skytap -  Informix**

In [None]:
%%sql -pie
WITH WEEK45(SYMBOL, PURCHASES) AS (
  SELECT SYMBOL, SUM(VOLUME * CLOSE) FROM STOCK.STOCK_HISTORY
    WHERE WEEK(TX_DATE) =  45 AND SYMBOL <> 'DJIA'
  GROUP BY SYMBOL
),
ALL45(TOTAL) AS (
  SELECT SUM(PURCHASES) * .03 FROM WEEK45
)
SELECT SYMBOL, PURCHASES FROM WEEK45, ALL45
WHERE PURCHASES > TOTAL
ORDER BY SYMBOL, PURCHASES
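
The query first totals VOLUME * CLOSE per symbol for week 45 (excluding the DJIA index), then keeps the symbols whose total exceeds 3% of the sum across all symbols. A plain-Python version of the same two-step calculation, on made-up rows of (symbol, volume, close), looks like this:

```python
from collections import defaultdict


def large_purchases(rows, share=0.03, exclude=("DJIA",)):
    """Return (symbol, purchases) pairs whose purchases exceed share of the overall total."""
    totals = defaultdict(float)
    for symbol, volume, close in rows:
        if symbol not in exclude:
            totals[symbol] += volume * close      # WEEK45 CTE: SUM(VOLUME * CLOSE)
    threshold = sum(totals.values()) * share      # ALL45 CTE: SUM(PURCHASES) * .03
    return sorted((s, p) for s, p in totals.items() if p > threshold)


sample = [                       # made-up rows, not lab data
    ("AAA", 1000, 50.0),
    ("AAA", 800, 51.0),
    ("BBB", 10, 20.0),
    ("CCC", 500, 10.0),
    ("DJIA", 1_000_000, 23000.0),
]
print(large_purchases(sample))
```

Excluding DJIA matters: the index row would otherwise dominate the total and push every real stock below the threshold.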

### Stock Transaction Table
#### Show Transactions by Customer
These next two examples use data folded together from three different data sources, representing three different trading organizations, to create a combined view of a single customer's stock trades.

**AWS - Db2, Azure - Postgres, Azure - Db2**

In [None]:
%%sql -a
SELECT * FROM STOCK.STOCK_TRANSACTIONS
 WHERE CUSTID = '107196'
 FETCH FIRST 10 ROWS ONLY

#### Bought/Sold Amounts of Top 5 stocks 
**AWS - Db2, Azure - Postgres, Azure - Db2**

In [None]:
%%sql -a
WITH BOUGHT(SYMBOL, AMOUNT) AS
  (
  SELECT SYMBOL, SUM(QUANTITY) FROM TRADING.STOCK_TRANSACTIONS
  WHERE QUANTITY > 0
  GROUP BY SYMBOL
  ),
SOLD(SYMBOL, AMOUNT) AS
  (
  SELECT SYMBOL, -SUM(QUANTITY) FROM TRADING.STOCK_TRANSACTIONS
  WHERE QUANTITY < 0
  GROUP BY SYMBOL
  )
SELECT B.SYMBOL, B.AMOUNT AS BOUGHT, S.AMOUNT AS SOLD
FROM BOUGHT B, SOLD S
WHERE B.SYMBOL = S.SYMBOL
ORDER BY B.AMOUNT DESC
FETCH FIRST 5 ROWS ONLY
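
The query relies on the sign of QUANTITY: positive rows are purchases, negative rows are sales, and the sales total is negated to make it positive. Because BOUGHT and SOLD are joined on SYMBOL, a symbol that was only bought or only sold drops out of the result. The sketch below reproduces that logic in plain Python on made-up (symbol, quantity) transactions.

```python
from collections import defaultdict


def bought_and_sold(transactions, top=5):
    """Return (symbol, bought, sold) for the top symbols by amount bought."""
    bought, sold = defaultdict(int), defaultdict(int)
    for symbol, quantity in transactions:
        if quantity > 0:
            bought[symbol] += quantity
        elif quantity < 0:
            sold[symbol] += -quantity          # like -SUM(QUANTITY) in the SOLD CTE
    # Inner join on SYMBOL: only symbols with both purchases and sales survive
    rows = [(s, bought[s], sold[s]) for s in bought if s in sold]
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows[:top]


trades = [("IBM", 100), ("IBM", -40), ("AAPL", 300), ("AAPL", -100),
          ("AAPL", 50), ("MSFT", 20)]        # made-up transactions
print(bought_and_sold(trades))
```

MSFT is missing from the output because it has no sales, which is the same effect the SQL inner join has.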

### Customer Accounts
#### Show Top 5 Customer Balances
These next two examples use data folded from systems running on AWS and Azure.

**AWS - Db2, Azure - Postgres, Azure - Db2**

In [None]:
%%sql -a
SELECT CUSTID, BALANCE FROM STOCK.ACCOUNTS
ORDER BY BALANCE DESC
FETCH FIRST 5 ROWS ONLY

#### Show Bottom 5 Customer Balances
**AWS - Db2, Azure - Postgres, Azure - Db2**

In [None]:
%%sql -a
SELECT CUSTID, BALANCE FROM STOCK.ACCOUNTS
ORDER BY BALANCE ASC
FETCH FIRST 5 ROWS ONLY

### Selecting Customer Information from MongoDB
The MongoDB database (running on premises) has customer information in a document format. In order to materialize the document data as relational tables, a total of four virtual tables are generated. The following query shows the tables that are generated for the Customer document collection.

In [None]:
%sql -a SELECT TABSCHEMA, TABNAME, COLCOUNT FROM SYSCAT.TABLES WHERE TABSCHEMA = 'STOCK' AND TABNAME LIKE 'CUSTOMER%'

The tables are all connected through the CUSTOMER_ID column, which refers to the generated _ID of the main CUSTOMER collection. In order to reassemble these tables into a document, we must join them using this unique identifier. An example of the contents of the CUSTOMER_CONTACT table is shown below.

In [None]:
%sql -a SELECT * FROM STOCK.CUSTOMER_CONTACT FETCH FIRST 5 ROWS ONLY

A full document record is shown in the following SQL statement which joins all of the tables together.

In [None]:
%%sql -a
SELECT C.CUSTOMERID AS CUSTID, 
       CI.FIRSTNAME, CI.LASTNAME, CI.BIRTHDATE,
       CC.CITY, CC.ZIPCODE, CC.EMAIL, CC.PHONE, CC.STREET, CC.STATE,
       CP.CARD_TYPE, CP.CARD_NO
FROM STOCK.CUSTOMER C, STOCK.CUSTOMER_CONTACT CC, 
     STOCK.CUSTOMER_IDENTITY CI, STOCK.CUSTOMER_PAYMENT CP
WHERE  CC.CUSTOMER_ID = C."_ID" AND
       CI.CUSTOMER_ID = C."_ID" AND
       CP.CUSTOMER_ID = C."_ID"
FETCH FIRST 3 ROWS ONLY
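
To picture how one document becomes several relational tables, here is a simplified sketch that splits a made-up customer document into a parent row and child rows that carry the parent's _id in a CUSTOMER_ID column, then joins them back. The document layout and field names are hypothetical, not the lab's actual MongoDB schema.

```python
def flatten_customer(doc):
    """Split one nested customer document into a parent row and child rows keyed by _id."""
    cid = doc["_id"]
    customer = {"_ID": cid, "CUSTOMERID": doc["customerId"]}
    identity = {"CUSTOMER_ID": cid, **doc["identity"]}
    contact = {"CUSTOMER_ID": cid, **doc["contact"]}
    # An array in the document becomes one child row per element
    payments = [{"CUSTOMER_ID": cid, **card} for card in doc.get("payment", [])]
    return customer, identity, contact, payments


doc = {                                   # hypothetical document
    "_id": "cust-0001",
    "customerId": 100001,
    "identity": {"FIRSTNAME": "Ada", "LASTNAME": "Example"},
    "contact": {"CITY": "Columbus", "STATE": "OH"},
    "payment": [{"CARD_TYPE": "VISA", "CARD_NO": "XXXX-0001"},
                {"CARD_TYPE": "AMEX", "CARD_NO": "XXXX-0002"}],
}

customer, identity, contact, payments = flatten_customer(doc)

# Reassembling is a join on CUSTOMER_ID = _ID, one output row per payment entry
rows = [
    {**customer, **identity, **contact, **card}
    for card in payments
    if identity["CUSTOMER_ID"] == contact["CUSTOMER_ID"] == card["CUSTOMER_ID"] == customer["_ID"]
]
print(len(rows), rows[0]["LASTNAME"], [r["CARD_TYPE"] for r in rows])
```

As the sketch shows, a customer with several payment entries yields several joined rows, which is also what the SQL join above returns for such a customer.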

### Querying All Virtualized Data
In this final example we use data from each data source to answer a complex business question. "What are the names of the customers in Ohio, who bought the most during the highest trading day of the year (based on the Dow Jones Industrial Index)?" 

**AWS Db2, Azure Postgres, Azure Db2, Skytap MongoDB, CP4D Db2Wh, Skytap Informix**

In [None]:
%%sql -a
WITH MAX_VOLUME(AMOUNT) AS (
  SELECT MAX(VOLUME) FROM STOCK.STOCK_HISTORY
    WHERE SYMBOL = 'DJIA'
),
HIGHDATE(TX_DATE) AS (
  SELECT TX_DATE FROM STOCK.STOCK_HISTORY, MAX_VOLUME M
    WHERE SYMBOL = 'DJIA' AND VOLUME = M.AMOUNT
),
CUSTOMERS_IN_OHIO(CUSTID, LASTNAME) AS (
  SELECT C.CUSTOMERID, CI.LASTNAME
    FROM  STOCK.CUSTOMER C, 
          STOCK.CUSTOMER_CONTACT CC,
          STOCK.CUSTOMER_IDENTITY CI
    WHERE CC.CUSTOMER_ID = C."_ID" AND
          CI.CUSTOMER_ID = C."_ID" AND
          CC.STATE = 'OH'
),
TOTAL_BUY(CUSTID,TOTAL) AS (
  SELECT C.CUSTID, SUM(SH.QUANTITY * SH.PRICE) 
    FROM CUSTOMERS_IN_OHIO C, TRADING.STOCK_TRANSACTIONS SH, HIGHDATE HD
  WHERE SH.CUSTID = C.CUSTID AND
        SH.TX_DATE = HD.TX_DATE AND 
        QUANTITY > 0 
  GROUP BY C.CUSTID
)
SELECT C.LASTNAME, T.TOTAL 
  FROM CUSTOMERS_IN_OHIO C, TOTAL_BUY T
WHERE C.CUSTID = T.CUSTID
ORDER BY TOTAL DESC
FETCH FIRST 10 ROWS ONLY

### Seeing where your Virtualized Data is coming from
You may eventually work with a complex Data Virtualization schema with dozens or hundreds of data sources. As an administrator or a Data Scientist you may need to understand where data is coming from. 

Fortunately, the Data Virtualization engine is based on Db2. It includes the same catalog as a Db2 database, plus some additional features. If you want to work backwards and understand where each of your virtualized tables comes from, the information is in the **SYSCAT.TABOPTIONS** catalog table. 

Rows in the **SYSCAT.TABOPTIONS** table where the **OPTION** column equals **SOURCELIST** list the data sources for each virtualized table in the **SETTING** column. Tables that include more than one data source have values separated by a comma in the **SETTING** column.

Run the next cell to see an example of how this works.

In [None]:
%%sql -a
SELECT *
  FROM SYSCAT.TABOPTIONS
    WHERE OPTION = 'SOURCELIST'

Notice that there are some tables above with familiar names, like STOCK.CUSTOMER_CONTACT or STOCK.SYMBOLS. These tables reference a single remote data source. The tables in the QPLEXSYS schema are the tables created automatically by Data Virtualization to support VIEWs or FOLDED virtual tables. These tables include a long signature ID following an underscore character (for example ACCOUNTS_3648E6D008FCA827286C0F30ED83B266). 
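If you pull the **SETTING** values from the query above into Python, you can split them into individual sources. This lab describes the separator both as a comma and as a semicolon, so the hypothetical helpers below accept either. The `CID:remote object` shape of each entry in the sample is an assumption for illustration; check the actual values returned by the query before relying on `connection_ids`.

```python
import re

def split_sourcelist(setting):
    """Split a SOURCELIST SETTING value into entries, accepting ',' or ';'."""
    return [part.strip() for part in re.split(r"[;,]", setting) if part.strip()]

def connection_ids(setting):
    """Return the text before the first ':' of each entry.

    ASSUMPTION: entries look like '<CID>:<remote object>'; verify this
    against your own SYSCAT.TABOPTIONS output.
    """
    return [entry.split(":", 1)[0] for entry in split_sourcelist(setting)]

# Hypothetical SETTING value for a table folded from two sources
sample = 'DB210113:"TRADING"."STOCK_TRANSACTIONS";DB210114:"TRADING"."STOCK_TRANSACTIONS"'
print(split_sourcelist(sample))
print(connection_ids(sample))
```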

To see all the views that are in the STOCK schema, run the SQL statement in the next cell. 

In [None]:
%%sql -a 
SELECT VIEWSCHEMA, VIEWNAME, TEXT from SYSCAT.VIEWS WHERE VIEWSCHEMA = 'STOCK' 

With a little simple Python you can print the VIEW definitions, where you will see the autogenerated virtual tables each view references.

In [None]:
views = %sql SELECT VIEWSCHEMA, VIEWNAME, TEXT from SYSCAT.VIEWS WHERE VIEWSCHEMA = 'STOCK' 
for index, row in views.iterrows():
    view_name = row['VIEWNAME']
    view_ddl = str(row['TEXT'])
    print(view_name)
    print(view_ddl)

With a bit more Python you can match the VIEW name with the autogenerated virtual table. 

In [None]:
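views = %sql SELECT VIEWSCHEMA, VIEWNAME, TEXT from SYSCAT.VIEWS WHERE VIEWSCHEMA = 'STOCK' 
prefix = '"QPLEXSYS"."'
for index, row in views.iterrows():
    view_name = row['VIEWNAME']
    view_ddl = str(row['TEXT'])
    start = view_ddl.find(prefix)
    if start == -1:
        continue  # this view does not reference a QPLEXSYS table
    # Skip past the prefix and drop the trailing characters after the table name
    table_name = view_ddl[start + len(prefix): -2]
    print(table_name + " is used by: STOCK." + view_name)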
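The slicing approach above assumes each view references exactly one QPLEXSYS table and that its name sits at the end of the view text. A slightly more robust sketch uses a regular expression to find every `"QPLEXSYS"."<name>"` reference anywhere in the definition. The `qplexsys_tables` helper and the sample view text are hypothetical; the table name is the example quoted earlier in this section.

```python
import re

# Matches "QPLEXSYS"."<table>" and captures the table name
QPLEXSYS_REF = re.compile(r'"QPLEXSYS"\."([^"]+)"')

def qplexsys_tables(view_ddl):
    """Return every QPLEXSYS table name referenced in a view definition, in order."""
    return QPLEXSYS_REF.findall(view_ddl)

ddl = ('CREATE VIEW "STOCK"."ACCOUNTS" AS SELECT * FROM '
       '"QPLEXSYS"."ACCOUNTS_3648E6D008FCA827286C0F30ED83B266"')
for table in qplexsys_tables(ddl):
    print(table + " is used by: STOCK.ACCOUNTS")
```

In the notebook loop you could call `qplexsys_tables(str(row['TEXT']))` instead of slicing the text.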

In [None]:
%%sql -a
SELECT * FROM SYSCAT.TABOPTIONS WHERE TABSCHEMA = 'STOCK';

The table includes more information than you need to answer the question "where is my data coming from?". The query below shows only the rows that identify the source of the data (**OPTION = 'SOURCELIST'**). Notice that tables that have been folded together from several tables include the information for each data source, separated by a semicolon. 

In [None]:
%%sql -a
SELECT *
  FROM SYSCAT.TABOPTIONS
  WHERE TABSCHEMA = 'STOCK' AND OPTION = 'SOURCELIST';

You can narrow this last example further. For example, to find any virtualized data coming from a Postgres database, search for **SETTING LIKE '%POST%'**.

What is missing is more detail about each connection: the table above only identifies each connection by its ID (CID). You can find the details in another table: **QPLEXSYS.LISTRDBC**. In the last cell, you can see that CID DB210113 is included in the STOCK_TRANSACTIONS virtual table. You can find the details of that Db2 source by running the next cell. 

In [None]:
%%sql
SELECT CID, USR, SRCTYPE, SRCHOSTNAME, SRCPORT, DBNAME FROM QPLEXSYS.LISTRDBC;
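To tie the two catalogs together, you can look up each CID from a SOURCELIST entry in the LISTRDBC rows. This is a hypothetical sketch over plain dictionaries: the column names come from the query above, but the `describe_sources` helper and the sample values (host name, port, database name) are invented.

```python
def describe_sources(cids, connections):
    """Map each CID to a readable 'SRCTYPE host:port/DBNAME' string.

    `connections` is a list of dicts with the LISTRDBC columns used above.
    Unknown CIDs are reported rather than silently dropped.
    """
    by_cid = {row["CID"]: row for row in connections}
    described = {}
    for cid in cids:
        row = by_cid.get(cid)
        if row is None:
            described[cid] = "unknown connection"
        else:
            # str.format ignores extra keys such as CID and USR
            described[cid] = "{SRCTYPE} {SRCHOSTNAME}:{SRCPORT}/{DBNAME}".format(**row)
    return described

# Hypothetical LISTRDBC row
connections = [{"CID": "DB210113", "USR": "admin", "SRCTYPE": "DB2",
                "SRCHOSTNAME": "db2.example.com", "SRCPORT": 50000,
                "DBNAME": "STOCKS"}]
print(describe_sources(["DB210113", "POSTG10000"], connections))
```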

## Advanced Data Virtualization 
Now that you have seen how powerful and easy it is to gain insight from your existing virtualized data, you can learn more about how to do advanced data virtualization. You will learn how to join different remote tables together to create a new virtual table and how to capture complex SQL into VIEWs.


### Joining Tables Together
The virtualized tables below come from different data sources on different systems. We can combine them into a single virtual table. 

1. Select **My virtualized data** from the Data Virtualization menu

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.23.40 PM.png">
  
2. Enter **Stock** in the find field

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.41.49 PM.png">
  
3. Select table **STOCK_TRANSACTIONS** in the **TRADING** schema
4. Select table **STOCK_SYMBOLS** in the **STOCK** schema

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.45.24 PM.png">
  
5. Click **Join**
6. In table STOCK_SYMBOLS: deselect **SYMBOL**
7. In table STOCK_TRANSACTIONS: deselect **TX_NO** 
8. Click **STOCK_TRANSACTIONS.SYMBOL** and drag to **STOCK_SYMBOLS.SYMBOL**

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.46.50 PM.png">
 
9. Click **Next**

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.47.46 PM.png">
  
10. Check that you can now see both the stock symbol and the full company name. You can also change column names on this page.
  
  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.48.46 PM.png">

11. Click **Next**

12. Enter **TRANSACTIONS_FULLNAME** into the **Enter view name** field.
13. Don't change the default schema. It corresponds to your LABDATAENGINEER user id. 
14. Select **My virtualized data** 

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.52.03 PM.png">
  
15. Click **CREATE VIEW**. You see the successful Join View window.

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.52.41 PM.png"> 
  
16. Click **View my virtualized data**
17. Click the ellipsis menu beside **TRANSACTIONS_FULLNAME**
18. Click **Preview**. You can confirm that your new join is working.

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.53.11 PM.png"> 
    
19. Click **Meta data**

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.53.20 PM.png"> 

20. Click **Creation SQL**. You can review the statement used to create the view that joins the two tables together. 

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.53.43 PM.png"> 
    
21. Click the **X** at the upper-right of the View screen.

You can now join virtualized tables together to combine them into new virtualized tables. Now that you know how to perform simple table joins, you can learn how to combine multiple data sources and virtual tables using the powerful SQL query engine that is part of IBM Cloud Pak for Data Virtualization.
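The **Creation SQL** you reviewed in step 20 is the authoritative statement. As a rough illustration of its shape, the hypothetical `join_view_sql` helper below assembles a comparable inner-join `CREATE VIEW` from column lists. The exact text Data Virtualization generates (quoting, aliases, column order) may differ. The STOCK_TRANSACTIONS columns are taken from queries earlier in this lab, while `COMPANY_NAME` and the `USERxxxx` schema are placeholders.

```python
def join_view_sql(view, left, right, key, left_cols, right_cols):
    """Build an inner-join CREATE VIEW statement (illustrative sketch only).

    left/right are 'SCHEMA.TABLE' strings; key is the shared join column.
    """
    select_list = [f"L.{c}" for c in left_cols] + [f"R.{c}" for c in right_cols]
    return (f"CREATE VIEW {view} AS\n"
            f"SELECT {', '.join(select_list)}\n"
            f"FROM {left} L\n"
            f"INNER JOIN {right} R ON L.{key} = R.{key}")

sql = join_view_sql(
    "USERxxxx.TRANSACTIONS_FULLNAME",
    "TRADING.STOCK_TRANSACTIONS", "STOCK.STOCK_SYMBOLS", "SYMBOL",
    ["SYMBOL", "CUSTID", "TX_DATE", "QUANTITY", "PRICE"],  # TX_NO was deselected
    ["COMPANY_NAME"],  # placeholder: STOCK_SYMBOLS.SYMBOL was deselected
)
print(sql)
```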

### Using Queries to Answer Complex Business Questions
The IBM Cloud Pak for Data Virtualization Administrator has set up more complex data from multiple sources for the next steps. The administrator has also given you access to this virtualized data. You may have noticed this in previous steps. 

1. Select **My virtualized data** from the Data Virtualization menu. All of these virtualized tables look and act like normal Db2 tables. 

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.23.40 PM.png">
 
2. Click **Preview** for any of the tables to see what they contain. 

The virtualized tables in the **STOCK** schema were all created either by virtualizing single tables or by combining (folding) the same table from different data sources into one virtual table. Folding is not restricted to tables from a single data source.

The virtualized tables in the **TRADING** schema are database VIEWs. They use queries to combine data from multiple data sources to answer specific business questions. 

3. Select **SQL Editor** from the Data Virtualization menu.

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.23.48 PM.png">

4. Click **Add new script**

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 3.54.26 PM.png">
  
5. Click **Open a script to edit**
6. Search for **OHIO**
7. Select and expand the **OHIO Customer** query

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 4.59.25 PM.png">

8. Click the **Open a script to edit** icon to open the script in the SQL Editor. 

   **Note** that if you cannot open the script, you may have to refresh your browser or collapse and expand the script details section before the icon is active.
   
9. Click **Run All**

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 4.59.50 PM.png">

This script is a complex SQL join query that uses data from all the virtualized data sources you explored in the first steps of this lab. While the SQL looks complex, the author of the query did not have to be aware that the data was coming from multiple sources. Everything used in this query looks like it comes from a single database, not eight different data sources across eight different systems on premises or in the Cloud. 

### Making Complex SQL Simple to Consume
You can make this complex query easy for other users to consume. Instead of sharing the query itself, you can wrap it in a view that looks and acts like a simple table. 

1. Enter **CREATE VIEW MYOHIOQUERY AS** in the SQL Editor on the first line below the comment and before the **WITH** clause

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-16 at 5.00.24 PM.png">

2. Click **Run all**
3. Click **Add a new script**
4. Click **Create new**
5. Enter **SELECT * FROM MYOHIOQUERY;**
6. Click **Run all**

Now you have a very simple virtualized table that is pulling data from eight different data sources, combining the data together to resolve a complex business problem. In the next step you will share your new virtualized data with a user.

### Sharing Virtualized Tables
1. Select **My virtualized data** from the Data Virtualization Menu.
2. Click the ellipsis (...) menu to the right of the **MYOHIOQUERY** virtualized table

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-17 at 1.44.12 PM.png">
  
3. Select **Manage Access** from the ellipsis menu

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-17 at 1.45.18 PM.png">
 
4. Click **Grant access**

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-17 at 1.45.44 PM.png">
 
5. Click **Add user**
6. Search for **LABUSER**. 

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-17 at 1.46.57 PM.png">

7. Check the box beside **LABUSER** and click **Add users**

    <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-17 at 1.47.11 PM.png">
  
8. Click **Add**

You should now see that the **LABUSER** id has view-only access to the new virtualized table. 

  <img src="https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/mediav3/Screen Shot 2020-07-17 at 1.47.17 PM.png">
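The **Grant access** dialog is the supported way to share a virtual table. Because the engine is Db2-based, the SQL counterpart of view-only access is a Db2 `GRANT SELECT` statement. The hypothetical helper below only builds that statement text; whether the console issues exactly this statement is an assumption, and the database authorization ID may differ from the display name (the connection cell later in this lab uses a `userxxxx` style ID). The schema in the usage line is a placeholder.

```python
def grant_select_sql(object_name, grantee, grantee_type="USER"):
    """Build a Db2 GRANT SELECT statement for a table or view (text only)."""
    if grantee_type not in ("USER", "GROUP", "ROLE"):
        raise ValueError("grantee_type must be USER, GROUP or ROLE")
    return f"GRANT SELECT ON TABLE {object_name} TO {grantee_type} {grantee}"

print(grant_select_sql("USERxxxx.MYOHIOQUERY", "LABUSER"))
```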

Next, switch to the LABUSER id to check that it can see the data you have just granted access to.

1. Click the user icon at the very top right of the console
2. Click **Log out**
3. Sign in using the LABUSER id 
4. Click the three bar menu at the top left of the IBM Cloud Pak for Data console
5. Select **Data Virtualization**

You should see **MYOHIOQUERY**, in the schema of your engineer userid, in the list of virtualized data.

6. Make a note of the schema of MYOHIOQUERY in your list of virtualized tables. It starts with **USER**.
7. Select the **SQL Editor** from the Data Virtualization menu
8. Click **Create new** to open a new SQL Editor window
9. Enter **SELECT * FROM USERxxxx.MYOHIOQUERY** where xxxx is the user number of your engineer user. The view created by your engineer user was created in their default schema. 
10. Click **Run all**
11. Add the following to your query: **WHERE TOTAL > 3000 ORDER BY TOTAL**
12. Click **</>** to format the query so it is easier to read
13. Click **Run all**

You have just made a very complex data set extremely easy for a data user to consume. They don't have to know how to connect to multiple data sources or how to combine the data using complex SQL. You can hide that complexity while ensuring that only the right users have access to the right data. 

In the next steps you will learn how to access virtualized data from outside of IBM Cloud Pak for Data.

### Allowing Users to Access Virtualized Data with Analytic Tools
In the next set of steps you connect to virtualized data from this notebook using your **LABUSER** userid. 

Just as you connected to IBM Cloud Pak for Data Virtualized Data using your LABDATAENGINEER id, you can connect using your LABUSER id. 

We are going to connect to the IBM Cloud Pak for Data Virtualization database in exactly the same way we connected using your LABDATAENGINEER id. However, you need to change the connection details. Each user has their own unique userid and password to connect to the service. This ensures that no matter what tool you use to connect to virtualized data, you are always in control of who can access specific virtualized data. 

1. Click the user icon at the top right of the IBM Cloud Pak for data console to confirm that you are using your **LABUSER** id
2. Click **Service Settings** in the Data Virtualization menu
3. Copy the **User ID** by highlighting it with your mouse, right-clicking, and selecting **Copy**
4. Paste the **User ID** into the cell below, between the quotation marks after **user =**
5. Show the password. Highlight the password and copy it using the right-click menu
6. Paste the **password** into the cell below, between the quotation marks after **password =**, using right-click paste.
7. Run the cell below to connect to the Data Virtualization database. 

#### Connecting a USER to Data Virtualization SQL Engine

In [None]:
# Connect to the IBM Cloud Pak for Data Virtualization Database from inside CPD
database = 'bigsql'
user = 'userxxxx'
password = 'xxxxxxxxxxxxxxxxx'
host = 'dv-server.icp4d-test.svc.cluster.local'
port = '32051'

%sql CONNECT TO {database} USER {user} USING {password} HOST {host} PORT {port}
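The `%sql CONNECT` magic hides the connection string. If you connect from another Python tool with the `ibm_db` driver, `ibm_db.connect()` takes a semicolon-separated keyword string. The hypothetical `db2_conn_str` helper below only builds that string from the same values as the cell above. `SECURITY=SSL` is an optional flag because whether this port requires SSL is an assumption to check in **Service Settings**, and a password containing `;` would need extra handling.

```python
def db2_conn_str(database, host, port, user, password, ssl=False):
    """Assemble a Db2 CLI-style connection string as accepted by ibm_db.connect()."""
    parts = {
        "DATABASE": database,
        "HOSTNAME": host,
        "PORT": str(port),
        "PROTOCOL": "TCPIP",
        "UID": user,
        "PWD": password,
    }
    if ssl:
        parts["SECURITY"] = "SSL"
    return "".join(f"{key}={value};" for key, value in parts.items())

conn_str = db2_conn_str("bigsql", "dv-server.icp4d-test.svc.cluster.local",
                        32051, "userxxxx", "xxxxxxxxxxxxxxxxx")
print(conn_str)
# With the ibm_db package installed you could then call:
#   import ibm_db
#   conn = ibm_db.connect(conn_str, "", "")
```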

Now you can try out the view that was created by the LABDATAENGINEER userid. 

Replace **xxxx** with the schema used by your ***LABDATAENGINEERx*** user in the next two cells before you run them.

In [None]:
%sql SELECT * FROM USERxxxx.MYOHIOQUERY WHERE TOTAL > 3000 ORDER BY TOTAL;

Only LABDATAENGINEER virtualized tables that have been authorized for the LABUSER are available. Try running the next cell. You should receive an error that the current user does not have the required authorization or privilege to perform the operation.

In [None]:
%sql SELECT * FROM USERxxxx.DISCOVERFOLD;
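If you script this check instead of reading the error by eye, you can look for the Db2 authorization error. SQL0551N (SQLCODE -551) is the standard Db2 message for a missing privilege; the exact error text surfaced by the `%sql` magic or a driver is an assumption, so the `is_authorization_error` helper and its sample message are illustrative only.

```python
import re

# SQL0551N message ID, or the equivalent SQLCODE=-551 form some drivers print
AUTH_ERROR = re.compile(r"\bSQL0551N\b|SQLCODE=-551\b")

def is_authorization_error(message):
    """True if a Db2 error message reports a missing authorization or privilege."""
    return AUTH_ERROR.search(message) is not None

sample = ("SQL0551N  The statement failed because the authorization ID does not "
          "have the required authorization or privilege to perform the operation.")
print(is_authorization_error(sample))
```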

### Next Steps:
Before you start the next section, log out as the shared lab user and log back in using your LABDATAENGINEER id:
1. Click the user icon at the very top right of the console
2. Click **Log out**
3. Sign in using your LABDATAENGINEERx user id
4. Click the three bar menu at the top left of the IBM Cloud Pak for Data console
5. Select **Data Virtualization**

Now you can use IBM Cloud Pak for Data to make even complex data and queries from different data sources, on premises and across a multi-vendor Cloud look like simple tables in a single database. You are ready for some more advanced labs. 

1. Use Db2 SQL and Jupyter Notebooks to Analyze Virtualized Data
    * Build simple to complex queries to answer important business questions using the virtualized data available to you in IBM Cloud Pak for Data
    * See how you can transform the queries into simple tables available to all your users
2. Use Open RESTful Services to connect to the IBM Cloud Pak for Data Virtualization 
    * Everything you can do in the IBM Cloud Pak for Data User Interface is accessible through Open RESTful APIs
    * Learn how to automate and script your management of Data Virtualization using RESTful APIs
    * Learn how to accelerate application development by accessing virtualized data through RESTful APIs

## Automating Data Virtualization Setup and Management through REST

The IBM Cloud Pak for Data Console is only one way you can interact with the Virtualization service. IBM Cloud Pak for Data is built on a set of microservices that communicate with each other and the Console user interface using RESTful APIs. You can use these services to automate anything you can do through the user interface.

This Jupyter Notebook contains examples of how to use the Open APIs to retrieve information from the virtualization service, how to run SQL statements directly against the service through REST, and how to provide authorization to objects. This gives you a way to write your own scripts to automate the setup and configuration of the virtualization service. 

The next part of the lab relies on a set of base classes to help you interact with the RESTful Services API for IBM Cloud Pak for Data Virtualization. You can access this library on GITHUB. The commands below download the library and run it as part of this notebook.
<pre>
&#37;run CPDDVRestClass.ipynb
</pre>
The cell below loads the RESTful Service Classes and methods directly from GITHUB. Note that it will take a few seconds for the extension to load, so you should generally wait until the "Db2 Extensions Loaded" message is displayed in your notebook. 
1. Click the cell below
2. Click **Run**

In [None]:
!wget -O CPDDVRestClassV3.ipynb https://raw.githubusercontent.com/Db2-DTE-POC/CPDDVLAB/master/CPDDVRestClassV3.ipynb
%run CPDDVRestClassV3.ipynb

### The Db2 Class
The CPDDVRestClass.ipynb notebook includes a Python class called Db2 that encapsulates the REST API calls used to connect to the IBM Cloud Pak for Data Virtualization service. 

To access the service you need to first authenticate with the service and create a reusable token that we can use for each call to the service. This ensures that we don't have to provide a userID and password each time we run a command. The token makes sure this is secure. 

Each request has several parts. First, the URL and the API version identify how to connect to the service. Second, the REST service path identifies the request and its options, for example '/metrics/applications/connections/current/list'. Finally, some complex requests also include a JSON payload. For example, running SQL includes a JSON object that identifies the script, the statement delimiter, the maximum number of rows in the result set, and what to do if a statement fails.
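The three parts described above map directly onto an HTTP request. Below is a minimal sketch using Python's standard `urllib.request`, assuming the token is sent as an `Authorization: Bearer` header. The `build_request` helper, the `/sql_jobs` path and the SQL payload field names are hypothetical stand-ins for whatever the Db2 class actually sends, and the token is a placeholder. The requests are only constructed here, not sent.

```python
import json
import urllib.request

def build_request(console, api, service, token, payload=None):
    """Assemble (but do not send) a REST call: URL + API version + service path,
    a bearer token header, and an optional JSON payload."""
    url = console + api + service
    headers = {"Authorization": "Bearer " + token,
               "Content-Type": "application/json"}
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    method = "POST" if payload is not None else "GET"
    return urllib.request.Request(url, data=data, headers=headers, method=method)

# GET with no payload, using the service path quoted in the text
get_req = build_request("https://10.1.1.1:32618", "/v1",
                        "/metrics/applications/connections/current/list", "TOKEN")

# POST with a JSON payload; the path and field names are hypothetical
sql_payload = {"commands": "SELECT * FROM TRADING.MOVING_AVERAGE",
               "separator": ";", "limit": 1000, "stop_on_error": "no"}
post_req = build_request("https://10.1.1.1:32618", "/v1", "/sql_jobs",
                         "TOKEN", sql_payload)
print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url)
```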

You can find this class and use it for your own notebooks in GITHUB. Have a look at how the class encapsulates the API calls by clicking on the following link: https://github.com/Db2-DTE-POC/CPDDVLAB/blob/master/CPDDVRestClass.ipynb

### Example Connections
To connect to the Data Virtualization service you need to provide the console URL, the API version (/v1), and your console user name and password. 

1. Substitute your assigned LABDATAENGINEER userid below along with your password you used to log into IBM Cloud Pak for Data at the beginning of the lab. 
2. Run the next cell. 

The cell generates a bearer token that is used in the following steps to authenticate your use of the API. 

#### Connecting to Data Virtualization API Service

In [None]:
# Set the service URL to connect from inside the ICPD Cluster
Console  = 'https://10.1.1.1:32618'

# Console user name and password (your assigned lab credentials)
user     = 'LABDATAENGINEERx'
password = 'tsdvlab'

# Set up the required connection
databaseAPI = Db2(Console)
api = '/v1'
databaseAPI.authenticate(api, user, password)
database = Console
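Under the covers, `authenticate` exchanges the user name and password for a bearer token once, and later calls reuse it. The sketch below shows that pattern offline: it builds a login request and pulls the token out of a JSON response. The `/icp4d-api/v1/authorize` path and the `token` response field are assumptions based on the Cloud Pak for Data platform API and may not be what the lab's Db2 class calls; the `login_request` and `auth_header` helpers and the response body are made up for illustration.

```python
import json
import urllib.request

def login_request(console, user, password, path="/icp4d-api/v1/authorize"):
    """Build (but do not send) the POST that trades credentials for a token.
    The default path is an assumption; check your platform's API reference."""
    body = json.dumps({"username": user, "password": password}).encode("utf-8")
    return urllib.request.Request(console + path, data=body,
                                  headers={"Content-Type": "application/json"},
                                  method="POST")

def auth_header(response_text):
    """Turn a login response body into the header reused on every later call."""
    token = json.loads(response_text)["token"]
    return {"Authorization": "Bearer " + token}

req = login_request("https://10.1.1.1:32618", "LABDATAENGINEERx", "tsdvlab")
headers = auth_header('{"token": "abc.def.ghi"}')  # made-up response body
print(req.full_url)
print(headers)
```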

#### Data Sources and Availability
The following Python function (getDataSources) runs SQL against the **QPLEXSYS.LISTRDBC** catalog table and combines it with a call to the stored procedure **QPLEXSYS.LISTRDBCDETAILS()** to add the **AVAILABLE** column to the results. The IBM Cloud Pak for Data Virtualization Service checks each data source every 5 to 10 seconds to ensure that it is still up and available. In the table (DataFrame) in the next cell, a **1** in the **AVAILABLE** column indicates that the data source is responding. A **0** indicates that it is no longer responding. 

Run the following cell.

In [None]:
# Display the Available Data Sources already configured

dataSources = databaseAPI.getDataSources()
display(dataSources)
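Because the result is an ordinary table, you can filter it to flag sources that have stopped responding. This is a hypothetical sketch over a list of row dictionaries, assuming the rows carry `CID` and `AVAILABLE` columns; the `unavailable_sources` helper and the rows are invented. With the DataFrame returned above, a pandas equivalent would be `dataSources[dataSources['AVAILABLE'] == 0]`, assuming the column holds integers.

```python
def unavailable_sources(rows):
    """Return the CIDs whose AVAILABLE flag is 0 (not responding)."""
    return [row["CID"] for row in rows if int(row["AVAILABLE"]) == 0]

# Hypothetical rows; the flag may arrive as a string depending on the driver
rows = [{"CID": "DB210113", "AVAILABLE": 1},
        {"CID": "POSTG10000", "AVAILABLE": "0"},
        {"CID": "MONGO10000", "AVAILABLE": 1}]
down = unavailable_sources(rows)
if down:
    print("Not responding:", ", ".join(down))
```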

#### Virtualized Data
This call retrieves all of the virtualized data available to the role of Data Engineer. It uses a direct RESTful service call and does not use SQL. The service returns a JSON result set that is converted into a Python Pandas DataFrame. DataFrames are very useful for manipulating tables of data in Python. If there is a problem with the call, the error code is displayed.

In [None]:
# Display the Virtualized Assets Available to Engineers and Users
roles = ['DV_ENGINEER','DV_USER']
for role in roles:
    r = databaseAPI.getRole(role)
    if (databaseAPI.getStatusCode(r)==200):
        json = databaseAPI.getJSON(r)
        df = pd.DataFrame(json_normalize(json['objects']))
        display(df)
    else:
        print(databaseAPI.getStatusCode(r))  
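`json_normalize` flattens nested JSON objects into columns whose names are joined with dots. The stdlib sketch below shows the idea, which is useful for predicting what column names to expect. The `objects` key comes from the cell above, but the `flatten` helper and the fields in the sample payload are hypothetical, and it does not reproduce every `json_normalize` option (for example, it leaves lists as plain values).

```python
def flatten(record, parent="", sep="."):
    """Flatten nested dicts: {'a': {'b': 1}} -> {'a.b': 1}. Lists are kept as values."""
    flat = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat

# Hypothetical payload shaped like a response with an 'objects' list
payload = {"objects": [{"table_name": "CUSTOMER", "table_schema": "STOCK",
                        "owner": {"name": "admin", "type": "USER"}}]}
rows = [flatten(obj) for obj in payload["objects"]]
print(sorted(rows[0]))
```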

#### Virtualized Tables and Views
This call retrieves all the virtualized tables and views available to the userid that you use to connect to the service. In this example the whole call is included in the DB2 class library and returned as a complete DataFrame, ready for display or to be used for analysis or administration.

In [None]:
### Display Virtualized Tables and Views 
display(databaseAPI.getVirtualizedTablesDF())
display(databaseAPI.getVirtualizedViewsDF())

#### Get a list of the IBM Cloud Pak for Data Users
This example returns a list of all the users of the IBM Cloud Pak for Data system. It only displays three columns in the DataFrame, but the list of all the available columns is also printed out. Try changing the code to display other columns.

In [None]:
# Get the list of CPD Users
r = databaseAPI.getUsers()
if (databaseAPI.getStatusCode(r)==200):
    json = databaseAPI.getJSON(r)
    df = pd.DataFrame(json_normalize(json))
    print(', '.join(list(df))) # List available column names
    display(df[['uid','username','displayName']])
else:
    print(databaseAPI.getStatusCode(r))

#### Get the list of available schemas in the DV Database
Do not forget that the Data Virtualization engine supports the same function as a regular Db2 database. So you can also look at standard Db2 objects like schemas.

In [None]:
# Get the list of available schemas in the DV Database
r = databaseAPI.getSchemas()
if (databaseAPI.getStatusCode(r)==200):
    json = databaseAPI.getJSON(r)
    df = pd.DataFrame(json_normalize(json['resources']))
    print(', '.join(list(df)))
    display(df[['name']].head(10))
else:
    print(databaseAPI.getStatusCode(r))  

#### Object Search
Fuzzy object search is also available. The call is a bit more complex. If you look at the routine in the DB2 class it posts a RESTful service call that includes a JSON payload. The payload includes the details of the search request. 

In [None]:
# Search for tables across all schemas that match simple search criteria 
# Display the first 100
# Switch between searching tables or views
object = 'view'
# object = 'table'
r = databaseAPI.postSearchObjects(object,"TRADING",10,'false','false')
if (databaseAPI.getStatusCode(r)==200):
    json = databaseAPI.getJSON(r)
    df = pd.DataFrame(json_normalize(json))
    print('Columns:')
    print(', '.join(list(df)))
    display(df[[object+'_name']].head(100))
else:
    print("RC: "+str(databaseAPI.getStatusCode(r)))

#### Run SQL through the SQL Editor Service
You can also use the SQL Editor service to run your own SQL. Statements are submitted to the editor. Your code then needs to poll the editor service until the script is complete. Fortunately you can use the DB2 class included in this lab so that it becomes a very simple Python call. The **runScript** routine runs the SQL and the **displayResults** routine formats the returned JSON. 

Run the next cell.

In [None]:
databaseAPI.displayResults(databaseAPI.runScript('SELECT * FROM TRADING.MOVING_AVERAGE'))
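`runScript` hides the submit-then-poll loop described above. The general pattern is sketched below with injected callables so it runs without a server: `submit` and `fetch_status` are hypothetical stand-ins for the SQL Editor calls, and the `"completed"`/`"failed"` status strings are assumptions. A timeout keeps a stuck script from blocking the notebook forever.

```python
import time

def run_and_wait(submit, fetch_status, interval=0.5, timeout=60.0):
    """Submit a job, then poll until it reports completion or the timeout expires.

    submit() -> job id; fetch_status(job_id) -> (state, result).
    """
    job_id = submit()
    deadline = time.monotonic() + timeout
    while True:
        state, result = fetch_status(job_id)
        if state == "completed":
            return result
        if state == "failed":
            raise RuntimeError(f"job {job_id} failed: {result}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still '{state}' after {timeout}s")
        time.sleep(interval)

# Fake service that finishes on the third poll
states = iter([("running", None), ("running", None), ("completed", [{"ROWS": 3}])])
result = run_and_wait(lambda: "job-1", lambda job_id: next(states), interval=0.01)
print(result)
```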

You can also run longer, more complex statements by using three quotes to create a multi-line string in Python.

In [None]:
# Find the most active customers in OHIO on the most active trading day of the year
sqlText = \
'''
WITH MAX_VOLUME(AMOUNT) AS (
  SELECT MAX(VOLUME) FROM STOCK.STOCK_HISTORY
    WHERE SYMBOL = 'DJIA'
),
HIGHDATE(TX_DATE) AS (
  SELECT TX_DATE FROM STOCK.STOCK_HISTORY, MAX_VOLUME M
    WHERE SYMBOL = 'DJIA' AND VOLUME = M.AMOUNT
),
CUSTOMERS_IN_OHIO(CUSTID, LASTNAME) AS (
  SELECT C.CUSTOMERID, CI.LASTNAME
    FROM  STOCK.CUSTOMER C, 
          STOCK.CUSTOMER_CONTACT CC,
          STOCK.CUSTOMER_IDENTITY CI
    WHERE CC.CUSTOMER_ID = C."_ID" AND
          CI.CUSTOMER_ID = C."_ID" AND
          CC.STATE = 'OH'
),
TOTAL_BUY(CUSTID,TOTAL) AS (
  SELECT C.CUSTID, SUM(SH.QUANTITY * SH.PRICE) 
    FROM CUSTOMERS_IN_OHIO C, STOCK.STOCK_TRANSACTIONS SH, HIGHDATE HD
  WHERE SH.CUSTID = C.CUSTID AND
        SH.TX_DATE = HD.TX_DATE AND 
        QUANTITY > 0 
  GROUP BY C.CUSTID
)
SELECT C.LASTNAME, T.TOTAL 
  FROM CUSTOMERS_IN_OHIO C, TOTAL_BUY T
WHERE C.CUSTID = T.CUSTID
ORDER BY TOTAL DESC;
'''

databaseAPI.displayResults(databaseAPI.runScript(sqlText))

#### Run scripts of SQL Statements repeatedly through the SQL Editor Service
The script passed to the runScript routine can contain more than one statement. The next example runs a script with eight SQL statements multiple times. 

In [None]:
repeat = 3
sqlText = \
'''
SELECT * FROM TRADING.MOVING_AVERAGE;
SELECT * FROM TRADING.VOLUME;
SELECT * FROM TRADING.THREEPERCENT;
SELECT * FROM TRADING.TRANSBYCUSTOMER;
SELECT * FROM TRADING.TOPBOUGHTSOLD;
SELECT * FROM TRADING.TOPFIVE;
SELECT * FROM TRADING.BOTTOMFIVE;
SELECT * FROM TRADING.OHIO;
'''

for x in range(0, repeat):
    print('Repetition number: '+str(x))
    databaseAPI.displayResults(databaseAPI.runScript(sqlText))
print('done')
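When you repeat a script like this, it can help to know how many statements it contains and how long each pass takes. The sketch below counts statements with a naive split on `;` (it would miscount a semicolon inside a string literal) and times a caller-supplied function with `time.perf_counter`. Both helpers are hypothetical, and `run_pass` stands in for a call such as `lambda s: databaseAPI.displayResults(databaseAPI.runScript(s))`.

```python
import time

def split_statements(script, delimiter=";"):
    """Naively split an SQL script into non-empty statements."""
    return [s.strip() for s in script.split(delimiter) if s.strip()]

def time_repetitions(run_pass, script, repeat=3):
    """Call run_pass(script) `repeat` times and return the elapsed seconds of each pass."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        run_pass(script)
        timings.append(time.perf_counter() - start)
    return timings

script = """
SELECT * FROM TRADING.MOVING_AVERAGE;
SELECT * FROM TRADING.VOLUME;
"""
print(len(split_statements(script)))
timings = time_repetitions(lambda s: None, script, repeat=2)  # no-op stand-in
print(timings)
```

Applied to the cell's `sqlText`, `split_statements` returns the eight statements.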

### What's next
You can download a copy of your completed Jupyter notebook as a reference:
1. Click **File** from the Jupyter notebook main menu
2. Select **Download as**
3. Select **Notebook** if you want to use this notebook in your own Jupyter environment
4. Select **HTML** if you want a read only version of the notebook for reference

If you are interested in finding out more about using RESTful services to work with Db2, check out this DZone article: https://dzone.com/articles/db2-dte-pocdb2dmc. The article also includes a link to a complete hands-on lab for Db2 and the Db2 Data Management Console. In it you can find out more about using REST and Db2 together. 

#### Credits: IBM 2020, Peter Kohlmann [kohlmann@ca.ibm.com]