In [None]:
import os
os.environ['JDBC_HOST'] = 'jrtest01-splice-hregion'

In [1]:
%%HTML
<link rel="stylesheet" href="https://doc.splicemachine.com/jupyter/css/custom.css">

# Monitoring Queries with the Database Console

This notebook introduces the Spark Database Console, which you can use to monitor Splice Machine queries that are running in Spark, in these sections:

* *Accessing the Console*
* *Basic UI Features*
* *Running a Basic Query*
* *Drilling Down into Our Results*
* *Terminating an Active Query*
* *About Parallelism and Spark*

## About Spark Jobs in Splice Machine

You may recall that Splice Machine has a dual-engine architecture that can run statements and queries directly in HBase (the `control` side) or in Apache Spark. You can see which engine is used (`control` or `Spark`) by examining the top line of the `explain` output for a query. Fast queries that run in milliseconds are sent directly to the control engine, while larger queries that process more data go to the Spark engine.

You can use the DB Console to monitor the progress of queries that are sent to the Spark engine, including GC usage. You can also terminate queries when necessary.

## Accessing the Console

You access the Spark DB Console by opening a new browser tab, then navigating to the console's URL:

* For our training classes, open a new browser tab and point at `localhost:4040`. 
* For actual clusters, the access path depends on whether you are using Splice Machine's cloud service or your own infrastructure.  See our documentation for specifics.

<p class="noteNote">The Spark DB Console is not accessible until you've run at least one Spark query on your cluster.</p>
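If you would rather check from code whether the console is up yet, a minimal sketch follows. It assumes that the console exposes Spark's standard monitoring REST endpoint (`/api/v1/applications`) on the same host and port as the UI; the helper names `spark_ui_applications_url` and `console_is_up` are hypothetical and are not part of Splice Machine.

```python
import http.client
import json
import urllib.error
import urllib.request


def spark_ui_applications_url(host="localhost", port=4040):
    """Build the URL of the Spark REST endpoint that lists applications (assumed to be enabled)."""
    return f"http://{host}:{port}/api/v1/applications"


def console_is_up(host="localhost", port=4040, timeout=2.0):
    """Return True if the endpoint answers with valid JSON, False otherwise."""
    url = spark_ui_applications_url(host, port)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            json.load(resp)  # raises ValueError if the body is not JSON
            return True
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused is the expected result before the first Spark query has run.
        return False


if __name__ == "__main__":
    print("Console reachable:", console_is_up())
```

A `False` result before you have run any Spark query is consistent with the note above.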

Once you've started a Spark query, you'll see the DB Console ( *Spark* ) UI:
<img class="fitwidth" src="https://doc.splicemachine.com/zeppelin/images/zepSparkJobs-a.png" alt="Console UI Top-Level Display">


<p class="noteNote">The <em>Executors</em> tab of the console does not work when running code in this class.</p>


## Basic UI Features

Before we use the console to examine a specific query, let's go over a few interesting notes about the DB Console:

* Queries are reported as *Jobs* in the Spark UI
* Each Job will have *Stages*
* Each Stage will have *Tasks*
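The Job, Stage, Task hierarchy can be pictured as nested data. The sketch below is only an illustration of that nesting; the class and field names are made up for this notebook and do not correspond to any Spark or Splice Machine API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    task_id: int
    duration_ms: int  # hypothetical field: how long the task ran


@dataclass
class Stage:
    stage_id: int
    tasks: List[Task] = field(default_factory=list)


@dataclass
class Job:
    job_id: int
    description: str
    stages: List[Stage] = field(default_factory=list)

    def task_count(self) -> int:
        """Total number of tasks across all stages of this job."""
        return sum(len(stage.tasks) for stage in self.stages)


# One query (Job) with two Stages, each running a single Task; the durations are invented.
job = Job(6, "Produce Result Set", [Stage(0, [Task(0, 120)]), Stage(1, [Task(1, 15)])])
print(job.task_count())
```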

### Drilling Down

In general, you can click anything that displays as a <span class="ConsoleLink">blue link</span> to drill down into a more detailed view. For example, if you were looking at the following information in the Console, you could click <span class="ConsoleLink">Produce Result Set</span> in the description from the completed jobs table to drill down into the job details for *Job 6*:
<img class="fitwidth" src="https://doc.splicemachine.com/zeppelin/images/zepConsoleDrillDown-a.png" alt="Drilling Down in the Console UI">

You can continue to drill down from there to reveal increasing levels of detail. In the next section of this notebook, we will view job details and then drill down for an example query.

### Switching Views

You can quickly switch to a different view by clicking a tab in the tab bar at the top of the console screen. Note that the *Jobs* tab is selected in this screen shot:
<img class="fitwidth" src="https://doc.splicemachine.com/zeppelin/images/zepConsoleTabs-a.png" alt="Console UI Main Tabs display">

### Hovering

You can hover the cursor over interface element links, like the <span class="ConsoleLink">Event Timeline</span> drop-down in the following image, to display a screen tip for the item:
<img class="fitwidth" src="https://doc.splicemachine.com/zeppelin/images/zepConsoleHover-a.png" alt="Console UI Event Timeline drop-down">

Similarly, you can hover over the <span class="ConsoleLink">?</span> to display the definition for a term; this example is displaying the definition of a job:
<img class="fitwidth" src="https://doc.splicemachine.com/zeppelin/images/zepConsoleHover2-a.png" alt="Hovering to display term definition in the Console UI">

And you can hover over an event in timeline display to see summary information; for example:
<img class="fitwidth" src="https://doc.splicemachine.com/zeppelin/images/zepConsoleTimelineHover-a.png" alt="Hovering over the Console UI Timeline Display">



## Running a Basic Query

Let's generate an `EXPLAIN` plan for a simple query that we previously ran: `explain select count(*) from dev1.index_example`. Generate the plan by running the next cell:



In [None]:
%%sql 

explain select count(*) from dev1.index_example

<br />

Notice the `engine=Spark` on the top line, which indicates that this query will be processed by the Spark engine, which means that we can monitor the query in the Spark DB Console.
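If you have the plan as text, you can pull the engine name out of its top line with a small helper. This is a sketch under assumptions: the `sample` plan text below is illustrative rather than actual output, and `engine_from_explain` is a hypothetical helper name.

```python
import re


def engine_from_explain(plan_text):
    """Return the engine named on the first non-empty line of a plan, or None if absent."""
    for line in plan_text.splitlines():
        if line.strip():
            match = re.search(r"engine=(\w+)", line)
            return match.group(1) if match else None
    return None


# Illustrative plan text only; your actual explain output will differ.
sample = "Cursor(n=5,rows=1,updateMode=READ_ONLY (1),engine=Spark)\n  ->  ScrollInsensitive(n=4)"
print(engine_from_explain(sample))
```

Only the first non-empty line is inspected, which matches the guidance to look at the top line of the plan.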

Now let's actually run the query:

In [None]:
%%sql 

select count(*) from dev1.index_example

<br />

Now let's use the DB Console to view the query; remember to open `localhost:4040` in a new browser tab. You should see something like this:
<img class="fitwidth" src="https://doc.splicemachine.com/zeppelin/images/zepSparkJob2-a.png" alt="Viewing a query in the console UI">

If you got to the DB Console quickly enough after running the query, it may show as an *Active Job* instead of being a *Completed Job.*


## Drilling Down into Our Results

Let's examine the *Stages* of the *Job* we just ran by starting on the Jobs page and clicking <span class="ConsoleLink">Produce Result Set</span> for the above query. You'll see the *Job Detail* display for the query:

  <img class="fitwidth" src="https://doc.splicemachine.com/zeppelin/images/zepJobDetail1-a.png" alt="Job Detail in the Console UI">

Note that the detail includes this information:

* This job has two Stages.
* Each Stage has a duration.
* Each Stage in this Job ran one Task.

  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; *Note:* The number of tasks required for the job may be different in your environment.

### Viewing Job Details Graphically

You can see a graphical representation of the actual activity performed within the Job's Stages by clicking the <span class="ConsoleLink">DAG Visualization</span> link above the *Completed Stages* section of the Job Details display. Here's what that looks like for our example query:

  <img class="fitwidth" src="https://doc.splicemachine.com/zeppelin/images/zepJobDetailDag-a.png" alt="Directed Graph Visualization in the Console UI">

Note that this is essentially another view of the EXPLAIN plan for this query, with the execution flow depicted by the arrows.


### Viewing Stage Details

To drill down into the details of the first Stage of our query, click anywhere in the box that represents that Stage (the one starting with TableScan) in the DAG visualization. The Console displays the details of that Stage:

  <img class="fitwidth" src="https://doc.splicemachine.com/zeppelin/images/zepStageDetailDag-a.png" alt="Viewing details of a stage in the Console UI">

The DAG Visualization for the Stage is shown at the top of this view; you can hide the DAG by clicking the <span class="ConsoleLink">DAG Visualization</span> link, or you can scroll down below the graph to see the *Summary Metrics* for the Stage:

  <img class="fitwidth" src="https://doc.splicemachine.com/zeppelin/images/zepStageDetails-a.png" alt="Stage Details ">

At the very bottom of this view, we see *Tasks.*  These are the most basic work units in the Spark Engine. For each task you will see:

* a duration
* garbage collection time
* other information relevant to the task activity

In the above example, we see that the set of Tasks:

* performed a TableScan
* read around 1.3M rows total
* wrote out some bytes of records for processing by the next Stage
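One way to use the per-task duration and garbage collection time is to compute what share of each task's run time went to GC. The sketch below assumes you have copied those two numbers out of the Tasks table by hand; the dictionary keys, the sample values, and the 10% threshold are all arbitrary choices made for illustration.

```python
def gc_fraction(duration_ms, gc_time_ms):
    """Fraction of a task's run time spent in garbage collection (0.0 if duration is zero)."""
    if duration_ms <= 0:
        return 0.0
    return gc_time_ms / duration_ms


def gc_heavy_tasks(tasks, threshold=0.1):
    """Return the ids of tasks whose GC share exceeds the threshold (an assumed cutoff)."""
    return [
        t["task_id"]
        for t in tasks
        if gc_fraction(t["duration_ms"], t["gc_time_ms"]) > threshold
    ]


# Invented numbers, not taken from the screenshots above.
tasks = [
    {"task_id": 0, "duration_ms": 2000, "gc_time_ms": 40},
    {"task_id": 1, "duration_ms": 1500, "gc_time_ms": 450},
]
print(gc_heavy_tasks(tasks))  # [1]
```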



### Viewing the Event Timeline

You can get another view of the current Stage by clicking the <span class="ConsoleLink">Event Timeline</span> link; the Console then displays all tasks in this stage on a timeline:
  <img class="fitwidth" src="https://doc.splicemachine.com/zeppelin/images/zepStageTimeline-a.png" alt="Viewing the event timeline for a stage in the Console UI">

This view is especially useful when a Stage has many tasks and you want to see how many executors are in use and how much parallelism is being achieved for this stage of the query. More on this in a moment.
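The parallelism that the timeline shows visually can also be computed from task start and end times. The sketch below uses a simple sweep over sorted start/end events; the `(start, end)` pairs are invented, and `peak_concurrency` is a hypothetical helper, not something the Console provides.

```python
def peak_concurrency(intervals):
    """Return the largest number of (start, end) intervals that overlap at any instant."""
    events = []
    for start, end in intervals:
        events.append((start, 1))   # a task begins
        events.append((end, -1))    # a task finishes
    # At equal timestamps, process finishes (-1) before starts (+1) so that
    # back-to-back tasks are not counted as running at the same time.
    events.sort(key=lambda e: (e[0], e[1]))
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


# Invented task timings in milliseconds since the stage started.
print(peak_concurrency([(0, 10), (5, 15), (10, 20)]))  # 2
```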

## Terminating an Active Query
You can terminate an active job if you determine that something isn't working as expected. Simply access the *Jobs* tab while a job is actively running, and click the *(kill)* text displayed next to the job description, as shown here:

<img class="fitwidth" src="https://doc.splicemachine.com/zeppelin/images/zepKillJob-a.png" alt="Terminating an active job in the Console UI">

You'll be asked to confirm that you want to terminate that job.


## About Parallelism and Spark

The power of Splice Machine in performing large analytic queries quickly lies in its ability to run those queries with parallel resources.  Spark can run a number of Jobs, Stages, and Tasks in parallel.  How much parallelism you see, and where, depends on the following:

<table class="splicezepOddEven">
    <col width="25%" />
    <col />
    <thead>
        <tr>
            <th>Parallelism Factor</th>
            <th>Description</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>Active executors</td>
            <td><p>How many executors are available to run your query?</p>
                <p>The available executor count on your cluster depends on your configuration; in our training example we have just one, but clusters can have many executors.  Each executor is typically configured to run 4 tasks in parallel.  Therefore, your maximum parallelism is typically the number of executors * 4 simultaneous tasks.  For example, if your cluster has 12 executors, it can run 48 tasks in parallel across all running jobs.</p></td>
        </tr>
        <tr>
            <td>Tasks per stage in one query</td>
            <td><p>Splice Machine and Spark will dynamically split up the workload across many tasks for large data sets.</p>
            <p>Our example data set contains only 1 million rows; as a result, our example query won't have many Tasks per Stage. With more data in your tables, you will see more tasks in parallel in a given Stage.</p>
                </td>
        </tr>
        <tr>
            <td># of Queries being run simultaneously</td>
            <td>Spark can run queries simultaneously with available resources.</td>
        </tr>
    </tbody>
</table>
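The executor arithmetic in the table can be written out as a short calculation. This is a sketch only: the default of 4 tasks per executor comes from the "typically configured" figure above, and the `waves_needed` helper and its 100-task example are assumptions added to show how task count relates to available slots.

```python
import math


def max_parallel_tasks(executors, tasks_per_executor=4):
    """Upper bound on simultaneously running tasks across all jobs."""
    return executors * tasks_per_executor


def waves_needed(total_tasks, executors, tasks_per_executor=4):
    """Minimum number of rounds needed to run total_tasks if every slot is kept busy."""
    slots = max_parallel_tasks(executors, tasks_per_executor)
    return math.ceil(total_tasks / slots)


print(max_parallel_tasks(12))   # 48, matching the 12-executor example above
print(waves_needed(100, 12))    # 3 rounds for a hypothetical 100-task stage
```

With the single executor used in training, the same calculation gives at most 4 simultaneous tasks.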



## Where to Go Next
The next notebook in this presentation introduces you to <a href="./3.4%20Transactions%20with%20Spark%20%26%20JDBC.ipynb">the transactional nature of Splice Machine and using JDBC to program transactions.</a>

## For Additional Help
As you've seen in this notebook, the Database Console UI is extremely useful in getting a view into how well your queries are getting processed.  

Once you have your data loaded at or near target scale, if you are not seeing good throughput (task activity, etc.), please visit our <a href="https://splicemachine.slack.com/messages/splice-community/" target="_blank"><em>splice-community</em> Slack channel</a> and ask for help; if you've not already done so, you can register for this channel <a href="https://www.splicemachine.com/community/slack-channel-signup/" target="_blank">here</a>.

