In [1]:
%%HTML
<link rel="stylesheet" href="https://doc.splicemachine.com/jupyter/css/custom.css">

In [None]:
import os
os.environ['JDBC_HOST'] = 'jrtest01-splice-hregion'

In [None]:
# Setup: create the Spark session and the Splice Machine context
import os
import pyspark
from splicemachine.spark.context import PySpliceContext
from pyspark.conf import SparkConf
from pyspark.sql import SparkSession

# make sure pyspark tells workers to use python3 not 2 if both are installed
os.environ['PYSPARK_PYTHON'] = '/usr/bin/python3'
jdbc_host = os.environ['JDBC_HOST']

conf = SparkConf()
# getOrCreate reuses an existing context, so re-running this cell does not fail
sc = pyspark.SparkContext.getOrCreate(conf=conf)

spark = SparkSession.builder.config(conf=conf).getOrCreate()

splicejdbc = f"jdbc:splice://{jdbc_host}:1527/splicedb;user=splice;password=admin"

splice = PySpliceContext(spark, splicejdbc)
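
The connection string in the setup cell hard-codes the port, database name, and credentials. Here is a minimal sketch, assuming you would rather assemble the URL from its parts. `build_splice_jdbc_url` is a hypothetical helper (it is not part of the `splicemachine` package), its defaults simply mirror the values used in the setup cell, and the `localhost` fallback is an assumption for illustration.

```python
import os


def build_splice_jdbc_url(host, port=1527, database="splicedb",
                          user="splice", password="admin"):
    """Assemble a JDBC URL in the same shape as the one in the setup cell.

    Hypothetical helper; the defaults mirror the setup cell's values.
    """
    return f"jdbc:splice://{host}:{port}/{database};user={user};password={password}"


# Fall back to localhost when JDBC_HOST is unset (the fallback is an assumption).
jdbc_host = os.environ.get("JDBC_HOST", "localhost")
splicejdbc = build_splice_jdbc_url(jdbc_host)
print(splicejdbc)
```

The resulting string can be passed to `PySpliceContext(spark, splicejdbc)` exactly as in the setup cell.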


<link rel="stylesheet" href="https://doc.splicemachine.com/zeppelin/css/zepstyles.css" />

# Welcome to Splice Machine!
This README helps you get started. It contains these sections:

* *Release Notes* lists the known assumptions and limitations in the current version of our Database Service.

* *About Jupyter* provides a very quick introduction to Jupyter.

* *Tutorial Notebooks* introduces the set of tutorials that we've created to help you get started with Splice Machine.


## Release Notes

The following are the known assumptions and limitations of the Splice Machine Database-as-a-Service at this time:


<table class="splicezepOddEven" style="border:none">
    <col />
    <col />
    <tbody>
        <tr>
            <td>&#8226;</td>
            <td>Clusters are currently created only in the us-east-1 region. We will add support for more regions later.</td>
        </tr>
        <tr>
            <td>&#8226;</td>
            <td>When using a JDBC connection, individual queries or actions will time out after one hour. Run long-running queries within a Jupyter notebook.</td>
        </tr>
        <tr>
            <td>&#8226;</td>
            <td>TLS for JDBC has not yet been enabled.</td>
        </tr>
        <tr>
            <td>&#8226;</td>
            <td>Usage graphs for clusters (CPU, Memory, and Disk) are currently intermittent.</td>
        </tr>
        <tr>
            <td>&#8226;</td>
            <td>VPC settings are not yet enabled, but will be in an upcoming release.</td>
        </tr>
        <tr>
            <td>&#8226;</td>
            <td>Depending on your time zone, the timestamps you see in Jupyter can differ from the timestamps you see in the Splice Spark UI.</td>
        </tr>
        <tr>
            <td>&#8226;</td>
            <td>Cancelling queries through Jupyter or JDBC tools does not currently work. Spark queries can be killed through the Spark UI.</td>
        </tr>
        <tr>
            <td>&#8226;</td>
            <td>Reminder: though Splice Machine backs up your database regularly, it does not back up your Notebook changes. Please save your Notebooks regularly if you make changes.</td>
        </tr>
    </tbody>
</table>
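
The time zone note above is easier to reason about with a concrete conversion. The following is a minimal sketch, assuming purely for illustration that one display shows UTC and the other shows a fixed UTC-5 local offset; this README does not state which zone each tool uses, and `to_offset` and the sample timestamp are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_offset(ts, offset_hours):
    """Return the same instant expressed at a fixed UTC offset."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return ts.astimezone(timezone(timedelta(hours=offset_hours)))


# Hypothetical timestamp as it might appear in a UTC-based display.
utc_time = datetime(2020, 1, 15, 18, 30, tzinfo=timezone.utc)

# The same instant as it would appear in a UTC-5 display (assumed offset).
local_time = to_offset(utc_time, -5)

print(utc_time.isoformat())    # 2020-01-15T18:30:00+00:00
print(local_time.isoformat())  # 2020-01-15T13:30:00-05:00
```

Both values describe the same moment; only the displayed wall-clock time differs, which is why the two user interfaces can appear to disagree.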


## About Jupyter

JupyterLab is a web-based interactive development environment for Jupyter notebooks, code, and data. JupyterLab is flexible: configure and arrange the user interface to support a wide range of workflows in data science, scientific computing, and machine learning. JupyterLab is extensible and modular: write plugins that add new components and integrate with existing ones.

The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.

We strongly encourage you to visit the [Jupyter documentation site](https://jupyter.org/documentation) to learn about creating, modifying, and running your own Jupyter notebooks.


## Tutorial Notebooks

We've created a set of Notebooks to help you get up and running with your Splice Machine database and Jupyter:

<table class="splicezepOddEven">
    <col width="20%" />
    <col width="30%" />
    <col />
    <thead>
        <tr>
            <th>Section</th>
            <th>Notebook</th>
            <th>Description</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td rowspan="9"><em>2. Getting Started Tutorial</em></td>
            <td><a href="./2.1%20Jupyter%20Notebook%20Basics.ipynb">Jupyter Notebook Basics</a></td>
            <td>Introduces using Splice Machine and Jupyter together</td>
        </tr>
        <tr>
            <td><a href="./2.2%20Copying%20Data%20to%20S3.ipynb">Copying Data to S3 for easy import</a></td>
            <td>How to copy data to S3 for easy access from Splice Machine</td>
        </tr>
        <tr>
            <td><a href="./2.3%20Importing%20Data%20Tutorial.ipynb">Importing Data into Your Database</a></td>
            <td>How to import data into your Database</td>
        </tr>
        <tr>
            <td><a href="./2.4%20Running%20Queries%20Tutorial.ipynb">Running Queries</a></td>
            <td>Running Splice Machine database queries in Jupyter and applying visualizations to the results</td>
        </tr>
        <tr>
            <td><a href="./2.5%20Tuning%20for%20Performance%20Tutorial.ipynb">Tuning Queries for Performance</a></td>
            <td>Easy Splice Machine query optimization techniques</td>
        </tr>
        <tr>
            <td><a href="./2.6%20Using%20the%20Database%20Console%20Tutorial.ipynb">Using the DB Console UI</a></td>
            <td>Introduces the Spark DB Console, which you can use to monitor queries</td>
        </tr>
        <tr>
            <td><a href="./2.7%20Explaining%20and%20Hinting%20Tutorial.ipynb">Using Explain and Hints</a></td>
            <td>Shows you how to use Splice Machine's Explain Plan and Hints to tune up your queries</td>
        </tr>
        <tr>
            <td><a href="./2.8%20TPCH-1%20Tutorial.ipynb">Running the TPCH-1 Benchmark Queries</a></td>
            <td>Walks you through importing the TPCH-1 data and running the benchmark queries</td>
        </tr>
        <tr>
            <td><a href="./2.9%20Common%20Utilities.ipynb">Common Utilities</a></td>
            <td>A collection of Splice Machine tools and techniques to help simplify development</td>
        </tr> 
        <tr>
            <td rowspan="8"><em>3. Splice Machine Deep Dive</em></td>
            <td><a href="./3.1%20Introduction.ipynb">Introduction</a></td>
            <td>This notebook introduces Splice Machine, with a brief overview of its architecture, technology stack, and SQL coverage.</td>
        </tr>
        <tr>
            <td><a href="./3.2%20The%20Life%20of%20a%20Query.ipynb">The Life of a Query</a></td>
            <td>Walks you through importing and running the TPC-H benchmark data by creating a database in Splice Machine, importing the TPC-H dataset, running queries, and improving performance by indexing the data.</td>
        </tr>
        <tr>
            <td><a href="./3.3%20Monitoring%20Queries.ipynb">Monitoring Queries with the Database Console</a></td>
            <td>Introduces you to the Splice Machine Database Console, which you can use to monitor and control your currently running queries.</td>
        </tr>
        <tr>
            <td><a href="./3.4%20Transactions%20with%20Spark%20&amp;%20JDBC.ipynb">Splice Transactions with Spark and JDBC</a></td>
            <td>Introduces you to the transactional nature of Splice Machine and using JDBC to program transactions.</td>
        </tr>
        <tr>
            <td><a href="./3.5%20Creating%20Applications.ipynb">Creating Applications with Splice Machine</a></td>
            <td>Shows you how you can easily create applications with Splice Machine.</td>
        </tr>
        <tr>
            <td><a href="./3.6%20Using%20our%20Native%20Spark%20DataSource.ipynb">Using the Splice Machine Native Spark DataSource</a></td>
            <td>Introduces you to using the Splice Machine Spark Adapter for working with your database.</td>
        </tr>
        <tr>
            <td><a href="./3.7%20Python%20MLlib%20example.ipynb">Machine Learning with Spark MLlib using Python</a></td>
            <td>Presents an example of using the Spark Machine Learning Library (MLlib) with Splice Machine and Python.</td>
        </tr>
        <tr>
            <td><a href="./3.8%20Scala%20MLlib%20example.ipynb">Machine Learning with Spark MLlib using Scala</a></td>
            <td>Presents an example of using the Spark Machine Learning Library (MLlib) with Splice Machine and Scala.</td>
        </tr>
        <tr>
            <td><em>4. Advanced Topics</em></td>
            <td><a href="./4.1%20Creating%20Custom%20Stored%20Procedures.ipynb">Creating Custom Stored Procedures</a></td>
            <td>Shows you how to create Splice Machine stored procedures and functions.</td>
        </tr>
        <tr>
            <td><em>5. Benchmarks</em></td>
            <td><a href="./5.1%20TPCH-100.ipynb">TPCH-100 Benchmark</a></td>
            <td>Walks you through importing the TPCH-100 data and running the benchmark queries.</td>
        </tr>
        <tr>
            <td rowspan="5"><em>6. Internet of Things (IoT) Reference App</em></td>
            <td><a href="./6.1%20IoT%20App%20Overview.ipynb">App Overview</a></td>
            <td>Overview of the IoT Reference Application.</td>
        </tr>
        <tr>
            <td><a href="./6.2%20Setting%20Up%20the%20Database.ipynb">Setting Up the Database</a></td>
            <td>Sets up the database table and loads static data for our IoT app.</td>
        </tr>
        <tr>
            <td><a href="./6.3%20Setting%20Up%20Kafka.ipynb">Setting Up Kafka</a></td>
            <td>Creates a Kafka topic and producer.</td>
        </tr>
        <tr>
            <td><a href="./6.4%20Using%20Spark%20Streaming.ipynb">Using Spark Streaming</a></td>
            <td>Uses Spark Streaming to get data from Kafka into the database.</td>
        </tr>
        <tr>
            <td><a href="./6.5%20Querying%20Our%20IoT%20Database.ipynb">Querying Our IoT Database</a></td>
            <td>Runs real-time queries of the IoT data as it is streaming in.</td>
        </tr>
        <tr>
            <td rowspan="2"><em>7. Machine Learning</em></td>
            <td><a href="./7.1%20KMeans.ipynb">KMeans</a></td>
            <td>An example of implementing a KMeans algorithm.</td>
        </tr>
        <tr>
            <td><a href="./7.2%20Decision%20Tree.ipynb">Decision Tree</a></td>
            <td>An example of implementing a decision tree algorithm.</td>
        </tr>
    </tbody>
</table>
 
We strongly recommend that you take the time to go through all of these Tutorial Notebooks, which will address many of your initial questions and guide you to your next steps.

<p class="noteIcon">We recommend going through the Notebooks in our <em>Getting Started Tutorial</em> in sequence, starting with <a href="./2.1%20Jupyter%20Notebook%20Basics.ipynb">Jupyter Notebook Basics</a>; these Notebooks build on results generated by previous steps to guide you through importing data, running queries, and tuning those queries for better performance.</p>

After you've completed the Tutorials, you can explore our other Notebooks, which illustrate the database capabilities, and walk you through reference applications using Splice Machine along with other tools, including streaming, supply chain management, and machine learning.
