# Introduction to Splice Machine for Data Scientists

This class is recommended for data scientists and data engineers who are working with Splice Machine. To understand this content, you need to have some experience with:

* using SQL
* programming with Python or Scala
* implementing Machine Learning, data transformation, data analysis, streaming, model design and selection, and results interpretation

You can expect to spend 2-3 hours completing this *Splice Machine Data Scientist* class, which contains the following sequence of notebooks:

<table class="splicezepOddEven">
    <col />
    <col />
    <tbody>
        <tr>
            <td>&nbsp;1.</td>
            <td>If you've not already done so, please visit our <a href="../For%20All%20-%20Basic%20Training/Splice%20Machine%20Basics.ipynb"><em>Splice Machine Basics</em></a> notebook for an overview of Splice Machine before diving into this set of notebooks.</td>
        </tr>
        <tr>
            <td>&nbsp;2.</td>
            <td>And if you're new to using Jupyter, our <a href="../For%20All%20-%20Basic%20Training/Jupyter%20Notebook%20Basics.ipynb"><em>Jupyter Notebook Basics</em></a> notebook is a brief introduction to running and creating/modifying Jupyter notebooks, which you'll need to know before proceeding with the other notebooks in this section.</td>
        </tr>
        <tr>
            <td>&nbsp;3.</td>
            <td>This class starts with our <a href="./b.%20Importing%20Data%20and%20Running%20Queries.ipynb"><em>Importing Data and Running Queries</em></a> notebook, which introduces you to querying your Splice Machine database.</td>
        </tr>
        <tr>
            <td>&nbsp;4.</td>
            <td>The <a href="./c.%20Using%20Spark%20in%20Jupyter%20Notebooks.ipynb"><em>Using Spark in Jupyter Notebooks</em></a> notebook demonstrates writing and running a Spark program in Jupyter.</td>
        </tr>
        <tr>
            <td>&nbsp;5.</td>
            <td>The <a href="./d.%20Using%20the%20Database%20Console.ipynb"><em>Using the Database Console</em></a> notebook introduces the dashboard you can use to monitor queries sent to the Spark engine; this allows you to follow the progress of queries running in Spark, monitor GC usage, and terminate queries when necessary.</td>
        </tr>
        <tr>
            <td>&nbsp;6.</td>
            <td>The <a href="./e.%20Using%20our%20Native%20Spark%20DataSource.ipynb"><em>Using our Native Spark DataSource</em></a> notebook demonstrates how our Native Spark DataSource lets you use the full power of Spark to manipulate DataFrames while retaining full ANSI, ACID-compliant SQL.</td>
        </tr>
        <tr>
            <td>&nbsp;7.</td>
            <td>The <a href="./f.%20Machine%20Learning%20with%20Spark%20MLlib.ipynb"><em>Machine Learning with Spark MLlib</em></a> notebook contains Python code that uses the Machine Learning Library embedded in Spark, <em>MLlib</em>, with the Splice Machine Spark Adapter to realize in-process machine learning.</td>
        </tr>
        <tr>
            <td>&nbsp;8.</td>
            <td>The <a href="./g.%20KMeans%20Example.ipynb"><em>KMeans Example</em></a> notebook presents an unsupervised clustering algorithm used to identify similarities and trends within a given dataset.</td>
        </tr>
        <tr>
            <td>&nbsp;9.</td>
            <td>The <a href="./h.%20Decision%20Trees%20Example.ipynb"><em>Decision Trees Example</em></a> notebook demonstrates how to load a LIBSVM data file, parse it into an RDD of LabeledPoint objects, and then perform classification using a decision tree with Gini impurity as the impurity measure.</td>
        </tr>
        <tr>
            <td>10.</td>
            <td>The <a href="./i.%20ETL%20Pipeline.ipynb"><em>ETL Pipeline Example</em></a> notebook presents a simple example of implementing an ETL pipeline with Splice Machine.</td>
        </tr>
        <tr>
            <td>11.</td>
            <td>The <a href="./j.%20Exercises.ipynb"><em>Exercises for This Class</em></a> notebook contains a number of exercises that you should complete to test your understanding of the content in this class.</td>
        </tr>
    </tbody>
</table>

Note that each notebook contains a link to the next notebook in the sequence, and most notebooks also contain links to related pages on our Splice Machine documentation website.



## Where to Go Next
Please start this class with the [*Importing Data and Running Queries*](./b.%20Importing%20Data%20and%20Running%20Queries.ipynb) notebook for an introduction to running Splice Machine database queries.