<link rel="stylesheet" href="https://doc.splicemachine.com/zeppelin/css/zepstyles.css" />

# Importing Data into Your Splice Machine Database

This tutorial walks you through importing data into your Splice Machine database from flat files that are stored in S3.

Before starting this tutorial, you should already have your data stored in a delimited format (such as CSV) in an S3 bucket. If you haven't done that yet, please review the previous tutorial, <a href="./2.3%20Copying%20Data%20to%20S3">Copying Data to S3</a>, which walks you through copying your data to S3.

We'll walk you through a simple example, after which you'll be able to import your own data into your database.

## Import Data Checklist

When you use the `import` command in Splice Machine to load your data into your database, you need to specify a number of details about your data files to get them correctly imported. Before starting this process, please make sure your data formats will work, as defined here:


<table class="splicezepOddEven">
    <col />
    <col />
    <thead>
        <tr>
            <th>Data File Detail</th>
            <th>Specific Requirements</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>Field delimited?</td>
            <td>The fields in each row <strong>must</strong> have delimiters between them.</td>
        </tr>
        <tr>
            <td>Rows terminated?</td>
            <td>Each row <strong>must</strong> be terminated with a newline character.</td>
        </tr>
        <tr>
            <td>Header row included?</td>
            <td>Header rows are not allowed; if your data contains one, you <strong>must</strong> remove it.</td>
        </tr>
        <tr>
            <td><code>Date</code>, <code>time</code>, <code>timestamp</code> data types</td>
            <td> If you are using <code>date</code>, <code>time</code>, and/or <code>timestamp</code> data types in the target table, you need to know how that data is represented in the flat file; your file <strong>must</strong> use a consistent representation, and you must specify that format when using the import command.</td>
        </tr>
        <tr>
            <td><code>Char</code> and <code>Varchar</code> data</td>
            <td><p>If any of your <code>char</code> or <code>varchar</code> data contains your delimiter character, you <strong>need to use</strong> a special character delimiter.</p>
                <p>If any of your <code>char</code> or <code>varchar</code> data contains newline characters, you <strong>need to set</strong> the <code>oneLineRecords</code> parameter to <code>false</code>.</p>
            </td>
        </tr>
    </tbody>
</table>


The examples in this tutorial will clarify how to specify these parameters when importing data. For more information, please see the Splice Machine <a href="https://doc.splicemachine.com/sqlref_sysprocs_importdata.html" target="_blank">import data procedure documentation</a> page.

<p class="noteIcon">It is a good idea to test your import settings, such as delimiters and date formats, on a small amount of data before loading all of your data. That's what we'll do in this tutorial.</p>
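
Before running an import, it can help to check a small sample of your file against the checklist above. The following is a minimal Python sketch, not part of Splice Machine: `check_sample` is a hypothetical helper, the default quote character is an assumption, and the header test is only a heuristic (it flags a first row that has no digits when later rows do).

```python
import csv
import io


def check_sample(text, delimiter=",", quotechar='"'):
    """Return a list of checklist problems found in a small delimited sample."""
    problems = []

    # Each row must be terminated with a newline character.
    if not text.endswith("\n"):
        problems.append("last row is not terminated with a newline")

    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter=delimiter, quotechar=quotechar)
    rows = [row for row in reader if row]  # skip completely empty lines

    # Every record should split into the same number of fields.
    widths = {len(row) for row in rows}
    if len(widths) > 1:
        problems.append(f"rows have differing field counts: {sorted(widths)}")

    # Heuristic only: a digit-free first row above rows that contain digits
    # suggests a header row, which must be removed before importing.
    if len(rows) > 1:
        first_has_digit = any(ch.isdigit() for field in rows[0] for ch in field)
        rest_has_digit = any(ch.isdigit()
                             for row in rows[1:] for field in row for ch in field)
        if not first_has_digit and rest_has_digit:
            problems.append("first row looks like a header")

    return problems


sample = ("100,hello there,2017-01-01 00:00:00\n"
          "200,how are you,2017-02-01 00:00:00\n")
print(check_sample(sample))              # clean sample
print(check_sample("i,v,t\n" + sample))  # same sample with a header row added
```

An empty list means the sample passed these three checks; it does not guarantee that the import itself will succeed.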


## The IMPORT_DATA Command

Syntax for the `IMPORT_DATA` command looks like this:
```
call SYSCS_UTIL.IMPORT_DATA (
	schemaName,
	tableName,
	insertColumnList | null,
	fileOrDirectoryName,
	columnDelimiter | null,
	characterDelimiter | null,
	timestampFormat | null,
	dateFormat | null,
	timeFormat | null,
	badRecordsAllowed,
	badRecordDirectory | null,
	oneLineRecords | null,
	charset | null 
);
```
Notice that many of the parameters allow you to apply the default value by specifying `null`.

<p class="noteNote">You can find full details about these parameters, including the default value for each, in <a href="https://doc.splicemachine.com/sqlref_sysprocs_importdata.html" target="_blank">our Importing Data documentation.</a></p>
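
With thirteen positional parameters, it is easy to put a value in the wrong slot. One way to avoid that is to build the statement from named values. The sketch below is only an illustration: `build_import_call` and `sql_literal` are hypothetical helpers, not part of Splice Machine, and the parameter order is copied from the syntax shown above. Any parameter you leave out is rendered as `null`.

```python
# Parameter order copied from the IMPORT_DATA syntax shown above.
PARAMETERS = [
    "schemaName", "tableName", "insertColumnList", "fileOrDirectoryName",
    "columnDelimiter", "characterDelimiter", "timestampFormat", "dateFormat",
    "timeFormat", "badRecordsAllowed", "badRecordDirectory", "oneLineRecords",
    "charset",
]


def sql_literal(value):
    """Render a Python value as a SQL literal (None becomes null)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    # Strings: double any embedded single quote, then wrap in single quotes.
    return "'" + str(value).replace("'", "''") + "'"


def build_import_call(**params):
    """Hypothetical helper: build an IMPORT_DATA call from named parameters."""
    unknown = set(params) - set(PARAMETERS)
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    args = ",".join(sql_literal(params.get(name)) for name in PARAMETERS)
    return f"call SYSCS_UTIL.IMPORT_DATA({args});"


print(build_import_call(
    schemaName="splice",
    tableName="import_example",
    fileOrDirectoryName="s3a://splice-examples/import/example1.csv",
    badRecordsAllowed=0,
))
```

Note that `sql_literal("'")` yields four single quotes, which is how a single quote character has to be written when you pass it as a parameter value.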


### Example 1

This example allows you to walk through importing data one step at a time and see the results of that step.

#### 1. Create our Database Table

Run the next cell in this Notebook, which uses the Jupyter *%%sql* magic, to create a table in your Splice Machine database.

```
%%sql

create table import_example (i int, v varchar(20), t timestamp);
```

#### 2. Import a Small Data Sample

Now we'll import a small sample of our data to make sure that we've got our import set up correctly. We've created a sample data file named *example1.csv* that contains these two records:

<pre>100,hello there,2017-01-01 00:00:00
200,how are you,2017-02-01 00:00:00</pre>

Import the data in this file by running the next cell, which calls the `IMPORT_DATA` procedure:

```
%%sql

call SYSCS_UTIL.IMPORT_DATA('splice','import_example',null,'s3a://splice-examples/import/example1.csv',null,null,null,null,null,0,null,null,null);
```

<br />

After you run the cell, you'll see a short report that indicates how many rows were successfully loaded and how many failed to load. In this example, both rows were successfully loaded.

You have probably also noticed that we used default values by specifying `null` for all of the parameters that have defaults; here's what those defaults mean:

<table class="splicezepOddEven">
    <col />
    <col />
    <thead>
        <tr>
            <th>Parameter</th>
            <th>NULL Value Details</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td class="CodeFont">insertColumnList</td>
            <td>Our column list exactly matches the columns and ordering of columns in the table, so there's no need to specify a list.</td>
        </tr>
        <tr>
            <td class="CodeFont">columnDelimiter</td>
            <td>Our data uses the default comma character (<code>,</code>) to delimit columns.</td>
        </tr>
        <tr>
            <td class="CodeFont">characterDelimiter</td>
            <td>None of our data fields contain the comma character, so we don't need a character delimiter.</td>
        </tr>
        <tr>
            <td class="CodeFont">timestampFormat</td>
            <td>Our data matches the default timestamp format, which is <code>yyyy-MM-dd HH:mm:ss</code>.</td>
        </tr>
        <tr>
            <td class="CodeFont">dateFormat</td>
            <td>Our data doesn't contain any date columns, so there's no need to specify a format.</td>
        </tr>
        <tr>
            <td class="CodeFont">timeFormat</td>
            <td>Our data doesn't contain any time columns, so there's no need to specify a format.</td>
        </tr>
        <tr>
            <td class="CodeFont">badRecordDirectory</td>
            <td>We left this <code>null</code>, which is allowable, but not considered a good practice. Splice Machine advises specifying a bad record directory so that you can diagnose any record import problems.</td>
        </tr>
        <tr>
            <td class="CodeFont">oneLineRecords</td>
            <td>We were able to leave this as <code>null</code> because our records each fit on one line. If your data contains any newline characters, you must specify <code>false</code> for this parameter, and you must include delimiters around the data.</td>
        </tr>
        <tr>
            <td class="CodeFont">charset</td>
            <td>This parameter is currently ignored; Splice Machine assumes that your data uses utf-8 encoding.</td>
        </tr>
    </tbody>
</table>
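
If you're unsure whether your timestamps match a given format, you can test a few values locally before importing. This is a rough Python sketch under stated assumptions: `mismatched_timestamps` is a hypothetical helper, and the `strptime` patterns are hand-written approximations of the two format strings used in this tutorial, so they may not agree with Splice Machine's parser in every edge case.

```python
from datetime import datetime

# Hand-written strptime approximations of the format strings used in this
# tutorial. This mapping is an assumption for illustration only.
STRPTIME_EQUIVALENT = {
    "yyyy-MM-dd HH:mm:ss": "%Y-%m-%d %H:%M:%S",
    "yyyy-MM-dd HH:mm:ss.SSSSSS": "%Y-%m-%d %H:%M:%S.%f",
}


def mismatched_timestamps(values, timestamp_format="yyyy-MM-dd HH:mm:ss"):
    """Return the values that do not parse with the given format."""
    pattern = STRPTIME_EQUIVALENT[timestamp_format]
    bad = []
    for value in values:
        try:
            datetime.strptime(value, pattern)
        except ValueError:
            bad.append(value)
    return bad


# The two timestamps from example1.csv match the default format.
print(mismatched_timestamps(["2017-01-01 00:00:00", "2017-02-01 00:00:00"]))

# A value with fractional seconds does not match the default format.
print(mismatched_timestamps(["2017-01-01 00:00:00.123456"]))
```

Any value the function returns is one you would expect to need a different `timestampFormat` for.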


#### 3. Make Minor Data Changes and Corresponding Parameter Changes

Now we'll make a few minor changes to our input data and see how they affect our call to `IMPORT_DATA`.

The updated data, stored in *example2.csv*, looks like this:

<pre>
'hello
there'|2017-01-01 00:00:00.123456
'how, are you'|2017-02-01 00:00:00.123456
</pre>

Now import the data in the *example2.csv* file by running the next cell, which again uses the *%%sql* magic:


```
%%sql

call SYSCS_UTIL.IMPORT_DATA('splice','import_example','v,t','s3a://splice-examples/import/example2.csv','|','''','yyyy-MM-dd HH:mm:ss.SSSSSS',null,null,0,null,false,null);
```

<br />

Let's examine the changes in our call to `IMPORT_DATA` due to changes in our data file:

<table class="splicezepOddEven">
    <col />
    <col />
    <thead>
        <tr>
            <th>Parameter</th>
            <th>Details</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td class="CodeFont">insertColumnList</td>
            <td><p>In this case, our input data contains values for only two of the three columns in our table, so we specify the names of the columns we want to populate.</p>
                <p class="noteNote">The default value (or <code>null</code> if no default value is defined in the database) is inserted for any column that doesn't receive a value from the import, such as column <code>i</code> in this example.</p>
            </td>
        </tr>
        <tr>
            <td class="CodeFont">columnDelimiter</td>
            <td>At least one of our records includes a string that contains the default delimiter (comma), so we need to use a different delimiter character. Our sample file uses the <code>|</code> character.</td>
        </tr>
        <tr>
            <td class="CodeFont">characterDelimiter</td>
            <td><p>We want to be able to include commas and newlines in our input data fields, so we enclose string data in our input file in single quote (<code>'</code>) characters.</p>
                <p class="noteNote">You need to escape the single quote character in your parameter values, which is why you see four single quotes (<code>''''</code>).</p>
            </td>
        </tr>
        <tr>
            <td class="CodeFont">timestampFormat</td>
            <td>Our data now includes microseconds, so we need to change our format specification to <code>yyyy-MM-dd HH:mm:ss.SSSSSS</code>.</td>
        </tr>
        <tr>
            <td class="CodeFont">oneLineRecords</td>
            <td>One of our input records contains a newline, so we must explicitly set this value to <code>false</code>.</td>
        </tr>
    </tbody>
</table>
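
To see why these settings matter, it can help to parse the same data locally. The sketch below uses Python's standard `csv` module, which is not the parser that Splice Machine uses; it only illustrates how a `|` column delimiter and a single quote character delimiter let a field contain commas and newlines. The `parse` helper is hypothetical.

```python
import csv
import io

# The contents of example2.csv, as shown above.
example2 = (
    "'hello\n"
    "there'|2017-01-01 00:00:00.123456\n"
    "'how, are you'|2017-02-01 00:00:00.123456\n"
)


def parse(text, column_delimiter, character_delimiter):
    """Split delimited text into records, honoring quoted fields."""
    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter=column_delimiter,
                        quotechar=character_delimiter)
    return list(reader)


rows = parse(example2, "|", "'")
for v, t in rows:
    print(repr(v), t)

# Three physical lines in the file, but only two records.
print(len(example2.splitlines()), "lines,", len(rows), "records")
```

The first record spans two physical lines of the file, which is the situation that requires setting `oneLineRecords` to `false`.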


## Where to Go Next

Once you have successfully imported your own data, you're ready to run queries. See our next tutorial, [*Running Queries*](../2.%20Tutorials/Running%20Queries%20Tutorial.ipynb).

<p class="noteIcon">Our <a href="https://doc.splicemachine.com/sqlref_sysprocs_importdata.html" target="_blank">documentation for importing data</a> is extremely useful in handling your specific import cases, especially with respect to supported timestamp, date, and time data formats.</p>
