# Data Definition Language - DDL

You should have three files in the `/dataset/text/` HDFS folder:

- `/dataset/text/gutenberg_all.txt`
- `/dataset/text/holmes.txt`
- `/dataset/text/small.txt`

In [None]:
! hdfs dfs -ls /dataset/text

## Creating the Hive Database

In [None]:
%load_ext sql
%sql hive://hadoop@localhost:10000/

In [None]:
%%sql
create database if not exists text

Verify that the database `text` was created.
You can check this in two places:
1. In the output of the command below
2. In the [HDFS explorer](http://bdlc-XX.el.eee.intern:9870/explorer.html#/user/hive/warehouse/text.db), where the folder `text.db` should now exist

In [None]:
%sql show databases

## Using a Database

Since we already have an open session, we run `use text` to select the database. In other notebooks, you can connect to the database directly:

```python
%load_ext sql
%sql hive://hadoop@localhost:10000/text
```

In [None]:
%sql use text
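
The two connection strings used so far differ only in the trailing database name. As a minimal sketch of that pattern (the helper `hive_url` is hypothetical and not part of the notebook or of any library), the string could be assembled like this:

```python
def hive_url(user: str, host: str, port: int, database: str = "") -> str:
    """Build a connection string in the form passed to the %sql magic above.

    Hypothetical helper for illustration only.
    """
    return f"hive://{user}@{host}:{port}/{database}"


# Without a database name, the session starts without a preselected database.
print(hive_url("hadoop", "localhost", 10000))
# With a database name, no separate `use text` is needed afterwards.
print(hive_url("hadoop", "localhost", 10000, "text"))
```

The user, host, and port are the values already used in this notebook; adjust them if your cluster differs.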

## Defining the Tables

### Table: `raw_small`

In [None]:
%%sql

CREATE TABLE IF NOT EXISTS raw_small (
    line STRING
)
ROW FORMAT DELIMITED
LINES TERMINATED BY '\n'
STORED AS TEXTFILE



### Table: `raw_holmes`

In [None]:
%%sql

CREATE TABLE IF NOT EXISTS raw_holmes (
    line STRING
)
ROW FORMAT DELIMITED
LINES TERMINATED BY '\n'
STORED AS TEXTFILE

### Table: `raw_gutenberg`

In [None]:
%%sql

CREATE TABLE IF NOT EXISTS raw_gutenberg (
    line STRING
)
ROW FORMAT DELIMITED
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
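
The three definitions above differ only in the table name. As a rough sketch, assuming you would rather generate the statement than repeat it, the DDL could be built from Python. The function `raw_table_ddl` is a hypothetical helper, not something the notebook or Hive provides:

```python
def raw_table_ddl(table_name: str) -> str:
    """Return the CREATE TABLE statement for a one-column raw text table."""
    # Illustrative safety check: only accept plain identifiers as table names.
    if not table_name.isidentifier():
        raise ValueError(f"not a valid table name: {table_name!r}")
    return (
        f"CREATE TABLE IF NOT EXISTS {table_name} (\n"
        "    line STRING\n"
        ")\n"
        "ROW FORMAT DELIMITED\n"
        "LINES TERMINATED BY '\\n'\n"
        "STORED AS TEXTFILE"
    )


# Print the statement for each of the three tables defined in this notebook.
for name in ("raw_small", "raw_holmes", "raw_gutenberg"):
    print(raw_table_ddl(name))
    print()
```

Each printed statement matches the corresponding cell above and can be pasted into a `%%sql` cell.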

## Verifying the Three Tables

In [None]:
%sql show tables

## Inserting Data
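
`LOAD DATA INPATH` moves the source files into the Hive warehouse directory rather than copying them, so they disappear from their original location. Let us copy the files to `/user/hadoop` first so that the original data in `/dataset/text` stays intact.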

In [None]:
!hdfs dfs -cp /dataset/text/*.txt /user/hadoop

In [None]:
!hdfs dfs -ls /user/hadoop/*.txt

### Loading Data into `raw_small`

In [None]:
%%sql
LOAD DATA INPATH '/user/hadoop/small.txt'
OVERWRITE INTO TABLE raw_small

### Loading Data into `raw_holmes`

In [None]:
%%sql
LOAD DATA INPATH '/user/hadoop/holmes.txt'
OVERWRITE INTO TABLE raw_holmes

### Loading Data into `raw_gutenberg`

In [None]:
%%sql
LOAD DATA INPATH '/user/hadoop/gutenberg_all.txt'
OVERWRITE INTO TABLE raw_gutenberg
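
The three load statements follow one pattern: a file in the staging folder goes into the table of the matching name. A minimal sketch of that mapping in Python is shown here; `SOURCES` and `load_statement` are hypothetical names introduced only for illustration:

```python
# File-to-table mapping exactly as used in the cells above.
SOURCES = {
    "small.txt": "raw_small",
    "holmes.txt": "raw_holmes",
    "gutenberg_all.txt": "raw_gutenberg",
}


def load_statement(filename: str, table: str, staging_dir: str = "/user/hadoop") -> str:
    """Return the LOAD DATA statement that moves one staged file into a table."""
    return (
        f"LOAD DATA INPATH '{staging_dir}/{filename}'\n"
        f"OVERWRITE INTO TABLE {table}"
    )


statements = [load_statement(f, t) for f, t in SOURCES.items()]
for statement in statements:
    print(statement)
    print()
```

Because of `OVERWRITE`, rerunning a statement replaces the table's existing contents instead of appending to them.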

## Verifying the Data in the Tables

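First, the copied files should be gone from `/user/hadoop`, because `LOAD DATA INPATH` moved them into the warehouse. The listing below should therefore find no matching files: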

In [None]:
!hdfs dfs -ls /user/hadoop/*.txt

Check out the folders and files in the [HDFS explorer](http://bdlc-XX.el.eee.intern:9870/explorer.html#/user/hive/warehouse/text.db).

## Final Check

In [None]:
%sql select * from raw_small limit 1

In [None]:
%sql select * from raw_holmes limit 1

In [None]:
%sql select * from raw_gutenberg limit 1