### Show Databases

<pre>SHOW DATABASES;</pre>

----

### Create Database

<pre>CREATE DATABASE DATABASE_NAME; </pre>

----

### Describe Database

<pre>DESCRIBE DATABASE DATABASE_NAME;</pre>

----

### Use

<pre>USE DATABASE_NAME;</pre>

----

### Drop Database (by default the database must be empty before it can be dropped)

<pre>DROP DATABASE DATABASE_NAME;</pre>
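
<p>As a hedged sketch of the non-empty case: Hive also accepts a <b>CASCADE</b> clause, which drops the tables in the database first. <b>DATABASE_NAME</b> is the same placeholder used above.</p>

```sql
-- Default behaviour (RESTRICT): the statement fails if the database still contains tables.
DROP DATABASE IF EXISTS DATABASE_NAME;

-- CASCADE drops every table in the database, then the database itself.
DROP DATABASE IF EXISTS DATABASE_NAME CASCADE;
```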

----

### Current Database (display the current db in the cli)

<pre>set hive.cli.print.current.db=true;</pre>

### Display the Column Names 
<pre>set hive.cli.print.header=true;</pre>

----

### View Tables (shows all the tables in the Database)

<pre>SHOW TABLES;</pre>




## Tables in HIVE
<ul>
    <li>Managed/Internal Table</li>
    <li>External Table</li>
</ul>


### Managed/Internal table
<ul>
    <li>A managed table is also called an internal table. This is the default table type in Hive. <br> When
we create a table in Hive without specifying it as external, we get a
managed table by default.</li>
    <li>When we create a managed table, Hive stores its data under the warehouse directory
in HDFS.</li>
    <li>By default, the table data is stored under the /user/hive/warehouse directory of HDFS (controlled by hive.metastore.warehouse.dir).</li>
    <li><b>If we drop a managed table, both the table data (in HDFS) and the table metadata (in the metastore) are
deleted.</b></li>
</ul>
<pre>
create table if not exists emp_int    
(  
id int,  
name string,
sal int,  
city string   
)  
row format delimited fields terminated by ','  
stored as textfile  
tblproperties("skip.header.line.count"="1");
</pre>
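
<p>A minimal sketch of how the points above can be checked, assuming the <b>emp_int</b> table from the example exists in the current database.</p>

```sql
-- The "Table Type" field should read MANAGED_TABLE and "Location" should
-- point to a directory under the Hive warehouse path.
DESCRIBE FORMATTED emp_int;

-- Dropping a managed table removes the metadata and the data directory.
DROP TABLE IF EXISTS emp_int;
```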

----

### External Table
<ul>
    <li>An external table is used when the data is also used outside Hive.</li>
    <li>Whenever we want dropping the table to remove only its metadata and keep the table’s
data as it is, we use an external table.</li>
    <li><b>Dropping an external table deletes only the schema (metadata) of the table; the data files are left in place.</b></li>
</ul>
<pre>
create external table if not exists emp_ext
(   
id int,   
name string, 
sal int,   
city string    
)   
row format delimited fields terminated by ','   
stored as textfile   
tblproperties("skip.header.line.count"="1");
</pre>
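
<p>External tables are usually pointed at an existing directory with a <b>LOCATION</b> clause. The sketch below assumes a hypothetical table name <b>emp_ext_demo</b> and a hypothetical HDFS directory; neither comes from these notes, so adjust both to your environment.</p>

```sql
-- Hypothetical table name and HDFS path, used only for illustration.
create external table if not exists emp_ext_demo
(
id int,
name string,
sal int,
city string
)
row format delimited fields terminated by ','
stored as textfile
location '/user/hive/external/emp_ext_demo';

-- Removes only the table definition; the files under LOCATION stay in HDFS.
drop table if exists emp_ext_demo;
```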

### Truncate Table
<pre>TRUNCATE TABLE TABLE_NAME;</pre>

----

### Load Data:

<pre>LOAD DATA LOCAL INPATH '/home/saif/LFS/datasets/emp_all.txt' INTO TABLE emp_all_temp;</pre>


<p><b>Note:</b> <br>To load a file from the local file system of the edge node, use the <b>LOCAL</b> keyword. <br>
                To load a file that is already in HDFS, omit the <b>LOCAL</b> keyword from the command.
</p>
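
<p>For comparison, a sketch of the HDFS variant. The HDFS path is hypothetical; <b>emp_all_temp</b> is the table from the command above.</p>

```sql
-- No LOCAL keyword: the path is resolved in HDFS (hypothetical path).
-- The file is moved into the table's directory rather than copied.
LOAD DATA INPATH '/tmp/datasets/emp_all.txt' INTO TABLE emp_all_temp;

-- OVERWRITE replaces the existing contents of the table instead of appending.
LOAD DATA INPATH '/tmp/datasets/emp_all.txt' OVERWRITE INTO TABLE emp_all_temp;
```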

----

### Alter Table
Melwin: refer to the notes.

### Functions:
Melwin: refer to the notes.

## Hive Partitions

### What are Partitions?
<p>
Hive partitioning organizes a table into parts based on the values of partition keys. <br>
Partitioning a table means dividing its data according to the values of
particular columns such as date or country,<br> so that the input records are segregated into different
directories, one per date or country value.
</p>
<p>
    Partitioning can be done on more than one column, which <b>imposes a multidimensional structure on the directory storage</b>. For example, in addition to partitioning records by
the date column, we can subdivide each day's records into separate country-wise directories
by adding the country column to the partitioning.
</p>
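
<p>A minimal sketch of partitioning on two columns. The table name <b>emp_by_date_country</b> and the partition values in the comments are hypothetical and only illustrate the nested directory layout.</p>

```sql
-- Hypothetical table partitioned first by date, then by country.
create table if not exists emp_by_date_country(
id int,
name string,
sal int
)
partitioned by (dt string, country string)
row format delimited fields terminated by ','
stored as textfile;

-- Each (dt, country) pair gets its own nested directory under the table location, e.g.
--   .../emp_by_date_country/dt=2024-01-01/country=IN/
--   .../emp_by_date_country/dt=2024-01-01/country=US/
```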

<b>Advantages:</b>
<ol>
    <li>Partitioning distributes the execution load horizontally.</li>
    <li>Because the data is stored in slices/parts, a query can process only the relevant small
parts of the data instead of scanning the entire data set, so response time is faster.</li>
    <li>For example, in a large table partitioned by country, selecting users
of country ‘IN’ scans only the directory ‘country=IN’ instead of all directories.</li>
</ol>

<b>Disadvantages:</b>
<ul>
    <li>Having too many partitions in a table creates a large number of files/directories in HDFS, <br>
which is an overhead for the NameNode since it has to keep all metadata for the file system in
memory.</li>
    <li>Partitions may optimize some queries that filter on the partition column in the WHERE clause, but other queries (for example, those
based on grouping) may be less responsive.</li>
</ul>

<pre>
create table if not exists emp_all(
id int,
name string,
sal int
)
partitioned by (country string)
row format delimited fields terminated by ','
stored as textfile;
</pre>

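<p><b>NOTE:</b> the <b>partition column must not be repeated in the column list</b> of the table; it is declared only in the partitioned by clause.<br> Also leave out tblproperties("skip.header.line.count"="1") here, because this table is filled with insert ... select and its files carry no header line.</p>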

### Inserting Data into Partitioned Tables
<ul>
    <li>Static Partition</li>
    <li>Dynamic Partition</li>
</ul>

### Static Partition
<p>Note: <br>we reuse the table <b>emp_all_temp</b> from the previous section, which holds all the data. <br>
      We load the data from <b>emp_all_temp</b> into <b>emp_all</b> using static partitions, one insert per partition value.</p>
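
<p>The definition of <b>emp_all_temp</b> is not shown in these notes. The sketch below is an assumed definition, inferred from the columns used in the insert statements (id, name, sal, country); the header-skipping property is also an assumption about the source file.</p>

```sql
-- Assumed staging table: unpartitioned, with country as an ordinary column.
create table if not exists emp_all_temp(
id int,
name string,
sal int,
country string
)
row format delimited fields terminated by ','
stored as textfile
-- Assumes the source file starts with a header line; remove this otherwise.
tblproperties("skip.header.line.count"="1");
```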
<pre>
insert overwrite table emp_all partition (country='IN') select id,name,sal from emp_all_temp where country='IN';

insert overwrite table emp_all partition (country='US') select id,name,sal from emp_all_temp where country='US';

insert overwrite table emp_all partition (country='UK') select id,name,sal from emp_all_temp where country='UK';
</pre>
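
<p>Once the partitions are loaded, a filter on the partition column lets Hive read only the matching directory. A short sketch against the <b>emp_all</b> table from above:</p>

```sql
-- Reads only the country=IN partition directory of emp_all.
select id, name, sal from emp_all where country='IN';

-- The partition column can be selected and grouped like a normal column.
select country, count(*) as emp_count from emp_all group by country;
```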

### Dynamic Partition

<p>For dynamic partitioning we first need to enable the following settings:</p>

<pre>
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions=100;
set hive.exec.max.dynamic.partitions.pernode=100;

create table if not exists emp_all_dynamic(
id int,
name string,
sal int
)
partitioned by (country string)
row format delimited fields terminated by ','
stored as textfile;


insert overwrite table emp_all_dynamic partition (country) select id,name,sal,country from emp_all_temp;
</pre>

<p><b>Note:</b> for a dynamic partition insert, the <b>partition column must be the last column of the select query</b>.</p>
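
<p>A small sketch for checking the result of the dynamic insert, assuming the <b>emp_all_dynamic</b> table above has been loaded. Hive should have created one partition per distinct country value found in <b>emp_all_temp</b>.</p>

```sql
-- Lists the partitions created by the dynamic insert, e.g. country=IN.
show partitions emp_all_dynamic;

-- Row count per partition, to compare against the staging table.
select country, count(*) as emp_count from emp_all_dynamic group by country;
```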

### Show Partitions of a Table
<pre>
SHOW PARTITIONS Partition_Table_Name;
</pre>

### Drop Partitions

<pre>ALTER TABLE Partition_Table_Name DROP IF EXISTS PARTITION (Column_Name='Column_Value');</pre>
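
<p>A concrete sketch using the <b>emp_all</b> table from the partition examples. The partition spec takes the partition column together with the value to drop; the value 'CA' in the second statement is hypothetical.</p>

```sql
-- Drops the country=UK partition; for a managed table its data is removed too.
ALTER TABLE emp_all DROP IF EXISTS PARTITION (country='UK');

-- The reverse operation: registers an empty partition (hypothetical value).
ALTER TABLE emp_all ADD IF NOT EXISTS PARTITION (country='CA');
```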

## Hive Buckets

<p>Hive partitioning divides a table into a number of partitions, and these partitions can be further
subdivided into more manageable parts known as buckets or clusters. Bucketing
is based on a <b>hash function</b>, which depends on the type of the bucketing column.
Records with the same value in the bucketing column are always saved in the same bucket. </p>

<pre>
    set hive.exec.dynamic.partition=true;
    set hive.exec.dynamic.partition.mode=nonstrict;
    set hive.exec.max.dynamic.partitions=100;
    set hive.exec.max.dynamic.partitions.pernode=100;
    set hive.enforce.bucketing = true;
</pre>
<pre>
    CREATE TABLE REAL_ESTATE_BUCKET
    (
        street string,
        zip int,
        state string,
        beds int,
        baths int,
        sq_ft int,
        type string,
        price int
    )
    partitioned by (city string)
    clustered by (street) into 4 buckets
    row format delimited fields terminated by ','
    lines terminated by '\n'
    stored as textfile;
</pre>
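
<p>A hedged sketch of filling the bucketed table. It assumes a hypothetical unpartitioned staging table <b>real_estate_temp</b> that holds the same columns plus <b>city</b>; that table is not defined in these notes. With the settings above, Hive distributes rows into the 4 buckets by hashing <b>street</b>.</p>

```sql
-- real_estate_temp is a hypothetical staging table with the raw data.
-- The partition column (city) comes last in the select list.
insert overwrite table REAL_ESTATE_BUCKET partition (city)
select street, zip, state, beds, baths, sq_ft, type, price, city
from real_estate_temp;

-- Each city partition should now hold up to 4 bucket files.
show partitions REAL_ESTATE_BUCKET;
```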

### Advantages of Bucketing
<ul>
    <li>It provides faster query responses, like partitioning.</li>
    <li>Because bucketing spreads the data into buckets of roughly equal volume, map-side joins will be
quicker.</li>
</ul>
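
<p>One way to see buckets at work is bucket sampling, sketched below against the <b>REAL_ESTATE_BUCKET</b> table from above. This is an illustration only; how much the query actually speeds up depends on the Hive version and execution engine.</p>

```sql
-- Samples the rows that hash into bucket 1 of 4 on the bucketing column.
select street, city, price
from REAL_ESTATE_BUCKET tablesample(bucket 1 out of 4 on street);
```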