#### Amazon EMR Hadoop Hive configuration
* Download and install the Amazon Hive ODBC driver (AmazonHiveODBC).
* Drivers can be downloaded from the page below:
    * https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-bi-tools.html
* Download the Hive ODBC driver, version 2.6.9.1009.
* After installation, open the ODBC Data Source Administrator program and check that the System DSN tab lists Sample Amazon Hive DSN.
<img src="ODBC.png" width="700">

#### Provision an AWS EMR Cluster using the following CLI command
* Use the CLI command below to provision a new EMR cluster:
<code>
aws emr create-cluster --name My-Hive-Cluster --use-default-roles --release-label emr-6.1.0 --instance-count 3 --instance-type m5a.xlarge --applications Name=Hadoop Name=JupyterHub Name=Hive Name=Spark --ec2-attributes KeyName=My-Cluster-KP --log-uri s3://covid-19-tracker-2020/logs/
</code>
<img src="cli.png" width="700">
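
The one-line command above is easy to mistype. As a minimal sketch (not part of the original setup), the same arguments can be assembled in Python and printed or passed to `subprocess.run`. The helper name `build_create_cluster_cmd` and its parameters are hypothetical; the flags and values are copied from the command above, and actually running it assumes the AWS CLI is installed and configured.

```python
import shlex


def build_create_cluster_cmd(name, key_name, log_uri,
                             release_label="emr-6.1.0",
                             instance_type="m5a.xlarge",
                             instance_count=3,
                             applications=("Hadoop", "JupyterHub", "Hive", "Spark")):
    """Return the `aws emr create-cluster` command as an argument list."""
    cmd = [
        "aws", "emr", "create-cluster",
        "--name", name,
        "--use-default-roles",
        "--release-label", release_label,
        "--instance-count", str(instance_count),
        "--instance-type", instance_type,
        "--applications",
    ]
    # Each application is passed as a separate Name=<app> token.
    cmd += [f"Name={app}" for app in applications]
    cmd += ["--ec2-attributes", f"KeyName={key_name}", "--log-uri", log_uri]
    return cmd


command = build_create_cluster_cmd(
    name="My-Hive-Cluster",
    key_name="My-Cluster-KP",
    log_uri="s3://covid-19-tracker-2020/logs/",
)
# Print the shell-ready command for review before running it.
print(shlex.join(command))
```

To launch the cluster from Python instead of the shell, the list could be handed to `subprocess.run(command, check=True)`.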
* Configure access to JupyterHub and Hive from your IP address.
    * In the master node's security group, create an inbound rule that allows access to ports 10000 (Hive) and 9443 (JupyterHub) from your IP address.
    <img src="Inbound-Rule.png" width="700">
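
Once the inbound rule is in place, it can help to confirm that both ports are reachable before moving on. The following is a minimal sketch using only Python's standard library; `port_is_open` and `check_emr_ports` are hypothetical helper names, and the host you pass in is assumed to be your own master node's public DNS.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def check_emr_ports(host, ports=(10000, 9443)):
    """Map each port (Hive 10000, JupyterHub 9443 by default) to its reachability."""
    return {port: port_is_open(host, port) for port in ports}
```

Calling `check_emr_ports("<your-master-node-DNS>")` returns a dictionary keyed by port; a `False` value usually means the security group rule, or the source IP address in it, needs another look.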

#### Hive Configuration on JupyterHub notebook
* Go to JupyterHub and create a new notebook. My notebook is named My-Hive-Notebook.
* Write the script for our Spark application in the notebook.
* We will modify the script slightly to create Hive tables from our dataframes.
* Using df_Final, our final combined dataset of Canada's COVID-19 statistics, register a temp table.
<code>
df_Final.createOrReplaceTempView('hive_temp')
</code>
Temp table: a temporary, session-scoped Spark SQL view over our dataframe (not a physical Hive table) that lets us run SQL-like queries with Spark SQL. It is dropped when the Spark session ends, so it does not survive terminating the cluster.
<img src="Notebook.png" width="700">
* Create a physical table to hold the records of the temporary table.
<code>
spark.sql('DROP TABLE IF EXISTS hive_table')
spark.sql('CREATE TABLE hive_table (Date string, Country string, Province string, Latitude double, Longitude double, Confirmed int, Recovered int, Deaths int) USING CSV OPTIONS (path "s3://covid-19-tracker-2020/hive/tables/hive-table.csv", header "true")')
</code>
<img src="Physical_Table.png" width="700">
* Insert the temporary table records into the Hive physical table.
<code>
spark.sql('INSERT INTO hive_table SELECT * FROM hive_temp')
</code>
<img src="Insert_Records.png" width="700">
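
The `CREATE TABLE` statement used in this section is long enough that a misplaced quote or comma is easy to miss. As a sketch only, the statement can be assembled from a list of column definitions in plain Python; `build_create_table_sql` is a hypothetical helper, while the table name, columns, and S3 path are the ones used above.

```python
def build_create_table_sql(table, columns, path, header=True):
    """Build a Spark SQL CREATE TABLE ... USING CSV statement.

    columns is a list of (name, type) pairs in the desired column order.
    """
    column_list = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    header_option = "true" if header else "false"
    return (
        f"CREATE TABLE {table} ({column_list}) USING CSV "
        f'OPTIONS (path "{path}", header "{header_option}")'
    )


# Column names and types taken from the statement in this section.
COLUMNS = [
    ("Date", "string"),
    ("Country", "string"),
    ("Province", "string"),
    ("Latitude", "double"),
    ("Longitude", "double"),
    ("Confirmed", "int"),
    ("Recovered", "int"),
    ("Deaths", "int"),
]

ddl = build_create_table_sql(
    "hive_table",
    COLUMNS,
    "s3://covid-19-tracker-2020/hive/tables/hive-table.csv",
)
print(ddl)
```

In the notebook, the resulting string could be passed to `spark.sql(ddl)`, assuming `spark` is the active Spark session as in the cells above.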

#### Tableau Setup for EMR Hadoop Hive as Data Source
* Launch Tableau Desktop Professional and choose Amazon EMR Hadoop Hive as the data source.
* In the server configuration, enter your EMR master node DNS as the Server address and 10000 (Hive) as the Port. Set the username to hive and click Sign In.
<img src="Hive_Server.png" width="700">
* Choose the default schema, then drag and drop hive_table into your data source.
<img src="Hive_Table.png" width="700">

#### Tableau Dashboard for COVID-19
* Map of the number of confirmed cases.
    * Drag the Latitude dimension into Rows and the Longitude dimension into Columns; Tableau generates a map of Canada.
    * Drag the Confirmed measure onto Size on the Marks card. The size of each mark then shows the number of cases in each province.
    <img src="Confirmed_Map.png" width="700">
* Graph of the number of confirmed cases.
    * Drag the Date dimension into Columns, then click it and select Day.
    * Drag the Confirmed measure into Rows.
    * Under Marks, set the mark type to Bar.
    <img src="Confirmed_Bar.png" width="700">
* Graph of the number of confirmed deaths.
    * Drag the Date dimension into Columns, then click it and select Day.
    * Drag the Deaths measure into Rows.
    * Under Marks, set the mark type to Bar.
    <img src="Deaths_Bar.png" width="700">
* Confirmed cases by province.
    * Drag the Confirmed measure into Columns.
    * Drag the Province dimension into Rows.
    * Drag the Confirmed measure onto both Color and Label on the Marks card. Sort Province in descending order.
    <img src="Confirmed_Measure.png" width="700">
* Confirmed deaths by province.
    * Drag the Deaths measure into Columns.
    * Drag the Province dimension into Rows.
    * Drag the Deaths measure onto both Color and Label on the Marks card. Sort Province in descending order.
    <img src="Deaths_Measure.png" width="700">
* Add a label for confirmed cases to date.
    * Drag the Confirmed measure onto Text on the Marks card.
    * Click Text and edit the label to change the font and color.
    * Right-click the title and click Hide Title.
    <img src="Confirmed_Label.png" width="700">
* Add a label for deaths to date.
    * Drag the Deaths measure onto Text on the Marks card.
    * Click Text and edit the label to change the font and color.
    * Right-click the title and click Hide Title.
    <img src="Deaths_Label.png" width="700">
* Create the dashboard.
    * Under Dashboard, click New Dashboard.
    * Drag each sheet into the dashboard container and resize it.
    * Click the container for each sheet and click Hide Label.
    * To add a text label for each container, drag the Text object into the dashboard for each sheet.
    * Export the dashboard as an image.
    <img src="Dashboard.png" width="700">