#### Provisioning an Amazon Redshift Cluster
* In the AWS Management Console, under Analytics, choose Amazon Redshift, then click Create cluster.
* Under Cluster identifier, enter a name for your cluster. I named mine My-Cluster.
* Choose Free trial. A free trial cluster defaults to a single dc2.large node with 2 vCPUs.
<img src="Cluster_Configuration.png" width="700">
* Under Database configurations, set the database name to dev and the database port to 5439 (the Redshift default).
* Also choose an admin username and password for the database.
<img src="Database_Configuration.png" width="700">

#### Query Editor
* In the Redshift console, open the Query editor.
* Click Connect to database and connect using your database credentials.
<img src="Connect_Database.png" width="700">
* Run the following query to create a new table in Amazon Redshift:

```sql
CREATE TABLE public.phi_table (
  phi_Date      date    NOT NULL,
  phi_Province  varchar NOT NULL,
  phi_Confirmed integer,
  phi_Deaths    integer
);
```
<img src="Query_Editor.png" width="700">

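The same connection and DDL can be run from Python instead of the Query editor. This sketch assumes the `redshift_connector` driver is installed (`pip install redshift-connector`); the endpoint is a hypothetical placeholder, so copy the real one from your cluster's details page. The JDBC form of the endpoint is what JDBC-based clients, such as an AWS Glue connection, expect.

```python
# Hypothetical endpoint: replace with the one shown for your cluster.
ENDPOINT = "my-cluster.abc123xyz789.us-east-1.redshift.amazonaws.com"

CREATE_PHI_TABLE = """
CREATE TABLE public.phi_table (
  phi_Date      date    NOT NULL,
  phi_Province  varchar NOT NULL,
  phi_Confirmed integer,
  phi_Deaths    integer
);
"""


def connection_kwargs(host: str, user: str, password: str,
                      database: str = "dev", port: int = 5439) -> dict:
    """Keyword arguments for redshift_connector.connect()."""
    return {"host": host, "database": database, "port": port,
            "user": user, "password": password}


def jdbc_url(host: str, database: str = "dev", port: int = 5439) -> str:
    """JDBC form of the same endpoint."""
    return f"jdbc:redshift://{host}:{port}/{database}"


def create_phi_table(user: str, password: str, host: str = ENDPOINT) -> None:
    """Connect and run the CREATE TABLE statement."""
    import redshift_connector  # imported lazily; requires the driver package

    conn = redshift_connector.connect(**connection_kwargs(host, user, password))
    try:
        cursor = conn.cursor()
        cursor.execute(CREATE_PHI_TABLE)
        conn.commit()  # autocommit is off by default, so commit the DDL
    finally:
        conn.close()
```

Keeping the connection parameters in one helper means the Python client and the JDBC URL cannot drift apart on database name or port.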
#### Using AWS Glue to Import Data into Redshift
There are several ways to import data from an S3 bucket into a Redshift table, including:
1. COPY command: COPY loads data in parallel from Amazon S3, Amazon EMR, Amazon DynamoDB, or multiple data sources on remote hosts. It loads large amounts of data much more efficiently than INSERT statements, and it also stores the data more effectively.

        copy <table_name> from 's3://<bucket_name>/<manifest_file>'
        authorization
        manifest;
        
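2. AWS Glue job: crawl the source files and the target table, then run a Glue job that moves the data. This is the method used below.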

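For method 1, a concrete COPY into the `phi_table` created earlier could be assembled as below. This is a sketch assuming the source is CSV with one header row; the object key `input/phi.csv` is hypothetical, the bucket and IAM role are the ones used for UNLOAD later in this walkthrough, and that role must be associated with the cluster and allowed to read the bucket.

```python
from typing import Optional


def sql_literal(value: str) -> str:
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_copy(table: str, s3_uri: str, iam_role_arn: str,
               region: Optional[str] = None) -> str:
    """COPY a headered CSV object (or every object under a prefix) into `table`."""
    lines = [
        f"COPY {table}",
        f"FROM {sql_literal(s3_uri)}",
        f"IAM_ROLE {sql_literal(iam_role_arn)}",
        "FORMAT AS CSV",
        "IGNOREHEADER 1",  # skip the header row of each file
    ]
    if region:  # only needed when the bucket is in another Region
        lines.append(f"REGION {sql_literal(region)}")
    return "\n".join(lines) + ";"


stmt = build_copy(
    "public.phi_table",
    "s3://covid-19-tracker-2020/input/phi.csv",  # hypothetical object key
    "arn:aws:iam::234610110457:role/My-Redshift-IAM-Role",
)
```

COPY maps CSV fields to table columns by position, so the file's columns would need to be in the order date, province, confirmed, deaths, or the statement would need an explicit column list after the table name.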
* Open the AWS Glue console and create a classifier.
    * Set Classifier type to CSV and choose the column delimiter your CSV files use. Give the classifier a name and click Create.
<img src="Add_Classifier.png" width="700">
* Create a crawler to connect to the data source, which in our case is the S3 bucket.
    * Go to the Crawlers section and click Add crawler. Give the crawler a name and select the classifier created above.
    <img src="Add_Crawler.png" width="700">
    * Set Crawler source type to Data stores, choose S3 as the data store, and enter the path to your S3 bucket.
    <img src="Add_Data_Store.png" width="700">
    * Choose an IAM role that allows the crawler to access the S3 data store.
    * Click Add database to create a new Data Catalog database to hold the tables this crawler creates.
    <img src="Crawler_Output.png" width="700">
    * Click Finish to create the CSV crawler, then click Run crawler to import metadata from your source files into Data Catalog tables.
* Set up a Redshift connection.
    * In AWS Glue, go to Connections and click Add connection.
    * Enter a connection name and set Connection type to Amazon Redshift.
    <img src="Add_Connection.png" width="700">
    * Choose your Redshift cluster, enter the database username and password, and click Finish.
    * To test this connection successfully, AWS Glue must be able to reach S3 from inside the cluster's VPC, so configure a VPC endpoint for S3 first.
* Set up a VPC gateway endpoint from AWS Glue to S3.
    * Open the Amazon VPC console and click Endpoints.
    * Click Create endpoint. Under Service category choose AWS services, and under Service name choose com.amazonaws.us-east-1.s3 (the S3 service in us-east-1; use your cluster's Region if it differs).
    <img src="Create_Endpoint.png" width="700">
    * Choose your default VPC and its route table ID, then click Create endpoint.
* Create a second crawler that connects to the Redshift database and infers the schema of the Redshift tables.
    * In AWS Glue, click Add crawler and give the crawler a name.
    * Set Crawler source type to Data stores.
    * For the data store choose JDBC, and under Connection choose the connection defined earlier (RedshiftCluster).
    * Enter the include path in the format database/schema/table (for example, dev/public/phi_table) and click Next.
    <img src="Edit_Crawler.png" width="700">
    * Choose the existing IAM role and click Next.
    * Choose the output Data Catalog database and click Next. Review the settings, click Finish, and then run the crawler so the Redshift table appears in the Data Catalog.
    <img src="Output_Database.png" width="700">
* Create a job to transfer the S3 data to the Redshift table.
    * In AWS Glue, go to Jobs and click Add job.
    * Give the job a name, choose the existing Glue IAM role, and click Next.
    <img src="Add_Job.png" width="700">
    * Choose the S3 table in the Data Catalog as the data source and click Next.
    <img src="Data_Source.png" width="700">
    * Choose the Redshift table in the Data Catalog as the data target and click Next.
    <img src="Data_Target.png" width="700">
    * Map the source columns to the columns of the target table.
    <img src="Output_Schema.png" width="700">
    * Click Save job and edit script.
    * Click Run job.

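The column mapping configured in the job wizard ends up in the generated PySpark script as an `ApplyMapping` transform whose mappings are (source column, source type, target column, target type) tuples. Glue's own classes only run inside a Glue environment, so the sketch below is a plain-Python stand-in, not Glue's implementation, that applies the same kind of tuples to CSV rows to show what the mapping does. The source column names and types are hypothetical; use the ones your S3 crawler actually recorded.

```python
import csv
import io
from datetime import date
from typing import Callable, Dict, List, Optional, Tuple

# (source column, source type, target column, target type), the same shape
# as the tuples in an ApplyMapping call. Source names are hypothetical.
MAPPINGS: List[Tuple[str, str, str, str]] = [
    ("date", "string", "phi_date", "date"),
    ("province", "string", "phi_province", "string"),
    ("confirmed", "long", "phi_confirmed", "int"),
    ("deaths", "long", "phi_deaths", "int"),
]

# Converters for the target types used above.
_CAST: Dict[str, Callable[[str], object]] = {
    "date": date.fromisoformat,
    "string": str,
    "int": int,
}


def apply_mapping(row: Dict[str, str]) -> Dict[str, Optional[object]]:
    """Rename and cast one source record; empty fields become None (NULL)."""
    out: Dict[str, Optional[object]] = {}
    for src, _src_type, dst, dst_type in MAPPINGS:
        raw = row.get(src, "")
        out[dst] = _CAST[dst_type](raw) if raw != "" else None
    return out


SAMPLE = "date,province,confirmed,deaths\n2020-01-26,Ontario,1,0\n"
rows = [apply_mapping(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
```

Because `phi_date` and `phi_province` are declared NOT NULL in Redshift, a source row with either field empty would map to None here and would likely be rejected when the job writes to the table.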
#### Performing Analysis Using Redshift Query Editor
* Run the following query to confirm that the table was populated:

```sql
SELECT *
FROM public.jh_table
WHERE jh_province = 'Ontario';
```
<img src="Query_1.png" width="700">
Output:

```text
jh_date,jh_country,jh_province,jh_lat,jh_long,jh_confirmed,jh_recovered,jh_deaths
2020-01-22,Canada,Ontario,51.2538,-85.3232,0,,0
2020-01-24,Canada,Ontario,51.2538,-85.3232,0,,0
2020-01-26,Canada,Ontario,51.2538,-85.3232,1,,0
2020-01-28,Canada,Ontario,51.2538,-85.3232,1,,0
2020-01-30,Canada,Ontario,51.2538,-85.3232,1,,0
2020-02-01,Canada,Ontario,51.2538,-85.3232,3,,0
2020-02-03,Canada,Ontario,51.2538,-85.3232,3,,0
2020-02-05,Canada,Ontario,51.2538,-85.3232,3,,0
2020-02-07,Canada,Ontario,51.2538,-85.3232,3,,0
2020-02-09,Canada,Ontario,51.2538,-85.3232,3,,0
```
* Export the final table to the S3 bucket in a different file format than the source data. This uses the UNLOAD command, which unloads the result of a query to one or more text or Apache Parquet files on Amazon S3, using Amazon S3 server-side encryption (SSE-S3). Its syntax is:

        UNLOAD ('select-statement')
        TO 's3://object-path/name-prefix'
        authorization
        [ option [ ... ] ]
```sql
UNLOAD ('SELECT * FROM final_table')
TO 's3://covid-19-tracker-2020/redshift/output/'
IAM_ROLE 'arn:aws:iam::234610110457:role/My-Redshift-IAM-Role'
FORMAT PARQUET; -- write Parquet instead of the default pipe-delimited text
```
<img src="Unload.png" width="700">
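
Because UNLOAD takes its query as a quoted string, any single quotes inside that query must be doubled. As a sketch (the helper names are mine, not part of any AWS library), the statement can be assembled like this, here unloading only the Ontario rows queried earlier to a hypothetical `ontario_` file prefix:

```python
def sql_literal(value: str) -> str:
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_unload(select_sql: str, s3_prefix: str, iam_role_arn: str,
                 fmt: str = "PARQUET") -> str:
    """UNLOAD `select_sql` to files under `s3_prefix` in the given format."""
    if fmt not in {"PARQUET", "CSV"}:  # formats handled by this sketch
        raise ValueError(f"unsupported format: {fmt}")
    return (
        f"UNLOAD ({sql_literal(select_sql)})\n"
        f"TO {sql_literal(s3_prefix)}\n"
        f"IAM_ROLE {sql_literal(iam_role_arn)}\n"
        f"FORMAT AS {fmt};"
    )


stmt = build_unload(
    "SELECT * FROM public.jh_table WHERE jh_province = 'Ontario'",
    "s3://covid-19-tracker-2020/redshift/output/ontario_",  # hypothetical prefix
    "arn:aws:iam::234610110457:role/My-Redshift-IAM-Role",
)
```

The first line of `stmt` comes out as `UNLOAD ('SELECT * FROM public.jh_table WHERE jh_province = ''Ontario''')`, which is the doubled-quote form UNLOAD expects for literals inside the query.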