PREPARATION FOR eu-west-1
==

In [None]:
!aws s3 sync s3://aws-potus/split/ s3://$USER-$WORKSHOP-aws-bigdata-workshop/split/

In [None]:
!aws s3api put-object --bucket $USER-$WORKSHOP-aws-bigdata-workshop --key parquet/clinton/

In [None]:
!aws s3api put-object --bucket $USER-$WORKSHOP-aws-bigdata-workshop --key parquet/trump/

In [None]:
!echo "BUCKETNAME on eu-west-1 : $USER-$WORKSHOP-aws-bigdata-workshop"
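The cells above all rely on the same bucket name, built from the `$USER` and `$WORKSHOP` environment variables. Since that name has to be pasted into the Athena and Glue steps later, it can help to assemble and sanity-check it once. The following is a minimal sketch in Python; the helper names `workshop_bucket` and `is_valid_bucket_name` are hypothetical (not part of the workshop), and the check covers only the basic S3 naming rules, not the full rule set.

```python
import os
import re

# Basic S3 naming rules: 3-63 characters, lowercase letters, digits,
# dots and hyphens, starting and ending with a letter or digit.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def workshop_bucket(user, workshop):
    """Build the bucket name the same way the shell cells do."""
    return f"{user}-{workshop}-aws-bigdata-workshop"


def is_valid_bucket_name(name):
    """Check the basic S3 bucket naming rules (assumption: not exhaustive)."""
    return bool(_BUCKET_RE.match(name))


if __name__ == "__main__":
    # USER and WORKSHOP are expected in the environment, as in the cells above.
    name = workshop_bucket(os.environ.get("USER", ""), os.environ.get("WORKSHOP", ""))
    print(name, "valid" if is_valid_bucket_name(name) else "INVALID")
```

If the name is reported as invalid (for example because `$USER` contains uppercase letters or an underscore), the `aws s3` commands above would likely have failed as well.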

GLUE to convert our CSV to Parquet (eu-west-1)
===

Create Athena tables

- Go to https://console.aws.amazon.com/athena/home?region=eu-west-1#
- In the query editor, run the following statement (replace the bucket name with your own):

```
CREATE EXTERNAL TABLE IF NOT EXISTS default.clinton ( 
      `id` bigint, 
      `name` string, 
      `message` string, 
      `ts` bigint, 
      `isodate` timestamp, 
      `date` string 
    ) 
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
    WITH SERDEPROPERTIES ( 
      'serialization.format' = ',', 
      'field.delim' = '|' 
    ) LOCATION 's3://PLEASE-REPLACE-WITH-YOUR-BUCKET-NAME-ON-eu-west-1/split/Hillary/' 
    TBLPROPERTIES ('has_encrypted_data'='false'); 
```
- Then run a second statement for the Trump table:

```
CREATE EXTERNAL TABLE IF NOT EXISTS default.trump ( 
      `id` bigint, 
      `name` string, 
      `message` string, 
      `ts` bigint, 
      `isodate` timestamp, 
      `date` string 
    ) 
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
    WITH SERDEPROPERTIES ( 
      'serialization.format' = ',', 
      'field.delim' = '|' 
    ) LOCATION 's3://PLEASE-REPLACE-WITH-YOUR-BUCKET-NAME-ON-eu-west-1/split/Donald/' 
    TBLPROPERTIES ('has_encrypted_data'='false');
```
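The two statements differ only in the table name and the S3 folder, so the bucket placeholder has to be replaced in both by hand. As a sketch of one way to avoid copy-paste mistakes, the statement can be rendered from a template. The helper `csv_table_ddl` and the constant `CSV_DDL_TEMPLATE` are hypothetical names introduced for illustration; the rendered text is meant to be pasted into the Athena query editor.

```python
# Hypothetical template mirroring the CREATE TABLE statements shown above.
CSV_DDL_TEMPLATE = """CREATE EXTERNAL TABLE IF NOT EXISTS default.{table} (
  `id` bigint,
  `name` string,
  `message` string,
  `ts` bigint,
  `isodate` timestamp,
  `date` string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = ',',
  'field.delim' = '|'
) LOCATION 's3://{bucket}/split/{folder}/'
TBLPROPERTIES ('has_encrypted_data'='false');"""


def csv_table_ddl(table, bucket, folder):
    """Render the DDL for one pipe-delimited source table."""
    return CSV_DDL_TEMPLATE.format(table=table, bucket=bucket, folder=folder)


if __name__ == "__main__":
    bucket = "PLEASE-REPLACE-WITH-YOUR-BUCKET-NAME"  # your eu-west-1 bucket
    for table, folder in [("clinton", "Hillary"), ("trump", "Donald")]:
        print(csv_table_ddl(table, bucket, folder))
        print()
```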


Go to the Glue console in **eu-west-1**: https://console.aws.amazon.com/glue/home?region=eu-west-1#

- In the "Data catalog" go to "Databases" / "Tables"
- Verify that your 'clinton' table is listed
  - If it is not, a small update is needed on the Athena side so that Athena tables are shared with Glue.
  
- In the Glue left-hand menu, choose "Jobs" in the "ETL" section
- Choose "Add job"
- For "name" enter "clinton-parquet-conversion"
- For the role, choose an existing role or create a new one with the "Create IAM role" link (the target service is Glue; for this test I use the Administrator managed policy)
- Choose the option "A proposed script generated by AWS Glue"
- You can leave "script name" and "S3 path where the script is stored" unchanged
- For the temporary directory you can reuse the same path as "S3 path where the script is stored" with a "/tmp" suffix (example: "s3://aws-glue-scripts-506951059283-eu-west-1/tmp/")
- Then click "Next"
- As "Datasource", choose the 'clinton' table you created earlier
- As "Data target", choose "Create tables in your data target"
  - For "Data store" choose Amazon S3
  - For "Format" choose Parquet
  - For "Target path" enter the path of the bucket you created, with a /parquet/clinton/ suffix (example: "s3://ec2-user-oberger17102017-aws-bigdata-workshop/parquet/clinton/")
  - Choose "Next"
- You can validate the detected mapping by pressing "Next"
- You can click "Finish"

Now that your job is ready, click "Run job" in the menu bar.

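Repeat the same steps for the Trump tweets, using the 'trump' table as the data source and the /parquet/trump/ prefix of your bucket as the target path.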
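Once both jobs exist, they can also be started and monitored from the notebook instead of the console. The following is a minimal sketch using the boto3 Glue client calls `start_job_run` and `get_job_run`. The helper `run_job_and_wait` is a hypothetical name, and the job name "trump-parquet-conversion" is an assumption made by analogy with "clinton-parquet-conversion"; use whatever name you gave your second job.

```python
import time

# States after which a Glue job run no longer changes.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR", "EXPIRED"}


def run_job_and_wait(glue, job_name, poll_seconds=30):
    """Start a Glue job and poll until the run reaches a terminal state."""
    run_id = glue.start_job_run(JobName=job_name)["JobRunId"]
    while True:
        run = glue.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]
        state = run["JobRunState"]
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)


def main():
    # boto3 is imported here so the helper above can be reused without it.
    import boto3

    glue = boto3.client("glue", region_name="eu-west-1")
    # The second job name is assumed, see the note above.
    for job in ["clinton-parquet-conversion", "trump-parquet-conversion"]:
        print(job, run_job_and_wait(glue, job))

# Call main() from a cell once AWS credentials are configured.
```

The client is passed in as a parameter so the polling logic can be tried with a stub before touching a real account.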


ATHENA on PARQUET (eu-west-1)
===

- Go to https://console.aws.amazon.com/athena/home?region=eu-west-1#
- In the query editor, run the following statement (replace the bucket name with your own):

```
CREATE EXTERNAL TABLE IF NOT EXISTS default.clintonparquet ( 
      `id` bigint, 
      `name` string, 
      `message` string, 
      `ts` bigint, 
      `isodate` timestamp, 
      `date` string 
    ) 
    STORED AS PARQUET
    LOCATION 's3://PLEASE-REPLACE-WITH-YOUR-BUCKET-NAME/parquet/clinton/' 
    tblproperties ("parquet.compress"="SNAPPY"); 
```
- Then run a second statement for the Trump table:

```
CREATE EXTERNAL TABLE IF NOT EXISTS default.trumpparquet ( 
      `id` bigint, 
      `name` string, 
      `message` string, 
      `ts` bigint, 
      `isodate` timestamp, 
      `date` string 
    ) 
    STORED AS PARQUET
    LOCATION 's3://PLEASE-REPLACE-WITH-YOUR-BUCKET-NAME/parquet/trump/' 
    tblproperties ("parquet.compress"="SNAPPY");
```
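To check that the new tables are queryable, one option is to run the same query against the delimited and the Parquet tables and look at the amount of data Athena reports as scanned. The following is a sketch under assumptions: `run_query` is a hypothetical helper, the "athena-results/" output prefix is an assumed location (Athena needs some S3 location for query results), and the calls used are the boto3 Athena client's `start_query_execution` and `get_query_execution`.

```python
import time


def run_query(athena, sql, output_location, poll_seconds=2):
    """Run one Athena query and return (final state, bytes scanned)."""
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": "default"},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]
        state = execution["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            scanned = execution.get("Statistics", {}).get("DataScannedInBytes", 0)
            return state, scanned
        time.sleep(poll_seconds)


def main():
    # boto3 is imported here so the helper above can be reused without it.
    import boto3

    athena = boto3.client("athena", region_name="eu-west-1")
    # Assumed results prefix; any S3 location you can write to works.
    output = "s3://PLEASE-REPLACE-WITH-YOUR-BUCKET-NAME/athena-results/"
    for table in ["clinton", "clintonparquet", "trump", "trumpparquet"]:
        sql = f"SELECT count(*) FROM default.{table}"
        print(table, run_query(athena, sql, output))

# Call main() from a cell once AWS credentials are configured.
```

A `FAILED` state on one of the Parquet tables usually points at the table definition or at a Glue job that has not finished writing its output yet.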