# AWS Glue Studio Notebook
##### You are now running an AWS Glue Studio notebook. To start using it, you need to start an AWS Glue Interactive Session.


#### Optional: Run this cell to see available notebook commands ("magics").


In [None]:
%help

#### Run this cell to set up and start your interactive session.


In [4]:
%idle_timeout 2880
%glue_version 3.0
%worker_type G.1X
%number_of_workers 2
%extra_py_files s3://pedestrian-analysis-working-bucket/glue-job-scripts/util.py

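import io
import sys
import zipfile
from datetime import datetime

import pandas as pd
import util  # helper module supplied via %extra_py_files

# Note: `sum` imported here shadows Python's built-in sum() in this notebook.
from pyspark.sql.functions import sum, col, rank, desc, lit, row_number
from pyspark.sql.window import Window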

from pyspark.context import SparkContext
from pyspark.sql.types import (
    StructType, StructField, StringType, IntegerType, DoubleType, TimestampType
)

from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job


sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)

You are already connected to a glueetl session a328b2c5-3eaa-41b1-9cf1-dab910110d18.

No change will be made to the current session that is set as glueetl. The session configuration change will apply to newly created sessions.


Current idle_timeout is 2880 minutes.
idle_timeout has been set to 2880 minutes.



Setting Glue version to: 3.0



Previous worker type: G.1X
Setting new worker type to: G.1X



Previous number of workers: 2
Setting new number of workers to: 2



Extra py files to be included:
s3://pedestrian-analysis-working-bucket/glue-job-scripts/util.py



In [2]:
BUCKET_NAME = 'pedestrian-analysis-working-bucket'
DATABASE_NAME = 'pedestrian_analysis_report'
TABLE_NAME = 'report_top_10_locations_per_day'

# Load the data from the source tables
df = glueContext.create_dynamic_frame.from_catalog(
    database="pedestrian_analysis_raw",
    table_name="sensor_counts"
).toDF()
sensor_reference = glueContext.create_dynamic_frame.from_catalog(
    database="pedestrian_analysis_raw",
    table_name="sensor_reference_data"
).toDF()




In [17]:
df.show()

+-------+-------------------+---------+--------------------+------------+
|     id|          date_time|sensor_id|         sensor_name|hourly_count|
+-------+-------------------+---------+--------------------+------------+
|2902119|2019-11-12T22:00:00|       11|     Waterfront City|          27|
| 763567|2014-01-24T10:00:00|       25|Melbourne Convent...|         436|
|1028480|2015-01-06T16:00:00|       23|Spencer St-Collin...|         760|
| 331173|2011-07-08T12:00:00|       10|      Victoria Point|         569|
| 430567|2012-03-03T19:00:00|        1|Bourke Street Mal...|         847|
| 631323|2013-06-14T14:00:00|        4|    Town Hall (West)|        3219|
| 573323|2013-01-30T13:00:00|       16|Australia on Collins|        2931|
| 408282|2012-01-12T04:00:00|       18|Collins Place (No...|           1|
| 754463|2014-01-12T13:00:00|        6|Flinders Street S...|        1154|
|1139175|2015-05-13T19:00:00|       11|     Waterfront City|          39|
| 256092|2011-01-04T10:00:00|       11
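
The target table name (`report_top_10_locations_per_day`) suggests the report ranks locations by daily pedestrian count. The following is a minimal plain-Python sketch of that logic, assuming the report sums `hourly_count` per sensor per calendar day and keeps the highest totals. The function name `top_locations_per_day`, the tie-breaking rule, and the rows in `ILLUSTRATIVE_ROWS` are hypothetical and are not taken from the dataset or from the notebook's author. It may be useful for sanity-checking a Spark result on a small sample.

```python
from collections import defaultdict
from datetime import datetime

# Illustrative rows only (made-up values, not from the dataset):
# (date_time, sensor_id, hourly_count), matching the column formats shown above.
ILLUSTRATIVE_ROWS = [
    ("2019-11-12T08:00:00", 4, 100),
    ("2019-11-12T09:00:00", 4, 150),
    ("2019-11-12T08:00:00", 11, 30),
    ("2019-11-12T08:00:00", 16, 200),
    ("2019-11-13T08:00:00", 11, 50),
]


def top_locations_per_day(rows, n=10):
    """Return {date: [(rank, sensor_id, daily_count), ...]}, at most n entries per day."""
    # Step 1: sum the hourly counts per (day, sensor).
    daily_totals = defaultdict(int)
    for date_time, sensor_id, hourly_count in rows:
        day = datetime.fromisoformat(date_time).date()
        daily_totals[(day, sensor_id)] += hourly_count

    # Step 2: collect the per-sensor totals for each day.
    per_day = defaultdict(list)
    for (day, sensor_id), total in daily_totals.items():
        per_day[day].append((sensor_id, total))

    # Step 3: rank within each day, highest total first.
    # Ties are broken by sensor_id (an assumption) so the order is deterministic.
    result = {}
    for day, totals in per_day.items():
        ordered = sorted(totals, key=lambda t: (-t[1], t[0]))
        result[day] = [
            (rank, sensor_id, total)
            for rank, (sensor_id, total) in enumerate(ordered[:n], start=1)
        ]
    return result


top_two = top_locations_per_day(ILLUSTRATIVE_ROWS, n=2)
```

In Spark, the same ranking would typically be expressed with a `Window` partitioned by day and ordered by the daily total, combined with `row_number()` or `rank()`, all of which the setup cell already imports.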

#### Example: Create a DynamicFrame from a table in the AWS Glue Data Catalog and display its schema


In [None]:
dyf = glueContext.create_dynamic_frame.from_catalog(database='database_name', table_name='table_name')
dyf.printSchema()

#### Example: Convert the DynamicFrame to a Spark DataFrame and display a sample of the data


In [None]:
df = dyf.toDF()
df.show()

#### Example: Write the data in the DynamicFrame to a location in Amazon S3 and create a table for it in the AWS Glue Data Catalog


In [None]:
s3output = glueContext.getSink(
  path="s3://bucket_name/folder_name",
  connection_type="s3",
  updateBehavior="UPDATE_IN_DATABASE",
  partitionKeys=[],
  compression="snappy",
  enableUpdateCatalog=True,
  transformation_ctx="s3output",
)
s3output.setCatalogInfo(
  catalogDatabase="demo", catalogTableName="populations"
)
s3output.setFormat("glueparquet")
s3output.writeFrame(dyf)