# Applying functions to dates and times
<!--
  ~ Licensed to the Apache Software Foundation (ASF) under one
  ~ or more contributor license agreements.  See the NOTICE file
  ~ distributed with this work for additional information
  ~ regarding copyright ownership.  The ASF licenses this file
  ~ to you under the Apache License, Version 2.0 (the
  ~ "License"); you may not use this file except in compliance
  ~ with the License.  You may obtain a copy of the License at
  ~
  ~   http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing,
  ~ software distributed under the License is distributed on an
  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  ~ KIND, either express or implied.  See the License for the
  ~ specific language governing permissions and limitations
  ~ under the License.
  -->

This tutorial demonstrates how to work with [date and time scalar functions](https://druid.apache.org/docs/latest/querying/sql-scalar#date-and-time-functions). You will run SQL statements that show them being applied to transform, filter, and aggregate data.

## Prerequisites

This tutorial works with Druid 27.0.0 or later.

Launch this tutorial and all prerequisites using the `druid-jupyter` profile of the Docker Compose file for Jupyter-based Druid tutorials. For more information, see the Learn Druid repository [readme](https://github.com/implydata/learn-druid).

## Initialization

Run the next cell to set up the Druid Python client's connection to Apache Druid.

If successful, the Druid version number will be shown in the output.

In [None]:
import druidapi
import os

if 'DRUID_HOST' not in os.environ.keys():
    druid_host="http://localhost:8888"
else:
    druid_host=f"http://{os.environ['DRUID_HOST']}:8888"
    
print(f"Opening a connection to {druid_host}.")
druid = druidapi.jupyter_client(druid_host)

display = druid.display
sql_client = druid.sql
status_client = druid.status

status_client.version

### Load example data

Once your Druid environment is up and running, ingest the sample data for this tutorial.

Run the following cell to create a table called `example-koalas-fndatetime`. The ingestion only includes the fields from the source data that are required for this notebook.

When completed, you'll see a description of the final table.

In [None]:
sql='''
REPLACE INTO "example-koalas-fndatetime" OVERWRITE ALL
WITH "ext" AS (SELECT *
FROM TABLE(
  EXTERN(
    '{"type":"http","uris":["https://static.imply.io/example-data/kttm-v2/kttm-v2-2019-08-25.json.gz"]}',
    '{"type":"json"}'
  )
) EXTEND ("timestamp" VARCHAR, "agent_category" VARCHAR, "agent_type" VARCHAR, "browser" VARCHAR, "browser_version" VARCHAR, "city" VARCHAR, "continent" VARCHAR, "country" VARCHAR, "version" VARCHAR, "event_type" VARCHAR, "event_subtype" VARCHAR, "loaded_image" VARCHAR, "adblock_list" VARCHAR, "forwarded_for" VARCHAR, "language" VARCHAR, "number" VARCHAR, "os" VARCHAR, "path" VARCHAR, "platform" VARCHAR, "referrer" VARCHAR, "referrer_host" VARCHAR, "region" VARCHAR, "remote_address" VARCHAR, "screen" VARCHAR, "session" VARCHAR, "session_length" BIGINT, "timezone" VARCHAR, "timezone_offset" VARCHAR, "window" VARCHAR))
SELECT
  TIME_PARSE("timestamp") AS "__time",
  "browser",
  "browser_version",
  "city",
  "continent",
  "country",
  "version",
  "event_type",
  "event_subtype",
  "loaded_image",
  "session",
  "session_length"
FROM "ext"
PARTITIONED BY DAY
'''

display.run_task(sql)
sql_client.wait_until_ready('example-koalas-fndatetime')
display.table('example-koalas-fndatetime')

Finally, run the following cell to import additional Python modules that you will use to create visuals as part of the notebook.

In [None]:
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd

## Filter data using time

In this part of the notebook you'll see examples of:

* Translating a string timestamp to a datetime using `TIMESTAMP`.
* Filtering a query using `TIME_IN_INTERVAL`.
* Using `CURRENT_TIMESTAMP` to produce the current datetime.
* Calculating a new timestamp using `TIME_SHIFT`.

The `TIMESTAMP` keyword translates a string literal into a timestamp that can then be used with comparison operators.

In [None]:
sql='''
SELECT
  MIN(__time) AS "earliestSession",
  MAX(__time) AS "latestSession",
  COUNT(*) AS "sessions"
FROM "example-koalas-fndatetime"
WHERE __time > TIMESTAMP '2019-08-25 23:00:00'
'''

display.sql(sql)

When you cannot guarantee that timestamps are in ISO format, or when timezones need to be accounted for, use `TIME_PARSE`. This is the function most commonly used in SQL ingestion to parse incoming timestamps.

Run the cell below to see how various input timestamp formats are processed.

In [None]:
sql='''
SELECT
 TIME_PARSE('13-07-2019 08:32','dd-MM-yyyy HH:mm') AS "parse_1",
 TIME_PARSE('1998 245 16','YYYY DDD HH') AS "parse_2",
 TIME_PARSE('2023/11/03 01:43','yyyy/MM/dd HH:mm') AS "parse_3",
 TIME_PARSE('1984-05-13T12:56:14.451Z','yyyy-MM-dd''T''HH:mm:ss.SSSZ') AS "parse_4",
 TIMESTAMP '2019-04-03' AS "parse_ISO"
'''

display.sql(sql)
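To get a feel for how a format pattern drives parsing, the sketch below parses three of the same input strings with Python's `datetime.strptime`. It is only an approximation for illustration: the `strptime` directives are rough hand-written equivalents of the Joda patterns, not something Druid uses, and the results are labeled as UTC on the assumption that no other time zone was supplied.

```python
from datetime import datetime, timezone

# Input string and a rough strptime equivalent of the Joda pattern used above.
# These mappings are illustrative assumptions, not part of Druid.
examples = [
    ("13-07-2019 08:32", "%d-%m-%Y %H:%M"),  # Joda: dd-MM-yyyy HH:mm
    ("1998 245 16", "%Y %j %H"),             # Joda: YYYY DDD HH (DDD = day of year)
    ("2023/11/03 01:43", "%Y/%m/%d %H:%M"),  # Joda: yyyy/MM/dd HH:mm
]

# Parse each string and label the result as UTC.
parsed = [
    datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    for text, fmt in examples
]

for (text, _), value in zip(examples, parsed):
    print(f"{text!r} -> {value.isoformat()}")
```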

Instead of using `BETWEEN` to capture an interval from the data, use the `TIME_IN_INTERVAL` function.

In [None]:
sql='''
SELECT
  MIN(__time) AS "earliestSession",
  MAX(__time) AS "latestSession",
  COUNT(*) AS "sessions"
FROM "example-koalas-fndatetime"
WHERE TIME_IN_INTERVAL(__time,'2019-08-25T11/2019-08-25T13')
'''

display.sql(sql)

The second parameter is an [ISO8601](https://en.wikipedia.org/wiki/ISO_8601#Time_intervals) interval representing the period between 11am and 1pm on 25 August 2019.

In the next cell, an ISO8601 period is given as the start of the interval to give the same results.

In [None]:
sql='''
SELECT
  MIN(__time) AS "earliestSession",
  MAX(__time) AS "latestSession",
  COUNT(*) AS "sessions"
FROM "example-koalas-fndatetime"
WHERE TIME_IN_INTERVAL(__time,'PT2H/2019-08-25T13')
'''

display.sql(sql)
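The two interval notations used so far describe the same span of time. The following sketch is a deliberately small parser, written only to illustrate that equivalence: the helper names are hypothetical, it handles just the `PT<n>H`, `PT<n>M`, and `PT<n>S` periods, and it is not how Druid parses intervals.

```python
import re
from datetime import datetime, timedelta

# Hypothetical helpers for illustration; only a small subset of ISO 8601 is handled.
UNITS = {"H": "hours", "M": "minutes", "S": "seconds"}


def parse_period(text):
    """Parse a period such as PT2H, PT15M, or PT1S into a timedelta."""
    match = re.fullmatch(r"PT(\d+)([HMS])", text)
    if match is None:
        raise ValueError(f"unsupported period: {text}")
    amount, unit = match.groups()
    return timedelta(**{UNITS[unit]: int(amount)})


def parse_timestamp(text):
    """Parse a timestamp, accepting reduced precision such as 2019-08-25T11."""
    for fmt in ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%dT%H:%M", "%Y-%m-%dT%H", "%Y-%m-%d"):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unsupported timestamp: {text}")


def parse_interval(text):
    """Return (start, end) for start/end, period/end, or start/period."""
    left, right = text.split("/")
    if left.startswith("P"):
        end = parse_timestamp(right)
        return end - parse_period(left), end
    if right.startswith("P"):
        start = parse_timestamp(left)
        return start, start + parse_period(right)
    return parse_timestamp(left), parse_timestamp(right)


explicit = parse_interval("2019-08-25T11/2019-08-25T13")
with_period = parse_interval("PT2H/2019-08-25T13")
print(explicit == with_period)
```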

The `TIME_SHIFT` function applies multiples of ISO8601 periods to a timestamp.

Run the next cell for a simple example:

In [None]:
sql='''
SELECT
  __time AS "sessionStart",
  TIME_SHIFT(__time,'PT1H',3) AS "sessionStart-plus3Hours",
  TIME_SHIFT(__time,'P1Y',-5) AS "sessionStart-minus5Years"
FROM "example-koalas-fndatetime"
WHERE TIME_IN_INTERVAL(__time,'2019-08-25T13/PT1S')
'''

display.sql(sql)

A period - along with a multiple - has been provided to the `TIME_SHIFT` function. This is then added to the timestamp. One calculation adds 3 hours, the next subtracts 5 years.
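The arithmetic behind a period and a multiple can be sketched in plain Python. This is an illustration under simplifying assumptions, not Druid's implementation: the two helper functions are hypothetical, the hour shift uses a fixed-length `timedelta`, and the year shift simply changes the calendar year (which would fail for 29 February).

```python
from datetime import datetime, timedelta


def shift_hours(ts, step):
    """Add `step` multiples of a one-hour period (like 'PT1H')."""
    return ts + step * timedelta(hours=1)


def shift_years(ts, step):
    """Add `step` multiples of a one-year period (like 'P1Y')."""
    # Calendar-based shift; raises ValueError for 29 February in a non-leap target year.
    return ts.replace(year=ts.year + step)


session_start = datetime(2019, 8, 25, 13, 0, 0)
plus_3_hours = shift_hours(session_start, 3)
minus_5_years = shift_years(session_start, -5)

print(session_start, plus_3_hours, minus_5_years, sep="\n")
```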

Run the following cell to see `TIME_SHIFT` and `TIME_IN_INTERVAL` being used together.

The results are sessions that started in the minute before 1pm and ended (by calculation) in the minute after 1pm.

In [None]:
sql='''
SELECT
  __time AS "start",
  "session_length" * 0.01 as "session_seconds",
  TIME_SHIFT(__time, 'PT0.01S', "session_length") AS "end"
FROM "example-koalas-fndatetime"
WHERE TIME_IN_INTERVAL(TIME_SHIFT(__time, 'PT0.01S', "session_length"),'2019-08-25T13/PT1M')
AND TIME_IN_INTERVAL(__time, 'PT1M/2019-08-25T13')
LIMIT 10
'''

display.sql(sql)

The `CURRENT_TIMESTAMP` function returns the current date and time. Combine it with `TIME_SHIFT` to filter relative to the current moment. For example, the following query returns rows with a timestamp more than six hours older than the current time:

```sql
    SELECT
      MIN(__time) AS "earliestSession",
      MAX(__time) AS "latestSession",
      COUNT(*) AS "sessions"
    FROM "example-koalas-fndatetime"
    WHERE __time < TIME_SHIFT(CURRENT_TIMESTAMP,'PT1H',-6)
```
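If you want to vary the look-back window, one option is to build the statement in Python. The sketch below uses a hypothetical helper, `sessions_older_than`, that is not part of the notebook or of `druidapi`; it keeps the same `<` comparison as the statement above and only substitutes the number of hours.

```python
def sessions_older_than(hours, table="example-koalas-fndatetime"):
    """Build a query for rows older than `hours` hours before the current time."""
    hours = int(hours)
    if hours < 0:
        raise ValueError("hours must not be negative")
    return f'''
SELECT
  MIN(__time) AS "earliestSession",
  MAX(__time) AS "latestSession",
  COUNT(*) AS "sessions"
FROM "{table}"
WHERE __time < TIME_SHIFT(CURRENT_TIMESTAMP, 'PT1H', -{hours})
'''


sql = sessions_older_than(6)
print(sql)
# In this notebook you could then run the statement with: display.sql(sql)
```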

## Use time functions at ingestion time

In this part of the notebook are examples of:

* Reading secondary dimensions from a table using `MILLIS_TO_TIMESTAMP`.
* Filtering data using secondary dimensions.

Run the cell below to create a new dimension - a secondary timestamp - that represents the end timestamp of each session.

In [None]:
sql='''
REPLACE INTO "example-koalas-fndatetime" OVERWRITE ALL
WITH "ext" AS (SELECT *
FROM TABLE(
  EXTERN(
    '{"type":"http","uris":["https://static.imply.io/example-data/kttm-v2/kttm-v2-2019-08-25.json.gz"]}',
    '{"type":"json"}'
  )
) EXTEND ("timestamp" VARCHAR, "agent_category" VARCHAR, "agent_type" VARCHAR, "browser" VARCHAR, "browser_version" VARCHAR, "city" VARCHAR, "continent" VARCHAR, "country" VARCHAR, "version" VARCHAR, "event_type" VARCHAR, "event_subtype" VARCHAR, "loaded_image" VARCHAR, "adblock_list" VARCHAR, "forwarded_for" VARCHAR, "language" VARCHAR, "number" VARCHAR, "os" VARCHAR, "path" VARCHAR, "platform" VARCHAR, "referrer" VARCHAR, "referrer_host" VARCHAR, "region" VARCHAR, "remote_address" VARCHAR, "screen" VARCHAR, "session" VARCHAR, "session_length" BIGINT, "timezone" VARCHAR, "timezone_offset" VARCHAR, "window" VARCHAR))
SELECT
  TIME_PARSE("timestamp") AS "__time",
  "browser",
  "browser_version",
  "city",
  "continent",
  "country",
  "version",
  "event_type",
  "event_subtype",
  "loaded_image",
  "session",
  "session_length",
  TIME_SHIFT(TIME_PARSE("timestamp"), 'PT0.01S', "session_length") AS "ended_at"
FROM "ext"
PARTITIONED BY DAY
'''

display.run_task(sql)
sql_client.wait_until_ready('example-koalas-fndatetime')
display.table('example-koalas-fndatetime')

Address this new timestamp in the table, [stored as a `LONG`](https://druid.apache.org/docs/latest/querying/sql-data-types#standard-types), using `MILLIS_TO_TIMESTAMP`.

Run the next cell, which shows `MILLIS_TO_TIMESTAMP` being used alongside `TIME_IN_INTERVAL`. Notice that the results match those of the earlier query that calculated the session end at query time with `TIME_SHIFT`.

In [None]:
sql='''
SELECT
  __time AS "start",
  MILLIS_TO_TIMESTAMP("ended_at") AS "end"
FROM "example-koalas-fndatetime"
WHERE TIME_IN_INTERVAL(MILLIS_TO_TIMESTAMP("ended_at"),'2019-08-25T13/PT1M')
AND TIME_IN_INTERVAL(__time, 'PT1M/2019-08-25T13')
LIMIT 10
'''

display.sql(sql)
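A secondary timestamp stored as a number holds milliseconds since the Unix epoch. As a rough illustration of the conversion, the following sketch defines two hypothetical Python helpers that go from milliseconds to a UTC datetime and back; it mirrors the idea of `MILLIS_TO_TIMESTAMP` rather than Druid's actual code.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def millis_to_timestamp(millis):
    """Convert milliseconds since the Unix epoch to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)


def timestamp_to_millis(ts):
    """Convert a timezone-aware datetime to whole milliseconds since the Unix epoch."""
    return (ts - EPOCH) // timedelta(milliseconds=1)


ended_at = datetime(2019, 8, 25, 13, 0, 12, tzinfo=timezone.utc)
stored = timestamp_to_millis(ended_at)   # the kind of value held in a LONG column
restored = millis_to_timestamp(stored)

print(stored, restored.isoformat())
```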

## Use functions to format the date and time

Below are examples of:

* Using `TIME_FORMAT` to apply a string pattern to a timestamp.
* Extracting elements of a timestamp using `TIME_EXTRACT` and `EXTRACT`.

The `TIME_FORMAT` function applies a [JODA](https://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html) pattern to datetime values.

Run the cell below to see a number of examples.

In [None]:
sql='''
SELECT
  TIME_FORMAT(__time, 'hh:mm:ss') AS "sessionStart-HMS",
  TIME_FORMAT(CURRENT_TIMESTAMP, 'YYYY-MM') AS "now-YM",
  TIME_FORMAT(TIME_SHIFT(MILLIS_TO_TIMESTAMP("ended_at"),'PT1H',3), 'YYYY-MM-dd (DD)') AS "ended_at-plus3Hours-YMD-DOY",
  TIME_FORMAT(TIME_SHIFT(__time,'P1Y',-5), 'EE dd MMM YY:hh a z') AS "sessionStart-minus5Years-reallyPretty"
FROM "example-koalas-fndatetime"
WHERE TIME_IN_INTERVAL(__time,'2019-08-25T13/PT1S')
'''

display.sql(sql)
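Pattern letters are case sensitive, which is easy to overlook: in Joda patterns `hh` is the 12-hour clock hour, `HH` is the 24-hour hour, and `DD` is the day of the year. The sketch below shows the same distinctions using Python's `strftime`. The directive mappings are approximate equivalents chosen for illustration, and the sample timestamp is an assumed value at 1pm.

```python
from datetime import datetime

ts = datetime(2019, 8, 25, 13, 0, 0)

# Joda pattern -> output of a roughly equivalent strftime format (illustrative mapping).
formatted = {
    "hh:mm:ss": ts.strftime("%I:%M:%S"),              # 12-hour clock: 1pm shows as 01
    "HH:mm:ss": ts.strftime("%H:%M:%S"),              # 24-hour clock
    "YYYY-MM": ts.strftime("%Y-%m"),
    "YYYY-MM-dd (DD)": ts.strftime("%Y-%m-%d (%j)"),  # %j = day of year
}

for pattern, value in formatted.items():
    print(f"{pattern:18} {value}")
```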

The `EXTRACT` and `TIME_EXTRACT` functions return particular portions of a datetime.

In [None]:
sql='''
SELECT
  EXTRACT(HOUR FROM __time) AS "start-hour",
  EXTRACT(HOUR FROM MILLIS_TO_TIMESTAMP("ended_at")) AS "end-hour",
  COUNT(*) AS "sessions"
FROM "example-koalas-fndatetime"
GROUP BY 1, 2
'''

df = pd.DataFrame(sql_client.sql(sql))
df_group=df.groupby(['start-hour','end-hour']).sum().unstack()
df_group.plot.bar(stacked=True, legend=None)

Run the next cell to see how `TIME_EXTRACT` has been used instead of `EXTRACT` to allow a timezone to be specified in the calculation.

In [None]:
sql='''
SELECT
  TIME_EXTRACT(__time, 'HOUR') AS "start-hour",
  TIME_EXTRACT(__time, 'HOUR', 'America/Los_Angeles') AS "start-hour-LA",
  TIME_EXTRACT(__time, 'HOUR', 'Asia/Kolkata') AS "start-hour-Ko",
  COUNT(*) AS "sessions"
FROM "example-koalas-fndatetime"
WHERE TIME_IN_INTERVAL(__time, '2019-08-25T03/PT2H')
GROUP BY 1,2,3
'''

display.sql(sql)
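The effect of a time zone on the extracted hour can be checked by hand. The following sketch uses fixed UTC offsets as stand-ins for the named zones so that it runs without a time zone database. The offsets are assumptions that hold for August 2019: UTC-7 for Los Angeles (daylight saving time) and UTC+5:30 for Kolkata (IANA identifier `Asia/Kolkata`). The sample timestamp is made up.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets standing in for named zones; valid for August 2019 only.
los_angeles = timezone(timedelta(hours=-7))           # Pacific Daylight Time
kolkata = timezone(timedelta(hours=5, minutes=30))    # India Standard Time

ts = datetime(2019, 8, 25, 3, 30, tzinfo=timezone.utc)

hours = {
    "UTC": ts.hour,
    "America/Los_Angeles": ts.astimezone(los_angeles).hour,
    "Asia/Kolkata": ts.astimezone(kolkata).hour,
}

print(hours)
```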

## Rounding timestamps

This part of the notebook includes examples of:

* Rounding down: `TIME_FLOOR`, `FLOOR`, and `DATE_TRUNC` ([translated](https://druid.apache.org/docs/latest/querying/math-expr#time-functions) to the `timestamp_floor` native JSON function).
* Rounding up: `TIME_CEIL` and `CEIL` (translated to the `timestamp_ceil` native JSON function).

Run the cell below to see how these functions affect timestamps.

In [None]:
sql='''
SELECT DISTINCT  
  TIME_FLOOR(__time, 'PT1H') AS "session_down_hour_TF",
  FLOOR(__time TO HOUR) AS "session_down_hour_F",
  DATE_TRUNC('HOUR', __time) AS "session_down_hour_DT",
  TIME_CEIL(__time, 'PT1H') AS "session_up_hour_TC",
  CEIL(__time TO HOUR) AS "session_up_hour_C"
FROM "example-koalas-fndatetime"
WHERE TIME_IN_INTERVAL(__time, '2019-08-25T03/PT2H')
'''

display.sql(sql)

The more flexible `TIME_FLOOR` and `TIME_CEIL` functions accept an additional origin parameter. In the SQL statement below, an origin timestamp has been supplied to shift the boundaries used by the floor and ceiling operations.

In [None]:
sql='''
SELECT DISTINCT
  TIME_FLOOR(__time, 'PT1H') AS "session_down_hour_TF",
  TIME_FLOOR(__time, 'PT1H', TIMESTAMP '2018-01-01 00:45:00') AS "session_down_hour_TFZ",
  TIME_CEIL(__time, 'PT1H') AS "session_up_hour_TC",
  TIME_CEIL(__time, 'PT1H', TIMESTAMP '1970-01-01 00:35:10') AS "session_up_hour_TCZ"
FROM "example-koalas-fndatetime"
WHERE TIME_IN_INTERVAL(__time, '2019-08-25T03/PT2H')
'''

display.sql(sql)

Notice how the result was calculated using the timestamp provided, rather than the standard base of the [Unix epoch](https://en.wikipedia.org/wiki/Unix_time).
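One way to reason about an origin is as the point from which bucket boundaries are counted. The sketch below implements that arithmetic for fixed-length periods only. It is a simplified model with hypothetical helper names, not Druid's implementation, and it ignores calendar periods such as months and any time zone handling. The sample timestamp is made up; the origins match those in the statement above.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def time_floor(ts, period, origin=EPOCH):
    """Round down to the nearest bucket boundary counted from `origin`."""
    buckets = (ts - origin) // period
    return origin + buckets * period


def time_ceil(ts, period, origin=EPOCH):
    """Round up to the next boundary, leaving already-aligned values unchanged."""
    floored = time_floor(ts, period, origin)
    return floored if floored == ts else floored + period


hour = timedelta(hours=1)
ts = datetime(2019, 8, 25, 3, 20, 0)

print(time_floor(ts, hour))                                   # boundaries on the hour
print(time_floor(ts, hour, datetime(2018, 1, 1, 0, 45, 0)))   # boundaries at 45 minutes past
print(time_ceil(ts, hour))
print(time_ceil(ts, hour, datetime(1970, 1, 1, 0, 35, 10)))   # boundaries at 35:10 past
```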

## Generate time-based statistics

Time-based analytics relies on using scalar functions to round or truncate timestamps to reduce the cardinality of the `__time` (and other timestamp) columns for input to `GROUP BY` operations.

Run the following cell to see a `GROUP BY` query where the time has been floored and metrics produced.

In [None]:
sql='''
SELECT DISTINCT  
  TIME_FLOOR(__time, 'PT15M') AS "timeperiod",
  "continent",
  COUNT(DISTINCT "session") AS "sessions",
  COUNT(DISTINCT "country") AS "countries"
FROM "example-koalas-fndatetime"
WHERE TIME_IN_INTERVAL(__time, '2019-08-25T03/PT1H')
GROUP BY 1, 2
'''

display.sql(sql)
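Flooring reduces cardinality because many distinct timestamps collapse onto the same bucket start. The following sketch shows the idea on a handful of made-up events, grouping by a 15-minute bucket and continent and counting distinct sessions exactly. The data and the `floor_15_minutes` helper are hypothetical and exist only for this illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
QUARTER_HOUR = timedelta(minutes=15)


def floor_15_minutes(ts):
    """Round a timestamp down to the start of its 15-minute bucket."""
    return EPOCH + ((ts - EPOCH) // QUARTER_HOUR) * QUARTER_HOUR


# Made-up events: (timestamp, continent, session id).
events = [
    (datetime(2019, 8, 25, 3, 2), "Asia", "s1"),
    (datetime(2019, 8, 25, 3, 9), "Asia", "s1"),
    (datetime(2019, 8, 25, 3, 14), "Asia", "s2"),
    (datetime(2019, 8, 25, 3, 16), "Asia", "s3"),
    (datetime(2019, 8, 25, 3, 5), "Europe", "s4"),
]

# Collect the distinct sessions seen in each (bucket, continent) group.
sessions = defaultdict(set)
for ts, continent, session in events:
    sessions[(floor_15_minutes(ts), continent)].add(session)

counts = {key: len(value) for key, value in sessions.items()}
for (bucket, continent), count in sorted(counts.items()):
    print(bucket, continent, count)
```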

The next cell uses `TIME_IN_INTERVAL` inside the `WHERE` condition of `FILTER` clauses to generate a 15-minute breakdown, per continent, of the number of sessions.

In [None]:
sql='''
SELECT DISTINCT  
  "continent",
  COUNT(DISTINCT "session") FILTER (WHERE TIME_IN_INTERVAL(__time, '2019-08-25T03:00:00/PT15M')) AS "sessions_0",
  COUNT(DISTINCT "session") FILTER (WHERE TIME_IN_INTERVAL(__time, '2019-08-25T03:15:00/PT15M')) AS "sessions_1",
  COUNT(DISTINCT "session") FILTER (WHERE TIME_IN_INTERVAL(__time, '2019-08-25T03:30:00/PT15M')) AS "sessions_2",
  COUNT(DISTINCT "session") FILTER (WHERE TIME_IN_INTERVAL(__time, '2019-08-25T03:45:00/PT15M')) AS "sessions_3"
FROM "example-koalas-fndatetime"
WHERE TIME_IN_INTERVAL(__time, '2019-08-25T03/PT1H')
GROUP BY 1
'''

display.sql(sql)
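Writing one filtered aggregate per bucket becomes repetitive as the number of buckets grows. As a sketch of one way to avoid that, the hypothetical helper below generates the column expressions in Python. The name `bucket_columns` and its parameters are assumptions made for this example and are not part of `druidapi`.

```python
from datetime import datetime, timedelta


def bucket_columns(start, buckets, minutes=15):
    """Build one filtered COUNT(DISTINCT "session") expression per time bucket."""
    columns = []
    for i in range(buckets):
        bucket_start = start + i * timedelta(minutes=minutes)
        interval = f"{bucket_start:%Y-%m-%dT%H:%M:%S}/PT{minutes}M"
        columns.append(
            f'  COUNT(DISTINCT "session") '
            f"FILTER (WHERE TIME_IN_INTERVAL(__time, '{interval}')) "
            f'AS "sessions_{i}"'
        )
    return ",\n".join(columns)


columns = bucket_columns(datetime(2019, 8, 25, 3), buckets=4)
print(columns)
```

The returned text can be placed in the `SELECT` list of a statement like the one above.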

Run the following cell to see an example of a single query that compares data from two time periods using a `JOIN`.

In [None]:
sql='''
WITH recent_sessions AS
(
    SELECT "continent", COUNT(DISTINCT "session") AS "sessions"
    FROM "example-koalas-fndatetime"
    WHERE TIME_IN_INTERVAL(__time, '2019-08-25T06/PT1H')
    GROUP BY 1
),
older_sessions_15minutely AS
(
    SELECT TIME_FLOOR(__time,'PT15M'), "continent", COUNT(DISTINCT "session") AS "sessions"
    FROM "example-koalas-fndatetime"
    WHERE TIME_IN_INTERVAL(__time, '2019-08-25T00/PT12H')
    GROUP BY 1, 2
),
older_sessions AS
(
    SELECT "continent", AVG("sessions") AS "sessions"
    FROM older_sessions_15minutely
    GROUP BY 1
)

SELECT
    recent_sessions."continent",
    recent_sessions.sessions AS "recent",
    older_sessions.sessions AS "historical_average"
FROM recent_sessions
LEFT JOIN older_sessions ON recent_sessions.continent = older_sessions.continent
'''

display.sql(sql)

The `GROUP BY` clause can be used at ingestion time (also known as "[rollup](https://druid.apache.org/docs/latest/ingestion/rollup)") to pre-calculate commonly used aggregations.

Run the next cell to create a pre-aggregated table.

In [None]:
sql='''
REPLACE INTO "example-koalas-fndatetime-rollup" OVERWRITE ALL
WITH "ext" AS (SELECT *
FROM TABLE(
  EXTERN(
    '{"type":"http","uris":["https://static.imply.io/example-data/kttm-v2/kttm-v2-2019-08-25.json.gz"]}',
    '{"type":"json"}'
  )
) EXTEND ("timestamp" VARCHAR, "agent_category" VARCHAR, "agent_type" VARCHAR, "browser" VARCHAR, "browser_version" VARCHAR, "city" VARCHAR, "continent" VARCHAR, "country" VARCHAR, "version" VARCHAR, "event_type" VARCHAR, "event_subtype" VARCHAR, "loaded_image" VARCHAR, "adblock_list" VARCHAR, "forwarded_for" VARCHAR, "language" VARCHAR, "number" VARCHAR, "os" VARCHAR, "path" VARCHAR, "platform" VARCHAR, "referrer" VARCHAR, "referrer_host" VARCHAR, "region" VARCHAR, "remote_address" VARCHAR, "screen" VARCHAR, "session" VARCHAR, "session_length" BIGINT, "timezone" VARCHAR, "timezone_offset" VARCHAR, "window" VARCHAR))
SELECT
  TIME_FLOOR(TIME_PARSE("timestamp"),'PT1H') AS "__time",
  TIME_FLOOR(CURRENT_TIMESTAMP,'PT1H') AS "ingested_at",
  TIME_CEIL(TIME_SHIFT(TIME_PARSE("timestamp"), 'PT0.01S', "session_length"),'PT1H') AS "ended_at",
  "city",
  "continent",
  "country",
  COUNT(DISTINCT "browser") AS "browsers",
  COUNT(DISTINCT "session") AS "sessions",
  MAX("session_length") AS "session_length_max",
  COUNT(*) AS "event_count"
FROM "ext"
GROUP BY 1, 2, 3, 4, 5, 6
PARTITIONED BY DAY
'''

display.run_task(sql)
sql_client.wait_until_ready('example-koalas-fndatetime-rollup')
display.table('example-koalas-fndatetime-rollup')

The SQL statement includes:

* Functions to create a primary timestamp.
   * `TIME_PARSE` to convert the incoming raw time into the primary time column
   * `TIME_FLOOR` to round the timestamp down, reducing the granularity of the primary time column
* Functions to create additional datetime columns.
   * `CURRENT_TIMESTAMP` to generate a secondary timestamp, rounded down using `TIME_FLOOR`
   * `TIME_SHIFT` and `TIME_PARSE` used in combination to calculate a session end timestamp that is then rounded up with `TIME_CEIL`
 
A similar operation can be achieved in classic batch or streaming ingestion by using [native functions](https://druid.apache.org/docs/latest/querying/aggregations) to [emit metrics](https://druid.apache.org/docs/latest/ingestion/ingestion-spec#metricsspec), using `queryGranularity` to round down the primary timestamp, native [functions](https://druid.apache.org/docs/latest/querying/math-expr) for additional date time columns, and by enabling [rollup](https://druid.apache.org/docs/latest/ingestion/rollup) to carry out the `GROUP BY`.

Run the cell below to look at a sample of the data from the new table.

In [None]:
sql='''
SELECT
  "__time",
  "city",
  "continent",
  "country",
  "browsers",
  "sessions",
  "session_length_max",
  "event_count"
FROM "example-koalas-fndatetime-rollup"
WHERE TIME_IN_INTERVAL(__time, '2019-08-25T13/PT1H')
LIMIT 5
'''

display.sql(sql)

For each time period and each combination of city, continent, and country, you see a pre-calculated number of browsers and sessions, the maximum session length, and the count of the number of events.
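The pre-aggregation can be pictured as collapsing raw events that share a floored timestamp and dimension values into a single row. The sketch below does this for a few made-up events, keeping an event count and a maximum session length per row. The data is hypothetical and only two grouping columns are used to keep the example short.

```python
from collections import defaultdict

# Made-up raw events: (hour bucket, city, session length).
raw_events = [
    ("2019-08-25T13", "Sydney", 120),
    ("2019-08-25T13", "Sydney", 300),
    ("2019-08-25T13", "Perth", 80),
    ("2019-08-25T14", "Sydney", 50),
]

# One rolled-up row per (hour, city), like GROUP BY at ingestion time.
rollup = defaultdict(lambda: {"event_count": 0, "session_length_max": 0})
for hour, city, session_length in raw_events:
    row = rollup[(hour, city)]
    row["event_count"] += 1
    row["session_length_max"] = max(row["session_length_max"], session_length)

# Summing the stored counts recovers the number of raw events.
total_events = sum(row["event_count"] for row in rollup.values())

for key, row in sorted(rollup.items()):
    print(key, row)
print("raw events:", len(raw_events), "rolled-up rows:", len(rollup), "total:", total_events)
```

This is why queries on the rolled-up table use `SUM("event_count")` rather than `COUNT(*)` to count events.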

Run the next cell to generate a plot, hour-by-hour, of the average session length.

The SQL statement includes some fields that are not used in the plot itself so that you can unpick how the average has been calculated.

In [None]:
sql='''
SELECT
  "__time",
  TIME_FORMAT("__time", 'dd MMM hh a') AS "time",
  SUM("event_count") AS "events",
  SUM("session_length_max") AS "session_length",
  SUM("session_length_max") / SUM("event_count") AS "avg_session_length"
FROM "example-koalas-fndatetime-rollup"
WHERE TIME_IN_INTERVAL(__time, '2019-08-25T00/PT12H')
GROUP BY 1
'''

df = pd.DataFrame(sql_client.sql(sql))
df.plot.bar(x='time', y='avg_session_length')
plt.xticks(rotation=45, ha='right')
plt.show()

The SQL statement in the next cell uses `MILLIS_TO_TIMESTAMP` against the additional datetime columns, together with the `TIMESTAMPDIFF` function. The resulting plot shows the average lag between the session being recorded in the source data and it having been ingested into the table.

In [None]:
sql='''
SELECT
    __time,
    ingested_at,
    TIME_FORMAT("__time", 'dd MMM hh a') AS "time",
    TIMESTAMPDIFF(MINUTE, "__time", MILLIS_TO_TIMESTAMP("ingested_at")) / SUM("event_count") AS "avgdelay-minutes"
FROM "example-koalas-fndatetime-rollup"
WHERE TIME_IN_INTERVAL(__time, '2019-08-25/P1D')
GROUP BY 1, 2, 3
'''

df = pd.DataFrame(sql_client.sql(sql))
df.plot.bar(x='time', y='avgdelay-minutes')
plt.xticks(rotation=45, ha='right')
plt.show()

## Clean up

Run the following cell to remove the tables created for this notebook from the database.

In [None]:
druid.datasources.drop("example-koalas-fndatetime-rollup")
druid.datasources.drop("example-koalas-fndatetime")

## Summary

* All rows in Druid tables have a primary event timestamp
* Effective filtering by time is an essential element of all queries on Druid tables
* Additional datetime columns can be added to tables and used in queries
* Various timestamp formats can be parsed to Druid timestamps
* Calculations can be performed to add and remove periods of time
* There are basic and advanced versions of SQL functions
* Timestamps can be rounded up and down, particularly useful in `GROUP BY` statements
* When combined with a timestamp function, a `GROUP BY` at ingestion time can create very lean tables

## Learn more

* Review the full list of [SQL](https://druid.apache.org/docs/latest/querying/sql-scalar#date-and-time-functions) and [native](https://druid.apache.org/docs/latest/querying/aggregations) functions
* Look at the [list of timezones](https://www.joda.org/joda-time/timezones.html)
* Bookmark the [JODA time format reference page](https://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html)
* Read the list of [granularities](https://druid.apache.org/docs/latest/querying/granularities#simple-granularities)
* Try using SQL `GROUP BY` / Native Rollup on your own sample data set to create a pre-aggregated table
* See the [sketch ingestion](../02-ingestion/03-sketchIngestion.ipynb) notebook for another example of a pre-aggregated table