<img src="img/dsci513_header2.png" width="600">

# Lecture 2: Data types, filtering, functions

**Arman Seyed-Ahmadi, November 2021**

## Lecture outline

- Various data types in SQL
- `WHERE` conditionals, pattern matching
- Derived columns, aliases with `AS`
- Conditionals with `CASE`
- Functions and operators in SQL

In [5]:
%load_ext sql
%config SqlMagic.displaylimit = 20



In [6]:
import json
import urllib.parse

with open('data/credentials.json') as f:
    login = json.load(f)

user = login['user']
password = urllib.parse.quote(login['password'])
host = login['host']
port = login['port']

In [7]:
%sql postgresql://{user}:{password}@{host}:{port}/

'Connected: postgres@'

## Data types

You might remember from the previous lecture that in relational databases, each column is characterized by its name and its **domain**. A domain is the set of permissible or valid values that a column is allowed to store. This highlights one of the advantages of using a DBMS: it enforces a particular data type for each column of a table.

Postgres supports the following general categories of data types:

- boolean
- character
- number
- datetime
- binary

as well as a number of additional types specific to Postgres.

### Type conversion

To demonstrate how different data types work in SQL, I first need to show you how we convert values from one type to another. In standard SQL, type conversion can be done using the `CAST` function:

```sql
CAST(<column> AS <data_type>)
```
In Postgres, we can also use the double-colon syntax as a shorthand for the above `CAST` function:

```sql
<column>::<data_type>
```

### Boolean

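We can specify this data type using the keyword `BOOLEAN` or `BOOL`. Besides `NULL`, Postgres accepts the string inputs `'true'`, `'yes'`, `'on'`, `'1'` for true and `'false'`, `'no'`, `'off'`, `'0'` for false. These inputs are case-insensitive, and unique prefixes such as `'t'`, `'y'`, `'f'`, `'n'` are accepted too. Casting an integer (rather than a string) to `BOOLEAN` maps `0` to `FALSE` and any nonzero integer to `TRUE`. Note that all of these inputs are interpreted as `TRUE`, `FALSE`, or `NULL` by Postgres: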

In [19]:
%%sql

SELECT
    'TRUE'::BOOLEAN,
    'T'::BOOLEAN,
    '0'::BOOLEAN,
    'NO'::BOOLEAN
;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
1 rows affected.


bool,bool_1,bool_2,bool_3
True,True,False,False
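
To make the accepted spellings concrete, here is a rough Python model of how a string literal is read as a `BOOLEAN`. `parse_pg_bool` is a hypothetical helper written only for this illustration. It follows the documented rules: case-insensitive, surrounding whitespace ignored, unique prefixes accepted. It is a sketch and not the server's actual parser.

```python
# Hypothetical model of Postgres' string-to-BOOLEAN input rules (illustration only)
TRUE_WORDS = ("true", "yes", "on", "1")
FALSE_WORDS = ("false", "no", "off", "0")


def parse_pg_bool(text):
    """Return True/False for a valid boolean literal, None for SQL NULL."""
    if text is None:  # NULL stays NULL
        return None
    s = text.strip().lower()  # whitespace and letter case are ignored
    # Keep every spelling that starts with the input: only a unique prefix is valid
    matches = [w for w in TRUE_WORDS + FALSE_WORDS if s and w.startswith(s)]
    if len(matches) != 1:  # no match (e.g. '2') or ambiguous (e.g. 'o': on/off)
        raise ValueError(f'invalid input syntax for type boolean: "{text}"')
    return matches[0] in TRUE_WORDS


print([parse_pg_bool(s) for s in ("TRUE", "T", "0", "NO")])  # [True, True, False, False]
```

Note that `'o'` is rejected because it is a prefix of both `'on'` and `'off'`.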


### Characters

The character data types are used to represent fixed-length and variable-length character strings. They can be defined using the following keywords:

- `CHAR(n)`: a string of exactly `n` characters, padded with spaces if it is shorter
- `VARCHAR(n)`: a variable-length string of at most `n` characters
- `TEXT`: a Postgres-specific type for variable-length strings with practically no limit on the number of characters

In [20]:
%%sql

SELECT
    'Arman'::CHAR(50),
    'Arman'::VARCHAR(2),
    'Arman'::TEXT
;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
1 rows affected.


bpchar,varchar,text
Arman,Ar,Arman


> Note that the space padding of `'Arman'::CHAR(50)` is not visible in the Jupyter notebook, but if you run the same statement in `psql`, you will see `'Arman'` followed by 45 spaces in the output.
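
The example above also shows that an explicit cast to `VARCHAR(2)` silently truncated `'Arman'` to `'Ar'`. Storing an over-long value into a `VARCHAR(n)` column behaves differently: it raises an error, unless the extra characters are all spaces. The hypothetical Python helpers below (illustration only, not a real API) model these three behaviours.

```python
def cast_char(s, n):
    """Explicit cast to CHAR(n): truncate to n characters, then pad with spaces."""
    return s[:n].ljust(n)


def cast_varchar(s, n):
    """Explicit cast to VARCHAR(n): silently truncate to n characters."""
    return s[:n]


def store_varchar(s, n):
    """Storing into a VARCHAR(n) column: an over-long value is an error,
    unless the excess characters are all spaces (those are dropped)."""
    if len(s) > n and s[n:].strip(" "):
        raise ValueError(f"value too long for type character varying({n})")
    return s[:n]


print(repr(cast_char("Arman", 10)))  # 'Arman     '
print(cast_varchar("Arman", 2))      # Ar
```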

### Numbers

Numerical values in Postgres belong to the following general categories:
- Integers
- Floating-point numbers
- Arbitrary precision numbers

**Integers:**

| Name     | Storage Size | Description                | Range                                        |
|----------|--------------|----------------------------|----------------------------------------------|
| `smallint` | 2 bytes      | small-range integer        | -32768 to +32767                             |
| `integer`  | 4 bytes      | typical choice for integer | -2147483648 to +2147483647                   |
| `bigint`   | 8 bytes      | large-range integer        | -9223372036854775808 to +9223372036854775807 |
| `serial`      | 4 bytes | auto-incrementing integer       | 1 to 2147483647          |
| `bigserial`   | 8 bytes | large auto-incrementing integer | 1 to 9223372036854775807 |

We'll learn later that `serial` (which is not an actual data type) is a shortcut that tells Postgres to create a unique, auto-incrementing integer column, which is often used as the primary key of a table.
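
The ranges in the integer table follow directly from the storage size. An `n`-byte signed integer has `8*n` bits, one of which effectively encodes the sign, so its range is `-2**(8n-1)` to `2**(8n-1) - 1`. The short Python sketch below reproduces the table; `int_range` and `fits` are illustrative helpers only.

```python
# Storage sizes from the table above
STORAGE_BYTES = {"smallint": 2, "integer": 4, "bigint": 8}


def int_range(type_name):
    """Smallest and largest value of a signed (two's-complement) integer type."""
    bits = 8 * STORAGE_BYTES[type_name]
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def fits(value, type_name):
    low, high = int_range(type_name)
    return low <= value <= high


for name in STORAGE_BYTES:
    print(name, int_range(name))
# smallint (-32768, 32767)
# integer (-2147483648, 2147483647)
# bigint (-9223372036854775808, 9223372036854775807)

print(fits(9223372036854775808, "bigint"))  # False: one past the maximum
```

`serial` and `bigserial` simply use the positive part of the `integer` and `bigint` ranges.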

**Floating-point numbers:**

| Name     | Storage Size | Description                | Precision                                    |
|----------|--------------|----------------------------|----------------------------------------------|
| `real`             | 4 bytes  | variable-precision, inexact     | at least 6 decimal digits (implementation dependent) |
| `double precision` | 8 bytes  | variable-precision, inexact     | at least 15 decimal digits (implementation dependent) |

**Arbitrary precision numbers:**

| Name     | Storage Size | Description                | Range                                        |
|----------|--------------|----------------------------|----------------------------------------------|
| `numeric`          | variable | user-specified precision, exact | up to 131072 digits before and 16383 digits after the decimal point |
| `decimal`          | variable | user-specified precision, exact | up to 131072 digits before and 16383 digits after the decimal point |

> `DECIMAL` and `NUMERIC` data types are exactly the same thing in Postgres.

In [21]:
%%sql

SELECT CAST(44.268 AS SMALLINT);

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
1 rows affected.


int2
44


The same cast can also be written with the `::` shorthand, which is specific to Postgres but often preferred for its brevity:

In [22]:
%%sql

SELECT 44.268::SMALLINT;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
1 rows affected.


int2
44
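
Both casts gave `44`, but it is worth knowing that casting a `numeric` value such as the literal `44.268` to an integer type *rounds* to the nearest integer, with ties rounded away from zero, rather than truncating. So `44.5::SMALLINT` gives `45`. A value outside the target range raises a "smallint out of range" error. The Python sketch below models this with the `decimal` module. `numeric_to_smallint` is a hypothetical helper, and the model covers only `numeric` inputs, not floating-point ones.

```python
from decimal import Decimal, ROUND_HALF_UP

SMALLINT_MIN, SMALLINT_MAX = -32768, 32767


def numeric_to_smallint(value):
    """Round to the nearest integer (ties away from zero), then range-check."""
    rounded = int(Decimal(value).quantize(Decimal(1), rounding=ROUND_HALF_UP))
    if not SMALLINT_MIN <= rounded <= SMALLINT_MAX:
        raise OverflowError("smallint out of range")
    return rounded


print(numeric_to_smallint("44.268"))  # 44
print(numeric_to_smallint("44.5"))    # 45
print(numeric_to_smallint("-44.5"))   # -45
```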


In [23]:
%%sql

SELECT CAST(4.54021223948E-8 AS REAL);

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
1 rows affected.


float4
4.540212e-08
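
`REAL` is a 4-byte (single-precision) binary floating-point number, so only about 7 significant decimal digits of `4.54021223948E-8` survive the cast. A quick way to see this effect in Python is to round-trip a value through a 4-byte float with the standard `struct` module. This is an analogy for illustration; `to_real` is a hypothetical helper, not part of any database driver.

```python
import struct


def to_real(x):
    """Round a Python float (8-byte double) to the nearest 4-byte float."""
    return struct.unpack("f", struct.pack("f", x))[0]


x = 4.54021223948e-08
r = to_real(x)
print(r == x)       # False: some digits were lost
print(f"{r:.7g}")   # 4.540212e-08
```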


With the `numeric` data type, we can specify the total number of significant digits to store (known as precision) as well as the number of digits in the fractional part (known as scale) by specifying `NUMERIC(precision, scale)`:

In [24]:
%%sql

SELECT CAST('1.123456789' AS NUMERIC(5, 2));

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
1 rows affected.


numeric
1.12
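
Conceptually, `CAST(value AS NUMERIC(precision, scale))` does two things. First, it rounds the value to `scale` digits after the decimal point, with ties rounded away from zero. Second, it checks that at most `precision - scale` digits remain before the decimal point, and otherwise fails with a "numeric field overflow" error. The sketch below models this with Python's `decimal` module. `to_numeric` is a hypothetical helper for illustration only.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_numeric(value, precision, scale):
    """Model CAST(value AS NUMERIC(precision, scale))."""
    # Round to `scale` fractional digits (Decimal(1).scaleb(-2) is 0.01)
    q = Decimal(value).quantize(Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP)
    # Only precision - scale digits are allowed before the decimal point
    if abs(q) >= Decimal(10) ** (precision - scale):
        raise ValueError("numeric field overflow")
    return q


print(to_numeric("1.123456789", 5, 2))  # 1.12
print(to_numeric("123.456", 5, 2))      # 123.46
```

With `NUMERIC(5, 2)`, a value such as `1234.5` would be rejected, because it needs four digits before the decimal point.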


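The `numeric` type stores values exactly, unlike the floating-point types `real` and `double precision`. Addition, subtraction, and multiplication on `numeric` values give exact results, without round-off error. However, `numeric` is much **slower for the DBMS to work with** than the integer and floating-point types. It is often used for monetary and financial data, where numbers with many digits must be stored or exactness is important.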

For example, the following number is one larger than the largest `BIGINT` value (9223372036854775807), so casting it to `BIGINT` would throw an out-of-range error, but it works with `NUMERIC`:

In [25]:
%%sql

SELECT CAST(9223372036854775808 AS NUMERIC);

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
1 rows affected.


numeric
9223372036854775808
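
The difference between exact and inexact arithmetic is easy to see in Python, whose `float` is a binary floating-point number like `double precision`, and whose `decimal.Decimal` is a decimal type similar in spirit to `numeric`. This is an analogy only; Postgres does not use Python's types.

```python
from decimal import Decimal

# Binary floating point (like REAL / DOUBLE PRECISION) cannot represent 0.1 exactly
print(0.1 + 0.2 == 0.3)                                   # False
print(sum([0.1] * 10))                                    # 0.9999999999999999

# Decimal arithmetic (like NUMERIC) keeps addition exact
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
print(sum([Decimal("0.1")] * 10))                         # 1.0
```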


### Date/time

Postgres provides **datetime** and **interval** data types similar to those we've seen in DSCI 511 in Python and Pandas.

#### Datetimes

- `DATE` for dates
- `TIME` for the time of day

Postgres also provides two ways to store the **timestamp** data type:
- `TIMESTAMP` for date + time
- `TIMESTAMPTZ` (short for `TIMESTAMP WITH TIME ZONE`) for a time zone-aware date + time

When a timestamp value is queried:
- For `TIMESTAMP`, Postgres returns the timestamp exactly as it was stored
- For `TIMESTAMPTZ`, Postgres converts the timestamp into the time zone of the current session (the `TimeZone` setting, which by default is the time zone configured for the database server)

Note that, despite its name, `TIMESTAMPTZ` does not store any time zone information. Postgres converts every `TIMESTAMPTZ` input to [UTC](https://en.wikipedia.org/wiki/Coordinated_Universal_Time) for storage, and converts it back to the session's time zone when the value is displayed.
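
The "store as UTC, display in the session's time zone" behaviour can be sketched with Python's `datetime`. As a simplifying assumption, the sketch uses fixed UTC offsets (-8 h and -5 h, the November offsets of Vancouver and New York) instead of real named time zones, which would also apply daylight-saving rules.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets standing in for America/Vancouver and America/New_York in November
PST = timezone(timedelta(hours=-8))
EST = timezone(timedelta(hours=-5))

# Input such as '2021-11-18 8:30:00 -8' is converted to UTC for storage
stored_utc = datetime(2021, 11, 18, 8, 30, tzinfo=PST).astimezone(timezone.utc)
print(stored_utc)                  # 2021-11-18 16:30:00+00:00

# On output, the same instant is shown in whatever time zone the session uses
print(stored_utc.astimezone(EST))  # 2021-11-18 11:30:00-05:00
print(stored_utc.astimezone(PST))  # 2021-11-18 08:30:00-08:00
```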

#### Intervals

The `INTERVAL` data type stores a duration of time. Intervals are useful for date and time arithmetic, such as adding a duration of time to a timestamp.
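
In Python, the closest equivalent of `INTERVAL` is `datetime.timedelta`. A `timedelta` has no notion of months or years, which appears to be why `'3 years 2 months'` shows up as `1155 days` in the interval example below. That number matches a conversion by the Python driver (psycopg2) that counts a year as 365 days and a month as 30 days (3*365 + 2*30 = 1155). Postgres itself keeps months as calendar months: for example, `'2021-01-31'::DATE + '1 month'::INTERVAL` gives `2021-02-28 00:00:00`. The sketch below mirrors the interval literals and adds a duration to a timestamp.

```python
from datetime import datetime, timedelta

# timedelta only has days, seconds and microseconds (no months or years)
print(timedelta(days=1, hours=23, minutes=8))  # 1 day, 23:08:00
print(timedelta(minutes=2, seconds=18))        # 0:02:18
# '3 years 2 months' approximated as a number of days
print(timedelta(days=3 * 365 + 2 * 30))        # 1155 days, 0:00:00

# Interval arithmetic: add a duration to a timestamp
start = datetime(2021, 11, 18, 8, 30)
print(start + timedelta(days=1, hours=23, minutes=8))  # 2021-11-20 07:38:00
```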

For more detailed information, refer to the Postgres documentation [here](https://www.postgresql.org/docs/8.4/datatype-datetime.html).

**Entering datetime data**

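Postgres does a pretty good job of interpreting datetimes correctly even if we don't enter them in the standard ISO format (`YYYY-MM-DD`). Be careful with ambiguous inputs such as `'1/2/2021'`, though: whether this means January 2 or February 1 depends on the `DateStyle` setting (month-first in the setup used here, as the `'1/23/2021'` example shows). Let's take a look at a few examples: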

In [26]:
%%sql

SELECT
    'January 23, 2021'::DATE,
    '23 January 2021'::DATE,
    '2021 1 23'::DATE,
    '1/23/2021'::DATE,
    'today'::DATE,
    'tomorrow'::DATE
;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
1 rows affected.


date,date_1,date_2,date_3,date_4,date_5
2021-01-23,2021-01-23,2021-01-23,2021-01-23,2021-11-18,2021-11-19


In [27]:
%%sql

SELECT
    '14:24:00'::TIME,
    '2:24pm'::TIME,
    '2:24 PM PST'::TIME WITH TIME ZONE,
    'now'::TIME,
    'now'::TIME WITH TIME ZONE
;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
1 rows affected.


time,time_1,timetz,time_2,timetz_1
14:24:00,14:24:00,14:24:00-08:00,07:39:08.686027,07:39:08.686027-08:00


In [28]:
%%sql

SELECT
    '1 day 23 hours 8 minutes'::INTERVAL,
    '2m 18s'::INTERVAL,
    '3 years 2 months'::INTERVAL
;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
1 rows affected.


interval,interval_1,interval_2
"1 day, 23:08:00",0:02:18,"1155 days, 0:00:00"


A `TIMESTAMP` value (without time zone) is not affected by the session's time zone, while a `TIMESTAMPTZ` value is interpreted and displayed in that time zone:

In [29]:
%sql SELECT '2021-11-18 8:30:00'::TIMESTAMP;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
1 rows affected.


timestamp
2021-11-18 08:30:00


In [30]:
%sql SELECT '2021-11-18 8:30:00'::TIMESTAMPTZ;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
1 rows affected.


timestamptz
2021-11-18 08:30:00-08:00


In [31]:
%sql SHOW TIMEZONE;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
1 rows affected.


TimeZone
America/Vancouver


In [32]:
%sql SET timezone = 'America/New_York';

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
Done.


[]

In [33]:
%sql SELECT '2021-11-18 8:30:00 -8'::TIMESTAMPTZ;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
1 rows affected.


timestamptz
2021-11-18 11:30:00-05:00


In [34]:
%sql SET timezone = 'America/Vancouver';

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
Done.


[]

In [35]:
%sql SELECT '2021-11-18 8:30:00'::TIMESTAMPTZ;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
1 rows affected.


timestamptz
2021-11-18 08:30:00-08:00


### Binary data

It is also possible to store binary data in a table (e.g. documents, images, videos) using the `BYTEA` type. We don't use binary data in this course.

### Nulls

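A null is a marker indicating that the value for a column is unknown or has not been entered yet. A null is not equal to 0 or to an empty string. In fact, a null is not even equal to another null: comparing a null with anything using `=` gives an unknown result (`NULL`) rather than `TRUE`.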

How different environments show nulls:
- `ipython-sql` -> `None`
- psql -> blank space
- pgAdmin -> `[null]`

## Filtering rows with `WHERE`

We've seen the `WHERE` keyword in passing in the last lecture. `WHERE` is an intuitive keyword that is used to filter rows based on a particular condition. The syntax is as follows:

```sql
SELECT
    column1, column2
FROM
    table1
WHERE
    condition
;
```

| Condition        | Operator                        |
|------------------|---------------------------------|
| Comparison       | `=`, `<>`, `<`, `<=`, `>`, `>=` |
| Pattern matching | `LIKE`                          |
| Range            | `BETWEEN`                       |
| List             | `IN`                            |
| Null testing     | `IS NULL`                       |

In [37]:
%sql postgresql://{user}:{password}@{host}:{port}/imdb_dsci513

'Connected: postgres@imdb_dsci513'

In [38]:
%%sql

SELECT
    *
FROM
    movies;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
26058 rows affected.


id,title,orig_title,start_year,end_year,runtime,rating,nvotes
10035423,Kate & Leopold,,2001,,118,6.4,74982
10042742,Mister 880,,1950,,90,7.1,1171
10041181,Black Hand,,1950,,92,6.4,666
10041387,Francis,,1950,,91,6.4,979
10041719,Orpheus,Orphée,1950,,95,8.0,9346
10041931,Stromboli,"Stromboli, terra di Dio",1950,,107,7.3,5239
10042052,Woman in Hiding,,1950,,92,6.9,553
10042179,Abbott and Costello in the Foreign Legion,,1950,,80,6.6,2573
10042200,Annie Get Your Gun,,1950,,107,6.9,4050
10042206,Armored Car Robbery,,1950,,67,7.0,2077


---

**Example:** Retrieve rows for movies produced in or after 2010.

---

In [39]:
%%sql

SELECT
    *
FROM
    movies
WHERE
    start_year >= 2010
;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
8804 rows affected.


id,title,orig_title,start_year,end_year,runtime,rating,nvotes
10069049,The Other Side of the Wind,,2018,,122,6.9,4904
10176694,The Tragedy of Man,Az ember tragédiája,2011,,160,7.8,610
10293069,Dark Blood,,2012,,86,6.5,1073
10315642,Wazir,,2016,,103,7.1,15796
10337692,On the Road,,2012,,124,6.1,38216
10359950,The Secret Life of Walter Mitty,,2013,,114,7.3,278645
10365907,A Walk Among the Tombstones,,2014,,114,6.5,106413
10369610,Jurassic World,,2015,,124,7.0,547391
10376136,The Rum Diary,,2011,,119,6.2,95417
10376479,American Pastoral,,2016,,108,6.1,13376


---

**Example:** Retrieve the row for the movie called "Lost Highway".

---

In [40]:
%%sql

SELECT
    *
FROM
    movies
WHERE
    title = 'Lost Highway'
;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
1 rows affected.


id,title,orig_title,start_year,end_year,runtime,rating,nvotes
10116922,Lost Highway,,1997,,134,7.6,120549


> Note that in SQL, strings are enclosed in single quotes, i.e. `'string'`.

> SQL keywords and unquoted identifiers are case-insensitive, but **comparing strings** is **case-sensitive**. For example, `title = 'Lost highway'` would not return any rows.

### Logical operators `AND`, `OR`, and `NOT`

Just like in Python, we can combine multiple conditions using the logical/boolean operators `AND`, `OR`, and `NOT`.

When there are multiple logical operators, `NOT` is evaluated first, then `AND` and finally `OR`.

We can enclose each condition in parentheses if we want. This can be done either for readability, or to override the default precedence rules.

---

**Example:** Retrieve the rows for movies that are produced in 2015 and are rated higher than 9.

---

In [41]:
%%sql

SELECT
    *
FROM
    movies
WHERE
    start_year = 2015
    AND
    rating > 9
;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
0 rows affected.


id,title,orig_title,start_year,end_year,runtime,rating,nvotes


---

**Example:** Retrieve the rows for movies that are produced either in 2015 or 2018, and are rated higher than 8.

---

In [42]:
%%sql

SELECT
    *
FROM
    movies
WHERE
    start_year = 2015
    OR 
    start_year = 2018
    AND
    rating > 8
;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
1048 rows affected.


id,title,orig_title,start_year,end_year,runtime,rating,nvotes
10369610,Jurassic World,,2015,,124,7.0,547391
10420293,The Stanford Prison Experiment,,2015,,122,6.9,33319
10478970,Ant-Man,,2015,,117,7.3,517941
10790770,Miles Ahead,,2015,,100,6.4,8650
10884732,The Wedding Ringer,,2015,,101,6.6,67575
11533089,Tab Hunter Confidential,,2015,,90,7.8,2852
11596363,The Big Short,,2015,,130,7.8,318033
11598642,Z for Zachariah,,2015,,98,6.0,25985
11618448,Racing Extinction,,2015,,90,8.3,7042
11638355,The Man from U.N.C.L.E.,,2015,,116,7.3,245184


What? This isn't the right result! Several of the returned movies are rated below 8.

The reason is that the `AND` operator takes precedence over `OR`. Therefore, `start_year = 2018 AND rating > 8` gets evaluated first, and then the result is passed to the `OR` part of the condition. In order to override this behaviour, we can rewrite our query in the following way:

In [43]:
%%sql

SELECT
    *
FROM
    movies
WHERE
    (start_year = 2015
    OR
    start_year = 2018)
    AND
    rating > 8
;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
119 rows affected.


id,title,orig_title,start_year,end_year,runtime,rating,nvotes
11618448,Racing Extinction,,2015,,90,8.3,7042
12096673,Inside Out,,2015,,95,8.2,550606
12473476,Be Here Now,,2015,,100,8.7,2863
12631186,Baahubali: The Beginning,Bahubali: The Beginning,2015,,159,8.1,94989
12865822,All the World in a Design School,,2015,,59,8.4,1270
13170832,Room,,2015,,118,8.2,326042
13270538,Requiem for the American Dream,,2015,,73,8.1,8061
13717510,The Drop Box,,2015,,79,8.1,604
13865286,My Lonely Me,,2015,,95,8.2,671
14112208,Kuttram Kadithal,,2015,,120,8.1,638


---

**Example:** Count the number of movies that have no less than 1 million votes.

---

We need to use the `COUNT()` function to count the number of returned rows (more on `COUNT()` in a later lecture):

In [44]:
%%sql

SELECT
    COUNT(*)
FROM
    movies
WHERE
    NOT nvotes < 1000000
;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
1 rows affected.


count
33


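> Here, `NOT nvotes < 1000000` is equivalent to `nvotes >= 1000000`. Whether to negate a condition with `NOT` or to use the complementary comparison operator is mostly a matter of style.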

---

**Example:** Find the genres listed for the movie "The Godfather".

---

We will learn in later lectures how to answer this question with a single SQL query in various ways, but for now, we will use a two-step process. The information we need is stored in two different tables, so this is the first time we actually encounter the notion of a **relational** database in practice!

It turns out that the `id` column in the `movies` table and the `movie_id` column in the `movie_genres` table refer to the same movies; these columns are what relate the two tables to each other. For our query, we first have to find the `id` of the movie `'The Godfather'`, and then use it to retrieve the genres associated with that movie:

In [45]:
%%sql

SELECT
    id, title
FROM
    movies
WHERE
    title = 'The Godfather'
;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
1 rows affected.


id,title
10068646,The Godfather


Alright, the id for `'The Godfather'` is `10068646`. Now let's retrieve the genres for this id from the `movie_genres` table:

In [46]:
%%sql

SELECT
    *
FROM
    movie_genres
WHERE
    movie_id = 10068646
;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
2 rows affected.


movie_id,genre
10068646,crime
10068646,drama


Well done!

---

**Question:**
Do you think the following query finds the number of movies in the `movie_genres` table that are NOT listed as `'drama'`?
    
```sql
SELECT
    COUNT(DISTINCT movie_id)
FROM
    movie_genres
WHERE
    genre <> 'drama'
;
```

---

### Pattern matching

It is quite common to want to find rows in which the values of one or more columns match a particular pattern. In SQL, this can be done either with `LIKE` or with regular expressions. The syntax for `LIKE` is as follows:

```sql
SELECT
    column1, column2
FROM
    table1
WHERE
    column1 [NOT] LIKE '<pattern>'
;
```

There are two wildcards that we can use with `LIKE`:
- `%` matches any sequence of zero or more characters
- `_` matches exactly one character

Pattern matching with `LIKE` is case sensitive; however, Postgres also provides the `ILIKE` keyword that has the same functionality as `LIKE` but is case-insensitive.

> **Note:** With `LIKE` or `ILIKE`, the entire string should match the pattern.

In [47]:
%%sql

SELECT
    'Arman' LIKE '%a_',
    'UBC' LIKE '_B_',
    'MDS is awesome!' LIKE '%!_'
;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
1 rows affected.


?column?,?column?_1,?column?_2
True,True,False
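
One way to understand `LIKE` is to translate a pattern into a regular expression: `%` becomes `.*`, `_` becomes `.`, everything else is matched literally, and the *whole* string must match. The Python sketch below does this translation. `like_to_regex` and `like` are hypothetical helpers written for illustration. The `escape` parameter mirrors Postgres' default escape character (a backslash), and the `ESCAPE` clause lets you choose another one. `ignore_case=True` imitates `ILIKE`.

```python
import re


def like_to_regex(pattern, escape="\\"):
    """Translate a SQL LIKE pattern into an equivalent Python regex."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if escape and ch == escape:
            # The escape character makes the next character literal
            if i + 1 == len(pattern):
                raise ValueError("LIKE pattern must not end with escape character")
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            parts.append(".*")  # any sequence of zero or more characters
        elif ch == "_":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))
        i += 1
    return "".join(parts)


def like(text, pattern, escape="\\", ignore_case=False):
    """text LIKE pattern (ILIKE if ignore_case=True): the WHOLE string must match."""
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    return re.fullmatch(like_to_regex(pattern, escape), text, flags) is not None


print(like("Arman", "%a_"))                                  # True
print(like("UBC", "_B_"))                                    # True
print(like("MDS is awesome!", "%!_"))                        # False
print(like("The Red Violin", "%violin%"))                    # False
print(like("The Red Violin", "%violin%", ignore_case=True))  # True
print(like("100% Love", "%$%%", escape="$"))                 # True
```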


---

**Example:** Retrieve those movies from the `movies` table whose title contains the word `'violin'` (remember that `LIKE` is case-sensitive!)

---

In [48]:
%%sql

SELECT
    *
FROM
    movies
WHERE
    title LIKE '%Violin%'
;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
5 rows affected.


id,title,orig_title,start_year,end_year,runtime,rating,nvotes
10120802,The Red Violin,Le violon rouge,1998,,130,7.6,30285
10451966,The Violin,El violín,2005,,98,7.7,2212
12401715,The Devil's Violinist,,2013,,122,6.1,3033
14972904,The Violin Teacher,Tudo Que Aprendemos Juntos,2015,,102,6.8,645
10053987,The Steamroller and the Violin,Katok i skripka,1961,,46,7.5,4867


---

**Example:** Retrieve those movies from the `movies` table whose title starts with `'Zero'`.

---

In [49]:
%%sql

SELECT
    *
FROM
    movies
WHERE
    title LIKE 'Zero%'
;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
18 rows affected.


id,title,orig_title,start_year,end_year,runtime,rating,nvotes
10095244,Zerograd,Gorod Zero,1988,,103,7.5,1463
10113557,Zero Kelvin,Kjærlighetens kjøtere,1995,,118,7.3,1711
10120906,Zero Effect,,1998,,116,6.9,13383
10198837,Zero Tolerance,Noll tolerans,1999,,108,6.4,3288
10283693,Zero Woman: Red Handcuffs,Zeroka no onna: Akai wappa,1974,,88,6.6,783
10365960,Zero Day,,2002,,92,7.2,3840
10421090,Zerophilia,,2005,,90,6.2,2177
11592292,Zero 2,,2010,,90,7.6,5360
11790885,Zero Dark Thirty,,2012,,157,7.4,254644
12294965,Zero Charisma,,2013,,86,6.2,2384


---

**Example:** Retrieve those movies from the `movies` table whose title is 4 characters long and ends with the letter `'e'`.

---

In [50]:
%%sql

SELECT
    *
FROM
    movies
WHERE
    title LIKE '___e'
;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
71 rows affected.


id,title,orig_title,start_year,end_year,runtime,rating,nvotes
10043539,Five,,1951,,93,6.3,1068
10064694,More,,1969,,112,6.5,2155
10066500,Hope,Umut,1970,,100,8.2,2770
10067814,Love,Szerelem,1971,,88,7.9,1582
10068306,Bone,,1972,,95,6.8,905
10069158,Rage,,1972,,100,6.3,765
10071803,Mame,,1974,,132,6.1,2490
10080716,Fame,,1980,,134,6.6,18864
10087182,Dune,,1984,,137,6.5,113255
10088930,Clue,,1985,,94,7.3,71433


---

**Example:** Retrieve those movies from the `movies` table whose title contains the character `'%'`.

---

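Since `%` is itself a wildcard, we need to *escape* it. An escape character tells SQL to treat a `%` or `_` that immediately follows it as a literal character rather than as a wildcard. By default, Postgres uses the backslash `\` as the escape character, but we can choose a different one with the `ESCAPE` keyword: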

In [51]:
%%sql

SELECT
    *
FROM
    movies
WHERE
    title LIKE '%$%%' ESCAPE '$'
;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
3 rows affected.


id,title,orig_title,start_year,end_year,runtime,rating,nvotes
10487092,Who the #$&% Is Jackson Pollock?,,2006,,74,7.0,1134
12662228,10%: What Makes a Hero?,,2013,,88,6.8,543
11869226,100% Love,,2011,,141,7.0,2369


Pattern matching could also be done using regular expressions in two different ways:
- `SIMILAR TO`: This is the SQL standard's definition of a regular expression, which is a mix between the `LIKE` and common regular expressions
- `~`: This is the POSIX regular expression operator

> **Note:** With `SIMILAR TO`, the entire string must match the pattern. This is unlike the `~` operator (and most regex tools), which returns true if the pattern matches anywhere in the string!

You can find more information on this in the Postgres documentation [here](https://www.postgresql.org/docs/current/functions-matching.html).

In [52]:
%%sql

SELECT
    'abc' SIMILAR TO '%(b|d)%',
    'abc' SIMILAR TO '(b|c)_';

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
1 rows affected.


?column?,?column?_1
True,False


---

**Example:** Select movies from the `movies` table whose title starts and ends with a digit.

---

In [53]:
%%sql

SELECT
    *
FROM
    movies
WHERE
    title ~ '^\d.*\d$'
;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
58 rows affected.


id,title,orig_title,start_year,end_year,runtime,rating,nvotes
10048918,1984,,1956,,90,7.0,2837
10068156,1776,,1972,,141,7.6,7250
10074084,1900,Novecento,1976,,317,7.7,20696
10078721,10,,1979,,122,6.1,14224
10080319,9 to 5,Nine to Five,1980,,109,6.8,25199
10087803,1984,Nineteen Eighty-Four,1984,,113,7.1,61070
10109001,1-900,06,1994,,87,6.2,576
10112257,301/302,"301, 302",1995,,100,6.4,938
10126765,23,,1998,,99,7.3,6219
10212712,2046,,2004,,129,7.4,47553
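
Unlike `LIKE` and `SIMILAR TO`, the `~` operator succeeds if the regular expression matches *anywhere* in the string, which is why the pattern above needs the `^` and `$` anchors. Postgres also has `~*` for case-insensitive matching. Python's `re.search` follows the same "match anywhere" rule. Below, `pg_match` is a hypothetical helper. Postgres' regex flavour and Python's `re` differ in some details, but `\d`, `.`, `*`, `^`, and `$` behave the same in this example.

```python
import re


def pg_match(text, pattern, ignore_case=False):
    """Model text ~ pattern (or ~* when ignore_case=True): true if the regex matches ANYWHERE."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(pattern, text, flags) is not None


titles = ["1984", "9 to 5", "1-900", "Zero 2", "Dune"]
print([t for t in titles if pg_match(t, r"^\d.*\d$")])  # ['1984', '9 to 5', '1-900']
print(pg_match("abc", "b"))  # True: no need to match the whole string
```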


### `IN`

Sometimes we want to check whether a column value matches any one of the items in a list. We can express this with an `OR` operator:

```sql
SELECT
    column1, column2
FROM
    table1
WHERE
    column1 = value1
    OR
    column1 = value2
    OR
    column1 = value3
;
```

This can be rewritten more succinctly using the `IN` operator:

```sql
SELECT
    column1, column2
FROM
    table1
WHERE
    column1 [NOT] IN (value1, value2, value3)
;
```
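
One subtlety of `IN` comes from `NULL`. If no list item matches but the list contains a `NULL`, the result of `IN` is unknown (`NULL`) rather than `FALSE`, so `NOT IN` is also `NULL`. Because `WHERE` keeps only rows whose condition is `TRUE`, a `NOT IN` list containing a `NULL` returns no rows at all. The hypothetical helpers below model this three-valued logic, using Python's `None` for `NULL`. The movie titles are made-up inputs.

```python
def sql_in(value, items):
    """Three-valued IN: returns True, False, or None (SQL NULL / unknown)."""
    if value is None:
        return None
    if any(item == value for item in items if item is not None):
        return True
    if any(item is None for item in items):
        return None  # no match, but the NULL item *might* have matched
    return False


def sql_not_in(value, items):
    result = sql_in(value, items)
    return None if result is None else not result


print(sql_not_in("Heat", ["Ronin", "Alien"]))  # True
print(sql_not_in("Heat", ["Ronin", None]))     # None, so WHERE would drop the row
```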

---

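**Example:** Retrieve rows from the `movies` table that correspond to the movies `'Donnie Brasco'`, `'The Usual Suspects'`, `'Schindler''s List'`, `'Shutter Island'`, `'A Beautiful Mind'` (note that a single quote inside a string is written as two single quotes, as in `'Schindler''s List'`).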

---

In [54]:
%%sql

SELECT
    *
FROM
    movies
WHERE
    title IN ('Donnie Brasco',
              'The Usual Suspects',
              'Schindler''s List',
              'Shutter Island',
              'A Beautiful Mind'
               )
;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
5 rows affected.


id,title,orig_title,start_year,end_year,runtime,rating,nvotes
10108052,Schindler's List,,1993,,195,8.9,1110590
10114814,The Usual Suspects,,1995,,106,8.5,922333
10119008,Donnie Brasco,,1997,,127,7.7,258120
10268978,A Beautiful Mind,,2001,,135,8.2,784095
11130884,Shutter Island,,2010,,138,8.1,1027318


### `BETWEEN`

The `BETWEEN` keyword is useful when we want to select a range of values, and it can be used with number, character, and datetime ranges:

```sql
SELECT
    column1, column2
FROM
    table1
WHERE
    column1 [NOT] BETWEEN value1 AND value2
;
```

> **Note:** `BETWEEN` is **inclusive** of both ends of the interval.

We can try it out using a `SELECT` statement without any tables:

In [55]:
%%sql

SELECT 
    5 BETWEEN 1 AND 10,
    DATE '2021-11-01' BETWEEN DATE '2021-01-01' AND '2021-11-10',
    'w' BETWEEN 'e' AND 'm';

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
1 rows affected.


?column?,?column?_1,?column?_2
True,True,False
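
`x BETWEEN low AND high` is shorthand for `low <= x AND x <= high`, which Python can write as a chained comparison. The bounds are used in the order given: `5 BETWEEN 10 AND 1` is false. Postgres also offers `BETWEEN SYMMETRIC`, which accepts the bounds in either order. The `between` helper below is purely illustrative. Also note that Python compares strings by code point, while Postgres string comparisons follow the database collation; the two agree for lowercase ASCII letters as in this example.

```python
from datetime import date


def between(x, low, high):
    """x BETWEEN low AND high: inclusive at both ends; bounds are NOT swapped."""
    return low <= x <= high


print(between(5, 1, 10))                                                  # True
print(between(date(2021, 11, 1), date(2021, 1, 1), date(2021, 11, 10)))  # True
print(between("w", "e", "m"))                                             # False
print(between(10, 1, 10))                                                 # True: inclusive
print(between(5, 10, 1))                                                  # False: low must come first
```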


---

**Example:** Retrieve the title, production year and rating of 5 movies from the `movies` table that are produced between 2018 and 2020, and have a rating of at least 8 with at least 100000 votes. Sort the results in ascending order of rating.

---

In [56]:
%%sql

SELECT
    title, start_year, rating
FROM
    movies
WHERE
    start_year BETWEEN 2018 AND 2020
    AND
    rating >= 8
    AND
    nvotes >= 100000
ORDER BY
    rating
LIMIT
    5
;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
5 rows affected.


title,start_year,rating
Once Upon a Time... in Hollywood,2019,8.0
Toy Story 4,2019,8.0
Bohemian Rhapsody,2018,8.0
Green Book,2018,8.2
Spider-Man: Into the Spider-Verse,2018,8.4


### `IS NULL`

Trying to find `NULL` values using `WHERE column = NULL` does not work. Since a `NULL` value is by definition unknown and _could be anything_, the comparison `column = NULL` evaluates to `NULL` (not `TRUE`) for every row, so no rows are returned. To find `NULL` values in a column, we can use `IS NULL`:

```sql
SELECT
    column1, column2
FROM
    table1
WHERE
    column1 IS [NOT] NULL
;
```
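
The reason `= NULL` finds nothing can be modelled with Python's `None` standing in for `NULL`. A comparison involving `NULL` yields `NULL`, `IS NULL` always yields `TRUE` or `FALSE`, and `WHERE` keeps a row only when its condition is `TRUE`. The helpers below are hypothetical, and the list of titles is a small sample of values like those in `orig_title`.

```python
def sql_eq(a, b):
    """SQL '=': a comparison involving NULL (None) gives NULL, not True/False."""
    if a is None or b is None:
        return None
    return a == b


def sql_is_null(a):
    """SQL 'IS NULL' always returns True or False."""
    return a is None


def where_keeps(condition):
    """WHERE keeps a row only if the condition is TRUE (not FALSE, not NULL)."""
    return condition is True


orig_titles = ["Orphée", None, "Rashômon", None]
print([t for t in orig_titles if where_keeps(sql_eq(t, None))])     # []
print([t for t in orig_titles if where_keeps(sql_is_null(t))])      # [None, None]
print([t for t in orig_titles if where_keeps(not sql_is_null(t))])  # ['Orphée', 'Rashômon']
```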

---

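**Example:** Find movies in the `movies` table whose original title (`orig_title`) differs from the one listed in the `title` column. In this table, `orig_title` appears to be filled in only when it differs from `title`, so we just need the rows where it is not `NULL`.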

---

In [57]:
%%sql

SELECT
    *
FROM
    movies
WHERE
    orig_title IS NOT NULL
;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
8270 rows affected.


id,title,orig_title,start_year,end_year,runtime,rating,nvotes
10041719,Orpheus,Orphée,1950,,95,8.0,9346
10041931,Stromboli,"Stromboli, terra di Dio",1950,,107,7.3,5239
10042355,Story of a Love Affair,Cronaca di un amore,1950,,98,7.1,2209
10042619,Diary of a Country Priest,Journal d'un curé de campagne,1951,,115,8.0,8621
10042692,Variety Lights,Luci del varietà,1950,,97,7.1,2416
10042804,The Young and the Damned,Los olvidados,1950,,85,8.3,16453
10042810,Operation Disaster,Morning Departure,1950,,102,7.0,668
10042876,Rashomon,Rashômon,1950,,88,8.2,138304
10042906,La Ronde,La ronde,1950,,93,7.6,4456
10043048,To Joy,Till glädje,1950,,98,7.2,2109


## Column Aliases with `AS`

In SQL, we don't have to use the column and table names from the schema as they are. We can create **aliases** for a column or a table with the following syntax:

```sql
SELECT
    column1 [AS] c1,
    column2 [AS] c2
FROM
    table1 [AS] t1
;
```

Note that the keyword `AS` is optional. I usually choose to use it because it makes the query more readable.

We will use table aliases a lot when we work on SQL joins in the upcoming lectures!

In [58]:
%%sql

SELECT
    title AS movieTitle,
    orig_title AS "original Title",
    runtime AS Duration
FROM
    movies;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
26058 rows affected.


movietitle,original Title,duration
Kate & Leopold,,118
Mister 880,,90
Black Hand,,92
Francis,,91
Orpheus,Orphée,95
Stromboli,"Stromboli, terra di Dio",107
Woman in Hiding,,92
Abbott and Costello in the Foreign Legion,,80
Annie Get Your Gun,,107
Armored Car Robbery,,67


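Note that we've used a column alias with a space in its name. This is generally not a good practice, but if you absolutely need to do it, in Postgres you should enclose the alias in double quotes, e.g. `"alias name"`. Double quotes are also needed to keep uppercase letters in a name, because Postgres converts unquoted identifiers to lowercase: this is why `movieTitle` and `Duration` appear as `movietitle` and `duration` in the output above. Finally, double quotes let you use a reserved keyword as an identifier, e.g. a column named `"order"`.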
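
Postgres' rule for (ASCII) identifier names is simple: unquoted names are folded to lowercase, and double-quoted names are kept exactly as written, with `""` inside the quotes standing for a literal `"`. The hypothetical helper below models that rule for illustration.

```python
def normalize_identifier(ident):
    """Quoted identifiers keep their exact spelling; unquoted ones are lowercased."""
    if len(ident) >= 2 and ident.startswith('"') and ident.endswith('"'):
        return ident[1:-1].replace('""', '"')  # "" inside quotes stands for one "
    return ident.lower()


print(normalize_identifier("movieTitle"))      # movietitle
print(normalize_identifier("Duration"))        # duration
print(normalize_identifier('"Release Year"'))  # Release Year
```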

> **Note:** we **cannot** use column aliases in the `WHERE` clause, because `WHERE` is evaluated before the `SELECT` list in which the aliases are defined. The following query will throw an error:

In [59]:
%%sql

SELECT
    title AS movieTitle,
    orig_title AS "original Title",
    runtime AS Duration
FROM
    movies
WHERE
    Duration > 100
;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
(psycopg2.errors.UndefinedColumn) column "duration" does not exist
LINE 8:     Duration > 100
            ^

[SQL: SELECT
    title AS movieTitle,
    orig_title AS "original Title",
    runtime AS Duration
FROM
    movies
WHERE
    Duration > 100
;]
(Background on this error at: https://sqlalche.me/e/14/f405)


## Derived columns

Derived columns in SQL are columns that are the result of doing operations on existing columns of a table.

For example, suppose that we want to convert the `runtime` column of our table `movies` from minutes to hours. We can do that by manipulating the `runtime` column right in the `SELECT` statement:

In [60]:
%%sql

SELECT
    title,
    runtime / 60.
FROM
    movies;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
26058 rows affected.


title,?column?
Kate & Leopold,1.9666666666666668
Mister 880,1.5
Black Hand,1.5333333333333332
Francis,1.5166666666666668
Orpheus,1.5833333333333333
Stromboli,1.7833333333333332
Woman in Hiding,1.5333333333333332
Abbott and Costello in the Foreign Legion,1.3333333333333333
Annie Get Your Gun,1.7833333333333332
Armored Car Robbery,1.1166666666666667


> Note that I've written `60.` with a decimal point on purpose. If you divide by `60` instead, both operands are integers (the `runtime` column is of type integer), so Postgres performs integer division and returns truncated integer values instead of fractional ones.
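
Integer division in Postgres truncates toward zero, so `118 / 60` gives `1` and `-7 / 2` gives `-3`. This differs from Python's `//` operator, which rounds toward negative infinity. The hypothetical helper below reproduces the Postgres behaviour for two integers.

```python
def pg_int_div(a, b):
    """Integer division as Postgres does it for two integers: truncate toward zero."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


print(pg_int_div(118, 60))  # 1    (like runtime / 60 for a 118-minute movie)
print(118 / 60)             # 1.9666666666666666
print(pg_int_div(-7, 2))    # -3
print(-7 // 2)              # -4: Python floors instead of truncating
```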

SQL doesn't know what to call the derived column, and by default you will see `?column?` as the column name. We can use an alias to name the new derived column:

In [61]:
%%sql

SELECT
    title,
    runtime / 60. AS runtime_hours
FROM
    movies;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
26058 rows affected.


title,runtime_hours
Kate & Leopold,1.9666666666666668
Mister 880,1.5
Black Hand,1.5333333333333332
Francis,1.5166666666666668
Orpheus,1.5833333333333333
Stromboli,1.7833333333333332
Woman in Hiding,1.5333333333333332
Abbott and Costello in the Foreign Legion,1.3333333333333333
Annie Get Your Gun,1.7833333333333332
Armored Car Robbery,1.1166666666666667


Remember I mentioned that the `SELECT` statement is powerful, but not dangerous? Derived columns exist only in the query result: they are not saved anywhere, and they do not change the existing columns of the table.

---

**Example:** Using table `names` from the `imdb` database, find the age of all actors/actresses who are still alive. Who is the youngest person alive listed in the table?

---

In [62]:
%%sql

SELECT
    name,
    2021 - birth_year AS age
FROM
    names
WHERE
    birth_year IS NOT NULL
    AND
    death_year IS NULL
ORDER BY
    age DESC
;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
35413 rows affected.


name,age
Julia Calhoun,151
Benoît Duval,140
John Seabourne Sr.,131
Carl Stephenson,128
Pierre Charbonnier,124
Manuel R. Ojeda,123
Léonide Azar,121
Helen Leary,121
Georges Chaperot,119
Earl Rath,119
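
Note that `ORDER BY age DESC` lists the *oldest* people first, so the output above does not yet answer the second question. Sorting in ascending order puts the youngest people at the top. The implausible ages in the output (151, 140, ...) also suggest that some records are simply missing a `death_year`, which is why "alive" here only means "alive according to the table". A sketch of the corrected ordering (not run here, so no output is shown):

```sql
-- Same query as above, sorted so the youngest people come first.
SELECT
    name,
    2021 - birth_year AS age
FROM
    names
WHERE
    birth_year IS NOT NULL
    AND
    death_year IS NULL
ORDER BY
    age ASC   -- ascending: smallest age (youngest) at the top
;
```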


## Conditionals with `CASE`

The `CASE` structure is very useful in SQL: it enables us to treat a column differently based on the values in each row. Here is the syntax:

```sql
SELECT
    column1,
    CASE
        WHEN condition THEN expression
        WHEN condition THEN expression
        .
        .
        .
        ELSE expression
    END,
    column3,
    column4
FROM
    table1
;
```

For example, let's say we want to retrieve the titles of movies along with a column that, in each row, has the value "long" if the movie is over 90 minutes long, "normal" if it's between 30 and 90 minutes (inclusive), and "short" if it's under 30 minutes:

In [63]:
%%sql

SELECT
    title,
    runtime,
    CASE
        WHEN runtime > 90 THEN 'long'
        WHEN runtime BETWEEN 30 AND 90 THEN 'normal'
        ELSE 'short'
    END AS duration
FROM
    movies
;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
26058 rows affected.


title,runtime,duration
Kate & Leopold,118,long
Mister 880,90,normal
Black Hand,92,long
Francis,91,long
Orpheus,95,long
Stromboli,107,long
Woman in Hiding,92,long
Abbott and Costello in the Foreign Legion,80,normal
Annie Get Your Gun,107,long
Armored Car Robbery,67,normal
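
`CASE` checks the `WHEN` conditions from top to bottom and returns the result of the first one that is true. Rows where no condition is true fall through to `ELSE`, and that includes rows where `runtime` is `NULL` (a comparison with `NULL` is never true), so the query above would label a missing runtime as 'short'. A sketch that handles `NULL` explicitly, run on a hypothetical inline list of runtimes:

```sql
SELECT
    runtime,
    CASE
        WHEN runtime IS NULL THEN 'unknown'  -- checked first, so NULLs never reach ELSE
        WHEN runtime > 90    THEN 'long'
        WHEN runtime >= 30   THEN 'normal'   -- only reached when runtime <= 90
        ELSE 'short'
    END AS duration
FROM
    (VALUES (118), (90), (25), (NULL)) AS m(runtime)
;
```

Without an `ELSE`, rows that match no condition get `NULL` in the new column.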


## Functions & operators

### Math

We've just seen how arithmetic operators (i.e. `+`, `-`, `*`, `/`) can be used to make derived columns. Like other programming languages, PostgreSQL has the most common mathematical operators and functions built in (for a full list, see the Postgres documentation [here](https://www.postgresql.org/docs/9.0/functions-math.html)):

**Operators**:

| Operator   | Description        | Example   | Result |
|------------|--------------------|-----------|--------|
| `+`        | addition           | 2 + 3     | 5      |
| `-`        | subtraction        | 2 - 3     | -1     |
| `*`        | multiplication     | 2 * 3     | 6      |
| `/`        | division           | 4 / 2     | 2      |
| `%`        | modulo (remainder) | 5 % 4     | 1      |
| `^`        | exponentiation     | 2.0 ^ 3.0 | 8      |
| `@`        | absolute value     | @ -5.0    | 5      |

**Functions** (`dp` stands for `double precision`):

| Function                  | Description                               | Example             | Result            |
|---------------------------|-------------------------------------------|---------------------|-------------------|
| abs(x)                    | absolute value                            | `abs(-17.4)`        | 17.4              |
| ceil(dp or numeric)       | smallest integer not less than argument   | `ceil(-42.8)`       | -42               |
| exp(dp or numeric)        | exponential                               | `exp(1.0)`          | 2.71828182845905  |
| floor(dp or numeric)      | largest integer not greater than argument | `floor(-42.8)`      | -43               |
| ln(dp or numeric)         | natural logarithm                         | `ln(2.0)`           | 0.693147180559945 |
| log(b numeric, x numeric) | logarithm to base b                       | `log(2.0, 64.0)`    | 6.0000000000      |
| pi()                      | "π" constant                              | `pi()`              | 3.14159265358979  |
| power(a dp, b dp)         | a raised to the power of b                | `power(9.0, 3.0)`   | 729               |
| round(v numeric, s int)   | round to s decimal places                 | `round(42.4382, 2)` | 42.44             |
| sqrt(dp or numeric)       | square root                               | `sqrt(2.0)`         | 1.4142135623731   |

> Note the order of evaluation for different operators:
> 1. arithmetic operators (e.g. `+`, `*`)
> 2. comparison operators (e.g. `>`, `<=`)
> 3. logical operators (e.g. `AND`, `OR`)
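
A small check of these rules on literal values. Two details go beyond the list above: among the logical operators, `AND` is applied before `OR`, and Postgres evaluates repeated `^` from left to right, unlike the usual mathematical convention. Parentheses make the intended order explicit:

```sql
SELECT
    2 + 3 * 4 > 10 AND 1 = 2         AS and_first,   -- (14 > 10) AND FALSE -> false
    2 + 3 * 4 > 10 OR 1 = 2 AND 1 = 3 AS or_last,    -- TRUE OR (FALSE AND FALSE) -> true
    2 ^ 3 ^ 2                        AS left_assoc,  -- (2 ^ 3) ^ 2 = 64
    2 ^ (3 ^ 2)                      AS explicit     -- 2 ^ 9 = 512
;
```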

In [64]:
%%sql

SELECT
    25 * 2,
    ABS(-2^10),
    ROUND(23.24545, 2),
    SQRT(25),
    PI()
;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
1 rows affected.


?column?,abs,round,sqrt,pi
50,1024.0,23.25,5.0,3.141592653589793


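> **Note:** The two-argument form `ROUND(value, places)` only works with the `NUMERIC` (or equivalently `DECIMAL`) data type. The one-argument form `ROUND(value)` also accepts `double precision` and rounds to the nearest integer.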
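
In practice, this means a `double precision` value, such as the result of `SQRT()`, has to be cast to `NUMERIC` before it can be rounded to a given number of decimal places. A short sketch:

```sql
SELECT
    ROUND(SQRT(2)::NUMERIC, 3) AS rounded,    -- 1.414: cast to NUMERIC first
    ROUND(SQRT(2))             AS to_integer  -- 1: one-argument form accepts double precision
    -- ROUND(SQRT(2), 3) would raise a "function ... does not exist" error,
    -- because there is no round(double precision, integer)
;
```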

### Strings

For the full list of string functions, see the Postgres [documentation](https://www.postgresql.org/docs/9.0/functions-string.html).

| Function                                            | Description                                         | Example                           | Result     |
|-----------------------------------------------------|-----------------------------------------------------|-----------------------------------|------------|
| `string \|\| string`                                | String concatenation                                | 'Post' \|\| 'greSQL'              | PostgreSQL |
| `string \|\| non-string or non-string \|\| string`  | String concatenation with one non-string input      | 'Value: ' \|\| 42                 | Value: 42  |
| `char_length(string) or character_length(string)`   | Number of characters in string                      | char_length('jose')               | 4          |
| `lower(string)`                                     | Convert string to lower case                        | lower('SQL')                      | sql        |
| `position(substring in string)`                     | Location of specified substring                     | position('om' in 'Thomas')        | 3          |
| `substring(string [from int] [for int])`            | Extract substring                                   | substring('Postgres' from 2 for 3)| ost        |
| `upper(string)`                                     | Convert string to upper case                        | upper('mds')                      | MDS        |
| `length(string)`                                    | Number of characters in string                      | length('jose')                    | 4          |

One of the most useful string operators is `||`, the concatenation operator. It can chain several strings together, and it also accepts a non-string value as long as the other operand is a string (the value is converted to text).
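
One caveat with `||`: if any operand is `NULL`, the whole result is `NULL`, so in a query that concatenates several columns, a single missing value turns the entire string into `NULL`. The `CONCAT()` function instead skips `NULL` arguments. A minimal comparison on literal values:

```sql
SELECT
    'Rated ' || NULL       AS with_pipes,   -- NULL: || propagates NULL
    CONCAT('Rated ', NULL) AS with_concat,  -- 'Rated ': CONCAT ignores NULL
    'Value: ' || 42        AS mixed_types   -- 'Value: 42': 42 is converted to text
;
```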

---

**Example:** Using table `movies` from the `imdb` database, print: `"<title>" is <hours> hours long, and rated <rating> / 10.`. Round the number of hours to one decimal place.

---

In [65]:
%%sql

SELECT
    '"' || title || '"'
    || ' is ' || ROUND(runtime / 60., 1)
    || ' hours long, and rated '
    || rating || ' / 10.'
FROM
    movies
;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
26058 rows affected.


?column?
"""Kate & Leopold"" is 2.0 hours long, and rated 6.4 / 10."
"""Mister 880"" is 1.5 hours long, and rated 7.1 / 10."
"""Black Hand"" is 1.5 hours long, and rated 6.4 / 10."
"""Francis"" is 1.5 hours long, and rated 6.4 / 10."
"""Orpheus"" is 1.6 hours long, and rated 8 / 10."
"""Stromboli"" is 1.8 hours long, and rated 7.3 / 10."
"""Woman in Hiding"" is 1.5 hours long, and rated 6.9 / 10."
"""Abbott and Costello in the Foreign Legion"" is 1.3 hours long, and rated 6.6 / 10."
"""Annie Get Your Gun"" is 1.8 hours long, and rated 6.9 / 10."
"""Armored Car Robbery"" is 1.1 hours long, and rated 7 / 10."


We can use the `SUBSTRING(string FROM pos FOR num)` function to extract part of a string value, starting at a particular position and continuing for a specified number of characters.

Postgres also provides `SUBSTR(string, pos, num)`, which does the same thing but takes comma-separated arguments.
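
Both functions count positions from 1, and leaving out the length returns everything from the starting position to the end of the string. The calls below, on the literal `'Postgres'` from the table above, form two equivalent pairs:

```sql
SELECT
    SUBSTRING('Postgres' FROM 2 FOR 3) AS sql_standard,  -- 'ost'
    SUBSTR('Postgres', 2, 3)           AS comma_form,    -- 'ost'
    SUBSTRING('Postgres' FROM 5)       AS to_the_end,    -- 'gres'
    SUBSTR('Postgres', 5)              AS to_the_end_2   -- 'gres'
;
```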

---

**Example:** Using table `movies` from the `imdb` database, print the `title` column in upper-case letters. Also, create two more columns with the first and last three characters of the title. Name these columns "First 3 characters" and "Last 3 characters".

---

In [66]:
%%sql

SELECT
    UPPER(title),
    SUBSTRING(title FROM 1 FOR 3) AS "First 3 characters",
    SUBSTR(title, LENGTH(title) - 3, 3) AS "Last 3 characters"
FROM
    movies
;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
26058 rows affected.


upper,First 3 characters,Last 3 characters
KATE & LEOPOLD,Kat,pol
MISTER 880,Mis,88
BLACK HAND,Bla,Han
FRANCIS,Fra,nci
ORPHEUS,Orp,heu
STROMBOLI,Str,bol
WOMAN IN HIDING,Wom,din
ABBOTT AND COSTELLO IN THE FOREIGN LEGION,Abb,gio
ANNIE GET YOUR GUN,Ann,Gu
ARMORED CAR ROBBERY,Arm,ber
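
Looking at the last column, `SUBSTR(title, LENGTH(title) - 3, 3)` starts one character too early: 'Kate & Leopold' has 14 characters, so it starts at position 11 and returns 'pol' instead of 'old'. Starting at `LENGTH(title) - 2`, or using `RIGHT(string, n)`, returns the actual last three characters:

```sql
SELECT
    SUBSTR('Kate & Leopold', LENGTH('Kate & Leopold') - 3, 3) AS as_above,   -- 'pol'
    SUBSTR('Kate & Leopold', LENGTH('Kate & Leopold') - 2, 3) AS corrected,  -- 'old'
    RIGHT('Kate & Leopold', 3)                                AS with_right  -- 'old'
;
```

In the query above, either `SUBSTR(title, LENGTH(title) - 2, 3)` or `RIGHT(title, 3)` can replace the last expression.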


### Datetimes

Using the interval datatype, we can easily do time arithmetic:

In [67]:
%%sql

SELECT
    '2h 50m'::INTERVAL,
    ' 5.5h'::INTERVAL + 3 * '14:00'::TIME,
    '2021-11-1'::DATE + '3 months 12 days'::INTERVAL,
    '2:00'::TIME + '18 hours 9 seconds'::INTERVAL,
    14 * '1 day'::INTERVAL
;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
1 rows affected.


interval,?column?,?column?_1,?column?_2,?column?_3
2:50:00,"1 day, 23:30:00",2022-02-13 00:00:00,20:00:09,"14 days, 0:00:00"
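
Subtraction works too, and the result type depends on the operands: subtracting one `DATE` from another gives a plain integer number of days, while subtracting one `TIMESTAMP` from another gives an `INTERVAL`. A short sketch with arbitrary dates:

```sql
SELECT
    '2021-12-25'::DATE - '2021-11-18'::DATE                       AS days_between,  -- 37 (integer)
    '2021-11-18 12:00'::TIMESTAMP - '2021-11-18 07:30'::TIMESTAMP AS time_between   -- 04:30:00 (interval)
;
```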


In addition to `+`, `-`, `*`, and `/` operators, Postgres also provides useful functions for working with datetimes and intervals. Let's look at a few of those here:

In [68]:
%%sql

SELECT
    CURRENT_DATE,
    NOW(),
    CURRENT_TIMESTAMP(0),
    CURRENT_TIME(0),
    LOCALTIMESTAMP(0),
    LOCALTIME(2)
;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
1 rows affected.


current_date,now,current_timestamp,current_time,localtimestamp,localtime
2021-11-18,2021-11-18 07:39:44.472676-08:00,2021-11-18 07:39:44-08:00,07:39:44-08:00,2021-11-18 07:39:44,07:39:44.470000


The optional argument to `CURRENT_TIMESTAMP()`, `CURRENT_TIME()`, `LOCALTIMESTAMP()`, and `LOCALTIME()` specifies the precision, i.e. the number of fractional digits kept in the seconds field.

In [69]:
%%sql

SELECT
    CURRENT_TIME,
    LOCALTIME,
    LOCALTIMESTAMP
;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
1 rows affected.


current_time,localtime,localtimestamp
07:39:44.756441-08:00,07:39:44.756441,2021-11-18 07:39:44.756441


`NOW()` and `CURRENT_TIMESTAMP` are equivalent, with the latter being SQL-standard.

> Note that both of these functions/variables are timezone-aware.
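
Concretely, they return `timestamp with time zone` values (note the `-08:00` offsets in the earlier output), whereas `LOCALTIMESTAMP` returns a `timestamp without time zone`. This can be checked with `pg_typeof()`, a Postgres function that reports the data type of a value:

```sql
SELECT
    pg_typeof(NOW())             AS now_type,       -- timestamp with time zone
    pg_typeof(CURRENT_TIMESTAMP) AS standard_type,  -- timestamp with time zone
    pg_typeof(LOCALTIMESTAMP)    AS local_type,     -- timestamp without time zone
    NOW() = CURRENT_TIMESTAMP    AS same_value      -- true: both give the transaction's start time
;
```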

In [70]:
%%sql

SELECT
    EXTRACT(hour FROM NOW()),
    EXTRACT(year FROM '2021-11-15 8:00:00'::TIMESTAMP)
;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
1 rows affected.


extract,extract_1
7,2021
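
`EXTRACT()` accepts many other fields besides `hour` and `year`. A few examples on a fixed date (Monday, 15 November 2021); `dow` numbers the days of the week starting from Sunday = 0, and `epoch` on an interval gives its length in seconds:

```sql
SELECT
    EXTRACT(month FROM '2021-11-15'::DATE) AS month_num,    -- 11
    EXTRACT(dow   FROM '2021-11-15'::DATE) AS day_of_week,  -- 1 (Monday)
    EXTRACT(doy   FROM '2021-11-15'::DATE) AS day_of_year,  -- 319
    EXTRACT(epoch FROM '1 day'::INTERVAL)  AS seconds       -- 86400
;
```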


In [71]:
%%sql

SELECT
    age(NOW(), '1979-01-05'::TIMESTAMP)
;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
1 rows affected.


age
"15643 days, 7:39:45.214928"


The result of `age()` is formatted nicely in `psql` and pgAdmin (as years, months, and days), but in `ipython-sql` it unfortunately shows up as a number of days plus a time of day.
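
The day count above appears to come from the Python side converting the interval's years and months at a fixed 365 and 30 days each: 42 × 365 + 10 × 30 + 13 = 15643 for the 42 years, 10 months, and 13 days between 5 January 1979 and 18 November 2021. Two ways to keep the calendar-aware result are to cast the interval to text or to extract the part you need. A sketch with a fixed end date, so the output does not depend on the current day:

```sql
SELECT
    age('2021-11-18'::TIMESTAMP, '1979-01-05'::TIMESTAMP)::TEXT              AS as_text,  -- '42 years 10 mons 13 days' (default IntervalStyle)
    EXTRACT(year FROM age('2021-11-18'::TIMESTAMP, '1979-01-05'::TIMESTAMP)) AS years     -- 42
;
```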

Postgres also provides functions for converting strings to dates and timestamps (`to_date()`, `to_timestamp()`), and for formatting dates, timestamps, and intervals as strings (`to_char()`):

In [72]:
%%sql

SELECT
    to_date('2021', 'YYYY'),
    to_date('05 Dec 2000', 'DD Mon YYYY'),
    to_timestamp('05 Dec 2021', 'DD Mon YYYY')
;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
1 rows affected.


to_date,to_date_1,to_timestamp
2021-01-01,2000-12-05,2021-12-05 00:00:00-08:00


In [73]:
%%sql

SELECT
    to_char(current_timestamp, 'Day, DD  HH:MI'),
    to_char(interval '15h 2m 12s', 'HH12:MI:SS')
;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
1 rows affected.


to_char,to_char_1
"Thursday , 18 07:39",03:02:12


For a full description of various datetime formatting functions and detailed string formatting patterns, see the documentation [here](https://www.postgresql.org/docs/current/functions-formatting.html).
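
Two details show up in the `to_char()` output above: `Day` is blank-padded to nine characters (hence the space in `Thursday ,`), and `HH12` uses a 12-hour clock, so 15 hours appear as `03`. The `FM` ("fill mode") prefix suppresses the padding, and `HH24` gives a 24-hour clock. A sketch on fixed values:

```sql
SELECT
    to_char('2021-11-18'::DATE, 'Day, DD Mon YYYY')   AS padded,     -- 'Thursday , 18 Nov 2021'
    to_char('2021-11-18'::DATE, 'FMDay, DD Mon YYYY') AS fill_mode,  -- 'Thursday, 18 Nov 2021'
    to_char(INTERVAL '15h 2m 12s', 'HH24:MI:SS')      AS hours_24    -- '15:02:12'
;
```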

---

**Example:** Using table `names` from the `imdb` database, retrieve the name and age in years of each living person (according to the table!) using the `AGE()` function. You will need to convert data types for this.

---

In [74]:
%%sql

SELECT
    name,
    EXTRACT(
        year FROM AGE(NOW(), to_date(birth_year::varchar, 'YYYY')))
        AS "Age"
FROM
    names
WHERE
    death_year IS NULL
;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
95083 rows affected.


name,Age
Brigitte Bardot,87
Olivia de Havilland,105
Kirk Douglas,105
Sophia Loren,87
Raquel Welch,81
Li Gong,56
Armin Mueller-Stahl,91
Gérard Pirès,79
John Cleese,82
Brad Pitt,58
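
Since only the birth year is stored, this query treats everyone as born on 1 January, so an age can be one year too high for someone whose birthday has not come yet this year. As an alternative to the string round trip through `to_date()`, Postgres (9.4 and later) has `make_date(year, month, day)`, which builds a `DATE` directly from integers. A sketch with a fixed birth year and end date:

```sql
SELECT
    to_date(1963::VARCHAR, 'YYYY')                                    AS via_string,     -- 1963-01-01
    make_date(1963, 1, 1)                                             AS via_make_date,  -- 1963-01-01
    EXTRACT(year FROM AGE('2021-11-18'::DATE, make_date(1963, 1, 1))) AS age_years       -- 58
;
```

In the query above, `make_date(birth_year, 1, 1)` can replace `to_date(birth_year::varchar, 'YYYY')`.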


### Nulls

`NULLIF(value1, value2)` returns `NULL` if `value1` and `value2` are equal, and `value1` otherwise. This is helpful for replacing a known value with nulls, or, for example, for preventing division by zero.

In [75]:
%%sql

SELECT
    *,
    NULLIF(genre, 'drama')
FROM
    movie_genres
;

   postgresql://postgres:***@localhost:5432/
 * postgresql://postgres:***@localhost:5432/imdb_dsci513
57633 rows affected.


movie_id,genre,nullif
10035423,comedy,comedy
10035423,fantasy,fantasy
10035423,romance,romance
10042742,comedy,comedy
10042742,crime,crime
10042742,romance,romance
10041181,crime,crime
10041181,film-noir,film-noir
10041181,thriller,thriller
10041387,comedy,comedy
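
None of the rows shown above happen to be dramas, so the `nullif` column simply repeats `genre` there. For the division-by-zero case, dividing by zero raises an error, but dividing by `NULLIF(divisor, 0)` yields `NULL` instead. A sketch on literal values:

```sql
SELECT
    10 / NULLIF(5, 0) AS normal_case,  -- 2: NULLIF returns 5 unchanged
    10 / NULLIF(0, 0) AS zero_divisor  -- NULL instead of a "division by zero" error
;
```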


### Postgres-specific functions

There are also a number of informative functions that are specific to Postgres. You can find a list of all of them here: [link](https://www.postgresql.org/docs/current/functions-info.html).

One that I find particularly useful is `pg_typeof()`, which I use to check the type of a value after doing computations:

In [116]:
%%sql

SELECT
    pg_typeof(54 / 3.),
    pg_typeof(100 > 1)
;

 * postgresql://postgres:***@localhost:5432/imdb_dsci513
   postgresql://postgres:***@localhost:5432/world_dsci513
1 rows affected.


pg_typeof,pg_typeof_1
numeric,boolean
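
For instance, `pg_typeof()` confirms several of the type rules from this lecture: integer division stays `integer`, a literal with a decimal point makes the result `numeric`, and `^` and `SQRT()` with integer arguments return `double precision`:

```sql
SELECT
    pg_typeof(54 / 3)   AS int_division,      -- integer
    pg_typeof(54 / 3.)  AS numeric_division,  -- numeric
    pg_typeof(2 ^ 3)    AS exponentiation,    -- double precision
    pg_typeof(SQRT(25)) AS square_root        -- double precision
;
```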
