You can connect to a database and issue SQL commands from inside a Jupyter Notebook. This requires the `ipython-sql` package, which can be installed via `pip`. Once installed, load it with the `%load_ext sql` magic command, as shown below. SQL can then be issued with `%sql` for a single-line command or `%%sql` for a multi-line cell. See below for examples.

To issue PostgreSQL meta-commands (such as `\d`), `pgspecial` must also be installed via `pip`. (It is not yet clear how to use it.)

In [2]:
%load_ext sql

import numpy as np
import pandas as pd

In [4]:
%%sql

postgres://postgres:xyzaaa@localhost/postgres
        

'Connected: postgres@postgres'

In [7]:
%%sql
SELECT * FROM dft;

 * postgres://postgres:***@localhost/postgres
10 rows affected.


index,A,B,C,D,E
0,1.0,1.32921217264919,,-0.31628035962143,-0.990810386640961
1,2.0,-1.07081625562024,-1.4387132798348,0.564416851519634,0.295721887622338
2,3.0,-1.62640423331069,0.219565198748076,0.678804799025063,1.88927273141528
3,4.0,0.961538398678316,0.104011195683739,-0.481165317281003,0.850228531228244
4,5.0,1.45342466640608,1.05773743558111,0.165561607158273,0.515018378039662
5,6.0,-1.33693568578291,0.562861136707544,1.39285482506846,-0.0633279834506061
6,7.0,0.121668361534558,1.2076025381991,-0.0020402149102765,1.62779574448215
7,8.0,0.35449278572224,1.03752763272633,-0.385683512767947,0.51981800078432
8,9.0,1.6865828874516,-1.32596314576274,1.42898370021096,-2.08935427746937
9,10.0,-0.129819937432298,0.631522949364332,-0.586538064294987,0.290720080977682


In [31]:
%sql data << SELECT * from cities; -- the output of a query can be assigned to a local variable

 * postgres://postgres:***@localhost/postgres
1 rows affected.
Returning data to local variable data


In [32]:
data

name,location
SF,"(-194,53)"


In [19]:
import pgspecial


In [34]:
df1 = data.DataFrame() ##save the query result to a dataframe
df1

Unnamed: 0,name,location
0,SF,"(-194,53)"


In [21]:
%sql \d #not working

 * postgres://postgres:***@localhost/postgres


ImportError: pgspecial not installed

In [36]:
%sql DROP TABLE df1;
%sql PERSIST df1 #PERSIST is ipython-sql specific PSEUDO-SQL command to save dataframe into a same-name database table
%sql SELECT * FROM df1; 

 * postgres://postgres:***@localhost/postgres
Done.
 * postgres://postgres:***@localhost/postgres
 * postgres://postgres:***@localhost/postgres
1 rows affected.


index,name,location
0,SF,"(-194,53)"


### SQL

In [49]:
%%sql

CREATE TABLE mydata (
id INTEGER, 
    language VARCHAR(20),
    author VARCHAR(25),
    year INTEGER);

 * postgres://postgres:***@localhost/postgres
Done.


[]

In [52]:
%%sql
INSERT INTO mydata VALUES (1, 'Fortran', 'Backus', 1955), (2, 'Lisp' , 'McCarthy', 1958);
INSERT INTO mydata (id, author, language, year) VALUES (3, 'Hopper', 'Cobol', 1959);
SELECT * FROM mydata;

 * postgres://postgres:***@localhost/postgres
2 rows affected.
1 rows affected.
3 rows affected.


id,language,author,year
1,Fortran,Backus,1955
2,Lisp,McCarthy,1958
3,Cobol,Hopper,1959


Notice how multiple records were inserted without listing the column names (the values are matched to columns in table-definition order). Also notice how the third record was inserted: its column list gives the fields in an order different from the default, and the values follow that order.
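
The two insertion styles can be reproduced outside the notebook. The sketch below is only an illustration: it uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for the PostgreSQL server (an assumption, not the setup used above), and the table mirrors `mydata`.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mydata (id INTEGER, language VARCHAR(20), "
    "author VARCHAR(25), year INTEGER)"
)

# No column list: values are matched to columns in table-definition order.
conn.execute(
    "INSERT INTO mydata VALUES (1, 'Fortran', 'Backus', 1955), "
    "(2, 'Lisp', 'McCarthy', 1958)"
)

# Explicit column list: values follow the order given in the list.
conn.execute(
    "INSERT INTO mydata (id, author, language, year) "
    "VALUES (3, 'Hopper', 'Cobol', 1959)"
)

rows = conn.execute(
    "SELECT id, language, author, year FROM mydata ORDER BY id"
).fetchall()
print(rows)
```

Even though the third statement lists `author` before `language`, each value lands in the column named at the same position in the list.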

#### Constraints

In [54]:
%%sql

CREATE TABLE mydatacopy (
id INTEGER NOT NULL,
language VARCHAR(20) NOT NULL,
author VARCHAR(25) NOT NULL,
year INTEGER NOT NULL,
standard VARCHAR(20) NULL);

INSERT INTO mydatacopy (id, language, author, year, standard) VALUES (1, 'prolog', 'Colmerauer', '1972', 'ISO');
INSERT INTO mydatacopy (id, language, author, year) VALUES (2, 'Perl', 'Wall', '1987');
INSERT INTO mydatacopy (id, year, standard,language, author) VALUES (3, '1964', 'ANSI', 'APL', 'Iverson');
 
SELECT * FROM mydatacopy; -- this is how to write a comment



 * postgres://postgres:***@localhost/postgres
Done.
1 rows affected.
1 rows affected.
1 rows affected.
3 rows affected.


id,language,author,year,standard
1,prolog,Colmerauer,1972,ISO
2,Perl,Wall,1987,
3,APL,Iverson,1964,ANSI


In [55]:
%sql SELECT * FROM mydatacopy WHERE standard is Null;

 * postgres://postgres:***@localhost/postgres
1 rows affected.


id,language,author,year,standard
2,Perl,Wall,1987,


In [57]:
%sql SELECT * FROM mydatacopy WHERE standard = Null; --compare this with above query

 * postgres://postgres:***@localhost/postgres
0 rows affected.


id,language,author,year,standard
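
The difference between the last two queries comes from how SQL treats `NULL`: a comparison such as `standard = NULL` never evaluates to true, so it selects no rows, while `standard IS NULL` tests for a missing value directly. A minimal sketch of the same behaviour, assuming Python's built-in `sqlite3` module as a stand-in for PostgreSQL and a cut-down copy of `mydatacopy`:

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mydatacopy (language VARCHAR(20), standard VARCHAR(20))")
conn.executemany(
    "INSERT INTO mydatacopy VALUES (?, ?)",
    [("prolog", "ISO"), ("Perl", None), ("APL", "ANSI")],  # None is stored as NULL
)

# IS NULL tests for a missing value.
is_null = conn.execute(
    "SELECT language FROM mydatacopy WHERE standard IS NULL"
).fetchall()

# "= NULL" compares against an unknown value and is never true.
eq_null = conn.execute(
    "SELECT language FROM mydatacopy WHERE standard = NULL"
).fetchall()

print(is_null)
print(eq_null)
```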


#### Primary Key Constraint

A primary key constraint indicates that a column, or group of columns, can be used as a unique identifier for rows in the table. This requires that the values be both unique and not null. So, the following two table definitions accept the same data:
```
CREATE TABLE products (
product_no integer UNIQUE NOT NULL,
name text,
price numeric
);
```
```
CREATE TABLE products (
product_no integer PRIMARY KEY,
name text,
price numeric
);
```
**Primary keys can span more than one column**; the syntax is similar to unique constraints:
```
CREATE TABLE example (
a integer,
b integer,
c integer,
PRIMARY KEY (a, c)
);
```
Adding a primary key will automatically create a unique B-tree index on the column or group of columns listed in the primary key, and will force the column(s) to be marked `NOT NULL`. 

**A table can have at most one primary key.** (There can be any number of unique and not-null constraints, which are functionally almost the same thing, but only one can be identified as the primary key.)
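
The uniqueness part of a multi-column primary key can be tried out quickly. This is a sketch under assumptions: it runs against an in-memory SQLite database through Python's `sqlite3` module rather than PostgreSQL, and it deliberately covers only uniqueness, because SQLite does not enforce `NOT NULL` on primary key columns in the same way PostgreSQL does.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE example (a INTEGER, b INTEGER, c INTEGER, PRIMARY KEY (a, c))"
)
conn.execute("INSERT INTO example VALUES (1, 10, 1)")
conn.execute("INSERT INTO example VALUES (1, 20, 2)")  # same a, different c: allowed

try:
    conn.execute("INSERT INTO example VALUES (1, 30, 1)")  # (a, c) = (1, 1) again
    duplicate_rejected = False
except sqlite3.IntegrityError as exc:
    duplicate_rejected = True
    print("rejected:", exc)

row_count = conn.execute("SELECT COUNT(*) FROM example").fetchone()[0]
print(row_count)
```

Only the combination of `a` and `c` has to be unique; either column on its own may repeat.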

In [62]:
%%sql

CREATE TABLE mydatacopy2 (
id INTEGER NOT NULL PRIMARY KEY,
language VARCHAR(20) NOT NULL,
author VARCHAR(25) NOT NULL,
year INTEGER NOT NULL,
standard VARCHAR(20) NULL);

 * postgres://postgres:***@localhost/postgres
Done.


[]

#### Unique Key Constraint

In [63]:
%%sql 

CREATE TABLE mydatacopy3 (
id INTEGER NOT NULL PRIMARY KEY,
language VARCHAR(20) NOT NULL UNIQUE,
author VARCHAR(25) NOT NULL,
year INTEGER NOT NULL,
standard VARCHAR(20) NULL);


INSERT INTO mydatacopy3 (id, language, author, year, standard) VALUES (1, 'prolog', 'Colmerauer', '1972', 'ISO');
INSERT INTO mydatacopy3 (id, language, author, year) VALUES (2, 'Perl', 'Wall', '1987');
INSERT INTO mydatacopy3 (id, year, standard,language, author) VALUES (3, '1964', 'ANSI', 'APL', 'Iverson');
INSERT INTO mydatacopy3 (id, year, standard,language, author) VALUES (3, '1964', 'ANSI', 'APL', 'Iverson');
-- The last record will not be inserted: id 3 already exists, so it violates the primary key
SELECT * FROM mydatacopy3; 


 * postgres://postgres:***@localhost/postgres
Done.
1 rows affected.
1 rows affected.
1 rows affected.


IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique constraint "mydatacopy3_pkey"
DETAIL:  Key (id)=(3) already exists.
 [SQL: "INSERT INTO mydatacopy3 (id, year, standard,language, author) VALUES (3, '1964', 'ANSI', 'APL', 'Iverson');"] (Background on this error at: http://sqlalche.me/e/gkpj)

In [64]:
%sql SELECT * FROM mydatacopy3;

 * postgres://postgres:***@localhost/postgres
3 rows affected.


id,language,author,year,standard
1,prolog,Colmerauer,1972,ISO
2,Perl,Wall,1987,
3,APL,Iverson,1964,ANSI



#### Unique key constraint 

A unique key, like a primary key, is used to keep the records in a table unique. Once the primary key of a table is defined, any other field you wish to keep unique is declared through this constraint. For example, in our database it makes sense to put a unique key constraint on the `language` field. This ensures that no two records duplicate information about the same programming language.
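
In the cell above, the failing insert was stopped by the primary key before the `UNIQUE` constraint was ever tested. The sketch below isolates the unique constraint by using a fresh `id` with a repeated language. It assumes Python's `sqlite3` module as a stand-in for PostgreSQL, and the table name `langs` is hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE langs (id INTEGER NOT NULL PRIMARY KEY, "
    "language VARCHAR(20) NOT NULL UNIQUE)"
)
conn.execute("INSERT INTO langs VALUES (1, 'prolog')")
conn.execute("INSERT INTO langs VALUES (2, 'Perl')")

try:
    # New id, but the language value already exists.
    conn.execute("INSERT INTO langs VALUES (3, 'Perl')")
    unique_violation = None
except sqlite3.IntegrityError as exc:
    unique_violation = str(exc)

languages = [row[0] for row in conn.execute("SELECT language FROM langs ORDER BY id")]
print(unique_violation)
print(languages)
```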



#### Creating New Table from Existing Table

In [66]:
%%sql

CREATE TABLE mydata4 AS SELECT * FROM mydatacopy3;
SELECT * FROM mydata4;


 * postgres://postgres:***@localhost/postgres
3 rows affected.
3 rows affected.


id,language,author,year,standard
1,prolog,Colmerauer,1972,ISO
2,Perl,Wall,1987,
3,APL,Iverson,1964,ANSI


#### Writing Some Basic Queries

In [67]:
%sql SELECT language, author FROM mydata4;

 * postgres://postgres:***@localhost/postgres
3 rows affected.


language,author
prolog,Colmerauer
Perl,Wall
APL,Iverson


In [69]:
%sql SELECT language, year FROM mydata4 ORDER BY year;

 * postgres://postgres:***@localhost/postgres
3 rows affected.


language,year
APL,1964
prolog,1972
Perl,1987


In [71]:
%sql SELECT language, author FROM mydata4 ORDER BY 2 DESC; -- 1 means language, 2 means author

 * postgres://postgres:***@localhost/postgres
3 rows affected.


language,author
Perl,Wall
APL,Iverson
prolog,Colmerauer


In [74]:
%sql SELECT language, standard FROM mydata4 WHERE standard = 'ANSI';

 * postgres://postgres:***@localhost/postgres
1 rows affected.


language,standard
APL,ANSI


In [75]:
%sql SELECT language, author FROM mydata4 WHERE YEAR > 1970 ORDER BY language;

 * postgres://postgres:***@localhost/postgres
2 rows affected.


language,author
Perl,Wall
prolog,Colmerauer


In [77]:
%sql SELECT language, year, standard FROM mydata4 WHERE YEAR > 1970 AND standard is Null;

 * postgres://postgres:***@localhost/postgres
1 rows affected.


language,year,standard
Perl,1987,


In [79]:
%sql SELECT language, author FROM mydata4 WHERE year BETWEEN 1980 AND 1990;

 * postgres://postgres:***@localhost/postgres
1 rows affected.


language,author
Perl,Wall


In [80]:
%sql SELECT language, author FROM mydata4 WHERE year NOT BETWEEN 1980 AND 1990;

 * postgres://postgres:***@localhost/postgres
2 rows affected.


language,author
prolog,Colmerauer
APL,Iverson


##### Inserting Null

In [81]:
%%sql

INSERT INTO mydata4 VALUES (4, 'Tcl', 'Ousterhout', '1988', NULL);
SELECT * from mydata4;


 * postgres://postgres:***@localhost/postgres
1 rows affected.
4 rows affected.


id,language,author,year,standard
1,prolog,Colmerauer,1972,ISO
2,Perl,Wall,1987,
3,APL,Iverson,1964,ANSI
4,Tcl,Ousterhout,1988,


In [82]:
%sql DROP TABLE mydata, mydatacopy, mydatacopy2;

 * postgres://postgres:***@localhost/postgres
Done.


[]

##### Inserting Data into a Table from Another Table

In [83]:
%%sql

CREATE TABLE mydata (language VARCHAR(20), standard VARCHAR(10));
INSERT INTO mydata SELECT language, standard FROM mydata4 WHERE standard IS NOT NULL;
SELECT * FROM mydata;



 * postgres://postgres:***@localhost/postgres
Done.
2 rows affected.
2 rows affected.


language,standard
prolog,ISO
APL,ANSI


##### Updating/Deleting Records

In [85]:
%%sql 

INSERT INTO mydata4 VALUES (4, 'Forth', 'Moore');
SELECT * FROM mydata4;

 * postgres://postgres:***@localhost/postgres
1 rows affected.
5 rows affected.


id,language,author,year,standard
1,prolog,Colmerauer,1972.0,ISO
2,Perl,Wall,1987.0,
3,APL,Iverson,1964.0,ANSI
4,Tcl,Ousterhout,1988.0,
4,Forth,Moore,,


In [87]:
%%sql

UPDATE mydata4 SET year = 1972, standard = 'ANSI' WHERE language  = 'Forth';
SELECT * FROM mydata4;

 * postgres://postgres:***@localhost/postgres
1 rows affected.
5 rows affected.


id,language,author,year,standard
1,prolog,Colmerauer,1972,ISO
2,Perl,Wall,1987,
3,APL,Iverson,1964,ANSI
4,Tcl,Ousterhout,1988,
4,Forth,Moore,1972,ANSI


In [91]:
%%sql

DELETE FROM mydata4 WHERE language = 'Forth';
SELECT * FROM mydata4;

 * postgres://postgres:***@localhost/postgres
1 rows affected.
4 rows affected.


id,language,author,year,standard
1,prolog,Colmerauer,1972,ISO
2,Perl,Wall,1987,
3,APL,Iverson,1964,ANSI
4,Tcl,Ousterhout,1988,


One should be wary of statements of the form

`DELETE FROM tablename;`

Without a qualification, `DELETE` will remove all rows from the given table, leaving it empty. The
system will not request confirmation before doing this!
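
The difference between a qualified and an unqualified `DELETE` is easy to see on a throwaway table. A minimal sketch, assuming Python's `sqlite3` module with an in-memory database as a stand-in for PostgreSQL and a cut-down copy of `mydata4`:

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mydata4 (id INTEGER, language VARCHAR(20))")
conn.executemany(
    "INSERT INTO mydata4 VALUES (?, ?)",
    [(1, "prolog"), (2, "Perl"), (3, "APL"), (4, "Forth")],
)


def count_rows():
    """Return the current number of rows in mydata4."""
    return conn.execute("SELECT COUNT(*) FROM mydata4").fetchone()[0]


# Qualified delete: only the matching row is removed.
conn.execute("DELETE FROM mydata4 WHERE language = 'Forth'")
after_qualified = count_rows()

# Unqualified delete: every remaining row is removed, with no confirmation.
conn.execute("DELETE FROM mydata4")
after_unqualified = count_rows()

print(after_qualified, after_unqualified)
```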

#### Counting Records

In [93]:
%sql SELECT COUNT(*) FROM mydata4;

 * postgres://postgres:***@localhost/postgres
1 rows affected.


count
4


In [94]:
%sql SELECT COUNT(standard) FROM mydata4;

 * postgres://postgres:***@localhost/postgres
1 rows affected.


count
2


#### Column Aliases

In [95]:
%sql SELECT id, language, author creator FROM mydata4;

 * postgres://postgres:***@localhost/postgres
4 rows affected.


id,language,creator
1,prolog,Colmerauer
2,Perl,Wall
3,APL,Iverson
4,Tcl,Ousterhout


#### `LIKE` operator

For pattern matching, `LIKE` provides two wildcard characters:
 - `%` (percent): matches any sequence of characters, including a single character or none
 - `_` (underscore): matches exactly one character
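
To make the wildcard rules concrete, here is a small sketch that re-implements them in plain Python by translating a `LIKE` pattern into a regular expression. The helper `like` is hypothetical and written only for illustration; it is case-sensitive, as PostgreSQL's `LIKE` is, and it ignores escape characters.

```python
import re


def like(value: str, pattern: str) -> bool:
    """Case-sensitive LIKE: % matches any run of characters, _ exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # zero or more characters
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # literal character
    return re.fullmatch("".join(parts), value, flags=re.DOTALL) is not None


languages = ["prolog", "Perl", "APL", "Tcl"]
matches = {
    pat: [lang for lang in languages if like(lang, pat)]
    for pat in ("p%", "P%", "_P_", "__L")
}
print(matches)
```

The four patterns are the ones used in the queries that follow, so the lists printed here can be compared with the query results.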


In [96]:
%sql SELECT author, language FROM mydata4 WHERE language LIKE 'p%';

 * postgres://postgres:***@localhost/postgres
1 rows affected.


author,language
Colmerauer,prolog


In [97]:
%sql SELECT author, language FROM mydata4 WHERE language LIKE 'P%';

 * postgres://postgres:***@localhost/postgres
1 rows affected.


author,language
Wall,Perl


In [98]:
%sql SELECT author, language FROM mydata4 WHERE language LIKE '_P_';

 * postgres://postgres:***@localhost/postgres
1 rows affected.


author,language
Iverson,APL


In [100]:
%sql SELECT author, language FROM mydata4 WHERE language LIKE '__L';

 * postgres://postgres:***@localhost/postgres
1 rows affected.


author,language
Iverson,APL


#### Mathematical Calculations

In [101]:
%sql SELECT language, year - (year%10) decade FROM mydata4;

 * postgres://postgres:***@localhost/postgres
4 rows affected.


language,decade
prolog,1970
Perl,1980
APL,1960
Tcl,1980


#### String Operation

In [102]:
%sql SELECT language, 'The '||(year/10)*10||'s' decade FROM mydata4;

 * postgres://postgres:***@localhost/postgres
4 rows affected.


language,decade
prolog,The 1970s
Perl,The 1980s
APL,The 1960s
Tcl,The 1980s


#### Aggregation and Grouping

In [104]:
%%sql
INSERT INTO mydata4 (id, language, author, year, standard) VALUES(5, 'Fortran', 'Backus', 1957, 'ANSI');
INSERT INTO mydata4 (id, language, author, year, standard) VALUES(6, 'PL/I', 'IBM', 1964, 'ECMA');
SELECT * FROM mydata4;


 * postgres://postgres:***@localhost/postgres
1 rows affected.
1 rows affected.
8 rows affected.


id,language,author,year,standard
1,prolog,Colmerauer,1972,ISO
2,Perl,Wall,1987,
3,APL,Iverson,1964,ANSI
4,Tcl,Ousterhout,1988,
5,Fortran,Backus,1957,ANSI
6,PL/I,IBM,1964,ECMA
5,Fortran,Backus,1957,ANSI
6,PL/I,IBM,1964,ECMA


In [105]:
%sql SELECT COUNT (DISTINCT YEAR) FROM mydata4;

 * postgres://postgres:***@localhost/postgres
1 rows affected.


count
5


In [106]:
%sql SELECT COUNT (DISTINCT standard) FROM mydata4;

 * postgres://postgres:***@localhost/postgres
1 rows affected.


count
3


In [107]:
%sql SELECT MIN(YEAR) FROM mydata4;

 * postgres://postgres:***@localhost/postgres
1 rows affected.


min
1957


In [108]:
%sql SELECT language, MAX(year) FROM mydata4;  --wrong query

 * postgres://postgres:***@localhost/postgres
(psycopg2.ProgrammingError) column "mydata4.language" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT language, MAX(year) FROM mydata4;
               ^
 [SQL: 'SELECT language, MAX(year) FROM mydata4;'] (Background on this error at: http://sqlalche.me/e/f405)


In [109]:
%sql SELECT language, year FROM mydata4 WHERE year = (SELECT MAX(year) FROM mydata4);

 * postgres://postgres:***@localhost/postgres
1 rows affected.


language,year
Tcl,1988


### `GROUP BY` Clause

In [112]:
%%sql

SELECT language, standard FROM mydata4 WHERE standard IS NOT NULL GROUP BY standard, language;

 * postgres://postgres:***@localhost/postgres
4 rows affected.


language,standard
Fortran,ANSI
prolog,ISO
APL,ANSI
PL/I,ECMA


The example above can look unclear. Because the query has no aggregate function, `GROUP BY standard, language` simply collapses rows that share the same (`standard`, `language`) pair into one row, so the duplicated Fortran and PL/I records each appear once. The effect is the same as `SELECT DISTINCT language, standard`.

Note: every column in the `SELECT` list that is not inside an aggregate function must normally also appear in the `GROUP BY` clause. The reverse is not required: you may group by a column that is not in the `SELECT` list.

The `GROUP BY` clause must appear right after the `FROM` clause or the (optional) `WHERE` clause. It is followed by one column or a comma-separated list of columns, and may itself be followed by `HAVING` and `ORDER BY` clauses. The general syntax is shown below:

```
SELECT column_1, aggregate_function(column_2) 
FROM table_name 
WHERE some_condition  -- this clause is optional
GROUP BY column_1
ORDER BY column_1 or aggregate_function(column2);
```

ORDER OF EXECUTION: `FROM -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY`
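
The general syntax can be exercised on a tiny table. This is a sketch under assumptions: the `sales` table and its rows are hypothetical, and the query runs on an in-memory SQLite database through Python's `sqlite3` module rather than on PostgreSQL.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")  # hypothetical table
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 100), ("south", 250), ("north", 200), ("east", 50), ("south", 150)],
)

totals = conn.execute(
    """
    SELECT region, SUM(amount) AS total   -- computed once per group
    FROM sales
    WHERE amount >= 100                   -- optional; filters rows before grouping
    GROUP BY region
    ORDER BY total DESC
    """
).fetchall()
print(totals)
```

The `east` row is removed by `WHERE` before any group is formed, which is why no `east` total appears in the result.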


In [122]:
%%sql 

CREATE TABLE employee (
id INTEGER, name VARCHAR(20), salary INTEGER, age INTEGER);


 * postgres://postgres:***@localhost/postgres
Done.
(psycopg2.ProgrammingError) syntax error at or near "1"
LINE 1: INSERT INTO employee (1, 'Harsh',2000, 19),(2, 'Dhanraj', 30...
                              ^
 [SQL: "INSERT INTO employee (1, 'Harsh',2000, 19),(2, 'Dhanraj', 3000, 20), (3, 'Ashish', 1500, 19), (4,'Harsh', 3500, 19),\n(5, 'Ashish', 1500,19);"] (Background on this error at: http://sqlalche.me/e/f405)


In [123]:
%%sql
INSERT INTO employee VALUES (1, 'Harsh',2000, 19),(2, 'Dhanraj', 3000, 20), (3, 'Ashish', 1500, 19), (4,'Harsh', 3500, 19),
(5, 'Ashish', 1500,19);

SELECT * FROM employee;

 * postgres://postgres:***@localhost/postgres
5 rows affected.
5 rows affected.


id,name,salary,age
1,Harsh,2000,19
2,Dhanraj,3000,20
3,Ashish,1500,19
4,Harsh,3500,19
5,Ashish,1500,19


In [125]:
%sql SELECT name, SUM(salary) FROM employee GROUP BY name; 

 * postgres://postgres:***@localhost/postgres
3 rows affected.


name,sum
Ashish,3000
Harsh,5500
Dhanraj,3000


In [135]:
%sql SELECT name, SUM(salary) FROM employee WHERE name != 'Dhanraj' GROUP BY name HAVING SUM(salary) < 5000; 

 * postgres://postgres:***@localhost/postgres
1 rows affected.


name,sum
Ashish,3000


#### Understanding Joining


In [139]:
%%sql

CREATE TABLE lang (id INTEGER NOT NULL PRIMARY KEY,language VARCHAR(20) NOT NULL, 
                   year INTEGER NOT NULL, standard VARCHAR(10) NULL);

CREATE TABLE auth (author_id INTEGER NOT NULL, author VARCHAR(25) NOT NULL,
                   language_id INTEGER);


 * postgres://postgres:***@localhost/postgres
Done.
Done.


[]

In [141]:
%%sql
INSERT INTO lang VALUES (1,'Prolog', 1972, 'ISO'),(2,'Perl', 1987, NULL), (3,'APL', 1964,'ISO'), (4,'TCL', 1987,NULL), (5,'BASIC', 1964, 'ANSI');
SELECT * FROM lang;

 * postgres://postgres:***@localhost/postgres
5 rows affected.
5 rows affected.


id,language,year,standard
1,Prolog,1972,ISO
2,Perl,1987,
3,APL,1964,ISO
4,TCL,1987,
5,BASIC,1964,ANSI


In [142]:
%%sql
INSERT INTO auth VALUES (5, 'Kemeny', 5), (6, 'Kurtz', 5),(1,'Colmerauer',1),(2,'Wall',2),(3,'Ousterhaut',4), (4, 'Iverson', 3);
SELECT * FROM auth;

 * postgres://postgres:***@localhost/postgres
6 rows affected.
6 rows affected.


author_id,author,language_id
5,Kemeny,5
6,Kurtz,5
1,Colmerauer,1
2,Wall,2
3,Ousterhaut,4
4,Iverson,3


In [143]:
%%sql

SELECT author, language FROM auth, lang WHERE language_id = id;

 * postgres://postgres:***@localhost/postgres
6 rows affected.


author,language
Kemeny,BASIC
Kurtz,BASIC
Colmerauer,Prolog
Wall,Perl
Ousterhaut,TCL
Iverson,APL


In [144]:
%%sql

SELECT author, language FROM auth JOIN lang ON language_id = id;

 * postgres://postgres:***@localhost/postgres
6 rows affected.


author,language
Kemeny,BASIC
Kurtz,BASIC
Colmerauer,Prolog
Wall,Perl
Ousterhaut,TCL
Iverson,APL


#### Resolving ambiguity in join columns 

In our example the join condition fields had distinct names: `id` and `language_id`. But what if the key field of our languages table (`lang`) were also named `language_id`? This would create an ambiguity in the join condition, which would become the confusing `language_id = language_id`. To resolve this, we qualify each column by prefixing it with the name of the table it belongs to and a `.` (period).

In [145]:
%%sql

SELECT author, language FROM auth JOIN lang ON auth.language_id = lang.id;

 * postgres://postgres:***@localhost/postgres
6 rows affected.


author,language
Kemeny,BASIC
Kurtz,BASIC
Colmerauer,Prolog
Wall,Perl
Ousterhaut,TCL
Iverson,APL
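
The ambiguous case itself is not shown above, because the notebook's `lang` table uses `id` as its key. The sketch below builds a hypothetical variant in which both tables have a `language_id` column, using Python's `sqlite3` module as a stand-in for PostgreSQL (an assumption; the exact error text differs between database engines).

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
# Hypothetical variant of lang whose key column is also called language_id.
conn.execute("CREATE TABLE lang (language_id INTEGER PRIMARY KEY, language TEXT)")
conn.execute("CREATE TABLE auth (author TEXT, language_id INTEGER)")
conn.execute("INSERT INTO lang VALUES (1, 'Prolog'), (2, 'Perl')")
conn.execute("INSERT INTO auth VALUES ('Colmerauer', 1), ('Wall', 2)")

try:
    # Unqualified names: the engine cannot tell which table is meant.
    conn.execute(
        "SELECT author, language FROM auth JOIN lang ON language_id = language_id"
    )
    ambiguous_error = None
except sqlite3.OperationalError as exc:
    ambiguous_error = str(exc)

# Qualified names remove the ambiguity.
rows = conn.execute(
    "SELECT author, language FROM auth JOIN lang "
    "ON auth.language_id = lang.language_id ORDER BY author"
).fetchall()

print(ambiguous_error)
print(rows)
```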


#### Self Join

A self join expresses a relationship between two columns of the same table by joining the table to itself.

In [147]:
%%sql

CREATE TABLE inflang (id INTEGER PRIMARY KEY,language VARCHAR(20) NOT NULL, influenced_by INTEGER);

INSERT INTO inflang VALUES (1, 'Fortran', NULL), (2, 'Pascal',3),(3, 'Algol',1);

SELECT * FROM inflang;


 * postgres://postgres:***@localhost/postgres
Done.
3 rows affected.
3 rows affected.


id,language,influenced_by
1,Fortran,
2,Pascal,3.0
3,Algol,1.0


In [148]:
%%sql

SELECT l1.language, l2.language AS influenced FROM inflang l1, inflang l2 WHERE l1.id = l2.influenced_by;

 * postgres://postgres:***@localhost/postgres
2 rows affected.


language,influenced
Algol,Pascal
Fortran,Algol


Notice the use of table aliases to qualify the join condition columns as separate and the use of the `AS` keyword which renames the column in the output.

##### Subqueries

A subquery, simply put, is a query written as a part of a bigger statement. Think of it as a `SELECT` statement inside another one. The result of the inner `SELECT` can then be used in the outer query. 

In [150]:
%%sql

SELECT author FROM auth WHERE language_id IN (SELECT id FROM lang WHERE language = 'TCL');

 * postgres://postgres:***@localhost/postgres
1 rows affected.


author
Ousterhaut


There are basically two types of subqueries. The first is the scalar subquery, which returns a single column of a single row. The query we ran just above behaves like a scalar subquery, because only one language is named 'TCL' (strictly, `IN` also accepts a subquery that returns several rows of one column). The other type is the table subquery, which returns a table in itself. An example is below:

In [154]:
%%sql
SELECT author, language FROM auth a,(SELECT id, language FROM lang WHERE year > 1980) n WHERE a.language_id = n.id;


 * postgres://postgres:***@localhost/postgres
2 rows affected.


author,language
Wall,Perl
Ousterhaut,TCL
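
The two kinds of subquery can be placed side by side. A minimal sketch, assuming Python's `sqlite3` module as a stand-in for PostgreSQL and cut-down copies of the `lang` and `auth` tables; the scalar form uses `=` instead of `IN` to stress that exactly one value is expected.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lang (id INTEGER PRIMARY KEY, language TEXT, year INTEGER)")
conn.execute("CREATE TABLE auth (author TEXT, language_id INTEGER)")
conn.executemany(
    "INSERT INTO lang VALUES (?, ?, ?)",
    [(1, "Prolog", 1972), (2, "Perl", 1987), (3, "APL", 1964), (4, "TCL", 1987)],
)
conn.executemany(
    "INSERT INTO auth VALUES (?, ?)",
    [("Colmerauer", 1), ("Wall", 2), ("Ousterhaut", 4), ("Iverson", 3)],
)

# Scalar subquery: one row, one column, used like a single value.
scalar_result = conn.execute(
    "SELECT author FROM auth "
    "WHERE language_id = (SELECT id FROM lang WHERE language = 'TCL')"
).fetchall()

# Table subquery: a derived table in the FROM clause, given the alias n.
table_result = conn.execute(
    "SELECT a.author, n.language FROM auth a "
    "JOIN (SELECT id, language FROM lang WHERE year > 1980) n "
    "ON a.language_id = n.id ORDER BY a.author"
).fetchall()

print(scalar_result)
print(table_result)
```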


Using subqueries in `INSERT` statements: let's first insert a record into the `lang` table.

In [155]:
%%sql

INSERT INTO lang (id, language, year, standard) VALUES(6, 'Pascal', 1970, 'ISO');
SELECT * FROM lang;


 * postgres://postgres:***@localhost/postgres
1 rows affected.
6 rows affected.


id,language,year,standard
1,Prolog,1972,ISO
2,Perl,1987,
3,APL,1964,ISO
4,TCL,1987,
5,BASIC,1964,ANSI
6,Pascal,1970,ISO


In [156]:
%%sql

INSERT INTO auth (author_id, author, language_id) VALUES(7, 'Wirth', (SELECT id FROM lang WHERE language = 'Pascal'));
SELECT * FROM auth;


 * postgres://postgres:***@localhost/postgres
1 rows affected.
7 rows affected.


author_id,author,language_id
5,Kemeny,5
6,Kurtz,5
1,Colmerauer,1
2,Wall,2
3,Ousterhaut,4
4,Iverson,3
7,Wirth,6


In [157]:
%%sql

SELECT language, standard FROM lang WHERE standard  = 'ISO' OR standard IS Null;


 * postgres://postgres:***@localhost/postgres
5 rows affected.


language,standard
Prolog,ISO
Perl,
APL,ISO
TCL,
Pascal,ISO


#### `WHERE EXISTS`

In [158]:
%%sql
SELECT year FROM lang WHERE EXISTS(SELECT author FROM auth WHERE language_id = lang.id AND language_id > 4);



 * postgres://postgres:***@localhost/postgres
2 rows affected.


year
1964
1970


#### VIEWS

In [160]:
%%sql

CREATE VIEW test AS SELECT language, author  FROM mydata4;

 * postgres://postgres:***@localhost/postgres
Done.


[]

In [161]:
%sql SELECT * FROM test;

 * postgres://postgres:***@localhost/postgres
8 rows affected.


language,author
prolog,Colmerauer
Perl,Wall
APL,Iverson
Tcl,Ousterhout
Fortran,Backus
PL/I,IBM
Fortran,Backus
PL/I,IBM


To delete view, use `DROP VIEW`.

Also, view `test` depends on table `mydata4`. If you try to drop `mydata4`, an error is raised.


In [162]:
%sql DROP TABLE mydata4; -- error raised because view 'test' depends on this table

 * postgres://postgres:***@localhost/postgres


InternalError: (psycopg2.InternalError) cannot drop table mydata4 because other objects depend on it
DETAIL:  view test depends on table mydata4
HINT:  Use DROP ... CASCADE to drop the dependent objects too.
 [SQL: 'DROP TABLE mydata4;'] (Background on this error at: http://sqlalche.me/e/2j85)

To drop the table and its dependent objects, we issue the following command:

`DROP TABLE mydata4 CASCADE;`

#### Some bits

In [163]:
%sql SELECT current_date;

 * postgres://postgres:***@localhost/postgres
1 rows affected.


current_date
2018-09-30


In [164]:
%sql SELECT (5+4)/2;

 * postgres://postgres:***@localhost/postgres
1 rows affected.


?column?
4


In [165]:
%sql SELECT (5+4)/2.;

 * postgres://postgres:***@localhost/postgres
1 rows affected.


?column?
4.5


In [166]:
%sql SELECT CHAR_LENGTH('ABCDE');  -- CHARACTER_LENGTH can also be used


 * postgres://postgres:***@localhost/postgres
1 rows affected.


char_length
5


#### PostgreSQL Official Tutorial

In [169]:
%%sql

CREATE TABLE weather(
city VARCHAR(80),
temp_lo int,
temp_hi int,  --high temperature
prcp real,    --precipitation
date date);

CREATE TABLE cities(
name VARCHAR(80),
location point); -- point is postgresql specific data type

 * postgres://postgres:***@localhost/postgres
Done.
Done.


[]

In [170]:
%%sql
INSERT INTO weather VALUES ('San Francisco', 46, 50, 0.25, '1994-11-27'), ('San Francisco', 43, 57, 0, '1994-11-29'),
('Hayward', 37, 54, NULL, '1994-11-29');

SELECT * FROM weather;

 * postgres://postgres:***@localhost/postgres
3 rows affected.
3 rows affected.


city,temp_lo,temp_hi,prcp,date
San Francisco,46,50,0.25,1994-11-27
San Francisco,43,57,0.0,1994-11-29
Hayward,37,54,,1994-11-29


In [172]:
%%sql

INSERT INTO cities VALUES ('San Francisco', '(-194,53)');
SELECT * FROM cities;


 * postgres://postgres:***@localhost/postgres
1 rows affected.
1 rows affected.


name,location
San Francisco,"(-194,53)"


In [175]:
%%sql

SELECT city, temp_lo, temp_hi, prcp, date, location FROM weather, cities WHERE city = name;

 * postgres://postgres:***@localhost/postgres
2 rows affected.


city,temp_lo,temp_hi,prcp,date,location
San Francisco,46,50,0.25,1994-11-27,"(-194,53)"
San Francisco,43,57,0.0,1994-11-29,"(-194,53)"


In [177]:
%%sql 
SELECT * FROM weather INNER JOIN cities ON (weather.city = cities.name); --alternate form, same as above


 * postgres://postgres:***@localhost/postgres
2 rows affected.


city,temp_lo,temp_hi,prcp,date,name,location
San Francisco,46,50,0.25,1994-11-27,San Francisco,"(-194,53)"
San Francisco,43,57,0.0,1994-11-29,San Francisco,"(-194,53)"


In [179]:
%%sql
SELECT *
FROM weather LEFT OUTER JOIN cities ON (weather.city = cities.name); -- every row from the left table appears at least once


 * postgres://postgres:***@localhost/postgres
3 rows affected.


city,temp_lo,temp_hi,prcp,date,name,location
San Francisco,46,50,0.25,1994-11-27,San Francisco,"(-194,53)"
San Francisco,43,57,0.0,1994-11-29,San Francisco,"(-194,53)"
Hayward,37,54,,1994-11-29,,


In [180]:
%%sql
SELECT *
FROM weather RIGHT OUTER JOIN cities ON (weather.city = cities.name); 


 * postgres://postgres:***@localhost/postgres
2 rows affected.


city,temp_lo,temp_hi,prcp,date,name,location
San Francisco,43,57,0.0,1994-11-29,San Francisco,"(-194,53)"
San Francisco,46,50,0.25,1994-11-27,San Francisco,"(-194,53)"


In [181]:
%%sql
SELECT *
FROM weather FULL OUTER JOIN cities ON (weather.city = cities.name); -- every row from both tables appears at least once


 * postgres://postgres:***@localhost/postgres
3 rows affected.


city,temp_lo,temp_hi,prcp,date,name,location
San Francisco,46,50,0.25,1994-11-27,San Francisco,"(-194,53)"
San Francisco,43,57,0.0,1994-11-29,San Francisco,"(-194,53)"
Hayward,37,54,,1994-11-29,,


In [182]:
%%sql

-- self join

SELECT W1.city, W1.temp_lo AS low, W1.temp_hi AS high,
W2.city, W2.temp_lo AS low, W2.temp_hi AS high
FROM weather W1, weather W2
WHERE W1.temp_lo < W2.temp_lo
AND W1.temp_hi > W2.temp_hi;

 * postgres://postgres:***@localhost/postgres
2 rows affected.


city,low,high,city_1,low_1,high_1
San Francisco,43,57,San Francisco,46,50
Hayward,37,54,San Francisco,46,50


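In the tutorial, the output column names differ from what we see here. In fact, when this query is run at the command prompt, the column names match the tutorial (`city`, `low`, `high`, `city`, `low`, `high`). The difference most likely comes from the notebook side: the result display appears to need unique column names, so repeated names get a numeric suffix (`city_1`, `low_1`, `high_1`).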

##### Aggregate Functions

In [183]:
%sql SELECT max(temp_lo) FROM weather;

 * postgres://postgres:***@localhost/postgres
1 rows affected.


max
46


In [184]:
%sql SELECT city FROM weather WHERE temp_lo = max(temp_lo); -- wrong

 * postgres://postgres:***@localhost/postgres
(psycopg2.ProgrammingError) aggregate functions are not allowed in WHERE
LINE 1: SELECT city FROM weather WHERE temp_lo = max(temp_lo); -- wr...
                                                 ^
 [SQL: 'SELECT city FROM weather WHERE temp_lo = max(temp_lo); -- wrong'] (Background on this error at: http://sqlalche.me/e/f405)


This query does not work since the aggregate `max` cannot be used in the `WHERE` clause. (This restriction
exists because the `WHERE` clause determines which rows will be included in the aggregate calculation;
so obviously it has to be evaluated before aggregate functions are computed.) However, as is often the
case the query can be restated to accomplish the desired result, here by using a *subquery*:

In [187]:
%sql SELECT city FROM weather WHERE temp_lo = (SELECT max(temp_lo) FROM weather);

 * postgres://postgres:***@localhost/postgres
1 rows affected.


city
San Francisco


In [188]:
%sql SELECT city, max(temp_lo) FROM weather GROUP BY city;

 * postgres://postgres:***@localhost/postgres
2 rows affected.


city,max
San Francisco,46
Hayward,37


In [189]:
%sql SELECT city, max(temp_lo) FROM weather GROUP BY city HAVING max(temp_lo) <40;

 * postgres://postgres:***@localhost/postgres
1 rows affected.


city,max
Hayward,37


In [190]:
%sql SELECT city, max(temp_lo) FROM weather WHERE city LIKE 'S%' GROUP BY city HAVING max(temp_lo) <40;

 * postgres://postgres:***@localhost/postgres
0 rows affected.


city,max


It is important to understand the interaction between aggregates and SQL's `WHERE` and `HAVING`
clauses. The fundamental difference between `WHERE` and `HAVING` is this: `WHERE` selects input
rows before groups and aggregates are computed (thus, it controls which rows go into the aggregate
computation), whereas `HAVING` selects group rows after groups and aggregates are computed. Thus,
the `WHERE` clause must not contain aggregate functions; it makes no sense to try to use an aggregate
to determine which rows will be inputs to the aggregates. On the other hand, the `HAVING` clause
always contains aggregate functions. (Strictly speaking, you are allowed to write a `HAVING` clause
that doesn't use aggregates, but it's seldom useful. The same condition could be used more efficiently
at the `WHERE` stage.)

In the previous example, we can apply the city name restriction in `WHERE`, since it needs no aggregate.
This is more efficient than adding the restriction to `HAVING`, because we avoid doing the grouping
and aggregate calculations for all rows that fail the `WHERE` check.
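
The interaction between `WHERE` and `HAVING` described above can be checked step by step. This sketch assumes Python's `sqlite3` module as a stand-in for PostgreSQL and uses the three `weather` rows from this section, reduced to the columns that matter here.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (city TEXT, temp_lo INTEGER, temp_hi INTEGER)")
conn.executemany(
    "INSERT INTO weather VALUES (?, ?, ?)",
    [("San Francisco", 46, 50), ("San Francisco", 43, 57), ("Hayward", 37, 54)],
)

# HAVING alone: groups are formed for every city, then filtered.
having_only = conn.execute(
    "SELECT city, max(temp_lo) FROM weather "
    "GROUP BY city HAVING max(temp_lo) < 40"
).fetchall()

# WHERE alone: rows are filtered first, so only matching cities are grouped.
where_only = conn.execute(
    "SELECT city, max(temp_lo) FROM weather "
    "WHERE city LIKE 'S%' GROUP BY city"
).fetchall()

# WHERE then HAVING: Hayward is dropped by WHERE, San Francisco by HAVING.
where_then_having = conn.execute(
    "SELECT city, max(temp_lo) FROM weather "
    "WHERE city LIKE 'S%' GROUP BY city HAVING max(temp_lo) < 40"
).fetchall()

print(having_only, where_only, where_then_having)
```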

### Foreign Keys

Consider the following problem: You want
to make sure that no one can insert rows in the `weather` table that do not have a matching entry
in the `cities` table. This is called maintaining the referential integrity of your data. In simplistic
database systems this would be implemented (if at all) by first looking at the `cities` table to check
if a matching record exists, and then inserting or rejecting the new `weather` records. This approach
has a number of problems and is very inconvenient, so PostgreSQL can do this for you.

In [194]:
%%sql

CREATE TABLE cities1(
city VARCHAR(80) PRIMARY KEY,
location point); 


CREATE TABLE weather1(
city VARCHAR(80) REFERENCES cities1(city),
temp_lo int,
temp_hi int,  --high temperature
prcp real,    --precipitation
date date);


 * postgres://postgres:***@localhost/postgres
Done.
Done.


[]

Now try inserting the following record:

In [199]:
%%sql 
INSERT INTO weather1 VALUES ('Berkeley', 45, 53, 0.0, '1994-11-28'); --error

 * postgres://postgres:***@localhost/postgres


IntegrityError: (psycopg2.IntegrityError) insert or update on table "weather1" violates foreign key constraint "weather1_city_fkey"
DETAIL:  Key (city)=(Berkeley) is not present in table "cities1".
 [SQL: "INSERT INTO weather1 VALUES ('Berkeley', 45, 53, 0.0, '1994-11-28');"] (Background on this error at: http://sqlalche.me/e/gkpj)

The insertion above fails because there is no matching record in the `cities1` table.
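
The same referential-integrity check can be reproduced in a self-contained script. A minimal sketch, assuming Python's `sqlite3` module as a stand-in for PostgreSQL: SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, and the PostgreSQL-specific `point` type is replaced by `TEXT`.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute("CREATE TABLE cities1 (city VARCHAR(80) PRIMARY KEY, location TEXT)")
conn.execute(
    "CREATE TABLE weather1 (city VARCHAR(80) REFERENCES cities1(city), "
    "temp_lo INTEGER, temp_hi INTEGER)"
)
conn.execute("INSERT INTO cities1 VALUES ('San Francisco', '(-194,53)')")

# A weather row for a known city is accepted.
conn.execute("INSERT INTO weather1 VALUES ('San Francisco', 46, 50)")

try:
    # 'Berkeley' has no matching row in cities1, so the insert is rejected.
    conn.execute("INSERT INTO weather1 VALUES ('Berkeley', 45, 53)")
    rejected = False
except sqlite3.IntegrityError as exc:
    rejected = True
    print("rejected:", exc)

weather_rows = conn.execute("SELECT city FROM weather1").fetchall()
print(weather_rows)
```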

[Source](www.postgresqltutorial.com/postgresql-foreign-key/) for the following section:

Let us create 2 tables as shown below - 

```
CREATE TABLE so_headers (
id SERIAL PRIMARY KEY,
customer_id INTEGER,
ship_to VARCHAR (255)
);

CREATE TABLE so_items (
item_id INTEGER NOT NULL, 
so_id INTEGER REFERENCES so_headers(id), --foreign key constraint
product_id INTEGER,
qty INTEGER,
net_price numeric,
PRIMARY KEY (item_id,so_id)
);
```
The foreign key in `so_items` can also be defined as a table-level constraint, as shown below:
```
CREATE TABLE so_items (
 item_id INTEGER NOT NULL,
 so_id INTEGER,
 product_id INTEGER,
 qty INTEGER,
 net_price NUMERIC,
 PRIMARY KEY(item_id, so_id),
 FOREIGN KEY(so_id) REFERENCES so_headers(id)
);
```

Because we didn’t specify a name for the foreign key constraint explicitly, PostgreSQL assigned a name with the pattern: `table_column_fkey`. In our example, PostgreSQL creates a foreign key constraint as `so_items_so_id_fkey`.

Each line item of a sales order must belong to a specific sales order. Each sales order can have one or many line items. This is called a one-to-many relationship. We cannot insert a row into `so_items` without referencing a valid `id` in the `so_headers` table. (This last sentence differs from the original source, which appears to be wrong.)

What will happen to the rows in the `so_items` table when a row in `so_headers` is deleted? PostgreSQL gives us the following main options: `ON DELETE RESTRICT`, `ON DELETE CASCADE` and `ON DELETE NO ACTION` (the default).

With `ON DELETE RESTRICT`, PostgreSQL refuses to delete a row in the `so_headers` table while rows in `so_items` still reference it. We specify this when we define the foreign key constraint:

    so_id int4 REFERENCES so_headers(id) ON DELETE RESTRICT
    
With `ON DELETE CASCADE`, PostgreSQL deletes all rows in the `so_items` table that reference the rows being deleted from the `so_headers` table.
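
The two delete actions can be compared side by side in a short sketch. It again assumes Python's built-in `sqlite3` as a stand-in for PostgreSQL, with trimmed-down versions of the two tables; the `delete_header` helper and the sample rows are hypothetical.

```python
import sqlite3

def delete_header(on_delete):
    """Create both tables with the given ON DELETE action, then delete order 1.

    Returns (delete_succeeded, remaining_item_count).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # needed in SQLite only
    conn.execute("CREATE TABLE so_headers (id INTEGER PRIMARY KEY, ship_to TEXT)")
    conn.execute(
        "CREATE TABLE so_items ("
        " item_id INTEGER NOT NULL,"
        f" so_id INTEGER REFERENCES so_headers(id) ON DELETE {on_delete},"
        " PRIMARY KEY (item_id, so_id))"
    )
    conn.execute("INSERT INTO so_headers VALUES (1, 'hypothetical address')")
    conn.executemany("INSERT INTO so_items VALUES (?, 1)", [(1,), (2,)])
    try:
        conn.execute("DELETE FROM so_headers WHERE id = 1")
        deleted = True
    except sqlite3.IntegrityError:
        deleted = False
    remaining = conn.execute("SELECT count(*) FROM so_items").fetchone()[0]
    conn.close()
    return deleted, remaining

restrict_result = delete_header("RESTRICT")  # delete is refused, items stay
cascade_result = delete_header("CASCADE")    # delete succeeds, items go too
print(restrict_result)
print(cascade_result)
```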

If a foreign key consists of a group of columns, we define the foreign key constraint using the following syntax:
```	
CREATE TABLE child_table(
  c1 INTEGER PRIMARY KEY,
  c2 INTEGER,
  c3 INTEGER,
  FOREIGN KEY (c2, c3) REFERENCES parent_table (p1, p2)
);
```
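
A composite foreign key checks the referenced columns as a pair, not one by one. The sketch below illustrates this with Python's built-in `sqlite3` as a stand-in for PostgreSQL. The definition of `parent_table` is an assumption, since the text does not show it; the referenced columns need a primary key or unique constraint covering them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed in SQLite only
# Assumed parent definition: (p1, p2) together form the primary key.
conn.execute(
    "CREATE TABLE parent_table (p1 INTEGER, p2 INTEGER, PRIMARY KEY (p1, p2))"
)
conn.execute(
    "CREATE TABLE child_table ("
    " c1 INTEGER PRIMARY KEY, c2 INTEGER, c3 INTEGER,"
    " FOREIGN KEY (c2, c3) REFERENCES parent_table (p1, p2))"
)
conn.execute("INSERT INTO parent_table VALUES (1, 10)")
conn.execute("INSERT INTO child_table VALUES (100, 1, 10)")  # pair (1, 10) exists

try:
    # The values 1 and 20 do not appear together in parent_table.
    conn.execute("INSERT INTO child_table VALUES (101, 1, 20)")
    pair_rejected = False
except sqlite3.IntegrityError:
    pair_rejected = True

child_count = conn.execute("SELECT count(*) FROM child_table").fetchone()[0]
print(pair_rejected, child_count)
```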

### Transactions 

The essential point of a transaction is that it bundles multiple steps into a single, all-or-nothing operation.

In PostgreSQL, a transaction is set up by surrounding the SQL commands of the transaction with
`BEGIN` and `COMMIT` commands. For example, a banking transaction that debits Alice's account would look like:
```
BEGIN;
UPDATE accounts SET balance = balance - 100.00
WHERE name = 'Alice';
-- etc etc
COMMIT;
```
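
The all-or-nothing behaviour is easiest to see when one step fails. The sketch below assumes Python's built-in `sqlite3` as a stand-in for PostgreSQL; the second account (`Bob`), the starting balances, the `CHECK` constraint and the `transfer` helper are all hypothetical. When the debit would overdraw Alice, `ROLLBACK` also undoes the credit that had already been applied.

```python
import sqlite3

# isolation_level=None turns off the sqlite3 module's implicit transactions,
# so BEGIN / COMMIT / ROLLBACK are issued explicitly, as in the SQL above.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance REAL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('Alice', 150.0), ('Bob', 50.0)")

def transfer(sender, receiver, amount):
    """Move amount between two accounts; undo both steps if either one fails."""
    conn.execute("BEGIN")
    try:
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, receiver)
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, sender)
        )
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")  # the credit to the receiver is undone as well
        return False

first = transfer("Alice", "Bob", 100.0)   # succeeds
second = transfer("Alice", "Bob", 100.0)  # would overdraw Alice, so it is rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(first, second, balances)
```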

### Window Functions

In [201]:
%%sql

CREATE TABLE empsalary (depname VARCHAR(20),
                       empno INTEGER,
                       salary INTEGER);

 * postgres://postgres:***@localhost/postgres
Done.


[]

In [202]:
%%sql

INSERT INTO empsalary VALUES ('develop',11,5200),('develop',7,4200),('develop',9,4500),('develop',8,6000),
('personnel',5,3500),('personnel',2,3900),('sales',3,4800),('sales',1,5000),('sales',4,4800);

SELECT * FROM empsalary;

 * postgres://postgres:***@localhost/postgres
9 rows affected.
9 rows affected.


depname,empno,salary
develop,11,5200
develop,7,4200
develop,9,4500
develop,8,6000
personnel,5,3500
personnel,2,3900
sales,3,4800
sales,1,5000
sales,4,4800


In [203]:
%sql SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM empsalary;

 * postgres://postgres:***@localhost/postgres
9 rows affected.


depname,empno,salary,avg
develop,11,5200,4975.0
develop,7,4200,4975.0
develop,9,4500,4975.0
develop,8,6000,4975.0
personnel,5,3500,3700.0
personnel,2,3900,3700.0
sales,3,4800,4866.666666666666
sales,1,5000,4866.666666666666
sales,4,4800,4866.666666666666


A window function call always contains an `OVER` clause directly following the window function's
name and argument(s). This is what syntactically distinguishes it from a normal function or nonwindow
aggregate. The `OVER` clause determines exactly how the rows of the query are split up for
processing by the window function. The `PARTITION BY` clause within `OVER` divides the rows into
groups, or partitions, that share the same values of the `PARTITION BY` expression(s). For each row,
the window function is computed across the rows that fall into the same partition as the current row.
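
To make the partitioning step concrete, here is a plain-Python sketch of what `avg(salary) OVER (PARTITION BY depname)` computes. It is only an illustration of the semantics, not how PostgreSQL executes the query; the variable names are made up, and the rows are copied from the `empsalary` table above.

```python
from collections import defaultdict

# Rows copied from the empsalary table above: (depname, empno, salary)
empsalary = [
    ("develop", 11, 5200), ("develop", 7, 4200), ("develop", 9, 4500),
    ("develop", 8, 6000), ("personnel", 5, 3500), ("personnel", 2, 3900),
    ("sales", 3, 4800), ("sales", 1, 5000), ("sales", 4, 4800),
]

# Step 1: split the rows into partitions keyed by the PARTITION BY expression.
partitions = defaultdict(list)
for depname, empno, salary in empsalary:
    partitions[depname].append(salary)

# Step 2: compute the aggregate once per partition.
avg_by_dep = {dep: sum(vals) / len(vals) for dep, vals in partitions.items()}

# Step 3: attach the partition's value to every row; no rows are collapsed.
windowed = [(dep, empno, salary, avg_by_dep[dep]) for dep, empno, salary in empsalary]
for row in windowed:
    print(row)
```

Unlike `GROUP BY`, the output keeps one row per input row, which is why every employee in a department shows the same average.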

In [206]:
%sql SELECT depname, empno, salary, rank() OVER (PARTITION BY depname ORDER BY salary DESC) FROM empsalary;

 * postgres://postgres:***@localhost/postgres
9 rows affected.


depname,empno,salary,rank
develop,8,6000,1
develop,11,5200,2
develop,9,4500,3
develop,7,4200,4
personnel,2,3900,1
personnel,5,3500,2
sales,1,5000,1
sales,3,4800,2
sales,4,4800,2


**Note:** The query above uses `rank()`.

As shown here, the `rank` function produces a numerical rank for each distinct `ORDER BY` value in
the current row's partition, using the order defined by the `ORDER BY` clause. `rank` needs no explicit
parameter, because its behavior is entirely determined by the `OVER` clause.
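
The ranking rule can be written out in plain Python to show how ties are handled. This is a sketch of the semantics only; the `rank_partition` helper is hypothetical and the rows are copied from the `empsalary` table above. Rows with equal salary get the same rank, and a row's rank is one more than the number of rows sorted ahead of it.

```python
from itertools import groupby
from operator import itemgetter

# Rows copied from the empsalary table above: (depname, empno, salary)
empsalary = [
    ("develop", 11, 5200), ("develop", 7, 4200), ("develop", 9, 4500),
    ("develop", 8, 6000), ("personnel", 5, 3500), ("personnel", 2, 3900),
    ("sales", 3, 4800), ("sales", 1, 5000), ("sales", 4, 4800),
]

def rank_partition(rows):
    """Mimic rank() OVER (ORDER BY salary DESC) within one partition."""
    ordered = sorted(rows, key=itemgetter(2), reverse=True)
    ranked = []
    for position, row in enumerate(ordered, start=1):
        if ranked and ranked[-1][2] == row[2]:
            rank = ranked[-1][3]  # tie: same rank as the peer row before it
        else:
            rank = position       # 1 + number of rows sorted ahead
        ranked.append(row + (rank,))
    return ranked

# PARTITION BY depname: rank each department separately.
ranked_rows = []
by_dep = sorted(empsalary, key=itemgetter(0))
for _, rows in groupby(by_dep, key=itemgetter(0)):
    ranked_rows.extend(rank_partition(list(rows)))

for row in ranked_rows:
    print(row)
```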

In [207]:
%sql SELECT salary, sum(salary) OVER () FROM empsalary;

 * postgres://postgres:***@localhost/postgres
9 rows affected.


salary,sum
5200,41900
4200,41900
4500,41900
6000,41900
3500,41900
3900,41900
4800,41900
5000,41900
4800,41900


Above, since there is no `ORDER BY` in the `OVER` clause, the window frame is the same as the partition,
which for lack of `PARTITION BY` is the whole table; in other words each sum is taken over the
whole table and so we get the same result for each output row. But if we add an `ORDER BY` clause,
we get very different results:

In [208]:
%sql SELECT salary, sum(salary) OVER (ORDER BY salary) FROM empsalary;

 * postgres://postgres:***@localhost/postgres
9 rows affected.


salary,sum
3500,3500
3900,7400
4200,11600
4500,16100
4800,25700
4800,25700
5000,30700
5200,35900
6000,41900


Here the sum is taken from the first (lowest) salary up through the current one, including any duplicates
of the current one (notice the results for the duplicated salaries).

Window functions are permitted only in the `SELECT` list and the `ORDER BY` clause of the query.
They are forbidden elsewhere, such as in `GROUP BY`, `HAVING` and `WHERE` clauses. This is because
they logically execute after the processing of those clauses. Also, window functions execute after
non-window aggregate functions. This means it is valid to include an aggregate function call in the
arguments of a window function, but not vice versa.
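
The running sum from `sum(salary) OVER (ORDER BY salary)` shown earlier can be mimicked in plain Python to show why the two 4800 rows get the same total. This is a sketch of the semantics only, and the `running_sum_with_peers` helper is hypothetical: rows with equal `ORDER BY` values are peers, and each row's sum runs through its last peer.

```python
from itertools import groupby

# Salaries copied from the empsalary table above.
salaries = [5200, 4200, 4500, 6000, 3500, 3900, 4800, 5000, 4800]

def running_sum_with_peers(values):
    """Mimic sum(x) OVER (ORDER BY x): each row sums through its last peer."""
    result = []
    total = 0
    for value, peers in groupby(sorted(values)):
        peers = list(peers)
        total += sum(peers)  # add the whole peer group at once
        # every peer row reports the same running total
        result.extend((value, total) for _ in peers)
    return result

running = running_sum_with_peers(salaries)
for row in running:
    print(row)
```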

### Inheritance

In [214]:
%%sql 

CREATE TABLE city (
name text,
population real,
altitude int -- (in ft)
);

CREATE TABLE capitals (
state char(2)
) INHERITS (city); -- Notice INHERITS


 * postgres://postgres:***@localhost/postgres
Done.
Done.


[]

In [215]:
%%sql

INSERT INTO city VALUES ('Lucknow', 1000000, 100);
INSERT INTO capitals VALUES ('Lucknow', 1000000, 100,'UP');

SELECT * FROM city;

 * postgres://postgres:***@localhost/postgres
1 rows affected.
1 rows affected.
2 rows affected.


name,population,altitude
Lucknow,1000000.0,100
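Lucknow,1000000.0,100

Two rows are returned because a query on `city` also includes rows from its child table `capitals`. Use `SELECT * FROM ONLY city` to exclude rows from child tables.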

In [216]:
%sql SELECT * FROM capitals;

 * postgres://postgres:***@localhost/postgres
1 rows affected.


name,population,altitude,state
Lucknow,1000000.0,100,UP


### Modifying Table

In [218]:
%sql SELECT * FROM mydata;


 * postgres://postgres:***@localhost/postgres
2 rows affected.


language,standard
prolog,ISO
APL,ANSI


##### Adding Column

In [219]:
%sql ALTER TABLE mydata ADD COLUMN year INTEGER;

 * postgres://postgres:***@localhost/postgres
Done.


[]

In [220]:
%sql SELECT * FROM mydata;

 * postgres://postgres:***@localhost/postgres
2 rows affected.


language,standard,year
prolog,ISO,
APL,ANSI,


##### Renaming Table

In [221]:
%sql ALTER TABLE mydata RENAME TO language;

 * postgres://postgres:***@localhost/postgres
Done.


[]

In [222]:
%sql SELECT * FROM language;

 * postgres://postgres:***@localhost/postgres
2 rows affected.


language,standard,year
prolog,ISO,
APL,ANSI,


##### Renaming Column

In [223]:
%%sql  

ALTER TABLE language RENAME COLUMN year TO decade;
SELECT * FROM language;

 * postgres://postgres:***@localhost/postgres
Done.
2 rows affected.


language,standard,decade
prolog,ISO,
APL,ANSI,


##### Adding Default

In [224]:
%%sql

ALTER TABLE language ALTER COLUMN decade SET DEFAULT 1900;
SELECT * FROM language;

 * postgres://postgres:***@localhost/postgres
Done.
2 rows affected.


language,standard,decade
prolog,ISO,
APL,ANSI,


Setting a default affects only rows inserted afterwards. Existing rows keep their current values.

In [225]:
%%sql

INSERT INTO language VALUES ('C', 'ANSI');
SELECT * FROM language;

 * postgres://postgres:***@localhost/postgres
1 rows affected.
3 rows affected.


language,standard,decade
prolog,ISO,
APL,ANSI,
C,ANSI,1900.0


##### Adding Constraint

In [226]:
%%sql 

ALTER TABLE language ALTER COLUMN standard SET NOT NULL;
INSERT INTO language VALUES('Fortran', Null);               --error
SELECT * FROM language;

 * postgres://postgres:***@localhost/postgres
Done.


IntegrityError: (psycopg2.IntegrityError) null value in column "standard" violates not-null constraint
DETAIL:  Failing row contains (Fortran, null, 1900).
 [SQL: "INSERT INTO language VALUES('Fortran', Null);"] (Background on this error at: http://sqlalche.me/e/gkpj)

The `INSERT` command fails because the `standard` column no longer accepts null values.

**Other Commands**

 - Dropping columns

`ALTER TABLE products DROP COLUMN description;`

Whatever data was in the column disappears. Table constraints involving the column are dropped, too. However, if the column is referenced by a foreign key constraint of another table, PostgreSQL will not silently drop that constraint. You can authorize dropping everything that depends on the column by adding `CASCADE`:

`ALTER TABLE products DROP COLUMN description CASCADE;`

 - To remove any default value, use:

`ALTER TABLE products ALTER COLUMN price DROP DEFAULT;`

This is effectively the same as setting the default to null. As a consequence, it is not an error to drop a default where one hadn't been defined, because the default is implicitly the null value. 

 - Removing a constraint

`ALTER TABLE products DROP CONSTRAINT some_name;`

 - To convert a column to a different data type, use a command like:

`ALTER TABLE products ALTER COLUMN price TYPE numeric(10,2);`

This will succeed only if each existing entry in the column can be converted to the new type by an
implicit cast.



#### Subquery Expressions

```
EXISTS (subquery)
expression IN (subquery)
expression NOT IN (subquery)
expression operator ANY (subquery)
expression operator SOME (subquery)
expression operator ALL (subquery)
```
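
A few of these forms can be tried in a short sketch. It assumes Python's built-in `sqlite3` as a stand-in for PostgreSQL, with made-up `cities` and `weather` rows and a hypothetical `names` helper. SQLite does not implement `ANY`, `SOME` or `ALL`, so only `EXISTS`, `IN` and `NOT IN` are shown; in PostgreSQL, `= ANY (subquery)` is equivalent to `IN (subquery)`, and `SOME` is a synonym for `ANY`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (name TEXT)")
conn.execute("CREATE TABLE weather (city TEXT, temp_lo INTEGER)")
conn.executemany(
    "INSERT INTO cities VALUES (?)",
    [("San Francisco",), ("Hayward",), ("Berkeley",)],
)
conn.executemany(
    "INSERT INTO weather VALUES (?, ?)",
    [("San Francisco", 46), ("Hayward", 37)],
)

def names(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

# Cities that have at least one weather record.
with_in = names(
    "SELECT name FROM cities WHERE name IN (SELECT city FROM weather) ORDER BY name"
)
# Cities with no weather record at all.
with_not_in = names(
    "SELECT name FROM cities WHERE name NOT IN (SELECT city FROM weather) ORDER BY name"
)
# Cities with at least one record below 40 (correlated subquery).
with_exists = names(
    "SELECT name FROM cities c WHERE EXISTS "
    "(SELECT 1 FROM weather w WHERE w.city = c.name AND w.temp_lo < 40) ORDER BY name"
)
print(with_in, with_not_in, with_exists)
```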


---
### Exercises 

The exercises are from [pgexercises](www.pgexercises.com).

There are three tables:

 - `cd.members`
 - `cd.facilities`
 - `cd.bookings`


![](images/pgschema.png)

In [227]:
%%sql

postgres://postgres:xyzaaa@localhost/exercises
        

'Connected: postgres@exercises'