# The Standard Language for Relational Databases

In [11]:
%load_ext sql

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


### Connect to the database you created
Use pgAdmin 4 to create the Ex2 database in PostgreSQL, then connect to it.

In [14]:
%%sql postgresql://postgres:329905023@localhost:5432/ex2

SET statement_timeout = 0;
SET lock_timeout = 0;
SET client_encoding = 'utf-8';
SET standard_conforming_strings = on;
SET check_function_bodies = false;
SET client_min_messages = error;

Done.
Done.
Done.
Done.
Done.
Done.


[]

In [15]:
%config SqlMagic.short_errors = False

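### 3.2 Data Definition

General form of the statement that creates a relation:

Create Table <table name> (<column name> <data type> [<column-level integrity constraint>], ..., <table-level integrity constraints>);


See the PostgreSQL [DDL](https://www.postgresql.org/docs/current/static/ddl.html) chapter. Data types include char, varchar, int, float, date, and serial; see [Data Type](https://www.postgresql.org/docs/current/static/datatype.html) for the full list.

Integrity constraints cover entity integrity (Primary Key), referential integrity (Foreign Key), and user-defined integrity (NOT NULL, UNIQUE, DEFAULT, CHECK, and so on).

Create the Students relation: the primary key is the student ID (entity integrity), the name must not be null (user-defined integrity), and the age must be an integer greater than 0 (user-defined integrity).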

In [17]:
%%sql
drop table if exists Students;
create table Students (
    sid char(10) primary key,
    name varchar(20) NOT NULL,
    age int check(age > 0));

 * postgresql://postgres:***@localhost:5432/ex2
Done.
Done.


[]

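Create the Enrolled relation: the primary key is the pair (student ID, course ID) (entity integrity), and the student ID references the primary key of Students (referential integrity). When a primary key or foreign key spans two or more attributes, it can only be declared as a table-level constraint.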

In [18]:
%%sql
drop table if exists Enrolled;
create table enrolled (
    student_id char(10) primary key references Students(sid),
    cid char(10) primary key, -- 错误：对表enrolled指定多个主键
    grade int);

 * postgresql://postgres:***@localhost:5432/ex2
Done.


ProgrammingError: (psycopg2.errors.InvalidTableDefinition) multiple primary keys for table "enrolled" are not allowed
LINE 3:     cid char(10) primary key, -- 错误：对表enrolled指定多个...
                         ^

[SQL: create table enrolled (
    student_id char(10) primary key references Students(sid),
    cid char(10) primary key, -- 错误：对表enrolled指定多个主键
    grade int);]
(Background on this error at: https://sqlalche.me/e/14/f405)

In [19]:
%%sql
drop table if exists Enrolled;
create table enrolled (
    student_id char(10) references Students(sid),
    cid char(10), 
    grade int,
    primary key(student_id, cid)); -- declared as a table-level constraint

 * postgresql://postgres:***@localhost:5432/ex2
Done.
Done.


[]

In [20]:
%%sql
drop table if exists Enrolled;
create table enrolled (
    student_id char(10),
    cid char(10), 
    grade int,
    constraint pk_en primary key(student_id, cid), 
    constraint fk_en foreign key(student_id) references Students(sid)); -- both keys declared as named table-level constraints

 * postgresql://postgres:***@localhost:5432/ex2
Done.
Done.


[]

Use CHECK to implement NOT NULL

In [21]:
%%sql 
drop table if exists Student;
create table Student(sID int, sName text CHECK(sName is NOT NULL), GPA real, sizeHS INT);

 * postgresql://postgres:***@localhost:5432/ex2
Done.
Done.


[]

In [22]:
sid = 1
name = "张三"
%sql insert into Student values(:sid, :name, NULL, 100);

 * postgresql://postgres:***@localhost:5432/ex2
1 rows affected.


[]

In [23]:
%sql insert into Student values(2, 'a', 4, 100);

 * postgresql://postgres:***@localhost:5432/ex2
1 rows affected.


[]

In [24]:
%%sql 
insert into Student values(3, NULL, 3.5, 200); 
-- Error: new row for relation "student" violates check constraint "student_sname_check"

 * postgresql://postgres:***@localhost:5432/ex2


IntegrityError: (psycopg2.errors.CheckViolation) new row for relation "student" violates check constraint "student_sname_check"
DETAIL:  Failing row contains (3, null, 3.5, 200).

[SQL: insert into Student values(3, NULL, 3.5, 200);]
(Background on this error at: https://sqlalche.me/e/14/gkpj)

In [25]:
%sql drop table Student;

 * postgresql://postgres:***@localhost:5432/ex2
Done.


[]

Try to implement keys with CHECK, to find out whether PostgreSQL supports subqueries inside a CHECK constraint

In [26]:
%%sql 
drop table if exists T;
create table T(A int check(A not in (select A from T)));

 * postgresql://postgres:***@localhost:5432/ex2
Done.


NotSupportedError: (psycopg2.errors.FeatureNotSupported) cannot use subquery in check constraint
LINE 1: create table T(A int check(A not in (select A from T)));
                                     ^

[SQL: create table T(A int check(A not in (select A from T)));]
(Background on this error at: https://sqlalche.me/e/14/tw8g)

In [27]:
%%sql 
drop table if exists T;
create table T(A int check((select count(distinct A) from T) = (select count(*) from T)));

 * postgresql://postgres:***@localhost:5432/ex2
Done.


NotSupportedError: (psycopg2.errors.FeatureNotSupported) cannot use subquery in check constraint
LINE 1: create table T(A int check((select count(distinct A) from T)...
                                   ^

[SQL: create table T(A int check((select count(distinct A) from T) = (select count(*) from T)));]
(Background on this error at: https://sqlalche.me/e/14/tw8g)

Altering and dropping tables

In [28]:
%sql ALTER TABLE Students ADD Scome DATE;

 * postgresql://postgres:***@localhost:5432/ex2
Done.


[]

In [29]:
%sql ALTER TABLE Students ALTER COLUMN Scome type timestamp; 

 * postgresql://postgres:***@localhost:5432/ex2
Done.


[]

In [30]:
%sql ALTER TABLE Students DROP Scome;

 * postgresql://postgres:***@localhost:5432/ex2
Done.


[]

In [31]:
%sql ALTER TABLE Enrolled ADD CONSTRAINT grade_check CHECK(grade >= 0 and grade <= 100);

 * postgresql://postgres:***@localhost:5432/ex2
Done.


[]

In [32]:
%sql ALTER TABLE Enrolled DROP CONSTRAINT pk_En;

 * postgresql://postgres:***@localhost:5432/ex2
Done.


[]

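Mind the order in which relations are dropped: a table that is still referenced by a foreign key cannot be dropped first, because that would violate referential integrity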

In [33]:
%%sql 
Drop Table Students;
Drop Table Enrolled;

 * postgresql://postgres:***@localhost:5432/ex2


InternalError: (psycopg2.errors.DependentObjectsStillExist) cannot drop table students because other objects depend on it
DETAIL:  constraint fk_en on table enrolled depends on table students
HINT:  Use DROP ... CASCADE to drop the dependent objects too.

[SQL: Drop Table Students;]
(Background on this error at: https://sqlalche.me/e/14/2j85)

In [None]:
%%sql 
Drop Table Enrolled;
Drop Table Students;

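### Temporal attributes

Most spatial data also carries a time attribute, so first get familiar with the PostgreSQL [timestamp](https://www.postgresql.org/docs/current/static/datatype-datetime.html) type and the related [date/time functions](https://www.postgresql.org/docs/current/static/functions-datetime.html)

CURRENT_DATE returns the current date, and CURRENT_TIMESTAMP returns the current date and time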

In [34]:
date = %sql select CURRENT_DATE
time = %sql select CURRENT_TIMESTAMP
print(date)
print(time)

 * postgresql://postgres:***@localhost:5432/ex2
1 rows affected.
 * postgresql://postgres:***@localhost:5432/ex2
1 rows affected.
+--------------+
| current_date |
+--------------+
|  2022-09-20  |
+--------------+
+----------------------------------+
|        current_timestamp         |
+----------------------------------+
| 2022-09-20 11:45:51.904512+08:00 |
+----------------------------------+


In [35]:
date = %sql select date(CURRENT_TIMESTAMP)
hour = %sql select extract(hour from timestamp '2022-09-15 11:38:40')
minute = %sql select date_part('minute', timestamp '2022-09-15 11:38:40')
print("date is " + str(date[0][0]))
print("hour is " + str(hour[0][0]))
print("minute is " + str(minute[0][0]))

 * postgresql://postgres:***@localhost:5432/ex2
1 rows affected.
 * postgresql://postgres:***@localhost:5432/ex2
1 rows affected.
 * postgresql://postgres:***@localhost:5432/ex2
1 rows affected.
date is 2022-09-20
hour is 11.0
minute is 38.0
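
For comparison, the same pieces can be pulled out of a timestamp in plain Python. This is only a sketch using the standard `datetime` module, not PostgreSQL itself; the variable names are illustrative and are not part of the notebook.

```python
from datetime import datetime

# Same literal as in the extract / date_part queries above
ts = datetime.fromisoformat("2022-09-15 11:38:40")

day = ts.date()        # comparable to date(ts)
hour = ts.hour         # comparable to extract(hour from ts)
minute = ts.minute     # comparable to date_part('minute', ts)

print("date is", day)
print("hour is", hour)
print("minute is", minute)
```

Python returns integers here, whereas the recorded PostgreSQL output above shows floating-point values for the hour and minute.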


Spatio-temporal example: create the relation ST(name, time, position) and insert some random records for two users, Tom and Rob

In [None]:
%%sql
drop table if exists ST;
create table ST (
    name varchar(10),
    time timestamp,
    position int
);

In [None]:
import random

# Tom
for i in range(8):
    position = random.randint(1, 1000)
    hour     = str(random.randint(1, 72)) + 'hours'
    %sql insert into ST values ('Tom', current_timestamp - interval :hour, :position)

# Rob
for i in range(12):
    position = random.randint(1, 1000)
    hour     = str(random.randint(1, 72)) + 'hours'
    %sql insert into ST values ('Rob', current_timestamp - interval :hour, :position)

In [None]:
%sql select * from ST order by time desc

Find Tom's current position. This is a maximum-time problem: pick the record with the latest time

In [None]:
%%sql
select * from ST where name = 'Tom' order by time desc limit 1

In [None]:
%%sql
select *
from ST
where name = 'Tom' and time >= all(select time from ST where name = 'Tom')

Find all of Rob's position records within the last day

In [None]:
%%sql 
select *
from ST
where name = 'Rob' and current_timestamp - time <= interval '24 hours'
order by time
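
The logic of the ST queries above can be mirrored in plain Python, which may make the "maximum time" and "last 24 hours" conditions easier to follow. This is a sketch under assumptions: the rows and the fixed `now` value are made up so that the result is reproducible, and they are not the random data inserted above.

```python
from datetime import datetime, timedelta

now = datetime(2022, 9, 20, 12, 0, 0)  # assumed fixed "current" time
# Hypothetical (name, time, position) rows
st = [
    ("Tom", now - timedelta(hours=5), 120),
    ("Tom", now - timedelta(hours=30), 48),
    ("Rob", now - timedelta(hours=2), 731),
    ("Rob", now - timedelta(hours=23), 15),
    ("Rob", now - timedelta(hours=50), 402),
]

# Tom's current position: the row with the largest time among Tom's rows
tom_rows = [row for row in st if row[0] == "Tom"]
tom_latest = max(tom_rows, key=lambda row: row[1])

# Rob's records from the last 24 hours, ordered by time
rob_recent = sorted(
    (row for row in st if row[0] == "Rob" and now - row[1] <= timedelta(hours=24)),
    key=lambda row: row[1],
)

print(tom_latest)
print(rob_recent)
```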

### 3.3 Data Modification

Create the Students relation

In [None]:
%%sql
drop table if exists Students;
create table Students (
    sid char(10) primary key,
    name varchar(20) NOT NULL,
    age int check(age > 0));

Take care to use ASCII (English) punctuation in SQL statements, not full-width Chinese punctuation

In [None]:
%sql Insert into Students Values('200011', '张三', 19);

In [None]:
%sql Insert into Students(sid, age, name) Values('200012', 20, '李四');

In [None]:
%sql Insert into Students(sid, name) Values('200013', '王五');

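**Rule in Insert SQL:** Exclude only tuples that yield FALSE / 0.0. A CHECK condition that evaluates to NULL (unknown) does not reject the row, so the NULL age in the next statement is accepted.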

In [None]:
%sql Insert into Students Values('200010', '赵键', NULL);
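
The rule can be tried without a PostgreSQL server. The sketch below assumes SQLite, through Python's built-in `sqlite3` module, as a stand-in; SQLite treats a CHECK that evaluates to NULL the same way. The romanized names are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table Students (sid text primary key, name text not null, age int check(age > 0))"
)

# age is NULL: the check evaluates to unknown, so the row is accepted
conn.execute("insert into Students values ('200010', 'Zhao', NULL)")

# age is 0: the check evaluates to false, so the row is rejected
try:
    conn.execute("insert into Students values ('200014', 'Liu', 0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rows = conn.execute("select sid, age from Students").fetchall()
print(rows)      # [('200010', None)]
print(rejected)  # True
```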

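When data violates an integrity constraint, the database rejects the insert. Note that the quoted string 'NULL' is an ordinary text value, not a null, so it satisfies NOT NULL.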

In [None]:
%sql Insert into Students Values('200012', '刘晓', 19);

In [None]:
%sql Insert into Students Values('200014', NULL, 19);

In [None]:
%sql Insert into Students Values('200014', 'NULL', 19);

In [None]:
%sql Insert into Students Values('200014', '刘晓', 0);

In [None]:
%sql select * from Students

Updating data

In [None]:
%sql select * from Students;

In [None]:
%sql Update Students Set age = 21 where sid = '200012'

In [None]:
%sql Update Students Set age = 18 where name = '王五'

In [None]:
%sql select * from Students

In [None]:
%sql Update Students Set age = age + 1;

In [None]:
%sql select * from students;

In [None]:
%sql Update Students Set sid = '200013' where sid = '200012';

Deleting data

In [None]:
%sql Delete From Students where sid = '200011';

In [None]:
%sql Delete From Students where sid = '200000';

In [None]:
%sql Delete From Students;

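### Referential integrity

Suppose attribute A of relation R references attribute B of relation S. The modifications that may violate referential integrity are: Insert into R, Delete from S, Update R.A, Update S.B
* On Insert into R or Update R.A, the operation is rejected if the new (non-NULL) value of A does not appear in S.B
* On Delete from S or Update S.B, the behaviour is chosen when the foreign key is created: No Action (the PostgreSQL default, rejects the operation), Restrict (also rejects the operation), Cascade (propagates the change), or Set NULL (sets the referencing value to NULL)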
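
The referential actions listed above can be observed in a small experiment. This sketch assumes SQLite via Python's `sqlite3` as a stand-in for PostgreSQL (SQLite enforces foreign keys only after `pragma foreign_keys = on`); the table names `R_cascade` and `R_setnull` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # required for SQLite to enforce foreign keys
conn.execute("create table S (B int primary key)")
conn.execute("create table R_cascade (A int references S(B) on delete cascade)")
conn.execute("create table R_setnull (A int references S(B) on delete set null)")
conn.execute("insert into S values (1), (2)")
conn.execute("insert into R_cascade values (1), (2)")
conn.execute("insert into R_setnull values (1), (2)")

# Insert into R with a value of A that is not in S.B: rejected
try:
    conn.execute("insert into R_cascade values (99)")
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True

# Delete from S: each referencing table reacts according to its own action
conn.execute("delete from S where B = 1")
cascade_rows = conn.execute("select A from R_cascade").fetchall()
setnull_rows = conn.execute("select A from R_setnull").fetchall()

print(insert_rejected)
print(cascade_rows)   # the row that referenced 1 is gone
print(setnull_rows)   # the row that referenced 1 now holds NULL
```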

What is the result of executing the following statements?

In [None]:
%sql drop table if exists T cascade;
%sql create table T (A int, B int, C int, primary key (A,B),foreign key (B,C) references T(A,B) on delete cascade);
%sql insert into T values (1,1,1);
%sql insert into T values (2,1,1);
for i in range(0, 6):
    %sql insert into T values (3 + :i, 2 + :i, 1 + :i)
%sql select * from T; 

In [None]:
%sql delete from T where A = 1;

In [None]:
%sql select * from T; 

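### 3.4 Data Queries

### 3.4.1 The basic SELECT statement
Basic form of the SELECT statement
    <p>SELECT    A1, A2, …, An      #3: what to return
    <p>FROM     R1, R2, …, Rn     #1: relations to query
    <p>WHERE    condition         #2: combine, filter relations

Semantically, the evaluation order is: first the Cartesian product, then the selection, and finally the projection.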

In [None]:
from display_tools import side_by_side
%sql drop table if exists R;
%sql drop table if exists S;
%sql create table R(A int);
%sql create table S(B int, C int);
%sql insert into R values (1), (3);
%sql insert into S values (2, 3), (3, 4), (3, 5);
r = %sql select * from R;
s = %sql select * from S;
side_by_side(r, s)

The query
    <br>SELECT R.A
    <br>FROM   R, S
    <br>WHERE  R.A = S.B
returns:

In [None]:
%%sql 
select R.A 
from R, S
where R.A = S.B

If the query above were implemented in Python, the equivalent code would be:

In [None]:
R = [1, 3]
S = [(2, 3), (3, 4), (3, 5)]

result = []
for A in R:
    for (B, C) in S:
        print(A, B, C)
        if A == B:
            result.append(A)
            
print(result)

Change the projected attributes and observe how the output changes

In [None]:
%%sql 
select R.A, 2022 as year
from R, S
where R.A = S.B

The following examples use a database of US high-school students applying to colleges:

College(<u>cName</u>, state, enrollment)

Student(<u>sID</u>, sName, GPA, sizeHS)

Apply(<u>sID</u>, <u>cName</u>, <u>major</u>, decision)

In [None]:
%%sql
drop table if exists College;
drop table if exists Student;
drop table if exists Apply;

create table College(cName text primary key, state text, enrollment int);
create table Student(sID int primary key, sName text, GPA real, sizeHS int);
create table Apply(sID int, cName text, major text, decision text);

alter table Apply add constraint pk primary key(sID, cName, major);

In [None]:
%%sql
insert into College values
('Stanford', 'CA', 15000),
('Berkeley', 'CA', 36000),
('MIT', 'MA', 10000),
('Carnegie Mellon', 'PA', 11500);

In [None]:
%%sql
insert into Student values
(123, 'Amy', 3.6, 200),
(234, 'Bob', 3.5, 200),
(456, 'Doris', 3.3, 200),
(567, 'Edward', 3.4, 200),
(678, 'Fay', 3.9, 200),
(789, 'Gary', 3.8, 200),
(987, 'Helen', 3.7, 200),
(765, 'Jay', 3.2, 200),
(654, 'Amy', 3.6, 200),
(543, 'Craig', 3.7, 200),
(432, 'Kevin', 3.9, 200),
(321, 'Lori', 3.5, 200);

In [None]:
%%sql
insert into Apply values
(123, 'Stanford', 'CS', 'Y'),
(123, 'Berkeley', 'CS', 'Y'),
(234, 'Berkeley', 'biology', 'Y'),
(678, 'Stanford', 'history', 'Y'),
(987, 'Stanford', 'CS', 'Y'),
(987, 'Berkeley', 'CS', 'Y'),
(765, 'Stanford', 'history', 'Y'),
(765, 'Cornell', 'history', 'Y'),
(765, 'Cornell', 'psychology', 'Y'),
(543, 'MIT', 'CS', 'Y'),
(321, 'MIT', 'history', 'Y'),
(321, 'MIT', 'psychology', 'Y'),
(456, 'Carnegie Mellon', 'CS', 'Y'),
(654, 'Carnegie Mellon', 'CS', 'Y'),
(432, 'Carnegie Mellon', 'CS', 'Y'),
(567, 'Carnegie Mellon', 'economics', 'Y'),
(789, 'Carnegie Mellon', 'economics', 'Y'),
(123, 'Stanford', 'CSE', 'Y'),
(123, 'Cornell', 'CSE', 'Y'),
(123, 'Carnegie Mellon', 'CSE', 'Y');

In [None]:
%%sql  
copy Student(sID, sName, GPA, sizeHS) from  'e://student.txt' delimiter '|';
copy College(cName, state, enrollment) from  'e://college.txt' delimiter '|';
copy Apply(sID, cName, major, decision) from  'e://apply.txt' delimiter '|';

### 3.4.2 Table and Attribute Variables

How can attributes that share a name be told apart? Qualify them as **relation.attribute**, or **rename** the relation or the attribute.

In [None]:
%sql drop table if exists A; drop table if exists B;
%sql create table A (x int, y int); create table B (x int, y int);
for i in range(1,6):
    %sql insert into A values (:i, :i+1)
for i in range(1,11,3):
    %sql insert into B values (:i, :i+2)

In [None]:
%%sql 
SELECT A.x FROM A, B WHERE A.x = B.x;  -- join A and B on equal x and return A's x attribute

Find the records of A and B that overlap on column x

In [None]:
r = %sql SELECT * FROM A;
s = %sql SELECT * FROM B;
side_by_side(r,s)

In [None]:
%%sql
SELECT x, y FROM (
    SELECT A.x, A.y FROM A, B WHERE A.x = B.x
    UNION
    SELECT B.x, B.y FROM A, B WHERE A.x = B.x
) as T(x, y);

Relations $R,S,T$ each have the single attribute $A$:
* R = {1,2,3,4,5}
* S = {1,3,5,7,9}
* T = {1,4,7,10}

In [None]:
%sql DROP TABLE IF EXISTS R; DROP TABLE IF EXISTS S; DROP TABLE IF EXISTS T;
%sql CREATE TABLE R (A int); CREATE TABLE S (A int); CREATE TABLE T (A int);
for i in range(1,6):
    %sql INSERT INTO R VALUES (:i)
for i in range(1,10,2):
    %sql INSERT INTO S VALUES (:i)
for i in range(1,11,3):
    %sql INSERT INTO T VALUES (:i)

Compute $R \cap (S \cup T)$, in other words the elements that are in $R$ and in either $S$ or $T$.

In [None]:
%%sql
SELECT DISTINCT R.A
FROM R, S, T
WHERE R.A = S.A OR R.A = T.A;

Run the query again with $S = \emptyset$, look at the result, and explain it in terms of the SELECT evaluation order

In [None]:
%%sql
delete from S;

In [None]:
%%sql
SELECT DISTINCT R.A
FROM R, S, T
WHERE R.A = S.A OR R.A = T.A;
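
One way to reason about the empty-$S$ case is to replay the semantic evaluation order (Cartesian product, then selection, then projection) in plain Python. The sketch below is only an illustration of that order, not of how PostgreSQL actually executes the query.

```python
from itertools import product

R = [1, 2, 3, 4, 5]
S = []              # S is empty after the delete
T = [1, 4, 7, 10]

# FROM R, S, T: the Cartesian product has len(R) * len(S) * len(T) = 0 rows
rows = list(product(R, S, T))

# WHERE ... then SELECT DISTINCT R.A: there is nothing left to filter
result = sorted({r for (r, s, t) in rows if r == s or r == t})

# The set expression R ∩ (S ∪ T), computed directly
expected = sorted(set(R) & (set(S) | set(T)))

print(len(rows))  # 0
print(result)     # []
print(expected)   # [1, 4]
```

Because the product is already empty, the OR condition never gets a chance to match values of $T$, so the query no longer computes $R \cap (S \cup T)$.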

### 3.4.3 Set Operators in SQL
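Find the IDs of students who applied to CS but not to EE. The first attempt below is incorrect: its WHERE condition is tested against one row at a time, so it returns every student with a CS application. The second query uses EXCEPT to remove the students who also have an EE application.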

In [None]:
%sql SELECT sid FROM Apply WHERE major = 'CS' and major <> 'EE'

In [None]:
%%sql 
SELECT sid FROM Apply WHERE major = 'CS' 
except 
SELECT sID FROM Apply WHERE major = 'EE'

### 3.4.4 Subqueries in the WHERE clause
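MySQL versions before 8.0.31 do not support the EXCEPT keyword. How can the query for students who applied to CS but not to EE be rewritten without it?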

In [None]:
query = """
SELECT sID FROM Student
    WHERE sID in (SELECT sID FROM Apply WHERE major = 'CS') and
          sID not in (SELECT sID FROM Apply WHERE major = 'EE');
"""
l = %sql $query

query = """
SELECT distinct sID FROM Apply A1 
WHERE major = 'CS' and 
      not exists (SELECT * FROM Apply A2 WHERE A1.sID = A2.sID and major = 'EE');"""

r = %sql $query

side_by_side(l, r)

Using nested queries to implement set intersection and set difference

In [None]:
%sql drop table if exists R; drop table if exists S;
%sql create table R (A int, B int); create table S (A int, B int);
for i in range(1,6):
    %sql insert into R values (:i, :i+1)
%sql insert into R values (1, 2)
for i in range(1,11,3):
    %sql insert into S values (:i, :i+1)
r = %sql SELECT * FROM R;
s = %sql SELECT * FROM S;
side_by_side(r, s)

An equivalent of INTERSECT. How should duplicates in the data be handled?

In [None]:
query = """
SELECT R.A, R.B FROM R
 INTERSECT
SELECT S.A, S.B FROM S
"""
l = %sql $query

query = """
SELECT R.A, R.B
FROM   R
WHERE EXISTS (SELECT * FROM S WHERE R.A=S.A AND R.B=S.B)
"""
r = %sql $query

side_by_side(l, r)
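
A possible answer to the duplicate question is to add DISTINCT to the EXISTS form. The sketch below assumes SQLite (Python's built-in `sqlite3`) as a stand-in so it runs without a server, and rebuilds the same R and S contents as the cell above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table R (A int, B int)")
conn.execute("create table S (A int, B int)")
# Same data as above: R contains (1, 2) twice
conn.executemany("insert into R values (?, ?)", [(i, i + 1) for i in range(1, 6)] + [(1, 2)])
conn.executemany("insert into S values (?, ?)", [(i, i + 1) for i in range(1, 11, 3)])

exists_sql = "from R where exists (select * from S where R.A = S.A and R.B = S.B)"
intersect_rows = conn.execute("select A, B from R intersect select A, B from S").fetchall()
with_dups = conn.execute("select R.A, R.B " + exists_sql).fetchall()
distinct_rows = conn.execute("select distinct R.A, R.B " + exists_sql).fetchall()

print(sorted(intersect_rows))  # [(1, 2), (4, 5)]
print(sorted(with_dups))       # [(1, 2), (1, 2), (4, 5)]
print(sorted(distinct_rows))   # [(1, 2), (4, 5)]
```

INTERSECT removes duplicates by definition, while the EXISTS form keeps one output row per qualifying row of R, so DISTINCT is needed for the two to agree.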

An equivalent of EXCEPT

In [None]:
query = """
SELECT R.A, R.B FROM R
 EXCEPT
SELECT S.A, S.B FROM S
"""
l = %sql $query

query = """
SELECT R.A, R.B
FROM   R
WHERE NOT EXISTS (SELECT * FROM S WHERE R.A=S.A AND R.B=S.B)
"""
r = %sql $query

side_by_side(l, r)

### 3.4.5 Subqueries in the FROM and SELECT clauses

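**Maximum/minimum problem**: find the ID of the student with the highest GPA

Below are one incorrect formulation and four correct ones. After inserting more data, compare their running times.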

In [None]:
%sql SELECT sID, max(GPA) FROM Student;

In [None]:
%sql delete from Student;

import random
for i in range(1,10000):
    GPA = random.random() * 4
    %sql insert into Student values (:i, :i, :GPA, 200)

In [None]:
%%time
%%sql 
SELECT sID FROM Student ORDER BY GPA desc LIMIT 1;

In [None]:
%%time 
%%sql 
SELECT sID FROM Student 
WHERE GPA >= all(SELECT GPA FROM Student);

In [None]:
%%time
%%sql 
SELECT sID FROM Student 
WHERE GPA = (SELECT max(GPA) FROM Student);

In [None]:
%%time
%%sql
SELECT sID FROM Student, 
    (SELECT max(GPA) as maxGPA FROM Student) as T 
WHERE GPA = maxGPA;

In [None]:
%%sql
delete from Student;
insert into Student values
(123, 'Amy', 3.6, 200),
(234, 'Bob', 3.5, 200),
(456, 'Doris', 3.3, 200),
(567, 'Edward', 3.4, 200),
(678, 'Fay', 3.9, 200),
(789, 'Gary', 3.8, 200),
(987, 'Helen', 3.7, 200),
(765, 'Jay', 3.2, 200),
(654, 'Amy', 3.6, 200),
(543, 'Craig', 3.7, 200),
(432, 'Kevin', 3.9, 200),
(321, 'Lori', 3.5, 200);

### 3.4.6 The Join Operators

In [None]:
%sql drop table if exists R; drop table if exists S;
%sql create table R (A int, B varchar(50)); create table S (A int, B varchar(50));
%sql insert into R values (1, 'Cat'), (2, 'Dog'), (3, 'Dog');
%sql insert into S values (1, 'Apple'), (2, 'Banana'), (2, 'Pear'), (4, 'Lemon');
r = %sql SELECT * FROM R;
s = %sql SELECT * FROM S;
side_by_side(r, s)

Inner Join

In [None]:
query = """
select R.A, S.B from R, S where R.A = S.A
"""
l = %sql $query

query = """
select R.A, S.B from R join S on R.A = S.A
"""
r = %sql $query

side_by_side(l, r)

Left Outer Join

In [None]:
%sql select R.A, S.B from R left outer join S on R.A = S.A

Right Outer Join

In [None]:
%sql select R.A, S.B from R right outer join S on R.A = S.A

Full Outer Join

In [None]:
%sql select R.A, S.B from R full outer join S on R.A = S.A
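
The four join variants can be compared side by side in plain Python. This is a nested-loop sketch of the semantics only, using the same R and S contents as above, with `None` standing in for NULL; the helper names are illustrative.

```python
R = [(1, "Cat"), (2, "Dog"), (3, "Dog")]
S = [(1, "Apple"), (2, "Banana"), (2, "Pear"), (4, "Lemon")]

# Inner join on R.A = S.A, projecting (R.A, S.B)
inner = [(ra, sb) for (ra, rb) in R for (sa, sb) in S if ra == sa]

# Rows without a partner on the other side are padded with None (NULL)
r_unmatched = [(ra, None) for (ra, rb) in R if all(ra != sa for (sa, sb) in S)]
s_unmatched = [(None, sb) for (sa, sb) in S if all(sa != ra for (ra, rb) in R)]

left_outer = inner + r_unmatched
right_outer = inner + s_unmatched
full_outer = inner + r_unmatched + s_unmatched

print(inner)        # [(1, 'Apple'), (2, 'Banana'), (2, 'Pear')]
print(left_outer)   # adds (3, None)
print(right_outer)  # adds (None, 'Lemon')
print(full_outer)   # adds both
```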

Exercise: Is the Full Outer Join operator associative? 
Specifically is<br/>
  SELECT *
  FROM (T1 full outer join T2) full outer join T3;<br/>
    equivalent to<br/>
  SELECT *
  FROM T1 full outer join (T2 full outer join T3);<br/>
Create relations T1, T2, and T3, insert suitable data, and check whether the two SQL statements above are equivalent

In [None]:
%sql

### 3.4.7 Aggregation
Important: every column in the SELECT clause must either be<br/>
* Also present in the GROUP BY clause AND/OR
* Used in an aggregation function

In [None]:
%sql select A from S group by A;

In [None]:
%sql select B from S group by A;
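
Why the second query is a problem becomes visible when the grouping is done by hand. The sketch below uses plain Python with the current contents of S; it illustrates the reasoning only, since PostgreSQL rejects the query based on its form, not on the data.

```python
from collections import defaultdict

S = [(1, "Apple"), (2, "Banana"), (2, "Pear"), (4, "Lemon")]

# Group the B values by A
groups = defaultdict(list)
for a, b in S:
    groups[a].append(b)

# select A from S group by A: one well-defined value per group
print(sorted(groups))   # [1, 2, 4]

# select B from S group by A: group 2 holds two candidate values for B
ambiguous = {a: bs for a, bs in groups.items() if len(bs) > 1}
print(ambiguous)        # {2: ['Banana', 'Pear']}

# An aggregate collapses each group to a single value, so it is allowed
max_b = {a: max(bs) for a, bs in groups.items()}
print(max_b)            # {1: 'Apple', 2: 'Pear', 4: 'Lemon'}
```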

For each college, find the highest and lowest GPA among its applicants; return the college name and the two GPA values

In [None]:
%%sql
select cName, max(GPA), min(GPA)
from Apply A, Student S
where A.sID = S.sID
group by cName

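Exercise: for each college, find the highest and lowest GPA among its applicants; return the college name and the two GPA values, without using GROUP BY or aggregate functions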

In [None]:
%%sql

Exercise: find the college with the most applicants; return the college name and the number of applicants, without using LIMIT

In [None]:
%%sql 

For each student, find the number of colleges applied to; for a student who has not applied anywhere yet, the number is 0

In [None]:
%sql insert into Student values (345, 'Harry', 3.9, 200);
%sql select * from Student;

In [None]:
%%sql
SELECT Student.sID, count(distinct cName)
FROM Student, Apply
WHERE Student.sID = Apply.sID
GROUP BY Student.sID
union
SELECT sID, 0
FROM Student
WHERE sID not in (select sID from Apply);

Exercise: use an outer join to compute the number of colleges each student applied to

In [None]:
%%sql 

**Per-group maximum/minimum problem:** find the college that received the most applications

In [None]:
%%sql 
SELECT CName 
FROM Apply 
GROUP BY CName 
HAVING count(*) >= ALL 
        (SELECT count(*) FROM Apply GROUP BY CName);

Exercise: find the college with the most applicants (count each student once)

In [None]:
%%sql

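### 3.4.8 NULL values

Any arithmetic or comparison operation involving NULL yields NULL. Boolean operators follow three-valued logic: true AND NULL is NULL, but false AND NULL is false and true OR NULL is true. To test whether an attribute is NULL, use
* x is NULL
* x is not NULL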

In [None]:
%sql SELECT 1 + NULL AS add_null, 1 - NULL AS sub_null, 1 * NULL AS mul_null, 1 / NULL AS div_null;

In [None]:
%sql SELECT true = NULL AS eq_bool, true != NULL AS neq_bool, true AND NULL AS and_bool, NULL = NULL AS eq_null, NULL IS NULL AS is_null;
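
SQL's three-valued logic can be written down as a few small functions. The sketch below uses Python's `None` for NULL/unknown; the function names `and3`, `or3`, and `eq3` are made up for this illustration.

```python
def and3(a, b):
    """Three-valued AND: false wins, otherwise unknown is contagious."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def or3(a, b):
    """Three-valued OR: true wins, otherwise unknown is contagious."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def eq3(a, b):
    """Comparison: unknown as soon as either side is NULL."""
    if a is None or b is None:
        return None
    return a == b


print(eq3(True, None))    # None, like true = NULL
print(and3(True, None))   # None, like true AND NULL
print(and3(False, None))  # False
print(or3(True, None))    # True
print(eq3(None, None))    # None, like NULL = NULL
```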

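Do the two queries below return the same result?

**Rule in Selection SQL:** Include only tuples that yield TRUE / 1.0
* A WHERE clause keeps only the rows for which the condition is TRUE
* A HAVING clause keeps only the groups for which the condition is TRUE
* In a join condition, NULL never matches NULL (NULL = NULL evaluates to NULL, not TRUE)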

In [None]:
%sql update student set gpa = NULL where sid = 123;
l = %sql SELECT * FROM Student WHERE GPA >= 3.5 or GPA < 3.5;
r = %sql SELECT * FROM Student WHERE GPA >= 3.5 or GPA < 3.5 or GPA is NULL;
side_by_side(l, r)

In [None]:
l = %sql SELECT * FROM Student;
r = %sql SELECT * FROM Student WHERE NULL = NULL;
side_by_side(l, r)
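
The effect of the selection rule on rows with a NULL GPA can be reproduced on a tiny table. This sketch assumes SQLite (Python's built-in `sqlite3`) as a stand-in for PostgreSQL; the three rows and the helper `ids` are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Student (sID int primary key, sName text, GPA real)")
conn.execute(
    "insert into Student values (123, 'Amy', NULL), (234, 'Bob', 3.5), (456, 'Doris', 3.3)"
)


def ids(where):
    """Return the sIDs of the rows for which the WHERE condition is TRUE."""
    sql = "select sID from Student where " + where + " order by sID"
    return [row[0] for row in conn.execute(sql)]


print(ids("GPA >= 3.5 or GPA < 3.5"))                 # [234, 456]
print(ids("GPA >= 3.5 or GPA < 3.5 or GPA is NULL"))  # [123, 234, 456]
print(ids("NULL = NULL"))                             # []
```

For student 123 both comparisons evaluate to unknown, so the row is dropped unless the `is NULL` test is added.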

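For aggregate functions
* On an empty input, count returns 0 and every other aggregate function returns NULL
* count(*) counts rows even when their values are NULL, while count(attribute) ignores rows where that attribute is NULL
* All other aggregate functions ignore NULL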

In [None]:
l = %sql SELECT count(*) FROM Student;
r = %sql SELECT count(GPA) FROM Student;
side_by_side(l, r)

In [None]:
%sql select max(GPA), min(GPA) from Student
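
The aggregate rules above can be checked on a three-row table. The sketch assumes SQLite via Python's `sqlite3` as a stand-in for PostgreSQL; the rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Student (sID int, GPA real)")
conn.execute("insert into Student values (123, NULL), (234, 3.5), (456, 3.3)")

# count(*) counts every row; count(GPA), max and min skip the NULL
count_star, count_gpa, max_gpa, min_gpa = conn.execute(
    "select count(*), count(GPA), max(GPA), min(GPA) from Student"
).fetchone()

# Empty input: count gives 0, max gives NULL (None in Python)
empty_count, empty_max = conn.execute(
    "select count(*), max(GPA) from Student where sID < 0"
).fetchone()

print(count_star, count_gpa, max_gpa, min_gpa)  # 3 2 3.5 3.3
print(empty_count, empty_max)                   # 0 None
```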

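GROUP BY places all NULLs in a single group. In PostgreSQL, NULLs sort after all non-NULL values by default in an ascending ORDER BY (and first in a descending one); NULLS FIRST and NULLS LAST change this.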

In [None]:
%sql select GPA from Student group by GPA order by GPA
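
The two possible placements of NULL in an ascending sort can be emulated in Python with a composite sort key. This is a sketch of the ordering only, with `None` standing in for NULL and made-up GPA values.

```python
gpas = [3.6, None, 3.9, 3.2]

# Ascending order with NULLs placed last (what NULLS LAST asks for)
nulls_last = sorted(gpas, key=lambda g: (g is None, g if g is not None else 0.0))

# Ascending order with NULLs placed first (what NULLS FIRST asks for)
nulls_first = sorted(gpas, key=lambda g: (g is not None, g if g is not None else 0.0))

print(nulls_last)   # [3.2, 3.6, 3.9, None]
print(nulls_first)  # [None, 3.2, 3.6, 3.9]
```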