SQL Miscellanea
-----

In [1]:
%load_ext sql
%sql sqlite:///world-db

We can create views in SQL. A view is a named query: its result is not stored as a table (it is not materialized), but the view can be used inside other SQL queries like an ordinary table.

In [2]:
%%sql
DROP VIEW IF EXISTS OfficialCountryLanguage;

CREATE VIEW OfficialCountryLanguage AS
SELECT C.Name AS CountryName, L.Language AS Language
FROM CountryLanguage L, Country C
WHERE L.CountryCode = C.Code
AND L.IsOfficial = 'T' ;

 * sqlite:///world-db
Done.
Done.


[]

In [3]:
%sql SELECT * FROM OfficialCountryLanguage LIMIT 10;

 * sqlite:///world-db
Done.


CountryName,Language
Aruba,Dutch
Afghanistan,Dari
Afghanistan,Pashto
Anguilla,English
Albania,Albaniana
Andorra,Catalan
Netherlands Antilles,Dutch
Netherlands Antilles,Papiamento
United Arab Emirates,Arabic
Argentina,Spanish
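
Because a view is not materialized, it always reflects the current contents of its base tables. The sketch below illustrates this with Python's built-in `sqlite3` module and an in-memory database; the few rows used here are a made-up stand-in for the world-db tables, not the real data.

```python
import sqlite3

# In-memory database with a tiny, made-up stand-in for the world-db tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Country (Code TEXT, Name TEXT);
    CREATE TABLE CountryLanguage (CountryCode TEXT, Language TEXT, IsOfficial TEXT);
    INSERT INTO Country VALUES ('ABW', 'Aruba'), ('AND', 'Andorra');
    INSERT INTO CountryLanguage VALUES ('ABW', 'Dutch', 'T'), ('ABW', 'English', 'F');

    CREATE VIEW OfficialCountryLanguage AS
    SELECT C.Name AS CountryName, L.Language AS Language
    FROM CountryLanguage L, Country C
    WHERE L.CountryCode = C.Code AND L.IsOfficial = 'T';
""")

query = "SELECT * FROM OfficialCountryLanguage ORDER BY CountryName"
before = conn.execute(query).fetchall()

# Modify only the base table; the view itself is never touched.
conn.execute("INSERT INTO CountryLanguage VALUES ('AND', 'Catalan', 'T')")
after = conn.execute(query).fetchall()

print(before)  # rows visible through the view before the insert
print(after)   # the new official language shows up without recreating the view
```

The view's query is re-evaluated each time the view is used, which is why the second result includes the newly inserted row.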


**Computing the Longest Sequence of Consecutive Numbers**


In the following, we are given a table with a single column. Our goal is to compute the length of the longest sequence of consecutive numbers.

We define the following table for our case:

In [3]:
%%sql
DROP TABLE IF EXISTS R;
CREATE TABLE R(A integer);
INSERT INTO R Values (1),(2),(5),(9),(3),(10),(25);

 * sqlite:///world-db
Done.
Done.
7 rows affected.


[]

To express this query, we make use of the window function `row_number`. This function assigns sequential numbers 1, 2, 3, ... to the rows of a query result. The `ORDER BY` clause inside `OVER` specifies the order in which the rows are numbered.

Let's see what happens if we use this!

In [5]:
%%sql
SELECT A, (A-row_number() OVER (ORDER BY A ASC)) AS B
FROM R ;

 * sqlite:///world-db
Done.


A,B
1,0
2,0
3,0
5,1
9,4
10,4
25,18

Within a run of consecutive numbers, `A` and the row number both increase by 1 from one row to the next, so their difference `B` stays constant. Every gap in the sequence increases `B`. Grouping by `B` therefore gives one group per run of consecutive numbers (assuming the values in `R` are distinct).


In [7]:
%%sql
WITH num AS
(SELECT A,(A - row_number() OVER (ORDER BY A)) AS B FROM R)
SELECT MIN(A) AS start, MAX(A) AS end, COUNT(*) AS length
FROM num
GROUP BY B
ORDER BY length DESC;

 * sqlite:///world-db
Done.


start,end,length
1,3,3
9,10,2
25,25,1
5,5,1
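
The query above lists every run; the stated goal was only the length of the longest one. The following is a minimal sketch of one way to get that single number, run through Python's `sqlite3` module on a fresh in-memory copy of `R`. The `SELECT DISTINCT` subquery and the CTE name `runs` are additions for illustration: duplicates would otherwise break the trick, since two equal values get different row numbers. Window functions require SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R(A integer);
    INSERT INTO R VALUES (1),(2),(5),(9),(3),(10),(25);
""")

longest_run_sql = """
WITH num AS (
    -- Number the distinct values; A - row_number() is constant within a run.
    SELECT A, A - row_number() OVER (ORDER BY A) AS B
    FROM (SELECT DISTINCT A FROM R)
),
runs AS (
    -- One row per run of consecutive numbers.
    SELECT COUNT(*) AS length FROM num GROUP BY B
)
SELECT MAX(length) FROM runs
"""

longest = conn.execute(longest_run_sql).fetchone()[0]
print(longest)

# A duplicate value does not change the answer thanks to DISTINCT.
conn.execute("INSERT INTO R VALUES (2)")
longest_with_dup = conn.execute(longest_run_sql).fetchone()[0]
print(longest_with_dup)
```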


**Median**

In the following, we will show how one can compute the *median* of a table column using SQL. Recall that to compute the median of a (multi)set of *n* values, we first order the values; then, we return the middle number if *n* is odd, otherwise the average of the two middle numbers. We use the table R from above.

We can easily compute the median using `row_number`. Note that the query below works only for an odd number of elements.

In [16]:
%%sql
WITH num AS
(SELECT A, (row_number() OVER (ORDER BY A)) AS B FROM R)
SELECT A
FROM num
WHERE B*2 = (SELECT COUNT(*) FROM num)+1;

 * sqlite:///world-db
Done.


A
5
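
One possible way to extend the `row_number` approach to an even number of elements is to select both middle positions and average them. The sketch below is an assumption-laden variant, not the notebook's query: the helper `median_row_number` and the CTE `cnt` are hypothetical names, and it relies on SQLite's integer division, so that `(n + 1) / 2` and `(n + 2) / 2` coincide when `n` is odd and give the two middle positions when `n` is even.

```python
import sqlite3

def median_row_number(values):
    """Median via row_number(); handles odd and even counts (needs SQLite >= 3.25)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE R(A integer)")
    conn.executemany("INSERT INTO R VALUES (?)", [(v,) for v in values])
    row = conn.execute("""
        WITH num AS (SELECT A, row_number() OVER (ORDER BY A) AS B FROM R),
             cnt AS (SELECT COUNT(*) AS n FROM R)
        SELECT AVG(A)
        FROM num, cnt
        -- Integer division: one middle position for odd n, two for even n.
        WHERE B IN ((n + 1) / 2, (n + 2) / 2)
    """).fetchone()
    conn.close()
    return row[0]

print(median_row_number([1, 2, 5, 9, 3, 10, 25]))  # odd count
print(median_row_number([1, 2, 3, 4]))             # even count
```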


Next, we will see how to compute the median without using `row_number`! We will first solve a simplified version of the median problem. For now, let us assume that:
* the number of elements is odd
* there are no duplicate values

In [None]:
%%sql
SELECT X.A
FROM R AS X
WHERE (SELECT COUNT(*) FROM R AS X1 WHERE X.A > X1.A) 
= (SELECT COUNT(*) FROM R AS X2 WHERE X.A < X2.A);

The above solution will not work if our two assumptions do not hold (why?). Let us rewrite the query so that we can solve the general median problem.

In [None]:
%%sql
SELECT AVG(DISTINCT X.A)
FROM R AS X
WHERE (SELECT COUNT(*) FROM R AS X1 WHERE X.A >= X1.A) >= (SELECT COUNT(*) FROM R AS X2 WHERE X.A < X2.A)
AND (SELECT COUNT(*) FROM R AS X1 WHERE X.A > X1.A) <= (SELECT COUNT(*) FROM R AS X2 WHERE X.A <= X2.A);
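
To get a feeling for how the two queries behave, one can run them on a few small inputs. The sketch below does this with Python's `sqlite3` module; the helper `run` and the constants `SIMPLE` and `GENERAL` are names introduced here for illustration, and the test inputs other than `R` are made up.

```python
import sqlite3

# The simplified query (odd count, no duplicates assumed).
SIMPLE = """
SELECT X.A FROM R AS X
WHERE (SELECT COUNT(*) FROM R AS X1 WHERE X.A > X1.A)
    = (SELECT COUNT(*) FROM R AS X2 WHERE X.A < X2.A)
"""

# The general query.
GENERAL = """
SELECT AVG(DISTINCT X.A) FROM R AS X
WHERE (SELECT COUNT(*) FROM R AS X1 WHERE X.A >= X1.A)
   >= (SELECT COUNT(*) FROM R AS X2 WHERE X.A < X2.A)
  AND (SELECT COUNT(*) FROM R AS X1 WHERE X.A > X1.A)
   <= (SELECT COUNT(*) FROM R AS X2 WHERE X.A <= X2.A)
"""

def run(query, values):
    """Load `values` into a fresh table R and return the query result rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE R(A integer)")
    conn.executemany("INSERT INTO R VALUES (?)", [(v,) for v in values])
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows

print(run(SIMPLE, [1, 2, 3, 4]))               # even count: no row qualifies
print(run(GENERAL, [1, 2, 3, 4]))              # average of the two middle values
print(run(GENERAL, [1, 1, 2, 3]))              # duplicates are handled as well
print(run(GENERAL, [1, 2, 5, 9, 3, 10, 25]))   # the table R from above
```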

**Paths in Graphs**

We will next show how we can compute some queries on a graph. Here, we represent the graph as a single relation with schema `Edge(source, target, distance)`.

In [17]:
%%sql
DROP TABLE IF EXISTS Edge;
CREATE TABLE Edge (source integer, target integer, distance integer);
INSERT INTO Edge VALUES (1,2,10),(2,3,10),(3,4,20),(4,1,30),(1,3,5); 

 * sqlite:///world-db
Done.
Done.
5 rows affected.


[]

The query below computes the number of outgoing edges for each vertex. Vertices without outgoing edges do not appear in the result.

In [28]:
%%sql
SELECT source, COUNT(target)
FROM Edge
GROUP BY source;

 * sqlite:///world-db
Done.


source,COUNT(target)
1,2
2,1
3,1
4,1


Next, we want to find all the directed paths of length 3 (that is, paths with three edges) in the graph.

In [31]:
%%sql
SELECT e1.source, e1.target, e2.target, e3.target
FROM Edge e1, Edge e2, Edge e3
WHERE e1.target = e2.source AND e2.target = e3.source; 

 * sqlite:///world-db
Done.


source,target,target_1,target_2
1,2,3,4
2,3,4,1
3,4,1,2
3,4,1,3
4,1,2,3
4,1,3,4
1,3,4,1


Can we also compute the total distance of each path? The query below does this for paths of length 2 (two edges).

In [30]:
%%sql
SELECT e1.source, e1.target, e2.target, (e1.distance+e2.distance)
FROM Edge e1, Edge e2
WHERE e1.target = e2.source; 

 * sqlite:///world-db
Done.


source,target,target_1,(e1.distance+e2.distance)
1,2,3,20
2,3,4,30
3,4,1,50
4,1,2,40
4,1,3,35
1,3,4,25


**A few things on recursion**

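SQL allows recursion in the `WITH` clause through `WITH RECURSIVE`. A recursive common table expression combines a base query with a recursive query that refers to the table being defined, using `UNION` or `UNION ALL`. Evaluation stops once the recursive part produces no further rows.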

In [2]:
%%sql
WITH RECURSIVE
  cnt(x) AS (VALUES(1) UNION ALL SELECT x+1 FROM cnt WHERE x<5)
SELECT x FROM cnt;

 * sqlite:///world-db
Done.


x
1
2
3
4
5


In [39]:
%%sql
WITH RECURSIVE
    factorial(n,x) AS (
        SELECT 1, 1
        UNION
        SELECT n+1, (n+1)*x FROM factorial WHERE n < 5)
SELECT * FROM factorial ;

 * sqlite:///world-db
Done.


n,x
1,1
2,2
3,6
4,24
5,120
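
Recursion also lets us answer graph questions that a fixed number of joins cannot, such as which vertices are reachable from a given vertex by a path of any length. The following is a minimal sketch using the `Edge` table from above, run through Python's `sqlite3` module; the CTE name `reach` and the choice of start vertex 2 are assumptions made for illustration. Since the graph contains a cycle, the sketch uses `UNION` rather than `UNION ALL`: discarding duplicate rows is what makes the recursion terminate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Edge (source integer, target integer, distance integer);
    INSERT INTO Edge VALUES (1,2,10),(2,3,10),(3,4,20),(4,1,30),(1,3,5);
""")

start = 2  # hypothetical start vertex
reachable = conn.execute("""
    WITH RECURSIVE reach(node) AS (
        -- Base case: direct successors of the start vertex.
        SELECT target FROM Edge WHERE source = ?
        UNION
        -- Recursive step: successors of vertices already reached.
        SELECT e.target FROM reach r JOIN Edge e ON e.source = r.node
    )
    SELECT node FROM reach ORDER BY node
""", (start,)).fetchall()

print(reachable)
```

Every vertex of this graph lies on a cycle through vertex 2, so all four vertices (including 2 itself) are reported as reachable.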
