# Intermediate SQL

## Join expressions

Previously, we combined information from multiple relations using the Cartesian product operator (except when we used set operations). In this section we introduce a number of <strong>join operations</strong> that allow us to express some queries that are harder to write using the Cartesian product. As before, we will continue using the UNI database for now. Run the scripts in the folder before proceeding.

### The ``NATURAL JOIN``

Consider the following SQL query which computes, for each student, the set of courses a student has taken. 

In [2]:
USE uni;
GO
-- the query
SELECT TOP 6 -- restrict the output to the first 6 rows
student.name, takes.course_id
FROM student, takes
WHERE student.ID = takes.ID;
GO

name,course_id
Manber,239
Manber,319
Manber,362
Manber,493
Manber,571
Manber,642


Note that the matching condition requires the <code>ID</code> values in the student and takes tables to be equal.

The natural join operation operates on two relations and produces a relation as its result. However, unlike the Cartesian product, which concatenates each row of the first relation with every row of the second, the <strong>natural join considers only the pairs of rows with the same value on the attributes which appear in the schema of both relations</strong>.

<code>NATURAL JOIN</code> is not supported by Microsoft SQL Server. In database systems that do support it, the syntax looks like this.

```
SELECT name, course_id
FROM student NATURAL JOIN takes;
```
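
Since SQL Server cannot run the statement above, here is a minimal sketch that tries it in SQLite through Python's standard `sqlite3` module. The two toy tables only mimic `student` and `takes`; the row for Zelty's course is made up for illustration. It checks that the natural join returns the same rows as the Cartesian product filtered on `ID`.

```python
import sqlite3

# In-memory SQLite database with toy stand-ins for student and takes.
# The Zelty/401 row is hypothetical data, added only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (ID TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE takes   (ID TEXT, course_id TEXT);
    INSERT INTO student VALUES ('1000', 'Manber'), ('10033', 'Zelty');
    INSERT INTO takes   VALUES ('1000', '239'), ('1000', '319'), ('10033', '401');
""")

# NATURAL JOIN matches rows on every column name the tables share (here only ID).
natural = conn.execute(
    "SELECT name, course_id FROM student NATURAL JOIN takes "
    "ORDER BY name, course_id"
).fetchall()

# Same result written as a Cartesian product plus an equality predicate.
explicit = conn.execute(
    "SELECT student.name, takes.course_id FROM student, takes "
    "WHERE student.ID = takes.ID "
    "ORDER BY student.name, takes.course_id"
).fetchall()

print(natural)
print(natural == explicit)
```

Because the match is driven purely by column names, a natural join silently changes meaning if a new shared column is added to either table, which is one argument for spelling out the join condition.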

MS SQL, probably for the best, forces you to state explicitly which features you want to join the relations on. I wrote, however, a small nested query which, given two tables, returns their common features. In this case, as expected, it is the ``ID`` feature.

In [10]:
-- checking shared features with a subquery
SELECT COLUMN_NAME AS common_feature
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'student' AND COLUMN_NAME IN (
    SELECT COLUMN_NAME
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_NAME = 'takes'
);
GO

common_feature
ID


### Join conditions

As we saw before, the nice property of a natural join is that it automatically identifies the overlapping features to join on.

The <code>ON</code> keyword allows a general predicate to specify how rows are matched in the join operation. The <code>ON</code> condition is written exactly like a <code>WHERE</code> predicate and appears at the end of the join expression.<br>

In [17]:
-- the join
SELECT COUNT(*) AS row_number
FROM student
JOIN takes ON student.ID = takes.ID;
GO
-- which is the same as
SELECT COUNT(*) AS row_number
FROM student, takes
WHERE student.ID = takes.ID;
GO

row_number
30000


row_number
30000


### `OUTER JOIN`

Suppose we wish to display a list of all students, showing their ID, name, dept_name, and tot_cred along with the courses that they have taken. The join query we used before does not work well for this. Suppose that some student takes no courses and therefore has no matching ID in the takes relation. That student is dropped from the resulting relation.

In [26]:
-- insert a dummy
IF NOT EXISTS (SELECT ID FROM student WHERE ID = '10049')
    BEGIN
        INSERT INTO student VALUES ('10049', 'Snow', 'Civil Eng.', 0)
    END;
    GO

-- we lose it with a plain (inner) join
SELECT *
FROM student
JOIN takes ON student.ID = takes.ID
WHERE student.ID = '10049';
GO

ID,name,dept_name,tot_cred,ID.1,course_id,sec_id,semester,year,grade


More generally, some tuples in either or both relations being joined may be <em>lost</em> in this way. The outer join operation works in a manner similar to the join operations already studied, but <strong>it preserves those tuples that would be lost in a join by creating tuples in the result containing NULL values.</strong>


There are three forms of outer join:

1. The ``LEFT OUTER JOIN`` preserves tuples in the relation named to the left of the join operation.

In [27]:
SELECT *
FROM student
LEFT OUTER JOIN takes ON student.ID = takes.ID
WHERE student.ID = '10049';
GO

ID,name,dept_name,tot_cred,ID.1,course_id,sec_id,semester,year,grade
10049,Snow,Civil Eng.,0,,,,,,


2. The ``RIGHT OUTER JOIN`` preserves tuples only in the relation named to the right of (after) the join operation.

In [31]:
-- we miss it like this, since it does not exist in takes
SELECT *
FROM student
RIGHT OUTER JOIN takes ON student.ID = takes.ID
WHERE student.ID = '10049';
GO

-- but not like this
SELECT *
FROM takes
RIGHT OUTER JOIN student ON student.ID = takes.ID
WHERE student.ID = '10049';
GO

ID,name,dept_name,tot_cred,ID.1,course_id,sec_id,semester,year,grade


ID,course_id,sec_id,semester,year,grade,ID.1,name,dept_name,tot_cred
,,,,,,10049,Snow,Civil Eng.,0


3. The ``FULL OUTER JOIN`` preserves tuples in both relations.

In [32]:
SELECT *
FROM student
FULL OUTER JOIN takes ON student.ID = takes.ID
WHERE student.ID = '10049';
GO

ID,name,dept_name,tot_cred,ID.1,course_id,sec_id,semester,year,grade
10049,Snow,Civil Eng.,0,,,,,,
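

The difference between an inner and a left outer join can be reproduced on a toy database. The sketch below assumes SQLite through Python's `sqlite3` module rather than SQL Server, and uses two tiny stand-in tables. It also shows a common use of the preserved rows: filtering on `IS NULL` to list the students who take no course at all. Only `LEFT OUTER JOIN` is used, because older SQLite versions lack `RIGHT` and `FULL` outer joins.

```python
import sqlite3

# Toy stand-ins for student and takes; Snow has no row in takes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (ID TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE takes   (ID TEXT, course_id TEXT);
    INSERT INTO student VALUES ('1000', 'Manber'), ('10049', 'Snow');
    INSERT INTO takes   VALUES ('1000', '239');
""")

# Inner join: the unmatched student disappears.
inner_rows = conn.execute(
    "SELECT student.ID, name, course_id FROM student "
    "JOIN takes ON student.ID = takes.ID ORDER BY student.ID"
).fetchall()

# Left outer join: the unmatched student is kept, padded with NULL (None in Python).
left_rows = conn.execute(
    "SELECT student.ID, name, course_id FROM student "
    "LEFT OUTER JOIN takes ON student.ID = takes.ID ORDER BY student.ID"
).fetchall()

# Keep only the padded rows, i.e. students with no course.
no_courses = conn.execute(
    "SELECT student.ID, name FROM student "
    "LEFT OUTER JOIN takes ON student.ID = takes.ID "
    "WHERE takes.ID IS NULL"
).fetchall()

print(inner_rows)
print(left_rows)
print(no_courses)
```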


### ``INNER JOIN``

In contrast, join operations that do not preserve nonmatched tuples are called inner join operations. An inner join returns only the tuples that have a match in both relations.

In [42]:
SELECT *
FROM student;
GO
-- students with a match in takes are kept (same count as the JOIN ... ON query above)
SELECT COUNT(*) AS row_number
FROM student
INNER JOIN takes ON student.ID = takes.ID;
GO

-- missing is still...missing
SELECT *
FROM student
INNER JOIN takes ON student.ID = takes.ID
WHERE student.ID = '10049';
GO

ID,name,dept_name,tot_cred
*,Snow,Civil Eng.,0
1000,Manber,Civil Eng.,39
10033,Zelty,Mech. Eng.,60
10049,Snow,Civil Eng.,0
10076,Duan,Civil Eng.,105
1018,Colin,Civil Eng.,81
10204,Mediratta,Geology,112
10267,Rzecz,Comp. Sci.,5
10269,Hilberg,Psychology,75
10454,Ugarte,Pol. Sci.,120


row_number
30000


ID,name,dept_name,tot_cred,ID.1,course_id,sec_id,semester,year,grade


### `CROSS JOIN`

The SQL `CROSS JOIN` produces a result set whose number of rows equals the number of rows in the first table multiplied by the number of rows in the second table, provided no `WHERE` clause is used along with it. This kind of result is called a Cartesian product.


In [45]:
SELECT COUNT(student.ID) AS [count]
FROM student
CROSS JOIN takes;
GO

SELECT COUNT(ID) AS [count]
FROM student;
GO

SELECT COUNT(ID) AS [count]
FROM takes;
GO


count
60060000


count
2002


count
30000
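

The three counts above fit the product rule: 2002 × 30000 = 60060000. As a quick sanity check on a smaller scale, here is a sketch assuming SQLite through Python's `sqlite3` module, with two hypothetical tables `a` and `b` that are not part of the uni database.

```python
import sqlite3

# Two tiny hypothetical tables with 3 and 2 rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (y TEXT);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES ('p'), ('q');
""")

n_a = conn.execute("SELECT COUNT(*) FROM a").fetchone()[0]
n_b = conn.execute("SELECT COUNT(*) FROM b").fetchone()[0]

# CROSS JOIN pairs every row of a with every row of b.
n_cross = conn.execute("SELECT COUNT(*) FROM a CROSS JOIN b").fetchone()[0]

print(n_a, n_b, n_cross)

# The same rule reproduces the counts reported for student and takes.
uni_product = 2002 * 30000
print(uni_product)
```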


## Views

It is not always desirable for all users to see the entire set of relations in the database. For example, a university worker may need to access the student-related tables, but we might not want her to have access to the salaries.

Aside from security reasons, we may wish to create a personalized collection of virtual relations that is better matched to a certain user's intuition of the structure of the database. For example, we may want to have a list of all course sections offered by the physics department in the fall of 2007, with the building and room number of each section.

In [51]:
-- with an explicit join
SELECT course.course_id, course.title, section.building, section.room_number 
FROM course
INNER JOIN section ON course.course_id = section.course_id
WHERE semester = 'Fall' AND dept_name = 'Physics' AND [year] = '2007';
GO
-- alternatively
SELECT course.course_id, course.title, section.building, section.room_number 
FROM course, section
WHERE semester = 'Fall' AND 
      dept_name = 'Physics' AND 
      section.[year] = '2007' AND
      course.course_id = section.course_id;
GO

course_id,title,building,room_number
612,Mobile Computing,Lamberton,143


course_id,title,building,room_number
612,Mobile Computing,Lamberton,143


It is possible to compute and store the results of queries such as this and then make the stored relations available to users. However, if we did so and the underlying data in the course or section relations changed, the stored query results would no longer match the result of re-executing the query on the relations. In general, it is a bad idea to compute and store query results such as those in the above examples.

<br>

Instead, SQL allows a "virtual relation" to be defined by a query; the relation conceptually contains the result of the query. The virtual relation is not precomputed and stored, but is instead computed by executing the query whenever the virtual relation is used. Such a virtual relation is called a ``VIEW``.

<br>

We define a view in SQL using the ``CREATE VIEW`` command. Its general form is:

```
create view v as <query expression>;
GO
```
Using the query from above...

In [59]:
-- drop the view first if it already exists, so that the cell can be re-run
IF OBJECT_ID('physics_fall_2007', 'V') IS NOT NULL
    DROP VIEW physics_fall_2007;
GO
CREATE VIEW physics_fall_2007 AS 
SELECT course.course_id, course.title, section.building, section.room_number 
FROM course
INNER JOIN section ON course.course_id = section.course_id
WHERE semester = 'Fall' AND dept_name = 'Physics' AND [year] = '2007';
GO

In [54]:
SELECT * 
FROM physics_fall_2007;
GO


course_id,title,building,room_number
612,Mobile Computing,Lamberton,143


In [57]:
-- Another example
CREATE VIEW faculty AS 
SELECT ID, [name], dept_name
FROM instructor;
GO

SELECT TOP 6 * 
FROM faculty;
GO

-- drop it
DROP VIEW faculty;
GO

ID,name,dept_name
10076,Duan,Civil Eng.
10204,Mediratta,Geology
10454,Ugarte,Pol. Sci.
10527,Kieras,Physics
10693,Zabary,Statistics
10834,More,Geology
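

To illustrate why a view stays current while a stored copy of a query result goes stale, here is a small sketch. It assumes SQLite through Python's `sqlite3` module instead of SQL Server, and the instructor rows (Ada, Bo, Cy) are invented for the example. A table `faculty_copy` holds a one-time copy of the query result, while the view `faculty` is re-evaluated on every use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical instructor rows, for illustration only
    CREATE TABLE instructor (ID TEXT PRIMARY KEY, name TEXT, dept_name TEXT, salary REAL);
    INSERT INTO instructor VALUES ('1', 'Ada', 'Physics', 90000), ('2', 'Bo', 'Geology', 80000);

    -- stored copy: computed once, never refreshed
    CREATE TABLE faculty_copy AS SELECT ID, name, dept_name FROM instructor;

    -- view: only the query is stored, the rows are computed on each use
    CREATE VIEW faculty AS SELECT ID, name, dept_name FROM instructor;

    -- the underlying relation changes afterwards
    INSERT INTO instructor VALUES ('3', 'Cy', 'Physics', 70000);
""")

copy_count = conn.execute("SELECT COUNT(*) FROM faculty_copy").fetchone()[0]
view_count = conn.execute("SELECT COUNT(*) FROM faculty").fetchone()[0]

# Column names exposed by the view; salary is not among them.
view_columns = [d[0] for d in conn.execute("SELECT * FROM faculty").description]

print(copy_count, view_count)
print(view_columns)
```

The stored copy still reflects the two original instructors, whereas the view sees all three, and neither exposes the `salary` column.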


### Using ``VIEW``s in queries

Once we have defined a view, we can use the view name to refer to the virtual relation and to make queries on this virtual relation.

In [62]:
SELECT COUNT(*) as [n]
FROM physics_fall_2007;

-- drop
DROP VIEW physics_fall_2007;

n
1


In [65]:
-- department salary
CREATE VIEW department_total_salary (dept_name, total_salary) AS
SELECT dept_name, SUM (salary) AS total_salary
FROM instructor
GROUP BY dept_name;
GO

SELECT * FROM department_total_salary;
GO

-- drop
DROP VIEW department_total_salary;
GO


dept_name,total_salary
Accounting,600880.37
Astronomy,746093.08
Athletics,1139516.99
Biology,876600.5
Civil Eng.,667023.0
Comp. Sci.,911917.63
Cybernetics,994407.27
Elec. Eng.,934672.96
English,955379.2
Finance,743333.38


## Transactions

A transaction consists of a sequence of query and/or update statements. The SQL standard specifies that a transaction begins implicitly when an SQL statement is executed. One of the following SQL statements must end the transaction:

- **commit work** commits the current transaction; that is, it makes the updates performed by the transaction permanent in the database. After the transaction is committed, a new transaction is automatically started. Once a transaction is committed, it cannot be reversed by a rollback.
- **rollback** causes the current transaction to be rolled back; that is, it undoes all the updates performed by the SQL statements in the transaction. Thus, the database state is restored to what it was before the first statement of the transaction was executed. **Rollback** is particularly useful if some error is detected during the execution of the transaction.

For instance, consider a banking application where we need to transfer money from one bank account to another in the same bank. To do so, we need to update two account balances, subtracting the amount transferred from one, and adding it to the other. If the system crashes after subtracting the amount from the first account but before adding it to the second account, the bank balances will be inconsistent. A similar problem occurs if the second account is credited before subtracting the amount from the first account and the system crashes just after crediting the amount.

As another example, consider our running example of a university application. We assume that the attribute tot_cred of each tuple in the student relation is kept up to date by modifying it whenever the student successfully completes a course. To do so, whenever the takes relation is updated to record successful completion of a course by a student (by assigning an appropriate grade), the corresponding student tuple must also be updated. If the application performing these two updates crashes after one update is performed, but before the second one is performed, the data in the database will be inconsistent.

Applying the notion of transactions to the above applications, the update statements should be executed as a single transaction. An error while a transaction executes one of its statements would result in undoing the effects of the earlier statements of the transaction so that the database is not left in a partially updated state.
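
The bank transfer described above can be sketched as a single transaction. This is a minimal illustration assuming SQLite through Python's `sqlite3` module, not SQL Server; the `account` table, its `CHECK` constraint, and the `transfer` and `balances` helpers are hypothetical names introduced for the example. The credit is applied first on purpose, so that a failing debit shows the earlier update being undone.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode,
# so transactions are controlled explicitly with BEGIN / COMMIT / ROLLBACK.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE account (
        id      TEXT PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)  -- no overdrafts
    );
    INSERT INTO account VALUES ('A', 100), ('B', 50);
""")

def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; return True if committed."""
    conn.execute("BEGIN")
    try:
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint: undo the credit as well.
        conn.execute("ROLLBACK")
        return False

def balances(conn):
    """Return the current balances as a dict keyed by account id."""
    return dict(conn.execute("SELECT id, balance FROM account"))

ok = transfer(conn, "A", "B", 30)       # enough funds: committed
after_ok = balances(conn)
failed = transfer(conn, "A", "B", 500)  # not enough funds: rolled back
after_failed = balances(conn)

print(ok, after_ok)
print(failed, after_failed)
```

After the failed transfer the balances are identical to those before it, even though the credit to `B` had already been executed inside the transaction.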

Below is an example of a data insertion with explicit transaction control. In this case, one of the insertions triggers an error because it violates an integrity constraint. We run the code inside a TRY...CATCH block: if an error is caught, we roll back so that the database returns to its original state; otherwise we commit.

In [80]:
-- check before
SELECT COUNT(*) AS n FROM student;
GO

-- the transaction
BEGIN TRANSACTION;
    -- try to run both inserts; any error transfers control to the CATCH block
    BEGIN TRY 
        INSERT INTO student VALUES ('14', 'Anne', NULL, 60); -- this is fine
        INSERT INTO student VALUES ('9002', 'Jane', NULL, -75); -- violates the CHECK constraint on tot_cred (negative total credits)
    END TRY
-- if an error occurred, report it and roll back the open transaction
BEGIN CATCH
    SELECT 
        ERROR_NUMBER() AS ErrorNumber,
        ERROR_SEVERITY() AS ErrorSeverity,
        ERROR_STATE() AS ErrorState,
        ERROR_PROCEDURE() AS ErrorProcedure,
        ERROR_LINE() AS ErrorLine,
        ERROR_MESSAGE() AS ErrorMessage;
    -- a transaction is still open, so roll it back
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
END CATCH;
-- no error occurred and the transaction is still open, so commit it
IF @@TRANCOUNT > 0
    COMMIT TRANSACTION;
GO

-- check after: the rollback undid both inserts, so the count should be unchanged
SELECT COUNT(*) AS n FROM student;
GO

n
2006


ErrorNumber,ErrorSeverity,ErrorState,ErrorProcedure,ErrorLine,ErrorMessage
547,16,0,,7,"The INSERT statement conflicted with the CHECK constraint ""CK__student__tot_cre__3A81B327"". The conflict occurred in database ""uni"", table ""dbo.student"", column 'tot_cred'."


n
2006


Here we do the same, but now both insertions are fine.

In [81]:
-- check before
SELECT COUNT(*) AS n FROM student;
GO

-- the transaction
BEGIN TRANSACTION;
    -- try to run both inserts; any error transfers control to the CATCH block
    BEGIN TRY 
        INSERT INTO student VALUES ('14', 'Anne', NULL, 60); -- this is fine
        INSERT INTO student VALUES ('9002', 'Jane', NULL, 75); -- this is now fine too (non-negative total credits)
    END TRY
-- if an error occurred, report it and roll back the open transaction
BEGIN CATCH
    SELECT 
        ERROR_NUMBER() AS ErrorNumber,
        ERROR_SEVERITY() AS ErrorSeverity,
        ERROR_STATE() AS ErrorState,
        ERROR_PROCEDURE() AS ErrorProcedure,
        ERROR_LINE() AS ErrorLine,
        ERROR_MESSAGE() AS ErrorMessage;
    -- a transaction is still open, so roll it back
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
END CATCH;
-- no error occurred and the transaction is still open, so commit it
IF @@TRANCOUNT > 0
    COMMIT TRANSACTION;
GO

-- check after: both inserts were committed, so the count should grow by 2
SELECT COUNT(*) AS n FROM student;
GO


n
2006


n
2008
