# Introduction to SQL (part2)

Building on the previous class, we introduce more advanced SQL topics. We will consider:
   * join statements,
   * working with sets,
   * grouping and aggregate functions,
   * arithmetic operators + built-in math functions,
   * subqueries,
   * indexes,
   * views,
   * triggers.
   
During this class we will use the data stored in the testdb2.db database. We connect to this database with the sqlite3 package, using the code given below:
   

In [1]:
import sqlite3

#We connect to testdb2.db database
conn = sqlite3.connect('testdb2.db')

#We close the connection and free all resources
conn.close()

The testdb2.db contains the following tables:
 * student
 * staff
 * specialisation
 * subject
 * student_subject
 
**The free graphical tool SQLiteStudio (https://sqlitestudio.pl) can be used to verify changes in the database.**
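
The table list can also be checked from Python. The sketch below is a minimal example, not part of the original class material: it reads the table names from SQLite's `sqlite_master` catalogue. The helper name `list_tables` and the stand-in in-memory database (with made-up columns) are assumptions used only so that the example runs without testdb2.db.

```python
import sqlite3

def list_tables(conn):
    """Return the names of all tables in the connected database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]

# Stand-in database (assumption): two placeholder tables with made-up columns
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT)")
demo.execute("CREATE TABLE staff (staff_id INTEGER PRIMARY KEY, name TEXT)")

tables = list_tables(demo)
print(tables)
demo.close()
```

With the class database, the same helper can be called as `list_tables(sqlite3.connect('testdb2.db'))`.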


## JOIN statements

Queries against a single table are certainly not rare, but you will find that most of your queries will require two, three, or even more tables. In the code below we will use one of three join types:
* join
* inner join
* outer join

The easiest way to start is to put the student and specialisation tables into the from clause of a query.

In [2]:
import sqlite3

conn = sqlite3.connect('testdb2.db')

c = conn.cursor()


c.execute("SELECT e.name,e.surname,d.name FROM student e join specialisation d")


#get all results as a list; fetchall() returns an empty list if there are no results
listOfResults=c.fetchall()
for item in listOfResults:
    print(item)


# Save (commit) the changes
conn.commit()

# We close the connection and free all resources
conn.close()

('Tom', 'Silver', 'QF')
('Tom', 'Silver', 'Finance&Accounting')
('Alex', 'Great', 'QF')
('Alex', 'Great', 'Finance&Accounting')
('Michael', 'Jordan', 'QF')
('Michael', 'Jordan', 'Finance&Accounting')
('Ann', 'Green', 'QF')
('Ann', 'Green', 'Finance&Accounting')
('Jack', 'Gold', 'QF')
('Jack', 'Gold', 'Finance&Accounting')
('Jack', 'Smith', 'QF')
('Jack', 'Smith', 'Finance&Accounting')


The code above returns 12 rows. However, we have only 6 students. The reason is that the query did not specify how the two tables should be joined, so the database server generated the Cartesian product: every combination of a row from the first table with a row from the second (6 students × 2 specialisations = 12 rows).
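
The row count of a Cartesian product can be checked directly. The following is a minimal sketch on a small in-memory database; the reduced table layout and the three sample students are assumptions made for illustration, not the contents of testdb2.db.

```python
import sqlite3

# In-memory stand-in (assumption): reduced columns and only three students
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, spec_id INTEGER)")
conn.execute("CREATE TABLE specialisation (spec_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [("Tom", 1), ("Alex", None), ("Ann", None)])
conn.executemany("INSERT INTO specialisation VALUES (?, ?)",
                 [(1, "QF"), (2, "Finance&Accounting")])

n_students = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
n_specs = conn.execute("SELECT COUNT(*) FROM specialisation").fetchone()[0]

# A join without an on subclause pairs every student with every specialisation
n_joined = conn.execute(
    "SELECT COUNT(*) FROM student e JOIN specialisation d"
).fetchone()[0]

print(n_students, n_specs, n_joined)  # 3 2 6
conn.close()
```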

### INNER JOIN

If we want only the rows in which a student is paired with their own specialisation, we should modify the query and use an inner join statement. We need to describe how the two tables are related. Earlier, we showed that the student.spec_id column serves as the link between the two tables,
so this information needs to be added to the on subclause of the from clause.

#### Two tables

In [4]:
def printAll(listOfResultsA):
    for item in listOfResultsA:
        print(item)
    

import sqlite3

conn = sqlite3.connect('testdb2.db')

c = conn.cursor()

c.execute("Select * from student")
print("\nPresent all data in student table:")
print([description[0] for description in c.description])
printAll(c.fetchall())

c.execute("Select * from specialisation")
print("\nPresent all data in specialisation table:")
print([description[0] for description in c.description])
printAll(c.fetchall())


print("\n inner join results below:")
c.execute("SELECT e.name,e.surname,d.name FROM student e inner join specialisation d on e.spec_id=d.spec_id")


#get all results as a list; fetchall() returns an empty list if there are no results
listOfResults=c.fetchall()
printAll(listOfResults)


# Save (commit) the changes
conn.commit()

# We close the connection and free all resources
conn.close()


Present all data in student table:
(1, 'Tom', 'Silver', 1996, 70, 176, 1)
(2, 'Alex', 'Great', 1995, 60, 164, None)
(3, 'Michael', 'Jordan', 1975, 95, 201, None)
(4, 'Ann', 'Green', 1996, 50, 168, None)
(5, 'Jack', 'Gold', 1996, 80, 190, None)
(6, 'Jack', 'Smith', 1996, 75, 175, None)

Present all data in specialisation table:
(1, 'QF')
(2, 'Finance&Accounting')

 inner join results below:
('Tom', 'Silver', 'QF')


If a value exists for the spec_id column in one table but not the other, then the join fails for the rows containing that value and those rows are excluded from the result set. The printed results confirm this: the rows that contain None in the spec_id column do not appear in the output.

*If we do not specify the type of join (plain join with an on subclause), then the server will use an inner join by default.

Below we show how to join data from more than two tables.

#### Three tables

In [9]:

import sqlite3

conn = sqlite3.connect('testdb2.db')

c = conn.cursor()

c.execute("Select * from student")
print("\nPresent all data in student table:")
print([description[0] for description in c.description])
printAll(c.fetchall())


c.execute("Select * from subject")
print("\nPresent all data in subject table:")
print([description[0] for description in c.description])
printAll(c.fetchall())


c.execute("Select * from student_subject")
print("\nPresent all data in student_subject table:")
print([description[0] for description in c.description])
printAll(c.fetchall())

print("\nprint inner join results below:")
c.execute("SELECT e.name,e.surname,s.name,d.grade FROM student e inner join student_subject d on e.student_id=d.student_id \
          inner join subject s on d.subject_id=s.subject_id where d.grade>4")


#get all results as a list; fetchall() returns an empty list if there are no results
listOfResults=c.fetchall()
printAll(listOfResults)


# Save (commit) the changes
conn.commit()

# We close the connection and free all resources
conn.close()


Present all data in student table:
['student_id', 'name', 'surname', 'birth', 'weight', 'height', 'spec_id']
(1, 'Tom', 'Silver', 1996, 70, 176, 1)
(2, 'Alex', 'Great', 1995, 60, 164, None)
(3, 'Michael', 'Jordan', 1975, 95, 201, None)
(4, 'Ann', 'Green', 1996, 50, 168, None)
(5, 'Jack', 'Gold', 1996, 80, 190, None)
(6, 'Jack', 'Smith', 1996, 75, 175, None)

Present all data in subject table:
['subject_id', 'name']
(1, 'Python&SQL Intro')
(2, 'Advanced Macroeconomics')

Present all data in student_subject table:
['id', 'student_id', 'subject_id', 'grade']
(1, 1, 1, 5)
(2, 2, 1, 4)
(3, 3, 1, 3)
(4, 4, 1, 4)

print inner join results below:
('Tom', 'Silver', 'Python&SQL Intro', 5)


### OUTER JOIN

Let's assume that we want to obtain the list of students and their specialisations, including the cases when no specialisation was chosen. The inner join condition fails to find a match for some rows of the student table, because some students have not yet decided on their specialisation. This time the correct result can be obtained with an outer join.

#### Left outer join

The keyword left indicates that the table on the left side of the join is responsible for determining the number of rows in the result set, whereas the table on the right side is used to provide column values whenever a match is found.

In [10]:
def printAll(listOfResultsA):
    for item in listOfResultsA:
        print(item)
    

import sqlite3

conn = sqlite3.connect('testdb2.db')

c = conn.cursor()

c.execute("Select * from student")
print("\nPresent all data in student table:")
printAll(c.fetchall())

print("\n left outer join results below:")
c.execute("SELECT e.name,e.surname,d.name FROM student e left outer join specialisation d on e.spec_id=d.spec_id")


#get all results as a list; fetchall() returns an empty list if there are no results
printAll(c.fetchall())


print("\n right outer join results below:")
c.execute("SELECT e.name,e.surname,d.name FROM student e right outer join specialisation d on e.spec_id=d.spec_id")


#get all results as a list; fetchall() returns an empty list if there are no results
printAll(c.fetchall())

# Save (commit) the changes
conn.commit()

# We close the connection and free all resources
conn.close()


Present all data in student table:
(1, 'Tom', 'Silver', 1996, 70, 176, 1)
(2, 'Alex', 'Great', 1995, 60, 164, None)
(3, 'Michael', 'Jordan', 1975, 95, 201, None)
(4, 'Ann', 'Green', 1996, 50, 168, None)
(5, 'Jack', 'Gold', 1996, 80, 190, None)
(6, 'Jack', 'Smith', 1996, 75, 175, None)

 left outer join results below:
('Tom', 'Silver', 'QF')
('Alex', 'Great', None)
('Michael', 'Jordan', None)
('Ann', 'Green', None)
('Jack', 'Gold', None)
('Jack', 'Smith', None)

 right outer join results below:


OperationalError: RIGHT and FULL OUTER JOINs are not currently supported

*SQLite versions older than 3.39.0 do not support right outer join (hence the error above); other database systems (MySQL, Oracle) allow us to use it. In the expected result every row of the specialisation table appears at least once, and the student columns are None where no student has chosen that specialisation.
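
On a SQLite version without right outer join, the same result can be obtained by swapping the two tables and using a left outer join. The sketch below illustrates this on an in-memory database; the reduced columns and the three sample students are assumptions for illustration only.

```python
import sqlite3

# In-memory stand-in for testdb2.db (assumption: reduced columns, three students)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE specialisation (spec_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE student (name TEXT, surname TEXT, spec_id INTEGER)")
conn.executemany("INSERT INTO specialisation VALUES (?, ?)",
                 [(1, "QF"), (2, "Finance&Accounting")])
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [("Tom", "Silver", 1), ("Alex", "Great", None), ("Ann", "Green", None)])

# specialisation is now the left table, so every specialisation is kept
swapped_rows = conn.execute(
    "SELECT e.name, e.surname, d.name "
    "FROM specialisation d LEFT OUTER JOIN student e ON e.spec_id = d.spec_id "
    "ORDER BY d.spec_id"
).fetchall()

for row in swapped_rows:
    print(row)
conn.close()
```

The specialisation that nobody has chosen still appears, with None in the student columns.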

## Working with sets

### UNION and UNION ALL statements

The SQLite UNION/UNION ALL operator is used to combine the result sets of 2 or more SELECT statements.
Each SELECT statement within the UNION operator must return the same number of columns, with similar data types.
Remember: UNION removes duplicates from the output set, while UNION ALL keeps them. The order of the rows is guaranteed only if the query ends with an ORDER BY clause.
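
The difference between the two operators can be sketched on a small in-memory database. The two-column tables and their rows below are assumptions chosen for illustration; the ORDER BY clause is added so that the row order does not depend on how the server evaluates the query.

```python
import sqlite3

# In-memory stand-in (assumption): one person appears in both tables
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, surname TEXT)")
conn.execute("CREATE TABLE staff (name TEXT, surname TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [("Tom", "Silver"), ("Michael", "Jordan")])
conn.executemany("INSERT INTO staff VALUES (?, ?)",
                 [("Alfred", "Brown"), ("Michael", "Jordan")])

# UNION keeps one copy of the duplicated row
union_rows = conn.execute(
    "SELECT name, surname FROM student "
    "UNION SELECT name, surname FROM staff "
    "ORDER BY surname, name"
).fetchall()

# UNION ALL keeps both copies
union_all_rows = conn.execute(
    "SELECT name, surname FROM student "
    "UNION ALL SELECT name, surname FROM staff "
    "ORDER BY surname, name"
).fetchall()

print(len(union_rows), len(union_all_rows))  # 3 4
conn.close()
```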

<img src="union.png">


### EXCEPT statement

The EXCEPT operator retrieves all records from the first dataset and then removes from the results all records that also appear in the second dataset. Each SELECT statement within the EXCEPT query must return the same number of columns, with similar data types.
<img src="except.png">

### INTERSECT statement

The INTERSECT operator returns the intersection of 2 or more datasets. Each dataset is defined by a SELECT statement. If a record exists in both datasets, then it is included in the results.

<img src="intersect.png">

In [11]:
def printAll(listOfResultsA):
    for item in listOfResultsA:
        print(item)
    

import sqlite3

conn = sqlite3.connect('testdb2.db')

c = conn.cursor()

c.execute("Select * from student")
print("\nPresent all data in student table:")
printAll(c.fetchall())

#UNION
print("\n union results below:")
c.execute("SELECT e.student_id,e.name,e.surname FROM student e union SELECT s.staff_id,s.name,s.surname FROM staff s")


#get all results as a list; fetchall() returns an empty list if there are no results
printAll(c.fetchall())

#UNION ALL
print("\n union all results below:")
c.execute("SELECT e.student_id,e.name,e.surname FROM student e union all SELECT s.staff_id,s.name,s.surname FROM staff s")


#get all results as a list; fetchall() returns an empty list if there are no results
printAll(c.fetchall())


#EXCEPT

print("\n except statment results below:")
c.execute("SELECT e.student_id,e.name,e.surname FROM student e except SELECT s.staff_id,s.name,s.surname FROM staff s")


#get all results as a list; fetchall() returns an empty list if there are no results
printAll(c.fetchall())


#INTERSECT
print("\n intersect statment results below:")
c.execute("SELECT e.student_id,e.name,e.surname FROM student e intersect SELECT s.staff_id,s.name,s.surname FROM staff s")


#get all results as a list; fetchall() returns an empty list if there are no results
printAll(c.fetchall())

# Save (commit) the changes
conn.commit()

# We close the connection and free all resources
conn.close()


Present all data in student table:
(1, 'Tom', 'Silver', 1996, 70, 176, 1)
(2, 'Alex', 'Great', 1995, 60, 164, None)
(3, 'Michael', 'Jordan', 1975, 95, 201, None)
(4, 'Ann', 'Green', 1996, 50, 168, None)
(5, 'Jack', 'Gold', 1996, 80, 190, None)
(6, 'Jack', 'Smith', 1996, 75, 175, None)

 union results below:
(1, 'Alfred', 'Brown')
(1, 'Tom', 'Silver')
(2, 'Alex', 'Great')
(2, 'Tom', 'White')
(3, 'Michael', 'Jordan')
(4, 'Ann', 'Green')
(5, 'Jack', 'Gold')
(6, 'Jack', 'Smith')

 union all results below:
(1, 'Tom', 'Silver')
(2, 'Alex', 'Great')
(3, 'Michael', 'Jordan')
(4, 'Ann', 'Green')
(5, 'Jack', 'Gold')
(6, 'Jack', 'Smith')
(1, 'Alfred', 'Brown')
(2, 'Tom', 'White')
(3, 'Michael', 'Jordan')

 except statment results below:
(1, 'Tom', 'Silver')
(2, 'Alex', 'Great')
(4, 'Ann', 'Green')
(5, 'Jack', 'Gold')
(6, 'Jack', 'Smith')

 intersect statment results below:
(3, 'Michael', 'Jordan')


### Grouping and aggregate functions

Another important use of SQL is summarising data. Let's say that we need to know how many students were born in each year. We use the GROUP BY and COUNT(*) statements for that task. The GROUP BY statement divides the data into groups based on the values stored in the selected column (here the column name is birth) or columns, while COUNT(*) calculates the number of rows in each group.

#### GROUP BY

In [57]:
def printAll(listOfResultsA):
    for item in listOfResultsA:
        print(item)
    

import sqlite3

conn = sqlite3.connect('testdb2.db')

c = conn.cursor()

c.execute("Select * from student")
print("\nPresent all data in student table:")
printAll(c.fetchall())

#GROUP BY
print("\n GROUP BY statment results below (columns: birth,count):")
c.execute("SELECT e.birth,COUNT(*) FROM student e GROUP BY e.birth ")


#get all results as a list; fetchall() returns an empty list if there are no results
printAll(c.fetchall())


# Save (commit) the changes
conn.commit()

# We close the connection and free all resources
conn.close()


Present all data in student table:
(1, 'Tom', 'Silver', 1996, 70, 176, 1)
(2, 'Alex', 'Great', 1995, 60, 164, None)
(3, 'Michael', 'Jordan', 1975, 95, 201, None)
(4, 'Ann', 'Green', 1996, 50, 168, None)
(5, 'Jack', 'Gold', 1996, 80, 190, None)
(6, 'Jack', 'Smith', 1996, 75, 175, None)

 GROUP BY statment results below (columns: birth,count):
(1975, 1)
(1995, 1)
(1996, 4)


#### Aggregate functions

Aggregate functions perform a specific operation over all rows in a group. Although
every database server has its own set of specialty aggregate functions, the common
aggregate functions implemented by all major servers include:

 * Max(): returns the maximum value within a set
 * Min(): returns the minimum value within a set
 * Avg(): returns the average value across a set
 * Sum(): returns the sum of the values across a set
 * Count(): returns the number of values in a set
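
Without a GROUP BY clause, the whole table is treated as a single group. The sketch below applies the five functions to the weight values printed in the outputs of this class; the in-memory table with reduced columns is an assumption made so that the example is self-contained. It also shows that Count(column) skips None values, while COUNT(*) counts rows.

```python
import sqlite3

# In-memory stand-in (assumption): only the weight and spec_id columns
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, weight INTEGER, spec_id INTEGER)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)", [
    ("Tom", 70, 1), ("Alex", 60, None), ("Michael", 95, None),
    ("Ann", 50, None), ("Jack", 80, None), ("Jack", 75, None),
])

# No GROUP BY: the aggregates are computed over all rows at once
max_w, min_w, avg_w, sum_w, count_all, count_spec = conn.execute(
    "SELECT MAX(weight), MIN(weight), AVG(weight), SUM(weight), "
    "COUNT(*), COUNT(spec_id) FROM student"
).fetchone()

print(max_w, min_w, sum_w)    # 95 50 430
print(count_all, count_spec)  # 6 1
conn.close()
```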

In [60]:
def printAll(listOfResultsA):
    for item in listOfResultsA:
        print(item)
    

import sqlite3

conn = sqlite3.connect('testdb2.db')

c = conn.cursor()

c.execute("Select * from student")
print("\nPresent all data in student table:")
printAll(c.fetchall())

#GROUP BY+aggregate functions
print("\nGROUP BY statment results for each birth year below (columns: max_weight,min_weight,avg_weight,count):")
c.execute("SELECT MAX(e.weight),MIN(e.weight),AVG(e.weight),COUNT(*) FROM student e GROUP BY e.birth ")


#get all results as a list; fetchall() returns an empty list if there are no results
printAll(c.fetchall())


# Save (commit) the changes
conn.commit()

# We close the connection and free all resources
conn.close()


Present all data in student table:
(1, 'Tom', 'Silver', 1996, 70, 176, 1)
(2, 'Alex', 'Great', 1995, 60, 164, None)
(3, 'Michael', 'Jordan', 1975, 95, 201, None)
(4, 'Ann', 'Green', 1996, 50, 168, None)
(5, 'Jack', 'Gold', 1996, 80, 190, None)
(6, 'Jack', 'Smith', 1996, 75, 175, None)

GROUP BY statment results for each birth year below (columns: max_weight,min_weight,avg_weight,count):
(95, 95, 95.0, 1)
(60, 60, 60.0, 1)
(80, 50, 68.75, 4)


### Arithmetic operators + built-in math functions

In SQLite, we can use the standard arithmetic operators (+, -, *, /) and built-in math functions to operate on values stored in columns. Abs() is always available; the other functions listed below require SQLite 3.35.0 or later, built with math functions enabled:

 * Acos(x): calculates the arc cosine of x
 * Asin(x): calculates the arc sine of x
 * Atan(x): calculates the arc tangent of x
 * Cos(x): calculates the cosine of x
 * Cot(x): calculates the cotangent of x (offered by other servers such as MySQL; not in SQLite's built-in list)
 * Exp(x): calculates e^x
 * Abs(x): calculates the absolute value of x
 * Ln(x): calculates the natural log of x
 * Sin(x): calculates the sine of x
 * Sqrt(x): calculates the square root of x
 * Tan(x): calculates the tangent of x

*Remember: when we divide one integer value by another integer value, the result is truncated to an integer.
To obtain a fractional result, we should use the CAST() function and convert the values to float. The precision of the printed results can be set with the ROUND(arg1,arg2) function, where arg2 is the number of decimal places in the output.
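
The effect of integer division, CAST() and ROUND() can be seen on constants, without any table. The values 5 and 3 in this minimal sketch are chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 5 / 3 with two integers is truncated; casting one operand gives a fractional result
int_div, real_div, rounded = conn.execute(
    "SELECT 5 / 3, CAST(5 AS REAL) / 3, ROUND(CAST(5 AS REAL) / 3, 2)"
).fetchone()

print(int_div)   # 1
print(real_div)  # fractional result, about 1.667
print(rounded)   # 1.67
conn.close()
```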

We calculate BMI values for each student in the example below:

In [72]:
def printAll(listOfResultsA):
    for item in listOfResultsA:
        print(item)
    

import sqlite3

conn = sqlite3.connect('testdb2.db')

c = conn.cursor()

c.execute("Select * from student")
print("\nPresent all data in student table:")
printAll(c.fetchall())

#calculate BMI values
print("\nBMI results for each student below:")
c.execute("SELECT e.student_id,e.name,e.surname,e.weight,e.height, \
          CAST(e.weight as FLOAT)/(CAST(e.height as float)/100 *CAST(e.height as float)/100), \
          ROUND(CAST(e.weight as FLOAT)/(CAST(e.height as float)/100 *CAST(e.height as float)/100),2) \
          FROM student e")


#get all results as a list; fetchall() returns an empty list if there are no results
printAll(c.fetchall())


# Save (commit) the changes
conn.commit()

# We close the connection and free all resources
conn.close()


Present all data in student table:
(1, 'Tom', 'Silver', 1996, 70, 176, 1)
(2, 'Alex', 'Great', 1995, 60, 164, None)
(3, 'Michael', 'Jordan', 1975, 95, 201, None)
(4, 'Ann', 'Green', 1996, 50, 168, None)
(5, 'Jack', 'Gold', 1996, 80, 190, None)
(6, 'Jack', 'Smith', 1996, 75, 175, None)

BMI results for each student below:
(1, 'Tom', 'Silver', 70, 176, 22.59814049586777, 22.6)
(2, 'Alex', 'Great', 60, 164, 22.3081499107674, 22.31)
(3, 'Michael', 'Jordan', 95, 201, 23.514269448776027, 23.51)
(4, 'Ann', 'Green', 50, 168, 17.715419501133788, 17.72)
(5, 'Jack', 'Gold', 80, 190, 22.1606648199446, 22.16)
(6, 'Jack', 'Smith', 75, 175, 24.489795918367346, 24.49)


### Subqueries

A subquery is a query contained within another SQL statement. A subquery is always enclosed within parentheses and returns a result set which can consist of:
 * A single row with a single column (example below)
 * Multiple rows with a single column (example below)
 * Multiple rows and columns
 
 Remember: when a subquery returns many rows, we should use the IN or NOT IN operator in the WHERE statement.
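
The NOT IN operator can be sketched in the same way as IN. The example below lists the students who have no grade recorded; it runs on an in-memory database whose reduced columns are an assumption, filled with the rows printed in the outputs of this class.

```python
import sqlite3

# In-memory stand-in (assumption): reduced columns, rows as printed in this class
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, surname TEXT)")
conn.execute("CREATE TABLE student_subject (student_id INTEGER, subject_id INTEGER, grade INTEGER)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)", [
    (1, "Tom", "Silver"), (2, "Alex", "Great"), (3, "Michael", "Jordan"),
    (4, "Ann", "Green"), (5, "Jack", "Gold"), (6, "Jack", "Smith"),
])
conn.executemany("INSERT INTO student_subject VALUES (?, ?, ?)",
                 [(1, 1, 5), (2, 1, 4), (3, 1, 3), (4, 1, 4)])

# The subquery returns many rows (one column), so NOT IN is used instead of !=
ungraded = conn.execute(
    "SELECT e.student_id, e.name, e.surname FROM student e "
    "WHERE e.student_id NOT IN (SELECT s.student_id FROM student_subject s) "
    "ORDER BY e.student_id"
).fetchall()

for row in ungraded:
    print(row)
conn.close()
```

Note that NOT IN returns no rows at all if the subquery result contains a NULL value, so it is safest on columns that cannot be NULL.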


In [83]:
def printAll(listOfResultsA):
    for item in listOfResultsA:
        print(item)
    

import sqlite3

conn = sqlite3.connect('testdb2.db')

c = conn.cursor()

c.execute("Select * from student")
print("\nPresent all data in student table:")
printAll(c.fetchall())

#return the name and surname of the student, who got 5,
#subquery returns single row and single column
print("\nname and surname of of the student who got 5:")
c.execute("SELECT e.student_id,e.name,e.surname \
          FROM student e where e.student_id=\
          (SELECT s.student_id from student_subject s where s.grade=5)")

#get all results as a list; fetchall() returns an empty list if there are no results
printAll(c.fetchall())


#return the name and surname of the students, who got 4,
#subquery returns multiple rows and single column
print("\nnames and surnames of the students who got 4:")
c.execute("SELECT e.student_id,e.name,e.surname \
          FROM student e where e.student_id IN \
          (SELECT s.student_id from student_subject s where s.grade=4)")

#get all results as a list; fetchall() returns an empty list if there are no results
printAll(c.fetchall())


# Save (commit) the changes
conn.commit()

# We close the connection and free all resources
conn.close()


Present all data in student table:
(1, 'Tom', 'Silver', 1996, 70, 176, 1)
(2, 'Alex', 'Great', 1995, 60, 164, None)
(3, 'Michael', 'Jordan', 1975, 95, 201, None)
(4, 'Ann', 'Green', 1996, 50, 168, None)
(5, 'Jack', 'Gold', 1996, 80, 190, None)
(6, 'Jack', 'Smith', 1996, 75, 175, None)

name and surname of of the student who got 5:
(1, 'Tom', 'Silver')

names and surnames of the students who got 4:
(2, 'Alex', 'Great')
(4, 'Ann', 'Green')


### INDEXES

When we execute an INSERT statement, the database server does not put the new record in any particular location within the table; it places the data in the next free location within the file. So when you query the student table for students whose name starts with 'J', the server needs to inspect every row of the table.

You can add an extra index to the student table to speed up queries that filter on the student name.
If more than one index exists on a table, the optimizer decides which index is the most beneficial for a particular SQL
statement.

#### CREATE INDEX
We create a new index using the CREATE INDEX statement. If you create an index that consists of one column, SQLite uses that column as the sort key. If you create an index that has multiple columns, SQLite uses the additional columns as the second, third, … sort keys.
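
Whether the optimizer actually uses an index can be checked with EXPLAIN QUERY PLAN. The sketch below compares the plan of an equality query before and after creating the index. The helper name `plan`, the in-memory table and the query are assumptions for illustration; the exact wording of the plan text differs between SQLite versions, so only the index name is looked for.

```python
import sqlite3

def plan(conn, sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

# In-memory stand-in (assumption): reduced columns
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, surname TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Tom", "Silver"), (5, "Jack", "Gold"), (6, "Jack", "Smith")])

query = "SELECT student_id, surname FROM student WHERE name = 'Jack'"

before = plan(conn, query)  # no index yet: the whole table is scanned
# IF NOT EXISTS avoids an error when the statement is run a second time
conn.execute("CREATE INDEX IF NOT EXISTS idx_student_name ON student (name)")
after = plan(conn, query)   # the plan now refers to idx_student_name

print(before)
print(after)
conn.close()
```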


In [109]:
import sqlite3

conn = sqlite3.connect('testdb2.db')

c = conn.cursor()

#add extra index on one column 
print("\nwe add extra index on column 'name' in student table")
c.execute("CREATE INDEX idx_student_name ON student (name)")

#CREATE INDEX returns no rows, so fetchall() gives an empty list and nothing is printed
printAll(c.fetchall())

#add extra index on two columns 
print("\nwe add extra index on two columns: 'weight' and 'height' in student table")
c.execute("CREATE INDEX idx_student_weight_height ON student (weight,height)")

#CREATE INDEX returns no rows, so fetchall() gives an empty list and nothing is printed
printAll(c.fetchall())

# Save (commit) the changes
conn.commit()

# We close the connection and free all resources
conn.close()


we add extra index on column 'name' in student table

we add extra index on two columns: 'weight' and 'height' in student table


The code below returns all the students whose name starts with 'J':


In [106]:
def printAll(listOfResultsA):
    for item in listOfResultsA:
        print(item)
    

import sqlite3

conn = sqlite3.connect('testdb2.db')

c = conn.cursor()

c.execute("Select * from student")
print("\nPresent all data in student table:")
printAll(c.fetchall())

#return students whose name starts with 'J'
print("\nstudents whose name starts with 'J':")
c.execute("SELECT e.student_id,e.name,e.surname from student e where e.name LIKE 'J%'")

#get all results as a list; fetchall() returns an empty list if there are no results
printAll(c.fetchall())


# Save (commit) the changes
conn.commit()

# We close the connection and free all resources
conn.close()


Present all data in student table:
(1, 'Tom', 'Silver', 1996, 70, 176, 1)
(2, 'Alex', 'Great', 1995, 60, 164, None)
(3, 'Michael', 'Jordan', 1975, 95, 201, None)
(4, 'Ann', 'Green', 1996, 50, 168, None)
(5, 'Jack', 'Gold', 1996, 80, 190, None)
(6, 'Jack', 'Smith', 1996, 75, 175, None)

students whose name starts with 'J':
(5, 'Jack', 'Gold')
(6, 'Jack', 'Smith')


#### DROP INDEX

In [110]:
import sqlite3

conn = sqlite3.connect('testdb2.db')

c = conn.cursor()


#delete index,
print("\nwe delete  idx_student_name index")
c.execute("DROP INDEX idx_student_name ")

#DROP INDEX returns no rows, so fetchall() gives an empty list and nothing is printed
printAll(c.fetchall())

print("\nwe delete  idx_student_weight_height")
c.execute("DROP INDEX idx_student_weight_height")

#DROP INDEX returns no rows, so fetchall() gives an empty list and nothing is printed
printAll(c.fetchall())

# Save (commit) the changes
conn.commit()

# We close the connection and free all resources
conn.close()


we delete  idx_student_name index

we delete  idx_student_weight_height
