<h1 style="color:blue">Introduction to Relational Databases: Data Models and SQL</h1>


##A relational database consists of tables
<img src="img/table.png" width=700/>

<h2 style="color:red">Data model for university courses</h2>

**Course**: code, name, credits, description, ...<br/>
**Teacher**: name, email, office.<br/>
**Student**: name, email, ...<br/>

##`Courses` table
<img src="img/course.png"/>

Table DDL (data definition language):
```
create table courses (
    number varchar(6) not null,
    name varchar(32) not null,
    description varchar(1024),
    credits int 
);
```

## Structured Query Language (SQL, "sequel")

### `SELECT ... FROM ... WHERE ...`

##CasJobs
In <a href="http://scitest02.pha.jhu.edu/CasJobs/SubmitJob.aspx" target="_blank">CasJobs</a>, using IPDSDB as context, try:
```
SELECT number, name, description, credits
  FROM courses
```
<img src="img/query_course.png"/><br/><br/>


Try:
```
SELECT number, name
  FROM courses
 WHERE credits = 3
```


##More `WHERE` clauses

* `   =  <>  !=  <  >  <=  >=`
* `   credits between 2 and 3   ` (inclusive) 
* `   name like '%Web%'`
* `   credits = 3 and name like '%Web%'`
* `   credits = 3 or credits=1`
* `   credits in (2,4)`
* `   description IS NULL`
* `   description IS NOT NULL`
* `   exists (...)     `   (later)


Try:
```
SELECT *
  FROM courses
 WHERE name like '%Web%'
 ```
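The predicates listed above can also be tried offline. The following is a minimal sketch, assuming Python's built-in `sqlite3` module as a stand-in for the CasJobs context (SQLite's dialect differs from SQL Server in places, for example its `LIKE` ignores ASCII case). The rows are borrowed from the `Course` output shown later in this notebook, and the helper `numbers` is a hypothetical name introduced for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the IPDSDB context (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""
    create table courses (
        number varchar(6) not null,
        name varchar(32) not null,
        description varchar(1024),
        credits int
    )""")
conn.executemany(
    "insert into courses (number, name, credits) values (?, ?, ?)",
    [
        ("SDV100", "College Success Skills", 1),
        ("ITD110", "Web Page Design I", 3),
        ("ITP100", "Software Design", 3),
        ("ITD132", "Structured Query Language", 3),
        ("ITP140", "Client Side Scripting", 4),
        ("ITP225", "Web Scripting Languages", 4),
    ],
)


def numbers(where):
    """Return the sorted course numbers that satisfy the given WHERE clause."""
    sql = "SELECT number FROM courses WHERE " + where + " ORDER BY number"
    return [row[0] for row in conn.execute(sql)]


between = numbers("credits between 2 and 3")           # both bounds included
like_web = numbers("name like '%Web%'")                # % matches any substring
in_list = numbers("credits in (2, 4)")                 # membership in a list
combined = numbers("credits = 3 and name like '%Web%'")

for label, result in [("between", between), ("like", like_web),
                      ("in", in_list), ("and", combined)]:
    print(label, result)
```

Swapping the string passed to `numbers` is a quick way to experiment with the other predicates in the list.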

## NULL
Try and compare:
```
SELECT number, name
  FROM courses
 WHERE description IS NULL
```
```
SELECT number, name
  FROM courses
 WHERE description IS NOT NULL
```
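Three-valued logic: under standard SQL semantics a comparison with `NULL` is neither true nor false but unknown. Compare the results of the following queries with the two above: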
```
SELECT number, name
  FROM courses
 WHERE description = NULL
```
```
SELECT number, name
  FROM courses
 WHERE description != NULL
```
```
SELECT number, name
  FROM courses
 WHERE NULL = NULL
```
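To see the effect of these comparisons without a CasJobs session, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in (an assumption; SQLite follows the same three-valued logic for these comparisons). It also assumes that the empty descriptions in the course data are stored as `NULL`. The helper `count_where` is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table courses (number varchar(6), name varchar(32), "
    "description varchar(1024))"
)
conn.executemany(
    "insert into courses values (?, ?, ?)",
    [
        ("SDV100", "College Success Skills", None),  # None is stored as NULL
        ("ITD110", "Web Page Design I", "An introduction to web page design"),
        ("ITP100", "Software Design", "General software design principles"),
        ("ITD132", "Structured Query Language", None),
    ],
)


def count_where(condition):
    """Count the rows for which the condition evaluates to true."""
    sql = "SELECT count(*) FROM courses WHERE " + condition
    return conn.execute(sql).fetchone()[0]


conditions = [
    "description IS NULL",
    "description IS NOT NULL",
    "description = NULL",    # unknown for every row, so nothing matches
    "description != NULL",   # also unknown for every row
    "NULL = NULL",           # unknown as well, not true
]
counts = {cond: count_where(cond) for cond in conditions}
for cond, n in counts.items():
    print(f"{cond:25s} -> {n} rows")
```

Only `IS NULL` and `IS NOT NULL` select rows; a `WHERE` clause keeps a row only when its condition is true, and unknown is not true.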


##Custom column names
Try:
```
SELECT number as courseID, name as courseName
  FROM courses
```

##`ORDER BY ... [ASC|DESC]`
Try:
```
SELECT *
  FROM courses
 ORDER BY credits DESC
```

## TOP
Try:
```
SELECT TOP 1 *
  FROM courses
 ORDER BY credits DESC
```
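`TOP` is SQL Server syntax. As a hedged illustration, the sketch below uses Python's built-in `sqlite3` module as a stand-in, where the comparable clause is `LIMIT` (an assumption about the substitute engine, not about CasJobs). It also shows that when several rows tie on the sort column, which one comes first is not defined unless a tie-breaker is added to `ORDER BY`. The course rows are borrowed from the `Course` output shown later in this notebook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table courses (number varchar(6), name varchar(32), credits int)"
)
conn.executemany(
    "insert into courses values (?, ?, ?)",
    [
        ("SDV100", "College Success Skills", 1),
        ("ITD110", "Web Page Design I", 3),
        ("ITP140", "Client Side Scripting", 4),
        ("ITP225", "Web Scripting Languages", 4),
    ],
)

# SQLite counterpart of SELECT TOP 1 ... ORDER BY credits DESC.
# Two courses have 4 credits, so either of them may be returned.
top_any = conn.execute(
    "SELECT number, credits FROM courses ORDER BY credits DESC LIMIT 1"
).fetchone()

# Adding a second sort key makes the result deterministic.
top_stable = conn.execute(
    "SELECT number, credits FROM courses "
    "ORDER BY credits DESC, number ASC LIMIT 1"
).fetchone()

print(top_any)
print(top_stable)
```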

##Aggregation: `COUNT(...), MAX(...), MIN(...), AVG(...)`
Try:
```
SELECT count(*), MIN(credits), MAX(credits), AVG(credits)
  FROM courses
 WHERE credits > 1
```
Note that the `AVG` of integers is itself an integer: SQL Server drops the fractional part.
##Casting:
```
SELECT count(*), MIN(credits), MAX(credits), AVG(cast(credits as real))
  FROM courses
 WHERE credits > 1
```
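The difference between integer and real arithmetic can be reproduced offline. This is a minimal sketch using Python's built-in `sqlite3` module as a stand-in, with one caveat that is an assumption of the sketch rather than part of the course: SQLite's own `AVG` always returns a floating point value, so the integer behaviour is mimicked here with `SUM(credits) / COUNT(*)`, which is integer division in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table courses (number varchar(6), credits int)")
conn.executemany(
    "insert into courses values (?, ?)",
    [
        ("SDV100", 1),
        ("ITD110", 3),
        ("ITP100", 3),
        ("ITD132", 3),
        ("ITP140", 4),
        ("ITP225", 4),
    ],
)

# Courses with credits > 1 have credits 3, 3, 3, 4, 4: sum 17, count 5.
int_avg, real_avg = conn.execute("""
    SELECT SUM(credits) / COUNT(*),                 -- integer / integer
           CAST(SUM(credits) AS REAL) / COUNT(*)    -- real / integer
      FROM courses
     WHERE credits > 1
""").fetchone()

print(int_avg)   # fractional part is lost
print(real_avg)  # fractional part is kept
```

Casting one operand to `real` before the division is what preserves the fraction, which is the same idea as `AVG(cast(credits as real))` above.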


##Course with Teacher
<img src="img/courseTeacher.png"/>

```
create table courseTeacher (
    number varchar(6),
    name varchar(32),
    description varchar(1024),
    credits int,
    teacherName varchar(64) not null,
    teacherEmail varchar(128),
    teacherOffice varchar(32)
);
```

Try:
```
select * from courseteacher
```

<img src="img/query_courseteacher.png"/>


Note the redundancy: a teacher's email and office are repeated for every course (s)he teaches.

##Course with Teacher and Students
<img src="img/courseTeacherStudent.png" width="300px"/>

```
create table courseTeacherStudent (
    number varchar(6),
    name varchar(32),
    description varchar(1024),
    credits int,
    teacherName varchar(64) not null,
    teacherEmail varchar(128),
    teacherOffice varchar(32),
    studentName varchar(64) not null,
    studentEmail varchar(128)
);
```

##Redundancy

Try:
```
SELECT *
  FROM courseteacherstudent
```
<br/>
<img src="img/query_courseteacherstudent.png"/>
Even more redundancy...

##Normalization
Splitting up tables to remove redundancy.

##Separate Course from Teacher
Each `course` has one and only one `teacher`, but a teacher can teach any number of courses (0, 1, 2, ...).<br/>
I.e. different courses can have the same `teacher`.<br/>
Let `course` identify (point to) its `teacher` using a `FOREIGN KEY` column: `teacherId`.<br/>

<img src="img/normalize1.png" width="500px"/><br/>


How to query? Use `JOIN`-s

## JOINS
Try:
```
SELECT c.number, c.name,c.description, c.credits
,      t.name as teacherName,t.email as teacherEmail, t.office as teacherOffice
  FROM course c
    JOIN  teacher t
      ON t.id=c.teacherid
 ```

Alternative syntax:
 
 ```
SELECT c.number, c.name,c.description, c.credits
,      t.name as teacherName,t.email as teacherEmail, t.office as teacherOffice
  FROM course c
  ,    teacher t
 WHERE t.id=c.teacherid
 ```
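As a hedged check that the two syntaxes are equivalent, the sketch below runs both against a small in-memory database, using Python's built-in `sqlite3` module as a stand-in for CasJobs (an assumption). The course numbers and teacher ids come from the `Course` output shown later in this notebook; the teacher names are hypothetical placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table teacher (id varchar(5), name varchar(24))")
conn.execute(
    "create table course (number varchar(6), name varchar(32), "
    "teacherid varchar(6))"
)
conn.executemany(
    "insert into teacher values (?, ?)",
    [("je232", "Teacher J"), ("cm147", "Teacher C")],  # hypothetical names
)
conn.executemany(
    "insert into course values (?, ?, ?)",
    [
        ("ITD110", "Web Page Design I", "je232"),
        ("ITP100", "Software Design", "je232"),
        ("ITD132", "Structured Query Language", "cm147"),
    ],
)

# Explicit JOIN ... ON syntax.
explicit = conn.execute("""
    SELECT c.number, t.name AS teacherName
      FROM course c
      JOIN teacher t
        ON t.id = c.teacherid
     ORDER BY c.number
""").fetchall()

# Comma-separated tables with the join condition in WHERE.
implicit = conn.execute("""
    SELECT c.number, t.name AS teacherName
      FROM course c
      ,    teacher t
     WHERE t.id = c.teacherid
     ORDER BY c.number
""").fetchall()

print(explicit)
print(explicit == implicit)
```

The teacher's name is stored once in `teacher` and appears next to each of that teacher's courses only in the query result.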

##Separating out the students
A foreign key column in a row can point to at most one other row.<br/>
But a student can be enrolled in multiple courses: so no FK from `student` to `course`.<br/>
A course can contain multiple students: so no FK from `course` to `student`.<br/>
Solution:<br/>
Add a separate `enrolled` table, pointing to both `student` and `course`.
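The linking-table idea can be sketched as follows, again with Python's built-in `sqlite3` module as a stand-in (an assumption). The `primary key` and `references` clauses are additions made for this illustration so that the database itself rejects an enrollment pointing at a missing student; the student names and ids are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only on request
conn.execute("create table student (id varchar(3) primary key, name varchar(24))")
conn.execute("create table course (number varchar(6) primary key, name varchar(32))")
conn.execute("""
    create table enrolled (
        studentId varchar(3) references student(id),
        courseNumber varchar(6) references course(number)
    )""")
conn.executemany(
    "insert into student values (?, ?)",
    [("s1", "Student One"), ("s2", "Student Two")],  # hypothetical students
)
conn.executemany(
    "insert into course values (?, ?)",
    [("ITD110", "Web Page Design I"), ("ITP100", "Software Design")],
)
# One row per (student, course) pair: s1 takes two courses,
# and ITD110 has two students.
conn.executemany(
    "insert into enrolled values (?, ?)",
    [("s1", "ITD110"), ("s1", "ITP100"), ("s2", "ITD110")],
)

courses_of_s1 = [row[0] for row in conn.execute(
    "SELECT courseNumber FROM enrolled WHERE studentId = 's1' "
    "ORDER BY courseNumber")]
students_in_itd110 = [row[0] for row in conn.execute(
    "SELECT studentId FROM enrolled WHERE courseNumber = 'ITD110' "
    "ORDER BY studentId")]

# An enrollment for a student that does not exist violates the foreign key.
try:
    conn.execute("insert into enrolled values ('s9', 'ITD110')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(courses_of_s1, students_in_itd110, rejected)
```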


##The full data model
<img src="img/courses.png" width=600/>

<h2 style="color:red">Table definitions: DDL</h2>

```
create table teacher (id varchar(5), name varchar(24), email varchar(128), office varchar(32));

create table student (id varchar(3), name varchar(24), email varchar(128));

create table course (
    number varchar(6),
    name varchar(32),
    description varchar(1024),
    credits int,
    teacherid varchar(6) 
);

create table enrolled (studentId varchar(3), courseNumber varchar(6));

```

##Sample queries:
Try:
```
SELECT c.number, s.name
  FROM course c
  ,    enrolled e
  ,    student s
 WHERE e.courseNumber = c.number
   AND e.studentID = s.ID
 ```
```
SELECT t.name as teacher, s.name as student
  FROM course c
  ,    enrolled e
  ,    student s
  ,    teacher t
 WHERE e.courseNumber = c.number
   AND e.studentID = s.ID
   AND t.id = c.teacherId
 ORDER BY teacher, student
 ```


## GROUP BY
Questions:
* How can we count, for each course, the number of students enrolled in it?
* How can we count, for each teacher, the number of courses (s)he teaches?
* How can we count, for each student, the number of courses (s)he takes and the total number of credits (s)he may gain?
```
SELECT c.number, count(*)
  FROM course c
  ,    enrolled e
  ,    student s
 WHERE e.courseNumber = c.number
   AND e.studentID = s.ID
 GROUP BY c.number
```
```
SELECT s.name, count(*) as numCourses, SUM(c.credits) as totalCredits
  FROM course c
  ,    enrolled e
  ,    student s
 WHERE e.courseNumber = c.number
   AND e.studentID = s.ID
 GROUP BY s.name
```
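The two grouped queries above can be traced on a tiny data set. This is a minimal sketch using Python's built-in `sqlite3` module as a stand-in (an assumption); the students and enrollments are hypothetical, and an `ORDER BY` is added only to make the printed output stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table course (number varchar(6), credits int)")
conn.execute("create table student (id varchar(3), name varchar(24))")
conn.execute("create table enrolled (studentId varchar(3), courseNumber varchar(6))")
conn.executemany("insert into course values (?, ?)",
                 [("ITD110", 3), ("ITP100", 3), ("SDV100", 1)])
conn.executemany("insert into student values (?, ?)",
                 [("s1", "Student One"), ("s2", "Student Two")])  # hypothetical
conn.executemany("insert into enrolled values (?, ?)",
                 [("s1", "ITD110"), ("s2", "ITD110"), ("s1", "ITP100")])

# Number of students per course: one output row per distinct c.number.
per_course = conn.execute("""
    SELECT c.number, count(*)
      FROM course c
      ,    enrolled e
      ,    student s
     WHERE e.courseNumber = c.number
       AND e.studentID = s.ID
     GROUP BY c.number
     ORDER BY c.number
""").fetchall()

# Number of courses and total credits per student.
per_student = conn.execute("""
    SELECT s.name, count(*) as numCourses, SUM(c.credits) as totalCredits
      FROM course c
      ,    enrolled e
      ,    student s
     WHERE e.courseNumber = c.number
       AND e.studentID = s.ID
     GROUP BY s.name
     ORDER BY s.name
""").fetchall()

print(per_course)
print(per_student)
```

Note that `SDV100` has no enrollments in this data, so the join produces no rows for it and it does not appear in the per-course counts at all.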


## Sub-selects and common table expressions
Question: find all students in the course with the lowest number of credits.
```
SELECT c.credits, s.*
  FROM (SELECT TOP 1 number,credits
          FROM course
         ORDER BY credits ASC) c
  ,    enrolled e
  ,    student s
 WHERE e.courseNumber = c.number
   AND e.studentID = s.id
```
Alternative:
```
WITH c as (
SELECT TOP 1 number,credits
  FROM course
 ORDER BY credits ASC
)
SELECT c.credits, s.*
  FROM c
  ,    enrolled e
  ,    student s
 WHERE e.courseNumber = c.number
   AND e.studentID = s.id
```
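As a hedged check that the sub-select and the common table expression give the same answer, the sketch below runs both forms with Python's built-in `sqlite3` module as a stand-in. Because SQLite has no `TOP`, the sketch uses `LIMIT 1` instead (an assumption of the substitute engine); the students and enrollments are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table course (number varchar(6), credits int)")
conn.execute("create table student (id varchar(3), name varchar(24))")
conn.execute("create table enrolled (studentId varchar(3), courseNumber varchar(6))")
conn.executemany("insert into course values (?, ?)",
                 [("SDV100", 1), ("ITD110", 3)])
conn.executemany("insert into student values (?, ?)",
                 [("s1", "Student One"), ("s2", "Student Two")])  # hypothetical
conn.executemany("insert into enrolled values (?, ?)",
                 [("s1", "SDV100"), ("s1", "ITD110"), ("s2", "ITD110")])

# Sub-select in the FROM clause.
sub_select = conn.execute("""
    SELECT c.credits, s.id, s.name
      FROM (SELECT number, credits
              FROM course
             ORDER BY credits ASC
             LIMIT 1) c
      ,    enrolled e
      ,    student s
     WHERE e.courseNumber = c.number
       AND e.studentID = s.id
""").fetchall()

# The same inner query, named up front as a common table expression.
cte = conn.execute("""
    WITH c AS (
        SELECT number, credits
          FROM course
         ORDER BY credits ASC
         LIMIT 1
    )
    SELECT c.credits, s.id, s.name
      FROM c
      ,    enrolled e
      ,    student s
     WHERE e.courseNumber = c.number
       AND e.studentID = s.id
""").fetchall()

print(sub_select)
print(sub_select == cte)
```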



## Tasks
Create a copy of this model in your MyDB and modify it to include the following:
* Add the student's year to the model.
* Add the start/end dates of the course.
* Where would grades go?
* How could we model the Department that provides the course?
* How/where should we add teaching assistants?
* Suggest a way to add "prerequisite courses" to the model.
* ...

Read up on
* `INSERT INTO ...`
* `DELETE FROM ...`
* `DROP TABLE <TABLE-NAME>`
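As a starting point for that reading, here is a minimal sketch of the three statements, run against a throwaway in-memory database with Python's built-in `sqlite3` module (an assumption; exact syntax details in MyDB may differ). The helper `row_count` is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table course (number varchar(6), name varchar(32), credits int)")


def row_count():
    """Return the current number of rows in the course table."""
    return conn.execute("SELECT count(*) FROM course").fetchall()[0][0]


# INSERT INTO adds rows; naming the columns makes the statement self-documenting.
conn.execute("INSERT INTO course (number, name, credits) "
             "VALUES ('ITD110', 'Web Page Design I', 3)")
conn.execute("INSERT INTO course (number, name, credits) "
             "VALUES ('SDV100', 'College Success Skills', 1)")
after_insert = row_count()

# DELETE FROM removes the rows matching WHERE (all rows if WHERE is omitted).
conn.execute("DELETE FROM course WHERE credits < 3")
after_delete = row_count()
conn.commit()

# DROP TABLE removes the table itself, not just its rows.
conn.execute("DROP TABLE course")
try:
    row_count()
    table_exists = True
except sqlite3.OperationalError:
    table_exists = False

print(after_insert, after_delete, table_exists)
```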


 

<h1 style="color:green">Using IPython Notebook to query the database</h1>

In [34]:
# standard first block for defining the token and making it available as a system variable for the session
# token must be replaced with new one once it has expired
token="01cc861ec78d475cad8af907c31ed0c2"
import sys
sys.argv.append("--ident="+token)

In [35]:
# import CasJobs for querying database
import SciServer.CasJobs
# import pandas for using its DataFrame
import pandas
# import numpy for various numerical utilities
import numpy as np
# import plotting library, in particular for plotting elements
import matplotlib.pyplot as plt

In [36]:
query="""
SELECT *
  FROM Course
  """
queryResponse = SciServer.CasJobs.executeQuery(query, "IPDSDB")
# as CSV
body=queryResponse.read().decode('utf-8')

# run again
queryResponse = SciServer.CasJobs.executeQuery(query, "IPDSDB")
# parse results into pandas.DataFrame 
courses = pandas.read_csv(queryResponse,index_col=None)



executeQuery POST response:  200 OK
executeQuery POST response:  200 OK


In [37]:
print(body)

number,name,description,credits,teacherid
"SDV100","College Success Skills","",1,"dm112"
"ITD110","Web Page Design I","An introduction to web page design",3,"je232"
"ITP100","Software Design","General software design principles",3,"je232"
"ITD132","Structured Query Language","",3,"cm147"
"ITP140","Client Side Scripting","",4,"kr387"
"ITP225","Web Scripting Languages","",4,"kr387"



In [38]:
courses

Unnamed: 0,number,name,description,credits,teacherid
0,SDV100,College Success Skills,,1,dm112
1,ITD110,Web Page Design I,An introduction to web page design,3,je232
2,ITP100,Software Design,General software design principles,3,je232
3,ITD132,Structured Query Language,,3,cm147
4,ITP140,Client Side Scripting,,4,kr387
5,ITP225,Web Scripting Languages,,4,kr387
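The cells above run the query twice, once to read the raw CSV text and once to build the DataFrame. A minimal sketch of an alternative, assuming `body` holds the CSV text printed above: parse the already-downloaded text from an in-memory buffer with the standard library. `pandas.read_csv` also accepts a file-like object, so `io.StringIO(body)` could be passed to it in the same way.

```python
import csv
import io

# CSV text as printed by print(body) above.
body = '''number,name,description,credits,teacherid
"SDV100","College Success Skills","",1,"dm112"
"ITD110","Web Page Design I","An introduction to web page design",3,"je232"
"ITP100","Software Design","General software design principles",3,"je232"
"ITD132","Structured Query Language","",3,"cm147"
"ITP140","Client Side Scripting","",4,"kr387"
"ITP225","Web Scripting Languages","",4,"kr387"
'''

# io.StringIO wraps the string in a file-like object that can be read
# as many times as needed without contacting the server again.
courses = list(csv.DictReader(io.StringIO(body)))

# csv returns every field as text, so numeric columns need converting.
total_credits = sum(int(row["credits"]) for row in courses)
teacher_ids = sorted({row["teacherid"] for row in courses})

print(len(courses), total_credits, teacher_ids)
```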


<h2 style="color:red">Data model for runs of an experiment</h2>

<img src="img/Runs.png" width=700/>

<h2 style="color:red">Table definitions: DDL</h2>

```
create table Instruments
(
	InsID int not null identity(1,1) primary key,
	Name nvarchar(64) not null
);

create table Users
(
	UserID int not null identity(10,1) primary key,
	Name nvarchar(128) not null,
	AdvisorID int null
);

create table Runs
(
	RunID int not null identity(100,1) primary key,
	InsID int not null,
	Xmin float not null,
	Xmax float not null,
	UserID int not null,
	Comment nvarchar(128) null
);

create table Data
(
	ID int not null identity(10000,1) primary key,
	RunID int not null,
	X float not null,
	Y float not null
);
```