## CS 210 Spring 2024 - Apr 22
### Relational Databases Continued

---

### <font color="brown">College Database</font>

---

In case you lost track of the college database state at the end of the previous (Apr 18) class, you can get to it by loading
the <tt>college.sql</tt> file:

<pre>
venugopa@data8:~/cs210_s24/lectures$ mysql venugopa_college < college.sql
</pre>

Note: the <tt>college.sql</tt> file must be in the directory from which you run the command.

---

In [31]:
# import connector modules
from mysql.connector import connect, Error

In [34]:
# connect to college database
try:
    mydb = connect(unix_socket='/run/mysqld/mysqld.sock', database="venugopa_college")
    cursor = mydb.cursor()
except Error as e:
    print(e)
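
The cells below call `doquery(query)`, which is not defined in this section. A minimal sketch of what such a helper might look like is given here. Its body and the explicit `cur` parameter are assumptions, not the course's actual definition. An in-memory SQLite database stands in for the MySQL connection so the sketch runs on its own; the two sample rows are taken from the query outputs in these notes.

```python
import sqlite3

def doquery(query, cur):
    """Hypothetical helper: run a query on the cursor and print each result row."""
    cur.execute(query)
    rows = cur.fetchall()
    for row in rows:
        print(row)
    return rows

# Stand-in database (assumption) so the sketch runs without a MySQL server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table Student (Name text, Age integer)")
cur.executemany("insert into Student values (?, ?)",
                [("Patel", 19), ("Madsen", 18)])

rows = doquery("select Name, Age from Student order by Age", cur)
conn.close()
```

In the notebook, the helper would presumably use the global `cursor` created above rather than taking one as an argument.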

---

#### <font color="brown">Nested Select</font>

##### **1. Find all students who are older than Chen**
<pre>
  select Name from Student 
  where Age > 
  (select Age from Student where Name = 'Chen')
</pre>

In [17]:
query = """select Name from Student 
           where Age > 
           (select Age from Student where Name='Chen')""" 
doquery(query)

('Eastwood',)
('Bakalova',)


---

#### <font color="brown">Set Operations</font>

##### **1. Find names of students who are either sophomores or honor students**
<pre>
   (select Name from Student where Year='SO') 
      union
   (select Name from Student S, HonorStudent H where H.sid=S.id)
</pre>
Parentheses around each of the select statements are optional.

In [18]:
query = """(select Name from Student where Year='SO') 
          union
           (select Name from Student S, HonorStudent H where H.sid=S.id)"""
doquery(query)

('Patel',)
('Perez',)
('Chen',)


---

##### **2. Find names of senior honor students**

The following won't work on MySQL versions before 8.0.31, which have no set intersect feature (<tt>intersect</tt> was added in 8.0.31):
<pre>
   (select Name from Student where Year='SR') 
      intersect
   (select Name from Student S, HonorStudent H where H.sid=S.id)
</pre>
But we can do it a couple of other ways:
- **Using join**

<pre>
   select Name from Student S, HonorStudent H where Year='SR' and S.id=H.sid
</pre>

- **Using set membership**

<pre>
   select Name from Student where Year='SR' and Id in (select SId from HonorStudent);  
</pre>
Parentheses around the second select are required in the set membership version.

In [19]:
query = "select Name from Student where Year='SR' and Id in (select SId from HonorStudent)"
doquery(query)

('Chen',)


---

##### **3. Find names of seniors who are not honor students**

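**The following won't work: <tt>difference</tt> is not an SQL keyword. The standard set difference operator is <tt>except</tt>, which MySQL supports only from version 8.0.31:**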
<pre>
   (select Name from Student where Year='SR') 
      difference
   (select Name from Student S, HonorStudent H where H.sid=S.id)
</pre>
But we can do it using set membership:
<pre>
   select Name from Student S where Year='SR' and Name not in 
     (select Name from Student S, HonorStudent H where S.id=H.sid)
</pre>

In [20]:
query="""select Name from Student S 
         where Year='SR' 
         and Name not in 
             (select Name from Student S, HonorStudent H where S.id=H.sid)"""
doquery(query)

('Harris',)
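
One way to check that the set membership rewrites really behave like set intersection and set difference is to compare them against a database that has both operators built in. The sketch below uses Python's `sqlite3` module as a stand-in (an assumption; the notes themselves use MySQL). The table rows are hypothetical: names and years follow the outputs above, but the ids and the exact contents of `HonorStudent` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table Student (Id integer, Name text, Year text)")
cur.execute("create table HonorStudent (SId integer)")
# Hypothetical rows: ids 1-4 and the honor list are illustrative only.
cur.executemany("insert into Student values (?, ?, ?)",
                [(1, "Chen", "SR"), (2, "Harris", "SR"),
                 (3, "Perez", "JR"), (4, "Patel", "SO")])
cur.executemany("insert into HonorStudent values (?)", [(1,), (3,)])

def names(query):
    """Run a query and return the first column as a sorted list."""
    return sorted(row[0] for row in cur.execute(query))

seniors = "select Name from Student where Year = 'SR'"
honor = "select Name from Student S, HonorStudent H where H.SId = S.Id"

# Native set operators (SQLite does not allow parentheses around the operands).
native_intersect = names(f"{seniors} intersect {honor}")
native_except = names(f"{seniors} except {honor}")

# Set membership rewrites, comparing on the id rather than the name.
via_in = names("select Name from Student where Year = 'SR' "
               "and Id in (select SId from HonorStudent)")
via_not_in = names("select Name from Student where Year = 'SR' "
                   "and Id not in (select SId from HonorStudent)")

print(native_intersect, via_in)      # both list the senior honor students
print(native_except, via_not_in)     # both list the seniors who are not honor students
conn.close()
```

The `not in` rewrite here tests `Id` instead of `Name`. That is a variation on the query above: it gives the same answer on this data but would stay correct even if two students shared a name.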


---

##### **4. Find ids of all students who are taking all CS classes**

The strategy is to start, for each student, with the set of all CS classes and subtract the classes that student is taking. If anything is left after the subtraction, the student is at least one CS class short. The result should include only students for whom the subtraction yields an empty set. Since <tt>exists</tt> tests whether a subquery result is non-empty, <tt>not exists</tt> selects exactly those students.

<pre>
   select distinct E.sid from Enrollment E 
   where not exists 
     (select CName from Class where CName like 'CS%' 
      and CName not in 
        (select F.CName from Enrollment F where F.sid=E.sid))
</pre>

In [21]:
query="""select distinct E.sid from Enrollment E 
   where not exists 
     (select CName from Class where CName like 'CS%' 
      and CName not in 
        (select F.CName from Enrollment F where F.sid=E.sid))"""
doquery(query)

(150,)
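
The subtraction strategy can be mirrored with plain Python sets, which may make the double negation easier to follow. This is only an illustration. The enrollment rows are copied from the CS enrollment output shown in the Order By section of these notes, and the set of CS classes is an assumption (that the `Class` table lists exactly these three).

```python
# CS rows of Enrollment as (sid, CName), copied from the output in these notes.
cs_enrollment = [(300, "CS 210"), (100, "CS 210"), (150, "CS 210"),
                 (250, "CS 213"), (150, "CS 213"),
                 (200, "CS 323"), (150, "CS 323")]

# Assumption: the Class table contains exactly these CS classes.
all_cs = {"CS 210", "CS 213", "CS 323"}

# Group the classes taken by each student id.
taken = {}
for sid, cname in cs_enrollment:
    taken.setdefault(sid, set()).add(cname)

# Keep a student only if "all CS classes minus classes taken" is empty,
# which is what "not exists (... not in ...)" expresses in the SQL query.
takes_all = sorted(sid for sid, classes in taken.items()
                   if not (all_cs - classes))
print(takes_all)
```

Students with no CS enrollment at all never appear in `taken`, which matches the SQL: for them the subtraction leaves every CS class, so `not exists` is false.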


---

##### **5. Redo above to get names of students, instead of ids**

<pre>
   select distinct S.Name from Student S, Enrollment E 
   where S.id = E.sid and not exists 
     (select CName from Class where CName like 'CS%' 
      and CName not in 
        (select F.CName from Enrollment F where F.sid=E.sid))
</pre>

In [22]:
query="""select distinct Name from Student S, Enrollment E 
   where S.id=E.sid and not exists 
     (select CName from Class where CName like 'CS%' 
      and CName not in 
        (select F.CName from Enrollment F where F.sid=E.sid))"""
doquery(query)

('Patel',)


---

#### <font color="brown">Order By</font>

##### **1. List students by age**
<pre>
   select Name,Age from Student order by Age
</pre>
You can write 'order by Age asc', but ascending order is the default, so <tt>asc</tt> can be omitted.

In [23]:
query='select Name,Age from Student order by Age'
doquery(query)

('Madsen', 18)
('Patel', 19)
('Perez', 21)
('Harris', 21)
('Chen', 22)
('Bakalova', 24)
('Eastwood', 26)


---

##### **2. List CS class enrollments in alphabetical order of class names, and descending order of student positions within each class** 

<pre>
select * from Enrollment 
where CName like 'CS%'
order by CName, Pos desc
</pre>

In [24]:
query="""select * from Enrollment
         where CName like 'CS%' 
         order by CName, Pos desc"""
doquery(query)

(300, 'CS 210', 6)
(100, 'CS 210', 3)
(150, 'CS 210', 2)
(250, 'CS 213', 26)
(150, 'CS 213', 5)
(200, 'CS 323', 2)
(150, 'CS 323', 1)


---

#### <font color="brown">Aggregation/Reduction</font>

##### **1. Find the average student age by year**
<pre>
  select Year, avg(age) 
  from Student 
  group by Year
</pre>

In [25]:
query='select Year, avg(age) from Student group by Year'
cursor.execute(query)
res = cursor.fetchall()
for row in res:
    print(f'{row[0]}  {row[1]:.1f}')

FR  18.0
SO  19.0
JR  21.0
SR  21.5
GR  25.0


---

##### **2. Get the enrollment counts for classes, from highest to lowest enrollment counts**
<pre>
  select CName, count(*) 
  from Enrollment 
  group by CName
  order by count(*) desc
</pre>

In [26]:
query='select CName, count(*) from Enrollment group by CName order by count(*) desc'
doquery(query)

('CS 210', 3)
('CS 213', 2)
('Math 311', 2)
('CS 323', 2)
('Eng 256', 1)
('Eng 316', 1)
('Phy 605', 1)
('Chem 422', 1)
('Econ 586', 1)
('Hist 102', 1)
('Econ 607', 1)
('Hist 401', 1)


---

##### **3. Get the enrollment counts for classes that have at least 2 students**
<pre>
  select CName, count(*) 
  from Enrollment 
  group by CName 
  having count(*) > 1
</pre>

In [27]:
query='select CName, count(*) from Enrollment group by CName having count(*) > 1'
doquery(query)

('CS 210', 3)
('CS 213', 2)
('CS 323', 2)
('Math 311', 2)


---

##### **4. Find the name and id of the youngest student**
<pre>
  select Name,Id from Student 
  where Age in 
  (select min(Age) from Student)
</pre>

In [28]:
query="""select Name,Id from Student 
  where Age in 
  (select min(Age) from Student)"""
doquery(query)

('Madsen', 300)


Alternatively, you can do this:
<pre>
  select Name,Id from Student 
  where Age &lt;= all
  (select Age from Student)
</pre>

**<font color="red">Beware, using &lt; instead of &lt;= will not work!</font>**
<pre>
mysql> select Name,Id from Student 
       where Age &lt; all
      (select Age from Student)
      
Empty set (0.00 sec)
</pre>
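**The subquery returns every student's age, including the youngest student's own. The minimum age equals itself, so it can never be strictly less than all ages in the set, and no row satisfies '&lt; all'.**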
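
The difference between the two comparisons can be reproduced with Python's built-in `all`, used here as an informal analogue of SQL's `all` quantifier. The names and ages are those printed by the "List students by age" query above.

```python
# Names and ages as printed by the "order by Age" query in these notes.
ages = {"Madsen": 18, "Patel": 19, "Perez": 21, "Harris": 21,
        "Chen": 22, "Bakalova": 24, "Eastwood": 26}

# Analogue of: where Age <= all (select Age from Student)
le_all = [name for name, age in ages.items()
          if all(age <= other for other in ages.values())]

# Analogue of: where Age < all (select Age from Student)
# Each age is compared with itself too, and age < age is always false.
lt_all = [name for name, age in ages.items()
          if all(age < other for other in ages.values())]

print(le_all)
print(lt_all)
```

The first list contains the youngest student and the second is empty, matching the query result and the `Empty set` message above.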

---

##### **5. Find the CS class with the least enrollment**
<pre>
  select CName from Enrollment 
  where CName like 'CS%' 
  group by CName
  having count(*) &lt;= all
    (select count(*) from Enrollment
     where CName like 'CS%'
     group by CName)
</pre>

In [29]:
query="""select CName from Enrollment 
  where CName like 'CS%' 
  group by CName
  having count(*) <= all
    (select count(*) from Enrollment
     where CName like 'CS%'
     group by CName)"""
doquery(query)

('CS 213',)
('CS 323',)
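
As a cross-check, the same group, count, and compare steps can be written in plain Python. This is a sketch rather than a replacement for the SQL. The rows are the CS enrollment rows printed in the Order By section of these notes, and `Counter` plays the role of `group by CName` with `count(*)`.

```python
from collections import Counter

# CS rows of Enrollment as (sid, CName), copied from the output in these notes.
cs_enrollment = [(300, "CS 210"), (100, "CS 210"), (150, "CS 210"),
                 (250, "CS 213"), (150, "CS 213"),
                 (200, "CS 323"), (150, "CS 323")]

# group by CName, count(*)
counts = Counter(cname for _, cname in cs_enrollment)

# having count(*) <= all (counts of every CS class)
least = sorted(cname for cname, n in counts.items()
               if all(n <= m for m in counts.values()))
print(least)
```

Because the comparison is `<=`, ties are kept, which is why two classes are returned.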


In [30]:
cursor.close()
mydb.close()