# Neeps - Easy

The "Neeps" database includes details of all teaching events in the School of Computing at Napier University in Semester 1 of the 2000/2001 academic year.

Graduated questions

The timetable database contains the following tables:

- ut_staff

id | name
---|-----
co.ACg | Cumming, Andrew
co.ACr | Crerar, Dr Alison
co.AFA | Armitage, Dr Alistair
 | ... Result truncated.

The staff table simply gives the unique identifier for each member of staff together with their full name.

- ut_student

id | name | sze | parent
---|------|----:|----------
co1.BAe | BAe | 15 | 
co1.CO | Computing 1st Year | 150 | 
co1.CO.a | Computing 1st Year a | 20 | co1.CO
co1.CO.c | Computing 1st Year c | 20 | co1.CO
co1.CO.d | Computing 1st Year d | 20 | co1.CO
 | | | ... Result truncated.

Each entry in the student table represents a group of students who follow the same timetable. The parent field is used to maintain a hierarchy of groups: a student in group co1.CO.a is also in group co1.CO.
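
To make the hierarchy concrete, here is a minimal sketch in plain Python that follows the parent field upward from a group. It uses only the sample rows shown above; the `groups_of` helper is a hypothetical name for illustration and not part of the database.

```python
# Sample rows from ut_student: id -> parent (None where the parent field is empty).
parent = {
    "co1.BAe": None,
    "co1.CO": None,
    "co1.CO.a": "co1.CO",
    "co1.CO.c": "co1.CO",
    "co1.CO.d": "co1.CO",
}

def groups_of(group_id):
    """Return the group itself followed by all of its ancestors."""
    chain = []
    current = group_id
    while current is not None:
        chain.append(current)
        current = parent.get(current)
    return chain

print(groups_of("co1.CO.a"))  # ['co1.CO.a', 'co1.CO']
```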

- ut_room

id | name | capacity | parent
---|------|---------:|---------
co.117+118 |  | 32 | 
co.117 |  | 16 | co.117+118
co.118 |  | 16 | co.117+118
co.B7 |  | 40 | 
 | | | ... result truncated

All classrooms are listed in the room table. The parent relation is used to maintain groupings. The row co.117+118 represents a pair of rooms that are sometimes used together for a single event.
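
In the sample rows, the capacity of the combined room equals the sum of its two parts. The sketch below checks that in plain Python; whether the same holds for every grouping in the full table is an assumption, and `child_capacity` is a hypothetical helper.

```python
# Sample rows from ut_room: (id, capacity, parent); parent is None when empty.
rooms = [
    ("co.117+118", 32, None),
    ("co.117", 16, "co.117+118"),
    ("co.118", 16, "co.117+118"),
    ("co.B7", 40, None),
]

def child_capacity(room_id):
    """Sum the capacities of the rooms whose parent is room_id."""
    return sum(cap for _, cap, parent in rooms if parent == room_id)

print(child_capacity("co.117+118"))  # 32
```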

- ut_event

id | modle | kind | dow | tod | duration | room
---|-------|------|-----|-----|---------:|----------
co12004.L01 | co12004 | L | Wednesday | 11:00 | 1 | cr.SMH
co12004.L02 | co12004 | L | Monday | 17:00 | 1 | cr.B13
co12004.T01 | co12004 | T | Monday | 11:00 | 2 | co.G78+G82
co12004.T02 | co12004 | T | Tuesday | 15:00 | 2 | co.B7
co12004.T03 | co12004 | T | Tuesday | 13:00 | 2 | co.G78+G82
 | | | | | | ... Result truncated.

Every event is associated with a single modle (a deliberately misspelled version of "module", which is a reserved word in one of the engines). An event occurs on a particular day of the week (dow) at a particular time of day (tod). Every event has a duration specified as a whole number of hours.
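
Since tod and duration together determine when an event finishes, a small sketch of that arithmetic may help. It assumes tod is stored as an `HH:MM` string, as the sample rows suggest; `end_time` is a hypothetical helper.

```python
from datetime import datetime, timedelta

def end_time(tod, duration):
    """Return the finishing time as HH:MM, given a start time and whole hours."""
    start = datetime.strptime(tod, "%H:%M")
    return (start + timedelta(hours=duration)).strftime("%H:%M")

# co12004.T01 starts at 11:00 on Monday and lasts 2 hours.
print(end_time("11:00", 2))  # 13:00
```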

- ut_attends

student | event
--------|------
co1.BAe | co12004.L01
co1.CO | co12004.L01
co1.IS | co12004.L01
co1.SE.pt | co12004.L02
co1.CO.a | co12004.T01
 | ... Result truncated.

The attends table links events to student groups. It realises the many-to-many relation between them.
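
As an illustration of one side of that relation, the sketch below loads the sample rows into an in-memory SQLite table (using Python's built-in `sqlite3` module rather than Spark) and lists every group attached to one event. The truncated sample only shows one event with several groups; the reverse direction, one group attending several events, is not visible in these five rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ut_attends (student TEXT, event TEXT)")
conn.executemany(
    "INSERT INTO ut_attends VALUES (?, ?)",
    [
        ("co1.BAe", "co12004.L01"),
        ("co1.CO", "co12004.L01"),
        ("co1.IS", "co12004.L01"),
        ("co1.SE.pt", "co12004.L02"),
        ("co1.CO.a", "co12004.T01"),
    ],
)

# One event can have several student groups attached to it.
attendees = [
    row[0]
    for row in conn.execute(
        "SELECT student FROM ut_attends WHERE event = ? ORDER BY student",
        ("co12004.L01",),
    )
]
print(attendees)  # ['co1.BAe', 'co1.CO', 'co1.IS']
```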

- ut_teaches

staff | event
------|----------
co.AMn | co12004.L01
co.RK | co12004.L01
co.SRM | co12004.L01
co.RK | co12004.L02
co.SRM | co12004.L02
 | ... Result truncated.

The teaches table links event to staff.

- ut_occurs

event | week
------|-------:
co12004.L01 | 01
co12004.L01 | 02
co12004.L01 | 03
co12004.L01 | 04
co12004.L01 | 05
 | ... Result truncated.

Most events occur in weeks 01 to 13. This table records exactly which weeks are part of the plan for each event.

- ut_modle

id | name
---|---------
co12001 | Rapid Application Development
co12002 | Software Development 1A
co12003 | Professional Skills
co12004 | Applications Workshop
 | ... Result truncated.

By the 21st century, the Gradgrindians had completed their assimilation of Education. The process of "modularisation" was just one of their crimes against learning.

- ut_week

id | wkstart
--:|-------:
01 | 02/10/00
02 | 09/10/00
03 | 16/10/00
 | ...Result truncated
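
Combining ut_week with the dow field of an event gives the calendar date on which the event takes place. The sketch below shows this in plain Python under two assumptions: wkstart is read as dd/mm/yy, and it marks the Monday of each week (which holds for the three sample rows). `event_date` is a hypothetical helper.

```python
from datetime import datetime, timedelta

# Sample rows from ut_week: id -> wkstart, read here as dd/mm/yy (an assumption).
wkstart = {"01": "02/10/00", "02": "09/10/00", "03": "16/10/00"}

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def event_date(week, dow):
    """Return the calendar date of a given day of the week within a teaching week."""
    start = datetime.strptime(wkstart[week], "%d/%m/%y").date()
    # Assumes wkstart is the Monday of that week.
    return start + timedelta(days=DAYS.index(dow))

# co12004.L01 is held on Wednesday; in week 01 that falls on:
print(event_date("01", "Wednesday"))  # 2000-10-04
```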


In [1]:
import findspark
findspark.init()  # make pyspark importable before the imports below

import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, lower

SVR = '192.168.31.31'  # host running the Spark master and HDFS

# Hive support is required to read the sqlzoo tables from the warehouse.
sc = (SparkSession.builder.appName('app15-1')
      .master(f'spark://{SVR}:7077')
      .config('spark.sql.warehouse.dir', f'hdfs://{SVR}:9000/user/hive/warehouse')
      .config('spark.cores.max', '4')
      .config('spark.executor.instances', '1')
      .config('spark.executor.cores', '2')
      .config('spark.executor.memory', '10g')
      .enableHiveSupport().getOrCreate())

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).


In [2]:
ut_staff = sc.read.table('sqlzoo.ut_staff')
ut_student = sc.read.table('sqlzoo.ut_student')
ut_event = sc.read.table('sqlzoo.ut_event')
ut_room = sc.read.table('sqlzoo.ut_room')
ut_attends = sc.read.table('sqlzoo.ut_attends')
ut_teaches = sc.read.table('sqlzoo.ut_teaches')
ut_occurs = sc.read.table('sqlzoo.ut_occurs')
ut_modle = sc.read.table('sqlzoo.ut_modle')
ut_week = sc.read.table('sqlzoo.ut_week')

## 1.
**Give the room id in which the event co42010.L01 takes place.**

In [3]:
(ut_event.filter(ut_event['id']=='co42010.L01')
 .select('room')
 .toPandas())

Unnamed: 0,room
0,cr.132


## 2.
**For each event in module co72010 show the day, the time and the place.**

In [4]:
(ut_event.filter(ut_event['modle']=='co72010')
 .select('dow', 'tod', 'room')
 .toPandas())

Unnamed: 0,dow,tod,room
0,Wednesday,14:00,cr.SMH
1,Tuesday,09:00,cr.B8
2,Wednesday,09:00,co.B7
3,Tuesday,12:00,co.LB42+LB46
4,Tuesday,11:00,co.G75+G76
5,Wednesday,16:00,co.LB42+LB46
6,Thursday,10:00,co.LB42+LB46
7,Wednesday,13:00,co.117+118


## 3.
**List the names of the staff who teach on module co72010.**

In [6]:
(ut_event.filter(ut_event['modle']=='co72010')
 .join(ut_teaches, on=(ut_event['id']==ut_teaches['event']))
 .join(ut_staff, on=(ut_teaches['staff']==ut_staff['id']))
 .select('name')
 .distinct()
 .toPandas())

Unnamed: 0,name
0,"Chisholm, Ken"
1,"Cumming, Andrew"


## 4.
**Give a list of the staff and module number associated with events using room cr.132 on Wednesday, include the time each event starts.**

In [8]:
(ut_event.filter((ut_event['room']=='cr.132') & 
                 (ut_event['dow']=='Wednesday'))
 .join(ut_teaches, on=(ut_event['id']==ut_teaches['event']))
 .join(ut_staff, on=(ut_teaches['staff']==ut_staff['id']))
 .select('name', 'modle')
 .toPandas())

Unnamed: 0,name,modle
0,"Varey, Alison",co32021
1,"Murray, Jim",co22009


## 5.
**Give a list of the student groups which take modules with the word 'Database' in the name.**

In [9]:
(ut_student
 .withColumnRenamed('name', 'student_group')
 .join(ut_attends, on=(ut_student['id']==ut_attends['student']))
 .join(ut_event, on=(ut_attends['event']==ut_event['id']))
 .join(ut_modle.filter(lower(col('name')).like('%database%')), 
       on=(ut_event['modle']==ut_modle['id']))
 .select('student_group')
 .toPandas())

Unnamed: 0,student_group
0,BSc4 Computing
1,BSc4 Computing
2,BEng4 Network and Distributing Systems
3,BEng4 Network and Distributing Systems
4,PgD Information Systems
5,PgD Information Systems a (HCI)
6,PgD Information Systems b (DS)
7,PgD Information Systems d (BT)
8,PgD Information Systems e (OOP)
9,PgD Information System pt. Tues
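
The result above lists some groups twice, most likely because a group can attend more than one event of a matching module. One way to collapse the duplicates on the Python side is sketched below, using the names exactly as printed; in Spark itself, calling `.distinct()` before `.toPandas()`, as in question 3, should have the same effect.

```python
# Group names copied from the result above, in the order shown.
groups = [
    "BSc4 Computing",
    "BSc4 Computing",
    "BEng4 Network and Distributing Systems",
    "BEng4 Network and Distributing Systems",
    "PgD Information Systems",
    "PgD Information Systems a (HCI)",
    "PgD Information Systems b (DS)",
    "PgD Information Systems d (BT)",
    "PgD Information Systems e (OOP)",
    "PgD Information System pt. Tues",
]

# dict.fromkeys keeps the first occurrence of each name and preserves order.
unique_groups = list(dict.fromkeys(groups))
print(len(unique_groups))  # 8
```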


In [10]:
sc.stop()