# Prelim Scheduling

In this lab, you will rebuild part of the model that was used to schedule the Cornell Spring 2021 prelims. Our goal is to assign a day and time (sometimes referred to as a *slot*) to every exam, along with a list of rooms in which the exam will take place.

The main purpose of this lab is to give you some experience in modeling a real-world problem; in a future lab we will see what the implementation looks like.
<br/> <br/>
*So even if the model you come up with is incomplete, you will still get the full points as long as you worked on every part.*

```python
# imports the modules we use throughout the notebook
import numpy as np
import pandas as pd

input_data_path = './Data/'
```

To start with, we will go over the data used as input to the model.

<br>
Courses can request to have prelims and can express their preferences for when the exams will be scheduled by giving $3$ preferred dates on which they would like to have the exam. For simplicity, we assume that all $3$ dates are always present.
<br> <br>
Formally, we have a set of prelim exams, $I=\{1,\ldots,M\}$. Each prelim $i\in I$ has:

- A unique exam id, which will be denoted by $i \in I$.
- A class name, which is the name of the course associated with the prelim.
- The academic organization (e.g., CS, ORIE, MATH) that the class belongs to.
- An enrollment size $s_i\in \mathbb{N}$, which is the number of students enrolled in the class.
- The modality of the prelim, which can be Online or In person.
- A 1st, 2nd and 3rd preferred date. We want to have the exam on one of those dates, in that order of preference. Let us denote them as $p_{i,n}$ for $i \in I$ and $n = 1,2,3$, and let $P_i = \{p_{i,1},p_{i,2},p_{i,3}\}$.
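As a small illustration of the preference data, the sketch below looks up which preference $n$ (if any) a candidate date is for a given exam, i.e. whether the date equals $p_{i,n}$. It assumes the column names `prefdate`, `prefdate2` and `prefdate3` from the exams file this notebook reads, and that dates are stored as `YYYY-MM-DD` strings; the helper `preference_rank` and the one-row DataFrame are hypothetical example data.

```python
import pandas as pd

# Column names as they appear in the prelim exams CSV, in order n = 1, 2, 3.
PREF_COLUMNS = ["prefdate", "prefdate2", "prefdate3"]


def preference_rank(exams: pd.DataFrame, exam_id: str, date: str):
    """Return n if date == p_{i,n} for exam i = exam_id, or None if date is not in P_i."""
    row = exams.loc[exam_id]
    for n, col in enumerate(PREF_COLUMNS, start=1):
        if row[col] == date:
            return n
    return None


# Hypothetical one-row example, with dates in the same string format as the data.
example = pd.DataFrame(
    {"prefdate": ["2021-03-18"], "prefdate2": ["2021-03-16"], "prefdate3": ["2021-03-23"]},
    index=pd.Index(["exam-A"], name="exam_id"),
)

print(preference_rank(example, "exam-A", "2021-03-16"))  # 2
print(preference_rank(example, "exam-A", "2021-04-01"))  # None
```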

In the cell below, we read the file containing all of the prelims to be scheduled, along with the information described above.

```python
# df with prelim exams requested
exams = (pd.read_csv('prelim_exams.csv', index_col = 'exam_id'))
exams.loc[exams.modality == 'Online','enrollment'] = 0
exams
```

| exam_id | course | acadorg | enrollment | modality | prefdate | prefdate2 | prefdate3 |
|---|---|---|---|---|---|---|---|
| 1-AEM-2210-LEC-2634167-1 | AEM 2210 | AEM | 0 | Online | 2021-03-18 | 2021-03-16 | 2021-03-23 |
| 1-AEM-2210-LEC-2634167-2 | AEM 2210 | AEM | 0 | Online | 2021-04-15 | 2021-04-13 | 2021-04-20 |
| 1-AEM-2225-LEC-2634167-1 | AEM 2225 | AEM | 0 | Online | 2021-03-23 | 2021-03-18 | 2021-03-25 |
| 1-AEM-2225-LEC-2634167-2 | AEM 2225 | AEM | 0 | Online | 2021-04-20 | 2021-04-15 | 2021-04-22 |
| 1-AEM-2240-LEC-3778494-1 | AEM 2240 | AEM | 270 | In person | 2021-03-16 | 2021-03-04 | 2021-03-18 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1-STSCI-1380-LEC-1757307-1 | STSCI 1380 | STSCI | 64 | In person | 2021-03-16 | 2021-03-18 | 2021-03-04 |
| 1-STSCI-1380-LEC-1757307-2 | STSCI 1380 | STSCI | 64 | In person | 2021-04-20 | 2021-04-22 | 2021-04-15 |
| 1-STSCI-2150-LEC-1319792-1 | STSCI 2150 | STSCI | 140 | In person | 2021-03-18 | 2021-03-16 | 2021-03-23 |
| 1-STSCI-2150-LEC-1319792-2 | STSCI 2150 | STSCI | 0 | Online | 2021-04-08 | 2021-04-06 | 2021-04-13 |


#### Question 
You might notice that in the previous cell, we set the enrollment of exams that are meant to be online to zero (`exams.loc[exams.modality == 'Online','enrollment'] = 0`). Can you guess why we do that?

#### Answer

I'm assuming that we do that because online exams do not take place in a room, so they do not need any seats. Later on, when we assign rooms based on how many students will take each exam (so that every exam gets a room large enough to fit everyone), an enrollment of zero means online exams will not require any room capacity.



We are given a set of days on which the exams have to be scheduled. Every day has the same set of times at which an exam can start.
So let $D$ be the set of days of the semester on which we will schedule prelims, and $K$ be the set of starting times for exams on each day; in the code below, `K` stores the number of starting times, $|K| = 2$.
<br><br>
You can assume that the preferred dates given by the courses in the data always fall in $D$.

```python
# reads the available exam dates
exam_dates = (pd.read_csv('avail_prel_dates.csv')).exam_dates.tolist()
# number of slots per day
K = 2
exam_dates
```

```
['2021-02-25',
 '2021-03-02',
 '2021-03-04',
 '2021-03-16',
 '2021-03-18',
 '2021-03-23',
 '2021-03-25',
 '2021-03-30',
 '2021-04-01',
 '2021-04-06',
 '2021-04-08',
 '2021-04-13',
 '2021-04-15',
 '2021-04-20',
 '2021-04-22',
 '2021-04-29',
 '2021-05-04',
 '2021-05-06',
 '2021-05-11',
 '2021-05-13']
```
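Every exam is assigned one slot, i.e. one pair $(d, k)$ with $d \in D$ and $k \in K$. A minimal sketch of enumerating all slots, assuming the start times are simply numbered $0, \ldots, |K|-1$ because the data only gives their count (the helper name `build_slots` is hypothetical):

```python
from itertools import product


def build_slots(exam_dates, num_start_times):
    """Return every slot (d, k) in D x K, with start times numbered 0..num_start_times-1."""
    return list(product(exam_dates, range(num_start_times)))


slots = build_slots(["2021-02-25", "2021-03-02"], 2)
print(slots)
# [('2021-02-25', 0), ('2021-02-25', 1), ('2021-03-02', 0), ('2021-03-02', 1)]
```

Applied to the 20 dates listed above with `K = 2`, this gives $20 \times 2 = 40$ slots.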

For the exams that will take place in person, we are given a set of rooms $N$ in which we can schedule the exams. Each room $r\in N$ has capacity $s_r$, the number of seats available in that room after accounting for seats left empty due to COVID restrictions. There is also a file containing the distance between buildings, so using the building field of every room, you can find the distance between rooms. You can consider rooms in the same building to have zero distance.

```python
from IPython.display import display

# reads the room and building dist dfs
rooms = (pd.read_csv(input_data_path+'rooms.csv', index_col = 'room_id'))
building_dist = (pd.read_csv(input_data_path+'buildings_dist.csv', index_col = 0))
display(rooms)
display(building_dist.head(10))
```

| room_id | capacity | building |
|---|---|---|
| Morrison Hall-342 | 9 | Morrison Hall |
| Physical Sciences Building-401 | 9 | Physical Sciences Building |
| Rockefeller Hall-102 | 8 | Rockefeller Hall |
| Olin Hall-128 | 9 | Olin Hall |
| Baker Laboratory-G02 | 8 | Baker Laboratory |
| ... | ... | ... |
| Sibley Hall-235 | 65 | Sibley Hall |
| Statler Hall Auditorium-185 | 76 | Statler Hall Auditorium |
| Schwartz Ctr Performing Arts-111 | 78 | Schwartz Ctr Performing Arts |
| Bailey Hall-101 | 130 | Bailey Hall |

| | Morrison Hall | Physical Sciences Building | Rockefeller Hall | Olin Hall | Baker Laboratory | White Hall | Weill Hall | Riley-Robb Hall | Plant Science Building | Warren Hall | ... | Kennedy Hall | Milstein Hall | Phillips Hall | Biotechnology | Klarman Hall | Uris Library | Anabel Taylor Hall | Sibley Hall | Schwartz Ctr Performing Arts | Bailey Hall |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Morrison Hall | 0.0 | 0.684845 | 0.66475 | 0.773209 | 0.692658 | 0.867034 | 0.420573 | 0.102957 | 0.416633 | 0.445776 | ... | 0.525352 | 0.804689 | 0.660057 | 0.464509 | 0.725549 | 0.828247 | 0.837567 | 0.819629 | 0.889184 | 0.586574 |
| Physical Sciences Building | 0.684845 | 0.0 | 0.062251 | 0.33836 | 0.02075 | 0.185802 | 0.302028 | 0.610824 | 0.268815 | 0.24471 | ... | 0.178271 | 0.128547 | 0.373643 | 0.297855 | 0.082539 | 0.234214 | 0.396508 | 0.136246 | 0.559373 | 0.099358 |
| Rockefeller Hall | 0.66475 | 0.062251 | 0.0 | 0.282155 | 0.083001 | 0.204554 | 0.263177 | 0.584533 | 0.249718 | 0.240273 | ... | 0.141969 | 0.177748 | 0.311421 | 0.249907 | 0.061634 | 0.201613 | 0.343565 | 0.176209 | 0.502183 | 0.087859 |
| Olin Hall | 0.773209 | 0.33836 | 0.282155 | 0.0 | 0.357541 | 0.342742 | 0.367137 | 0.674295 | 0.427648 | 0.45596 | ... | 0.320362 | 0.403243 | 0.137407 | 0.313952 | 0.265707 | 0.173671 | 0.070338 | 0.380729 | 0.221176 | 0.342415 |
| Baker Laboratory | 0.692658 | 0.02075 | 0.083001 | 0.357541 | 0.0 | 0.18386 | 0.316642 | 0.620729 | 0.278001 | 0.249645 | ... | 0.193334 | 0.114992 | 0.394386 | 0.314952 | 0.097671 | 0.247618 | 0.41473 | 0.126976 | 0.578668 | 0.110959 |
| White Hall | 0.867034 | 0.185802 | 0.204554 | 0.342742 | 0.18386 | 0.0 | 0.466767 | 0.788748 | 0.450476 | 0.430382 | ... | 0.346524 | 0.110969 | 0.437094 | 0.447665 | 0.147961 | 0.172918 | 0.373643 | 0.078257 | 0.547018 | 0.28099 |
| Weill Hall | 0.420573 | 0.302028 | 0.263177 | 0.367137 | 0.316642 | 0.466767 | 0.0 | 0.330426 | 0.100141 | 0.16039 | ... | 0.123811 | 0.430561 | 0.283389 | 0.057498 | 0.3193 | 0.407981 | 0.435951 | 0.435903 | 0.529809 | 0.207164 |
| Riley-Robb Hall | 0.102957 | 0.610824 | 0.584533 | 0.674295 | 0.620729 | 0.788748 | 0.330426 | 0.0 | 0.342828 | 0.382075 | ... | 0.442967 | 0.734863 | 0.55852 | 0.36982 | 0.644056 | 0.737837 | 0.737609 | 0.746932 | 0.786296 | 0.511507 |
| Plant Science Building | 0.416633 | 0.268815 | 0.249718 | 0.427648 | 0.278001 | 0.450476 | 0.100141 | 0.342828 | 0.0 | 0.06246 | ... | 0.118199 | 0.392041 | 0.366344 | 0.149587 | 0.311183 | 0.430111 | 0.497965 | 0.404531 | 0.610643 | 0.170022 |
| Warren Hall | 0.445776 | 0.24471 | 0.240273 | 0.45596 | 0.249645 | 0.430382 | 0.16039 | 0.382075 | 0.06246 | 0.0 | ... | 0.135632 | 0.359423 | 0.410154 | 0.204719 | 0.301198 | 0.434538 | 0.525991 | 0.376107 | 0.650349 | 0.153111 |
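The distance between two rooms can be looked up through their buildings. A minimal sketch, assuming `rooms` is indexed by `room_id` with a `building` column and `building_dist` is indexed and labeled by building name, as in the tables above. The function name `room_distance` and the room `Morrison Hall-TEST` (with its capacity) are hypothetical; the distance value is copied from the table above.

```python
import pandas as pd


def room_distance(rooms: pd.DataFrame, building_dist: pd.DataFrame, room_a: str, room_b: str) -> float:
    """Distance between two rooms, taken as the distance between their buildings."""
    building_a = rooms.at[room_a, "building"]
    building_b = rooms.at[room_b, "building"]
    if building_a == building_b:
        return 0.0  # rooms in the same building are treated as zero distance
    return float(building_dist.at[building_a, building_b])


# Small example tables (separate names so the real `rooms` / `building_dist` are not overwritten).
# "Morrison Hall-TEST" and its capacity of 20 are made up to show the same-building case.
rooms_example = pd.DataFrame(
    {"capacity": [9, 9, 20], "building": ["Morrison Hall", "Physical Sciences Building", "Morrison Hall"]},
    index=pd.Index(["Morrison Hall-342", "Physical Sciences Building-401", "Morrison Hall-TEST"], name="room_id"),
)
names = ["Morrison Hall", "Physical Sciences Building"]
dist_example = pd.DataFrame([[0.0, 0.684845], [0.684845, 0.0]], index=names, columns=names)

print(room_distance(rooms_example, dist_example, "Morrison Hall-342", "Physical Sciences Building-401"))  # 0.684845
print(room_distance(rooms_example, dist_example, "Morrison Hall-342", "Morrison Hall-TEST"))  # 0.0
```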


Finally, we are given the coenrollment matrix. It has one entry for each pair of prelims that have a coenrollment conflict, meaning that at least one student is enrolled in both of the courses corresponding to the exams.

```python
# read the coenrollment df
coenrollments = (pd.read_csv(input_data_path+'coenrollment_s21_prelims.csv'))
# only keep the exams where there are students in common
coenrollments = coenrollments[coenrollments.coenrollment != 0]
coenrollments
```

| | exam_id_1 | course_1 | exam_id_2 | course_2 | coenrollment |
|---|---|---|---|---|---|
| 0 | 1-AEM-2210-LEC-2634167-1 | AEM 2210 | 1-AEM-2241-LEC-1001120-1 | AEM 2241 | 5 |
| 1 | 1-AEM-2210-LEC-2634167-1 | AEM 2210 | 1-AEM-2241-LEC-1001120-2 | AEM 2241 | 5 |
| 2 | 1-AEM-2210-LEC-2634167-1 | AEM 2210 | 1-AEM-2241-LEC-1001120-3 | AEM 2241 | 5 |
| 3 | 1-AEM-2210-LEC-2634167-2 | AEM 2210 | 1-AEM-2241-LEC-1001120-1 | AEM 2241 | 5 |
| 4 | 1-AEM-2210-LEC-2634167-2 | AEM 2210 | 1-AEM-2241-LEC-1001120-2 | AEM 2241 | 5 |
| ... | ... | ... | ... | ... | ... |
| 5826 | 1-PHYS-2214-LEC-1017932-1 | PHYS 2214 | 1-PHYS-2217-LEC-1009483-2 | PHYS 2217 | 1 |
| 5827 | 1-PHYS-2214-LEC-1017932-2 | PHYS 2214 | 1-PHYS-2217-LEC-1009483-1 | PHYS 2217 | 1 |
| 5828 | 1-PHYS-2214-LEC-1017932-2 | PHYS 2214 | 1-PHYS-2217-LEC-1009483-2 | PHYS 2217 | 1 |
| 5829 | 1-PHYS-3318-LEC-2319573-1 | PHYS 3318 | 1-PHYS-4443-LEC-1013508-1 | PHYS 4443 | 10 |
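Given a candidate schedule, the coenrollment table is enough to count how many students would have two exams in the same slot. A minimal sketch, assuming a schedule is stored as a hypothetical dict mapping each `exam_id` to a `(date, start_index)` slot, and that each pair of exams appears only once in the table (if pairs were listed in both orders, each conflict would be counted twice). A student with three or more exams in one slot is counted once per conflicting pair. The example rows copy two entries from the table above; the schedule is made up.

```python
import pandas as pd


def same_slot_conflicts(coenrollments: pd.DataFrame, assignment: dict) -> int:
    """Sum of coenrollment over exam pairs that are scheduled in the same slot."""
    total = 0
    for row in coenrollments.itertuples(index=False):
        slot_1 = assignment.get(row.exam_id_1)
        slot_2 = assignment.get(row.exam_id_2)
        # Exams missing from the assignment are treated as unscheduled and never conflict.
        if slot_1 is not None and slot_1 == slot_2:
            total += int(row.coenrollment)
    return total


example_pairs = pd.DataFrame(
    {
        "exam_id_1": ["1-AEM-2210-LEC-2634167-1", "1-PHYS-3318-LEC-2319573-1"],
        "exam_id_2": ["1-AEM-2241-LEC-1001120-1", "1-PHYS-4443-LEC-1013508-1"],
        "coenrollment": [5, 10],
    }
)
# Made-up schedule: the two AEM exams share a slot, the two PHYS exams do not.
example_assignment = {
    "1-AEM-2210-LEC-2634167-1": ("2021-03-18", 0),
    "1-AEM-2241-LEC-1001120-1": ("2021-03-18", 0),
    "1-PHYS-3318-LEC-2319573-1": ("2021-03-18", 0),
    "1-PHYS-4443-LEC-1013508-1": ("2021-03-18", 1),
}

print(same_slot_conflicts(example_pairs, example_assignment))  # 5
```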


For the rest of the lab, you will work on coming up with a model, in the form of an integer program, that uses the data above to decide which day and time each exam should be scheduled on, as well as the rooms in which the exam will take place.

## Brainstorming 

#### Question
First, go over the data again to make sure you understand what the input to your model will be. After that, spend some time coming up with ideas about what should be included in your model. Here are a few things you can think about:
- What do you want an exam schedule to accomplish? 
- What should be assigned to each exam by a schedule?
- Are there limitations when creating the exam schedule? Resources that you do not want to exhaust?
- What are things that might not be mandatory, but would make people happier if the schedule accomplished them? You can think about this from both the student side and the professor side.

Write down some of your ideas in the next cell. For this part, you do not need to be formal and define variables/constraints, just write them down in text.

#### Answer

To start, the exam schedule should accommodate everyone who is taking a prelim and should not create any conflicts. This is the bare minimum the schedule must do: every exam is assigned a time slot, and no student has two exams at the same time. Some limitations are the finite number of rooms available for the exams and the limited period in which the exams must take place. Grading time is another limited resource: TAs who have to grade exams for one class may also have exams of their own in other classes, so treating a grading period like another exam in the schedule would give them more time to study for their own tests. Beyond that, spacing out each student's exams would make studying less stressful; ideally, no student would have two exams on the same day. The same applies to professors, so that the exams for the classes they teach are spaced out. Finally, if a student does have two exams on the same day, the exam locations should not be too far apart, so that they don't have to walk too far.



We encourage you to talk to people around you, as that might make it easier to come up with ideas. If nothing comes to mind, make sure to ask a TA for help (that's what they are there for)!

## Creating the model

In this section, you will work on formalizing some of your ideas from the previous part. While before you might have gone over some ideas without putting them in a concrete form, you will now express them using formal notation. It is not necessary to include all of the ideas you had in the previous part. Use the next cell to write down what will be included in the model that you will be working on for the rest of the lab (still using text).

#### Answer

We will mainly focus on creating the variables and constraints needed to minimize the number of conflicts during the prelim period. In a perfect world we would have no conflicts at all, but because so many students take a variety of classes, some conflicts are bound to occur.

### Variables

Define the variables that you will use in your model, based on your answer in the previous cell. You should include the variable name, what they are indexed by, and what they express using formal notation.

#### Answer

We will have sets of students, classrooms, courses, and time slots, along with the capacity of each room and the number of students taking each course. We will also have binary variables indicating which exams each student takes, which location each exam is held in, and which time slot each student takes each exam in.
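One possible way to state the exam-to-slot and exam-to-room choices formally, using the sets $I$, $D$, $K$ and $N$ defined earlier, is the pair of binary variables below; the names $x$ and $y$ are our own choice. Note that the input data contains no list of individual students, so in this formulation which students take which exam is given data (the enrollments $s_i$ and the coenrollment table) rather than something the model decides.

$$
x_{i,d,k} = \begin{cases} 1 & \text{if exam } i \text{ is scheduled on day } d \text{ at start time } k,\\ 0 & \text{otherwise,} \end{cases} \qquad i \in I,\ d \in D,\ k \in K
$$

$$
y_{i,r,d,k} = \begin{cases} 1 & \text{if exam } i \text{ uses room } r \text{ on day } d \text{ at start time } k,\\ 0 & \text{otherwise,} \end{cases} \qquad i \in I,\ r \in N,\ d \in D,\ k \in K
$$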

### Constraints

Now using the variables you defined before, write down the constraints for your model. Include both the expression for the constraint, as well as a description of what each constraint ensures.

#### Answer

Our first constraint makes sure that in each time slot, at most one exam is held in each location: for every location and time slot, the sum of the binary variables indicating whether an exam is held in that location in that time slot is less than or equal to 1. Another constraint makes sure that the number of students taking an exam does not exceed the capacity of its location: for each location and time slot, the number of students taking an exam there is less than or equal to the location's capacity.
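Writing $x_{i,d,k} \in \{0,1\}$ for "exam $i$ is scheduled on day $d$ at start time $k$" and $y_{i,r,d,k} \in \{0,1\}$ for "exam $i$ uses room $r$ in that slot" (our own notation), the two constraints above could be written as follows. Because an exam may be given a list of rooms, the capacity constraint sums the capacities $s_r$ of all rooms assigned to the exam. The last line is an added linking constraint, so that an exam only uses rooms in the slot it is scheduled in.

$$
\sum_{i \in I} y_{i,r,d,k} \le 1 \qquad \forall\, r \in N,\ d \in D,\ k \in K
$$

$$
\sum_{r \in N} s_r\, y_{i,r,d,k} \ge s_i\, x_{i,d,k} \qquad \forall\, i \in I,\ d \in D,\ k \in K
$$

$$
y_{i,r,d,k} \le x_{i,d,k} \qquad \forall\, i \in I,\ r \in N,\ d \in D,\ k \in K
$$

Since online exams have $s_i = 0$ after the enrollment adjustment made earlier, the capacity constraint is satisfied for them without assigning any room.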

### Objective function

Finally, define the objective function. Write down an expression, and explain what each term of the objective function accomplishes.

#### Answer

Our objective function minimizes the number of conflicts across all students. We sum over all students taking an exam in each time slot; each integer term represents the number of exams a student is taking in that time slot, so we minimize the number of exams that overlap.
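Because the data only records how many students each pair of exams shares, one way to express this objective is through the coenrollment table. Let $x_{i,d,k} \in \{0,1\}$ equal 1 when exam $i$ is scheduled on day $d$ at start time $k$, let $C$ be the set of exam pairs $(i,j)$ listed in the coenrollment table with coenrollment $c_{ij}$, and introduce auxiliary variables $w_{i,j,d,k}$ (all of this notation is our own):

$$
\min \sum_{(i,j) \in C} c_{ij} \sum_{d \in D} \sum_{k \in K} w_{i,j,d,k}
$$

$$
\text{s.t.}\quad w_{i,j,d,k} \ge x_{i,d,k} + x_{j,d,k} - 1, \quad w_{i,j,d,k} \ge 0 \qquad \forall\, (i,j) \in C,\ d \in D,\ k \in K
$$

$$
\sum_{d \in D} \sum_{k \in K} x_{i,d,k} = 1 \qquad \forall\, i \in I
$$

Since every $c_{ij} > 0$, at an optimum $w_{i,j,d,k} = \max(0,\, x_{i,d,k} + x_{j,d,k} - 1)$, which is 1 exactly when both exams are in slot $(d,k)$. The last constraint, that every exam gets exactly one slot, is needed; without it, leaving exams unscheduled would trivially give zero conflicts.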

## Conclusion

#### Question
Were there some ideas that you came up with during the brainstorming part that you did not include in the model because you were not sure how to describe them formally? If so, use the next cell to list some of them, as well as what made it difficult for you to express them using mathematical notation.

#### Answer

I wasn't sure how to make sure that each student's exams are spaced out. I think a rule along the lines of not giving a student exams in 3 consecutive time slots would spread out the exams a lot more, but I'm not sure how to describe that in mathematical notation. I'm also not sure how to incorporate the distances between exam locations into the formulation.
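With only pairwise coenrollment data, a rule about three consecutive slots for an individual student cannot be expressed exactly from these inputs, but the same-day idea from the brainstorm can be written pairwise. Using $x_{i,d,k}$ (1 if exam $i$ is on day $d$ at start time $k$), the coenrolled pairs $C$ with counts $c_{ij}$, an auxiliary variable $v_{i,j,d}$, and a weight $\lambda \ge 0$ to be chosen (all our own notation), one could add this penalty term to the objective:

$$
\lambda \sum_{(i,j) \in C} c_{ij} \sum_{d \in D} v_{i,j,d}, \qquad v_{i,j,d} \ge \sum_{k \in K} x_{i,d,k} + \sum_{k \in K} x_{j,d,k} - 1, \quad v_{i,j,d} \ge 0 \qquad \forall\, (i,j) \in C,\ d \in D
$$

If each exam is scheduled exactly once, minimizing pushes $v_{i,j,d}$ to 1 exactly when both exams of a coenrolled pair fall on day $d$, and to 0 otherwise.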