<a href="https://colab.research.google.com/github/kevinlin1/cluster-schedule/blob/master/schedule.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Cluster Scheduler

An experimental scheduling system which, given a dataset of students, their preferences, and available sections, optimizes the assignment of students to sections on the basis of the following treatment conditions, subject to their availability constraints.

- Randomized control, or no preference.
- Homogeneous ability grouping.
- Homogeneous ability grouping and homogeneous preference grouping on Likert-scaled responses to the statement, *I prefer to work in a group.*
- Homogeneous ability grouping, while avoiding sections in which exactly one student identifies with a given underrepresented demographic, so that no such student feels isolated; groups with zero or at least two such students are preferred.


## Google Colaboratory

This notebook was developed in Google Colaboratory, which requires a few extra steps to install packages and set up the programming environment. Because the data is stored in Google Sheets, files are specified by their Google Drive file identifiers, which appear in each spreadsheet's URL.

In [0]:
import numpy as np
import pandas as pd
import scipy as sp

import random

from collections import namedtuple, defaultdict
from functools import lru_cache

In [0]:
!pip install -U -q gspread

In [0]:
import google.auth

from google.colab import auth, files
auth.authenticate_user()

# Use the Application Default Credentials through Google Cloud SDK to provide access to Drive
credentials, project_id = google.auth.default()

In [0]:
import gspread

# Monkey patch gspread to support google-auth (https://github.com/burnash/gspread/pull/637)
def login(self):
    """Authorize client."""
    if not self.auth.token or (hasattr(self.auth, 'expired') and self.auth.expired):
        from google.auth.transport.requests import Request

        self.auth.refresh(Request())

    self.session.headers.update({
        'Authorization': 'Bearer %s' % self.auth.token
    })

gspread.Client.login = login

gc = gspread.authorize(credentials)

Dataframes are read from Google Sheets by default. Saving a dataframe writes it to a temporary file in the Colab runtime's local storage and then downloads that file to the local client.

In [0]:
def load_dataframe(spreadsheet_id, worksheet_name=None, *args, **kwargs):
    """Return a dataframe from the given spreadsheet_id and worksheet_name. If worksheet_name is
    not specified, use the default sheet1 target.
    """
    df = pd.DataFrame.from_records(
        gc.open_by_key(spreadsheet_id).worksheet(worksheet_name).get_all_values()
        if worksheet_name else
        gc.open_by_key(spreadsheet_id).sheet1.get_all_values()
    )
    return df.rename(columns=df.iloc[0]).drop(df.index[0])


def save_dataframe(df, filename, *args, **kwargs):
    """Save the given df to filename and download the saved file to the local host system."""
    if not filename:
        raise ValueError(f'Invalid filename: {filename}')
    df.to_csv(filename, *args, **kwargs)
    files.download(filename)
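`load_dataframe` promotes the first row returned by `get_all_values()` to column labels. The sketch below repeats that step on an in-memory list of rows (hypothetical values and a hypothetical helper name, `promote_header`), so it can be tried without Google credentials. The resulting index starts at 1, because only the header row is dropped, and every cell is still a string, including blanks.

```python
import pandas as pd

def promote_header(rows):
    """Return a dataframe whose column labels come from the first of the given rows."""
    df = pd.DataFrame.from_records(rows)
    # Use row 0 as the column labels, then drop it from the data.
    return df.rename(columns=df.iloc[0]).drop(df.index[0])

# Hypothetical sheet contents in the list-of-lists shape that get_all_values() returns.
rows = [
    ['SID', 'Total Score'],
    ['1001', '42'],
    ['1002', ''],
]
toy_df = promote_header(rows)
print(toy_df)
```

The blank score is why the exam-score cell below replaces `''` with `np.nan` before calling `dropna()` and converting to `float`.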

## Data representation

In [0]:
#@title Students
#@markdown The `students` dataframe is computed by joining the `exam_scores` with the validated `survey` data. After validation, every student in the `survey` has an exam score and is still enrolled in the course.

#@markdown Display the resulting dataframes and series output in this section. Disable to reduce verbosity.
DISPLAY_DATA = False #@param {type:"boolean"}

In [0]:
#@markdown The **survey** data format is highly structured and requires all of the columns in the exact order and field format declared in the **Column extraction** section. If the data is loaded from a downloaded CSV, the CSV must be downloaded directly from the Google Form rather than from a linked Google Sheet, so that all values are properly quoted.
SURVEY_FILE = '' #@param {type:"string"}
SURVEY_NAME = '' #@param {type:"string"}

#@markdown Define the column names: the student email address, student name, and student ID.
SURVEY_EMAIL_C = 'Email Address' #@param {type:"string"}
SURVEY_NAME_C = 'What is your preferred name?' #@param {type:"string"}
SURVEY_SID_C = 'What is your student ID?' #@param {type:"string"}

#@markdown Drop the survey consent column. Students must consent to the research to submit a response through this form, so the column carries no information.
SURVEY_CONSENT_C = 'By completing this survey and participating in the Computer Science Mentors program for CS 61A, you consent to take part in the research.' #@param {type:"string"}

survey = (
    load_dataframe(SURVEY_FILE, SURVEY_NAME)
    .loc[:, SURVEY_EMAIL_C:]
    .drop(columns=[SURVEY_CONSENT_C])
    .set_index(SURVEY_SID_C)
)

if DISPLAY_DATA:
    display(survey)

In [0]:
#@markdown Import **exam scores**, keyed on student ID, as a dataframe. Requires a sheet with the following columns.
#@markdown > **Student ID, Exam Score**
EXAM_SCORES_FILE = '' #@param {type:"string"}
EXAM_SCORES_NAME = '' #@param {type:"string"}

#@markdown Define the column names: the student ID and their exam score.
EXAM_SID_C = 'SID' #@param {type:"string"}
EXAM_SCORE_C = 'Total Score' #@param {type:"string"}

exam_scores = (
    load_dataframe(EXAM_SCORES_FILE, EXAM_SCORES_NAME)
    .loc[:, [EXAM_SID_C, EXAM_SCORE_C]]
    .set_index(EXAM_SID_C)
    .replace('', np.nan)
    .dropna()
    .astype(float)
)

if DISPLAY_DATA:
    display(exam_scores)

In [0]:
#@markdown Import the **student roster** to validate the survey data and exam scores. Students who are not in the roster will be dropped from the assignment procedure. Requires a sheet of all of the enrolled student IDs.
ROSTER_FILE = '' #@param {type:"string"}
ROSTER_NAME = '' #@param {type:"string"}

#@markdown Define the column name for the student ID.
ROSTER_SID_C = 'Student ID' #@param {type:"string"}

roster = load_dataframe(ROSTER_FILE, ROSTER_NAME)[ROSTER_SID_C]

if DISPLAY_DATA:
    display(roster)

In [0]:
withheld_students = survey[
    ~(survey.index.isin(exam_scores.index) & survey.index.isin(roster))
]

if DISPLAY_DATA:
    display(withheld_students)

In [0]:
students = exam_scores.join(survey[
    ~survey.index.isin(withheld_students.index)
], how='right')

if DISPLAY_DATA:
    display(students)
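The validation can be tried on toy data. In this sketch (made-up IDs and values), a respondent is withheld unless they both have an exam score and appear on the roster. The remaining responses are then right-joined onto the exam scores. The filter is given `withheld.index`: iterating a DataFrame yields its column labels, so passing the DataFrame itself to `Index.isin` would not compare student IDs.

```python
import pandas as pd

# Made-up data: four survey respondents, three exam takers, three enrolled students.
toy_survey = pd.DataFrame({'Email': ['a@x', 'b@x', 'c@x', 'd@x']},
                          index=pd.Index(['1', '2', '3', '4'], name='SID'))
toy_exam_scores = pd.DataFrame({'Total Score': [40.0, 35.0, 50.0]},
                               index=pd.Index(['1', '2', '4'], name='SID'))
toy_roster = pd.Series(['1', '2', '3'])

# Withhold respondents who lack an exam score or are not on the roster.
withheld = toy_survey[
    ~(toy_survey.index.isin(toy_exam_scores.index) & toy_survey.index.isin(toy_roster))
]

# Compare against the withheld row labels (student IDs), not the DataFrame itself.
kept = toy_survey[~toy_survey.index.isin(withheld.index)]
toy_students = toy_exam_scores.join(kept, how='right')
print(toy_students)
```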

In [0]:
SID_C = students.index.name
(MIDTERM_C, STUDENT_EMAIL_C, STUDENT_NAME_C,
 # Availability
 MON_C, TUE_C, WED_C,
 # Demographic and identity questions
 GROUPS_Q, GENDER_Q, ETHNIC_Q, AGEGRP_Q, FIRSTG_Q, INTNLS_Q, TRNSFR_Q, INTMJR_Q,
 # General evaluation questions
 LIKELY_Q, WRITEP_Q, LEARNC_Q, ANXITY_Q, STRESS_Q, ENCRGE_Q, PICTUR_Q,
 # Class-specific belonging questions
 CLASS_SUPPRT_Q, CLASS_PARTOF_Q, CLASS_ACCEPT_Q, CLASS_COMFRT_Q,
 # Community-specific belonging questions
 CMNTY_SUPPRT_Q, CMNTY_PARTOF_Q, CMNTY_ACCEPT_Q, CMNTY_COMFRT_Q) = students.columns

#@markdown Define the days over which sections can take place. This will be used to extract columns from the schedule and from students' availabilities. If modifications are made, ensure that `DAY_COLUMNS` and the day names are changed as well.
DAYS = 'Monday, Tuesday, Wednesday' #@param {type:"string"}
DAYS = DAYS.split(', ')

DAY_COLUMNS = [MON_C, TUE_C, WED_C]

#@markdown Define the delimiter for section availability.
TIME_DELIMITER = ', ' #@param {type:"string"}


class Student(namedtuple('Student', ['SID'] + list(students.columns), rename=True)):

    _idx = {name: i for i, name in enumerate([students.index.name] + list(students.columns))}

    @property
    @lru_cache(maxsize=None)
    def availability(self):
        return frozenset(
            f'{hour} {day}'
            for column, day in zip(DAY_COLUMNS, DAYS)
            for hour in self[column].split(TIME_DELIMITER)
        )

    def __getitem__(self, key):
        if isinstance(key, str):
            return super().__getitem__(self._idx[key])
        return super().__getitem__(key)

    def __eq__(self, other):
        return self[0] == other[0]

    def __hash__(self):
        return hash(self[0])

class Students(pd.DataFrame):

    _internal_names = pd.DataFrame._internal_names + ['_tuples', '_arrays']
    _internal_names_set = set(_internal_names)

    _tuples = None
    _arrays = None

    @property
    def tuples(self):
        if self._tuples is None:
            self._tuples = {student[SID_C]: student for student in self.itertuples()}
        return self._tuples

    def itertuples(self):
        if self._arrays is None:
            self._arrays = [self.index] + [self.iloc[:, k] for k in range(len(self.columns))]
        return map(Student._make, zip(*self._arrays))

    @property
    def _constructor(self):
        return Students

students = Students(students)

for _, student in zip(range(5), students.tuples.values()):
    student_id = student[SID_C]
    midterm_score = student[MIDTERM_C]
    monday_availability = student[MON_C]
    print(f'{student_id} got {midterm_score} points and can make {monday_availability} Monday')
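`Student.availability` expands each day's comma-separated list of hours into `'<hour> <day>'` strings, so they can be compared directly with the canonical section times. The standalone sketch below uses a hypothetical helper, `expand_availability`, and made-up responses. It also shows an edge case: a blank response yields an entry consisting of only a space and the day name. That entry is harmless because it never equals a section time.

```python
def expand_availability(responses, days=('Monday', 'Tuesday', 'Wednesday'), delimiter=', '):
    """Return the set of '<hour> <day>' slots built from one response string per day."""
    return frozenset(
        f'{hour} {day}'
        for response, day in zip(responses, days)
        for hour in response.split(delimiter)
    )

# Hypothetical responses for Monday, Tuesday, and Wednesday (no Wednesday availability).
slots = expand_availability(['9 AM, 10 AM', '1 PM', ''])
print(sorted(slots))
```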

In [0]:
#@title Sections
#@markdown Import the schedule of sections. Requires a sheet with the following columns.
#@markdown > **Teacher Email, Teacher Name, Room, Capacity, Time**
#@markdown A unique section key is defined on the tuple *(Teacher Email, Time)*, so that a teacher identified by email can teach multiple sections at different times. Some columns are only used at the end of the processing pipeline. For example, the teacher's name is only referenced when sending personalized emails.

#@markdown Display the resulting dataframes and series output in this section. Disable to reduce verbosity.
DISPLAY_DATA = False #@param {type:"boolean"}

In [0]:
SECTION_FILE = '' #@param {type:"string"}
SECTION_NAME = '' #@param {type:"string"}

#@markdown Define the column names: teacher email, teacher name, room, capacity, and time.
TEACHER_EMAIL_C = 'Email' #@param {type:"string"}
TEACHER_NAME_C = 'Name' #@param {type:"string"}
ROOM_C = 'Room' #@param {type:"string"}
CAPACITY_C = 'Capacity' #@param {type:"string"}
TIME_C = 'Time' #@param {type:"string"}

#@markdown Define parameters for the generated section codes.
CODE_C = 'Code' #@param {type:"string"}
CODE_LENGTH = 6 #@param {type:"integer"}

sections = load_dataframe(SECTION_FILE, SECTION_NAME)

# Drop rows with blank entries.
sections = sections[sections != ''].dropna()

# Move the leading day of each time string to the end and convert the result to the canonical
# format (e.g. 9 AM Monday).
sections[TIME_C] = sections[TIME_C].apply(
    lambda s: pd.to_datetime(''.join(s.split()[1:]) + ' ' + s.split()[0])
).dt.strftime('%-I %p %A')

# Interpret section capacity as an integer.
sections[CAPACITY_C] = sections[CAPACITY_C].astype(int)

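# Generate section control codes identifying each section by its (email, time) pair.
# hashlib is used because the built-in hash() of strings is salted per Python process, so its
# codes would change every time the notebook is rerun.
import hashlib

def ccn(email, time, length=CODE_LENGTH):
    """Return a deterministic numeric code identifying the section (email, time)."""
    digest = hashlib.sha256(f'{email} {time}'.encode()).hexdigest()
    return str(int(digest, 16) % 10 ** length).zfill(length)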

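# Assign a plain list so the codes line up with rows by position. A new pd.Series would be aligned
# on its default 0-based index, which does not match the sections index (it starts at 1).
sections[CODE_C] = [
    ccn(email, time) for email, time in sections[[TEACHER_EMAIL_C, TIME_C]].itertuples(index=False)
]

sections = sections.set_index(CODE_C)
# Truncated codes can collide, so check that each section has a distinct code.
assert sections.index.is_unique, 'Duplicate section codes; try a larger CODE_LENGTH.'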

if DISPLAY_DATA:
    display(sections)
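`load_dataframe` drops the header row, so `sections` starts at index label 1, and dropping blank rows can leave further gaps. When pandas assigns a `Series` as a column, it aligns on index labels. A plain list is assigned by position instead. The toy example below (made-up values) shows the difference, which matters when attaching the generated section codes.

```python
import pandas as pd

# The index starts at 1, as it does after load_dataframe drops the header row.
toy_df = pd.DataFrame({'Email': ['a@x', 'b@x', 'c@x']}, index=[1, 2, 3])
codes = ['000001', '000002', '000003']

# A new Series gets a default 0-based index, so it is aligned by label, not position.
toy_df['aligned'] = pd.Series(codes)
# A plain list is assigned by position, so each row keeps its own code.
toy_df['positional'] = codes
print(toy_df)
```

In the `aligned` column, each row receives the next row's code and the last row receives `NaN`.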

In [0]:
CODE_C = sections.index.name
TEACHER_EMAIL_C, TEACHER_NAME_C, ROOM_C, CAPACITY_C, TIME_C = sections.columns


class Section(namedtuple('Section', ['Code'] + list(sections.columns), rename=True)):

    _idx = {name: i for i, name in enumerate([sections.index.name] + list(sections.columns))}

    def __getitem__(self, key):
        if isinstance(key, str):
            return super().__getitem__(self._idx[key])
        return super().__getitem__(key)

    def __eq__(self, other):
        return self[0] == other[0]

    def __hash__(self):
        return hash(self[0])

class Sections(pd.DataFrame):

    _internal_names = pd.DataFrame._internal_names + ['_tuples', '_arrays', 'matching']
    _internal_names_set = set(_internal_names)

    _tuples = None
    _arrays = None

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Attach the lru_cache to each instance's bound method rather than the class function.
        self.matching = lru_cache(maxsize=None)(self._matching)

    @property
    def tuples(self):
        if self._tuples is None:
            self._tuples = {section[CODE_C]: section for section in self.itertuples()}
        return self._tuples

    def itertuples(self):
        if self._arrays is None:
            self._arrays = [self.index] + [self.iloc[:, k] for k in range(len(self.columns))]
        return map(Section._make, zip(*self._arrays))

    def _matching(self, student):
        """Return the Sections matching the student's availability."""
        return frozenset(
            section
            for time in student.availability
            for section in self[self[TIME_C] == time].itertuples()
        )

    @property
    def _constructor(self):
        return Sections

sections = Sections(sections)

for _, section in zip(range(5), sections.tuples.values()):
    code = section[CODE_C]
    email = section[TEACHER_EMAIL_C]
    time = section[TIME_C]
    print(f'{code} is the unique identifier for the section ({email}, {time})')
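`Sections._matching` selects the sections whose canonical time appears in the student's availability. On a plain dataframe, the same selection can be written with `Series.isin`. The sketch below is an illustrative alternative using made-up data, not the notebook's cached implementation.

```python
import pandas as pd

toy_sections = pd.DataFrame(
    {
        'Email': ['a@x', 'b@x', 'a@x'],
        'Capacity': [4, 5, 4],
        'Time': ['9 AM Monday', '1 PM Tuesday', '2 PM Wednesday'],
    },
    index=pd.Index(['111111', '222222', '333333'], name='Code'),
)
toy_availability = frozenset({'9 AM Monday', '2 PM Wednesday', '5 PM Monday'})

# Keep the sections held at a time the student can attend.
toy_matching = toy_sections[toy_sections['Time'].isin(sorted(toy_availability))]
print(list(toy_matching.index))
```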

## Solve the Assignment Problem

The section scheduling problem is an instance of the [generalized assignment problem](https://en.wikipedia.org/wiki/Generalized_assignment_problem), a constrained combinatorial optimization problem. Here, each student is assigned to at most one section, and each section can hold multiple students up to its capacity.

### Objective functions

Define the objective functions used in the optimization problem. We use the following mathematical notation and functions.

- $\sigma(\cdot)$ is the population variance, so for a set $X = \{x_1, \dots, x_n\}$ with mean $\bar{x}$,
$$\sigma(X) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

- $g^*_X$ is the group with the theoretical maximum possible variance on set $X$, so
$$\sigma(g^*_X) = \frac{(\max X - \min X)^2}{4}$$

- $\mathrm{likert}(\cdot)$ is a mapping from strings to the integers based on the [Likert scale](https://en.wikipedia.org/wiki/Likert_scale).

A solution is characterized by its assignment of students to sections. Let each student $s_j \in S$ have an exam score $e_j$ and a response $c_j$ to the statement *I prefer to work in a group*. Let $g_i \in G$ be the group of students assigned to section $i$, so that $s_j \in g_i$ means student $s_j$ is assigned to that section.

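In the code below, `max_scores_variance` and `max_likert_variance` are the maximum possible variances, computed from the range of all students' exam scores and from the range of the Likert scale, respectively. They normalize the exam-score and Likert terms.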

In [0]:
LIKERT = {
    'Strongly agree': 5,
    'Somewhat agree': 4,
    'Neither agree nor disagree': 3,
    'Somewhat disagree': 2,
    'Strongly disagree': 1,

    'Extremely likely': 5,
    'Somewhat likely': 4,
    'Neither likely nor unlikely': 3,
    'Somewhat unlikely': 2,
    'Extremely unlikely': 1,
}

max_scores_variance = ((students[MIDTERM_C].max() - students[MIDTERM_C].min()) ** 2) / 4
max_likert_variance = ((max(LIKERT.values()) - min(LIKERT.values())) ** 2) / 4
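The normalizing constants rely on Popoviciu's inequality on variances. Values confined to $[\min X, \max X]$ have population variance at most $(\max X - \min X)^2/4$, and the bound is reached when half the values sit at each end. The numerical check below uses made-up scores. `np.var` computes the population variance by default (`ddof=0`), which matches $\sigma$. Because a section's scores lie within the overall range, each section's normalized term falls in $[0, 1]$.

```python
import numpy as np

scores = np.array([12.0, 30.0, 41.5, 47.0, 60.0])
max_variance = (scores.max() - scores.min()) ** 2 / 4  # (60 - 12)^2 / 4

# Half of the values at each extreme attains the bound exactly.
extreme = np.array([scores.min(), scores.min(), scores.max(), scores.max()])
print(np.var(extreme), max_variance)

# Random three-student groups never exceed it, so each normalized term is in [0, 1].
rng = np.random.default_rng(0)
ratios = [np.var(rng.choice(scores, size=3, replace=False)) / max_variance for _ in range(1000)]
print(min(ratios), max(ratios))
```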

#### Trivial objective

The **trivial objective** simply returns a constant value, $0$, so any valid assignment of students to sections is considered optimal. Since the algorithm is randomized, section assignments will also be randomized based on an initial seed defined later.

In [0]:
def trivial(assignments, schedule):
    """Return the trivial objective, not taking into account any data."""
    return 0

#### Ability objective

The **ability objective** returns the sum, over all sections, of each section's variance in exam scores, normalized by the maximum possible intra-group variance. Minimizing it favors sections whose students have similar scores.

$$\sum_{g_i \in G} \frac{\sigma(e_j \forall s_j \in g_i)}{\sigma(e_j \forall s_j \in g^*_S)}$$

In [0]:
def ability(assignments, schedule):
    """Return the sum of each section's variance in exam scores."""
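    return sum(
        np.var([student[MIDTERM_C] for student in students]) / max_scores_variance
        for students in schedule.values()
        # Skip empty sections: defaultdict lookups can leave them in the schedule, and np.var([]) is nan.
        if students
    )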
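To see why minimizing this objective produces homogeneous ability groups, compare two ways of splitting four made-up scores into two sections of two. The hypothetical helper `ability_cost` applies the same formula to plain lists instead of `Student` tuples.

```python
import numpy as np

toy_scores = [50, 52, 90, 92]
toy_max_variance = (max(toy_scores) - min(toy_scores)) ** 2 / 4  # (92 - 50)^2 / 4 = 441

def ability_cost(groups):
    """Sum of each group's score variance, normalized by the maximum possible variance."""
    return sum(np.var(group) / toy_max_variance for group in groups)

homogeneous = ability_cost([[50, 52], [90, 92]])  # group variances 1 and 1
mixed = ability_cost([[50, 90], [52, 92]])        # group variances 400 and 400
print(homogeneous, mixed)
```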

#### Collaboration objective

The **collaboration objective** returns the sum, over all sections, of each section's variance in exam scores plus the variance in its students' Likert-scaled responses to the statement, *I prefer to work in a group*.

Both terms are normalized, and the Likert term is multiplied by an additional weight, `COLLABORATION_WEIGHT`, to control its effect.

$$\sum_{g_i \in G} \left(\frac{\sigma(e_j \forall s_j \in g_i)}{\sigma(e_j \forall s_j \in g^*_S)} + \texttt{COLLABORATION_WEIGHT} \cdot \frac{\sigma(\mathrm{likert}(c_j)~\forall s_j \in g_i)}{\sigma(\mathrm{likert}(c_j)~\forall s_j \in g^*_S)}\right)$$

In [0]:
#@markdown Define the constant, *c*, the collaboration weight.
COLLABORATION_WEIGHT = 0.75 #@param {type:"number"}

def collaboration(assignments, schedule):
    """Return the sum of each section's variance in exam scores and the variance in preference for
    working in groups.
    """
    return sum(
        np.var([student[MIDTERM_C] for student in students]) / max_scores_variance
        + COLLABORATION_WEIGHT * (
            np.var([LIKERT[student[GROUPS_Q]] for student in students])
            / max_likert_variance
        )
        for students in schedule.values()
        if students  # Skip empty sections, whose variance would be nan.
    )
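The Likert term behaves the same way. With responses mapped to 1 through 5, `max_likert_variance` is $(5 - 1)^2/4 = 4$. The sketch below uses made-up scores and responses, a hypothetical helper `section_cost`, and the default weight of 0.75. The scores are held fixed, and the sketch shows that a section divided on group work costs more than a section in agreement.

```python
import numpy as np

weight = 0.75                          # default COLLABORATION_WEIGHT
score_max_var = (92 - 50) ** 2 / 4     # assumed overall score range [50, 92] -> 441.0
likert_max_var = (5 - 1) ** 2 / 4      # = 4.0

def section_cost(scores, likert):
    """Normalized score variance plus the weighted, normalized Likert variance."""
    return np.var(scores) / score_max_var + weight * np.var(likert) / likert_max_var

agreeing = section_cost([50, 52], [4, 4])  # identical responses: Likert variance 0
divided = section_cost([50, 52], [1, 5])   # opposite responses: Likert variance 4
print(agreeing, divided)
```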

#### Identity objective

The **identity objective** returns the sum, over all sections, of each section's variance in exam scores, normalized against the maximum possible variance. It adds a penalty for sections containing exactly one student who may feel isolated on the basis of their identity, modeled by the $\mathit{isolation}(\cdot)$ function.

As with the collaboration objective, the parameter `IDENTITY_WEIGHT` controls the effect of the isolation penalty. Each entry of `ISOLATION_PARAMETERS` pairs a predicate identifying an underrepresented group with a penalty. `isolation_penalty` checks the entries in order and returns the penalty of the first entry matched by exactly one student in the section, or 0 if there is none. Sections with zero or at least two such students are therefore preferred over sections with only one.

$$\sum_{g_i \in G} \left(\frac{\sigma(e_j \forall s_j \in g_i)}{\sigma(e_j \forall s_j \in g^*_S)} + \texttt{IDENTITY_WEIGHT} \cdot \mathit{isolation}(g_i)\right)$$

In [0]:
#@markdown Define the constant, *i*, the identity weight.
IDENTITY_WEIGHT = 0.1 #@param {type:"number"}

#@markdown Define the penalties for each isolation parameter. These values are multiplied by the identity weight to compute the group penalty.
GENDER_PENALTY = 1.5 #@param {type:"number"}
ETHNIC_PENALTY = 1 #@param {type:"number"}
AGEGRP_PENALTY = 1 #@param {type:"number"}
TRNSFR_PENALTY = 1 #@param {type:"number"}
INTMJR_PENALTY = 1 #@param {type:"number"}

ISOLATION_PARAMETERS = (
    ((lambda s: s[GENDER_Q] in ('Female', 'Genderqueer / non-binary')),
     GENDER_PENALTY),
    ((lambda s: s[ETHNIC_Q] == 'Black or African American'),
     ETHNIC_PENALTY),
    ((lambda s: s[ETHNIC_Q] == 'Chicano or Latino'),
     ETHNIC_PENALTY),
    ((lambda s: s[ETHNIC_Q] == 'Middle Eastern or North African'),
     ETHNIC_PENALTY),
    ((lambda s: s[ETHNIC_Q] == 'Native American or Alaska Native'),
     ETHNIC_PENALTY),
    ((lambda s: s[ETHNIC_Q] == 'Pacific Islander'),
     ETHNIC_PENALTY),
    ((lambda s: s[AGEGRP_Q] in ('25-29 years old', '30-34 years old', '35 years old or older')),
     AGEGRP_PENALTY),
    ((lambda s: s[TRNSFR_Q].startswith('Yes')),
     TRNSFR_PENALTY),
    ((lambda s: all(major not in ('Computer Science', 'EECS') for major in s[INTMJR_Q].split(', '))),
     INTMJR_PENALTY),
)

def isolation_penalty(students, parameters=ISOLATION_PARAMETERS):
    """Return a non-negative penalty if the group of students contains a student who might feel
    isolated in the group. The penalty value varies based on the parameters.
    """
    for parameter, penalty in parameters:
        if sum(parameter(student) for student in students) == 1:
            return penalty
    return 0

def identity(assignments, schedule):
    """Return the sum of each section's variance in exam scores and the isolation penalty."""
    return sum(
        np.var([student[MIDTERM_C] for student in students]) / max_scores_variance
        + IDENTITY_WEIGHT * isolation_penalty(students)
        for students in schedule.values()
        if students  # Skip empty sections, whose variance would be nan.
    )
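`isolation_penalty` returns as soon as a parameter matches exactly one student. As a result, each section receives at most one penalty: the one belonging to the first matching entry of `ISOLATION_PARAMETERS`. The toy check below copies that logic into a hypothetical `first_match_penalty` and applies it to dictionaries with made-up fields instead of `Student` tuples.

```python
def first_match_penalty(students, parameters):
    """Same logic as isolation_penalty: the first penalty matched by exactly one student, else 0."""
    for parameter, penalty in parameters:
        if sum(parameter(student) for student in students) == 1:
            return penalty
    return 0

# Hypothetical (predicate, penalty) pairs, checked in order.
toy_parameters = (
    (lambda s: s['gender'] == 'Female', 1.5),
    (lambda s: s['transfer'], 1),
)

def toy_student(gender, transfer=False):
    return {'gender': gender, 'transfer': transfer}

alone = [toy_student('Female'), toy_student('Male'), toy_student('Male')]
paired = [toy_student('Female'), toy_student('Female'), toy_student('Male')]
transfer_alone = [toy_student('Male', transfer=True), toy_student('Male'), toy_student('Male')]
both_alone = [toy_student('Female', transfer=True), toy_student('Male'), toy_student('Male')]

for group in (alone, paired, transfer_alone, both_alone):
    print(first_match_penalty(group, toy_parameters))
```

In `both_alone`, the transfer student is also the only one of their kind, but only the gender penalty is counted. Summing the penalties of all matching parameters would be an alternative design.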

Collect all of the objectives into a list. Each function's `__name__` is used to categorize and identify its results.

In [0]:
objective_functions = [
    trivial,
    ability,
    collaboration,
    identity,
]

### Solution generation

At a high level, the solution to the assignment problem is generated using the following procedure.

1. Using random assignment, split the students into treatment/control groups, one for each objective function.
2. Using the student splits, compute the optimal `partition` of the section schedule into treatment conditions, so that as many students as possible can be accommodated across all groups.
3. Compute the `initial` solution by assigning as many students as possible into sections. Students who are impossible to schedule are dropped.
4. Iteratively improve the solution using the simulated annealing algorithm provided by `anneal`, which calls the `Solution.neighbor` method to compute the next neighboring solution.
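The `anneal` function is not shown in this section. As an illustration only, a generic simulated annealing loop with the Metropolis acceptance rule and geometric cooling might look like the sketch below. The name `anneal_sketch`, its `neighbor(state, rng)` callback, and the default temperature schedule are assumptions of this sketch, not the notebook's API. In the notebook, the neighbor step would correspond to `Solution.neighbor` and the cost to `Solution.cost`.

```python
import math
import random

def anneal_sketch(initial, cost, neighbor, steps=3000, temperature=1.0, cooling=0.995, rng=None):
    """Minimize `cost` by simulated annealing and return the best state seen."""
    rng = rng or random.Random(0)
    current = best = initial
    current_cost = best_cost = cost(initial)
    for _ in range(steps):
        candidate = neighbor(current, rng)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Metropolis rule: always accept improvements; accept worse states with prob. exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        # Geometric cooling makes uphill moves rarer as the search proceeds.
        temperature *= cooling
    return best

# Toy problem: find the integer minimizing (x - 7)^2 by stepping one unit left or right.
best = anneal_sketch(0, lambda x: (x - 7) ** 2, lambda x, rng: x + rng.choice((-1, 1)))
print(best)
```

Tracking the best state seen, rather than returning the final state, guards against the walk drifting uphill late in the run.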

`Solution.neighbor` picks one of three actions at random, weighted by the probabilities defined in the `Solution` cell, to compute a potential neighboring solution.

1. Schedule a student who was previously dropped ("unscheduled") by the algorithm. The unscheduled students are tried in random order, and the first one who fits into an available section is scheduled. If none fits, swap a student instead.
2. Unschedule, or drop, a student who is currently scheduled. This makes it easier for the algorithm to escape a local optimum, at the cost of decreasing total enrollment. If enrollment is already at the minimum number of assignments, swap a student instead.
3. Swap a student with another student, so long as both students can attend each other's sections.
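`neighbor` passes its weights to `random.choices`, which treats them as relative and does not require them to sum to 1. The minimal sketch below uses placeholder action names in place of the `Solution` methods, together with the default weights from the `Solution` cell. With a swap weight of 0, the swap action is never drawn directly, so swaps happen only as the fallback described in the list.

```python
import random
from collections import Counter

actions = ['schedule', 'unschedule', 'swap']
action_weights = [0.9, 0.1, 0]  # the default SCHEDULE/UNSCHEDULE/SWAP_STUDENT_WEIGHT values

rng = random.Random(0)
# choices() draws with replacement, in proportion to the relative weights.
counts = Counter(rng.choices(actions, action_weights, k=10_000))
print(counts)
```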

In [0]:
SEED_PHRASE = 'Computer Science Mentors' #@param {type:"string"}

SEED = sum(ord(c) for c in SEED_PHRASE)
RAND = np.random.RandomState(SEED)
random.seed(SEED)

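def shuffled(iterable):
    """Return a new list representing a random shuffling of the items in the iterable."""
    # Sort before shuffling. The iteration order of sets of strings (or of Students and Sections,
    # which hash their string IDs) varies with PYTHONHASHSEED, so shuffling a set directly would
    # not be reproducible across sessions, even with a fixed seed.
    lst = sorted(iterable, key=repr)
    random.shuffle(lst)
    return lst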
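The seed is the sum of the Unicode code points of `SEED_PHRASE`. Two properties follow from this. The same phrase reproduces the same shuffle, and any anagram of the phrase yields the same seed. The sketch below shows both, using a hypothetical helper `phrase_seed` and a local `random.Random` in place of the module-level generator. It shuffles a sorted copy, so the starting order does not depend on set iteration order.

```python
import random

def phrase_seed(phrase):
    """Derive a seed the same way as SEED: the sum of the phrase's code points."""
    return sum(ord(c) for c in phrase)

def seeded_shuffle(items, seed):
    """Return a shuffled copy of the items, starting from a sorted (canonical) order."""
    lst = sorted(items)
    random.Random(seed).shuffle(lst)
    return lst

seed = phrase_seed('Computer Science Mentors')
first = seeded_shuffle({'b', 'a', 'd', 'c'}, seed)
second = seeded_shuffle({'c', 'd', 'a', 'b'}, seed)
print(first == second)
print(phrase_seed('listen') == phrase_seed('silent'))
```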

In [0]:
#@title `Solution` class
#@markdown Define the probability weights for each of the possible actions taken by the `neighbor` method. These weights are relative: `random.choices` normalizes them, so they need not sum to 1.
SCHEDULE_STUDENT_WEIGHT = 0.9 #@param {type:"number"}
UNSCHEDULE_STUDENT_WEIGHT = 0.1 #@param {type:"number"}
SWAP_STUDENT_WEIGHT = 0 #@param {type:"number"}

#@markdown Define the fraction of initial assignments that the `neighbor` algorithm must maintain in the solution. Lower values will drop more students, which may optimize the solution at the cost of reducing total enrollment.
MIN_ASSIGNMENTS_FACTOR = 0.95 #@param {type:"number"}

class Solution:
    """An immutable solution class which contains the assignments of students to sections, sections
    to sets of students, the set of unscheduled students, and methods to compute new solutions.
    """

    @classmethod
    def initial(cls, students, sections, objective, quiet=False):
        """Solution factory method that computes an initial assignment of students to sections,
        dropping students whose availability is too incompatible with the schedule.
        """
        unscheduled = set()
        while True:
            assignments = {}
            schedule = defaultdict(set)
            conflict = None
            for student in shuffled(students.tuples.values()):
                conflict = student
                for section in shuffled(sections.matching(student)):
                    if len(schedule[section]) < section[CAPACITY_C]:
                        assignments[student] = section
                        schedule[section].add(student)
                        conflict = None
                        break
                if conflict:
                    break
            if conflict is None:
                if not quiet:
                    print(f'{objective.__name__}: {len(assignments)} assigned')
                return cls(
                    objective,
                    assignments,
                    schedule,
                    unscheduled,
                    sections.matching,
                    int(len(assignments) * MIN_ASSIGNMENTS_FACTOR),
                ).backfill(quiet=True, interactive=False)
            else:
                students = students.drop(index=conflict[SID_C])
                unscheduled.add(conflict)

    def __init__(self, objective, assignments, schedule, unscheduled=frozenset(),
                 matching=lambda x: frozenset(), min_assignments=None):
        """Create a new Solution."""
        self.objective = objective
        self.costs = {self.objective: self.objective(assignments, schedule)}
        self.assignments = dict(assignments)
        self.schedule = defaultdict(frozenset)
        for section, students in schedule.items():
            self.schedule[section] = students
        self.unscheduled = unscheduled
        self.matching = matching
        if min_assignments is None:
            self.min_assignments = len(assignments) + len(unscheduled)
        else:
            self.min_assignments = min_assignments

    def cost(self, objective=None):
        """Return the cost as computed using the objective for this solution."""
        if objective is None or objective is self.objective:
            return self.costs[self.objective]
        elif objective not in self.costs:
            self.costs[objective] = objective(self.assignments, self.schedule)
        return self.costs[objective]

    def add_student(self, student, section=None, not_at=None, quiet=False, interactive=False):
        """Return a new Solution scheduling the given student into the specified section."""
        if student in self.assignments:
            if not quiet:
                print(f'Already assigned: {student}')
            return self
        elif not quiet:
            print(f'Assigning: {student}')
        if section is None:
            for match in shuffled(self.matching(student)):
                if match[TIME_C] == not_at:
                    continue
                elif match[CAPACITY_C] > len(self.schedule[match]):
                    if not quiet or interactive:
                        print(f'Available section: {match}')
                    accept = ''
                    while interactive and accept == '':
                        accept = input('Accept? [Y/N] ').lower()
                    if not interactive or accept in ('y', 'yes'):
                        section = match
                        break
            if section is None:
                if not quiet:
                    print(f'No available section: {student}')
                return self
        if not quiet:
            print(f'Specified section: {section}')
        if section[CAPACITY_C] <= len(self.schedule[section]):
            if not quiet:
                print(f'No space in section: {section}')
            return self
        return Solution(
            self.objective,
            {
                **self.assignments,
                student: section,
            },
            {
                **self.schedule,
                section: self.schedule[section] | {student},
            },
            self.unscheduled - {student},
            self.matching,
            self.min_assignments
        )

    def add_students(self, pairs, quiet=False, interactive=True):
        """Return a new Solution enrolling (student, not_at) pairs into sections, greedily choosing
        at each step the enrollment that yields the lowest cost.
        """
        pairs = tuple(pairs)
        solutions = (
            self.add_student(
                student, not_at=not_at, quiet=quiet, interactive=interactive
            ) for student, not_at in pairs
        )
        pair, solution = min((
            (pair, solution) for pair, solution in zip(pairs, solutions) if solution is not self
        ), default=(None, self), key=lambda result: result[1].cost())
        if solution is self:
            return self
        if not quiet:
            print(f'Enrolled: {pair[0]}')
            print(solution)
        if interactive:
            accept = ''
            while accept == '':
                accept = input('Accept? [Y/N] ').lower()
            if accept not in ('y', 'yes'):
                return self
        return solution.add_students(frozenset(pairs) - {pair}, quiet=quiet, interactive=interactive)

    def backfill(self, quiet=False, interactive=True):
        """Return a new Solution from enrolling remaining unscheduled students into sections."""
        return self.add_students(
            zip(shuffled(self.unscheduled), [None] * len(self.unscheduled)),
            quiet=quiet,
            interactive=interactive
        )

    def drop_student(self, student):
        """Return a (Solution, dropped) pair unscheduling the given student; see drop_students."""
        return self.drop_students([student])

    def drop_students(self, to_drop):
        """Return a new Solution unscheduling the given students, along with a dict mapping each
        affected section to the students dropped from it.
        """
        to_drop = frozenset(to_drop)
        dropped = {
            section: students & to_drop
            for section, students in self.schedule.items()
            if any(student in to_drop for student in students)
        }
        return Solution(
            self.objective,
            {
                student: section
                for student, section in self.assignments.items()
                if student not in to_drop
            },
            {
                # Keep the remaining students of every section; omit sections left empty.
                section: students - to_drop
                for section, students in self.schedule.items()
                if students - to_drop
            },
            self.unscheduled | to_drop,
            self.matching,
            self.min_assignments,
        ), dropped

    def drop_section(self, section):
        """Return a (Solution, dropped) pair unscheduling the given section; see drop_sections."""
        return self.drop_sections([section])

    def drop_sections(self, to_drop):
        """Return a new Solution unscheduling the given sections, along with a dict mapping each
        dropped section to its former students.
        """
        dropped = {section: self.schedule[section] for section in to_drop}
        return Solution(
            self.objective,
            {
                student: section
                for student, section in self.assignments.items()
                if section not in dropped
            },
            {
                section: students
                for section, students in self.schedule.items()
                if section not in dropped
            },
            self.unscheduled.union(*dropped.values()),
            self.matching,
            self.min_assignments
        ), dropped

    def drop_underfull(self, min_size):
        """Return a new Solution unscheduling students in sections with strictly fewer than min_size
        students, along with a dict mapping each dropped section to its former students.
        """
        return self.drop_sections([
            section
            for section, students in self.schedule.items()
            if len(students) < min_size
        ])

    def reschedule(self, students, not_at=None, quiet=False, interactive=True):
        """Return a new Solution rescheduling the students into sections at different times."""
        solution, dropped = self.drop_students(students)
        if not quiet:
            print(f'Dropped: {dropped.values()}')
        if not_at is None:
            pairs = (
                (student, section[TIME_C])
                for section, students in dropped.items()
                for student in students
            )
        else:
            pairs = zip(students, not_at)
        return solution.add_students(pairs, quiet=quiet, interactive=interactive)

    def neighbor(self):
        """Return a random new Solution which will choose to either enroll an unscheduled student,
        unschedule a student, or swap two students, based on the given probability weights.
        """
        return random.choices(self._options, self._weights, k=1)[0](self)

    def _schedule_student(self):
        """Return a new Solution by randomly scheduling a previously-unscheduled student."""
        for student in shuffled(self.unscheduled):
            solution = self.add_student(student, quiet=True)
            if solution is not self:
                return solution
        return self._swap_student()

    def _unschedule_student(self):
        """Return a new Solution by randomly unscheduling a single student, if allowed."""
        if len(self.assignments) > self.min_assignments:
            return self.drop_student(random.choice(tuple(self.assignments)))[0]
        return self._swap_student()

    def _swap_student(self):
        """Return a new Solution by randomly swapping two students between sections."""
        for student, section in shuffled(self.assignments.items()):
            for swap_section in self.matching(student):
                if section == swap_section:
                    continue
                for swap_student in self.schedule[swap_section]:
                    if section[TIME_C] in swap_student.availability:
                        return Solution(
                            self.objective,
                            {
                                **self.assignments,
                                student: swap_section,
                                swap_student: section,
                            },
                            {
                                **self.schedule,
                                section: (self.schedule[section] - {student}) | {swap_student},
                                swap_section: (self.schedule[swap_section] - {swap_student}) | {student},
                            },
                            self.unscheduled,
                            self.matching,
                            self.min_assignments
                        )
        return self

    def __str__(self):
        return f'Solution optimizing for {self.objective.__name__}: {self.cost()}'

    def __repr__(self):
        return 'Solution(' + ', '.join([
            self.objective.__name__,
            f'assignments={len(self.assignments)}',
            f'schedule={len(self.schedule)}',
            f'unscheduled={len(self.unscheduled)}',
            'matching',
            str(self.min_assignments),
        ]) + f'): {self.cost()}'

    def __eq__(self, other):
        return (
            self.objective is other.objective
            and self.assignments == other.assignments
            and self.schedule == other.schedule
        )

    _options = (
        _schedule_student,
        _unschedule_student,
        _swap_student,
    )

    _weights = (
        SCHEDULE_STUDENT_WEIGHT,
        UNSCHEDULE_STUDENT_WEIGHT,
        SWAP_STUDENT_WEIGHT,
    )
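The `neighbor` method picks one of the three moves (`_schedule_student`, `_unschedule_student`, `_swap_student`) with probability proportional to its weight. The sketch below isolates that dispatch pattern; the move functions and the weights `(1, 1, 2)` are hypothetical stand-ins for the notebook's methods and its `*_WEIGHT` parameters.

```python
import random
from collections import Counter


# Hypothetical stand-ins for the three Solution move methods.
def schedule_move(state):
    return 'schedule'


def unschedule_move(state):
    return 'unschedule'


def swap_move(state):
    return 'swap'


OPTIONS = (schedule_move, unschedule_move, swap_move)
WEIGHTS = (1, 1, 2)  # hypothetical; the notebook reads these from its *_WEIGHT params


def neighbor(state, rng=random):
    # random.choices samples in proportion to WEIGHTS; k=1 returns a one-element
    # list, so [0] extracts the chosen function, which is then called on the state.
    return rng.choices(OPTIONS, WEIGHTS, k=1)[0](state)


rng = random.Random(0)
counts = Counter(neighbor(None, rng) for _ in range(10_000))
print(counts)  # 'swap' should appear roughly half of the time
```

Because the probabilities are only relative weights, raising one weight shifts the search toward that move without having to renormalize the others.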

#### Split students into treatment conditions

Randomly assign students to treatment conditions: shuffle the roster, then split it into one nearly equal-sized group per objective function.

In [0]:
num_conditions = len(objective_functions)

In [0]:
groups_students = np.array_split(students.sample(frac=1, random_state=RAND), num_conditions)
[len(group) for group in groups_students]
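`np.array_split` divides as evenly as possible, so group sizes differ by at most one. A small sketch on a hypothetical 10-student roster; it splits positional indices and then selects rows with `iloc`, which avoids relying on how NumPy handles DataFrame inputs directly.

```python
import numpy as np
import pandas as pd

# Hypothetical 10-student roster indexed by student ID.
roster = pd.DataFrame(
    {'name': [f's{i}' for i in range(10)]},
    index=pd.RangeIndex(10, name='sid'),
)

# Shuffle every row, then split the row positions into 4 nearly equal chunks.
shuffled_roster = roster.sample(frac=1, random_state=0)
chunks = np.array_split(np.arange(len(shuffled_roster)), 4)
groups = [shuffled_roster.iloc[chunk] for chunk in chunks]

sizes = [len(group) for group in groups]
print(sizes)  # [3, 3, 2, 2]: the 2 leftover students go to the first groups
```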

#### Split sections into treatment conditions

Use the excess capacity (total section capacity minus the number of students) to determine the number of sections to set aside for the withheld group and non-participants. Then shuffle the remaining sections and split them to partition sections between the different treatment conditions.

In [0]:
excess = sections[CAPACITY_C].sum() - len(students)
excess

In [0]:
num_withhold = excess // sections[CAPACITY_C].max()
withheld_sections = sections.sample(num_withhold, random_state=RAND)
groups_sections = np.array_split(
    sections.drop(withheld_sections.index).sample(frac=1, random_state=RAND),  # shuffle before splitting
    num_conditions,
)
initial_solutions = [
    Solution.initial(group_students, group_sections, objective)
    for group_students, group_sections, objective in zip(
        groups_students, groups_sections, objective_functions
    )
]
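Why `excess // max capacity`? Withholding that many sections removes at most `num_withhold * max_capacity <= excess` seats, so the remaining sections can still seat every student no matter which sections are sampled. A worked sketch with hypothetical capacities and enrollment:

```python
import random

capacities = [6, 6, 6, 5, 5, 5, 4, 4]  # hypothetical section capacities (41 seats)
num_students = 30                       # hypothetical enrollment

excess = sum(capacities) - num_students   # 41 - 30 = 11 spare seats
num_withhold = excess // max(capacities)  # 11 // 6 = 1 section

# Whichever sections are sampled, their total capacity is at most
# num_withhold * max(capacities) <= excess, so the rest still seat everyone.
rng = random.Random(0)
withheld = rng.sample(range(len(capacities)), num_withhold)
remaining = sum(c for i, c in enumerate(capacities) if i not in withheld)
print(excess, num_withhold, remaining)
```

The guarantee holds for total capacity only; once the remaining sections are split across conditions, an individual condition can still come up short, which is what the partition statistics below check.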

Compute partition statistics.

In [0]:
withheld_sections[CAPACITY_C].sum()

In [0]:
[group_sections[CAPACITY_C].sum() - len(group_students)
 for group_students, group_sections in zip(groups_students, groups_sections)]

### Assign students to sections

Use a [simulated annealing](https://en.wikipedia.org/wiki/Simulated_annealing) meta-heuristic algorithm to improve upon an initial solution. The algorithm as implemented below performs restarts (temperature reheating) to escape local extrema.
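The acceptance rule in `anneal` is the Metropolis criterion: a neighbor that does not increase the cost is always accepted, and a worse neighbor is accepted with probability `exp(-(cost increase) / t)`. A standalone sketch of that rule (the function name `accept` is ours, not the notebook's):

```python
import math
import random


def accept(current_cost, neighbor_cost, t, rng=random):
    """Metropolis criterion: always accept a non-worsening neighbor; accept a
    worse one with probability exp(-(neighbor_cost - current_cost) / t)."""
    if neighbor_cost <= current_cost:
        return True
    # The exponent is negative here, so math.exp cannot overflow.
    return math.exp((current_cost - neighbor_cost) / t) > rng.random()


# Probability of accepting a move that worsens the cost by 1 at a few temperatures:
# about 0.37 at t=1, about 4.5e-5 at t=0.1, effectively 0 at t=0.01.
for t in (1.0, 0.1, 0.01):
    print(t, math.exp(-1 / t))
```

As the temperature falls, uphill moves become vanishingly rare, which is why the restarts below reset `t` to escape a local minimum.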

In [0]:
#@title Simulated annealing
#@markdown The initial temperature. The temperature defines the likelihood of accepting a sub-optimal solution.
T_INIT = 1 #@param {type:"number"}

#@markdown The minimum temperature. When the temperature drops below this level, the algorithm restarts (or stops, if it has exceeded `NUM_RESTARTS` restarts).
T_MIN = 0.01 #@param {type:"number"}

#@markdown The multiplicative factor used to decrease the temperature.
ALPHA = 0.99 #@param {type:"number"}

#@markdown The number of neighboring solutions the algorithm should compute at every temperature level.
NUM_ATTEMPTS = 100 #@param {type:"integer"}

#@markdown The number of times the algorithm can restart ("reheat") from its current position, allowing it to return to accepting sub-optimal solutions to escape from local minima.
NUM_RESTARTS = 10 #@param {type:"integer"}

def anneal(solution, t_init=T_INIT, t_min=T_MIN, alpha=ALPHA, attempts=NUM_ATTEMPTS,
           restarts=NUM_RESTARTS, quiet=False, verbose=False):
    """Return the best solution using a simulated annealing process."""
    if not quiet:
        print(f'Anneal: {solution.objective.__name__}')
    best = solution
    for i in range(restarts):
        if not quiet:
            print(f'Iteration {i}: {len(best.assignments)} assigned')
        t = t_init
        while t > t_min:
            for _ in range(attempts):
                neighbor = solution.neighbor()
                if (neighbor.cost() <= solution.cost()
                    or np.exp((solution.cost() - neighbor.cost()) / t) > random.random()):
                    # Metropolis criterion: accept any non-worsening neighbor, and a worse one with probability exp(-(cost increase) / t)
                    solution = neighbor
                    if solution.cost() <= best.cost():
                        if verbose:
                            print(solution)
                        best = solution
            t *= alpha
    return best


solutions = [
    anneal(solution, verbose=solution.objective is not trivial)
    for solution in initial_solutions
]
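With the default parameters, the geometric cooling schedule `t *= alpha` determines how much work each `anneal` call does. This sketch counts the temperature levels the same way the `while t > t_min` loop does and checks the count against the closed form `ceil(log(t_min / t_init) / log(alpha))`; the lowercase names simply mirror the form's defaults.

```python
import math

t_init, t_min, alpha = 1, 0.01, 0.99  # defaults from the form above
attempts, restarts = 100, 10

# Count temperature levels exactly as the `while t > t_min` loop does.
levels, t = 0, t_init
while t > t_min:
    levels += 1
    t *= alpha

# Closed form: the smallest k with alpha**k <= t_min / t_init.
closed_form = math.ceil(math.log(t_min / t_init) / math.log(alpha))

total_neighbors = levels * attempts * restarts
print(levels, closed_form, total_neighbors)  # 459 459 459000
```

So each initial solution explores roughly 459,000 neighbors; lowering `ALPHA` or raising `T_MIN` shortens the run proportionally to the number of levels.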

List unscheduled students amongst all solutions.

In [0]:
[solution.unscheduled for solution in solutions]

Determine the impact of back-filling the solution. In most cases, the cost of adding these unscheduled students back into the solution is too high to accept their re-enrollment.

In [0]:
def compute_stats(solution, objective, mode, mode_name):
    new_solution = mode(solution)
    return (
        new_solution.objective.__name__,
        objective.__name__,
        mode_name,
        len(new_solution.assignments),
        len(new_solution.assignments) - len(solution.assignments),
        sum(len(students) / section[CAPACITY_C]
            for section, students in new_solution.schedule.items()) / len(new_solution.schedule),
        new_solution.cost(objective),
        new_solution.cost(objective) - new_solution.cost(ability)
        if objective is not trivial else
        new_solution.cost(trivial)
    )


pd.DataFrame.from_records(data=[
    compute_stats(solution, objective, mode, mode_name)
    for solution in solutions
    for objective in objective_functions
    for mode_name, mode in (('Optimal', lambda sol: sol), ('Backfill', Solution.backfill))
], columns=[
    'Group', 'Objective', 'Backfill', 'Assigned', 'Diff', 'Utilization', 'Cost', 'Normalized'
]).set_index([
    'Group', 'Objective', 'Backfill'
])
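The `Utilization` column in `compute_stats` is the unweighted mean of each section's fill ratio (students assigned divided by capacity), which can differ from the capacity-weighted ratio of total students to total seats. A worked sketch with two hypothetical sections:

```python
# Hypothetical sections as (capacity, number of students assigned).
example_sections = [(6, 6), (4, 1)]

# 'Utilization' as computed above: unweighted mean of per-section fill ratios.
mean_fill = sum(n / cap for cap, n in example_sections) / len(example_sections)  # (1.0 + 0.25) / 2

# Capacity-weighted alternative: total seats filled over total seats offered.
pooled_fill = sum(n for _, n in example_sections) / sum(cap for cap, _ in example_sections)  # 7 / 10

print(mean_fill, pooled_fill)
```

The unweighted mean penalizes a nearly empty small section as heavily as a nearly empty large one, which fits the goal of spotting underfull sections.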

In [0]:
#@markdown Small sections may be undesirable. Sections with fewer than this many students assigned to them will be removed and those students will be dropped.
MIN_SECTION_SIZE = 3 #@param {type:"integer"}

[sum(len(students) < MIN_SECTION_SIZE for section, students in solution.schedule.items())
 for solution in solutions]

In [0]:
final_solutions, dropped_sections = zip(*(solution.drop_underfull(MIN_SECTION_SIZE) for solution in solutions))
dropped_sections

Compute the new costs and compare against the old costs.

In [0]:
pd.DataFrame.from_records(data=[
    compute_stats(solution, objective, mode, mode_name)
    for solution in final_solutions
    for objective in objective_functions
    for mode_name, mode in (('Optimal', lambda sol: sol), ('Backfill', Solution.backfill))
], columns=[
    'Group', 'Objective', 'Backfill', 'Assigned', 'Diff', 'Utilization', 'Cost', 'Normalized'
]).set_index([
    'Group', 'Objective', 'Backfill'
])

### Inferential statistics

Do the dropped students affect the overall composition of the treatment condition? Compute inferential statistics to compare the treatment groups before and after scheduling to determine the impact of dropping students throughout the scheduling process. Note that any differences before assignment are due to random chance since the student split was done by random assignment.

In [0]:
def align_indexes(*dataframes):
    """Re-index all of the given dataframes so they represent the intersection of all indexes."""
    index = pd.Index(set.intersection(*(set(df.index) for df in dataframes)))
    return (df.reindex(index) for df in dataframes)


GROUP_VALIDATION_TESTS = [
    (MIDTERM_Q, lambda df1, df2, column: sp.stats.mannwhitneyu(df1[column], df2[column])),
    (GROUPS_Q, lambda df1, df2, column: sp.stats.mannwhitneyu(
        df1[column].apply(LIKERT.get),
        df2[column].apply(LIKERT.get),
    )),
] + [
    (question, lambda df1, df2, column: sp.stats.power_divergence(
        *align_indexes(df1[column].value_counts(),
                       df2[column].value_counts()),
        lambda_='log-likelihood',
    )) for question in (GENDER_Q, ETHNIC_Q, AGEGRP_Q, FIRSTG_Q, INTNLS_Q, TRNSFR_Q)
] + [
    (INTMJR_Q, lambda df1, df2, column: sp.stats.power_divergence(
        *align_indexes(df1[column].str.split(', ', expand=True).stack().value_counts(),
                       df2[column].str.split(', ', expand=True).stack().value_counts()),
        lambda_='log-likelihood',
    )),
    (LIKELY_Q, lambda df1, df2, column: sp.stats.mannwhitneyu(
        df1[column].apply(LIKERT.get),
        df2[column].apply(LIKERT.get),
    )),
] + [
    (question.split('[')[-1].rstrip(']'), lambda df1, df2, column: sp.stats.mannwhitneyu(
        df1.filter(like=column).squeeze().apply(LIKERT.get),
        df2.filter(like=column).squeeze().apply(LIKERT.get),
    )) for question in (WRITEP_Q, LEARNC_Q, ANXITY_Q, STRESS_Q, ENCRGE_Q, PICTUR_Q,
                        CLASS_SUPPRT_Q, CLASS_PARTOF_Q, CLASS_ACCEPT_Q, CLASS_COMFRT_Q,
                        CMNTY_SUPPRT_Q, CMNTY_PARTOF_Q, CMNTY_ACCEPT_Q, CMNTY_COMFRT_Q)
]


validation = pd.DataFrame.from_records(data=[
    [solution.objective.__name__, test_name] + list(test(
        group_students, 
        pd.DataFrame.from_records(
            data=[student for student in solution.assignments],
            columns=[group_students.index.name] + [c for c in group_students.columns]
        ).set_index(group_students.index.name),
        test_name
    ))
    for group_students, solution in zip(groups_students, final_solutions)
    for test_name, test in GROUP_VALIDATION_TESTS
], columns=['Objective', 'Question', 'Statistic', 'p-value']).set_index(['Objective', 'Question'])
validation
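The categorical checks above pass one group's counts to `sp.stats.power_divergence` as observed frequencies and the other's as expected frequencies, aligned on the intersection of categories. One possible alternative, swapped in here for illustration and not the notebook's method, is a G-test of homogeneity via `scipy.stats.chi2_contingency(..., lambda_='log-likelihood')`, which treats both groups as samples and keeps categories seen in only one group. Recent SciPy releases may also reject `power_divergence` inputs whose observed and expected totals differ, as they do here. The responses below are hypothetical.

```python
import pandas as pd
from scipy import stats

# Hypothetical categorical responses for one question, before and after scheduling.
before = pd.Series(['a', 'a', 'b', 'c', 'c', 'c', 'b', 'd'])
after = pd.Series(['a', 'b', 'c', 'c', 'c', 'a'])

# Align on the union of categories, filling absent categories with 0
# (an intersection would silently drop 'd', seen in only one group).
table = pd.concat(
    [before.value_counts(), after.value_counts()], axis=1, keys=['before', 'after']
).fillna(0)

# G-test of homogeneity on the 2 x k table; correction=False disables Yates' correction.
g, p, dof, expected = stats.chi2_contingency(
    table.T.to_numpy(), correction=False, lambda_='log-likelihood'
)
print(table)
print(g, p, dof)
```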

In [0]:
#@markdown Define the significance level. We may wish to apply significance test corrections (e.g. Bonferroni) due to multiple hypothesis testing and a priori knowledge.
SIGNIFICANCE_LEVEL = 0.05 #@param {type:"number"}

validation[validation['p-value'] < SIGNIFICANCE_LEVEL]
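A Bonferroni correction, as mentioned above, compares each p-value against `alpha / m`, where `m` is the number of tests. A minimal sketch on a hypothetical results frame (named `example_validation` so it does not overwrite `validation`); whether `m` should count every test or only those within one treatment condition is a judgment call.

```python
import pandas as pd

# Hypothetical p-values standing in for the `validation` dataframe.
example_validation = pd.DataFrame(
    {'p-value': [0.001, 0.02, 0.04, 0.3]},
    index=['Q1', 'Q2', 'Q3', 'Q4'],
)
alpha = 0.05

# Bonferroni: with m tests, require p < alpha / m.
m = len(example_validation)
bonferroni_alpha = alpha / m  # 0.05 / 4 = 0.0125
significant = example_validation[example_validation['p-value'] < bonferroni_alpha]
print(significant)  # only Q1 survives the correction
```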

## Export schedule

Export the schedule by loading in student emails from the survey, and then re-joining them with the section assignments.

In [0]:
#@title Section assignments
#@markdown Define the column names: the treatment condition, student ID, student name, student email, section code, teacher name, teacher email, room, and time.
EXPORT_OBJECTIVE_C = 'Treatment Condition' #@param {type:"string"}
EXPORT_SID_C = 'Student ID' #@param {type:"string"}
EXPORT_STUDENT_NAME_C = 'Student Name' #@param {type:"string"}
EXPORT_STUDENT_EMAIL_C = 'Student Email' #@param {type:"string"}
EXPORT_CODE_C = 'Code' #@param {type:"string"}
EXPORT_TEACHER_NAME_C = 'Teacher Name' #@param {type:"string"}
EXPORT_TEACHER_EMAIL_C = 'Teacher Email' #@param {type:"string"}
EXPORT_ROOM_C = 'Room' #@param {type:"string"}
EXPORT_TIME_C = 'Time' #@param {type:"string"}

assignments = pd.DataFrame.from_records((
    (
        solution.objective.__name__,
        student[STUDENT_NAME_C],
        student[STUDENT_EMAIL_C],
        student[SID_C],
        section[CODE_C],
        section[TEACHER_NAME_C],
        section[TEACHER_EMAIL_C],
        section[ROOM_C],
        section[TIME_C],
    )
    for solution in final_solutions
    for student, section in solution.assignments.items()
), columns=[
    EXPORT_OBJECTIVE_C,
    EXPORT_STUDENT_NAME_C,
    EXPORT_STUDENT_EMAIL_C,
    EXPORT_SID_C,
    EXPORT_CODE_C,
    EXPORT_TEACHER_NAME_C,
    EXPORT_TEACHER_EMAIL_C,
    EXPORT_ROOM_C,
    EXPORT_TIME_C,
]).set_index(EXPORT_SID_C)


class Assignment(namedtuple('Assignment', ['SID'] + list(assignments.columns), rename=True)):

    _idx = {name: i for i, name in enumerate([assignments.index.name] + list(assignments.columns))}

    def __getitem__(self, key):
        if isinstance(key, str):
            return super().__getitem__(self._idx[key])
        return super().__getitem__(key)

    def __eq__(self, other):
        return self[0] == other[0]

    def __hash__(self):
        return hash(self[0])

class Assignments(pd.DataFrame):

    _internal_names = pd.DataFrame._internal_names + ['_tuples', '_arrays']
    _internal_names_set = set(_internal_names)

    _tuples = None
    _arrays = None

    @property
    def tuples(self):
        if self._tuples is None:
            self._tuples = {assignment[EXPORT_SID_C]: assignment for assignment in self.itertuples()}
        return self._tuples

    def itertuples(self):
        if self._arrays is None:
            self._arrays = [self.index] + [self.iloc[:, k] for k in range(len(self.columns))]
        return map(Assignment._make, zip(*self._arrays))

    @property
    def _constructor(self):
        return Assignments

assignments = Assignments(assignments)

for _, assignment in zip(range(5), assignments.tuples.values()):
    sid = assignment[EXPORT_SID_C]
    email = assignment[EXPORT_TEACHER_EMAIL_C]
    time = assignment[EXPORT_TIME_C]
    print(f'{sid} is assigned to the section ({email}, {time})')
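`namedtuple(..., rename=True)` replaces column names that are not valid identifiers (such as `'Treatment Condition'`) with positional names like `_1`, which is why `Assignment` keeps its own `_idx` map from column label to position. A reduced sketch of that pattern with hypothetical columns:

```python
from collections import namedtuple

columns = ['SID', 'Treatment Condition', 'Student Name']  # hypothetical labels
Row = namedtuple('Row', columns, rename=True)
print(Row._fields)  # ('SID', '_1', '_2'): invalid names become positional


class LabeledRow(Row):
    # Map each original label to its tuple position, as Assignment._idx does.
    _idx = {name: i for i, name in enumerate(columns)}

    def __getitem__(self, key):
        if isinstance(key, str):
            return super().__getitem__(self._idx[key])
        return super().__getitem__(key)


row = LabeledRow._make([123, 'ability', 'Ada'])
print(row['Student Name'], row[0], row.SID)
```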

In [0]:
#@markdown Save the `assignments` dataframe to the following filename.
ASSIGNMENTS_FILE = 'assignments.csv' #@param {type:"string"}

save_dataframe(assignments, ASSIGNMENTS_FILE)

In [0]:
#@title Withheld students
#@markdown Append unscheduled students to the previously withheld students and save the dataframe.
WITHHELD_STUDENTS_FILE = 'withheld-students.csv' #@param {type:"string"}

# DataFrame.append was removed in pandas 2.0; pd.concat is the supported equivalent.
save_dataframe(pd.concat([
    withheld_students,
    pd.DataFrame.from_records(
        data=[student for solution in final_solutions for student in solution.unscheduled],
        columns=[students.index.name] + list(students.columns),
    ).set_index([students.index.name]),
]), WITHHELD_STUDENTS_FILE)

In [0]:
#@title Withheld sections
#@markdown Append dropped sections to the previously withheld sections and save the dataframe.
WITHHELD_SECTIONS_FILE = 'withheld-sections.csv' #@param {type:"string"}

# DataFrame.append was removed in pandas 2.0; pd.concat is the supported equivalent.
save_dataframe(pd.concat([
    withheld_sections,
    pd.DataFrame.from_records(
        data=list(set().union(*dropped_sections)),
        columns=[sections.index.name] + list(sections.columns),
    ).set_index([sections.index.name]),
]), WITHHELD_SECTIONS_FILE)

## Send emails

Email notifications are sent through the [Gmail API](https://developers.google.com/gmail/api). The API must first be [enabled in the user's Google Cloud Console](https://developers.google.com/gmail/api/quickstart/python#step_1_turn_on_the), and sending is subject to the Gmail API's [Mail Sending Limits](https://developers.google.com/gmail/api/v1/reference/quota#mail_sending_limits).

In [0]:
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request

import base64
import os
import pickle

from email.mime.text import MIMEText

In [0]:
#@title Authenticate

#@markdown Define the name of the OAuth2 credentials file for a Google Cloud project with the Gmail API enabled.
CREDENTIALS_FILE = 'credentials.json' #@param {type:"string"}

#@markdown Define the name of the token file used to store the temporary authentication.
TOKEN_FILE = 'token.pickle' #@param {type:"string"}
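# Remove any cached token to force a fresh sign-in; guard so the first run does not fail.
if os.path.exists(TOKEN_FILE):
    os.remove(TOKEN_FILE)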

#@markdown Define the access scopes to request. Other scopes may include useful features like drafting messages.
SCOPES = 'https://www.googleapis.com/auth/gmail.send' #@param {type:"string"}

creds = None
if os.path.exists(TOKEN_FILE):
    with open(TOKEN_FILE, 'rb') as token:
        creds = pickle.load(token)

if not creds or not creds.valid:
    if creds and creds.expired and creds.refresh_token:
        creds.refresh(Request())
    else:
        flow = InstalledAppFlow.from_client_secrets_file(CREDENTIALS_FILE, SCOPES)
        creds = flow.run_local_server()
    with open(TOKEN_FILE, 'wb') as token:
        pickle.dump(creds, token)

service = build('gmail', 'v1', credentials=creds)

In [0]:
#@title Message template
#@markdown Define the message template and `MIMEText` object builder.

def create_message(sender, to, cc, subject, message_text, display=True):
    """Create a message for an email."""
    message = MIMEText(message_text)
    message['to'] = to
    message['cc'] = cc
    message['from'] = sender
    message['subject'] = subject
    if display:
        print(message)
    return {'raw': base64.urlsafe_b64encode(message.as_bytes()).decode()}


#@markdown The "from" field containing email sender information.
EMAIL_FROM = '' #@param {type:"string"}

#@markdown The "subject" field containing the email's subject line.
EMAIL_SUBJECT = '' #@param {type:"string"}

#@markdown *Modify the multiline string for `EMAIL_MESSAGE` to change the email's text content.*
EMAIL_MESSAGE = """
Hi {student},

Your CSM CS 61A section will be with {teacher} at {time} in room {room}.

If you have any questions, please make a post on the CSM 61A Piazza.

Kevin
""".strip() or input()
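The Gmail API's `users.messages.send` call expects the full RFC 2822 message, base64url-encoded, in the `raw` field, which is what `create_message` returns. The sketch below mirrors that builder under a hypothetical name (`build_raw`, with `cc` made optional) and decodes the result locally, so the headers can be checked without sending anything; the addresses are placeholders.

```python
import base64
from email import message_from_bytes
from email.mime.text import MIMEText


def build_raw(sender, to, cc, subject, text):
    """Hypothetical mirror of create_message; cc is optional here."""
    message = MIMEText(text)
    message['to'] = to
    if cc:
        message['cc'] = cc
    message['from'] = sender
    message['subject'] = subject
    return {'raw': base64.urlsafe_b64encode(message.as_bytes()).decode()}


body = build_raw('me@example.com', 'student@example.com', 'ta@example.com', 'Section', 'Hi')

# Round-trip: decode the raw field and parse it back into a message.
decoded = message_from_bytes(base64.urlsafe_b64decode(body['raw']))
print(decoded['to'], decoded['cc'], decoded['subject'])
print(decoded.get_payload())
```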

### Send test message

Send a test message to yourself before sending the real emails.

In [0]:
def send_test_message(assignment, interactive=True):
    message = create_message(
        sender=EMAIL_FROM,
        to=EMAIL_FROM,
        cc=EMAIL_FROM,  # create_message requires cc; copy yourself so the test reaches no one else
        subject=EMAIL_SUBJECT,
        message_text=EMAIL_MESSAGE.format(
            student=assignment[EXPORT_STUDENT_NAME_C],
            teacher=assignment[EXPORT_TEACHER_NAME_C],
            time=assignment[EXPORT_TIME_C],
            room=assignment[EXPORT_ROOM_C],
        ),
        display=True,
    )
    accept = ''
    while interactive and accept == '':
        accept = input('Send message? [Y/N] ').lower()
    if not interactive or accept in ('y', 'yes'):
        service.users().messages().send(userId='me', body=message).execute()
        print(f'Message sent to: {EMAIL_FROM}')

send_test_message(next(assignments.itertuples()))

### Send messages

Send the real messages in batches.

In [0]:
#@markdown Display the message drafts before sending. Disable to reduce verbosity.
DISPLAY_MESSAGES = True #@param {type:"boolean"}
#@markdown Define the number of batches to split the job into. Each batch needs to be approved manually.
NUM_BATCHES = 10 #@param {type:"integer"}
#@markdown Define the starting batch, for jobs that were previously halted.
START_BATCH = 0 #@param {type:"integer"}

def message_from(assignment):
    return create_message(
        sender=EMAIL_FROM,
        to=assignment[EXPORT_STUDENT_EMAIL_C],
        cc=assignment[EXPORT_TEACHER_EMAIL_C],
        subject=EMAIL_SUBJECT,
        message_text=EMAIL_MESSAGE.format(
            student=assignment[EXPORT_STUDENT_NAME_C],
            teacher=assignment[EXPORT_TEACHER_NAME_C],
            time=assignment[EXPORT_TIME_C],
            room=assignment[EXPORT_ROOM_C],
        ),
        display=DISPLAY_MESSAGES
    )

for i, batch in enumerate(np.array_split(assignments, NUM_BATCHES)):
    if i < START_BATCH:
        continue
    messages = [message_from(assignment) for assignment in batch.itertuples()]
    accept = ''
    while accept == '':
        accept = input(f'Send {len(messages)} messages? [Y/N] ').lower()
    if accept in ('y', 'yes'):
        for message in messages:
            service.users().messages().send(userId='me', body=message).execute()
        print(f'Sent {len(messages)} messages')
    else:
        print(f'Halt at batch {i}')
        break
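`START_BATCH` resumes a halted job only because batch boundaries depend solely on the number of rows and `NUM_BATCHES`. This sketch, with a hypothetical count of 23 assignments, shows which rows each batch covers and how many messages a resumed run would send; it only holds if `assignments` is unchanged between runs.

```python
import numpy as np

num_assignments = 23          # hypothetical number of rows in `assignments`
num_batches, start_batch = 5, 2

batches = np.array_split(np.arange(num_assignments), num_batches)
sent = 0
for i, rows in enumerate(batches):
    if i < start_batch:
        continue  # already sent in a previous run
    sent += len(rows)
    print(f'batch {i}: rows {rows[0]}..{rows[-1]} ({len(rows)} messages)')

print([len(rows) for rows in batches], sent)
```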

## Post-processing

Drop or reschedule students after initial assignments have been made.

In [0]:
#@markdown Given the exported schedule of assignments, re-load it into the environment for further manipulation.
EXISTING_ASSIGNMENTS_FILE = '' #@param {type:"string"}
EXISTING_ASSIGNMENTS_NAME = '' #@param {type:"string"}

existing_assignments = load_dataframe(
    EXISTING_ASSIGNMENTS_FILE, EXISTING_ASSIGNMENTS_NAME
).set_index(EXPORT_SID_C)
existing_assignments

Restore solutions from the existing assignments. This step depends on the format and representation in which the data was originally exported.

In [0]:
restored_solutions = []
for objective, group_students in zip(objective_functions, groups_students):
    existing = existing_assignments[existing_assignments[EXPORT_OBJECTIVE_C] == objective.__name__]

    # Use a new name so the exported `assignments` dataframe is not overwritten.
    restored_assignments = {
        students.tuples[sid]: sections.tuples[code]
        for sid, code in existing[EXPORT_CODE_C].items()  # the index holds the student IDs
    }
    schedule = {
        sections.tuples[code]: set(students.tuples[sid] for sid in row.dropna())
        for code, row in (
            existing
            .reset_index()  # turn the student ID index back into a column
            .set_index([EXPORT_CODE_C, EXPORT_STUDENT_EMAIL_C])
            [EXPORT_SID_C]
            .unstack()
            .iterrows()
        )
    }
    unscheduled = {
        students.tuples[sid] for sid in set(group_students.index) - set(existing.index)
    }
    matching = sections[
        sections[[TEACHER_EMAIL_C, TIME_C]]
        .isin(existing[[EXPORT_TEACHER_EMAIL_C, EXPORT_TIME_C]])
        .any(axis=1)
    ].matching

    solution = Solution(objective, restored_assignments, schedule, unscheduled, matching)
    restored_solutions.append(solution)
    print(solution)
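The `unstack` in the cell above pivots the exported rows into one row per section. A simpler way to see the two mappings being rebuilt (student to section, and section to students) is a `groupby`; the sketch below assumes a hypothetical export indexed by student ID with a `Code` column, and is not the notebook's own code.

```python
import pandas as pd

# Hypothetical export: one row per student, indexed by student ID.
example_existing = pd.DataFrame(
    {'Code': ['A', 'A', 'B', 'B', 'B'], 'Time': ['Mon', 'Mon', 'Tue', 'Tue', 'Tue']},
    index=pd.Index([101, 102, 103, 104, 105], name='Student ID'),
)

# student ID -> section code
assignment_map = example_existing['Code'].to_dict()

# section code -> set of student IDs (groupby(...).groups maps each key to its index labels)
schedule_map = {code: set(sids) for code, sids in example_existing.groupby('Code').groups.items()}

print(assignment_map)
print(schedule_map)
```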

In [0]:
#@title Drop students
#@markdown The comma-separated list of student IDs to drop.
TO_DROP = '' #@param {type:"string"}

for solution in restored_solutions:
    print(solution)
    _, dropped = solution.drop_students([students.tuples[sid.strip()] for sid in TO_DROP.split(',')])
    print(dropped)
    accept = ''
    while accept == '':
        accept = input('Continue? [Y/N] ').lower()
    if accept not in ('y', 'yes'):
        break

In [0]:
#@title Reschedule students
#@markdown The comma-separated list of student IDs to reschedule.
TO_RESCHEDULE = '' #@param {type:"string"}

for solution in restored_solutions:
    print(solution)
    new_soln = solution.reschedule([students.tuples[sid.strip()] for sid in TO_RESCHEDULE.split(',')])
    print(new_soln)
    accept = ''
    while accept == '':
        accept = input('Continue? [Y/N] ').lower()
    if accept not in ('y', 'yes'):
        break