# Scheduling Participants for the May 2025 Atlanta Writer's Conference
*Becky Hodge*

#### Summary
This code should be run ~1 month before the conference, and will need to be rerun repeatedly in the weeks and days beforehand to accommodate any changes.

The code in this notebook imports the full list of registered and waitlist participants for the May 2025 conference, loads the (manually created) list of fiction and non-fiction genres, as well as the list of agents and timekeepers, and makes any corrections needed before moving on to the next section of code, which involves scheduling Friday's query letter critique sessions.

<font color='red'>**NOTE:**</font> Before running this code, make sure you update the *timekeepers* sheet in the *List_of_genres_agents_editors.xlsx* document. If the timekeepers' assignments are still TBD, that's okay; what matters most is that, at minimum, the emails of everyone who will serve as a timekeeper are listed there. That's the only part this code needs.

In [528]:
# Import the needed packages
import pandas as pd
import numpy as np
import datetime
import os 
import re
from datetime import timedelta

today = datetime.datetime.today().strftime('%Y-%m-%d')

In [529]:
# Set the conference dates
date_str_fri = '2025-05-02'
date_str_sat = '2025-05-03'

# The current conference folder, where we pull and store all datasets/Excel files/templates
current_conference_folder = "May2025"

## 1. Data Cleaning

#### Load and clean the different files/reports

In [530]:
# Select the file with the most recent date
directory = f'{current_conference_folder}/Cvent_report_downloads'

most_recent_file = max(
    (f for f in os.listdir(directory) if f.startswith('Registered_') and f.endswith('.csv')),
    key=lambda x: datetime.datetime.strptime(x.split('_')[1].split('.')[0], '%m-%d-%y'),
)

# Load the most recent file
most_recent_path = os.path.join(directory, most_recent_file)
registered = pd.read_csv(most_recent_path)

In [531]:
most_recent_file = max(
    (f for f in os.listdir(directory) if f.startswith('Waitlists_') and f.endswith('.csv')),
    key=lambda x: datetime.datetime.strptime(x.split('_')[1].split('.')[0], '%m-%d-%y'),
)

# Load the most recent file
most_recent_path = os.path.join(directory, most_recent_file)
waitlist = pd.read_csv(most_recent_path)

The code below loads ALL participants, which is key for knowing whether any waitlist-only people are virtual or in person.

In [532]:
most_recent_file = max(
    (f for f in os.listdir(directory) if f.startswith('Allparticipants_') and f.endswith('.csv')),
    key=lambda x: datetime.datetime.strptime(x.split('_')[1].split('.')[0], '%m-%d-%y'),
)

# Load the most recent file
most_recent_path = os.path.join(directory, most_recent_file)
all_participants = pd.read_csv(most_recent_path)

In [533]:
all_participants = all_participants.rename(columns={'Email Address':'Email'})

In [534]:
# Filter this dataset to just virtual people
virtual_only = all_participants.loc[all_participants['Hotel vs. Zoom'] == 'Virtually via Zoom (only available for query letter critiques, manuscript sample critiques, and pitches)', :]

In [535]:
del(directory, most_recent_file, most_recent_path)

fict_gen = pd.read_excel(f'{current_conference_folder}/List_of_genres_agents_editors.xlsx', sheet_name='fiction')
nonfict_gen = pd.read_excel(f'{current_conference_folder}/List_of_genres_agents_editors.xlsx', sheet_name='nonfiction')
pubs = pd.read_excel(f'{current_conference_folder}/List_of_genres_agents_editors.xlsx', sheet_name='agents_editors')

Oh gosh, some of the column names are hefty...  Let's fix those.

In [536]:
registered = registered.rename(columns={'Hotel vs. Zoom':'Virtual', 
                                        "What fiction genre(s) will you be presenting to agents/editors at the conference? (If you're not signing up for any agent/editor meetings, indicate which genre(s) you write.)":'Fiction genre', 
                                        "What nonfiction topic(s) will you be presenting to agents/editors at the conference? (If you're not signing up for any agent/editor meetings, indicate which topic(s) you write.)":'Nonfiction genre', 
                                        'Registration Date (GMT)':'Registration Date',
                                        'Email Address':'Email'})

In [537]:
waitlist = waitlist.rename(columns={'Registration Date (GMT)':'Registration Date',
                                     'Email Address':'Email'})

Let's also drop the "Not Applicable --I don't write fiction" and "Not Applicable--I don't write nonfiction" responses by setting them to missing.

In [538]:
registered['Fiction genre']= registered['Fiction genre'].replace("Not Applicable --I don't write fiction", np.nan)
registered['Nonfiction genre']= registered['Nonfiction genre'].replace("Not Applicable--I don't write nonfiction", np.nan)

Some people also wrote in an 'Other' genre, but we don't need that information for matching participants to agents/editors, so let's remove those entries.

In [539]:
def clean_genres(genre_string):
    # Treat missing or blank responses as an empty string
    if genre_string is None or pd.isna(genre_string) or genre_string == "":
        return ""

    # Split the multi-select answer on commas and drop any "Other (please specify): ..." write-ins
    genres = [genre.strip() for genre in genre_string.split(',')]
    cleaned_genres = [genre for genre in genres if not re.match(r"^Other \(please specify\):", genre)]

    return ", ".join(cleaned_genres)

registered['Fiction genre'] = registered['Fiction genre'].apply(clean_genres)
registered['Nonfiction genre'] = registered['Nonfiction genre'].apply(clean_genres)
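A quick way to sanity-check `clean_genres` is to run it on a few made-up answers. The sketch below re-declares the function so it runs on its own, and the sample strings are hypothetical. It also shows one limitation: because answers are split on commas, a write-in that itself contains a comma leaves a fragment behind.

```python
import re
import pandas as pd

def clean_genres(genre_string):
    # Missing or blank answers become an empty string
    if genre_string is None or pd.isna(genre_string) or genre_string == "":
        return ""
    # Split the multi-select answer and drop any "Other (please specify): ..." write-in
    genres = [genre.strip() for genre in genre_string.split(',')]
    kept = [genre for genre in genres if not re.match(r"^Other \(please specify\):", genre)]
    return ", ".join(kept)

# Hypothetical survey answers
samples = pd.Series([
    "Romance, Other (please specify): Cozy mystery, Thriller",
    "Other (please specify): cozy, small-town mystery",  # write-in that contains a comma
    None,
])
cleaned = samples.apply(clean_genres).tolist()
print(cleaned)
```

The second answer comes back as `small-town mystery`, so write-ins with commas may be worth eyeballing before matching.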

Lastly, let's replace the curly apostrophes (’) in a few genre names with straight ones ('), since the curly versions make string matching tricky.

In [540]:
registered['Fiction genre'] = registered['Fiction genre'].str.replace("Women’s", "Women's")
registered['Fiction genre'] = registered['Fiction genre'].str.replace("Children’s picture/chapter books", "Children's picture/chapter books")

registered['Nonfiction genre'] = registered['Nonfiction genre'].str.replace("Women’s issues", "Women's issues")
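Rather than listing each affected genre, a more general option (an alternative, not what the cell above does) is to swap every curly apostrophe for a straight one. This assumes the genre lists in the Excel file use straight apostrophes, as the replacements above suggest; the sample answers are hypothetical.

```python
import pandas as pd

# Hypothetical genre answers mixing curly (’) and straight (') apostrophes
genres = pd.Series(["Women’s fiction, Thriller", "Children’s picture/chapter books", "Women's issues", None])

# Replace every curly apostrophe (U+2019) with a straight one, whichever genre it appears in;
# missing values stay missing
normalized = genres.str.replace("\u2019", "'", regex=False)
print(normalized.tolist())
```

In the notebook, this would be applied to both `'Fiction genre'` and `'Nonfiction genre'` on `registered`.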

##### Fix date-times and emails

We need to convert the registration date to a datetime column.

In [541]:
registered["datetime"] = pd.to_datetime(registered["Registration Date"])
waitlist["datetime"] = pd.to_datetime(waitlist["Registration Date"])

Let's check whether every email address is associated with a single first name, since ideally we'd use email as our unique identifier. (It's possible, for example, for spouses to share an email.)

In [542]:
len(registered['Email'].unique())

221

In [543]:
check = registered[['Email', 'First Name']].value_counts().reset_index()
len(check['Email'].unique())

221

Perfect. The number of unique emails matches whether we look at email alone or at email plus first name. Moving forward, we can use email address as a unique identifier.
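One caveat: `check` has one row per (email, first name) pair, so whenever no emails or first names are missing, its unique-email count equals the unique-email count in `registered` by construction, even if two people share an email. A stricter check counts distinct first names per email. A minimal sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical registrations: two people sharing one household email
demo = pd.DataFrame({
    'Email': ['pat@x.com', 'pat@x.com', 'sam@x.com', 'lee@x.com'],
    'First Name': ['Pat', 'Chris', 'Sam', 'Lee'],
})

# The comparison above: unique emails vs. unique emails among (Email, First Name) pairs
check = demo[['Email', 'First Name']].value_counts().reset_index()
counts_match = len(demo['Email'].unique()) == len(check['Email'].unique())

# Stricter check: how many distinct first names does each email carry?
names_per_email = demo.groupby('Email')['First Name'].nunique()
shared_emails = names_per_email[names_per_email > 1].index.tolist()
print(counts_match, shared_emails)
```

Here the counts still match even though `pat@x.com` is shared. On the real data, `registered.groupby('Email')['First Name'].nunique().gt(1).sum()` returning 0 would confirm that email works as an identifier.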

In [544]:
del(check)

##### Fix phone numbers

Check both the waitlist and registered files for any phone numbers that aren't just 10 digits (e.g., ones containing dashes).

In [545]:
phonecheck_wait = waitlist.loc[waitlist['Mobile Phone Number'].str.contains("-")]
print(phonecheck_wait['Mobile Phone Number'])

waitlist['phone'] = waitlist['Mobile Phone Number'].str.replace(r'^(?:\(\+\d+\))|\D', '', regex=True)
print("After fixing waitlist phone numbers, there are now", len(waitlist.loc[waitlist['phone'].str.contains("-")]), "phones with dashes or parentheses")

104    704-965-8148
105    704-965-8148
106    704-965-8148
Name: Mobile Phone Number, dtype: object
After fixing waitlist phone numbers, there are now 0 phones with dashes or parentheses


In [546]:
phonecheck_reg = registered.loc[registered['Mobile Phone Number'].str.contains("-")]
print(phonecheck_reg['Mobile Phone Number'])

registered['phone'] = registered['Mobile Phone Number'].str.replace(r'^(?:\(\+\d+\))|\D', '', regex=True)
print("After fixing registered phone numbers, there are now", len(registered.loc[registered['phone'].str.contains("-")]), "phones with dashes or parentheses")

175     704-965-8148
185     217-637-3230
212     770-655-8952
560     704-965-8148
576     770-655-8952
610     217-637-3230
626     217-637-3230
642     770-655-8952
659     704-965-8148
910     217-637-3230
934     770-655-8952
947     704-965-8148
1087    217-637-3230
1183    770-655-8952
1184    217-637-3230
1860    770-655-8952
1877    217-637-3230
1879    704-965-8148
1902    217-637-3230
1913    770-655-8952
2067    704-965-8148
2079    770-655-8952
2087    217-637-3230
2382    217-637-3230
2397    770-655-8952
2415    704-965-8148
Name: Mobile Phone Number, dtype: object
After fixing registered phone numbers, there are now 0 phones with dashes or parentheses


Now let's check for any phone numbers with more than ten digits.

In [547]:
phonecheck_reg = registered.loc[registered['phone'].str.len()>10]
print(phonecheck_reg[['Mobile Phone Number', 'phone']])

     Mobile Phone Number         phone
17       '+49 1704174774  491704174774
260      '+49 1704174774  491704174774
481      '+49 1704174774  491704174774
783      '+49 1704174774  491704174774
1014     '+49 1704174774  491704174774
1230     '+49 1704174774  491704174774
1303     '+49 1704174774  491704174774
1437     '+49 1704174774  491704174774
1512     '+49 1704174774  491704174774
1662     '+49 1704174774  491704174774
1680     '+49 1704174774  491704174774
2018     '+49 1704174774  491704174774
2262     '+49 1704174774  491704174774


In [548]:
phonecheck_wait = waitlist.loc[waitlist['phone'].str.len()>10]
print(phonecheck_wait[['Mobile Phone Number', 'phone']])

Empty DataFrame
Columns: [Mobile Phone Number, phone]
Index: []


Let's fix both datasets so that anyone with an international number has their phone reset to missing (we'll keep the original Mobile Phone Number column intact).

In [549]:
# Reset any number with more than 10 digits (i.e., international numbers) to missing
registered.loc[registered['phone'].str.len()>10, 'phone'] = None
waitlist.loc[waitlist['phone'].str.len()>10, 'phone'] = None

# Compare missing counts in the original vs. cleaned phone columns
print(registered['Mobile Phone Number'].isna().sum())
print(registered['phone'].isna().sum())

print(waitlist['Mobile Phone Number'].isna().sum())
print(waitlist['phone'].isna().sum())

0
13
0
0


Good! Neither dataset had any missing phone numbers to begin with. In the registered data, the phone column now has 13 missing values (the 13 rows with the same +49 international number), while the original Mobile Phone Number column is unchanged.
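The phone cleanup above can be summarized as one function. The sketch below (the name `normalize_us_phones` is hypothetical) applies the same regex, which strips a leading `(+NN)` country code and then every non-digit, and keeps only results that are exactly 10 digits. That is slightly stricter than the `> 10` check above, since it also blanks out numbers that are too short.

```python
import pandas as pd

def normalize_us_phones(raw):
    # Remove a leading "(+NN)" country code, then every remaining non-digit character
    digits = raw.str.replace(r'^(?:\(\+\d+\))|\D', '', regex=True)
    # Keep only exactly-10-digit results; everything else becomes missing
    return digits.where(digits.str.len() == 10)

# Hypothetical inputs in the formats seen above, plus a too-short number
phones = pd.Series(['704-965-8148', "'+49 1704174774", '(+1) 404-555-0100', '555-0100'])
result = normalize_us_phones(phones)
print(result.tolist())
```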

##### Fix emails

Let's move on to email addresses now and check for any that are missing or problematic. First, we'll check whether any are missing:

In [550]:
weird_emails = registered.loc[registered['Email'].isna(), ]

Yay! Everyone filled out an email. So now we just need to check that nobody put in faulty emails that will cause problems later:

In [551]:
weird_emails = registered.loc[registered['Email'].str.contains(r'^[\w\.-]+@[a-zA-Z\d-]+\.[a-zA-Z]{2,}$', regex=True)==False, ]
print(weird_emails['Email'].drop_duplicates())

201     aflo_1@yahoo.co.uk
271    jlary@alumni.iu.edu
Name: Email, dtype: object


In [552]:
weird_emails = waitlist.loc[waitlist['Email'].str.contains(r'^[\w\.-]+@[a-zA-Z\d-]+\.[a-zA-Z]{2,}$', regex=True)==False, ]
print(weird_emails['Email'].drop_duplicates())

10    aflo_1@yahoo.co.uk
Name: Email, dtype: object


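Okay, the emails all look fine. Both flagged addresses are valid; they were only flagged because their domains contain more than one dot (e.g., yahoo.co.uk), which the simple regex above doesn't allow for.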
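If those false positives get annoying, one option is a pattern that allows one or more `label.` groups before the top-level domain. This is a hypothetical alternative pattern, still only a rough sanity check rather than full email validation; the sample addresses are made up apart from the two flagged above.

```python
import pandas as pd

# Allow multi-part domains such as yahoo.co.uk or alumni.iu.edu
EMAIL_PATTERN = r'^[\w.+-]+@(?:[A-Za-z\d-]+\.)+[A-Za-z]{2,}$'

emails = pd.Series(['aflo_1@yahoo.co.uk', 'jlary@alumni.iu.edu', 'someone@gmail.com',
                    'missing-at.gmail.com', 'no-tld@localhost'])

# Keep only the addresses that do NOT match the pattern
flagged = emails[~emails.str.contains(EMAIL_PATTERN, regex=True)].tolist()
print(flagged)
```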

##### Add in virtual variable to the waitlist dataset

In [553]:
waitlist['virtual'] = waitlist['Email'].apply(
    lambda email: 'Virtual' if email in virtual_only['Email'].values else 'In person'
)
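The `apply` above checks `email in virtual_only['Email'].values` once per waitlist row. An equivalent vectorized alternative (a sketch, not a change in behavior) uses `Series.isin` with `numpy.where`; the tables below are hypothetical stand-ins for `waitlist` and `virtual_only`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the notebook's waitlist and virtual_only tables
waitlist_demo = pd.DataFrame({'Email': ['a@x.com', 'b@x.com', 'c@x.com']})
virtual_only_demo = pd.DataFrame({'Email': ['b@x.com']})

# isin() tests every email against the virtual list in one vectorized step
waitlist_demo['virtual'] = np.where(
    waitlist_demo['Email'].isin(virtual_only_demo['Email']), 'Virtual', 'In person'
)
print(waitlist_demo['virtual'].tolist())
```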

#### Drop any unneeded variables

Let's drop any extraneous variables from the waitlist and registration datasets

In [554]:
waitlist.drop(columns=['Registration Date', 'Invitee Status', 'Action', 'Confirmation Number'], inplace=True)
registered.drop(columns=['Agenda Item Type', 'Registration Date', 'Registration Type', 'Action'], inplace=True)

In [555]:
del(weird_emails, phonecheck_reg, phonecheck_wait)

#### Bring in timekeeper information

In [556]:
# Load the time keepers
timekeepers = pd.read_excel(f'{current_conference_folder}/List_of_genres_agents_editors.xlsx', sheet_name='timekeepers')

#### Create lists with all time-by-room values
We need to pull in the start times for each of the time slots for Friday afternoon (query letter critiques), Saturday morning (manuscript critiques), and Saturday afternoon (pitches) sessions. Without worrying about who our timekeepers are, or which agents are assigned to those rooms, we'll create 3 lists with the times-by-room.

In [557]:
room_fr = pd.read_excel(f'{current_conference_folder}/List_of_genres_agents_editors.xlsx', sheet_name='rooms_friday')
room_sat = pd.read_excel(f'{current_conference_folder}/List_of_genres_agents_editors.xlsx', sheet_name='rooms_sat')
timeslots = pd.read_excel(f'{current_conference_folder}/List_of_genres_agents_editors.xlsx', sheet_name='timeslots')

In [558]:
rooms_friday = room_fr.loc[:, 'day':'room_name']
rooms_saturday = room_sat.loc[:, 'day':'room_name']

Now let's combine the timeslots dataset with the Friday rooms dataset to get the Friday afternoon time-by-room list we need.

In [559]:
tslist_fri = pd.merge(timeslots.loc[(timeslots['day']=='Friday') & (timeslots['day_session']=='Afternoon'), :], rooms_friday, how='outer', on='day')

In [560]:
rooms_coach = pd.read_excel(f'{current_conference_folder}/List_of_genres_agents_editors.xlsx', sheet_name='coaches')

In [561]:
tslist_coach = pd.merge(timeslots.loc[(timeslots['day']=='Friday') & (timeslots['day_session']=='Coaching'), :], rooms_coach, how='outer', on='day')
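Because every timeslot and every room in these tables share the same `day` value, merging on `day` pairs each timeslot with each room, i.e., it behaves like a cross join within a day. A toy illustration (column names other than `day` are assumed, not taken from the Excel file):

```python
import pandas as pd

# Hypothetical Friday timeslots and rooms; only the shared `day` key matters here
slots = pd.DataFrame({'day': ['Friday'] * 3, 'timeslot_start': ['13:00', '13:10', '13:20']})
rooms = pd.DataFrame({'day': ['Friday'] * 2, 'room_name': ['Room A', 'Room B']})

# Every slot is matched with every room on the same day: 3 slots x 2 rooms = 6 rows
grid = pd.merge(slots, rooms, how='outer', on='day')
print(len(grid))
print(grid.groupby('room_name')['timeslot_start'].count().tolist())
```

With `how='outer'`, a day present in only one of the two tables would still appear, with missing values on the other side.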

We also need to load the info on which agents and editors were paired together, and which room each pair is in.

In [562]:
final_room_pairings_Friday = pd.read_excel(f'{current_conference_folder}/Outputs/Finalized datasets/Editor-agent pairings for Friday.xlsx')
final_cross_pubs = pd.read_excel(f'{current_conference_folder}/Outputs/Finalized datasets/Final editor-agent pairings with combined genres.xlsx')

## 2. Bring in Requests and Prior Assignments and check for any changes/updates

To assign participants to editor-agent pairs for the conference, we need to ensure that they write in fiction and/or nonfiction genre(s) that the editor and/or the agent in the pair represents. To do that, let's first identify every participant who registered for a query letter critique on Friday.

In [563]:
query_critique_names = registered.loc[registered['Agenda Item Name'].str.contains('Query Letter Critique'), :]

Let's get a count of each email in this list, so we know the number of query letter critiques each person signed up for. Then we'll delete the original query_critique_names dataset.

In [564]:
queries = query_critique_names['Email'].value_counts().reset_index()
del(query_critique_names)

Importantly, for Friday's assignments, we can't assign people to agents/editors they're seeing on Saturday for a pitch or manuscript critique. To account for this, we also need to create datasets for the manuscript critiques and pitches, so we can combine all three datasets later.

Our goal is to create a single row per participant that lists any agents/editors they chose for Saturday, along with how many query letter critiques they want.

In [565]:
pitches = registered.loc[registered['Agenda Item Name'].str.contains('Pitch'), :].copy()

We need to extract the publisher name from the Agenda Item Name column.

In [566]:
# .copy() above makes `pitches` an independent DataFrame, so this assignment no longer triggers a SettingWithCopyWarning
pitches['pubname'] = pitches['Agenda Item Name'].str.replace("Pitch [A-Z] with ", "", regex=True)
pitch = pitches[['Email', 'pubname']].value_counts().reset_index()


In [567]:
print(pitch.loc[pitch['count']>1, :])

Empty DataFrame
Columns: [Email, pubname, count]
Index: []


<font color='red'>**NOTE FOR THE ABOVE:**</font> Ideally, the table printed above is empty. Everyone should have a count of 1, since they can't meet with and pitch the same publisher multiple times for the same book. However, very, very rarely, someone will want to meet with a publisher twice to pitch them *different* books, so there can be counts of two or more.

We always want to check with the participant, though, to confirm that the double booking was intentional and not a registration error.

Now that we've checked that, note that people can sign up for up to three pitches (typically with 3 different agents/editors). We now need to create a combined variable per registrant that has ALL their pitch agents/editors.

In [568]:
pitchA = pitches.loc[pitches['Agenda Item Name'].str.contains("Pitch A with "), ['Email', 'pubname']]
pitchB = pitches.loc[pitches['Agenda Item Name'].str.contains("Pitch B with "), ['Email', 'pubname']]
pitchC = pitches.loc[pitches['Agenda Item Name'].str.contains("Pitch C with "), ['Email', 'pubname']]

pitchA = pitchA.rename(columns={'pubname':'pitchA'})
pitchB = pitchB.rename(columns={'pubname':'pitchB'})
pitchC = pitchC.rename(columns={'pubname':'pitchC'})

In [569]:
pitch2 = pd.merge(pd.merge(pitchA, pitchB, how='outer', on='Email'), pitchC, how='outer', on='Email')
pitch2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 118 entries, 0 to 117
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Email   118 non-null    object
 1   pitchA  75 non-null     object
 2   pitchB  80 non-null     object
 3   pitchC  69 non-null     object
dtypes: object(4)
memory usage: 3.8+ KB
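The split-into-A/B/C-then-merge approach can also be written as a single reshape: extract the slot letter and publisher with `str.extract`, then `pivot` to one row per email. A minimal sketch on hypothetical data; one side effect is that `pivot` raises a `ValueError` if an email has two rows for the same slot, which surfaces registration errors instead of quietly producing extra rows.

```python
import pandas as pd

# Hypothetical stand-in for the `pitches` slice
pitches_demo = pd.DataFrame({
    'Email': ['a@x.com', 'a@x.com', 'b@x.com'],
    'Agenda Item Name': ['Pitch A with Agent One', 'Pitch C with Agent Two', 'Pitch B with Agent One'],
})

# Capture the slot letter and the publisher name from each agenda item
parts = pitches_demo['Agenda Item Name'].str.extract(r'^Pitch ([A-Z]) with (.+)$')
pitches_demo['slot'] = 'pitch' + parts[0]
pitches_demo['pubname'] = parts[1]

# One row per email, one column per slot; slots a person didn't book are NaN
wide = (
    pitches_demo.pivot(index='Email', columns='slot', values='pubname')
    .reindex(columns=['pitchA', 'pitchB', 'pitchC'])
    .reset_index()
)
wide.columns.name = None
print(wide)
```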


In [570]:
# Did any participants request the same person to pitch more than once?
print(pitch2.loc[(pitch2['pitchA'] == pitch2['pitchB']) | (pitch2['pitchA'] == pitch2['pitchC']) | (pitch2['pitchB'] == pitch2['pitchC'])])

Empty DataFrame
Columns: [Email, pitchA, pitchB, pitchC]
Index: []


Now let's create a combined variable of everyone's chosen publishers for their pitch.

In [571]:
def combine_variables(row):
    return ', '.join(str(x) for x in row.dropna()) #convert to strings, drop Nas, and join.

pitch2['pitches_chosen_pubs'] = pitch2[['pitchA', 'pitchB', 'pitchC']].apply(combine_variables, axis=1)

Great! Now let's repeat this process for manuscript critiques.

In [572]:
ms = registered.loc[registered['Agenda Item Name'].str.contains('Manuscript'), :].copy()

In [573]:
ms['pubname'] = ms['Agenda Item Name'].str.replace("Manuscript Critique [A-Z] with ", "", regex=True)
manuscript = ms[['Email', 'pubname']].value_counts().reset_index()
len(ms['Email'].unique())

99

In [574]:
len(ms['Email'].unique()) == len(manuscript['Email'].unique())

True

Cool. Nobody signed up for duplicate manuscript critiques.
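A caveat on the check above: `manuscript` was built from `ms` with `value_counts()`, so it contains exactly the same emails as `ms`, and the comparison returns `True` even when someone books the same editor twice. Looking at the `count` column, as the pitch check does, catches duplicates. A toy example with hypothetical data:

```python
import pandas as pd

# Hypothetical registrations: b@x.com booked two critiques with the same editor
ms_demo = pd.DataFrame({
    'Email':   ['a@x.com', 'b@x.com', 'b@x.com'],
    'pubname': ['Editor One', 'Editor Two', 'Editor Two'],
})
manuscript_demo = ms_demo[['Email', 'pubname']].value_counts().reset_index()

# Comparing unique-email counts passes even though b@x.com is double-booked
same_unique = ms_demo['Email'].nunique() == manuscript_demo['Email'].nunique()

# Filtering on the per-(Email, pubname) count surfaces the duplicate
dupes = manuscript_demo.loc[manuscript_demo['count'] > 1, ['Email', 'pubname', 'count']]
print(same_unique)
print(dupes)
```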

In [575]:
msA = ms.loc[ms['Agenda Item Name'].str.contains("Manuscript Critique A with "), ['Email', 'pubname']]
msB = ms.loc[ms['Agenda Item Name'].str.contains("Manuscript Critique B with "), ['Email', 'pubname']]
msC = ms.loc[ms['Agenda Item Name'].str.contains("Manuscript Critique C with "), ['Email', 'pubname']]

msA = msA.rename(columns={'pubname':'msA'})
msB = msB.rename(columns={'pubname':'msB'})
msC = msC.rename(columns={'pubname':'msC'})

In [576]:
manuscript = pd.merge(pd.merge(msA, msB, how='outer', on='Email'), msC, how='outer', on='Email')

In [577]:
manuscript['ms_chosen_pubs'] = manuscript[['msA', 'msB', 'msC']].apply(combine_variables, axis=1)

In [578]:
del(pitch, pitches)
queries = queries.rename(columns={'count': 'num_query_critiques'})

In [579]:
# Let's create a dataset with everyone doing the Friday workshop, so we can easily reference them later
fri_workshop = registered[registered['Agenda Item Name'].str.contains('Friday Workshop')]

Starting with the May 2025 conference, George added 'Author coaching' as one of the session types participants could select when registering. These sessions are scheduled AROUND any query letter critiques, and so as not to coincide with the Friday workshop at 4pm (where relevant).

Let's make a dataset with the coaching info.

In [580]:
reg_coaching = registered.loc[registered['Agenda Item Name'].str.contains("Coach"), :].copy()
reg_coaching['Friday_workshop'] = reg_coaching['Email'].isin(fri_workshop['Email'])
reg_coaching['QLC'] = reg_coaching['Email'].isin(queries['Email'])

# Extract out just the coaches' names
reg_coaching['Coach'] = reg_coaching['Agenda Item Name'].str.removeprefix("Author Coaching with ")

Woohoo! Okay, now it's time to merge the pitch and the manuscript info, and then link it back to the query critiques as well, so we have the full list of participants with all of their chosen editors, and whether or not they have any query letter critiques.

In [581]:
merge1 = pd.merge(manuscript, pitch2, how='outer', on='Email')[['Email', 'pitchA', 'pitchB', 'pitchC', 'msA', 'msB', 'msC', 'pitches_chosen_pubs', 'ms_chosen_pubs']]
email_set = set(queries['Email'].dropna())
merge2 = pd.merge(merge1, queries, how='outer', on='Email')
merge2['query_critique'] = merge2['Email'].apply(lambda email: email in email_set if pd.notna(email) else False)

In [582]:
del(merge1, pitch2, manuscript, queries, room_fr, room_sat, email_set)

Perrrfect. The last step is to create a dataset with one row per email that has each registrant's fiction and nonfiction genres, as well as whether they're attending virtually or in person. We'll then join this to our dataset above.

In [583]:
# .copy() so the 'Virtual' replacement below modifies an independent DataFrame, not a slice of `registered`
per_registrant = registered.drop_duplicates(subset='Email', keep='first')[['Email', 'Virtual', 'Fiction genre', 'Nonfiction genre', 'datetime']].copy()
per_registrant['Virtual'] = per_registrant['Virtual'].replace(['Virtually via Zoom (only available for query letter critiques, manuscript sample critiques, and pitches)', 'In person at the conference hotel'],
                                                              ['Virtual', 'In person'])

In [584]:
print(per_registrant['Virtual'].value_counts())

Virtual
In person    187
Virtual       32
Name: count, dtype: int64


Now we'll merge the dataframes to get one big dataframe with every registrant, along with their choices for the three main activities: query letter critiques, manuscript critiques, and pitches.

In [585]:
per_registrant2 = pd.merge(per_registrant, merge2, how='outer', on='Email')

### Create 3 different datasets: one each for query letter critiques, MS critiques, and pitches
Before doing any scheduling, we need to create 3 different datasets for these three different activities, so we can easily schedule them below in their respective sections.

In [586]:
ms_critiques = per_registrant2.loc[pd.notna(per_registrant2['ms_chosen_pubs']), ['Email', 'Virtual', 'ms_chosen_pubs', 'msA', 'msB', 'msC']]

In [587]:
pitches = per_registrant2.loc[pd.notna(per_registrant2['pitches_chosen_pubs']), ['Email', 'Virtual', 'pitches_chosen_pubs', 'pitchA', 'pitchB', 'pitchC']]

In [588]:
query_critiques = per_registrant2.loc[per_registrant2['query_critique']==True, :].copy()
query_critiques['Friday_workshop'] = query_critiques['Email'].isin(fri_workshop['Email']) # Add in a flag for Friday workshop people

In [589]:
del(per_registrant, merge2)

### Bring in hardcoded and prior scheduling assignments

Some participants requested specific publishers and/or time slots; these participants were 'hardcoded', so the code handles their requests automatically. We also want the script to re-assign everybody who already had an assignment to their prior spot.

The one catch is that we also need to catch any changes in registration: is every hardcoded and previously assigned participant still signed up for the particular activity they've been assigned to?

##### Hardcoded people

Let's start by loading the hardcoded participants. This is a manually maintained Excel file. Note that participants are identified by email, and that for the Friday QLCs, pubname1 and pubname2 MUST match the editor-agent pairings.

In [590]:
hardcoded = pd.read_excel(f'{current_conference_folder}/People to hardcode.xlsx')

# Filter this to create separate QLC, MS and Pitch datasets
hardcoded_qlc = hardcoded.loc[hardcoded['Session']=='QLC',['Email', 'timeslot_start', 'pubname1', 'pubname2']]
hardcoded_ms = hardcoded.loc[hardcoded['Session']=='Manuscript critique', ['Email', 'timeslot_start', 'pubname1']]
hardcoded_pitch = hardcoded.loc[hardcoded['Session']=='Pitch', ['Email', 'timeslot_start', 'pubname1']]
hardcoded_coach = hardcoded.loc[hardcoded['Session']=='Coach', ['Email', 'timeslot_start', 'pubname1']] # This has zero rows

hardcoded_ms = hardcoded_ms.rename(columns={'pubname1' : 'publisher'})
hardcoded_pitch = hardcoded_pitch.rename(columns={'pubname1' : 'publisher'})
hardcoded_coach = hardcoded_coach.rename(columns={'pubname1' : 'publisher'})

##### Prior assignments

Now let's check for any prior assignments for the different activities: QLC, coaching, pitch, and MS.

In [591]:
# QLC prior assignments 

folder = f'{current_conference_folder}/Outputs/Finalized datasets'
matching_files = [
    f for f in os.listdir(folder)
    if f.startswith('Final Friday query letter critique assignments') and f.endswith('.xlsx')
]

if matching_files:
    most_recent_file = max(
        matching_files,
        key=lambda x: datetime.datetime.strptime(x.split('_')[1].split('.')[0], '%Y-%m-%d')
    )
    most_recent_path = os.path.join(folder, most_recent_file)
    qlc_prior_assignments = pd.read_excel(most_recent_path)
else:
    qlc_prior_assignments = None

In [592]:
# MS prior assignments 

matching_files = [
    f for f in os.listdir(folder)
    if f.startswith('Final manuscript critique assignments') and f.endswith('.xlsx')
]

if matching_files:
    most_recent_file = max(
        matching_files,
        key=lambda x: datetime.datetime.strptime(x.split('_')[1].split('.')[0], '%Y-%m-%d')
    )
    most_recent_path = os.path.join(folder, most_recent_file)
    ms_prior_assignments = pd.read_excel(most_recent_path)
else:
    ms_prior_assignments = None

In [593]:
# Pitch prior assignments 

matching_files = [
    f for f in os.listdir(folder)
    if f.startswith('Finalized pitch assignments') and f.endswith('.xlsx')
]

if matching_files:
    most_recent_file = max(
        matching_files,
        key=lambda x: datetime.datetime.strptime(x.split('_')[1].split('.')[0], '%Y-%m-%d')
    )
    most_recent_path = os.path.join(folder, most_recent_file)
    pitch_prior_assignments = pd.read_excel(most_recent_path)
else:
    pitch_prior_assignments = None

In [594]:
# Coaching prior assignments 

matching_files = [
    f for f in os.listdir(folder)
    if f.startswith('Finalized coaching schedule') and f.endswith('.xlsx')
]

if matching_files:
    most_recent_file = max(
        matching_files,
        key=lambda x: datetime.datetime.strptime(x.split('_')[1].split('.')[0], '%Y-%m-%d')
    )
    most_recent_path = os.path.join(folder, most_recent_file)
    coach_prior_assignments = pd.read_excel(most_recent_path)
else:
    coach_prior_assignments = None
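The four cells above (and the Cvent loading cells at the top) repeat the same "pick the newest dated file" pattern. A hypothetical helper, `find_most_recent`, could factor that out. Like the original cells, it assumes the date sits between the first `_` and the first `.` of the filename, so the prefix itself must not contain an underscore.

```python
import datetime
import os
import tempfile

def find_most_recent(folder, prefix, suffix, date_format):
    """Hypothetical helper: return the path of the newest file in `folder` named
    '<prefix>_<date><suffix>', or None if no file matches."""
    candidates = [f for f in os.listdir(folder) if f.startswith(prefix) and f.endswith(suffix)]
    if not candidates:
        return None
    newest = max(
        candidates,
        key=lambda f: datetime.datetime.strptime(f.split('_')[1].split('.')[0], date_format),
    )
    return os.path.join(folder, newest)

# Demo on throwaway empty files named like the Cvent downloads
with tempfile.TemporaryDirectory() as tmp:
    for name in ['Registered_03-28-25.csv', 'Registered_04-01-25.csv', 'Waitlists_04-02-25.csv']:
        open(os.path.join(tmp, name), 'w').close()
    newest_name = os.path.basename(find_most_recent(tmp, 'Registered_', '.csv', '%m-%d-%y'))
    missing = find_most_recent(tmp, 'Allparticipants_', '.csv', '%m-%d-%y')

print(newest_name, missing)
```

For example, the QLC cell could become `path = find_most_recent(folder, 'Final Friday query letter critique assignments', '.xlsx', '%Y-%m-%d')` followed by `qlc_prior_assignments = pd.read_excel(path) if path else None`.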

#### Merge hardcodes and prior assignments

In [595]:
# QUERY LETTER CRITIQUES

columns_to_keep = ['Email', 'pubname1', 'pubname2', 'timeslot_start']

# Filter and rename for clarity
hc = (
    hardcoded_qlc[columns_to_keep].copy()
    if 'hardcoded_qlc' in locals() and isinstance(hardcoded_qlc, pd.DataFrame) and not hardcoded_qlc.empty
    else pd.DataFrame(columns=columns_to_keep)
)

prior = (
    qlc_prior_assignments[columns_to_keep].copy()
    if 'qlc_prior_assignments' in locals() and isinstance(qlc_prior_assignments, pd.DataFrame) and not qlc_prior_assignments.empty
    else pd.DataFrame(columns=columns_to_keep)
)

if not hc.empty or not prior.empty:
    # Merge on identifying columns
    merged = pd.merge(
        hc,
        prior,
        on=['Email', 'pubname1', 'pubname2'],
        how='outer',
        suffixes=('_hc', '_prior')
    )

    # Use hardcoded time if present, otherwise fallback to prior assignment
    merged['timeslot_start'] = merged['timeslot_start_hc'].combine_first(merged['timeslot_start_prior'])

    # Keep only the final form
    qlc_allrequests = merged[['Email', 'pubname1', 'pubname2', 'timeslot_start']].drop_duplicates()
    del(merged)

else:
    print("No QLC data available to combine.")
del(hc, prior)

In [596]:
# MS CRITIQUES
columns_to_keep = ['Email', 'publisher', 'timeslot_start']

# Filter and rename for clarity
hc = (
    hardcoded_ms[columns_to_keep].copy()
    if 'hardcoded_ms' in locals() and isinstance(hardcoded_ms, pd.DataFrame) and not hardcoded_ms.empty
    else pd.DataFrame(columns=columns_to_keep)
)

prior = (
    ms_prior_assignments[columns_to_keep].copy()
    if 'ms_prior_assignments' in locals() and isinstance(ms_prior_assignments, pd.DataFrame) and not ms_prior_assignments.empty
    else pd.DataFrame(columns=columns_to_keep)
)

if not hc.empty or not prior.empty:
    # Merge on identifying columns
    merged = pd.merge(
        hc,
        prior,
        on=['Email', 'publisher'],
        how='outer',
        suffixes=('_hc', '_prior')
    )

    # Use hardcoded time if present, otherwise fallback to prior assignment
    merged['timeslot_start'] = merged['timeslot_start_hc'].combine_first(merged['timeslot_start_prior'])

    # Keep only the final form
    ms_allrequests = merged[columns_to_keep].drop_duplicates()
    del(merged)

else:
    print("No MS data available to combine.")
del(hc, prior)

In [597]:
# PITCHES
columns_to_keep = ['Email', 'publisher', 'timeslot_start']

# Filter and rename for clarity
hc = (
    hardcoded_pitch[columns_to_keep].copy()
    if 'hardcoded_pitch' in locals() and isinstance(hardcoded_pitch, pd.DataFrame) and not hardcoded_pitch.empty
    else pd.DataFrame(columns=columns_to_keep)
)

prior = (
    pitch_prior_assignments[columns_to_keep].copy()
    if 'pitch_prior_assignments' in locals() and isinstance(pitch_prior_assignments, pd.DataFrame) and not pitch_prior_assignments.empty
    else pd.DataFrame(columns=columns_to_keep)
)

if not hc.empty or not prior.empty:
    # Merge on identifying columns
    merged = pd.merge(
        hc,
        prior,
        on=['Email', 'publisher'],
        how='outer',
        suffixes=('_hc', '_prior')
    )

    # Use hardcoded time if present, otherwise fallback to prior assignment
    merged['timeslot_start'] = merged['timeslot_start_hc'].combine_first(merged['timeslot_start_prior'])

    # Keep only the final form
    pitch_allrequests = merged[columns_to_keep].drop_duplicates()
    del(merged)

else:
    print("No pitch data available to combine.")
del(hc, prior)

In [598]:
# COACHES
columns_to_keep = ['Email', 'publisher', 'timeslot_start']

# Filter and rename for clarity
hc = (
    hardcoded_coach[columns_to_keep].copy()
    if 'hardcoded_coach' in locals() and isinstance(hardcoded_coach, pd.DataFrame) and not hardcoded_coach.empty
    else pd.DataFrame(columns=columns_to_keep)
)

prior = (
    coach_prior_assignments[columns_to_keep].copy()
    if 'coach_prior_assignments' in locals() and isinstance(coach_prior_assignments, pd.DataFrame) and not coach_prior_assignments.empty
    else pd.DataFrame(columns=columns_to_keep)
)

if not hc.empty or not prior.empty:
    # Merge on identifying columns
    merged = pd.merge(
        hc,
        prior,
        on=['Email', 'publisher'],
        how='outer',
        suffixes=('_hc', '_prior')
    )

    # Use hardcoded time if present, otherwise fallback to prior assignment
    merged['timeslot_start'] = merged['timeslot_start_hc'].combine_first(merged['timeslot_start_prior'])

    # Keep only the final form
    coach_allrequests = merged[columns_to_keep].drop_duplicates()
    del(merged)
else:
    print("No coach data available to combine.")
del(hc, prior)
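The four merge cells above follow the same recipe, so they could share a hypothetical helper, `combine_requests`. It outer-merges hardcoded requests with prior assignments on the identifying columns, lets `combine_first` prefer the hardcoded timeslot, and returns `None` when there is nothing to combine (callers would then check for `None`). The QLC call would pass `['Email', 'pubname1', 'pubname2']` as the keys; the data below is made up.

```python
import pandas as pd

def combine_requests(hardcoded, prior, keys):
    """Hypothetical helper: merge hardcoded requests with prior assignments on `keys`.
    Where both sources have a timeslot, the hardcoded one wins."""
    cols = keys + ['timeslot_start']
    hc = hardcoded[cols].copy() if isinstance(hardcoded, pd.DataFrame) else pd.DataFrame(columns=cols)
    pr = prior[cols].copy() if isinstance(prior, pd.DataFrame) else pd.DataFrame(columns=cols)
    if hc.empty and pr.empty:
        return None
    merged = pd.merge(hc, pr, on=keys, how='outer', suffixes=('_hc', '_prior'))
    # Hardcoded time if present, otherwise fall back to the prior assignment
    merged['timeslot_start'] = merged['timeslot_start_hc'].combine_first(merged['timeslot_start_prior'])
    return merged[cols].drop_duplicates()

# Hypothetical data: a@x.com is hardcoded to 13:00 but was previously given 14:00
hc_demo = pd.DataFrame({'Email': ['a@x.com'], 'publisher': ['Agent One'], 'timeslot_start': ['13:00']})
prior_demo = pd.DataFrame({'Email': ['a@x.com', 'b@x.com'],
                           'publisher': ['Agent One', 'Agent Two'],
                           'timeslot_start': ['14:00', '14:15']})

combined = combine_requests(hc_demo, prior_demo, ['Email', 'publisher'])
print(combined)
print(combine_requests(None, None, ['Email', 'publisher']))
```

Because rows are matched on email *and* publisher, a hardcode to a different publisher than a prior assignment produces two separate rows rather than overwriting the old one.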

#### Check for changes in registration that affect prior scheduling/requests

Now we need to double-check that nobody has changed their registration. If someone has, we want to DROP their outdated requests from the allrequests datasets.

In [599]:
# Check for MS critique changes

if 'ms_allrequests' in locals() and not ms_allrequests.empty: # We only want to run this code IF the ms_allrequests dataset exists
    # Step 1: Melt ms_critiques to long format
    ms_long = ms_critiques.melt(id_vars='Email', value_vars=['msA', 'msB', 'msC'], 
                                var_name='ms_slot', value_name='publisher')

    # Step 2: Drop rows with missing publishers (in case some msA/B/C are blank)
    ms_long = ms_long.dropna(subset=['publisher'])

    # Step 3: Merge to keep only rows from ms_allrequests that match email + publisher
    filtered_ms_allrequests = pd.merge(ms_allrequests, ms_long[['Email', 'publisher']], 
                                       on=['Email', 'publisher'], how='inner')

    del ms_long, ms_allrequests
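
When debugging which prior requests get dropped by this kind of filter, a variant of the same merge with indicator=True can list them explicitly instead of silently discarding them. A minimal sketch on made-up rows (the emails and publisher names are hypothetical; the column names mirror the notebook's):

```python
import pandas as pd

# Hypothetical prior requests (one row per email + publisher)
prior_requests = pd.DataFrame({
    'Email': ['a@x.com', 'a@x.com', 'b@x.com'],
    'publisher': ['Agent One', 'Agent Two', 'Agent One'],
    'timeslot_start': ['10:00', '10:30', '11:00'],
})

# Hypothetical current registrations in wide format (msA/msB/msC)
current = pd.DataFrame({
    'Email': ['a@x.com', 'b@x.com'],
    'msA': ['Agent One', 'Agent Three'],
    'msB': [None, None],
    'msC': [None, None],
})

# Reshape to one row per email + chosen publisher, dropping blank choices
current_long = (
    current.melt(id_vars='Email', value_vars=['msA', 'msB', 'msC'],
                 var_name='ms_slot', value_name='publisher')
    .dropna(subset=['publisher'])
)

# Left merge with an indicator column: 'left_only' rows are requests the registration no longer supports
check = pd.merge(prior_requests, current_long[['Email', 'publisher']],
                 on=['Email', 'publisher'], how='left', indicator=True)
kept = check[check['_merge'] == 'both'].drop(columns='_merge')
dropped = check[check['_merge'] == 'left_only'].drop(columns='_merge')
print(dropped)
```

The kept rows are exactly what the inner merge above returns; the dropped rows are the ones worth a manual look before rescheduling.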

In [600]:
# Check for pitch changes

if 'pitch_allrequests' in locals() and not pitch_allrequests.empty: # We only want to run this code IF the pitch_allrequests dataset exists
    # Step 1: Melt pitches to long format
    pitch_long = pitches.melt(id_vars='Email', value_vars=['pitchA', 'pitchB', 'pitchC'], 
                              var_name='pitch_slot', value_name='publisher')

    # Step 2: Drop rows with missing publishers (in case some pitchA/B/C are blank)
    pitch_long = pitch_long.dropna(subset=['publisher'])

    # Step 3: Merge to keep only rows from pitch_allrequests that match email + publisher
    filtered_pitch_allrequests = pd.merge(pitch_allrequests, pitch_long[['Email', 'publisher']], 
                                          on=['Email', 'publisher'], how='inner')

    del(pitch_long, pitch_allrequests)

In [601]:
# Check for QLC changes

if 'qlc_allrequests' in locals() and not qlc_allrequests.empty: # We only want to run this code IF the qlc_allrequests dataset exists
    # Step 1: Count how many requests each email made in qlc_allrequests
    request_counts = qlc_allrequests['Email'].value_counts().reset_index()
    request_counts.columns = ['Email', 'actual_requests']

    # Step 2: Merge that with query_critiques to compare with allowed requests
    comparison = pd.merge(request_counts, query_critiques[['Email', 'num_query_critiques']], on='Email', how='inner')

    # Step 3: Keep only emails whose number of requests doesn't exceed the number of critiques they paid for
    valid_emails = comparison[comparison['actual_requests'] <= comparison['num_query_critiques']]['Email']

    # Step 4: Filter qlc_allrequests to keep only valid emails
    filtered_qlc_allrequests = qlc_allrequests[qlc_allrequests['Email'].isin(valid_emails)]

    # Just to manually check, print any rows in the comparison dataset where actual_requests != num_query_critiques
    print(comparison.loc[comparison['actual_requests'] != comparison['num_query_critiques']])

    del(request_counts, comparison, valid_emails, qlc_allrequests)

Empty DataFrame
Columns: [Email, actual_requests, num_query_critiques]
Index: []


In [602]:
# Check for Coach changes

if 'coach_allrequests' in locals() and not coach_allrequests.empty: # We only want to run this code IF the coach_allrequests dataset exists
    # Step 1: Melt to long format
    coach_long = reg_coaching.melt(id_vars='Email', value_vars=['Coach'], 
                                   var_name='coach_slot', value_name='publisher')

    # Step 2: Drop rows with missing coaches (in case some Coach values are blank)
    coach_long = coach_long.dropna(subset=['publisher'])

    # Step 3: Merge to keep only rows from coach_allrequests that match email + coach
    filtered_coach_allrequests = pd.merge(coach_allrequests, coach_long[['Email', 'publisher']], 
                                          on=['Email', 'publisher'], how='inner')

    del(coach_long, coach_allrequests)

Woohoo! The request datasets now only contain requests that are still consistent with each participant's current registration. Awesome!

## 3. Schedule Friday QLCs

We need to add the requests and prior assignments information to the query_critiques dataset, so they can be accounted for when scheduling. Even if every single person has already been scheduled, the code should still run, and will simply schedule everyone exactly where they were previously assigned.

In [603]:
# Now let's merge it with the filtered_qlc_allrequests dataset (assuming it exists)
if 'filtered_qlc_allrequests' in locals() and not filtered_qlc_allrequests.empty: # We only want to run this code IF the filtered_qlc_allrequests dataset exists
    merged = pd.merge(query_critiques, filtered_qlc_allrequests, on='Email', how='left').drop_duplicates()
else:
    # Work on a copy so the blank request columns aren't added to query_critiques itself
    merged = query_critiques.copy()
    merged['pubname1'] = np.nan
    merged['pubname2'] = np.nan
    merged['timeslot_start'] = pd.NaT

In [604]:
expanded = []

# First, get the number of rows per email
email_counts = merged['Email'].value_counts()

for email, count in email_counts.items():

    # Pull all rows for this participant
    email_rows = merged[merged['Email'] == email]

    # If they paid for only one query critique, keep just their first row
    if email_rows['num_query_critiques'].iloc[0] == 1:
        expanded.append(email_rows.iloc[0])
    # If the number of query critiques == 2
    elif email_rows['num_query_critiques'].iloc[0] == 2:
        if count == 2: # If already has two rows (because made two requests), keep the two rows as is
            expanded.extend(row for _, row in email_rows.iterrows())
        elif count == 1: # If only have one request but paid for two meetings
            expanded.append(email_rows.iloc[0])
            # Add a 2nd row with blank values for pubname1, pubname2 and timeslot_start
            row_copy = email_rows.iloc[0].copy()
            row_copy['pubname1'] = np.nan
            row_copy['pubname2'] = np.nan
            row_copy['timeslot_start'] = pd.NaT
            expanded.append(row_copy)

# Every entry is a labeled row (Series), so columns line up by name; reset the index so each row gets a unique label
expanded_query_critiques = pd.DataFrame(expanded, columns=merged.columns).reset_index(drop=True)

del(expanded, merged, query_critiques, filtered_qlc_allrequests)

#### Get frequencies of fiction and nonfiction genres among registrants

Before we move on to scheduling, there's ONE final step: getting the different combinations of fiction and nonfiction genres among our registrants, to see which are most popular/least popular, so we can do our best to match agent-editor pairings that will meet everyone's needs.

We likely won't do much with this information, but it's nice to see.
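
The four count_genres cells below all follow the same pattern (split each row's comma-separated string, count each genre once per row). As a sketch only, the same counting can be done without a global dictionary using str.split plus explode. This is a hypothetical helper, not a drop-in replacement: it splits on "," and strips whitespace, whereas the cells below split on ", ", and it skips blank rows.

```python
import pandas as pd

# Hypothetical registrant rows; genres are stored as one comma-separated string
registrants = pd.DataFrame({
    'Fiction genre': ['Fantasy, Mystery', 'Mystery, Mystery', None, 'Romance'],
})

def genre_frequencies(series, label, count_label):
    """Count how many rows mention each genre (each genre counted once per row)."""
    genres = (
        series.dropna()
        .str.split(',')
        # De-duplicate within a row, mirroring set(genres) in count_genres
        .apply(lambda parts: sorted({p.strip() for p in parts if p.strip()}))
        .explode()
    )
    counts = genres.value_counts()
    return counts.rename_axis(label).reset_index(name=count_label)

reg_fict_counts_example = genre_frequencies(registrants['Fiction genre'], 'fiction', 'registrant_fiction_counts')
print(reg_fict_counts_example)
```

The output has the same two-column shape (fiction, registrant_fiction_counts) as the DataFrames built below, so it could be merged the same way.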

In [605]:
def count_genres(row):
    genres = row['Fiction genre'].split(', ')
    unique_genres = set(genres)
    for genre in unique_genres:
        genre_counts[genre] = genre_counts.get(genre, 0) + 1

# Initialize genre_counts
genre_counts = {}

# Apply the function to each row
per_registrant2.apply(count_genres, axis=1)

# Convert genre_counts to DataFrame
reg_fict_counts = pd.DataFrame(list(genre_counts.items()), columns=['fiction', 'registrant_fiction_counts'])

In [606]:
del(genre_counts, count_genres)

In [607]:
def count_genres(row):
    genres = row['Nonfiction genre'].split(', ')
    unique_genres = set(genres)
    for genre in unique_genres:
        genre_counts[genre] = genre_counts.get(genre, 0) + 1

# Initialize genre_counts
genre_counts = {}

# Apply the function to each row
per_registrant2.apply(count_genres, axis=1)

# Convert genre_counts to DataFrame
reg_nonfict_counts = pd.DataFrame(list(genre_counts.items()), columns=['nonfiction', 'registrant_nonfiction_counts'])

Because of some weirdness with the "Other" write-ins (for instance "Other (please specify): songs, movie and tv scripts"), some nonfiction genres are popping up that shouldn't be. We'll filter out anything that isn't in our true genre lists.

In [608]:
reg_nonfict_counts = reg_nonfict_counts.loc[reg_nonfict_counts['nonfiction'].isin(nonfict_gen['list_nonfiction']), :]
reg_fict_counts = reg_fict_counts.loc[reg_fict_counts['fiction'].isin(fict_gen['fiction_genres']), :]

Cool, now let's get counts for each genre in the final_cross_pubs listing (the agent-editor pairings) too, so we can cross-tabulate them with the datasets above.

In [609]:
def count_genres(row):
    genres = row['combined_fiction'].split(', ')
    unique_genres = set(genres)
    for genre in unique_genres:
        genre_counts[genre] = genre_counts.get(genre, 0) + 1

# Initialize genre_counts
genre_counts = {}

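# Apply the function to each row (skipping pairings with no fiction genres, as the nonfiction version below does)
final_cross_pubs.dropna(subset=['combined_fiction']).apply(count_genres, axis=1)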

# Convert genre_counts to DataFrame
fiction_counts = pd.DataFrame(list(genre_counts.items()), columns=['fiction', 'fiction_count'])


In [610]:
del(count_genres, genre_counts, nonfict_gen, fict_gen)

In [611]:
def count_genres(row):
    genres = row['combined_nonfiction'].split(', ')
    unique_genres = set(genres)
    for genre in unique_genres:
        genre_counts[genre] = genre_counts.get(genre, 0) + 1

# Initialize genre_counts
genre_counts = {}

# Apply the function to each row
cross_pubs_filtered = final_cross_pubs[final_cross_pubs['combined_nonfiction'].notnull()] 
cross_pubs_filtered.apply(count_genres, axis=1)

# Convert genre_counts to DataFrame
nonfiction_counts = pd.DataFrame(list(genre_counts.items()), columns=['nonfiction', 'nonfiction_count'])

Now let's combine the registrant genre counts with the publisher-pairing genre counts, in one dataset for fiction and one for nonfiction.

In [612]:
all_fiction_genre_info = pd.merge(fiction_counts, reg_fict_counts, how='outer', on='fiction')
all_nonfiction_genre_info = pd.merge(nonfiction_counts, reg_nonfict_counts, how='outer', on='nonfiction')
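
Because these are outer merges, a genre that appears on only one side keeps its row with a blank count on the other side. A toy sketch with hypothetical counts shows how to pull out genres that registrants write in but that no agent-editor pairing covers, which is probably the most useful thing to scan for in the frequency files:

```python
import pandas as pd

# Hypothetical counts: pairings covering each genre vs. registrants writing in it
fiction_counts_example = pd.DataFrame({'fiction': ['Fantasy', 'Mystery'], 'fiction_count': [3, 1]})
reg_counts_example = pd.DataFrame({'fiction': ['Mystery', 'Romance'], 'registrant_fiction_counts': [5, 2]})

combined = pd.merge(fiction_counts_example, reg_counts_example, how='outer', on='fiction')

# Genres with registrants but no covering pairing have a missing fiction_count
uncovered = combined.loc[combined['fiction_count'].isna(), 'fiction'].tolist()
print(combined)
print(uncovered)
```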

In [613]:
all_fiction_genre_info.to_excel(f"{current_conference_folder}/Outputs/Frequencies_fiction.xlsx", index=False)
all_nonfiction_genre_info.to_excel(f"{current_conference_folder}/Outputs/Frequencies_nonfiction.xlsx", index=False)

Whew! We're finally done with that. You can manually review the files saved above to help figure out the agent-editor pairings, double-check them, or just see which fiction and nonfiction genres are really popular. Otherwise, we're good to move on to actually assigning participants to their Friday timeslots.

#### Assign participants to their Friday timeslots

Note that in the prior sections of code, we created:

1) The rooms for Friday with the agents/editors assigned to them (final_room_pairings_Friday)
2) The list of all time slots and rooms for Friday (tslist_fri)
3) The list of all participants who signed up for a query letter critique (expanded_query_critiques), which has one row per query letter critique each person signed up for

Before moving on though, let's join #1 and #2.

In [614]:
times_friday = tslist_fri.merge(final_room_pairings_Friday, on=['day', 'room_name'], how='outer').sort_values(['timeslot_start', 'room_name'])

Let's also bring in the information on which combined genres those publisher pairings represent.

In [615]:
times_friday2 = times_friday.merge(final_cross_pubs[['pubname1', 'pubname2', 'combined_fiction', 'combined_nonfiction']], on=['pubname1', 'pubname2'], how='inner')

In [616]:
del(tslist_fri, times_friday, reg_nonfict_counts, reg_fict_counts, rooms_friday, nonfiction_counts, fiction_counts, all_fiction_genre_info, all_nonfiction_genre_info)

Sweet! Now let's begin the assignments.

In this section of code, we will assign each participant to an agent-editor pairing that they're not already pitching to or getting a manuscript critique from (if any), and that represents at least one of the genres (fiction and/or nonfiction) the registrant writes in.

Note that we will not schedule anyone back-to-back. We will also prioritize virtual people for the first sessions (ideally with virtual people followed by other virtual people), and prioritize anyone who signed up for the Friday workshop for the earlier sessions.
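
The rules above can be sketched as a single check on plain dictionaries. This is a simplified illustration, not the notebook's code: the keys (fiction, pub1, start, and so on) are hypothetical stand-ins for the real columns, and in the cells below these rules are split between is_slot_compatible and the loop inside assign_participants.

```python
from datetime import datetime, timedelta

def can_assign(participant, slot, existing_slots):
    """Sketch of the Friday QLC rules described above (simplified, dict-based)."""
    # Rule 1: at least one fiction or nonfiction genre in common
    shares_genre = bool(set(participant['fiction']) & set(slot['fiction'])
                        or set(participant['nonfiction']) & set(slot['nonfiction']))
    if not shares_genre:
        return False
    # Rule 2: not a pairing they're already pitching to or getting an MS critique from
    if set(participant['chosen_pubs']) & {slot['pub1'], slot['pub2']}:
        return False
    # Rule 3: workshop attendees are not scheduled at 4pm or later
    if participant['workshop'] and slot['start'].hour >= 16:
        return False
    # Rule 4: no back-to-back meetings (slots within 15 minutes of an existing one)
    if any(abs(slot['start'] - s) <= timedelta(minutes=15) for s in existing_slots):
        return False
    return True

writer = {'fiction': ['Mystery'], 'nonfiction': [], 'chosen_pubs': ['Agent A'], 'workshop': True}
slot_ok = {'fiction': ['Mystery', 'Thriller'], 'nonfiction': [], 'pub1': 'Agent B', 'pub2': 'Editor C',
           'start': datetime(2025, 5, 2, 14, 0)}
slot_late = dict(slot_ok, start=datetime(2025, 5, 2, 16, 15))
slot_conflict = dict(slot_ok, pub1='Agent A')

print(can_assign(writer, slot_ok, []))                                # allowed
print(can_assign(writer, slot_late, []))                              # workshop attendee after 4pm
print(can_assign(writer, slot_conflict, []))                          # already pitching to Agent A
print(can_assign(writer, slot_ok, [datetime(2025, 5, 2, 13, 45)]))    # back-to-back
```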

In [617]:
# First, we need to turn anything separated by a comma into a list (existing lists are re-stripped; blanks/NaN become empty lists)
def split_to_list(x):
    if isinstance(x, str):
        return [item.strip() for item in x.split(',')]
    if isinstance(x, list):
        return [item.strip() for item in x]
    return []

times_friday2['combined_nonfiction'] = times_friday2['combined_nonfiction'].apply(split_to_list)
times_friday2['combined_fiction'] = times_friday2['combined_fiction'].apply(split_to_list)

expanded_query_critiques['nonfiction_genre'] = expanded_query_critiques['Nonfiction genre'].apply(split_to_list)
expanded_query_critiques['fiction_genre'] = expanded_query_critiques['Fiction genre'].apply(split_to_list)
expanded_query_critiques['pitches_chosen_pubs'] = expanded_query_critiques['pitches_chosen_pubs'].apply(split_to_list)
expanded_query_critiques['ms_chosen_pubs'] = expanded_query_critiques['ms_chosen_pubs'].apply(split_to_list)

In [618]:
# Combine the publishers into a single list
expanded_query_critiques['chosen_pubs'] = expanded_query_critiques['pitches_chosen_pubs'] + expanded_query_critiques['ms_chosen_pubs']

In [619]:
# Quick check before we proceed - does anyone have duplicate assignments? As in, does the same email and publisher (pubname1) combo appear twice?
check = expanded_query_critiques[['Email', 'pubname1']].value_counts().reset_index(name='counts')
check = check[check['counts'] > 1]
print(check)

Empty DataFrame
Columns: [Email, pubname1, counts]
Index: []


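For assignment purposes, anyone with a prior assignment or request (a timeslot_start or pubname1 already filled in) is sorted to the top first. After that, we're going to prioritize people according to the following:
1) Virtual
2) How many publishers they signed up with for manuscript critiques and/or pitches
3) Friday workshop attendees
4) Registration date (earliest first)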
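
A toy check of how this kind of priority sort behaves (hypothetical participants): sorting a True/False column with ascending=False puts True first, so every "prioritize if True" column needs ascending=False, while registration date uses ascending=True so the earliest registrants come first.

```python
import pandas as pd

# Hypothetical participants; 'datetime' is the registration timestamp
people = pd.DataFrame({
    'Email': ['a', 'b', 'c', 'd'],
    'Virtual': [False, True, False, False],
    'chosen_pubs_count': [1, 0, 2, 2],
    'Friday_workshop': [True, False, False, True],
    'datetime': pd.to_datetime(['2025-03-01', '2025-02-01', '2025-01-15', '2025-03-10']),
})

# Virtual first, then more chosen publishers, then workshop attendees, then earliest registration
ordered = people.sort_values(
    by=['Virtual', 'chosen_pubs_count', 'Friday_workshop', 'datetime'],
    ascending=[False, False, False, True],
)
print(ordered['Email'].tolist())
```

Here "b" is first because they're virtual, and "d" beats "c" (same number of chosen publishers) because "d" is a workshop attendee.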

In [620]:
# Sort participants by prioritization criteria (sorting a True/False column descending puts True first)
expanded_query_critiques['chosen_pubs_count'] = expanded_query_critiques['chosen_pubs'].apply(len)
expanded_query_critiques.sort_values(
    by=['timeslot_start', 'pubname1', 'Virtual', 'chosen_pubs_count', 'Friday_workshop', 'datetime'],
    ascending=[True, True, False, False, False, True],  # Friday_workshop descending so attendees get priority
    inplace=True
)

In [621]:
# Convert the 'timeslot_start' to datetime variable
times_friday2['timeslot_start'] = pd.to_datetime(date_str_fri + ' ' + times_friday2['timeslot_start'].astype(str))

# Adjust the times to represent the afternoon (add 12 hours if in AM range)
times_friday2['timeslot_start'] = times_friday2['timeslot_start'].apply(
    lambda x: x + pd.Timedelta(hours=12) if x.hour < 12 else x
)
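
The apply/lambda above can also be written in vectorized form with Series.where. This sketch uses hypothetical slot strings and, like the original, assumes every Friday slot is in the afternoon: a genuine morning slot such as 9:00 would be shifted to 21:00.

```python
import pandas as pd

date_str_fri = '2025-05-02'

# Hypothetical slot times as they might come out of a spreadsheet (12-hour clock, no AM/PM)
times = pd.Series(['1:00', '3:45', '12:15'])

starts = pd.to_datetime(date_str_fri + ' ' + times, format='%Y-%m-%d %H:%M')

# Keep times already at noon or later; add 12 hours to anything that parsed as morning
starts = starts.where(starts.dt.hour >= 12, starts + pd.Timedelta(hours=12))
result = starts.dt.strftime('%H:%M').tolist()
print(result)
```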

Note that if everyone has already been assigned, the very first seed (seed=0) should succeed and place everyone in their prior slot.

In [622]:
assignments = []

# Create a new dataset with all the people:
participants_df = expanded_query_critiques.copy()
slots_df = times_friday2

# Helper function to check slot compatibility
def is_slot_compatible(participant, slot):

    # Check if at least one genre matches
    participant_fiction_genres = set(participant['fiction_genre'])
    slot_fiction_genres = set(slot['combined_fiction'])

    participant_nonfiction_genres = set(participant['nonfiction_genre'])
    slot_nonfiction_genres = set(slot['combined_nonfiction'])

    if not (participant_fiction_genres & slot_fiction_genres or participant_nonfiction_genres & slot_nonfiction_genres):
        #print(f"Incompatible due to genres.") 
        return False

    # Check publisher overlap
    participant_pubs = set(participant['chosen_pubs'])
    slot_pubs = {slot['pubname1'], slot['pubname2']}
    if participant_pubs.intersection(slot_pubs):
        #print(f"Incompatible due to publisher overlap. Participant publishers: {participant_pubs}, Slot publishers: {slot_pubs}")
        return False

    # Check if they're a workshop person and don't assign for 4pm or later
    if participant['Friday_workshop'] and slot['timeslot_start'].hour >= 16:
        return False

    return True

# Second helper function, specifically for people with hardcoded requests. For them, genre overlap doesn't matter; we just need to not schedule them at 4pm or later if they're a workshop attendee
def is_slot_compatible_requests(participant, slot):

    # Check if they're a workshop person and don't assign for 4pm or later
    if participant['Friday_workshop'] and slot['timeslot_start'].hour >= 16:
        return False

    return True

# Define a function to perform the assignment process
def assign_participants(participants_df, slots_df, seed):

    # Shuffle slots_df with the given seed
    randomized_slots = slots_df.sample(frac=1, random_state=seed).reset_index(drop=True)
    assignments_local = []

    # Create a copy of participants_df to modify
    remaining_participants = participants_df.copy()

    for _, slot in randomized_slots.iterrows():
        if remaining_participants.empty:
            break  # Exit if all participants are assigned

        for index, participant in remaining_participants.iterrows():
            # Skip participants already assigned to conflicting slots
            assigned_slots = [a['timeslot_start'] for a in assignments_local if a['Email'] == participant['Email']]
            if any(abs(slot['timeslot_start'] - assigned) <= timedelta(minutes=15) for assigned in assigned_slots):
                continue

            # Skip if participant has already been assigned to either of these publishers
            participant_assignments = [a for a in assignments_local if a['Email'] == participant['Email']]
            assigned_pubs = {a['pubname1'] for a in participant_assignments} | {a['pubname2'] for a in participant_assignments}

            if slot['pubname1'] in assigned_pubs or slot['pubname2'] in assigned_pubs:
                continue

            requested_timeslots = participant.get('timeslot_start')

            participant_p1 = participant.get('pubname1')
            participant_p2 = participant.get('pubname2')
            slot_p1 = slot['pubname1']
            slot_p2 = slot['pubname2']

            requested_publishers = {
                pub for pub in [participant_p1, participant_p2] if pd.notna(pub)
            }

            slot_publishers = {pub for pub in [slot_p1, slot_p2] if pd.notna(pub)}

            has_requested_publisher = bool(requested_publishers)  # True only if they actually requested publishers

            if has_requested_publisher:  # If they requested ANY publisher
                #print(f"🔎 {participant['Email']} has requested publishers: {participant_p1}, {participant_p2}")

                # Case 1: If they requested a specific timeslot, enforce that too
                if requested_publishers == slot_publishers and pd.notna(requested_timeslots) and requested_timeslots == slot['timeslot_start']:
                    #print(f"✅ Assigning {participant['Email']} to requested slot with {slot_p1}, {slot_p2} and start time")

                    assignments_local.append({
                        'Email': participant['Email'],
                        'timeslot_start': slot['timeslot_start'],
                        'room_name': slot['room_name'],
                        'pubname1': slot['pubname1'],
                        'pubname2': slot['pubname2'],
                        'participant_fiction_genre': ', '.join(participant['fiction_genre']),
                        'participant_nonfiction_genre': ', '.join(participant['nonfiction_genre']),
                        'publisher_fiction_genre': slot['combined_fiction'],
                        'publisher_nonfiction_genre': slot['combined_nonfiction'],
                        'workshop': participant['Friday_workshop'],
                        'virtual': participant['Virtual']
                    })
                    remaining_participants.drop(index, inplace=True)
                    break

                # Case 2: If they didn't specify a time, assign them as long as publisher matches
                elif requested_publishers == slot_publishers and pd.isna(requested_timeslots):  
                    #print("Requested publishers but not a specific time")
                    if is_slot_compatible_requests(participant, slot):
                        
                        assignments_local.append({
                            'Email': participant['Email'],
                            'timeslot_start': slot['timeslot_start'],
                            'room_name': slot['room_name'],
                            'pubname1': slot['pubname1'],
                            'pubname2': slot['pubname2'],
                            'participant_fiction_genre': ', '.join(participant['fiction_genre']),
                            'participant_nonfiction_genre': ', '.join(participant['nonfiction_genre']),
                            'publisher_fiction_genre': slot['combined_fiction'],
                            'publisher_nonfiction_genre': slot['combined_nonfiction'],
                            'workshop': participant['Friday_workshop'],
                            'virtual': participant['Virtual']
                        })
                        remaining_participants.drop(index, inplace=True)
                        break
                    else:
                        # The requested publishers match, but this slot breaks the workshop rule, so keep looking
                        continue
                else:
                    # This slot isn't their requested publisher pairing (or doesn't match their requested time)
                    continue
            else:
                #print('Participant had no requests')
                if is_slot_compatible(participant, slot):
                    assignments_local.append({
                        'Email': participant['Email'],
                        'timeslot_start': slot['timeslot_start'],
                        'room_name': slot['room_name'],
                        'pubname1': slot['pubname1'],
                        'pubname2': slot['pubname2'],
                        'participant_fiction_genre': ', '.join(participant['fiction_genre']),
                        'participant_nonfiction_genre': ', '.join(participant['nonfiction_genre']),
                        'publisher_fiction_genre': slot['combined_fiction'],
                        'publisher_nonfiction_genre': slot['combined_nonfiction'],
                        'workshop': participant['Friday_workshop'],
                        'virtual': participant['Virtual']
                    })
                    # Remove the assigned participant row
                    remaining_participants.drop(index, inplace=True)
                    break

    return assignments_local, remaining_participants

# Attempt to assign participants with different seeds until successful
seed = 0

while seed < 1000:
    #print(f"Trying seed: {seed}")
    assignments, remaining = assign_participants(participants_df, slots_df, seed)
    
    if remaining.empty:  # All participants assigned
        print(f"All participants successfully assigned with seed: {seed}")
        break

    print(f"{len(remaining)} Unassigned participants remain with seed: {seed}.")
    seed += 1  # Increment the seed for the next iteration

# Convert assignments to a DataFrame
assignments_df = pd.DataFrame(assignments)
remaining = pd.DataFrame(remaining)
del(seed)

All participants successfully assigned with seed: 0
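
The seed loop above works because each seed gives a different, but reproducible, slot order, and a greedy first-fit pass can fail under one order and succeed under another. A toy sketch with hypothetical people and slots (it uses Python's random.Random instead of DataFrame.sample, purely to keep it dependency-free):

```python
import random

def try_assign(people, slots, seed):
    """Toy greedy pass: shuffle slots with a seed, give each slot to the first person who fits."""
    order = slots[:]
    random.Random(seed).shuffle(order)
    remaining = list(people)
    placed = {}
    for slot in order:
        for person in remaining:
            if slot in person['allowed']:
                placed[person['name']] = slot
                remaining.remove(person)
                break  # stop scanning people once this slot is filled
    return placed, remaining

# 'bo' is first in priority order, but only 'ann' can take slot A
people = [{'name': 'bo', 'allowed': {'A', 'B'}}, {'name': 'ann', 'allowed': {'A'}}]
slots = ['A', 'B']

for seed in range(100):
    placed, remaining = try_assign(people, slots, seed)
    if not remaining:
        break
print(seed, placed)
```

If slot A comes up first, "bo" takes it and "ann" is left out; any seed that puts slot B first succeeds, which is exactly why the notebook keeps trying seeds.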


In [623]:
print(len(participants_df))
print(len(expanded_query_critiques)== len(assignments_df))

108
True


In [624]:
# Check which people didn't get assigned to figure out why
print(remaining)

Empty DataFrame
Columns: [Email, Virtual, Fiction genre, Nonfiction genre, datetime, pitchA, pitchB, pitchC, msA, msB, msC, pitches_chosen_pubs, ms_chosen_pubs, num_query_critiques, query_critique, Friday_workshop, pubname1, pubname2, timeslot_start, nonfiction_genre, fiction_genre, chosen_pubs, chosen_pubs_count]
Index: []


In [625]:
# Check: are any pairings left with fewer than 12 assignments?
check = assignments_df[['pubname1', 'pubname2']].value_counts().reset_index(name='counts')
check = check[check['counts'] <12]
print(check)

Empty DataFrame
Columns: [pubname1, pubname2, counts]
Index: []


In [626]:
# Run the same check on the requests/prior assignments, before this round of assignment
check = expanded_query_critiques[['pubname1', 'pubname2']].value_counts().reset_index(name='counts')
check = check[check['counts'] <12]
print(check)

Empty DataFrame
Columns: [pubname1, pubname2, counts]
Index: []


Woohoo! That looks amazing, and all participants got slotted! Let's just add a few checks to flag any assignments that break our scheduling rules.

In [627]:
# Flag participants with a Friday workshop and slots from 4-5pm
assignments_df['Flag'] = assignments_df.apply(
    lambda x: (x['workshop']) and (x['timeslot_start'].hour >= 16),  # After 4pm check
    axis=1
)

In [628]:
print(assignments_df['Flag'].value_counts())

Flag
False    108
Name: count, dtype: int64


In [629]:
# Who are the people being assigned spots after 4pm that are also in the workshop?
print(assignments_df.loc[assignments_df['Flag']==True])

Empty DataFrame
Columns: [Email, timeslot_start, room_name, pubname1, pubname2, participant_fiction_genre, participant_nonfiction_genre, publisher_fiction_genre, publisher_nonfiction_genre, workshop, virtual, Flag]
Index: []


Okay, awesome! We're good to go. Let's save the assignments as a final (better-named) dataset and clean up the intermediate objects.

In [630]:
final_friday_assignments = assignments_df

del(assignments_df, participants_df, slots_df, is_slot_compatible_requests, clean_genres, combine_variables, count_genres,
    is_slot_compatible, genre_counts, assignments, hardcoded_qlc, hardcoded, expanded_query_critiques, email_counts, email_rows)

Next, create a column called 'publisher' that combines the two publishers' names, and add a variable 'Session' set to 'Query Letter Critiques'. Also add a 'Timekeeper' variable, which is True/False depending on whether the participant is a timekeeper that day.

In [631]:
final_friday_assignments['publisher'] = final_friday_assignments['pubname1'] + " and " + final_friday_assignments['pubname2']
final_friday_assignments['Session'] = "Query Letter Critiques"
final_friday_assignments['Timekeeper'] = final_friday_assignments['Email'].isin(timekeepers['Email'])

Lastly, let's link in the first and last names, as well as phone numbers.

In [632]:
final_friday_assignments2 = pd.merge(final_friday_assignments, registered[['Email', 'First Name', 'Last Name', 'phone']].drop_duplicates(), on="Email", how="inner")

In [633]:
# Quick final check before we save the dataset - does anyone have duplicate assignments? As in, does the same email and publisher (pubname1) combo appear twice?
check = final_friday_assignments2[['Email', 'pubname1']].value_counts().reset_index(name='counts')
check = check[check['counts'] > 1]
print(check)

Empty DataFrame
Columns: [Email, pubname1, counts]
Index: []


In [634]:
final_friday_assignments2.to_excel(f"{current_conference_folder}/Outputs/Finalized datasets/Final Friday query letter critique assignments_{today}.xlsx", index=False)

Sweet. Friday's query letter critiques are done, so let's move on to Friday's author coaching sessions, which are easier: participants signed up to meet with a specific coach, so we just need to assign them to specific times.

## 4. Friday Author Coaching

Okay, first we need to convert the coaching timeslots (tslist_coach) to datetimes.

In [635]:
# Convert the 'timeslot_start' to datetime variable
tslist_coach['timeslot_start'] = pd.to_datetime(date_str_fri + ' ' + tslist_coach['timeslot_start'].astype(str))

# Adjust the times to represent the afternoon (add 12 hours if in AM range)
tslist_coach['timeslot_start'] = tslist_coach['timeslot_start'].apply(
    lambda x: x + pd.Timedelta(hours=12) if x.hour < 12 else x
)

In [636]:
# We need to create an alternate version of the friday assignments dataset so that we can add buffer times for our purposes here:
publisher_meetings = final_friday_assignments2.copy()
publisher_meetings['timeslot_end'] = publisher_meetings['timeslot_start'] + timedelta(minutes=15)
publisher_meetings['buffer_start'] = publisher_meetings['timeslot_start'] - timedelta(minutes=15)
publisher_meetings['buffer_end'] = publisher_meetings['timeslot_end'] + timedelta(minutes=15)

In [637]:
# We also need to add end times for the coaching meetings. Note that they're technically 15 minutes, but we'll treat them as 17 since there's a two-minute break between them
tslist_coach['timeslot_end'] = tslist_coach['timeslot_start'] + timedelta(minutes=17)
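
The conflict check in compatible_with_slot further down is the standard interval-overlap test: two time windows overlap if and only if each one starts before the other ends. A small sketch with a hypothetical 2:00pm QLC, padded by 15 minutes on each side as in publisher_meetings, and a 17-minute coaching slot:

```python
from datetime import datetime, timedelta

def overlaps(start_a, end_a, start_b, end_b):
    """Two half-open intervals [start, end) overlap iff each starts before the other ends."""
    return start_a < end_b and end_a > start_b

qlc_start = datetime(2025, 5, 2, 14, 0)
# 15-minute QLC padded by 15 minutes on each side: buffer runs 1:45pm to 2:30pm
buffer_start = qlc_start - timedelta(minutes=15)
buffer_end = qlc_start + timedelta(minutes=15) + timedelta(minutes=15)

def coaching_conflicts(coach_start):
    coach_end = coach_start + timedelta(minutes=17)
    return overlaps(coach_start, coach_end, buffer_start, buffer_end)

print(coaching_conflicts(datetime(2025, 5, 2, 13, 20)))  # ends 1:37pm, before the 1:45pm buffer
print(coaching_conflicts(datetime(2025, 5, 2, 13, 30)))  # ends 1:47pm, inside the buffer
print(coaching_conflicts(datetime(2025, 5, 2, 14, 30)))  # starts exactly when the buffer ends
```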

We need to transform the publisher_meetings dataset so that it has one row per participant. This way we can merge it with the coaching list.

In [638]:
# Add a count per Email to identify qlc1 and qlc2
publisher_meetings['qlc_number'] = publisher_meetings.groupby('Email').cumcount() + 1

# Pivot the table to wide format
qlc_times_to_exclude = publisher_meetings.pivot(index='Email', columns='qlc_number', values=['buffer_start', 'buffer_end'])

# Flatten the MultiIndex columns
qlc_times_to_exclude.columns = [f"qlc{col[1]}_{col[0]}" for col in qlc_times_to_exclude.columns]

# Reset index to make 'Email' a column again
qlc_times_to_exclude = qlc_times_to_exclude.reset_index()
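
To see what this reshaping produces, here is the same cumcount + pivot on hypothetical rows. The final reindex is an optional extra, not part of the cell above: the merge in the next cell selects the qlc2_ columns by name, and reindex guarantees they exist (filled with NaT) even if nobody has a second QLC.

```python
import pandas as pd

meetings = pd.DataFrame({
    'Email': ['a@x.com', 'a@x.com', 'b@x.com'],
    'buffer_start': pd.to_datetime(['2025-05-02 13:45', '2025-05-02 15:15', '2025-05-02 14:00']),
    'buffer_end': pd.to_datetime(['2025-05-02 14:30', '2025-05-02 16:00', '2025-05-02 14:45']),
})

# Number each person's meetings 1, 2, ... in their current row order
meetings['qlc_number'] = meetings.groupby('Email').cumcount() + 1

wide = meetings.pivot(index='Email', columns='qlc_number', values=['buffer_start', 'buffer_end'])
wide.columns = [f"qlc{num}_{name}" for name, num in wide.columns]
wide = wide.reset_index()

# Fix the column order and make sure the qlc2_ columns exist; people with one QLC get NaT there
wide = wide.reindex(columns=['Email', 'qlc1_buffer_start', 'qlc1_buffer_end', 'qlc2_buffer_start', 'qlc2_buffer_end'])
print(wide)
```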

In [639]:
# Merge this data with the coaching dataset
reg_coaching2 = pd.merge(reg_coaching, qlc_times_to_exclude[['Email', 'qlc1_buffer_start', 'qlc1_buffer_end', 'qlc2_buffer_start', 'qlc2_buffer_end']], how="left", on="Email")

If there are any hardcoded or prior requests, merge this info into the reg_coaching2 dataset.

In [640]:
# Now let's merge it with the filtered_coach_allrequests dataset (assuming it exists)
if 'filtered_coach_allrequests' in locals() and not filtered_coach_allrequests.empty: # We only want to run this code IF the filtered_coach_allrequests dataset exists
    coach_toassign = pd.merge(reg_coaching2, filtered_coach_allrequests, on='Email', how='left').drop_duplicates()
else:
    # Work on a copy so the blank request columns aren't added to reg_coaching2 itself
    coach_toassign = reg_coaching2.copy()
    coach_toassign['publisher'] = np.nan
    coach_toassign['timeslot_start'] = pd.NaT

In [641]:
del(reg_coaching, qlc_times_to_exclude, publisher_meetings, reg_coaching2)

In [642]:
# Lastly, we also need to add a flag for whether or not the person is attending the Q&A
fri_qa_panel = registered.loc[registered['Agenda Item Name'].str.contains("Q&A", na=False)]  # na=False so blank agenda items don't break the filter
coach_toassign['QA_panel'] = coach_toassign['Email'].isin(fri_qa_panel['Email'])

In [643]:
# Sort participants by prioritization criteria
coach_toassign.sort_values(
    by=['timeslot_start', 'publisher', 'QLC', 'Virtual', 'QA_panel', 'Friday_workshop'],
    ascending=[True, True, False, False, False, False],
    inplace=True
)

In [644]:
from datetime import time

coaching_timeslots = tslist_coach.copy()
participants_df = coach_toassign.copy()

# ---- Helper function to check slot compatibility
def compatible_with_slot(participant, slot):

    # 1a. Check if they're a workshop person and don't assign to any of the timeslots that'd run over into the workshop time
    if participant['Friday_workshop'] and slot['timeslot_start'].time() >= time(15, 45):
        return False

    # 1b. Check if they're part of the Q&A panel (doing 1:40 so they have time to get there 15 mins in advance)
    if participant['QA_panel'] and time(12,20) <= slot['timeslot_start'].time() <= time(13, 40):
        return False

    # 2. Check whether the slot (start time plus 17 minutes) conflicts with the buffered Query Letter Critique meetings
    slot_end = slot['timeslot_start'] + timedelta(minutes=17)

    # Check QLC1 conflict
    qlc1_conflict = (
        pd.notna(participant['qlc1_buffer_start']) and
        pd.notna(participant['qlc1_buffer_end']) and
        slot['timeslot_start'] < participant['qlc1_buffer_end'] and
        slot_end > participant['qlc1_buffer_start']
    )

    # Check QLC2 conflict
    qlc2_conflict = (
        pd.notna(participant['qlc2_buffer_start']) and
        pd.notna(participant['qlc2_buffer_end']) and
        slot['timeslot_start'] < participant['qlc2_buffer_end'] and
        slot_end > participant['qlc2_buffer_start']
    )

    if qlc1_conflict or qlc2_conflict:
        return False

    # 3. Lastly, and most obviously, make sure that the participant's coach is the same as the coach for the slot being evaluated
    if participant['Coach'] != slot['coach']:
        return False
    
    return True


# ----- Now run the actual code to assign people
def assign_coaching_meetings(participants_df, slots_df, seed):

    # Shuffle timeslots with the given seed
    randomized_timeslots = slots_df.sample(frac=1, random_state=seed).reset_index(drop=True)
    coaching_schedule = []
    remaining_participants = participants_df.copy()

    for _, slot in randomized_timeslots.iterrows():
        if remaining_participants.empty:
            break  # Exit if all participants are assigned

        for index, participant in remaining_participants.iterrows():

            # Skip participants already assigned to conflicting slots and move to checking the next participant for compatibility
            assigned_slots = [a['timeslot_start'] for a in coaching_schedule if a['Email'] == participant['Email']]
            if any(abs(slot['timeslot_start'] - assigned) <= timedelta(minutes=17) for assigned in assigned_slots):
                continue

            # Check for any requests or prior assignments
            requested_timeslots = participant.get('timeslot_start')
            requested_coach = participant.get('publisher')

            # If they have requests or prior assignments...
            if pd.notna(requested_coach):
                
                # Scenario 1: They requested or were assigned to a specific timeslot. Check that the time matches, then assign them if so
                if (
                    requested_coach == slot['coach'] and
                    pd.notna(requested_timeslots) and 
                    requested_timeslots == slot['timeslot_start']
                ):
                    coaching_schedule.append({
                        'Email': participant['Email'],
                        'timeslot_start': slot['timeslot_start'],
                        'publisher': slot['coach'],
                        'room_name': slot['room_name']
                    })
                    # Remove the assigned participant row
                    remaining_participants.drop(index, inplace=True)
                    break

                # Scenario 2: They requested a specific coach but didn't care when. Check that they don't have problems with the workshop, then assign
                elif (
                    requested_coach == slot['coach'] and 
                    pd.isna(requested_timeslots) and
                    compatible_with_slot(participant, slot)
                ):
                    coaching_schedule.append({
                        'Email': participant['Email'],
                        'timeslot_start': slot['timeslot_start'],
                        'publisher': slot['coach'],
                        'room_name': slot['room_name']
                    })
                    # Remove the assigned participant row
                    remaining_participants.drop(index, inplace=True)
                    break

                # Scenario 3: They requested a specific coach, but the slot's coach is the wrong one, OR they didn't meet slot compatibility: fall through to the next participant

            # If they don't have any requests or prior assignments:
            elif pd.isna(requested_coach):
                if compatible_with_slot(participant, slot):
                    coaching_schedule.append({
                        'Email': participant['Email'],
                        'timeslot_start': slot['timeslot_start'],
                        'publisher': slot['coach'],
                        'room_name': slot['room_name']
                    })
                    # Remove the assigned participant row
                    remaining_participants.drop(index, inplace=True)
                    break

                # If the timeslot and coach don't meet their requirements, fall through to the next participant

    return coaching_schedule, remaining_participants


# Attempt to assign participants with different seeds until successful
for seed in range(1000):
    assignments, remaining = assign_coaching_meetings(participants_df, coaching_timeslots, seed)

    if remaining.empty:  # All participants assigned
        print(f"All participants successfully assigned with seed: {seed}")
        break

    print(f"{len(remaining)} unassigned participants remain with seed: {seed}, retrying...")
else:
    # Only reached if none of the 1000 seeds assigned everyone
    print(f"⚠️ Warning: {len(remaining)} participants are still unassigned after trying 1000 seeds.")

# Convert assignments to a DataFrame
assignments_df = pd.DataFrame(assignments)
del(seed)

All participants successfully assigned with seed: 0
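The QLC conflict test inside `compatible_with_slot` is the standard half-open interval overlap check: two intervals overlap exactly when each one starts before the other ends. The sketch below isolates that test with a hypothetical 14:00-14:30 QLC buffer and the 17-minute coaching length; `overlaps` is an illustrative helper, not a function in this notebook.

```python
from datetime import datetime, timedelta

MEETING_LENGTH = timedelta(minutes=17)  # coaching slot length used in compatible_with_slot


def overlaps(slot_start, buffer_start, buffer_end, length=MEETING_LENGTH):
    """True if [slot_start, slot_start + length) intersects [buffer_start, buffer_end)."""
    slot_end = slot_start + length
    return slot_start < buffer_end and slot_end > buffer_start


# Hypothetical QLC buffer window: 14:00-14:30
buf_start = datetime(2025, 5, 2, 14, 0)
buf_end = datetime(2025, 5, 2, 14, 30)

print(overlaps(datetime(2025, 5, 2, 13, 45), buf_start, buf_end))  # True: runs until 14:02
print(overlaps(datetime(2025, 5, 2, 13, 43), buf_start, buf_end))  # False: ends exactly at 14:00
print(overlaps(datetime(2025, 5, 2, 14, 30), buf_start, buf_end))  # False: starts exactly at 14:30
```

Because the comparisons are strict, a coaching slot may end exactly when the buffer starts (or start exactly when it ends) without counting as a conflict.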


In [645]:
# Check that the numbers match
len(assignments_df) == len(coach_toassign)

True

In [646]:
del(coaching_timeslots, assign_coaching_meetings, assign_participants)

In [647]:
# Add in the first and last names
final_coaching = pd.merge(assignments_df, registered[['Email', 'First Name', 'Last Name', 'phone']].drop_duplicates(), on="Email", how="inner")

In [648]:
# Save this scheduling
final_coaching.to_excel(f"{current_conference_folder}/Outputs/Finalized datasets/Finalized coaching schedule_{today}.xlsx", index=False)

## 5. Manuscript critiques scheduling
This is the easiest scheduling assignment. Everyone has already signed up for their critiques, so we just need to make sure:

1) Nobody is scheduled back-to-back (for anyone with multiple)
2) Timekeepers aren't given the first or last time slot
3) Virtual participants are grouped back-to-back (we will prioritize them for the first time slots per room).
4) Any manual requests or prior scheduling assignments are applied

Ideally, we also try to ensure that there's only one substitute for each time slot, though we have plenty of substitute timekeepers. This can just be a manual check and fix later on.

In [649]:
# Assign the individual publishers to their respective rooms for Saturday morning (MS) and Saturday afternoon (Pitches)
times_sat = pd.concat([rooms_saturday.reset_index(drop=True), pubs.reset_index(drop=True)], axis=1)

tslist_satmorn= pd.merge(timeslots.loc[(timeslots['day']=='Saturday') & (timeslots['day_session']=='Morning'), :], times_sat, how='outer', on='day')
tslist_sataft= pd.merge(timeslots.loc[(timeslots['day']=='Saturday') & (timeslots['day_session']=='Afternoon'), :], times_sat, how='outer', on='day')

In [650]:
final_saturday_rooms = pd.concat([rooms_saturday.reset_index(drop=True), pubs.reset_index(drop=True)], axis=1)
final_saturday_rooms.to_excel(f"{current_conference_folder}/Outputs/Finalized datasets/Final saturday rooms.xlsx")

We need to create a single dataset for the manuscript critiques in which every person has one row per critique (so a person can have up to three rows).
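For comparison, here is a minimal sketch of the same wide-to-long reshape using `DataFrame.melt`, on hypothetical data shaped like *ms_critiques*. One difference worth knowing: the outer-merge approach in the cells below collapses rows that are identical on Email, Virtual, and publisher (e.g., someone who booked two critiques with the same person), whereas `melt` keeps one row per booked slot.

```python
import pandas as pd

# Hypothetical wide table: one row per person, one column per critique slot
ms_critiques = pd.DataFrame({
    "Email": ["a@x.com", "b@x.com"],
    "Virtual": [False, True],
    "msA": ["Agent One", "Agent Two"],
    "msB": ["Agent Two", None],
    "msC": [None, None],
})

# Stack msA/msB/msC into a single 'publisher' column: one row per critique
ms_long = (
    ms_critiques
    .melt(id_vars=["Email", "Virtual"], value_vars=["msA", "msB", "msC"], value_name="publisher")
    .drop(columns="variable")           # which slot letter it came from isn't needed
    .dropna(subset=["publisher"])       # unused slots become NaN rows; drop them
    .reset_index(drop=True)
)
print(ms_long)
```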

In [651]:
del(msA, msB, msC, pitchA, pitchB, pitchC) # delete the earlier versions: they didn't include the Virtual info we need here
msA = ms_critiques[['Email', 'Virtual', 'msA']]
msB = ms_critiques[['Email', 'Virtual', 'msB']]
msC = ms_critiques[['Email', 'Virtual', 'msC']]

In [652]:
msA = msA.rename(columns={'msA': 'publisher'})
msB = msB.rename(columns={'msB': 'publisher'})
msC = msC.rename(columns={'msC': 'publisher'})

ms_all = pd.merge(pd.merge(msA, msB, on=['Email', 'Virtual', 'publisher'], how="outer"), msC, on=['Email', 'Virtual', 'publisher'], how='outer')

Drop any rows with NaN (e.g., the empty msB/msC slots for people with fewer than three critiques).

In [653]:
ms_all = ms_all.dropna()

In [654]:
del(msA, msB, msC)

Okay, now we just need to convert *timeslot_start* to a datetime variable.

In [655]:
# Convert the 'timeslot_start' to datetime variable
tslist_satmorn['timeslot_start'] = pd.to_datetime(date_str_sat + ' ' + tslist_satmorn['timeslot_start'].astype(str))

In [656]:
# Convert the 'timeslot_start' to datetime variable
tslist_sataft['timeslot_start'] = pd.to_datetime(date_str_sat + ' ' + tslist_sataft['timeslot_start'].astype(str))

# Adjust the times to represent the afternoon (add 12 hours if in AM range)
tslist_sataft['timeslot_start'] = tslist_sataft['timeslot_start'].apply(
    lambda x: x + pd.Timedelta(hours=12) if x.hour < 12 else x
)
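The +12-hour adjustment assumes the afternoon start times are stored on a 12-hour clock without AM/PM, so a 1:15 PM slot first parses as 01:15. A quick check of that logic on hypothetical values (12:xx is already afternoon and is left alone):

```python
import pandas as pd

date_str_sat = "2025-05-03"
# Hypothetical afternoon start times stored without AM/PM
raw_times = pd.Series(["01:00:00", "01:15:00", "12:45:00"])

ts = pd.to_datetime(date_str_sat + " " + raw_times)
# Anything that parsed before noon is really in the afternoon
ts = ts.apply(lambda x: x + pd.Timedelta(hours=12) if x.hour < 12 else x)
print([x.strftime("%H:%M") for x in ts])  # ['13:00', '13:15', '12:45']
```

This only works while no afternoon session runs past 11:59 PM-equivalent times or starts before noon, which holds for a single afternoon block.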

Let's also identify the timekeepers' emails. We'll make sure not to give them the first or last time slot.

In [657]:
timekeeps = timekeepers[['Email']].drop_duplicates()
timekeeps = timekeeps['Email'].tolist()
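The timekeeper rule, "avoid the first and last slots when possible, but take one if nothing else fits", reduces to a small predicate. A minimal sketch; `skip_edge_slot` and the `(time, publisher)` tuples for open slots are hypothetical simplifications of the row-by-row check inside the assignment cells:

```python
def skip_edge_slot(is_timekeeper, slot_time, publisher, open_slots, earliest, latest):
    """Hypothetical helper: return True if a timekeeper should pass on this slot.

    A timekeeper skips the first/last slot only when a non-edge slot with the
    same publisher is still open; otherwise they take the edge slot.
    open_slots: iterable of (time, publisher) pairs not yet used.
    """
    edges = (earliest, latest)
    if not is_timekeeper or slot_time not in edges:
        return False
    return any(pub == publisher and t not in edges for t, pub in open_slots)


# Hypothetical morning: 9:00 is the first slot and 10:40 the last
open_slots = [("9:20", "Agent One"), ("10:40", "Agent One")]
print(skip_edge_slot(True, "9:00", "Agent One", open_slots, "9:00", "10:40"))   # True: 9:20 is still open
print(skip_edge_slot(True, "9:00", "Agent Two", open_slots, "9:00", "10:40"))   # False: no alternative
print(skip_edge_slot(False, "9:00", "Agent One", open_slots, "9:00", "10:40"))  # False: not a timekeeper
```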

Lastly, we need to add any hard-coded requests or prior scheduling assignments to our dataset.

In [658]:
# Now let's merge it with the filtered_ms_allrequests dataset (assuming it exists)
if 'filtered_ms_allrequests' in locals() and not filtered_ms_allrequests.empty: # Only run this merge IF the filtered_ms_allrequests dataset exists
    ms_all_withrequests = pd.merge(ms_all, filtered_ms_allrequests, on=['Email', 'publisher'], how='left').drop_duplicates()
else:
    ms_all_withrequests = ms_all.copy()
    ms_all_withrequests['timeslot_start'] = pd.NaT # Only thing you can request is the timeslot, since you paid for an MS with a specific publisher

Whew! Okay, now it's time to assign the participants for the manuscript critiques. The code below works as follows:

1) It fills slots alphabetically by publisher name, so that Alexandria Brown gets all her timeslots filled first before it moves on to the next publisher in the alphabet. **NOTE**: I have it filling time slots randomly, not from earliest to latest.

2) It prioritizes participants according to how many manuscript critique slots they still need. For the first time slot it fills, it prioritizes people with 3 critiques, then 2, then 1. As it iterates and participants get assigned, the counts are updated, so a participant who started with 3 meetings but has already been scheduled for 2 (meaning *remaining_meetings* = 1) gets lower priority than participants who still have 2 or 3 meetings needing assignment.

3) It randomly shuffles participants within their priority groups, so participant emails are randomly ordered within the A) three-remaining, B) two-remaining, and C) one-remaining groups. This way we don't prioritize people by the alphabetical order of their emails. (I implemented this because I initially noticed that a lot of the T-Z emails weren't being assigned as readily.)
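The priority scheme in steps 2 and 3 amounts to "shuffle, then sort by *remaining_meetings*". A minimal sketch on hypothetical rows. One detail worth knowing: `sort_values` defaults to `kind='quicksort'`, which pandas does not guarantee to be stable, so with the default the within-group order is seed-dependent but not strictly the shuffled order; passing `kind='stable'` (as below) makes "random order within each priority group" exact.

```python
import pandas as pd

# Hypothetical rows: one row per critique that still needs a slot
participants = pd.DataFrame({
    "Email": ["a@x.com", "a@x.com", "a@x.com", "b@x.com", "b@x.com", "c@x.com", "d@x.com"],
})
participants["remaining_meetings"] = participants.groupby("Email")["Email"].transform("count")


def priority_order(df, seed):
    """Shuffle rows with a fixed seed, then sort by remaining_meetings (most first).

    kind="stable" keeps the shuffled order inside each remaining_meetings group,
    so ties are broken by the seeded shuffle rather than by the original row order.
    """
    return (
        df.sample(frac=1, random_state=seed)
        .sort_values(by="remaining_meetings", ascending=False, kind="stable")
    )


ordered = priority_order(participants, seed=1)
print(ordered["remaining_meetings"].tolist())  # [3, 3, 3, 2, 2, 1, 1]
print(ordered["Email"].tolist()[-2:])          # c@x.com and d@x.com, in seed-dependent order
```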

<font color='red'>**BIGGEST NOTE**:</font>
This entire assignment process is wrapped in one giant function so it can be run repeatedly with different random seeds until it finds a seed under which ALL participants get assigned time slots. Then it stops, and that seed number is the one kept.
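Stripped of the scheduling details, the retry strategy is: run a randomized assignment with seed 0, 1, 2, ... and keep the first seed that leaves nobody unassigned. A minimal sketch with a hypothetical toy assigner (`try_assign` and `assign_with_retries` are illustrative names, not functions in this notebook, and the sketch uses Python's `random` module rather than pandas' `random_state`):

```python
import random


def try_assign(people, slots, seed):
    """Toy greedy assigner (hypothetical): shuffle the slots with `seed`, then give
    each person the first shuffled slot they're allowed to take."""
    rng = random.Random(seed)
    order = list(slots)
    rng.shuffle(order)
    taken, schedule = set(), {}
    for person, allowed in people.items():
        for slot in order:
            if slot in allowed and slot not in taken:
                schedule[person] = slot
                taken.add(slot)
                break
    unassigned = [p for p in people if p not in schedule]
    return schedule, unassigned


def assign_with_retries(people, slots, max_attempts=1000):
    """Try seeds 0, 1, 2, ... and return the first (seed, schedule) that assigns everyone."""
    for seed in range(max_attempts):
        schedule, unassigned = try_assign(people, slots, seed)
        if not unassigned:
            return seed, schedule
    raise RuntimeError(f"No complete schedule found in {max_attempts} seeds")


# Hypothetical availability: "bo" can only take 9:00, so any seed that hands
# 9:00 to "ann" first fails and the loop moves on to the next seed.
people = {"ann": {"9:00", "9:20"}, "bo": {"9:00"}}
seed, schedule = assign_with_retries(people, ["9:00", "9:20", "9:40"])
print(f"Seed {seed} works: {schedule}")
```

Raising an error when every seed fails, instead of only printing, keeps later cells from quietly running on a partial schedule.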

In [659]:
# Define a function to perform the assignment process
def assign_slots_with_seed(participants_df, slots_df, seed):

    # Add a column to flag timekeepers in the participants dataset
    participants_df['is_timekeeper'] = participants_df['Email'].isin(timekeeps)

    # Create the blank datasets and lists for the assignments and used-up slots
    assignments = []
    used_slots = set()

    # Add a column to track the number of meetings each participant needs
    participants_df['remaining_meetings'] = participants_df.groupby('Email')['Email'].transform('count')

    # --- Handle preassigned participants (i.e., participants with requests or who were previously scheduled) ---
    preassigned_df = participants_df[participants_df['timeslot_start'].notna()].copy()
    to_assign_df = participants_df[participants_df['timeslot_start'].isna()].copy()

    for index, participant in preassigned_df.iterrows():
        target_time = participant['timeslot_start']

        matching_slots = slots_df[
            (slots_df['timeslot_start'] == target_time) &
            (slots_df['lit_guest_name'].apply(lambda x: set(x) == set(participant['publisher'])))
        ]

        # Assign to the first available matching slot
        for _, slot in matching_slots.iterrows():
            room_slot_id = (slot['timeslot_start'], slot['room_name'])
            if room_slot_id not in used_slots:
                assignments.append({
                    'Email': participant['Email'],
                    'timeslot_start': slot['timeslot_start'],
                    'room_name': slot['room_name'],
                    'publisher': slot['lit_guest_name'],
                    'virtual': participant['Virtual'],
                    'Session': "Manuscript critique",
                    'Timekeeper': participant['is_timekeeper']
                })
                used_slots.add(room_slot_id)
                break
        else:
            print(f"⚠️ Warning: Could not assign preassigned participant {participant['Email']} to their requested timeslot.")

    # Update participants_df to just those who still need assigning
    participants_df = to_assign_df.copy()

    # --- STEP 2: Continue with normal assignment ---
    # Repeat until all participants are assigned or no more slots remain
    while not participants_df.empty:
        assigned_any = False

        for _, slot in slots_df.iterrows():
            room_slot_id = (slot['timeslot_start'], slot['room_name'])

            if room_slot_id in used_slots:
                continue

            if participants_df.empty:
                break

            sorted_participants = (
                participants_df
                .sample(frac=1, random_state=seed)  # Shuffle randomly
                .sort_values(by='remaining_meetings', ascending=False)
            )

            for index, participant in sorted_participants.iterrows():

                # Skip back-to-back assignments
                assigned_slots = [
                    (a['timeslot_start'], a['room_name']) for a in assignments if a['Email'] == participant['Email']
                ]
                if any(
                    abs(slot['timeslot_start'] - assigned_time) <= timedelta(minutes=15)
                    for assigned_time, _ in assigned_slots
                ):
                    continue

                
                # Skip the earliest and latest timeslots for timekeepers if possible
                if participant['is_timekeeper'] and slot['timeslot_start'] in [earliest_time, latest_time]:
                    # Check if there are other slots available for this participant
                    has_alternative = any(
                        set(participant['publisher']) == set(alt_slot['lit_guest_name']) and
                        alt_slot['timeslot_start'] not in [earliest_time, latest_time] and
                        (alt_slot['timeslot_start'], alt_slot['room_name']) not in used_slots
                        for _, alt_slot in slots_df.iterrows()
                    )
                    if has_alternative:
                        continue  # A non-edge slot is still open for them, so leave this edge slot

                if set(participant['publisher']) == set(slot['lit_guest_name']):
                    assignments.append({
                        'Email': participant['Email'],
                        'timeslot_start': slot['timeslot_start'],
                        'room_name': slot['room_name'],
                        'publisher': slot['lit_guest_name'],
                        'virtual': participant['Virtual'],
                        'Session': "Manuscript critique",
                        'Timekeeper': participant['is_timekeeper']
                    })
                    used_slots.add(room_slot_id)
                    participants_df.drop(index, inplace=True)
                    participants_df['remaining_meetings'] = participants_df.groupby('Email')['Email'].transform('count')
                    assigned_any = True
                    break

        if not assigned_any:
            break

    return assignments, participants_df

# Initialize variables
success = False
max_attempts = 1000  # Limit the number of attempts
seed = 0

while not success and seed < max_attempts:
    seed += 1
    print(f"Trying seed {seed}...")
    
    # Copy the original dataframes to avoid modifying them directly
    participants_copy = ms_all_withrequests.copy()
    slots_copy = tslist_satmorn.copy()

    # Get the earliest and latest timeslots
    earliest_time = slots_copy['timeslot_start'].min()
    latest_time = slots_copy['timeslot_start'].max()

    # Run the assignment process with the current seed
    assignments, remaining_participants = assign_slots_with_seed(participants_copy, slots_copy, seed)

    # Check if all participants were assigned
    if remaining_participants.empty:
        success = True
        print(f"Success! All participants assigned using seed {seed}.")
        break

if success:
    # Convert assignments to a DataFrame
    ms_assignments = pd.DataFrame(assignments)
else:
    print("Failed to assign all participants within the maximum number of attempts.")


Trying seed 1...
Success! All participants assigned using seed 1.


Yay! This code works beautifully!! Everyone's been assigned and now let's just do a little cleaning, then repeat the process for the Saturday afternoon pitches.

In [660]:
final_satmorn_assignments = ms_assignments

Let's just add in the first and last names, plus phones.

In [661]:
final_satmorn_assignments2 = pd.merge(final_satmorn_assignments, registered[['Email', 'First Name', 'Last Name', 'phone']].drop_duplicates(), on='Email', how='inner')

In [662]:
del(assignments_df, ms_all, ms_critiques, participants_copy, slots_copy, remaining_participants,
    assignments, earliest_time, latest_time, seed, success, assign_slots_with_seed, final_friday_assignments, final_satmorn_assignments)

In [663]:
#  Save the dataset
final_satmorn_assignments2.to_excel(f"{current_conference_folder}/Outputs/Finalized datasets/Final manuscript critique assignments_{today}.xlsx", index=False)

## 6. Saturday Afternoon - Pitches

Let's do the pitch assignments now! We'll use the same basic process, except with the Saturday afternoon times and the pitch dataset.

In [664]:
pitchA = pitches[['Email', 'Virtual', 'pitchA']]
pitchB = pitches[['Email', 'Virtual', 'pitchB']]
pitchC = pitches[['Email', 'Virtual', 'pitchC']]

In [665]:
pitchA = pitchA.rename(columns={'pitchA': 'publisher'})
pitchB = pitchB.rename(columns={'pitchB': 'publisher'})
pitchC = pitchC.rename(columns={'pitchC': 'publisher'})

pitches_all = pd.merge(pd.merge(pitchA, pitchB, on=['Email', 'Virtual', 'publisher'], how="outer"), pitchC, on=['Email', 'Virtual', 'publisher'], how='outer')

In [666]:
pitches_all = pitches_all.dropna()
del(pitchA, pitchB, pitchC)

Before we run the code, let's add in the requests (assuming there are any).

In [667]:
# Now let's merge it with the filtered_pitch_allrequests dataset (assuming it exists)
if 'filtered_pitch_allrequests' in locals() and not filtered_pitch_allrequests.empty: # We only want to run this code IF the filtered_pitch_allrequests dataset exists
    pitches_all_withrequests = pd.merge(pitches_all, filtered_pitch_allrequests, on=['Email', 'publisher'], how='left').drop_duplicates()
else:
    pitches_all_withrequests = pitches_all.copy()
    pitches_all_withrequests['timeslot_start'] = pd.NaT # Only thing you can request is the timeslot, since you paid for a pitch with a specific publisher

As the conference gets closer and people get their MS critiques back, some of them decide not to meet with the editor/agent, so we can fill their vacant MS slots with pitches. We keep a manual list of the pitches we've moved into MS slots:

In [668]:
pitches_as_ms = pd.read_excel(f'{current_conference_folder}/Hardcode_pitches_slotted_to_MS.xlsx')

In [669]:
# Remove pitch sign-ups that have already been moved into vacated MS slots (per the hard-code file)
matches = pitches_all_withrequests.merge(
    pitches_as_ms[['Email', 'publisher']],
    on=['Email', 'publisher'],
    how='inner'
)

# Drop those matching rows from pitches_all_withrequests
pitches_all_withrequests = pitches_all_withrequests[
    ~pitches_all_withrequests.set_index(['Email', 'publisher']).index.isin(
        matches.set_index(['Email', 'publisher']).index
    )
]
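The `set_index(...).index.isin(...)` filter above is an anti-join: keep the pitch rows whose (Email, publisher) pair is not in the hard-code file. A sketch of the same anti-join using `merge(..., indicator=True)` on hypothetical data, shown as an alternative rather than a required change:

```python
import pandas as pd

# Hypothetical pitch rows and hard-code list
pitch_rows = pd.DataFrame({
    "Email": ["a@x.com", "a@x.com", "b@x.com"],
    "publisher": ["Agent One", "Agent Two", "Agent One"],
})
moved_to_ms = pd.DataFrame({"Email": ["a@x.com"], "publisher": ["Agent Two"]})

# Left-join with indicator=True; rows found only on the left were not moved to MS
flagged = pitch_rows.merge(
    moved_to_ms[["Email", "publisher"]].drop_duplicates(),  # dedupe so the join can't add rows
    on=["Email", "publisher"],
    how="left",
    indicator=True,
)
kept = flagged.loc[flagged["_merge"] == "left_only"].drop(columns="_merge")
print(kept)
```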

In [670]:
# Get the earliest and latest timeslots
earliest_time = tslist_sataft['timeslot_start'].min()
latest_time = tslist_sataft['timeslot_start'].max()

# Define a function to perform the assignment process
def assign_slots_with_seed(participants_df, slots_df, seed):

    # Shuffle the timeslots within each publisher group using the seed
    shuffled_slots = (
        slots_df.groupby('lit_guest_name', group_keys=False)
        .apply(lambda group: group.sample(frac=1, random_state=seed))
    )

    # Add a column to flag timekeepers in the participants dataset
    participants_df['is_timekeeper'] = participants_df['Email'].isin(timekeeps)

    # Create the blank datasets and lists for the assignments and used-up slots
    assignments = []
    used_slots = set()

    # Add a column to track the number of meetings each participant needs
    participants_df['remaining_meetings'] = participants_df.groupby('Email')['Email'].transform('count')

    # --- Handle any preassigned participants (i.e., those with requests or those previously assigned)
    preassigned_df = participants_df[participants_df['timeslot_start'].notna()].copy()
    to_assign_df = participants_df[participants_df['timeslot_start'].isna()].copy()

    for index, participant in preassigned_df.iterrows():
        target_time = participant['timeslot_start']

        matching_slots = slots_df[
            (slots_df['timeslot_start'] == target_time) &
            (slots_df['lit_guest_name'].apply(lambda x: set(x) == set(participant['publisher'])))
        ]

        # Assign to the first available matching slot
        for _, slot in matching_slots.iterrows():
            room_slot_id = (slot['timeslot_start'], slot['room_name'])
            if room_slot_id not in used_slots:
                assignments.append({
                    'Email': participant['Email'],
                    'timeslot_start': slot['timeslot_start'],
                    'room_name': slot['room_name'],
                    'publisher': slot['lit_guest_name'],
                    'virtual': participant['Virtual'],
                    'Session': "Pitch",
                    'Timekeeper': participant['is_timekeeper']
                })
                used_slots.add(room_slot_id)
                break
        else:
            print(f"Warning: Could not assign preassigned participant {participant['Email']} to their requested timeslot.")

    # Update participants_df to just those who still need assigning
    participants_df = to_assign_df.copy()

    # --- For all participants WITHOUT requests/prior scheduling assignments ---
    while not participants_df.empty:
        assigned_any = False

        for _, slot in shuffled_slots.iterrows():
            room_slot_id = (slot['timeslot_start'], slot['room_name'])

            if room_slot_id in used_slots:
                continue

            if participants_df.empty:
                break

            sorted_participants = (
                participants_df
                .sample(frac=1, random_state=seed)  # Shuffle randomly
                .sort_values(by='remaining_meetings', ascending=False)
            )

            for index, participant in sorted_participants.iterrows():

                # Skip back-to-back assignments
                assigned_slots = [
                    (a['timeslot_start'], a['room_name']) for a in assignments if a['Email'] == participant['Email']
                ]
                if any(
                    abs(slot['timeslot_start'] - assigned_time) <= timedelta(minutes=15)
                    for assigned_time, _ in assigned_slots
                ):
                    continue

                # Skip the earliest and latest timeslots for timekeepers if possible
                if participant['is_timekeeper'] and slot['timeslot_start'] in [earliest_time, latest_time]:
                    # Check if there are other slots available for this participant
                    if not any(
                        set(participant['publisher']) == set(alt_slot['lit_guest_name']) and
                        alt_slot['timeslot_start'] not in [earliest_time, latest_time] and
                        (alt_slot['timeslot_start'], alt_slot['room_name']) not in used_slots
                        for _, alt_slot in shuffled_slots.iterrows()
                    ):
                        print(f"Timekeeper {participant['Email']} has no alternative slot; assigning to edge slot.")
                    else:
                        continue

                if set(participant['publisher']) == set(slot['lit_guest_name']):
                    assignments.append({
                        'Email': participant['Email'],
                        'timeslot_start': slot['timeslot_start'],
                        'room_name': slot['room_name'],
                        'publisher': slot['lit_guest_name'],
                        'virtual': participant['Virtual'],
                        'Session': "Pitch",
                        'Timekeeper': participant['is_timekeeper']
                    })
                    used_slots.add(room_slot_id)
                    participants_df.drop(index, inplace=True)
                    participants_df['remaining_meetings'] = participants_df.groupby('Email')['Email'].transform('count')
                    assigned_any = True
                    break

        if not assigned_any:
            break

    return assignments, participants_df

# Initialize variables
success = False
max_attempts = 1000  # Limit the number of attempts
seed = 0

while not success and seed < max_attempts:
    seed += 1
    print(f"Trying seed {seed}...")
    
    # Copy the original dataframes to avoid modifying them directly
    participants_copy = pitches_all_withrequests.copy()
    slots_copy = tslist_sataft.copy()

    # Run the assignment process with the current seed
    assignments, remaining_participants = assign_slots_with_seed(participants_copy, slots_copy, seed)

    # Check if all participants were assigned
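    # Accept the run if at most one participant is left unassigned
    if len(remaining_participants) <= 1:
        success = True
        if remaining_participants.empty:
            print(f"Success! All participants assigned using seed {seed}.")
        else:
            print(f"Seed {seed} assigned everyone except: {remaining_participants['Email'].tolist()}")
        break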

if success:
    # Convert assignments to a DataFrame
    assignments_df = pd.DataFrame(assignments)
else:
    print("Failed to assign all participants within the maximum number of attempts.")

Trying seed 1...
Success! All participants assigned using seed 1.
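Unlike the manuscript version, the pitch version shuffles timeslots within each publisher (`groupby('lit_guest_name', group_keys=False).apply(...)`), so slots are filled publisher by publisher, alphabetically, with times in random order. Recent pandas releases (2.2+) emit a deprecation warning when `groupby.apply` operates on the grouping column. A sketch of a warning-free alternative, assuming only that *lit_guest_name* holds one publisher name per row: a single seeded shuffle followed by a stable sort. It produces the same kind of ordering, though not the same permutation for a given seed, so the successful seed number would likely change.

```python
import pandas as pd

# Hypothetical afternoon slots for two publishers
slots = pd.DataFrame({
    "lit_guest_name": ["B Agent", "A Agent", "B Agent", "A Agent", "A Agent"],
    "timeslot_start": ["1:00", "1:00", "1:15", "1:15", "1:30"],
})


def shuffle_within_publisher(df, seed):
    """Shuffle every row with `seed`, then stable-sort by publisher: publishers end up
    in alphabetical order and each publisher's timeslots stay in shuffled order."""
    return (
        df.sample(frac=1, random_state=seed)
        .sort_values("lit_guest_name", kind="stable")
    )


shuffled = shuffle_within_publisher(slots, seed=3)
print(shuffled)
```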




Yay! That worked great too. Let's just save it and delete any extraneous datasets.

In [671]:
final_sataft_assignment = assignments_df
final_sataft_assignments2 = pd.merge(final_sataft_assignment, registered[['Email', 'First Name', 'Last Name', 'phone']].drop_duplicates(), on='Email', how='inner')

# Note: timedelta is a module-level import (still used by compatible_with_slot), so it is not deleted here
del(pitches, pitches_all, assignments_df, remaining_participants, timeslots, tslist_sataft, tslist_satmorn, times_friday2, times_sat,
    slots_copy, assignments, earliest_time, latest_time, max_attempts, success, seed, assign_slots_with_seed, participants_copy, final_sataft_assignment)

Woohoo! Now we're officially done with the assignments, and we just need to deal with the waitlists. The final step after that will be to print everything we've got into exactly the Excel and Word files we want.

In [672]:
# Save the dataset
final_sataft_assignments2.to_excel(f"{current_conference_folder}/Outputs/Finalized datasets/Finalized pitch assignments_{today}.xlsx", index=False)

Lastly, let's add everyone in the pitches-as-MS hard-code Excel file back into the final MS assignments, overwriting the MS file we saved above.

In [676]:
final_satmorn_assignments3 = pd.concat([final_satmorn_assignments2, pitches_as_ms])

In [677]:
final_satmorn_assignments3.to_excel(f"{current_conference_folder}/Outputs/Finalized datasets/Final manuscript critique assignments_{today}.xlsx", index=False)

## 7. Deal with the Waitlists

Dealing with the waitlists is pretty simple. We already corrected some of the basic stuff earlier, like emails and phones. Now let's split them by what they're waitlisted for:
1) manuscript critiques
2) pitches
3) pre-conference edits
4) book fairs
5) query letter critiques
6) author coaching

In [678]:
wait_ms = waitlist[waitlist['Session Name'].str.contains('Manuscript')].copy()
wait_pitch = waitlist[waitlist['Session Name'].str.contains('Pitch')].copy()
wait_prec = waitlist[waitlist['Session Name'].str.contains('Pre-conference')].copy()
wait_book = waitlist[waitlist['Session Name'].str.contains('Book Fair')].copy()
wait_qlc = waitlist[waitlist['Session Name'].str.contains('Query')].copy()
wait_coach = waitlist[waitlist['Session Name'].str.contains('oaching')].copy()

wait_ms['Waitlist_type'] = 'Manuscript critique'
wait_pitch['Waitlist_type'] = 'Pitch'
wait_prec['Waitlist_type'] = 'Preconference edit'
wait_book['Waitlist_type'] = 'Book fair'
wait_qlc['Waitlist_type'] = 'Query letter critique'
wait_coach['Waitlist_type'] = 'Author coaching'

In [679]:
# Create a publisher variable for the MS and pitch waitlists
wait_pitch['wl_publisher'] = wait_pitch['Session Name'].str.replace(r"Pitch [A-Z] with ", "", regex=True)
wait_ms['wl_publisher'] = wait_ms['Session Name'].str.replace(r"Manuscript Critique [A-Z] with ", "", regex=True)
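The regex strips everything up to and including "with ", leaving just the publisher's name. A quick check on hypothetical session names; the sketch folds both session types into one pattern anchored with `^`, which is an illustrative variant rather than what the cell above does:

```python
import pandas as pd

sessions = pd.Series([
    "Pitch A with Jane Doe",                # hypothetical session names
    "Manuscript Critique C with John Roe",
])

# Remove the "<session type> <letter> with " prefix, keeping only the publisher name
publishers = sessions.str.replace(
    r"^(?:Pitch|Manuscript Critique) [A-Z] with ", "", regex=True
)
print(publishers.tolist())  # ['Jane Doe', 'John Roe']
```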

In [680]:
# Remove duplicates (some people went crazy and signed up for the same waitlist spot)
wait_ms = wait_ms.drop_duplicates(subset=['Email', 'wl_publisher', 'Waitlist_type'])
wait_pitch = wait_pitch.drop_duplicates(subset=['Email', 'wl_publisher', 'Waitlist_type'])

Let's drop the A, B, C, etc. from the Session Names

In [681]:
# For Pitch sessions
wait_pitch.loc[wait_pitch['Session Name'].str.contains(r"Pitch [A-Z] with "), 'Session Name'] = \
    wait_pitch['Session Name'].str.replace(r"Pitch [A-Z] with ", "Pitch with ", regex=True)

# For Manuscript Critique sessions
wait_ms.loc[wait_ms['Session Name'].str.contains(r"Manuscript Critique [A-Z] with "), 'Session Name'] = \
    wait_ms['Session Name'].str.replace(r"Manuscript Critique [A-Z] with ", "Manuscript Critique with ", regex=True)

# For Book Fair
wait_book.loc[wait_book['Session Name'].str.contains("Book Fair Book Selling"), 'Session Name'] = \
    wait_book['Session Name'].str.replace("Book Fair Book Selling", "Book Fair Selling", regex=True)

In [682]:
# We need to link the pitch waitlists to the query letter, pitch and manuscript assignments to drop people whose publisher request is already being met
# ------ QLC
# Melt to long format and drop NAs
qlc_long = pd.melt(final_friday_assignments2, id_vars='Email', value_vars=['pubname1', 'pubname2'])
qlc_long = qlc_long.rename(columns={'value': 'qlc_pub'})
qlc_long = qlc_long.dropna(subset=['qlc_pub']).drop_duplicates()

# Group by Email and assign a number to each pub (for pivoting)
qlc_long['pub_num'] = qlc_long.groupby('Email').cumcount() + 1
qlc_wide = qlc_long.pivot(index='Email', columns='pub_num', values='qlc_pub')
qlc_wide.columns = [f'qlc_pub{i}' for i in qlc_wide.columns]
qlc_wide = qlc_wide.reset_index()

# ----- Manuscripts
final_satmorn_assignments2['pub_num'] = final_satmorn_assignments2.groupby('Email').cumcount() + 1
ms_wide = final_satmorn_assignments2[['Email', 'publisher', 'pub_num']].pivot(index='Email', columns='pub_num', values='publisher')
ms_wide.columns = [f'ms_pub{i}' for i in ms_wide.columns]
ms_wide = ms_wide.reset_index()

# ----- Pitches
final_sataft_assignments2['pub_num'] = final_sataft_assignments2.groupby('Email').cumcount() + 1
pitch_wide = final_sataft_assignments2[['Email', 'publisher', 'pub_num']].pivot(index='Email', columns='pub_num', values='publisher')
pitch_wide.columns = [f'pitch_pub{i}' for i in pitch_wide.columns]
pitch_wide = pitch_wide.reset_index()
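The cumcount-then-pivot pattern above turns one row per (participant, publisher) assignment into one row per participant. A minimal sketch on toy data (the emails and publisher names are hypothetical):

```python
import pandas as pd

# One row per (participant, publisher) assignment
toy_pitch_assignments = pd.DataFrame({
    'Email': ['a@x.com', 'a@x.com', 'b@x.com'],
    'publisher': ['Pub One', 'Pub Two', 'Pub One'],
})

# Number each participant's publishers 1, 2, ... in their current row order
toy_pitch_assignments['pub_num'] = toy_pitch_assignments.groupby('Email').cumcount() + 1

# One row per participant, one column per publisher slot (missing slots become NaN)
toy_wide = toy_pitch_assignments.pivot(index='Email', columns='pub_num', values='publisher')
toy_wide.columns = [f'pitch_pub{i}' for i in toy_wide.columns]
toy_wide = toy_wide.reset_index()
print(toy_wide)
```

Since `cumcount()` follows the input row order, which publisher lands in `pitch_pub1` versus `pitch_pub2` depends on how the assignments were sorted; the later filtering only checks membership across all these columns, so the order does not matter there.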

In [683]:
# Now merge it in with the waitlists
wait_pitch2 = pd.merge(pd.merge(pd.merge(wait_pitch, qlc_wide, on='Email', how='left'), ms_wide, on='Email', how='left'), pitch_wide, on='Email', how='left')

In [684]:
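# Now filter out anybody whose waitlisted pitch publisher is already being met through another activity
# Use whichever scheduled-publisher columns exist (qlc_pub*, ms_pub*, pitch_pub*), so a change in the
# maximum number of assignments between runs doesn't cause a KeyError or skip a column
filter_cols = [col for col in wait_pitch2.columns if col.startswith(('qlc_pub', 'ms_pub', 'pitch_pub'))]

wait_pitch_cleaned = wait_pitch2[
    ~wait_pitch2.apply(lambda row: row['wl_publisher'] in row[filter_cols].values, axis=1)
].copy()  # independent copy, so adding ranking columns later doesn't raise SettingWithCopyWarning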
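Here is a toy version of this filter, assuming the same column naming (`wl_publisher`, `qlc_pub*`, `pitch_pub*`); the data are made up. It also takes `.copy()` of the filtered result, which is the usual way to avoid the `SettingWithCopyWarning` pandas raises when new columns are added to a filtered slice.

```python
import pandas as pd

# Hypothetical waitlist rows already merged with each participant's scheduled publishers
toy_pitch_wait = pd.DataFrame({
    'Email': ['a@x.com', 'a@x.com', 'b@x.com'],
    'wl_publisher': ['Pub One', 'Pub Three', 'Pub Two'],
    'qlc_pub1': ['Pub One', 'Pub One', None],
    'pitch_pub1': ['Pub Two', 'Pub Two', None],
})

# Every scheduled-publisher column present in the frame
toy_filter_cols = [c for c in toy_pitch_wait.columns if c.startswith(('qlc_pub', 'ms_pub', 'pitch_pub'))]

# True where the waitlisted publisher already appears among the scheduled publishers
toy_already_met = toy_pitch_wait.apply(
    lambda row: row['wl_publisher'] in row[toy_filter_cols].values, axis=1
)

# .copy() returns an independent frame, so the new column is added only to this copy
toy_remaining = toy_pitch_wait[~toy_already_met].copy()
toy_remaining['Waitlist_pitch'] = 1
print(toy_remaining[['Email', 'wl_publisher']])
```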

We need to double-check that no participant has more than three manuscript critique or pitch waitlist spots.

In [685]:
print(wait_pitch_cleaned['Email'].value_counts().unique())
print(wait_ms['Email'].value_counts().unique())

[3 2 1]
[3 2 1]
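Rather than reading the `value_counts().unique()` output by eye, the check can be made to fail loudly. A small sketch on made-up emails, assuming the three-spot limit stated above:

```python
import pandas as pd

MAX_WAITLIST_SPOTS = 3  # limit described in the text above

# Hypothetical waitlist emails: c@x.com holds the maximum of three spots
toy_wait_emails = pd.Series(['a@x.com', 'a@x.com', 'b@x.com', 'c@x.com', 'c@x.com', 'c@x.com'])

toy_email_counts = toy_wait_emails.value_counts()
toy_over_limit = toy_email_counts[toy_email_counts > MAX_WAITLIST_SPOTS]

# Raises with the offending emails listed instead of relying on a visual check
assert toy_over_limit.empty, f"Participants over the limit: {toy_over_limit.to_dict()}"
```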


Good. As the output shows, no email appears four or more times in either list. Now, within each Session Name, let's sort by registration date and number the waitlisted participants #1, #2, etc., so that each manuscript critique or pitch waitlist with a given publisher is ordered by who registered first.

In [686]:
# Sort by 'Session Name' and 'datetime', and rank participants
wait_ms['Waitlist_ms'] = wait_ms.sort_values(['Session Name', 'datetime']) \
               .groupby('Session Name')['datetime'] \
               .rank(method='first').astype(int)

# Sort DataFrame for display (optional)
wait_ms = wait_ms.sort_values(['Session Name', 'Waitlist_ms']).reset_index(drop=True)

# Now add the variable that combines the rank and session name:
wait_ms['Waitlisted Activity'] = "Waitlist #" + wait_ms['Waitlist_ms'].astype(str) + " - " + wait_ms['Session Name']
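To see how the ranking behaves, here is a sketch on a toy waitlist (session names and timestamps are invented). `rank(method='first')` gives consecutive integers within each session; for identical timestamps it breaks ties by order of appearance, so the toy data avoids ties to keep the result unambiguous.

```python
import pandas as pd

# Hypothetical waitlist rows with distinct registration times
toy_wait = pd.DataFrame({
    'Session Name': ['Pitch with Pub One', 'Pitch with Pub One', 'Pitch with Pub Two', 'Pitch with Pub One'],
    'datetime': pd.to_datetime(['2025-03-02 10:00', '2025-03-01 09:00', '2025-03-05 12:00', '2025-03-01 11:00']),
})

# Rank within each session (earliest registration = 1); the result aligns back on the original index
toy_wait['rank'] = (
    toy_wait.sort_values(['Session Name', 'datetime'])
    .groupby('Session Name')['datetime']
    .rank(method='first')
    .astype(int)
)

# Combine rank and session into a single label
toy_wait['Waitlisted Activity'] = "Waitlist #" + toy_wait['rank'].astype(str) + " - " + toy_wait['Session Name']
print(toy_wait)
```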

In [687]:
# Sort by 'Session Name' and 'datetime', and rank participants
wait_pitch_cleaned['Waitlist_pitch'] = wait_pitch_cleaned.sort_values(['Session Name', 'datetime']) \
               .groupby('Session Name')['datetime'] \
               .rank(method='first').astype(int)

# Sort DataFrame for display (optional)
wait_pitch_cleaned = wait_pitch_cleaned.sort_values(['Session Name', 'Waitlist_pitch']).reset_index(drop=True)

# Now add the variable that combines the rank and session name:
wait_pitch_cleaned['Waitlisted Activity'] = "Waitlist #" + wait_pitch_cleaned['Waitlist_pitch'].astype(str) + " - " + wait_pitch_cleaned['Session Name']


Okay, looks good. Now let's do the same for the remaining waitlists (book fair, query letter critique, and coaching), so that every waitlist entry gets a 'Waitlisted Activity' label such as 'Waitlist #1 - Manuscript Critique with [publisher]'.

In [688]:
# For the book fair, just add a simple ranking, according to who registered for the waitlist first
wait_book['Waitlist_bookfair'] = wait_book['datetime'].rank(method='first', ascending=True).astype(int)

wait_book['Waitlisted Activity'] = "Waitlist #" + wait_book['Waitlist_bookfair'].astype(str) + " - " + wait_book['Session Name']

In [689]:
# For the QLC, do a simple ranking - and don't allow any duplicates!!
wait_qlc = wait_qlc.drop_duplicates(subset='Email')

wait_qlc['Waitlisted_qlc'] = wait_qlc['datetime'].rank(method='first', ascending=True).astype(int)
wait_qlc['Waitlisted Activity'] = "Waitlist #" + wait_qlc['Waitlisted_qlc'].astype(str) + " - " + wait_qlc['Waitlist_type']

In [690]:
# Waitlisted coaching
wait_coach = wait_coach.drop_duplicates(subset=['Email', 'Session Name']).sort_values(['Session Name', 'datetime'])
wait_coach['Waitlisted_coach']= wait_coach.groupby('Session Name')['datetime'].rank(method='first', ascending=True).astype(int)
wait_coach['Waitlisted Activity'] = "Waitlist #" + wait_coach['Waitlisted_coach'].astype(str) + " - " + wait_coach['Session Name']

In [691]:
from functools import reduce

# For right now, let's just merge all the waitlists back together, with the participant info, so that it's all in one place.
wait_cols = ['Email', 'First Name', 'Last Name', 'phone', 'virtual', 'Session Name', 'Waitlisted Activity']
wait_frames = [wait_ms, wait_pitch_cleaned, wait_book, wait_qlc, wait_coach]

# Outer-merge on all shared columns, one waitlist at a time (each waitlist is included exactly once)
wait_all = reduce(lambda left, right: pd.merge(left, right, how='outer'),
                  [df[wait_cols] for df in wait_frames])

# print for George
wait_all.to_excel(f"{current_conference_folder}/Outputs/Finalized Datasets/Waitlist participants_{today}.xlsx", index=False)
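A note on the design: outer-merging frames that share exactly the same columns, with no `on=` argument, joins on all of them. For frames without duplicate rows this yields the same row set as stacking them with `pd.concat` and dropping duplicates. A minimal sketch on toy frames (all values are hypothetical) comparing the two:

```python
import pandas as pd
from functools import reduce

toy_cols = ['Email', 'Session Name', 'Waitlisted Activity']
toy_wait_a = pd.DataFrame(
    [['a@x.com', 'Pitch with Pub One', 'Waitlist #1 - Pitch with Pub One']], columns=toy_cols)
toy_wait_b = pd.DataFrame(
    [['a@x.com', 'Book Fair Selling', 'Waitlist #2 - Book Fair Selling'],
     ['b@x.com', 'Book Fair Selling', 'Waitlist #1 - Book Fair Selling']], columns=toy_cols)

# Chained outer merge on all shared columns
toy_merged = reduce(lambda left, right: pd.merge(left, right, how='outer'), [toy_wait_a, toy_wait_b])

# Same rows via stacking, then removing exact duplicates
toy_stacked = pd.concat([toy_wait_a, toy_wait_b], ignore_index=True).drop_duplicates()
print(toy_merged)
print(toy_stacked)
```

The two approaches can differ only in row order (an outer merge sorts on the join keys) and when a frame contains duplicate rows, in which case the merge multiplies them while `drop_duplicates` keeps one copy.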

We won't do much with the following Excel files beyond exporting them for manual review (and, potentially, manual edits).

In [692]:
registered.to_excel(f"{current_conference_folder}/Outputs/Finalized Datasets/Registered_cleaned_{today}.xlsx", index=False,
                    columns=['Agenda Item Name', 'Email', 'First Name', 'Last Name', 'Mobile Phone Number', 'Virtual', 'Fiction genre', 'Nonfiction genre', 'phone'])

In [693]:
waitlist.to_excel(f"{current_conference_folder}/Outputs/Finalized Datasets/Waitlist_cleaned_{today}.xlsx", index=False,
                  columns=['Session Name', 'Email', 'First Name', 'Last Name', 'Mobile Phone Number', 'phone', 'virtual'])

## Identify all participants who match each publisher

To make it easier to figure out whom to email about open pitch or QLC spots (or open manuscript critique spots that will be used as pitch spots), we're going to make a list of all waitlisted and registered individuals and all their activities, merge it with their genres, and then iterate through the publishers' genres to identify every participant each publisher matches with.

In [694]:
# Create a dataset with one row per registered and waitlisted participants with all their activities
everyone = pd.merge(pd.merge(pd.merge(registered[['Email', 'First Name', 'Last Name', 'phone', 'Fiction genre', 'Nonfiction genre']], qlc_wide, on='Email', how='outer'),
                    ms_wide, on='Email', how='outer'), pitch_wide, on='Email', how='outer')
everyone = everyone.drop_duplicates()

In [695]:
everyone['Nonfiction genre'] = everyone['Nonfiction genre'].apply(lambda x: [genre.strip() for genre in x.split(',')] if isinstance(x, str) else [genre.strip() for genre in x] if isinstance(x, list) else [])
everyone['Fiction genre'] = everyone['Fiction genre'].apply(lambda x: [genre.strip() for genre in x.split(',')] if isinstance(x, str) else [genre.strip() for genre in x] if isinstance(x, list) else [])

In [696]:
# Now fix the publishers' genres
pubs['lit_guest_fiction'] = pubs['lit_guest_fiction'].apply(lambda x: [genre.strip() for genre in x.split(',')] if isinstance(x, str) else [genre.strip() for genre in x] if isinstance(x, list) else [])
pubs['lit_guest_nonfiction'] = pubs['lit_guest_nonfiction'].apply(lambda x: [genre.strip() for genre in x.split(',')] if isinstance(x, str) else [genre.strip() for genre in x] if isinstance(x, list) else [])

In [697]:
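def clean_split(genres):
    # Return a set of stripped, lowercased genre names.
    # The genre columns were converted to lists in the cells above, so lists are handled directly;
    # comma-separated strings are still accepted, and anything else (e.g. NaN) yields an empty set.
    # (Calling str() on a list would produce tokens like "['fantasy'" and make empty lists "match" on "[]".)
    if isinstance(genres, str):
        genres = genres.split(',')
    elif not isinstance(genres, (list, tuple, set)):
        return set()
    return set(str(g).strip().lower() for g in genres if str(g).strip())

def find_matching_publishers(row):
    p_fiction = clean_split(row['Fiction genre'])
    p_nonfiction = clean_split(row['Nonfiction genre'])

    # Publishers this participant is already scheduled with (QLC, manuscript critique, or pitch),
    # using whichever of these columns exist in the dataframe
    columns = [col for col in row.index if col.startswith(('qlc_pub', 'ms_pub', 'pitch_pub'))]
    scheduled = set(str(row[col]).strip().lower() for col in columns if pd.notna(row[col]))

    matching = []
    for _, pub in pubs.iterrows():
        pub_name = str(pub['lit_guest_name']).strip()
        pub_name_lower = pub_name.lower()

        pub_fiction = clean_split(pub['lit_guest_fiction'])
        pub_nonfiction = clean_split(pub['lit_guest_nonfiction'])

        # Match on any shared fiction or nonfiction genre, skipping publishers already scheduled
        if (p_fiction & pub_fiction or p_nonfiction & pub_nonfiction) and pub_name_lower not in scheduled:
            matching.append(pub_name)

    return ', '.join(matching)

# Now apply to the participants dataframe
everyone['matching_publishers'] = everyone.apply(find_matching_publishers, axis=1)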
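To make the matching rule concrete, here is a self-contained sketch on toy data (the publisher names, genres, and scheduled set are invented). A publisher matches if it shares at least one fiction or nonfiction genre with the participant and isn't already on the participant's schedule. The sketch also shows why genre lists should be iterated directly rather than passed through `str(...).split(',')`: converting an empty list to a string gives `'[]'`, so two empty lists would appear to share a "genre".

```python
import pandas as pd


def to_genre_set(genres):
    # Lists/tuples/sets or comma-separated strings -> set of stripped, lowercased names;
    # anything else (e.g. NaN) -> empty set
    if isinstance(genres, str):
        genres = genres.split(',')
    elif not isinstance(genres, (list, tuple, set)):
        return set()
    return {str(g).strip().lower() for g in genres if str(g).strip()}


def naive_genre_set(genres):
    # str()-then-split approach, shown only to illustrate the pitfall
    return {s.strip().lower() for s in str(genres).split(',') if s.strip()}


# Hypothetical publishers with genres already stored as lists
toy_pubs = pd.DataFrame({
    'lit_guest_name': ['Pub One', 'Pub Two', 'Pub Three'],
    'lit_guest_fiction': [['Fantasy', 'Romance'], [], ['Thriller']],
    'lit_guest_nonfiction': [[], ['Memoir'], []],
})

# Hypothetical participant who is already scheduled with Pub Two
toy_fiction = to_genre_set(['romance'])
toy_nonfiction = to_genre_set(['Memoir '])
toy_scheduled = {'pub two'}

toy_matches = []
for _, pub in toy_pubs.iterrows():
    shares_genre = bool(toy_fiction & to_genre_set(pub['lit_guest_fiction'])
                        or toy_nonfiction & to_genre_set(pub['lit_guest_nonfiction']))
    if shares_genre and pub['lit_guest_name'].lower() not in toy_scheduled:
        toy_matches.append(pub['lit_guest_name'])

print(toy_matches)  # ['Pub One']
print(naive_genre_set([]) & naive_genre_set([]))  # spurious overlap between two empty lists
```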


In [698]:
everyone.to_excel(f"{current_conference_folder}/Outputs/Finalized Datasets/All_participants_with_matching_publishers_{today}.xlsx", index=False)

## 8. To run after the Manuscript Critique submission deadline has passed

Manuscript critiques are due one month prior to the conference. After that point, anyone on a waitlist for the manuscript critiques can no longer be moved off the waitlist.

However, those participants might instead want to sign up for a query letter critique with the publishers whose manuscript critiques they were waitlisted for.

To do that, we'll:

1. Check how many pitch and query letter critique spots are still open, and with whom
2. Identify participants who:
    * Have manuscript critique waitlist spots
    * Aren't yet registered for a query letter critique
    * Don't have any registered pitch spots
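Step 1 amounts to comparing each publisher's assignment count with a fixed capacity. A minimal sketch, assuming the 12-slot capacity used in this notebook's checks and a hypothetical roster of publishers. One detail worth handling: `value_counts()` only lists publishers with at least one assignment, so a publisher with zero bookings would silently drop out unless the counts are reindexed against the full roster (here, `pubs['lit_guest_name']` could presumably serve as that roster).

```python
import pandas as pd

SLOTS_PER_PUBLISHER = 12  # capacity used in this notebook's opening checks

# Hypothetical roster and bookings: Pub Three has no bookings at all
toy_roster = ['Pub One', 'Pub Two', 'Pub Three']
toy_bookings = pd.DataFrame({'publisher': ['Pub One'] * 12 + ['Pub Two'] * 10})

# Count bookings per publisher, filling in 0 for anyone missing from value_counts()
toy_booking_counts = toy_bookings['publisher'].value_counts().reindex(toy_roster, fill_value=0)

# Remaining spots per publisher; keep only publishers with at least one opening
toy_openings = (SLOTS_PER_PUBLISHER - toy_booking_counts).clip(lower=0)
toy_openings = toy_openings[toy_openings > 0].rename('Openings').rename_axis('Pairings').reset_index()
print(toy_openings)
```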

In [699]:
# Check which publishers still have pitch openings: True means fewer than 12 pitch assignments.
# (Publishers with no pitch assignments at all won't appear in value_counts().)
final_sataft_assignments2['publisher'].value_counts()<12

publisher
Kurt Brackob           False
Monica Rae Brown       False
Paloma Hernando        False
Nicole Luongo          False
Renée Fountain         False
Dianna Vega            False
Micah Brocker          False
Wendy Wong             False
Joëlle Delbourgo       False
Alexandria Brown       False
Foyinsi Adegbonmire    False
Jynastie Wilson        False
Lauren Bieker          False
Jéla Lewter            False
Jenna Satterthwaite    False
Grace Gay              False
Vicky Weber            False
Jake Lovell            False
Name: count, dtype: bool

In [700]:
# Check the QLC openings
qlc_openings = pd.DataFrame(final_friday_assignments2['publisher'].value_counts())
qlc_openings = qlc_openings.loc[qlc_openings['count']<12,]
qlc_openings['Pairings'] = qlc_openings.index
qlc_openings['Openings'] = 12-qlc_openings['count']
qlc_openings = qlc_openings.reset_index()

In [701]:
print(qlc_openings[['Pairings', 'Openings']])

Empty DataFrame
Columns: [Pairings, Openings]
Index: []


In [702]:
qlc_openings.to_excel(f"{current_conference_folder}/Outputs/Finalized Datasets/Open_Query_spots_{today}.xlsx", index=False)

Okay, now let's dig into the participants themselves.

In [703]:
wait_toemail = wait_ms[['Email', 'First Name', 'Last Name', 'phone', 'virtual', 'Session Name']].copy()
wait_toemail['Publisher'] = wait_toemail['Session Name'].str.removeprefix("Waitlisted - Manuscript Critique with ")

In [704]:
# Link this list to the pitches and QLCs
wait_toemail2 = pd.merge(pd.merge(wait_toemail, final_sataft_assignments2[['Email', 'publisher']], on='Email', how='outer'), 
                            final_friday_assignments2[['Email', 'pubname1', 'pubname2']], on='Email', how='outer')
wait_toemail2 = wait_toemail2.loc[wait_toemail2['Publisher'].notna(),]

In [705]:
# The above created a big list, with multiple rows per participant. We're going to find all the rows where the participant's
# requested publisher for the manuscript critique has already been met via a pitch or QLC. Then we'll drop ALL of that
# participant's rows from the toemail dataset so we only identify people with unmet needs.

wait_toemail2['Fulfilled'] = wait_toemail2.apply(
    lambda row: row['Publisher'] in [row['publisher'], row['pubname1'], row['pubname2']], axis=1
)

fulfilled = wait_toemail2.loc[wait_toemail2['Fulfilled']== True, ]
unfulfilled = wait_toemail2.loc[~wait_toemail2['Email'].isin(fulfilled['Email']), ]
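The "drop everyone with any fulfilled request" logic can be checked on a small made-up example (emails and publishers are hypothetical; the column names mirror the cell above). Participant a@x.com has one request already met by a pitch, so all of their rows are dropped, including the unmet one:

```python
import pandas as pd

# Hypothetical rows: one per (participant, waitlisted manuscript critique publisher) after the merge
toy_rows = pd.DataFrame({
    'Email': ['a@x.com', 'a@x.com', 'b@x.com'],
    'Publisher': ['Pub One', 'Pub Two', 'Pub Three'],  # waitlisted manuscript critique
    'publisher': ['Pub One', 'Pub One', None],         # scheduled pitch
    'pubname1': [None, None, 'Pub Four'],              # scheduled QLC
    'pubname2': [None, None, None],
})

# A row is fulfilled if its requested publisher already appears in the pitch or QLC columns
toy_rows['Fulfilled'] = toy_rows.apply(
    lambda row: row['Publisher'] in [row['publisher'], row['pubname1'], row['pubname2']], axis=1
)

# Drop every row for any participant with at least one fulfilled request
toy_fulfilled_emails = toy_rows.loc[toy_rows['Fulfilled'], 'Email']
toy_unfulfilled = toy_rows.loc[~toy_rows['Email'].isin(toy_fulfilled_emails)]
print(toy_unfulfilled[['Email', 'Publisher']])
```

This is intentionally coarse: a participant whose requests are only partly met is left off the list entirely.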

In [707]:
# Okay, drop any duplicates now so we just have each person and the publishers they requested for manuscript critiques that they haven't already
# met via a pitch or QLC. We also need to remove anyone who requested Alexandria Brown or Renée Fountain, since their QLC spots are all taken

wait_toemail = unfulfilled[['Email', 'First Name', 'Last Name', 'Publisher', 'virtual']].drop_duplicates(subset=['Email', 'Publisher'])
wait_toemail = wait_toemail.loc[wait_toemail['Publisher'].str.contains('Vicky Weber|Wendy Wong|Lauren Bieker|Nicole Luongo|Alexandria Brown|Renée Fountain|Jynastie Wilson|Dianna Vega'),]

In [708]:
# Print this to excel
wait_toemail.to_excel(f"{current_conference_folder}/Outputs/Finalized Datasets/MSwaitlists_unfulfilled_{today}.xlsx", index=False)