# Scheduling Participants for the May 2025 Atlanta Writer's Conference
*Becky Hodge*

#### Summary
This code should be run ~1 month prior to the conference, and will need to be run repeatedly in the weeks and days prior to accomodate any changes.

The code in this notebook imports the full list of registered and waitlist participants for the May 2025 conference, loads the (manually created) list of fiction and non-fiction genres, as well as the list of agents and timekeepers, and makes any corrections needed before moving on to the next section of code, which involves scheduling Friday's query letter critique sessions.

<font color='red'>**NOTE:**</font> Prior to running this code, make sure you update the *timekeepers* sheet in the *List_of_agents_editors.xlsx* document. If the assigments of the timekeepers is still TBD, that's okay - what's more important is making sure at minimum all the emails for everyone who's going to timekeeper is listed in there. That's the only part needed for this code.

In [None]:
# Install any needed packages
import pandas as pd
import numpy as np
import datetime
import os 

today = datetime.datetime.today().strftime('%Y-%m-%d')

In [None]:
# Set the conference dates
date_str_fri = '2025-05-02'
date_str_sat = '2025-05-03'

# Reference the current conference folder we should be pulling and storing all datasets/excel files/templates
current_conference_folder= "May2025"

## 1. Data Cleaning

#### Load and clean the different files/reports

In [117]:
# Select the file with the most recent date
directory = f'{current_conference_folder}/Cvent_report_downloads'

most_recent_file = max(
    (f for f in os.listdir(directory) if f.startswith('Registered_') and f.endswith('.csv')),
    key=lambda x: datetime.datetime.strptime(x.split('_')[1].split('.')[0], '%m-%d-%y'),
)

# Load the most recent file
most_recent_path = os.path.join(directory, most_recent_file)
registered = pd.read_csv(most_recent_path)


In [118]:
most_recent_file = max(
    (f for f in os.listdir(directory) if f.startswith('Waitlists_') and f.endswith('.csv')),
    key=lambda x: datetime.datetime.strptime(x.split('_')[1].split('.')[0], '%m-%d-%y'),
)

# Load the most recent file
most_recent_path = os.path.join(directory, most_recent_file)
waitlist = pd.read_csv(most_recent_path)

The below code brings in ALL participants, which is key for knowing whether any waitlist only people are virtual or in person.

In [119]:
most_recent_file = max(
    (f for f in os.listdir(directory) if f.startswith('Allparticipants_') and f.endswith('.csv')),
    key=lambda x: datetime.datetime.strptime(x.split('_')[1].split('.')[0], '%m-%d-%y'),
)

# Load the most recent file
most_recent_path = os.path.join(directory, most_recent_file)
all_participants = pd.read_csv(most_recent_path)

In [120]:
all_participants = all_participants.rename(columns={'Email Address':'Email'})

In [121]:
# Filter this dataset to just virtual people
virtual_only = all_participants.loc[all_participants['Hotel vs. Zoom'] == 'Virtually via Zoom (only available for query letter critiques, manuscript sample critiques, and pitches)', :]

In [122]:
del(directory, most_recent_file, most_recent_path)

fict_gen = pd.read_excel('List_of_genres_agents_editors.xlsx', sheet_name='fiction')
nonfict_gen = pd.read_excel('List_of_genres_agents_editors.xlsx', sheet_name='nonfiction')
pubs = pd.read_excel('List_of_genres_agents_editors.xlsx', sheet_name='agents_editors')

Oh gosh, some of the column names are hefty...  Let's fix those.

In [123]:
registered = registered.rename(columns={'Hotel vs. Zoom':'Virtual', 
                                        "What fiction genre(s) will you be presenting to agents/editors at the conference? (If you're not signing up for any agent/editor meetings, indicate which genre(s) you write.)":'Fiction genre', 
                                        "What nonfiction topic(s) will you be presenting to agents/editors at the conference? (If you're not signing up for any agent/editor meetings, indicate which topic(s) you write.)":'Nonfiction genre', 
                                        'Registration Date (GMT)':'Registration Date',
                                        'Email Address':'Email'})

In [124]:
waitlist = waitlist.rename(columns={'Registration Date (GMT)':'Registration Date',
                                     'Email Address':'Email'})

Let's also fix so that we drop the 'Not applicable --I don't write fiction' and 'Not applicable--I don't write nonfiction'. We'll set them to missing.

In [125]:
registered['Fiction genre']= registered['Fiction genre'].replace("Not Applicable --I don't write fiction", np.nan)
registered['Nonfiction genre']= registered['Nonfiction genre'].replace("Not Applicable--I don't write nonfiction", np.nan)

Also, there's people who wrote in 'Other',  but for our purposes, we don't care about that info for the purposes of matching to agents/editors. Let's remove those.

In [126]:
import re
import numpy as np
def clean_genres(genre_string):
    if genre_string is None or pd.isna(genre_string) or "":
        return ""

    genres = [genre.strip() for genre in genre_string.split(',')]
    cleaned_genres = [genre for genre in genres if not re.match(r"^Other \(please specify\):", genre)]

    return ", ".join(cleaned_genres)

registered['Fiction genre'] = registered['Fiction genre'].apply(clean_genres)
registered['Nonfiction genre'] = registered['Nonfiction genre'].apply(clean_genres)


Lastly, let's replace a few of the ones that have ' in them, which make things tricky

In [127]:
registered['Fiction genre'] = registered['Fiction genre'].str.replace("Women’s", "Women's")
registered['Fiction genre'] = registered['Fiction genre'].str.replace("Children’s picture/chapter books", "Children's picture/chapter books")

registered['Nonfiction genre'] = registered['Nonfiction genre'].str.replace("Women’s issues", "Women's issues")


##### Fix date-times and emails

We need to change the registration date to a date_time variable

In [128]:
registered["datetime"] = pd.to_datetime(registered["Registration Date"])
waitlist["datetime"] = pd.to_datetime(waitlist["Registration Date"])

Let's check to see if every Email Address is associated with a unique first and last name, since ideally we just use email as our unique identifier. It's possible spouses use the same email.

In [129]:
len(registered['Email'].unique())

172

In [130]:
check = registered[['Email', 'First Name']].value_counts().reset_index()
len(check['Email'].unique())

172

Perfect. The number of unique emails match, whether we just look at email, or if we also look at email and first name. Moving forward, we can use email address as a unique identifier.

In [131]:
del(check)

##### Fix phone numbers

Check for any phone numbers (in both the waitlist and registered files) that aren't just 10 digits

In [132]:
phonecheck_wait = waitlist.loc[waitlist['Mobile Phone Number'].str.contains("-")]
print(phonecheck_wait['Mobile Phone Number'])

waitlist['phone'] = waitlist['Mobile Phone Number'].str.replace(r'^(?:\(\+\d+\))|\D', '', regex=True)
print("After fixing waitlist phone numbers, there are now", len(waitlist.loc[waitlist['phone'].str.contains("-")]), "phones with dashes or parentheses")

26     678-708-3046
119    801-390-4595
120    801-390-4595
Name: Mobile Phone Number, dtype: object
After fixing waitlist phone numbers, there are now 0 phones with dashes or parentheses


In [133]:
phonecheck_reg = registered.loc[registered['Mobile Phone Number'].str.contains("-")]
print(phonecheck_reg['Mobile Phone Number'])

registered['phone'] = registered['Mobile Phone Number'].str.replace(r'^(?:\(\+\d+\))|\D', '', regex=True)
print("After fixing registered phone numbers, there are now", len(registered.loc[registered['phone'].str.contains("-")]), "phones with dashes or parentheses")

1         404-429-4890
26        828-279-6154
105       801-390-4595
134       828-279-6154
138       678-457-7878
             ...      
1826      801-390-4595
1848      404-429-4890
1884      410-746-0590
1897    (517) 944-2233
1915      678-708-3046
Name: Mobile Phone Number, Length: 101, dtype: object
After fixing registered phone numbers, there are now 0 phones with dashes or parentheses


Now let's check that all phone numbers are ten digits

In [134]:
phonecheck_reg = registered.loc[registered['phone'].str.len()>10]
print(phonecheck_reg[['Mobile Phone Number', 'phone']])

     Mobile Phone Number         phone
28       '+49 1704174774  491704174774
117      '+49 1704174774  491704174774
308      '+49 1704174774  491704174774
585      '+49 1704174774  491704174774
788      '+49 1704174774  491704174774
807      '+49 1704174774  491704174774
891      '+49 1704174774  491704174774
920      '+49 1704174774  491704174774
996      '+49 1704174774  491704174774
1124     '+49 1704174774  491704174774
1200     '+49 1704174774  491704174774
1327     '+49 1704174774  491704174774
1484     '+49 1704174774  491704174774
1558     '+49 1704174774  491704174774
1618     '+49 1704174774  491704174774
1805     '+49 1704174774  491704174774


In [135]:
phonecheck_wait = waitlist.loc[waitlist['phone'].str.len()>10]
print(phonecheck_wait[['Mobile Phone Number', 'phone']])

Empty DataFrame
Columns: [Mobile Phone Number, phone]
Index: []


Let's fix both these datasets, so anyone with an international number gets their phone reset to missing (though we'll keep the original Mobile Phone Number column intact)

In [136]:
registered.loc[registered['phone'].str.len()>10, 'phone'] = None
registered['phone'].head()

waitlist.loc[waitlist['phone'].str.len()>10, 'phone'] = None
waitlist['phone'].head()

print(registered['Mobile Phone Number'].isna().sum())
print(registered['phone'].isna().sum())

print(waitlist['Mobile Phone Number'].isna().sum())
print(waitlist['phone'].isna().sum())

0
16
0
0


Good! We didn't have any missing values to begin with, but we reset those 16 international numbers to missing for the phone column, but not the Mobile Phone Number column.

Let's move on to email addresses now, and check for any that are missing or problematic. First, we'll check if any are missing:

##### Fix emails

In [137]:
weird_emails = registered.loc[registered['Email'].isna(), ]

Yay! Everyone filled out an email. So now we just need to check that nobody put in faulty emails that will cause problems later:

In [138]:
weird_emails = registered.loc[registered['Email'].str.contains(r'^[\w\.-]+@[a-zA-Z\d-]+\.[a-zA-Z]{2,}$', regex=True)==False, ]
print(weird_emails['Email'])

262     jlary@alumni.iu.edu
383     jlary@alumni.iu.edu
623     jlary@alumni.iu.edu
643     jlary@alumni.iu.edu
1010    jlary@alumni.iu.edu
1279    jlary@alumni.iu.edu
1337    jlary@alumni.iu.edu
1487    jlary@alumni.iu.edu
1671    jlary@alumni.iu.edu
1863    jlary@alumni.iu.edu
Name: Email, dtype: object


In [139]:
weird_emails = waitlist.loc[waitlist['Email'].str.contains(r'^[\w\.-]+@[a-zA-Z\d-]+\.[a-zA-Z]{2,}$', regex=True)==False, ]
print(weird_emails['Email'])

Series([], Name: Email, dtype: object)


Okay, the emails all look fine. That particular Alumni email isn't a problem, so emails are good to go.

##### Add in virtual variable to the waitlist dataset

In [140]:
waitlist['virtual'] = waitlist['Email'].apply(
    lambda email: 'Virtual' if email in virtual_only['Email'].values else 'In person'
)

#### Drop any unneeded variables

Let's drop any extraneous variables from the waitlist and registration datasets

In [141]:
waitlist.drop(columns=['Registration Date', 'Invitee Status', 'Action', 'Confirmation Number'],axis=1, inplace=True) # columns are 1, rows are 0

registered.drop(columns=['Agenda Item Type', 'Registration Date', 'Registration Type', 'Action'],axis=1, inplace=True) # columns are 1, rows are 0

In [142]:
del(weird_emails, phonecheck_reg, phonecheck_wait)

In [143]:
today = datetime.date.today().strftime('%Y-%m-%d') # Let's save today's date for when writing excel files

#### Bring in timekeeper information

In [144]:
# Load the time keepers
timekeepers = pd.read_excel(f'{current_conference_folder}/List_of_genres_agents_editors.xlsx', sheet_name='timekeepers')

#### Create lists with all time-by-room values
We need to pull in the start times for each of the time slots for Friday afternoon (query letter critiques), Saturday morning (manuscript critiques), and Saturday afternoon (pitches) sessions. Without worrying about who our timekeepers are, or which agents are assigned to those rooms, we'll create 3 lists with the times-by-room.

In [145]:
room_fr = pd.read_excel('List_of_genres_agents_editors.xlsx', sheet_name='rooms_friday')
room_sat = pd.read_excel('List_of_genres_agents_editors.xlsx', sheet_name='rooms_sat')
timeslots = pd.read_excel('List_of_genres_agents_editors.xlsx', sheet_name='timeslots')

In [146]:
rooms_friday = room_fr.loc[:, 'day':'room_name']
rooms_saturday = room_sat.loc[:, 'day':'room_name']

Now let's combine the timeslots dataset with the friday and saturday rooms datasets to get the lists we need

In [147]:
tslist_fri = pd.merge(timeslots.loc[(timeslots['day']=='Friday') & (timeslots['day_session']=='Afternoon'), :], rooms_friday, how='outer', on='day')

In [148]:
rooms_coach = pd.read_excel('List_of_genres_agents_editors.xlsx', sheet_name='coaches')

In [149]:
tslist_coach = pd.merge(timeslots.loc[(timeslots['day']=='Friday') & (timeslots['day_session']=='Coaching'), :], rooms_coach, how='outer', on='day')

## 2. Friday Query Letter Critiques

### Create pairwise agent-editor genre combos
We need to create a dataset that pairs all agents and editors with each other (18x18), so that we get their combined fiction and non-fiction genre listings. We will need this to ensure that the participants assigned to them for the Friday query letter critiques are a good match.

In [150]:
cross_pubs = pd.merge(pubs, pubs, how='cross')

# Drop rows where they cross-reference themselves
cross_pubs = cross_pubs.loc[cross_pubs['lit_guest_name_x'] != cross_pubs['lit_guest_name_y'], :]

# Rename
cross_pubs = cross_pubs.rename(columns={'lit_guest_name_x': 'pubname1', 'lit_guest_type_x': 'pubtype1', 'lit_guest_company_x': 'comp1', 'lit_guest_fiction_x': 'fict1', 'lit_guest_nonfiction_x': 'nonfict1', 'lit_guest_name_y': 'pubname2', 'lit_guest_type_y': 'pubtype2', 'lit_guest_company_y': 'comp2', 'lit_guest_fiction_y': 'fict2', 'lit_guest_nonfiction_y': 'nonfict2'})

Now we need to make two columns that are the full combination of elements in the 'fiction' and 'nonfiction' genre columns.

In [151]:
cross_pubs['combined_fiction'] = cross_pubs.apply(
    lambda row: ', '.join(
        sorted(set(row['fict1'].split(', ') + row['fict2'].split(', ')))
    ),
    axis=1
)

print(cross_pubs['combined_fiction'].head())


1    Coming-of-age, Contemporary, Family saga/drama...
2    Christian, Coming-of-age, Contemporary, Family...
3    Coming-of-age, Contemporary, Fantasy, Humor, L...
4    Coming-of-age, Contemporary, Fantasy, Horror/S...
5    Coming-of-age, Contemporary, Family saga/drama...
Name: combined_fiction, dtype: object


Looks great! We have all the unique fiction genres now. We'll just do the same for nonfiction, though that has some missing values, so the code below adjusts for that (not every agent or editor represents nonfiction).

In [152]:
cross_pubs['combined_nonfiction'] = cross_pubs.apply(
    lambda row: ', '.join(
        sorted(
            set(
                (row['nonfict1'].split(', ') if pd.notna(row['nonfict1']) else []) +
                (row['nonfict2'].split(', ') if pd.notna(row['nonfict2']) else [])
            )
        )
    ) if pd.notna(row['nonfict1']) or pd.notna(row['nonfict2']) else None,
    axis=1
)

print(cross_pubs['combined_nonfiction'].head())

1                                                 None
2    Business/leadership/law, Cooking/food/cookbook...
3    Business/leadership/law, Health/diet/wellness,...
4    Essay collection, Self-help/relationships, Tru...
5    Current events/politics/social commentary, Ess...
Name: combined_nonfiction, dtype: object


Perfect! Now, just to be certain it worked, let's count up the unique fiction and nonfiction genres for each guest, as well as for their combined list.

In [153]:
cross_pubs['num_fict1'] = cross_pubs['fict1'].apply(
    lambda x: len(x.split(', ')) if pd.notna(x) else 0
)
cross_pubs['num_fict2'] = cross_pubs['fict2'].apply(
    lambda x: len(x.split(', ')) if pd.notna(x) else 0
)
cross_pubs['num_nonfict1'] = cross_pubs['nonfict1'].apply(
    lambda x: len(x.split(', ')) if pd.notna(x) else 0
)
cross_pubs['num_nonfict2'] = cross_pubs['nonfict2'].apply(
    lambda x: len(x.split(', ')) if pd.notna(x) else 0
)
cross_pubs['num_combined_fict'] = cross_pubs['combined_fiction'].apply(
    lambda x: len(x.split(', ')) if pd.notna(x) else 0
)
cross_pubs['num_combined_nonfict'] = cross_pubs['combined_nonfiction'].apply(
    lambda x: len(x.split(', ')) if pd.notna(x) else 0
)

print(cross_pubs.loc[0:5, ['num_fict1', 'num_fict2', 'num_combined_fict']].head())

   num_fict1  num_fict2  num_combined_fict
1         14         16                 18
2         14         23                 23
3         14          6                 16
4         14          8                 16
5         14         11                 20


Awesome. This worked great, and now the last bit on this step is to calculate the number of *overlapping* fiction and nonfiction genres per pairing.

In [154]:
def count_overlapping_genres(df):
    df['fict1'] = df['fict1'].str.split(',')
    df['fict2'] = df['fict2'].str.split(',')

    df['fiction_overlap'] = df.apply(lambda row: 
                                  len(set(row['fict1']).intersection(set(row['fict2']))), 
                                  axis=1)
    return df

fiction_overlaps = count_overlapping_genres(cross_pubs)


In [155]:
def count_overlapping_nonfiction(df):

    def clean_genres(genre_list):
        if pd.isna(genre_list) or genre_list == 'None' or not genre_list:
            return []
        else:
            try:
                return genre_list.split(',')
            except AttributeError:
                # Handle cases where genre_list is not a string 
                # (e.g., if it's a list or another object)
                return [] 

    df['nonfict1'] = df['nonfict1'].astype(str).apply(clean_genres) 
    df['nonfict2'] = df['nonfict2'].astype(str).apply(clean_genres) 

    df['nonfiction_overlap'] = df.apply(lambda row: 
                                  len(set(row['nonfict1']).intersection(set(row['nonfict2']))), 
                                  axis=1)
    return df

final_cross_pubs = count_overlapping_nonfiction(fiction_overlaps)

Perfect. Now let's drop any unncessary variables and JUST keep both guests names, their types (agent or editor), and their combined lists of fiction and nonfiction.

In [156]:
del(cross_pubs, fiction_overlaps)
final_cross_pubs = final_cross_pubs.drop(['comp1', 'comp2', 'fict1', 'fict2', 'nonfict1', 'nonfict2', 'num_fict1', 'num_fict2', 'num_nonfict1', 'num_nonfict2'], axis=1)

Whoops - last thing I noticed is that we have some duplicates. Specifically, we have the same 80 rows duplicated, because we have them as agent-editor pairings, and then also as editor-agent pairings. Let's delete one of these sets.

<font color = 'red'>**NOTE**:</font> Even though there are duplicates, we actually want to keep these, since otherwise our code later on will have issues.

In [157]:
#final_cross_pubs = final_cross_pubs.loc[((final_cross_pubs['pubtype1']=='Editor') & (final_cross_pubs['pubtype2']=='Editor')) |
#                            ((final_cross_pubs['pubtype1']=='Editor') & (final_cross_pubs['pubtype2']=='Agent')) | 
#                            ((final_cross_pubs['pubtype1']=='Agent') & (final_cross_pubs['pubtype2']=='Agent')), :]

### Rank publishers

Amazing! Now we've gotten the number of unique genres in each of these combined fiction and nonfiction lists for all pairings of agent-agent, editor-editor, and editor-agent. This will help us more easily identify which agent-editor, editor-editor, and agent-agent pairings make the most sense, in terms of representing the most genres and being able to meet the greatest amount of participant needs for Friday's query letter critiques.

To better help us though, let's rank the pairings. We'll rank pairings according to the number of fiction and nonfiction genres represented between them.

In [158]:
final_cross_pubs[['pubtype1', 'pubtype2']].value_counts()


pubtype1  pubtype2
Agent     Editor      81
Editor    Agent       81
Agent     Agent       72
Editor    Editor      72
Name: count, dtype: int64

In [159]:
final_cross_pubs['rank_fiction'] = final_cross_pubs['num_combined_fict'].rank(ascending=False, method='dense')
final_cross_pubs['rank_nonfiction'] = final_cross_pubs['num_combined_nonfict'].rank(ascending=False, method='dense')

Let's also check to see what each publisher's average ranking is (it's clear some people who represent a ton of genres will average high). Note that this is the sum of a publisher's rankings across all their agent-editor, editor-editor, and agent-agent pairings.

In [160]:
ranks_per_pub = pd.DataFrame(final_cross_pubs.groupby('pubname1')['rank_fiction'].mean().reset_index())

Awesome! Okay, now let's add a few variables:
1) The ratio of total fiction represented across the pairing to the overlap shared between them (# combined fiction genres / # fiction genres overlapping)
2) The ratio of total nonfiction genres represented across the pairing to their overlap
3) The sum of these two ratios

We'll use these to help identify which pairings are best.

In [161]:
final_cross_pubs['ratio_fiction'] = final_cross_pubs['num_combined_fict'].div(final_cross_pubs['fiction_overlap'])
final_cross_pubs['ratio_nonfiction'] = final_cross_pubs['num_combined_nonfict'].div(final_cross_pubs['nonfiction_overlap'])
final_cross_pubs['sum_ratios'] = final_cross_pubs['ratio_fiction'] + final_cross_pubs['ratio_nonfiction']

Great! Now let's just add the avg ranking variable to the final_cross_pubs list.

In [162]:
final_pubs = final_cross_pubs.merge(ranks_per_pub, how= 'outer', on='pubname1')

In [163]:
final_pubs = final_pubs.rename(columns={'rank_fiction_y':'avg_pub_rank',
                                        'rank_fiction_x':'rank_fiction'})
final_cross_pubs = final_pubs
del(final_pubs)

### Pair the publishers for Friday

In this section, we'll use the final_cross_pubs dataset to pair the publishers for Friday's query critiques. We'll do so by starting with the publisher with the highest average ranking (meaning the agent or editor who had the least in common, on average, with all the other editors and agents). 

For the May 2025 conference, we'll start with Joelle. We'll select her five pairings with the lowest ratio sum. This means we'll be selecting the five other agents/editors who had the MOST overlap with her in both fiction and nonfiction genres (anyone with no overlap is being excluded entirely). We'll then sort this list by the publisher type and ratio sum. Since Joelle is an editor, this way we can prioritize picking among the agents first in this top 5 list, and then if there aren't any, we'll pair her with another editor.

Let's set this up.

In [164]:
# Create an empty list to save the pairings
selected_pairs =[]

# We'd like to exclude any same-type pairings, so we don't get agent-agent or editor-editor pairings
df = final_cross_pubs[final_cross_pubs['pubtype1'] != final_cross_pubs['pubtype2']]

# Let's also exclude any rows where the sum_ratio is Infinity - couldn't do this. Meant there weren't rows for certain pairings
#df = final_cross_pubs[np.isfinite(final_cross_pubs['sum_ratios'])]


In [165]:
top_pubname = df.loc[df['avg_pub_rank'].idxmax(), 'pubname1']

# Filter rows for this pubname1
filtered = df[df['pubname1'] == top_pubname]

pubtype1 = filtered.iloc[0]['pubtype1']
if pubtype1 == 'Editor':
    filtered = filtered.sort_values(
        by=['pubtype2', 'sum_ratios'], ascending=[True, True])
else:  # pubtype1 == 'Agent'
    filtered = filtered.sort_values(
        by=['pubtype2', 'sum_ratios'], ascending=[False, True])

check = filtered[['pubname1', 'pubtype1', 'pubname2', 'pubtype2', 'avg_pub_rank', 'sum_ratios']]
check2 = check.sort_values(by=['pubtype2', 'sum_ratios'], ascending=[False, True])

In [166]:
while not df.empty:
    # Identify the publisher with the highest rank (regardless of whether they are pubname1 or pubname2)
    top_pubname = df.loc[df['avg_pub_rank'].idxmax(), 'pubname1']

    # Filter rows for this pubname1
    filtered = df[df['pubname1'] == top_pubname]

    # Sort these rows dynamically based on `pubtype1` and `pubtype2`
    pubtype1 = filtered.iloc[0]['pubtype1']
    if pubtype1 == 'Editor':
        filtered = filtered.sort_values(by=['pubtype2', 'sum_ratios'], ascending=[True, True])
    if pubtype1 == 'Agent':
        filtered = filtered.sort_values(by=['pubtype2', 'sum_ratios'], ascending=[False, True])

    # Select the first row from the sorted list
    selected_row = filtered.iloc[0]
    selected_pairs.append(selected_row)

    # Drop all rows containing the selected `pubname1` or `pubname2`
    df = df[~df['pubname1'].isin([selected_row['pubname1'], selected_row['pubname2']]) &
            ~df['pubname2'].isin([selected_row['pubname1'], selected_row['pubname2']])]

    # Stop if we have 9 rows in the result
    if len(selected_pairs) == 9:
        break

# Step 4: Create a new DataFrame or list with the selected rows
result_df = pd.DataFrame(selected_pairs)
result_df = result_df[['pubname1', 'pubtype1', 'pubname2', 'pubtype2']]

# Print or save the result
print(result_df)

             pubname1 pubtype1             pubname2 pubtype2
104  Joëlle Delbourgo    Agent         Kurt Brackob   Editor
76        Jake Lovell    Agent            Grace Gay   Editor
187     Micah Brocker    Agent  Foyinsi Adegbonmire   Editor
145       Jéla Lewter   Editor      Paloma Hernando    Agent
6    Alexandria Brown   Editor       Renée Fountain    Agent
133   Jynastie Wilson    Agent          Dianna Vega   Editor
216  Monica Rae Brown   Editor  Jenna Satterthwaite    Agent
181     Lauren Bieker    Agent        Nicole Luongo   Editor
288       Vicky Weber    Agent           Wendy Wong   Editor


In [167]:
# Confirm we have 9 pairings
len(result_df)

9

### Assign the paired publishers their rooms

Woohoo! These all look good and make sense, so we're good to go for the room pairings for Friday's query letter critiques. Let's just assign these pairings to actual rooms now.

In [168]:
final_room_pairings_Friday = pd.concat([room_fr.reset_index(drop=True), result_df.reset_index(drop=True)], axis=1)

Amazing!! We are now officially done with determining Friday's pairings, and now we need to assign participants to the different time/room slots.

### Assign participants timeslots/publisher pairings for Friday

#### Create a dataset with a single row per participant and all their relevant activities

Let's first identify every participant who registered for a query letter critique on Friday.

In [169]:
query_critique_names = registered.loc[registered['Agenda Item Name'].str.contains('Query Letter Critique'), :]

Let's get a count of each email in this list, so we know the number of query letter critiques each person signed up for. Then we'll delete the original query_critique_names dataset.

In [170]:
queries = query_critique_names['Email'].value_counts().reset_index()
del(query_critique_names)

Importantly, for Friday's assignments, we can't assign people to agents/editors they're seeing on Saturday for a pitch or manuscript critique. In order to account for this, we also need to create datasets for the manuscript and pitches, so we can combine all three datasets later. 

Our goal is to create a single row per participant that lists any agents/editors they chose on Saturday, and to have know how many query letter critiques those people want.

In [171]:
pitches = registered.loc[registered['Agenda Item Name'].str.contains('Pitch'), :]

We need to extract out the publisher name from the Agenda Item Name column

In [172]:
import re
pitches['pubname'] = pitches['Agenda Item Name'].str.replace("Pitch [A-Z] with ", "", regex=True)
pitch = pitches[['Email', 'pubname']].value_counts().reset_index()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


In [173]:
print(pitch.loc[pitch['count']>1, :])

Empty DataFrame
Columns: [Email, pubname, count]
Index: []


<font color='red'>**NOTE:**</font> In an ideal world, there should be nobody printed above. Everyone should have a count of 1, since they can't meet with and pitch the same publisher multiple times for the same book. However, very, very rarely, someone will want to meet with a publisher twice to pitch them *different* books, so there can be counts of two or more.

We always want to confirm this with the participants though, to confirm that the double booking was intentional and not a registration error.

<font color='blue'>**UPDATE AFTER SPEAKING WITH GEORGE:**</font> This person (tk@tkread.com) DOES want 2 pitches with the same person. She has two different manuscripts to pitch.

Now that we've checked that, please note that people can sign up for up to three pitches (typically with 3 different agents/editors). We now need to create a combined variable per registrant that has ALL their pitch agents/editors.

In [174]:
pitchA = pitches.loc[pitches['Agenda Item Name'].str.contains("Pitch A with "), ['Email', 'pubname']]
pitchB = pitches.loc[pitches['Agenda Item Name'].str.contains("Pitch B with "), ['Email', 'pubname']]
pitchC = pitches.loc[pitches['Agenda Item Name'].str.contains("Pitch C with "), ['Email', 'pubname']]

pitchA = pitchA.rename(columns={'pubname':'pitchA'})
pitchB = pitchB.rename(columns={'pubname':'pitchB'})
pitchC = pitchC.rename(columns={'pubname':'pitchC'})

In [175]:
pitch2 = pd.merge(pd.merge(pitchA, pitchB, how='outer', on='Email'), pitchC, how='outer', on='Email')
pitch2.info()
# Reset anybody with the same values - in our case, we don't want this to happen, since the duplicated person is intentional
#pitch2.loc[(pitch2['pitchA'] == pitch2['pitchB']), 'pitchB'] = np.nan
#pitch2.loc[(pitch2['pitchA'] == pitch2['pitchC']), 'pitchC'] = np.nan
#pitch2.loc[(pitch2['pitchB'] == pitch2['pitchC']), 'pitchC'] = np.nan
#pitch2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 109 entries, 0 to 108
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Email   109 non-null    object
 1   pitchA  73 non-null     object
 2   pitchB  71 non-null     object
 3   pitchC  60 non-null     object
dtypes: object(4)
memory usage: 3.5+ KB


Now let's create a combined variable of everyone's chosen publishers for their pitch.

In [176]:
def combine_variables(row):
    return ', '.join(str(x) for x in row.dropna()) #convert to strings, drop Nas, and join.

pitch2['pitches_chosen_pubs'] = pitch2[['pitchA', 'pitchB', 'pitchC']].apply(combine_variables, axis=1)

Great! Now let's repeat this process for manuscript critiques.

In [177]:
ms = registered.loc[registered['Agenda Item Name'].str.contains('Manuscript'), :]

In [178]:
ms['pubname'] = ms['Agenda Item Name'].str.replace("Manuscript Critique [A-Z] with ", "", regex=True)
manuscript = ms[['Email', 'pubname']].value_counts().reset_index()
len(ms['Email'].unique()) == len(manuscript['Email'].unique())
len(ms['Email'].unique())

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


100

Cool. Nobody signed up for duplicate manuscript critiques.

In [179]:
msA = ms.loc[ms['Agenda Item Name'].str.contains("Manuscript Critique A with "), ['Email', 'pubname']]
msB = ms.loc[ms['Agenda Item Name'].str.contains("Manuscript Critique B with "), ['Email', 'pubname']]
msC = ms.loc[ms['Agenda Item Name'].str.contains("Manuscript Critique C with "), ['Email', 'pubname']]

msA = msA.rename(columns={'pubname':'msA'})
msB = msB.rename(columns={'pubname':'msB'})
msC = msC.rename(columns={'pubname':'msC'})

In [180]:
manuscript = pd.merge(pd.merge(msA, msB, how='outer', on='Email'), msC, how='outer', on='Email')


In [181]:
manuscript['ms_chosen_pubs'] = manuscript[['msA', 'msB', 'msC']].apply(combine_variables, axis=1)

In [182]:
del(pitch, pitches)
queries = queries.rename(columns={'count': 'num_query_critiques'})

Woohoo! Okay, now it's time to merge the pitch and the manuscript info, and then link it back to the query critiques as well, so we have the full list of participants with all of their chosen editors, and whether or not they have any query letter critiques.

In [183]:
merge1 = pd.merge(manuscript, pitch2, how='outer', on='Email')[['Email', 'pitchA', 'pitchB', 'pitchC', 'msA', 'msB', 'msC', 'pitches_chosen_pubs', 'ms_chosen_pubs']]
email_set = set(queries['Email'].dropna())
merge2 = pd.merge(merge1, queries, how='outer', on='Email')
merge2['query_critique'] = merge2['Email'].apply(lambda email: email in email_set if pd.notna(email) else False)

In [184]:
del(merge1, pitch2, manuscript, queries, room_fr, room_sat, selected_row, email_set, pubtype1, selected_pairs, top_pubname)

Perrrfect. Last step is to create a dataset with one row per email, which has their fiction and non-fiction genres, as well as if they're virtual or remote. We'll then join this to our dataset above.

In [185]:
per_registrant = registered.drop_duplicates(subset='Email', keep='first')[['Email', 'Virtual', 'Fiction genre', 'Nonfiction genre', 'datetime']]
per_registrant['Virtual'] = per_registrant['Virtual'].replace(['Virtually via Zoom (only available for query letter critiques, manuscript sample critiques, and pitches)', 'In person at the conference hotel'],
                                                              ['Virtual', 'In person'])

In [186]:
print(per_registrant['Virtual'].value_counts())

Virtual
In person                                                      150
Virtual                                                         20
Only doing the pre-conference edit (which will be by email)      1
Name: count, dtype: int64


Now we'll rename the Email Address to Email, and then we'll merge the dataframes to get one big one with all participants who registered for any of the three main activities: query letter critiques, manuscripts, or pitches.

In [187]:
per_registrant2 = pd.merge(per_registrant, merge2, how='outer', on='Email')

#### Create 3 different datasets: one per query letter critiques, MS critiques, and pitches
Before doing any scheduling, we need to create 3 different datasets for these three different activities, so we can easily schedule them below in their respective sections.

In [188]:
ms_critiques = per_registrant2.loc[pd.notna(per_registrant2['ms_chosen_pubs']), ['Email', 'Virtual', 'ms_chosen_pubs', 'msA', 'msB', 'msC']]

In [189]:
pitches = per_registrant2.loc[pd.notna(per_registrant2['pitches_chosen_pubs']), ['Email', 'Virtual', 'pitches_chosen_pubs', 'pitchA', 'pitchB', 'pitchC']]

In [190]:
query_critiques = per_registrant2.loc[per_registrant2['query_critique']==True, :]

In [191]:
del(per_registrant,merge2, df, filtered, check, check2)

Lastly, before we finalize this dataset, we also need to account for people who signed up for the Friday workshop from 4-6pm. These people need to be assigned query letter critiques prior to 4pm.

In [192]:
fri_workshop = registered[registered['Agenda Item Name']=='Friday Workshop- Writer Beware: How Writers Can Protect Themselves']
query_critiques['Friday_workshop'] = query_critiques['Email'].isin(fri_workshop['Email'])

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


The below will adjust the query manuscripts dataset so that there's as many rows per person as there are counts for them. This way, anyone who signed up for two query critiques will have two rows.

In [193]:
def expand_dataframe(df, id_col, count_col):
    rows = []
    for _, row in df.iterrows():
        count = int(row[count_col])  # Convert float count to integer
        for _ in range(count):
            rows.append(row.drop(count_col).to_dict())
    return pd.DataFrame(rows)

expanded_query_critiques = expand_dataframe(query_critiques, 'Email', 'num_query_critiques')


Final check: are any of the timekeepers doing query critiques?

In [194]:
len(query_critiques.loc[query_critiques['Email'].isin(timekeepers['Email']), :])

0

Sweet! None of our timekeepers are doing query letter critiques, so that makes things easy. That said, we don't have to worry about accounting for them when scheduling the Friday critiques at all then, and it's irrelevant for scheduling for the manuscripts and pitches (for those, we only care that they're serving as timekeepers, but not who they're pitching, etc.).

#### Get frequencies of fiction and nonfiction genres among registrants

Before we move on to scheduling, there's ONE final step: getting the different combinations of fiction and nonfiction genres among our registrants, to see which are most popular/least popular, so we can do our best to match agent-editor pairings that will meet everyone's needs.

We likely won't do much with this information, but it's nice to see.

In [195]:
def count_genres(row):
    genres = row['Fiction genre'].split(', ')
    unique_genres = set(genres)
    for genre in unique_genres:
        genre_counts[genre] = genre_counts.get(genre, 0) + 1

# Initialize genre_counts
genre_counts = {}

# Apply the function to each row
per_registrant2.apply(count_genres, axis=1)

# Convert genre_counts to DataFrame
reg_fict_counts = pd.DataFrame(list(genre_counts.items()), columns=['fiction', 'registrant_fiction_counts'])

In [196]:
del(genre_counts, count_genres)

In [197]:
def count_genres(row):
    genres = row['Nonfiction genre'].split(', ')
    unique_genres = set(genres)
    for genre in unique_genres:
        genre_counts[genre] = genre_counts.get(genre, 0) + 1

# Initialize genre_counts
genre_counts = {}

# Apply the function to each row
per_registrant2.apply(count_genres, axis=1)

# Convert genre_counts to DataFrame
reg_nonfict_counts = pd.DataFrame(list(genre_counts.items()), columns=['nonfiction', 'registrant_nonfiction_counts'])

Because of some weirdness with the other writeins (for instance "Other (please specify): songs, movie and tv scripts"), there are some nonfiction genres popping up that shouldn't be. We'll filter out anything that isn't our true genre lists.

In [198]:
reg_nonfict_counts = reg_nonfict_counts.loc[reg_nonfict_counts['nonfiction'].isin(nonfict_gen['list_nonfiction']), :]
reg_fict_counts = reg_fict_counts.loc[reg_fict_counts['fiction'].isin(fict_gen['fiction_genres']), :]

Cool, let's now get expanded counts for each of the types in the cross_pubs listing too, so we can cross-tabulate that with the datasets above.

In [199]:
# Now sure why this is happening, but we need to remove white spaces at teh beginning of the series
final_cross_pubs['combined_fiction'] = final_cross_pubs['combined_fiction'].str.lstrip()
final_cross_pubs['combined_nonfiction'] = final_cross_pubs['combined_nonfiction'].str.lstrip()

In [200]:
def count_genres(row):
    genres = row['combined_fiction'].split(', ')
    unique_genres = set(genres)
    for genre in unique_genres:
        genre_counts[genre] = genre_counts.get(genre, 0) + 1

# Initialize genre_counts
genre_counts = {}

# Apply the function to each row
final_cross_pubs.apply(count_genres, axis=1)

# Convert genre_counts to DataFrame
fiction_counts = pd.DataFrame(list(genre_counts.items()), columns=['fiction', 'fiction_count'])


In [201]:
del(count_genres, genre_counts, nonfict_gen, fict_gen)

In [202]:
def count_genres(row):
    genres = row['combined_nonfiction'].split(', ')
    unique_genres = set(genres)
    for genre in unique_genres:
        genre_counts[genre] = genre_counts.get(genre, 0) + 1

# Initialize genre_counts
genre_counts = {}

# Apply the function to each row
cross_pubs_filtered = final_cross_pubs[final_cross_pubs['combined_nonfiction'].notnull()] 
cross_pubs_filtered.apply(count_genres, axis=1)

# Convert genre_counts to DataFrame
nonfiction_counts = pd.DataFrame(list(genre_counts.items()), columns=['nonfiction', 'nonfiction_count'])


Now let's create a dataset with the participant, as well as publisher-pairing, genre count info.

In [203]:
all_fiction_genre_info = pd.merge(fiction_counts, reg_fict_counts, how='outer', on='fiction')
all_nonfiction_genre_info = pd.merge(nonfiction_counts, reg_nonfict_counts, how='outer', on='nonfiction')

In [204]:
all_fiction_genre_info.to_excel("Outputs/Frequencies_fiction.xlsx", index=False)
all_nonfiction_genre_info.to_excel("Outputs/Frequencies_nonfiction.xlsx", index=False)

Whew! We're finally done with that. You can manually review the files output above to help figure out the agent-editor pairings, or double check them, or just to see which genres in fiction and nonfiction are really popular. But otherwise, we're good to move on to actually assigning participants to their Friday timeslots.

In [205]:
final_cross_pubs.to_excel(f"Outputs/Finalized datasets/publisher_pair_rankings_{today}.xlsx", index=False)

#### Assign participants to their Friday timeslots

Note that in the prior sections of code, we created:

1) The rooms for Friday with the agents/editors assigned to them (final_room_pairings_Friday)
2) The list of all time slots and rooms for Friday (tslist_fri)
3) The list of all participants who signed up for a query letter critique (expanded_query_critiques), which has multiple rows per person - one for the number of queries they signed up for

Before moving on though, let's join #1 and #2.

In [206]:
times_friday = tslist_fri.merge(final_room_pairings_Friday, on=['day', 'room_name'], how='outer').sort_values(['timeslot_start', 'room_name'])

Let's also bring in the information on what combined genres those publisher pairings represent

In [207]:
times_friday2 = times_friday.merge(final_cross_pubs[['pubname1', 'pubname2', 'combined_fiction', 'combined_nonfiction']], on=['pubname1', 'pubname2'], how='inner')

In [208]:
del(tslist_fri, times_friday, reg_nonfict_counts, reg_fict_counts, result_df, rooms_friday, nonfiction_counts, fiction_counts, all_fiction_genre_info, all_nonfiction_genre_info)

Sweet! Now let's begin the assignments.

In this section of code, we will assign participants to an agent-editor pairing for whom they're not pitching or doing a manuscript critique for (if any), and that represents at least one of the genres (fiction and/or nonfiction) that the registrant writes in.

Note that we will not schedule anyone back-to-back, that we will prioritize virtual people for the first sessions (and prioritize any virtual people to be followed by virtual people), and that we will also prioritize anyone who signed up for the Friday workshop for the earlier sessions.

In [209]:
# First, we need to create lists for anything separated by a comma
times_friday2['combined_nonfiction'] = times_friday2['combined_nonfiction'].apply(lambda x: [genre.strip() for genre in x.split(',')] if isinstance(x, str) else [genre.strip() for genre in x] if isinstance(x, list) else [])
times_friday2['combined_fiction'] = times_friday2['combined_fiction'].apply(lambda x: [genre.strip() for genre in x.split(',')] if isinstance(x, str) else [genre.strip() for genre in x] if isinstance(x, list) else [])

expanded_query_critiques['nonfiction_genre'] = expanded_query_critiques['Nonfiction genre'].apply(lambda x: [genre.strip() for genre in x.split(',')] if isinstance(x, str) else [genre.strip() for genre in x] if isinstance(x, list) else [])
expanded_query_critiques['fiction_genre'] = expanded_query_critiques['Fiction genre'].apply(lambda x: [genre.strip() for genre in x.split(',')] if isinstance(x, str) else [genre.strip() for genre in x] if isinstance(x, list) else [])
expanded_query_critiques['pitches_chosen_pubs'] = expanded_query_critiques['pitches_chosen_pubs'].apply(lambda x: [genre.strip() for genre in x.split(',')] if isinstance(x, str) else [genre.strip() for genre in x] if isinstance(x, list) else [])
expanded_query_critiques['ms_chosen_pubs'] = expanded_query_critiques['ms_chosen_pubs'].apply(lambda x: [genre.strip() for genre in x.split(',')] if isinstance(x, str) else [genre.strip() for genre in x] if isinstance(x, list) else [])


In [210]:
# Combine the publishers into a single list
expanded_query_critiques['chosen_pubs'] = expanded_query_critiques['pitches_chosen_pubs'] + expanded_query_critiques['ms_chosen_pubs']

Let's save this dataset:

In [211]:
expanded_query_critiques.to_excel(f"Outputs/Finalized datasets/Expanded_query_critiques-ds prior to assignment_{today}.xlsx", 
                                columns=['Email', 'Virtual', 'pitchA', 'pitchB', 'pitchC', 'msA', 'msB', 'msC', 'pitches_chosen_pubs', 'ms_chosen_pubs', 'query_critique', 'Friday_workshop', 'fiction_genre', 'nonfiction_genre', 'chosen_pubs'], 
                                index=False)

For assignment purposes, we're going to prioritize people according to the following:
1) Virtual
2) How many publishers they signed up with for manuscript critiques and/or pitches
3) Friday workshop attendees
4) Registration date 

In [212]:
# Sort participants by prioritization criteria
expanded_query_critiques['chosen_pubs_count'] = expanded_query_critiques['chosen_pubs'].apply(len)
expanded_query_critiques.sort_values(
    by=['Virtual', 'chosen_pubs_count', 'Friday_workshop', 'datetime'],
    ascending=[False, False, True, True],
    inplace=True
)

In [213]:
# Initialize an assignment dictionary
assignments = []

# Convert the 'timeslot_start' to datetime variable
times_friday2['timeslot_start'] = pd.to_datetime(date_str_fri + ' ' + times_friday2['timeslot_start'].astype(str))

# Adjust the times to represent the afternoon (add 12 hours if in AM range)
times_friday2['timeslot_start'] = times_friday2['timeslot_start'].apply(
    lambda x: x + pd.Timedelta(hours=12) if x.hour < 12 else x
)

from datetime import timedelta


In [214]:
assignments = []

# Create a new dataset with all the people:
participants_df = expanded_query_critiques.copy()
slots_df = times_friday2

# Helper function to check slot compatibility
def is_slot_compatible(participant, slot):

    # Check if at least one genre matches
    participant_fiction_genres = set(participant['fiction_genre'])
    slot_fiction_genres = set(slot['combined_fiction'])

    participant_nonfiction_genres = set(participant['nonfiction_genre'])
    slot_nonfiction_genres = set(slot['combined_nonfiction'])

    if not (participant_fiction_genres & slot_fiction_genres or participant_nonfiction_genres & slot_nonfiction_genres):
        print(f"Incompatible due to genres.") 
        print(f"Participant fiction genres: {participant_fiction_genres}")
        print(f"Publisher fiction genres: {slot_fiction_genres}")
        print(f"Participant nonfiction genres: {participant_nonfiction_genres}") 
        print(f"Publisher nonfiction genres: {slot_nonfiction_genres}")
        return False

    # Check publisher overlap
    participant_pubs = set(participant['chosen_pubs'])
    slot_pubs = {slot['pubname1'], slot['pubname2']}
    if participant_pubs.intersection(slot_pubs):
        print(f"Incompatible due to publisher overlap. Participant publishers: {participant_pubs}, Slot publishers: {slot_pubs}")
        return False

    # Check if they're a workshop person and don't assign for 4pm or later
    if participant['Friday_workshop'] and slot['timeslot_start'].hour >= 16:
        return False

    return True

# Define a function to perform the assignment process
def assign_participants(participants_df, slots_df, seed):
    # Shuffle slots_df with the given seed
    randomized_slots = slots_df.sample(frac=1, random_state=seed).reset_index(drop=True)
    assignments_local = []

    # Create a copy of participants_df to modify
    remaining_participants = participants_df.copy()

    for _, slot in randomized_slots.iterrows():
        if remaining_participants.empty:
            break  # Exit if all participants are assigned

        for index, participant in remaining_participants.iterrows():
            # Skip participants already assigned to conflicting slots
            assigned_slots = [a['timeslot_start'] for a in assignments_local if a['Email'] == participant['Email']]
            if any(abs(slot['timeslot_start'] - assigned) <= timedelta(minutes=15) for assigned in assigned_slots):
                continue

            # Check slot compatibility
            if is_slot_compatible(participant, slot):
                assignments_local.append({
                    'Email': participant['Email'],
                    'timeslot_start': slot['timeslot_start'],
                    'room_name': slot['room_name'],
                    'pubname1': slot['pubname1'],
                    'pubname2': slot['pubname2'],
                    'participant_fiction_genre': ', '.join(participant['fiction_genre']),
                    'participant_nonfiction_genre': ', '.join(participant['nonfiction_genre']),
                    'publisher_fiction_genre': slot['combined_fiction'],
                    'publisher_nonfiction_genre': slot['combined_nonfiction'],
                    'workshop': participant['Friday_workshop'],
                    'virtual': participant['Virtual']
                })

                # Remove the assigned participant row
                remaining_participants.drop(index, inplace=True)
                break

    return assignments_local, remaining_participants

# Attempt to assign participants with different seeds until successful
seed = 0
while True:
    print(f"Trying seed: {seed}")
    assignments, remaining = assign_participants(participants_df, slots_df, seed)
    
    if remaining.empty:  # All participants assigned
        print(f"All participants successfully assigned with seed: {seed}")
        break

    print(f"Unassigned participants remain with seed: {seed}, retrying...")
    seed += 1  # Increment the seed for the next iteration

# Convert assignments to a DataFrame
assignments_df = pd.DataFrame(assignments)

# Output
print(assignments_df)

Trying seed: 0
Incompatible due to publisher overlap. Participant publishers: {'Paloma Hernando', 'Alexandria Brown', 'Lauren Bieker', 'Foyinsi Adegbonmire', 'Jynastie Wilson'}, Slot publishers: {'Paloma Hernando', 'Jéla Lewter'}
Incompatible due to publisher overlap. Participant publishers: {'Paloma Hernando', 'Micah Brocker', 'Jenna Satterthwaite', 'Dianna Vega', 'Vicky Weber', 'Renée Fountain'}, Slot publishers: {'Paloma Hernando', 'Jéla Lewter'}
Incompatible due to publisher overlap. Participant publishers: {'Paloma Hernando', 'Micah Brocker', 'Jenna Satterthwaite', 'Dianna Vega', 'Vicky Weber', 'Renée Fountain'}, Slot publishers: {'Paloma Hernando', 'Jéla Lewter'}
Incompatible due to publisher overlap. Participant publishers: {'Jéla Lewter', 'Wendy Wong', 'Jenna Satterthwaite', 'Lauren Bieker', 'Vicky Weber', 'Renée Fountain'}, Slot publishers: {'Paloma Hernando', 'Jéla Lewter'}
Incompatible due to publisher overlap. Participant publishers: {'Jéla Lewter', 'Wendy Wong', 'Jenna Sat

In [215]:
print(len(participants_df))
print(len(expanded_query_critiques)== len(assignments_df))

65
True


Woohoo! That looks amazing, and all participants got slotted! Let's just add in some checks to flag participants who have bad criteria.

In [216]:
# Flag participants with a Friday workshop and slots from 4-5pm

assignments_df['Flag'] = assignments_df.apply(
    lambda x: (x['workshop']) and (x['timeslot_start'].hour >= 16),  # After 4pm check
    axis=1
)

In [217]:
print(assignments_df['Flag'].value_counts())

Flag
False    65
Name: count, dtype: int64


Okay, awesome! We're good to go, and now let's just print out the csv file with all the assignments, and also save the dataset as a final (better named) dataset.

In [218]:
final_friday_assignments = assignments_df

del(assignments_df, participants_df, slots_df, query_critiques,ranks_per_pub, seed, clean_genres, combine_variables, expand_dataframe, count_genres,
    count_overlapping_genres, count_overlapping_nonfiction, is_slot_compatible, genre_counts, assignments)

Lastly, create a column called 'publisher' that is a merging of the two publishers names, and also add a variable 'Session' that says 'Query Letter Critiques'. Oh, and add in a variable for 'Timekeeper', which is True/False depending on if the participant is a timekeeper that day or not.

In [219]:
final_friday_assignments['publisher'] = final_friday_assignments['pubname1'] + " and " + final_friday_assignments['pubname2']
final_friday_assignments['Session'] = "Query Letter Critiques"
final_friday_assignments['Timekeeper'] = final_friday_assignments['Email'].isin(timekeepers['Email'])


Lastly, let's link in the first and last names,as well as phone numbers.

In [220]:
final_friday_assignments2 = pd.merge(final_friday_assignments, registered[['Email', 'First Name', 'Last Name', 'phone']].drop_duplicates(), on="Email", how="inner")

In [221]:
final_friday_assignments.to_excel(f"Outputs/Finalized datasets/Final_Friday_query_letter_critique_assignments_{today}.xlsx", index=False)

Sweet. Now we're good to go on Friday, so let's move on to Saturday assignments, which are easier. For these, the publishers have already been assigned to their own rooms, and our participants signed up to meet with specific agents and editors, so we just need to assign them to specific times.

## 3. Friday Author Coaching

Starting for May 2025, George added in 'Author coaching' as one of the types of sessions participates could select when registering. These are scheduled AROUND the query letter critiques (if relevant), and so as not to coincide with the Friday workshop at 4pm (if relevant).

Now that we've assigned everyone above, we now need to:
1) Create a list of anyone who's signed up for coaching
2) Add in information about what times they have their query critiques (if applicable)
3) Add in information about whether they're in the Friday workshop (add those two times, 4:00 and 4:15, as times to exclude).

We also need to load the times and room information.

In [222]:
reg_coaching = registered.loc[registered['Agenda Item Name'].str.contains("Coach"), :]
reg_coaching['Friday_workshop'] = reg_coaching['Email'].isin(fri_workshop['Email'])
reg_coaching['QLC'] = reg_coaching['Email'].isin(final_friday_assignments2['Email'])

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Okay, we need to update the timeslots thing so that they are datetimes.

In [223]:
# Convert the 'timeslot_start' to datetime variable
tslist_coach['timeslot_start'] = pd.to_datetime(date_str_fri + ' ' + tslist_coach['timeslot_start'].astype(str))

# Adjust the times to represent the afternoon (add 12 hours if in AM range)
tslist_coach['timeslot_start'] = tslist_coach['timeslot_start'].apply(
    lambda x: x + pd.Timedelta(hours=12) if x.hour < 12 else x
)

from datetime import timedelta


In [224]:
# We need to create an alternate version of the friday assignments dataset so that we can add buffer times for our purposes here:
publisher_meetings = final_friday_assignments2.copy()
publisher_meetings['timeslot_end'] = publisher_meetings['timeslot_start'] + timedelta(minutes=15)
publisher_meetings['buffer_start'] = publisher_meetings['timeslot_start'] - timedelta(minutes=15)
publisher_meetings['buffer_end'] = publisher_meetings['timeslot_end'] + timedelta(minutes=15)


In [225]:
# We also need to add end times for the coaching meetings. Note that they're technically 15 minutes, but we'll ignore that and pretend they're 17 since there's a two minute break between them
tslist_coach['timeslot_end'] = tslist_coach['timeslot_start'] + timedelta(minutes=17)

In [226]:
coaching_timeslots = tslist_coach.copy()

def assign_coaching_meetings(seed):
    # Shuffle timeslots with the given seed
    randomized_timeslots = coaching_timeslots.sample(frac=1, random_state=seed).reset_index(drop=True)
    coaching_schedule = []
    remaining_participants = reg_coaching.copy()

    for _, participant in remaining_participants.iterrows():
        Email = participant['Email']
        selected_coach = participant['Agenda Item Name']
        workshop_flag = participant['Friday_workshop']
        
        # Exclude slots that conflict with publisher meetings
        valid_slots = randomized_timeslots[~randomized_timeslots.apply(
            lambda slot: any(
                (publisher_meetings['buffer_start'] <= slot['timeslot_start']) &
                (slot['timeslot_start'] < publisher_meetings['buffer_end'])
            ), axis=1
        )]

        # Exclude slots after 4:00 PM for workshop participants
        if workshop_flag:
            valid_slots = valid_slots[valid_slots['timeslot_start'].dt.hour < 16]
        
        # Assign the first valid slot
        if not valid_slots.empty:
            chosen_slot = valid_slots.iloc[0]
            coaching_schedule.append({
                'Email': Email,
                'Session': selected_coach,
                'timeslot_start': chosen_slot['timeslot_start'],
                'publisher': chosen_slot['coach'],
                'room_name': chosen_slot['room_name']
            })

            # Remove the chosen slot to prevent double-booking
            randomized_timeslots.drop(valid_slots.index[0], inplace=True)
        else:
            # If no valid slots remain, return the incomplete schedule
            return pd.DataFrame(coaching_schedule), remaining_participants

    return pd.DataFrame(coaching_schedule), pd.DataFrame()  # Return schedule and empty DataFrame if all assigned

# Try different seeds until all participants are successfully assigned
seed = 0
while True:
    print(f"Trying seed: {seed}")
    coaching_schedule, unassigned = assign_coaching_meetings(seed)
    
    if unassigned.empty:
        print(f"All participants successfully assigned with seed: {seed}")
        break

    print(f"Unassigned participants remain with seed: {seed}, retrying...")
    seed += 1

# Output the final coaching schedule
#print(coaching_schedule)

Trying seed: 0
All participants successfully assigned with seed: 0


In [227]:
# Check that the numbers match
len(coaching_schedule) == len(reg_coaching)

True

In [228]:
del(coaching_timeslots, assign_coaching_meetings, assign_participants)

In [229]:
# Add in the first and last names
coaching_schedule = pd.merge(coaching_schedule, registered[['Email', 'First Name', 'Last Name', 'phone']].drop_duplicates(), on="Email", how="inner")

In [230]:
# Save this scheduling
coaching_schedule.to_excel(f"Outputs/Finalized datasets/Finalized coaching schedule_{today}.xlsx", index=False)

## 3. Saturday Morning - Manuscript critique scheduling
This is the easiest scheduling assignment. Everyone has already signed up for their critiques, so we just need to make sure:

1) Nobody is scheduled back-to-back (for anyone with multiple)
2) Timekeepers aren't first or last
3) Virtual participants are grouped back-to-back (we will prioritize them for the first time slots per room).

Ideally, we also try to ensure that there's only one substitute for each time slot, though we have plenty of substitute timekeepers. This can just be a manual check and fix later on.

In [231]:
# Assign the individual publishers to their respective rooms for Saturday morning (MS) and Saturday afternoon (Pitches)
times_sat = pd.concat([rooms_saturday.reset_index(drop=True), pubs.reset_index(drop=True)], axis=1)

tslist_satmorn= pd.merge(timeslots.loc[(timeslots['day']=='Saturday') & (timeslots['day_session']=='Morning'), :], times_sat, how='outer', on='day')
tslist_sataft= pd.merge(timeslots.loc[(timeslots['day']=='Saturday') & (timeslots['day_session']=='Afternoon'), :], times_sat, how='outer', on='day')

In [232]:
final_saturday_rooms = pd.concat([rooms_saturday.reset_index(drop=True), pubs.reset_index(drop=True)], axis=1)
final_saturday_rooms.to_excel(f"Outputs/Finalized datasets/Final saturday rooms_{today}.xlsx")

We need to create a single dataset for the manuscript critiques where every person has a row for their critique (as in, a person can have up to three rows).

In [233]:
del(msA, msB, msC, pitchA, pitchB, pitchC) # delete these - had originally kept for this but need virtual info
msA = ms_critiques[['Email', 'Virtual', 'msA']]
msB = ms_critiques[['Email', 'Virtual', 'msB']]
msC = ms_critiques[['Email', 'Virtual', 'msC']]

In [234]:
msA = msA.rename(columns={'msA': 'publisher'})
msB = msB.rename(columns={'msB': 'publisher'})
msC = msC.rename(columns={'msC': 'publisher'})

ms_all = pd.merge(pd.merge(msA, msB, on=['Email', 'Virtual', 'publisher'], how="outer"), msC, on=['Email', 'Virtual', 'publisher'], how='outer')

Drop any rows with NaN

In [235]:
ms_all = ms_all.dropna()

In [236]:
del(msA, msB, msC)

Okay, now we just need to convert the timeslot_start to a timestamp variable

In [237]:
# Convert the 'timeslot_start' to datetime variable
tslist_satmorn['timeslot_start'] = pd.to_datetime(date_str_sat + ' ' + tslist_satmorn['timeslot_start'].astype(str))

In [238]:
# Convert the 'timeslot_start' to datetime variable
tslist_sataft['timeslot_start'] = pd.to_datetime(date_str_sat + ' ' + tslist_sataft['timeslot_start'].astype(str))

# Adjust the times to represent the afternoon (add 12 hours if in AM range)
tslist_sataft['timeslot_start'] = tslist_sataft['timeslot_start'].apply(
    lambda x: x + pd.Timedelta(hours=12) if x.hour < 12 else x
)

Let's also identify the timekeepers' emails. We'll make sure not to give them the first or last time slot.

In [239]:
timekeeps = timekeepers[['Email']].drop_duplicates()
timekeeps = timekeeps['Email'].tolist()

Whew! Okay, now it's time to assign the participants for the manuscript critiques. The code below works by:

1) It fills alphabetically by the publisher name, so that Alexandria Brown gets all her timeslots filled first, before moving on to the next publisher in the alphabet. **NOTE**: I have it randomly filling time slots. It's not running by earliest time to latest time.

2) It prioritizes assignment of participants according to how many manuscript critique slots they still need to be assigned. This means that for the first time slot it tries filling, it'll prioritize people with 3 critiques, then 2, then 1. As it continues to iterate and participants get assigned slots, a participant who initially had 3 meetings but who was already scheduled for 2 (meaning n_remaining=1) will get less priority over participants still with 2 or three meetings needing assignment.

3) I randomly shuffled the participants within their priority groups. This means that participant emails are randomly ordered in the A) three remaining group, B) two remaining and C) one remaining group. This way we don't prioritize people according to the alphabetical ordering of their emails but just do random assignments. (I had implemented this because I had noticed initially that a lot of the T-Z emails weren't being assigned as readily).

<font color='red'>**BIGGEST NOTE**:</font>
This entire code is embedded within one giant function because I'm having it run this code repeatedly using different random seeds, until it finds the seed that ensures that ALL participants get assigned time slots. Then it stops and that's the seed number that's kept.

In [240]:
import random

# Create copies of the datasets to use in the function, since we drop participants as we go
slots_df = tslist_satmorn
participants_df = ms_all.copy()

# Get the earliest and latest timeslots
earliest_time = slots_df['timeslot_start'].min()
latest_time = slots_df['timeslot_start'].max()

# Define a function to perform the assignment process
def assign_slots_with_seed(participants_df, slots_df, seed):

    # Add a column to flag timekeepers in the participants dataset
    participants_df['is_timekeeper'] = participants_df['Email'].isin(timekeeps)

    # Create the blank datasets and lists for the assignments and used-up slots
    assignments = []
    used_slots = set()

    # Add a column to track the number of meetings each participant needs
    participants_df['remaining_meetings'] = participants_df.groupby('Email')['Email'].transform('count')

    # Repeat until all participants are assigned or no more slots remain
    while not participants_df.empty:
        assigned_any = False

        for _, slot in slots_df.iterrows():
            room_slot_id = (slot['timeslot_start'], slot['room_name'])

            if room_slot_id in used_slots:
                continue

            if participants_df.empty:
                break

            sorted_participants = (
                participants_df
                .sample(frac=1, random_state=seed)  # Shuffle randomly
                .sort_values(by='remaining_meetings', ascending=False)
            )

            for index, participant in sorted_participants.iterrows():

                # Skip back-to-back assignments
                assigned_slots = [
                    (a['timeslot_start'], a['room_name']) for a in assignments if a['Email'] == participant['Email']
                ]
                if any(
                    abs(slot['timeslot_start'] - assigned_time) <= timedelta(minutes=15)
                    for assigned_time, _ in assigned_slots
                ):
                    continue

                
                # Skip the earliest and latest timeslots for timekeepers if possible
                if participant['is_timekeeper'] and slot['timeslot_start'] in [earliest_time, latest_time]:
                    # Check if there are other slots available for this participant
                    has_alternative = any(
                        set(participant['publisher']) == set(alt_slot['lit_guest_name']) and
                        alt_slot['timeslot_start'] not in [earliest_time, latest_time] and
                        (alt_slot['timeslot_start'], alt_slot['room_name']) not in used_slots
                        for _, alt_slot in slots_df.iterrows()
                    )
                    if not has_alternative:
                        continue

                if set(participant['publisher']) == set(slot['lit_guest_name']):
                    assignments.append({
                        'Email': participant['Email'],
                        'timeslot_start': slot['timeslot_start'],
                        'room_name': slot['room_name'],
                        'publisher': slot['lit_guest_name'],
                        'virtual': participant['Virtual'],
                        'Session': "Manuscript critique",
                        'Timekeeper': participant['is_timekeeper']
                    })
                    used_slots.add(room_slot_id)
                    participants_df.drop(index, inplace=True)
                    participants_df['remaining_meetings'] = participants_df.groupby('Email')['Email'].transform('count')
                    assigned_any = True
                    break

        if not assigned_any:
            break

    return assignments, participants_df

# Initialize variables
success = False
max_attempts = 1000  # Limit the number of attempts
seed = 0

while not success and seed < max_attempts:
    seed += 1
    print(f"Trying seed {seed}...")
    
    # Copy the original dataframes to avoid modifying them directly
    participants_copy = ms_all.copy()
    slots_copy = tslist_satmorn.copy()

    # Run the assignment process with the current seed
    assignments, remaining_participants = assign_slots_with_seed(participants_copy, slots_copy, seed)

    # Check if all participants were assigned
    if remaining_participants.empty:
        success = True
        print(f"Success! All participants assigned using seed {seed}.")
        break

if success:
    # Convert assignments to a DataFrame
    assignments_df = pd.DataFrame(assignments)
else:
    print("Failed to assign all participants within the maximum number of attempts.")


Trying seed 1...
Trying seed 2...
Trying seed 3...
Success! All participants assigned using seed 3.


Yay! This code works beautifully!! Everyone's been assigned and now let's just do a little cleaning, then repeat the process for the Saturday afternoon pitches.

In [241]:
final_satmorn_assignments = assignments_df

Let's just add in the first and last names, plus phones.

In [242]:
final_satmorn_assignments2 = pd.merge(final_satmorn_assignments, registered[['Email', 'First Name', 'Last Name', 'phone']].drop_duplicates(), on='Email', how='inner')

In [243]:
del(assignments_df, ms_all, ms_critiques, participants_copy, slots_copy, slots_df, remaining_participants,
    assignments, earliest_time, latest_time, seed, success, assign_slots_with_seed, final_friday_assignments, final_satmorn_assignments)

In [244]:
#  Save the dataset
final_satmorn_assignments2.to_excel(f"Outputs/Finalized datasets/Final manuscript critique assignments_{today}.xlsx", index=False)

## 4. Saturday Afternoon - Pitches

Let's do the pitch assignments now! We'll do the exact same process, except using the saturday afternoon times and the pitch dataset.

In [245]:
pitchA = pitches[['Email', 'Virtual', 'pitchA']]
pitchB = pitches[['Email', 'Virtual', 'pitchB']]
pitchC = pitches[['Email', 'Virtual', 'pitchC']]

In [246]:
pitchA = pitchA.rename(columns={'pitchA': 'publisher'})
pitchB = pitchB.rename(columns={'pitchB': 'publisher'})
pitchC = pitchC.rename(columns={'pitchC': 'publisher'})

pitches_all = pd.merge(pd.merge(pitchA, pitchB, on=['Email', 'Virtual', 'publisher'], how="outer"), pitchC, on=['Email', 'Virtual', 'publisher'], how='outer')

In [247]:
pitches_all = pitches_all.dropna()
del(pitchA, pitchB, pitchC)

In [248]:
import random

# Get the earliest and latest timeslots
earliest_time = tslist_sataft['timeslot_start'].min()
latest_time = tslist_sataft['timeslot_start'].max()

# Define a function to perform the assignment process
def assign_slots_with_seed(participants_df, slots_df, seed):

    # Shuffle the timeslots within each publisher group using the seed
    shuffled_slots = (
        slots_df.groupby('lit_guest_name', group_keys=False)
        .apply(lambda group: group.sample(frac=1, random_state=seed))
    )

    # Add a column to flag timekeepers in the participants dataset
    participants_df['is_timekeeper'] = participants_df['Email'].isin(timekeeps)

    # Create the blank datasets and lists for the assignments and used-up slots
    assignments = []
    used_slots = set()

    # Add a column to track the number of meetings each participant needs
    participants_df['remaining_meetings'] = participants_df.groupby('Email')['Email'].transform('count')

    # Repeat until all participants are assigned or no more slots remain
    while not participants_df.empty:
        assigned_any = False

        for _, slot in shuffled_slots.iterrows():
            room_slot_id = (slot['timeslot_start'], slot['room_name'])

            if room_slot_id in used_slots:
                continue

            if participants_df.empty:
                break

            sorted_participants = (
                participants_df
                .sample(frac=1, random_state=seed)  # Shuffle randomly
                .sort_values(by='remaining_meetings', ascending=False)
            )

            for index, participant in sorted_participants.iterrows():

                # Skip back-to-back assignments
                assigned_slots = [
                    (a['timeslot_start'], a['room_name']) for a in assignments if a['Email'] == participant['Email']
                ]
                if any(
                    abs(slot['timeslot_start'] - assigned_time) <= timedelta(minutes=15)
                    for assigned_time, _ in assigned_slots
                ):
                    continue

                # Skip the earliest and latest timeslots for timekeepers if possible
                if participant['is_timekeeper'] and slot['timeslot_start'] in [earliest_time, latest_time]:
                    # Check if there are other slots available for this participant
                    if not any(
                        set(participant['publisher']) == set(alt_slot['lit_guest_name']) and
                        alt_slot['timeslot_start'] not in [earliest_time, latest_time] and
                        (alt_slot['timeslot_start'], alt_slot['room_name']) not in used_slots
                        for _, alt_slot in shuffled_slots.iterrows()
                    ):
                        print(f"Timekeeper {participant['Email']} has no alternative slot; assigning to edge slot.")
                    else:
                        continue

                if set(participant['publisher']) == set(slot['lit_guest_name']):
                    assignments.append({
                        'Email': participant['Email'],
                        'timeslot_start': slot['timeslot_start'],
                        'room_name': slot['room_name'],
                        'publisher': slot['lit_guest_name'],
                        'virtual': participant['Virtual'],
                        'Session': "Pitch",
                        'Timekeeper': participant['is_timekeeper']
                    })
                    used_slots.add(room_slot_id)
                    participants_df.drop(index, inplace=True)
                    participants_df['remaining_meetings'] = participants_df.groupby('Email')['Email'].transform('count')
                    assigned_any = True
                    break

        if not assigned_any:
            break

    return assignments, participants_df

# Initialize variables
success = False
max_attempts = 1000  # Limit the number of attempts
seed = 0

while not success and seed < max_attempts:
    seed += 1
    print(f"Trying seed {seed}...")
    
    # Copy the original dataframes to avoid modifying them directly
    participants_copy = pitches_all.copy()
    slots_copy = tslist_sataft.copy()

    # Run the assignment process with the current seed
    assignments, remaining_participants = assign_slots_with_seed(participants_copy, slots_copy, seed)

    # Check if all participants were assigned
    if remaining_participants.empty:
        success = True
        print(f"Success! All participants assigned using seed {seed}.")
        break

if success:
    # Convert assignments to a DataFrame
    assignments_df = pd.DataFrame(assignments)
else:
    print("Failed to assign all participants within the maximum number of attempts.")


Trying seed 1...




Trying seed 2...




Trying seed 3...




Trying seed 4...




Trying seed 5...




Trying seed 6...




Trying seed 7...




Trying seed 8...




Trying seed 9...




Trying seed 10...




Trying seed 11...




Trying seed 12...




Trying seed 13...




Trying seed 14...




Trying seed 15...




Trying seed 16...




Trying seed 17...




Trying seed 18...




Trying seed 19...




Success! All participants assigned using seed 19.


Yay! That worked great too. Let's just save it and delete any extraneous datasets.

In [249]:
final_sataft_assignment = assignments_df
final_sataft_assignments2 = pd.merge(final_sataft_assignment, registered[['Email', 'First Name', 'Last Name', 'phone']].drop_duplicates(), on='Email', how='inner')

del(pitches, pitches_all, assignments_df, remaining_participants, timeslots, tslist_sataft, tslist_satmorn, times_friday2, times_sat,
    slots_copy, assignments, earliest_time, latest_time, max_attempts, success, seed, assign_slots_with_seed, timedelta, participants_copy, final_sataft_assignment)

Woohoo! Now we're officially all done with the assignments, and we just need to deal withe waitlists now. FInal step after that will be to print out everything we've got into exactly the excel and word files we want.

In [250]:
# Save the dataset
final_sataft_assignments2.to_excel(f"Outputs/Finalized datasets/Finalized pitch assignments_{today}.xlsx", index=False)

# 2. Deal with the Waitlists

Dealing with the waitlists is pretty simple. We already corrected some of the basic stuff earlier, like emails and phones. Now let's split into what they're waitlisted for:
1) manuscript critiques
2) pitches
3) pre-conference edits
4) book fairs
5) query letter critiques

In [251]:
wait_ms = waitlist[waitlist['Session Name'].str.contains('Manuscript')]
wait_pitch = waitlist[waitlist['Session Name'].str.contains('Pitch')]
wait_prec = waitlist[waitlist['Session Name'].str.contains('Pre-conference')]

# May also need to do bookfair and query letter critique waitlists

Now let's change all the code so that instead of Manuscript A, B, C etc, it says 'Waitlisted'

In [252]:
wait_pitch['Session Name'] = wait_pitch['Session Name'].str.replace("Pitch [A-Z] with ", "Waitlisted - Pitch with ", regex=True)
wait_ms['Session Name'] = wait_ms['Session Name'].str.replace("Manuscript Critique [A-Z] with ", "Waitlisted - Manuscript Critique with ", regex=True)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


We need to double check that no participant has more than 3 manuscript critique waitlist spots.

In [253]:
print(wait_pitch['Email'].value_counts().unique())
print(wait_ms['Email'].value_counts().unique())

[3 2 1]
[3 2 1]


Good. As you can see above, nobody's got 4 or higher for how often their emails appear in these lists. Now let's sort by registration date for each Session Name, so that we assign a value of #1, #2, etc. by registration date for each Manuscript critique/pitch spot with each publisher.

In [254]:
# Sort by 'Session Name' and 'datetime', and rank participants
wait_ms['Waitlist_ms'] = wait_ms.sort_values(['Session Name', 'datetime']) \
               .groupby('Session Name')['datetime'] \
               .rank(method='first').astype(int)

# Sort DataFrame for display (optional)
wait_ms = wait_ms.sort_values(['Session Name', 'Waitlist_ms']).reset_index(drop=True)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


In [255]:
# Sort by 'Session Name' and 'datetime', and rank participants
wait_pitch['Waitlist_pitch'] = wait_pitch.sort_values(['Session Name', 'datetime']) \
               .groupby('Session Name')['datetime'] \
               .rank(method='first').astype(int)

# Sort DataFrame for display (optional)
wait_pitch = wait_pitch.sort_values(['Session Name', 'Waitlist_pitch']).reset_index(drop=True)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


Okay, looks good. Now let's tweak it a little bit more so we create an 'Agenda Item Name' that is 'Waitlisted - #1 - Manuscript Critique with [publisher].'

In [256]:
wait_pitch['Agenda Item Name'] = wait_pitch.apply(
    lambda row: row['Session Name'].replace(
        "Waitlisted", f"Waitlist #{row['Waitlist_pitch']}"
    ) if "Waitlisted" in row['Session Name'] else row['Session Name'],
    axis=1
)

In [257]:
wait_ms['Agenda Item Name'] = wait_ms.apply(
    lambda row: row['Session Name'].replace(
        "Waitlisted", f"Waitlist #{row['Waitlist_ms']}"
    ) if "Waitlisted" in row['Session Name'] else row['Session Name'],
    axis=1
)

In [258]:
# For right now, let's just merge all the waitlist stuff back together, and add in the participant info so that it's all in one place.
wait_all = pd.merge(wait_ms, wait_pitch, how="outer")

# Let's extract the publisher
wait_all['publisher'] = wait_all['Session Name'].str.replace("Waitlisted - Manuscript Critique with ", "")
wait_all['publisher'] = wait_all['publisher'].str.replace("Waitlisted - Pitch with ", "")
wait_all = wait_all[['Email', 'First Name', 'Last Name', 'phone', 'Agenda Item Name', 'publisher']]

# print for George
wait_all.to_excel("Outputs/Finalized Datasets/Waitlist participants.xlsx", index=False)


# 4. Print a bunch of excel documents

We won't really do much with these particular excel documents, except to export them for manual review (and potentially manual changes).

In [259]:
final_room_pairings_Friday.to_excel(f"Outputs/Finalized Datasets/Editor-agent pairings for Friday_{today}.xlsx", index=False)

In [260]:
final_friday_assignments2.to_excel(f"Outputs/Finalized Datasets/Friday query letter critique assignments_{today}.xlsx", index=False)

In [261]:
final_sataft_assignments2.to_excel(f"Outputs/Finalized Datasets/Saturday pitch assignments_{today}.xlsx", index=False)

In [262]:
final_satmorn_assignments2.to_excel(f"Outputs/Finalized Datasets/Saturday manuscript critique assignments_{today}.xlsx", index=False)

In [263]:
registered.to_excel(f"Outputs/Finalized Datasets/Registered_cleaned_{today}.xlsx", index=False,
                        columns=['Agenda Item Name', 'Email', 'First Name', 'Last Name', 'Mobile Phone Number', 'Virtual', 'Fiction genre', 'Nonfiction genre', 'phone'], )

In [264]:
waitlist.to_excel(f"Outputs/Finalized Datasets/Waitlist_cleaned_{today}.xlsx", index=False,
                        columns=['Session Name', 'Email', 'First Name', 'Last Name', 'Mobile Phone Number', 'phone', 'virtual'], )