# Final run for the Atlanta Writer's Conference Scheduling
*Becky Hodge*

#### Summary
We previously ran all the code to schedule participants. However, there are always changes, drop-outs, and additions to the waitlist after we send the email one month prior to the conference telling people their schedules.

In this code, we will rerun a lot of the prior code, but specifically ONLY for participants who had any changes made.

## 1. Data Cleaning

#### Load and clean the different files/reports

In [855]:
# Import any needed packages
import pandas as pd
import numpy as np
import datetime
import os 

today = datetime.datetime.today().strftime('%Y-%m-%d')

# Set the conference dates
date_str_fri = '2025-05-02'
date_str_sat = '2025-05-03'


In [856]:
# Select the file with the most recent date
directory = 'May2025_reports'

most_recent_file = max(
    (f for f in os.listdir(directory) if f.startswith('Registered_') and f.endswith('.csv')),
    key=lambda x: datetime.datetime.strptime(x.split('_')[1].split('.')[0], '%m-%d-%y'),
)

# Load the most recent file
most_recent_path = os.path.join(directory, most_recent_file)
registered = pd.read_csv(most_recent_path)
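The date parsing in the `key` function can be checked on its own, without touching the file system. This is a minimal sketch, assuming the report names follow the `Prefix_MM-DD-YY.csv` pattern implied above. The helper name `latest_report` and the sample file names are made up for illustration.

```python
import datetime

def latest_report(filenames, prefix):
    """Return the file name with the most recent date (hypothetical helper)."""
    candidates = [f for f in filenames if f.startswith(prefix) and f.endswith('.csv')]
    return max(
        candidates,
        key=lambda name: datetime.datetime.strptime(
            name.split('_')[1].split('.')[0], '%m-%d-%y'
        ),
    )

# Made-up file names, not the real reports
sample_files = [
    'Registered_12-30-24.csv',
    'Registered_01-24-25.csv',
    'Registered_04-15-25.csv',
    'Waitlists_04-20-25.csv',
]
print(latest_report(sample_files, 'Registered_'))
```

Parsing the date matters here: a plain string comparison would rank `12-30-24` above `04-15-25`, even though it is the older file.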


In [857]:
most_recent_file = max(
    (f for f in os.listdir(directory) if f.startswith('Waitlists_') and f.endswith('.csv')),
    key=lambda x: datetime.datetime.strptime(x.split('_')[1].split('.')[0], '%m-%d-%y'),
)

# Load the most recent file
most_recent_path = os.path.join(directory, most_recent_file)
waitlist = pd.read_csv(most_recent_path)

The code below brings in ALL participants, which is key for knowing whether any waitlist-only people are virtual or in person.

In [858]:
most_recent_file = max(
    (f for f in os.listdir(directory) if f.startswith('Allparticipants_') and f.endswith('.csv')),
    key=lambda x: datetime.datetime.strptime(x.split('_')[1].split('.')[0], '%m-%d-%y'),
)

# Load the most recent file
most_recent_path = os.path.join(directory, most_recent_file)
all_participants = pd.read_csv(most_recent_path)

In [859]:
all_participants = all_participants.rename(columns={'Email Address':'Email'})

In [860]:
# Filter this dataset to just virtual people
virtual_only = all_participants.loc[all_participants['Hotel vs. Zoom'] == 'Virtually via Zoom (only available for query letter critiques, manuscript sample critiques, and pitches)', :]

In [861]:
del(directory, most_recent_file, most_recent_path)

fict_gen = pd.read_excel('List_of_genres_agents_editors.xlsx', sheet_name='fiction')
nonfict_gen = pd.read_excel('List_of_genres_agents_editors.xlsx', sheet_name='nonfiction')
pubs = pd.read_excel('List_of_genres_agents_editors.xlsx', sheet_name='agents_editors')

Oh gosh, some of the column names are hefty...  Let's fix those.

In [862]:
registered = registered.rename(columns={'Hotel vs. Zoom':'Virtual', 
                                        "What fiction genre(s) will you be presenting to agents/editors at the conference? (If you're not signing up for any agent/editor meetings, indicate which genre(s) you write.)":'Fiction genre', 
                                        "What nonfiction topic(s) will you be presenting to agents/editors at the conference? (If you're not signing up for any agent/editor meetings, indicate which topic(s) you write.)":'Nonfiction genre', 
                                        'Registration Date (GMT)':'Registration Date',
                                        'Email Address':'Email'})

In [863]:
waitlist = waitlist.rename(columns={'Registration Date (GMT)':'Registration Date',
                                     'Email Address':'Email'})

Let's also drop the "Not Applicable --I don't write fiction" and "Not Applicable--I don't write nonfiction" responses by setting them to missing.

In [864]:
registered['Fiction genre']= registered['Fiction genre'].replace("Not Applicable --I don't write fiction", np.nan)
registered['Nonfiction genre']= registered['Nonfiction genre'].replace("Not Applicable--I don't write nonfiction", np.nan)

Also, some people wrote in 'Other', but we don't need that info for matching to agents/editors. Let's remove those.

In [865]:
import re

def clean_genres(genre_string):
    # Treat None, NaN, and empty strings as "no genre"
    if genre_string is None or pd.isna(genre_string) or genre_string == "":
        return ""

    genres = [genre.strip() for genre in genre_string.split(',')]
    cleaned_genres = [genre for genre in genres if not re.match(r"^Other \(please specify\):", genre)]

    return ", ".join(cleaned_genres)

registered['Fiction genre'] = registered['Fiction genre'].apply(clean_genres)
registered['Nonfiction genre'] = registered['Nonfiction genre'].apply(clean_genres)


Lastly, let's replace the curly apostrophes (’) in a few genre names with straight ones ('), since they make things tricky.

In [866]:
registered['Fiction genre'] = registered['Fiction genre'].str.replace("Women’s", "Women's")
registered['Fiction genre'] = registered['Fiction genre'].str.replace("Children’s picture/chapter books", "Children's picture/chapter books")

registered['Nonfiction genre'] = registered['Nonfiction genre'].str.replace("Women’s issues", "Women's issues")


##### Fix date-times and emails

We need to convert the registration date to a datetime variable.

In [867]:
registered["datetime"] = pd.to_datetime(registered["Registration Date"])
waitlist["datetime"] = pd.to_datetime(waitlist["Registration Date"])

Let's check to see if every Email Address is associated with a unique first and last name, since ideally we just use email as our unique identifier. It's possible spouses use the same email.

In [868]:
len(registered['Email'].unique())

162

In [869]:
check = registered[['Email', 'First Name']].value_counts().reset_index()
len(check['Email'].unique())

162

Perfect. The number of unique emails matches whether we look at email alone or at email and first name together. Moving forward, we can use email address as a unique identifier.
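One caveat: barring missing first names, the number of unique emails in `check` will equal the number in `registered` even when two people share an address. A stricter check counts the distinct first names attached to each email. The sketch below uses made-up rows rather than the real registration data.

```python
import pandas as pd

# Made-up rows: b@example.com is shared by two different first names
sample = pd.DataFrame({
    'Email': ['a@example.com', 'a@example.com', 'b@example.com', 'b@example.com'],
    'First Name': ['Ann', 'Ann', 'Bob', 'Beth'],
})

# Number of distinct first names attached to each email
names_per_email = sample.groupby('Email')['First Name'].nunique()

# Emails used by more than one first name (empty if email is a safe identifier)
shared_emails = names_per_email[names_per_email > 1].index.tolist()
print(shared_emails)
```

If `shared_emails` comes back empty on the real data, email is safe to use as the unique identifier.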

In [870]:
del(check)

##### Fix phone numbers

Check for any phone numbers (in both the waitlist and registered files) that aren't just 10 digits

In [871]:
phonecheck_wait = waitlist.loc[waitlist['Mobile Phone Number'].str.contains("-")]
print(phonecheck_wait['Mobile Phone Number'])

waitlist['phone'] = waitlist['Mobile Phone Number'].str.replace(r'^(?:\(\+\d+\))|\D', '', regex=True)
print("After fixing waitlist phone numbers, there are now", len(waitlist.loc[waitlist['phone'].str.contains("-")]), "phones with dashes or parentheses")

25     678-708-3046
105    801-390-4595
106    801-390-4595
Name: Mobile Phone Number, dtype: object
After fixing waitlist phone numbers, there are now 0 phones with dashes or parentheses


In [872]:
phonecheck_reg = registered.loc[registered['Mobile Phone Number'].str.contains("-")]
print(phonecheck_reg['Mobile Phone Number'])

registered['phone'] = registered['Mobile Phone Number'].str.replace(r'^(?:\(\+\d+\))|\D', '', regex=True)
print("After fixing registered phone numbers, there are now", len(registered.loc[registered['phone'].str.contains("-")]), "phones with dashes or parentheses")

0         404-941-0572
4         828-279-6154
35        404-429-4890
95        828-279-6154
97        801-390-4595
             ...      
1757      404-941-0572
1769      201-906-3051
1774      678-708-3046
1791      410-746-0590
1815    (517) 944-2233
Name: Mobile Phone Number, Length: 116, dtype: object
After fixing registered phone numbers, there are now 0 phones with dashes or parentheses


Now let's check that all phone numbers are ten digits

In [873]:
phonecheck_reg = registered.loc[registered['phone'].str.len()>10]
print(phonecheck_reg[['Mobile Phone Number', 'phone']])

     Mobile Phone Number         phone
41       '+49 1704174774  491704174774
184      '+49 1704174774  491704174774
349      '+49 1704174774  491704174774
490      '+49 1704174774  491704174774
648      '+49 1704174774  491704174774
778      '+49 1704174774  491704174774
819      '+49 1704174774  491704174774
868      '+49 1704174774  491704174774
946      '+49 1704174774  491704174774
1071     '+49 1704174774  491704174774
1146     '+49 1704174774  491704174774
1279     '+49 1704174774  491704174774
1292     '+49 1704174774  491704174774
1486     '+49 1704174774  491704174774
1645     '+49 1704174774  491704174774
1704     '+49 1704174774  491704174774


In [874]:
phonecheck_wait = waitlist.loc[waitlist['phone'].str.len()>10]
print(phonecheck_wait[['Mobile Phone Number', 'phone']])

Empty DataFrame
Columns: [Mobile Phone Number, phone]
Index: []


Let's fix both these datasets, so anyone with an international number gets their phone reset to missing (though we'll keep the original Mobile Phone Number column intact)

In [875]:
registered.loc[registered['phone'].str.len()>10, 'phone'] = None
registered['phone'].head()

waitlist.loc[waitlist['phone'].str.len()>10, 'phone'] = None
waitlist['phone'].head()

print(registered['Mobile Phone Number'].isna().sum())
print(registered['phone'].isna().sum())

print(waitlist['Mobile Phone Number'].isna().sum())
print(waitlist['phone'].isna().sum())

0
16
0
0


Good! We didn't have any missing values to begin with. We reset those 16 rows with international numbers to missing in the phone column, while leaving the Mobile Phone Number column untouched.
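To see what the phone regex and the length rule do together, here is a small sketch using Python's `re` module on fictional numbers. The helper `normalize_phone` is a hypothetical name, not something defined elsewhere in this notebook.

```python
import re

# Same pattern as above: a leading "(+N)" country-code prefix, or any non-digit
PHONE_PATTERN = r'^(?:\(\+\d+\))|\D'

def normalize_phone(raw):
    """Strip the prefix and all non-digits; return None if more than 10 digits remain."""
    digits = re.sub(PHONE_PATTERN, '', raw)
    return digits if len(digits) <= 10 else None

# Fictional numbers for illustration only
for raw in ['404-555-0100', '(+1) 404-555-0100', "'+49 1705550100"]:
    print(raw, '->', normalize_phone(raw))
```

The third number keeps its country code after cleaning, so it ends up longer than 10 digits and is set to missing, just like the international numbers above.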

##### Fix emails

Let's move on to email addresses now, and check for any that are missing or problematic. First, we'll check if any are missing:

In [876]:
weird_emails = registered.loc[registered['Email'].isna(), ]

Yay! Everyone filled out an email. So now we just need to check that nobody put in faulty emails that will cause problems later:

In [877]:
weird_emails = registered.loc[registered['Email'].str.contains(r'^[\w\.-]+@[a-zA-Z\d-]+\.[a-zA-Z]{2,}$', regex=True)==False, ]
print(weird_emails['Email'])

138     jlary@alumni.iu.edu
340     jlary@alumni.iu.edu
506     jlary@alumni.iu.edu
611     jlary@alumni.iu.edu
955     jlary@alumni.iu.edu
1223    jlary@alumni.iu.edu
1280    jlary@alumni.iu.edu
1309    jlary@alumni.iu.edu
1628    jlary@alumni.iu.edu
1664    jlary@alumni.iu.edu
Name: Email, dtype: object


In [878]:
weird_emails = waitlist.loc[waitlist['Email'].str.contains(r'^[\w\.-]+@[a-zA-Z\d-]+\.[a-zA-Z]{2,}$', regex=True)==False, ]
print(weird_emails['Email'])

Series([], Name: Email, dtype: object)


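Okay, the emails all look fine. That particular alumni email was only flagged because the pattern expects a single dot after the @, and this address has a subdomain (alumni.iu.edu). It isn't a problem, so emails are good to go.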
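If the false positive gets annoying, the pattern could be loosened to allow subdomains. The variant below is an assumption on my part, not the pattern used in this notebook, and the sample addresses are made up.

```python
import re

# Pattern used above: exactly one "label.tld" after the @
ORIGINAL = r'^[\w\.-]+@[a-zA-Z\d-]+\.[a-zA-Z]{2,}$'
# Assumed variant: one or more "label." groups before the top-level domain
WITH_SUBDOMAINS = r'^[\w\.-]+@(?:[a-zA-Z\d-]+\.)+[a-zA-Z]{2,}$'

samples = ['writer@example.com', 'writer@alumni.example.edu', 'writer@example']
results = {
    email: (bool(re.match(ORIGINAL, email)), bool(re.match(WITH_SUBDOMAINS, email)))
    for email in samples
}
for email, (old, new) in results.items():
    print(email, old, new)
```

Only the subdomain address changes from flagged to accepted. The address with no top-level domain is still rejected by both patterns.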

##### Add in virtual variable to the waitlist dataset

In [879]:
waitlist['virtual'] = waitlist['Email'].apply(
    lambda email: 'Virtual' if email in virtual_only['Email'].values else 'In person'
)
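The same flag can be built without a row-by-row `apply`, using `isin` and `np.where`. This is just an alternative sketch on made-up emails; the `_sample` names are placeholders.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for the waitlist and virtual_only datasets
waitlist_sample = pd.DataFrame({'Email': ['a@example.com', 'b@example.com', 'c@example.com']})
virtual_sample = pd.DataFrame({'Email': ['b@example.com']})

# isin() checks every email at once; np.where() maps True/False to labels
waitlist_sample['virtual'] = np.where(
    waitlist_sample['Email'].isin(virtual_sample['Email']), 'Virtual', 'In person'
)
print(waitlist_sample)
```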

#### Drop any unneeded variables

Let's drop any extraneous variables from the waitlist and registration datasets

In [880]:
waitlist.drop(columns=['Registration Date', 'Invitee Status', 'Action', 'Confirmation Number'], inplace=True)  # columns= already targets columns, so axis=1 isn't needed

registered.drop(columns=['Agenda Item Type', 'Registration Date', 'Registration Type', 'Action'], inplace=True)

In [881]:
del(weird_emails, phonecheck_reg, phonecheck_wait)

In [882]:
today = datetime.date.today().strftime('%Y-%m-%d') # Let's save today's date for when writing excel files

#### Bring in timekeeper information

In [883]:
# Load the time keepers
timekeepers = pd.read_excel('List_of_genres_agents_editors.xlsx', sheet_name='timekeepers')

#### Create lists with all time-by-room values
We need to pull in the start times for each of the time slots for Friday afternoon (query letter critiques), Saturday morning (manuscript critiques), and Saturday afternoon (pitches) sessions. Without worrying about who our timekeepers are, or which agents are assigned to those rooms, we'll create 3 lists with the times-by-room.

In [884]:
room_fr = pd.read_excel('List_of_genres_agents_editors.xlsx', sheet_name='rooms_friday')
room_sat = pd.read_excel('List_of_genres_agents_editors.xlsx', sheet_name='rooms_sat')
timeslots = pd.read_excel('List_of_genres_agents_editors.xlsx', sheet_name='timeslots')

In [885]:
rooms_friday = room_fr.loc[:, 'day':'room_name']
rooms_saturday = room_sat.loc[:, 'day':'room_name']

Now let's combine the timeslots dataset with the friday and saturday rooms datasets to get the lists we need

In [886]:
tslist_fri = pd.merge(timeslots.loc[(timeslots['day']=='Friday') & (timeslots['day_session']=='Afternoon'), :], rooms_friday, how='outer', on='day')
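Merging on `day` works as a "every timeslot in every room" expansion as long as all rows on both sides share the same `day` value. A tiny sketch with made-up times and room names shows the effect:

```python
import pandas as pd

# Made-up timeslots and rooms, all on the same day
timeslots_sample = pd.DataFrame({
    'day': ['Friday', 'Friday'],
    'timeslot_start': ['1:00', '1:15'],
})
rooms_sample = pd.DataFrame({
    'day': ['Friday', 'Friday', 'Friday'],
    'room_name': ['Room A', 'Room B', 'Room C'],
})

# Each of the 2 timeslots pairs with each of the 3 rooms
grid = pd.merge(timeslots_sample, rooms_sample, how='outer', on='day')
print(len(grid))
```

Two timeslots and three rooms give six time-by-room rows, which is the shape the scheduling step needs.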

In [887]:
rooms_coach = pd.read_excel('List_of_genres_agents_editors.xlsx', sheet_name='coaches')

In [888]:
tslist_coach = pd.merge(timeslots.loc[(timeslots['day']=='Friday') & (timeslots['day_session']=='Coaching'), :], rooms_coach, how='outer', on='day')

In [889]:
# Load the agent-editor pairs for Friday and their rooms
final_room_pairings_Friday = pd.read_excel("Outputs/Finalized datasets/Editor-agent pairings for Friday_2025-01-24.xlsx")

## 2. Friday Query Letter Critiques

### Assign CHANGED participants timeslots/publisher pairings for Friday

Previously, we ran the code to schedule all participants. Now, however, we want to update that: drop anyone who is NO LONGER scheduled for that event (e.g., dropouts), and then schedule anyone new. We'll do this for Friday first.

#### Create a dataset with a single row per participant and all their relevant activities

Let's first identify every participant who registered for a query letter critique on Friday.

In [890]:
query_critique_names = registered.loc[registered['Agenda Item Name'].str.contains('Query Letter Critique'), :]

Let's get a count of each email in this list, so we know the number of query letter critiques each person signed up for. Then we'll delete the original query_critique_names dataset.

In [891]:
queries = query_critique_names['Email'].value_counts().reset_index()
del(query_critique_names)

Importantly, for Friday's assignments, we can't assign people to agents/editors they're seeing on Saturday for a pitch or manuscript critique. In order to account for this, we also need to create datasets for the manuscript and pitches, so we can combine all three datasets later. 

Our goal is to create a single row per participant that lists any agents/editors they chose on Saturday, and to know how many query letter critiques those people want.

In [892]:
pitches = registered.loc[registered['Agenda Item Name'].str.contains('Pitch'), :]

We need to extract out the publisher name from the Agenda Item Name column

In [893]:
import re
pitches['pubname'] = pitches['Agenda Item Name'].str.replace("Pitch [A-Z] with ", "", regex=True)
pitch = pitches[['Email', 'pubname']].value_counts().reset_index()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
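The warning above shows up because we add a column to a filtered slice of `registered`. One common way to avoid it is to take an explicit `.copy()` of the filtered rows first. Here is a sketch on made-up rows (the publisher names are fictional):

```python
import pandas as pd

registered_sample = pd.DataFrame({
    'Email': ['a@example.com', 'a@example.com', 'b@example.com'],
    'Agenda Item Name': [
        'Pitch A with Jane Doe',
        'Query Letter Critique',
        'Pitch B with John Roe',
    ],
})

# .copy() makes the filtered rows an independent DataFrame
pitches_sample = registered_sample.loc[
    registered_sample['Agenda Item Name'].str.contains('Pitch'), :
].copy()

# Adding a column to the copy leaves registered_sample untouched
pitches_sample['pubname'] = pitches_sample['Agenda Item Name'].str.replace(
    'Pitch [A-Z] with ', '', regex=True
)
print(pitches_sample[['Email', 'pubname']])
```

The same approach should apply to the other filtered datasets in this notebook that get new columns.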


In [894]:
print(pitch.loc[pitch['count']>1, :])

Empty DataFrame
Columns: [Email, pubname, count]
Index: []


<font color='red'>**NOTE:**</font> In an ideal world, there should be nobody printed above. Everyone should have a count of 1, since they can't meet with and pitch the same publisher multiple times for the same book. However, very, very rarely, someone will want to meet with a publisher twice to pitch them *different* books, so there can be counts of two or more.

We always check with the participant, though, to confirm that the double booking was intentional and not a registration error.

<font color='blue'>**UPDATE AFTER SPEAKING WITH GEORGE:**</font> This person (tk@tkread.com) DOES want 2 pitches with the same person. She has two different manuscripts to pitch.

Now that we've checked that, please note that people can sign up for up to three pitches (typically with 3 different agents/editors). We now need to create a combined variable per registrant that has ALL their pitch agents/editors.

In [895]:
pitchA = pitches.loc[pitches['Agenda Item Name'].str.contains("Pitch A with "), ['Email', 'pubname']]
pitchB = pitches.loc[pitches['Agenda Item Name'].str.contains("Pitch B with "), ['Email', 'pubname']]
pitchC = pitches.loc[pitches['Agenda Item Name'].str.contains("Pitch C with "), ['Email', 'pubname']]

pitchA = pitchA.rename(columns={'pubname':'pitchA'})
pitchB = pitchB.rename(columns={'pubname':'pitchB'})
pitchC = pitchC.rename(columns={'pubname':'pitchC'})

In [896]:
pitch2 = pd.merge(pd.merge(pitchA, pitchB, how='outer', on='Email'), pitchC, how='outer', on='Email')
pitch2.info()
# Reset anybody with the same values - in our case, we don't want this to happen, since the duplicated person is intentional
#pitch2.loc[(pitch2['pitchA'] == pitch2['pitchB']), 'pitchB'] = np.nan
#pitch2.loc[(pitch2['pitchA'] == pitch2['pitchC']), 'pitchC'] = np.nan
#pitch2.loc[(pitch2['pitchB'] == pitch2['pitchC']), 'pitchC'] = np.nan
#pitch2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 105 entries, 0 to 104
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Email   105 non-null    object
 1   pitchA  73 non-null     object
 2   pitchB  69 non-null     object
 3   pitchC  58 non-null     object
dtypes: object(4)
memory usage: 3.4+ KB


Now let's create a combined variable of everyone's chosen publishers for their pitch.

In [897]:
def combine_variables(row):
    return ', '.join(str(x) for x in row.dropna()) #convert to strings, drop Nas, and join.

pitch2['pitches_chosen_pubs'] = pitch2[['pitchA', 'pitchB', 'pitchC']].apply(combine_variables, axis=1)

Great! Now let's repeat this process for manuscript critiques.

In [898]:
ms = registered.loc[registered['Agenda Item Name'].str.contains('Manuscript'), :]

In [899]:
ms['pubname'] = ms['Agenda Item Name'].str.replace("Manuscript Critique [A-Z] with ", "", regex=True)
manuscript = ms[['Email', 'pubname']].value_counts().reset_index()
len(ms['Email'].unique()) == len(manuscript['Email'].unique())
len(ms['Email'].unique())

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


99

Cool. Nobody signed up for duplicate manuscript critiques.
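A more direct way to look for duplicate bookings is to flag any email-and-publisher pair that appears more than once. This is a sketch on made-up rows with fictional publisher names:

```python
import pandas as pd

# Made-up rows: b@example.com booked the same publisher twice
ms_sample = pd.DataFrame({
    'Email': ['a@example.com', 'a@example.com', 'b@example.com', 'b@example.com'],
    'pubname': ['Jane Doe', 'John Roe', 'Jane Doe', 'Jane Doe'],
})

# keep=False marks every row that belongs to a repeated pair
dupes = ms_sample[ms_sample.duplicated(subset=['Email', 'pubname'], keep=False)]
duplicate_emails = sorted(dupes['Email'].unique())
print(duplicate_emails)
```

An empty list here means nobody booked the same publisher twice.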

In [900]:
msA = ms.loc[ms['Agenda Item Name'].str.contains("Manuscript Critique A with "), ['Email', 'pubname']]
msB = ms.loc[ms['Agenda Item Name'].str.contains("Manuscript Critique B with "), ['Email', 'pubname']]
msC = ms.loc[ms['Agenda Item Name'].str.contains("Manuscript Critique C with "), ['Email', 'pubname']]

msA = msA.rename(columns={'pubname':'msA'})
msB = msB.rename(columns={'pubname':'msB'})
msC = msC.rename(columns={'pubname':'msC'})

In [901]:
manuscript = pd.merge(pd.merge(msA, msB, how='outer', on='Email'), msC, how='outer', on='Email')


In [902]:
manuscript['ms_chosen_pubs'] = manuscript[['msA', 'msB', 'msC']].apply(combine_variables, axis=1)

In [903]:
del(pitch, pitches)
queries = queries.rename(columns={'count': 'num_query_critiques'})

Woohoo! Okay, now it's time to merge the pitch and the manuscript info, and then link it back to the query critiques as well, so we have the full list of participants with all of their chosen editors, and whether or not they have any query letter critiques.

In [904]:
merge1 = pd.merge(manuscript, pitch2, how='outer', on='Email')[['Email', 'pitchA', 'pitchB', 'pitchC', 'msA', 'msB', 'msC', 'pitches_chosen_pubs', 'ms_chosen_pubs']]
email_set = set(queries['Email'].dropna())
merge2 = pd.merge(merge1, queries, how='outer', on='Email')
merge2['query_critique'] = merge2['Email'].apply(lambda email: email in email_set if pd.notna(email) else False)

In [905]:
del(merge1, pitch2, manuscript, queries, room_fr, room_sat, email_set)

Perrrfect. Last step is to create a dataset with one row per email, which has their fiction and non-fiction genres, as well as whether they're virtual or in person. We'll then join this to our dataset above.

In [906]:
per_registrant = registered.drop_duplicates(subset='Email', keep='first')[['Email', 'Virtual', 'Fiction genre', 'Nonfiction genre', 'datetime']]
per_registrant['Virtual'] = per_registrant['Virtual'].replace(['Virtually via Zoom (only available for query letter critiques, manuscript sample critiques, and pitches)', 'In person at the conference hotel'],
                                                              ['Virtual', 'In person'])

In [907]:
print(per_registrant['Virtual'].value_counts())

Virtual
In person                                                      142
Virtual                                                         18
Only doing the pre-conference edit (which will be by email)      1
Name: count, dtype: int64


We already renamed Email Address to Email above, so now we can merge the dataframes to get one big one with all participants who registered for any of the three main activities: query letter critiques, manuscripts, or pitches.

In [908]:
per_registrant2 = pd.merge(per_registrant, merge2, how='outer', on='Email')

#### Create 3 different datasets: one per query letter critiques, MS critiques, and pitches
Before doing any scheduling, we need to create 3 different datasets for these three different activities, so we can easily schedule them below in their respective sections.

In [909]:
ms_critiques = per_registrant2.loc[pd.notna(per_registrant2['ms_chosen_pubs']), ['Email', 'Virtual', 'ms_chosen_pubs', 'msA', 'msB', 'msC']]

In [910]:
pitches = per_registrant2.loc[pd.notna(per_registrant2['pitches_chosen_pubs']), ['Email', 'Virtual', 'pitches_chosen_pubs', 'pitchA', 'pitchB', 'pitchC']]

In [911]:
query_critiques = per_registrant2.loc[per_registrant2['query_critique']==True, :]

In [912]:
del(per_registrant,merge2)

Lastly, before we finalize this dataset, we also need to account for people who signed up for the Friday workshop from 4-6pm. These people need to be assigned query letter critiques prior to 4pm.

In [913]:
fri_workshop = registered[registered['Agenda Item Name']=='Friday Workshop- Writer Beware: How Writers Can Protect Themselves']

In [914]:
query_critiques['Friday_workshop'] = query_critiques['Email'].isin(fri_workshop['Email'])

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


In the code below, we're going to bring in everyone we previously assigned to timeslots and check for the following 3 scenarios:

1) Someone dropped out of 1-2 query letter critiques (either dropped completely, or just dropped from 2 to 1)
2) Somebody who hadn't had any before signed up for 1-2 query letter critiques
3) Somebody signed up for an additional query letter critique (changed from 1 to 2)

In [923]:
# Bring in the prior assignments
prior_query = pd.read_excel("Outputs/Finalized datasets/Expanded_query_critiques-ds prior to assignment_2025-01-24.xlsx")
prior_friday_queries = pd.read_excel("Outputs/Finalized datasets/Final_Friday_query_letter_critique_assignments_2025-01-24.xlsx")

# Get the list of unique participants doing query letters from last time, and check who this has changed for:
prior_unique_query = prior_query.groupby('Email', as_index=False).size()

# Rename the 'size' column to 'num_query_critiques'
prior_unique_query.rename(columns={'size': 'num_query_critiques'}, inplace=True)

droppedout = prior_unique_query.merge(
    query_critiques,
    on=['Email', 'num_query_critiques'],
    how='left',  # Keep all rows from `prior_unique_query`
    indicator=True
)

# Keep rows that are only in `prior_unique_query`
droppedout = droppedout[droppedout['_merge'] == 'left_only'].drop(columns=['_merge'])

newpeeps = prior_unique_query.merge(
    query_critiques,
    on=['Email', 'num_query_critiques'],
    how='right',  # Keep all rows from `query_critiques`
    indicator=True
)

# Keep rows that are only in `query_critiques`
newpeeps = newpeeps[newpeeps['_merge'] == 'right_only'].drop(columns=['_merge'])
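The two merges above treat any change in count as both a drop and a new sign-up. To see which of the scenarios each person falls into, the prior and current counts can be compared side by side. This is a sketch on made-up counts; the `status` labels and the `label_change` helper are hypothetical.

```python
import pandas as pd

# Made-up prior and current critique counts
prior_counts = pd.DataFrame({
    'Email': ['a@example.com', 'b@example.com', 'c@example.com'],
    'num_query_critiques': [2, 1, 1],
})
current_counts = pd.DataFrame({
    'Email': ['a@example.com', 'c@example.com', 'd@example.com'],
    'num_query_critiques': [1, 2, 1],
})

# Outer merge keeps everyone; a missing count on either side means zero critiques
compare = prior_counts.merge(
    current_counts, on='Email', how='outer', suffixes=('_prior', '_now')
).fillna(0)
compare['change'] = compare['num_query_critiques_now'] - compare['num_query_critiques_prior']

def label_change(row):
    if row['change'] == 0:
        return 'unchanged'
    if row['num_query_critiques_now'] == 0:
        return 'dropped all'
    if row['num_query_critiques_prior'] == 0:
        return 'new'
    return 'reduced' if row['change'] < 0 else 'added'

compare['status'] = compare.apply(label_change, axis=1)
status_by_email = dict(zip(compare['Email'], compare['status']))
print(status_by_email)
```

A table like this makes it easier to spot the people who only partially dropped, since they need a manual look.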

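Awesome! For anyone that dropped out, let's drop them from the prior list. <font color='red'>**NOTE: YOU WILL NEED TO DOUBLE-CHECK THIS. IF SOMEBODY DROPPED JUST ONE CRITIQUE BUT STILL HAS ONE LEFT, THE CODE BELOW WILL DROP BOTH OF THEIR PRIOR ASSIGNMENTS.**</font> Anyone in that situation also shows up in `newpeeps` with their new count, so they get rescheduled from scratch below.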

In [924]:
prior_friday_queries = prior_friday_queries[~prior_friday_queries['Email'].isin(droppedout['Email'])]

The code below expands the query critiques dataset so that each person has as many rows as critiques they signed up for. This way, anyone who signed up for two query critiques will have two rows.

In [925]:
def expand_dataframe(df, id_col, count_col):
    rows = []
    for _, row in df.iterrows():
        count = int(row[count_col])  # Convert float count to integer
        for _ in range(count):
            rows.append(row.drop(count_col).to_dict())
    return pd.DataFrame(rows)

expanded_query_critiques = expand_dataframe(newpeeps, 'Email', 'num_query_critiques')
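For reference, the same expansion can be done without looping over rows by repeating the index. This is an alternative sketch on made-up data, not a replacement for the function above.

```python
import pandas as pd

# Made-up counts, stored as floats like the merged data
people = pd.DataFrame({
    'Email': ['a@example.com', 'b@example.com'],
    'num_query_critiques': [2.0, 1.0],
})

# Repeat each index label as many times as that person's count
repeats = people['num_query_critiques'].astype(int)
expanded = (
    people.loc[people.index.repeat(repeats)]
    .drop(columns='num_query_critiques')
    .reset_index(drop=True)
)
print(expanded)
```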


#### Assign participants to their Friday timeslots

Note that in the prior sections of code, we created:

1) The rooms for Friday with the agents/editors assigned to them (final_room_pairings_Friday)
2) The list of all time slots and rooms for Friday (tslist_fri)
3) The list of all participants who signed up for a query letter critique (expanded_query_critiques), which has multiple rows per person - one for the number of queries they signed up for

Before moving on though, let's join #1 and #2.

In [919]:
times_friday = tslist_fri.merge(final_room_pairings_Friday, on=['day', 'room_name'], how='outer').sort_values(['timeslot_start', 'room_name'])

In [920]:
# Bring in the final_cross_pubs dataset too
final_cross_pubs = pd.read_excel("Outputs/Finalized datasets/publisher_pair_rankings_2025-01-24.xlsx")

Let's also bring in the information on what combined genres those publisher pairings represent

In [921]:
times_friday2 = times_friday.merge(final_cross_pubs[['pubname1', 'pubname2', 'combined_fiction', 'combined_nonfiction']], on=['pubname1', 'pubname2'], how='inner')

In [926]:
# First, we need to create lists for anything separated by a comma
def split_to_list(x):
    if isinstance(x, str):
        return [item.strip() for item in x.split(',')]
    if isinstance(x, list):
        return [item.strip() for item in x]
    return []

times_friday2['combined_nonfiction'] = times_friday2['combined_nonfiction'].apply(split_to_list)
times_friday2['combined_fiction'] = times_friday2['combined_fiction'].apply(split_to_list)

newpeeps['nonfiction_genre'] = newpeeps['Nonfiction genre'].apply(split_to_list)
newpeeps['fiction_genre'] = newpeeps['Fiction genre'].apply(split_to_list)
newpeeps['pitches_chosen_pubs'] = newpeeps['pitches_chosen_pubs'].apply(split_to_list)
newpeeps['ms_chosen_pubs'] = newpeeps['ms_chosen_pubs'].apply(split_to_list)


In [927]:
# Combine the publishers into a single list
newpeeps['chosen_pubs'] = newpeeps['pitches_chosen_pubs'] + newpeeps['ms_chosen_pubs']

In [928]:
# Convert the 'timeslot_start' to datetime variable
times_friday2['timeslot_start'] = pd.to_datetime(date_str_fri + ' ' + times_friday2['timeslot_start'].astype(str))

# Adjust the times to represent the afternoon (add 12 hours if in AM range)
times_friday2['timeslot_start'] = times_friday2['timeslot_start'].apply(
    lambda x: x + pd.Timedelta(hours=12) if x.hour < 12 else x
)

from datetime import timedelta
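Here is the 12-hour adjustment on its own, assuming (as the comment above suggests) that the spreadsheet stores afternoon times as 12-hour values with no AM/PM marker. The sample times are made up.

```python
import pandas as pd

date_str = '2025-05-02'
# Made-up slot times as they might appear in the spreadsheet
raw_times = pd.Series(['01:00:00', '03:30:00', '12:15:00'])

# Attach the conference date, then push anything before noon into the afternoon
stamps = pd.to_datetime(date_str + ' ' + raw_times)
shifted = stamps.apply(lambda x: x + pd.Timedelta(hours=12) if x.hour < 12 else x)
print(shifted)
```

Times already at 12 or later are left alone, so a 12:15 slot stays at 12:15 while 1:00 and 3:30 become 13:00 and 15:30.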

Now, we're ONLY going to run the code for the newpeeps dataset. However, first we need to drop any previously assigned times and editor-agent pairs from the times_friday2, so we only have the NEW slots that need assignment.

In [929]:
# Keep only times that were NOT previously assigned and that still need participants assigned to them.

merged = times_friday2.merge(
    prior_friday_queries[['timeslot_start', 'room_name']],
    on=['timeslot_start', 'room_name'],
    how='left',  # Keep all rows from `times_friday2`
    indicator=True  # Add an indicator column showing the source of each row
)

# Keep only rows that are not in prior_friday_queries
new_times_friday2 = merged[merged['_merge'] == 'left_only'].drop(columns=['_merge'])
del(merged)


For assignment purposes, we're going to prioritize people according to the following:
1) Virtual
2) How many publishers they signed up with for manuscript critiques and/or pitches
3) Friday workshop attendees
4) Registration date 

In [930]:
# Sort participants by prioritization criteria
newpeeps['chosen_pubs_count'] = newpeeps['chosen_pubs'].apply(len)
newpeeps.sort_values(
    by=['Virtual', 'chosen_pubs_count', 'Friday_workshop', 'datetime'],
    ascending=[False, False, False, True],  # Friday_workshop=True sorts first so workshop attendees get priority
    inplace=True
)

In [931]:
# Initialize the list of assignments
assignments = []

# Create a new dataset with all the people:
participants_df = newpeeps.copy()
slots_df = new_times_friday2

# Helper function to check slot compatibility
def is_slot_compatible(participant, slot):
    # Check if at least one genre matches
    participant_fiction_genres = set(participant['fiction_genre'])
    slot_fiction_genres = set(slot['combined_fiction'])

    participant_nonfiction_genres = set(participant['nonfiction_genre'])
    slot_nonfiction_genres = set(slot['combined_nonfiction'])

    if not (participant_fiction_genres & slot_fiction_genres or participant_nonfiction_genres & slot_nonfiction_genres):
        print(f"Incompatible due to genres.") 
        print(f"Participant fiction genres: {participant_fiction_genres}")
        print(f"Publisher fiction genres: {slot_fiction_genres}")
        print(f"Participant nonfiction genres: {participant_nonfiction_genres}") 
        print(f"Publisher nonfiction genres: {slot_nonfiction_genres}")
        return False

    # Check publisher overlap
    participant_pubs = set(participant['chosen_pubs'])
    slot_pubs = {slot['pubname1'], slot['pubname2']}
    if participant_pubs.intersection(slot_pubs):
        print(f"Incompatible due to publisher overlap. Participant publishers: {participant_pubs}, Slot publishers: {slot_pubs}")
        return False

    return True

# Iterate through slots and assign participants
for _, slot in slots_df.iterrows():
    if participants_df.empty:
        break  # Exit if no participants are left to assign

    for index, participant in participants_df.iterrows():
        # Skip participants already assigned to conflicting slots
        assigned_slots = [a['timeslot_start'] for a in assignments if a['Email'] == participant['Email']]
        if any(abs(slot['timeslot_start'] - assigned) <= timedelta(minutes=15) for assigned in assigned_slots):
            print(f"Participant {participant['Email']} skipped due to conflicting time slot.")
            continue

        # Check slot compatibility
        if is_slot_compatible(participant, slot):
            assignments.append({
                'Email': participant['Email'],
                'timeslot_start': slot['timeslot_start'],
                'room_name': slot['room_name'],
                'pubname1': slot['pubname1'],
                'pubname2': slot['pubname2'],
                'participant_fiction_genre': ', '.join(participant['fiction_genre']),
                'participant_nonfiction_genre': ', '.join(participant['nonfiction_genre']),
                'publisher_fiction_genre': slot['combined_fiction'],
                'publisher_nonfiction_genre': slot['combined_nonfiction'],
                'workshop': participant['Friday_workshop'],
                'virtual':participant['Virtual']
            })

            print(f"Assigned {participant['Email']} to {slot['timeslot_start']} in room {slot['room_name']}")

            # Remove the assigned participant row
            participants_df.drop(index, inplace=True)
            break
        else:
            print(f"Participant {participant['Email']} not compatible with slot {slot['timeslot_start']} in room {slot['room_name']}")


# Convert assignments to a DataFrame
assignments_df = pd.DataFrame(assignments)

Incompatible due to genres.
Participant fiction genres: {"Children's picture/chapter books"}
Publisher fiction genres: {"Women's", 'Humor', 'LGBTQ+', 'Magical realism', 'Science fiction', 'Mainstream/commercial', 'Romance', 'Southern', 'Thriller', 'Young adult', 'Historical', 'Coming-of-age', 'Horror/Supernatural', 'Fantasy', 'Contemporary', 'Speculative fiction/myths & fairy tales', 'Middle grade', 'Upmarket commercial/book club'}
Participant nonfiction genres: {'Spiritual/inspirational/religious'}
Publisher nonfiction genres: set()
Participant mogboz@gmail.com not compatible with slot 2025-05-02 15:30:00 in room Board Room VI
Assigned mogboz@gmail.com to 2025-05-02 15:30:00 in room Fayetteville


In [933]:
print(len(participants_df))
print(len(expanded_query_critiques)== len(assignments_df))

0
False


<font color='red'>**NOTE**</font>: THERE WILL ALMOST DEFINITELY BE SOME PEOPLE WHO COULDN'T BE ASSIGNED. YOU WILL NEED TO CORRECT THOSE MANUALLY.
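
One way to see who still needs a manual fix is to compare how many critiques each email requested against how many it received. This is only a minimal sketch on toy data: `needed` and `assigned` are hypothetical stand-ins for `expanded_query_critiques` and `assignments_df`, and it assumes one row per requested critique.

```python
import pandas as pd

# Hypothetical stand-ins for expanded_query_critiques and assignments_df
needed = pd.DataFrame({'Email': ['a@x.org', 'a@x.org', 'b@x.org', 'c@x.org']})
assigned = pd.DataFrame({'Email': ['a@x.org', 'b@x.org']})

# Count requested vs. assigned critiques per email
requested_n = needed['Email'].value_counts()
assigned_n = assigned['Email'].value_counts().reindex(requested_n.index, fill_value=0)

# Anyone with fewer assignments than requests needs a manual fix
shortfall = requested_n - assigned_n
shortfall = shortfall[shortfall > 0].sort_index()
print(shortfall)
```

Here `a@x.org` and `c@x.org` each come up one assignment short, so those are the rows to correct by hand.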

In [761]:
# Flag participants with a Friday workshop and slots from 4-5pm

assignments_df['Flag'] = assignments_df.apply(
    lambda x: (x['workshop']) and (x['timeslot_start'].hour >= 16),  # After 4pm check
    axis=1
)

In [762]:
print(assignments_df['Flag'].value_counts())

Flag
False    1
True     1
Name: count, dtype: int64


Okay, awesome! We're good to go. Now let's save the dataset under a final (better-named) name, then export all the assignments to Excel.

In [763]:
final_friday_assignments = assignments_df

del(assignments_df, participants_df, slots_df, is_slot_compatible, expand_dataframe, clean_genres, assigned_slots, assignments)

In [770]:
# Merge back in with the original dataset:
all_friday_assignments = pd.merge(final_friday_assignments, prior_friday_queries, how='outer', on=['Email', 'timeslot_start', 'room_name', 'pubname1', 'pubname2', 'participant_fiction_genre', 'participant_nonfiction_genre', 'workshop', 'virtual', 'Flag'])

Lastly, create a column called 'publisher' that is a merging of the two publishers names, and also add a variable 'Session' that says 'Query Letter Critiques'. Oh, and add in a variable for 'Timekeeper', which is True/False depending on if the participant is a timekeeper that day or not.

In [771]:
all_friday_assignments['publisher'] = all_friday_assignments['pubname1'] + " and " + all_friday_assignments['pubname2']
all_friday_assignments['Session'] = "Query Letter Critiques"
all_friday_assignments['Timekeeper'] = all_friday_assignments['Email'].isin(timekeepers['Email'])


Finally, let's link in the first and last names, as well as phone numbers.

In [772]:
final_friday_assignments2 = pd.merge(all_friday_assignments, registered[['Email', 'First Name', 'Last Name', 'phone']].drop_duplicates(), on="Email", how="inner")

In [775]:
final_friday_assignments2.to_excel("Outputs/Finalized datasets/FINAL_Friday_QLC_assignments.xlsx", index=False)

<font color='red'>**NOTE**</font> Double check whether any of the new people are workshop people who were assigned after 4pm. You may need to manually change a few people's information in the Excel file and then reload it.

In [None]:
final_friday_assignments2 = pd.read_excel("Outputs/Finalized datasets/FINAL_Friday_QLC_assignments.xlsx")

In [776]:
del(final_friday_assignments, droppedout, newpeeps, expanded_query_critiques, index)

## 3. Friday Author Coaching - <font color='red'>**FIX FROM HERE ON**</font>

In [777]:
# .copy() makes reg_coaching an independent DataFrame, so adding columns doesn't raise SettingWithCopyWarning
reg_coaching = registered.loc[registered['Agenda Item Name'].str.contains("Coach", na=False), :].copy()
reg_coaching['Friday_workshop'] = reg_coaching['Email'].isin(fri_workshop['Email'])
reg_coaching['QLC'] = reg_coaching['Email'].isin(final_friday_assignments2['Email'])

Okay, we need to convert the coaching timeslots to datetimes.

In [778]:
# Convert the 'timeslot_start' to datetime variable
tslist_coach['timeslot_start'] = pd.to_datetime(date_str_fri + ' ' + tslist_coach['timeslot_start'].astype(str))

# Adjust the times to represent the afternoon (add 12 hours if in AM range)
tslist_coach['timeslot_start'] = tslist_coach['timeslot_start'].apply(
    lambda x: x + pd.Timedelta(hours=12) if x.hour < 12 else x
)

from datetime import timedelta
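
To see what the 12-hour shift does, here is a small sketch with made-up times. It assumes the raw slots are stored as clock times with no AM/PM marker, which is a guess about the timeslot file rather than something shown in this notebook.

```python
import pandas as pd

date_str = '2025-05-02'
raw = pd.Series(['01:30', '03:45', '12:15'])  # hypothetical raw values

parsed = pd.to_datetime(date_str + ' ' + raw)
# Anything parsed before noon is really an afternoon slot, so add 12 hours
shifted = parsed.apply(lambda x: x + pd.Timedelta(hours=12) if x.hour < 12 else x)
print(shifted.dt.strftime('%H:%M').tolist())  # ['13:30', '15:45', '12:15']
```

A 12:xx slot is left alone, but a genuine morning slot would be pushed into the afternoon, so this trick is only safe on afternoon-only timeslot lists.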


In [779]:
# We need to create an alternate version of the friday assignments dataset so that we can add buffer times for our purposes here:
publisher_meetings = final_friday_assignments2.copy()
publisher_meetings['timeslot_end'] = publisher_meetings['timeslot_start'] + timedelta(minutes=15)
publisher_meetings['buffer_start'] = publisher_meetings['timeslot_start'] - timedelta(minutes=15)
publisher_meetings['buffer_end'] = publisher_meetings['timeslot_end'] + timedelta(minutes=15)
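
As a rough illustration of how these buffer columns can be used, the sketch below checks whether a candidate start time falls inside one participant's buffered window. The `meetings` frame and the `conflicts` helper are hypothetical and only mirror the column names above.

```python
import pandas as pd
from datetime import timedelta

# Hypothetical publisher meeting for one participant
meetings = pd.DataFrame({
    'Email': ['a@x.org'],
    'timeslot_start': pd.to_datetime(['2025-05-02 14:00']),
})
meetings['timeslot_end'] = meetings['timeslot_start'] + timedelta(minutes=15)
meetings['buffer_start'] = meetings['timeslot_start'] - timedelta(minutes=15)
meetings['buffer_end'] = meetings['timeslot_end'] + timedelta(minutes=15)

def conflicts(email, start):
    """True if `start` falls inside any buffered meeting window for `email`."""
    mine = meetings[meetings['Email'] == email]
    return bool(((mine['buffer_start'] <= start) & (start < mine['buffer_end'])).any())

print(conflicts('a@x.org', pd.Timestamp('2025-05-02 14:20')))  # inside 13:45-14:30
print(conflicts('a@x.org', pd.Timestamp('2025-05-02 14:30')))  # exactly at the window's end
print(conflicts('b@x.org', pd.Timestamp('2025-05-02 14:00')))  # no meetings at all
```

Filtering by email first matters: without it, a slot would be ruled out whenever it clashed with anyone's publisher meeting.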


In [780]:
# We also need end times for the coaching meetings. They're technically 15 minutes, but we treat them as 17 to cover the two-minute break between them
tslist_coach['timeslot_end'] = tslist_coach['timeslot_start'] + timedelta(minutes=17)

In [781]:
# Load the previous assignments
prior_coaching = pd.read_excel("Outputs/Finalized datasets/Finalized coaching schedule_2025-01-24.xlsx")

In [782]:
# Identify any new people and any dropouts
newpeeps = reg_coaching[~reg_coaching['Email'].isin(prior_coaching['Email'])]
droppedout = prior_coaching[~prior_coaching['Email'].isin(reg_coaching['Email'])]

In [112]:
coaching_timeslots = tslist_coach.copy()

# Assign coaching meetings
def assign_coaching_meetings():
    coaching_schedule = []

    for _, participant in reg_coaching.iterrows():
        Email = participant['Email']
        selected_coach = participant['Agenda Item Name']
        workshop_flag = participant['Friday_workshop']
        
        # Exclude slots that conflict with this participant's own publisher meetings
        my_meetings = publisher_meetings[publisher_meetings['Email'] == Email]
        valid_slots = coaching_timeslots[~coaching_timeslots.apply(
            lambda slot: (
                (my_meetings['buffer_start'] <= slot['timeslot_start']) &
                (slot['timeslot_start'] < my_meetings['buffer_end'])
            ).any(), axis=1
        )]

        # Exclude slots after 4:00 PM for workshop participants
        if workshop_flag:
            valid_slots = valid_slots[valid_slots['timeslot_start'].dt.hour < 16]
        
        # Assign the first valid slot
        if not valid_slots.empty:
            chosen_slot = valid_slots.iloc[0]
            coaching_schedule.append({
                'Email': Email,
                'Session': selected_coach,
                'timeslot_start': chosen_slot['timeslot_start'],
                'publisher': chosen_slot['coach'],
                'room_name': chosen_slot['room_name']
            })

            # Remove the chosen slot to prevent double-booking
            coaching_timeslots.drop(valid_slots.index[0], inplace=True)
    
    return pd.DataFrame(coaching_schedule)

# Generate coaching schedule
coaching_schedule = assign_coaching_meetings()

In [113]:
del(coaching_timeslots, assign_coaching_meetings)

In [114]:
# Save this scheduling
coaching_schedule.to_excel(f"Outputs/Finalized datasets/Finalized coaching schedule_{today}.xlsx", index=False)

## 3. Saturday Morning - Manuscript critique scheduling
This is the easiest scheduling assignment. Everyone has already signed up for their critiques, so we just need to make sure:

1) Nobody is scheduled back-to-back (for anyone with multiple)
2) Timekeepers aren't first or last
3) Virtual participants are grouped back-to-back (we will prioritize them for the first time slots per room).

Ideally, we also try to ensure that there's only one substitute for each time slot, though we have plenty of substitute timekeepers. This can just be a manual check and fix later on.
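
Rules 1 and 2 can also be checked after the fact on a finished schedule. The sketch below uses a made-up `sched` table with the same `Email`, `timeslot_start` and `Timekeeper` columns the assignment step produces; treating a gap of 15 minutes or less as back-to-back is an assumption carried over from the assignment code.

```python
import pandas as pd

# Hypothetical finished schedule
sched = pd.DataFrame({
    'Email': ['a@x.org', 'a@x.org', 'b@x.org', 'c@x.org'],
    'timeslot_start': pd.to_datetime([
        '2025-05-03 09:00', '2025-05-03 09:15', '2025-05-03 09:30', '2025-05-03 09:00']),
    'Timekeeper': [False, False, False, True],
})

# Rule 1: a gap of 15 minutes or less between one person's meetings is back-to-back
sched = sched.sort_values(['Email', 'timeslot_start'])
gap = sched.groupby('Email')['timeslot_start'].diff()
back_to_back = sched.loc[gap <= pd.Timedelta(minutes=15), 'Email'].unique().tolist()

# Rule 2: timekeepers sitting in the first or last slot of the session
edges = [sched['timeslot_start'].min(), sched['timeslot_start'].max()]
edge_keepers = sched.loc[
    sched['Timekeeper'] & sched['timeslot_start'].isin(edges), 'Email'
].tolist()

print(back_to_back, edge_keepers)
```

Both lists should come back empty on a clean schedule; in this toy data each rule is deliberately broken once.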

In [115]:
# Assign the individual publishers to their respective rooms for Saturday morning (MS) and Saturday afternoon (Pitches)
times_sat = pd.concat([rooms_saturday.reset_index(drop=True), pubs.reset_index(drop=True)], axis=1)

tslist_satmorn = pd.merge(timeslots.loc[(timeslots['day'] == 'Saturday') & (timeslots['day_session'] == 'Morning'), :], times_sat, how='outer', on='day')
tslist_sataft = pd.merge(timeslots.loc[(timeslots['day'] == 'Saturday') & (timeslots['day_session'] == 'Afternoon'), :], times_sat, how='outer', on='day')

In [116]:
final_saturday_rooms = pd.concat([rooms_saturday.reset_index(drop=True), pubs.reset_index(drop=True)], axis=1)

We need to create a single dataset for the manuscript critiques where every person has a row for their critique (as in, a person can have up to three rows).

In [117]:
del(msA, msB, msC, pitchA, pitchB, pitchC) # delete these - had originally kept for this but need virtual info
msA = ms_critiques[['Email', 'Virtual', 'msA']]
msB = ms_critiques[['Email', 'Virtual', 'msB']]
msC = ms_critiques[['Email', 'Virtual', 'msC']]

In [118]:
msA = msA.rename(columns={'msA': 'publisher'})
msB = msB.rename(columns={'msB': 'publisher'})
msC = msC.rename(columns={'msC': 'publisher'})

ms_all = pd.merge(pd.merge(msA, msB, on=['Email', 'Virtual', 'publisher'], how="outer"), msC, on=['Email', 'Virtual', 'publisher'], how='outer')

Drop any rows with NaN

In [119]:
ms_all = ms_all.dropna()

In [120]:
del(msA, msB, msC)
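
For reference, the same one-row-per-critique layout can probably be reached in a single step with `melt`. This is a sketch on toy data, assuming a wide table with the `msA`/`msB`/`msC` columns used above; `wide` and `long` are illustrative names only.

```python
import pandas as pd

# Hypothetical wide table: one row per person, up to three chosen publishers
wide = pd.DataFrame({
    'Email': ['a@x.org', 'b@x.org'],
    'Virtual': [False, True],
    'msA': ['Pub One', 'Pub Two'],
    'msB': ['Pub Three', None],
    'msC': [None, None],
})

# One row per (person, publisher); empty choices are dropped
long = (
    wide.melt(id_vars=['Email', 'Virtual'], value_vars=['msA', 'msB', 'msC'],
              value_name='publisher')
        .dropna(subset=['publisher'])
        .drop(columns='variable')
        .reset_index(drop=True)
)
print(long)
```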

Okay, now we just need to convert the timeslot_start to a timestamp variable

In [121]:
# Convert the 'timeslot_start' to datetime variable
tslist_satmorn['timeslot_start'] = pd.to_datetime(date_str_sat + ' ' + tslist_satmorn['timeslot_start'].astype(str))

In [122]:
# Convert the 'timeslot_start' to datetime variable
tslist_sataft['timeslot_start'] = pd.to_datetime(date_str_sat + ' ' + tslist_sataft['timeslot_start'].astype(str))

# Adjust the times to represent the afternoon (add 12 hours if in AM range)
tslist_sataft['timeslot_start'] = tslist_sataft['timeslot_start'].apply(
    lambda x: x + pd.Timedelta(hours=12) if x.hour < 12 else x
)

Let's also identify the timekeepers' emails. We'll make sure not to give them the first or last time slot.

In [123]:
timekeeps = timekeepers[['Email']].drop_duplicates()
timekeeps = timekeeps['Email'].tolist()

Whew! Okay, now it's time to assign the participants for the manuscript critiques. The code below works by:

1) It fills alphabetically by the publisher name, so that Alexandria Brown gets all her timeslots filled first, before moving on to the next publisher in the alphabet. **NOTE**: I have it randomly filling time slots. It's not running by earliest time to latest time.

2) It prioritizes participants by how many manuscript critique slots they still need. For the first time slot it tries to fill, it prioritizes people with 3 critiques, then 2, then 1. As it iterates and participants get assigned, someone who started with 3 meetings but already has 2 scheduled (meaning `remaining_meetings` is 1) gets lower priority than participants who still need 2 or 3 meetings.

3) I randomly shuffled the participants within their priority groups. This means that participant emails are randomly ordered in the A) three remaining group, B) two remaining and C) one remaining group. This way we don't prioritize people according to the alphabetical ordering of their emails but just do random assignments. (I had implemented this because I had noticed initially that a lot of the T-Z emails weren't being assigned as readily).
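
The shuffle-then-prioritize idea from points 2 and 3 can be sketched on toy data as below. `pool` and `priority_order` are hypothetical names; the sketch asks for a stable sort (`kind='stable'`) so that ties keep their shuffled order, which the default sort algorithm does not guarantee.

```python
import pandas as pd

# Hypothetical pool: one row per critique still to be scheduled
pool = pd.DataFrame({'Email': ['a@x.org'] * 3 + ['b@x.org'] * 2 + ['c@x.org']})
pool['remaining_meetings'] = pool.groupby('Email')['Email'].transform('count')

def priority_order(df, seed):
    # Shuffle first, then stable-sort so ties keep their shuffled order
    return (
        df.sample(frac=1, random_state=seed)
          .sort_values('remaining_meetings', ascending=False, kind='stable')
    )

ordered = priority_order(pool, seed=2)
print(ordered['remaining_meetings'].tolist())  # [3, 3, 3, 2, 2, 1]
```

The same seed always gives the same order, which is what makes a successful seed worth keeping.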

<font color='red'>**BIGGEST NOTE**:</font>
This entire code is embedded within one giant function because I'm having it run this code repeatedly using different random seeds, until it finds the seed that ensures that ALL participants get assigned time slots. Then it stops and that's the seed number that's kept.
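
The retry-over-seeds pattern looks roughly like this. `try_assign` is a toy stand-in, not the real assignment function: it greedily fills slots in a shuffled order, so whether everyone fits depends on the seed.

```python
import random

def try_assign(seed, people, slots):
    """Toy stand-in for the real assignment function: greedy fill in shuffled order."""
    rng = random.Random(seed)
    order = people[:]
    rng.shuffle(order)
    free = set(slots)
    placed = {}
    for person, allowed in order:
        options = sorted(free & allowed)
        if options:
            placed[person] = options[0]
            free.remove(options[0])
    return placed

# Hypothetical data: who can take which slot
people = [('a', {1, 2}), ('b', {1}), ('c', {3})]
slots = [1, 2, 3]

result, seed = None, 0
while seed < 1000:  # cap the number of attempts
    seed += 1
    placed = try_assign(seed, people, slots)
    if len(placed) == len(people):  # everyone got a slot: keep this seed
        result = placed
        break

print(seed, result)
```

In the toy data, any order that handles `a` before `b` fails because `a` grabs slot 1, so the loop keeps trying seeds until `b` comes first.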

In [124]:
import random

# Create copies of the datasets to use in the function, since we drop participants as we go
slots_df = tslist_satmorn.copy()
participants_df = ms_all.copy()

# Get the earliest and latest timeslots
earliest_time = slots_df['timeslot_start'].min()
latest_time = slots_df['timeslot_start'].max()

# Define a function to perform the assignment process
def assign_slots_with_seed(participants_df, slots_df, seed):

    # Add a column to flag timekeepers in the participants dataset
    participants_df['is_timekeeper'] = participants_df['Email'].isin(timekeeps)

    # Create the blank datasets and lists for the assignments and used-up slots
    assignments = []
    used_slots = set()

    # Add a column to track the number of meetings each participant needs
    participants_df['remaining_meetings'] = participants_df.groupby('Email')['Email'].transform('count')

    # Repeat until all participants are assigned or no more slots remain
    while not participants_df.empty:
        assigned_any = False

        for _, slot in slots_df.iterrows():
            room_slot_id = (slot['timeslot_start'], slot['room_name'])

            if room_slot_id in used_slots:
                continue

            if participants_df.empty:
                break

            sorted_participants = (
                participants_df
                .sample(frac=1, random_state=seed)  # Shuffle randomly
                .sort_values(by='remaining_meetings', ascending=False)
            )

            for index, participant in sorted_participants.iterrows():

                # Skip back-to-back assignments
                assigned_slots = [
                    (a['timeslot_start'], a['room_name']) for a in assignments if a['Email'] == participant['Email']
                ]
                if any(
                    abs(slot['timeslot_start'] - assigned_time) <= timedelta(minutes=15)
                    for assigned_time, _ in assigned_slots
                ):
                    continue

                
                # Skip the earliest and latest timeslots for timekeepers if possible
                if participant['is_timekeeper'] and slot['timeslot_start'] in [earliest_time, latest_time]:
                    # Check if there are other (non-edge) slots available for this participant
                    has_alternative = any(
                        set(participant['publisher']) == set(alt_slot['lit_guest_name']) and
                        alt_slot['timeslot_start'] not in [earliest_time, latest_time] and
                        (alt_slot['timeslot_start'], alt_slot['room_name']) not in used_slots
                        for _, alt_slot in slots_df.iterrows()
                    )
                    # Only skip the edge slot when a non-edge slot is still open;
                    # otherwise fall through and assign the edge slot
                    if has_alternative:
                        continue

                if set(participant['publisher']) == set(slot['lit_guest_name']):
                    assignments.append({
                        'Email': participant['Email'],
                        'timeslot_start': slot['timeslot_start'],
                        'room_name': slot['room_name'],
                        'publisher': slot['lit_guest_name'],
                        'virtual': participant['Virtual'],
                        'Session': "Manuscript critique",
                        'Timekeeper': participant['is_timekeeper']
                    })
                    used_slots.add(room_slot_id)
                    participants_df.drop(index, inplace=True)
                    participants_df['remaining_meetings'] = participants_df.groupby('Email')['Email'].transform('count')
                    assigned_any = True
                    break

        if not assigned_any:
            break

    return assignments, participants_df

# Initialize variables
success = False
max_attempts = 1000  # Limit the number of attempts
seed = 0

while not success and seed < max_attempts:
    seed += 1
    print(f"Trying seed {seed}...")
    
    # Copy the original dataframes to avoid modifying them directly
    participants_copy = ms_all.copy()
    slots_copy = tslist_satmorn.copy()

    # Run the assignment process with the current seed
    assignments, remaining_participants = assign_slots_with_seed(participants_copy, slots_copy, seed)

    # Check if all participants were assigned
    if remaining_participants.empty:
        success = True
        print(f"Success! All participants assigned using seed {seed}.")
        break

if success:
    # Convert assignments to a DataFrame
    assignments_df = pd.DataFrame(assignments)
else:
    print("Failed to assign all participants within the maximum number of attempts.")


Trying seed 1...
Trying seed 2...
Success! All participants assigned using seed 2.


Yay! This code works beautifully!! Everyone's been assigned and now let's just do a little cleaning, then repeat the process for the Saturday afternoon pitches.

In [125]:
final_satmorn_assignments = assignments_df

Let's just add in the first and last names, plus phones.

In [126]:
final_satmorn_assignments2 = pd.merge(final_satmorn_assignments, registered[['Email', 'First Name', 'Last Name', 'phone']].drop_duplicates(), on='Email', how='inner')

In [127]:
del(assignments_df, ms_all, ms_critiques, participants_copy, slots_copy, slots_df, remaining_participants,
    assignments, earliest_time, latest_time, seed, success, assign_slots_with_seed, final_satmorn_assignments)

In [128]:
#  Save the dataset
final_satmorn_assignments2.to_excel(f"Outputs/Finalized datasets/Final manuscript critique assignments_{today}.xlsx", index=False)

## 4. Saturday Afternoon - Pitches

Let's do the pitch assignments now! We'll follow the same process, except using the Saturday afternoon times and the pitch dataset.

In [129]:
pitchA = pitches[['Email', 'Virtual', 'pitchA']]
pitchB = pitches[['Email', 'Virtual', 'pitchB']]
pitchC = pitches[['Email', 'Virtual', 'pitchC']]

In [130]:
pitchA = pitchA.rename(columns={'pitchA': 'publisher'})
pitchB = pitchB.rename(columns={'pitchB': 'publisher'})
pitchC = pitchC.rename(columns={'pitchC': 'publisher'})

pitches_all = pd.merge(pd.merge(pitchA, pitchB, on=['Email', 'Virtual', 'publisher'], how="outer"), pitchC, on=['Email', 'Virtual', 'publisher'], how='outer')

In [131]:
pitches_all = pitches_all.dropna()
del(pitchA, pitchB, pitchC)

In [132]:
import random

# Get the earliest and latest timeslots
earliest_time = tslist_sataft['timeslot_start'].min()
latest_time = tslist_sataft['timeslot_start'].max()

# Define a function to perform the assignment process
def assign_slots_with_seed(participants_df, slots_df, seed):

    # Shuffle the timeslots within each publisher group using the seed
    shuffled_slots = (
        slots_df.groupby('lit_guest_name', group_keys=False)
        .apply(lambda group: group.sample(frac=1, random_state=seed))
    )

    # Add a column to flag timekeepers in the participants dataset
    participants_df['is_timekeeper'] = participants_df['Email'].isin(timekeeps)

    # Create the blank datasets and lists for the assignments and used-up slots
    assignments = []
    used_slots = set()

    # Add a column to track the number of meetings each participant needs
    participants_df['remaining_meetings'] = participants_df.groupby('Email')['Email'].transform('count')

    # Repeat until all participants are assigned or no more slots remain
    while not participants_df.empty:
        assigned_any = False

        for _, slot in shuffled_slots.iterrows():
            room_slot_id = (slot['timeslot_start'], slot['room_name'])

            if room_slot_id in used_slots:
                continue

            if participants_df.empty:
                break

            sorted_participants = (
                participants_df
                .sample(frac=1, random_state=seed)  # Shuffle randomly
                .sort_values(by='remaining_meetings', ascending=False)
            )

            for index, participant in sorted_participants.iterrows():

                # Skip back-to-back assignments
                assigned_slots = [
                    (a['timeslot_start'], a['room_name']) for a in assignments if a['Email'] == participant['Email']
                ]
                if any(
                    abs(slot['timeslot_start'] - assigned_time) <= timedelta(minutes=15)
                    for assigned_time, _ in assigned_slots
                ):
                    continue

                # Skip the earliest and latest timeslots for timekeepers if possible
                if participant['is_timekeeper'] and slot['timeslot_start'] in [earliest_time, latest_time]:
                    # Check if there are other slots available for this participant
                    if not any(
                        set(participant['publisher']) == set(alt_slot['lit_guest_name']) and
                        alt_slot['timeslot_start'] not in [earliest_time, latest_time] and
                        (alt_slot['timeslot_start'], alt_slot['room_name']) not in used_slots
                        for _, alt_slot in shuffled_slots.iterrows()
                    ):
                        print(f"Timekeeper {participant['Email']} has no alternative slot; assigning to edge slot.")
                    else:
                        continue

                if set(participant['publisher']) == set(slot['lit_guest_name']):
                    assignments.append({
                        'Email': participant['Email'],
                        'timeslot_start': slot['timeslot_start'],
                        'room_name': slot['room_name'],
                        'publisher': slot['lit_guest_name'],
                        'virtual': participant['Virtual'],
                        'Session': "Pitch",
                        'Timekeeper': participant['is_timekeeper']
                    })
                    used_slots.add(room_slot_id)
                    participants_df.drop(index, inplace=True)
                    participants_df['remaining_meetings'] = participants_df.groupby('Email')['Email'].transform('count')
                    assigned_any = True
                    break

        if not assigned_any:
            break

    return assignments, participants_df

# Initialize variables
success = False
max_attempts = 1000  # Limit the number of attempts
seed = 0

while not success and seed < max_attempts:
    seed += 1
    print(f"Trying seed {seed}...")
    
    # Copy the original dataframes to avoid modifying them directly
    participants_copy = pitches_all.copy()
    slots_copy = tslist_sataft.copy()

    # Run the assignment process with the current seed
    assignments, remaining_participants = assign_slots_with_seed(participants_copy, slots_copy, seed)

    # Check if all participants were assigned
    if remaining_participants.empty:
        success = True
        print(f"Success! All participants assigned using seed {seed}.")
        break

if success:
    # Convert assignments to a DataFrame
    assignments_df = pd.DataFrame(assignments)
else:
    print("Failed to assign all participants within the maximum number of attempts.")


Trying seed 1...
Trying seed 2...
Trying seed 3...
Trying seed 4...
Trying seed 5...
Trying seed 6...
Trying seed 7...
Trying seed 8...
Trying seed 9...
Success! All participants assigned using seed 9.


Yay! That worked great too. Let's just save it and delete any extraneous datasets.

In [133]:
final_sataft_assignment = assignments_df
final_sataft_assignments2 = pd.merge(final_sataft_assignment, registered[['Email', 'First Name', 'Last Name', 'phone']].drop_duplicates(), on='Email', how='inner')

del(pitches, pitches_all, assignments_df, remaining_participants, timeslots, tslist_sataft, tslist_satmorn, times_friday2, times_sat,
    slots_copy, assignments, earliest_time, latest_time, max_attempts, success, seed, assign_slots_with_seed, timedelta, participants_copy, final_sataft_assignment)

Woohoo! Now we're officially all done with the assignments, and we just need to deal with the waitlists. The final step after that will be to export everything we've got into exactly the Excel and Word files we want.

In [134]:
# Save the dataset
final_sataft_assignments2.to_excel(f"Outputs/Finalized datasets/Finalized pitch assignments_{today}.xlsx", index=False)

## 5. Deal with the Waitlists

Dealing with the waitlists is pretty simple. We already corrected some of the basic stuff earlier, like emails and phones. Now let's split into what they're waitlisted for:
1) manuscript critiques
2) pitches
3) pre-conference edits
4) book fairs
5) query letter critiques
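
If all five waitlist types need splitting, a keyword lookup is one way to do it in a single pass. This is a sketch on made-up rows: the `'Book Fair'` and `'Query Letter'` keywords are guesses at how those sessions are named in the report, and `waitlist_demo`, `keywords` and `splits` are illustrative names.

```python
import pandas as pd

# Hypothetical waitlist rows; the real 'Session Name' wording may differ
waitlist_demo = pd.DataFrame({
    'Email': ['a@x.org', 'b@x.org', 'c@x.org', 'd@x.org'],
    'Session Name': [
        'Manuscript Critique A with Pub One',
        'Pitch B with Pub Two',
        'Pre-conference Edit',
        'Query Letter Critique',
    ],
})

# Keyword per waitlist type (the last two keywords are guesses)
keywords = {
    'ms': 'Manuscript',
    'pitch': 'Pitch',
    'prec': 'Pre-conference',
    'bookfair': 'Book Fair',
    'qlc': 'Query Letter',
}

# .copy() gives independent frames, so later column assignments raise no warnings
splits = {
    name: waitlist_demo[waitlist_demo['Session Name'].str.contains(kw, na=False)].copy()
    for name, kw in keywords.items()
}
print({name: len(df) for name, df in splits.items()})
```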

In [135]:
# .copy() so the column assignments in later cells don't trigger SettingWithCopyWarning
wait_ms = waitlist[waitlist['Session Name'].str.contains('Manuscript')].copy()
wait_pitch = waitlist[waitlist['Session Name'].str.contains('Pitch')].copy()
wait_prec = waitlist[waitlist['Session Name'].str.contains('Pre-conference')].copy()

# May also need to do bookfair and query letter critique waitlists

Now let's relabel the session names so that instead of Manuscript Critique A, B, C, etc., they say 'Waitlisted'.

In [136]:
wait_pitch['Session Name'] = wait_pitch['Session Name'].str.replace("Pitch [A-Z] with ", "Waitlisted - Pitch with ", regex=True)
wait_ms['Session Name'] = wait_ms['Session Name'].str.replace("Manuscript Critique [A-Z] with ", "Waitlisted - Manuscript Critique with ", regex=True)


We need to double check that no participant has more than 3 waitlist spots for either pitches or manuscript critiques.

In [137]:
print(wait_pitch['Email'].value_counts().unique())
print(wait_ms['Email'].value_counts().unique())

[3 2 1]
[3 2 1]


Good. As you can see above, nobody's got 4 or higher for how often their emails appear in these lists. Now let's sort by registration date for each Session Name, so that we assign a value of #1, #2, etc. by registration date for each Manuscript critique/pitch spot with each publisher.
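
On a toy example, the ranking step works roughly like this. The `demo` rows are made up; the `Session Name` and `datetime` column names follow the waitlist data used here.

```python
import pandas as pd

# Hypothetical waitlist: two sessions, registration timestamps out of order
demo = pd.DataFrame({
    'Email': ['a@x.org', 'b@x.org', 'c@x.org'],
    'Session Name': ['Pitch with Pub One', 'Pitch with Pub One', 'Pitch with Pub Two'],
    'datetime': pd.to_datetime(['2025-02-03 10:00', '2025-02-01 09:00', '2025-02-02 12:00']),
})

# Earliest registration in each session gets position 1
demo['position'] = (
    demo.groupby('Session Name')['datetime'].rank(method='first').astype(int)
)
print(demo.sort_values(['Session Name', 'position'])[['Email', 'position']])
```

`b@x.org` registered first for Pub One and so sits at #1 there, while `c@x.org` is #1 for Pub Two because each session is ranked separately.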

In [138]:
# Sort by 'Session Name' and 'datetime', and rank participants
wait_ms['Waitlist_ms'] = wait_ms.sort_values(['Session Name', 'datetime']) \
               .groupby('Session Name')['datetime'] \
               .rank(method='first').astype(int)

# Sort DataFrame for display (optional)
wait_ms = wait_ms.sort_values(['Session Name', 'Waitlist_ms']).reset_index(drop=True)


In [139]:
# Sort by 'Session Name' and 'datetime', and rank participants
wait_pitch['Waitlist_pitch'] = wait_pitch.sort_values(['Session Name', 'datetime']) \
               .groupby('Session Name')['datetime'] \
               .rank(method='first').astype(int)

# Sort DataFrame for display (optional)
wait_pitch = wait_pitch.sort_values(['Session Name', 'Waitlist_pitch']).reset_index(drop=True)


Okay, looks good. Now let's tweak it a little more to create an 'Agenda Item Name' of the form 'Waitlist #1 - Manuscript Critique with [publisher]'.

In [140]:
wait_pitch['Agenda Item Name'] = wait_pitch.apply(
    lambda row: row['Session Name'].replace(
        "Waitlisted", f"Waitlist #{row['Waitlist_pitch']}"
    ) if "Waitlisted" in row['Session Name'] else row['Session Name'],
    axis=1
)

In [141]:
wait_ms['Agenda Item Name'] = wait_ms.apply(
    lambda row: row['Session Name'].replace(
        "Waitlisted", f"Waitlist #{row['Waitlist_ms']}"
    ) if "Waitlisted" in row['Session Name'] else row['Session Name'],
    axis=1
)

In [None]:
# For right now, let's just merge all the waitlist stuff back together, and add in the participant info so that it's all in one place.
wait_all = pd.merge(wait_ms, wait_pitch, how="outer")

# Let's extract the publisher
wait_all['publisher'] = wait_all['Session Name'].str.replace("Waitlisted - Manuscript Critique with ", "")
wait_all['publisher'] = wait_all['publisher'].str.replace("Waitlisted - Pitch with ", "")
wait_all = wait_all[['Email', 'First Name', 'Last Name', 'phone', 'Agenda Item Name', 'publisher']]

# print for George
wait_all.to_excel("Outputs/Finalized Datasets/Waitlist participants.xlsx", index=False)


## 6. Print a bunch of excel documents

We won't really do much with these particular Excel documents, except to export them for manual review (and potentially manual changes).

In [166]:
final_room_pairings_Friday.to_excel(f"Outputs/Finalized Datasets/Editor-agent pairings for Friday_{today}.xlsx", index=False)

In [167]:
final_friday_assignments2.to_excel(f"Outputs/Finalized Datasets/Friday query letter critique assignments_{today}.xlsx", index=False)

In [168]:
final_sataft_assignments2.to_excel(f"Outputs/Finalized Datasets/Saturday pitch assignments_{today}.xlsx", index=False)

In [169]:
final_satmorn_assignments2.to_excel(f"Outputs/Finalized Datasets/Saturday manuscript critique assignments_{today}.xlsx", index=False)