# Daily Assignment Planning

## Introduction and Process Overview

In healthcare data management, efficient allocation of medical abstractors to various facilities is crucial for maintaining accurate and timely medical records. This notebook documents the systematic approach used to manage daily abstractor assignments across multiple healthcare facilities, ensuring optimal resource utilization while meeting specific skill requirements and time constraints.

### Process Context
The daily assignment planning process involves matching qualified medical abstractors with facility requests based on several key factors:

1. **Incoming Requests**
   - Requests are received through Salesforce
   - Each request specifies:
     - Target facility (Hospital X, Y, Z, etc.)
     - Required hours of work (n, m, o hours)
     - Specific skill requirements (XX, YY, ZZ)

2. **Resource Assessment**
   - Query Salesforce database for available abstractors
   - Filter abstractors based on required skills
   - Evaluate abstractor availability and current workload
   - Consider geographical and temporal constraints

3. **Matching Logic**
   - Align abstractor qualifications with facility requirements
   - Balance workload distribution
   - Optimize for efficiency and quality of service
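
As a rough illustration of the matching logic above, the sketch below filters a small, made-up abstractor table by required skill and available hours, then prefers the abstractor with the lightest current workload. The column names (`skills`, `available_hours`, `assigned_hours`), the `match_abstractor` helper, and the tie-breaking rule are assumptions for illustration, not the fields or rules used in Salesforce.

```python
import pandas as pd

def match_abstractor(abstractors: pd.DataFrame, skill: str, hours: float):
    """Return the name of a qualified abstractor with enough free hours, or None."""
    # Hypothetical columns: 'skills' holds a set of skill codes per abstractor.
    has_skill = abstractors["skills"].apply(lambda s: skill in s)
    has_time = abstractors["available_hours"] >= hours
    candidates = abstractors[has_skill & has_time]
    if candidates.empty:
        return None
    # Balance workload: prefer whoever currently has the fewest assigned hours.
    return candidates.sort_values("assigned_hours").iloc[0]["name"]

abstractors = pd.DataFrame({
    "name": ["A", "B", "C"],
    "skills": [{"XX"}, {"XX", "YY"}, {"ZZ"}],
    "available_hours": [10, 20, 40],
    "assigned_hours": [30, 10, 0],
})
best = match_abstractor(abstractors, skill="XX", hours=8)
```

In practice the decision in this notebook is made by a person reviewing the generated workbook, so a function like this would only narrow the candidates.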

### Purpose of This Notebook
This Jupyter notebook serves as both documentation and an operational tool, demonstrating the step-by-step process of:
- Processing incoming facility requests
- Analyzing abstractor availability and qualifications
- Generating and validating assignments

The following sections will detail each step of the process, including Python code implementations and data analysis techniques used to ensure optimal assignment distribution.

In [None]:
# first to import the necessary modules and create a connection to Salesforce
import pandas as pd
import numpy as np
import os
from datetime import datetime, timedelta
from simple_salesforce import Salesforce
# Import the module under a specific name
import importlib
import sf_queries_class
importlib.reload(sf_queries_class)
from sf_queries_class import SfQueries
import my_sf_secrets
import capacity_portion
importlib.reload(capacity_portion)
import create_capbase
importlib.reload(create_capbase)
my_sf_username, my_sf_password, my_sf_security_token = my_sf_secrets.get_my_sf_secrets()

# queries module that I created to retrieve data from Salesforce that I was consistently using
queries = SfQueries(
    username=my_sf_username,
    password=my_sf_password,
    security_token=my_sf_security_token
)
from reportforce import Reportforce
rf = Reportforce(session_id=queries.sf.session_id, instance_url=queries.sf.sf_instance)
# get the username from the system
username = os.getlogin()

# Demand Prep

Data was gathered from Salesforce to understand what the current needs of the business are. 
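
The cell below ranks requests by a composite score: the priority total plus a min-max-scaled age. The sketch here runs the same arithmetic on three invented requests to show how age breaks ties. It assumes priority totals differ by at least 1 between levels, so a scaled age below 1 cannot push a request past a higher priority.

```python
import pandas as pd

requests = pd.DataFrame({
    "Request Name": ["R1", "R2", "R3"],
    "Priority Total": [5, 5, 4],
    "Age_Days": [2, 10, 30],
})

min_days = requests["Age_Days"].min()
# Padding the maximum keeps the scaled age strictly below 1, so age only
# separates requests that share the same priority total.
max_days = requests["Age_Days"].max() + 2
requests["normalized_days"] = (requests["Age_Days"] - min_days) / (max_days - min_days)
requests["composite"] = requests["Priority Total"] + requests["normalized_days"]

# Highest composite first: R2 outranks R1 because it is older at equal priority.
ranked = requests.sort_values("composite", ascending=False)["Request Name"].tolist()
```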

In [None]:
# query has been modified to give the general structure of the query, 
# but all of the actual column names removed
sf_query = """
SELECT 
-- ALL_Necessary_Columns

FROM 
    Table__c 
WHERE 
    Filters
"""

requests_fil = queries.convert_salesforce_data_to_df(queries.sf.query_all(query=sf_query))
requests_fil = requests_fil.rename(columns={
    'salesforce_name__c': 'needed_name'
})

# Calculate 'Age_Days'
requests_fil['Created Date'] = pd.to_datetime(requests_fil['Created Date']) 
requests_fil['Age_Days'] = (datetime.now().astimezone() - requests_fil['Created Date']).dt.days

# Min-max scale the age to create a `normalized_days` column: a decimal value added
# to the Priority Total to form a composite score, so that older requests rank
# higher when the Priority Total is the same. Adding 2 to the maximum keeps the
# scaled value strictly below 1.
min_days = requests_fil['Age_Days'].min()
max_days = requests_fil['Age_Days'].max() + 2
requests_fil['normalized_days'] = (requests_fil['Age_Days'] - min_days) / (max_days - min_days)

# Calculate 'composite'
requests_fil['composite'] = requests_fil['Priority Total'] + requests_fil['normalized_days']

# Create 'In Implementation' column that tells us whether the request is a backfill for a project 
# that is already running, or if it is a new request for a new project.
requests_fil['In Implementation'] = requests_fil['Stage'].apply(lambda x: 1 if x in ['In Implementation', 'Delayed'] else 0)

# the actual column names have been replaced with a placeholder
requests_sel_1 = requests_fil.loc[:, ['Relevant_column_names',]].drop_duplicates()
categories = requests_sel_1['Category'].unique()
categories = [category for category in categories if category != 'Unecessary Category 1'] # category != 'Unecessary Category 2' and 
requests_sel = requests_sel_1.loc[requests_sel_1['Category'].isin(categories), :].copy()

Much of the work was in placing abstractors with the correct skills and capacity in the correct location, and in having a template that I could reuse daily to create the assignments. The next block of code creates that template.
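
As a small sketch of two of the template steps on invented rows: requested hours are scaled by 1.3 for requests flagged as in implementation, then rows are sorted by category and descending priority. The data is made up; the column names follow the placeholder names used in the function below.

```python
import numpy as np
import pandas as pd

template = pd.DataFrame({
    "Category": ["B", "A", "A"],
    "Priority_Score": [3.0, 1.0, 2.0],
    "Requested_Hours": [10.0, 20.0, 30.0],
    "Implementation_Status": [1, 0, 1],
})

# Scale hours only where the request belongs to a project already in implementation.
template["Requested_Hours"] = np.where(
    template["Implementation_Status"] == 1,
    template["Requested_Hours"] * 1.3,
    template["Requested_Hours"],
)

# Category ascending, then highest priority first within each category.
template = template.sort_values(
    by=["Category", "Priority_Score"], ascending=[True, False]
).reset_index(drop=True)
```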

In [None]:
def create_assignment_template(requests_df):
    """
    Creates a standardized template for processing assignment requests
    """
    # Create a copy of the input dataframe to avoid modifications to original data
    template_df = requests_df.copy()
    
    # Step 1: Process Category Classifications
    # Replace specific categories based on business rules
    # Comments explain what's happening without revealing specific logic
    # na=False makes missing values count as "no match" instead of propagating NaN
    template_df['Category'] = np.where(
        (template_df['Category'].str.contains('CATEGORY_A', na=False)) |
        (template_df['Category'].isnull()),
        template_df['Category'],
        'Other'
    )
    
    # Step 2: Additional category processing based on team assignments
    template_df['Category'] = np.where(
        (template_df['Team_Name'].str.contains('TEAM_A', na=False)) &
        (template_df['Category'] == 'Other'),
        'CATEGORY_A',
        template_df['Category']
    )
    
    # Step 3: Initialize new columns for assignment processing
    # Set default values for required fields
    template_df['Suggested_Resource'] = np.nan
    template_df['Resource_ID'] = np.nan
    template_df['Position_Type'] = 'DEFAULT_POSITION'
    template_df['Access_Requirements'] = 'STANDARD_ACCESS'
    template_df['Status_Update'] = np.nan
    template_df['Hours_Update_Required'] = False
    template_df['Updated_Hours'] = np.nan
    template_df['Assignment_Status'] = ""
    
    # Step 4: Select and organize relevant columns for processing
    columns_needed = [
        'Category', 'Team_Name', 'Request_ID', 
        'Qualification_Score', 'Status_Update',
        'Priority_Score', 'Requested_Hours',
        'Suggested_Resource', 'Resource_ID',
        'Assignment_Status', 'Eligibility_Flag',
        'Special_Resource_Requirements', 'Implementation_Status',
        'Notes', 'Assignment_Plan', 'Position_Type',
        'Access_Requirements', 'Hours_Update_Required',
        'Updated_Hours'
    ]
    
    # Copy so the column assignments below modify a new frame rather than a view of template_df
    processed_template = template_df[columns_needed].copy()
    
    # Step 5: Process special cases and adjustments
    # Adjust hours for specific implementation cases
    processed_template['Requested_Hours'] = np.where(
        processed_template['Implementation_Status'] == 1,
        processed_template['Requested_Hours'] * 1.3,
        processed_template['Requested_Hours']
    )
    
    # Step 6: Final categorization and sorting
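    # 'Alternative_Category' is not in columns_needed, so read it from template_df,
    # which still has the same rows in the same order at this point
    processed_template['Category'] = np.where(
        processed_template['Category'] == 'SPECIAL_CATEGORY',
        template_df['Alternative_Category'],
        processed_template['Category']
    )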
    
    # Step 7: Sort the final template
    processed_template = processed_template.sort_values(
        by=['Category', 'Priority_Score', 'Team_Name'],
        ascending=[True, False, True]
    )
    
    # Step 8: Add reference IDs for system integration
    processed_template['System_Reference_ID'] = processed_template['Request_ID'].apply(
        lambda x: generate_reference_id(x)  # Placeholder for actual reference ID generation
    )
    
    return processed_template

def generate_reference_id(request_id):
    """
    Placeholder function for generating system reference IDs
    Replace with actual implementation based on your system
    """
    return f"REF_{request_id}"

requests_ready = create_assignment_template(requests_sel)

# Capacity Prep

After understanding the needs of the business for the day, I needed to see whether any of our current resources had availability. I used the following code to find the resources that were available and had capacity to be placed on the needs above.
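
To make the idea of capacity concrete, the sketch below computes remaining hours per abstractor from an invented table and splits the result into one DataFrame per category, similar in spirit to the dictionary returned by `capacity_portion.category_capacity`. The real module's inputs and logic are not shown in this notebook, so the column names and the `remaining_capacity_by_category` helper are hypothetical.

```python
import pandas as pd

def remaining_capacity_by_category(capacity: pd.DataFrame, categories):
    """Return {category: DataFrame of abstractors with hours still free}."""
    out = capacity.copy()
    # Hypothetical columns: total weekly capacity and hours already assigned.
    out["remaining_hours"] = out["weekly_capacity"] - out["assigned_hours"]
    out = out[out["remaining_hours"] > 0]
    return {
        cat: out[out["category"] == cat].reset_index(drop=True)
        for cat in categories
    }

capacity = pd.DataFrame({
    "name": ["A", "B", "C"],
    "category": ["XX", "XX", "YY"],
    "weekly_capacity": [40, 40, 20],
    "assigned_hours": [40, 25, 5],
})
capacity_by_category = remaining_capacity_by_category(capacity, ["XX", "YY"])
```

Keeping one frame per category makes it straightforward to write each category to its own worksheet later.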

In [None]:
# create the capbase file that determines the capacity of each abstractor in each category and saves it to a `capbase.xlsx` file
capbase = create_capbase.create_capbase_file(queries)
# for all the categories this creates a dictionary that holds all of the categories that are needed in order to be iterated over later
# the writer=True parameter is used to determine whether it will be written to an excel file or not. The function is reused 
# in other notebooks to explore capacity in certain categories without running all categories, and returning it directly to the 
# notebook without writing it to an excel file
result = capacity_portion.category_capacity(categories, capbase, writer=True)

# Excel File Creation

Next I created an Excel file with the request template from above on one sheet, plus a separate tab for each category whose skills were needed for the day's requests, listing the resources in that category.

In [None]:
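# `today` labels the output file; the format is assumed to match the one used
# to look up earlier files in the Diff Check section
today = datetime.today().strftime('%B %d %Y').replace(" 0", " ")
writer = pd.ExcelWriter(f"Daily Tools/Assignment_Planning_{today}.xlsx", engine='xlsxwriter')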
requests_ready.to_excel(writer, sheet_name = 'Summary', index=False)
for cat in sorted(categories):
    result[cat].to_excel(writer, sheet_name = cat, index = False)
writer.close()
# this function adjusts the column widths to fit the data so that it shows up nicely in excel
# rather than the headers being all squashed.
create_capbase.adjust_workbook_column_widths(f"Daily Tools/Assignment_Planning_{today}.xlsx")

# Diff Check

This section gives an easy way to check what changed in needs and capacity between the most recent earlier run and today.
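
The comparison comes down to a left join of today's rows onto the earlier rows, treating anything missing from the earlier file as zero hours. A minimal sketch with invented data follows; the column names mirror the Summary sheet used in the cell below.

```python
import pandas as pd

today_df = pd.DataFrame({
    "Request Name": ["R1", "R2", "R3"],
    "Requested Hours": [10, 20, 5],
})
earlier_df = pd.DataFrame({
    "Request Name": ["R1", "R2"],
    "Requested Hours": [10, 15],
})

# Left join keeps every request from today, including ones that are new.
joined = today_df.merge(
    earlier_df.rename(columns={"Requested Hours": "Requested Hours Yesterday"}),
    on="Request Name",
    how="left",
)
# A request absent from the earlier file counts as 0 hours there.
joined["Requested Hours Yesterday"] = joined["Requested Hours Yesterday"].fillna(0)
joined["Requested Hours Difference"] = (
    joined["Requested Hours"] - joined["Requested Hours Yesterday"]
)
increases = joined[joined["Requested Hours Difference"] > 0]
```

Here `R2` (hours raised from 15 to 20) and `R3` (new today) are reported, while the unchanged `R1` is not.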

In [None]:
# Find the most recent earlier file, even if it isn't from yesterday.
# This covers a previous run on Friday or a few days out of the office.
yesterday_file = None
for day in range(1, 15):
    candidate_date = (datetime.today() - timedelta(days=day)).strftime('%B %d %Y').replace(" 0", " ")
    candidate_file = f"Daily Tools/Assignment_Planning_{candidate_date}.xlsx"
    if os.path.exists(candidate_file):
        yesterday_file = candidate_file
        break
if yesterday_file is None:
    # Stop here: the comparisons below cannot run without an earlier file
    raise FileNotFoundError("no file from the last 14 days")


# Define file paths
today_file = f"Daily Tools/Assignment_Planning_{today}.xlsx"


def compare_summary_sheets(today_file, yesterday_file):
    # Read the Summary sheets
    today_summary = pd.read_excel(today_file, sheet_name='Summary')
    yesterday_summary = pd.read_excel(yesterday_file, sheet_name='Summary')
    yesterday_summary_fil = yesterday_summary[yesterday_summary['Request Status Update'].isnull()].copy()
    # select just the 'Request Name' and 'Requested Hours' columns, renaming the
    # hours so yesterday's values get their own column after the merge
    yesterday_summary_sel = yesterday_summary_fil[['Request Name', 'Requested Hours']].rename(
        columns={'Requested Hours': 'Requested Hours Yesterday'}).copy()

    today_summary_sel = today_summary[['SPECIFIC COLUMNS']].copy()
    summaries_joined = today_summary_sel.merge(yesterday_summary_sel, on='Request Name', how='left')
    summaries_joined.loc[:, 'Requested Hours Yesterday'] = summaries_joined['Requested Hours Yesterday'].fillna(0)
    summaries_joined.loc[:, 'Requested Hours Difference'] = summaries_joined['Requested Hours'] - summaries_joined['Requested Hours Yesterday']
    differences = summaries_joined[summaries_joined['Requested Hours Difference'] > 0]
    return differences

def compare_other_sheets(today_file, yesterday_file):
    # Get all sheet names from today's file
    xlsx = pd.ExcelFile(today_file)
    yesterday_xlsx = pd.ExcelFile(yesterday_file)
    sheet_names = [sheet for sheet in xlsx.sheet_names if sheet != 'Summary']
    
    all_differences = {}

    
    for sheet in sheet_names:
        # Read sheets from both files
        dtypes = {
            'Several_column_names': their_corresponding_dtypes,
        }
        today_sheet = pd.read_excel(today_file, sheet_name=f'{sheet}', dtype=dtypes)
        if sheet not in yesterday_xlsx.sheet_names:
            print(f"{sheet} not in yesterday's file")
            # today_sheet was already read above, so it is reused here
            today_filtered = today_sheet[today_sheet['Capticket Hours'].notnull()]
            today_filtered_sel = today_filtered[['CBIZ_Name', 'Capticket Hours']].rename(columns={'Capticket Hours': 'Hours'}).copy()
            if not today_filtered_sel.empty:
                all_differences[sheet] = today_filtered_sel
        else:
            yesterday_sheet = pd.read_excel(yesterday_file, sheet_name=f'{sheet}', dtype=dtypes)
            
            # Filter rows with non-null 'Capticket Hours'
            today_filtered = today_sheet[today_sheet['Capticket Hours'].notnull()]
            yesterday_filtered = yesterday_sheet[yesterday_sheet['Capticket Hours'].notnull()]
            # select just the 'CBIZ_Name' and 'Capticket Hours' columns
            today_filtered_sel = today_filtered[['CBIZ_Name', 'Capticket Hours']].rename(columns={'Capticket Hours': 'Hours'}).copy()
            yesterday_filtered_sel = yesterday_filtered[['CBIZ_Name', 'Capticket Hours']].rename(columns={'Capticket Hours': 'Hours'}).copy()
            
            # Left join on 'CBIZ_Name': keep every row from today's sheet
            merged = today_filtered_sel.merge(yesterday_filtered_sel, on='CBIZ_Name', how='left', suffixes=('_today', '_yesterday'))

            # fill na with 0 in '_yesterday'
            merged.loc[:, 'Hours_yesterday'] = merged['Hours_yesterday'].fillna(0)
            
            # Keep rows whose hours today are at least yesterday's;
            # note that >= also keeps rows whose hours are unchanged
            differences = merged[merged['Hours_today'] >= merged['Hours_yesterday']]
            
            if not differences.empty:
                all_differences[sheet] = differences
    xlsx.close()
    yesterday_xlsx.close()
    return all_differences
summary_differences = compare_summary_sheets(today_file, yesterday_file)
other_sheet_differences = compare_other_sheets(today_file, yesterday_file)

print("Differences in Summary sheet:")
summary_differences

In [None]:
print("\nDifferences in other sheets:")
for sheet, diff in other_sheet_differences.items():
    print(f"\nDifferences in {sheet}:")
    try:
        print(diff[['CBIZ_Name', 'Hours_today', 'Hours_yesterday']])
    except KeyError:
        # Sheets that are new today only have a single 'Hours' column
        print(diff[['CBIZ_Name', 'Hours']])

# Assignment Creation table for push to SF from XL-Connector
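
Before anything is pushed, the cells in this section check each proposed assignment against existing ones. As a simplified, offline sketch of that check, the example below flags proposed resource and team pairs that already exist, using a merge with `indicator=True`. The DataFrames are invented; the real check queries Salesforce through `SfQueries` one assignment at a time.

```python
import pandas as pd

proposed = pd.DataFrame({
    "Resource_Name": ["A", "B"],
    "Team_Name": ["T1", "T2"],
})
existing = pd.DataFrame({
    "Resource_Name": ["A", "C"],
    "Team_Name": ["T1", "T2"],
})

# indicator=True adds a '_merge' column: 'both' means the pair already exists.
checked = proposed.merge(
    existing.drop_duplicates(),
    on=["Resource_Name", "Team_Name"],
    how="left",
    indicator=True,
)
checked["already_assigned"] = checked["_merge"] == "both"
update_needed = bool(checked["already_assigned"].any())
```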

In [None]:
assignment_decision_summary = pd.read_excel(f"Daily Tools/Assignment_Planning_{today}.xlsx", sheet_name = 'Summary')
# if the entire column of 'Suggested Resource' is null this means that no decisions have been made yet 
# so throw an error message saying to input the Suggested resources into the 'Suggested Resource' column
decisions = assignment_decision_summary[~assignment_decision_summary['Suggested Resource'].isnull()].copy()
if len(decisions) == 0:
    raise ValueError('Please input the Suggested resources into the "Suggested Resource" column')
# check that the number of decisions is what I expected based on the plan I made today
len(decisions)

In [None]:
def validate_existing_assignments(proposed_assignments: pd.DataFrame, 
                                data_interface = queries) -> bool:
    """
    Validates proposed assignments against existing assignments to identify conflicts
    and necessary updates.
    
    Args:
        proposed_assignments (pd.DataFrame): DataFrame containing proposed assignment data
            Required columns:
            - 'Resource_Name': Name of resource to be assigned
            - 'Request_ID': Unique identifier for the request
            - 'Team_Name': Name of team for assignment
        data_interface: Interface for querying assignment data
    
    Returns:
        bool: True if updates to existing assignments are needed, False otherwise
    """
    # Counter for assignments that need updating
    assignments_requiring_updates = 0
    
    # Iterate through each proposed assignment
    for _, assignment in proposed_assignments.iterrows():
        # Step 1: Get resource identifier
        try:
            resource_id = data_interface.get_resource_id(assignment['Resource_Name'])
        except Exception as e:
            print(f"Error fetching resource ID: {e}")
            continue
            
        # Step 2: Determine team identifier
        team_id = None
        request_id = assignment['Request_ID']
        
        # First try to get team ID from request if it's a valid request ID
        if isinstance(request_id, str) and request_id.startswith(('PREFIX1', 'PREFIX2')):
            try:
                team_info = data_interface.get_team_info(request_id)
                team_id = team_info.get('team_id')
            except Exception as e:
                print(f"Could not retrieve team info for request: {request_id}")
        
        # Fallback: Get team ID directly from team name
        if team_id is None:
            try:
                team_id = data_interface.get_team_id(assignment['Team_Name'])
            except Exception as e:
                print(f"Could not retrieve team ID for: {assignment['Team_Name']}")
                continue
                
        # Step 3: Check existing assignments
        if team_id:
            try:
                # Get current assignments for the resource
                current_assignments = data_interface.get_resource_assignments(
                    resource_id=resource_id
                )
                
                # Check if resource is already assigned to this team
                if assignment['Team_Name'] in current_assignments['Team_Name'].values:
                    assignments_requiring_updates += 1
                    print(f"Existing assignment found for {assignment['Resource_Name']} "
                          f"on {assignment['Team_Name']}")
                    
                    # Get and display other team members for reference
                    team_assignments = data_interface.get_resource_assignments(team_id=team_id)
                    active_team_members = team_assignments[
                        team_assignments['Position'] == 'STANDARD_POSITION'
                    ]['Resource_Name']
                    print("Current team assignments:")
                    print(active_team_members)
                    
            except Exception as e:
                print(f"Error checking assignments: {e}")
                continue
    
    # Return whether any updates are needed
    return assignments_requiring_updates > 0

def check_assignment_updates(decisions_df: pd.DataFrame) -> bool:
    """
    Wrapper function to check if assignments need updating
    
    Args:
        decisions_df: DataFrame containing assignment decisions
        
    Returns:
        bool: True if updates are needed, False otherwise
    """
    return validate_existing_assignments(decisions_df)

assignment_update_needed = check_assignment_updates(decisions)

# Excel Creation

After checking if an assignment exists I would reformat the dataframe to fit the format that I needed to push to Salesforce. Then I would get this data and put it along with another dataframe that contained the capacity ticket information and save them both as separate sheets in a new Excel Workbook.

In [None]:
# code removed for privacy

I then updated the requests in Salesforce from the decisions DataFrame.
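
The update loop below decides, per request, which hours and notes to send to Salesforce. The sketch here isolates that decision in a pure function so it can be checked without a Salesforce connection. `build_request_update` and its arguments are hypothetical, and the status value is made up; only the field names `SOW_Hours__c`, `Status__c`, and `Notes__c` come from the cells below. Like the loop, it only touches hours and notes when the hours-update flag is set.

```python
import pandas as pd

def build_request_update(current_hours, current_notes, status,
                         update_hours, new_hours, new_note):
    """Return the field values to send for one request (hypothetical helper)."""
    hours = current_hours
    # Treat missing notes as empty so the text "None" is never written back.
    notes = "" if pd.isna(current_notes) else str(current_notes)
    if update_hours:
        hours = new_hours
        # Append the new note only if one was given and it differs from the old notes.
        if not pd.isna(new_note) and str(new_note) != notes:
            notes = notes + str(new_note)
    return {"SOW_Hours__c": hours, "Status__c": status, "Notes__c": notes}

payload = build_request_update(
    current_hours=10, current_notes=None, status="Filled",
    update_hours=True, new_hours=12, new_note="Extended by 2 hours.",
)
```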

In [None]:
decisions_no_na_requests = assignment_decision_summary.loc[assignment_decision_summary['Request Status Update'].notna()]
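# '^(?:PREFIX)' keeps only statuses that start with the prefix; the earlier pattern's
# trailing * matched every string, and a non-capturing group avoids the pandas match-group warning
decisions_requests = decisions_no_na_requests.loc[decisions_no_na_requests['Request Status Update'].str.contains('^(?:PREFIX)'), 
                                                    ['SPECIFIC COLUMNS']]
# remove duplicate rows
decisions_requests = decisions_requests.drop_duplicates()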
decisions_requests

In [None]:
for _, row in decisions_requests.iterrows():
    request_info = queries.get_sr_from_request_name(row['Request Name'])
    sow_hours = request_info['SOW_Hours__c'].values[0]
    # Treat missing notes as empty so the text "None" is never written back or concatenated
    existing_notes = request_info['Notes__c'].values[0]
    notes = "" if pd.isna(existing_notes) else str(existing_notes)
    # The flag may come back from Excel as a boolean or as the text 'TRUE'; accept both
    if str(row['Request Hours Update?']).upper() == 'TRUE':
        sow_hours = row['Update Request Hours To']
        if pd.isna(row['Notes']):
            print(f"Skipping Notes Update for {row['Request Name']} on {row['Team Name']}")
        elif notes != row['Notes']:
            # Append the new note to the existing notes
            notes = notes + str(row['Notes'])
    queries.sf.Staffing_Request__c.update(request_info.Id.values[0], {'SOW_Hours__c': sow_hours,
                                                    'Status__c': row['Request Status Update'],
                                                    'Notes__c': notes})
    print(f"Updated {row['Request Name']} on {row['Team Name']}")

# History of Assignments
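
The history workbook is maintained by appending the day's decisions to the existing log and dropping exact duplicate rows. A small sketch with invented rows follows. Dates are plain `date` objects on both sides here; values read back from Excel arrive as timestamps and may need normalizing before duplicates compare equal.

```python
import datetime as dt
import pandas as pd

history = pd.DataFrame({
    "Request Name": ["R1"],
    "Suggested Resource": ["A"],
    "Date Created": [dt.date(2024, 1, 2)],
})
decisions_today = pd.DataFrame({
    "Request Name": ["R1", "R2"],
    "Suggested Resource": ["A", "B"],
})
decisions_today["Date Created"] = dt.date(2024, 1, 2)

# R1 is already logged for this date, so only R2 is added.
history = (
    pd.concat([history, decisions_today], ignore_index=True)
    .drop_duplicates()
    .reset_index(drop=True)
)
```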

In [None]:
assignment_history_df = pd.read_excel("assignment_creation_history.xlsx", sheet_name='Automated_Assignment_Creation')
old_assignment_history = pd.read_excel("assignment_creation_history.xlsx", sheet_name='Assignment Creation (3)')
decisions.loc[:, 'Date Created'] = datetime.now().date()
assignment_history_df = pd.concat([assignment_history_df, decisions]).drop_duplicates()
writer = pd.ExcelWriter("assignment_creation_history.xlsx", engine='xlsxwriter')
assignment_history_df.to_excel(writer, sheet_name = 'Automated_Assignment_Creation', index=False)
old_assignment_history.to_excel(writer, sheet_name = 'Assignment Creation (3)', index=False)
writer.close()
create_capbase.adjust_workbook_column_widths("assignment_creation_history.xlsx")