# Job Search Sankey Diagrams
## Background & Purpose

In October 2024, I began applying to jobs intensely after learning about an impending layoff. Like many job seekers, I found myself sending out dozens of applications, often with little visibility into how far I was getting in the process. By early 2025, I had resumed applying with a renewed focus and a more structured approach to tracking progress.

To bring clarity to this overwhelming experience, I began logging each application I submitted, along with the specific stages it went through — from initial submission to final outcome. I wanted to know:
- How many applications progressed beyond the first step?
- At what stage were most rejections happening?
- How many companies made it to final interviews or offers?

This notebook uses Sankey diagrams to visualize the flow of applications across various stages in the interview process. Each node represents a distinct stage (e.g., "Applied", "HR Call", "Technical Interview", "Offer"), and the links represent transitions between them. The thickness of each link corresponds to the number of applications that followed that particular path.
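
As a rough illustration of that node-and-link model, the sketch below counts the transitions for three made-up applications. The `paths` list and its contents are hypothetical and are not the data used in this notebook.

```python
from collections import Counter

# Hypothetical example: each inner list is the ordered stages of one application.
paths = [
    ["Applied", "HR Call", "Technical Interview", "Offer"],
    ["Applied", "HR Call", "Rejected"],
    ["Applied", "No Response"],
]

# Each consecutive pair of stages is one link; its count becomes the link thickness.
link_counts = Counter(
    (source, target)
    for path in paths
    for source, target in zip(path, path[1:])
)

for (source, target), count in sorted(link_counts.items()):
    print(f"{source} -> {target}: {count}")
```

Here the "Applied" to "HR Call" link would be drawn twice as thick as the others, because two of the three applications followed it.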

By visualizing this data:
- I can see common patterns and dead ends.
- I get insight into which stages were the most common points of rejection.
- I gain a clear picture of where effort translated into progress — and where it didn’t.

This project is both personal and analytical: a way to make sense of a challenging time and derive insights that could help improve future job searches.



## Setup
### Import packages

In [1]:
import pandas as pd
import plotly.graph_objects as go
from dotenv import load_dotenv
import os

load_dotenv()

True

## Data Preparation
### User inputs
Reads in the user inputs from the `.env` file for the input file name, source column, target column, and visualization title.

In [2]:
# load in env vars
input_file_name = os.getenv("INPUT_FILE")
source_column = os.getenv("SOURCE_COLUMN")
target_column = os.getenv("TARGET_COLUMN")
visualization_title = os.getenv("VIS_TITLE")

print(f"Input file: {input_file_name}")
print(f"Source column: {source_column}")
print(f"Target column: {target_column}")
print(f"Visualization title: {visualization_title}")

Input file: 2025-Sankey-Table.csv
Source column: Source
Target column: Target
Visualization title: Job Search Sankey Diagram 2025
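
`os.getenv` returns `None` when a variable is not set, so a missing entry in `.env` would only surface later, for example as a path like `input_data/None`. A minimal sketch of a fail-fast check is shown below. The helper name `load_config` is hypothetical and not part of this notebook; the four variable names are the ones read above.

```python
import os

# The four variable names match the ones read in the cell above.
REQUIRED_VARS = ("INPUT_FILE", "SOURCE_COLUMN", "TARGET_COLUMN", "VIS_TITLE")


def load_config(env=None):
    """Return the required settings as a dict, or raise if any are missing.

    Hypothetical helper: `env` defaults to the process environment, but any
    dict-like object can be passed in, which makes the check easy to try out.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError(f"Missing settings in .env: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `load_config()` after `load_dotenv()` would then either return all four settings or stop with a message naming the missing ones.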


### Load source data
This step loads the data tracking my progress through each interview stage for the jobs I applied to, as well as those that recruiters reached out to me about.

In [3]:
# load the CSV and preview the first five rows
df = pd.read_csv(f"input_data/{input_file_name}")
df.head()

| | Company | Source | Target |
|---|---|---|---|
| 0 | Apptegy | Recruiter Inquiry | HR/ Hiring Manager Interview |
| 1 | Apptegy | HR/ Hiring Manager Interview | Technical Interview |
| 2 | Apptegy | Technical Interview | Panel Interview |
| 3 | Apptegy | Panel Interview | Offer |
| 4 | Apptegy | Offer | Accepted Offer |


### Groups the data by "Source" and "Target" columns

In [4]:
# group by transition and count how many times it occurred across all companies
flow_counts = df.groupby([source_column, target_column]).size().reset_index(name="Count")
flow_counts

| | Source | Target | Count |
|---|---|---|---|
| 0 | Applied | Auto-Rejected | 25 |
| 1 | Applied | HR/ Hiring Manager Interview | 3 |
| 2 | Applied | No Response | 28 |
| 3 | HR/ Hiring Manager Interview | Rejected | 3 |
| 4 | HR/ Hiring Manager Interview | Technical Interview | 4 |
| 5 | HR/ Hiring Manager Interview | Withdrew | 1 |
| 6 | Hiring Manager Inquiry | HR/ Hiring Manager Interview | 1 |
| 7 | Offer | Accepted Offer | 1 |
| 8 | Offer | Declined Offer | 1 |
| 9 | Panel Interview | Offer | 2 |
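
One of the questions from the introduction was how many applications progressed beyond the first step. The sketch below shows one way to read that off the grouped counts, using only the three "Applied" rows copied from the output above. The `Share` column is an addition for illustration and is not part of the notebook's pipeline.

```python
import pandas as pd

# The three "Applied" rows copied from the grouped output above.
applied = pd.DataFrame(
    {
        "Source": ["Applied", "Applied", "Applied"],
        "Target": ["Auto-Rejected", "HR/ Hiring Manager Interview", "No Response"],
        "Count": [25, 3, 28],
    }
)

# Share of each outcome among all transitions that leave the same source stage.
stage_totals = applied.groupby("Source")["Count"].transform("sum")
applied["Share"] = applied["Count"] / stage_totals

print(applied[["Target", "Count", "Share"]].round(3))
```

With these counts, 3 of the 56 direct applications (about 5%) reached an interview. The same two lines should work on the full `flow_counts` table, since the total is computed per source stage.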


### Extracts the unique labels

In [5]:
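# get the unique stages to use as nodes

# .values gives a 2D array of [source, target] pairs, e.g.
#   array([["Panel Interview", "Offer"],
#          ["Panel Interview", "Rejected"],
#          ["Recruiter Inquiry", "Withdrew"]])
# ravel() flattens it to 1D so pd.unique() can drop the duplicate stage names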

labels = pd.unique(flow_counts[[source_column, target_column]].values.ravel())
labels

array(['Applied', 'Auto-Rejected', 'HR/ Hiring Manager Interview',
       'No Response', 'Rejected', 'Technical Interview', 'Withdrew',
       'Hiring Manager Inquiry', 'Offer', 'Accepted Offer',
       'Declined Offer', 'Panel Interview', 'Recruiter Inquiry'],
      dtype=object)
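
The order of the labels above follows from how `ravel()` and `pd.unique()` behave. A small sketch with a hypothetical two-row array (not the notebook's data) makes the two steps visible:

```python
import numpy as np
import pandas as pd

# Hypothetical two-row array of (source, target) pairs.
pairs = np.array(
    [["Applied", "Rejected"],
     ["Applied", "Offer"]],
    dtype=object,
)

# ravel() reads the array row by row: Applied, Rejected, Applied, Offer
flat = pairs.ravel()

# pd.unique() drops duplicates and keeps the order of first appearance
unique_labels = pd.unique(flat)
print(list(unique_labels))  # ['Applied', 'Rejected', 'Offer']
```

Because `pd.unique()` does not sort (unlike `np.unique()`), "Applied" comes first in the real output simply because it appears in the first row of `flow_counts`.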

### Map labels to indices
- creates a `label_to_index` dictionary that gives each label a unique integer index
- maps the "Source" and "Target" stages to their respective indices using that dictionary

In [6]:
# map each stage label to a unique integer index
label_to_index = {label: idx for idx, label in enumerate(labels)}
label_to_index


{'Applied': 0,
 'Auto-Rejected': 1,
 'HR/ Hiring Manager Interview': 2,
 'No Response': 3,
 'Rejected': 4,
 'Technical Interview': 5,
 'Withdrew': 6,
 'Hiring Manager Inquiry': 7,
 'Offer': 8,
 'Accepted Offer': 9,
 'Declined Offer': 10,
 'Panel Interview': 11,
 'Recruiter Inquiry': 12}

In [7]:
# map each 'Source' label to its integer index
source_id_column = f"{source_column}ID"
flow_counts[source_id_column] = flow_counts[source_column].map(label_to_index)
flow_counts[source_id_column]

0      0
1      0
2      0
3      2
4      2
5      2
6      7
7      8
8      8
9     11
10    11
11    11
12    12
13     5
Name: SourceID, dtype: int64

In [8]:
# map each 'Target' label to its integer index
target_id_column = f"{target_column}ID"
flow_counts[target_id_column] = flow_counts[target_column].map(label_to_index)
flow_counts[target_id_column]

0      1
1      2
2      3
3      4
4      5
5      6
6      2
7      9
8     10
9      8
10     4
11     6
12     2
13    11
Name: TargetID, dtype: int64
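
In this notebook the labels are derived from the same two columns that are being mapped, so every stage should find an index. If the label list were ever maintained by hand, a stage missing from it would silently become `NaN`. The sketch below shows one way to catch that, using hypothetical demo data in which "Ghosted" is deliberately left out of the labels.

```python
import pandas as pd

# Hypothetical labels and links; "Ghosted" is deliberately missing from the labels.
demo_labels = ["Applied", "Rejected", "Offer"]
demo_index = {label: idx for idx, label in enumerate(demo_labels)}
demo_links = pd.DataFrame(
    {"Source": ["Applied", "Applied"], "Target": ["Offer", "Ghosted"]}
)

# Series.map() returns NaN for values that are not keys of the dict.
target_ids = demo_links["Target"].map(demo_index)

# Collect the stage names that could not be mapped.
unmapped = demo_links.loc[target_ids.isna(), "Target"].tolist()
print(unmapped)  # ['Ghosted']
```

An empty `unmapped` list would indicate that every link can be drawn between known nodes.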

### Define colors for the nodes

In [9]:
# create a color map for the nodes
# more colors can be added if there are more unique stages
node_colors = ['lightblue', 
               'red', 
               'lightcoral', 
               'lightskyblue', 
               'lightpink', 
               'yellow', 
               'green', 
               'lightsteelblue', 
               'blue', 
               'purple', 
               'brown', 
               'orange', 
               'orchid', 
               'mediumseagreen'
]
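
As an alternative to extending the list by hand, the color list can be derived from the number of stages. This is a sketch under assumptions: the short `palette` and `stages` lists are hypothetical stand-ins for `node_colors` and `labels`, and repeating colors is only acceptable if distinct colors per stage are not required.

```python
from itertools import cycle, islice

# Hypothetical short palette and stage list.
palette = ["lightblue", "red", "green"]
stages = ["Applied", "Rejected", "Offer", "Withdrew", "No Response"]

# Repeat the palette as often as needed so every stage gets exactly one color.
stage_colors = list(islice(cycle(palette), len(stages)))
print(stage_colors)  # ['lightblue', 'red', 'green', 'lightblue', 'red']
```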

## Visualize the data
### Build the Sankey diagram figure

In [10]:
# build Sankey diagram with colors
fig = go.Figure(data=[go.Sankey(
    node=dict(
        pad=20,
        thickness=20,
        line=dict(color="black", width=0.5),
        label=labels,
        color=node_colors  # this applies color to each node
    ),
    link=dict(
        source=flow_counts[source_id_column],
        target=flow_counts[target_id_column],
        value=flow_counts["Count"],
    )
)])

### Show and save the figure

In [11]:
# set the layout configurations
fig.update_layout(
    title_text=visualization_title,
    font_size=12,
    width=1000,    # figure width in pixels (increase this to make the graph wider)
    height=800     # figure height in pixels (increase this to make the graph taller)
)
fig.show()

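# take the vis title and format it to use it for the output file name
output_file = visualization_title.strip().lower().replace(" ", "_")

# make sure the output directory exists before writing to it
os.makedirs("output_data", exist_ok=True)

# save the figure to a file in the output_data directory
# (static image export needs an image engine such as kaleido to be installed)
fig.write_image(f"output_data/{output_file}.png", width=1000, height=800)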
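
The file name above is built by replacing spaces only, so a title containing characters such as `/` or `:` would carry them into the path. A stricter variant is sketched below. The helper name `title_to_filename` is hypothetical, and it swaps the simple `replace` for a regular expression.

```python
import re


def title_to_filename(title):
    """Lowercase the title and replace each run of non-alphanumerics with "_"."""
    slug = re.sub(r"[^a-z0-9]+", "_", title.strip().lower())
    # Drop underscores left over at either end, e.g. from trailing punctuation.
    return slug.strip("_")


print(title_to_filename("Job Search Sankey Diagram 2025"))
# job_search_sankey_diagram_2025
```

For the title used in this notebook, the result is the same as with the original one-liner.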

## Conclusion & Reflections

This Sankey diagram provides a clear, visual overview of how job applications progress through different interview stages. By mapping transitions between stages such as “Applied,” “HR Interview,” and “Technical Interview,” users can better understand where applications tend to stall, advance, or succeed.

Key takeaways from this analysis might include:
- Identifying common drop-off points in the interview process.
- Highlighting stages where most rejections occur.
- Recognizing which entry points (direct applications, recruiter inquiries, or hiring manager inquiries) tend to move further through the pipeline.

Visualizing the job search in this way can help make sense of a complex and often stressful process, and may inform future application strategies or areas for preparation and focus.