# Job Search Sankey Diagrams
## Background & Purpose

In October 2024, I began applying to jobs intensively after learning about an impending layoff. Like many job seekers, I found myself sending out dozens of applications, often with little visibility into how far I was getting in the process. By early 2025, I had resumed applying with renewed focus and a more structured approach to tracking progress.

To bring clarity to this overwhelming experience, I began logging each application I submitted, along with the specific stages it went through — from initial submission to final outcome. I wanted to know:
- How many applications progressed beyond the first step?
- At what stage were most rejections happening?
- How many companies made it to final interviews or offers?

This notebook uses Sankey diagrams to visualize the flow of applications across various stages in the interview process. Each node represents a distinct stage (e.g., "Applied", "HR Call", "Technical Interview", "Offer"), and the links represent transitions between them. The thickness of each link corresponds to the number of applications that followed that particular path.
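
As a rough illustration of that node-and-link structure, the sketch below uses toy numbers (not values from my application log) and three of the example stage names. The helper `describe_links` is hypothetical and only prints what each link means; it is not part of the notebook's pipeline.

```python
# Toy data, not taken from the real log: three stages and two transitions.
labels = ["Applied", "HR Call", "Offer"]

# Links are stored as parallel lists: source[i] -> target[i] carries value[i].
# The integers are positions in `labels`.
source = [0, 1]
target = [1, 2]
value = [10, 2]


def describe_links(labels, source, target, value):
    """Return one human-readable description per link (hypothetical helper)."""
    return [f"{labels[s]} -> {labels[t]}: {v}" for s, t, v in zip(source, target, value)]


for line in describe_links(labels, source, target, value):
    print(line)
```

The diagram is built from the same three ingredients: a list of labels, index pairs for each transition, and a count that sets the link thickness.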

By visualizing this data:
- I can see common patterns and dead ends.
- I get insight into which stages were the most common points of rejection.
- I gain a clear picture of where effort translated into progress — and where it didn’t.

This project is both personal and analytical: a way to make sense of a challenging time and derive insights that could help improve future job searches.



## Setup
### Import packages

In [30]:
import pandas as pd
import plotly.graph_objects as go

## Data Preparation
### User inputs
**This is the section to override defaults** for the input file name, source column, target column, and visualization title.

In [59]:
input_file_name = '2024-Sankey-Table.csv'
source_column = 'Source'
target_column = 'Target'
visualization_title = 'Job Search Sankey 2024'

### Load source data
This step loads the data tracking my progress through each interview stage for the jobs I applied to, as well as those that recruiters reached out to me about.

In [49]:
# load the CSV with Company, Source, Target
df = pd.read_csv(f"input_data/{input_file_name}")
df.head()

Unnamed: 0,Company,Source,Target
0,Accompany Health,Applied,HR/ Hiring Manager Interview
1,,HR/ Hiring Manager Interview,Technical Interview
2,,Technical Interview,Rejected
3,Age of Learning,Applied,HR/ Hiring Manager Interview
4,,HR/ Hiring Manager Interview,Technical Interview


### Group the data by the "Source" and "Target" columns

In [50]:
# group by transition and count how many times it occurred across all companies
flow_counts = df.groupby([source_column, target_column]).size().reset_index(name="Count")
flow_counts

Unnamed: 0,Source,Target,Count
0,Applied,Auto-Rejected,71
1,Applied,HR/ Hiring Manager Interview,22
2,Applied,No Response,100
3,Applied,Withdrew,5
4,HR/ Hiring Manager Interview,No Response,1
5,HR/ Hiring Manager Interview,Offer,1
6,HR/ Hiring Manager Interview,Rejected,6
7,HR/ Hiring Manager Interview,Technical Interview,10
8,HR/ Hiring Manager Interview,Withdrew,5
9,Offer,Declined Offer,1
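
One of the questions from the introduction, how many applications progressed beyond the first step, can be read off these counts. The sketch below retypes the four "Applied" rows from the output above into a small frame (an assumption made so the example is self-contained) and computes each outcome's share; the `Share` column and the variable names are illustrative.

```python
import pandas as pd

# The four "Applied" rows, copied by hand from the flow_counts output above.
applied = pd.DataFrame({
    "Source": ["Applied"] * 4,
    "Target": ["Auto-Rejected", "HR/ Hiring Manager Interview", "No Response", "Withdrew"],
    "Count": [71, 22, 100, 5],
})

# Share of each outcome among all transitions leaving the same source stage.
applied["Share"] = applied["Count"] / applied.groupby("Source")["Count"].transform("sum")

total_applied = int(applied["Count"].sum())
advanced = int(
    applied.loc[applied["Target"] == "HR/ Hiring Manager Interview", "Count"].sum()
)
print(f"{advanced} of {total_applied} applications ({advanced / total_applied:.1%}) reached an interview")
```

The same `groupby(...).transform("sum")` pattern should work on the full `flow_counts` frame, giving a per-stage breakdown rather than only the first step.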


### Extract the unique labels

In [None]:
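# get the unique stages to use as nodes
# .values returns a 2D array with one [source, target] pair per row, e.g.:
#   array([["Panel Interview", "Offer"],
#          ["Panel Interview", "Rejected"],
#          ["Recruiter Inquiry", "Withdrew"]])
# ravel() flattens it to 1D so pd.unique() can de-duplicate the labels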

labels = pd.unique(flow_counts[[source_column, target_column]].values.ravel())
labels

array(['Applied', 'Auto-Rejected', 'HR/ Hiring Manager Interview',
       'No Response', 'Withdrew', 'Offer', 'Rejected',
       'Technical Interview', 'Declined Offer', 'Offer Accepted',
       'Panel Interview', 'Recruiter Inquiry'], dtype=object)
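
The order of the labels above follows from how the flattening works. The sketch below uses a hypothetical two-row table (not the real data) to show each intermediate step.

```python
import pandas as pd

# Hypothetical two-row table, used only to show the order of operations.
pairs = pd.DataFrame({
    "Source": ["Applied", "HR Call"],
    "Target": ["HR Call", "Offer"],
})

two_d = pairs[["Source", "Target"]].values  # shape (2, 2): one row per transition
one_d = two_d.ravel()                       # row-major flatten: Source, Target, Source, Target
unique_labels = pd.unique(one_d)            # drops duplicates, keeps order of first appearance

print(list(one_d))
print(list(unique_labels))
```

Because `pd.unique` keeps first-appearance order rather than sorting, the resulting node order depends on the row order of `flow_counts`.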

### Map labels to indices
- creates a `dict`, `label_to_index`, that assigns each label a unique integer index
- maps the "Source" and "Target" stages to those indices using `label_to_index`

In [34]:
# map each stage label to an integer index
label_to_index = {label: idx for idx, label in enumerate(labels)}
label_to_index


{'Applied': 0,
 'Auto-Rejected': 1,
 'HR/ Hiring Manager Interview': 2,
 'No Response': 3,
 'Withdrew': 4,
 'Offer': 5,
 'Rejected': 6,
 'Technical Interview': 7,
 'Declined Offer': 8,
 'Offer Accepted': 9,
 'Panel Interview': 10,
 'Recruiter Inquiry': 11}

In [51]:
# map each 'Source' label to its integer index
source_id_column = f"{source_column}ID"
flow_counts[source_id_column] = flow_counts[source_column].map(label_to_index)
flow_counts[source_id_column]

0      0
1      0
2      0
3      0
4      2
5      2
6      2
7      2
8      2
9      5
10     5
11    10
12    10
13    10
14    11
15     7
16     7
17     7
Name: SourceID, dtype: int64

In [54]:
# map each 'Target' label to its integer index
target_id_column = f"{target_column}ID"
flow_counts[target_id_column] = flow_counts[target_column].map(label_to_index)
flow_counts[target_id_column]

0      1
1      2
2      3
3      4
4      3
5      5
6      6
7      7
8      4
9      8
10     9
11     5
12     6
13     4
14     2
15    10
16     6
17     4
Name: TargetID, dtype: int64
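
Since notebook cells can be run out of order, it may be worth checking that every stage actually received an index. The sketch below uses a hypothetical miniature of `flow_counts` in which one label ("Ghosted", invented for the example) is missing from the mapping. It relies on `Series.map` returning `NaN` for keys that are absent from the dictionary.

```python
import pandas as pd

# Hypothetical miniature of flow_counts; "Ghosted" is deliberately absent from the mapping.
flow = pd.DataFrame({
    "Source": ["Applied", "Applied"],
    "Target": ["Offer", "Ghosted"],
    "Count": [1, 3],
})
label_to_index = {"Applied": 0, "Offer": 1}

flow["SourceID"] = flow["Source"].map(label_to_index)
flow["TargetID"] = flow["Target"].map(label_to_index)

# Series.map yields NaN for labels missing from the dict, so NaN marks an unmapped stage.
unmapped = flow[flow[["SourceID", "TargetID"]].isna().any(axis=1)]
print(unmapped[["Source", "Target"]])
```

An empty `unmapped` frame would indicate that every transition has valid node indices.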

### Define colors for the nodes

In [55]:
# create a color list for the nodes; colors are matched to `labels` by position
node_colors = ['lightblue', 
               'red', 
               'lightcoral', 
               'lightskyblue', 
               'lightpink', 
               'yellow', 
               'green', 
               'lightsteelblue', 
               'blue', 
               'purple', 
               'brown', 
               'orange', 
               'orchid', 
               'mediumseagreen'
]
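
Because the colors are matched to the labels by position, a change in label order silently changes which stage gets which color. One possible alternative, sketched below, is to key colors by stage name. The `stage_colors` mapping, the `colors_for` helper, and the fallback color are all hypothetical and not the notebook's actual palette.

```python
# Hypothetical mapping; the color choices are illustrative only.
stage_colors = {
    "Applied": "lightblue",
    "Offer": "green",
    "Rejected": "red",
}
DEFAULT_COLOR = "lightgray"


def colors_for(labels, stage_colors, default=DEFAULT_COLOR):
    """Return one color per label, in label order, falling back to `default`."""
    return [stage_colors.get(label, default) for label in labels]


example_labels = ["Applied", "Rejected", "Withdrew", "Offer"]
print(colors_for(example_labels, stage_colors))
```

The returned list has the same length and order as the labels, so it could be passed as the node color in place of a hand-ordered list.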

## Visualize the data
### Build the Sankey diagram figure

In [56]:
# build Sankey diagram with colors
fig = go.Figure(data=[go.Sankey(
    node=dict(
        pad=20,
        thickness=20,
        line=dict(color="black", width=0.5),
        label=labels,
        color=node_colors  # this applies color to each node
    ),
    link=dict(
        source=flow_counts[source_id_column],
        target=flow_counts[target_id_column],
        value=flow_counts["Count"],
    )
)])

### Show the figure

In [60]:
fig.update_layout(
    title_text=visualization_title,
    font_size=12,
    autosize=True,
    width=1000,    # width of the figure in pixels (increase to make it wider)
    height=800     # height of the figure in pixels (increase to make it taller)
)

fig.show()

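# derive a file name from the title, e.g. "job_search_sankey_2024"
output_file = visualization_title.strip().lower().replace(" ", "_")

# static image export requires the kaleido package, and output_data/ must already exist
fig.write_image(f"output_data/{output_file}.png")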
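
The file name above is derived by replacing spaces only, so a title containing characters such as `/` or `:` could produce an invalid path. A slightly more defensive variant is sketched below. The helpers `slugify` and `output_path` are hypothetical names, and the directory default simply mirrors the `output_data` folder used above.

```python
import re
from pathlib import Path


def slugify(title):
    """Lowercase the title and collapse runs of non-alphanumeric characters into underscores."""
    return re.sub(r"[^a-z0-9]+", "_", title.strip().lower()).strip("_")


def output_path(title, directory="output_data", suffix=".png"):
    """Create `directory` if it does not exist and return the full image path."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir / f"{slugify(title)}{suffix}"


print(slugify("Job Search Sankey 2024"))
```

The path returned by `output_path(visualization_title)` could then be passed to `fig.write_image`, for example as `fig.write_image(str(output_path(visualization_title)))`.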

## Conclusion & Reflections

This Sankey diagram provides a clear, visual overview of how job applications progress through different interview stages. By mapping transitions between stages such as “Applied,” “HR Interview,” and “Technical Interview,” users can better understand where applications tend to stall, advance, or succeed.

Key takeaways from this analysis might include:
- Identifying common drop-off points in the interview process.
- Highlighting stages where most rejections occur.
- Recognizing which types of roles or companies tend to move candidates further through the pipeline.

Visualizing the job search in this way can help make sense of a complex and often stressful process, and may inform future application strategies or areas for preparation and focus.