<center><img src="../logo/hr-pdgraph-logo.png" alt="HR-PDGraph Logo" width="200"/></center>

## Background

**FDIC HRO** seeks to automate the analysis of existing Position Descriptions (PDs) within the organization. Such analysis involves identifying overlapping PDs through shared Knowledge, Skills, and Abilities (KSA) elements. Insights derived from this analysis can support strategic initiatives like organizational restructuring, role consolidation, or gap identification.

## Objective

Develop an AI/ML-driven system that produces **verifiable, testable, and measurable models** to efficiently cluster and analyze PDs, enabling actionable insights for HR and organizational decision-makers.

Key objectives include:

- **Preprocessing** PDs and resumes to generate a graph of entities and relationships.
- **Applying graph analytics** (e.g., Louvain, PageRank) to detect redundancy and support workforce optimization.
- **Training Graph Neural Networks (GNNs)** for advanced PD clustering and predictive job-role matching.
- **Integrating Large Language Models (LLMs)** to generate role-aligned PD templates that conform to patterns learned from the organizational graph structure.

## Data Sources

### O\*NET
O\*NET (Occupational Information Network) provides a structured taxonomy of occupations, skills, knowledge, abilities, and work activities. Each occupation is associated with detailed ratings on the importance and level of various KSAs, which are critical for workforce planning and role design.

For this POC, we limit the O\*NET dataset to the occupations below. Each entry maps a category label to an O\*NET job title and job code:
```python
# Category label -> [O*NET job title, O*NET-SOC code]
# `onet_job_map` is an illustrative variable name.
onet_job_map = {
    "Data Science": ["Data Scientists", "15-2051.00"],
    "Human Resources": ["Human Resources Specialists", "13-1071.00"],
    "Advocate": ["Advocate", "21-1093.00"],
    "Web Designer": ["Web Designer", "15-1254.00"],
    "Mechanical Engineer": ["Mechanical Engineer", "17-2141.00"],
    "Sales": ["Sales Managers", "11-2022.00"],
    "Health and fitness": ["Wellness Coach", "11-9179.01"],
    "Civil Engineer": ["Civil Engineer", "17-2051.00"],
    "Java Developer": ["Java Developer", "15-1251.00"],
    "Business Analyst": ["Business Analyst", "13-1111.00"],
    "SAP Developer": ["Software Analyst", "15-1211.00"],
    "Automation Testing": ["Automation Tester", "15-1253.00"],
    "Electrical Engineering": ["Electrical Engineers", "17-2071.00"],
    "Operations Manager": ["Operations Manager", "11-1021.00"],
    "Python Developer": ["Programmer", "15-1251.00"],
    "DevOps Engineer": ["DevOps Engineer", "15-1252.00"],
    "Network Security Engineer": ["Network Security Engineer", "15-1299.04"],
    "PMO": ["Personnel Officer", "13-1071.00"],
    "Database": ["Database Manager", "15-1242.00"],
    "Hadoop": ["Data Storage Specialist", "15-1242.00"],
    "ETL Developer": ["Electronic Data Interchange System Developer (EDI System Developer)", "15-1299.08"],
    "DotNet Developer": [".NET Developer", "15-1252.00"],
    "Blockchain": ["Blockchain Developer", "15-1299.07"],
    "Testing": ["Tester", "15-1299.04"],
}
```

### VOLCANO
VOLCANO provides PCA-based trait scores for occupations and individual KSAs. These include:
- **Comp.1 (Preparation):** Level of preparation and complexity
- **Comp.2 (STEM vs. Humanities):** The technical vs. human-oriented balance
- **Comp.3 (Math vs. Health):** Emphasis on quantitative vs. interpersonal/health-centered work

These components allow for **quantitative, interpretable role profiling**.

### Kaggle Resume Dataset
To model a POC, we use 105 resumes from the Kaggle resume dataset in place of PDs. Our assumption is that resumes contain KSA information similar to that in PDs and are likely more complex than PDs.

### FDIC PDs
Once the POC is moved to an approved environment (FHPCC), we will replace the resume dataset with PDs.

## Project Proposal

**HR-PDGraph** is a data-driven, explainable AI platform designed to assist HR leaders and managers in analyzing, clustering, and generating position descriptions (PDs) using graph-based modeling and machine learning.

By connecting PDs to standardized skills (KSAs), occupations, and validated PCA-based role traits from frameworks such as **O\*NET** and **VOLCANO**, HR-PDGraph enables:

- Traceable scoring  
- Intelligent role alignment  
- Transparent evaluation of roles  
- AI-assisted generation of PDs aligned with strategic workforce clusters  

This proposal builds upon the methodology of the following published research:

## Research Foundations

### Research 1  
**Title:** [*A Novel Approach for Job Matching and Skill Recommendation using Transformers and the O\*NET Database*](https://www.sciencedirect.com/science/article/pii/S2214579625000048)  
**Summary:** This research presents a transformer-based approach for matching resumes to job roles by extracting entities (e.g., skills and experience) from unstructured text and computing semantic similarity scores with KSAs linked to job titles in O\*NET. The approach introduces a normalized job scoring method to rank occupations based on entity overlap and cosine similarity.

### Research 2  
**Title:** [*Visualization of Latent Components Assessed in O\*NET Occupations (VOLCANO)*](https://link.springer.com/article/10.3758/s13428-022-02044-7)  
**Summary:** This paper introduces a principal component-based approach (PCA) to analyze occupational traits. By applying dimensionality reduction to O\*NET’s rich KSA space, the authors identify interpretable latent axes such as Preparation (Comp.1), STEM vs. Humanities (Comp.2), and Math vs. Health (Comp.3). These dimensions allow for more meaningful clustering and role comparisons across the labor market.

## Leveraging the Research in Practice

We modeled the first research paper’s architecture in **Neo4j**, representing resumes, extracted entities (noun phrases), KSAs, tech & tools, work activities and their associated job titles. Similarity scores between phrases and KSAs, along with job KSA importance levels, were used to dynamically compute a resume-to-job **fit score**.
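
As a rough illustration of the similarity step, the sketch below computes cosine similarity between a noun-phrase vector and a KSA vector. The three-dimensional vectors, the `is_similar` helper, and the `SIMILARITY_THRESHOLD` value are invented for this example; in practice the vectors would come from a transformer embedding model, and the cut-off actually used in the project is not stated here.

```python
import math

SIMILARITY_THRESHOLD = 0.8  # assumed cut-off for creating a SIMILAR_TO edge


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_u * norm_v)


def is_similar(u, v, threshold=SIMILARITY_THRESHOLD):
    """Decide whether a phrase/KSA pair is close enough to be linked."""
    return cosine_similarity(u, v) >= threshold


# Hypothetical embeddings: the KSA vector points in the same direction as the phrase.
phrase_vec = [1.0, 2.0, 0.0]
ksa_vec = [2.0, 4.0, 0.0]
unrelated_vec = [0.0, 0.0, 3.0]
```

Here `is_similar(phrase_vec, ksa_vec)` holds because the two vectors are parallel, while `unrelated_vec` is orthogonal to the phrase and would not be linked.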

We then **extended the graph** by linking each job and KSA node to the **latent PCA trait scores** from the VOLCANO study. This enriched model enables us to go beyond matching and into interpretation: we can now **cluster overlapping resumes and PDs**, and label those clusters with meaningful dimensions such as **STEM-ness, preparedness, and quantitative orientation**.

This combined approach supports **data-driven, explainable clustering** of PDs and **generative capabilities** that can help HR design roles aligned to organizational strategy.

## Environment

Initial POC development takes place in the Azure Sandbox (DIT). Upon approval by stakeholders, we will move the project to FHPCC.

## Conclusion

By combining methodologies from both research papers, **HR-PDGraph** delivers a scalable and intelligent framework for organizational role analysis, grounded in graph AI and explainable machine learning.


# 🔍 HR-PD Graph: Graph-Based Resume Community Detection (Neo4j + GDS)

Welcome to Graph Data Science. This notebook documents the entire process of building a semantic graph from resumes and traits, projecting it into GDS, and running community detection + profiling.
```cypher
CALL db.schema.visualization()
```

<p>
<center>
  <img src="../images/graph.png" width="300"/>
  <br>
  <b>Figure 1:</b> PD Graph Schema (baseline)
</center>
</p>


***

## 1️⃣ Graph Construction Overview and Examples

We construct the graph with the following entities and relationships (Figure 1):

- `Resume` → [:CONTAINS] → `NounPhrase`
- `NounPhrase` → [:SIMILAR_TO] → `Skill` | `Ability` | `Knowledge` | `Work_activities` | `Tech` | `Tools` | `Task_Ratings`
- Each KSA node ← [:ALIGNS_WITH] ← `Trait`
- Each KSA node → [:REQUIRED_FOR] → `JobTitle`
- `Occupation` → [:ALIGNED_WITH] → `JobTitle` (from VOLCANO)

This enables multi-hop reasoning from resume → cognitive traits, allowing semantic community detection. Figure 2 shows the results of a simple query on all resumes for "Java Developers". A graphical view such as this shows clustering while providing valuable insights. The four resumes in red share common and distinct noun phrases, ultimately linking to a Trait (Business, Technical, Administrative, etc.). Also, a highly connected resume (high degree) is likely to reflect more general knowledge, whereas a less connected one is likely more specialized. But are there ways to measure such traits?



<p>
<center>
  <img src="../images/example_resume_graph.png" width="400"/>
  <br>
  <b>Figure 2:</b> A snapshot of the connected HR-Graph for multi-hop reasoning
</center>
</p>
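
The multi-hop path described above (resume → noun phrase → KSA ← trait) can be sketched in plain Python. The dictionaries and node names below are invented stand-ins for the Neo4j data, not values taken from the project graph.

```python
# Toy adjacency maps mirroring the schema; every name here is hypothetical.
contains = {"resume_1": ["java", "spring boot"]}  # Resume -> NounPhrase
similar_to = {  # NounPhrase -> KSA entity
    "java": ["Object or component oriented development software"],
    "spring boot": ["Web platform development software"],
}
aligns_with = {  # KSA entity -> traits (stored as Trait -> KSA in the graph, inverted here)
    "Object or component oriented development software": ["Technical"],
    "Web platform development software": ["Technical", "Business"],
}


def traits_for_resume(resume_id):
    """Follow resume -> noun phrase -> KSA -> trait and collect the distinct traits."""
    traits = set()
    for phrase in contains.get(resume_id, []):
        for ksa in similar_to.get(phrase, []):
            traits.update(aligns_with.get(ksa, []))
    return sorted(traits)
```

Calling `traits_for_resume("resume_1")` walks three hops and returns the deduplicated traits reachable from that resume; an unknown resume ID yields an empty list.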

***

## 🔍 Resume Connectivity Analysis: Degree Centrality
```cypher
MATCH (r:Resume)-[:CONTAINS]->(:NounPhrase)-[:SIMILAR_TO]->(:Entity)
RETURN r.id as resume_id, r.name AS original_job, COUNT(*) AS degree
ORDER BY degree DESC
LIMIT 5

```

We analyzed the graph to find which resumes have the **highest degree**, defined here as the number of `NounPhrase → Entity` matches reached from that resume (the query's `COUNT(*)` counts every matched path). Higher degree implies broader conceptual coverage across the O*NET knowledge base, which is often associated with more **generalist** or **multi-disciplinary** roles.
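
To make the counting choice concrete, the sketch below contrasts two ways of computing a resume's degree on toy data: counting every matched path versus counting distinct entities. The paths and the two function names are invented for illustration.

```python
from collections import Counter

# (resume_id, noun_phrase, entity) paths; all values are made up.
paths = [
    ("r1", "python", "Object or component oriented development software"),
    ("r1", "sql", "Data base user interface and query software"),
    ("r1", "sql server", "Data base user interface and query software"),
    ("r2", "management", "Administration and Management"),
]


def path_degree(paths):
    """Count every resume -> phrase -> entity path (analogous to COUNT(*))."""
    return Counter(resume for resume, _, _ in paths)


def distinct_entity_degree(paths):
    """Count each entity once per resume (analogous to COUNT(DISTINCT entity))."""
    seen = {}
    for resume, _, entity in paths:
        seen.setdefault(resume, set()).add(entity)
    return {resume: len(entities) for resume, entities in seen.items()}
```

In this toy data `r1` reaches the same database entity through two different noun phrases, so its path count is 3 while its distinct-entity count is 2. Which definition is preferable depends on whether repeated mentions should count as extra breadth.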

### 📈 Top 5 Most Connected Resumes

| Resume ID                               | Original Job         | Degree |
|-----------------------------------------|-----------------------|--------|
| c15b4a67-bdaf-4b3a-bbe3-64c2915beaab     | DevOps Engineer       | 77     |
| adb17a98-0d61-42b7-864c-31d08b85447a     | Mechanical Engineer   | 46     |
| 063c0d0b-11af-4895-b2e0-eb3b5c84b9ce     | Data Science          | 43     |
| 71dac60a-bb9c-4503-b467-5fbe5d79a496     | Automation Testing    | 41     |
| 8ad00a89-2d8c-48b8-9089-783bb742ae20     | Business Analyst      | 41     |

### 🧠 Interpretation

- **DevOps Engineers** rank highest, with 77 connections. This supports the understanding that DevOps roles require knowledge across multiple domains — development, operations, security, tooling, and cloud infrastructure.
- **Mechanical Engineers** and **Data Science** also show high conceptual overlap, likely due to the variety of skills and tools covered in their resumes.
- This metric can help distinguish between **broad-spectrum candidates** (generalists) and **specialists** (fewer, but deeper connections).

***


## 📊 Calculating Job Score from Resume Using Research Paper Formula

We calculate the job matching score for a resume based on how many O*NET entities it covers for each job. This approach strictly follows the scoring methodology from the research paper.

---

### 📘 Scoring Formula (from the Research Paper)

For each job $j$, the score is calculated as:

$$
\text{score}(j) = \sum_{i=1}^{7} \left( \frac{\sum_{k \in i} \text{score}_k(j)}{\text{score\_max}_i(j)} + 0.5 \times \sum_{k \in i} \text{score}_k(j) \right)
$$

Where:

- $i$ refers to each O\*NET entity type (Abilities, Knowledge, Skills, Work Activities, Tasks, Tools, Tech)
- $\text{score}_k(j)$ is the importance value (data\_value) of a matched entity $k$ for job $j$
- $\text{score\_max}_i(j)$ is the sum of all possible importance values for entity type $i$ for job $j$
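
A small worked example may help in reading the formula. The sketch below assumes two entity types with invented importance values; `job_score`, `matched`, and `all_required` are hypothetical names, and the numbers are not drawn from O\*NET.

```python
def job_score(matched, all_required):
    """Apply the scoring formula for one job.

    matched:      entity type -> importance values of entities found in the resume
    all_required: entity type -> importance values of all entities required for the job
    """
    total = 0.0
    for entity_type, values in matched.items():
        score_i = sum(values)
        score_max_i = sum(all_required[entity_type])
        if score_max_i == 0:
            continue  # no requirements of this type, so nothing to normalize against
        total += score_i / score_max_i + 0.5 * score_i
    return total


matched = {"Skills": [4.0, 3.0], "Tech": [2.0]}
all_required = {"Skills": [4.0, 3.0, 3.0], "Tech": [2.0, 2.0]}

# Skills: 7/10 + 0.5 * 7 = 4.2
# Tech:   2/4  + 0.5 * 2 = 1.5
# Total:  5.7
```

Note that the second term, $0.5 \times \sum_k \text{score}_k(j)$, is not normalized, so under this reading a job with many matched entities can accumulate a large score even when the matched share of its requirements is modest.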


***

### 🔍 Cypher Query to Compute Job Score for a Data Scientist Resume

```cypher
// Step 1: Get matched entities and jobs for a specific resume
MATCH (r:Resume {id: "063c0d0b-11af-4895-b2e0-eb3b5c84b9ce"})-[:CONTAINS]->(:NounPhrase)-[:SIMILAR_TO]->(e:Entity)-[rel:REQUIRED_FOR]->(j:JobTitle)
WITH j, labels(e)[1] AS entity_type, sum(rel.importance) AS score_i

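// Step 2: For each job & entity_type, get score_max_i
// Note: this MATCH sums importance over every entity required for the job, across all entity types;
// restricting e2 to entity_type (e.g. WHERE labels(e2)[1] = entity_type) would match the per-type definition above
MATCH (e2:Entity)-[rel2:REQUIRED_FOR]->(j)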
WITH j, entity_type,
     score_i,
     sum(rel2.importance) AS score_max_i

// Step 3: Apply the formula for each (job, entity_type)
WITH j.title AS job,
     (score_i / score_max_i) + (0.5 * score_i) AS partial_score

// Step 4: Sum all entity-type scores for each job
WITH job, sum(partial_score) AS total_score
ORDER BY total_score DESC
RETURN job, round(total_score, 2) AS job_score
LIMIT 10

```
### 🏆 Top Predicted Jobs for the Data Science Resume

| Rank | Job Title                                     | Job Score |
|------|-----------------------------------------------|-----------|
| 1    | Sales Manager                                 | 63.62     |
| 2    | Human Resources Specialists                   | 54.12     |
| 3    | Electrical Engineers                          | 50.98     |
| 4    | Data Scientists                               | 20.58     |

<br/>
Surprisingly, the formula predicts Sales Manager as the top choice and ranks Data Scientists only fourth.

***


### 🔍 Solving with graph: Matching by Entity Overlap

This method counts how many distinct entities (skills, tools, knowledge) are shared between the resume and each job. The results are quite close to the score-based ranking, with a slight improvement at rank 2.

```cypher
MATCH (r:Resume {id: "063c0d0b-11af-4895-b2e0-eb3b5c84b9ce"})-[:CONTAINS]->(:NounPhrase)-[:SIMILAR_TO]->(e:Entity)
MATCH (e)-[:REQUIRED_FOR]->(j:JobTitle)
RETURN j.title AS job, count(DISTINCT e) AS matched_entities
ORDER BY matched_entities DESC
LIMIT 5
```
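
The same overlap count can be expressed as a set intersection. The sketch below uses invented entity names and job requirements purely to show the mechanics; `rank_jobs_by_overlap` is a hypothetical helper, not part of the project code.

```python
def rank_jobs_by_overlap(resume_entities, job_requirements, top_n=5):
    """Rank jobs by the number of distinct entities they share with the resume."""
    resume_set = set(resume_entities)  # duplicates in the resume count once
    counts = {
        job: len(resume_set & set(required))
        for job, required in job_requirements.items()
    }
    # Sort by descending overlap, breaking ties alphabetically for stable output.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:top_n]


# Made-up data for illustration only.
resume_entities = ["SQL", "Python", "Tableau", "CRM", "SQL"]
job_requirements = {
    "Sales Managers": ["CRM", "SQL", "Negotiation"],
    "Data Scientists": ["Python", "SQL", "Tableau", "Statistics"],
    "Civil Engineers": ["AutoCAD"],
}
```

Because the count ignores importance values and the size of each job's requirement list, a job with a long list of generic requirements can outrank a closer match, which is one thing to check when a ranking looks off.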

### 🏆 Top Jobs by Shared Entity Overlap

| Rank | Job Title                                     | Matched Entities |
|------|-----------------------------------------------|------------------|
| 1    | Sales Manager                                 | 23            |
| 2    | Human Resources Specialists                   | 20            |
| 3    | Electrical Engineers                          | 18            |
| 4    | Data Scientists                               | 6            |

<br/>
The graph lets us quickly see that Sales Manager has more matched entities with this resume than Data Scientists. Let's delve into it:

### 🧩 Matched Entities with Resume Noun Phrases: Sales Manager vs. Data Scientist

This table displays the entity types, matched entity texts, the resume noun phrases that triggered each match, and whether those entities are required for **Sales Managers** or **Data Scientists**.

| Entity Type       | Entity Text                                           | Noun Phrase                  | Required for Sales Manager | Required for Data Scientist |
|-------------------|-------------------------------------------------------|------------------------------|-----------------------------|------------------------------|
| Knowledge         | Administration and Management                         | management                   | true                        | false                        |
| Knowledge         | Computers and Electronics                             | engineering electronics      | true                        | false                        |
| Knowledge         | Customer and Personal Service                         | customer                     | true                        | false                        |
| Knowledge         | Customer and Personal Service                         | new customer                 | true                        | false                        |
| Knowledge         | Design                                                | designing                    | true                        | false                        |
| Knowledge         | Design                                                | designed                     | true                        | false                        |
| Knowledge         | Engineering and Technology                            | engineering                  | true                        | false                        |
| Knowledge         | English Language                                      | english                      | true                        | false                        |
| Knowledge         | Food Production                                       | production                   | true                        | false                        |
| Knowledge         | Foreign Language                                      | english                      | true                        | false                        |
| Knowledge         | Production and Processing                             | production                   | true                        | false                        |
| Knowledge         | Sales and Marketing                                   | retail marketing- exprience | true                        | false                        |
| Skills            | Systems Analysis                                      | analyzing                    | true                        | false                        |
| Skills            | Technology Design                                     | designing                    | true                        | false                        |
| Skills            | Writing                                               | handwriting                  | true                        | false                        |
| Tech              | Business intelligence and data analysis software      | tableau                      | true                        | true                         |
| Tech              | Customer relationship management CRM software         | crm                          | true                        | false                        |
| Tech              | Data base reporting software                          | reporting                    | true                        | true                         |
| Tech              | Data base user interface and query software           | database objects             | true                        | true                         |
| Tech              | Data base user interface and query software           | sql                          | true                        | true                         |
| Tech              | Data base user interface and query software           | sql server                   | true                        | true                         |
| Tech              | Data base user interface and query software           | database tables              | true                        | true                         |
| Tech              | Data mining software                                  | analytics tool               | true                        | false                        |
| Tech              | Data mining software                                  | analytics infrastructure     | true                        | false                        |
| Tech              | Data mining software                                  | analytics capabilities       | true                        | false                        |
| Tech              | Data mining software                                  | analytics                    | true                        | false                        |
| Tech              | Enterprise resource planning ERP software             | introduced sap predictive    | true                        | true                         |
| Tech              | Enterprise resource planning ERP software             | sap senior developer         | true                        | true                         |
| Tech              | Enterprise resource planning ERP software             | sap ao                       | true                        | true                         |
| Tech              | Enterprise resource planning ERP software             | sap hana                     | true                        | true                         |
| Tech              | Enterprise resource planning ERP software             | sap pal                      | true                        | true                         |
| Tech              | Enterprise resource planning ERP software             | sap                          | true                        | true                         |
| Tech              | Enterprise resource planning ERP software             | sap database developer       | true                        | true                         |
| Tech              | Object or component oriented development software     | c #                          | true                        | true                         |
| Tech              | Object or component oriented development software     | python- exprience            | true                        | true                         |
| Tech              | Object or component oriented development software     | python3                      | true                        | true                         |
| Tech              | Object or component oriented development software     | python                       | true                        | true                         |
| Tech              | Operating system software                             | windows server               | true                        | true                         |
| Tools             | Computer servers                                      | database servers             | false                       | false                        |
| Tools             | Computer servers                                      | windows server               | false                       | false                        |
| Work_Activities   | Analyzing Data or Information                         | data                         | true                        | false                        |
| Work_Activities   | Analyzing Data or Information                         | analyzing                    | true                        | false                        |
| Work_Activities   | Processing Information                                 | data processing algorithms   | true                        | false                        |


<br/>

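We can see that the matched KSAs (the Knowledge, Skills, and Work Activities rows) come from generic noun phrases such as `management`, `customer`, and `production`, and are flagged as required for Sales Managers but not for Data Scientists, whereas most Tech matches are required for both roles.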

***

## 2️⃣ Project GDS Graph: Detect Resume Communities Based on Traits

To uncover hidden structure in our resume and trait data, we project a **heterogeneous graph** into the Neo4j Graph Data Science (GDS) catalog. This graph includes nodes representing:

- **Resumes**
- **Noun Phrases** extracted from resume text
- **O*NET-style traits** such as Skills, Knowledge, Abilities, and Work Activities
- **Custom job traits** from the VOLCANO dataset

#### 🔗 Node Types

- `Resume`: Parsed resume documents
- `NounPhrase`: Extracted phrases representing capabilities
- `KSA`: O*NET-style Skills, Knowledge, and Abilities
- `Trait`: Supplementary traits (e.g., from VOLCANO or behavioral models)
- `JobTitle`: Standardized job roles
- `Occupation`: Broader occupational categories (SOC codes)


#### 📦 Graph Projection Code

We define the graph projection by specifying relevant node labels and relationship types, all as **undirected** to allow GDS algorithms to walk in both directions. The graph projection makes these relationships available in the GDS engine for advanced analytics.

```cypher
CALL gds.graph.drop('resume_trait_graph', false);  // optional: clean slate

CALL gds.graph.project(
  'resume_trait_graph',
  ['Resume', 'NounPhrase', 'Knowledge', 'Skill', 'Ability', 'Trait', 'JobTitle', 'Occupation'],
  {
    CONTAINS: {orientation: 'UNDIRECTED'},
    SIMILAR_TO: {orientation: 'UNDIRECTED'},
    ALIGNS_WITH: {orientation: 'UNDIRECTED'},
    REQUIRED_FOR: {orientation: 'UNDIRECTED'}
  }
);



```
***

## 3️⃣ Run Louvain Community Detection

With the graph projected, we apply Louvain community detection, a powerful unsupervised algorithm that finds clusters (or “communities”) of nodes that are densely interconnected. This algorithm identifies groups of resumes and traits that share common semantic features, enabling deeper understanding of:

- Workforce patterns
- Emerging talent clusters
- Trait-driven job readiness

```cypher
CALL gds.louvain.write('resume_trait_graph', {
  writeProperty: 'community_id'
})
YIELD communityCount, modularity;
```

**Expected Output:**
- `communityCount`: e.g. `433`
- `modularity`: e.g. `0.05` (a value this close to 0 indicates weak community separation; well-separated communities score considerably higher)
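
To give a feel for what the `modularity` value measures, the sketch below computes it by hand for a tiny unweighted, undirected graph. This is not the Louvain algorithm itself, only the quantity Louvain tries to maximize; the graph, the partitions, and the `modularity` helper are invented for illustration.

```python
def modularity(edges, community_of):
    """Newman modularity for an unweighted, undirected graph.

    Q = sum over communities of (internal_edges / m) - (degree_sum / (2 * m)) ** 2
    """
    m = len(edges)
    internal = {}    # community -> number of edges with both ends inside it
    degree_sum = {}  # community -> sum of degrees of its nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Two triangles joined by a single bridge edge (c-d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in "abcdef"}
```

Splitting the graph into its two triangles gives Q = 5/14 (about 0.36), while lumping everything into one community gives Q = 0, which is why values near zero suggest the detected communities are barely denser than chance.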

***

## 4️⃣ List Detected Communities


```cypher
MATCH (r:Resume)
RETURN DISTINCT r.community_id AS cluster_id
ORDER BY cluster_id LIMIT 5;
```
### 🧭 Detected Resume-Trait Communities

A total of 130 communities were identified using the Louvain algorithm on the projected resume-trait graph. Each represents a cluster of resumes, traits, and occupations that are semantically aligned based on shared KSAs and relationships. Table 1 lists the first 5.

| Community ID |
|--------------|
| 770          |
| 771          |
| 776          |
| 782          |
| 785          |


<center><b>Table 1:</b> An example of 5 (out of 433) communities listed</center>



## 5️⃣ Profile Trait Dimensions by Cluster

To understand the psychological and functional traits that define each cluster of resumes, we aggregate **PCA-based trait dimensions** from the VOLCANO model across resumes in each community. The dimensions—**Cognitive**, **Operational**, and **Physical**—are averaged to generate a cluster-level profile.

The Cypher query below performs this aggregation:

```cypher
MATCH (r:Resume)-[:CONTAINS]->(:NounPhrase)-[:SIMILAR_TO]->(e)<-[:ALIGNS_WITH]-(t:Trait)
WHERE r.community_id IS NOT NULL
WITH r.community_id AS cluster,
     avg(t.score1) AS cognitive,
     avg(t.score2) AS operational,
     avg(t.score3) AS physical
RETURN cluster, cognitive, operational, physical
ORDER BY cluster;
```

This gives PCA-based cluster characteristics from the VOLCANO trait model.
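
The aggregation is a plain group-by-and-average, which the sketch below reproduces in Python on invented rows. Each row stands for one resume-to-trait path with its three trait scores; `cluster_profiles` is a hypothetical helper name.

```python
from collections import defaultdict


def cluster_profiles(rows):
    """Average three trait scores per cluster.

    rows: iterable of (cluster_id, score1, score2, score3) tuples,
          one per resume-to-trait path.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])  # three running totals and a row count
    for cluster, s1, s2, s3 in rows:
        acc = sums[cluster]
        acc[0] += s1
        acc[1] += s2
        acc[2] += s3
        acc[3] += 1
    return {
        cluster: tuple(total / acc[3] for total in acc[:3])
        for cluster, acc in sums.items()
    }


# Made-up rows, not project data.
rows = [
    (132, 10.0, -1.0, 1.0),
    (132, 14.0, 0.0, 2.0),
    (155, 0.0, 30.0, 7.0),
]
```

Because the average runs over paths rather than over resumes, a resume with many matched noun phrases contributes more rows and therefore weighs more heavily in its cluster's profile.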

***

## 6️⃣ Cluster Labeling Strategy (Heuristics)

The table below shows the resulting **trait profile centroids** for each cluster. These values represent the average PCA-based trait scores derived from the VOLCANO model across resumes in each cluster.


| Cluster | Cognitive         | Operational        | Physical           |
|---------|-------------------|--------------------|--------------------|
| 131     | 7.505101662566304 | -0.014893151421773457 | -0.5733118464005534 |
| 132     | 13.060301038105644 | -0.9730870853812928  | 1.2821258634535482  |
| 155     | 0.38670789535096  | 29.3247804184627   | 7.41428300231937   |
| 313     | -2.01285003691308 | -0.0440611642471945 | 0.598676677611171  |
| 370     | 7.16137488112576  | -0.24948529643466522 | 1.5247971610356683  |
| 383     | 6.643736332365739 | -0.6453283224454783 | 0.20573718932540833 |



<center><b>Table 2:</b> Resume communities with their respective average trait scores</center>


### 💡 Interpretation Heuristics

Based on the dominant trait dimension(s) in Table 2, we can label clusters heuristically:

- **Cognitive-Dominant**: High cognitive score, low operational and physical (e.g., Cluster 433)
- **Operational-Heavy**: Very high operational score (e.g., Cluster 209)
- **Balanced**: Moderate scores across dimensions (e.g., Cluster 339)
- **Low Activity Profiles**: Negative or near-zero values (e.g., Cluster 315)

These labels help summarize the psychological and functional makeup of groups of resumes, which can then inform job fit, training priorities, or hiring strategies.




We use simple rules to label communities based on trait averages:

| Label                 | Criteria                                 |
|----------------------|-------------------------------------------|
| STEM-heavy           | `cognitive ≥ 10`, `operational ≤ 0`       |
| Operational/Admin    | `operational ≥ 4`                         |
| Physical/Trade       | `physical ≥ 2.5`                          |
| Generalist           | All values between 2–8                   |
| Creative/Outlier     | `cognitive > 12` or unusual combinations  |

<center><b>Table 3:</b> Labeling rules for this project</center>

Each resume is assigned a `community_label`.
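
The rules in Table 3 can be written as a small function. The sketch below makes two assumptions that the table does not settle: rules are checked in table order with the first match winning, and "unusual combinations" is left out because it has no stated threshold. The `Unlabeled` fallback is also an addition for clusters that match no rule.

```python
def label_cluster(cognitive, operational, physical):
    """Apply the Table 3 heuristics in table order; the first matching rule wins."""
    if cognitive >= 10 and operational <= 0:
        return "STEM-heavy"
    if operational >= 4:
        return "Operational/Admin"
    if physical >= 2.5:
        return "Physical/Trade"
    if all(2 <= value <= 8 for value in (cognitive, operational, physical)):
        return "Generalist"
    if cognitive > 12:
        return "Creative/Outlier"
    return "Unlabeled"  # assumed fallback; not part of Table 3


# Centroids copied from Table 2.
centroids = {
    132: (13.060301038105644, -0.9730870853812928, 1.2821258634535482),
    155: (0.38670789535096, 29.3247804184627, 7.41428300231937),
    313: (-2.01285003691308, -0.0440611642471945, 0.598676677611171),
}
labels = {cluster: label_cluster(*scores) for cluster, scores in centroids.items()}
```

With this ordering, cluster 132 is labeled `STEM-heavy` and cluster 155 `Operational/Admin`, while cluster 313 matches none of the rules. A different rule order would change the result for clusters that satisfy several criteria at once.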

---

## 7️⃣ Apply Labels in Neo4j



```cypher
MATCH (r:Resume)
WHERE r.community_id = 132
SET r.community_label = "STEM-heavy";
```

Repeat for each cluster.
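
Rather than editing the statement by hand for every cluster, one option is to pass all cluster-to-label pairs as a single parameter and let Cypher `UNWIND` them. The sketch below only builds the query text and its parameter rows; `LABEL_QUERY`, `build_label_rows`, and the example label mapping are hypothetical, and running the query requires a live Neo4j session.

```python
# Parameterized Cypher: one row per cluster, applied in a single statement.
LABEL_QUERY = """
UNWIND $rows AS row
MATCH (r:Resume)
WHERE r.community_id = row.community_id
SET r.community_label = row.label
"""


def build_label_rows(labels_by_cluster):
    """Turn {community_id: label} into the list-of-maps shape expected by $rows."""
    return [
        {"community_id": community_id, "label": label}
        for community_id, label in sorted(labels_by_cluster.items())
    ]


# Illustrative mapping; only the label for cluster 132 appears in the text above.
rows = build_label_rows({155: "Operational/Admin", 132: "STEM-heavy"})

# With the official neo4j Python driver, the call would look roughly like:
#     session.run(LABEL_QUERY, rows=rows)
```

Keeping the labels in a parameter avoids string-building in Cypher and makes the labeling step repeatable when the clusters are recomputed.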

***



## 8️⃣ Visualize in Neo4j Bloom

- Color `Resume` nodes by `community_label`
- Query:
  ```cypher
  MATCH (r:Resume)-[:CONTAINS]->(:NounPhrase)-[:SIMILAR_TO]->()<-[:ALIGNS_WITH]-(t:Trait) WHERE r.community_id=132
  RETURN r, t
  ```
- Export PNGs for clusters of interest
<p>
<center>
  <img src="../images/stem-heavy.png" width="300"/>
  <br>
  <b>Figure 3:</b> A STEM-heavy cluster
</center>
</p>


---



from IPython.display import display, HTML

def show_two_samples(title1, text1, title2, text2):
    display(HTML(f"""
    <div style="display: flex; gap: 20px; font-family: 'Segoe UI', sans-serif;">
        <div style="
            flex: 1;
            border: 1px solid #ccc;
            background-color: #f9f9f9;
            padding: 12px;
            border-left: 5px solid #2b6cb0;
        ">
            <strong style="color: #2b6cb0;">{title1}</strong>
            <pre style="white-space: pre-wrap; margin: 10px 0 0 0;">{text1}</pre>
        </div>
        <div style="
            flex: 1;
            border: 1px solid #ccc;
            background-color: #f9f9f9;
            padding: 12px;
            border-left: 5px solid #38a169;
        ">
            <strong style="color: #38a169;">{title2}</strong>
            <pre style="white-space: pre-wrap; margin: 10px 0 0 0;">{text2}</pre>
        </div>
    </div>
    """))
show_two_samples(
    "Resume Snippet 1 (Operational)",
    "Education Details January 2018 M. S. Nutrition and Exercise Physiology New York, NY Teachers College, Columbia University January 2016 B. S. Nutrition and Dietetics Miami, FL Florida International University January 2011 B. Sc. General Microbiology Pune, Maharashtra Abasaheb Garware College Group Fitness Instructor, India Group Fitness Instructor, India - Columbia University Skill Details Company Details company - Columbia University description - Present Organized high energy weight training, cardiovascular and indoor cycling classes accommodating participants of varying age-groups, cultural backgrounds and fitness levels to help achieve their fitness goals. company - Columbia Dental School description - Provided detailed nutrition counselling and telephonic follow up to dental patients with accompanying metabolic conditions like diabetes, hypertension and obesity. ",    
    "Resume Snippet 2 (STEM-heavy)",
    "echnical Skills Application Servers: IIS 6. 0, Jboss 7. 1. Database: SQL, Oracle and DB2. Report Tool: iReport, Crystal report. Career GraphEducation Details Business Analyst Business Analyst - Zensar Technologies Ltd Skill Details CRYSTAL REPORT- Exprience - 15 months DATABASE- Exprience - 6 months DB2- Exprience - 6 months IIS- Exprience - 6 months IIS 6- Exprience - 6 monthsCompany Details company - Zensar Technologies Ltd description - Location: Goregoan, Mumbai ( Client -SUN Pharmaceutical ) Designation: Business Analyst. Role: Requirement gathering, gap analysis, support, end user training, documentation. company - Proteus Technologies Pvt Ltd description - Base Information Management Pvt. Ltd. Is a Mumbai base software service provider with core competency and proven track record of installations of Enterprise Wide Solutions."
)


## 🔄 Analyze Centrality or Similarity Using PageRank

After projecting the `resume_trait_graph`, we can apply **PageRank** to identify the most central and influential resumes in the graph.

### 📌 What Is PageRank?

**PageRank** is a centrality algorithm originally developed by Google's founders to rank web pages. It measures the importance of a node based on the quantity and quality of its connections.

In the context of our graph:

- A **Resume** node with a high PageRank score is **highly connected** to influential traits or shares traits with many other resumes.
- A **Trait** or **KSA** node with high PageRank is connected to multiple resumes, job titles, or occupations — making it a **common or valuable capability**.
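
For intuition, PageRank can be approximated with a short power iteration. The sketch below is a textbook variant that keeps scores summing to 1, so its values are not on the same scale as GDS output; the toy graph and node names are invented, and the `damping` and `iterations` defaults simply mirror common settings.

```python
def pagerank(neighbors, damping=0.85, iterations=20):
    """Power-iteration PageRank on an adjacency dict (node -> list of neighbors)."""
    nodes = list(neighbors)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the same baseline from the random-jump term.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = neighbors[node]
            if not targets:
                targets = nodes  # a node without links spreads its rank evenly
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Undirected toy graph: one KSA shared by three resumes and linked to one trait.
toy_graph = {
    "resume_a": ["ksa_sql"],
    "resume_b": ["ksa_sql"],
    "resume_c": ["ksa_sql"],
    "trait_math": ["ksa_sql"],
    "ksa_sql": ["resume_a", "resume_b", "resume_c", "trait_math"],
}
scores = pagerank(toy_graph)
```

In this toy graph the shared KSA node collects rank from every neighbor and ends up with the highest score, which mirrors the idea that widely shared capabilities become central in the resume-trait graph.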

### 🧠 Why This Matters

PageRank helps us:

- **Identify generalist resumes** with broadly relevant or in-demand capabilities.
- **Rank traits** by their influence across occupations and resumes.
- Spot **hidden influencers**: resumes or traits that aren’t obviously connected but are structurally important in the network.
- Generate **better recommendations** (e.g., resumes to job matches, traits to prioritize).

This adds a layer of **graph-based insight** that complements community detection and trait profiling.

***

### 🛠️ Run PageRank in Neo4j GDS

We can apply the PageRank algorithm to our projected graph using the following Cypher command:

```cypher
CALL gds.pageRank.write('resume_trait_graph', {
  maxIterations: 20,
  dampingFactor: 0.85,
  writeProperty: 'pagerank'
});

MATCH (r:Resume)-[:CONTAINS]->(:NounPhrase)-[:SIMILAR_TO]->(ksa)<-[:ALIGNS_WITH]-(t:Trait)
WHERE r.pagerank > 0.01
RETURN r.name AS Resume, collect(DISTINCT t.trait_cluster) AS Traits, r.pagerank AS PageRank
ORDER BY r.pagerank DESC
LIMIT 20;
```
DevOps Engineer still ranks as the most important resume, as in our first basic degree analysis, but the remaining rankings differ.


| Resume              | Traits                                                                                           | PageRank           |
|---------------------|--------------------------------------------------------------------------------------------------|--------------------|
| DevOps Engineer     | ["Math", "Business", "Communication", "Cognitive", "Sensory", "Physical", "Oper&Cntrl"]          | 21.04113876313279  |
| Automation Testing  | ["Communication", "Math", "SciEng", "Sensory", "Physical", "Oper&Cntrl", "Cognitive"]            | 13.633899485658137 |
| Automation Testing  | ["Communication", "Cognitive", "Math", "Oper&Cntrl"]                                             | 12.767209075750184 |
| Database            | ["Physical", "Business", "Communication", "Cognitive", "Oper&Cntrl", "Sensory"]                  | 12.34166692290854  |
| DevOps Engineer     | ["Business", "Sensory", "Math", "Cognitive", "Oper&Cntrl", "Physical"]                           | 12.251018046734663 |
| Business Analyst    | ["Physical", "Cognitive", "Communication", "SciEng", "Business", "Sensory", "Math", "Oper&Cntrl"]| 12.033167351759545 |

<center><b>Table 4:</b> Top-ranked resume nodes in the graph by PageRank</center>

### Interpretation of Results

- **PageRank Scores:**  
  Indicate the relative importance of each resume node based on their connections to influential traits. The "DevOps Engineer" resume (21.04) demonstrates the strongest overall influence within the graph, highlighting it as a central and highly relevant profile.

- **Traits:**  
  Show job-relevant characteristics indirectly linked to resumes via NounPhrase and KSA nodes. Traits like **Communication**, **Math**, and **Cognitive** are common across resumes, suggesting universal skill requirements. Specialized traits (e.g., **SciEng** in Automation Testing and Business Analyst resumes) highlight role-specific competencies.

***

## ✅ Outcome

You now have:

- Trait-aware semantic clusters of resumes
- Trait PCA signatures per cluster
- Community labels like "STEM", "Ops", etc.
- Resume-trait-occupation alignment
- Graph structure suitable for further querying, filtering, and recommendation

***

### Graph Schema:

```cypher
CALL apoc.meta.schema() YIELD value RETURN apoc.convert.toJson(value) AS schema_json
```