(Resume)
  └── CONTAINS → (NounPhrase)
        └── SIMILAR_TO → (KSA)
              ◀── ALIGNS_WITH ─ (Trait)



# 🔍 Graph-Based Resume Community Detection (Neo4j + GDS)

This notebook documents how to build a semantic graph from resumes and traits, project it into the Neo4j Graph Data Science (GDS) catalog, and run community detection and cluster profiling.

---

## 1️⃣ Graph Construction Overview

We construct the graph with the following entities and relationships:

- `Resume` → [:CONTAINS] → `NounPhrase`
- `NounPhrase` → [:SIMILAR_TO] → `Skill` | `Ability` | `Knowledge` | `Work_activities` (collectively, the KSA nodes)
- Each KSA node ← [:ALIGNS_WITH] ← `Trait`
- Each KSA node → [:REQUIRED_FOR] → `JobTitle`
- `Occupation` → [:ALIGNED_WITH] → `JobTitle` (from VOLCANO)

These paths connect each resume to traits in three hops (`Resume` → `NounPhrase` → KSA node ← `Trait`), so resumes that share KSA nodes and traits can be grouped by community detection.

---

## 2️⃣ Project GDS Graph

We project the graph into the GDS catalog with all relevant relationships:

```cypher
CALL gds.graph.drop('resume_trait_graph', false);  // optional

CALL gds.graph.project(
  'resume_trait_graph',
  ['Resume', 'NounPhrase', 'Skill', 'Knowledge', 'Ability', 'Work_activities', 'Trait'],
  {
    CONTAINS: {orientation: "UNDIRECTED"},
    SIMILAR_TO: {orientation: "UNDIRECTED"},
    ALIGNS_WITH: {orientation: "UNDIRECTED"},
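    // NOTE: REQUIRED_FOR points at JobTitle, which is not in the node list above,
    // so these relationships are expected to be skipped unless 'JobTitle' is added.
    REQUIRED_FOR: {orientation: "UNDIRECTED"}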
  }
);
```

---

## 3️⃣ Run Louvain Community Detection

```cypher
CALL gds.louvain.write('resume_trait_graph', {
  writeProperty: 'community_id'
})
YIELD communityCount, modularity;
```

**Expected Output:**
- `communityCount`: e.g. `123`
- `modularity`: e.g. `0.64` (values above roughly 0.3 usually indicate meaningful community structure, so this is strong separation)
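
To make the `modularity` figure less opaque, here is a minimal Python sketch of the standard Newman modularity for an undirected, unweighted graph. It is an illustration only, not the GDS implementation; the function name `modularity` and the toy edge list are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q for an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping node -> community id.
    """
    m = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)  # edges with both endpoints in one community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1

    degree_sum = defaultdict(int)  # total degree per community
    for node, d in degree.items():
        degree_sum[community[node]] += d

    # Q = sum over communities of (internal edge share - expected share)
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Hypothetical toy graph: two triangles joined by a single bridge edge.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}

q_split = modularity(toy_edges, two_groups)   # 5/14, about 0.357
q_single = modularity(toy_edges, one_group)   # 0.0
```

Splitting the toy graph at the bridge scores 5/14, while lumping everything into one community scores 0, which is why higher values are read as cleaner separation.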

---

## 4️⃣ List Detected Communities

```cypher
MATCH (r:Resume)
RETURN DISTINCT r.community_id AS cluster_id
ORDER BY cluster_id;
```

---

## 5️⃣ Profile Trait Dimensions by Cluster

```cypher
MATCH (r:Resume)-[:CONTAINS]->(:NounPhrase)-[:SIMILAR_TO]->(e)<-[:ALIGNS_WITH]-(t:Trait)
WHERE r.community_id IS NOT NULL
WITH r.community_id AS cluster,
     avg(t.score1) AS cognitive,
     avg(t.score2) AS operational,
     avg(t.score3) AS physical
RETURN cluster, cognitive, operational, physical
ORDER BY cluster;
```

The three averages give each cluster's signature along the PCA-based trait dimensions (`score1`, `score2`, `score3`) of the VOLCANO trait model.
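
Note that the query averages over every matched path, so a trait reached through several noun phrases or KSA nodes is counted once per path. The Python sketch below contrasts that path-weighted average with an average over distinct traits. The rows, trait names, and function names are hypothetical and only illustrate the effect.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows, one per matched path: (cluster, trait_name, score1)
rows = [
    (7, "analytical", 9.0),
    (7, "analytical", 9.0),  # same trait reached via a second noun phrase
    (7, "manual", 1.0),
]


def path_weighted_avg(rows):
    """Average over paths, mirroring what avg(t.score1) does in the query."""
    by_cluster = defaultdict(list)
    for cluster, _trait, score in rows:
        by_cluster[cluster].append(score)
    return {c: mean(scores) for c, scores in by_cluster.items()}


def distinct_trait_avg(rows):
    """Average where each trait counts once per cluster."""
    by_cluster = defaultdict(dict)
    for cluster, trait, score in rows:
        by_cluster[cluster][trait] = score
    return {c: mean(list(t.values())) for c, t in by_cluster.items()}


weighted = path_weighted_avg(rows)    # cluster 7 -> 19/3, about 6.33
distinct = distinct_trait_avg(rows)   # cluster 7 -> 5.0
```

Which of the two is preferable depends on whether repeated evidence for a trait should increase its weight in the cluster profile.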

---

## 6️⃣ Cluster Labeling Strategy (Heuristics)

We use simple rules to label communities based on trait averages:

| Label                 | Criteria                                 |
|----------------------|-------------------------------------------|
| STEM-heavy           | `cognitive ≥ 10`, `operational ≤ 0`       |
| Operational/Admin    | `operational ≥ 4`                         |
| Physical/Trade       | `physical ≥ 3.5`                          |
| Generalist           | All values between 2–8                   |
| Creative/Outlier     | `cognitive > 12` or unusual combinations  |

Each resume is assigned a `community_label`.
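
The rules in the table overlap (for example, `cognitive > 12` also satisfies `cognitive ≥ 10`), so an implementation needs a precedence order. The sketch below assumes first-match-wins in table order and treats "unusual combinations" as anything that matches no earlier rule. Both choices are assumptions, and the function name `label_cluster` is hypothetical.

```python
def label_cluster(cognitive, operational, physical):
    """Map a cluster's trait averages to a label.

    Assumption: rules are checked in table order and the first match wins.
    Assumption: 'unusual combinations' means no earlier rule matched.
    """
    if cognitive >= 10 and operational <= 0:
        return "STEM-heavy"
    if operational >= 4:
        return "Operational/Admin"
    if physical >= 3.5:
        return "Physical/Trade"
    if all(2 <= v <= 8 for v in (cognitive, operational, physical)):
        return "Generalist"
    # Covers cognitive > 12 without the STEM pattern, plus everything else.
    return "Creative/Outlier"


# Hypothetical trait averages for a few clusters.
examples = {
    "stem": label_cluster(11.0, -1.0, 0.0),
    "ops": label_cluster(5.0, 4.5, 1.0),
    "trade": label_cluster(3.0, 1.0, 5.0),
    "general": label_cluster(3.0, 3.0, 3.0),
    "outlier": label_cluster(13.0, 2.0, 1.0),
}
```

With this ordering, a cluster with `cognitive > 12` and `operational ≤ 0` is labeled STEM-heavy rather than Creative/Outlier; swap the checks if the opposite is intended.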

---

## 7️⃣ (Optional) Apply Labels in Neo4j

```cypher
MATCH (r:Resume)
WHERE r.community_id = 332
SET r.community_label = "STEM-heavy";
```

Repeat for each cluster.
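
Instead of editing and rerunning the statement by hand, the labels can be applied in one parameterized query. The sketch below only builds the query text and its parameters. The names `LABEL_QUERY` and `build_label_rows` are hypothetical, and every mapping entry other than cluster 332 is made up for illustration.

```python
# One parameterized statement that labels every cluster in a single pass.
LABEL_QUERY = """
UNWIND $rows AS row
MATCH (r:Resume {community_id: row.community_id})
SET r.community_label = row.label
"""


def build_label_rows(labels_by_cluster):
    """Turn {community_id: label} into the list expected by $rows."""
    return [
        {"community_id": cid, "label": label}
        for cid, label in sorted(labels_by_cluster.items())
    ]


# Hypothetical mapping; only cluster 332 appears in the example above.
labels_by_cluster = {332: "STEM-heavy", 17: "Generalist"}
rows = build_label_rows(labels_by_cluster)
```

With the official `neo4j` Python driver, the rows would typically be passed as `session.run(LABEL_QUERY, rows=rows)`; connection setup is omitted here.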

---

## 8️⃣ Visualize in Neo4j Bloom

- Color `Resume` nodes by `community_label`
- Query:
  ```cypher
  MATCH (r:Resume)-[:CONTAINS]->(:NounPhrase)-[:SIMILAR_TO]->()<-[:ALIGNS_WITH]-(t:Trait)
  RETURN r, t
  ```
- Export PNGs for clusters of interest

---

## 9️⃣ (Optional) Analyze Centrality or Similarity

PageRank scores every node in the projection, so filter on `Resume` afterwards to find the resumes most connected via shared traits:

```cypher
CALL gds.pageRank.write('resume_trait_graph', {
  maxIterations: 20,
  dampingFactor: 0.85,
  writeProperty: 'pagerank'
});
```
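
To show what `maxIterations` and `dampingFactor` control, here is a minimal power-iteration PageRank in Python on a toy undirected graph. It is a textbook formulation with scores normalized to sum to 1; GDS may scale its scores differently, and the graph and function name are hypothetical.

```python
def pagerank(neighbors, damping=0.85, iterations=20):
    """Textbook PageRank by power iteration.

    neighbors: dict mapping node -> list of adjacent nodes.
    Assumption: every node has at least one neighbor (no dangling nodes).
    """
    nodes = list(neighbors)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every node keeps a baseline share, the rest flows along edges.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            share = damping * rank[v] / len(neighbors[v])
            for w in neighbors[v]:
                new_rank[w] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: three resumes that all connect to one trait.
toy_graph = {
    "trait": ["r1", "r2", "r3"],
    "r1": ["trait"],
    "r2": ["trait"],
    "r3": ["trait"],
}
scores = pagerank(toy_graph)
top_node = max(scores, key=scores.get)
```

The shared node ends up with the highest score, which is why the written `pagerank` property should be read per label rather than across all node types at once.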

---

## ✅ Outcome

You now have:

- Trait-aware semantic clusters of resumes
- Trait PCA signatures per cluster
- Community labels such as "STEM-heavy" and "Operational/Admin"
- Resume-trait-occupation alignment
- Graph structure suitable for further querying, filtering, and recommendation

---
