# 🔍 Graph-Based Resume Community Detection (Neo4j + GDS)

This notebook documents the entire process of building a semantic graph from resumes and traits, projecting it into GDS, and running community detection + profiling.

---

## 1️⃣ Graph Construction Overview

We construct the graph with the following entities and relationships:

- `Resume` → [:CONTAINS] → `NounPhrase`
- `NounPhrase` → [:SIMILAR_TO] → `Skill` | `Ability` | `Knowledge` | `Work_activities` (the KSA nodes)
- Each KSA node ← [:ALIGNS_WITH] ← `Trait`
- Each KSA node → [:REQUIRED_FOR] → `JobTitle`
- `Occupation` → [:ALIGNED_WITH] → `JobTitle` (from VOLCANO)

This enables multi-hop reasoning from resume → cognitive traits, allowing semantic community detection.
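Before projecting, it can help to confirm that the stored graph actually follows the schema listed above. The query below is a minimal sketch that counts relationships by type and endpoint label; it assumes each node carries a single label, so `labels(...)[0]` is enough to identify it.

```cypher
// Count relationships grouped by (source label, type, target label).
// Assumption: every node has exactly one label.
MATCH (a)-[rel]->(b)
RETURN labels(a)[0] AS source,
       type(rel)    AS relationship,
       labels(b)[0] AS target,
       count(*)     AS n
ORDER BY n DESC;
```

Any row whose direction or endpoint labels differ from the list above points to a loading problem worth fixing before running GDS.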

---

## 2️⃣ Project GDS Graph

We project the graph into the GDS catalog with all relevant relationships:

```cypher
CALL gds.graph.drop('resume_trait_graph', false);  // optional

CALL gds.graph.project(
  'resume_trait_graph',
  ['Resume', 'NounPhrase', 'Skill', 'Knowledge', 'Ability', 'Work_activities', 'Trait'],
  {
    CONTAINS: {orientation: "UNDIRECTED"},
    SIMILAR_TO: {orientation: "UNDIRECTED"},
    ALIGNS_WITH: {orientation: "UNDIRECTED"},
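    // Note: REQUIRED_FOR points at JobTitle, which is not in the node list above.
    // GDS loads a relationship only when both endpoints are projected, so these
    // are likely skipped unless 'JobTitle' is added to the node labels.
    REQUIRED_FOR: {orientation: "UNDIRECTED"}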
  }
);
```
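To check that the projection contains what you expect, the graph catalog can be inspected. This is a small sketch using `gds.graph.list`; the actual counts depend on your data.

```cypher
// Inspect the in-memory graph created above.
CALL gds.graph.list('resume_trait_graph')
YIELD graphName, nodeCount, relationshipCount
RETURN graphName, nodeCount, relationshipCount;
```

A `relationshipCount` far below the number of stored relationships suggests that some relationship types were dropped because one of their endpoint labels is missing from the projection.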

---

## 3️⃣ Run Louvain Community Detection

```cypher
CALL gds.louvain.write('resume_trait_graph', {
  writeProperty: 'community_id'
})
YIELD communityCount, modularity;
```

**Expected Output:**
- `communityCount`: e.g. `123`
- `modularity`: e.g. `0.64` (as a rule of thumb, values well above roughly 0.3 are read as clear community structure)
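If you want to preview these numbers without writing a property to the database, the `stats` mode of Louvain returns the same summary. A minimal sketch, assuming the default configuration is acceptable:

```cypher
// Dry run: compute communities and report summary statistics only.
CALL gds.louvain.stats('resume_trait_graph')
YIELD communityCount, modularity, ranLevels
RETURN communityCount, modularity, ranLevels;
```

Louvain can assign slightly different communities between runs, so the values may not match a later `write` call exactly.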

---

## 4️⃣ List Detected Communities

```cypher
MATCH (r:Resume)
RETURN DISTINCT r.community_id AS cluster_id
ORDER BY cluster_id;
```
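Community ids alone say little about how the resumes are distributed. The following sketch counts resumes per community so that very small clusters can be spotted early:

```cypher
// Number of resumes in each detected community, largest first.
MATCH (r:Resume)
WHERE r.community_id IS NOT NULL
RETURN r.community_id AS cluster_id, count(*) AS resumes
ORDER BY resumes DESC;
```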

---

## 5️⃣ Profile Trait Dimensions by Cluster

```cypher
MATCH (r:Resume)-[:CONTAINS]->(:NounPhrase)-[:SIMILAR_TO]->(e)<-[:ALIGNS_WITH]-(t:Trait)
WHERE r.community_id IS NOT NULL
WITH r.community_id AS cluster,
     avg(t.score1) AS cognitive,
     avg(t.score2) AS operational,
     avg(t.score3) AS physical
RETURN cluster, cognitive, operational, physical
ORDER BY cluster;
```

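Here `score1`, `score2`, and `score3` are read as the cognitive, operational, and physical PCA components of the VOLCANO trait model, so each row is the average trait profile of one cluster.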
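One caveat: the path pattern matches once per path, so a trait reached through many noun phrases is counted many times in the average. If that weighting is not wanted, a variant like the sketch below counts each resume and trait pair once. Whether to deduplicate is a modelling choice, not something the original query prescribes.

```cypher
// Variant: each (resume, trait) pair contributes once to the averages.
MATCH (r:Resume)-[:CONTAINS]->(:NounPhrase)-[:SIMILAR_TO]->()<-[:ALIGNS_WITH]-(t:Trait)
WHERE r.community_id IS NOT NULL
WITH DISTINCT r, t
RETURN r.community_id    AS cluster,
       count(DISTINCT r) AS resumes,
       avg(t.score1)     AS cognitive,
       avg(t.score2)     AS operational,
       avg(t.score3)     AS physical
ORDER BY cluster;
```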

---

## 6️⃣ Cluster Labeling Strategy (Heuristics)

Clustering result:

| cluster | cognitive | operational | physical |
|---------|-----------|-------------|----------|
| 207 | 0.38670789535096 | 29.3247804184627 | 7.41428300231937 |
| 332 | 14.122563498083675 | -0.2672349079859577 | 4.639234039857368 |
| 484 | 11.434283129537272 | -3.5229864021632813 | 0.674122818837789 |
| 490 | 12.25644218406835 | -0.864641028306843 | 1.1053546768676394 |
| 491 | -2.01285003691308 | -0.0440611642471945 | 0.598676677611171 |
| 495 | 11.65572099455933 | -0.7626344927208326 | 0.8279293242866523 |
| 496 | 1.88686011992082 | 4.348865081724812 | 3.111179334352151 |
| 501 | 2.9345980626130816 | -1.617953260069513 | -0.13767113680853268 |
| 503 | 4.961068695512465 | 0.12322606648393532 | 2.6374791793107333 |
| 515 | 8.99305534299731 | -8.031407122097 | 2.35613584129461 |
| 524 | 13.124399403645608 | -3.873987375510551 | 3.5022661515900677 |
| 527 | 5.988954750525826 | -0.42209227337705973 | 0.6246666958857904 |
| 529 | 7.272591080144173 | -1.6467373033892077 | 1.102038315172581 |
| 540 | 5.714125760478576 | -0.6339339950412302 | 0.6719899158314191 |
| 544 | 4.986810798423939 | -0.43540814436369746 | 2.181013667656739 |
| 555 | 6.261663329843539 | -1.5964533906676188 | 0.8762355399512414 |
| 556 | 10.090082669972437 | -3.632679636196256 | 1.9327821612458462 |
| 570 | 3.4837531319298245 | -0.1995762246119926 | 2.0553233667327455 |
| 578 | 9.958298185588319 | -1.0691551550813119 | 0.4170179148150373 |
| 686 | 8.681365660258507 | -3.02720453185064 | 2.5475420076381114 |

We use simple rules to label communities based on trait averages:

| Label             | Criteria                                 |
|-------------------|------------------------------------------|
| STEM-heavy        | `cognitive ≥ 10`, `operational ≤ 0`      |
| Operational/Admin | `operational ≥ 4`                        |
| Physical/Trade    | `physical ≥ 3.5`                         |
| Generalist        | All three averages between 2 and 8       |
| Creative/Outlier  | `cognitive > 12` or unusual combinations |

Each resume inherits the `community_label` of its community.
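The rules can be expressed as a Cypher `CASE` over the per-cluster averages. The sketch below is one possible encoding, not a definitive one: the criteria overlap (for example `cognitive > 12` also satisfies `cognitive ≥ 10`), so the order of the `WHEN` branches is an assumption, and the `'Unlabeled'` fallback is a hypothetical placeholder for clusters that match no rule or need the manual "unusual combinations" judgment.

```cypher
// Assumed precedence: the first matching rule wins, in table order.
MATCH (r:Resume)-[:CONTAINS]->(:NounPhrase)-[:SIMILAR_TO]->()<-[:ALIGNS_WITH]-(t:Trait)
WHERE r.community_id IS NOT NULL
WITH r.community_id AS cluster,
     avg(t.score1) AS cognitive,
     avg(t.score2) AS operational,
     avg(t.score3) AS physical
RETURN cluster,
       CASE
         WHEN cognitive >= 10 AND operational <= 0 THEN 'STEM-heavy'
         WHEN operational >= 4                     THEN 'Operational/Admin'
         WHEN physical >= 3.5                      THEN 'Physical/Trade'
         WHEN cognitive >= 2 AND cognitive <= 8
          AND operational >= 2 AND operational <= 8
          AND physical >= 2 AND physical <= 8      THEN 'Generalist'
         WHEN cognitive > 12                       THEN 'Creative/Outlier'
         ELSE 'Unlabeled'  // hypothetical fallback: review by hand
       END AS community_label
ORDER BY cluster;
```

Checking STEM-heavy first keeps this consistent with cluster 332, which satisfies both `cognitive ≥ 10` and `cognitive > 12` and is treated as STEM-heavy.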

---

## 7️⃣ (Optional) Apply Labels in Neo4j

Write the chosen label to every resume in a cluster, for example cluster 332:

```cypher
MATCH (r:Resume)
WHERE r.community_id = 332
SET r.community_label = "STEM-heavy";
```

Repeat for each cluster.
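Rather than running one statement per cluster, the assignments can be applied in a single pass with `UNWIND`. This is a sketch with only two entries: the label for 332 comes from the statement above, while the label for 207 is an assumption derived from the `operational ≥ 4` rule. Extend the list with your own decisions.

```cypher
// Apply several cluster labels at once from an inline mapping.
UNWIND [
  {id: 332, label: 'STEM-heavy'},
  {id: 207, label: 'Operational/Admin'}  // assumed from the operational >= 4 rule
] AS row
MATCH (r:Resume {community_id: row.id})
SET r.community_label = row.label;
```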

---

## 8️⃣ Visualize in Neo4j Bloom

- Color `Resume` nodes by `community_label`
- Query:
  ```cypher
  MATCH (r:Resume)-[:CONTAINS]->(:NounPhrase)-[:SIMILAR_TO]->()<-[:ALIGNS_WITH]-(t:Trait)
  RETURN r, t
  ```
- Export PNGs for clusters of interest
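For a single cluster of interest, the scene can be kept small by filtering on the label and capping the result. A minimal sketch, assuming `community_label` has already been written and that `'STEM-heavy'` is the label you want to inspect; the limit of 200 is arbitrary.

```cypher
// Resumes of one labeled community together with the traits they reach.
MATCH (r:Resume {community_label: 'STEM-heavy'})
      -[:CONTAINS]->(:NounPhrase)-[:SIMILAR_TO]->()<-[:ALIGNS_WITH]-(t:Trait)
RETURN r, t
LIMIT 200;
```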

---

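## 9️⃣ (Optional) Analyze Centrality or Similarity

PageRank scores every node in the projection; high-scoring `Resume` nodes are the ones most connected through shared noun phrases, KSA nodes, and traits: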

```cypher
CALL gds.pageRank.write('resume_trait_graph', {
  maxIterations: 20,
  dampingFactor: 0.85,
  writeProperty: 'pagerank'
});
```
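Once the scores are written, they can be read back like any other property. The sketch below lists the ten highest-ranked resumes; the cutoff of 10 is arbitrary.

```cypher
// Top resumes by the PageRank score written above.
MATCH (r:Resume)
WHERE r.pagerank IS NOT NULL
RETURN r, r.community_id AS cluster_id, r.pagerank AS pagerank
ORDER BY pagerank DESC
LIMIT 10;
```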

---

## ✅ Outcome

You now have:

- Trait-aware semantic clusters of resumes
- Trait PCA signatures per cluster
- Community labels such as "STEM-heavy" and "Operational/Admin"
- Resume-trait-occupation alignment
- Graph structure suitable for further querying, filtering, and recommendation

---
