## CV → TruCV AI Parsing Prototype (Laboratory)

- Input: Resume (PDF / DOCX)

- Output: TruCV Draft JSON + confidence + warnings

- Uses: OpenAI API

### Does NOT handle:

- Verification

- Blockchain

- Database writes

- Auth

### 🟦 CELL 1 - Imports & Global Config

In [36]:


# ---- Core Python ----
import os
import re
import json
import uuid
from typing import TypedDict, Literal, Dict, Any, List, Optional, Annotated
import operator

# ---- Environment ----
from dotenv import load_dotenv

# ---- File Parsing ----
import pdfplumber
from docx import Document

# ---- LangGraph / LangChain ----
from langgraph.graph import StateGraph, START, END
from langchain_openai import ChatOpenAI
from langchain_core.messages import SystemMessage, HumanMessage

# ---- Schema / Validation ----
from pydantic import BaseModel, Field, ValidationError

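# ---- Load environment variables (e.g. OPENAI_API_KEY) from a .env file ----
load_dotenv()

# ---- Model selection (temperature=0 to reduce run-to-run variation) ----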
llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,
    max_tokens=4000
)

# ---- Global variables ----
CONFIDENCE_THRESHOLD = 0.65
MAX_RETRIES_PER_SECTION = 1

SUPPORTED_SECTIONS = [
    "personal",
    "education",
    "experience",
    "skills",
    "projects",
    "awards"
]
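
The two constants above imply a retry policy, although the cells shown here do not use them yet. A minimal sketch, assuming a section is re-parsed only while its score is below `CONFIDENCE_THRESHOLD` and it still has retries left. `sections_to_retry` and the sample scores are hypothetical and not part of the notebook.

```python
from typing import Dict, List

CONFIDENCE_THRESHOLD = 0.65   # same value as in the config above
MAX_RETRIES_PER_SECTION = 1   # same value as in the config above


def sections_to_retry(
    confidence_map: Dict[str, float],
    retry_counts: Dict[str, int],
) -> List[str]:
    """Return sections scoring below the threshold that still have retries left."""
    return [
        name
        for name, score in confidence_map.items()
        if score < CONFIDENCE_THRESHOLD
        and retry_counts.get(name, 0) < MAX_RETRIES_PER_SECTION
    ]


# Illustrative data only
scores = {"personal": 0.92, "education": 0.40, "skills": 0.60}
attempts = {"skills": 1}  # skills has already used its single retry
print(sections_to_retry(scores, attempts))
```

Keeping the policy in one small function would let the threshold and retry budget change without touching the parsing code.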



## 🟦 CELL 2 - TruCV Target Schema (CRITICAL)

In [37]:
# 0. Common / Reusable Models

class TruCVBaseModel(BaseModel):
    class Config:
        populate_by_name = True
        extra = "ignore"

# "from" is a Python keyword → use from_ with alias="from"
class Duration(TruCVBaseModel):
    from_: Optional[str] = Field(None, alias="from")
    to: Optional[str] = None

# 1. Personal Section
class PersonalInfo(TruCVBaseModel):
    fullName: Optional[str] = None
    email: Optional[str] = None
    phone: Optional[str] = None
    city: Optional[str] = None
    linkedin: Optional[str] = None
    github: Optional[str] = None
    summary: Optional[str] = None
    imgUrl: Optional[str] = None

#2. Education Model
class Education(TruCVBaseModel):
    id: Optional[str] = None
    # 🛠️ ADDED: Missing from Python but present in TS Interface
    eduDocId: Optional[str] = None 
    level: Optional[str] = None
    boardNameOrDegree: Optional[str] = None
    institutionName: Optional[str] = None
    gpa: Optional[str] = None
    duration: Optional[Duration] = None

    selfAttested: bool = True
    docUri: Optional[str] = None
    issuerEmailId: Optional[str] = ""
    isEmailSend: bool = False

    verified: bool = False
    # 🛠️ FIXED: Removed "rejected" to match Mongoose enum ["pending", "verified"]
    status: Literal["pending", "verified"] = "pending"


#3. Experience Model
class Experience(TruCVBaseModel):
    id: Optional[str] = None
    companyName: Optional[str] = None
    jobRole: Optional[str] = None
    duration: Optional[Duration] = None
    skills: Optional[str] = None # Note: AI must format this as comma-separated string
    description: Optional[str] = None

    selfAttested: bool = True
    isEmailSend: bool = False
    docUri: Optional[str] = None
    issuerEmailId: Optional[str] = ""

    verified: bool = False
    # 🛠️ FIXED: Removed "rejected"
    status: Literal["pending", "verified"] = "pending"

#4. Skills Model
class Skill(TruCVBaseModel):
    id: Optional[str] = None
    skillName: Optional[str] = None
    level: Optional[str] = None

    selfAttested: bool = True
    # ⚠️ NOTE: Typo 'endoresBy' matches DB schema. DO NOT CORRECT.
    endoresBy: Optional[str] = ""
    endoresThrough: Optional[str] = ""

#5. Project Model
class Project(TruCVBaseModel):
    id: Optional[str] = None
    projectName: Optional[str] = None
    projectUrl: Optional[str] = None
    duration: Optional[Duration] = None
    skills: Optional[str] = None
    description: Optional[str] = None

    selfAttested: bool = True

#6. Award / Certificate Model
class Award(TruCVBaseModel):
    id: Optional[str] = None
    level: Optional[str] = None
    name: Optional[str] = None
    organisation: Optional[str] = None
    duration: Optional[Duration] = None
    description: Optional[str] = None

    selfAttested: bool = True
    issuerEmailId: Optional[str] = ""
    docUri: Optional[str] = ""
    isEmailSend: bool = False

    verified: bool = False
    # 🛠️ FIXED: Removed "rejected"
    status: Literal["pending", "verified"] = "pending"

# Final TruCV Root Model
class TruCVDraft(TruCVBaseModel):
    userId: Optional[str] = None
    title: Optional[str] = None

    personal: PersonalInfo

    educations: List[Education] = Field(default_factory=list)
    experiences: List[Experience] = Field(default_factory=list)
    skills: List[Skill] = Field(default_factory=list)
    projects: List[Project] = Field(default_factory=list)
    awards: List[Award] = Field(default_factory=list)

In [38]:
test_data = {
    "userId": "test_user",
    "title": "Auto Generated CV",
    "personal": {
        "fullName": "Test User",
        "email": "test@example.com"
    },
    "educations": [
        {
            "level": "Graduation",
            "institutionName": "Test University",
            "duration": {
                "from": "2020",
                "to": "2024"
            }
        }
    ]
}

cv = TruCVDraft(**test_data)
print(cv.model_dump(by_alias=True))


{'userId': 'test_user', 'title': 'Auto Generated CV', 'personal': {'fullName': 'Test User', 'email': 'test@example.com', 'phone': None, 'city': None, 'linkedin': None, 'github': None, 'summary': None, 'imgUrl': None}, 'educations': [{'id': None, 'eduDocId': None, 'level': 'Graduation', 'boardNameOrDegree': None, 'institutionName': 'Test University', 'gpa': None, 'duration': {'from': '2020', 'to': '2024'}, 'selfAttested': True, 'docUri': None, 'issuerEmailId': '', 'isEmailSend': False, 'verified': False, 'status': 'pending'}], 'experiences': [], 'skills': [], 'projects': [], 'awards': []}
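
`Experience.skills` and `Project.skills` are plain strings, and the inline note on `Experience` expects a comma-separated list. If the model ever returns a Python list or an untidy string instead, a small normalizer could coerce it before validation. This is a sketch under that assumption: `to_comma_separated` is a hypothetical helper, and the case-insensitive de-duplication is a design choice rather than something the schema requires.

```python
from typing import Iterable, Optional, Union


def to_comma_separated(skills: Union[str, Iterable[str], None]) -> Optional[str]:
    """Coerce a list or loosely formatted string into 'A, B, C' (None if empty)."""
    if skills is None:
        return None
    parts = skills.split(",") if isinstance(skills, str) else list(skills)
    seen = set()
    cleaned = []
    for part in parts:
        name = part.strip()
        key = name.lower()
        if name and key not in seen:   # skip blanks and case-insensitive repeats
            seen.add(key)
            cleaned.append(name)
    return ", ".join(cleaned) if cleaned else None


print(to_comma_separated(["Python", " Azure ", "python"]))
print(to_comma_separated("Java,Spring Boot ,"))
```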


## 🟦 CELL 3 - Internal AI Parsing Schema (Intermediate)

In [39]:
class ParsedBaseModel(BaseModel):
    class Config:
        extra = "ignore"
 
# 1. Atomic Parsed Field
class ParsedField(ParsedBaseModel):
    value: Optional[str] = None
    confidence: float = Field(
        ge=0.0,
        le=1.0,
        description="Confidence score between 0 and 1"
    )
#2. Parsed Personal Section
class ParsedPersonal(ParsedBaseModel):
    fullName: Optional[ParsedField] = None
    email: Optional[ParsedField] = None
    phone: Optional[ParsedField] = None
    city: Optional[ParsedField] = None
    linkedin: Optional[ParsedField] = None
    github: Optional[ParsedField] = None
    summary: Optional[ParsedField] = None


#3.Parsed Education
class ParsedEducationEntry(ParsedBaseModel):
    level: Optional[ParsedField] = None
    boardNameOrDegree: Optional[ParsedField] = None
    institutionName: Optional[ParsedField] = None
    gpa: Optional[ParsedField] = None
    duration_from: Optional[ParsedField] = None
    duration_to: Optional[ParsedField] = None

class ParsedEducation(ParsedBaseModel):
    items: List[ParsedEducationEntry] = Field(default_factory=list)
    section_confidence: float = Field(ge=0.0, le=1.0)


#4. Parsed Experience
class ParsedExperienceEntry(ParsedBaseModel):
    companyName: Optional[ParsedField] = None
    jobRole: Optional[ParsedField] = None
    skills: Optional[ParsedField] = None
    description: Optional[ParsedField] = None
    duration_from: Optional[ParsedField] = None
    duration_to: Optional[ParsedField] = None

class ParsedExperience(ParsedBaseModel):
    items: List[ParsedExperienceEntry] = Field(default_factory=list)
    section_confidence: float = Field(ge=0.0, le=1.0)


#5.Parsed Skills 
class ParsedSkillEntry(ParsedBaseModel):
    skillName: Optional[ParsedField] = None
    level: Optional[ParsedField] = None

class ParsedSkills(ParsedBaseModel):
    items: List[ParsedSkillEntry] = Field(default_factory=list)
    section_confidence: float = Field(ge=0.0, le=1.0)


#6. Parsed Projects
class ParsedProjectEntry(ParsedBaseModel):
    projectName: Optional[ParsedField] = None
    projectUrl: Optional[ParsedField] = None
    skills: Optional[ParsedField] = None
    description: Optional[ParsedField] = None
    duration_from: Optional[ParsedField] = None
    duration_to: Optional[ParsedField] = None

class ParsedProjects(ParsedBaseModel):
    items: List[ParsedProjectEntry] = Field(default_factory=list)
    section_confidence: float = Field(ge=0.0, le=1.0)


#7. Parsed Awards
class ParsedAwardEntry(ParsedBaseModel):
    level: Optional[ParsedField] = None
    name: Optional[ParsedField] = None
    organisation: Optional[ParsedField] = None
    description: Optional[ParsedField] = None
    duration_from: Optional[ParsedField] = None
    duration_to: Optional[ParsedField] = None

class ParsedAwards(ParsedBaseModel):
    items: List[ParsedAwardEntry] = Field(default_factory=list)
    section_confidence: float = Field(ge=0.0, le=1.0)


#8. Parsed Resume
class ParsedResume(ParsedBaseModel):
    personal: Optional[ParsedPersonal] = None
    education: Optional[ParsedEducation] = None
    experience: Optional[ParsedExperience] = None
    skills: Optional[ParsedSkills] = None
    projects: Optional[ParsedProjects] = None
    awards: Optional[ParsedAwards] = None
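
The intermediate schema wraps every value in a `value`/`confidence` pair and splits durations into `duration_from` and `duration_to`, whereas the target schema stores bare values and a nested `duration` with `from` and `to`. The mapping between the two might look like the sketch below. It works on plain dicts (the shape `model_dump()` would produce) so it runs without Pydantic. `field_value` and `flatten_education` are hypothetical names, and dropping values below the threshold is an assumed policy, not one the notebook states.

```python
from typing import Any, Dict, Optional

CONFIDENCE_THRESHOLD = 0.65  # mirrors the global config in Cell 1


def field_value(
    field: Optional[Dict[str, Any]],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Optional[str]:
    """Return the value if the field exists and meets the threshold, else None."""
    if not field or field.get("value") is None:
        return None
    return field["value"] if field.get("confidence", 0.0) >= threshold else None


def flatten_education(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Map one parsed education entry onto the shape of the draft's Education model."""
    return {
        "level": field_value(entry.get("level")),
        "boardNameOrDegree": field_value(entry.get("boardNameOrDegree")),
        "institutionName": field_value(entry.get("institutionName")),
        "gpa": field_value(entry.get("gpa")),
        "duration": {
            "from": field_value(entry.get("duration_from")),
            "to": field_value(entry.get("duration_to")),
        },
    }


# Illustrative data only
parsed_entry = {
    "institutionName": {"value": "Test University", "confidence": 0.95},
    "gpa": {"value": "8.5/10", "confidence": 0.40},  # below threshold, dropped
    "duration_from": {"value": "2020-08", "confidence": 0.9},
    "duration_to": {"value": "2024-06", "confidence": 0.9},
}
draft_education = flatten_education(parsed_entry)
print(draft_education)
```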



## 🟦 CELL 4 - Resume Upload (PDF / DOCX)

In [40]:
uploaded_file_path = "/home/ganesh/Desktop/LanChain-Framework/Ganesh-Agrahari-Resume.pdf" 

if not os.path.exists(uploaded_file_path):
    raise FileNotFoundError(f"File not found: {uploaded_file_path}")

file_ext = os.path.splitext(uploaded_file_path)[1].lower()

if file_ext not in [".pdf", ".docx"]:
    raise ValueError("Unsupported file type. Only PDF and DOCX are allowed.")

print(f"Resume file loaded successfully: {uploaded_file_path}")


Resume file loaded successfully: /home/ganesh/Desktop/LanChain-Framework/Ganesh-Agrahari-Resume.pdf


## 🟦 CELL 5 - Text Extraction Layer (PDF / DOCX)

- Inputs
  - uploaded_file_path

- Outputs
  - raw_text
  - normalized_text

In [41]:
# 🟦 CELL 5 - Visual Text Extraction (The "Human Eye" Approach)

import base64
from pdf2image import convert_from_path

def encode_image_base64(image):
    """Convert PIL Image to base64 string for OpenAI"""
    import io
    buffered = io.BytesIO()
    image.save(buffered, format="JPEG")
    return base64.b64encode(buffered.getvalue()).decode('utf-8')

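def extract_text_via_vision(file_path: str, file_ext: str) -> str:
    """
    Uses the vision-capable chat model configured in Cell 1 (gpt-4o-mini)
    to transcribe each page image in logical reading order. This is meant
    to avoid the interleaved text that multi-column layouts can produce.
    """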
    full_text = ""
    
    print("👀 Starting Vision Extraction...")

    if file_ext == ".pdf":
        # 1. Convert PDF pages to Images
        images = convert_from_path(file_path)
        
        for i, img in enumerate(images):
            print(f"   Processing Page {i+1}/{len(images)}...")
            base64_img = encode_image_base64(img)
            
            # 2. Ask the vision model to transcribe the page image
            message = HumanMessage(
                content=[
                    {
                        "type": "text", 
                        "text": "You are a professional resume parser. Transcribe the text from this resume page exactly as it appears, but strictly preserving the logical reading order. If there are multiple columns, read the left column completely, then the right column. Do not mix text from different columns. Output only the raw text."
                    },
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{base64_img}"}
                    }
                ]
            )
            
            response = llm.invoke([message])
            full_text += response.content + "\n\n"
            
    elif file_ext == ".docx":
        # DOCX is already linear, no need for vision
        doc = Document(file_path)
        full_text = "\n".join([para.text for para in doc.paragraphs])

    return full_text

# ---- EXECUTION ----
try:
    # This replaces the old pdfplumber logic entirely
    raw_text = extract_text_via_vision(uploaded_file_path, file_ext)
    
    if not raw_text.strip():
        raise ValueError("Vision extraction returned empty text.")

    # We still normalize to clean up any AI artifacts
    def normalize_text(text: str) -> str:
        text = re.sub(r"[•◦▪●★\uf0a7]", "-", text) # Bullets (\uf0a7 is a private-use glyph some bullet fonts emit)
        text = re.sub(r" +", " ", text)       # Extra spaces
        text = re.sub(r"\n\s*\n+", "\n\n", text) # Extra newlines
        return text.strip()

    normalized_text = normalize_text(raw_text)

    print(f"✅ Extraction Complete! Length: {len(normalized_text)} chars")
    print("--- PREVIEW (First 500 chars) ---")
    print(normalized_text[:500])

except Exception as e:
    raise RuntimeError(f"Vision Extraction Failed: {e}") from e

👀 Starting Vision Extraction...
   Processing Page 1/2...
   Processing Page 2/2...
✅ Extraction Complete! Length: 5789 chars
--- PREVIEW (First 500 chars) ---
```
Ganesh Agrahari

ganeshagrahari08@gmail.com
+91 9044232872
Lucknow, India
My Portfolio
LinkedIn
GitHub

PROFILE
AI Engineer with hands-on experience building real-world, production-grade systems. Developed serverless AI microservices, Retrieval-Augmented Generation (RAG) systems using embedding models, and scalable vector search pipelines. Experienced with Azure Functions, GPT-4, Elasticsearch, LangGraph, and N8n-based automation workflows. Strong foundation in Python, Machine Learning, NLP,


In [42]:

print(normalized_text)


```
Ganesh Agrahari

ganeshagrahari08@gmail.com
+91 9044232872
Lucknow, India
My Portfolio
LinkedIn
GitHub

PROFILE
AI Engineer with hands-on experience building real-world, production-grade systems. Developed serverless AI microservices, Retrieval-Augmented Generation (RAG) systems using embedding models, and scalable vector search pipelines. Experienced with Azure Functions, GPT-4, Elasticsearch, LangGraph, and N8n-based automation workflows. Strong foundation in Python, Machine Learning, NLP, and cloud architecture, with proven experience self-hosting Elasticsearch and N8n on Azure Virtual Machines to deliver fast, reliable, and cost-efficient AI solutions at scale.

EDUCATION
BCA Data Science & Artificial Intelligence(in collaboration with IBM)
BBD University
08/2023 – 09/2026 | Lucknow, India
Last year SGPA: 8

Intermediate(PCM)
SVM Inter College Ntpc
2022 | Raebareli, India
Percentage: 82%

ACTIVITY
HackerRun Problem Solving
03/2025 – present

GitHub Streak Maintenance
10/202
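
The printed text above starts with a Markdown code-fence line that the vision model added around its transcription, and `normalize_text` as written passes it through. One possible extension, sketched here as an assumption rather than the notebook's method, is to drop fence-only lines before the other clean-up steps. `strip_code_fences` and `FENCE_LINE` are hypothetical names. The rest mirrors the three substitutions in the cell.

```python
import re

FENCE = "`" * 3  # code-fence marker, built indirectly to keep this listing readable
FENCE_LINE = re.compile(r"\s*" + re.escape(FENCE) + r"\w*\s*")


def strip_code_fences(text: str) -> str:
    """Drop lines that contain only a fence marker (optionally with a language tag)."""
    return "\n".join(
        line for line in text.splitlines() if not FENCE_LINE.fullmatch(line)
    )


def normalize_text(text: str) -> str:
    text = strip_code_fences(text)               # hypothetical extra step
    text = re.sub(r"[•◦▪●★\uf0a7]", "-", text)   # bullet glyphs become hyphens
    text = re.sub(r" +", " ", text)              # collapse runs of spaces
    text = re.sub(r"\n\s*\n+", "\n\n", text)     # collapse runs of blank lines
    return text.strip()


# Illustrative input only
sample = f"{FENCE}\nGanesh Agrahari\n\n\n• Python   and  SQL\n● Docker\n{FENCE}"
print(normalize_text(sample))
```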

## 🟦 CELL 6 - Section Detection (Correct Way)
### Inputs
- normalized_text
### Outputs
- sections = {
    "personal": str,
    "education": str,
    "experience": str,
    "skills": str,
    "projects": str,
    "awards": str
}
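
The router prompt in this cell instructs the model to copy text exactly rather than summarize it, so the returned `sections` can be checked against `normalized_text`. A minimal sketch of such a check, assuming a line-by-line comparison is strict enough. `unmatched_lines` is a hypothetical helper and the sample strings are illustrative.

```python
from typing import Dict, List


def unmatched_lines(source_text: str, sections: Dict[str, str]) -> Dict[str, List[str]]:
    """For each section, list the lines that do not appear verbatim in the source."""
    source_lines = {line.strip() for line in source_text.splitlines() if line.strip()}
    report: Dict[str, List[str]] = {}
    for name, body in sections.items():
        missing = [
            line.strip()
            for line in body.splitlines()
            if line.strip() and line.strip() not in source_lines
        ]
        if missing:
            report[name] = missing
    return report


# Illustrative data only
source = "Ganesh Agrahari\nBBD University\n- Python, JavaScript"
routed = {
    "personal": "Ganesh Agrahari",
    "education": "BBD University",
    "skills": "Programming languages\n- Python, JavaScript",  # first line was invented
    "awards": "",
}
print(unmatched_lines(source, routed))
```

An empty report would suggest the router copied its input faithfully. A non-empty one points at lines worth inspecting before parsing.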



In [43]:
# 🟦 CELL 6 - Semantic Section Detection (The "Router")

# 1. Define the Router Schema (Strict Contract)
class SectionSplit(BaseModel):
    personal: str = Field(description="Raw text containing name, contact, links, summary, and profile image url")
    education: str = Field(description="Raw text containing degrees, universities, dates, and grades")
    experience: str = Field(description="Raw text containing job roles, companies, dates, and responsibilities")
    skills: str = Field(description="Raw text containing technical skills, soft skills, tools, and languages")
    projects: str = Field(description="Raw text containing project names, descriptions, and links")
    awards: str = Field(description="Raw text containing certifications, honors, and achievements")

def detect_sections_semantically(text: str) -> Dict[str, str]:
    print("🧠 Detecting sections semantically (No Regex)...")
    
    # 2. System Prompt: The "Router" instructions
    system_prompt = """
    You are an expert Resume Segmenter. 
    Your task is to classify every single line of the provided resume text into one of these 6 buckets:
    - personal
    - education
    - experience
    - skills
    - projects
    - awards

    CRITICAL RULES:
    1. Do not summarize. Copy the EXACT text from the resume into the correct bucket.
    2. If a section is missing (e.g. no Awards), return an empty string for that key.
    3. If text is ambiguous (e.g. "React" inside a project description), keep it with the project.
    4. Personal section MUST include the Name, Phone, Email, and Links found at the top.
    """
    
    # 3. Use 'with_structured_output' to prevent JSON crashes
    # This forces the LLM to call a function, returning a valid Pydantic object
    router_llm = llm.with_structured_output(SectionSplit)
    
    try:
        response = router_llm.invoke([
            SystemMessage(content=system_prompt),
            HumanMessage(content=f"RESUME TEXT:\n{text}")
        ])
        
        # Convert Pydantic model to Python Dict
        sections = response.model_dump()
        return sections

    except Exception as e:
        raise RuntimeError(f"Semantic Section Detection Failed: {str(e)}")

# ---- EXECUTION ----
try:
    # No more 'USE_LLM_FALLBACK' logic. We trust the Intelligence Layer.
    resume_sections = detect_sections_semantically(normalized_text)

    # ---- VISUAL SANITY CHECK ----
    print("\n✅ Sections Detected Successfully!")
    for k, v in resume_sections.items():
        # Print first 100 chars of each section to verify content
        preview = v[:100].replace('\n', ' ') + "..." if len(v) > 0 else "EMPTY"
        print(f"   📂 {k.upper():<12} : {len(v):<4} chars | {preview}")

    # Safety Check for key sections
    if len(resume_sections['experience']) < 20 and len(resume_sections['education']) < 20:
        print("\n⚠️  WARNING: Critical sections (Experience/Education) seem empty.")
        
except Exception as e:
    print(f"❌ FATAL ERROR in Cell 6: {e}")

🧠 Detecting sections semantically (No Regex)...

✅ Sections Detected Successfully!
   📂 PERSONAL     : 102  chars | Ganesh Agrahari  ganeshagrahari08@gmail.com +91 9044232872 Lucknow, India My Portfolio LinkedIn GitH...
   📂 EDUCATION    : 219  chars | BCA Data Science & Artificial Intelligence(in collaboration with IBM) BBD University 08/2023 – 09/20...
   📂 EXPERIENCE   : 1855 chars | AI Engineer Intern Edubuk 08/2025 – Present - Built an AI-powered JD–CV matching system combining ve...
   📂 SKILLS       : 980  chars | **Programming & Scripting:** - Python, JavaScript - Async Programming, REST API Development  **AI/ML...
   📂 PROJECTS     : 1427 chars | TruJobs – AI Recruitment System (Edubuk) - Designed an AI-driven JD–CV matching workflow that evalua...
   📂 AWARDS       : 265  chars | - Data Science Level 1 - IBM - Analytics in IBM Cognos - Machine Learning - Udemy - Cyber Security -...


## 🟦 CELL 7 - Section Parsing (AI-Powered)

### Inputs
- sections
### Outputs
- parsed_sections = {
    "personal": {...},
    "education": [...],
    "experience": [...],
    "skills": [...],
    "projects": [...],
    "awards": [...]
}
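
The prompt templates in this cell ask for dates as `YYYY-MM`, as `YYYY` when only the year is known, or as `Present`. Model output may not always comply, so a light format check before mapping values into the draft could catch stray formats such as `08/2023`. `is_valid_date` is a hypothetical helper, and treating a missing value as acceptable is an assumption.

```python
import re
from typing import Optional

# YYYY-MM with a real month, a bare YYYY, or the literal "Present"
DATE_PATTERN = re.compile(r"(?:\d{4}-(?:0[1-9]|1[0-2])|\d{4}|Present)")


def is_valid_date(value: Optional[str]) -> bool:
    """Check a parsed date string against the formats the prompts request."""
    if value is None:  # missing dates are allowed by the prompts
        return True
    return DATE_PATTERN.fullmatch(value) is not None


for candidate in ["2023-08", "2023", "Present", "08/2023", "2023-13", None]:
    print(repr(candidate), is_valid_date(candidate))
```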

In [44]:
# 🟦 CELL 7 - Prompt Templates (Read-Only Configuration)

# ------------------------------------------------------------------
# GLOBAL INSTRUCTION (Appended to all prompts)
# ------------------------------------------------------------------
BASE_INSTRUCTION = """
You are a strict data extraction AI.
INPUT: Raw text from a specific section of a resume.
OUTPUT: Valid JSON matching the exact schema requested.

RULES:
1. DATA TYPES: Return every field as an object with two keys:
   - "value": The extracted string (or null if not found).
   - "confidence": A float (0.0 to 1.0) indicating certainty.
   
2. DATES: Format all dates as "YYYY-MM" (e.g., "2023-08"). 
   - If currently active, use "Present".
   - If only year is available, use "YYYY".
   
3. MISSING DATA: 
   - If a field is not found, set "value": null and "confidence": 0.0.
   - DO NOT hallucinate or invent data.

4. NO MARKDOWN: Return ONLY the raw JSON string. No ```json blocks.
"""

# ------------------------------------------------------------------
# 1. PERSONAL SECTION
# ------------------------------------------------------------------
PROMPT_PERSONAL = f"""
{BASE_INSTRUCTION}

EXTRACT THESE FIELDS:
- fullName: The candidate's full name.
- email: Valid email address.
- phone: Phone number (normalize to standard format).
- city: City/Location (e.g. "Lucknow, India").
- linkedin: Full LinkedIn URL.
- github: Full GitHub URL.
- summary: A brief professional summary (max 3-4 lines).

SCHEMA TARGET:
{{
  "fullName": {{ "value": "...", "confidence": 1.0 }},
  "email": {{ "value": "...", "confidence": 1.0 }},
  "phone": {{ "value": "...", "confidence": 0.8 }},
  "city": {{ "value": "...", "confidence": 0.9 }},
  "linkedin": {{ "value": "...", "confidence": 0.95 }},
  "github": {{ "value": "...", "confidence": 0.95 }},
  "summary": {{ "value": "...", "confidence": 0.8 }}
}}
"""

# ------------------------------------------------------------------
# 2. EDUCATION SECTION
# ------------------------------------------------------------------
PROMPT_EDUCATION = f"""
{BASE_INSTRUCTION}

EXTRACT A LIST OF EDUCATION ENTRIES.
For each entry, extract:
- level: "Grade 10", "Grade 12", "Undergraduate", "Postgraduate", or "PhD".
- boardNameOrDegree: Degree name (e.g. "B.Tech Computer Science", "CBSE").
- institutionName: University or School name.
- gpa: CGPA or Percentage (as a string, e.g. "8.5/10" or "82%").
- duration_from: Start date.
- duration_to: End date (or "Present").

SCHEMA TARGET:
{{
  "items": [
    {{
      "level": {{ "value": "...", "confidence": 0.9 }},
      "boardNameOrDegree": {{ "value": "...", "confidence": 0.9 }},
      "institutionName": {{ "value": "...", "confidence": 0.95 }},
      "gpa": {{ "value": "...", "confidence": 0.8 }},
      "duration_from": {{ "value": "2020-08", "confidence": 0.9 }},
      "duration_to": {{ "value": "2024-06", "confidence": 0.9 }}
    }}
  ],
  "section_confidence": 0.95
}}
"""

# ------------------------------------------------------------------
# 3. EXPERIENCE SECTION
# ------------------------------------------------------------------
PROMPT_EXPERIENCE = f"""
{BASE_INSTRUCTION}

EXTRACT A LIST OF PROFESSIONAL EXPERIENCES.
For each entry, extract:
- companyName: Name of the company.
- jobRole: Title (e.g. "Senior Software Engineer").
- description: Bullet points describing responsibilities (preserve newlines).
- skills: A comma-separated string of tools/skills mentioned IN THIS ROLE (e.g. "Python, Azure, Docker").
- duration_from: Start date.
- duration_to: End date (or "Present").

SCHEMA TARGET:
{{
  "items": [
    {{
      "companyName": {{ "value": "...", "confidence": 0.95 }},
      "jobRole": {{ "value": "...", "confidence": 0.95 }},
      "description": {{ "value": "...", "confidence": 0.85 }},
      "skills": {{ "value": "Java, Spring Boot", "confidence": 0.8 }},
      "duration_from": {{ "value": "...", "confidence": 0.9 }},
      "duration_to": {{ "value": "...", "confidence": 0.9 }}
    }}
  ],
  "section_confidence": 0.9
}}
"""

# ------------------------------------------------------------------
# 4. SKILLS SECTION
# ------------------------------------------------------------------
PROMPT_SKILLS = f"""
{BASE_INSTRUCTION}

EXTRACT A LIST OF SKILLS.
For each skill:
- skillName: The specific skill (e.g. "React.js", "Python").
- level: infer "Beginner", "Intermediate", or "Expert" based on context. Default to "Intermediate" if unsure.

SCHEMA TARGET:
{{
  "items": [
    {{
      "skillName": {{ "value": "Python", "confidence": 1.0 }},
      "level": {{ "value": "Expert", "confidence": 0.8 }}
    }}
  ],
  "section_confidence": 0.9
}}
"""

# ------------------------------------------------------------------
# 5. PROJECTS SECTION
# ------------------------------------------------------------------
PROMPT_PROJECTS = f"""
{BASE_INSTRUCTION}

EXTRACT A LIST OF PROJECTS.
For each entry:
- projectName: Name of the project.
- projectUrl: Link to code or demo (if available).
- description: Brief description of what was built.
- skills: Comma-separated list of tech stack used.
- duration_from: Start date (optional).
- duration_to: End date (optional).

SCHEMA TARGET:
{{
  "items": [
    {{
      "projectName": {{ "value": "...", "confidence": 0.9 }},
      "projectUrl": {{ "value": "...", "confidence": 0.95 }},
      "description": {{ "value": "...", "confidence": 0.8 }},
      "skills": {{ "value": "...", "confidence": 0.8 }},
      "duration_from": {{ "value": null, "confidence": 0.0 }},
      "duration_to": {{ "value": null, "confidence": 0.0 }}
    }}
  ],
  "section_confidence": 0.9
}}
"""

# ------------------------------------------------------------------
# 6. AWARDS SECTION
# ------------------------------------------------------------------
PROMPT_AWARDS = f"""
{BASE_INSTRUCTION}

EXTRACT A LIST OF AWARDS/CERTIFICATIONS.
For each entry:
- name: Name of the award or certificate.
- organisation: Issuing body (e.g. "AWS", "Google", "University").
- level: "National", "International", "University", or "Other".
- description: Any details.
- duration_from: Date received.

SCHEMA TARGET:
{{
  "items": [
    {{
      "name": {{ "value": "AWS Certified Solutions Architect", "confidence": 0.95 }},
      "organisation": {{ "value": "Amazon Web Services", "confidence": 0.9 }},
      "level": {{ "value": "International", "confidence": 0.7 }},
      "description": {{ "value": "...", "confidence": 0.6 }},
      "duration_from": {{ "value": "2023-01", "confidence": 0.9 }},
      "duration_to": {{ "value": null, "confidence": 0.0 }}
    }}
  ],
  "section_confidence": 0.9
}}
"""

## Cell 8 - Section Parsing Functions (Pure & Stateless)
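
The parsers in this cell return models whose leaf fields are `value`/`confidence` pairs. Once such a result is dumped with `model_dump()`, the weakest fields can be listed with a short recursive walk. This is a sketch under stated assumptions: `iter_fields`, `low_confidence_fields` and the sample data are illustrative, and the default of 0.65 mirrors `CONFIDENCE_THRESHOLD` from Cell 1.

```python
from typing import Any, Iterator, List, Tuple


def iter_fields(node: Any, path: str = "") -> Iterator[Tuple[str, float]]:
    """Yield (path, confidence) for every value/confidence leaf in a dumped model."""
    if isinstance(node, dict):
        if "value" in node and "confidence" in node:
            yield path, float(node["confidence"])
            return
        for key, child in node.items():
            yield from iter_fields(child, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from iter_fields(child, f"{path}[{index}]")
    # None and scalar values (such as section_confidence) are skipped


def low_confidence_fields(dumped: Any, threshold: float = 0.65) -> List[Tuple[str, float]]:
    """Return the leaves whose confidence falls below the threshold."""
    return [(path, conf) for path, conf in iter_fields(dumped) if conf < threshold]


# Illustrative data only, shaped like a dumped ParsedExperience
dumped = {
    "items": [
        {
            "companyName": {"value": "Edubuk", "confidence": 0.95},
            "jobRole": {"value": "AI Engineer Intern", "confidence": 0.95},
            "duration_to": {"value": None, "confidence": 0.0},
            "skills": None,
        }
    ],
    "section_confidence": 0.9,
}
print(low_confidence_fields(dumped))
```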

In [45]:
# 🟦 CELL 8 - Section Parsing Functions (Pure & Stateless)

def _fetch_parsed_data(section_name: str, text: str, model: Any, prompt: str) -> Any:
    """
    Generic helper to call LLM with structured output.
    Returns the Pydantic model instance (e.g., ParsedEducation).
    """
    # 1. Fail fast if text is empty (save API cost)
    if not text or len(text.strip()) < 10:
        print(f"   ⏩ Skipping {section_name}: Text empty or too short.")
        # Return an empty instance with 0 confidence
        return model(section_confidence=0.0)

    print(f"   ⚡ Parsing {section_name.upper()} ({len(text)} chars)...")
    
    # 2. Bind the specific Pydantic model to the LLM
    structured_llm = llm.with_structured_output(model)
    
    try:
        # 3. Invoke with the specific Prompt from Cell 7
        response = structured_llm.invoke([
            SystemMessage(content=prompt),
            HumanMessage(content=f"SECTION TEXT:\n{text}")
        ])
        return response
        
    except Exception as e:
        print(f"   ❌ Error parsing {section_name}: {e}")
        # Return safe fallback on crash
        return model(section_confidence=0.0)

# ------------------------------------------------------------------
# 1. Personal
# ------------------------------------------------------------------
def parse_personal(text: str) -> ParsedPersonal:
    return _fetch_parsed_data("personal", text, ParsedPersonal, PROMPT_PERSONAL)

# ------------------------------------------------------------------
# 2. Education
# ------------------------------------------------------------------
def parse_education(text: str) -> ParsedEducation:
    return _fetch_parsed_data("education", text, ParsedEducation, PROMPT_EDUCATION)

# ------------------------------------------------------------------
# 3. Experience
# ------------------------------------------------------------------
def parse_experience(text: str) -> ParsedExperience:
    return _fetch_parsed_data("experience", text, ParsedExperience, PROMPT_EXPERIENCE)

# ------------------------------------------------------------------
# 4. Skills
# ------------------------------------------------------------------
def parse_skills(text: str) -> ParsedSkills:
    return _fetch_parsed_data("skills", text, ParsedSkills, PROMPT_SKILLS)

# ------------------------------------------------------------------
# 5. Projects
# ------------------------------------------------------------------
def parse_projects(text: str) -> ParsedProjects:
    return _fetch_parsed_data("projects", text, ParsedProjects, PROMPT_PROJECTS)

# ------------------------------------------------------------------
# 6. Awards
# ------------------------------------------------------------------
def parse_awards(text: str) -> ParsedAwards:
    return _fetch_parsed_data("awards", text, ParsedAwards, PROMPT_AWARDS)

# ---- QUICK TEST (Sanity Check) ----
# We test with the 'experience' text we extracted in Cell 6
if "resume_sections" in globals() and resume_sections["experience"]:
    test_exp = parse_experience(resume_sections["experience"])
    print("\n✅ Test Parse Experience Result:")
    print(json.dumps(test_exp.model_dump(), indent=2))
else:
    print("⚠️ No experience text found in global state to test.")

   ⚡ Parsing EXPERIENCE (1855 chars)...

✅ Test Parse Experience Result:
{
  "items": [
    {
      "companyName": {
        "value": "Edubuk",
        "confidence": 0.95
      },
      "jobRole": {
        "value": "AI Engineer Intern",
        "confidence": 0.95
      },
      "skills": {
        "value": "Azure, OpenAI, Elasticsearch",
        "confidence": 0.8
      },
      "description": {
        "value": "- Built an AI-powered JD\u2013CV matching system combining vector similarity for accuracy and LLM-based analysis for contextual understanding, enabling high-precision candidate\u2013job matching at scale.\n- Designed and deployed a serverless architecture on Azure using Azure Functions, including migration from an earlier AWS Lambda\u2013based serverless setup, ensuring improved scalability and operational consistency.\n- Implemented a Retrieval-Augmented pipeline using OpenAI text-embedding-3-large for semantic embeddings and GPT-4 for deep context evaluation between job 

## CELL 9 - LangGraph State Definition

In [46]:
# 🟦 CELL 9 - LangGraph State Definition

# This class defines the schema of the shared memory used by the graph.
# Every node receives this state, modifies it, and passes it on.

class GraphState(TypedDict):
    # 1. RAW INPUTS
    raw_text: str                   # The clean text from Cell 5
    
    # 2. INTERMEDIATE (The "Router" Output)
    sections: Dict[str, str]        # The 6 semantic buckets from Cell 6
    
    # 3. AI PARSING RESULTS (The Pydantic Models from Cell 3)
    # These start as None and get filled by the parallel workers
    personal: Optional[ParsedPersonal]
    education: Optional[ParsedEducation]
    experience: Optional[ParsedExperience]
    skills: Optional[ParsedSkills]
    projects: Optional[ParsedProjects]
    awards: Optional[ParsedAwards]
    
    # 4. ORCHESTRATION & CONTROL
    # Tracks how confident we are about each section
    confidence_map: Dict[str, float] 
    
    # Tracks how many times we have retried a specific section
    retry_counts: Dict[str, int]
    
    # 5. FINAL OUTPUT
    # The "Draft JSON" that matches newCv.model.ts (Cell 2)
    trucv_draft: Optional[Dict[str, Any]] 
    
    # 6. LOGS
    errors: List[str]               # Accumulates error messages
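
Cell 1 imports `Annotated` and `operator`, but the state above does not use them. As described in the LangGraph documentation, a state key annotated as `Annotated[type, reducer]` has concurrent writes combined by the reducer instead of overwritten, which matters for keys such as `errors` and `confidence_map` if the section workers run in parallel. The sketch below illustrates the idea with the standard library only. `SketchState`, `merge_dicts` and `apply_update` are hypothetical, and `apply_update` is a hand-rolled stand-in, not LangGraph's implementation.

```python
import operator
from typing import Annotated, Any, Dict, List, TypedDict, get_type_hints


def merge_dicts(left: Dict[str, float], right: Dict[str, float]) -> Dict[str, float]:
    """Reducer for dict-valued keys: keep both sides, right wins on conflicts."""
    return {**left, **right}


class SketchState(TypedDict, total=False):
    raw_text: str                                             # no reducer: last write wins
    errors: Annotated[List[str], operator.add]                # lists are concatenated
    confidence_map: Annotated[Dict[str, float], merge_dicts]  # dicts are merged


def apply_update(state: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Combine a partial update into the state, honoring reducer annotations."""
    hints = get_type_hints(SketchState, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        metadata = getattr(hints.get(key), "__metadata__", ())
        if metadata and key in merged:
            merged[key] = metadata[0](merged[key], value)
        else:
            merged[key] = value
    return merged


state: Dict[str, Any] = {"raw_text": "resume text", "errors": [], "confidence_map": {}}
# Two workers, each returning only its own slice of the state
state = apply_update(state, {"confidence_map": {"education": 0.85}})
state = apply_update(
    state,
    {"confidence_map": {"skills": 0.6}, "errors": ["skills: low confidence"]},
)
print(state["confidence_map"], state["errors"])
```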

## 🟦 CELL 10 - Confidence Scoring Engine

In [None]:
# 🟦 CELL 10 - Confidence Scoring Engine (Business Logic Layer)

def calculate_personal_score(personal: ParsedPersonal) -> (float, List[str]):
    """Calculates weighted confidence for Personal section."""
    score = 1.0
    warnings = []
    
    # CRITICAL FIELDS (Heavy Penalty)
    if not personal.fullName.value:
        score -= 0.4
        warnings.append("CRITICAL: Name is missing.")
    if not personal.email.value:
        score -= 0.3
        warnings.append("CRITICAL: Email is missing.")
    if not personal.phone.value:
        score -= 0.1
        warnings.append("WARNING: Phone number is missing.")
        
    # ENRICHMENT FIELDS (Light Penalty)
    if not personal.linkedin.value:
        score -= 0.05
        warnings.append("INFO: LinkedIn profile not found.")
    if not personal.city.value:
        score -= 0.05
    
    # Blend the rule-based score (70%) with the model's own confidence
    # in the name and email fields (30%)
    ai_conf = (personal.fullName.confidence + personal.email.confidence) / 2
    final_score = (score * 0.7) + (ai_conf * 0.3)
    
    return max(0.0, final_score), warnings

def calculate_education_score(education: ParsedEducation) -> tuple[float, List[str]]:
    """Checks for missing dates, institution names, and degrees."""
    if not education.items:
        return 0.0, ["WARNING: No education entries found."]
    
    total_score = 0.0
    warnings = []
    
    for i, item in enumerate(education.items):
        item_score = 1.0
        
        # Check Criticals
        if not item.institutionName.value:
            item_score -= 0.3
            warnings.append(f"Edu #{i+1}: Institution name missing.")
        if not item.boardNameOrDegree.value:
            item_score -= 0.2
            warnings.append(f"Edu #{i+1}: Degree/Board missing.")
            
        # Check Dates (Critical for Verification)
        if not item.duration_from.value:
            item_score -= 0.15
            warnings.append(f"Edu #{i+1}: Start date missing.")
            
        # Check GPA (Useful but not critical)
        if not item.gpa.value:
            warnings.append(f"Edu #{i+1}: GPA/Percentage missing.")
            
        total_score += item_score

    avg_score = total_score / len(education.items)
    return max(0.0, avg_score), warnings

def calculate_experience_score(experience: ParsedExperience) -> tuple[float, List[str]]:
    """Checks for missing company names, job roles, and short descriptions."""
    if not experience.items:
        # It's okay for freshers to have no experience, but we flag it
        return 1.0, ["INFO: No experience detected (Fresher?)."]
    
    total_score = 0.0
    warnings = []
    
    for i, item in enumerate(experience.items):
        item_score = 1.0
        
        if not item.companyName.value:
            item_score -= 0.3
            warnings.append(f"Exp #{i+1}: Company name missing.")
        if not item.jobRole.value:
            item_score -= 0.2
            warnings.append(f"Exp #{i+1}: Job role missing.")
            
        # Description check
        desc = item.description.value or ""
        if len(desc) < 20:
            item_score -= 0.1
            warnings.append(f"Exp #{i+1}: Description is too short or empty.")
            
        total_score += item_score

    avg_score = total_score / len(experience.items)
    return max(0.0, avg_score), warnings

# ---- MAIN NODE FUNCTION ----

def node_confidence_scoring(state: GraphState):
    print("📊 [Scoring] Analyzing extracted data quality...")
    
    confidence_map = {}
    all_warnings = []
    
    # 1. Personal
    if state["personal"]:
        p_score, p_warn = calculate_personal_score(state["personal"])
        confidence_map["personal"] = round(p_score, 2)
        all_warnings.extend(p_warn)
    else:
        confidence_map["personal"] = 0.0
        all_warnings.append("CRITICAL: Personal section failed to parse.")

    # 2. Education
    if state["education"]:
        e_score, e_warn = calculate_education_score(state["education"])
        confidence_map["education"] = round(e_score, 2)
        all_warnings.extend(e_warn)
        
    # 3. Experience
    if state["experience"]:
        exp_score, exp_warn = calculate_experience_score(state["experience"])
        confidence_map["experience"] = round(exp_score, 2)
        all_warnings.extend(exp_warn)
        
    # (We can add Skills/Projects scoring similarly, but these 3 are core)
    
    # Update state
    # Copy the existing errors before appending so the incoming state is not mutated in place
    current_errors = list(state.get("errors", []))
    current_errors.extend(all_warnings)
    
    print(f"   Scores: {confidence_map}")
    if all_warnings:
        print(f"   Warnings: {len(all_warnings)} detected.")

    return {
        "confidence_map": confidence_map,
        "errors": current_errors
    }
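
As a worked example of the arithmetic in `calculate_personal_score`, the sketch below applies the same penalties and the same 70/30 blend to stand-in data. `DemoField`, `PENALTIES`, and `blended_personal_score` are hypothetical names; `DemoField` only assumes that a parsed field carries a `value` and a `confidence`, as the scoring code above implies.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DemoField:
    """Hypothetical stand-in for ParsedField: a value plus the model's confidence."""
    value: Optional[str]
    confidence: float


# Penalties copied from calculate_personal_score
PENALTIES = {"fullName": 0.4, "email": 0.3, "phone": 0.1, "linkedin": 0.05, "city": 0.05}


def blended_personal_score(fields: Dict[str, DemoField]) -> float:
    rule_score = 1.0
    for name, penalty in PENALTIES.items():
        if not fields[name].value:
            rule_score -= penalty
    # Model confidence is averaged over name and email only
    ai_conf = (fields["fullName"].confidence + fields["email"].confidence) / 2
    return max(0.0, rule_score * 0.7 + ai_conf * 0.3)


fields = {
    "fullName": DemoField("Jane Doe", 1.0),
    "email": DemoField("jane@example.com", 0.9),
    "phone": DemoField("+1 555 0100", 0.9),
    "linkedin": DemoField(None, 0.0),        # missing: costs 0.05
    "city": DemoField("Springfield", 0.8),
}
score = blended_personal_score(fields)
print(round(score, 2))  # 0.95
```

Here the rule-based part is 1.0 - 0.05 = 0.95 and the model part is (1.0 + 0.9) / 2 = 0.95, so the blend is 0.95 * 0.7 + 0.95 * 0.3 = 0.95.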

## CELL 11 - Retry Low-Confidence Sections

In [None]:
# 🟦 CELL 11 - Retry Logic (Self-Correction Layer)

def node_retry_logic(state: GraphState):
    """
    Checks confidence scores. If any section is below threshold,
    it triggers an immediate re-parse with a stricter prompt.
    """
    print("🔄 [Retry] Evaluating need for self-correction...")
    
    # Get current state data
    conf_map = state.get("confidence_map", {})
    sections = state["sections"]
    
    # We will build a dictionary of updates
    updates = {}
    new_logs = []
    
    # List of sections to check
    # (Matches keys in state and confidence_map)
    check_list = ["personal", "education", "experience", "skills", "projects", "awards"]
    
    for section in check_list:
        score = conf_map.get(section, 0.0)
        
        # 1. Retry only scores that are low but non-zero (0.0 means the section is empty or was not scored)
        if 0.0 < score < CONFIDENCE_THRESHOLD:
            print(f"   ⚠️ Low confidence in '{section}' ({score}). Attempting Retry...")
            
            # 2. Get the specific text again
            text = sections.get(section, "")
            
            # 3. Construct a "Repair Prompt"
            # We add a specific instruction to be more careful
            repair_instruction = f"""
            PREVIOUS ATTEMPT FAILED. 
            The confidence score was low ({score}).
            
            CRITICAL INSTRUCTION:
            - Look closer for missing dates or names.
            - If the text is truly empty/missing, return null values with 0 confidence.
            - Do not guess.
            """
            
            # 4. Look up the target model and base prompt for this section
            retry_targets = {
                "personal": (ParsedPersonal, PROMPT_PERSONAL),
                "education": (ParsedEducation, PROMPT_EDUCATION),
                "experience": (ParsedExperience, PROMPT_EXPERIENCE),
                "skills": (ParsedSkills, PROMPT_SKILLS),
                "projects": (ParsedProjects, PROMPT_PROJECTS),
                "awards": (ParsedAwards, PROMPT_AWARDS),
            }
            model_cls, base_prompt = retry_targets[section]
            # Append the repair instruction to the original prompt and re-parse
            updates[section] = _fetch_parsed_data(section, text, model_cls, base_prompt + repair_instruction)
            
            new_logs.append(f"RETRY: Re-parsed {section} (Prev Score: {score})")
            
        else:
            # Score is good, or section is empty (0.0). No action.
            pass

    if not updates:
        print("   ✅ No retries needed. Quality is sufficient.")
    else:
        print(f"   ✅ Retried {len(updates)} sections.")
        # Append the retry log to a copy of the existing errors (avoids mutating state in place)
        current_errors = list(state.get("errors", []))
        current_errors.extend(new_logs)
        updates["errors"] = current_errors

    return updates
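
`GraphState` carries `retry_counts` and Cell 1 defines `MAX_RETRIES_PER_SECTION`, but `node_retry_logic` above does not read either; the retry is bounded only because the node runs once in a linear graph. If the node were ever placed in a loop, a budget check along the following lines could keep it bounded. This is a minimal sketch: `select_retries` is a hypothetical helper, and the two constants are redefined here only so the block runs on its own.

```python
from typing import Dict, List, Tuple

CONFIDENCE_THRESHOLD = 0.65      # same values as Cell 1
MAX_RETRIES_PER_SECTION = 1


def select_retries(
    conf_map: Dict[str, float],
    retry_counts: Dict[str, int],
) -> Tuple[List[str], Dict[str, int]]:
    """Return the sections to re-parse and an updated copy of retry_counts."""
    to_retry: List[str] = []
    new_counts = dict(retry_counts)
    for section, score in conf_map.items():
        is_low = 0.0 < score < CONFIDENCE_THRESHOLD
        has_budget = new_counts.get(section, 0) < MAX_RETRIES_PER_SECTION
        if is_low and has_budget:
            to_retry.append(section)
            new_counts[section] = new_counts.get(section, 0) + 1
    return to_retry, new_counts


# First pass: 'personal' is low and still has budget; 'skills' at 0.0 is skipped
first, counts = select_retries({"personal": 0.5, "education": 0.9, "skills": 0.0}, {})
# Second pass: 'personal' is still low, but its budget is spent
second, counts = select_retries({"personal": 0.6, "education": 0.9}, counts)
print(first, second, counts)
```

The node would then return the new counts under `retry_counts` alongside the re-parsed sections.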

## 🟦 CELL 12 - Assemble TruCV Draft (Deterministic Mapping Layer)

In [None]:
# 🟦 CELL 12 - Assemble TruCV Draft (Deterministic Mapping Layer)

def safe_val(field: Optional[ParsedField]) -> str:
    """Helper: Extracts string value from ParsedField, defaulting to empty string."""
    if field and field.value:
        return field.value.strip()
    return ""

def create_duration(start: Optional[ParsedField], end: Optional[ParsedField]) -> Duration:
    """Helper: Creates a TruCV Duration object."""
    return Duration(
        from_=safe_val(start),
        to=safe_val(end)
    )

def node_assemble_draft(state: GraphState):
    print("🏗️ [Assembly] Mapping AI data to TruCV Schema...")
    
    # 1. SETUP DEFAULTS
    # In a real app, userId comes from the request context
    user_id_mock = "user_12345_mock_id" 
    cv_title = "Uploaded Resume Draft"

    # 2. MAP PERSONAL SECTION
    p_data = state.get("personal")
    personal_obj = PersonalInfo(
        fullName=safe_val(p_data.fullName) if p_data else "",
        email=safe_val(p_data.email) if p_data else "",
        phone=safe_val(p_data.phone) if p_data else "",
        city=safe_val(p_data.city) if p_data else "",
        linkedin=safe_val(p_data.linkedin) if p_data else "",
        github=safe_val(p_data.github) if p_data else "",
        summary=safe_val(p_data.summary) if p_data else "",
        imgUrl="" # AI doesn't extract images yet
    )

    # 3. MAP EDUCATION
    educations_list = []
    if state.get("education") and state["education"].items:
        for item in state["education"].items:
            educations_list.append(Education(
                id=str(uuid.uuid4()), # Generate distinct ID for frontend keys
                eduDocId=str(uuid.uuid4()), # Placeholder for DB requirement
                level=safe_val(item.level),
                boardNameOrDegree=safe_val(item.boardNameOrDegree),
                institutionName=safe_val(item.institutionName),
                gpa=safe_val(item.gpa),
                duration=create_duration(item.duration_from, item.duration_to),
                # Defaults enforced by Schema in Cell 2
                selfAttested=True,
                verified=False,
                status="pending"
            ))

    # 4. MAP EXPERIENCE
    experiences_list = []
    if state.get("experience") and state["experience"].items:
        for item in state["experience"].items:
            experiences_list.append(Experience(
                id=str(uuid.uuid4()),
                companyName=safe_val(item.companyName),
                jobRole=safe_val(item.jobRole),
                description=safe_val(item.description),
                skills=safe_val(item.skills), # AI returns comma-separated string, matching DB
                duration=create_duration(item.duration_from, item.duration_to),
                selfAttested=True,
                verified=False,
                status="pending"
            ))

    # 5. MAP SKILLS
    skills_list = []
    if state.get("skills") and state["skills"].items:
        for item in state["skills"].items:
            skills_list.append(Skill(
                id=str(uuid.uuid4()),
                skillName=safe_val(item.skillName),
                level=safe_val(item.level) or "Intermediate",
                selfAttested=True
            ))

    # 6. MAP PROJECTS
    projects_list = []
    if state.get("projects") and state["projects"].items:
        for item in state["projects"].items:
            projects_list.append(Project(
                id=str(uuid.uuid4()),
                projectName=safe_val(item.projectName),
                projectUrl=safe_val(item.projectUrl),
                description=safe_val(item.description),
                skills=safe_val(item.skills),
                duration=create_duration(item.duration_from, item.duration_to),
                selfAttested=True
            ))

    # 7. MAP AWARDS
    awards_list = []
    if state.get("awards") and state["awards"].items:
        for item in state["awards"].items:
            awards_list.append(Award(
                id=str(uuid.uuid4()),
                name=safe_val(item.name),
                organisation=safe_val(item.organisation),
                level=safe_val(item.level),
                description=safe_val(item.description),
                duration=create_duration(item.duration_from, item.duration_to),
                selfAttested=True,
                verified=False,
                status="pending"
            ))

    # 8. CONSTRUCT ROOT OBJECT, VALIDATE & RETURN
    # Pydantic validates when the model is constructed, so TruCVDraft(...) must sit
    # inside the try block for the ValidationError handler to take effect.
    try:
        final_draft = TruCVDraft(
            userId=user_id_mock,
            title=cv_title,
            personal=personal_obj,
            educations=educations_list,
            experiences=experiences_list,
            skills=skills_list,
            projects=projects_list,
            awards=awards_list
        )
        draft_dict = final_draft.model_dump(by_alias=True)
        print(f"✅ Assembly Successful! Generated {len(educations_list)} Edu, {len(experiences_list)} Exp entries.")
        return {"trucv_draft": draft_dict}
    except ValidationError as e:
        print(f"❌ Validation Error during assembly: {e}")
        # Keep earlier warnings instead of overwriting them with the single new error
        return {"errors": list(state.get("errors", [])) + [str(e)]}
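
`safe_val` calls `.strip()` directly on `field.value`, which works as long as every parsed value is a string. The definition of `ParsedField` is outside this cell, so whether a number can appear there (for example a GPA returned as `8` rather than `"8"`) is an assumption. Under that assumption, a variant such as the hypothetical `safe_text` below coerces the value before stripping; `DemoField` is a stand-in, not the notebook's model.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class DemoField:
    """Hypothetical stand-in for ParsedField."""
    value: Any = None
    confidence: float = 0.0


def safe_text(field: Optional[DemoField]) -> str:
    """Return the field's value as stripped text, or '' when nothing was extracted."""
    if field is None or field.value is None:
        return ""
    return str(field.value).strip()


print([safe_text(None), safe_text(DemoField("  BBD University ")), safe_text(DemoField(8))])
```

One behavioral difference from `safe_val`: a value of `0` becomes `"0"` here instead of an empty string, since only `None` is treated as missing.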

## 🟦 CELL 13 - Final Graph Compilation & Execution

In [54]:
# 🟦 CELL 13 - Final Graph Compilation & Execution
# This runs AFTER the logic cells (8, 10, 11, 12) are defined.

# ==========================================
# 1. DEFINE PARSER NODES (Wrappers)
# ==========================================
# These wrapper functions bridge the gap between "GraphState" and your "Pure Functions"

def node_detect_sections(state: GraphState):
    print("🚦 [Router] Splitting resume into sections...")
    return {"sections": detect_sections_semantically(state["raw_text"])}

def node_parse_personal(state: GraphState):
    print("   👤 [Personal] Parsing...")
    return {"personal": parse_personal(state["sections"].get("personal", ""))}

def node_parse_education(state: GraphState):
    print("   🎓 [Education] Parsing...")
    return {"education": parse_education(state["sections"].get("education", ""))}

def node_parse_experience(state: GraphState):
    print("   💼 [Experience] Parsing...")
    return {"experience": parse_experience(state["sections"].get("experience", ""))}

def node_parse_skills(state: GraphState):
    print("   🛠️ [Skills] Parsing...")
    return {"skills": parse_skills(state["sections"].get("skills", ""))}

def node_parse_projects(state: GraphState):
    print("   🚀 [Projects] Parsing...")
    return {"projects": parse_projects(state["sections"].get("projects", ""))}

def node_parse_awards(state: GraphState):
    print("   🏆 [Awards] Parsing...")
    return {"awards": parse_awards(state["sections"].get("awards", ""))}

# NOTE: node_confidence_scoring, node_retry_logic, and node_assemble_draft
# are ALREADY defined in Cells 10, 11, and 12. We use them directly below.

# ==========================================
# 2. BUILD THE GRAPH
# ==========================================

workflow = StateGraph(GraphState)

# A. Add All Nodes
workflow.add_node("detect_sections", node_detect_sections)
workflow.add_node("parse_personal", node_parse_personal)
workflow.add_node("parse_education", node_parse_education)
workflow.add_node("parse_experience", node_parse_experience)
workflow.add_node("parse_skills", node_parse_skills)
workflow.add_node("parse_projects", node_parse_projects)
workflow.add_node("parse_awards", node_parse_awards)

# The Logic Nodes from Cells 10, 11, 12
workflow.add_node("confidence_scoring", node_confidence_scoring)
workflow.add_node("retry_logic", node_retry_logic)
workflow.add_node("assemble_draft", node_assemble_draft)

# B. Wire the Edges
workflow.add_edge(START, "detect_sections")

# Fan-Out
workflow.add_edge("detect_sections", "parse_personal")
workflow.add_edge("detect_sections", "parse_education")
workflow.add_edge("detect_sections", "parse_experience")
workflow.add_edge("detect_sections", "parse_skills")
workflow.add_edge("detect_sections", "parse_projects")
workflow.add_edge("detect_sections", "parse_awards")

# Fan-In
workflow.add_edge("parse_personal", "confidence_scoring")
workflow.add_edge("parse_education", "confidence_scoring")
workflow.add_edge("parse_experience", "confidence_scoring")
workflow.add_edge("parse_skills", "confidence_scoring")
workflow.add_edge("parse_projects", "confidence_scoring")
workflow.add_edge("parse_awards", "confidence_scoring")

# Linear Finish
workflow.add_edge("confidence_scoring", "retry_logic")
workflow.add_edge("retry_logic", "assemble_draft")
workflow.add_edge("assemble_draft", END)

# C. Compile
app = workflow.compile()

# ==========================================
# 3. FINAL EXECUTION
# ==========================================
print("\n🚀 STARTING FINAL TRUCV PIPELINE...")

final_state = app.invoke({
    "raw_text": normalized_text,
    "retry_counts": {}, 
    "errors": []
})

print("\n🏁 PIPELINE FINISHED.")

# ==========================================
# 4. DISPLAY RESULT
# ==========================================
import json

if final_state.get("trucv_draft"):
    print("\n✅ FINAL JSON OUTPUT (Ready for MongoDB):")
    print(json.dumps(final_state["trucv_draft"], indent=2))
    
    # Save to file for inspection
    with open("trucv_draft.json", "w", encoding="utf-8") as f:
        json.dump(final_state["trucv_draft"], f, indent=2)
    print("\n💾 Saved to 'trucv_draft.json'")
else:
    print("\n❌ Errors prevented draft creation:")
    print(final_state.get("errors", []))


🚀 STARTING FINAL TRUCV PIPELINE...
🚦 [Router] Splitting resume into sections...
🧠 Detecting sections semantically (No Regex)...
   🏆 [Awards] Parsing...
   ⚡ Parsing AWARDS (265 chars)...
   🎓 [Education] Parsing...
   ⚡ Parsing EDUCATION (219 chars)...
   💼 [Experience] Parsing...
   ⚡ Parsing EXPERIENCE (1855 chars)...
   👤 [Personal] Parsing...
   ⚡ Parsing PERSONAL (102 chars)...
   🚀 [Projects] Parsing...
   ⚡ Parsing PROJECTS (1427 chars)...
   🛠️ [Skills] Parsing...
   ⚡ Parsing SKILLS (980 chars)...
📊 [Scoring] Analyzing extracted data quality...
   Scores: {'personal': 0.96, 'education': 1.0, 'experience': 1.0}
🔄 [Retry] Evaluating need for self-correction...
   ✅ No retries needed. Quality is sufficient.
🏗️ [Assembly] Mapping AI data to TruCV Schema...
✅ Assembly Successful! Generated 2 Edu, 3 Exp entries.

🏁 PIPELINE FINISHED.

✅ FINAL JSON OUTPUT (Ready for MongoDB):
{
  "userId": "user_12345_mock_id",
  "title":

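The graph in Cell 13 fans out from `detect_sections` to six parser nodes and fans back in at `confidence_scoring`; each parser returns a partial update keyed by its own section, so the branches never write to the same key. The sketch below imitates that shape with the standard library only. It is not LangGraph, and `detect`, `make_parser`, and `run_pipeline` are hypothetical stand-ins for the notebook's router and parsers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def detect(raw_text: str) -> Dict[str, str]:
    """Hypothetical router: splits 'name: text' lines into section buckets."""
    sections: Dict[str, str] = {}
    for line in raw_text.splitlines():
        name, _, body = line.partition(":")
        sections[name.strip()] = body.strip()
    return sections


def make_parser(section: str) -> Callable[[Dict[str, str]], Dict[str, str]]:
    """Hypothetical parser node: returns a partial update keyed by its section."""
    def parse(sections: Dict[str, str]) -> Dict[str, str]:
        return {section: sections.get(section, "").upper()}
    return parse


def run_pipeline(raw_text: str) -> Dict[str, str]:
    sections = detect(raw_text)
    parsers = [make_parser(name) for name in ("personal", "education", "skills")]
    state: Dict[str, str] = {}
    with ThreadPoolExecutor() as pool:
        # Fan-out: every parser receives the same sections dict.
        # Fan-in: partial updates are merged; keys are disjoint, so nothing is overwritten.
        for update in pool.map(lambda parser: parser(sections), parsers):
            state.update(update)
    return state  # a downstream scoring step would only start from here


result = run_pipeline("personal: jane doe\neducation: bca\nskills: python")
print(result)
```
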
## CELL 14 - Final Response Object & Schema Audit

In [None]:
# 🟦 CELL 14 - Final Response Object & Schema Audit

import json

# 1. Construct the Final API Response
# This mimics the JSON payload your Azure Function will return to the UI
api_response = {
    "status": "draft_created" if final_state.get("trucv_draft") else "failed",
    # `or {}` also covers the case where the key exists but holds None
    "trucvDraft": final_state.get("trucv_draft") or {},
    "confidenceMeta": final_state.get("confidence_map", {}),
    "warnings": final_state.get("errors", [])
}

# 2. Display the Payload
print("\n📦 FINAL API RESPONSE PAYLOAD:")
print(json.dumps(api_response, indent=2))

# ====================================================
# 3. AUTOMATED COMPATIBILITY CHECK (The "Sanity Check")
# ====================================================
print("\n🔍 RUNNING SCHEMA COMPATIBILITY AUDIT...")

draft = api_response["trucvDraft"]
issues = []

# CHECK 1: Root Keys
required_root_keys = ["personal", "educations", "experiences", "skills", "projects", "awards"]
for key in required_root_keys:
    if key not in draft:
        issues.append(f"❌ Root key missing: '{key}'")

# CHECK 2: Personal Section Details
if "personal" in draft:
    p = draft["personal"]
    # Check a few critical fields required by Mongoose
    if "fullName" not in p: issues.append("❌ Personal: 'fullName' is missing")
    if "email" not in p: issues.append("❌ Personal: 'email' is missing")

# CHECK 3: Education Structure (Array + IDs); only the first entry is inspected
if draft.get("educations"):
    edu = draft["educations"][0]
    if "id" not in edu: issues.append("❌ Education: 'id' (UUID) is missing")
    if "eduDocId" not in edu: issues.append("❌ Education: 'eduDocId' is missing (Backend requires this)")
    if "status" not in edu: issues.append("❌ Education: 'status' is missing")
    if "duration" not in edu or "from" not in edu["duration"]:
        issues.append("❌ Education: 'duration' structure is incorrect")

# CHECK 4: Experience Structure; only the first entry is inspected
if draft.get("experiences"):
    exp = draft["experiences"][0]
    if "companyName" not in exp: issues.append("❌ Experience: 'companyName' missing")
    if "skills" not in exp: issues.append("❌ Experience: 'skills' field missing")
    # Verify strict status enum
    if exp.get("status") not in ["pending", "verified"]:
        issues.append(f"❌ Experience: Invalid status '{exp.get('status')}' (Must be 'pending' or 'verified')")

# VERDICT
if not issues:
    print("✅ SUCCESS: Generated JSON passed all compatibility checks against newCv.model.ts")
    print("   Ready for Database Insertion.")
else:
    print("⚠️ COMPATIBILITY ISSUES DETECTED:")
    for issue in issues:
        print(issue)


📦 FINAL API RESPONSE PAYLOAD:
{
  "status": "draft_created",
  "trucvDraft": {
    "userId": "user_12345_mock_id",
    "title": "Uploaded Resume Draft",
    "personal": {
      "fullName": "Ganesh Agrahari",
      "email": "ganeshagrahari08@gmail.com",
      "phone": "+91 9044232872",
      "city": "Lucknow, India",
      "linkedin": "",
      "github": "",
      "summary": "",
      "imgUrl": ""
    },
    "educations": [
      {
        "id": "e6f86a6d-8b55-4a2f-ad5a-43e0be5e0e29",
        "eduDocId": "f9af1c3c-dced-4d26-ac3f-5cf73b32a945",
        "level": "Undergraduate",
        "boardNameOrDegree": "BCA Data Science & Artificial Intelligence",
        "institutionName": "BBD University",
        "gpa": "8",
        "duration": {
          "from": "2023-08",
          "to": "2026-09"
        },
        "selfAttested": true,
        "docUri": null,
        "issuerEmailId": "",
        "isEmailSend": false,
        "verified": false,
        "status": "pending"
      },
      {
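
The audit in Cell 14 inspects only the first education and the first experience entry. A data-driven variant can apply the same required-key lists to every item. The sketch below is one way to do that; `REQUIRED_ITEM_KEYS` and `audit_items` are hypothetical names, and the key lists are copied from CHECK 3 and CHECK 4 rather than from `newCv.model.ts`, which is not shown in this notebook.

```python
from typing import Any, Dict, List

# Required keys per list section, taken from CHECK 3 and CHECK 4 above
REQUIRED_ITEM_KEYS = {
    "educations": ["id", "eduDocId", "status", "duration"],
    "experiences": ["companyName", "skills", "status"],
}


def audit_items(draft: Dict[str, Any]) -> List[str]:
    """Report every item in every listed section that lacks a required key."""
    issues: List[str] = []
    for section, keys in REQUIRED_ITEM_KEYS.items():
        for index, item in enumerate(draft.get(section) or [], start=1):
            for key in keys:
                if key not in item:
                    issues.append(f"{section} #{index}: '{key}' is missing")
    return issues


demo = {
    "educations": [
        {"id": "a", "eduDocId": "b", "status": "pending", "duration": {}},
        {"id": "c", "status": "pending", "duration": {}},   # no eduDocId
    ],
    "experiences": [],
}
print(audit_items(demo))
```

In this demo the second education entry is the one that fails, which the first-item-only check would not catch.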


In [57]:
# 🟦 CELL 15 - Manual Sanity Check & Final Verdict
# This cell checks that the output isn't just "valid JSON", but "TruCV Compliant".

print("🛡️ RUNNING TRUCV COMPLIANCE CHECKS...\n")

draft = api_response["trucvDraft"]
meta = api_response["confidenceMeta"]
warnings = api_response["warnings"]

# 1. VERIFY "TRUST NO ONE" POLICY
# Rule: AI must never mark data as 'verified'.
verified_flags = []
if "educations" in draft:
    verified_flags.extend([e["verified"] for e in draft["educations"]])
if "experiences" in draft:
    verified_flags.extend([e["verified"] for e in draft["experiences"]])

if any(verified_flags):
    print("❌ SECURITY FAILURE: AI marked some data as 'verified'. This is dangerous.")
else:
    print("✅ TRUST CHECK PASSED: All data is correctly marked 'unverified'.")

# 2. VERIFY DATA INTEGRITY
# Rule: We need at least one contact method and one content section.
# .get() keeps this check from raising KeyError when the draft is empty (status "failed").
personal = draft.get("personal") or {}
has_contact = bool(personal.get("email") or personal.get("phone"))
has_content = bool(draft.get("educations") or draft.get("experiences") or draft.get("projects"))

if has_contact and has_content:
    print("✅ DATA CHECK PASSED: Draft contains contact info + content.")
else:
    print("⚠️ DATA WARNING: Result looks empty. Check text extraction.")

# 3. VERIFY CONFIDENCE TRANSPARENCY
# Rule: We must expose how confident we are.
if meta:
    print(f"✅ TRANSPARENCY PASSED: Confidence scores generated for {list(meta.keys())}.")
else:
    print("❌ FAILURE: No confidence metadata found.")

# 4. FINAL HUMAN REPORT
print("\n📊 FINAL SUMMARY REPORT")
print(f"   - Status:      {api_response['status']}")
print(f"   - ID:          {draft.get('userId', 'N/A')}")
print(f"   - Warnings:    {len(warnings)} detected")
if warnings:
    print(f"     First Warning: '{warnings[0]}'")

print("\n🎯 FINAL VERDICT:")
if not any(verified_flags) and has_contact and has_content:
    print("   🚀 PROTOTYPE SUCCESSFUL. READY FOR AZURE MIGRATION.")
else:
    print("   🛑 PROTOTYPE NEEDS REFINEMENT.")

🛡️ RUNNING TRUCV COMPLIANCE CHECKS...

✅ TRUST CHECK PASSED: All data is correctly marked 'unverified'.
✅ DATA CHECK PASSED: Draft contains contact info + content.
✅ TRANSPARENCY PASSED: Confidence scores generated for ['personal', 'education', 'experience'].

📊 FINAL SUMMARY REPORT
   - Status:      draft_created
   - ID:          user_12345_mock_id

🎯 FINAL VERDICT:
   🚀 PROTOTYPE SUCCESSFUL. READY FOR AZURE MIGRATION.
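
The trust check in Cell 15 reads the `verified` flag on educations and experiences only, although Cell 12 also sets `verified` and `status` on awards. A broader scan could look like the sketch below; `find_verified` is a hypothetical helper, and treating `status == "verified"` as a violation is an assumption that follows from the "AI must never mark data as 'verified'" rule stated above.

```python
from typing import Any, Dict, List, Tuple


def find_verified(
    draft: Dict[str, Any],
    sections: Tuple[str, ...] = ("educations", "experiences", "awards"),
) -> List[str]:
    """List every item that the draft marks as verified, by section and position."""
    hits: List[str] = []
    for section in sections:
        for index, item in enumerate(draft.get(section) or [], start=1):
            if item.get("verified") or item.get("status") == "verified":
                hits.append(f"{section} #{index}")
    return hits


demo_draft = {
    "educations": [{"verified": False, "status": "pending"}],
    "experiences": [{"verified": False, "status": "pending"}],
    "awards": [{"verified": True, "status": "pending"}],
}
print(find_verified(demo_draft))
```

An empty result would correspond to the "TRUST CHECK PASSED" branch; a non-empty one names the offending entries instead of reporting only that some exist.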
