<a href="https://colab.research.google.com/github/framunoz/cv-analyser-with-rag/blob/feature%2Fkaggle-colab-notebooks/notebooks/colab-ai-driven-cv-optimisation-with-rag.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 📝 Optimize Your CV with AI to Beat the ATS Filters

## 1. Welcome! What Does This Notebook Do?

**Applying for jobs often feels like sending your CV into a black hole, right?** Many companies use Applicant Tracking Systems (ATS) – automated software that scans your CV for keywords and relevant experience *before* a human ever sees it. If your CV doesn't match what the ATS is looking for, it might get filtered out, even if you're a great fit for the role!

**This notebook is your personal AI assistant to help you get noticed.** It guides you through tailoring parts of your CV to better match a specific job description, increasing your chances of passing the ATS scan and impressing the hiring manager.

**Here's how it works in simple terms:**

1.  **You Provide:** You'll give the notebook your CV (as a PDF or JSON file in Google Drive) and the text of the job description you're interested in.
2.  **AI Reads & Understands:** The AI reads your CV to understand your skills and experience. It creates a special "smart index" that captures the *meaning* of your experience, not just the words.
3.  **AI Finds Matches:** Using the job description, the AI searches its "smart index" to find the parts of your CV (like specific jobs or projects) that are the **most relevant** matches for *that specific role*.
4.  **AI Helps Rewrite:** You'll chat with the AI assistant. It will suggest ways to rewrite the descriptions of your most relevant experiences, helping you incorporate important keywords from the job description naturally. This makes your CV more ATS-friendly and impactful.
5.  **You Review & Use:** You review the AI's suggestions and copy the improved text snippets into your actual CV document.

**Who is this for?**

Anyone applying for jobs who wants a smarter, faster way to customize their CV for each application. **No programming or AI knowledge is needed!** Just follow the steps.

**Ready to get started?** We'll guide you through each step below.

## 2. Setup: Getting Ready

This section gets the notebook ready to work. We'll install some necessary software tools and then guide you on how to securely connect the notebook to Google's AI services using your own unique key (called an API key).

### 2.1. Install Necessary Tools

This next code cell installs some extra software packages that this notebook needs to do its job. These tools help it read PDF files, connect to Google's AI services, manage the "smart index" of your CV, and handle files.

**➡️ Just click the Play button (►) on the cell below to run it.** It might take a minute or two to finish installing everything. You'll see some messages scroll by, and it's done when you see a "✅ Libraries installed successfully!" message.

In [1]:
# @title Install Required Libraries (Click the ► button to run)
# This command installs the necessary Python packages for the notebook.
# It might take a minute or two to complete.
# The '-q' flag makes the output less noisy (quieter).
# ruff: noqa: T201, F811

# Uninstall conflicting library (optional step from original notebook, may help in Colab)
!pip uninstall -qqy jupyterlab

# Install Google AI, PDF reader, Vector DB, GDrive downloader (if used), YAML handler
# We install needed versions of google-genai and typing_extensions for compatibility.
!pip install -q google-genai pdfplumber chromadb gdown PyYAML typing-extensions==4.12.0

print("✅ Libraries installed successfully!")

[0m✅ Libraries installed successfully!


### 2.2. Add Your Secure API Key

To use Google's powerful AI (like the Gemini model that powers this notebook), you need a personal "API Key". Think of it like a secure password that gives this notebook permission to use the AI service on your behalf. It's important to keep this key private.

Google Colab provides a secure way to store this key using the **Secrets Manager**.

**➡️ How to Add Your Key (Choose ONE method):**

Look for the **🔑 (key) icon** in the left sidebar of Colab. Click it to open the Secrets panel.

* **Method 1: Import from Google AI Studio (Recommended if you already have a key there)**
    1.  In the Secrets panel (🔑), find the dropdown menu labeled **"Gemini API keys"**.
    2.  Select **"Import key from Google AI Studio"**.
    3.  Follow the on-screen prompts to sign in and import your key. Colab should automatically name it `GOOGLE_API_KEY`. Make sure the **"Notebook access" toggle is turned ON** (☑️) for this secret.

* **Method 2: Add Manually**
    1.  If you don't have a key in AI Studio, you can get one from [Google AI Studio](https://aistudio.google.com/app/apikey) (or Google Cloud Console if you use that). Click "Create API key".
    2.  In Colab's Secrets panel (🔑), click **"+ Add new secret"**.
    3.  For the **Name**, enter **exactly** `GOOGLE_API_KEY`.
    4.  Paste your generated API key into the **Value** field.
    5.  **IMPORTANT:** Make sure the **"Notebook access" toggle is turned ON** (☑️) for this secret.
    6.  Close the Secrets panel.

**After adding the secret using one of these methods, run the next code cell (click ►) to load it.**

In [2]:
# @title Load Google AI API Key from Colab Secrets (Click ► to run)
from google.colab import userdata
import sys  # Needed to potentially stop execution

# Simplified error messages for easier understanding
_API_KEY_ERROR_MESSAGES = {
    "NOT_FOUND": """
❌ **Secret 'GOOGLE_API_KEY' not found.**
   Did you add the secret using the 🔑 icon in the left sidebar?
   Make sure the name is exactly 'GOOGLE_API_KEY' and that 'Notebook access' is turned ON (☑️).
    """,
    "EMPTY": """
❌ **Secret 'GOOGLE_API_KEY' was found but is empty.**
   Please check the value you pasted into the secret using the 🔑 icon. It should not be blank.
    """,
    "IMPORT_ERROR": """
❌ **Could not access Colab Secrets.**
   This notebook is designed for Google Colab. If you are in Colab, try restarting the runtime (Runtime -> Restart runtime) and run the cells again.
    """,
    "UNKNOWN": """
❌ **An unexpected error occurred while loading the API key.**
   Details: {error}
   You might need to check your Colab Secrets setup again.
    """,
}

try:
    # Attempt to retrieve the secret named 'GOOGLE_API_KEY'
    GOOGLE_API_KEY = userdata.get("GOOGLE_API_KEY")

    if GOOGLE_API_KEY is None:
        # Secret not found or access not enabled
        print(_API_KEY_ERROR_MESSAGES["NOT_FOUND"])
        # Stop execution if key not found
        sys.exit("Stopping: API Key is required.")
    elif not GOOGLE_API_KEY:  # Checks for empty string
        # Secret found but has no value
        print(_API_KEY_ERROR_MESSAGES["EMPTY"])
        # Stop execution if key is empty
        sys.exit("Stopping: API Key cannot be empty.")
    else:
        # Key loaded successfully
        print("✅ GOOGLE_API_KEY loaded successfully from Colab Secrets.")
        # Optional: Uncomment below to verify the first few chars (for debugging)
        # print(f"(Key starts with: {GOOGLE_API_KEY[:4]}...)")

except ImportError:
    # Handle case where userdata is not available (running outside Colab)
    print(_API_KEY_ERROR_MESSAGES["IMPORT_ERROR"])
    GOOGLE_API_KEY = None
    sys.exit("Stopping: Notebook must run in Colab to use Secrets.")
except Exception as e:
    # Handle any other unexpected errors during secret retrieval
    print(_API_KEY_ERROR_MESSAGES["UNKNOWN"].format(error=e))
    GOOGLE_API_KEY = None
    sys.exit(f"Stopping due to error loading API Key: {e}")

# If the code reaches here without exiting, the key is loaded and valid (not None or empty).

✅ GOOGLE_API_KEY loaded successfully from Colab Secrets.


### 2.3. Helper Functions (Retry Logic)

Sometimes, when contacting the Google AI service, it might be temporarily busy or unavailable (like getting a busy signal on a phone line). To handle this smoothly, the next code cell defines a small helper function.

This helper allows the notebook to automatically try contacting the service again a few times if it encounters these specific temporary issues. This makes the whole process more reliable without you having to manually re-run things for minor glitches. You don't need to worry about the details of the code itself.

In [3]:
# @title Define Helper Function for API Retries (click to expand)
from google import genai
from google.api_core import retry


def is_retriable(exception: Exception) -> bool:
    """Checks if an exception is a known temporary/retriable Google API Error."""
    return isinstance(exception, genai.errors.APIError) and exception.code in {429, 503}

## 3. Your Information & Job Details

In [4]:
# @title Step 3.1: Connect Google Drive & Specify Folder (Click ►)
from google.colab import drive
from pathlib import Path
import sys  # To stop execution on error

# @markdown ---
# @markdown ### 1. Connect to Google Drive
# @markdown Run this cell. It will ask for permission to connect to your Google Drive. Click "Connect to Google Drive", choose your account, and click "Allow".
try:
    drive.mount(
        "/content/drive", force_remount=True
    )  # force_remount helps if connection is stale
    print("\n✅ Google Drive connected successfully!")
except Exception as e:
    print(f"\n❌ Error mounting Google Drive: {e}")
    print("   Please ensure you allow access when prompted.")
    sys.exit("Stopping: Google Drive connection failed.")  # Stop execution

# @markdown ---
# @markdown ### 2. Enter Folder Path & Options
# @markdown Now, enter the full path to the folder **inside your Google Drive** where your CV file (`resume.pdf` or `resume.json`) is located.
# @markdown **Example:** `/content/drive/MyDrive/ia-driven-cv-opt`
DRIVE_FOLDER_PATH = "/content/drive/MyDrive/ia-driven-cv-opt"  # @param {type:"string"}

# @markdown Check the box below ONLY if you want the notebook to re-process your `resume.pdf`, even if a `resume.json` file already exists in the folder (useful if you updated the PDF recently).
FORCE_PDF_REPARSE = False  # @param {type:"boolean"}
# @markdown ---

# --- Path Validation ---
if not DRIVE_FOLDER_PATH:
    print(
        "\n❌ Error: Please enter the path to your Google Drive folder in the form"
        " above and run this cell again."
    )
    sys.exit("Stopping: Drive folder path is required.")

# Use pathlib for easier path handling
_base_path = Path(DRIVE_FOLDER_PATH)

if not _base_path.exists():
    print(
        "\n❌ Error: The specified folder path does not seem to exist:"
        f" '{DRIVE_FOLDER_PATH}'"
    )
    print(
        "   Please double-check the path you entered. Make sure it starts with"
        " '/content/drive/MyDrive' and that the folder exists in your Google Drive."
    )
    sys.exit("Stopping: Invalid Drive folder path.")
elif not _base_path.is_dir():
    print(f"\n❌ Error: The specified path is not a folder: '{DRIVE_FOLDER_PATH}'")
    print("   Please ensure you provide the path to a folder, not directly to a file.")
    sys.exit("Stopping: Path is not a folder.")
else:
    print(f"\n✅ Folder path confirmed: '{_base_path}'")
    # Define derived paths for later use
    CV_JSON_PATH = _base_path / "resume.json"
    CV_PDF_PATH = _base_path / "resume.pdf"
    # Define ChromaDB path within the user's specified folder
    CHROMA_DB_PATH = _base_path / "cv_chroma_db"
    print(f"   Will look for CV JSON at: '{CV_JSON_PATH}'")
    print(f"   Will look for CV PDF at: '{CV_PDF_PATH}'")
    print(f"   Vector database will be stored in: '{CHROMA_DB_PATH}'")

print("\nProcessing options:")
if FORCE_PDF_REPARSE:
    print(
        "   - Will attempt to process 'resume.pdf' (ignoring any existing"
        " 'resume.json')."
    )
else:
    print(
        "   - Will first look for 'resume.json'. If not found, will look for"
        " 'resume.pdf'."
    )

# Ensure the directory for ChromaDB exists
try:
    CHROMA_DB_PATH.mkdir(parents=True, exist_ok=True)
    print(f"\n✅ Directory for vector database ensured at: '{CHROMA_DB_PATH}'")
except Exception as e:
    print(f"\n❌ Error creating directory for vector database: {e}")
    sys.exit("Stopping: Could not create database directory.")

Mounted at /content/drive

✅ Google Drive connected successfully!

✅ Folder path confirmed: '/content/drive/MyDrive/ia-driven-cv-opt'
   Will look for CV JSON at: '/content/drive/MyDrive/ia-driven-cv-opt/resume.json'
   Will look for CV PDF at: '/content/drive/MyDrive/ia-driven-cv-opt/resume.pdf'
   Vector database will be stored in: '/content/drive/MyDrive/ia-driven-cv-opt/cv_chroma_db'

Processing options:
   - Will first look for 'resume.json'. If not found, will look for 'resume.pdf'.

✅ Directory for vector database ensured at: '/content/drive/MyDrive/ia-driven-cv-opt/cv_chroma_db'


### Step 3.2: Provide the Job Description

In the text box in the **next cell**, please **delete the example text** and **paste the full job description** you are applying for.

**➡️ IMPORTANT:** After you paste or edit the job description in the box below, **you MUST run that cell (click its ► button)**. This saves the description so the notebook can use it for the analysis.

In [5]:
# @title Enter Job Description Here (Then Click ►)
import ipywidgets as widgets
from IPython.display import display
import sys  # To potentially stop execution

# Default Job Description Text (from original notebook [Source: 76-87])
# This helps the user see the expected format and provides a starting point.
default_job_desc = """Job Title: AI/Machine Learning Engineer

Company: Innovate Solutions Inc.

Location: Remote (US Based)

About Us:
Innovate Solutions Inc. is at the forefront of applying artificial intelligence to solve real-world business challenges. We foster a collaborative environment where creative thinking and technical excellence drive our success. We are passionate about building intelligent systems that deliver significant value to our clients across various industries. Join our growing team and help shape the future of applied AI.

About the Role:
We are seeking a talented and motivated AI/Machine Learning Engineer to join our core development team. You will play a key role in the end-to-end lifecycle of machine learning projects, from conceptualization and data exploration to model deployment and monitoring. You'll work closely with data scientists, software engineers, and product managers to build innovative AI-powered features and products.

Responsibilities:
- Design, develop, train, and deploy machine learning models (including deep learning models) for tasks such as NLP, predictive analytics, anomaly detection, and personalization.
- Process, cleanse, and verify the integrity of large datasets used for analysis and model training.
- Collaborate with data engineering teams to build and maintain robust data pipelines for ML workflows.
- Implement and maintain MLOps best practices for model versioning, testing, deployment, and monitoring.
- Stay current with the latest advancements in AI/ML techniques, tools, and platforms.
- Analyze experimental results, iterate on models, and communicate findings to technical and non-technical stakeholders.
- Contribute to the development of internal AI platforms and tooling.

Required Qualifications:
- Bachelor's or Master's degree in Computer Science, Data Science, Statistics, or a related quantitative field.
- 2+ years of hands-on experience building and deploying machine learning models in a production environment.
- Strong programming skills in Python and proficiency with relevant ML libraries (e.g., Scikit-learn, TensorFlow, PyTorch, Keras).
- Solid understanding of core machine learning algorithms, statistical modeling, and evaluation metrics.
- Experience working with SQL and/or NoSQL databases.
- Familiarity with data processing and analysis libraries (e.g., Pandas, NumPy).
- Excellent problem-solving skills and attention to detail.
- Strong communication and teamwork abilities.

Desired Qualifications (Bonus Points):
- PhD in a related field.
- Experience with cloud platforms (AWS, GCP, or Azure) and their AI/ML services (e.g., SageMaker, Vertex AI, Azure ML).
- Experience with MLOps tools and practices (e.g., Docker, Kubernetes, MLflow, Kubeflow).
- Experience with Natural Language Processing (NLP) or Computer Vision (CV).
- Experience with big data technologies (e.g., Spark, Hadoop).
- Publications in relevant AI/ML conferences or journals.

What We Offer:
- Competitive salary and benefits package.
- Opportunity to work on challenging and impactful AI projects.
- A dynamic, collaborative, and supportive work environment.
- Flexible remote work policy.
- Professional development opportunities.
"""

# --- Create and display the Textarea widget ---
print("⬇️ Edit the text area below with your target job description.")
print("   IMPORTANT: After editing, run this cell again (click ►) to save the text.")
job_desc_widget = widgets.Textarea(
    value=default_job_desc,
    placeholder="Paste the full job description here...",
    description="",
    disabled=False,
    layout=widgets.Layout(height="400px", width="95%"),  # Adjust size as needed
)
display(job_desc_widget)

# --- Capture value after execution ---
# This line captures the *current* value from the widget when the cell runs.
# The user MUST run the cell *after* they finish editing the text area.
JOB_DESCRIPTION = job_desc_widget.value

# --- Validation after capture ---
# Check if the captured text is still the default or empty.
if not JOB_DESCRIPTION:
    print(
        "\n❌ Error: Job description is empty. Please paste the job description and run"
        " this cell again."
    )
    # Optionally stop execution if required: sys.exit("Stopping: Job description required.")
elif JOB_DESCRIPTION == default_job_desc:
    print("\n⚠️ Warning: The text box still contains the example job description.")
    print(
        "   Please replace it with the actual job description you are targeting and run"
        " this cell again."
    )
    # Optionally stop execution: sys.exit("Stopping: Please provide the actual job description.")
elif len(JOB_DESCRIPTION) < 100:  # noqa: PLR2004
    print(
        "\n⚠️ Warning: The job description seems very short. Did you paste the full"
        " text?"
    )
    print("   Continuing, but please double-check.")
    print("✅ Job description captured (but please review if it's complete).")
else:
    # If it's not empty, not the default, and has reasonable length
    print("\n✅ Job description captured successfully!")

⬇️ Edit the text area below with your target job description.
   IMPORTANT: After editing, run this cell again (click ►) to save the text.


Textarea(value="Job Title: AI/Machine Learning Engineer\n\nCompany: Innovate Solutions Inc.\n\nLocation: Remot…


   Please replace it with the actual job description you are targeting and run this cell again.


In [6]:
# @title Step 3.3: Configure Analysis Settings (Click ►)
import sys  # To potentially stop execution

# @markdown ---
# @markdown ### Analysis Settings

# @markdown Select the language the AI should use when chatting with you and rewriting CV sections.
LANGUAGE = "en"  # @param ["en", "es"]

# @markdown How many of the most relevant CV items (e.g., past jobs, projects) should the AI focus on refining during the chat? (Usually 2-4 is a good number).
MAX_RELEVANT_ITEMS = 3  # @param {type:"integer"}

# @markdown Which sections of your CV should be analyzed and indexed? Enter the section names exactly as they appear in your structured CV (JSON), separated by commas.
# @markdown **Common examples:** `work`, `projects`, `certificates`, `skills`, `education`, `publications`, `volunteer`
# @markdown (Check section 4.4 later if you're unsure about your section names).
CV_SECTIONS_TO_FOCUS_STR = "work, projects, certificates"  # @param {type:"string"}

# @markdown (Optional) Check the box below if you want to see the structured CV data (JSON format) after it's loaded or processed. This can be useful for verifying the section names.
SHOW_STRUCTURED_CV = False  # @param {type:"boolean"}
# @markdown ---

# --- Process and Validate Settings ---

# Process the sections string into a clean list
CV_SECTIONS_TO_FOCUS = [
    section.strip().lower()
    for section in CV_SECTIONS_TO_FOCUS_STR.split(",")
    if section.strip()
]

# Validation
if not CV_SECTIONS_TO_FOCUS:
    print(
        "❌ Error: Please specify at least one CV section to focus on (e.g., 'work,"
        " projects')."
    )
    sys.exit("Stopping: CV Sections to focus cannot be empty")  # Stop execution

if MAX_RELEVANT_ITEMS < 1:
    print("⚠️ Warning: Number of CV items to refine must be at least 1. Setting to 1.")
    MAX_RELEVANT_ITEMS = 1  # Reset to minimum valid

print("✅ Analysis settings confirmed:")
print(f"   - Language for AI interaction: {LANGUAGE}")
print(f"   - Max CV items to refine: {MAX_RELEVANT_ITEMS}")
print(f"   - CV sections to analyze: {CV_SECTIONS_TO_FOCUS}")
print(f"   - Show structured CV data: {SHOW_STRUCTURED_CV}")

✅ Analysis settings confirmed:
   - Language for AI interaction: en
   - Max CV items to refine: 3
   - CV sections to analyze: ['work', 'projects', 'certificates']
   - Show structured CV data: False


In [7]:
# @title Internal Configuration (Defaults - No changes needed)

# --- Fixed Configuration - Do Not Modify Unless You Know What You're Doing ---
# This cell sets up internal parameters based on the recommended settings
# or values from the original notebook configuration.

# Embedding Model Name (Responsible for understanding text meaning)
# Using the model specified in the original notebook [Source: 89]
EMBEDDING_MODEL_NAME = "models/text-embedding-004"

# Generative Model Name (Used for structuring CV and the rewrite chat)
# Using the model specified in the original notebook [Source: 90]
GENERATIVE_MODEL_NAME = "gemini-2.0-flash"

# RAG Retrieval Window (How much extra context to fetch during search)
# Using the value from the original notebook [Source: 89]
RETRIEVAL_WINDOW = 2

# Vector DB Collection Name (Internal name for the 'smart index')
# Using the value from the original notebook [Source: 90]
COLLECTION_NAME = "cv_embeddings_v1"

# LLM Parameters (Controls AI creativity/consistency - fixed for simplicity)
# Using values from the original notebook [Source: 91]
STRUCTURING_LLM_TEMPERATURE = 0.1  # Low temp for consistent JSON structuring
REWRITING_LLM_TEMPERATURE = 0.8  # Higher temp for creative rewriting
REWRITING_LLM_TOP_P = 0.95
REWRITING_LLM_TOP_K = 30

# CV Text Limit for Structuring (Safety cutoff for very long PDFs)
# Using the value from the original notebook [Source: 94]
MAX_CV_TEXT_LENGTH_FOR_STRUCTURING = 12000

print("✅ Internal fixed parameters loaded.")
# --- End of Fixed Configuration ---

✅ Internal fixed parameters loaded.


## 4. Load and Process Your CV

Now that the setup and configuration are done, the notebook will find your CV file (either `resume.json` or `resume.pdf`) in the Google Drive folder you specified. It will then load or process it into a structured format that the AI can understand.

### 4.1. Load or Generate Structured CV Data

This cell checks the Google Drive folder you specified:

1.  It first looks for a file named `resume.json`. If it finds one (and you **didn't** check the "Re-process PDF" box earlier), it will load your CV directly from this file. This is faster and saves AI processing time if you've run the notebook before.
2.  If it can't find `resume.json`, or if you checked the box to force reprocessing, it will then look for `resume.pdf`. If found, the notebook will proceed to extract the text and use AI to structure it in the following steps.
3.  If neither file is found in the specified folder, it will stop with an error.

The code cell below performs these checks and either loads the JSON or prepares to process the PDF.

In [8]:
# @title Load CV Data (Checks for JSON first, then PDF)
import json
from IPython.display import IFrame, display, Markdown  # For displaying PDF/messages
import sys

# Initialize variables
structured_cv_data = None
cv_loaded_from_json = False
pdf_path_to_process = None

print(f"Checking folder: '{_base_path}'")  # _base_path defined in cell 3.1

# --- Try loading JSON first (if not forced to reparse PDF) ---
if not FORCE_PDF_REPARSE:
    print(f"Attempting to load existing structured CV from: '{CV_JSON_PATH}'...")
    if CV_JSON_PATH.is_file():
        try:
            with open(CV_JSON_PATH, encoding="utf-8") as f:
                structured_cv_data = json.load(f)
            cv_loaded_from_json = True
            print(f"✅ Successfully loaded structured CV data from '{CV_JSON_PATH}'.")
            # Display a confirmation message in Markdown for better visibility
            display(
                Markdown(
                    "👍 **Success!** Found and loaded `resume.json`. We can skip the"
                    " PDF processing and AI structuring steps."
                )
            )
        except json.JSONDecodeError as e:
            print(f"⚠️ Found '{CV_JSON_PATH}', but failed to load it as JSON: {e}")
            print("   Will proceed to look for 'resume.pdf' instead.")
        except Exception as e:
            print(
                f"⚠️ Found '{CV_JSON_PATH}', but an unexpected error occurred while"
                f" loading: {e}"
            )
            print("   Will proceed to look for 'resume.pdf' instead.")
    else:
        print(f"   '{CV_JSON_PATH}' not found. Will look for 'resume.pdf'.")
else:
    print(
        "ℹ️ 'Force PDF Reparse' is selected. Skipping JSON check, looking for"
        " 'resume.pdf'."
    )

# --- If JSON wasn't loaded, look for PDF ---
if not cv_loaded_from_json:
    print(f"\nLooking for CV PDF file at: '{CV_PDF_PATH}'...")
    if CV_PDF_PATH.is_file():
        pdf_path_to_process = CV_PDF_PATH  # Store path for next steps
        print(f"✅ Found PDF file: '{CV_PDF_PATH}'.")
        print("   Will proceed with PDF text extraction and AI structuring.")
        # Display the PDF inline for user verification (Does not work in Colab)
        # try:
        #     display(Markdown("---"))
        #     print("   Displaying PDF below for verification:")
        #     display(IFrame(src=pdf_path_to_process, width="90%", height="600px"))
        #     display(Markdown("---"))
        # except Exception as e:
        #     print(f"   (Could not display PDF inline due to an error: {e})")
    else:
        # Critical error: No JSON loaded and no PDF found
        print(f"\n❌ Error: PDF file not found at '{CV_PDF_PATH}'.")
        display(
            Markdown(
                "**❌ Critical Error:** Could not find `resume.json` or `resume.pdf`"
                f" in the specified folder (`{_base_path}`)."
            )
        )
        print(
            "Please ensure one of these files exists in that folder and has the correct"
            " name."
        )
        sys.exit("Stopping: No valid CV file found.")  # Stop execution

# Final check
if cv_loaded_from_json and structured_cv_data is None:
    # This case should be rare due to error handling above, but good practice
    print("❌ Error: CV was marked as loaded from JSON, but data is missing.")
    sys.exit("Stopping: Error after attempting to load JSON.")

print("\nLoad/Check step complete.")

Checking folder: '/content/drive/MyDrive/ia-driven-cv-opt'
Attempting to load existing structured CV from: '/content/drive/MyDrive/ia-driven-cv-opt/resume.json'...
   '/content/drive/MyDrive/ia-driven-cv-opt/resume.json' not found. Will look for 'resume.pdf'.

Looking for CV PDF file at: '/content/drive/MyDrive/ia-driven-cv-opt/resume.pdf'...
✅ Found PDF file: '/content/drive/MyDrive/ia-driven-cv-opt/resume.pdf'.
   Will proceed with PDF text extraction and AI structuring.

Load/Check step complete.


### 4.2. Extract Text from PDF (if needed)

*This step only runs if a `resume.pdf` file is being processed (because `resume.json` was not found or you chose to re-process the PDF).*

If a PDF is being used, this step simply reads the document and extracts all the text content from its pages.

In [9]:
# @title Extract Text from PDF (Runs only if PDF is used)
import pdfplumber
import sys

# Initialize raw_cv_text, it will only be populated if PDF processing is needed
raw_cv_text = None

# --- Run extraction only if CV data was NOT loaded from JSON ---
if not cv_loaded_from_json:
    # Check if pdf_path_to_process was set correctly in the previous step
    if pdf_path_to_process and pdf_path_to_process.is_file():
        print(f"Extracting text from PDF: '{pdf_path_to_process}'...")
        try:
            extracted_pages = []
            with pdfplumber.open(pdf_path_to_process) as pdf:
                if not pdf.pages:
                    print("⚠️ Warning: The PDF file seems to have no pages.")
                for i, page in enumerate(pdf.pages):
                    # Extract text page by page, handling potential None values from empty pages
                    page_text = page.extract_text(x_tolerance=1, y_tolerance=3)
                    if page_text:
                        extracted_pages.append(page_text)
                    # else: # Optional: uncomment to debug empty pages
                    #     print(f"   (Page {i+1} had no extractable text)")

            raw_cv_text = "\n".join(extracted_pages)  # Join pages with newline

            if not raw_cv_text:
                print(
                    "⚠️ Warning: Text extraction resulted in empty text. The PDF might"
                    " be image-based or empty."
                )
                # Consider if we should stop here if text is crucial
                # sys.exit("Stopping: Failed to extract any text from PDF.")
            else:
                print(
                    "✅ Text extraction successful. Total characters:"
                    f" {len(raw_cv_text)}"
                )
                # Optional: Uncomment the line below to print the first 500 characters
                # print(f"--- Snippet ---\n{raw_cv_text[:500]}\n---------------\n")

        except Exception as e:
            print(
                "\n❌ ERROR: Failed to open or extract text from PDF"
                f" '{pdf_path_to_process}'."
            )
            print(f"   Error details: {e}")
            print("   The PDF might be corrupted or password-protected.")
            raw_cv_text = None  # Ensure it's None on error
            sys.exit("Stopping: PDF text extraction failed.")
    else:
        # This should ideally not be reached due to checks in cell 4.1, but as a safeguard:
        print(
            "❌ Error: PDF processing was expected, but the PDF path is missing or"
            " invalid."
        )
        sys.exit("Stopping: PDF path error.")
else:
    print("ℹ️ Skipping PDF text extraction because CV data was loaded from JSON.")

# --- Final check for this step ---
# If we expected to process PDF but raw_cv_text is still None or empty, something went wrong.
if not cv_loaded_from_json and not raw_cv_text:
    print("\n❌ Error: PDF processing was required, but no text could be extracted.")
    # Decide if execution should stop
    # sys.exit("Stopping: Failed to get text content from PDF.")



Extracting text from PDF: '/content/drive/MyDrive/ia-driven-cv-opt/resume.pdf'...




✅ Text extraction successful. Total characters: 2545


### 4.3. Structure CV using AI (if needed)

*This step uses AI to understand the extracted PDF text and organize it into a standard structure (JSON format). It only runs if a `resume.pdf` file was processed.*

If the notebook extracted text from a PDF, this step sends that text to the Google AI model (Gemini).

The AI's job here is to act like a very organized assistant: it reads the raw CV text and puts the information into specific categories (like "Work Experience", "Education", "Skills") based on a standard format called "JSON Resume".

This structured format makes it much easier for the later steps to find and analyze the specific parts of your CV that you selected (e.g., just your 'work' history or 'projects').

In [10]:
# @title Define CV Structure Schema (Internal - based on JSON Resume Standard)
# This code defines the expected structure (like a template) for the CV data
# that the AI will generate from the PDF text. It's based on the standard
# JSON Resume schema (https://jsonresume.org/schema/).
# You don't need to expand or modify this.


# Use typing_extensions for TypedDict if needed for compatibility,
# but standard typing might suffice in newer Python/Colab versions.
from typing_extensions import TypedDict


# Define nested structures first (order matters for definition)
class Location(TypedDict, total=False):
    address: str
    postalCode: str
    city: str
    countryCode: str
    region: str


class Profile(TypedDict, total=False):
    network: str
    username: str
    url: str


class Basics(TypedDict, total=False):
    name: str
    label: str  # Job title / headline
    image: str  # URL to profile image
    email: str
    phone: str
    url: str  # Personal website/portfolio URL
    summary: str  # Professional summary/objective
    location: Location
    profiles: list[Profile]  # List of social media/professional profiles


class WorkItem(TypedDict, total=False):
    name: str  # Name of the company/organization
    position: str  # Job title
    url: str  # Company website
    startDate: str  # Format YYYY-MM-DD or YYYY-MM or YYYY
    endDate: str  # Format YYYY-MM-DD or YYYY-MM or YYYY, or 'Present'
    summary: str  # High-level description of role/company achievements
    highlights: list[str]  # Specific achievements or responsibilities (bullet points)


class VolunteerItem(TypedDict, total=False):
    organization: str
    position: str
    url: str
    startDate: str
    endDate: str
    summary: str
    highlights: list[str]


class EducationItem(TypedDict, total=False):
    institution: str
    url: str
    area: str  # e.g., Computer Science
    studyType: str  # e.g., Bachelor's Degree, Master's
    startDate: str
    endDate: str
    score: str  # e.g., GPA
    courses: list[str]  # Relevant coursework


class AwardItem(TypedDict, total=False):
    title: str
    date: str  # Date awarded
    awarder: str  # Organization that gave the award
    summary: str  # Description of the award


class CertificateItem(TypedDict, total=False):
    name: str  # Name of the certificate
    date: str  # Date issued
    issuer: str  # Issuing organization (e.g., Coursera, Google)
    url: str  # Link to certificate if available


class PublicationItem(TypedDict, total=False):
    name: str  # Title of the publication
    publisher: str  # e.g., Journal name, Conference
    releaseDate: str
    url: str  # Link to publication
    summary: str  # Abstract or brief description


class SkillItem(TypedDict, total=False):
    name: str  # Broad skill category (e.g., Web Development, Data Science)
    level: str  # Optional proficiency level (e.g., Intermediate, Advanced)
    keywords: list[str]  # Specific technologies or tools (e.g., Python, PyTorch, AWS)


class LanguageItem(TypedDict, total=False):
    language: str  # e.g., English, Spanish
    fluency: str  # e.g., Native, Fluent, Conversational


class InterestItem(TypedDict, total=False):
    name: str  # Category of interest (e.g., Open Source, AI Ethics)
    keywords: list[str]  # Specific interests


class ReferenceItem(TypedDict, total=False):
    name: str  # Name of reference (ensure consent)
    reference: str  # Testimonial or contact details (handle privacy appropriately)


class ProjectItem(TypedDict, total=False):
    name: str  # Project title
    description: str  # Overall description of the project
    highlights: list[str]  # Key contributions or features
    keywords: list[str]  # Technologies used
    url: str  # Link to project demo or repository
    startDate: str
    endDate: str
    roles: list[str]  # Roles held in the project
    entity: str  # Associated entity (e.g., university, company)
    type: str  # Type of project (e.g., personal, academic, professional)


# --- Top-Level Curriculum Schema ---
class Curriculum(TypedDict, total=False):
    """Represents the complete JSON Resume structure."""

    basics: Basics
    work: list[WorkItem]
    volunteer: list[VolunteerItem]
    education: list[EducationItem]
    awards: list[AwardItem]
    certificates: list[CertificateItem]
    publications: list[PublicationItem]
    skills: list[SkillItem]
    languages: list[LanguageItem]
    interests: list[InterestItem]
    references: list[ReferenceItem]
    projects: list[ProjectItem]


print("✅ CV Structure Schema defined.")

✅ CV Structure Schema defined.


In [11]:
# @title Call AI to Structure CV Text (Runs only if PDF was used)
import json
from google import genai
from google.genai import types
from google.api_core import retry
import sys

# --- Run structuring only if CV data was NOT loaded from JSON ---
if not cv_loaded_from_json:
    # Check if raw_cv_text was successfully extracted
    if raw_cv_text:
        print("Preparing to call the AI model to structure the extracted CV text...")

        # --- 1. Initialize Client & Validate API Key ---
        if "GOOGLE_API_KEY" not in globals() or not GOOGLE_API_KEY:
            print("❌ Error: GOOGLE_API_KEY not found or empty. Check Step 2.2.")
            sys.exit("Stopping: API Key missing for AI call.")

        try:
            client = genai.Client(api_key=GOOGLE_API_KEY)

            # --- 2. Prepare Prompt & Configuration ---
            cv_text_for_prompt = raw_cv_text
            if (
                MAX_CV_TEXT_LENGTH_FOR_STRUCTURING is not None
                and len(raw_cv_text) > MAX_CV_TEXT_LENGTH_FOR_STRUCTURING
            ):
                cv_text_for_prompt = raw_cv_text[:MAX_CV_TEXT_LENGTH_FOR_STRUCTURING]
                print(
                    "⚠️ Warning: CV text truncated to"
                    f" {MAX_CV_TEXT_LENGTH_FOR_STRUCTURING} characters for structuring."
                )

            # System instruction - kept strict for JSON output
            system_instruction = (
                "You are an expert CV parser. Extract information from the provided CV"
                " text and format it strictly according to the JSON Resume Schema"
                " provided using the `Curriculum` type. Return ONLY the valid JSON"
                " object conforming to the schema - no introductory text, no markdown"
                " formatting (like ```json ... ```), no explanations."
            )
            # Prompt
            prompt_message = f"""Given the following CV text, populate the fields of the JSON Resume Schema (represented by the `Curriculum` TypedDict) as accurately as possible.
Use empty strings, arrays, or null values for fields where information is missing in the text. Ensure the output is a single, valid JSON object.

CV Text:
---
{cv_text_for_prompt}
---

Respond with ONLY the JSON object.
"""
            # Combine instructions and prompt
            full_structuring_prompt = system_instruction + "\n\n" + prompt_message

            # Configuration using the structure from original notebook [Source: 158]
            # Requires the Curriculum class defined in the previous cell
            json_generation_config = {
                "temperature": STRUCTURING_LLM_TEMPERATURE,
                "response_mime_type": "application/json",
                "response_schema": Curriculum,  # Pass the TypedDict class
            }

            # --- 3. Define Retry Function (using original client method) ---
            @retry.Retry(predicate=is_retriable)  # Uses 'is_retriable' defined earlier
            def generate_structured_cv_json_with_retry(prompt, config):
                """Calls the Gemini API using client.models.generate_content."""
                print(
                    f"📞 Calling Gemini AI ({GENERATIVE_MODEL_NAME}) via client to"
                    " structure CV text..."
                )
                # Using client.models.generate_content as in original [Source: 160]
                response = client.models.generate_content(
                    model=GENERATIVE_MODEL_NAME,  # Ensure model name is correct
                    contents=prompt,  # Pass the full prompt here
                    config=config,
                )
                print("✅ Gemini AI call finished.")
                return response.parsed

            # --- 4. Execute API Call ---
            print("   (This AI call might take a moment...)")
            structured_cv_data = generate_structured_cv_json_with_retry(
                prompt=full_structuring_prompt,  # Pass the combined prompt
                config=json_generation_config,
            )

            # --- 5. Save Generated JSON (if successful) ---
            if structured_cv_data is not None:
                 print(f"\n💾 Saving the structured CV data generated by the AI to: '{CV_JSON_PATH}'...")
                 try:
                      with open(CV_JSON_PATH, 'w', encoding='utf-8') as f:
                           # Use indent for readability, ensure_ascii=False for unicode chars
                           json.dump(structured_cv_data, f, indent=2, ensure_ascii=False)
                      print(f"✅ Successfully saved structured data to '{CV_JSON_PATH}'.")
                      print("   Future runs can load this file directly (unless 'Force PDF Reparse' is checked).")
                 except Exception as e:
                      print(f"\n❌ ERROR: Failed to save the generated JSON file to '{CV_JSON_PATH}'.")
                      print(f"   Error details: {type(e).__name__} - {e}")
                      print("   The structured data is still available in memory for this session, but wasn't saved.")
                      # Do not stop execution, as the data is still usable in memory

        except Exception as e:
            print(
                "\n❌ ERROR: An unexpected error occurred during the AI structuring"
                " call."
            )
            print(f"   Error details: {type(e).__name__} - {e}")
            structured_cv_data = None
            sys.exit(f"Stopping due to error during AI structuring: {e}")

    else:
        print("ℹ️ Skipping AI structuring because no text was extracted from the PDF.")
        structured_cv_data = None

else:
    print("ℹ️ Skipping AI structuring because CV data was loaded from JSON.")

# --- Final check ---
# If we processed PDF, we MUST have structured_cv_data now
if not cv_loaded_from_json and structured_cv_data is None:
    print("\n❌ Error: Failed to obtain structured CV data after processing the PDF.")
    sys.exit("Stopping: Could not structure CV data from PDF.")
elif not cv_loaded_from_json and structured_cv_data is not None:
     print("\n✅ Successfully obtained structured CV data from AI processing.")
elif cv_loaded_from_json and structured_cv_data is not None:
     # This case means JSON was loaded successfully earlier
     pass

Preparing to call the AI model to structure the extracted CV text...
   (This AI call might take a moment...)
📞 Calling Gemini AI (gemini-2.0-flash) via client to structure CV text...
✅ Gemini AI call finished.

💾 Saving the structured CV data generated by the AI to: '/content/drive/MyDrive/ia-driven-cv-opt/resume.json'...
✅ Successfully saved structured data to '/content/drive/MyDrive/ia-driven-cv-opt/resume.json'.
   Future runs can load this file directly (unless 'Force PDF Reparse' is checked).

✅ Successfully obtained structured CV data from AI processing.


### 4.4. Review Structured CV Data (Optional)

If you checked the "Show structured CV data" box in the settings earlier (Step 3.3), the next cell will display the structured CV information that the notebook has loaded or generated.

This data is shown in YAML format, which is similar to JSON but often easier for humans to read.

You can expand the cell below to:
* Verify that the AI understood your CV content correctly (if it was processed from PDF).
* See the exact names of the sections (like `work`, `projects`, `skills`) that the AI identified. This is useful for confirming the section names you entered in the "CV Sections to Analyze" setting.

If you didn't check the box, the next cell will be skipped.

In [12]:
# @title Display Structured CV Data (YAML Format)
import yaml
import sys

# --- Display only if data exists and the user opted-in ---
# Check if SHOW_STRUCTURED_CV was defined in cell 3.3 and is True
should_display = "SHOW_STRUCTURED_CV" in globals() and SHOW_STRUCTURED_CV

if should_display:
    # Check if structured_cv_data was successfully loaded or created
    if "structured_cv_data" in globals() and structured_cv_data is not None:
        print("--- Structured CV Data (YAML Format) ---")
        try:
            # Use yaml.dump for readable output
            # Use allow_unicode=True for non-ASCII characters
            # Use sort_keys=False to maintain original order where possible
            # Use width=float("inf") to prevent line wrapping within blocks
            print(
                yaml.dump(
                    structured_cv_data,
                    allow_unicode=True,
                    sort_keys=False,
                    width=float("inf"),
                    default_flow_style=None,  # Use block style for readability
                )
            )
        except yaml.YAMLError as e:
            print(f"\n❌ Error formatting data as YAML: {e}")
            print("   Displaying raw data instead:")
            print(structured_cv_data)  # Fallback to raw print
        except Exception as e:
            print(f"\n❌ An unexpected error occurred during display: {e}")
        print("--- End of Structured CV Data ---")
    else:
        print(
            "ℹ️ Structured CV data is not available (likely an issue in previous steps)."
            " Cannot display."
        )
else:  # Optional: uncomment if you want a message when display is skipped due to checkbox
    print("ℹ️ Skipping display of structured CV data as requested in settings.")

ℹ️ Skipping display of structured CV data as requested in settings.


## 5. Create a 'Smart Index' of Your CV

Okay, now we need to make your structured CV searchable in a "smart" way. The AI needs to understand the *meaning* behind the text in your work experience, projects, etc., not just match exact keywords.

To do this, we'll create a **"Smart Index"** (sometimes called a vector database or embeddings) for the CV sections you selected earlier (like `work`, `projects`).

Here's the idea:
1.  **Convert to Meaning:** The AI reads each item (like a specific job description from your CV) and converts its text into a list of numbers (called an "embedding" or "vector"). This list represents the core meaning or concepts in that text.
2.  **Store in Index:** These numerical representations are stored in a special database (the Smart Index) located in the Google Drive folder you specified (`cv_chroma_db`).
3.  **Enable Smart Search:** This index allows the AI to quickly find CV items whose *meaning* is similar to the *meaning* of the job description later on, even if they don't use the exact same words.

The code cells below handle preparing your CV data for this process, setting up the conversion tool, and building the index. You don't need to worry about the details unless you're curious!

In [13]:
# @title Prepare CV Sections for Indexing (Internal Code)
# This code takes the structured CV data and formats the specific sections
# you selected (e.g., work, projects) into individual text pieces (documents)
# ready for the 'meaning conversion' step. It also creates unique IDs
# and extracts some metadata for each piece.

import yaml
import re
import sys

# Check if structured_cv_data exists from previous steps
if "structured_cv_data" not in globals() or structured_cv_data is None:
    print(
        "❌ Error: Structured CV data is not available. Cannot prepare documents for"
        " indexing."
    )
    print("   Please check the output of previous steps (Section 4).")
    sys.exit("Stopping: Missing structured CV data.")

# Check if CV_SECTIONS_TO_FOCUS is defined
if "CV_SECTIONS_TO_FOCUS" not in globals() or not CV_SECTIONS_TO_FOCUS:
    print("❌ Error: The list of CV sections to focus on is missing.")
    print("   Please check the 'Configure Analysis Settings' step (Cell 3.3).")
    sys.exit("Stopping: Missing CV sections to focus on.")

# --- 1. Helper Functions for ID Generation (from original notebook) ---
# Maps section keys to functions creating a base ID string from item content
# Using lowercase keys consistent with CV_SECTIONS_TO_FOCUS processing
BASE_ID_GENERATORS = {
    "work": lambda item: (
        f"{item.get('name', 'NoCompany')}.{item.get('position', 'NoPosition')}"
    ),
    "certificates": lambda item: (
        f"{item.get('issuer', 'NoIssuer')}.{item.get('name', 'NoCert')}"
    ),
    "publications": lambda item: (
        f"{item.get('publisher', 'NoPublisher')}.{item.get('name', 'NoPub')}"
    ),
    "projects": lambda item: item.get("name", "NoProject"),
    "volunteer": lambda item: (
        f"{item.get('organization', 'NoOrg')}.{item.get('position', 'NoVolunteerPos')}"
    ),
    "education": lambda item: (
        f"{item.get('institution', 'NoInstitution')}.{item.get('area', 'NoArea')}.{item.get('studyType', '')}"
    ),
    "basics": lambda item: item.get(
        "name", "NoPerson"
    ),  # Usually only one 'basics' item
    "awards": lambda item: (
        f"{item.get('awarder', 'NoAwarder')}.{item.get('title', 'NoAward')}"
    ),
    "skills": lambda item: item.get(
        "name", "NoSkill"
    ),  # ID based on skill category name
    "languages": lambda item: item.get("language", "NoLang"),
    "interests": lambda item: item.get("name", "NoInterest"),
    "references": lambda item: item.get("name", "NoReference"),
}


def sanitize_id(text_id: str) -> str:
    """Cleans and formats a string into a valid ChromaDB ID (max 63 chars)."""
    if not isinstance(text_id, str):  # Handle potential non-string input
        text_id = str(text_id)
    text_id = text_id.lower()
    # fmt: off
    accent_map = {
        'á':'a', 'ä':'a', 'â':'a', 'à':'a', 'ã':'a', 'å':'a', 'é':'e', 'ë':'e',
        'ê':'e', 'è':'e', 'í':'i', 'ï':'i', 'î':'i', 'ì':'i', 'ó':'o', 'ö':'o',
        'ô':'o', 'ò':'o', 'õ':'o', 'ø':'o', 'ú':'u', 'ü':'u', 'û':'u', 'ù':'u',
        'ñ':'n', 'ç':'c',
    }
    # fmt: on
    # Use regex for replacement
    pattern = re.compile("|".join(accent_map.keys()))
    text_id = pattern.sub(lambda m: accent_map[m.group(0)], text_id)

    text_id = re.sub(r"[\\s_:/]+", ".", text_id)  # Replace common separators with dot
    text_id = re.sub(r"[^a-z0-9.\-]+", "", text_id)  # Keep alphanumeric, dot, hyphen
    text_id = re.sub(
        r"[.\-]+", ".", text_id
    )  # Consolidate consecutive dots/hyphens into dots
    text_id = text_id.strip(".")  # Remove leading/trailing dots

    # Ensure minimum length and apply maximum length constraint
    if len(text_id) < 3:  # noqa: PLR2004
        text_id = f"{text_id}id"  # Append 'id' to short strings
    return text_id[:63]  # Truncate to 63 characters (ChromaDB limit)


def generate_unique_item_id(section_key: str, item: dict, item_index: int) -> str:
    """Generates a unique, sanitized ID for a CV item."""
    # Use lowercase section_key for lookup
    id_generator = BASE_ID_GENERATORS.get(
        section_key, lambda i: f"item.{item_index}"  # Fallback generator
    )
    try:
        base_id = id_generator(item)
    except Exception as e:
        print(
            f"Warning: Error generating base ID for item {item} in {section_key}: {e}."
            " Using fallback."
        )
        base_id = f"item.{item_index}"

    # Include section key and index for uniqueness
    full_id_base = f"{section_key}.{base_id}.{item_index}"
    return sanitize_id(full_id_base)


# --- 2. Function to Prepare Data for Embedding ---
def prepare_embedding_data(  # noqa: PLR0912, PLR0915
    cv_data: dict, sections_to_include: list[str]
) -> tuple[list, list, list]:
    """
    Extracts items, formats documents (YAML strings), generates IDs, creates metadata.
    Returns tuple: (documents, ids, metadatas).
    """
    all_documents = []
    all_ids = []
    all_metadatas = []

    if not isinstance(cv_data, dict):
        print("⚠️ Warning: Input CV data is not in the expected dictionary format.")
        return [], [], []

    print(f"Preparing documents from CV sections: {sections_to_include}...")
    processed_count = 0
    skipped_sections = 0

    for section_key in sections_to_include:  # Already lowercased in cell 3.4
        section_items = cv_data.get(section_key)

        if section_items is None:
            print(f"   - Section '{section_key}' not found in CV data, skipping.")
            skipped_sections += 1
            continue

        # Handle cases where a section might not be a list (e.g., 'basics')
        if not isinstance(section_items, list):
            # If it's a dictionary (like 'basics'), treat it as a single-item list
            if isinstance(section_items, dict):
                section_items = [section_items]
            else:
                print(
                    f"⚠️ Warning: Expected section '{section_key}' to contain a list or"
                    f" dict, but found {type(section_items)}. Skipping this section."
                )
                skipped_sections += 1
                continue

        if not section_items:
            print(f"   - Section '{section_key}' is empty, skipping.")
            skipped_sections += 1
            continue

        print(
            f"   + Processing {len(section_items)} item(s) from section"
            f" '{section_key}'..."
        )
        for index, item in enumerate(section_items):
            if not isinstance(item, dict):
                print(
                    f"⚠️ Warning: Expected a dictionary for item {index} in section"
                    f" '{section_key}', but found {type(item)}. Skipping this item."
                )
                continue

            item_id = generate_unique_item_id(section_key, item, index)
            try:
                # Use YAML dump for a structured, readable text representation
                item_doc = yaml.dump(
                    item,
                    allow_unicode=True,
                    sort_keys=False,
                    width=float("inf"),
                    default_flow_style=None,  # Block style is usually better for LLMs
                )
                # Basic cleaning: remove potential YAML header '...' if present
                item_doc = item_doc.strip().lstrip("...")
                if not item_doc:  # Handle case where item is empty dict -> empty string
                    print(
                        f"⚠️ Warning: Item {index} in {section_key} resulted in empty"
                        " document, skipping."
                    )
                    continue

            except yaml.YAMLError as e:
                print(
                    f"⚠️ Warning: YAML dump failed for item {index} in {section_key}:"
                    f" {e}. Using simple string representation as fallback."
                )
                item_doc = str(item)  # Fallback to basic string conversion

            # Create metadata: always include section and index
            metadata = {"section": section_key, "item_index": index}
            # Add potentially useful fields from the item to metadata if they exist
            # These can sometimes help in understanding retrieved results later
            for key in [
                "name",
                "position",
                "issuer",
                "institution",
                "organization",
                "title",
                "area",
                "studyType",
                "language",
                "network",
            ]:
                if value := item.get(key):
                    # Sanitize metadata keys slightly (lowercase, replace space)
                    meta_key = f"item_{key.lower().replace(' ','_')}"
                    # Store only string values in metadata for simplicity with ChromaDB
                    if isinstance(value, str):
                        metadata[meta_key] = value

            all_documents.append(item_doc)
            all_ids.append(item_id)
            all_metadatas.append(metadata)
            processed_count += 1

    print(f"\n✅ Prepared {processed_count} documents for indexing.")
    if skipped_sections > 0:
        print(
            f"   ({skipped_sections} section(s) were skipped or not found in the data)."
        )
    if processed_count == 0 and len(sections_to_include) > skipped_sections:
        print(
            "❌ Error: No documents could be prepared from the specified sections found"
            " in the CV data."
        )
        print(
            "    Please check your CV data structure and the sections specified in Step"
            " 3.3."
        )
        sys.exit("Stopping: Failed to prepare any documents for indexing.")

    return all_documents, all_ids, all_metadatas


# --- 3. Execute Preparation ---
embedding_documents = []
embedding_ids = []
embedding_metadatas = []

embedding_documents, embedding_ids, embedding_metadatas = prepare_embedding_data(
    cv_data=structured_cv_data, sections_to_include=CV_SECTIONS_TO_FOCUS
)

# --- 4. Display Sample (if documents were prepared) ---
if embedding_documents:
    print("\n--- Sample Prepared Data (First Item) ---")
    print(f"ID        : {embedding_ids[0]}")
    print(f"Metadata  : {embedding_metadatas[0]}")
    # Limit snippet length for display
    doc_snippet = embedding_documents[0]
    max_snippet_len = 300
    if len(doc_snippet) > max_snippet_len:
        doc_snippet = doc_snippet[:max_snippet_len] + "..."
    print(f"Document Snippet:\n---\n{doc_snippet}")
    print("---")
else:
    # This case should be handled by the sys.exit earlier, but added for completeness
    print("\n⚠️ No documents were prepared for embedding.")

Preparing documents from CV sections: ['work', 'projects', 'certificates']...
   + Processing 7 item(s) from section 'work'...
   - Section 'projects' is empty, skipping.
   - Section 'certificates' is empty, skipping.

✅ Prepared 7 documents for indexing.
   (2 section(s) were skipped or not found in the data).

--- Sample Prepared Data (First Item) ---
ID        : work.codeforgeinc.eniorwebdeveloper.0
Metadata  : {'section': 'work', 'item_index': 0, 'item_name': 'CodeForge Inc', 'item_position': 'Senior Web Developer'}
Document Snippet:
---
name: CodeForge Inc
position: Senior Web Developer
url: ''
startDate: Feb 2022
endDate: Present
summary: ''
highlights: [Led frontend/backend development for a React + Node.js e-commerce platform, Implemented CI/CD pipelines using GitHub Actions and Docker, Reduced frontend load time by 40% via perf...
---


In [14]:
# @title Define AI Text-to-Meaning Converter (Embedding Function)
# This code sets up the specific function that converts the text pieces
# (documents) prepared in the previous step into numerical representations
# ('embeddings' or 'vectors') that capture their meaning. It uses the
# Google embedding model specified in the internal configuration.
# You don't need to expand or modify this.

from chromadb import Documents, EmbeddingFunction, Embeddings
from google.api_core import retry
from google import genai  # Using the original import style
from google.genai import types
import sys

# Check if necessary variables/objects exist
client = genai.Client(api_key=GOOGLE_API_KEY)
if "EMBEDDING_MODEL_NAME" not in globals():
    print("❌ Error: Embedding model name not defined. Check internal config.")
    sys.exit("Stopping: Embedding model name missing.")
if "is_retriable" not in globals():
    print("❌ Error: Retry helper function missing. Check cell execution order.")
    sys.exit("Stopping: Retry function missing.")


# Use the custom class structure provided in the original notebook [Source: 214-217]
# Adapted slightly to use the existing 'client' object for consistency
class GeminiEmbeddingFunction(EmbeddingFunction):
    """Custom ChromaDB embedding function using Google AI (client interface)."""

    def __init__(self, task_type: str = "retrieval_document") -> None:
        """Initializes the function with the specified task type."""
        # Ensure task_type is valid for the embedding model
        valid_tasks = [
            "retrieval_document",
            "retrieval_query",
            "semantic_similarity",
            "classification",
            "clustering",
        ]
        if task_type not in valid_tasks:
            raise ValueError(
                f"Invalid task_type '{task_type}'. Must be one of {valid_tasks}"
            )
        self.task_type: str = task_type
        self.model: str = EMBEDDING_MODEL_NAME
        print(
            f"   Embedding function initialized for task: '{self.task_type}' using"
            f" model '{self.model}'"
        )

    # Apply retry logic using the decorator and helper function defined earlier
    @retry.Retry(predicate=is_retriable)
    def __call__(self, input_texts: Documents) -> Embeddings:
        """Generates embeddings using client.models.embed_content."""
        if not isinstance(input_texts, list):
            raise TypeError("Input must be a list of strings.")
        if not all(isinstance(text, str) for text in input_texts):
            raise TypeError("All items in the input list must be strings.")

        print(f"      Generating embeddings for {len(input_texts)} text chunk(s)...")
        try:
            # Use client.models.embed_content as in the original notebook's style preference
            # Note: 'contents' arg expects a list of strings
            response = client.models.embed_content(
                model=self.model,
                contents=input_texts,  # Pass the list of strings directly
                config=types.EmbedContentConfig(task_type=self.task_type),
            )
            # Extract the embeddings list from the response
            # Ensure response structure matches expectations
            if hasattr(response, "embeddings") and isinstance(
                response.embeddings, list
            ):
                # Extract the 'values' from each embedding object
                embeddings_list = [
                    e.values for e in response.embeddings if hasattr(e, "values")
                ]
                if len(embeddings_list) != len(input_texts):
                    print(
                        f"⚠️ Warning: Mismatch between input texts ({len(input_texts)})"
                        f" and generated embeddings ({len(embeddings_list)})."
                    )
                    # Handle potential partial failures if necessary
                print("      ...embeddings generated successfully.")
                return embeddings_list
            else:
                print("❌ Error: Unexpected response structure from embedding API.")
                print(f"   Response received: {response}")
                raise ValueError("Could not extract embeddings from API response.")

        except Exception as e:
            print(f"❌ Error during embedding generation: {type(e).__name__} - {e}")
            # Re-raise the exception to be caught by retry or higher level handler
            raise


# Example instantiation (optional, just to confirm class definition works)
try:
    _test_embedder = GeminiEmbeddingFunction(task_type="retrieval_document")
    print("✅ Embedding function defined successfully.")
except Exception as e:
    print(f"❌ Error defining embedding function: {e}")
    sys.exit("Stopping: Failed to define embedding function.")

   Embedding function initialized for task: 'retrieval_document' using model 'models/text-embedding-004'
✅ Embedding function defined successfully.


In [15]:
# @title Initialize or Load the 'Smart Index' Database (Internal Code)
# This code connects to the 'Smart Index' database (ChromaDB) stored in the
# Google Drive folder you specified earlier ('cv_chroma_db' subfolder).
# If the database or the specific 'collection' for this CV doesn't exist,
# it creates them. It also links the database to the AI embedding function
# defined in the previous step, so it knows how to handle the 'meaning vectors'.

import chromadb
import sys

# Check if necessary variables/classes exist
if "CHROMA_DB_PATH" not in globals() or not CHROMA_DB_PATH:
    print("❌ Error: Path for ChromaDB database is missing. Check Step 3.1.")
    sys.exit("Stopping: ChromaDB path error.")
if "COLLECTION_NAME" not in globals() or not COLLECTION_NAME:
    print("❌ Error: Collection name for ChromaDB is missing. Check Internal Config.")
    sys.exit("Stopping: ChromaDB collection name error.")
if "GeminiEmbeddingFunction" not in globals():
    print("❌ Error: Embedding function class is missing. Check cell execution order.")
    sys.exit("Stopping: Embedding function definition missing.")

# Initialize collection variable
cv_collection = None

try:
    # Initialize the ChromaDB client, pointing to the persistent path in Drive
    print(f"Initializing ChromaDB client at path: '{CHROMA_DB_PATH}'...")
    # Ensure path is a string for the client
    chroma_client = chromadb.PersistentClient(path=str(CHROMA_DB_PATH))
    print("   ChromaDB client initialized.")

    # Instantiate the embedding function for document embedding
    # Use 'retrieval_document' as the task type for storing CV items
    print("   Instantiating embedding function for database...")
    gemini_embedder_for_db = GeminiEmbeddingFunction(task_type="retrieval_document")

    # Get or create the collection within the database
    # This collection will hold all the indexed items (work, projects, etc.) for this CV
    # It uses the embedding function defined above to handle vector creation/search
    print(f"Accessing collection: '{COLLECTION_NAME}'...")
    cv_collection = chroma_client.get_or_create_collection(
        name=COLLECTION_NAME,
        embedding_function=gemini_embedder_for_db,  # Link the embedding function
        # metadata={"hnsw:space": "cosine"} # Optional: Explicitly set distance metric if needed
    )

    print(f"✅ Collection '{cv_collection.name}' ready.")
    # Print current item count, useful for seeing if it's adding/updating later
    initial_count = cv_collection.count()
    print(f"   Current item count: {initial_count}")

except Exception as e:
    print(
        "\n❌ ERROR: Failed to initialize ChromaDB client or collection at"
        f" '{CHROMA_DB_PATH}'."
    )
    print("   Check permissions for the Google Drive folder if issues persist.")
    print(f"   Error details: {type(e).__name__} - {e}")
    cv_collection = None  # Ensure collection is None on error
    sys.exit("Stopping: ChromaDB initialization failed.")

Initializing ChromaDB client at path: '/content/drive/MyDrive/ia-driven-cv-opt/cv_chroma_db'...
   ChromaDB client initialized.
   Instantiating embedding function for database...
   Embedding function initialized for task: 'retrieval_document' using model 'models/text-embedding-004'
Accessing collection: 'cv_embeddings_v1'...
✅ Collection 'cv_embeddings_v1' ready.
   Current item count: 0


In [16]:
# @title Add/Update CV Information in the 'Smart Index' (Internal Code)
# This cell takes the prepared CV documents, their IDs, and metadata
# from Step 5.1 and adds them to the ChromaDB collection initialized
# in the previous step.
# If items with the same IDs already exist, ChromaDB's 'upsert'
# command will update them with the new information.
# The embedding function linked to the collection automatically handles
# converting the documents to vectors ('meaning representations').

import sys

# Check if necessary variables exist from previous steps
if "cv_collection" not in globals() or cv_collection is None:
    print("❌ Error: ChromaDB collection object is missing. Check previous step.")
    sys.exit("Stopping: ChromaDB collection not initialized.")
if (
    "embedding_documents" not in globals()
    or "embedding_ids" not in globals()
    or "embedding_metadatas" not in globals()
):
    print("❌ Error: Prepared document data (docs, ids, metadata) is missing.")
    print("   Check the output of the 'Prepare CV Sections' step.")
    sys.exit("Stopping: Missing data for indexing.")
if not embedding_documents:  # Check if the list is empty
    print(
        "ℹ️ No documents were prepared in the previous step. Nothing to add to the"
        " index."
    )
    # Not necessarily an error if the selected sections were empty/missing, so don't exit
else:
    print(
        f"\nAdding/updating {len(embedding_documents)} document(s) in ChromaDB"
        f" collection '{cv_collection.name}'..."
    )
    print("   (This may take a moment as embeddings are generated...)")
    try:
        # Use upsert: adds new documents and updates existing ones based on ID
        cv_collection.upsert(
            ids=embedding_ids,
            metadatas=embedding_metadatas,
            documents=embedding_documents,
            # Embeddings are generated automatically by the function linked to the collection
        )

        print("\n✅ Documents successfully added/updated in the collection.")
        # Verify final count
        final_count = cv_collection.count()
        print(f"   Collection '{cv_collection.name}' now contains {final_count} items.")
        # Optional check: Compare final count with expected count
        # Note: final_count might be different from initial_count + len(embedding_documents)
        # if some documents were updates rather than new additions.
        # A simple check is if final_count >= len(embedding_ids) if we only expect additions/updates
        # or compare with previous count if available.
        if final_count < len(embedding_ids):
            print(
                f"⚠️ Warning: Final item count ({final_count}) seems lower than the"
                f" number of prepared documents ({len(embedding_documents)})."
            )
            print(
                "    This might happen if there were duplicate IDs or issues during"
                " insertion."
            )

    except Exception as e:
        print(
            "\n❌ ERROR adding/updating documents in ChromaDB collection"
            f" '{cv_collection.name}':"
        )
        print(f"   Error details: {type(e).__name__} - {e}")
        # Depending on the error, you might want to stop execution
        # sys.exit("Stopping: Failed to add documents to ChromaDB.")


Adding/updating 7 document(s) in ChromaDB collection 'cv_embeddings_v1'...
   (This may take a moment as embeddings are generated...)
      Generating embeddings for 7 text chunk(s)...
      ...embeddings generated successfully.

✅ Documents successfully added/updated in the collection.
   Collection 'cv_embeddings_v1' now contains 7 items.


## 6. Find Your Most Relevant Experience for the Job

Alright, the "Smart Index" of your CV is ready!

Now, we'll use the **job description** you provided earlier to search through this index. The goal is to find the experiences from your CV (like specific jobs or projects you listed in the analyzed sections) that are the **closest match in meaning** to the requirements and responsibilities mentioned in the job description.

The AI doesn't just look for exact keyword matches; it looks for semantic similarity – things that *mean* similar things.

The code cell below performs this smart search and retrieves the top `N` most relevant items (where `N` is the number you set in Step 3.3, e.g., 3). These are the items we'll focus on refining in the next section.

In [17]:
# @title Retrieve Most Relevant CV Items (Click ► to run)
import sys
from IPython.display import display, Markdown

# Check if necessary variables/objects exist
if "cv_collection" not in globals() or cv_collection is None:
    print("❌ Error: ChromaDB collection object is missing. Check previous steps.")
    sys.exit("Stopping: ChromaDB collection not initialized.")
if "JOB_DESCRIPTION" not in globals() or not JOB_DESCRIPTION:
    print("❌ Error: Job Description is missing. Check Step 3.2.")
    sys.exit("Stopping: Job description required for retrieval.")
if "GeminiEmbeddingFunction" not in globals():
    print("❌ Error: Embedding function class is missing. Check cell execution order.")
    sys.exit("Stopping: Embedding function definition missing.")
if "MAX_RELEVANT_ITEMS" not in globals() or "RETRIEVAL_WINDOW" not in globals():
    print(
        "❌ Error: Retrieval parameters (MAX_RELEVANT_ITEMS, RETRIEVAL_WINDOW) missing."
    )
    print("   Check Step 3.3 and Internal Config (Cell 3.5).")
    sys.exit("Stopping: Missing retrieval parameters.")

# Initialize result lists
retrieved_ids = []
retrieved_documents = []
retrieved_metadatas = []
retrieved_distances = []

try:
    print(
        f"Performing smart search (RAG query) on collection '{cv_collection.name}'..."
    )

    # 1. Instantiate embedder specifically for the 'retrieval_query' task type
    #    This tells the AI model to generate the best embedding for searching *with* text,
    #    as opposed to indexing *of* text ('retrieval_document').
    print("   Instantiating embedding function for query...")
    query_embedder = GeminiEmbeddingFunction(task_type="retrieval_query")

    # 2. Embed the job description (the query)
    print("   Converting job description to 'meaning vector'...")
    # Ensure job description is passed as a list with one item
    query_embedding = query_embedder([JOB_DESCRIPTION])[0]
    print("   ...query embedding generated.")

    # 3. Determine how many results to fetch from the database
    #    We fetch slightly more than needed (MAX_RELEVANT_ITEMS + RETRIEVAL_WINDOW)
    #    to potentially provide better context or options, though only MAX_RELEVANT_ITEMS
    #    will be highlighted initially in the chat.
    num_results_to_fetch = MAX_RELEVANT_ITEMS + RETRIEVAL_WINDOW
    print(
        f"   Querying the 'Smart Index' for the top {num_results_to_fetch} most similar"
        " items..."
    )

    # 4. Perform the query using the generated embedding
    results = cv_collection.query(
        query_embeddings=[query_embedding],  # Pass the embedding of the job description
        n_results=num_results_to_fetch,  # Number of results to retrieve
        include=[
            "documents",
            "metadatas",
            "distances",
        ],  # Ask for docs, metadata, and similarity scores
    )
    print("   ...query complete.")

    # 5. Safely extract the results from the response dictionary
    #    The results are nested in lists, even for a single query, so we access the first element [0]
    retrieved_ids = results.get("ids", [[]])[0]
    retrieved_documents = results.get("documents", [[]])[0]
    retrieved_metadatas = results.get("metadatas", [[]])[0]
    retrieved_distances = results.get("distances", [[]])[
        0
    ]  # Lower distance means more similar

    print(
        f"\n✅ RAG retrieval complete. Found {len(retrieved_ids)} relevant items from"
        " your CV."
    )

    # --- Display retrieved items (optional but helpful for verification) ---
    if retrieved_ids:
        print("\n--- Top Retrieved Items (Most similar first based on distance) ---")
        for i in range(len(retrieved_ids)):
            item_id = retrieved_ids[i]
            distance = retrieved_distances[i]
            metadata = retrieved_metadatas[i]
            section = metadata.get("section", "N/A")
            doc_preview = retrieved_documents[i][:150] + "..."  # Short preview

            print(f"\n{i+1}. ID: {item_id} (Distance: {distance:.4f})")
            print(f"   Section: {section}")
            # Display identifying metadata if present
            name = metadata.get("item_name", metadata.get("item_title", ""))
            if name:
                print(f"   Name/Title: {name}")
            pos = metadata.get("item_position", "")
            if pos:
                print(f"   Position: {pos}")
            print(f"   Preview: {doc_preview}")
        print("--- End of Retrieved Items ---")
    else:
        display(
            Markdown(
                "⚠️ **Warning:** The search didn't find any relevant items in your"
                " indexed CV sections for this specific job description. The chat"
                " refinement step might not be very effective. You may want to check:"
                " \n   - If the `CV_SECTIONS_TO_FOCUS` in Step 3.3 included the"
                " relevant parts of your CV.\n   - If your CV content significantly"
                " differs from the job description."
            )
        )


except Exception as e:
    print(f"\n❌ ERROR during RAG retrieval: {type(e).__name__} - {e}")
    # Ensure lists are reset on error to prevent using stale data
    retrieved_ids, retrieved_documents, retrieved_metadatas, retrieved_distances = (
        [],
        [],
        [],
        [],
    )
    # Depending on severity, consider stopping:
    # sys.exit("Stopping: Failed during RAG retrieval step.")

# Final check if retrieval was expected but failed silently
if not retrieved_ids:
    print("\n⚠️ Note: No relevant items were retrieved to proceed with refinement.")

Performing smart search (RAG query) on collection 'cv_embeddings_v1'...
   Instantiating embedding function for query...
   Embedding function initialized for task: 'retrieval_query' using model 'models/text-embedding-004'
   Converting job description to 'meaning vector'...
      Generating embeddings for 1 text chunk(s)...
      ...embeddings generated successfully.
   ...query embedding generated.
   Querying the 'Smart Index' for the top 5 most similar items...
   ...query complete.

✅ RAG retrieval complete. Found 5 relevant items from your CV.

--- Top Retrieved Items (Most similar first based on distance) ---

1. ID: work.tanfordgraduate.choolofeducation.re.earcha.i.tanteducation (Distance: 0.9046)
   Section: work
   Name/Title: Stanford Graduate School of Education
   Position: Research Assistant – Educational Technology
   Preview: name: Stanford Graduate School of Education
position: Research Assistant – Educational Technology
url: ''
startDate: Jan 2020
endDate: Dec 2021
su

## 7. Refine Your CV with AI Chat

Now it's time for the interactive part! You'll chat directly with the AI assistant (Gemini).

1.  **Initial Analysis:** First, the AI will analyze the job description and the relevant CV items that were just retrieved. It will show you the main keywords it identified and list the CV items it found most relevant.
2.  **Your Guidance:** Based on the AI's analysis, you'll tell it which specific CV item (from the highlighted list) you want to work on first. You can also give it feedback on the keywords.
3.  **Rewriting Suggestions:** The AI will then suggest a rewritten version of the description for that specific CV item, aiming to incorporate the keywords and make it more impactful and ATS-friendly.
4.  **Iterate:** You can then approve the suggestion, ask for changes, or choose another CV item to refine.

The cells below set up the prompts for this chat and then start the interactive session.

In [18]:
# @title Define AI Chat Prompts & Messages (Internal)
# This cell contains the detailed instructions (prompts) and standard messages
# used by the AI assistant during the interactive chat session in Section 7.
# It includes versions for both English and Spanish, selected based on the
# LANGUAGE setting chosen in Step 3.3.
# You don't need to expand or modify this unless you want to change the AI's behavior.

# --- Initialize Content Dictionary ---
# Structure to hold multi-language prompt components and messages
content = {"en": {}, "es": {}}


# --- 1. Example Output Format ---
# This defines how the AI should format its response when suggesting a rewrite.
content["en"]["example"] = """
## Position Name / Title: [JOB TITLE / PROJECT NAME]
- Company Name / Org / Issuer: [COMPANY/ORG/ISSUER NAME]

### Original Description / Highlights
[ORIGINAL TEXT OF THE CV ITEM BEING DISCUSSED]

### Suggested Refinement (ATS-Friendly)

[REWRITTEN TEXT - SHORT SUMMARY/LEAD-IN FOLLOWED BY BULLET POINTS]
- [ATS-Optimized achievement/responsibility 1 incorporating keywords]
- [ATS-Optimized achievement/responsibility 2 incorporating keywords]
- ...


### Changes Made & Keywords Used
* **Explanation:** [Brief summary of additions/removals/focus shifts, explaining the 'why'].
* **Keywords Incorporated:** [KEYWORD 1], [KEYWORD 2], ...

---
**Shall we refine this suggestion further, proceed to the next highlighted item, or stop?**
(Enter your feedback, 'next', or 'q' to quit)
"""

content["es"]["example"] = """
## Nombre del puesto / Título / Certificado: [TÍTULO PUESTO / PROYECTO / CERTIFICADO]
- Nombre de empresa / Org / Emisor: [NOMBRE EMPRESA/ORG/EMISOR]

### Descripción Original / Destacados

[TEXTO ORIGINAL DEL ÍTEM DEL CV EN DISCUSIÓN]


### Refinamiento Sugerido (Amigable para ATS)

[TEXTO REESCRITO - RESUMEN CORTO/INTRODUCCIÓN SEGUIDO DE PUNTOS CLAVE]
- [Logro/responsabilidad optimizado para ATS 1 incorporando palabras clave]
- [Logro/responsabilidad optimizado para ATS 2 incorporando palabras clave]
- ...


### Cambios Realizados y Palabras Clave Usadas
* **Explicación:** [Resumen breve de adiciones/eliminaciones/reenfoques, explicando el 'por qué'].
* **Palabras Clave Incorporadas:** [PALABRA CLAVE 1], [PALABRA CLAVE 2], ...

---
**¿Refinamos más esta sugerencia, procedemos al siguiente ítem destacado o paramos?**
(Ingresa tu feedback, 'next'/'siguiente', o 'q' para salir)
"""


# --- 2. Main System Prompt / Initial Instruction ---
# Defines the AI's role, task, constraints, and the interactive workflow.
# Placeholders {n_max_exp}, {example}, {job_description}, {cv_experiences} will be formatted later.
content["en"][
    "prompt"
] = r"""You are an expert CV and resume writer, specialized in optimizing content for Applicant Tracking Systems (ATS) by strategically incorporating keywords from job descriptions.

**Your Goal:** Analyze the provided JOB DESCRIPTION and the list of RETRIEVED CV EXPERIENCES. Your primary task is to help the user refine the text of their most relevant CV experiences to strongly align with the keywords and requirements of the job description, making the CV more ATS-compatible and impactful.

**Output Requirements & Constraints:**
- Use ATS-friendly language (action verbs, relevant nouns).
- Be concise and focus on impact and achievements (quantify where possible).
- Use bullet points for lists of responsibilities or achievements in the refined text.
- **Strictly adhere** to the requested output formats below. Do NOT add extra conversational text, greetings, or self-corrections unless specifically part of the requested format.
- **Focus only on the text content** of the CV items provided. Do not invent new experiences or sections.

**--- Workflow ---**

**Phase 1: Initial Analysis (Your First Response)**
1.  Read the JOB DESCRIPTION and identify the **most critical keywords and key skills** (technical skills, soft skills, tools, qualifications). List them clearly.
2.  Review the RETRIEVED CV EXPERIENCES. Select the top **{n_max_exp} most relevant** experiences based on their similarity to the job description. List these under "HIGHLIGHTED EXPERIENCES", including title/position and company/organization.
3.  List any remaining retrieved experiences under "OTHER RETRIEVED EXPERIENCES".
4.  **Crucially**, end this initial analysis by asking the user **exactly** this question:
    "Based on the keywords and highlighted experiences, which **highlighted experience** would you like me to help you refine first? Please provide the title/position."

**Initial Analysis Output Format (Use exactly this structure):**

**ANALYSIS COMPLETE**

**1. Key Skills & Keywords from Job Description:**
* [Keyword/Skill 1]
* [Keyword/Skill 2]
* ...

**2. Highlighted CV Experiences (Most Relevant - Max {n_max_exp}):**
* **[Job Title / Project Name 1]** at [Company / Org 1]
* **[Job Title / Project Name 2]** at [Company / Org 2]
* ... (up to {n_max_exp} items)

**3. Other Retrieved CV Experiences:**
* **[Job Title / Project Name X]** at [Company / Org X]
* ... (remaining items, if any)

---
Based on the keywords and highlighted experiences, which **highlighted experience** would you like me to help you refine first? Please provide the title/position.


**Phase 2: Interactive Refinement (After User Selects an Experience)**
1.  **Wait** for the user to tell you which HIGHLIGHTED experience to refine (they will provide the title/position).
2.  Focus **only** on that single selected experience.
3.  Rewrite its description/highlights, strategically incorporating the relevant **KEYWORDS** identified in Phase 1. Aim for achievement-oriented, ATS-friendly language.
4.  Present the result using the **exact** example format provided below ({example}). Include the original text for comparison, explain the changes, and list the keywords used.
5.  **Wait** for the user's response. They might:
    * Provide feedback for further changes on the *current* suggestion.
    * Type 'next' (or similar) to approve and move to another highlighted item.
    * Type 'q' (or similar) to quit.
6.  If the user provides feedback, refine the *current* suggestion based on it and present the updated version in the same {example} format.
7.  If the user types 'next', ask them: "Okay, which **other highlighted experience** would you like to refine next? Please provide the title/position." Then wait for their selection (repeat Step 2).
8.  If the user types 'q', respond politely with "Understood. Exiting refinement process." and stop.

**--- Context Provided by User ---**

**JOB DESCRIPTION:**

{job_description}


**RETRIEVED CV EXPERIENCES (Input text for each item):**
yaml
{cv_experiences}


**--- Task Start ---**
Begin now by performing the **Phase 1: Initial Analysis Task** and present the output in the specified format, ending with the exact question asking the user which item to refine first. Do not proceed to Phase 2 until the user responds.
"""

# Spanish version needs adaptation of the workflow instructions and formats
content["es"][
    "prompt"
] = r"""Eres un experto redactor de CVs y currículums, especializado en optimizar contenido para Sistemas de Seguimiento de Candidatos (ATS) incorporando estratégicamente palabras clave de descripciones de puestos.

**Tu Objetivo:** Analiza la DESCRIPCIÓN DEL PUESTO proporcionada y la lista de EXPERIENCIAS DEL CV RECUPERADAS. Tu tarea principal es ayudar al usuario a refinar el texto de sus experiencias de CV más relevantes para alinearlas fuertemente con las palabras clave y requisitos de la descripción del puesto, haciendo el CV más compatible con ATS e impactante.

**Requisitos y Restricciones de Salida:**
- Usa lenguaje amigable para ATS (verbos de acción, sustantivos relevantes).
- Sé conciso y enfócate en impacto y logros (cuantifica donde sea posible).
- Usa puntos de lista (viñetas) para listas de responsabilidades o logros en el texto refinado.
- **Adhiérete estrictamente** a los formatos de salida solicitados a continuación. NO añadas texto conversacional extra, saludos o autocorrecciones a menos que sea específicamente parte del formato solicitado.
- **Enfócate solo en el contenido textual** de los ítems del CV proporcionados. No inventes nuevas experiencias o secciones.

**--- Flujo de Trabajo ---**

**Fase 1: Análisis Inicial (Tu Primera Respuesta)**
1.  Lee la DESCRIPCIÓN DEL PUESTO e identifica las **palabras clave y habilidades clave más críticas** (habilidades técnicas, blandas, herramientas, cualificaciones). Lístalas claramente.
2.  Revisa las EXPERIENCIAS DEL CV RECUPERADAS. Selecciona las **{n_max_exp} experiencias más relevantes** basándote en su similitud con la descripción del puesto. Lístalas bajo "EXPERIENCIAS DESTACADAS", incluyendo título/puesto y empresa/organización.
3.  Lista cualquier experiencia recuperada restante bajo "OTRAS EXPERIENCIAS RECUPERADAS".
4.  **Crucialmente**, termina este análisis inicial haciendo al usuario **exactamente** esta pregunta:
    "Basándome en las palabras clave y las experiencias destacadas, ¿qué **experiencia destacada** te gustaría que te ayude a refinar primero? Por favor, indica el título/puesto."

**Formato de Salida del Análisis Inicial (Usa exactamente esta estructura):**

**ANÁLISIS COMPLETO**

**1. Habilidades Clave y Palabras Clave de la Descripción del Puesto:**
* [Palabra Clave/Habilidad 1]
* [Palabra Clave/Habilidad 2]
* ...

**2. Experiencias Destacadas del CV (Más Relevantes - Máx {n_max_exp}):**
* **[Título Puesto / Nombre Proyecto 1]** en [Empresa / Org 1]
* **[Título Puesto / Nombre Proyecto 2]** en [Empresa / Org 2]
* ... (hasta {n_max_exp} ítems)

**3. Otras Experiencias Recuperadas del CV:**
* **[Título Puesto / Nombre Proyecto X]** en [Empresa / Org X]
* ... (ítems restantes, si hay)

---
Basándome en las palabras clave y las experiencias destacadas, ¿qué **experiencia destacada** te gustaría que te ayude a refinar primero? Por favor, indica el título/puesto.


**Fase 2: Refinamiento Interactivo (Después de que el Usuario Seleccione una Experiencia)**
1.  **Espera** a que el usuario te diga qué experiencia DESTACADA refinar (proporcionará el título/puesto).
2.  Enfócate **solo** en esa única experiencia seleccionada.
3.  Reescribe su descripción/puntos destacados, incorporando estratégicamente las **PALABRAS CLAVE** relevantes identificadas en la Fase 1. Busca un lenguaje orientado a logros y amigable para ATS.
4.  Presenta el resultado usando el formato de ejemplo **exacto** proporcionado ({example}). Incluye el texto original para comparación, explica los cambios y lista las palabras clave usadas.
5.  **Espera** la respuesta del usuario. Podría:
    * Dar feedback para más cambios en la sugerencia *actual*.
    * Escribir 'next'/'siguiente' (o similar) para aprobar y pasar a otro ítem destacado.
    * Escribir 'q' (o similar) para salir.
6.  Si el usuario da feedback, refina la sugerencia *actual* basándote en él y presenta la versión actualizada en el mismo formato {example}.
7.  Si el usuario escribe 'next'/'siguiente', pregúntale: "Entendido, ¿qué **otra experiencia destacada** te gustaría refinar ahora? Por favor, indica el título/puesto." Luego espera su selección (repite Paso 2).
8.  Si el usuario escribe 'q', responde cortésmente con "Entendido. Saliendo del proceso de refinamiento." y detente.

**--- Contexto Proporcionado por el Usuario ---**

**DESCRIPCIÓN DEL PUESTO:**

{job_description}


**EXPERIENCIAS DEL CV RECUPERADAS (Texto de entrada para cada ítem):**

{cv_experiences}


**--- Inicio de Tarea ---**
Comienza ahora realizando la **Fase 1: Tarea de Análisis Inicial** y presenta la salida en el formato especificado, terminando con la pregunta exacta solicitando al usuario qué ítem refinar primero. No procedas a la Fase 2 hasta que el usuario responda.
"""


# --- 3. User Interaction Messages ---
# Messages for guiding the user during the chat loop

# Quit message
content["en"]["quit_msg"] = "Enter 'q' or 'quit' to exit the refinement process."
content["es"][
    "quit_msg"
] = "Ingresa 'q' o 'salir' para terminar el proceso de refinamiento."

# Initial response header
content["en"]["initial_response"] = "LLM Initial Analysis Response:"
content["es"]["initial_response"] = "Respuesta de Análisis Inicial del LLM:"

# Exit confirmation message
content["en"]["exit_msg"] = "\nExiting interactive refinement loop."
content["es"]["exit_msg"] = "\nSaliendo del bucle de refinamiento interactivo."

# Interpretation of empty input (assuming confirmation/continue)
content["en"][
    "y_interpretation"
] = "(Interpreting empty input as 'next' or confirmation to proceed...)"
content["es"][
    "y_interpretation"
] = "(Interpretando entrada vacía como 'siguiente' o confirmación para proceder...)"

# Sending message indicator
content["en"]["send_message"] = "\n➡️ Sending your message to the AI assistant..."
content["es"]["send_message"] = "\n➡️ Enviando tu mensaje al asistente AI..."

# Receiving message indicator
content["en"]["received_message"] = "🤖 AI Assistant response:"
content["es"]["received_message"] = "🤖 Respuesta del Asistente AI:"

# Keyboard interrupt message
content["en"]["keyboard_interrupt"] = "\nLoop interrupted by user (Ctrl+C). Exiting."
content["es"][
    "keyboard_interrupt"
] = "\nBucle interrumpido por el usuario (Ctrl+C). Saliendo."

print("✅ Chat prompts and messages defined.")

✅ Chat prompts and messages defined.


### 7.1. Start the AI Chat Session

In [19]:
# @title Initialize Chat and Get Initial Analysis (Click ► to run)
# This cell prepares the context (job description and retrieved CV items)
# formats the initial instruction prompt for the AI, and starts the chat session.
# It then sends the initial prompt to get the AI's analysis (keywords and relevant items).

from IPython.display import Markdown, display
import google.generativeai as genai  # Using preferred import style
import sys

# --- Check for necessary variables ---
if "client" not in globals():
    print("❌ Error: AI Client object not initialized. Check previous steps.")
    sys.exit("Stopping: AI client missing.")
if "content" not in globals() or not content:
    print("❌ Error: Chat prompts dictionary is missing. Check Cell 47.")
    sys.exit("Stopping: Chat prompts not defined.")
if "LANGUAGE" not in globals() or LANGUAGE not in content:
    print(
        f"❌ Error: Language setting '{LANGUAGE}' is invalid or missing. Check Step"
        " 3.3."
    )
    sys.exit("Stopping: Invalid language setting.")
if "JOB_DESCRIPTION" not in globals() or not JOB_DESCRIPTION:
    print("❌ Error: Job Description is missing. Check Step 3.2.")
    sys.exit("Stopping: Job description required.")
if (
    "retrieved_documents" not in globals()
    or "retrieved_metadatas" not in globals()
    or "MAX_RELEVANT_ITEMS" not in globals()
):
    print(
        "❌ Error: Retrieved CV data or config is missing. Check RAG step (Cell 44) and"
        " Step 3.3."
    )
    sys.exit("Stopping: Missing data for chat context.")

# Initialize chat object variable
chat = None
initial_response = None

# --- 1. Format Retrieved Experiences for the Prompt Context ---
formatted_experiences = []
if retrieved_documents:  # Check if the list is not empty
    print("Formatting retrieved CV items for AI context...")
    for i, doc in enumerate(retrieved_documents):
        # Get metadata safely
        meta = retrieved_metadatas[i] if i < len(retrieved_metadatas) else {}
        # Extract identifying info from metadata (using keys added in prepare_embedding_data)
        section = meta.get("section", "Unknown Section")
        item_name = meta.get(
            "item_name", meta.get("item_title", "")
        )  # Try 'name' then 'title'
        item_position = meta.get("item_position", "")
        item_org = meta.get(
            "item_organization",
            meta.get("item_institution", meta.get("item_issuer", "")),
        )  # Try org/institution/issuer

        # Create a header for context
        header = f"{item_name or 'Item ' + str(i+1)}"  # Use name or fallback
        if item_position:
            header += f" ({item_position})"
        if item_org:
            header += f" at {item_org}"
        header += f" [Section: {section}]"

        # Append formatted item to list
        formatted_experiences.append(f"--- CV Item {i+1}: {header} ---\n{doc}\n")
    experiences_context_string = "\n".join(formatted_experiences)
    print(f"   Formatted {len(formatted_experiences)} items.")
else:
    # Handle case where RAG returned nothing
    experiences_context_string = (
        "No relevant experiences were retrieved from the database for this job"
        " description.\n(The AI might not be able to provide specific refinement"
        " suggestions)."
    )
    print("⚠️ Warning: No retrieved CV experiences to include in the prompt.")


# --- 2. Select Language and Format Final Initial Prompt ---
content_lang = content[LANGUAGE]  # Select EN or ES content dict based on user setting
if not content_lang:
    print(f"❌ Error: Could not load prompts for the selected language '{LANGUAGE}'.")
    sys.exit("Stopping: Language prompts missing.")

try:
    initial_prompt = content_lang["prompt"].format(
        n_max_exp=MAX_RELEVANT_ITEMS,
        example=content_lang["example"],  # The example format string
        job_description=JOB_DESCRIPTION,
        cv_experiences=experiences_context_string,  # The formatted context string
    )
except KeyError as e:
    print(
        f"❌ Error: Missing key '{e}' in language content dictionary. Check Cell 47"
        " definitions."
    )
    sys.exit("Stopping: Prompt formatting failed.")
except Exception as e:
    print(f"❌ Error formatting initial prompt: {type(e).__name__} - {e}")
    sys.exit("Stopping: Prompt formatting failed.")


# --- 3. Initialize LLM Chat Session (using google.genai client style) ---
print(f"\nInitializing AI chat session with model '{GENERATIVE_MODEL_NAME}'...")
try:
    # Define generation configuration for the chat
    # Using parameters defined in Internal Config (Cell 3.5)
    rewriting_generation_config = {
        "temperature": REWRITING_LLM_TEMPERATURE,
        "top_p": REWRITING_LLM_TOP_P,
        "top_k": REWRITING_LLM_TOP_K,
    }
    # Safety settings could be added here if needed

    # Start a new chat session using the client object
    # Pass history=[] to start fresh
    # Note: Passing config directly to client.chats.create might depend on client version.
    # Alternative is setting config when sending message if needed.
    chat = client.chats.create(
        model=GENERATIVE_MODEL_NAME,
        history=[],
        # config=rewriting_generation_config # Check if client.chats.create accepts config
    )
    # Store config separately to pass during send_message if needed
    chat_config = rewriting_generation_config

    print("✅ Chat session initialized.")

    # --- 4. Send Initial Prompt and Get AI's First Analysis ---
    print("\nSending initial prompt to AI for analysis...")
    print("   (This first AI response might take a bit longer...)")

    # Send the formatted prompt using the chat object
    # Pass the generation config here if not accepted during creation
    initial_response = chat.send_message(
        initial_prompt,
        config=chat_config,  # Pass generation config for this specific turn
    )

    print("✅ Initial AI analysis received.")

    # Basic check of the response
    if (
        not initial_response
        or not hasattr(initial_response, "text")
        or not initial_response.text
    ):
        print("❌ Error: Received an invalid or empty initial response from the AI.")
        initial_response = None  # Ensure it's None for checks in next cell
        # sys.exit("Stopping: Invalid initial AI response.") # Optional stop
    else:
        # Optional: print a snippet for debugging
        # print("\n--- AI Initial Response Snippet ---")
        # print(initial_response.text[:300] + "...")
        # print("--- End Snippet ---")
        pass


except Exception as e:
    print("\n❌ ERROR initializing chat or sending initial message:")
    print(f"   Error Type: {type(e).__name__}")
    print(f"   Error details: {e}")
    chat = None  # Ensure chat object is None on error
    initial_response = None
    sys.exit("Stopping: Failed to start chat session.")


# --- Final validation for next step ---
if chat is None or initial_response is None:
    print(
        "\n❌ Critical Error: Chat session could not be started or initial AI response"
        " failed."
    )
    print("   Please check API key, model name, and previous cell outputs.")
    # Ensure execution stops if chat isn't ready
    sys.exit("Stopping: Chat setup failed.")
else:
    print("\nChat ready for interaction.")

Formatting retrieved CV items for AI context...
   Formatted 5 items.

Initializing AI chat session with model 'gemini-2.0-flash'...
✅ Chat session initialized.

Sending initial prompt to AI for analysis...
   (This first AI response might take a bit longer...)
✅ Initial AI analysis received.

Chat ready for interaction.


### 7.2. Interactive Refinement Chat

Run the code cell below to start the interactive chat session with the AI assistant.

1.  It will first display the initial analysis (keywords and relevant CV items) that it prepared in the previous step.
2.  Read the analysis and the AI's question at the end.
3.  Type your response in the input box that appears (e.g., tell it which CV item title/position you want to refine first).
4.  The notebook will send your message to the AI and display its response (which might be a rewritten suggestion or another question).
5.  Continue chatting, providing feedback or instructions ('next' to move on) until you're satisfied or want to stop.
6.  Enter `q` or `quit` at any time to exit the chat loop.

In [21]:
# @title Start Interactive Chat Loop (Click ►)

from IPython.display import Markdown, display
import sys

# Check if chat objects exist from previous step
if "chat" not in globals() or chat is None:
    display(
        Markdown(
            "❌ **Error:** Chat session is not initialized. Please run Step 7.1 first."
        )
    )
    sys.exit("Stopping: Chat not ready.")
if "initial_response" not in globals() or initial_response is None:
    display(
        Markdown(
            "❌ **Error:** Initial AI response is missing. Please run Step 7.1 first."
        )
    )
    sys.exit("Stopping: Initial AI response missing.")
if "content" not in globals() or "LANGUAGE" not in globals() or LANGUAGE not in content:
    display(
        Markdown(
            "❌ **Error:** Language content for messages is missing. Check config and"
            " prompt definitions."
        )
    )
    sys.exit("Stopping: Language content missing.")
if "chat_config" not in globals():
    display(Markdown("❌ **Error:** Chat generation configuration missing."))
    sys.exit("Stopping: Chat config missing.")

# --- Select Language for User Messages ---
content_lang = content[LANGUAGE]
quit_options = {"q", "quit", "exit", "salir", "n", "no"}  # User inputs to exit loop
next_options = {
    "next",
    "siguiente",
    "y",
    "yes",
    "si",
    "",
}  # User inputs to move to next item or confirm (empty input treated as 'next')


# --- 1. Display Initial LLM Analysis ---
display(Markdown("---"))  # Separator
display(
    Markdown(
        "### 🤖"
        f" {content_lang.get('initial_response', 'AI Initial Analysis Response:')}"
    )
)  # Use message from dict
display(Markdown("---"))
# Display the first response text which should already be Markdown formatted by the AI
display(Markdown(initial_response.text))
display(Markdown("---"))  # Separator


# --- 2. Interactive Refinement Loop ---
quit_msg = content_lang.get("quit_msg", "Enter 'q' or 'quit' to exit")

while True:
    # Display quit instructions clearly before each prompt
    display(Markdown(f"**{quit_msg}**"))
    try:
        # Prompt user for input (input() itself doesn't use Markdown)
        user_input = input("Your response > ")
        display(Markdown("---"))  # Separator after user input

        # Check if the user wants to exit
        if user_input.lower().strip() in quit_options:
            display(
                Markdown(
                    f"*{content_lang.get('exit_msg', 'Exiting interactive loop.')}*"
                )
            )
            break  # Exit the while loop

        # Handle empty input as confirmation/next
        interpreted_input = user_input
        if user_input.strip() == "":
            interpreted_input = "next"  # Treat empty as wanting to proceed
            display(
                Markdown(
                    f"*{content_lang.get('y_interpretation', '(Interpreting empty input as next/confirm...)')}*"
                )
            )

        # Send the user's message to the LLM via the chat session
        display(
            Markdown(
                f"*{content_lang.get('send_message', '➡️ Sending message to AI...')}*"
            )
        )
        # Pass the generation config with each message if needed
        llm_response = chat.send_message(interpreted_input, config=chat_config)
        display(
            Markdown(
                f"### {content_lang.get('received_message', '🤖 AI response received:')}"
            )
        )
        display(Markdown("---"))  # Separator before AI response

        # Display the LLM's response using Markdown rendering
        display(Markdown(llm_response.text))
        display(Markdown("---"))  # Separator after AI response

        # Check if the AI signaled exit (optional based on prompt design)
        if "Exiting refinement process." in llm_response.text:
            display(Markdown("*AI indicated process is complete.*"))
            break

    except KeyboardInterrupt:
        display(
            Markdown(
                f"*{content_lang.get('keyboard_interrupt', 'Loop interrupted by user. Exiting.')}*"
            )
        )
        break
    except Exception as e:
        # Catch potential errors during chat interaction (e.g., API errors)
        display(Markdown("❌ **ERROR during interactive chat:**"))
        display(Markdown(f"   *Error Type: {type(e).__name__}*"))
        display(Markdown(f"   *Error details: {e}*"))
        display(Markdown("   *Exiting interactive loop due to error.*"))
        break  # Exit loop on error

display(Markdown("--- **Interactive Chat Finished** ---"))

---

### 🤖 LLM Initial Analysis Response:

---

**ANALYSIS COMPLETE**

**1. Key Skills & Keywords from Job Description:**
* AI/Machine Learning
* Machine Learning Models
* Deep Learning Models
* NLP
* Predictive Analytics
* Anomaly Detection
* Personalization
* Data Pipelines
* MLOps
* Python
* Scikit-learn
* TensorFlow
* PyTorch
* Keras
* SQL
* NoSQL
* Pandas
* NumPy
* Data Science
* Model Deployment
* Data Exploration
* Data Engineering
* Data Analysis
* Communication Skills
* Teamwork

**2. Highlighted CV Experiences (Most Relevant - Max 3):**
* **Stanford Graduate School of Education (Research Assistant – Educational Technology)** at Stanford Graduate School of Education
* **CodeForge Inc (Senior Web Developer)** at CodeForge Inc
* **Freelance (Full-Stack Developer)** at Freelance

**3. Other Retrieved CV Experiences:**
* **OpenData Labs (Junior Software Engineer)** at OpenData Labs
* **UC Berkeley Library System (IT Support Specialist)** at UC Berkeley Library System

---
Based on the keywords and highlighted experiences, which **highlighted experience** would you like me to help you refine first? Please provide the title/position.


---

**Enter 'q' or 'quit' to exit the refinement process.**

Your response > stanford


---

*
➡️ Sending your message to the AI assistant...*

### 🤖 AI Assistant response:

---

## Position Name / Title: Stanford Graduate School of Education (Research Assistant – Educational Technology)
- Company Name / Org / Issuer: Stanford Graduate School of Education

### Original Description / Highlights
Developed analytics dashboards for MOOCs using Python and Flask, Analyzed LMS interaction data for learner engagement patterns, Contributed to academic papers on learning feedback systems

### Suggested Refinement (ATS-Friendly)

Applied machine learning techniques to enhance online learning platforms, focusing on data-driven insights and model development.
- Developed interactive analytics dashboards using Python and Flask to monitor MOOC performance and learner behavior, contributing to data-driven decision-making.
- Analyzed large-scale Learning Management System (LMS) interaction data to identify learner engagement patterns and predict student success, informing personalized learning interventions.
- Contributed to research and publications on learning feedback systems, showcasing expertise in applying AI to improve educational outcomes.

### Changes Made & Keywords Used
* **Explanation:** Expanded on the original descriptions to highlight the application of machine learning and data analysis skills relevant to the target job. Added details to quantify impact and align with the job description's focus on model development, data analysis, and communication of findings.
* **Keywords Incorporated:** Machine Learning, Python, Data Analysis, Data-Driven, Personalization, Model Development
---
**Shall we refine this suggestion further, proceed to the next highlighted item, or stop?**
(Enter your feedback, 'next', or 'q' to quit)


---

**Enter 'q' or 'quit' to exit the refinement process.**

Your response > codeforge inc


---

*
➡️ Sending your message to the AI assistant...*

### 🤖 AI Assistant response:

---

You chose "codeforge inc" when I asked you to pick from the HIGHLIGHTED experiences. Did you mean to type 'next' so we can move to the next highlighted experience? If not, please confirm which of the highlighted experiences you would like me to refine.


---

**Enter 'q' or 'quit' to exit the refinement process.**

Your response > next experience


---

*
➡️ Sending your message to the AI assistant...*

### 🤖 AI Assistant response:

---

Okay, which **other highlighted experience** would you like to refine next? Please provide the title/position.


---

**Enter 'q' or 'quit' to exit the refinement process.**

Your response > the next


---

*
➡️ Sending your message to the AI assistant...*

### 🤖 AI Assistant response:

---

I understand you want to move on to the next highlighted experience. The remaining highlighted experience is **Freelance (Full-Stack Developer)** at Freelance.

Shall I proceed with refining that experience? (yes/no)


---

**Enter 'q' or 'quit' to exit the refinement process.**

Your response > 


---

*(Interpreting empty input as 'next' or confirmation to proceed...)*

*
➡️ Sending your message to the AI assistant...*

### 🤖 AI Assistant response:

---

## Position Name / Title: Freelance (Full-Stack Developer)
- Company Name / Org / Issuer: Freelance

### Original Description / Highlights
Designed custom web solutions using Django and Vue.js, 'Integrated Stripe, Twilio, and RESTful APIs for client systems', Built dashboards and CRMs for non-profits and startups

### Suggested Refinement (ATS-Friendly)

Developed and deployed full-stack web applications, integrating machine learning models and data analytics for diverse client needs.
- Designed and implemented custom web solutions using Django and Vue.js, incorporating data-driven features and scalable architectures.
- Integrated RESTful APIs, including Stripe and Twilio, to enhance application functionality and enable seamless data exchange between systems.
- Built interactive dashboards and CRMs for non-profits and startups, leveraging data visualization techniques to provide actionable insights and improve decision-making.

### Changes Made & Keywords Used
* **Explanation:** Reframed the experience to emphasize the full-stack development skills while subtly incorporating keywords related to data analysis and model integration. The focus is shifted towards quantifiable achievements and the impact of the developed solutions.
* **Keywords Incorporated:** Data Analysis, Model Deployment, APIs, Data-Driven
---
**Shall we refine this suggestion further, proceed to the next highlighted item, or stop?**
(Enter your feedback, 'next', or 'q' to quit)


---

**Enter 'q' or 'quit' to exit the refinement process.**

Your response > q


---

*
Exiting interactive refinement loop.*

--- **Interactive Chat Finished** ---

## 8. Finished! Next Steps

Great work! You've successfully used the AI assistant to refine key parts of your CV based on the specific job description you provided.

### ✅ What You Achieved:

* **Loaded & Indexed:** Your CV information was loaded and organized into a "Smart Index".
* **Found Relevance:** The AI identified which parts of your CV were most relevant to the target job description.
* **Refined Content:** Through the interactive chat, you received suggestions for rewriting those relevant sections, incorporating important keywords and focusing on impact to better appeal to both ATS software and human reviewers.

### 🚀 Next Steps: IMPORTANT!

The AI provided suggestions, but the final step is yours:

1.  **Copy the Refined Text:** Carefully scroll back through the chat output in **Step 7.2** above. Copy the final "Suggested Refinement" text snippets for the CV items you are happy with.
2.  **Update Your Actual CV:** Paste the copied text into your real CV document (e.g., your Microsoft Word `.docx` file, Google Doc, or wherever you maintain your CV). Replace the old descriptions with the new, refined versions.
3.  **Review & Polish:** Read through your entire updated CV. Ensure the new text flows well with the rest of your document, maintains your personal tone, and is free of errors. Make any final tweaks needed.

Tailoring your CV for each application significantly increases your chances. Good luck with your job search!

### About This Project & Feedback

🌟 **CONGRATULATIONS!** 🌟

If you’ve made it all the way to the end of this notebook (whether you’re testing it out, a fellow data enthusiast, or just curious about the process), you definitely deserve some kudos!

This project is constantly evolving, and your feedback helps keep it improving.

👉 **Show some love by giving the project's repository a star on GitHub:**
[https://github.com/framunoz/cv-analyser-with-rag/](https://github.com/framunoz/cv-analyser-with-rag/)

A single ⭐ helps the framework grow, motivates new features, and lets the creators know this work is valuable to the community.

Thanks for reading, building, and experimenting alongside us—see you in the next commit!

### Authors

* [Francisco Muñoz Guajardo](https://www.linkedin.com/in/femunozg/)
* [Gabriel Ortega Hernández](https://www.linkedin.com/in/gabriel-ortega-hernandez-evaluacion-educativa/)

### How to Cite This Work

If you use this notebook or the underlying project in your work, please cite it as follows:

**APA Style (example):**

Muñoz Guajardo, F., & Ortega Hernández, G. (2025). *AI-Driven CV Optimisation with RAG* (Colab Notebook). Retrieved from https://github.com/framunoz/cv-analyser-with-rag/

**BibTeX Entry (example):**

```bibtex
@misc{MunozOrtega2025CVA RAG,
  author = {Mu{\~n}oz Guajardo, Francisco and Ortega Hern{\'a}ndez, Gabriel},
  title = {{AI-Driven CV Optimisation with RAG (Colab Notebook)}},
  year = {2025},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{[https://github.com/framunoz/cv-analyser-with-rag/](https://github.com/framunoz/cv-analyser-with-rag/)}}
}
```