# Secure PDF Bill of Materials (BoM) Analyzer (via ScopePDF API)

Welcome! This Google Colab notebook provides a password-protected interface for extracting and reviewing Bill of Materials (BoM) data from PDF files with the **ScopePDF API**, part of the Scope4 API family.

**What you can do here:**
* Securely authenticate to use the notebook.
* Upload a PDF file containing a Bill of Materials.
* Send the PDF securely to the ScopePDF API for analysis.
* View the extracted materials data directly in the notebook.
* Save the extracted data as a CSV file and download it from the Colab Files panel.
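Under the hood, the upload, analysis, and download steps exchange a single JSON document with the API. The offline sketch below mirrors the field names used by the setup cell (`pdf_filename` and `pdf_base64_content` in the request; `status`, `extracted_data`, and `materials` in the response). The helper names are hypothetical and the sample response is made up. The sketch writes CSV with the standard-library `csv` module instead of pandas, and it assumes each material is a flat dictionary, because the actual material fields are defined by the API.

```python
import base64
import csv
import io
import json
import os


def build_payload(pdf_filename: str, pdf_bytes: bytes) -> dict:
    """Build the JSON body the notebook POSTs to the ScopePDF API (hypothetical helper)."""
    return {
        "pdf_filename": pdf_filename,
        # The PDF travels as Base64 text inside the JSON body.
        "pdf_base64_content": base64.b64encode(pdf_bytes).decode("utf-8"),
    }


def materials_to_csv(api_result: dict) -> str:
    """Return CSV text for a successful response; raise ValueError otherwise."""
    if api_result.get("status") != "success" or "extracted_data" not in api_result:
        raise ValueError(api_result.get("error", "unexpected response structure"))
    materials = api_result["extracted_data"].get("materials", [])
    if not isinstance(materials, list):
        raise ValueError("'materials' is not a list")
    if not materials:
        return ""  # nothing was extracted
    # Union of keys across rows, in first-seen order, so sparse rows still fit.
    fieldnames = []
    for row in materials:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)  # missing keys become ""
    writer.writeheader()
    writer.writerows(materials)
    return buffer.getvalue()


def csv_name_for(pdf_filename: str) -> str:
    """Same naming rule as the notebook: '<stem>_extracted_materials.csv'."""
    return f"{os.path.splitext(pdf_filename)[0]}_extracted_materials.csv"


if __name__ == "__main__":
    payload = build_payload("bom.pdf", b"%PDF-1.4 example")
    print(json.dumps(payload)[:60])
    # A made-up response in the shape the notebook expects.
    fake_result = {
        "status": "success",
        "extracted_data": {"materials": [{"name": "Steel", "quantity_kg": 2.5}]},
    }
    print(csv_name_for("bom.pdf"))
    print(materials_to_csv(fake_result))
```

Keeping the payload and CSV logic in plain functions like these makes them easy to check without a password or network access, leaving the widget callback to handle only I/O.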

**Security:**
* Access to this notebook requires a password.
* The SHA-256 hash of this password is also sent as the Bearer token that authenticates your requests to the ScopePDF API.
* Data is transferred securely via HTTPS.

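Concretely, the setup cell computes the SHA-256 hex digest of the password you type and compares it with `EXPECTED_PASSWORD_HASH`. On a match, it sends that same digest as a Bearer token. The sketch below reproduces this derivation, which is also how an administrator can produce the value to paste into `EXPECTED_PASSWORD_HASH`. The helper names are hypothetical. Using `hmac.compare_digest` (a constant-time comparison) is this sketch's own choice; the notebook itself compares with `==`.

```python
import hashlib
import hmac
from typing import Optional


def password_to_token(password: str) -> str:
    """SHA-256 hex digest of the password, as computed by the setup cell."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def check_password(password: str, expected_hash: str) -> Optional[str]:
    """Return the token if the password matches, else None (hypothetical helper)."""
    token = password_to_token(password)
    # Constant-time comparison avoids leaking how many leading characters match.
    if hmac.compare_digest(token, expected_hash):
        return token
    return None


def auth_headers(token: str) -> dict:
    """Headers the notebook sends with each ScopePDF API request."""
    return {"Content-Type": "application/json", "Authorization": f"Bearer {token}"}


if __name__ == "__main__":
    # Administrator: print the value to paste into EXPECTED_PASSWORD_HASH.
    expected = password_to_token("correct horse battery staple")
    print(expected)
    token = check_password("correct horse battery staple", expected)
    print(auth_headers(token))
    print(check_password("wrong guess", expected))  # None
```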
**Prerequisites:**
* You need a **password** for this notebook. This acts as your API key for the ScopePDF API. You will be prompted to enter it when you run the setup cell.
* **How to get a password/API key:** Please request access via the [Builder Form](https://scope4.dev/request-form-builder) or [Enterprise Form](https://scope4.dev/request-form-enterprise) (the same process as for the ScopeGreen API).

*For questions or support, contact tommaso@scope4.dev.*

## 1. Setup and Authentication

Run the code cell below first. It imports the necessary libraries and handles authentication, prompting you for your password.

In [None]:
#@title üîê Secure PDF Bill of Materials Analyzer Setup
# Access to this notebook's functionality is password protected.

import os
import json
import base64
import pandas as pd
import requests
from google.colab import files, output
import ipywidgets as widgets
from IPython.display import display, HTML, Javascript
import hashlib
import getpass

# --- Configuration ---

# IMPORTANT: The administrator must set the correct ScopePDF API endpoint URL.
SCOPEPDF_API_URL = "https://europe-west1-lcabench-api.cloudfunctions.net/process_bom_from_colab"
# IMPORTANT: The administrator must set the SHA-256 hex digest of the notebook password.
# This hash is used BOTH for notebook access AND as the Bearer token for the ScopePDF API,
# so it is effectively a credential: keep it out of copies shared with unauthorized users.
EXPECTED_PASSWORD_HASH = "your-SHA256-password"

# --- Global variable to store the auth token after successful login ---
COLAB_SESSION_TOKEN = None

# --- Password Authentication Function ---
def authenticate_user():
    """Prompts user for password, validates it, and sets the session token."""
    global COLAB_SESSION_TOKEN

    # Check if the placeholder hash is still present
    if EXPECTED_PASSWORD_HASH == "YOUR_SHA256_PASSWORD_HASH_HERE" or EXPECTED_PASSWORD_HASH == "your-SHA256-password":
        display(HTML("<p style='color:red; font-weight:bold;'>ADMIN ACTION REQUIRED: Please set the 'EXPECTED_PASSWORD_HASH' in the Colab script. This hash is required for both notebook access and ScopePDF API authentication.</p>"))
        return False

    display(HTML("<h4>üîí Notebook Access Authentication</h4>"))
    # Use getpass for secure password input in Colab
    user_entered_password = getpass.getpass("Enter the password required for this notebook: ")

    # Hash the entered password using SHA256
    hashed_user_input = hashlib.sha256(user_entered_password.encode('utf-8')).hexdigest()

    # Compare the hashed input with the expected hash
    if hashed_user_input == EXPECTED_PASSWORD_HASH:
        display(HTML("<p style='color:green;'>Authentication successful. Welcome!</p><hr>"))
        # Set the session token upon successful authentication (using the hash as the token)
        COLAB_SESSION_TOKEN = EXPECTED_PASSWORD_HASH
        return True
    else:
        display(HTML("<p style='color:red; font-weight:bold;'>Authentication failed. Incorrect password.</p>"))
        COLAB_SESSION_TOKEN = None
        return False

# --- UI Elements (initially None or hidden until authenticated) ---
file_upload_widget = None
process_button_widget = None
# Output area to display results and messages
output_display_area = widgets.Output()

# --- Action Handler for PDF Processing --- (Interacts with ScopePDF API)
def on_process_button_clicked(b):
    """Handles the 'Analyze PDF Securely' button click event."""
    global COLAB_SESSION_TOKEN
    # Use the output area to display messages for this operation
    with output_display_area:
        output_display_area.clear_output(wait=True) # Clear previous output

        # 1. Check Authentication
        if not COLAB_SESSION_TOKEN:
            display(HTML("<p style='color:red; font-weight:bold;'>Error: Not authenticated. Please re-run the authentication cell (Cell 1).</p>"))
            return

        # 2. Check ScopePDF API URL Configuration
        if SCOPEPDF_API_URL == "YOUR_HTTPS_TRIGGER_URL_FROM_GOOGLE_CLOUD_FUNCTION_HERE" or not SCOPEPDF_API_URL:
            display(HTML("<p style='color:red; font-weight:bold;'>Error: ScopePDF API URL is not configured in the notebook. Please contact the administrator.</p>"))
            return

        # 3. Check if a file has been uploaded
        if not file_upload_widget or not file_upload_widget.value:
            display(HTML("<p style='color:orange;'>Please upload a PDF file first.</p>"))
            return

        # 4. Process Uploaded File
        uploaded_value = file_upload_widget.value
        # ipywidgets 7 stores uploads as a dict keyed by filename; ipywidgets 8 uses a tuple of dicts
        if not isinstance(uploaded_value, (dict, tuple, list)) or not uploaded_value:
            display(HTML("<p style='color:red;'>Error reading uploaded file data.</p>"))
            return

        try:
            # Extract file info (single file upload)
            if isinstance(uploaded_value, dict):
                # ipywidgets 7: {name: {'metadata': {'name': ...}, 'content': bytes}}
                uploaded_file_info = next(iter(uploaded_value.values()))
                pdf_filename = uploaded_file_info['metadata']['name']
            else:
                # ipywidgets 8: ({'name': ..., 'content': memoryview, ...},)
                uploaded_file_info = uploaded_value[0]
                pdf_filename = uploaded_file_info['name']
            pdf_content_bytes = uploaded_file_info['content']  # b64encode accepts bytes or memoryview

            display(HTML(f"<p>Sending '{pdf_filename}' to ScopePDF API for analysis... This may take a moment.</p>"))

            # 5. Prepare data for ScopePDF API
            base64_encoded_pdf = base64.b64encode(pdf_content_bytes).decode('utf-8')
            payload_to_api = {
                "pdf_filename": pdf_filename,
                "pdf_base64_content": base64_encoded_pdf
            }

            # 6. Set up headers for the API request, including the auth token
            request_headers = {
                "Content-Type": "application/json",
                "Authorization": f"Bearer {COLAB_SESSION_TOKEN}" # Use the validated password hash as token
            }

            # 7. Send request to the ScopePDF API
            response_from_api = requests.post(
                SCOPEPDF_API_URL,
                json=payload_to_api,
                headers=request_headers,
                timeout=660 # Set a generous timeout (11 minutes) for potentially long processing
            )
            # Raise an exception for bad status codes (4xx or 5xx)
            response_from_api.raise_for_status()

            # 8. Process ScopePDF API Response
            api_result = response_from_api.json()

            # Handle successful response with extracted data
            if api_result.get("status") == "success" and "extracted_data" in api_result:
                display(HTML("<h4 style='color:green;'>Analysis Successful!</h4>"))
                extracted_data = api_result["extracted_data"]
                materials = extracted_data.get("materials", [])

                # Display extracted materials if available
                if isinstance(materials, list) and materials:
                    df = pd.DataFrame(materials)
                    display(HTML("<h5>Extracted Materials:</h5>"))
                    display(df) # Display DataFrame in Colab

                    # Generate and save CSV
                    csv_filename = f"{os.path.splitext(pdf_filename)[0]}_extracted_materials.csv"
                    df.to_csv(csv_filename, index=False)
                    display(HTML(f"<p>File '{csv_filename}' has been generated and saved. You can download it from the 'Files' panel on the left-hand side of the Colab interface.</p>"))

                elif not materials:
                    display(HTML("<p>No materials were extracted from the PDF according to the ScopePDF API analysis.</p>"))
                else:
                    # 'materials' is present and non-empty but is not a list
                    display(HTML("<p style='color:orange;'>The 'materials' data from the ScopePDF API was not in the expected list format.</p>"))

            # Handle API error response
            elif "error" in api_result:
                error_msg = api_result.get("error", "Unknown error from ScopePDF API.")
                details_msg = api_result.get("details", "")
                raw_output_msg = api_result.get("raw_llm_output", "") # Optional raw output from API
                display(HTML(f"<p style='color:red; font-weight:bold;'>ScopePDF API Error: {error_msg}</p>"))
                if details_msg: display(HTML(f"<p style='color:red;'>Details: {str(details_msg)}</p>"))
                # Display a snippet of raw LLM output if provided by the API for debugging
                if raw_output_msg: display(HTML(f"<p style='color:red;'>Raw LLM (from API - snippet): {str(raw_output_msg)[:500]}...</p>"))

            # Handle unexpected API response structure
            else:
                display(HTML("<p style='color:red;'>Received an unexpected response structure from the ScopePDF API.</p>"))
                display(HTML(f"<pre>{json.dumps(api_result, indent=2)}</pre>")) # Show the raw response

        # 9. Handle Specific Errors (HTTP, Network, General)
        except requests.exceptions.HTTPError as http_err:
            error_text = f"HTTP Error communicating with ScopePDF API: {http_err}."
            # Try to get more details from the response body
            if http_err.response is not None:
                try:
                    error_details = http_err.response.json()
                    error_text += f" Server status: {http_err.response.status_code}. API said: {json.dumps(error_details)}"
                except json.JSONDecodeError:
                    error_text += f" Server status: {http_err.response.status_code}. API said: {http_err.response.text}"
            else:
                error_text += " No response received from API."
            display(HTML(f"<p style='color:red;'>{error_text}</p>"))
        except requests.exceptions.RequestException as req_err:
            display(HTML(f"<p style='color:red;'>Network error communicating with ScopePDF API: {req_err}</p>"))
        except Exception as e:
            display(HTML(f"<p style='color:red;'>An unexpected error occurred in the Colab notebook: {e}</p>"))

        # 10. Cleanup: reset the file upload widget so the next analysis needs a fresh upload
        finally:
            if file_upload_widget is not None and file_upload_widget.value:
                if isinstance(file_upload_widget.value, dict):
                    # ipywidgets 7: clear the read-only dict in place, then reset the
                    # upload counter shown on the button (common community workaround)
                    file_upload_widget.value.clear()
                    file_upload_widget._counter = 0
                else:
                    # ipywidgets 8: value is a tuple; assigning an empty tuple clears it
                    file_upload_widget.value = ()


# --- Main Application Flow --- (Initialization Part)
def initialize_interface():
    """Sets up the UI elements after successful authentication."""
    global file_upload_widget, process_button_widget

    # Display configuration warnings if needed (for administrator)
    config_ok = True
    if SCOPEPDF_API_URL == "YOUR_HTTPS_TRIGGER_URL_FROM_GOOGLE_CLOUD_FUNCTION_HERE" or not SCOPEPDF_API_URL:
        display(HTML("<h2 style='color:red;'>ADMIN CONFIGURATION NEEDED!</h2>"))
        display(HTML("<p style='color:red; font-weight:bold;'>The 'SCOPEPDF_API_URL' variable in this notebook's code is NOT set. Please edit the script and replace the placeholder with the actual ScopePDF API URL.</p><hr>"))
        config_ok = False

    if EXPECTED_PASSWORD_HASH == "YOUR_SHA256_PASSWORD_HASH_HERE" or EXPECTED_PASSWORD_HASH == "your-SHA256-password":
        display(HTML("<h2 style='color:red;'>ADMIN CONFIGURATION NEEDED!</h2>"))
        display(HTML("<p style='color:red; font-weight:bold;'>The 'EXPECTED_PASSWORD_HASH' variable in this notebook's code is NOT set. Please generate a password hash and update the script. This hash is also used as the ScopePDF API access token.</p><hr>"))
        config_ok = False

    if not config_ok:
        return  # Stop if configuration is missing

    # --- Authentication Step ---
    if authenticate_user():
        # --- Post-Authentication UI Setup ---

        display(HTML("<h2>2. Upload and Analyze PDF</h2>"))
        # 1. File Upload Widget
        display(HTML("<h3>Upload your BoM PDF File:</h3>"))
        file_upload_widget = widgets.FileUpload(
            accept='.pdf',  # Only accept PDF files
            multiple=False, # Allow only single file upload
            description='Upload BoM PDF'
        )
        display(file_upload_widget)

        # 2. Process Button
        display(HTML("<h3>Start Secure Analysis:</h3>"))
        process_button_widget = widgets.Button(
            description="Analyze PDF Securely",
            button_style='success', # Green button
            tooltip='Upload a PDF and click to start the secure analysis via ScopePDF API',
            icon='rocket' # Add an icon
        )
        # Link the button click to the handler function
        process_button_widget.on_click(on_process_button_clicked)
        display(process_button_widget)

        # 3. Output Area
        display(HTML("<hr><h3>Results:</h3>"))
        display(output_display_area) # Display the area where results/errors will appear
    else:
        # Authentication failed, message already displayed by authenticate_user()
        pass

# --- Entry Point ---
# Run the initialization function when this cell is executed.
initialize_interface()
