
# Code Overview

This code implements a **real-time data anonymization system** using **GPT-4o** for text extraction and **Presidio** for sensitive data detection and anonymization. It processes webcam input, detects sensitive information, anonymizes it, and provides annotated output via a Gradio interface.

---

## Functions and Classes

### **GetAnalyzer**
- **`__init__`**: Loads the custom recognizers (numbers, credit cards, emails, URLs, phone numbers) and sets the list of supported languages.
- **`GetNLPEngine(initLanguage)`**: Returns a configured NLP engine for the specified language.
- **`Run(initLanguage)`**: Creates an `AnalyzerEngine` with that NLP engine, registers the custom recognizers and a person-name recognizer, and returns it.

### **GetAnonymizer**
- **`__init__`**: Does nothing; the class holds no state.
- **`Run`**: Returns a Presidio `AnonymizerEngine` for anonymizing detected sensitive entities.

### **RETURNNLPPROVIDER(initLanguage)**
- Configures and initializes the NLP engine provider for language-specific processing.
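
The `DEFINELANGUAGE` helper that produces the NLP configuration is not shown in this notebook. As a rough sketch, a helper of this kind could return the dictionary shape that Presidio's `NlpEngineProvider` accepts through `nlp_configuration`. The function name `define_language`, the `SPACY_MODELS` table, and the choice of spaCy models below are assumptions made for illustration, not the notebook's actual values.

```python
# Hypothetical stand-in for the notebook's DEFINELANGUAGE helper (not shown in the source).
# The model names are assumptions; substitute whichever spaCy models are installed.
SPACY_MODELS = {
    "en": "en_core_web_lg",
    "es": "es_core_news_lg",
    "fr": "fr_core_news_lg",
    "de": "de_core_news_lg",
    "ru": "ru_core_news_lg",
    "nl": "nl_core_news_lg",
    "xx": "xx_ent_wiki_sm",
}


def define_language(lang_code: str) -> dict:
    """Build an nlp_configuration dictionary for a single language."""
    if lang_code not in SPACY_MODELS:
        raise ValueError(f"Unsupported language: {lang_code!r}")
    return {
        "nlp_engine_name": "spacy",
        "models": [
            {"lang_code": lang_code, "model_name": SPACY_MODELS[lang_code]}
        ],
    }


config = define_language("en")
print(config)
```

A dictionary like `config` is what would be handed to `NlpEngineProvider(nlp_configuration=...)` before calling `create_engine()`.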

### **MainProcessSelection**
- **`__init__(openai_api_key)`**: Initializes the system, including:
  - A GPT-4o model for vision-based text extraction.
  - Presidio's analyzer and anonymizer engines for sensitive data detection and anonymization.
- **`process_frame(frame)`**: Processes a single frame from the webcam by:
  1. Extracting text using GPT-4o.
  2. Detecting sensitive information using Presidio.
  3. Anonymizing sensitive information.
  4. Annotating the frame with the number of sensitive items found.
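
The detect-then-anonymize part of these steps can be illustrated without Presidio installed. The sketch below uses two simplified regular expressions as stand-ins for Presidio's recognizers and replaces each match with an `<ENTITY_TYPE>` placeholder. The `Finding` class, the `detect` and `anonymize` functions, and the patterns are all hypothetical; real recognizers are considerably more robust.

```python
import re
from dataclasses import dataclass

# Simplified stand-ins for Presidio recognizers (illustration only, not production patterns).
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE_NUMBER": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


@dataclass
class Finding:
    """A detected span: entity type plus character offsets into the text."""
    entity_type: str
    start: int
    end: int


def detect(text: str) -> list:
    """Return all pattern matches, ordered by start offset."""
    findings = []
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append(Finding(entity_type, match.start(), match.end()))
    return sorted(findings, key=lambda f: f.start)


def anonymize(text: str, findings: list) -> str:
    """Replace each span with a placeholder, assuming spans do not overlap."""
    # Work from the end of the string so earlier offsets stay valid.
    for f in sorted(findings, key=lambda f: f.start, reverse=True):
        text = text[:f.start] + f"<{f.entity_type}>" + text[f.end:]
    return text


extracted = "Contact: jane.doe@example.com, tel. +1 555-010-9999"
findings = detect(extracted)
anonymized = anonymize(extracted, findings)
print(anonymized)
```

Replacing spans from right to left is the key design choice: each replacement changes the string length, so processing left to right would invalidate the offsets of later findings.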

### **create_gradio_interface**
- Creates a **Gradio interface** to stream webcam input, process frames, and display results, including:
  - Webcam feed.
  - Annotated output.
  - Text extracted by GPT-4o.
  - Sensitive data detected and anonymized by Presidio.
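
The streaming callback has to return one value per output component, in the same order on every code path (empty frame, no text found, success, error). One way to keep that order consistent is a small named tuple, sketched below. `FrameResult`, `empty_result`, and `error_result` are hypothetical names that do not appear in the notebook, and the sketch has not been checked against a specific Gradio version, although a named tuple is an ordinary tuple and so should be usable as a callback return value.

```python
from typing import Any, NamedTuple, Optional


class FrameResult(NamedTuple):
    """Hypothetical container; field order mirrors the five output components."""
    image: Optional[Any]   # annotated frame (or the original frame on failure)
    status: str            # short status message
    fps: str               # performance readout, e.g. "3 FPS"
    openai_text: str       # text extracted by the vision model
    presidio_text: str     # detection and anonymization summary


def empty_result() -> FrameResult:
    """Result for the case where no frame was received."""
    return FrameResult(None, "No frame received", "0 FPS",
                       "No OpenAI output", "No Presidio output")


def error_result(frame: Any, fps: float, exc: Exception) -> FrameResult:
    """Result for the case where processing raised an exception."""
    message = f"Error: {exc}"
    return FrameResult(frame, message, f"{fps} FPS", message, message)


result = error_result(frame=None, fps=3, exc=ValueError("bad frame"))
print(tuple(result))
```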

---

## What the Code Does
1. Captures frames from a webcam in real time.
2. Extracts text from each frame with GPT-4o and detects sensitive entities in that text with Presidio.
3. Anonymizes sensitive data and displays results as:
   - Annotated video feed.
   - Text outputs (extracted and anonymized).
4. Provides a user-friendly interface through Gradio for real-time visualization.
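
Because each frame involves a remote model call, the time between processed frames can exceed one second, so a meaningful frame rate is the number of frames divided by the elapsed time rather than a raw frame count. The sketch below shows one way to measure that. `FpsCounter` and its injectable `clock` parameter are hypothetical and exist only to make the example deterministic.

```python
import time


class FpsCounter:
    """Hypothetical helper: frames per second averaged over a time window."""

    def __init__(self, clock=time.monotonic, window: float = 1.0):
        self._clock = clock          # injectable so the demo can use a fake clock
        self._window = window        # minimum window length in seconds
        self._count = 0
        self._window_start = clock()
        self.fps = 0.0

    def tick(self) -> float:
        """Register one processed frame and return the latest FPS estimate."""
        self._count += 1
        now = self._clock()
        elapsed = now - self._window_start
        if elapsed >= self._window:
            # Divide by the real elapsed time, which may be longer than the window.
            self.fps = self._count / elapsed
            self._count = 0
            self._window_start = now
        return self.fps


# Deterministic demo: a fake clock that advances 0.5 s per reading.
fake_times = iter([0.0, 0.5, 1.0, 1.5, 2.0])
counter = FpsCounter(clock=lambda: next(fake_times))
readings = [counter.tick() for _ in range(4)]
print(readings)
```

With the fake clock, two frames arrive per second, so the estimate settles at 2.0 once the first window closes.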


In [None]:
!pip install gradio



In [None]:
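# Cell - GetAnalyzer and GetAnonymizer Classes
from presidio_analyzer import AnalyzerEngine
from presidio_analyzer.nlp_engine import NlpEngineProvider
from presidio_anonymizer import AnonymizerEngine

# NOTE: GetRecognizers, DEFINELANGUAGE, CONTEXTAWARE, PERSONNAMERECOGNIZER and the
# annotation aliases (CLASSINIT, PROCESS, ANALYZER, ANONYMIZER) are not defined in this
# cell; they are assumed to come from earlier cells of the notebook.
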
class GetAnalyzer(object):
    def __init__(self)->CLASSINIT:
        self.recognizers = GetRecognizers()
        self.supportedLanguage = ["en", "es", "fr", "de", "ru", "nl", "xx"]

    def GetNLPEngine(self, initLanguage:str)->PROCESS:
        nlpEngine = RETURNNLPPROVIDER(initLanguage)
        return nlpEngine.create_engine()

    def Run(self, initLanguage:str)->ANALYZER:
        nlpEngine = self.GetNLPEngine(initLanguage=initLanguage)
        analyzer = AnalyzerEngine(
            nlp_engine=nlpEngine,
            context_aware_enhancer=CONTEXTAWARE,
            supported_languages=self.supportedLanguage
        )
        analyzer.registry.add_recognizer(self.recognizers.numberRecognizer)
        analyzer.registry.add_recognizer(self.recognizers.creditcardRecognizer)
        analyzer.registry.add_recognizer(self.recognizers.emailRecognizer)
        analyzer.registry.add_recognizer(self.recognizers.urlRecognizer)
        analyzer.registry.add_recognizer(self.recognizers.phoneRecognizer)
        analyzer.registry.add_recognizer(PERSONNAMERECOGNIZER)
        return analyzer

class GetAnonymizer(object):
    def __init__(self)->CLASSINIT:
        pass

    def Run(self)->ANONYMIZER:
        anonymizerEngine = AnonymizerEngine()
        return anonymizerEngine

def RETURNNLPPROVIDER(initLanguage:str)->PROCESS:
    configurationNLP = DEFINELANGUAGE(initLanguage)  # Get language configuration
    NLPPROVIDER = NlpEngineProvider(nlp_configuration=configurationNLP)  # Create NLP provider
    return NLPPROVIDER

In [None]:
import cv2
import time
import gradio as gr
import numpy as np
from datetime import datetime
import logging

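# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# NOTE: ModelCompletionOperation, OPERATORGENERAL and project_config are not defined in
# this cell; they are assumed to come from earlier cells of the notebook.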



class MainProcessSelection:
    def __init__(self, openai_api_key):
        logger.info("Initializing MainProcessSelection...")
        self.fps = 0
        self.frame_count = 0
        self.frame_time = time.time()

        try:
            # Initialize model operation
            logger.info("Initializing ModelCompletionOperation...")
            self.model_op = ModelCompletionOperation()
            self.model = self.model_op.GetModel()
            logger.info("Model initialization successful")
        except Exception as e:
            logger.error(f"Error initializing model: {str(e)}")
            raise

        try:
            # Initialize Presidio components
            logger.info("Initializing Presidio components...")
            self.analyzer = GetAnalyzer()
            self.analysis_engine = self.analyzer.Run(initLanguage="en")
            self.anonymizer = GetAnonymizer()
            self.anonymizer_engine = self.anonymizer.Run()
            logger.info("Presidio initialization successful")
        except Exception as e:
            logger.error(f"Error initializing Presidio: {str(e)}")
            raise

    def process_frame(self, frame):
        if frame is None:
            logger.warning("Received empty frame")
            return None, "No frame received", "0 FPS", "No OpenAI output", "No Presidio output"

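        # Update FPS: frames counted in the current window divided by the window's duration
        current_time = time.time()
        self.frame_count += 1
        elapsed = current_time - self.frame_time
        if elapsed > 1.0:
            self.fps = round(self.frame_count / elapsed, 1)
            self.frame_count = 0
            self.frame_time = current_time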

        try:
            # OpenAI Vision Analysis
            logger.info("Starting GPT-4o analysis...")
            openai_result = self.model_op.RunVision(
                self.model,
                "Extract and list any text, numbers, or sensitive information visible in this image.",
                frame
            )

            if not openai_result:
                logger.warning("No text detected by GPT-4o")
                openai_output = "No text detected in image"
                presidio_output = "No text to analyze"
                return frame, "No text detected", f"{self.fps} FPS", openai_output, presidio_output

            # Log only the length: the raw text may contain the sensitive data being anonymized
            logger.info(f"GPT-4o extracted {len(openai_result)} characters of text")
            openai_output = f"""
Timestamp: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}
Extracted Text:
{openai_result}
"""

            # Presidio Analysis
            logger.info("Starting Presidio analysis...")
            analysis_results = self.analysis_engine.analyze(
                text=openai_result,
                language='en'
            )

            if not analysis_results:
                logger.info("No sensitive information detected by Presidio")
            else:
                logger.info(f"Detected {len(analysis_results)} entities")

            # Anonymization
            anonymized_result = self.anonymizer_engine.anonymize(
                text=openai_result,
                analyzer_results=analysis_results,
                operators=OPERATORGENERAL
            )

            presidio_output = f"""
Analysis Results:
Entities Found: {len(analysis_results)}
Types: {', '.join(set(result.entity_type for result in analysis_results)) if analysis_results else 'None'}

Anonymized Text:
{anonymized_result.text}
"""

            # Create annotated frame
            annotated_frame = frame.copy()
            if analysis_results:
                cv2.putText(
                    annotated_frame,
                    f"Found {len(analysis_results)} sensitive items",
                    (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX,
                    1,
                    (0, 0, 255),
                    2
                )

            return annotated_frame, "Processing complete", f"{self.fps} FPS", openai_output, presidio_output

        except Exception as e:
            logger.error(f"Processing error: {str(e)}", exc_info=True)
            return frame, f"Error: {str(e)}", f"{self.fps} FPS", f"OpenAI Error: {str(e)}", f"Presidio Error: {str(e)}"

def create_gradio_interface():
    logger.info("Creating Gradio interface...")
    processor = MainProcessSelection(project_config["apikey"]["OPENAI_API_KEY"])

    with gr.Blocks() as interface:
        gr.Markdown("""
        # Real-time Data Anonymization System
        Using GPT-4o for text extraction and Presidio for anonymization
        """)

        with gr.Row():
            # Input column
            with gr.Column(scale=1):
                webcam = gr.Image(
                    label="Webcam Feed",
                    sources=["webcam"],
                    streaming=True,
                    mirror_webcam=True
                )
                status_output = gr.Textbox(label="Status")
                fps_output = gr.Textbox(label="Performance")

            # Output column
            with gr.Column(scale=1):
                output_image = gr.Image(label="Processed Feed")

        with gr.Row():
            # Separate boxes for OpenAI and Presidio outputs
            with gr.Column(scale=1):
                openai_output = gr.Textbox(
                    label="OpenAI Vision Analysis",
                    lines=10,
                    placeholder="GPT-4o text extraction results will appear here..."
                )
            with gr.Column(scale=1):
                presidio_output = gr.Textbox(
                    label="Presidio Analysis & Anonymization",
                    lines=10,
                    placeholder="Presidio analysis results will appear here..."
                )

        webcam.stream(
            processor.process_frame,
            inputs=[webcam],
            outputs=[output_image, status_output, fps_output, openai_output, presidio_output],
            show_progress="hidden"
        )

    return interface

if __name__ == "__main__":
    try:
        logger.info("Starting application...")
        demo = create_gradio_interface()
        demo.queue().launch(share=True, debug=True)
    except Exception as e:
        logger.error(f"Application startup error: {str(e)}", exc_info=True)



