In [48]:
import pandas as pd
data = pd.read_csv('output_outline_1.csv')

In [49]:
data

Unnamed: 0,file,title,level,text,page
0,E0CCG5S312.json,Overview Foundation Level Extensions,H1,Revision History,2
1,E0CCG5S312.json,Overview Foundation Level Extensions,H1,Table of Contents,3
2,E0CCG5S312.json,Overview Foundation Level Extensions,H1,Acknowledgements,4
3,E0CCG5S312.json,Overview Foundation Level Extensions,H1,1. Introduction to the Foundation Level Extens...,5
4,E0CCG5S312.json,Overview Foundation Level Extensions,H1,2. Introduction to Foundation Level Agile Test...,6
...,...,...,...,...,...
166,Z1QW23ER45.json,Marketing Plan: Product Launch 2025,H3,Social Media Campaigns,7
167,Z1QW23ER45.json,Marketing Plan: Product Launch 2025,H3,Influencer Partnerships,8
168,Z1QW23ER45.json,Marketing Plan: Product Launch 2025,H2,Traditional Marketing,9
169,Z1QW23ER45.json,Marketing Plan: Product Launch 2025,H1,Budget Allocation,10


In [50]:
# Check data structure and unique levels
print("Data shape:", data.shape)
print("\nColumn names:", data.columns.tolist())
print("\nUnique levels:", data['level'].unique())
print("\nLevel counts:")
print(data['level'].value_counts())
print("\nFirst few rows:")
print(data.head())

Data shape: (171, 5)

Column names: ['file', 'title', 'level', 'text', 'page']

Unique levels: ['H1' 'H2' 'H3' 'H4']

Level counts:
level
H2    72
H1    59
H3    36
H4     4
Name: count, dtype: int64

First few rows:
              file                                    title level  \
0  E0CCG5S312.json  Overview  Foundation Level Extensions      H1   
1  E0CCG5S312.json  Overview  Foundation Level Extensions      H1   
2  E0CCG5S312.json  Overview  Foundation Level Extensions      H1   
3  E0CCG5S312.json  Overview  Foundation Level Extensions      H1   
4  E0CCG5S312.json  Overview  Foundation Level Extensions      H1   

                                                text  page  
0                                  Revision History      2  
1                                 Table of Contents      3  
2                                  Acknowledgements      4  
3  1. Introduction to the Foundation Level Extens...     5  
4  2. Introduction to Foundation Level Agile Test...     6  


In [51]:
# Import required libraries for machine learning
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix
from sklearn.preprocessing import LabelEncoder
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

In [52]:
# Data preprocessing and feature engineering
print("Preparing data for Random Forest classification...")

# Clean and prepare the data; .copy() gives an independent DataFrame so the
# column assignment below does not trigger SettingWithCopyWarning
data_clean = data.dropna(subset=['title', 'text', 'level']).copy()

# Combine title and text for feature extraction
data_clean['combined_text'] = data_clean['title'].astype(str) + ' ' + data_clean['text'].astype(str)

# Remove empty or very short texts
data_clean = data_clean[data_clean['combined_text'].str.len() > 3]

print(f"Cleaned data shape: {data_clean.shape}")
print("Level distribution after cleaning:")
print(data_clean['level'].value_counts())

Preparing data for Random Forest classification...
Cleaned data shape: (166, 6)
Level distribution after cleaning:
level
H2    70
H1    57
H3    35
H4     4
Name: count, dtype: int64


In [53]:
# Feature extraction using character-level TF-IDF (language-agnostic n-grams)
print("Extracting features using TF-IDF with multilingual support...")

# Create TF-IDF features from the combined text
vectorizer = TfidfVectorizer(
    max_features=2000,  # Keep at most the 2000 most frequent n-grams
    ngram_range=(1, 3),  # Character 1- to 3-grams (the analyzer below is character-based)
    min_df=1,  # Keep n-grams that occur in even a single row
    max_df=0.9,  # Drop n-grams that occur in more than 90% of rows
    lowercase=True,  # Convert to lowercase for consistency
    analyzer='char_wb',  # Character n-grams inside word boundaries; needs no tokenizer
    encoding='utf-8',  # Only used when the input is bytes or files
    decode_error='ignore',  # Skip undecodable bytes instead of raising
    strip_accents='unicode',  # Fold accented characters to their base form
    # token_pattern is omitted: scikit-learn only applies it when analyzer='word'
)

# Alternative configuration for word-based analysis (uncomment if preferred)
# vectorizer = TfidfVectorizer(
#     max_features=2000,
#     stop_words=None,  # No stop words for multilingual support
#     ngram_range=(1, 2),
#     min_df=1,
#     max_df=0.9,
#     lowercase=True,
#     encoding='utf-8',
#     decode_error='ignore',
#     strip_accents='unicode'
# )

# Fit and transform the text data
X = vectorizer.fit_transform(data_clean['combined_text'])
y = data_clean['level']

print(f"Feature matrix shape: {X.shape}")
print(f"Target variable shape: {y.shape}")
print(f"Vectorizer encoding: {vectorizer.encoding}")
print(f"Vectorizer analyzer: {vectorizer.analyzer}")

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=42, stratify=y
)

print(f"Training set size: {X_train.shape[0]}")
print(f"Testing set size: {X_test.shape[0]}")

Extracting features using TF-IDF with multilingual support...
Feature matrix shape: (166, 1593)
Target variable shape: (166,)
Vectorizer encoding: utf-8
Vectorizer analyzer: char_wb
Training set size: 149
Testing set size: 17
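
The cell above fits the vectorizer on all 166 rows and only then splits them, so n-gram statistics from the test rows leak into the training features. One way to avoid that is to split the raw text first and let a scikit-learn `Pipeline` fit the vectorizer on the training fold only. The snippet below is a minimal sketch under stated assumptions: the `texts` and `labels` lists are hypothetical stand-ins for `data_clean['combined_text']` and `data_clean['level']`, and the `*_demo` names are chosen so they do not overwrite the notebook's variables.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Hypothetical stand-in data (assumption): document title + heading text, and its level.
texts = [
    "Annual Report Introduction",
    "Annual Report Financial Summary",
    "Annual Report Conclusion",
    "Annual Report References",
    "Annual Report 1.1 Revenue by Region",
    "Annual Report 1.2 Operating Costs",
    "Annual Report 2.1 Market Risks",
    "Annual Report 2.2 Currency Exposure",
]
labels = ["H1", "H1", "H1", "H1", "H2", "H2", "H2", "H2"]

# Split the raw strings first, so the vectorizer never sees the test rows.
texts_train, texts_test, y_train_demo, y_test_demo = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

# The pipeline fits TF-IDF and the classifier together on the training rows only.
leak_free_model = Pipeline([
    ("tfidf", TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3))),
    ("clf", RandomForestClassifier(n_estimators=50, random_state=42,
                                   class_weight="balanced")),
])
leak_free_model.fit(texts_train, y_train_demo)

# At prediction time the pipeline applies the already-fitted vectorizer.
demo_predictions = leak_free_model.predict(texts_test)
print(list(zip(texts_test, demo_predictions)))
```

A pipeline like this can also be passed to `GridSearchCV` directly, in which case the vectorizer is refit inside every cross-validation fold.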




In [57]:
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import numpy as np

# =============================================================
# 1. Decision Tree Classifier with GridSearchCV and class weighting
# =============================================================
print("Training Decision Tree Classifier with GridSearch and Class Weighting...")

# Define parameter grid for Decision Tree
dt_param_grid = {
    'max_depth': [10, 20, None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
    'criterion': ['gini', 'entropy']
}

# Create a Decision Tree classifier with balanced class weights
dt_classifier = DecisionTreeClassifier(
    random_state=42,
    class_weight='balanced'  # Address data imbalance
)

# Set up GridSearchCV for Decision Tree
dt_grid_search = GridSearchCV(
    estimator=dt_classifier,
    param_grid=dt_param_grid,
    cv=3,  # 3-fold cross-validation
    n_jobs=-1,  # Use all available cores
    verbose=1
)

# Fit the grid search to the data
dt_grid_search.fit(X_train, y_train)

# Get the best Decision Tree estimator
best_dt = dt_grid_search.best_estimator_

print("\nBest Parameters found for Decision Tree:")
print(dt_grid_search.best_params_)

# Make predictions with the best Decision Tree model
dt_y_pred = best_dt.predict(X_test)

# Evaluate the Decision Tree model
dt_accuracy = accuracy_score(y_test, dt_y_pred)
print(f"\nDecision Tree Accuracy: {dt_accuracy:.4f}")

print("\nDecision Tree Classification Report:")
print(classification_report(y_test, dt_y_pred, zero_division=0))

print("\nDecision Tree Confusion Matrix:")
dt_cm = confusion_matrix(y_test, dt_y_pred, labels=np.unique(y))
print(dt_cm)

# =============================================================
# 2. Random Forest Classifier with GridSearchCV and class weighting
# =============================================================
print("\n" + "="*50)
print("Training Random Forest Classifier with GridSearch and Class Weighting...")

# Define the parameter grid for Random Forest GridSearchCV
rf_param_grid = {
    'n_estimators': [100, 200],
    'max_depth': [10, 20, None],
    'min_samples_split': [2, 5],
    'min_samples_leaf': [1, 2],
    'bootstrap': [True]
}

# Create a Random Forest classifier with balanced class weights
rf_classifier = RandomForestClassifier(
    random_state=42,
    class_weight='balanced'  # Address data imbalance
)

# Set up GridSearchCV for Random Forest
rf_grid_search = GridSearchCV(
    estimator=rf_classifier,
    param_grid=rf_param_grid,
    cv=3,  # 3-fold cross-validation
    n_jobs=-1,  # Use all available cores
    verbose=1
)

# Fit the grid search to the data
rf_grid_search.fit(X_train, y_train)

# Get the best Random Forest estimator
best_rf = rf_grid_search.best_estimator_

print("\nBest Parameters found for Random Forest:")
print(rf_grid_search.best_params_)

# Make predictions with the best Random Forest model
rf_y_pred = best_rf.predict(X_test)

# Evaluate the Random Forest model
rf_accuracy = accuracy_score(y_test, rf_y_pred)
print(f"\nRandom Forest Accuracy: {rf_accuracy:.4f}")

print("\nRandom Forest Classification Report:")
print(classification_report(y_test, rf_y_pred, zero_division=0))

print("\nRandom Forest Confusion Matrix:")
rf_cm = confusion_matrix(y_test, rf_y_pred, labels=np.unique(y))
print(rf_cm)

# =============================================================
# 3. Compare the models
# =============================================================
print("\n" + "="*50)
print("Model Comparison:")
print(f"Decision Tree Accuracy: {dt_accuracy:.4f}")
print(f"Random Forest Accuracy: {rf_accuracy:.4f}")
print(f"Accuracy Difference (RF - DT): {rf_accuracy - dt_accuracy:.4f}")

Training Decision Tree Classifier with GridSearch and Class Weighting...
Fitting 3 folds for each of 54 candidates, totalling 162 fits

Best Parameters found for Decision Tree:
{'criterion': 'gini', 'max_depth': 10, 'min_samples_leaf': 1, 'min_samples_split': 2}

Decision Tree Accuracy: 0.7059

Decision Tree Classification Report:
              precision    recall  f1-score   support

          H1       0.67      0.67      0.67         6
          H2       0.83      0.71      0.77         7
          H3       0.75      0.75      0.75         4
          H4       0.00      0.00      0.00         0

    accuracy                           0.71        17
   macro avg       0.56      0.53      0.55        17
weighted avg       0.75      0.71      0.73        17


Decision Tree Confusion Matrix:
[[4 1 0 1]
 [1 5 1 0]
 [1 0 3 0]
 [0 0 0 0]]

Training Random Forest Classifier with GridSearch and Class Weighting...
Fitting 3 folds for each of 24 candidates, totalling 72 fits

Best Parameters fo
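
Both grid searches above use the default `scoring`, which for classifiers is accuracy. With only four H4 rows in the data, accuracy can hide poor performance on the rare levels. The sketch below shows one possible alternative, selecting hyperparameters by macro-averaged F1 with an explicit `StratifiedKFold`. The data is synthetic (generated with `make_classification`) and the grid is deliberately small; both are assumptions for illustration, not the notebook's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic, imbalanced 3-class data standing in for X_train / y_train (assumption).
X_demo, y_demo = make_classification(
    n_samples=120, n_features=20, n_informative=6,
    n_classes=3, weights=[0.6, 0.3, 0.1], random_state=42,
)

# Stratified folds keep the class proportions similar in every split.
cv_demo = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)

search_demo = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42, class_weight="balanced"),
    param_grid={"n_estimators": [25, 50], "max_depth": [5, None]},
    scoring="f1_macro",  # every class counts equally, however rare it is
    cv=cv_demo,
)
search_demo.fit(X_demo, y_demo)

print("Best params:", search_demo.best_params_)
print("Best macro-F1:", round(search_demo.best_score_, 3))
```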

In [61]:
# =============================================================
# 4. Function to predict heading level for new title and text
# =============================================================

def predict_heading_level(title, text, vectorizer, model):
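    """
    Predict the heading level (H1, H2, H3, H4) for a heading.

    The inputs mirror the training features, which were built as
    data_clean['title'] + ' ' + data_clean['text'].

    Args:
        title (str): Title of the document the heading belongs to (the 'title' column)
        text (str): The heading text itself (the 'text' column)
        vectorizer: Fitted vectorizer used in training (e.g., TfidfVectorizer)
        model: Trained classification model (e.g., best_rf)

    Returns:
        tuple: (predicted_level, confidence_scores), where confidence_scores
            maps each class label to its predicted probability
    """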
    # Combine title and text
    combined_text = str(title) + ' ' + str(text)

    # Transform the input using the trained vectorizer
    text_features = vectorizer.transform([combined_text])

    # Predict label
    prediction = model.predict(text_features)[0]

    # Predict confidence scores (probabilities)
    probabilities = model.predict_proba(text_features)[0]
    classes = model.classes_

    # Map classes to probabilities
    confidence_scores = dict(zip(classes, probabilities))

    return prediction, confidence_scores


In [62]:
# =============================================================
# 5. Testing the function with examples
# =============================================================
print("\n" + "="*50)
print("Testing the prediction function using best_rf:")
print("="*50)

examples = [
    ("Introduction", "This document provides an overview of the foundation level extensions"),
    ("Testing Methods", "Various testing approaches and methodologies used in software development"),
    ("Specific Implementation Details", "Detailed explanation of implementation steps and procedures")
]

for idx, (title, text) in enumerate(examples, 1):
    pred_level, conf_scores = predict_heading_level(title, text, vectorizer, best_rf)
    print(f"\nExample {idx}:")
    print(f"Title: '{title}'")
    print(f"Text: '{text}'")
    print(f"Predicted Level: {pred_level}")
    print(f"Confidence Scores: {conf_scores}")



Testing the prediction function using best_rf:

Example 1:
Title: 'Introduction'
Text: 'This document provides an overview of the foundation level extensions'
Predicted Level: H1
Confidence Scores: {'H1': 0.494875751950144, 'H2': 0.4076307288127603, 'H3': 0.09306494780852431, 'H4': 0.004428571428571429}

Example 2:
Title: 'Testing Methods'
Text: 'Various testing approaches and methodologies used in software development'
Predicted Level: H2
Confidence Scores: {'H1': 0.3935607981888293, 'H2': 0.4446521956604157, 'H3': 0.15735843472218367, 'H4': 0.004428571428571429}

Example 3:
Title: 'Specific Implementation Details'
Text: 'Detailed explanation of implementation steps and procedures'
Predicted Level: H1
Confidence Scores: {'H1': 0.48617563609843606, 'H2': 0.3697804555098211, 'H3': 0.1390439083917425, 'H4': 0.005}


In [63]:
# Function to predict heading level using Decision Tree
def predict_heading_level_dt(title, text):
    """
    Predict the heading level (H1, H2, H3, H4) using the tuned Decision Tree.

    Thin wrapper around predict_heading_level that passes the notebook-level
    `vectorizer` and `best_dt`.

    Args:
        title (str): Title of the document the heading belongs to
        text (str): The heading text itself

    Returns:
        tuple: (predicted_level, confidence_scores)
    """
    # Reuse the shared helper so both models go through identical preprocessing
    return predict_heading_level(title, text, vectorizer, best_dt)

# Test the Decision Tree prediction function with some examples
print("Testing the Decision Tree prediction function:")
print("-" * 50)

# Example 1
title1 = "Introduction"
text1 = "This document provides an overview of the foundation level extensions"
pred1, conf1 = predict_heading_level_dt(title1, text1)
print(f"Title: '{title1}'")
print(f"Text: '{text1}'")
print(f"Predicted Level: {pred1}")
print(f"Confidence Scores: {conf1}")
print()

# Example 2
title2 = "Testing Methods"
text2 = "Various testing approaches and methodologies used in software development"
pred2, conf2 = predict_heading_level_dt(title2, text2)
print(f"Title: '{title2}'")
print(f"Text: '{text2}'")
print(f"Predicted Level: {pred2}")
print(f"Confidence Scores: {conf2}")
print()

# Example 3
title3 = "Specific Implementation Details"
text3 = "Detailed explanation of implementation steps and procedures"
pred3, conf3 = predict_heading_level_dt(title3, text3)
print(f"Title: '{title3}'")
print(f"Text: '{text3}'")
print(f"Predicted Level: {pred3}")
print(f"Confidence Scores: {conf3}")

Testing the Decision Tree prediction function:
--------------------------------------------------
Title: 'Introduction'
Text: 'This document provides an overview of the foundation level extensions'
Predicted Level: H3
Confidence Scores: {'H1': 0.0, 'H2': 0.0, 'H3': 1.0, 'H4': 0.0}

Title: 'Testing Methods'
Text: 'Various testing approaches and methodologies used in software development'
Predicted Level: H1
Confidence Scores: {'H1': 1.0, 'H2': 0.0, 'H3': 0.0, 'H4': 0.0}

Title: 'Specific Implementation Details'
Text: 'Detailed explanation of implementation steps and procedures'
Predicted Level: H3
Confidence Scores: {'H1': 0.0, 'H2': 0.0, 'H3': 1.0, 'H4': 0.0}
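
The 0.0/1.0 scores above are not a sign of certainty. A decision tree's `predict_proba` returns the class fractions of the training samples that ended up in the matching leaf, and with `min_samples_leaf=1` most leaves contain a single class. The sketch below illustrates this on synthetic data (an assumption; it does not use the notebook's features) by comparing a fully grown tree with one whose leaves must hold at least 10 samples.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic 3-class data with some label noise (assumption, not the notebook's data).
X_demo, y_demo = make_classification(
    n_samples=200, n_features=10, n_informative=5,
    n_classes=3, flip_y=0.1, random_state=0,
)

# Fully grown tree: it keeps splitting until every leaf is pure.
deep_tree = DecisionTreeClassifier(random_state=0).fit(X_demo, y_demo)

# Regularized tree: each leaf must keep at least 10 training samples.
shallow_tree = DecisionTreeClassifier(random_state=0, min_samples_leaf=10).fit(X_demo, y_demo)

deep_proba = deep_tree.predict_proba(X_demo[:5])
shallow_proba = shallow_tree.predict_proba(X_demo[:5])

print("Fully grown tree, top probability per row:", deep_proba.max(axis=1))
print("min_samples_leaf=10, top probability per row:", shallow_proba.max(axis=1).round(2))
```

Larger leaves usually give graded scores that are easier to compare with the Random Forest's averaged probabilities.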


In [65]:
print("\n" + "=" * 60)
print("🔤 Testing Multilingual Capabilities for Heading Prediction")
print("=" * 60)

# Define multilingual examples
examples = [
    # English
    ("ENGLISH", "Chapter Introduction", "This chapter covers the basic concepts"),
    
    # Spanish
    ("SPANISH", "Introducción al Capítulo", "Este capítulo cubre los conceptos básicos"),
    
    # French
    ("FRENCH", "Introduction du Chapitre", "Ce chapitre couvre les concepts de base"),
    
    # German
    ("GERMAN", "Kapitel Einführung", "Dieses Kapitel behandelt die Grundkonzepte"),
    
    # Accented Spanish
    ("ACCENTED (ES)", "Configuración Avanzada", "Configuración detallada de parámetros específicos"),
    
    # Empty title with text
    ("EDGE CASE (Empty Title)", "", "What Colleges Say!"),
    
    # Mixed English + French
    ("MIXED LANGUAGE", "API Documentation", "Documentation complète pour l'API REST"),
    
    # Japanese
    ("JAPANESE", "章の紹介", "この章では基本的な概念について説明します")
]

# Run predictions
for idx, (lang, title, text) in enumerate(examples, 1):
    pred, conf = predict_heading_level(title, text, vectorizer, best_rf)
    top_conf = max(conf.values())
    print(f"\n{idx}. 🌐 Language: {lang}")
    print(f"   📝 Title: '{title}'")
    print(f"   📄 Text : '{text}'")
    print(f"   🔍 Predicted Level: {pred}")
    print(f"   📊 Confidence: {top_conf:.3f}")



🔤 Testing Multilingual Capabilities for Heading Prediction

1. 🌐 Language: ENGLISH
   📝 Title: 'Chapter Introduction'
   📄 Text : 'This chapter covers the basic concepts'
   🔍 Predicted Level: H1
   📊 Confidence: 0.554

2. 🌐 Language: SPANISH
   📝 Title: 'Introducción al Capítulo'
   📄 Text : 'Este capítulo cubre los conceptos básicos'
   🔍 Predicted Level: H1
   📊 Confidence: 0.476

3. 🌐 Language: FRENCH
   📝 Title: 'Introduction du Chapitre'
   📄 Text : 'Ce chapitre couvre les concepts de base'
   🔍 Predicted Level: H1
   📊 Confidence: 0.479

4. 🌐 Language: GERMAN
   📝 Title: 'Kapitel Einführung'
   📄 Text : 'Dieses Kapitel behandelt die Grundkonzepte'
   🔍 Predicted Level: H2
   📊 Confidence: 0.472

5. 🌐 Language: ACCENTED (ES)
   📝 Title: 'Configuración Avanzada'
   📄 Text : 'Configuración detallada de parámetros específicos'
   🔍 Predicted Level: H2
   📊 Confidence: 0.417

6. 🌐 Language: EDGE CASE (Empty Title)
   📝 Title: ''
   📄 Text : 'What Colleges Say!'
   🔍 Predicted Level:

In [66]:
# Test multilingual capabilities
print("Testing multilingual capabilities:")
print("=" * 60)

# English examples
print("ENGLISH EXAMPLES:")
print("-" * 30)
pred_en1, conf_en1 = predict_heading_level_dt("Chapter Introduction", "This chapter covers the basic concepts")
print(f"EN Title: 'Chapter Introduction'")
print(f"EN Text: 'This chapter covers the basic concepts'")
print(f"Predicted: {pred_en1}, Confidence: {max(conf_en1.values()):.3f}")
print()

# Spanish examples
print("SPANISH EXAMPLES:")
print("-" * 30)
pred_es1, conf_es1 = predict_heading_level_dt("Introducción al Capítulo", "Este capítulo cubre los conceptos básicos")
print(f"ES Title: 'Introducción al Capítulo'")
print(f"ES Text: 'Este capítulo cubre los conceptos básicos'")
print(f"Predicted: {pred_es1}, Confidence: {max(conf_es1.values()):.3f}")
print()

# French examples
print("FRENCH EXAMPLES:")
print("-" * 30)
pred_fr1, conf_fr1 = predict_heading_level_dt("Introduction du Chapitre", "Ce chapitre couvre les concepts de base")
print(f"FR Title: 'Introduction du Chapitre'")
print(f"FR Text: 'Ce chapitre couvre les concepts de base'")
print(f"Predicted: {pred_fr1}, Confidence: {max(conf_fr1.values()):.3f}")
print()

# German examples
print("GERMAN EXAMPLES:")
print("-" * 30)
pred_de1, conf_de1 = predict_heading_level_dt("Kapitel Einführung", "Dieses Kapitel behandelt die Grundkonzepte")
print(f"DE Title: 'Kapitel Einführung'")
print(f"DE Text: 'Dieses Kapitel behandelt die Grundkonzepte'")
print(f"Predicted: {pred_de1}, Confidence: {max(conf_de1.values()):.3f}")
print()

# Test with accented characters
print("ACCENTED CHARACTERS:")
print("-" * 30)
pred_acc, conf_acc = predict_heading_level_dt("Configuración Avanzada", "Configuración detallada de parámetros específicos")
print(f"Accented Title: 'Configuración Avanzada'")
print(f"Accented Text: 'Configuración detallada de parámetros específicos'")
print(f"Predicted: {pred_acc}, Confidence: {max(conf_acc.values()):.3f}")
print()

print("TEST:")
print("-" * 30)
pred_acc2, conf_acc2 = predict_heading_level_dt(" ", "What Colleges Say!")
print(f"Accented Title: ''")
print(f"Accented Text: 'What Colleges Say!'")
print(f"Predicted: {pred_acc2}, Confidence: {max(conf_acc2.values()):.3f}")
print()

# Test with mixed languages
print("MIXED LANGUAGE:")
print("-" * 30)
pred_mix, conf_mix = predict_heading_level_dt("API Documentation", "Documentation complète pour l'API REST")
print(f"Mixed Title: 'API Documentation'")
print(f"Mixed Text: 'Documentation complète pour l'API REST'")
print(f"Predicted: {pred_mix}, Confidence: {max(conf_mix.values()):.3f}")
print()

# Japanese examples
print("JAPANESE EXAMPLES:")
print("-" * 30)
pred_jp, conf_jp = predict_heading_level_dt("章の紹介", "この章では基本的な概念について説明します")
print(f"JP Title: '章の紹介'")
print(f"JP Text: 'この章では基本的な概念について説明します'")
print(f"Predicted: {pred_jp}, Confidence: {max(conf_jp.values()):.3f}")

Testing multilingual capabilities:
ENGLISH EXAMPLES:
------------------------------
EN Title: 'Chapter Introduction'
EN Text: 'This chapter covers the basic concepts'
Predicted: H1, Confidence: 1.000

SPANISH EXAMPLES:
------------------------------
ES Title: 'Introducción al Capítulo'
ES Text: 'Este capítulo cubre los conceptos básicos'
Predicted: H2, Confidence: 1.000

FRENCH EXAMPLES:
------------------------------
FR Title: 'Introduction du Chapitre'
FR Text: 'Ce chapitre couvre les concepts de base'
Predicted: H3, Confidence: 1.000

GERMAN EXAMPLES:
------------------------------
DE Title: 'Kapitel Einführung'
DE Text: 'Dieses Kapitel behandelt die Grundkonzepte'
Predicted: H2, Confidence: 1.000

ACCENTED CHARACTERS:
------------------------------
Accented Title: 'Configuración Avanzada'
Accented Text: 'Configuración detallada de parámetros específicos'
Predicted: H3, Confidence: 1.000

TEST:
------------------------------
Accented Title: ''
Accented Text: 'What Colleges Say!'
Pre
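
A prediction for a language only tells us something if the vectorizer has seen that language's character n-grams. The sketch below is one possible check, not part of the original notebook: the hypothetical helper `vocab_coverage` reports the share of a text's n-grams found in the fitted vocabulary. The small `train_texts` corpus and the `demo_vectorizer` are assumptions standing in for `data_clean['combined_text']` and the notebook's `vectorizer`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical training corpus (assumption); the notebook fits on data_clean['combined_text'].
train_texts = [
    "Overview Foundation Level Extensions Revision History",
    "Overview Foundation Level Extensions Table of Contents",
    "Marketing Plan: Product Launch 2025 Social Media Campaigns",
    "Marketing Plan: Product Launch 2025 Budget Allocation",
]

demo_vectorizer = TfidfVectorizer(
    analyzer="char_wb", ngram_range=(1, 3), strip_accents="unicode"
)
demo_vectorizer.fit(train_texts)


def vocab_coverage(fitted_vectorizer, text):
    """Return the fraction of the text's n-grams that are in the fitted vocabulary."""
    grams = fitted_vectorizer.build_analyzer()(text)
    if not grams:
        return 0.0
    known = sum(gram in fitted_vectorizer.vocabulary_ for gram in grams)
    return known / len(grams)


english_cov = vocab_coverage(
    demo_vectorizer, "Chapter Introduction This chapter covers the basic concepts"
)
japanese_cov = vocab_coverage(
    demo_vectorizer, "章の紹介 この章では基本的な概念について説明します"
)
print(f"English coverage:  {english_cov:.2f}")
print(f"Japanese coverage: {japanese_cov:.2f}")
```

When coverage is close to zero, the feature vector is almost empty and the predicted level likely reflects the classifier's behavior on blank input rather than anything in the text.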

# 🐛 Document Processing Issue Analysis

The enhanced Document Intelligence System is encountering a critical bug where PDF documents are being closed prematurely during processing. This is causing all 15 PDF files in Collection 2 to fail processing.

## Issue Details:
- **Error**: "document closed" occurring for all PDF files
- **Root Cause**: Improper PDF document lifecycle management in the enhanced document processor
- **Impact**: No documents are successfully processed, causing the system to exit

## Solution Required:
We need to fix the document processor to properly handle PDF document objects and ensure they remain open during the entire processing cycle.

# 🔍 Root Cause Analysis

**Issue Identified**: The `document closed` error is caused by premature closing of PDF documents in the processing pipeline.

**Technical Details**:
- The `process_document` method closes the PDF document in a `finally` block once processing ends
- However, the `return` statement runs after that `finally` block and still tries to read `doc.page_count`
- This is a use-after-close bug rather than a race condition: the access is deterministic, so it fails on every document

**Solution Strategy**:
We need to process each document in its own isolated context: open it, read everything the result needs (including `doc.page_count`) while it is still open, and close it only afterwards.
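
The ordering problem can be reproduced without a PDF library. The sketch below is illustrative only: `FakeDocument` is a hypothetical stand-in that raises `ValueError("document closed")` when `page_count` is read after `close()`, and the two `process_document_*` functions are simplified assumptions about the shape of the real method, not its actual code.

```python
class FakeDocument:
    """Hypothetical stand-in for a PDF handle; not a real library class."""

    def __init__(self, pages):
        self._pages = list(pages)
        self.is_closed = False

    @property
    def page_count(self):
        # Mimic the reported failure: any access after close() is an error.
        if self.is_closed:
            raise ValueError("document closed")
        return len(self._pages)

    def close(self):
        self.is_closed = True


def process_document_buggy(doc):
    """Closes the document in finally, then reads page_count in the return."""
    try:
        status = "processed"
    finally:
        doc.close()
    return {"status": status, "pages": doc.page_count}  # raises ValueError


def process_document_fixed(doc):
    """Reads everything it needs while the document is open, then closes it."""
    try:
        result = {"status": "processed", "pages": doc.page_count}
    finally:
        doc.close()
    return result


print(process_document_fixed(FakeDocument(["p1", "p2", "p3"])))

try:
    process_document_buggy(FakeDocument(["p1"]))
except ValueError as exc:
    print("Buggy version failed with:", exc)
```

The fixed version still closes the document on every path, including when processing raises; it only moves the `page_count` read ahead of the close.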