VERSION 1

By: Murilo Farias

The code below walks the VAM folder and all of its subfolders, identifies which files are PDFs, and copies them to the ALL_PDF folder.
If two files have the same name, it keeps the most recently modified one.

os: Provides functions to interact with the operating system (e.g., building file paths, checking file properties).
shutil: Used for file operations such as copying files.
main_folder: The directory where the search for PDF files begins. The path is a string and uses the r prefix to make it a raw string (to avoid issues with backslashes).
destination_folder: The folder named ALL_PDF, created inside main_folder if it does not already exist.
os.path.join() combines main_folder with ALL_PDF to build the full path of the destination folder.
os.makedirs(destination_folder, exist_ok=True) creates the folder and does not raise an error if it already exists.
os.walk(main_folder) walks the directory tree rooted at main_folder and, for each directory, yields its path (root), the names of its subdirectories (dirs), and the names of its files (files).
The if condition keeps only the files that have the .pdf extension (case-insensitive).
source_file: Full path of the PDF file found in the current subfolder.
destination_file: Full path where the PDF file would go in the destination folder.
os.path.exists(destination_file) checks whether the file already exists in the destination folder.
If it does exist, the script compares the last modified time of both files using os.path.getmtime().
If the source file is newer (source_modified_time > destination_modified_time), the file is copied to replace the older one in the destination folder.
The destination folder is always named ALL_PDF.
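
As a minimal sketch of how os.walk() can be steered, the snippet below builds a throwaway directory tree and keeps the walk out of the destination folder by editing dirs in place. The helper name find_pdfs and the sample file names are illustrative assumptions, not part of the original script.

```python
import os
import tempfile


def find_pdfs(main_folder, skip_folder):
    """Return sorted paths of PDF files under main_folder, ignoring skip_folder."""
    skip = os.path.normcase(os.path.abspath(skip_folder))
    found = []
    for root, dirs, files in os.walk(main_folder):
        # Editing dirs in place tells os.walk() not to descend into skip_folder
        dirs[:] = [
            d for d in dirs
            if os.path.normcase(os.path.abspath(os.path.join(root, d))) != skip
        ]
        for name in files:
            if name.lower().endswith(".pdf"):  # case-insensitive extension check
                found.append(os.path.join(root, name))
    return sorted(found)


# Demo on a temporary tree (hypothetical file names)
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub", "deep"))
    os.makedirs(os.path.join(tmp, "ALL_PDF"))
    for rel in ["sub/a.pdf", "sub/deep/B.PDF", "sub/notes.txt", "ALL_PDF/old.pdf"]:
        with open(os.path.join(tmp, *rel.split("/")), "w") as fh:
            fh.write("placeholder")
    demo_names = [
        os.path.basename(p) for p in find_pdfs(tmp, os.path.join(tmp, "ALL_PDF"))
    ]
    print(demo_names)
```

Comparing full paths, rather than testing whether the destination path is a substring of root, avoids accidentally skipping an unrelated folder whose path merely contains the destination path.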

1 Define the path to the main folder where all your files are stored (e.g., main_folder)
2 Create a folder (ALL_PDF) inside the main folder to store all the consolidated PDF files
3 Use os.walk() to iterate through all subfolders and files within the main_folder.
4 Check if each file ends with .pdf (case-insensitive).
5 If the PDF exists in the ALL_PDF folder, compare the last modified times of the source and destination files.
6 Copy or replace: if the source file is newer, replace the older copy; if it is older, skip it; if the file doesn't exist in the destination folder, copy it there.
7 Print messages to indicate whether a file was copied, replaced, or skipped.
8 Once all PDFs are processed, print a final message confirming that all PDFs have been copied to the destination folder.
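
The copy-or-replace decision in steps 5 and 6 can be sketched as a small function. This is a sketch under stated assumptions rather than the script itself: the name copy_if_newer, the demo folder names, and the explicit timestamps are hypothetical, and it uses shutil.copy2 (which also copies the modification time) so that later comparisons are made against the original file's timestamp.

```python
import os
import shutil
import tempfile


def copy_if_newer(source_file, destination_folder):
    """Copy source_file into destination_folder unless an equal or newer copy exists.

    Returns "copied", "replaced", or "skipped".
    """
    destination_file = os.path.join(destination_folder, os.path.basename(source_file))
    if not os.path.exists(destination_file):
        shutil.copy2(source_file, destination_file)  # copy2 keeps the source mtime
        return "copied"
    if os.path.getmtime(source_file) > os.path.getmtime(destination_file):
        shutil.copy2(source_file, destination_file)
        return "replaced"
    return "skipped"


# Demo: two same-named files with explicit (hypothetical) timestamps
with tempfile.TemporaryDirectory() as tmp:
    old_dir = os.path.join(tmp, "old")
    new_dir = os.path.join(tmp, "new")
    dest = os.path.join(tmp, "ALL_PDF")
    for folder in (old_dir, new_dir, dest):
        os.makedirs(folder)
    old_file = os.path.join(old_dir, "report.pdf")
    new_file = os.path.join(new_dir, "report.pdf")
    for path, content, mtime in ((old_file, "old", 1_000_000_000),
                                 (new_file, "new", 1_500_000_000)):
        with open(path, "w") as fh:
            fh.write(content)
        os.utime(path, (mtime, mtime))  # set access and modification times

    results = [
        copy_if_newer(old_file, dest),
        copy_if_newer(new_file, dest),
        copy_if_newer(old_file, dest),
    ]
    with open(os.path.join(dest, "report.pdf")) as fh:
        final_content = fh.read()
    print(results, final_content)
```

With plain shutil.copy, the destination file's modification time is the moment of the copy, so a same-named source found later in the walk would generally look older than the copy even when it is the more recent document.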

In [None]:
import os
import shutil

# Set the path to your main folder
main_folder = r"C:\Users\MuriloFarias\OneDrive - Kubik Engenharia Ltda\Área de Trabalho\VAM"

# Create a destination folder to store all PDFs with a fixed name
destination_folder = os.path.join(main_folder, "ALL_PDF")
os.makedirs(destination_folder, exist_ok=True)

# Walk through all subfolders in the main folder
for root, dirs, files in os.walk(main_folder):
    # Skip the ALL_PDF folder by pruning it from the walk. Comparing full paths
    # avoids skipping unrelated folders whose path merely contains this one.
    dirs[:] = [
        d for d in dirs
        if os.path.normcase(os.path.join(root, d)) != os.path.normcase(destination_folder)
    ]

    for file in files:
        if file.lower().endswith(".pdf"):  # Check if the file is a PDF
            source_file = os.path.join(root, file)
            destination_file = os.path.join(destination_folder, file)

            if os.path.exists(destination_file):
                # Compare last modified time of both files
                source_modified_time = os.path.getmtime(source_file)
                destination_modified_time = os.path.getmtime(destination_file)

                if source_modified_time > destination_modified_time:
                    # Replace the older file in the destination folder.
                    # copy2 preserves the modification time, so the next
                    # comparison uses the document's own timestamp.
                    shutil.copy2(source_file, destination_file)
                    print(f"Replaced older file: {file}")
                else:
                    print(f"Skipped file (not newer than existing copy): {file}")
            else:
                # Copy the file if it doesn't exist in the destination
                shutil.copy2(source_file, destination_file)
                print(f"Copied new file: {file}")

print(f"All PDFs have been copied to: {destination_folder}")


To analyze all PDFs in the ALL_PDF folder and extract their text while ignoring images, diagrams, and other non-text content, I'm using PyPDF2.
1 Install PyPDF2 for text extraction.
2 Use PyPDF2.PdfReader to read each PDF file and extract the text from each page.
3 Loop through all PDFs in the ALL_PDF folder.
4 Extract the text from each PDF.
5 Save the extracted text into a summary file (extracted_text_summary.txt) for easy review.
6 Catch and log errors if any PDF cannot be processed.
7 Count the total number of PDFs in the folder.
8 Display the progress percentage and the current file being processed in real time.
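
Step 8 can be sketched as a small formatting helper. The function name progress_line and the sample file names are illustrative assumptions, not taken from the cells in this notebook.

```python
def progress_line(index, total, file_name):
    """Format a progress message for the file at zero-based position index."""
    percent = (index + 1) / total * 100 if total else 100.0
    return f"[{percent:5.1f}%] Processing file {index + 1} of {total}: {file_name}"


sample_files = ["a.pdf", "b.pdf", "c.pdf", "d.pdf"]  # hypothetical names
lines = [
    progress_line(i, len(sample_files), name)
    for i, name in enumerate(sample_files)
]
for line in lines:
    print(line)
```

Calling such a helper at the top of the per-file loop shows both the percentage and the current file before the extraction of that file starts.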

In [4]:
pip install PyPDF2

Installing collected packages: PyPDF2
Successfully installed PyPDF2-3.0.1


In [None]:
import os
from PyPDF2 import PdfReader

# Path to the ALL_PDF folder
all_pdf_folder = r"C:\Users\MuriloFarias\OneDrive - Kubik Engenharia Ltda\Área de Trabalho\VAM\ALL_PDF"

# Function to extract text from a PDF file
def extract_text_from_pdf(pdf_path):
    try:
        reader = PdfReader(pdf_path)
        text = ""
        for page in reader.pages:
            text += page.extract_text()  # Extract text from each page
        return text
    except Exception as e:
        print(f"Error processing {pdf_path}: {e}")
        return ""

# Analyze all PDFs in the folder
all_text_data = {}  # Dictionary to store PDF file names and their extracted text
for file_name in os.listdir(all_pdf_folder):
    if file_name.lower().endswith(".pdf"):  # Check if the file is a PDF
        pdf_path = os.path.join(all_pdf_folder, file_name)
        print(f"Extracting text from: {file_name}")
        extracted_text = extract_text_from_pdf(pdf_path)
        all_text_data[file_name] = extracted_text

# Optional: Save all extracted text to a summary file
summary_file_path = os.path.join(all_pdf_folder, "extracted_text_summary.txt")
with open(summary_file_path, "w", encoding="utf-8") as summary_file:
    for file_name, text in all_text_data.items():
        summary_file.write(f"--- {file_name} ---\n{text}\n\n")

print(f"Text extraction completed! Summary saved to: {summary_file_path}")


In [6]:
import os
import time
from PyPDF2 import PdfReader
from datetime import datetime

# Paths
main_folder = r"C:\Users\MuriloFarias\OneDrive - Kubik Engenharia Ltda\Área de Trabalho\VAM"
all_pdf_folder = os.path.join(main_folder, "ALL_PDF")
summary_file_path = os.path.join(main_folder, "extracted_text_summary.txt")  # Save summary in the main folder

# Start timer
start_time = time.time()

# Load already processed file names and modification dates from the summary file
processed_files = {}
if os.path.exists(summary_file_path):
    with open(summary_file_path, "r", encoding="utf-8") as summary_file:
        for line in summary_file:
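            # Header lines look like:
            # --- <file name> --- PATH <file path> --- Data of file: <YYYY-MM-DD>
            if line.startswith("--- "):
                # Slice off the leading "--- "; str.strip("--- ") would strip any
                # run of '-' and ' ' characters and could damage file names
                parts = line[4:].rstrip("\n").split(" --- ")
                if len(parts) == 3:
                    file_name, file_path, mod_date = parts[0], parts[1], parts[2].replace("Data of file: ", "")
                    processed_files[file_name] = mod_date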

# Get all PDF files in the folder
pdf_files = [f for f in os.listdir(all_pdf_folder) if f.lower().endswith(".pdf")]
total_files = len(pdf_files)

# Identify new or updated PDFs
new_files = []
for file_name in pdf_files:
    file_path = os.path.join(all_pdf_folder, file_name)
    mod_time = os.path.getmtime(file_path)  # Get last modification time
    mod_date = datetime.fromtimestamp(mod_time).strftime("%Y-%m-%d")

    if file_name not in processed_files or processed_files[file_name] < mod_date:
        new_files.append((file_name, file_path, mod_date))

# Process new or updated PDFs
processed_count = 0
if not new_files:
    print("No new or updated PDFs to process. All files are up to date!")
else:
    print(f"Found {len(new_files)} new or updated file(s) to process.")
    for index, (file_name, file_path, mod_date) in enumerate(new_files):
        print(f"Processing file {index + 1} of {len(new_files)}: {file_name}")
        
        # Extract text
        try:
            reader = PdfReader(file_path)
            text = ""
            for page in reader.pages:
                text += page.extract_text()

            # Append text to the summary file
            with open(summary_file_path, "a", encoding="utf-8") as summary_file:
                summary_file.write(f"--- {file_name} --- PATH {file_path} --- Data of file: {mod_date}\n{text}\n\n")

            processed_files[file_name] = mod_date
            processed_count += 1
            print(f"Processed: {file_name}")
        except Exception as e:
            print(f"Error processing {file_name}: {e}")

# End timer
end_time = time.time()
elapsed_time = end_time - start_time

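# KPIs (rates are relative to the files attempted in this run, not to all PDFs)
attempted_count = len(new_files)
success_rate = (processed_count / attempted_count * 100) if attempted_count > 0 else 0
failure_rate = (100 - success_rate) if attempted_count > 0 else 0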

print("\n--- Processing Summary ---")
print(f"Run Time: {elapsed_time / 60:.2f} minutes")  # time.time() differences are in seconds
print(f"Total PDFs in ALL_PDF folder: {total_files}")
print(f"New or Updated PDFs processed: {processed_count}")
print(f"Success Rate: {success_rate:.2f}%")
print(f"Failure Rate: {failure_rate:.2f}%")


Found 479 new or updated file(s) to process.
Processing file 1 of 479: 123-176.pdf
Processed: 123-176.pdf
Processing file 2 of 479: 131-003.pdf
Processed: 131-003.pdf
Processing file 3 of 479: 183997_APROVAÇÃO_ST.ADELIA_DS-0414-B.pdf
Processed: 183997_APROVAÇÃO_ST.ADELIA_DS-0414-B.pdf
Processing file 4 of 479: 183997_APROVAÇÃO_ST.ADELIA_FD.pdf
Processed: 183997_APROVAÇÃO_ST.ADELIA_FD.pdf
Processing file 5 of 479: 23.2382.0005 - Cubículo de Conexão UFV - CPFL 15kV - Uso Externo - UFV SANTA ADÉLIA.pdf
Processed: 23.2382.0005 - Cubículo de Conexão UFV - CPFL 15kV - Uso Externo - UFV SANTA ADÉLIA.pdf
Processing file 6 of 479: 23.2382.0005.pdf
Processed: 23.2382.0005.pdf
Processing file 7 of 479: 23.2382.0006 - Cubículo de Conexão UFV - CPFL 15kV - Uso Externo - UFV SANTA ADÉLIA.pdf
Processed: 23.2382.0006 - Cubículo de Conexão UFV - CPFL 15kV - Uso Externo - UFV SANTA ADÉLIA.pdf
Processing file 8 of 479: 23.2382.0006.pdf
Processed: 23.2382.0006.pdf
Processing file 9 of 479: 23.2382.0007 - 

unknown widths : 
[0, IndirectObject(1673, 0, 1821560102608)]

Processed: ATO-COM-TRAFOS-DEODE-SA-UFV1-R00-Comentado Kubik.pdf
Processing file 42 of 479: ATO-COM-TRAFOS-DEODE-SA-UFV1-R00.pdf


unknown widths : 
[0, IndirectObject(1580, 0, 1821561510160)]

Processed: ATO-COM-TRAFOS-DEODE-SA-UFV1-R00.pdf
Processing file 43 of 479: ATO-COM-TRAFOS-DEODE-SA-UFV2-R00 - Comentado.pdf
Processed: ATO-COM-TRAFOS-DEODE-SA-UFV2-R00 - Comentado.pdf
Processing file 44 of 479: ATO-COM-TRAFOS-DEODE-SA-UFV2-R00.pdf
Processed: ATO-COM-TRAFOS-DEODE-SA-UFV2-R00.pdf
Processing file 45 of 479: ATO-COM-TRAFOS-DEODE-SA-UFV2-R02-Comentado Kubik.pdf


unknown widths : 
[0, IndirectObject(1714, 0, 1821533001872)]

Processed: ATO-COM-TRAFOS-DEODE-SA-UFV2-R02-Comentado Kubik.pdf
Processing file 46 of 479: ATO-COM-TRAFOS-DEODE-SA-UFV2-R02-Comentado.pdf
Processed: ATO-COM-TRAFOS-DEODE-SA-UFV2-R02-Comentado.pdf
Processing file 47 of 479: ATO-COM-TRAFOS-DEODE-SA-UFV2-R02.pdf


unknown widths : 
[0, IndirectObject(1625, 0, 1821566016976)]

Processed: ATO-COM-TRAFOS-DEODE-SA-UFV2-R02.pdf
Processing file 48 of 479: ATO-COM-TRAFOS-DEODE-SA-UFV3-R00 - Comentado.pdf
Processed: ATO-COM-TRAFOS-DEODE-SA-UFV3-R00 - Comentado.pdf
Processing file 49 of 479: ATO-COM-TRAFOS-DEODE-SA-UFV3-R00-Comentado Kubik.pdf


unknown widths : 
[0, IndirectObject(1686, 0, 1821642950416)]

Processed: ATO-COM-TRAFOS-DEODE-SA-UFV3-R00-Comentado Kubik.pdf
Processing file 50 of 479: ATO-COM-TRAFOS-DEODE-SA-UFV3-R00.pdf


unknown widths : 
[0, IndirectObject(1673, 0, 1821569280912)]

Processed: ATO-COM-TRAFOS-DEODE-SA-UFV3-R00.pdf
Processing file 51 of 479: ATO-MT-CUBICULO-STADELIA-R00-Comentado Kubik.pdf


unknown widths : 
[0, IndirectObject(3335, 0, 1821638448848)]

Processed: ATO-MT-CUBICULO-STADELIA-R00-Comentado Kubik.pdf
Processing file 52 of 479: ATO-MT-CUBICULO-STADELIA-R00.pdf


unknown widths : 
[0, IndirectObject(3381, 0, 1821698985936)]

Processed: ATO-MT-CUBICULO-STADELIA-R00.pdf
Processing file 53 of 479: ATO-MT-HIPOT-STA-010-Comentado Kubik.pdf
Processed: ATO-MT-HIPOT-STA-010-Comentado Kubik.pdf
Processing file 54 of 479: ATO-MT-HIPOT-STA-010.pdf
Processed: ATO-MT-HIPOT-STA-010.pdf
Processing file 55 of 479: ATO-MT-HIPOT-STA-UFV01-011 - Comentado.pdf
Processed: ATO-MT-HIPOT-STA-UFV01-011 - Comentado.pdf
Processing file 56 of 479: ATO-MT-HIPOT-STA-UFV01-011.pdf
Processed: ATO-MT-HIPOT-STA-UFV01-011.pdf
Processing file 57 of 479: ATO-MT-HIPOT-STA-UFV02-011 - Comentado.pdf
Processed: ATO-MT-HIPOT-STA-UFV02-011 - Comentado.pdf
Processing file 58 of 479: ATO-MT-HIPOT-STA-UFV02-011.pdf
Processed: ATO-MT-HIPOT-STA-UFV02-011.pdf
Processing file 59 of 479: ATO-MT-HIPOT-STA-UFV03-011 - Comentado.pdf
Processed: ATO-MT-HIPOT-STA-UFV03-011 - Comentado.pdf
Processing file 60 of 479: ATO-MT-HIPOT-STA-UFV03-011.pdf
Processed: ATO-MT-HIPOT-STA-UFV03-011.pdf
Processing file 61 of 479: ATO-MT-RISOMT-STA-011-Comentado Kubik.pdf
Process

Multiple definitions in dictionary at byte 0x19dd10 for key /PageMode


Processed: poweredget150ism.pdf
Processing file 113 of 479: QA_ARQSEN_SALA_DE_CONTROLE(SANTA-ADELIA) rev01a.pdf


Multiple definitions in dictionary at byte 0x1aaaed for key /PageMode


Processed: QA_ARQSEN_SALA_DE_CONTROLE(SANTA-ADELIA) rev01a.pdf
Processing file 114 of 479: QA_ARQSEN_SALA_DE_CONTROLE(SANTA-ADELIA) rev02a.pdf
Processed: QA_ARQSEN_SALA_DE_CONTROLE(SANTA-ADELIA) rev02a.pdf
Processing file 115 of 479: Quick-Guide-1531279573rb2011U-qg.pdf
Processed: Quick-Guide-1531279573rb2011U-qg.pdf
Processing file 116 of 479: Respostas comentarios relatorios Santa Adelia.pdf
Processed: Respostas comentarios relatorios Santa Adelia.pdf
Processing file 117 of 479: roseta-optica.pdf
Processed: roseta-optica.pdf
Processing file 118 of 479: RT  24191-1 - ATO ENERGIA - UFV SANTA ADELIA-SP- ANEXO I-Comentado Kubik.pdf
Processed: RT  24191-1 - ATO ENERGIA - UFV SANTA ADELIA-SP- ANEXO I-Comentado Kubik.pdf
Processing file 119 of 479: RT  24191-1 - ATO ENERGIA - UFV SANTA ADELIA-SP- ANEXO I.pdf
Processed: RT  24191-1 - ATO ENERGIA - UFV SANTA ADELIA-SP- ANEXO I.pdf
Processing file 120 of 479: RT 24191-1 - ATO ENERGIA - UFV SANTA ADELIA-SP.pdf


Multiple definitions in dictionary at byte 0xb5653 for key /PageMode


Processed: RT 24191-1 - ATO ENERGIA - UFV SANTA ADELIA-SP.pdf
Processing file 121 of 479: Santa Adelia rev01a.pdf
Processed: Santa Adelia rev01a.pdf
Processing file 122 of 479: Santa Adelia rev02a.pdf
Processed: Santa Adelia rev02a.pdf
Processing file 123 of 479: Santa Adelia.pdf
Processed: Santa Adelia.pdf
Processing file 124 of 479: Santa Adélia - UFV-01 - Formulário Anexo F - Dados para registro.pdf
Processed: Santa Adélia - UFV-01 - Formulário Anexo F - Dados para registro.pdf
Processing file 125 of 479: Santa Adélia - UFV-02 - Formulário Anexo F - Dados para registro.pdf
Processed: Santa Adélia - UFV-02 - Formulário Anexo F - Dados para registro.pdf
Processing file 126 of 479: Santa Adélia - UFV-03 - Formulário Anexo F - Dados para registro.pdf
Processed: Santa Adélia - UFV-03 - Formulário Anexo F - Dados para registro.pdf
Processing file 127 of 479: SmartACU2000D Smart Array Controller User Manual (With PID Modules).pdf
Processed: SmartACU2000D Smart Array Controller User Manual 

Multiple definitions in dictionary at byte 0xe22d7 for key /PageMode


Processed: SPSTAAD01 -ENG-E2-DES-202-RLD_0.pdf
Processing file 133 of 479: SPSTAAD01-ENG-E2-DES-201-RLD_0.pdf
Processed: SPSTAAD01-ENG-E2-DES-201-RLD_0.pdf
Processing file 134 of 479: SPSTAAD01-ENG-E2-DES-201-RLD_0_CV.pdf


Multiple definitions in dictionary at byte 0xdfd2b for key /PageMode


Processed: SPSTAAD01-ENG-E2-DES-201-RLD_0_CV.pdf
Processing file 135 of 479: SPSTAAD01-ENG-E2-DES-201-RLD_A.pdf
Processed: SPSTAAD01-ENG-E2-DES-201-RLD_A.pdf
Processing file 136 of 479: SPSTAAD01-ENG-E2-DES-201-RLD_A_CV.pdf
Processed: SPSTAAD01-ENG-E2-DES-201-RLD_A_CV.pdf
Processing file 137 of 479: SPSTADE01-2024-001.pdf
Processed: SPSTADE01-2024-001.pdf
Processing file 138 of 479: SPSTADE01-2024-002.pdf
Processed: SPSTADE01-2024-002.pdf
Processing file 139 of 479: SPSTADE01-2024-003.pdf
Processed: SPSTADE01-2024-003.pdf
Processing file 140 of 479: SPSTADE01-2024-004.pdf
Processed: SPSTADE01-2024-004.pdf
Processing file 141 of 479: SPSTADE01-2024-005.pdf
Processed: SPSTADE01-2024-005.pdf
Processing file 142 of 479: SPSTADE01-2024-006.pdf
Processed: SPSTADE01-2024-006.pdf
Processing file 143 of 479: SPSTADE01-2024-007.pdf
Processed: SPSTADE01-2024-007.pdf
Processing file 144 of 479: SPSTADE01-2024-008.pdf
Processed: SPSTADE01-2024-008.pdf
Processing file 145 of 479: SPSTADE01-2024-009.

Multiple definitions in dictionary at byte 0xdfd2b for key /PageMode


Processed: SPSTADE01-EL9-00001.pdf
Processing file 158 of 479: SPSTADE01-ENG-E2-DES-201-R01.pdf
Processed: SPSTADE01-ENG-E2-DES-201-R01.pdf
Processing file 159 of 479: SPSTADE01-ENG-E2-EST-103-R01.pdf
Processed: SPSTADE01-ENG-E2-EST-103-R01.pdf
Processing file 160 of 479: SPSTADE01-ENG-E3-ACT-001-R00.pdf
Processed: SPSTADE01-ENG-E3-ACT-001-R00.pdf
Processing file 161 of 479: SPSTADE01-ENG-E3-ACT-002-R00.pdf
Processed: SPSTADE01-ENG-E3-ACT-002-R00.pdf
Processing file 162 of 479: SPSTADE01-ENG-E3-ACT-003-R00.pdf
Processed: SPSTADE01-ENG-E3-ACT-003-R00.pdf
Processing file 163 of 479: SPSTADE01-ENG-E3-ACT-003-R04.pdf
Processed: SPSTADE01-ENG-E3-ACT-003-R04.pdf
Processing file 164 of 479: SPSTADE01-ENG-E3-ACT-004-R00.pdf
Processed: SPSTADE01-ENG-E3-ACT-004-R00.pdf
Processing file 165 of 479: SPSTADE01-ENG-E3-ACT-005-R00.pdf
Processed: SPSTADE01-ENG-E3-ACT-005-R00.pdf
Processing file 166 of 479: SPSTADE01-ENG-E3-ACT-006-R00.pdf
Processed: SPSTADE01-ENG-E3-ACT-006-R00.pdf
Processing file 167 

Multiple definitions in dictionary at byte 0x83864d for key /PageMode


Processed: SPSTADE01-ENG-E3-DES-103-R01_LCR.pdf
Processing file 200 of 479: SPSTADE01-ENG-E3-DES-103-R02.pdf
Processed: SPSTADE01-ENG-E3-DES-103-R02.pdf
Processing file 201 of 479: SPSTADE01-ENG-E3-DES-103-R02_LCR.pdf


Multiple definitions in dictionary at byte 0x72db74 for key /PageMode


Processed: SPSTADE01-ENG-E3-DES-103-R02_LCR.pdf
Processing file 202 of 479: SPSTADE01-ENG-E3-DES-103-R03.pdf
Processed: SPSTADE01-ENG-E3-DES-103-R03.pdf
Processing file 203 of 479: SPSTADE01-ENG-E3-DES-103-R03_LCR.pdf
Processed: SPSTADE01-ENG-E3-DES-103-R03_LCR.pdf
Processing file 204 of 479: SPSTADE01-ENG-E3-DES-103-R04.pdf
Processed: SPSTADE01-ENG-E3-DES-103-R04.pdf
Processing file 205 of 479: SPSTADE01-ENG-E3-DES-103-R04_APR.pdf
Processed: SPSTADE01-ENG-E3-DES-103-R04_APR.pdf
Processing file 206 of 479: SPSTADE01-ENG-E3-DES-104-R01.pdf
Processed: SPSTADE01-ENG-E3-DES-104-R01.pdf
Processing file 207 of 479: SPSTADE01-ENG-E3-DES-104-R01_LCR.pdf


Multiple definitions in dictionary at byte 0x3c29cf for key /PageMode


Processed: SPSTADE01-ENG-E3-DES-104-R01_LCR.pdf
Processing file 208 of 479: SPSTADE01-ENG-E3-DES-104-R02.pdf
Processed: SPSTADE01-ENG-E3-DES-104-R02.pdf
Processing file 209 of 479: SPSTADE01-ENG-E3-DES-104-R02_LCR.pdf
Processed: SPSTADE01-ENG-E3-DES-104-R02_LCR.pdf
Processing file 210 of 479: SPSTADE01-ENG-E3-DES-104-R03.pdf
Processed: SPSTADE01-ENG-E3-DES-104-R03.pdf
Processing file 211 of 479: SPSTADE01-ENG-E3-DES-104-R03_APR.pdf
Processed: SPSTADE01-ENG-E3-DES-104-R03_APR.pdf
Processing file 212 of 479: SPSTADE01-ENG-E3-DES-105-R01 - Estudo Hidrológico.pdf
Processed: SPSTADE01-ENG-E3-DES-105-R01 - Estudo Hidrológico.pdf
Processing file 213 of 479: SPSTADE01-ENG-E3-DES-105-R01 - Estudo Hidrológico_LCR.pdf
Processed: SPSTADE01-ENG-E3-DES-105-R01 - Estudo Hidrológico_LCR.pdf
Processing file 214 of 479: SPSTADE01-ENG-E3-DES-105-R01 - Memorial de Cálculo de Drenagem.pdf
Processed: SPSTADE01-ENG-E3-DES-105-R01 - Memorial de Cálculo de Drenagem.pdf
Processing file 215 of 479: SPSTADE01-ENG

Multiple definitions in dictionary at byte 0x3ad60b for key /PageMode


Processed: SPSTADE01-ENG-E3-DES-106-R03_APR.pdf
Processing file 230 of 479: SPSTADE01-ENG-E3-DES-112-R01.pdf
Processed: SPSTADE01-ENG-E3-DES-112-R01.pdf
Processing file 231 of 479: SPSTADE01-ENG-E3-DES-112-R01_(DEODE).pdf
Error processing SPSTADE01-ENG-E3-DES-112-R01_(DEODE).pdf: trailer can not be read ()
Processing file 232 of 479: SPSTADE01-ENG-E3-DES-112-R01_LCR.pdf
Processed: SPSTADE01-ENG-E3-DES-112-R01_LCR.pdf
Processing file 233 of 479: SPSTADE01-ENG-E3-DES-112-R01_LCR_1.pdf
Processed: SPSTADE01-ENG-E3-DES-112-R01_LCR_1.pdf
Processing file 234 of 479: SPSTADE01-ENG-E3-DES-112-R02(Pond. Deode).pdf
Processed: SPSTADE01-ENG-E3-DES-112-R02(Pond. Deode).pdf
Processing file 235 of 479: SPSTADE01-ENG-E3-DES-112-R02.pdf
Processed: SPSTADE01-ENG-E3-DES-112-R02.pdf
Processing file 236 of 479: SPSTADE01-ENG-E3-DES-112-R02_LCR.pdf
Processed: SPSTADE01-ENG-E3-DES-112-R02_LCR.pdf
Processing file 237 of 479: SPSTADE01-ENG-E3-DES-112-R03 (Pond. Deode).pdf
Processed: SPSTADE01-ENG-E3-DES-112-R0

Multiple definitions in dictionary at byte 0xc3c16 for key /PageMode


Processed: SPSTADE01-ENG-E3-DES-207-R01.pdf
Processing file 275 of 479: SPSTADE01-ENG-E3-DES-208 (Pond. Deode).pdf
Error processing SPSTADE01-ENG-E3-DES-208 (Pond. Deode).pdf: trailer can not be read ()
Processing file 276 of 479: SPSTADE01-ENG-E3-DES-208-R01.pdf
Processed: SPSTADE01-ENG-E3-DES-208-R01.pdf
Processing file 277 of 479: SPSTADE01-ENG-E3-DES-208-R02 (Pond. Deode).pdf
Processed: SPSTADE01-ENG-E3-DES-208-R02 (Pond. Deode).pdf
Processing file 278 of 479: SPSTADE01-ENG-E3-DES-208-R02.pdf
Processed: SPSTADE01-ENG-E3-DES-208-R02.pdf
Processing file 279 of 479: SPSTADE01-ENG-E3-DES-208-R03.pdf
Processed: SPSTADE01-ENG-E3-DES-208-R03.pdf
Processing file 280 of 479: SPSTADE01-ENG-E3-DES-209-R01 - 123-176.pdf
Processed: SPSTADE01-ENG-E3-DES-209-R01 - 123-176.pdf
Processing file 281 of 479: SPSTADE01-ENG-E3-DES-209-R01 - 131-003.pdf
Processed: SPSTADE01-ENG-E3-DES-209-R01 - 131-003.pdf
Processing file 282 of 479: SPSTADE01-ENG-E3-DES-209-R01 - 183997_APROVAÇÃO_ST.ADELIA_DS-0414-B.pdf

Multiple definitions in dictionary at byte 0x2ba4b3 for key /PageMode


Processed: SPSTADE01-ENG-E3-DES-209-R01 - Manual TH104.pdf
Processing file 287 of 479: SPSTADE01-ENG-E3-DES-209-R01.pdf
Processed: SPSTADE01-ENG-E3-DES-209-R01.pdf
Processing file 288 of 479: SPSTADE01-ENG-E3-DES-210-R01 - FD - T.1Ø 2000VA.pdf
Processed: SPSTADE01-ENG-E3-DES-210-R01 - FD - T.1Ø 2000VA.pdf
Processing file 289 of 479: SPSTADE01-ENG-E3-DES-210-R01 - FD - T.3Ø 30KVA.pdf
Processed: SPSTADE01-ENG-E3-DES-210-R01 - FD - T.3Ø 30KVA.pdf
Processing file 290 of 479: SPSTADE01-ENG-E3-DES-211-R01.pdf
Processed: SPSTADE01-ENG-E3-DES-211-R01.pdf
Processing file 291 of 479: SPSTADE01-ENG-E3-DES-211-R02.pdf
Processed: SPSTADE01-ENG-E3-DES-211-R02.pdf
Processing file 292 of 479: SPSTADE01-ENG-E3-DES-212-R01 Pond. (Deode).pdf
Processed: SPSTADE01-ENG-E3-DES-212-R01 Pond. (Deode).pdf
Processing file 293 of 479: SPSTADE01-ENG-E3-DES-212-R01.pdf
Processed: SPSTADE01-ENG-E3-DES-212-R01.pdf
Processing file 294 of 479: SPSTADE01-ENG-E3-DES-212-R02(Pond. Deode) .pdf
Processed: SPSTADE01-ENG-E3-D

Multiple definitions in dictionary at byte 0x9ac45 for key /PageMode


Processed: SPSTADE01-ENG-E3-DES-216-R01 (Pond. Deode).pdf
Processing file 325 of 479: SPSTADE01-ENG-E3-DES-216-R01.pdf
Processed: SPSTADE01-ENG-E3-DES-216-R01.pdf
Processing file 326 of 479: SPSTADE01-ENG-E3-DES-216-R02 - Pond Deode.pdf
Processed: SPSTADE01-ENG-E3-DES-216-R02 - Pond Deode.pdf
Processing file 327 of 479: SPSTADE01-ENG-E3-DES-216-R02.pdf
Processed: SPSTADE01-ENG-E3-DES-216-R02.pdf
Processing file 328 of 479: SPSTADE01-ENG-E3-DES-216-R03.pdf
Processed: SPSTADE01-ENG-E3-DES-216-R03.pdf
Processing file 329 of 479: SPSTADE01-ENG-E3-DES-218-R00.pdf
Processed: SPSTADE01-ENG-E3-DES-218-R00.pdf
Processing file 330 of 479: SPSTADE01-ENG-E3-DES-218-R00_REP.pdf
Processed: SPSTADE01-ENG-E3-DES-218-R00_REP.pdf
Processing file 331 of 479: SPSTADE01-ENG-E3-DES-219-R01.pdf
Processed: SPSTADE01-ENG-E3-DES-219-R01.pdf
Processing file 332 of 479: SPSTADE01-ENG-E3-DES-219-R01_REP.pdf
Processed: SPSTADE01-ENG-E3-DES-219-R01_REP.pdf
Processing file 333 of 479: SPSTADE01-ENG-E3-DES-219-R02.pdf

Multiple definitions in dictionary at byte 0x3afcef for key /PageMode


Processed: SPSTADE01-ENG-E3-DES-219-R02_LCR.pdf
Processing file 335 of 479: SPSTADE01-ENG-E3-DES-220-R01.pdf
Processed: SPSTADE01-ENG-E3-DES-220-R01.pdf
Processing file 336 of 479: SPSTADE01-ENG-E3-DES-220-R01_REP.pdf


Multiple definitions in dictionary at byte 0x2f53ca for key /PageMode


Processed: SPSTADE01-ENG-E3-DES-220-R01_REP.pdf
Processing file 337 of 479: SPSTADE01-ENG-E3-DES-220-R02.pdf
Processed: SPSTADE01-ENG-E3-DES-220-R02.pdf
Processing file 338 of 479: SPSTADE01-ENG-E3-DES-220-R02_REP.pdf
Processed: SPSTADE01-ENG-E3-DES-220-R02_REP.pdf
Processing file 339 of 479: SPSTADE01-ENG-E3-DES-220-R03.pdf
Processed: SPSTADE01-ENG-E3-DES-220-R03.pdf
Processing file 340 of 479: SPSTADE01-ENG-E3-DES-220-R03_LSC.pdf
Processed: SPSTADE01-ENG-E3-DES-220-R03_LSC.pdf
Processing file 341 of 479: SPSTADE01-ENG-E3-DES-221-R01 (Pond. Deode).pdf
Processed: SPSTADE01-ENG-E3-DES-221-R01 (Pond. Deode).pdf
Processing file 342 of 479: SPSTADE01-ENG-E3-DES-221-R01.pdf
Processed: SPSTADE01-ENG-E3-DES-221-R01.pdf
Processing file 343 of 479: SPSTADE01-ENG-E3-DES-221-R02.pdf
Processed: SPSTADE01-ENG-E3-DES-221-R02.pdf
Processing file 344 of 479: SPSTADE01-ENG-E3-DES-223-R01 (Pond. Deode).pdf
Processed: SPSTADE01-ENG-E3-DES-223-R01 (Pond. Deode).pdf
Processing file 345 of 479: SPSTADE01-EN

Multiple definitions in dictionary at byte 0x999516 for key /PageMode


Processed: SPSTADE01-ENG-E3-DES-225-R01(Pond. Deode).pdf
Processing file 356 of 479: SPSTADE01-ENG-E3-DES-225-R01.pdf
Processed: SPSTADE01-ENG-E3-DES-225-R01.pdf
Processing file 357 of 479: SPSTADE01-ENG-E3-DES-225-R02.pdf
Processed: SPSTADE01-ENG-E3-DES-225-R02.pdf
Processing file 358 of 479: SPSTADE01-ENG-E3-DES-225-R02_LCR.pdf
Processed: SPSTADE01-ENG-E3-DES-225-R02_LCR.pdf
Processing file 359 of 479: SPSTADE01-ENG-E3-DES-225-R03-LCR.pdf
Processed: SPSTADE01-ENG-E3-DES-225-R03-LCR.pdf
Processing file 360 of 479: SPSTADE01-ENG-E3-DES-225-R03.pdf
Processed: SPSTADE01-ENG-E3-DES-225-R03.pdf
Processing file 361 of 479: SPSTADE01-ENG-E3-DES-225-R04.pdf
Processed: SPSTADE01-ENG-E3-DES-225-R04.pdf
Processing file 362 of 479: SPSTADE01-ENG-E3-DES-227-R01 (Pond. Deode).pdf


Multiple definitions in dictionary at byte 0x68c5bb for key /PageMode


Processed: SPSTADE01-ENG-E3-DES-227-R01 (Pond. Deode).pdf
Processing file 363 of 479: SPSTADE01-ENG-E3-DES-227-R01.pdf
Processed: SPSTADE01-ENG-E3-DES-227-R01.pdf
Processing file 364 of 479: SPSTADE01-ENG-E3-DES-227-R02 (Pond. Deode).pdf
Processed: SPSTADE01-ENG-E3-DES-227-R02 (Pond. Deode).pdf
Processing file 365 of 479: SPSTADE01-ENG-E3-DES-227-R02.pdf
Processed: SPSTADE01-ENG-E3-DES-227-R02.pdf
Processing file 366 of 479: SPSTADE01-ENG-E3-DES-227-R03.pdf
Processed: SPSTADE01-ENG-E3-DES-227-R03.pdf
Processing file 367 of 479: SPSTADE01-ENG-E3-DES-228-R01 - 23.2382.0005.pdf
Processed: SPSTADE01-ENG-E3-DES-228-R01 - 23.2382.0005.pdf
Processing file 368 of 479: SPSTADE01-ENG-E3-DES-228-R01 - 23.2382.0006.pdf
Processed: SPSTADE01-ENG-E3-DES-228-R01 - 23.2382.0006.pdf
Processing file 369 of 479: SPSTADE01-ENG-E3-DES-228-R01 - 23.2382.0007.pdf
Processed: SPSTADE01-ENG-E3-DES-228-R01 - 23.2382.0007.pdf
Processing file 370 of 479: SPSTADE01-ENG-E3-DES-232-R01 - Respondido.pdf
Processed: SPST

Multiple definitions in dictionary at byte 0x96bb5 for key /PageMode


Processed: SPSTADE01-ENG-E3-DES-236-R01 - 23.2382.0010 - trfaux 2 kVA.pdf
Processing file 378 of 479: SPSTADE01-ENG-E3-DES-237-R01.pdf
Processed: SPSTADE01-ENG-E3-DES-237-R01.pdf
Processing file 379 of 479: SPSTADE01-ENG-E3-DES-237-R02 (Pond. Deode).pdf
Processed: SPSTADE01-ENG-E3-DES-237-R02 (Pond. Deode).pdf
Processing file 380 of 479: SPSTADE01-ENG-E3-DES-237-R02.pdf
Processed: SPSTADE01-ENG-E3-DES-237-R02.pdf
Processing file 381 of 479: SPSTADE01-ENG-E3-DES-237-R03- Comentado.pdf
Processed: SPSTADE01-ENG-E3-DES-237-R03- Comentado.pdf
Processing file 382 of 479: SPSTADE01-ENG-E3-DES-237-R03.pdf
Processed: SPSTADE01-ENG-E3-DES-237-R03.pdf
Processing file 383 of 479: SPSTADE01-ENG-E3-DES-237-R04.pdf
Processed: SPSTADE01-ENG-E3-DES-237-R04.pdf
Processing file 384 of 479: SPSTADE01-ENG-E3-DES-237-R05.pdf
Processed: SPSTADE01-ENG-E3-DES-237-R05.pdf
Processing file 385 of 479: SPSTADE01-ENG-E3-DES-239-R01 (Pond. Deode).pdf


Multiple definitions in dictionary at byte 0x9191a for key /PageMode


Processed: SPSTADE01-ENG-E3-DES-239-R01 (Pond. Deode).pdf
Processing file 386 of 479: SPSTADE01-ENG-E3-DES-239-R01.pdf
Processed: SPSTADE01-ENG-E3-DES-239-R01.pdf
Processing file 387 of 479: SPSTADE01-ENG-E3-DES-239-R02.pdf
Processed: SPSTADE01-ENG-E3-DES-239-R02.pdf
Processing file 388 of 479: SPSTADE01-ENG-E3-EST-120-R01.pdf
Processed: SPSTADE01-ENG-E3-EST-120-R01.pdf
Processing file 389 of 479: SPSTADE01-ENG-E3-EST-120-R01_APR.pdf
Processed: SPSTADE01-ENG-E3-EST-120-R01_APR.pdf
Processing file 390 of 479: SPSTADE01-ENG-E3-EST-235-R01 - D485-LCR.pdf
Processed: SPSTADE01-ENG-E3-EST-235-R01 - D485-LCR.pdf
Processing file 391 of 479: SPSTADE01-ENG-E3-EST-235-R01 - Datalogger-LCR.pdf
Processed: SPSTADE01-ENG-E3-EST-235-R01 - Datalogger-LCR.pdf
Processing file 392 of 479: SPSTADE01-ENG-E3-EST-235-R01 - Desenho Sensores Tracker-LCR.pdf
Processed: SPSTADE01-ENG-E3-EST-235-R01 - Desenho Sensores Tracker-LCR.pdf
Processing file 393 of 479: SPSTADE01-ENG-E3-EST-235-R01 - FA01-LCR.pdf
Processed

Multiple definitions in dictionary at byte 0x25a12e for key /PageMode


Processed: SPSTADE01-ENG-E3-MDC-231-R01.pdf
Processing file 457 of 479: SPSTADE01-ENG-E3-MDC-233-R01.pdf
Processed: SPSTADE01-ENG-E3-MDC-233-R01.pdf
Processing file 458 of 479: SPSTADE01-ENG-E3-MDC-233-R03 (Pond. Deode).pdf
Processed: SPSTADE01-ENG-E3-MDC-233-R03 (Pond. Deode).pdf
Processing file 459 of 479: SPSTADE01-ENG-E3-MDC-233-R03.pdf
Processed: SPSTADE01-ENG-E3-MDC-233-R03.pdf
Processing file 460 of 479: SPSTADE01-ENG-E3-MDC-233-R04.pdf
Processed: SPSTADE01-ENG-E3-MDC-233-R04.pdf
Processing file 461 of 479: SPSTADE01-ENG-E3-MDC-234-R01.pdf
Processed: SPSTADE01-ENG-E3-MDC-234-R01.pdf
Processing file 462 of 479: SPSTADE01-ENG-E3-MDC-234-R02 - Memorial Descritivo.pdf
Processed: SPSTADE01-ENG-E3-MDC-234-R02 - Memorial Descritivo.pdf
Processing file 463 of 479: SPSTADE01-ENG-E3-MDC-234-R02.pdf
Processed: SPSTADE01-ENG-E3-MDC-234-R02.pdf
Processing file 464 of 479: SPSTADE01-ENG-E3-MDC-238-R01.pdf
Processed: SPSTADE01-ENG-E3-MDC-238-R01.pdf
Processing file 465 of 479: SPSTADE01-ENG-E3

VERSION 2

In [2]:
pip install pdfplumber

Collecting pdfplumber
  Downloading pdfplumber-0.11.5-py3-none-any.whl.metadata (42 kB)
Collecting pdfminer.six==20231228 (from pdfplumber)
  Downloading pdfminer.six-20231228-py3-none-any.whl.metadata (4.2 kB)
Collecting Pillow>=9.1 (from pdfplumber)
  Downloading pillow-11.1.0-cp311-cp311-win_amd64.whl.metadata (9.3 kB)
Collecting pypdfium2>=4.18.0 (from pdfplumber)
  Downloading pypdfium2-4.30.1-py3-none-win_amd64.whl.metadata (48 kB)
Collecting charset-normalizer>=2.0.0 (from pdfminer.six==20231228->pdfplumber)
  Downloading c


[notice] A new release of pip is available: 24.0 -> 24.3.1
[notice] To update, run: C:\Users\MuriloFarias\AppData\Local\Microsoft\WindowsApps\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\python.exe -m pip install --upgrade pip


In [5]:
pip install sentence-transformers

Collecting sentence-transformers
  Downloading sentence_transformers-3.3.1-py3-none-any.whl.metadata (10 kB)
Collecting transformers<5.0.0,>=4.41.0 (from sentence-transformers)
  Downloading transformers-4.48.0-py3-none-any.whl.metadata (44 kB)
Collecting tqdm (from sentence-transformers)
  Downloading tqdm-4.67.1-py3-none-any.whl.metadata (57 kB)
Collecting torch>=1.11.0 (from sentence-transformers)
  Downloading torch-2.5.1-cp311-cp311-win_amd64.whl.metadata (28 kB)
Collecting scikit-learn (from sentence-transformers)
  Downloading sci




In [6]:
pip install faiss-cpu

Collecting faiss-cpu
  Downloading faiss_cpu-1.9.0.post1-cp311-cp311-win_amd64.whl.metadata (4.5 kB)
Downloading faiss_cpu-1.9.0.post1-cp311-cp311-win_amd64.whl (13.8 MB)


In [7]:
pip install faiss-gpu

Note: you may need to restart the kernel to use updated packages.


ERROR: Could not find a version that satisfies the requirement faiss-gpu (from versions: none)
ERROR: No matching distribution found for faiss-gpu



In [9]:
pip install numpy

Note: you may need to restart the kernel to use updated packages.





In [10]:
import os
from pdfplumber import open as open_pdf
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

print("All libraries installed and working!")




All libraries installed and working!


Step 1: Extract Text and Generate Embeddings

In [1]:
pip install tqdm

Note: you may need to restart the kernel to use updated packages.





In [10]:
import os
import torch
from pdfplumber import open as open_pdf
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np
import pickle
from tqdm import tqdm  # For progress bar

# Verify PyTorch Installation
try:
    print("PyTorch Version:", torch.__version__)
    print("CUDA Available:", torch.cuda.is_available())
except Exception as e:
    print("Error checking PyTorch:", e)

# Paths
pdf_folder = r"C:\Users\MuriloFarias\OneDrive - Kubik Engenharia Ltda\Área de Trabalho\VAM\SELECIONADOS"
save_folder = r"C:\Users\MuriloFarias\Documents\AI"  # Save FAISS index and metadata here
os.makedirs(save_folder, exist_ok=True)  # Ensure the save folder exists
index_path = os.path.join(save_folder, "faiss_index.bin")
metadata_path = os.path.join(save_folder, "metadata.pkl")

# Load the SentenceTransformer model
try:
    model = SentenceTransformer('all-MiniLM-L6-v2')
    print("SentenceTransformer model loaded successfully!")
except Exception as e:
    print("Error loading SentenceTransformer:", e)
    raise  # Nothing below can run without the model

# Initialize FAISS
dimension = 384  # Embedding dimension for all-MiniLM-L6-v2
index = faiss.IndexFlatL2(dimension)
metadata = []  # To track file and chunk details for embeddings

# Load existing metadata and index if they exist
if os.path.exists(metadata_path):
    with open(metadata_path, "rb") as f:
        metadata = pickle.load(f)
    print("Loaded existing metadata.")

if os.path.exists(index_path):
    index = faiss.read_index(index_path)
    print("Loaded existing FAISS index.")

# Counters for processed files
files_with_text = 0
files_without_text = 0
files_without_text_list = []

# Function to extract text from PDFs
def extract_text_from_pdf(file_path):
    with open_pdf(file_path) as pdf:
        text = ""
        for page in pdf.pages:
            page_text = page.extract_text()
            if page_text:
                text += page_text + "\n"  # Keep a break between pages so words do not fuse
    return text

# Function to chunk text
def chunk_text(text, max_length=300):
    """Split text into smaller chunks for better embedding."""
    sentences = text.split(".")
    chunks = []
    current_chunk = ""
    for sentence in sentences:
        if len(current_chunk) + len(sentence) < max_length:
            current_chunk += sentence + "."
        else:
            if current_chunk:  # Skip the empty chunk left when the first sentence is too long
                chunks.append(current_chunk)
            current_chunk = sentence + "."
    if current_chunk:
        chunks.append(current_chunk)
    return chunks

# Batch process embeddings
def process_embeddings_in_batches(chunks, batch_size=32):
    """Encode chunks into an array with one embedding row per chunk."""
    # encode() batches internally, so no manual slicing is needed
    return model.encode(chunks, batch_size=batch_size, convert_to_numpy=True)

# Get list of PDF files (case-insensitive, so .PDF is matched too)
pdf_files = [file for file in os.listdir(pdf_folder) if file.lower().endswith(".pdf")]

# Process all PDFs with a progress bar
with tqdm(total=len(pdf_files), desc="Processing PDFs", unit="file") as pbar:
    for file_name in pdf_files:
        file_path = os.path.join(pdf_folder, file_name)
        try:
            # Extract text
            text = extract_text_from_pdf(file_path)
            if text.strip():
                files_with_text += 1
                chunks = chunk_text(text)  # Split text into chunks

                # Process embeddings in batches
                embeddings = process_embeddings_in_batches(chunks)
                index.add(np.array(embeddings))  # Add chunk embeddings to FAISS

                for chunk in chunks:
                    metadata.append({"file": file_name, "path": file_path, "chunk": chunk})
            else:
                files_without_text += 1
                files_without_text_list.append(file_name)
                print(f"No extractable text found in {file_name}. Images or empty pages may be present.")

            # Save every 10 files so an interrupted run loses little work
            if (pbar.n + 1) % 10 == 0:
                print("Saving intermediate FAISS index and metadata...")
                faiss.write_index(index, index_path)
                with open(metadata_path, "wb") as f:
                    pickle.dump(metadata, f)

        except Exception as e:
            print(f"Error processing {file_name}: {e}")
        finally:
            # Advance the bar even when a file fails, so the count stays accurate
            pbar.set_postfix({"Current File": file_name})
            pbar.update(1)

# Final save of the FAISS index and metadata
try:
    print(f"Saving final FAISS index to: {index_path}")
    faiss.write_index(index, index_path)
    print("FAISS index saved successfully.")

    print(f"Saving final metadata to: {metadata_path}")
    with open(metadata_path, "wb") as f:
        pickle.dump(metadata, f)
    print("Metadata saved successfully.")
except Exception as e:
    print(f"Error saving FAISS index or metadata: {e}")

# Report results
print(f"Total files processed: {len(pdf_files)}")
print(f"Files with extractable text: {files_with_text}")
print(f"Files without extractable text: {files_without_text}")
if files_without_text_list:
    print("Files without extractable text:")
    for file_name in files_without_text_list:
        print(f"- {file_name}")


PyTorch Version: 2.5.1+cpu
CUDA Available: False
SentenceTransformer model loaded successfully!
Loaded existing metadata.
Loaded existing FAISS index.


Processing PDFs:   2%|▏         | 1/50 [00:00<00:29,  1.67file/s, Current File=123-176.pdf]

No extractable text found in 123-176.pdf. Images or empty pages may be present.


Processing PDFs: 100%|██████████| 50/50 [21:24<00:00, 25.69s/file, Current File=ATO-COM-TRAFOS-DEODE-SA-UFV3-R00.pdf]

Saving final FAISS index to: C:\Users\MuriloFarias\Documents\AI\faiss_index.bin
FAISS index saved successfully.
Saving final metadata to: C:\Users\MuriloFarias\Documents\AI\metadata.pkl
Metadata saved successfully.
Total files processed: 50
Files with extractable text: 49
Files without extractable text: 1
Files without extractable text:
- 123-176.pdf





The FAISS index and metadata are saved in: C:\Users\MuriloFarias\Documents\AI
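
The run above printed "Loaded existing metadata." and "Loaded existing FAISS index." and then processed all 50 files, so running the cell again would presumably add the same chunks to the index a second time. One way to avoid that is to skip files whose names already appear in the metadata. The sketch below is only an illustration and is not part of the original notebook: the helper name select_new_files is hypothetical, and it assumes each metadata entry is a dict with a "file" key, as in the cell above.

```python
def select_new_files(pdf_files, metadata):
    """Return the names in pdf_files that have no entry in metadata yet.

    Hypothetical helper: assumes each metadata entry is a dict with a
    "file" key holding the PDF's file name.
    """
    already_indexed = {entry["file"] for entry in metadata}
    return [name for name in pdf_files if name not in already_indexed]


# Small illustration with made-up file names
example_metadata = [
    {"file": "a.pdf", "path": "a.pdf", "chunk": "first chunk."},
    {"file": "a.pdf", "path": "a.pdf", "chunk": "second chunk."},
]
new_files = select_new_files(["a.pdf", "b.pdf"], example_metadata)
print(new_files)
```

Files that yielded no text have no metadata entries, so with this approach they would be retried on every run.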

Step 2: Build a Query System with FAISS

In [11]:
import os
import faiss
import pickle
from sentence_transformers import SentenceTransformer
import numpy as np
from sklearn.preprocessing import normalize

# Paths
save_folder = r"C:\Users\MuriloFarias\Documents\AI"
index_path = os.path.join(save_folder, "faiss_index.bin")
metadata_path = os.path.join(save_folder, "metadata.pkl")

# Load the SentenceTransformer model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Load FAISS index and metadata
index = faiss.read_index(index_path)
with open(metadata_path, "rb") as f:
    metadata = pickle.load(f)

# Normalize the FAISS index for cosine similarity
def normalize_index(index):
    embeddings = []
    for i in range(index.ntotal):
        vec = index.reconstruct(i)
        embeddings.append(vec)
    embeddings = np.array(embeddings)
    embeddings = normalize(embeddings, axis=1)  # Normalize to unit vectors
    new_index = faiss.IndexFlatIP(embeddings.shape[1])  # Cosine similarity
    new_index.add(embeddings)
    return new_index

index = normalize_index(index)

# Function to query FAISS
def query_faiss(question, index, model, metadata):
    # Generate embedding for the question
    question_embedding = model.encode([question], convert_to_tensor=True).cpu().numpy()
    question_embedding = normalize(question_embedding, axis=1)  # Normalize the question vector

    # Search FAISS for the most similar result
    D, I = index.search(np.array(question_embedding), k=1)  # k=1 for top result

    # If a result is found, return metadata
    if I[0][0] != -1:
        matched_metadata = metadata[I[0][0]]
        similarity_score = float(D[0][0])  # Plain float so the result is JSON-serializable
        return {
            "relevant_file": matched_metadata["file"],
            "file_path": matched_metadata["path"],
            "similarity_score": similarity_score
        }
    else:
        return {"error": "No relevant information found."}
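
The cell above relies on the fact that the inner product of two unit-length vectors equals their cosine similarity, which is why both the stored embeddings and the question embedding are normalized before IndexFlatIP is searched. The following is a minimal NumPy-only sketch of that idea, using made-up 3-dimensional vectors instead of real embeddings. The function name top_k_cosine is hypothetical, it returns the k best matches rather than only the first, and it assumes no vector is all zeros.

```python
import numpy as np


def top_k_cosine(query, vectors, k=3):
    """Return (indices, scores) of the k rows of vectors most similar to query.

    Hypothetical helper: every vector is scaled to unit length, so the
    inner product computed below is the cosine similarity. Assumes no
    vector is all zeros (that would divide by zero).
    """
    vectors = np.asarray(vectors, dtype="float32")
    query = np.asarray(query, dtype="float32")
    unit_vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    unit_query = query / np.linalg.norm(query)
    scores = unit_vectors @ unit_query   # Cosine similarity for each row
    order = np.argsort(-scores)[:k]      # Highest score first
    return order, scores[order]


# Made-up vectors standing in for chunk embeddings
example_vectors = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [3.0, 3.0, 0.0]]
indices, scores = top_k_cosine([2.0, 0.0, 0.0], example_vectors, k=2)
print(indices, scores)
```

Because only direction matters after normalization, the length of a stored vector has no effect on its score.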



In [14]:
pip install flask

Collecting flask
  Downloading flask-3.1.0-py3-none-any.whl.metadata (2.7 kB)
Collecting Werkzeug>=3.1 (from flask)
  Downloading werkzeug-3.1.3-py3-none-any.whl.metadata (3.7 kB)
Collecting itsdangerous>=2.2 (from flask)
  Downloading itsdangerous-2.2.0-py3-none-any.whl.metadata (1.9 kB)
Collecting click>=8.1.3 (from flask)
  Downloading click-8.1.8-py3-none-any.whl.metadata (2.3 kB)
Collecting blinker>=1.9 (from flask)
  Downloading blinker-1.9.0-py3-none-any.whl.metadata (1.6 kB)
Downloading flask-3.1.0-py3-none-any.whl (102 kB)
Downloading blinker-1.9.0-py3-none-any.whl (8.5 kB)
Downloading click-8.1.8-py3-none-any.whl (98 kB)


In [15]:
pip install faiss-cpu









Step 3: Create a Web Interface with Streamlit

In [30]:
pip install streamlit

Note: you may need to restart the kernel to use updated packages.
Collecting streamlit
  Downloading streamlit-1.41.1-py2.py3-none-any.whl.metadata (8.5 kB)
Collecting altair<6,>=4.0 (from streamlit)
  Downloading altair-5.5.0-py3-none-any.whl.metadata (11 kB)
Collecting cachetools<6,>=4.0 (from streamlit)
  Downloading cachetools-5.5.0-py3-none-any.whl.metadata (5.3 kB)
Collecting protobuf<6,>=3.20 (from streamlit)
  Downloading protobuf-5.29.3-cp310-abi3-win_amd64.whl.metadata (592 bytes)
Collecting pyarrow>=7.0 (from streamlit)
  Downloading pyarrow-18.1.0-cp311-cp311-win_amd64.whl.metadata (3.4 kB)
Collecting rich<14,>=10.14.0 (from streamlit)
  Downloading rich-13.9.4-py3-none-any.whl.metadata (18 kB)
Collecting tenacity<10,>=8.1.0 (from streamlit)
  Downloading tenacity-9.0.0-py3-none-any.whl.metadata (1.2 kB)
Collecting toml<2,>=0.10.1 (from streamlit)
  Downloading toml-0.10.2-py2.py3-none-any.whl.metadata (7.1 kB)
Collecting watchdog<7,>=2.1.5 (from streamlit)
  Downloading wat




In [31]:
pip show streamlit


Name: streamlit
Version: 1.41.1
Summary: A faster way to build and share data apps
Home-page: https://streamlit.io
Author: Snowflake Inc
Author-email: hello@streamlit.io
License: Apache License 2.0
Location: C:\Users\MuriloFarias\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages
Requires: altair, blinker, cachetools, click, gitpython, numpy, packaging, pandas, pillow, protobuf, pyarrow, pydeck, requests, rich, tenacity, toml, tornado, typing-extensions, watchdog
Required-by: 
Note: you may need to restart the kernel to use updated packages.
