# Documentation

## Overview
This document describes a Python script that extracts text from PDF files and saves the extracted text into .txt files.

### Script Functionality
- Reads PDF files from the `pdf` folder.
- Extracts text using the PyMuPDF library (`fitz`).
- Saves the extracted text into a separate .txt file for each PDF in the `txt` folder.

## Dependencies
- Python 3
- `os` (part of the Python standard library)
- PyMuPDF, imported as `fitz` (install with `pip install pymupdf`)
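
A quick way to confirm that the modules listed above are importable before running the script is sketched below. The helper name `missing_dependencies` is hypothetical and not part of the original script; it relies only on the standard library's `importlib.util.find_spec`.

```python
import importlib.util

def missing_dependencies(module_names):
    """Return the names from module_names that cannot be found for import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]

# "fitz" is the import name that PyMuPDF provides; "os" ships with Python.
missing = missing_dependencies(["os", "fitz"])
if missing:
    print("Missing modules:", ", ".join(missing))
```

If `fitz` is reported as missing, installing PyMuPDF should resolve it.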

## Code Explanation

### Function Descriptions

1. **extract_text_from_pdf(pdf_path)**
   - Extracts text from a PDF file using PyMuPDF.
   - Iterates through each page to collect text.
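   - Catches any exception raised during extraction, prints an error message, and returns whatever text was collected so far (an empty string if the file cannot be opened).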

2. **save_text_to_file(text, txt_path)**
   - Saves the extracted text into a .txt file.
   - Writes the file with UTF-8 encoding so that non-ASCII characters are preserved, overwriting any existing file at that path.

3. **convert_all_pdfs_to_txt(pdf_folder, output_folder)**
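   - Creates the output folder if it doesn’t exist.
   - Iterates through every file in the input folder whose name ends in `.pdf` (case-insensitive).
   - Calls `extract_text_from_pdf()` for each PDF.
   - Saves non-empty text with `save_text_to_file()`, using the PDF's base name with a `.txt` extension.
   - Prints a message and writes no file when a PDF yields no text.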

4. **Main Script Logic**
   - Calls `convert_all_pdfs_to_txt()` with `pdf` as the input folder and `txt` as the output folder.
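
The filename handling described for `convert_all_pdfs_to_txt()` can be sketched in isolation as follows. The helper `txt_path_for` is hypothetical and does not appear in the script; it only illustrates how a PDF filename maps to its output path and how non-PDF files are skipped.

```python
import os

def txt_path_for(filename, output_folder):
    """Map a PDF filename to its .txt output path, or None for non-PDF files."""
    # Case-insensitive check, so "Report.PDF" is treated like "report.pdf".
    if not filename.lower().endswith(".pdf"):
        return None
    # splitext drops only the final extension and keeps the rest of the name.
    base_name = os.path.splitext(filename)[0]
    return os.path.join(output_folder, base_name + ".txt")

print(txt_path_for("Report.PDF", "txt"))
print(txt_path_for("notes.docx", "txt"))
```

Because only the final extension is removed, spaces, dots, and other punctuation in the original filename carry over to the .txt name.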

### Brief Explanation
The script extracts the text of every PDF in the `pdf` folder with PyMuPDF and writes it to a .txt file of the same base name in the `txt` folder. PDFs that yield no text, such as scanned documents without a text layer, are reported and skipped.


# Code

import os
import fitz  # PyMuPDF

def extract_text_from_pdf(pdf_path):
    """
    Extracts text from a PDF file using PyMuPDF and returns it as a string.
    
    Parameters:
    - pdf_path (str): The path to the PDF file.
    
    Returns:
    - str: The extracted text.
    """
    text = ""
    try:
        # The context manager closes the document even if a page fails.
        with fitz.open(pdf_path) as doc:
            for page in doc:
                text += page.get_text()
    except Exception as e:
        print(f"Error extracting text from {pdf_path}: {e}")
    return text

def save_text_to_file(text, txt_path):
    """
    Saves the provided text into a .txt file.
    
    Parameters:
    - text (str): The text to save.
    - txt_path (str): The file path where the text will be saved.
    """
    with open(txt_path, "w", encoding="utf-8") as f:
        f.write(text)

def convert_all_pdfs_to_txt(pdf_folder, output_folder):
    """
    Iterates over all PDF files in the specified folder, extracts text from each,
    and saves the text into separate .txt files in the output folder.
    
    The text file will have the same base name as the original PDF.
    
    Parameters:
    - pdf_folder (str): The folder containing PDF files.
    - output_folder (str): The folder where the .txt files will be saved.
    """
    os.makedirs(output_folder, exist_ok=True)
    for filename in os.listdir(pdf_folder):
        if filename.lower().endswith(".pdf"):
            pdf_path = os.path.join(pdf_folder, filename)
            print(f"Processing {pdf_path} ...")
            text = extract_text_from_pdf(pdf_path)
            if text:
                base_name = os.path.splitext(filename)[0]
                txt_filename = base_name + ".txt"
                txt_path = os.path.join(output_folder, txt_filename)
                save_text_to_file(text, txt_path)
                print(f"Extracted text saved to {txt_path}")
            else:
                print(f"No text extracted from {pdf_path}")

if __name__ == "__main__":
    convert_all_pdfs_to_txt("pdf", "txt")


Processing pdf\KPMG_2025-02-06_KPMG global tech report 2024.pdf ...
Extracted text saved to txt\KPMG_2025-02-06_KPMG global tech report 2024.txt
Processing pdf\KPMG_2025-02-07_KPMG global tech report energy insights.pdf ...
Extracted text saved to txt\KPMG_2025-02-07_KPMG global tech report energy insights.txt
Processing pdf\KPMG_2025-02-07_KPMG global tech report Technology insights.pdf ...
Extracted text saved to txt\KPMG_2025-02-07_KPMG global tech report Technology insights.txt
Processing pdf\KPMG_2025-02-07_KPMG global tech report – industrial manufacturing insights.pdf ...
Extracted text saved to txt\KPMG_2025-02-07_KPMG global tech report – industrial manufacturing insights.txt
Processing pdf\KPMG_2025-02-20_Food and Nutritional Security in India.pdf ...
Extracted text saved to txt\KPMG_2025-02-20_Food and Nutritional Security in India.txt
Processing pdf\KPMG_2025-02-28_Issue no. 103  February 2025.pdf ...
Extracted text saved to txt\KPMG_2025-02-28_Issue no. 103  February 2025.
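
Since the script writes no file for a PDF that yields no text, it can be useful to check afterwards which PDFs have no matching .txt output. The following is a minimal sketch under that assumption; the helper `find_unconverted` is hypothetical and not part of the original script.

```python
import os

def find_unconverted(pdf_folder, txt_folder):
    """Return PDF filenames in pdf_folder that have no matching .txt file."""
    missing = []
    for filename in sorted(os.listdir(pdf_folder)):
        if not filename.lower().endswith(".pdf"):
            continue  # ignore anything that is not a PDF
        base_name = os.path.splitext(filename)[0]
        txt_path = os.path.join(txt_folder, base_name + ".txt")
        if not os.path.isfile(txt_path):
            missing.append(filename)
    return missing
```

Calling `find_unconverted("pdf", "txt")` after a run would list the PDFs that were skipped or failed, for example scanned documents that need OCR before any text can be extracted.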