PDF-To-Excel Converter

This repository contains a Go application (APP.go) that extracts text from a PDF file and writes it into an Excel spreadsheet. It uses the go-fitz library for PDF parsing and excelize for writing the Excel file.

Requirements

Go 1.11+
Libraries:
- github.com/gen2brain/go-fitz
- github.com/360EntSecGroup-Skylar/excelize

Installation

Make sure you have Go installed and set up properly.

Clone this repository:
```
git clone https://github.com/samyam81/PDF-To-Excel
```

Install dependencies:
```
go mod tidy
```

Usage

The application expects two command-line flags, both required:

- pdf_path: Path to the input PDF file.
- output_path: Path to the output Excel file.

Example usage:
```
go run APP.go -pdf_path input.pdf -output_path output.xlsx
```
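
The two flags can be read with Go's standard flag package. The following is a minimal sketch rather than the code in APP.go: the options type, the parseArgs helper, and the decision to reject empty values are assumptions made for illustration.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// options holds the two paths the converter needs (hypothetical type).
type options struct {
	pdfPath    string
	outputPath string
}

// parseArgs reads -pdf_path and -output_path from args.
// Treating an empty value as an error is an assumption of this sketch.
func parseArgs(args []string) (options, error) {
	var opts options
	fs := flag.NewFlagSet("pdf-to-excel", flag.ContinueOnError)
	fs.StringVar(&opts.pdfPath, "pdf_path", "", "path to the input PDF file")
	fs.StringVar(&opts.outputPath, "output_path", "", "path to the output Excel file")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	if opts.pdfPath == "" || opts.outputPath == "" {
		return options{}, errors.New("both -pdf_path and -output_path are required")
	}
	return opts, nil
}

func main() {
	// A real program would pass os.Args[1:]; fixed arguments keep the sketch reproducible.
	opts, err := parseArgs([]string{"-pdf_path", "input.pdf", "-output_path", "output.xlsx"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("would convert %s to %s\n", opts.pdfPath, opts.outputPath)
}
```

Using a dedicated FlagSet instead of the package-level flags keeps the parsing step easy to call from tests.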

How It Works

Argument Parsing: Command-line arguments (pdf_path and output_path) are parsed using the flag package.
Excel Initialization: An Excel workbook is initialized using excelize.NewFile().
PDF Processing:
- The input PDF file is opened and read using go-fitz.
- Text blocks are extracted from each page of the PDF using page.TextBlocks().
- Newline characters are stripped from each text block.
Excel Writing:
- Each cleaned text block is written to column A of the Excel sheet (Sheet1), one row per block, with the row number incremented after every block.
Saving: The resulting Excel file is saved to the specified output path using xlsx.SaveAs(outputPath).
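
The cleaning and row-assignment steps above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in APP.go: the helper names (cleanBlock, cellFor, layoutBlocks) and the cell type are hypothetical, newlines are replaced with spaces rather than deleted, and row numbering is assumed to start at 1 and continue across pages. The PDF and Excel calls are left out so the example runs without external dependencies.

```go
package main

import (
	"fmt"
	"strings"
)

// cell pairs a worksheet cell reference with the text to store in it.
type cell struct {
	Ref   string
	Value string
}

// cleanBlock removes newline characters from one extracted text block.
// Replacing them with spaces (instead of deleting them) is an assumption.
func cleanBlock(block string) string {
	block = strings.ReplaceAll(block, "\r\n", " ")
	block = strings.ReplaceAll(block, "\n", " ")
	block = strings.ReplaceAll(block, "\r", " ")
	return strings.TrimSpace(block)
}

// cellFor returns the column-A cell reference for a 1-based row number.
func cellFor(row int) string {
	return fmt.Sprintf("A%d", row)
}

// layoutBlocks assigns every block, page by page, to the next row in column A.
func layoutBlocks(pages [][]string) []cell {
	var cells []cell
	row := 1
	for _, blocks := range pages {
		for _, b := range blocks {
			cells = append(cells, cell{Ref: cellFor(row), Value: cleanBlock(b)})
			row++
		}
	}
	return cells
}

func main() {
	// Sample data standing in for the text blocks extracted from a two-page PDF.
	pages := [][]string{
		{"Invoice\nNo. 42", "Total: 10"},
		{"Page two\r\nblock"},
	}
	for _, c := range layoutBlocks(pages) {
		// In the real application, this is where the value would be
		// written to Sheet1 at c.Ref through excelize.
		fmt.Printf("%s: %s\n", c.Ref, c.Value)
	}
}
```

Separating layout from I/O means the row logic can be checked without a PDF file or an Excel workbook on disk.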

Notes

Text blocks are written as extracted, without additional grouping by vertical position (the groupMapsByRange function has been removed).
Output quality depends on the structure of the input PDF: each extracted text block becomes one row, so the PDF should yield text blocks that are meaningful as individual rows.

Author

This project was developed by Samyam.

Explanation of Changes:

How It Works: Updated to reflect the direct extraction of text blocks from each page of the PDF using page.TextBlocks() method.
Excel Writing: Clarified that text blocks are written into Excel starting from column A and incrementing the row for each block.
Notes: Removed the section about vertical position grouping (groupMapsByRange function) as it was not utilized in the revised code.
