- Overview
- Business Needs Challenge
- Features
- Getting Started
- Code Examples
- Use Cases
- Supported Formats
- Resources
This repository demonstrates how to extract tables from PDF documents using GroupDocs.Parser for .NET. Learn how to parse document tables, extract table values, and process structured data from PDF files programmatically with C#.
Key Capabilities:
- ✅ Extract tables from PDF documents automatically
- ✅ Parse document structures without templates
- ✅ Extract table values with cell-level precision
- ✅ Process tables from specific pages or entire documents
- ✅ Display extracted tables in formatted console output
Note: This repository currently focuses on table extraction without templates. Template-based extraction examples will be added in future updates.
Modern businesses face the critical challenge of efficiently extracting document data from PDF files containing structured tables. Manual data entry is time-consuming, error-prone, and doesn't scale. Organizations need automated solutions to parse document structures and extract tables programmatically to access critical business information.
1. Invoice and Financial Document Processing
Businesses receive hundreds of invoices, receipts, and financial statements daily. Each of these documents holds tabular data that must be captured:
- Extract document data from invoice line items (product names, quantities, prices)
- Parse document tables to capture billing addresses, tax calculations, and totals
- Extract tables containing payment terms, discounts, and shipping information
- Automatically process vendor invoices for accounts payable automation
Solution: Automatic table extraction eliminates manual data entry, reduces processing time from hours to seconds, and greatly reduces transcription errors when capturing tables from financial documents. With programmatic table extraction, businesses can process thousands of invoices daily, extract all relevant data, and feed it directly into accounting systems.
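Extracted cells arrive as text, so checking line items against an invoice total means parsing numbers first. The following is a minimal sketch, not part of GroupDocs.Parser, assuming the line-item table has already been copied into a `string[,]` grid (for example from `table[row, col]?.Text`), that row 0 is the header, and that amounts use US-style formatting. `InvoiceCheck`, `TryParseAmount`, and `LineItemsMatchTotal` are hypothetical names.

```csharp
using System.Globalization;

public static class InvoiceCheck
{
    // Parses cell text such as "$1,234.50" or "34.5" into a decimal.
    // Assumption: comma thousands separator, dot decimal separator, optional "$".
    // Accounting-style negatives like "(12.00)" are not handled.
    public static bool TryParseAmount(string cellText, out decimal amount)
    {
        string cleaned = (cellText ?? "").Replace("$", "").Trim();
        return decimal.TryParse(
            cleaned,
            NumberStyles.AllowThousands | NumberStyles.AllowDecimalPoint | NumberStyles.AllowLeadingSign,
            CultureInfo.InvariantCulture,
            out amount);
    }

    // Sums the amount column of every data row (row 0 is the header)
    // and compares the result with the stated invoice total.
    public static bool LineItemsMatchTotal(string[,] grid, int amountColumn, decimal statedTotal)
    {
        decimal sum = 0m;
        for (int row = 1; row < grid.GetLength(0); row++)
        {
            if (!TryParseAmount(grid[row, amountColumn], out decimal amount))
                return false; // unparseable cell: treat the invoice as needing manual review
            sum += amount;
        }
        return sum == statedTotal;
    }
}
```

Invoices that fail the check can be routed to a person instead of being posted automatically.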
2. Report and Analytics Data Extraction
Organizations generate and receive numerous reports containing analytical data in tabular format:
- Extract document data from sales reports with product performance metrics
- Parse document tables containing quarterly financial results and KPIs
- Extract tables from operational reports with inventory levels, production metrics, and resource utilization
- Process regulatory compliance reports with structured data requirements
Solution: Automated table extraction enables businesses to extract tables from reports programmatically, transforming static PDF documents into actionable data. This allows for real-time data analysis, automated reporting pipelines, and seamless integration with business intelligence tools. By extracting document data automatically, organizations can make data-driven decisions faster and maintain accurate records.
3. Purchase Orders and Supply Chain Documents
Supply chain operations depend on accurate data extraction from purchase orders, shipping manifests, and inventory reports:
- Extract document data from purchase orders including SKU numbers, quantities, and unit prices
- Parse document tables to capture supplier information, delivery dates, and shipping addresses
- Extract tables containing inventory levels, stock movements, and warehouse locations
- Process shipping manifests with item lists, tracking numbers, and delivery confirmations
Solution: Table extraction automates supply chain document processing. By extracting tables from purchase orders and shipping documents, businesses can automatically update inventory systems, track shipments, and reconcile orders without manual intervention. This improves supply chain visibility and reduces processing errors.
Traditional manual data extraction is inefficient and costly. Table extraction technology solves these challenges by:
- Eliminating Manual Errors: Automated table extraction ensures consistent, accurate data capture
- Scaling Operations: Extract tables from hundreds of documents in minutes, not days
- Reducing Costs: Lower data entry costs by replacing manual keying with automated table extraction
- Improving Speed: Parse documents and extract their data as soon as they arrive
- Enabling Integration: Extract tables directly into databases, ERP systems, and analytics platforms
Whether you need to extract document data from invoices, parse document reports, or extract tables from any PDF document, automated table extraction provides the solution to transform unstructured documents into structured, actionable data.
- Automatic Table Detection: No templates required for basic table extraction
- Page-Specific Extraction: Extract tables from specific pages
- Full Document Processing: Extract all tables across all pages
- Structured Output: Formatted table display with headers and values
- Cell-Level Access: Access individual table cells and their content
- Table headers and data rows
- Cell values with precise positioning
- Table dimensions (rows × columns)
- Multi-page table extraction
- Tables organized by page
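Cell-level access makes it straightforward to turn a table's data rows into records keyed by header. The `ProcessTable` example in this repository treats the first row as the header; this minimal sketch makes the same assumption and works on a `string[,]` grid filled beforehand from `table[row, col]?.Text`. `TableRecords.ToRecords` is a hypothetical helper, not a GroupDocs.Parser API.

```csharp
using System.Collections.Generic;

public static class TableRecords
{
    // Converts a grid whose row 0 holds column headers into one dictionary per data row.
    public static List<Dictionary<string, string>> ToRecords(string[,] grid)
    {
        int rows = grid.GetLength(0);
        int cols = grid.GetLength(1);
        var records = new List<Dictionary<string, string>>();
        if (rows == 0)
            return records;

        // Build unique keys: empty headers become "ColumnN",
        // repeated headers get a "_2", "_3", ... suffix
        var headers = new string[cols];
        var seen = new HashSet<string>();
        for (int col = 0; col < cols; col++)
        {
            string header = (grid[0, col] ?? "").Trim();
            string key = header.Length == 0 ? $"Column{col + 1}" : header;
            string unique = key;
            int suffix = 2;
            while (!seen.Add(unique))
                unique = $"{key}_{suffix++}";
            headers[col] = unique;
        }

        // Every remaining row becomes a record; null cells map to ""
        for (int row = 1; row < rows; row++)
        {
            var record = new Dictionary<string, string>();
            for (int col = 0; col < cols; col++)
                record[headers[col]] = grid[row, col] ?? "";
            records.Add(record);
        }
        return records;
    }
}
```

Keying by header rather than by column index keeps downstream code readable when the column order changes between documents.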
- .NET 6.0 or later (.NET 9.0 recommended)
- GroupDocs.Parser for .NET NuGet package
- Valid GroupDocs.Parser license (optional for evaluation)
Clone the repository:

```bash
git clone https://github.com/groupdocs-parser/Pdf-tables-extraction-using-groupdocs-parser.git
cd Pdf-tables-extraction-using-groupdocs-parser
```

For production use, set your GroupDocs.Parser license:

```csharp
using GroupDocs.Parser;

// Apply the license once, at application startup
new License().SetLicense(@"path\to\GroupDocs.Parser.NET.lic");
```

For evaluation, you can use a temporary license.
This example demonstrates how to extract tables from a particular page of a PDF document. The method analyzes the document structure and extracts all tables found on the specified page.
Code:
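```csharp
using System;
using System.Linq;
using GroupDocs.Parser;

static void ExtractTablesPerParticularPage()
{
    string sample = "Invoices.pdf";

    // Initialize the parser with the PDF document
    using (var parser = new Parser(sample))
    {
        // Get document information (used here to validate the page index)
        var documentInfo = parser.GetDocumentInfo();
        int pageCount = documentInfo.PageCount;

        // Extract tables from the first page (page indexes are zero-based)
        int pageIndex = 0;
        if (pageIndex >= pageCount)
        {
            Console.WriteLine($"The document has only {pageCount} page(s).");
            return;
        }

        // GetTables may return null, e.g. when table extraction isn't supported
        var tables = parser.GetTables(pageIndex);
        if (tables != null && tables.Any())
        {
            int tableNumber = 1;
            foreach (var table in tables)
            {
                // Display the table dimensions, then its content
                Console.WriteLine($"  Table {tableNumber}: {table.RowCount} rows x {table.ColumnCount} columns");
                ProcessTable(table);
                tableNumber++;
            }
        }
    }
}
```

Extract all tables from all pages of a PDF document, organized by page: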
```csharp
using System;
using System.Linq;
using GroupDocs.Parser;

static void ExtractAllTablesFromDocument()
{
    string sample = "TablesReport.pdf";

    using (var parser = new Parser(sample))
    {
        // Get all tables from the entire document
        var tables = parser.GetTables();

        if (tables != null && tables.Any())
        {
            // Group tables by their zero-based page index and process pages in order
            var tablesByPage = tables
                .GroupBy(table => table.Page.Index)
                .OrderBy(group => group.Key);

            foreach (var pageGroup in tablesByPage)
            {
                int pageIndex = pageGroup.Key;
                // Show 1-based page numbers for readability
                Console.WriteLine($"Tables on page {pageIndex + 1}");

                int tableNumber = 1;
                foreach (var table in pageGroup)
                {
                    Console.WriteLine($"  Table {tableNumber}: {table.RowCount} rows x {table.ColumnCount} columns");
                    ProcessTable(table);
                    tableNumber++;
                }
            }
        }
    }
}
```

Access and process individual cells from extracted tables:
```csharp
using System;
using System.Linq;
using GroupDocs.Parser.Data;

static void ProcessTable(PageTableArea table)
{
    // Nothing to display for an empty table (and Max() would throw on an empty sequence)
    if (table.RowCount == 0 || table.ColumnCount == 0)
        return;

    // Column width = longest cell text in that column (minimum 3 characters)
    int[] columnWidths = Enumerable.Range(0, table.ColumnCount)
        .Select(col => Math.Max(3, Enumerable.Range(0, table.RowCount)
            .Max(row => GetCellText(table, row, col).Length)))
        .ToArray();

    // Horizontal border, e.g. +-----+--------+
    string separator = "+" + string.Join("+", columnWidths.Select(w => new string('-', w + 2))) + "+";

    // Display the header row (first row)
    Console.WriteLine(" " + separator);
    PrintRow(table, 0, columnWidths);
    Console.WriteLine(" " + separator);

    // Display the data rows
    for (int row = 1; row < table.RowCount; row++)
    {
        PrintRow(table, row, columnWidths);
    }
    Console.WriteLine(" " + separator);
}

static void PrintRow(PageTableArea table, int row, int[] columnWidths)
{
    Console.Write(" |");
    for (int col = 0; col < table.ColumnCount; col++)
    {
        string cellText = GetCellText(table, row, col);
        Console.Write($" {cellText.PadRight(columnWidths[col])} |");
    }
    Console.WriteLine();
}

static string GetCellText(PageTableArea table, int row, int col)
{
    // The indexer may return null; line breaks are flattened so rows stay aligned
    return table[row, col]?.Text?.Replace("\r", " ").Replace("\n", " ") ?? "";
}
```

- Invoice Processing: Extract line items, totals, and payment information
- Financial Reports: Parse balance sheets, income statements, and financial tables
- Purchase Orders: Extract product details, quantities, and pricing
- Receipts: Extract itemized lists and transaction details
- Database Import: Convert PDF tables to database records
- Excel Conversion: Extract tables for spreadsheet processing
- API Integration: Parse document tables for REST API consumption
- ETL Pipelines: Extract, transform, and load table data
- Report Analysis: Extract structured data from business reports
- Compliance Documents: Parse regulatory tables and forms
- Research Data: Extract tables from research papers and publications
- Legal Documents: Parse tables from contracts and legal filings
- Automated Data Entry: Reduce manual data entry from PDFs
- Batch Processing: Process multiple PDF documents automatically
- Content Indexing: Extract tables for search engine indexing
- Data Validation: Verify table data against business rules
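Several of these use cases (database import, Excel conversion, ETL pipelines) start by writing table cells to CSV. This is a minimal sketch, assuming the cells are already in a `string[,]` grid filled from `table[row, col]?.Text`; `CsvExport.ToCsv` is a hypothetical helper, not a GroupDocs.Parser API. It quotes fields following RFC 4180: a field containing a comma, double quote, or line break is wrapped in double quotes, and embedded quotes are doubled.

```csharp
using System.Text;

public static class CsvExport
{
    // Serializes a grid of cell texts to CSV with CRLF line endings (RFC 4180).
    public static string ToCsv(string[,] grid)
    {
        var sb = new StringBuilder();
        for (int row = 0; row < grid.GetLength(0); row++)
        {
            for (int col = 0; col < grid.GetLength(1); col++)
            {
                if (col > 0) sb.Append(',');
                sb.Append(Escape(grid[row, col] ?? ""));
            }
            sb.Append("\r\n");
        }
        return sb.ToString();
    }

    // Quotes a field only when it contains a delimiter, a quote, or a line break.
    private static string Escape(string field)
    {
        bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        return needsQuotes ? "\"" + field.Replace("\"", "\"\"") + "\"" : field;
    }
}
```

The result can be saved with `File.WriteAllText("table.csv", CsvExport.ToCsv(grid))` and opened in Excel or loaded by most database import tools.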
| Format | Extension | Table Extraction |
|---|---|---|
| PDF | .pdf | ✅ Supported |
| Microsoft Word | .doc, .docx | ✅ Supported |
| Microsoft Excel | .xls, .xlsx | ✅ Supported |
| Microsoft PowerPoint | .ppt, .pptx | ✅ Supported |
| OpenDocument | .odt, .ods, .odp | ✅ Supported |
Note: This repository focuses on PDF table extraction. Other formats are supported by GroupDocs.Parser but not demonstrated in these examples.
```
Pdf-tables-extraction-using-groupdocs-parser/
│
├── Program.cs              # Main code examples
├── README.md               # This file
├── LICENSE                 # License file
│
├── Invoices.pdf            # Sample PDF document
├── TablesReport.pdf        # Sample PDF with tables
├── Operations.pdf          # Sample PDF document
│
├── document-page-01.png    # Document preview image
├── console-output-01.png   # Console output example
│
└── bin/                    # Build output directory
```
- GroupDocs.Parser for .NET Documentation
- API Reference
- Working with Tables Guide
- Code Samples
- Free Support Forum
- Paid Support Helpdesk
- Blog Articles
- Video Tutorials
- Product Page
- Live Demos
- Get Temporary License
- Pricing Information
Contributions are welcome! If you'd like to contribute:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
Ideas for contributions:
- Add more table extraction examples
- Improve error handling
- Add template-based extraction examples
- Enhance documentation
- Add unit tests
Implemented:
- Extract tables from PDF documents
- Page-specific table extraction
- Full document table extraction
- Formatted table display
Planned:
- Template-based table extraction examples
- OCR support for scanned PDFs
- Batch processing multiple documents
- Export to CSV/Excel formats
- Advanced table formatting options
Primary Keywords:
- extract table from PDF
- parse document tables
- extracting table values
- PDF table extraction C#
- GroupDocs.Parser examples
- table parsing .NET
- extract tables from PDF documents
- parse PDF tables programmatically
- C# PDF table extraction
- document table parser
Related Terms:
- PDF parser, table extraction, document parsing, data extraction, PDF processing, C# PDF library, .NET PDF parser, table data extraction, structured data extraction, PDF table reader
If you find this repository helpful, please consider giving it a star! ⭐
Made with ❤️ by GroupDocs

