Samples demonstrating how to extract tables from PDF using GroupDocs.Parser for .NET. Shows automatic table detection without templates, working with structured and unstructured business files. Includes code examples, parsing techniques, and best practices.

groupdocs-parser/Pdf-tables-extraction-using-groupdocs-parser

Extract Tables from PDF Documents with C# - GroupDocs.Parser Examples

📖 Overview

This repository demonstrates how to extract tables from PDF documents using GroupDocs.Parser for .NET. Learn how to parse document tables, extract table values, and process structured data from PDF files programmatically with C#.

Key Capabilities:

  • ✅ Extract tables from PDF documents automatically
  • ✅ Parse document structures without templates
  • ✅ Extract table values with cell-level precision
  • ✅ Process tables from specific pages or entire documents
  • ✅ Display extracted tables in formatted console output

Note: This repository currently focuses on table extraction without templates. Template-based extraction examples will be added in future updates.


💼 Business Needs Challenge

Many businesses need to get data out of PDF files that contain tables. Manual data entry is time-consuming, error-prone, and doesn't scale, so organizations need automated ways to parse document structures and extract tables programmatically.

Common Document Processing Requirements

1. Invoice and Financial Document Processing

Businesses receive hundreds of invoices, receipts, and financial statements daily. Typical table extraction tasks for these documents include:

  • Extract document data from invoice line items (product names, quantities, prices)
  • Parse document tables to capture billing addresses, tax calculations, and totals
  • Extract tables containing payment terms, discounts, and shipping information
  • Automatically process vendor invoices for accounts payable automation

Solution: Automatic table extraction removes most manual data entry, shortens processing time, and avoids transcription errors when extracting tables from financial documents. Extracted values should still be validated before they are posted. With programmatic table extraction, businesses can process large volumes of invoices and feed the results directly into accounting systems.

2. Report and Analytics Data Extraction

Organizations generate and receive numerous reports containing analytical data in tabular format:

  • Extract document data from sales reports with product performance metrics
  • Parse document tables containing quarterly financial results and KPIs
  • Extract tables from operational reports with inventory levels, production metrics, and resource utilization
  • Process regulatory compliance reports with structured data requirements

Solution: Automated table extraction enables businesses to extract tables from reports programmatically, transforming static PDF documents into actionable data. This allows for real-time data analysis, automated reporting pipelines, and seamless integration with business intelligence tools. By extracting document data automatically, organizations can make data-driven decisions faster and maintain accurate records.

3. Purchase Orders and Supply Chain Documents

Supply chain operations depend on accurate data extraction from purchase orders, shipping manifests, and inventory reports:

  • Extract document data from purchase orders including SKU numbers, quantities, and unit prices
  • Parse document tables to capture supplier information, delivery dates, and shipping addresses
  • Extract tables containing inventory levels, stock movements, and warehouse locations
  • Process shipping manifests with item lists, tracking numbers, and delivery confirmations

Solution: Table extraction automates the entire supply chain document processing workflow. By extracting tables from purchase orders and shipping documents, businesses can automatically update inventory systems, track shipments, and reconcile orders without manual intervention. This table extraction capability ensures supply chain visibility and reduces processing errors.

Why Automatic Table Extraction Matters

Traditional manual data extraction is inefficient and costly. Table extraction technology solves these challenges by:

  • Eliminating Manual Errors: Automated table extraction ensures consistent, accurate data capture
  • Scaling Operations: Extract tables from hundreds of documents in minutes, not days
  • Reducing Costs: Cut data entry costs by up to 90% with automated table extraction
  • Improving Speed: Parse documents and extract their data in near real time
  • Enabling Integration: Extract tables directly into databases, ERP systems, and analytics platforms

Whether you need to extract data from invoices, parse reports, or extract tables from any other PDF document, automated table extraction turns static documents into structured, actionable data.


✨ Features

Table Extraction Capabilities

  • Automatic Table Detection – No templates required for basic table extraction
  • Page-Specific Extraction – Extract tables from specific pages
  • Full Document Processing – Extract all tables across all pages
  • Structured Output – Formatted table display with headers and values
  • Cell-Level Access – Access individual table cells and their content

What You Can Extract

  • Table headers and data rows
  • Cell values with precise positioning
  • Table dimensions (rows × columns)
  • Multi-page table extraction
  • Tables organized by page

🚀 Getting Started

Prerequisites

  • .NET 6.0 or later (.NET 9.0 recommended)
  • GroupDocs.Parser for .NET NuGet package
  • Valid GroupDocs.Parser license (optional for evaluation)

Installation

Clone the repository:

git clone https://github.com/groupdocs-parser/Pdf-tables-extraction-using-groupdocs-parser.git
cd Pdf-tables-extraction-using-groupdocs-parser

License Setup (optional)

For production use, set your GroupDocs.Parser license:

new License().SetLicense(@"path\to\GroupDocs.Parser.NET.lic");

For evaluation, you can use a temporary license.


💻 Code Examples

Example 1: Extract Tables from a Specific Page

This example demonstrates how to extract tables from a particular page of a PDF document. The method analyzes the document structure and extracts all tables found on the specified page.

Source Document: [image: PDF document page]

Code:

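// Requires: using System; using System.Linq; using GroupDocs.Parser;
static void ExtractTablesPerParticularPage()
{
    string sample = "Invoices.pdf";

    // Initialize parser with PDF document
    using (var parser = new Parser(sample))
    {
        // Get document information so the page index can be validated
        var documentInfo = parser.GetDocumentInfo();
        int pageCount = documentInfo.PageCount;

        // Extract tables from the first page (page indexes are zero-based)
        var pageIndex = 0;
        if (pageIndex >= pageCount)
        {
            Console.WriteLine($"Page index {pageIndex} is out of range; the document has {pageCount} page(s).");
            return;
        }

        var tables = parser.GetTables(pageIndex);

        if (tables != null && tables.Any())
        {
            int tableNumber = 1;
            foreach (var table in tables)
            {
                // Display table dimensions, then the table content
                Console.WriteLine($"Table {tableNumber}: {table.RowCount} rows x {table.ColumnCount} columns");
                ProcessTable(table);
                tableNumber++;
            }
        }
    }
}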

Console Output: [image: console output]

Example 2: Extract All Tables from Document

Extract all tables from all pages of a PDF document, organized by page:

static void ExtractAllTablesFromDocument()
{
    string sample = "TablesReport.pdf";

    using (var parser = new Parser(sample))
    {   
        // Get all tables from entire document
        var tables = parser.GetTables();

        if (tables != null && tables.Any())
        {
            // Group tables by page index
            var tablesByPage = tables
                .GroupBy(table => table.Page.Index)
                .OrderBy(group => group.Key);

            foreach (var pageGroup in tablesByPage)
            {
                int pageIndex = pageGroup.Key;
                Console.WriteLine($"Tables on page {pageIndex + 1}");
                
                int tableNumber = 1;
                foreach (var table in pageGroup)
                {
                    Console.WriteLine($"  Table {tableNumber}: {table.RowCount} rows x {table.ColumnCount} columns");
                    ProcessTable(table);
                    tableNumber++;
                }
            }
        }
    }
}

Example 3: Access Individual Table Cells

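ProcessTable, the helper called in Examples 1 and 2, reads each cell through the table[row, col] indexer and prints the table as a bordered grid, treating the first row as the header. GetCellText returns an empty string for missing cells: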

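// Requires: using System; using System.Linq; using GroupDocs.Parser.Data;
static void ProcessTable(PageTableArea table)
{
    // Nothing to print for an empty table (this also avoids calling Max() on an empty sequence)
    if (table.RowCount == 0 || table.ColumnCount == 0)
    {
        return;
    }

    // Calculate column widths for proper alignment (minimum width of 3)
    int[] columnWidths = Enumerable.Range(0, table.ColumnCount)
        .Select(col => Math.Max(3, Enumerable.Range(0, table.RowCount)
            .Max(row => table[row, col]?.Text?.Length ?? 0)))
        .ToArray();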

    // Display table with borders
    string separator = "+" + string.Join("+", columnWidths.Select(w => new string('-', w + 2))) + "+";
    
    // Display header row (first row)
    Console.WriteLine("    " + separator);
    Console.Write("    |");
    for (int col = 0; col < table.ColumnCount; col++)
    {
        string cellText = GetCellText(table, 0, col);
        Console.Write($" {cellText.PadRight(columnWidths[col])} |");
    }
    Console.WriteLine();
    Console.WriteLine("    " + separator);

    // Display data rows
    for (int row = 1; row < table.RowCount; row++)
    {
        Console.Write("    |");
        for (int col = 0; col < table.ColumnCount; col++)
        {
            string cellText = GetCellText(table, row, col);
            Console.Write($" {cellText.PadRight(columnWidths[col])} |");
        }
        Console.WriteLine();
    }
    Console.WriteLine("    " + separator);
}

static string GetCellText(PageTableArea table, int row, int col)
{
    return table[row, col]?.Text ?? "";
}
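
Cell text read this way does not have to go to the console. The sketch below writes rows of cell text as CSV instead. It is not part of GroupDocs.Parser and not the repository's own code: `TableCsv`, `ToCsv`, and `Escape` are hypothetical names, and the helper works on plain strings so it can be tried without the library. Quoting follows the usual CSV convention of wrapping fields that contain commas, quotes, or line breaks.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper for illustration; not part of GroupDocs.Parser.
public static class TableCsv
{
    // Quote a field only when it contains a comma, a double quote, CR, or LF.
    // Embedded double quotes are doubled, as in common CSV practice.
    public static string Escape(string field)
    {
        field ??= "";
        bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        return needsQuotes ? "\"" + field.Replace("\"", "\"\"") + "\"" : field;
    }

    // Join the cells of each row with commas and end every row with CRLF.
    public static string ToCsv(IEnumerable<IEnumerable<string>> rows)
    {
        var sb = new StringBuilder();
        foreach (var row in rows)
        {
            sb.Append(string.Join(",", row.Select(cell => Escape(cell))));
            sb.Append("\r\n");
        }
        return sb.ToString();
    }
}
```

To feed it from an extracted table, one option is to build the rows with the same indexer that GetCellText uses, for example `Enumerable.Range(0, table.RowCount).Select(r => Enumerable.Range(0, table.ColumnCount).Select(c => table[r, c]?.Text ?? ""))`, and pass the result to `TableCsv.ToCsv`.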

🎯 Use Cases

Business Document Processing

  • Invoice Processing – Extract line items, totals, and payment information
  • Financial Reports – Parse balance sheets, income statements, and financial tables
  • Purchase Orders – Extract product details, quantities, and pricing
  • Receipts – Extract itemized lists and transaction details

Data Migration & Integration

  • Database Import – Convert PDF tables to database records
  • Excel Conversion – Extract tables for spreadsheet processing
  • API Integration – Parse document tables for REST API consumption
  • ETL Pipelines – Extract, transform, and load table data
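
As a rough illustration of the database import case, the sketch below turns a grid of cell text into one record per data row, keyed by the header row. It assumes the first row holds the column names (the same assumption ProcessTable makes when printing). `TableRecords` and `ToRecords` are hypothetical names, not GroupDocs.Parser API; the input is a plain `string[,]` that a caller would fill from `table[row, col]?.Text`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of GroupDocs.Parser.
public static class TableRecords
{
    // Converts a cell grid into records keyed by the (assumed) header row.
    public static List<Dictionary<string, string>> ToRecords(string[,] cells)
    {
        int rows = cells.GetLength(0);
        int cols = cells.GetLength(1);
        var records = new List<Dictionary<string, string>>();
        if (rows == 0) return records;

        // Build unique column names: blank headers become "ColumnN",
        // repeated headers get a numeric suffix ("Qty", "Qty_2", ...).
        var names = new string[cols];
        var seen = new HashSet<string>();
        for (int c = 0; c < cols; c++)
        {
            string name = (cells[0, c] ?? "").Trim();
            if (name.Length == 0) name = $"Column{c + 1}";
            string unique = name;
            for (int n = 2; !seen.Add(unique); n++) unique = $"{name}_{n}";
            names[c] = unique;
        }

        // Every row after the header becomes one record; missing cells map to "".
        for (int r = 1; r < rows; r++)
        {
            var record = new Dictionary<string, string>();
            for (int c = 0; c < cols; c++) record[names[c]] = cells[r, c] ?? "";
            records.Add(record);
        }
        return records;
    }
}
```

Each dictionary can then be mapped to an insert statement or an ORM entity. Whether the first row really is a header depends on the document, so this is worth checking per table.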

Document Analysis

  • Report Analysis – Extract structured data from business reports
  • Compliance Documents – Parse regulatory tables and forms
  • Research Data – Extract tables from research papers and publications
  • Legal Documents – Parse tables from contracts and legal filings

Automation & Workflows

  • Automated Data Entry – Reduce manual data entry from PDFs
  • Batch Processing – Process multiple PDF documents automatically
  • Content Indexing – Extract tables for search engine indexing
  • Data Validation – Verify table data against business rules
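
A business rule for the data validation case might look like the following sketch, which flags invoice line items whose total does not equal quantity times unit price. Everything here is an assumption made for illustration: `LineItemValidator` and `FindInvalidRows` are hypothetical names, the column positions are supplied by the caller, the header row is expected to have been removed already, and values are expected to be plain numbers in invariant-culture format (for example `1,200.50`, without currency symbols).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper for illustration; not part of GroupDocs.Parser.
public static class LineItemValidator
{
    // Returns the indexes of data rows that fail the rule
    // quantity * unit price == total, or that contain unparsable values.
    public static List<int> FindInvalidRows(
        IReadOnlyList<string[]> rows, int qtyCol, int priceCol, int totalCol)
    {
        var invalid = new List<int>();
        for (int i = 0; i < rows.Count; i++)
        {
            string[] row = rows[i];
            bool ok = TryParse(row, qtyCol, out decimal qty)
                   && TryParse(row, priceCol, out decimal price)
                   && TryParse(row, totalCol, out decimal total)
                   && qty * price == total;
            if (!ok) invalid.Add(i);
        }
        return invalid;
    }

    // Parses one cell as a decimal; fails on a missing column or non-numeric text.
    private static bool TryParse(string[] row, int col, out decimal value)
    {
        value = 0m;
        return col >= 0 && col < row.Length
            && decimal.TryParse(row[col], NumberStyles.Number,
                                CultureInfo.InvariantCulture, out value);
    }
}
```

Using `decimal` rather than `double` keeps the comparison exact for typical currency amounts. Real invoices often need a rounding tolerance and locale-aware parsing, which this sketch leaves out.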

📄 Supported Formats

Document Formats

Format                 Extension           Table Extraction
PDF                    .pdf                ✅ Supported
Microsoft Word         .doc, .docx         ✅ Supported
Microsoft Excel        .xls, .xlsx         ✅ Supported
Microsoft PowerPoint   .ppt, .pptx         ✅ Supported
OpenDocument           .odt, .ods, .odp    ✅ Supported

Note: This repository focuses on PDF table extraction. Other formats are supported by GroupDocs.Parser but not demonstrated in these examples.


🔧 Project Structure

Pdf-tables-extraction-using-groupdocs-parser/
│
├── Program.cs                 # Main code examples
├── README.md                  # This file
├── LICENSE                    # License file
│
├── Invoices.pdf              # Sample PDF document
├── TablesReport.pdf          # Sample PDF with tables
├── Operations.pdf            # Sample PDF document
│
├── document-page-01.png      # Document preview image
├── console-output-01.png     # Console output example
│
└── bin/                      # Build output directory

๐Ÿค Contributing

Contributions are welcome! If you'd like to contribute:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Contribution Ideas

  • Add more table extraction examples
  • Improve error handling
  • Add template-based extraction examples
  • Enhance documentation
  • Add unit tests

🔮 Roadmap

Current Features ✅

  • Extract tables from PDF documents
  • Page-specific table extraction
  • Full document table extraction
  • Formatted table display

Coming Soon 🚀

  • Template-based table extraction examples
  • OCR support for scanned PDFs
  • Batch processing multiple documents
  • Export to CSV/Excel formats
  • Advanced table formatting options

📊 Keywords & SEO

Primary Keywords:

  • extract table from PDF
  • parse document tables
  • extracting table values
  • PDF table extraction C#
  • GroupDocs.Parser examples
  • table parsing .NET
  • extract tables from PDF documents
  • parse PDF tables programmatically
  • C# PDF table extraction
  • document table parser

Related Terms:

  • PDF parser, table extraction, document parsing, data extraction, PDF processing, C# PDF library, .NET PDF parser, table data extraction, structured data extraction, PDF table reader

โญ Star History

If you find this repository helpful, please consider giving it a star! โญ


Made with โค๏ธ by GroupDocs
