GroupDocs.Parser for .NET is a document parser API that searches and extracts both formatted and raw text from documents in more than 50 supported file formats.
Directory | Description |
---|---|
Demos | Source code for live demos hosted at https://products.groupdocs.app/parser/family. |
Examples | C# examples and sample files that will help you learn how to use product features. |
- Parse documents by user-defined templates.
- Extract plain and structured text.
- Extract text areas with coordinates, text styles and other information.
- Search text by a keyword or regular expression; extract text around that word.
- Extract HTML or Markdown (MD) formatted text for a fast preview.
- Increase performance by extracting raw text.
- Extract formatted text, metadata, images, containers, and attachments.
- Extract table of contents for some supported document formats.
- Parse form data from PDF documents.
Word Processing: DOC, DOT, DOCX, DOCM, DOTX, DOTM, ODT, OTT, RTF, TXT
Spreadsheet: XLS, XLT, XLSX, XLSM, XLSB, XLTX, XLTM, ODS, OTS, CSV, XLA, XLAM, NUMBERS
Presentation: PPT, PPS, POT, PPTX, PPTM, POTX, POTM, PPSX, PPSM, ODP, OTP
Portable: PDF
Word Processing: DOC, DOT, DOCX, DOCM, DOTX, DOTM, ODT, OTT, RTF, TXT
Spreadsheet: XLS, XLT, XLSX, XLSM, XLSB, XLTX, XLTM, ODS, OTS, CSV, XLA, XLAM, NUMBERS
Presentation: PPT, PPS, POT, PPTX, PPTM, POTX, POTM, PPSX, PPSM, ODP, OTP
Email: EML, EMLX, MSG
Markup: HTML, XHTML, MHTML, MD, XML
eBooks: CHM, EPUB, FB2
Portable: PDF
Notes: ONE
Databases: Databases are supported via ADO.NET. To work with a particular database format, install its database provider.
Spreadsheet: XLS, XLT, XLSX, XLSM, XLSB, XLTX, XLTM, XLA, XLAM
Presentation: PPT, PPS, POT, PPTX, PPTM, POTX, POTM, PPSX, PPSM
Portable: PDF
Word Processing: DOC, DOT, DOCX, DOCM, DOTX, DOTM, ODT, OTT, RTF
Spreadsheet: XLS, XLT, XLSX, XLSM, XLTX, XLTM, XLA, XLAM
Presentation: PPT, PPS, POT, PPTX, PPTM, POTX, POTM, PPSX, PPSM, ODP, OTP
Email: EML, EMLX, MSG
Markup: MD (formatted text is not supported)
eBooks: CHM, EPUB, FB2
Word Processing: DOC, DOT, DOCX, DOCM, DOTX, DOTM, ODT, OTT, RTF
Spreadsheet: XLS, XLT, XLSX, XLSM, XLSB, XLTX, XLTM, ODS, OTS, XLA, XLAM, NUMBERS
Presentation: PPT, PPS, POT, PPTX, PPTM, POTX, POTM, PPSX, PPSM, ODP, OTP
Portable: PDF
Word Processing: DOC, DOT, DOCX, DOCM, DOTX, DOTM, ODT, OTT, RTF
Spreadsheet: XLS, XLT, XLSX, XLSM, XLSB, XLTX, XLTM, ODS, OTS, XLA, XLAM
Presentation: PPT, PPS, POT, PPTX, PPTM, POTX, POTM, PPSX, PPSM, ODP, OTP
Email: EML, EMLX, MSG
eBooks: EPUB, FB2
Portable: PDF
Word Processing: DOC, DOT, DOCX, DOCM, DOTX, DOTM, ODT, OTT, RTF
Spreadsheet: XLS, XLT, XLSX, XLSM, XLSB, XLTX, XLTM, ODS, OTS, XLA, XLAM, NUMBERS
Presentation: PPT, PPS, POT, PPTX, PPTM, POTX, POTM, PPSX, PPSM, ODP, OTP
Email: EML, EMLX, MSG
Portable: PDF
Archive: ZIP
Email: PST, OST, EML, EMLX, MSG
Portable: PDF
Archive: ZIP
Portable: PDF
Word Processing: DOC, DOT, DOCX, DOCM, DOTX, DOTM, ODT, OTT, RTF
eBooks: CHM, EPUB
Portable: PDF
Databases: Databases are supported via ADO.NET. To work with a particular database format, install its database provider.
Microsoft Windows: Microsoft Windows Desktop & Server (x86, x64), Windows Azure
macOS: Mac OS X
Linux: Ubuntu, OpenSUSE, CentOS, and others
Development Environments: Microsoft Visual Studio, Xamarin.Android, Xamarin.iOS, Xamarin.Mac, MonoDevelop
Supported Frameworks: .NET Standard 2.0, .NET Framework 2.0 or higher, .NET Core 2.0 or higher, Mono Framework 1.2 or higher
Are you ready to give GroupDocs.Parser for .NET a try? Execute `Install-Package GroupDocs.Parser` from the Package Manager Console in Visual Studio to fetch and reference the GroupDocs.Parser assembly in your project. If you already have GroupDocs.Parser for .NET and want to upgrade it, execute `Update-Package GroupDocs.Parser` to get the latest version.
```csharp
using System;
using System.Collections.Generic;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;
using GroupDocs.Parser.Options;

string connectionString = string.Format("Provider=System.Data.Sqlite;Data Source={0};Version=3;", "database.db");

// Create an instance of the Parser class to extract tables from the database.
// The connection string is passed in place of a file path; LoadOptions selects the Database file format.
using (Parser parser = new Parser(connectionString, new LoadOptions(FileFormat.Database)))
{
    // Check if text extraction is supported
    if (!parser.Features.Text)
    {
        Console.WriteLine("Text extraction isn't supported.");
        return;
    }

    // Check if table of contents extraction is supported
    if (!parser.Features.Toc)
    {
        Console.WriteLine("Toc extraction isn't supported.");
        return;
    }

    // Get the list of tables (each database table is exposed as a table of contents item)
    IEnumerable<TocItem> toc = parser.GetToc();

    // Iterate over tables
    foreach (TocItem i in toc)
    {
        // Print the table name
        Console.WriteLine(i.Text);

        // Extract the table content as text
        using (TextReader reader = parser.GetText(i.PageIndex.Value))
        {
            Console.WriteLine(reader.ReadToEnd());
        }
    }
}
```
```csharp
using System;
using System.Collections.Generic;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;
using GroupDocs.Parser.Options;

// Create an instance of the Parser class
using (Parser parser = new Parser(Constants.SampleZip))
{
    // Extract images from the document
    IEnumerable<PageImageArea> images = parser.GetImages();

    // Check if image extraction is supported
    if (images == null)
    {
        Console.WriteLine("Page images extraction isn't supported");
        return;
    }

    // Create the options to save images in PNG format
    ImageOptions options = new ImageOptions(ImageFormat.Png);
    int imageNumber = 0;

    // Iterate over images
    foreach (PageImageArea image in images)
    {
        // Save the image to a PNG file named after its index
        image.Save(imageNumber.ToString() + ".png", options);
        imageNumber++;
    }
}
```