GroupDocs.Parser for Java is a document parsing and data extraction library that supports more than 50 popular document types. It helps you build Java business applications that parse raw, structured, and formatted text and extract images and metadata.
Directory | Description |
---|---|
Examples | Java examples and sample documents for you to get started quickly. |
- Extract plain text from any of the supported documents.
- Extract HTML or Markdown formatted text for a fast preview.
- Extract structured text.
- Extract text areas with coordinates, text style and other information.
- Search text by keyword or regular expression, and get the text surrounding each match.
- Extract metadata from supported document formats.
- Get information about document images and save them.
- Extract data from containers such as ZIP archives, PDF portfolios, emails, and OST files.
- Extract table of contents (ToC).
- Parse form data from PDF documents.
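The "search with surrounding text" feature in the list above can be illustrated on a plain string. The following is a minimal sketch that uses only `java.util.regex`, not the GroupDocs.Parser API; the class `KeywordContextSearch`, the `Hit` type, and the `contextChars` parameter are hypothetical names introduced for illustration. It assumes the document text has already been extracted into a `String`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; this is NOT part of GroupDocs.Parser.
public class KeywordContextSearch {

    /** One match: its start offset, the matched text, and the text on either side. */
    public static final class Hit {
        public final int position;
        public final String text;
        public final String left;
        public final String right;

        Hit(int position, String text, String left, String right) {
            this.position = position;
            this.text = text;
            this.left = left;
            this.right = right;
        }
    }

    /**
     * Finds every case-insensitive match of regex in text and captures up to
     * contextChars characters before and after each match.
     */
    public static List<Hit> search(String text, String regex, int contextChars) {
        List<Hit> hits = new ArrayList<Hit>();
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text);
        while (m.find()) {
            // Clamp the context window to the bounds of the text.
            int from = Math.max(0, m.start() - contextChars);
            int to = Math.min(text.length(), m.end() + contextChars);
            hits.add(new Hit(m.start(), m.group(),
                    text.substring(from, m.start()),
                    text.substring(m.end(), to)));
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "Invoice 42 was paid. Invoice 43 is overdue.";
        for (Hit h : search(text, "invoice \\d+", 6)) {
            System.out.println(h.position + ": [" + h.left + "]" + h.text + "[" + h.right + "]");
        }
    }
}
```

A plain keyword works too, because a string without regex metacharacters is matched literally; wrap user input in `Pattern.quote` if it may contain such characters.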
GroupDocs.Parser for Java requires J2SE 7.0 (1.7), J2SE 8.0 (1.8), or above. Install Java first if you do not already have it.
GroupDocs hosts all Java APIs on GroupDocs Artifact Repository, so simply configure your Maven project to fetch the dependencies automatically.
// Create an instance of the Parser class
try (Parser parser = new Parser(Constants.SamplePdf)) {
    // Extract the text into a reader
    try (TextReader reader = parser.getText()) {
        // Print the text from the document;
        // the reader is null if text extraction isn't supported
        System.out.println(reader == null ? "Text extraction isn't supported" : reader.readToEnd());
    }
}
// Create an instance of the Parser class
try (Parser parser = new Parser(Constants.SampleDocx)) {
    // Extract the formatted text (HTML) into a reader
    try (TextReader reader = parser.getFormattedText(new FormattedTextOptions(FormattedTextMode.Html))) {
        // Print the formatted text from the document;
        // the reader is null if formatted text extraction isn't supported
        System.out.println(reader == null ? "Formatted text extraction isn't supported" : reader.readToEnd());
    }
}
// Create an instance of the Parser class
try (Parser parser = new Parser(Constants.SampleDocx)) {
    // Extract metadata from the document
    Iterable<MetadataItem> metadata = parser.getMetadata();
    // The result is null if metadata extraction isn't supported
    if (metadata == null) {
        System.out.println("Metadata extraction isn't supported");
    } else {
        // Iterate over the metadata items only when a result was returned
        for (MetadataItem item : metadata) {
            // Print the item name and value
            System.out.println(String.format("%s: %s", item.getName(), item.getValue()));
        }
    }
}
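All three snippets above rely on the same convention: a `null` result means the feature isn't supported for that document, so it has to be checked before use. The sketch below isolates that check in plain Java so it can be run without the library. `NullSafeMetadata` and `describe` are hypothetical names, and `Map.Entry<String, String>` is only a stand-in for the library's `MetadataItem`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; Map.Entry stands in for MetadataItem.
public class NullSafeMetadata {

    /**
     * Returns one "name: value" line per item, or a single
     * "isn't supported" line when the input is null.
     */
    public static List<String> describe(Iterable<Map.Entry<String, String>> metadata) {
        if (metadata == null) {
            // Null signals "unsupported"; never iterate over it.
            return Collections.singletonList("Metadata extraction isn't supported");
        }
        List<String> lines = new ArrayList<String>();
        for (Map.Entry<String, String> item : metadata) {
            lines.add(String.format("%s: %s", item.getKey(), item.getValue()));
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, String> sample = new LinkedHashMap<String, String>();
        sample.put("Author", "Jane Doe");
        sample.put("Pages", "3");

        // Supported case: one line per metadata item.
        for (String line : describe(sample.entrySet())) {
            System.out.println(line);
        }
        // Unsupported case: a single explanatory line.
        for (String line : describe(null)) {
            System.out.println(line);
        }
    }
}
```

Returning early on `null` keeps the "unsupported" case from reaching the loop, where it would otherwise throw a `NullPointerException`.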