🚀 Welcome to the GroupDocs.Parser for Python via .NET repository!
This repository will soon host a collection of practical, ready‑to‑run examples that demonstrate how to use GroupDocs.Parser for Python via .NET in your own applications.
We’re working on publishing code samples that will help you:
- Get started quickly with GroupDocs.Parser
- Explore both basic and advanced features of the API
- Integrate GroupDocs.Parser into your applications
Stay tuned — examples are coming soon! ✨
GroupDocs.Parser for Python via .NET is a powerful document‑parsing and data‑extraction library. It lets you extract text, images, attachments, barcodes, and structured content from a wide range of document formats, including PDF, Word, Excel, PowerPoint, emails, archives, e‑books and many image types. The library runs on Windows, Linux and macOS and works with any supported Python 3.5+ interpreter.
- Rich text extraction & search – plain or formatted text, page‑level access, case‑sensitive, whole‑word and regex search.
- Structured content & templates – parse headings, paragraphs, tables, text areas and use templates to pull strongly‑typed fields from invoices, receipts, etc.
- Images, attachments & barcodes – extract embedded images, file attachments and barcodes from supported documents.
- OCR for scanned documents – read text from scanned PDFs and raster images, with optional spell‑checking.
- Wide format & platform support – dozens of document, image and archive formats on Windows, Linux and macOS, all via a unified .NET‑powered API accessed from Python.
- Word Processing – DOC, DOT, DOCX, DOCM, DOTX, DOTM, RTF, TXT, ODT, OTT
- PDF – PDF (full text, images, attachments, forms, barcode scanning)
- Markup – XHTML, MHTML, MD, XML
- eBooks – CHM, EPUB, FB2, MOBI, AZW3
- Spreadsheets – XLS, XLT, XLSX, XLSM, XLSB, XLTX, XLTM, ODS, OTS, CSV, XLA, XLAM, NUMBERS
- Presentations – PPT, PPS, POT, PPTX, PPTM, PPSX, PPSM, POTX, POTM, ODP, OTP
- Email – PST, OST, EML, EMLX, MSG
- Notes – ONE
- Archives – 7Z, ZIP, RAR, TAR, GZ, BZ2
- Images – BMP, GIF, JP2, JPG/JPEG, PNG, TIF/TIFF, DICOM, DJVU, EMF, J2K, PS, PSD, SVG, SVGZ, WEBP, WMF
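
Before handing a file to the parser, it can be handy to check its extension against the list above. The following is a minimal sketch using only the standard library. `SUPPORTED_FORMATS` and `format_category` are hypothetical names introduced for illustration; they are not part of GroupDocs.Parser, and the table is transcribed from the list above rather than queried from the library.

```python
from pathlib import Path
from typing import Optional

# Transcribed from the format list above; hypothetical helper data,
# not something exposed by GroupDocs.Parser itself.
SUPPORTED_FORMATS = {
    "Word Processing": "DOC DOT DOCX DOCM DOTX DOTM RTF TXT ODT OTT",
    "PDF": "PDF",
    "Markup": "XHTML MHTML MD XML",
    "eBooks": "CHM EPUB FB2 MOBI AZW3",
    "Spreadsheets": "XLS XLT XLSX XLSM XLSB XLTX XLTM ODS OTS CSV XLA XLAM NUMBERS",
    "Presentations": "PPT PPS POT PPTX PPTM PPSX PPSM POTX POTM ODP OTP",
    "Email": "PST OST EML EMLX MSG",
    "Notes": "ONE",
    "Archives": "7Z ZIP RAR TAR GZ BZ2",
    "Images": "BMP GIF JP2 JPG JPEG PNG TIF TIFF DICOM DJVU EMF J2K PS PSD SVG SVGZ WEBP WMF",
}

# Invert the table: lower-case extension -> category name.
_EXTENSION_TO_CATEGORY = {
    ext.lower(): category
    for category, extensions in SUPPORTED_FORMATS.items()
    for ext in extensions.split()
}


def format_category(filename: str) -> Optional[str]:
    """Return the category for a file's extension, or None if it is not listed."""
    # Path.suffix yields only the last extension, e.g. ".gz" for "logs.tar.gz".
    extension = Path(filename).suffix.lstrip(".").lower()
    return _EXTENSION_TO_CATEGORY.get(extension)
```

With this sketch, `format_category("invoice.PDF")` returns `"PDF"`, while a file with an unlisted extension such as `data.xyz` returns `None`.
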
- Python 3.5+
- Windows, Linux or macOS with a supported Python runtime
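
A script can verify the interpreter requirement above before importing the library. This is a minimal sketch using only the standard library; `MINIMUM_PYTHON` and `meets_minimum` are hypothetical names chosen for illustration, and the `(3, 5)` threshold is taken from the requirement stated above.

```python
import sys

# Minimum interpreter version stated in the requirements above.
MINIMUM_PYTHON = (3, 5)


def meets_minimum(version_info=None, minimum=MINIMUM_PYTHON):
    """Return True if the given (or the running) interpreter is new enough."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); tuple comparison is lexicographic.
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if not meets_minimum():
        sys.exit("Python {}.{} or newer is required".format(*MINIMUM_PYTHON))
    print("Python version OK:", sys.version.split()[0])
```
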
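
Install the package with pip:

    pip install groupdocs-parser-net

Upgrade to the latest version:

    pip install --upgrade groupdocs-parser-net

Extract text from a PDF:

    from groupdocs.parser import Parser

    # Open the document; the with block releases it when done
    with Parser("sample.pdf") as parser:
        text = parser.GetText()
        print(text)

Search text in a PDF: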

    from groupdocs.parser import Parser

    with Parser("sample.pdf") as parser:
        # Print the page index and bounding rectangle of each match
        for area in parser.Search("Total Amount"):
            print(f"Page {area.PageIndex}, Rectangle: {area.Rectangle}")

Extract images from a Word document:

    from groupdocs.parser import Parser

    with Parser("sample.docx") as parser:
        images = parser.GetImages()
        # Save each extracted image under a sequential file name
        for i, image in enumerate(images, 1):
            image.Save(f"image{i}.png")

Read document metadata:

    from groupdocs.parser import Parser

    with Parser("sample.pdf") as parser:
        metadata = parser.GetMetadata()
        # Print each metadata item as "name: value"
        for item in metadata:
            print(f"{item.Name}: {item.Value}")

You can request a 30‑day Temporary License for unrestricted testing:
- Visit the Get a Temporary License page.
- Follow the instructions to obtain the license file.
- Apply the license in your code:

    import os
    from groupdocs.parser import License

    # Resolve the license file to an absolute path, then apply it
    license_path = os.path.abspath("./GroupDocs.Parser.lic")
    License().set_license(license_path)

GroupDocs.Parser for Python via .NET is distributed under the GroupDocs End‑User License Agreement (EULA). For full pricing details, see the pricing page.
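
When applying a license file as in the licensing snippet above, it may help to confirm that the file exists first, so that a missing file is reported clearly. The sketch below uses only the standard library; `find_license_file` is a hypothetical helper, not part of GroupDocs.Parser, and the default file name is simply the one used in the snippet above.

```python
import os
from typing import Optional


def find_license_file(path: str = "./GroupDocs.Parser.lic") -> Optional[str]:
    """Return the absolute path of the license file, or None if it is missing."""
    absolute = os.path.abspath(path)
    return absolute if os.path.isfile(absolute) else None


license_path = find_license_file()
if license_path is None:
    print("License file not found; skipping license setup.")
else:
    # This is where the License().set_license(license_path) call
    # from the licensing snippet would go.
    print("Using license file:", license_path)
```
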
GroupDocs offers unlimited free technical support for all products, including evaluation versions.
- Free Support Forum – Ask questions, share ideas and get help directly from the development team: https://forum.groupdocs.com/c/parser
- Paid Support Helpdesk – Faster response times and dedicated assistance: https://helpdesk.groupdocs.com/
- Paid Consulting – Custom development, feature requests or consulting services: https://consulting.groupdocs.com/contact/
- 📘 Check other products: GroupDocs Products Catalog
- 💻 Download free trial: GroupDocs Downloads
- 💬 Need help? Visit our Free Support Forum.