
docs: revise replaceable parts in readme.md for the extractor api lib #74

Merged
a-klos merged 9 commits into main from docs/fix-broken-links on Aug 6, 2025

@a-klos commented on Aug 4, 2025

This pull request updates the extractor API library's README.md. It focuses on extractor configuration, renames components for clarity, and improves metadata handling.

Updates to extractor configurations and mappings:

  • Replaced InformationExtractor with InformationFileExtractor for pdf_extractor, ms_docs_extractor, and xml_extractor in README.md, and renamed the all_extractors list to file_extractors to make its scope more specific.
  • Added new mappers: intern2external, confluence_document2information_piece, and sitemap_document2information_piece, which handle specific metadata mapping for Confluence and sitemap sources.
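To illustrate what the narrower file_extractors name groups together, here is a minimal, stdlib-only sketch. It is not the extractor API lib's code: the class bodies, the compatible_file_types attribute, the file extensions listed for the MS docs extractor, and the find_extractor helper are all hypothetical stand-ins chosen for illustration. Only the names InformationFileExtractor and file_extractors come from this PR.

```python
from abc import ABC, abstractmethod


# Hypothetical stand-in for the library's InformationFileExtractor base class;
# the real class has its own methods and signatures.
class InformationFileExtractor(ABC):
    @property
    @abstractmethod
    def compatible_file_types(self) -> list[str]:
        """File extensions this extractor can handle (hypothetical attribute)."""


# Hypothetical concrete extractors; a plain class attribute satisfies the abstract property.
class PDFExtractor(InformationFileExtractor):
    compatible_file_types = ["pdf"]


class MSDocsExtractor(InformationFileExtractor):
    compatible_file_types = ["docx", "pptx", "xlsx"]  # assumed extensions


class XMLExtractor(InformationFileExtractor):
    compatible_file_types = ["xml"]


# Formerly all_extractors; the new name signals that only file-based extractors live here.
file_extractors: list[InformationFileExtractor] = [
    PDFExtractor(),
    MSDocsExtractor(),
    XMLExtractor(),
]


def find_extractor(file_name: str) -> InformationFileExtractor:
    """Return the first file extractor that accepts the file's extension (hypothetical helper)."""
    extension = file_name.rsplit(".", 1)[-1].lower()
    for extractor in file_extractors:
        if extension in extractor.compatible_file_types:
            return extractor
    raise ValueError(f"No file extractor registered for .{extension}")
```

For example, find_extractor("report.PDF") returns the PDFExtractor instance, while a file type with no registered extractor raises ValueError.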

Renaming for clarity:

  • Renamed langchain_document2information_piece to confluence_document2information_piece in the DependencyContainer class and updated its usage in the confluence_extractor.

@a-klos merged commit 8ba8617 into main on Aug 6, 2025
12 checks passed
@a-klos deleted the docs/fix-broken-links branch on August 6, 2025, 07:36