Pipelines #1398

Merged
merged 13 commits into from
Feb 7, 2022

longhotsummer
Contributor

  • Use pipelines for parsing and importing documents. This makes it a lot easier to customise this behaviour.
  • Adjust importer plugin methods to distinguish between importing from a document and parsing text.
  • Smarter docx import via smarter HTML import: cleaning tables, merging lists, cleaning whitespace, merging adjacent inlines, and removing empty inlines.
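
As a rough illustration of the pipeline idea described above, here is a minimal sketch in which each cleanup step is a small callable that reads and updates a shared context. The names `Context`, `Pipeline`, `nbsp_to_space`, `collapse_whitespace` and `remove_empty_inlines` are hypothetical and are not taken from this pull request; only the `html_text` attribute and the `__call__(context)` convention appear in the diff under review.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Context:
    """Hypothetical shared state passed through every stage."""
    html_text: str = ""


# A stage is any callable that reads and updates the context in place.
Stage = Callable[[Context], None]


class Pipeline:
    """Hypothetical pipeline: runs its stages in order on one context."""

    def __init__(self, stages: List[Stage]):
        self.stages = list(stages)

    def __call__(self, context: Context) -> None:
        for stage in self.stages:
            stage(context)


def nbsp_to_space(context: Context) -> None:
    # Replace the &nbsp; entity and the U+00A0 character with a plain space.
    text = context.html_text.replace("&nbsp;", " ")
    context.html_text = text.replace("\u00a0", " ")


def remove_empty_inlines(context: Context) -> None:
    # Drop inline elements with no content, such as <b></b> or <i> </i>.
    pattern = r"<(b|i|u|em|strong|span)>\s*</\1>"
    context.html_text = re.sub(pattern, "", context.html_text)


def collapse_whitespace(context: Context) -> None:
    # Collapse runs of whitespace into a single space and trim the ends.
    context.html_text = re.sub(r"\s+", " ", context.html_text).strip()


html_cleanup = Pipeline([nbsp_to_space, remove_empty_inlines, collapse_whitespace])

ctx = Context(html_text="<p>Hello&nbsp;&nbsp; <b></b>world</p>  ")
html_cleanup(ctx)
print(ctx.html_text)
```

Because the pipeline is just an ordered list of stages, customising the behaviour in this sketch means building a new `Pipeline` with a stage added, removed or swapped, rather than subclassing one large importer.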

@longhotsummer marked this pull request as ready for review on February 4, 2022, 13:04
indigo/pipelines/html.py (outdated)
    """
    def __call__(self, context):
        # &nbsp; to space
        context.html_text = context.html_text.replace('&nbsp;', ' ')
Contributor

Side topic: I'd love for us to keep and somehow support non-breaking spaces at some point
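
One possible direction for that, sketched under assumptions: normalise ordinary whitespace while leaving non-breaking spaces untouched. The function name `normalise_keeping_nbsp` is hypothetical and nothing like it is part of this pull request.

```python
import re

NBSP = "\u00a0"


def normalise_keeping_nbsp(text: str) -> str:
    """Collapse ordinary whitespace but leave non-breaking spaces in place."""
    # Decode the entity to the real character so there is one representation.
    text = text.replace("&nbsp;", NBSP)
    # [^\S\u00a0] matches any whitespace character except U+00A0.
    text = re.sub(r"[^\S\u00a0]+", " ", text)
    # Trim ordinary whitespace only; str.strip() with no argument would
    # also remove leading and trailing non-breaking spaces.
    return text.strip(" \t\r\n")


print(repr(normalise_keeping_nbsp("  10&nbsp;km \n away ")))
```

Later stages would then need to treat U+00A0 as significant content, which is the harder part of supporting it end to end.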

This reverts commit 9dd9b60.
@longhotsummer merged commit e9c699d into master on Feb 7, 2022
@longhotsummer deleted the pipelines branch on February 7, 2022, 06:59
2 participants