Streamlining Chunk Addition in ColumnText Without Storing All in JVM #1167

Open
HemaSudha1498 opened this issue May 24, 2024 · 5 comments

Comments

@HemaSudha1498

Hello Team,

I am generating a PDF book with very long text (more than 1000 pages). Because of the volume, I ran into an OutOfMemoryError when using Phrase/Paragraph directly, so I switched to ColumnText to handle the text chunk by chunk. This avoids the memory problem, but each new chunk now starts on a new line within the same column. This happens when I add chunks one at a time and call ct.go() after each one. In contrast, adding several chunks before calling ct.go() works fine, and the text flows continuously. From what I can tell, BidiLine processes all pending chunks and outputs the content line by line.

Code I have tried:

  // Imports assume the com.lowagie.text packages; DataSupplier is the content source (not shown here).
  import java.io.FileOutputStream;
  import com.lowagie.text.Chunk;
  import com.lowagie.text.Document;
  import com.lowagie.text.Rectangle;
  import com.lowagie.text.pdf.ColumnText;
  import com.lowagie.text.pdf.PdfWriter;

  public static void main(String[] args) throws Exception {
      Document document = new Document(new Rectangle(842, 595));
      PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream("ParagraphTest.pdf"));
      document.open();
      DataSupplier dataSupplier = new DataSupplier();
      ColumnText ct = new ColumnText(writer.getDirectContent());
      ct.setSimpleColumn(50, 50, 200, 545);
      while (dataSupplier.hasMoreContent()) {  // loop until the data supplier has no more content
          String text = dataSupplier.getText();
          ct.addText(new Chunk(text));  // add one chunk at a time; each chunk may vary in style
          while (ct.go() != ColumnText.NO_MORE_TEXT) {  // column is full: start a new page and resume
              document.newPage();
              ct.setSimpleColumn(50, 50, 200, 545);
          }
      }
      document.close();
  }

Is there any way to append chunks to the previous chunk on the same line? Alternatively, if there is a different class or method to achieve streaming of text, please suggest.

Thank you !
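
Since adding several chunks before calling ct.go() keeps the text flowing, one possible workaround is to buffer fragments only up to the next paragraph boundary and call ct.go() once per paragraph, so that memory stays bounded by the longest paragraph. The sketch below shows only the batching step, using the standard library. ParagraphBatcher is a hypothetical helper, and the rule that a fragment ending in "\n" closes a paragraph is an assumption about the data, not something ColumnText requires.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical helper (not part of any PDF library): lazily groups a stream
// of text fragments into paragraph-sized batches.
final class ParagraphBatcher implements Iterator<List<String>> {
    private final Iterator<String> fragments;

    ParagraphBatcher(Iterator<String> fragments) {
        this.fragments = fragments;
    }

    @Override
    public boolean hasNext() {
        return fragments.hasNext();
    }

    @Override
    public List<String> next() {
        if (!fragments.hasNext()) {
            throw new NoSuchElementException();
        }
        List<String> batch = new ArrayList<>();
        while (fragments.hasNext()) {
            String fragment = fragments.next();
            batch.add(fragment);
            // Assumption: a fragment ending in a newline closes the paragraph.
            if (fragment.endsWith("\n")) {
                break;
            }
        }
        return batch;
    }

    public static void main(String[] args) {
        Iterator<String> source =
                List.of("Hello ", "world.\n", "Next ", "paragraph.\n").iterator();
        ParagraphBatcher batcher = new ParagraphBatcher(source);
        while (batcher.hasNext()) {
            List<String> paragraph = batcher.next();
            // In the real program: add one Chunk per fragment to the ColumnText,
            // then run the go()/newPage() loop once for the whole batch.
            System.out.println(paragraph.size() + " fragment(s) in this paragraph");
        }
    }
}
```

Only one batch is held in memory at a time. A line break would still occur after each batch, which is usually acceptable at a paragraph boundary.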

@mkl-public
Contributor

I'm wondering about the initial issue, namely that you encountered an OutOfMemoryError when using Phrase/Paragraph directly and decided to use ColumnText to handle the text chunk by chunk. Unless you put all the content into a single element (such as a single paragraph or table containing everything), adding phrases/paragraphs directly should not require more memory than using ColumnText, at least not by a relevant amount.

@HemaSudha1498
Author

Hi @mkl-public,

Yes, it is possible to add phrases individually. However, besides memory issues, I need the content to be at a specific position (positionX and positionY) within a specified width. Can you please guide me on whether it is possible to achieve this with a direct paragraph or phrase?

Thank you.

@mkl-public
Contributor

> Yes, it is possible to add phrases individually. However, besides memory issues, I need the content to be at a specific position (positionX and positionY) within a specified width. Can you please guide me on whether it is possible to achieve this with a direct paragraph or phrase?

Ah, ok, then ColumnText indeed should be used.

Unfortunately, I'm not familiar with these layout processing details. Maybe someone else can help.

@asturio
Member

asturio commented May 27, 2024

I'm still wondering what you are really trying to achieve. In our case, we have a parser: we read a template and parse it, producing the PDF along the way, so content is added to the PDF as a stream. Titles, paragraphs, lists and so on are created as we go.

Do you have some kind of draft showing what your document should look like?

@HemaSudha1498
Author

Hello @asturio,

Assume my goal is to generate a multi-colored e-book with complex formatting, including:

  • Multiple headings
  • Paragraphs
  • Bullet point text
  • Inline links
  • Highlighted text
  • Various colors and fonts
  • Images
    etc.

The content is extensive, potentially exceeding 1000 pages, and comes from multiple data sources. Due to the large volume, holding all the content in the JVM is not possible. Therefore, I am looking for a way to write the content chunk by chunk.

Additionally, I need the ability to place content at specific positions (positionX and positionY) within a specified width, and I may require multiple columns of text in the PDF.
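
To make the positioning requirement concrete, here is a minimal sketch of the arithmetic for splitting a box at (positionX, positionY) into several equal-width columns. ColumnLayout, its parameter names, and the gutter value are hypothetical, and the origin is assumed to be the lower-left corner of the page. Each resulting rectangle has the same (llx, lly, urx, ury) order as the arguments of the ct.setSimpleColumn(50, 50, 200, 545) call in the code above.

```java
import java.util.Arrays;

// Hypothetical helper: splits a box into equal-width columns separated by a gutter.
final class ColumnLayout {

    /** Returns one {llx, lly, urx, ury} rectangle per column, left to right. */
    static float[][] columns(float x, float y, float width, float height,
                             int count, float gutter) {
        if (count < 1) {
            throw new IllegalArgumentException("count must be at least 1");
        }
        // Width left for text once the gaps between columns are subtracted.
        float columnWidth = (width - gutter * (count - 1)) / count;
        if (columnWidth <= 0) {
            throw new IllegalArgumentException("box is too narrow for the requested columns");
        }
        float[][] rects = new float[count][];
        for (int i = 0; i < count; i++) {
            float llx = x + i * (columnWidth + gutter);
            rects[i] = new float[] {llx, y, llx + columnWidth, y + height};
        }
        return rects;
    }

    public static void main(String[] args) {
        // Three columns inside a 742 x 495 box whose lower-left corner is (50, 50).
        for (float[] rect : columns(50f, 50f, 742f, 495f, 3, 20f)) {
            System.out.println(Arrays.toString(rect));
        }
    }
}
```

When one column fills up, the same ColumnText could be pointed at the next rectangle, and a new page started only after the last column is full.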
