
Data post elaboration pipeline and merged regions

firegloves edited this page Jul 20, 2022 · 1 revision

Data post elaboration pipeline

In some cases it is useful to process the data further after the export file has been generated. The creation of merged regions is a good example. For this reason MemPOI introduces the Data post elaboration system. Its core is the list of MempoiColumnElaborationStep attached to the MempoiColumn class.

The elaboration consists of two phases: analyzing the data, and applying transformations based on the data collected during the analysis. The process works as follows:

  • after each row is added to each sheet -> analyze and collect data
  • after the last row is added to each sheet -> close the analysis, performing any final operations
  • after the data export completes -> apply the data transformations
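
The three phases above can be illustrated with a small, self-contained sketch. It does not use MemPOI's classes: the ElaborationStep interface, its method names, and MergeRunCollector are hypothetical names invented for this illustration. The "transformation" is reduced to returning the row ranges that a merged-regions step would presumably merge.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical interface mirroring the three phases; this is NOT MemPOI's actual API.
interface ElaborationStep<T> {
    void analyze(int rowNum, T value);   // called after each row is added
    void closeAnalysis(int lastRowNum);  // called after the last row is added
    List<int[]> apply();                 // called after the export completes
}

// Collects runs of consecutive equal values, as a merged-regions step might.
class MergeRunCollector<T> implements ElaborationStep<T> {
    private final List<int[]> runs = new ArrayList<>();
    private T lastValue;
    private int runStart = -1;

    @Override
    public void analyze(int rowNum, T value) {
        if (runStart < 0) {
            runStart = rowNum;               // first row opens the first run
        } else if (!Objects.equals(value, lastValue)) {
            closeRun(rowNum - 1);            // value changed: close the previous run
            runStart = rowNum;
        }
        lastValue = value;
    }

    @Override
    public void closeAnalysis(int lastRowNum) {
        if (runStart >= 0) {
            closeRun(lastRowNum);            // the final run is still open
            runStart = -1;
        }
    }

    @Override
    public List<int[]> apply() {
        return runs;                         // each entry is {firstRow, lastRow}
    }

    // Only runs spanning at least two rows are worth merging.
    private void closeRun(int endRow) {
        if (endRow > runStart) {
            runs.add(new int[]{runStart, endRow});
        }
    }
}

class ElaborationDemo {
    static List<int[]> mergeRuns(String[] values) {
        MergeRunCollector<String> step = new MergeRunCollector<>();
        for (int row = 0; row < values.length; row++) {
            step.analyze(row, values[row]);
        }
        step.closeAnalysis(values.length - 1);
        return step.apply();
    }

    public static void main(String[] args) {
        // Prints "0-1" and then "3-5"
        for (int[] run : mergeRuns(new String[]{"a", "a", "b", "c", "c", "c"})) {
            System.out.println(run[0] + "-" + run[1]);
        }
    }
}
```

Keeping the collected ranges inside the step and applying them only at the end is what lets the analysis run row by row without knowing in advance where a run of equal values will stop.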

You can create your own implementation of the Data post elaboration system in two ways:

  • implementing the base interface MempoiColumnElaborationStep
  • extending the abstract class StreamApiElaborationStep

MempoiColumnElaborationStep

This interface represents the base functionality and defines the methods you must implement to manage your data post elaboration flow. You can find an example in NotStreamApiMergedRegionsStep.

StreamApiElaborationStep

This class supplies some basic implementations for dealing with the Apache POI stream API. As with MempoiColumnElaborationStep, you still have to implement the interface's logic methods. You can find an example in StreamApiMergedRegionsStep.

Differences

The main difference lies in the underlying Apache POI system, so it is good practice to choose the implementation that matches the Workbook implementation in use. The behaviors of each are listed below:

MempoiColumnElaborationStep

  • it should be used with HSSF or XSSF
  • it accesses the generated Workbook entirely in memory, so a very large document could exhaust your memory and cause an error
  • memory is never flushed

StreamApiElaborationStep

  • it should be used with SXSSF
  • it should access only a portion of the generated Workbook, keeping in mind that at any time only a subset of the created rows is loaded in memory
  • you can tune the workbook's RandomAccessWindowSize property to suit your needs, or keep its default value
  • memory is flushed so that only a subset of the generated rows is kept in memory
  • the memory flush mechanism is automated but fragile, as noted in the Apache POI documentation, so it has to be used carefully
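
To make the windowing constraint concrete, here is a plain-Java simulation of a sliding row window. It is only a model written for this page, not Apache POI code: RowWindow and its methods are hypothetical. The window size of 100 mirrors the default window size of SXSSFWorkbook. The sketch shows why a streaming step has to finish its work on a row before that row leaves the window.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of a streaming sheet: only the newest `windowSize` rows stay accessible.
class RowWindow {
    private final int windowSize;
    private final Deque<Integer> inMemory = new ArrayDeque<>();
    private int flushed = 0;

    RowWindow(int windowSize) {
        this.windowSize = windowSize;
    }

    // Adding a row beyond the window size evicts ("flushes") the oldest one.
    void addRow(int rowNum) {
        inMemory.addLast(rowNum);
        if (inMemory.size() > windowSize) {
            inMemory.removeFirst();
            flushed++;
        }
    }

    // A flushed row can no longer be read or modified.
    boolean isAccessible(int rowNum) {
        return inMemory.contains(rowNum);
    }

    int flushedCount() {
        return flushed;
    }
}

class RowWindowDemo {
    public static void main(String[] args) {
        RowWindow window = new RowWindow(100);
        for (int row = 0; row < 250; row++) {
            window.addRow(row);
        }
        System.out.println(window.isAccessible(249)); // true: still in the window
        System.out.println(window.isAccessible(0));   // false: flushed long ago
        System.out.println(window.flushedCount());    // 150
    }
}
```

In this model a step that tried to restyle row 0 after all 250 rows were written would find nothing to work on, which is the situation a stream-oriented step has to be designed around.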

Adding data post elaboration steps

You can add as many steps as you want, as follows:

MempoiSheetBuilder.aMempoiSheet()
           .withSheetName("Multiple steps")
           .withPrepStmt(prepStmt)
           .withDataElaborationStep("name", step1)
           .withDataElaborationStep("usefulChar", step2)
           .withDataElaborationStep("name", step3);

Note that you can add more than one step to each column. Keep in mind that order matters: for each column, steps are executed in the order in which they were added. Built-in steps (like Merged Regions) are added first. If you want to change this behavior, you can configure the same steps explicitly instead of using the built-in functionality.

For example, both of the following snippets execute the merged regions step first and then the custom one:

MempoiSheetBuilder.aMempoiSheet()
           .withSheetName("Multiple steps")
           .withPrepStmt(prepStmt)
           .withMergedRegionColumns(new String[]{"name"})
           .withDataElaborationStep("name", customStep);

MempoiSheetBuilder.aMempoiSheet()
           .withSheetName("Multiple steps")
           .withPrepStmt(prepStmt)
           .withDataElaborationStep("name", customStep)
           .withMergedRegionColumns(new String[]{"name"});

This one, however, executes the custom step first and then the merged regions one:

MempoiSheetBuilder.aMempoiSheet()
           .withSheetName("Multiple steps")
           .withPrepStmt(prepStmt)
           .withDataElaborationStep("name", customStep)
           .withDataElaborationStep("name", new NotStreamApiMergedRegionsStep<>(columnList.get(colIndex).getCellStyle(), colIndex));
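
The ordering rule described in this section can be modeled in a few lines of plain Java. This is a sketch of the documented behavior only, not MemPOI's implementation: StepOrderModel, withBuiltInStep, and withCustomStep are hypothetical names, and steps are represented by simple strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: built-in steps always run before custom ones,
// while custom steps keep the order in which they were added.
class StepOrderModel {
    private final Map<String, List<String>> builtIn = new LinkedHashMap<>();
    private final Map<String, List<String>> custom = new LinkedHashMap<>();

    StepOrderModel withBuiltInStep(String column, String stepName) {
        builtIn.computeIfAbsent(column, k -> new ArrayList<>()).add(stepName);
        return this;
    }

    StepOrderModel withCustomStep(String column, String stepName) {
        custom.computeIfAbsent(column, k -> new ArrayList<>()).add(stepName);
        return this;
    }

    // Execution order for one column: built-in steps first, then custom steps.
    List<String> executionOrder(String column) {
        List<String> order = new ArrayList<>(
                builtIn.getOrDefault(column, Collections.emptyList()));
        order.addAll(custom.getOrDefault(column, Collections.emptyList()));
        return order;
    }
}

class StepOrderDemo {
    public static void main(String[] args) {
        // Built-in step registered last, yet it still runs first.
        List<String> order = new StepOrderModel()
                .withCustomStep("name", "customStep")
                .withBuiltInStep("name", "mergedRegions")
                .executionOrder("name");
        System.out.println(order); // [mergedRegions, customStep]
    }
}
```

Registering the merged regions logic as a custom step instead, as in the last snippet above, puts it in the second list, so it simply follows whatever custom steps were added before it.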

Merged Regions

Currently MemPOI supplies only one built-in Data post elaboration step, which eases merged regions management. All you have to do is pass the MempoiSheetBuilder a String array listing the columns to merge.

String[] mergedColumns = new String[]{"name"};

MempoiSheet mempoiSheet = MempoiSheetBuilder.aMempoiSheet()
    .withSheetName("Merged regions name column 2")
    .withPrepStmt(prepStmt)
    .withMergedRegionColumns(mergedColumns)
    .withStyleTemplate(new RoseStyleTemplate())
    .build();

MemPOI memPOI = MempoiBuilder.aMemPOI()
    .withFile(fileDest)
    .withStyleTemplate(new ForestStyleTemplate())
    .withWorkbook(new HSSFWorkbook())
    .addMempoiSheet(mempoiSheet)
    .build();

memPOI.prepareMempoiReport().get();