This repository has been archived by the owner on Jan 15, 2022. It is now read-only.

COR-329 : New implementation of the MSExcelDocumentReader using the Even... #28

Closed
wants to merge 3 commits into from

Conversation

@fdrouet fdrouet commented Aug 7, 2014

New implementation of the MSExcelDocumentReader using the Event User Model approach, to get a lower memory footprint with big Excel files.

The new reader follows these rules:

We only index:

  • a maximum of 5000 cells, starting from the first tab
  • once 5000 cells have been processed, parsing is aborted

We KEEP only the following data:

  • tab names
  • string cells longer than 2 characters

We SKIP the following data:

  • string cells shorter than 3 characters
  • numeric cells (date-formatted or plain numbers)
  • blank cells
  • boolean or error cells
  • formula cells
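
The keep/skip rules and the 5000-cell cutoff above can be sketched in plain Java. This is only an illustration under stated assumptions, not the code from this PR: `CellFilterSketch`, `Kind` and `Cell` are hypothetical names, the real reader receives POI event records rather than these objects, and tab names are left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;

public class CellFilterSketch {
    /** Hypothetical cell kinds mirroring the categories listed in the description. */
    enum Kind { STRING, NUMBER, BLANK, BOOLEAN, ERROR, FORMULA }

    /** Hypothetical value holder standing in for a parsed cell. */
    static final class Cell {
        final Kind kind;
        final String text;

        Cell(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    static final int MAX_CELLS = 5000; // cells processed before parsing stops
    static final int MIN_LENGTH = 3;   // strings must be longer than 2 characters

    /** Only string cells of at least MIN_LENGTH characters are kept. */
    static boolean keep(Cell cell) {
        return cell.kind == Kind.STRING
                && cell.text != null
                && cell.text.length() >= MIN_LENGTH;
    }

    /** Collects the kept values, stopping once MAX_CELLS cells have been processed. */
    static List<String> index(Iterable<Cell> cells) {
        List<String> kept = new ArrayList<>();
        int processed = 0;
        for (Cell cell : cells) {
            if (processed >= MAX_CELLS) {
                break; // abort: the cell budget is exhausted
            }
            processed++; // skipped cells count against the budget too
            if (keep(cell)) {
                kept.add(cell.text);
            }
        }
        return kept;
    }
}
```

Counting processed cells rather than kept cells follows the wording "after 5000 cells processed"; a sheet full of numbers would therefore stop the parsing just as early as a sheet full of text.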

New implementation of the MSXExcelDocumentReader using the Event User Model approach, to get a lower memory footprint with big Excel files.

The new reader follows these rules:

We only index:

  • a maximum of 5000 cells, starting from the first tab
  • a maximum of 5 tabs
  • a maximum of 1000 cells processed per tab

We KEEP only the following data:

  • tab names
  • string cells longer than 2 characters

We SKIP the following data:

  • string cells shorter than 3 characters
  • numeric cells (date-formatted or plain numbers)
  • blank cells
  • boolean or error cells
  • formula cells
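
How the three limits for xlsx files (5 tabs, 1000 cells per tab, 5000 cells overall) interact can be sketched as follows. This is an assumption-laden illustration, not the PR's code: `TabBudgetSketch` is a hypothetical name, the workbook is modelled as an ordered map from tab name to cell strings, and every cell is treated as a string cell.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class TabBudgetSketch {
    static final int MAX_TABS = 5;
    static final int MAX_CELLS_PER_TAB = 1000;
    static final int MAX_CELLS_TOTAL = 5000;

    /**
     * Walks the tabs in iteration order and returns the indexed text:
     * each visited tab name followed by that tab's kept cell values.
     */
    static List<String> index(Map<String, List<String>> workbook) {
        List<String> indexed = new ArrayList<>();
        int tabs = 0;
        int total = 0;
        for (Map.Entry<String, List<String>> tab : workbook.entrySet()) {
            if (tabs >= MAX_TABS || total >= MAX_CELLS_TOTAL) {
                break; // tab budget or overall cell budget exhausted
            }
            tabs++;
            indexed.add(tab.getKey()); // tab names are always kept
            int inTab = 0;
            for (String value : tab.getValue()) {
                if (inTab >= MAX_CELLS_PER_TAB || total >= MAX_CELLS_TOTAL) {
                    break; // move on to the next tab, or stop entirely
                }
                inTab++;
                total++;
                if (value != null && value.length() > 2) {
                    indexed.add(value);
                }
            }
        }
        return indexed;
    }
}
```

With these values the per-tab and per-workbook limits agree (5 × 1000 = 5000), so the overall cap only matters if one of the constants is changed later.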

New implementation of POIPropertiesReader.readDCProperties(...) for OOXML documents (MS Office 2007 file formats), to get a lower memory footprint with big Excel (xlsx), Word (docx) or PowerPoint (pptx) files.
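
One way to read Dublin Core properties without loading the whole document is to stream only the core-properties part out of the zip container. The sketch below uses JDK classes only (`ZipInputStream` and StAX) rather than POI, so it is a swapped-in technique and not the implementation from this PR. `CorePropertiesSketch` is a hypothetical name, and the part path `docProps/core.xml` is assumed to be the conventional location; strictly, it is declared in the package relationships.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class CorePropertiesSketch {
    /** Dublin Core elements namespace used inside the core-properties part. */
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    /** Assumed conventional location of the core-properties part. */
    static final String CORE_PART = "docProps/core.xml";

    /** Returns the dc:* elements (local name to text) found in the core-properties part. */
    static Map<String, String> readDublinCore(InputStream ooxml)
            throws IOException, XMLStreamException {
        Map<String, String> props = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(ooxml)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!CORE_PART.equals(entry.getName())) {
                    continue; // other parts (sheets, slides, media) are never parsed
                }
                XMLInputFactory factory = XMLInputFactory.newInstance();
                factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
                factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
                XMLStreamReader xml = factory.createXMLStreamReader(zip);
                while (xml.hasNext()) {
                    if (xml.next() == XMLStreamConstants.START_ELEMENT
                            && DC_NS.equals(xml.getNamespaceURI())) {
                        String name = xml.getLocalName();
                        // dc:* elements hold plain text, so getElementText() is safe here
                        props.put(name, xml.getElementText());
                    }
                }
                xml.close();
                break; // nothing else in the package is needed
            }
        }
        return props;
    }
}
```

Only one small XML part is ever parsed, so memory use does not grow with the number of sheets, pages or slides in the file.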

Frederic DROUET added 3 commits August 13, 2014 08:45
…vent User Model approach to have lower memory footprint with big Excel files
…Event User Model approach to have lower memory footprint with big Excel files
…ory footprint with XLSX, DOCX and PPTX documents
@fdrouet fdrouet closed this Aug 13, 2014
@fdrouet fdrouet deleted the fix/2.5.10-GA/COR-329-event branch August 13, 2014 15:05
fdrouet commented Aug 13, 2014

PR dropped and replaced by #29
