
Add support for MS Excel format using Apache POI #123

Closed
benas opened this Issue Aug 11, 2015 · 6 comments

2 participants
@benas
Contributor

benas commented Aug 11, 2015

Add a reader and a writer for the MS Excel format using Apache POI.

@benas benas added the feature label Aug 11, 2015

@benas benas self-assigned this Aug 11, 2015

@benas benas added this to the 3.2.0 milestone Aug 11, 2015

@benas benas modified the milestones: 4.1.0, 4.0.0 Nov 7, 2015

@sothavirak

sothavirak commented Nov 9, 2015

Hi, when will this feature be released? I have actually implemented my own MS Excel reader, which supports both xls and xlsx.

benas pushed a commit that referenced this issue Nov 9, 2015

@benas

Contributor Author

benas commented Nov 9, 2015

Hi,

There is no due date; it depends on progress. There is a branch named easybatch-msexcel where I am experimenting with Apache POI. See the example here.

Here is the current status of the branch:

  • MsExcelRecord
  • MsExcelRecordReader
  • MsExcelRecordMapper
  • MsExcelRecordMarshaller
  • MsExcelRecordWriter

You can already try it by importing the 4.1.0-SNAPSHOT version in your project.

The current reader supports only the xls format; you are welcome to add xlsx support based on your implementation.

BTW, are you using the latest version, 4.0? If not, I strongly recommend upgrading: this is the best release so far! I would love to hear your thoughts on this new version :-)

Regards
Mahmoud

@sothavirak

sothavirak commented Nov 10, 2015

Hi Mahmoud,

Yes, we are using version 4.0 and we love it :) As for xlsx support, our requirement is to be able to read both formats, which we met by replacing one line in your MsExcelRecordReader.java:

HSSFWorkbook workbook = new HSSFWorkbook(new FileInputStream(file));

with

Workbook workbook = WorkbookFactory.create(new FileInputStream(file));
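For context, `WorkbookFactory.create` chooses between the legacy `HSSFWorkbook` (xls) and the OOXML `XSSFWorkbook` (xlsx) by inspecting the stream content rather than the file extension. The sketch below reproduces that idea with the JDK only, by checking file signatures: an xls file is an OLE2 compound document starting with `D0 CF 11 E0 A1 B1 1A E1`, while an xlsx file is a ZIP archive starting with `PK\3\4`. `ExcelFormatSniffer` is a hypothetical helper, not part of POI or this project, and a ZIP header alone does not prove the archive is a spreadsheet.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper (not part of Apache POI): tells a legacy .xls file
// (OLE2 compound document) from an .xlsx file (OOXML, stored as a ZIP archive)
// by inspecting the first bytes instead of trusting the file extension.
class ExcelFormatSniffer {

    enum Format { XLS, XLSX, UNKNOWN }

    // Signature of an OLE2 compound document, used by .xls files
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Local file header of a ZIP archive ("PK\3\4"); any ZIP matches, not only .xlsx
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    static Format detect(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return Format.XLS;
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return Format.XLSX;
        }
        return Format.UNKNOWN;
    }

    // Peeks at the header and rewinds, so the same stream can then be handed to a parser
    static Format detect(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset, e.g. wrap it in a BufferedInputStream");
        }
        in.mark(OLE2_MAGIC.length);
        byte[] header = in.readNBytes(OLE2_MAGIC.length);
        in.reset();
        return detect(header);
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) throws IOException {
        byte[] zipStart = { 0x50, 0x4B, 0x03, 0x04, 0x14, 0x00 };
        System.out.println(detect(new ByteArrayInputStream(zipStart))); // XLSX
    }
}
```

Content-based detection is what lets a single line handle both formats: the caller never has to say which one it is passing in.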

Thanks for the great work. This framework rocks!

Regards,
Sothavirak

@benas

Contributor Author

benas commented Nov 10, 2015

Thank you for this kind feedback! I'm glad you like it :-)

I was thinking about passing the MS Excel format as a second parameter to the MsExcelRecordReader constructor:

File tweets = new File("tweets.xls");
Job job = JobBuilder.aNewJob()
      .reader(new MsExcelRecordReader(tweets, MsExcelFormat.XLS)) // or MsExcelFormat.XLSX
      .build();
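A minimal sketch of the proposed constructor contract, assuming a hypothetical `MsExcelFormat` enum with `XLS` and `XLSX` constants (the names in the proposal are themselves tentative) and a hypothetical `FormatAwareReader` standing in for `MsExcelRecordReader`. It only validates its arguments and reports which POI workbook family would be used; actually opening the workbook is left out so the example runs without POI.

```java
import java.io.File;
import java.util.Locale;
import java.util.Objects;

// Hypothetical enum: constant names follow the proposal above but are not final
enum MsExcelFormat {
    XLS(".xls"),
    XLSX(".xlsx");

    private final String extension;

    MsExcelFormat(String extension) {
        this.extension = extension;
    }

    String extension() {
        return extension;
    }
}

// Hypothetical stand-in for MsExcelRecordReader that only shows the constructor contract
class FormatAwareReader {
    private final File file;
    private final MsExcelFormat format;

    FormatAwareReader(File file, MsExcelFormat format) {
        this.file = Objects.requireNonNull(file, "file");
        this.format = Objects.requireNonNull(format, "format");
        // Fail fast when the declared format and the file name disagree
        String name = file.getName().toLowerCase(Locale.ROOT);
        if (!name.endsWith(format.extension())) {
            throw new IllegalArgumentException(file.getName() + " does not look like a " + format + " file");
        }
    }

    // A real reader would open an HSSFWorkbook (XLS) or an XSSFWorkbook (XLSX) here
    String workbookFamily() {
        return format == MsExcelFormat.XLS ? "HSSF" : "XSSF";
    }

    public static void main(String[] args) {
        FormatAwareReader reader = new FormatAwareReader(new File("tweets.xls"), MsExcelFormat.XLS);
        System.out.println(reader.workbookFamily()); // HSSF
    }
}
```

Passing the format explicitly keeps the reader simple and makes the choice visible in the job definition; the trade-off compared with `WorkbookFactory` is that the caller must know the format up front.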

The constants may end up with other names, but the idea is to make the format configurable at construction time so that both versions are supported.

The work is in progress, so stay tuned. You are welcome to contribute if you want!

Kind regards
Mahmoud

@benas benas added the in progress label Nov 11, 2015

benas added a commit that referenced this issue May 26, 2016

@benas

Contributor Author

benas commented May 26, 2016

Hi Sothavirak,

It took some time, but it is finally here 😄 I've added MS Excel XLSX support to the master branch, and it will be released soon. Here are the main components:

  • MsExcelRecord: a record with an Apache POI Row as its payload
  • MsExcelRecordReader: iteratively reads MsExcelRecords from an xlsx file
  • MsExcelRecordMapper: maps rows to POJOs
  • MsExcelRecordMarshaller: marshals POJOs to rows
  • MsExcelRecordWriter: writes rows to a sheet in an xlsx file
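The mapper and marshaller listed above are inverses of each other: one turns a row's cells into a POJO, the other turns a POJO back into cells in the same column order. The sketch below shows that round trip with a `List<String>` standing in for an Apache POI `Row`, so it runs without POI; the `Tweet` fields and the `map`/`marshal` method names are assumptions for illustration, not this project's API.

```java
import java.util.List;

// Hypothetical sketch of the mapper/marshaller round trip.
// A List<String> stands in for an Apache POI Row so the example runs without POI.
class RowMappingSketch {

    // Example POJO; the column layout (id, user, message) is an assumption
    static final class Tweet {
        final int id;
        final String user;
        final String message;

        Tweet(int id, String user, String message) {
            this.id = id;
            this.user = user;
            this.message = message;
        }
    }

    // "Mapper": row cells -> POJO; cell order must match the sheet's column order
    static Tweet map(List<String> row) {
        if (row.size() != 3) {
            throw new IllegalArgumentException("expected 3 cells, got " + row.size());
        }
        return new Tweet(Integer.parseInt(row.get(0)), row.get(1), row.get(2));
    }

    // "Marshaller": POJO -> row cells, in the same column order
    static List<String> marshal(Tweet tweet) {
        return List.of(String.valueOf(tweet.id), tweet.user, tweet.message);
    }

    public static void main(String[] args) {
        List<String> row = List.of("1", "foo", "hello");
        Tweet tweet = map(row);
        System.out.println(tweet.user + ": " + tweet.message); // foo: hello
        System.out.println(marshal(tweet).equals(row));         // true
    }
}
```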

I've decided to support only the new format (xlsx), not the legacy one (xls).
Apache POI provides two APIs for reading data: the eventmodel (à la SAX) and the usermodel (in memory). I've used the usermodel for now, which is fine for small to medium files. I'll add streaming support for large input files later on.
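To illustrate the difference between the two POI APIs: the usermodel builds the whole workbook in memory, while the event model is push-based and reacts to content as it is parsed. Since an xlsx sheet is stored as XML (`<row>` elements containing `<c>` cells with `<v>` values), the JDK's SAX parser is enough to sketch the streaming idea. The XML below is a simplified stand-in for a real sheet part (no namespaces, no shared-strings table, numeric values only), and the class names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Sketch of push-based (SAX) parsing of a simplified xlsx sheet: each <row> is
// handled as the parser reaches it, and the handler keeps only running totals
// instead of an in-memory tree of the whole sheet.
class StreamingSheetSketch {

    // Counts rows and sums the <v> values; assumes every value is numeric
    // (in a real sheet, string cells hold an index into a shared-strings table)
    static final class SheetHandler extends DefaultHandler {
        int rows;
        long numericSum;
        private boolean inValue;
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            if ("row".equals(qName)) {
                rows++;
            } else if ("v".equals(qName)) {
                inValue = true;
                text.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Character data may arrive in several chunks, so accumulate it
            if (inValue) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("v".equals(qName)) {
                numericSum += Long.parseLong(text.toString().trim());
                inValue = false;
            }
        }
    }

    static SheetHandler scan(String sheetXml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            SheetHandler handler = new SheetHandler();
            parser.parse(new ByteArrayInputStream(sheetXml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse sheet XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<sheetData>"
            + "<row r=\"1\"><c r=\"A1\"><v>10</v></c></row>"
            + "<row r=\"2\"><c r=\"A2\"><v>32</v></c></row>"
            + "</sheetData>";
        SheetHandler result = scan(xml);
        System.out.println(result.rows + " rows, sum " + result.numericSum); // 2 rows, sum 42
    }
}
```

The trade-off is the usual one between SAX and DOM: the streaming handler has constant-size state, but it cannot jump back to an earlier row the way the in-memory usermodel can.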

It would be great if you could try it with version 4.1.0-SNAPSHOT and give me your feedback. An example can be found here.

Kind regards
Mahmoud

@benas

Contributor Author

benas commented Jun 12, 2016

I'm preparing to release v4.1 including this feature.

If you have any issue, please file a bug.

@benas benas closed this Jun 12, 2016

@benas benas removed the in progress label Jun 12, 2016
