Add support for MS excel format using Apache POI #123

Closed
fmbenhassine opened this issue Aug 11, 2015 · 6 comments

@fmbenhassine
Member

Add a reader and a writer for the MS Excel format using Apache POI.

@fmbenhassine fmbenhassine self-assigned this Aug 11, 2015
@fmbenhassine fmbenhassine added this to the 3.2.0 milestone Aug 11, 2015
@fmbenhassine fmbenhassine modified the milestones: 4.1.0, 4.0.0 Nov 7, 2015
@sothavirak

Hi, when will this feature be released? I have actually implemented my own MS Excel reader, which supports both xls and xlsx.

@fmbenhassine
Member Author

Hi,

There is no due date; it depends on progress. There is a branch named easybatch-msexcel where I am experimenting with Apache POI. See the example here.

Here is the current status of the branch:

  • MsExcelRecord
  • MsExcelRecordReader
  • MsExcelRecordMapper
  • MsExcelRecordMarshaller
  • MsExcelRecordWriter

You can already try it by adding the 4.1.0-SNAPSHOT version to your project.

The current reader supports only the xls format; you are welcome to contribute xlsx support from your implementation.

BTW, are you using the latest version, 4.0? If not, I strongly recommend upgrading; this is the best release so far! I would love to hear your thoughts on this new version :-)

Regards
Mahmoud

@sothavirak

Hi Mahmoud,

Yes, we are using version 4.0 and we love it :) As for xlsx support, our requirement is to read both formats, and we can do that by replacing one line in your MsExcelRecordReader.java:

HSSFWorkbook workbook = new HSSFWorkbook(new FileInputStream(file));

with

Workbook workbook = WorkbookFactory.create(new FileInputStream(file));
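For context, WorkbookFactory.create can open both formats because it inspects the first bytes of its input: legacy .xls files are OLE2 compound documents, while .xlsx files are ZIP (Office Open XML) packages. The stdlib-only sketch below illustrates that kind of header check. ExcelFormatSniffer and its methods are hypothetical names, not part of Apache POI or Easy Batch, and POI's own detection is more thorough.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical helper (not part of Apache POI or Easy Batch) illustrating
// how .xls and .xlsx inputs can be told apart by their leading bytes.
final class ExcelFormatSniffer {

    enum Kind { XLS, XLSX, UNKNOWN }

    // OLE2 compound document signature, used by legacy .xls (BIFF8) files
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    // ZIP local file header signature "PK\3\4", used by .xlsx (OOXML) packages
    private static final byte[] ZIP = { 0x50, 0x4B, 0x03, 0x04 };

    private ExcelFormatSniffer() { }

    // Classifies the leading bytes of a file
    static Kind detect(byte[] header) {
        if (startsWith(header, OLE2)) {
            return Kind.XLS;
        }
        if (startsWith(header, ZIP)) {
            return Kind.XLSX; // any OOXML package matches, e.g. a .docx too
        }
        return Kind.UNKNOWN;
    }

    // Peeks at the header without consuming it: mark/reset rewinds the
    // stream so it can still be handed to a workbook constructor afterwards
    static Kind detect(BufferedInputStream in) throws IOException {
        in.mark(OLE2.length);
        byte[] header = new byte[OLE2.length];
        int read = in.readNBytes(header, 0, header.length);
        in.reset();
        return detect(Arrays.copyOf(header, read));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Because the stream is rewound, the same BufferedInputStream could then be passed to HSSFWorkbook or XSSFWorkbook. Note that a ZIP header only shows the file is an OOXML package, not that it is a spreadsheet.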

Thanks for the great work. This framework rocks!

Regards,
Sothavirak

@fmbenhassine
Member Author

Thank you for this kind feedback! I'm glad you like it :-)

I was thinking about passing the MS Excel format as a second parameter to the MsExcelRecordReader constructor:

File tweets = new File("tweets.xls");
Job job = JobBuilder.aNewJob()
      .reader(new MsExcelRecordReader(tweets, MsExcelFormat.XLS)) // or MsExcelFormat.XLSX
      .build();

The constant names may change, but the idea is to make the format configurable at construction time so that both versions are supported.
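One way the proposed format parameter could be modelled, as a minimal sketch: a hypothetical MsExcelFormat enum (the name comes from the snippet above; its shape here is an assumption) that records each format's file extension and the Apache POI usermodel class that reads it, plus a helper that infers the format from a file name.

```java
import java.util.Locale;

// Hypothetical enum for the proposed MsExcelRecordReader(file, format)
// constructor; it is a sketch, not an Easy Batch type.
enum MsExcelFormat {
    // Legacy binary format (Excel 97-2003), read by POI's HSSF usermodel
    XLS("xls", "org.apache.poi.hssf.usermodel.HSSFWorkbook"),
    // Office Open XML format (Excel 2007+), read by POI's XSSF usermodel
    XLSX("xlsx", "org.apache.poi.xssf.usermodel.XSSFWorkbook");

    private final String extension;
    private final String workbookClass;

    MsExcelFormat(String extension, String workbookClass) {
        this.extension = extension;
        this.workbookClass = workbookClass;
    }

    String extension() {
        return extension;
    }

    // Fully qualified name of the POI Workbook implementation for this format
    String workbookClass() {
        return workbookClass;
    }

    // Infers the format from a file name such as "tweets.xlsx" (case-insensitive)
    static MsExcelFormat fromFileName(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        for (MsExcelFormat format : values()) {
            if (lower.endsWith("." + format.extension)) {
                return format;
            }
        }
        throw new IllegalArgumentException("Not an MS Excel file name: " + fileName);
    }
}
```

Keeping the POI class names as strings only lets the sketch compile without POI; a real reader would instantiate HSSFWorkbook or XSSFWorkbook directly depending on the constant. Inferring the format from the extension is an alternative to passing it explicitly.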

The work is in progress; stay tuned. You are welcome to contribute if you want!

Kind regards
Mahmoud

@fmbenhassine
Member Author

fmbenhassine commented May 26, 2016

Hi Sothavirak,

It took some time, but it is finally here 😄 I've added MS Excel XLSX support to the master branch, and it will be released soon. Here are the main components:

  • MsExcelRecord: a record with an Apache POI Row as payload
  • MsExcelRecordReader: iteratively reads MsExcelRecords from an xlsx file
  • MsExcelRecordMapper: maps rows to POJOs
  • MsExcelRecordMarshaller: marshals POJOs to rows
  • MsExcelRecordWriter: writes rows to a sheet in an xlsx file
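The mapper and the marshaller are the two directions of the same conversion. A minimal sketch, assuming a hypothetical Tweet POJO whose columns are id, user and message (in that order), and modelling a POI Row as a plain List<String> of cell values so it runs without POI. This is not the actual MsExcelRecordMapper or MsExcelRecordMarshaller code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical POJO; the column order (id, user, message) is an assumption.
final class Tweet {
    final int id;
    final String user;
    final String message;

    Tweet(int id, String user, String message) {
        this.id = id;
        this.user = user;
        this.message = message;
    }
}

// Stand-in for the mapper/marshaller pair. A POI Row is modelled as a
// List<String> of cell values so the sketch runs without POI.
final class TweetRowConverter {

    private TweetRowConverter() { }

    // Mapper direction: row cells -> POJO, matched by column index
    static Tweet toTweet(List<String> cells) {
        if (cells.size() < 3) {
            throw new IllegalArgumentException("Expected 3 cells, got " + cells.size());
        }
        int id = Integer.parseInt(cells.get(0).trim());
        return new Tweet(id, cells.get(1), cells.get(2));
    }

    // Marshaller direction: POJO -> row cells, in the same column order
    static List<String> toCells(Tweet tweet) {
        return Arrays.asList(String.valueOf(tweet.id), tweet.user, tweet.message);
    }
}
```

Keeping both directions keyed on the same column order means a row that is mapped and then marshalled comes back unchanged.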

I've decided to support only the new format (xlsx), not the legacy one (xls).
Apache POI provides two APIs for reading data: the event model (à la SAX, streaming) and the user model (in memory). I've used the user model for now, which is fine for small to medium files. I'll add streaming support for large input files later on.
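To illustrate why a streaming approach suits large files: an .xlsx file is a ZIP package whose sheets are XML parts, so rows can be visited one at a time with a SAX parser instead of building the whole workbook in memory. This sketch uses only the JDK (ZipInputStream and SAX) to count row elements. StreamingRowCounter is a hypothetical name, it is not POI's event API, and the part name xl/worksheets/sheet1.xml is the usual location of the first sheet but an assumption for any given file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical streaming row counter: finds one sheet part inside the
// .xlsx ZIP package and counts its <row> elements with SAX, keeping
// memory usage independent of the sheet size.
final class StreamingRowCounter {

    private StreamingRowCounter() { }

    static long countRows(InputStream xlsx, String sheetPart) throws IOException {
        ZipInputStream zip = new ZipInputStream(xlsx);
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            if (entry.getName().equals(sheetPart)) {
                // The ZIP stream now yields only this entry's XML bytes
                return countRowElements(zip);
            }
        }
        throw new IOException("Sheet part not found: " + sheetPart);
    }

    private static long countRowElements(InputStream sheetXml) throws IOException {
        long[] rows = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("row".equals(localName)) {
                    rows[0]++;
                }
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // match on local name, ignoring prefixes
            // Refuse DOCTYPE declarations, a common hardening step for untrusted XML
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            SAXParser parser = factory.newSAXParser();
            parser.parse(sheetXml, handler);
        } catch (ParserConfigurationException | SAXException e) {
            throw new IOException("Could not parse sheet XML", e);
        }
        return rows[0];
    }
}
```

Rows without any cells are often omitted from the sheet XML, so this counts the rows physically present in the file. POI's XSSF event API (XSSFReader and related classes) additionally resolves shared strings and styles, which this sketch ignores.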

It would be great if you could try it with version 4.1.0-SNAPSHOT and give me your feedback. An example can be found here.

Kind regards
Mahmoud

@fmbenhassine
Member Author

I'm preparing to release v4.1, which includes this feature.

If you run into any issues, please file a bug.


2 participants