Adding uniVocity support #258

Merged
merged 6 commits into from
Dec 8, 2016

Conversation

AussieGuy0
Contributor

This pull request adds a uniVocity extension so that CSV, TSV and Fixed Width Formats can be mapped and marshalled.

@coveralls

Coverage Status

Coverage increased (+0.3%) to 78.801% when pulling 9aaa474 on AussieGuy0:master into 33c57e6 on EasyBatch:master.

@fmbenhassine
Member

Hello Anthony,

Fantastic! That's great 👍 This is what we call a very high quality pull request 😄
Nothing to say, straight to the master branch.
Many thanks, I really appreciate your help.

Keep up the good work on uniVocity!
I won't hesitate to give you credit.

Best regards
Mahmoud

@fmbenhassine fmbenhassine merged commit 2b7cebd into j-easy:master Dec 8, 2016
@fmbenhassine fmbenhassine added this to the v5.1.0 milestone Mar 28, 2017
@fmbenhassine
Member

Hi Anthony,

The build is failing on Windows (I need to add a Windows CI in addition to Mac/Linux; Travis CI does not support Windows yet).

The following tests are failing:

  1. UnivocityRecordMarshallerTest#processRecordIntoCsv
  2. UnivocityRecordMarshallerTest#processRecordIntoTsv
  3. UnivocityRecordMarshallerTest#processRecordIntoFixedWidth
  4. UnivocityRecordMapperTest#testUnivocityCsvCarriageReturn

I have already fixed 1, 2 and 3 by replacing "\n" with Utils.LINE_SEPARATOR. I'll push the fix asap.
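For illustration, the kind of change described above can be sketched as follows. This is a minimal sketch, not the actual fix that was pushed: `System.lineSeparator()` stands in for `Utils.LINE_SEPARATOR`, and the `ExpectedOutput` class and its `lines` helper are hypothetical names introduced only for this example.

```java
// Sketch: build expected test output with the platform line separator
// instead of a hard-coded "\n", so the comparison holds on Windows too.
// Assumption: System.lineSeparator() stands in for Utils.LINE_SEPARATOR.
final class ExpectedOutput {

    static final String LINE_SEPARATOR = System.lineSeparator();

    /** Joins the given lines, terminating each one with the platform line separator. */
    static String lines(String... lines) {
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(line).append(LINE_SEPARATOR);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical marshalled record; a real test would compare this
        // against the payload produced by the marshaller under test.
        String expected = lines("\"foo\",\"bar\"");
        System.out.print(expected);
    }
}
```

An expected string built this way ends in `\r\n` on Windows and `\n` on Mac/Linux, matching whatever a writer that uses the platform default emits.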

For the 4th test, it seems like the CsvParser converts "\r\n" to "\n". Here is a failing test that isolates the problem:

    @Test
    public void testUnivocityCsvCarriageReturnLineFeed() throws Exception {
        BeanListProcessor<TestBean> beanListProcessor = new BeanListProcessor<>(TestBean.class);
        CsvParserSettings csvParserSettings = new CsvParserSettings();
        csvParserSettings.getFormat().setQuote('\'');
        csvParserSettings.setProcessor(beanListProcessor);

        CsvParser parser = new CsvParser(csvParserSettings);
        String payload = "'foo" + LINE_SEPARATOR + "','bar" + LINE_SEPARATOR + "'";
        parser.parse(new StringReader(payload));

        TestBean result = beanListProcessor.getBeans().get(0);
        assertThat(result).isNotNull();
        assertThat(result.getFirstName()).isEqualTo("foo" + LINE_SEPARATOR);
        assertThat(result.getLastName()).isEqualTo("bar" + LINE_SEPARATOR);
    }

I tried to set some options:

csvParserSettings.setDelimiterDetectionEnabled(true);
csvParserSettings.setQuoteDetectionEnabled(true);
csvParserSettings.setLineSeparatorDetectionEnabled(true);

but it is still failing. On Windows, the payload variable contains 'foo\r\n','bar\r\n', but result.getFirstName() is equal to foo\n. Any idea?

Kr
Mahmoud

@AussieGuy0
Contributor Author

Hey Mahmoud,

I'll investigate this tonight. As a quick guess, explicitly setting the line separator may help, e.g.:

csvParserSettings.getFormat().setLineSeparator(LINE_SEPARATOR);

Ending the payload string with a line separator may also help, e.g.:

String payload = "'foo" + LINE_SEPARATOR + "','bar" + LINE_SEPARATOR + "'" + LINE_SEPARATOR;

@fmbenhassine
Member

Oh thanks! I tried both of these changes and it is still failing.

I'll continue investigating on my side. Let me know if you find something interesting.

Kr
Mahmoud

@fmbenhassine
Member

I have updated the tests to ignore OS-specific line endings, but I'm still curious to know why this is failing on Windows.

@AussieGuy0
Contributor Author

Hey mate,

Found the problem. You need to add the following code:

csvParserSettings.setNormalizeLineEndingsWithinQuotes(false);

Basically, uniVocity replaces a detected line separator that is within quotes with the normalized line separator (\n by default). This prevents blank lines from being added when the same data is handled on different platforms. With this setting set to false, the \r\n is retained in the test code.
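To make the effect concrete, here is a toy model of that behaviour in plain Java. It is only a sketch under stated assumptions and is not uniVocity's implementation: the `QuotedLineEndings` class and `readQuotedValue` method are hypothetical, and the normalized separator is hard-coded to `\n`.

```java
// Toy model (not uniVocity code): shows why 'foo\r\n' comes back as foo\n
// when line endings inside quotes are normalized, and as foo\r\n when not.
final class QuotedLineEndings {

    /**
     * Returns the content of a single quoted value. When normalize is true,
     * "\r\n" and bare "\r" inside the quotes are replaced with "\n".
     */
    static String readQuotedValue(String quoted, char quote, boolean normalize) {
        int last = quoted.length() - 1;
        if (last < 1 || quoted.charAt(0) != quote || quoted.charAt(last) != quote) {
            throw new IllegalArgumentException("Value is not enclosed in quotes");
        }
        String content = quoted.substring(1, last);
        if (!normalize) {
            return content; // line endings kept exactly as they appear in the input
        }
        return content.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        String value = "'foo\r\n'";
        // Compare the lengths: the normalized content is one character shorter.
        System.out.println(readQuotedValue(value, '\'', true).length());
        System.out.println(readQuotedValue(value, '\'', false).length());
    }
}
```

With normalization on, an assertion against `"foo" + LINE_SEPARATOR` can only pass where the platform separator is already `\n`, which matches the Windows-only failure described earlier in the thread.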

Hope this solves everything. If you have any more questions, feel free to ask! Although you owe me for making me boot into Windows 😉

@fmbenhassine
Member

I came across this setting and set it to true (not false). Now it is clearer to me.

Thank you for your time!


3 participants