Create a test suite showing all the features of supercsv #56

kbilsted · 2015-07-05T19:53:22Z

An idea could be to create a unit test called "FeaturesOfSuperCsv" which has a test for each row of the comparison matrix showing that supercsv in fact has this feature. Missing features should be made into individual issues which the community then can implement or reject.

this test is nice for

new people on the team
people doing comparisons
for linking from the documentation pages

features to show can be found in

the existing documentation
http://csveed.org/comparison-matrix.html
https://github.com/uniVocity/csv-parsers-comparison

ZioberMichal · 2015-07-05T19:56:26Z

Thanks for pointing me to this issue. I will try to start working on test suite this week.

kbilsted · 2015-07-05T20:01:33Z

Sounds fantastic. Not sure how we should structure this? I mean we cannot put ALL the features into one test I guess since we have extra modules e.g. for jodatime. Any ideas? Maybe put a "feature" test in each package?

Then we also need from the frontpage, to link to alle the "featuretests"

ZioberMichal · 2015-07-05T21:48:54Z

"Feature" test in each package sounds good. We can start using this approach and when we will see better one we can refactor already created tests.

Edit:
We should decide how we want to solve this for example for "Tokenizer" section from the first link:

custom separator
custom quote
custom escape
custom EOL
\n in delimited field
escaped quote in quoted field
different escape / quote
deal with leading/trailing whitespace

we can

A: Create TokenizerTest which will be containing 8 test for each related features. Put this test to "features" subpackage.
B: Split it on separated 8 feature tests in "features" subpackage.

kbilsted · 2015-07-06T13:57:14Z

I'm not sure what provides the better overview. Maybe it will be clear after some code

ZioberMichal · 2015-07-09T00:00:51Z

I have implemented first test for "Tokenizer" section and actually it is not possible to:

change escape character.
set different characters for "quote character" and "escape character"

Or maybe it is possible, but I could not find any feature which allows that. Actually, we can trim leading and trailing white-characters but we have to choose proper cell processors. I think, it should be listed in documentation or guideline.

Below you can see class and simple tests. I did not want to push them as PR because actually I do not know what to do with tests for which we are not able to change "escape character". Should we thrown exception "Not implemented yet!" or something like this? What do you think?

package org.supercsv.features;

import org.junit.Assert;
import org.junit.Test;
import org.supercsv.cellprocessor.constraint.NotNull;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.io.CsvBeanWriter;
import org.supercsv.io.CsvListWriter;
import org.supercsv.prefs.CsvPreference;

import java.io.IOException;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;

/**
 * Test class which checks all features listed on <a href="http://csveed.org/comparison-matrix.html">comparison page</a> in "Tokenizer" section.
 *
 * @author Michał Ziober
 */
public class TokenizerFeaturesTest {


    @Test
    public void testCustomSeparator() throws IOException {
        List<String> data = Arrays.asList("John", "Connor");
        CellProcessor[] processors = { new NotNull(), new NotNull() };

        char customSeparator = '+';
        CsvPreference customPreference = new CsvPreference.Builder('"', customSeparator, "").build();
        StringWriter writer = new StringWriter();
        CsvListWriter beanWriter = new CsvListWriter(writer, customPreference);
        beanWriter.write(data, processors);
        beanWriter.close();

        Assert.assertEquals("John+Connor", writer.toString());
    }

    @Test
    public void testCustomQuote() throws IOException {
        List<String> data = Arrays.asList("John \n Connor");
        CellProcessor[] processors = { new NotNull() };

        char customQuote = '|';
        CsvPreference customPreference = new CsvPreference.Builder(customQuote, ',', "").build();
        StringWriter writer = new StringWriter();
        CsvListWriter beanWriter = new CsvListWriter(writer, customPreference);
        beanWriter.write(data, processors);
        beanWriter.close();

        Assert.assertEquals("|John  Connor|", writer.toString());
    }

    //TODO: not possible to change
    @Test
    public void testCustomEscape() throws IOException {
        List<String> data = Arrays.asList("\"John Connor\"");
        CellProcessor[] processors = { new NotNull() };

        CsvPreference customPreference = new CsvPreference.Builder('"', ',', "").build();
        StringWriter writer = new StringWriter();
        CsvListWriter beanWriter = new CsvListWriter(writer, customPreference);
        beanWriter.write(data, processors);
        beanWriter.close();

        Assert.assertEquals("\"\"\"John Connor\"\"\"", writer.toString());
    }

    @Test
    public void testCustomEOL() throws IOException {
        List<String> data = Arrays.asList("John Connor");
        CellProcessor[] processors = { new NotNull() };

        String customEndOfLine = ">\r\n";
        CsvPreference customPreference = new CsvPreference.Builder('"', ',', customEndOfLine).build();
        StringWriter writer = new StringWriter();
        CsvListWriter beanWriter = new CsvListWriter(writer, customPreference);
        beanWriter.write(data, processors);
        beanWriter.close();

        Assert.assertEquals("John Connor>\r\n", writer.toString());
    }

    @Test
    public void testNewLineInDelimitedField() throws IOException {
        List<String> data = Arrays.asList("Jo\nhn", "Con\nnor");
        CellProcessor[] processors = { new NotNull(), new NotNull() };

        CsvPreference customPreference = new CsvPreference.Builder('"', ',', "\n").build();
        StringWriter writer = new StringWriter();
        CsvListWriter beanWriter = new CsvListWriter(writer, customPreference);
        beanWriter.write(data, processors);
        beanWriter.close();

        Assert.assertEquals("\"Jo\nhn\",\"Con\nnor\"\n", writer.toString());
    }

    @Test
    public void testEscapedQuoteInQuotedField() throws IOException {
        List<String> data = Arrays.asList("Joh\"n", "Con\"nor");
        CellProcessor[] processors = { new NotNull(), new NotNull() };

        CsvPreference customPreference = new CsvPreference.Builder('"', ',', "").build();
        StringWriter writer = new StringWriter();
        CsvListWriter beanWriter = new CsvListWriter(writer, customPreference);
        beanWriter.write(data, processors);
        beanWriter.close();

        Assert.assertEquals("\"Joh\"\"n\",\"Con\"\"nor\"", writer.toString());
    }

    //TODO: not possible to separate
    @Test
    public void testDifferentEscapeAndQuote() throws IOException {
        List<String> data = Arrays.asList("\"John Connor\"");
        CellProcessor[] processors = { new NotNull() };

        char customQuote = '"';
        CsvPreference customPreference = new CsvPreference.Builder(customQuote, ',', "").build();
        StringWriter writer = new StringWriter();
        CsvListWriter beanWriter = new CsvListWriter(writer, customPreference);
        beanWriter.write(data, processors);
        beanWriter.close();

        Assert.assertEquals("\"\"\"John Connor\"\"\"", writer.toString());
    }

    //TODO: not possible to trim values
    @Test
    public void testDealWithLeadingTrailingWhitespace() throws IOException {
        List<String> data = Arrays.asList("     John", "Connor     ");
        CellProcessor[] processors = { new NotNull(), new NotNull() };

        char customQuote = '"';
        CsvPreference customPreference = new CsvPreference.Builder(customQuote, ',', "").surroundingSpacesNeedQuotes(false).build();
        StringWriter writer = new StringWriter();
        CsvListWriter beanWriter = new CsvListWriter(writer, customPreference);
        beanWriter.write(data, processors);
        beanWriter.close();

        Assert.assertEquals("     John,Connor     ", writer.toString());
    }
}

jamesbassett · 2015-07-09T04:05:30Z

Hey Michał, looks like all of those tests are for writing, whereas any features that Tokenizer should support are related to reading. Some features only apply to reading or writing, others apply to both.

Regarding testDealWithLeadingTrailingWhitespace(), you're correct that we don't automatically trim and the only way to do that is to use a cell processor. I think this is logical. surroundingSpacesNeedQuotes can be set to true to force surrounding quotes for values with leading/trailing spaces, but setting to false (it defaults to false btw) just disables that behaviour (it will still quote if the quote mode says so, or if the value has a special character).

Here's some related issues:

Support backslash escaping (configurable escape character) Support backslash escaping (configurable escape character) #14
Support multi-character delimiter Support multi-character delimiter #10
Configurable line terminator for reading Configurable line terminator for reading #5

I don't think the feature tests should break the build (perhaps we should just @Ignore it and run it on demand), but it would be good to document our position (against the test method) on any features that aren't supported. For example, you guys might disagree, but I really hate the backslash escaping - so many issues are caused by invalid CSV and this feature only makes the situation worse.

ZioberMichal · 2015-07-09T05:58:06Z

Actually I wanted wrote these tests for "read" mode as well but at the beginning I just wanted show first draft and talk about that a little. Anyway, that a good point that some features are available in "read" and "write" mode and some only in one mode. I agree that @Ignore annotation should be used if do not want to break build.

kbilsted · 2015-07-09T06:13:39Z

How we organise the tests are still unclear to me but a test fixture for reading features and one for writing features sound great.

With the tests we need a table in the documentation to quickly give an overview of the features possibly with a column referring to the specific tests by linking to the source code maybe.

ZioberMichal · 2015-07-09T08:49:09Z

It is a good idea to create documentation which explicitly tells which features are available. A little sum up:

I will create one test class for read and one for write mode, TokenizerFeaturesReadTest and TokenizerFeaturesWriteTest accordingly.
Features which are not available will be represented by ignored tests.

When tests will be ready and "well implemented" we can create new section in documentation.

1. Added two new tests which explicitly checks features from http://csveed.org/comparison-matrix.html page (Tokenizer section).

ZioberMichal · 2015-07-10T22:23:13Z

I've pushed tests for Tokenizer read/write features. Let me know what do you think about this implementation.

1. Added two new tests which explicitly checks features from http://csveed.org/comparison-matrix.html page (Tokenizer section).

ZioberMichal · 2015-07-11T12:43:16Z

I've also added tests which explicitly checks features from http://csveed.org/comparison-matrix.html page from “Line Reading” section.

kbilsted · 2015-07-11T12:54:19Z

Great I was about to asking just that
Den 11/07/2015 14.43 skrev "Michał Ziober" notifications@github.com:

I've also added tests which explicitly checks features from
http://csveed.org/comparison-matrix.html page from “Line Reading” section.

—
Reply to this email directly or view it on GitHub
#56 (comment).

ZioberMichal · 2015-07-11T13:03:53Z

We have to write tests also for "Bean Mapping" and "Writing CSV" sections. "Documentation" sections is completed for "super-csv". I am not sure whether is something to do related with "Quality of error feedback", what do you think? "Convenience" section is also not clear for me.

Will you merge this PR as it is now or you want to wait for tests related with other sections?

kbilsted · 2015-07-11T13:57:57Z

Already merged.. but feel free to send more stuff 👍

ZioberMichal · 2015-07-11T23:20:11Z

I am not sure whether it is already merged ... :)

I am a little confused what some ideas mean in this comparison matrix in "Bean mapping" section.

Idea	Status	Question
converts to primitives	set manually	How it should be tested? What does it mean "set manually".
converts to basic Objects	set manually	What "basic Objects" mean? I am not sure...
Converter support	no	We have these cell processors. Do you think that this is it?
Returns typed Bean	yes	I am not sure how to test it. Method read returns generic `T` type. Should I use reflection to check whether this method returns generic type or we can skip it in test suite?

What do you think guys?

jamesbassett · 2015-07-12T00:08:11Z

My interpretation...

Idea	Status	James' interpretation
converts to primitives	set manually	Relates to using `CsvListReader` and converting columns to primitives (int, long, etc). This requires cell processors, i.e. 'set manually'
converts to basic Objects	set manually	Relates to using `CsvBeanReader`/`CsvDozerBeanReader` converting to simple POJOs (no deep or indexed mapping). This requires cell processors for all non-String fields, i.e. 'set manually'
Converter support	no	After looking at the csveed API (see [BeanReaderInstructions.setConverter](http://javadocs.csveed.org/org/csveed/bean/BeanReaderInstructions.html#setConverter%28java.lang.String, org.csveed.bean.conversion.Converter%29)), this is exactly the same as cell processors. But I think csveed can do some of this conversion automatically. We could achieve this by setting up a global mapping of cell processors to use when mapping, e.g. if we're reading a CSV column into an Integer and no cell processor is defined for a column, then use `new Optional(new ParseInt())`.
Returns typed Bean	yes	No idea - it says opencsv doesn't support this but it looks like it returns typed beans to me...

ZioberMichal · 2015-07-12T18:15:01Z

Thanks for it!

I've added new tests. Could you please, @kbilsted, @jamesbassett review them?

1. Added two new tests which explicitly checks features from http://csveed.org/comparison-matrix.html page.

#56 - test suite for Tokenizer read/write operations.

ZioberMichal · 2015-09-02T21:45:41Z

@kbilsted ,
Implemented tests covers below sections:

Tokenizer
Line Reading
Bean Mapping

Next sections mostly covers "concepts" not "implementation":

Writer - each our writer can write to Writer.
Documentation - done.
Quality of error feedback - I think that we should create new issue for each feature but mainly we have implemented this.
Convenience - as above
Open Source Community - done 😏

Do you think we can do anything more in this issue?

kbilsted · 2015-09-03T16:42:55Z

Sure create the new issues and close. Do we need an issue for making a documentation page giving an overview of all features of supercsv linking into the tests?

ZioberMichal · 2015-09-03T21:12:18Z

I think that we can create new documentation page in this issue.

P.S: Until first of October I have a lot of work and probably I will not be able to do it sooner.

ZioberMichal · 2015-11-01T23:10:58Z

I have created new issues which should be implemented:

Also, I have asked to update Comparison matrix in this task 42BV/CSVeed#65.

I checked documentation and it contains already all needed informations. I think that we can create new documentation page when above tasks will be finished.

kbilsted added the documentation label Jul 5, 2015

kbilsted mentioned this issue Jul 5, 2015

Improvements bundle #26

Open

kbilsted assigned ZioberMichal Jul 6, 2015

ZioberMichal added a commit to ZioberMichal/super-csv that referenced this issue Jul 10, 2015

super-csv#56 - test suite for Tokenizer read/write operations.

5c04420

1. Added two new tests which explicitly checks features from http://csveed.org/comparison-matrix.html page (Tokenizer section).

ZioberMichal added a commit to ZioberMichal/super-csv that referenced this issue Jul 11, 2015

super-csv#56 - test suite for Tokenizer read/write operations.

12dc06a

1. Added two new tests which explicitly checks features from http://csveed.org/comparison-matrix.html page (Tokenizer section).

ZioberMichal added a commit to ZioberMichal/super-csv that referenced this issue Jul 11, 2015

super-csv#56 - test suite for Tokenizer read/write operations.

961be90

1. Added two new tests which explicitly checks features from http://csveed.org/comparison-matrix.html page (Tokenizer section).

ZioberMichal added a commit to ZioberMichal/super-csv that referenced this issue Jul 15, 2015

super-csv#56 - test suite for read/write operations.

ea1a500

1. Added two new tests which explicitly checks features from http://csveed.org/comparison-matrix.html page.

kbilsted added a commit that referenced this issue Jul 31, 2015

Merge pull request #59 from ZioberMichal/master

9e8c338

#56 - test suite for Tokenizer read/write operations.

ZioberMichal closed this as completed Nov 1, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Create a test suite showing all the features of supercsv #56

Create a test suite showing all the features of supercsv #56

kbilsted commented Jul 5, 2015

ZioberMichal commented Jul 5, 2015

kbilsted commented Jul 5, 2015

ZioberMichal commented Jul 5, 2015

kbilsted commented Jul 6, 2015

ZioberMichal commented Jul 9, 2015

jamesbassett commented Jul 9, 2015

ZioberMichal commented Jul 9, 2015

kbilsted commented Jul 9, 2015

ZioberMichal commented Jul 9, 2015

ZioberMichal commented Jul 10, 2015

ZioberMichal commented Jul 11, 2015

kbilsted commented Jul 11, 2015

ZioberMichal commented Jul 11, 2015

kbilsted commented Jul 11, 2015

ZioberMichal commented Jul 11, 2015

jamesbassett commented Jul 12, 2015

ZioberMichal commented Jul 12, 2015

ZioberMichal commented Sep 2, 2015

kbilsted commented Sep 3, 2015

ZioberMichal commented Sep 3, 2015

ZioberMichal commented Nov 1, 2015

Create a test suite showing all the features of supercsv #56

Create a test suite showing all the features of supercsv #56

Comments

kbilsted commented Jul 5, 2015

ZioberMichal commented Jul 5, 2015

kbilsted commented Jul 5, 2015

ZioberMichal commented Jul 5, 2015

kbilsted commented Jul 6, 2015

ZioberMichal commented Jul 9, 2015

jamesbassett commented Jul 9, 2015

ZioberMichal commented Jul 9, 2015

kbilsted commented Jul 9, 2015

ZioberMichal commented Jul 9, 2015

ZioberMichal commented Jul 10, 2015

ZioberMichal commented Jul 11, 2015

kbilsted commented Jul 11, 2015

ZioberMichal commented Jul 11, 2015

kbilsted commented Jul 11, 2015

ZioberMichal commented Jul 11, 2015

jamesbassett commented Jul 12, 2015

ZioberMichal commented Jul 12, 2015

ZioberMichal commented Sep 2, 2015

kbilsted commented Sep 3, 2015

ZioberMichal commented Sep 3, 2015

ZioberMichal commented Nov 1, 2015