Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added possibility to change csv delimiter and csv quote character #86

Closed
wants to merge 1 commit into from

Conversation

snuyanzin
Copy link
Collaborator

@snuyanzin snuyanzin commented Jul 31, 2018

This PR provides possibility to change csv delimiter and csv quote character. Also it fixes escaping in case of csv/tsv outputFormat. I guess it fixes #50 and partly #51
Also there is a fix for #87 as proposed in #87

@julianhyde
Copy link
Owner

Thanks, for this; this is useful. I have reviewed 86dd42c; a few comments:

  • In SeparatedValuesOutputFormat, could we keep the behavior closer to "immutable", and have delimiter/quote as final fields. Then, if you change the delimiter or quote, you would have to create a new SeparatedValuesOutputFormat. I know you would have to make map() return a modifiable map.
  • I would prefer if the syntax for !set (and indeed all ! commands) were analogous to bash, and require single and double quotes to be closed. You can enclose a single in doubles, or enclose a double in singles. For example:
    ** !set csvDelimiter '"' # ok - double single double
    ** !set csvDelimiter "'' # ok - single double single
    ** !set csvDelimiter ' # not ok - naked single
    ** !set csvDelmiter " # not ok - naked double
    ** !set csvDelimiter '!' # ok - single bang single

@snuyanzin
Copy link
Collaborator Author

Thank you for your review and comments
I understand your point however I have some clarifying questions relating to quotes:
What about escaping? For example
!set csvDelimiter '\'' # should it be treated as a single quote or as \'?
Are single and double quotes equivalent because in bash they are not [1]

[1] http://www.gnu.org/software/bash/manual/bashref.html#Single-Quotes

@julianhyde
Copy link
Owner

Yes there are subtle differences between single and double quote in bash (backslash behavior, environment variable expansion, globbing). We might get to those at some point. However for now, it's sufficient that we allow both, and require each to be closed. So, 'a' is ok, "a" is ok, ' is not ok.

@snuyanzin
Copy link
Collaborator Author

snuyanzin commented Aug 1, 2018

I did updates. If they are acceptable I will squash
Some words about how I did it and what could be done more (may be in different issues)

  • SeparatedValuesOutputFormat is almost 'immutable' while for any new delimiter separated formats there is a map (WeakHashMap as there is no dependency to anything like Guava).
  • Currently there is implementation which works for all the cases you have provided in this issue (there is a separate test for them now sqlline.SqlLineTest#testDequote). At the same time there are still cases where it works not well for instance:
    !set csvDelimiter 'a
    !set csvDelimiter "a
    !set csvDelimiter a'
    The reason is sqlline.SqlLineArgsTest#testRecordFilenameWithSpace which starts failing as soon as such validation is added. I did some analysis of that and it looks like the issue is in sqlline.AbstractCommandHandler#matches where instead of taking only first element of parts which is required the code does splitting the whole filename with spaces and tries to do dequote for each of its part. I would suggest to introduce some limit for sqlline.SqlLine#split like it is for String#split but I think it is part for a different issue for example sqlline.SqlLine#dequote fails with StringIndexOutOfBoundsException in case of odd number of quotes #87

@snuyanzin
Copy link
Collaborator Author

snuyanzin commented Aug 1, 2018

just realized that here

I know you would have to make map() return a modifiable map.

You talk about the method from SqlLine => I changed to have map returning modifiable map rather than weakHashMap

@snuyanzin
Copy link
Collaborator Author

snuyanzin commented Aug 4, 2018

At the same time there are still cases where it works not well for instance:
!set csvDelimiter 'a
!set csvDelimiter "a
!set csvDelimiter a'

Now I found a way to cope with it and added such tests as well
please let me know in case of questions/objections

@julianhyde
Copy link
Owner

As of c1bda8d this seems to work fairly well.

I found a couple of bugs. First, if you set delimiter 'null', execute a query, then set delimiter ',':

sqlline> !connect jdbc:hsqldb:res:scott SCOTT TIGER
[INFO ] 15:04:42.387 hsqldb.db.HSQLDB6511455664.ENGINE.privlog() - open start - state modified
[INFO ] 15:04:42.442 hsqldb.db.HSQLDB6511455664.ENGINE.privlog() - Checkpoint start
[INFO ] 15:04:42.443 hsqldb.db.HSQLDB6511455664.ENGINE.privlog() - Checkpoint end - txts: 25
0: jdbc:hsqldb:res:scott> !set outputformat csv
0: jdbc:hsqldb:res:scott> !set csvdelimiter null
0: jdbc:hsqldb:res:scott> select * from dept;
'DEPTNO'null'DNAME'null'LOC'
'10'null'ACCOUNTING'null'NEW YORK'
'20'null'RESEARCH'null'DALLAS'
'30'null'SALES'null'CHICAGO'
'40'null'OPERATIONS'null'BOSTON'
4 rows selected (0.011 seconds)
0: jdbc:hsqldb:res:scott> !set csvdelimiter ,
0: jdbc:hsqldb:res:scott> select * from dept;
java.lang.NullPointerException
	at sqlline.SqlLine.print(SqlLine.java:1678)
	at sqlline.Commands.execute(Commands.java:834)
	at sqlline.Commands.sql(Commands.java:733)
	at sqlline.SqlLine.dispatch(SqlLine.java:796)
	at sqlline.SqlLine.begin(SqlLine.java:669)
	at sqlline.SqlLine.start(SqlLine.java:374)
	at sqlline.SqlLine.main(SqlLine.java:266)

Second, you can't set the delimiter to space:

sqlline> !set csvdelimiter ' '
java.lang.IllegalArgumentException: A quote should be closed
	at sqlline.SqlLine.dequote(SqlLine.java:1338)
	at sqlline.SqlLine.split(SqlLine.java:1376)
	at sqlline.SqlLine.split(SqlLine.java:1167)
	at sqlline.SqlLine.split(SqlLine.java:1154)
	at sqlline.SqlLine.split(SqlLine.java:1442)
	at sqlline.Commands.set(Commands.java:514)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:564)
	at sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:38)
	at sqlline.SqlLine.dispatch(SqlLine.java:792)
	at sqlline.SqlLine.begin(SqlLine.java:669)
	at sqlline.SqlLine.start(SqlLine.java:374)
	at sqlline.SqlLine.main(SqlLine.java:266)

…f quoted text, fix NPE found while review stage
@snuyanzin
Copy link
Collaborator Author

snuyanzin commented Aug 7, 2018

Good catch. Thank you, I forgot that splitting firstly goes by spaces.
Unfortunately I did not found a way to have it working with tokenizers => I rework split method and added more tests. Now it does not split quoted strings.

About NPE: it looks like there is a more generic issue as null without quotes treated as null.
Here I fix only the case mentioned by you earlier however for all of other NPEs I created a separate issue #88

@julianhyde
Copy link
Owner

Merged as 8e0061f. Thanks for the PR, @snuyanzin!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

With outputformat=csv, bad generated CSV document when single quotes in input data
2 participants