Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support multiline fields with arbitrary separators #35

Open
mechatroner opened this issue Apr 2, 2022 · 3 comments
Open

Support multiline fields with arbitrary separators #35

mechatroner opened this issue Apr 2, 2022 · 3 comments

Comments

@mechatroner
Copy link
Owner

Currently, multiline fields are only supported with commas and semicolons, see #20 for context.

@122800
Copy link

122800 commented May 9, 2022

Hello @mechatroner, I've started looking into this issue and have two questions.

1 - As you've mentioned, we can enable arbitrary separators by removing this check:

if policy in ['quoted', 'quoted_rfc'] and selection_text not in [';', ',']:

However, from what I could tell, the policy will usually be defaulted to simple by the following lines:

sublime_rainbow_csv/main.py

Lines 459 to 463 in e2fa752

if policy == 'auto':
if selection_text in [';', ',']:
policy = 'quoted'
else:
policy = 'simple'

So unless the user takes explicit action to enable the quoted policy together with their non-standard separator (either by using the enable_quoted sublime-command, or through the configuration), the feature will still not appear to work as expected.

Does this analysis seem correct to you? If yes, can we consider removing this check as well (and therefore always defaulting auto to quoted), to change the behaviour to be more intuitive for users?

2 - I've seen that this same check on the standard separators [';', ','] also appears here (and as a side note, also in several locations in the RBQL dependency, but I have not delved into that):

if policy in ['quoted', 'quoted_rfc'] and delim in [';', ',']:
return (name, True, False)

I'm not so familiar with python and couldn't easily determine the impact of leaving this check here while removing it from other locations. Would you be able to indicate whether this part of the code must be aligned as well in order to more cleanly resolve this issue?

Thanks!

@mechatroner
Copy link
Owner Author

Your analysis is correct! I think that is the only place where changes should be made.

can we consider removing this check as well (and therefore always defaulting auto to quoted), to change the behaviour to be more intuitive for users?

I am not sure if this would be more intuitive, for example, tab-separated dialects are doublequote-agnostic in 100% of cases and I always thought that pipe-separated files also mostly use "simple" dialect (e.g. use case here: mechatroner/vscode_rainbow_csv#1 (comment) ). I admit that I don't have any statistics about pipe-separated file usage in different domains but I would prefer not to change the default behavior without strong supportive evidence. My impression was that choosing | as a separator allows to avoid double quotes because unlike , and ; characters it is much less likely to see it in the actual text, so quoting is not needed, and the advantage over tab is at it won't get lost/replaced with spaces during copy/paste operations.

We can add another setting though - like a list of separators that should use "quoted" dialect by default, or even a boolean setting like "quote all separators by default" as you suggest and handle it the main.py.

@122800
Copy link

122800 commented Jun 11, 2022

Hi @mechatroner, sorry for the late response.

I think that is the only place where changes should be made.

Thanks for confirming this 👍

choosing | as a separator allows to avoid double quotes because […] it is much less likely to see it in the actual text, so quoting is not needed

My use case was actually not about escaping | in the text, but rather enabling the handling of line breaks in the text. So I would have needed quoted fields regardless of what separator was in use.

I would prefer not to change the default behavior without strong supportive evidence

But if nobody else has complained, then you're right, there's no need to modify the behaviour for an edge case such as mine. I don't think there's a real need to add any new configuration options, since the existing commands (after this fix) can be used to make this work. Maybe all that's needed is to make sure that the documentation helps users to achieve this if they need?

With this I'm all set to work on getting you a pull request, then :) Thanks for your feedback so far!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants