Make bat autodetect CSV separator #1574

Closed
bestouff opened this issue Mar 5, 2021 · 6 comments · Fixed by #1601
Assignees
Labels
feature-request New feature or request

Comments

@bestouff

bestouff commented Mar 5, 2021

Hi,

CSV files often use a semicolon (;) as the separator instead of a comma (,). There are also so-called TSV files, where the separator is a tab character. Would it be possible for bat to autodetect the separator?

bestouff added the feature-request label on Mar 5, 2021
@sharkdp
Owner

sharkdp commented Mar 7, 2021

Thank you for your request. That would be great, I agree.

I'm not sure if we can really "detect" the separator, but I guess we could write a syntax file that would accept all ,, ; and \t. @keith-hall could certainly tell us more about whether this could work or not.
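
For illustration only, here is a minimal sketch of what a separator "detection" heuristic could look like. It is not how bat works (bat highlights via syntax definitions); the function names, the candidate list and the scoring rule are all assumptions made for this example. The idea is to count each candidate outside of double quotes on the first few lines and prefer the one whose per-line count is non-zero and most consistent.

```python
from collections import Counter

# Candidate separators mentioned in this thread (an assumption, not an exhaustive list).
CANDIDATES = (",", ";", "\t")


def count_outside_quotes(line: str, delimiter: str) -> int:
    """Count occurrences of `delimiter` that are not inside a double-quoted field."""
    in_quotes = False
    count = 0
    for ch in line:
        if ch == '"':
            # A doubled "" toggles twice, so the quoted state is unchanged afterwards.
            in_quotes = not in_quotes
        elif ch == delimiter and not in_quotes:
            count += 1
    return count


def guess_separator(text: str, sample_lines: int = 10) -> str:
    """Return the candidate with the most consistent non-zero count; default to a comma."""
    lines = [line for line in text.splitlines()[:sample_lines] if line.strip()]
    best, best_score = ",", 0
    for candidate in CANDIDATES:
        counts = [count_outside_quotes(line, candidate) for line in lines]
        if not counts:
            continue
        typical_count, frequency = Counter(counts).most_common(1)[0]
        if typical_count == 0:
            continue  # this candidate does not appear on most lines
        score = typical_count * frequency
        if score > best_score:
            best, best_score = candidate, score
    return best
```

This per-line heuristic ignores quoted fields that span several lines, so it can only ever be a guess. Python's standard library offers a comparable heuristic as `csv.Sniffer().sniff(sample, delimiters=",;\t")`.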

Another thing that I have been thinking about is to use a concept similar to https://packagecontrol.io/packages/rainbow_csv where we would display different CSV columns with different (or alternating) colors. Unfortunately, https://packagecontrol.io/packages/rainbow_csv does not seem to properly support quoting (such as 1,"column two with text, and a comma",3,…).
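
To make the quoting problem concrete, the following is a small sketch of a quote-aware field splitter. The function name `split_fields` is hypothetical and the code is unrelated to either bat or the rainbow_csv package. It shows why a naive split on the separator is not enough: a separator inside a double-quoted field must not end the field, and a doubled quote inside a quoted field stands for one literal quote.

```python
def split_fields(line: str, delimiter: str = ",") -> list[str]:
    """Split one CSV line into fields, honoring double-quoted fields."""
    fields: list[str] = []
    current: list[str] = []
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < len(line) and line[i + 1] == '"':
                    current.append('"')  # doubled quote means one literal quote
                    i += 1
                else:
                    in_quotes = False  # closing quote
            else:
                current.append(ch)  # separators inside quotes are plain text
        elif ch == '"':
            in_quotes = True
        elif ch == delimiter:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields


# The visible part of the example line quoted in this comment:
print(split_fields('1,"column two with text, and a comma",3'))
```

A highlighter that colors by column needs the same state tracking, otherwise the comma inside the quoted text would start a new column.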

@keith-hall
Collaborator

> we could write a syntax file that would accept all ,, ; and \t

Yep, that's certainly possible

> Unfortunately, https://packagecontrol.io/packages/rainbow_csv does not seem to properly support quoting (such as 1,"column two with text, and a comma",3,…).

I'm surprised at that, the screenshot in the readme of that package seems to suggest it handles quoted values.

@sharkdp
Owner

sharkdp commented Mar 7, 2021

> I'm surprised at that, the screenshot in the readme of that package seems to suggest it handles quoted values.

My mistake. It does. I only looked at the last lines of our test file previously:

image

but that example is pretty challenging, if not to say... broken.

The "rainbow" doesn't look too nice though. Not sure if we would have to make any changes to bat theming to support that properly.

@keith-hall
Collaborator

> but that example is pretty challenging, if not to say... broken.

I guess some CSV parsers support newlines in quoted values (i.e. so it's not treated as a row-terminator), and double quotes to escape the quotation marks, so it could be considered not broken, just unusual - one shouldn't really use CSV in such cases IMHO ;)
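
As one example of such a parser, Python's standard `csv` module accepts both features. The sample data below is made up for this sketch and is not the test file from the screenshot.

```python
import csv
import io

# Made-up sample: a quoted field containing a newline, and a quoted field
# containing doubled quotes that stand for literal quotation marks.
raw = 'id,comment\n1,"first line\nsecond line"\n2,"she said ""hi"""\n'

rows = list(csv.reader(io.StringIO(raw)))

for row in rows:
    print(row)
```

The newline inside the quoted field does not end the record, so the four physical lines parse into three rows. A line-based highlighter has no such context unless its syntax definition carries the "inside quotes" state across lines.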

> The "rainbow" doesn't look too nice though. Not sure if we would have to make any changes to bat theming to support that properly.

Can you be more specific about what you don't like about it please? Does it look different in bat versus Sublime Text? I guess it's also worth comparing it to what we have now - our current CSV highlighting highlights numbers, commas and quotes if I understand correctly, and everything else also gets a color instead of the default foreground color. (This latter part makes it very noisy IMO.)

@sharkdp
Owner

sharkdp commented Mar 27, 2021

> Can you be more specific about what you don't like about it please?

Mostly the colors. But that obviously depends on the theme. I don't like that the default (white) color is used. And that the dim comment color is used ("city"). Also, it would be great if the separator character would be highlighted in a different (consistent) way.

keith-hall self-assigned this on Mar 27, 2021
@keith-hall
Collaborator

keith-hall commented Mar 27, 2021

> I don't like that the default (white) color is used. And that the dim comment color is used ("city"). Also, it would be great if the separator character would be highlighted in a different (consistent) way.

I completely agree :)

I've started working on this, and it colors commas and semi-colons as separators, and treats tabs as separators but doesn't currently color them... Here is how it looks so far:
image

What do we think about quoted strings? Currently, I've got strings as their own separate color, i.e. the same behavior as other syntaxes, and escaped quotes are highlighted the same way character escapes usually are in quoted strings. But do we want to keep the "Advanced CSV" behavior where each field gets a separate color, regardless of whether it is quoted or not?

Monokai only really has about 5 different colors to choose from, when we ignore strings and comment colors, so that's all I have added support for, at least for now. Feedback welcome, after which I can tidy it up and submit a PR :) (Would it make sense to leave the default (white in this case) color for separators or anything to give us an extra color to work with?)
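
The "one color per column, cycling through about five colors, separators left in the default color" idea can be sketched outside of bat as follows. This is only an illustration using ANSI escape codes: the palette, the function names and the choice to leave separators uncolored are assumptions for the example, and the real implementation would be a syntax definition mapped to theme scopes rather than code like this.

```python
RESET = "\x1b[0m"
# Five ANSI foreground colors standing in for the roughly five usable theme colors.
PALETTE = ["\x1b[31m", "\x1b[32m", "\x1b[33m", "\x1b[34m", "\x1b[35m"]


def raw_fields(line: str, delimiter: str = ",") -> list[str]:
    """Split on `delimiter` outside double quotes, keeping the quotes in the output."""
    fields = []
    start = 0
    in_quotes = False
    for i, ch in enumerate(line):
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == delimiter and not in_quotes:
            fields.append(line[start:i])
            start = i + 1
    fields.append(line[start:])
    return fields


def rainbow(line: str, delimiter: str = ",") -> str:
    """Color each column with the next palette entry, wrapping after the last one."""
    colored = [
        f"{PALETTE[index % len(PALETTE)]}{field}{RESET}"
        for index, field in enumerate(raw_fields(line, delimiter))
    ]
    # Separators are joined in uncolored, so they keep the default foreground color.
    return delimiter.join(colored)


print(rainbow('1,"quoted, with comma",3,4,5,6'))
```

With five palette entries, the sixth column reuses the first color, which is the trade-off raised above about the limited number of colors available.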


3 participants