Feature request, duplicate column by cut #106

Closed
wavefancy opened this issue Jul 9, 2020 · 7 comments

@wavefancy

wavefancy commented Jul 9, 2020

Dear @shenwei356,

Thank you very much for your great work.
I am writing to request some new features:

  1. Duplicate a column with csvtk cut -f.
    For example:
    -f 1,1
    -f ID1,ID1
    This would output two columns rather than the single column produced by the current behavior.

  2. Select a range to the end without needing to specify the end:
    -f2- would select from the second column to the end.

  3. Support copying comment lines to stdout instead of just ignoring them.
    Sometimes the comment lines are important for the next step of the analysis.
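To make requests 1 and 2 concrete, here is a minimal Python sketch of the requested selection semantics. It is not csvtk's code: `parse_field_spec` and `cut_row` are hypothetical names, and the sketch only handles the forms `N`, `N-M` and `N-` with 1-based column numbers, not names or any other syntax csvtk may accept.

```python
def parse_field_spec(spec, n_columns):
    """Expand a 1-based spec such as "1,1" or "2-" into 0-based indices.

    Duplicates are kept in the order given, so "1,1" selects column 1 twice.
    Hypothetical helper for illustration; not csvtk's implementation.
    """
    indices = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            if not start_text:
                raise ValueError(f"unsupported field spec: {part!r}")
            start = int(start_text)
            # An empty end ("2-") means "through the last column".
            end = int(end_text) if end_text else n_columns
        else:
            start = end = int(part)
        if not 1 <= start <= end <= n_columns:
            raise ValueError(f"field out of range: {part!r}")
        indices.extend(range(start - 1, end))
    return indices


def cut_row(row, indices):
    """Pick the selected columns from one parsed row, repeats included."""
    return [row[i] for i in indices]
```

With this sketch, `cut_row(["a", "b", "c"], parse_field_spec("1,1", 3))` gives `["a", "a"]`, which is the duplicated-column output asked for in request 1.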

I have found these behaviors very helpful in my daily work.

Thank you very much for your help.

Best regards
Wallace

@shenwei356
Owner

  1. everything is alright.

     $ echo -ne "ID1\n1\n2\n" |  csvtk -t cut -f ID1,ID1
     ID1
     1
     2
    
  2. Reasonable.

  3. It's very hard to implement, because the CSV reader ignores comment lines. Maybe you can find a workaround, e.g., grep and save the header lines, then cat them back at the end.
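The workaround suggested in item 3 (set the comment lines aside, process the rest, then put the comments back) could look roughly like the Python sketch below. The function name, the `#` comment character and the tab delimiter are assumptions for illustration. The sketch writes all comment lines first, so their original positions are not preserved, and it would misclassify a line inside a quoted multi-line field that happens to start with `#`.

```python
import csv
import io


def cut_keeping_comments(text, indices, comment_char="#", delimiter="\t"):
    """Select columns from delimited text while passing comment lines through.

    Hypothetical helper: comment lines are emitted first, then the cut data.
    """
    comments, data = [], []
    for line in io.StringIO(text):
        (comments if line.startswith(comment_char) else data).append(line)

    out = io.StringIO()
    out.write("".join(comments))
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    # csv.reader accepts any iterable of lines, so the saved list works directly.
    for row in csv.reader(data, delimiter=delimiter):
        writer.writerow([row[i] for i in indices])
    return out.getvalue()
```

For example, `cut_keeping_comments("# note\nID1\tID2\n1\t2\n", [0, 0])` keeps the `# note` line and duplicates the first column in the remaining rows.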

@wavefancy
Author

wavefancy commented Jul 20, 2020

Hi @shenwei356 ,

re 1. I would expect the results as:

echo -ne "ID1\n1\n2\n" |  csvtk -t cut -f ID1,ID1
 ID1 ID1
 1 1
 2 2

I listed the column name twice after -f because I meant to duplicate the column; otherwise I would only need to specify ID1 once. I have found this feature very helpful in my daily analysis.

re 3. Do you use an upstream CSV parser library, and is that why it is hard to implement? I have a Python script that does something similar to your csvtk cut, and this feature was very easy to implement there, but I would prefer yours, as I think Go should be more efficient than Python. I defer to you on whether to implement this feature.

A related point: I benchmarked your Go version of cut against my Python version. For a CSV without many columns, your version is faster. However, if a CSV has many columns, e.g. 10000, and we only need information from the first few columns, my Python version is much faster than your Go version. I think that makes sense, as my version was optimized to parse only the few columns I really need, not the whole line. You may be able to improve your version with this trick, but I defer to your decision. I am just providing this feedback for your reference.
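The script itself is not shown in this thread, so the following is only a guess at the kind of trick described: stop splitting a line once the needed leading columns have been extracted. `first_columns` is a hypothetical name, and the sketch assumes unquoted fields, since a quoted field containing the delimiter would be split incorrectly.

```python
def first_columns(line, n, delimiter="\t"):
    """Return up to the first n fields of an unquoted delimited line.

    str.split with a maxsplit of n stops scanning after n delimiters, so the
    remainder of a very wide line stays as one unparsed string that is then
    dropped by the slice.
    """
    return line.rstrip("\r\n").split(delimiter, n)[:n]
```

On a line with 10000 tab-separated fields, `first_columns(line, 3)` only scans as far as the third delimiter. A full parser cannot take this shortcut as easily if it also has to handle quoting and validate every field.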

Best regards
Wallace

@shenwei356
Owner

Hi Wallace,

I'll implement features 1 and 2 when I have time.

For feature 3, I could write a simple CSV parser, but it would take a lot of effort to implement the full feature set of a general, standard parser, for example handling multi-line fields.
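As an illustration of why multi-line fields make a hand-written parser harder, the Python sketch below compares naive line splitting with the standard library `csv` module on a quoted field that contains a newline. The sample data is made up for this example.

```python
import csv
import io

# One header row and one record whose quoted "note" field spans two lines.
text = 'id,note\n1,"first line\nsecond line"\n'

# Naive approach: treat every physical line as a record.
naive_rows = [line.split(",") for line in text.splitlines()]

# Standard parser: the quoted newline stays inside a single field.
parsed_rows = list(csv.reader(io.StringIO(text, newline="")))

print(len(naive_rows))   # 3 physical lines
print(len(parsed_rows))  # 2 logical records
```

The naive version sees three rows and breaks the record in the middle, while the standard parser returns two rows with the newline kept inside the second field.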

Performance may be affected by matching column names with regular expressions, which is needed to support column names containing wildcards (-F); this can be optimized. It also checks all column names in case the user gives nonexistent ones.
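One possible shape for such an optimization, sketched in Python rather than Go and not taken from csvtk's source: resolve exact names through a dictionary lookup and fall back to pattern matching only when wildcard matching is requested and the name actually contains a wildcard character. `resolve_columns` and its `fuzzy` parameter are hypothetical, and `fnmatch` stands in for whatever matching csvtk really uses.

```python
from fnmatch import fnmatchcase


def resolve_columns(header, wanted, fuzzy=False):
    """Map requested column names to 0-based indices, keeping duplicates.

    Exact names use a dict lookup; patterns are matched against every header
    name only when fuzzy is True. Unknown names raise KeyError.
    """
    position = {}
    for i, name in enumerate(header):
        position.setdefault(name, i)  # first occurrence wins

    indices = []
    for name in wanted:
        if fuzzy and any(ch in name for ch in "*?["):
            matches = [i for i, col in enumerate(header) if fnmatchcase(col, name)]
        else:
            matches = [position[name]] if name in position else []
        if not matches:
            raise KeyError(f"column not found: {name}")
        indices.extend(matches)
    return indices
```

With the header `["ID1", "ID2", "name"]`, the request `["ID1", "ID1"]` resolves to `[0, 0]` without any pattern matching, while `["ID*"]` with `fuzzy=True` resolves to `[0, 1]`.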

@wavefancy
Author

wavefancy commented Jul 21, 2020 via email

@shenwei356
Owner

It's supported now:

$ echo -ne "ID1\n1\n2\n" |  csvtk -t cut -f ID1,ID1
ID1     ID1
1       1
2       2

@wavefancy
Author

Thank you very much! - Wallace

@shenwei356
Owner

  1. Select a range to the end without needing to specify the end:
    -f2- would select from the second column to the end.

Just updated and it's supported.
