Feature request, duplicate column by cut #106

wavefancy · 2020-07-09T18:41:22Z

Dear @shenwei356,

Thank you very much for your great work.
I am writing to request some new features:

1. Duplicate columns with csvtk cut -f.
   For example:
   -f 1,1
   -f ID1,ID1
   would output two columns rather than just one, as in the current behavior.
2. Select a range to the end, without needing to specify the end:
   -f2-, would select from the second column to the end.
3. Support copying comment lines to stdout instead of just ignoring them.
   Sometimes the comment lines are important for the next analysis step.

I would find these behaviors very helpful in my daily work.

Thank you very much for your help.

Best regards
Wallace

shenwei356 · 2020-07-18T16:18:49Z

1. Everything is alright:

 $ echo -ne "ID1\n1\n2\n" |  csvtk -t cut -f ID1,ID1
 ID1
 1
 2

2. Reasonable.
3. It's very hard to implement, because the CSV reader ignores comment lines. Maybe you can find a workaround, e.g., grep and save the headers, then cat them back at the end.
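For readers who want to script the grep-and-cat workaround, here is a minimal Python sketch. It assumes comment lines start with `#` and that it is acceptable to emit them all before the data. The names `split_comments`, `cut_keep_comments`, and `first_column` are hypothetical, and `first_column` is only a stand-in for whatever command (such as csvtk cut) processes the data lines.

```python
def split_comments(lines, comment_char="#"):
    """Separate comment lines from data lines, keeping order within each group."""
    comments, data = [], []
    for line in lines:
        (comments if line.startswith(comment_char) else data).append(line)
    return comments, data


def cut_keep_comments(text, process, comment_char="#"):
    """Run `process` on the data lines only, then put the saved comments back on top."""
    comments, data = split_comments(text.splitlines(keepends=True), comment_char)
    return "".join(comments) + process("".join(data))


def first_column(data, delimiter="\t"):
    """Stand-in for the real tool: keep only the first tab-delimited column."""
    return "".join(line.split(delimiter)[0] + "\n" for line in data.splitlines())


if __name__ == "__main__":
    sample = "#source=demo\nID1\tID2\n1\ta\n2\tb\n"
    print(cut_keep_comments(sample, first_column), end="")
```

Note that comments interleaved with data rows lose their original position with this approach; they are all moved to the top.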

wavefancy · 2020-07-20T15:36:21Z

Hi @shenwei356 ,

re 1. I would expect the results as:

echo -ne "ID1\n1\n2\n" |  csvtk -t cut -f ID1,ID1
 ID1 ID1
 1 1
 2 2

I listed the column name twice after -f because I meant to duplicate the column; otherwise I would specify ID1 only once. I have found this feature very helpful in my daily analysis.

re 3. Do you use an upstream CSV parser library, and is that why it is hard to implement? I have a Python script that does something similar to your csvtk cut, and this feature was very easy to implement there, but I would prefer yours, as I think Go may be more efficient than Python. I defer to you on whether to implement this feature.

A related question: I benchmarked your Go version of cut against my Python version. For a CSV file without many columns, your version is faster. However, if a CSV file has many columns, e.g. 10000, and we only need the first few columns, my Python version is much faster than your Go version. I think that makes sense, as my version is optimized to parse only the few columns I really need, not the whole line. You may be able to improve your version with this trick, but I defer to your decision. I am just providing this feedback for your reference.
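The idea of parsing only the leading columns can be sketched in Python as follows. This is not the actual script mentioned here, only an illustration under a strong assumption: fields never contain the delimiter inside quotes and never span lines, so a plain string split is safe. The function name `leading_fields` is hypothetical.

```python
def leading_fields(line, n, delimiter="\t"):
    """Return the first n fields of a line without splitting the remainder.

    str.split with a maxsplit of n stops after n delimiters, so the rest of a
    very wide line stays as a single unsplit string that is then discarded.
    """
    parts = line.rstrip("\r\n").split(delimiter, n)
    return parts[:n]


if __name__ == "__main__":
    wide_line = "\t".join(str(i) for i in range(10000)) + "\n"
    print(leading_fields(wide_line, 3))
```

A general CSV parser cannot take this shortcut as easily, because it has to track quoting across the whole record.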

Best regards
Wallace

shenwei356 · 2020-07-21T00:47:34Z

Hi Wallace,

I'll implement features 1 and 2 when I have time.

For feature 3, I could write a simple CSV parser, but it takes a lot of effort to implement the full features of a general standard parser, such as handling multi-line fields.

Performance may be affected by matching column names with regular expressions, which is done to support column names containing wildcards (-F); this can be optimized. It also checks all column names in case the user gives nonexistent ones.
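One possible shape for that optimization, shown as a Python sketch rather than csvtk's actual Go code: exact column names are resolved through a dictionary built once from the header, and nonexistent names are reported up front. Wildcard patterns would still need pattern matching and are not covered. The function name `resolve_columns` is hypothetical.

```python
def resolve_columns(header, wanted):
    """Map requested column names to 0-based indices, rejecting unknown names."""
    index_of = {}
    for i, name in enumerate(header):
        index_of.setdefault(name, i)  # if a name repeats in the header, the first wins
    missing = [name for name in wanted if name not in index_of]
    if missing:
        raise KeyError("column(s) not found: " + ", ".join(missing))
    return [index_of[name] for name in wanted]


if __name__ == "__main__":
    print(resolve_columns(["ID1", "ID2", "ID3"], ["ID3", "ID1", "ID1"]))
```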

wavefancy · 2020-07-21T06:03:26Z

Thanks very much. That makes sense.

Best regards
Wallace

--
Best regards
Wallace (Minxian) Wang
Computational Biologist, Khera Lab
Broad Institute of MIT and Harvard
415 Main St, Cambridge, MA 02142

shenwei356 · 2020-10-30T10:16:10Z

It's supported now:

$ echo -ne "ID1\n1\n2\n" |  csvtk -t cut -f ID1,ID1
ID1     ID1
1       1
2       2

wavefancy · 2020-10-30T15:25:04Z

Thank you very much!

Wallace

shenwei356 · 2020-11-02T13:13:47Z

> Select range to the end, without need to specify the end:
> -f2-, will select from the second column to the end.

Just updated, and it's supported.
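To illustrate the two behaviors requested in this issue (repeated fields and open-ended ranges), here is a small Python sketch. It is not csvtk's implementation and covers only numeric specs of the form `N`, `N-M`, and `N-`; column names and any other csvtk field syntax are out of scope. The names `parse_fields` and `cut_row` are hypothetical.

```python
def parse_fields(spec, n_columns):
    """Expand a 1-based field spec such as "1,1" or "2-" into 0-based indices."""
    indices = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start = int(start_text)  # an empty start such as "-3" raises ValueError
            end = int(end_text) if end_text else n_columns  # "2-" runs to the last column
        else:
            start = end = int(part)
        if not 1 <= start <= end <= n_columns:
            raise ValueError("field out of range: " + repr(part))
        indices.extend(range(start - 1, end))
    return indices


def cut_row(row, spec):
    """Select fields from one row; repeated fields are emitted repeatedly."""
    return [row[i] for i in parse_fields(spec, len(row))]


if __name__ == "__main__":
    print(cut_row(["ID1", "ID2", "ID3"], "1,1"))
    print(cut_row(["ID1", "ID2", "ID3"], "2-"))
```

Because the indices are kept in a list rather than a set, `1,1` naturally yields the column twice and the requested order is preserved.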

shenwei356 added the new feature label Jul 18, 2020

shenwei356 closed this as completed in 1fdd424 Oct 30, 2020

shenwei356 added a commit that referenced this issue Nov 2, 2020

csvtk cut: supporting select range to the end. e.g., -f 2- . fix #106

677b38d
