Feature request, duplicate column by cut #106
Comments
|
Hi @shenwei356,
Re 1: I would expect the results as:
As I listed the title two times after
Re 3: Did you use an upstream CSV parser library? If so, I understand that makes it hard for you to implement. I had a Python script do a similar thing as your
A related question: I benchmarked your Go version of cut against my Python version. For a CSV that does not have many columns, your version is faster. However, if a CSV has many columns, e.g. 10000, and we only need information from the first few columns, my Python version is much faster than your Go version. I think that makes sense, as my version was optimized to parse only the few columns I really need, not the whole line. You may be able to improve your version with this trick, but I defer to your decision. I am just providing feedback here for your reference.
Best regards |
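The Python script mentioned above is not shown in the thread, so the following is only a minimal sketch of the general idea, assuming a plain delimiter-separated line with no quoted fields. The helper name `first_fields` is hypothetical. It bounds the number of splits so that the remainder of a very wide line is never broken into fields.

```python
def first_fields(line, n, sep=","):
    """Return the first n fields of a line without splitting the rest.

    Assumption: fields contain no quoted separators or embedded newlines.
    """
    # maxsplit=n stops after n separators, so the tail stays one unsplit string.
    parts = line.rstrip("\n").split(sep, n)
    # Drop the unsplit tail (if any) and keep at most n fields.
    return parts[:n]


# A wide row with 10000 columns; only the first three are materialized.
wide_row = ",".join(str(i) for i in range(10000))
selected = first_fields(wide_row, 3)
```

The saving comes from skipping work on columns that are never used; it does not apply once a selected column sits near the end of the line.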
Hi Wallace, I'll implement features 1 and 2 when I have time. For feature 3, I could write a simple CSV parser, but it would take a lot of effort to implement the full feature set of a general, standard parser, for example handling multi-line fields. Performance may be affected by matching column names with regular expressions, which is done to support wildcards in column names (-F); this can be optimized. It also checks all column names in case the user gives nonexistent ones. |
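To illustrate why multi-line fields make a hand-written parser harder, here is a small sketch using Python's standard `csv` module (not csvtk's Go parser; the sample data is made up). A quoted field may contain a newline, so a parser that treats every physical line as one record will split that record in two.

```python
import csv
import io

# Made-up sample: the second record has a quoted field spanning two lines.
data = 'id,note\n1,"first line\nsecond line"\n2,plain\n'

# Naive approach: one record per physical line, split on commas.
naive_rows = [line.split(",") for line in data.splitlines()]

# Standard parser: keeps the quoted newline inside the field.
parsed_rows = list(csv.reader(io.StringIO(data)))

print(len(naive_rows))   # 4 physical lines
print(len(parsed_rows))  # 3 logical records
```

This is also why the "split only the first few columns" shortcut is safe only when the input is known to contain no quoted fields.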
Thanks much. That makes sense.
Best regards
Wallace
On Mon, Jul 20, 2020 at 8:47 PM Wei Shen ***@***.***> wrote:
Hi Wallace,
I'll implement features 1 and 2 when I have time.
For feature 3, I could write a simple CSV parser, but it would take a lot of
effort to implement the full feature set of a general, standard parser, for
example handling multi-line fields.
Performance may be affected by matching column names with regular
expressions, which is done to support wildcards in column names (-F); this
can be optimized. It also checks all column names in case the user gives
nonexistent ones.
--
Best regards
Wallace(Minxian) Wang
---------------------------------
Computational Biologist, Khera Lab.
Broad Institute of MIT and Harvard
415 Main St, Cambridge, MA 02142
|
It's supported now:
|
Thank you very much! - Wallace |
Just updated and it's supported. |
Dear @shenwei356,
Thank you very much for your great work.
I am writing to request a few new features:
1. Duplicate a column with csvtk cut -f. For example, -f 1,1 or -f ID1,ID1 would output two columns rather than just one column as in the current behavior.
2. Select a range to the end without needing to specify the end: -f2- would select from the second column to the end.
3. Support copying the comment lines to stdout instead of just ignoring them. Sometimes the comment lines are important for the next step of the analysis.
I find this behavior very helpful in my daily work.
Thank you very much for your help.
Best regards
Wallace
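The three requested behaviors can be sketched as follows. This is a hypothetical illustration in Python, not csvtk's implementation: the function names `expand_fields` and `cut`, the `#` comment prefix, and the naive comma splitting (no quoted fields) are all assumptions, and column names such as ID1 are not handled.

```python
def expand_fields(spec, ncols):
    """Expand a 1-based field spec such as '1,1' or '2-' into 0-based indices.

    Repeated fields are kept, and an open-ended range runs to the last column.
    """
    indices = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = part.split("-", 1)
            first = int(start) if start else 1
            last = int(end) if end else ncols  # open end: go to the last column
            indices.extend(range(first - 1, last))
        else:
            indices.append(int(part) - 1)
    return indices


def cut(lines, spec, sep=",", comment="#"):
    """Select columns from each line, passing comment lines through unchanged."""
    out = []
    for line in lines:
        if line.startswith(comment):
            out.append(line)  # feature 3: keep comment lines in the output
            continue
        fields = line.split(sep)  # assumption: no quoted separators
        picked = [fields[i] for i in expand_fields(spec, len(fields))]
        out.append(sep.join(picked))
    return out


result = cut(["# meta", "a,b,c", "1,2,3"], "1,1,2-")
# result == ['# meta', 'a,a,b,c', '1,1,2,3']
```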