Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add --fields opt to string split #6770

Closed
wants to merge 9 commits into from
Closed

Conversation

ammgws
Copy link
Contributor

@ammgws ammgws commented Mar 19, 2020

Description

Adds --fields option to string split, similar to the one cut has.

  • Fields can be specified as a comma separated string
  • Indexing starts at 1. Non-positive fields error out the function
  • Each field is printed on a newline, same as current string split behaviour
  • Non-existent fields cause function to error out (whereas cut skips over them)
> string split --fields=1,9,3 ";" "a;b;c"
[ret 1]
>
>echo "a;b;c" | cut -d';' -f1,9,3
a;c

Fixes issue #

...Need to find out the issue where this may have been mentioned...

TODOs:

  • Changes to fish usage are reflected in user documentation/manpages.
  • Tests have been added
  • User-visible changes noted in CHANGELOG.md

@ammgws ammgws force-pushed the splitfields branch 2 times, most recently from 93ce226 to a03fe55 Compare March 20, 2020 15:53
@ammgws ammgws changed the title WIP/RFC: add --fields opt to string split Add --fields opt to string split Mar 20, 2020
@faho
Copy link
Member

faho commented Mar 21, 2020

So my cut manpage says it supports the following:

   Use one, and only one of -b, -c or -f.  Each LIST is made up of one range, or many ranges separated by commas.  Selected input is written in the same order that it is read, and is
   written exactly once.  Each range is one of:

   N      N'th byte, character or field, counted from 1

   N-     from N'th byte, character or field, to end of line

   N-M    from N'th to M'th (included) byte, character or field

   -M     from first to M'th (included) byte, character or field

Looking through our scripts (grep -r '\bcut\b' share/) I see we have a few uses of N-M and N-, so they would be useful.

They can of course be left for later.

@ammgws
Copy link
Contributor Author

ammgws commented Mar 21, 2020

A couple of ways this implementation is different to cut is that it allows arbitrary ordering of the fields:

>string split --fields=3,9,1 "" abc
c
a

>echo "a;b;c;" | cut -d';' -f3,9,1
a;c

and the fields may be printed out more than once:

echo "a;b;c;" | cut -d';' -f3,9,1,3
a;c

>string split --fields=3,9,1,3 "" abc
c
a
c

Seems more versatile to me this way - any thoughts?

Trying to think of whether there any possible gotchas to the end-user doing it this way...

@ammgws
Copy link
Contributor Author

ammgws commented Mar 21, 2020

Code is probably amateurville, but I just added support for the "N-M" feature mentioned above. I'll leave N- and -M for someone else to do later since will probably need to overhaul everything to handle it.

@faho
Copy link
Member

faho commented Mar 21, 2020

it allows arbitrary ordering of the fields:

and the fields may be printed out more than once:

That's probably alright. It might be useful (say you're preparing a csv file for handling elsewhere).

(cut sorts the fields and ignores duplicates, which seems weird)

@ammgws
Copy link
Contributor Author

ammgws commented Mar 21, 2020

One thing I'm still not sure about is what to do if a specified field doesn't exist. Right now it just ignores it like cut but it seems like a recipe for disaster if you don't realise only two fields were output when you were expecting three.

@ammgws
Copy link
Contributor Author

ammgws commented Mar 22, 2020

Perhaps print nothing and return error if field is not found?

string split --fields=3,9,1 "" abc
(return code 1)

@ammgws
Copy link
Contributor Author

ammgws commented Mar 24, 2020

Fixed conflicts with master and made the feature return 1 if any of the given fields is non-existent.

@faho Any thoughts apart from the code itself?

@faho
Copy link
Member

faho commented Mar 26, 2020

So the use case shows a problem with this: You're gonna use it like string split | string join often. That should probably have a shortcut.

Otherwise I think you got it. Returning 1 if nothing happened is what string tends to do, and so far it has worked well. Reordering seems useful, and it's a good feature.

@ammgws
Copy link
Contributor Author

ammgws commented Mar 31, 2020

I guess I could add an option --append-delimiter but it's likely you want to use a different delimiter to the one you split on, so --append-delimiter would end up having to take an arg, and in the end you might as well have used string join.

Also I instead found more cases (for completions anyway) look like this:

pacman -Sl | cut --delim ' ' --fields 2- | tr ' ' \t | sort)

Where it wouldn't be all that different after:
pacman -Sl | string join --fields 2- ' ' | string join '\t' | sort)

Note: I have not yet implemented the "2-" syntax

@faho faho added this to the fish 3.2.0 milestone Apr 4, 2020
@faho
Copy link
Member

faho commented Apr 4, 2020

Merged as 7cb1d3a..30459b0, thanks!

@faho faho closed this Apr 4, 2020
@ammgws ammgws deleted the splitfields branch April 4, 2020 13:53
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 3, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants