Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement ranges #123

Merged
merged 3 commits into from Oct 14, 2011
Merged

implement ranges #123

merged 3 commits into from Oct 14, 2011

Conversation

JoeGermuska
Copy link
Contributor

Added support for range syntax in the -c flag, eg

csvcut -c 1:4,7: file.csv
csvcut -c :5 file.csv

open-ended (or open-beginning) ranges are permitted. delimiters can be either : or -

negative indices in slices are not supported (but could probably maybe be worked out)

@bycoffe
Copy link

bycoffe commented Sep 28, 2011

I wonder if - should be supported as a range delimiter if there's any plan for negative indices to be supported.

I assume the usage for such a feature would be

csvcut -c 5:-5 file.csv

Which might be awkward if using - as the delimiter

csvcut -c 5--5 file.csv

But perhaps it's fine to leave - as an option, even if : would be preferable in such situations.

@JoeGermuska
Copy link
Contributor Author

I used both and privileged ':' with some idea towards negative index support.

I like the idea of negative indexing, but it pushes a couple of questions which may lead to tweaks in this implementation:

  1. what does this do?
    csvcut -c 1:3 file.csv

A python user might guess it returns the first and second columns. As implemented, I happened to make it end-inclusive, so that it returns columns 1,2 and 3. This may be "wrong", but part of me things that it's in the same league as 1-indexing things.

Having that be non-intuitive made me wonder how one should similarly handle negative indices, which would be non-intuitive to the non-programmers in the bunch. But negative indices can be cool... I wrote this 'cause I wanted all but the last column in a 30-column file, and it would have been even more elegant to specify -c ':-1' than -c '1:29'

what do you think?

@bycoffe
Copy link

bycoffe commented Sep 28, 2011

I agree that since columns are 1-based, 1:3 should include all three columns.

Good question about how negative indices would work considering that ranges are inclusive, and I'm not sure what the best answer is. It strikes me as most intuitive to make -c :-1 return all but the last column, but, like you suggested, this would be slightly inconsistent internally.

Another issue with negative indices is whether they should work outside of ranges. I think they should, meaning

csvcut -c -2

returns the second-to-last column (or, I suppose, third-to-last depending on how the question you posed is resolved).

But, since - can be used as a range delimiter, we'd have no way of knowing whether that means the user wants to return columns 1 and 2 or to return the second-to-last column.

@JoeGermuska
Copy link
Contributor Author

i'd be ok with indicating that ':' is the preferred syntax, and open-ended ranges should only be specified using ':' —

FWIW, there are specific checks that prohibit bare negative column IDs ("if c < 0: raise ColumnIdentifierError('Column 0 is not valid; columns are 1-based.')" after c was decremented to adjust from 1-based to 0-based

If we want to support negative indices, I think that should be just a new ticket, but it's true that there are a lot of quirky edge cases.

@onyxfish
Copy link
Collaborator

Somehow I completely missed this conversation. Regarding negative indexes: I really want to support them, however, I tried to do this once and found that the implementation was extremely tricky. Eventually I punted on it. For now I'm going to import the work Joe has done and focus on maintaining consistency with regard to 1-based columns. If we ever decide to support "true" Python slice syntax then we can revisit that decision.

onyxfish pushed a commit that referenced this pull request Oct 14, 2011
@onyxfish onyxfish merged commit cc0515d into wireservice:master Oct 14, 2011
lcorbasson pushed a commit to lcorbasson/csvkit that referenced this pull request Sep 7, 2020
lcorbasson pushed a commit to lcorbasson/csvkit that referenced this pull request Sep 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants