New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
implement ranges #123
implement ranges #123
Conversation
I wonder if - should be supported as a range delimiter if there's any plan for negative indices to be supported. I assume the usage for such a feature would be csvcut -c 5:-5 file.csv Which might be awkward if using - as the delimiter csvcut -c 5--5 file.csv But perhaps it's fine to leave - as an option, even if : would be preferable in such situations. |
I used both and privileged ':' with some idea towards negative index support. I like the idea of negative indexing, but it pushes a couple of questions which may lead to tweaks in this implementation:
A python user might guess it returns the first and second columns. As implemented, I happened to make it end-inclusive, so that it returns columns 1,2 and 3. This may be "wrong", but part of me things that it's in the same league as 1-indexing things. Having that be non-intuitive made me wonder how one should similarly handle negative indices, which would be non-intuitive to the non-programmers in the bunch. But negative indices can be cool... I wrote this 'cause I wanted all but the last column in a 30-column file, and it would have been even more elegant to specify -c ':-1' than -c '1:29' what do you think? |
I agree that since columns are 1-based, 1:3 should include all three columns. Good question about how negative indices would work considering that ranges are inclusive, and I'm not sure what the best answer is. It strikes me as most intuitive to make -c :-1 return all but the last column, but, like you suggested, this would be slightly inconsistent internally. Another issue with negative indices is whether they should work outside of ranges. I think they should, meaning csvcut -c -2 returns the second-to-last column (or, I suppose, third-to-last depending on how the question you posed is resolved). But, since - can be used as a range delimiter, we'd have no way of knowing whether that means the user wants to return columns 1 and 2 or to return the second-to-last column. |
i'd be ok with indicating that ':' is the preferred syntax, and open-ended ranges should only be specified using ':' — FWIW, there are specific checks that prohibit bare negative column IDs ("if c < 0: raise ColumnIdentifierError('Column 0 is not valid; columns are 1-based.')" after c was decremented to adjust from 1-based to 0-based If we want to support negative indices, I think that should be just a new ticket, but it's true that there are a lot of quirky edge cases. |
Somehow I completely missed this conversation. Regarding negative indexes: I really want to support them, however, I tried to do this once and found that the implementation was extremely tricky. Eventually I punted on it. For now I'm going to import the work Joe has done and focus on maintaining consistency with regard to 1-based columns. If we ever decide to support "true" Python slice syntax then we can revisit that decision. |
Added support for range syntax in the -c flag, eg
csvcut -c 1:4,7: file.csv
csvcut -c :5 file.csv
open-ended (or open-beginning) ranges are permitted. delimiters can be either : or -
negative indices in slices are not supported (but could probably maybe be worked out)