Sorted columns and indices #2
Comments
Hi! Thanks for using it and thanks for the feedback. I definitely think this is worth addressing. My instinct is that making the sorted output the default on the columns makes sense, especially since tools like In terms of sorting indices, I'm curious about what the benefit is to having this tool handle sorting, rather than piping the output into something like |
For the index sorting you're right, having options for things that are already well-covered by xsv doesn't make much sense. On the other hand, what I'd expect as default behavior, as a regular user, would either be keeping the order in which the index values appear in the original, or sorting them. So from that perspective it may be worth it to provide a flag to switch between those two?
I think adding |
Sounds good! Agreed on the defaults. And I don't think adding the option to sort the indexes as well adds that much extra value, but you could add it for consistency.
Hi. Thanks again for your suggestion. I wanted to refactor the existing code first, and I finally found some time to do that; the current version has support for sorting rows and columns.
Hi! I love your tool, it makes a great complement to xsv for some of the csv processing I do, but there's one thing that kind of messes up my workflow: the indices and columns are in a seemingly-random order in the output. I assume that this order is based on the hash values of the column and index names?
Ideally the sorting would be a command line option, but if that's too complicated, maybe it's worth it to make the sorted output the default? I do this in the version I have installed locally by replacing the HashSets you use for the columns and indices in aggregation.rs with BTreeSets, for which the iterators return the items in (ascending) sorted order. This may be a bit simplistic and might reduce performance a bit, but in my experience for ~1GB csv files it doesn't seem to change much of the performance characteristics.
What do you think?
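To illustrate the difference, here is a minimal sketch using only the standard library. It is not the code from aggregation.rs; the function names and the sample labels are made up for the example. It only shows that a BTreeSet yields its items in ascending order, while a HashSet makes no promise about iteration order.

```rust
use std::collections::{BTreeSet, HashSet};

/// Hypothetical helper: collects the distinct labels into a HashSet.
/// The iteration order of the result is unspecified and can differ between runs.
fn distinct_unordered<'a>(labels: &[&'a str]) -> HashSet<&'a str> {
    labels.iter().copied().collect()
}

/// Hypothetical helper: collects the distinct labels into a BTreeSet and
/// returns them as a Vec. BTreeSet iterates in ascending order, so the
/// result is sorted without an explicit sort step.
fn distinct_sorted<'a>(labels: &[&'a str]) -> Vec<&'a str> {
    let set: BTreeSet<&str> = labels.iter().copied().collect();
    set.into_iter().collect()
}
```

Calling distinct_sorted on the labels "west", "east", "north", "east", "south" gives them back deduplicated and in alphabetical order, whereas iterating over the result of distinct_unordered may visit the same four labels in any order.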