New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
from-ssv
logic updated
#845
from-ssv
logic updated
#845
Conversation
New tests are added to test for additional cases that might be trickier to handle with the new logic. Old tests are updated where their expectations are no longer expected to hold true. For instance: previously, lines would be treated separately, allowing any index offset between columns on different rows, as long as they had the same row index as decided by a separator. When this is no longer the case, some things need to be adjusted.
The table parsing/creation logic has changed from treating every line the same to processing each line in context of the column header's placement. Previously, lines on separate rows would go towards the same column as long as they were the same index based on separator alone. Now, each item's index is based on vertical alignment to the column header. This may seem brittle, but it solves the problem of some tables operating with empty cells that would cause remaining values to be paired with the wrong column. Based on kubernetes output (get pods, events), the new method has shown to have much greater success rates for parsing.
I think you're right it's going to often be columns. We have a way now to split on just the whitespace, and this gives you something closer to what you would want: being able to understand the columns more visually. Do you think this is going to be the major way people use it? Is it worth having two modes or just this one? |
Hmm, that's a good question. I don't really have a clear idea about how people use text in this format outside of the kubernetes ecosystem, but I think that it's generally going to be presented in a columnar fashion. If you want to store text in a file format that uses a separator, you're more than likely going to use one of the more traditional formats such as csv or tsv. This makes it easier for humans to scan and for machines to work with. Having a separator be a single character instead of a combination of characters--especially when they're all spaces, which is a character that might appear frequently anyway--makes it clearer where the separation occurs, and easier to parse. I think the main use case for space-separated tables like this is for output from command line interfaces and the like where the source of the data can't guarantee tab width, and where aligning values is more important than having a clear separator between them. This strikes me as being chiefly concerned with readability, and as such, very likely to align data with column headers. Side note: I was playing around with it a bit and noticed that when piping it through |
Previously it would split the last column on the first separator value found between the start of the column and the end of the row. Changing this to using everything from the start of the column to the end of the string makes it behave more similarly to the other columns, making it less surprising.
Update on not registering values if they're empty: indeed, they don't show up in the JSON output, which is good. However, because the TaggedDict uses an For instance, in some of my tests, I saw columns that should have been roughly in the middle get moved to the end because they're only populated sometimes. I think this would be more confusing to users (and to me), than some values happening to return empty strings when converted to json or yaml (most of which could be filtered out relatively easily. I've been looking at the API, but haven't found an easy way to guarantee the order when values might be empty, so I think this might be a worthwhile trade-off in the meantime. |
Yeah that sounds good. I think we still need to work out exactly what happens in tables where the columns are different, both visually and in terms of how they serialize, but for now I think we can land this and keep iterating. |
Yeah, I agree. I'm happy with where it's at right now, but wouldn't mind expanding on it (or for anyone else to) if there are other suggestions. You sound like you have some thoughts, so I'd be very interested to hear them. |
Hey! So a pull request with no prior discussion may be bad form, but I was tinkering around with this problem I had and managed to solve it, and due to how we're seemingly in very different time zones, I just wanted to get it out there without too much delay. Hope you don't mind.
What and why
Anyway: I'd like to change the table creation algorithm for
from-ssv
. Previously, the table would be created based only on an entries index in a row as decided by the separators in the row. This meant that two values, though vertically misaligned, could still belong together:So in the above snippet, assuming a separator of two spaces,
a
belongs tocol1
, andb
tocol2
. This makes a lot of sense to me, but it has one pretty big hole: what about empty values?Given this table:
It seems quite intuitive that
c
would go incol3
and thatcol2
just doesn't have a value in this case, but with the old (current) algorithm,c
would belong tocol2
andcol3
would have an empty value.With the new algorithm, the first table would give
col1
the value ofa b
andcol2
and empty string. The second table would paircol1
witha
,col3
withc
andcol2
with an empty string again.Thus, given that the only real world examples I have seen of space-separated values being used are output from
kubectl
(Kubernetes) oroc
(OpenShift), and that these output values sometimes do have holes in them, I think this approach, basing an entry's column on its horizontal alignment, is more correct and intuitive.Discussion on this is very welcome, and please do not feel any pressure to merge this if you disagree with the sentiment. I should have opened an issue for clarification but got a bit to absorbed by trying to solve the problem, so let this act as both a potential solution and a general discussion for the issue.
Cheers!