Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

detect columns fails to parse cell contents that include spaces #4183

Closed
waldyrious opened this issue Dec 5, 2021 · 4 comments · Fixed by #12277
Closed

detect columns fails to parse cell contents that include spaces #4183

waldyrious opened this issue Dec 5, 2021 · 4 comments · Fixed by #12277
Labels
🐛 bug Something isn't working help wanted Extra attention is needed Stale used for marking issues and prs as stale
Milestone

Comments

@waldyrious
Copy link
Contributor

waldyrious commented Dec 5, 2021

Describe the bug

Using the example output from the docker ps documentation:

CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
673394ef1d4c        busybox             "top"               47 seconds ago      Up 45 seconds                           nostalgic_shockley
d85756f57265        busybox             "top"               52 seconds ago      Up 51 seconds                           high_albattani

Here's how detect columns parses it:

> cat ~/docker-ps.txt | detect columns
───┬──────────────┬──────────────┬─────────┬─────────┬─────────┬─────────┬───────┬────────────────────
 # │  CONTAINER   │      ID      │  IMAGE  │ COMMAND │ CREATED │ STATUS  │ PORTS │       NAMES        
───┼──────────────┼──────────────┼─────────┼─────────┼─────────┼─────────┼───────┼────────────────────
 0 │ 673394ef1d4c │ 673394ef1d4c │ busybox │ "top"   │ seconds │ seconds │       │ nostalgic_shockley 
 1 │ d85756f57265 │ d85756f57265 │ busybox │ "top"   │ seconds │ seconds │       │ high_albattani     
───┴──────────────┴──────────────┴─────────┴─────────┴─────────┴─────────┴───────┴────────────────────

There are two errors in this output, both related to cell contents that include spaces:

  • The CONTAINER ID column is split into two columns, with identical content;
  • Cell contents that include spaces (in the CREATED and STATUS columns) is trimmed down to a single word.

How to reproduce

  1. Create a ~/docker-ps.txt file with the contents indicated above
  2. Run ~/docker-ps.txt | detect columns

Expected behavior

The correct output should be:

> cat ~/docker-ps.txt | detect columns
───┬──────────────┬─────────┬─────────┬────────────────┬───────────────┬───────┬────────────────────
 # │ CONTAINER ID │  IMAGE  │ COMMAND │    CREATED     │    STATUS     │ PORTS │       NAMES        
───┼──────────────┼─────────┼─────────┼────────────────┼───────────────┼───────┼────────────────────
 0 │ 673394ef1d4c │ busybox │ "top"   │ 47 seconds ago │ Up 45 seconds │       │ nostalgic_shockley 
 1 │ d85756f57265 │ busybox │ "top"   │ 52 seconds ago │ Up 51 seconds │       │ high_albattani     
───┴──────────────┴─────────┴─────────┴────────────────┴───────────────┴───────┴────────────────────

Screenshots

No response

Configuration

key value
version 0.40.0
build_os linux-x86_64
rust_version rustc 1.56.1
cargo_version cargo 1.56.0
pkg_version 0.40.0
build_time 1980-01-01 00:00:00 +00:00
build_rust_channel release
features clipboard-cli, ctrlc, dataframe, default, rustyline, term, trash, uuid, which, zip
installed_plugins binaryview, chart bar, chart line, fetch, from bson, from sqlite, inc, match, post, ps, query json, s3, selector, start, sys, textview, to bson, to sqlite, tree, xpath

Additional context

If it helps, from ssv is able to correctly identify cell contents that include spaces:

> cat ~/docker-ps.txt | from ssv
───┬──────────────┬─────────┬─────────┬────────────────┬───────────────┬────────────────────
 # │ CONTAINER ID │  IMAGE  │ COMMAND │    CREATED     │    STATUS     │       PORTS        
───┼──────────────┼─────────┼─────────┼────────────────┼───────────────┼────────────────────
 0 │ 673394ef1d4c │ busybox │ "top"   │ 47 seconds ago │ Up 45 seconds │ nostalgic_shockley 
 1 │ d85756f57265 │ busybox │ "top"   │ 52 seconds ago │ Up 51 seconds │ high_albattani     
───┴──────────────┴─────────┴─────────┴────────────────┴───────────────┴────────────────────

(There's a separate bug in this output, in that the empty PORTS column isn't properly detected, but I've reported that separately as #4182.)

@fdncred
Copy link
Collaborator

fdncred commented Mar 12, 2022

detect columns is a relatively new command, but from ssv seems to work fine for this use case | from ssv --aligned-columns

@github-actions github-actions bot added the Stale used for marking issues and prs as stale label Jan 17, 2023
@sophiajt
Copy link
Member

sophiajt commented Aug 9, 2023

detect columns still has this issue on latest main.

@fdncred I wonder if detect columns should work more closely to what | from ssv --aligned-columns does?

@fdncred
Copy link
Collaborator

fdncred commented Aug 9, 2023

I played around a bit more with this and you can overcome spaces in columns by replacing it like this

❯ open test.txt | str replace "CONTAINER ID" "CONTAINER_ID" | detect columns
╭─#─┬─CONTAINER_ID─┬──IMAGE──┬COMMAND┬─CREATED─┬─STATUS──┬PORTS┬───────NAMES────────╮
│ 0 │ 673394ef1d4c │ busybox │ "top" │ seconds │ seconds │     │ nostalgic_shockley │
│ 1 │ d85756f57265 │ busybox │ "top" │ seconds │ seconds │     │ high_albattani     │
╰───┴──────────────┴─────────┴───────┴─────────┴─────────┴─────┴────────────────────╯

but it still doesn't parse the created and status columns correctly, which I just think may be a bug in detect columns.

from ssv works nicely with this type of data though

❯ open test.txt | from ssv --aligned-columns
╭─#─┬─CONTAINER ID─┬──IMAGE──┬COMMAND┬────CREATED─────┬────STATUS─────┬PORTS┬───────NAMES────────╮
│ 0 │ 673394ef1d4c │ busybox │ "top" │ 47 seconds ago │ Up 45 seconds │     │ nostalgic_shockley │
│ 1 │ d85756f57265 │ busybox │ "top" │ 52 seconds ago │ Up 51 seconds │     │ high_albattani     │
╰───┴──────────────┴─────────┴───────┴────────────────┴───────────────┴─────┴────────────────────╯

One of the things I like about having both of these commands is that if one fails we have the other one to try. They work differently and I think that's good. I'm not sure if we should have 2 commands like this or not. I suspect it would be difficult to merge them.

I've also explored column -t and the tabulate external commands which both have different ways of handling this, but also could help with trying to solve these types of problems. Neither does very well, but I'm not providing any parameters to either.

open test.txt | ^column -t 
CONTAINER     ID       IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
673394ef1d4c  busybox  "top"  47       seconds  ago     Up     45     seconds  nostalgic_shockley
d85756f57265  busybox  "top"  52       seconds  ago     Up     51     seconds  high_albattani
open test.txt | tabulate 
[src/main.rs:113] &args = Args {
    truncate: None,
    ratio: 1.0,
    lines: 1000,
    include_cols: None,
    exclude_cols: None,
    delim: " \t",
    output_delim: "  ",
    strict_delim: false,
    online: false,
    print_info: false,
}
CONTAINER     ID       IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
673394ef1d4c  busybox  "top"  47       seconds  ago     Up     45     seconds  nostalgic_shockley
d85756f57265  busybox  "top"  52       seconds  ago     Up     51     seconds  high_albattani

WindSoilder added a commit that referenced this issue Mar 26, 2024
# Description
@fdncred found another histogram based algorithm to detect columns, and
rewrite it in rust: https://github.com/fdncred/guess-width

I have tested it manually, and it works good with `df`, `docker ps`,
`^ps`. This pr is going to use the algorithm in `detect columns`

Fix: #4183

The pitfall of new algorithm:
1. it may not works well if there isn't too much rows of input
2. it may not works well if the length of value is less than the header
to value, e.g:
```
c1 c2 c3 c4 c5
a b c d e
g h i j k
g a a q d
a v c q q | detect columns
```
In this case, users might need to use ~~`--old`~~ `--legacy` to make it
works well.

# User-Facing Changes
User might need to add ~~`--old`~~ `--legacy` to scripts if they find
`detect columns` in their scripts broken.

# Tests + Formatting
Done

# After Submitting
NaN
@hustcer hustcer added this to the v0.92.0 milestone Mar 26, 2024
@fdncred
Copy link
Collaborator

fdncred commented Mar 26, 2024

After the latest changes to the detect columns algorithm.

❯ 'CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
❯❯❯ 673394ef1d4c        busybox             "top"               47 seconds ago      Up 45 seconds                           nostalgic_shockley
❯❯❯ d85756f57265        busybox             "top"               52 seconds ago      Up 51 seconds                           high_albattani' | detect columns
╭─#─┬─CONTAINER ID─┬──IMAGE──┬COMMAND┬────CREATED─────┬────STATUS─────┬PORTS┬───────NAMES────────╮
│ 0 │ 673394ef1d4c │ busybox │ "top" │ 47 seconds ago │ Up 45 seconds │     │ nostalgic_shockley │
│ 1 │ d85756f57265 │ busybox │ "top" │ 52 seconds ago │ Up 51 seconds │     │ high_albattani     │
╰───┴──────────────┴─────────┴───────┴────────────────┴───────────────┴─────┴────────────────────╯

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🐛 bug Something isn't working help wanted Extra attention is needed Stale used for marking issues and prs as stale
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants