Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change default algorithm in detect columns #12277

Merged
merged 16 commits into from Mar 26, 2024

Conversation

WindSoilder
Copy link
Collaborator

@WindSoilder WindSoilder commented Mar 25, 2024

Description

@fdncred found another histogram based algorithm to detect columns, and rewrite it in rust: https://github.com/fdncred/guess-width

I have tested it manually, and it works good with df, docker ps, ^ps. This pr is going to use the algorithm in detect columns

Fix: #4183

The pitfall of new algorithm:

  1. it may not works well if there isn't too much rows of input
  2. it may not works well if the length of value is less than the header to value, e.g:
c1 c2 c3 c4 c5
a b c d e
g h i j k
g a a q d
a v c q q | detect columns

In this case, users might need to use --old --legacy to make it works well.

User-Facing Changes

User might need to add --old --legacy to scripts if they find detect columns in their scripts broken.

Tests + Formatting

Done

After Submitting

NaN

@hustcer
Copy link
Contributor

hustcer commented Mar 25, 2024

--old may not be a good flag name, can we change it to --legacy or something else ?

@WindSoilder
Copy link
Collaborator Author

Sure! Just renamed it.

@WindSoilder WindSoilder added pr:breaking-change This PR implies a change affecting users and has to be noted in the release notes pr:bugfix This PR fixes some bug pr:commands This PR changes our commands in some way and removed pr:breaking-change This PR implies a change affecting users and has to be noted in the release notes labels Mar 25, 2024
@fdncred
Copy link
Collaborator

fdncred commented Mar 25, 2024

I think if we can maybe just refactor the combine columns functionality so it works with the legacy and new algorithm, and add the examples back, we could land this.

@fdncred
Copy link
Collaborator

fdncred commented Mar 26, 2024

if you've added it and it you're good with it, feel free to land this. i'm too tired to study the changes tonight. LOL. 😆

@WindSoilder
Copy link
Collaborator Author

Ok, it's committed by c07395e

Let's land it and see if it works :)

@WindSoilder WindSoilder merged commit a15462f into nushell:main Mar 26, 2024
16 checks passed
@hustcer hustcer added this to the v0.92.0 milestone Mar 26, 2024
@fdncred
Copy link
Collaborator

fdncred commented Mar 26, 2024

I love that this works so much better. Great job @WindSoilder! ❤️ Thanks for working on this with me!

@hustcer
Copy link
Contributor

hustcer commented Mar 29, 2024

After trying the latest nightly build, I'm wondering if the previous algorithms may be more widely applicable? Is it a better idea to add the new algorithm as a feature with flag --guess-width rather than set it as the default algorithm and make it a breaking change?
BTW: --legacy make me think the flag is kind of deprecated or not recommend to use and will be removed someday.
Any thoughts? @WindSoilder @fdncred

@WindSoilder
Copy link
Collaborator Author

WindSoilder commented Mar 29, 2024

Do you have some examples show that previous algorithms works better?

@hustcer
Copy link
Contributor

hustcer commented Mar 29, 2024

Do you have some examples show that previous algorithms works better?

My examples all falls into the pitfalls of the new algorithm such as:

git remote -v | detect columns -n
git diff head~2 --numstat | detect columns -n

@fdncred
Copy link
Collaborator

fdncred commented Mar 29, 2024

I'm personally fine with having the new algorithm as --guess and making the legacy mode default. I just thought it worked better because I hadn't found a way to break it yet. @hustcer found a way. 😆

I wish there was a better name other than --guess though. Any ideas? --alt? for alternate maybe

Another thought

What if we keep --guess as the default but if it returns only one column, we automatically try the legacy mode? Is that worth trying @WindSoilder @hustcer ?

@WindSoilder
Copy link
Collaborator Author

WindSoilder commented Mar 29, 2024

What if we keep --guess as the default but if it returns only one column, we automatically try the legacy mode? Is that worth trying @WindSoilder @hustcer ?

I think it's worth to try, but because we are close to next release, I would prefer to try it after next version

@hustcer
Copy link
Contributor

hustcer commented Mar 30, 2024

Another thought

What if we keep --guess as the default but if it returns only one column, we automatically try the legacy mode? Is that worth trying @WindSoilder @hustcer ?

git remote -v | detect columns -n returns two columns already, but not the same as git remote -v | detect columns -n --legacy

BTW: --legacy make me think the flag is kind of deprecated or not recommend to use and will be removed someday.

If we want to keep the new algorithm as default, we'd better rename --legacy flag to a more meaningful name (like --detect-by-space or something else), so we can know the algorithm used by the flag to avoid next breaking change.

fdncred pushed a commit that referenced this pull request Mar 30, 2024
# Description
This pr is addressing feedback from
#12277 (comment)

Currently I think it's fine to replace `--legacy` flag with `--guess`
one. Only use `guess_width` algorithm if `--guess` is provided.

# User-Facing Changes
So it won't be a breaking change to previous version.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr:bugfix This PR fixes some bug pr:commands This PR changes our commands in some way
Projects
None yet
Development

Successfully merging this pull request may close these issues.

detect columns fails to parse cell contents that include spaces
3 participants