New options for analyze-tables --common-limit --no-most and --no-least #544
Labels
enhancement
New feature or request
Comments
simonw added a commit that referenced this issue May 21, 2023
This change adds three new options and modifies the `analyze_tables` function in `cli.py` and the `analyze_column` method in `db.py`. The new options are `common_limit`, `no_most`, and `no_least`. The `common_limit` option specifies how many common values `analyze_column` should return; it defaults to 10. The `no_most` and `no_least` options, when set to True, skip returning the most and least common values, respectively. The `analyze_tables` function now passes the new options to the `_analyze` method, and `analyze_column` uses them: it checks whether its `most_common` and `least_common` options are set to True before running the corresponding query. If the `num_distinct` value is less than or equal to `common_limit`, it does not run the least common query. The results from the most and least common queries are sorted before being returned.
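The behaviour described in that commit message can be sketched with the standard-library `sqlite3` module. This is a rough illustration, not the code from `db.py`: the function name `analyze_column_sketch`, the returned dictionary shape, the tie-breaking sort order, and the `pets` example table are all assumptions made for the example.

```python
import sqlite3


def analyze_column_sketch(conn, table, column, common_limit=10,
                          most_common=True, least_common=True):
    """Hypothetical stand-in for the described behaviour, not the library's code."""
    # Identifiers are interpolated directly, so pass only trusted table/column names.
    num_distinct = conn.execute(
        f'select count(distinct "{column}") from "{table}"'
    ).fetchone()[0]

    most = least = None
    if most_common:
        most = conn.execute(
            f'select "{column}", count(*) as n from "{table}" '
            f'group by "{column}" order by n desc, "{column}" limit ?',
            (common_limit,),
        ).fetchall()
    # When every distinct value already fits within common_limit,
    # the least-common query would add nothing, so it is skipped.
    if least_common and num_distinct > common_limit:
        least = conn.execute(
            f'select "{column}", count(*) as n from "{table}" '
            f'group by "{column}" order by n, "{column}" limit ?',
            (common_limit,),
        ).fetchall()
    return {"num_distinct": num_distinct, "most_common": most, "least_common": least}


conn = sqlite3.connect(":memory:")
conn.execute("create table pets (species text)")
conn.executemany(
    "insert into pets values (?)", [("dog",), ("dog",), ("cat",), ("emu",)]
)
result = analyze_column_sketch(conn, "pets", "species", common_limit=2)
```

With `common_limit=2` and three distinct values, both queries run. Raising `common_limit` to 10, or passing `least_common=False`, leaves `least_common` as `None` in this sketch.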
I generated the commit message in 1c1991b using |
New docs:
New help output:
The "least common" section is frequently uninteresting, especially for huge tables with a large number of repeated-once values.