

New options for analyze-tables --common-limit --no-most and --no-least #544

Closed
simonw opened this issue May 21, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

simonw (Owner) commented May 21, 2023

The "least common" section is frequently uninteresting, especially for huge tables with a large number of values that appear only once.

sqlite-utils analyze-tables content.db repos --common-limit 20 --no-least
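To illustrate why the least-common list adds little on large tables, here is a small sketch using Python's built-in `sqlite3` module rather than sqlite-utils itself. The `repos` table and its `language` column are hypothetical demo data: one value repeats, every other value appears exactly once, so any "least common" listing is just an arbitrary handful of singletons.

```python
import sqlite3

# Hypothetical demo table: "python" repeats, every other value appears once.
conn = sqlite3.connect(":memory:")
conn.execute("create table repos (language text)")
rows = [("python",)] * 50 + [(f"lang-{i}",) for i in range(1000)]
conn.executemany("insert into repos (language) values (?)", rows)

# Least common values: ascending count with a limit (an assumed query shape,
# not necessarily the SQL sqlite-utils runs).
least_common = conn.execute(
    """
    select language, count(*) as n
    from repos
    group by language
    order by n asc, language
    limit 5
    """
).fetchall()

# Every returned count is 1, so the list conveys little beyond "these appear once".
for value, count in least_common:
    print(value, count)
```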
simonw added the enhancement label May 21, 2023
simonw added a commit that referenced this issue May 21, 2023
This change adds three new options and modifies the `analyze_tables` function in the `cli.py` module and the `analyze_column` method in the `db.py` module. The new options are `common_limit`, `no_most`, and `no_least`.

The `common_limit` option specifies how many common values the `analyze_column` method should return; it defaults to 10. The `no_most` and `no_least` options, when set to True, skip returning the most and least common values, respectively.

The `analyze_tables` function was modified to pass the new options to the `_analyze` method, and the `analyze_column` method in the `db.py` module was modified to use them. Now, when `analyze_column` is called, it checks whether the `most_common` and `least_common` options are True before running the corresponding query. If the `num_distinct` value is less than or equal to `common_limit`, it doesn't run the least common query. The results from the most and least common queries are sorted before being returned.
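The commit message describes this control flow in prose. Below is a minimal sketch of that flow using Python's standard `sqlite3` module. The function name `analyze_column_sketch`, its return dict, and the SQL are assumptions for illustration, not the code sqlite-utils actually runs; the sketch only mirrors the described behavior: `common_limit` caps both lists, `most_common` and `least_common` gate each query, and the least-common query is skipped when `num_distinct <= common_limit`.

```python
import sqlite3


def analyze_column_sketch(conn, table, column, common_limit=10,
                          most_common=True, least_common=True):
    """Hypothetical sketch of the described analyze_column logic (not sqlite-utils' code).

    Identifiers are quoted naively for illustration only.
    """
    t, c = f'"{table}"', f'"{column}"'
    num_distinct = conn.execute(
        f"select count(distinct {c}) from {t}"
    ).fetchone()[0]

    most = None
    if most_common:
        # Highest counts first; ties broken by value so the order is stable.
        most = conn.execute(
            f"select {c}, count(*) from {t} group by {c} "
            f"order by count(*) desc, {c} limit ?",
            (common_limit,),
        ).fetchall()

    least = None
    # If every distinct value already fits in the most-common list,
    # a least-common list would only repeat it, so skip that query.
    if least_common and num_distinct > common_limit:
        least = conn.execute(
            f"select {c}, count(*) from {t} group by {c} "
            f"order by count(*), {c} limit ?",
            (common_limit,),
        ).fetchall()

    return {"num_distinct": num_distinct, "most_common": most, "least_common": least}


demo = sqlite3.connect(":memory:")
demo.execute("create table repos (language text)")
demo.executemany(
    "insert into repos values (?)",
    [("python",), ("python",), ("go",), ("rust",)],
)
print(analyze_column_sketch(demo, "repos", "language", common_limit=2))
```

With `common_limit=2`, three distinct values exceed the limit, so both lists are returned. Raising `common_limit` to 10 leaves `least_common` as `None`, because all three distinct values already fit in the most-common list.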
simonw (Owner, Author) commented May 21, 2023

I generated the commit message in 1c1991b using `git diff | llm --system 'describe this change'`.

simonw (Owner, Author) commented May 21, 2023

New docs:

New help output:

 % sqlite-utils analyze-tables --help
Usage: sqlite-utils analyze-tables [OPTIONS] PATH [TABLES]...

  Analyze the columns in one or more tables

  Example:

      sqlite-utils analyze-tables data.db trees

Options:
  -c, --column TEXT       Specific columns to analyze
  --save                  Save results to _analyze_tables table
  --common-limit INTEGER  How many common values
  --no-most               Skip most common values
  --no-least              Skip least common values
  --load-extension TEXT   Path to SQLite extension, with optional :entrypoint
  -h, --help              Show this message and exit.
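sqlite-utils builds its CLI with Click, so the following is not its code. It is an argparse-based sketch (standard library only; the `parse` helper is hypothetical) of how the three new flags could translate into the keyword arguments named in the commit message: `--no-most` and `--no-least` invert into `most_common` and `least_common`, and `--common-limit` defaults to 10.

```python
import argparse


def parse(argv):
    """Hypothetical parser mirroring the new analyze-tables flags (not sqlite-utils' code)."""
    parser = argparse.ArgumentParser(prog="analyze-tables")
    parser.add_argument("path")
    parser.add_argument("tables", nargs="*")
    parser.add_argument("--common-limit", type=int, default=10,
                        help="How many common values")
    parser.add_argument("--no-most", action="store_true",
                        help="Skip most common values")
    parser.add_argument("--no-least", action="store_true",
                        help="Skip least common values")
    args = parser.parse_args(argv)
    # Negative CLI flags become positive keyword arguments for the analysis step.
    return {
        "common_limit": args.common_limit,
        "most_common": not args.no_most,
        "least_common": not args.no_least,
    }


print(parse(["content.db", "repos", "--common-limit", "20", "--no-least"]))
```

Modelling the flags as negatives keeps the default behavior (both lists, limit 10) unchanged for existing users, while letting a single flag switch off the section that is uninteresting for a given table.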

simonw added a commit that referenced this issue May 21, 2023
simonw added a commit that referenced this issue May 21, 2023