Analyze tables options: --common-limit, --no-most, --no-least #546

Merged
merged 4 commits into main on May 21, 2023

Conversation

simonw
Owner

@simonw simonw commented May 21, 2023

Refs #544

  • Documentation for CLI options
  • Documentation for new Python API parameters: `most_common: bool` and `least_common: bool`
  • Tests for CLI
  • Tests for Python API

This change adds three new options, `common_limit`, `no_most`, and `no_least`, and modifies the `analyze_tables` function in `cli.py` and the `analyze_column` method in `db.py` to support them.

The `common_limit` option specifies how many common values the `analyze_column` method returns; it defaults to 10. The `no_most` and `no_least` options, when set to True, skip returning the most and least common values, respectively.
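As a rough illustration of how the negative CLI flags relate to the positive Python parameters, here is a minimal sketch. It uses `argparse` as a stand-in rather than the project's actual CLI code, and `build_parser` and `to_api_kwargs` are hypothetical names introduced only for this example.

```python
import argparse


def build_parser():
    # argparse stand-in for the CLI; the flag names come from the PR title.
    parser = argparse.ArgumentParser(prog="analyze-tables-sketch")
    parser.add_argument("--common-limit", type=int, default=10)
    parser.add_argument("--no-most", action="store_true")
    parser.add_argument("--no-least", action="store_true")
    return parser


def to_api_kwargs(args):
    # The CLI flags are phrased negatively and the Python parameters
    # positively, so each boolean is inverted on the way through.
    return {
        "common_limit": args.common_limit,
        "most_common": not args.no_most,
        "least_common": not args.no_least,
    }


args = build_parser().parse_args(["--common-limit", "20", "--no-least"])
print(to_api_kwargs(args))
```

With these arguments the sketch keeps the most common values, skips the least common ones, and raises the limit to 20.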

The `analyze_tables` function now passes the new options to `_analyze`, and the `analyze_column` method in `db.py` uses them. When `analyze_column` is called, it checks whether `most_common` or `least_common` is True before running the corresponding query. If `num_distinct` is less than or equal to `common_limit`, it skips the least common query, because that query would only return the same values as the most common query in reverse order. The results of the most and least common queries are sorted before being returned.
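The decision logic described above can be sketched with the standard library `sqlite3` module. This is an illustration under stated assumptions, not the code from `db.py`: `analyze_column_sketch`, the `pets` table, and the tie-breaking order are all made up for the example.

```python
import sqlite3


def analyze_column_sketch(conn, table, column, common_limit=10,
                          most_common=True, least_common=True):
    """Return (num_distinct, most_common_rows, least_common_rows)."""
    # Identifiers are interpolated directly: acceptable in a sketch,
    # not for untrusted input.
    num_distinct = conn.execute(
        f"select count(distinct [{column}]) from [{table}]"
    ).fetchone()[0]

    def top(direction):
        # Sort by frequency, breaking ties on the value itself.
        sql = (
            f"select [{column}], count(*) as n from [{table}] "
            f"group by [{column}] order by n {direction}, [{column}] limit ?"
        )
        return conn.execute(sql, (common_limit,)).fetchall()

    most = top("desc") if most_common else None
    least = None
    # Skip the least common query when the limit already covers
    # every distinct value.
    if least_common and num_distinct > common_limit:
        least = top("asc")
    return num_distinct, most, least


conn = sqlite3.connect(":memory:")
conn.execute("create table pets (species text)")
conn.executemany(
    "insert into pets values (?)",
    [("dog",)] * 3 + [("cat",)] * 2 + [("owl",)],
)
print(analyze_column_sketch(conn, "pets", "species", common_limit=2))
```

With three distinct species and `common_limit=2`, both queries run. Leaving `common_limit` at 10 skips the least common query, and the third element of the result is `None`.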
@simonw simonw added the enhancement New feature or request label May 21, 2023
@codecov

codecov bot commented May 21, 2023

Codecov Report

Patch coverage: 93.75% and no project coverage change.

Comparison is base (b3b100d) 96.30% compared to head (9f23e68) 96.31%.

❗ Current head 9f23e68 differs from pull request most recent head 2eca17d. Consider uploading reports for the commit 2eca17d to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #546   +/-   ##
=======================================
  Coverage   96.30%   96.31%           
=======================================
  Files           6        6           
  Lines        2707     2712    +5     
=======================================
+ Hits         2607     2612    +5     
  Misses        100      100           
| Impacted Files | Coverage Δ |
|---|---|
| sqlite_utils/db.py | 97.37% <90.90%> (+<0.01%) ⬆️ |
| sqlite_utils/cli.py | 95.26% <100.00%> (+0.01%) ⬆️ |


@simonw simonw merged commit d2a7b15 into main May 21, 2023
63 of 64 checks passed