Analyze tables options: --common-limit, --no-most, --no-least #546

Merged
merged 4 commits into from May 21, 2023

Commits on May 21, 2023

  1. New options for analyze-tables - refs #544

    This change adds three new options and modifies the `analyze_tables` function and `analyze_column` method in the `cli.py` and `db.py` modules respectively. The new options are `common_limit`, `no_most`, and `no_least`.
    
    The `common_limit` option specifies how many common values the `analyze_column` method should return; it defaults to 10. The `no_most` and `no_least` options, when set to True, skip returning the most and least common values, respectively.
    
    The `analyze_tables` function was modified to pass the new options to the `_analyze` method, and the `analyze_column` method in the `db.py` module was modified to use them. Now, when `analyze_column` is called, it checks whether the `most_common` and `least_common` options are set to True before running the corresponding queries. If the `num_distinct` value is less than or equal to `common_limit`, it doesn't run the least common query. The results from the most and least common queries are sorted before being returned.
    simonw committed May 21, 2023
    1c1991b
  2. 9e9d63b
  3. 9f23e68
  4. 2eca17d
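
The control flow described in the first commit message can be sketched roughly as follows. This is a minimal illustration using only the standard-library `sqlite3` module, not the code from `db.py`: the function name `analyze_column_sketch`, the `ColumnDetails` tuple, and the exact SQL are assumptions made for the example. Only the `common_limit`, `most_common`, and `least_common` parameters and the `num_distinct` check come from the commit message.

```python
import sqlite3
from collections import namedtuple

# Hypothetical result container for this sketch.
ColumnDetails = namedtuple(
    "ColumnDetails", ("num_distinct", "most_common", "least_common")
)


def analyze_column_sketch(
    conn, table, column, common_limit=10, most_common=True, least_common=True
):
    """Return distinct count plus optional most/least common values."""

    def common_values(direction):
        # Table and column names are interpolated directly, so this sketch
        # assumes they are trusted identifiers.
        sql = (
            f'select "{column}", count(*) from "{table}" '
            f'group by "{column}" '
            f'order by count(*) {direction}, "{column}" limit ?'
        )
        return [tuple(row) for row in conn.execute(sql, (common_limit,))]

    num_distinct = conn.execute(
        f'select count(distinct "{column}") from "{table}"'
    ).fetchone()[0]

    most = None
    least = None
    if most_common:
        most = common_values("desc")
        # Stable sort by count, highest first; ties keep the SQL order.
        most.sort(key=lambda pair: pair[1], reverse=True)
    # Skip the least-common query when the limit already covers every
    # distinct value.
    if least_common and num_distinct > common_limit:
        least = common_values("asc")
        least.sort(key=lambda pair: pair[1])
    return ColumnDetails(num_distinct, most, least)
```

Calling the sketch with `most_common=False` or `least_common=False` corresponds to the `no_most` and `no_least` options: the matching query is never run and the field comes back as `None`.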