Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

get_entries() query too complicated #123

Closed
lemon24 opened this issue Jul 7, 2019 · 5 comments
Closed

get_entries() query too complicated #123

lemon24 opened this issue Jul 7, 2019 · 5 comments
Labels

Comments

@lemon24
Copy link
Owner

lemon24 commented Jul 7, 2019

# TODO: This needs some sort of query builder so badly.

@lemon24
Copy link
Owner Author

lemon24 commented Jul 7, 2019

Prototype in this gist (including using another sort key for #122).

The end result looks nice, but using it would add ~150 lines of code (that need to be tested), and it's less useful for simpler queries.

Also, it looks nice now, when I just wrote it; 6 months from now it may be hard to understand.

@lemon24
Copy link
Owner Author

lemon24 commented Jul 14, 2019

Alternative:

Don't "build" the query; always shove everything into it and disable various bits through boolean parameters, and let the optimizer sort it out.

Cons:

  • The optimizer may not be able to optimize everything away, especially if the query is optimized before the parameters are passed in (so it doesn't get a change to remove stuff).
  • Some things we may not be able to turn off at all.
  • Do we still need to provide values for parameters used in the disabled bits? Probably yes.
  • Passing more parameters may be more expensive. Would it be more expensive than the string building, though?

@lemon24 lemon24 added the core label Jul 15, 2019
@lemon24
Copy link
Owner Author

lemon24 commented Feb 5, 2020

Summarizing the possibilities a bit:

Whatever solution we use, it needs to support:

For now, we'll do nothing; we can re-assess after we implement full text search, use indexes, and maybe implement random sort.

@lemon24
Copy link
Owner Author

lemon24 commented May 5, 2020

Version 2 of the query builder, with support for WITH, JOIN, and UNION, and enforcing the correct keyword order.

~110 lines base, ~140 lines with extensions (no type annotations).

There are full examples of the get_entries() and search_entries() queries.

lemon24 added a commit that referenced this issue May 6, 2020
lemon24 added a commit that referenced this issue May 7, 2020
lemon24 added a commit that referenced this issue May 7, 2020
lemon24 added a commit that referenced this issue May 7, 2020
lemon24 added a commit that referenced this issue May 7, 2020
lemon24 added a commit that referenced this issue May 7, 2020
lemon24 added a commit that referenced this issue May 7, 2020
lemon24 added a commit that referenced this issue May 7, 2020
lemon24 added a commit that referenced this issue May 7, 2020
lemon24 added a commit that referenced this issue May 8, 2020
lemon24 added a commit that referenced this issue May 8, 2020
@lemon24
Copy link
Owner Author

lemon24 commented May 8, 2020

Counts for _search.py + _storage.py and _sql_utils.py (the query builder, for after):

  • statements: 437 before, 409 + 75 = 484 after
  • lines: 1411 before, 1304 + 136 = 1440 after (and an additional 182 for _sql_utils.py tests)

It's not much worse. There's less duplication between get_entries() and search_entries(), and it should now be easier to paginate most of the get_...() queries.


Also, it should now be much easier to tweak the queries. fb38d91 is a good example of this (the search_entries_read is ~5x faster because 1/5 of the entries are read).

Before:

$ python -u bench.py time 'search_entries_*' -n1 -r100 \
> | grep -e stat -e min -e p50 | grep -v -e ' 32' -e ' 64'
stat number repeat num_entries search_entries_all search_entries_read
 min      1    100         128              0.008               0.003
 p50      1    100         128              0.008               0.003
 min      1    100         256              0.014               0.005
 p50      1    100         256              0.015               0.005
 min      1    100         512              0.030               0.011
 p50      1    100         512              0.030               0.011
 min      1    100        1024              0.062               0.024
 p50      1    100        1024              0.062               0.024
 min      1    100        2048              0.168               0.046
 p50      1    100        2048              0.168               0.046

After:

$ python -u bench.py time 'search_entries_*' -n1 -r100 \
> | grep -e stat -e min -e p50 | grep -v -e ' 32' -e ' 64'
stat number repeat num_entries search_entries_all search_entries_read
 min      1    100         128              0.009               0.001
 p50      1    100         128              0.009               0.001
 min      1    100         256              0.016               0.001
 p50      1    100         256              0.016               0.001
 min      1    100         512              0.031               0.002
 p50      1    100         512              0.032               0.002
 min      1    100        1024              0.064               0.004
 p50      1    100        1024              0.064               0.004
 min      1    100        2048              0.171               0.007
 p50      1    100        2048              0.172               0.007

lemon24 added a commit that referenced this issue May 8, 2020
@lemon24 lemon24 closed this as completed May 8, 2020
lemon24 added a commit that referenced this issue May 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant