Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use indexes #134

Closed
lemon24 opened this issue Aug 24, 2019 · 6 comments
Closed

Use indexes #134

lemon24 opened this issue Aug 24, 2019 · 6 comments

Comments

@lemon24
Copy link
Owner

lemon24 commented Aug 24, 2019

The schema is probably stable enough (maybe after #127).

@lemon24 lemon24 added the core label Aug 24, 2019
lemon24 added a commit that referenced this issue Sep 15, 2019
lemon24 added a commit that referenced this issue Sep 15, 2019
lemon24 added a commit that referenced this issue Sep 15, 2019
@lemon24
Copy link
Owner Author

lemon24 commented Sep 15, 2019

Before 612cb29:

number num_entries get_entries_all get_entries_feed
...
     4         512           0.100            0.018
     4        1024           0.229            0.032
     4        2048           0.538            0.055

After:

number num_entries get_entries_all get_entries_feed
...
     4         512           0.101            0.019
     4        1024           0.221            0.032
     4        2048           0.481            0.057

@lemon24
Copy link
Owner Author

lemon24 commented Oct 30, 2020

Some recent timings for my current deployment, as measured by the web app ("page generated in about ... seconds") on a t3a.nano instance (with warm caches):

  • first 64 read entries: ~.2s
  • all read entries (1.9k): ~2.2s
  • all entries (11k): ~20s

@lemon24
Copy link
Owner Author

lemon24 commented Nov 5, 2020

This gist shows a version of 5594ee3 that yields a 2-4x improvement for get_entries(): https://gist.github.com/lemon24/f2b73e59f881c69052b178d71bc25ff8 (took about 3.5 hours of experimenting).

lemon24 added a commit that referenced this issue Nov 6, 2020
lemon24 added a commit that referenced this issue Nov 6, 2020
lemon24 added a commit that referenced this issue Nov 6, 2020
@lemon24
Copy link
Owner Author

lemon24 commented Nov 6, 2020

To be done for the above:

  • optimize get_entries() query (took ~1h)
  • optimize search_entries(sort='recent') query (took ~1h to make it work just as fast, should be acceptable)
    • get rid of the old apply_recent implementation
  • do a final round of measurements (bench.py and deployed) (took .5-1h)
  • changelog

lemon24 added a commit that referenced this issue Nov 6, 2020
@lemon24
Copy link
Owner Author

lemon24 commented Nov 6, 2020

On my laptop, with my real database, get_entries() is 30% faster and get_entries(read=True) is 16% faster.

search_entries(sort='recent') becomes up to 3x slower when there are few results (< .2% of the total number of entries), but remains just as slow when there are a lot of results (>10% of the number of entries).

Click for details.

Benchmark diff (the % numbers show improvement):

$ for f in generated db-get db-search-python db-search-elon; do
    echo $f
    python scripts/bench.py diff before-$f.txt after-$f.txt | grep -E '^(stat| min)'
    echo
done
generated
stat number repeat num_entries get_entries_all get_entries_read search_entries_recent_all search_entries_recent_read show
 min      1      5          32            0.0%            -inf%                    -50.0%                       0.0% 0.0%
 min      1      5          64            0.0%            -inf%                      0.0%                       0.0% 0.0%
 min      1      5         128            0.0%            -inf%                    -16.7%                       0.0% 0.0%
 min      1      5         256            6.7%             0.0%                     -9.1%                    -100.0% 0.0%
 min      1      5         512            0.0%             0.0%                     -9.1%                     -50.0% -2.0%
 min      1      5        1024           -5.3%          -100.0%                     -8.9%                     -50.0% -2.5%
 min      1      5        2048          -12.9%          -200.0%                    -12.3%                     -25.0% -3.1%

db-get
stat number repeat num_entries get_entries_all get_entries_read show
 min      1      5           0           30.4%            16.3% 14.7%

db-search-python
stat number repeat num_entries search_entries_recent_all search_entries_recent_read
 min      1      5           0                     -7.0%                     -10.2%

db-search-elon
stat number repeat num_entries search_entries_recent_all search_entries_recent_read
 min      1      5           0                   -284.6%                    -370.0%

I ran the benchmarks with bench-all ...:

function bench {
    python -u scripts/bench-latest.py time -n1 -r5 "$@"
}
function bench-all {
    bench get_entries_all get_entries_read 'search_entries_recent_*' show \
    | tee "$1"-generated.txt
    bench --db db.sqlite get_entries_all get_entries_read show \
    | tee "$1"-db-get.txt
    bench --db db.sqlite --query python 'search_entries_recent_*' \
    | tee "$1"-db-search-python.txt
    bench --db db.sqlite --query elon 'search_entries_recent_*' \
    | tee "$1"-db-search-elon.txt
}

For my deployment, I got (compare with #134 (comment)):

  • first 64 read entries: ~.17s (.03s less)
  • all read entries (1.9k): ~2.4s (.2s more :( )
  • all entries (11k): ~17s (3s less)

lemon24 added a commit that referenced this issue Nov 6, 2020
@lemon24
Copy link
Owner Author

lemon24 commented Nov 6, 2020

get_entries() is the most used method, and returns most rows; in 6 hours I got a maybe-30% (but more likely ~10%) improvement.

This is probably enough for now, especially since this issue was an "exploratory" one, and is not solving a specific thing being too slow.

Although the improvement is not that big, the changes are likely still worth it (they added an additional 10 statements / 67 lines).

Resolving.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant