Use indexes #134

lemon24 · 2019-08-24T21:21:49Z

The schema is probably stable enough (maybe after #127).

lemon24 · 2019-09-15T12:19:05Z

Before 612cb29:

number num_entries get_entries_all get_entries_feed
...
     4         512           0.100            0.018
     4        1024           0.229            0.032
     4        2048           0.538            0.055

After:

number num_entries get_entries_all get_entries_feed
...
     4         512           0.101            0.019
     4        1024           0.221            0.032
     4        2048           0.481            0.057

For #138, #134.

For #134.

lemon24 · 2020-10-30T16:44:47Z

Some recent timings for my current deployment, as measured by the web app ("page generated in about ... seconds") on a t3a.nano instance (with warm caches):

first 64 read entries: ~.2s
all read entries (1.9k): ~2.2s
all entries (11k): ~20s

For #134.

lemon24 · 2020-11-05T15:36:53Z

This gist shows a version of 5594ee3 that yields a 2-4x improvement for get_entries(): https://gist.github.com/lemon24/f2b73e59f881c69052b178d71bc25ff8 (took about 3.5 hours of experimenting).

For #134.

lemon24 · 2020-11-06T09:47:30Z

To be done for the above:

optimize get_entries() query (took ~1h)
optimize search_entries(sort='recent') query (took ~1h to make it work just as fast, should be acceptable)
- get rid of the old apply_recent implementation
do a final round of measurements (bench.py and deployed) (took .5-1h)
changelog

…ersion. For #134.

For #134.

lemon24 · 2020-11-06T16:49:21Z

On my laptop, with my real database, get_entries() is 30% faster and get_entries(read=True) is 16% faster.

search_entries(sort='recent') becomes up to 3x slower when there are few results (< .2% of the total number of entries), but remains just as slow when there are a lot of results (>10% of the number of entries).

Click for details.

Benchmark diff (the % numbers show improvement):

$ for f in generated db-get db-search-python db-search-elon; do
    echo $f
    python scripts/bench.py diff before-$f.txt after-$f.txt | grep -E '^(stat| min)'
    echo
done
generated
stat number repeat num_entries get_entries_all get_entries_read search_entries_recent_all search_entries_recent_read show
 min      1      5          32            0.0%            -inf%                    -50.0%                       0.0% 0.0%
 min      1      5          64            0.0%            -inf%                      0.0%                       0.0% 0.0%
 min      1      5         128            0.0%            -inf%                    -16.7%                       0.0% 0.0%
 min      1      5         256            6.7%             0.0%                     -9.1%                    -100.0% 0.0%
 min      1      5         512            0.0%             0.0%                     -9.1%                     -50.0% -2.0%
 min      1      5        1024           -5.3%          -100.0%                     -8.9%                     -50.0% -2.5%
 min      1      5        2048          -12.9%          -200.0%                    -12.3%                     -25.0% -3.1%

db-get
stat number repeat num_entries get_entries_all get_entries_read show
 min      1      5           0           30.4%            16.3% 14.7%

db-search-python
stat number repeat num_entries search_entries_recent_all search_entries_recent_read
 min      1      5           0                     -7.0%                     -10.2%

db-search-elon
stat number repeat num_entries search_entries_recent_all search_entries_recent_read
 min      1      5           0                   -284.6%                    -370.0%

I ran the benchmarks with bench-all ...:

function bench {
    python -u scripts/bench-latest.py time -n1 -r5 "$@"
}
function bench-all {
    bench get_entries_all get_entries_read 'search_entries_recent_*' show \
    | tee "$1"-generated.txt
    bench --db db.sqlite get_entries_all get_entries_read show \
    | tee "$1"-db-get.txt
    bench --db db.sqlite --query python 'search_entries_recent_*' \
    | tee "$1"-db-search-python.txt
    bench --db db.sqlite --query elon 'search_entries_recent_*' \
    | tee "$1"-db-search-elon.txt
}

For my deployment, I got (compare with #134 (comment)):

first 64 read entries: ~.17s (.03s less)
all read entries (1.9k): ~2.4s (.2s more :( )
all entries (11k): ~17s (3s less)

For #134.

lemon24 · 2020-11-06T20:02:53Z

get_entries() is the most used method, and returns most rows; in 6 hours I got a maybe-30% (but more likely ~10%) improvement.

This is probably enough for now, especially since this issue was an "exploratory" one, and is not solving a specific thing being too slow.

Although the improvement is not that big, the changes are likely still worth it (they added an additional 10 statements / 67 lines).

Resolving.

lemon24 added the core label Aug 24, 2019

lemon24 added a commit that referenced this issue Sep 15, 2019

Add more get_entries benchmarks.

70ca9d3

For #134.

lemon24 added a commit that referenced this issue Sep 15, 2019

Add get_entries_feed benchmark.

2d005be

For #134.

lemon24 added a commit that referenced this issue Sep 15, 2019

Use explicit join in the get_entries() query.

612cb29

For #134.

lemon24 mentioned this issue Sep 15, 2019

Use NOT NULL where appropriate #138

Closed

lemon24 added a commit that referenced this issue Sep 15, 2019

Don't check read and important are null.

183c0f5

For #138, #134.

lemon24 added a commit that referenced this issue Oct 11, 2019

Rewrite the get_entrie() query and add some benchmarks.

6965b0d

For #134.

lemon24 added a commit that referenced this issue Oct 11, 2019

Rewrite the get_entries() query and add some benchmarks.

5594ee3

For #134.

lemon24 mentioned this issue Feb 5, 2020

get_entries() query too complicated #123

Closed

lemon24 added a commit that referenced this issue Nov 5, 2020

Fix scripts/bench.py and add some basic tests for it.

ad318a0

For #134.

lemon24 added a commit that referenced this issue Nov 5, 2020

Fix scripts/bench.py and add some basic tests for it.

b48bce1

For #134.

lemon24 added a commit that referenced this issue Nov 5, 2020

bench.py: allow using an existing database.

4593b0f

For #134.

lemon24 added a commit that referenced this issue Nov 6, 2020

Allow bench.py to change the search query.

85bc2e3

For #134.

lemon24 added a commit that referenced this issue Nov 6, 2020

Use indexes (and a new query) for get_entries.

8f91988

For #134.

lemon24 added a commit that referenced this issue Nov 6, 2020

Add search_entries(sort='recent') benchmark.

da8d57b

For #134.

lemon24 added a commit that referenced this issue Nov 6, 2020

Make search_entries(sort='recent') work with the new apply_recent() v…

20b2a55

…ersion. For #134.

lemon24 added a commit that referenced this issue Nov 6, 2020

Make bench.py diff support zeroes.

887005d

For #134.

lemon24 added a commit that referenced this issue Nov 6, 2020

Changelog.

845cd6b

For #134.

lemon24 closed this as completed Nov 6, 2020

lemon24 added the performance label Dec 17, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use indexes #134

Use indexes #134

lemon24 commented Aug 24, 2019

lemon24 commented Sep 15, 2019

lemon24 commented Oct 30, 2020 •

edited

Loading

lemon24 commented Nov 5, 2020 •

edited

Loading

lemon24 commented Nov 6, 2020 •

edited

Loading

lemon24 commented Nov 6, 2020 •

edited

Loading

lemon24 commented Nov 6, 2020

Use indexes #134

Use indexes #134

Comments

lemon24 commented Aug 24, 2019

lemon24 commented Sep 15, 2019

lemon24 commented Oct 30, 2020 • edited Loading

lemon24 commented Nov 5, 2020 • edited Loading

lemon24 commented Nov 6, 2020 • edited Loading

lemon24 commented Nov 6, 2020 • edited Loading

lemon24 commented Nov 6, 2020

lemon24 commented Oct 30, 2020 •

edited

Loading

lemon24 commented Nov 5, 2020 •

edited

Loading

lemon24 commented Nov 6, 2020 •

edited

Loading

lemon24 commented Nov 6, 2020 •

edited

Loading