20200507 DB Q&A Notes

Jump to bottom

John edited this page May 14, 2020 · 6 revisions

Attendees:

Joel Henderson
John Ritchey
Adam Kendis
Jake Mensch
John Darragh
Cynthia Kiser
Tim Eccleston

Resources:

Meeting Video Recording
PG Performance Test
Postgres SQL Range Types
Intro to materialized views:
PG Pass
Allows for Env File auto look at startup

Purpose(s) of the meeting:

Background: 3 teams were looking for help with their postgres DB. We reached out to CFA and got Tim and Joel (with H4LA) to advise our devs.
311 - Needs help improving DB performance to have much faster response times for user requests prior to beta release in time for meeting with the mayor. We only write to the database once per day. The user only reads from the database. 32 fields. 5.2M rows. Only use 9 of those fields. User Submits filters. When broad enough, can take 10-15 seconds for a typical query. Redis running. Looking for strategies to improve read times.
Food Oasis - Needs some help with postgres. John has a lot of experience with DB Administration
Civic Tech Index - Needs to setup a database for the project.

Notes:

Make DB Smaller / (Slower for quick response user response times)

Normalize our 1 table
Remove unused cols

311-Speed up Solutions

Postgres is a great transactional DB. For Speed, do more with Redis or similarly designed solution. Build things as ready to serve as possible:

Pre-Join everything together

Have redundant data and pull out one doc at a time

Pagination

1st Filter Set needs to hit Postgres
Faster Open Source Alternatives to Postgres: Redis / MongoDB.
Date Range Queries are very slow. Make a clustered index that starts with that date field.
Make dates a day or week number as an index so searching is faster. (strings, vchar etc. are very slow. Needs to be enum. Will be 10x faster)
Have pages for most common queries

Food Oasis Q & A

PG Performance Test

Joel, using a typical MacBook, typical PG 12, and I generated a demo table and data. A few findings are below.

No meaningful speed difference between using categories as text vs. categories as enum. I’m presuming that the categories are generally short words. Adding an index gave 4x speed.
The PG time comparison functions are even slower than I thought. Adding an index to the timestamp column gave a 30x improvement.
Pre-calculating a column “recent_week” and its index, as we discussed, was 10x /worse/ than just using the timestamp field and its index. EXPLAIN shows PG ignores the recent_week index, instead doing a full scan.