Avoid loading information about table-types at start #320
sumerman wants to merge 1 commit into elixir-ecto:master
Hello, @sumerman! This is your first Pull Request that will be reviewed by Ebert, an automatic Code Review service. It will leave comments on this diff with potential issues and style violations found in the code as you push new commits. You can also see all the issues found on this Pull Request on its review page. Please check our documentation for more information.
Hi @sumerman. That's a lot of tables! AFAIK we resolved the bootstrapping issue for people with tens of thousands, rather than hundreds of thousands. I can see this being a real problem for you! I am concerned that this approach may shift bootstrapping into more time-sensitive parts of other people's projects. Perhaps we could offer a switch so that people can opt into lazy bootstrapping: the initial bootstrap would be limited to the built-in OIDs, and further OIDs would be fetched as required when a prepare fails to find them (I think we call it reload). What do you think?
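
A minimal sketch of the opt-in lazy bootstrap described above, in Python rather than Elixir and not Postgrex's actual API. Assumptions: the catalog is simulated as a dict of OID to type name, `lazy_bootstrap` and `TypeCache` are hypothetical names, and "built-in" is approximated as OIDs below 16384 (PostgreSQL's first OID for user-created objects).

```python
# Hypothetical sketch, not Postgrex's API: `catalog` simulates pg_type as a
# dict of OID -> type name, and `lazy_bootstrap` is an invented option name.
FIRST_NORMAL_OID = 16384  # PostgreSQL gives user-created objects OIDs >= 16384


class TypeCache:
    def __init__(self, catalog, lazy_bootstrap=False):
        self.catalog = catalog
        self.lazy_bootstrap = lazy_bootstrap
        self.types = {}
        self.bootstrap()

    def bootstrap(self):
        # Eager mode copies every type; lazy mode copies only built-in ones.
        for oid, name in self.catalog.items():
            if not self.lazy_bootstrap or oid < FIRST_NORMAL_OID:
                self.types[oid] = name

    def missing(self, oids):
        # OIDs a prepared statement needs that the cache does not know yet.
        return sorted(set(oids) - set(self.types))

    def reload(self, oids):
        # Stand-in for a query that fetches only the requested OIDs.
        for oid in oids:
            if oid in self.catalog:
                self.types[oid] = self.catalog[oid]

    def lookup(self, oids):
        # On a miss, reload just the missing OIDs, then resolve every OID.
        needed = self.missing(oids)
        if needed:
            self.reload(needed)
        return [self.types[oid] for oid in oids]
```

With `lazy_bootstrap=True`, the handshake cost no longer grows with the number of user tables. The price is paid later, as one small reload for the first query that touches a user-defined type. Types created by extensions installed after `initdb` also fall above the threshold, so they would be loaded lazily as well.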
|
It's not a common workload indeed 😄 The current approach only postpones loading of table-types and their array counterparts, which are unlikely to ever hit the wire. (The only case I can think of is …) One way to make it strictly better is to always skip table-types unless an OID that caused a reload is of a table type. It would retain the current behavior for the most part, yet touch table types, the main source of bloat, only when absolutely necessary.
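
The "skip table-types unless an OID that caused a reload is of table type" rule can be sketched as a filter over simulated `pg_type` rows. Assumptions: `TypeRow`, `is_table_type` and `rows_to_load` are hypothetical names. Only ordinary tables (`relkind = 'r'`) count as table types here; whether views or partitioned tables should also count is left open.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class TypeRow:
    oid: int
    name: str
    relkind: Optional[str] = None  # relkind of typrelid; None if not a row type
    elem: int = 0                  # typelem: element type OID, 0 if none


def is_table_type(row: TypeRow, by_oid: Dict[int, TypeRow]) -> bool:
    # A table's row type, or an array whose element is a table's row type.
    if row.relkind == "r":
        return True
    elem = by_oid.get(row.elem)
    return elem is not None and elem.relkind == "r"


def rows_to_load(rows: List[TypeRow], trigger_oids: Iterable[int]) -> List[TypeRow]:
    by_oid = {row.oid: row for row in rows}
    # If any OID that caused the reload is a table type, keep the full load.
    if any(oid in by_oid and is_table_type(by_oid[oid], by_oid) for oid in trigger_oids):
        return list(rows)
    # Otherwise skip table types and their arrays, the main source of bloat.
    return [row for row in rows if not is_table_type(row, by_oid)]
```

The initial bootstrap passes no trigger OIDs, so it always takes the filtered path. A reload caused by, say, `_int4` stays filtered. Only a reload caused by a table's row type (or its array) falls back to the full load.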
|
@sumerman I think that sounds good. I think this also highlights that we could end up with an expensive bootstrap during a query: postgrex/lib/postgrex/protocol.ex, line 890 in 2c026cb … bootstrap) and bootstrap just the OIDs at query time (reload), I think that would be a very neat improvement, as we minimise handshaking and add the smallest possible overhead for a query that misses types. Would you like to handle it?
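
One way to read "bootstrap just the OIDs at query time" is as a fixed-point loop. First fetch the missing OIDs. Then fetch any types they reference that are still unknown (array element, domain base type, composite attribute types). Repeat until nothing new appears. This is a sketch under assumptions: `deps` is a hypothetical dict standing in for those catalog references, and each batch is imagined as one query such as `SELECT ... FROM pg_type WHERE oid = ANY($1)`.

```python
from typing import Dict, Iterable, List, Set


def fetch_batches(missing: Iterable[int], known: Set[int],
                  deps: Dict[int, List[int]]) -> List[List[int]]:
    """Return the OID batches to query, one round-trip per batch."""
    known = set(known)
    batch = sorted(set(missing) - known)
    batches = []
    while batch:
        batches.append(batch)
        known.update(batch)
        # Types referenced by this batch that the client still does not know.
        batch = sorted({d for oid in batch for d in deps.get(oid, []) if d not in known})
    return batches
```

The number of round-trips grows with the depth of the type dependencies rather than with the number of types in the database. For example, an array of a table's row type whose columns include a user-defined domain needs three small queries instead of a full bootstrap.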
|
|
@fishcakez I glanced thru |
In one of our analytical databases we have approx. 200k tables, so the result contains more than 400k types once you factor in the array types that Postgres creates automatically. As a result, the bootstrap statement runs long enough to cause all other connection processes that are waiting in `fetch` to exit, which in turn causes a restart of the pool's supervisor, so the type info is never obtained. While relaxing the pool's `max_restarts` helps to mask the problem, the bootstrap query still runs for tens of seconds and obstructs any useful work.

The proposed change avoids loading a potentially enormous chunk of data at start, but will load everything if a subsequent bootstrap is triggered (e.g. by `prepare`).

In the case of our app it cuts the initial bootstrap runtime from tens of seconds down to ~300ms, and subsequent bootstrap queries are avoided.