| 1 | +src/backend/access/heap/README.XID64 |
| 2 | + |
| 3 | +64-bit Transaction IDs (XID) |
| 4 | +============================ |
| 5 | + |
| 6 | +The limited number of 32-bit XIDs (N = 2^32) requires a vacuum freeze to |
| 7 | +prevent wraparound every N/2 transactions. This causes performance degradation |
| 8 | +due to the need to exclusively lock tables while they are being vacuumed. In |
| 9 | +each wraparound cycle, SLRU buffers are also truncated. |
| 10 | + |
| 11 | +With 64-bit XIDs, wraparound is effectively postponed to a very distant |
| 12 | +future. Even on a highly loaded system running 2^32 transactions per day, |
| 13 | +it would take 2^31 days before the first enforced "vacuum to prevent |
| 14 | +wraparound". Buffer truncation and routine vacuum are not enforced, so the DBA |
| 15 | +can schedule them independently, at the time of least system load when they |
| 16 | +are least critical for database performance. They can also be done less |
| 17 | +frequently (several times a year vs. every several days) on systems with |
| 18 | +transaction rates similar to those mentioned above. |
| 19 | + |
| 20 | +On-disk tuple and page format |
| 21 | +----------------------------- |
| 22 | + |
| 23 | +On-disk tuple format remains unchanged. 32-bit t_xmin and t_xmax store the |
| 24 | +lower parts of 64-bit XMIN and XMAX values. Each heap page has additional |
| 25 | +64-bit pd_xid_base and pd_multi_base, which are common to all tuples on a page. |
| 26 | +They are placed in the pd_special area: 16 bytes at the end of a heap page. |
| 27 | +Actual XMIN/XMAX for a tuple are calculated upon reading a tuple from a page |
| 28 | +as follows: |
| 29 | + |
| 30 | +XMIN = t_xmin + pd_xid_base. (1) |
| 31 | +XMAX = t_xmax + pd_xid_base/pd_multi_base. (2) |
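A minimal sketch of formulas (1) and (2), for illustration only. The struct and
function names below are hypothetical, not the patch's actual API. It assumes
that "pd_xid_base/pd_multi_base" in (2) means pd_multi_base is used when the
tuple's XMAX is a MultiXactId and pd_xid_base otherwise. Any handling of
special XID values (such as FrozenTransactionId) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two 64-bit bases kept in pd_special. */
typedef struct PageXidBases
{
    uint64_t pd_xid_base;
    uint64_t pd_multi_base;
} PageXidBases;

/* Formula (1): XMIN = t_xmin + pd_xid_base. */
static inline uint64_t
tuple_xmin64(const PageXidBases *page, uint32_t t_xmin)
{
    assert(page != NULL);
    return page->pd_xid_base + t_xmin;
}

/*
 * Formula (2): XMAX = t_xmax + pd_xid_base or pd_multi_base.
 * Assumption: the multixact base applies when XMAX is a MultiXactId.
 */
static inline uint64_t
tuple_xmax64(const PageXidBases *page, uint32_t t_xmax, bool xmax_is_multi)
{
    assert(page != NULL);
    uint64_t base = xmax_is_multi ? page->pd_multi_base : page->pd_xid_base;

    return base + t_xmax;
}
```

The same arithmetic applies to the in-memory case, with the bases taken from
the tuple's copies instead of the page.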
| 32 | + |
| 33 | +"Double XMAX" page format |
| 34 | +------------------------- |
| 35 | + |
| 36 | +On the first read of a heap page after pg_upgrade from a 32-bit XID PostgreSQL |
| 37 | +version, a 16-byte pd_special area should be added to the page. However, the |
| 38 | +page may not have space for it. In that case it is converted to a temporary |
| 39 | +format called "double XMAX". |
| 40 | + |
| 41 | +All tuples after pg_upgrade necessarily have xmin = FrozenTransactionId, so |
| 42 | +the tuple header's t_xmin field is not needed and we reuse it to store the |
| 43 | +upper 32 bits of the tuple's XMAX. |
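The packing described above can be sketched as follows. This is an
illustration under stated assumptions, not the patch's code: the function names
are hypothetical, and it assumes t_xmax keeps the lower 32 bits of XMAX while
t_xmin holds the upper 32 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rebuild a 64-bit XMAX from the two 32-bit header fields of a tuple. */
static inline uint64_t
double_xmax_load(uint32_t t_xmin, uint32_t t_xmax)
{
    /* Assumption: t_xmin holds the upper half, t_xmax the lower half. */
    return ((uint64_t) t_xmin << 32) | (uint64_t) t_xmax;
}

/* Split a 64-bit XMAX into the two 32-bit header fields of a tuple. */
static inline void
double_xmax_store(uint64_t xmax, uint32_t *t_xmin, uint32_t *t_xmax)
{
    assert(t_xmin != NULL && t_xmax != NULL);
    *t_xmin = (uint32_t) (xmax >> 32);  /* upper 32 bits */
    *t_xmax = (uint32_t) xmax;          /* lower 32 bits */
}
```

No page-level base is involved here, which is why this layout needs no
pd_special area.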
| 44 | + |
| 45 | +Double XMAX format is used only for full pages that don't have 16 bytes for |
| 46 | +pd_special, so such a page has no room for even a single new tuple. Insert and |
| 47 | +HOT update on double XMAX pages are impossible and not supported. We can only |
| 48 | +read or delete tuples on such a page. |
| 49 | + |
| 50 | +When we are able to prune a double XMAX page, it is converted to the |
| 51 | +general 64-bit XID page format, with all operations on its tuples supported. |
| 52 | + |
| 53 | +In-memory tuple format |
| 54 | +---------------------- |
| 55 | + |
| 56 | +In-memory tuple representation consists of two parts: |
| 57 | +- HeapTupleHeader from disk page (contains all heap tuple contents, not only |
| 58 | +header) |
| 59 | +- HeapTuple with additional in-memory fields |
| 60 | + |
| 61 | +HeapTuple for each tuple in memory stores t_xid_base/t_multi_base, copies of |
| 62 | +the page's pd_xid_base/pd_multi_base. With the tuple's 32-bit t_xmin and t_xmax |
| 63 | +from HeapTupleHeader they are used to calculate actual 64-bit XMIN and XMAX: |
| 64 | + |
| 65 | +XMIN = t_xmin + t_xid_base. (3) |
| 66 | +XMAX = t_xmax + t_xid_base/t_multi_base. (4) |
| 67 | + |
| 68 | +The downside of this is that we cannot use a tuple's XMIN and XMAX right away. |
| 69 | +We often need to re-read t_xmin and t_xmax, which may actually be read through |
| 70 | +a pointer into a page in shared buffers and can therefore be updated by any |
| 71 | +other backend. |
| 72 | + |
| 73 | +Update/delete with 64-bit XIDs and 32-bit t_xmin/t_xmax |
| 74 | +------------------------------------------------------- |
| 75 | + |
| 76 | +When we try to delete/update a tuple, we check that the new XMAX fits (2) for |
| 77 | +its page, i.e. that t_xmax will not exceed MaxShortTransactionId relative to |
| 78 | +the pd_xid_base/pd_multi_base of its page. |
| 79 | + |
| 80 | +If the current XID doesn't fit the range |
| 81 | +(pd_xid_base, pd_xid_base + MaxShortTransactionId) (5): |
| 82 | + |
| 83 | +- heap_page_prepare_for_xid() will try to increase pd_xid_base/pd_multi_base on |
| 84 | +a page and update all t_xmin/t_xmax of the other tuples on the page to |
| 85 | +correspond to the new pd_xid_base/pd_multi_base. |
| 86 | + |
| 87 | +- If that is impossible, it will try to prune and freeze tuples on the page. |
| 88 | + |
| 89 | +- If this is unsuccessful, it will throw an error. Normally this is very |
| 90 | +unlikely, but it can arise if there is a very old live transaction with an age |
| 91 | +of around 2^32. Basically, this behavior is similar to that of the vacuum to |
| 92 | +prevent wraparound when XIDs were 32-bit. The DBA should take care to avoid |
| 93 | +very long-lived transactions with an age close to 2^32. Transactions that |
| 94 | +live this long are most likely defunct anyway. |
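The range check (5) and the first step above (raising the base and rewriting
the 32-bit offsets) can be sketched roughly as below. This is not the real
heap_page_prepare_for_xid(): the SketchPage type, the function names, the value
used for MaxShortTransactionId, and the treatment of the range boundaries are
all assumptions made for illustration. Pruning, freezing, and pd_multi_base are
left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-in for MaxShortTransactionId; the real value may differ. */
#define SKETCH_MAX_SHORT_XID UINT32_MAX

/* Hypothetical page: one base plus the 32-bit offsets stored in its tuples. */
typedef struct SketchPage
{
    uint64_t xid_base;
    uint32_t offs[8];
    size_t   ntuples;
} SketchPage;

/* Range (5), read here as an open interval. */
static inline bool
xid_fits(uint64_t base, uint64_t xid)
{
    return xid > base && xid - base < SKETCH_MAX_SHORT_XID;
}

/*
 * Try to raise xid_base so that xid fits, keeping every stored 64-bit value
 * unchanged. Returns false when raising the base is not enough, i.e. when the
 * caller would have to fall back to prune/freeze or report an error.
 */
static inline bool
sketch_prepare_for_xid(SketchPage *page, uint64_t xid)
{
    assert(page != NULL);
    if (xid_fits(page->xid_base, xid))
        return true;
    if (xid <= page->xid_base)
        return false;           /* this sketch only raises the base */

    uint64_t new_base = xid - 1;    /* used as-is when the page is empty */

    if (page->ntuples > 0)
    {
        uint32_t min_off = UINT32_MAX;

        for (size_t i = 0; i < page->ntuples; i++)
            if (page->offs[i] < min_off)
                min_off = page->offs[i];
        assert(min_off >= 1);       /* stored values lie above the base */
        /* The base can rise only to just below the oldest stored XID. */
        new_base = page->xid_base + min_off - 1;
    }
    if (!xid_fits(new_base, xid))
        return false;

    uint32_t delta = (uint32_t) (new_base - page->xid_base);

    for (size_t i = 0; i < page->ntuples; i++)
        page->offs[i] -= delta;     /* same 64-bit value, new base */
    page->xid_base = new_base;
    return true;
}
```

The sketch shows why one very old transaction blocks the shift: the base cannot
move past the oldest XID still stored on the page.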
| 95 | + |
| 96 | +Insert with 64-bit XIDs and 32-bit t_xmin/t_xmax |
| 97 | +------------------------------------------------ |
| 98 | + |
| 99 | +On insert we check whether the current XID fits the range (5). If it doesn't: |
| 100 | + |
| 101 | +- heap_page_prepare_for_xid() will try to increase pd_xid_base so that t_xmin |
| 102 | +will not exceed MaxShortTransactionId. |
| 103 | + |
| 104 | +- If that is impossible, it will try to prune and freeze tuples on the page. |
| 105 | + |
| 106 | +Known issue: if pd_xid_base cannot be shifted to accommodate a tuple being |
| 107 | +inserted due to a very long-running transaction, we just throw an error. We |
| 108 | +neither try to insert the tuple into another page nor mark the current page as |
| 109 | +full. So, in this (unlikely) case, subsequent attempts to insert into the page |
| 110 | +'locked' by this very long-running transaction will keep failing with errors. |
| 111 | + |
| 112 | +Upgrade from 32-bit XID versions |
| 113 | +-------------------------------- |
| 114 | + |
| 115 | +pg_upgrade doesn't change the page format itself. It is done lazily afterwards. |
| 116 | + |
| 117 | +1. On the first read of a heap page, tuples on the page are repacked to free 16 |
| 118 | +bytes at the end of the page, possibly reclaiming space from dead tuples. |
| 119 | + |
| 120 | +2A. 16 bytes of pd_special are added if there is room for them. |
| 121 | + |
| 122 | +2B. The page is converted to "double XMAX" format if there is no room for |
| 123 | +pd_special. |
| 124 | + |
| 125 | +3. If a page is in double XMAX format after its first read, and vacuum (or |
| 126 | +micro-vacuum during a select query) can prune some tuples and free space for |
| 127 | +pd_special, prune_page will add pd_special and convert the page from double XMAX |
| 128 | +to the general 64-bit XID page format. |
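The format decisions in steps 2A, 2B and 3 can be summarized in a small sketch.
The enum, the constant name, and both functions are hypothetical and only model
the choice of format from the free space available. They do not touch real
pages.

```c
#include <assert.h>
#include <stddef.h>

/* Size of pd_special for the 64-bit XID page format, as stated above. */
#define SKETCH_XID64_SPECIAL_SIZE 16

typedef enum SketchPageFormat
{
    SKETCH_FMT_XID64,       /* general format, pd_special present */
    SKETCH_FMT_DOUBLE_XMAX  /* temporary format, no pd_special */
} SketchPageFormat;

/* Steps 2A/2B: pick the format after repacking on the first read. */
static inline SketchPageFormat
format_on_first_read(size_t free_bytes_after_repack)
{
    return free_bytes_after_repack >= SKETCH_XID64_SPECIAL_SIZE
        ? SKETCH_FMT_XID64
        : SKETCH_FMT_DOUBLE_XMAX;
}

/* Step 3: a double XMAX page is converted once pruning frees enough room. */
static inline SketchPageFormat
format_after_prune(SketchPageFormat current, size_t free_bytes_after_prune)
{
    assert(current == SKETCH_FMT_XID64 || current == SKETCH_FMT_DOUBLE_XMAX);
    if (current == SKETCH_FMT_DOUBLE_XMAX &&
        free_bytes_after_prune >= SKETCH_XID64_SPECIAL_SIZE)
        return SKETCH_FMT_XID64;
    return current;
}
```

Conversion goes one way only: a page in the general format never goes back to
double XMAX in this model.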