# Huge pages

Various powerpc platforms use non-standard page sizes, a.k.a. huge pages.

Details TBD 😺

## Book3s64

### hash 64k and hugepage 16M/16G

```mermaid
graph LR
    A[Full table covering 4PB]
    B[pgd entry covering 16TB]
    C[pud entry covering 16GB]
    D[huge pte entry covering 16G]
    E[pmd entry covering 16M]
    F[huge pte entry covering 16M]
    G[pte entry covering 64K]

    A --> B --> C & D
    C --> E & F
    E --> G
```

**Direct Map**
No Linux kernel page table. Hash page table entries are created directly with the 16M page size if available.
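
A minimal sketch of the idea, assuming the kernel's htab_bolt_mapping() interface (the helper itself is hypothetical, simplified from what hash__create_section_mapping() does):

```c
/*
 * Hypothetical sketch, not the kernel's actual code: bolt hash PTEs
 * for the linear map directly, preferring the 16M page size when the
 * MMU supports it, otherwise falling back to the linear page size.
 */
static int bolt_linear_mapping(unsigned long start, unsigned long end)
{
        int psize = mmu_psize_defs[MMU_PAGE_16M].shift ? MMU_PAGE_16M
                                                       : mmu_linear_psize;

        return htab_bolt_mapping(start, end, __pa(start),
                                 pgprot_val(PAGE_KERNEL), psize,
                                 mmu_kernel_ssize);
}
```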

**VMAP**
64K page size, as shown above, with a Linux page table.

**IOMAP**
64K page size, as shown above, with a Linux page table.

**VMEMMAP**
No Linux kernel page table. Hash page table entries are created directly with the 16M page size if available.

**64K Linux page size with 4K hash pte page size**
We support this by using a larger pte_page (level 0 page table). Some portion of the level 0 table is used to store the slot information of the 4K hash page table entries currently inserted into the hash page table (for more details see hash_64k.c:__hash_page_4k()).
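
A minimal sketch of the idea (the struct and helper names below are hypothetical, not the kernel's real_pte_t machinery; SZ_64K, SZ_4K and PTRS_PER_PTE are real kernel macros): a 64K Linux page contains 16 4K subpages, and the hash slot of each subpage currently in the hash table is kept alongside the Linux PTEs, in the second half of the enlarged level-0 page.

```c
#define SUBPAGES_PER_64K  (SZ_64K / SZ_4K)      /* 16 4K subpages per 64K page */

/* Hypothetical layout: one hash slot hint per 4K subpage. */
struct subpage_slots {
        unsigned char hidx[SUBPAGES_PER_64K];
};

/*
 * The slot array for PTE `index` lives in the doubled pte_page,
 * past the PTRS_PER_PTE Linux PTEs.
 */
static struct subpage_slots *pte_to_slots(unsigned long *pte_table, int index)
{
        struct subpage_slots *slots =
                (struct subpage_slots *)(pte_table + PTRS_PER_PTE);

        return &slots[index];
}
```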

### hash 4k and hugepage 16M/16G

```mermaid
graph LR
    A[Full table covering 64TB]
    B[pgd covering 128GB]
    C[hugepd covering 128GB]
    D[hugepte covering 16G]
    E[pud covering 256M]
    F[hugepd covering 256M]
    G[hugepte covering 16M]
    H[pmd covering 2M]
    I[pte covering 4K]

    A --> B & C
    C --> D
    B --> E & F
    F --> G
    E --> H --> I
```

### radix 64k/radix hugepage 2M/1G

```mermaid
graph LR
    A[Full table covering 4PB]
    B[pgd covering 512GB]
    C[pud covering 1GB]
    G[hugepte covering 1G]
    D[pmd covering 2M]
    E[pte covering 64K]
    F[hugepte covering 2M]

    A --> B --> C & G
    C --> D & F
    D --> E
```

**Direct Map**
The Linux page table is created with a page size limited by the memory block size (the unit of memory hotplug). I.e., if the memory block size is 256MB,
then the direct map will be created in the above format with a page size of 2M.
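
For illustration, a sketch of that size choice (the helper is hypothetical, not the kernel's actual radix mapping code; SZ_1G/SZ_2M/SZ_64K and IS_ALIGNED are real kernel macros):

```c
#include <linux/sizes.h>

/*
 * Hypothetical helper: choose the largest radix page size for the
 * direct map that keeps each mapping within one memory block, so a
 * block can later be hot-unplugged without splitting a larger mapping.
 */
static unsigned long direct_map_page_size(unsigned long addr,
                                          unsigned long memblock_size)
{
        if (memblock_size >= SZ_1G && IS_ALIGNED(addr, SZ_1G))
                return SZ_1G;
        if (memblock_size >= SZ_2M && IS_ALIGNED(addr, SZ_2M))
                return SZ_2M;
        return SZ_64K;  /* base page size for this configuration */
}
```

With a 256MB memory block size this picks 2M, matching the example above.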

**VMAP**
We support 2M/64K mappings.

**IOMAP**
We support 2M/64K mappings.

**VMEMMAP**
Can be mapped using a 2M or 64K page size. Altmap support needs to ensure that the vmemmap area falls within the altmap region.

### radix 4k/radix hugepage 2M/1G

```mermaid
graph LR
    A[Full table covering 4PB]
    B[pgd covering 512GB]
    C[pud covering 1GB]
    F[hugepte covering 1G]
    D[pmd covering 2M]
    G[hugepte covering 2M]
    E[pte covering 4K]

    A --> B --> C & F
    C --> D & G
    D --> E
```

## Book3s32

### hash 4K

```mermaid
graph LR
    A[Full table covering 4G]
    B[pgd covering 4M]
    C[pte covering 4K]

    A --> B --> C
```

**No HugeTLBFS/THP/sparse vmemmap and memory hotplug support**

**DirectMap using Block Address Translation (BAT) (from 128KB to 256MB)**
Details of BATs can be found in section 7.4 of https://github.com/linuxppc/public-docs/blob/main/ISA/PowerPC_Assembly_IBM_Programming_Environment_2.3.pdf
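
As a sketch of what such a mapping looks like (the register encoding below is simplified from the PEM; the field defines and the choice of DBAT2 are assumptions for illustration, while the kernel's real helper is setbat() in arch/powerpc/mm/book3s32/mmu.c):

```c
/*
 * Rough sketch of a 256MB DBAT mapping (simplified encoding, see
 * section 7.4 of the PEM linked above). BL is an 11-bit mask
 * selecting the block length: 0x000 = 128KB up to 0x7FF = 256MB.
 */
#define BAT_BL_256M     (0x7ffUL << 2)  /* block length field in BATU */
#define BAT_VS          0x2UL           /* valid in supervisor mode */
#define BAT_PP_RW       0x2UL           /* read/write access */

static void map_256M_with_bat(unsigned long virt, unsigned long phys)
{
        unsigned long batu = (virt & 0xfffe0000) | BAT_BL_256M | BAT_VS;
        unsigned long batl = (phys & 0xfffe0000) | BAT_PP_RW; /* WIMG = 0 */

        mtspr(SPRN_DBAT2U, batu);       /* BAT2 chosen arbitrarily here */
        mtspr(SPRN_DBAT2L, batl);
        isync();
}
```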

**memmap is also mapped via BAT (only supports FLATMEM)**

**vmap/iomap are mapped via the page table above, using a 4K page size.**

## nohash 64

### e5500/e6500 4K and hugepage 4M

**DirectMap** uses bolted TLB entries.

```mermaid
graph LR
    A[Full table covering 64T]
    B[pgd covering 128GB]
    C[pud covering 256M]
    D[hugepd covering 256M]
    E[pmd covering 2M]
    F[pte covering 4K]
    G[hugepte covering 4M]

    A --> B --> C & D
    C --> E --> F
    D --> G
```

## nohash 32

### e500

#### e500 4K and hugepage with large physical address support (PHYS_64BIT)

```mermaid
graph LR
    A[Full table covering 4G]
    B[pgd entry covering 2M]
    C[pte covering 4K]
    D[pgd entry covering 2M]
    E[pgd entry covering 2M]
    F[hugepd with one hugepte covering one 4M hugepage]
    G[pgd entry covering 2M]
    H[pgd entry covering 2M]
    I[pgd entry covering 2M]
    J[pgd entry covering 2M]
    K[pgd entry covering 2M]
    L[pgd entry covering 2M]
    M[pgd entry covering 2M]
    N[pgd entry covering 2M]
    O[hugepd with one hugepte covering one 16M hugepage]
    P[32 consecutive pgd entries each covering 2M]
    Q[hugepd with one hugepte covering one 64M hugepage]
    R[128 consecutive pgd entries each covering 2M]
    S[hugepd with one hugepte covering one 256M hugepage]
    T[512 consecutive pgd entries each covering 2M]
    U[hugepd with one hugepte covering one 1G hugepage]

    A --> B --> C
    A --> D & E --> F
    A --> G & H & I & J & K & L & M & N --> O
    A --> P --> Q
    A --> R --> S
    A --> T --> U
```
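
A minimal sketch of the pattern in the diagram above (the helper is hypothetical, and the hugepd marker bits are omitted): each PGD entry covers 2M, so a huge page of size sz occupies sz / 2M consecutive PGD entries, all pointing at the same single-entry hugepd.

```c
#include <linux/sizes.h>

/*
 * Hypothetical sketch: install the same hugepd pointer into all the
 * PGD slots spanned by one huge page. A 4M huge page fills 2 slots,
 * a 16M huge page fills 8, and a 1G huge page fills 512.
 */
static void set_e500_hugepd(pgd_t *pgdp, unsigned long hugepd_val,
                            unsigned long sz)
{
        unsigned long i, nr = sz / SZ_2M;       /* each PGD entry covers 2M */

        for (i = 0; i < nr; i++)
                pgdp[i] = __pgd(hugepd_val);    /* hugepd marker bits omitted */
}
```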

### 8xx

The 8xx supports 4 page sizes: 4k, 16k, 512k and 8M.

**Linear RAM Map**
8M pages complemented by 512K pages.

**VMAP/IOMAP**
512K page size

**HugeTLBFS**
512K

8xx uses a two-level tree. TLB misses are handled through HW pagewalk assistance, leading to the following constraints:

- Level 1 table (PGD) always has 1024 entries, each covering 4M, regardless of page size
- Level 2 table (PTE) always has 1024 entries, each covering 4K, regardless of page size

For 16K pages, there must always be 4 identical consecutive entries. They are flagged 16K so that once the TLB is loaded it covers the full page, but when a TLB miss is hit, the HW assistance will read the entry matching the 4K quarter of that 16K page that was hit.

For 512K pages, when the level 1 entry is flagged 512K, the HW assistance requires 8 entries spread across the level 2 table. However, that would require the entire page table to be filled with only 512K pages. In order to mix all page sizes in a page table, regular level 1 entries are used instead, so the HW assistance can hit anywhere in the page table and we need 128 (4K PAGE_SIZE) or 32 (16K PAGE_SIZE) identical consecutive PTEs. The SW flag in the PTE that marks a 512K page is then copied into the level 1 entry by the TLB miss handler while loading the TLB.
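
A minimal sketch of that replication (the helper is hypothetical; the real logic lives in the 8xx PTE accessors):

```c
/*
 * Hypothetical sketch: replicate one PTE over the consecutive level-2
 * slots a large page spans, so the HW assist finds a valid entry no
 * matter which 4K-indexed slot it reads. nr is 4 for a 16K page, and
 * 128 (4K base pages) or 32 (16K base pages) for a 512K page.
 */
static void set_replicated_ptes(pte_t *ptep, pte_t pte, unsigned int nr)
{
        unsigned int i;

        for (i = 0; i < nr; i++)
                ptep[i] = pte;
}
```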

For 8M pages, the level 1 entry is flagged 8M. This is not a problem because PGD entries cover 4M, so there can't be another type of page in the same coverage. The HW assistance will then hit at a single address for the level-2 entry, so we use a huge page directory (hugepd) that has a single entry. That hugepd is pointed to by two consecutive PGD entries.
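
A minimal sketch (the helper is hypothetical; _PMD_PAGE_8M is the 8xx flag name, and the remaining hugepd marker bits are omitted):

```c
/*
 * Hypothetical sketch: point the two consecutive PGD entries spanned
 * by an 8M page at the same single-entry hugepd, flagged 8M so the
 * HW assist loads it as one level-1 translation.
 */
static void set_8m_hugepd(pgd_t *pgdp, unsigned long hugepd_val)
{
        pgdp[0] = __pgd(hugepd_val | _PMD_PAGE_8M);
        pgdp[1] = __pgd(hugepd_val | _PMD_PAGE_8M);
}
```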

The mappings of these hugepage sizes are shown below.

#### 8xx 4K

```mermaid
graph LR
    A[Full table covering 4G]
    B[pgd entry covering 4M]
    C[pte covering one 4K page]
    D[4 consecutive pte covering one 16K hugepage]
    E[128 consecutive pte covering one 512K hugepage]
    F[pgd entry covering 4M]
    G[pgd entry covering 4M]
    H[hugepd with one hugepte covering one 8M hugepage]

    A --> B --> C & D & E
    A --> F & G --> H
```

#### 8xx 16K

```mermaid
graph LR
    A[Full table covering 4G]
    B[pgd entry covering 4M]
    C[pte covering one 16K page]
    E[32 consecutive pte covering one 512K hugepage]
    F[pgd entry covering 4M]
    G[pgd entry covering 4M]
    H[hugepd with one hugepte covering one 8M hugepage]

    A --> B --> C & E
    A --> F & G --> H
```

### 4xx

**No HugeTLBFS support**

#### 40x

**DirectMap**

```mermaid
graph LR
    A[Full table covering 4G]
    B[4 consecutive 4M pte entries covering one 16M largepage]

    A --> B
```

or

```mermaid
graph LR
    A[Full table covering 4G]
    B[huge pte entry covering 4M]

    A --> B
```

**memmap** is FLATMEM.

**vmap/iomap**

```mermaid
graph LR
    A[Full table covering 4G]
    B[pgd covering 4M]
    C[pte covering 4K]

    A --> B --> C
```

#### 44x

**DirectMap** 
No Linux page table; 256MB bolted TLB entries.
No HugeTLBFS support.

**memmap** is FLATMEM.

**vmap/iomap**

```mermaid
graph LR
    A[Full table covering 4G]
    B[pgd covering 4M]
    C[pte covering 4K]

    A --> B --> C
```

#### 4xx 4K

```mermaid
graph LR
    A[Full table covering 4G]
    B[pgd covering 4M]
    C[pte covering 4K]

    A --> B --> C
```

#### 44x 16K

```mermaid
graph LR
    A[Full table covering 4G]
    B[pgd covering 64M]
    C[pte covering 16K]

    A --> B --> C
```

#### 44x 64K

```mermaid
graph LR
    A[Full table covering 4G]
    B[pgd covering 1G]
    C[pte covering 64K]

    A --> B --> C
```

#### 44x 256K

```mermaid
graph LR
    A[Full table covering 4G]
    B[pgd covering 4G]
    C[pte covering 256K]

    A --> B --> C
```

#### 4xx with 64bit PTE