# FEASTA

This notebook reproduces the salient characteristics of the [FEASTA](https://dl.acm.org/doi/10.1145/3620666.3651336) accelerator.

## Imports

Import the necessary modules.

In [None]:
# HiFiber boilerplate

from fibertree_bootstrap import *

fibertree_bootstrap(style="tree", animation='movie')

# Compilation boilerplate

import os
import sys
sys.path.insert(0, "..")

from src import utils

## Initialization

Initialize the input tensors. Tensor shapes and densities can be modified below.

**Warning:** Large tensors will overwhelm the video generation. Either:
1. Use small tensors; as a rule of thumb, the kernel should require fewer than 60 compute operations (e.g., multiplications).
2. Do not generate a video; remove the `spacetime` specification from the `mapping` before compiling.
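Every effectual multiplication pairs a nonzero `A[m, k]` with a nonzero `B[n, k]`, so the compute count is the same in all three modes and can be estimated before compiling. Below is a minimal sketch on plain Python lists. `count_effectual_multiplies` is a hypothetical helper and is not part of `src.utils` or fibertree.

```python
def count_effectual_multiplies(A_rows, B_rows):
    """Count products A[m][k] * B[n][k] where both operands are nonzero.

    A_rows is an M x K list of lists; B_rows is an N x K list of lists
    (the same [M, K] / [N, K] layout as A_MK and B_NK).
    """
    K = len(A_rows[0])
    total = 0
    for k in range(K):
        nnz_a = sum(1 for row in A_rows if row[k] != 0)  # nonzeros in column k of A
        nnz_b = sum(1 for row in B_rows if row[k] != 0)  # nonzeros in column k of B
        total += nnz_a * nnz_b                           # every pair meets once
    return total


A_small = [[1, 0, 2], [0, 3, 0]]
B_small = [[4, 0, 0], [0, 5, 6], [7, 0, 8]]
print(count_effectual_multiplies(A_small, B_small))  # prints 5
```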

In [None]:
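# Matrix dimensions: A is M x K and B is N x K
K = 6
M = 7
N = 12

# Partition occupancies used by uniform_occupancy(A.FP) and uniform_occupancy(B.DP) in the mappings below
FP = 2
DP = 4

# One density per rank (in rank order), passed to Tensor.fromRandom
density = [0.8, 0.5]
seed = 0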

A_MK = Tensor.fromRandom(rank_ids=["M", "K"], shape=[M, K], seed=seed, density=density, name="A")
B_NK = Tensor.fromRandom(rank_ids=["N", "K"], shape=[N, K], seed=seed + 1, density=density, name="B")

A_KM = A_MK.swizzleRanks(["K", "M"])
B_KN = B_NK.swizzleRanks(["K", "N"])
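`swizzleRanks` returns a tensor with the same nonzeros in a different rank order; here it produces `A_KM` and `B_KN`, which match the rank orders of the row-wise and outer product mappings. As a rough illustration only (this is not fibertree's implementation), a rank swizzle over a coordinate dictionary can be sketched as follows, where `swizzle_ranks` is a hypothetical helper:

```python
def swizzle_ranks(coo, rank_ids, new_order):
    """Reorder a sparse tensor stored as {coordinate tuple: value}.

    rank_ids names the current ranks (e.g. ["M", "K"]) and new_order the
    desired ones (e.g. ["K", "M"]). The result is keyed by the permuted
    coordinates and sorted so that the new top rank varies slowest.
    """
    perm = [rank_ids.index(r) for r in new_order]
    swizzled = {tuple(coord[i] for i in perm): value for coord, value in coo.items()}
    return dict(sorted(swizzled.items()))


coo_MK = {(0, 1): 3, (0, 4): 5, (2, 1): 7}
coo_KM = swizzle_ranks(coo_MK, ["M", "K"], ["K", "M"])
print(coo_KM)  # {(1, 0): 3, (1, 2): 7, (4, 0): 5}
```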

## Compile and Run

Below are the TeAAL specifications for FEASTA's three modes. To simulate the accelerator in a given mode:
1. Compile the specification to HiFiber by running its cell, which inserts a new cell containing the generated code.
2. Run the new cell, which will:
    - Execute the kernel, multiplying the matrices defined above
    - Generate visualizations of the kernel's actions

#### Notes

- Small tensors are required for video generation. If you are using large tensors, remove the spacetime specification to generate a kernel that does not produce videos. Outputs can still be checked below.
- This notebook does not represent FEASTA's out-of-order work-scheduling across instructions (called "group parallelism" or "GP"), since the paper does not provide enough information to reconstruct its exact behavior.

### Inner Product Mode

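FEASTA supports three modes, the first of which is the inner product mode. The K ranks are innermost in the loop order, so each output `Z[m, n]` is computed by intersecting the K fiber of `A[m]` with the K fiber of `B[n]`.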
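The loop order `[M, N1, N0, K1, K0]` can be read as an ordinary loop nest. Below is a minimal sequential sketch under simplifying assumptions: matrices are dictionaries of nonempty fibers (`{m: {k: value}}`), `uniform_occupancy` is modeled as splitting a sorted coordinate list into chunks of `FP` or `DP` nonzeros, and the `space` ranks `K0` and `N0`, which the hardware processes in parallel, run one after another here. The names `chunks` and `inner_product` are hypothetical.

```python
def chunks(coords, size):
    """Split a sorted coordinate list into chunks of `size` nonzeros
    (a simplified model of uniform_occupancy partitioning)."""
    return [coords[i:i + size] for i in range(0, len(coords), size)]


def inner_product(A_MK, B_NK, FP, DP):
    """Z[m][n] = sum_k A[m][k] * B[n][k] with loop order [M, N1, N0, K1, K0].

    A_MK maps m -> {k: value}; B_NK maps n -> {k: value}.
    """
    Z = {}
    # N partitioned with leader B: DP nonempty rows of B per chunk
    n_chunks = chunks(sorted(B_NK), DP)
    for m in sorted(A_MK):                        # M  (time)
        a_fiber = A_MK[m]
        # K partitioned with leader A: FP nonzeros of A[m] per chunk
        k_chunks = chunks(sorted(a_fiber), FP)
        for n_chunk in n_chunks:                  # N1 (time)
            for n in n_chunk:                     # N0 (space, sequential here)
                b_fiber = B_NK[n]
                for k_chunk in k_chunks:          # K1 (time)
                    for k in k_chunk:             # K0 (space, sequential here)
                        if k in b_fiber:          # intersect A[m] and B[n] on K
                            row = Z.setdefault(m, {})
                            row[n] = row.get(n, 0) + a_fiber[k] * b_fiber[k]
    return Z


A_rows = {0: {0: 1, 2: 2}, 1: {1: 3}}
B_rows = {0: {0: 4}, 1: {1: 5, 2: 6}, 2: {0: 7, 2: 8}}
print(inner_product(A_rows, B_rows, FP=1, DP=2))
# {0: {0: 4, 1: 12, 2: 23}, 1: {1: 15}}
```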

In [None]:
yaml = """
# Inner product mode
einsum:
  declaration:
    A: [K, M]
    B: [K, N]
    Z: [M, N]
  expressions:
    - Z[m, n] = A[k, m] * B[k, n]
mapping:
  rank-order:
    A: [M, K]
    B: [N, K]
    Z: [M, N]
  partitioning:
    Z:
      K: [uniform_occupancy(A.FP)]
      N: [uniform_occupancy(B.DP)]
  loop-order:
    Z: [M, N1, N0, K1, K0]
  spacetime:
    Z:
      space: [K0, N0]
      time: [M, N1, K1]
"""

utils.compile(yaml)

### Check Results

Check that the generated code computes the correct result.

**Note**: Run this cell only after compiling the specification and running the generated kernel cell above.
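`utils.check_matmul` comes from this repository's `src` package and its internals are not shown here. A check of this kind typically compares the kernel's output with a reference product. The sketch below illustrates that idea under assumed inputs (dense lists of lists for A and B, and a nested dictionary `{m: {n: value}}` for the output); `reference_matmul` and `matches_reference` are hypothetical names, not the `src.utils` API.

```python
import math


def reference_matmul(A_MK, B_NK):
    """Dense reference for Z[m][n] = sum_k A[m][k] * B[n][k]."""
    return [[sum(a * b for a, b in zip(a_row, b_row)) for b_row in B_NK]
            for a_row in A_MK]


def matches_reference(Z, A_MK, B_NK):
    """Compare a sparse result {m: {n: value}} against the dense reference.

    Coordinates missing from Z are treated as zeros.
    """
    ref = reference_matmul(A_MK, B_NK)
    return all(
        math.isclose(Z.get(m, {}).get(n, 0), ref[m][n], abs_tol=1e-9)
        for m in range(len(ref))
        for n in range(len(ref[m]))
    )


A_dense = [[1, 0, 2], [0, 3, 0]]
B_dense = [[4, 0, 0], [0, 5, 6], [7, 0, 8]]
print(matches_reference({0: {0: 4, 1: 12, 2: 23}, 1: {1: 15}}, A_dense, B_dense))  # True
```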

In [None]:
utils.check_matmul(A_MK, B_NK, Z_MN)

### Row-Wise Mode

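FEASTA supports three modes, the second of which is the row-wise mode. For each row m of A, every nonzero `A[m, k]` selects row k of B, which is scaled by `A[m, k]` and accumulated into row m of Z.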
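The loop order `[M, K1, K0, N1, N0]` can be sketched the same way. This is a minimal sequential model under assumptions: matrices are dictionaries of nonempty fibers (`A` as `{m: {k: value}}`, `B` as `{k: {n: value}}`), `uniform_occupancy` is modeled as chunking sorted coordinates into groups of `FP` or `DP` nonzeros, and the `space` ranks `K0` and `N0` run one after another instead of in parallel. The names `chunks` and `row_wise` are hypothetical.

```python
def chunks(coords, size):
    """Split a sorted coordinate list into chunks of `size` nonzeros
    (a simplified model of uniform_occupancy partitioning)."""
    return [coords[i:i + size] for i in range(0, len(coords), size)]


def row_wise(A_MK, B_KN, FP, DP):
    """Z[m][n] = sum_k A[m][k] * B[k][n] with loop order [M, K1, K0, N1, N0].

    A_MK maps m -> {k: value}; B_KN maps k -> {n: value}.
    """
    Z = {}
    for m in sorted(A_MK):                                  # M  (time)
        a_fiber = A_MK[m]
        row = {}
        # K partitioned with leader A: FP nonzeros of A[m] per chunk
        for k_chunk in chunks(sorted(a_fiber), FP):         # K1 (time)
            for k in k_chunk:                               # K0 (space, sequential here)
                b_fiber = B_KN.get(k, {})                   # row k of B selected by A[m][k]
                # N partitioned with leader B: DP nonzeros of B[k] per chunk
                for n_chunk in chunks(sorted(b_fiber), DP): # N1 (time)
                    for n in n_chunk:                       # N0 (space, sequential here)
                        row[n] = row.get(n, 0) + a_fiber[k] * b_fiber[n]
        if row:
            Z[m] = row
    return Z


A_rows = {0: {0: 1, 2: 2}, 1: {1: 3}}
B_by_k = {0: {0: 4, 2: 7}, 1: {1: 5}, 2: {1: 6, 2: 8}}
print(row_wise(A_rows, B_by_k, FP=1, DP=2))
# {0: {0: 4, 2: 23, 1: 12}, 1: {1: 15}}
```

The output rows are built by scattered updates, which is why the entries of row 0 appear in the order in which B's rows first touch them.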

In [None]:
yaml = """
# Row-wise mode
einsum:
  declaration:
    A: [K, M]
    B: [K, N]
    Z: [M, N]
  expressions:
    - Z[m, n] = A[k, m] * B[k, n]
mapping:
  rank-order:
    A: [M, K]
    B: [K, N]
    Z: [M, N]
  partitioning:
    Z:
      K: [uniform_occupancy(A.FP)]
      N: [uniform_occupancy(B.DP)]
  loop-order:
    Z: [M, K1, K0, N1, N0]
  spacetime:
    Z:
      space: [K0, N0]
      time: [M, K1, N1]
"""

utils.compile(yaml)

### Check Results

Check that the generated code computes the correct result.

**Note**: Run this cell only after compiling the specification and running the generated kernel cell above.

In [None]:
utils.check_matmul(A_MK, B_KN, Z_MN)

### Outer Product Mode

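FEASTA supports three modes, the third of which is the outer product mode. K is the outermost rank, so each k contributes the outer product of `A[k, :]` and `B[k, :]`, and these partial products are accumulated into Z.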
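The loop order `[K, M1, M0, N1, N0]` corresponds to the loop nest sketched below. This is a minimal sequential model under assumptions: both inputs are dictionaries of nonempty fibers keyed by k (`{k: {m: value}}` and `{k: {n: value}}`), `uniform_occupancy` is modeled as chunking sorted coordinates into groups of `FP` or `DP` nonzeros, and the `space` ranks `M0` and `N0` run one after another instead of in parallel. The names `chunks` and `outer_product` are hypothetical.

```python
def chunks(coords, size):
    """Split a sorted coordinate list into chunks of `size` nonzeros
    (a simplified model of uniform_occupancy partitioning)."""
    return [coords[i:i + size] for i in range(0, len(coords), size)]


def outer_product(A_KM, B_KN, FP, DP):
    """Z[m][n] = sum_k A[k][m] * B[k][n] with loop order [K, M1, M0, N1, N0].

    A_KM maps k -> {m: value}; B_KN maps k -> {n: value}.
    """
    Z = {}
    for k in sorted(A_KM.keys() & B_KN.keys()):             # K  (time), intersect on K
        a_fiber, b_fiber = A_KM[k], B_KN[k]
        # M partitioned with leader A: FP nonzeros of A[k] per chunk
        for m_chunk in chunks(sorted(a_fiber), FP):         # M1 (time)
            for m in m_chunk:                               # M0 (space, sequential here)
                # N partitioned with leader B: DP nonzeros of B[k] per chunk
                for n_chunk in chunks(sorted(b_fiber), DP): # N1 (time)
                    for n in n_chunk:                       # N0 (space, sequential here)
                        row = Z.setdefault(m, {})
                        row[n] = row.get(n, 0) + a_fiber[m] * b_fiber[n]
    return Z


A_by_k = {0: {0: 1}, 1: {1: 3}, 2: {0: 2}}
B_by_k = {0: {0: 4, 2: 7}, 1: {1: 5}, 2: {1: 6, 2: 8}}
print(outer_product(A_by_k, B_by_k, FP=1, DP=2))
# {0: {0: 4, 2: 23, 1: 12}, 1: {1: 15}}
```

Each k produces a full partial-product matrix, so an output such as `Z[0][2]` is updated once per contributing k (here k = 0 and k = 2).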

In [None]:
yaml = """
# Outer product mode
einsum:
  declaration:
    A: [K, M]
    B: [K, N]
    Z: [M, N]
  expressions:
    - Z[m, n] = A[k, m] * B[k, n]
mapping:
  rank-order:
    A: [K, M]
    B: [K, N]
    Z: [M, N]
  partitioning:
    Z:
      M: [uniform_occupancy(A.FP)]
      N: [uniform_occupancy(B.DP)]
  loop-order:
    Z: [K, M1, M0, N1, N0]
  spacetime:
    Z:
      space: [M0, N0]
      time: [K, M1, N1]
"""

utils.compile(yaml)

### Check Results

Check that the generated code computes the correct result.

**Note**: Run this cell only after compiling the specification and running the generated kernel cell above.

In [None]:
utils.check_matmul(A_KM, B_KN, Z_MN)