## Running the Fast & Fusiest Mapper (FFM)
This notebook shows how to run the Fast & Fusiest Mapper (FFM) on a full workload and
architecture.

We first initialize the specification. The spec is an `ff.Spec` object, loaded here from
YAML files (though you may also build it from Python objects).

When loading specifications, you can use Jinja2 templating in the YAML files; the
`jinja_parse_data` parameter passes data to the templating engine.

In [None]:
# < DOC_INCLUDE_MARKER > make_spec

import fastfusion as ff

# Set the number of parallel threads that the mapper can use. If you are running out of
# memory, you may decrease this number. By default the number of threads is set to the
# number of cores on your machine.
import os
ff.set_n_parallel_jobs(os.cpu_count(), print_message=True)

# Initialize the specification and show the workload.
BATCH_SIZE = 1
N_TOKENS = 16384
FUSE = True

spec = ff.Spec.from_yaml(
    "../../examples/arches/tpu_v4i_like.arch.yaml",
    "../../examples/workloads/gpt3_6.7B.workload.yaml",
    jinja_parse_data=dict(
        BATCH_SIZE=BATCH_SIZE,
        N_TOKENS=N_TOKENS,
    )
)

# Fusion happens when tensors bypass the outermost Memory object. To disable fusion,
# force all tensors to be kept in the outermost memory.
if not FUSE:
    for node in spec.arch.nodes:
        if isinstance(node, ff.arch.Memory):
            print(f'Keeping all tensors in {node.name}')
            node.constraints.tensors.keep = "All"
            break

Now we'll visualize the workload. The workload is a cascade of Einsums, with boxes
showing Einsums (computation steps), ovals showing tensors, and arrows showing
dependencies.

In [None]:
spec.workload

Next, we'll set optimization metrics for the mapper. Note that more metrics make the
mapper slower, because suboptimal mappings are harder to prune: the mapper must prove
that a mapping is Pareto-dominated across all metrics before discarding it.
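
The pruning argument above can be illustrated with a small, library-independent sketch. The candidate values and the helper names (`dominates`, `pareto_front`) are made up for illustration and are not part of fastfusion; the point is only that the same candidates leave more survivors when a second metric is added.

```python
def dominates(a, b):
    """Return True if `a` is no worse than `b` in every metric and strictly
    better in at least one (lower is better for all metrics)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (energy, latency) values for four candidate mappings.
candidates = [(1.0, 9.0), (2.0, 5.0), (3.0, 6.0), (4.0, 1.0)]

# With energy as the only metric, a single candidate survives.
energy_only = pareto_front([(energy,) for energy, _ in candidates])

# With energy and latency together, three of the four candidates survive.
energy_and_latency = pareto_front(candidates)

print(len(energy_only), len(energy_and_latency))
```

In this toy case only `(3.0, 6.0)` can be discarded under two metrics, since `(2.0, 5.0)` beats it on both.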

In [3]:
# Set optimization metrics
spec.mapper.ffm.metrics = ff.mapper.FFM.Metrics.ENERGY
# spec.mapper.ffm.metrics = ff.mapper.FFM.Metrics.LATENCY
# spec.mapper.ffm.metrics = ff.mapper.FFM.Metrics.LATENCY | ff.mapper.FFM.Metrics.ENERGY
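
The `|` in the last commented-out line suggests that metrics combine like bit flags. The sketch below assumes that behavior (this notebook does not confirm how `ff.mapper.FFM.Metrics` is implemented) and uses a hypothetical stand-in class built on the standard library's `enum.Flag` to show how a combined value differs from a single metric.

```python
from enum import Flag, auto


class Metrics(Flag):
    """Hypothetical stand-in for illustration; not fastfusion's class."""
    LATENCY = auto()
    ENERGY = auto()


combined = Metrics.LATENCY | Metrics.ENERGY

# Membership checks whether a single metric is part of the combination.
print(Metrics.ENERGY in combined)

# Equality compares the whole combination, so a combined value is not equal
# to either of its members.
print(combined == Metrics.ENERGY)
```

If the real class behaves this way, an `==` comparison against a single metric only matches when exactly that one metric is selected.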

<!-- < DOC_INCLUDE_MARKER > FFM_parts -->

The mapper consists of two parts:

- The Turbo-Charged Pmapper: This part makes all Pareto-optimal pmappings for all
  Einsums.
- Fast and Fusiest: This part takes the Pareto-optimal pmappings and joins them into
  full mappings.
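
As a rough mental model of the two parts, here is a toy sketch in plain Python. It is not the library's algorithm: the data, the notion of a "compatibility key", and the helpers `prune` and `join` are all illustrative assumptions. The first phase prunes candidates for each Einsum independently, and the second phase joins the surviving candidates that agree with each other.

```python
from itertools import product

# Hypothetical candidates per Einsum: (compatibility_key, energy).
candidates = {
    "einsum_a": [("tiled", 5.0), ("tiled", 3.0), ("untiled", 4.0)],
    "einsum_b": [("tiled", 2.0), ("untiled", 6.0), ("untiled", 0.5)],
}


def prune(pmappings):
    """Phase 1: keep the lowest-energy pmapping for each compatibility key."""
    best = {}
    for key, energy in pmappings:
        if key not in best or energy < best[key]:
            best[key] = energy
    return sorted(best.items())


def join(pruned):
    """Phase 2: pick one pmapping per Einsum, allowing only matching keys,
    and return the combination with the lowest total energy."""
    full_mappings = []
    for combo in product(*pruned.values()):
        keys = {key for key, _ in combo}
        if len(keys) == 1:
            total = sum(energy for _, energy in combo)
            full_mappings.append((keys.pop(), total))
    return min(full_mappings, key=lambda mapping: mapping[1])


pruned = {name: prune(pm) for name, pm in candidates.items()}
best_mapping = join(pruned)
print(pruned)
print(best_mapping)
```

Note that the toy keeps the `untiled` candidate for `einsum_a` even though it is not that Einsum's cheapest option, and it ends up in the best full mapping. This is why pruning has to happen per compatibility group rather than globally in this model.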

Mapping begins with the Turbo-Charged Pmapper with the `make_pmappings` function.

In [None]:
# < DOC_INCLUDE_MARKER > make_pmappings

# Limits the number of fused loops that can exist in a single pmapping. Commenting this
# out makes the mapper slower, but may generate better mappings.
spec.mapper.ffm.max_fused_loops = 1

pmappings = ff.mapper.FFM.make_pmappings(
    spec,
    # Having can_combine_multiple_runs=False is faster, so it should generally be set to
    # False. If it is set to True, then you may run make_pmappings multiple times with
    # compatible specs and combine them:
    #   pmappings = make_pmappings(*args_a) | make_pmappings(*args_b)
    can_combine_multiple_runs=False
)

In [None]:
# < DOC_INCLUDE_MARKER > pmappings_stats

# Output some stats about the generated pmappings.
print(f"Total number of pmappings: {pmappings.n_total_pmappings()}")
print(f"Number of valid pmappings: {pmappings.n_valid_pmappings()}")
print(f"Number of Pareto-optimal pmappings: {pmappings.n_pareto_optimal_pmappings()}")
print(f"Number of evaluated pmappings: {pmappings.n_evaluated_pmappings()}")

In [None]:
# < DOC_INCLUDE_MARKER > join_pmappings

# Join the pmappings to create a full mapping.
mappings = ff.mapper.FFM.join_pmappings(spec, pmappings)

In [None]:
# The joined pmappings object contains a DataFrame of all Pareto-optimal mappings for
# the given optimization metrics. Since we're only optimizing one metric, this should
# have exactly one row, but we'll grab index 0 to be sure.
mapping = mappings[0]

# All units are SI units: seconds, joules, meters, etc.
print("Totals:")

# The access method selects all columns whose names include "Total" as a substring.
for k, v in mapping.access("Total").to_dict().items():
    print(f"\t{k}: {v}")
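
The substring-based selection described in the comment above can be pictured with a plain dictionary. This is only a sketch of the idea: the column names, the values, and the `select_columns` helper are hypothetical and do not reflect the real mapping object's columns or implementation.

```python
# Hypothetical column names and values, for illustration only.
row = {
    "Total energy": 1.5e-3,
    "Total latency": 2.0e-4,
    "MainMemory energy": 1.0e-3,
}


def select_columns(columns, substring):
    """Keep the entries whose column name contains `substring`."""
    return {name: value for name, value in columns.items() if substring in name}


totals = select_columns(row, "Total")
for name, value in totals.items():
    print(f"\t{name}: {value}")
```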

In [None]:
# Show the mapping.
mapping

In [None]:
accessor = "latency" if spec.mapper.ffm.metrics == ff.mapper.FFM.Metrics.LATENCY else "energy"
per_compute = mapping.access("Total").per_compute().to_dict()[accessor]
print(f'Per-compute {accessor}: {per_compute}')

print(f'Contributors to {accessor}:')
for k, v in mapping.access(accessor).to_dict().items():
    print(f"\t{k}: {v}")

# Print all stats for the mapping.
for k, v in mapping.to_dict().items():
    print(f"{k}: {v}")