# kernel_scan Quick Start

This notebook demonstrates how to use the kernel_scan library to profile GEMM operations and analyze performance with roofline models.

In [1]:
import sys
from pathlib import Path

# Add the src directory to sys.path to import kernel_scan
# Note: This is only needed when running from the repository
project_root = Path.cwd().parent.parent.parent
src_path = project_root / "src"
sys.path.append(str(src_path))

from kernel_scan.core.logging import configure_logging, get_logger
from kernel_scan.api.operations.gemm import GemmScan, GemmPlotter
from kernel_scan.core.types import DataType, EngineType

# Configure logging with desired level
configure_logging(level="debug")
log = get_logger("quickstart")

## Configure and Run a GEMM Scan

We'll configure a simple GEMM scan with:
- Data Type: FLOAT16
- M values: 1, 2
- N values: 64, 128
- K = N for all test cases
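
To make the resulting test matrix concrete, the sketch below enumerates the M/N combinations in plain Python without calling kernel_scan. It assumes the scan takes the Cartesian product of the M and N values and counts 2·M·N·K floating-point operations per GEMM. The helper name `enumerate_gemm_cases` is hypothetical and not part of the library.

```python
from itertools import product


def enumerate_gemm_cases(m_values, n_values):
    """Return one (M, N, K, FLOPs) tuple per M/N combination, with K = N.

    Hypothetical helper for illustration; not part of kernel_scan.
    """
    cases = []
    for m, n in product(m_values, n_values):
        k = n  # mirrors with_k_equals_n()
        flops = 2 * m * n * k  # one multiply and one add per inner-product term
        cases.append((m, n, k, flops))
    return cases


cases = enumerate_gemm_cases([1, 2], [64, 128])
for m, n, k, flops in cases:
    print(f"M={m:<4} N={n:<4} K={k:<4} FLOPs={flops}")
```

Two M values and two N values give four cases, which agrees with the "Total test cases: 4" line in the scan's log output.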

In [2]:
# Initialize the GemmScan object with default parameters
scan = GemmScan()

# Configure and run the scan using the fluent API
results = (
    scan.with_data_types([DataType.FLOAT16])
    .for_n_values([64, 128])
    .for_m_values([1, 2])  # extend for a larger sweep, e.g. [1, 2, 4, 8, 16, 32, 64, 128, 256]
    .with_k_equals_n()  # This sets K = N for all test cases
    .with_engine_type(EngineType.COMPOSABLE_KERNEL)
    .iterations(10)
    .warmup(5)
    .run()
)

2025-06-25 09:29:29,134 - kernel_scan.gemm - INFO - Created base output directory: results/gemm_scan_20250625_092929
2025-06-25 09:29:29,134 - kernel_scan.gemm - INFO - Created plots directory: results/gemm_scan_20250625_092929/plots
2025-06-25 09:29:29,134 - kernel_scan.gemm - INFO - 
2025-06-25 09:29:29,134 - kernel_scan.gemm - INFO - GEMM TEST CASE MATRIX SUMMARY
2025-06-25 09:29:29,135 - kernel_scan.gemm - INFO - Total test cases: 4
2025-06-25 09:29:29,135 - kernel_scan.gemm - INFO - Data types: FLOAT16
2025-06-25 09:29:29,135 - kernel_scan.gemm - INFO - Cases per data type: 4
2025-06-25 09:29:29,135 - kernel_scan.gemm - INFO - 
------------------------------------------------------------
2025-06-25 09:29:29,135 - kernel_scan.gemm - INFO - TEST CASES FOR FLOAT16
2025-06-25 09:29:29,136 - kernel_scan.gemm - INFO - ------------------------------------------------------------
2025-06-25 09:29:29,136 - kernel_scan.gemm - INFO - Index  M        N        K        FLOPS        Matrix Size

In [3]:
print(results)

{'FLOAT16': [<kernel_scan.core.results.ProfileResultSet object at 0x701d8117ed50>, <kernel_scan.core.results.ProfileResultSet object at 0x701d811ba060>, <kernel_scan.core.results.ProfileResultSet object at 0x701d811be030>, <kernel_scan.core.results.ProfileResultSet object at 0x701d81185f50>]}


## Generate Roofline Plots

Now we'll use the new GemmPlotter class to generate roofline plots for our results.

In [4]:
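# Generate roofline plots for every result set, keeping each returned
# figure collection instead of overwriting it on the next iteration
all_figures = []
for data_type, result_sets in results.items():
    for result_set in result_sets:
        figures = GemmPlotter.generate_roofline_plots(
            result_set, scan.profiler.accelerator_spec
        )
        all_figures.append(figures)

# Uncomment to display the figures directly in the notebook
# for figures in all_figures:
#     for precision_name, fig in figures.items():
#         fig.show()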

AttributeError: 'NoneType' object has no attribute 'name'

## Advanced: Accessing the Roofline Data Directly

We can also access the raw data used to generate the roofline plots.

In [7]:
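# Get the first result set for FLOAT16, guarding against a missing or empty entry
float16_result_sets = results.get(DataType.FLOAT16.name, []) if results else []
result_set = float16_result_sets[0] if float16_result_sets else None

if result_set is not None: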
    # Calculate roofline data directly
    df = GemmPlotter.calculate_roofline_data(
        result_set=result_set,
        precision=DataType.FLOAT16
    )
    
    # Display the DataFrame
    display(df)
    
    # Show key metrics
    print(f"\nArithmetic Intensity Range: {df['arithmetic_intensity'].min()} to {df['arithmetic_intensity'].max()}")
    print(f"Performance Range (TFLOPS): {df['tflops'].min()} to {df['tflops'].max()}")
else:
    print("No results available to analyze")

2025-06-24 12:27:24,744 - kernel_scan.gemm - INFO - Filtered to 1 best results
2025-06-24 12:27:24,744 - kernel_scan.gemm - INFO - Peak compute: 122.8 TFLOPS, Peak bandwidth: 960.0 GB/s


| column | dtype | value |
|---|---|---|
| operation | str | `"DeviceGemmDpp<64, 1, 64, 64, 8…"` |
| time_ms | f64 | 0.006159 |
| tflops | f64 | 0.00133 |
| gb_per_sec | f64 | 1.37168 |
| is_best | bool | True |
| timestamp | str | `"2025-06-24T10:18:46.965Z"` |
| N | i64 | 64 |
| M | i64 | 1 |
| layout_b | str | `"RowMajor"` |
| output_datatype | str | `"f16"` |
| K | i64 | 64 |
| weight_datatype | str | `"f16"` |
| input_datatype | str | `"f16"` |
| layout_c | str | `"RowMajor"` |
| layout_a | str | `"RowMajor"` |
| datatype | str | `"f16"` |
| operation_type | str | `"gemm"` |
| arithmetic_intensity | f64 | 0.969697 |
| group | str | `"1_64_64"` |
| memory_constraint | f64 | 930.909091 |
| attainable_performance | f64 | 122.8 |
| time_scaled | f64 | 0.078478 |
| precision_format | str | `"FLOAT16"` |
| peak_compute | f64 | 122.8 |
| peak_bandwidth | f64 | 960.0 |



Arithmetic Intensity Range: 0.9696969696969697 to 0.9696969696969697
Performance Range (TFLOPS): 0.00133012 to 0.00133012
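
The values in the DataFrame output above can be reproduced by hand. The sketch below assumes the usual GEMM accounting (2·M·N·K operations, with A, B, and C each moved once at 2 bytes per FLOAT16 element) and the standard roofline bound, min(peak compute, arithmetic intensity × peak bandwidth). The function names are hypothetical, and whether kernel_scan computes its columns exactly this way is an assumption, although the numbers agree with the output.

```python
def gemm_arithmetic_intensity(m, n, k, bytes_per_element=2):
    """FLOPs per byte for C = A @ B, assuming each matrix is moved once."""
    flops = 2 * m * n * k
    bytes_moved = bytes_per_element * (m * k + k * n + m * n)
    return flops / bytes_moved


def roofline_bound_tflops(intensity, peak_tflops, peak_gb_per_sec):
    """Standard roofline ceiling: min(compute peak, intensity * bandwidth)."""
    memory_bound_tflops = intensity * peak_gb_per_sec / 1000.0  # GFLOPS -> TFLOPS
    return min(peak_tflops, memory_bound_tflops)


# M=1, N=64, K=64 in FLOAT16, with the peaks reported in the log output
intensity = gemm_arithmetic_intensity(1, 64, 64)
bound = roofline_bound_tflops(intensity, peak_tflops=122.8, peak_gb_per_sec=960.0)

# Achieved throughput derived from the measured time_ms value
time_s = 0.006159e-3
achieved_tflops = 2 * 1 * 64 * 64 / time_s / 1e12

print(f"arithmetic intensity: {intensity:.6f} FLOP/byte")
print(f"roofline bound:       {bound:.4f} TFLOPS")
print(f"achieved:             {achieved_tflops:.5f} TFLOPS")
```

Under these assumptions the computed intensity matches the `arithmetic_intensity` value (0.969697), and intensity × 960 GB/s comes to about 930.9 GFLOPS, which matches the `memory_constraint` value. With a ridge point near 128 FLOP/byte (122.8 TFLOPS ÷ 960 GB/s), this case sits far on the memory-bound side of the roofline.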


## Summary

In this notebook, we've demonstrated:

1. How to configure and run a GEMM scan with different matrix dimensions
2. How to generate roofline plots using the GemmPlotter class
3. How to access and analyze the raw roofline data

Because plotting lives in operation-specific classes such as GemmPlotter, the framework can be extended to operations beyond GEMM by adding a dedicated plotter for each one.