# U-Net 3D V100 Analysis 

## Notebook Overview

This notebook presents an analysis of a U-Net 3D workload using the DFAnalyzer tool. It demonstrates how to analyze I/O traces collected from a deep learning application running on a V100 system. The workflow includes:

- Setting up the environment and importing necessary libraries.
- Extracting and preparing trace data for analysis.
- Initializing DFAnalyzer with appropriate configuration.
- Running the analysis to generate summarized I/O statistics and views.
- Displaying and interpreting the results, including bandwidth and operation counts over time ranges for different I/O layers.

The notebook is intended to help users understand the I/O behavior of deep learning workloads and provides a template for similar analyses on other datasets.

## Interactive Analysis

### Prepare Environment

In this section, we set up the environment by importing required libraries, configuring warning filters, and updating the Python path to include the DFAnalyzer workspace. 

In [1]:
import os
import sys

# Add DFAnalyzer to the path
workspace_dir = os.path.abspath("../")
sys.path.append(workspace_dir)
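The section text mentions warning filters, which the cell above does not show. A minimal sketch of both steps, using only the standard library, might look like the following. The helper name `add_to_path` and the choice of `FutureWarning` as the filtered category are assumptions made for illustration, not part of the original notebook.

```python
import os
import sys
import warnings


def add_to_path(directory: str) -> str:
    """Append the absolute form of `directory` to sys.path once and return it."""
    absolute = os.path.abspath(directory)
    if absolute not in sys.path:
        sys.path.append(absolute)
    return absolute


# Assumption: only FutureWarning is silenced here; the notebook's actual
# filter is not shown, so adjust the category to whatever is noisy for you.
warnings.filterwarnings("ignore", category=FutureWarning)

workspace_dir = add_to_path("../")
```

Guarding the append keeps `sys.path` free of duplicates when the cell is re-run.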

### Prepare Trace Data

Then, we extract the trace data archive into the designated directory to prepare it for analysis with DFAnalyzer.

In [4]:
!mkdir -p {workspace_dir}/tests/data/extracted/dftracer-dlio
!tar -xzf {workspace_dir}/tests/data/dftracer-dlio.tar.gz -C {workspace_dir}/tests/data/extracted/dftracer-dlio
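On systems without `mkdir -p` or `tar`, the same extraction can be sketched with the standard-library `tarfile` module. The function name `extract_archive` is hypothetical; the paths in the trailing comment mirror the shell commands above.

```python
import os
import tarfile


def extract_archive(archive_path: str, target_dir: str) -> list:
    """Extract a .tar.gz archive into target_dir and return the member names."""
    os.makedirs(target_dir, exist_ok=True)
    # Use the safer "data" extraction filter where this Python version has it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        archive.extractall(target_dir, **kwargs)
    return names


# Usage with the notebook's paths (not executed here):
# extract_archive(
#     f"{workspace_dir}/tests/data/dftracer-dlio.tar.gz",
#     f"{workspace_dir}/tests/data/extracted/dftracer-dlio",
# )
```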

### Run Analysis

Finally, we initialize DFAnalyzer with the specified configuration and run the trace analysis to generate summarized I/O statistics and views for further exploration.

In [None]:
from dftracer.analyzer import init_with_hydra

percentile = 0.9
time_granularity = 5  # 5 seconds
trace_path = f"{workspace_dir}/tests/data/extracted/dftracer-dlio"
view_types = ["time_range", "proc_name"]

dfa = init_with_hydra(
    hydra_overrides=[
        'analyzer=dftracer',
        'analyzer/preset=dlio',
        'analyzer.checkpoint=False',
        f"analyzer.time_granularity={time_granularity}",
        f"trace_path={trace_path}",
    ]
)
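Each entry in `hydra_overrides` is a plain `key=value` string. When many settings vary between runs, one option is to keep them in a dictionary and render the list from it. The helper `build_overrides` and the placeholder trace path below are assumptions for illustration only; the keys are the ones used in the cell above.

```python
def build_overrides(settings: dict) -> list:
    """Render a mapping as a list of `key=value` override strings."""
    return [f"{key}={value}" for key, value in settings.items()]


overrides = build_overrides(
    {
        "analyzer": "dftracer",
        "analyzer/preset": "dlio",
        "analyzer.checkpoint": False,
        "analyzer.time_granularity": 5,  # seconds, as in the cell above
        "trace_path": "/path/to/traces",  # hypothetical placeholder path
    }
)
```

The resulting list could then be passed as `hydra_overrides=overrides`.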

We access the underlying Dask client via our Python API.

In [3]:
dfa.client

Client
Connection method: Cluster object
Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status

Cluster
Workers: 12
Total threads: 96
Total memory: 0 B
Status: running
Using processes: True

Scheduler
Comm: tcp://127.0.0.1:42305
Dashboard: http://127.0.0.1:8787/status
Started: Just now

Workers
Comm,Dashboard,Nanny,Total threads,Memory,Local directory
tcp://127.0.0.1:36801,http://127.0.0.1:39925/status,tcp://127.0.0.1:39455,8,0 B,/tmp/dfanalyzer-izzet/0/dask-scratch-space/worker-rum3_fzo
tcp://127.0.0.1:46495,http://127.0.0.1:38755/status,tcp://127.0.0.1:43643,8,0 B,/tmp/dfanalyzer-izzet/0/dask-scratch-space/worker-6l9pc7bw
tcp://127.0.0.1:45289,http://127.0.0.1:43481/status,tcp://127.0.0.1:42031,8,0 B,/tmp/dfanalyzer-izzet/0/dask-scratch-space/worker-_3f22l1t
tcp://127.0.0.1:37383,http://127.0.0.1:33331/status,tcp://127.0.0.1:37775,8,0 B,/tmp/dfanalyzer-izzet/0/dask-scratch-space/worker-yv3_ogt4
tcp://127.0.0.1:42465,http://127.0.0.1:42977/status,tcp://127.0.0.1:37393,8,0 B,/tmp/dfanalyzer-izzet/0/dask-scratch-space/worker-1cvndqt0
tcp://127.0.0.1:35011,http://127.0.0.1:39561/status,tcp://127.0.0.1:41887,8,0 B,/tmp/dfanalyzer-izzet/0/dask-scratch-space/worker-2n5qaxrq
tcp://127.0.0.1:38587,http://127.0.0.1:40769/status,tcp://127.0.0.1:39021,8,0 B,/tmp/dfanalyzer-izzet/0/dask-scratch-space/worker-yufo0ffc
tcp://127.0.0.1:45591,http://127.0.0.1:32833/status,tcp://127.0.0.1:45209,8,0 B,/tmp/dfanalyzer-izzet/0/dask-scratch-space/worker-9odrtfq8
tcp://127.0.0.1:34749,http://127.0.0.1:32905/status,tcp://127.0.0.1:43799,8,0 B,/tmp/dfanalyzer-izzet/0/dask-scratch-space/worker-4_dk9hud
tcp://127.0.0.1:35773,http://127.0.0.1:44071/status,tcp://127.0.0.1:36105,8,0 B,/tmp/dfanalyzer-izzet/0/dask-scratch-space/worker-ccqbeies
tcp://127.0.0.1:38033,http://127.0.0.1:34711/status,tcp://127.0.0.1:35215,8,0 B,/tmp/dfanalyzer-izzet/0/dask-scratch-space/worker-7dpqsvzd
tcp://127.0.0.1:41843,http://127.0.0.1:42419/status,tcp://127.0.0.1:37699,8,0 B,/tmp/dfanalyzer-izzet/0/dask-scratch-space/worker-moipv8vm


We can inspect the current preset configuration as follows.

In [4]:
dict(dfa.analyzer.preset.layer_defs)

{'app': 'func_name == "DLIOBenchmark.run"',
 'training': 'func_name == "DLIOBenchmark._train"',
 'compute': 'cat == "ai_framework"',
 'fetch_data': 'func_name.isin(["<module>.iter", "fetch-data.iter", "loop.iter"])',
 'data_loader': 'cat == "data_loader" & ~func_name.isin(["loop.iter", "loop.yield"])',
 'data_loader_fork': 'cat == "posix" & func_name == "fork"',
 'reader': 'cat == "reader"',
 'reader_posix_lustre': 'cat.str.contains("posix|stdio") & cat.str.contains("_reader_lustre")',
 'checkpoint': 'cat == "checkpoint"',
 'checkpoint_posix_lustre': 'cat.str.contains("posix|stdio") & cat.str.contains("_checkpoint_lustre")',
 'checkpoint_posix_ssd': 'cat.str.contains("posix|stdio") & cat.str.contains("_checkpoint_ssd")',
 'other_posix': 'cat.isin(["posix", "stdio"])'}
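The layer definitions read like pandas query expressions over the `cat` and `func_name` columns. As a rough illustration of how such an expression selects events, the sketch below applies three of them to a toy table with `DataFrame.query`. That DFAnalyzer evaluates them this way is an assumption, and the `read`, `open`, and `fopen` rows are invented sample data.

```python
import pandas as pd

# Toy events; the values are made up for illustration.
events = pd.DataFrame(
    {
        "cat": ["reader", "posix", "posix", "stdio", "data_loader"],
        "func_name": ["read", "fork", "open", "fopen", "loop.iter"],
    }
)

# Three expressions copied from the preset output above.
layer_defs = {
    "reader": 'cat == "reader"',
    "data_loader_fork": 'cat == "posix" & func_name == "fork"',
    "other_posix": 'cat.isin(["posix", "stdio"])',
}

# engine="python" avoids depending on the optional numexpr backend.
layer_sizes = {
    name: len(events.query(expression, engine="python"))
    for name, expression in layer_defs.items()
}
```

In a query string, `&` binds more loosely than `==`, which is why the `data_loader_fork` expression needs no parentheses.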

We run the analysis via the `analyze_trace` function as follows.

In [5]:
result = dfa.analyze_trace(view_types=view_types)

Using the `output` attribute of our analyzer instance `dfa`, we then print the DFAnalyzer summary.

In [13]:
dfa.output.handle_result(result)

### Result Exploration

The Python API gives access to both the high-level characteristics and the per-layer characteristics and metrics, as follows:

View aggregated metrics across all layers, grouped by time intervals:

In [14]:
result.get_flat_view('time_range').head(10)

time_range,app_count_max,app_count_mean,app_count_min,app_count_per,app_count_std,app_count_sum,app_ops_max,app_ops_mean,app_ops_min,app_ops_pct,...,u_reader_posix_lustre_other_time_max,u_reader_posix_lustre_read_time_max,u_reader_posix_lustre_seek_time_max,u_reader_posix_lustre_stat_time_max,u_reader_posix_lustre_sync_time_max,u_reader_posix_lustre_time_max,u_reader_posix_lustre_write_time_max,u_reader_preprocess_time_max,u_reader_sample_time_max,u_reader_time_max
0,1.0,1.0,1.0,1.0,0.0,8.0,0.018101,0.018101,0.018101,1.0,...,,0.271525,,0.0,,0.289335,,0.0,0.345637,2.086666
1,,,,,,,,,,,...,,0.445384,,0.0,,0.463426,,0.0,0.607644,3.992251
2,,,,,,,,,,,...,,0.0,,0.0,,0.0,,0.0,0.0,1.930469
3,,,,,,,,,,,...,,0.0,,0.0,,0.0,,0.0,0.0,1.525373
4,,,,,,,,,,,...,,0.0,,0.0,,0.0,,0.0,0.0,0.0
5,,,,,,,,,,,...,,0.0,,0.0,,0.0,,0.0,0.0,0.308971
6,,,,,,,,,,,...,,0.0,,0.0,,0.0,,0.0,0.0,0.0
7,,,,,,,,,,,...,,0.0,,0.0,,0.0,,0.0,0.0,0.0
8,,,,,,,,,,,...,,0.0,,0.0,,0.0,,0.0,0.0,0.0
9,,,,,,,,,,,...,,0.0,,0.0,,0.0,,0.0,0.0,0.0


List all the layers available for detailed analysis:

In [15]:
result.layers

['app',
 'training',
 'compute',
 'fetch_data',
 'data_loader',
 'data_loader_fork',
 'reader',
 'reader_posix_lustre',
 'checkpoint',
 'checkpoint_posix_lustre',
 'checkpoint_posix_ssd',
 'other_posix']
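The flat view's column names appear to begin with one of these layer names (for example `app_count_max`), sometimes after a `u_` prefix whose meaning is not stated in this notebook. Because some layer names are prefixes of others (`reader` and `reader_posix_lustre`), a longest-match lookup is one way to split a column name into its layer and the remaining metric. The function `split_layer` is a hypothetical helper, and the naming scheme is inferred from the output shown earlier rather than documented.

```python
from typing import Optional, Tuple

LAYERS = [
    "app", "training", "compute", "fetch_data", "data_loader",
    "data_loader_fork", "reader", "reader_posix_lustre", "checkpoint",
    "checkpoint_posix_lustre", "checkpoint_posix_ssd", "other_posix",
]


def split_layer(column: str, layers=LAYERS) -> Optional[Tuple[str, str]]:
    """Return (layer, remainder) for a flat-view column, or None if no layer matches."""
    # Assumption: a leading "u_" is an opaque prefix and can be skipped.
    name = column[2:] if column.startswith("u_") else column
    # Try longer layer names first so "reader_posix_lustre" wins over "reader".
    for layer in sorted(layers, key=len, reverse=True):
        if name.startswith(layer + "_"):
            return layer, name[len(layer) + 1:]
    return None
```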

Show the high-level metrics for the `app` layer:

In [16]:
result.get_hlm('app').head()

time_range,proc_name,cat,func_name,acc_pat,io_cat,time,count,size,size_bin_0_4kib,size_bin_4kib_16kib,size_bin_16kib_64kib,size_bin_64kib_256kib,size_bin_256kib_1mib,size_bin_1mib_4mib,size_bin_4mib_16mib,size_bin_16mib_64mib,size_bin_64mib_256mib,size_bin_256mib_1gib,size_bin_1gib_4gib,size_bin_4gib_plus,file_name
0,app#corona171#1028571#1028571,dlio_benchmark,DLIOBenchmark.run,0,6,55.245851,1,,,,,,,,,,,,,,
0,app#corona171#1028567#1028567,dlio_benchmark,DLIOBenchmark.run,0,6,55.245983,1,,,,,,,,,,,,,,
0,app#corona171#1028568#1028568,dlio_benchmark,DLIOBenchmark.run,0,6,55.245884,1,,,,,,,,,,,,,,
0,app#corona171#1028573#1028573,dlio_benchmark,DLIOBenchmark.run,0,6,55.245845,1,,,,,,,,,,,,,,
0,app#corona171#1028574#1028574,dlio_benchmark,DLIOBenchmark.run,0,6,55.245876,1,,,,,,,,,,,,,,
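The `proc_name` values pack several identifiers into one `#`-separated string. A small parser makes them easier to group by host or process. The field meanings used below (kind, host, process ID, thread ID) are an assumption based on the values shown, and `ProcName` and `parse_proc_name` are hypothetical names.

```python
from typing import NamedTuple


class ProcName(NamedTuple):
    kind: str  # assumed: process kind, e.g. "app"
    host: str  # assumed: host name
    pid: int   # assumed: process ID
    tid: int   # assumed: thread ID


def parse_proc_name(value: str) -> ProcName:
    """Split a '#'-separated proc_name string into its four fields."""
    kind, host, pid, tid = value.split("#")
    return ProcName(kind, host, int(pid), int(tid))


parsed = parse_proc_name("app#corona171#1028571#1028571")
```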


Display a layered main view of the `reader_posix_lustre` layer:

In [17]:
result.get_main_view('reader_posix_lustre').head()

proc_name,time_range,bw,close_count,close_file_name,close_ops,close_time,count,data_bw,data_count,data_file_name,data_intensity,...,write_size_bin_1gib_4gib,write_size_bin_1mib_4mib,write_size_bin_256kib_1mib,write_size_bin_256mib_1gib,write_size_bin_4gib_plus,write_size_bin_4kib_16kib,write_size_bin_4mib_16mib,write_size_bin_64kib_256kib,write_size_bin_64mib_256mib,write_time
app#corona171#1028571#1028571,0,376376400.0,4,{/p/lustre3/izzet/dlio-benchmark-test/unet3d_v...,1225.114855,0.003265,56,380482088.081022,36,{/p/lustre3/izzet/dlio-benchmark-test/unet3d_v...,0.0,...,,,,,,,,,,
app#corona171#1028571#1028571,1,398941000.0,7,{/p/lustre3/izzet/dlio-benchmark-test/unet3d_v...,1573.741007,0.004448,103,401318163.866653,72,{/p/lustre3/izzet/dlio-benchmark-test/unet3d_v...,0.0,...,,,,,,,,,,
app#corona171#1028571#1028571,2,403104900.0,6,{/p/lustre3/izzet/dlio-benchmark-test/unet3d_v...,1433.006926,0.004187,78,405925022.015696,54,{/p/lustre3/izzet/dlio-benchmark-test/unet3d_v...,0.0,...,,,,,,,,,,
app#corona171#1028571#1028571,3,800005400.0,9,{/p/lustre3/izzet/dlio-benchmark-test/unet3d_v...,1363.429783,0.006601,117,807332890.471577,81,{/p/lustre3/izzet/dlio-benchmark-test/unet3d_v...,0.0,...,,,,,,,,,,
app#corona171#1028571#1028571,4,1117199000.0,10,{/p/lustre3/izzet/dlio-benchmark-test/unet3d_v...,1370.614035,0.007296,118,1129234851.377007,81,{/p/lustre3/izzet/dlio-benchmark-test/unet3d_v...,0.0,...,,,,,,,,,,
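In the rows above, the `close_ops` values are consistent with `close_count / close_time`, that is, operations per unit of time. The check below recomputes the ratio from the displayed numbers. This relationship is inferred from the output, not taken from DFAnalyzer's documentation.

```python
import math

# (close_count, close_time, close_ops) copied from the five rows above.
rows = [
    (4, 0.003265, 1225.114855),
    (7, 0.004448, 1573.741007),
    (6, 0.004187, 1433.006926),
    (9, 0.006601, 1363.429783),
    (10, 0.007296, 1370.614035),
]

# Recompute the rate and compare it with the displayed value.
matches = [
    math.isclose(count / time, ops, rel_tol=1e-6)
    for count, time, ops in rows
]
```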


Access a specific view for `reader_posix_lustre`, grouped by time range:

In [18]:
result.get_layer_view('reader_posix_lustre', 'time_range').head()

time_range,bw_max,bw_mean,bw_min,bw_std,bw_sum,close_count_max,close_count_mean,close_count_min,close_count_per,close_count_std,...,file_name_nunique,metadata_file_name_nunique,open_file_name_nunique,other_file_name_nunique,proc_name_nunique,read_file_name_nunique,seek_file_name_nunique,stat_file_name_nunique,sync_file_name_nunique,write_file_name_nunique
0,392828900.0,370461800.0,355103100.0,12853930.0,2963695000.0,4,4.0,4,0.04,0.0,...,35,35,32,0,8,32,0,33,0,0
1,424096700.0,391412300.0,367942800.0,17188590.0,3131298000.0,8,7.25,7,0.0725,0.46291,...,64,64,64,0,8,64,0,64,0,0
2,445295200.0,418989600.0,403104900.0,15776680.0,3351917000.0,6,5.75,5,0.0575,0.46291,...,54,54,48,0,8,48,0,48,0,0
3,901037400.0,815126400.0,781064100.0,37814140.0,6521011000.0,10,9.25,9,0.0925,0.46291,...,82,82,74,0,8,74,0,74,0,0
4,1200353000.0,1133135000.0,1018028000.0,60350380.0,9065083000.0,10,9.75,9,0.0975,0.46291,...,78,78,70,0,8,70,0,70,0,0
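The `_max`, `_mean`, `_min`, `_std`, and `_sum` columns look like per-process values aggregated within each time range. The sketch below reproduces the `close_count` statistics of time range 1 from a set of eight per-process counts. That set is hypothetical: it was chosen to match the displayed statistics, and the real per-process values are not shown in this view.

```python
import math
import statistics

# Hypothetical per-process close counts for time_range 1 (8 processes).
close_counts = [7, 7, 7, 7, 7, 7, 8, 8]

summary = {
    "max": max(close_counts),
    "mean": statistics.mean(close_counts),
    "min": min(close_counts),
    # Sample standard deviation (n - 1 in the denominator).
    "std": statistics.stdev(close_counts),
}

# For time_range 0, bw_sum is consistent with bw_mean times the 8 processes.
bw_sum_matches = math.isclose(370461800.0 * 8, 2963695000.0, rel_tol=1e-6)
```

The displayed `close_count_std` of 0.46291 agrees with the sample standard deviation, which suggests (but does not prove) that the aggregation uses `n - 1` in the denominator.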


Display the raw trace data, showing individual I/O events:

In [19]:
result._traces.head()

Unnamed: 0,func_name,cat,type,pid,tid,time_start,time_end,time,tinterval,time_range,...,size_bin_16kib_64kib,size_bin_64kib_256kib,size_bin_256kib_1mib,size_bin_1mib_4mib,size_bin_4mib_16mib,size_bin_16mib_64mib,size_bin_64mib_256mib,size_bin_256mib_1gib,size_bin_1gib_4gib,size_bin_4gib_plus
4,start,dftracer,0,1028571,1028571,0,0,0.0,,0,...,0,0,0,0,0,0,0,0,0,0
6,FileStorage.get_uri,storage,0,1028571,1028571,1300840,1300851,1.1e-05,,0,...,0,0,0,0,0,0,0,0,0,0
8,opendir,posix_reader_lustre,0,1028571,1028571,1300907,1304903,0.003996,,0,...,0,0,0,0,0,0,0,0,0,0
9,FileStorage.walk_node,storage,0,1028571,1028571,1300805,1305420,0.004615,,0,...,0,0,0,0,0,0,0,0,0,0
10,FileStorage.get_uri,storage,0,1028571,1028571,1305523,1305531,8e-06,,0,...,0,0,0,0,0,0,0,0,0,0
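In these rows the `time` column equals `time_end - time_start` divided by one million, which suggests that the raw timestamps are in microseconds and `time` is in seconds. The check below uses values copied from the rows above. The unit interpretation is an inference from the arithmetic.

```python
import math

# (func_name, time_start, time_end, time) copied from the rows above.
events = [
    ("FileStorage.get_uri", 1300840, 1300851, 1.1e-05),
    ("opendir", 1300907, 1304903, 0.003996),
    ("FileStorage.walk_node", 1300805, 1305420, 0.004615),
]

MICROSECONDS_PER_SECOND = 1_000_000

# Duration in seconds, assuming microsecond timestamps.
durations = {
    name: (end - start) / MICROSECONDS_PER_SECOND
    for name, start, end, _ in events
}

consistent = all(
    math.isclose(durations[name], shown, rel_tol=1e-9)
    for name, _, _, shown in events
)
```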
