
Running Simulation with Chakra

Joongun Park edited this page May 31, 2024 · 19 revisions

Chakra is a framework created to support multiple simulators with the aim of advancing performance benchmarking and co-design through standardized execution traces. It facilitates compatibility and performance evaluation across various machine learning models, software, and hardware, enhancing the co-design ecosystem for AI systems.

Currently, ASTRA-sim supports Chakra traces as its input format.

Using Synthetically Generated Trace (et_generator)

et_generator can be used to define and generate arbitrary execution traces, functioning as a test-case generator. You can generate execution traces with the following commands (Python 3 is required; the commands below use python3 and pip3):

$ cd ${ASTRA_SIM}/extern/graph_frontend/chakra
$ pip3 install .
$ python3 -m chakra.et_generator.et_generator --num_npus 64 --num_dims 1

To run one of the example traces (one_comm_coll_node_allreduce), execute the following command.

# For the analytical network backend
$ cd -
$ ./build/astra_analytical/build/bin/AstraSim_Analytical_Congestion_Unaware \
  --workload-configuration=./extern/graph_frontend/chakra/one_comm_coll_node_allreduce \
  --system-configuration=./inputs/system/Switch.json \
  --network-configuration=./inputs/network/analytical/Switch.yml \
  --remote-memory-configuration=./inputs/remote_memory/analytical/no_memory_expansion.json

# For the ns3 network backend (ns3's waf build system requires Python 2).
# First, edit the configuration files referenced in the following script.
$ ./build/astra_ns3/build.sh -r

# Or, alternatively:
$ cd ./extern/network_backend/ns3/simulation
$ ./waf --run "scratch/AstraSimNetwork \
  --workload-configuration=../../../../extern/graph_frontend/chakra/one_comm_coll_node_allreduce \
  --system-configuration=../../../../inputs/system/Switch.json \
  --network-configuration=mix/config.txt \
  --remote-memory-configuration=../../../../inputs/remote_memory/analytical/no_memory_expansion.json \
  --logical-topology-configuration=../../../../inputs/network/ns3/sample_64nodes_1D.json \
  --comm-group-configuration=\"empty\""
$ cd -

Using Chakra Execution Traces and Kineto Traces Generated by PyTorch

Note that ASTRA-sim's naming rule for execution traces follows the format {path prefix}.{npu_id}.et. By adding a few lines to any PyTorch workload, you can generate a PyTorch Execution Trace (ET) and a Kineto trace for each GPU (and its corresponding CPU thread). Details on how to modify the PyTorch files to collect PyTorch-ET and Kineto traces can be found here. With these traces for each GPU, we merge the PyTorch-ET and the Kineto trace into a single enhanced ET, then feed the enhanced ET into a converter that translates it into the Chakra format. In this example, we use a DLRM trace that has already been collected and zipped.
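As a quick illustration of the {path prefix}.{npu_id}.et naming rule, the following snippet prints the per-NPU trace filenames ASTRA-sim would look for given a prefix and an NPU count (the prefix and count here are made up for this example):

```shell
# Illustrative only: print the trace filenames implied by the
# {path prefix}.{npu_id}.et naming rule for a hypothetical 4-NPU run.
prefix="my_workload"   # hypothetical path prefix
num_npus=4             # hypothetical NPU count
for i in $(seq 0 $((num_npus - 1))); do
  echo "${prefix}.${i}.et"
done
```

The prefix (without the `.{npu_id}.et` suffix) is what gets passed to the `--workload-configuration` flag, as in the commands below.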

# First, navigate to the param submodule, which contains sample PyTorch-ET and Kineto input files as well as the merger script
$ cd extern/graph_frontend/param/train/compute/python/
$ pip3 install -r requirements.txt
$ pip3 install .

# Second, unzip the input files
$ cd ./test/data
$ tar -xzf dlrm_kineto.tar.gz
$ tar -xzf dlrm_pytorch_et.tar.gz
$ cd ../../

# Merge the pre-collected PyTorch-ET and Kineto traces (from a single training loop of DLRM on 8 GPUs) into enhanced ETs
$ for i in $(seq 0 7); do
    python3 ./tools/trace_link.py \
      --et-file test/data/dlrm_pytorch_et/dlrm_eg_${i}.json \
      --kineto-file test/data/dlrm_kineto/worker${i}* \
      --exact-match \
      --annotation 'enumerate(DataLoader)#_MultiProcessingDataLoaderIter.__next__'
  done

$ mkdir ./test/data/et_plus
$ mv ./test/data/dlrm_pytorch_et/*_plus.json ./test/data/et_plus/

# Next, move the enhanced ETs to the chakra submodule so they can be converted into the Chakra format
$ mv test/data/et_plus ../../../../../graph_frontend/chakra/
$ cd ../../../../../graph_frontend/chakra/
$ pip3 install .

# Convert the enhanced PyTorch-ETs into the Chakra format, ready to be fed to ASTRA-sim
$ for i in $(seq 0 7); do
    python3 -m chakra.et_converter.et_converter \
      --input_type PyTorch \
      --input_filename et_plus/dlrm_eg_${i}_plus.json \
      --output_filename et_plus/dlrm_chakra.${i}.et \
      --num_dims 1
  done

# Finally, return to the root directory of ASTRA-sim
$ cd ../../../

# Update network topology
$ vi ./inputs/network/analytical/FullyConnected.yml

npus_count: [ 8 ]  # number of NPUs
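For reference, here is a minimal sketch of what the edited FullyConnected.yml might look like. This assumes the analytical backend's usual fields (topology, npus_count, bandwidth, latency); the bandwidth and latency values are placeholders, so check the file shipped in your checkout for the authoritative schema and values:

```yml
topology: [ FullyConnected ]  # network topology per dimension
npus_count: [ 8 ]             # number of NPUs; must match the 8-GPU DLRM trace
bandwidth: [ 50.0 ]           # link bandwidth in GB/s (placeholder value)
latency: [ 500 ]              # link latency in ns (placeholder value)
```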

Then, run the following command.

# Sample command that runs ASTRA-sim with the DLRM Chakra traces we just created

$ ./build/astra_analytical/build/bin/AstraSim_Analytical_Congestion_Unaware \
  --workload-configuration=./extern/graph_frontend/chakra/et_plus/dlrm_chakra \
  --system-configuration=./inputs/system/FullyConnected.json \
  --network-configuration=./inputs/network/analytical/FullyConnected.yml \
  --remote-memory-configuration=./inputs/remote_memory/analytical/no_memory_expansion.json

Upon completion, ASTRA-sim reports the number of cycles each simulated NPU took to finish the workload:

ring of node 0, id: 0 dimension: local total nodes in ring: 8 index in ring: 0 offset: 1total nodes in ring: 8
ring of node 0, id: 0 dimension: local total nodes in ring: 8 index in ring: 0 offset: 1total nodes in ring: 8
...
sys[0] finished, 13271344 cycles
sys[1] finished, 14249000 cycles
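Cycle counts can only be translated into wall-clock time under an assumed NPU clock frequency. As a hypothetical sketch (the 1 GHz figure below is an assumption for illustration, not something ASTRA-sim reports):

```shell
# Hypothetical conversion of the final cycle count from the sample output
# above into wall-clock time, under an ASSUMED 1 GHz clock (1 cycle = 1 ns).
cycles=14249000
ms=$((cycles / 1000000))   # ns -> ms at the assumed 1 GHz
echo "~${ms} ms at an assumed 1 GHz clock"
```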